A capable multimedia content discovery platform based on visual content analysis and intelligent data enrichment

Multimed Tools Appl (2018) 77:14077–14091
DOI 10.1007/s11042-017-5014-1

Remigiusz Baran¹ · Andrzej Dziech² · Andrzej Zeja³

Received: 27 September 2016 / Revised: 30 November 2016 / Accepted: 12 December 2016 / Published online: 28 July 2017
© The Author(s) 2017. This article is an open access publication

Abstract A new capable content discovery platform based on multimedia data enrichment is presented in this paper. The platform, known as the IMCOP system, refers to the concept of intelligent discovery and delivery of multimedia content. Relevant state-of-the-art solutions are described in detail in the background section. The overall architecture and the main components of the IMCOP system are presented next. An original concept of Complex Multimedia Objects which extend the MPEG-7 standard to hold the processed data and bind it into content related collections is introduced. Selected results of tests illustrating how the IMCOP system performs in terms of responsiveness and stability under a particular workload are reported. Finally, IMCOP’s advantages in reference to other content discovery platforms are discussed and summed up.

Keywords Content discovery and delivery · Recommender engines · Visual analysis · Data enrichment · Multimedia indexing · Complex multimedia objects

* Remigiusz Baran, r.baran@tu.kielce.pl
Andrzej Dziech, dziech@kt.agh.edu.pl
Andrzej Zeja, a.zeja@wstkt.pl

¹ Faculty of Electrical Engineering, Automatics and Computer Science, Kielce University of Technology, al. 1000-lecia P.P. 7, 25-314 Kielce, Poland
² AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
³ University of Computer Engineering and Telecommunications, ul.
Toporowskiego 98, 25-553 Kielce, Poland

1 Introduction

The volume of data stored in, downloaded from and shared via the Internet is increasing rapidly. As reported in [17], the storage capacity of the Internet "is doubling in size every two years". This is chiefly due to the growing number of devices connected to the Internet, including mobile phones, computers and tablets as well as other types of smart devices, "from minuscule chips to mammoth machines that use wireless technology to talk to each other (and to us)" [16]. This hard-to-imagine source of big data, mainly represented by images and video sequences (the most rapidly-growing types of media), provides incredible but real opportunities for building heterogeneous intelligence systems [10]. The first issues which need to be addressed are content-oriented data analysis, enrichment, retrieval and recommendation. A new content discovery platform, known as the IMCOP system, which uses these and other technologies, is presented in this paper.

The IMCOP system is the result of an international collaboration within the framework of the second joint Polish-Israeli R&D project titled "Intelligent Multimedia System for Web and IPTV Archiving. Digital Analysis and Documentation of Multimedia Content". According to initial assumptions, the capabilities of the IMCOP system should include multimedia data aggregation, analysis and processing. The implementation of these processes must be extremely flexible to fulfill the requirements of different IMCOP applications. According to customer needs, the IMCOP system should be able to perform different kinds of processing and content analysis of multimedia data aggregated using various Internet data sources. Content analysis should enrich the data – extend the metadata list – to confirm the relationship between the data and the subject matter and find its content-related connections with other data.
Thanks to this flexibility, the IMCOP system should be able to address customer demands regarding relevant content and ways of presenting it to their users.

The system presented in this paper meets all these initial assumptions and expectations of the IMCOP project. For example, various types of aggregated data comprising text, still images, film footage and video sequences are considered. Mechanisms for extensive analysis of all types of aggregated data, including detection and extraction of various features and different classification approaches, were also used. A range of descriptive metadata is extracted in this way to enrich the aggregated data and give the foundation for finding connections. In addition, a flexible and efficient representation of data, known as Complex Multimedia Objects (CMO), maintaining the metadata and the content-related connections, is proposed. As well as these objectives, the IMCOP project has the following minor goals:

• to ensure that the IMCOP system is platform-independent and capable of incorporating external services to improve its efficiency and increase its intelligent facilities, for example by removing duplicate images from the database [9],
• to guarantee scalability despite vast numbers of multimedia objects processed,
• to make absolutely certain that the system is legal in terms of copyright law and that the processed objects and their content are fully protected against copying, reproducing, modifying and other forms of authentication rights violation.

It was also agreed between the project partners that the IMCOP system should comply with the Data Enrichment and Engagement Platform (DEEP), which was largely developed by the Israeli partner as part of their project activities. According to [21], the DEEP platform is "a revolutionary new solution (which) resolves the complexities of content discovery, recommendation, usability and engagement all at once, and consumers already know how to use it".
All these values can be verified through the DEEP Magazines application which is its final product [11]. The DEEP Magazines app shows how an advanced and professionally-made end-user application of the IMCOP system could work and look. However, it should be stressed that the IMCOP platform’s capabilities are significantly broader than those of the DEEP platform. The DEEP platform, according to the DEEP Magazines app requirements, is designed to collect, analyze and select images and short descriptive information (e.g. news) related to "the hottest stories about celebs, actors, movies and TV shows" [23]. In contrast, the IMCOP system can be categorized as a comprehensive and versatile content discovery and delivery platform which is capable of addressing topics of any kind. The IMCOP platform also has other functionalities. It can be applied, for instance, as a multimedia indexing (labeling) application, such as Imagga (http://imagga.com/), or as a reverse search engine tool, such as TinEye (https://www.tineye.com/).

The remaining part of this paper is organized as follows. The next section presents background materials and a literature review within the framework of content discovery platforms. The overall architecture with an insight into various categories of the IMCOP web services and the concept of Complex Multimedia Objects are introduced in the subsections of Section 3. Section 4 reports on the IMCOP system performance. Final conclusions and potential future improvements are presented in Section 5.

2 Background and literature review

According to the Wikipedia definition [22], a content discovery platform "is an implemented software recommendation platform which uses recommender system tools". As defined, in turn, in [24], a content-based recommender system is a system "that recommend an item to a user based upon a description of the item and a profile of the user’s interests".
In general, there are three main approaches to recommendation system design: collaborative filtering [29], content-based filtering [4] and hybrid, where the two former approaches are combined [27]. In fact, the definitions cover a range of solutions with different goals, domains and approaches using computer science techniques such as data mining, information retrieval and filtering, machine learning, artificial intelligence and so on.

Google Scholar, Pubnet, CiteSeer and Web of Science are the leading scientific and academic literature search and recommender engines. The platforms work in two key ways. In Google Scholar, papers related to the user’s research interests (notified as Scholar Updates) are found through a statistical analysis according to "what your work is about, the citation graph between articles, the fact that interests can change over time, and the authors you work with and cite" [12]. The other way of finding relevant articles in Google Scholar is setting user alerts. Despite noticeable differences, both methods are content-based filtering approaches which use a record of the researcher’s authored papers and citations. This approach is effective for researchers who have published many papers; however, it provides poor results for others, such as graduate students. In other types of recommender engines related to academic content such drawbacks do not occur. One such tool is PubChase. "PubChase suggests articles from PubMed on the basis of a user’s publishing record, but it also learns from the articles that the user has read and stored in his or her online library […] it adds another machine-learning technique: comparing this library with other people’s collections, with the logic that people with common research interests might benefit from each others’ preferences" [18].
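The content-based filtering idea described above, recommending items whose descriptions match a profile of the user's interests, can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the keyword sets, the item names and the use of Jaccard similarity are all hypothetical choices made for illustration and are not the method of any platform named in this section.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two keyword sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(profile: Set[str],
              items: Dict[str, Set[str]],
              top_n: int = 2) -> List[Tuple[str, float]]:
    """Rank items by similarity between their description and the user profile."""
    scored = [(name, jaccard(profile, desc)) for name, desc in items.items()]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical user profile and item descriptions (illustrative only).
profile = {"multimedia", "indexing", "retrieval"}
items = {
    "paper_a": {"multimedia", "indexing", "mpeg-7"},
    "paper_b": {"recommender", "collaborative", "filtering"},
    "paper_c": {"image", "retrieval"},
}
print(recommend(profile, items))  # [('paper_a', 0.5), ('paper_c', 0.25)]
```

A collaborative filtering engine would instead compare the profile with other users' collections, and a hybrid engine would combine both scores.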
Content discovery and personal recommendation have also become crucial to the global pay-TV industry due to the proliferation of choice offered by the television of today, including video-on-demand, personalized video recording, streaming services and web-based content. In addition, "the global pay-TV market is projected to grow from more than 900 million subscribers in 2014 to 1.21 billion by 2022" [19]. Such a major and interactive marketplace is an important challenge for operators, who are forced to look for new ways of satisfying customers and holding their attention for longer. Platforms where "a blend of sophisticated content discovery algorithms, personalized recommendations are offered to drive additional purchases," as in COMPASS from Viaccess-Orca, are appropriate solutions within this context. As reported in [20], collaborative filtering, external rating and operator’s promotions are the main algorithms driving the COMPASS engines. Additionally, an algorithm known as related content, where "similar item recommendations based on content metadata (actors, directors, years, countries, etc.) and keywords" are proposed, is also essential.

The Kannuu content discovery platform (http://www.kannuu.com/) is another example of a successful and innovative integration of various recommendation layers. In the Kannuu system, similar title recommendations are achieved using a proprietary metadata analysis which aims to find connections between titles. Similarly to the COMPASS engine, metadata analysis performed in the Kannuu system comprises (for example) genre, cast, director, date and expanded keyword lists. A similar approach, based on collaborative filtering where profiling and behavioral data are used, is also applied as the social recommendations layer. This is the key difference between the Kannuu and COMPASS systems, as the social recommendations layer is integrated with social networking sites.
This means Kannuu users can provide and receive recommendations from people in their social media circles.

In turn, ratings, in particular personal, and streaming history are the only key algorithms in Netflix – one of the most successful providers of streaming media and video on demand in the world. Since personal ratings and past viewing habits are insufficient when used for a single user, Netflix combines ratings of all its users with similar tastes. Personal ratings are a type of personal data collected by TV operators. However, personal data transferred to providers must be limited and reasonably selected to prevent eavesdropping such as was found in Samsung [15]. Therefore, platforms where algorithms from various levels are combined, as in the COMPASS or Kannuu recommender engines, are more efficient (their recommendations are more relevant) and more secure.

Another market where content discovery plays an important role is online publishing, mainly of blogs, podcasts, websites, etc. Outbrain, Taboola and Google AdSense are the leading platforms here. They recommend content via online media, from the largest and most widely respected such as "People" (in the case of Outbrain) and "USA Today" (in the case of Taboola), to those with limited audiences such as "BuildEazy" (in the case of AdSense). Content discovery algorithms applied in these cases are complex and sophisticated. For example, according to Outbrain, its recommender engine is based on more than 50 algorithms which are run in parallel to determine a set of candidate recommendations. These algorithms are categorized into four main groups: contextual, behavioral, personal and popular [14]. Contextual algorithms aim to find relationships (identify connections) between recommended content and that found on a target website. The Solr search engine [13] is used at this stage. The aim of behavioral algorithms is to learn the statistical behaviors of users.
This can be achieved by simply marking the most visited or most rated documents on a site or by applying collaborative filtering methods. The latter approach lets users access other content "liked" by people. In turn, personal algorithms learn user properties and history. Personalization is applied in Outbrain using cookies. Recommendations of each type, given independently by behavioral, contextual and personal algorithms, are then processed (using machine learning techniques) to select the most relevant ones.

3 IMCOP platform architecture

As stated in Section 1, the IMCOP system is a capable content discovery platform. As such, it can use user metadata to discover customer-relevant content which can be delivered to websites, mobile devices, set-top boxes, etc. In this domain, IMCOP aspires in part to operate similarly to Outbrain and Taboola – the largest content discovery startups in Israel. IMCOP also uses specialized and sophisticated algorithms to select the most relevant content. Although IMCOP does not currently apply algorithms which could be categorized as behavioral and personal, it uses an assortment of intelligent automated multimedia content analysis algorithms from a range of computer vision techniques. As such, IMCOP goes beyond typical content discovery platforms and also serves as a data enrichment platform.

The IMCOP platform is a distributed system based on Service-Oriented Architecture (SOA). IMCOP services are RESTful web services, implemented according to the Representational State Transfer (REST) architectural style. As such, IMCOP services are self-contained applications with their own REST-based interface. In addition, they are fully independent from operating systems and platforms on which they are implemented and run. This means they are also scalable, fast and modifiable.
All IMCOP services have been developed in Java according to the original MESCore library, which provides an API, programming specifications and code examples within an SDK for developers. As the MESCore library is open, IMCOP services can be freely added, edited or improved by third-party developers. This also implies that IMCOP’s capabilities can be accessed and extended by external companies and institutions interested in a partnership with the IMCOP team. The main categories of IMCOP’s services are as follows:

• Metadata Enhancement Services (MES) – specialized services, mainly in multimedia data analysis and enrichment (there are also other types of MES driven services such as management and connection),
• Data Aggregation Services (DAS) – mainly used for web crawlers which extract and collect data from the web as well as exchanging data with the IMCOP database, known as the Data Repository (DR).

Regardless of the category, the IMCOP services can be run in heterogeneous environments, e.g. MS Windows and Linux (32-bit or 64-bit) using virtual machine applications. In other words, there is no need to unify the IMCOP software components. This makes them easy to implement and integrate with the rest of the system. According to the MESCore library, IMCOP services are in fact wrapper functions which call other specialized applications or processes. These applications or processes can be written and compiled, independently of each other, on any platform, e.g. Java, .NET, native C++, Python, etc., and then used directly in target web applications. An overall architecture of the IMCOP system is depicted in Fig. 1.

3.1 Metadata enhancement services

There are many different kinds of Metadata Enhancement Services (MES) in the IMCOP system. The majority focus on content discovery and metadata enhancement. MES services of this kind are dedicated to performing selected operations from the scope of text and signal (image and video) processing.
Text processing operations mainly include detection and localization of areas where text features are present; they also conduct semantically-organized and dictionary-driven text recognition. The list of image, frame and video processing applications is as follows:

• image transforms for detecting, extracting and calculating descriptors for various types of local features, e.g. SIFT, SURF [2], MSER, Piecewise-linear [1], CEDD [7], CLD, EHD and SCD [5], FCHT [8],
• algorithms for detecting and recognizing (e.g. using local feature descriptors listed above) different kinds of objects, content and scenes, including faces, bodies, nudity, dress color, sky, images with the Bokeh effect, landscapes or pictures with man-made structures (buildings, monuments, etc.), logos and visual watermarks, etc.,
• procedures for estimating similarities between images or their selected regions of interest [9] and evaluating selected image and video quality metrics, including noise, blur, blockiness, slicing, etc. (acquired from [6, 25]),
• compression of images using selected compression schemes [28],
• algorithms designated to analyze and classify faces according to various traits, e.g. profile, presence of red eye, smile and facial hair (to identify unshaven faces), etc.,
• text, speech and face recognition processes applied mainly in order to index film footage and video sequences.

Fig. 1 The overall IMCOP platform architecture

Descriptors, labels and other values returned by the above applications during data analysis enrich the data. They stand for descriptive metadata added to other descriptive information about data, e.g. keywords and URIs (in fact, a common scenario is to gather only the URIs of the processed data instead of the data itself), stored in data representation objects (CMO).
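The enrichment step described above can be pictured with a small data structure. The sketch below is only an illustration under stated assumptions: the class `MediaRecord`, its fields, the service name `face-analysis` and the example URIs are hypothetical and are not part of IMCOP or the MESCore library. The real CMO is an XML representation derived from MPEG-7 (Section 3.3); a Python dataclass merely stands in for it here, and the simple term matching stands in for the connection search discussed later in this section.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MediaRecord:
    """Hypothetical stand-in for a CMO (the real CMO is an XML document)."""
    uri: str                                             # the URI is kept, not the data itself
    keywords: List[str] = field(default_factory=list)    # affixed during aggregation
    labels: Dict[str, List[str]] = field(default_factory=dict)  # service name -> labels
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    connected: List[str] = field(default_factory=list)   # ids of related records

    def enrich(self, service: str, new_labels: List[str]) -> None:
        """Store the labels returned by one analysis service."""
        self.labels.setdefault(service, []).extend(new_labels)

    def terms(self) -> Set[str]:
        """All keywords and labels attached to this record."""
        found = set(self.keywords)
        for values in self.labels.values():
            found.update(values)
        return found


def link_if_related(a: MediaRecord, b: MediaRecord) -> bool:
    """Connect two records in both directions when they share a keyword or label."""
    if not a.terms() & b.terms():
        return False
    if b.object_id not in a.connected:
        a.connected.append(b.object_id)
    if a.object_id not in b.connected:
        b.connected.append(a.object_id)
    return True


photo = MediaRecord(uri="https://example.org/photo.jpg", keywords=["actor"])
photo.enrich("face-analysis", ["face", "smile"])
clip = MediaRecord(uri="https://example.org/clip.mp4", keywords=["interview"])
clip.enrich("face-analysis", ["face"])
print(link_if_related(photo, clip))  # True: both records carry the label "face"
```

Keeping labels grouped by the service that produced them makes it possible to see which analysis step contributed which piece of metadata.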
For clarification, it should be noted that while the metadata described above is added automatically by MES services during data analysis, keywords and URIs are affixed by DAS services during data aggregation. However, aside from automatic data aggregation provided by DAS services, there are also other methods of entering the data into the IMCOP system. For example, data can be entered (individually or in groups) using the GUI of the IMCOP system. The GUI and selected MES services of the IMCOP system can be accessed using the following link: https://imcop.pl/. An example view of the IMCOP GUI while sample data is being entered and the results (labels) provided by selected IMCOP services after it has been processed are depicted in Figs. 2 and 3, respectively.

Other categories of highly specialized MES services are also incorporated in the IMCOP system. Some, known as Management Services (MS), control different activities of the other services and manage data-interchange processes inside the system, including the Data Repository (DR). Watermark Retrieval and Embedding Services (WRES) are activated in order to mark the processed data with hidden messages, known as IMCOP signatures, and to protect the data against forbidden use, manipulation and sharing with unauthorized end-users.

Fig. 2 The IMCOP GUI and adding a sample picture (https://pl.wikipedia.org/wiki/Tilda_Swinton#/media/File:Tilda_Swinton_crop.jpg) to the list of processed objects

Connection Services (CS) stand for the final but perhaps most important category of IMCOP services. The aim is to identify relationships (connections) between the processed data.
In the case of images, for instance, connections are identified in two ways:

• by matching keywords, URIs and labels given by MES services (the Solr search engine [13] is used at this stage),
• by analyzing numerical descriptors of selected image features to find the list of stored objects which are similar to the processed image.

Fig. 3 Labels given by selected services to the sample image from Fig. 2

From the end-user perspective, the IMCOP system can also be seen as a provider of services in the cloud. However, unlike in standard cloud computing models, IMCOP does not run client applications. Instead, end-user requests activate IMCOP services to prepare recommendations in terms of the desired multimedia content. Other details of IMCOP’s components and their functionalities can be found in preceding articles, e.g. in [3].

3.2 Data aggregation services

There are, in general, different data sources to which particular DAS services can be addressed. Their selection has to meet end-user requirements in terms of multimedia forms and data content relevance. The DEEP-like end-users, for example, need to aggregate and process textual information, images and video sequences related to celebrities, movies and actors. Thus, sources selected for data aggregation in this case should include, for example, selected multimedia data hosting websites (e.g. Getty Images), community-curated knowledge bases and encyclopedias (e.g. Wikipedia), news providers (e.g. BBC), social networking services (e.g. Twitter), etc. The IMCOP platform currently incorporates DAS services for all the above data sources, except Getty Images which operates as a commercial photo agency. Instead, the list of IMCOP’s DAS services also includes Flickr, Foursquare (https://foursquare.com/), Allocine (http://www.allocine.fr/) and the New York Times.
As DAS services have to be developed with regard to APIs, which differ from data source to data source, there is no single common model for implementing them. They also have to be implemented and configured separately because of the data source authorization requirements, data-interchange protocols and formats (e.g. XML-REST, JSON, PHP), license conditions, etc. At the end of the data aggregation process the CMO objects, which refer to data representation objects, are instantiated. In IMCOP terminology, a DAS service creates a separate and self-contained CMO object for every single aggregated data item (image, text object, etc.), which is known as a Multimedia Object (MO).

3.3 Complex multimedia objects

As stated in Section 1, CMO objects are dedicated to representing multimedia data in the IMCOP system. CMO objects are content type independent, which means that all data forms processed in the IMCOP system have the same flexible and general XML representation. After instantiation, CMO objects are exchanged between IMCOP MES services according to different schedules. MES activities planned in these schedules depend on end-user requirements and the type of processed data. The general scheme of CMO object processing is illustrated in Fig. 4.

The definition of CMO derives from the MPEG-7 multimedia content description standard. Therefore, descriptive metadata such as topic, name, age, date of birth (e.g. in the case of an actor), keywords, brief text, etc., are registered according to MPEG-7 Description Schemes [26]. SIFT, Shape Context, MSER, Piecewise-linear or any other feature descriptors, extracted by dedicated MES services, are stored according to the MPEG-7 Descriptor specification. The CMO definition extends the MPEG-7 standard in some respects. The most significant extension refers to connections between data and the way in which pointers to these connections are stored in CMO objects. As illustrated in Fig.
4, each CMO object present in the IMCOP system has its own Universally Unique Identifier (UUID). UUID identifiers make it possible to distinguish between particular CMO objects, regardless of the IMCOP distributed architecture and despite the lack of central coordination. After instantiation by DAS services, CMO objects are passed to MES services where they are processed. As a result, MES driven metadata is added to their properties. Next (or in the meantime – these processes can take place simultaneously) UUID identifiers of objects recognized as related are appended to the list which stands for the list of connected objects.

Fig. 4 General scheme of CMO object instantiation and processing

4 Performance analysis

The IMCOP system needs to be capable of serving a large number of clients. As each client may require many multimedia objects of different content forms, scalability was a major challenge facing IMCOP designers and developers. Although some of the algorithms incorporated by IMCOP services, e.g. those responsible for text and object detection and recognition, are computationally highly expensive, the IMCOP system’s ability to replicate services and to apply concurrent and parallel computing ensures that the IMCOP objectives can be put into practice.

A number of load tests were conducted to verify the above. During these tests, the IMCOP system was subjected to peaks in activity reflecting the likely demands of IMCOP users. A heavy concurrent load on the system was simulated using test plans executed by JMeter applications run from outside the IMCOP network, as illustrated in Fig. 5.

Fig.
5 Configuration for generating a heavy concurrent load on the IMCOP system

Test plans executed on slave JMeter instances were also diversified as they implemented a range of scenarios involving various types and numbers of IMCOP services. This imitated the usual system load during DEEP magazine preparation.

Selected results of such tests, related to the running time of the processes, are presented in Table 1. The tests were performed in accordance with four different test plans (TP1–TP4) which were executed iteratively for the growing number of concurrent user requests (N). However, for clarity of presentation, changes in the average duration of IMCOP responses (Δt) to executed test plans per single user request are shown instead of the directly measured time periods (t). Changes in average durations were calculated as follows:

$$\Delta t = \frac{t_i - t_{i-1}}{N_i - N_{i-1}} \quad \text{for } i = 2, 3, \ldots, 8 \qquad (1)$$

The last column of Table 1 shows the averages of the Δt values obtained for each iteration. A plot of a trend line (of an exponential regression type) showing the relationship between the averages Δt and the growing number of concurrent user requests N is depicted in Fig. 6.

It is clear that the greater the number of concurrent user requests N, the smaller the changes in average duration of IMCOP responses. This is mainly due to concurrent computing and parallel processing utilized in the IMCOP platform. The majority of IMCOP tasks are performed concurrently by particular IMCOP services. For example, image analysis (results shown in Fig. 3) is performed simultaneously by all the MES services. In turn, MES services evaluating selected image quality metrics, detecting nudity and recognizing text (if present in an image), etc., were implemented using parallel processing.
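As a worked illustration of Eq. (1) and of the row averages in the last column of Table 1, the calculation can be sketched as follows. This is a minimal sketch, not the authors' evaluation code: the function names are hypothetical, and the durations `t` are made-up numbers chosen only to show the arithmetic (the paper reports Δt, not the measured t). The N values and the four Δt values used for the average are taken from Table 1.

```python
from typing import List, Sequence


def delta_t(times: Sequence[float], requests: Sequence[int]) -> List[float]:
    """Eq. (1): (t_i - t_{i-1}) / (N_i - N_{i-1}) for i = 2, ..., len(times)."""
    if len(times) != len(requests):
        raise ValueError("times and requests must have the same length")
    changes = []
    for i in range(1, len(times)):
        dn = requests[i] - requests[i - 1]
        if dn <= 0:
            raise ValueError("N must be strictly increasing")
        changes.append((times[i] - times[i - 1]) / dn)
    return changes


def row_average(values: Sequence[float]) -> float:
    """Average of the per-test-plan changes in one row of Table 1."""
    return sum(values) / len(values)


# N values of the first four iterations in Table 1; durations are illustrative only.
N = [1, 2, 15, 30]
t = [40.0, 50.0, 115.0, 160.0]
print(delta_t(t, N))  # [10.0, 5.0, 3.0]

# Row i = 3 of Table 1 (TP1..TP4): the average is approximately 25.06 s.
print(row_average([5.46, 7.92, 10.31, 76.54]))
```

Dividing by the increase in N expresses the extra response time per additional concurrent request, which is why the values can shrink even while the total response time keeps growing.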
There is an additional reason why the changes in average duration of IMCOP responses are smaller when the number of concurrent user requests continues to grow. This is because of load balancing which improves the distribution of processes carried out in the IMCOP system across multiple replications of MES services. The feature of the IMCOP system replicating particular services when an overload occurs and the resulting significant improvement in system performance are illustrated in Fig. 7. The plot depicted in Fig. 7 shows the relationship between the averages Δt and the growing number of concurrent user requests N, as discussed above, although related to video indexing in the case of the test plan TP5.

Table 1 Changes in average durations of IMCOP responses to executed test plans (TP1–TP4) per single user request

i   N     TP1 Δt [s]   TP2 Δt [s]   TP3 Δt [s]   TP4 Δt [s]   Average Δt [s]
1   1     44.00        110.00       80.00        346.00       145.00
2   2     −4.00        19.00        28.00        81.00        31.00
3   15    5.46         7.92         10.31        76.54        25.06
4   30    20.00        7.20         6.93         4.60         9.68
5   70    7.73         3.90         3.75         25.23        10.15
6   100   6.36         7.00         10.93        0.70         6.25
7   150   4.94         10.02        7.36         12.00        8.58
8   200   2.42         1.64         9.56         33.60        11.81

Fig. 6 Trend line of the averages of changes Δt versus the number of concurrent user requests N

Three of the MES services of the IMCOP system are dedicated to automatic content-based video indexing tasks. Described briefly in Section 3.1, they index audio-video sequences and film footage with regard to:

• speechrv – speech transcripts obtained using speech recognition techniques,
• textdrv – text transcripts obtained using text detection methods and recognized using optical character recognition,
• facerv – actors distinguished using face detection and classification methods.

Algorithms used by these services are more complex and thus more time consuming than those used in test plans TP1–TP4.
To protect the IMCOP system against overload, which may occur when complex processes consuming vast amounts of computational resources are executed, an automatic mechanism of service replication was built into the system. This mechanism is able to multiply instances of particular services and run them in the IMCOP cloud and on third-party machines. Fig. 7 shows how the averages of changes Δt obtained under TP5 test conditions vary with the growing number of service instances. It is clear that system performance increases significantly when the number of instances, instantiated as a result of the replication mechanism, increases to four per service.

Fig. 7 Averages of changes Δt versus the number of concurrent user requests N in test plan TP5

5 Summary and conclusions

The IMCOP platform is a service-oriented architecture with a vast number of specialized web services. Distinct functions of IMCOP services which aggregate, analyze and enrich the processed data mean the IMCOP platform is flexible and able to meet customer needs concerning different subjects of demanded content and ways of presenting content to end-users. IMCOP’s openness to third-party services and scalability provided by SOA-driven architecture (using mechanisms of service replication and concurrent and parallel computing) means the system capabilities are unrestricted. The ability of the IMCOP platform to process different multimedia formats (text, still images, audio-video sequences, footage) ensures diversity of information sources and gives the foundation for rich presentation layers of IMCOP end-apps. As such, the IMCOP system outperforms other content discovery platforms in terms of universality and versatility.

The IMCOP system shares certain features with other content discovery platforms, such as searching for connections (e.g. in the Kannuu system), related content function (e.g. in the COMPASS platform) and the Solr engine (e.g.
in Outbrain). However, in contrast to these and other platforms which have limited functionality, the IMCOP platform addresses a range of goals and serves different categories of customers. For example, the content discovery and delivery engine of the IMCOP platform can be used to produce DEEP-like magazines whose subject matter can extend beyond actors, movies and celebrities. Such automatically generated magazines can cover subjects such as cultural events in a given city. Instead of using web-based portals, users can use a DEEP-like mobile app powered by the IMCOP platform. This enables them to find the latest theatre shows and learn about the shows, directors, actors and so on. In this instance, the city is the IMCOP customer while users are the end-user of the app. The concept of Complex Multimedia Objects (CMO) is another significant difference between the IMCOP system and other content discovery platforms. CMO objects with their Universally Unique Identifiers (UUIDs) which extend the MPEG-7 standard to hold descrip- tive and descriptor metadata and connection information allow the IMCOP system to exchange data with other systems. To our best knowledge, content discovery platforms described in Section 2 do not offer this capability. There are certain drawbacks of the current implementation of the IMCOP system which need to be eliminated. Our current efforts aim to improve the accuracy of particular MES services. In addition, other types of MES and DAS services are required to extend and diversify IMCOP capabilities. However, the openness of the IMCOP system means we hope to incorporate new services in collaboration with partners. Acknowledgements This work was supported by the Polish National Centre for Research and Development (NCBR), as a part of the EUREKA Projects no. E! II/PL-IL/10/02A/2012 and E!II/PL-IL/10/03A/2012. 
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Baran R, Wiraszka D, Dziech W (2000) Scalar quantization in the PWL transform spectrum domain. In: Proc. Intern. Conf. on Mathematical Methods in Electromagnetic Theory, pp 218–221. doi:10.1109/MMET.2000.888560
2. Baran R, Rusc T, Rychlik M (2014) A smart camera for traffic surveillance. In: Dziech A, Czyżewski A (eds) MCSS 2014. CCIS, vol 429. Springer, Heidelberg, pp 1–15. doi:10.1007/978-3-319-07569-3_1
3. Baran R, Zeja A, Slusarczyk P (2015) An overview of the IMCOP system architecture with selected intelligent utilities emphasized. In: Multimedia Communications, Services and Security. Communications in Computer and Information Science, vol 566. Springer, Heidelberg, pp 3–17. doi:10.1007/978-3-319-26404-2_1
4. Blanco-Fernandez Y, Pazos-Arias JJ, Gil-Solla A, Ramos-Cabrer M, Lopez-Nores M (2008) Providing entertainment by content-based filtering and semantic reasoning in intelligent recommender systems. IEEE Trans Consum Electron 54(2):727–735
5. Bleschke M, Madonski R, Rudnicki R (2009) Image retrieval system based on combined MPEG-7 texture and colour descriptors. In: Proc. of the 16th Int. Conf. on Mixed Design of Integrated Circuits & Systems (MIXDES '09), Lodz, pp 635–639
6. Cerqueira E, Janowski L, Leszczuk M, Papir Z, Romaniak P (2009) Video artifacts assessment for live mobile streaming applications. In: Mauthe A, Zeadally S, Cerqueira E, Curado M (eds) FMN 2009. LNCS, vol 5630. Springer, Heidelberg, pp 242–247
7. Chatzichristofis SA, Boutalis YS (2008) CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. In: Computer Vision Systems, vol 5008. Springer, Heidelberg, pp 312–322. doi:10.1007/978-3-540-79547-6_30
8. Chatzichristofis SA, Boutalis YS (2008) FCTH: fuzzy color and texture histogram - a low level feature for accurate image retrieval. In: Proc. of the Ninth Int. Workshop on Image Analysis for Multimedia Interactive Services, Klagenfurt, pp 191–196
9. Eshkol A, Grega M, Leszczuk M, Weintraub O (2014) Practical application of near duplicate detection for image database. In: Dziech A, Czyżewski A (eds) MCSS 2014. CCIS, vol 429. Springer, Heidelberg, pp 73–82. doi:10.1007/978-3-319-07569-3_6
10. Howlett RJ (2003) Internet-based intelligent information processing systems. Series on Innovative Intelligence, vol 3. World Scientific
11. http://deep.it/ (viewed July 25, 2016)
12. http://googlescholar.blogspot.com/2012/08/scholar-updates-making-new-connections.html (viewed July 25, 2016)
13. http://lucene.apache.org/solr/ (viewed July 25, 2016)
14. http://techblog.outbrain.com/2011/04/under-the-hood-of-our-algorithmic-engine-how-we-serve-content-recommendations/ (viewed July 25, 2016)
15. http://www.cnet.com/how-to/samsung-smart-tv-spying/ (viewed July 25, 2016)
16. http://www.intel.com/content/www/us/en/internet-of-things/infographics/guide-to-iot.html (viewed July 25, 2016)
17. http://www.live-counter.com/how-big-is-the-internet/ (viewed July 25, 2016)
18. http://www.nature.com/news/how-to-tame-the-flood-of-literature-1.15806 (viewed July 25, 2016)
19. http://www.tvbeurope.com/global-pay-tv-market-exceed-one-billion-2017/ (viewed July 25, 2016)
20. http://www.viaccess-orca.com/content-discovery-platform.html (viewed July 25, 2016)
21. http://www.viaccess-orca.com/resource-center/white-papers/462-going-deep-into-discovery.html (viewed July 25, 2016)
22. https://en.wikipedia.org/wiki/Content_discovery_platform (viewed July 25, 2016)
23. https://www.facebook.com/deepmagazines/ (viewed July 25, 2016)
24. Pazzani MJ, Billsus D (2007) Content-based recommendation systems. In: Brusilovsky P, Kobsa A, Nejdl W (eds) The adaptive web. Lecture Notes in Computer Science, vol 4321. Springer, Berlin, pp 325–341
25. Romaniak P, Janowski L, Leszczuk M, Papir Z (2012) Perceptual quality assessment for H.264/AVC compression. In: Proc. of Consumer Communications and Networking Conference (CCNC), pp 597–602. doi:10.1109/CCNC.2012.6181021
26. Salembier P, Smith JR (2001) MPEG-7 multimedia description schemes. IEEE Transactions on Circuits and Systems for Video Technology 11(6):748–759
27. Salter J, Antonoupoulos N (2006) CinemaScreen recommender agent: combining collaborative and content-based filtering. IEEE Intell Syst 21(1):35–41
28. Slusarczyk P, Baran R (2014) Piecewise-linear subband coding scheme for fast image decomposition. Multimedia Tools and Applications. Springer, US. doi:10.1007/s11042-014-2173-1
29. Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Advances in Artificial Intelligence

Remigiusz Baran was awarded the M.Sc. in Electrical Engineering from the Faculty of Electrical and Control Engineering, Kielce University of Technology in 1993, and the Ph.D. in Telecommunications from the Faculty of Electrical, Control, Electronic and Computer Engineering, AGH University of Science and Technology in Kraków in 2004. He is currently working as an Assistant Professor at the Kielce University of Technology. He is the author or co-author of over 50 publications in the field of digital signal processing, focusing on image compression and feature extraction. The main areas of his academic interest are feature- and appearance-based object detection and recognition techniques, and microprocessor technology and embedded systems.
Apart from his scientific activity, he is also an academic teacher. He has supervised approximately 60 M.Sc. students, at undergraduate as well as graduate levels. He also serves as a reviewer for international journals and conferences. Dr. Baran has participated in numerous international and national (Polish) research projects including INDECT, OASIS Archive, Calibrate, INWAS, INSIGMA and TAPAS. At present he is the Project Manager of the second joint Polish-Israeli R&D project IMCOP, "Intelligent Multimedia System for Web and IPTV Archiving. Digital Analysis and Documentation of Multimedia Content".

Andrzej Dziech, Ph.D. Hab., received his M.Sc. and Ph.D. in telecommunications from the Electro-technical Institute in Leningrad, Faculty of Automation and Computer Science, in 1970 and 1973, respectively. He received a postdoctoral degree in engineering (Science Tech) from the Poznan University of Technology, Faculty of Electrical Engineering, in 1978. He has been a full professor since 1986. His fields of interest are related to digital communication, image and data processing, data compression, information and coding theory, random signals, computer communications networks and signal processing. He has worked at a number of foreign universities; most recently, in 2001–2003, he was a visiting professor at the University of Wuppertal in Germany. He has co-authored 180 publications, including 5 books. He has supervised 18 Ph.D. students and approx. 100 M.Sc. students. He has received awards from the Ministry of Education of Poland four times for his research achievements. He serves as the co-ordinator of the FP7 project INDECT.

Andrzej Zeja was awarded the M.Sc. in Electrical Engineering from the Faculty of Electrical and Control Engineering, Kielce University of Technology in 1992. He is currently working as an Assistant Professor at the Kielce University of Technology. The main areas of his activities are programming and integrating distributed heterogeneous computer systems and hardware/software codesign of embedded systems. A. Zeja has participated in numerous international and national (Polish) research projects including Calibrate, INWAS, INSIGMA and IMCOP.

Multimedia Tools and Applications, Springer Journals

Publisher
Springer Journals
Copyright
Copyright © 2017 by The Author(s)
Subject
Computer Science; Multimedia Information Systems; Computer Communication Networks; Data Structures, Cryptology and Information Theory; Special Purpose and Application-Based Systems
ISSN
1380-7501
eISSN
1573-7721
D.O.I.
10.1007/s11042-017-5014-1

Abstract

Multimed Tools Appl (2018) 77:14077–14091
DOI 10.1007/s11042-017-5014-1

A capable multimedia content discovery platform based on visual content analysis and intelligent data enrichment

Remigiusz Baran (1), Andrzej Dziech (2), Andrzej Zeja (3)

Received: 27 September 2016 / Revised: 30 November 2016 / Accepted: 12 December 2016 / Published online: 28 July 2017
© The Author(s) 2017. This article is an open access publication

Abstract A new capable content discovery platform based on multimedia data enrichment is presented in this paper. The platform, known as the IMCOP system, refers to the concept of intelligent discovery and delivery of multimedia content. Relevant state-of-the-art solutions are described in detail in the background section. The overall architecture and the main components of the IMCOP system are presented next. An original concept of Complex Multimedia Objects, which extend the MPEG-7 standard to hold the processed data and bind it into content-related collections, is introduced. Selected results of tests illustrating how the IMCOP system performs in terms of responsiveness and stability under a particular workload are reported. Finally, IMCOP's advantages with reference to other content discovery platforms are discussed and summed up.

Keywords Content discovery and delivery · Recommender engines · Visual analysis · Data enrichment · Multimedia indexing · Complex multimedia objects

* Remigiusz Baran, r.baran@tu.kielce.pl
Andrzej Dziech, dziech@kt.agh.edu.pl
Andrzej Zeja, a.zeja@wstkt.pl

(1) Faculty of Electrical Engineering, Automatics and Computer Science, Kielce University of Technology, al. 1000-lecia P.P. 7, 25-314 Kielce, Poland
(2) AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
(3) University of Computer Engineering and Telecommunications, ul. Toporowskiego 98, 25-553 Kielce, Poland

1 Introduction

The volume of data stored in, downloaded from and shared via the Internet is increasing rapidly. As reported in [17], the storage capacity of the Internet "is doubling in size every two years". This is chiefly due to the growing number of devices connected to the Internet, including mobile phones, computers and tablets as well as other types of smart devices, "from minuscule chips to mammoth machines that use wireless technology to talk to each other (and to us)" [16]. This hard-to-imagine source of big data, mainly represented by images and video sequences (the most rapidly growing types of media), provides incredible but real opportunities for building heterogeneous intelligence systems [10]. The first issues which need to be addressed are content-oriented data analysis, enrichment, retrieval and recommendation. A new content discovery platform, known as the IMCOP system, which uses these and other technologies, is presented in this paper.

The IMCOP system is the result of an international collaboration within the framework of the second joint Polish-Israeli R&D project titled "Intelligent Multimedia System for Web and IPTV Archiving. Digital Analysis and Documentation of Multimedia Content". According to the initial assumptions, the capabilities of the IMCOP system should include multimedia data aggregation, analysis and processing. The implementation of these processes must be extremely flexible to fulfill the requirements of different IMCOP applications. Depending on customer needs, the IMCOP system should be able to perform different kinds of processing and content analysis of multimedia data aggregated from various Internet data sources. Content analysis should enrich the data (that is, extend the metadata list) to confirm the relationship between the data and the subject matter and to find its content-related connections with other data.
Thanks to this flexibility, the IMCOP system should be able to address customer demands regarding relevant content and ways of presenting it to their users. The system presented in this paper meets all these initial assumptions and expectations of the IMCOP project. For example, various types of aggregated data comprising text, still images, film footage and video sequences are considered. Mechanisms for extensive analysis of all types of aggregated data, including detection and extraction of various features and different classification approaches, were also used. A range of descriptive metadata is extracted in this way to enrich the aggregated data and provide the foundation for finding connections. In addition, a flexible and efficient representation of data, known as Complex Multimedia Objects (CMO), which maintains the metadata and the content-related connections, is proposed.

As well as these objectives, the IMCOP project has the following minor goals:

- to ensure that the IMCOP system is platform-independent and capable of incorporating external services to improve its efficiency and increase its intelligent facilities, for example by removing duplicate images from the database [9],
- to guarantee scalability despite the vast numbers of multimedia objects processed,
- to make absolutely certain that the system is legal in terms of copyright law and that the processed objects and their content are fully protected against copying, reproducing, modifying and other forms of authentication rights violation.

It was also agreed between the project partners that the IMCOP system should comply with the Data Enrichment and Engagement Platform (DEEP), which was largely developed by the Israeli partner as part of their project activities. According to [21], the DEEP platform is "a revolutionary new solution (which) resolves the complexities of content discovery, recommendation, usability and engagement all at once, and consumers already know how to use it".
All these qualities can be verified through the DEEP Magazines application, which is its final product [11]. The DEEP Magazines app shows how an advanced and professionally made end-user application of the IMCOP system could work and look. However, it should be stressed that the IMCOP platform's capabilities are significantly broader than those of the DEEP platform. The DEEP platform, in line with the DEEP Magazines app requirements, is designed to collect, analyze and select images and short descriptive information (e.g. news) related to "the hottest stories about celebs, actors, movies and TV shows" [23]. In contrast, the IMCOP system can be categorized as a comprehensive and versatile content discovery and delivery platform which is capable of addressing topics of any kind. The IMCOP platform also has other functionalities. It can be applied, for instance, as a multimedia indexing (labeling) application, such as Imagga (http://imagga.com/), or as a reverse search engine tool, such as TinEye (https://www.tineye.com/).

The remaining part of this paper is organized as follows. The next section presents background materials and a literature review within the framework of content discovery platforms. The overall architecture, with an insight into various categories of the IMCOP web services, and the concept of Complex Multimedia Objects are introduced in the subsections of Section 3. Section 4 reports on the IMCOP system performance. Final conclusions and potential future improvements are presented in Section 5.

2 Background and literature review

According to the Wikipedia definition [22], a content discovery platform "is an implemented software recommendation platform which uses recommender system tools". As defined, in turn, in [24], a content-based recommender system is a system "that recommend an item to a user based upon a description of the item and a profile of the user's interests".
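The definition quoted from [24] can be made concrete with a toy example. The sketch below is not taken from any of the platforms discussed in this paper: it assumes that items are described by weighted keywords, that the user profile uses the same keyword vocabulary, and that cosine similarity is an acceptable relevance score. All item names, keywords and weights are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(profile, items, top_n=2):
    """Rank items by similarity of their description to the user profile."""
    scored = [(cosine(profile, desc), name) for name, desc in items.items()]
    # Highest score first; ties are broken alphabetically by item name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_n] if score > 0]


# Hypothetical item descriptions (keyword -> weight).
ITEMS = {
    "theatre_review": {"theatre": 1.0, "actor": 1.0, "premiere": 1.0},
    "film_news": {"movie": 1.0, "actor": 1.0},
    "football_report": {"football": 1.0, "league": 1.0},
}

# Hypothetical profile of the user's interests.
PROFILE = {"theatre": 2.0, "actor": 1.0}

if __name__ == "__main__":
    print(recommend(PROFILE, ITEMS))
```

Only the item descriptions and one user's profile are used here; no information about other users is involved, which is what separates this approach from the collaborative filtering discussed next.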
In general, there are three main approaches to recommendation system design: collaborative filtering [29], content-based filtering [4] and hybrid, where the two former approaches are combined [27]. In fact, the definitions cover a range of solutions with different goals, domains and approaches using computer science techniques such as data mining, information retrieval and filtering, machine learning, artificial intelligence and so on.

Google Scholar, Pubnet, CiteSeer and Web of Science are the leading scientific and academic literature search and recommender engines. The platforms work in two key ways. In Google Scholar, papers related to the user's research interests (notified as Scholar Updates) are found through a statistical analysis according to "what your work is about, the citation graph between articles, the fact that interests can change over time, and the authors you work with and cite" [12]. The other way of finding relevant articles in Google Scholar is setting user alerts. Despite noticeable differences, both methods are content-based filtering approaches which use a record of the researcher's authored papers and citations. This approach is effective for researchers who have published many papers; however, it provides poor results for others, such as graduate students. Such drawbacks do not occur in other types of recommender engines related to academic content. One such tool is PubChase. "PubChase suggests articles from PubMed on the basis of a user's publishing record, but it also learns from the articles that the user has read and stored in his or her online library […] it adds another machine-learning technique: comparing this library with other people's collections, with the logic that people with common research interests might benefit from each others' preferences" [18].
Content discovery and personal recommendation have also become crucial to the global pay-TV industry due to the proliferation of choice offered by the television of today, including video-on-demand, personalized video recording, streaming services and web-based content. In addition, "the global pay-TV market is projected to grow from more than 900 million subscribers in 2014 to 1.21 billion by 2022" [19]. Such a major and interactive marketplace is an important challenge for operators, who are forced to look for new ways of satisfying customers and holding their attention for longer.

Platforms where "a blend of sophisticated content discovery algorithms, personalized recommendations are offered to drive additional purchases," as in COMPASS from Viaccess-Orca, are appropriate solutions within this context. As reported in [20], collaborative filtering, external rating and operator's promotions are the main algorithms driving the COMPASS engines. Additionally, an algorithm known as related content, where "similar item recommendations based on content metadata (actors, directors, years, countries, etc.) and keywords" are proposed, is also essential.

The Kannuu content discovery platform (http://www.kannuu.com/) is another example of a successful and innovative integration of various recommendation layers. In the Kannuu system, similar title recommendations are achieved using a proprietary metadata analysis which aims to find connections between titles. Similarly to the COMPASS engine, the metadata analysis performed in the Kannuu system comprises, for example, genre, cast, director, date and expanded keyword lists. A similar approach, based on collaborative filtering where profiling and behavioral data are used, is also applied as the social recommendations layer. This is the key difference between the Kannuu and COMPASS systems, as the social recommendations layer is integrated with social networking sites.
This means Kannuu users can provide recommendations to, and receive them from, people in their social media circles. In turn, ratings (in particular personal ratings) and streaming history are the only key algorithms in Netflix, one of the most successful providers of streaming media and video on demand in the world. Since personal ratings and past viewing habits are insufficient when used for a single user, Netflix combines the ratings of all its users with similar tastes. Personal ratings are a type of personal data collected by TV operators. However, personal data transferred to providers must be limited and reasonably selected to prevent eavesdropping such as that reported for Samsung [15]. Therefore, platforms where algorithms from various levels are combined, as in the COMPASS or Kannuu recommender engines, are more efficient (their recommendations are more relevant) and more secure.

Another market where content discovery plays an important role is online publishing, mainly of blogs, podcasts, websites, etc. Outbrain, Taboola and Google AdSense are the leading platforms here. They recommend content via online media, from the largest and most widely respected, such as "People" (in the case of Outbrain) and "USA Today" (in the case of Taboola), to those with limited audiences, such as "BuildEazy" (in the case of AdSense). The content discovery algorithms applied in these cases are complex and sophisticated. For example, according to Outbrain, its recommender engine is based on more than 50 algorithms which are run in parallel to determine a set of candidate recommendations. These algorithms are categorized into four main groups: contextual, behavioral, personal and popular [14]. Contextual algorithms aim to find relationships (identify connections) between recommended content and that found on a target website. The Solr search engine [13] is used at this stage. The aim of behavioral algorithms is to learn the statistical behaviors of users.
This can be achieved by simply marking the most visited or most rated documents on a site or by applying collaborative filtering methods. The latter approach lets users access other content "liked" by people. In turn, personal algorithms learn user properties and history. Personalization is applied in Outbrain using cookies. Recommendations of each type, given independently by behavioral, contextual and personal algorithms, are then processed (using machine learning techniques) to select the most relevant ones.

3 IMCOP platform architecture

As stated in Section 1, the IMCOP system is a capable content discovery platform. As such, it can use user metadata to discover customer-relevant content which can be delivered to websites, mobile devices, set-top boxes, etc. In this domain, IMCOP aspires in part to operate similarly to Outbrain and Taboola, the largest content discovery startups in Israel. IMCOP also uses specialized and sophisticated algorithms to select the most relevant content. Although IMCOP does not currently apply algorithms which could be categorized as behavioral and personal, it uses an assortment of intelligent automated multimedia content analysis algorithms from a range of computer vision techniques. As such, IMCOP goes beyond typical content discovery platforms and also serves as a data enrichment platform.

The IMCOP platform is a distributed system based on Service-Oriented Architecture (SOA). IMCOP services are RESTful web services, implemented according to the Representational State Transfer (REST) architectural style. As such, IMCOP services are self-contained applications with their own REST-based interface. In addition, they are fully independent of the operating systems and platforms on which they are implemented and run. This means they are also scalable, fast and modifiable.
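To illustrate what a self-contained service with its own REST-based interface can look like, the following sketch uses only the Python standard library. It is an illustration of the architectural style, not of the IMCOP implementation: the `/labels` path, the `uri` query parameter, the JSON fields and the extension-based `label_resource` function are all hypothetical stand-ins.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, quote, urlparse


def label_resource(uri):
    """Hypothetical stand-in for content analysis: label a URI by extension."""
    is_image = uri.lower().endswith((".jpg", ".jpeg", ".png"))
    return {"uri": uri, "labels": ["image"] if is_image else ["unknown"]}


class LabelHandler(BaseHTTPRequestHandler):
    """Handles GET /labels?uri=... and answers with a JSON document."""

    def do_GET(self):
        parsed = urlparse(self.path)
        uris = parse_qs(parsed.query).get("uri")
        if parsed.path != "/labels" or not uris:
            self.send_error(404, "expected /labels?uri=...")
            return
        body = json.dumps(label_resource(uris[0])).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def query_once(uri):
    """Start the service on a free local port, send one request, stop it."""
    server = HTTPServer(("127.0.0.1", 0), LabelHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = (f"http://127.0.0.1:{server.server_port}"
               f"/labels?uri={quote(uri, safe='')}")
        # Bypass any proxy settings so the request stays on the loopback.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url, timeout=5) as response:
            return json.loads(response.read().decode("utf-8"))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(query_once("http://example.org/portrait.jpg"))
```

Because the client depends only on the HTTP interface, the same request could be served by an implementation written in another language or running on another operating system, which is the property the paragraph above emphasizes.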
All IMCOP services have been developed in Java according to the original MESCore library, which provides an API, programming specifications and code examples within an SDK for developers. As the MESCore library is open, IMCOP services can be freely added, edited or improved by third-party developers. This also implies that IMCOP's capabilities can be accessed and extended by external companies and institutions interested in a partnership with the IMCOP team. The main categories of IMCOP's services are as follows:

- Metadata Enhancement Services (MES): specialized services, mainly in multimedia data analysis and enrichment (there are also other types of MES-driven services, such as management and connection),
- Data Aggregation Services (DAS): mainly used for web crawlers which extract and collect data from the web, as well as for exchanging data with the IMCOP database, known as the Data Repository (DR).

Regardless of the category, the IMCOP services can be run in heterogeneous environments, e.g. MS Windows and Linux (32-bit or 64-bit), using virtual machine applications. In other words, there is no need to unify the IMCOP software components. This makes them easy to implement and integrate with the rest of the system. According to the MESCore library, IMCOP services are in fact wrapper functions which call other specialized applications or processes. These applications or processes can be written and compiled, independently of each other, on any platform, e.g. Java, .NET, native C++, Python, etc., and then used directly in target web applications. The overall architecture of the IMCOP system is depicted in Fig. 1.

3.1 Metadata enhancement services

There are many different kinds of Metadata Enhancement Services (MES) in the IMCOP system. The majority focus on content discovery and metadata enhancement. MES services of this kind are dedicated to performing selected operations from the scope of text and signal (image and video) processing.
Text processing operations mainly include detection and localization of areas where text features are present; they also conduct semantically organized and dictionary-driven text recognition.

Fig. 1 The overall IMCOP platform architecture

The list of image, frame and video processing applications is as follows:

- image transforms for detecting, extracting and calculating descriptors for various types of local features, e.g. SIFT, SURF [2], MSER, Piecewise-linear [1], CEDD [7], CLD, EHD and SCD [5], FCTH [8],
- algorithms for detecting and recognizing (e.g. using the local feature descriptors listed above) different kinds of objects, content and scenes, including faces, bodies, nudity, dress color, sky, images with the Bokeh effect, landscapes or pictures with man-made structures (buildings, monuments, etc.), logos and visual watermarks, etc.,
- procedures for estimating similarities between images or their selected regions of interest [9] and evaluating selected image and video quality metrics, including noise, blur, blockiness, slicing, etc. (acquired from [6, 25]),
- compression of images using selected compression schemes [28],
- algorithms designed to analyze and classify faces according to various traits, e.g. profile, presence of red eye, smile and facial hair (to identify unshaven faces), etc.,
- text, speech and face recognition processes applied mainly in order to index film footage and video sequences.

Descriptors, labels and other values returned by the above applications during data analysis enrich the data. They constitute descriptive metadata which is added to other descriptive information about the data, such as keywords and URIs (in fact, a common scenario is to gather only the URIs of the processed data instead of the data itself), stored in data representation objects (CMO).
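The enrichment step described above can be pictured with a small data structure. The sketch below is an assumption-laden simplification: in IMCOP the representation objects (CMO) are XML documents that extend MPEG-7 (Section 3.3), whereas this example uses a plain Python dataclass, and the class name `MultimediaRecord` and all field names are hypothetical. Only the use of a UUID per object and a list of connected objects follows the description in Section 3.3.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class MultimediaRecord:
    """Hypothetical, simplified stand-in for a data representation object."""
    uri: str                                           # location of the data
    keywords: set = field(default_factory=set)         # set during aggregation
    labels: set = field(default_factory=set)           # added by analysis
    descriptors: dict = field(default_factory=dict)    # name -> numeric vector
    connections: list = field(default_factory=list)    # IDs of related objects
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def enrich(self, labels=(), descriptors=None):
        """Add analysis results (labels, feature descriptors) as metadata."""
        self.labels.update(labels)
        self.descriptors.update(descriptors or {})

    def connect(self, other):
        """Record another object as related, once, and never the object itself."""
        if (other.object_id != self.object_id
                and other.object_id not in self.connections):
            self.connections.append(other.object_id)


# Example: an aggregated picture is enriched and linked to a related object.
portrait = MultimediaRecord(uri="https://example.org/portrait.jpg",
                            keywords={"actor"})
poster = MultimediaRecord(uri="https://example.org/poster.jpg",
                          keywords={"actor", "movie"})

portrait.enrich(labels={"face"},
                descriptors={"color_histogram": [0.2, 0.8]})
portrait.connect(poster)
portrait.connect(poster)  # a repeated connection is ignored

if __name__ == "__main__":
    print(portrait)
```

Storing only the URI, rather than the image itself, mirrors the common scenario mentioned in the text; the identifiers in `connections` are enough to find the related objects later.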
For clarification, it should be noted that, whereas metadata is added automatically during the data analysis provided by MES services, keywords and URIs are affixed by DAS services during data aggregation. However, aside from the automatic data aggregation provided by DAS services, there are also other methods of entering data into the IMCOP system. For example, data can be entered (individually or in groups) using the GUI of the IMCOP system. The GUI and selected MES services of the IMCOP system can be accessed using the following link: https://imcop.pl/. An example view of the IMCOP GUI while sample data is being entered and the results (labels) provided by selected IMCOP services after it has been processed are depicted in Figs. 2 and 3, respectively.

Fig. 2 The IMCOP GUI and adding a sample picture to the list of processed objects

Other categories of highly specialized MES services are also incorporated in the IMCOP system. Some, known as Management Services (MS), control the activities of the other services and manage data-interchange processes inside the system, including those involving the Data Repository (DR). Watermark Retrieval and Embedding Services (WRES) are activated in order to mark the processed data with hidden messages, known as IMCOP signatures, and to protect the data against forbidden use, manipulation and sharing with unauthorized end-users. Connection Services (CS) constitute the final, but perhaps most important, category of IMCOP services. Their aim is to identify relationships (connections) between the processed data.
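As a rough illustration of what identifying connections can involve, the sketch below links objects either by overlapping textual terms or by closeness of a numerical descriptor; the paper describes these two strategies for images in the list that follows. The choice of Jaccard overlap, Euclidean distance and both thresholds is an assumption made for this example, and the sketch does not reproduce the Solr-based matching used in IMCOP.

```python
import math


def keyword_overlap(a, b):
    """Jaccard similarity between two sets of textual terms."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def descriptor_distance(x, y):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def find_connections(query, stored, min_overlap=0.3, max_distance=0.5):
    """Return IDs of stored objects related to the query by either criterion."""
    connected = []
    for object_id, candidate in stored.items():
        by_terms = (
            keyword_overlap(query["terms"], candidate["terms"]) >= min_overlap
        )
        by_descriptor = (
            descriptor_distance(query["descriptor"], candidate["descriptor"])
            <= max_distance
        )
        if by_terms or by_descriptor:
            connected.append(object_id)
    return sorted(connected)


# Hypothetical objects: "terms" merges keywords and labels.
QUERY = {"terms": {"actor", "portrait", "face"}, "descriptor": [0.2, 0.8]}
STORED = {
    "a": {"terms": {"actor", "face", "premiere"}, "descriptor": [0.9, 0.1]},
    "b": {"terms": {"landscape"}, "descriptor": [0.25, 0.75]},
    "c": {"terms": {"football"}, "descriptor": [1.0, 0.0]},
}

if __name__ == "__main__":
    print(find_connections(QUERY, STORED))
```

In this toy data, object "a" is connected through shared terms, object "b" through a nearby descriptor, and object "c" through neither.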
In the case of images, for instance, connections are identified in two ways:

- by matching keywords, URIs and labels given by MES services (the Solr search engine [13] is used at this stage),
- by analyzing numerical descriptors of selected image features to find the list of stored objects which are similar to the processed image.

Fig. 3 Labels given by selected services to the sample image from Fig. 2

From the end-user perspective, the IMCOP system can also be seen as a provider of services in the cloud. However, unlike in standard cloud computing models, IMCOP does not run client applications. Instead, end-user requests activate IMCOP services to prepare recommendations in terms of the desired multimedia content. Other details of IMCOP's components and their functionalities can be found in preceding articles, e.g. in [3].

3.2 Data aggregation services

There are, in general, different data sources to which particular DAS services can be addressed. Their selection has to meet end-user requirements in terms of multimedia forms and data content relevance. DEEP-like end-users, for example, need to aggregate and process textual information, images and video sequences related to celebrities, movies and actors. Thus, the sources selected for data aggregation in this case should include, for example, selected multimedia data hosting websites (e.g. Getty Images), community-curated knowledge bases and encyclopedias (e.g. Wikipedia), news providers (e.g. BBC), social networking services (e.g. Twitter), etc. The IMCOP platform currently incorporates DAS services for all the above data sources, except Getty Images, which operates as a commercial photo agency. Instead, the list of IMCOP's DAS services also includes Flickr, Foursquare (https://foursquare.com/), Allocine (http://www.allocine.fr/) and the New York Times.
As DAS services have to be developed with regard to APIs, which differ from data source to data source, there is no single common model for implementing them. They also have to be implemented and configured separately because of differing data source authorization requirements, data-interchange protocols and formats (e.g. XML-REST, JSON, PHP), license conditions, etc. At the end of the data aggregation process, CMO objects, which serve as data representation objects, are instantiated. In IMCOP terminology, each single aggregated data item (image, text object, etc.) is known as a Multimedia Object (MO), and the DAS service creates a separate, self-contained CMO object for every MO.

3.3 Complex multimedia objects

As stated in Section 1, CMO objects are dedicated to representing multimedia data in the IMCOP system. CMO objects are content type independent, which means that all data forms processed in the IMCOP system have the same flexible and general XML representation. After instantiation, CMO objects are exchanged between IMCOP MES services according to different schedules. The MES activities planned in these schedules depend on end-user requirements and the type of processed data. The general scheme of CMO object processing is illustrated in Fig. 4.

The definition of CMO derives from the MPEG-7 multimedia content description standard. Therefore, descriptive metadata such as topic, name, age, date of birth (e.g. in the case of an actor), keywords, brief text, etc., are registered according to MPEG-7 Description Schemes [26]. SIFT, Shape Context, MSER, Piecewise-linear or any other feature descriptors, extracted by dedicated MES services, are stored according to the MPEG-7 Descriptor specification. The CMO definition extends the MPEG-7 standard in some respects. The most significant extension refers to connections between data and the way in which pointers to these connections are stored in CMO objects. As illustrated in Fig.
4, each CMO object present in the IMCOP system has its own Universally Unique Identifier (UUID). UUIDs make it possible to distinguish between particular CMO objects, regardless of IMCOP’s distributed architecture and despite the lack of central coordination. After instantiation by DAS services, CMO objects are passed to MES services where they are processed. As a result, MES-driven metadata is added to their properties. Next (or in the meantime, as these processes can take place simultaneously), UUIDs of objects recognized as related are appended to the list of connected objects.

Fig. 4 General scheme of CMO object instantiation and processing

4 Performance analysis

The IMCOP system needs to be capable of serving a large number of clients. As each client may require many multimedia objects of different content forms, scalability was a major challenge facing IMCOP designers and developers. Although some of the algorithms incorporated by IMCOP services, e.g. those responsible for text and object detection and recognition, are computationally highly expensive, the IMCOP system’s ability to replicate services and to apply concurrent and parallel computing ensures that the IMCOP objectives can be put into practice.

A number of load tests were conducted to verify the above. During these tests, the IMCOP system was subjected to peaks in activity reflecting the likely demands of IMCOP users. A heavy concurrent load on the system was simulated using test plans executed by JMeter applications run from outside the IMCOP network, as illustrated in Fig. 5.

Fig.
5 Configuration for generating a heavy concurrent load on the IMCOP system

Test plans executed on slave JMeter instances were also diversified, as they implemented a range of scenarios involving various types and numbers of IMCOP services. This imitated the usual system load during DEEP magazine preparation.

Selected results of such tests, related to the running time of the processes, are presented in Table 1. The tests were performed in accordance with four different test plans (TP1÷TP4), which were executed iteratively for a growing number of concurrent user requests (N). For clarity of presentation, however, changes in the average duration of IMCOP responses to the executed test plans per single user request (Δt) are shown instead of the directly measured durations (t). Changes in average durations were calculated as follows:

$$\Delta t = \frac{t_i - t_{i-1}}{N_i - N_{i-1}} \quad \text{for } i = 2, 3, \ldots, 8 \qquad (1)$$

The last column of Table 1 shows the averages of the Δt values obtained for each iteration. A plot of a trend line (of an exponential regression type) showing the relationship between the averages Δt and the growing number of concurrent user requests N is depicted in Fig. 6. It is clear that the greater the number of concurrent user requests N, the smaller the changes in average duration of IMCOP responses. This is mainly due to the concurrent computing and parallel processing utilized in the IMCOP platform. The majority of IMCOP tasks are performed concurrently by particular IMCOP services. For example, image analysis (results shown in Fig. 3) is performed simultaneously by all the MES services. In turn, MES services evaluating selected image quality metrics, detecting nudity and recognizing text (if present in an image), etc., were implemented using parallel processing.
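Equation (1) and the exponential trend line can be reproduced with a few lines of code. The sketch below is illustrative only: the function names `delta_t` and `fit_exponential_trend` are hypothetical, the durations passed to `delta_t` are made-up values (the directly measured durations t are not reported here), and the least-squares fit on log-transformed values is one common way of obtaining an exponential regression, not necessarily the procedure used to produce Fig. 6. The values of N and of the averaged Δt are those listed in Table 1.

```python
import math


def delta_t(t, n):
    """Equation (1): change in average response duration per single user request.

    t[i] is the average duration measured for n[i] concurrent requests;
    one value is returned for each consecutive pair of iterations.
    """
    return [(t[i] - t[i - 1]) / (n[i] - n[i - 1]) for i in range(1, len(t))]


def fit_exponential_trend(n, y):
    """Least-squares fit of y ~ a * exp(b * n) on log-transformed y (requires y > 0)."""
    logs = [math.log(v) for v in y]
    mean_n = sum(n) / len(n)
    mean_l = sum(logs) / len(logs)
    sxy = sum((ni - mean_n) * (li - mean_l) for ni, li in zip(n, logs))
    sxx = sum((ni - mean_n) ** 2 for ni in n)
    b = sxy / sxx
    a = math.exp(mean_l - b * mean_n)
    return a, b


# Hypothetical durations in seconds, for illustration only (not measured data).
print(delta_t([10.0, 14.0, 40.0], [1, 2, 15]))

# Numbers of concurrent user requests N and averaged delta-t values from Table 1.
N = [1, 2, 15, 30, 70, 100, 150, 200]
AVG_DT = [145.00, 31.00, 25.06, 9.68, 10.15, 6.25, 8.58, 11.81]
a, b = fit_exponential_trend(N, AVG_DT)
print(f"trend: {a:.2f} * exp({b:.4f} * N)")
```

A negative fitted exponent b corresponds to the decreasing trend described above. The log-transform requires positive inputs, so this fit applies to the averaged column but not to individual test-plan columns that contain negative Δt values.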
There is an additional reason why the changes in average duration of IMCOP responses become smaller as the number of concurrent user requests continues to grow: load balancing, which improves the distribution of processes carried out in the IMCOP system across multiple replications of MES services. The ability of the IMCOP system to replicate particular services when an overload occurs, and the resulting significant improvement in system performance, are illustrated in Fig. 7. The plot depicted in Fig. 7 shows the relationship between the averages Δt and the growing number of concurrent user requests N, as discussed above, but for video indexing under test plan TP5.

Table 1 Changes in average durations of IMCOP responses to executed test plans (TP1÷TP4) per single user request

i   N     TP1 Δt [s]   TP2 Δt [s]   TP3 Δt [s]   TP4 Δt [s]   Average Δt [s]
1   1     44.00        110.00       80.00        346.00       145.00
2   2     −4.00        19.00        28.00        81.00        31.00
3   15    5.46         7.92         10.31        76.54        25.06
4   30    20.00        7.20         6.93         4.60         9.68
5   70    7.73         3.90         3.75         25.23        10.15
6   100   6.36         7.00         10.93        0.70         6.25
7   150   4.94         10.02        7.36         12.00        8.58
8   200   2.42         1.64         9.56         33.60        11.81

Fig. 6 Trend line of the averages of changes Δt versus the number of concurrent user requests N

Three of the MES services of the IMCOP system are dedicated to automatic content-based video indexing tasks. Described briefly in Section 3.1, they index audio-video sequences and film footage with regard to:

• speechrv – speech transcripts obtained using speech recognition techniques,
• textdrv – text transcripts obtained using text detection methods and recognized using optical character recognition,
• facerv – actors distinguished using face detection and classification methods.

Algorithms used by these services are more complex and thus more time consuming than those used in test plans TP1÷TP4.
To protect the IMCOP system against overload, which may occur when complex processes consuming vast amounts of computational resources are executed, an automatic mechanism of service replication was built into the system. This mechanism is able to multiply instances of particular services and run them in the IMCOP cloud and on third-party machines. Fig. 7 shows how the averages of changes Δt obtained under TP5 test conditions vary with the growing number of service instances. It is clear that system performance increases significantly when the number of instances, instantiated as a result of the replication mechanism, increases to four per service.

Fig. 7 Averages of changes Δt versus the number of concurrent user requests N in test plan TP5

5 Summary and conclusions

The IMCOP platform is a service-oriented architecture with a large number of specialized web services. The distinct functions of IMCOP services, which aggregate, analyze and enrich the processed data, make the IMCOP platform flexible and able to meet customer needs concerning different subjects of demanded content and ways of presenting content to end-users. IMCOP’s openness to third-party services and the scalability provided by its SOA-driven architecture (using mechanisms of service replication and concurrent and parallel computing) mean that the system’s capabilities are, in principle, unrestricted. The ability of the IMCOP platform to process different multimedia formats (text, still images, audio-video sequences, footage) ensures diversity of information sources and provides the foundation for rich presentation layers of IMCOP end-apps. As such, the IMCOP system outperforms other content discovery platforms in terms of universality and versatility.

The IMCOP system shares certain features with other content discovery platforms, such as searching for connections (e.g. in the Kannuu system), a related content function (e.g. in the COMPASS platform) and the Solr engine (e.g.
in Outbrain). However, in contrast to these and other platforms which have limited functionality, the IMCOP platform addresses a range of goals and serves different categories of customers. For example, the content discovery and delivery engine of the IMCOP platform can be used to produce DEEP-like magazines whose subject matter can extend beyond actors, movies and celebrities. Such automatically generated magazines can cover subjects such as cultural events in a given city. Instead of using web-based portals, users can use a DEEP-like mobile app powered by the IMCOP platform. This enables them to find the latest theatre shows and learn about the shows, directors, actors and so on. In this instance, the city is the IMCOP customer while users are the end-users of the app.

The concept of Complex Multimedia Objects (CMO) is another significant difference between the IMCOP system and other content discovery platforms. CMO objects, with their Universally Unique Identifiers (UUIDs), extend the MPEG-7 standard to hold descriptive and descriptor metadata and connection information, which allows the IMCOP system to exchange data with other systems. To the best of our knowledge, the content discovery platforms described in Section 2 do not offer this capability.

There are certain drawbacks of the current implementation of the IMCOP system which need to be eliminated. Our current efforts aim to improve the accuracy of particular MES services. In addition, other types of MES and DAS services are required to extend and diversify IMCOP capabilities. Given the openness of the IMCOP system, we hope to incorporate new services in collaboration with partners.

Acknowledgements This work was supported by the Polish National Centre for Research and Development (NCBR), as a part of the EUREKA Projects no. E! II/PL-IL/10/02A/2012 and E!II/PL-IL/10/03A/2012.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Baran R, Wiraszka D, Dziech W (2000) Scalar quantization in the PWL transform spectrum domain. In: Proc. Intern. Conf. on Mathematical Methods in Electromagnetic Theory, pp 218–221. doi:10.1109/MMET.2000.888560
2. Baran R, Rusc T, Rychlik M (2014) A smart camera for traffic surveillance. In: Dziech A, Czyżewski A (eds) MCSS 2014. CCIS, vol 429. Springer, Heidelberg, pp 1–15. doi:10.1007/978-3-319-07569-3_1
3. Baran R, Zeja A, Slusarczyk P (2015) An overview of the IMCOP system architecture with selected intelligent utilities emphasized. In: Multimedia Communications, Services and Security. CCIS, vol 566. Springer, Heidelberg, pp 3–17. doi:10.1007/978-3-319-26404-2_1
4. Blanco-Fernandez Y, Pazos-Arias JJ, Gil-Solla A, Ramos-Cabrer M, Lopez-Nores M (2008) Providing entertainment by content-based filtering and semantic reasoning in intelligent recommender systems. IEEE Trans Consum Electron 54(2):727–735
5. Bleschke M, Madonski R, Rudnicki R (2009) Image retrieval system based on combined MPEG-7 texture and colour descriptors. In: Proc. of the 16th Int. Conf. on Mixed Design of Integrated Circuits & Systems (MIXDES '09), Lodz, pp 635–639
6. Cerqueira E, Janowski L, Leszczuk M, Papir Z, Romaniak P (2009) Video artifacts assessment for live mobile streaming applications. In: Mauthe A, Zeadally S, Cerqueira E, Curado M (eds) FMN 2009. LNCS, vol 5630. Springer, Heidelberg, pp 242–247
7. Chatzichristofis, S. A., Boutalis, Y.
S.: CEDD: Color and Edge Directivity Descriptor: A Compact Descriptor for Image Indexing and Retrieval. Computer Vision Systems, vol. 5008, pp. 312–322, Springer, Heidelberg (2008), doi:10.1007/978-3-540-79547-6_30
8. Chatzichristofis SA, Boutalis YS (2008) FCTH: fuzzy color and texture histogram - a low level feature for accurate image retrieval. In: Proc. of the Ninth Int. Workshop on Image Analysis for Multimedia Interactive Services, Klagenfurt, pp 191–196
9. Eshkol A, Grega M, Leszczuk M, Weintraub O (2014) Practical application of near duplicate detection for image database. In: Dziech A, Czyżewski A (eds) MCSS 2014. CCIS, vol 429. Springer, Heidelberg, pp 73–82. doi:10.1007/978-3-319-07569-3_6
10. Howlett RJ (2003) Internet-based intelligent information processing systems, series on innovative intelligence, vol 3. World Scientific
11. http://deep.it/ (viewed July 25, 2016)
12. http://googlescholar.blogspot.com/2012/08/scholar-updates-making-new-connections.html (viewed July 25, 2016)
13. http://lucene.apache.org/solr/ (viewed July 25, 2016)
14. http://techblog.outbrain.com/2011/04/under-the-hood-of-our-algorithmic-engine-how-we-serve-content-recommendations/ (viewed July 25, 2016)
15. http://www.cnet.com/how-to/samsung-smart-tv-spying/ (viewed July 25, 2016)
16. http://www.intel.com/content/www/us/en/internet-of-things/infographics/guide-to-iot.html (viewed July 25, 2016)
17. http://www.live-counter.com/how-big-is-the-internet/ (viewed July 25, 2016)
18. http://www.nature.com/news/how-to-tame-the-flood-of-literature-1.15806 (viewed July 25, 2016)
19. http://www.tvbeurope.com/global-pay-tv-market-exceed-one-billion-2017/ (viewed July 25, 2016)
20. http://www.viaccess-orca.com/content-discovery-platform.html (viewed July 25, 2016)
21. http://www.viaccess-orca.com/resource-center/white-papers/462-going-deep-into-discovery.html (viewed July 25, 2016)
22. https://en.wikipedia.org/wiki/Content_discovery_platform (viewed July 25, 2016)
23.
https://www.facebook.com/deepmagazines/ (viewed July 25, 2016)
24. Pazzani MJ, Billsus D (2007) Content-based recommendation systems. In: Brusilovsky P, Kobsa A, Nejdl W (eds) The adaptive web. LNCS, vol 4321. Springer, Berlin, pp 325–341
25. Romaniak P, Janowski L, Leszczuk M, Papir Z (2012) Perceptual quality assessment for H.264/AVC compression. In: Proc. of consumer communications and networking conference (CCNC), pp 597–602. doi:10.1109/CCNC.2012.6181021
26. Salembier P, Smith JR (2001) MPEG-7 multimedia description schemes. IEEE Transactions on Circuits and Systems for Video Technology 11(6):748–759
27. Salter J, Antonoupoulos N (2006) CinemaScreen recommender agent: combining collaborative and content-based filtering. IEEE Intell Syst 21(1):35–41
28. Slusarczyk P, Baran R (2014) Piecewise-linear subband coding scheme for fast image decomposition. Multimedia Tools and Applications. Springer, US. doi:10.1007/s11042-014-2173-1
29. Su X, Khoshgoftaar TM (2009) A survey of collaborative filtering techniques. Advances in Artificial Intelligence archive

Remigiusz Baran was awarded the M.Sc. in Electrical Engineering from the Faculty of Electrical and Control Engineering, Kielce University of Technology in 1993, and the Ph.D. in Telecommunications from the Faculty of Electrical, Control, Electronic and Computer Engineering, AGH University of Science and Technology in Kraków in 2004. He is currently working as an Assistant Professor at the Kielce University of Technology. He is the author or co-author of over 50 publications in the field of digital signal processing, focusing on image compression and feature extraction. The main areas of his academic interest are feature- and appearance-based object detection and recognition techniques, and microprocessor technology and embedded systems.
Apart from his scientific activity, he is also an academic teacher. He has supervised approximately 60 MSc students at both undergraduate and graduate levels. He also serves as a reviewer for international journals and conferences. Dr. Baran has participated in numerous international and national (Polish) research projects including INDECT, OASIS Archive, Calibrate, INWAS, INSIGMA, TAPAS. At present he is the Project Manager of the second joint Polish-Israeli R&D project IMCOP “Intelligent Multimedia System for Web and IPTV Archiving. Digital Analysis and Documentation of Multimedia Content”.

Andrzej Dziech Ph.D. Hab. received his M.Sc. and Ph.D. in telecommunications from the Electro-technical Institute in Leningrad, Faculty of Automation and Computer Science, in 1970 and 1973. He received a postdoctoral degree in engineering, Science Tech, from the Poznan University of Technology, Faculty of Electrical Engineering in 1978. He has been a full professor since 1986. His fields of interest are related to digital communication, image and data processing, data compression, information and coding theory, random signals, computer communications networks and signal processing. He has worked at a number of foreign universities; most recently, in 2001–2003, he was a visiting professor at the University of Wuppertal in Germany. He has co-authored 180 publications including 5 books. He has supervised 18 Ph.D. students and approx. 100 M.Sc. students. He has received four awards from the Ministry of Education of Poland for his research achievements. He serves as the coordinator of the FP7 project INDECT.

Andrzej Zeja was awarded the M.Sc. in Electrical Engineering from the Faculty of Electrical and Control Engineering, Kielce University of Technology in 1992. He is currently working as an Assistant Professor at the Kielce University of Technology.
The main areas of his activities are programming and integrating distributed heterogeneous computer systems and hardware/software codesign of embedded systems. A. Zeja has participated in numerous international and national (Polish) research projects including Calibrate, INWAS, INSIGMA, IMCOP.

Journal

Multimedia Tools and Applications, Springer Journals

Published: Jul 28, 2017
