CBMI 2014 Program

Online Proceedings

CBMI 2014 Gold Sponsor


CBMI Keynote

Activity Recognition and Psychophysics: Towards Real-Time Processing of 3D Tele-immersive Video

Klara Nahrstedt, 09:15-10:30, Session Chair: László Böszörményi

Abstract:

With the decreasing cost and increasing ubiquity of 3D cameras, real-time 3D tele-immersive video is becoming a real possibility in the next generation of stored and interactive tele-immersive systems. However, real-time processing of 3D tele-immersive video remains very challenging when executing queries, transmissions, and/or user interactions with 3D tele-immersive video. In this talk, we make the case that achieving real-time processing of 3D tele-immersive video requires attention to two important issues: understanding user perception and detecting semantics. Based on different case studies, we discuss two psychophysical, activity-driven approaches for real-time 3D tele-immersive video. Both approaches achieve real-time processing through adaptive compression of 3D tele-immersive video, based on an understanding of user perception and on detecting activity semantics in the content. Both approaches reduce underlying resource usage while preserving the overall perceived visual Quality of Experience.

In the first case study, we show the importance of the semantic factor “CZLoD: Color-plus-Depth Level-of-Details”. Through a psychophysical study, we show the existence of two important, activity-dependent thresholds on the CZLoD factor: the Just Noticeable Degradation and the Just Unacceptable Degradation thresholds. The approach then utilizes these activity-dependent CZLoD thresholds in a real-time perception-based quality adaptor, reducing underlying resource usage while enhancing perceived visual quality. In the second study, we combine activity recognition with real-time morphing-based compression of 3D tele-immersive video, where the morphing rate is controlled by user experience and activities within a resource adaptor, reducing bandwidth while preserving perceived visual quality. In both studies, the results are very encouraging, promising fast retrieval and transmission times as well as high-quality user interactions with 3D tele-immersive video.
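The activity-dependent, threshold-based adaptation idea can be sketched in a few lines. Everything below is illustrative: the activity names, the threshold values, and the interpolation rule are invented assumptions, not the values or the adaptor from the psychophysical study itself.

```python
# Hypothetical sketch of an activity-dependent, threshold-based quality adaptor.
# CZLoD is modeled as a value in [0, 1]: 1.0 = full detail, lower = more degraded.
# Threshold values per activity are invented for illustration only.

THRESHOLDS = {
    # activity: (just_noticeable_degradation, just_unacceptable_degradation)
    "sitting": (0.80, 0.45),
    "walking": (0.85, 0.55),
    "dancing": (0.92, 0.70),
}

def target_czlod(activity: str, available_bandwidth_ratio: float) -> float:
    """Pick a CZLoD level for the current activity: degrade as far as the
    available bandwidth requires, but never below the Just Unacceptable
    Degradation threshold, and never more detail than the Just Noticeable
    Degradation threshold (further detail would be imperceptible anyway)."""
    jnd, jud = THRESHOLDS[activity]
    if available_bandwidth_ratio >= 1.0:
        return jnd
    # Interpolate between JUD and JND with available bandwidth, clamped at JUD.
    level = jud + (jnd - jud) * available_bandwidth_ratio
    return max(level, jud)
```

Under this sketch, a fast activity such as dancing tolerates less degradation than a slow one such as sitting, which is the qualitative point of the activity-dependent thresholds.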

Short Bio

Klara Nahrstedt is the Ralph and Catherine Fisher Professor in the Computer Science Department and Acting Director of the Coordinated Science Laboratory in the College of Engineering at the University of Illinois at Urbana-Champaign. Her research interests include 3D tele-immersive systems, mobile systems, Quality of Service (QoS) and resource management, Quality of Experience in multimedia systems, and real-time security in mission-critical systems. She is the co-author of the widely used multimedia books ‘Multimedia: Computing, Communications and Applications’, published by Prentice Hall, and ‘Multimedia Systems’, published by Springer Verlag. She is the recipient of the IEEE Communications Society Leonard Abraham Award for Research Achievements, a University Scholar award, the Humboldt Award, and the IEEE Computer Society Technical Achievement Award, and she is the former chair of the ACM Special Interest Group on Multimedia. She was the general chair of ACM Multimedia 2006, ACM NOSSDAV 2007, and IEEE PerCom 2009.

Klara Nahrstedt received her Diploma in Mathematics (numerical analysis) from Humboldt University, Berlin, Germany, in 1985. In 1995, she received her PhD from the Department of Computer and Information Science at the University of Pennsylvania. She is an ACM Fellow, an IEEE Fellow, and a Member of the Leopoldina German National Academy of Sciences.

CBMI 2014 Klagenfurt


Special Session

Multimedia Metadata

Session Chair: Vincent Charvillat, University of Toulouse, FR; 11:00-12:30

TAME-Diff: Smart Differencing of Time-aligned Multimedia Metadata

Werner Bailer and Andras Horti

We address the issue of determining differences between multimedia metadata documents (e.g., between a reference such as a ground truth and the output produced by an analysis tool), taking the semantics of time-aligned metadata into account. We analyze the issues to be considered when representing ground truth and differences, and propose a representation using a set of edits. As general-purpose XML differencing tools turn out to produce noisy output on practical examples from multimedia processing, we propose a tool called TAME-Diff to determine differences between multimedia metadata documents. We have implemented the tool to handle time-based metadata represented using the MPEG-7 Audiovisual Description Profile (AVDP).
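The idea of representing differences as a set of edits over time-aligned metadata can be sketched as follows. This is a simplified illustration only: the segment structure, the boundary-matching tolerance, and the insert/delete/change edit vocabulary are assumptions for the sketch, not the actual TAME-Diff algorithm or the MPEG-7 AVDP representation.

```python
# Toy sketch: diff two lists of time-aligned metadata segments into an edit set.
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    start: float   # seconds
    end: float     # seconds
    label: str     # e.g. a shot-boundary type or concept label

def diff_segments(reference, candidate, tolerance=0.5):
    """Return an edit set turning `reference` into `candidate`.
    Segments whose boundaries agree within `tolerance` seconds are matched;
    matched segments with differing labels yield a 'change' edit, unmatched
    reference segments a 'delete', and unmatched candidate segments an
    'insert'."""
    edits = []
    unmatched = list(candidate)
    for ref in reference:
        match = next((c for c in unmatched
                      if abs(c.start - ref.start) <= tolerance
                      and abs(c.end - ref.end) <= tolerance), None)
        if match is None:
            edits.append(("delete", ref))
        else:
            unmatched.remove(match)
            if match.label != ref.label:
                edits.append(("change", ref, match))
    edits.extend(("insert", c) for c in unmatched)
    return edits
```

The point of the tolerance is exactly the "semantics of time-aligned metadata": a segment boundary that is off by a fraction of a second should match rather than produce the noisy insert/delete pairs a generic XML differ would emit.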


SeViAnno 2.0: Web-Enabled Collaborative Semantic Video Annotation Beyond the Obvious

Petru Nicolaescu and Ralf Klamma

Considerable tool support for semantic video annotation is already available, supporting users or even automating tasks at different granularity levels and in different metadata formats. In some domains, such as soccer or medicine, highly specialized solutions are even more advanced and have created many new business opportunities. Still, support for mobile, Web-enabled, and collaborative general semantic video annotation is limited, mostly due to the high costs of developing and running such services. SeViAnno is a long-term development addressing these engineering challenges in particular. The current release is based on RESTful Web services, enabling much easier integration of new input and output formats at the metadata-management level as well as an easier development process for mobile and Web-based clients. This is demonstrated by several existing prototypes. Together with a cloud-computing-oriented open-source strategy, this engineering solution is more sustainable and can be exploited at lower cost.



Session

Tools & Evaluation

Session Chair: Mathias Lux; 14:00-15:30

Task-based Performance Assessment of Automatic Metadata Extraction Tools

Werner Bailer, Alberto Messina and Fulvio Negro

Automatic metadata extraction tools can improve the effectiveness of media production processes. However, it is difficult to assess the applicability, expected performance, and thus cost-effectiveness of a specific tool in a specific task context. We propose a task-based approach for the domain of multimedia analysis in order to assess the practical value of automatic information extraction tools for media production tasks. Using formalized, machine-readable models of these tasks, multimedia analysis tools can be assessed in the context of a real media production workflow rather than in an isolated lab setting. We model the dependencies of an analysis tool's performance on its input using a Bayesian network, and show that we obtain a better measure of the quality of that tool's analysis results than with generic metrics. We also show that the method can be used for performance prediction based on content properties. The second contribution of this paper is the application of task models for cost simulation: the effort of manually correcting results generated by automatic content analysis tools at different performance levels is compared with fully manual completion of the task. This makes it possible to determine the minimum performance level at which the assessed automatic tools become cost-effective.
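The break-even idea behind the cost simulation can be illustrated with a small sketch. All cost figures and the linear cost model below are invented for illustration; the paper's simulation is driven by its formalized task models, not by these numbers.

```python
# Illustrative sketch: find the minimum tool accuracy at which an
# "automatic analysis + manual correction" workflow beats fully manual work.

def auto_cost(n_items, accuracy, review_cost, correction_cost):
    """Cost of the automatic workflow: every item is reviewed, and the
    tool's errors (a (1 - accuracy) fraction of items) are corrected
    manually."""
    return n_items * review_cost + n_items * (1.0 - accuracy) * correction_cost

def break_even_accuracy(n_items, manual_cost, review_cost, correction_cost):
    """Lowest tool accuracy (on a 1% grid) at which the automatic workflow
    is no more expensive than fully manual completion of the task."""
    manual_total = n_items * manual_cost
    for pct in range(101):
        acc = pct / 100.0
        if auto_cost(n_items, acc, review_cost, correction_cost) <= manual_total:
            return acc
    return None   # the tool never pays off under these costs
```

For example, if reviewing an item costs 2 units, correcting an error 12 units, and manual annotation 10 units, the tool pays off only from roughly 34% accuracy upward; with a cheaper correction step the break-even point drops accordingly.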


Benchmarking Violent Scenes Detection in Movies

Claire-Hélène Demarty, Bogdan Ionescu, Yu-Gang Jiang, Vu Lam Quang, Markus Schedl and Cédric Penet

This paper addresses the issue of detecting violent scenes in Hollywood movies. In this context, we describe the MediaEval 2013 Violent Scenes Detection task, which provides a consistent evaluation framework for the research community. Nine participating teams submitted systems for evaluation in 2013, which indicates increasing interest in the task. In this paper, the 2013 dataset, the annotation process, and the task's rules are detailed. The submitted systems are thoroughly analyzed and compared through several metrics to draw conclusions on the most promising techniques, among them multimodal systems and mid-level concept detection. Some further late fusions of the submitted systems are investigated and show promising performance.
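Late fusion of several systems' outputs can be sketched minimally as follows. The weighted-average rule shown here is a common baseline choice and an assumption of this sketch, not necessarily the fusion scheme investigated in the paper.

```python
# Toy sketch of late fusion: combine per-shot violence scores from several
# detection systems into a single fused score per shot.

def late_fusion(system_scores, weights=None):
    """Fuse aligned score lists, one per system, by weighted average.
    With no weights given, all systems contribute equally."""
    n_systems = len(system_scores)
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    n_shots = len(system_scores[0])
    return [sum(w * scores[i] for w, scores in zip(weights, system_scores))
            for i in range(n_shots)]
```

The appeal of late fusion in a benchmark setting is that it needs only the systems' score outputs, not their internals, so submissions from independent teams can be combined after the fact.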

