Real-time feature extraction from video stream data for stream ...

ai.cs.uni.dortmund.de

2. Video Segmentation and Tagging

Spatial segmentation denotes the segmentation of video content into objects that share similar characteristics. This includes Object Recognition as well as Foreground and Background Identification, and is used in various application areas.
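As a toy illustration of what "segmenting content into objects that share similar characteristics" can mean on a single frame, the following Python sketch implements simple region growing. It is not one of the methods cited in this chapter. Neighbouring pixels (4-connectivity) are merged into the same region when their grey values differ by at most a tolerance. The function name `segment_regions`, the parameter `tol` and the list-of-lists frame representation are hypothetical choices made for illustration.

```python
from collections import deque


def segment_regions(image, tol):
    """Hypothetical helper: region growing on one grayscale frame (list of rows of ints).

    Neighbouring pixels (4-connectivity) join the same region when their
    intensities differ by at most `tol` (an assumed parameter).
    Returns (labels, n_regions).
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 marks "not yet labelled"
    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            # Start a new region at the first unlabelled pixel and grow it breadth-first.
            labels[r][c] = n_regions
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= tol):
                        labels[ny][nx] = n_regions
                        queue.append((ny, nx))
            n_regions += 1
    return labels, n_regions


# Toy frame: a bright object (top right) on a dark background.
frame = [
    [10, 12, 200, 201],
    [11, 13, 199, 202],
    [10, 10, 10, 10],
]
labels, n = segment_regions(frame, tol=5)
print(n)       # 2
print(labels)  # [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
```

Because each pixel is compared with its neighbour rather than with the region's seed, a slow intensity gradient can chain into one large region. Practical object recognition relies on much richer features than raw grey values.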

Face Detection and People Detection refer to the task of localizing human faces and persons, respectively. Application fields include motion tracking in entertainment applications such as video game consoles (e.g. Xbox Kinect), security applications, and the face-priority autofocus capability of digital cameras and webcams [Rahman and Kehtarnavaz, 2008].

Face Recognition extends the face detection task by also identifying the detected person. It can be used for security applications such as access control for buildings, or to detect unusual events in a given environment by analyzing surveillance videos [Stringa and Regazzoni, 2000, Zhong et al., 2004].

License Plate Recognition is another popular subtask of object recognition. It is used, for example, to find stolen vehicles or to collect tolls [Chang et al., 2004].

Environment Classification is used to detect and track objects such as streets and obstacles (e.g. [Farabet et al., 2011]). This enables autonomous vehicles ([Turk et al., 1987], [Pomerleau, 1993]) or planes ([McGee et al., 2005]) to find their way in an unknown environment.

Foreground and Background Identification can, for example, be used to reconstruct 3D scenes from images or video data [Vaiapury and Izquierdo, 2010].
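The text does not say how foreground and background are separated. One common and very simple baseline is background subtraction with a running-average background model, sketched below. This is an assumed illustration, not the approach of Vaiapury and Izquierdo. The function names and the parameters `threshold` and `alpha` are hypothetical.

```python
def foreground_mask(background, frame, threshold):
    """Hypothetical helper: True where the frame deviates from the background model by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def update_background(background, frame, alpha):
    """Hypothetical helper: running average B <- (1 - alpha) * B + alpha * F.

    Slow changes (e.g. lighting) are gradually absorbed into the background.
    """
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


background = [[10, 10], [10, 10]]
frame = [[10, 90], [12, 10]]  # one pixel changed strongly, one only slightly
print(foreground_mask(background, frame, threshold=20))
# [[False, True], [False, False]]
print(update_background(background, frame, alpha=0.5))
# [[10.0, 50.0], [11.0, 10.0]]
```

The threshold trades sensitivity against noise. `alpha` controls how quickly a static object that enters the scene becomes part of the background.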

Some of these approaches may also be useful for the segmentation and tagging of video content. For example, by detecting faces in video data, we could recognize anchor shots or interviews in news shows. Furthermore, face recognition might be useful to identify actors in movies and hence help to tag scenes. Thus, some of the approaches mentioned above will be picked up later (e.g. Face Detection in news videos, Section 2.3.3). However, the main task of this thesis is the temporal segmentation of video data into meaningful parts. Therefore, I will focus on temporal video segmentation approaches in the remainder of this chapter.
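To make "temporal segmentation" concrete, the following Python sketch shows one classic baseline: detecting hard cuts by comparing grey-level histograms of consecutive frames and then splitting the frame sequence at those cuts. It is offered only as an assumed illustration, not as the method developed in this thesis.

The sketch makes several simplifying assumptions. Frames are lists of rows of integers in 0..255. The bin count and threshold are hypothetical values. Gradual transitions such as fades or dissolves are not handled.

```python
def gray_histogram(frame, bins):
    """Normalised intensity histogram of a grayscale frame with values in 0..255."""
    counts = [0] * bins
    pixels = [p for row in frame for p in row]
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def detect_cuts(frames, bins=8, threshold=0.5):
    """Return indices i where a hard cut is assumed between frame i-1 and frame i.

    bins and threshold are hypothetical defaults chosen for this toy example.
    """
    cuts = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        # Sum of absolute bin differences lies in [0, 2]; large values suggest a cut.
        if sum(abs(a - b) for a, b in zip(prev, cur)) > threshold:
            cuts.append(i)
        prev = cur
    return cuts


def split_into_shots(frames, cuts):
    """Split the frame list into shots at the given cut indices."""
    bounds = [0] + cuts + [len(frames)]
    return [frames[s:e] for s, e in zip(bounds, bounds[1:])]


dark = [[10, 12], [11, 10]]
bright = [[240, 250], [245, 248]]
frames = [dark, dark, bright, bright, dark]
cuts = detect_cuts(frames)
print(cuts)                                             # [2, 4]
print([len(shot) for shot in split_into_shots(frames, cuts)])  # [2, 2, 1]
```

Histograms ignore where pixels are located, which makes the comparison robust to camera and object motion within a shot. The drawback is that two different shots with similar brightness distributions may go undetected.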

2.1. Temporal hierarchy of video content

“Recent advances in multimedia compression technology, coupled with the significant increase in computer performance and the growth of the Internet, have led to the widespread use and availability of digital videos” [Yuan et al., 2007]. To make efficient use of the information contained in video material on the web, the video data must be indexed, browsed and retrieved efficiently. So far this is most often done manually. However, the amount of video data is increasing rapidly, so automatic parsing and matching has become an important and quickly growing field of research.

When aiming to cut a video into meaningful segments, we find various useful levels of abstraction. This hierarchy differs from author to author, but most approaches agree that, on the first level, television broadcasts consist of shows. These shows can
