Real-time feature extraction from video stream data for stream ...

2. Video Segmentation and Tagging

Mierswa and Morik [Mierswa and Morik, 2005] address the problem of automatically

classifying audio data. They extract features from music files and use these features to

predict the genre of each music file. Liu et al. [Liu et al., 1998]

use audio data to classify television broadcasts. They observed that low-level audio features

already discriminate well between "commercials", "basketball

games", "football games", "news", and "weather forecasts". This difference is visible even

in the raw waveforms of the respective audio data. Figure 2.12 shows three examples.

(a) commercials (b) basketball (c) news

Figure 2.12.: Waveforms for different audio data, corresponding to video data. Figures

taken from [Liu et al., 1998].
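
To make the notion of a low-level audio feature concrete, the following is a minimal sketch that computes two common frame-wise features, the root-mean-square energy and the zero-crossing rate. Both features, the function name frame_features, and the non-overlapping framing are assumptions chosen for illustration; they are not necessarily the features used by Liu et al.

```python
import math


def frame_features(samples, frame_size):
    """Return a list of (rms_energy, zero_crossing_rate) per frame.

    Illustrative assumption: frames do not overlap, and trailing samples
    that do not fill a complete frame are ignored.
    """
    if frame_size < 2:
        raise ValueError("frame_size must be at least 2")
    features = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        # Root-mean-square energy: a rough measure of loudness.
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        # Fraction of neighbouring sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_size - 1)
        features.append((rms, zcr))
    return features


# Hypothetical toy signal: one rapidly alternating frame, one constant frame.
signal = [1.0, -1.0, 1.0, -1.0, 0.5, 0.5, 0.5, 0.5]
print(frame_features(signal, frame_size=4))
```

Per-frame values of this kind could then be aggregated (for example as mean and variance per clip) and passed to a classifier; which aggregation works best is not claimed here.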

2.5.2. Multi-modal analysis

Besides the aforementioned rule-based segmentation algorithm by Aigrain et al.

[Aigrain et al., 1997], other researchers have developed approaches for the segmentation

of video data that can be seen as multi-modal approaches, as they take into account both

video and audio data. Sundaram and Chang [Sundaram and Chang, 2000], for example,

segment the audio and video data into scenes separately. Afterwards, they determine at

which shot boundaries both the audio scene detection and the video scene segmentation

report a scene boundary. Only boundaries that were detected in both the video and

the audio data are kept.
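
The agreement step described above can be sketched as follows. This is a minimal illustration, not the implementation by Sundaram and Chang: the function name agreed_scene_boundaries, the representation of boundaries as timestamps in seconds, and the tolerance parameter are all assumptions introduced here.

```python
def agreed_scene_boundaries(shot_boundaries, audio_boundaries,
                            video_boundaries, tolerance=0.0):
    """Keep the shot boundaries at which both detectors report a scene change.

    All boundaries are timestamps in seconds (an assumption). A detector is
    taken to agree with a shot boundary if one of its scene boundaries lies
    within `tolerance` seconds of it (also an assumption).
    """
    def is_near(t, candidates):
        return any(abs(t - c) <= tolerance for c in candidates)

    return [
        t for t in shot_boundaries
        if is_near(t, audio_boundaries) and is_near(t, video_boundaries)
    ]


# Hypothetical example data.
shots = [10.0, 25.0, 40.0, 60.0]
audio_scenes = [10.2, 40.0, 58.0]
video_scenes = [9.9, 25.1, 40.3]
print(agreed_scene_boundaries(shots, audio_scenes, video_scenes, tolerance=0.5))
```

In this example, the boundary at 25.0 is dropped because only the video detector reports it, and the boundary at 60.0 is dropped because the nearest audio boundary is too far away and no video boundary is close.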

Snoek and Worring [Snoek and Worring, 2005] and Wang et al. [Wang et al., 2000] have

written good comparisons of further approaches. In the remainder of this work, I will

focus on features extracted from video data. As video data is not accompanied by

audio data in every setting (e.g. surveillance cameras and, in particular, the "coffee" dataset

I use later on), I will not take audio features into account at all.


