Real-time feature extraction from video stream data for stream ...

2. Video Segmentation and Tagging

order to decrease the misclassification rate, further assumptions are made. For example, it is assumed that anchorshots are at least two seconds long. Furthermore, similar frames are only taken into account if they appear at least 10 seconds apart from each other.
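These two heuristics can be sketched as a simple post-filter over candidate anchorshots. This is a minimal illustration only; the shot representation as (start, end) tuples and the exact thresholds are assumptions, not the implementation used in this thesis:

```python
# Hypothetical post-filter for anchorshot candidates: enforce a minimum
# shot duration and a minimum temporal distance between similar shots.
MIN_DURATION = 2.0    # assumed: anchorshots last at least two seconds
MIN_DISTANCE = 10.0   # assumed: similar shots appear >= 10 s apart

def filter_candidates(shots):
    """shots: list of (start_s, end_s) tuples, sorted by start time."""
    kept = []
    for start, end in shots:
        if end - start < MIN_DURATION:
            continue  # too short to be an anchorshot
        if kept and start - kept[-1][0] < MIN_DISTANCE:
            continue  # too close to the previously kept candidate
        kept.append((start, end))
    return kept

print(filter_candidates([(0.0, 3.0), (1.5, 4.0), (12.0, 12.5), (20.0, 23.0)]))
# keeps (0.0, 3.0) and (20.0, 23.0)
```

The second candidate is dropped for being too close to the first, the third for being too short.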

Nevertheless, misclassifications occur due to anchorshots that appear only once per show and similar news shots that appear more than once throughout a news show. Bertini et al. [Bertini et al., 2001] assume that anchorshots differ from all other shots with regard to motion. As neither the anchorperson nor the background nor the camera is moving, their classification approach is based on the presence or absence of motion vectors. But as digital animations are becoming increasingly common in today's television shows, this assumption may already be wrong in many cases. Furthermore, zooming is a widely used camera technique in some news shows and can cause motion in anchorshots.
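The motion-based criterion can be sketched as follows. This is a stand-in in the spirit of the idea described above, not the classifier of Bertini et al.; the per-macroblock motion-vector representation and the threshold are illustrative assumptions:

```python
# Sketch of a motion-based anchorshot test: a shot with (almost) no
# motion is labelled an anchorshot. Vectors and threshold are assumed.

def is_anchorshot(motion_vectors, threshold=0.5):
    """motion_vectors: list of (dx, dy) displacements over the shot."""
    if not motion_vectors:
        return True  # no motion information at all -> static scene
    mean_mag = sum((dx * dx + dy * dy) ** 0.5
                   for dx, dy in motion_vectors) / len(motion_vectors)
    return mean_mag < threshold

print(is_anchorshot([(0.1, 0.0), (0.0, 0.2)]))  # static scene -> True
print(is_anchorshot([(3.0, 4.0), (5.0, 0.0)]))  # zoom or pan -> False
```

As the text notes, a zoom drives the mean motion magnitude above the threshold, so such an anchorshot would be misclassified.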

Unfortunately, these approaches again assume that the whole news show has been seen at the time the anchorshots are to be labeled. This assumption is fine if we want to organize and structure a news archive, but it does not hold in a real-time environment. Hence, these approaches are not applicable in the ViSTA-TV project.

Multi-expert systems

As none of the above-mentioned approaches provides fully satisfactory performance on arbitrary news videos, it might be useful to combine some of these techniques in a multi-expert system (MES). Such a system uses different anchorshot classifiers to classify the incoming data. Afterwards, the results of all classifiers are combined by averaging or majority voting. A promising approach has been developed by De Santo, Percannella, Sansone, and Vento [Santo et al., 2004], using three different unsupervised classifiers.
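The combination step can be sketched with a minimal majority vote. The three experts below are placeholders, not the unsupervised classifiers of De Santo et al.:

```python
from collections import Counter

# Minimal multi-expert combiner: each expert labels a shot as "anchor"
# or "news"; the most frequent label wins. Experts are stand-ins.

def majority_vote(labels):
    """labels: list of class labels, one per expert."""
    winner, _ = Counter(labels).most_common(1)[0]
    return winner

experts = [lambda s: "anchor", lambda s: "anchor", lambda s: "news"]
shot = object()  # placeholder for the shot's features
print(majority_vote([e(shot) for e in experts]))  # -> anchor
```

With an odd number of binary experts, the vote always produces a unique winner; averaging confidence scores is the analogous combination for probabilistic classifiers.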

2.4. Tagging the segments

Besides the segmentation of the video data, we are moreover interested in assigning tags to the video material.

Definition 6 (Tag) A tag is a term or keyword assigned to a piece of data, describing the data or providing valuable metadata.

The segmentation of video data itself already provides a set of tags for videos. For example, we can use the shot boundary detection to directly assign the keywords "shot boundary", "first frame of a shot", or "representative frame" to single shots. Furthermore, the segmentation of news videos allows us to label frames as "anchorshot frames" or to assign keywords like "news story 1" to the data. But in order to better understand the user behavior, it might be necessary to add many more tags to the incoming video data stream. For this thesis I focused only on the above-mentioned tags. Nevertheless, this section gives a brief overview of further tags that might be useful in the future.
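The mapping from segmentation results to tags can be sketched as a simple lookup. The function and its parameters are hypothetical; only the tag strings follow the text above:

```python
# Illustrative tag assignment (cf. Definition 6): derive per-frame tags
# from the segmentation output. Parameter names are assumptions.

def tags_for_frame(frame, shot_boundaries, anchor_frames):
    """frame: frame index; the other arguments are sets of indices."""
    tags = []
    if frame in shot_boundaries:
        tags.append("shot boundary")
        tags.append("first frame of a shot")
    if frame in anchor_frames:
        tags.append("anchorshot frame")
    return tags

print(tags_for_frame(120, {0, 120, 300}, {120}))
# -> ['shot boundary', 'first frame of a shot', 'anchorshot frame']
```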

Advertisement Another interesting tag might result from the segmentation of broadcast material into "advertisement" and "movie", respectively. This recognition of advertisements has been studied extensively, and approaches vary a lot. Some again utilize

