Real-time feature extraction from video stream data for stream ...

2. Video Segmentation and Tagging

(a) Four successive frames taken from a "Tagesschau" news show.

(b) Corresponding color histograms

Figure 2.4.: Difference of gray-level histograms around a shot boundary

Building on this simple color histogram difference, other authors have tried to improve the detection rate by modifying the formula. Dailianas et al. [Dailianas et al., 1995], for example, developed a method that "tries to amplify the differences of two frames" by squaring them. A shot boundary is declared when



\[
\sum_{j} \frac{\left(H_i(j) - H_{i+1}(j)\right)^2}{\max\left(H_i(j),\, H_{i+1}(j)\right)} > T
\]

This method is known as the color histogram difference based on χ²-tests and goes back to Nagasaka and Tanaka [Nagasaka and Tanaka, 1992]. "Since the color histogram does not incorporate the spatial distribution information of various colors, it is more invariant to local or small global movements than pixel-based methods." [Yuan et al., 2007]
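The χ²-style criterion above can be sketched in a few lines of Python. This is a minimal illustration rather than the original authors' implementation: the function names are hypothetical, the histograms are assumed to be plain lists of bin counts of equal length, and bins that are empty in both frames are assumed to contribute nothing (which avoids a division by zero).

```python
def chi_square_difference(hist_a, hist_b):
    """Sum over all bins j of (a_j - b_j)^2 / max(a_j, b_j)."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        largest = max(a, b)
        if largest > 0:  # assumption: a bin empty in both frames adds 0
            total += (a - b) ** 2 / largest
    return total


def is_shot_boundary(hist_a, hist_b, threshold):
    """Declare a boundary when the difference exceeds the threshold T."""
    return chi_square_difference(hist_a, hist_b) > threshold


# Two histograms that differ only in the first bin: (4 - 2)^2 / 4 = 1.0
print(chi_square_difference([4, 0, 2], [2, 0, 2]))
```

Because each squared difference is divided by the larger of the two bin counts, a complete change in a bin contributes the full bin count, while small fluctuations in well-filled bins contribute very little.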

Block-based difference

Instead of applying metrics to entire frames, we can also compare smaller blocks (also referred to as regions or grids) of successive frames. To this end, each frame is segmented into blocks, and the shot boundary detection is then based either on all blocks or on a selection of them. Kasturi and Jain [Kasturi and Jain, 1991], for example, cut each frame into a fixed number of regions. The likelihood ratio is then computed for each pair of corresponding blocks, and a shot boundary is declared if the likelihood ratio of more than T blocks exceeds a given threshold t. Nagasaka and Tanaka [Nagasaka and Tanaka, 1992] developed a similar approach. They divide each frame into a 4 × 4 grid and compare the corresponding regions of successive frames, which yields 16 difference values. Instead of using all of them, they base the shot boundary detection on the eight regions with the lowest difference values. This makes their approach robust to noise, but it remains sensitive to panning.
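The selection idea attributed to Nagasaka and Tanaka above could look roughly like the following sketch. Several details are assumptions made for illustration only: frames are equally sized 2D lists of gray levels in 0..255, the per-block difference is a χ²-style histogram difference, the eight lowest values are combined by summation, and all function names and the `bins`, `keep` and `threshold` parameters are hypothetical.

```python
def split_into_blocks(frame, rows=4, cols=4):
    """Return the pixels of each grid cell as a flat list, row-major."""
    height, width = len(frame), len(frame[0])
    blocks = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            blocks.append([frame[y][x] for y in range(y0, y1)
                           for x in range(x0, x1)])
    return blocks


def gray_histogram(pixels, bins=8, levels=256):
    """Count the pixels falling into each of `bins` equal-width bins."""
    hist = [0] * bins
    for value in pixels:
        hist[value * bins // levels] += 1
    return hist


def block_differences(frame_a, frame_b, rows=4, cols=4, bins=8):
    """One chi-square-style difference per pair of corresponding blocks."""
    diffs = []
    for pa, pb in zip(split_into_blocks(frame_a, rows, cols),
                      split_into_blocks(frame_b, rows, cols)):
        ha, hb = gray_histogram(pa, bins), gray_histogram(pb, bins)
        diffs.append(sum((a - b) ** 2 / max(a, b)
                         for a, b in zip(ha, hb) if max(a, b) > 0))
    return diffs


def is_shot_boundary_blockwise(frame_a, frame_b, threshold, keep=8):
    """Decide using only the `keep` blocks with the lowest differences."""
    lowest = sorted(block_differences(frame_a, frame_b))[:keep]
    return sum(lowest) > threshold


# A change confined to one block is ignored; a global change is detected.
dark = [[0] * 8 for _ in range(8)]
bright = [[255] * 8 for _ in range(8)]
local = [row[:] for row in dark]
local[0][0] = local[0][1] = local[1][0] = local[1][1] = 255
print(is_shot_boundary_blockwise(dark, local, threshold=10))
print(is_shot_boundary_blockwise(dark, bright, threshold=10))
```

Discarding the blocks with the largest differences means that a change confined to a few regions, such as noise or a small moving object, cannot trigger a boundary on its own. A pan changes most blocks at once and therefore still can.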

Net comparison by Xiong et al. [Xiong et al., 1995] uses a block-based approach in order to speed up shot boundary detection. They break a frame into base windows and take only some of these base windows into account when detecting a shot boundary. By only

