Real-time feature extraction from video stream data for stream ...

2. Video Segmentation and Tagging

All pictures between one I-frame and the next, including the first I-frame, are called a

group-of-pictures (GOP).

Figure 2.6.: Group-of-pictures (GOP)
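
The GOP definition above can be illustrated with a minimal sketch. It assumes that the frame types are already available as a string of the letters I, P and B in stream order; the function name split_into_gops and this input format are hypothetical and chosen for illustration only.

```python
from typing import List


def split_into_gops(frame_types: str) -> List[str]:
    """Split a frame-type sequence such as "IBBPBBPIBBP" into GOPs.

    Each GOP starts with an I-frame and contains all frames up to,
    but not including, the next I-frame. Frames that appear before
    the first I-frame are skipped (an assumption of this sketch).
    """
    gops: List[str] = []
    current = ""
    for frame_type in frame_types:
        if frame_type == "I":
            # An I-frame closes the running GOP and opens a new one.
            if current:
                gops.append(current)
            current = "I"
        elif current:
            current += frame_type
    if current:
        gops.append(current)
    return gops
```

For the sequence "IBBPBBPIBBPI" this sketch yields the groups "IBBPBBP", "IBBP" and "I".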

As the similarity in successive frames is quite large within one shot, encoders tend to

prefer P- and B-frames to encode the video frames. On the other hand, consecutive frames

with hardly any similarity are encoded as I-frames. This is especially the case for frames

right after a hard cut. Therefore, the occurrence of I-frames is a good indicator of the

presence of a shot boundary. Furthermore, the P- and B-frames hold information about

the motion of objects from one frame to the next. This information is captured in

so-called motion vectors (MVs). The comparison of the length, direction and coherence of

motion vectors over several images can hence be used for shot boundary detection, as


Shot boundary detection on compressed video data has, for example, been done by Arman

et al. [Arman et al., 1993]. They use discrete cosine transform (DCT) features

of an encoded MPEG stream to find shots. Lee et al. [Lee et al., 2000]

use “direct edge information extraction from MPEG video data”. Zhang et al.

[Zhang et al., 1995a] detect shots based on the pair-wise difference of DCT coefficients

obtained from MPEG-compressed video data.
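
The observation that I-frames tend to follow hard cuts can be turned into a simple candidate filter. The following is only a sketch under a strong assumption that is not taken from the cited works: the encoder inserts an I-frame at a fixed nominal GOP length, so that any I-frame arriving earlier than expected is treated as a shot boundary candidate. The names cut_candidates and nominal_gop_length are hypothetical.

```python
from typing import List


def cut_candidates(frame_types: str, nominal_gop_length: int) -> List[int]:
    """Return the indices of I-frames that arrive earlier than expected.

    Assumption of this sketch: without a cut, the encoder places an
    I-frame every `nominal_gop_length` frames. An I-frame that follows
    the previous I-frame after fewer frames is reported as a candidate.
    """
    candidates: List[int] = []
    last_i_frame = None
    for index, frame_type in enumerate(frame_types):
        if frame_type != "I":
            continue
        if last_i_frame is not None and index - last_i_frame < nominal_gop_length:
            candidates.append(index)
        last_i_frame = index
    return candidates
```

Such candidates would still have to be confirmed by another feature, since encoders may also insert I-frames for reasons other than a cut.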

Edge tracking

Instead of detecting shot transitions on the images themselves, Zabih et al. have proposed

an algorithm based on transformed frames [Zabih et al., 1995]. They first apply a border

detection algorithm (see chapter 6.2.1) to each incoming frame. Then they count the

number of border pixels that do not correspond to any border pixel in the previous frame

(window size = 2), that is, the number of border pixels that are farther than

t pixels away from any border pixel in the previous frame. The higher this number of changed

border pixels is, the more likely it is that a shot boundary exists. A comparison with some of

the above-mentioned approaches has shown “that the simple histogram feature usually is

able to achieve a satisfactory result while some complicated features such as edge tracking

can not outperform the simple feature” [Yuan et al., 2007]. Lienhart [Lienhart, 1999]

has furthermore pointed out that the edge tracking method is computationally much

more expensive than the simple color histogram methods. Hence I decided not to

test this approach further.
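
For illustration, the counting step described above could look roughly as follows. This is a minimal sketch and not the implementation of Zabih et al.: it assumes that the border maps are given as equally sized nested lists of 0/1 values and that the distance t is measured as the Chebyshev distance (a square neighborhood). The function name count_new_border_pixels is hypothetical.

```python
from typing import Sequence


def count_new_border_pixels(
    prev_borders: Sequence[Sequence[int]],
    curr_borders: Sequence[Sequence[int]],
    t: int,
) -> int:
    """Count border pixels of the current frame that have no border
    pixel of the previous frame within a distance of t pixels.

    Both border maps must have the same size. The distance is the
    Chebyshev distance, which is an assumption of this sketch.
    """
    height = len(curr_borders)
    width = len(curr_borders[0]) if height else 0
    count = 0
    for y in range(height):
        for x in range(width):
            if not curr_borders[y][x]:
                continue
            # Search the (2t+1) x (2t+1) window, clipped to the image.
            has_neighbor = any(
                prev_borders[ny][nx]
                for ny in range(max(0, y - t), min(height, y + t + 1))
                for nx in range(max(0, x - t), min(width, x + t + 1))
            )
            if not has_neighbor:
                count += 1
    return count
```

The nested window search makes the cost grow with the image size times the square of t, which gives an impression of why this approach is more expensive than a histogram comparison.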

2.2.2. Detection of gradual transitions

Comparing gradual transitions to cuts, we find that the difference values of two successive

frames are significantly smaller for gradual transitions. Hence, the above-mentioned


