Real-time feature extraction from video stream data for stream ...

2. Video Segmentation and Tagging

of totally different objects will have similar colors” [Zhang et al., 1995c]. Nevertheless, the individual approaches vary considerably. Most of them consider only two consecutive frames.

This works well for hard cuts, but is often not enough to detect gradual transitions.

Hence, we first focus on the detection of hard cuts. A later subsection addresses the problem of detecting gradual transitions.

2.2.1. Detection of hard cuts

As hard cuts are the most common type of transition between two consecutive shots, many different approaches to detect them have been researched. Most of them quantify the difference between two images based on color, but some focus on edge detection or motion vector comparison. In the following, some selected approaches are described.

Pair-wise pixel difference

A quite simple way to decide whether two images differ significantly is to determine how many pixels have changed. Such approaches are most often used on monochromatic images (see chapter 6.2.1), but can of course be transferred to color images easily. One of the first approaches was published by Nagasaka and Tanaka [Nagasaka and Tanaka, 1992]. They simply add up the differences in intensity of all pixels (x, y), with x ∈ X, y ∈ Y, where X and Y denote the width and height of all frames, over two successive frames F_i and F_{i+1}. This sum is then compared to a given threshold T, and a shot boundary is declared when T is exceeded.

\sum_{x \in X} \sum_{y \in Y} |F_i(x, y) - F_{i+1}(x, y)| > T
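This metric can be sketched in a few lines of Python. The frames are assumed here to be 8-bit grayscale images given as nested lists; the threshold value in the example is illustrative only and would have to be tuned to the material at hand.

```python
def frame_difference(frame_a, frame_b):
    """Sum of absolute intensity differences over all pixel positions."""
    return sum(
        abs(pa - pb)
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
    )

def is_hard_cut(frame_a, frame_b, T):
    """Declare a shot boundary when the summed difference exceeds T."""
    return frame_difference(frame_a, frame_b) > T

# Two tiny 2x2 example frames: a dark region is replaced by a bright one.
f1 = [[10, 12], [11, 10]]
f2 = [[200, 210], [205, 198]]
print(frame_difference(f1, f2))        # 770
print(is_hard_cut(f1, f2, T=500))      # True
```

Note that the metric is global: it cannot distinguish whether the summed difference stems from a few strongly changing pixels or from many weakly changing ones, which motivates the refinement discussed next.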


Using this metric, a small number of pixels that change dramatically can already cause the threshold to be exceeded. Zhang et al. [Zhang et al., 1993] therefore developed an approach with two thresholds: one to judge a pixel (x, y) as changed or not (PC = Pixel Changed), and one to detect the shot boundaries themselves. A pixel is declared as changed in the (i+1)-th frame if the difference of its grayscale values between frame F_i and frame F_{i+1} exceeds a given threshold t.

PC(x, y) = \begin{cases} 1 & \text{if } |F_i(x, y) - F_{i+1}(x, y)| > t \\ 0 & \text{otherwise} \end{cases}

As soon as at least T percent of the total number of pixels have changed between frame F_i and frame F_{i+1}, the two frames are declared to lie right before and right after a shot boundary, respectively.

\frac{\sum_{x \in X} \sum_{y \in Y} PC(x, y)}{X \cdot Y} \cdot 100 > T
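The two-threshold variant can likewise be sketched in Python, again assuming 8-bit grayscale frames as nested lists. The per-pixel threshold t and the percentage threshold T are free parameters; the values used in the example are purely illustrative.

```python
def changed_pixel_ratio(frame_a, frame_b, t):
    """Percentage of pixel positions whose grayscale difference exceeds t."""
    width, height = len(frame_a[0]), len(frame_a)
    changed = sum(
        1
        for row_a, row_b in zip(frame_a, frame_b)
        for pa, pb in zip(row_a, row_b)
        if abs(pa - pb) > t          # PC(x, y) = 1 for this pixel
    )
    return changed / (width * height) * 100

def is_hard_cut_two_threshold(frame_a, frame_b, t, T):
    """Declare a shot boundary when more than T percent of pixels changed."""
    return changed_pixel_ratio(frame_a, frame_b, t) > T

# 2x2 example frames: two of the four pixels change drastically.
f1 = [[10, 12], [11, 10]]
f2 = [[10, 210], [205, 11]]
print(changed_pixel_ratio(f1, f2, t=20))              # 50.0
print(is_hard_cut_two_threshold(f1, f2, t=20, T=30))  # True
```

Counting changed pixels rather than summing raw differences makes the decision robust against a handful of extreme outlier pixels, at the cost of having to tune a second threshold.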

Unfortunately, this simple metric does not work well in many settings, as it is very sensitive to object and camera movement. When the camera is panning, each pixel (x_1, y_1) in frame F_{i+1} will be identical to a nearby pixel (x_2, y_2) in frame F_i. But it might

