Real-time feature extraction from video stream data for stream ...

7. Experiments and Evaluation

main reason for misclassification: 42 out of 51 misclassifications are caused by photographic flashes. The same phenomenon can be observed in the other news videos of the dataset, so photographic flashes appear to be very common. Especially in TV news, flashes tend to produce many segmentation errors, and approaches to detect them already exist (e.g. [Quénot et al., 2003], [Heng and Ngan, 2003]).
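The sketch below shows one common heuristic. It is not the method of [Quénot et al., 2003] or [Heng and Ngan, 2003]. A flash shows up as a sudden rise in mean frame luminance that falls back to roughly the previous level within a frame or two, whereas a real cut to a brighter shot stays bright. The sketch assumes that per-frame mean luminance values are already available. The function names and the thresholds `rise`, `max_len` and `window` are hypothetical and would need tuning on the dataset.

```python
def detect_flashes(luma, rise=40.0, max_len=2):
    """Return indices of frames that look like photographic flashes.

    luma:    per-frame mean luminance values (0..255), assumed precomputed
    rise:    hypothetical minimum brightness jump that counts as a flash
    max_len: hypothetical maximum flash duration in frames
    """
    flashes = []
    n = len(luma)
    for i in range(1, n):
        base = luma[i - 1]
        if luma[i] - base < rise:
            continue  # no sudden brightening at frame i
        # A flash fades back to (roughly) the pre-flash level within max_len
        # frames; a real cut to a brighter shot stays bright.
        for j in range(i + 1, min(i + 1 + max_len, n)):
            if abs(luma[j] - base) < rise / 2:
                flashes.extend(range(i, j))  # frames i..j-1 are flash frames
                break
    return flashes


def suppress_flash_cuts(cuts, flashes, window=2):
    """Drop detected shot boundaries lying within `window` frames of a flash."""
    flash_set = set(flashes)
    return [c for c in cuts
            if not any(f in flash_set for f in range(c - window, c + window + 1))]
```

A shot boundary detector typically reports two boundaries around a flash, one where it starts and one where it ends. Both could be filtered out with `suppress_flash_cuts`, while a genuine cut a few frames later is kept.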

Flash detection is particularly interesting because a high density of photographic flashes is also a good indicator of the importance of an event. Flashes most often occur “in impressive scenes, such as interviews of important persons” [Takimoto et al., 2006]. Such scenes are likely to be recorded by several cameras from different angles. In this scenario, the pattern of photographic flashes can also be used to identify the same scene in different videos [Takimoto et al., 2006]. Photographic flashes can therefore carry valuable semantic information, and identifying them might be an interesting task for the further course of the ViSTA-TV project.
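The sketch below illustrates the scene-matching idea. It aligns two lists of flash timestamps by pairing every flash in one video with every flash in the other. Each pairing gives a candidate time offset, and the sketch counts how many flashes coincide under that offset. This simple voting scheme is an illustrative assumption, not the procedure of [Takimoto et al., 2006]. The function name, the input format (timestamps in seconds), the tolerance `tol` and the threshold `min_matches` are all hypothetical.

```python
def align_flash_patterns(times_a, times_b, tol=0.04, min_matches=3):
    """Estimate the time offset between two recordings from their flash times.

    times_a, times_b: flash timestamps in seconds (hypothetical input format)
    tol:              matching tolerance in seconds (about one frame at 25 fps)
    min_matches:      minimum number of coinciding flashes to accept a match
    Returns (offset, matches), or None if the patterns do not match well enough.
    """
    best = None
    for a in times_a:
        for b in times_b:
            offset = a - b  # hypothesis: flash b in video B is flash a in video A
            matches = sum(
                1 for tb in times_b
                if any(abs(tb + offset - ta) <= tol for ta in times_a)
            )
            if best is None or matches > best[1]:
                best = (offset, matches)
    if best is None or best[1] < min_matches:
        return None
    return best
```

Suppose one recording has flashes at 10.0, 10.4, 11.5 and 13.2 s and another has flashes at 2.0, 2.4, 3.5 and 5.2 s. The function then finds an offset of 8.0 s at which all four flashes coincide.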

7.1.3. Real-time behavior of the approach

As figure 7.1 shows, the shot boundary detection experiment was run on a scaled version of the news show video. This scaled version was used to speed up the experiment and reach a throughput of more than 25 frames per second. This throughput will be necessary when working on live video data later on, because the television program delivered by Zattoo is broadcast at 25 frames per second. Since the throughput could also be increased by using better hardware, it is worth checking whether the recognition rate improves with higher-resolution video data. Hence I ran the experiment on four differently scaled versions of the video. The results are shown in table 7.3.
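The text does not state how the scaled versions were produced. The sketch below is a hypothetical illustration of why scaling speeds up processing. It downscales a grayscale frame by block averaging. Halving both width and height, as from 1280 × 720 to 640 × 360, divides the number of pixels to process by four. To keep the example dependency-free, it assumes a simple frame representation: a list of rows of pixel intensities.

```python
def downscale(frame, factor):
    """Downscale a grayscale frame by an integer factor using block averaging.

    frame:  list of rows, each a list of pixel intensities (assumed format)
    factor: integer scale factor, e.g. 2 for 1280x720 -> 640x360
    """
    height, width = len(frame), len(frame[0])
    if height % factor or width % factor:
        raise ValueError("frame size must be divisible by the scale factor")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            # Average the factor x factor block whose top-left corner is (x, y).
            block = [frame[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

Block averaging is only one option. Libraries such as FFmpeg offer several interpolation filters for scaling, and the choice may affect the features extracted afterwards.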

video file (resolution)          time needed (ms)   FP   FN
20120911-micro.raw (160 × 90)              67,797   49    3
20120911-small.raw (320 × 180)            193,935   51    3
20120911-scaled.raw (640 × 360)           583,756   51    3
20120911.raw (1280 × 720)               2,455,627   51    3

Table 7.3: Processing time and quality of the classifier for differently scaled versions of the news show video (FP: false positives, FN: false negatives).

As the table shows, the experiment could not be performed in real time using the video file with the maximum resolution. Processing took 2,455,627 milliseconds ≈ 2,456 seconds ≈ 41 minutes. Since the video consists of 23,457 frames, the throughput is less than 10 frames per second. Fortunately, the results on the scaled versions of the video turned out to be of the same quality as on the largest video, and the smallest version even produced two fewer false positives. Hence processing scaled versions of the television program will be sufficient for the shot detection task in the ViSTA-TV project.
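The throughput figures can be recomputed directly from table 7.3. The sketch below uses only the reported processing times, the frame count of 23,457 and the 25 frames per second required for Zattoo's live stream. The helper names are illustrative.

```python
FRAMES = 23_457      # number of frames in the news show video
REQUIRED_FPS = 25    # Zattoo broadcasts at 25 frames per second

# (file, (width, height), processing time in ms) as reported in table 7.3
RUNS = [
    ("20120911-micro.raw", (160, 90), 67_797),
    ("20120911-small.raw", (320, 180), 193_935),
    ("20120911-scaled.raw", (640, 360), 583_756),
    ("20120911.raw", (1280, 720), 2_455_627),
]


def throughput(frames, time_ms):
    """Frames processed per second of wall-clock time."""
    return frames / (time_ms / 1000.0)


def largest_realtime_resolution(runs, frames, required_fps):
    """Return the highest-resolution run that still reaches the required frame rate."""
    ok = [r for r in runs if throughput(frames, r[2]) >= required_fps]
    return max(ok, key=lambda r: r[1][0] * r[1][1]) if ok else None


for name, (w, h), ms in RUNS:
    fps = throughput(FRAMES, ms)
    print(f"{name:20} {w:4}x{h:<4} {fps:6.1f} fps  real-time: {fps >= REQUIRED_FPS}")
```

For the four versions, the loop reports roughly 346.0, 121.0, 40.2 and 9.6 frames per second. Of the tested resolutions, 640 × 360 is therefore the largest that still meets the real-time requirement.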

