an innovative algorithm for key frame extraction in video ...

5. Experimental Setup

5.1 Algorithms Tested

We compared the results of our algorithm with those of five other key frame extraction algorithms. Together with our Curvature Points (CP) algorithm, we tested the Adaptive Temporal Sampling (ATS) algorithm of Hoon et al. [18], the “Flexible Rectangles” (FR) algorithm of Hanjalic et al. [17], the Shot Reconstruction Degree Interpolation (SRDI) algorithm of Tieyan Liu et al. [27], the Perceived Motion Energy (PME) algorithm of Tianming Liu et al. [21], and a simple Mid Point (MP) algorithm.

Both the ATS and FR algorithms select key frames on the basis of cumulative frame differences. The ATS algorithm computes color histogram differences in the RGB color space and plots them on a curve of cumulative differences. The key frames are selected by sampling the y-axis of this curve at constant intervals; the corresponding values on the x-axis identify the key frames. More key frames are therefore likely to be found in intervals where the frame differences are pronounced than in intervals where they are small. The FR algorithm also uses color histogram differences to build its curve, but the histograms are computed in the YUV color space. The curve of cumulative differences is approximated by a set of rectangles, each of which is used to select a key frame. As the widths of the rectangles are calculated to minimize the approximation error, an optimization algorithm is required (an iterative search algorithm is used). The input parameter of the algorithm is the number of key frames (and thus of approximation rectangles) to be extracted. These authors also propose a strategy for deciding how many key frames to select from each shot of the video sequence: the number of key frames assigned to a shot is proportional to its length.

Given a specified number of key frames to be extracted, the SRDI algorithm uses the motion vectors to compute each frame’s motion energy.
All the motion energy values are then used to build a motion curve that is passed to a polygon simplification algorithm. This algorithm retains only the most salient points that can approximate the whole curve. The frames corresponding to these points form the key frame set. If the number of frames in the final set differs from the number of key frames requested, the set is reduced or enlarged by interpolating frames according to the Shot Reconstruction Degree (SRD) criterion. When the number of frames is lower than the number desired, the shot is reconstructed by interpolating the frames in the frame set, and the interpolated frames that have the largest reconstruction errors are retained, up to the number of key frames needed. When the number of frames in the frame set is greater than the number of key frames needed, the frames in the frame set are interpolated, and those with the minimal reconstruction error are removed from the set. The result of the SRDI algorithm depends on both the polygon simplification algorithm and the SRD criterion. The only parameter that must be set for the ATS, FR, and SRDI algorithms is the number of key frames to be extracted.

The PME algorithm works on compressed video, using the motion vectors as indicators of the visual content complexity of a shot. A triangle model based on a perceived motion energy feature is used to select key frames. With this model, a shot is segmented into sub-segments, each representing an acceleration and deceleration motion pattern (modeled by a triangle). Key frames are then extracted from the shots by taking the vertices of the triangles. To compute the perceived motion
energy feature, the magnitudes of the motion vectors of the B-frames are first filtered with two nonlinear filters. For each motion vector in the frame feature, a spatial filter is applied within a given spatial window, and a temporal filter is applied on values belonging to frames within a given temporal window. For each B-frame, the PME is then computed from the magnitudes of the motion vectors and the dominant motion direction. This preprocessing requires the setting of several parameters. A simple procedure then automatically computes the triangles on the PME values and the corresponding key frames. The algorithm requires the setting of two parameters, the most important of which is the minimum size of a triangle, since it influences the length of the interval between two consecutive key frames.

The MP algorithm was chosen because it can represent the extreme case of our algorithm, when no evident high curvature points can be found in a shot and the center frame of the sequence is chosen as the key frame instead.

Where available, the parameters set for the algorithms were always those reported in the original papers. The ATS, FR, and SRDI algorithms require the number of key frames as an input parameter. Defining a general rule for setting this number is a crucial matter; the results may vary widely, depending on the rule selected. We set the input parameter for these algorithms to the number of key frames found by our algorithm. We can then compare the algorithms regardless of the number of key frames: any difference in results depends only on the selection strategy adopted. Since the PME algorithm, instead, extracts the key frames in a totally automatic way, as does our algorithm, the results depend on both the number of key frames extracted and the selection strategy applied.

5.2 Video Data Set

Six videos of various genres were used to test the performance of the key frame extraction algorithms.
Table 1 summarizes the characteristics of the six video test sequences. The “eeopen” video is an MPEG1 intro sequence of a TV series with short shots and several transition effects. The “news” and “nwanw1” videos are two MPEG1 news sequences; the shots are moderately long, not too dynamic, and mixed with commercial sequences of very fast-paced shots. The “nwanw1” video is similar to the “news” video, but has longer shots. The “football” and “basketball2” videos are two MPEG1 sport sequences: “football” exhibits rather long shots, while “basketball2” is a single long shot, and both have panning and camera motion effects. Finally, “bugsbunny” is an MPEG1 short cartoon sequence with many shots and a number of transition effects.

Table 1. The six videos used to test the key frame extraction algorithms. TNF denotes the total number of frames, and NS denotes the number of shots found; both refer to type A and type B shots.

Video Name    Genre                        Length (mm:ss)   Resolution (W×H)   TNF      NS
eeopen        TV series intro              00:42            352×240            1 289    24
nwanw1        News with commercials        03:39            176×112            6 556    39
news          News with commercials        02:39            176×112            4 757    12
football      Sport                        03:43            176×112            6 697    28
bugsbunny     Short cartoon                07:30            352×240            13 492   89
basketball2   Single shot sport sequence   00:30            320×220            893      1
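
As an illustration of the cumulative-difference sampling that Section 5.1 describes for the ATS algorithm, the following is a minimal Python sketch, not the implementation of Hoon et al. [18]. It rests on several assumptions that are not taken from the paper: the function names are hypothetical, the histogram difference is an L1 distance, each y-axis interval is sampled at its midpoint, and a shot with no measurable change falls back to its middle frame.

```python
from bisect import bisect_left
from itertools import accumulate


def histogram_difference(h1, h2):
    """L1 distance between two histograms (an assumed difference measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def sample_key_frames(histograms, num_key_frames):
    """Select key frame indices by sampling the y-axis of the
    cumulative-difference curve at constant intervals."""
    if num_key_frames < 1:
        raise ValueError("num_key_frames must be at least 1")
    if not histograms:
        return []
    if len(histograms) == 1:
        return [0]

    # Difference between each pair of consecutive frames.
    diffs = [histogram_difference(a, b)
             for a, b in zip(histograms, histograms[1:])]
    # cumulative[i] is the total change accumulated up to frame i.
    cumulative = [0] + list(accumulate(diffs))
    total = cumulative[-1]
    if total == 0:
        # Static shot: fall back to the middle frame (assumption).
        return [len(histograms) // 2]

    step = total / num_key_frames
    key_frames = []
    for k in range(num_key_frames):
        # Midpoint of the k-th constant-height interval (assumption).
        level = (k + 0.5) * step
        # First frame whose cumulative difference reaches this level.
        frame = bisect_left(cumulative, level)
        if not key_frames or frame != key_frames[-1]:
            key_frames.append(frame)
    return key_frames
```

Because the levels are evenly spaced on the y-axis, stretches of the shot where the curve rises steeply are crossed by more levels and so yield more key frames, which matches the behavior described in Section 5.1. The sketch may return fewer frames than requested when several levels map to the same frame.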
