M.I.T. Media Laboratory Perceptual Computing Section Technical Report No. 400

Detecting and Segmenting Periodic Motion

Fang Liu and Rosalind W. Picard
The Media Laboratory, E15-390
Massachusetts Institute of Technology, Cambridge, MA 02139
liu@media.mit.edu, picard@media.mit.edu

Abstract

A new algorithm is presented for the detection and segmentation of multiple periodic motions in image sequences. A periodicity measure is proposed which provides an accurate quantitative measure of signal periodicity in the form of a temporal harmonic energy ratio. The periodicity templates, which incorporate the segmented region with the associated periodicity energy distribution and motion fundamental frequencies, provide a new means of periodic motion representation. The algorithm is computationally simple and robust in the presence of noise. Real-world examples are used to illustrate the techniques.

1 Introduction

1.1 Overview

Periodic motion is common in the natural world. For instance, locomotion often exhibits cyclic movement. Periodicity is also an important cue in human motion perception. Information regarding the nature of a periodic motion, such as location, extent, and frequency, is important for motion understanding. Techniques for periodic motion detection and segmentation can assist in many applications requiring object and activity recognition and representation.

The main body of work on periodic motion is model-based (e.g., [1][2]). More recently there has been work on motion recognition directly using low-level features of motion information (e.g., [3][4][5]). However, to date, there has not been a method which uses low-level motion features to detect and segment periodic motion simultaneously.
In this work, we attempt to tackle this problem by using a Fourier spectral based approach. The periodicity templates generated in the process incorporate the location, extent, and other characteristic information, such as frequency, of a periodic motion. Therefore, the templates can also be used for periodic motion representation.

1.2 Our Approach

The term temporal texture is defined in [4] as "the motion patterns of indeterminate spatial and temporal extent". We would like to use "texture" to refer to any approximately homogeneous phenomenon. In this sense, periodic temporal activity is a type of temporal texture. (This work was supported in part by IBM and NEC.)

The algorithm we present here is motivated by theory for textured static images that assumes an underlying random field representation for the data [4]. In particular, 1-D signals along the temporal dimension can be considered as stochastic processes. When assuming stationarity, a stochastic signal can be decomposed into deterministic (periodic) and indeterministic (random) components (Wold decomposition [6]). In the frequency domain, the two components correspond to the singular and continuous parts of the Fourier spectra, respectively. In practical applications, this is to say that the repetitive structure in the signal contributes only to the spectral harmonic peaks, and the random behavior to the smooth part of the spectra. Therefore, the energy contained in the harmonic peaks is a good measure of periodicity in the signal. In this work, we apply this analysis to the temporal dimension of the image sequences and propose a new measure of temporal periodicity.
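The Wold intuition above can be sketched numerically. The following is a toy illustration, not the paper's implementation, and all names are ours: a deterministic periodic signal concentrates its spectral energy in a few harmonic bins, while a random signal spreads its energy across the continuous part of the spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 256
t = np.arange(n)
periodic = np.sign(np.sin(2 * np.pi * 8 * t / n))  # deterministic, period 32
noise = rng.normal(size=n)                         # indeterministic

def peak_energy_fraction(x, k=8):
    """Fraction of spectral energy held by the k largest frequency bins."""
    p = np.abs(np.fft.rfft(x - x.mean())) ** 2
    return np.sort(p)[-k:].sum() / p.sum()

# The periodic square wave puts nearly all of its energy into its odd
# harmonics (a handful of bins); the white noise spectrum is flat, so
# its top bins hold only a small fraction of the total energy.
```

For the square wave the top eight bins capture essentially all of the energy, whereas for white noise the same statistic stays well below one half.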
This measure is then used to build a periodicity template for simultaneous periodic motion detection and segmentation.

The actual implementation of the approach above assumes that the repetitiveness of an action is observable along lines parallel to the temporal (T) axis. In other words, the moving object needs to be tracked, just as we fix our eyes on a walking person. Typically, optical flow based techniques are used for object tracking. However, flow based methods are usually susceptible to noise. We present here a non-flow-based procedure for foveating on moving objects by frame alignment.

1.3 Related Work

Of the large amount of research in this area, the work of Polana and Nelson on periodic motion detection [4] is perhaps the most relevant to the approach presented in this paper. In their work, reference curves, which are lines parallel to the trajectory of the motion flow centroid, are extracted and their power spectra computed. The periodicity measure p_f of each reference curve is defined as the normalized difference between the sum of the spectral energy at the highest amplitude frequency and its multiples, and the sum of the energy at the frequencies halfway between. Besides the value of the periodicity measure itself, there is no checking of the signal harmonicity along the curve, which is a weakness of the method. The periodicity measure for an entire sequence is the maximum of p_f averaged among pixels whose highest power spectrum values appear at the same frequency. The final periodicity measure is used to distinguish periodic and non-periodic motion by thresholding.

In [3], flow based algorithms are used to transform the image sequence so that the object in consideration stays at the center of the image frame. Then flow magnitudes in tessellated frame areas of periodic motion are used as feature vectors for motion classification. There is no precise segmentation of regions of periodic motion involved.

Figure 1: Frames 20, 40, 60, and 80 of the 97 frame sequence Walker. Frame size is 320 by 240.

Figure 2: Head and ankle level XT slices of the Walker sequence. (a) Head leaves a straight track. (b) Walking ankles make a crisscross pattern.

2 Method

2.1 Overview

Our system for periodic motion detection and segmentation consists of two stages: 1) foveating by frame alignment; 2) simultaneous detection and segmentation of regions of periodic motion.

In this work, two types of image sequences are considered. (1) Each object which may have a periodic motion is as a whole stationary with respect to the camera, but the background can be moving. (2) The relative motion between the camera and the background is small, permitting minor shake and gradual drift as in the hand-held situation, and each moving object is as a whole moving approximately frontoparallel to the camera at a relatively constant speed and in a translatory manner. In practice, a large number of image sequences containing natural periodic motions can be categorized into one of the two types.

When watching a sequence of a person walking across the image plane, we experience a notion of repetitive movement. However, if we examine the individual frames, there are no re-occurring scenes. Several frames of a sequence Walker are shown in Figure 1 (see also http://[web-seq-1]). The reason why we see a periodic motion in the sequence is due to our ability to focus our visual attention on the moving object, or foveate.
The effect of foveating can be accomplished computationally by frame alignment. Obviously, foveating is not necessary for sequence type (1); it is in fact a process of transforming sequences of type (2) into type (1).

In the second stage, 1-D Fourier transforms are performed at each pixel location along the temporal dimension of the aligned frames. The spectral harmonic peaks are then detected and used to compute the harmonic energy involved at each frame pixel location. A periodicity template of frame size is generated based on the ratio between the harmonic energy and the total energy. The original sequence is then masked for regions of periodic motion.

In the following, we use the term data cube to refer to the three-dimensional (X: horizontal, Y: vertical, and T: temporal) data volume formed by putting together all the frames in a sequence, one behind the other. The XT and YT slices of the data cube reveal the temporal behavior usually hidden from the viewer. Figure 2 shows the head and ankle level XT slices of the Walker sequence. The head leaves a more or less straight track while the walking ankles make a crisscross pattern. Throughout this section, the Walker sequence will be used to illustrate the technical points. More examples are given in Section 3.

2.2 Foveating by Frame Alignment

To align a sequence to a particular moving object, we first need to detect the trajectory of the object. In existing work, tracking techniques based on differencing consecutive frames are usually used to locate the moving objects. The flow-based methods are inevitably subject to the noise sensitivity (which will be demonstrated later) inherent in pairwise frame comparison.
To avoid these shortcomings, we take advantage of the two types of restrictions on the data and use a method similar to the one in [7] to look directly in the data cube for tracks left by the moving objects.

Since the background of the sequence changes slowly, a 1-D median filter can be applied to the data volume along the temporal dimension T to exclude moving objects, resulting in a sequence containing mostly the background. A filter length of 11 was used for the Walker sequence. Subtracting the background from the original sequence, we obtain a difference sequence containing mainly the moving objects. By condition (2) on the data, the trajectories of the objects as a whole in the difference cube should be approximately linear. To obtain 2-D representations of the trajectories, we simply compute the average of the XT or the YT slices of the difference cube, which is equivalent to collapsing the difference cube top down onto the XT plane or sideways onto the YT plane. Next, lines in the 2-D trajectory images are detected via the Hough transform. The detected lines give the X or Y positions of the moving objects in each frame. These position values are called alignment indices. The averaged XT image of the Walker difference sequence and the line found by the Hough transform method are shown in Figure 3. Note that multiple object trajectories can be detected simultaneously using this procedure. An example of three walking persons is shown in Section 3.

Figure 3: (a) Averaged XT image of the Walker sequence after background removal. (b) Line found in (a) by using the Hough transform method.

Using the alignment indices of an object, each frame in the image sequence can be repositioned to center the object at any specified position in the XY plane. After alignment, the object should appear to be moving in place. For instance, after aligning the Walker sequence in the X dimension (alignment in the Y dimension is not necessary since the overall Y position of the person does not change much), the position of the walker's torso remains in place, but the surroundings move to the right. In effect, this is equivalent to focusing our visual attention on an object when viewing a sequence in which the object's position changes frame by frame. We call this process foveating by frame alignment. The aligned sequences are passed on to the second stage of the system.

2.3 Detection and Segmentation

In this stage, we deal only with sequences already aligned to a particular object. To save computation and storage, an aligned sequence can be cropped to limit processing to the object and its vicinity. It will become clear later that the cropping has no effect on the estimation of periodic motion.
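The foveating stage of Section 2.2 can be sketched as follows. This is a simplified illustration under our own assumptions, with our own function names: a single object, a global temporal median standing in for the sliding length-11 median filter, and a least-squares line fit to per-frame intensity peaks standing in for the Hough transform (adequate only for a strictly linear track).

```python
import numpy as np

def alignment_indices(cube):
    """Estimate per-frame X positions of a translating object.

    cube: array of shape (T, Ny, Nx). The temporal median estimates the
    slowly varying background; averaging the difference cube over Y
    collapses it onto an XT trajectory image, in which the object's
    linear track is fit by least squares (cf. the paper's Hough step).
    """
    background = np.median(cube, axis=0)         # per-pixel median over T
    diff = np.abs(cube - background)             # difference sequence
    xt = diff.mean(axis=1)                       # (T, Nx) trajectory image
    x_peak = xt.argmax(axis=1).astype(float)     # per-frame object X
    t = np.arange(cube.shape[0])
    slope, intercept = np.polyfit(t, x_peak, 1)  # linear track
    return np.round(slope * t + intercept).astype(int)

def align(cube, idx):
    """Shift each frame horizontally so the object stays at frame center."""
    center = cube.shape[2] // 2
    return np.stack([np.roll(frame, center - x, axis=1)
                     for frame, x in zip(cube, idx)])
```

After `align`, a pattern that translates through the original frames occupies the same columns in every frame, so its periodicity becomes observable along the temporal lines used in the next stage.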
The location and size of the cropping window are estimated from the average XY image of the difference sequence, which is first aligned to the particular object. Figure 4 shows the averaged aligned XY image and the aligned and cropped Walker sequence, with splits near the center of the frames to show the inside of the data cube.

Figure 4: (a) Averaged XY image of the aligned Walker sequence after background removal. (b) Aligned and cropped Walker sequence with splits near the center of the frames to show the inside of the data cube.

Facing the data cube, a line can be drawn from a pixel (x0, y0) in the first frame all the way through the cube to arrive at pixel (x0, y0) in the last frame. Clearly, this line contains pixel (x0, y0) of all frames. We name this line the temporal line at (x0, y0), which is of essential importance to our discussion. If the frame size is Nx by Ny, then there are Nx * Ny temporal lines in the data cube.

Since the image sequence is aligned to a particular object, the object should be moving in place. If the object is moving cyclically in any manner, the periodicity will be reflected in some of the temporal lines. The top two images of Figure 5 show the head and ankle level XT slices of 64 frames (Frames 17 to 80) of the data cube in Figure 4 (b). These slices are in fact the aligned and cropped versions of the two XT slices in Figure 2. Every column in the two images is a temporal line. The Fourier power spectra of these temporal lines are the columns of the corresponding bottom two images in Figure 5. Note that the power spectrum values are normalized among all temporal lines in the data cube.
While the spectra of the head slice show no harmonic peaks, the periodicity of the moving ankles is captured by the spectral harmonic peaks. We refer to the spectral energy corresponding to these harmonic peaks as the temporal harmonic energy, and propose using the ratio between the harmonic energy and the total energy of a temporal line (the temporal harmonic energy ratio) as a measure of temporal periodicity at the corresponding pixel location.

In [8], a method was presented to detect spectral harmonic peaks in 2-D random fields. Here the algorithm is adapted for use with 1-D signals. Given a temporal line, it is first zero-meaned and Gaussian tapered; then its power spectrum values are computed via a fast Fourier transform. To locate the harmonic peaks, local maxima of the spectrum values (excluding values below 10% of the value range in the data cube) are found by searching a size 7 neighborhood of each frequency sample. These local maxima provide candidate locations of harmonic peaks. A local maximum marks the location of a harmonic peak only when its spectral frequency is either a fundamental or a harmonic. A fundamental is defined as a frequency which can be used to linearly express the frequencies of some other local maxima. A harmonic is a frequency which can be expressed as a linear combination of some fundamentals. Starting from the lowest frequency and proceeding to the highest, each local maximum is checked first for its harmonicity (whether its frequency can be expressed as a linear combination of the existing fundamentals), and then for its fundamentality (whether the multiples of its frequency, combined with the multiples of existing fundamentals, coincide with the frequency of another local maximum).
A tolerance of one sample point is used in the frequency matching.

Due to the nature of natural temporal signals and the windowing effect in spectrum computation, a harmonic peak usually does not appear as a single impulse. Therefore the procedure so far only provides the summit location of each peak. The support of a peak is determined by growing outward from the summit location along the frequency axis until the spectrum value falls below a certain small value (5% of the spectrum value range in this work). After the harmonic peaks are identified, it is straightforward to compute the harmonic energy ratio of the temporal line.

Figure 5: Temporal lines and their normalized power spectra at the head and ankle level of the aligned Walker data cube. Top row: XT slices; every column in the images is a temporal line. Bottom row: power spectra; each column is the power spectrum of the corresponding column in the XT slice. (a) Head level, showing no harmonic peaks. (b) Ankle level; the periodicity of the moving ankles is captured by the harmonic peaks in the spectra.

One may argue that the technique discussed above will fail when a temporal line contains only a sinusoidal signal, which produces only a single spectral peak. Theoretically, this is correct. However, this situation almost never arises in image sequences of natural scenes and objects. To explain this, we trace back to the formation of the signal on a temporal line. Since a temporal line corresponds to a particular pixel in the image plane, having a pattern moving across the pixel is equivalent to having the pixel scan across the pattern. When will this scan create a sinusoidal signal? Only when the pattern has a sinusoidal profile. (An example is to translate a vertical sine grating pattern horizontally, frontoparallel to the camera, at a constant speed.) However, natural edges, patterns, and surfaces hardly ever have such a profile. Therefore, it is safe to say that higher harmonics will usually accompany the fundamentals in the Fourier spectra of the temporal lines.

Applying the peak detection procedure to all temporal lines in a data cube and registering the temporal harmonic energy ratio at each pixel location as a gray level value in a blank plane of frame size, a periodicity template for periodic motion is built for the aligned sequence. The larger the template value, the more periodic energy at the location. At places where no periodic motion is involved, the template value remains zero.
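The per-line measurement can be sketched as follows. This is a simplified version under our own assumptions, with our own names: it checks candidate peaks only against integer multiples of the lowest candidate, rather than carrying out the paper's full fundamental/harmonic bookkeeping over linear combinations, and it uses a Hanning rather than a Gaussian taper.

```python
import numpy as np

def detect_harmonic_peaks(power, floor_frac=0.10, support_frac=0.05, nbhd=3):
    """Mark spectrum bins belonging to harmonic peaks of a 1-D spectrum.

    Local maxima in a size-7 neighborhood (nbhd bins on each side) above
    10% of the value range are candidates; those within one bin of an
    integer multiple of the lowest candidate are kept, and each peak's
    support is grown until the spectrum drops below 5% of the range.
    """
    rng = power.max() - power.min()
    candidates = [i for i in range(1, len(power))
                  if power[i] >= power[max(i - nbhd, 0):i + nbhd + 1].max()
                  and power[i] > floor_frac * rng]
    mask = np.zeros(len(power), dtype=bool)
    if not candidates:
        return mask
    fund = candidates[0]
    peaks = [c for c in candidates
             if min(c % fund, fund - c % fund) <= 1]  # one-bin tolerance
    for p in peaks:                                   # grow peak support
        j = p
        while j >= 0 and power[j] > support_frac * rng:
            mask[j] = True
            j -= 1
        j = p + 1
        while j < len(power) and power[j] > support_frac * rng:
            mask[j] = True
            j += 1
    return mask

def harmonic_energy_ratio(line):
    """Temporal harmonic energy ratio of one temporal line."""
    x = (line - line.mean()) * np.hanning(len(line))  # zero-mean, taper
    power = np.abs(np.fft.rfft(x)) ** 2
    mask = detect_harmonic_peaks(power)
    return power[mask].sum() / power.sum() if power.sum() else 0.0
```

A line crossing a periodically moving pattern scores close to one, while a constant line scores zero.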
Under circumstances such as a noisy background, some speckle noise may appear in the template. Simple morphological closing and opening operations can be applied to the template to clean up the speckle.

Figure 6 (a) is the periodicity template of the Walker sequence after one closing and one opening operation with a 3-pixel-diameter circular structuring element. As expected, the brightest region is the wedge shape created by the walking legs. The head, the shoulder, and the outline of the backpack are shown because the walker bounces. The hands appear at the front of the body since in most parts of the sequence the walker was fixing his gloves and moving his hands in a rather periodic manner. Note that the moving background and parts of the walker do not appear in the template, since there is no periodic motion in those areas.

Figure 6: (a) Periodicity template generated from the aligned Walker sequence of Figure 4 (b). The value at each pixel is that of the temporal harmonic energy ratio of the corresponding temporal line. A high value indicates more periodic energy at the location. (b) Using the periodicity template to mask the original sequence. The four frames in Figure 1 are masked and stacked together into one frame.

Since the non-periodic motion of the background does not light up in the templates, it is clear that the sequence cropping earlier in the second stage does not affect the algorithm itself, but only increases the computational efficiency.

Using the alignment indices generated at the first stage, the periodicity template can also be applied to the original sequence to mask the regions of periodic motion in each frame. The masked frames corresponding to the ones in Figure 1 are stacked together and shown in Figure 6 (b) (see also http://[web-seq-2]).

3 Examples

Three example sequences are used in this paper. One is the Walker sequence.
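The speckle cleanup can be sketched with a hand-rolled grey-scale closing and opening. The cross-shaped structuring element below is our approximation of a 3-pixel-diameter disk, and all names are ours.

```python
import numpy as np

# Cross-shaped structuring element approximating a 3-pixel-diameter disk.
DISK3 = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]

def _shifted(img, dy, dx):
    """View of img shifted by (dy, dx), edge-padded at the borders."""
    padded = np.pad(img, 1, mode="edge")
    return padded[1 + dy:1 + dy + img.shape[0], 1 + dx:1 + dx + img.shape[1]]

def grey_dilate(img, offsets):
    return np.max([_shifted(img, dy, dx) for dy, dx in offsets], axis=0)

def grey_erode(img, offsets):
    return np.min([_shifted(img, dy, dx) for dy, dx in offsets], axis=0)

def clean_template(template):
    """One grey-scale closing then one opening, removing isolated speckle."""
    closed = grey_erode(grey_dilate(template, DISK3), DISK3)   # closing
    return grey_dilate(grey_erode(closed, DISK3), DISK3)       # opening
```

The opening removes bright structures smaller than the element (isolated speckle pixels), while the preceding closing fills pinholes inside genuinely periodic regions.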
Another walking sequence is Trio, in which three people walk and pass each other. The last one is an aerobics sequence called Jumping Jack. Walker and Trio were taken by a hand-held consumer-grade camcorder during a snow storm. Camera drift and the influence of the cameraman's breathing are visible in the sequences. Jumping Jack was taken by a fixed Betacam camera in an indoor setting. These examples are used to demonstrate: 1) the effectiveness of the new algorithm in the detection and segmentation of both single and multiple objects with periodic motions; 2) the robustness of the algorithm under noisy conditions; 3) the noise sensitivity of optical flow based estimation methods, which are used for trajectory detection in many existing works, and avoided by the method proposed here.


Figure 7: Left column: frames 40, 61, and 88 of the 156 frame sequence Trio, with frame size 320 by 240. Right column: frames 40, 61, and 88 masked by the periodicity templates of the three individuals.

Figure 8: (a) Averaged XT image of the Trio sequence after background removal. (b) Lines found in (a) by using the Hough transform method.

3.1 Trio

Trio is a 156 frame sequence of three people walking and passing each other. Frames 40, 61, and 88 of the sequence are shown in the left column of Figure 7 (see also http://[web-seq-3]).

As in the Walker example, a temporal median filter is used to extract the background. After the background is largely removed from the sequence, the averaged XT image is computed. The lines in the XT image are then detected via the Hough transform. Figure 8 shows the averaged XT image and the lines detected from the image. The detected lines provide the alignment indices of each object. Note that the alignment indices of all three objects are estimated simultaneously.

Next, as in the Walker example, the original sequence is aligned and cropped for each of the three moving individuals. All aligned sequences contain 64 frames. The aligned sequences then go through the process of power spectrum estimation and harmonic peak detection. Finally, the temporal harmonic energy ratio at each pixel location is computed to generate the periodicity templates. Figure 9 shows example frames of each aligned sequence and the periodicity templates. The templates are then used to mask the original sequence. Examples of masked frames are shown in the right column of Figure 7 (see also http://[web-seq-4]).

Notice that, besides the center person, there is a second or even a third person passing through in all three aligned sequences. However, their appearance has no effect on the result of periodic motion detection and segmentation.
This is because, to any individual temporal line, these passers-by are a one-time event and do not contribute to the temporal harmonic energy of the temporal line. The Trio example demonstrates that the proposed algorithm is well suited for the detection and segmentation of multiple periodic motions, even under circumstances of temporary object occlusion.

3.2 Jumping Jack

In the Jumping Jack sequence, there is no translatory motion involved and most of the background is very smooth. We use this sequence and noisy versions of it to demonstrate the robustness of the new algorithm under noisy conditions, and also to show the sensitivity of optical flow based motion estimation to noise. Three different kinds of input are given to the system: the original sequence and the sequences corrupted by additive Gaussian white noise (AGWN) with variance 100 and 400. The length of the sequences used in power spectrum estimation is increased to 128 due to the cycle of the jumping motion. All the data related to the Jumping Jack sequence are shown in Figure 10 (see also http://[web-seq-5] to [web-seq-10]).

The second row in Figure 10 shows the 57th TY (not YT!) image of each data cube. They show the tracks left by the right hand and leg. The rows in these images are temporal lines, and the corresponding power spectra are shown in the third row of the figure. The periodicity templates can be found in the fourth row of the figure. Although the noise does cause some degradation in the arm region, other areas of the templates are well preserved.

The reason why the proposed algorithm is not affected by a certain amount of white noise in the input is that white noise contributes only to the continuous component of the power spectrum.
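This robustness can be checked numerically on a toy signal. The sketch below is ours, with assumptions ours: a square-wave temporal line with a ~50 gray-level swing and a known fundamental bin, so the harmonic energy can be read off directly without peak detection.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
t = np.arange(n)
clean = 50.0 * np.sign(np.sin(2 * np.pi * 8 * t / n))  # ~50 gray-level swing

def harmonic_fraction(x, fund=8, n_harm=4, halfwidth=1):
    """Energy fraction near the first few multiples of a known fundamental."""
    p = np.abs(np.fft.rfft(x - x.mean())) ** 2
    harm = sum(p[k * fund - halfwidth:k * fund + halfwidth + 1].sum()
               for k in range(1, n_harm + 1))
    return harm / p.sum()

# AGWN raises the flat continuous floor but leaves the harmonic peaks in
# place, so the measure degrades only gracefully as the variance grows.
fractions = {var: harmonic_fraction(clean + rng.normal(scale=np.sqrt(var),
                                                       size=n))
             for var in (0, 100, 400)}
```

The noise energy spreads over all frequency bins, while the signal energy stays concentrated in a handful of harmonic bins, so the ratio stays high until the noise floor approaches the peak heights.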
As long as the noise energy is not so high that it overwhelms the spectral harmonic peaks, the algorithm works.

Most of the related works use flow based methods to extract spatiotemporal surfaces or curves to locate the moving objects in a sequence, but we would like to demonstrate here that the noise sensitivity of the flow based methods can be a drawback in real applications. The optical flow magnitudes are estimated here by using the hierarchical least-squares algorithm [9], which is based on a gradient approach described in [10][11]. Two pyramids are built, one for each of the two consecutive frames, and motion parameters are progressively refined by residual motion estimation from coarse images to the higher resolution images. This algorithm is representative of existing optical flow estimation techniques.

Figure 9: Example frames of the aligned sequences and the corresponding periodicity templates for each individual of the Trio sequence. Two left columns: example frames. Right column: periodicity templates.

The last row of images in Figure 10 shows the optical flow magnitudes of frame 61 in the sequences. When given a clean input such as the original Jumping Jack sequence, the flow magnitudes can be used to segment out the moving object. However, under the noisy conditions, the algorithm is mostly ineffective.

3.3 Walker

The detection and segmentation results of the Walker sequence were mostly shown in the previous section. Here we show the results from noisy inputs. Additive Gaussian white noise of variance 100 and 400 was used to corrupt the original sequence. The length of the aligned sequences for power spectrum estimation is 64. The resulting periodicity templates in Figure 11 clearly show that the proposed algorithm is robust in the presence of noise.

Figure 10: Data of the 128 frame Jumping Jack sequence with frame size 155 by 170. Left column: original sequence. Middle column: corrupted by AGWN with variance 100. Right column: corrupted by AGWN with variance 400. Row 1: frame 61 of the Jumping Jack sequence. Row 2: TY slice 57, showing the tracks left by the right hand and leg; each row of these images is a temporal line. Row 3: temporal line power spectra of TY slice 57. Row 4: periodicity templates. Row 5: optical flow magnitudes of frame 61.


Figure 11: Periodicity templates of the Walker sequence: (a) from the original sequence; (b) with AGWN of variance 100; (c) with AGWN of variance 400. The proposed algorithm is robust in the presence of noise.

4 Discussion

4.1 Algorithm

Compared to the one used in [4], the periodicity measure proposed here, in the form of the temporal harmonic energy ratio, is a more accurate and reliable measure of signal periodicity. It can not only indicate the presence of periodic behavior, but also give a quantitative measure of how much energy at each pixel location is contributed by the periodicity.

The use of the periodicity template allows the detection and segmentation of periodic motion to be accomplished simultaneously. Since the fundamental frequencies of the temporal lines are extracted in the process of computing the templates, they can be registered to the templates as well. Using this information, areas involved in periodic motions with different cycles can be easily distinguished. This is useful in situations such as a person walking in front of a picket fence. When the sequence is aligned to the person, the fence will be moving in the background and leave a periodic signature on some temporal lines. Consequently, the fence area will light up in the periodicity template. However, the fundamental frequency of the moving fence is usually quite different from that of the walker.

The proposed algorithm can also be considered as a periodic motion filter. At first, all moving objects are targets for foveating, but then only the ones exhibiting periodic motions remain. Consider an input of a street with cars and pedestrians: the system would filter out the cars and keep the pedestrians.

Existing related work often uses flow based methods to extract the trajectories of moving objects. Since flow based methods can be susceptible to noise, we instead use foveating by frame alignment to focus on individual objects.
This approach not only helps to improve the performance of the system in the presence of noise, but is also efficient in that it generates the alignment parameters of all moving objects at once.

The method introduced here is not computationally taxing. The most machine-intensive part of the algorithm is the 1-D fast Fourier transform used in power spectrum computation. However, when the motion cycle is reasonably short, such as walking at normal speed, a sequence length of 64 suffices. Cropping of aligned sequences also helps to speed up the processing.

In the current work, we put restrictive conditions on the data. The steady background condition in the translatory moving object case is for the background subtraction. The system actually tolerates small camera movement quite well. When an object is not moving in a translatory manner with respect to the camera, its trajectory will not be linear in the data cube, and a scheme more sophisticated than Hough line detection will have to be used for the frame alignment. If the object is not moving frontoparallel to the camera, the perspective effect will change the size of the object in the sequence. However, this change should not be large during a period of 64 frames when the distance between the camera and the object is sufficiently large compared to the size of the object, and in many practical situations this is the case. When the data is reasonably clean, flow based tracking and scaling methods such as the one in [3] can also be used for frame alignment.

4.2 Applications

Among other possible applications, the proposed algorithm can be applied readily to motion classification and recognition. In [5], the shape of the active region in a sequence was used for activity recognition.
In [3], the sums of the flow magnitudes in tessellated frame areas of periodic motion were used as feature vectors for motion classification. The periodicity templates produced by the algorithm introduced here can provide both distinct shapes of regions of periodic motion, such as the wedge for the walking motion and the snow angel for the jumping jack, and an accurate pixel-level description of a periodic action in the form of the temporal harmonic energy ratio and motion fundamental frequencies.

The characterization of periodic motion is also important to video database related applications. The presence, position, extent, and frequency information of a periodic motion can be used for video representation and retrieval.

5 Summary

We have presented a new method for detecting and segmenting multiple periodic motions in image sequences. The method consists of two main parts: 1) frame alignment to foveate on individual moving objects; 2) Fourier spectral harmonic peak detection and energy computation to identify regions of periodic motion. This method detects and segments a periodic motion simultaneously and is not computationally taxing. We illustrated the technique using real-world examples and demonstrated its robustness to noise.

A new periodicity measure in the form of the temporal harmonic energy ratio is proposed. Compared to the periodicity measure used in other work, this is a more accurate quantitative measure of how much energy at each pixel location is contributed by the signal periodicity.

The periodicity templates contain information on the segmented region as well as the associated energy distribution and characteristic cyclic frequencies. The templates can be used as a means of periodic motion representation.


Acknowledgments

The authors would like to thank John Wang for the hierarchical optical flow estimation program, Aaron Bobick for insightful discussions, and Jim Davis for the Jumping Jack sequence.

References

[1] D.D. Hoffman and B.E. Flinchbaugh. The interpretation of biological motion. Biological Cybernetics, pages 195-204, 1982.

[2] M. Allmen and C.R. Dyer. Cyclic motion detection using spatiotemporal surfaces and curves. In Proc. Int. Conf. on Pattern Recognition, pages 365-370, 1990.

[3] R. Polana and R. Nelson. Low level recognition of human motion. In IEEE Workshop on Motion of Non-rigid and Articulated Objects, Austin, TX, 1994.

[4] R. Polana and R.C. Nelson. Detecting activities. In Proceedings CVPR '93, pages 2-7, New York City, NY, June 1993. IEEE Computer Society Press.

[5] A.F. Bobick and J.W. Davis. Real-time recognition of activity using temporal templates. In Workshop on Applications of Computer Vision, Sarasota, Florida, December 1996.

[6] H. Wold. A Study in the Analysis of Stationary Time Series. Almqvist & Wiksell, Stockholm, 1954.

[7] S.A. Niyogi and E.H. Adelson. Analyzing gait with spatiotemporal surfaces. In IEEE Workshop on Motion of Non-rigid and Articulated Objects, pages 64-69, Austin, Texas, Nov. 11-12, 1994.

[8] F. Liu and R.W. Picard. Periodicity, directionality, and randomness: Wold features for image modeling and retrieval. IEEE Trans. Pattern Analysis and Machine Intelligence, 18(7):722-733, July 1996.

[9] John Y.A. Wang. Layered Image Representation: Identification of Coherent Components in Image Sequences. PhD thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, September 1996.

[10] J. Bergen et al. Hierarchical model-based motion estimation. In Proc. Second European Conf. on Computer Vision, pages 237-252, 1992.

[11] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. Image Understanding Workshop, pages 121-131, 1981.
