
Background Subtraction Using Ensembles of Classifiers with an Extended Feature Set

by

Brendan F. Klare

A thesis submitted in partial fulfillment
of the requirements for the degree of
Master of Science in Computer Science
Department of Computer Science and Engineering
College of Engineering
University of South Florida

Major Professor: Sudeep Sarkar, Ph.D.
Lawrence O. Hall, Ph.D.
Dmitry B. Goldgof, Ph.D.

Date of Approval:
June 30, 2008

Keywords: tracking, classification, segmentation, fusion, illumination invariant

© Copyright 2008, Brendan F. Klare


DEDICATION

My thesis is dedicated to Fred Klare, Christina Hindman, and the 75th Ranger Regiment. These have been the largest influences and offered the most support in my life. Without them I would not be where I am today.

RLTW!


LIST OF FIGURES

Figure 1.1   Surveillance tracking system flowchart                              1
Figure 1.2   Background classification using only RGB features                   3
Figure 1.3   Background classification using ensemble classifier                 5
Figure 3.1   Set of Haar features being used                                    22
Figure 3.2   Illustration of equation to compute Haar value                     22
Figure 3.3   Effects of varying illumination on different features              24
Figure 4.1   High level view of ensemble algorithm                              26
Figure 5.1   Sample frames from OTCBVS data set                                 32
Figure 5.2   Sample frames from PETS 2001 data set                              33
Figure 5.3   Sample frames from PETS 2006 data set                              33
Figure 5.4   Example of thresholding a classifier hypothesis                    35
Figure 5.5   Overall results on the OTCBVS dataset                              38
Figure 5.6   Results for each individual feature on the OTCBVS dataset          38
Figure 5.7   Frame by frame false positive results on OTCBVS dataset            39
Figure 5.8   RGB classification image of an OTCBVS frame                        39
Figure 5.9   Ensemble classification image of an OTCBVS frame                   40
Figure 5.10  Overall results on the PETS 2001 dataset                           42
Figure 5.11  Results for each individual feature on the PETS 2001 dataset       42
Figure 5.12  Frame by frame false positive results on PETS 2001 dataset         43
Figure 5.13  RGB classification image of a PETS 2001 frame                      44
Figure 5.14  Ensemble classification image of a PETS 2001 frame                 45
Figure 5.15  Overall results on the PETS 2006 dataset                           46
Figure 5.16  Results for each individual feature on the PETS 2006 dataset       47


Figure 5.17  Frame by frame false positive results on PETS 2006 dataset         47
Figure 5.18  Weak hypotheses fused into a single strong hypothesis              49
Figure 5.19  Effect of object size on gradient magnitude                        50
Figure 5.20  Classification using ensemble and gradient magnitude               51
Figure 6.1   False positive results at 90% true positive rate                   53
Figure 6.2   OTCBVS ROC graph summary                                           53
Figure 6.3   PETS 2001 ROC graph summary                                        54


Background Subtraction Using Ensembles of Classifiers with an Extended Feature Set

Brendan F. Klare

ABSTRACT

The limitations of foreground segmentation in difficult environments using standard color space features often result in poor performance during autonomous tracking. This work presents a new approach to classifying foreground and background pixels in image sequences by employing an ensemble of classifiers, each operating on a different feature type: the three RGB features, gradient magnitude and orientation features, and eight Haar features. These thirteen features are used in an ensemble classifier where each classifier operates on a single image feature. Each classifier implements a Mixture of Gaussians-based unsupervised background classification algorithm. The non-thresholded classification decision scores of the classifiers are fused by taking the average of their outputs, creating one single hypothesis. The results of using the ensemble classifier on three separate and distinct data sets are compared to using only RGB features through ROC graphs. The extended feature vector outperforms the RGB features on all three data sets, and shows a large scale improvement on two of the three data sets. The two data sets with the greatest improvements are both outdoor data sets; one has global illumination changes and the other has many local illumination changes. When using the entire feature set to operate at a 90% true positive rate, the per pixel false alarm rate is reduced five times in one data set and six times in the other.


Foreground segmentation is a binary classification problem because a pixel either does or does not belong to the background. This process is synonymously referred to as background classification and background subtraction.

Simple background subtraction thresholds the difference between a static model image and the current frame and classifies the resulting pixels as foreground. Sophisticated background subtraction involves the unsupervised learning of a background model based on previous image history. This learning can occur at different rates, which determines how quickly the model adapts. The learning rate is generally a function of the frame rate and domain knowledge.

Background classification is a critical step in surveillance tracking. A tracking system can only be as reliable as the information it is provided, and if the foreground segmentation is performed poorly then the high level tracker will be processing erroneous information.

Many issues plague background classifiers. A learning rate must walk the fine line between adapting to objects that become a member of the background (such as a parked car) and recognizing a slow moving object (such as a car stopped at a traffic light). The example of handling both a parked car and a waiting car may be mitigated by maintaining a less adaptive rate and relying on the high level tracker to distinguish between the two. Other situations may not be ignored as easily. Background classification algorithms are generally responsible for handling dynamic backgrounds. Dynamic backgrounds represent regions in an image sequence that occasionally undergo change, but not as a result of any interesting foreground object. Examples include swaying trees, rippling water, and other regions that occasionally exhibit multimodal properties.

Arguably the most difficult dynamic background event to process is varying illumination. Because the brightness of a point on a surface is directly related to the illumination it receives [1], a change in illumination will cause the intensity of a pixel to vary as well. With RGB and gray intensity levels, this intensity change will appear to be a foreground object when in fact it is the same background object under different illumination.

Illumination changes happen either locally or globally. A global illumination change results in every pixel in the image undergoing the same change in illumination. This is common in indoor scenes when a light is switched, or during dusk and dawn in outdoor scenes. Because the entire scene is undergoing the change, global illumination changes are often easy to detect. Local illumination changes occur when only a specific region of an image undergoes an illumination change. This commonly occurs with shadows and with directed light sources, such as a flashlight.


Figure 1.2. Background classification using only RGB features: (a) simple frame, (b) simple frame classification, (c) difficult frame, (d) difficult frame classification

Local illumination changes are more difficult to handle because if the region is recognized as foreground then it is likely to be processed by a high level tracker, since its shape can be similar to that of real world objects.

Illumination changes may also be gradual or sharp. Gradual illumination changes can be handled by adaptive background modeling algorithms with few issues. Sharp illumination changes will generally cause a classifier to fail for a period of time until it is able to adapt to the change. If the sharp changes occur at a high enough frequency then multimodal modeling algorithms may be able to overcome this occurrence.

Figure 1.2 shows classification on a frame with low dynamic properties, as well as a frame from the same scene that has undergone an illumination change. After background subtraction, many false positive pixels are present in the frame with the varying illumination. While the performance in the simple frame is acceptable, it is not for the difficult frame.


Difficulty with varying illumination conditions in background classification is one of the motivating issues for this work. Most background classification algorithms either ignore the problem or apply a specific heuristic to a particular domain. It is the intent of this work to examine whether training classifiers on different image features will allow a meta-classifier to be more robust to difficult scenarios such as illumination changes and dynamic backgrounds, while still handling the simple background classification scenarios. Heightened performance is desired in general classification as well.

In this work a novel solution to background classification is presented where multiple background classifiers are used. These classifiers operate on features that include the standard RGB intensities, gradient orientation, gradient magnitude, and eight separate Haar features. The results generated support the claim that this is a superior method of background classification if computational demands are not considered. The approach described throughout the rest of the thesis offers a promising new direction for background classification in difficult environments.

Figure 1.3 shows classification using the multiple classifier algorithm presented in this work on the same two frames that were classified using RGB features in Figure 1.2. In the difficult frame that is undergoing an illumination change, background subtraction yields much better classification results. Performance in both frames is acceptable, which was not the case using only RGB features.

The remainder of the thesis is organized as follows. In Chapter 2 works related to this thesis are discussed: Section 2.1 discusses the early approaches to background classification, Section 2.2 discusses recent methods of background classification, Section 2.3 explains work that focuses on illumination issues in background modeling, and Section 2.4 discusses ensemble methods. In Chapter 3 the features that the separate classifiers use are described: Section 3.1 overviews the use of gradient features, Section 3.2 describes the Haar features, and Section 3.3 discusses the use of alternate color spaces in background classification. Chapter 4 details the algorithm used for background subtraction in this work: Section 4.1 describes the changes made to the Mixture of Gaussians algorithm, and Section 4.2 discusses the method used for fusing the ensemble of classifiers. Chapter 5 contains the results from using the ensemble algorithm: Section 5.1 describes the methodology for generating the comparative results, Section 5.2 discusses the data sets used, and Section 5.3 discusses the parameter space of the algorithm.


Figure 1.3. Background classification using ensemble classifier: (a) simple frame, (b) simple frame classification, (c) difficult frame, (d) difficult frame classification


In Section 5.4 comprehensive results using ROC graphs are provided. Chapter 6 concludes this work: Section 6.1 contains a summary of the work, Section 6.2 discusses future work, and Section 6.3 provides final thoughts.


CHAPTER 2
RELATED WORKS

2.1 Early Work in Background Modeling

Because of the computational demands that dynamic background modeling incurs, progress in the field has mainly come over the past decade, paralleling the explosion of faster computers. Prior to the current robust models, simple background estimation was performed where the background was assumed to be static. In these schemes an algorithm similar to the one in Equation 2.1 was used, where B(p) is the estimated background value at pixel p, n is the number of frames used to build the background model, and I_t(p) is the value of pixel p at time t.

B(p) = \frac{1}{n} \sum_{t=1}^{n} I_t(p)   (2.1)

For a future frame t, pixel p is then predicted to be foreground (FG) or background (BG) based on Equation 2.2, where \tau is the threshold (typically set around 50).

\text{if } |I_t(p) - B(p)| \geq \tau \text{ then FG else BG}   (2.2)

The largest advantage of this technique is that it is fast and has very low memory requirements, because the images may be deleted after the average is calculated. The major problem with such an algorithm is that it does not adapt to its environment, and it does not have the ability to detect dynamic background regions. The failure to adapt is eliminated if a running average is taken instead. The running average uses a learning rate \alpha (where 0 < \alpha < 1), and updates the background model with a new frame I_{new} using Equation 2.3:

B_{new}(p) = \alpha \cdot I_{new}(p) + (1 - \alpha) \cdot B_{old}(p)   (2.3)
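As an illustration, the sketch below implements this running-average model together with the threshold test of Equation 2.2. It is a minimal example using the typical parameter values mentioned above (a threshold near 50 and a small learning rate), not code taken from the thesis.

```python
import numpy as np

def running_average_subtraction(frames, alpha=0.05, tau=50.0):
    """Adaptive background subtraction with a running average (Eqs. 2.2-2.3).

    frames : iterable of 2-D grayscale arrays; the first frame seeds the model B(p).
    Yields a boolean foreground mask for every subsequent frame.
    """
    frames = iter(frames)
    background = np.asarray(next(frames), dtype=float)          # initial model B(p)
    for frame in frames:
        frame = np.asarray(frame, dtype=float)
        yield np.abs(frame - background) >= tau                  # Eq. 2.2: threshold test
        background = alpha * frame + (1 - alpha) * background    # Eq. 2.3: running average
```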


Updating the background image via Equation 2.3 offers a solution to not having an adaptive background. However, a major problem still exists because the algorithm relies heavily on \tau. A poor selection of \tau will result in either many false positives or many false negatives. One solution that eliminates the use of the rather arbitrary threshold \tau is to model each pixel with a Gaussian distribution instead of merely the mean or running average, as is demonstrated in [2, 3]. Because a pixel is modeled as a Gaussian distribution, foreground detection is based on comparing a frame's current pixel value against the variance.

\text{if } |I_t(p) - B(p)| \geq \sigma \cdot k \text{ then FG else BG}   (2.4)

In Equation 2.4, \tau = \sigma \cdot k, where k is some constant, typically around 2. Because the threshold is directly related to the variance, the equation is able to adapt to its environment.

In [4] (one of the most influential papers in background modeling), Stauffer et al. first proposed modeling the background as a combination of multiple Gaussian distributions. This algorithm is referred to as the Mixture of Gaussians algorithm. Each pixel p is modeled with a group of K Gaussian distributions for each of the red and green color components of p (it is assumed that the blue color component is ignored due to its poor reception in human vision), where K is a heuristic value generally set between 3 and 5. The algorithm is first initialized, where a series of image frames is used to train each pixel by clustering the pixel's observed training values into K sets using simple K-means clustering [5]. For each set k \in K, the mean (\mu_k) and variance (\sigma_k^2) are computed to parameterize the corresponding Gaussian distribution. Because there are multiple distributions for a single pixel, each distribution k is initially weighted such that w(k) = |k| / |K|. There is a future constraint that

\sum_{k=1}^{K} w(k) = 1

which holds initially as well.

When a new image is processed for segmentation, a pixel is considered to match a particular distribution if the pixel's value is within 2.5 standard deviations, where 2.5 is a heuristic that may change based on a particular domain.


So with K Gaussian distributions, and a pixel history for some pixel p = I_t(x, y) at time t being \{X_1, \ldots, X_t\}, the probability of observing the pixel X_t is:

P(X_t) = \sum_{k=1}^{K} w_t(k) \cdot \eta(X_t, \mu_{k,t}, \Sigma_{k,t})   (2.5)

where \eta is the Gaussian density function shown in Equation 2.6.

\eta(X_t, \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} e^{-\frac{1}{2}(X_t - \mu_t)^T \Sigma^{-1} (X_t - \mu_t)}   (2.6)

As incoming images are processed, each matched distribution is updated based on the pixel's value, a learning rate \alpha, and the probability that the new pixel belongs to the distribution, as seen in Equations 2.7, 2.8, and 2.9.

\mu_t = (1 - \rho) \cdot \mu_{t-1} + \rho \cdot X_t   (2.7)

\sigma_t^2 = (1 - \rho) \cdot \sigma_{t-1}^2 + \rho \cdot (X_t - \mu_t)^T (X_t - \mu_t)   (2.8)

\rho = \alpha \cdot \eta(X_t \mid \mu_k, \sigma_k)   (2.9)

The mean and variance of unmatched distributions remain the same. The weights of every distribution are updated based on Equation 2.10, where M_t(k) is 1 if distribution k is matched at time t, and 0 otherwise.

w_t(k) = (1 - \alpha) \cdot w_{t-1}(k) + \alpha \cdot M_t(k)   (2.10)

In order to determine if the pixel is a member of the background or foreground, the sum of the weights of the matched distributions, \beta, is calculated. If this value is greater than the threshold T, then the pixel is classified as a member of the background; otherwise it is classified as foreground. The value for T is typically around 0.2.

The novelty of the approach in [4] at the time it was presented greatly affected the future progress of background modeling. By using a mixture of distributions, the algorithm can handle dynamic background events as well as gradual illumination changes. Previous algorithms failed to make this guarantee.
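The update cycle just described (Equations 2.5-2.10) can be sketched as follows for a single scalar feature per pixel. The parameter values are the heuristic ones quoted above, and the handling of an unmatched pixel (replacing the weakest distribution) is one common choice rather than a detail taken from [4].

```python
import numpy as np

class MixtureOfGaussiansPixel:
    """Minimal per-pixel Mixture of Gaussians model for one scalar feature (Eqs. 2.5-2.10)."""

    def __init__(self, init_value, alpha=0.01, k_max=5, match_sigma=2.5, bg_threshold=0.2):
        self.alpha = alpha                    # learning rate
        self.k_max = k_max                    # maximum number of distributions K
        self.match_sigma = match_sigma        # match if within 2.5 standard deviations
        self.bg_threshold = bg_threshold      # T: background if matched weight sum exceeds it
        self.means = [float(init_value)]
        self.vars = [10.0 ** 2]
        self.weights = [1.0]

    def classify_and_update(self, x):
        matched = [abs(x - m) <= self.match_sigma * np.sqrt(v)
                   for m, v in zip(self.means, self.vars)]
        # beta: total weight of the matched distributions; background if beta > T
        beta = sum(w for w, hit in zip(self.weights, matched) if hit)
        is_background = beta > self.bg_threshold

        if any(matched):
            for i, hit in enumerate(matched):
                # Eq. 2.10: weight update for every distribution
                self.weights[i] = (1 - self.alpha) * self.weights[i] + self.alpha * (1.0 if hit else 0.0)
                if hit:
                    # Eq. 2.9: rho, then Eqs. 2.7-2.8: mean and variance updates
                    rho = self.alpha * self._density(x, self.means[i], self.vars[i])
                    self.means[i] = (1 - rho) * self.means[i] + rho * x
                    self.vars[i] = (1 - rho) * self.vars[i] + rho * (x - self.means[i]) ** 2
        else:
            # no match: replace the weakest distribution, or add one if room remains
            if len(self.means) >= self.k_max:
                i = int(np.argmin(self.weights))
                self.means[i], self.vars[i], self.weights[i] = float(x), 10.0 ** 2, self.alpha
            else:
                self.means.append(float(x)); self.vars.append(10.0 ** 2); self.weights.append(self.alpha)

        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]   # keep the weights summing to 1
        return is_background

    @staticmethod
    def _density(x, mean, var):
        return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)
```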


The parameter space used in Mixture of Gaussians background modeling is characterized and evaluated by Atev et al. in [6]. The effects of setting \alpha and \beta, of using various covariance representations, and of using various color spaces were explored in order to understand the effects of altering the parameters.

In [7], Gordon et al. used a Mixture of Gaussians along with depth information from a stereo vision system. Adding the depth component demonstrated more effective results in regions with many foreground objects, but the use of this algorithm is contingent on the implementation of a stereo vision surveillance system. Similarly, in [8], depth information generated from a stereo vision system is used with a Mixture of Gaussians model by Harville et al. In this method the YUV color space is used in order to make the algorithm more illumination invariant. Also, the learning rates for each pixel are dynamic in order to allow pixels to adapt at different rates based on the unique characteristics that the pixel observes.

First proposed by Karmann et al. in [9] and later by Ridder et al. in [10], using a Kalman filter [11] to model the background is another popular method of performing background classification. A Kalman filter is a recursive estimator that makes a prediction of a future state of a variable based on previous state information and noise estimation. When used in background estimation, each image pixel is modeled with a Kalman filter. A key advantage of a Kalman filter is its ability to handle slow illumination changes. Because it recursively updates itself and accounts for noise in the estimations, slow illumination changes are seamlessly incorporated into the filter. The failing of a Kalman filter approach to FG/BG segmentation is its inability to handle sharp illumination changes. The problem is that in order to handle sharp illumination changes one must increase the gain on the filter, because increasing the gain allows for a more rapid update. According to [10], when the gain is increased it is not possible to stop foreground objects from being rapidly adapted to by the filter and becoming modeled as background. So the Kalman filter alone has no ability to distinguish between sharp background changes and foreground objects.

Based on the equations used in [9] and [10], the procedure for using a Kalman filter in background modeling is as follows. A pixel p is classified as foreground if |I_t(p) - \hat{s}_t(p)| > \tau, where \tau is the threshold and \hat{s}_t is the Kalman filter prediction at time t. \hat{s}_t is generated using Equations 2.11, 2.12, and 2.13. Both \alpha and \beta are learning rates, where \alpha < \beta in order to update the filter less when a foreground outlier is present.

\hat{s}_t(p) = \hat{s}_{t-1}(p) + K(p, t) \cdot (I_t(p) - \hat{s}_{t-1}(p))   (2.11)

K(p, t) = \alpha \cdot m_{t-1}(p) + \beta \cdot (1 - m_{t-1}(p))   (2.12)

\text{if } I_t(p) = \text{FG then } m_t(p) = 1 \text{ else } m_t(p) = 0   (2.13)
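A per-pixel sketch of this scheme (Equations 2.11-2.13) follows; the threshold and the two learning rates are illustrative values, not the ones used in [9] or [10].

```python
import numpy as np

def kalman_background_update(frame, estimate, fg_mask_prev, tau=30.0, alpha=0.05, beta=0.3):
    """One step of simple Kalman-style background estimation (Eqs. 2.11-2.13).

    frame        : current grayscale image I_t as a float array
    estimate     : previous background estimate s_{t-1}
    fg_mask_prev : m_{t-1}, 1 where the pixel was foreground at t-1, else 0
    Returns the new foreground mask and the updated background estimate.
    """
    # Eq. 2.12: per-pixel gain; foreground pixels (m = 1) use the smaller rate alpha
    gain = alpha * fg_mask_prev + beta * (1.0 - fg_mask_prev)
    # Eq. 2.11: recursive update of the background estimate
    new_estimate = estimate + gain * (frame - estimate)
    # classify against the updated estimate and record m_t for the next step (Eq. 2.13)
    fg_mask = (np.abs(frame - new_estimate) > tau).astype(np.float32)
    return fg_mask, new_estimate
```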


When only a small number of images is available for background modeling, the use of a frame differencing algorithm is common [12]. In frame differencing, the difference between two adjacent frames is used to determine the presence of foreground, as seen in Equation 2.14. Again, the advantage of this algorithm is that it is able to operate on a limited amount of data. Its shortcomings lie in the fact that the size of the detected foreground regions will be too large, large holes may exist inside of objects, and dynamic background detection is difficult.

\text{if } |I_t(p) - I_{t-1}(p)| \geq \tau \text{ then FG else BG}   (2.14)

Toyama et al. describe the Wallflower framework in [13], which performs background subtraction in three separate phases: at the pixel level, region level, and frame level. The pixel level uses a Wiener filter to predict the future intensity value of the pixel; if the prediction differs beyond a threshold then the pixel is considered a member of the foreground. This approach is similar to using a Kalman filter for background classification. The region level incorporates spatial information from neighboring pixels into each pixel's classification. The frame level is used to incorporate a multimodal property into the framework: multiple pixel models are used to handle the different modes each pixel has observed. These modes are defined using K-means clustering over a training period.

2.2 Modern Techniques in Background Classification

In [14], Zhang et al. build on the work done by Stauffer in [4] and the work by Hou and Han in [15]. They use the same K-means clustering of adaptive Gaussians to model the background. Their technique varies in that the initialization of the Gaussians is performed based on the assumption that a pixel's background value will always be more frequent than its foreground value. Using this assumption the Gaussians are built around an initialization image set.


As each frame is processed, a pixel's value is used to update its Mixture of Gaussians based on its similarity to each Gaussian.

In [16], Shimada et al. propose an improvement to the Mixture of Gaussians framework where the number of distributions may increase or decrease throughout tracking. The algorithm is highly similar to the one by Stauffer and Grimson in [4]; the key difference is the addition of steps that decide whether or not to add or remove a Gaussian. A new distribution is added if no current distribution matches the current pixel value. A distribution is removed when its weight decreases below a heuristic threshold. Also, two distributions will be combined if the difference of their means is below a certain threshold. One of the most significant benefits demonstrated by this technique is improved runtime. A direct relation was shown between the computation time and the number of Gaussians. By modulating the number of distributions an optimal number will always be used, ensuring a minimal amount of computation.

A specialized Kalman filter is used by Gao et al. in [17] to model the background. The background is modeled based on small regions instead of a pixel based model. This is based on the assumption that illumination changes and noise are identical for pixels within a region. The parameters of the Kalman filter are predicted by a recursive least squares (RLS) filter. The use of an RLS filter allows for proper parameterization of the Kalman filter in various illumination conditions.

Messelodi et al. propose a Kalman filter that is able to handle sharp, global illumination changes in [18]. Such a change is detected by performing a ratio comparison between each pixel and the background model's value. For each pixel p, the value I_t(p)/K(p) (where K(p) is the Kalman predicted value at pixel p) is placed in a histogram. If the peak of the histogram does not lie near 1 then a sharp illumination change is considered to have occurred. This approach was demonstrated to have highly satisfactory results in recognizing sharp illumination changes. The major drawback of this technique is that it does not successfully handle dynamic background regions, such as moving leaves.

In [19], Wang and Suter propose a background model using consensus methods, which essentially stacks separate background models that each use a largely disjoint set of parameters. In the paper these parameters are the separate color components of RGB, though any other pixel property could be used instead. A three phase procedure is used to generate the background model.


Segmenting images through the concept of image layers is a popular approach to binary segmentation, where the layers represent different planes or object groups in an image. Typically many layers will represent both the foreground and background. A foreground layer is generally considered a layer that has spatiotemporal change.

Criminisi et al. perform background classification using image layering in [22]. A probabilistic approach is used where motion, color, and contrast are combined with spatial and temporal priors. The use of motion cues is generally a limiting feature due to its computational demands; however, the algorithm presented is able to run in real-time because the actual pixel velocities are never computed. Instead a binary motion value is assigned to represent the presence or absence of motion.

In [23], Patwardhan et al. present an automated method of pixel layering. The algorithm has a long initialization step followed by a simpler detection process. T frames are used in initialization. The first frame is segmented using the following procedure. The maximum of the histogram of the gray level pixel intensities, h_max, is found, and all pixels whose gray levels are within h_max ± ρ are added to the initial layer, where ρ is derived from the image covariance matrix. A sampling of about 10-20% of the image is used to perform a Kernel Density Estimation (KDE), which generates a probability density function for the initial layer based on a pixel's feature vector (in this case RGB features are mapped to the space presented in [24], which is discussed further in Section 2.3). Based on the KDE probability, the image pixels are reassigned to the layer, which is called the refinement step. The sampling and KDE continue until the initial layer stabilizes. The previously extracted layer is suppressed from the initial histogram, and this process of layer extraction continues until a Kullback-Leibler divergence indicates that the most recently extracted layer was not meaningful. The rest of the images are processed using the previous layers and starting at the refinement step.

Once the image set is layered from the initialization, future frames are able to be processed. In order to incorporate spatial information, a w × w × M window is used for each pixel, where M is the number of previous frames being considered, and w is the number of pixels in the x and y directions considered for spatial information. For each layer that exists in this window, the KDE is used to generate the probability that pixel p belongs to the layer, as well as to the layer of outliers which comprises the foreground.


Depending on the layer classification, the pixel is then classified as foreground or background.

Optic flow [25] has been used for image segmentation as well. In [26], Zucchelli et al. cluster motion fields in order to recognize image planes. While these planes are not applied to foreground segmentation, the step is highly intuitive given the planes recovered. The use of motion in image segmentation has been gaining traction recently because of new hardware implementations that are able to calculate flow in real-time. This represents an exciting avenue for future research using this additional feature.

In [24], motion is used directly for background subtraction. Kernel Density Estimation [27] is used to model the distribution of the background using five dimensional feature vectors. Based on this distribution, classification is performed by thresholding the probability of an instance belonging to the background distribution. The feature vectors consist of three dimensions from color space intensities and two from optic flow measurements (the flow measurements and their uncertainties). This algorithm performed extremely well when compared to Mixture of Gaussians models.

2.3 Illumination Considerations

Many approaches have been used for illumination invariant background classification. One reason for poor FG/BG segmentation under varying illumination is that most tracking systems rely on color from the RGB color space, which is highly variant to illumination changes. Because grayscale is a linear transformation of RGB, the same problem exists there. A common solution is a non-linear mapping of RGB color to another color space that is less sensitive to illumination.

Many color conversions that claim to be illumination invariant have been proposed. Theoretically, the hue component of HSI color space and the luma component (Y) of YCbCr color space are illumination invariant; however, in practice this is not typically the case (see Section 3.3). In [24] Mittal and Paragios use a color mapping of RGB → rgI (where I = (R + G + B)/3, r = 3R/I, and g = 3G/I) due to claimed illumination invariance under certain conditions. Experimentation performed in this work using this color space failed to observe such illumination invariant conditions, though this is not to say they do not exist.


In [28], Gevers and Smeulders present the color space in Equation 2.15 as another illumination invariant color space. A further discussion of the observed failures using these alternate color spaces can be found in Section 3.3.

l_1 = \frac{(R - G)^2}{(R - G)^2 + (R - B)^2 + (G - B)^2}
l_2 = \frac{(R - B)^2}{(R - G)^2 + (R - B)^2 + (G - B)^2}   (2.15)
l_3 = \frac{(G - B)^2}{(R - G)^2 + (R - B)^2 + (G - B)^2}

Illumination changes often cause errors in background segmentation because when an illumination change is sharp the pixel intensity value will vary considerably. When these changes are not global and isolated in time, careful color space analysis is often used to prevent these illumination changes from being improperly classified. In [29], Horprasert et al. perform segmentation into four separate pixel classes: normal background, shaded background, highlighted background, and foreground. This is accomplished by statistically modeling each pixel based on its chromaticity \kappa (Equation 2.18) and brightness \alpha (Equation 2.17), where, over an observed initialization period, \mu_R(p) is the mean value of the red color channel at pixel p, \sigma_G(p) is the standard deviation of the green color channel over the period, and I_B(p) is the current value of pixel p in the blue channel.

\alpha_p = \operatorname{argmin}_{\alpha} \left[ \left( \frac{I_R(p) - \alpha \cdot \mu_R(p)}{\sigma_R(p)} \right)^2 + \left( \frac{I_G(p) - \alpha \cdot \mu_G(p)}{\sigma_G(p)} \right)^2 + \left( \frac{I_B(p) - \alpha \cdot \mu_B(p)}{\sigma_B(p)} \right)^2 \right]   (2.16)

\alpha_p = \frac{ \frac{I_R(p) \cdot \mu_R(p)}{\sigma_R^2(p)} + \frac{I_G(p) \cdot \mu_G(p)}{\sigma_G^2(p)} + \frac{I_B(p) \cdot \mu_B(p)}{\sigma_B^2(p)} }{ \left[ \frac{\mu_R(p)}{\sigma_R(p)} \right]^2 + \left[ \frac{\mu_G(p)}{\sigma_G(p)} \right]^2 + \left[ \frac{\mu_B(p)}{\sigma_B(p)} \right]^2 }   (2.17)

\kappa_p = \sqrt{ \left( \frac{I_R(p) - \alpha_p \cdot \mu_R(p)}{\sigma_R(p)} \right)^2 + \left( \frac{I_G(p) - \alpha_p \cdot \mu_G(p)}{\sigma_G(p)} \right)^2 + \left( \frac{I_B(p) - \alpha_p \cdot \mu_B(p)}{\sigma_B(p)} \right)^2 }   (2.18)

This model is best understood by considering pixel color values as a vector in a three dimensional space, where the red, green, and blue color components are the different dimensions. For any intensity value I(p), changing only the brightness of that color will result in a new intensity I'(p), where I'(p) = \alpha \cdot I(p). In other words, the new vector I'(p) is the same underlying color as I(p); only its intensity has changed based on the radiance of the pixel under the illumination condition. If the chromaticity (color) of the pixel has changed to I''(p), however, then \kappa represents the shortest Euclidean distance from the value I''(p) to the line that represents the vector I(p). This means the chromaticity distance is based only on the color of the pixel and not the illumination.
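The brightness and chromaticity distortions of Equations 2.16-2.18 can be sketched as below; the classification thresholds are illustrative assumptions, since [29] selects its thresholds automatically from histograms.

```python
import numpy as np

def brightness_chromaticity(pixel, mean, std):
    """Compute the brightness distortion alpha and chromaticity distortion kappa (Eqs. 2.16-2.18).

    pixel, mean, std : length-3 arrays (R, G, B) holding the current value and the
                       per-channel statistics gathered during initialization.
    """
    pixel, mean, std = (np.asarray(a, dtype=float) for a in (pixel, mean, std))
    # Eq. 2.17: closed-form minimizer of the normalized distance to the chromaticity line
    alpha = np.sum(pixel * mean / std ** 2) / np.sum((mean / std) ** 2)
    # Eq. 2.18: distance from the observed color to the scaled expected color
    kappa = np.sqrt(np.sum(((pixel - alpha * mean) / std) ** 2))
    return alpha, kappa

def classify_pixel(pixel, mean, std, kappa_thresh=6.0, dark=0.6, bright=1.4):
    """Label a pixel as foreground, shaded, highlighted, or normal background."""
    alpha, kappa = brightness_chromaticity(pixel, mean, std)
    if kappa > kappa_thresh:
        return "foreground"              # the color itself has changed significantly
    if alpha < dark:
        return "shaded background"
    if alpha > bright:
        return "highlighted background"
    return "normal background"
```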


When a new pixel is processed by the algorithm from [29], the distance \kappa and scale \alpha are computed based on the expected value of the pixel from the initialization period. When \kappa surpasses a threshold the pixel is classified as foreground, on the grounds that the color has significantly changed. Otherwise the pixel is labeled as normal background, shaded background, or illuminated background based on \alpha. This algorithm does not update the expected values generated over the training period; that change could easily be applied by updating the mean and variance of the pixels using learning rates similar to the methods in [4]. The method presented also performs an automatic threshold selection based on histogram analysis, making the method non-parametric.

In [30], Xu and Ellis use Mixture of Gaussians background modeling after mapping the RGB color components into a more illumination invariant color space. Denoting an RGB pixel as p_{RGB} = \langle p_R, p_G, p_B \rangle, a mapping is performed such that p_{RGB} \rightarrow p_{rgb}, where p_{rgb} = \langle p_r, p_g, p_b \rangle and p_i = p_I / \sqrt{p_R^2 + p_G^2 + p_B^2} for i \in \{r, g, b\} and I \in \{R, G, B\}. Using this color space resulted in a claimed higher performance when classifying background pixels in image sequences with varied illumination.

One promising technique for background modeling with illumination invariance is using image gradient information as features for background classification. One of the earliest applications of gradient features for background subtraction was by Jabri et al. in [31]. In this work results are shown for tracking on indoor image sequences. The background is simply modeled using the mean and variance of the feature set (RGB and gradient magnitude).

In [32], Javed et al. use a five dimensional feature set containing RGB, gradient magnitude, and gradient direction for background subtraction. The RGB features are processed using a Mixture of Gaussians algorithm, and the hypothesis is then augmented with information taken from the image gradients. Results are shown to improve in an indoor sequence with varied illumination.


CHAPTER 3
FEATURES

Standard bitmaps store a three dimensional vector of features per pixel: red, green, and blue intensity values. In this work the number of features per pixel will be increased through nonlinear transformations in order to provide separate feature biases of the scenes being observed. A summary of the image features used in this work can be found in Table 3.1. Using these features gives a total of 13 features per pixel. Each of these features will be used by a separate classifier that only has knowledge of its respective feature.

3.1 Gradient Features

Gradient features are used to detect edges and peaks in intensity changes in images. For the background classifiers, each pixel in the image frame will be characterized by the non-thresholded magnitude and orientation values of the Canny edge detector [44]. Using these two features offers advantages and disadvantages not found using only RGB features. A major advantage is found under varying illumination. In Figures 3.3(g) and 3.3(h), a comparison of the gradient magnitude of the same scene under different illumination conditions shows that gradient magnitude remains largely invariant to the illumination change. This property allows our classifier to be more robust to varying illumination conditions.

One disadvantage of using the gradient magnitude is that foreground objects with homogeneous intensities will not appear to change for the classifier in the inner areas of the object. This fact leads to an important point about the features being used. Individually, these features do not offer a significant enough representation of the scene for a classifier to make accurate predictions. Instead, the combination of each of these lesser tracking features is assumed to provide a higher representation of the image space when used in an ensemble.
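As a sketch, the two gradient features can be computed per pixel as below. This uses Sobel derivatives (the gradient stage of the Canny detector) rather than a full Canny implementation, which is a simplifying assumption.

```python
import numpy as np
from scipy import ndimage

def gradient_features(gray):
    """Per-pixel gradient magnitude and orientation from Sobel derivatives.

    gray : 2-D float array (grayscale frame). Returns (magnitude, orientation),
    with the orientation in radians.
    """
    gx = ndimage.sobel(gray, axis=1, mode="nearest")   # horizontal derivative
    gy = ndimage.sobel(gray, axis=0, mode="nearest")   # vertical derivative
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)
    return magnitude, orientation
```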


Figure 3.1. Set of Haar features being used

Figure 3.2. Illustration of equation to compute Haar value

The value of a box is computed using Equation 3.3, which is illustrated in Figure 3.2. In the illustration, the points represent different values of the integral image. A HaarBox is any of the black and white regions within the Haar features shown in Figure 3.1.

IN(p_{x,y}) = IN(p_{x,y-1}) + IN(p_{x-1,y}) - IN(p_{x-1,y-1}) + I(p_{x,y})   (3.2)

HaarBox = IN(x, y) + IN(x + width, y + height) - IN(x, y + height) - IN(x + width, y)   (3.3)

An example of how the value for a particular Haar feature is computed will now be shown. Suppose that Haar Feature 1 is to be computed at point x = 20, y = 20, using a 12x12 sized window. First the values of the boxes that define the feature are computed. As seen in Figure 3.1, one box covers the left half of the window and the other box covers the right half. For the box on the left half, the boundaries are: left = 14, right = 20, top = 14, bottom = 26. For the box on the right half, the boundaries are: left = 20, right = 26, top = 14, bottom = 26. Using the integral image from Equation 3.2, the value for the left box (V_l) is computed using Equation 3.3: V_l = IN(20, 26) + IN(14, 14) - IN(14, 26) - IN(20, 14). The value of the right box (V_r) is: V_r = IN(26, 26) + IN(20, 14) - IN(20, 26) - IN(26, 14). So the value of Haar Feature 1 at point x = 20, y = 20 will be V_l - V_r.
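The sketch below reproduces this worked example, assuming an integral image IN(x, y) holding the sum of all pixels with coordinates up to and including (x, y); the boundary handling follows the convention of the example above rather than an exact pixel-inclusive box sum.

```python
import numpy as np

def integral_image(gray):
    """Integral image IN(x, y): cumulative sum of all pixels up to (x, y) (Eq. 3.2)."""
    return gray.cumsum(axis=0).cumsum(axis=1)

def box_sum(IN, left, top, right, bottom):
    """Sum of the pixels inside a box using four integral-image lookups (Eq. 3.3)."""
    # IN is indexed [row, column] = [y, x]
    return IN[bottom, right] + IN[top, left] - IN[bottom, left] - IN[top, right]

def haar_feature_1(IN, x, y, window=12):
    """Two-box (left/right) Haar feature at (x, y), as in the worked example above."""
    half = window // 2
    left_box = box_sum(IN, x - half, y - half, x, y + half)     # left = 14 .. right = 20
    right_box = box_sum(IN, x, y - half, x + half, y + half)    # left = 20 .. right = 26
    return left_box - right_box

# Example matching the text: Haar Feature 1 at x = 20, y = 20 with a 12x12 window
frame = np.random.rand(240, 320)
IN = integral_image(frame)
value = haar_feature_1(IN, x=20, y=20)
```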


3.3 Alternate Color Spaces

It is common in tracking algorithms for alternate color spaces to be used, generally due to their claimed illumination invariance. This work does not use alternate color spaces, such as those mentioned in Section 2.3, primarily because illumination invariance has not been observed using them.

In [28], an evaluation of various color spaces is performed to determine which are invariant to illumination changes. In that paper it is claimed that hue is invariant to illumination changes; however, the images in Figures 3.3(c) and 3.3(d) show that this is not necessarily the case. Because hue is represented as an angle \theta \in [0, 360), each pixel p in the two hue images in Figures 3.3(c) and 3.3(d) maps \theta to p according to Equation 3.4. Of course, in this representation the values for \theta = 0 and \theta = 355 map to two highly different pixel values even though the circumferential distance between them is very small. However, for the purposes of this visualization that inaccuracy does not matter, because the change in hue caused by the illumination change is clearly not due to this wrap-around. Figures 3.3(e) and 3.3(f) show that the luminance component of YCbCr also exhibits significant changes under varying illumination.

p_\theta = 255 \cdot \frac{\theta}{360}   (3.4)
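For reference, the hue byte mapping of Equation 3.4 and two of the alternate color mappings discussed in Section 2.3 can be sketched as follows; these are standard conversions written for illustration, not code from this thesis.

```python
import colorsys
import numpy as np

def hue_byte(r, g, b):
    """Map a pixel's hue angle to [0, 255] as in Equation 3.4 (p_theta = 255 * theta / 360)."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)   # h in [0, 1)
    return 255.0 * (h * 360.0) / 360.0

def normalized_rgb(r, g, b):
    """Mapping used in [30]: each channel divided by the length of the (R, G, B) vector."""
    length = max(np.sqrt(r * r + g * g + b * b), 1e-9)
    return r / length, g / length, b / length

def l1_l2_l3(r, g, b):
    """The l1 l2 l3 space of Equation 2.15 from [28]."""
    denom = (r - g) ** 2 + (r - b) ** 2 + (g - b) ** 2
    if denom == 0:
        return 1 / 3, 1 / 3, 1 / 3   # gray pixel: chromaticity undefined, split evenly
    return (r - g) ** 2 / denom, (r - b) ** 2 / denom, (g - b) ** 2 / denom
```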


Figure 3.3. Effects of varying illumination on different features: (a) original image, no shadow; (b) original image, shadow; (c) hue image, no shadow; (d) hue image, shadow; (e) luma from YCbCr, no shadow; (f) luma from YCbCr, shadow; (g) gradient magnitude, no shadow; (h) gradient magnitude, shadow


Figure 4.1. High level view of ensemble algorithm


If the number of distributions observed exceeds K, then a behavior similar to thrashing in operating systems will occur.

In this work each pixel begins with one distribution. When a new value occurs that does not match any distribution, a new Gaussian is created and added to that pixel. Periodically the weights of the distributions at each pixel are checked, and distributions are removed if their weight is less than the learning rate \alpha.

Using a variable number of distributions per pixel allows each pixel to characterize the multimodality of its observed region based on observations and not a predefined heuristic value. The computational overhead is minimal when each pixel stores its distributions in a linked list.

4.1.2 Training

In order to initialize the Gaussian distributions used in the Mixture of Gaussians algorithm, a series of training frames is used to provide an initial characterization of the scene. This generally requires clustering the observed values at each pixel into K distributions. Unless the clusters converge rapidly, clustering is computationally expensive, which means the model may not be trained in real-time. A further problem with the standard initialization is that if a pixel's observed training values are from a region that remained static through the training frames (i.e., only one cluster actually exists), that pixel's values will still be partitioned into K clusters.

The training method for the Mixture of Gaussians algorithm in this work is a simplified approach enabled by the fact that the number of Gaussian distributions used is variable. Using a smaller number of training frames (fewer than 25), the mean of each pixel is calculated and a single distribution is created for that pixel using the calculated mean and a fixed variance, typically around 10^2 (a fixed variance is used because regions that change during the training frames would otherwise produce an abnormally high variance). For static regions this approach is more robust in terms of initial accuracy and speed. For regions that observe change over the training frames, the speed of this approach outweighs the initial lack of accuracy.

One reason this approach works is that Mixture of Gaussians is an adaptive algorithm. Initially setting only one distribution for a pixel does not have a long term impact because as other modes present themselves to that pixel new distributions will be added. Because the mean and variance are able to adapt, an initial mean and variance that are inaccurate will become corrected after a short period of frames.
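A sketch of this initialization for a single scalar feature per pixel follows; the fixed variance of 10^2 and the small training set come from the description above, while the array layout is an illustrative assumption.

```python
import numpy as np

def initialize_pixel_models(training_frames, fixed_variance=10.0 ** 2):
    """Create one Gaussian per pixel from a short training sequence (Section 4.1.2).

    training_frames : list of 2-D arrays (fewer than ~25 frames).
    Returns, per pixel, a list holding a single (mean, variance, weight) tuple;
    further distributions are added later as new modes appear, and low-weight
    distributions are pruned periodically.
    """
    stack = np.stack([np.asarray(f, dtype=float) for f in training_frames], axis=0)
    means = stack.mean(axis=0)                    # per-pixel mean over the training frames
    height, width = means.shape
    return [[[(means[y, x], fixed_variance, 1.0)] for x in range(width)]
            for y in range(height)]
```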


4.2 Classifier Fusion

As each frame is processed, and each classifier makes its prediction as to whether each pixel is a member of the background or foreground, those predictions must be merged into a single prediction. Because this algorithm is an unsupervised learner, i.e., there is no labeled training data or user feedback, traditional ensemble algorithms such as bagging and boosting are not possible. Instead, classifier fusion approaches will be used.

Three main fusion techniques are considered in this work, all taken from [40, 41]: the max rule, the average rule, and the majority vote rule. In [41] it was shown that for binary classification the max rule classifier combination is the same as the min rule, and that majority vote is the same as the median rule, which is why those are not directly considered.

If K classifiers exist, then (because P(FG) = 1 - P(BG) for the two classes) the meta decision rules are given by Equation 4.1 for the max rule, Equation 4.2 for the average rule, and Equation 4.3 for the majority vote rule. In the equations, C_i is the ith classifier in the ensemble, and P(BG \mid C_i) is the background probability predicted by classifier i. In Equation 4.3, the output of F_i(x) is 0 if P(x \mid C_i) < 0.5, and 1 if P(x \mid C_i) \geq 0.5, where x \in \{BG, FG\}.

\operatorname{argmax}_{x \in \{BG, FG\}} \left[ \max_{i} P(x \mid C_i) \right]   (4.1)

\operatorname{argmax}_{x \in \{BG, FG\}} \left[ \frac{1}{K} \sum_{i=1}^{K} P(x \mid C_i) \right]   (4.2)

\operatorname{argmax}_{x \in \{BG, FG\}} \left[ \sum_{i=1}^{K} F_i(x) \right]   (4.3)

It is important to note that the use of the max rule and median rule combinations is possible because the classifiers being used are able to generate probabilities of class membership: their decisions are based on thresholding functional outputs which represent a degree of membership. Contrarily, if a classifier only generated a binary decision then majority vote would be required.


In both [40] and [41], the average rule was shown to be the most effective fusion technique. Basic observations yielded no key differences among the three rules when fusing the classifiers in this work. Because of this, the average rule was used for classifier fusion in this work.
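A sketch of the three rules applied to an ensemble's per-pixel background probabilities (Equations 4.1-4.3) is shown below; the array shapes and names are illustrative.

```python
import numpy as np

def fuse_background_probabilities(prob_bg, rule="average"):
    """Fuse per-classifier background probabilities into one decision per pixel.

    prob_bg : array of shape (K, H, W) holding P(BG | C_i) for each of the K classifiers.
    Returns a boolean (H, W) map that is True where the fused decision is background.
    """
    prob_fg = 1.0 - prob_bg                       # P(FG) = 1 - P(BG) for the two classes
    if rule == "max":                             # Eq. 4.1: class whose strongest supporter wins
        return prob_bg.max(axis=0) >= prob_fg.max(axis=0)
    if rule == "average":                         # Eq. 4.2: class with the larger mean probability
        return prob_bg.mean(axis=0) >= 0.5
    if rule == "majority":                        # Eq. 4.3: count the classifiers voting background
        return (prob_bg >= 0.5).sum(axis=0) > prob_bg.shape[0] / 2.0
    raise ValueError(f"unknown rule: {rule}")
```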


CHAPTER 5
RESULTS

5.1 Methodology

In order to evaluate the results of the proposed meta-classifier, a comprehensive method of comparing the outputs from the classifiers must be used. Each classifier outputs a per pixel hypothesis indicating whether that pixel is a member of the foreground or some background process. A high level tracking system would receive these pixels, cluster them into blobs, and process them for higher level semantics.

Classifiers in this work are evaluated based on their ratio of true positive to false positive classifications. Using these measurements allows for a receiver operating characteristic (ROC) graph. ROC graphs are highly expressive, and provide a visualizable comparison of multiple classifiers' performances. ROC graphs are best suited to classifiers that generate a class probability or degree of membership [49], which the Mixture of Gaussians does by varying the \beta threshold parameter. Another benefit of using ROC graphs is that the results are invariant to the distributions of the classes [49], which is not the case for metrics such as precision and recall [50].

One method of evaluating a classifier's true positive results would be based on its per pixel accuracy; however, some problems exist with this method. One major problem with a per pixel evaluation criterion is that the true positive results may be misleading. Consider, for example, a scene with two foreground objects x and y that each occupy the same number of image pixels. If Classifier A correctly classifies all the pixels of object x as foreground and incorrectly classifies all the pixels of object y as background, then it will have a 50% true positive rate for that frame. If Classifier B correctly classifies half of the pixels in both objects then it too will have a 50% true positive rate. However, a strong argument could be made that Classifier B is more successful. This is because half of an object's pixels being classified as foreground should be enough for the higher level tracking algorithm to recognize both of these objects.


The output from Classifier A will certainly not be sufficient for recognizing both objects. In addition to this drawback, generating per pixel ground truth information is highly tedious and prone to error.

Another method of generating true positive results is based on the number of objects correctly found, which is the method used in this work. With this method the ground truth consists of bounding box locations of the foreground objects. Because the bounding box for each foreground object will also contain background pixels, whether or not an object is found is based on the percentage of pixels classified as foreground within the box.

There are few choices for representing false positives. The false positive pixels could be grouped into objects and the number of false objects reported, but this metric fails to incorporate the size or shape of the objects found. Instead, a simple count of the number of false positive pixels is used. Of course, this count only includes pixels outside of the bounding box ground truth regions labeled as foreground. The ROC graphs presented list the false positives on a scale of 0 to 1: the average number of false positive pixels per frame divided by the total area of the frame. This prevents the false positive ratio from ever equaling 1 when foreground objects are present, because not all pixels can be labeled as false positives. The alternative is to represent the false positives in a frame as the number of false positive pixels divided by the number of possible false positive pixels (the total image area minus the area occupied by foreground regions). Theoretically this is the appropriate metric, but it has the potential to misrepresent performance because the false positive measure becomes a function of the foreground size. If one frame has few foreground regions, then a small number of false positive pixels will appear just as damaging as a frame with many foreground regions and many more false positive pixels. Clearly the false positive performance in the latter case is worse, and this should be reflected when analyzing the total results over a feature set.
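A minimal sketch of this evaluation protocol is given below: true positives are counted per ground-truth bounding box, and false positives are counted as foreground pixels outside every box, normalized by the frame area. The required coverage fraction and the helper name frame_scores are assumptions, since the text only states that a percentage of box pixels must be foreground.

```python
import numpy as np

def frame_scores(fg_mask, gt_boxes, coverage=0.5):
    """Score one frame's foreground mask against bounding-box ground truth.

    fg_mask:  boolean (H, W) array, True where the classifier labels foreground.
    gt_boxes: list of (x0, y0, x1, y1) ground-truth object boxes.
    coverage: fraction of a box's pixels that must be foreground for the object
              to count as found (assumed value; the thesis does not fix it).

    Returns (object true positive rate, false positive pixel ratio).
    """
    h, w = fg_mask.shape
    inside = np.zeros_like(fg_mask)
    found = 0
    for x0, y0, x1, y1 in gt_boxes:
        inside[y0:y1, x0:x1] = True
        if fg_mask[y0:y1, x0:x1].mean() >= coverage:
            found += 1
    tp_rate = found / len(gt_boxes) if gt_boxes else 1.0
    # False positives: foreground pixels outside every ground-truth box,
    # normalized by the whole frame area (the convention used in this work).
    fp_ratio = np.logical_and(fg_mask, ~inside).sum() / float(h * w)
    return tp_rate, fp_ratio
```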


5.2 Data Sets

Three publicly available data sets are used in this work for evaluation. Two of the data sets are taken from the International Workshop on Performance Evaluation of Tracking and Surveillance (PETS): one is data set 2 from the 2001 conference [51], and the other is data set S3, subset 3, from the 2006 conference. The third data set used is data set 03 from the OTCBVS data set [52]. These sets each represent separate image domains, as seen in Table 5.1.

Table 5.1. Description of data sets used

    Data Set         Setting   Difficulty   Resolution   Frame Rate   Illumination
    OTCBVS [52]      Outdoor   Hard         320 x 240    Low          Highly Dynamic
    PETS 2001 [51]   Outdoor   Medium       768 x 576    High         Slightly Dynamic
    PETS 2006 [53]   Indoor    Easy         720 x 576    High         Static

Figure 5.1. Sample frames from OTCBVS data set

The OTCBVS data set offers the most difficulty due to the varying illumination caused by cloud cover. The examples shown in Figure 3.3 were taken from this data set. The sharp illumination changes caused by rolling cloud cover produce extreme variations in the RGB intensities that do not match the background distributions in the Mixture of Gaussians model. More sample frames from the OTCBVS data may be seen in Figure 5.1.

As will be described in Section 5.3, each data set is split into two separate sets: one for training to generate the optimal parameters, and one for testing. The split of the PETS 2001 data set was such that no major illumination variations were present in the training set, while a gradual, global illumination change occurred in the testing set. This non-stratified split is expected when performing sequential splits of data sets, but it causes suboptimal performance for all classifiers evaluated on the test set. One solution would be to split the testing and training sets by putting every other image in one set, which would reflect the same illumination conditions in each set. This was not done, however, because having unpredictable conditions in an image set better reflects the real world difficulties tracking systems face.


Figure 5.2. Sample frames from PETS 2001 data set
Figure 5.3. Sample frames from PETS 2006 data set

Other than the global illumination change at the end, the PETS 2001 data set is not a terribly difficult data set. There are trees which sway lightly in the wind and some foreground objects that are far off in the distance. Figure 5.2 contains sample images from the PETS 2001 data set.

The PETS 2006 data set is the easiest of the three. It is an indoor environment and the camera is in close proximity to the foreground objects. The largest difficulty is the reflectance of the floor, which causes minor reflections of the foreground objects. Sample frames from the PETS 2006 data set may be found in Figure 5.3.

5.3 Parameter Selection

Each of the three data sets was separated into two dichotomic, sequential sets, where one was used for training/parameter exploration and the other was used for evaluation. Tables 5.2 and 5.3 show the precise splits used for each data set.


Table 5.2. Frames used for training

    Data Set    Training Frames                                   Count
    OTCBVS      Subset 1b: "img 00000.bmp" - "img 02106.bmp"      1054
    PETS 2001   Subset Camera 1: "0001.jpg" - "2989.jpg"          2989
    PETS 2006   "S7-T6-B.00000.jpg" - "S7-T6-B.01700.jpg"         1701

Table 5.3. Frames used for testing

    Data Set    Testing Frames                                    Count
    OTCBVS      Subset 2b: "img 00000.bmp" - "img 01200.bmp"      601
    PETS 2001   Subset Camera 2: "0001.jpg" - "2989.jpg"          2989
    PETS 2006   "S7-T6-B.01701.jpg" - "S7-T6-B.03400.jpg"         1700

Three parameters exist in the Mixture of Gaussians algorithm. The first parameter is the learning rate α of the classifier. The optimal value of this parameter depends largely on the frame rate of the image sequence. The next parameter is the scale factor of the standard deviation for matching an incoming value to an existing distribution, which we will refer to as k. In [4], k was set to 2.5.

The final parameter is the threshold β, which determines what the total weight of the matched Gaussian distributions at a pixel must be for the pixel to be considered background. This parameter is varied in order to generate an ROC graph, so its optimal value was not explored. An example of the effect of varying this parameter on a classifier's hypothesis over an image is shown in Figure 5.4. The raw image in Figure 5.4(b) shows the classifier predictions prior to thresholding. The sum of matched weights at each pixel is multiplied by 255 in order to map the range [0, 1] to [0, 255], where the closer to 0 (black) a pixel is the more likely it is background, and the closer to 255 (white) a pixel is the more likely it is foreground.

ROC graphs were generated for a discrete sampling of α and k, where α ∈ {.001, .0025, .005, .01, .02, .04} and k ∈ {1.5, 2.5, 3.5, 4.5}. This resulted in 24 distinct parameter combinations tested for each data set using the traditional RGB feature set. Table 5.4 lists the parameters selected for each data set based on these evaluations.

Table 5.4. Parameters used for each data set

    Data Set         α       k
    OTCBVS [52]      .0025   2.5
    PETS 2001 [51]   .001    1.5
    PETS 2006 [53]   .001    1.5
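The β sweep just described can be pictured with the short sketch below. The per-pixel score map is treated as an input here (how it is computed belongs to the Mixture of Gaussians model), and the default threshold range simply mirrors the values illustrated in Figure 5.4; both are assumptions of this sketch.

```python
import numpy as np

def beta_sweep(score_map, thresholds=range(130, 251, 10)):
    """Yield (beta, foreground mask) pairs by sweeping the beta threshold.

    score_map: (H, W) per-pixel foreground score in [0, 1] produced by the
               (fused) classifier before thresholding. The score is mapped to
               [0, 255] as in Figure 5.4; a pixel whose mapped value meets or
               exceeds beta is declared foreground, so raising beta erodes the
               foreground regions.
    """
    hypothesis = np.clip(score_map * 255.0, 0, 255).astype(np.uint8)
    for beta in thresholds:
        yield beta, hypothesis >= beta

# Each mask can then be scored with the frame_scores sketch from Section 5.1
# to obtain one (true positive rate, false positive ratio) point per beta.
```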


Figure 5.4. Example of thresholding a classifier hypothesis. Panels: (a) original image; (b) non-thresholded classifier hypothesis image; (c)-(f) thresholds of 130, 170, 210, and 250.


The ground truth for the test sets was taken at roughly every 30 frames, starting about 800 to 1000 frames into each set. This offered a representation of the entire set while providing enough time for the classifier to stabilize after initialization.

5.4 Results

The overall results for the presented feature sets are extremely encouraging. The extended feature set outperformed the baseline RGB classifier on all three data sets. The most significant gains were observed on the OTCBVS data set and the PETS 2001 data set; each of these is an outdoor data set and presents different illumination issues. Performance on the PETS 2006 data set was also slightly improved, but because the data set itself is not difficult, only limited gains were possible.

When evaluating the results of the ensemble classifiers in the ROC graphs, five separate feature combinations will be compared. The first is classification using only the RGB features. This is the baseline performance, and failure to improve upon classification using only RGB features would imply an overall failure of the feature set. The second is the use of every feature mentioned in this paper, which will be referred to as the extended feature set; this is 13 features (3 for RGB, 2 for gradient, and 8 for Haar). The next two feature sets compared are only the Haar features and only the gradient features. Finally, a feature set with only the RGB and gradient features is compared.
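For concreteness, the sketch below assembles a per-pixel feature stack of the kind compared here: the RGB channels, gradient magnitude and orientation from Sobel derivatives, and a single illustrative Haar-like rectangle-difference response. It is only a sketch; the thesis uses eight specific Haar features (Chapter 3), and the rectangle sizes and offsets used below are assumptions.

```python
import cv2
import numpy as np

def pixel_features(frame_bgr):
    """Build a per-pixel feature stack: R, G, B, gradient magnitude,
    gradient orientation, and one illustrative Haar-like response."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # Gradient magnitude and orientation from Sobel derivatives.
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)

    # One Haar-like response: mean intensity of a small rectangle to the left
    # of the pixel minus the mean of the matching rectangle to its right
    # (a two-rectangle horizontal feature; sizes and offsets are illustrative).
    box = cv2.boxFilter(gray, cv2.CV_32F, (4, 8))
    haar_horizontal = np.roll(box, 2, axis=1) - np.roll(box, -2, axis=1)

    b, g, r = cv2.split(frame_bgr.astype(np.float32))
    return np.dstack([r, g, b, magnitude, orientation, haar_horizontal])

# Each channel of the returned stack would feed its own Mixture of Gaussians
# classifier; the per-classifier hypotheses are then fused as in Chapter 4.
```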


5.4.1 OTCBVS Data Set

In Figure 5.5 the ROC graph of the performance of each feature set on the OTCBVS data set is shown. The performance of classification using only the RGB features is clearly the worst. Using all of the features appears to perform the best unless a liberal acceptance of false positives is used, in which case it performs only slightly worse than using the Haar features alone.

The OTCBVS data set is the most difficult data set because of the trouble the cloud cover causes. Difficulties in classification on this data set were largely responsible for the concept of using an extended feature set. The overwhelmingly higher performance of the extended feature set compared to the RGB feature set is viewed as a major success.

In Figure 5.6, an ROC graph of the performance of classifiers using only individual features is shown. The results show that the individual features that make up the extended feature set do not perform well on their own. This is especially the case for the Haar features, which generate a high number of false positives when working independently. These results are expected of the individual classifiers that make up an ensemble; a successful ensemble classifier is generally the fusion of many weak classifiers [34]. Though the gradient magnitude outperforms the baseline RGB here, this will be shown not to be consistent across all the data sets. Further analysis of the individual features' performance will be shown for the other data sets and discussed in Section 5.4.4.

Results of the false alarm rate at a 90% true positive rate for each frame tested are shown in Figure 5.7. The performance when using all features is highly consistent, while the RGB results are more sporadic and worse at every frame. Consistency in classification is a highly desirable property, and it is clearly exhibited here by the extended feature set.

Sample bitmaps of the classification of a frame in the OTCBVS data set at various thresholds are shown for RGB classification in Figure 5.8 and for classification using the entire feature set in Figure 5.9. The dramatic effects of the cloud cover are apparent in both, but they are significantly diminished using the extended feature set. In the RGB classification, using a more discriminating threshold erodes the foreground objects before eroding the erroneous classification of the cloud cover, as clearly visible in Figure 5.8. The bitmaps from the extended feature set show that while optimal performance is still not achieved, the gains are clear and significant compared to using the RGB feature set.

5.4.2 PETS 2001 Data Set

The results for the PETS 2001 data set are shown in Figure 5.10. It is quite clear that the baseline RGB feature set was easily beaten by all other combinations. Once again, using all 13 features gave the best performance throughout most of the ROC graph. Depending on the accepted tolerance for false positives, using only the Haar features or only the gradient features offers solid performance as well.

The disparity between the traditional RGB feature set's performance and the extended feature set's performance is clear and significant when classifying on the PETS 2001 data set.


Figure 5.5. Overall results on the OTCBVS dataset (ROC: true positives vs. false positives for the RGB, RGB and gradient, all features, gradient, and Haar feature sets)
Figure 5.6. Results for each individual feature on the OTCBVS dataset (gray, RGB, gradient magnitude, gradient orientation, and Haar 1-8)


Figure 5.7. Frame by frame false positive results on OTCBVS dataset (all features vs. RGB)
Figure 5.8. RGB classification image of an OTCBVS frame. Panels: (a) original image; (b)-(d) thresholds of 220, 230, and 240.


Figure 5.9. Ensemble classification image of an OTCBVS frame. Panels: (a) original image; (b)-(d) all features at thresholds of 220, 230, and 240.


The same dramatic improvement was shown for the OTCBVS data set as well. This leads to the generalization that using the extended feature set on outdoor imagery is preferable to using only RGB features.

The results of the individual features on the PETS 2001 data set are shown in Figure 5.11. Once again, the individual features do not perform as well as the RGB feature set, even though the ensemble of these features greatly outperforms the RGB-only classifier. As was also the case on the OTCBVS data set, the gradient magnitude feature does perform better than the RGB feature. The reason for this will be discussed in Section 5.4.4.

The individual frame performance is shown in Figure 5.12. Once again, using the entire feature set results in an extremely consistent false alarm rate. This is still not the case for the RGB feature classification: it exhibits results comparable to the extended feature set initially, but its performance deteriorates significantly at the end of the data set, which is when the global illumination change occurs.

As mentioned previously, one characteristic of the PETS 2001 data set is a gradual, global illumination change towards the end of the sequence. This illumination change causes problems when only RGB features are used, but not when the extended feature set is used. An example of this is shown in Figures 5.13 and 5.14, where the regions in the yellow box indicate the foreground objects, which are two people walking together.

In Figure 5.13, the RGB classification of a sample frame taken while the illumination change is occurring shows that the reflectiveness of the building causes many of its pixels to be classified as foreground when a more liberal threshold is applied. As the threshold is raised, though, the actual foreground pixels are lost. This shows that no threshold exists that can discriminate the false positives from the true positives. In Figure 5.14 the same sample is presented, but classification is now done with the entire feature set. With a liberal threshold false positives exist as well; however, as the threshold is raised the false positive regions are suppressed while the actual foreground objects are maintained. Clearly this is a significant improvement.

5.4.3 PETS 2006 Data Set

Figure 5.15 contains the ROC graph of the performance of the feature sets on the PETS 2006 data set. It is not the case that the RGB feature set performed the worst in this set.


Figure 5.10. Overall results on the PETS 2001 dataset (ROC for the RGB, RGB and gradient, all features, gradient, and Haar feature sets)
Figure 5.11. Results for each individual feature on the PETS 2001 dataset


Figure 5.12. Frame by frame false positive results on PETS 2001 dataset (all features vs. RGB)

In fact, the performance of the RGB feature set was quite competitive with the extended feature set. It is still the case, though, that the extended feature set provided the best performance with respect to the other feature set combinations. The poor performance when using only the Haar features or only the gradient features is quite notable as well. Despite the disappointing performance of these features on their own, they still manage to augment the RGB features and make the extended feature set the optimal classifier.

In Figure 5.16, the results of classification using the individual features on the PETS 2006 data set are shown. The RGB feature clearly outperforms all other features, including the gradient magnitude. This completes the trend seen across all three data sets: the individual features do not offer any consistent improvement over the RGB feature set, while using all of these features in an ensemble classifier outperforms the RGB features every time.

Figure 5.17 contains the performance at individual frames for the PETS 2006 data set. The extended feature set appears less consistent on this data set. This may be rather misleading, however, because the false alarm rate is so low that the inconsistencies become exaggerated.

The PETS 2006 data set offered the lowest performance gains when comparing the extended feature set to the RGB features.


Figure 5.13. RGB classification image of a PETS 2001 frame. Panels: (a) original image; (b)-(d) thresholds of 220, 230, and 240.


Figure 5.14. Ensemble classification image of a PETS 2001 frame. Panels: (a) original image; (b)-(d) all features at thresholds of 220, 230, and 240.


Figure 5.15. Overall results on the PETS 2006 dataset (ROC for the RGB, RGB and gradient, all features, gradient, and Haar feature sets)

The primary reason is that this data set is not terribly difficult. Because it is an indoor environment and the foreground objects are in close proximity to the camera, there is little room for improvement over the standard RGB features. Despite this, a slight improvement was still seen when incorporating the gradient and Haar features. In real world use of such a scene it would be unlikely that these additional features would be used, however, due to the increased computational load they incur and the marginal gains they offer.

5.4.4 Feature Performance

The performance of classifiers using only a single feature from the extended feature set was poor, as observed in Figures 5.6, 5.11, and 5.16. When these weak classifiers are combined into an ensemble of classifiers, the meta-classification offers substantial results. This adheres to the theory of ensemble learning presented by Schapire in [34].

An example of how a series of weak classifier hypotheses is able to produce a strong hypothesis in background classification is shown in Figure 5.18, where the 8 hypotheses from the individual Haar features are shown along with the fusion of those classifiers. The Haar classifiers clearly have a tendency to over-classify pixels as foreground, resulting in many false positives. This is evidenced in the ROC graphs, where the classifiers using individual Haar features maximize both the true positives and the false positives.


Figure 5.16. Results for each individual feature on the PETS 2006 dataset
Figure 5.17. Frame by frame false positive results on PETS 2006 dataset (all features vs. RGB)


Fortunately, there is a low overlap between these features' false positives and a high overlap in their true positives, resulting in strong background classification.

It was observed on the PETS 2001 and OTCBVS data sets that classification using only the gradient magnitude feature was more successful than using the baseline RGB. Similar results were not observed on the PETS 2006 data set. It is believed that the reason for this is that the foreground objects in the PETS 2001 and OTCBVS data sets are much farther from the camera than they are in the PETS 2006 data set. This results in lower resolutions for the foreground objects, which in turn implies less homogeneous regions over the space occupied by a foreground object. The final implication is that edge features are more prominent on low resolution objects. This may be observed in Figure 5.19, where an object from the PETS 2006 data set and one from the OTCBVS data set are shown along with their corresponding gradient magnitude images. The low resolution object has distinct edge features over its entire silhouette, while in the high resolution object these features are only prominent on the boundaries of the object and not in its homogeneous regions. A conclusion from this observation is that the use of gradient magnitude for background classification of smaller objects, or objects at far distances, is superior to the RGB features.

Due to the heightened performance of the gradient magnitude on the PETS 2001 and OTCBVS data sets, its performance was also compared against the extended feature set. The ROC comparisons can be seen in Figure 5.20. On the OTCBVS data set, the use of all features results in far more accurate classification than using the gradient magnitude alone. The extended feature set also outperforms the gradient magnitude on the PETS 2001 data set, though the improvement is less significant. Because the extended feature set consistently outperforms the gradient magnitude alone, it is recommended to use all the features for more accurate classification. If computational demands are a limitation in a system observing low resolution objects, then using only the gradient magnitude is an adequate alternative.


Figure 5.18. Weak hypotheses fused into a single strong hypothesis. Panels: (a) original image; (b)-(i) Haar 1-8 classifications; (j) all Haar classifications combined.


Figure 5.19. Effect of object size on gradient magnitude. Panels: (a) PETS 2006 original image; (b) PETS 2006 magnitude image; (c) OTCBVS original image (4x zoom); (d) OTCBVS magnitude image (4x zoom).


Figure 5.20. Classification using ensemble and gradient magnitude (ROC comparisons of all features vs. gradient magnitude on the OTCBVS and PETS 2001 data sets)


CHAPTER 6
CONCLUSION

6.1 Summary

Traditional image features are insufficient for background classification in many environments, particularly outdoor environments, environments with varying illumination, and scenes where the foreground objects are small or far from the camera. Without the availability of additional spectral data (such as infrared or thermal), alternate feature sets need to be derived from the standard RGB features. Using a feature set that includes gradient magnitude, gradient orientation, and eight separate Haar features for classification yielded significantly superior classification performance on the more difficult data sets, and still marginally outperformed the baseline on a simpler data set.

A summary of the performances in terms of false positive rates on all three data sets is shown in Figure 6.1 (note: in Figure 6.1, the false positive performance on the OTCBVS data set is scaled down by a factor of 6.75). Each performance is based on the lowest false positive rate that produced at least a 90% true positive classification rate. Table 6.1 shows the same results, as well as the ratio of improvement seen using the extended feature set. It can be seen that false positives are more than 5 times higher on the OTCBVS data set, and more than 6 times higher on the PETS 2001 data set, when using the RGB features instead of the extended feature set presented. These two data sets are both outdoor data sets with dynamic illumination events. Figures 6.2 and 6.3 show ROC graphs that compare RGB classification directly to classification using the extended feature set on each of these two data sets.

Table 6.1. False positives at a 90% true positive rate

    Data Set    RGB FP    All Features FP    FP Ratio
    OTCBVS      0.1046    0.0204             5.17:1
    PETS 2001   0.0094    0.0014             6.71:1
    PETS 2006   0.0027    0.0019             1.42:1


Figure 6.1. False positive results at 90% true positive rate
Figure 6.2. OTCBVS ROC graph summary (RGB vs. all features)


Figure 6.3. PETS 2001 ROC graph summary (RGB vs. all features)

Comparisons using ROC graphs clearly showed the advantage of using additional features for background classification in image sequences. The algorithm used was an ensemble algorithm: it fused the results of multiple classifier hypotheses into a single hypothesis. Each individual classifier operated on only one of the features in the feature set, yielding a decision independent of all other features (assuming the features themselves are independent, which strictly speaking they are not).

Various feature set combinations were explored, and the overall result was that using all of the features yielded the highest performance. Individually, most features performed poorly. This adhered to the predominant theory in ensemble classification that the fusion of many weak classifiers acts as a single strong classifier. When these features were all fused together, the classification at a fixed true positive rate always yielded fewer false positives than using only RGB features.

One feature that did perform well individually on two of the three data sets was the gradient magnitude. Both data sets it excelled on were outdoor scenes with foreground objects at a far distance from the camera. Its performance did not surpass using all of the features, but it did surpass using the RGB features.


This leads to the conclusion that, when a low cost and more efficient classifier is needed and the system is restricted to low resolution images of the objects being tracked, using gradient magnitude as a single feature will offer satisfactory performance gains over RGB features.

6.2 Future Work

The optimistic results generated here open the door for many avenues of future research in using ensembles of background classifiers.

While significant improvements were shown using Haar features as well as gradient features, incorporating additional image processing features may allow for further improvement. One feature that would be intuitive to incorporate is texture. Per pixel texture features using Gabor filters, as described in [54], [55], would offer a similar, but distinctly different, representation of a pixel than Haar features. Additional features that could be included are the Laplacian of Gaussian, non-thresholded SIFT features [56], depth, and motion features.

Another direction is the fusion of different classifier algorithms. For example, instead of using a series of Mixture of Gaussians classifiers, an ensemble of Mixture of Gaussians, Kalman filter, and other classifiers could be used. This approach was the first direction this work researched; the major stopping point was the lack of distinct classifiers for background subtraction. Instead of fusing strong classifier algorithms, using a series of weak algorithms should offer improvements similar to using an extended feature set.

An extended feature set could also be used without having multiple classifiers. Because the Mixture of Gaussians algorithm normally operates on a feature vector (where the three features are RGB), using the same features presented in this paper as a single vector for one classifier should produce unique results. Significantly different results may not be obtained with this approach unless a full covariance matrix is computed, which is generally not the case; instead, only the diagonal of the covariance matrix is usually calculated in order to reduce computational demands. Another consideration when using a vector of features instead of an ensemble of classifiers is that if a single dimension of the new instance vector does not fall within 2.5σ of its corresponding model dimension, and every other dimension does, then the instance does not match the distribution. With a larger feature set it is often the case that not every feature matches its corresponding dimension in the model. This approach is also different from using a unanimous vote scheme for classifier fusion in the ensemble classifier presented in this work: if only a few features match the new instance, the corresponding classifiers in the ensemble will still update themselves based on a match, while for the single feature vector algorithm no update based on a match occurs unless every dimension matches (where matching means being within 2.5σ).
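The distinction between these two matching semantics can be sketched as follows. This is only an illustration under the assumptions above (diagonal covariance, a 2.5σ match criterion), not the implementation used in this work.

```python
import numpy as np

K_SIGMA = 2.5  # the k scale factor discussed in Section 5.3

def vector_match(x, mean, std):
    """Single-vector matching: the new instance matches the diagonal-covariance
    Gaussian only if EVERY dimension lies within K_SIGMA standard deviations
    of its mean, so one outlying feature prevents any update."""
    return bool(np.all(np.abs(x - mean) <= K_SIGMA * std))

def per_feature_matches(x, mean, std):
    """Ensemble-style matching: each single-feature classifier decides on its
    own, so the features that do agree with the background model can still
    match (and update their distributions) even when others do not."""
    return np.abs(x - mean) <= K_SIGMA * std  # one boolean per feature
```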


The evaluation of the ensemble algorithm's performance could also be carried out using the VACE evaluation framework in [57]. This framework offers a more comprehensive analysis of the algorithm's robustness, as well as a comparison against the performance of other tracking algorithms.

A real-time implementation of this system would require a machine with multiple processing cores and a large main memory, but otherwise its realization is believed to be reasonable. Further engineering of the algorithms to exploit the inherent parallelism of the ensemble should allow relatively high frame rates to be handled, depending on the image resolution.

6.3 Final Thoughts

It is believed that further research needs to be conducted in order to better understand the gains observed in this work. There was a clearly demonstrated advantage to using additional features in background classification, even though many of these features are very poor when used individually. Through greater investigation, a more clearly defined set of optimal ensemble features would allow for a more general background classification algorithm that could have key impacts in defense, robotics, and surveillance applications.


REFERENCES

[1] R. Jain, R. Kasturi, and B. Schunck. Machine Vision. McGraw Hill, 1995.

[2] C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland. Pfinder: Real-time tracking of the human body. IEEE Trans. Pattern Anal. Mach. Intell., 19(7):780–785, 1997.

[3] D. Koller, J. Weber, and J. Malik. Robust multiple car tracking with occlusion reasoning. In ECCV (1), pages 189–196, 1994.

[4] C. Stauffer and W. Grimson. Adaptive background mixture models for real-time tracking. In IEEE Conference on Computer Vision and Pattern Recognition, volume 2, 1999.

[5] J. B. MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281–297, 1967.

[6] S. Atev, O. Masoud, and N. Papanikolopoulos. Practical mixtures of Gaussians with brightness monitoring. In The 7th International IEEE Conference on Intelligent Transportation Systems, 2004.

[7] G. Gordon, T. Darrell, M. Harville, and J. Woodfill. Background estimation and removal based on range and color. In Proceedings of the 2001 International Conference on Image Processing, volume 2, pages 395–398, 2001.

[8] M. Harville, G. Gordon, and J. Woodfill. Adaptive video background modeling using color and depth. In Conference on Image Processing, pages 90–93, 2001.

[9] K. Karmann, A. Brandt, and R. Gerl. Moving object segmentation based on adaptive reference images. In European Signal Processing Conference, pages 951–954, 1990.

[10] C. Ridder, O. Munkelt, and H. Kirchner. Adaptive background estimation and foreground detection using Kalman filtering. In Proc. ICAM, pages 193–199, 1995.

[11] R. E. Kalman. A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82:35–45, 1960.

[12] M. Leung and Y. Yang. Human body motion segmentation in a complex scene. Pattern Recognition, 20(1):55–64, 1987.

[13] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers. Wallflower: Principles and practice of background maintenance. In ICCV (1), pages 255–261, 1999.

[14] Y. Zhang, Z. Liang, Z. Hou, H. Wang, and M. Tan. An adaptive mixture Gaussian background model with online background reconstruction and adjustable foreground mergence time for motion segmentation. In IEEE International Conference on Industrial Technology, volume 14-17, pages 23–27, 2005.


[15] Z. Hou and C. Han. A background reconstruction algorithm based on pixel intensity classification in remote video surveillance system. In Proceedings of the Seventh International Conference on Information Fusion, pages 754–759, 2004.

[16] A. Shimada, D. Arita, and R. Taniguchi. Dynamic control of adaptive mixture-of-Gaussians background model. In Proceedings of the IEEE International Conference on Video and Signal Based Surveillance, page 5, 2006.

[17] D. Gao, J. Zhou, and L. Xin. A novel algorithm of adaptive background estimation. In International Conference on Image Processing, 2001.

[18] S. Messelodi, C. Modena, N. Segata, and M. Zanin. A Kalman filter based background updating algorithm robust to sharp illumination changes. In ICIAP 2005, 13th International Conference on Image Analysis and Processing, pages 163–170, 2005.

[19] H. Wang and D. Suter. Background subtraction based on a robust consensus method. In ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition, pages 223–226, Washington, DC, USA, 2006. IEEE Computer Society.

[20] S. Avidan. Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2):261–271, 2007.

[21] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.

[22] A. Criminisi, G. Cross, A. Blake, and V. Kolmogorov. Bilayer segmentation of live video. CVPR, 1:53–60, 2006.

[23] K. Patwardhan, G. Sapiro, and V. Morellas. Robust foreground segmentation using pixel layers. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 4:746–751, 2008.

[24] A. Mittal and N. Paragios. Motion-based background subtraction using adaptive kernel density estimation. Computer Vision and Pattern Recognition, 2:302–309, 2004.

[25] B. K. P. Horn and B. G. Schunck. Determining optical flow. Artificial Intelligence, 17:185–203, 1981.

[26] M. Zucchelli, J. Santos-Victor, and H. Christensen. Multiple plane segmentation using optical flow. In British Machine Vision Conference, 2002.

[27] B. W. Silverman. Density Estimation for Statistics and Data Analysis. Chapman and Hall, 1986.

[28] T. Gevers and A. Smeulders. Color-based object recognition. Pattern Recognition, 32(3):453–464, 1999.

[29] T. Horprasert, D. Harwood, and L. S. Davis. A statistical approach for real-time robust background subtraction and shadow detection. In ICCV, 1999.

[30] M. Xu and T. Ellis. Illumination-invariant motion detection using colour mixture models. In British Machine Vision Conference, 2001.

[31] S. Jabri, Z. Duric, H. Wechsler, and A. Rosenfeld. Detection and location of people in video images using adaptive fusion of color and edge information. In International Conference on Pattern Recognition, volume 4, pages 627–630, 2000.


[32] O. Javed, K. Shafique, <strong>an</strong>d M. Shah. A hierarchical approach to robust background subtractionusing color <strong>an</strong>d gradient information. In Motion <strong>an</strong>d Video Computing, Proceedings <strong>of</strong>Workshop on, pages 22–27, 2002.[33] B.V. Dasarathy <strong>an</strong>d B.V. Sheela. Composite classifier system design: Concepts <strong>an</strong>d methodology.Proceedings <strong>of</strong> the IEEE, 67(5):708–713, 1979.[34] R. Schapire. The strength <strong>of</strong> weak learnability. Machine Learning, 5(2):197–227, 1990.[35] L. Breim<strong>an</strong>. Bagging predictors. Machine Learning, 24(2):123140, 1996.[36] D.H. Wolpert. Stacked generalization. Neural Networks, 5:241–259, 1992.[37] R.A. Jacobs, M.I. Jord<strong>an</strong>, S.J. Nowl<strong>an</strong>, <strong>an</strong>d G.E. Hinton. Adaptive mixtures <strong>of</strong> local experts.Neural Computation, 3:7987, 1991.[38] R. Polikar. Ensemble based systems in decision making. IEEE Circuits <strong>an</strong>d Systems Magazine,6(3):21–45, 2006.[39] N. Siebel <strong>an</strong>d S. Mayb<strong>an</strong>k. Fusion <strong>of</strong> multiple tracking algorithms for robust people tracking.In Proceedings <strong>of</strong> the 7th Europe<strong>an</strong> Conference on Computer Vision, volume IV, pages 373–387, May 2002.[40] J. Kittler, M. Hatef, R. Duin, <strong>an</strong>d J. Matas. On combining classifiers. Pattern Analysis <strong>an</strong>dMachine Intelligence, IEEE Tr<strong>an</strong>sactions on, 20(3):226–269, 1998.[41] L. Kuncheva. A theoretical study on six classifier strategies. Pattern Analysis <strong>an</strong>d MachineIntelligence, IEEE Tr<strong>an</strong>sactions on, 24(2):281–286, 2002.[42] T. Ho, J. Hull, <strong>an</strong>d S. Srihari. Decision combination in multiple classifier systems. PatternAnalysis <strong>an</strong>d Machine Intelligence, IEEE Tr<strong>an</strong>sactions on, 16(1):66–75, 1994.[43] A. Ross <strong>an</strong>d A.K. Jain. Information fusion in biometrics. Pattern Recognition Letters,24(13):2115–2125, 2003.[44] J. C<strong>an</strong>ny. A computational approach to edge detection. IEEE Tr<strong>an</strong>sactions on PatternAnalysis <strong>an</strong>d Machine Intelligence, 8:679–714, 1986.[45] C. Papageorgiou, M. Oren, , <strong>an</strong>d T. Poggio. A general framework for object detection. InIn International Conference on Computer Vision, 1998.[46] M. Oren, C. Papageorgiou, P. Sinha, E. Osuna, <strong>an</strong>d T. Poggio. Pedestri<strong>an</strong> detection usingwavelet templates. In Proc. Computer Vision <strong>an</strong>d Pattern Recognition, pages 193–199, 1997.[47] P. Viola <strong>an</strong>d M. Jones. Rapid object detection using a boosted cascade <strong>of</strong> simple features.In Proceedings IEEE Conference on Computer Vision <strong>an</strong>d Pattern Recognition, 2001.[48] R. Lienhart <strong>an</strong>d J. Maydt. An extended set <strong>of</strong> haar-like features for rapid object detection.In International Conference on Image Processing, 2002.[49] T. Fawcett. An introduction to roc <strong>an</strong>alysis. Pattern Recognition Letters, 27:861–874, 2006.[50] J. Makhoul, F. Kubala, R. Schwartz, <strong>an</strong>d R. Weischedel. Perform<strong>an</strong>ce measures for informationextraction. In Proceedings <strong>of</strong> DARPA Broadcast News Workshop, 1999.[51] PETS 2001 Dataset. ftp://ftp.pets.rdg.ac.uk/pub/PETS2001. Data set 2.59


[52] OTCBVS Benchmark Dataset Collection. http://newton.ex.ac.uk/tex/pack/bibtex/btxdoc/node6.html. Data set 03.[53] PETS 2006 Dataset. http://www.cvg.rdg.ac.uk/PETS2006/data.html. Data set S3.[54] A.K. Jain <strong>an</strong>d F. Farrokhnia. Unsupervised texture segmentation using gabor filters. InSystems, M<strong>an</strong> <strong>an</strong>d Cybernetics, IEEE International Conference on, 1990.[55] B. M<strong>an</strong>junath <strong>an</strong>d W. Ma. Texture features for browsing <strong>an</strong>d retrieval <strong>of</strong> image data. PatternAnalysis <strong>an</strong>d Machine Intelligence, 18:837–842, 1996.[56] D. Lowe. Distinctive image features from scale-invari<strong>an</strong>t keypoints. International Journal <strong>of</strong>Computer Vision, 60(2):91–110, 2004.[57] V. M<strong>an</strong>ohar, M. Boonstra, V. Korzhova, P. Soundararaj<strong>an</strong>, D. Goldg<strong>of</strong>, R. Kasturi,S. Prasad, H. Raju, R. Bowers, <strong>an</strong>d J. Gar<strong>of</strong>olo. Pets vs. vace evaluation programs: Acomparative study. In Proceedings <strong>of</strong> the Ninth IEEE International Workshop on Perform<strong>an</strong>ceEvaluation <strong>of</strong> Tracking <strong>an</strong>d Surveill<strong>an</strong>ce, 2006.60
