Weakly supervised classification of objects in images using soft random forests

Riwal Lefort (1,2), Ronan Fablet (2), Jean-Marc Boucher (2)
(1) Ifremer/STH, Technopole Brest Iroise, 29280 Plouzane
(2) Telecom-Bretagne/LabSticc, Technopole Brest Iroise, 29280 Plouzane

Abstract. The development of robust classification models is among the important issues in computer vision. This paper deals with weakly supervised learning, which generalizes supervised and semi-supervised learning. In weakly supervised learning, training data are given as the priors of each class for each sample. We first propose a weakly supervised strategy for learning soft decision trees. Besides, the introduction of class priors for training samples instead of hard class labels makes natural the formulation of an iterative learning procedure. We report experiments for UCI object recognition datasets. These experiments show that recognition performance close to supervised learning can be expected using the proposed framework. Besides, an application to semi-supervised learning, which can be regarded as a particular case of weakly supervised learning, further demonstrates the pertinence of the contribution. We further discuss the relevance of weakly supervised learning for computer vision applications.

1 Introduction

Numerous applications of computer vision involve the development of robust classification models. One may for instance cite object recognition [1], texture recognition [2], event recognition in videos [3] or scene analysis [4]. Two main aspects have to be dealt with: the definition of application-related feature vectors or information representations, and the development of the associated classification models. The latter typically involves a training step that depends on the characteristics of the training data. New learning schemes have been developed to exploit the actually available training data, for instance semi-supervised learning [5], which assumes that a small subset of labelled samples complemented by a larger set of unlabelled samples is provided for the training stage. For certain applications, recognition performances similar to the supervised case can be reached. As a more general situation, we focus on the weakly supervised context.
Weakly supervised learning includes both supervised and semi-supervised learning. It considers probability vectors that indicate the prior of each class for each sample of the training set. Let $\{x_n, \pi_n\}_n$ be the training dataset, where $x_n$ is the feature vector for sample $n$ and $\pi_n = \{\pi_{ni}\}_i$ is the prior vector for sample $n$, $i$ indexing the classes. $\pi_{ni}$ gives the likelihood for the example $x_n$ to belong to class $i$. The supervised case corresponds to priors $\pi_{ni}$ set to 0 or 1 according to whether or not sample $n$ belongs to class $i$. Semi-supervised learning is also a specific case


where training priors $\pi_{ni}$ are given as 0 or 1 for the subset of fully labelled training samples and as uniform priors for the remaining unlabelled samples. Weakly supervised learning also covers other cases of interest. Especially, in image and video indexing, object recognition datasets may involve training images labelled only with the presence or absence of each object category [1] [6] [7] [8] [9] [10]. Again, such presence/absence datasets can be regarded as specific cases of training priors. As another example, one can imagine an annotation by experts with some uncertainty measure [11]. This situation would be typical of photo-interpretation applications, especially remote sensing applications [12]. A further generalization can be issued from expert-driven or prior automated analysis providing some confidence or uncertainty measure of the classification of objects or groups of objects. This is typical of remote sensing applications. For instance, in the acoustic sensing of the ocean for fisheries management [13], images of fish schools are labelled with class proportions that lead to individual priors for each fish school. Such training datasets provided with class priors instead of hard class labels could also be dealt with when a cascade of classifiers or information could be processed before the final decision. In such cases, hard classifiers and voting procedures are often considered, but one could benefit from soft decisions to keep all relevant information in the cascade until the final decisions. This is typical of computer vision applications where multiple sources of information may be available [14] [15].
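For illustration, here is a minimal sketch (not from the original paper) of how the prior vectors $\pi_n$ discussed above can be encoded for the supervised, semi-supervised and presence/absence settings; NumPy and the helper name presence_absence_prior are assumptions made for this example.

```python
import numpy as np

def presence_absence_prior(present_classes, n_classes):
    """Uniform prior over the classes declared present, zero elsewhere.
    Hypothetical helper illustrating one way to encode presence/absence labels."""
    prior = np.zeros(n_classes)
    prior[list(present_classes)] = 1.0 / len(present_classes)
    return prior

n_classes = 3
supervised_prior = np.array([0.0, 1.0, 0.0])            # hard label: class 1
semi_supervised_prior = np.full(n_classes, 1.0 / 3)     # unlabelled sample: uniform prior
weak_prior = presence_absence_prior({0, 2}, n_classes)  # image known to contain classes 0 and 2

print(supervised_prior, semi_supervised_prior, weak_prior)
```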
In this paper, we address weakly supervised learning using decision trees. The most common probabilistic classifiers are provided by generative models learnt using the Expectation-Maximization algorithm [16] [17]. These generative probabilistic classifiers are also used in ensemble classifiers [18], as in boosting schemes or with iterative classifiers [15]. In contrast, we investigate the construction of decision trees from weakly supervised datasets. Decision trees and random forests are among the most flexible and efficient techniques for supervised object classification [19]. However, to our knowledge, no extension to the weakly supervised context has been reported previously. Decision trees and the associated random forests can provide probabilistic outputs, but their training inputs are always given as labelled samples. The second contribution of this work is to develop an iterative procedure for weakly supervised learning. The objective is to iteratively refine the class priors of the training data in order to improve the classification performance of the classifier at convergence. We report different experiments to demonstrate the relevance of these contributions. Quantitative evaluations are given for different reference UCI datasets. They involve a comparison to other classification and learning strategies. The focus is given to examples demonstrating the genericity of the proposed weakly supervised framework, including applications to semi-supervised learning. In all cases, the proposed approach compares favourably to previous work, especially hard decision trees, generative classification models, and discriminative classification models.

This paper is organized as follows. In section 2, we present the weakly supervised learning of decision trees. In section 3, the iterative procedure for weakly supervised learning is detailed. The application to semi-supervised learning is


presented in section 4, while experiments and conclusions are given in sections 5 and 6.

2 Decision trees and random forests

We present in this section the weakly supervised learning of decision trees and random forests. To this end, we first briefly review supervised decision trees and random forests.

2.1 Supervised decision trees and random forests

Decision trees are usually built from a supervised training dataset. The method consists in hierarchically splitting the descriptor space into subsets that are homogeneous in terms of object classes. More precisely, the feature space is split based on the maximization of the information gain. Different split criteria have been proposed, such as the Gini criterion [20], the Shannon entropy [21] [22], or others based on statistical tests such as ANOVA [23] or the $\chi^2$ test [24]. All of these methods have been shown to lead to rather equivalent classification performances. We focus here on the C4.5 decision trees, which are among the most popular [22]. During the training step, at a given node of the tree, the procedure chooses the descriptor $d$ and the associated split value $S_d$ that maximize the information gain $G$:

$$\arg\max_{\{d, S_d\}} G(S_d) \qquad (1)$$

where the gain $G$ is issued from the Shannon entropy of the object classes [22]:

$$\begin{cases} G = E_0 - \sum_m E_m \\ E_m = -\sum_i p_{mi} \log(p_{mi}) \end{cases} \qquad (2)$$

where $E_0$ indicates the entropy at the considered parent node, $E_m$ the entropy at child node $m$, and $p_{mi}$ the likelihood of class $i$ at node $m$. A test sample is passed through the tree and follows the test rules associated with each node. It is assigned to the class of the terminal node (or descriptor subspace) that it reaches.

Random forests combine a "bagging" procedure [25] and the random selection of a subset of descriptors at each node [26]. The random forest [19] can provide probabilistic outputs given by the posterior distribution of the class votes over the trees of the forest. Additional randomization-based procedures can be applied during the construction of the tree [27]. In some cases, they may lead to improved performances. Here, the standard random forests are considered [19].
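For concreteness, here is a minimal sketch of the split criterion of equations (1)-(2), scoring one candidate split of a supervised node by its entropy gain. It assumes NumPy arrays of integer labels and follows the unweighted form of equation (2) as written, rather than any particular C4.5 implementation.

```python
import numpy as np

def entropy(labels, n_classes):
    """Shannon entropy of the empirical class distribution at a node."""
    counts = np.bincount(labels, minlength=n_classes)
    p = counts / max(counts.sum(), 1)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def split_gain(x_d, labels, s_d, n_classes):
    """Entropy gain of splitting descriptor values x_d at threshold s_d (cf. eq. (1)-(2))."""
    left, right = labels[x_d < s_d], labels[x_d >= s_d]
    e0 = entropy(labels, n_classes)                      # parent entropy E_0
    return e0 - (entropy(left, n_classes) + entropy(right, n_classes))
```

The best split at a node is then obtained by maximizing split_gain over the candidate descriptors $d$ and split values $S_d$.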
2.2 Weakly supervised learning of soft decision trees

In this section, we introduce a weakly supervised procedure for learning soft decision trees. Let us denote by $\{x_n, \pi_n\}$ the provided weakly supervised dataset. In contrast to the standard decision tree, any node of the tree is associated with class priors. In the weakly supervised setting, the key idea is to propagate class priors through the tree nodes rather than class labels as in the supervised case.


Consequently, given a constructed decision tree, a test sample is passed through the tree and assigned the class priors of the terminal node it reaches.

Let us denote by $p_{mi}$ the class priors at node $m$ of the tree. The key aspect of the weakly supervised learning of the soft decision tree is the computation of the class prior $p_{mi}$ at any node $m$. In the supervised case, it consists in evaluating the proportion of each class at node $m$. In a weakly supervised learning context, the real classes are unknown and the class proportions cannot be easily assessed. We propose to compute $p_{mi}$ as a weighted sum over the priors $\{\pi_{ni}\}$ of all samples attached to node $m$. For descriptor $d$, denoting $x_n^d$ the instance value and considering the child node $m_1$ that groups together the data such that $x_n^d < S_d$, the following fusion rule is proposed:

$$p_{m_1 i} \propto \sum_{\{n\} \,|\, x_n^d < S_d} (\pi_{ni})^\alpha \qquad (3)$$

and similarly for the child node $m_2$ that groups together the data such that $x_n^d > S_d$:

$$p_{m_2 i} \propto \sum_{\{n\} \,|\, x_n^d > S_d} (\pi_{ni})^\alpha \qquad (4)$$

The exponent $\alpha$ weighs low-uncertainty samples: samples whose class priors are closer to 1 should contribute more to the overall cluster mean $p_{mi}$. An infinite exponent value resorts to assigning the class with the greatest prior over all samples in the cluster. In contrast, an exponent value close to zero withdraws low class priors from the weighted sum. In practice, we typically set $\alpha$ to 0.8. This setting comes to giving more importance to priors close to one. If $\alpha < 1$, high class priors are given similar, though still greater, weights compared to low class priors. If $\alpha > 1$, the closer to one the prior, the greater the weight.

Considering a random forest, the output from each tree $t$ for a given test sample $x$ is a prior vector $p_t = \{p_{ti}\}$, where $p_{ti}$ is the prior for class $i$ at the terminal node reached in tree $t$. The overall probability that $x$ is assigned to class $i$, i.e. the posterior likelihood $p(y = i | x)$, is then given by the mean:

$$p(y = i | x) = \frac{1}{T} \sum_{t=1}^{T} p_{ti} \qquad (5)$$

where $y_n = i$ denotes that sample $x_n$ is assigned to class $i$. A hard classification resorts to selecting the most likely class according to the posteriors (5).
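The sketch below illustrates the fusion rule (3)-(4) and the forest-level aggregation (5). The function names, the handling of samples exactly at the threshold, and the explicit normalization step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def child_class_priors(priors, x_d, s_d, alpha=0.8):
    """Class priors of the two children of a soft split (cf. eq. (3)-(4)).

    priors: (n_samples, n_classes) array of training priors pi_ni.
    x_d:    (n_samples,) values of descriptor d.
    s_d:    candidate split value.
    """
    weighted = priors ** alpha
    p_left = weighted[x_d < s_d].sum(axis=0)
    p_right = weighted[x_d >= s_d].sum(axis=0)
    # normalize so that each child carries a proper class prior
    p_left = p_left / max(p_left.sum(), 1e-12)
    p_right = p_right / max(p_right.sum(), 1e-12)
    return p_left, p_right

def forest_posterior(leaf_priors):
    """Average the leaf priors returned by the T trees of the forest (cf. eq. (5))."""
    return np.mean(np.asarray(leaf_priors), axis=0)

# hard decision: most likely class according to the averaged posterior
# y_hat = np.argmax(forest_posterior(leaf_priors))
```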


3 Iterative classification

In this section, an iterative procedure applied to the training dataset is suggested. A naive version is first presented; then a more robust version that avoids over-training is proposed.

3.1 Naive iterative procedure

The basic idea of the iterative scheme is that the class priors of the training samples can be refined iteratively from the overall knowledge acquired by the trained classifier, such that these class priors finally converge to the real class of the training samples. The classifier at a given iteration can then be viewed as a filter that should reduce the noise or uncertainty on the class priors of the training samples. Note that this iterative method is only applied to the training dataset. Such an iterative procedure has previously been investigated in different contexts, especially with probabilistic generative classifiers [15]. Theoretical results regarding convergence properties can hardly be derived [28] [29], though good experimental performances have been reported [30]. The major drawbacks of this approach are possible over-training effects and the propagation of early classification errors [5]. Bayesian models may contribute to solving these over-training issues.

Given an initial training dataset $T_1 = \{x_n, \pi_n^1\}$ and $M$ iterations,
1. For $m$ from 1 to $M$:
   - Learn a classifier $C_m$ from $T_m$.
   - Apply the classifier $C_m$ to $T_m$.
   - Update $T_{m+1} = \{x_n, \pi_n^{m+1}\}$ with $\pi_{ni}^{m+1} \propto \pi_{ni}^1 \, p(x_n | y_n = i, C_m)$.
2. Learn the final classifier using $T_{M+1}$.
Table 1. Naive iterative procedure for weakly supervised learning (IP1).

The implementation of this naive iterative procedure proceeds as follows for weakly supervised learning. At iteration $m$, given the weakly supervised dataset $\{x_n, \pi_n^m\}$, a random forest $C_m$ can be learnt. The updated random forest can then be used to process any training sample $\{x_n, \pi_n^m\}$ to provide an updated class prior $\pi_n^{m+1}$. This update should exploit both the output of the random forest and the initial prior $\pi_n^1$. Here, the updated priors are given by $\pi_{ni}^{m+1} \propto \pi_{ni}^1 \, p(x_n | y_n = i, C_m)$, where $y_n = i$ denotes the class variable for sample $n$. This algorithm is sketched in Table 1. In the following, this procedure will be referred to as IP1 (Iterative Procedure 1).
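A minimal sketch of the naive procedure IP1 of Table 1, under the assumption of a soft random forest exposed through a build_forest callable and a predict_proba method (an assumed interface, not the authors' code); the forest's soft output is used in place of the class-conditional term $p(x_n | y_n = i, C_m)$.

```python
import numpy as np

def iterate_ip1(X, priors_init, build_forest, n_iter=10):
    """Naive iterative procedure IP1 (Table 1).

    X:            (n_samples, n_features) training features.
    priors_init:  (n_samples, n_classes) initial class priors pi_n^1.
    build_forest: callable (X, priors) -> fitted soft random forest exposing
                  predict_proba(X) -> (n_samples, n_classes) (assumed API).
    """
    priors = priors_init.copy()
    for _ in range(n_iter):
        forest = build_forest(X, priors)              # learn C_m from T_m
        soft_output = forest.predict_proba(X)         # stands in for p(x_n | y_n = i, C_m)
        priors = priors_init * soft_output            # pi^{m+1} proportional to pi^1 * p(.)
        priors /= np.maximum(priors.sum(axis=1, keepdims=True), 1e-12)  # renormalize per sample
    return build_forest(X, priors)                    # final classifier learnt on T_{M+1}
```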
3.2 Randomization-based iterative procedure without over-training

A major issue with the above naive iterative procedure is that the random forest is repeatedly applied to the training data, such that over-training effects may be expected. Such over-training effects should be avoided. To this end, we propose a second iterative procedure. The key idea is to exploit a randomization-based procedure to distinguish, at each iteration, separate training and test subsets. More precisely, we proceed as follows. At iteration $m$, the training dataset $T_m = \{x_n, \pi_n^m\}$ is randomly split into a training dataset $T_{r,m}$ and a test dataset $T_{t,m}$ according to a given proportion $\beta$. $T_{r,m}$ is exploited to build a weakly supervised random forest $C_m$. Samples in $T_{t,m}$ are passed through the random forest $C_m$ and the updated class priors are issued from the same rule as previously: $\pi_{ni}^{m+1} \propto \pi_{ni}^1 \, p(x_n | y_n = i, C_m)$. $\beta$ gives the proportion of training examples in the training set $T_{r,m}$, while the remaining $(1 - \beta)$ training examples


fall in the test set $T_{t,m}$. Setting $\beta$ obeys a trade-off: for a good assessment of the random forest $C_m$, the number of samples in $T_{r,m}$ must be high enough. But if $\beta$ is too high, only very few samples will be updated at each iteration, leading to a very slow convergence of the algorithm. In practice, $\beta$ is typically set to 0.75. The algorithm is shown in Table 2. In the following, this procedure will be denoted as IP2 (Iterative Procedure 2).

Given a training dataset $T_1 = \{x_n, \pi_n^1\}$ and $M$ iterations,
1. For $m$ from 1 to $M$:
   - Randomly split $T_m$ into two groups, $T_{r,m} = \{x_n, \pi_n^m\}$ and $T_{t,m} = \{x_n, \pi_n^m\}$, according to a split proportion $\beta$.
   - Learn a classifier $C_m$ from subset $T_{r,m}$.
   - Apply classifier $C_m$ to subset $T_{t,m}$.
   - Update $T_{t,m+1} = \{x_n, \pi_n^{m+1}\}$ with $\pi_{ni}^{m+1} \propto \pi_{ni}^1 \, p(x_n | y_n = i, C_m)$.
   - Update the training dataset $T_{m+1}$ as $T_{m+1} = \{T_{r,m}, T_{t,m+1}\}$.
2. Learn the final classifier using $T_{M+1}$.
Table 2. Randomization-based iterative procedure for weakly supervised learning (IP2).
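The sketch below adapts the previous IP1 loop to the randomized scheme of Table 2: at each iteration only the held-out fraction $(1 - \beta)$ of the training samples has its priors updated. Function and parameter names are again illustrative assumptions.

```python
import numpy as np

def iterate_ip2(X, priors_init, build_forest, n_iter=20, beta=0.75, seed=None):
    """Randomization-based iterative procedure IP2 (Table 2)."""
    rng = np.random.default_rng(seed)
    priors = priors_init.copy()
    n = len(X)
    for _ in range(n_iter):
        train_idx = rng.choice(n, size=int(beta * n), replace=False)   # T_r,m
        test_mask = np.ones(n, dtype=bool)
        test_mask[train_idx] = False                                   # T_t,m
        forest = build_forest(X[train_idx], priors[train_idx])         # learn C_m on T_r,m
        update = priors_init[test_mask] * forest.predict_proba(X[test_mask])
        priors[test_mask] = update / np.maximum(update.sum(axis=1, keepdims=True), 1e-12)
    return build_forest(X, priors)                                     # final classifier on T_{M+1}
```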
4 Application to semi-supervised learning

As a specific case of interest of the proposed weakly supervised strategy, we consider an application to semi-supervised learning. We first briefly review existing approaches and then detail our contribution.

4.1 Related work

Semi-supervised learning is reviewed in [5]. Four types of methods can be distinguished. The first type includes generative models, often exploiting Expectation-Maximization schemes that estimate the parameters of mono-modal Gaussian models [31] [5] or multi-modal Gaussian models [32]. Their advantage is the consistency of the mathematical framework with the probabilistic setting. The second category refers to discriminative models such as the semi-supervised support vector machine (S3VM) [33] [5]. Despite a mathematically sound basis and good performances, S3VMs are subject to local optimization issues and can be outperformed by other models depending on the dataset. Graph-based classifiers are another well-known category in semi-supervised learning [34] [5]. The approach is close to the K-nearest-neighbour approach, but similarities between examples are also taken into account. The principal drawback is that this classifier is mostly transductive: generalization properties are rather weak and performances decrease with unobserved data. The last family of semi-supervised models is formed by iterative schemes such as the self-training approach [35] or the co-training approach [36], the latter being applicable if the observation features can be split into two independent groups. Their advantages are the good performance reached by these methods and the simplicity of the approach. Their drawbacks mostly lie in the difficulty of characterizing convergence properties.


4.2 Self-training with soft random forests

A semi-supervised version of the iterative procedure proposed in the previous section can be derived. Following a self-training strategy, it consists in initially training a random forest from the groundtruthed training samples only. Then, at each iteration, the unlabelled data are processed by the current classifier and the K samples with the greatest class posteriors are appended to the training database to retrain the classifier. It should be stressed that, in the standard implementation of semi-supervised learning with SVMs and random forests, the new samples appended to the training set at each iteration are assigned hard class labels. In contrast, we benefit from the proposed weakly supervised decision trees. This is expected to reduce the propagation of classification errors. The sketch of the semi-supervised learning is given in Table 3.

It should be stressed that this semi-supervised procedure can be regarded as a specific case of the iterative weakly supervised learning in which the iterated random sampling of the training and test subsets is replaced by a deterministic sampling initially exploiting the presence of fully groundtruthed samples in the processed dataset.

Given an initial training dataset $T = \{T_L, T_U\}$, where $T_L$ contains the labelled data and $T_U$ the unlabelled data, and $M$ iterations,
1. For $m$ from 1 to $M$:
   - Learn a classifier $C_m$ from $T_L$.
   - Apply classifier $C_m$ to $T_U$.
   - For each class, transfer from $T_U$ to $T_L$ the most confident examples, with their weak labels, according to the probabilistic classification.
2. Generate the final classifier using $T_L$.
Table 3. Soft self-training procedure for semi-supervised learning.
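A compact sketch of the soft self-training loop of Table 3 follows. The confidence measure (maximum posterior) and the transfer of the k most confident samples per iteration, rather than per class, are illustrative simplifications; the forest interface is the same assumed API as in the previous sketches.

```python
import numpy as np

def soft_self_training(X_l, priors_l, X_u, build_forest, n_iter=10, k=5):
    """Soft self-training (Table 3): move the k most confident unlabelled
    samples per iteration into the training set, keeping soft priors."""
    X_l, priors_l, X_u = X_l.copy(), priors_l.copy(), X_u.copy()
    for _ in range(n_iter):
        if len(X_u) == 0:
            break
        forest = build_forest(X_l, priors_l)
        posteriors = forest.predict_proba(X_u)
        confident = np.argsort(posteriors.max(axis=1))[-k:]      # most confident samples
        X_l = np.vstack([X_l, X_u[confident]])
        priors_l = np.vstack([priors_l, posteriors[confident]])  # appended with weak (soft) labels
        X_u = np.delete(X_u, confident, axis=0)
    return build_forest(X_l, priors_l)
```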
5 Experiments

5.1 Simulation protocol

In this section, we compare four classification models: IP1 using soft random forests, IP2 using soft random forests, soft random forests alone, and the generative model proposed in [6] for weakly labelled data.

Given a supervised dataset, a weakly supervised training dataset is built. We distribute all the training examples into several groups according to predefined target class proportions (Table 4). All the instances in a given group are assigned the class proportions of the group. In Table 4, we show an example of target class proportions for a three-class dataset. In this example, we can create groups containing from one class (supervised learning) up to three classes. For each class-mixture case, different mixture complexities can be created: from one class dominating the mixture, i.e. the prior of one class being close to one, to equiprobable classes, i.e. equal values for the non-zero class priors.
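As a sketch of this simulation protocol, the function below turns a supervised dataset into a weakly supervised one: for each target prior vector of Table 4, it draws a group of samples whose class composition follows the target proportions and assigns that vector to every sample of the group. The grouping and sampling details (group size, drawing with replacement) are assumptions, since the paper does not specify them.

```python
import numpy as np

def make_weak_groups(X, y, target_priors, group_size, seed=None):
    """Build weakly supervised training data from a supervised dataset (X, y).

    For each target prior vector, a group of about `group_size` samples is drawn
    so that its class composition follows the target proportions, and every
    sample in the group receives that prior vector as its weak label (cf. Table 4).
    """
    rng = np.random.default_rng(seed)
    X_out, priors_out = [], []
    for prior in target_priors:
        for cls, p in enumerate(prior):
            n_cls = int(round(p * group_size))
            idx = rng.choice(np.flatnonzero(y == cls), size=n_cls, replace=True)
            X_out.append(X[idx])
            priors_out.append(np.tile(prior, (n_cls, 1)))
    return np.vstack(X_out), np.vstack(priors_out)
```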


To evaluate the performances of the proposed weakly supervised learning strategies, we consider different reference datasets of the UCI machine learning repository, so that the reported experiments can be reproduced. The three considered datasets have been chosen to provide representative examples of the datasets to be dealt with in computer vision applications. We do not consider datasets with two classes because they do not allow us to generate complex class mixtures.

D1 is an image segmentation dataset containing 7 classes of texture and 330 instances per class. Each sample is characterized by a 19-dimensional real feature vector. D1 is a typical computer vision dataset drawn from a database of 7 outdoor images (brickface, sky, foliage, cement, window, path, grass). D2 is the classical Iris dataset containing 3 classes and 50 instances per class. Each object is characterized by geometric features, i.e. length and width of the petals. D3 is the Synthetic Control Chart Time Series dataset, containing 6 classes of typical line evolutions, 100 instances per class, and 5 quantified descriptors. An interesting property of this dataset is that the distribution of the features within each class is not unimodal and cannot be easily modelled using a parametric approach. This is particularly relevant for computer vision applications, where object classes often lead to non-linear manifolds in the feature space. Dataset D3 was also chosen to investigate the classification of time series depicting different behaviours (cyclic, increasing vs. decreasing trend, upward vs. downward shift). This is regarded as a means to evaluate the ability to discriminate dynamic contents, which is particularly relevant for video analysis, i.e. activity recognition, trajectory classification, event detection.

Dataset with 3 classes, 1-class mixture labels (supervised learning):
(1, 0, 0), (0, 1, 0), (0, 0, 1)
Dataset with 3 classes, 2-class mixture labels:
(0.8, 0.2, 0), (0.2, 0.8, 0), (0.6, 0.4, 0), (0.4, 0.6, 0), (0.8, 0, 0.2), (0.2, 0, 0.8), (0.6, 0, 0.4), (0.4, 0, 0.6), (0, 0.8, 0.2), (0, 0.2, 0.8), (0, 0.6, 0.4), (0, 0.4, 0.6)
Dataset with 3 classes, 3-class mixture labels:
(0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8), (0.4, 0.2, 0.2), (0.2, 0.4, 0.2), (0.2, 0.2, 0.4)
Table 4. Example of training class priors for a dataset with 3 classes. Different cases are carried out, from the supervised labelling to the highest-complexity mixture.
Cross-validation over 100 tests is used to assess a mean classification rate: 90% of the data are used to train the classifier while the remaining 10% are used for testing. The dataset is randomly split for every test, and the procedure that assigns weak labels to the training data is carried out for every test. A mean correct classification rate is computed over the cross-validation.

5.2 Experiments on weakly supervised datasets

We report the weakly supervised experiments in Table 5 for the three datasets. Results are provided as a function of the training dataset and as a function of the class mixture complexity, from supervised learning (1-class mixture) to the maximum complexity mixture (mixture with all classes). Results are reported


for the iterative procedures IP1 (section 3.1) and IP2 (section 3.2), the weakly supervised learning of soft random forests, and the generative and discriminative models previously proposed in [6] and [13]. The latter respectively exploit Gaussian mixtures and a kernel Fisher discrimination technique.

Overall, the iterative procedure IP2 outperforms the other models. Even if the random forests alone are outperformed by the generative model on D2, the iterative procedure leads to improved classification. The explanation is that the class priors are iteratively refined to finally resort to less fuzzy priors. Experiments with dataset D3 particularly stress the relevance of the introduction of soft decision trees. Due to the interlaced structure of the feature distribution for each class of dataset D3, the generative and discriminative models perform poorly. In contrast, the weakly supervised random forests reach correct classification rates close to the supervised reference even with complex 6-class training mixtures. For instance, considering D3 and 6-class mixture labels, the iterative procedure IP2 combined with soft forests reaches 98.8% (vs. 100% in the supervised case), where the generative and discriminative models only reach 58.3% and 59.8% correct classification.

Dataset, type of mixtures | IP1 + soft trees | IP2 + soft trees | soft trees | Naive Bayes | Fisher + K-pca
D1, 1-class mixture |   -   |   -   | 96.1% | 83.7% | 89.7%
D1, 2-class mixture | 90.7% | 96.1% | 92.3% | 83.6% | 89.2%
D1, 3-class mixture | 88.7% | 95.9% | 91.2% | 84.4% | 89.5%
D1, 4-class mixture | 88.3% | 94.4% | 88.4% | 83.7% | 89.1%
D1, 5-class mixture | 85.0% | 94.1% | 88.8% | 83.8% | 89.1%
D1, 6-class mixture | 75.2% | 92.7% | 84.6% | 83.1% | 89.1%
D1, 7-class mixture | 55.1% | 81.4% | 62.6% | 75.1% | 85.9%
D2, 1-class mixture |   -   |   -   | 97.3% | 94.6% | 96.0%
D2, 2-class mixture | 97.3% | 97.3% | 90.6% | 95.3% | 87.3%
D2, 3-class mixture | 84.0% | 92.6% | 81.3% | 85.3% | 76.6%
D3, 1-class mixture |   -   |   -   | 100%  | 77.1% | 66.8%
D3, 2-class mixture | 90.5% | 100%  | 90.0% | 62.2% | 63.6%
D3, 3-class mixture | 91.3% | 99.5% | 89.3% | 62.1% | 61.5%
D3, 4-class mixture | 82.1% | 98.1% | 75.6% | 45.5% | 62%
D3, 5-class mixture | 74.6% | 97.3% | 82.1% | 47.3% | 59.1%
D3, 6-class mixture | 94.0% | 98.8% | 88.6% | 58.3% | 59.8%
Table 5. Classification performances for datasets D1, D2, and D3: the mean correct classification rate (%) is reported as a function of the complexity of the mixture labels for the 5 classification models: IP1 + soft trees, IP2 + soft trees, soft trees and random forest alone, an EM-based generative algorithm [6], and a discriminative algorithm [13].

To further illustrate the behaviour of the iterative procedures IP1 and IP2, we report in Figure 1 the evolution of the mean correct classification rate (for the test set) as a function of the iteration, for datasets D1 and D2 and several types of class mixtures. These plots state the relevance of procedure IP2 compared to procedure IP1. Whereas the latter does not lead to significant improvement over the iterations, the gain in correct classification rate can be up to 10% after the convergence of the IP2 procedure, for instance for dataset D2 and 2-class


mixtures. Convergence is typically observed within about ten iterations. These results can be explained by the fact that the IP2 procedure distinguishes, at each iteration, separate training and test sets to update the random forests and the class priors.

Fig. 1. Evolution of the performances of the iterative procedures IP1 and IP2 through the iterations: dataset D1 (left), dataset D3 (right).

5.3 Semi-supervised experiments

Semi-supervised experiments have been carried out using a procedure similar to that of the previous section. Training and test sets are randomly built for a given dataset. Each training set is composed of labelled and unlabelled samples. We report here results for datasets D2 and D3 with the following experimental setting. For dataset D2, the training dataset contains 9 labelled examples (3 for each class) and 126 unlabelled examples (42 for each class). For dataset D3, we focus on a two-class example considering only the samples corresponding to the normal and cyclic patterns. Training datasets contain 4 labelled samples and 86 unlabelled samples per class. This particular experimental setting is chosen to illustrate the relevance of semi-supervised learning when only very few fully labelled training samples are available. In any case, the upper bound of the classification performances of a semi-supervised scheme is given by the supervised case. Therefore, only a weak gain can be expected when a representative set of fully labelled samples is provided to the semi-supervised learning.

Five semi-supervised procedures are compared: three based on self-training (ST) strategies [5], with soft random forests (ST-SRF), with standard (hard) random forests (ST-RF), and with a naive Bayes classifier (ST-NBC); an EM-based naive Bayes classifier (EM-NBC) [6]; and the iterative procedure IP2 applied to soft random forests (IP2-SRF). Results are reported in Figure 2.

These semi-supervised experiments first highlight the relevance of the soft random forests compared to their standard versions. For instance, when comparing both within a self-training strategy, the soft random forests lead to a gain of 5% in correct classification on dataset D3. This is regarded as a direct consequence of the reduced propagation of initial classification errors with soft decisions.
As previously, the structure of the feature space for dataset D3 further illustrates the flexibility of the random forest schemes compared to the other ones, especially the generative models, which behave poorly.


These experiments also demonstrate the relevance of the weakly supervised learning IP2-SRF in a semi-supervised context. The latter compares favourably to the best self-training strategy (i.e. 90% vs. 82.5% correct classification for dataset D2 after 10 iterations). This can be justified by the relations between the two procedures. As mentioned in section 4, the self-training procedure with soft random forests can be regarded as a specific implementation of the iterative procedure IP2. More precisely, the self-training strategy consists in iteratively appending unlabelled samples to the training dataset. At a given iteration, among the samples not yet appended to the training set, those with the greatest measures of confidence in the classification are selected. Hence the classification decisions performed for the samples in the training set are never re-evaluated. In contrast to this deterministic update of the training set, the weakly supervised iterative procedure IP2 exploits a randomization-based strategy where unlabelled samples are randomly picked to build a training set at each iteration. Therefore, soft classification decisions are repeatedly re-evaluated with respect to the updated overall knowledge. Then, the proposed entropy criterion (equation (3)) implies that fully labelled samples are also implicitly given more weight, as in the soft training procedures. These different features support the better performances reported here for the iterative procedure IP2 combined with soft random forests.

Fig. 2. Classification performances in semi-supervised contexts: dataset D2 (left) and dataset D3 (right) restricted to the classes "standard pattern" and "cyclic pattern". Five approaches are compared: ST-SRF, ST-RF, ST-NBC, EM-NBC, and IP2-SRF (cf. text for details).

5.4 Application to fish school classification in sonar images

Weakly supervised learning is applied to fisheries acoustics data [13] formed by a set of fish schools automatically extracted from sonar acoustics data. Fish schools are extracted as connected components using a thresholding-based algorithm. Each school is characterized by a X-dimensional feature vector comprising geometric (i.e. surface, width, height of the school) and acoustic (i.e. backscattered energy) descriptors.
At the operational level, training samples would be issued from the sonar data in trawled areas, such that any training school would be assigned the relative priors of each class. With a view to performing a quantitative evaluation,


such weakly supervised situations are simulated from a groundtruthed fish school dataset. The latter has been built from sonar images in trawled regions depicting only one species.

From the results given in Table 6, we perform a comparative evaluation based on the same methods as in section 5.2 and Table 5. Class proportions have been simulated as presented in Table 4. Similar conclusions can be drawn. Overall, the iterative procedure with soft random forests (IP2-SRF) outperforms the other techniques, including the generative and discriminative models presented in [6] [13], except for the four-class mixture case where the soft random forests alone perform better (58% vs. 55%). The operational situations typically involve mixtures between two or three species, and the reported recognition performances (between 71% and 79%) are relevant w.r.t. ecological objectives in terms of species biomass evaluation and the associated expected uncertainty levels.

Dataset, type of mixtures | IP1 + soft trees | IP2 + soft trees | soft trees | Naive Bayes | Fisher + K-pca
D4, 1-class mixture |   -   |   -   | 89.3% | 66.9% | 69.9%
D4, 2-class mixture | 72.3% | 79.4% | 71.9% | 52%   | 71.7%
D4, 3-class mixture | 62.9% | 70%   | 68.3% | 51.2% | 65.9%
D4, 4-class mixture | 45.3% | 55%   | 58.7% | 47.9% | 56.2%
Table 6. Classification performances for the sonar image dataset D4: the mean correct classification rate (%) is reported as a function of the complexity of the mixture labels for the 5 classification models: IP1 + soft trees, IP2 + soft trees, soft trees and random forest alone, an EM-based generative algorithm [6], and a discriminative algorithm [13].

6 Conclusion

We have presented in this paper methodological contributions for learning object class models in a weakly supervised context. The key feature is that the classification information in the training datasets is given as the priors of the different classes for each training sample. This weakly supervised setting covers the supervised situations, the semi-supervised situations, and computer vision applications such as object recognition schemes that only specify the presence or absence of each class in each image of the training dataset.

Soft random forests for weakly supervised learning: From a methodological point of view, a first original contribution is a procedure to learn soft random forests in a weakly supervised context.
Interestingly, the latter is equivalent to the C4.5 random forest [19] when a supervised dataset is available, such that recognition performances for low-uncertainty priors can be granted. The second methodological contribution is an iterative procedure aimed at iteratively improving the learnt model. The experimental evaluation of these methodological contributions on several reference UCI datasets demonstrates their relevance compared to previous work, including generative and discriminative models [6] [13]. We have also shown that these weakly supervised learning schemes are relevant for semi-supervised datasets, for which they compare favourably to standard iterative techniques such as self-training procedures. The experiments support the statement, widely


acknowledged in pattern recognition, that, when relevantly iterated, soft decisions perform better than hard decisions.

Weakly supervised learning for computer vision applications: The reference UCI datasets considered in the reported experiments are representative of different types of computer vision datasets (i.e. patch classification, object recognition, trajectory analysis). For these datasets, we have shown that recognition performances close to the upper bounds provided by supervised learning can be reached by the proposed weakly supervised learning strategy, even when the training set mostly contains high-uncertainty class priors.

This is of particular importance as it supports the relevance of weakly supervised learning to process uncertain or contradictory expert or automated preliminary interpretations. In any case, such a learning scheme should ease, in computer vision applications, the construction of training datasets, which is often a very tedious task. The latter observation initially motivated the introduction of semi-supervised learning and of the weakly supervised case restricted to presence/absence information. Our work should be regarded as a further development of these ideas to take into account more general prior information. As illustrated for instance by the fisheries acoustics dataset, we believe that this generalization may permit reaching relevant classification performances when the knowledge of presence and/or absence information only leads to unsatisfactory classification rates [37] [6].

Given the central role of the randomization-based sampling in the iterative procedure, future work will focus on its analysis from both experimental and theoretical points of view. The objectives will be to characterize the convergence of this procedure as well as to evaluate different random sampling schemes beyond the uniform sampling tested in the reported experiments. A stopping criterion might also be considered.

References

1. Crandall, D.J., Huttenlocher, D.P.: Weakly supervised learning of part-based spatial models for visual object recognition. ECCV (2006)
2. Lazebnik, S., Schmid, C., Ponce, J.: A sparse texture representation using local affine regions. IEEE Transactions on PAMI 27 (2005) 1265–1278
3. Hongeng, S., Nevatia, R., Bremond, F.: Video-based event recognition: activity representation and probabilistic recognition methods. CVIU 96 (2004) 129–162
4. Torralba, A.: Contextual priming for object detection. IJCV 53 (2003) 169–191
5. Chapelle, O., Schölkopf, B., Zien, A.: Semi-supervised learning. MIT Press (2006)
6. Ulusoy, I., Bishop, C.: Generative versus discriminative methods for object recognition. CVPR 2 (2005) 258–265
7. Ponce, J., Hebert, M., Schmid, C., Zisserman, A.: Toward category-level object recognition. Lecture Notes in Computer Science, Springer (2006)
8. Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for object recognition. ECCV 1 (2000) 18–32
9. Fergus, R., Perona, P., Zisserman, A.: Weakly supervised scale-invariant learning of models for visual recognition. IJCV 71 (2006) 273–303
10. Schmid, C.: Weakly supervised learning of visual models and its application to content-based retrieval. IJCV 56 (2004) 7–16


11. Rossiter, J., Mukai, T.: Bio-mimetic learning from images using imprecise expert information. Fuzzy Sets and Systems 158 (2007) 295–311
12. van de Vlag, D., Stein, A.: Incorporating uncertainty via hierarchical classification using fuzzy decision trees. IEEE Transactions on GRS 45 (2007) 237–245
13. Lefort, R., Fablet, R., Boucher, J.M.: Combining image-level and object-level inference for weakly supervised object recognition. Application to fisheries acoustics. ICIP (2009)
14. Maccormick, J., Blake, A.: A probabilistic exclusion principle for tracking multiple objects. IJCV 39 (2000) 57–71
15. Neville, J., Jensen, D.: Iterative classification in relational data. AAAI Workshop on Learning Statistical Models from Relational Data (2000) 42–49
16. Neal, R., Hinton, G.: A view of the EM algorithm that justifies incremental, sparse and other variants. Kluwer Academic Publishers (1998)
17. Mc Lachlan, G., Krishnan, T.: The EM algorithm and extensions. Wiley (1997)
18. Kotsiantis, P., Pintelas, P.: Logitboost of simple Bayesian classifier. Informatica Journal 29 (2005) 53–59
19. Breiman, L.: Random forests. Machine Learning 45 (2001) 5–32
20. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and regression trees. Chapman & Hall (1984)
21. Quinlan, J.: Induction of decision trees. Machine Learning 1 (1986) 81–106
22. Quinlan, J.: C4.5: Programs for machine learning. Morgan Kaufmann Publishers (1993)
23. Loh, W.Y., Shih, Y.Y.: Split selection methods for classification trees. Statistica Sinica 7 (1997) 815–840
24. Kass, G.: An exploratory technique for investigating large quantities of categorical data. Journal of Applied Statistics 29 (1980) 119–127
25. Breiman, L.: Bagging predictors. Machine Learning 26 (1996) 123–140
26. Ho, T.K.: Random decision forest. ICDAR (1995)
27. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Machine Learning 36 (2006) 3–42
28. Culp, M., Michailidis, G.: An iterative algorithm for extending learners to semi-supervised settings. The 2007 Joint Statistical Meetings (2007)
29. Haffari, G., Sarkar, A.: Analysis of semi-supervised learning with the Yarowsky algorithm. 23rd Conference on Uncertainty in Artificial Intelligence (2007)
30. Macskassy, S.A., Provost, F.: A simple relational classifier. Proceedings of the Second Workshop on Multi-Relational Data Mining (2003) 64–76
31. Nigam, K., McCallum, A., Thrun, S., Mitchell, T.: Learning to classify text from labeled and unlabeled documents. AAAI (1998)
32. Nigam, K., McCallum, A., Thrun, S., Mann, G.: Text classification from labeled and unlabeled documents using EM. Machine Learning 39 (2000) 103–134
33. Joachims, T.: Transductive inference for text classification using support vector machines. ICML (1999) 200–209
34. Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-supervised learning using Gaussian fields and harmonic functions. ICML (2003) 912–919
35. Rosenberg, C., Hebert, M., Schneiderman, H.: Semi-supervised self-training of object detection models. WACV (2005)
36. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. WCLT (1998) 92–100
37. Fablet, R., Lefort, R., Scalabrin, C., Massé, J., Boucher, J.M.: Weakly supervised learning using proportion based information: an application to fisheries acoustics. International Conference on Pattern Recognition (2008)
