13.07.2015

Shape and Appearance Context Modeling - UCLA Vision Lab

Algorithm 1: Fast occurrence computation

    Data: Functions A and S
    Result: Occurrence matrix Θ
     1  begin
     2      Use (7) to compute G from a single pass inspection of A
            // Compute |D_s|, α_{D_s}, and ∇·D_s
     3      foreach x ∈ Λ do
     4          |D_{S(x)}| ← |D_{S(x)}| + 1
     5          if IsCorner(x) then
     6              Set α_{D_{S(x)}}(x)
     7              ∇·D_{S(x)} ← ∇·D_{S(x)} ∪ {x}
            // Use (6) to compute Θ
            // |p|, α_p, and ∇·p known a priori
     8      foreach s ∈ 𝒮 do
     9          foreach p ∈ 𝒫 do
    10              foreach x ∈ ∇·D_s do
    11                  foreach y ∈ ∇·p do
    12                      Θ(·, s, p) ← Θ(·, s, p) +
    13                          α_{D_s}(x) α_p(y) G(·, x + y)
    14              Θ(·, s, p) ← |D_s|^{-1} |p|^{-1} Θ(·, s, p)
    15      return Θ
    16  end

domain, the definition can be generalized to any dimension, and Theorem 2 still holds.

Complexity analysis. Given $S$ and $A$, the naïve approach to computing $\Theta$ costs $O(N^4)$ in time (we assume $M \sim N$), which is too much for real-time applications, even if $N$ is not very large. In [8] a dynamic programming approach reduces the cost to $O(N^3)$. In [19] a particular partition $\mathcal{P}$, where every $p \in \mathcal{P}$ is a square ring defined by $|\nabla \cdot p| = 8$ corners, enables a computation cost (see footnote 10) of $O(N^2 l |\nabla \cdot p|) = O(N^2 C_{\mathcal{P}})$, where $C_{\mathcal{P}} = l |\nabla \cdot p|$ represents the total number of corners of $\mathcal{P}$.

We now calculate the computational cost of Algorithm 1. Line 2 can be evaluated by a single pass inspection of $A$, and has the same computational cost as an integral histogram, i.e. $O(N^2)$. Lines 3-7 are another single pass inspection of $S$, with cost $O(N^2)$. Line 12 costs $O(1)$. Line 11 contributes an average multiplying factor of $C_{\mathcal{P}}/l$, where $C_{\mathcal{P}} = \sum_i |\nabla \cdot p_i|$. Line 10 contributes an average multiplying factor of $C_{\mathcal{S}}/n$, where $C_{\mathcal{S}} = \sum_i |\nabla \cdot D_{s_i}|$. Lines 8 and 9 are multiplying factors of $n$ and $l$, respectively. Therefore, the cost of lines 8-13 is $O(C_{\mathcal{S}} C_{\mathcal{P}})$. Finally, the total cost of Algorithm 1 is $O(N^2 + C_{\mathcal{S}} C_{\mathcal{P}})$. In practice we have $C_{\mathcal{S}} C_{\mathcal{P}} \sim N^2$.
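
As a worked count, the same bound for lines 8-13 follows without averaging: line 12 runs once for every pair made of a corner of some region $D_s$ and a corner of some cell $p$, so the total number of executions factorizes.

$$\sum_{s \in \mathcal{S}} \sum_{p \in \mathcal{P}} |\nabla \cdot D_s| \, |\nabla \cdot p| = \Big( \sum_{s \in \mathcal{S}} |\nabla \cdot D_s| \Big) \Big( \sum_{p \in \mathcal{P}} |\nabla \cdot p| \Big) = C_{\mathcal{S}} \, C_{\mathcal{P}} .$$
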
Therefore, Algorithm 1 has an effective cost of $O(N^2)$, which is $C_{\mathcal{P}}$ (the number of corner points of the partition $\mathcal{P}$) times faster than the state of the art [19]. It is interesting to note that Algorithm 1 is only marginally sensitive to the choice of the partition $\mathcal{P}$, which, as opposed to [19], is here allowed to be arbitrary.

Footnote 10: In [19] $|\nabla \cdot p|$ is part of the hidden constants. Here we make the dependency explicit to better compare their approach with ours.

4. Bag-of-features modeling

In this section, as well as in Sections 5 and 6, we are interested in designing a highly distinctive descriptor for an image $I$ belonging to $\mathcal{I}$, the space of all the images defined on a discrete domain $\Lambda$ of $M \times N$ pixels. To this end we process the image by applying an operator $\Phi: \mathcal{I} \times \Lambda \to \mathbb{R}^r$, such that $(I, x)$ is mapped to a local descriptor $\varphi(x) \doteq \Phi(I, x)$. The operator $\Phi$ could be a bank of linear filters, or any non-linear operation.

Once $\varphi$ is available, the descriptor is computed in two steps. The first one performs a vector quantization of $\varphi$ according to a quantization function $q: \mathbb{R}^r \to \mathcal{A}$, with quantization levels $\mathcal{A} = \{a_1, \dots, a_m\}$. This produces the appearance labeled image $A(x) \doteq q \circ \varphi(x)$ (see Figure 3 for an example). We refer to $\mathcal{A}$ as the appearance dictionary, made of appearance labels learnt a priori. The second step computes the histogram of the labels $h: \mathcal{A} \to [0, 1]$, such that the label $a$ maps to

$$h(a) \doteq P[A(x) = a] . \tag{8}$$

The image descriptor is defined to be the histogram $h$.

HOG Log-RGB operator. In Section 7 we experiment with several operators $\Phi$, such as different color spaces and filter banks, and test their matching performance with the descriptor (8). The best performer operates in the RGB color space, and is such that $\varphi(x) \doteq \big(\mathrm{HOG}(\nabla \log(I_R), x); \mathrm{HOG}(\nabla \log(I_G), x); \mathrm{HOG}(\nabla \log(I_B), x)\big)$, where $I_R$, $I_G$, $I_B$ are the R, G, and B channels of $I$, respectively.
The operator $\mathrm{HOG}(\cdot, x)$ computes the $l$-bin histogram of oriented gradients of its argument on a region of $w \times w$ pixels around $x$. The gradient of the Log-RGB space has an effect similar to homomorphic filtering, and makes the descriptor robust to illumination changes.

5. Appearance context modeling

The main drawback of the bag-of-features model is that images of different objects that share the same appearance label distribution $h(a)$ also share the same descriptor, annihilating the distinctiveness that we are seeking. This is because (8) does not incorporate any description of how the object appearance is distributed in space. On the other hand, this information could be captured by computing the spatial co-occurrence between appearance labels.

Appearance context. More precisely, the co-occurrence matrix $\Theta$, computed on the appearance labeled image $A(x)$ with the plane partition $\mathcal{P}$ depicted in Figure 4, will be referred to as the appearance context descriptor of $I$, which is an $m \times m \times l$ matrix.

Appearance context vs. bag-of-features. Note that the information carried by the descriptor (8) is included in the appearance context descriptor. In fact, by using (6) one can show that $\Theta$ reduces to (8); in particular, for every $b \in \mathcal{A}$ we have

$$h(a) = \frac{1}{|\Lambda|} \sum_{p \in \mathcal{P}} |p| \, \Theta(a, b, p) .$$
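
Algorithm 1 can be illustrated with the minimal pure-Python sketch below. It is not the authors' implementation, and its conventions are assumptions. Appearance labels are integers in range(m) and shape labels are integers in range(n). Each cell p of the partition is an arbitrary set of integer offsets. In place of (7), the sketch uses its own discrete convention: G is the second-order integral image $G(a, u) = \sum_{v \le u} G_1(a, v)$ of the integral histogram $G_1$. The corner weights α of a pixel set are accumulated in one pass: each pixel x adds +1 at x, -1 at x - (1, 0) and x - (0, 1), and +1 at x - (1, 1), so interior contributions cancel. Under this convention, the double corner sum of lines 10-13 reproduces the direct count of label pairs exactly. The helpers corner_weights, second_order_integral, and occurrence_matrix are hypothetical names. The image is padded with the non-label -1 so that every evaluation point x + y stays in range, and the update of line 12 is looped over labels a rather than vectorized as $G(\cdot, x + y)$.

```python
# Illustrative sketch of Algorithm 1; names and the discrete corner convention are assumptions.
from collections import defaultdict


def corner_weights(points):
    """Hypothetical helper: corner coefficients alpha_R of a finite pixel set R.

    Pixel (i, j) adds +1 at (i, j), -1 at (i-1, j) and (i, j-1), +1 at (i-1, j-1).
    Contributions cancel inside R, so only corner points keep a weight in {±1, ±2}.
    """
    alpha = defaultdict(int)
    for i, j in points:
        alpha[(i, j)] += 1
        alpha[(i - 1, j)] -= 1
        alpha[(i, j - 1)] -= 1
        alpha[(i - 1, j - 1)] += 1
    return {u: c for u, c in alpha.items() if c != 0}


def second_order_integral(labels, label):
    """Hypothetical helper: G2(u) = sum_{v <= u} G1(v), G1 = integral image of [labels == label]."""
    rows, cols = len(labels), len(labels[0])
    g1 = [[0] * cols for _ in range(rows)]
    g2 = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        run1 = run2 = 0  # running row sums of the indicator and of G1
        for j in range(cols):
            run1 += int(labels[i][j] == label)
            g1[i][j] = run1 + (g1[i - 1][j] if i else 0)
            run2 += g1[i][j]
            g2[i][j] = run2 + (g2[i - 1][j] if i else 0)
    return g2


def occurrence_matrix(A, S, partition, m, n):
    """theta[a][s][k] = P[A(y) = a | y in x + p_k, x in D_s], computed from corners only."""
    rows, cols = len(A), len(A[0])
    reach = max(max(abs(di), abs(dj)) for p in partition for di, dj in p)
    pad = reach + 2  # padding keeps every index x + y inside the canvas
    canvas = [[-1] * (cols + 2 * pad) for _ in range(rows + 2 * pad)]  # -1 matches no label
    for i in range(rows):
        for j in range(cols):
            canvas[i + pad][j + pad] = A[i][j]
    G = [second_order_integral(canvas, a) for a in range(m)]  # cf. line 2

    regions = defaultdict(list)  # D_s in canvas coordinates, one pass over S (cf. lines 3-7)
    for i in range(rows):
        for j in range(cols):
            regions[S[i][j]].append((i + pad, j + pad))
    region_alpha = {s: corner_weights(pts) for s, pts in regions.items()}
    cell_alpha = [corner_weights(p) for p in partition]  # known a priori

    theta = [[[0.0] * len(partition) for _ in range(n)] for _ in range(m)]
    for s, pts in regions.items():  # labels with an empty D_s keep theta = 0
        for k, p in enumerate(partition):
            for (xi, xj), ax in region_alpha[s].items():
                for (yi, yj), ay in cell_alpha[k].items():
                    for a in range(m):
                        theta[a][s][k] += ax * ay * G[a][xi + yi][xj + yj]
            for a in range(m):
                theta[a][s][k] /= len(pts) * len(p)  # cf. line 14
    return theta
```

Because each cell is a plain set of offsets, the sketch accepts an arbitrary partition, matching the remark that Algorithm 1 does not restrict $\mathcal{P}$ to square rings.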
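
The two steps behind the bag-of-features descriptor (8) can be sketched as follows. The quantizer q is assumed here to be a nearest-center rule over m hypothetical dictionary centers, obtained beforehand for instance with k-means. The text only states that the appearance labels are learnt a priori. The per-pixel descriptors φ(x) are taken as given, stored as nested lists of feature vectors. The names quantize and label_histogram are illustrative.

```python
# Illustrative sketch of the bag-of-features descriptor; the nearest-center q is an assumption.


def quantize(phi, centers):
    """A(x) = q(phi(x)): index of the nearest center under squared Euclidean distance."""
    return [[min(range(len(centers)),
                 key=lambda a: sum((u - c) ** 2 for u, c in zip(v, centers[a])))
             for v in row] for row in phi]


def label_histogram(A, m):
    """h(a) = P[A(x) = a]: the fraction of pixels carrying label a, as in (8)."""
    counts = [0] * m
    for row in A:
        for a in row:
            counts[a] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

Note that label_histogram discards pixel positions entirely, which is precisely the loss of spatial information that motivates the appearance context.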
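
The HOG Log-RGB operator φ(x) can be sketched per pixel as below. Several details are assumptions rather than statements from the text: central-difference gradients, signed orientations in [0, 2π) split into l equal bins, magnitude-weighted votes, skipping of border pixels, and strictly positive intensities so that the logarithm is defined. The function names orientation_histogram and hog_log_rgb are hypothetical.

```python
# Illustrative sketch of the HOG Log-RGB operator; binning and gradient details are assumptions.
import math


def orientation_histogram(channel, x, l, w):
    """l-bin histogram of gradient orientations of `channel` over a w x w window around x.

    Votes are weighted by gradient magnitude; orientations are signed, in [0, 2*pi).
    """
    rows, cols = len(channel), len(channel[0])
    i0, j0 = x
    r = w // 2
    hist = [0.0] * l
    for i in range(i0 - r, i0 - r + w):
        for j in range(j0 - r, j0 - r + w):
            if not (1 <= i < rows - 1 and 1 <= j < cols - 1):
                continue  # central differences are undefined on the image border
            dx = (channel[i][j + 1] - channel[i][j - 1]) / 2.0
            dy = (channel[i + 1][j] - channel[i - 1][j]) / 2.0
            magnitude = math.hypot(dx, dy)
            if magnitude == 0.0:
                continue
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            hist[min(int(angle * l / (2.0 * math.pi)), l - 1)] += magnitude
    return hist


def hog_log_rgb(image, x, l=8, w=5):
    """phi(x) = (HOG(grad log I_R, x); HOG(grad log I_G, x); HOG(grad log I_B, x)).

    `image[i][j]` is an (R, G, B) tuple of strictly positive intensities.
    """
    descriptor = []
    for c in range(3):
        log_channel = [[math.log(pixel[c]) for pixel in row] for row in image]
        descriptor.extend(orientation_histogram(log_channel, x, l, w))
    return descriptor
```

Multiplying a channel by a constant gain k adds log k to its logarithm, which the gradient then removes. This is one concrete way to see the illumination robustness attributed to the Log-RGB gradient.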
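
The statement that the appearance context contains the histogram (8) can be checked numerically with the brute-force sketch below, which does not use Algorithm 1. Here the shape labels are the appearance labels themselves, so $\Theta(a, b, p)$ is the fraction of pairs with $A(x) = b$ and $y \in x + p$ for which $A(y) = a$. As a hypothetical choice, the partition is made of square rings $p_r = \{z : \max(|z_1|, |z_2|) = r\}$ that together cover every offset between two pixels of $\Lambda$. With a partition that leaves some offsets uncovered, the sum would miss the pixels reachable only through those offsets. The names appearance_context, square_rings, and histogram_from_context are illustrative.

```python
# Illustrative brute-force check of h(a) = 1/|Lambda| * sum_p |p| Theta(a, b, p); names are assumptions.
from collections import Counter


def appearance_context(A, partition, m):
    """Brute-force theta[a][b][k] = P[A(y) = a | y in x + p_k, A(x) = b]."""
    rows, cols = len(A), len(A[0])
    size = Counter(label for row in A for label in row)  # |D_b|
    theta = [[[0.0] * len(partition) for _ in range(m)] for _ in range(m)]
    for i in range(rows):
        for j in range(cols):
            b = A[i][j]
            for k, p in enumerate(partition):
                for di, dj in p:
                    yi, yj = i + di, j + dj
                    if 0 <= yi < rows and 0 <= yj < cols:
                        theta[A[yi][yj]][b][k] += 1.0 / (size[b] * len(p))
    return theta


def square_rings(radius):
    """Hypothetical partition: ring r holds the offsets z with max(|z1|, |z2|) = r."""
    return [{(di, dj) for di in range(-r, r + 1) for dj in range(-r, r + 1)
             if max(abs(di), abs(dj)) == r} for r in range(radius + 1)]


def histogram_from_context(theta, partition, b, num_pixels):
    """Recover h(a) from theta for any label b that occurs in the image."""
    return [sum(len(p) * theta[a][b][k] for k, p in enumerate(partition)) / num_pixels
            for a in range(len(theta))]
```

Any occurring label b gives the same histogram, which is why the bag-of-features descriptor adds no information once $\Theta$ is available.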
