Feature-Based Stereo Matching Using Graph Cuts

ASCI – IPA – SIKS tracks, ICT.OPEN, Veldhoven, November 14–15, 2011
2 Disparity Estimation

Our disparity estimation algorithm consists of four main stages. First, feature points (key points) are extracted and matched, and the image is over-segmented using mean-shift colour segmentation. Second, we perform initial disparity estimation with adaptive block matching. Third, we fit planes that estimate the geometry of each segment and we cluster the resulting disparity planes that describe the depicted scene. Finally, a disparity plane is assigned to each segment using graph cuts. The flowchart of the algorithm is shown in Fig. 1. The four steps are described in detail in subsections 2.1 to 2.4.

Figure 1: The flowchart of the proposed algorithm.

2.1 Colour Segmentation and SURF Key Point Extraction

Most camera calibration algorithms extract key (salient) points and match them to determine the transformation between cameras, as in [18]. Such key points are commonly close to corners or edges inside the image, where with high probability there exists a disparity discontinuity. Disparities at such locations are hard to estimate because of the intrinsic disparity smoothness assumption of block matching inside the matching window. Although such non-trivial disparities are easy to estimate by matching key points between stereo pairs, recent state-of-the-art disparity estimation algorithms do not incorporate the key point disparities into the disparity estimation. We propose to incorporate the information obtained from salient points into the estimation of the initial disparity. Our approach decreases the noise by using a restricted search space, because it reduces the tendency of many algorithms to overestimate the disparities. The overall gain depends on the validity and number of matching key points found in pairs of rectified images. Since the images are already rectified for stereo matching, the matching key points should lie on the same epipolar line. Therefore, the vertical positions of the key points should satisfy:

    \forall s \in S : |y_{sL} - y_{sR}| < 0.5,    (1)

where S is the set of salient points for the right and left images, and y_{sL} and y_{sR} are the vertical positions of those points in the left and right views, respectively. Mean-shift colour segmentation is applied to over-segment the image into homogeneous image regions in which we do not expect to have a disparity discontinuity. Since we know the disparity of the key points and in which segment each key point is located, we can obtain a rough estimate of the disparity of the segment efficiently with a bounded disparity search given by:

    \forall t \in T \wedge \forall (x, y) \in t : d_{t,low} \le d(x, y) \le d_{t,high}.    (2)
Here d_{t,low} and d_{t,high} are defined as:

    d_{t,low} = \begin{cases} \max\{\lfloor \theta_1 - \theta_2 \rfloor, d_{min}\} & \text{if } K_t \neq \emptyset \\ d_{min} & \text{otherwise,} \end{cases}
    d_{t,high} = \begin{cases} \min\{\lceil \theta_3 - \theta_2 \rceil, d_{max}\} & \text{if } K_t \neq \emptyset \\ d_{max} & \text{otherwise,} \end{cases}    (3)

where \theta_1, \theta_2 and \theta_3 are:

    \theta_1 = \min\{\forall (x, y) \in K_t : |x_L - x_R|\},
    \theta_2 = \alpha \times (d_{max} - d_{min}),
    \theta_3 = \max\{\forall (x, y) \in K_t : |x_L - x_R|\},
    K_t : \forall (x, y) \in t \cap S.    (4)

We define T as the set of segments, d(x, y) is the disparity of the pixel at location (x, y), d_{max} and d_{min} are the maximum and minimum possible disparities for the image pair, d_{t,low} and d_{t,high} are the lower and upper boundaries of the disparity range to search for segment t, and \alpha is a scaling coefficient ranging between 0 and 1. Experimentally we found that choosing \alpha equal to 0.25 produces appropriate disparity ranges for various image datasets.

2.2 Local Pixel Matching

Local pixel matching is based on a matching cost function and an aggregation window around the pixel of interest. The most common choices for the cost function are the sum of squared differences (SSD) and the sum of absolute differences (SAD). We chose SAD as the matching cost for both pixel intensity and gradient:

    C_{inten}(i, j, d) = |I_L(i, j) - I_R(i + d, j)|,    (5)

    C_{grad}(i, j, d) = |\nabla_x I_L(i, j) - \nabla_x I_R(i + d, j)| + |\nabla_y I_L(i, j) - \nabla_y I_R(i + d, j)|,    (6)

    d_L(x, y) = \arg\min_{d} \sum_{i, j \in R_{xy}} \left[ C_{inten}(i, j, d) + C_{grad}(i, j, d) \right],    (7)
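For illustration, the sketch below computes the per-segment disparity search ranges of Eqs. 2-4 from matched key points. It is a minimal sketch rather than the authors' implementation: the function name, the label-map representation of the mean-shift segments, and the (x, y) array layout of the matches are assumptions made here, and the final clamp that keeps the range non-empty is added for safety.

```python
import numpy as np

def segment_disparity_bounds(seg_labels, kp_left, kp_right, d_min, d_max, alpha=0.25):
    """Per-segment disparity search ranges from matched key points (Eqs. 1-4), a sketch.

    seg_labels : (H, W) integer label map from the mean-shift over-segmentation.
    kp_left, kp_right : (N, 2) arrays of matched key point coordinates (x, y)
        in the left and right rectified images.
    Returns a dict {segment_label: (d_low, d_high)}.
    """
    # Eq. 1: keep only matches whose vertical positions agree (same epipolar line).
    keep = np.abs(kp_left[:, 1] - kp_right[:, 1]) < 0.5
    kp_left, kp_right = kp_left[keep], kp_right[keep]

    theta2 = alpha * (d_max - d_min)                              # search margin (theta_2)
    bounds = {s: (d_min, d_max) for s in np.unique(seg_labels)}   # default: full range

    # Key point disparities and the segment each left key point falls into (K_t).
    kp_disp = np.abs(kp_left[:, 0] - kp_right[:, 0])
    kp_seg = seg_labels[kp_left[:, 1].astype(int), kp_left[:, 0].astype(int)]

    for s in np.unique(kp_seg):
        d_kp = kp_disp[kp_seg == s]
        theta1, theta3 = d_kp.min(), d_kp.max()               # min / max key point disparity
        d_low = max(int(np.floor(theta1 - theta2)), d_min)    # Eq. 3
        d_high = min(int(np.ceil(theta3 - theta2)), d_max)    # Eq. 4
        bounds[s] = (d_low, max(d_low, d_high))                # safety clamp, not in the paper
    return bounds
```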


where d is the disparity value, subject to Eq. 2, and R_{xy} is the aggregation window constructed around the pixel at (x, y). The contents of the aggregation window should not contain disparity discontinuities in order to obtain reliable matching scores. To obtain such aggregation windows, we use the following adaptive box matching approach:

• Take a box around the pixel of interest.

• Aggregate the matching cost only for the pixels that lie in the same region as the pixel of interest inside the box, as expressed in Eq. 7. The calculation of the right-to-left disparity is similar to that of the left-to-right disparity, but this time the right view is used as reference.

The matching is done for both the reference (left) image and the right image, and the disparity is assigned with a winner-takes-all (WTA) optimization. In order to find non-occluded (reliable) matches, a cross-check is performed. To alleviate the foreground fattening effect described in [6], the minimum value of the left and right disparities is used as the final initial estimate:

    d_{x,y} = \begin{cases} d_L(x, y) & \text{if } d_L(x, y) = d_R(x, y) \\ \min\{d_L(x, y), d_R(x, y)\} & \text{otherwise.} \end{cases}

Fig. 2 shows the initial disparity estimation results with and without key points. The region where the effect of the key points can be clearly seen is encircled with a red ellipse. The error around this region is suppressed by the disparity range restrictions implied by the key points inside the region.

Figure 2: The initial disparity result; (left) without SURF, (right) with SURF.

2.3 Determining Plane Parameters

The next step after the initial disparity estimation is to estimate the plane parameters that represent the scene. The general form of the plane equation is:

    d(x, y) = ax + by + c,    (8)

where a, b and c are the plane parameters to estimate, and x, y and d(x, y) are the row, column and initial disparity values of the pixels. There are three main approaches to estimate the disparity plane parameters: (1) a RANSAC solution, (2) a histogram-based solution, and (3) a least-squares solution. Yang et al. incorporate RANSAC in their algorithm [10]. RANSAC is very robust to outliers: the algorithm can work effectively even when only 50 percent of the pixels are inliers. Wang and Zheng have shown that RANSAC provides even better solutions than the histogram-based approach; however, the result depends mostly on the initial set of the algorithm [7]. The third approach is sensitive to outliers. Hong and Chen used a least-squares solution with only the non-occluded pixel disparities inside the segment [9]. Because of its superior performance and robustness to outliers, we selected the RANSAC algorithm, considering only non-occluded disparities as input. In most disparity estimation algorithms, the estimation of disparity plane parameters is either performed for all of the segments or until a sufficient number of plane equations is found to represent the scene. However, fitting planes for all segments is computationally costly, and may lead to erroneous disparity planes in small or noisy segments. Because RANSAC works best when there are at least 50 percent inliers, and because large regions provide larger clusters of reliable disparities than smaller regions, we opt to apply RANSAC only to segments that contain more than 100 pixels and of which at least 50 percent of the pixels are non-occluded. The disparity estimation result after plane fitting is shown in Fig. 3.

Figure 3: The disparity map after plane fitting.
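Before moving on, the sketch below shows one plausible reading of the cross-check and minimum rule used for the initial estimate in Section 2.2 above, assuming integer-valued disparity maps; sampling the right-reference map at x - d_L to compare the two views is a standard choice, but that detail is not spelled out in the text, so treat it as an assumption of this sketch.

```python
import numpy as np

def fuse_disparities(d_left, d_right):
    """Left-right cross-check plus the minimum rule of Section 2.2 (a sketch).

    d_left  : (H, W) integer WTA disparities with the left image as reference.
    d_right : (H, W) integer WTA disparities with the right image as reference.
    Returns the fused initial estimate and a boolean mask of non-occluded pixels.
    """
    h, w = d_left.shape
    xs = np.tile(np.arange(w), (h, 1))

    # A left pixel (x, y) with disparity d corresponds to right pixel (x - d, y),
    # so sample the right-reference map at those locations for the comparison.
    x_in_right = np.clip(xs - d_left, 0, w - 1).astype(int)
    d_right_at_match = np.take_along_axis(d_right, x_in_right, axis=1)

    # Cross-check: the match is reliable (non-occluded) when both maps agree.
    non_occluded = d_left == d_right_at_match

    # Minimum rule against foreground fattening [6]: where the maps disagree,
    # keep the smaller of the two disparities as the initial estimate.
    fused = np.where(non_occluded, d_left, np.minimum(d_left, d_right_at_match))
    return fused, non_occluded
```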
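For the plane fitting just described, the following sketch fits d = ax + by + c to the non-occluded disparities of a single segment with a small hand-rolled RANSAC loop. The paper uses RANSAC [17] but does not give its settings, so the iteration count, the inlier tolerance, and the final least-squares refinement are illustrative assumptions; the caller is expected to invoke it only for segments with more than 100 pixels of which at least 50 percent are non-occluded, as stated above.

```python
import numpy as np

def fit_segment_plane(xs, ys, ds, n_iters=200, inlier_tol=1.0, rng=None):
    """RANSAC fit of a disparity plane d = a*x + b*y + c (Eq. 8), a sketch.

    xs, ys, ds : 1-D arrays with the coordinates and initial disparities of the
        non-occluded pixels of one segment.
    Returns (a, b, c), or None if no plane could be fitted.
    """
    rng = rng or np.random.default_rng(0)
    n = len(ds)
    if n < 3:
        return None

    A = np.column_stack([xs, ys, np.ones(n)])
    best_count, best_inliers = 0, None
    for _ in range(n_iters):
        sample = rng.choice(n, size=3, replace=False)
        try:
            plane = np.linalg.solve(A[sample], ds[sample])   # plane through 3 random pixels
        except np.linalg.LinAlgError:
            continue                                         # degenerate (collinear) sample
        residuals = np.abs(A @ plane - ds)
        inliers = residuals < inlier_tol
        if inliers.sum() > best_count:
            best_count, best_inliers = inliers.sum(), inliers

    if best_inliers is None:
        return None
    # Refine with least squares on the inliers of the best hypothesis.
    a, b, c = np.linalg.lstsq(A[best_inliers], ds[best_inliers], rcond=None)[0]
    return a, b, c
```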
2.4 Disparity Plane Assignment Using Graph Cuts

The final stage of the algorithm assigns a disparity plane to each image segment by minimizing an energy function that incorporates a smoothness constraint. The energy minimization problem is solved using a graph cut approach in which each node corresponds to a segment. Let P be the set of disparity plane parameter labels. Our aim is to find a labelling f that assigns each segment t ∈ T to its plane label p ∈ P by minimizing the following energy function:


    E(f) = E_{data}(f) + E_{smooth}(f),    (9)

where E_{data}(f) is the cost of assigning plane labels to the segments. In most of the state-of-the-art algorithms, such as [8, 9], the matching cost evaluated by Eq. 7 over all non-occluded pixels is used as the data term. In this work, instead of using these matching costs, we propose to use the following modified data cost (MDC):

    E_{data}(f) = \sum_{t} \sum_{(x,y)} \lambda \, |d_{f(t)}(x, y) - d(x, y)| \, e^{-n/m},
        \forall t \in T, \ \forall (x, y) \in t \setminus O_t,    (10)

in which O_t is the set of occluded pixels in t, n is the number of non-occluded pixels that have the same initial disparity as the disparity after plane fitting, m is the number of non-occluded pixels inside the segment, \lambda is a scaling coefficient, and d_{f(t)}(x, y) is the disparity of the pixel (x, y) after fitting a plane with label f(t):

    d_{f(t)}(x, y) = a_{f(t)} x + b_{f(t)} y + c_{f(t)}.    (11)

Using the absolute difference of the disparities rather than the matching cost means that any refinement after the initial disparity estimation step directly improves the accuracy of E_{data}(f). Therefore, any effort to obtain a better initial estimate, such as preventing foreground fattening, directly affects the final disparity estimation. E_{smooth}(f) is a smoothness term that penalizes discontinuities in the plane labels of neighboring segments. We define E_{smooth}(f) as:

    E_{smooth}(f) = \sum_{t} \sum_{q} \gamma(t, q) \, (1 - \delta(f(t), f(q))),
        \forall t \in T, \ \forall q \in N(t),    (12)

where N(t) is the set of neighbors of t and \gamma(t, q) is:

    \gamma(t, q) = w \beta e^{-\tau^2 / \sigma^2},    (13)

where w and \sigma are scaling parameters, and \beta and \tau are the boundary length and mean colour difference between t and q, respectively.
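To make the construction of Eq. 9 concrete, the sketch below assembles the MDC data term of Eq. 10 as a segment-by-label cost matrix, together with the pairwise weights of Eqs. 12-13, which is the form an off-the-shelf alpha-expansion graph-cut solver (e.g. the method of [12]) would consume. The per-segment data layout and the rounding-based agreement test used for n are assumptions of this sketch; the parameter values follow those reported in Section 3.1 (lambda = 10, w = 25, sigma = 150).

```python
import numpy as np

def mdc_data_cost(segments, planes, lam=10.0):
    """Modified data cost (Eq. 10) for every segment/plane-label pair, a sketch.

    segments : list of dicts, one per segment, with 1-D arrays 'x', 'y', 'd'
        over the segment's non-occluded pixels.
    planes : (L, 3) array of plane labels, each row (a, b, c) as in Eq. 11.
    Returns an (S, L) cost matrix for the graph-cut data term.
    """
    cost = np.zeros((len(segments), len(planes)))
    for s, seg in enumerate(segments):
        x, y, d = seg['x'], seg['y'], seg['d']
        m = len(d)                                    # non-occluded pixels in the segment
        for l, (a, b, c) in enumerate(planes):
            d_plane = a * x + b * y + c               # Eq. 11
            # Pixels whose initial disparity agrees with the plane (rounding is an assumption).
            n = np.count_nonzero(np.round(d_plane) == d)
            cost[s, l] = lam * np.abs(d_plane - d).sum() * np.exp(-n / m)
    return cost

def smoothness_weight(boundary_len, mean_colour_diff, w=25.0, sigma=150.0):
    """Pairwise weight gamma(t, q) of Eq. 13 for two neighbouring segments."""
    return w * boundary_len * np.exp(-(mean_colour_diff ** 2) / sigma ** 2)
```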
3 Experiments

This section describes our experimental evaluation. Section 3.1 presents the setup of our experiments; the results of our experiments are presented in Section 3.2.

3.1 Experimental Setup

To evaluate the performance of our algorithm, we performed experiments on the image datasets provided by Scharstein and Szeliski [2]. We evaluate the algorithm by measuring the percentage of pixels that have erroneous disparity values. Herein, a disparity value is defined to be erroneous if its absolute difference from the ground truth is larger than 1. As is common practice in the evaluation of stereo algorithms, we report results for (1) non-occluded pixels only (nonocc), (2) all pixels (all), and (3) pixels in image regions that are close to a disparity discontinuity (disc). Experimentally we found that choosing λ in Eq. 10 as 10 and choosing w and σ in Eq. 13 as 25 and 150 gives the best results for all stereo pairs in the Middlebury image dataset. Since the algorithm uses segments instead of pixels in the global energy minimization step, the two parameters of mean-shift segmentation, namely h_s and h_r, may affect the final results, and the optimum values of these parameters may change for different image datasets. However, choosing h_s as 5 and h_r as 4 leads to satisfactory results for all pairs of images.

3.2 Experimental Results

In Table 1, we show the performance obtained by the four variants of our algorithm on the Tsukuba data set, namely without key points (KP) and without MDC, which constitutes the baseline, with only KP, with only MDC, and finally with KP and MDC. The results show that both the KP and the MDC lead to improvements in all three scores, and that their combination leads to a further improvement. The resulting disparity maps of the four variants of our algorithm are illustrated in Fig. 4. The figure shows that the variants without KP still have problems in the area encapsulated by the red ellipse, which was also the case in Fig. 2. In Fig. 5, we show the performance of the best variant of our algorithm (KP+MDC) on all four Middlebury data sets. The results show that our algorithm produces almost the same result as the ground truth for the Venus data set, whereas the results are very similar to the ground truth for the Tsukuba and Teddy data sets. Since cones are difficult to represent by planes, our algorithm suffers most on the Cones data set. In Table 2, we compare the performance of our algorithm with seven state-of-the-art disparity matching algorithms. The results show that our algorithm performs on par with the state-of-the-art; it even outperforms all other algorithms on the Venus data set.

Table 1: Percentage of erroneous disparity values of the disparity estimations for the Tsukuba data set.

Algorithm     nonocc   all    disc
baseline       2.64    3.26   11.8
KP             1.56    2.23    7.42
MDC            1.25    1.75    6.28
MDC and KP     1.08    1.59    5.82

Figure 4: Estimated final disparities: (a) ground truth, (b) without KP and without MDC, (c) with only KP, (d) with only MDC, (e) with MDC and KP.

Table 2: Percentage of erroneous disparity values of the proposed algorithm compared with top-performing algorithms.

                           Tsukuba               Venus                 Teddy                 Cones
Algorithm       Avg. Rank  nonocc  all   disc    nonocc  all   disc    nonocc  all   disc    nonocc  all   disc
Proposed           24.2    1.08   1.59   5.82    0.08   0.16   1.11    7.17   8.25   18.5    3.59   9.4    11.0
ADCensus [19]       5.8    1.07   1.48   5.73    0.09   0.25   1.15    4.1    6.22   10.9    2.42   7.25   6.95
AdaptingBP [8]      7.2    1.11   1.37   5.79    0.10   0.21   1.44    4.22   7.06   11.8    2.48   7.92   7.32
CoopRegion [7]      7.2    0.87   1.16   4.61    0.11   0.21   1.54    5.16   8.31   13.0    2.79   7.18   8.01
DoubleBP [10]       9.7    0.88   1.29   4.76    0.13   0.45   1.87    3.53   8.3    9.63    2.9    8.78   7.79
RDP [20]           10.3    0.97   1.39   5.00    0.21   0.38   1.89    4.84   9.94   12.6    2.53   7.69   7.38
OutlierConf [21]   10.8    0.88   1.43   4.74    0.18   0.26   2.40    5.01   9.12   12.8    2.78   8.57   6.99
SubPixDBP [22]     14.5    1.24   1.76   5.98    0.12   0.46   1.74    3.45   8.38   10.0    2.93   8.73   7.91

Figure 5: Results on the Middlebury datasets. From top to bottom: Tsukuba, Venus, Teddy, Cones. From left to right: reference images, ground truth disparities, the results of the proposed algorithm, and the error images, where the black regions represent the erroneous pixels.
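The error measure of Section 3.1 is straightforward to reproduce. The sketch below computes the three reported percentages, assuming boolean masks for the non-occluded and near-discontinuity regions are available (as provided with the Middlebury ground truth); the function name and signature are illustrative, not part of the paper.

```python
import numpy as np

def bad_pixel_rates(d_est, d_gt, nonocc_mask, disc_mask, threshold=1.0):
    """Percentage of erroneous disparities (|d_est - d_gt| > 1), a sketch.

    d_est, d_gt : (H, W) estimated and ground-truth disparity maps.
    nonocc_mask : boolean mask of non-occluded pixels.
    disc_mask   : boolean mask of pixels near disparity discontinuities.
    Returns the (nonocc, all, disc) error percentages.
    """
    bad = np.abs(d_est - d_gt) > threshold
    nonocc = 100.0 * bad[nonocc_mask].mean()
    overall = 100.0 * bad.mean()
    disc = 100.0 * bad[disc_mask].mean()
    return nonocc, overall, disc
```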


4 Conclusions

In this paper, we presented a novel disparity estimation algorithm. The two main novelties of this paper are that (1) it incorporates the key points into the matching process and (2) it uses a novel cost function for disparity plane assignment. The results indicate that the proposed algorithm performs on par with the current state-of-the-art algorithms and that it outperforms state-of-the-art techniques on some datasets. Although we currently only use the key points in the initial disparity estimation, the key points can also be used for training an adaptive cost function, because they provide reliable disparities at their locations. We aim to investigate such an approach in future work.

References

[1] D. Scharstein and R. Szeliski. A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms. International Journal of Computer Vision, vol. 47, pp. 7-42, 2003.

[2] D. Scharstein and R. Szeliski. Middlebury Stereo Vision Page. http://vision.edu/stereo/eval.

[3] R. Zabih and J. Woodfill. Non-parametric Local Transforms for Computing Visual Correspondence. ECCV, vol. 2, pp. 151-158, 1994.

[4] Y. Boykov, O. Veksler and R. Zabih. A Variable Window Approach for Early Vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, pp. 1283-1294, 1998.

[5] M. Gong and Z. Zheng. Multi-Resolution Stereo Matching Using Genetic Algorithm. SMBV, 2001.

[6] M. Gerrits and P. Bekaert. Local Stereo Matching with Segmentation-Based Outlier Rejection. 3rd Canadian Conference on Computer and Robot Vision, June 2006.

[7] Z. Wang and Z. Zheng. A Region Based Stereo Matching Algorithm Using Cooperative Optimization. CVPR, 2008.

[8] A. Klaus, M. Sormann and K. Karner. Segment-Based Stereo Matching Using Belief Propagation and a Self-Adapting Dissimilarity Measure. Proc. ICPR, pp. 15-18, 2006.

[9] L. Hong and G. Chen. Segment-Based Stereo Matching Using Graph Cuts. Proc. CVPR, vol. 1, pp. 74-81, 2004.

[10] Q. Yang, L. Wang, R. Yang, H. Stewenius and D. Nistér. Stereo Matching with Color-Weighted Correlation, Hierarchical Belief Propagation, and Occlusion Handling. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 3, pp. 492-504, 2009.

[11] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient Belief Propagation for Early Vision. CVPR, vol. 1, pp. 261-268, 2004.

[12] Y. Boykov, O. Veksler and R. Zabih. Fast Approximate Energy Minimization via Graph Cuts. IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 23, pp. 1222-1239, 2001.

[13] Y. Boykov and V. Kolmogorov. An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Computer Vision. IEEE Trans. on Pattern Analysis and Machine Intelligence, September 2004.

[14] M. Tappen and W. Freeman. Comparison of Graph Cuts with Belief Propagation for Stereo. Proc.
IEEE ICCV, vol. 1, pp. 508-515, 2003.


[15] M. Bleyer and M. Gelautz. Graph-Based Surface Reconstruction from Stereo Pairs Using Image Segmentation. SPIE, vol. 5665, pp. 508-515, 2003.

[16] D. Comaniciu and P. Meer. Mean-Shift: A Robust Approach Toward Feature Space Analysis. IEEE PAMI, vol. 5, pp. 603-619, 2002.

[17] M. Fischler and R. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. CACM, vol. 6, pp. 381-395, June 1981.

[18] T. Schreuder, E. A. Hendriks and A. Redert. OTESC: Online Transformation Estimation Between Stereo Cameras. Proc. 3DVP, pp. 45-50, October 2010.

[19] X. Mei, X. Sun, M. Zhou, S. Jiao, H. Wang and X. Zhang. On Building an Accurate Stereo Matching System on Graphics Hardware. GPUCV, 2011.

[20] X. Sun, X. Mei, S. Jiao, M. Zhou and H. Wang. Stereo Matching with Reliable Disparity Propagation. 3DIMPVT, 2011.

[21] L. Xu and J. Jia. Stereo Matching: An Outlier Confidence Approach. ECCV, 2008.

[22] Q. Yang, R. Yang, J. Davis and D. Nistér. Spatial-Depth Super Resolution for Range Images. CVPR, 2007.
