
J Intell Robot Syst (2012) 66:125–149
DOI 10.1007/s10846-011-9601-5

Visual SLAM Based on Rigid-Body 3D Landmarks

Patricio Loncomilla · Javier Ruiz del Solar

Received: 17 December 2010 / Accepted: 11 May 2011 / Published online: 17 August 2011
© Springer Science+Business Media B.V. 2011

Abstract  In current visual SLAM methods, point-like landmarks (as in Filliat and Meyer (Cogn Syst Res 4(4):243–282, 2003), we use this expression to denote a landmark generated by a point or an object considered as punctual) are used for representation on maps. As the observation of each point-like landmark gives only angular information about a bearing camera, a covariance matrix between point-like landmarks must be estimated in order to converge to a global scale estimation. However, as the computational complexity of covariance matrices scales quadratically with the number of landmarks, the maximum number of landmarks that can be used is normally limited to a few hundred. In this paper, a visual SLAM system based on the use of what are called rigid-body 3D landmarks is proposed. A rigid-body 3D landmark represents the 6D pose of a rigid body in space (position and orientation), and its observation gives full-pose information about a bearing camera. Each rigid-body 3D landmark is created from a set of N point-like landmarks by collapsing 3N state components into seven state components plus a set of parameters that describe the shape of the landmark. Rigid-body 3D landmarks are represented and estimated using so-called point-quaternions, which are introduced here. By using rigid-body 3D landmarks, the computational time of an EKF-SLAM system can be reduced to as little as 5.5% of the original as the number of landmarks increases. The proposed visual SLAM system is validated in simulated and real (outdoor) video sequences. The proposed methodology can be extended to any SLAM system based on the use of point-like landmarks, including those generated by laser measurements.

Keywords  Robotics · Localization · SLAM · 6D SLAM · Visual SLAM · MonoSLAM · 3D Mapping · Model reduction

P. Loncomilla · J. Ruiz del Solar
Department of Electrical Engineering, Universidad de Chile, Santiago, Chile

P. Loncomilla (B) · J. Ruiz del Solar
Advanced Mining Technology Center, Universidad de Chile, Santiago, Chile
e-mail: ploncomi@ing.uchile

1 Introduction

Simultaneous Localization and Mapping (SLAM) has been one of the most highly investigated topics in mobile robotics in the last 20 years. Several workshops, special sessions in conferences, and special issues in journals have been devoted to this research topic. Vision-based or visual SLAM, i.e. the attempt to solve SLAM using standard cameras as the main sensory input [2], has attracted the attention of the SLAM community in recent years.


The main challenges in vision-based SLAM are robust feature detection, efficient and robust data association and loop closure, and computationally efficient large-scale state estimation [2]. Visual-landmark definition, representation, and estimation are some of the key issues to tackle in order to address these challenges.

In the current vision-based SLAM literature, points are selected as landmarks because of their direct geometrical interpretation, which enables the straightforward formulation of the SLAM problem [3–5]. However, when a fully calibrated camera observes a point, only weak angular information that relates the observation to the poses of the landmark and the camera is obtained. As only angular information is available, several possible maps can explain the observations, since any rotation, translation, or scale transformation applied to them preserves the coherence between the model and the measurements [3].

As each point-like observation fixes only 2 degrees of freedom, and the map has 7 degrees of freedom, sets of several simultaneously observed points must be used in order to estimate the map, which necessitates the use of a full covariance matrix [6]. When the size of the map increases, the number of landmarks becomes very relevant, as the number of computations required to update the covariance matrix is proportional to the square of the full state size. As a result, the map size is limited by the number of landmarks, which can grow only up to a few hundred for real-time applications. Since a map created by using only angular information is weakly constrained, the robustness and precision of local maps are very limited [3]. Landmark recognition is based on point-projection prediction and matching of local patches around each point, which gives weak association information, forcing the use of RANSAC-like strategies for discarding sets of false associations [4]. Alternative landmark-modeling methodologies are therefore required in order to overcome the inherent limitations of point-like landmarks.

Landmarks, in their widest sense, are geometrical features that enable the description of a map in a fashion understandable to humans, and that make self-localization possible. Following this wide definition, it can be noted that humans localize themselves using landmarks that do not correspond to points, but instead correspond to wide regions in space that are recognized by visual inspection, by means of a hierarchy of increasingly sophisticated representations. Visual observations of these wide-region landmarks are not limited to angular information, since they include both the relative distance and the orientation between the observer and each landmark. In addition, humans are able to give descriptions of places, or of paths between different places, by using references to semantic information that is more related to full objects than to points. Thus, the ability of a robot to use landmarks related to wide areas, instead of points, is desirable in order to generate more robust observations, to facilitate semantic labeling, and to reduce the amount of data needed to maintain the map.

In order to address the previously mentioned aspects, a methodology for generating, representing, and estimating rigid-body 3D landmarks is proposed. A rigid-body 3D landmark represents the 6D pose of a rigid body in space (position and orientation), and its observation gives full-pose information about the camera. Each rigid-body 3D landmark is created from a set of N point-like landmarks by collapsing 3N state components into seven state components plus a set of parameters that describe the shape of the landmark (so-called body points and their covariance matrices). Rigid-body 3D landmarks are represented and estimated using point-quaternions, which are introduced and named here.

A visual SLAM system that uses point-like and rigid-body 3D landmarks, based on the EKF-SLAM formulation, is also proposed. The use of rigid-body 3D landmarks permits reducing the computational time of the EKF-SLAM system to as little as 5.5% of the original as the number of landmarks increases. The proposed visual SLAM system is validated in simulated and real video sequences.

This paper is organized as follows. Important related work is presented in Section 2. In Section 3, the proposed methodology used to represent and estimate rigid-body 3D landmarks is described. In Section 4, the proposed visual SLAM system is explained. An experimental evaluation of the system is presented in Section 5. Finally, some conclusions of this work are drawn in Section 6.


2 Related Work

Vision-based SLAM is an important research topic that has attracted increasing attention in the mobile robotics community. Interestingly, as smart-phones and digital cameras are gaining popularity, vision-based SLAM has acquired many potential applications beyond robotics, "due to the capability it can give a camera to serve as a general-purpose 3D position sensor" [2].

Most of the current work related to monocular-based visual localization rests on two approaches: structure-from-motion recovery and monocular SLAM. Structure-from-motion recovery is based on algorithms that estimate corresponding points between consecutive images without using a dynamic model. Methodologies based on Nistér's visual odometry [7], which use optimal preemptive RANSAC [8] applied over sets of three and five points extracted by Harris filtering and processed using local bundle-adjustment optimization, can achieve impressive results over large paths, but they accumulate an ever-increasing error over time, as they are based only on relative motion.

Monocular visual SLAM approaches, based on the seminal works of Davison [6, 9], can achieve impressive results in small to middle-size maps, but the management of large maps is a hard topic to face because of scale drift, covariance-matrix expansion, and loop-closure limitations. Live dense reconstruction [10] can be achieved by updating an active mesh by means of constrained optical-flow-based minimization. Scalable active matching [5] has been proposed to manage large maps that involve a large amount of cross-correlation, by using a graph-pruning approach in order to reduce covariance data and to limit uncertainty propagation between distant points. A drift-aware monocular SLAM [3] has been proposed to model scale drift by using a Lie group approach over the rotation-translation-scale transformation group, achieving differentially constrained bundle-adjustment optimization for loop closing. As the time needed for RANSAC to solve a problem increases dramatically with the number of points needed to conform a minimal subset, 1-point RANSAC [4] has been proposed to achieve fast data association. This approach is based on using one point to update the pose of the robot, and then using the new robot pose for evaluating consensus on the other points using a chi-square test. Finally, appearance and 3D geometry [11] have been used to cluster a map into sets of points that are close in space and that have similar image areas around them. This approach looks promising for building semantic models.

As has already been mentioned, some of the current problems of visual SLAM systems derive from the fact that the perception of a point-like landmark does not allow the camera's pose to be inferred, so several points must be perceived and analyzed. To overcome this drawback, high-level landmarks based on sets of points scattered over object surfaces can be used [12, 13]. In [12], high-level structures, such as planes and lines, are built online using a bottom-up process that first maps point-like and line-like landmarks, and then searches for sets of them that agree with the high-level landmark hypotheses. In [13], locally planar landmarks represented using the inverse-depth parametrization [14] are defined; the camera's state, the landmark's normal plane, and the measurement errors are represented as Lie groups. Local reference frames defined by a central point and Euler-like angles have been used in 3D laser SLAM for representing local planar patches, which generate more compact and meaningful maps [15–18]. However, the use of plane-based features limits the ability of these methods to handle general outdoor environments, and observations related to planes lose two degrees of freedom with respect to full pose information, which limits the amount of information gathered from each observation. The approach proposed in this work is also based on collapsing point-like landmarks into high-level landmarks, but the main differences are the use of non-planar 3D landmarks, which adds flexibility to the system, and the definition of a methodology for landmark representation and estimation that is based on the use of point-quaternions, which form a rotationally symmetric algebraic group representation for poses in space.


3 Rigid-Body 3D Landmark Representation and Estimation

A rigid-body 3D landmark, from now on referred to as a 3D landmark, represents the 6D pose of a rigid body in space. A rigid body is composed of a set of observable points called body points, which are used to create a 3D landmark. The pose of the 3D landmark is determined by the location of the rigid body points when referred to a global reference frame. The pose of the 3D landmark is encoded by using a point and a quaternion [19] chained into a unique object named a point-quaternion. The covariance of a 3D landmark's pose is determined by the covariance of its associated point-quaternion.

In a SLAM system, every time a subset of the body points is observed, a compatible pose for the 3D landmark is computed and used as a virtual observation. Uncertainty related to the observation of the body points can be propagated into uncertainty in the pose of the 3D landmark. The virtual observation and its covariance enable the correction of the 3D landmark pose estimate.

3.1 6D Pose Representation Using Point-Quaternions

A point-quaternion η is introduced in this paper as a 7D mathematical object composed of a point t and a quaternion q. The point denotes a position, and the quaternion denotes an orientation. In this way, a point-quaternion can represent a 6D pose in space. A quaternion can be formed by specifying a unitary rotation axis ω and a rotation angle θ. Then, η is defined as:

η_{7×1} = (t_{3×1}; q_{4×1});  q = (a, b, c, d)^T = (cos(θ/2), ω_X sin(θ/2), ω_Y sin(θ/2), ω_Z sin(θ/2))^T;  t = (x, y, z)^T   (1)

As in the case of transformation matrices, point-quaternions allow defining a transformation operation, transop, over a vector p, consisting of a rotation followed by a translation:

transop(η, p) = q · p · q⁻¹ + t   (2)

The inverse transformation, inv_transop, is defined as:

inv_transop(η, p) = q⁻¹ · (p − t) · q   (3)

Point-quaternions can be composed by using a multiplication operation, which is defined as:

η₁ · η₂ = (t₁; q₁) · (t₂; q₂) = (q₁ · t₂ · q₁⁻¹ + t₁; q₁ · q₂)   (4)

Point-quaternions containing a zero quaternion are ill-posed, as they do not represent any rotation. Valid point-quaternions and their multiplication form a group, as they have closure, associativity, an identity element η_I, and an inverse element (see proof in [20]):

η_I = ((0, 0, 0)^T; (1, 0, 0, 0)^T),  η⁻¹ = (−q⁻¹ · t · q; q⁻¹)   (5)

A special sum for point-quaternions is not defined because of the lack of distributive properties, but vector summation can be applied for Jacobian-calculation purposes [20].

Point-quaternion multiplication can be used to relate different reference systems, as transformation matrices do. Coordinates of points in a reference system A can be transformed into coordinates in a reference system B by using a point-quaternion η_AB. Coordinate transformations between reference systems A, B and C can be composed by using point-quaternion multiplication (the multiplication direction is the same as that used in homogeneous matrix composition):

η_AC = η_BC · η_AB   (6)


Each point-quaternion η has an associated homogeneous matrix H(η) that represents the same transformation:

H(η) = H((t; q)) = [ R(q)  t ; 0 0 0 1 ],  with

R(q) = 1/(a² + b² + c² + d²) ·
  ⎡ a²+b²−c²−d²   2(bc−ad)      2(bd+ac)    ⎤
  ⎢ 2(bc+ad)      a²−b²+c²−d²   2(cd−ab)    ⎥
  ⎣ 2(bd−ac)      2(cd+ab)      a²−b²−c²+d² ⎦   (7)

Thus, point-quaternions can be used for the same purposes as homogeneous matrices, while being more compact and well-posed, as they do not include the distorting effects associated with homogeneous matrices. As each homogeneous matrix contains 12 variable components, the error covariance representation associated with a homogeneous matrix uses 12 × 12 components, and it is ill-posed when representing pose uncertainty, as it can encode uncertainty about axis orthogonality and scaling. Conversely, the covariance matrix of a point-quaternion, a 7 × 7 symmetric positive-semidefinite matrix, encodes the uncertainty about a pose in space, and it is well-posed as it always represents pure pose uncertainties.

3.2 Rigid-Body 3D Landmark Generation Procedure

The procedure used for creating a rigid-body 3D landmark from N individual point-like landmarks (points) involves transforming 3N position state components into seven pose state components. The covariance representation must be transformed at the same time.

First, the SLAM state vector x (see Section 4) is divided into the set of points p_SET to be fused and the other state components o:

x = (p_SET; o);  p_SET = (p₁; ...; p_N);  p_i = (x_i, y_i, z_i)^T   (8)

Body points Δ_i are computed by subtracting the mean position value m from each point p_i to be fused:

Δ_i = p_i − m,  i = 1, ..., N;  m = (1/N) Σ_{i=1}^{N} p_i   (9)

In general terms, a point-quaternion defining a coordinate transformation T that relates a set of points in a reference frame A to a set of points in a reference frame B can be computed with minimal error as:

T(p₁^(B), ..., p_N^(B), p₁^(A), ..., p_N^(A)) = argmin_{η_LN} Σ_{i=1}^{N} ‖ q_LN · p_i^(A) · q_LN⁻¹ + t_LN − p_i^(B) ‖²   (10)

Given that the transformation that relates the body points to the original points corresponds to a translation m and an identity rotation:

η_LAND−MAP = (t_LAND−MAP; q_LAND−MAP) = T(p₁, ..., p_N, Δ₁, ..., Δ_N) = (m; (1, 0, 0, 0)^T)   (11)

The new state representation of the 3D landmark is put into the full state vector:

x_NEW = (η_LAND−MAP; o)   (12)

The covariance matrix of the state P is divided into four submatrices, where P_pp contains the covariances of the points to be fused:

P = [ P_pp  P_po ; P_op  P_oo ],  P_pp = [ P^(i,j)_{3×3} ]_{i,j = 1..N}   (13)
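As a small illustration of the generation step (the helper name is ours, not from the paper), the body points of Eq. 9 and the initial landmark pose of Eq. 11 can be computed as:

```python
import numpy as np

def make_rigid_body_landmark(points):
    # points: N x 3 array of point-like landmark positions to be fused.
    m = points.mean(axis=0)            # Eq. 9: mean position
    deltas = points - m                # Eq. 9: body points
    # Eq. 11: the landmark pose starts at (t, q) = (m, identity rotation).
    eta_land_map = np.concatenate([m, [1.0, 0.0, 0.0, 0.0]])
    return eta_land_map, deltas
```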


Considering the size reduction of the state vector, P will adopt the following form:

P = [ P_77  P_7o ; P_o7  P_oo ]   (14)

Considering that each body point Δ_i will have an associated covariance P_Δ^(i), the covariance propagated from the p_i and Δ_i into η_LAND−MAP can be estimated using a first-order Taylor expansion:

P_77 = J_p P_pp J_p^T − J_Δ diag(P_Δ^(1), ..., P_Δ^(N)) J_Δ^T   (15)

with

J_p = ∂T(p₁, ..., p_N, Δ₁, ..., Δ_N) / ∂(p₁, ..., p_N);  J_Δ = ∂T(p₁, ..., p_N, Δ₁, ..., Δ_N) / ∂(Δ₁, ..., Δ_N) = −J_p   (16)

P_7o and P_o7 are updated as:

P_7o = J_p P_po;  P_o7 = P_op J_p^T   (17)

The error associated with P_pp must be divided between the P_77 pose covariance and the P_Δ^(i) body-point covariances. The decomposition is not unique, as any choice for the set of covariances P_Δ^(i) and P_77 is valid as long as all of the involved covariance matrices are positive semidefinite. Therefore, several criteria for selecting the P_Δ^(i) can be defined. In this work, two criteria are considered: maximal pose covariance and maximal body-points covariance.

1. Maximal pose covariance. The procedure considers the following steps. First, transfer all the covariance error associated with P_pp into P_77:

P_77 = J_p P_pp J_p^T   (18)

Then, compute the covariance matrix P_REC, which corresponds to an approximation of P_pp reconstructed from P_77:

P_REC = G P_77 G^T   (19)

with

G = ∂U(Δ₁, ..., Δ_N, η_LAND−MAP) / ∂η_LAND−MAP   (20)

and

U(Δ₁, ..., Δ_N, η_LAND−MAP) = ( q_LAND−MAP · Δ₁ · q_LAND−MAP⁻¹ + t_LAND−MAP ; ... ; q_LAND−MAP · Δ_N · q_LAND−MAP⁻¹ + t_LAND−MAP ) ≈ (p₁; ...; p_N)   (21)

Afterwards, calculate the covariance matrix of each body point P_Δ^(i) by subtracting the approximation P_REC from the original P_pp:

P_Δ^(i) = D^(i,i)_{3×3}   (22)

P_DIFF = P_pp − P_REC = [ D^(i,j)_{3×3} ]_{i,j = 1..N}   (23)

Finally, each P_Δ^(i) must be checked for positive semidefiniteness by setting its negative eigenvalues to zero and reconstructing the matrix.

2. Maximal body-points covariance. Transfer the maximal amount of covariance to the set of body points P_Δ^(i) by minimizing the amount of covariance that is transferred to P_77 (the Levenberg-Marquardt optimization procedure is used):

min_{α₁,α₂} { λ_LOWER( P_pp − α₁ blockdiag(P^(1,1)_{3×3}, ..., P^(N,N)_{3×3}) − α₂ D )² }   (24)

with

D = diag(P_pp)   (25)

and λ_LOWER(M) the lowest eigenvalue of a given matrix M. As any covariance matrix must be positive semidefinite, it is minimal when its lowest eigenvalue is near zero. After α₁ and α₂ are determined, the P_Δ^(i) are updated as:

P_Δ^(i) = α₁ P^(i,i)_{3×3} + α₂ D^(i,i)_{3×3},  i = 1, ..., N   (26)

with D^(i,i)_{3×3} the corresponding diagonal block of D. Then, P_77 is computed using (15).

The body points and their covariance matrices are stored in a special data structure, which does not need to be updated by the SLAM update procedure.
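A minimal sketch of the maximal pose covariance criterion (Eqs. 18–23) is given below; it is our own illustration, it assumes the Jacobians J_p of Eq. 16 and G of Eq. 20 have already been computed, and it implements the final positive-semidefiniteness check by zeroing negative eigenvalues:

```python
import numpy as np

def psd_clip(M):
    # Zero the negative eigenvalues of a symmetric matrix and reconstruct it.
    w, V = np.linalg.eigh(0.5 * (M + M.T))
    return V @ np.diag(np.clip(w, 0.0, None)) @ V.T

def maximal_pose_covariance(P_pp, J_p, G):
    # Eq. 18: transfer all of the error in P_pp into the 7x7 pose covariance.
    P_77 = J_p @ P_pp @ J_p.T
    # Eqs. 19 and 23: residual between P_pp and its reconstruction G P_77 G^T.
    P_diff = P_pp - G @ P_77 @ G.T
    # Eq. 22: each body point keeps its 3x3 diagonal block of the residual.
    N = P_pp.shape[0] // 3
    P_delta = [psd_clip(P_diff[3*i:3*i+3, 3*i:3*i+3]) for i in range(N)]
    return P_77, P_delta
```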


3.3 Rigid-Body 3D Landmark Generation Criterion

The decision to generate a new rigid-body 3D landmark depends on the covariance of the points to be fused. Small covariances indicate small errors in the observations. Positive and similar cross-covariances between the points ensure that the correction of one point generates a similar correction in all the other points, so that they behave as a rigid body.

The proposed fusion criterion is fast to compute and enables the creation of sets of landmarks that are candidates for fusing. It is based on the analysis of the covariance matrix of the points to be fused. A variability index (varIndex) is computed. It indicates the degree of variation of the components of a subset of the covariance matrix. Subsets with low variability indicate that the cross-covariances are similar.

Before computing the variability indices of the covariance matrix P_pp, its diagonal components P_i = P^(i,i)_{3×3} (see Eq. 13) are ordered by decreasing trace value of the covariance submatrices of each point:

P_i > P_j ⇔ P_iXX + P_iYY + P_iZZ > P_jXX + P_jYY + P_jZZ   (27)

with

P_i = ⎡ P_iXX  P_iXY  P_iXZ ; P_iYX  P_iYY  P_iYZ ; P_iZX  P_iZY  P_iZZ ⎤   (28)

The ordering indicated in (27) can be altered to eliminate terms that have several negative cross-covariances over the X, Y or Z components (see details in [20]). The variability index is computed on several windows of the covariance matrix, using summed-area tables to quickly compute average values inside each window:

varIndex = min_{q,r} [ C_X(q, r, q+M, r+M) + C_Y(q, r, q+M, r+M) + C_Z(q, r, q+M, r+M) ]   (29)

with

C_X(q₀, r₀, q₁, r₁) = ( Σ_{q=q₀}^{q₁} Σ_{r=r₀}^{r₁} P²_qrXX ) / ((q₁−q₀+1)(r₁−r₀+1)) − ( ( Σ_{q=q₀}^{q₁} Σ_{r=r₀}^{r₁} P_qrXX ) / ((q₁−q₀+1)(r₁−r₀+1)) )²   (30)

C_Y (31) and C_Z (32) defined analogously over the YY and ZZ components, and M the number of points to be grouped (e.g., M = 10).
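The variability index lends itself to a compact vectorized implementation. The sketch below is our own, under the assumption that P_pp stores 3 × 3 blocks so that the XX, YY and ZZ components live on strides of 3, and that the number of fused points exceeds M; it computes Eqs. 29–32 with summed-area tables:

```python
import numpy as np

def window_means(A, M):
    # Mean of A over every (M+1) x (M+1) window, via a summed-area table.
    S = np.pad(np.cumsum(np.cumsum(A, axis=0), axis=1), ((1, 0), (1, 0)))
    w = M + 1
    total = S[w:, w:] - S[:-w, w:] - S[w:, :-w] + S[:-w, :-w]
    return total / (w * w)

def var_index(P_pp, M):
    # Eqs. 29-32: windowed variance (mean of squares minus squared mean) of the
    # XX, YY and ZZ cross-covariances, summed and minimized over all windows.
    total = 0.0
    for comp in range(3):                  # X, Y, Z components
        C = P_pp[comp::3, comp::3]         # matrix of P_{qr,XX}, P_{qr,YY}, P_{qr,ZZ}
        total = total + window_means(C * C, M) - window_means(C, M) ** 2
    return float(total.min())
```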


Finally, when a window has a varIndex below a threshold th, the selected points are collapsed into a 3D landmark using the procedure described in Section 3.2.

3.4 Virtual and Estimated 3D Observations

Every time the rigid body represented by the rigid-body 3D landmark is observed, measurements involving the body points are obtained. In this work, body points are detected as interest points using the SURF methodology [21]. The position of each interest point (pos_x, pos_y) is translated into normalized pixel coordinates, and defines a basic observation z_uv to be used by the SLAM system:

z_uv = (u; v) = (pos_x / distFoc_x; pos_y / distFoc_y)   (33)

with distFoc_x and distFoc_y the focal distances in x and y, respectively.

After data association (see description in Section 4.2), the set of measurements in normalized coordinates {z_uv} can be transformed into a virtual observation of a rigid-body 3D landmark, z_rb3D. This requires minimizing a measurement error that relates the coordinates of the body points, the pose of the corresponding rigid-body 3D landmark (whose identity is determined in the data association process), and the real measured observations. As the virtual observation computation involves minimizing an error, an initial pose must be provided to the minimization algorithm. The initial pose is estimated by applying the three-point algorithm [22] to several randomly selected triplets of measured interest points. The three-point algorithm (alg3p function) enables the calculation of the positions of three points in space when the projected points and the distances between the points in space are known. As up to four solutions can be obtained, a fourth point is needed for disambiguation. Twelve sets of four points are used to generate a set of candidate poses. The last detected pose is also added to this set. The candidate pose with the lowest error is selected.
By using this procedure, an initial pose η₀ that projects a triplet of body points onto three measured interest points on the image with low error is obtained:

η₀ = argmin_{η_abcd ∈ I} E_P(η_abcd);  I = {η₁, ..., η₁₃}   (34)

with

η_abcd = alg3p( Δ_a, Δ_b, Δ_c, Δ_d, (u_a; v_a), (u_b; v_b), (u_c; v_c), (u_d; v_d) ),  a ≠ b ≠ c ≠ d   (35)

E_P(η_LC) = Σ_{i=1}^{N} ‖ projection( q_LC · Δ_i · q_LC⁻¹ + t_LC ) − (u_i; v_i) ‖²   (36)

The projection operation maps points in space into the image space:

(u; v) = (x/z; y/z) = projection( (x, y, z)^T )   (37)

Then, the virtual observation z_rb3D is computed by iterative Levenberg-Marquardt optimization, using η₀ as the initial solution:

z_rb3D = V(u₁, v₁, ..., u_N, v_N, Δ₁, ..., Δ_N) = η_LAND−CAM−MEASURED = argmin_{η_i} E_P(η_i)   (38)

The error covariance matrix associated with the virtual observation process, R_rb3D, is computed by propagating the errors associated with the observations, R_UV, and the errors associated with the body points, P_Δ:

R_rb3D = Σ_{i=1}^{N} [ J_UV^(i) · R_UV · (J_UV^(i))^T + J_Δ^(i) · P_Δ^(i) · (J_Δ^(i))^T ]   (39)

with

R_UV = [ α²_pixelX / distFoc²_x   0 ; 0   α²_pixelY / distFoc²_y ]   (40)

J_UV^(i) = ∂V(u₁, v₁, ..., u_N, v_N, Δ₁, ..., Δ_N) / ∂(u_i, v_i)   (41)

J_Δ^(i) = ∂V(u₁, v₁, ..., u_N, v_N, Δ₁, ..., Δ_N) / ∂Δ_i   (42)

The procedure used to compute these Jacobians is detailed in the Appendix.
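For illustration, the minimization of Eqs. 36–38 can be prototyped with SciPy's Levenberg-Marquardt solver (a sketch under our own naming, not the authors' implementation; it assumes at least four body points are matched, so that the 2N residuals over-determine the 7 pose components, and that η₀ comes from the alg3p initialization described above):

```python
import numpy as np
from scipy.optimize import least_squares

def rot_matrix(q):
    # Rotation matrix of Eq. 7; the normalization tolerates a non-unit q
    # during the iterations of the optimizer.
    a, b, c, d = q
    s = a*a + b*b + c*c + d*d
    return np.array([[a*a+b*b-c*c-d*d, 2*(b*c-a*d),     2*(b*d+a*c)],
                     [2*(b*c+a*d),     a*a-b*b+c*c-d*d, 2*(c*d-a*b)],
                     [2*(b*d-a*c),     2*(c*d+a*b),     a*a-b*b-c*c+d*d]]) / s

def virtual_observation(eta0, deltas, uv):
    # Eqs. 36-38: refine eta0 so that the projected body points (deltas, N x 3)
    # match the measured normalized coordinates uv (N x 2); returns z_rb3D.
    def residuals(eta):
        pts = deltas @ rot_matrix(eta[3:]).T + eta[:3]   # body points, camera frame
        return (pts[:, :2] / pts[:, 2:3] - uv).ravel()   # projection of Eq. 37
    eta = least_squares(residuals, eta0, method="lm").x
    eta[3:] /= np.linalg.norm(eta[3:])                   # keep the quaternion unitary
    return eta
```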


An observation function h_rb3D allows computing an estimated pose for the rigid-body 3D landmark. Since the observation function depends on the camera pose and the 3D landmark pose, it depends on the representation used for the camera state. In this work, a point-quaternion η_CAM−MAP encodes the pose of the camera with respect to the global reference frame (see Section 4.1). Considering that the pose of the 3D landmark is encoded by η_LAND−MAP, h_rb3D is given by:

h_rb3D(x) = η_LAND−CAM−EXPECTED = η_CAM−MAP⁻¹ · η_LAND−MAP   (43)

When few body points are observed, a virtual observation z_rb3D cannot be computed, but the body-point observations can still be used in the SLAM procedure (see Section 4.3). The observation function associated with each body point, h_bp^(i), is given by:

h_bp^(i)(x) = projection( q_LAND−CAM−EXPECTED · Δ_i · q_LAND−CAM−EXPECTED⁻¹ + t_LAND−CAM−EXPECTED )   (44)

and the associated covariance is computed as:

R_bp^(i) = R_UV + (∂h_bp^(i)(x)/∂Δ_i) P_Δ^(i) (∂h_bp^(i)(x)/∂Δ_i)^T   (45)

3.5 Quaternion Sign Compatibility

For each possible pose, infinitely many point-quaternions can be selected as a representation. When a unitary-quaternion constraint is imposed, two options remain: (t, q) and (t, −q). Because the virtual observation z_rb3D and the estimated observation h_rb3D are computed independently, their signs may be incompatible, so a correction procedure is required. It is based on computing a cosine distance between the point-quaternions η_LAND−CAM−MEASURED and η_LAND−CAM−EXPECTED. If they have different signs, the distance becomes negative, and both the observation and its covariance are corrected:

⟨q_LAND−CAM−MEASURED, q_LAND−CAM−EXPECTED⟩ < 0 ⇒ { q_LAND−CAM−MEASURED = −q_LAND−CAM−MEASURED;  R_rb3Dtq = −R_rb3Dtq;  R_rb3Dqt = −R_rb3Dqt }   (46)

with

⟨q₁, q₂⟩ = a₁a₂ + b₁b₂ + c₁c₂ + d₁d₂   (47)

and R_rb3Dtq and R_rb3Dqt the covariance blocks that relate the point-quaternion components t and q.
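In code, the sign-compatibility correction of Eqs. 46–47 amounts to a dot-product test followed by a sign flip of the quaternion and of the t–q covariance blocks (an illustrative sketch, with the point-quaternion stored as a 7-vector and R_rb3D as a 7 × 7 matrix):

```python
import numpy as np

def fix_quaternion_sign(z_rb3d, R_rb3d, h_rb3d):
    # Eq. 47: cosine distance between measured and expected quaternions.
    if np.dot(z_rb3d[3:], h_rb3d[3:]) < 0.0:
        z_rb3d, R_rb3d = z_rb3d.copy(), R_rb3d.copy()
        z_rb3d[3:] = -z_rb3d[3:]            # flip the measured quaternion
        R_rb3d[:3, 3:] = -R_rb3d[:3, 3:]    # Eq. 46: flip R_tq ...
        R_rb3d[3:, :3] = -R_rb3d[3:, :3]    # ... and R_qt
    return z_rb3d, R_rb3d
```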
3.6 Computational Complexity

Rigid-body 3D landmarks are generated by transforming N point-like landmarks into a 7D pose plus shape parameters. If the original state has n_O + 3N components before the fusion, it is left with only n_O + 7 components after the transformation. To illustrate the speed gain caused by the state reduction, two opposing cases are analyzed.

In the first case, the state of the system contains a camera state and D rigid-body 3D landmarks. Given that the camera state, composed of a point-quaternion, a linear-velocity vector, and an angular-velocity vector (see Section 4.1), has 13 dimensions, the covariance matrix size is:

size₁(D) = (13 + 7D)²   (48)

In the second case, the state of the system contains a camera state and D·n_p point-like landmarks, with n_p being the number of points required to form a 3D landmark.


Then, the covariance matrix size is:

size₂(D, n_p) = (13 + 3Dn_p)²   (49)

As the number of landmarks increases, the size differences become more significant:

size₁(D) / size₂(D, n_p) = (13 + 7D)² / (13 + 3Dn_p)² = (49D² + O(D)) / (9n_p²D² + O(D)) ≈ 49 / (9n_p²) = 5.44 / n_p²   (50)

Then, in case all point-like landmarks are grouped into 3D landmarks, using 10 point-like landmarks to form each 3D landmark (n_p = 10), the state covariance matrix size can be reduced to as little as 5.5% of its original value as the number of landmarks increases. It is well known that the computing time needed in each iteration of EKF-SLAM is dominated by the computations required in the correction step when the number of landmarks is large. As this time is proportional to the size of the state covariance matrix, the computational time can likewise be reduced to 5.5% of the original. Thus, the use of 3D landmarks is especially well-suited for large maps.
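The ratio of Eqs. 48–50 is easy to reproduce numerically (a hypothetical helper, not from the paper):

```python
def covariance_size_ratio(D, n_p=10):
    # Eqs. 48-50: covariance size with D rigid-body 3D landmarks versus the
    # equivalent D * n_p point-like landmarks (13-dimensional camera state).
    size_rigid = (13 + 7 * D) ** 2
    size_points = (13 + 3 * D * n_p) ** 2
    return size_rigid / size_points

# For n_p = 10 the ratio approaches 49/900 ~= 0.0544 as D grows:
print(covariance_size_ratio(10))    # ~0.070
print(covariance_size_ratio(100))   # ~0.056
```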
4 Visual SLAM System Using Rigid-Body 3D Landmarks

The proposed visual SLAM system is based on MonoSLAM [6], but it incorporates the simultaneous use of point-like and rigid-body 3D landmarks. EKF-SLAM is used as the base algorithm for implementing the SLAM system. In a first stage, point-like landmarks are stored using the inverse-depth parametrization [14], and later as standard 3D points.

4.1 State Representation

The state of the system x incorporates information about the camera state and the poses of point-like, inverse-depth, and rigid-body 3D landmarks. The camera state x_CAMERA includes the camera pose, represented by using a point-quaternion η_CAM−MAP, and linear and angular velocity vectors, v_CAM−MAP and ω_CAM−MAP, respectively:

x_CAMERA = (η_CAM−MAP; v_CAM−MAP; ω_CAM−MAP)   (51)

The state update equation for the camera is given by (assuming a zero-mean Gaussian noise added to both velocities):

f_camera = ( t_cam−map(k+1); q_cam−map(k+1); v_cam−map(k+1); ω_cam−map(k+1) )
         = ( t_cam−map(k) + (v_cam−map(k) + n_V(k)) Δt;
             quat((ω_cam−map(k) + n_W(k)) Δt) · q_cam−map(k);
             v_cam−map(k) + n_V(k);
             ω_cam−map(k) + n_W(k) )   (52)

with

quat(ω) = ( cos ‖ω/2‖; (ω_X/‖ω‖) sin ‖ω/2‖; (ω_Y/‖ω‖) sin ‖ω/2‖; (ω_Z/‖ω‖) sin ‖ω/2‖ )   (53)

and

n_V ∼ N(0, P_V),  n_W ∼ N(0, P_W)   (54)
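A sketch of the prediction equations (52)–(53) follows, reusing q_mult from the Section 3.1 sketch; the noise terms n_v, n_w stand for the sampled process noise of Eq. 54 (in the EKF mean prediction they are set to zero):

```python
import numpy as np

def quat_from_rotvec(w):
    # Eq. 53: unit quaternion for a rotation of angle ||w|| about the axis w/||w||.
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    return np.concatenate([[np.cos(theta / 2.0)],
                           np.sin(theta / 2.0) * w / theta])

def predict_camera(t, q, v, w, dt, n_v, n_w):
    # Eq. 52: constant-velocity motion model with additive velocity noise.
    v_new, w_new = v + n_v, w + n_w
    t_new = t + v_new * dt
    q_new = q_mult(quat_from_rotvec(w_new * dt), q)
    return t_new, q_new / np.linalg.norm(q_new), v_new, w_new
```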


Given that the inverse-depth parametrization [14] permits an efficient and accurate representation of uncertainty during the undelayed initialization of point-like landmarks, the position of these landmarks is represented in a first stage using 6D inverse-depth points q_i:

q_i = (x_i, y_i, z_i, θ_i, φ_i, ρ_i)^T   (55)

with (x_i, y_i, z_i)^T the first camera position from which the feature was observed [14], θ_i and φ_i the azimuth and elevation angles of the first feature observation, and ρ_i the inverse of the distance to the first observation. The error covariance associated with q_i is given by (40). Every time the uncertainty associated with a landmark represented using the inverse-depth parametrization drops below a given threshold (see details in [14]), the landmark is converted into a 3D Cartesian point p_i.

The pose of rigid-body 3D landmarks is represented using point-quaternions η_i, as explained in Section 3. The state update equation for point-like and rigid-body 3D landmarks is the identity.

4.2 Visual Observations and Data Association

Observations are generated by computing SURF interest points and descriptors [21]. Since the computation of SURF interest points is based on the use of non-smooth square kernels, they can be computed quickly, but some interest points appear over lines. These interest points have non-repeatable positions, because they move along the line from frame to frame. Unrepeatable points are deleted by applying the Harris cornerness test [23] to each individual interest point (points with a cornerness less than 1E-30 are eliminated). The parameters for the Harris filter are sd = 1.3, si = 2.0, a = 0.04.

Observations, i.e. measured interest points, are compared with estimated observations that are produced by projecting the 3D points l_i belonging to point-like, inverse-depth, and rigid-body landmarks onto pixel coordinates. First, point positions are estimated using h_uv:

h_uv(i) = (u; v) = projection( q_CAM−MAP⁻¹ · (l_i − t_CAM−MAP) · q_CAM−MAP )   (56)

Then, pixel coordinates are obtained by using the focal distances in x and y:

(x; y) = (u · distFoc_x; v · distFoc_y)   (57)

In the case of point-like landmarks, the points to be projected are the ones defining the landmarks (p_i). In the case of rigid-body landmarks, the points to be projected are the rigid-body points associated with the landmark, whose positions are given by p_i = q_LN · Δ_i · q_LN⁻¹ + t_LN, with q_LN and t_LN the quaternion and point defining the landmark. Finally, in the case of inverse-depth landmarks, the coordinates of the points to be projected are given by [14]:

l_i = (x_i; y_i; z_i) + (1/ρ_i) (cos φ_i sin θ_i; −sin φ_i; cos φ_i cos θ_i)   (58)
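The projection pipeline of Eqs. 56–58 can be sketched as follows (reusing inv_transop from the Section 3.1 sketch; the helper names are ours):

```python
import numpy as np

def inverse_depth_to_point(qi):
    # Eq. 58: convert (x, y, z, theta, phi, rho) into a 3D Cartesian point.
    x, y, z, theta, phi, rho = qi
    m = np.array([np.cos(phi) * np.sin(theta),
                  -np.sin(phi),
                  np.cos(phi) * np.cos(theta)])
    return np.array([x, y, z]) + m / rho

def predict_pixel(eta_cam_map, l_i, distFoc_x, distFoc_y):
    # Eq. 56: express l_i in the camera frame (this is inv_transop of Eq. 3),
    # then project; Eq. 57: scale by the focal distances to get pixels.
    p_cam = inv_transop(eta_cam_map, l_i)
    return np.array([p_cam[0] / p_cam[2] * distFoc_x,
                     p_cam[1] / p_cam[2] * distFoc_y])
```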
A planar model is detected on the set of measured descriptors by searching for a similarity transformation that relates both sets of associated descriptors. The similarity transformation is computed by using the L&R matching procedure [24, 25], which uses an approximate-nearest-neighbor procedure based on a kd-tree representation for generating matches between descriptors, a Hough transform for filtering outliers, and several tests to reject incorrect transformations. The system works by generating correspondences between keypoints from both sets of descriptors; it then uses the differences in position, orientation, and scale associated with each correspondence to compute similarity transformations. Hypotheses with high consensus are used to generate an affine transformation that relates both images, and several consistency tests are applied to reject transformations with a low score or with excessive distortion, and to delete wrong matches in correct transformations. When the camera rotates, all keypoints are displaced in a coherent way, and the system is able to find the transformation that relates all of the displacements, which gives it the ability to cope with significant camera rotations. Transformations that have an excessive associated translation or scaling are rejected as possible detections. A chi-square test is used to reject false landmark detections that can survive these tests. For rigid-body 3D landmarks, 7 × 7 innovation covariance matrices S are used for the chi-square test, while for point-like landmarks, 2 × 2 matrices are used. The similarity transformation stage can be relaxed when the camera is lost.

This system does not use pixel tracking; it performs landmark detection in each frame. Loop closure then occurs naturally, as old descriptors are found again.


4.3 SLAM Algorithm Formulation

The SLAM algorithm includes the following stages: SURF feature detection; matching of SURF features with point-like, inverse-depth, and rigid-body landmarks; EKF state prediction; EKF state update; inverse-depth landmark collapsing; point-like landmark collapsing; inverse-depth landmark generation; and inverse-depth landmark deletion.

1. SURF feature detection. SURF features are detected in the current image and translated into normalized pixel coordinates, as described in Section 4.2.

2. Matching of SURF features with point-like, inverse-depth, and rigid-body landmarks. As outlined in Section 4.2, the L&R matching system is used, which includes several rejection tests.

3. EKF state prediction. The state x_k and the covariance matrix of the state P_k are updated using the standard EKF prediction step [26]. The camera state is updated using (52) and (53). The state update equation of the landmarks is the identity. P_k is updated using the usual EKF procedure.

4. EKF state update. The observation model is used to update the system state and covariance, by using the difference between the expected and real values of the observations to correct the model. As usual, the innovation y_k and the innovation covariance S_k are computed as:

y_k = z_k − H_k · x⁻_k;  S_k = H_k · P⁻_k · H_k^T + R_k   (59)

Four different cases for the innovation computation need to be considered:

– In the case of point-like landmarks, z_k and R_k are given by (33) and (40), respectively, and H_k is the Jacobian matrix of partial derivatives of h_uv(i) (given by (56)) with respect to x.
– In the case of inverse-depth landmarks, z_k and R_k are given by (33) and (40), respectively, and H_k is the Jacobian of h_uv(i), given by (56) and (58).
– In the case of rigid-body landmarks, z_k and R_k are given by (38) and (39), respectively, and H_k is the Jacobian of h_rb3D, given by (43).
– In case a virtual observation cannot be obtained for an existing landmark because not enough body points are observed, body points can also be used in the correction process. In this case, for each observed body point, z_k and R_k are given by (33) and (45), respectively, and H_k is the Jacobian of h_bp^(i), given by (44).

Fast covariance correction can be achieved by decomposing the state covariance matrix P into observed (o) and non-observed (n) components before applying the Kalman correction step, as follows:

K_k = (H_k P⁻_k)^T S_k⁻¹;  P_k = P⁻_k − K_k (H_k P⁻_k)   (60)

with

H_k = (H_o  0),  P⁻_k = [ P_oo  P_on ; P_on^T  P_nn ],  H_k P⁻_k = (H_o P_oo  H_o P_on)   (61)
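A compact sketch of the partitioned correction of Eqs. 60–61 (an illustration under our own naming; obs_idx lists the state components the observed block H_o actually touches, and y is the innovation computed from the appropriate observation function):

```python
import numpy as np

def ekf_update_partitioned(x, P, y, H_o, R, obs_idx):
    # Eq. 61: H_k = [H_o 0] up to a permutation, so H_k P^- needs only the
    # rows of P that belong to the observed components.
    HP = H_o @ P[obs_idx, :]                 # H_k P^-_k
    S = HP[:, obs_idx] @ H_o.T + R           # innovation covariance, Eq. 59
    K = np.linalg.solve(S, HP).T             # Eq. 60: K_k = (H_k P^-_k)^T S^-1
    return x + K @ y, P - K @ HP             # Eq. 60: corrected state and covariance
```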
In a very small percentage of the frames, numerically unstable state covariance matrices are obtained when using the fast covariance update formula, because of floating-point rounding errors. In this work, a covariance matrix is considered unstable if P_ii P_jj < P_ij² for any combination of (i, j). In that case, the covariance correction step is done by using Cholesky downdating, a method involving the Cholesky decomposition that gives a positive-semidefinite matrix as a result:

P⁻_k = L_P L_P^T;  S_k⁻¹ = U_S^T U_S;
K_k (H_k P⁻_k) = (H_k P⁻_k)^T S_k⁻¹ (H_k P⁻_k) = (U_S H_k P⁻_k)^T (U_S H_k P⁻_k) = (v₁ v₂ ... v_{n_o})(v₁ v₂ ... v_{n_o})^T;
P_k = P⁻_k − K_k H_k P⁻_k ⇔ L_{P_k} L_{P_k}^T = L_P L_P^T − Σ_{i=1}^{n_o} v_i v_i^T   (62)

A good implementation of the Cholesky decomposition is faster than a normal matrix multiplication.
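For reference, a rank-one Cholesky downdate in the spirit of Eq. 62 can be sketched as follows (a textbook hyperbolic-rotation version, not the LINPACK zchdd routine discussed next); applying it once per column v_i yields L_{P_k}:

```python
import numpy as np

def cholesky_downdate(L, v):
    # Given lower-triangular L with A = L L^T, return L' with L' L'^T = A - v v^T.
    L, v = L.copy(), v.copy()
    n = L.shape[0]
    for k in range(n):
        r2 = L[k, k] ** 2 - v[k] ** 2
        if r2 < 0.0:
            raise ValueError("downdated matrix is not positive semidefinite")
        r = np.sqrt(r2)
        c, s = r / L[k, k], v[k] / L[k, k]
        L[k, k] = r
        L[k+1:, k] = (L[k+1:, k] - s * v[k+1:]) / c
        v[k+1:] = c * v[k+1:] - s * L[k+1:, k]
    return L
```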


As efficient C or C++ code for Cholesky downdating is not publicly available, a C version of the zchdd subroutine from the LINPACK library [27], originally written in Fortran, was obtained using fable [28].

After each correction step, the quaternion components in the state are normalized. To ensure coherence in the SLAM system, the Jacobian of the normalizing function is used to propagate the normalization effects into the state covariance matrix.

5. Inverse-depth landmark collapsing. Inverse-depth landmarks whose uncertainty drops below a threshold are converted into normal point-like landmarks.

6. Point-like landmark collapsing. The covariance of the points P_pp is analyzed in order to verify whether a set of point-like landmarks exists that can generate a rigid-body landmark. As explained in Section 3.3, the procedure requires verifying whether the variability index associated with a set of point-like landmarks is below a threshold th (Eqs. 29–32). In case a rigid-body landmark can be generated, the procedure described in Section 3.2 is used (Eqs. 9–26).

7. Inverse-depth landmark generation. Image SURF features that were not matched, i.e. that are distant from landmarks in the image domain, are added as new inverse-depth landmarks, using the procedure described in Section 4.1. By selecting an appropriate distance threshold, the density of observed descriptors in the image can be kept within a desired range.

8. Inverse-depth landmark deletion. New inverse-depth landmarks need to be observed for a certain number of frames in order to be confirmed. If the number of frames in which an inverse-depth landmark was not observed, but was expected to be, is over a certain threshold, the landmark is deleted.

5 Experimental Evaluation

5.1 Simulated Experiments

The system is evaluated by simulating the movement of a camera along four types of trajectories, and applying different visual SLAM approaches to recover the camera's path. In all cases the simulated camera moves while looking at a fixed point at all times. The following four trajectories are used, which have closed forms for repeatability purposes:

1. U-shaped path
   – Trajectory: x = −60 sin((π/2) sin((2π/8) t)), y = −90 cos((π/2) sin((2π/8) t)), z = 0
   – Camera looking at position (0, 0, 0)

2. S-shaped path
   – Trajectory: x = −40 (1 + t/54) cos(cos(t/3) t), y = 40 (1 + t/54) sin(cos(t/3) t), z = 0
   – Camera looking at position (30, 0, 0)

3. Continual Lost path
   – Trajectory: S-shaped path (80 s) followed by a square path through four very distant points (−90, −40), (−90, 40), (−30, 40), (−30, −40). Each periodic sequence takes 4 s, and the transition between the four points is done without any delay.
   – Camera looking at position (30, 0, 0)

4. S-shaped+Random-Walk path
   – Trajectory: S-shaped path (80 s) followed by a random walk: x_t = x_{t−1} + n_X, y_t = y_{t−1} + n_Y, z_t = z_{t−1} + n_Z, with n_X, n_Y, n_Z ∼ N(0, 100).
   – Camera looking at position (30, 0, 0)

The trajectories were sampled into control points by using a 1 s step.
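For instance, the U-shaped control points can be generated as follows (a small reproduction sketch under the stated closed form; the 1 s sampling step matches the text, while the 80 s duration is chosen only for illustration):

```python
import numpy as np

def u_shaped(t):
    # U-shaped path of Section 5.1, with an 8 s period.
    s = np.sin(2.0 * np.pi * t / 8.0)
    return np.array([-60.0 * np.sin(0.5 * np.pi * s),
                     -90.0 * np.cos(0.5 * np.pi * s),
                     0.0])

control_points = np.array([u_shaped(t) for t in np.arange(0.0, 80.0, 1.0)])
```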
Intermediate points were calculated by using spline interpolation in the point-quaternion space. A set of 900 SURF descriptors is generated randomly over a 400 × 400 square area centered at the origin of the global coordinate system by using a uniform distribution; the 64D values of the SURF descriptors are initialized randomly by using a uniform distribution followed by a normalization. Some of the descriptors give rise to landmarks when observed for the first time; each landmark then corresponds to a unique known feature in space, which enables comparing landmarks and original features. The frame rate of the simulated camera is 15 fps.


Fig. 1 Simulated camera trajectories used in the experiments: U-shaped, S-shaped, Continual Lost, and S-shaped+Random-Walk

Gaussian noise with a standard deviation of six pixels was added to the observations in order to simulate the noise that is intrinsic to the SURF detection process. The simulated camera has a resolution of 320 × 200. A visualization example of each path, showing both the control points and the spline interpolation, is shown in Fig. 1.

Several simulation tests were carried out on each of the paths. Each test takes 2,800 frames, and it is started with the restriction of having a maximum of 60 landmarks:

Test 1: Only point-like landmarks are used.
Test 2: All kinds of landmarks are used. The maximal pose covariance criterion is used for rigid-body 3D landmarks.
Test 3: All kinds of landmarks are used. The maximal body-points covariance criterion is used for rigid-body 3D landmarks.
Test 4: Only point-like landmarks are used. The maximum number of landmarks is constrained to 4 at frame 1,800.
Test 5: All kinds of landmarks are used. The maximal pose covariance criterion is applied. The maximum number of rigid-body landmarks is constrained to 4 at frame 1,800, and no point-like landmarks are used.
Test 6: Same as Test 5, but the maximal body-points covariance criterion is applied.

In all cases, point-like landmarks are first created as inverse-depth landmarks.

Given that map building by using a single camera can produce differences in position, orientation, and scale with respect to the ground truth, an optimal transformation that considers all three characteristics is found and applied to the experimental path to make the comparison possible (least-squares procedure). The average Euclidean distance between corresponding pairs of points in the ground truth and the computed paths, i.e. pairs of points that correspond to the same time, is used as the error measurement.

Some examples of recovered paths are shown in Fig. 2, together with the corresponding ground truth. As the obtained results need to be analyzed very carefully, histograms of the errors are presented in addition to mean errors and standard deviation values. In the histogram visualizations, errors with values over 60 are cut to that value in order to maintain an adequate scale, and they are considered failures. Table 1 presents the experimental results in terms of mean error, standard deviation, and failure percentage for all experiments, while Figs. 3, 4, 5 and 6 show the histograms of the errors. In all cases each test was run 30 times on every path in order to generate robust statistics.


Fig. 2 Simulation examples of the recovered paths drawn over the ground truth, one for each kind of path (U-shaped, S-shaped, Continual Lost, S-shaped + Random-Walk). The set of all features is shown in blue, the set of features that were selected as landmarks is shown in green, and current landmarks are shown in red

Table 1 Experimental results of visual SLAM for the different trajectories and simulated tests

Path             Test   Max num.    Mean    Standard    Failure
                        landmarks   error   deviation   percentage
U-shaped           1       60        1.79      0.69        0%
U-shaped           2       60        8.02      5.80        0%
U-shaped           3       60        3.67      0.84        0%
U-shaped           4        4         –         –        100%
U-shaped           5        4       41.03     16.18       76.7%
U-shaped           6        4        4.19      0.93        0%
S-shaped           1       60        6.81      3.53        0%
S-shaped           2       60       26.12     20.78       86.67%
S-shaped           3       60       10.95      6.64        0%
S-shaped           4        4       18.34      5.67       90%
S-shaped           5        4         –         –        100%
S-shaped           6        4       13.22      6.52        0%
Continual Lost     1       60       11.69      3.18        0%
Continual Lost     2       60       30.07     14.49       70%
Continual Lost     3       60       25.34     10.90        6.67%
Continual Lost     4        4         –         –        100%
Continual Lost     5        4         –         –        100%
Continual Lost     6        4       25.75      6.25        0%
S-shaped+R         1       60        2.54      1.24        0%
S-shaped+R         2       60       20.89     13.66        0%
S-shaped+R         3       60        3.80      1.10        0%
S-shaped+R         4        4       32.97      7.10        0%
S-shaped+R         5        4       29.75      7.87        0%
S-shaped+R         6        4        3.57      0.86        0%
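As a small sketch, the per-test statistics in Table 1 could be aggregated from the 30 runs per configuration as follows. That failed runs (error above 60) are excluded from the mean and standard deviation is an assumption of this sketch, suggested by the "–" entries in the 100%-failure rows, not something the paper states.

```python
import numpy as np

def summarize_runs(errors, failure_threshold=60.0):
    """Aggregate the per-run errors of one (path, test) configuration into
    mean error, sample standard deviation and failure percentage. Runs with
    error above the threshold count as failures and are left out of the
    mean/std (an assumption; see the lead-in text)."""
    errors = np.asarray(errors, dtype=float)
    failed = errors > failure_threshold
    ok = errors[~failed]
    mean = float(ok.mean()) if ok.size else float("nan")
    std = float(ok.std(ddof=1)) if ok.size > 1 else float("nan")
    return mean, std, 100.0 * float(failed.mean())
```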


Fig. 3 Histograms of the errors for Tests 1–6 applied over the U-shaped path

Fig. 4 Histograms of the errors for Tests 1–6 applied over the S-shaped path


Fig. 5 Histograms of the errors for Tests 1–6 applied over the Continual Lost path

Fig. 6 Histograms of the errors for Tests 1–6 applied over the S-shaped + Random-Walk path


Fig. 7 Execution time of the SLAM prediction-update steps versus the number of features in the map

Fig. 8 Visualization example of the SLAM system working on a video sequence. Observations are shown as rhombs superimposed on the captured image, and the innovation covariance is drawn as a set of ellipses. Point-like landmarks are drawn as blue dots. Rigid-body landmarks are drawn as white reference systems with white body points. The matrices C_X, C_Y and C_Z, used to evaluate the variability index, are shown at the bottom left. The full covariance matrix is shown at the top right


Fig. 9 Selected images from the garden video database. In each image, rhombs correspond to real observations (SURF interest points), and ellipses represent the innovation covariance

In the case of the U-shaped, S-shaped, and S-shaped + Random-Walk paths (see Table 1 and Figs. 3, 4, 5 and 6), it can be observed that when the number of landmarks is limited to 60, the best option is to use point-like landmarks; rigid-body landmarks using the maximal body-points covariance criterion produce a slightly larger error.

However, when the number of landmarks is very small (limited to 4), rigid-body landmarks using the maximal body-points covariance criterion show a marked advantage over point-like landmarks, as the reduction of the number of landmarks produces only a very weak increase in error.

Fig. 10 Example of the visual SLAM system running on one of the real video sequences. The camera is shown in yellow. Blue dots correspond to point-like landmarks, while white structures consisting of three perpendicular axes and a set of white body points denote rigid-body landmarks


Even in cases where the use of point-like landmarks fails completely (U-shaped path, Test 4), rigid-body landmarks with the maximal body-points covariance criterion behave appropriately. In all cases, the use of the maximal body-points covariance criterion appears to be the best option.

Fig. 11 Maps and reconstructed paths for the seven tested videos. The camera is shown in yellow. Blue dots correspond to point-like landmarks, while white structures consisting of three perpendicular axes and a set of white body points denote rigid-body landmarks. Video sequence 1: 1,338 frames; video sequence 2: 1,134 frames; video sequence 3: 1,107 frames; video sequence 4: 1,491 frames; video sequence 5: 1,469 frames; video sequence 6: 1,372 frames; video sequence 7: 1,494 frames


The Continual Lost path is a very hard test, as it involves instantaneous and very large changes in the position of the camera, which can produce divergences in the SLAM system. Point-like landmarks behave better when the number of observed features is high. However, the errors in all of the results are very high, as they are of the same order of magnitude as the size of the full path (around 30). When using just four landmarks in the map, point-like landmarks are not able to follow the path, failing in all of the cases, while rigid-body landmarks with the maximal body-points covariance criterion are able to follow the path appropriately.

From the experimental data, it is clear that in cases where the number of landmarks needs to be limited, because of computational reasons or because of the large size of the map, the use of rigid-body landmarks is very useful. In addition, it is convenient to propagate as much covariance as possible from the original point landmarks into the body-point covariances. As the positions of the body points are not adapted after their creation, the error associated with them does not decrease, and therefore their covariances must remain constant over time. If the covariance from the original points were propagated mainly into the covariance of the pose, the covariance of the rigid-body landmark would be underestimated, because the covariance of the pose decreases to zero when the landmark is observed while the covariance of the body points remains very low. This can cause a severe covariance underestimation when a landmark is observed several frames after its creation. Hence, maximizing the propagation of the original covariance into the body-point covariances is the best option.

Fig. 12 Camera moving on a polygonal trajectory (first row) and on an elliptical trajectory (second row). In both cases, the ideal camera trajectory, as well as the maps and the reconstructed paths, are shown; the dimensions of the trajectories (between 1 m and 6 m) are indicated in the figure

Fig. 13 Recovered elliptic paths and the error between start and end points: 2.97%, 4.04%, 9.80%, 4.41%, 4.19%, 13.67% and 6.25%

Execution times for SLAM, including both the prediction and update steps, were measured as a function of the number of features used in the SLAM system. In the runtime experiment, the path and feature configuration of the U-shaped path test was used. As can be observed in the results shown in Fig. 7, the ratio between the execution time of a SLAM system using rigid-body landmarks and one using only point-like landmarks converges to the 5.5% limit as the number of features in the map increases. This can be explained because the matrix operations in the EKF update step, being quadratic in the state size, become the most expensive computation in SLAM when the number of features is high; the state-reduction capabilities of the rigid-body approach therefore have a dramatic impact on the execution time. The performance penalty related to the computation of virtual observations is very low when compared to the matrix operations, as it grows only linearly. This experiment was run on an Intel Core Duo processor at 1.6 GHz using only one core.
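A back-of-envelope illustration of the argument above follows. The numbers are hypothetical (a 13-dimensional camera state and 30 points per rigid body are assumptions of this sketch); it only shows why a quadratic-cost EKF update rewards collapsing 3N point states into 7 pose states. The measured 5.5% limit in Fig. 7 is larger than this pure d² ratio because linear-cost work, such as the virtual-observation computation, is not modeled here.

```python
def ekf_update_cost_ratio(n_points, points_per_body=30, camera_states=13):
    """Rough O(d^2) EKF-update cost ratio between a map whose point
    landmarks (3 states each) are grouped into rigid-body landmarks
    (7 states each, body points demoted to parameters) and a point-only
    map. All numbers are illustrative."""
    d_points = camera_states + 3 * n_points
    d_bodies = (camera_states + 7 * (n_points // points_per_body)
                + 3 * (n_points % points_per_body))
    return (d_bodies / d_points) ** 2

# e.g. 900 points grouped into bodies of 30: update cost shrinks to ~0.7%
print(f"{ekf_update_cost_ratio(900):.1%}")
```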


5.2 Experiments with Real Video Sequences

In a first set of experiments, the system was evaluated qualitatively. A handheld camera was used to produce seven video sequences in an outdoor environment. In order to generate the sequences, the handheld camera followed a path inside a house's garden; in each case, the path finished near its starting point. The camera has a normal lens, the video sequences were captured at 30 fps, and their durations in frames are 1,338, 1,134, 1,107, 1,491, 1,469, 1,372 and 1,494.

The proposed visual SLAM system using point-like and rigid-body 3D landmarks was tested on these video sequences. The system runs on a standard low-end laptop. Figure 8 shows a visualization tool used to analyze the performance of the visual SLAM system on the video sequences. The poses of the rigid bodies are represented using small reference systems consisting of three orthogonal axes x, y, z, which are drawn in white. Figure 9 shows some selected images from the garden video database, and Fig. 10 shows an example of the visual SLAM system running on one of the real video sequences.

In all of the tests, the proposed SLAM system was able to build a coherent map and to recover the path. In Fig. 11 the reconstructed paths for the seven tested videos are shown: the camera is shown in yellow, point-like landmarks are shown as blue dots, and rigid-body landmarks are denoted by white structures.

The system was also able to recognize the first generated landmarks (loop closing) easily because no tracking is used; instead, descriptor matching between the map and the current image's observations is done using the L&R system. The robustness of the matching system is reflected in its capacity to recover all seven tested paths (Fig. 11).

In a second set of experiments, the system was evaluated quantitatively using ground-truth paths of specific regular shapes. In the initial experiments, about 40 runs were carried out in different environments using different polygonal paths containing right angles, but the results were inaccurate in around half of the cases. The system was able to reconstruct the angles of the polygons, but the estimates of the sides were not regular, and the estimated pose of the camera sometimes moved long distances when the loop was closed. The explanation found is that right angles cause a loss of the speed information, as the camera must stop moving, and features leave the camera's field of view when it turns in zones of the path with high curvature. Both problems limit scale preservation. This problem is aggravated by the decision to use a standard narrow-angle camera instead of a wide-angle one, which could provide greater parallax for the points when moving.
Camera movements in visual SLAM therefore cannot be arbitrary: some smoothness is required, since parallax on the features is needed to estimate the map, and the speed of the camera helps to preserve scale.

As paths with sharp angles were troublesome, elliptical paths, such as the ones shown in Fig. 12, were selected to generate seven video sequences for testing purposes. As ellipses have some degree of rotational symmetry, the mean absolute error between the best-fitting ellipse and the recovered path can underestimate the error: errors in the length of the path produce no error contribution as long as the path remains on the ellipse. For this reason, the distance between the initial and final points of the recovered path, normalized with respect to the length of the ellipse, was used as the quantitative measure of the accuracy of the proposed system. Figure 13 shows the recovered paths and the start-end point error for each case. It can be observed that in most of the cases the start-end point error is smaller than 10%, and that its mean value is 6.47%.
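The loop-closure metric is simple enough to state directly; in this sketch the ellipse length is taken as a given input, since it comes from the ground-truth path:

```python
import numpy as np

def start_end_error_pct(path, ellipse_length):
    """Distance between the first and last estimated camera positions,
    normalized by the length of the ground-truth ellipse, in percent."""
    path = np.asarray(path)
    return 100.0 * np.linalg.norm(path[-1] - path[0]) / ellipse_length

# The seven values reported in Fig. 13 average to the 6.47% quoted above:
print(np.mean([2.97, 4.04, 9.80, 4.41, 4.19, 13.67, 6.25]))  # ~6.47
```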


6 Conclusions

In this work a visual SLAM system based on the use of what are called rigid-body 3D landmarks was proposed. A rigid-body 3D landmark represents the 6D pose of a rigid body in space, and its observation gives full-pose information about a bearing camera. The use of rigid-body 3D landmarks permits reducing the computational time of the EKF-SLAM system to 5.5% of that of a point-only system as the number of landmarks increases. The proposed visual SLAM system was validated on simulated data and real video sequences using a standard, low-cost camera. Remarkably, the system performs very well in outdoor environments, allowing very good camera localization.

The analysis of the visual SLAM system's operation on real video sequences shows that the implemented system performs well in the real world. Rigid-body 3D landmarks are able to reduce the state dimensionality in unstructured environments with low information loss, which enables the camera to recover the full path in a reliable way, avoiding EKF covariance overload. SURF descriptors with delayed Harris testing are both fast and repeatable enough to provide good-quality information about structures in the real world, even when systems with limited computing capabilities are used. Data association based on the L&R system, which was created for robust object recognition, shows very good performance in map-association tasks and enables the EKF to work without map corruption due to wrong associations, even in long video sequences and without needing special loop-closing techniques, as all the features have the same opportunity of being detected in every frame because no features are tracked. The results show that the rigid-body landmark paradigm is both promising and powerful, and new field applications can be explored in future work.

The experimental data indicate that the visual SLAM system achieves good localization even when the number of observed landmarks is very low, working very well with only four landmarks permanently available for observation, which is made possible by using feature-rich individual landmarks. This property enables the generation of large maps, as very few landmarks per area are needed. As body points are parameters and not states in this system, their error does not decrease over time, which can explain the slightly better performance of point-like landmarks when the density of landmarks is very high. The adaptation of body points by creating a dynamical subsystem inside each rigid-body landmark (EKF-like adaptation of the positions and covariances of individual body points) and the use of semantic cues for improving the selection of rigid bodies remain open problems that can be addressed in future work.

Acknowledgments This research work was partially funded by the doctoral grant program of CONICYT (Chile), by MECESUP Project FSM 0601, and by FONDECYT project 1090250.

Appendix

Computation of the Jacobians $J_U^{(i)}$ and $J_V^{(i)}$, defined in (41) and (42). As the function V(·) is the result of an iterative minimization (see (38)), computing $J_U^{(i)}$ and $J_V^{(i)}$ by finite differences over several minimizations is a very slow process.
The partial derivatives of $E_P(\cdot)$ with respect to the pose must be zero when evaluated at the optimal value, and this leads to a closed form for the Jacobians. To simplify the notation, the vector $a = (u_1, v_1, \ldots, u_N, v_N)^T$ collecting all the parameters will be used in the following expressions:

$$V(a) = \arg\min_{\eta} E_P(\eta; a) \qquad (63)$$

$$\frac{\partial}{\partial \eta_i} E_P(\eta; a) \Big|_{\eta = V(a)} = 0, \quad \forall i \qquad (64)$$

$$\frac{d}{d a_j}\left( \frac{\partial E_P}{\partial \eta_i}\left(V(a); a\right) \right) = 0, \quad \forall i, j \qquad (65)$$

$$\sum_k \frac{\partial^2 E_P}{\partial \eta_i \, \partial \eta_k} \frac{\partial V(a)_k}{\partial a_j} + \frac{\partial^2 E_P}{\partial \eta_i \, \partial a_j} = 0, \quad \forall i, j \qquad (66)$$

The last expression can be converted into matrix form by making the following definitions:

$$E_{\eta\eta}(V(a); a)_{(i,j)} = \frac{\partial^2 E_P}{\partial \eta_i \, \partial \eta_j} \qquad (67)$$

$$E_{\eta A}(V(a); a)_{(i,j)} = \frac{\partial^2 E_P}{\partial \eta_i \, \partial a_j} \qquad (68)$$

After the replacements, the following expressions hold:

$$E_{\eta\eta}(V(a); a)\, \frac{\partial V}{\partial a}(a) + E_{\eta A}(V(a); a) = 0 \qquad (69)$$

$$\Rightarrow \frac{\partial V}{\partial a}(a) = -E_{\eta\eta}^{-1}(V(a); a)\, E_{\eta A}(V(a); a) \qquad (70)$$

The last expression has a closed form and enables a straightforward computation of the Jacobians of V(·). As the quaternion is a non-minimal representation for rotations, there is a direction in the observation vector that contains no real information, so variations of the vector in that direction leave the value of the error unmodified. In consequence, the Hessian has a null space and cannot be inverted directly. The problem can be solved by computing the inverse using an eigenvalue decomposition and by bounding the smallest eigenvalues of the Hessian away from zero by a small value (e.g. $10^{-30}$).
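A minimal numerical sketch of (70), including the eigenvalue safeguard just described, follows. The Hessian blocks are assumed to be supplied by the caller, e.g. from analytic second derivatives of $E_P$:

```python
import numpy as np

def jacobian_of_minimizer(E_etaeta, E_etaA, eig_floor=1e-30):
    """Implicit-function Jacobian dV/da = -E_etaeta^{-1} E_etaA, eq. (70).
    E_etaeta has a null space because the quaternion over-parametrizes
    rotations, so it is inverted through an eigendecomposition with the
    smallest eigenvalues bounded below by a small constant."""
    w, Q = np.linalg.eigh(E_etaeta)  # E_etaeta is a symmetric Hessian
    w = np.maximum(w, eig_floor)     # clamp (near-)zero eigenvalues
    return -(Q @ np.diag(1.0 / w) @ Q.T) @ E_etaA
```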


References

1. Filliat, D., Meyer, J.-A.: Map-based navigation in mobile robots: I. A review of localization strategies. Cogn. Syst. Res. 4(4), 243–282 (2003)
2. Neira, J., Davison, A.J., Leonard, J.J.: Guest editorial: special issue on visual SLAM. IEEE Trans. Robotics 24(4), 929–931 (2008)
3. Strasdat, H., Montiel, J.M.M., Davison, A.J.: Scale drift-aware large scale monocular SLAM. In: Robotics: Science and Systems (RSS) (2010)
4. Civera, J., Grasa, O.G., Davison, A.J., Montiel, J.M.M.: 1-point RANSAC for EKF-based structure from motion. In: IROS 2009 Proceedings, pp. 3498–3504 (2009)
5. Handa, A., Chli, M., Strasdat, H., Davison, A.J.: Scalable active matching. In: Proc. 2010 IEEE Conf. on Computer Vision and Pattern Recognition, San Francisco, June 13–18 (2010)
6. Davison, A.J., Reid, I.D., Molton, N., Stasse, O.: MonoSLAM: real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 29(6), 1052–1067 (2007)
7. Nistér, D., Naroditsky, O., Bergen, J.: Visual odometry for ground vehicle applications. J. Field Robot. 23(1), 3–20 (2006)
8. Nistér, D.: Preemptive RANSAC for live structure and motion estimation. Mach. Vis. Appl. 16(5), 321–329 (2005)
9. Davison, A.J.: Real-time simultaneous localisation and mapping with a single camera. In: ICCV 2003 Proceedings, vol. 2, pp. 1403–1410 (2003)
10. Newcombe, R., Davison, A.J.: Live dense reconstruction with a single moving camera. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010)
11. Angeli, A., Davison, A.J.: Live feature clustering in video using appearance and 3D geometry. In: BMVC (2010)
12. Gee, A.P., Chekhlov, D., Calway, A., Mayol-Cuevas, W.: Discovering higher level structure in visual SLAM. IEEE Trans. Robotics 24(5), 980–990 (2008)
13. Kwon, J., Lee, K.M.: Monocular SLAM with locally planar landmarks via geometric Rao-Blackwellized particle filtering on Lie groups. In: 2010 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR 2010), San Francisco, June 13–18, pp. 1522–1529 (2010)
14. Civera, J., Davison, A.J., Martínez-Montiel, J.M.: Inverse depth parametrization for monocular SLAM. IEEE Trans. Robotics 24(5), 932–945 (2008)
15. Pathak, K., Birk, A., Vaskevicius, N., Poppinga, J.: Fast registration based on noisy planes with unknown correspondences for 3D mapping. IEEE Trans. Robotics 26(3), 424–441 (2010)
16. Kohlhepp, P., Pozzo, P., Walther, M., Dillmann, R.: Sequential 3D-SLAM for mobile action planning. In: Proc. 2004 IEEE/RSJ International Conf. on Intelligent Robots and Systems, Sendai, Japan (2004)
17. Pathak, K., Birk, A., Vaskevicius, N., Pfingsthorn, M., Schwertfeger, S., Poppinga, J.: Online 3D SLAM by registration of large planar surface segments and closed-form pose-graph relaxation. J. Field Robot. 27(1), 52–84 (2009)
18. Magnusson, M., Lilienthal, A., Duckett, T.: Scan registration for autonomous mining vehicles using 3D-NDT. J. Field Robot. 24(10), 803–827 (2007)
19. Hamilton, W.R.: On quaternions, or on a new system of imaginaries in algebra. Philos. Mag. 25(3), 489–495 (1844)
20. Loncomilla, P.: Generación automática de landmarks visuales naturales tridimensionales basada en descriptores locales para auto-localización de robots móviles. Ph.D. thesis, Universidad de Chile (2010)
21. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. Comput. Vis. Image Underst. 110(3), 346–359 (2008)
22. Haralick, R.M., Lee, C.-N., Ottenberg, K., Nölle, M.: Review and analysis of solutions of the three point perspective pose estimation problem. Int. J. Comput. Vis. 13(3), 331–356 (1994)
23. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proc. of the 4th Alvey Vision Conference, pp. 147–151 (1988)
24. Loncomilla, P., Ruiz del Solar, J.: A fast probabilistic model for hypothesis rejection in SIFT-based object recognition. In: Lecture Notes in Computer Science 4225 (CIARP 2006), pp. 696–705. Springer (2006)
25. Ruiz-del-Solar, J., Loncomilla, P.: Robot head pose detection and gaze direction determination using local invariant features. Adv. Robot. 23(3), 305–328 (2009)
26. Welch, G., Bishop, G.: An Introduction to the Kalman Filter. University of North Carolina, Chapel Hill (1995)
27. LINPACK library official site: http://www.netlib.org/linpack/
28. Grosse-Kunstleve, R.W., Terwilliger, T.C., Adams, P.D.: Experience converting a large Fortran-77 program to C++. IUCr Comp. Comm. 10, 75–84 (2009)
