International Conference on Control, Automation and Systems 2010
Oct. 27-30, 2010 in KINTEX, Gyeonggi-do, Korea

Video Stabilization for Robot Eye Using IMU-Aided Feature Tracker

Yeon Geol Ryu, Hyun Chul Roh, and Myung Jin Chung
Department of Electrical Engineering, KAIST, Daejeon, Korea
(Tel: +82-42-5429; E-mail: {ygryu; rohs}@rr.kaist.ac.kr, mjchung@ee.kaist.ac.kr)

Abstract: In this paper, a new video stabilization system for a robot eye is presented. The system is biologically inspired by the human vestibulo-ocular reflex. A feature tracker aided by an inertial sensor is proposed to estimate motion more accurately and quickly. The rotational motion measured by the inertial sensor is incorporated into the KLT tracker in order to predict the position of each feature in the current frame. This IMU-aided tracker improves the success rate and reduces the number of iterations in feature tracking. A Kalman filter is then applied to remove unwanted camera motion. Experimental results show that the proposed video stabilization system is fast and accurate under various conditions.

Keywords: Video stabilization, Vestibulo-ocular reflex (VOR), KLT tracker, Inertial measurement unit (IMU), Kalman filter.

1. INTRODUCTION

Nowadays, many humanoid robots are able to walk or run. When they move and stop, the image sequences from their eyes (cameras) can be unpleasant for a remote operator to watch. The unintentional shakes and jiggles can also degrade the performance of machine vision tasks such as object tracking, recognition, and visual surveillance. Although there have been many studies on image stabilization for consumer electronics such as digital cameras and camcorders, few studies have addressed robotics.

The video stabilization process is generally composed of three major steps: motion estimation, motion filtering, and image composition [1]. The accuracy and computational cost of motion estimation are crucial, because the performance of motion estimation largely determines the effectiveness of the overall stabilization. In motion estimation, feature-tracking-based correspondence has mainly been used for robust, real-time processing. However, feature tracking has two weaknesses: 1) a small-motion constraint, so it cannot track features across the large image motion induced by abrupt camera motion, and 2) it is not fast enough for real-time processing. We adopt a KLT feature tracker integrated with an IMU in order to overcome the small-motion constraint and to improve speed [2]. After motion estimation, the motion filtering step extracts the unwanted motion from the global motion. Many motion filtering approaches for video stabilization have been proposed: Gaussian filtering, parabolic fitting, and regularization are off-line processes, while motion vector integration and Kalman filtering can run in real time. In this paper, a Kalman filter is adopted to remove unwanted shaky motion in real time [3].
In the image composition step, the input image is warped according to the correction motion extracted in the motion filtering step.

There have been several studies on the design and implementation of robotic eyes, but they have focused on hardware compensation rather than software image stabilization. Hardware stabilization alone cannot completely stabilize an unstable image sequence because of sensor and actuator errors and the delay between them.

The rest of this paper is organized as follows. The proposed IMU-aided KLT feature tracker and the video stabilization for robot eyes are described in Sections 2 and 3, respectively. Section 4 presents various experimental results, and conclusions are drawn in Section 5.

2. IMU-AIDED FEATURE TRACKER

The KLT tracker is known as a good feature tracker in many vision applications such as optical flow, object tracking, and motion estimation, and it has been studied in depth by Kanade and his colleagues [4]. However, a KLT tracker that relies only on template image alignment cannot handle the large image motion induced by the abrupt camera motion that commonly occurs in robot eyes: it fails to track a feature when the translation falls outside the tracking search region.

We previously attempted to solve this problem [5] by predicting the initial feature position from in-plane rotation information provided by an IMU and integrating it into a KLT tracker with a translational motion model. The performance was good for a standing camera, but not for a moving one. Around the same time, Hwangbo et al. presented an inertial-aided KLT feature tracker that uses an affine motion model to deal with the template deformation induced by roll motion [6]. Their tracker achieves a high success rate under various experimental conditions, but an affine model is naturally slower than a translational one.

In this paper, we adopt a translational motion model rather than an affine one in order to process quickly on a general CPU rather than a GPU. Figure 1 illustrates the motivation of this research.


Fig. 1. (a) Extracting a feature in the previous frame. (b) Tracking the feature in the current frame with the original KLT feature tracker. (c) Tracking the feature in the current frame with the IMU-aided KLT feature tracker. Red, blue, and green rectangles indicate the position of the feature extracted from the previous frame (its initial position in the current frame), the predicted position of the feature in the current frame, and the final target position of the tracked feature in the current frame, respectively.

The original KLT tracker fails to track the feature in Fig. 1(b) because the disparity between the initial (red rectangle) and target (green rectangle) positions is outside the tracking search range. However, if the initial position of the feature in the current frame is predicted from the 3D rotation measured by the IMU, it can be brought close enough to the target position to fall within the tracking search range, as shown in Fig. 1(c).

Correspondence pairs $x_1 \leftrightarrow x_2$ between two images satisfy the relation

$$x_2 = K\left[ R K^{-1} x_1 + t/Z \right] \qquad (1)$$

where $x_1$ and $x_2$ are corresponding features in the previous and current frames, respectively, $Z$ is the depth corresponding to $x_1$, $K$ is the camera intrinsic matrix, and $R$ and $t$ are the camera 3D rotation measured by the IMU and the camera 3D translation, respectively.

If there is no camera translation, or if the depth of the feature is infinite, the second term of (1) vanishes, as in (2). Moreover, because the camera translation is generally much smaller than the feature depth, (1) can be approximated as (3); in practice, large optical flows are caused mainly by camera rotation rather than translation.

$$t = 0 \ \text{or}\ Z \to \infty \;\Rightarrow\; t/Z = 0 \;\Rightarrow\; x_2 = K R K^{-1} x_1 \qquad (2)$$

$$t \ll Z \;\Rightarrow\; t/Z \approx 0 \;\Rightarrow\; x_2 \approx K R K^{-1} x_1 \qquad (3)$$

Finally, the infinite homography computed from the camera intrinsic matrix and the camera rotation measured by the IMU is used to predict the initial position of a tracked feature in the current frame:

$$H_\infty = K R K^{-1} \qquad (4)$$

Figure 2 shows the algorithm of the IMU-aided feature tracker. The initial position $x^{pred}$ of a feature in the current frame is predicted by multiplying the infinite homography $H_\infty$, computed from the 3D rotation given by the IMU, with the position $x^{prev}$ of the feature in the previous frame.

:: Algorithm of IMU-aided feature tracker ::

    load img^prev, img^curr, X^prev_feature, and R_IMU (from IMU)
    calculate H_inf = K R_IMU K^(-1)
    C_tracked_feature = 0
    for i = 1 to N_feature
        x_i^pred = H_inf x_i^prev                 // IMU-aided prediction
        p_0 = x_i^pred
        for j = 0 to N_max_iteration              // original KLT tracker
            dp = H^(-1) * sum_{x in A} J [ I(w(x; p_j)) - T(x) ]
            p_(j+1) = p_j + dp
            if converged, break
        end
        if well-tracked
            x_i^curr = p
            C_tracked_feature = C_tracked_feature + 1
        end
    end

Fig. 2. Algorithm of the IMU-aided KLT feature tracker. The initial position of each feature is predicted by the infinite homography ($H_\infty$) calculated with the IMU.
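As a concrete illustration, the following is a minimal sketch of this prediction step, not the authors' implementation. It assumes a calibrated intrinsic matrix K, an inter-frame camera rotation R_imu obtained from the IMU gyroscope, and it uses OpenCV's pyramidal Lucas-Kanade tracker, seeded with the predicted positions via the OPTFLOW_USE_INITIAL_FLOW flag, as a stand-in for the paper's own translational KLT.

    import numpy as np
    import cv2

    def predict_features(prev_pts, K, R_imu):
        """Predict current-frame feature positions with the infinite
        homography H_inf = K R K^(-1) of Eq. (4)."""
        H_inf = K @ R_imu @ np.linalg.inv(K)
        pts = prev_pts.reshape(-1, 2)
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coords
        pred = (H_inf @ pts_h.T).T
        pred = pred[:, :2] / pred[:, 2:3]                  # dehomogenize
        return pred.astype(np.float32).reshape(-1, 1, 2)

    def imu_aided_klt(prev_gray, curr_gray, prev_pts, K, R_imu):
        """Track features, seeding the KLT search at the IMU prediction."""
        pred_pts = predict_features(prev_pts, K, R_imu)
        curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_pts, pred_pts.copy(),
            winSize=(5, 5), maxLevel=0,    # 5x5 window, no pyramid (Sec. 4)
            flags=cv2.OPTFLOW_USE_INITIAL_FLOW)
        good = status.ravel() == 1
        return prev_pts[good], curr_pts[good]

Seeding the search with the rotation-compensated positions is what relaxes the small-motion constraint: the iterative alignment only has to recover the residual displacement left after the rotation-induced flow has been compensated.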


3. VIDEO STABILIZATION FOR ROBOT EYES

As mentioned in Section 2, most video stabilization techniques based on a feature tracker such as KLT may fail to stabilize an unstable video, because the tracker sometimes cannot track features when large image motion occurs. In order to cope with such large image motion, the IMU-aided KLT feature tracker is adopted in our video stabilization for robot eyes. This video stabilization is inspired by the human VOR system [7]: the vision and inertial sensors correspond to the human eye and inner ear, respectively. The inertial sensor has a fast response whereas the vision sensor is slow, so the inertial sensor can help the vision sensor process faster and more stably. We follow the video stabilization framework of [5]: motion estimation, motion filtering, and image composition.

A) Motion estimation
The global motion of the current frame with respect to a reference frame is calculated by updating the global motion of the previous frame with the inter-frame motion. The inter-frame motion is estimated from the correspondence pairs found by the proposed IMU-aided KLT feature tracker. An affine motion model is used for every frame.

B) Motion filtering and image composition
In video stabilization, unwanted motion is generally regarded as the high-frequency component of the motion. In this paper, a Kalman filter is used to eliminate the unwanted motion in real time. Before filtering, the affine motion estimated in the motion estimation step is approximated by a similarity motion in order to extract four geometric parameters: scale, in-plane rotation ($\theta$), x-translation ($t_x$), and y-translation ($t_y$). Each parameter except scale is then filtered by the Kalman filter. The degree of smoothing is controlled by the measurement noise variance of the Kalman filter: the larger the noise variance, the smoother the result, and vice versa.

The correction motion is the difference between the estimated motion ($\theta$, $t_x$, $t_y$) and the filtered motion ($\bar{\theta}$, $\bar{t}_x$, $\bar{t}_y$), and the stabilized image sequence is created from it as in equation (5):

$$x_{sta} = \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{\substack{\text{no compensation} \\ \text{for scale}}} \underbrace{\begin{pmatrix} \cos(\theta-\bar{\theta}) & -\sin(\theta-\bar{\theta}) \\ \sin(\theta-\bar{\theta}) & \cos(\theta-\bar{\theta}) \end{pmatrix}}_{\substack{\text{compensation for unwanted} \\ \text{in-plane rotation}}} x_{uns} + \underbrace{\begin{pmatrix} t_x-\bar{t}_x \\ t_y-\bar{t}_y \end{pmatrix}}_{\substack{\text{compensation for} \\ \text{unwanted translation}}} \qquad (5)$$

where $x_{uns}$ and $x_{sta}$ are pixel coordinates of the unstable and stabilized images. The scale parameter is not compensated at all, because scale does not change quickly (i.e., it has low frequency).
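The sketch below illustrates the filtering and composition steps under stated assumptions: the paper does not specify the Kalman state model, so a scalar constant-position filter per parameter is assumed; the similarity parameters theta, t_x, t_y are taken as given (in a full pipeline they could come, for example, from cv2.estimateAffinePartial2D on the correspondence pairs); and signs follow equation (5) as written.

    import numpy as np
    import cv2

    class ScalarKalman:
        """1-D Kalman filter with a constant-position model (an assumed
        minimal choice; the paper's state model is unspecified)."""
        def __init__(self, proc_var=1e-4, meas_var=1.0):
            self.x, self.P = 0.0, 1.0
            self.q, self.r = proc_var, meas_var  # larger r => smoother output

        def filter(self, z):
            self.P += self.q                     # predict
            k = self.P / (self.P + self.r)       # Kalman gain
            self.x += k * (z - self.x)           # update with measurement z
            self.P *= (1.0 - k)
            return self.x

    kf_theta, kf_tx, kf_ty = ScalarKalman(), ScalarKalman(), ScalarKalman()

    def stabilize(frame, theta, tx, ty):
        """Warp the frame by the correction motion of Eq. (5):
        estimated minus filtered rotation/translation; scale untouched."""
        d_theta = theta - kf_theta.filter(theta)
        d_tx = tx - kf_tx.filter(tx)
        d_ty = ty - kf_ty.filter(ty)
        c, s = np.cos(d_theta), np.sin(d_theta)
        M = np.array([[c, -s, d_tx],
                      [s,  c, d_ty]], dtype=np.float32)
        h, w = frame.shape[:2]
        return cv2.warpAffine(frame, M, (w, h))

Increasing meas_var makes the filtered trajectory smoother, matching the paper's observation that the degree of smoothing is controlled by the measurement noise variance.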
4. EXPERIMENTAL RESULTS

A) IMU-aided feature tracker
First, we evaluated the performance of the proposed IMU-aided KLT feature tracker against the original KLT tracker, as shown in Fig. 3. We used 632 frames of size 320x240, a 5x5 feature window, 100 features per frame, and no pyramid. In Fig. 3(a), red, green, and blue indicate the roll, pitch, and yaw rotations of the IMU between the previous and current frames. Compared with the original KLT tracker, the IMU-aided tracker achieves a higher success rate in feature tracking (Fig. 3(b)), fewer iterations (Fig. 3(c)), and a shorter distance between the predicted and tracked feature positions (Fig. 3(d)). As summarized in Table I, the proposed IMU-aided tracker outperformed the original KLT tracker.

TABLE I
COMPARISON OF ORIGINAL KLT AND IMU-AIDED TRACKER

                                      Original KLT    IMU-aided tracker
    Success rate (%)                      42.45             77.78
    Iteration number                      12.53              7.18
    Distance b/w PF and TF (pixel)         6.11              2.62

    (PF: Predicted Feature, TF: Tracked Feature)

B) Video stabilization for robot eye
Second, we applied the proposed IMU-aided tracker to video stabilization for robot eyes, testing a video sequence captured in our laboratory. Comparison against other stabilized videos has recently been regarded as the best assessment for video stabilization, so performance assessment is somewhat subjective and difficult. During ordinary operation we could not tell whether the proposed video stabilization based on the IMU-aided tracker was better or worse than the one based on the original KLT tracker (Fig. 4). However, we observed that the proposed method stabilized unstable video sequences robustly for longer than the method based on the original KLT tracker (Fig. 4). The reason is that the proposed video stabilization, using the IMU-aided tracker, estimates the motion parameters (i.e., finds correspondence pairs) better than the one using the original KLT, especially under large image motion, as discussed above. Figure 4 shows some examples of stabilized frames.

5. CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a new video stabilization system for robot eyes inspired by the human VOR system. An IMU was adopted as the vestibular system of the robot, and the initial feature positions estimated with the IMU were incorporated into the KLT tracker. The proposed IMU-aided tracker improved the speed and accuracy of the tracking process. Moreover, the video stabilization for robot eyes based on the IMU-aided tracker stabilized unstable videos for longer than the one based on the original KLT tracker.

In the future, we need to address several remaining issues: an advanced (3D) motion model, reduction of undefined regions, and adaptive motion filtering.

REFERENCES


Fig. 3. Comparison between the original and IMU-aided KLT feature trackers. (a) IMU rotation angle: red, green, and blue lines indicate roll, pitch, and yaw rotation (degrees/frame), respectively; red, green, and blue arrows mark the roll-, pitch-, and yaw-dominant periods, and the black arrow marks a period of complex rotational motion. (b) Success rate in tracking. (c) Iteration number. (d) Distance between predicted and tracked feature. In (b)-(d), red and green markers indicate the original and IMU-aided KLT trackers, respectively.

Fig. 4. (a) Frames stabilized by the video stabilization based on the original KLT tracker. (b) Frames stabilized by the video stabilization based on the IMU-aided tracker. Left, center, and right frames are the 1st, 200th, and 428th frames.

[1] Y. Matsushita, "Full-Frame Video Stabilization with Motion Inpainting," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 7, July 2006.
[2] B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," Proceedings of the Image Understanding Workshop, pp. 121-130, 1981.
[3] S. Erturk, "Real-Time Digital Image Stabilization Using Kalman Filters," Real-Time Imaging, vol. 8, pp. 317-328, 2002.
[4] J. Shi and C. Tomasi, "Good Features to Track," IEEE Conference on Computer Vision and Pattern Recognition, June 1994.
[5] Y. G. Ryu, H. C. Roh, S. J. Kim, K. H. An, and M. J. Chung, "Digital Image Stabilization for Humanoid Eyes Inspired by Human VOR System," IEEE International Conference on Robotics and Biomimetics, Guilin, Guangxi, China, December 19-23, 2009.
[6] M. Hwangbo, J.-S. Kim, and T. Kanade, "Inertial-Aided KLT Feature Tracking for a Moving Camera," IEEE/RSJ International Conference on Intelligent Robots and Systems, USA, October 2009.
[7] "Vestibulo-ocular reflex," Encyclopaedia Britannica Online, 2009, accessed 29 Jul. 2009.
