International Conference on Control, Automation and Systems 2010
Oct. 27-30, 2010 in KINTEX, Gyeonggi-do, Korea

Video Stabilization for Robot Eye Using IMU-Aided Feature Tracker

Yeon Geol Ryu, Hyun Chul Roh, and Myung Jin Chung
Department of Electrical Engineering, KAIST, Daejeon, Korea
(Tel: +82-42-5429; E-mail: {ygryu; rohs}@rr.kaist.ac.kr, mjchung@ee.kaist.ac.kr)

Abstract: In this paper, a new video stabilization system for a robot eye is presented. The system is biologically inspired by the human vestibulo-ocular reflex. A feature tracker aided by an inertial sensor is proposed to estimate motion more accurately and quickly. The rotational motion measured by the inertial sensor is incorporated into the KLT tracker in order to predict the position of each feature in the current frame. This IMU-aided tracker improves the success rate and reduces the number of iterations in feature tracking. A Kalman filter is then applied to remove unwanted camera motion. Experimental results show that the proposed video stabilization system is fast and accurate under various conditions.

Keywords: Video stabilization, Vestibulo-ocular reflex (VOR), KLT tracker, Inertial measurement unit (IMU), Kalman filter.

1. INTRODUCTION

Nowadays, many humanoid robots are able to walk or run. When they move and stop, the image sequences from their eyes (cameras) can be unpleasant for a remote operator to watch. The unintentional shakes and jiggles can also degrade the performance of machine vision tasks such as object tracking, recognition, and visual surveillance. Although there have been many studies on image stabilization for consumer electronics such as digital cameras and camcorders, few studies have addressed robotics.

The video stabilization process is generally composed of three major steps: motion estimation, motion filtering, and image composition [1]. The accuracy and computational cost of motion estimation are crucial, because the performance of motion estimation largely determines the effectiveness of the overall stabilization. In motion estimation, feature-tracking-based correspondence has mainly been used for robust, real-time processing. However, feature tracking has two weaknesses: 1) a small-motion constraint, so it cannot track features across the large image motion induced by abrupt camera motion, and 2) it is not fast enough for real-time processing. We adopt a KLT feature tracker integrated with an IMU in order to overcome the small-motion constraint and to improve speed [2]. After motion estimation, the motion filtering step extracts the unwanted motion from the global motion. Many motion filtering approaches for video stabilization have been proposed: Gaussian filtering, parabolic fitting, and regularization are off-line processes, while motion vector integration and Kalman filtering can run in real time. In this paper, a Kalman filter is adopted to remove unwanted shaky motion in real time [3].
In the image composition step, the input image is warped according to the correction motion extracted in the motion filtering step.

There have been several studies on the design and implementation of robotic eyes, but they have focused on hardware compensation rather than software image stabilization. Hardware stabilization alone cannot completely stabilize an unstable image sequence because of sensor and actuator errors and the delay between them.

The rest of this paper is organized as follows. The proposed IMU-aided KLT feature tracker and the video stabilization for robot eyes are described in Sections 2 and 3, respectively. Section 4 presents various experimental results, and conclusions are drawn in Section 5.

2. IMU-AIDED FEATURE TRACKER

The KLT tracker is known as a good feature tracker in many vision applications such as optical flow, object tracking, and motion estimation, and it has been studied in depth by Kanade and his colleagues [4]. However, a KLT tracker that relies only on template image alignment cannot handle the large image motion induced by the abrupt camera motion that commonly occurs in robot eyes: it fails to track a feature when the translation falls outside the tracking search region.

We previously attempted to solve this problem [5] by predicting the initial feature position from in-plane rotation information provided by an IMU and integrating it into a KLT tracker with a translational motion model. The performance was good for a standing camera, but not for a moving one. Around the same time, Hwangbo et al. presented an inertial-aided KLT feature tracker that uses an affine motion model to deal with the template deformation induced by roll motion [6]. Their tracker achieves a high success rate under various experimental conditions, but an affine model is naturally slower than a translational one.

In this paper, we adopt a translational motion model rather than an affine one in order to process quickly on a general CPU rather than a GPU. Figure 1 illustrates the motivation of this research.


Fig. 1. (a) Extracting a feature in the previous frame. (b) Tracking the feature in the current frame with the original KLT feature tracker. (c) Tracking the feature in the current frame with the IMU-aided KLT feature tracker. Red, blue, and green rectangles indicate the position of the feature extracted from the previous frame (its initial position in the current frame), the predicted position of the feature in the current frame, and the final target position of the tracked feature in the current frame, respectively.

The original KLT tracker fails to track the feature in Fig. 1(b) because the disparity between the initial (red rectangle) and target (green rectangle) positions is outside the tracking search range. However, if the initial position of the feature in the current frame is predicted from the 3D rotation measured by the IMU, it can be brought close enough to the target position to fall within the tracking search range, as shown in Fig. 1(c).

Correspondence pairs $x_1 \leftrightarrow x_2$ between two images satisfy the relation

$$x_2 = K\left[ R K^{-1} x_1 + t/Z \right] \qquad (1)$$

where $x_1$ and $x_2$ are corresponding features in the previous and current frames, respectively, $Z$ is the depth corresponding to $x_1$, $K$ is the camera intrinsic matrix, and $R$ and $t$ are the camera 3D rotation measured by the IMU and the camera 3D translation, respectively.

If there is no camera translation, or if the depth of the feature is infinite, the second term of (1) vanishes, as in (2). Moreover, because the camera translation is generally much smaller than the feature depth, (1) can be approximated as (3); in practice, large optical flows are caused mainly by camera rotation rather than translation.

$$t = 0 \ \text{or}\ Z \to \infty \;\Rightarrow\; t/Z = 0 \;\Rightarrow\; x_2 = K R K^{-1} x_1 \qquad (2)$$

$$t \ll Z \;\Rightarrow\; t/Z \approx 0 \;\Rightarrow\; x_2 \approx K R K^{-1} x_1 \qquad (3)$$

Finally, the infinite homography computed from the camera intrinsic matrix and the camera rotation measured by the IMU is used to predict the initial position of a tracked feature in the current frame:

$$H_\infty = K R K^{-1} \qquad (4)$$

Figure 2 shows the algorithm of the IMU-aided feature tracker. The initial position $x^{pred}$ of a feature in the current frame is predicted by multiplying the infinite homography $H_\infty$, computed from the 3D rotation given by the IMU, with the position $x^{prev}$ of the feature in the previous frame.

:: Algorithm of IMU-aided feature tracker ::

    load img^prev, img^curr, X^prev_feature, and R_IMU (from IMU)
    calculate H_inf = K R_IMU K^(-1)
    C_tracked_feature = 0
    for i = 1 to N_feature
        x_i^pred = H_inf x_i^prev                 // IMU-aided prediction
        p_0 = x_i^pred
        for j = 0 to N_max_iteration              // original KLT tracker
            dp = H^(-1) * sum_{x in A} J [ I(w(x; p_j)) - T(x) ]
            p_(j+1) = p_j + dp
            if converged, break
        end
        if well-tracked
            x_i^curr = p
            C_tracked_feature = C_tracked_feature + 1
        end
    end

Fig. 2. Algorithm of the IMU-aided KLT feature tracker. The initial position of each feature is predicted by the infinite homography ($H_\infty$) calculated with the IMU.
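As a concrete illustration, the following is a minimal sketch of this prediction step, not the authors' implementation. It assumes a calibrated intrinsic matrix K, an inter-frame camera rotation R_imu obtained from the IMU gyroscope, and it uses OpenCV's pyramidal Lucas-Kanade tracker, seeded with the predicted positions via the OPTFLOW_USE_INITIAL_FLOW flag, as a stand-in for the paper's own translational KLT.

    import numpy as np
    import cv2

    def predict_features(prev_pts, K, R_imu):
        """Predict current-frame feature positions with the infinite
        homography H_inf = K R K^(-1) of Eq. (4)."""
        H_inf = K @ R_imu @ np.linalg.inv(K)
        pts = prev_pts.reshape(-1, 2)
        pts_h = np.hstack([pts, np.ones((len(pts), 1))])   # homogeneous coords
        pred = (H_inf @ pts_h.T).T
        pred = pred[:, :2] / pred[:, 2:3]                  # dehomogenize
        return pred.astype(np.float32).reshape(-1, 1, 2)

    def imu_aided_klt(prev_gray, curr_gray, prev_pts, K, R_imu):
        """Track features, seeding the KLT search at the IMU prediction."""
        pred_pts = predict_features(prev_pts, K, R_imu)
        curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
            prev_gray, curr_gray, prev_pts, pred_pts.copy(),
            winSize=(5, 5), maxLevel=0,    # 5x5 window, no pyramid (Sec. 4)
            flags=cv2.OPTFLOW_USE_INITIAL_FLOW)
        good = status.ravel() == 1
        return prev_pts[good], curr_pts[good]

Seeding the search with the rotation-compensated positions is what relaxes the small-motion constraint: the iterative alignment only has to recover the residual displacement left after the rotation-induced flow has been compensated.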


3. VIDEO STABILIZATION FOR ROBOT EYES

As mentioned in Section 2, most video stabilization techniques based on a feature tracker such as KLT may fail to stabilize an unstable video, because the tracker sometimes cannot track features when large image motion occurs. In order to cope with such large image motion, the IMU-aided KLT feature tracker is adopted in our video stabilization for robot eyes. This video stabilization is inspired by the human VOR system [7]: the vision and inertial sensors correspond to the human eye and inner ear, respectively. The inertial sensor has a fast response whereas the vision sensor is slow, so the inertial sensor can help the vision sensor process faster and more stably. We follow the video stabilization framework of [5]: motion estimation, motion filtering, and image composition.

A) Motion estimation
The global motion of the current frame with respect to a reference frame is calculated by updating the global motion of the previous frame with the inter-frame motion. The inter-frame motion is estimated from the correspondence pairs found by the proposed IMU-aided KLT feature tracker. An affine motion model is used for every frame.

B) Motion filtering and image composition
In video stabilization, unwanted motion is generally regarded as the high-frequency component of the motion. In this paper, a Kalman filter is used to eliminate the unwanted motion in real time. Before filtering, the affine motion estimated in the motion estimation step is approximated by a similarity motion in order to extract four geometric parameters: scale, in-plane rotation ($\theta$), x-translation ($t_x$), and y-translation ($t_y$). Each parameter except scale is then filtered by the Kalman filter. The degree of smoothing is controlled by the measurement noise variance of the Kalman filter: the larger the noise variance, the smoother the result, and vice versa.

The correction motion is the difference between the estimated motion ($\theta$, $t_x$, $t_y$) and the filtered motion ($\bar{\theta}$, $\bar{t}_x$, $\bar{t}_y$), and the stabilized image sequence is created from it as in equation (5):

$$x_{sta} = \underbrace{\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}}_{\substack{\text{no compensation} \\ \text{for scale}}} \underbrace{\begin{pmatrix} \cos(\theta-\bar{\theta}) & -\sin(\theta-\bar{\theta}) \\ \sin(\theta-\bar{\theta}) & \cos(\theta-\bar{\theta}) \end{pmatrix}}_{\substack{\text{compensation for unwanted} \\ \text{in-plane rotation}}} x_{uns} + \underbrace{\begin{pmatrix} t_x-\bar{t}_x \\ t_y-\bar{t}_y \end{pmatrix}}_{\substack{\text{compensation for} \\ \text{unwanted translation}}} \qquad (5)$$

where $x_{uns}$ and $x_{sta}$ are pixel coordinates of the unstable and stabilized images. The scale parameter is not compensated at all, because scale does not change quickly (i.e., it has low frequency).
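The sketch below illustrates the filtering and composition steps under stated assumptions: the paper does not specify the Kalman state model, so a scalar constant-position filter per parameter is assumed; the similarity parameters theta, t_x, t_y are taken as given (in a full pipeline they could come, for example, from cv2.estimateAffinePartial2D on the correspondence pairs); and signs follow equation (5) as written.

    import numpy as np
    import cv2

    class ScalarKalman:
        """1-D Kalman filter with a constant-position model (an assumed
        minimal choice; the paper's state model is unspecified)."""
        def __init__(self, proc_var=1e-4, meas_var=1.0):
            self.x, self.P = 0.0, 1.0
            self.q, self.r = proc_var, meas_var  # larger r => smoother output

        def filter(self, z):
            self.P += self.q                     # predict
            k = self.P / (self.P + self.r)       # Kalman gain
            self.x += k * (z - self.x)           # update with measurement z
            self.P *= (1.0 - k)
            return self.x

    kf_theta, kf_tx, kf_ty = ScalarKalman(), ScalarKalman(), ScalarKalman()

    def stabilize(frame, theta, tx, ty):
        """Warp the frame by the correction motion of Eq. (5):
        estimated minus filtered rotation/translation; scale untouched."""
        d_theta = theta - kf_theta.filter(theta)
        d_tx = tx - kf_tx.filter(tx)
        d_ty = ty - kf_ty.filter(ty)
        c, s = np.cos(d_theta), np.sin(d_theta)
        M = np.array([[c, -s, d_tx],
                      [s,  c, d_ty]], dtype=np.float32)
        h, w = frame.shape[:2]
        return cv2.warpAffine(frame, M, (w, h))

Increasing meas_var makes the filtered trajectory smoother, matching the paper's observation that the degree of smoothing is controlled by the measurement noise variance.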
4. EXPERIMENTAL RESULTS

A) IMU-aided feature tracker
First, we evaluated the performance of the proposed IMU-aided KLT feature tracker against the original KLT tracker, as shown in Fig. 3. We used 632 frames of size 320x240, a 5x5 feature window, 100 features per frame, and no pyramid. In Fig. 3(a), red, green, and blue indicate the roll, pitch, and yaw rotations of the IMU between the previous and current frames. Compared with the original KLT tracker, the IMU-aided tracker achieves a higher success rate in feature tracking (Fig. 3(b)), fewer iterations (Fig. 3(c)), and a shorter distance between the predicted and tracked feature positions (Fig. 3(d)). As summarized in Table I, the proposed IMU-aided tracker outperformed the original KLT tracker.

TABLE I
COMPARISON OF ORIGINAL KLT AND IMU-AIDED TRACKER

                                      Original KLT    IMU-aided tracker
    Success rate (%)                      42.45             77.78
    Iteration number                      12.53              7.18
    Distance b/w PF and TF (pixel)         6.11              2.62

    (PF: Predicted Feature, TF: Tracked Feature)

B) Video stabilization for robot eye
Second, we applied the proposed IMU-aided tracker to video stabilization for robot eyes, testing a video sequence captured in our laboratory. Comparison against other stabilized videos has recently been regarded as the best assessment for video stabilization, so performance assessment is somewhat subjective and difficult. During ordinary operation we could not tell whether the proposed video stabilization based on the IMU-aided tracker was better or worse than the one based on the original KLT tracker (Fig. 4). However, we observed that the proposed method stabilized unstable video sequences robustly for longer than the method based on the original KLT tracker (Fig. 4). The reason is that the proposed video stabilization, using the IMU-aided tracker, estimates the motion parameters (i.e., finds correspondence pairs) better than the one using the original KLT, especially under large image motion, as discussed above. Figure 4 shows some examples of stabilized frames.

5. CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a new video stabilization system for robot eyes inspired by the human VOR system. An IMU was adopted as the vestibular system of the robot, and the initial feature positions estimated with the IMU were incorporated into the KLT tracker. The proposed IMU-aided tracker improved the speed and accuracy of the tracking process. Moreover, the video stabilization for robot eyes based on the IMU-aided tracker stabilized unstable videos for longer than the one based on the original KLT tracker.

In the future, we need to address several remaining issues: an advanced (3D) motion model, reduction of undefined regions, and adaptive motion filtering.

REFERENCES


Fig. 3. Comparison between the original and IMU-aided KLT feature trackers. (a) IMU rotation angle: red, green, and blue lines indicate roll, pitch, and yaw rotation (degrees/frame), respectively; red, green, and blue arrows mark the roll-, pitch-, and yaw-dominant periods, and the black arrow marks a period of complex rotational motion. (b) Success rate in tracking. (c) Iteration number. (d) Distance between predicted and tracked feature. In (b)-(d), red and green markers indicate the original and IMU-aided KLT trackers, respectively.

Fig. 4. (a) Frames stabilized by the video stabilization based on the original KLT tracker. (b) Frames stabilized by the video stabilization based on the IMU-aided tracker. Left, center, and right frames are the 1st, 200th, and 428th frames.

[1] Y. Matsushita, "Full-Frame Video Stabilization with Motion Inpainting," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 28, no. 7, July 2006.
[2] B. D. Lucas and T. Kanade, "An Iterative Image Registration Technique with an Application to Stereo Vision," Proceedings of the Image Understanding Workshop, pp. 121-130, 1981.
[3] S. Erturk, "Real-Time Digital Image Stabilization Using Kalman Filters," Real-Time Imaging, vol. 8, pp. 317-328, 2002.
[4] J. Shi and C. Tomasi, "Good Features to Track," IEEE Conference on Computer Vision and Pattern Recognition, June 1994.
[5] Y. G. Ryu, H. C. Roh, S. J. Kim, K. H. An, and M. J. Chung, "Digital Image Stabilization for Humanoid Eyes Inspired by Human VOR System," IEEE International Conference on Robotics and Biomimetics, Guilin, Guangxi, China, December 19-23, 2009.
[6] M. Hwangbo, J.-S. Kim, and T. Kanade, "Inertial-Aided KLT Feature Tracking for a Moving Camera," IEEE/RSJ International Conference on Intelligent Robots and Systems, USA, October 2009.
[7] "Vestibulo-ocular reflex," Encyclopaedia Britannica Online, 2009, accessed 29 Jul. 2009.
