Fig. 1. (a) Extracting a feature in the previous frame. (b) Tracking the feature in the current frame with the original KLT feature tracker. (c) Tracking the feature in the current frame with the IMU-aided KLT feature tracker. Red, blue, and green rectangles indicate, respectively, the position of the feature extracted from the previous frame (i.e., its initial position in the current frame), the predicted position of the feature in the current frame, and the final target position of the tracked feature in the current frame.

research. The original KLT tracker fails to track the feature shown in figure 1(b) because the disparity between the initial (red rectangle) and target (green rectangle) feature positions lies outside the tracking search range. However, if the initial position of the feature in the current frame is predicted from the 3D rotation information of the IMU, it can be brought close enough to the target position to fall within the tracking search range, as shown in figure 1(c).

Correspondence pairs x_1 ↔ x_2 between two images are related by equation (1):

    x_2 = K ( R K^{-1} x_1 + t / Z )    (1)

where x_1 and x_2 are the corresponding features in the previous and current frames, respectively, Z is the depth of the scene point corresponding to x_1, K is the camera intrinsic matrix, and R and t are the camera 3D rotation measured by the IMU and the camera 3D translation, respectively.

If there is no camera translation, or if the depth of the feature is infinite, the second term in equation (1) vanishes, as in equation (2).
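As a numeric illustration of this cancellation (a sketch with assumed toy values for K, R, t, and the feature position, not the paper's calibration), the translation term of equation (1) becomes negligible as the depth Z grows, so a rotation-only mapping predicts the feature position well:

```python
import numpy as np

# Assumed toy intrinsics and a 3-degree inter-frame yaw rotation.
K = np.array([[500.0, 0.0, 160.0],
              [0.0, 500.0, 120.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(3.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0,       1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([0.05, 0.0, 0.0])       # 5 cm sideways camera translation
x1 = np.array([200.0, 130.0, 1.0])   # feature in the previous frame (homogeneous)

def transfer(x, Z):
    """Equation (1): x2 = K (R K^-1 x + t/Z), normalized to pixel coordinates."""
    x2 = K @ (R @ np.linalg.inv(K) @ x + t / Z)
    return x2[:2] / x2[2]

near = transfer(x1, Z=1.0)         # at 1 m depth the translation term shifts ~25 px
far = transfer(x1, Z=100.0)        # at 100 m the shift shrinks to ~0.25 px
rot_only = transfer(x1, Z=np.inf)  # pure rotation, as in equation (2)
print(near, far, rot_only)
```

At large depth the rotation-only transfer K R K^{-1} x_1 is within a fraction of a pixel of the full model, which is why the IMU rotation alone suffices to seed the tracker.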
Also, because the camera translation is generally much smaller than the depth of the feature, equation (2) can be approximated as equation (3). In practice, large optical flows are mainly due to camera rotation rather than translation.

    t = 0 or Z → ∞   ⇒   t / Z = 0   ⇒   x_2 = K R K^{-1} x_1    (2)

    t << Z   ⇒   t / Z ≈ 0   ⇒   x_2 ≈ K R K^{-1} x_1    (3)

Finally, the infinity homography, computed from the camera intrinsic matrix and the camera rotation measured by the IMU, is used to predict the initial position of the tracked feature in the current frame, as in equation (4):

    H_∞ = K R K^{-1}    (4)

Figure 2 shows the algorithm of the IMU-aided feature tracker. The initial position x^pred of a feature in the current frame is predicted by multiplying its position x^prev in the previous frame by the infinity homography H_∞ computed from the 3D rotation measured by the IMU.

    load img_prev, img_curr, X_prev, and R from IMU
    calculate H_∞ = K R K^{-1}
    C_tracked = 0
    for i = 1 to N
        x_pred_i = H_∞ x_prev_i            // prediction from IMU
        p_0 = x_pred_i
        for j = 0 to N_max_iteration       // original KLT tracker
            Δp = H^{-1} Σ_{x∈A} [∇I ∂W/∂p]^T [T(x) − I(w(x; p_j))]
            p_{j+1} = p_j + Δp
            if converged, break
        end
        if well-tracked
            x_tracked_i = p
            C_tracked = C_tracked + 1
        end
    end

Fig. 2. Algorithm of the IMU-aided KLT feature tracker. An initial position of each feature is predicted by the infinity homography (H_∞) calculated with the IMU. (H in the inner loop denotes the Hessian of the KLT least-squares problem, not the infinity homography.)

3. VIDEO STABILIZATION FOR ROBOT EYES

Most video stabilization techniques based on a feature tracker such as the KLT tracker may fail to stabilize an unstable video, because such trackers sometimes cannot track features well when large image motion occurs, as mentioned in section 2. To overcome this large image motion, the IMU-aided KLT feature tracker is adopted in our video stabilization for robot eyes. This
video stabilization is inspired by the human VOR system [7]: the vision and inertial sensors correspond to the human eye and ear, respectively. The inertial sensor has a fast response while the vision sensor is slow, so the inertial sensor can help the vision sensor process faster and more stably.

We follow the video stabilization framework of [5]: motion estimation, motion filtering, and image composition.

A) Motion estimation

The global motion of the current frame with respect to a reference frame is calculated by updating the global motion of the previous frame with the inter-frame motion. The inter-frame motion is estimated from the correspondence pairs found by the proposed IMU-aided KLT feature tracker. Here, an affine motion model is used for every frame.

B) Motion filtering and image composition

In video stabilization, unwanted motion is generally regarded as the high-frequency component of the motion. In this paper, a Kalman filter is used to eliminate the unwanted motion in real time. Before filtering, the affine motion estimated in the motion estimation step is approximated by a similarity motion in order to extract four geometric parameters: scale, in-plane rotation (θ), x-translation (t_x), and y-translation (t_y). Each parameter except scale is then filtered by the Kalman filter. The degree of smoothing is controlled by the measurement noise variance of the Kalman filter.
The larger the measurement noise variance, the smoother the filtered motion, and vice versa.

After obtaining the correction motion, which is the difference between the estimated motion (θ, t_x, t_y) and the filtered motion (θ̄, t̄_x, t̄_y), the stabilized image sequence is created with this correction motion as in equation (5):

    x_sta = [ 1  0 ; 0  1 ] [ cos(θ − θ̄)  −sin(θ − θ̄) ; sin(θ − θ̄)  cos(θ − θ̄) ] x_uns + [ t_x − t̄_x ; t_y − t̄_y ]    (5)

where x_uns and x_sta are the pixel coordinates of the unstable and stabilized images, respectively. The leading identity matrix indicates that no compensation is applied for the scale motion; the rotation matrix compensates only the unwanted in-plane rotation motion; and the final vector compensates only the unwanted translation motions. The scale parameter is not compensated at all because scale does not change quickly (i.e., it has low frequency).

4. EXPERIMENTAL RESULTS

A) IMU-aided feature tracker

First, we evaluated the performance of the proposed IMU-aided KLT feature tracker by comparing it with the original KLT tracker, as shown in figure 3. We used 632 frames of size 320x240, a 5x5 feature window, 100 features at every frame, and no pyramid.
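Returning to the image composition step, the correction of equation (5) can be sketched as a per-point similarity transform (the motion parameters below are assumed example values; `theta_f` and `t_f` stand for the Kalman-filtered parameters):

```python
import numpy as np

def stabilize_point(x_uns, theta, theta_f, t, t_f):
    """Equation (5): rotate by the unwanted rotation (theta - theta_f) and
    shift by the unwanted translation (t - t_f); scale is left untouched."""
    d = theta - theta_f
    R = np.array([[np.cos(d), -np.sin(d)],
                  [np.sin(d),  np.cos(d)]])
    return R @ np.asarray(x_uns) + (np.asarray(t) - np.asarray(t_f))

# Assumed values: estimated vs. Kalman-filtered motion parameters.
x_sta = stabilize_point([100.0, 50.0],
                        theta=0.05, theta_f=0.02,   # radians
                        t=[4.0, -3.0], t_f=[1.0, -1.0])
print(x_sta)
```

In practice this transform would be applied to the whole frame with an image-warping routine rather than point by point.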
In figure 3(a), red, green, and blue indicate the roll, pitch, and yaw rotations measured by the IMU between the previous and current frames. Compared with the original KLT tracker, the IMU-aided tracker achieves a higher success rate in feature tracking (figure 3(b)), a lower iteration number (figure 3(c)), and a closer distance between the predicted and tracked feature positions (figure 3(d)). As a result, we found that the proposed IMU-aided tracker outperformed the original KLT tracker, as summarized in table I.

TABLE I
COMPARISON OF ORIGINAL KLT AND IMU-AIDED TRACKER

                                      Original KLT    IMU-aided tracker
    Success rate (%)                     42.45             77.78
    Iteration number                     12.53              7.18
    Distance b/w PF and TF (pixel)        6.11              2.62

    PF: Predicted Feature, TF: Tracked Feature

B) Video stabilization for robot eyes

Second, we applied the proposed IMU-aided tracker to video stabilization for robot eyes and tested it on a video sequence captured in our laboratory. Recently, comparison to other stabilized videos has been widely regarded as the best assessment for video stabilization; therefore, performance assessment of video stabilization can be somewhat subjective and difficult. During operation, we could not determine whether the proposed video stabilization based on the IMU-aided tracker was better or worse than the one based on the original KLT tracker (figure 4). However, we observed that the proposed method could robustly stabilize unstable video sequences for longer than the one based on the original KLT tracker, as shown in figure 4.
The reason is that the proposed video stabilization, which uses the IMU-aided tracker, can estimate the motion parameters (i.e., find correspondence pairs) better than the one using the original KLT tracker, especially under large image motion, as mentioned before. Figure 4 shows some of the stabilized frames.

5. CONCLUSIONS AND FUTURE WORK

In this paper, we proposed a new video stabilization system for robot eyes inspired by the human VOR system. An IMU was adopted as the vestibular system of the robot, and the initial feature position estimated with the IMU was incorporated into the KLT tracker. The proposed IMU-aided tracker improved the speed and accuracy of the tracking process. In addition, the video stabilization for robot eyes based on the IMU-aided tracker stabilized unstable videos for longer than the one based on the original KLT tracker.

In the future, we plan to address several remaining problems, such as an advanced (3D) motion model, reduction of undefined regions, and adaptive motion filtering.

REFERENCES