2D image mosaic building 2D3 - Ifremer
Underwater Systems Department<br />
A.G. Allais<br />
27/09/2006 – DOP/CM/SM/PRAO/06.224<br />
Project Exocet/D<br />
Deliverable N° 2D3<br />
Report on image mosaic building<br />
Distribution:<br />
P.M. Sarradin DOP/CB/EEP/LEP<br />
M. Perrier DOP/CM/SM/PRAO<br />
Confidential<br />
Restricted<br />
Public
Date : 27/09/2006<br />
Reference : DOP/CM/SM/PRAO/06.224<br />
Analytic N° : E010403A1<br />
Contract N° :<br />
Subject/Title :<br />
Abstract :<br />
Key-words :<br />
Number of pages : 16<br />
Number of figures :<br />
Number of annex :<br />
Project Exocet/D<br />
Deliverable N° 2D3<br />
Report on image mosaic building<br />
Revisions<br />
File name : 2D3.doc<br />
Writer : A.G. Allais<br />
Grade Object Date Written by Checked by Approved by<br />
1.0 Creation 14/12/05 A.G. Allais M. Perrier M. Perrier<br />
THIS DOCUMENT, PROPERTY OF IFREMER, MAY NOT BE REPRODUCED OR COMMUNICATED WITHOUT ITS AUTHORIZATION
Project Exocet/D page 3/16<br />
TABLE OF CONTENTS<br />
1. INTRODUCTION ..............................................................................................................................4<br />
2. WHAT IS A GEO-REFERENCED VIDEO MOSAIC?......................................................................4<br />
3. MOSAICING ALGORITHMS............................................................................................................5<br />
3.1. KLT algorithm ...........................................................................................................5<br />
3.1.1. Principle ............................................................................................................5<br />
3.1.2. Algorithm description ........................................................................................6<br />
3.1.2.1. Detection of point features............................................................................6<br />
3.1.2.2. Tracking of features ......................................................................................6<br />
3.1.2.3. Global displacement computation by least square method ..........................7<br />
3.2. RMR algorithm..........................................................................................................7<br />
3.2.1. Principle ............................................................................................................7<br />
3.2.2. Algorithm description ........................................................................................8<br />
3.2.2.1. Model of motion ............................................................................................8<br />
3.2.2.2. Robust estimation .........................................................................................8<br />
3.3. Metric conversion - camera self-calibration ..............................................................9<br />
3.3.1. Extraction and matching of points...................................................................10<br />
3.3.2. Scene geometry..............................................................................................10<br />
3.3.3. Points validation..............................................................................................10<br />
3.3.4. Intrinsic parameters estimation.......................................................................11<br />
3.3.5. Extrinsic parameters estimation......................................................................11<br />
3.4. Experiments............................................................................................................11<br />
3.5. Fusion with navigation data ....................................................................................12<br />
4. MATISSE SOFTWARE® .............................................................................................................. 14<br />
4.1. General architecture ...............................................................................................14<br />
4.2. User interface .........................................................................................................15<br />
5. CONCLUSION .............................................................................................................................. 15<br />
6. BIBLIOGRAPHY ........................................................................................................................... 16<br />
1. INTRODUCTION<br />
In this report, we address the issue of managing seabed video records. During sea trials, scientists record large amounts of video that they need to analyse the ocean floor. As storage capacity increases, the number of video records grows with every campaign, leading scientists to spend more and more time analysing them. The aim of video mosaicing is therefore to provide a tool that builds a map whose extent is far larger than the camera field of view, giving the scientist a global view of the scene while at the same time reducing and compressing the required video storage.<br />
2. WHAT IS A GEO-REFERENCED VIDEO MOSAIC?<br />
In order to simplify the scientists' exploitation of the numerous video DVDs resulting from sea trials, we have developed a tool that provides a larger view of the seabed than the restricted field of view of a video camera. The aim of video mosaicing is to build images whose extent is far larger than a single snapshot of a video recording. The resulting image represents a larger area of the seabed and is called a mosaic.<br />
The principle used to build mosaics is quite simple. The acquired video stream can be seen as a succession of images that largely overlap. The idea is to estimate the part that has been added from one image to the next; the new part of the current image is then added and merged with the previous image. Every N images, a mosaic is completed and a new one can begin. This step is performed by image processing techniques.<br />
The other main issue of video mosaicing is to locate the mosaics on the seabed, so that scientists can combine them with other geo-referenced data such as bathymetry, samples, and physical or chemical data. This can be done in two ways. In the simplest case, the operator gives the position, heading and altitude of the first point, and the mosaic location is then propagated by image processing. This is not the best approach, however, since image processing errors accumulate through the whole process. To overcome this drawback, the second approach consists in merging navigation data with the image displacements.<br />
The whole process is developed in the following part and is summarized in the sketch hereafter (Figure 1).<br />
Figure 1: Illustration of video mosaicing (Image1, Image2, Image3, …: one mosaic built from 3 images, geo-referenced at (X0,Y0); then two mosaics built from 3 images each, geo-referenced at (X0,Y0) and (X1,Y1))<br />

3. MOSAICING ALGORITHMS<br />
Many image processing techniques allow us to calculate the geometric relationship between two images. Hereafter, two different methods have been investigated and integrated into the MATISSE Software®, which has been developed at IFREMER. The first relies on a feature tracking algorithm (KLT), while the second, the RMR method, estimates the movement with a robust optical flow algorithm. Both algorithms estimate a displacement in pixels between two successive images. In order to provide the mosaics with metric dimensions, this displacement needs to be converted into meters; this step is performed using the camera parameters and the altitude provided by navigation.<br />
3.1. KLT algorithm<br />
3.1.1. Principle<br />
This algorithm is based upon research by Kanade, Lucas and Tomasi (KLT). It consists in detecting and tracking point features through an image stream [SHI94]. Points are selected if they meet a criterion characterizing a locally textured area, and are then tracked through the sequence. Once a list of matched points is obtained from successive images, a global displacement is computed to register the successive images and build a mosaic.<br />
3.1.2. Algorithm description<br />
3.1.2.1. Detection of point features<br />
According to the KLT algorithm, a feature is selected if it is easily "trackable" through an image stream. The features are therefore small windows, several pixels on a side, which are locally textured. More precisely, a window is selected if its mean gradient is high enough and has no single dominant direction.<br />
Mathematically, we consider the gradient vector within a window of typically 7x7 pixels:<br />

$$ g = \begin{bmatrix} g_x \\ g_y \end{bmatrix} $$

We note $Z$ the following matrix:<br />

$$ Z = \iint_W \begin{bmatrix} g_x^2 & g_x g_y \\ g_x g_y & g_y^2 \end{bmatrix} \cdot \omega \cdot dA $$

Where:<br />
• $g_x$ is the image intensity gradient along the x-axis,<br />
• $g_y$ is the image intensity gradient along the y-axis,<br />
• $W$ is the computation window,<br />
• $\omega$ is a weight function,<br />
• $dA$ is the area element of the computation window.<br />
This method relies on the fact that the eigenvalues of $Z$ are directly linked to the texture of the area over which they are computed. To give the eigenvalues a simple meaning: two small eigenvalues indicate a non-textured area, whereas one high and one small eigenvalue are characteristic of an area with a single dominant direction. In our case, only windows that are textured but have no particular direction must be selected, that is to say, areas where both eigenvalues are high. So, within an image, small areas (7x7 pixels, for instance) are selected only if both eigenvalues are greater than a given threshold.<br />
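As an illustration, this selection criterion can be sketched as follows (a minimal, unoptimized Python/NumPy sketch; the function name `klt_feature_score` and the uniform weight $\omega = 1$ are our own choices, not part of the original KLT implementation):

```python
import numpy as np

def klt_feature_score(image, window=7):
    """Smaller eigenvalue of the matrix Z for each window centre.

    A window is a good feature when both eigenvalues of Z are large,
    which amounts to thresholding min(lambda1, lambda2).
    """
    image = image.astype(float)
    gy, gx = np.gradient(image)          # intensity gradients along y and x
    half = window // 2
    h, w = image.shape
    score = np.zeros((h, w))
    for r in range(half, h - half):
        for c in range(half, w - half):
            wx = gx[r - half:r + half + 1, c - half:c + half + 1]
            wy = gy[r - half:r + half + 1, c - half:c + half + 1]
            # Entries of Z integrated over the window (uniform weight omega = 1).
            zxx, zyy, zxy = (wx * wx).sum(), (wy * wy).sum(), (wx * wy).sum()
            # Closed-form smaller eigenvalue of the 2x2 symmetric matrix Z.
            score[r, c] = 0.5 * (zxx + zyy - np.hypot(zxx - zyy, 2.0 * zxy))
    return score
```

Thresholding this score map (with non-maximum suppression, not shown) yields the selected features.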
3.1.2.2. Tracking of features<br />
The features selected in the first part of the algorithm are then tracked through the video stream or image sequence. The displacement between two successive images is assumed to be quite small, since the camera moves slowly. Thus, nearly all the points of an image I are also present in the next image J, and the two are linked by a translation vector $\mathbf{d}$.<br />
Let us now state the relationship linking two images at two moments. Let $I(x - \xi, y - \eta, t)$ be the image at time $t$ and $I(x, y, t + \tau)$ the image at time $t + \tau$.<br />
Put $J(\mathbf{x}) = I(x, y, t + \tau)$ and $I(\mathbf{x} - \mathbf{d}) = I(x - \xi, y - \eta, t)$, where $\mathbf{d} = (\xi, \eta)$ is the displacement vector of the point $\mathbf{x} = (x, y)$ between the two time steps $t$ and $t + \tau$.<br />
We can note that $J(\mathbf{x}) = I(\mathbf{x} - \mathbf{d}) + n(\mathbf{x})$, where $n(\mathbf{x})$ represents the noise.<br />
For each window $W$, $\mathbf{d}$ is obtained by minimizing $\iint_W [n(\mathbf{x})]^2 \cdot \omega \cdot dA$. That leads to solving the following equation:<br />

$$ G \mathbf{d} = \mathbf{e} $$

With:<br />

$$ G = \iint_W g g^T \cdot \omega \cdot dA \qquad \mathbf{e} = \iint_W (I - J) \cdot g \cdot \omega \cdot dA $$

This equation is solved for each selected window. So, for each window selected in the first image, we can compute the local displacement between the first image and the second one.<br />
The steps of point detection and tracking of the KLT algorithm are illustrated in Figure 2.<br />
Figure 2: Detection and tracking of points in a sequence of coral reef<br />
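The tracking step, solving $G\mathbf{d} = \mathbf{e}$ for one window, can be sketched as follows (a simplified Python/NumPy illustration with a uniform weight $\omega = 1$; the name `lk_window_displacement` is hypothetical, and a real tracker would iterate this solve, typically coarse-to-fine):

```python
import numpy as np

def lk_window_displacement(I, J, top, left, window=7):
    """One tracking step: displacement d of a window between images I and J.

    Solves G d = e with G = sum(g g^T) and e = sum((I - J) g) over the
    window, assuming the inter-frame displacement is small.
    """
    I = np.asarray(I, dtype=float)
    J = np.asarray(J, dtype=float)
    sl = (slice(top, top + window), slice(left, left + window))
    gy, gx = np.gradient(I)                          # intensity gradients of I
    g = np.stack([gx[sl].ravel(), gy[sl].ravel()])   # 2 x N gradient vectors
    G = g @ g.T                                      # 2 x 2 normal matrix
    e = g @ (I[sl] - J[sl]).ravel()                  # right-hand side
    return np.linalg.solve(G, e)                     # d = (dx, dy)
```

Note that $G$ is exactly the matrix $Z$ of the detection step, which is why well-textured windows (both eigenvalues large) are also the ones whose displacement is reliably solvable.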
3.1.2.3. Global displacement computation by the least square method<br />
Once the points are matched between two successive images, a global displacement is computed in order to register the images.<br />
The displacement is modelled as a 4-parameter rigid global 2D transformation, that is, a transformation composed of a translation, a rotation and a scale factor.<br />
This global displacement is computed by the iterative least square method; each iteration refines the result. The method is complemented by an acceptance criterion used to validate the matches and discard false matches, in order to make the computation more robust.<br />
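The 4-parameter least-squares fit can be illustrated as follows (a Python/NumPy sketch of the plain least-squares solve only; the iteration and the acceptance criterion that rejects false matches are omitted, and the name `fit_similarity` is ours):

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 4-parameter rigid 2D transform (translation, rotation, scale).

    Model: x' = a*x - b*y + tx,  y' = b*x + a*y + ty,
    with a = s*cos(theta) and b = s*sin(theta).
    src, dst: (N, 2) arrays of matched points; returns (a, b, tx, ty).
    """
    x, y = src[:, 0], src[:, 1]
    one, zero = np.ones_like(x), np.zeros_like(x)
    # Two linear equations per match, stacked into one design matrix.
    M = np.vstack([np.column_stack([x, -y, one, zero]),
                   np.column_stack([y, x, zero, one])])
    rhs = np.concatenate([dst[:, 0], dst[:, 1]])
    params, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return params
```

Parameterizing the rotation and scale jointly as $(a, b)$ keeps the problem linear, so each iteration is a single `lstsq` call on the currently accepted matches.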
3.2. RMR algorithm<br />
3.2.1. Principle<br />
The second method we have investigated to build <strong>mosaic</strong>s is the Robust Multi-Resolution<br />
(RMR) method which is based upon the estimation of the optical flow [ODO95]. The<br />
advantage of this method is that the motion is estimated from the whole <strong>image</strong>.<br />
In the RMR algorithm, the first stage consists in choosing a motion model. The aim is then to estimate the parameters of this model using classic robust estimation methods from the image and signal processing domain. This step is combined with a coarse-to-fine estimation using multi-resolution levels of images.<br />
3.2.2. Algorithm description<br />
3.2.2.1. Model of motion<br />
In the first step of the algorithm, a motion model is chosen. In the RMR algorithm, we consider the class of 2D polynomial motion models and deal only with the 2D affine model, which is not too complex yet representative of a large class of motion transformations:<br />

$$ \begin{cases} u(X_i) = a_1 + a_2 x_i + a_3 y_i \\ v(X_i) = a_4 + a_5 x_i + a_6 y_i \end{cases} $$

With matrix notation, it can be stated as:<br />

$$ V(X_i) = \begin{bmatrix} u(X_i) \\ v(X_i) \end{bmatrix} = B(X_i)\, A $$

Where $A^T = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 \end{pmatrix}$ and<br />

$$ B_i = B(X_i) = \begin{bmatrix} 1 & x_i & y_i & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x_i & y_i \end{bmatrix} $$

For each point $X_i$, one can write the optical flow constraint equation linking spatial and temporal intensity gradients:<br />

$$ V(X_i) \cdot \nabla I(X_i) + I_t(X_i) = 0, \quad \text{i.e.} \quad I_x(X_i)\, u(X_i) + I_y(X_i)\, v(X_i) + I_t(X_i) = 0 $$

where $\nabla I(X_i)$ is the spatial gradient vector of the intensity, $I_t(X_i)$ is the partial derivative of the intensity with respect to time, and $V(X_i)$ is the velocity field.<br />
3.2.2.2. Robust estimation<br />
The goal of robust estimation is to find the parameter vector $\Theta$ which best fits the model $M(X_i, \Theta)$ to the observations $y_i$. In our case, $\Theta = (A^T, 0)^T$.<br />
The estimation of the parameter $\Theta$ is achieved by a maximum likelihood estimator:<br />

$$ \hat{\Theta} = \arg\min_{\Theta} \sum_i \rho\left( y_i - M(X_i, \Theta) \right) $$

where $\rho$ is called the M-estimator and corresponds to the maximum likelihood estimation.<br />
This estimation is performed at all the scales of a multi-resolution image pyramid, so that the estimation is refined at each scale.<br />
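A minimal sketch of such a robust estimation at a single resolution level, using iteratively reweighted least squares with Tukey's biweight as the M-estimator (the function name, the MAD-based scale estimate and the iteration count are our own choices, not the exact RMR implementation):

```python
import numpy as np

def robust_affine_motion(X, Ix, Iy, It, iters=20, c=4.685):
    """IRLS estimate of the 2D affine motion parameters A = (a1..a6).

    Residual per point: r_i = Ix*u(X_i) + Iy*v(X_i) + It  (flow constraint),
    with u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y.
    X: (N, 2) point coordinates; Ix, Iy, It: (N,) intensity gradients.
    """
    x, y = X[:, 0], X[:, 1]
    D = np.column_stack([Ix, Ix * x, Ix * y, Iy, Iy * x, Iy * y])
    w = np.ones(len(x))
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted least squares: minimize sum_i w_i * r_i^2.
        A, *_ = np.linalg.lstsq(D * sw[:, None], -(It * sw), rcond=None)
        r = D @ A + It
        scale = 1.4826 * np.median(np.abs(r)) + 1e-12   # robust scale (MAD)
        u = np.clip(np.abs(r) / (c * scale), 0.0, 1.0)
        w = (1.0 - u ** 2) ** 2                         # Tukey biweight weights
    return A
```

In a full coarse-to-fine scheme, the estimate from a coarser pyramid level initializes this loop at the next finer level.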
3.3. Metric conversion - camera self-calibration<br />
The displacement calculated by image processing is given in pixels. The aim of video mosaicing is to provide the mosaics with metric dimensions, in order to make quantitative measurements within the images. Thus, the displacement and the image size in pixels must be converted into meters (see Figure 3).<br />
Figure 3: Link between one pixel and its real metric size on the seabed (camera with focal length f at altitude h above the seabed plane; one pixel of size L in the image plane corresponds to a length l on the seabed)<br />
Thus we have the relationship:<br />

$$ l\,(\mathrm{m \cdot pix^{-1}}) = \frac{SizeOfAPixel\,(\mathrm{m \cdot pix^{-1}})}{f\,(\mathrm{m})} \cdot h\,(\mathrm{m}) $$

Given that $SizeOfAPixel\,(\mathrm{m \cdot pix^{-1}}) = \dfrac{f\,(\mathrm{m})}{\alpha\,(\mathrm{pix})}$, we deduce that:<br />

$$ l\,(\mathrm{m \cdot pix^{-1}}) = \frac{h\,(\mathrm{m})}{\alpha\,(\mathrm{pix})} $$

where $\alpha$ is an intrinsic parameter of the camera which has to be estimated and $h$ is the altitude of the camera.<br />
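In code, the conversion is a one-liner (a sketch; `alpha_pix` stands for the intrinsic parameter $\alpha$ in pixels, i.e. the focal length divided by the physical pixel size, and the function names are ours):

```python
def pixel_size_on_seabed(altitude_m, alpha_pix):
    """Metric size l of one pixel on the seabed: l = h / alpha."""
    return altitude_m / alpha_pix

def displacement_to_meters(d_pix, altitude_m, alpha_pix):
    """Convert a displacement computed by image processing from pixels to meters."""
    return d_pix * pixel_size_on_seabed(altitude_m, alpha_pix)
```

For example, with $\alpha = 800$ pixels and an altitude of 4 m, one pixel covers 5 mm on the seabed, so a 100-pixel displacement corresponds to 0.5 m.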
Since it is not always possible to deploy a calibration pattern on the seabed, an algorithm for camera self-calibration has been investigated [PES03]. The self-calibration algorithm allows us to determine the intrinsic and extrinsic parameters of a vertical camera mounted on an underwater vehicle. This method needs only a sequence of a few images of the seabed and is based upon the determination of the epipolar geometry between two successive images of the sequence.<br />
The diagram below presents all the steps of the self-calibration method:<br />
Deliverable N° <strong>2D</strong>3<br />
Report on <strong>image</strong> <strong>mosaic</strong> <strong>building</strong><br />
DOP/CM/SM/PRAO/06.224<br />
Grade : 1.0 27/09/2006
Figure 4: Steps of the camera self-calibration method (images of the observed scene → matching points → scene geometry estimation → point matches validation → intrinsic parameters estimation → extrinsic parameters estimation)<br />

3.3.1. Extraction and matching of points<br />
In our application, image sequences are composed of small displacements between two successive images. The extraction and matching of points are carried out by the KLT algorithm detailed in paragraph 3.1, which extracts features in the first image and tracks them across the sequence. Despite the complexity of underwater images, a great number of features are successfully tracked through the sequence; moreover, the points are well distributed in the image.<br />
3.3.2. Scene geometry<br />
The scene geometry can be represented algebraically by the fundamental matrix $F$. The fundamental matrix links the coordinates $q_i$ and $q'_i$ of a same 3D point $Q$ in two images:<br />

$$ q_i'^{T} F q_i = 0 \qquad \forall i \in [1, n] $$

The estimation of the fundamental matrix is based on Hartley's normalized 8-point algorithm and uses the points detected and tracked by the KLT algorithm. This algorithm requires a set of at least eight matched points $q_i \leftrightarrow q'_i$. In order to increase the estimation accuracy, a set of about 30 to 50 points can be used, but using more points leads to the presence of false matches which perturb the estimation of $F$. To compensate, a criterion has been integrated to validate the points and remove false matches.<br />
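A compact sketch of Hartley's normalized 8-point estimation (Python/NumPy; the normalization follows the usual convention of centring the points and scaling their mean distance to $\sqrt{2}$, and details such as the returned scale of $F$ are our own choices):

```python
import numpy as np

def normalized_eight_point(q, qp):
    """Estimate F from matched points q (image 1) and qp (image 2), N >= 8.

    Returns F (unit Frobenius norm) such that qp_h^T F q_h ~ 0
    for homogeneous coordinates of the matches.
    """
    def normalize(pts):
        # Translate to the centroid, scale mean distance to sqrt(2).
        centroid = pts.mean(axis=0)
        s = np.sqrt(2) / np.linalg.norm(pts - centroid, axis=1).mean()
        T = np.array([[s, 0, -s * centroid[0]],
                      [0, s, -s * centroid[1]],
                      [0, 0, 1.0]])
        ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
        return ph, T

    xh, T1 = normalize(q)
    xph, T2 = normalize(qp)
    # Each match gives one row of the homogeneous system A f = 0.
    A = np.column_stack([xph[:, 0:1] * xh, xph[:, 1:2] * xh, xh])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce rank 2: a fundamental matrix is singular.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1            # undo the normalization
    return F / np.linalg.norm(F)
```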
3.3.3. Points validation<br />
The validation of features is carried out by an algorithm based on RANSAC (RANdom SAmple Consensus). The selection of point matches is based on how accurately they satisfy the equation representing the scene geometry (see 3.3.2): the epipolar constraint is evaluated for all the matched points while estimating the fundamental matrix, which yields a matching error for each pair. A list of good features is thus constituted, and only the "best" matches are kept to estimate the final fundamental matrix.<br />
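The principle can be sketched with a generic RANSAC skeleton (Python/NumPy). To keep the example self-contained it is demonstrated on a toy line-fitting model rather than on the fundamental matrix; the structure — minimal random samples, consensus counting, final refit on the inliers — is the same:

```python
import numpy as np

def ransac(data, fit, error, n_min, n_iter=200, tol=1.0, rng=None):
    """Generic RANSAC skeleton: fit models on random minimal samples, keep
    the one with the largest consensus set, then refit on all its inliers."""
    if rng is None:
        rng = np.random.default_rng(0)
    best = np.zeros(len(data), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(data), size=n_min, replace=False)
        inliers = error(fit(data[sample]), data) < tol
        if inliers.sum() > best.sum():
            best = inliers
    return fit(data[best]), best

# Toy model for illustration: a 2D line y = a*x + b fitted to (x, y) rows.
def fit_line(d):
    sol, *_ = np.linalg.lstsq(np.column_stack([d[:, 0], np.ones(len(d))]),
                              d[:, 1], rcond=None)
    return sol

def line_error(m, d):
    return np.abs(d[:, 1] - (m[0] * d[:, 0] + m[1]))
```

For the fundamental matrix, `fit` would be the 8-point estimation on a minimal sample and `error` the per-match epipolar residual.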
3.3.4. Intrinsic parameters estimation<br />
There are five intrinsic parameters: the focal distance $f$, the scale factors $k_u$ and $k_v$ along the image axes $u$ and $v$, and the coordinates $u_0$ and $v_0$ of the principal point of the image. The intrinsic parameters estimation is carried out using the Mendonça and Cipolla algorithm [4], applied to a set of five images taken at given intervals from a dense sequence. This algorithm is based on the minimization of a cost function which takes the intrinsic parameters as arguments and the fundamental matrices as parameters. The cost function is:<br />

$$ C(K) = \sum_{i=1}^{n} \sum_{j>i}^{n} w_{ij} \, \frac{\sigma_{ij}^{1} - \sigma_{ij}^{2}}{\sigma_{ij}^{2}} $$

With:<br />
• $K = (\alpha_u, \alpha_v, u_0, v_0)$, where $\alpha_u$ and $\alpha_v$ correspond to the products of the scale factors along the axes $u$ and $v$ by the focal length, and $u_0$, $v_0$ to the coordinates of the intersection of the optical axis with the image plane,<br />
• $w_{ij}$ is the degree of confidence of the estimation of the fundamental matrix $F_{ij}$,<br />
• $\sigma_{ij}^{1} > \sigma_{ij}^{2}$ are the non-zero singular values of the essential matrix $E_{ij}$.<br />
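This cost can be written directly in code (a sketch that only evaluates $C(K)$ for given fundamental matrices; the actual minimization over $K$, e.g. by a simplex or gradient method, is not shown, and the function name is ours):

```python
import numpy as np

def self_calibration_cost(K, F_list, w_list):
    """Cost of the Mendonca-Cipolla type: for the true K, every essential
    matrix E_ij = K^T F_ij K has two equal non-zero singular values, so
    the weighted relative gap (sigma1 - sigma2) / sigma2 vanishes."""
    cost = 0.0
    for F, w in zip(F_list, w_list):
        E = K.T @ F @ K
        s = np.linalg.svd(E, compute_uv=False)   # s[0] >= s[1] >= s[2]
        cost += w * (s[0] - s[1]) / s[1]
    return cost
```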
3.3.5. Extrinsic parameters estimation<br />
The extrinsic parameters are composed of the rotations and translations of the camera around the three axes (twelve parameters).<br />
The extrinsic parameters estimation represents the last step of the self-calibration algorithm. It is a function of the intrinsic parameters and of the fundamental matrix $F$:<br />

$$ E = [t]_{\times} R = K^{T} F K $$

With:<br />
• $[t]_{\times}$: the antisymmetric matrix associated with the translation vector $t$,<br />
• $R$: the rotation matrix,<br />
• $K$: the intrinsic parameters matrix.<br />
The algorithm first determines the translation $t$. Afterwards, the rotation matrix is estimated by minimizing:<br />

$$ \sum_{i=1}^{3} \left\| E_i - [t]_{\times i} R \right\|^2 $$

where $E_i$ and $[t]_{\times i}$ are the i-th row vectors of the matrices $E$ and $[t]_{\times}$.<br />
3.4. Experiments<br />
A statistical comparative study of this camera self-calibration method is presented in [PES03]. Studies with simulated data have shown that some trajectories of the underwater vehicle are better suited than others to the estimation of the intrinsic parameters of the camera.<br />
The results obtained with real data show that a rotation around the optical axis, combined with roll and pitch angles, allows all the intrinsic parameters of the camera to be estimated concurrently with good accuracy. The table below presents errors expressed as a percentage of the parameter values estimated for this type of movement.<br />
εαu % εαv % εu0 % εv0 %<br />
θz + (θx, θy) 2.06 % 2.06 % 1.65 % 1.19 %<br />
Table 1: Errors in intrinsic parameters estimations<br />
θz: rotation around the optical axis, θx: roll angle, θy: pitch angle<br />
3.5. Fusion with navigation data<br />
When the displacement is computed by image processing and the real geographic size is provided thanks to the altitude measurement and the camera parameters, one could think this is enough to obtain mosaics well located on the seabed. In fact, image processing techniques introduce errors that accumulate during the mosaicing process. In order to reduce these errors and obtain an accurate geo-referencing, navigation data can be fused with the displacement given by image processing. That is why, to correct the displacements calculated through an image sequence, we have introduced a Kalman filter [WEL94], which is well suited to the problem of estimating the variables of a dynamic system (one that varies with time). In dynamic systems, the system variables are denoted by the term "state variables".<br />
The question addressed by the Kalman filter is: "Given our knowledge of the behaviour of the system, and given our measurements, what is the best estimate of the state variables?"<br />
Mathematically, the aim of the Kalman filter is to estimate a posteriori a state vector $\hat{x}_k$ from the a priori estimate $\hat{x}_k^-$ and a weighted difference between the measurement $z_k$ at time $k$ and the prediction $H_k \hat{x}_k^-$, where the subscript $k$ denotes the time step. Thus, at time $k$, the measurement $z_k$ is known, and the a posteriori estimate $\hat{x}_k$ and the a priori estimate $\hat{x}_{k+1}^-$ must be calculated.<br />
The two equations hereafter link the state vector and the measurement vector:<br />

$$ x_{k+1} = A_k x_k + B u_k + w_k $$
$$ z_k = H_k x_k + v_k $$
This yields the two sets of equations of the classic formulation of Kalman filtering. The first are the time update equations, which predict the state vector $\hat{x}_{k+1}^-$, while the second are the measurement update equations, which correct the state vector $\hat{x}_k$.<br />
Time update equations ("predict"):<br />

$$ \hat{x}_{k+1}^- = A_k \hat{x}_k + B u_k $$
$$ P_{k+1}^- = A_k P_k A_k^T + Q_k $$

Measurement update equations ("correct"):<br />

$$ K_k = P_k^- H_k^T \left( H_k P_k^- H_k^T + R_k \right)^{-1} $$
$$ \hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H_k \hat{x}_k^- \right) $$
$$ P_k = \left( I - K_k H_k \right) P_k^- $$
In these formulae, the variables stand for:<br />
• $\hat{x}_k^-$: a priori state vector at time k (given the process before step k),<br />
• $\hat{x}_k$: a posteriori state vector at time k (given the measurement at time k),<br />
• $z_k$: measurement vector at time k,<br />
• $u_k$: vector of the control entry,<br />
• $A_k$: matrix linking the states at times k and k+1,<br />
• $B$: matrix linking the control entry to the state vector,<br />
• $K_k$: Kalman gain matrix,<br />
• $P_k^-$: covariance matrix of the prediction error,<br />
• $P_k$: covariance matrix of the a posteriori error,<br />
• $H_k$: matrix linking the state to the measurement,<br />
• $w_k$: process noise, assumed to be white and Gaussian,<br />
• $v_k$: measurement noise, assumed to be white and Gaussian,<br />
• $R_k$: covariance matrix of the measurement noise,<br />
• $Q_k$: covariance matrix of the process noise.<br />
In the case of video mosaicing, the state variables are the position (X_utm and Y_utm) and a term to correct the pixel size. The measurement vector consists of X_utm and Y_utm given by the navigation system.<br />
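One predict/correct cycle of such a filter can be sketched as follows (Python/NumPy; the function signature and the arrangement of the predict and correct phases into one combined step are our own choices — in MATISSE the state, $H$ and the noise covariances would be instantiated from the mosaicing quantities described above):

```python
import numpy as np

def kalman_step(x_prev, P_prev, z, A, B, u, H, Q, R):
    """One predict/correct cycle of the Kalman filter.

    Returns the a posteriori state estimate and error covariance."""
    # Time update ("predict"): propagate state and covariance.
    x_pred = A @ x_prev + B @ u
    P_pred = A @ P_prev @ A.T + Q
    # Measurement update ("correct"): blend prediction and measurement.
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_post = x_pred + K @ (z - H @ x_pred)
    P_post = (np.eye(len(x_prev)) - K @ H) @ P_pred
    return x_post, P_post
```

The gain $K$ automatically balances the two sources: a small measurement covariance $R$ pulls the estimate toward the navigation fix, a large one leaves it near the image-processing prediction.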
The experiment we have conducted consists in following a route with the underwater vehicle and then retracing the same route in the opposite direction. We can notice in Figure 5 that the mosaic drifts if navigation is not used in the algorithm, whereas when dead-reckoning navigation is used in the Kalman filter, the mosaic drift is well corrected.<br />
(a) (b)<br />
Figure 5: Mosaics obtained without using navigation (a), using Kalman filtering (b)<br />
4. MATISSE SOFTWARE®<br />
4.1. General architecture<br />
The MATISSE Software® [ALL04] has been developed to integrate all the algorithms of geo-referenced mosaicing. It is flexible and can be used with many underwater vehicles, provided they are equipped with a down-facing camera and a continuous navigation system (dead-reckoning navigation).<br />
In Figure 6, the diagram represents the conditions of use of the MATISSE Software® with a ROV. The video stream and the navigation data are transferred up to the surface via the umbilical tether and processed in-line to produce geo-referenced mosaics. An option consists of recording the video stream on a video DVD and the navigation data as messages on a CD; playing back video and navigation data together makes off-line mosaic building possible.<br />
Figure 6: MATISSE Software® used with a ROV (the down-looking camera and the navigation system feed MATISSE at the surface for real-time creation of geo-referenced video mosaics)<br />
4.2. User interface<br />
MATISSE Software® provides a user-friendly interface, presented in Figure 7. The main window (black background) allows the user to control and check the mosaic processing. On the left, the MATISSE data architecture is displayed; in the bottom left-hand corner, the video stream is visualized; and the upper part of the interface is dedicated to specific menus and predefined sets of parameters for mosaic creation.<br />
The MATISSE outputs consist of geo-referenced images (TIFF images and TFW geo-referencing files) and of network messages sent when a mosaic is created. These messages can be used, for example, by a GIS in order to integrate the geo-referenced mosaics on-line with other geo-referenced data in a dedicated environment.<br />
Figure 7: MATISSE software® interface<br />

5. CONCLUSION<br />
In this report, we have detailed methods to build geo-referenced mosaics. This work has resulted in the development of user-friendly software which is used at IFREMER with the ROV victor6000 and has been tested with other underwater vehicles.<br />
6. BIBLIOGRAPHY<br />
[ALL04] Allais, A.G., Borgetto, M., Opderbecke, J., Pessel, N., Rigaud, V., "Seabed video mosaicking with MATISSE: a technical overview and cruise results", Proc. of the 14th International Offshore and Polar Engineering Conference, ISOPE-2004, vol. 2, pp. 417-421, Toulon, France, May 23-28, 2004.<br />
[ODO95] Odobez, J.M., Bouthémy, P., "Robust Multiresolution Estimation of Parametric Motion Models", Journal of Visual Communication and Image Representation, Vol. 6, No. 4, pp. 348-365, Dec. 1995.<br />
[PES03] Pessel, N., Opderbecke, J., Aldon, M.J., "An Experimental Study of a Robust Self-Calibration Method for a Single Camera", Proc. of the 3rd International Symposium on Image and Signal Processing and Analysis, ISPA 2003, sponsored by IEEE and EURASIP, Rome, Italy, September 18-20, 2003.<br />
[SHI94] Shi, J., Tomasi, C., "Good features to track", Proc. IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Seattle, 1994.<br />
[WEL94] Welch, G., Bishop, G., "An Introduction to the Kalman Filter", UNC-CH Computer Science Technical Report 95-041, 1995.<br />