2D image mosaic building 2D3 - Ifremer
Underwater Systems Department<br />
A.G. Allais<br />
27/09/2006 – DOP/CM/SM/PRAO/06.224<br />
Project Exocet/D<br />
Deliverable N° 2D3<br />
Report on image mosaic building<br />
Distribution:<br />
P.M. Sarradin DOP/CB/EEP/LEP<br />
M. Perrier DOP/CM/SM/PRAO<br />
Confidential<br />
Restricted<br />
Public
Date : 27/09/2006<br />
Reference : DOP/CM/SM/PRAO/06.224<br />
Analytic N° : E010403A1<br />
Contract N° :<br />
Subject/Title :<br />
Abstract :<br />
Key-words :<br />
Number of pages : 16<br />
Number of figures :<br />
Number of annex :<br />
Project Exocet/D<br />
Deliverable N° 2D3<br />
Report on image mosaic building<br />
Revisions<br />
File name : 2D3.doc<br />
Writer : A.G. Allais<br />
Grade Object Date Written by Checked by Approved by<br />
1.0 Creation 14/12/05 A.G. Allais M. Perrier M. Perrier<br />
THIS DOCUMENT, PROPERTY OF IFREMER, MAY NOT BE REPRODUCED OR COMMUNICATED WITHOUT ITS AUTHORIZATION
Project Exocet/D page 3/16<br />
TABLE OF CONTENTS<br />
1. INTRODUCTION ..............................................................................................................................4<br />
2. WHAT IS A GEO-REFERENCED VIDEO MOSAIC?......................................................................4<br />
3. MOSAICING ALGORITHMS............................................................................................................5<br />
3.1. KLT algorithm ...........................................................................................................5<br />
3.1.1. Principle ............................................................................................................5<br />
3.1.2. Algorithm description ........................................................................................6<br />
3.1.2.1. Detection of point features............................................................................6<br />
3.1.2.2. Tracking of features ......................................................................................6<br />
3.1.2.3. Global displacement computation by least square method ..........................7<br />
3.2. RMR algorithm..........................................................................................................7<br />
3.2.1. Principle ............................................................................................................7<br />
3.2.2. Algorithm description ........................................................................................8<br />
3.2.2.1. Model of motion ............................................................................................8<br />
3.2.2.2. Robust estimation .........................................................................................8<br />
3.3. Metric conversion - camera self-calibration ..............................................................9<br />
3.3.1. Extraction and matching of points...................................................................10<br />
3.3.2. Scene geometry..............................................................................................10<br />
3.3.3. Points validation..............................................................................................10<br />
3.3.4. Intrinsic parameters estimation.......................................................................11<br />
3.3.5. Extrinsic parameters estimation......................................................................11<br />
3.4. Experiments............................................................................................................11<br />
3.5. Fusion with navigation data ....................................................................................12<br />
4. MATISSE SOFTWARE® .............................................................................................................. 14<br />
4.1. General architecture ...............................................................................................14<br />
4.2. User interface .........................................................................................................15<br />
5. CONCLUSION .............................................................................................................................. 15<br />
6. BIBLIOGRAPHY ........................................................................................................................... 16<br />
1. INTRODUCTION<br />
In this report, we address the issue of managing seabed video records. During sea trials, scientists record large amounts of video that they need to analyse the ocean floor. As storage capacity increases, the number of video records grows with every campaign, leading scientists to spend more and more time analysing them. The aim of video mosaicing is therefore to provide a tool that builds a map whose extent is far larger than the camera field of view, giving the scientist a global view of the scene while at the same time reducing and compressing the required video storage.<br />
2. WHAT IS A GEO-REFERENCED VIDEO MOSAIC?<br />
In order to simplify the scientists' exploitation of the numerous video DVDs resulting from sea trials, we have developed a tool that provides a larger view of the seabed than the restricted field of view of a video camera. The aim of video mosaicing is to build images whose extent is far larger than a single snapshot of a video recording. The resulting image represents a larger area of the seabed and is called a mosaic.<br />
The principle used to build mosaics is quite simple. The acquired video stream can be seen as a succession of images that largely overlap. The idea is to estimate the part that has been added from one image to the next; the new part of the current image is then added and merged with the previous image. Every N images, a mosaic is completed and a new one can begin. This step is performed by image processing techniques.<br />
The other main issue of video mosaicing is to locate the mosaics on the seabed, so that scientists can combine them with other geo-referenced data such as bathymetry, samples, and physical or chemical data. This can be done in two ways. In the simplest case, the operator gives the position, heading and altitude of the first point, and the mosaic location is then propagated by image processing. This is not the best approach, however, since image processing errors accumulate through the whole process. To overcome this drawback, the second approach consists in merging navigation data with the image displacements.<br />
The whole process is developed in the following part and is summarized in the sketch hereafter (Figure 1).<br />
Figure 1: Illustration of video mosaicing (Image1, Image2, Image3, …: one mosaic built from 3 images, geo-referenced at (X0,Y0); then two mosaics built from 3 images each, geo-referenced at (X0,Y0) and (X1,Y1))<br />

3. MOSAICING ALGORITHMS<br />
Many image processing techniques allow us to calculate the geometric relationship between two images. Hereafter, two different methods have been investigated and integrated into the MATISSE Software®, which has been developed at IFREMER. The first relies on a feature tracking algorithm (KLT), while the second, the RMR method, estimates the movement with a robust optical flow algorithm. Both algorithms estimate a displacement in pixels between two successive images. In order to provide the mosaics with metric dimensions, this displacement needs to be converted into meters; this step is performed using the camera parameters and the altitude provided by navigation.<br />
3.1. KLT algorithm<br />
3.1.1. Principle<br />
This algorithm is based upon research by Kanade, Lucas and Tomasi (KLT). It consists in detecting and tracking point features through an image stream [SHI94]. Points are selected if they meet a criterion characterizing a locally textured area, and are then tracked through the sequence. Once a list of matched points is obtained from successive images, a global displacement is computed to register the successive images and build a mosaic.<br />
3.1.2. Algorithm description<br />
3.1.2.1. Detection of point features<br />
According to the KLT algorithm, a feature is selected if it is easily "trackable" through an image stream. The features are therefore small windows, several pixels on a side, which are locally textured. More precisely, a window is selected if its mean gradient is high enough and has no single dominant direction.<br />
Mathematically, we consider the gradient vector within a window of typically 7x7 pixels:<br />

$$ g = \begin{bmatrix} g_x \\ g_y \end{bmatrix} $$

We note $Z$ the following matrix:<br />

$$ Z = \iint_W \begin{bmatrix} g_x^2 & g_x g_y \\ g_x g_y & g_y^2 \end{bmatrix} \cdot \omega \cdot dA $$

Where:<br />
• $g_x$ is the image intensity gradient along the x-axis,<br />
• $g_y$ is the image intensity gradient along the y-axis,<br />
• $W$ is the computation window,<br />
• $\omega$ is a weight function,<br />
• $dA$ is the area element of the computation window.<br />
This method relies on the fact that the eigenvalues of $Z$ are directly linked to the texture of the area over which they are computed. To give the eigenvalues a simple meaning: two small eigenvalues indicate a non-textured area, whereas one high and one small eigenvalue are characteristic of an area with a single dominant direction. In our case, only windows that are textured but have no particular direction must be selected, that is to say, areas where both eigenvalues are high. So, within an image, small areas (7x7 pixels, for instance) are selected only if both eigenvalues are greater than a given threshold.<br />
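As an illustration, this selection criterion can be sketched as follows (a minimal, unoptimized Python/NumPy sketch; the function name `klt_feature_score` and the uniform weight $\omega = 1$ are our own choices, not part of the original KLT implementation):

```python
import numpy as np

def klt_feature_score(image, window=7):
    """Smaller eigenvalue of the matrix Z for each window centre.

    A window is a good feature when both eigenvalues of Z are large,
    which amounts to thresholding min(lambda1, lambda2).
    """
    image = image.astype(float)
    gy, gx = np.gradient(image)          # intensity gradients along y and x
    half = window // 2
    h, w = image.shape
    score = np.zeros((h, w))
    for r in range(half, h - half):
        for c in range(half, w - half):
            wx = gx[r - half:r + half + 1, c - half:c + half + 1]
            wy = gy[r - half:r + half + 1, c - half:c + half + 1]
            # Entries of Z integrated over the window (uniform weight omega = 1).
            zxx, zyy, zxy = (wx * wx).sum(), (wy * wy).sum(), (wx * wy).sum()
            # Closed-form smaller eigenvalue of the 2x2 symmetric matrix Z.
            score[r, c] = 0.5 * (zxx + zyy - np.hypot(zxx - zyy, 2.0 * zxy))
    return score
```

Thresholding this score map (with non-maximum suppression, not shown) yields the selected features.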
3.1.2.2. Tracking of features<br />
The features selected in the first part of the algorithm are then tracked through the video stream or image sequence. The displacement between two successive images is assumed to be quite small, since the camera moves slowly. Thus, nearly all the points of an image I are also present in the next image J, and the two are linked by a translation vector $\mathbf{d}$.<br />
Let us now state the relationship linking two images at two moments. Let $I(x - \xi, y - \eta, t)$ be the image at time $t$ and $I(x, y, t + \tau)$ the image at time $t + \tau$.<br />
Put $J(\mathbf{x}) = I(x, y, t + \tau)$ and $I(\mathbf{x} - \mathbf{d}) = I(x - \xi, y - \eta, t)$, where $\mathbf{d} = (\xi, \eta)$ is the displacement vector of the point $\mathbf{x} = (x, y)$ between the two time steps $t$ and $t + \tau$.<br />
We can note that $J(\mathbf{x}) = I(\mathbf{x} - \mathbf{d}) + n(\mathbf{x})$, where $n(\mathbf{x})$ represents the noise.<br />
For each window $W$, $\mathbf{d}$ is obtained by minimizing $\iint_W [n(\mathbf{x})]^2 \cdot \omega \cdot dA$. That leads to solving the following equation:<br />

$$ G \mathbf{d} = \mathbf{e} $$

With:<br />

$$ G = \iint_W g g^T \cdot \omega \cdot dA \qquad \mathbf{e} = \iint_W (I - J) \cdot g \cdot \omega \cdot dA $$

This equation is solved for each selected window. So, for each window selected in the first image, we can compute the local displacement between the first image and the second one.<br />
The steps of point detection and tracking of the KLT algorithm are illustrated in Figure 2.<br />
Figure 2: Detection and tracking of points in a sequence of coral reef<br />
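The tracking step, solving $G\mathbf{d} = \mathbf{e}$ for one window, can be sketched as follows (a simplified Python/NumPy illustration with a uniform weight $\omega = 1$; the name `lk_window_displacement` is hypothetical, and a real tracker would iterate this solve, typically coarse-to-fine):

```python
import numpy as np

def lk_window_displacement(I, J, top, left, window=7):
    """One tracking step: displacement d of a window between images I and J.

    Solves G d = e with G = sum(g g^T) and e = sum((I - J) g) over the
    window, assuming the inter-frame displacement is small.
    """
    I = np.asarray(I, dtype=float)
    J = np.asarray(J, dtype=float)
    sl = (slice(top, top + window), slice(left, left + window))
    gy, gx = np.gradient(I)                          # intensity gradients of I
    g = np.stack([gx[sl].ravel(), gy[sl].ravel()])   # 2 x N gradient vectors
    G = g @ g.T                                      # 2 x 2 normal matrix
    e = g @ (I[sl] - J[sl]).ravel()                  # right-hand side
    return np.linalg.solve(G, e)                     # d = (dx, dy)
```

Note that $G$ is exactly the matrix $Z$ of the detection step, which is why well-textured windows (both eigenvalues large) are also the ones whose displacement is reliably solvable.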
3.1.2.3. Global displacement computation by the least square method<br />
Once the points are matched between two successive images, a global displacement is computed in order to register the images.<br />
The displacement is modelled as a 4-parameter rigid global 2D transformation, that is, a transformation composed of a translation, a rotation and a scale factor.<br />
This global displacement is computed by the iterative least square method; each iteration refines the result. The method is complemented by an acceptance criterion used to validate the matches and discard false matches, in order to make the computation more robust.<br />
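The 4-parameter least-squares fit can be illustrated as follows (a Python/NumPy sketch of the plain least-squares solve only; the iteration and the acceptance criterion that rejects false matches are omitted, and the name `fit_similarity` is ours):

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares 4-parameter rigid 2D transform (translation, rotation, scale).

    Model: x' = a*x - b*y + tx,  y' = b*x + a*y + ty,
    with a = s*cos(theta) and b = s*sin(theta).
    src, dst: (N, 2) arrays of matched points; returns (a, b, tx, ty).
    """
    x, y = src[:, 0], src[:, 1]
    one, zero = np.ones_like(x), np.zeros_like(x)
    # Two linear equations per match, stacked into one design matrix.
    M = np.vstack([np.column_stack([x, -y, one, zero]),
                   np.column_stack([y, x, zero, one])])
    rhs = np.concatenate([dst[:, 0], dst[:, 1]])
    params, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return params
```

Parameterizing the rotation and scale jointly as $(a, b)$ keeps the problem linear, so each iteration is a single `lstsq` call on the currently accepted matches.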
3.2. RMR algorithm<br />
3.2.1. Principle<br />
The second method we have investigated to build <strong>mosaic</strong>s is the Robust Multi-Resolution<br />
(RMR) method which is based upon the estimation of the optical flow [ODO95]. The<br />
advantage of this method is that the motion is estimated from the whole <strong>image</strong>.<br />
In the RMR algorithm, the first stage consists in choosing a motion model. The aim is then to estimate the parameters of this model using classic robust estimation methods from the image and signal processing domain. This step is combined with a coarse-to-fine estimation using multi-resolution levels of images.<br />
3.2.2. Algorithm description<br />
3.2.2.1. Model of motion<br />
In the first step of the algorithm, a motion model is chosen. In the RMR algorithm, we consider the class of 2D polynomial motion models and deal only with the 2D affine model, which is not too complex yet representative of a large class of motion transformations:<br />

$$ \begin{cases} u(X_i) = a_1 + a_2 x_i + a_3 y_i \\ v(X_i) = a_4 + a_5 x_i + a_6 y_i \end{cases} $$

With matrix notation, it can be stated as:<br />

$$ V(X_i) = \begin{bmatrix} u(X_i) \\ v(X_i) \end{bmatrix} = B(X_i)\, A $$

Where $A^T = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 & a_5 & a_6 \end{pmatrix}$ and<br />

$$ B_i = B(X_i) = \begin{bmatrix} 1 & x_i & y_i & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & x_i & y_i \end{bmatrix} $$

For each point $X_i$, one can write the optical flow constraint equation linking spatial and temporal intensity gradients:<br />

$$ V(X_i) \cdot \nabla I(X_i) + I_t(X_i) = 0, \quad \text{i.e.} \quad I_x(X_i)\, u(X_i) + I_y(X_i)\, v(X_i) + I_t(X_i) = 0 $$

where $\nabla I(X_i)$ is the spatial gradient vector of the intensity, $I_t(X_i)$ is the partial derivative of the intensity with respect to time, and $V(X_i)$ is the velocity field.<br />
3.2.2.2. Robust estimation<br />
The goal of robust estimation is to find the parameter vector $\Theta$ which best fits the model $M(X_i, \Theta)$ to the observations $y_i$. In our case, $\Theta = (A^T, 0)^T$.<br />
The estimation of the parameter $\Theta$ is achieved by a maximum likelihood estimator:<br />

$$ \hat{\Theta} = \arg\min_{\Theta} \sum_i \rho\left( y_i - M(X_i, \Theta) \right) $$

where $\rho$ is called the M-estimator and corresponds to the maximum likelihood estimation.<br />
This estimation is performed at all the scales of a multi-resolution image pyramid, so that the estimation is refined at each scale.<br />
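A minimal sketch of such a robust estimation at a single resolution level, using iteratively reweighted least squares with Tukey's biweight as the M-estimator (the function name, the MAD-based scale estimate and the iteration count are our own choices, not the exact RMR implementation):

```python
import numpy as np

def robust_affine_motion(X, Ix, Iy, It, iters=20, c=4.685):
    """IRLS estimate of the 2D affine motion parameters A = (a1..a6).

    Residual per point: r_i = Ix*u(X_i) + Iy*v(X_i) + It  (flow constraint),
    with u = a1 + a2*x + a3*y and v = a4 + a5*x + a6*y.
    X: (N, 2) point coordinates; Ix, Iy, It: (N,) intensity gradients.
    """
    x, y = X[:, 0], X[:, 1]
    D = np.column_stack([Ix, Ix * x, Ix * y, Iy, Iy * x, Iy * y])
    w = np.ones(len(x))
    for _ in range(iters):
        sw = np.sqrt(w)
        # Weighted least squares: minimize sum_i w_i * r_i^2.
        A, *_ = np.linalg.lstsq(D * sw[:, None], -(It * sw), rcond=None)
        r = D @ A + It
        scale = 1.4826 * np.median(np.abs(r)) + 1e-12   # robust scale (MAD)
        u = np.clip(np.abs(r) / (c * scale), 0.0, 1.0)
        w = (1.0 - u ** 2) ** 2                         # Tukey biweight weights
    return A
```

In a full coarse-to-fine scheme, the estimate from a coarser pyramid level initializes this loop at the next finer level.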
3.3. Metric conversion - camera self-calibration<br />
The displacement calculated by image processing is given in pixels. The aim of video mosaicing is to provide the mosaics with metric dimensions, in order to make quantitative measurements within the images. Thus, the displacement and the image size in pixels must be converted into meters (see Figure 3).<br />
Figure 3: Link between one pixel and its real metric size on the seabed (camera with focal length f at altitude h above the seabed plane; one pixel of size L in the image plane corresponds to a length l on the seabed)<br />
Thus we have the relationship:<br />

$$ l\,(\mathrm{m \cdot pix^{-1}}) = \frac{SizeOfAPixel\,(\mathrm{m \cdot pix^{-1}})}{f\,(\mathrm{m})} \cdot h\,(\mathrm{m}) $$

Given that $SizeOfAPixel\,(\mathrm{m \cdot pix^{-1}}) = \dfrac{f\,(\mathrm{m})}{\alpha\,(\mathrm{pix})}$, we deduce that:<br />

$$ l\,(\mathrm{m \cdot pix^{-1}}) = \frac{h\,(\mathrm{m})}{\alpha\,(\mathrm{pix})} $$

where $\alpha$ is an intrinsic parameter of the camera which has to be estimated and $h$ is the altitude of the camera.<br />
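In code, the conversion is a one-liner (a sketch; `alpha_pix` stands for the intrinsic parameter $\alpha$ in pixels, i.e. the focal length divided by the physical pixel size, and the function names are ours):

```python
def pixel_size_on_seabed(altitude_m, alpha_pix):
    """Metric size l of one pixel on the seabed: l = h / alpha."""
    return altitude_m / alpha_pix

def displacement_to_meters(d_pix, altitude_m, alpha_pix):
    """Convert a displacement computed by image processing from pixels to meters."""
    return d_pix * pixel_size_on_seabed(altitude_m, alpha_pix)
```

For example, with $\alpha = 800$ pixels and an altitude of 4 m, one pixel covers 5 mm on the seabed, so a 100-pixel displacement corresponds to 0.5 m.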
Since it is not always possible to deploy a calibration pattern on the seabed, an algorithm for camera self-calibration has been investigated [PES03]. The self-calibration algorithm allows us to determine the intrinsic and extrinsic parameters of a vertical camera mounted on an underwater vehicle. This method needs only a sequence of a few images of the seabed and is based upon the determination of the epipolar geometry between two successive images of the sequence.<br />
The diagram below presents all the steps of the self-calibration method:<br />
Deliverable N° <strong>2D</strong>3<br />
Report on <strong>image</strong> <strong>mosaic</strong> <strong>building</strong><br />
DOP/CM/SM/PRAO/06.224<br />
Grade : 1.0 27/09/2006
Figure 4: Steps of the camera self-calibration method (images of the observed scene → matching points → scene geometry estimation → point matches validation → intrinsic parameters estimation → extrinsic parameters estimation)<br />

3.3.1. Extraction and matching of points<br />
In our application, image sequences are composed of small displacements between two successive images. The extraction and matching of points are carried out by the KLT algorithm detailed in paragraph 3.1, which extracts features in the first image and tracks them across the sequence. Despite the complexity of underwater images, a great number of features are successfully tracked through the sequence; moreover, the points are well distributed in the image.<br />
3.3.2. Scene geometry<br />
The scene geometry can be represented algebraically by the fundamental matrix $F$. The fundamental matrix links the coordinates $q_i$ and $q'_i$ of a same 3D point $Q$ in two images:<br />

$$ q_i'^{T} F q_i = 0 \qquad \forall i \in [1, n] $$

The estimation of the fundamental matrix is based on Hartley's normalized 8-point algorithm and uses the points detected and tracked by the KLT algorithm. This algorithm requires a set of at least eight matched points $q_i \leftrightarrow q'_i$. In order to increase the estimation accuracy, a set of about 30 to 50 points can be used, but using more points leads to the presence of false matches which perturb the estimation of $F$. To compensate, a criterion has been integrated to validate the points and remove false matches.<br />
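A compact sketch of Hartley's normalized 8-point estimation (Python/NumPy; the normalization follows the usual convention of centring the points and scaling their mean distance to $\sqrt{2}$, and details such as the returned scale of $F$ are our own choices):

```python
import numpy as np

def normalized_eight_point(q, qp):
    """Estimate F from matched points q (image 1) and qp (image 2), N >= 8.

    Returns F (unit Frobenius norm) such that qp_h^T F q_h ~ 0
    for homogeneous coordinates of the matches.
    """
    def normalize(pts):
        # Translate to the centroid, scale mean distance to sqrt(2).
        centroid = pts.mean(axis=0)
        s = np.sqrt(2) / np.linalg.norm(pts - centroid, axis=1).mean()
        T = np.array([[s, 0, -s * centroid[0]],
                      [0, s, -s * centroid[1]],
                      [0, 0, 1.0]])
        ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
        return ph, T

    xh, T1 = normalize(q)
    xph, T2 = normalize(qp)
    # Each match gives one row of the homogeneous system A f = 0.
    A = np.column_stack([xph[:, 0:1] * xh, xph[:, 1:2] * xh, xh])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce rank 2: a fundamental matrix is singular.
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt
    F = T2.T @ F @ T1            # undo the normalization
    return F / np.linalg.norm(F)
```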
3.3.3. Points validation<br />
The validation of features is carried out by an algorithm based on RANSAC (RANdom SAmple Consensus). The selection of point matches is based on how accurately they satisfy the equation representing the scene geometry (see 3.3.2): the epipolar constraint is evaluated for all the matched points while estimating the fundamental matrix, which yields a matching error for each pair. A list of good features is thus constituted, and only the "best" matches are kept to estimate the final fundamental matrix.<br />
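The principle can be sketched with a generic RANSAC skeleton (Python/NumPy). To keep the example self-contained it is demonstrated on a toy line-fitting model rather than on the fundamental matrix; the structure — minimal random samples, consensus counting, final refit on the inliers — is the same:

```python
import numpy as np

def ransac(data, fit, error, n_min, n_iter=200, tol=1.0, rng=None):
    """Generic RANSAC skeleton: fit models on random minimal samples, keep
    the one with the largest consensus set, then refit on all its inliers."""
    if rng is None:
        rng = np.random.default_rng(0)
    best = np.zeros(len(data), dtype=bool)
    for _ in range(n_iter):
        sample = rng.choice(len(data), size=n_min, replace=False)
        inliers = error(fit(data[sample]), data) < tol
        if inliers.sum() > best.sum():
            best = inliers
    return fit(data[best]), best

# Toy model for illustration: a 2D line y = a*x + b fitted to (x, y) rows.
def fit_line(d):
    sol, *_ = np.linalg.lstsq(np.column_stack([d[:, 0], np.ones(len(d))]),
                              d[:, 1], rcond=None)
    return sol

def line_error(m, d):
    return np.abs(d[:, 1] - (m[0] * d[:, 0] + m[1]))
```

For the fundamental matrix, `fit` would be the 8-point estimation on a minimal sample and `error` the per-match epipolar residual.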
3.3.4. Intrinsic parameters estimation<br />
There are five intrinsic parameters: the focal distance $f$, the scale factors $k_u$ and $k_v$ along the image axes $u$ and $v$, and the coordinates $u_0$ and $v_0$ of the principal point of the image. The intrinsic parameters estimation is carried out using the Mendonça and Cipolla algorithm [4], applied to a set of five images taken at given intervals from a dense sequence. This algorithm is based on the minimization of a cost function which takes the intrinsic parameters as arguments and the fundamental matrices as parameters. The cost function is:<br />

$$ C(K) = \sum_{i=1}^{n} \sum_{j>i}^{n} w_{ij} \, \frac{\sigma_{ij}^{1} - \sigma_{ij}^{2}}{\sigma_{ij}^{2}} $$

With:<br />
• $K = (\alpha_u, \alpha_v, u_0, v_0)$, where $\alpha_u$ and $\alpha_v$ correspond to the products of the scale factors along the axes $u$ and $v$ by the focal length, and $u_0$, $v_0$ to the coordinates of the intersection of the optical axis with the image plane,<br />
• $w_{ij}$ is the degree of confidence of the estimation of the fundamental matrix $F_{ij}$,<br />
• $\sigma_{ij}^{1} > \sigma_{ij}^{2}$ are the non-zero singular values of the essential matrix $E_{ij}$.<br />
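This cost can be written directly in code (a sketch that only evaluates $C(K)$ for given fundamental matrices; the actual minimization over $K$, e.g. by a simplex or gradient method, is not shown, and the function name is ours):

```python
import numpy as np

def self_calibration_cost(K, F_list, w_list):
    """Cost of the Mendonca-Cipolla type: for the true K, every essential
    matrix E_ij = K^T F_ij K has two equal non-zero singular values, so
    the weighted relative gap (sigma1 - sigma2) / sigma2 vanishes."""
    cost = 0.0
    for F, w in zip(F_list, w_list):
        E = K.T @ F @ K
        s = np.linalg.svd(E, compute_uv=False)   # s[0] >= s[1] >= s[2]
        cost += w * (s[0] - s[1]) / s[1]
    return cost
```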
3.3.5. Extrinsic parameters estimation<br />
The extrinsic parameters are composed of the rotations and translations of the camera around the three axes (twelve parameters).<br />
The extrinsic parameters estimation represents the last step of the self-calibration algorithm. It is a function of the intrinsic parameters and of the fundamental matrix $F$:<br />

$$ E = [t]_{\times} R = K^{T} F K $$

With:<br />
• $[t]_{\times}$: the antisymmetric matrix associated with the translation vector $t$,<br />
• $R$: the rotation matrix,<br />
• $K$: the intrinsic parameters matrix.<br />
The algorithm first determines the translation $t$. Afterwards, the rotation matrix is estimated by minimizing:<br />

$$ \sum_{i=1}^{3} \left\| E_i - [t]_{\times i} R \right\|^2 $$

where $E_i$ and $[t]_{\times i}$ are the i-th row vectors of the matrices $E$ and $[t]_{\times}$.<br />
3.4. Experiments<br />
A statistical comparative study of this camera self-calibration method is presented in [PES03]. Studies with simulated data have shown that some trajectories of the underwater vehicle are better suited than others to the estimation of the intrinsic parameters of the camera.<br />
The results obtained with real data show that a rotation around the optical axis, combined with roll and pitch angles, allows all the intrinsic parameters of the camera to be estimated concurrently with good accuracy. The table below presents errors expressed as a percentage of the parameter values estimated for this type of movement.<br />
εαu % εαv % εu0 % εv0 %<br />
θz + (θx, θy) 2.06 % 2.06 % 1.65 % 1.19 %<br />
Table 1: Errors in intrinsic parameters estimations<br />
θz: rotation around the optical axis, θx: roll angle, θy: pitch angle<br />
3.5. Fusion with navigation data<br />
When the displacement is computed by image processing and the real geographic size is provided thanks to the altitude measurement and the camera parameters, one could think this is enough to obtain mosaics well located on the seabed. In fact, image processing techniques introduce errors that accumulate during the mosaicing process. In order to reduce these errors and obtain an accurate geo-referencing, navigation data can be fused with the displacement given by image processing. That is why, to correct the displacements calculated through an image sequence, we have introduced a Kalman filter [WEL94], which is well suited to the problem of estimating the variables of a dynamic system (one that varies with time). In dynamic systems, the system variables are denoted by the term "state variables".<br />
The question addressed by the Kalman filter is: "Given our knowledge of the behaviour of the system, and given our measurements, what is the best estimate of the state variables?"<br />
Mathematically, the aim of the Kalman filter is to estimate a posteriori a state vector $\hat{x}_k$ from the a priori estimate $\hat{x}_k^-$ and a weighted difference between the measurement $z_k$ at time $k$ and the prediction $H_k \hat{x}_k^-$, where the subscript $k$ denotes the time step. Thus, at time $k$, the measurement $z_k$ is known, and the a posteriori estimate $\hat{x}_k$ and the a priori estimate $\hat{x}_{k+1}^-$ must be calculated.<br />
The two equations hereafter link the state vector and the measurement vector:<br />

$$ x_{k+1} = A_k x_k + B u_k + w_k $$
$$ z_k = H_k x_k + v_k $$
This yields the two sets of equations of the classic formulation of Kalman filtering. The first are the time update equations, which predict the state vector $\hat{x}_{k+1}^-$, while the second are the measurement update equations, which correct the state vector $\hat{x}_k$.<br />
Time update equations ("predict"):<br />

$$ \hat{x}_{k+1}^- = A_k \hat{x}_k + B u_k $$
$$ P_{k+1}^- = A_k P_k A_k^T + Q_k $$

Measurement update equations ("correct"):<br />

$$ K_k = P_k^- H_k^T \left( H_k P_k^- H_k^T + R_k \right)^{-1} $$
$$ \hat{x}_k = \hat{x}_k^- + K_k \left( z_k - H_k \hat{x}_k^- \right) $$
$$ P_k = \left( I - K_k H_k \right) P_k^- $$
In these formulae, the variables stand for:<br />
• $\hat{x}_k^-$: a priori state vector at time k (given the process before step k),<br />
• $\hat{x}_k$: a posteriori state vector at time k (given the measurement at time k),<br />
• $z_k$: measurement vector at time k,<br />
• $u_k$: vector of the control entry,<br />
• $A_k$: matrix linking the states at times k and k+1,<br />
• $B$: matrix linking the control entry to the state vector,<br />
• $K_k$: Kalman gain matrix,<br />
• $P_k^-$: covariance matrix of the prediction error,<br />
• $P_k$: covariance matrix of the a posteriori error,<br />
• $H_k$: matrix linking the state to the measurement,<br />
• $w_k$: process noise, assumed to be white and Gaussian,<br />
• $v_k$: measurement noise, assumed to be white and Gaussian,<br />
• $R_k$: covariance matrix of the measurement noise,<br />
• $Q_k$: covariance matrix of the process noise.<br />
In the case of video mosaicing, the state variables are the position (X_utm and Y_utm) and a term to correct the pixel size. The measurement vector consists of X_utm and Y_utm given by the navigation system.<br />
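One predict/correct cycle of such a filter can be sketched as follows (Python/NumPy; the function signature and the arrangement of the predict and correct phases into one combined step are our own choices — in MATISSE the state, $H$ and the noise covariances would be instantiated from the mosaicing quantities described above):

```python
import numpy as np

def kalman_step(x_prev, P_prev, z, A, B, u, H, Q, R):
    """One predict/correct cycle of the Kalman filter.

    Returns the a posteriori state estimate and error covariance."""
    # Time update ("predict"): propagate state and covariance.
    x_pred = A @ x_prev + B @ u
    P_pred = A @ P_prev @ A.T + Q
    # Measurement update ("correct"): blend prediction and measurement.
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    x_post = x_pred + K @ (z - H @ x_pred)
    P_post = (np.eye(len(x_prev)) - K @ H) @ P_pred
    return x_post, P_post
```

The gain $K$ automatically balances the two sources: a small measurement covariance $R$ pulls the estimate toward the navigation fix, a large one leaves it near the image-processing prediction.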
The experiment we have conducted consists in following a route with the underwater vehicle and then retracing the same route in the opposite direction. We can notice in Figure 5 that the mosaic drifts if navigation is not used in the algorithm, whereas when dead-reckoning navigation is used in the Kalman filter, the mosaic drift is well corrected.<br />
(a) (b)<br />
Figure 5: Mosaics obtained without using navigation (a), using Kalman filtering (b)<br />
4. MATISSE SOFTWARE®<br />
4.1. General architecture<br />
The MATISSE Software® [ALL04] has been developed to integrate all the algorithms of geo-referenced mosaicing. It is flexible and can be used with many underwater vehicles, provided they are equipped with a down-facing camera and a continuous navigation system (dead-reckoning navigation).<br />
In Figure 6, the diagram represents the conditions of use of the MATISSE Software® with a ROV. The video stream and the navigation data are transferred up to the surface via the umbilical tether and processed in-line to produce geo-referenced mosaics. An option consists of recording the video stream on a video DVD and the navigation data as messages on a CD; playing back video and navigation data together makes off-line mosaic building possible.<br />
Figure 6: MATISSE Software® used with a ROV (the down-looking camera and the navigation system feed MATISSE at the surface for real-time creation of geo-referenced video mosaics)<br />
4.2. User interface<br />
MATISSE Software® provides a user-friendly interface, presented in Figure 7. The main window (black background) allows the user to control and check the mosaic processing. On the left, the MATISSE data architecture is displayed; in the bottom left-hand corner, the video stream is visualized; and the upper part of the interface is dedicated to specific menus and predefined sets of parameters for mosaic creation.<br />
The MATISSE outputs consist of geo-referenced images (TIFF images and TFW geo-referencing files) and of network messages sent when a mosaic is created. These messages can be used, for example, by a GIS in order to integrate the geo-referenced mosaics on-line with other geo-referenced data in a dedicated environment.<br />
Figure 7: MATISSE software® interface<br />

5. CONCLUSION<br />
In this report, we have detailed methods to build geo-referenced mosaics. This work has resulted in the development of user-friendly software which is used at IFREMER with the ROV victor6000 and has been tested with other underwater vehicles.<br />
6. BIBLIOGRAPHY<br />
[ALL04] Allais, A.G., Borgetto, M., Opderbecke, J., Pessel, N., Rigaud, V., "Seabed video mosaicking with MATISSE: a technical overview and cruise results", Proc. of the 14th International Offshore and Polar Engineering Conference, ISOPE-2004, vol. 2, pp. 417-421, Toulon, France, May 23-28, 2004.<br />
[ODO95] Odobez, J.M., Bouthémy, P., "Robust Multiresolution Estimation of Parametric Motion Models", Journal of Visual Communication and Image Representation, Vol. 6, No. 4, pp. 348-365, Dec. 1995.<br />
[PES03] Pessel, N., Opderbecke, J., Aldon, M.J., "An Experimental Study of a Robust Self-Calibration Method for a Single Camera", Proc. of the 3rd International Symposium on Image and Signal Processing and Analysis, ISPA 2003, sponsored by IEEE and EURASIP, Rome, Italy, September 18-20, 2003.<br />
[SHI94] Shi, J., Tomasi, C., "Good features to track", Proc. IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Seattle, 1994.<br />
[WEL94] Welch, G., Bishop, G., "An Introduction to the Kalman Filter", UNC-CH Computer Science Technical Report 95-041, 1995.<br />