Multiple Sensor Multiple Object Tracking With GMPHD Filter - ISIF
<strong>Multiple Sensor Multiple Object Tracking With GMPHD Filter</strong><br />
Nam Trung Pham, Weimin Huang<br />
Institute for Infocomm Research, Singapore<br />
Email: {stuntp,wmhuang}@i2r.a-star.edu.sg<br />
S. H. Ong<br />
Department of Electrical and Computer Engineering<br />
National University of Singapore<br />
Email: eleongsh@nus.edu.sg<br />
Abstract: Tracking objects with multiple sensors is more efficient than tracking them with a single sensor. In this paper, we propose a method to fuse data from multiple sensors in the Gaussian mixture probability hypothesis density (GMPHD) filter. This method avoids the data association problem in multi-sensor multi-object tracking. Moreover, it is more reliable and less computationally expensive than the particle probability hypothesis density filter for multi-sensor multi-object tracking. We demonstrate the efficiency of the approach in two applications: bearing and range tracking, and multiple speaker tracking.<br />
Keywords: Random finite set, Gaussian mixture probability hypothesis density, bearing and range tracking, speaker tracking.<br />
I. INTRODUCTION<br />
Multi-sensor multi-object tracking has received much attention in recent years. In a multi-sensor tracking system, data fusion techniques combine data from multiple sensors to obtain the state estimates of objects. The performance of the tracking system can be improved by fusing data from multiple sensors [1], [2]. However, the multi-sensor multi-object tracking problem is challenging: the number of objects varies over time, and the data association between observations and objects is complex.<br />
Many data fusion approaches have been developed for multi-sensor multi-object tracking in recent years. The two main approaches are sensor-level fusion and feature-level fusion, which correspond to two levels of data association. In the sensor-level fusion approach, the observations received at each sensor are used to track objects at that sensor. The resulting tracks are then associated and fused to obtain the state estimates, using methods such as the interacting multiple model [3], joint probabilistic data association [4], and multiple hypothesis tracking [5]. The sensor-level fusion approach has been employed for tracking in [1], [2], [3], [6]. The second approach is feature-level fusion. In this approach, all observations from the multiple sensors are sent to a fusion center, which associates these observations with objects to obtain the state estimates. This approach has been used for multi-sensor multi-object tracking in [7], [8]. However, methods based on either of these two approaches are computationally intensive because they have to solve the data association problem.<br />
Recently, random set approaches have opened a new direction for multi-sensor multi-object tracking. Here, the states of objects are represented as random sets. With this model, the birth and death of objects can be described in the tracking algorithm. Moreover, measurements and false alarms are also represented as random sets in the observation model. Mahler [9] employed the random set framework to propose the probability hypothesis density (PHD) filter. This method avoids the data association between observations and objects. Several implementations of the PHD filter have been proposed using the sequential Monte Carlo (SMC) method [10], [11]. In particular, the implementation in [10] comes with a convergence proof and is called the particle PHD filter. In these implementations, the state estimates are extracted from the particles representing the posterior intensity by using clustering techniques. Vo [12] proposed a closed-form solution of the PHD filter under linear Gaussian assumptions, called the GMPHD filter. This method requires much less computation than the particle PHD filter.<br />
For multi-sensor multi-object tracking, there are several ways to fuse data from multiple sensors in random set approaches, such as multiplying the likelihood functions of the sensors [13] or sequential sensor updating [9], [14]. These methods can track a varying number of objects with multiple sensors. However, they are implemented with sequential Monte Carlo methods, so they require a lot of computation.<br />
In this paper, we propose a method for multi-sensor multi-object tracking based on the GMPHD filter. We extend the GMPHD filter from one sensor to multiple sensors. To fuse the data from multiple sensors, we choose the sequential sensor updating of [9], [14]. Our method combines information from multiple sensors and avoids the data association between observations and objects. In addition, we apply our method to bearing and range tracking and to multiple speaker tracking. For bearing and range tracking, we show that our method can fuse data from multiple sensors to obtain better performance than tracking multiple objects with one sensor. For multiple speaker tracking, our method requires much less computation than methods that use data association or the particle PHD filter, such as [13], [14] and [15]. Moreover, our method gives reliable estimates of the speaker positions.<br />
The paper is organized as follows. In Section II, we formulate the multi-sensor multi-object tracking problem in the random finite set model. In Section III, the PHD filter approach is reviewed. In Section IV, we extend the GMPHD filter from one sensor to multiple sensors and discuss implementation issues. Finally, experimental results for bearing and range tracking and for multiple speaker tracking are presented in Section V.<br />
II. PROBLEM FORMULATION<br />
The multi-sensor multi-object tracking problem can be modelled in the random finite set (RFS) framework. Let $\mathcal{X}$ be the single-object state space. The multi-object state at time $k$ is then represented by $X_k = \{x_{k,1}, x_{k,2}, \ldots, x_{k,N_k}\} \in \mathcal{F}(\mathcal{X})$, where $\mathcal{F}(\mathcal{X})$ denotes the collection of all finite subsets of the space $\mathcal{X}$. For a multi-object state $X_{k-1}$ at time $k-1$, each $x_{k-1} \in X_{k-1}$ either continues to exist at time $k$ with probability $p_{S,k}$ or dies at time $k$ with probability $(1 - p_{S,k})$. Let $S_k(x_{k-1})$ denote the RFS of the object that evolves from $x_{k-1}$ to time $k$, conditional on its survival, and let $B_{k|k-1}(x_{k-1})$ be the RFS of objects spawned at time $k$ from an object with previous state $x_{k-1}$. Let $\Gamma_k$ be the RFS of spontaneous births at time $k$, which is determined by the assumed spontaneous birth model. Given a multi-object state $X_{k-1}$ at time $k-1$, the multi-object state $X_k$ at time $k$ is given by the union of the surviving objects and the new objects,<br />
$$X_k = \Big[\bigcup_{x_{k-1} \in X_{k-1}} S_k(x_{k-1})\Big] \cup \Big[\bigcup_{x_{k-1} \in X_{k-1}} B_{k|k-1}(x_{k-1})\Big] \cup \Gamma_k \tag{1}$$<br />
The RFS $X_k$ encapsulates all aspects of the multi-object tracking problem, such as the time-varying number of objects and the object motion.<br />
Similarly, let $\mathcal{Z}^i$ be the single-object measurement space at sensor $i$. The set of measurements collected at sensor $i$ at time $k$ is then $Z_k^i \in \mathcal{F}(\mathcal{Z}^i)$. A given object state $x_k \in X_k$ is either detected with probability $p_D$ or missed with probability $(1 - p_D)$. Conditional on detection, the measurement from $x_k$ at sensor $i$ is defined by the RFS $\Theta_k^i(x_k)$. Sensor $i$ can also receive a set of clutter measurements $C_k^i$. So, given a multi-object state $X_k$ at time $k$, the measurement set from sensor $i$ at time $k$ is formed by the union of the object-generated measurements and the clutter,<br />
$$Z_k^i = \Big[\bigcup_{x_k \in X_k} \Theta_k^i(x_k)\Big] \cup C_k^i \tag{2}$$<br />
Assuming that we have $Q$ sensors, the RFS of measurements at time $k$ is modelled by<br />
$$Z_k = \big[Z_k^1, Z_k^2, \ldots, Z_k^Q\big] \tag{3}$$<br />
The RFS $Z_k$ encapsulates all sensor characteristics, such as measurement noise, sensor field of view, and clutter.<br />
The multi-sensor multi-object tracking problem can be posed as follows: given the set of measurements $Z_{1:k}$ collected from the sensors up to time $k$, find the estimate $\hat{X}_k$ as the expectation or the maximizer of the posterior density function $p(X_k | Z_{1:k})$.<br />
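As an illustration of the measurement model in (2) and (3), the following minimal sketch draws one measurement set per sensor. It is not part of the original experiments: the function names, the additive Gaussian position noise, the uniform clutter region and all numerical values are assumptions chosen only to make the example runnable.<br />

```python
import math
import random

def poisson(rng, lam):
    """Draw a Poisson(lam) sample with Knuth's multiplication method."""
    threshold, count, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def sensor_measurements(rng, states, p_d, measure, clutter_rate, sample_clutter):
    """One draw of Z_k^i as in Eq. (2); finite sets are stored as Python lists."""
    # Each object is detected independently with probability p_d.
    detections = [measure(rng, x) for x in states if rng.random() < p_d]
    # The number of clutter points is Poisson distributed (assumed clutter model).
    clutter = [sample_clutter(rng) for _ in range(poisson(rng, clutter_rate))]
    return detections + clutter

def all_sensor_measurements(rng, states, sensors):
    """Z_k = [Z_k^1, ..., Z_k^Q] as in Eq. (3): one measurement set per sensor."""
    return [sensor_measurements(rng, states, **cfg) for cfg in sensors]

def noisy_position(rng, x, std=0.1):
    """Hypothetical sensor: position observed with additive Gaussian noise."""
    return (x[0] + rng.gauss(0.0, std), x[1] + rng.gauss(0.0, std))

def uniform_clutter(rng, half_width=20.0):
    """Hypothetical clutter: uniform over a square region."""
    return (rng.uniform(-half_width, half_width), rng.uniform(-half_width, half_width))

rng = random.Random(0)
X_k = [(10.0, 5.0), (-3.0, 7.0)]  # two hypothetical object states
sensor_cfg = dict(p_d=0.98, measure=noisy_position,
                  clutter_rate=2.0, sample_clutter=uniform_clutter)
Z_k = all_sensor_measurements(rng, X_k, [sensor_cfg, sensor_cfg])  # Q = 2
print([len(Z) for Z in Z_k])  # number of measurements received by each sensor
```

The measurements in each list carry no label saying which object (or clutter source) produced them, which is why a conventional tracker needs a data association step.<br />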
III. PROBABILITY HYPOTHESIS DENSITY APPROACH<br />
In the multiple object tracking problem, we usually need to obtain the posterior density $p(X_k | Z_{1:k})$. When the number of objects increases, the multiple object state space becomes large, so it is difficult to obtain the posterior density function. Fortunately, this density function can be approximately recovered from the probability hypothesis density (PHD) [9]. The PHD is defined as follows. For a random finite set $X$ on $\mathcal{X}$ with probability distribution $P$, the PHD is the density $v(x)$ such that for each region $S \subseteq \mathcal{X}$, the integral of $v$ over the region $S$ gives the expected number of elements of $X$ that are in $S$,<br />
$$\int |X \cap S| \, P(dX) = \int_S v(x)\, dx. \tag{4}$$<br />
Thus, instead of estimating the states of objects from the posterior density, we can estimate them by locating the peaks of the PHD. This reduces the search from the multiple object state space to the single object state space.<br />
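The property in (4) can be checked numerically for a one-dimensional Gaussian mixture intensity. The sketch below is only an illustration: the helper names and the two mixture components are hypothetical. It integrates the intensity over a region in closed form using the Gaussian cumulative distribution function.<br />

```python
import math

def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a scalar Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def expected_count(components, lo, hi):
    """Integral of v(x) over [lo, hi] for v(x) = sum_i w_i N(x; m_i, s_i^2)."""
    return sum(w * (gaussian_cdf(hi, m, s) - gaussian_cdf(lo, m, s))
               for w, m, s in components)

# Hypothetical intensity: (weight, mean, standard deviation) per component.
intensity = [(1.0, -5.0, 1.0), (0.9, 5.0, 1.0)]

total = expected_count(intensity, -math.inf, math.inf)  # expected number overall
left = expected_count(intensity, -math.inf, 0.0)        # expected number with x < 0
print(total, left)
```

Over the whole space the integral equals the sum of the weights (here 1.9), and over the half-line x < 0 it is close to 1, the weight of the component located there.<br />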
IV. GAUSSIAN MIXTURE PROBABILITY HYPOTHESIS DENSITY FILTER IN MULTI-SENSOR MULTI-OBJECT TRACKING<br />
A. Assumptions<br />
We first state the assumptions. The transition function of each object follows a linear Gaussian model, i.e.,<br />
$$f_{k|k-1}(x | \zeta) = N(x; F_{k-1}\zeta, Q_{k-1}) \tag{5}$$<br />
where $N(\cdot; m, P)$ denotes a Gaussian density with mean $m$ and covariance $P$, $F_{k-1}$ is the state transition matrix, and $Q_{k-1}$ is the process noise covariance. There are $Q$ sensors, and the likelihood function at each sensor is also a linear Gaussian model, i.e.,<br />
$$g_k^i(z | x) = N(z; H_k^i x, R_k^i) \tag{6}$$<br />
where $H_k^i$ is the observation matrix of sensor $i$ and $R_k^i$ is the observation noise covariance of sensor $i$. The survival and detection probabilities are state independent,<br />
$$p_{S,k}(x) = p_{S,k} \tag{7}$$<br />
$$p_{D,k}(x) = p_{D,k} \tag{8}$$<br />
The intensity of the spontaneous birth RFS is<br />
$$\gamma_k(x) = \sum_{i=1}^{J_{\gamma,k}} w_{\gamma,k}^{(i)} N(x; m_{\gamma,k}^{(i)}, P_{\gamma,k}^{(i)}) \tag{9}$$<br />
where $J_{\gamma,k}$ is the number of birth Gaussian components at time $k$, and $w_{\gamma,k}^{(i)}$ is the weight of the $i$-th Gaussian component. The posterior intensity at time $k-1$ is a Gaussian mixture of the form<br />
$$v_{k-1}(x) = \sum_{i=1}^{J_{k-1}} w_{k-1}^{(i)} N(x; m_{k-1}^{(i)}, P_{k-1}^{(i)}) \tag{10}$$<br />
where $J_{k-1}$ is the number of Gaussian components of the posterior intensity at time $k-1$, and $w_{k-1}^{(i)}$ is the weight of the $i$-th Gaussian component.<br />
B. GMPHD filter with one sensor<br />
Vo [12] proposed a closed-form expression of the PHD filter for linear Gaussian multi-object tracking, called the Gaussian mixture probability hypothesis density filter. Under the assumptions in IV-A, if the initial prior intensity is a Gaussian mixture, the posterior intensity at any subsequent time step is also a Gaussian mixture. This method can be employed when there is one sensor.<br />
In what follows, we outline the GMPHD filter under the assumption that there are no spawning objects. (When there are spawning objects, the prediction equation is modified by adding Gaussian components that represent the spawned objects. The details are in [12].)<br />
Under the assumptions in IV-A, the predicted intensity to time $k$ is given by<br />
$$v_{k|k-1}(x) = v_{S,k|k-1}(x) + \gamma_k(x) \tag{11}$$<br />
where<br />
$$v_{S,k|k-1}(x) = p_{S,k} \sum_{j=1}^{J_{k-1}} w_{k-1}^{(j)} N(x; m_{S,k|k-1}^{(j)}, P_{S,k|k-1}^{(j)}),$$<br />
$$m_{S,k|k-1}^{(j)} = F_{k-1} m_{k-1}^{(j)},$$<br />
$$P_{S,k|k-1}^{(j)} = Q_{k-1} + F_{k-1} P_{k-1}^{(j)} F_{k-1}^T.$$<br />
Because $v_{S,k|k-1}(x)$ and $\gamma_k(x)$ are Gaussian mixtures, $v_{k|k-1}(x)$ can be expressed as a Gaussian mixture of the form<br />
$$v_{k|k-1}(x) = \sum_{i=1}^{J_{k|k-1}} w_{k|k-1}^{(i)} N(x; m_{k|k-1}^{(i)}, P_{k|k-1}^{(i)}) \tag{12}$$<br />
Then, the posterior intensity at time $k$ is also a Gaussian mixture, and is given by<br />
$$v_k(x) = (1 - p_{D,k}) v_{k|k-1}(x) + \sum_{z \in Z_k} v_{D,k}(x; z) \tag{13}$$<br />
where<br />
$$v_{D,k}(x; z) = \sum_{j=1}^{J_{k|k-1}} w_k^{(j)}(z) N(x; m_{k|k}^{(j)}, P_{k|k}^{(j)}),$$<br />
$$w_k^{(j)}(z) = \frac{p_{D,k}\, w_{k|k-1}^{(j)}\, q_k^{(j)}(z)}{\kappa_k(z) + p_{D,k} \sum_{l=1}^{J_{k|k-1}} w_{k|k-1}^{(l)} q_k^{(l)}(z)},$$<br />
$$q_k^{(j)}(z) = N(z; H_k m_{k|k-1}^{(j)}, R_k + H_k P_{k|k-1}^{(j)} H_k^T),$$<br />
$$m_{k|k}^{(j)} = m_{k|k-1}^{(j)} + K_k^{(j)} (z - H_k m_{k|k-1}^{(j)}),$$<br />
$$P_{k|k}^{(j)} = [I - K_k^{(j)} H_k] P_{k|k-1}^{(j)},$$<br />
$$K_k^{(j)} = P_{k|k-1}^{(j)} H_k^T (H_k P_{k|k-1}^{(j)} H_k^T + R_k)^{-1}.$$<br />
Here $\kappa_k(z)$ denotes the intensity of the clutter RFS.<br />
C. GMPHD filter with multi-sensor<br />
When there are multiple sensors, we propose to apply the GMPHD filter update sequentially at each sensor. The algorithm is described as follows.<br />
With the assumptions in IV-A, at time $k-1$ we have<br />
$$v_{k-1}(x) = \sum_{i=1}^{J_{k-1}} w_{k-1}^{(i)} N(x; m_{k-1}^{(i)}, P_{k-1}^{(i)})$$<br />
First, we use the state equation (5), the measurement equation (6) and $v_{k-1}(x)$ to predict the intensity $v_{k|k-1}^1(x)$ at sensor 1 by using equation (11). Then we update $v_{k|k-1}^1(x)$ with the measurement set $Z_k^1$ by equation (13) to obtain the PHD at time $k$ and sensor 1, $v_k^1(x)$. Because $v_{k-1}(x)$ is a Gaussian mixture, $v_k^1(x)$ is also a Gaussian mixture and has the form<br />
$$v_k^1(x) = \sum_{i=1}^{J_k^1} w_{1,k}^{(i)} N(x; m_{1,k}^{(i)}, P_{1,k}^{(i)}) \tag{14}$$<br />
Now, at sensor 2, we use $v_k^1(x)$ as the predicted PHD for sensor 2 and, in a similar way to (13), we have<br />
$$v_k^2(x) = (1 - p_{D,k}) v_k^1(x) + \sum_{z \in Z_k^2} v_{D,k}(x; z) \tag{15}$$<br />
where $v_{D,k}(x; z)$ is computed as in (13) from the components of $v_k^1(x)$ and the measurement model of sensor 2. So, $v_k^2(x)$ also has the Gaussian mixture form<br />
$$v_k^2(x) = \sum_{i=1}^{J_k^2} w_{2,k}^{(i)} N(x; m_{2,k}^{(i)}, P_{2,k}^{(i)}) \tag{16}$$<br />
We repeat this process for all $Q$ sensors. At the $Q$-th sensor, we obtain $v_k^Q(x)$, which has the form<br />
$$v_k^Q(x) = \sum_{i=1}^{J_k^Q} w_{Q,k}^{(i)} N(x; m_{Q,k}^{(i)}, P_{Q,k}^{(i)}) \tag{17}$$<br />
The PHD of the multi-sensor multi-object posterior density is then<br />
$$v_k(x) = v_k^Q(x) \tag{18}$$<br />
The number of objects is estimated by<br />
$$\hat{N}_{k|k} = \int v_k(x)\, dx = \int \sum_{i=1}^{J_k^Q} w_{Q,k}^{(i)} N(x; m_{Q,k}^{(i)}, P_{Q,k}^{(i)})\, dx = \sum_{i=1}^{J_k^Q} w_{Q,k}^{(i)} \tag{19}$$<br />
So, the properties of the Gaussian mixture in the multi-sensor case are similar to those in the one-sensor case. This means that in the multi-sensor multi-object tracking problem, under the assumptions in IV-A, if the initial prior intensity is a Gaussian mixture, the posterior intensity for the asynchronous sensor fusion method at any subsequent time step is also a Gaussian mixture.<br />
D. Implementation issues<br />
The state estimates of objects are the means of the Gaussian components that have high weights (above 0.5) in $v_k(x)$. This estimation method is more efficient than that of the particle PHD filter, because in the particle PHD filter we first obtain the number of objects $\hat{N}_{k|k}$ and then partition the particles into $\hat{N}_{k|k}$ clusters. If $\hat{N}_{k|k}$ is not correct, the tracking performance is affected.<br />
Now, we investigate the number of Gaussian components in $v_k(x)$. At the first sensor, the number of Gaussian components is<br />
$$J_k^1 = (J_{k-1} + J_{\gamma,k})(1 + |Z_k^1|) \tag{20}$$<br />
At the second sensor, the number of Gaussian components is<br />
$$J_k^2 = J_k^1 (1 + |Z_k^2|) = (J_{k-1} + J_{\gamma,k})(1 + |Z_k^1|)(1 + |Z_k^2|) \tag{21}$$<br />
So, the number of Gaussian components in $v_k(x)$ is<br />
$$J_k = J_k^Q = (J_{k-1} + J_{\gamma,k})(1 + |Z_k^1|) \cdots (1 + |Z_k^Q|) \tag{22}$$<br />
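The sequential update of (11) to (19) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: a mixture is stored as a list of (weight, mean, covariance) tuples, the clutter intensity is taken as a constant, pruning is omitted, and the function names and toy numbers are hypothetical.<br />

```python
import numpy as np

def gauss_pdf(z, mean, cov):
    """Evaluate the multivariate normal density N(z; mean, cov)."""
    d = z - mean
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov))
    return float(np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm)

def predict(mix, F, Qn, p_s, births):
    """Eq. (11): surviving components propagated through (F, Qn) plus births."""
    survived = [(p_s * w, F @ m, Qn + F @ P @ F.T) for w, m, P in mix]
    return survived + list(births)

def update(mix, Z, H, R, p_d, clutter):
    """Eq. (13): missed-detection terms plus one updated copy per measurement."""
    out = [((1.0 - p_d) * w, m, P) for w, m, P in mix]
    for z in Z:
        terms = []
        for w, m, P in mix:
            S = H @ P @ H.T + R                # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
            q = gauss_pdf(z, H @ m, S)
            terms.append((p_d * w * q, m + K @ (z - H @ m),
                          (np.eye(len(m)) - K @ H) @ P))
        total = clutter + sum(t[0] for t in terms)   # denominator of w_k(z)
        out += [(w / total, m, P) for w, m, P in terms]
    return out

def multi_sensor_step(mix, F, Qn, p_s, births, sensors):
    """Predict once, then apply the update sensor by sensor, Eqs. (14)-(18)."""
    mix = predict(mix, F, Qn, p_s, births)
    for Z, H, R, p_d, clutter in sensors:
        mix = update(mix, Z, H, R, p_d, clutter)
    return mix

# Toy example: state [position, velocity], both sensors observe the position.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
Qn = 0.01 * np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
prior = [(1.0, np.array([0.0, 1.0]), np.eye(2))]    # J_{k-1} = 1
births = [(0.1, np.array([10.0, 0.0]), np.eye(2))]  # J_{gamma,k} = 1
sensors = [
    ([np.array([1.0]), np.array([9.5])], H, R, 0.98, 1e-3),  # |Z_k^1| = 2
    ([np.array([1.2])], H, R, 0.98, 1e-3),                   # |Z_k^2| = 1
]
posterior = multi_sensor_step(prior, F, Qn, 0.99, births, sensors)
n_hat = sum(w for w, _, _ in posterior)        # Eq. (19)
best = max(posterior, key=lambda c: c[0])      # heaviest component
print(len(posterior), n_hat, best[1])
```

In this toy run the posterior has (1 + 1)(1 + 2)(1 + 1) = 12 components, in agreement with (22), which shows how quickly the mixture grows when no pruning is applied.<br />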
The number of Gaussian components in the GMPHD filter with multiple sensors grows rapidly over time, which leads to a high computational load. So, at each time step, the number of Gaussian components has to be reduced. The following rules are used [12]: Gaussian components with small weights are discarded, Gaussian components that are close together are merged into one Gaussian, and, if the number of Gaussian components exceeds a threshold $L$, only the $L$ Gaussian components with the highest weights are propagated to the next iteration.<br />
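The three reduction rules can be sketched as below. The sketch follows our reading of the pruning procedure in [12]; the function name, the Mahalanobis-type closeness test and the moment-matched merge are assumptions of this illustration, and the default parameter values are placeholders.<br />

```python
import numpy as np

def prune_and_merge(mix, T=1e-5, U=4.0, J_max=100):
    """Reduce a Gaussian mixture given as a list of (weight, mean, covariance)."""
    remaining = [c for c in mix if c[0] > T]       # 1) drop negligible weights
    merged = []
    while remaining:
        # 2) take the heaviest component and absorb every component close to it
        _, m_ref, _ = max(remaining, key=lambda c: c[0])
        near, far = [], []
        for w, m, P in remaining:
            d = m - m_ref
            (near if d @ np.linalg.solve(P, d) <= U else far).append((w, m, P))
        w_sum = sum(w for w, _, _ in near)
        m_new = sum(w * m for w, m, _ in near) / w_sum
        # merged covariance includes the spread of the absorbed means
        P_new = sum(w * (P + np.outer(m_new - m, m_new - m))
                    for w, m, P in near) / w_sum
        merged.append((w_sum, m_new, P_new))
        remaining = far
    # 3) keep at most J_max components, heaviest first
    merged.sort(key=lambda c: c[0], reverse=True)
    return merged[:J_max]

mix = [
    (0.6, np.array([0.0, 0.0]), np.eye(2)),
    (0.3, np.array([0.1, 0.0]), np.eye(2)),     # close to the first component
    (0.5, np.array([10.0, 10.0]), np.eye(2)),
    (1e-7, np.array([50.0, 50.0]), np.eye(2)),  # negligible weight
]
reduced = prune_and_merge(mix)
print([(round(w, 3), m) for w, m, _ in reduced])
```

Merging preserves the total weight of the components that survive the pruning step, so the estimated number of objects in (19) changes only by the discarded weights.<br />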
V. EXPERIMENTAL RESULTS<br />
A. Gaussian mixture probability hypothesis density filter with multi-sensor for bearing and range tracking<br />
First, we consider a bearing and range tracking application to demonstrate the effectiveness of the GMPHD filter with multiple sensors. Objects appear and disappear at different times. Each object has the survival probability $p_{S,k} = 0.99$ and follows a nonlinear nearly constant turn model [12] in which the target state takes the form $x_k = [y_k^T, \omega_k]^T$, where $y_k = [p_{x,k}, p_{y,k}, \dot{p}_{x,k}, \dot{p}_{y,k}]^T$ comprises the position $(x, y)$ and the velocity of the object in each dimension, and $\omega_k$ is the turn rate. The state dynamic equations are given by<br />
$$y_k = F(\omega_{k-1})\, y_{k-1} + G w_{k-1}, \qquad \omega_k = \omega_{k-1} + \Delta u_{k-1}, \tag{23}$$<br />
where $\Delta = 1$ s, $w_k \sim N(\cdot; 0, \sigma_w^2 I_2)$, $\sigma_w = 15$ m/s$^2$, $u_k \sim N(\cdot; 0, \sigma_u^2)$, $\sigma_u = \pi/180$ rad/s, and<br />
$$F(\omega) = \begin{bmatrix} 1 & 0 & \dfrac{\sin \omega\Delta}{\omega} & -\dfrac{1 - \cos \omega\Delta}{\omega} \\ 0 & 1 & \dfrac{1 - \cos \omega\Delta}{\omega} & \dfrac{\sin \omega\Delta}{\omega} \\ 0 & 0 & \cos \omega\Delta & -\sin \omega\Delta \\ 0 & 0 & \sin \omega\Delta & \cos \omega\Delta \end{bmatrix}, \qquad G = \begin{bmatrix} \dfrac{\Delta^2}{2} & 0 \\ 0 & \dfrac{\Delta^2}{2} \\ \Delta & 0 \\ 0 & \Delta \end{bmatrix}$$<br />
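The constant turn model of (23) can be sketched as follows for the state ordering [p_x, p_y, v_x, v_y]. The function names are hypothetical, and the special case for a turn rate close to zero (where the matrix entries are replaced by their limits) is an addition of this sketch, not something stated in the text.<br />

```python
import numpy as np

def turn_matrices(omega, dt=1.0):
    """F(omega) and G of Eq. (23) for the state ordering [p_x, p_y, v_x, v_y]."""
    s, c = np.sin(omega * dt), np.cos(omega * dt)
    if abs(omega) < 1e-9:
        a, b = dt, 0.0   # limits of sin(w dt)/w and (1 - cos(w dt))/w as w -> 0
    else:
        a, b = s / omega, (1.0 - c) / omega
    F = np.array([[1.0, 0.0, a, -b],
                  [0.0, 1.0, b, a],
                  [0.0, 0.0, c, -s],
                  [0.0, 0.0, s, c]])
    G = np.array([[dt ** 2 / 2.0, 0.0],
                  [0.0, dt ** 2 / 2.0],
                  [dt, 0.0],
                  [0.0, dt]])
    return F, G

def step(y, omega, rng, sigma_w=15.0, sigma_u=np.pi / 180.0, dt=1.0):
    """One draw from the state dynamics of Eq. (23)."""
    F, G = turn_matrices(omega, dt)
    y_next = F @ y + G @ rng.normal(0.0, sigma_w, size=2)
    omega_next = omega + dt * rng.normal(0.0, sigma_u)
    return y_next, omega_next

rng = np.random.default_rng(0)
y, omega = np.array([0.0, 0.0, 10.0, 0.0]), np.pi / 180.0
for _ in range(5):
    y, omega = step(y, omega, rng)
print(y, omega)
```

Without process noise, the velocity block of F is a pure rotation by the angle omega times Delta, so the speed of the object is preserved from one step to the next.<br />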
We assume no spawning and that the spontaneous birth RFS is Poisson with intensity<br />
$$\gamma_k(x) = 0.1\, N(x; m_\gamma, P_\gamma)$$<br />
where<br />
$$m_\gamma = [0, 0, 2000, 0, 0]^T, \qquad P_\gamma = \mathrm{diag}([2500, 2500, 2500, 2500, (6\pi/180)^2]^T).$$<br />
Fig. 1. Position $(x, y)$ of targets with measurements from sensor 1<br />
Each object has a probability of detection $p_{D,k} = 0.98$. Observations consist of bearing and range measurements from 2 sensors. The positions of the sensors are<br />
$$p_s^1 = [0, 0] \tag{24}$$<br />
$$p_s^2 = [1000, 1000] \tag{25}$$<br />
The observation model at sensor $i$ is given by<br />
$$z_k^i = \begin{bmatrix} \arctan\left(\dfrac{p_{x,k} - p_{s,x}^i}{p_{y,k} - p_{s,y}^i}\right) \\ \sqrt{(p_{x,k} - p_{s,x}^i)^2 + (p_{y,k} - p_{s,y}^i)^2} \end{bmatrix} + \varepsilon_k, \tag{26}$$<br />
where $\varepsilon_k \sim N(\cdot; 0, R_k)$ with $R_k = \mathrm{diag}([\sigma_\theta^2, \sigma_r^2]^T)$, $\sigma_\theta = \pi/30$ rad/s and $\sigma_r = 10$ m. The clutter RFS follows the uniform Poisson model over the surveillance region $[-\pi/2, \pi/2]$ rad $\times$ $[0, 3000]$ m, with $\lambda_c = 1.1 \times 10^{-3}$ (rad m)$^{-1}$ (i.e., an average of 10 clutter returns over the surveillance region). The pruning parameters of the GMPHD filters are the truncation threshold $T = 10^{-5}$, the merging threshold $U = 4$, and the maximum number of Gaussian components $J_{max} = 100$. (More details on these parameters are in [12].)<br />
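The noise-free part of the observation model (26) and the clutter rate quoted above can be checked with a short sketch. The function name and the target position are hypothetical, and `atan2(dx, dy)` is used in place of the arctangent of the ratio; the two agree whenever the target lies in front of the sensor (dy > 0).<br />

```python
import math

def bearing_range(p, sensor):
    """Noise-free part of Eq. (26): bearing from the y-axis and range."""
    dx, dy = p[0] - sensor[0], p[1] - sensor[1]
    return math.atan2(dx, dy), math.hypot(dx, dy)

sensors = [(0.0, 0.0), (1000.0, 1000.0)]   # sensor positions, Eqs. (24)-(25)
target = (500.0, 1500.0)                   # hypothetical target position
z = [bearing_range(target, s) for s in sensors]
print(z)

# Expected number of clutter returns: intensity times the area of the
# surveillance region [-pi/2, pi/2] rad x [0, 3000] m.
expected_clutter = 1.1e-3 * math.pi * 3000.0
print(expected_clutter)
```

The same target yields a different bearing and range at each sensor, and the product of the clutter intensity and the region area is about 10.4, consistent with the average of 10 clutter returns stated in the text.<br />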
Figures 1 and 2 show the position estimates obtained with measurements from sensor 1 and sensor 2, respectively. Because of the high clutter and high noise, there are some errors in the filter outputs. Figure 3 shows the position estimates obtained with the multi-sensor GMPHD method. The fused result outperforms the results obtained with one sensor. This is because the result from sensor 1 serves as a good prediction for sensor 2. Thus, the information from both sensors is combined to give the state estimates.<br />
Fig. 2. Position $(x, y)$ of targets with measurements from sensor 2<br />
Fig. 3. Position $(x, y)$ of targets with fusion method<br />
B. Gaussian mixture probability hypothesis density filter for multiple speaker tracking<br />
Second, we tested the GMPHD filter in multiple speaker tracking. We simulated an acoustic room to test the performance of the GMPHD filter in tracking multiple speakers. The dimensions of the room are 3 m $\times$ 3 m $\times$ 2.5 m. There are four microphone pairs, each with an inter-sensor spacing of 0.5 m. The speaker sources are all female. The acoustic image method [16] was used to simulate the room impulse responses. The reverberation time of the room impulse responses is about $T_{60} = 0.15$ s. The speech signal-to-noise ratio is about 20 dB. There are 60 frames. The time frames used for measuring the TDOA are 256 ms long and non-overlapping. There are two speakers, who appear and disappear at different times.<br />
Let $x_k$ be the state of a speaker at time $k$. Here, the state is the position $(x, y)$ of the speaker. We assume that the motion is described by<br />
$$x_k = A x_{k-1} + w_k \tag{27}$$<br />
where $A = I$ and $w_k \sim N([0, 0], \mathrm{diag}([0.01, 0.01]))$. This means that a speaker moves about 10 cm on average from time $k-1$ to time $k$. Given a speaker state $x_k$, the time delay of arrival (TDOA) measurement $z_k^q$ is measured from the $q$-th microphone pair at time $k$. The measurement equation is<br />
$$z_k^q = T_q(x_k) + v_k^q, \quad q = 1, \ldots, Q \tag{28}$$<br />
$$T_q(x_k) = \frac{\|x_k - p_{2,q}\| - \|x_k - p_{1,q}\|}{c} \tag{29}$$<br />
where $p_{i,q}$ is the position of microphone $i$ of pair $q$, $c$ is the speed of sound, and $v_k^q \sim N(0, 4 \times 10^{-9})$ is uncorrelated noise. Because the measurement equation (28) is nonlinear, we approximate it within the GMPHD filter by using the unscented transform [12]. Each speaker has a probability of survival $p_{S,k} = 0.95$ and a probability of detection $p_{D,k} = 0.7$ at time $k$. To extract the TDOA measurements for multiple speakers, we applied the method from [13]. Figure 4 shows an example of the TDOA measurements collected at one microphone pair (microphone pair 2).<br />
Fig. 4. GCC TDOA measurements<br />
Figures 5 and 6 show the multi-speaker tracking performance of the particle PHD filter [14]. Because the clustering technique is unreliable, the state estimates are degraded. Figures 7 and 8 show the multi-speaker tracking performance of our method, which is better than that of the particle PHD filter. Most of the time when the two persons speak simultaneously, our method gives reliable estimates. This is because the GMPHD filter does not depend on clustering techniques: the state estimates are extracted from the means of the Gaussian components that have high weights.<br />
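The TDOA function of (29) can be sketched as follows. The microphone coordinates and the speed of sound (343 m/s) are assumptions of this illustration; only the 0.5 m spacing and the noise variance come from the text.<br />

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for c

def tdoa(x, mic1, mic2, c=SPEED_OF_SOUND):
    """Noise-free TDOA of Eq. (29) for a speaker at x and one microphone pair."""
    return (math.dist(x, mic2) - math.dist(x, mic1)) / c

# Hypothetical microphone pair on one wall, 0.5 m apart.
mic1, mic2 = (1.25, 0.0), (1.75, 0.0)

centered = tdoa((1.5, 2.0), mic1, mic2)   # speaker equidistant from both microphones
offset = tdoa((1.25, 2.0), mic1, mic2)    # speaker directly in front of microphone 1
noise_std = math.sqrt(4e-9)               # standard deviation of v_k^q in Eq. (28)
print(centered, offset, noise_std)
```

By the triangle inequality the magnitude of the TDOA is bounded by the microphone spacing divided by c, so for a 0.5 m pair all valid measurements lie within about 1.5 ms of zero.<br />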
The above result is the performance for one trial. To measure the average performance, we used the performance measures from [13]: the probability of a correct speaker number, the expected absolute error on the number of speakers, and the conditional mean distance error based on the Wasserstein distance. The probability of a correct speaker number is defined by<br />
$$P(|\hat{X}_k| = |X_k|) \tag{30}$$<br />
where $\hat{X}_k$ is the estimate of the multi-speaker state and $X_k$ is the ground truth. The expected absolute error on the number of speakers is<br />
$$E\big(\big|\, |\hat{X}_k| - |X_k| \,\big|\big) \tag{31}$$<br />
Fig. 5. Number of speakers by particle PHD filter<br />
Fig. 6. Position $(x, y)$ of speakers with particle PHD filter<br />
Fig. 7. Number of speakers by our method<br />
Fig. 8. Position $(x, y)$ of speakers by our method<br />
When $|\hat{X}_k| = |X_k|$, the Wasserstein distance between $\hat{X}_k$ and $X_k$ is defined as follows<br />
$$d(X_k, \hat{X}_k) = \inf_C \Bigg( \sum_{x_i \in X_k} \sum_{\hat{x}_j \in \hat{X}_k} C_{i,j}\, d(x_i, \hat{x}_j)^p \Bigg)^{1/p} \tag{32}$$<br />
where $C$ represents an $|\hat{X}_k| \times |X_k|$ matrix with entries $C_{i,j}$. The conditional mean distance error is defined as<br />
$$E\{d(X_k, \hat{X}_k) \mid \text{correct speaker number estimate}\} \tag{33}$$<br />
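For sets of equal size, the distance in (32) can be evaluated by brute force, as sketched below. The sketch assumes the standard definition in which $C$ is a transportation matrix with uniform marginals; under that assumption the optimum for $n$ points in each set is attained by a one-to-one assignment with weight $1/n$ per pair. The function name and the example points are hypothetical, and the enumeration of permutations is practical only for a handful of speakers.<br />

```python
import itertools
import math

def wasserstein_equal_size(X, X_hat, p=2):
    """Wasserstein distance between two finite sets with the same cardinality."""
    n = len(X)
    if n != len(X_hat):
        raise ValueError("sets must have the same number of elements")
    if n == 0:
        return 0.0
    # Try every one-to-one assignment of estimates to true states.
    best = min(
        sum(math.dist(x, X_hat[j]) ** p for x, j in zip(X, perm))
        for perm in itertools.permutations(range(n))
    )
    return (best / n) ** (1.0 / p)

X_true = [(0.0, 0.0), (1.0, 1.0)]   # hypothetical ground-truth positions
X_est = [(1.1, 1.0), (0.0, 0.1)]    # hypothetical estimates, listed in another order
d = wasserstein_equal_size(X_true, X_est)
print(d)
```

The result does not depend on the order in which the estimates are listed, which is the property needed to compare unlabeled sets of speaker positions.<br />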
We tested the performance with 500 trials. Each trial uses a new signal and a new TDOA measurement set. Figures 9 and 10 compare our method with the particle PHD filter in terms of the probability of a correct speaker number and the expected absolute error in the estimated number of speakers. Our method is more accurate than the particle PHD filter. The main errors of our method occur because the TDOA measurements are not reliable when two people speak at the same time. Figure 11 shows the conditional mean distance error of speaker tracking. The error of our method is more stable than that of the particle PHD filter.<br />
VI. CONCLUSIONS<br />
Multi-sensor multi-object tracking is a challenging problem. In this paper, we employed the GMPHD filter for multi-sensor multi-object tracking. Sequential sensor updating was used to fuse the data from multiple sensors in the GMPHD filter. We showed that our method works well with multiple sensors in bearing and range multi-object tracking. Moreover, we demonstrated that the GMPHD filter is more efficient than the particle PHD filter in multiple speaker tracking.<br />
VII. ACKNOWLEDGEMENT<br />
The authors would like to thank Prof. Ba Ngu Vo at Melbourne University for his help and fruitful discussions. This work is partially supported by EU project ASTRALS (FP6-IST-0028097).<br />
Fig. 9. Probability of correct speaker number<br />
Fig. 10. Absolute error on the number of speaker<br />
Fig. 11. Conditional mean distance error of multi-speaker tracking<br />
REFERENCES<br />
[1] G. Wang, R. Rabenstein, N. Strobel, and S. Spors, “Object localization by joint audio-video signal processing,” in Vision Modelling and Visualization, Germany, 2000.<br />
[2] P. J. Escamilla-Ambrosio and N. Lieven, “A multiple-sensor multiple-target tracking approach for the autotaxi system,” in IEEE Intelligent Vehicles Symposium, Italy, 2004.<br />
[3] Z. Ding and L. Hong, “Development of a distributed IMM algorithm for multi-platform multi-sensor tracking,” in International Conference on Multisensor Fusion and Integration for Intelligent Systems, USA, 1996.<br />
[4] Y. Bar-Shalom and T. E. Fortmann, Tracking and data association. San Diego: Academic Press, 1988.<br />
[5] D. Reid, “An algorithm for tracking multiple targets,” IEEE Transaction Automatic Control, vol. 24, no. 6, pp. 84–90, 1979.<br />
[6] K. Chang, C. Chong, and Y. Bar-Shalom, “Joint probabilistic data association in distributed sensor networks,” IEEE Transaction on Automatic Control, vol. 31, no. 10, 1986.<br />
[7] L. Y. Pao and S. D. O'Neil, “Multisensor fusion algorithms for tracking,” in Proceeding of American Control Conference, USA, 1993.<br />
[8] L. Chen, M. J. Wainwright, M. Cetin, and A. S. Willsky, “Multitarget-multisensor data association using the tree-reweighted max-product algorithm,” in SPIE AeroSense Conference, USA, 2003.<br />
[9] R. Mahler, “Multi-target Bayes filtering via first-order multi-target moments,” IEEE Trans. on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1152–1178, 2003.<br />
[10] B. N. Vo, S. Singh, and A. Doucet, “Sequential Monte Carlo methods for Bayesian multi-target filtering with random finite sets,” IEEE Trans. Aerospace and Electronic Systems, vol. 41, no. 4, pp. 1224–1245, Oct. 2005.<br />
[11] T. Zajic and R. Mahler, “A particle system implement of the PHD multitarget tracking filter,” in Signal Processing, Sensor Fusion and target Recognition XII, SPIE Proc, 2003, pp. 291–299.<br />
[12] B. N. Vo and W. K. Ma, “The Gaussian mixture probability hypothesis density filter,” IEEE Transaction Signal Processing, vol. 54, no. 11, 2006.<br />
[13] W. K. Ma, B. Vo, S. Singh, and A. Baddeley, “Tracking an unknown time-varying number of speakers using TDOA measurements: a random finite set approach,” IEEE Trans Signal Processing, vol. 54, no. 9, 2006.<br />
[14] B. N. Vo, S. Singh, and W. K. Ma, “Tracking multiple speakers with random sets,” in ICASSP, Montreal, Canada, 2004.<br />
[15] J. M. Valin, F. Michaud, and J. Rouat, “Robust 3D localization and tracking of sound sources using beamforming and particle filtering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP-06), France, 2006.<br />
[16] J. Allen and D. Berkley, “Image method for efficiently simulating small room acoustic,” Journal of the Acoustical Society of America, vol. 65, pp. 943–950, 1979.<br />