18.07.2014 Views

Multiple Sensor Multiple Object Tracking With GMPHD Filter - ISIF


Multiple Sensor Multiple Object Tracking With GMPHD Filter

Nam Trung Pham, Weimin Huang
Institute for Infocomm Research, Singapore
Email: {stuntp,wmhuang}@i2r.a-star.edu.sg

S. H. Ong
Department of Electrical and Computer Engineering
National University of Singapore
Email: eleongsh@nus.edu.sg

Abstract: Tracking objects using multiple sensors is more efficient than tracking with a single sensor. In this paper, we propose a method to fuse data from multiple sensors in the Gaussian mixture probability hypothesis density filter. This method avoids the data association problem in multi-sensor multi-object tracking. Moreover, it is more reliable and less computationally demanding than the particle probability hypothesis density filter for multi-sensor multi-object tracking. We demonstrate the efficiency of the approach in applications such as bearing and range tracking and multiple speaker tracking.

Keywords: Random finite set, Gaussian mixture probability hypothesis density, bearing and range tracking, speaker tracking.

I. INTRODUCTION

Multi-sensor multi-object tracking has received considerable attention in recent years. In a multi-sensor tracking system, data fusion techniques combine data from multiple sensors to obtain the state estimates of objects. The performance of the tracking system can be improved by fusing data from multiple sensors [1], [2]. However, the multi-sensor multi-object tracking problem is challenging: the number of objects varies over time, and the data association between observations and objects is complex.

Many data fusion approaches have been developed for multi-sensor multi-object tracking in recent years. The two main approaches are sensor-level fusion and feature-level fusion, which correspond to two levels of data association. In the sensor-level fusion approach, the observations at each sensor are used to track objects at that sensor. These tracks are then associated and fused to obtain the state estimates by using methods such as the interacting multiple model [3], joint probabilistic data association [4], and multiple hypothesis tracking [5]. Sensor-level fusion has been employed for tracking in [1], [2], [3], [6]. The second approach is feature-level fusion. In this approach, all observations from the multiple sensors are sent to a fusion center, which associates these observations with objects to obtain state estimates. This approach has been used for multi-sensor multi-object tracking in [7], [8]. However, methods based on these two approaches are computationally intensive because they have to solve the data association problem.

Recently, random set approaches have opened a new direction for multi-sensor multi-object tracking. Here, the states of objects are represented as random sets. Using this model, the birth and death of objects can be described in the tracking algorithm. Moreover, measurements and false alarms are also represented as random sets in the observation model. Mahler [9] employed the random set framework to propose the probability hypothesis density (PHD) filter, which avoids the data association between observations and objects. Some implementations of the PHD filter have been proposed using the sequential Monte Carlo (SMC) method [10], [11]. In particular, the implementation in [10] comes with a convergence proof and is called the particle PHD filter. In these implementations, the state estimates are extracted from the particles representing the posterior intensity by using clustering techniques. Vo [12] proposed a closed-form solution to the PHD filter under linear Gaussian assumptions, called the GMPHD filter. This method requires much less computation than the particle PHD filter. For multi-sensor multi-object tracking, there are some methods to fuse data from multiple sensors in random set approaches, such as multiplying the likelihood functions from the sensors [13] or sequential sensor updating [9], [14]. These methods can track a varying number of objects with multiple sensors. However, they are implemented using sequential Monte Carlo, so they are computationally demanding.

In this paper, we propose a method for multi-sensor multi-object tracking based on the GMPHD filter. We extend the GMPHD filter from one sensor to multiple sensors. To fuse data from the sensors, we use the sequential sensor updating of [9], [14]. Our method combines information from multiple sensors and avoids the data association between observations and objects. In addition, we apply our method to bearing and range tracking and to multiple speaker tracking. For bearing and range tracking, we show that our method can fuse data from multiple sensors to obtain better performance than tracking multiple objects with one sensor. For multiple speaker tracking, our method requires much less computation than methods using data association or the particle PHD filter, such as [13], [14] and [15]. Moreover, our method gives reliable estimates of speaker positions.

The paper is organized as follows. In Section II, we formulate the multi-sensor multi-object tracking problem in the random finite set model. In Section III, the PHD filter approach is reviewed. In Section IV, we extend the GMPHD filter from one sensor to multiple sensors and discuss implementation issues. Finally, experimental results in bearing and range tracking and in multiple speaker tracking are presented in Section V.

II. PROBLEM FORMULATION

The multi-sensor multi-object tracking problem can be modelled in the random finite set (RFS) framework. Let $\mathcal{X}$ be the single-object state space. The multi-object state at time $k$ is represented by $X_k = \{x_{k,1}, x_{k,2}, \ldots, x_{k,N_k}\} \in \mathcal{F}(\mathcal{X})$, where $\mathcal{F}(\mathcal{X})$ denotes the collection of all finite subsets of the space $\mathcal{X}$. For a multi-object state $X_{k-1}$ at time $k-1$, each $x_{k-1} \in X_{k-1}$ either continues to exist at time $k$ with probability $p_{S,k}$ or dies at time $k$ with probability $(1 - p_{S,k})$. Let $S_k(x_{k-1})$ denote the object at time $k$ that evolved from $x_{k-1}$, conditional on survival, and let $B_{k|k-1}(x_{k-1})$ be the objects spawned at time $k$ from an object with previous state $x_{k-1}$. Let $\Gamma_k$ be the RFS of spontaneous births at time $k$, which is determined by the assumed spontaneous birth model. Given a multi-object state $X_{k-1}$ at time $k-1$, the multi-object state $X_k$ at time $k$ is given by the union of the surviving objects and the new objects,

$$X_k = \left[\bigcup_{x_{k-1} \in X_{k-1}} S_k(x_{k-1})\right] \cup \left[\bigcup_{x_{k-1} \in X_{k-1}} B_{k|k-1}(x_{k-1})\right] \cup \Gamma_k \tag{1}$$

The RFS $X_k$ encapsulates all aspects of the multi-object tracking problem, such as the time-varying number of objects and the object motion.
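As an illustration of (1), the following minimal sketch draws one realization of the multi-object state from the previous one. It assumes no spawning, and the scalar random-walk motion model and the birth model are hypothetical placeholders chosen only to make the example runnable; they are not the models used in the experiments.

```python
import random

def step_multi_object_state(states, p_survive, transition, sample_births, rng):
    """Draw X_k from X_{k-1} as in (1), with the spawning term omitted."""
    # Each object survives independently with probability p_survive.
    survivors = [transition(x, rng) for x in states if rng.random() < p_survive]
    # Union of surviving objects and spontaneous births.
    return survivors + sample_births(rng)

def random_walk(x, rng):
    # Hypothetical 1-D motion model: previous state plus Gaussian noise.
    return x + rng.gauss(0.0, 1.0)

def occasional_birth(rng):
    # Hypothetical birth model: one new object near 0 with probability 0.1.
    return [rng.gauss(0.0, 5.0)] if rng.random() < 0.1 else []

rng = random.Random(0)
X = [10.0, -20.0]
for k in range(1, 6):
    X = step_multi_object_state(X, 0.99, random_walk, occasional_birth, rng)
    print(k, len(X))  # the cardinality of X_k may change from step to step
```

The state is stored as a plain list because the order of the elements carries no meaning; only the set of states and its cardinality matter.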

Similarly, let $\mathcal{Z}^i$ be the single-object measurement space at sensor $i$. The set of measurements collected at sensor $i$ at time $k$ is $Z_k^i \in \mathcal{F}(\mathcal{Z}^i)$. A given object state $x_k \in X_k$ is either detected with probability $p_D$ or missed with probability $(1 - p_D)$. Conditional on detection, the measurement from $x_k$ at sensor $i$ is defined by the RFS $\Theta_k^i(x_k)$. Sensor $i$ can also receive a set of clutter measurements $C_k^i$. So, given a multi-object state $X_k$ at time $k$, the measurement set from sensor $i$ at time $k$ is formed by the union of the object-generated measurements and the clutter,

$$Z_k^i = \left[\bigcup_{x_k \in X_k} \Theta_k^i(x_k)\right] \cup C_k^i \tag{2}$$

Assuming that we have $Q$ sensors, the RFS of measurements at time $k$ is modelled by

$$Z_k = \left[Z_k^1, Z_k^2, \ldots, Z_k^Q\right] \tag{3}$$

The RFS $Z_k$ encapsulates all sensor characteristics such as measurement noise, sensor field of view, and clutter.
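The measurement model (2)-(3) can be sketched in the same spirit. This is only an illustration under stated assumptions: the scalar noisy-position sensor, the uniform clutter over [0, 100], and the clutter mean of 3 are hypothetical, and the Poisson sampler is Knuth's simple method rather than anything prescribed by the paper.

```python
import math
import random

def sample_poisson(mean, rng):
    # Knuth's multiplication method; adequate for small means.
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def measurement_set(states, p_detect, measure, clutter_mean, sample_clutter, rng):
    """Draw Z_k^i as in (2): detections of the objects plus Poisson clutter."""
    detections = [measure(x, rng) for x in states if rng.random() < p_detect]
    n_clutter = sample_poisson(clutter_mean, rng)
    clutter = [sample_clutter(rng) for _ in range(n_clutter)]
    z = detections + clutter
    rng.shuffle(z)  # the sensor does not reveal which returns are clutter
    return z

def noisy_position(x, rng):
    # Hypothetical sensor: the scalar state plus Gaussian noise.
    return x + rng.gauss(0.0, 1.0)

def uniform_clutter(rng):
    # Hypothetical clutter: uniform over a surveillance interval [0, 100].
    return rng.uniform(0.0, 100.0)

rng = random.Random(0)
objects = [5.0, 40.0]
# Z_k in (3): one measurement set per sensor, here Q = 2.
Z_k = [measurement_set(objects, 0.98, noisy_position, 3.0, uniform_clutter, rng)
       for _ in range(2)]
print([len(z) for z in Z_k])
```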

The multi-sensor multi-object tracking problem can be posed as follows: given the set of measurements $Z_{1:k}$ collected from the sensors up to time $k$, find the estimate $\hat{X}_k$ as the expectation or the maximizer of the posterior density function $p(X_k \mid Z_{1:k})$.

III. PROBABILITY HYPOTHESIS DENSITY APPROACH

In the multiple object tracking problem, we usually need to obtain the posterior density $p(X_k \mid Z_{1:k})$. When the number of objects increases, the multiple object state space becomes large. Hence, it is difficult to obtain the posterior density function. Fortunately, this density function can be approximately recovered from the probability hypothesis density (PHD) [9]. The PHD is defined as follows. For a random finite set $X$ on $\mathcal{X}$ with probability distribution $P$, the PHD is the density $v(x)$ such that for each region $S \subseteq \mathcal{X}$, the integral of $v$ over the region $S$ gives the expected number of elements of $X$ that are in $S$,

$$\int |X \cap S| \, P(dX) = \int_S v(x) \, dx \tag{4}$$

Thus, instead of estimating the states of objects from the posterior density, we can estimate them by investigating the peaks of the PHD. This reduces the search from the multiple object state space to the single object state space.
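To make property (4) concrete, the sketch below integrates a one-dimensional Gaussian-mixture PHD over an interval. The mixture (weights 0.9 and 1.1, means 0 and 10, unit standard deviations) is a made-up example, not data from the paper.

```python
import math

def gaussian_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def expected_count(components, lo, hi):
    """Integral of a 1-D Gaussian-mixture PHD over [lo, hi].

    components is a list of (weight, mean, std) tuples.
    """
    return sum(w * (gaussian_cdf(hi, m, s) - gaussian_cdf(lo, m, s))
               for w, m, s in components)

# Hypothetical PHD with two peaks, one per object.
phd = [(0.9, 0.0, 1.0), (1.1, 10.0, 1.0)]
print(expected_count(phd, -1e9, 1e9))  # whole space: the sum of the weights
print(expected_count(phd, -3.0, 3.0))  # region around the first peak only
```

Integrating over the whole space returns the sum of the weights, which is the expected number of objects, while a region around one peak returns roughly the weight of that peak.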

IV. GAUSSIAN MIXTURE PROBABILITY HYPOTHESIS DENSITY FILTER IN MULTI-SENSOR MULTI-OBJECT TRACKING

A. Assumptions

First, we make the following assumptions. The transition density of each object follows a linear Gaussian model, i.e.,

$$f_{k|k-1}(x \mid \zeta) = \mathcal{N}(x; F_{k-1}\zeta, Q_{k-1}) \tag{5}$$

where $\mathcal{N}(\cdot; m, P)$ denotes a Gaussian density with mean $m$ and covariance $P$, $F_{k-1}$ is the state transition matrix, and $Q_{k-1}$ is the process noise covariance. There are $Q$ sensors, and the likelihood function at each sensor is also a linear Gaussian model, i.e.,

$$g_k^i(z \mid x) = \mathcal{N}(z; H_k^i x, R_k^i) \tag{6}$$

where $H_k^i$ is the observation matrix of sensor $i$, and $R_k^i$ is the observation noise covariance of sensor $i$. The survival and detection probabilities are state independent,

$$p_{S,k}(x) = p_{S,k} \tag{7}$$

$$p_{D,k}(x) = p_{D,k} \tag{8}$$

The intensity of the spontaneous birth RFS is

$$\gamma_k(x) = \sum_{i=1}^{J_{\gamma,k}} w_{\gamma,k}^{(i)} \mathcal{N}(x; m_{\gamma,k}^{(i)}, P_{\gamma,k}^{(i)}) \tag{9}$$

where $J_{\gamma,k}$ is the number of birth Gaussian components at time $k$, and $w_{\gamma,k}^{(i)}$ is the weight of the $i$-th Gaussian component. The posterior intensity at time $k-1$ is a Gaussian mixture of the form

$$v_{k-1}(x) = \sum_{i=1}^{J_{k-1}} w_{k-1}^{(i)} \mathcal{N}(x; m_{k-1}^{(i)}, P_{k-1}^{(i)}) \tag{10}$$

where $J_{k-1}$ is the number of Gaussian components of the posterior intensity at time $k-1$, and $w_{k-1}^{(i)}$ is the weight of the $i$-th Gaussian component.

B. GMPHD filter with one sensor

Vo [12] proposed a closed-form expression of the PHD filter for linear Gaussian multi-object tracking, called the Gaussian mixture probability hypothesis density filter. Under the assumptions in IV-A, if the initial prior intensity is a Gaussian mixture, the posterior intensity at any subsequent time step is also a Gaussian mixture. When there is only one sensor, this method can be applied directly.

In what follows, we outline the GMPHD filter under the assumption that there are no spawning objects. (When there are spawning objects, the prediction equation is modified by adding Gaussian components that represent the spawned objects. The details are in [12].)

Under the assumptions in IV-A, the predicted intensity to time $k$ is given by

$$v_{k|k-1}(x) = v_{S,k|k-1}(x) + \gamma_k(x) \tag{11}$$

where

$$v_{S,k|k-1}(x) = p_{S,k} \sum_{j=1}^{J_{k-1}} w_{k-1}^{(j)} \mathcal{N}(x; m_{S,k|k-1}^{(j)}, P_{S,k|k-1}^{(j)}),$$

$$m_{S,k|k-1}^{(j)} = F_{k-1} m_{k-1}^{(j)},$$

$$P_{S,k|k-1}^{(j)} = Q_{k-1} + F_{k-1} P_{k-1}^{(j)} F_{k-1}^T.$$

Because $v_{S,k|k-1}(x)$ and $\gamma_k(x)$ are Gaussian mixtures, $v_{k|k-1}(x)$ can be expressed as a Gaussian mixture of the form

$$v_{k|k-1}(x) = \sum_{i=1}^{J_{k|k-1}} w_{k|k-1}^{(i)} \mathcal{N}(x; m_{k|k-1}^{(i)}, P_{k|k-1}^{(i)}) \tag{12}$$

Then, the posterior intensity at time $k$ is also a Gaussian mixture, and is given by

$$v_k(x) = (1 - p_{D,k}) v_{k|k-1}(x) + \sum_{z \in Z_k} v_{D,k}(x; z) \tag{13}$$

where

$$v_{D,k}(x; z) = \sum_{j=1}^{J_{k|k-1}} w_k^{(j)}(z) \mathcal{N}(x; m_{k|k}^{(j)}, P_{k|k}^{(j)}),$$

$$w_k^{(j)}(z) = \frac{p_{D,k} w_{k|k-1}^{(j)} q_k^{(j)}(z)}{\kappa_k(z) + p_{D,k} \sum_{l=1}^{J_{k|k-1}} w_{k|k-1}^{(l)} q_k^{(l)}(z)},$$

$$q_k^{(j)}(z) = \mathcal{N}(z; H_k m_{k|k-1}^{(j)}, R_k + H_k P_{k|k-1}^{(j)} H_k^T),$$

$$m_{k|k}^{(j)} = m_{k|k-1}^{(j)} + K_k^{(j)} (z - H_k m_{k|k-1}^{(j)}),$$

$$P_{k|k}^{(j)} = [I - K_k^{(j)} H_k] P_{k|k-1}^{(j)},$$

$$K_k^{(j)} = P_{k|k-1}^{(j)} H_k^T (H_k P_{k|k-1}^{(j)} H_k^T + R_k)^{-1},$$

and $\kappa_k(z)$ is the clutter intensity.

C. GMPHD filter with multi-sensor

When there are many sensors, we propose to apply the GMPHD filter sequentially at each sensor. The algorithm is described as follows.

With the assumptions in IV-A, at time $k-1$ we have

$$v_{k-1}(x) = \sum_{i=1}^{J_{k-1}} w_{k-1}^{(i)} \mathcal{N}(x; m_{k-1}^{(i)}, P_{k-1}^{(i)})$$

First, we use the state equation (5), the measurement equation (6) and $v_{k-1}(x)$ to predict the intensity $v_{k|k-1}^1(x)$ at sensor 1 by using equation (11). Then we update $v_{k|k-1}^1(x)$ with the measurement set $Z_k^1$ by equation (13) to obtain the PHD at time $k$ for sensor 1, $v_k^1(x)$. Because $v_{k-1}(x)$ is a Gaussian mixture, $v_k^1(x)$ is also a Gaussian mixture and has the form

$$v_k^1(x) = \sum_{i=1}^{J_k^1} w_{1,k}^{(i)} \mathcal{N}(x; m_{1,k}^{(i)}, P_{1,k}^{(i)}) \tag{14}$$

Now, at sensor 2, we use $v_k^1(x)$ as the predicted PHD and, in a similar way to (13), we have

$$v_k^2(x) = (1 - p_{D,k}) v_k^1(x) + \sum_{z \in Z_k^2} v_{D,k}(x; z) \tag{15}$$

where $v_{D,k}(x; z)$ is computed as in (13), with the components of $v_k^1(x)$ in place of those of $v_{k|k-1}(x)$ and with the observation model of sensor 2. So, $v_k^2(x)$ also has the Gaussian mixture form

$$v_k^2(x) = \sum_{i=1}^{J_k^2} w_{2,k}^{(i)} \mathcal{N}(x; m_{2,k}^{(i)}, P_{2,k}^{(i)}) \tag{16}$$

We repeat this process for all $Q$ sensors. At the $Q$-th sensor, we obtain $v_k^Q(x)$, which has the form

$$v_k^Q(x) = \sum_{i=1}^{J_k^Q} w_{Q,k}^{(i)} \mathcal{N}(x; m_{Q,k}^{(i)}, P_{Q,k}^{(i)}) \tag{17}$$

The PHD for the multi-sensor multi-object posterior density is

$$v_k(x) = v_k^Q(x) \tag{18}$$

The number of objects is estimated by

$$\hat{N}_{k|k} = \int v_k(x) \, dx = \int \sum_{i=1}^{J_k^Q} w_{Q,k}^{(i)} \mathcal{N}(x; m_{Q,k}^{(i)}, P_{Q,k}^{(i)}) \, dx = \sum_{i=1}^{J_k^Q} w_{Q,k}^{(i)} \tag{19}$$

So, the properties of the Gaussian mixture in the multi-sensor case are similar to those in the one-sensor case. This means that in the multi-sensor multi-object tracking problem, under the assumptions in IV-A, if the initial prior intensity is a Gaussian mixture, the posterior intensity for the asynchronous sensor fusion method at any subsequent time step is also a Gaussian mixture.

D. Implementation issues

The state estimates of the objects are the means of the Gaussian components in $v_k(x)$ that have high weights (above 0.5). This estimation method is more efficient than that of the particle PHD filter, in which we first obtain the number of objects $\hat{N}_{k|k}$ and then partition the particles into $\hat{N}_{k|k}$ clusters. If $\hat{N}_{k|k}$ is not correct, the tracking performance is affected.

Now, we investigate the number of Gaussian components in $v_k(x)$. At the first sensor, the number of Gaussian components is

$$J_k^1 = (J_{k-1} + J_{\gamma,k})(1 + |Z_k^1|) \tag{20}$$


At the second sensor, the number of Gaussian components is

$$J_k^2 = J_k^1 (1 + |Z_k^2|) = (J_{k-1} + J_{\gamma,k})(1 + |Z_k^1|)(1 + |Z_k^2|) \tag{21}$$

So, the number of Gaussian components in $v_k(x)$ is

$$J_k = J_k^Q = (J_{k-1} + J_{\gamma,k})(1 + |Z_k^1|) \cdots (1 + |Z_k^Q|) \tag{22}$$
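A small worked example shows how quickly the product in (22) grows. The numbers are hypothetical: 10 components carried over from the previous step, 1 birth component, and 12 measurements at each sensor.

```python
def component_count(j_prev, j_birth, measurement_counts):
    """Number of Gaussian components after sequential updates, as in (22)."""
    count = j_prev + j_birth
    for n_meas in measurement_counts:
        # Each sensor update keeps the missed-detection copy of every
        # component and adds one copy per measurement.
        count *= 1 + n_meas
    return count

print(component_count(10, 1, [12]))      # one sensor:  11 * 13 = 143
print(component_count(10, 1, [12, 12]))  # two sensors: 11 * 13 * 13 = 1859
```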

The number of Gaussian components in the multi-sensor GMPHD filter grows so quickly with time that it leads to a high computational cost. So, at each time step, methods to reduce the number of Gaussian components are required. Following [12], Gaussian components with small weights are pruned, Gaussian components that are close together are merged into one Gaussian, and if the number of Gaussian components exceeds a threshold L, only the L components with the highest weights are propagated to the next iteration.
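The recursion of Section IV can be sketched for the simplest possible case. The code below is a minimal illustration, not the authors' implementation: it assumes a scalar state and scalar measurements (so F, H, Q and R are plain numbers), a constant clutter intensity, and the same detection probability at every sensor. Pruning after each sensor update is a design choice made here, and merging of nearby components is omitted for brevity.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def predict(components, births, p_s, f, q):
    """Prediction (11): surviving components plus birth components."""
    survived = [(p_s * w, f * m, q + f * p * f) for w, m, p in components]
    return survived + list(births)

def update(components, measurements, p_d, h, r, clutter):
    """Single-sensor update (13) for a scalar state and measurement."""
    updated = [((1.0 - p_d) * w, m, p) for w, m, p in components]  # missed detections
    for z in measurements:
        terms = []
        for w, m, p in components:
            s = r + h * p * h    # innovation variance
            gain = p * h / s     # Kalman gain
            terms.append((p_d * w * normal_pdf(z, h * m, s),
                          m + gain * (z - h * m),
                          (1.0 - gain * h) * p))
        norm = clutter + sum(t[0] for t in terms)
        updated += [(w / norm, m, p) for w, m, p in terms]
    return updated

def prune(components, threshold=1e-5, max_components=100):
    """Drop low-weight components and keep at most max_components of the rest."""
    kept = [c for c in components if c[0] > threshold]
    kept.sort(key=lambda c: c[0], reverse=True)
    return kept[:max_components]

def multi_sensor_step(components, births, sensor_data, p_s, p_d, f, q, clutter):
    """One time step; sensor_data holds one (measurements, h, r) per sensor."""
    v = predict(components, births, p_s, f, q)
    for measurements, h, r in sensor_data:
        # The output of one sensor acts as the predicted PHD for the next.
        v = prune(update(v, measurements, p_d, h, r, clutter))
    return v

# Hypothetical example: one object near 0 observed by two sensors.
prior = [(1.0, 0.0, 1.0)]  # (weight, mean, variance)
sensors = [([0.1], 1.0, 0.5), ([-0.1], 1.0, 0.5)]
posterior = multi_sensor_step(prior, [], sensors, 0.99, 0.9, 1.0, 0.1, 0.01)
print(sum(w for w, _, _ in posterior))            # estimated number of objects, as in (19)
print([m for w, m, _ in posterior if w > 0.5])    # state estimates
```

In this example the posterior has 4 components, which agrees with (22) for one prior component and one measurement per sensor. A full implementation would use matrices for the state and measurement models and would add the merging step described in [12].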

V. EXPERIMENTAL RESULTS

A. Gaussian mixture probability hypothesis density filter with multi-sensor for bearing and range tracking

First, we consider a bearing and range tracking application to demonstrate the effectiveness of the GMPHD filter with multiple sensors. There are objects that appear and disappear at different times. Each object has the survival probability $p_{S,k} = 0.99$ and follows a nonlinear nearly constant turn model [12] in which the target state takes the form $x_k = [y_k^T, \omega_k]^T$, where $y_k = [p_{x,k}, p_{y,k}, \dot{p}_{x,k}, \dot{p}_{y,k}]^T$ comprises the position $(x, y)$ and the velocity in each dimension of the object, and $\omega_k$ is the turn rate. The state dynamic equations are given by

$$y_k = F(\omega_{k-1}) y_{k-1} + G w_{k-1}, \tag{23}$$

$$\omega_k = \omega_{k-1} + \Delta u_{k-1},$$

where $\Delta = 1$ s, $w_k \sim \mathcal{N}(\cdot; 0, \sigma_w^2 I_2)$, $\sigma_w = 15$ m/s$^2$, $u_k \sim \mathcal{N}(\cdot; 0, \sigma_u^2)$, $\sigma_u = \pi/180$ rad/s, and

$$F(\omega) = \begin{bmatrix} 1 & 0 & \frac{\sin \omega\Delta}{\omega} & -\frac{1 - \cos \omega\Delta}{\omega} \\ 0 & 1 & \frac{1 - \cos \omega\Delta}{\omega} & \frac{\sin \omega\Delta}{\omega} \\ 0 & 0 & \cos \omega\Delta & -\sin \omega\Delta \\ 0 & 0 & \sin \omega\Delta & \cos \omega\Delta \end{bmatrix}, \quad G = \begin{bmatrix} \frac{\Delta^2}{2} & 0 \\ 0 & \frac{\Delta^2}{2} \\ \Delta & 0 \\ 0 & \Delta \end{bmatrix}$$

We assume no spawning and that the spontaneous birth RFS is Poisson with intensity

$$\gamma_k(x) = 0.1 \, \mathcal{N}(x; m_\gamma, P_\gamma)$$

where

$$m_\gamma = [0, 0, 2000, 0, 0]^T,$$

$$P_\gamma = \mathrm{diag}([2500, 2500, 2500, 2500, (6\pi/180)^2]^T).$$
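The nearly constant turn model (23) can be sketched as follows, using the standard coordinated-turn form of $F(\omega)$ and $G$ for the state ordering $[p_x, p_y, \dot{p}_x, \dot{p}_y]$. The handling of a turn rate close to zero (falling back to constant velocity) is an addition made here to avoid a division by zero, and the initial state is a hypothetical example.

```python
import math

def turn_transition(omega, dt=1.0):
    """F(omega) for the state [p_x, p_y, v_x, v_y]."""
    if abs(omega) < 1e-9:
        # Limit omega -> 0: constant-velocity model (assumption made here).
        a, b = dt, 0.0
    else:
        a = math.sin(omega * dt) / omega
        b = (1.0 - math.cos(omega * dt)) / omega
    c, s = math.cos(omega * dt), math.sin(omega * dt)
    return [[1.0, 0.0, a, -b],
            [0.0, 1.0, b, a],
            [0.0, 0.0, c, -s],
            [0.0, 0.0, s, c]]

def noise_gain(dt=1.0):
    """G, mapping the 2-D acceleration noise into the state."""
    return [[dt * dt / 2.0, 0.0], [0.0, dt * dt / 2.0], [dt, 0.0], [0.0, dt]]

def mat_vec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

# Noise-free propagation of a hypothetical target turning at 1 degree per second.
y = [0.0, 0.0, 10.0, 0.0]
omega = math.pi / 180.0
for _ in range(3):
    y = mat_vec(turn_transition(omega), y)
print(y)
print(math.hypot(y[2], y[3]))  # the speed is preserved by the turn
```

Without process noise the velocity vector only rotates, so the speed stays constant, which is a quick sanity check on the signs of the matrix entries.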

Fig. 1. Position (x, y) of targets with measurements from sensor 1

Each object has a probability of detection $p_{D,k} = 0.98$. Observations consist of bearing and range measurements from 2 sensors. The positions of the sensors are

$$p_s^1 = [0, 0] \tag{24}$$

$$p_s^2 = [1000, 1000] \tag{25}$$

The observation model at sensor $i$ is given by

$$z_k^i = \begin{bmatrix} \arctan\left(\dfrac{p_{x,k} - p_{s,x}^i}{p_{y,k} - p_{s,y}^i}\right) \\ \sqrt{(p_{x,k} - p_{s,x}^i)^2 + (p_{y,k} - p_{s,y}^i)^2} \end{bmatrix} + \varepsilon_k, \tag{26}$$

where $\varepsilon_k \sim \mathcal{N}(\cdot; 0, R_k)$ with $R_k = \mathrm{diag}([\sigma_\theta^2, \sigma_r^2]^T)$, $\sigma_\theta = \pi/30$ rad/s and $\sigma_r = 10$ m. The clutter RFS follows the uniform Poisson model over the surveillance region $[-\pi/2, \pi/2]$ rad $\times$ $[0, 3000]$ m, with $\lambda_c = 1.1 \times 10^{-3}$ (rad m)$^{-1}$ (i.e., an average of 10 clutter returns on the surveillance region). The pruning parameters for the GMPHD filters are the truncation threshold $T = 10^{-5}$, the merging threshold $U = 4$, and the maximum number of Gaussian components $J_{max} = 100$. (More details on these parameters are in [12].)
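The noise-free part of the observation model (26) and the clutter rate can be checked with a few lines. One assumption is made loudly here: `math.atan2(dx, dy)` is used instead of a plain arctangent of the ratio, which gives the same bearing whenever the target lies at a larger y coordinate than the sensor and avoids a division by zero otherwise. The target state in the example is hypothetical.

```python
import math

def bearing_range(state, sensor):
    """Noise-free bearing (measured from the y axis) and range, as in (26)."""
    dx, dy = state[0] - sensor[0], state[1] - sensor[1]
    # atan2 is an assumption made here; it equals arctan(dx / dy) for dy > 0.
    return math.atan2(dx, dy), math.hypot(dx, dy)

sensors = [(0.0, 0.0), (1000.0, 1000.0)]
target = [1000.0, 2000.0, 0.0, 0.0, 0.0]  # hypothetical state [p_x, p_y, v_x, v_y, omega]
for sensor in sensors:
    print(bearing_range(target, sensor))

# Expected number of clutter returns: intensity times the area of the
# surveillance region [-pi/2, pi/2] rad x [0, 3000] m.
clutter_intensity = 1.1e-3
area = math.pi * 3000.0
print(clutter_intensity * area)  # roughly 10, as stated in the text
```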

Figures 1 and 2 show the position estimates obtained with measurements from sensor 1 and sensor 2, respectively. Because of the high clutter and high noise, there are some errors in the filter outputs. Figure 3 shows the position estimates of the multi-sensor GMPHD method, which performs better than using either sensor alone. This is because the result from sensor 1 serves as a good prediction for sensor 2. Thus, the information from both sensors is combined to give the state estimates.

Fig. 2. Position (x, y) of targets with measurements from sensor 2

Fig. 3. Position (x, y) of targets with fusion method

B. Gaussian mixture probability hypothesis density filter for multiple speaker tracking

Second, we tested the GMPHD filter in multiple speaker tracking. We simulated an acoustic room to test the performance of the GMPHD filter in tracking multiple speakers. The dimensions of the room are 3 m $\times$ 3 m $\times$ 2.5 m. There are four microphone pairs, each with an inter-sensor spacing of 0.5 m. The speaker sources are all female. The acoustic image method [16] was used to simulate the room impulse responses. The reverberation time of the room impulse responses is about $T_{60} = 0.15$ s. The speech signal-to-noise ratio is about 20 dB. There are 60 frames. The time frame length for measuring TDOA is 256 ms, and the frames are non-overlapping. There are two speakers, who appear and disappear at different times.

Let $x_k$ be the state of a speaker at time $k$. Here, the state is the position $(x, y)$ of the speaker. We assume that the dynamic equation is

$$x_k = A x_{k-1} + w_k \tag{27}$$

where $A = I$ and $w_k \sim \mathcal{N}([0, 0], \mathrm{diag}([0.01, 0.01]))$. This means the average distance moved by a speaker from time $k-1$ to $k$ is about 10 cm. Given a speaker $x_k$, the time delay of arrival (TDOA) measurement $z_k^q$ is measured from the $q$-th microphone pair at time $k$. The measurement equation is

$$z_k^q = T_q(x_k) + v_k^q, \quad q = 1, \ldots, Q \tag{28}$$

$$T_q(x_k) = \frac{\|x_k - p_{2,q}\| - \|x_k - p_{1,q}\|}{c} \tag{29}$$

where $p_{i,q}$ is the position of microphone $i$ of pair $q$, $c$ is the speed of sound, and $v_k^q \sim \mathcal{N}(0, 4 \times 10^{-9})$ is uncorrelated noise. Because the measurement equation (28) is not linear Gaussian, we approximate it by using the unscented transform in the GMPHD filter [12]. Each speaker has a probability of survival $p_{S,k} = 0.95$ and a probability of detection $p_{D,k} = 0.7$. To extract the TDOAs for multiple speakers, we applied the method from [13]. Figure 4 shows an example of the TDOA measurements collected at a microphone pair (microphone pair 2).

Fig. 4. GCC TDOA measurements

Figures 5 and 6 show the multi-speaker tracking performance of the particle PHD filter [14]. Because the clustering technique is unreliable, the state estimates are affected. Figures 7 and 8 show the multi-speaker tracking performance of our method, which is better than that of the particle PHD filter. Most of the time when two persons speak simultaneously, our method gives reliable estimates. This is because the GMPHD filter does not depend on clustering techniques: the state estimates are extracted from the means of the Gaussian components that have high weights.

The above result is the performance for one trial. To measure the average performance, we used the performance measures from [13]: the probability of correct speaker number, the expected absolute error on the number of speakers, and the conditional mean distance error under the Wasserstein distance. The probability of correct speaker number is defined by

$$P(|\hat{X}_k| = |X_k|) \tag{30}$$

where $\hat{X}_k$ is the estimate of the multi-speaker state and $X_k$ is the ground truth. The expected absolute error on the number of speakers is

$$E\left(\left| |\hat{X}_k| - |X_k| \right|\right) \tag{31}$$


Fig. 5. Number of speakers by particle PHD filter

Fig. 6. Position (x, y) of speakers with particle PHD filter

Fig. 7. Number of speakers by our method

Fig. 8. Position (x, y) of speakers by our method

When $|\hat{X}_k| = |X_k|$, the Wasserstein distance between $\hat{X}_k$ and $X_k$ is defined as follows

$$d(X_k, \hat{X}_k) = \inf_C \left( \sum_{x_i \in X_k} \sum_{\hat{x}_j \in \hat{X}_k} C_{i,j} \, d(x_i, \hat{x}_j)^p \right)^{1/p} \tag{32}$$

where $C$ ranges over the $|\hat{X}_k| \times |X_k|$ transportation matrices. The conditional mean distance error is defined as

$$E\{ d(X_k, \hat{X}_k) \mid \text{correct speaker number estimate} \} \tag{33}$$
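The three performance measures can be sketched as below. The sketch assumes the standard definition of a transportation matrix, in which every row and column sums to 1/n for two sets of equal size n; under that assumption the infimum is attained at a scaled permutation matrix, so a brute-force search over permutations is enough for the small sets considered here. The function names and the choice p = 2 are illustrative only.

```python
import itertools
import math

def wasserstein_equal_size(truth, estimate, p=2):
    """Wasserstein distance between two point sets of equal size."""
    n = len(truth)
    if n != len(estimate):
        raise ValueError("defined here only for sets of equal size")
    if n == 0:
        return 0.0
    # Search all assignments of estimates to true states (fine for small n).
    best = min(
        sum(math.dist(x, estimate[j]) ** p for x, j in zip(truth, perm))
        for perm in itertools.permutations(range(n))
    )
    return (best / n) ** (1.0 / p)

def prob_correct_number(est_counts, true_counts):
    """Empirical version of (30) over a list of trials."""
    pairs = list(zip(est_counts, true_counts))
    return sum(e == t for e, t in pairs) / len(pairs)

def mean_abs_count_error(est_counts, true_counts):
    """Empirical version of (31) over a list of trials."""
    pairs = list(zip(est_counts, true_counts))
    return sum(abs(e - t) for e, t in pairs) / len(pairs)

# Hypothetical example: two speakers, estimates listed in a different order.
truth = [(0.0, 0.0), (2.0, 0.0)]
estimate = [(2.0, 1.0), (0.0, 1.0)]
print(wasserstein_equal_size(truth, estimate))   # each estimate is 1 m off
print(prob_correct_number([2, 1, 2], [2, 2, 2]))
print(mean_abs_count_error([2, 1, 2], [2, 2, 2]))
```

The conditional mean distance error (33) is then the average of `wasserstein_equal_size` over the trials in which the estimated number of speakers is correct.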

We tested the performance over 500 trials. Each trial uses a new signal and a new TDOA measurement set. Figures 9 and 10 compare the probability of correct speaker number and the expected absolute error in the estimated number of speakers for our method and the particle PHD filter. Our method is more accurate than the particle PHD filter. The main errors in our method occur because the TDOA measurements are not reliable when two people speak at the same time. Figure 11 shows the conditional mean distance error of speaker tracking, which is more stable for our method than for the particle PHD filter.

VI. CONCLUSIONS

Multi-sensor multi-object tracking is a challenging problem. In this paper, we employed the GMPHD filter for multi-sensor multi-object tracking. Sequential sensor updating was used to fuse data from multiple sensors in the GMPHD filter. We showed that our method works well with multiple sensors in bearing and range multi-object tracking. Moreover, we demonstrated that the GMPHD filter is more efficient than the particle PHD filter in multiple speaker tracking.

VII. ACKNOWLEDGEMENT

The authors would like to thank Prof. Ba Ngu Vo at Melbourne University for his help and fruitful discussions. This work is partially supported by EU project ASTRALS (FP6-IST-0028097).

Fig. 9. Probability of correct speaker number

Fig. 10. Absolute error on the number of speaker

Fig. 11. Conditional mean distance error of multi-speaker tracking

REFERENCES

[1] G. Wang, R. Rabenstein, N. Strobel, and S. Spors, “Object localization by joint audio-video signal processing,” in Vision Modelling and Visualization, Germany, 2000.

[2] P. J. Escamilla-Ambrosio and N. Lieven, “A multiple-sensor multiple-target tracking approach for the autotaxi system,” in IEEE Intelligent Vehicles Symposium, Italy, 2004.

[3] Z. Ding and L. Hong, “Development of a distributed IMM algorithm for multi-platform multi-sensor tracking,” in International Conference on Multisensor Fusion and Integration for Intelligent Systems, USA, 1996.

[4] Y. Bar-Shalom and T. E. Fortmann, Tracking and data association. San Diego: Academic Press, 1988.

[5] D. Reid, “An algorithm for tracking multiple targets,” IEEE Transaction Automatic Control, vol. 24, no. 6, pp. 84–90, 1979.

[6] K. Chang, C. Chong, and Y. Bar-Shalom, “Joint probabilistic data association in distributed sensor networks,” IEEE Transaction on Automatic Control, vol. 31, no. 10, 1986.

[7] L. Y. Pao and S. D. O'Neil, “Multisensor fusion algorithms for tracking,” in Proceeding of American Control Conference, USA, 1993.

[8] L. Chen, M. J. Wainwright, M. Cetin, and A. S. Willsky, “Multitarget-multisensor data association using the tree-reweighted max-product algorithm,” in SPIE AeroSense Conference, USA, 2003.

[9] R. Mahler, “Multi-target Bayes filtering via first-order multi-target moments,” IEEE Trans. on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1152–1178, 2003.

[10] B. N. Vo, S. Singh, and A. Doucet, “Sequential Monte Carlo methods for Bayesian multi-target filtering with random finite sets,” IEEE Trans. Aerospace and Electronic Systems, vol. 41, no. 4, pp. 1224–1245, Oct 2005.

[11] T. Zajic and R. Mahler, “A particle system implement of the PHD multitarget tracking filter,” in Signal Processing, Sensor Fusion and target Recognition XII, SPIE Proc, 2003, pp. 291–299.

[12] B. N. Vo and W. K. Ma, “The Gaussian mixture probability hypothesis density filter,” IEEE Transaction Signal Processing, vol. 54, no. 11, 2006.

[13] W. K. Ma, B. Vo, S. Singh, and A. Baddeley, “Tracking an unknown time-varying number of speakers using TDOA measurements: a random finite set approach,” IEEE Trans Signal Processing, vol. 54, no. 9, 2006.

[14] B. N. Vo, S. Singh, and W. K. Ma, “Tracking multiple speakers with random sets,” in ICASSP, Montreal, Canada, 2004.

[15] J. M. Valin, F. Michaud, and J. Rouat, “Robust 3D localization and tracking of sound sources using beamforming and particle filtering,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP-06), France, 2006.

[16] J. Allen and D. Berkley, “Image method for efficiently simulating small room acoustic,” Journal of the Acoustical Society of America, vol. 65, pp. 943–950, 1979.
