18.07.2014 Views

Multiple Sensor Multiple Object Tracking With GMPHD Filter - ISIF


Fig. 2. Position $(x, y)$ of targets with measurements from sensor 2.

Fig. 3. Position $(x, y)$ of targets with the fusion method.

Fig. 4. GCC TDOA measurements.

microphone pairs, each with an inter-sensor spacing of 0.5 m. The speaker sources are all female. The acoustic image method [16] was used to simulate the room impulse responses. The reverberation time of the room impulse responses is about $T_{60} = 0.15$ s. The speech signal-to-noise ratio is about 20 dB. There are 60 frames. The time frames used for measuring TDOA are 256 ms long and non-overlapping. There are two speakers, who appear and disappear at different times.

Let $x_k$ be the state of a speaker at time $k$. Here, the state is the position $(x, y)$ of the speaker. We assume that the motion model is given by

$$x_k = A x_{k-1} + w_k \qquad (27)$$

where $A = I$ and $w_k \sim \mathcal{N}([0, 0], \mathrm{diag}([0.01, 0.01]))$. This means that a speaker moves about 10 cm on average from time $k-1$ to $k$. Given a speaker $x_k$, the time delay of arrival (TDOA) measurement $z_k^q$ is measured from the $q$-th microphone pair at time $k$. The measurement equation is

$$z_k^q = T^q(x_k) + v_k^q, \quad q = 1, \ldots, Q \qquad (28)$$

$$T^q(x_k) = \frac{\|x_k - p_{2,q}\| - \|x_k - p_{1,q}\|}{c} \qquad (29)$$

where $p_{i,q}$ is the position of microphone $i$ of pair $q$, $c$ is the speed of sound, and $v_k^q \sim \mathcal{N}(0, 4 \times 10^{-9})$ is uncorrelated noise. Because the measurement equation (28) is not linear Gaussian, we approximate it by a linear Gaussian system using the unscented transform in the GMPHD filter [12]. Each speaker has a probability of survival $p_{S,k} = 0.95$ at time $k$, and the probability of detection is $p_{D,k} = 0.7$. To extract the TDOA for multiple speakers, we applied the method from [13]. Figure 4 shows an example of the TDOA measurements collected at one microphone pair (here, microphone pair 2).
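The motion model (27) and the TDOA measurement model (28)-(29) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the speed of sound (343 m/s), the microphone coordinates, the starting position, and the unscented-transform parameter `kappa` are assumptions chosen for illustration, and the unscented step assumes a diagonal state covariance so that the matrix square root reduces to per-axis square roots. The parameterisation used in [12] may differ.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value, the text only names it c
PROCESS_VAR = 0.01      # per-axis variance of w_k in equation (27)
NOISE_VAR = 4e-9        # variance of the TDOA noise v_k^q in equation (28)


def tdoa(x, p1, p2, c=SPEED_OF_SOUND):
    """Noise-free TDOA T^q(x) of equation (29) for one microphone pair."""
    return (math.dist(x, p2) - math.dist(x, p1)) / c


def step(x, rng):
    """One random-walk step of equation (27) with A = I."""
    s = math.sqrt(PROCESS_VAR)
    return (x[0] + rng.gauss(0.0, s), x[1] + rng.gauss(0.0, s))


def measure(x, p1, p2, rng):
    """Noisy TDOA measurement of equation (28)."""
    return tdoa(x, p1, p2) + rng.gauss(0.0, math.sqrt(NOISE_VAR))


def unscented_tdoa(mean, var, p1, p2, kappa=1.0):
    """Unscented-transform prediction of the TDOA mean and variance.

    `mean` is a 2-D position and `var` holds the two diagonal entries of
    its covariance (diagonal covariance is an assumption of this sketch).
    """
    n = 2
    scale = math.sqrt(n + kappa)
    # Sigma points: the mean, plus and minus a scaled step along each axis.
    sigma = [tuple(mean)]
    for i in range(n):
        d = scale * math.sqrt(var[i])
        for sign in (1.0, -1.0):
            pt = list(mean)
            pt[i] += sign * d
            sigma.append(tuple(pt))
    weights = [kappa / (n + kappa)] + [0.5 / (n + kappa)] * (2 * n)
    # Propagate sigma points through the nonlinear measurement function.
    z = [tdoa(s, p1, p2) for s in sigma]
    z_mean = sum(w * zi for w, zi in zip(weights, z))
    z_var = sum(w * (zi - z_mean) ** 2 for w, zi in zip(weights, z)) + NOISE_VAR
    return z_mean, z_var


rng = random.Random(0)
p1, p2 = (0.0, 0.0), (0.5, 0.0)  # hypothetical pair with 0.5 m spacing
x = step((2.0, 1.5), rng)        # hypothetical starting position
z = measure(x, p1, p2, rng)
z_pred, z_pred_var = unscented_tdoa((2.0, 1.5), (PROCESS_VAR, PROCESS_VAR), p1, p2)
```

Because of the triangle inequality, a noise-free TDOA from (29) can never exceed the inter-sensor spacing divided by $c$ in magnitude, which gives a quick sanity check on simulated measurements.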

Figures 5 and 6 show the multi-speaker tracking performance of the particle PHD filter [14]. Because the clustering technique is unreliable, the state estimates are affected. Figures 7 and 8 show the multi-speaker tracking performance of our method, which is better than that of the particle PHD filter. Most of the time when two persons speak simultaneously, our method gives reliable estimates. This is because the GMPHD filter does not depend on clustering techniques: the state estimates are extracted from the means of the Gaussian components that have high weights.
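As a rough illustration of extracting states from high-weight Gaussian components, the sketch below keeps the mean of every component whose weight exceeds a threshold. The function name `extract_states` and the threshold of 0.5 are assumptions made for this example; the text does not state the value actually used.

```python
def extract_states(weights, means, threshold=0.5):
    """Return the means of Gaussian components with weight above threshold.

    The threshold value is an assumption of this sketch, not taken from
    the paper.
    """
    return [m for w, m in zip(weights, means) if w > threshold]


# Hypothetical posterior mixture: two strong components and one weak one.
weights = [0.9, 0.1, 0.7]
means = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]
estimates = extract_states(weights, means)
```

No clustering step is involved: the estimate is read directly from the mixture parameters.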

The above result is the performance for one trial. To measure the average performance, we used the performance measures from [13]. They include the probability of the correct speaker number, the expected absolute error on the number of speakers, and the conditional mean distance error given by the Wasserstein distance. The probability of the correct speaker number is defined by

$$P(|\hat{X}_k| = |X_k|) \qquad (30)$$

where $\hat{X}_k$ is the estimate of the multi-speaker state and $X_k$ is the ground truth. The expected absolute error on the number of speakers is

$$E\big(\big|\,|\hat{X}_k| - |X_k|\,\big|\big) \qquad (31)$$
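The two cardinality measures can be estimated empirically by averaging over paired estimated and true speaker counts, for example across Monte Carlo trials at a fixed time $k$. The sketch below is one way to do this; the function name `cardinality_metrics` and the sample counts are hypothetical.

```python
def cardinality_metrics(est_counts, true_counts):
    """Empirical probability of correct speaker number and mean absolute
    error on the number of speakers, over paired counts."""
    if len(est_counts) != len(true_counts) or not est_counts:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(est_counts)
    pairs = list(zip(est_counts, true_counts))
    p_correct = sum(e == t for e, t in pairs) / n
    abs_error = sum(abs(e - t) for e, t in pairs) / n
    return p_correct, abs_error


# Hypothetical counts from four trials.
p_correct, abs_error = cardinality_metrics([2, 1, 2, 0], [2, 2, 2, 1])
```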
