
Multi-Microphone Speech Enhancement
Centralized and Distributed Beamformers

Sharon Gannot
Faculty of Engineering, Bar-Ilan University, Israel
TU Berlin, 7.1.2013, Berlin, Germany


Introduction
Speech Enhancement and Source Extraction Utilizing Microphone Arrays

Applications
1 Hands-free communications.
2 Teleconferencing.
3 Skype calls.
4 Hearing aids.
5 Eavesdropping.

Why is it a Difficult Task?
1 Speech is a non-stationary signal with high dynamic range.
2 Very long acoustic path (reverberation).
3 Time-varying acoustic path.
4 Microphone arrays impose a high computational burden.
5 Large (and distributed) arrays require large communication bandwidth.


Introduction
Room Acoustics Essentials

Reverberation
Late reflections tend to be diffuse, hence performance degradation.
Beamforming: higher processing burden and high latency in STFT implementation.
Deteriorates intelligibility, quality and ASR performance.

Sound Fields
Directional: a room impulse response relates source and microphones.
Uncorrelated: signals at the microphones are uncorrelated.
Diffuse: sound is coming from all directions.

Dal-Degan, Prati, 1988; Habets, Gannot, 2007


Introduction
Spatial Processing: Design Criteria Tailored to Speech Applications

1 BSS, CASA, and other methods.
2 Adaptive optimization. Sondhi, Elko, 1986; Kaneda, Ohga, 1986
3 Minimum variance distortionless response (MVDR) and GSC. Van Compernolle, 1990; Affes and Grenier, 1997; Nordholm et al., 1999; Hoshuyama et al., 1999; Gannot et al., 2001; Herbordt et al., 2005
4 Minimum mean square error (MMSE) - GSVD-based spatial Wiener filter. Doclo, Moonen, 2002
5 Speech distortion weighted multichannel Wiener filter (SDW-MWF). Spriet et al., 2004
6 Maximum signal-to-noise ratio (SNR). Warsitz and Haeb-Umbach, 2005
7 Linearly constrained minimum variance (LCMV). Markovich et al., 2009


Problem Formulation

Extraction of Desired Speaker Signal(s) in a Multi-Interference Reverberant Environment


Problem Formulation
Talk Outline

Spatial processors ("beamformers") based on the linearly constrained minimum variance (LCMV) criterion.
Special cases:
  Enhancing a single desired source (MVDR).
  Extracting desired source(s) in a multiple-competing-speakers environment.
Implementation: generalized sidelobe canceller (GSC) in the short-time Fourier transform (STFT) domain.
The relative transfer function (RTF): importance and estimation procedures.
Distributed microphone arrays.


Problem Formulation
Problem Formulation in the STFT Domain

Microphone signals (m = 1, ..., M):

$$z_m(l,k) = \sum_{j=1}^{P^d} s_j^d(l,k)\,h_{jm}^d(l,k) + \sum_{j=1}^{P^i} s_j^i(l,k)\,h_{jm}^i(l,k) + \sum_{j=1}^{P^n} s_j^n(l,k)\,h_{jm}^n(k) + n_m(l,k)$$

Vector formulation:

$$z(l,k) = H^d(l,k)\,s^d(l,k) + H^i(l,k)\,s^i(l,k) + H^n(k)\,s^n(l,k) + n(l,k) \triangleq H(l,k)\,s(l,k) + n(l,k)$$

$$P = P^d + P^i + P^n \le M$$


Problem Formulation
Spatial Filters

Filter and combine:

$$y(l,k) = w^H(l,k)\,z(l,k).$$

(Filter-and-sum diagram: each microphone signal $z_m(l,k)$, $m = 1, \ldots, M$, is filtered by $w_m^*(l,k)$ and the filter outputs are summed to give $y(l,k)$.)

Filters: $w(l,k)$ is an $M \times 1$ vector.


LCMV Beamformer
Criterion and Solution

The Linearly Constrained Minimum Variance Beamformer
Er, Cantoni, 1983; Van Veen, Buckley, 1988

LCMV Criterion
Let $\Phi_{zz}(l,k)$ be the $M \times M$ received-signals correlation matrix. Minimize the output power

$$w^H(l,k)\,\Phi_{zz}(l,k)\,w(l,k),$$

such that the linear constraint set is satisfied:

$$C^H(l,k)\,w(l,k) = g(l,k).$$

$C$: $M \times P$ constraints matrix; $g$: $P \times 1$ response vector.

Closed-form solution (but we hardly use it):

$$w = \Phi_{zz}^{-1} C \left( C^H \Phi_{zz}^{-1} C \right)^{-1} g$$
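To make the closed-form expression concrete, here is a minimal NumPy sketch (not part of the original slides) that evaluates the LCMV weights for one frequency bin, assuming the correlation matrix $\Phi_{zz}$ and the constraint pair $(C, g)$ have already been estimated; the diagonal loading is an added safeguard, not something from the talk:

```python
import numpy as np

def lcmv_weights(Phi_zz, C, g, diag_load=1e-6):
    """Closed-form LCMV weights w = Phi^{-1} C (C^H Phi^{-1} C)^{-1} g
    for one frequency bin. Phi_zz: (M, M), C: (M, P), g: (P,)."""
    M = Phi_zz.shape[0]
    # Small diagonal loading keeps the inversion well conditioned (an assumption, not from the slides).
    Phi = Phi_zz + diag_load * np.trace(Phi_zz).real / M * np.eye(M)
    Phi_inv_C = np.linalg.solve(Phi, C)                       # Phi^{-1} C
    w = Phi_inv_C @ np.linalg.solve(C.conj().T @ Phi_inv_C, g)
    return w                                                   # (M,) complex beamformer

# Toy usage: M = 4 microphones, P = 1 distortionless constraint (MVDR case).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Phi_zz = A @ A.conj().T                                        # Hermitian positive definite
h_d = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = lcmv_weights(Phi_zz, h_d[:, None], np.array([1.0 + 0j]))
print(np.allclose(h_d.conj() @ w, 1.0))                        # constraint C^H w = g holds
```

Note that the constraint $C^H w = g$ is satisfied exactly for any invertible correlation matrix, which the final check confirms.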


LCMV Beamformer
Criterion and Solution

LCMV Beamformer: Beampattern - Illustrative Example

(Polar beampattern plot, 0 dB to −40 dB.)

10-microphone uniform linear array.
2 desired sources in green and 2 interfering sources in red.
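For readers who want to reproduce a beampattern of this flavour, a small sketch of my own (free-field steering vectors, $\Phi_{zz} = I$, and the geometry/angles are all illustrative assumptions, not the talk's setup) that places unit responses towards two "desired" angles and nulls towards two "interferers" for a 10-microphone ULA:

```python
import numpy as np

def ula_steering(theta_deg, M=10, d=0.04, f=2000.0, c=343.0):
    """Free-field steering vector of an M-microphone ULA (spacing d [m], frequency f [Hz])."""
    delays = d * np.arange(M) * np.cos(np.deg2rad(theta_deg)) / c
    return np.exp(-2j * np.pi * f * delays)

desired, interferers = [60.0, 100.0], [30.0, 140.0]            # illustrative angles
C = np.stack([ula_steering(a) for a in desired + interferers], axis=1)   # (M, P) constraints
g = np.array([1, 1, 0, 0], dtype=complex)                      # unit gains towards desired, nulls elsewhere
w = C @ np.linalg.solve(C.conj().T @ C, g)                     # LCMV with Phi_zz = I (quiescent beamformer)

angles = np.arange(0, 181)
pattern = np.array([np.abs(w.conj() @ ula_steering(a)) for a in angles])
pattern_db = 20 * np.log10(pattern + 1e-12)
print(pattern_db[[60, 100, 30, 140]])   # ~0 dB towards the desired angles, deep nulls at the interferers
```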


LCMV Beamformer
Criterion and Solution

LCMV Minimization: Graphical Interpretation (Frost, 1972)

(Figure: contours of $w^H(l,k)\,\Phi_{zz}(l,k)\,w(l,k)$ in the $(w_1, w_2)$ plane; the solution $w(l,k)$ lies on the constraint plane $C^H(l,k)\,w(l,k) = g(l,k)$, on which $w_0(l,k) = C(l,k)\left(C^H(l,k)\,C(l,k)\right)^{-1} g(l,k)$ is marked.)


LCMV Beamformer
The GSC Implementation

The Generalized Sidelobe Canceller Implementation

Split the beamformer:
$$w = w_0 - w_n.$$
Constraints subspace: $w_0 \in \mathrm{Span}\{C\}$.
Null subspace: $w_n \in \mathcal{N}\{C\}$, with $w_n \triangleq Bq$, so that $w = w_0 - Bq$.
$B$: $M \times (M - P)$ matrix spanning the null subspace.
$q$: vector of $M - P$ filters.

(Figure: old and new weight vectors in the $(w_1, w_2)$ plane; both lie on the constraint plane and differ only by a component in the null subspace around $w_0$.)


LCMV Beamformer
The GSC Implementation

The GSC Structure (Griffiths, Jim, 1982)

(Block diagram: Input → FBF → output summation; Input → BM → ANC, whose output is subtracted at the summation to form the Output.)

GSC Blocks
Fixed beamformer (FBF) - satisfies the constraints.
Blocking matrix (BM) - generates M − P unconstrained signals.
Adaptive noise canceller (ANC) - adaptively suppresses the residual noise.
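The three blocks can be condensed into a few lines. The sketch below is my own (not the talk's implementation) and runs one GSC update for a single frequency bin, assuming a known constraint pair $(C, g)$ and an NLMS-type ANC update:

```python
import numpy as np

def gsc_step(z, C, g, q, mu=0.1):
    """One GSC update for a single frequency bin.
    z: (M,) microphone snapshot, C: (M, P) constraints, g: (P,) response vector,
    q: (M-P,) complex ANC filters, updated in place (NLMS)."""
    M, P = C.shape
    w0 = C @ np.linalg.solve(C.conj().T @ C, g)        # FBF: satisfies C^H w0 = g
    B = np.linalg.svd(C)[0][:, P:]                     # BM: orthonormal basis of the null space of C^H
    y_fbf = w0.conj() @ z                              # fixed-beamformer output
    u = B.conj().T @ z                                 # M - P unconstrained noise references
    y = y_fbf - q.conj() @ u                           # GSC output
    q += mu * u * np.conj(y) / (np.real(u.conj() @ u) + 1e-12)   # NLMS update of the ANC filters
    return y, q

# Toy usage: M = 4 mics, a single distortionless constraint (MVDR-style GSC).
rng = np.random.default_rng(0)
h_d = rng.standard_normal(4) + 1j * rng.standard_normal(4)
q = np.zeros(3, dtype=complex)
for _ in range(200):                                   # noise-only snapshots adapt the ANC
    z = rng.standard_normal(4) + 1j * rng.standard_normal(4)
    y, q = gsc_step(z, h_d[:, None], np.array([1.0 + 0j]), q)
```

In the TF-GSC discussed later, the constraint matrix is built from estimated RTFs rather than from free-field steering vectors.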


MVDR
Criterion, Solution, GSC

The Minimum Variance Distortionless Response (MVDR) Beamformer: Single Constraint

Scenario: one desired signal.

Beamformer design:
"Steer a beam" towards the desired source (P = 1: one constraint).
Minimize the power from all other directions.

$$C = h^d, \qquad g = 1$$


MVDR
Criterion, Solution, GSC

The MVDR Beamformer: GSC Implementation
Affes, Grenier, 1997; Hoshuyama et al., 1999; Gannot et al., 2000

GSC blocks:
FBF - matched filter to the desired response, $h^d$.
BM - projection matrix to the null space of $h^d$.
ANC - recursively updated using the LMS algorithm (Shynk, 1992).

Output signal:
$$y = s^d + \text{residual noise and interference signals}$$


The Transfer Function GSC
Relative Transfer Function

The Transfer Function GSC: Relax the Dereverberation Requirement (Gannot et al., 2001)

Modified constraint set:
$$C = h^d, \qquad \tilde{g} = (h_1^d)^*$$

Equivalent to:
$$C = \tilde{h}^d \triangleq \frac{h^d}{h_1^d} = \left[ 1 \;\; \frac{h_2^d}{h_1^d} \;\; \cdots \;\; \frac{h_M^d}{h_1^d} \right]^T, \qquad g = 1.$$


The Transfer Function GSC
Relative Transfer Function

The Transfer Function GSC Utilizing the RTF

The RTF suffices for implementing all blocks of the GSC.

Output signal:
$$y = h_1^d s^d + \text{residual noise and interference signals}$$

Tradeoff: noise reduction is sacrificed if dereverberation is required (Habets et al., 2010).


The Transfer Function GSC
RTF Estimation

The Importance of the RTF

Advantages
Shorter than the ATF.
Estimation methods are available:
  Using speech non-stationarity (Gannot et al., 2001).
  Using speech probability and spectral subtraction (Cohen, 2004).
The RTF is equivalent to the Interaural Transfer Function (ITF).

Drawbacks
Non-causal.

CTF-GSC
For high $T_{60}$, multiplication in the frequency domain is only valid for very long frames. Hence, the RTFs will be extended to a convolution in the STFT domain (CTF-GSC) (Talmon et al., 2009).


Multi-Constraint GSC
Applications

Multi-Constraint Beamformer
Based on Multichannel Eigenspace Beamforming (Markovich et al., 2009; 2010)

Applications:
Conference-call scenario with multiple participants.
Hands-free cellular phone conversation in a car environment with several passengers.
Cocktail-party scenario, in which the desired conversation blends with many simultaneous conversations.

Problem formulation (reminder):
$$z = H^d s^d + H^i s^i + H^n s^n + n = Hs + n.$$


Multi-Constraint GSC
Derivation

The Constraints Set

Original:
$$C \triangleq H = \left[ H^d \;\; H^i \;\; H^n \right], \qquad g \triangleq \Big[ \underbrace{1 \;\cdots\; 1}_{P^d} \;\; \underbrace{0 \;\cdots\; 0}_{P^i + P^n} \Big]^T$$

LCMV output:
$$y = \sum_{j=1}^{P^d} s_j^d + \text{noise components}$$


Multi-Constraint GSC
Derivation

A Modified Constraints Set

Noise & interference: $H^i, H^n \Rightarrow$ orthonormal basis $Q$.

Relax the dereverberation requirements using RTFs:
$$\tilde{g} \triangleq \Big[ (h_{11}^d)^* \;\cdots\; (h_{P^d 1}^d)^* \;\; \underbrace{0 \;\cdots\; 0}_{P^i + P^n} \Big]^T \;\Rightarrow\; \tilde{h}_j^d \triangleq h_j^d / h_{j1}^d$$

The modified constraints set (estimated by applying subspace methods):
$$\tilde{C} \triangleq \left[ \tilde{H}^d \;\; Q \right]$$

LCMV output:
$$y = \sum_{j=1}^{P^d} h_{j1}^d s_j^d + \text{noise components}$$


Multi-Constraint GSC
Experimental Study

Single Desired Speaker: Directional Noise Field

(a) Noisy signal at mic. #1  (b) Enhanced signal

Figure: Female (desired) and male (interference) speakers with directional noise. 8 microphones, recorded at the BIU acoustic lab set to T60 = 300 ms.


Multi-Constraint GSC
Experimental Study

Multi-Speaker

(a) Noisy signal at mic. #1  (b) Enhanced signal

Figure: 1 desired source and 3 competing speakers. 8 microphones, recorded at the BIU acoustic lab set to T60 = 300 ms.

Approximately 20 dB SIR and SNR improvement.


Distributed Microphone Arrays
Motivation

Distributed Algorithms for Microphone Arrays


Distributed Microphone Arrays
Motivation

Condensed Microphone Array


Distributed Microphone Arrays
Motivation

Distributed Microphone Array


Distributed Microphone Arrays
Motivation

Centralized vs. Distributed Beamforming

Condensed
Simpler scheme.
Easy calibration.
Multitude of available solutions.

Distributed
Microphones can be placed randomly, avoiding tedious calibration.
Using more microphones improves spatial resolution.
High probability of finding microphones close to a relevant sound source.
Improved sound field sampling.


Distributed Microphone Arrays
Motivation

Challenges of Distributed Beamforming

Power: communication bandwidth; computational complexity.
Communication: connectivity; protocol; capacity.
Arbitrary constellation: ad-hoc; dynamics.
Signal processing: partial data; synchronization; dynamics; coherence.


Distributed Microphone Arrays
Motivation

Goals
Develop a beamforming algorithm suitable for distributed microphone arrays.
Analyze the performance with randomly located microphones.


Distributed Microphone Arrays
Distributed GSC

Distributed LCMV

Formulation
$N$ nodes with $M_n$ microphones; $\sum_{n=1}^{N} M_n = M$.
$$z(l,k) \triangleq \left[ z_1^T(l,k) \;\cdots\; z_N^T(l,k) \right]^T.$$
The closed-form LCMV necessitates the inversion of $\Phi_{zz}$ - a cumbersome task in distributed networks.

Non-Optimal GSC Implementation
Sum of local BFs: $y(l,k) = \sum_{n=1}^{N} y_n(l,k)$.
Implement a local GSC at each node:
  $M_n - P$ outputs of the BM at the $n$th node (might go negative!).
  Total number of BM outputs: $\sum_{n=1}^{N} (M_n - P) = M - NP$.
  $M - NP < M - P \Rightarrow$ degrees of freedom (DoF) lost $\Rightarrow$ performance degradation.
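As a concrete illustration (borrowing the wideband setup described later, N = 4 nodes with M_n = 2 microphones each and M = 8, and taking P = 2 constrained sources purely for the sake of the example): the naive scheme leaves $\sum_n (M_n - P) = 0$ noise references, whereas the centralized GSC would have $M - P = 6$; the distributed GSC presented next recovers all 6.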


Distributed Microphone Arrays
Distributed GSC

Distributed GSC
Markovich-Golan, Gannot, Cohen, 2013

Overview
Introduce P shared signals:
  Broadcast by a subset of the nodes.
  Retrieving degrees of freedom.
Extended inputs at each node: local microphones plus shared signals.
Purely local FBFs.
Block-diagonal BM facilitates local BMs.
Purely local ANC.

(Network illustration: every node broadcasts its local BF output; a subset of the nodes also broadcast shared signals.)

Total of N + P broadcast channels.


Distributed Microphone Arrays
Distributed GSC

Nodes Connectivity

Sources "Owned" by the nth Node
A node n that receives the pth source with the highest SNR is its "owner".
The shared signals broadcast by the nth node: $r_n = D_n^H z_n$.
$D_n$: an $M_n \times P_n$ selection matrix.
A shared signal is responsible for only one source.
Shared signals serve as a reference for RTF estimation in each node.

Extended Inputs at the nth Node
$P - P_n$ shared signals (excluding self-owned signals): $\dot{r}_n$.
Total number of signals: $\bar{M}_n = M_n + P - P_n$.
Signals: $\bar{z}_n = \left[ z_n^T \;\; \dot{r}_n^T \right]^T$.
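To make the bookkeeping concrete, a small sketch of my own (with hypothetical sizes M_n = 2, P = 2, P_n = 1; the selected microphone and the received shared signals are placeholders) of the shared-signal selection and the extended input at one node:

```python
import numpy as np

M_n, P, P_n = 2, 2, 1                      # local mics, constrained sources, sources owned by node n
rng = np.random.default_rng(0)
z_n = rng.standard_normal(M_n) + 1j * rng.standard_normal(M_n)   # local microphone snapshot (one TF bin)

# Selection matrix D_n picks the P_n local microphones that broadcast shared signals.
D_n = np.zeros((M_n, P_n))
D_n[0, 0] = 1.0                            # e.g. microphone 1 has the highest SNR for the owned source
r_n = D_n.conj().T @ z_n                   # shared signals broadcast by node n, (P_n,)

# Shared signals received from the other nodes (P - P_n of them); placeholder values here.
r_dot_n = rng.standard_normal(P - P_n) + 1j * rng.standard_normal(P - P_n)

z_bar_n = np.concatenate([z_n, r_dot_n])   # extended input of length M_bar_n = M_n + P - P_n
assert z_bar_n.shape[0] == M_n + P - P_n
```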


Distributed Microphone Arrays
Distributed GSC

DGSC at the nth Node: High-Level Block-Diagram

(Block diagram: an owner-selection block broadcasts the node's shared signals; the shared signals received from the other nodes are appended to the local microphones; a local GSC beamformer produces the local output, and the local outputs of all nodes are summed to form the global output.)

Local & Global BF
An $\bar{M}_n \times 1$ local BF at the nth node: $\bar{w}_n$.
Outputs of the local BFs: $\bar{y}_n(l) = \bar{w}_n^H \bar{z}_n(l)$.
Global BF: $\bar{w} \triangleq \left[ \bar{w}_1^T \;\cdots\; \bar{w}_N^T \right]^T$.
Global output (available at each node): $\bar{y}(l) = \sum_{n=1}^{N} \bar{y}_n(l)$.


Distributed Microphone Arrays
Distributed GSC

Blocks of the DGSC at the nth Node

Fixed Beamformer (Local)
$\hat{H}_n$: the RTFs relating the extended inputs and the shared signals.
$$\bar{q}_n \triangleq \frac{1}{N}\, \hat{H}_n \left( \hat{H}_n^H \hat{H}_n \right)^{-1} g.$$
The constraint $H^H \bar{q}_n = g$ is satisfied.

Blocking Matrix (Block-Diagonal)
$\bar{B}_n$: an $\bar{M}_n \times (\bar{M}_n - P)$ BM.
Noise references: $\bar{u}_n(l) = \bar{B}_n^H \bar{z}_n(l)$.
$$\sum_{n=1}^{N} (\bar{M}_n - P) = \sum_{n=1}^{N} (M_n - P_n) = M - P \;\Rightarrow\; \text{DoF fully utilized.}$$

Adaptive Noise Canceller (Local)
Least mean squares: $\bar{f}_n(l) = \bar{f}_n(l-1) + \mu\, \dfrac{\bar{u}_n(l)\,\bar{y}^*(l)}{\bar{P}_{u,n}(l)}$, with power normalization $\bar{P}_{u,n}(l)$.


Distributed Microphone Arrays
Distributed GSC

DGSC at the nth Node: Low-Level Block-Diagram

(Block diagram of node n in an N-node WASN: the local microphones $z_n(l)$ ($M_n \times 1$) and the shared signals $\dot{r}_n(l)$ received over the wireless link form the extended input $\bar{z}_n(l)$ ($\bar{M}_n \times 1$); the FBF $\bar{q}_n$ produces $\bar{y}_n^{\mathrm{FBF}}(l)$, the BM $\bar{B}_n$ produces the noise references $\bar{u}_n(l)$ which the ANC $\bar{f}_n$ filters and subtracts to give the local output $\bar{y}_n(l)$; the selection matrix $D_n$ extracts the shared signals $r_n(l)$ ($P_n \times 1$) for broadcast; the local outputs $\bar{y}_1(l), \ldots, \bar{y}_N(l)$ are summed over the wireless link into the global output $\bar{y}(l)$.)

$$\bar{w}_n \triangleq \bar{q}_n - \bar{B}_n \bar{f}_n$$


Distributed Microphone Arrays
Experimental study

Wideband: Scenario

4 m × 4 m × 3 m room.
Reverberation time T60 = 300 ms.
N = 4 nodes.
M_n = 2 microphones per node.
Sampling frequency 8 kHz.
STFT window 4096 points, 75% overlap.
Desired speaker, 60 dB SNR.
Competing speaker, 60 dB INR.
2 point-source Gaussian noises, 47 dB INR.
Sensor noise.
90 Monte-Carlo experiments (sources' positions).

(Room-layout figure: positions of the microphones, the desired speaker, the competing speaker and the interference sources.)
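For orientation, these STFT settings correspond to a 4096/8000 = 512 ms analysis window and, with 75% overlap, a hop of 1024 samples (128 ms) between frames.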


Distributed Microphone Arrays
Experimental study

Wideband: results (1)

(Figure: SNR improvement [dB] in the various Monte-Carlo experiments for the centralized GSC, the DGSC and a single-node GSC.)


Distributed Microphone Arrays
Experimental study

Wideband: results (2)

(Figure: SNR improvement [dB] versus time [sec]; convergence of the tested algorithms - centralized GSC, DGSC and single-node GSC.)


Distributed Microphone Arrays
Experimental study

Wideband: results (3)

(a) Noisy at mic. #1  (b) Single-node GSC  (c) Condensed GSC  (d) Distributed GSC


Statistical Beamformer
WASNs with Random Node Deployment

Scenarios
Ad-hoc sensor networks.
Large volume (and many nodes).
High fault percentage.
Arbitrary deployment of nodes.

Questions
How many nodes are required?
What is the expected performance?


Statistical Beamformer
Random Beampattern using a WASN

Scenario
Desired speaker.
Coherent or diffuse noise fields.
Microphones are randomly positioned.
Reverberant enclosure.

Derivation
ATFs are complex random variables.
Derive a model for the ATFs (based on the work of Schröder).
Consider two BF performance criteria: SNR and white-noise gain.
These criteria become random variables.
Analyze their statistics.
Derive reliability functions: the probability that the criteria exceed a pre-defined level.


Statistical Beamformer
Reliability

(Figure: reliability functions $R_{\kappa,\mathrm{dif}}$ and $R_{\kappa,\mathrm{c}}$ versus the required $\kappa$ improvement [dB], empirical and theoretical curves for M = 5, 10, 15, 20, 25 microphones.
(a) Diffuse noise, SIR = 30 dB, residual interference dominant.
(b) Coherent noise, SIR = 0 dB, residual noise dominant.)

$\mathrm{SINR}_{\mathrm{out}} - \mathrm{SNR}_{\mathrm{in}}$, $T_{60} = 0.4$ s, room dimensions 4 × 4 × 3 m.


Discussion
Discussion

Advantages of the Proposed Family of Algorithms
1 LCMV beamformers: low distortion and high interference cancellation.
2 Practical estimation and tracking procedures.
3 Extendable to binaural algorithms for hearing aids (Hadad, Gannot, Doclo, 2012).
4 The RTF allows for:
  1 Shorter processing frames.
  2 Higher noise reduction (traded off against dereverberation).
  3 Preservation of binaural cues.
5 Distributed versions (see also Bertrand, Moonen, 2011).

There is still a lot to do...
1 Arbitrary activity patterns.
2 Improved tracking of moving speakers and sensors.
3 Robustness.


Discussion
Thanks to my Collaborators

1 Shmulik Markovich-Golan
2 Israel Cohen
3 Ronen Talmon
4 David Levin
5 Emanuël Habets
6 many more...


Acoustic Lab

Thank you very much!


Introduction
Overview of Spatial Noise Reduction Techniques

1 Fixed beamforming: combine the microphone signals using a time-invariant filter-and-sum operation (data-independent). Jan, Flanagan, 1996; Doclo, Moonen, 2003; Schäfer, Heese, Wernerus, Vary, 2012
2 Adaptive beamforming: combine the spatial focusing of fixed beamformers with adaptive suppression of (spectrally and spatially time-varying) background noise. General reading: Cox et al., 1987; Van Veen, Buckley, 1988; Van Trees, 2002
3 Blind source separation (BSS): considers the received microphone signals as a mixture of all sound sources filtered by the RIRs; utilizes independent component analysis (ICA) techniques. Makino, 2003
4 Computational auditory scene analysis (CASA): aims at performing sound segregation by modelling human auditory perceptual processing. Brown, Wang, 2005


The Transfer Function GSC
RTF Estimation

Relative Transfer Function Estimation

System perspective:
$$z_m = \tilde{h}_m^d z_1 + u_m$$

System identification:
$$\hat{\Phi}_{z_m z_1} = \tilde{h}_m^d\, \hat{\Phi}_{z_1 z_1} + \Phi_{u_m z_1} + \varepsilon_m$$

Estimation is biased: $u_m$ and $z_1$ are correlated $\Rightarrow$ biased estimator for $\tilde{h}_m^d$.


The Transfer Function GSC
RTF Estimation

Relative Transfer Function Estimation
Based on Speech Non-Stationarity (Shalvi, Weinstein, 1996; Gannot et al., 2001)

Assumptions:
The system is time-invariant.
The noise is stationary (true for $\Phi_{vv}(k)$).
The speech is non-stationary (use frames $l_i$, $i = 1, \ldots, I$).

$$
\begin{bmatrix}
\hat{\Phi}_{z_m z_1}(l_1,k) \\
\hat{\Phi}_{z_m z_1}(l_2,k) \\
\vdots \\
\hat{\Phi}_{z_m z_1}(l_I,k)
\end{bmatrix}
=
\begin{bmatrix}
\hat{\Phi}_{z_1 z_1}(l_1,k) & 1 \\
\hat{\Phi}_{z_1 z_1}(l_2,k) & 1 \\
\vdots & \vdots \\
\hat{\Phi}_{z_1 z_1}(l_I,k) & 1
\end{bmatrix}
\begin{bmatrix}
\tilde{h}_m^d(k) \\
\Phi_{u_m z_1}(k)
\end{bmatrix}
+
\begin{bmatrix}
\varepsilon_m(l_1,k) \\
\varepsilon_m(l_2,k) \\
\vdots \\
\varepsilon_m(l_I,k)
\end{bmatrix}
$$
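A minimal sketch (my own, for a single frequency bin) of the least-squares solution implied by the overdetermined system above, given per-frame cross- and auto-PSD estimates:

```python
import numpy as np

def estimate_rtf_nonstationarity(phi_zm_z1, phi_z1_z1):
    """Least-squares fit of [RTF, noise cross-PSD] from I frame-wise PSD estimates.
    phi_zm_z1, phi_z1_z1: complex arrays of length I (cross- and auto-PSD per frame)."""
    I = len(phi_z1_z1)
    A = np.column_stack([phi_z1_z1, np.ones(I, dtype=complex)])   # I x 2 regressor matrix
    theta, *_ = np.linalg.lstsq(A, phi_zm_z1, rcond=None)
    rtf, noise_cross_psd = theta
    return rtf, noise_cross_psd

# Toy check with a known RTF of 0.8 - 0.3j and a constant noise cross-PSD term.
rng = np.random.default_rng(1)
phi_11 = rng.uniform(1.0, 5.0, size=20).astype(complex)           # speech power varies over frames
phi_m1 = (0.8 - 0.3j) * phi_11 + (0.2 + 0.1j) + 0.01 * rng.standard_normal(20)
print(estimate_rtf_nonstationarity(phi_m1, phi_11))               # approximately (0.8-0.3j, 0.2+0.1j)
```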


CTF vs. MTF
Motivation

The Convolutive TF-GSC (Talmon et al., 2009): Motivation

The TF-GSC (Gannot et al., 2001)
The RTFs are incorporated into the MVDR beamformer, implemented in the GSC structure.
The adaptation to reverberant environments is obtained by a time-frequency domain implementation.
The TF-GSC achieves improved enhancement of a noisy speech signal (compared with the time-domain, delay-only GSC).

High T60
The RIRs become very long.
The MTF approximation is only valid if the time frames are significantly longer than the relative RIR.
In practice, short frames are used, resulting in an inaccurate representation of the RTF and degraded performance.


CTF vs. MTF
Motivation

Objectives: In the STFT Domain

Formulate the problem using a system representation in the STFT domain (Avargel and Cohen, 2007).
Build a GSC scheme (a TF-GSC extension).
Suggest practical solutions using approximations; specifically, show solutions under the MTF and CTF approximations.
Incorporate the RTF identification based on the CTF model (Talmon et al., 2009).
Significant improvements in blocking ability, NR, distortion and SNR in comparison with the TF-GSC.


CTF vs. MTF
Simulation Results

Experimental Study: Setup

Comparing the CTF-GSC and the TF-GSC.
Image method (Allen and Berkley, 1979; implemented by Habets, 2009).
Array of 5 microphones.
Reverberation time T60 = 0.5 s.

TF-GSC:
  Frame length - N = 512.
  RTF length - 250.
  Noise canceller length - 450.
CTF-GSC:
  In FBF and BM - N = 512, 50% overlap.
  In adaptive NC - N = 512, 75% overlap.


CTF vs. MTF
Simulation Results

Signal Blocking

The signal blocking factor (SBF) is defined by

$$\mathrm{SBF} = 10 \log_{10} \frac{E\left\{ \left(s^d * h_1^d[n]\right)^2 \right\}}{\mathrm{mean}_m\, E\left\{ u_m^2[n] \right\}}$$

where $u_m[n]$, $m = 2, \ldots, M$, are the blocking matrix outputs.

The blocking ability [dB] (known RTF):

       mic. 2   mic. 3   mic. 4   mic. 5
TF       17       12       14        8
CTF      22       17       18       13
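As a sanity check, a small sketch (mine, not from the slides) of how the SBF above could be evaluated from the desired-source image at the reference microphone and the blocking-matrix outputs:

```python
import numpy as np

def signal_blocking_factor(desired_at_ref, bm_outputs):
    """SBF = 10 log10( E{(s_d * h_1^d)[n]^2} / mean_m E{u_m[n]^2} ).
    desired_at_ref: desired-source image at microphone 1 (1-D array).
    bm_outputs: list of blocking-matrix output signals u_m[n], m = 2..M."""
    num = np.mean(np.asarray(desired_at_ref) ** 2)
    den = np.mean([np.mean(np.asarray(u) ** 2) for u in bm_outputs])
    return 10.0 * np.log10(num / den)

# Toy usage: a perfect BM would drive the outputs (and the denominator) towards zero.
ref = np.sin(2 * np.pi * 0.01 * np.arange(8000))
leak = [0.05 * ref + 0.001 * np.random.randn(8000) for _ in range(4)]
print(round(signal_blocking_factor(ref, leak), 1))   # roughly 26 dB for 5% leakage
```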


CTF vs. MTF
Simulation Results

Known RTF, Input SNR = 0 dB

(Waveform plots, amplitude versus time [sec]:)
(c) Reverberated speech at microphone #1.  (d) Noisy signal at microphone #1.


CTF vs. MTF
Simulation Results

Known RTF, Input SNR = 0 dB (cont.)

(Waveform plots, amplitude versus time [sec]:)
(e) TF-GSC output.  (f) CTF-GSC output.


CTF vs. MTF
Simulation Results

Summary

Output SNR and noise reduction [dB] for a known RTF:

In SNR   SNR (TF-GSC)   SNR (CTF-GSC)   NR (TF-GSC)   NR (CTF-GSC)
 -5          3.8             8.5           -5.9          -10.9
 -2.5        5.2            10.0           -6.2          -10.9
  0          6.2            11.2           -6.2          -10.9
  2.5        7.0            12.0           -6.7          -10.9
  5          7.9            12.5           -6.1          -10.9

(Figure: output SNR [dB] versus input SNR [dB] for the TF-GSC and the CTF-GSC.)


Multi-Constraint GSC
Parameter Estimation

Estimation

Unknowns
Desired sources' RTFs, $\tilde{H}^d$.
Interferences' subspace basis, $Q$.

Limiting Assumptions
The RIRs are time-invariant.
A segment without interfering speakers is available for each desired source (double-talk among the desired sources is allowed).
A segment without desired speakers is available for each interfering source (double-talk among the interfering sources is allowed).


Multi-Constraint GSC
Parameter Estimation

Interferences Subspace Estimation: Step 1

EVD and Pruning
Estimate the signals subspace at each time segment without any desired sources active:
$$\hat{\Phi}_{zz}^i = \bar{Q}_i \Lambda_i \bar{Q}_i^H$$
All eigenvectors corresponding to "weak" eigenvalues are discarded.

(Figure: eigenvalues per frequency [dB] versus frequency index.)


Multi-Constraint GSC
Parameter Estimation

Interferences Subspace Estimation: Step 2

Union of Estimates using Gram-Schmidt
Apply the QRD:
$$\left[ \bar{Q}_1 \bar{\Lambda}_1^{1/2} \;\cdots\; \bar{Q}_{N_{\mathrm{seg}}} \bar{\Lambda}_{N_{\mathrm{seg}}}^{1/2} \right] \triangleq P = QR$$
Discard vectors from the basis $Q$ that correspond to "weak" coefficients in $R$.
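A condensed sketch (my own, with arbitrary pruning thresholds) of the two-step interference-subspace estimation for a single frequency bin: per-segment EVD with eigenvalue pruning, followed by a QR-based union of the weighted bases:

```python
import numpy as np

def interference_subspace(seg_covs, eig_thresh_db=20.0, qr_thresh_db=20.0):
    """Estimate an orthonormal interference basis Q from per-segment covariance estimates.
    seg_covs: list of (M, M) Hermitian covariances from segments without desired sources."""
    weighted_bases = []
    for Phi in seg_covs:
        lam, V = np.linalg.eigh(Phi)                             # ascending eigenvalues
        keep = lam > lam.max() * 10 ** (-eig_thresh_db / 10)     # prune "weak" eigenvalues
        weighted_bases.append(V[:, keep] * np.sqrt(lam[keep]))   # Q_i * Lambda_i^{1/2}
    P_mat = np.concatenate(weighted_bases, axis=1)               # stack the weighted bases
    Q, R = np.linalg.qr(P_mat)
    diag = np.abs(np.diag(R))
    keep = diag > diag.max() * 10 ** (-qr_thresh_db / 20)        # prune "weak" coefficients in R
    return Q[:, keep]                                            # orthonormal interference basis

# Toy usage: two segments, each dominated by one of two directional interferers (M = 4 mics).
rng = np.random.default_rng(2)
a1, a2 = (rng.standard_normal(4) + 1j * rng.standard_normal(4) for _ in range(2))
covs = [100 * np.outer(a1, a1.conj()) + 0.01 * np.eye(4),
        100 * np.outer(a2, a2.conj()) + 0.01 * np.eye(4)]
print(interference_subspace(covs).shape)                         # (4, 2): one basis vector per interferer
```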


Multi-Constraint GSC
Parameter Estimation

EVD per Frame - Graphical Interpretation
(Illustrations: the strong eigenvectors of frames 1, 2 and 3.)


Multi-Constraint GSC
Parameter Estimation

QRD Calculation - Graphical Interpretation
QRD Pruning - Graphical Interpretation
(Illustrations.)


Multi-Constraint GSC
Parameter Estimation

Desired Sources RTF Estimation (for a Single Desired Source)

PSD Estimation
Estimate the spatial PSD of the stationary sources (during noise-only segments).
Estimate the spatial PSD of one desired source + stationary sources.

Largest Generalized Eigenvector
$f = \mathrm{GEVD}(\text{"Desired+Noise PSD"}, \text{"Noise PSD"})$.
Rotate and normalize $f$ to obtain the RTF $\hat{\tilde{h}}_j^d$.
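A possible realization (my own sketch; the "rotate and normalize" step here follows the common covariance-whitening convention of mapping the principal generalized eigenvector back through the noise PSD and normalizing by the reference microphone, which may differ in detail from the talk's implementation):

```python
import numpy as np
from scipy.linalg import eigh

def rtf_from_gevd(Phi_desired_plus_noise, Phi_noise):
    """Estimate the RTF of a single desired source from spatial PSD matrices.
    Takes the principal generalized eigenvector f of (Phi_x, Phi_v), maps it back
    through Phi_v and normalizes by the reference (first) microphone."""
    w, V = eigh(Phi_desired_plus_noise, Phi_noise)    # generalized EVD, ascending eigenvalues
    f = V[:, -1]                                       # largest generalized eigenvector
    h = Phi_noise @ f                                  # proportional to the desired source's ATF
    return h / h[0]                                    # RTF: unit response at the reference mic

# Toy check: with Phi_x = sigma^2 h h^H + Phi_v, the true RTF h / h[0] is recovered.
rng = np.random.default_rng(3)
h = rng.standard_normal(4) + 1j * rng.standard_normal(4)
Phi_v = np.eye(4, dtype=complex) * 0.5
Phi_x = 2.0 * np.outer(h, h.conj()) + Phi_v
print(np.allclose(rtf_from_gevd(Phi_x, Phi_v), h / h[0]))   # True
```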


Binaural LCMV
Experimental Study

Setup

Hearing device:
  2 hearing-aid devices mounted on a B&K HATS, with 2 microphones, 2 cm apart.
  A 9 × 5 utility device with 4 mics. at the corners, average distance 3.5 cm; the device is placed on a table at a distance of 0.5 m.
Signals:
  1 desired speaker, θ_d = 30°, 1 m (constrained).
  1 interfering speaker at θ_i = −70°, 1 m (constrained).
  1 directional stationary noise, θ_n = −40°, 2.5 m (unconstrained).
  SIR = 0 dB, SNR = 14 dB.
Acoustic lab:
  Dimensions 6 × 6 × 2.4 m; controllable reverberation time, T60 = 0.3 s.
STFT:
  Sampling frequency 8 kHz, 4096 points, 75% overlap.
Algorithm cue gain factors:
  Desired speech - η = 1.
  Interfering speech - µ = 0.1 (20 dB attenuation).
