PDF, 16,4 MB - Quality and Usability Lab - TU Berlin
Multi-Microphone Speech Enhancement
Centralized and Distributed Beamformers

Sharon Gannot
Faculty of Engineering, Bar-Ilan University, Israel
TU Berlin, 7.1.2013, Berlin, Germany

S. Gannot (BIU), Multi-Microphone Speech Enhancement, TU Berlin, 2013

Introduction
Speech Enhancement and Source Extraction Utilizing Microphone Arrays

Applications
1. Hands-free communications.
2. Teleconferencing.
3. Skype calls.
4. Hearing aids.
5. Eavesdropping.

Why is it a Difficult Task?
1. Speech is a non-stationary signal with a high dynamic range.
2. Very long acoustic path (reverberation).
3. Time-varying acoustic path.
4. Microphone arrays impose a high computational burden.
5. Large (and distributed) arrays require large communication bandwidth.

Introduction
Room Acoustics Essentials

Reverberation
- Late reflections tend to be diffuse, hence performance degradation.
- Beamforming: higher processing burden and high latency in an STFT implementation.
- Deteriorates intelligibility, quality and ASR performance.

Sound Fields
- Directional: a room impulse response relates source and microphones.
- Uncorrelated: the signals on the microphones are uncorrelated.
- Diffuse: sound is coming from all directions.

Dal-Degan, Prati, 1988; Habets, Gannot, 2007

Introduction
Spatial Processing
Design Criteria Tailored to Speech Applications

1. BSS, CASA, and other methods.
2. Adaptive optimization. Sondhi, Elko, 1986; Kaneda, Ohga, 1986
3. Minimum variance distortionless response (MVDR) and GSC.
   Van Compernolle, 1990; Affes and Grenier, 1997; Nordholm et al., 1999; Hoshuyama et al., 1999; Gannot et al., 2001; Herbordt et al., 2005
4. Minimum mean square error (MMSE): GSVD-based spatial Wiener filter. Doclo, Moonen, 2002
5. Speech distortion weighted multichannel Wiener filter (SDW-MWF). Spriet et al., 2004
6. Maximum signal-to-noise ratio (SNR). Warsitz and Haeb-Umbach, 2005
7. Linearly constrained minimum variance (LCMV). Markovich et al., 2009

Problem Formulation
Extraction of Desired Speaker Signal(s) in a Multi-Interference Reverberant Environment

Problem Formulation
Talk Outline

- Spatial processors ("beamformers") based on the linearly constrained minimum variance (LCMV) criterion.
- Special cases:
  - Enhancing a single desired source (MVDR).
  - Extracting desired source(s) in a multiple-competing-speaker environment.
- Implementation: generalized sidelobe canceller (GSC) in the short-time Fourier transform (STFT) domain.
- The relative transfer function (RTF): importance and estimation procedures.
- Distributed microphone arrays.

Problem Formulation
Problem Formulation in the STFT Domain

Microphone signals (m = 1, ..., M):

z_m(l,k) = Σ_{j=1}^{P^d} s_j^d(l,k) h_{jm}^d(l,k) + Σ_{j=1}^{P^i} s_j^i(l,k) h_{jm}^i(l,k) + Σ_{j=1}^{P^n} s_j^n(l,k) h_{jm}^n(k) + n_m(l,k)

Vector formulation:

z(l,k) = H^d(l,k) s^d(l,k) + H^i(l,k) s^i(l,k) + H^n(k) s^n(l,k) + n(l,k)
       = H(l,k) s(l,k) + n(l,k)

with P = P^d + P^i + P^n ≤ M.

Problem Formulation
Spatial Filters

Filter and combine:

y(l,k) = w^H(l,k) z(l,k)

[Block diagram: each microphone signal z_m(l,k), m = 1, ..., M, is filtered by w_m^*(l,k) and the M branches are summed to give y(l,k).]

Filters: w(l,k) is an M × 1 vector.

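The filter-and-combine operation above can be sketched per time-frequency bin. A minimal NumPy illustration, with made-up shapes and random signals standing in for actual STFT data:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 4            # microphones
L, K = 10, 257   # STFT frames and frequency bins

# Per-bin spatial filter w(k): M x K, multichannel STFT z: M x L x K.
w = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
z = rng.standard_normal((M, L, K)) + 1j * rng.standard_normal((M, L, K))

# Filter-and-sum: y(l, k) = w(k)^H z(l, k), applied per frame and bin.
y = np.einsum('mk,mlk->lk', w.conj(), z)
```

The conjugation of `w` implements the Hermitian transpose in y = w^H z; the single-bin output reduces to an inner product over the M microphones.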
LCMV Beamformer
Criterion and Solution
The Linearly Constrained Minimum Variance Beamformer
Er, Cantoni, 1983; Van Veen, Buckley, 1988

LCMV criterion: let Φ_zz(l,k) be the M × M received-signal correlation matrix. Minimize the output power

w^H(l,k) Φ_zz(l,k) w(l,k),

such that the linear constraint set is satisfied:

C^H(l,k) w(l,k) = g(l,k),

where C is the M × P constraint matrix and g the P × 1 response vector.

Closed-form solution (which we hardly ever use in practice):

w = Φ_zz^{-1} C (C^H Φ_zz^{-1} C)^{-1} g

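The closed-form solution can be sketched numerically for a single frequency bin. The correlation matrix, constraint matrix and response vector below are simulated stand-ins, not data from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
M, P = 6, 2  # microphones, constraints

# Hermitian positive-definite correlation matrix Phi_zz (simulated).
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi = A @ A.conj().T + M * np.eye(M)

C = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))  # constraint matrix
g = np.array([1.0, 0.0])                                            # response vector

# w = Phi^{-1} C (C^H Phi^{-1} C)^{-1} g, using solves instead of inverses.
PhiC = np.linalg.solve(Phi, C)
w = PhiC @ np.linalg.solve(C.conj().T @ PhiC, g)
```

By construction the result satisfies the constraint set C^H w = g exactly, which is a handy sanity check for any implementation.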
LCMV Beamformer
Criterion and Solution
LCMV Beamformer: Beampattern, Illustrative Example

[Polar beampattern plot, 0°–360°, 0 dB to −40 dB scale.]

10-microphone uniform linear array; 2 desired sources (green) and 2 interfering sources (red).

LCMV Beamformer
Criterion and Solution
LCMV Minimization: Graphical Interpretation Frost, 1972

[Figure: contours of w^H(l,k) Φ_zz(l,k) w(l,k) in the (w_1, w_2) plane; the constraint plane C^H(l,k) w(l,k) = g(l,k) contains the quiescent beamformer w_0(l,k) = C(l,k) (C^H(l,k) C(l,k))^{-1} g(l,k).]

LCMV Beamformer
The GSC Implementation
The Generalized Sidelobe Canceller Implementation

Split the beamformer: w = w_0 − w_n.
- Constraints subspace: w_0 ∈ Span{C}.
- Null subspace: w_n ∈ N{C}, with w_n = B q.
  - B: an M × (M − P) matrix spanning the null subspace.
  - q: a vector of M − P filters.

[Figure: the constraint plane and constraint subspace in the (w_1, w_2) plane, with w_0 and the adaptation from w_old to w_new.]

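The split above can be illustrated directly: build the quiescent component w_0 in the column span of C and a blocking matrix B from the null space of C^H, so that any choice of the unconstrained filters q leaves the constraints intact. All matrices here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
M, P = 6, 2
C = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))
g = rng.standard_normal(P)

# Constraints-subspace component: w0 = C (C^H C)^{-1} g.
w0 = C @ np.linalg.solve(C.conj().T @ C, g)

# Blocking matrix: the last M - P right singular vectors of C^H
# span its null space, so C^H B = 0.
_, _, Vh = np.linalg.svd(C.conj().T)
B = Vh[P:].conj().T

q = rng.standard_normal(M - P)   # unconstrained ANC filters
w = w0 - B @ q                   # any such w satisfies C^H w = g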
LCMV Beamformer
The GSC Implementation
The GSC Structure Griffiths, Jim, 1982

[Block diagram: the input feeds both the FBF and the BM; the ANC output is subtracted from the FBF output to form the output.]

GSC blocks:
- Fixed beamformer (FBF): satisfies the constraints.
- Blocking matrix (BM): generates M − P unconstrained signals.
- Adaptive noise canceller (ANC): adaptively suppresses the residual noise.

MVDR
Criterion, Solution, GSC
The Minimum Variance Distortionless Response Beamformer: Single Constraint

Scenario: one desired signal.

Beamformer design:
- "Steers a beam" towards the desired source (P = 1: one constraint).
- Minimizes all other directions.

C = h^d,  g = 1.

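Substituting C = h^d and g = 1 into the LCMV closed form gives the familiar MVDR weights w = Φ^{-1} h^d / (h^{dH} Φ^{-1} h^d). A sketch with a simulated steering vector and noise correlation matrix (both made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
M = 5

# Simulated desired-source ATF h_d and noise correlation Phi.
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi = A @ A.conj().T + M * np.eye(M)

# MVDR: w = Phi^{-1} h / (h^H Phi^{-1} h).
Phih = np.linalg.solve(Phi, h)
w = Phih / (h.conj() @ Phih)
```

The distortionless property h^{dH} w = 1 holds by construction: the desired component passes undistorted while the output power is minimized.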
MVDR
Criterion, Solution, GSC
The MVDR Beamformer: GSC Implementation
Affes, Grenier, 1997; Hoshuyama et al., 1999; Gannot et al., 2000

GSC blocks:
- FBF: matched filter to the desired response h^d.
- BM: projection matrix onto the null space of h^d.
- ANC: recursively updated using the LMS algorithm (Shynk, 1992).

Output signal: y = s^d + residual noise and interference signals.

The Transfer Function GSC
Relative Transfer Function
The Transfer Function GSC: Relax the Dereverberation Requirement Gannot et al., 2001

Modified constraint set:

C = h^d,  g̃ = (h_1^d)^*

Equivalent to:

C = h̃^d ≜ h^d / h_1^d = [ 1  h_2^d/h_1^d  ...  h_M^d/h_1^d ]^T,  g = 1.

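The RTF vector h̃^d is simply the ATF vector normalized by the reference microphone, so its first entry is always 1. A tiny sketch with a hypothetical ATF at one frequency bin:

```python
import numpy as np

rng = np.random.default_rng(4)
M = 4

# Acoustic transfer functions (ATFs) at one frequency bin,
# microphone 1 taken as the reference.
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# Relative transfer function: h_rtf = h / h_1, so h_rtf[0] == 1.
h_rtf = h / h[0]
```

Multiplying the RTF back by h_1 recovers the ATF, which is why constraining w^H h̃^d = 1 yields the output h_1^d s^d rather than the dereverberated s^d.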
The Transfer Function GSC
Relative Transfer Function
The Transfer Function GSC Utilizing the RTF

The RTF suffices for implementing all blocks of the GSC.

Output signal: y = h_1^d s^d + residual noise and interference signals.

Tradeoff: noise reduction is sacrificed if dereverberation is required. Habets et al., 2010

The Transfer Function GSC
RTF Estimation
The Importance of the RTF

Advantages:
- Shorter than the ATF.
- Estimation methods are available:
  - Using speech non-stationarity (Gannot et al., 2001).
  - Using speech presence probability and spectral subtraction. Cohen, 2004
- The RTF is equivalent to the interaural transfer function (ITF).

Drawbacks:
- Non-causal.

CTF-GSC: for high T_60, multiplication in the frequency domain is only valid for very long frames. Hence, the RTFs are extended to a convolution in the STFT domain (CTF-GSC). Talmon et al., 2009

Multi-Constraint GSC
Applications
Multi-Constraint Beamformer, Based on Multichannel Eigenspace Beamforming Markovich et al., 2009; 2010

Applications:
- Conference-call scenario with multiple participants.
- Hands-free cellular phone conversation in a car environment with several passengers.
- Cocktail-party scenario, in which the desired conversation blends with many simultaneous conversations.

Problem formulation (reminder):

z = H^d s^d + H^i s^i + H^n s^n + n = H s + n.

Multi-Constraint GSC
Derivation
The Constraints Set

Original:

C^H = [ H^d  H^i  H^n ],  g = [ 1 ... 1  0 ... 0 ]^T  (P^d ones, P^i + P^n zeros)

LCMV output:

y = Σ_{j=1}^{P^d} s_j^d + noise components

Multi-Constraint GSC
Derivation
A Modified Constraints Set

Noise & interference: H^i, H^n ⇒ orthonormal basis Q.

Relax the dereverberation requirements using RTFs:

g̃ ≜ [ (h_{11}^d)^* ... (h_{P^d 1}^d)^*  0 ... 0 ]^T  (P^i + P^n zeros),  h̃_j^d ≜ h_j^d / h_{j1}^d

The modified constraints set (estimated by applying subspace methods):

C̃ ≜ [ H̃^d  Q ]

LCMV output:

y = Σ_{j=1}^{P^d} h_{j1}^d s_j^d + noise components

Multi-Constraint GSC
Experimental Study
Single Desired Speaker, Directional Noise Field

Figure: (a) noisy signal at mic. #1; (b) enhanced signal. Female (desired) and male (interference) speakers with directional noise. 8 microphones, recorded at the BIU acoustic lab set to T_60 = 300 ms.

Multi-Constraint GSC
Experimental Study
Multi-Speaker

Figure: (a) noisy signal at mic. #1; (b) enhanced signal. 1 desired source and 3 competing speakers. 8 microphones, recorded at the BIU acoustic lab set to T_60 = 300 ms.

Approximately 20 dB SIR and SNR improvement.

Distributed Microphone Arrays
Motivation
Distributed Algorithms for Microphone Arrays

[Photos: a condensed microphone array vs. a distributed microphone array.]

Distributed Microphone Arrays
Motivation
Centralized vs. Distributed Beamforming

Condensed:
- Simpler scheme.
- Easy calibration.
- Multitude of available solutions.

Distributed:
- Microphones can be placed randomly, avoiding tedious calibration.
- Using more microphones improves spatial resolution.
- High probability of finding microphones close to a relevant sound source.
- Improved sound-field sampling.

Distributed Microphone Arrays
Motivation
Challenges of Distributed Beamforming

- Power: communication bandwidth; computational complexity.
- Communication: connectivity; protocol; capacity.
- Arbitrary constellation: ad hoc; dynamics.
- Signal processing: partial data; synchronization; dynamics; coherence.

Distributed Microphone Arrays
Motivation
Goals

- Develop a beamforming algorithm suitable for distributed microphone arrays.
- Analyze the performance with randomly located microphones.

Distributed Microphone Arrays
Distributed GSC
Distributed LCMV

Formulation:
- N nodes with M_n microphones each; Σ_{n=1}^{N} M_n = M.
- z(l,k) ≜ [ z_1^T(l,k) ... z_N^T(l,k) ]^T.
- The closed-form LCMV necessitates the inversion of Φ_zz, a cumbersome task in distributed networks.

Non-optimal GSC implementation:
- Sum of local BFs: y(l,k) = Σ_{n=1}^{N} y_n(l,k).
- Implement a local GSC at each node: M_n − P outputs of the BM at the nth node (might go negative!).
- Total number of BM outputs: Σ_{n=1}^{N} (M_n − P) = M − N·P.
- M − N·P < M − P ⇒ degrees of freedom (DoF) are lost ⇒ performance degradation.

Distributed Microphone Arrays
Distributed GSC
Distributed GSC Markovich-Golan, Gannot, Cohen, 2013

Overview:
- Introduce P shared signals, broadcast by a subset of the nodes, retrieving the lost degrees of freedom.
- Extended inputs at each node: local microphones plus shared signals.
- Purely local FBFs.
- A block-diagonal BM facilitates local BMs.
- Purely local ANC.

[Diagram: each node broadcasts its local BF output; a subset of the nodes additionally broadcasts the shared signals.]

Total of N + P broadcast channels.

Distributed Microphone Arrays
Distributed GSC
Nodes Connectivity

Sources "owned" by the nth node:
- The node n that receives the pth source with the highest SNR is its "owner".
- The shared signals broadcast by the nth node: r_n = D_n^H z_n, where D_n is an M_n × P_n selection matrix.
- Each shared signal is responsible for only one source.
- Shared signals serve as a reference for RTF estimation in each node.

Extended inputs at the nth node:
- P − P_n shared signals (excluding self-owned signals): ṙ_n.
- Total number of signals: M̄_n = M_n + P − P_n.
- Signals: z̄_n = [ z_n^T  ṙ_n^T ]^T.

Distributed Microphone Arrays
Distributed GSC
DGSC at the nth Node: High-Level Block Diagram

[Diagram: the owner-selection block feeds the shared-signal broadcast; the local GSC-BF combines the local microphones with the shared signals received from the other nodes; the local outputs are summed.]

Local & global BF:
- An M̄_n × 1 local BF at the nth node: w̄_n.
- Outputs of the local BFs: ȳ_n(l) = w̄_n^H z̄_n(l).
- Global BF: w̄ ≜ [ w̄_1^T ... w̄_N^T ]^T.
- Global output (available at each node): ȳ(l) = Σ_{n=1}^{N} ȳ_n(l).

Distributed Microphone Arrays
Distributed GSC
Blocks of the DGSC at the nth Node

Fixed beamformer (local):
- Ĥ_n: the RTFs relating the extended inputs and the shared signals.
- q̄_n ≜ (1/N) Ĥ_n (Ĥ_n^H Ĥ_n)^{-1} g, so that H^H q̄_n = g.

Blocking matrix (block-diagonal):
- B̄_n: an M̄_n × (M̄_n − P) BM.
- Noise references: ū_n(l) = B̄_n^H z̄_n(l).
- Σ_{n=1}^{N} (M̄_n − P) = Σ_{n=1}^{N} (M_n − P_n) = M − P ⇒ the DoF are fully utilized.

Adaptive noise canceller (local):
- Normalized LMS: f̄_n(l) = f̄_n(l−1) + μ ū_n(l) ȳ^*(l) / P̄_{u,n}(l), with power normalization P̄_{u,n}(l).

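A single step of the normalized-LMS ANC update above can be sketched as follows; the filter length, step size and signals are hypothetical, and the regularizer `eps` is an added numerical safeguard not stated on the slide:

```python
import numpy as np

rng = np.random.default_rng(5)

def nlms_step(f, u, y_out, mu=0.1, eps=1e-6):
    """One normalized-LMS update of the ANC filter f, given the
    noise references u(l) and the scalar global output y(l)."""
    p_u = np.vdot(u, u).real + eps          # power normalization P_u(l)
    return f + mu * u * np.conj(y_out) / p_u

f = np.zeros(3, dtype=complex)              # (M_n_bar - P) local ANC taps
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
f = nlms_step(f, u, y_out=1.0 + 0.5j)
```

In the DGSC each node runs this update purely locally: only the scalar global output ȳ(l) (broadcast anyway) is needed besides the node's own noise references.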
Distributed Microphone Arrays
Distributed GSC
DGSC at the nth Node: Low-Level Block Diagram

[Diagram: at node n of the N-node WASN, the local microphones z_n(l) (M_n × 1) and the shared signals ṙ_n(l) received over the wireless link form the extended input z̄_n(l) (M̄_n × 1). The FBF q̄_n produces ȳ_n^FBF(l); the BM B̄_n produces the noise references ū_n(l) ((M̄_n − P) × 1), which are filtered by f̄_n and subtracted to give ȳ_n^NC(l) and the local output ȳ_n(l). The selection matrix D_n extracts the broadcast signals r_n(l) (P_n × 1), and the global output is ȳ(l) = Σ_n ȳ_n(l). Overall, w̄_n = q̄_n − B̄_n f̄_n.]

Distributed Microphone Arrays
Experimental Study
Wideband: Scenario

- 4 m × 4 m × 3 m room; reverberation time T_60 = 300 ms.
- N = 4 nodes, M_n = 2 microphones per node.
- Sampling frequency 8 kHz; STFT window of 4096 points, 75% overlap.
- Desired speaker, 60 dB SNR.
- Competing speaker, 60 dB INR.
- 2 point-source Gaussian noises, 47 dB INR.
- Sensor noise.
- 90 Monte-Carlo experiments (randomized source positions).

[Figure: room layout showing the microphones, the desired speaker, the competing speaker and the interferences.]

Distributed Microphone Arrays
Experimental Study
Wideband: Results (1)

[Plot: SNR improvement [dB] of the centralized GSC, the DGSC and the single-node GSC in the various Monte Carlo experiments.]

Distributed Microphone Arrays
Experimental Study
Wideband: Results (2)

[Plot: SNR improvement [dB] versus time [sec], showing the convergence of the centralized GSC, the DGSC and the single-node GSC.]

Distributed Microphone Arrays
Experimental Study
Wideband: Results (3)

Figure: (a) noisy signal at mic. #1; (b) single-node GSC; (c) condensed GSC; (d) distributed GSC.

Statistical Beamformer
WASNs with Random Node Deployment

Scenarios:
- Ad hoc sensor networks.
- Large volume (and many nodes).
- High fault percentage.
- Arbitrary deployment of nodes.

Questions:
- How many nodes are required?
- What is the expected performance?

Statistical Beamformer
Random Beampattern Using a WASN

Scenario:
- Desired speaker; coherent or diffuse noise fields.
- Microphones randomly positioned in a reverberant enclosure.

Derivation:
- The ATFs are complex random variables.
- Derive a model for the ATFs (based on the work of Schröder).
- Consider two BF performance criteria: SNR and white-noise gain. These criteria become random variables.
- Analyze their statistics.
- Derive reliability functions: the probability that a criterion exceeds a pre-defined level.

Statistical Beamformer
Reliability

[Plots: empirical vs. theoretical reliability functions R_{κ,dif} and R_{κ,c} as a function of the criterion improvement, for M = 5, 10, 15, 20, 25 microphones. (a) Diffuse noise, SIR = 30 dB, residual interference dominant; (b) coherent noise, SIR = 0 dB, residual noise dominant.]

SINR_out − SNR_in; T_60 = 0.4 s; room dimensions 4 × 4 × 3 m.

Discussion
Discussion

Advantages of the proposed family of algorithms:
1. LCMV beamformers: low distortion and high interference cancellation.
2. Practical estimation and tracking procedures.
3. Extendable to binaural algorithms for hearing aids (Hadad, Gannot, Doclo, 2012).
4. The RTF allows for:
   1. Shorter processing frames.
   2. Higher noise reduction (traded off against dereverberation).
   3. Preservation of binaural cues.
5. Distributed versions (see also Bertrand, Moonen, 2011).

There is still a lot to do...
1. Arbitrary activity patterns.
2. Improved tracking of moving speakers and sensors.
3. Robustness.

Discussion
Thanks to my Collaborators

1. Shmulik Markovich-Golan
2. Israel Cohen
3. Ronen Talmon
4. David Levin
5. Emanuël Habets
6. ...and many more.

Acoustic Lab

Thank you very much!

Introduction
Overview of Spatial Noise Reduction Techniques

1. Fixed beamforming: combine the microphone signals using a time-invariant filter-and-sum operation (data-independent).
   Jan, Flanagan, 1996; Doclo, Moonen, 2003; Schäfer, Heese, Wernerus, Vary, 2012
2. Adaptive beamforming: combine the spatial focusing of fixed beamformers with adaptive suppression of (spectrally and spatially time-varying) background noise.
   General reading: Cox et al., 1987; Van Veen, Buckley, 1988; Van Trees, 2002
3. Blind source separation (BSS): considers the signals received at the microphones as a mixture of all sound sources filtered by the RIRs; utilizes independent component analysis (ICA) techniques. Makino, 2003
4. Computational auditory scene analysis (CASA): aims at performing sound segregation by modelling human auditory perceptual processing. Brown, Wang, 2005

The Transfer Function GSC
RTF Estimation
Relative Transfer Function Estimation

System perspective:

z_m = h̃_m^d z_1 + u_m

System identification:

Φ̂_{z_m z_1} = h̃_m^d Φ̂_{z_1 z_1} + Φ_{u_m z_1} + ε_m

The estimate is biased: u_m and z_1 are correlated ⇒ a biased estimator for h̃_m^d.

The Transfer Function GSC
RTF Estimation
Relative Transfer Function Estimation
Based on Speech Non-Stationarity Shalvi, Weinstein, 1996; Gannot et al., 2001

Assumptions:
- The system is time-invariant.
- The noise is stationary (so Φ_vv(k) is constant).
- The speech is non-stationary (use frames l_i, i = 1, ..., I).

For each frame l_i:

Φ̂_{z_m z_1}(l_i, k) = Φ̂_{z_1 z_1}(l_i, k) h̃_m^d(k) + Φ_{u_m z_1}(k) + ε_m(l_i, k)

Stacking the I frame equations yields an overdetermined linear system in the two unknowns h̃_m^d(k) and Φ_{u_m z_1}(k), solved in the least-squares sense.

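The stacked least-squares step can be sketched with synthetic per-frame PSD estimates at one bin; the true RTF value, the cross-PSD term and the noise level below are made-up numbers chosen only to exercise the estimator:

```python
import numpy as np

rng = np.random.default_rng(6)
I = 50  # number of speech frames

# Synthetic ground truth: RTF h_true and the stationary
# cross-term Phi_{u_m z_1} (constant over frames by assumption).
h_true = 0.8 - 0.3j
phi_u = 0.2 + 0.1j

# Speech non-stationarity: Phi_{z_1 z_1}(l_i) varies across frames.
phi_z1z1 = rng.uniform(0.5, 5.0, I)
phi_zmz1 = h_true * phi_z1z1 + phi_u \
           + 0.01 * (rng.standard_normal(I) + 1j * rng.standard_normal(I))

# Stack the I frame equations and solve for [h_tilde, Phi_um_z1].
A = np.column_stack([phi_z1z1.astype(complex), np.ones(I, dtype=complex)])
theta, *_ = np.linalg.lstsq(A, phi_zmz1, rcond=None)
h_hat, phi_u_hat = theta
```

Because Φ̂_{z_1 z_1} varies from frame to frame while the noise term stays fixed, the two unknowns become separable; with a stationary Φ̂_{z_1 z_1} the system would be rank-deficient.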
CTF vs. MTF
Motivation
The Convolutive TF-GSC Talmon et al., 2009: Motivation

The TF-GSC (Gannot et al., 2001):
- The RTFs are incorporated into the MVDR beamformer, implemented in a GSC structure.
- The adaptation to reverberant environments is obtained by a time-frequency domain implementation.
- The TF-GSC achieves improved enhancement of a noisy speech signal (compared with the time-domain, delay-only GSC).

High T_60:
- The RIRs become very long.
- The MTF approximation is only valid if the time frames are significantly longer than the relative RIR.
- In practice, short frames are used, resulting in an inaccurate representation of the RTF and deteriorating performance.

CTF vs. MTF
Motivation
Objectives

In the STFT domain:
- Formulate the problem using a system representation in the STFT domain (Avargel and Cohen, 2007).
- Build a GSC scheme (a TF-GSC extension).
- Suggest practical solutions using approximations; specifically, show solutions under the MTF and CTF approximations.
- Incorporate RTF identification based on the CTF model (Talmon et al., 2009).
- Significant improvements in blocking ability, NR, distortion and SNR in comparison with the TF-GSC.

CTF vs. MTF
Simulation Results
Experimental Study: Setup

Comparing the CTF-GSC and the TF-GSC:
- Image method (Allen and Berkley, 1979; implemented by Habets, 2009).
- Array of 5 microphones.
- Reverberation time T_60 = 0.5 s.
- TF-GSC: frame length N = 512; RTF length 250; noise-canceller length 450.
- CTF-GSC: in the FBF and BM, N = 512 with 50% overlap; in the adaptive NC, N = 512 with 75% overlap.

CTF vs. MTF
Simulation Results
Signal Blocking

The signal blocking factor (SBF) is defined by:

SBF = 10 log_10 ( E{(s^d * h_1^d[n])^2} / mean_m E{u_m^2[n]} )

where u_m[n], m = 2, ..., M, are the blocking-matrix outputs.

Blocking ability [dB] (known RTF):

       mic. 2   mic. 3   mic. 4   mic. 5
TF       17       12       14        8
CTF      22       17       18       13

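The SBF definition above is a straightforward energy ratio; a sketch with synthetic signals (the desired-component and leakage signals are placeholders, and a higher value means better blocking):

```python
import numpy as np

rng = np.random.default_rng(7)

def sbf_db(desired_ref, bm_outputs):
    """Signal blocking factor [dB]: energy of the desired component
    at the reference channel over the mean leakage energy at the
    blocking-matrix outputs."""
    num = np.mean(desired_ref ** 2)
    den = np.mean([np.mean(u ** 2) for u in bm_outputs])
    return 10 * np.log10(num / den)

s = rng.standard_normal(1000)                            # desired component
leak = [0.01 * rng.standard_normal(1000) for _ in range(4)]  # BM leakage
val = sbf_db(s, leak)
```

With a leakage amplitude of 1% of the desired signal, the SBF comes out near 40 dB, which puts the 8–22 dB figures in the table into perspective.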
CTF vs. MTF
Simulation Results
Known RTF, Input SNR = 0 dB

[Waveforms, amplitude vs. time [sec]: (c) reverberated speech at microphone #1; (d) noisy signal at microphone #1.]

CTF vs. MTF
Simulation Results
Known RTF, Input SNR = 0 dB (cont.)

[Waveforms, amplitude vs. time [sec]: (e) TF-GSC output; (f) CTF-GSC output.]

CTF vs. MTF<br />
Simulation Results<br />
Summary<br />
Output SNR <strong>and</strong> Noise Reduction [dB] for known RTF:<br />
In SNR   SNR (TF-GSC)   SNR (CTF-GSC)   NR (TF-GSC)   NR (CTF-GSC)<br />
 -5          3.8             8.5           -5.9          -10.9<br />
 -2.5        5.2            10.0           -6.2          -10.9<br />
  0          6.2            11.2           -6.2          -10.9<br />
  2.5        7.0            12.0           -6.7          -10.9<br />
  5          7.9            12.5           -6.1          -10.9<br />
[Plot: output SNR [dB] vs. input SNR [dB] (−5 to 5) for TF-GSC and CTF-GSC.]<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
Estimation<br />
Unknowns<br />
Desired sources RTFs, ˜H d .<br />
Interferences subspace basis, Q.<br />
Limiting Assumptions<br />
The RIRs are time invariant.<br />
A segment without interfering speakers is available for each desired<br />
source (double talk among desired sources is allowed).<br />
A segment without desired speakers is available for each interfering<br />
source (double talk among interfering sources is allowed).<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
Interferences Subspace Estimation<br />
Step 1<br />
EVD <strong>and</strong> Pruning<br />
Estimate the signals subspace at each time segment without any desired sources active:<br />
\( \hat{\Phi}^i_{zz} = \bar{Q}_i \Lambda_i \bar{Q}_i^H \)<br />
All eigenvectors corresponding to “weak” eigenvalues are discarded.<br />
[Plot: eigenvalues per frequency, in dB (−30 to 60), vs. frequency index (100–1000).]<br />
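Step 1 can be sketched as follows; the dB threshold rule for pruning “weak” eigenvalues is an illustrative choice, not specified on the slides:

```python
import numpy as np

def prune_signal_subspace(phi_zz, thresh_db=20.0):
    """Estimate a signal subspace from a spatial PSD matrix by EVD,
    discarding eigenvectors whose eigenvalues are 'weak'.

    phi_zz    : Hermitian spatial PSD estimate, shape (M, M)
    thresh_db : eigenvalues more than thresh_db below the strongest
                one are treated as noise and pruned (assumed rule).
    """
    lam, Q = np.linalg.eigh(phi_zz)   # eigenvalues in ascending order
    lam, Q = lam[::-1], Q[:, ::-1]    # reorder: strongest first
    keep = lam > lam[0] * 10 ** (-thresh_db / 10)
    return Q[:, keep], lam[keep]      # pruned basis Q-bar_i, Lambda_i
```

Running this per noise-only segment i yields the pruned pairs \( (\bar{Q}_i, \Lambda_i) \) that Step 2 merges.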
Multi-Constraint GSC<br />
Parameter Estimation<br />
Interferences Subspace Estimation<br />
Step 2<br />
Union of Estimates using Gram-Schmidt<br />
Apply QRD:<br />
\( P = \left[\, \bar{Q}_1 \bar{\Lambda}_1^{1/2} \;\cdots\; \bar{Q}_{N_{\mathrm{seg}}} \bar{\Lambda}_{N_{\mathrm{seg}}}^{1/2} \,\right], \qquad P = QR \)<br />
Discard vectors from the basis Q that correspond to “weak” coefficients in R.<br />
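A minimal numpy sketch of Step 2, under the assumption that “weak” R coefficients are those whose diagonal magnitude falls below a relative threshold (the threshold value is illustrative):

```python
import numpy as np

def union_of_subspaces(Q_list, lam_list, thresh=1e-2):
    """Merge per-segment subspace estimates into one basis via QRD
    (Gram-Schmidt), pruning directions with weak R coefficients.

    Q_list   : list of pruned eigenvector matrices, each (M, r_i)
    lam_list : list of matching eigenvalue vectors, each (r_i,)
    """
    # Stack eigenvalue-weighted bases: P = [Q_1 L_1^{1/2} ... Q_N L_N^{1/2}]
    P = np.hstack([Q @ np.diag(np.sqrt(l)) for Q, l in zip(Q_list, lam_list)])
    Q, R = np.linalg.qr(P)                 # Gram-Schmidt via QRD
    diag = np.abs(np.diag(R))
    keep = diag > thresh * diag.max()      # drop "weak" directions
    return Q[:, keep]
```

Segments that merely re-observe an already-covered direction contribute a near-zero diagonal entry in R, so the redundant column is pruned and only genuinely new interference directions enlarge the basis.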
Multi-Constraint GSC<br />
Parameter Estimation<br />
EVD per Frame - Graphical Interpretation<br />
Frame 1, strong eigenvectors<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
EVD per Frame - Graphical Interpretation<br />
Frame 2, strong eigenvectors<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
EVD per Frame - Graphical Interpretation<br />
Frame 3, strong eigenvectors<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
QRD Calculation<br />
Graphical Interpretation<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
QRD Pruning<br />
Graphical Interpretation<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
Desired Sources RTF Estimation<br />
For a Single Desired Source<br />
PSD Estimation<br />
Estimate the spatial PSD of the stationary sources (during noise-only segments).<br />
Estimate the spatial PSD of one desired source plus the stationary sources.<br />
Largest Generalized Eigenvector<br />
f = GEVD(“Desired+Noise PSD”, “Noise PSD”).<br />
Rotate <strong>and</strong> normalize f to obtain the RTF \( \hat{\tilde{h}}^d_j \).<br />
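These two steps can be sketched as follows. The rotation is taken to be multiplication by the noise PSD matrix and the normalization fixes unit gain at a reference microphone; both are assumptions about the slide's shorthand, as are the function and variable names:

```python
import numpy as np
from scipy.linalg import eigh

def rtf_from_gevd(phi_dn, phi_n, ref_mic=0):
    """Estimate the desired source's RTF from the largest generalized
    eigenvector of (desired+noise PSD, noise PSD).

    phi_dn : spatial PSD during desired-source + noise segments, (M, M)
    phi_n  : spatial PSD during noise-only segments, (M, M)
    """
    w, V = eigh(phi_dn, phi_n)    # generalized EVD, ascending eigenvalues
    f = V[:, -1]                  # principal generalized eigenvector
    h = phi_n @ f                 # "rotate" back to the signal domain
    return h / h[ref_mic]         # "normalize": unit gain at reference mic
```

With a rank-one desired-source model, the recovered vector is proportional to the source's steering vector, so dividing by the reference-microphone entry yields the RTF directly.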
Binaural LCMV<br />
Experimental Study<br />
Setup<br />
Hearing device:<br />
2 hearing aid devices mounted on a B&K HATS, each with 2 microphones,<br />
2 cm inter-microphone distance.<br />
A 9 × 5 utility device with 4 mics. at the corners, average distance<br />
3.5 cm. The device was placed on a table at a distance of 0.5 m.<br />
Signals:<br />
1 desired speaker at θ_d = 30°, 1 m (constrained).<br />
1 interfering speaker at θ_i = −70°, 1 m (constrained).<br />
1 directional stationary noise at θ_n = −40°, 2.5 m (unconstrained).<br />
SIR=0dB, SNR=14dB.<br />
Acoustic lab:<br />
Dimensions 6 × 6 × 2.4 m; controllable reverberation time, T_60 = 0.3 s.<br />
STFT:<br />
Sampling frequency 8kHz, 4096 points, 75% overlap.<br />
Algorithm Cue gain factors:<br />
Desired speech - η = 1.<br />
Interfering speech - µ = 0.1 (20 dB attenuation).<br />