PDF, 16,4 MB - Quality and Usability Lab - TU Berlin
Multi-Microphone Speech Enhancement
Centralized and Distributed Beamformers

Sharon Gannot
Faculty of Engineering, Bar-Ilan University, Israel
TU Berlin, 7.1.2013, Berlin, Germany

S. Gannot (BIU), Multi-Microphone Speech Enhancement, TU Berlin, 2013

Introduction
Speech Enhancement and Source Extraction Utilizing Microphone Arrays

Applications
1. Hands-free communications.
2. Teleconferencing.
3. Skype calls.
4. Hearing aids.
5. Eavesdropping.

Why is it a Difficult Task?
1. Speech is a non-stationary signal with a high dynamic range.
2. Very long acoustic path (reverberation).
3. Time-varying acoustic path.
4. Microphone arrays impose a high computational burden.
5. Large (and distributed) arrays require large communication bandwidth.

Introduction
Room Acoustics Essentials

Reverberation
- Late reflections tend to be diffuse, hence performance degradation.
- Beamforming: higher processing burden and high latency in an STFT implementation.
- Deteriorates intelligibility, quality and ASR performance.

Sound Fields
- Directional: a room impulse response relates source and microphones.
- Uncorrelated: the signals on the microphones are uncorrelated.
- Diffuse: sound is coming from all directions.

Dal-Degan, Prati, 1988; Habets, Gannot, 2007

Introduction
Spatial Processing
Design Criteria Tailored to Speech Applications

1. BSS, CASA, and other methods.
2. Adaptive optimization. Sondhi, Elko, 1986; Kaneda, Ohga, 1986
3. Minimum variance distortionless response (MVDR) and GSC.
   Van Compernolle, 1990; Affes and Grenier, 1997; Nordholm et al., 1999; Hoshuyama et al., 1999; Gannot et al., 2001; Herbordt et al., 2005
4. Minimum mean square error (MMSE): GSVD-based spatial Wiener filter. Doclo, Moonen, 2002
5. Speech distortion weighted multichannel Wiener filter (SDW-MWF). Spriet et al., 2004
6. Maximum signal-to-noise ratio (SNR). Warsitz and Haeb-Umbach, 2005
7. Linearly constrained minimum variance (LCMV). Markovich et al., 2009

Problem Formulation
Extraction of Desired Speaker Signal(s) in a Multi-Interference Reverberant Environment

Problem Formulation
Talk Outline

- Spatial processors ("beamformers") based on the linearly constrained minimum variance (LCMV) criterion.
- Special cases:
  - Enhancing a single desired source (MVDR).
  - Extracting desired source(s) in a multiple-competing-speaker environment.
- Implementation: generalized sidelobe canceller (GSC) in the short-time Fourier transform (STFT) domain.
- The relative transfer function (RTF): importance and estimation procedures.
- Distributed microphone arrays.

Problem Formulation
Problem Formulation in the STFT Domain

Microphone signals (m = 1, ..., M):

z_m(l,k) = Σ_{j=1}^{P^d} s_j^d(l,k) h_{jm}^d(l,k) + Σ_{j=1}^{P^i} s_j^i(l,k) h_{jm}^i(l,k) + Σ_{j=1}^{P^n} s_j^n(l,k) h_{jm}^n(k) + n_m(l,k)

Vector formulation:

z(l,k) = H^d(l,k) s^d(l,k) + H^i(l,k) s^i(l,k) + H^n(k) s^n(l,k) + n(l,k)
       = H(l,k) s(l,k) + n(l,k)

with P = P^d + P^i + P^n ≤ M.

Problem Formulation
Spatial Filters

Filter and combine:

y(l,k) = w^H(l,k) z(l,k)

[Block diagram: each microphone signal z_m(l,k), m = 1, ..., M, is filtered by w_m^*(l,k) and the M branches are summed to give y(l,k).]

Filters: w(l,k) is an M × 1 vector.

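The filter-and-combine operation above can be sketched per time-frequency bin. A minimal NumPy illustration, with made-up shapes and random signals standing in for actual STFT data:

```python
import numpy as np

rng = np.random.default_rng(0)

M = 4            # microphones
L, K = 10, 257   # STFT frames and frequency bins

# Per-bin spatial filter w(k): M x K, multichannel STFT z: M x L x K.
w = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
z = rng.standard_normal((M, L, K)) + 1j * rng.standard_normal((M, L, K))

# Filter-and-sum: y(l, k) = w(k)^H z(l, k), applied per frame and bin.
y = np.einsum('mk,mlk->lk', w.conj(), z)
```

The conjugation of `w` implements the Hermitian transpose in y = w^H z; the single-bin output reduces to an inner product over the M microphones.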
LCMV Beamformer
Criterion and Solution
The Linearly Constrained Minimum Variance Beamformer
Er, Cantoni, 1983; Van Veen, Buckley, 1988

LCMV criterion: let Φ_zz(l,k) be the M × M received-signal correlation matrix. Minimize the output power

w^H(l,k) Φ_zz(l,k) w(l,k),

such that the linear constraint set is satisfied:

C^H(l,k) w(l,k) = g(l,k),

where C is the M × P constraint matrix and g the P × 1 response vector.

Closed-form solution (which we hardly ever use in practice):

w = Φ_zz^{-1} C (C^H Φ_zz^{-1} C)^{-1} g

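The closed-form solution can be sketched numerically for a single frequency bin. The correlation matrix, constraint matrix and response vector below are simulated stand-ins, not data from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
M, P = 6, 2  # microphones, constraints

# Hermitian positive-definite correlation matrix Phi_zz (simulated).
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi = A @ A.conj().T + M * np.eye(M)

C = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))  # constraint matrix
g = np.array([1.0, 0.0])                                            # response vector

# w = Phi^{-1} C (C^H Phi^{-1} C)^{-1} g, using solves instead of inverses.
PhiC = np.linalg.solve(Phi, C)
w = PhiC @ np.linalg.solve(C.conj().T @ PhiC, g)
```

By construction the result satisfies the constraint set C^H w = g exactly, which is a handy sanity check for any implementation.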
LCMV Beamformer
Criterion and Solution
LCMV Beamformer: Beampattern, Illustrative Example

[Polar beampattern plot, 0°–360°, 0 dB to −40 dB scale.]

10-microphone uniform linear array; 2 desired sources (green) and 2 interfering sources (red).

LCMV Beamformer
Criterion and Solution
LCMV Minimization: Graphical Interpretation Frost, 1972

[Figure: contours of w^H(l,k) Φ_zz(l,k) w(l,k) in the (w_1, w_2) plane; the constraint plane C^H(l,k) w(l,k) = g(l,k) contains the quiescent beamformer w_0(l,k) = C(l,k) (C^H(l,k) C(l,k))^{-1} g(l,k).]

LCMV Beamformer
The GSC Implementation
The Generalized Sidelobe Canceller Implementation

Split the beamformer: w = w_0 − w_n.
- Constraints subspace: w_0 ∈ Span{C}.
- Null subspace: w_n ∈ N{C}, with w_n = B q.
  - B: an M × (M − P) matrix spanning the null subspace.
  - q: a vector of M − P filters.

[Figure: the constraint plane and constraint subspace in the (w_1, w_2) plane, with w_0 and the adaptation from w_old to w_new.]

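The split above can be illustrated directly: build the quiescent component w_0 in the column span of C and a blocking matrix B from the null space of C^H, so that any choice of the unconstrained filters q leaves the constraints intact. All matrices here are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
M, P = 6, 2
C = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))
g = rng.standard_normal(P)

# Constraints-subspace component: w0 = C (C^H C)^{-1} g.
w0 = C @ np.linalg.solve(C.conj().T @ C, g)

# Blocking matrix: the last M - P right singular vectors of C^H
# span its null space, so C^H B = 0.
_, _, Vh = np.linalg.svd(C.conj().T)
B = Vh[P:].conj().T

q = rng.standard_normal(M - P)   # unconstrained ANC filters
w = w0 - B @ q                   # any such w satisfies C^H w = g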
LCMV Beamformer
The GSC Implementation
The GSC Structure Griffiths, Jim, 1982

[Block diagram: the input feeds both the FBF and the BM; the ANC output is subtracted from the FBF output to form the output.]

GSC blocks:
- Fixed beamformer (FBF): satisfies the constraints.
- Blocking matrix (BM): generates M − P unconstrained signals.
- Adaptive noise canceller (ANC): adaptively suppresses the residual noise.

MVDR
Criterion, Solution, GSC
The Minimum Variance Distortionless Response Beamformer: Single Constraint

Scenario: one desired signal.

Beamformer design:
- "Steers a beam" towards the desired source (P = 1: one constraint).
- Minimizes all other directions.

C = h^d,  g = 1.

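Substituting C = h^d and g = 1 into the LCMV closed form gives the familiar MVDR weights w = Φ^{-1} h^d / (h^{dH} Φ^{-1} h^d). A sketch with a simulated steering vector and noise correlation matrix (both made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
M = 5

# Simulated desired-source ATF h_d and noise correlation Phi.
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Phi = A @ A.conj().T + M * np.eye(M)

# MVDR: w = Phi^{-1} h / (h^H Phi^{-1} h).
Phih = np.linalg.solve(Phi, h)
w = Phih / (h.conj() @ Phih)
```

The distortionless property h^{dH} w = 1 holds by construction: the desired component passes undistorted while the output power is minimized.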
MVDR
Criterion, Solution, GSC
The MVDR Beamformer: GSC Implementation
Affes, Grenier, 1997; Hoshuyama et al., 1999; Gannot et al., 2000

GSC blocks:
- FBF: matched filter to the desired response h^d.
- BM: projection matrix onto the null space of h^d.
- ANC: recursively updated using the LMS algorithm (Shynk, 1992).

Output signal: y = s^d + residual noise and interference signals.

The Transfer Function GSC
Relative Transfer Function
The Transfer Function GSC: Relax the Dereverberation Requirement Gannot et al., 2001

Modified constraint set:

C = h^d,  g̃ = (h_1^d)^*

Equivalent to:

C = h̃^d ≜ h^d / h_1^d = [ 1  h_2^d/h_1^d  ...  h_M^d/h_1^d ]^T,  g = 1.

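The RTF vector h̃^d is simply the ATF vector normalized by the reference microphone, so its first entry is always 1. A tiny sketch with a hypothetical ATF at one frequency bin:

```python
import numpy as np

rng = np.random.default_rng(4)
M = 4

# Acoustic transfer functions (ATFs) at one frequency bin,
# microphone 1 taken as the reference.
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# Relative transfer function: h_rtf = h / h_1, so h_rtf[0] == 1.
h_rtf = h / h[0]
```

Multiplying the RTF back by h_1 recovers the ATF, which is why constraining w^H h̃^d = 1 yields the output h_1^d s^d rather than the dereverberated s^d.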
The Transfer Function GSC
Relative Transfer Function
The Transfer Function GSC Utilizing the RTF

The RTF suffices for implementing all blocks of the GSC.

Output signal: y = h_1^d s^d + residual noise and interference signals.

Tradeoff: noise reduction is sacrificed if dereverberation is required. Habets et al., 2010

The Transfer Function GSC
RTF Estimation
The Importance of the RTF

Advantages:
- Shorter than the ATF.
- Estimation methods are available:
  - Using speech non-stationarity (Gannot et al., 2001).
  - Using speech presence probability and spectral subtraction. Cohen, 2004
- The RTF is equivalent to the interaural transfer function (ITF).

Drawbacks:
- Non-causal.

CTF-GSC: for high T_60, multiplication in the frequency domain is only valid for very long frames. Hence, the RTFs are extended to a convolution in the STFT domain (CTF-GSC). Talmon et al., 2009

Multi-Constraint GSC
Applications
Multi-Constraint Beamformer, Based on Multichannel Eigenspace Beamforming Markovich et al., 2009; 2010

Applications:
- Conference-call scenario with multiple participants.
- Hands-free cellular phone conversation in a car environment with several passengers.
- Cocktail-party scenario, in which the desired conversation blends with many simultaneous conversations.

Problem formulation (reminder):

z = H^d s^d + H^i s^i + H^n s^n + n = H s + n.

Multi-Constraint GSC
Derivation
The Constraints Set

Original:

C^H = [ H^d  H^i  H^n ],  g = [ 1 ... 1  0 ... 0 ]^T  (P^d ones, P^i + P^n zeros)

LCMV output:

y = Σ_{j=1}^{P^d} s_j^d + noise components

Multi-Constraint GSC
Derivation
A Modified Constraints Set

Noise & interference: H^i, H^n ⇒ orthonormal basis Q.

Relax the dereverberation requirements using RTFs:

g̃ ≜ [ (h_{11}^d)^* ... (h_{P^d 1}^d)^*  0 ... 0 ]^T  (P^i + P^n zeros),  h̃_j^d ≜ h_j^d / h_{j1}^d

The modified constraints set (estimated by applying subspace methods):

C̃ ≜ [ H̃^d  Q ]

LCMV output:

y = Σ_{j=1}^{P^d} h_{j1}^d s_j^d + noise components

Multi-Constraint GSC
Experimental Study
Single Desired Speaker, Directional Noise Field

Figure: (a) noisy signal at mic. #1; (b) enhanced signal. Female (desired) and male (interference) speakers with directional noise. 8 microphones, recorded at the BIU acoustic lab set to T_60 = 300 ms.

Multi-Constraint GSC
Experimental Study
Multi-Speaker

Figure: (a) noisy signal at mic. #1; (b) enhanced signal. 1 desired source and 3 competing speakers. 8 microphones, recorded at the BIU acoustic lab set to T_60 = 300 ms.

Approximately 20 dB SIR and SNR improvement.

Distributed Microphone Arrays
Motivation
Distributed Algorithms for Microphone Arrays

[Photos: a condensed microphone array vs. a distributed microphone array.]

Distributed Microphone Arrays
Motivation
Centralized vs. Distributed Beamforming

Condensed:
- Simpler scheme.
- Easy calibration.
- Multitude of available solutions.

Distributed:
- Microphones can be placed randomly, avoiding tedious calibration.
- Using more microphones improves spatial resolution.
- High probability of finding microphones close to a relevant sound source.
- Improved sound-field sampling.

Distributed Microphone Arrays
Motivation
Challenges of Distributed Beamforming

- Power: communication bandwidth; computational complexity.
- Communication: connectivity; protocol; capacity.
- Arbitrary constellation: ad hoc; dynamics.
- Signal processing: partial data; synchronization; dynamics; coherence.

Distributed Microphone Arrays
Motivation
Goals

- Develop a beamforming algorithm suitable for distributed microphone arrays.
- Analyze the performance with randomly located microphones.

Distributed Microphone Arrays
Distributed GSC
Distributed LCMV

Formulation:
- N nodes with M_n microphones each; Σ_{n=1}^{N} M_n = M.
- z(l,k) ≜ [ z_1^T(l,k) ... z_N^T(l,k) ]^T.
- The closed-form LCMV necessitates the inversion of Φ_zz, a cumbersome task in distributed networks.

Non-optimal GSC implementation:
- Sum of local BFs: y(l,k) = Σ_{n=1}^{N} y_n(l,k).
- Implement a local GSC at each node: M_n − P outputs of the BM at the nth node (might go negative!).
- Total number of BM outputs: Σ_{n=1}^{N} (M_n − P) = M − N·P.
- M − N·P < M − P ⇒ degrees of freedom (DoF) are lost ⇒ performance degradation.

Distributed Microphone Arrays
Distributed GSC
Distributed GSC Markovich-Golan, Gannot, Cohen, 2013

Overview:
- Introduce P shared signals, broadcast by a subset of the nodes, retrieving the lost degrees of freedom.
- Extended inputs at each node: local microphones plus shared signals.
- Purely local FBFs.
- A block-diagonal BM facilitates local BMs.
- Purely local ANC.

[Diagram: each node broadcasts its local BF output; a subset of the nodes additionally broadcasts the shared signals.]

Total of N + P broadcast channels.

Distributed Microphone Arrays
Distributed GSC
Nodes Connectivity

Sources "owned" by the nth node:
- The node n that receives the pth source with the highest SNR is its "owner".
- The shared signals broadcast by the nth node: r_n = D_n^H z_n, where D_n is an M_n × P_n selection matrix.
- Each shared signal is responsible for only one source.
- Shared signals serve as a reference for RTF estimation in each node.

Extended inputs at the nth node:
- P − P_n shared signals (excluding self-owned signals): ṙ_n.
- Total number of signals: M̄_n = M_n + P − P_n.
- Signals: z̄_n = [ z_n^T  ṙ_n^T ]^T.

Distributed Microphone Arrays
Distributed GSC
DGSC at the nth Node: High-Level Block Diagram

[Diagram: the owner-selection block feeds the shared-signal broadcast; the local GSC-BF combines the local microphones with the shared signals received from the other nodes; the local outputs are summed.]

Local & global BF:
- An M̄_n × 1 local BF at the nth node: w̄_n.
- Outputs of the local BFs: ȳ_n(l) = w̄_n^H z̄_n(l).
- Global BF: w̄ ≜ [ w̄_1^T ... w̄_N^T ]^T.
- Global output (available at each node): ȳ(l) = Σ_{n=1}^{N} ȳ_n(l).

Distributed Microphone Arrays
Distributed GSC
Blocks of the DGSC at the nth Node

Fixed beamformer (local):
- Ĥ_n: the RTFs relating the extended inputs and the shared signals.
- q̄_n ≜ (1/N) Ĥ_n (Ĥ_n^H Ĥ_n)^{-1} g, so that H^H q̄_n = g.

Blocking matrix (block-diagonal):
- B̄_n: an M̄_n × (M̄_n − P) BM.
- Noise references: ū_n(l) = B̄_n^H z̄_n(l).
- Σ_{n=1}^{N} (M̄_n − P) = Σ_{n=1}^{N} (M_n − P_n) = M − P ⇒ the DoF are fully utilized.

Adaptive noise canceller (local):
- Normalized LMS: f̄_n(l) = f̄_n(l−1) + μ ū_n(l) ȳ^*(l) / P̄_{u,n}(l), with power normalization P̄_{u,n}(l).

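A single step of the normalized-LMS ANC update above can be sketched as follows; the filter length, step size and signals are hypothetical, and the regularizer `eps` is an added numerical safeguard not stated on the slide:

```python
import numpy as np

rng = np.random.default_rng(5)

def nlms_step(f, u, y_out, mu=0.1, eps=1e-6):
    """One normalized-LMS update of the ANC filter f, given the
    noise references u(l) and the scalar global output y(l)."""
    p_u = np.vdot(u, u).real + eps          # power normalization P_u(l)
    return f + mu * u * np.conj(y_out) / p_u

f = np.zeros(3, dtype=complex)              # (M_n_bar - P) local ANC taps
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
f = nlms_step(f, u, y_out=1.0 + 0.5j)
```

In the DGSC each node runs this update purely locally: only the scalar global output ȳ(l) (broadcast anyway) is needed besides the node's own noise references.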
Distributed Microphone Arrays
Distributed GSC
DGSC at the nth Node: Low-Level Block Diagram

[Diagram: at node n of the N-node WASN, the local microphones z_n(l) (M_n × 1) and the shared signals ṙ_n(l) received over the wireless link form the extended input z̄_n(l) (M̄_n × 1). The FBF q̄_n produces ȳ_n^FBF(l); the BM B̄_n produces the noise references ū_n(l) ((M̄_n − P) × 1), which are filtered by f̄_n and subtracted to give ȳ_n^NC(l) and the local output ȳ_n(l). The selection matrix D_n extracts the broadcast signals r_n(l) (P_n × 1), and the global output is ȳ(l) = Σ_n ȳ_n(l). Overall, w̄_n = q̄_n − B̄_n f̄_n.]

Distributed Microphone Arrays
Experimental Study
Wideband: Scenario

- 4 m × 4 m × 3 m room; reverberation time T_60 = 300 ms.
- N = 4 nodes, M_n = 2 microphones per node.
- Sampling frequency 8 kHz; STFT window of 4096 points, 75% overlap.
- Desired speaker, 60 dB SNR.
- Competing speaker, 60 dB INR.
- 2 point-source Gaussian noises, 47 dB INR.
- Sensor noise.
- 90 Monte-Carlo experiments (randomized source positions).

[Figure: room layout showing the microphones, the desired speaker, the competing speaker and the interferences.]

Distributed Microphone Arrays
Experimental Study
Wideband: Results (1)

[Plot: SNR improvement [dB] of the centralized GSC, the DGSC and the single-node GSC in the various Monte Carlo experiments.]

Distributed Microphone Arrays
Experimental Study
Wideband: Results (2)

[Plot: SNR improvement [dB] versus time [sec], showing the convergence of the centralized GSC, the DGSC and the single-node GSC.]

Distributed Microphone Arrays
Experimental Study
Wideband: Results (3)

Figure: (a) noisy signal at mic. #1; (b) single-node GSC; (c) condensed GSC; (d) distributed GSC.

Statistical Beamformer
WASNs with Random Node Deployment

Scenarios:
- Ad hoc sensor networks.
- Large volume (and many nodes).
- High fault percentage.
- Arbitrary deployment of nodes.

Questions:
- How many nodes are required?
- What is the expected performance?

Statistical Beamformer
Random Beampattern Using a WASN

Scenario:
- Desired speaker; coherent or diffuse noise fields.
- Microphones randomly positioned in a reverberant enclosure.

Derivation:
- The ATFs are complex random variables.
- Derive a model for the ATFs (based on the work of Schröder).
- Consider two BF performance criteria: SNR and white-noise gain. These criteria become random variables.
- Analyze their statistics.
- Derive reliability functions: the probability that a criterion exceeds a pre-defined level.

Statistical Beamformer
Reliability

[Plots: empirical vs. theoretical reliability functions R_{κ,dif} and R_{κ,c} as a function of the criterion improvement, for M = 5, 10, 15, 20, 25 microphones. (a) Diffuse noise, SIR = 30 dB, residual interference dominant; (b) coherent noise, SIR = 0 dB, residual noise dominant.]

SINR_out − SNR_in; T_60 = 0.4 s; room dimensions 4 × 4 × 3 m.

Discussion
Discussion

Advantages of the proposed family of algorithms:
1. LCMV beamformers: low distortion and high interference cancellation.
2. Practical estimation and tracking procedures.
3. Extendable to binaural algorithms for hearing aids (Hadad, Gannot, Doclo, 2012).
4. The RTF allows for:
   1. Shorter processing frames.
   2. Higher noise reduction (traded off against dereverberation).
   3. Preservation of binaural cues.
5. Distributed versions (see also Bertrand, Moonen, 2011).

There is still a lot to do...
1. Arbitrary activity patterns.
2. Improved tracking of moving speakers and sensors.
3. Robustness.

Discussion
Thanks to my Collaborators

1. Shmulik Markovich-Golan
2. Israel Cohen
3. Ronen Talmon
4. David Levin
5. Emanuël Habets
6. ...and many more.

Acoustic Lab

Thank you very much!

Introduction
Overview of Spatial Noise Reduction Techniques

1. Fixed beamforming: combine the microphone signals using a time-invariant filter-and-sum operation (data-independent).
   Jan, Flanagan, 1996; Doclo, Moonen, 2003; Schäfer, Heese, Wernerus, Vary, 2012
2. Adaptive beamforming: combine the spatial focusing of fixed beamformers with adaptive suppression of (spectrally and spatially time-varying) background noise.
   General reading: Cox et al., 1987; Van Veen, Buckley, 1988; Van Trees, 2002
3. Blind source separation (BSS): considers the signals received at the microphones as a mixture of all sound sources filtered by the RIRs; utilizes independent component analysis (ICA) techniques. Makino, 2003
4. Computational auditory scene analysis (CASA): aims at performing sound segregation by modelling human auditory perceptual processing. Brown, Wang, 2005

The Transfer Function GSC
RTF Estimation
Relative Transfer Function Estimation

System perspective:

z_m = h̃_m^d z_1 + u_m

System identification:

Φ̂_{z_m z_1} = h̃_m^d Φ̂_{z_1 z_1} + Φ_{u_m z_1} + ε_m

The estimate is biased: u_m and z_1 are correlated ⇒ a biased estimator for h̃_m^d.

The Transfer Function GSC
RTF Estimation
Relative Transfer Function Estimation
Based on Speech Non-Stationarity Shalvi, Weinstein, 1996; Gannot et al., 2001

Assumptions:
- The system is time-invariant.
- The noise is stationary (so Φ_vv(k) is constant).
- The speech is non-stationary (use frames l_i, i = 1, ..., I).

For each frame l_i:

Φ̂_{z_m z_1}(l_i, k) = Φ̂_{z_1 z_1}(l_i, k) h̃_m^d(k) + Φ_{u_m z_1}(k) + ε_m(l_i, k)

Stacking the I frame equations yields an overdetermined linear system in the two unknowns h̃_m^d(k) and Φ_{u_m z_1}(k), solved in the least-squares sense.

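The stacked least-squares step can be sketched with synthetic per-frame PSD estimates at one bin; the true RTF value, the cross-PSD term and the noise level below are made-up numbers chosen only to exercise the estimator:

```python
import numpy as np

rng = np.random.default_rng(6)
I = 50  # number of speech frames

# Synthetic ground truth: RTF h_true and the stationary
# cross-term Phi_{u_m z_1} (constant over frames by assumption).
h_true = 0.8 - 0.3j
phi_u = 0.2 + 0.1j

# Speech non-stationarity: Phi_{z_1 z_1}(l_i) varies across frames.
phi_z1z1 = rng.uniform(0.5, 5.0, I)
phi_zmz1 = h_true * phi_z1z1 + phi_u \
           + 0.01 * (rng.standard_normal(I) + 1j * rng.standard_normal(I))

# Stack the I frame equations and solve for [h_tilde, Phi_um_z1].
A = np.column_stack([phi_z1z1.astype(complex), np.ones(I, dtype=complex)])
theta, *_ = np.linalg.lstsq(A, phi_zmz1, rcond=None)
h_hat, phi_u_hat = theta
```

Because Φ̂_{z_1 z_1} varies from frame to frame while the noise term stays fixed, the two unknowns become separable; with a stationary Φ̂_{z_1 z_1} the system would be rank-deficient.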
CTF vs. MTF
Motivation
The Convolutive TF-GSC Talmon et al., 2009: Motivation

The TF-GSC (Gannot et al., 2001):
- The RTFs are incorporated into the MVDR beamformer, implemented in a GSC structure.
- The adaptation to reverberant environments is obtained by a time-frequency domain implementation.
- The TF-GSC achieves improved enhancement of a noisy speech signal (compared with the time-domain, delay-only GSC).

High T_60:
- The RIRs become very long.
- The MTF approximation is only valid if the time frames are significantly longer than the relative RIR.
- In practice, short frames are used, resulting in an inaccurate representation of the RTF and deteriorating performance.

CTF vs. MTF
Motivation
Objectives

In the STFT domain:
- Formulate the problem using a system representation in the STFT domain (Avargel and Cohen, 2007).
- Build a GSC scheme (a TF-GSC extension).
- Suggest practical solutions using approximations; specifically, show solutions under the MTF and CTF approximations.
- Incorporate RTF identification based on the CTF model (Talmon et al., 2009).
- Significant improvements in blocking ability, NR, distortion and SNR in comparison with the TF-GSC.

CTF vs. MTF
Simulation Results
Experimental Study: Setup

Comparing the CTF-GSC and the TF-GSC:
- Image method (Allen and Berkley, 1979; implemented by Habets, 2009).
- Array of 5 microphones.
- Reverberation time T_60 = 0.5 s.
- TF-GSC: frame length N = 512; RTF length 250; noise-canceller length 450.
- CTF-GSC: in the FBF and BM, N = 512 with 50% overlap; in the adaptive NC, N = 512 with 75% overlap.

CTF vs. MTF
Simulation Results
Signal Blocking

The signal blocking factor (SBF) is defined by:

SBF = 10 log_10 ( E{(s^d * h_1^d[n])^2} / mean_m E{u_m^2[n]} )

where u_m[n], m = 2, ..., M, are the blocking-matrix outputs.

Blocking ability [dB] (known RTF):

       mic. 2   mic. 3   mic. 4   mic. 5
TF       17       12       14        8
CTF      22       17       18       13

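The SBF definition above is a straightforward energy ratio; a sketch with synthetic signals (the desired-component and leakage signals are placeholders, and a higher value means better blocking):

```python
import numpy as np

rng = np.random.default_rng(7)

def sbf_db(desired_ref, bm_outputs):
    """Signal blocking factor [dB]: energy of the desired component
    at the reference channel over the mean leakage energy at the
    blocking-matrix outputs."""
    num = np.mean(desired_ref ** 2)
    den = np.mean([np.mean(u ** 2) for u in bm_outputs])
    return 10 * np.log10(num / den)

s = rng.standard_normal(1000)                            # desired component
leak = [0.01 * rng.standard_normal(1000) for _ in range(4)]  # BM leakage
val = sbf_db(s, leak)
```

With a leakage amplitude of 1% of the desired signal, the SBF comes out near 40 dB, which puts the 8–22 dB figures in the table into perspective.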
CTF vs. MTF
Simulation Results
Known RTF, Input SNR = 0 dB

[Waveforms, amplitude vs. time [sec]: (c) reverberated speech at microphone #1; (d) noisy signal at microphone #1.]

CTF vs. MTF
Simulation Results
Known RTF, Input SNR = 0 dB (cont.)

[Waveforms, amplitude vs. time [sec]: (e) TF-GSC output; (f) CTF-GSC output.]

CTF vs. MTF<br />
Simulation Results<br />
Summary<br />
Output SNR <strong>and</strong> Noise Reduction [dB] for known RTF:<br />
In SNR   SNR (TF-GSC)   SNR (CTF-GSC)   NR (TF-GSC)   NR (CTF-GSC)<br />
 -5          3.8             8.5           -5.9          -10.9<br />
 -2.5        5.2            10.0           -6.2          -10.9<br />
  0          6.2            11.2           -6.2          -10.9<br />
  2.5        7.0            12.0           -6.7          -10.9<br />
  5          7.9            12.5           -6.1          -10.9<br />
[Plot: output SNR [dB] vs. input SNR [dB] (−5 to 5) for TF-GSC and CTF-GSC.]<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
Estimation<br />
Unknowns<br />
Desired sources RTFs, ˜H d .<br />
Interferences subspace basis, Q.<br />
Limiting Assumptions<br />
The RIRs are time invariant.<br />
A segment without interfering speakers is available for each desired<br />
source (double talk among desired sources is allowed).<br />
A segment without desired speakers is available for each interfering<br />
source (double talk among interfering sources is allowed).<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
Interferences Subspace Estimation<br />
Step 1<br />
EVD <strong>and</strong> Pruning<br />
Estimate the signals subspace at each time segment without any desired sources active:<br />
\( \hat{\Phi}^i_{zz} = \bar{Q}_i \Lambda_i \bar{Q}_i^H \)<br />
All eigenvectors corresponding to “weak” eigenvalues are discarded.<br />
[Plot: eigenvalues per frequency, in dB (−30 to 60), vs. frequency index (100–1000).]<br />
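Step 1 can be sketched as follows; the dB threshold rule for pruning “weak” eigenvalues is an illustrative choice, not specified on the slides:

```python
import numpy as np

def prune_signal_subspace(phi_zz, thresh_db=20.0):
    """Estimate a signal subspace from a spatial PSD matrix by EVD,
    discarding eigenvectors whose eigenvalues are 'weak'.

    phi_zz    : Hermitian spatial PSD estimate, shape (M, M)
    thresh_db : eigenvalues more than thresh_db below the strongest
                one are treated as noise and pruned (assumed rule).
    """
    lam, Q = np.linalg.eigh(phi_zz)   # eigenvalues in ascending order
    lam, Q = lam[::-1], Q[:, ::-1]    # reorder: strongest first
    keep = lam > lam[0] * 10 ** (-thresh_db / 10)
    return Q[:, keep], lam[keep]      # pruned basis Q-bar_i, Lambda_i
```

Running this per noise-only segment i yields the pruned pairs \( (\bar{Q}_i, \Lambda_i) \) that Step 2 merges.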
Multi-Constraint GSC<br />
Parameter Estimation<br />
Interferences Subspace Estimation<br />
Step 2<br />
Union of Estimates using Gram-Schmidt<br />
Apply QRD:<br />
\( P = \left[\, \bar{Q}_1 \bar{\Lambda}_1^{1/2} \;\cdots\; \bar{Q}_{N_{\mathrm{seg}}} \bar{\Lambda}_{N_{\mathrm{seg}}}^{1/2} \,\right], \qquad P = QR \)<br />
Discard vectors from the basis Q that correspond to “weak” coefficients in R.<br />
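A minimal numpy sketch of Step 2, under the assumption that “weak” R coefficients are those whose diagonal magnitude falls below a relative threshold (the threshold value is illustrative):

```python
import numpy as np

def union_of_subspaces(Q_list, lam_list, thresh=1e-2):
    """Merge per-segment subspace estimates into one basis via QRD
    (Gram-Schmidt), pruning directions with weak R coefficients.

    Q_list   : list of pruned eigenvector matrices, each (M, r_i)
    lam_list : list of matching eigenvalue vectors, each (r_i,)
    """
    # Stack eigenvalue-weighted bases: P = [Q_1 L_1^{1/2} ... Q_N L_N^{1/2}]
    P = np.hstack([Q @ np.diag(np.sqrt(l)) for Q, l in zip(Q_list, lam_list)])
    Q, R = np.linalg.qr(P)                 # Gram-Schmidt via QRD
    diag = np.abs(np.diag(R))
    keep = diag > thresh * diag.max()      # drop "weak" directions
    return Q[:, keep]
```

Segments that merely re-observe an already-covered direction contribute a near-zero diagonal entry in R, so the redundant column is pruned and only genuinely new interference directions enlarge the basis.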
Multi-Constraint GSC<br />
Parameter Estimation<br />
EVD per Frame - Graphical Interpretation<br />
Frame 1, strong eigenvectors<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
EVD per Frame - Graphical Interpretation<br />
Frame 2, strong eigenvectors<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
EVD per Frame - Graphical Interpretation<br />
Frame 3, strong eigenvectors<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
QRD Calculation<br />
Graphical Interpretation<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
QRD Pruning<br />
Graphical Interpretation<br />
Multi-Constraint GSC<br />
Parameter Estimation<br />
Desired Sources RTF Estimation<br />
For a Single Desired Source<br />
PSD Estimation<br />
Estimate the spatial PSD of the stationary sources (during noise-only segments).<br />
Estimate the spatial PSD of one desired source plus the stationary sources.<br />
Largest Generalized Eigenvector<br />
f = GEVD(“Desired+Noise PSD”, “Noise PSD”).<br />
Rotate <strong>and</strong> normalize f to obtain the RTF \( \hat{\tilde{h}}^d_j \).<br />
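These two steps can be sketched as follows. The rotation is taken to be multiplication by the noise PSD matrix and the normalization fixes unit gain at a reference microphone; both are assumptions about the slide's shorthand, as are the function and variable names:

```python
import numpy as np
from scipy.linalg import eigh

def rtf_from_gevd(phi_dn, phi_n, ref_mic=0):
    """Estimate the desired source's RTF from the largest generalized
    eigenvector of (desired+noise PSD, noise PSD).

    phi_dn : spatial PSD during desired-source + noise segments, (M, M)
    phi_n  : spatial PSD during noise-only segments, (M, M)
    """
    w, V = eigh(phi_dn, phi_n)    # generalized EVD, ascending eigenvalues
    f = V[:, -1]                  # principal generalized eigenvector
    h = phi_n @ f                 # "rotate" back to the signal domain
    return h / h[ref_mic]         # "normalize": unit gain at reference mic
```

With a rank-one desired-source model, the recovered vector is proportional to the source's steering vector, so dividing by the reference-microphone entry yields the RTF directly.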
Binaural LCMV<br />
Experimental Study<br />
Setup<br />
Hearing device:<br />
2 hearing aid devices mounted on a B&K HATS, each with 2 microphones,<br />
2 cm inter-microphone distance.<br />
A 9 × 5 utility device with 4 mics. at the corners, average distance<br />
3.5 cm. The device was placed on a table at a distance of 0.5 m.<br />
Signals:<br />
1 desired speaker at θ_d = 30°, 1 m (constrained).<br />
1 interfering speaker at θ_i = −70°, 1 m (constrained).<br />
1 directional stationary noise at θ_n = −40°, 2.5 m (unconstrained).<br />
SIR=0dB, SNR=14dB.<br />
Acoustic lab:<br />
Dimensions 6 × 6 × 2.4 m; controllable reverberation time, T_60 = 0.3 s.<br />
STFT:<br />
Sampling frequency 8kHz, 4096 points, 75% overlap.<br />
Algorithm Cue gain factors:<br />
Desired speech - η = 1.<br />
Interfering speech - µ = 0.1 (20 dB attenuation).<br />