
LR Rabiner and RW Schafer, June 3



DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009

8.5. HOMOMORPHIC FILTERING OF NATURAL SPEECH

radiation of sound at the lips for unvoiced speech, while $h[n] = h_V[n]$ contains an additional convolutional component due to the glottal pulse for voiced speech.$^{12}$ Furthermore, we assume that the impulse response $h[n]$ is short compared to the length of the window so that the windowed segment can be represented as

$$x[n] = w[n]s[n] = w[n](e[n] * h[n]) \approx e_w[n] * h[n], \qquad 0 \le n \le L-1, \tag{8.87}$$

where $e_w[n] = w[n]e[n]$; i.e., any tapering due to the analysis window is incorporated into the excitation as a slowly varying amplitude modulation.
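The approximation in (8.87) can be checked numerically. The following is a minimal sketch in Python with NumPy, under assumptions that are not in the text: a Hamming analysis window, an impulse-train excitation, a hypothetical decaying-exponential impulse response, and illustrative values for $L$ and the impulse spacing. The function name `windowed_convolution_error` is likewise made up for illustration.

```python
import numpy as np

def windowed_convolution_error(L=400, Np=80, decay=0.8, h_len=30):
    """Relative error between w[n](e[n]*h[n]) and (w[n]e[n])*h[n] on 0 <= n < L."""
    w = np.hamming(L)                      # assumed analysis window w[n]
    e = np.zeros(L)
    e[::Np] = 1.0                          # assumed excitation: unit impulses every Np samples
    h = decay ** np.arange(h_len)          # hypothetical impulse response h[n]
    s = np.convolve(e, h)[:L]              # s[n] = e[n] * h[n]
    exact = w * s                          # x[n] = w[n] s[n]
    approx = np.convolve(w * e, h)[:L]     # e_w[n] * h[n], with e_w[n] = w[n] e[n]
    return np.linalg.norm(exact - approx) / np.linalg.norm(exact)

short_err = windowed_convolution_error()                        # h[n] much shorter than the window
long_err = windowed_convolution_error(decay=0.99, h_len=400)    # h[n] comparable to the window
print(short_err, long_err)
```

With the short impulse response the two sides of (8.87) should agree closely; when $h[n]$ is made comparable in length to the window, the approximation should degrade noticeably, which is why the short-$h[n]$ assumption matters.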

In the case of unvoiced speech, the excitation $e[n]$ would be white noise and $h[n] = h_U[n]$. In the case of voiced speech, $h[n] = h_V[n]$ and $e[n]$ would be a unit impulse train of the form

$$e[n] = p[n] = \sum_{k=0}^{N_w-1} \delta[n - kN_p], \tag{8.88}$$

where $N_w$ is the number of impulses in the window and $N_p$ is the discrete-time pitch period (measured in samples).

For voiced speech, the windowed excitation is

$$e_w[n] = w[n]p[n] = \sum_{k=0}^{N_w-1} w_{N_p}[k]\,\delta[n - kN_p], \tag{8.89}$$

where $w_{N_p}[k]$ is the "time-sampled" window sequence defined as

$$w_{N_p}[k] = \begin{cases} w[kN_p] & k = 0, 1, \ldots, N_w - 1 \\ 0 & \text{otherwise.} \end{cases} \tag{8.90}$$

From (8.89), the DTFT of $e_w[n]$ is

$$E_w(e^{j\omega}) = \sum_{k=0}^{N_w-1} w_{N_p}[k]\,e^{-j\omega kN_p} = W_{N_p}(e^{j\omega N_p}), \tag{8.91}$$

and from (8.91) it follows that $E_w(e^{j\omega})$ is periodic in $\omega$ with period $2\pi/N_p$.
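The periodicity of the windowed-excitation spectrum can be illustrated with a short NumPy sketch. The Hamming window, the values of `L`, `Np`, and `nfft`, and the function name are assumptions chosen for illustration; `nfft` is deliberately an integer multiple of `Np` so that the period $2\pi/N_p$ falls on a whole number of DFT bins.

```python
import numpy as np

def windowed_excitation_dtft(L=320, Np=80, nfft=1280):
    """Return e_w[n], the time-sampled window w_Np[k], and nfft samples of E_w."""
    w = np.hamming(L)                  # assumed analysis window w[n]
    p = np.zeros(L)
    p[::Np] = 1.0                      # impulse train: impulses at n = 0, Np, 2Np, ...
    e_w = w * p                        # windowed excitation e_w[n] = w[n] p[n]
    w_Np = w[::Np]                     # time-sampled window w_Np[k] = w[k*Np]
    E_w = np.fft.fft(e_w, nfft)        # E_w(e^{jw}) sampled at w = 2*pi*m/nfft
    return e_w, w_Np, E_w

e_w, w_Np, E_w = windowed_excitation_dtft()
bins_per_period = 1280 // 80           # the period 2*pi/Np expressed in DFT bins
# Shifting the spectrum by one period should leave it unchanged.
periodic = np.allclose(E_w, np.roll(E_w, bins_per_period))
# One period of E_w should equal the DTFT of w_Np[k] evaluated at w*Np.
matches_W = np.allclose(E_w[:bins_per_period], np.fft.fft(w_Np, bins_per_period))
print(periodic, matches_W)
```

Both checks follow from the fact that $e_w[n]$ is nonzero only at multiples of $N_p$, so every term $e^{-j\omega kN_p}$ repeats when $\omega$ advances by $2\pi/N_p$.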

Therefore,

$$\hat{X}(e^{j\omega}) = \log\{H_V(e^{j\omega})\} + \log\{E_w(e^{j\omega})\} \tag{8.92}$$

has two components: (1) $\log\{H_V(e^{j\omega})\}$, due to the vocal tract frequency response, which is slowly varying in $\omega$, and (2) $\log\{W_{N_p}(e^{j\omega N_p})\}$, which is due to the excitation and periodic with period $2\pi/N_p$.$^{13}$ The complex cepstrum of the windowed speech segment $x[n]$ is therefore

$$\hat{x}[n] = \hat{h}_V[n] + \hat{e}_w[n]. \tag{8.93}$$
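The separation into a slowly varying component and a periodic component can be seen in a small synthetic example. The sketch below is not the complex cepstrum of the text: to avoid phase unwrapping it swaps in the real cepstrum (the inverse DFT of $\log|X|$), which is the even part of the complex cepstrum and keeps the same additive structure. The impulse response, window, parameter values, and function names are all hypothetical.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Inverse DFT of log|X|; for real x this is the even part of the complex cepstrum."""
    X = np.fft.fft(x, nfft)
    return np.fft.ifft(np.log(np.abs(X) + 1e-12)).real   # small offset guards log(0)

def cepstral_peak(L=400, Np=80, nfft=2048, q_lo=40, q_hi=200):
    """Locate the largest cepstral value in the quefrency range [q_lo, q_hi)."""
    p = np.zeros(L)
    p[::Np] = 1.0                                   # impulse train p[n] with period Np
    m = np.arange(40)
    h = 0.9 ** m * np.cos(0.3 * np.pi * m)          # hypothetical short h_V[n]
    x = np.hamming(L) * np.convolve(p, h)[:L]       # x[n] = w[n](p[n] * h[n])
    c = real_cepstrum(x, nfft)
    return q_lo + int(np.argmax(c[q_lo:q_hi]))      # search above the low-quefrency region

print(cepstral_peak())
```

Under these assumptions the vocal tract contribution is concentrated at low quefrencies, so the largest value in the search range is expected at or very near $n = N_p$, where the excitation component has its first peak.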

12 Note that it is often convenient to incorporate the excitation gain ($A_V$ or $A_U$ in Figure 8.12) into $h[n]$ so that we can assume that $e[n]$ consists of unit impulses for voiced excitation and unit-variance white noise for unvoiced excitation.

13 For signals sampled with sampling rate $F_s$, this period corresponds to $F_s/N_p$ Hz in cyclic analog frequency.
