LR Rabiner and RW Schafer, June 3
DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
8.5. HOMOMORPHIC FILTERING OF NATURAL SPEECH
radiation of sound at the lips for unvoiced speech, while $h[n] = h_V[n]$ contains an
additional convolutional component due to the glottal pulse for voiced speech.$^{12}$
Furthermore, we assume that the impulse response $h[n]$ is short compared to
the length of the window so that the windowed segment can be represented as
$$x[n] = w[n]s[n] = w[n]\,(e[n] \ast h[n]) \approx e_w[n] \ast h[n], \qquad 0 \le n \le L-1, \tag{8.87}$$
where $e_w[n] = w[n]e[n]$; i.e., any tapering due to the analysis window is incorporated
into the excitation as a slowly varying amplitude modulation.
In the case of unvoiced speech, the excitation $e[n]$ would be white noise and
$h[n] = h_U[n]$. In the case of voiced speech, $h[n] = h_V[n]$ and $e[n]$ would be a
unit impulse train of the form
$$e[n] = p[n] = \sum_{k=0}^{N_w-1} \delta[n - kN_p], \tag{8.88}$$
where $N_w$ is the number of impulses in the window and $N_p$ is the discrete-time
pitch period (measured in samples).
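As a rough numerical check of the approximation in (8.87) for the voiced case, the following Python sketch compares $w[n](e[n] \ast h[n])$ with $(w[n]e[n]) \ast h[n]$. The Hamming window, the values `L = 400` and `Np = 80`, and the 40-sample decaying resonance used for `h` are illustrative assumptions, not values taken from the text.

```python
import math

def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue                      # skip the zeros of the impulse train
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

L = 400    # window length in samples (assumed)
Np = 80    # pitch period in samples (assumed)

# Analysis window w[n]: a Hamming window (assumed choice).
w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

# Voiced excitation e[n] = p[n]: unit impulses every Np samples, as in (8.88).
e = [1.0 if n % Np == 0 else 0.0 for n in range(L)]

# Toy impulse response h[n], short compared to L (hypothetical resonance).
h = [(0.9 ** n) * math.cos(0.3 * math.pi * n) for n in range(40)]

# Left side of (8.87): window applied after the convolution.
s = convolve(e, h)[:L]
x_exact = [w[n] * s[n] for n in range(L)]

# Right side of (8.87): window folded into the excitation, e_w[n] = w[n] e[n].
ew = [w[n] * e[n] for n in range(L)]
x_approx = convolve(ew, h)[:L]

# Relative RMS error between the two forms.
err = sum((a - b) ** 2 for a, b in zip(x_exact, x_approx))
energy = sum(a * a for a in x_exact)
rel_err = math.sqrt(err / energy)
print(f"relative error of the (8.87) approximation: {rel_err:.3f}")
```

Lengthening `h` relative to `L` in this sketch should increase the error, which is the sense in which (8.87) relies on $h[n]$ being short compared to the window.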
For voiced speech, the windowed excitation is
$$e_w[n] = w[n]p[n] = \sum_{k=0}^{N_w-1} w_{N_p}[k]\,\delta[n - kN_p], \tag{8.89}$$
where $w_{N_p}[k]$ is the “time-sampled” window sequence defined as
$$w_{N_p}[k] = \begin{cases} w[kN_p], & k = 0, 1, \ldots, N_w - 1 \\ 0, & \text{otherwise.} \end{cases} \tag{8.90}$$
From (8.89), the DTFT of $e_w[n]$ is
$$E_w(e^{j\omega}) = \sum_{k=0}^{N_w-1} w_{N_p}[k]\,e^{-j\omega k N_p} = W_{N_p}(e^{j\omega N_p}), \tag{8.91}$$
and from (8.91) it follows that $E_w(e^{j\omega})$ is periodic in $\omega$ with period $2\pi/N_p$.
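The periodicity can be confirmed numerically. The sketch below evaluates the sum in (8.91) directly at a few frequencies and at the same frequencies shifted by $2\pi/N_p$. The Hamming window, `L = 400`, `Np = 80`, and the test frequencies are assumptions chosen only for illustration.

```python
import cmath
import math

L, Np = 400, 80   # window length and pitch period in samples (assumed values)

# Hamming window (an assumed choice) and its samples at the pitch pulses.
w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]
Nw = len(range(0, L, Np))                 # number of impulses in the window
w_Np = [w[k * Np] for k in range(Nw)]     # time-sampled window w_Np[k]

def E_w(omega):
    """DTFT of the windowed impulse train, evaluated directly from (8.91)."""
    return sum(w_Np[k] * cmath.exp(-1j * omega * k * Np) for k in range(Nw))

period = 2 * math.pi / Np
test_freqs = [0.05, 0.3, 1.0, 2.5]        # arbitrary frequencies in rad/sample

# Shifting omega by 2*pi/Np multiplies each term by exp(-j*2*pi*k) = 1,
# so the difference should vanish up to rounding error.
max_diff = max(abs(E_w(om) - E_w(om + period)) for om in test_freqs)
print(f"max |E_w(w) - E_w(w + 2*pi/Np)| = {max_diff:.2e}")
```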
Therefore,
$$\hat{X}(e^{j\omega}) = \log\{H_V(e^{j\omega})\} + \log\{E_w(e^{j\omega})\} \tag{8.92}$$
has two components: (1) $\log\{H_V(e^{j\omega})\}$, due to the vocal tract frequency response,
which is slowly varying in $\omega$, and (2) $\log\{W_{N_p}(e^{j\omega N_p})\}$, which is due to
the excitation and periodic with period $2\pi/N_p$.$^{13}$ The complex cepstrum of the
windowed speech segment $x[n]$ is therefore
$$\hat{x}[n] = \hat{h}_V[n] + \hat{e}_w[n]. \tag{8.93}$$
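The additive structure in (8.93) can be illustrated with a small synthetic example. The sketch below is not the complex cepstrum computation itself: to avoid phase unwrapping it uses the real cepstrum (the inverse DFT of $\log|X|$), for which the same sum of a slowly varying term and a periodic term applies. It also uses direct DFT sums rather than an FFT so that it needs only the standard library. The signal parameters (`L`, `Np`, `N`, the Hamming window, the toy resonance `h`) and the lag search range are all assumptions.

```python
import cmath
import math

L, Np, N = 400, 80, 1024   # window length, pitch period, DFT size (all assumed)

# Toy voiced segment: impulse train through a short resonance, Hamming-windowed.
w = [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]
h = [(0.9 ** n) * math.cos(0.3 * math.pi * n) for n in range(40)]
s = [0.0] * L
for start in range(0, L, Np):             # one copy of h per pitch pulse
    for m, hm in enumerate(h):
        if start + m < L:
            s[start + m] += hm
x = [w[n] * s[n] for n in range(L)]

# Twiddle table: tw[k] = exp(-j*2*pi*k/N), so each DFT term is a lookup.
tw = [cmath.exp(-2j * math.pi * k / N) for k in range(N)]

# log|X[k]| at N uniformly spaced frequencies (tiny floor guards against log(0)).
log_mag = []
for k in range(N):
    Xk = sum(x[n] * tw[(k * n) % N] for n in range(L))
    log_mag.append(math.log(abs(Xk) + 1e-12))

def real_cepstrum(n):
    """Inverse DFT of log|X[k]| evaluated at a single lag n."""
    return sum(log_mag[k] * tw[(-k * n) % N] for k in range(N)).real / N

# Look for the excitation component over an assumed range of plausible lags,
# chosen to lie beyond the region where the short vocal tract term dominates.
search = range(30, 151)
peak = max(search, key=real_cepstrum)
print(f"largest cepstral value in the search range is at lag n = {peak}")
```

In this toy setup the excitation term should show up as an isolated peak at a lag equal to the pitch period, while the contribution of `h` is concentrated at small lags.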
$^{12}$ Note that it is often convenient to incorporate the excitation gain ($A_V$ or $A_U$ in Figure
8.12) into $h[n]$ so that we can assume that $e[n]$ consists of unit impulses for voiced excitation
and unit variance white noise for unvoiced excitation.
$^{13}$ For signals sampled with sampling rate $F_s$, this period corresponds to $F_s/N_p$ Hz in cyclic
analog frequency.