LR Rabiner and RW Schafer, June 3
DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009
8.3. HOMOMORPHIC ANALYSIS OF THE SPEECH MODEL
[Figure 8.19: three panels plotted against frequency in Hz (0 to 5000). (a) Log Magnitude Spectrum of Synthetic Voiced Speech, vertical axis $\log_e |S(e^{j2\pi FT})|$; (b) Continuous Phase of Synthetic Voiced Speech, vertical axis $\arg[S(e^{j2\pi FT})]$; (c) Principal Value Phase of Synthetic Voiced Speech, vertical axis $\mathrm{ARG}[S(e^{j2\pi FT})]$.]
Figure 8.19: Frequency-domain representation of the complex cepstrum. (a) Log magnitude $\log |S(e^{j2\pi FT})|$ (real part of $\hat{S}(e^{j2\pi FT})$), (b) Continuous phase $\arg\{S(e^{j2\pi FT})\}$ (imaginary part of $\hat{S}(e^{j2\pi FT})$), (c) Principal value phase $\mathrm{ARG}\{S(e^{j2\pi FT})\}$. The heavy lines in (a) and (b) represent $\hat{H}_V(e^{j2\pi FT}) = \log |H_V(e^{j2\pi FT})| + j \arg\{H_V(e^{j2\pi FT})\}$.
since no z-transform representation exists directly for the random noise input signal itself. However, if we employ the autocorrelation and power spectrum representation of the model for unvoiced speech production, we can obtain results similar to those for voiced speech.
Recall that for unvoiced speech, we have no glottal pulse excitation, so the model output is $s[n] = h_U[n] * u[n] = v[n] * r[n] * (A_U u[n])$, where $u[n]$ is a unit-variance white noise sequence. The autocorrelation representation of unvoiced speech is therefore
$$\phi_{ss}[n] = \phi_{vv}[n] * \phi_{rr}[n] * (A_U^2\, \delta[n]) = A_U^2\, \phi_{vv}[n] * \phi_{rr}[n], \tag{8.47}$$
where $\phi_{vv}[n]$ and $\phi_{rr}[n]$ are the deterministic autocorrelation functions of the vocal tract and radiation systems, respectively. These are combined by convolution.
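As a rough numerical check of (8.47), the following Python sketch drives a short FIR model with unit-variance white noise and compares a time-average estimate of $\phi_{ss}[n]$ with $A_U^2\, \phi_{vv}[n] * \phi_{rr}[n]$. The impulse responses `v` and `r`, the gain `A_U`, the record length `N`, and the helper name `deterministic_autocorr` are illustrative assumptions, not values or names taken from the text.

```python
import numpy as np

def deterministic_autocorr(x):
    """Return phi_xx[m] = sum_n x[n] x[n+m] for lags m = -(L-1), ..., L-1."""
    x = np.asarray(x, dtype=float)
    return np.correlate(x, x, mode="full")

# Hypothetical model components (illustrative values only).
v = np.array([1.0, 0.9, 0.5, 0.2, -0.1])  # stand-in vocal tract impulse response
r = np.array([1.0, -0.98])                # stand-in radiation impulse response
A_U = 0.5                                 # stand-in unvoiced gain

phi_vv = deterministic_autocorr(v)
phi_rr = deterministic_autocorr(r)

# Right-hand side of (8.47): A_U^2 times the convolution of phi_vv and phi_rr.
phi_ss_model = A_U**2 * np.convolve(phi_vv, phi_rr)

# Simulate s[n] = v[n] * r[n] * (A_U u[n]) with unit-variance white noise u[n].
rng = np.random.default_rng(0)
N = 200_000
u = rng.standard_normal(N)
s = np.convolve(np.convolve(v, r), A_U * u)[:N]

# Time-average estimate of phi_ss[m] over the same lags as phi_ss_model.
max_lag = (len(phi_ss_model) - 1) // 2
lags = np.arange(-max_lag, max_lag + 1)
phi_ss_est = np.array(
    [np.dot(s[: N - abs(m)], s[abs(m):]) / N for m in lags]
)

# The estimate should approach the model as N grows.
max_err = np.max(np.abs(phi_ss_est - phi_ss_model))
print("largest deviation over all lags:", max_err)
```

The estimate is a finite-length time average, so it matches the model only approximately; the deviation shrinks as `N` increases.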
The z-transform of $\phi_{ss}[n]$ exists and is given by
$$\Phi_{ss}(z) = A_U^2\, \Phi_{vv}(z)\, \Phi_{rr}(z), \tag{8.48}$$
where
$$\Phi_{vv}(z) = V(z)\, V(z^{-1}) \tag{8.49a}$$
$$\Phi_{rr}(z) = R(z)\, R(z^{-1}) \tag{8.49b}$$
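The relations (8.48) and (8.49) can be spot-checked numerically for finite-length sequences by evaluating each z-transform at a single test point. The sketch below assumes short FIR stand-ins for $v[n]$ and $r[n]$; the values of `v`, `r`, `A_U`, the test point `z0`, and the helper `ztransform` are hypothetical choices made for illustration only.

```python
import numpy as np

def ztransform(x, n0, z):
    """Evaluate X(z) = sum_n x[n] z^{-n} for a finite sequence starting at index n0."""
    n = n0 + np.arange(len(x))
    return np.sum(np.asarray(x, dtype=complex) * z ** (-n.astype(float)))

# Hypothetical model components (illustrative values only).
v = np.array([1.0, 0.9, 0.5, 0.2, -0.1])  # stand-in vocal tract impulse response
r = np.array([1.0, -0.98])                # stand-in radiation impulse response
A_U = 0.5                                 # stand-in unvoiced gain

# Deterministic autocorrelations; lags run from -(L-1) to L-1.
phi_vv = np.correlate(v, v, mode="full")
phi_rr = np.correlate(r, r, mode="full")
phi_ss = A_U**2 * np.convolve(phi_vv, phi_rr)

z0 = 1.1 * np.exp(1j * 0.7)  # arbitrary test point off the unit circle

Phi_vv = ztransform(phi_vv, -(len(v) - 1), z0)
Phi_rr = ztransform(phi_rr, -(len(r) - 1), z0)
Phi_ss = ztransform(phi_ss, -(len(v) + len(r) - 2), z0)

def V(z):
    return ztransform(v, 0, z)

def R(z):
    return ztransform(r, 0, z)

check_849a = np.isclose(Phi_vv, V(z0) * V(1 / z0))        # (8.49a)
check_849b = np.isclose(Phi_rr, R(z0) * R(1 / z0))        # (8.49b)
check_848 = np.isclose(Phi_ss, A_U**2 * Phi_vv * Phi_rr)  # (8.48)
print(check_849a, check_849b, check_848)
```

A single test point does not prove the identities, but agreement at an arbitrary `z0` off the unit circle is a useful sanity check on the lag indexing and on the $z^{-1}$ argument in (8.49).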