
LR Rabiner and RW Schafer, June 3


DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009

8.3. HOMOMORPHIC ANALYSIS OF THE SPEECH MODEL

[Figure 8.19 placeholder. Three panels plotted against frequency in Hz (0 to 5000): (a) Log Magnitude Spectrum of Synthetic Voiced Speech, (b) Continuous Phase of Synthetic Voiced Speech, (c) Principal Value Phase of Synthetic Voiced Speech.]

Figure 8.19: Frequency-domain representation of the complex cepstrum. (a) Log magnitude $\log|S(e^{j2\pi FT})|$ (real part of $\hat{S}(e^{j2\pi FT})$), (b) continuous phase $\arg\{S(e^{j2\pi FT})\}$ (imaginary part of $\hat{S}(e^{j2\pi FT})$), (c) principal value phase $\mathrm{ARG}\{S(e^{j2\pi FT})\}$. The heavy lines in (a) and (b) represent $\hat{H}_V(e^{j2\pi FT}) = \log|H_V(e^{j2\pi FT})| + j\arg\{H_V(e^{j2\pi FT})\}$.

since no z-transform representation exists directly for the random noise input signal itself. However, if we employ the autocorrelation and power spectrum representation for the model for unvoiced speech production, we can obtain results similar to those for voiced speech.

Recall that for unvoiced speech, we have no glottal pulse excitation, so the model output is $s[n] = h_U[n] * u[n] = v[n] * r[n] * (A_U u[n])$, where $u[n]$ is a unit-variance white noise sequence. The autocorrelation representation of unvoiced speech is therefore

$$\phi_{ss}[n] = \phi_{vv}[n] * \phi_{rr}[n] * (A_U^2\,\delta[n]) = A_U^2\,\phi_{vv}[n] * \phi_{rr}[n], \tag{8.47}$$

where $\phi_{vv}[n]$ and $\phi_{rr}[n]$ are the deterministic autocorrelation functions of the vocal tract and radiation systems, respectively. These are combined by convolution.
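The convolution structure of (8.47) can be checked numerically. The sketch below is only an illustration under stated assumptions: the short sequences `v` and `r` and the gain `A_U` are made-up values, not taken from the text, and the helper names `convolve` and `autocorr` are hypothetical. It computes the deterministic autocorrelation of the cascade $A_U\,(v[n] * r[n])$ directly and compares it with $A_U^2\,\phi_{vv}[n] * \phi_{rr}[n]$.

```python
def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def autocorr(x):
    """Deterministic autocorrelation x[n] * x[-n].

    For a length-N input the result covers lags -(N-1)..(N-1),
    with lag 0 stored at index N-1.
    """
    return convolve(x, x[::-1])


# Assumed (hypothetical) values, chosen only for illustration.
A_U = 0.5                      # unvoiced gain
v = [1.0, 0.9, 0.4, -0.2]      # stand-in vocal tract impulse response
r = [1.0, -0.95]               # stand-in radiation impulse response

# Left side: autocorrelation of the overall response h_U[n] = A_U (v * r)[n].
h_U = [A_U * x for x in convolve(v, r)]
lhs = autocorr(h_U)

# Right side of (8.47): A_U^2 * (phi_vv * phi_rr).
rhs = [A_U ** 2 * x for x in convolve(autocorr(v), autocorr(r))]

max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
print("largest difference between the two sides:", max_err)
```

Because the white noise input has unit variance, its autocorrelation is $\delta[n]$, so the comparison involves only the deterministic responses; the two sides should agree to within floating-point rounding.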

The z-transform of $\phi_{ss}[n]$ exists and is given by

$$\Phi_{ss}(z) = A_U^2\,\Phi_{vv}(z)\,\Phi_{rr}(z), \tag{8.48}$$

where

$$\Phi_{vv}(z) = V(z)\,V(z^{-1}) \tag{8.49a}$$

$$\Phi_{rr}(z) = R(z)\,R(z^{-1}) \tag{8.49b}$$
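On the unit circle, $z = e^{j\omega}$, and for real-valued $v[n]$ and $r[n]$, (8.49a) and (8.49b) reduce to $|V(e^{j\omega})|^2$ and $|R(e^{j\omega})|^2$, so (8.48) can be checked numerically at a few frequencies. The sketch below is an illustration only: the sequences `v` and `r`, the gain `A_U`, and the helper names (`convolve`, `autocorr`, `evaluate_on_unit_circle`) are assumptions introduced here, not values or functions from the text.

```python
import cmath
import math


def convolve(a, b):
    """Full linear convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def autocorr(x):
    """Deterministic autocorrelation x[n] * x[-n]; lag 0 sits at index len(x)-1."""
    return convolve(x, x[::-1])


def evaluate_on_unit_circle(x, omega, origin=0):
    """Evaluate the z-transform of x at z = e^{j omega}.

    x[origin] is taken to be the sample at time index 0.
    """
    return sum(c * cmath.exp(-1j * omega * (k - origin)) for k, c in enumerate(x))


# Assumed (hypothetical) values, chosen only for illustration.
A_U = 0.5
v = [1.0, 0.9, 0.4, -0.2]
r = [1.0, -0.95]

# phi_ss[n] built from (8.47); it is two-sided, with lag 0 in the middle.
phi_ss = [A_U ** 2 * x for x in convolve(autocorr(v), autocorr(r))]
zero_lag = (len(phi_ss) - 1) // 2

max_err = 0.0
for k in range(33):
    omega = math.pi * k / 32           # frequencies from 0 to pi
    Phi_ss = evaluate_on_unit_circle(phi_ss, omega, origin=zero_lag)
    V = evaluate_on_unit_circle(v, omega)
    R = evaluate_on_unit_circle(r, omega)
    expected = A_U ** 2 * abs(V) ** 2 * abs(R) ** 2   # (8.48) with (8.49a,b)
    max_err = max(max_err, abs(Phi_ss - expected))

print("largest deviation from (8.48) on the unit circle:", max_err)
```

The transform of the two-sided autocorrelation is evaluated with its zero-lag sample as the time origin; with that convention the result is real and nonnegative, as a power spectrum should be.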
