
LR Rabiner and RW Schafer, June 3


DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009

8.3. HOMOMORPHIC ANALYSIS OF THE SPEECH MODEL

by a quasi-periodic impulse train or a random noise signal. As we have seen repeatedly in previous chapters, the fundamental problem of speech analysis is to reliably and robustly estimate the parameters of the model of Figure 8.12 (i.e., the pitch period control, the shape of the glottal pulse, the gains for the voiced or unvoiced excitation signals, the state of the voiced/unvoiced switch, the vocal tract parameters, and the radiation model), and to measure the variations of these model control parameters with time.

Figure 8.12: General discrete-time model of speech production.<br />

Since the excitation and impulse response of a linear time-invariant system are combined by convolution, the problem of speech analysis can also be viewed as a problem in separating the components of a convolution; homomorphic systems and the cepstrum are therefore useful tools for speech analysis. In the model of Figure 8.12, the pressure signal at the lips, s[n], for a voiced section of speech is represented as the convolution

s[n] = p[n] ∗ h_V[n],    (8.37a)

where p[n] is the quasi-periodic voiced excitation signal, and h_V[n] represents the combined effect of the vocal tract impulse response v[n], the glottal pulse g[n], the radiation load response at the lips r[n], and the voiced gain A_V. The effective impulse response, h_V[n], is itself the convolution of g[n], v[n], and r[n], including scaling by the voiced section gain control, A_V, i.e.,

h_V[n] = A_V · g[n] ∗ v[n] ∗ r[n].    (8.37b)
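As a rough numerical illustration of Eqs. (8.37a) and (8.37b), the following Python sketch builds a synthetic voiced segment by convolution and checks that the real cepstrum turns the convolution into a sum. Everything numerical here is an assumption made for illustration and is not taken from the text: the sampling rate, the pitch period, the gain value, the simple shapes used for g[n], v[n], and r[n], the decaying pulse amplitudes in p[n] (chosen so that no DFT sample is exactly zero), and the helper name `real_cepstrum`.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: inverse DFT of the log magnitude of the DFT (hypothetical helper)."""
    spectrum = np.fft.fft(x, nfft)
    return np.fft.ifft(np.log(np.abs(spectrum))).real

# Assumed values, chosen only for illustration.
fs = 8000            # sampling rate in Hz
pitch_period = 80    # pitch period in samples (100 Hz pitch)
num_pulses = 5       # number of pitch periods in the segment
A_V = 2.0            # voiced gain

# p[n]: quasi-periodic excitation, modeled as an impulse train whose
# pulse amplitudes decay slowly (assumption; avoids exact spectral zeros).
p = np.zeros(pitch_period * num_pulses)
p[::pitch_period] = 0.9 ** np.arange(num_pulses)

# g[n]: smooth glottal-like pulse (truncated double-pole response, assumed shape).
n_g = np.arange(40)
g = (n_g + 1) * 0.8 ** n_g

# v[n]: one damped resonance near 500 Hz standing in for the vocal tract.
n_v = np.arange(200)
v = 0.97 ** n_v * np.cos(2 * np.pi * 500 / fs * n_v)

# r[n]: first-difference approximation to the radiation load.
r = np.array([1.0, -0.98])

# Eq. (8.37b): h_V[n] = A_V * (g * v * r)[n]
h_V = A_V * np.convolve(np.convolve(g, v), r)

# Eq. (8.37a): s[n] = (p * h_V)[n]
s = np.convolve(p, h_V)

# With an FFT at least as long as s, the DFTs multiply exactly,
# so the log magnitudes (and hence the real cepstra) add.
nfft = 1024
c_s = real_cepstrum(s, nfft)
c_p = real_cepstrum(p, nfft)
c_h = real_cepstrum(h_V, nfft)

max_additivity_error = np.max(np.abs(c_s - (c_p + c_h)))

# Search for the strongest cepstral peak away from the low-quefrency region.
peak_quefrency = 20 + int(np.argmax(c_s[20:200]))

print("max |c_s - (c_p + c_h)| =", max_additivity_error)
print("strongest cepstral peak (samples):", peak_quefrency)
```

In this sketch the contribution of h_V[n] is concentrated at low quefrencies, while the excitation contributes isolated peaks at multiples of the assumed pitch period, which is the sense in which the cepstrum separates the components of the convolution.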

Recall that it is common usage, when it is not necessary to make a fine distinction, to refer to h_V[n] as simply “the vocal tract impulse response for voiced
