LR Rabiner and RW Schafer, June 3
DRAFT: L. R. Rabiner and R. W. Schafer, June 3, 2009<br />
8.3. HOMOMORPHIC ANALYSIS OF THE SPEECH MODEL<br />
by a quasi-periodic impulse train or a random noise signal. As we have seen repeatedly<br />
in previous chapters, the fundamental problem of speech analysis is to
reliably and robustly estimate the parameters of the model of Figure 8.12 (i.e.,<br />
the pitch period control, the shape of the glottal pulse, the gains for the voiced
or unvoiced excitation signals, the state of the voiced/unvoiced switch, the vocal
tract parameters, and the radiation model), and to<br />
measure the variations of these model control parameters with time.
Figure 8.12: General discrete-time model of speech production.<br />
Since the excitation and impulse response of a linear time-invariant system<br />
are combined by convolution, the problem of speech analysis can also be viewed
as a problem in separating the components of a convolution, and therefore,<br />
homomorphic systems and the cepstrum are useful tools for speech analysis. In<br />
the model of Figure 8.12, the pressure signal at the lips, $s[n]$, for a voiced section
of speech is represented as the convolution
$$s[n] = p[n] \ast h_V[n], \tag{8.37a}$$
where $p[n]$ is the quasi-periodic voiced excitation signal, and $h_V[n]$ represents<br />
the combined effect of the vocal tract impulse response $v[n]$, the glottal pulse
$g[n]$, the radiation load response at the lips $r[n]$, and the voiced gain $A_V$. The<br />
effective impulse response, $h_V[n]$, is itself the convolution of $g[n]$, $v[n]$, and $r[n]$,<br />
including scaling by the voiced section gain control, $A_V$, i.e.,
$$h_V[n] = A_V \cdot g[n] \ast v[n] \ast r[n]. \tag{8.37b}$$<br />
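As a rough numerical illustration of Eqs. (8.37a) and (8.37b), the following sketch builds a toy voiced segment by convolution and then checks why a logarithm-based (homomorphic) analysis is natural here: the log magnitude spectrum of the convolution is the sum of the log magnitude spectra of its components. All numerical values (the three-sample glottal pulse, the single damped resonance standing in for the vocal tract, the first-difference-like radiation term, the 20-sample pitch period, and the decaying impulse amplitudes) are assumptions chosen only for illustration, and the helper names `convolve` and `dtft_samples` are hypothetical.

```python
import cmath
import math


def convolve(x, h):
    """Full linear convolution of two finite-length sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def dtft_samples(x, n_fft):
    """Samples of the DTFT of x at frequencies 2*pi*k/n_fft (direct O(N^2) sum)."""
    return [
        sum(xn * cmath.exp(-2j * math.pi * k * n / n_fft) for n, xn in enumerate(x))
        for k in range(n_fft)
    ]


# Toy model components (assumed values, not taken from the text).
A_V = 2.0                                   # voiced gain
g = [1.0, 0.6, 0.2]                         # glottal pulse g[n]
v = [0.8 ** n * math.cos(0.3 * math.pi * n) for n in range(40)]  # one damped resonance for v[n]
r = [1.0, -0.95]                            # radiation load r[n]

# Eq. (8.37b): h_V[n] = A_V * (g[n] * v[n] * r[n]), where * is convolution.
h_V = [A_V * x for x in convolve(convolve(g, v), r)]

# Quasi-periodic excitation p[n]: impulses every `period` samples with
# slowly decaying amplitudes (assumed), so its spectrum has no exact zeros.
period = 20
p = [0.0] * 80
for m in range(4):
    p[m * period] = 0.9 ** m

# Eq. (8.37a): s[n] = p[n] * h_V[n].
s = convolve(p, h_V)

# Convolution in time is multiplication in frequency, so the log magnitudes add.
n_fft = 128
S = dtft_samples(s, n_fft)
P = dtft_samples(p, n_fft)
H = dtft_samples(h_V, n_fft)
max_err = max(
    abs(math.log(abs(Sk)) - (math.log(abs(Pk)) + math.log(abs(Hk))))
    for Sk, Pk, Hk in zip(S, P, H)
)
print("length of s[n]:", len(s))
print("largest deviation from log|S| = log|P| + log|H_V|:", max_err)
```

The reported deviation should be at the level of floating-point rounding. In practice one would likely use an array library (for example `numpy.convolve` and an FFT routine) instead of the direct sums written out here; the plain-Python versions are used only to keep the sketch self-contained.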
Recall that it is common usage, when it is not necessary to make a fine distinction,<br />
to refer to $h_V[n]$ as simply “the vocal tract impulse response for voiced