A Study of the ITU-T G.729 Speech Coding Algorithm ...
A Study of the ITU-T G.729 Speech Coding Algorithm ...
A Study of the ITU-T G.729 Speech Coding Algorithm ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Open<br />
MASTER THESIS<br />
Datum - Date Rev Dokumentnr - Document no.<br />
04-09-28 PA1<br />
CELP-coding, <strong>the</strong> long-term predictor is also called an adaptive codebook. The adaptivecodebook<br />
entries are segments <strong>of</strong> <strong>the</strong> recently syn<strong>the</strong>sized signal [15]. The short-term<br />
predictor accounts for <strong>the</strong> spectral envelope, i.e. <strong>the</strong> formants, and is <strong>the</strong> same type <strong>of</strong><br />
production filter used in vocoders. Figure 11 depicts how speech is syn<strong>the</strong>sized in a CELPencoder<br />
[27]. Thus, <strong>the</strong> fixed innovation comes from <strong>the</strong> fixed codebook which is <strong>the</strong>n<br />
filtered through <strong>the</strong> long-term production filter followed by <strong>the</strong> short-term production filter.<br />
Ano<strong>the</strong>r way to see this is that <strong>the</strong> excitation vectors from <strong>the</strong> adaptive and fixed codebook<br />
are added to construct <strong>the</strong> excitation signal which is <strong>the</strong>n filtered through <strong>the</strong> short-term<br />
production filter.<br />
Figure 11: Long-term prediction / Adaptive codebook<br />
By subtracting <strong>the</strong> long-term and short-term predictions <strong>of</strong> <strong>the</strong> speech signal, <strong>the</strong> redundancy<br />
<strong>of</strong> <strong>the</strong> signal is removed. The remaining signal is called <strong>the</strong> innovation or excitation<br />
signal. A set <strong>of</strong> signals which approximates <strong>the</strong> possible excitation sequences forms <strong>the</strong><br />
codebook. Thus, a CELP- encoder algorithm consist <strong>of</strong> three stages:<br />
1. Finding <strong>the</strong> best long-term and short-term linear predictors.<br />
2. Filtering <strong>the</strong> predictor filters with all possible excitations and finding <strong>the</strong> sequence in<br />
<strong>the</strong> codebook that minimizes <strong>the</strong> perceptually-weighted mean squared error. This<br />
is an analysis-by-syn<strong>the</strong>sis procedure. The perceptual weighting takes into account<br />
how humans perceive sound or speech and is absolutely essential to <strong>the</strong> success <strong>of</strong><br />
CELP-coding.<br />
3. Send information to <strong>the</strong> decoder about <strong>the</strong> long- and short-term predictor coefficients<br />
and <strong>the</strong> binary codebook word corresponding to <strong>the</strong> sequence chosen from <strong>the</strong> codebook.<br />
The decoder <strong>the</strong>n looks up <strong>the</strong> excitation sequence from <strong>the</strong> codebook with <strong>the</strong> received<br />
codebook word and drives <strong>the</strong> excitation sequence through <strong>the</strong> long- and short-time production<br />
filters [13]. The CELP-coding principle is depicted in Figure 12 [8].<br />
CELP Complexity<br />
A CELP-coding algorithm is complex and <strong>the</strong>refore computationally demanding. Assuming<br />
a CELP-coder that uses a 10-bit word as an index to <strong>the</strong> codebook, would give a codebook<br />
consisting <strong>of</strong> 2 10 = 1024 excitation sequences. Applying each <strong>of</strong> <strong>the</strong>se 1024 excitation<br />
28 (78)