B. P. Lathi, Zhi Ding - Modern Digital and Analog Communication Systems-Oxford University Press (2009)

6.8 Vocoders and Video Compression

Figure 6.34 (a) The human speech production mechanism. (b) Typical pressure impulses. [Labels in panel (a): nose cavity, mouth cavity.]

For a low-pass source signal with finite bandwidth B Hz, even if we apply the minimum Nyquist sampling rate of 2B Hz and 1-bit encoding, the bit rate cannot be lower than 2B bit/s. Many successful methods have been introduced to drastically reduce the source coding rates of speech and video signals, both of which are very important to our daily communication needs. Unlike waveform encoders, the most successful speech and video encoders are based on the human physiological models involved in speech generation and in video perception. Here we describe the basic principles of linear prediction voice coders (known as vocoders) and the video compression method proposed by the Moving Picture Experts Group (MPEG).
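The 2B bit/s floor for waveform coding mentioned above is simple arithmetic: the Nyquist rate of 2B samples per second multiplied by the number of bits per sample. The following is a minimal sketch of that calculation. The function name `waveform_bit_rate` and the 3400 Hz bandwidth are illustrative assumptions, not values taken from the text.

```python
def waveform_bit_rate(bandwidth_hz: float, bits_per_sample: int = 1) -> float:
    """Bit rate (bit/s) of a waveform coder that samples at the Nyquist rate.

    Hypothetical helper for illustration: sampling rate 2B times the
    number of bits used to encode each sample.
    """
    if bandwidth_hz <= 0:
        raise ValueError("bandwidth_hz must be positive")
    if bits_per_sample < 1:
        raise ValueError("bits_per_sample must be at least 1")
    nyquist_rate_hz = 2.0 * bandwidth_hz  # minimum sampling rate, 2B samples/s
    return nyquist_rate_hz * bits_per_sample


# Assumed example bandwidth (not from the text): B = 3400 Hz.
B = 3400.0
print(waveform_bit_rate(B, 1))  # 1-bit encoding gives the 2B bit/s floor
print(waveform_bit_rate(B, 8))  # more bits per sample scale the rate linearly
```

Even with a single bit per sample the rate cannot drop below 2B bit/s, which is why the model-based coders described next abandon sample-by-sample waveform encoding.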

6.8.1 Linear Prediction Coding Vocoders

Voice Models and Model-Based Vocoders

Linear prediction coding (LPC) vocoders are model-based systems. The model, in turn, is based on a good understanding of the human voice mechanism. Fig. 6.34a provides a cross-sectional illustration of the human speech apparatus. Briefly, human speech is produced by the joint interaction of the lungs, the vocal cords, and the articulation tract, which consists of the mouth and the nose cavity. Based on this physiological speech model, human voices can be divided into voiced and unvoiced sound categories. Voiced sounds are those made while the vocal cords are vibrating. Put a finger on your Adam's apple* while speaking, and you can feel the vibration of the vocal cords when you pronounce all the vowels and some consonants, such as g as in gut, b as in but, and n as in nut. Unvoiced sounds are made while the vocal cords are not vibrating. Several consonants such as k, p, and t are unvoiced. Examples of unvoiced sounds include h in hut, c in cut, and p in put.

For the production of voiced sounds, the lungs expel air through the epiglottis, causing the vocal cords to vibrate. The vibrating vocal cords interrupt the airstream and produce a quasi-periodic pressure wave consisting of impulses. The pressure wave impulses are commonly called pitch impulses, and the frequency of the pressure signal is the pitch frequency or fundamental frequency, as shown in Fig. 6.34b. This is the part of the voice signal that defines the speech tone. Speech that is uttered at a constant pitch frequency sounds monotonous. In ordinary cases, the pitch frequency of a speaker varies almost constantly, often from syllable to syllable.
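The quasi-periodic train of pitch impulses described above can be illustrated with a short sketch. It is an idealized model under stated assumptions: each impulse is a unit sample rather than a realistic pressure pulse, the pitch is held constant within each segment and changes only between segments (standing in for syllable-to-syllable variation), and the impulse phase restarts at every segment. The function name `pitch_impulse_train`, the 8000 Hz sampling rate, and the pitch values are hypothetical choices, not taken from the text.

```python
from typing import Iterable, List, Tuple


def pitch_impulse_train(
    segments: Iterable[Tuple[float, float]],
    sample_rate_hz: int = 8000,  # assumed sampling rate, for illustration only
) -> List[float]:
    """Return samples that are 1.0 at each pitch impulse and 0.0 elsewhere.

    segments: (pitch_hz, duration_s) pairs; the pitch is constant within a
    segment and may differ from one segment to the next.
    """
    samples: List[float] = []
    for pitch_hz, duration_s in segments:
        if pitch_hz <= 0 or duration_s < 0:
            raise ValueError("pitch must be positive and duration non-negative")
        # Pitch period in samples, rounded to the nearest whole sample.
        period = max(1, round(sample_rate_hz / pitch_hz))
        n_samples = round(duration_s * sample_rate_hz)
        # Simplification: the impulse phase restarts at the start of each segment.
        for i in range(n_samples):
            samples.append(1.0 if i % period == 0 else 0.0)
    return samples


# Two assumed "syllables": 50 ms at 100 Hz, then 50 ms at 200 Hz.
train = pitch_impulse_train([(100.0, 0.05), (200.0, 0.05)])
print(len(train), int(sum(train)))  # total samples, number of pitch impulses
```

Doubling the pitch frequency halves the spacing between impulses, so the second segment contains twice as many impulses as the first in the same duration.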

* The slight projection at the front of the throat formed by the largest cartilage of the larynx, usually more prominent in men than in women.
