
A Study of the ITU-T G.729 Speech Coding Algorithm ...


Open | MASTER THESIS | Date: 04-09-28 | Rev: PA1

The main error factors in pitch estimation are:

• Sub-Harmonic Errors
Sub-harmonics of the fundamental period T0 appear at 2T0, 3T0, ..., and can wrongly be identified as the fundamental period.

• Noisy Conditions
Under noisy conditions with low SNR, pitch estimation is unreliable.

• Vocal Fry
For some speakers the pitch is not continuous and may change drastically, even halve [15].
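As a rough illustration of the sub-harmonic error listed above, the following Python sketch estimates the pitch period by picking the largest autocorrelation value inside a lag search range. The autocorrelation method, the synthetic test frame, the period of 40 samples, and all function names are assumptions made for this example; they are not taken from the thesis. The autocorrelation of a periodic frame peaks at T0 and again at 2T0, 3T0, ..., so a search range that happens to exclude T0 returns a multiple of the true period.

```python
import math


def autocorrelation(x, max_lag):
    """Return r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def make_voiced_frame(period, length):
    """Hypothetical test signal: a decaying pulse repeated every `period` samples."""
    return [math.exp(-0.5 * (i % period)) for i in range(length)]


def pick_lag(r, min_lag, max_lag):
    """Naive pitch picker: the lag with the largest autocorrelation value."""
    return max(range(min_lag, max_lag + 1), key=lambda k: r[k])


T0 = 40                             # assumed true pitch period in samples
frame = make_voiced_frame(T0, 400)  # ten periods of the test signal
r = autocorrelation(frame, 130)     # covers T0, 2*T0 and 3*T0

# Search range containing T0: the true period is found.
print(pick_lag(r, 20, 130))  # 40

# Search range starting above T0: the peak at 2*T0 is picked instead,
# which corresponds to an estimated pitch frequency half the true one.
print(pick_lag(r, 50, 130))  # 80
```

A restricted search range is only one way such an error can arise in this toy setting; with real speech, noise or amplitude changes between periods can also make the peak at a multiple of T0 the largest one.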

2.3.2 Source-Filter Model<br />

The source-filter model is based on the human speech production system and is the most commonly used model for speech synthesis. Given the pitch period and the vocal tract parameters, speech can be synthesized to replicate naturally spoken speech.

Figure 8: Source-filter model of speech production

In the source-filter model, depicted in Figure 8 (according to [24]), a speech signal is separated into two components: the excitation and the vocal tract parameters. For voiced speech, the excitation is an impulse train whose period corresponds to the fundamental frequency (or pitch). For unvoiced speech, the excitation consists of white noise. For certain sounds, a mixture of voiced and unvoiced excitation is required. This is achieved by scaling and adding the pulse train to the white-noise excitation. Typically, this is required for segments which contain a transition between voiced and unvoiced speech. Also, some phonemes are both voiced and unvoiced. The voiced/unvoiced switch in Figure 8 requires information about whether the current speech is voiced or unvoiced. Designing a voiced/unvoiced detector algorithm can be even more difficult than designing the pitch estimator.
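The excitation scheme described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the vocal tract is assumed to be an all-pole filter, the second-order resonator coefficients (pole radius 0.9, resonance near 500 Hz at an assumed 8 kHz sampling rate) are hypothetical, and the function names and gain values are chosen for this example only.

```python
import math
import random


def impulse_train(length, period):
    """Voiced excitation: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


def white_noise(length, rng):
    """Unvoiced excitation: zero-mean, unit-variance Gaussian noise."""
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


def mixed_excitation(length, period, voiced_gain, unvoiced_gain, rng):
    """Scale the pulse train and the noise, then add them sample by sample."""
    pulses = impulse_train(length, period)
    noise = white_noise(length, rng)
    return [voiced_gain * p + unvoiced_gain * w for p, w in zip(pulses, noise)]


def all_pole_filter(excitation, a):
    """Assumed vocal tract model: y[n] = e[n] - sum_k a[k] * y[n - 1 - k]."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]
        y.append(acc)
    return y


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical vocal tract: one resonance, poles at radius 0.9 (stable).
pole_radius = 0.9
theta = 2.0 * math.pi * 500.0 / 8000.0
a = [-2.0 * pole_radius * math.cos(theta), pole_radius ** 2]

# Purely voiced, purely unvoiced, and mixed frames of 160 samples each.
voiced = all_pole_filter(mixed_excitation(160, 40, 1.0, 0.0, rng), a)
unvoiced = all_pole_filter(mixed_excitation(160, 40, 0.0, 1.0, rng), a)
mixed = all_pole_filter(mixed_excitation(160, 40, 0.7, 0.3, rng), a)
```

Setting one of the two gains to zero reproduces the hard voiced/unvoiced switch, while intermediate gains give the mixed excitation used for transition segments. In both cases the gains have to come from a voiced/unvoiced decision, which this sketch does not attempt to make.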

Two possible approaches for designing a voiced/unvoiced detector are listed below:
