28.02.2013 Views

Introduction to Acoustics


Some might notice that linear interpolation is not quite adequate, so they might assume that the correct solution must lie in more elaborate curve-fitting techniques using quadratics, cubics, or higher-order splines, and indeed these types of interpolation can be adequate for some applications. To arrive at the correct answer (provably correct from theory), the interpolation task should be viewed and accomplished as a filtering problem, with the filter designed to meet some appropriate error criterion.

Linear time-invariant filtering is accomplished by convolution with a filter function. If the resampling filter is defined appropriately, we can exactly reconstruct the original analog waveform from the samples.
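The filtering view can be made concrete with a small sketch in Python. It evaluates a sampled signal at fractional sample positions by summing each nearby sample times a kernel value. With a triangular kernel this is exactly linear interpolation; with the sinc kernel (the function defined in the paragraph that follows) it approximates ideal reconstruction. The function names, the `half_width` truncation of the kernel, and the fixed resampling ratio are assumptions of this sketch, not part of the source text. A truncated sinc is no longer exact, and the kernel is not narrowed in bandwidth for ratios above 1, so the sketch is only reasonable for reading samples at the original spacing or finer.

```python
import math

def triangle_kernel(x):
    """Triangular kernel; convolving with it is linear interpolation."""
    return max(0.0, 1.0 - abs(x))

def sinc_kernel(x):
    """sin(pi x)/(pi x), with x measured in sample periods (x = t/T)."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def interpolate(samples, position, kernel, half_width):
    """Evaluate the signal at a fractional sample index by convolution.

    Only the half_width samples on each side of the position are summed,
    which truncates the kernel (an assumption of this sketch).
    """
    base = math.floor(position)
    first = max(0, base - half_width + 1)
    last = min(len(samples) - 1, base + half_width)
    return sum(samples[n] * kernel(position - n) for n in range(first, last + 1))

def resample(samples, ratio, kernel, half_width):
    """Return new samples spaced `ratio` original sample periods apart."""
    count = int((len(samples) - 1) / ratio) + 1
    return [interpolate(samples, i * ratio, kernel, half_width)
            for i in range(count)]

# A sine at 0.2 cycles per sample, evaluated halfway between two samples.
signal = [math.sin(2 * math.pi * 0.2 * n) for n in range(256)]
exact = math.sin(2 * math.pi * 0.2 * 100.5)
linear_error = abs(interpolate(signal, 100.5, triangle_kernel, 1) - exact)
sinc_error = abs(interpolate(signal, 100.5, sinc_kernel, 32) - exact)
```

For this test signal the truncated sinc kernel lands much closer to the true value than the triangular kernel does, and widening `half_width` trades more computation for a smaller error, which is the kind of quality-versus-cost choice the filter-design view exposes.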

The correct (ideal in a provable digital signal processing sense) way to perform interpolation is convolution with the sinc function, defined as:

sinc(t/T) = sin(πt/T)/(πt/T) ,

where T = 1/SRATE.

The sinc function is the ideal low-pass filter with a cutoff of SRATE/2, where SRATE is the sampling rate. Figure 17.4 shows reconstruction of a continuous waveform by convolution of a digital signal with the sinc function. Each sample is multiplied by a corresponding continuous sinc function, and those are added up to arrive at the continuous reconstructed signal [17.2].

Resampling: This is usually accomplished at the same time as interpolation, because it is not necessary to reconstruct the entire continuous waveform in order to acquire new discrete samples. The resampling ratio can be time varying, making the problem a little more difficult. However, viewing the problem as a filter-design and implementation issue allows for guaranteed tradeoffs of quality and computational complexity.

17.2 Pulse Code Modulation Synthesis

The majority of digital sound and music synthesis today is accomplished via the playback of stored pulse code modulation (PCM) waveforms. Single-shot playback of entire segments of stored sounds is common for sound effects, narrations, prompts, segments of music, etc. Most high-quality modern electronic music synthesizers, speech synthesis systems, and PC software systems for sound synthesis use pre-stored PCM as the basic data. This data is sometimes manipulated to yield the final output sound(s).

There are a number of different ways to look at sound for computer music, with PCM being only one. We can look at the physics that produce the sound and try to model those. We could also look at the spectrum of the sound and other characteristics having to do with the perception of those sounds. Indeed, much of the legacy of computer music has revolved around parametric (using mathematical algorithms, controlled by a few well-chosen/-designed control parameters) analysis and synthesis algorithms. We will discuss most of the commonly used algorithms later, but first we should look at PCM in more depth.

For speech, the most common synthesis technique is concatenative synthesis [17.3]. Concatenative phoneme synthesis relies on the concatenation of roughly 40 pre-stored phonemes (for English). Examples of vowel phonemes are /i/ as in beet, /I/ as in bit, /a/ as in father, etc. Examples of nasals are /m/ as in mom, /n/ as in none, /ng/ as in sing, etc. Examples of fricative consonant phonemes are /s/ as in sit, /sh/ as in ship, /f/ as in fifty, etc. Examples of voiced fricative consonants are /z/, /v/ (visualize), etc. Examples of plosive consonants are /t/ as in tat, /p/ as in pop, /k/ as in kick, etc. Examples of voiced plosives include /d/, /b/, /g/ (dude, bob, gag), etc. Vowels and nasals are essentially periodic pitched sounds, so the minimal required stored waveform is only one single period of each. Consonants require more storage because of their noisy (non-pitched, aperiodic) nature.
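The storage argument can be illustrated with a toy sketch in Python: a pitched unit is kept as a single period and looped for as long as needed, while a noisy unit is kept as a full segment and played once. Everything in the inventory below is hypothetical. The unit names, the harmonic amplitudes, the segment lengths, and the use of seeded random noise in place of a recorded consonant are placeholders chosen for illustration, not real speech data.

```python
import math
import random

def one_period(harmonic_amps, period_len):
    """Build one period of a pitched waveform from harmonic amplitudes."""
    return [
        sum(amp * math.sin(2 * math.pi * (h + 1) * n / period_len)
            for h, amp in enumerate(harmonic_amps))
        for n in range(period_len)
    ]

def noise_burst(length, seed=0):
    """Placeholder for a recorded aperiodic consonant (not real speech)."""
    rng = random.Random(seed)
    return [rng.uniform(-0.3, 0.3) for _ in range(length)]

# Hypothetical inventory: the pitched unit stores one period (100 samples),
# the noisy unit stores its whole segment (800 samples).
INVENTORY = {
    "a": {"pitched": True, "pcm": one_period([1.0, 0.5, 0.25], 100)},
    "s": {"pitched": False, "pcm": noise_burst(800)},
}

def render_unit(name, num_samples):
    """Loop a stored single period, or play a stored noisy segment once."""
    unit = INVENTORY[name]
    pcm = unit["pcm"]
    if unit["pitched"]:
        return [pcm[n % len(pcm)] for n in range(num_samples)]
    return pcm[:num_samples]  # cannot exceed what was stored

def synthesize(sequence):
    """Concatenate rendered units; sequence holds (name, num_samples) pairs."""
    out = []
    for name, num_samples in sequence:
        out.extend(render_unit(name, num_samples))
    return out

word = synthesize([("s", 500), ("a", 2000)])
```

The looped unit can be stretched to any duration from 100 stored samples, whereas the noisy unit can never play longer than its stored 800 samples. The hard joins between units are left untreated here, and the looped vowel is perfectly periodic, which is an idealization of real speech.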

The quality of concatenative phoneme synthesis is generally considered quite low, due to the simplistic assumption that all of the pitched sounds (vowels, etc.) are purely periodic. Also, simply gluing /s/, /I/, and /ng/ together does not make for a high-quality realistic synthesis of the word “sing”. In actual speech, phonemes gradually blend into each other as the jaw, tongue, and other articulators move with time.

Accurately capturing the transitions between phonemes with PCM requires recording transitions from phoneme to phoneme, called diphones. A concatenative diphone synthesizer blends together stored diphones. Examples of diphones include see, she, thee, and most of the roughly 40 × 40 possible combinations of phonemes. Much more storage is necessary for a diphone synthesizer, but the resulting increase in quality is significant.
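One simple way to blend stored units, shown here only as a sketch in Python, is to overlap the end of one segment with the start of the next and crossfade between them. The linear weights, the fixed overlap length, and the function names are assumptions made for illustration; the text does not say which blending method any particular diphone synthesizer uses. With roughly 40 phonemes, the inventory such a routine would draw from holds up to about 40 × 40 = 1600 units.

```python
def crossfade_concat(first, second, overlap):
    """Join two PCM segments with a linear crossfade.

    The last `overlap` samples of `first` are mixed with the first
    `overlap` samples of `second`, so the result is shorter than the
    plain concatenation by `overlap` samples.
    """
    if not 0 < overlap <= min(len(first), len(second)):
        raise ValueError("overlap must fit inside both segments")
    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of `second`, rising toward 1
        a = first[len(first) - overlap + i]
        blended.append((1.0 - w) * a + w * second[i])
    return first[:len(first) - overlap] + blended + second[overlap:]

def blend_units(units, overlap):
    """Chain several stored units using the same crossfade length."""
    out = list(units[0])
    for unit in units[1:]:
        out = crossfade_concat(out, list(unit), overlap)
    return out

# Two constant segments make the fade easy to inspect.
joined = crossfade_concat([1.0] * 10, [0.0] * 10, 4)
```

In this example the output steps down gradually across the four overlapped samples instead of jumping from 1.0 to 0.0 at the join, which is the discontinuity that plain concatenation would produce.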

PCM speech synthesis can be improved further by stor-

