28.02.2013 Views

Introduction to Acoustics


tional rather than absolute (à la tone intervals defined as frequency ratios).

Speech is rather a set of movements made audible than a set of sounds produced by movements [16.14, p. 33].

In keeping with this classical statement, many investigators have argued that speech entities are to be found at some upstream speech production level and should be defined as gestures [16.170–173].

At the opposite end, we find researchers (e.g., Perkell [16.163]) who endorse Roman Jakobson’s view, which, in the search for units, gives primacy to the perceptual representation of speech sounds. To Jakobson, the stages of the speech chain form an “... operational hierarchy of levels of decreasing pertinence: perceptual, aural, acoustical and articulatory (the latter carrying no direct information to the receiver).” [16.174]

These viewpoints appear to conflict and do indeed divide the field into those who see speech as a motoric code (the gesturalist camp) and those who maintain that it is primarily shaped by perceptual processes (the listener-oriented school).

There is a great deal of experimental evidence for both sides, suggesting that this dilemma is not an either–or issue but that both parties offer valuable complementary perspectives.

A different approach is taken by the H & H (Hyper and Hypo) theory [16.175, 176]. This account is developed from the following key observations about speaker–listener interaction:

1. Speech perception is always a product of signal information and listener knowledge;

2. Speech production is adaptively organized.

Here is an experiment that illustrates the first claim about perceptual processing. Two groups of subjects listen to a sequence of two phrases: a question followed by an answer. The subject groups hear different questions but a single physically identical reply. The subjects’ task is to say how many words the reply contains.

The point made here is that Group 1 subjects hear [lɛsn̩ faɪv] as “less than five”. Those in Group 2 interpret it as “lesson five”. The first group’s response is “three words”, and the answer of the second is “two words”. This is despite the fact that physically the [lɛsn̩ faɪv] stimulus is exactly the same. The syllabic [n̩] signals the word “than” in one case and the syllable “-on” in the other. Looking for the invariant correlates of the initial consonant of “than” is doomed to failure because of the severe degree of reduction.
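One way to make this result concrete is a toy Bayesian reading of the claim that perception is a product of signal information and listener knowledge. The sketch below is only an illustration: the use of Bayes' rule, the function names, and all probability values are assumptions introduced here, not part of the experiment or of H & H theory as published. The reduced stimulus is given the same likelihood under both parses, and only the prior (set by the question each group heard) differs.

```python
def posterior(likelihood, prior):
    """Combine signal evidence with listener knowledge (Bayes' rule)."""
    unnormalized = {h: likelihood[h] * prior[h] for h in likelihood}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


# Assumption: the reduced stimulus fits both parses equally well.
likelihood = {"less than five": 0.5, "lesson five": 0.5}

# Assumption: hypothetical priors induced by the question each group heard.
prior_group1 = {"less than five": 0.9, "lesson five": 0.1}
prior_group2 = {"less than five": 0.1, "lesson five": 0.9}


def perceived_word_count(prior):
    """Return the most probable parse and the number of words it contains."""
    post = posterior(likelihood, prior)
    best = max(post, key=post.get)
    return best, len(best.split())


print(perceived_word_count(prior_group1))
print(perceived_word_count(prior_group2))
```

With an identical likelihood, the two groups still arrive at different parses and different word counts, which is the pattern the experiment reports.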

The Human Voice in Speech and Singing, 16.8 Control of Sound in Speech and Singing

To proponents of H & H theory this is not an isolated case. This is the way that perception in general works. The speech percepts can never be raw records of the signal because listener knowledge will inevitably interact with the stimulus and will contribute to shaping the percept.

Furthermore, H & H theory highlights the fact that spoken messages show a non-uniform distribution of information in that predictability varies from situation to situation, from word to word and from syllable to syllable. Compare (1) and (2) below. What word is likely to be represented by the gap?

1. “The next word is ___.”

2. “A bird in the hand is worth two in the ___.”

Any word can be expected in (1) whereas in (2) the predictability of the word “bush” is high.
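The contrast between the two frames can be expressed as surprisal, the information in bits carried by a word of probability p, computed as -log2(p). The sketch below is illustrative only: the vocabulary size and the probability assigned to “bush” are hypothetical values chosen for the example, not measurements from the source.

```python
import math


def surprisal_bits(p):
    """Information content, in bits, of an outcome with probability p."""
    return -math.log2(p)


# Assumption: in frame (1) any of 10,000 words is equally likely.
VOCAB_SIZE = 10_000
p_open_frame = 1 / VOCAB_SIZE

# Assumption: in frame (2) the proverb makes "bush" almost certain.
p_bush_in_proverb = 0.95

print(f"frame (1): {surprisal_bits(p_open_frame):.2f} bits")
print(f"frame (2): {surprisal_bits(p_bush_in_proverb):.2f} bits")
```

Under these assumed values the gap in (1) carries far more information than the gap in (2), so the signal has to do much more of the work in the first case.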

H & H theory assumes that, while learning and using their native language, speakers develop a sense of this informational dynamics. Introducing the abstraction of an ideal speaker, it proposes that the talker estimates the running contribution that signal-complementary information (listener knowledge) will make during the course of the utterance and then tunes his articulatory performance to the presumed short-term listener needs. Statistically, this type of behavior has the long-term consequence of distributing phonetic output forms along a continuum with clear and elaborated forms (hyperspeech) at one end and casual and reduced pronunciations (hypospeech) at the other.
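The adaptive tuning described here can be caricatured as a trade-off in which the signal supplies whatever the listener's knowledge does not. The sketch below is a deliberately simple toy model: the linear trade-off, the 0 to 1 scale, the 0.5 threshold, and the function names are all assumptions made for illustration and are not taken from H & H theory itself.

```python
def articulatory_effort(context_predictability, required=1.0):
    """Signal information the talker must supply, on a 0..1 scale.

    Assumption: signal and listener knowledge add up linearly to the
    amount required for successful lexical access.
    """
    return min(required, max(0.0, required - context_predictability))


def speech_style(effort, threshold=0.5):
    """Label a point on the hyper-hypo continuum (threshold is arbitrary)."""
    return "hyperspeech" if effort >= threshold else "hypospeech"


# Hypothetical predictabilities for the final word of each frame.
frames = {
    "The next word is ___": 0.0001,
    "A bird in the hand is worth two in the ___": 0.95,
}

for frame, predictability in frames.items():
    effort = articulatory_effort(predictability)
    print(f"{frame!r}: effort={effort:.2f} -> {speech_style(effort)}")
```

In this toy model the unpredictable word calls for a clear, elaborated form, while the highly predictable one can be reduced. A real talker's behavior is continuous and context dependent, so the two labels should be read only as the ends of the continuum.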

The implication for the invariance issue is that the task of the speaker is not to encode linguistic units as physical invariants, but to make sure that the signal attributes (of phonemes, syllables, words and phrases) carry discriminative power sufficient for successful lexical access. To make the same point in slightly different terms, the task of the signal is not to embody phonetic patterns of constancy but to provide missing information.

It is interesting to put this account of speech processes next to what we know about singing. Experimental observations indicate that singing in tune is not simply about invariant F0 targets. F0 is affected by the singer’s expressive performance [16.177] and by the tonal context in which a given note is embedded [16.178]. Certain deviations from nominal target frequencies are not unlike the coarticulation and undershoot effects ubiquitously present in speech. Accordingly, with respect to frequency control, speaking and singing are qualitatively similar. However, quantitatively, they differ drastically. Recall that in describing the dynamics of vowel reduction we noted that formant fre-

