PDF version of Lecture Slides - Speech Resource Pages ...


Speech Acoustics

Acoustics of Coarticulation

Dr Robert H. Mannell
Department of Linguistics
Macquarie University

Coarticulation Introduction (2)

• There are no acoustic boundaries between phonemes (except across intonational phrase boundaries, which are characterised by pauses).
• This continuous transitioning between phonemes is a fundamental characteristic of coarticulation.

Coarticulation Introduction (4)

• Coarticulation is seen in acoustic representations of speech by its effect on the formant frequencies of vowels and vowel-like sounds (sonorant phonemes) and by its effect on the resonant peaks of non-sonorant consonants.

Coarticulation Introduction (1)

• Articulatory gestures overlap.
• The gestures of slower-moving articulators overlap more than those of faster-moving articulators.
• The effect of overlapping gestures is called coarticulation.
• Phonemes rarely occur in isolation.
• Phonemes are normally articulated as part of a syllable.

Coarticulation Introduction (3)

• Coarticulation is the effect of one sound on an adjacent sound (and vice versa), and it can occur across all boundaries except prosodic boundaries characterised by a pause.

Coarticulation Introduction (5)

• We perceive speech by recognising the (auditorily transformed) acoustic patterns of syllables.
• Coarticulation tends to be stronger within syllables than across syllable boundaries.
• The higher the level of an intervening boundary (phoneme, syllable, prosodic), the weaker the coarticulation between adjacent sounds.

Coarticulation Introduction (6)

• Vowels affect the articulation of adjacent consonants (and adjacent vowels).
• Consonants affect the articulation of adjacent vowels (and other adjacent consonants).

Coarticulation Introduction (8)

Tongue Height and Coarticulation
• Most lingual consonants have a high tongue position.
• High vowels are least affected by these high consonants.
• Low vowels are most affected by these high consonants.
• Why is this important?

Phoneme Inventory Effects (1)

• Coarticulation is resisted when it would result in perceptual confusion.
• Vowels coarticulate most strongly in languages with a small number of vowels.
• Consonants coarticulate most strongly in languages with a small number of places of articulation.

Coarticulation Introduction (7)

• Some sounds are more resistant to coarticulation than others.
• Coarticulation is greatest when the articulator movement between phonemes is greatest.

Coarticulation Introduction (9)

• The lower the vowel, the lower the tongue position for its production.
• For low vowels the tongue has a greater distance to travel (compared to high vowels) to achieve the appropriate consonantal stricture against the teeth (dental, not labiodental), the hard palate or the soft palate.
• For bilabial, labiodental, and glottal consonants this effect is absent, as they require no consonantal tongue contact.

Phoneme Inventory Effects (2)

• English has a large number of vowels. English speakers need to restrict the degree of coarticulation to prevent vowels from crossing over into the articulatory and acoustic space of another vowel phoneme.
• We mainly need to ensure that long vowels are not confused with other long vowels (and short vowels with other short vowels).

Phoneme Inventory Effects (3)

• In Arabic there are three long vowels and three short vowels (plus some diphthongs).
• There are two high vowel positions (front and back) and no mid vowels. Arabic high vowels can move down to mid-high positions without confusion and can also move a small distance towards the high central position without being confused.

Phoneme Inventory Effects (5)

• In English there are only three stop places of articulation (or four if the post-alveolar affricate is treated as a stop).
• English alveolars can move to dental without confusion.
• English velars can, and do, move between palatal and uvular as a consequence of coarticulation with adjacent sounds (particularly vowels).

Prosody, Stress & Coarticulation

• One of the reasons why we stress a syllable or accent a word is to slow it down so that its articulatory patterns are more readily perceived.
• Even in Australian Aboriginal languages the distinction between stop places of articulation is relaxed in weaker syllables and is only completely clear in strong syllables.

Phoneme Inventory Effects (4)

• The Arabic low vowels can move around quite freely in the mid-low to low part of the vowel space, and from front to back in this part of the vowel space.
• There are significant differences in vowel production (and therefore formant values) in the context of different consonants, and the pattern varies between dialects.

Phoneme Inventory Effects (6)

• Most Australian Aboriginal languages have a much larger number of places of articulation for oral stops.
• Compared to English, there is much less freedom to move the stop place of articulation around without the possibility of perceptual confusion.

Locus Theory (1)

• In its original form, locus theory assumed a fixed ideal target for each phoneme. This target would be achieved by the articulators if there were enough time to do so.
• Undershoot is said to occur if the ideal target is not reached in time. The idea of “undershoot” implies a fixed ideal target.
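The undershoot idea can be made concrete with a toy model. The sketch below assumes, purely for illustration, that a formant approaches its ideal target along a first-order exponential curve; the time constant `tau_ms`, the function names and the example frequencies are hypothetical choices, not values from this lecture. The shorter the segment, the further from the target the formant is when the segment ends.

```python
import math


def formant_at(t_ms: float, start_hz: float, target_hz: float, tau_ms: float) -> float:
    """Formant frequency t_ms after movement onset, assuming (hypothetically)
    a first-order exponential approach to a fixed ideal target."""
    return target_hz + (start_hz - target_hz) * math.exp(-t_ms / tau_ms)


def undershoot_hz(duration_ms: float, start_hz: float, target_hz: float,
                  tau_ms: float = 30.0) -> float:
    """Gap (Hz) left between the formant and its ideal target when the segment
    ends after duration_ms. The 30 ms default time constant is an assumption."""
    reached = formant_at(duration_ms, start_hz, target_hz, tau_ms)
    return abs(target_hz - reached)


if __name__ == "__main__":
    # Illustrative only: an F2 moving from 1200 Hz towards a 1800 Hz target.
    for duration in (20, 50, 150):
        gap = undershoot_hz(duration, 1200, 1800)
        print(f"{duration} ms segment: {gap:.0f} Hz short of the target")
```

With the assumed 30 ms time constant, a 150 ms segment ends within a few Hz of the target, whereas a 20 ms segment falls roughly 300 Hz short.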

Locus Theory (2)

• Strong Version: All consonants have a fixed target which is realised at a single frequency for each formant.
• The F2 target for a particular consonant is known as its F2 locus.

Syllable-based theories (1)

• There is a growing consensus that the syllable is the most important basic unit of articulatory planning.
• In such approaches a gesture, rather than a series of phoneme targets, is the underlying strategy of articulatory planning (and, for some theorists, of speech perception).

Coarticulation and Resonance (1)

• Vocal tract articulation results in changing vocal tract shapes that can be described as one or more resonating cavities.
• Resonating cavities can be modelled in terms of simple tube models (see the topic for this subject entitled “Acoustic Theory of Speech Production”).

Locus Theory (3)

• Weak Version: Consonants don't have a fixed target, as their targets are affected by coarticulation, but they do tend to have a locus space for each formant, defined by a range of formant frequencies. The target frequency tends to lie within this range and depends upon the adjacent sound.
• The weak version is supported by a vast body of research.

Syllable-based theories (2)

• In such a theory we no longer have invariant phoneme targets (or even a range of targets) but rather patterns of articulation for each type of syllable.
• Acoustically, this approach makes predictions similar to those of the weak version of locus theory.

Coarticulation and Resonance (2)

• Each resonating cavity (tube) has its own set of resonance frequencies.
• Most speech articulations, with the exception of the neutral vowel (similar, but not identical, to Australian English /3:/), can be simply modelled by two tubes.
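The exception mentioned above, the neutral vowel, is commonly approximated by a single uniform tube closed at the glottis and open at the lips. Such a tube resonates at odd multiples of a quarter wavelength, Fn = (2n - 1) c / 4L. The sketch below assumes a speed of sound of about 35,000 cm/s (a common approximation for warm, moist air) and a tube length of 17.5 cm; both values and the function name are illustrative assumptions rather than figures from this lecture.

```python
def quarter_wave_resonances(length_cm: float, n_formants: int = 3,
                            speed_of_sound_cm_s: float = 35000.0) -> list[float]:
    """Resonance frequencies (Hz) of a uniform tube closed at one end and open
    at the other: F_n = (2n - 1) * c / (4 * L).

    speed_of_sound_cm_s defaults to an assumed ~35,000 cm/s (warm, moist air).
    """
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    # Assumed 17.5 cm tube, a common textbook stand-in for the neutral vowel.
    print(quarter_wave_resonances(17.5))  # [500.0, 1500.0, 2500.0]
```

Halving the tube length doubles every resonance, which is the basic reason a shorter cavity has a higher primary resonance.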

Coarticulation and Resonance (3)

• For some types of speech sounds (e.g. vowels) the two tubes are strongly coupled acoustically.
• For some speech sounds (e.g. oral stops) the two tubes are uncoupled.
• For intermediate degrees of stricture there are intermediate degrees of acoustic coupling.

Coarticulation and Resonance (5)

• In voiceless oral stops there is no sound during the occlusion.
• At the release there is a sound burst. Initially the burst is mostly decoupled from the posterior (back) cavity, so its resonance characteristics depend mainly on the size of the front cavity.

Coarticulation and Resonance (7)

• When the back resonator is fully or mostly decoupled (e.g. in a stop burst), the acoustic spectrum is typically dominated by the main resonance frequency of the front resonator.
• As the two cavities become more coupled, the spectrum becomes more vowel-like, with increasingly clear formant peaks (even during the latter part of the unvoiced aspiration).

Coarticulation and Resonance (4)

• Vowels – very strongly coupled cavities
• Approximants – strongly coupled cavities
• Fricatives – weakly coupled cavities
• Oral Stops – uncoupled cavities
• In the case of nasal stops the velum is open and the nasal cavity is the primary resonator, whilst the oral tract behind the occlusion is a secondary coupled resonator.

Coarticulation and Resonance (6)

• Shortly after the burst comes the aspiration phase, which is similar to a fricative (but usually with much greater airflow). During this phase the coupling of the front and back cavities increases.
• At the onset of voicing, say for a following vowel, the coupling of the two cavities is vowel-like (i.e. high).

Coarticulation and Resonance (8)

• The main spectral peak in a stop burst is very often continuous with one of the formants in the following aspiration and vowel.
• Recall from the topic on tubes and resonance that formants are not consistently related to a single cavity.

Coarticulation and Resonance (9)

• F1 is the acoustic correlate of the lowest-frequency major resonance.
• F2 is the acoustic correlate of the second-lowest-frequency major resonance.
• For vowels and vowel-like sounds, the strongest resonances of the two cavities are F1 and F2.
• Whether a cavity's resonance is F1 or F2 depends upon which cavity is longer.

Coarticulation and Resonance (11)

• Patterns of articulator movement result in changing points of greatest constriction, cavity sizes and degrees of acoustic coupling of vocal tract resonances.
• This results in changing formant patterns in vowels, approximants and nasal stops, and in changing resonance patterns that become more vowel-like as constriction is reduced.

Schwa (2)

• In Australian English schwa tends to be more centred (i.e. closer to the mid-central position) than the front, back, high and low vowels. That is, it tends to have mid-central values of F1 and F2.
• Its tongue position, and therefore its F1 and F2, vary much more than those of the mid-central vowel /3:/.

Coarticulation and Resonance (10)

• If the front cavity is shorter than the back cavity (the two being divided at the point of greatest constriction), it will have a higher primary resonance frequency than the back cavity. The front-cavity resonance will therefore be F2 and the back-cavity resonance will be F1.
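The F1/F2 assignment rule above can be sketched for a fully decoupled two-tube model. For illustration only, the sketch approximates each cavity's primary resonance with the quarter-wave formula c / (4L) and an assumed speed of sound of 35,000 cm/s. A real back cavity is not open at one end, so the absolute frequencies are rough, but because both cavities are treated with the same formula the shorter cavity always receives the higher resonance. The function name and cavity lengths are hypothetical.

```python
def assign_cavity_formants(front_cm: float, back_cm: float,
                           speed_of_sound_cm_s: float = 35000.0) -> dict[str, tuple[str, float]]:
    """Label the lower and higher primary cavity resonances as F1 and F2.

    Toy, fully decoupled two-tube model. Each cavity's primary resonance is
    approximated (an assumption) by the quarter-wave formula c / (4L), so the
    shorter cavity always has the higher resonance.
    """
    primary = {
        "front": speed_of_sound_cm_s / (4.0 * front_cm),
        "back": speed_of_sound_cm_s / (4.0 * back_cm),
    }
    # Sort cavity names by their resonance frequency, lowest first.
    lower, higher = sorted(primary, key=primary.get)
    return {"F1": (lower, primary[lower]), "F2": (higher, primary[higher])}


if __name__ == "__main__":
    # Hypothetical lengths: a short front cavity and a long back cavity.
    print(assign_cavity_formants(front_cm=5.0, back_cm=12.0))
```

With a 5 cm front cavity and a 12 cm back cavity, the front cavity (1750 Hz) is labelled F2 and the back cavity (about 729 Hz) F1; swapping the lengths swaps the labels.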

Schwa (1)

• A true schwa has no phonological specification for tongue height and fronting (it only needs to be vocalic).
• Schwa is therefore free to coarticulate strongly with adjacent consonants that have required tongue positions.
• There is, therefore, significant variation in schwa formant values.

Context    F1 (Hz)   F2 (Hz)
@p-        610       1360
@t-        560       1570
@k-        570       1650
-p@        610       1370
-t@        560       1540
-k@        570       1630
-p@p-      460       1340
-t@t-      410       1640
-k@k-      410       1830
-p@t-      430       1550
-p@k-      450       1550
-t@p-      440       1610
-t@k-      380       1820
-k@p-      480       1640
-k@t-      420       1810

Schwa (3)

Bernard and Lloyd (1982) examined the schwa productions of 40 adult male speakers of General Australian English (20 from Sydney and 20 from Rockhampton). These schwas were produced in a number of consonantal contexts. The F1 and F2 data for the 20 Sydney speakers are displayed here. Bernard and Lloyd measured both the onset and offset of each schwa, but here only the overall mean values are shown, and only for a subset of the consonant contexts.
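Lower F1 generally corresponds to a higher jaw and tongue position. As a quick check on the jaw-height pattern discussed in the schwa slides, the sketch below averages the F1 values from the Bernard and Lloyd table by context type: a consonant on one side only, or a consonant on both sides (CVC). The data are copied from the table; the two-way grouping rule and the helper names are illustrative assumptions.

```python
from statistics import mean

# Mean schwa formants (Hz), Sydney speakers, copied from the table above
# (Bernard and Lloyd 1982, subset of contexts): context -> (F1, F2).
SCHWA = {
    "@p-": (610, 1360), "@t-": (560, 1570), "@k-": (570, 1650),
    "-p@": (610, 1370), "-t@": (560, 1540), "-k@": (570, 1630),
    "-p@p-": (460, 1340), "-t@t-": (410, 1640), "-k@k-": (410, 1830),
    "-p@t-": (430, 1550), "-p@k-": (450, 1550), "-t@p-": (440, 1610),
    "-t@k-": (380, 1820), "-k@p-": (480, 1640), "-k@t-": (420, 1810),
}


def context_type(context: str) -> str:
    """Labels like "-p@p-" have a hyphen at both ends (consonant on both
    sides); "@p-" and "-p@" have a consonant on one side only."""
    return "CVC" if context.startswith("-") and context.endswith("-") else "one-sided"


def mean_f1_by_type(data: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average F1 separately for one-sided and CVC contexts."""
    groups: dict[str, list[int]] = {}
    for context, (f1, _f2) in data.items():
        groups.setdefault(context_type(context), []).append(f1)
    return {kind: mean(values) for kind, values in groups.items()}


if __name__ == "__main__":
    for kind, f1 in mean_f1_by_type(SCHWA).items():
        print(f"{kind}: mean F1 = {f1:.0f} Hz")
```

For these values the six one-sided contexts average 580 Hz and the nine CVC contexts about 431 Hz, consistent with a raised jaw when a stop is on both sides of the schwa.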

Schwa (4)

Schwa (6)

The three filled squares mark the contexts in which schwa is most affected by the consonant (as the consonant is on both sides). We can see that the tongue is higher for all three CVC tokens (the jaw is raised on both sides of the schwa, even for /p@p/). In the other six tokens the jaw is able to be lower (initially for VC and finally for CV).

• An initially surprising result is that schwa is more fronted in the context of /k/ than in the context of /t/.
• It should be remembered that /k/ involves the tongue body, whilst /t/ involves the tongue tip (which can move with some independence from the tongue body).
• This independence is not complete, as /t/ still pulls schwa forward compared to schwa in a /p/ context.
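The fronting pattern can be read directly from F2, since a more fronted tongue position gives a higher F2. The sketch below ranks the three stop places by the schwa F2 values in the Bernard and Lloyd table, once for the symmetric CVC contexts and once for the mean of the two one-sided contexts. The numbers come from the table; the helper name is hypothetical.

```python
from statistics import mean

# Schwa F2 (Hz), Sydney speakers, from the Bernard and Lloyd (1982) table.
SYMMETRIC_F2 = {"p": 1340, "t": 1640, "k": 1830}  # -p@p-, -t@t-, -k@k-
ONE_SIDED_F2 = {"p": (1360, 1370), "t": (1570, 1540), "k": (1650, 1630)}  # @C-, -C@


def rank_by_fronting(f2_by_place: dict[str, float]) -> list[str]:
    """Order places from most retracted to most fronted schwa, taking a higher
    F2 as the acoustic correlate of a more fronted tongue position."""
    return sorted(f2_by_place, key=f2_by_place.get)


if __name__ == "__main__":
    one_sided_means = {place: mean(pair) for place, pair in ONE_SIDED_F2.items()}
    print(rank_by_fronting(SYMMETRIC_F2))     # ['p', 't', 'k']
    print(rank_by_fronting(one_sided_means))  # ['p', 't', 'k']
```

Both orderings put /k/ highest and /p/ lowest, which is the pattern described in the bullets above.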

Schwa (8)

Here we can see some additional contexts in which there are different stop places of articulation in initial and final position. Note that F1 is low (jaw raised), similar to the other CVC tokens. Also note that the mixed place-of-articulation contexts result in schwas with mostly intermediate values.
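The claim of "mostly intermediate values" can be checked against the Bernard and Lloyd table: for each -C1@C2- token, test whether its F1 and F2 lie between the values for -C1@C1- and -C2@C2-. The data are copied from the table; the helper names are illustrative.

```python
# Schwa (F1, F2) in Hz, copied from the Bernard and Lloyd (1982) table.
SYMMETRIC = {"p": (460, 1340), "t": (410, 1640), "k": (410, 1830)}  # -C@C- contexts
MIXED = {  # (initial C, final C) -> (F1, F2) for the -C1@C2- contexts
    ("p", "t"): (430, 1550), ("p", "k"): (450, 1550), ("t", "p"): (440, 1610),
    ("t", "k"): (380, 1820), ("k", "p"): (480, 1640), ("k", "t"): (420, 1810),
}


def between(value: int, a: int, b: int) -> bool:
    """True if value lies in the closed interval spanned by a and b."""
    return min(a, b) <= value <= max(a, b)


def intermediate_flags(first: str, second: str) -> tuple[bool, bool]:
    """(F1 intermediate?, F2 intermediate?) for the mixed context -first@second-,
    relative to the symmetric contexts -first@first- and -second@second-."""
    f1, f2 = MIXED[(first, second)]
    (f1_a, f2_a), (f1_b, f2_b) = SYMMETRIC[first], SYMMETRIC[second]
    return between(f1, f1_a, f1_b), between(f2, f2_a, f2_b)


if __name__ == "__main__":
    for first, second in MIXED:
        f1_ok, f2_ok = intermediate_flags(first, second)
        print(f"-{first}@{second}-: F1 intermediate={f1_ok}, F2 intermediate={f2_ok}")
```

With these values every mixed-context F2 falls between its two symmetric counterparts, while three of the six F1 values (-t@k-, -k@p- and -k@t-) fall outside that interval, which fits the slide's "mostly".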

Schwa (5)

• In the previous slide schwa is the most retracted in the context of /p/. As /p/ is labial, rather than lingual, the consonant has only a minimal effect (via jaw height) on schwa position, so we might assume that this is more like the neutral position for Aus.E. schwa.

Schwa (7)

• In Aus.E. /k/ is fronted (palatal) adjacent to central vowels, so it pulls the whole tongue forward, affecting the schwa articulation. This suggests that Aus.E. speakers treat schwa as centred for the purposes of articulatory planning.

Alveolar and Velar Stops (1)

• In the following slides we examine the effect of the following vowel on the major burst resonance, which in each case is continuous with F2.
• The context is always /@CVt@/.
• In all cases CV is the second syllable of each of these three-syllable “words”.

Alveolar and Velar Stops (2)

• In the first group of six slides the stop in “C” position is /d/, whilst in the second group of six slides it is /g/.
• For each consonant the “V” context is varied, with the vowels /i: I 6: o: 3: @/ following in the same syllable.
• These tokens are all spoken at a normal speaking rate.

Alveolar and Velar Stops (4)

Stop: /d/
Vowel context: /i:/
Burst F2: 1700 Hz

Alveolar and Velar Stops (6)

Stop: /d/
Vowel context: /6:/
Burst F2: 1700 Hz

Alveolar and Velar Stops (3)

• To make the spectrograms easier to read, continuous F1, F2 and F3 tracks have been superimposed over them. This has been done by hand to avoid the algorithm errors that commonly affect formant tracks across consonants in most formant-tracking software.
• A blue cross marks the acoustic feature that we will mostly attend to in this lecture (measured to the nearest 50 Hz).

Alveolar and Velar Stops (5)

Stop: /d/
Vowel context: /I/
Burst F2: 1750 Hz

Alveolar and Velar Stops (7)

Stop: /d/
Vowel context: /o:/
Burst F2: 1700 Hz

Alveolar and Velar Stops (8)

Stop: /d/
Vowel context: /3:/
Burst F2: 1900 Hz

Alveolar and Velar Stops (10)

Data summary for /d/
• The frequency of this burst resonance feature varied by 400 Hz (1500-1900 Hz).
• Most tokens were at 1700-1750 Hz.
• /3:/ at 1900 Hz
• /@/ at 1500 Hz
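The /d/ summary can be reproduced from the six per-vowel burst values given in the individual /d/ slides. The sketch below computes the spread and lists the vowel contexts that fall within the 1700-1750 Hz band; the function name is hypothetical.

```python
# Burst-peak frequency (Hz) for /d/ in each vowel context, from the /d/ slides.
D_BURST = {"i:": 1700, "I": 1750, "6:": 1700, "o:": 1700, "3:": 1900, "@": 1500}


def summarise_bursts(burst_by_vowel: dict[str, int],
                     band: tuple[int, int] = (1700, 1750)) -> dict:
    """Range of the burst frequencies, plus the vowel contexts inside `band`."""
    values = burst_by_vowel.values()
    low, high = band
    return {
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "in_band": [vowel for vowel, hz in burst_by_vowel.items() if low <= hz <= high],
    }


if __name__ == "__main__":
    print(summarise_bursts(D_BURST))
    # {'min': 1500, 'max': 1900, 'range': 400, 'in_band': ['i:', 'I', '6:', 'o:']}
```

Four of the six contexts lie in the 1700-1750 Hz band, with /3:/ and /@/ at the two extremes.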

Alveolar and Velar Stops (12)

Stop: /g/
Vowel context: /I/
Burst F2: 2200 Hz

Alveolar and Velar Stops (9)

Stop: /d/
Vowel context: /@/
Burst F2: 1500 Hz

Alveolar and Velar Stops (11)

Stop: /g/
Vowel context: /i:/
Burst F2: 1950 Hz

Alveolar and Velar Stops (13)

Stop: /g/
Vowel context: /6:/
Burst F2: 2000 Hz

Alveolar and Velar Stops (14)

Stop: /g/
Vowel context: /o:/
Burst F2: 1350 Hz

Alveolar and Velar Stops (16)

Stop: /g/
Vowel context: /@/
Burst F2: 1800 Hz

Alveolar and Velar Stops (18)

Conclusions /d/
• Coarticulatory effects on /d/ are quite weak.
• Alveolar place of articulation only allows for small variation in tongue tip placement.
• Vowel articulations are tongue body rather than tongue tip articulations, so they have a smaller effect on alveolar tongue tip placement.

Alveolar and Velar Stops (15)

Stop: /g/
Vowel context: /3:/
Burst F2: 1800 Hz

Alveolar and Velar Stops (17)

Data summary for /g/
• /i:/ 1950 Hz
• /I/ 2200 Hz
• /6:/ 2000 Hz
• /o:/ 1350 Hz
• /3:/ 1800 Hz
• /@/ 1800 Hz

Alveolar and Velar Stops (19)

Conclusions /g/
• /g/ articulations (as reflected in burst frequency) cluster into three groups.
• Front vowels: /I/ at 2200 Hz
• Central vowels: /i: 6: 3: @/ at 1800-2000 Hz (note that /i:/ has a central onglide, which makes it less fronted at its onset).
• Back vowels: /o:/ at 1350 Hz
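The three-way grouping for /g/ can also be reproduced mechanically with a very simple one-dimensional clustering rule: sort the burst frequencies and cut the sorted list at its two largest gaps. This gap-based rule is an illustrative choice, not a method described in the lecture, and the function name is hypothetical. The values are those in the /g/ data summary.

```python
# Burst-peak frequency (Hz) for /g/ in each vowel context, from the data summary.
G_BURST = {"i:": 1950, "I": 2200, "6:": 2000, "o:": 1350, "3:": 1800, "@": 1800}


def split_at_largest_gaps(values_by_label: dict[str, int],
                          n_groups: int = 3) -> list[list[str]]:
    """Group labels by cutting their sorted values at the (n_groups - 1) largest gaps."""
    ordered = sorted(values_by_label.items(), key=lambda item: item[1])
    # Each gap is stored with the index of the item just before it.
    gaps = [(ordered[i + 1][1] - ordered[i][1], i) for i in range(len(ordered) - 1)]
    largest = sorted(gaps, reverse=True)[: n_groups - 1]
    cut_after = sorted(i for _size, i in largest)
    groups, start = [], 0
    for i in cut_after:
        groups.append([label for label, _hz in ordered[start:i + 1]])
        start = i + 1
    groups.append([label for label, _hz in ordered[start:]])
    return groups


if __name__ == "__main__":
    print(split_at_largest_gaps(G_BURST))  # [['o:'], ['3:', '@', 'i:', '6:'], ['I']]
```

The two largest gaps are 450 Hz (between /o:/ and the 1800 Hz values) and 200 Hz (between /6:/ and /I/), giving the back, central and front groups listed in the conclusions.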

Alveolar and Velar Stops (20)

• These results are quite consistent with the hypothesis that a velar stop in English has a strong tendency to coarticulate with an adjacent vowel (in this case the onset of the following vowel).
• This effect is most likely strongest within the same syllable, but we have not tested this: the preceding vowel was always a schwa, so we can only see the effect of changing the vowel in the same syllable.

References

• J.R. Bernard and A.L. Lloyd (1982) “The indeterminate vowel in Australian English”, in Clark, J.E. (ed.) Collected papers on normal aspects of speech and language, 52nd ANZAAS Conference, Speech and Language Section (25B), Sydney, Australia, May 1982.
• R. Mannell (ca. 2005) Acoustic theory of speech production, Macquarie University, http://clas.mq.edu.au/speech/acoustics/frequency/acoustic_theory.html
• R. Mannell (ca. 2005) Coarticulation, Macquarie University, http://clas.mq.edu.au/speech/acoustics/coarticulation/index.html
