PDF version of Lecture Slides - Speech Resource Pages ...

Speech Acoustics 

Acoustics of Coarticulation 

Dr Robert H. Mannell 

Department of Linguistics 

Macquarie University 

Coarticulation Introduction (2) 

• There are no acoustic boundaries between 

phonemes (except across intonational 

phrase boundaries which are 

characterised by pauses). 

• This continuous transitioning between 

phonemes is a fundamental characteristic 

of coarticulation 


• Coarticulation is seen in acoustic 

representations of speech by its effect on 

the frequencies of the formants of vowels 

and vowel-like sounds (sonorant 

phonemes) and effects on the resonant 

peaks of non-sonorant consonants 


• Articulatory gestures overlap 

• The gestures of slower moving articulators 

overlap more than those of fast moving 

articulators 

• The effect of overlapping gestures is 

called coarticulation 

• Phonemes rarely occur in isolation 

• Phonemes are normally articulated as part 

of a syllable 


• Coarticulation is the effect of one adjacent 

sound on another (and vice versa) and 

can occur across all boundaries except 

prosodic boundaries characterised by a 

pause 


• We perceive speech by recognising the 

(auditorily-transformed) acoustic patterns 

of syllables 

• Coarticulation tends to be stronger within 

syllables than across syllable 

boundaries 

• The higher the level of an intervening 

boundary (phoneme, syllable, prosodic) 

the less the extent of coarticulation 

between adjacent sounds 



• Vowels affect the articulation of adjacent 

consonants (and adjacent vowels) 

• Consonants affect the articulation of 

adjacent vowels (and other adjacent 

consonants) 


Tongue Height and Coarticulation 

• Most lingual consonants have a high 

tongue position 

• High vowels are least affected by these 

high consonants 

• Low vowels are most affected by these 

high consonants 

• Why is this important? 

Phoneme Inventory Effects (1) 

• Coarticulation is resisted when it will result 

in perceptual confusion 

• Vowels coarticulate most strongly in 

languages with a small number of vowels 

• Consonants coarticulate most in 

languages with a small number of places 

of articulation 


• Some sounds are more resistant to 

coarticulation than other sounds 

• Coarticulation is greatest when there is the 

greatest articulator movement between 

phonemes 


• The lower the vowel, the lower the tongue 

position is for the production of that vowel. 

• For low vowels the tongue has a greater 

distance to travel (compared to high 

vowels) to achieve appropriate stricture 

relative to the teeth (dental, not 

labiodental), hard or soft palate. 

• For bilabial, labiodental and glottal 

consonants this effect is absent as there is no 

required consonantal tongue contact. 


• In English there are a large number of 

vowels. English speakers need to restrict 

the degree of coarticulation to prevent 

vowels from crossing over into the 

articulatory and acoustic space of another 

vowel phoneme. 

• We mostly need to ensure that long 

vowels are not confused with other long 

vowels (and short vowels for short vowels) 



• In Arabic there are three long vowels and 

three short vowels (plus some diphthongs). 

• There are two high vowel positions (front 

and back) and no mid vowels. Arabic high 

vowels can move down to mid-high 

positions without confusion and can also 

move a small distance towards the high 

central position without being confused. 


• In English there are only three stop places 

of articulation (or four if the post-alveolar 

affricate is treated as a stop). 

• English alveolars can move to dental 

without confusion. 

• English velars can, and do, move between 

palatal and uvular as a consequence of 

coarticulation with adjacent sounds 

(particularly vowels). 

Prosody, Stress & Coarticulation 

• One of the reasons why we stress a 

syllable or accent a word is to slow it 

down so that its articulatory patterns are 

more readily perceived. 

• Even in Australian Aboriginal languages 

the distinction between stop places of 

articulation is relaxed in weaker syllables 

and is only completely clear in strong 

syllables. 


• The Arabic low vowels can move around 

quite freely in the mid-low to low part of 

the vowel space and from front to back in 

this part of the vowel space. 

• There are significant differences in vowel 

production (and therefore formant values) 

in the context of different consonants and 

the pattern varies between dialects. 


• Most Australian Aboriginal languages have 

a much larger number of places of 

articulation for oral stops. 

• Compared to English there is much less 

freedom to move stop place of articulation 

around without the possibility of perceptual 

confusion. 

Locus Theory (1) 

• In its original form, locus theory assumed a 

fixed ideal target for each phoneme. This 

target would be achieved by the articulators 

if there was enough time to do so. 

• Undershoot is said to occur if the ideal target 

is not reached in time. The idea of 

“undershoot” implies a fixed ideal target. 



• Strong Version: All consonants have a 

fixed target which is realised at a single 

frequency for each formant. 

• The F2 target for a particular consonant is 

known as its F2 locus 

Syllable-based theories (1) 

• There is a growing consensus that the 

most important basic element of 

articulatory planning is the syllable. 

• In such approaches a gesture, rather than 

a series of phoneme targets, is the 

underlying strategy in articulatory planning 

(and for some theorists, also of speech 

perception). 

Coarticulation and Resonance (1) 

• Vocal tract articulation results in changing 

vocal tract shapes that can be described 

as one or more resonating cavities. 

• Resonating cavities can be modelled in 

terms of simple tube models (see the topic 

for this subject entitled “Acoustic Theory of 

Speech Production”). 


• Weak Version: Consonants don't have a 

fixed target as their targets are affected by 

coarticulation, but they do tend to have a 

locus space for each formant defined by a 

range of formant frequencies. The target 

frequency tends to be within this range 

and depends upon the adjacent sound 

• The weak version is supported by a vast 

body of research 

Syllable-based theories (2) 

• In such a theory we no longer have 

invariant phoneme targets (or even a 

range of targets) but rather we have 

patterns of articulation for each type of 

syllable. 

• This approach results in acoustic 

predictions similar to those of the weak 

version of the locus theory. 


• Each resonating cavity (tube) has its own 

set of resonance frequencies. 

• Most speech articulations, with the 

exception of the neutral vowel (similar, but 

not identical, to Australian English /3:/) 

can be simply modelled by two tubes. 
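As a rough illustration of the single-tube case: the neutral vowel is commonly modelled as one uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. The sketch below assumes a speed of sound of 350 m/s and a vocal tract length of 17.5 cm. Both values, and the function name, are illustrative assumptions rather than figures taken from these slides.

```python
def uniform_tube_resonances(length_cm, n_resonances=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    The default speed of sound (350 m/s) is an assumed, illustrative value.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_resonances + 1)
    ]


# Assumed 17.5 cm vocal tract (illustrative value, not from the slides).
neutral_vowel = uniform_tube_resonances(17.5)
print(neutral_vowel)  # [500.0, 1500.0, 2500.0]
```

Under these assumptions the resonances are evenly spaced, which is why a single tube is enough for the neutral vowel while other articulations need at least two tubes.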



• For some types of speech sounds (e.g. 

vowels) the two tubes are strongly 

coupled acoustically. 

• For some speech sounds (e.g. oral stops) 

the two tubes are uncoupled. 

• For intermediate degrees of stricture there 

are intermediate degrees of acoustic 

coupling. 


• In the case of voiceless oral stops, during 

the occlusion there is no sound. 

• At the release there is a sound burst. 

Initially this is mostly de-coupled from the 

posterior cavity and so its resonance 

characteristics are related mostly to the 

size of the front cavity. 


• When the back resonator is fully or mostly 

decoupled (e.g. a stop burst) the acoustic 

spectrum is typically dominated by the 

main resonance frequency of the front 

resonator. 

• As the two cavities become more coupled 

the spectrum becomes more vowel-like, 

with increasingly clear formant peaks 

(even during the latter part of the unvoiced 

aspiration) 


• Vowels – very strongly coupled cavities 

• Approximants – strongly coupled cavities 

• Fricatives – weakly coupled cavities 

• Oral Stops – uncoupled cavities 

• In the case of nasal stops the velum is open 

and the nasal cavity is the primary 

resonator whilst the oral tract behind the 

occlusion is a secondary coupled resonator. 


• Shortly after the burst there is the 

aspiration phase, which is similar to 

fricatives (but usually with a much greater 

airflow). During this phase there is an 

increasing degree of coupling of the front 

and back cavities. 

• At the onset of voicing, say for a following 

vowel, the coupling of the two cavities is 

vowel-like (i.e. highly coupled). 


• The main spectral peak in a stop burst is 

very often continuous with one of the 

formants in the following aspiration and 

vowel. 

• In the topic on tubes and resonance you 

will recall that formants are not 

consistently related to a single cavity. 



• F1 is the acoustic correlate of the lowest 

frequency major resonance. 

• F2 is the acoustic correlate of the second 

lowest frequency major resonance. 

• For vowels and vowel-like sounds the 

strongest resonances of the two 

cavities are F1 and F2. 

• F1 or F2 cavity affiliation varies 

depending upon which is the longer cavity. 


• Patterns of articulator movement result in 

changing points of greatest constriction, 

cavity sizes and degrees of acoustic 

coupling of vocal tract resonances. 

• This results in changing formant patterns 

in vowels, approximants and nasal stops 

and changing patterns of resonance that 

become more vowel-like as constriction is 

reduced. 

Schwa (2) 

• In Australian English schwa tends to be 

more centred (i.e. closer to the mid-central 

position) than the front, back, high and low 

vowels. That is, it tends to have mid-central 

values of F1 and F2. 

• Tongue position, and therefore F1 and F2, 

varies much more for schwa 

than for the mid-central vowel /3:/. 


• If the front cavity is shorter than the back 

cavity (determined by the point of greatest 

constriction) then it will have a higher 

frequency primary resonance than the 

back cavity and this will therefore be F2 

and the back cavity resonance will be F1. 
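The affiliation rule above can be sketched in code. As a deliberate simplification, each cavity is treated here as an independent quarter-wavelength resonator (f = c/4L) with an assumed speed of sound of 350 m/s. Real cavities are acoustically coupled and the back cavity is not a simple open-ended tube, so the frequencies are illustrative only. The function names and the cavity lengths are hypothetical.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value (350 m/s)


def quarter_wave_resonance(length_cm):
    """Lowest resonance (Hz) of a cavity modelled as a quarter-wave tube."""
    return SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


def formant_affiliation(front_cm, back_cm):
    """Assign F1 to the lower cavity resonance and F2 to the higher one."""
    cavities = [
        ("front", quarter_wave_resonance(front_cm)),
        ("back", quarter_wave_resonance(back_cm)),
    ]
    lower, higher = sorted(cavities, key=lambda item: item[1])
    return {"F1": lower, "F2": higher}


# Short front cavity: the front cavity carries F2.
print(formant_affiliation(front_cm=5.0, back_cm=12.5))
# Long front cavity: the affiliation swaps and the front cavity carries F1.
print(formant_affiliation(front_cm=12.5, back_cm=5.0))
```

Moving the point of greatest constriction changes the two lengths, which is how the same formant can switch from one cavity to the other.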

Schwa (1) 

• A true schwa has no phonological 

specification for tongue height and fronting 

(it only needs to be vocalic). 

• Schwa is therefore free to coarticulate 

strongly with adjacent consonants that 

have required tongue positions. 

• There is, therefore, significant variation in 

schwa formant values. 

Context    F1 (Hz)   F2 (Hz)
@p-        610       1360
@t-        560       1570
@k-        570       1650
-p@        610       1370
-t@        560       1540
-k@        570       1630
-p@p-      460       1340
-t@t-      410       1640
-k@k-      410       1830
-p@t-      430       1550
-p@k-      450       1550
-t@p-      440       1610
-t@k-      380       1820
-k@p-      480       1640
-k@t-      420       1810

Schwa (3) 

Bernard and Lloyd (1982) examined 

the schwa productions of 40 adult 

male speakers of General Australian 

English (20 from Sydney and 20 from 

Rockhampton). These schwas were 

produced in a number of consonantal 

contexts. The F1 and F2 data for the 

20 Sydney speakers is displayed here. 

They measured both the onset and 

offset of each schwa, but here only the 

overall mean values have been 

displayed and only for a sub-set of the 

consonant contexts. 


Schwa (4) 

Schwa (6) 

The three filled squares 

mark the contexts in which 

the schwa is most 

affected by the consonant 

(as it is on both sides). We 

can see that the tongue is 

higher for all three CVC 

tokens (the jaw is raised, 

even for /p@p/, on both 

sides of the schwa). In the 

other six tokens the jaw is 

able to be lower (initially 

for VC and finally for CV). 
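The jaw-height observation can be checked against the schwa F1 values tabulated earlier in this section (Sydney speakers, Bernard and Lloyd 1982). A minimal sketch, assuming that a lower mean F1 can be read as a higher jaw and tongue position; the variable names are illustrative.

```python
from statistics import mean

# Schwa F1 (Hz) by context, copied from the table in this section.
F1_HZ = {
    "VC": {"@p-": 610, "@t-": 560, "@k-": 570},
    "CV": {"-p@": 610, "-t@": 560, "-k@": 570},
    "CVC": {"-p@p-": 460, "-t@t-": 410, "-k@k-": 410},
}

# Mean F1 for each syllable structure.
mean_f1 = {structure: mean(values.values()) for structure, values in F1_HZ.items()}

for structure, value in mean_f1.items():
    print(f"{structure}: mean F1 = {value:.0f} Hz")
```

The CVC mean comes out roughly 150 Hz below the VC and CV means, consistent with the jaw being raised on both sides of the schwa.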

• An initially surprising result is that schwa is 

more fronted in the context of /k/ than of 

/t/. 

• It should be remembered that /k/ involves the 

tongue body whilst /t/ involves the tongue tip 

(which can move with some independence of 

the tongue body). 

• This independence is not complete as /t/ still 

pulls schwa forward compared to schwa in a 

/p/ context. 
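The fronting pattern can be read directly from the schwa F2 values tabulated earlier in this section. The sketch below takes a higher F2 to indicate a more fronted schwa, which is the interpretation used in these slides, and ranks the three stop places within each same-place context. The names are illustrative.

```python
# Schwa F2 (Hz) by stop place, copied from the table in this section.
F2_HZ = {
    "VC": {"p": 1360, "t": 1570, "k": 1650},
    "CV": {"p": 1370, "t": 1540, "k": 1630},
    "CVC": {"p": 1340, "t": 1640, "k": 1830},
}


def fronting_order(f2_by_place):
    """Stop places sorted from lowest F2 (most retracted) to highest F2 (most fronted)."""
    return sorted(f2_by_place, key=f2_by_place.get)


orders = {context: fronting_order(values) for context, values in F2_HZ.items()}
for context, order in orders.items():
    print(context, " < ".join(order))
```

Every context gives the same ranking, p < t < k, and the separation is largest in the CVC tokens where the consonant flanks the schwa on both sides.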

Schwa (8) 

Here we can see some 

additional contexts in 

which we have different 

stop places of articulation 

in initial and final position. 

Note that the F1 is low 

(jaw raised) similar to the 

other CVC tokens. Also 

note that the mixed place 

of articulation contexts 

results in schwas with 

mostly intermediate 

values. 
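One way to make "mostly intermediate" concrete is to test whether each mixed-place schwa falls between the values of the two matching same-place CVC tokens. That definition of intermediate is an assumption made for this sketch, not one stated in the slides. The data are the tabulated Sydney means.

```python
# (F1, F2) in Hz for same-place -C@C- tokens, from the table in this section.
SAME_PLACE = {"p": (460, 1340), "t": (410, 1640), "k": (410, 1830)}

# (F1, F2) in Hz for mixed-place tokens, keyed by (initial stop, final stop).
MIXED = {
    ("p", "t"): (430, 1550),
    ("p", "k"): (450, 1550),
    ("t", "p"): (440, 1610),
    ("t", "k"): (380, 1820),
    ("k", "p"): (480, 1640),
    ("k", "t"): (420, 1810),
}


def is_intermediate(value, a, b):
    """True if value lies between a and b (inclusive)."""
    return min(a, b) <= value <= max(a, b)


def count_intermediate(formant_index):
    """Count mixed tokens whose formant lies between the two same-place values."""
    return sum(
        is_intermediate(
            values[formant_index],
            SAME_PLACE[first][formant_index],
            SAME_PLACE[second][formant_index],
        )
        for (first, second), values in MIXED.items()
    )


f1_count = count_intermediate(0)
f2_count = count_intermediate(1)
print(f"F1 intermediate in {f1_count} of {len(MIXED)} mixed contexts")
print(f"F2 intermediate in {f2_count} of {len(MIXED)} mixed contexts")
```

On this definition F2 is intermediate in all six mixed contexts and F1 in three of them; the F1 exceptions lie within 30 Hz of the same-place range.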

Schwa (5) 

• In the previous slide schwa is the most 

retracted in the context of /p/. As /p/ is 

labial, rather than lingual, the consonant has 

only a minimal effect (jaw height) on 

schwa position so we might assume that 

this is more like the neutral position for 

Aus.E. schwa. 

Schwa (7) 

• In Aus.E. /k/ is fronted (palatal) adjacent to 

central vowels so this pulls the whole 

tongue forward, affecting the schwa 

articulation. This suggests that schwa is 

regarded as centred by Aus.E. speakers 

for the purposes of articulatory planning. 

Alveolar and Velar Stops (1) 

• In the following slides we examine the 

effect of the following vowel on the major 

burst resonance that is in each case 

continuous with F2. 

• The consistent context is /@CVt@/. 

• CV is in all cases the second syllable of 

each of these three-syllable “words”. 



• In the first group of six slides the stop in 

the “C” is /d/ whilst in the second group of 

six slides the “C” is /g/ 

• For each consonant the “V” context is 

varied, with the vowels /i: I 6: o: 3: @/ 

following in the same syllable. 

• These tokens are all spoken at a normal 

speaking rate. 


Stop: /d/ 

Vowel context: /i:/ 

Burst F2: 1700 Hz 


Stop: /d/ 

Vowel context: /6:/ 



• To simplify reading the spectrograms, 

continuous F1, F2 and F3 tracks have been 

superimposed over the spectrograms. 

This has been done by hand to avoid 

algorithm errors that commonly affect 

formant tracks across consonants in most 

formant tracking software. 

• A blue cross marks the acoustic feature that 

we will mostly attend to in this lecture 

(measured to the nearest 50 Hz). 


Stop: /d/ 

Vowel context: /I/ 



Stop: /d/ 

Vowel context: /o:/ 




Stop: /d/ 




Data summary for /d/ 

• Frequency of this burst resonance feature 

varied by 400 Hz (1500-1900 Hz) 

• Most tokens were at 1700-1750 Hz 

• /3:/ at 1900 Hz 

• /@/ at 1500 Hz 


Stop: /g/ 

Vowel context: /I/ 



Stop: /d/ 

Vowel context: /@/ 



Stop: /g/ 

Vowel context: /i:/ 



Stop: /g/ 





Stop: /g/ 

Vowel context: /o:/ 



Stop: /g/ 

Vowel context: /@/ 



Conclusions /d/ 

• Coarticulatory effects on /d/ are quite 

weak. 

• Alveolar place of articulation only allows 

for small variation in tongue tip placement. 

• Vowel articulations are tongue body rather 

than tongue tip articulations so they have 

a smaller effect on alveolar tongue tip 

placement. 


Stop: /g/ 




Data summary for /g/ 

• /i:/ 1950 Hz 

• /I/ 2200 Hz 

• /6:/ 2000 Hz 

• /o:/ 1350 Hz 

• /3:/ 1800 Hz 

• /@/ 1800 Hz 
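To compare how far the burst resonance moves for the two stops, the spread of the /g/ values can be set against the 1500-1900 Hz range reported in the /d/ data summary. A minimal sketch using only the figures given in these slides; the variable names are illustrative.

```python
# Burst resonance (Hz) for /g/ before each vowel, from the /g/ data summary.
G_BURST_HZ = {"i:": 1950, "I": 2200, "6:": 2000, "o:": 1350, "3:": 1800, "@": 1800}

# Lowest and highest burst resonance (Hz) reported in the /d/ data summary.
D_RANGE_HZ = (1500, 1900)

g_spread = max(G_BURST_HZ.values()) - min(G_BURST_HZ.values())
d_spread = D_RANGE_HZ[1] - D_RANGE_HZ[0]

print(f"/g/ spread: {g_spread} Hz")
print(f"/d/ spread: {d_spread} Hz")
```

The /g/ spread (850 Hz) is a little over twice the /d/ spread (400 Hz) across the same six vowel contexts.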


Conclusions /g/ 

• /g/ articulations cluster into three groups. 

• Front vowels /I/ 2200 Hz 

• Central vowels /i: 6: 3: @/ 1800-2000 Hz 

(note that /i:/ has a central onglide which 

makes it less fronted at its onset). 

• Back vowels /o:/ 1350 Hz 
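The three-way grouping can be checked by computing the range of each group and confirming that the ranges do not overlap. The group membership below follows the conclusions above; the code itself is only an illustrative sketch with hypothetical names.

```python
# Burst resonance (Hz) for /g/ before each vowel, from the /g/ data summary.
G_BURST_HZ = {"i:": 1950, "I": 2200, "6:": 2000, "o:": 1350, "3:": 1800, "@": 1800}

# Vowel groups as stated in the conclusions for /g/.
GROUPS = {
    "back": ["o:"],
    "central": ["i:", "6:", "3:", "@"],
    "front": ["I"],
}

# (lowest, highest) burst resonance in Hz for each group.
group_ranges = {
    name: (
        min(G_BURST_HZ[vowel] for vowel in vowels),
        max(G_BURST_HZ[vowel] for vowel in vowels),
    )
    for name, vowels in GROUPS.items()
}

# Gaps in Hz between neighbouring groups, ordered from back to front.
gap_back_central = group_ranges["central"][0] - group_ranges["back"][1]
gap_central_front = group_ranges["front"][0] - group_ranges["central"][1]

print(group_ranges)
print(gap_back_central, gap_central_front)
```

Both gaps are positive (450 Hz and 200 Hz), so the three groups are separate on these measurements, although each of the front and back groups rests on a single token.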



• These results are quite consistent with the 

hypothesis that a velar stop in English has 

a strong tendency to coarticulate with an 

adjacent vowel (in this case the onset of 

the following vowel). 

• This effect is most likely strongest within 

the same syllable, but we haven’t tested 

this as the preceding vowel was always a 

schwa so we can only see the effect of 

changing the vowel in the same syllable. 

References 

• J.R. Bernard and A.L. Lloyd (1982) “The indeterminate vowel in Australian 

English”, in Clark, J.E. (ed.) Collected papers on normal aspects of speech 

and language, 52nd ANZAAS Conference, Speech and Language Section 

(25B), Sydney, Australia, May 1982. 

• R. Mannell (ca. 2005) Acoustic theory of speech production, Macquarie 

University, 

http://clas.mq.edu.au/speech/acoustics/frequency/acoustic_theory.html 

• R. Mannell (ca. 2005) Coarticulation, Macquarie University, 

http://clas.mq.edu.au/speech/acoustics/coarticulation/index.html 

