PDF version of Lecture Slides - Speech Resource Pages ...
<strong>Speech</strong> Acoustics<br />
Acoustics <strong>of</strong> Coarticulation<br />
Dr Robert H. Mannell<br />
Department <strong>of</strong> Linguistics<br />
Macquarie University<br />
Coarticulation Introduction (2)<br />
• There are no acoustic boundaries between<br />
phonemes (except across intonational<br />
phrase boundaries which are<br />
characterised by pauses).<br />
• This continuous transitioning between<br />
phonemes is a fundamental characteristic<br />
<strong>of</strong> coarticulation<br />
Coarticulation Introduction (4)<br />
• Coarticulation is seen in acoustic<br />
representations <strong>of</strong> speech by its effect on<br />
the frequencies <strong>of</strong> the formants <strong>of</strong> vowels<br />
and vowel-like sounds (sonorant<br />
phonemes) and by its effect on the resonant<br />
peaks <strong>of</strong> non-sonorant consonants.<br />
Coarticulation Introduction (1)<br />
• Articulatory gestures overlap<br />
• The gestures <strong>of</strong> slower moving articulators<br />
overlap more than those <strong>of</strong> fast moving<br />
articulators<br />
• The effect <strong>of</strong> overlapping gestures is<br />
called coarticulation<br />
• Phonemes rarely occur in isolation<br />
• Phonemes are normally articulated as part<br />
<strong>of</strong> a syllable<br />
Coarticulation Introduction (3)<br />
• Coarticulation is the effect <strong>of</strong> one adjacent<br />
sound on another (and vice versa) and<br />
can occur across all boundaries except<br />
prosodic boundaries characterised by a<br />
pause<br />
Coarticulation Introduction (5)<br />
• We perceive speech by recognising the<br />
(auditorily-transformed) acoustic patterns<br />
<strong>of</strong> syllables<br />
• Coarticulation tends to be stronger within<br />
syllables than across syllable<br />
boundaries<br />
• The higher the level <strong>of</strong> an intervening<br />
boundary (phoneme, syllable, prosodic)<br />
the less the extent <strong>of</strong> coarticulation<br />
between adjacent sounds<br />
Coarticulation Introduction (6)<br />
• Vowels affect the articulation <strong>of</strong> adjacent<br />
consonants (and adjacent vowels)<br />
• Consonants affect the articulation <strong>of</strong><br />
adjacent vowels (and other adjacent<br />
consonants)<br />
Coarticulation Introduction (8)<br />
Tongue Height and Coarticulation<br />
• Most lingual consonants have a high<br />
tongue position<br />
• High vowels are least affected by these<br />
high consonants<br />
• Low vowels are most affected by these<br />
high consonants<br />
• Why is this important?<br />
Phoneme Inventory Effects (1)<br />
• Coarticulation is resisted when it will result<br />
in perceptual confusion<br />
• Vowels coarticulate most strongly in<br />
languages with a small number <strong>of</strong> vowels<br />
• Consonants coarticulate most in<br />
languages with a small number <strong>of</strong> places<br />
<strong>of</strong> articulation<br />
Coarticulation Introduction (7)<br />
• Some sounds are more resistant to<br />
coarticulation than other sounds<br />
• Coarticulation is greatest when there is the<br />
greatest articulator movement between<br />
phonemes<br />
Coarticulation Introduction (9)<br />
• The lower the vowel, the lower the tongue<br />
position is for the production <strong>of</strong> that vowel.<br />
• For low vowels the tongue has a greater<br />
distance to travel (compared to high<br />
vowels) to achieve appropriate stricture<br />
relative to the teeth (dental, not<br />
labiodental), hard palate or soft palate.<br />
• For bilabial, labiodental and glottal<br />
consonants this effect is absent as there is no<br />
required consonantal tongue contact.<br />
Phoneme Inventory Effects (2)<br />
• In English there are a large number <strong>of</strong><br />
vowels. English speakers need to restrict<br />
the degree <strong>of</strong> coarticulation to prevent<br />
vowels from crossing over into the<br />
articulatory and acoustic space <strong>of</strong> another<br />
vowel phoneme.<br />
• We mostly need to ensure that long<br />
vowels are not confused for other long<br />
vowels (and short vowels for short vowels)<br />
Phoneme Inventory Effects (3)<br />
• In Arabic there are three long vowels and<br />
three short vowels (plus some diphthongs).<br />
• There are two high vowel positions (front<br />
and back) and no mid vowels. Arabic high<br />
vowels can move down to mid-high<br />
positions without confusion and can also<br />
move a small distance towards the high<br />
central position without being confused.<br />
Phoneme Inventory Effects (5)<br />
• In English there are only three stop places<br />
<strong>of</strong> articulation (or four if the post-alveolar<br />
affricate is treated as a stop).<br />
• English alveolars can move to dental<br />
without confusion.<br />
• English velars can, and do, move between<br />
palatal and uvular as a consequence <strong>of</strong><br />
coarticulation with adjacent sounds<br />
(particularly vowels).<br />
Prosody, Stress & Coarticulation<br />
• One <strong>of</strong> the reasons why we stress a<br />
syllable or accent a word is to slow it<br />
down so that its articulatory patterns are<br />
more readily perceived.<br />
• Even in Australian Aboriginal languages<br />
the distinction between stop places <strong>of</strong><br />
articulation is relaxed in weaker syllables<br />
and is only completely clear in strong<br />
syllables.<br />
Phoneme Inventory Effects (4)<br />
• The Arabic low vowels can move around<br />
quite freely in the mid-low to low part <strong>of</strong><br />
the vowel space and from front to back in<br />
this part <strong>of</strong> the vowel space.<br />
• There are significant differences in vowel<br />
production (and therefore formant values)<br />
in the context <strong>of</strong> different consonants and<br />
the pattern varies between dialects.<br />
Phoneme Inventory Effects (6)<br />
• Most Australian Aboriginal languages have<br />
a much larger number <strong>of</strong> places <strong>of</strong><br />
articulation for oral stops.<br />
• Compared to English there is much less<br />
freedom to move stop place <strong>of</strong> articulation<br />
around without the possibility <strong>of</strong> perceptual<br />
confusion.<br />
Locus Theory (1)<br />
• In its original form, locus theory assumed a<br />
fixed ideal target for each phoneme. This<br />
target would be achieved by the articulators<br />
if there was enough time to do so.<br />
• Undershoot is said to occur if the ideal target<br />
is not reached in time. The idea <strong>of</strong><br />
“undershoot” implies a fixed ideal target.<br />
Locus Theory (2)<br />
• Strong Version: All consonants have a<br />
fixed target which is realised at a single<br />
frequency for each formant.<br />
• The F2 target for a particular consonant is<br />
known as its F2 locus<br />
Syllable-based theories (1)<br />
• There is a growing consensus that the<br />
most important basic element <strong>of</strong><br />
articulatory planning is the syllable.<br />
• In such approaches a gesture, rather than<br />
a series <strong>of</strong> phoneme targets, is the<br />
underlying strategy in articulatory planning<br />
(and for some theorists, also <strong>of</strong> speech<br />
perception).<br />
Coarticulation and Resonance (1)<br />
• Vocal tract articulation results in changing<br />
vocal tract shapes that can be described<br />
as one or more resonating cavities.<br />
• Resonating cavities can be modelled in<br />
terms <strong>of</strong> simple tube models (see the topic<br />
for this subject entitled “Acoustic Theory <strong>of</strong><br />
<strong>Speech</strong> Production”).<br />
Locus Theory (3)<br />
• Weak Version: Consonants don't have a<br />
fixed target as their targets are affected by<br />
coarticulation, but they do tend to have a<br />
locus space for each formant defined by a<br />
range <strong>of</strong> formant frequencies. The target<br />
frequency tends to be within this range<br />
and depends upon the adjacent sound<br />
• The weak <strong>version</strong> is supported by a vast<br />
body <strong>of</strong> research<br />
Syllable-based theories (2)<br />
• In such a theory we no longer have<br />
invariant phoneme targets (or even a<br />
range <strong>of</strong> targets) but rather we have<br />
patterns <strong>of</strong> articulation for each type <strong>of</strong><br />
syllable.<br />
• The acoustic consequences <strong>of</strong> this<br />
approach are similar to the acoustic<br />
predictions <strong>of</strong> the weak <strong>version</strong> <strong>of</strong> the locus<br />
theory.<br />
Coarticulation and Resonance (2)<br />
• Each resonating cavity (tube) has its own<br />
set <strong>of</strong> resonance frequencies.<br />
• Most speech articulations, with the<br />
exception <strong>of</strong> the neutral vowel (similar, but<br />
not identical, to Australian English /3:/),<br />
can be simply modelled by two tubes.<br />
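As a rough illustration of the single-tube case (the neutral vowel), the resonances of a uniform tube closed at one end and open at the other can be sketched with the standard quarter-wavelength formula, F_n = (2n - 1)c / 4L. The function name, the 17.5 cm tract length and the speed of sound of 35000 cm/s are assumptions chosen for illustration, not values taken from these slides.

```python
def tube_resonances(length_cm, n_resonances=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses the quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    The default speed of sound is an assumed round value for warm, moist air.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_resonances + 1)
    ]


# Assumed vocal tract length of 17.5 cm (a commonly quoted adult male figure).
neutral_vowel = tube_resonances(17.5)
print(neutral_vowel)  # [500.0, 1500.0, 2500.0]
```

Under these assumptions the first three resonances are evenly spaced at 500, 1500 and 2500 Hz. A constriction that divides the tract into two tubes moves the resonances away from this even spacing.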
Coarticulation and Resonance (3)<br />
• For some types <strong>of</strong> speech sounds (e.g.<br />
vowels) the two tubes are strongly<br />
coupled acoustically.<br />
• For some speech sounds (e.g. oral stops)<br />
the two tubes are uncoupled.<br />
• For intermediate degrees <strong>of</strong> stricture there<br />
are intermediate degrees <strong>of</strong> acoustic<br />
coupling.<br />
Coarticulation and Resonance (5)<br />
• In the case <strong>of</strong> voiceless oral stops, during<br />
the occlusion there is no sound.<br />
• At the release there is a sound burst.<br />
Initially this is mostly de-coupled from the<br />
posterior cavity and so its resonance<br />
characteristics are related mostly to the<br />
size <strong>of</strong> the front cavity.<br />
Coarticulation and Resonance (7)<br />
• When the back resonator is fully or mostly<br />
decoupled (e.g. in a stop burst) the acoustic<br />
spectrum is typically dominated by the<br />
main resonance frequency <strong>of</strong> the front<br />
resonator.<br />
• As the two cavities become more coupled<br />
the spectrum becomes more vowel-like,<br />
with increasingly clear formant peaks<br />
(even during the latter part <strong>of</strong> the unvoiced<br />
aspiration).<br />
Coarticulation and Resonance (4)<br />
• Vowels – very strongly coupled cavities<br />
• Approximants – strongly coupled cavities<br />
• Fricatives – weakly coupled cavities<br />
• Oral Stops – uncoupled cavities<br />
• In the case <strong>of</strong> nasal stops the velum is open<br />
and the nasal cavity is the primary<br />
resonator whilst the oral tract behind the<br />
occlusion is a secondary coupled resonator.<br />
Coarticulation and Resonance (6)<br />
• Shortly after the burst there is the<br />
aspiration phase, which is similar to<br />
fricatives (but usually with a much greater<br />
airflow). During this phase there is an<br />
increasing degree <strong>of</strong> coupling <strong>of</strong> the front<br />
and back cavities.<br />
• At the onset <strong>of</strong> voicing, say for a following<br />
vowel, the coupling <strong>of</strong> the two cavities is<br />
vowel-like (i.e. highly coupled).<br />
Coarticulation and Resonance (8)<br />
• The main spectral peak in a stop burst is<br />
very often continuous with one <strong>of</strong> the<br />
formants in the following aspiration and<br />
vowel.<br />
• You will recall from the topic on tubes and<br />
resonance that formants are not<br />
consistently related to a single cavity.<br />
Coarticulation and Resonance (9)<br />
• F1 is the acoustic correlate <strong>of</strong> the lowest<br />
frequency major resonance.<br />
• F2 is the acoustic correlate <strong>of</strong> the second<br />
lowest frequency major resonance.<br />
• For vowels and vowel-like sounds the<br />
strongest resonances <strong>of</strong> the two<br />
cavities are F1 and F2.<br />
• F1 or F2 cavity affiliation varies<br />
depending upon which is the longer cavity.<br />
Coarticulation and Resonance (11)<br />
• Patterns <strong>of</strong> articulator movement result in<br />
changing points <strong>of</strong> greatest constriction,<br />
cavity sizes and degrees <strong>of</strong> acoustic<br />
coupling <strong>of</strong> vocal tract resonances.<br />
• This results in changing formant patterns<br />
in vowels, approximants and nasal stops<br />
and changing patterns <strong>of</strong> resonance that<br />
become more vowel-like as constriction is<br />
reduced.<br />
Schwa (2)<br />
• In Australian English schwa tends to be<br />
more centred (i.e. closer to the mid-centre<br />
position) than the front, back, high and low<br />
vowels. That is, it tends to have mid-central<br />
values <strong>of</strong> F1 and F2.<br />
• Tongue position, and therefore F1 and F2,<br />
varies much more for schwa than<br />
for the mid-central vowel /3:/.<br />
Coarticulation and Resonance (10)<br />
• If the front cavity is shorter than the back<br />
cavity (determined by the point <strong>of</strong> greatest<br />
constriction) then it will have a higher<br />
frequency primary resonance than the<br />
back cavity and this will therefore be F2<br />
and the back cavity resonance will be F1.<br />
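The idea that cavity affiliation follows cavity length can be sketched as below. This is a deliberately crude illustration: both cavities are treated as independent quarter-wavelength tubes, which ignores acoustic coupling and the real boundary conditions at the constriction. The function names, cavity lengths and speed of sound are all assumptions made for the example.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed round value for warm, moist air


def quarter_wave_resonance(length_cm):
    """Lowest resonance (Hz) of a tube treated as closed at one end, open at the other."""
    return SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


def formant_affiliation(front_cm, back_cm):
    """Label each cavity's lowest resonance as F1 or F2 by frequency order.

    Simplifying assumption: each cavity resonates independently, so the
    longer cavity always yields the lower resonance (F1).
    """
    resonances = {
        "front": quarter_wave_resonance(front_cm),
        "back": quarter_wave_resonance(back_cm),
    }
    lower, higher = sorted(resonances, key=resonances.get)
    return {"F1": (lower, resonances[lower]), "F2": (higher, resonances[higher])}


# Hypothetical cavity lengths: a short front cavity and a long back cavity.
print(formant_affiliation(front_cm=5.0, back_cm=12.5))
# Swapping the lengths swaps the affiliation.
print(formant_affiliation(front_cm=12.5, back_cm=5.0))
```

With the shorter cavity at the front, the front cavity supplies F2 and the back cavity F1. Moving the constriction so that the front cavity becomes the longer one reverses the labels.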
Schwa (1)<br />
• A true schwa has no phonological<br />
specification for tongue height and fronting<br />
(it only needs to be vocalic).<br />
• Schwa is therefore free to coarticulate<br />
strongly with adjacent consonants that<br />
have required tongue positions.<br />
• There is, therefore, significant variation in<br />
schwa formant values.<br />
Schwa (3)<br />
Context F1 (Hz) F2 (Hz)<br />
@p- 610 1360<br />
@t- 560 1570<br />
@k- 570 1650<br />
-p@ 610 1370<br />
-t@ 560 1540<br />
-k@ 570 1630<br />
-p@p- 460 1340<br />
-t@t- 410 1640<br />
-k@k- 410 1830<br />
-p@t- 430 1550<br />
-p@k- 450 1550<br />
-t@p- 440 1610<br />
-t@k- 380 1820<br />
-k@p- 480 1640<br />
-k@t- 420 1810<br />
Bernard and Lloyd (1982) examined<br />
the schwa productions <strong>of</strong> 40 adult<br />
male speakers <strong>of</strong> General Australian<br />
English (20 from Sydney and 20 from<br />
Rockhampton). These schwas were<br />
produced in a number <strong>of</strong> consonantal<br />
contexts. The F1 and F2 data for the<br />
20 Sydney speakers is displayed here.<br />
They measured both the onset and<br />
offset <strong>of</strong> each schwa, but here only the<br />
overall mean values have been<br />
displayed and only for a sub-set <strong>of</strong> the<br />
consonant contexts.<br />
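One way to summarise the tabulated values is to average F1 by context type. The sketch below copies the Sydney means from the table; the grouping into VC, CV and CVC and the helper names are illustrative assumptions rather than part of the original analysis.

```python
from statistics import mean

# (context, F1 Hz, F2 Hz) copied from the table for the 20 Sydney speakers.
SCHWA = [
    ("@p-", 610, 1360), ("@t-", 560, 1570), ("@k-", 570, 1650),
    ("-p@", 610, 1370), ("-t@", 560, 1540), ("-k@", 570, 1630),
    ("-p@p-", 460, 1340), ("-t@t-", 410, 1640), ("-k@k-", 410, 1830),
    ("-p@t-", 430, 1550), ("-p@k-", 450, 1550), ("-t@p-", 440, 1610),
    ("-t@k-", 380, 1820), ("-k@p-", 480, 1640), ("-k@t-", 420, 1810),
]


def context_type(context):
    """Classify a context label as 'VC', 'CV' or 'CVC' from where '@' sits."""
    core = context.strip("-")
    if core.startswith("@"):
        return "VC"
    if core.endswith("@"):
        return "CV"
    return "CVC"


def mean_f1_by_type(rows):
    """Mean F1 (Hz) for each context type."""
    groups = {}
    for context, f1, _f2 in rows:
        groups.setdefault(context_type(context), []).append(f1)
    return {kind: mean(values) for kind, values in groups.items()}


summary = mean_f1_by_type(SCHWA)
for kind, value in summary.items():
    print(f"{kind}: mean F1 = {value:.0f} Hz")
```

On these figures the VC and CV contexts both average 580 Hz, while the CVC contexts average roughly 430 Hz. A lower F1 corresponds to a higher tongue and jaw position.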
Schwa (4)<br />
The three filled squares<br />
provide the context where<br />
the schwa is most<br />
affected by the consonant<br />
(as it is on both sides). We<br />
can see that the tongue is<br />
higher for all three CVC<br />
tokens (the jaw is raised,<br />
even for /p@p/, on both<br />
sides <strong>of</strong> the schwa). In the<br />
other six tokens the jaw is<br />
able to be lower (initially<br />
for VC and finally for CV).<br />
Schwa (6)<br />
• An initially surprising result is that schwa is<br />
more fronted in the context <strong>of</strong> /k/ rather than<br />
/t/.<br />
• It should be remembered that /k/ involves the<br />
tongue body whilst /t/ involves the tongue tip<br />
(which can move with some independence <strong>of</strong><br />
the tongue body).<br />
• This independence is not complete as /t/ still<br />
pulls schwa forward compared to schwa in a<br />
/p/ context.<br />
Schwa (8)<br />
Here we can see some<br />
additional contexts in<br />
which we have different<br />
stop places <strong>of</strong> articulation<br />
in initial and final position.<br />
Note that the F1 is low<br />
(jaw raised) similar to the<br />
other CVC tokens. Also<br />
note that the mixed place<br />
<strong>of</strong> articulation contexts<br />
result in schwas with<br />
mostly intermediate<br />
values.<br />
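The claim about intermediate values can be checked against the tabulated means. The sketch below tests only F2, asking whether each mixed-place schwa falls between the two corresponding same-place values. The data are the Sydney means from the table; the function and variable names are illustrative assumptions.

```python
# Mean schwa F2 (Hz) in same-place CVC contexts, keyed by consonant.
SAME_PLACE_F2 = {"p": 1340, "t": 1640, "k": 1830}

# Mean schwa F2 (Hz) in mixed-place CVC contexts, keyed by (initial, final).
MIXED_F2 = {
    ("p", "t"): 1550, ("p", "k"): 1550, ("t", "p"): 1610,
    ("t", "k"): 1820, ("k", "p"): 1640, ("k", "t"): 1810,
}


def is_intermediate(initial, final, f2):
    """True if f2 lies between the same-place values for the two consonants."""
    low, high = sorted((SAME_PLACE_F2[initial], SAME_PLACE_F2[final]))
    return low <= f2 <= high


results = {pair: is_intermediate(*pair, f2) for pair, f2 in MIXED_F2.items()}
for (initial, final), ok in results.items():
    print(f"-{initial}@{final}-: intermediate F2 = {ok}")
```

All six mixed contexts pass this F2 check. F1 is less tidy: the -t@k- value (380 Hz) is below both same-place values and the -k@p- value (480 Hz) is above both, which fits the word "mostly".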
Schwa (5)<br />
• In the previous slide schwa is the most<br />
retracted in the context <strong>of</strong> /p/. As /p/ is<br />
labial, rather than lingual, the consonant has<br />
only a minimal effect (jaw height) on<br />
schwa position so we might assume that<br />
this is more like the neutral position for<br />
Aus.E. schwa.<br />
Schwa (7)<br />
• In Aus.E. /k/ is fronted (palatal) adjacent to<br />
central vowels so this pulls the whole<br />
tongue forward, affecting the schwa<br />
articulation. This suggests that schwa is<br />
regarded as centred by Aus.E. speakers<br />
for the purposes <strong>of</strong> articulatory planning.<br />
Alveolar and Velar Stops (1)<br />
• In the following slides we examine the<br />
effect <strong>of</strong> the following vowel on the major<br />
burst resonance that is in each case<br />
continuous with F2.<br />
• The consistent context is /@CVt@/.<br />
• CV is in all cases the second syllable <strong>of</strong><br />
each <strong>of</strong> these three-syllable “words”.<br />
Alveolar and Velar Stops (2)<br />
• In the first group <strong>of</strong> six slides the stop<br />
“C” is /d/ whilst in the second group <strong>of</strong><br />
six slides “C” is /g/.<br />
• For each consonant the “V” context is<br />
varied, with the vowels /i: I 6: o: 3: @/<br />
following in the same syllable.<br />
• These tokens are all spoken at a normal<br />
speaking rate.<br />
Alveolar and Velar Stops (4)<br />
Stop: /d/<br />
Vowel context: /i:/<br />
Burst F2: 1700 Hz<br />
Alveolar and Velar Stops (6)<br />
Stop: /d/<br />
Vowel context: /6:/<br />
Burst F2: 1700 Hz<br />
Alveolar and Velar Stops (3)<br />
• To simplify reading the spectrograms,<br />
continuous F1, F2 and F3 tracks have been<br />
superimposed over the spectrograms.<br />
This has been done by hand to avoid<br />
algorithm errors that commonly affect<br />
formant tracks across consonants in most<br />
formant tracking software.<br />
• A blue cross marks the acoustic feature that<br />
we will mostly attend to in this lecture<br />
(measured to nearest 50 Hz).<br />
Alveolar and Velar Stops (5)<br />
Stop: /d/<br />
Vowel context: /I/<br />
Burst F2: 1750 Hz<br />
Alveolar and Velar Stops (7)<br />
Stop: /d/<br />
Vowel context: /o:/<br />
Burst F2: 1700 Hz<br />
Alveolar and Velar Stops (8)<br />
Stop: /d/<br />
Vowel context: /3:/<br />
Burst F2: 1900 Hz<br />
Alveolar and Velar Stops (10)<br />
Data summary for /d/<br />
• Frequency <strong>of</strong> this burst resonance feature<br />
varied by 400 Hz (1500-1900 Hz)<br />
• Most tokens were at 1700-1750 Hz<br />
• /3:/ at 1900 Hz<br />
• /@/ at 1500 Hz<br />
Alveolar and Velar Stops (12)<br />
Stop: /g/<br />
Vowel context: /I/<br />
Burst F2: 2200 Hz<br />
Alveolar and Velar Stops (9)<br />
Stop: /d/<br />
Vowel context: /@/<br />
Burst F2: 1500 Hz<br />
Alveolar and Velar Stops (11)<br />
Stop: /g/<br />
Vowel context: /i:/<br />
Burst F2: 1950 Hz<br />
Alveolar and Velar Stops (13)<br />
Stop: /g/<br />
Vowel context: /6:/<br />
Burst F2: 2000 Hz<br />
Alveolar and Velar Stops (14)<br />
Stop: /g/<br />
Vowel context: /o:/<br />
Burst F2: 1350 Hz<br />
Alveolar and Velar Stops (16)<br />
Stop: /g/<br />
Vowel context: /@/<br />
Burst F2: 1800 Hz<br />
Alveolar and Velar Stops (18)<br />
Conclusions /d/<br />
• Coarticulatory effects on /d/ are quite<br />
weak.<br />
• Alveolar place <strong>of</strong> articulation only allows<br />
for small variation in tongue tip placement.<br />
• Vowel articulations are tongue body rather<br />
than tongue tip articulations so they have<br />
a smaller effect on alveolar tongue tip<br />
placement.<br />
Alveolar and Velar Stops (15)<br />
Stop: /g/<br />
Vowel context: /3:/<br />
Burst F2: 1800 Hz<br />
Alveolar and Velar Stops (17)<br />
Data summary for /g/<br />
• /i:/ 1950 Hz<br />
• /I/ 2200 Hz<br />
• /6:/ 2000 Hz<br />
• /o:/ 1350 Hz<br />
• /3:/ 1800 Hz<br />
• /@/ 1800 Hz<br />
Alveolar and Velar Stops (19)<br />
Conclusions /g/<br />
• /g/ articulations cluster into three groups.<br />
• Front vowels /I/ 2200 Hz<br />
• Central vowels /i: 6: 3: @/ 1800-2000 Hz<br />
(note that /i:/ has a central onglide which<br />
makes it less fronted at its onset).<br />
• Back vowels /o:/ 1350 Hz<br />
Alveolar and Velar Stops (20)<br />
• These results are quite consistent with the<br />
hypothesis that a velar stop in English has<br />
a strong tendency to coarticulate with an<br />
adjacent vowel (in this case the onset <strong>of</strong><br />
the following vowel).<br />
• This effect is most likely strongest within<br />
the same syllable, but we haven’t tested<br />
this as the preceding vowel was always a<br />
schwa so we can only see the effect <strong>of</strong><br />
changing the vowel in the same syllable.<br />
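The contrast between the two stops can be made concrete by comparing how far the burst F2 moves across vowel contexts. The values below are copied from the two data summaries; the dictionary layout and function name are illustrative assumptions.

```python
# Burst F2 (Hz) by following vowel, copied from the /d/ and /g/ data summaries.
BURST_F2 = {
    "d": {"i:": 1700, "I": 1750, "6:": 1700, "o:": 1700, "3:": 1900, "@": 1500},
    "g": {"i:": 1950, "I": 2200, "6:": 2000, "o:": 1350, "3:": 1800, "@": 1800},
}


def f2_range(stop):
    """Spread (Hz) between the highest and lowest burst F2 for a stop."""
    values = BURST_F2[stop].values()
    return max(values) - min(values)


for stop in BURST_F2:
    print(f"/{stop}/ burst F2 range: {f2_range(stop)} Hz")
```

On these tokens the /g/ burst F2 spans 850 Hz against 400 Hz for /d/, about twice the spread. With one token per vowel this is an illustration of the pattern rather than a statistical test.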
References<br />
• J.R. Bernard and A.L. Lloyd (1982) “The indeterminate vowel in Australian<br />
English”, in Clark, J.E. (ed.) Collected papers on normal aspects <strong>of</strong> speech<br />
and language, 52nd ANZAAS Conference, <strong>Speech</strong> and Language Section<br />
(25B), Sydney, Australia, May 1982.<br />
• R. Mannell (ca. 2005) Acoustic theory <strong>of</strong> speech production, Macquarie<br />
University,<br />
http://clas.mq.edu.au/speech/acoustics/frequency/acoustic_theory.html<br />
• R. Mannell (ca. 2005) Coarticulation, Macquarie University,<br />
http://clas.mq.edu.au/speech/acoustics/coarticulation/index.html<br />