perilus v - Stockholms universitet
UNIVERSITY OF STOCKHOLM
INSTITUTE OF LINGUISTICS
PERILUS V
Starting with this issue, we are slightly changing the publication policy of PERILUS. Earlier issues contained experimental work done by our graduate students as part of their coursework in experimental phonetics. Results of work on larger projects were, as a rule, published elsewhere. In the future we will, of course, continue to publish our work in international periodicals. It is, however, our intention to mirror the entire spectrum of scientific activity in our lab through PERILUS. PERILUS can thus be viewed as our department's working papers in phonetics. We hope that this new PERILUS will serve as an effective avenue of communication with our colleagues in the field of phonetics. Copies of PERILUS are available from the Institute of Linguistics, Stockholm University, S-106 91 Stockholm, Sweden.
Olle Engstrand
Hartmut Traunmüller
THE PHONETICS LABORATORY GROUP
Ann-Marie Alme
Ulf Andersson
Peter Branderud
Una Cunningham-Andersson
Hassan Djamshidpey
Mats Dufberg
Olle Engstrand
ACKNOWLEDGMENTS
The research reported in this issue of PERILUS was sponsored in part by the following sources:
THE SWEDISH COUNCIL FOR RESEARCH IN THE HUMANITIES AND SOCIAL SCIENCES
THE SWEDISH NATURAL SCIENCE RESEARCH COUNCIL
THE TRICENTENNIAL FOUNDATION OF THE BANK OF SWEDEN
THE SWEDISH BOARD FOR TECHNICAL DEVELOPMENT
PREVIOUS ISSUES OF PERILUS
PERILUS I 1978 - 1979
Page
1. INTRODUCTION
Björn Lindblom and James Lubker
4
2. SOME ISSUES IN RESEARCH ON THE PERCEPTION
OF STEADY-STATE VOWELS
Vowel identification and spectral slope
Eva Agelfors and Mary Graslund
10
Why does [a] change to [:l] when F0
is increased?
Interplay between harmonic structure and formant frequency
in the perception of vowel quality
Åke Florén
13
Analysis and prediction of difference limen data
for formant frequencies
Lennart Nord and Eva Sventelius
24
Vowel identification as a function of
increasing fundamental frequency
Elisabeth Tenenholtz
38
Essentials of a psychoacoustic model of spectral matching
Hartmut Traunmüller
49
3. ON THE PERCEPTUAL ROLE OF DYNAMIC FEATURES
IN THE SPEECH SIGNAL
Interaction between spectral and durational cues
in Swedish vowel contrasts
Anette Bishop and Gunilla Edlund
64
On the distribution of [h] in the languages of the world:
Is the rarity of syllable-final [h] due to an asymmetry
of backward and forward masking?
Eva Holmberg and Alan Gibson
68
On the function of formant transitions
I. Formant frequency target vs. rate of change in vowel identification 83
II. Perception of steady vs. dynamic vowel sounds in noise 92
Karin Holmgren
83
Artificially clipped syllables and the role of formant transitions
in consonant perception
Hartmut Traunmüller
105
4. PROSODY AND TOP-DOWN PROCESSING
The importance of timing and fundamental frequency contour
information in the perception of prosodic categories
Bertil Lyberg 123
Speech perception in noise and the evaluation
of language proficiency
Alan C. Sheats
134
5. BLOD - A BLOCK DIAGRAM SIMULATOR
Peter Branderud 151
PERILUS II 1979 - 1980
Page
Introduction
James Lubker
A study of anticipatory labial coarticulation in the speech of children
Åsa Berlin, Ingrid Landberg and Lilian Persson 2
Rapid reproduction of vowel-vowel sequences by children
Åke Florén 19
Production of bite-block vowels by children
Alan Gibson and Lorrane McPhearson 26
Laryngeal airway resistance as a function of phonation type
Eva Holmberg
44
The declination effect in Swedish
Diana Krull and Siv Wandeback
58
Compensatory articulation by deaf speakers
Richard Schulman 74
Neural and mechanical response time in the speech
of cerebral palsied subjects
Elisabeth Tenenholtz 87
An acoustic investigation of production of plosives
by cleft palate speakers
Garda Ericsson 95
PERILUS III 1982 - 1983
Page
Introduction
Björn Lindblom
Elicitation and perceptual judgement of disfluency and stuttering
Anne-Marie Alme 3
Intelligibility vs redundancy - conditions of dependency
Sheri Hunnicutt 27
The role of vowel context on the perception
of place of articulation for stops
Diana Krull
45
Vowel categorization by the bilingual listener
Richard Schulman 81
Comprehension of foreign accents. (A cryptic investigation.)
Richard Schulman and Maria Wingstedt 101
Syntetiskt tal som hjälpmedel vid korrektion av dövas tal [Synthetic speech as an aid in correcting the speech of the deaf]
Anne-Marie Öster 115
PERILUS IV 1984 - 1985
Page
Introduction
Björn Lindblom
Labial coarticulation in stutterers and normal speakers
Ann-Marie Alme 3
Movetrack
Peter Branderud
20
Some evidence on rhythmic patterns of spoken French
Danielle Duez and Yukihiro Nishinuma
30
On the relation between the acoustic properties of Swedish
voiced stops and their perceptual processing
Diana Krull 41
Descriptive acoustic studies for the synthesis of spoken Swedish
Francisco Lacerda 51
Frequency discrimination as a function of
stimulus onset characteristics
Francisco Lacerda
66
Speaker-listener interaction and phonetic variation
Björn Lindblom and Rolf Lindgren 77
Articulatory targeting and perceptual constancy of loud speech
Richard Schulman
86
The role of the fundamental and the higher formants in
the perception of speaker size, vocal effort, and vowel openness
Hartmut Traunmüller
92
RECENT PUBLICATIONS
AND PUBLICATIONS IN PROGRESS
Una Cunningham-Andersson
Durational correlates of post-vocalic voicing in English spoken by English and Spanish speakers. In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 87-92.
Olle Engstrand
Salient features of Lule Sami pronunciation. In C.-C. Elert (ed.): The Sounds of Lappish. University of Umeå (in press).
Articulatory correlates of stress and speaking rate in Swedish VCV utterances. J. Acoust. Soc. Am. (in press).
The IRIS speech data base - a status report. In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 121-126.
Diana Krull
Spectrum and dynamics in the perception of stop consonants. In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 54-59, and contribution to the French-Swedish Research Meeting held in Grenoble, March 1987.
The locus-target relation in spontaneous speech. Contribution to the French-Swedish Research Meeting held in Grenoble, March 1987.
Evaluation of distance metrics using Swedish stop consonants. Proceedings of the 11th ICPhS, Tallinn 1987, Vol. 2, pp. 65-68.
Björn Lindblom
A typological study of consonant systems: the role of inventory size. In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 7-9.
Adaptive variability and absolute constancy in speech signals: two themes in the quest for phonetic invariance. Plenary Lecture, Proceedings of the 11th ICPhS, Tallinn 1987, Vol. 3, pp. 9-18.
Phonetic invariance and the adaptive nature of speech. Lecture presented at a symposium on 'Working models of human perception', celebrating the 30th anniversary of the Instituut voor Perceptie Onderzoek, Eindhoven, August 26-28, 1987. Cambridge: Cambridge University Press.
The concept of target and speech timing (with J. Lubker, B. Lyberg, P. Branderud and K. Holmgren). Festschrift for Ilse Lehiste. Dordrecht, The Netherlands: Foris (in press).
Phonetic universals in consonant systems (with I. Maddieson). In L.M. Hyman and C.N. Li (eds.): Language, brain and mind (in press).
A model of phonetic variation and selection applied to the evolution of vowel systems. Presented in 1984 at a meeting at CASBS, Stanford. In W. S.-Y. Wang (ed.): Language transmission and change. New York: Blackwell (in press).
Fonetik [Phonetics]. Article submitted to the editorial board of Nationalencyklopedin.
Evolution of spoken language (with P. MacNeilage and M. Studdert-Kennedy). Orlando, Florida: Academic Press (in preparation).
Språket, Lucy och datorn [Language, Lucy and the computer] (with P. af Trampe). Stockholm: Bonniers (in preparation).
Rolf Lindgren
Phonetic reduction in spontaneous speech. Paper given at the TLH meeting in Lund, October 1987.
Lennart Nord
Acoustic studies of vowel reduction in Swedish. STL-QPSR 4/1986, pp. 19-36.
Vowel reduction in Swedish. In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 16-27.
Liselotte Roug
Early phonetic development in four Swedish infants (with Ingrid Landberg and Lars-Johan Lundberg). In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 145-150.
Richard Schulman
Articulatory dynamics of loud and normal speech. In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 60-64.
Hartmut Traunmüller
Phase vowels. In: Psychophysics of speech perception. Dordrecht: M. Nijhoff Publ., 1987, pp. 293-305.
Some types of variation and invariant spectral features of vowels. In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 48-53.
Perceptual relativity in identification of two-formant vowels (with Francisco Lacerda). Speech Communication 5 (in press).
An experiment on the cues to the identification of fricatives. Proceedings of the 11th ICPhS, Tallinn 1987, Vol. 5, pp. 205-208.
Maria Wingstedt
Foreign accents and perceptual processing (with Richard Schulman). In Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 93-97.
DISSERTATIONS
Garda Eriksson (1987). Analysis and treatment of cleft palate speech: Some acoustic-phonetic observations. Linköping University Medical Dissertations No. 254. ISSN 0345-0082.
Lennart Nord (1987). Acoustic-phonetic studies of Swedish with an excursion into pathological speech. TRITA-TOM-87-1. Department of Speech Communication and Musical Acoustics, Royal Institute of Technology, Stockholm. ISSN 0280-9850.
CONTENTS OF PERILUS V
Peter Branderud 1
About the computer-lab
Björn Lindblom
Adaptive variability and absolute constancy
in speech signals: two themes in the quest for
phonetic invariance 2
Richard Schulman
Articulatory dynamics of loud
and normal speech 21
Hartmut Traunmüller
& Diana Krull
An experiment on the cues to the
identification of fricatives 33
Diana Krull
Second formant locus patterns
as a measure of consonant-vowel
coarticulation 43
Madeleine Wulffson
Exploring discourse intonation in Swedish 62
Mats Dufberg
Why two labialization strategies in Setswana? 78
Liselotte Roug,
Ingrid Landberg &
Lars-Johan Lundberg
Phonetic development in early infancy -
a study of four Swedish children
during the first 18 months of life 93
Johan Stark & Mats Dufberg
A simple computerized response
collection system 140
Robert McAllister,
Mats Dufberg &
Maria Wallius
Experiments with technical aids in
pronunciation teaching 144
ABOUT THE COMPUTER-LAB
Peter Branderud
In the last year we have been able to build up a new and modern computer system with grants from the Wallenberg Foundation and FRN.
Our old computer system dates from 1975-83. It consists of two minicomputers with 200 MB of disk storage each and can accommodate four users at the same time. There is software for signal processing, acoustic analysis/synthesis, simulation of perception/production, etc. It can A/D convert up to 16 channels with 12-bit resolution into the computer and D/A convert 2 channels with 16-bit resolution from the computer.
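To put the two converter resolutions in perspective, the sketch below applies the standard textbook idealization for a uniform quantizer driven by a full-scale sine wave, SQNR ≈ 6.02·N + 1.76 dB for N bits. It is an illustrative calculation under that assumption only; real converters fall short of these ideal figures, and the text gives no measured values.

```python
import math


def ideal_sqnr_db(bits: int) -> float:
    """Ideal signal-to-quantization-noise ratio (dB) of a uniform quantizer
    driven by a full-scale sine wave (textbook idealization, assumed here;
    not a measured figure for the lab's converters)."""
    # 20*log10(2**bits) + 10*log10(1.5) is the familiar 6.02*bits + 1.76 dB
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


# Resolutions mentioned in the text: 12-bit A/D and 16-bit D/A.
for bits in (12, 16):
    print(f"{bits}-bit: {ideal_sqnr_db(bits):.1f} dB")
# prints "12-bit: 74.0 dB" and "16-bit: 98.1 dB"
```

On this idealization the 16-bit D/A path offers roughly 24 dB more dynamic range than the 12-bit A/D path.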
The new computer system consists of several Apollo workstations connected by a fast network. We will also connect several PC/XT/AT computers via an Ethernet network.
Presently we have two Apollo DN3000 workstations with black-and-white displays and two DN3000 with color displays. Each workstation has about 4 MB of primary memory, and we have 450 MB of hard-disk storage. We use the Unix 4.2 operating system and the programming languages C, Pascal, Fortran 77 and Common Lisp. We also plan to get Prolog.
A connection between the old and the new computer systems enables fast file transfer between them, and the old system can also be run through a window on the Apollo workstations. This makes it easy to apply the best software on either system to our data.
We are also preparing to install the Audlab program package from the Alvey group. We will continuously transfer the most important programs from our old system to the new Apollo system. Some of the workstations will also be equipped with A/D and D/A converters. For printouts we use laser printers.
We will continue to expand the system with, for example, hard disks, laser disks, primary memory, workstations, array processors and software.
These increased resources will make our lab more modern and complete, greatly increasing our capacity to engage in larger projects and to receive more guest researchers. In addition, direct interaction with other laboratories around the world will become easy and efficient.
ADAPTIVE VARIABILITY AND ABSOLUTE CONSTANCY IN SPEECH SIGNALS:
TWO THEMES IN THE QUEST FOR PHONETIC INVARIANCE*
Björn Lindblom
ABSTRACT
Our topic is the classical problem of reconciling the physical and linguistic descriptions of speech: the invariance issue. Evidence is first presented indicating the possibility of defining phonetic invariance at the articulatory, acoustic or auditory levels of the speech signal. However, as we broaden the scope of our review, we find that attempts to define phonetic invariance in terms of absolute physical constancies tend to lose ground to theories that recognize signal variability as an essentially systematic and adaptive consequence of the informational mutuality of natural speaker-listener interactions. We reach this conclusion not only by examining experimental data on on-line speech processes but also by analyzing typological evidence on how the phonetic structure of consonant systems varies in lawful patterns with inventory size.
INTRODUCTION
Traditionally, the problem of invariance in phonetics can be described as that of proposing physical descriptions of linguistic entities that remain invariant across the wide range of contexts presented by communicatively successful, real-life speech acts.
Many of us share the conviction that progress towards solving this problem will be crucial if we are to acquire a deeper theoretical understanding of the behavior of speakers and listeners, as well as to develop more advanced systems for speech-based man-machine communication (Perkell and Klatt 1986).
The present paper will attempt to address some of the questions that we typically encounter in the search for invariance. We shall do so by summarizing research undertaken mostly in our own laboratory in Stockholm. Although we thus deliberately limit the scope of our review, we hope that the issues raised will nevertheless be of sufficient interest to stimulate general discussion.
IS PHONETIC INVARIANCE ARTICULATORY?
A few decades ago phoneticians began to interpret phonetic events by comparing articulations to highly damped oscillatory systems. More recently, such models have acquired an important role within the framework of action theory (Kelso, Saltzman and Tuller 1986). In the sixties it was hoped that much of the variability that speech signals typically exhibit, e.g. reductions and vowel-consonant coarticulation (Öhman 1967), could be explained in terms of the spatial and temporal overlap of adjacent "motor commands" (MacNeilage 1970). Articulatory movements were seen as sluggish responses to an underlying forcing function which was assumed to change, usually in a step-wise fashion, at the initiation of every new phoneme (Henke 1966). Owing to variations in, say, stress or speaking tempo, different contexts would give rise to differences in timing for a given sequence of phoneme commands. Articulatory and acoustic goals would not always be reached: the so-called 'undershoot' phenomenon (Stevens and House 1963). But since such undershoot appeared to be lawfully related to the duration and context of the gestures (Lindblom 1963), the underlying articulatory "targets" of any given phoneme ('die Lautabsicht') would nevertheless, it was maintained, remain invariant. Accordingly, at that time it seemed possible to argue that phonetic invariance might be articulatory.
*Plenary address to be presented at the XIth International Congress of Phonetic Sciences, Tallinn, Estonia, August 1987.
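The step-command view of articulation described above can be illustrated with a critically damped second-order system, one simple instance of the highly damped models mentioned. This sketch is not Henke's or Lindblom's actual model: the natural frequency, the normalized targets (0 for the consonant, 1 for the vowel) and the durations are all assumed values. It only shows how, with the commanded target held invariant, the extent actually reached shrinks as the segment gets shorter, i.e. duration-dependent undershoot.

```python
import math


def critically_damped_step(x0: float, v0: float, target: float,
                           omega: float, t: float) -> tuple[float, float]:
    """Position and velocity after time t of the critically damped system
    x'' + 2*omega*x' + omega**2 * (x - target) = 0, starting at (x0, v0)."""
    a = x0 - target                 # initial offset from the commanded target
    b = v0 + omega * a              # constant fixed by the initial velocity
    decay = math.exp(-omega * t)
    position = target + (a + b * t) * decay
    velocity = (b - omega * (a + b * t)) * decay
    return position, velocity


OMEGA = 40.0  # hypothetical natural frequency in rad/s (25 ms time constant)


def fraction_reached(vowel_duration_s: float) -> float:
    """The articulator rests on the consonant target (0.0) when the command
    steps to the vowel target (1.0); the command is held for the vowel's
    duration before the next step. Return the fraction of the target reached."""
    position, _ = critically_damped_step(0.0, 0.0, 1.0, OMEGA, vowel_duration_s)
    return position


for ms in (50, 100, 200):
    print(f"{ms:3d} ms vowel: {fraction_reached(ms / 1000):.3f} of target")
```

With the assumed 40 rad/s, the 50, 100 and 200 ms vowels reach about 0.59, 0.91 and nearly 1.00 of the target; only the commanded target, not the attained value, is invariant.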
Duration-dependent undershoot still seems to be a phonetically valid notion for biomechanical reasons. But it is clearly not as inevitable a phenomenon as was first thought. Current experimental information indicates that in fast speech articulatory and acoustic goals can be attained despite short segment durations (cf Engstrand 1987, Gay 1978, Kuehn and Moll 1976). Furthermore, undershoot has been observed in unstressed Swedish vowels that exhibit long durations owing to 'final lengthening' (Nord 1986). Such deviations from simple duration-dependence appear to highlight the reorganizational abilities of the speech production system. These somewhat contradictory results might be reconciled if it were shown that subjects instructed to speak fast tend to "overarticulate", thus avoiding undershoot to some extent, whereas when destressing they are more prone to "underarticulate" (cf the discussion of hypo- and hyper-speech below). The demonstration of language-specific patterns of vowel reduction (cf Delattre's 1969 discussion of English, French, German and Spanish) becomes particularly relevant when addressing such questions.
In summary, the original observations of 'undershoot' carried the implication that the invariant correlates of linguistic units were to be found neither in the speech wave nor at an auditory level, but upstream of the level of articulatory movement. Phonetic invariance was accordingly associated with the constancy of underlying "spatial articulatory targets" (for reviews of the target concept see e.g. MacNeilage 1970, 1980). However, subsequent experimentation, some of which we have already hinted at above, has revealed that the notion of segmental target must be given a much more complex interpretation.
This conclusion is reinforced particularly strongly by studies of compensatory articulation. Let us summarize some results from an experiment using the so-called "bite-block" paradigm (Lindblom, Lubker, Lyberg, Branderud and Holmgren in press). Native Swedish speakers were asked to pronounce monosyllables and bi- and trisyllabic words under two conditions: normally and with a large bite-block between their teeth. They were instructed to try to produce the bite-block utterances with the same rhythm and stress pattern as the corresponding normal items. Real Swedish words as well as "reiterant" nonsense forms were used. To exemplify, one of the metric patterns was - '- -. This pattern would occur in the lists as "begabba" and /ba'bab:ab/. Measurements were made of the durations of the consonant and vowel segments in the normal and the bite-block versions of the reiterant speech samples. The question was thus whether subjects would be able to achieve the bilabial closure for the /b/ segments in spite of the abnormally low and fixed jaw position, and whether they would be able to do so while reproducing the normal durational patterns.
We found that the timing in the bite-block words deviated systematically but very little from the normal patterns, and concluded that our subjects were indeed capable of compensating. To explain the results we suggested that a representation of the "desired end-product" (the metric pattern of the word) must be available in some form to the subjects' speech motor systems, and that the successful compensations implied a reorganization of articulatory gestures that must have been controlled by such an output-oriented target representation. These results are in agreement with those reported earlier by Netsell, Kent and Abbs (1978). Moreover, they are completely analogous to previous demonstrations that naive speakers are capable of producing isolated vowels whose formant patterns are normal at the first glottal pulse in spite of an unnatural jaw opening imposed by a "bite-block" (Lindblom, Lubker and Gay 1979; Gay, Lindblom and Lubker 1981).
These results bear on the recent discussion of whether speech timing is "intrinsically" or "extrinsically" controlled. Proponents of action theory (Fowler, Rubin, Remez and Turvey 1980) approach the physics of the speech motor system from a dynamical perspective, with a view to reanalyzing many of the traditional notions that now require explicit representation in extant speech production models, such as 'feedback loop', 'target', etc. Their writings convey the expectation that many aspects of the traditional "translation models" will simply fall out as consequences of the dynamic properties intrinsic to the speech motor system. In the terminology of Kelso, Saltzman and Tuller (1986, p. 55), "..., both time and timing are deemed to be intrinsic consequences of the system's dynamical organization." Methodologically, action theory is commendable since, being committed to interpreting phonetic phenomena as fortuitous (intrinsic) consequences rather than as controlled (extrinsic) aspects of a speaker's articulatory behavior, it guarantees a maximally thorough examination of speech production processes. However, it is difficult to see how, applying the action-theoretic framework to the data on compensatory timing just reviewed, we could possibly avoid postulating some sort of "temporal target" representation which is (i) extrinsic to the particular structures executing the gestures and (ii) responsible for extrinsically tuning their dynamics.
Speech production is a highly versatile process and sometimes appears strongly listener-oriented.
The plasticity of the speech motor system is further illustrated by an experiment recently done by Schulman (forthcoming) involving a "natural bite-block" situation. This condition is provided by loud speech, in which a more open mandible tends to be used than in normally spoken syllables.
Whether rounded or not, the vowels of loud test words produced by Schulman's talkers were found to exhibit almost three times as large jaw openings as the corresponding segments in the normal words. In the context of compensatory articulation two observations call for special comment. Why do speakers not compensate for the greater jaw opening in the loud vowels the way they do in the bite-block experiments? Schulman shows that they do not: the fundamental frequency and (as predicted by articulatory-acoustic nomograms) the first formant of the loud vowels are shifted upwards by about one Bark, whereas the other formants do not undergo comparable modification. (Below we shall relate the F1 and F0 shifts to the results of a perceptual experiment.)
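To make "about one Bark" concrete, the sketch below converts frequencies with the analytical approximation z = 26.81 f / (1960 + f) - 0.53 published later by Traunmüller (1990). The paper itself does not say which Hz-to-Bark conversion underlies the figure, so this choice is an assumption, and the F1 values are hypothetical rather than Schulman's data.

```python
def hz_to_bark(f_hz: float) -> float:
    """Hz-to-Bark approximation z = 26.81 f / (1960 + f) - 0.53
    (Traunmüller 1990; an assumed choice), without end-range corrections."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z: float) -> float:
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


# Hypothetical normal-speech F1 values (Hz); not Schulman's measurements.
for f1 in (300.0, 500.0, 700.0):
    loud_f1 = bark_to_hz(hz_to_bark(f1) + 1.0)   # one Bark higher
    print(f"F1 {f1:.0f} Hz -> {loud_f1:.0f} Hz (+{loud_f1 - f1:.0f} Hz)")
```

Because the Bark scale is compressive, the same one-Bark shift corresponds to a larger shift in Hz the higher the starting F1.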
The other finding of interest is that loud vowel durations increase whereas loud consonant durations tend to decrease (cf Fónagy and Fónagy 1966). What does that result mean? The normal-loud vowel duration differences look suspiciously similar to the durational differences between normal open and close vowels which have been observed for many languages (Lehiste 1970). Finding that the duration of the EMG activity recorded from the anterior belly of the digastric correlated with both mandibular displacement and vowel duration, Westbury and Keating (1980) suggest that this temporal variation among vowels, although non-distinctive, must be seen as present in the neuromuscular signals controlling their articulation. An alternative interpretation would be to regard the differences as automatic consequences of an interaction between an invariant underlying "vowel duration command" and articulatory inertia (cf Keating 1985 for further discussion). In Lindblom (1967) we reported some evidence in favor of the latter interpretation, the "extent of movement hypothesis" (Fischer-Jørgensen 1964). We also found that the durational consequences of more extensive articulatory gestures were sometimes actively counteracted.
The question whether the open-close vowel duration difference is an intrinsic or extrinsic phonetic phenomenon is accordingly somewhat controversial. Schulman's findings bear on the problem. He constructed a model of loud speech based on the observation that loud movements appear to be "exaggerated" versions of the normal movements. Assuming that the lips and the jaw are linear mechanical systems and that loud speech differs from normal speech solely in the amplitudes of the underlying excitation forces, he performed a linear scaling of all articulatory parameters recorded for normal syllables (vertical displacements of the upper and lower lips and the jaw) and combined the scaled curves so as to derive the vertical separation of the lips, the parameter that determines the open-closed state of the mouth opening. By using the value of this parameter at opening and closing in the normal syllables as his criterion, he was then able to predict the durations of vowel and consonant segments for loud speech. He found that linear scaling either eliminated stop closures entirely or produced much too long vowels.
This result clearly attributes the durational differences to a superposition effect, that is, the interaction arising from the superposition of the lip and jaw movements. Schulman concludes that, unless the effect of opening and closing of the jaw had been actively counteracted, loud and normal vowel durations would have differed even more than they actually did.
Let us remark in the present context that, while it appears reasonable to suggest, as do Westbury and Keating, that the acoustic vowel duration differences are probably reflected at a level of neuromuscular control, there is also evidence indicating that the function of the neural control signals may be compensatory rather than positive, that is, opposite to the function suggested by Westbury and Keating.
The preliminary implication of all work touching on the theme of compensatory articulation appears to be that, whether we use "target" with reference to segmental attributes, segment durations or patterns of speech rhythm, the term is better defined not in terms of any simple articulatory invariants but with respect to the acoustic output that the talker wants to achieve. If phonetic invariance is not articulatory, could it then be acoustic?
IS PHONETIC INVARIANCE ACOUSTIC?
The suggestion that the speech signal contains absolute physical invariants corresponding to phonetic segments and features has received a lot of attention thanks to the work of Stevens and Blumstein (Stevens and Blumstein 1978, 1981; Blumstein and Stevens 1979, 1981). The idea has been favorably received by many, for instance by Fowler in her attempts to apply the perspective of direct perception to speech (Fowler 1986).
Others have been provoked to emphasize the inadequacy of the non-dynamic nature of the Stevens template notion (Kewley-Port 1983) and the substantial context-dependence that the stop consonants of various languages typically display even in samples of carefully enunciated speech (Öhman 1966).
Recent work by Krull and Lacerda in our Stockholm laboratory quantifies the extent of consonant-vowel coarticulation in the form of linear "locus equations". These relationships are obtained by plotting formant frequencies at the CV2 and V1C boundaries as a function of the formant frequencies of V2 and V1, respectively. Acoustic theory indicates that near-linear relationships should be expected for the consonant-vowel combinations in question. Such diagrams show clearly that, although a "locus" pattern can exhibit considerable variation, it is predictable from information on stop consonant identity and adjacent vowel context. Here coarticulation stands out as the salient fact, and the lack rather than the presence of absolute acoustic invariance tends to be reinforced.
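A locus equation is an ordinary least-squares line through points (F2 in the vowel, F2 at the CV boundary) collected for one consonant across vowel contexts. As commonly interpreted, a slope near 1 means the onset follows the vowel closely (strong coarticulation) and a slope near 0 means it stays near a fixed locus. The sketch below fits such a line to invented token values (not Krull's or Lacerda's data) and also reports where the fitted line crosses F2onset = F2vowel, the classical "locus" frequency.

```python
def fit_locus_equation(f2_vowel: list[float],
                       f2_onset: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of F2onset = slope * F2vowel + intercept."""
    n = len(f2_vowel)
    mean_x = sum(f2_vowel) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_vowel, f2_onset))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical F2 values (Hz) for one stop consonant before five vowels.
f2_vowel = [800.0, 1100.0, 1500.0, 1900.0, 2300.0]
f2_onset = [1200.0, 1350.0, 1550.0, 1750.0, 1950.0]

slope, intercept = fit_locus_equation(f2_vowel, f2_onset)
# Frequency at which the fitted line crosses F2onset == F2vowel.
locus = intercept / (1.0 - slope)
print(f"F2onset = {slope:.2f} * F2vowel + {intercept:.0f} Hz; locus {locus:.0f} Hz")
# prints "F2onset = 0.50 * F2vowel + 800 Hz; locus 1600 Hz"
```

Fitting one line per consonant and comparing slopes gives a single number per consonant for its degree of coarticulation with the following vowel.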
Incidentally, let us note that, if it exists, acoustic invariance is a strange notion, since talkers can only monitor it through their senses and listeners can only access it through their hearing system. Why should sensory and auditory transduction be assumed to have a transfer function of one, imposing no transformation? Is it the case that what people really mean when they talk about acoustic invariance is in fact "auditory" invariance? Let us look at some psychoacoustic results.
IS PHONETIC INVARIANCE AUDITORY?
We mentioned earlier a perceptual result that offers a rather curious parallel to Schulman's findings. It is the "Traunmüller effect", which demonstrates the transforms required to preserve the perceptual constancy of vowel quality under changes in (i) vocal effort and (ii) vocal tract size. It is also somewhat reminiscent of the findings on F0-F1 interrelationships in soprano vowels (Sundberg 1975).
Effort and vocal tract variations can be dramatically illustrated by synthetically modifying a naturally spoken /i/. When all formants and F0 are shifted equally along a Bark scale, an /i/-like vowel is perceived, but the voice changes from an adult's to a child's. When F1 and F0 are both varied in such a way that F1-F0 is kept constant on a Bark scale, and the upper formant complex is left unchanged, an /i/-like vowel is again perceived. This is remarkable in view of the fact that F1 reaches a value more typical of a low-pitched / /. One's impression is that the speaker remains the same but that she "shouts".
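Both manipulations can be expressed as operations on a Bark scale. This Python sketch assumes Traunmüller's (1990) analytical approximation z = 26.81f/(1960 + f) - 0.53, without its low- and high-frequency corrections; the original experiments may well have used a different Bark conversion, and the function names and example frequencies are hypothetical.

```python
def hz_to_bark(f):
    """Critical-band rate in Bark (uncorrected Traunmüller 1990 approximation)."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z):
    """Exact inverse of hz_to_bark (valid for z < 26.28)."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def shift_all(freqs_hz, delta_bark):
    """Shift F0 and every formant by the same Bark amount (the vocal tract size case)."""
    return [bark_to_hz(hz_to_bark(f) + delta_bark) for f in freqs_hz]


def f1_for_constant_distance(f0_old, f1_old, f0_new):
    """Return the F1 that keeps F1 - F0 constant in Bark when F0 changes (the effort case)."""
    distance = hz_to_bark(f1_old) - hz_to_bark(f0_old)
    return bark_to_hz(hz_to_bark(f0_new) + distance)


# Hypothetical /i/-like values: F0 = 200 Hz, F1 = 300 Hz; F0 is raised to 300 Hz.
new_f1 = f1_for_constant_distance(200.0, 300.0, 300.0)
print(f"F1 must rise to about {new_f1:.0f} Hz to keep F1 - F0 constant in Bark")
```

The F2 and higher formants are deliberately left out of the effort case, matching the description of the upper formant complex being left unchanged.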
Note the parallel between Schulman's and Traunmüller's results. Are the findings causally related? Do we explain the lack of formant compensation in loud speech in terms of the Traunmüller effect? Or do we account for the vowel quality results in terms of the "Schulman effect"?
What matters for the present discussion is that behavioral constancies have been demonstrated, and that they imply that, at least in this case, phonetic invariance must be defined at a level of auditory representation.
Let us return for a moment to the alleged invariance of the release spectra of stop consonants. Diana Krull collected perceptual responses from Swedish listeners to burst fragments obtained from V1C:V2 words (Krull 1987). One hundred test words were generated by constructing all possible combinations of V1 or V2 = short /i e a o u/ with C: = /b: d: rd: g:/. Confusion matrices for the burst stimuli demonstrate drastic coarticulation effects. By and large, listener responses can be accounted for in terms of the acoustic properties of the stimuli. This is shown in her attempts to predict the confusions from auditorily based "perceptual distance" computations.
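The size of the stimulus set follows from the design: five vowels in each of two positions and four stops give 5 × 4 × 5 = 100 words. The Python sketch below enumerates that set and then illustrates, with a generic Euclidean distance between hypothetical auditory spectra (dB per Bark band), how a confusion might be predicted by picking the nearest consonant template. Krull's actual distance metrics are not described here, so the metric, the template values and the function names are assumptions.

```python
import itertools
import math

VOWELS = ["i", "e", "a", "o", "u"]      # short vowels, usable as V1 or V2
CONSONANTS = ["b:", "d:", "rd:", "g:"]  # long stops; "rd:" is the retroflex

stimuli = [v1 + c + v2 for v1, c, v2 in itertools.product(VOWELS, CONSONANTS, VOWELS)]
assert len(stimuli) == 100  # 5 x 4 x 5 combinations


def auditory_distance(spec_a, spec_b):
    """Euclidean distance between two auditory spectra (dB per Bark band)."""
    if len(spec_a) != len(spec_b):
        raise ValueError("spectra must have the same number of bands")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))


def predict_response(burst, templates):
    """Predict the consonant label whose template spectrum is closest to the burst."""
    return min(templates, key=lambda label: auditory_distance(burst, templates[label]))


# Hypothetical four-band spectra, for illustration only.
templates = {
    "b": [60.0, 50.0, 40.0, 30.0],
    "d": [40.0, 50.0, 60.0, 55.0],
    "g": [45.0, 62.0, 50.0, 40.0],
}
print(predict_response([44.0, 60.0, 52.0, 41.0], templates))  # -> g
```

A full confusion prediction would turn all distances into response probabilities rather than taking only the nearest template; the nearest-template rule is the simplest special case.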
A related study has been carried out by Lacerda (1986). One part of his research can be characterized as variations on the theme struck by Flanagan in his early "difference limen" experiments on vowel formant frequencies (Flanagan 1955). Lacerda's question was: how well can listeners discriminate four-formant stimuli that differ solely in the frequency of F2? His work permits us to compare a psychoacoustic task, the discrimination of F2 in brief tone bursts with static formant patterns, with a "speech task", the discrimination of the onset of F2 transitions in /da/ stimuli.
The results indicate that the subjects' ability to discriminate on the psychoacoustic task is in close agreement with Flanagan's findings, whereas their performance on the /da/ stimuli is drastically impaired. One interpretation is that the change in discrimination is related to the fact that intra-category discrimination is considerably worse than inter-category discrimination (Liberman, Harris, Hoffman and Griffith 1957).
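One way to see why within-category discrimination suffers is the classic covert-labelling prediction from categorical-perception research: if listeners compare only the category labels they assign, the expected proportion correct in an ABX task is 0.5 × (1 + (pA - pB)²), where pA and pB are the probabilities of giving stimuli A and B the same label (say /da/). This Python sketch assumes an ABX design, which the text does not specify for Lacerda's experiments, and uses hypothetical identification probabilities.

```python
def predicted_abx(p_a, p_b):
    """Predicted ABX proportion correct if listeners compare only covert labels.

    p_a, p_b: probability that stimulus A (resp. B) receives category label 1.
    When A and B get different labels, the listener matches X by label;
    otherwise the listener guesses, which yields 0.5 * (1 + (p_a - p_b) ** 2).
    """
    for p in (p_a, p_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 0.5 * (1.0 + (p_a - p_b) ** 2)


within = predicted_abx(0.95, 0.90)  # both stimuli labelled mostly as the same category
across = predicted_abx(0.90, 0.10)  # stimuli straddling a category boundary
print(f"within-category: {within:.2f}, across-category: {across:.2f}")  # -> within-category: 0.50, across-category: 0.82
```

Under this model, two stimuli that are equally far apart acoustically are discriminated near chance when they fall in the same category, which is the pattern the /da/ results resemble.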
With reference to the invariance issue, it is important to note the following. Krull's results on stop perception indicate that the coarticulatory spectral variability of the stop releases is rather accurately reflected in the confusions that her listeners made of such brief sounds. This is fully compatible with Lacerda's results on tone bursts. Note, however, that in Lacerda's speech-task test the variability does not seem to be as faithfully mirrored in the listeners' percepts, for they apparently treat stimuli that are easily discriminable in psychoacoustic tests as "the same". Whether this reflects the listener invoking the "speech mode" or the interaction of the dynamic stimulus properties with speech-independent auditory processing is an issue still worth addressing. However, our main point is this: the invariance that we discern in these findings is not acoustic. It clearly presupposes auditory processing.
IMPLICATIONS OF SPEAKING STYLE: THE HYPER-HYPO DIMENSION
Everyday experience indicates that speaking is a highly flexible process. We are capable of varying our style of speech from fast to slow, soft to loud, casual to clear, intimate to public. We speak in different ways when talking to foreigners, babies, computers and hard-of-hearing persons. And we change our pronunciation as a function of the social rules that govern speaker-listener interactions (Labov 1972).
Above we considered principally three types of phonetic invariance: articulatory, acoustic and auditory. What are the implications of variations in speaking style for the invariance issue? For the purpose of our discussion, let us give phonetic invariance a strong literal interpretation, which is rather extreme but nevertheless not too far from working hypotheses explored previously by various investigators: "All the information is in the signal, particularly in its dynamics". For such a view of invariance to be correct (let us call it the strong version of absolute physical invariance), the following must be true: talkers vary their speaking style and thereby increase the variability of the speech wave, but in intelligible utterances linguistic units will always exhibit a core of invariant physical information that remains intact and can be successfully used by a listener.
We recently undertook a literature survey+ in order to systematize the types of speech materials that have been used in acoustic phonetic studies published during the past ten years in J Acoust Soc Am, J of Phonetics, Language and Speech, and Phonetica. A total of over 700 articles were selected as preliminarily relevant. We ended up choosing 216 as meeting our criterion of "descriptive study of speech based on quantitative acoustic phonetic measurements". Of special interest to us was the relative proportion of studies investigating "self-generated" speech (including e g spontaneous conversation) on the one hand and speech samples chosen by the experimenter (e g list readings, nonsense words etc) on the other. Not surprisingly, we found that the majority of studies, over 90%, use experimenter-controlled speech samples. The reason is clear. A satisfactory experimental design presupposes good control of the variables involved. This is less of a problem if the experimenter determines the test items, but for "real speech", with its immense number of variables, there is no established methodology that will guarantee such control. So rather than drown in an ocean of "unknown factors", we naturally tend to resort to "given" test materials and read speaking modes.
One way of justifying this widely used procedure is to argue that first we will solve the problem of phonetic invariance in "lab speech" and then we will get to work on "natural speech". Another outlook might be that, although we lack the supplementary methodology required by "ecological" speech, the excessive use of "lab speech" introduces an undesirable bias into our databases as well as into our theoretical intuitions about invariance and other key issues. Such a bias might make us underestimate the problem of speech variability, even though that problem is readily acknowledged by all workers in the field and has already, it would appear, been rather massively documented. Consequently the situation ought to be balanced.
We have recently been persuaded by the latter point of view and are currently recording (1) "self-generated speech" produced under natural conditions and (2) parallel "citation form" speech based on the syllables, words and phrases that occur in the spontaneous materials. Data are currently being collected by Rolf Lindgren, Diana Krull and myself using this two-pronged approach, which compares reference pronunciations ("citation form" speech) with samples of "self-generated speech". A few preliminary observations can be made that bear on the present discussion (cf also Lindblom and Lindgren 1985).
The reductions that we have found in spontaneous speech, which often escape the trained phonetic ear even after spectrographic evidence has been examined, are sometimes drastic. Speaking style has marked effects on the acoustic patterns of words. The vowel space shrinks in casual style and expands in "hyperspeech" modes. The diphthongization of tense Swedish vowels is enhanced and is particularly apparent in clear speech. The VOT contrast between voiced and voiceless stops increases in hyper-forms and decreases in hypo-forms. Locus equations show a smaller slope (= less vowel-dependence) for citation-form pronunciations than for spontaneous speech, which we interpret to indicate that vowel-consonant coarticulation is counteracted in hyperspeech (more invariance) but tolerated in hypospeech (less invariance). Although these observations are preliminary, they suggest that the prospects for substantiating any strong version of absolute physical invariance are most unfavorable.
+ I am indebted to Diana Krull for doing the preliminary selections and to Natasha Beery of the Phonology Laboratory, University of California, Berkeley, for the statistical analyses.
SPEECH UNDERSTANDING: (IN)DEPENDENCE OF SIGNAL INFORMATION
At the Department of Romance Languages at Stockholm University, a test is used to measure how proficient native Swedish students are in understanding spoken French. The students listen to triads of stimuli consisting of two identical sentences and one minimally different sentence, and indicate the odd case:
Montre-leur ce chapeau s'il te plaît
Montre-leur ce chapeau s'il te plaît
Montre-leur ces chapeaux s'il te plaît
Native speakers of French of course have no problems with such sentences, whereas Swedish listeners knowing no French have a lot of trouble. However, when the key information (e g the ce/ce/ces triad) is presented as fragments gated from the original sentences, the performance of the Swedish subjects improves radically (Dufberg and Steek forthcoming).
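The scoring of such an oddity test can be sketched as follows. The text does not describe how the test is scored, so this Python example simply assumes that each triad has one correct answer and that a guessing listener is right one time in three; the function names are hypothetical.

```python
def odd_one_out(triad):
    """Return the index (0, 1 or 2) of the item that differs from the other two, or None."""
    a, b, c = triad
    if a == b != c:
        return 2
    if a == c != b:
        return 1
    if b == c != a:
        return 0
    return None  # all identical or all different: no single odd item


def proportion_correct(responses, answers):
    """Share of triads in which the listener chose the odd item."""
    if not answers or len(responses) != len(answers):
        raise ValueError("need one response per triad")
    hits = sum(r == k for r, k in zip(responses, answers))
    return hits / len(answers)


CHANCE = 1 / 3  # three alternatives per triad

triad = (
    "Montre-leur ce chapeau s'il te plaît",
    "Montre-leur ce chapeau s'il te plaît",
    "Montre-leur ces chapeaux s'il te plaît",
)
print(odd_one_out(triad))  # -> 2
```

A listener's score is informative only relative to CHANCE; a score near one third means the minimal contrast was not picked up at all.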
This test can serve to remind us that perception is a product of two things: signal-dependent and signal-independent information. While I am perfectly capable of discriminating the French minimal contrasts as auditory patterns, I would quickly lose those patterns in a sentence context unless I had a sufficiently good command of French, that is, access to signal-independent 'knowledge' whose interaction with the signal is part of forming the final percept.
The speech literature is full of experimental data indicating that processes not primarily driven by the signal play an important role in the perception of speech. There will not be time to do justice to all the research bearing on this issue. Let me just recall some well-known paradigms: Perception of speech in the presence of various disturbances (noise and distortion). The improvement of identification as the signal gets linguistically richer (Miller, Heise and Lichten; Pollack and Pickett 1964; Miller and Isard). Detection of deliberate mispronunciations (Cole 1973). Word frequency effects (Howes; Savin). Restoration (Warren 1970; Ohala and Feder 1986). Phoneme monitoring (Foss and Blank). Word recognition from word fragments (Grosjean 1980; Nooteboom 1981). Fluent restorations in shadowing mispronunciations (Marslen-Wilson and Welsh 1978). Verbal transformations (Warren). Intelligibility of lip-reading from video recordings supplemented by "hummed speech", an audio signal processed to contain primarily rhythm and intonation cues (Risberg 1979). Inferences from historical sound changes (Ohala 1981).
CONCEPTUALIZING SPEAKER-LISTENER INTERACTIONS
Our review of experimental evidence bearing on the invariance issue has been selective but should nevertheless give a rough indication of a panoply of alternative positions and their respective pros and cons. We have considered the suggestion that the invariance of phonetic segments be defined (i) at an articulatory level (e g the "spatial target" hypothesis), (ii) at an acoustic level (e g spectral properties of stops), or (iii) at an auditory level (e g perceptual constancy of vowel quality). Which of these alternatives should we put our money on?
When pursued experimentally, articulatory, acoustic or auditory definitions of invariance have the methodological virtue of encouraging a maximally thorough search at these particular levels. But in seeking a broader theoretical understanding of speech communication, we would stand little to gain from spending effort on choosing between levels. Such an approach misreads the evidence, which, when viewed in a broader perspective, strongly points to the conclusion that the invariance problem is not a phonetic issue at all, for ultimately invariance can be defined only at the level of listener comprehension.
We can convince ourselves of the correctness of that point by considering the following phrase in English: /lesnsevn/. We can hear this utterance either as LESS THAN SEVEN or as LESSON SEVEN. In the appropriate contexts (say "How many are coming?" and "What is our topic today?") the listener will not be aware of any ambiguity. At which phonetic level do we find the physical correlates of the initial segments of the word "than"? Needless to say, there are no such correlates in this particular case. The conclusion seems inescapable: we should not put our money on any of the above alternatives. We must seek a more general theory.
The experimental data on production indicate that the behavior of the speech motor system is shaped primarily by two forces, plasticity (listener-oriented reorganization) and economy (talker-oriented simplification), which interact on a short-term basis so as to generate signals that may be "rich" or "poor" in explicit physical information.
The evidence on perception has identified two major sources of information, signal-dependent and signal-independent processes, and suggests that on a short-term basis percepts arise from the latter (i e "context") modulating the former in an analogously "rich" or "poor" manner.
One possible way of schematizing the logical possibilities of these conceptual simplifications is shown in the diagram of the enclosed figure. This is not a very rigorous scheme, but it seems useful, at least pedagogically, in contrasting some of the ideas currently entertained in phonetics (cf J of Phonetics, January issue 1986).
This graph states that for speech to be intelligible, the sum of explicit physical information and signal-independent information must be above a threshold, that is, the slant line. In the ideal case this sum equals a constant, the x- and y-values of specific speech samples falling right on that line. Points above the line are associated with what might be termed "over-clear" speech, points below it with "unintelligible" speech.
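The threshold relation amounts to signal + context ≥ T. The figure is schematic and has no units, so this Python sketch assumes arbitrary units, a threshold T = 1.0 and a small tolerance band around the slant line; all three values, like the function name, are hypothetical. It classifies a speech sample into the three regions named in the text.

```python
def classify_sample(signal_info, context_info, threshold=1.0, tolerance=0.05):
    """Place a speech sample relative to the slant line signal + context = threshold.

    signal_info: explicit physical information in the signal (arbitrary units).
    context_info: signal-independent information available to the listener.
    """
    total = signal_info + context_info
    if total > threshold + tolerance:
        return "over-clear"
    if total < threshold - tolerance:
        return "unintelligible"
    return "on the line"


for point in [(0.7, 0.3), (0.9, 0.5), (0.2, 0.3)]:
    print(point, classify_sample(*point))
```

The tolerance band stands in for the idealized "right on that line" case, since real measurements would never sum to the threshold exactly.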
[Figure: MUTUALITY OF SPEAKER-LISTENER INTERACTION. Axes: explicit physical (signal) information and signal-independent information; the slant line marks the threshold for intelligibility.]
It appears reasonable to assume that in real-life situations utterances can vary tremendously with respect to how socially and communicatively successful they prove to be. For our present purposes, let us focus on speech samples from hypothetically successful real-life speaker-listener interactions and assume that they produce data points clustering near and above the slant line. What would such a result imply? It would mean that there is a complementary relation between the amounts of information contributed by signal attributes on the one hand and "context" on the other. If speakers come close to the slant line, that would indicate, first, that they are capable of varying their speech output in a plastic way (cf evidence on hypo- and hyperspeech modes and other instances of reorganization of speech motor control) and, second, that while perhaps not being perfect 'mind-readers', they are at least capable of adapting their speech on-line to the short-term fluctuations in the listener's access to "context", or signal-independent information (cf experimental documentation of numerous cases showing that listeners are in fact capable of successfully coping with highly context-dependent, reduced and coarticulated speech stimuli). The possibility of such complementarity in real speech also emerges from some recent measurements reported by Hunnicutt (1985) as well as from Lieberman's 1963 study.
If we hypothesize that this strategy (let us call it the STRATEGY OF ADAPTIVE VARIABILITY) comes near the way real speakers actually behave when they are communicatively successful, we obtain a natural way of resolving some of the paradoxes that surround the invariance issue. For it follows that intra-speaker phonetic variation, along a hyper-hypo continuum as well as along other dimensions, is the characteristic that we should expect the units of ecological speech to exhibit, not absolute physical invariance.
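The strategy can be made concrete with a toy speaker model. Assuming arbitrary units and a threshold of 1.0 for the slant line (both hypothetical, since the enclosed figure is schematic), a speaker following the strategy supplies just the signal information that the listener's context leaves missing, so that signal plus context stays on the line. The function name and the optional safety margin are illustrative assumptions.

```python
def adaptive_signal(context_info, threshold=1.0, margin=0.0):
    """Signal information supplied under the strategy of adaptive variability:
    just enough to reach the threshold given the listener's context.
    The floor at 0.0 only keeps this toy model non-negative."""
    return max(0.0, threshold + margin - context_info)


for context in (0.1, 0.5, 0.9):
    signal = adaptive_signal(context)
    print(f"context={context:.1f} -> signal={signal:.1f} (sum={signal + context:.1f})")
```

Rich context yields a "poor" signal (hypospeech) and poor context a "rich" one (hyperspeech), which is the complementary relation the text predicts for successful interactions.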
The proposed way of thinking about the issue does not, of course, rule out finding physical speech sound invariance in restricted domains of observation, but it does explain why our quest for a general concept of phonetic invariance has been largely unsuccessful. And, in a pessimistic vein, it in fact predicts that it will continue to be so.
Our reasoning leads us back to a conclusion already drawn by MacNeilage in his 1970 review of the invariance issue:
"... the essence of the speech production process is not an inefficient response to invariant central signals, but an elegantly controlled variability of response to the demand for a relatively constant end" (p 184).
If, as suggested here, we take the "relatively constant end" to be defined neither articulatorily, acoustically nor auditorily, but specified only with reference to "the level of listener comprehension", MacNeilage's formulation still captures the "essence of the speech production process" satisfactorily.
Let us pause to reflect on some of the implications of the two theories contrasted in our discussion: Absolute Physical Invariance versus Adaptive Variability. The former, if proved correct, would turn what currently looks like massive variability into artefacts. For this theory in fact says that there simply IS NO variability of linguistic units. There seems to be, but that is merely a result of our presently inadequate conceptual and experimental tools. Further, if we push the notion of absolute constancy to its extreme, another implication can be noted, namely that the transmission of information by speech, an undeniably biological process, is basically non-adaptive.
The Theory of Adaptive Variability, on the other hand, says exactly the opposite. Support for this theory is easier to find within the general study of the biology of motor control and perception. It is precisely by emphasizing the adaptive nature of speech processes that we obtain a principled way of investigating phonetic variation and its origin.
ON-LINE PROCESSES IN THE LIGHT OF TYPOLOGICAL EVIDENCE ON CONSONANT SYSTEMS
Some time ago Nooteboom did an experiment on word retrieval and was able to show that listeners perform better when presented with the first halves of words than with the corresponding second-half fragments (Nooteboom 1981). As an explanation he suggested that, since word recognition is a real-time left-to-right process, word beginnings are less predictable than word endings. Consequently left-to-right context can be used much more easily than right-to-left context.
He concluded his paper by raising the question whether this asymmetry, which he takes to be a universal feature of the perceptual processing of any language, might have left its imprint on how lexical information is organized in the languages of the world. He predicted (p 422) that: "(1) in the initial position there will be a greater variety of different phonemes and phoneme combinations than in word final position, and (2) word initial phonemes will suffer less than word final phonemes from assimilation and coarticulation rules."
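Prediction (1) can be checked on any lexicon by comparing the inventories found in word-initial and word-final position. This Python sketch counts single phonemes only (the prediction also covers phoneme combinations), and the toy lexicon and function name are hypothetical, so its result says nothing about real languages.

```python
def positional_inventories(words):
    """Return the sets of word-initial and word-final phonemes in a lexicon.

    Each word is a sequence of phoneme symbols; empty words are skipped.
    """
    initial = {w[0] for w in words if w}
    final = {w[-1] for w in words if w}
    return initial, final


# Toy lexicon in broad transcription (hypothetical, not a real corpus).
lexicon = [("p", "a", "t"), ("t", "a", "p"), ("k", "i", "t"),
           ("s", "i", "p"), ("m", "a", "d"), ("b", "a", "d")]
initial, final = positional_inventories(lexicon)
print(len(initial), len(final))  # -> 6 3
```

On a real corpus one would also weight by token frequency and count clusters, but the comparison of the two paradigm sizes is the core of the test.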
One basic assumption is that variations in perceptual predictability correlate with signal "distinctiveness". Hence "the greater variety of different phonemes and phoneme combinations" in the initial as compared with the final position of words. Restating the idea, we can say that a larger paradigm goes with a RICHER signal inventory. The other side of the coin is, of course, that a smaller paradigm, such as that attributed to word endings, goes with a POORER signal inventory. In suggesting that the presence of assimilation and coarticulation should vary inversely with the need for keeping items distinct, Nooteboom tacitly formulates a hypothesis that comes close to the theory of Adaptive Variability described here. Note that the theory of Absolute Physical Invariance does not offer us any basis at all for making predictions about a possible interplay between language structure and on-line processing. Why? As stated earlier, according to that theory there IS no phonetic variation; there only seems to be. The idea of language structure adapting to the on-line constraints of speaking and listening only becomes a possibility once we recognize the existence and systematic nature of phonetic variation. Only from that point of departure will we be able to address the question of what feeds the processes of phonological innovation.
We are not in a position to present the typological data needed to test Nooteboom's hypothesis. However, we shall conclude our paper by presenting some other data that do bear on it and strongly encourage further examination of the underlying ideas.
In collaboration with Ian Maddieson, we recently undertook an analysis of the consonant inventories of 317 languages, carefully selected so as to constitute a reasonable sample of the "languages of the world". Our corpus was that of UPSID, the UCLA Phonological Segment Inventory Database (Maddieson 1984). The data consist of lists of systems whose elements (allophones of major phonemes) are specified in phonetic transcription.
Inventory sizes range from 6 to 95 consonants per system. The materials lend themselves to testing a paraphrase of Nooteboom's hypothesis: is the phonetic structure of consonant systems independent of their size, or is it systematically related to that dimension? If there is a systematic size-dependence, what is it?
There is neither time nor space to give the details of the analysis. They will be published elsewhere (Lindblom, MacNeilage and Studdert-Kennedy; Lindblom and Maddieson forthcoming). Fortunately, Nooteboom's perspective provides us with a way of summarizing the main findings.
It turns out that small paradigms statistically favor segments whose phonatory and articulatory properties can be classified as basic or elementary. Medium-sized paradigms tend to include consonants invoking more elaborated gestures in addition to a core of basic elements. The largest systems use both these types but also combinations of elaborated gestures that we label complex articulations. To exemplify, plain /p t k/ are classified as "basic" articulations, whereas ejective /p' t' k'/ or aspirated /pʰ tʰ kʰ/ invoke "elaborated" mechanisms. A segment such as /ʈʰ/ is "complex" since it shows more than one elaboration: of place (retroflexion) and of source features (aspiration). Logically, a six-consonant system could use the ejective set for its stop series. Small systems never do in our material, whereas medium-sized and large systems do. Moreover, the "complex", multiply elaborated segments are most frequent in the large inventories. The basic rule is that a less simple consonant tends not to be recruited without the presence of parallel simpler ("basic" or "elaborated") series (cf the notion of 'implicational hierarchy' in traditional terminology). The claim we make is accordingly that there is a positive correlation between paradigm size and the number of elements that a sound pattern selects from a dimension of "articulatory complexity".
The validity of our analysis naturally hinges on the success with which we can give non-circular, independently motivated definitions of "articulatory complexity". When it comes to the details of the analysis, that problem is a topic for future quantitative phonetic theory. For the moment we believe that the major trends are rather gross effects that can be convincingly demonstrated by the force of the examples. They permit us to make the following generalization: small consonant paradigms invoke 'unmarked' phonetics, large paradigms 'marked' phonetics. That is, of course, exactly what Nooteboom's hypothesis predicts, and it takes a few steps towards an explanation of why seven-consonant systems do not show inventories like the following (Ohala 1980):
We take the present typological data on consonant systems as providing strong evidence in favor of (a) language structure evolving as an adaptation to the constraints of the on-line processes of speaker-listener interaction, and (b) the correctness of a theory of Adaptive Variability as an account of those processes.
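The complexity classes and the implicational rule used in the consonant-system analysis can be operationalized, very crudely, by counting elaborations per segment: none is basic, one is elaborated, two or more is complex. This Python sketch is a hypothetical stand-in, not the independently motivated definition of articulatory complexity that the analysis calls for; the class and function names are assumptions, and the hierarchy check is applied to the whole inventory rather than to parallel series.

```python
from dataclasses import dataclass

RANK = {"basic": 0, "elaborated": 1, "complex": 2}


@dataclass(frozen=True)
class Consonant:
    """A consonant plus the set of 'elaborations' it invokes (hypothetical coding)."""
    symbol: str
    elaborations: frozenset = frozenset()  # e.g. {"ejective"} or {"retroflex", "aspirated"}

    @property
    def category(self):
        n = len(self.elaborations)
        if n == 0:
            return "basic"
        return "elaborated" if n == 1 else "complex"


def respects_hierarchy(inventory):
    """True if every non-basic class present is backed by the next simpler class."""
    ranks = {RANK[c.category] for c in inventory}
    return all(rank - 1 in ranks for rank in ranks if rank > 0)


p, t, k = (Consonant(s) for s in "ptk")
th = Consonant("tʰ", frozenset({"aspirated"}))
rth = Consonant("ʈʰ", frozenset({"retroflex", "aspirated"}))
print(rth.category)                            # -> complex
print(respects_hierarchy([p, t, k, th, rth]))  # -> True
print(respects_hierarchy([p, t, k, rth]))      # -> False
```

Counting a segment's elaborations is only a proxy; a quantitative phonetic theory would have to justify why, for instance, retroflexion and aspiration should carry equal weight.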
REFERENCES
Blumstein, S and Stevens, K N (1979): "Acoustic Invariance in Speech Production: Evidence from Measurements of the Spectral Characteristics of Stop Consonants", J Acoust Soc Am 66, 1001-1017.
Blumstein, S and Stevens, K N (1981): "Phonetic Features and Acoustic Invariance in Speech", Cognition 10, 23-32.
Cole, R A (1973): "Listening for Mispronunciations: A Measure of What We Hear during Speech", Perception and Psychophysics 13, 153-156.
Delattre, P (1969): "The General Phonetic Characteristics of Languages: An Acoustic and Articulatory Study of Vowel Reduction in Four Languages", Mimeographed Report, University of California, Santa Barbara.
Engstrand, O (1987): "Articulatory Correlates of Stress and Speaking Rate", accepted for publication in J Acoust Soc Am.
Fischer-Jørgensen, E (1964): "Sound Duration and Place of Articulation", Zeitschrift für Sprachwissenschaft und Kommunikationsforschung 17: 175-207.
Flanagan, J (1955): "A Difference Limen for Vowel Formant Frequency", J Acoust Soc Am 27: 613-614.
Fónagy, I and Fónagy, J (1966): "Sound Pressure Level and Duration", Phonetica 15: 14-21.
Fowler, C A, Rubin, P, Remez, R E and Turvey, M T (1980): "Implications for Speech Production of a General Theory of Action", 373-420 in Butterworth, B (ed): Language Production, vol I, London: Academic Press.
Gay, T (1978): "Effect of Speaking Rate on Vowel Formant Movements", J Acoust Soc Am 63 (1): 223-230.
Gay, T, Lindblom, B and Lubker, J (1981): "Production of Bite Block Vowels: Acoustic Equivalence by Selective Compensation", J Acoust Soc Am 69 (3): 802-810.
Grosjean, F (1980): "Spoken Word Recognition and the Gating Paradigm", Perception and Psychophysics 28, 267-283.
Henke, W J (1966): Dynamic Articulatory Model of Speech Production Using Computer Simulation, Doctoral dissertation, M.I.T.
Hunnicutt, S (1985): "Intelligibility versus Redundancy: Conditions of Dependency", Language and Speech 28 (1): 47-56.
Keating, P (1985): "Universal Phonetics and the Organization<br />
of Grammars", 115- 132 in Fromkin, V A (ed):<br />
Phonetic Linguisticst Orlando, FL:Academic Press.<br />
Kelso J A S, Saltzman, E L and Tuller, B (1986): NThe<br />
Dynamical Perspective on Speech Production: Data<br />
and Theory", J of Phon 14: 1, 29-59.<br />
Kewley-Port, D (1983): "Time-varying Features as Correlates<br />
of Place of Articulation in stop Consonants", J<br />
Acoust Soc Am 73:322-355.<br />
Krull, D (1987): "Evaluation of Distance Metrics Using<br />
Swedish stop Consonants", paper submitted to the<br />
Xlth ICPhS, Tallinn, Estonia.<br />
Kuehn, D P and Moll, K L (1976): "A Cineradiographic Study<br />
of VC and CV Articulatory Velocities·, J of Phon<br />
4:303-320.<br />
Labov, W (1972): Sociolinguistic Pattern.,<br />
Philadelphia:University of Pennsylvania.<br />
Lehiste, I (1970): Supra.egmentals, Cambridge, MA:MIT Press.<br />
Lieberman, P (1963): "Some Effects of Semantic and<br />
Grammatical Context on the Production and<br />
Perception of Speech", Language and Speech 6: 172-<br />
187.<br />
17
Liberman A M, Harris K S, Hoffman H S and Griffith B C (1957): "The Discrimination of Speech Sounds within and across Phoneme Boundaries", J of Experimental Psychology 54: 358-368.
Lindblom, B (1963): "Spectrographic Study of Vowel Reduction", J Acoust Soc Am 35: 1773-1781.
Lindblom, B (1967): "Vowel Duration and a Model of Lip Mandible Coordination", STL-QPSR 4/1967, 1-29 (Dept of Speech Communication, RIT, Stockholm).
Lindblom B, Lubker J and Gay T (1979): "Formant Frequencies of Some Fixed-Mandible Vowels and a Model of Speech Motor Programming by Predictive Simulation", J of Phonetics 7, 147-161.
Lindblom B, Lubker J, Lyberg B, Branderud P and Holmgren K (in press): "The Concept of Target and Speech Timing", to appear in Festschrift for Ilse Lehiste (Foris: Dordrecht).
Lindblom, B and Lindgren R (1985): "Speaker-Listener Interaction and Phonetic Variation", Perilus IV, Dept of Linguistics, University of Stockholm.
Lindblom B, MacNeilage P and Studdert-Kennedy M (forthcoming): Evolution of Spoken Language, Orlando, FL: Academic Press.
Lindblom, B and Maddieson, I (1988): "Phonetic Universals in Consonant Systems", to appear in Hyman, L M and Li, C N (eds): Language, Speech and Mind, Croom Helm.
MacNeilage, P (1970): "Motor Control of Serial Ordering of Speech", Psychological Review 77: 182-196.
MacNeilage, P (1980): "Speech Production", Language and Speech 23 (1), 3-24.
Maddieson, I (1984): Patterns of Sounds, Cambridge: Cambridge University Press.
Marslen-Wilson, W D and Welsh, A (1978): "Processing Interactions and Lexical Access during Word Recognition in Continuous Speech", Cognitive Psychology 10, 29-63.
Netsell R, Kent, R and Abbs J (1978): "Adjustments of the Tongue and Lip to Fixed Jaw Positions during Speech: A Preliminary Report", Conference on Speech Motor Control, Madison, Wisconsin.
Nooteboom, S G (1981): "Lexical Retrieval from Fragments of Spoken Words: Beginnings vs Endings", J of Phonetics 9, 407-424.
Nord, L (1986): "Acoustic Studies of Vowel Reduction in Swedish", STL-QPSR 4/1986, 19-36 (Dept of Speech Communication, RIT, Stockholm).
Ohala, J J (1980): "Chairman's Introduction to Symposium on Phonetic Universals in Phonological Systems and their Explanation", 184-18 in Proceedings of the IXth International Congress of Phonetic Sciences 1979, Institute of Phonetics, University of Copenhagen.
Ohala, J J (1981): "The Listener as a Source of Sound Change", 178-203 in Masek, C S, Hendrick, R A and Miller, M F (eds): Papers from the Parasession on Language and Behavior, Chicago: Chicago Linguistic Society.
Ohala, J J and Feder, D (1986): "Speech Sound Identification Influenced by Adjacent "Restored" Phonemes", J Acoust Soc Am 80, S110.
Öhman, S (1966): "Coarticulation in VCV Utterances: Spectrographic Measurements", J Acoust Soc Am 39: 151-168.
Öhman, S (1967): "Numerical Model of Coarticulation", J Acoust Soc Am 41: 310-320.
Perkell, J and Klatt, D (1986): Invariance and Variability in Speech Processes, Hillsdale, NJ: LEA.
Pollack, I and Pickett, J M (1964): "Intelligibility of Excerpts from Fluent Speech: Auditory vs Structural Context", J Verb Learn and Verb Beh 3: 79-84.
Risberg, A (1979): Doctoral dissertation, RIT, Stockholm.
Schulman, R (forthcoming): "Articulatory Dynamics of Loud and Normal Speech", submitted to J Acoust Soc Am.
Stevens, K N and House, A S (1963): "Perturbation of Vowel Articulations by Consonantal Context: An Acoustical Study", J Speech Hearing Res 6: 111-128.
Stevens K N and Blumstein S (1978): "Invariant Cues for Place of Articulation in Stop Consonants", J Acoust Soc Am 64, 1358-1368.
Stevens K N and Blumstein S (1981): "The Search for Invariant Acoustic Correlates of Phonetic Features", in Eimas, P and Miller J (eds): Perspectives on the Study of Speech, Hillsdale, NJ: LEA.
Sundberg, J (1975): "Formant Technique in a Professional Singer", Acustica 32 (2), 89-96.
Traunmüller, H (1981): "Perceptual Dimension of Openness in Vowels", J Acoust Soc Am 69, 1465-1475.
Warren, R (1970): "Perceptual Restoration of Missing Speech Sounds", Science 167, 392-393.
Westbury, J and Keating P (1980): "Central Representation of Vowel Duration", J Acoust Soc Am 67, Suppl 1, S37 (A).
ARTICULATORY DYNAMICS OF LOUD AND NORMAL SPEECH*
Richard Schulman
Introduction
The present study was initiated to compare the movements and timing relationships of the lips and jaw in normal and loud speech productions. Swedish vowels varying in degree of openness were produced in a bilabial stop context. On the basis of findings reported in the literature, we could expect to observe the following for the loud as compared with the normal productions:
1) There will be an increase in jaw opening for all vowels (Schulman 1985).
2) Sussman et al. (1973), Gay (1977), and Macchi (1985) have all found that jaw position during bilabial stops is lower before and after open vowels than before and after close vowels. This coarticulatory relationship should be even more pronounced here, given the increase in jaw opening for all vowels.
3) Folkins and Abbs (1975) reported that when jaw opening was increased by resistive loading during the closure of bilabial stops, closure was achieved primarily by compensatory movement of the upper lip and, to a lesser extent, by the lower lip, which attains a more elevated position with respect to the jaw. Given the expectations above, we should also see such compensation in our "natural bite-block" condition.
4) In the bite-block work of Netsell, Kent and Abbs (1978), compensatory lip movement during bilabial closure was accompanied by increased articulator velocities, so that closures of the same duration as in normal productions were achieved. If the analogy between bite block and loud speech is valid, similar temporal characteristics should also be observed.
What effects this will have on the coordination of the individual articulators is uncertain. Gay (1977) reports specific timing relationships between the articulators for bilabial stops. Will these relationships be maintained despite the increased movements and velocities expected for the loud productions? Will similar articulatory patterns be observed for the vowels?
* Paper presented at the Swedish Phonetics Conference, Uppsala, October 17-18, 1986
Procedure
A magnetometer system (Branderud, 1985) was used to track the movements of the lips and jaw. Magnetic coils were placed in the mid-sagittal plane on the vermilion border of the upper and lower lips and at the base of the incisors. Movements were recorded in the y-plane for all three articulators and, in addition, in the x-plane for the jaw. An electroglottograph was used to register the opening and closing of the glottis. Two channels with different gain settings were used to record the audio signal. All signals were recorded simultaneously on a Racal seven-channel tape recorder at a speed of 30 ips. Four subjects took part, three male and one female, between the ages of 22 and 30. The speech of all subjects is typical of standard Swedish. The recordings were made in a sound-treated booth at the Phonetics Lab, Stockholm University.
The speech material consisted of twelve Swedish vowels appearing in an /i'b_b/ frame. By placing an unstressed /i/ before the first /b/ in the frame, one might induce the jaw to attain a similar degree of openness during the stop closure regardless of the openness of the following vowel. In other words, because the jaw is high for the /i/, its starting position for the /b/ should be minimally low and roughly the same for all following vowels. One must, however, acknowledge that this presents a conflict of intent: by influencing the jaw's position during the bilabial in this way, we reduce the right-to-left coarticulatory effects that we set out to study (point 2 in the introduction).
Six lists of the words, each in a different order, were written on separate cards, which the author held during the recording. Each card contained fifteen words: all twelve vowels in the frame, plus three additional words. In Swedish, [ɛ] and [æ] have the same orthographic representation, "ä". To make the intended vowel clear, the stimuli containing these vowels were therefore preceded by real words with the appropriate vowel quality: "i bäver" for [ɛ] and "i bär" for [æ]. In addition, one of the remaining ten words was repeated in list-final position to eliminate "end of list" pronunciation of the preceding word. The productions of these three additional stimuli were not examined in the analysis.
All the lists were read first with normal vocal effort, then with loud effort. The first list in each condition was treated as a practice list and discarded from the analysis. Before the recording, this first list was read several times so that the subject could become familiar with the material. Every subject needed several attempts before feeling comfortable in distinguishing between the two /i'bäb/ words. Despite this procedure, subjects were often asked to repeat productions of these words during the recording.
Summary of Results
Although there is marked variability between subjects, we have generally found that the following holds for loud as compared with normal speech (as illustrated by the data for subject H.H.):
1. In vowel production, movement increases dramatically for all articulators in regular, predictable ways. For the jaw, distinctions in vowel height are maintained (Figs. 1a and 1b), while the lips clearly reflect the differences in degree of rounding and spreading (Fig. 2).
2. Coarticulation is manifested as a right-to-left effect on jaw positioning. The increased displacement of the jaw for the loud stressed vowels caused its highest position in the preceding bilabial to be lowered by almost twice its normal amount (exception: speaker C.K.) (Fig. 3). Coarticulation is also demonstrated by the nearly identical positioning of the jaw for the initial unstressed /i/ and the following bilabial.
3. The lowered jaw position for bilabial closure provides an articulatory setting strongly reminiscent of that induced by artificial perturbations of the jaw. This gives us cause to regard the shouting paradigm as providing a "natural" bite block.
4. In response to this "natural" bite block during bilabial closure, motor equivalence (cf. Hughes and Abbs, 1976) is demonstrated: the upper lip compensates for the lowered jaw (and hence the lowered position of the lower lip), achieving even more complete closure than in normal production (see Table I).
5. It was demonstrated that increased articulatory movement cannot always be described as a simple linear amplification of normal articulation by scale factors. It also entails a more complex, goal-oriented reorganization of specific movements (Fig. 4).
6. Greater displacements are accompanied by increased velocities (Fig. 5). For loud speech, this results in shorter durations of intervocalic bilabials. Durations of stressed loud vowels are, however, somewhat longer than those of normal productions (Fig. 6). To achieve durations equal to or shorter than those of normal speech, the ratio of velocity to displacement would have had to be greater (as it was for speaker T.L.). The CV sequences are thus only slightly longer in loud than in normal speech. Both phonological distinctions (short/long) and the inherent length of vowels associated with openness are maintained in loud speech.
7. The timing order of the articulators is neither very stable nor predictable across vowel contexts, speech conditions and speakers. There is more synchrony between the lower lip and the jaw in loud vowels than in normal ones, whereas the converse is true for synchrony between the upper lip and the lower lip and jaw.
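The duration argument in point 6 can be made concrete. If a movement keeps the same velocity-profile shape, its duration is proportional to displacement divided by peak velocity, so durations shrink only if the velocity-to-displacement ratio grows. The sketch below assumes the regression line y = 7.1x + 10.9 printed with the Figure 6 plot. Per the Figure 6 caption, that plot shows peak jaw velocity y against jaw displacement x; the units are assumed to be consistent. The function names and displacement values are illustrative only.

```python
# Sketch: how a linear velocity-displacement relation affects movement duration.
# Assumption: peak jaw velocity v = SLOPE * d + INTERCEPT, taken from the fitted
# line shown with Figure 6 (units assumed consistent; names are illustrative).
SLOPE = 7.1
INTERCEPT = 10.9


def peak_velocity(d: float) -> float:
    """Peak velocity predicted by the linear fit for displacement d."""
    return SLOPE * d + INTERCEPT


def velocity_to_displacement_ratio(d: float) -> float:
    """v_peak / d; a larger ratio means a relatively faster movement."""
    return peak_velocity(d) / d


def relative_duration(d: float) -> float:
    """Duration up to a constant factor. For a velocity profile of fixed shape,
    d = v_peak * T * const, so T is proportional to d / v_peak."""
    return d / peak_velocity(d)


if __name__ == "__main__":
    for d in (10.0, 15.0, 20.0):  # hypothetical displacements
        print(f"d={d:5.1f}  v/d={velocity_to_displacement_ratio(d):.3f}  "
              f"T~{relative_duration(d):.4f}")
```

Because the intercept is positive, v/d falls as d grows, and the relative duration therefore rises slowly with displacement. Durations could stay equal or shrink only if the ratio rose instead, as the text reports for speaker T.L.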
References
Branderud, P. (1985). "Movetrack - A movement tracking system". PERILUS IV. Stockholm University. 20-29
Folkins, J. and Abbs, J. (1975). "Lip and jaw motor control during speech: Responses to resistive loading of the jaw". J. Speech Hearing Res., 207-220
Gay, T. (1977). "Articulatory movements in VCV sequences". J. Acoust. Soc. Am. 62, 183-193
Hughes, O. and Abbs, J. (1976). "Labial-mandibular coordination in the production of speech: Implications for the operation of motor equivalence". Phonetica 33, 199-221
Macchi, M. (1985). Segmental and suprasegmental features and lip and jaw articulators. Ph.D. dissertation, New York University.
McAllister, R., Lubker, J. and Carlson, J. (1974). "An EMG study of some characteristics of the Swedish vowels". Journal of Phonetics 2, 267-278
Netsell, R., Kent, R. and Abbs, J. (1978). "Adjustments of the tongue and lips to fixed jaw positions during speech: A preliminary report". Paper presented at the Conference on Speech Motor Control, University of Wisconsin, Madison, Wisconsin.
Schulman, R. (1985). "Articulatory targeting and perceptual constancy of loud speech". PERILUS IV. Stockholm University. 86-91
Sussman, H.M., MacNeilage, P.F. and Hanson, R.J. (1973). "Labial and mandibular dynamics during the production of bilabial consonants: Preliminary observations". J. Speech Hearing Res., 397-420
TABLE I. Mean displacement in millimeters from rest position of the articulators at the point of minimum lip separation during the first bilabial stop of the /i'bVb/ test words. Values are averaged over twelve Swedish vowels (5 tokens for each vowel) for normal (X_N) and loud (X_L) productions, together with the difference between loud and normal productions (ΔX_LN = X_L − X_N).
FIG. 1. Jaw opening during production of stressed Swedish vowels, plotted according to their traditional phonological classification in terms of front, central, back and rounding. Normal and loud productions for each speaker appear in separate plots (1a and 1b, respectively).
FIG. 2. Maximum lip separation for loud speech plotted against normal productions for the stressed vowels. The traditional terms inrounded, outrounded and spread are used. Two regression lines are fitted to the data: one for outrounded vowels and one for spread vowels. Inrounded vowels are not included in the calculation of these lines.
FIG. 3. Position of the jaw at minimum displacement during the production of the first bilabial stop, plotted against the jaw's position at maximum displacement during the following vowel. Separate regression lines are fitted for normal and loud productions.
FIG. 4. Position of the jaw at maximum displacement during the production of the frame-initial segment ([i]), plotted against the position of the jaw at minimum displacement in the following bilabial stop. A single regression line is fitted to both normal and loud productions.
FIG. 5. Averaged movements of the (a) upper lip component, (b) lower lip component and (c) jaw component for speaker H.H.'s productions of /i'bab/. For each articulator, the normal (1), loud (2) and normal multiplied by a scale factor (3) movements are presented. Figure 5d shows lip separation for this sequence; it is obtained by subtracting the movement curves of Figure 5a from the sum of the curves in Figures 5b and 5c. The normal and loud audio signals for this sequence are displayed at the top of each figure.
FIG. 6. Peak velocity of the jaw during movement from the closure of the first bilabial stop to its position of maximum displacement during the stressed vowel, plotted against displacement of the jaw from the position of minimum excursion (b) to maximum excursion (V). One regression line is fitted to both normal and loud productions.
FIG. 7. Acoustic durations for each vowel context for normal and loud productions of the initial bilabial stop and the stressed vowel. The vowel segment (white bar) begins at the termination of the bilabial stop (black bar).
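The derived curves in Figure 5 can be sketched as follows. Lip separation follows the caption's recipe: the lower-lip component plus the jaw component, minus the upper lip. The caption does not say how the scale factor for curve (3) was chosen, so the least-squares fit used here is an assumption, as are the function names and sample values. A clearly non-zero residual between the loud curve and the scaled normal curve is the kind of evidence point 5 of the results refers to.

```python
# Sketch of the Figure 5 derived curves (hypothetical helper names).
from math import sqrt


def lip_separation(upper_lip, lower_lip_component, jaw):
    """Figure 5d recipe: (lower-lip component + jaw component) - upper lip."""
    return [ll + j - ul for ul, ll, j in zip(upper_lip, lower_lip_component, jaw)]


def best_scale_factor(normal, loud):
    """Least-squares k minimising sum((loud - k * normal)**2).
    Assumption: the paper does not state how its scale factor was chosen."""
    num = sum(n * l for n, l in zip(normal, loud))
    den = sum(n * n for n in normal)
    return num / den


def rms_residual(normal, loud, k):
    """RMS difference between the loud curve and the scaled normal curve."""
    diffs = [l - k * n for n, l in zip(normal, loud)]
    return sqrt(sum(d * d for d in diffs) / len(diffs))


if __name__ == "__main__":
    # Hypothetical jaw traces (mm from rest), sampled at equal time steps.
    normal = [0.0, -2.0, -5.0, -8.0, -5.0, -2.0, 0.0]
    loud = [0.0, -2.5, -8.0, -13.0, -9.0, -3.0, 0.0]
    k = best_scale_factor(normal, loud)
    print(f"k = {k:.3f}, rms residual = {rms_residual(normal, loud, k):.3f} mm")
```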
Table I
                  H.H.                  T.L.                  P.H.                  C.K.
Parameter    X_N    X_L  ΔX_LN     X_N    X_L  ΔX_LN     X_N    X_L  ΔX_LN     X_N    X_L  ΔX_LN
LL-UL      -1.00   0.03   1.03   -1.51  -2.15  -0.64   -2.06  -0.83   1.23   -0.51   2.19   2.70
LL         -1.96  -3.59  -1.63   -2.30  -2.88  -0.58   -4.16  -6.09  -1.93   -0.80   1.94   2.74
JY         -3.09  -5.57  -2.48   -4.16  -5.08  -0.92   -5.81  -8.16  -2.35   -4.46  -3.33   1.13
LL-J        1.13   1.97   0.84    1.87   2.20   0.33    1.65   2.07   0.42    3.66   5.27   1.61
UL         -0.95  -3.62  -2.67   -0.78  -0.73   0.05   -2.10  -5.26  -3.16   -0.30  -0.25   0.05
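The rows of Table I are not independent. Suppose LL and UL are the lower- and upper-lip displacements, JY is the vertical jaw displacement, LL-J is the lower lip relative to the jaw, and LL-UL is the lip separation. This interpretation is inferred from the arithmetic rather than stated in the paper. Under it, the tabulated values satisfy LL-UL = LL − UL, LL-J = LL − JY and ΔX_LN = X_L − X_N to within rounding. The short check below verifies this against the transcribed values.

```python
# Consistency check on Table I (values in mm, transcribed from the table).
# The row interpretation is an assumption inferred from the arithmetic.
TABLE_I = {  # speaker -> parameter -> (X_N, X_L, dX_LN)
    "H.H.": {"LL-UL": (-1.00, 0.03, 1.03), "LL": (-1.96, -3.59, -1.63),
             "JY": (-3.09, -5.57, -2.48), "LL-J": (1.13, 1.97, 0.84),
             "UL": (-0.95, -3.62, -2.67)},
    "T.L.": {"LL-UL": (-1.51, -2.15, -0.64), "LL": (-2.30, -2.88, -0.58),
             "JY": (-4.16, -5.08, -0.92), "LL-J": (1.87, 2.20, 0.33),
             "UL": (-0.78, -0.73, 0.05)},
    "P.H.": {"LL-UL": (-2.06, -0.83, 1.23), "LL": (-4.16, -6.09, -1.93),
             "JY": (-5.81, -8.16, -2.35), "LL-J": (1.65, 2.07, 0.42),
             "UL": (-2.10, -5.26, -3.16)},
    "C.K.": {"LL-UL": (-0.51, 2.19, 2.70), "LL": (-0.80, 1.94, 2.74),
             "JY": (-4.46, -3.33, 1.13), "LL-J": (3.66, 5.27, 1.61),
             "UL": (-0.30, -0.25, 0.05)},
}


def max_residual(rows):
    """Largest deviation from the three identities for one speaker."""
    res = []
    for c in (0, 1):  # 0 = normal, 1 = loud
        res.append(abs(rows["LL-UL"][c] - (rows["LL"][c] - rows["UL"][c])))
        res.append(abs(rows["LL-J"][c] - (rows["LL"][c] - rows["JY"][c])))
    for x_n, x_l, d in rows.values():
        res.append(abs(d - (x_l - x_n)))
    return max(res)


if __name__ == "__main__":
    for speaker, rows in TABLE_I.items():
        print(speaker, f"max residual = {max_residual(rows):.2f} mm")
```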
[Figures 1a, 2, 4 and 6: plots not reproduced. The Figure 6 plot is annotated with the regression line y = 7.1x + 10.9.]
AN EXPERIMENT ON THE CUES TO THE IDENTIFICATION OF FRICATIVES
HARTMUT TRAUNMÜLLER
DIANA KRULL
ABSTRACT
Synthetic fricatives with two spectral peaks scanning a wide range of frequencies were put into three versions of the context [a_ɛː]. These contexts were also generated synthetically, imitating a male speaker (1), a child (2), and an aroused male speaker (3) with elevated F0 and F1. The stimuli were presented in two orders, with increasing or decreasing frequencies of the spectral peaks, to 16 speakers of Swedish, who identified the fricatives as [f], [s], [ɕ], [ʂ], or [ɧ]. In a given context, the obtained phonetic boundaries followed mainly the spectral peak lowest in frequency. The upper peak contributed only marginally, even when it was at a distance less than the "critical distance" of about 3 Bark. In context (2), as compared with (1), the phonetic boundaries were shifted upwards, but by less (in Bark) than the vowel formants.
INTRODUCTION
It is well known that the characteristic frequencies (i.e., the frequencies of the formants and of the fundamental) of speech sounds with a given phonetic quality vary with the overall dimensions of the speaker's vocal tract. If the characteristic frequencies of vowels are converted into a measure of tonotopic place, such as critical-band rate (Bark), differences in speaker size can be seen to correspond to a tonotopic translation of the auditory pattern of excitation [11]. Identification experiments with synthetic two-formant vowels revealed that a uniform tonotopic compression of the auditory pattern of excitation, with a fixed point in the region of F3, also preserves phonetic quality [12]. Natural vowels are transformed in this way in shouting and in whispering [11].
The present investigation concerns the transformations to which the spectra of voiceless fricatives can be subjected without affecting their phonetic quality. It is known that voiceless fricatives can be synthesized satisfactorily with two resonances and one antiresonance. It is also known that the cues to the phonetic identity of voiceless sibilants reside mainly in the stationary part of their spectrum, while the transitions are more important for non-sibilants [5, 7]. One-parameter sibilants can be synthesized using a resonance and an antiresonance one octave lower in frequency [5]. Such sibilants lack intrinsic cues to speaker size. In spectrogram reading, the Swedish voiceless sibilants can be distinguished by the frequency of spectral energy onset, while the detail above that frequency varies more, even within the same speaker and context [6]. A second characteristic spectral peak can, however, often be discerned. One question we address here is whether this second peak is used to normalize for speaker size. We also investigate to what extent a vocalic context can serve this purpose.
METHODS
Subjects
The experiments were conducted with a group of 20 native and 6 non-native speakers of Swedish, all employees or students at the Institute of Linguistics at Stockholm University. None of them reported auditory handicaps. All were familiar with the phonetics of Swedish, which has /f/, /s/, / /, and /ʃ/. We report here the results of 16 native speakers with uniform behavior. Most of them speak the local variety, with the distributional allophones [ ] and [ ] for /ʃ/. The group also includes three speakers of southern varieties, who had no [ ] in their own speech.
Stimuli
The stimuli were synthetic VCV sequences. The vocalic segments had been obtained by synthetic imitation of a natural [aˈsɛː] produced by a male speaker of Swedish (Stockholm variety). A three-parameter voice source [3] signal matching that utterance was generated by the procedure described in [12]. Both the vocalic and the fricative segments were generated in serial synthesis with a block-diagram simulation program (sampling at 16 kHz, 16 bit/sample). Eight vowel formants were used. Their bandwidths obeyed the standard relation $B_i = 0.05\,F_i + 50$ Hz.
The fricatives were generated by feeding white noise through a high-pass and a low-pass resonance filter, both of second order and with Q = 10. The two resonance frequencies Fl and Fh were varied in steps of a factor of $4^{1/9}$ (approx. 1.0 Bark). 42 combinations of Fl and Fh were used to scan the auditory space, as shown in Figure 1. The fricatives had a duration of 0.20 s, and the intensity onset and offset of the natural [s] were also imitated.
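The noise-excitation scheme just described can be sketched as follows. The sketch assumes standard second-order (biquad) resonant high-pass and low-pass sections built with the widely used "Audio EQ Cookbook" formulas. The paper does not specify its filter design, and assigning Fl to the high-pass section and Fh to the low-pass section is also an assumption. With Q = 10, each section has a gain of Q (20 dB) at its resonance frequency, which produces the two spectral peaks. The function names and example frequencies are hypothetical, and the onset/offset envelope is not modelled.

```python
# Sketch: white noise through a resonant high-pass (at Fl) and low-pass (at Fh).
import math
import random

FS = 16000  # sampling rate used in the paper (Hz)


def biquad(kind, f0, q, fs=FS):
    """Normalised (b, a) coefficients of a second-order resonant filter
    in the Audio EQ Cookbook form, with a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    cw, alpha = math.cos(w0), math.sin(w0) / (2.0 * q)
    if kind == "lowpass":
        b = [(1 - cw) / 2, 1 - cw, (1 - cw) / 2]
    elif kind == "highpass":
        b = [(1 + cw) / 2, -(1 + cw), (1 + cw) / 2]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a0 = 1.0 + alpha
    return [x / a0 for x in b], [1.0, -2.0 * cw / a0, (1.0 - alpha) / a0]


def run_filter(b, a, x):
    """Difference equation y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2]
    - a1 y[n-1] - a2 y[n-2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def gain(b, a, f, fs=FS):
    """Magnitude response |H(e^{jw})| at frequency f (Hz)."""
    w = 2.0 * math.pi * f / fs
    z1 = complex(math.cos(w), -math.sin(w))  # e^{-jw}
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return abs(num / den)


def synth_fricative(fl, fh, dur=0.20, q=10.0, seed=0):
    """Noise burst with spectral peaks near fl and fh (no onset/offset shaping)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(round(dur * FS))]
    high_passed = run_filter(*biquad("highpass", fl, q), noise)
    return run_filter(*biquad("lowpass", fh, q), high_passed)


if __name__ == "__main__":
    sig = synth_fricative(2000.0, 4000.0)  # hypothetical Fl = 2 kHz, Fh = 4 kHz
    print(len(sig), gain(*biquad("lowpass", 4000.0, 10.0), 4000.0))
```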
A second version of the vowel context was obtained by a uniform translation of all vowel formant frequencies by +2.5 Bark. The voice source parameters were rescaled so that the mean F0, weighted according to amplitude, was also translated by +2.5 Bark. This transformation turns the characteristic frequencies of vowels pronounced by men into those of the same vowels pronounced by children four to five years of age [11].
A third version of the vowel context was obtained by a uniform tonotopic compression of all formant frequencies and the weighted mean F0. The compression is described by Equation [1]:

$$ z = z_0 + 0.15\,(15.5 - z_0) \qquad [1] $$

where $z_0$ is the critical-band rate of a characteristic peak in the original version and $z$ is the corresponding value in the compressed version. This transformation turns the characteristic frequencies of the original vowels into those of shouted vowels [11]. There are, however, additional differences between these modes of speech that were not imitated in our stimuli. As a result, the stimuli gave the impression of having been produced by an aroused speaker rather than a shouting one.
Vowel formant frequencies f (in Hz) were converted into critical-band rate z (in Bark) with Equation [2], which agrees to within ±0.05 Bark with the empirical values [13] in the range 0.2 to 6.7 kHz [10]. Equation [3] was used for reconversion.

$$ z = \frac{26.81\,f}{1960 + f} - 0.53 \qquad [2] $$

$$ f = \frac{1960\,(z + 0.53)}{26.28 - z} \qquad [3] $$

The formants, which were stationary, had the frequencies listed in Table 1, together with the weighted mean F0.
Table 1: The characteristic frequencies of the three versions of the same vowels (in Hz).
    Neutral male    Neutral child    Aroused male
      [a]   [ɛː]       [a]   [ɛː]       [a]   [ɛː]
F0    102    110       327    337       298    306
F1    751    442      1153    751       945    639
F2   1248   1799      1626   2617      1421   1932
F3   2501   2390      3702   3525      2558   2461
F4   3359   3413      5160   5258      3287   3332
F5   4311   4386      6977   7131      4052   4111
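Equations [1] to [3] and the +2.5 Bark translation can be checked against Table 1. The sketch below converts each neutral-male value to Bark and applies either the translation or the compression toward 15.5 Bark. It then converts back and reports entries that differ from the printed table by more than 3 Hz. The function names are illustrative. For all but one entry, the result agrees with Table 1 to within about a hertz. The exception is the child-version F2 of [a], printed as 1626 Hz, where the translation predicts roughly 1826 Hz, so that printed value may be a transcription error. The formant bandwidth rule B_i = 0.05 F_i + 50 Hz is included for completeness.

```python
# Sketch: Bark conversion (Eqs. [2], [3]) and the two context transforms,
# checked against Table 1. Function and variable names are illustrative.

def hz_to_bark(f):
    """Equation [2]."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z):
    """Equation [3], the inverse of Equation [2]."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def translate(f, dz=2.5):
    """Child version: shift a characteristic frequency by dz Bark."""
    return bark_to_hz(hz_to_bark(f) + dz)


def compress(f, fixed=15.5, factor=0.15):
    """Aroused version, Equation [1]: z = z0 + 0.15 * (15.5 - z0)."""
    z0 = hz_to_bark(f)
    return bark_to_hz(z0 + factor * (fixed - z0))


def bandwidth(f):
    """Formant bandwidth rule B = 0.05 F + 50 Hz."""
    return 0.05 * f + 50.0


ROWS = ["F0", "F1", "F2", "F3", "F4", "F5"]
TABLE_1 = {  # version -> vowel -> values for F0..F5 (Hz), as printed
    "male": {"a": [102, 751, 1248, 2501, 3359, 4311],
             "ɛː": [110, 442, 1799, 2390, 3413, 4386]},
    "child": {"a": [327, 1153, 1626, 3702, 5160, 6977],
              "ɛː": [337, 751, 2617, 3525, 5258, 7131]},
    "aroused": {"a": [298, 945, 1421, 2558, 3287, 4052],
                "ɛː": [306, 639, 1932, 2461, 3332, 4111]},
}


def mismatches(tol=3.0):
    """(version, row, vowel) entries where prediction and table differ by > tol Hz."""
    out = []
    for version, transform in (("child", translate), ("aroused", compress)):
        for vowel, male_vals in TABLE_1["male"].items():
            for row, f, printed in zip(ROWS, male_vals, TABLE_1[version][vowel]):
                if abs(transform(f) - printed) > tol:
                    out.append((version, row, vowel))
    return out


if __name__ == "__main__":
    print(mismatches())
    print(round(translate(1248.0)), round(bandwidth(751.0), 2))
```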
After D/A conversion, the stimuli were recorded on tape in two different orders. In the first order, Fl and Fh started at their highest values, 24 and 25 logarithmic units (u.). Fl then decreased in steps of 2 u. and Fh in steps of 1 u. until the distance between the two peaks reached 7 u. In the next descending series, Fl and Fh started 1 u. below the previous initial values, and so on. In the second order, Fl and Fh started at their lowest values, 7 and 14 u., and ascended in the reverse of the first order. Each stimulus had a duration of 0.8 s and was presented twice in succession with an interval of 1.5 s. In the following, any such pair is considered as one "stimulus". Each stimulus was followed by a pause of 2.5 s for the subjects to respond. A pause of 5 s was inserted before each new series of stimuli. The stimuli were presented in six blocks: neutral male in the first order (1), followed by child (2), aroused male (1), neutral male (2), child (1), and aroused male (2).
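The presentation order described above can be generated mechanically. The sketch below uses six descending series, which is what the stated 42 combinations and the lowest values of 7 and 14 u. imply; this number is inferred rather than stated. The pairs are left in log units because the text gives the step factor $4^{1/9}$ but not the frequency corresponding to a given unit. Function names are hypothetical.

```python
# Sketch: the presentation order of (Fl, Fh) pairs in log units (u.),
# following the description in the text. Names are illustrative.

def first_order(start_fl=24, start_fh=25, n_series=6, max_gap=7):
    """Descending series: Fl drops 2 u. and Fh 1 u. per step until Fh - Fl
    reaches max_gap; each new series starts 1 u. below the previous one."""
    pairs = []
    for s in range(n_series):
        fl, fh = start_fl - s, start_fh - s
        while True:
            pairs.append((fl, fh))
            if fh - fl >= max_gap:
                break
            fl, fh = fl - 2, fh - 1
    return pairs


def second_order():
    """Ascending order: the first order reversed."""
    return list(reversed(first_order()))


if __name__ == "__main__":
    order1 = first_order()
    print(len(order1), order1[:7])
    print(second_order()[:3])
```

Each series runs from a peak distance of 1 u. to 7 u. in seven steps, so six series give the 42 combinations plotted in Figure 1, ending at (7, 14).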
Procedure
The subjects were tested in a quiet, sound-treated room. The stimuli were presented via Sennheiser HD414 headphones at a comfortable listening level. The subjects received answer sheets with a set of the five symbols "θ, s, tj, rs, sj" for each stimulus. The meaning of the symbols ([θ] or [f], [s], [ɕ], [ʂ], [ɧ]) was first explained, and a few stimuli were presented for acquaintance. The subjects were then asked to mark, for each stimulus, the symbol of the fricative they had heard. They were allowed to mark two different symbols in cases of doubt. Single-symbol responses were counted as two markings of the same symbol.
Two-dimensional histograms were obtained from the distribution of assigned labels as a function of the Fl and Fh values. The histograms were locally normalized with respect to the total number of responses to each stimulus and smoothed by a spatial cosine filter. "Phonetic boundaries", for example between [s] and [ɕ], were obtained by considering only the [s] and [ɕ] labels and computing the 50 % level curve.
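The scoring steps just described can be sketched for a single row of stimuli. In this sketch, a single-symbol response counts as two markings. The share of one label among the markings of a label pair is then computed for each stimulus, and the 50 % point is located by linear interpolation. The spatial cosine smoothing and the two-dimensional level-curve computation are omitted. The names, the label strings ("s" for [s], "tj" for [ɕ]) and the sample responses are hypothetical.

```python
# Sketch of response scoring and a 50 % boundary estimate (hypothetical names).
from collections import Counter


def tally(responses):
    """Markings for one stimulus. Each response is a tuple of one or two labels;
    a single-symbol response counts as two markings of that label."""
    counts = Counter()
    for marks in responses:
        if len(marks) == 1:
            counts[marks[0]] += 2
        else:
            counts.update(marks)
    return counts


def proportion(counts, a, b):
    """Share of label a among markings of a or b only; None if neither occurs."""
    total = counts[a] + counts[b]
    return counts[a] / total if total else None


def crossing_50(xs, ps):
    """First position where the proportion curve crosses 0.5 (linear interpolation)."""
    pts = [(x, p) for x, p in zip(xs, ps) if p is not None]
    for (x0, p0), (x1, p1) in zip(pts, pts[1:]):
        if p0 != p1 and (p0 - 0.5) * (p1 - 0.5) <= 0:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


if __name__ == "__main__":
    # Hypothetical responses for four stimuli with the same Fh and rising Fl (u.)
    fl_values = [10, 11, 12, 13]
    stimuli = [
        [("tj",), ("tj",)],
        [("tj",), ("s", "tj")],
        [("s",), ("s", "tj")],
        [("s",), ("s",)],
    ]
    ps = [proportion(tally(r), "s", "tj") for r in stimuli]
    print(ps, crossing_50(fl_values, ps))
```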
RESULTS AND DISCUSSION
Effects of presentation order
"θ"-labels were infrequent. They were mainly assigned to stimuli with the highest resonance frequencies and, occasionally, to those with the very lowest. The boundaries between the sibilants are shown in Figure 1. The effect of contrast can clearly be seen at the [ ] - [ ] boundary, which is shifted by 0.9 Bark in Fl between the two orders of presentation. Since contrast presupposes that at least one similar stimulus has been heard, there is no such effect at the beginning of each series (shown with thin lines in Figure 1). There, the responses are instead likely to be biased by expectation towards [s] or [ɧ], because the previous series of stimuli began with these sounds. Outside this region, the [s] - [ ] boundary is shifted just as much as the [ ] - [ ] boundary. As for the boundary between [ɧ] and [ ], the responses are likely to be biased towards [ ], because this allophone would normally occur in an /aʃɛː/ sequence as pronounced by most of our subjects. This would explain the deviant course of this boundary in the second order of presentation.
Effects of intrinsic properties
The perceptual role of the two spectral peaks in our stimuli can be understood by studying the slopes of the boundaries in Figure 1. The boundaries whose slope is not affected by order effects are well approximated by straight lines. Two of them ([ ] - [s] and [ ] - [ɧ]) run almost perpendicular to the Fl axis, implying that the higher resonance Fh is practically irrelevant for these distinctions. It follows that the distance between the spectral peaks is also irrelevant. Thus, intrinsic properties of these stimuli were not used to normalize for speaker size.
Phonetic boundaries might possibly be determined by a gross center of spectral gravity, such as perceived "sharpness" [1]. Since Fh does affect the sharpness of our stimuli, as informal listening confirmed, the results show that sharpness is not an invariant quantity in sibilants with a given phonetic quality.
If the resonances are separated by less than the critical distance of 3.5 Bark observed by Chistovich et al. [2], the phonetic boundaries might be expected to reflect an integrated spectral peak. The main part of our [ ] - [s] boundary runs through an area where z_h − z_l < 3.5 Bark (see Figure 1). The slope of this line indicates, however, that this phonetic decision is based only on the pitch of the lower spectral peak or on the spectral onset of auditory excitation. Similar results have been obtained in non-phonetic pitch-matching tasks [4, 9] for frequencies below 1 kHz.
Figure 1: Phonetic boundaries between Swedish sibilants (plot not reproduced). First (continuous) and second (dashed) order of presentation. Pooled contexts.
The boundaries between [ ] and [ ] are, however, not completely independent of Fh. This may be because [ ] and [ ] are the sibilants for which our synthetic stimuli were closest to the natural versions, as judged by comparison with measured spectra of Swedish sibilants [9, 8]. The other phonetic boundaries might have followed a similar course if the stimuli had been closer imitations of natural sibilants.
The phonetic boundaries can be described by Equation [4]:<br />
I· 1.<br />
[4 ],<br />
where ki is a factor expressing the perceptual weight of Zhi' see Table<br />
2, and Ii is a constant characteristic of boundary i. The factor k might<br />
reflect the goodness of fit between the auditory spectra of the synthetic<br />
stimuli and those of natural sibilants, but it might, alternatively,<br />
be a function of (Zh - Zl)' In that case the phonetic boundaries in<br />
Figures 1 and 2 should deviate slightly from linearity. Interestingly, k<br />
is most negative for (Zh - Zl) 3.5 Bark. This reminds of the suggestion<br />
by Syrdal et al. [8] to regard this distance as specific of phoneme<br />
boundaries among sonorants. While our data do not immediately support<br />
this for sibilants - the observed boundaries are not perpendicular to<br />
the (Zh - Zl)-axis - they do show a tendency in this direction.<br />
Table 2: Perceptual weight k of Fh in relation to that of Fl, cf. Equation [4].
Phonetic boundary:   ...     ...     ...     ...
k:                 -0.05   -0.20   -0.27   -0.10
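To make the role of k concrete, the sketch below assumes that Equation [4] has the linear form Zl + k·Zh = I, with Zl and Zh in Bark. Solving for Zl gives a boundary of slope -k in the (Zh, Zl) plane: k = 0 gives a boundary perpendicular to the Zl axis (Fh irrelevant), and k = -1 keeps Zh - Zl constant along the boundary. The k values are those of Table 2; any intercept I passed to the functions is hypothetical, since the paper does not list the constants.

```python
K_TABLE_2 = (-0.05, -0.20, -0.27, -0.10)  # perceptual weights of Fh from Table 2


def boundary_zl(zh: float, k: float, i_const: float) -> float:
    """Zl (Bark) on the assumed boundary Zl + k*Zh = I for a given Zh (Bark)."""
    return i_const - k * zh


def side_of_boundary(zl: float, zh: float, k: float, i_const: float) -> int:
    """+1 on the high-Zl side of the boundary, -1 on the low-Zl side, 0 exactly on it."""
    d = zl + k * zh - i_const
    return (d > 0) - (d < 0)


for k in K_TABLE_2:
    # dZl/dZh = -k: 0 means perpendicular to the Zl axis, 1 means constant Zh - Zl
    print(f"k = {k:+.2f}: boundary slope dZl/dZh = {-k:.2f}")
```

Under this reading, even the most negative weight in Table 2 (-0.27) tilts a boundary only about a quarter of the way from "Fh irrelevant" towards "constant Zh - Zl".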
Effects of context
Since intrinsic normalization for speaker size is almost absent in our results, we would expect such a normalization, which theoretically would be appropriate, to be mediated by context. Figure 2 illustrates the effects of transforming the spectrum of the vowel context. We can see that the boundaries between sibilants are affected by the acoustic properties of the vowel context, whose phonetic quality was close to invariant.
The extent of the boundary shift between the neutral male and the child version of the vowels (between +0.7 and +1.3 Bark) is, however, smaller than the translation of the vowel spectra (+2.5 Bark), especially at the [] - [6] boundary.
The boundaries in the aroused male version are shifted from those in the neutral version about halfway in the same direction as those in the child version. The [] - [6] boundary (at 11.6 Bark = 1.6 kHz) is shifted by roughly +0.3 Bark, i.e., less than the vowel formants in the same frequency region (+0.6 Bark). Since, further, the upper vowel formants (above 15.5 Bark = 2.9 kHz) in the aroused male version are not shifted upwards but slightly downwards, the shift of the [s] - [9] boundary (at Zl = 19 Bark) cannot have been guided by the vowel formants in the same frequency region. Apparently, the sibilant boundaries are shifted about half as much as some weighted mean of the vowel formants, with F2 given the highest weight. This would hold approximately for both of our context transformations, but the correlation of the extent of boundary shift with Fl remains an open question.
Figure 2: Phonetic boundaries between sibilants in contexts of a man's (continuous), a child's (dashed), and an aroused man's (dash-dotted) vowels. Pooled orders of presentation.
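The "about half" estimate can be checked against the shifts quoted above. The snippet only divides each reported boundary shift by the corresponding shift of the vowel context (all values in Bark, taken from the text); it does not model the weighted mean of the vowel formants, whose weights the paper does not give.

```python
def shift_ratio(boundary_shift_bark: float, context_shift_bark: float) -> float:
    """Fraction of the vowel-context shift that reappears as a sibilant boundary shift."""
    if context_shift_bark == 0:
        raise ValueError("context shift must be non-zero")
    return boundary_shift_bark / context_shift_bark


# Child vowels: boundaries moved +0.7 to +1.3 Bark for a +2.5 Bark spectral translation
child = [shift_ratio(s, 2.5) for s in (0.7, 1.3)]
# Aroused male: one boundary moved about +0.3 Bark where nearby vowel formants moved +0.6 Bark
aroused = shift_ratio(0.3, 0.6)
print(f"child: {child[0]:.2f}-{child[1]:.2f}, aroused: {aroused:.2f}")
```

The ratios come out at roughly 0.28 to 0.52 for the child context and 0.5 for the aroused one.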
ACKNOWLEDGEMENT<br />
This research has been supported by a grant from HSFR, the Council<br />
for Research in the Humanities and Social<br />
Sciences.<br />
REFERENCES
[1] G. v. Bismarck, Extraktion und Messung von Merkmalen der Klangfarbenwahrnehmung stationärer Schalle, München 1972.
[2] L. Chistovich and V. Lublinskaya, "The 'center of gravity' effect in vowel spectra and the critical distance between formants", Hearing Res. 1, 1981, 185-195.
[3] G. Fant, "Glottal source and excitation analysis", STL-QPSR 1/1979, 85-107.
[4] R. Glave, Untersuchungen zur Tonhöhenwahrnehmung stochastischer Schallsignale, Helmut Buske Verlag, Hamburg 1973.
[5] J. M. Heinz and K. Stevens, "On the properties of voiceless fricative consonants", J. Acoust. Soc. Am. 33, 1961, 589-596.
[6] P. Lindblad, Svenskans sje- och tje-ljud i ett allmänfonetiskt perspektiv, CWK Gleerup, Lund 1980.
[7] J. Martony, "On the synthesis and perception of voiceless fricatives", STL-QPSR 1/1962, 17-22.
[8] A. K. Syrdal and H. S. Gopal, "A perceptual model of vowel recognition", J. Acoust. Soc. Am. 79, 1986, 1086-1110.
[9] H. Traunmüller, "Perception of timbre: ...", in R. Carlson and B. Granström (eds.), The Representation of Speech in the Peripheral Auditory System, Elsevier Biomedical, 1982, pp. 103-108.
[10] H. Traunmüller, "Analytical expressions for the tonotopical sensory scale", part of Ph.D. thesis, Stockholms universitet, 1983.
[11] H. Traunmüller, "Some aspects of the sound of speech sounds", contribution to NATO-ARW on psychophysics of speech perception, Utrecht 1986.
[12] H. Traunmüller and F. Lacerda, "Perceptual relativity in identification of two-formant vowels", Speech Communication, 1987 (in print).
[13] E. Zwicker, "Zur Unterteilung des hörbaren Frequenzbereiches in Frequenzgruppen", Acustica 10, 1960, p. 185.
SECOND FORMANT LOCUS PATTERNS AS A MEASURE OF<br />
CONSONANT-VOWEL COARTICULATION
Diana Krull<br />
1. Introduction<br />
Formant frequencies at the consonant-vowel boundary depend not only on the place of articulation of the consonant but also on the adjacent vowel. Fant (1973) measured F2, F3 and F4 at stop consonant-vowel boundaries of one male Swedish speaker. His results showed considerable variation of formant frequencies at CV boundaries, especially in connection with voiced stops; also, labials and velars showed greater variation than dentals. The variation is most pronounced for F2: although the difference measured in Hz is sometimes larger for F3, it amounts to less on a perceptual scale.
Öhman (1966) used voiced stops between systematically varied preceding and following vowels and demonstrated that F2 at the CV boundary is also influenced by the preceding vowel. Both studies have shown a strong coarticulation effect from adjacent vowels on F2 at the CV boundary, thus contradicting the claim of an invariant F2 locus made by Delattre, Liberman, and Cooper (1955).
Coarticulation does not work in one direction only; there is also an influence from the consonant on the vowel. For example, F2 measured in the middle of the vowel is lower in /bub/ than in /dud/, everything else being equal (see Lindblom, 1963).
The aim of this investigation was to compare the amount of coarticulation in spontaneous speech and in isolated words on the basis of the second formant trajectories. The difference in F2 between two points, one at the CV boundary and another in the middle of the vowel, should decrease with increasing coarticulation because the adjacent sounds would become more alike, although we would not be able to tell whether it was the vowel that had influenced the consonant or vice versa. F2 was measured at two points on a Voiceprint spectrogram as shown in Fig. 1: (1) at the CV boundary, and (2) in the middle of the vowel. We called the first point the "locus" of the second formant and defined it as the frequency of the formant at the first pulse of the vowel after consonant release. The locus was not measured at the moment of consonant release because in the spontaneous speech there was often no visible burst. (Measurements at the release may also be difficult due to the rapid transition.) The second point, measured in the middle of the vowel, we called the "target". Both terms were used in a more concrete sense than had been done earlier: Delattre et al. (1955) had defined "locus" as a point on the frequency scale about 50 ms before consonant release, which they considered to be the virtual starting point of the formant; Lindblom (1963) had used "target" in the sense of an asymptotic value towards which the formant frequency is aimed.
Fig. 1 Example of measurements on a spectrogram: F2i at the first pulse of the vowel after stop release, and F2t in the middle of the vowel.
The relation between the two points can be expressed in what we call the "locus equation"
F2i = k * F2t + c<br />
where F2i is the initial locus, F2t the vowel target, and k and c<br />
are constants.<br />
The value of k determines the slope of the regression line for the locus frequencies (see Fig. 2). The slope shows the amount of coarticulation: for example, if k = 0 then F2i = c and there is no coarticulation at all; the locus is invariant. If, on the other hand, k = 1, then the locus is completely dependent on the vowel target, and there is maximal coarticulation. Other studies (Lindblom, 1963; Lindblom and Lacerda, 1985) have shown that the amount of coarticulation varies with consonant place of articulation: the strongest coarticulation is connected with the labials, the weakest with the palatal /g/, while the dentals and the retroflexes lie somewhere in between.
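The locus equation is a straight line fitted through the (F2t, F2i) points of one consonant. A minimal sketch, assuming ordinary least squares with the vowel target F2t as predictor (the text does not name the fitting procedure); the function name and the sample values are illustrative.

```python
from typing import Sequence, Tuple


def fit_locus_equation(f2_target: Sequence[float], f2_locus: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of F2i = k * F2t + c. Returns (k, c)."""
    n = len(f2_target)
    if n != len(f2_locus) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(f2_target) / n
    mean_i = sum(f2_locus) / n
    s_tt = sum((t - mean_t) ** 2 for t in f2_target)
    if s_tt == 0:
        raise ValueError("target frequencies must vary")
    s_ti = sum((t - mean_t) * (i - mean_i) for t, i in zip(f2_target, f2_locus))
    k = s_ti / s_tt
    c = mean_i - k * mean_t
    return k, c


# Illustrative points (kHz) lying exactly on F2i = .45 * F2t + .81
targets = [1.0, 1.5, 2.0, 2.5]
loci = [0.45 * t + 0.81 for t in targets]
k, c = fit_locus_equation(targets, loci)
# k near 0: invariant locus; k near 1: locus follows the vowel target
print(f"k = {k:.2f}, c = {c:.2f}")
```

This prints k = 0.45, c = 0.81. Plain least squares treats the target measurements as error-free, which is the usual simplification for locus equations.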
Fig. 2 (locus-target plot; vertical axis: F2 LOCUS, kHz)
How does speech style affect the amount of coarticulation? Lindblom and Lindgren (1985) investigated CV coarticulation for /bV/ and /dV/ by comparing the size of F2 trajectories between locus and target. Their results showed more coarticulation in a neutral speech style than in clear speech.
Would the same kind of difference be found between words occurring in spontaneous speech and the same words spoken in isolation? That is, would words occurring in spontaneous speech in general display more CV coarticulation than words spoken in isolation?
2. Experiment<br />
The spontaneous speech material used in this investigation consisted of recordings made for the project "Phonetic variation in natural speech" (Lindgren, Lindblom, and Krull, 1986). The recordings were made of two male speakers of Central Swedish. Spontaneous speech was elicited in two ways: firstly, the speakers were asked to retell short stories they had been given to read beforehand; secondly, they conversed freely with each other. The recordings were made in a quiet room at the Phonetics Laboratory of the University of Stockholm.
Only word-initial CV combinations were used for measurements in this study. The first CV combinations examined consisted of a voiced stop followed by a vowel. Only labial and dental stops were used: /g/ before front vowels is, with few exceptions, pronounced as [j], and the velar samples before back vowels showed too little variation in F2 for meaningful locus equations to be set up. Stops that did not have a complete closure were not used.
For each speaker a list was prepared containing the words in<br />
his<br />
spontaneous speech sample that had been measured. The speakers<br />
were then asked to read the list with a short pause between<br />
items. The words were in random order without context, each<br />
occurring twice. The second occurrence was meant to be used in<br />
case the first reading of the word should present difficulties of<br />
measurement. Only one item was measured. We shall refer to the<br />
words read in isolation as "reference words" (see word list in<br />
Appendix).<br />
The measurements of locus and target frequencies were carried out as shown in Fig. 1. The resulting locus-target plots for dentals can be seen in Fig. 3. For speaker PaT there was clearly more coarticulation in spontaneous speech, where k = .45, while the corresponding value for the reference words was k = .25. For speaker AV there was less of a difference: k = .47 in spontaneous speech and k = .43 for reference words.
A low F2 in a preceding vowel usually lowered the frequency of the initial locus slightly, while a high F2 had the opposite effect. To check the amount of influence from preceding vowels, the slope of the regression line for speaker AV was calculated for cases where /d/ was preceded by another dental consonant. The result showed very little difference in slope (k = .49 as compared to k = .47), but there was much less variation in locus frequencies at a given target value.
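The observation that restricting the preceding context left the slope nearly unchanged but reduced the spread of loci can be quantified with the standard error of estimate, i.e. the root-mean-square residual around the locus equation with n - 2 degrees of freedom. The sketch below is self-contained; the paper does not report this statistic, and the data are made up to show two sets with the same slope but different scatter.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    k = sxy / sxx
    return k, my - k * mx


def locus_scatter(f2_target: Sequence[float], f2_locus: Sequence[float]) -> float:
    """Standard error of estimate of the locus equation (same units as F2)."""
    n = len(f2_target)
    if n != len(f2_locus) or n < 3:
        raise ValueError("need at least three paired measurements")
    k, c = fit_line(f2_target, f2_locus)
    ss_res = sum((i - (k * t + c)) ** 2 for t, i in zip(f2_target, f2_locus))
    return math.sqrt(ss_res / (n - 2))


# Hypothetical data (kHz): both sets give k = 0.8, but the second scatters less
targets = [1.0, 2.0, 3.0, 4.0]
wide = [1.0, 3.0, 2.0, 4.0]
narrow = [1.2, 2.4, 2.6, 3.8]
for label, loci in (("all contexts", wide), ("dental context", narrow)):
    k, c = fit_line(targets, loci)
    print(f"{label}: k = {k:.2f}, scatter = {locus_scatter(targets, loci):.3f}")
```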
The second stop consonant investigated was /b/. Earlier results referred to above, and the fact that the tongue is not involved in the production of labials, led us to expect more coarticulation with labial consonants, and such was also the case here. There was also more coarticulation in spontaneous speech: for both speakers, the regression line had a smaller slope in the case of reference words (Fig. 4). In the spontaneous speech of speaker PaT the value of k = .96 indicated an almost maximal dependency on the following vowel for most of the cases. However, on the spontaneous speech plots of both speakers, there were locus-target points that formed their own group apart from the other points, their locus frequency being lower than could be expected from the target value. These points, marked with "x" on the plots, all had their origin in the same word: bara [bɑːra] 'only', which had undergone different degrees of reduction, the most extreme case being [ba]. (It should be noted that bara is a word that tends to be reduced more than other content words.)
Fig. 3 Locus-target plots for dentals, speaker PaT. Left: spontaneous speech, F2LOC = .45X + .81. Right: reference words, F2LOC = .25X + 1.13. Vertical axis: F2 LOCUS (kHz).
3. Discussion<br />
The results show more overlap between C and V in spontaneous speech. Why should this be so? One possible explanation may lie in the time factor: no systematic comparisons of word durations have been carried out yet, but random samples all showed compression of word length. The word bara, for example, showed great length variation in the spontaneous speech of both speakers, but even the longest item of bara in spontaneous speech was only 45% of the duration of the reference version for speaker PaT, and 60% for speaker AV; the shortest item for both speakers was only 13% of the reference version.
A shortening in duration can affect the relative timing, and thus the coarticulation, of adjacent segmental gestures, although some speakers seem to be able to avoid this effect by speeding up their articulatory movements (Kuehn and Moll, 1976; Gay, 1981). An increase of coarticulation with a faster speaking rate has also been shown for Swedish by Engstrand and Nordstrand (1983) through measurements of initial and medial formant frequencies in vowels (corresponding to locus and target in this investigation). Moreover, Engstrand (forthcoming) measured utterances of two Central Swedish speakers, especially /pi pu pa/, on x-ray film. He found relatively little coarticulation when speech rate and stress were normal; at a fast speech rate coarticulation increased, especially in stressed syllables.
Fig. 4 Locus-target plots for labials, speaker PaT. Left: spontaneous speech, F2LOC = .96X + .03. Right: reference words, F2LOC = .81X + .21. Vertical axis: F2 LOCUS (kHz).
It is also possible that a dimension of more or less clear pronunciation, "hypo" and "hyper" speech, has influenced the coarticulation in our experiment. This dimension can be independent of speech rate, depending instead on, for example, socially or communicatively determined factors (Lindblom and Lindgren, 1985).
A special explanation is necessary for the locus-target relation in bara: why is there not the same amount of coarticulation here as in the rest of the words beginning with /bV/, or even those beginning with /ba/? In Swedish the phoneme /a/ has two phonologically distinct lengths. The length distinction is accompanied by a difference in timbre whose main acoustic correlate is the frequency of the second formant: the long variant has an F2 of about 1000 Hz for male speakers, while the corresponding frequency in the short variant is about 1250 Hz. Differences in timbre between short and long vowels can be perceptually relevant even if the length distinction is removed (Hadding-Koch and Abramson, 1964). Thus when, depending on e.g. the speech tempo, the lengths of a long and a short /a/ overlap, their timbre can still make them clearly perceivable as distinct sounds.
In the word bara the anomaly on the locus plot for spontaneous speech lay in the fact that the target value of the second formant in the reduced version had risen to about 1200-1300 Hz, while the locus frequency had retained its value appropriate for the long variant of /a/ (in words with an originally short version of /a/, the locus lay at 1200-1250 Hz). To begin with, we looked for an explanation of the anomaly in the phonetic features of the adjacent sounds. However, an investigation showed that this explanation was insufficient: for speaker AV, for example, bara was in eight of the nine cases preceded by /a/, /e/ or a dental consonant, and in all but two cases followed by segments that could not have raised F2: labial consonants, back vowels, or a pause. For speaker PaT the word was in four cases preceded by /e/, in one by /a/ and in two by dental consonants. In his case, however, the word was followed by consonants that may have raised F2, but in such cases the locus would also have been expected to rise (see word lists in Appendix). The same was true if the second formant was raised by the /r/, of which a trace was sometimes present. Some examples of bara are shown in Fig. 5, together with an example of the word babbel, which originally has a short variant of /a/.
It can be argued that there is no reason for us to expect the locus as a function of the target to form a straight line. There is, however, reason to expect a straight line on theoretical grounds. We could think of the lip opening after /b/ release in terms of lip rounding. The effect of rounding has been calculated by Fant (1960) for articulatory configurations with constrictions situated from the lip opening to the glottis. If we choose constriction locations from about 5.5 cm to 13 cm from the ..., that is, corresponding to vowels with palatal to pharyngeal constrictions, and choose two curves within this section, one for an unrounded vowel and another with lip rounding, we get two curves with a slowly diminishing distance between them, the curve with the rounded values lower in frequency. When the lips open after a labial consonant they are rounded; we can therefore think of the rounded articulation as the locus and the unrounded one as the target. A locus-target plot in this case would form an almost straight line.
It therefore seems that in the cases where [bɑːra] was reduced to [ba], the speakers pronounced an [a] but that subliminally there was still an [ɑː] present. This raises a question that calls for further investigation: is the change from [bɑːra] to [ba] an example of a phonological process, and the change by definition discrete, or is it a phonetic transformation that can therefore be continuous? The present data seem to indicate that the change is phonetic: firstly, because the speakers seem to begin to say one sound and continue with another, but also because the disappearance of /r/ does not happen in one step but is gradual.
Fig. 6 The reduced form [va] from the words A: var; B: vad; C, D: as a tag question.
Preliminary investigations indicate that the same reduction effect as in bara also appears in words like vara 'be', var 'was', and vad 'what'. In these cases, the consonant is normally deleted in fluent speech and the words are pronounced as [vɑː] or reduced to [va]. As was the case with bara, the second formant in the reduced form [va] began lower than in words that have a short [a] to begin with. There was an exception, however: if the [va] appeared as a tag question at the end of a sentence, its F2 locus began at the frequency of the target, as in words with an original [a], indicating that [va] (from ...) in this function has probably been lexicalized with the short variant of /a/ (Fig. 6). In the case of /v/ the coarticulation with the following vowel was even stronger than for bilabial stops; therefore no anomaly appeared on the locus-target plot as it did for labial stops. However, the loci for the reduced form [va] that were not tag questions always lay below the regression line, while the loci for [va] used as a tag lay on or above the line.
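Whether a token's locus lies below, on, or above the regression line is simply the sign of its residual. A minimal sketch; the tolerance and the line parameters in the example are hypothetical, since the text gives no regression line for /v/.

```python
def residual_side(f2_target: float, f2_locus: float, k: float, c: float, tol: float = 0.0) -> str:
    """Classify a locus relative to the locus equation F2i = k * F2t + c."""
    residual = f2_locus - (k * f2_target + c)
    if residual < -tol:
        return "below"
    if residual > tol:
        return "above"
    return "on"


# Hypothetical regression line and tokens (kHz); tol absorbs measurement uncertainty
K, C = 0.75, 0.25
tokens = {"reduced [va]": (1.25, 1.0), "[va] as tag": (1.25, 1.25)}
for label, (t, i) in tokens.items():
    print(label, residual_side(t, i, K, C, tol=0.02))
```

A non-zero tolerance keeps points that differ from the line only by measurement error from being counted as "below" or "above".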
Our preliminary investigations have also shown that differences in the amount of coarticulation between spontaneous speech and reference words, similar to those found for stop consonants, exist for other dental and labial consonants as well. In the case of nasals the difference may be even greater.
The research reported here has been supported by the Bank of<br />
Sweden Tercentenary Foundation, grant nr 86/109.<br />
APPENDIX
WORD LISTS<br />
If there was more than one occurrence, the number is given in parentheses.
The CV of the first syllable of the word in the middle column was the one measured; the columns to the left and right give the context.
Speaker AV:
nej banne mig
det bara vissa
for bara noll
hant bara for
inte bara perception
jag bara och (2)
jag bara runt
man bara på
skall bara sitta
inte be Elis
Bengtson behövde
Elis Bengtson (6)
att beräkna
inte betala
mer betala
också betalt
liten bit från
jag bor just
har bott i
läser böcker
på början
en dag att
en dag då
från dag till
till dag **
andra dagar så
somliga dagar så
då dags att
var dags att
Bengtsons dat- sagan
Bengtsons datamaskin
oh datamaskiner
en dator
och den köper
och den ligger
ah den är
** det li-
** det ligger
** det är inte (de'nte)
** det är halvt
ja det var
att det blir
att det borde
att det är
haft det heller
han det att
han det och
just det och
med det där
men det ryms
månaden det är
och det gjorde
på det sättet
skriva det man
till det kunde
ut det har
eh det visade
han dit och
alla dom skatter
ja du vill
ett dugg skatt
# då blev
den då #
och då blev
och då kom
och då sa
satt då och
väl då dags
är då språk
bott där det
den där lilla
den där maskinen
det där #
det där det
det där och
vägen där ungefär
förvånad därför att
Speaker PaT:
mycket babbel (2) #
babbla-babbla-babbla
och badrum
han bara en
erat bara enkelt
hade bara ett
är bara grinig
skall bara hämta
är bara nånting
har bara tio
är bara tre
har bara tio
är bara tre
det bara är
in barnen
utfodrar barnen
andra barnet
ena barnet
har boken
tt borde
ka bart of tare
åker buss gör
av bussen
vid busshållplatsen
det bästa
mycket bättre
det börjar
en dag och
en dag så
tt daghem
på dagis
på Danderyd
saga datum
tt datum
enligt den här
och det måste
tt det ringer
och det stämmer
tt det är
med det är
så dog han
så du har
och då börjar
och då hade
-vis då hem
sig då inte
utsikt då också
och då ser
och då springer
REFERENCES<br />
Delattre, P., Liberman, A.M., and Cooper, F.S. (1955). Acoustic loci and transitional cues for consonants. J. Acoust. Soc. Am. 27, 769-773.
Engstrand, O. (forthcoming). Articulatory correlates of stress and speaking rate in Swedish VCV utterances. J. Acoust. Soc. Am.
Engstrand, O. and Nordstrand, L. (1983). Acoustic features correlating with tenseness, laxness, and stress: preliminary observations. RUUL 11. Dept. of Linguistics, Uppsala University.
Fant, G. (1973). Stops in CV syllables. In Speech sounds and features. MIT Press.
Gay, T. (1981). Mechanisms in the control of speech rate. Phonetica 38, 148-158.
Hadding-Koch, K. and Abramson, A.S. (1964). Duration versus spectrum in Swedish vowels: some perceptual experiments. Studia Linguistica, Lund, 94-107.
Kuehn, D.P. and Moll, K.L. (1976). A cineradiographic study of VC and CV articulatory velocities. J. Phon. 4, 303-320.
Lindblom, B. (1963). Spectrographic study of vowel reduction. J. Acoust. Soc. Am. 35, 1773-1781.
Lindblom, B. and Lacerda, F. (1985). Akustiska uttalsstudier för syntes av svenska II. Projektbeskrivning. Institute of Linguistics, University of Stockholm.
Lindblom, B. and Lindgren, R. (1985). Speaker-listener interaction and phonetic variation. PERILUS, Report IV. Institute of Linguistics, University of Stockholm.
Lindgren, R., Lindblom, B., and Krull, D. (1986). Phonetic variation in natural speech. Status and progress report I. Institute of Linguistics, University of Stockholm.
Öhman, S. (1966). Coarticulation in VCV utterances: Spectrographic measurements. J. Acoust. Soc. Am. 39, 151-168.
AN UNMARKED DIALOG?<br />
Exploring Discourse Intonation in Swedish
Madeleine Wulffson<br />
Is there such a thing as an unmarked dialog? Linguists and phoneticians alike have long spoken of sentence intonation, distinguishing between 'marked' and 'unmarked' versions of any given sentence or utterance. The standard 'unmarked' version of a statement, it has been found, will usually feature some sort of 'global fall' (Lieberman 1967, Bruce 1984), whereas a 'neutral question' will usually feature some sort of rising contour, at least towards the end. Studies of 'marked' versions have often been concerned with devices for inviting shifts of focus from one element to another, of the type:
Q: "What color cottage did you stay in last summer?"
A: "We stayed in a RED cottage."
In the case of Swedish, issues relevant to, e.g., the word accents have been greatly illuminated by this type of 'lab language' analysis. But when it comes to live interactive dialog, a different approach is required to handle the communicative value of intonational phenomena. This article proposes just such an approach to the study of intonation in discourse. Developed in England for the English language, it has proven equally effective and applicable to Swedish (see Wulffson 1987), with only certain technical modifications required to accommodate the morpho-lexical phenomena of the Swedish tone accents.
The Discourse Model of intonation in question, developed by David Brazil at the University of Birmingham, represents a finite set of meaningful variables which are the result of either/or linguistic choices made by the speaker on the basis of an ongoing assessment of the state of existential, here-and-now convergence or divergence between speaker and hearer, encoded and decoded respectively in real time. These functional oppositions are represented by relatively easily recognizable intonational phenomena of relative pitch level and pitch direction. The purpose of this article is not, however, to present the model per se. This has been done more than adequately in other publications, particularly Brazil 1985, "The Communicative Value of Intonation in English". A more recent study of the Swedish implications of the model will be found in Wulffson 1987. A brief summary of the meaningful variables subsumed in the model, taken from the latter publication, is to be found in Appendix 1. Further, Appendix 2 contains a summary of the transcription conventions, which have been slightly modified to accommodate the Central Swedish tone accents. In the present analysis of Skåne Swedish, these modifications are of lesser consequence due to this dialect's lack of 'dubbeltoppighet' ('double-topped-ness') in words carrying accent 2, so characteristic of Central or Stockholm Swedish. Instead, the interactive discoursal aspects of intonation in Swedish will be the center of focus.
The purpose of this article is, on the other hand, to explore ways in which the Discourse Model can effectively be used to illuminate a whole conceptual area of interaction in discourse, that supplied by intonation, which has largely been neglected in Swedish up to now. This will be done in the following manner: a snippet of dialog will be compared intonationally with itself, so to speak. That is, two versions of the same dialog will be compared, one being what could be called a 'marked' version, and the other a sort of 'unmarked' version. With the aid of the Discourse Model, the concepts of 'markedness' and 'unmarkedness' will be seriously questioned, as the title suggests.
A thoroughgoing analysis of both versions will be attempted, taking into consideration a certain range of the various options open to the speaker at any given moment. Each configuration is an interplay of semantic, grammatical and intonational factors in a unique here-and-now context. Furthermore, it should be kept in mind that all of the various oppositions are available for exploitation and manipulation by the speaker. Within the system there are four basic factors, PROMINENCE, TONE, KEY and TERMINATION, which can be divided into 13 subfactors, each of which constitutes a potential meaningful contribution to the discourse. In addition, the very fact that one option is chosen over the other (e.g. prominence as opposed to non-prominence), thereby excluding the non-chosen alternative, can also be said to be meaningful.
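To make the idea of meaningful choice concrete, here is a minimal Python sketch of the option sets. It assumes, purely for illustration and not as the article's own enumeration, that the 13 subfactors are the two prominence values, Brazil's five tones (p, p+, r, r+, o) and the three levels each of key and termination; the names `SYSTEMS` and `describe` are hypothetical. For one tone unit it lists each chosen option together with the options it excludes, since the exclusion is itself meaningful.

```python
# Hypothetical encoding of the Discourse Model's option sets.
# Assumption: the 13 subfactors are exactly the options listed here.
SYSTEMS = {
    "prominence": ("prominent", "non-prominent"),
    "tone": ("p", "p+", "r", "r+", "o"),
    "key": ("high", "mid", "low"),
    "termination": ("high", "mid", "low"),
}


def option_count() -> int:
    """Total number of options across the four systems."""
    return sum(len(options) for options in SYSTEMS.values())


def describe(choice: dict) -> list:
    """List each chosen option together with the options it excludes."""
    lines = []
    for system, options in SYSTEMS.items():
        chosen = choice[system]
        if chosen not in options:
            raise ValueError(f"{chosen!r} is not an option of {system}")
        excluded = ", ".join(o for o in options if o != chosen)
        lines.append(f"{system}: {chosen} (not {excluded})")
    return lines


# A's opening tone unit in the spontaneous version, as analysed in the text:
# prominent tonic 'LÄSer', r+ tone, high key (on 'UNDrar'), high termination.
opening = {"prominence": "prominent", "tone": "r+", "key": "high", "termination": "high"}
print(option_count())  # 13
for line in describe(opening):
    print(line)
```

Keeping the excluded options explicit mirrors the point above: the choice of r+ is informative partly because p, p+, r and o were available and not taken.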
The 'marked' version is the spontaneous one, plucked out of a taped conversation in which two people, A and B (both from the Skåne part of Sweden), are speculating over a photograph of a woman (whom they don't know) sitting at an outdoor cafe. The discussion has been going on for some time, and the two people have built up a broad picture of the woman's personality and activities before the bit which we have picked out comes in. The question they are discussing at the moment is what this lady does with her free time, and the conclusion is that, although there are not many theaters near where she lives (somewhere in the midwestern part of the U.S.A.), she does enjoy going to the theater. In actuality the photograph was taken in a piazza in Rome, and the subject was a lady professor of economics, but this is of no importance. The important thing is that we are dealing with a lively, spontaneous and natural conversation between two colleagues and friends.
The 'unmarked' version, on the other hand, is an unabashed product of laboratory manipulation, arrived at by the following process: each single utterance of the original bit of dialog was copied down on a separate slip of paper and given to the original speakers in shuffled, random order. The utterances were then re-recorded, one by one, with 'neutral' intonation: the 'this is what is written on the paper' intonation. Subsequently the re-recorded utterances were computer-spliced together again, Humpty Dumpty style, in the same order as the original dialog, with a 200-millisecond pause between each one.
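The shuffle-and-splice procedure can be sketched in a few lines of Python. This is not the software used in the study; it assumes each re-recorded utterance is available as a list of samples at a common sample rate (16 kHz here, a hypothetical value the text does not give), and the function names are illustrative. Utterances are recorded in shuffled order, keyed by their position in the original dialog, and then reassembled in dialog order with 200 ms of silence between them.

```python
import random
from typing import Dict, List, Sequence

SAMPLE_RATE = 16_000  # hypothetical rate; not stated in the text
GAP_MS = 200          # pause between spliced utterances, as in the text


def recording_order(n_utterances: int, seed: int = 0) -> List[int]:
    """Shuffled order in which the utterance slips are handed out."""
    order = list(range(n_utterances))
    random.Random(seed).shuffle(order)
    return order


def splice(recordings: Dict[int, Sequence[float]],
           sample_rate: int = SAMPLE_RATE,
           gap_ms: int = GAP_MS) -> List[float]:
    """Join recordings in original dialog order, separated by silence.

    Keys are the utterances' positions in the original dialog, so sorting
    them undoes the shuffled recording order.
    """
    gap = [0.0] * (sample_rate * gap_ms // 1000)
    out: List[float] = []
    for i, position in enumerate(sorted(recordings)):
        if i > 0:
            out.extend(gap)
        out.extend(recordings[position])
    return out


# Toy usage: three 0.1 s 'utterances' (constant values 1.0, 2.0, 3.0),
# recorded in shuffled order and spliced back in dialog order.
recordings = {}
for position in recording_order(3):
    recordings[position] = [float(position + 1)] * 1600
dialog = splice(recordings)
print(len(dialog))  # 11200 samples: 3 x 1600 speech + 2 x 3200 silence
```

Note what the sketch leaves out, which is exactly what the article goes on to show matters: no overlap is possible and every gap is identical, so any interactive timing of the original is lost along with its intonation.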
Now, it is often said that a picture is worth a thousand words. In this case we could say 'a bit of listening is worth a thousand words'. Quite simply, a couple of minutes' listening to the two versions of the dialog can only lead to one conclusion: the first dialog is utterly normal and natural, and the second is not a dialog at all but merely a conglomeration of separate utterances, strangely cohesive in textual content but totally lacking in any kind of communicative interaction between the speakers. A real linguistic Frankenstein.
So what went wrong? The speakers were the same in both versions. The words were the same, or practically the same. The timing was such that the utterances came in rapid, natural-like succession. Only the intonation was different. Actually, the subjects were rather concerned that, despite the precautions taken, they had 'remembered' the original conversation and had re-recorded the bits with that in mind, thus disturbing the equilibrium of the sought-after 'neutrality' of the second version. These fears were unfounded. The resulting dialog is so 'neutral' as to be... untenable as a dialog.
The Spontaneous Version
But first things first. We will start with the spontaneous version in its transcribed entirety:
A: // r+ j' UNDrar om hon inte LÄSer //
B: // r+ MHM //
A: // r roMANer // r+ MHM //
B: // o hon läser PÅ- // p DELtid //
A: // p nej jag MENar hon läser roMANer //
B: // p jaså hon läser roMANer //
A: // r+ JA // (skratt/laugh)
B: // o jag tro' du mena' HON- // p läser på DELtid // p KVÄLLStid //
A: // r+ NEJ // p det MENade jag INte //
B: // p JA ja // p jo hon LÄSer NOG // o sån'är STORA- //
A: // r+ MHM //
B: (clears throat) // p VERK // p av TolSTOY och DostoYEVsky //
A: // r+ JA // p preCIS //
A begins with: "I wonder if she reads." Discoursally, she is introducing a new topic for discussion, that of reading as a free-time activity. It is a logical step to suggest, as the two discussants have recently decided that the woman's possibilities for cultural enrichment are limited by her 'small-town' environment. Intonationally, the last boundary was marked off by pitch-sequencing low termination, so the maximally disjunctive high key on 'UNDrar' clearly sets off the utterance as a new stage in the discourse. The higher pitch level also lends a particularizing function to the segment, which might allow for a gloss such as: 'Reading is the item I choose out of a whole set of possible activities she might engage in, such as watching TV, sewing, going out to night clubs with friends, etc.' The tonic, 'LÄSer', also carries the simple rising tone r+, the dominant referring tone of the Discourse Model. One can say that speaker A 'refers' (R tone) to reading as a possible free-time activity which both speakers are privy to, as well as to an assumption that the lady does, indeed, read. But since A is, at the moment, introducing a new element to the discussion and taking the initiative, she is well justified in her choice of the dominant version of the referring tone. A p tone, on the other hand, might have projected the sense: 'This is a possibility for consideration; I have absolutely no pre-assumptions on the subject (so tell me what you think).' The further aspect of high termination brings an expected or projected response type into focus. Here the functional significance of the high termination choice could be glossed as: 'Do you, or do you not, think that she reads in her free time?' She is inviting a polar-type or adjudicating response. Built into the high termination is the 'expectation' of a high key, yes-or-no type rejoinder. Which she gets:
B: // r+ MHM //
Speaker B cooperatively affirms her idea with a high key r+ tone, satisfying both pitch concord expectations and response type. A mid key response might have carried an implication of something less than agreement, like "yes, or..." or "maybe". A low key response could possibly have had the effect of mild disagreement, such as, perhaps, "Maybe, but I don't really think so." Low key carries an equivalence function which, combined with a dominant r+ tone and maximal concord breaking, could very likely have the effect of eradicating the suggestion, meaning something like "We're back to where we started before you came up with this dumb idea."
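The pitch concord expectations described here can be summarized as a small rule table. The Python sketch below is a simplification under stated assumptions: it treats key and termination as the same three-way choice, takes the rule that high or mid termination projects a matching key in the reply and that low termination projects no expectation at all (as stated elsewhere in the text), and reduces the outcome to "concord" or "concord broken". The names `Level`, `expected_key` and `check_concord` are hypothetical; what a broken concord *means* (mild disagreement, dominance, and so on) is left to the analysis, not the code.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    """Three-way pitch-level choice, used here for both key and termination."""
    LOW = "low"
    MID = "mid"
    HIGH = "high"


# Response type each termination invites, paraphrasing the text.
INVITED_RESPONSE = {
    Level.HIGH: "adjudicating (yes/no) response in high key",
    Level.MID: "concurring response in mid key",
    Level.LOW: "no particular response expected",
}


def expected_key(termination: Level) -> Optional[Level]:
    """Key that a termination projects onto the next speaker's response.

    High and mid termination project a matching key; low termination
    closes the pitch sequence and projects nothing.
    """
    return None if termination is Level.LOW else termination


def check_concord(termination: Level, response_key: Level) -> str:
    """Compare a response's key with the key the previous termination projects."""
    expected = expected_key(termination)
    if expected is None:
        return "no expectation"
    return "concord" if response_key is expected else "concord broken"


# A's opening question ends in high termination; B answers 'MHM' in high key.
print(check_concord(Level.HIGH, Level.HIGH))  # concord
# The hypothetical mid key reply discussed in the text.
print(check_concord(Level.HIGH, Level.MID))   # concord broken
print(INVITED_RESPONSE[Level.HIGH])
```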
The relatively equal social status of the two speakers allows for a free and open 'game of catch' with regard to who is in (temporary) charge of the discourse. In fact, this is a dialog between two friends and colleagues, a woman and a man, of the same age and in the same line of work. Here he chooses the dominant version of the referring tone, as it is his turn to 'judge'.
In terms of turn-taking (in the Sacks, Schegloff and Jefferson tradition), it could be said that A has yielded the turn to B, who now sets out to elaborate on A's suggestion. But A suddenly realizes that more precision is needed, and hastens to insert this afterthought even at the risk of interrupting B's turn (which he has clearly established and embarked upon). The result is a two-dimensional overlap (there is no overlap in the spliced version of the dialog, which enhances its highly unnatural quality):
A: // r roMANer // r+ MHM //
B: // o hon läser PÅ- // p DELtid //
With the simple referring tone (r), A 'refers' to two things: that 'novels' was what she meant to say, and/or that if the woman reads, it is probably novels. She also 'refers', existentially, to her own expectation that this is also B's understanding. Her choice of mid key marks the tone unit as 'additive', an additional bit of information, and the simultaneous mid termination projects an expectation of concurrence on B's part.
Meanwhile B has set off on an entirely different 'tack', and while indeed 'agreeing', he is agreeing with the wrong thing! The collision of simultaneous speech causes B to break off his tone unit midway, resulting in an oblique zero tone (o). Satisfied that all is clear by now, A encouragingly chips in with a high key r+ tone on 'MHM', anticipatorily adjudicating B's contribution to be correct. 'Reference' here is to this assumed mutuality.
But then comes the 'bombshell'. As 'läsa' in Swedish means both 'read' and 'study', it turns out that B is presenting, with a high key p tone, that indeed the woman studies, but only part-time (as the contrastive nature of high key implies) rather than full-time, as she might have done otherwise. So A is compelled to jump in with a quick correction:
A: // p nej jag MENar hon läser roMANer //
In this repair sequence, the high key of 'menar' carries contrastive value, which here could be glossed as 'I mean THIS (novels), not THAT (part-time, as you said)'. The proclaiming tone on 'romaner' can be said to present this bit as decidedly new to the discourse, a world-changing increment to the unfolding argument.
But now let's take another brief excursion into the hypothetical, into what might have happened. Had she said instead, for example:
A: *// p jag MENar att hon LÄSer romaner //*
it would have sounded extremely odd in this context, as it would have laid contrastive emphasis on 'READ' rather than on 'NOVELS', as if to say, for example, 'She doesn't READ novels, she WRITES them', which would be quite impossible in the logic of this contextual and interactive setting. The subtle effects of selectivity or non-selectivity in the prominence system can easily make nonsense out of an otherwise perfectly cohesive and coherent text, lexically and grammatically speaking.
But back to our original version. It is indeed the word 'romaner' which receives prominence, being presented contrastively with 'deltid' as a correction of a misunderstanding. A's high termination further projects the expectation of a high key, polar, adjudicating response. Which she gets:
B: // p jaså hon läser roMANer //
The high key falling tone 'proclaims' the whole as 'This is definitely new and not common ground'. B's high termination in turn 'expects' a high key, yes/no type response. Which he gets:
A: // r+ MHM //
In natural dialog, the phenomenon of pitch concord is perhaps one of the most striking intonational features. In our 'unmarked' dialog, on the other hand, the absence of this interactive play of pitch is one of the most obvious deficiencies.
B: // o jag tro' du mena' HON- // p läser på DELtid // p KVÄLLStid //
A: // r+ NEJ // p det MENade jag INte //
B's contribution here, apart from the o tone in the first tone unit (clearly due to verbal planning), is marked by proclaiming tones. Part-time, evenings, was what he thought she meant, and the proclaiming tones here underscore the separateness of their two worlds, the lack of common ground. It is almost impossible to imagine a line of thinking at this point which would allow for referring tones in this context. A conceivable, though unlikely, gloss for a referring tone here would be some sort of reproachful reminder that, if it was not shared knowledge, it should have been: if A didn't mean studying part-time, she should have! But given the cooperative atmosphere of this conversation, such a reproach would appear distinctly out of character.
Now let us take a closer look at the last two tone units in B's utterance:
// p läser på DELtid // p KVÄLLStid //
Further back in the conversation, before our snippet begins, A and B have decided that the woman in the photo is an executive secretary, a real high-powered, go-getter type. Therefore she must work hard, all day long. So if she studies part-time, this must, perforce, take place in the evening, as no other time is available on weekdays. Speaker B drops to low key in the tone unit // p KVÄLLStid //, clearly, though subconsciously, exploiting the equivalence function of low key in order to present the two as existentially one and the same. (A mid key realization of 'KVÄLLStid', which would have presented 'evenings' as a new added bit of information, might have sounded odd or even condescending, the subject of the woman's working having been so recently discussed and settled upon.) Artificial tone or key switching can twist the message in such a way that an otherwise perfectly correct and logical utterance is rendered nearly or totally unintelligible. It will change the whole psychological effect.
The second tone unit of B's utterance ('läser på DELtid'), being a potential point of syntactic completion, is a vulnerable place for interruption or overlap. A has anticipated this 'closing point', and hastens to compete for the floor with a high key contrastive 'NEJ', which overlaps with B's 'KVÄLLStid'. It is not 'DELtid' she means, as he thinks, but 'roMANer'. In the here-and-now world of A and B's speculative conversation there is an existential set of two possibilities, either 'novels' or 'part-time'. The Saussurian general paradigm would never see 'novels' as the opposite of 'part-time', but here these are the two existentially available choices, the 'existential paradigm' of the Discourse Model, marked intonationally through both the prominence and the key systems.
In this case, high key on 'NEJ' is chosen of necessity to effect a repair of the misunderstanding. Since this is decidedly a competitive moment (A and B are competing for both the turn and their own points of view), a simple, non-dominant r tone would have conceded a measure of agreement such that it might very possibly have led to a different 'decision', i.e., that perhaps the woman studied part-time after all. (Please recall that neither participant actually knows the woman's real character or activities; rather, they are speculating about what she might be like or do. The decision could logically go either way.) So the dominant r+ tone which A has chosen is by far the most appropriate and effective, establishing social togetherness while at the same time firmly maintaining control.
She continues:
// p det MENade jag INte //
The p tone is fully appropriate here to clarify and re-state the nature of the misunderstanding. The high key of 'MENade' (meant) underscores the polarity pair 'mean' vs 'not mean', but she concludes the tone unit with mid termination ('INte'), in clear expectation of a concurring response. A gloss here might be: 'Now I expect that this little misunderstanding is cleared up and that you are in agreement with me on this point.' With mid termination she sets up an expectation of a mid key response as well.
Which she does not get. B agrees, all right, and gives up on his own idea of part-time studies in favour of novel reading. But he chooses to adjudicate rather than simply agree, and simultaneously signals (high key) his intention to introduce a heretofore new topic, that of what she reads. The breaking of the pitch concord expectation is also a feature of discourse control or dominance, and in fact the turn or move that follows constitutes a full presentation of the new topic. It is now B who 'decides' (the '+' or dominance factor) the nature of the reading material. A chips in, at a pause point while B is clearing his throat and planning his strategy, with a concurring mid key // r+ MHM // to re-establish togetherness and encourage B's line of thinking. B continues with the proposition that it is works of Tolstoy and Dostoyevsky that the woman reads. These grand authors are presented with mid key, as constituting additional valuable information, whereas the mid termination of the tone unit sets up the expectation of a concurring mid key response. Which he doesn't get:
A: // r+ JA // p preCIS //
A retains her part of the controlling role of the discourse by again breaking pitch concord expectations (an expression of dominance), and adjudicating (high key) instead of simply concurring, or 'chiming in agreement' (mid key). In this second-by-second interactive milieu, she 'decides' (the '+' factor) that this is indeed the type of reading matter in question. A gloss might be: 'Yes, we are in agreement, but it is I who am judging that now.' The final tone unit proclaims (p tone) the correctness of the suggestion with a conciliatory mid termination. Although, as it happens, the subject is closed after this and a new one opened, A is so delighted with having 'won' their little competition that she does not close the subject with low termination, which would have constituted pitch sequence closure as well. Instead she emphasizes again the agreement aspect of the exchange by ending her comment 'preCIS' ('precisely', 'right') with mid termination, thus leaving a sense of concurrence 'in the air' to be savoured during the pause that follows before the next question is taken up.
The 'Unmarked' Version
Now let's look at our Frankenstein version, which lived but remained a monster.
A non-dialog dialog:
A: // p ja' UNDrar om hon INte LÄSer //
B: // r+ MHM //
A: // p roMANer //
B: // p hon LÄSer på DELtid //
A: // p nej jag MENar att hon läser roMANer //
B: // p JAså // p hon LÄSer roMANer //
A: // p JA //
B: // o jag trodde du MENade- // p hon LÄSer // p på DELtid //
A: // p nej det MENade jag INte //
B: // p JAja // r+ JO // p hon LÄSer NOG // p såna HÄR STORa VERK // p av TolSTOY och DostoYEVsky //
A: // p ja preCIS //
A 'begins' with: // p ja' UNDrar om hon INte LÄSer //. For a start, she speaks quite slowly and overclearly, in oblique orientation. There is no 'o' tone, but the tone unit contains three prominent syllables, as opposed to the usual two in direct interactive orientation, where the speaker is equally concerned with WHO he/she is communicating WITH as with WHAT is being said. In oblique orientation it is the language itself, for one reason or another, which is in focus. Equally strange is the fact that although A is 'introducing an entirely new topic', she 'begins' on mid key, as if she were merely adding a thought of little consequence to the conversation. As 'F0 downdrift' is in function here, the utterance carries low termination, which in discourse terms projects no expectations whatever as to the type of response that would be agreeable or acceptable. This is the usual, normal circumstance of an out-of-context laboratory recording situation, but it occurs only under certain circumstances in live communication. So this stance might be appropriate if A were, say, an interviewer in a panel discussion, throwing out a topic for open discussion. But not here.
B's 'reply' is a mid key/termination // r+ MHM //. "MHM" is most often associated with 'feedback' or 'backchannelling', and in Swedish there is a tendency for much feedback to be realized with an r+ tone (Wulffson 1987). So B's 'lab association' was quite natural. In fact, due to this imaginary association with a general tendency, B's 'answer' almost sounds possible: like a bored husband, perhaps, who's trying to read his newspaper, hasn't heard a word of what his wife said, but answers anyway, just to keep the peace.
A's next contribution, // p roMANer //, 'meets' pitch concord 'expectations' by 'adding', in a one-word utterance, that the lady in the photograph reads novels. 'F0 downdrift' has not had time to take effect.
B then proceeds to close the pitch sequence (and the 'discussion') with low termination:
// p hon LÄSer på DELtid //
There is no overlap here, as was to be found in the original.
But A 'disregards' the fact that the 'discussion' is 'clearly closed'. (Our hypothetical husband wants to read his paper.) She 'adds' (mid key) that she meant that the woman read novels. In the real version, A protested the turn of events by use of contrastive high key, and simultaneously demanded an active polar-type response through use of high termination on 'romaner'. Our hypothetical 'wife', on the other hand, simply 'tells' her 'husband' that in fact he was wrong and that's that (p tone, low termination).
B 'responds' in mid key, 'agreeing' and 'accepting' this 'additional' piece of information with totally uncalled-for equilibrium! He 'continues' by closing off the pitch sequence again with low termination, forgoing or sacrificing a response. 'Resignation' could be the word to describe the effect of this unengaged utterance. One could perhaps object here that, since e.g. 'resignation' too is a human attitude, the 'conversation' is after all possible. But as we have seen and will see, there is no continuity or coherence of 'attitude' or interplay between the speakers.
Nor does A now 'feel the need' to 'liven up' the conversation. Low key // p JA // projects what is, contextually or content-wise, a highly contrastive statement as being equivalent to the last, a foregone conclusion. Low termination again projects no expectations as to the type of response which might be agreeable. How about a divorce?
B plods on:
// o jag trodde du MENade- // p hon LÄSer // p på DELtid //
(I thought you meant she studies part-time)
For a start, all of the words are clearly pronounced, whereas in the spontaneous version 'trodde' was pronounced 'tro'' and 'menade' was 'mena'', much reduced. Oblique orientation also plays a role here: there is no contextual reason for 'läser' to be prominent. B 'hammers in' a point which was already eminently 'clear'. He 'beats a dead horse', so to speak. This bit does not end with low termination as the others do, but it is also the only bit with a reference to another person (du), so it is likely that the 'global fall' rule was not in effect at the time of the recording. The reference to a second person no doubt influenced the person recording to imagine an interlocutor and an answer, but, having no interlocutor, he merely 'expects' neutral concurrence.
Which is what comes, but not because A 'agrees'. She doesn't at all. But she also has nobody to 'answer', so why should she 'adjudicate'?
// p nej det MENade jag INte //
This is an additive mid key 'response', where in fact, contextually, a strong contrast is being made to correct a mistaken idea. Not here. No, we've closed the subject again (low termination).
But now finally we get some 'life' out of B:
// p JAja // r+ JO // p hon LÄSer NOG // p såna HÄR STORa VERK // p av TolSTOY och DostoYEVsky //
This being a longer utterance, it has its own internal structure and beginning, with 'Ja ja, jo', which even out of context suggests a polar, adjudicating function. So the high key is, for once, quite appropriate, except that there are other factors which render the sequence unlikely all the same.
The main reason we have a high beginning and a low ending here is related to discourse factors in a mini 'out-of-context context'. This is a longer utterance, and there is more 'content' in this statement than in the others, which leads B to read it as a complete presentation of a mini-topic, without regard to any imaginary interlocutor's reaction. The point of the matter is that the 'global fall' is a physical description, whereas the pitch sequence relates to discourse meaning, which is not by any means automatic.
A pitch sequence is phonologically defined as a run of one or more consecutive tone units which ends in low termination. It has a number of discourse functions, among them a dominance factor and the marking off of discrete, consecutive stages in the unfolding discourse. The global fall is the pitch sequence by default, so to speak, due to the lack of any other discourse factors that would bring about other configurations.
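The definition of a pitch sequence lends itself to a short Python sketch that groups tone units into runs closed by low termination. The data below are the first five tone units of the spliced version; their terminations follow the description given above (low on A's opening, mid on B's 'MHM', low on B's 'hon LÄSer på DELtid' and on A's correction), except for A's one-word 'roMANer', whose mid termination is our assumption from its single prominence and mid key. The `ToneUnit` class and `pitch_sequences` function are hypothetical names, not part of the Discourse Model's literature.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToneUnit:
    speaker: str
    text: str
    termination: str  # "high", "mid" or "low"


def pitch_sequences(units: List[ToneUnit]) -> List[List[ToneUnit]]:
    """Group tone units into runs that each end in low termination.

    Any units after the last low termination form a final, open group.
    """
    sequences: List[List[ToneUnit]] = []
    current: List[ToneUnit] = []
    for unit in units:
        current.append(unit)
        if unit.termination == "low":
            sequences.append(current)
            current = []
    if current:
        sequences.append(current)
    return sequences


spliced_opening = [
    ToneUnit("A", "ja' UNDrar om hon INte LÄSer", "low"),
    ToneUnit("B", "MHM", "mid"),
    ToneUnit("A", "roMANer", "mid"),  # assumed: single prominence, mid key
    ToneUnit("B", "hon LÄSer på DELtid", "low"),
    ToneUnit("A", "nej jag MENar att hon läser roMANer", "low"),
]

for seq in pitch_sequences(spliced_opening):
    print(" | ".join(f"{u.speaker}: {u.text}" for u in seq))
```

The mechanical grouping makes the contrast with the spontaneous version easy to see: in the spliced dialog nearly every turn closes a pitch sequence of its own, which is the 'global fall' by default rather than a discourse choice.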
A's final 'reply' sounds very odd indeed:
// p ja preCIS //
The low key on 'precis' presents this bit as equivalent to the last, as if the 'fact' of the woman reading Tolstoy and Dostoyevsky had already been negotiated. That is approximately as far from the truth of the context as one could get, as the idea is brand new. Again, low termination closes a new pitch sequence and leaves no expectations as to the 'response'. A sort of 'I don't care and there's nothing more to say' impression is conveyed.
In sum, it all sounds very odd. Here are two people 'conversing' (exchanging utterances) yet not communicating, or at least not communicating in a way any of us would be likely to consider satisfactory. If we were to overhear such a conversation, we would immediately ask ourselves, 'Are these people Asimovian robots?' or 'Are they the very worst actors imaginable, in the process of learning lines they hate or couldn't care less about?'
The lack of pitch level interplay has been mentioned as one of the principal causes of the effects described above. Another very striking cause is the near-total lack of referring tones, an obligatory characteristic of direct orientation. In our spontaneous (direct) version, we have a good balance between R tones (9 examples) and P tones (10 examples), plus 3 O tones due to either planning difficulties or cut-off tone units. In the spliced dialog, on the other hand, there are only 2 R tones as opposed to 15 P tones (plus one O tone due to reading-aloud difficulties). Quite a revealing ratio. The referring tone serves not only to reify mutual worlds of understanding and invoke common ground, but also to establish social or psychological togetherness on supra-informational levels. So it is natural and even predictable that when utterances are withdrawn from an interactive context, R tones will either disappear or at least diminish drastically in number, as was the case in our still-born laboratory dialog.
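The tone counts can be checked mechanically. The Python sketch below assumes a simplified transcription in which every tone unit sits between `//` marks and begins with its tone symbol (p, p+, r, r+ or o); it groups r and r+ as R tones and p and p+ as P tones. Applied to the spliced dialog, as we read its tone units, it reproduces the 2 R, 15 P and 1 O reported above. For the spontaneous version it simply uses the reported counts (9 R, 10 P), since the tally there depends on details of the original transcription. The function names are illustrative.

```python
from collections import Counter

REFERRING = {"r", "r+"}
PROCLAIMING = {"p", "p+"}


def tone_units(transcript: str) -> list[tuple[str, str]]:
    """Split a '//'-delimited transcript into (tone, words) pairs.

    Assumes every tone unit starts with its tone symbol, e.g.
    '// p hon LÄSer // p på DELtid //'.
    """
    units = []
    for chunk in transcript.split("//"):
        chunk = chunk.strip()
        if chunk:
            tone, _, words = chunk.partition(" ")
            units.append((tone, words))
    return units


def tally(turns: list[tuple[str, str]]) -> Counter:
    """Count R, P and O tones over (speaker, transcript) turns."""
    counts = Counter()
    for _speaker, transcript in turns:
        for tone, _words in tone_units(transcript):
            if tone in REFERRING:
                counts["R"] += 1
            elif tone in PROCLAIMING:
                counts["P"] += 1
            elif tone == "o":
                counts["O"] += 1
    return counts


# The spliced ('unmarked') version, one entry per turn.
UNMARKED = [
    ("A", "// p ja' UNDrar om hon INte LÄSer //"),
    ("B", "// r+ MHM //"),
    ("A", "// p roMANer //"),
    ("B", "// p hon LÄSer på DELtid //"),
    ("A", "// p nej jag MENar att hon läser roMANer //"),
    ("B", "// p JAså // p hon LÄSer roMANer //"),
    ("A", "// p JA //"),
    ("B", "// o jag trodde du MENade- // p hon LÄSer // p på DELtid //"),
    ("A", "// p nej det MENade jag INte //"),
    ("B", "// p JAja // r+ JO // p hon LÄSer NOG // p såna HÄR STORa VERK "
          "// p av TolSTOY och DostoYEVsky //"),
    ("A", "// p ja preCIS //"),
]

counts = tally(UNMARKED)
print(counts["R"], counts["P"], counts["O"])  # 2 15 1

# Share of R tones among R and P tones, from the counts reported in the text.
for label, r, p in [("spontaneous", 9, 10), ("spliced", 2, 15)]:
    print(f"{label}: {r / (r + p):.2f}")  # 0.47, then 0.12
```

Expressed this way, referring tones make up nearly half of the R and P tones in the spontaneous dialog but only about an eighth in the spliced one.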
Conclusions
It is hoped that the reader will look upon the foregoing analysis as an attempt to bring out some of the ways in which intonation interacts with other pragmatic factors to generate the unique universes of what is called 'local meaning' in the Discourse Model: the entire, complex, here-and-now setting which speakers deal with according to their own apprehension of the continually mutating dynamics of verbal communication.
The hypothetical suppositions in the analysis about some of the things that might have happened are just that: a few things that might have happened, out of the myriad others that might have happened. Each contextual factor, intonational or otherwise, affects every other in various subtle ways.
The analysis has also shown that there is perhaps no such thing as an 'unmarked' dialog. It could even be said that all naturally occurring interactive speech is, by definition, intonationally 'marked'. Strings of words and sentences can be 'sewn together' in writing, maybe, but definitely not in speech. The Discourse Model clearly shows how intonation represents an entire conceptual dimension of meaning in spoken language as opposed to written language, and further, how and why intonation cannot be altered or manipulated with impunity.
Acknowledgements. Many, many thanks to those who assisted in the realization of this project: my two imaginative informants, those who gave advice (Gösta Bruce, Robert McAllister and Jan Anward), technical assistance (David House and Mats Dufberg) and food for thought (everybody at the Institutes of Phonetics and Linguistics at Lund and Stockholm universities).
Selected Bibliography
Brazil, D. 1985. The Communicative Value of Intonation in English. (Discourse Analysis Monograph no. 8.) English Language Research & Bleak House Books, University of Birmingham.
Brazil, D. 1985. Phonology: Intonation in Discourse. Extract: Handbook of Discourse Analysis. Academic Press, London.
Brazil, D. 1985. Where is the Edge of Language? Review article. Semiotica 56-3/4.
Brazil, D. 1984. Sentences Read Aloud. Offprint: Intonation, Accent and Rhythm. Studies in Discourse Phonology. Walter de Gruyter, Berlin, New York.
Brazil, D. 1982. Impromptuness & Intonation. Offprint from Impromptu Speech: a Symposium. Åbo Academy.
Bruce, G. 1984. Structure and Functions of Prosody. Stencil. Inst. för lingvistik, Lunds universitet.
Bruce, G. 1977. Swedish Word Accents in Sentence Perspective. Gleerups, Lund.
Bolinger, D. 1986. Intonation and Its Parts. Vol. 1. E. Arnold.
Couper-Kuhlen, E. 1986. An Introduction to English Prosody. E. Arnold.
Coulthard, M. & Montgomery, M. (Eds). 1981. Studies in Discourse Analysis. Routledge and Kegan Paul.
Cruttenden, A. 1985. Intonation. Cambridge University Press.
Lieberman, P. 1967. Intonation, Perception and Language. MIT Press, Cambridge, Mass.
Linell, P. & Gustavsson, L. 1987. Initiativ och respons. Om dialogens dynamik, dominans, och koherens. Linköping.
Levinson, S. 1983. Pragmatics. Cambridge University Press.
Sinclair, J. & Brazil, D. 1982. Teacher Talk. Oxford University Press.
Wulffson, M. 1987. Discourse Intonation in Swedish. To appear in Lund Working Papers. Institutionen för lingvistik, Lund University.
Appendix 1: Brief summary of the Discourse Model
The Discourse Model postulates a finite set of meaningful linguistic oppositions which can be singled out on a perceptual, auditory level from the more or less constantly varying stream of speech. The meaning components described here represent the result of a speaker having made an either/or choice. The independent variables are functional in nature. For example, "if there is a 'falling pitch', it is not the fall itself which is of interest but rather the function of the language item that carries it."
The basic factors which contribute to the realization of the functional oppositions within each tone unit are PROMINENCE, TONE, KEY, and TERMINATION. Further, within the domain of these systems are ORIENTATION ('direct/oblique') and DOMINANCE or DISCOURSE CONTROL.
The work of Brazil is heartily recommended to readers who would like a deeper understanding of the model. In the meantime, the following will serve as a general guideline:
PROMINENCE refers to 'a selection from sets available at successive places along the time dimension.' 'An incidence of prominence fixes the domain of the other variables of tone, key, and termination.' (Brazil 1985) A syllable or stretch of speech may be assigned prominence for the purpose of sense selection or intonation selection. For example, if one Swede asks another 'Vilket kort spelade du?' (Which card did you play?), and the other replies 'HJÄRTerDAM!' (the queen of hearts), the answer represents a selection from an existential set of 52: the deck of cards. If the question had been 'Vilken DAM spelade du?' (Which queen did you play?), with the answer 'HJÄRTerdam', then there are only 4 choices in the existential set: 'hjärter', 'spader', 'ruter', and 'klöver' (hearts, spades, diamonds, and clubs). On the other hand, if the question had still been 'Vilken dam spelade du?', but the answer had been, for example, 'HJÄRTerDAM', there would seem to be no motivation for 'DAM' to be prominent. But suppose the speaker wished to concentrate on the card game instead of answering questions: he might convey this with low termination on the word 'DAM', in order to be left alone! So prominence may be assigned for the purpose of making a choice within any of the other intonational systems of tone, key, or termination.
TONE refers to basic pitch movement types, each of which carries a distinct abstract meaning increment. The PROCLAIMING TONE, of which there are two versions, the SIMPLE proclaiming and the DOMINANT proclaiming (p, p+), stands for the elements in the discourse which represent a change in the status quo of speaker-hearer understanding. The REFERRING TONE, on the other hand, also with a SIMPLE and a DOMINANT version (r, r+), represents the areas of convergence, or reification of the status quo between speaker and hearer, on informational or social levels, or both. The dominant version reinforces the basic meaning of a tone and/or affects control of the discourse.
The FIFTH TONE is LEVEL (o), and remains outside the interactive proclaiming/referring dichotomy. DIRECT ORIENTATION refers to the discourse situation in which speaker/hearer interaction is in focus (P/R), whereas OBLIQUE orientation (O/P) functions where the language itself or linguistic organization is in focus.
KEY and TERMINATION deal with the communicative value of relative pitch levels: HIGH, MID, or LOW. In an extended tone unit, key is associated with the onset syllable and termination with the tonic; in a minimal tone unit, both fall together on the tonic. Within their domain are relationships of CONTRASTIVENESS, ADDITIVENESS, and EQUIVALENCE, as well as the interactive areas of projected and actual responses: ADJUDICATING (high termination and key), CONCURRING (mid termination and key), or no projected expectations (low termination and key). DISCOURSE STRUCTURING and SEQUENCING are also achieved through key and termination.
The place of operation for these four sets of speaker options is the TONE UNIT, which can be said to be the building block of verbal communication. According to Brazil, the speaker 'plans' or 'encodes' the tone unit, and the hearer 'decodes' it as a whole. A tone unit (TU) in direct orientation consists of ONE (minimal TU) or TWO (extended TU) prominent syllables, one of which is TONIC (i.e. it carries a major movement in pitch, or constitutes the beginning of a pitch movement which extends over the syllables that follow). Key and termination are determined by the level of pitch in relation to preceding and succeeding prominent syllables. For p and r tones, key and termination depend on the beginning of the tone, whereas in the p+ tone it is the peak of the rise-fall, and in the r+ tone the end of the rise, that counts. The tonic syllable is the only obligatory portion of a TU. A pause always defines a TU boundary, but a TU boundary is not always marked by a pause. The model differs substantially from other models on this crucial point: it is the instance of a set of meaningful functional choices, and their internal organization, rather than external boundaries, which determines the tone unit.
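The tone-unit constraints just summarized can be sketched as a small data structure. This is a minimal illustration under our own assumptions, not Brazil's formalism: the class and field names are hypothetical, only direct orientation is modelled, and the tonic is taken to be the last prominent syllable, which the summary above does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Tone(Enum):
    P = "p"          # simple proclaiming
    P_PLUS = "p+"    # dominant proclaiming
    R = "r"          # simple referring
    R_PLUS = "r+"    # dominant referring
    O = "o"          # level tone, outside the proclaiming/referring dichotomy


class Level(Enum):
    HIGH = "high"
    MID = "mid"
    LOW = "low"


@dataclass
class ToneUnit:
    prominent: List[str]   # prominent syllables in temporal order
    tone: Tone
    key: Level
    termination: Level

    def __post_init__(self):
        # Direct orientation: one (minimal) or two (extended) prominent syllables.
        if len(self.prominent) not in (1, 2):
            raise ValueError("a tone unit needs one or two prominent syllables")

    @property
    def is_extended(self):
        return len(self.prominent) == 2

    @property
    def tonic(self):
        # Assumption: the tonic is the last prominent syllable.
        return self.prominent[-1]

    @property
    def key_syllable(self):
        # Key goes with the onset (first prominent) syllable; in a minimal TU
        # the onset and the tonic are the same syllable.
        return self.prominent[0]

    @property
    def termination_syllable(self):
        # Termination always goes with the tonic.
        return self.tonic


# The card-game reply with low termination on 'DAM' (tone p chosen for illustration).
answer = ToneUnit(["HJÄRT", "DAM"], Tone.P, key=Level.MID, termination=Level.LOW)
print(answer.key_syllable, answer.termination_syllable)  # HJÄRT DAM
```

In a minimal TU such as `ToneUnit(["DAM"], Tone.R, Level.HIGH, Level.HIGH)`, both properties return the single tonic, mirroring the statement that key and termination fall together on it.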
Appendix 2
Transcription Conventions for Swedish
1. Tone unit boundaries: // //
2. Prominent syllables in capital letters, with the tonic underlined: // oVre ka //
3. Key and termination (relative pitch factors involved in every TU). Key is associated with the onset syllable, and termination with the tonic, in an extended TU; in a minimal TU both fall together on the tonic.
a. MID key/termination is not specially marked.
b. HIGH or LOW key/termination is marked with arrows. In the case of Accent 2 (A2) words in Swedish, where the pitch switch from mid to high key or termination takes place on the syllable following the grave accent, this is indicated by an arrow placed above that second, non-prominent syllable.
4. Grave accent: (')
5. P: either /p/ or /p+/ proclaiming. R: either /r/ or /r+/ referring.
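Conventions 1 and 2 are regular enough to be read mechanically. The sketch below is a hypothetical helper, not part of the original study: it splits a plain-text transcription at '//' and collects runs of capital letters as prominent syllables. Because underlining cannot be represented in plain text, it assumes the tonic is the last prominent syllable. Key, termination and tone marks are ignored.

```python
from itertools import groupby


def parse_transcription(text):
    """Split a transcription into tone units and pick out prominent syllables.

    Assumes '//' marks tone-unit boundaries and that prominent syllables are
    written as runs of capital letters inside a word (e.g. 'HJÄRTerDAM').
    """
    units = []
    for chunk in text.split("//"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty pieces around the outer '//' marks
        prominent = []
        for word in chunk.split():
            # groupby yields consecutive runs of upper-case / other characters
            for is_upper, run in groupby(word, key=str.isupper):
                if is_upper:
                    prominent.append("".join(run))
        units.append({
            "text": chunk,
            "prominent": prominent,
            # Hypothetical: tonic taken as the last prominent syllable.
            "tonic": prominent[-1] if prominent else None,
        })
    return units


for unit in parse_transcription("// vilken DAM spelade du // HJÄRTerdam //"):
    print(unit["prominent"], unit["tonic"])
# ['DAM'] DAM
# ['HJÄRT'] HJÄRT
```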
Why Two Labialization Strategies in Setswana?<br />
Mats Dufberg<br />
1. Background<br />
Setswana is a Bantu language spoken in southern Africa, in South Africa and Botswana. It has seven vowel phonemes and a large number of consonant phonemes, many of which are labialized. In this paper I will discuss the labialized consonants and their non-labialized counterparts, and the two different realization strategies I have found, as one or as two phonetic segments. In particular, I will discuss why there is a bisegmental realization.
Labialized consonants in Setswana occur in two different vowel contexts. The first is before a back, rounded vowel, where all consonants are (phonetically) labialized. The second is before a non-back, unrounded vowel, where there is a phonematic opposition between a labialized and a non-labialized consonant.
Setswana has been described both by Tucker (1929), in his book about a number of related Bantu languages, and by Cole (1955), in his Setswana grammar. Both agree on two important points (Tucker 1929: 74, Cole 1955: 33-34):
- labialized consonants in both vowel contexts are identical, and
- labialized consonants are to be analyzed as one segment, both phonetically and phonologically.
Tucker's claims concern both Setswana and Sesotho, a closely related language spoken in South Africa, among other places. Let me make another reference to Sesotho; it will become clear below why it is relevant. Roux (1981) refers to X-ray and acoustic studies of labialization, not in Setswana but in Sesotho. His conclusions differ from Tucker's and Cole's:
- labialized consonants before rounded vowels are different from those before unrounded vowels, and
- labialized consonants before unrounded vowels end in a labiovelar semi-vowel, [w], and should, at least phonetically, be considered to be two segments.
In Dufberg (1984) I reported on an acoustic study of the labialized consonants in Setswana, carried out in 1984. Tore Janson, professor of linguistics, had made recordings of word lists during his studies of lexical change in Setswana in Botswana, and he asked me to use them for a study of the acoustic correlates of labialization in Setswana. In those recordings, of four informants, labialization was realized monosegmentally; that is, the labialized consonant was clearly one phonetic unit. This is in line with the Tucker-Cole view.
For the second part of my study we recorded one speaker of Setswana from South Africa, not Botswana. In that recording I found two different strategies of labialization. One of the four consonants studied was pronounced monosegmentally, just like all consonants in the earlier recordings, but the other three were realized bisegmentally; that is, the consonant was followed by a semi-vowel. This is in line with the Roux view. Even though Roux studied Sesotho, not Setswana, his findings were considered relevant since my informant came from South Africa. This finding gave rise to two new questions:
1) Should /Cʷ/ be analyzed, phonologically, as one or two segments?
2) Is the difference in realization of the labialized consonants, i.e. mono- vs. bisegmental, a dialectal or an idiolectal difference?
In my report (Dufberg 1984) I did not find any reason to change the phonological analysis, but I hypothesized that there was a dialectal difference in the pronunciation of the labialized consonants and argued against the idiolectal hypothesis.
In this paper I will first, in section 2, give a brief presentation of the phonological system of Setswana: firstly, because it will help the reader to understand my work, and secondly, because it is of general interest as a contrast to Indo-European systems. In section 3 I will briefly review Dufberg (1984), although section 3.2, on the expected effects of labialization, was not included in that report. In section 4 I will discuss my new study and present some new data. Finally, in section 5, I will discuss the two realization strategies, mono- vs. bisegmental, and consider alternatives to the dialect hypothesis.
2. Phonological system of Setswana<br />
2.1 General description<br />
Setswana is a tone language, but the tones will not be discussed in this paper. The syllable structure is the simplest possible: a syllable consists of either a consonant plus a vowel (CV), a vowel (V), or a syllabic consonant (C). There are no clusters of (non-syllabic) consonants, at least not on the phonological level.
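The syllable constraint just stated can be expressed as a small check. The sketch below is ours, not the author's. It assumes a hypothetical encoding of each segment as a (symbol, kind) pair, with kind 'C' for a non-syllabic consonant, 'V' for a vowel and 'N' for a syllabic consonant. Any word containing a cluster of non-syllabic consonants is rejected.

```python
def syllabify(segments):
    """Split a word into Setswana-type syllables: CV, V, or syllabic consonant.

    `segments` is a list of (symbol, kind) pairs, where kind is 'C'
    (non-syllabic consonant), 'V' (vowel) or 'N' (syllabic consonant).
    Raises ValueError if a non-syllabic consonant is not directly followed
    by a vowel, i.e. if the word would contain a consonant cluster.
    """
    syllables = []
    i = 0
    while i < len(segments):
        symbol, kind = segments[i]
        if kind in ("V", "N"):
            syllables.append(symbol)            # V, or a syllabic consonant alone
            i += 1
        elif kind == "C":
            nxt = segments[i + 1] if i + 1 < len(segments) else None
            if nxt is None or nxt[1] != "V":
                raise ValueError(f"{symbol!r} is not followed by a vowel")
            syllables.append(symbol + nxt[0])   # CV
            i += 2
        else:
            raise ValueError(f"unknown segment kind {kind!r}")
    return syllables


one_segment = [("x", "C"), ("o", "V"), ("nʷ", "C"), ("ɛ", "V"), ("l", "C"), ("a", "V")]
print(syllabify(one_segment))  # ['xo', 'nʷɛ', 'la']
```

Encoding the same word with /n/ and /w/ as two separate consonants makes `syllabify` raise `ValueError`, because 'n' is then followed by a consonant. This is the intuition behind treating /Cʷ/ as a single segment in a strict CV language.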
Vowel length is not phonematic in Setswana (Cole 1955: 55), but vowel length does vary as part of the prosodic structure.
2.2 Vowels<br />
Setswana has seven vowel phonemes which can be divided into three groups<br />
(Cole 1955: 4-7) .<br />
Front, unrounded vowels:
/i/ phonetically very close.
/e/ phonetically closer than half close.
/ɛ/ phonetically half open.
Open, unrounded vowel:
/a/ phonetically open and central.
Back, rounded vowels:
/u/ phonetically very close.
/o/ phonetically closer than half close.
/ɔ/ phonetically half open.
There is phonologically governed variation in the quality of the four mid vowels, /e, ɛ, o, ɔ/ (Cole 1955: 55), but since it is not important for the present study I will not discuss it here.
2.3 Consonants<br />
As we have seen, the number of vowel phonemes is low and the syllable structure is simple. The number of consonant phonemes, though, is high, and there is a complex relationship between labialized and non-labialized consonants.
In Setswana there are 44 consonant phonemes, of which 17 are labialized. (For reasons which will become clear, I do not count the semivowel /w/ as a labialized consonant here.)
In the chart below each phoneme is represented by its major allophone. Notice that for every labialized consonant there is a consonant differing from it only in being non-labialized. Since "labialization is a morphophonological process in Setswana" (Janson 1985), it is indeed relevant to talk of a labialized consonant and its non-labialized counterpart. In a few cases the labialized consonant has two non-labialized counterparts: /tsʷ/ has both /ts/ and /tʃ/ as its counterparts, /tsʰʷ/ has both /tsʰ/ and /tʃʰ/, and /sʷ/ has both /s/ and /ʃ/.
STOPS:
Plain     Labialized   Aspirated   Asp. & labial.   Place of articulation
/p/       -            /pʰ/        -                Bilabial
/b/       -            -           -                Voiced bilabial
/t/       /tʷ/         /tʰ/        /tʰʷ/            Alveolar
/tl/      /tlʷ/        /tlʰ/       /tlʰʷ/           Alveolar with lateral release
/k/       /kʷ/         /kʰ/        /kʰʷ/            Velar

AFFRICATES:
Plain     Labialized   Aspirated   Asp. & labial.   Place of articulation
/ts/      -            /tsʰ/       -                Alveolar
-         /tsʷ/        -           /tsʰʷ/           Alveolar or prepalatal
/tʃ/      -            /tʃʰ/       -                Prepalatal
/dʒ/      /dʒʷ/        -           -                Voiced prepalatal
-         -            /kxʰ/       /kxʰʷ/           Velar

FRICATIVES AND LIQUIDS:
Plain     Labialized   Place and manner of articulation
/ɸ/       -            Bilabial or labiodental fricative
/s/       -            Alveolar fricative
-         /sʷ/         Alveolar or prepalatal fricative
/ʃ/       -            Prepalatal fricative
/x/       /xʷ/         Velar fricative
/r/       /rʷ/         Apical trill
/l/       /lʷ/         Alveolar lateral

NASALS:
Bilabial   Alveolar   Prepalatal   Velar   Comment
/m/        /n/        /ɲ/          /ŋ/     Plain
-          /nʷ/       /ɲʷ/         /ŋʷ/    Labialized

SEMIVOWELS:
/w/: bilabio-velar
/j/: palatal
Setswana also has a few click sounds, but they are of marginal importance (occurring only in interjections). For some consonants there are restrictions on which vowel can follow, but that is also outside the scope of this paper, except where it is relevant for labialized consonants. That discussion follows below.
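The totals of 44 and 17 given above can be cross-checked mechanically. The Python sketch below encodes the chart as lists of IPA strings (our transcription of the chart, not the author's notation). It derives both totals and checks the claim that every labialized consonant has a non-labialized counterpart. One cell is an assumption: the velar aspirated-and-labialized affricate /kxʰʷ/ is included because the stated totals require one more labialized phoneme than the remaining cells supply. Without it, the counts come out as 43 and 16.

```python
# Consonant phonemes of Setswana as lists of IPA strings.
# Assumption: /kxʰʷ/ fills the velar "aspirated & labialized" affricate cell.
INVENTORY = {
    "stops": ["p", "pʰ", "b", "t", "tʷ", "tʰ", "tʰʷ", "tl", "tlʷ", "tlʰ", "tlʰʷ",
              "k", "kʷ", "kʰ", "kʰʷ"],
    "affricates": ["ts", "tsʷ", "tsʰ", "tsʰʷ", "tʃ", "tʃʰ", "dʒ", "dʒʷ", "kxʰ", "kxʰʷ"],
    "fricatives_liquids": ["ɸ", "s", "sʷ", "ʃ", "x", "xʷ", "r", "rʷ", "l", "lʷ"],
    "nasals": ["m", "n", "nʷ", "ɲ", "ɲʷ", "ŋ", "ŋʷ"],
    "semivowels": ["w", "j"],
}

# Labialized consonants with two non-labialized counterparts.
SPECIAL_COUNTERPARTS = {
    "tsʷ": ["ts", "tʃ"],
    "tsʰʷ": ["tsʰ", "tʃʰ"],
    "sʷ": ["s", "ʃ"],
}


def is_labialized(phoneme):
    # Only a superscript ʷ counts; the semivowel /w/ itself is excluded.
    return phoneme.endswith("ʷ")


def counterparts(phoneme):
    """Non-labialized counterpart(s) of a labialized phoneme."""
    if phoneme in SPECIAL_COUNTERPARTS:
        return SPECIAL_COUNTERPARTS[phoneme]
    return [phoneme[:-1]]  # drop the final ʷ (a single code point)


all_phonemes = [p for group in INVENTORY.values() for p in group]
labialized = [p for p in all_phonemes if is_labialized(p)]
print(len(all_phonemes), len(labialized))  # 44 17

# Every counterpart should itself be a phoneme in the inventory.
missing = [c for p in labialized for c in counterparts(p) if c not in all_phonemes]
print(missing)  # []
```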
2.4 Labialization of consonants<br />
Before the back (and rounded) vowels /u, o, ɔ/, all consonants are (phonetically) labialized due to regressive assimilation (Cole 1955: 33-34). That is, the consonants are articulated with distinct lip rounding and, when possible, with the back of the tongue raised towards the velum.
This means that before rounded vowels there is no opposition between the labialized and the non-labialized consonants described above. The alveolar series /s, ts, tsʰ/ and the prepalatal series /ʃ, tʃ, tʃʰ/ also collapse into one labialized series, /sʷ, tsʷ, tsʰʷ/, which is alveolar or prepalatal depending on the dialect (Cole 1955: 35).
Traditionally (Cole 1955), Setswana is described as having the labialized phonemes before the rounded vowels. That view could of course be challenged, since there is no opposition between labialized and non-labialized consonants in that vowel context. (For an alternative analysis see Janson (1985).)
Labialized consonants can also be found before the unrounded vowels /i, e, ɛ, a/, but in this case there is a phonematic distinction between the labialized and the non-labialized consonants. Even in this case there is only one labialized series corresponding to both the alveolar and the prepalatal series. It is this last kind of labialization that will be discussed in this paper.
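The neutralization described in this section can be summarized as a small rewrite rule. The function below is a hypothetical sketch of it at the phonetic level. Before /u, o, ɔ/, every consonant receives lip rounding, and the alveolar and prepalatal series merge into /sʷ, tsʷ, tsʰʷ/. Before unrounded vowels, the phoneme is returned unchanged, because labialization is contrastive there. Tongue-back raising and dialectal differences in place of articulation are not modelled.

```python
BACK_ROUNDED = {"u", "o", "ɔ"}

# Before back rounded vowels the alveolar and prepalatal series merge into
# one labialized series (Cole 1955: 35).
MERGED = {"s": "sʷ", "ʃ": "sʷ", "ts": "tsʷ", "tʃ": "tsʷ", "tsʰ": "tsʰʷ", "tʃʰ": "tsʰʷ"}


def realize(consonant, next_vowel):
    """Phonetic labialization of a consonant phoneme before a given vowel (sketch)."""
    if next_vowel not in BACK_ROUNDED:
        # Before /i, e, ɛ, a/ labialization is contrastive: keep the phoneme.
        return consonant
    if consonant in MERGED:
        return MERGED[consonant]
    if consonant.endswith("ʷ") or consonant == "w":
        return consonant  # already labialized, or the semivowel itself
    return consonant + "ʷ"  # regressive assimilation adds lip rounding


print(realize("n", "o"), realize("nʷ", "o"))  # nʷ nʷ
print(realize("n", "a"), realize("nʷ", "a"))  # n nʷ
```

Before /o/, the phonemes /n/ and /nʷ/ both surface as [nʷ], which is the loss of opposition described above. Before /a/, they remain distinct.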
3. An acoustic analysis of /Cʷ/ - a review
In this section I will briefly present my acoustic analysis of the labialized consonants, originally presented in Dufberg (1984). It is not, however, a pure review: in 3.2 I will discuss the expected effects of labialization, which were not discussed in the original report.
3.1 Objectives of the study<br />
The study was exploratory, and the questions we wanted to answer were:
1) What is the acoustic difference between the labialized consonant and its non-labialized counterpart?
2) Is there a common acoustic correlate that corresponds to the distinction labialized/non-labialized?
To answer the first question completely, and to give an affirmative answer to the second, all consonant pairs have to be represented, in comparable vowel contexts and word positions. Recall that the contrast between labialized and non-labialized consonants exists only before unrounded vowels, which is the only context I have studied.
3.2 Expected effects of labialization<br />
The term labialization implies that the consonant should have some extra component of the lips, most likely lip rounding. The acoustic effect of lip rounding depends on the place of articulation. For dental, alveolar, or palatal consonants we would expect a lowering of the third formant, or its equivalent, like the effect of rounding an [i] to a [y]. For velar consonants, on the other hand, we would expect a lowering of the second formant, like the effect of rounding an [ɯ] to a [u] (Fant 1968: 214).
According to Cole (1955), though, labialization (of consonants in Setswana) means rounding the lips and also raising the back of the tongue when that is possible. Labialization is then a combination of true labialization and velarization. Labialization combined with velarization will lower the second formant even in dental, alveolar, and palatal consonants; that is, labialization and velarization will strengthen each other's lowering effect on F2. It would not be surprising to find velarization combined with labialization, since the two have been found together in other languages (Jakobson & Waugh 1979: 116-7).
In either of these two models, labialization alone or labialization combined with velarization, we can expect a secondary effect on the consonantal segment: a lowering of its amplitude (Fant 1968: 204-5). This lowering should be much greater if F2 is lowered than if only F3 is lowered.
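As a rough numerical illustration of why lip activity lowers formants, the uniform-tube (quarter-wavelength) approximation can be used. This is a textbook simplification, not the model the paper relies on (the paper cites Fant 1968). It shows only the general direction of the effect: lengthening the tract, as lip protrusion does, lowers all formants. It cannot show the place-dependent pattern described above (mainly F3 for front articulations, F2 for velar ones), which would need an area-function model. The 17.5 cm tract length and c = 35000 cm/s are conventional illustrative values, assumed here.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air


def tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances of a uniform tube closed at the glottis and open at the lips.

    F_n = (2n - 1) * c / (4 * L)
    """
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


neutral = tube_formants(17.5)    # [500.0, 1500.0, 2500.0]
protruded = tube_formants(18.5)  # lips protruded by about 1 cm (assumed)
print([round(f) for f in protruded])  # [473, 1419, 2365]
```

In this model, rounding also narrows the lip opening, which lowers formants further. That effect is not captured by tract length alone.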
3.3 Speech material and analysis method<br />
For the study we used two different recordings. Recording 1 was made in the field in Botswana by Tore Janson in 1982. It contains 4 native speakers of Setswana from Botswana reading a list of 75 words. The word list was composed for Janson's lexical change studies. It was not planned for the study of labialization, and there were a number of problems with the selection of words. Firstly, not all consonant pairs were represented. Secondly, the two members of a pair were not always in comparable vowel contexts. Thirdly, some consonants were in the final syllable, which often underwent devoicing. Of 20 /C-Cʷ/ pairs, only 8 could be used for the analysis. Recording 1 clearly did not, even theoretically, allow us to answer the two questions presented above.
The second recording, recording 2, was made at the phonetics laboratory in Stockholm by Tore Janson and me. It consists of one speaker reading a list of 32 words specially selected for the study. The speaker is a native speaker of Setswana from South Africa. The word list was compiled after I had done most of the analysis of recording 1. It was limited to four of the eight pairs analyzed in recording 1, to keep down the size of the study. For each consonant we had four different vowel contexts.
The speech material was analyzed on a Kay Digital Sona-Graph 7800 spectrograph, and all measurements were made by hand on spectrograms with a bandwidth of 300 Hz. To be able to compare levels, the spectrograms were normalized with the help of the strongest vowel of each word.
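The normalization step can be made concrete with a short sketch. The paper normalized printed spectrograms. The code below instead assumes numeric levels in dB per segment, which is our simplification, and the values are invented for illustration, not data from the study. Each level is expressed relative to the strongest vowel in the word, so that this vowel sits at 0 dB.

```python
VOWELS = {"i", "e", "ɛ", "a", "u", "o", "ɔ"}


def normalize_levels(segments, vowels=VOWELS):
    """Express each (symbol, level_db) relative to the word's strongest vowel."""
    reference = max(level for symbol, level in segments if symbol in vowels)
    return [(symbol, level - reference) for symbol, level in segments]


# Hypothetical readings (dB) for one word; not measurements from the study.
word = [("x", -30.0), ("o", -8.0), ("nʷ", -24.0), ("ɛ", -5.0), ("l", -20.0), ("a", -6.0)]
print(normalize_levels(word))
# [('x', -25.0), ('o', -3.0), ('nʷ', -19.0), ('ɛ', 0.0), ('l', -15.0), ('a', -1.0)]
```

A list of pairs is used rather than a dictionary so that a word may contain the same segment more than once.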
3.4 Results<br />
Without too much simplification, we can summarize the results from recordings 1 and 2 in a table. In the table I use the following notations:
duration: Duration of the consonant segment.
F2: Transitions of the second formant from the vowel before to the consonant and from the consonant to the vowel after.
consonant formant: The lowest peak, in frequency, in the spectrum of the consonant segment.
?: Some uncertainty of the analysis.
Phoneme pair    Observed differences of /Cʷ/ with respect to /C/
Recording 1<br />
longer duration<br />
F2 dipping towards cons.?
lower amplitude of noise?<br />
longer duration?<br />
F2 dipping towards cons.?<br />
lower amplitude of noise?<br />
lower consonant formant<br />
Recording 1<br />
Recording 2<br />
F2 dipping towards cons.<br />
lower cons. formant?<br />
lower amplitude?
F2 dipping towards cons.<br />
lower cons. formant<br />
lower amplitude<br />
F2 dipping towards cons.<br />
F2 dipping towards cons.<br />
lower cons. formant<br />
lower amplitude<br />
F2 dipping towards cons.<br />
lower cons. formant<br />
ended with a semi-vowel?<br />
longer duration<br />
F2 dipping towards cons.<br />
lower cons. formant<br />
lower amplitude<br />
ended with a semi-vowel<br />
longer duration<br />
F2 dipping towards cons.<br />
ended with a semi-vowel<br />
longer duration<br />
F2 dipping towards cons.<br />
lower cons. formant<br />
lower amplitude<br />
It seems fair to say that labialization has one or more of the following effects:
- Lowering of F2 in the transitions from the preceding vowel and to the following one.
- Longer consonant segment.
- Lower amplitude of noise/formants in the consonant.
- Lower frequency of noise/formants in the consonant.
- Semi-vowel.
The last effect, the semi-vowel, is rather special. In recording 2, the speaker clearly used a bisegmental realization for two consonant phonemes, /nʷ/ and /lʷ/, ending the consonant with the semi-vowel [w]. For one consonant phoneme, /xʷ/, he just as clearly used a monosegmental realization. The fourth case, /rʷ/, is somewhat unclear, but my interpretation is that the informant uses the bisegmental realization in that case too.
In recording 1, on the other hand, I found only the monosegmental realization of labialization: the labialized consonant was never ended by a semi-vowel.
3.5 Conclusions<br />
If we compare the effects of labialization we have found with the expected effects, we can see, firstly, that there seems to be velarization as well as labialization, since F2 is affected even in front vowel contexts. Secondly, there are effects that are not directly connected to the labialization itself: the longer duration and the occurrence of the semi-vowel. There seems to be a connection between them, at least in one direction: the semi-vowel gives a longer total segment. What is special about the semi-vowel is that only the speaker in recording 2 has it, and that he has it fairly consistently.
If we try to find a common acoustic correlate that corresponds to the distinction labialized/non-labialized, there seem to be two good candidates. The first is the lowering of F2 in the transitions from and to the surrounding vowels. Some data seem to contradict this, namely the data for /tl-tlʷ/ and /dʒ-dʒʷ/ in recording 1. But those data are not very complete, and the lowering of F2 is what we would expect if labialization is combined with velarization. I therefore think it is safe to consider lowering of F2 a common correlate.
The second candidate is the lowering of the second formant, or the equivalent resonance, in the consonant itself. This is expected, and reasonably well supported except in nasals. In nasals there is a total closure behind the lips, and therefore lip rounding seems to be irrelevant for the nasal segment. (The question remains, though, what the expected effect of velarization of an alveolar or palatal nasal consonant is in the nasal phase.)
The two effects of labialization discussed above are common correlates of labialization, but there are other effects of labialization that are not present in all realizations of labialized consonants. The most obvious one is the presence of a semi-vowel. We have clearly found two different manners of realizing labialization, mono- and bisegmental.
The two different ways of realization gave rise to the questions already mentioned in section 1:
1) Should /Cʷ/ be analyzed, phonologically, as one or two segments?
2) Is the difference in realization of the labialized consonants, i.e. mono- vs. bisegmental, a dialectal or an idiolectal difference?
Janson (1985) points out that the strict CV structure of Setswana is a strong argument against analyzing /Cʷ/ as two phonemes, and I see no reason to argue against that. On the second question, I argued in Dufberg (1984) that the dialectal hypothesis was the most reasonable. I will return to that question in the next section.
4. Some new data<br />
4.1 Objectives of the study<br />
We wanted to test the hypothesis from Dufberg (1984) that there is a dialectal difference between mono- and bisegmental pronunciation. If this dialectal difference exists, there are at least two possibilities. The first is that the difference is found only in areas in contact with Sesotho, from which Setswana has borrowed the bisegmental strategy. We then assume that Roux's analysis of Sesotho is correct, i.e. that /Cʷ/ ends phonetically in a semi-vowel, [w]. The second possibility is that this feature has spread to different areas, perhaps independently of Sesotho. Then we would be able to find the feature in other areas as well, that is, in areas where there is no contact with Sesotho.
4.2 Speech material and results<br />
This third recording, recording 3, contains ten speakers of Setswana from different areas of Botswana, recorded in Botswana in 1985 by Tore Janson. The same word list as for recording 2 is read once by each of these ten speakers. The recordings of three of these speakers have been digitized, stored on disk, and analyzed with a spectrogram program on the Eclipse computer in the phonetics laboratory. Since the objective has been to test the dialectal hypothesis, I have not made any detailed measurements. I have only looked for bi- vs. monosegmental realizations of labialization.
Let me illustrate here with spectrograms, firstly, the effect of labialization, and secondly, the difference between mono- and bisegmental realization. Figures 1-4 contain spectrograms illustrating the effects of labialization. Figures 1 and 2 are parts of recordings of the same word, which illustrate the non-labialized /x/, read by two speakers, A and B. Speaker A is the one speaker who used bisegmental realization of the labialized consonants (except in /xʷ/). Speaker B is one of the speakers of recording 3, and he never used bisegmental realization. In figures 3 and 4, read by the same two speakers, parts of recordings of another word illustrate labialized /xʷ/. Notice the transition of F2, and the downward shift in frequency of the strongest resonance of the consonant. All realizations in figures 1-4 are monosegmental.
Figures 5-8 illustrate bi- vs. monosegmental realization of two consonant phonemes, /nʷ/ and /lʷ/. Figures 5 and 7 are from recordings of speaker A, as defined above, realizing his consonants bisegmentally. Figures 6 and 8 are from recordings of speaker B, realizing his consonants monosegmentally. Notice that F2 stays at a low frequency, forming a [w], in figures 5 and 7, whereas in figures 6 and 8 F2 rises directly after the end of the nasal and lateral phases, respectively.
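Bi- vs. monosegmental realization was judged here by inspecting spectrograms. The criterion just described could be made explicit with a heuristic that measures how long F2 stays near its value at the end of the nasal or lateral phase before rising towards the vowel. The following is a hypothetical sketch, not the procedure used in the study. The function names, the 5 ms frame step and the 150 Hz and 30 ms thresholds are illustrative assumptions, and the F2 tracks are synthetic.

```python
def low_f2_duration_ms(f2_after_offset, step_ms, max_rise_hz):
    """How long F2 stays near its value at the consonant offset.

    f2_after_offset: F2 values (Hz) sampled every step_ms, starting at the
    end of the nasal or lateral phase.
    """
    start = f2_after_offset[0]
    count = 0
    for value in f2_after_offset:
        if value - start > max_rise_hz:
            break  # F2 has started its rise towards the vowel target
        count += 1
    return count * step_ms


def classify_realization(f2_after_offset, step_ms=5.0, max_rise_hz=150.0,
                         min_plateau_ms=30.0):
    """Label a token mono- or bisegmental (hypothetical thresholds)."""
    plateau = low_f2_duration_ms(f2_after_offset, step_ms, max_rise_hz)
    return "bisegmental" if plateau >= min_plateau_ms else "monosegmental"


# Synthetic tracks: a low F2 plateau forming a [w], versus an immediate rise.
with_w = [700.0] * 8 + [900.0, 1200.0, 1500.0]
without_w = [700.0, 900.0, 1200.0, 1500.0]
print(classify_realization(with_w), classify_realization(without_w))
# bisegmental monosegmental
```

Any real use would need the thresholds calibrated against tokens already judged by ear and eye, and a defined starting point for the F2 track, which, as noted below, can be hard to determine.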
Even though it can be hard to define the beginning and the end of the semi-vowel that ends a labialized consonant with bisegmental realization, the difference between a mono- and a bisegmental realization has still been rather clear-cut. In the three speakers out of ten in recording 3 that I have analyzed, I have found no examples of bisegmental realization.
5. Discussion and conclusions<br />
5.1 Possible explanations of bisegmental realization<br />
Let us now try to account for the different realizations, that is, bisegmental vs. monosegmental realization. Let me first summarize the differences between the informants with only monosegmental realization and the one informant with both mono- and bisegmental realization.
Only monosegmental                  Both mono- and bisegmental
From Botswana                       From South Africa
Contact with Sesotho less likely    Contact with Sesotho likely
Recorded in the field               Recorded in an echo-free chamber
Recorded in their own country       Recorded in exile
The following are theoretically possible explanations that we should consider:
1) Idiolectal peculiarity
2) Dialectal difference
3) Spelling pronunciation
4) Hyperspeech due to the formal situation
I rejected in Dufberg (1984) the first alternative on the grounds<br />
89
that it is not likely that someone would so consistently have such<br />
different realizations. As long as there are other possible explanations<br />
I think we can safely leave the idiolectal explanation out.<br />
The second alternative, dialectal difference, is the one I
adopted in Dufberg (1984). I did not then consider alternatives 3 and 4.
What I found in Dufberg (1984) in favor of this hypothesis, as against
the idiolectal hypothesis, was the fact that the informant so
consistently used the bisegmental realization for three consonants and
the monosegmental realization for the fourth. Nothing speaks against
the dialect hypothesis, but the support for it is not very strong either.
The spelling of Setswana, which is fairly standardized, is highly
phonemic. A consonant phoneme is represented in the orthography by a
single grapheme, a digraph, a trigraph, or even a quadrigraph. A labialized
consonant is represented by the graph of its non-labialized counterpart
followed by a w. So the words /xonɛla/ and /xonʷɛla/ are spelled go nela
and go nwela, respectively. (Recall that there are no consonant clusters
in Setswana.) But spelling pronunciation, the third alternative, cannot,
at least not alone, explain the bisegmental realization. Firstly, all
informants were literate and bilingual in Setswana and English, and all
were reading the words from a list written in standard Setswana
orthography. Among the informants who did not show any bisegmental
realization were university students. Secondly, spelling pronunciation
cannot explain why the phoneme /xʷ/, which is also spelled with a final w,
was never realized bisegmentally.
Let us look at the fourth alternative. The one informant who used
bisegmental realization was the only one to be recorded in an echo-free
chamber in a phonetics laboratory, which is probably the most formal
place one could be recorded in. The other informants were recorded in
much more relaxed places. The one informant was also the only one
recorded in exile, that is, in Sweden. The other ones were recorded in
their own country, Botswana. The formal situation may have triggered
hyper speech, that is, the opposite of reduced speech. (For a discussion
of hyper speech see Lindblom (1987).) This explanation assumes that
bisegmental realization is, at least potentially, available to the
speaker of Setswana. If this was not the case in the days of Tucker and
Cole, literacy might have made it available to literate speakers.
The hyper speech hypothesis by itself cannot explain why the
informant who used bisegmental realization always realized /rʷ, nʷ, lʷ/
bisegmentally but never /xʷ/, which was always realized
monosegmentally. But if we assume that labialization also implies
velarization for non-velar consonants, then there is one interesting
fact, namely that /xʷ/ is a velar consonant, and the only velar one of
the four consonants. For the non-velar consonants, the velar gesture,
together with the labial gesture, adds to the complexity of the
consonant, whereas for the velar consonant it is part of a non-complex
consonant. This difference may be the key to why /xʷ/ behaves
differently. If we assume that the secondary articulation is carried out [...]
of the consonant itself, we would get a labio-velar semi-vowel, [w],
after non-velar consonants, that is, a low F2, but only lip rounding
after the velar, which would not affect the F2 of non-back vowels.
This velar explanation is compatible with both the dialect and the
hyper speech hypothesis, but it makes the hyper speech hypothesis more
convincing than it would be without it.
5.2 Conclusion
In this paper I have challenged the original hypothesis that the two
different labialization strategies, mono- vs. bisegmental realization,
are connected to dialectal differences. The new hypothesis is a hyper
speech hypothesis, that is, that the different strategies are connected to
style of speech. In hyper speech, the opposite of reduced speech, we
would then get the bisegmental pronunciation.
An explanation of why the labialized velar consonant /xʷ/ was never
realized bisegmentally, in contrast to the other consonants, could
perhaps be found in the fact that it is velar, whereas the other
consonants analyzed are alveolar.
REFERENCES
Cole, D. T. (1955): An introduction to Tswana grammar. London.
Dufberg, Mats (1984): Labialiserade konsonanter i setswana - en akustisk
analys [Labialized consonants in Setswana: an acoustic analysis].
Unpublished paper. Stockholm: University of Stockholm, Institute of
Linguistics.
Fant, Gunnar (1968): "Analysis and synthesis of speech processes". In
Manual of phonetics, 2nd edition, edited by Bertil Malmberg.
Amsterdam: North-Holland Publishing Company, pp. 173-277.
Jakobson, Roman & Waugh, Linda R. (1979): The sound shape of language.
Brighton, GB: Harvester Press.
Janson, Tore (1985): "Labialisation in Setswana: phonetics and
phonology". In Phonologica Africana 1984 (=Wiener linguistische
Gazette Beiheft 5). Wien: Institut für Sprachwissenschaft der
Universität Wien, pp. 73-84.
Lindblom, Björn (1987): "Adaptive variability and absolute constancy in
speech signals: two themes in the quest for phonetic invariance".
In Proceedings of the XIth International Congress of Phonetic
Sciences, 1987, vol. 3, pp. 9-18. Also in PERILUS V (this volume).
Stockholm: University of Stockholm, Institute of Linguistics.
Roux, J. C. (1981): "On the notion 'phonologization': some experimental
phonetic considerations from Sesotho". In Phonologica 1980, edited
by W. Dressler et al., pp. 373-378.
Tucker, A. N. (1929): The comparative phonetics of the Suto-Chuana group
of Bantu languages. London.
ACKNOWLEDGEMENTS
Thanks to Robert McAllister and Sven Furumark for insightful
suggestions and proofreading, to Björn Lindblom for leading my analysis
in the right direction, and to Tore Janson for supporting my work.
L. Roug, I. Landberg and L.-J. Lundberg
Department of Linguistics
University of Stockholm
Sweden
During the last decade there has been a growing interest in children's
prespeech development (Yeni-Komshian, Kavanagh and Ferguson 1980, Stark
1981, Locke 1983, Lindblom and Zetterström 1986). The view held by Jakobson
(1968), which treated babbling and speech as two unrelated behaviors, stands
in contrast with recent studies of prelinguistic vocalizations and early
language acquisition indicating a gradual transition (Oller, Wieman, Doyle
and Ross 1976, Vihman, Macken, Miller, Simmons and Miller 1985). Jakobson's
approach was to postulate a discontinuous step and a universal order of
acquisition of phonemes governed by the "laws of irreversible solidarity"
(1968:51), laws which in Jakobson's framework underlie phonological
universals, the regression of the phonological system in patients with
aphasia, and the acquisition of phonology in the child. The principle
of maximal contrast governs the order in which the phonemes are acquired.
This means that the infant's earliest language productions will consist of
consonant/vowel contrasts of maximally different phonetic events, pa,
followed by a nasal/oral contrast, pa/ma. This period of structured
phonological development is preceded by a period of random vocal
productions, i.e. babbling. These babbled utterances are characterized by
"an astonishing quantity and diversity of sound productions" (1968:21). The
two types of vocalization, babbling and speech, may be separated by
a short period in which the child is sometimes "completely mute"
(1968:29). This silent period marks for the child the functional difference
between the two types of vocal behavior. However, not all children turn mute,
since "for the most part ... one stage merges unobtrusively into the other"
(1968:29). Jakobson considers babbling a "purposeless egocentric", and hence
non-communicative, type of behavior, parallel to which a "desire for
communication" develops and gradually replaces the "biologically oriented
tongue delirium" (i.e. the babbling) of the child (1968:24). This view of
the young infant as a non-communicative, rather passive individual accords
well with the general opinion of infant competence at the time (see the
discussion in Bullowa 1979). It is only in recent years that the
communicative capacities of the very young infant have begun to be more
fully appreciated (Bullowa 1979, Meltzoff 1986). The understanding that a
search for precursors of speech must be conducted in the context of a more
general perspective on communicative behavior has resulted in a rejection of
the discontinuity theory. However, it has been suggested that the order of
acquisition proposed by Jakobson might hold better for the prelinguistic
period than for the acquisition of early phonology (Vihman, personal
communication), thus implying a universal developmental pattern for babbling
rather than for speech. The silent period reported by Jakobson has not been
confirmed by any of the many recent studies of prespeech development (e.g.
Murai 1963, Cruttenden 1970, Koopmans van Beinum and van der Stelt 1986,
Oller 1980, Stark 1980, Kent and Bauer 1984, Vihman et al. 1985, Holmgren,
Lindblom, Aurelius, Jalling and Zetterström 1986). Instead there have been
reports of developmental stages or milestones in the babbling period
(Oller 1980, Stark 1980, Koopmans van Beinum et al. 1979, Holmgren et al.
1986) and of strong similarities between the phonetic repertoire of a
child's babbling and his/her first words (Oller et al. 1976, Locke 1983,
Vihman 1986).
Many of these studies have been performed on children in English-speaking
communities, a circumstance that has served as a stimulus to undertake
research to confirm these data on non-English subjects. The present paper
presents phonetic data on a group of Swedish infants that by and large
corroborate the developmental milestones reported in other studies
(Oller 1980, Stark 1980, Koopmans van Beinum et al. 1986).
If we are correct in assuming that babbling and speech are functionally
related, observations of babbling should be of clinical interest. A large
number of questions can be raised concerning the possibilities of obtaining
early indicators of deviant communicative development. The present project
(footnote 1) was initiated by professor Rolf Zetterström and his colleagues
at Sankt Göran's Children's Hospital in Stockholm. A major goal of the
project has been to obtain a detailed phonetic description of the prespeech
development of normal Swedish infants which could serve as a reference
database for the development of methods for the early diagnosis of deviant
communicative development.
Eight normal (footnote 2) Swedish infants were audio-recorded (footnote 3)
on a bi-weekly basis in their homes from the age of around 5 weeks to 76
weeks. Recordings were ended when the child had achieved a ten- to [...]
lexicon according to parental reports. The recordings were made in the
presence of a close relative (mother or father) or an adult whom the child
knew well. The situation in which the recordings were made varied depending
on the age of the child and the time of day. Typical recording situations
were: infant lying in bed falling asleep or awakening, infant playing with
toys, infant seated on a sofa or at a table drawing or reading a book with
an adult. We also recorded during meals or shortly after, in nursing
situations such as diaper changing and dressing, and in any other
interactive situations between parent and child that occurred naturally.
As the infants grew older, only the last of these interactive situations
remained. In parallel with the audio recordings, notes were made of the
various activities taking place, in order to provide contextual information
for the interpretation of the vocalizations. An extra recording session was
made at the age of around 3.5 years to confirm normal speech development.
From the larger sample of eight infants, a group of four (two boys and two
girls) was selected on the basis of the quality and regularity of the
recordings. The recordings of these four infants were first subjected to a
rough auditory analysis. This method of selecting vocalization samples
consisted in screening the tapes at the approximate age of onset of the
milestones reported by other investigators (Koopmans van Beinum et al. 1986,
Oller 1980, Stark 1980). The weeks at which there were clear changes in the
character of the child's vocalizations were selected for further analysis.
The weeks just preceding and following each such point were also analysed,
to establish the stability of the milestone. The chosen recordings were
computer-edited (footnote 4) in order to exclude all non-comfort and
non-infant sounds. Following this procedure, 37 percent (20 hours) of the
total number of recorded hours (54) per infant were analysed. The child's
phonations were divided into utterances using breath groups as the
segmentation criterion and were then copied onto tapes for transcription.
The total number of utterances transcribed per child was around 2500,
varying between 2200 and 3100 across the four children.
The tapes to be analysed were independently transcribed by four students of
phonetics using the International Phonetic Alphabet (IPA) (see The
Principles of the International Phonetic Association, 1981). Prior to this
analysis, transcription training sessions and discussions were held to
ensure an identical mode of procedure. The IPA is, as the name implies, an
international notational system developed to describe the speech sounds
found in the languages of the world. Each symbol represents a sound of a
standard phonetic value. The phonetic symbol of each sound can be analysed
into articulatory features: for consonants, place and manner of
constriction; for vowels, degree of opening of the mouth (i.e. lowering of
the tongue), position of the tongue in the horizontal front-center-back
dimension, and presence vs. absence of lip rounding. The features voiced
vs. voiceless denote whether or not there is vibration of the vocal cords
accompanying the articulation. Additionally there are diacritics allowing
for detailed descriptions of each symbol, should it deviate from the
standard phonetic value.
In our approach each of these features of the IPA symbols was given a
number (see Table I), i.e. for consonants the different places and manners
of articulation were numbered, as was the presence vs. absence of voicing.
Similarly, for vowels the degree of opening of the mouth, the
front-center-back position of the tongue (see Table II), lip rounding and
voicing were coded in numbers. Additionally, the direction of the breath
stream (e.g. ingressive, egressive) and voice quality (e.g. normal, deviant)
were likewise numerically coded. These numbers formed the basis on which
the frequency counts were made.
[Table I: numerical coding of consonants, with places of articulation numbered 1-9 across the columns and manners of articulation in the rows; only fragments of the first row (p b t d) survive.]
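As an illustration of the numerical coding and of the frequency counts built on it, here is a minimal Python sketch. It assumes a tiny, hypothetical inventory of consonant symbols and arbitrary code values (`PLACE_CODE`, `MANNER_CODE`). The actual numbering of Table I is not reproduced, and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical numeric codes: the real numbering of Table I is not
# reproduced here, so these values are arbitrary placeholders.
PLACE_CODE = {"bilabial": 1, "dental/alveolar": 3, "velar": 7, "glottal": 9}
MANNER_CODE = {"plosive": 1, "nasal": 2, "fricative": 3}

# Illustrative feature lookup for a handful of IPA consonant symbols:
# symbol -> (place, manner, voiced).
CONSONANT_FEATURES = {
    "p": ("bilabial", "plosive", False),
    "b": ("bilabial", "plosive", True),
    "m": ("bilabial", "nasal", True),
    "t": ("dental/alveolar", "plosive", False),
    "d": ("dental/alveolar", "plosive", True),
    "n": ("dental/alveolar", "nasal", True),
    "s": ("dental/alveolar", "fricative", False),
    "k": ("velar", "plosive", False),
    "g": ("velar", "plosive", True),
    "ʔ": ("glottal", "plosive", False),
    "h": ("glottal", "fricative", False),
}


def code_consonant(symbol):
    """Return (place_code, manner_code, voicing_code) for one consonant symbol."""
    place, manner, voiced = CONSONANT_FEATURES[symbol]
    return PLACE_CODE[place], MANNER_CODE[manner], int(voiced)


def place_counts(transcription):
    """Count place codes over all coded consonants in a transcription string.

    Symbols not in the lookup (vowels, length marks) are skipped.
    """
    return Counter(
        code_consonant(s)[0] for s in transcription if s in CONSONANT_FEATURES
    )
```

For example, `place_counts("ʔahaʔe:")` counts three glottal consonants (code 9) and skips the vowels and the length mark. Counting codes rather than whole symbols is what later allows transcriptions to be compared feature by feature.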
Choosing this approach, we can anticipate the following methodological
problems. Since prespeech cannot be assumed to be organized into discrete
phonemic segments, the first problem is one of representing a series of
events as [...]. A related issue is the question of what notation to use
when doing so. The choice of notational system for transcribing babbling
varies in the literature. Although a separate system has been developed for
transcribing babbling (Koopmans van Beinum et al. 1986), many researchers
have chosen to use an expanded form of the IPA, adding diacritics developed
to describe non-speech-like sounds (Oller et al. 1976, Cruttenden 1970, Kent
and Bauer 1985, Vihman et al. 1985 amongst others). Bush, Edwards, Luckau,
Stoel, Macken and Petersen (1973) offer such a set of diacritics for the
specification of phonetic modifications of basic IPA segments. The Dutch
model (Koopmans van Beinum et al. 1986) is a physiologically based
transcription procedure. It relates phonatory and articulatory events to the
speech production mechanisms of the vocal tract. The articulatory notations
used can be interpreted as indicating the place and manner of articulation,
the degree of opening of the mouth and the configuration of the lips. This
approach is similar to ours, although the notations we used were those of
the IPA. In our study each IPA symbol was analysed into its articulatory
features. When transcribing, a given symbol was selected with specific
consideration of its constituent articulatory features. In addition to
considering the overall phonetic value of the uttered sound, we would ask
ourselves questions like: How and where are the consonant-like events of
this sequence produced? What degree of opening do the vowel-like sequences
seem to have, and what are the tongue and lip positions? The answers to
questions like these would facilitate the final selection of a symbol. Even
though we are aware that the physiology of the infant's vocal tract is quite
unlike that of the adult (Kent and Murray 1982), we would often try to
reproduce the sounds to be transcribed in order better to understand their
features (cf. Pike's term "imitation label technique", 1943:16). In this
sense the way in which we used the IPA could well be said to be a
physiologically based approach. Invented signs were used for productions
for which there are no established IPA symbols, e.g. a bilabial trill (B).
Modifications of the IPA symbols were made where necessary, using diacritics
developed for the transcription of babbling (Bush et al. 1973). Features
describing the direction of the breath stream (ingressive or egressive) and
the type of phonation, if deviant (breathy, squeaky, creaky, rough or
pressed), were also included in our analysis.
The choice of the IPA system raises the question of whether or not it is
correct to use a notational system developed to describe speech sounds on
[...]. Pike notes (1943:150-151) that "the controlling mechanisms of
non-speech sounds are quite similar to those of speech sounds". He goes on
to say that "points of articulation are similar for the two groups" and
that "types of articulation movements are likewise". Further, he states that
"degrees of stricture fall into the same general classes for both groups,
and strictures interrupt the air stream in similar ways regardless of
whether or not the sounds are used in speech". The similarities between
speech and non-speech sound productions are due to the fact that both use
the same articulators. We feel that these similarities support our choice
of notational system and transcriptional procedure.
With regard to the Swedish language background of the transcribers and
the possible effects of this on the transcribed material, it cannot be
denied that there might be such an influence. A mitigating circumstance,
however, is that the transcribers were all students of phonetics and thus
trained to disregard language background effects when transcribing.
Admittedly, such effects are difficult to control since they result from
unconscious processing of the perceived signal. We do, however, feel that
the transcribers' awareness of this problem made their judgements of the
vocalizations more objective.
To supplement the subjective judgements of the auditory analyses, acoustic
analyses of the material are being made (Roug, Landberg and Lundberg
forthcoming).
Regarding the general problem of low inter-transcriber agreement (see e.g.
Oller et al. 1976, Oller and Eilers 1982, Stockman, Woods and Tishman
1981), we found that the disagreement amongst transcribers decreased when
the transcriptions were compared not at the segmental level but in terms of
frequency distributions of articulatory features. To make feature
comparisons between transcribers possible, a first step was to introduce the
numerical coding described above. With this approach a more stable,
transcriber-independent picture of the child's production pattern at a
given age is achieved. In Table III transcriptions from three points in time
for a single child are shown. For each of the infants, the correlation
coefficients between the four transcribers for consonantal place and manner
of articulation across three points in time are shown in Table IV. We see
that there is considerable disagreement with regard to which segments to use
in the transcriptions but little disagreement as to which features are
involved. This means that the four transcribers would often disagree about
the value of the individual segment but would essentially agree, in
statistical terms, on how and where in the vocal tract of a particular
child the favored sound types had occurred.
[Table III: examples of transcriptions by the four transcribers (LR, LJ, IL, BH) of utterances from a single child at weeks 19, 33 and 54; e.g. at week 19, LR: aʔha, LJ: ʔahaʔe:, BH: ʔae:ʔa.]
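The comparison of transcribers in terms of feature frequency distributions can be sketched as follows. The paper does not state which correlation measure was used. This sketch assumes Pearson's r computed over the counts of each feature category, and the two code sequences are invented for illustration, not taken from the study.

```python
import math


def feature_distribution(codes, categories):
    """Count how often each feature category occurs in one transcriber's coded data."""
    return [codes.count(c) for c in categories]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented place codes for the same seven segments from two transcribers
# (illustration only; not data from the study).
categories = [1, 3, 7, 9]
transcriber_a = [9, 9, 9, 7, 1, 3, 9]
transcriber_b = [9, 9, 7, 7, 1, 3, 9]
r = pearson_r(
    feature_distribution(transcriber_a, categories),
    feature_distribution(transcriber_b, categories),
)
```

With these invented codes the two transcribers disagree on one of the seven segments, yet the correlation between their place distributions ([1, 1, 1, 4] vs. [1, 1, 2, 3]) is about 0.87. This illustrates how segment-level disagreement can coexist with feature-level agreement.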
In addition to the segmental analysis, utterances were classified with
respect to their sequential patterning of vowel and consonant segments,
i.e. their phonotactic structure. Each utterance was classified in terms of
two parameters: one was the phonotactic structure, the other the phonetic
features of the constituent consonant(s). The first parameter divided the
utterances into five classes, the determining property being the ability to
match (part of) the utterance to one of the following five phonotactic
patterns (V stands for any string of one or more vowels and C for any
string of one or more consonants):
1  Non-consonant utterances                          0
2  Single consonant utterances                       C
3  Open syllable utterances                          CV
4  Polysyllabic utterances with repeated
   consonant part of syllable                        CiVCiV
5  Polysyllabic utterances with varying
   consonant part (voicing, place or manner)         CjVCiV
   (Ci ≠ Cj, and Cj must be non-glottal)
[Table IV: correlation coefficients between the four transcribers (BH, LR, LJ, IL) for consonantal place and manner of articulation, computed separately for child V, child K, child J and child M; the pairwise coefficients range from 0.76 to 0.99.]
Since it was only required that part of an utterance match a specific
pattern, an utterance may well contain parts not covered by the pattern.
This means that utterances belonging to any of the classes may contain a
leading vowel without this affecting their classification. Likewise,
utterances in classes 2 through 5 may contain trailing consonants.
(Utterances showing trailing consonant strings containing non-glottal or
non-nasal features were in fact marked specially, which induces a further
refinement of the partitioning.)
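The first (phonotactic) parameter can be expressed as a small classifier that returns the highest matching class, since the paper states that the higher class is chosen when several match. This is a sketch under several assumptions. An utterance is given as a list of segment symbols. A simplified, hypothetical vowel set decides what counts as V. "Cj must be non-glottal" is read as "the first consonant part contains a non-glottal consonant". The special marking of trailing consonant strings is not modeled.

```python
# Simplified vowel inventory (an assumption): a segment counts as a vowel if its
# first character is one of these symbols, so "e:" is a long vowel.
VOWELS = set("aeiouyæøɛɔəɑɪʊ")
GLOTTAL = {"ʔ", "h"}


def runs(segments):
    """Group segment symbols into maximal ('C', ...) and ('V', ...) runs."""
    grouped = []
    for seg in segments:
        kind = "V" if seg[0] in VOWELS else "C"
        if grouped and grouped[-1][0] == kind:
            grouped[-1][1].append(seg)
        else:
            grouped.append((kind, [seg]))
    return [(kind, tuple(segs)) for kind, segs in grouped]


def normalize(consonants):
    """Treat all glottals as identical, so that ʔ vs. h gives no variation."""
    return tuple("ʔ" if c in GLOTTAL else c for c in consonants)


def phonotactic_class(segments):
    """Return the highest of the classes 1-5 that (part of) the utterance matches."""
    r = runs(segments)
    kinds = [kind for kind, _ in r]
    if "C" not in kinds:
        return 1                      # pattern 0: no consonant at all
    best = 2                          # pattern C
    for i in range(len(r) - 1):
        if kinds[i:i + 2] == ["C", "V"]:
            best = max(best, 3)       # pattern CV
        if kinds[i:i + 4] == ["C", "V", "C", "V"]:
            cj, ci = r[i][1], r[i + 2][1]
            if normalize(cj) == normalize(ci):
                best = max(best, 4)   # pattern CiVCiV
            elif any(c not in GLOTTAL for c in cj):
                best = max(best, 5)   # pattern CjVCiV with non-glottal Cj
    return best
```

Because the search only looks for a matching stretch, a leading vowel (as in `["a", "b", "a", "b", "a"]`, class 4) or a trailing consonant does not change the result, in line with the partial-match rule described above.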
The second parameter divided the utterances into six classes, the
determining property being the presence of a specified type of consonant:
A  No consonant
B  Glottal consonant
C  Non-glottal consonant with non-complete closure
D  Sonorants and glides
E  Non-glottal consonants with complete closure
F  Consonant clusters
All glottal consonants are counted as identical. A glottal consonant with
an adjacent non-glottal consonant was considered identical with the
non-glottal one. A cluster is defined as a string of two or more consonants.
(Note that it follows from this that a string consisting of one glottal and
one non-glottal consonant does not count as a cluster. Further, a glottal
stop and a glottal fricative do not give rise to variation.)
The classes are conceived as ordered in the sense that when a particular
utterance can be classified as belonging to more than one class, the higher
class should be chosen (5 rather than 1, F rather than A). The order is
supposed to mirror the development of the child.
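The second parameter, together with the ordering rule, can be sketched in the same way. The mapping of individual consonants to classes C, D and E below is hypothetical; in particular, nasals are placed under D here, a point the text does not settle. A string made up only of glottals is treated as a single glottal consonant, which is also an assumption.

```python
VOWELS = set("aeiouyæøɛɔəɑɪʊ")   # simplified vowel set (assumption)
GLOTTAL = {"ʔ", "h"}
ORDER = "ABCDEF"                 # higher letter = later developmental class

# Hypothetical assignment of single non-glottal consonants to classes C-E.
CONSONANT_TYPE = {
    "f": "C", "v": "C", "s": "C", "x": "C",             # non-complete closure
    "m": "D", "n": "D", "l": "D", "j": "D", "w": "D",   # sonorants and glides
    "p": "E", "b": "E", "t": "E", "d": "E", "k": "E", "g": "E",  # complete closure
}


def run_class(run):
    """Classify one maximal string of consonants."""
    non_glottal = [c for c in run if c not in GLOTTAL]
    if not non_glottal:
        return "B"            # glottals only: counted as one glottal consonant
    if len(non_glottal) >= 2:
        return "F"            # cluster of two or more non-glottal consonants
    # A glottal next to a single non-glottal counts as that non-glottal.
    return CONSONANT_TYPE[non_glottal[0]]


def consonant_class(segments):
    """Return the highest class A-F found anywhere in the utterance."""
    best = "A"
    run = []
    for seg in list(segments) + [None]:   # None is a sentinel that flushes the last run
        if seg is not None and seg[0] not in VOWELS:
            run.append(seg)
            continue
        if run:
            best = max(best, run_class(run), key=ORDER.index)
            run = []
    return best
```

For instance, `consonant_class(["b", "a", "ʔ", "a"])` returns "E": the utterance contains both a glottal (B) and a plosive (E), and the higher class wins.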
Examples of members of the various categories are shown below.
[Table: examples arranged by phonotactic class (columns 1-5: 0, C, CV, CiVCiV, CjVCiV) and consonant class (rows A-F); the legible entries are A (+vowel): ee:, and B (+glottal): eʔ, h.]
[...] the various features of the IPA segments. The following figures show
the percent occurrence of the most frequent features and categories found in
our data. It is interesting to note that some features and categories are
more frequent at certain times and less so at others, and that some do not
occur at all in the data.
Age is presented on the abscissa and percent occurrence on the ordinate.
The percent occurrence in the curves is presented cumulatively. The letters
refer to the four infants: the two girls, V and K, and the two boys, J and
M. The individual curves for each of the infants are numbered 1 to 4, i.e.
V=1, K=2, J=3 and M=4.
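Assuming that "cumulatively" means that the categories are stacked on top of each other within each age bin, so that the top curve reaches 100 percent when all categories are included, the plotted values for one age bin could be computed as below. The function name and the counts in the usage line are hypothetical.

```python
def cumulative_percentages(counts, order):
    """Stack the percentage of each category on top of the previous ones.

    counts: category -> number of occurrences in one age bin.
    order:  bottom-to-top order in which the categories are stacked.
    """
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in order}
    stacked, running = {}, 0.0
    for category in order:
        running += 100.0 * counts.get(category, 0) / total
        stacked[category] = running
    return stacked
```

With invented counts, `cumulative_percentages({"glottal": 50, "velar": 30, "bilabial": 20}, ["glottal", "velar", "bilabial"])` gives 50.0, 80.0 and 100.0 for the three stacked curves. The vertical distance between two adjacent curves is then the percentage of the upper category.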
[...] there is an infinite number of possible places of articulation in
the vocal tract (cf. Pike 1943). However, only a small part of these places
are used in speech. The eleven points of articulation [...] pharyngeal,
uvular, velar, palatal, alveolo-palatal, retroflex, dental/alveolar,
labiodental [...] in our data in very low numbers and others not at all.
[...] bilabial, dental/alveolar, velar and glottal constitute 9? percent of
the number of places used by the four infants over the whole period
studied. Palatal and uvular articulations occur in four and [...] places of
articulation (retroflex, palato-alveolar, alveolo-palatal and [...]) do not
occur at all in our data. The four most frequently used places are
presented in greater detail below.
In Figures 1:1 through 1:4 the development of place of articulation is
presented. We see that the prevailing place of articulation in the 1-5
month period in three of the infants, V, K and J, is glottal, and that this
dominance rapidly declines in the second half of the first year. The fourth
child, M, has a preference for nasals during the first months, resulting in
his glottal peak appearing later (5-7 months). For three of the infants, V,
K and M, there appears to be a following period of velar/uvular
productions. A majority (75%) of these productions are velar; however, the
two places of articulation have been combined since they were often
difficult to distinguish from each other when transcribing. For the fourth
child, J, a prolonged glottal period seems to compensate for the
velar/uvular articulations. As the glottals and velars decline, the
bilabial and dental/alveolar productions take over. There does not seem to
be any general order of acquisition with regard to the bilabial and
dental/alveolar places of articulation. Two of the infants, K and M,
develop the bilabial articulation before the dental/alveolar. One infant,
V, acquires the dental/alveolar place first, whereas the last infant, J,
lacks a clear preference until the beginning of the second year of life;
this preference is then dental/alveolar. At the last sampling point three
of the infants, V, J and M, have no clear preference as far as front place
of articulation is concerned. In the fourth infant, K, however, the earlier
bilabial preference has changed to dental/alveolar.
[Figure 1: percent occurrence of consonant place of articulation as a function of age (in months, 1-3 to 18-20) for Child V, Child K, Child J and Child M.]

According to IPA, nine different manners are used to describe arbitrary consonantal speech sounds in any language: plosive, nasal, lateral, lateral fricative, rolled, flapped, rolled fricative, fricative, and frictionless continuants and semi-vowels. It is of considerable interest to note that in our data only the following manners occurred: plosive, nasal, lateral, rolled (trill), fricative and semi-vowel. These differ considerably in frequency of occurrence. Just as in the case of place of articulation, a few of the categories dominate the scene while others occur in low numbers or not at all. We find that plosives, nasals and fricatives constitute 91 percent of the manners used by the four infants over the whole period studied. Semi-vowels, laterals and vibratory trills constitute four, three and two percent respectively, while lateral fricatives, flaps and rolled fricatives are non-existent.
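The overall manner shares reported above can be checked, and the "percent occurrence" plotted per age bin in the Figures can be computed, with a few lines of Python. This is a minimal sketch, not the authors' procedure. It assumes that each coded consonant token carries exactly one manner label. The helper name `percent_occurrence` and the toy token list are hypothetical and are not data from the study.

```python
from collections import Counter


def percent_occurrence(labels):
    """Return the percentage of tokens falling in each category (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Overall manner shares for the four infants over the whole period, as reported in the text.
reported = {
    "plosive, nasal or fricative": 91,
    "semi-vowel": 4,
    "lateral": 3,
    "trill": 2,
    "lateral fricative": 0,
    "flap": 0,
    "rolled fricative": 0,
}

# Toy tokens for a single age bin (invented for illustration only).
toy_bin = ["stop", "stop", "stop", "nasal"]

if __name__ == "__main__":
    print(sum(reported.values()))       # the reported shares account for 100 percent
    print(percent_occurrence(toy_bin))  # {'stop': 75.0, 'nasal': 25.0}
```

The same helper, applied separately to the tokens of each age bin and each infant, would yield one bar of a percent-occurrence plot. This explains why every bin in such plots sums to 100 percent regardless of how many utterances were recorded.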
The manner of articulation for the four infants is seen in Figures 2:1 through 2:4. The general pattern here is somewhat less dramatic over time than for place of articulation. The early productions in the one to five months period are mainly stops, fricatives and nasals. We see that there is a major shift in the number of full stop consonants around 9-11 months of age in three of the infants, V, K and M, while the fourth child, J, has his peak in the 11-13 months period. This increase follows the onset of reduplicated consonant babbling prime (see page 14). Liquids are present throughout the study but increase towards the end of the second year of life, at least in three of the infants, V, K and M. The fourth child, J, has fewer liquids and the amount does not seem to increase. The increase of liquids in the 7-9 months period is mainly caused by bilabial and uvular trills; again this is true for three of the infants, K, J and M. Infant V has only laterals at this point. Semi-vowels are comparatively few and do not exhibit any major changes in number. Infant J has an increase towards the end of the study while infant M has a decrease. With regard to fricatives, the general trend seems to be that of a gradual decrease towards the end of [...] life. It should be kept in mind that we do not differentiate between glottal and supraglottal fricatives in these curves. Considering the frequency of occurrence of glottals, we suspect that the large amount of fricatives found in our data is heavily biased by the glottal productions, and that the fricatives decreasing towards the end of the study are the glottal ones. Nasals are more frequent in the early productions than in the later ones.

[Figure 2:1-2:4: panels for Child V, Child K, Child J and Child M; y-axis: percent occurrence (0-100); x-axis: age in months (1-3 to 18-20); legend: stop, fricative, nasal, liquid, semi-vowel.]
FIG 2:1-2:4. The above Figures show percent occurrence of consonant manner of articulation as a function of age for each of the four infants.

If we compare Figures 1 and 2, a general picture emerges of the segments used across time in the infants' vocalizations. From the manner and place curves we conclude that a majority of the fricatives and stops produced in the one to five months period are glottal. The dental/alveolar productions during the same period are mainly nasal, but fricatives also occur. The velars are fricative or nasal, and the early bilabial articulations are semi-vowels, fricatives and nasals. As the child grows older, dental/alveolar, bilabial and velar stops become more frequent, mainly at the expense of the fricatives.
In Figures 3:1 through 3:4 we show the degree of opening of the vowel-like sounds for each of the four infants, plotted as a function of age. The general pattern is that of non-high, non-low vowels dominating the early productions. At the end of the study a more diversified picture emerges. If compared with Figures 4:2 through 4:4, which show the occurrence of back, center and front vowel articulations, a general pattern emerges of mainly [...] life, versus more differentiated vowel qualities (i, e, ae, a) in the later part of the first year.

[Figure 3:1-3:4: panels for Child V, Child K, Child J and Child M; y-axis: percent occurrence (0-100); x-axis: age in months (0-3 to 18-21).]
FIG 3:1-3:4. These Figures show percent occurrence of degree of opening for vowels presented as a function of age for each of the four infants. Since a majority of the occurring vowels were front or central, these have been chosen to exemplify degree of opening in this Figure.
It is interesting to notice the total absence of back vowels in the early productions. Back vowels begin to appear in the second year of life. With regard to the features rounded/unrounded, there is a total dominance of unrounded vowels over the whole period studied. Our results are similar to those reported in the literature. Kent and Murray (1982) report, in their acoustic study of 21 infants at 3, 6 and 9 months of age, that the 3 to 9 months period is dominated by "relatively mid-front or central articulations". Similarly, Cruttenden (1970), in his study of his own twin daughters, found that vowels of "the [...] type predominated throughout the babbling period". Kent and Bauer (1985) analysed five infants at 13 months of age and report that "central and front vowels were favored over back vowels, and low vowels predominated over high vowels". Buhr (1980), who followed a child from the age of 16 to 64 weeks with biweekly recordings, finds that the acute axis (i-ae) develops before the grave axis (u-a), and explains this by the earlier development of the jaw musculature. Bickley (1983) reports similar preferences in the vowels of 14 infants' early word productions between one and two years of age. Her results show that the F1 dimension (i-a) developed before that of F2 (i-u). That is, back vowels did not occur until late in the infants' repertoire.
[Figure 4:2-4:4: panels for Child K, Child J and Child M; y-axis: percent occurrence (0-100); x-axis: age in months (0-3 to 18-21); legend: front, center, back.]
FIG 4:2-4:4. The above Figures show percent occurrence of tongue position presented as a function of age. Data are only available for three of the four infants.

As mentioned earlier, the utterances were categorized according to phonotactic structure. We were interested in seeing when utterances of different consonant and vowel structure occurred in the child's productions and how they developed over time. In Figures 5:1 through 5:4 the categories reduplicated babbling (RB) (which contains a secondarily modified category called reduplicated consonant babbling prime (RB')), non-reduplicated consonant babbling (NRB), variegated consonant babbling (VB), non-consonant babbling (NCB) and glottal consonant babbling (GB) are presented for each of the four infants.
The category reduplicated babbling (RB) consists of utterances where a non-glottal consonant is repeated one or more times, e.g. mamama, lala. A sub-category of RB contains utterances produced with a complete supraglottal constriction, e.g. dadada, papa. This class will be referred to as reduplicated consonant babbling prime (RB'). The next class, non-reduplicated consonant babbling (NRB), contains utterances with supraglottal non-reduplicated consonants, e.g. aba, na, lal, af, ga. The category variegated consonant babbling (VB) consists of utterances with alternating consonants. These alternations may be of manner, place or voicing, e.g. naeda, bada, daeta. The non-consonant babbling category (NCB) consists, as the name implies, of utterances lacking consonants, i.e. utterances containing only vowel modulations, e.g. a, ai. The category glottal consonant babbling (GB) consists of utterances containing only glottal consonants, e.g. ?oh, ??, hae.
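The category definitions above amount to a small decision procedure. The Python sketch below encodes one possible reading of them for a toy alphabet. It is an illustration, not the coding procedure used in the study. All of the following assumptions are ours:

- Utterances are plain strings.
- `?` and `h` are the only glottal consonants.
- The letters in `VOWELS` are the only vowels.
- Stops (`STOPS`) mark a complete supraglottal constriction.
- "Reduplicated" means that the whole utterance is one consonant-plus-vowel syllable repeated at least twice.
- Glottal consonants are ignored once a supraglottal consonant is present.

The function names are hypothetical.

```python
GLOTTAL = set("?h")     # assumed glottal consonants in the toy transcription
STOPS = set("pbtdkg")   # assumed complete supraglottal constrictions
VOWELS = set("aeiouy")  # assumed vowel letters in the toy transcription


def is_reduplicated(utt: str) -> bool:
    """True if utt is one syllable (supraglottal C + vowels) repeated two or more times."""
    n = len(utt)
    for size in range(2, n // 2 + 1):
        if n % size:
            continue
        unit = utt[:size]
        onset, rest = unit[0], unit[1:]
        if (unit * (n // size) == utt
                and onset not in VOWELS and onset not in GLOTTAL
                and all(ch in VOWELS for ch in rest)):
            return True
    return False


def classify(utt: str) -> str:
    """Assign one phonotactic category: NCB, GB, VB, RB', RB or NRB."""
    consonants = [ch for ch in utt if ch not in VOWELS]
    if not consonants:
        return "NCB"  # only vowel modulations
    supra = [ch for ch in consonants if ch not in GLOTTAL]
    if not supra:
        return "GB"   # only glottal consonants
    if len(set(supra)) > 1:
        return "VB"   # alternating consonants
    if is_reduplicated(utt):
        # RB' is the subset of RB with a complete (stop) constriction.
        return "RB'" if supra[0] in STOPS else "RB"
    return "NRB"


if __name__ == "__main__":
    for example in ["mamama", "dadada", "aba", "lal", "bada", "ai", "?oh"]:
        print(example, classify(example))
```

Under these assumptions, the examples given for each category land in the stated class. For example, `classify("dadada")` gives `RB'`, `classify("lal")` gives `NRB` and `classify("naeda")` gives `VB`. Real transcriptions would need a proper segment inventory (IPA symbols, diacritics) rather than single letters.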
We read from the Figures that NRB is present in the repertoire of all the infants from an early age, and that it increases dramatically shortly after the onset of RB' at around eight months of age. In contrast to the sudden onset of RB', we see a more gradual appearance of the category RB as a whole. The area between the dotted and the solid line of that category consists of reduplicated utterances with fricatives, nasals, liquids and semi-vowels as consonants, e.g. [...], mama, lalala and [...]awa. As can be seen, these types of reduplication precede RB'. It is interesting to notice the scarcity of these babbles. Reduplicated nasal utterances are almost absent. In our experience nasals often occur in discomfort sounds. Our exclusion of discomfort sounds from the data might therefore account in part for the low occurrence of reduplicated nasal utterances.

[Figure 5:1-5:4: panels for Child V, Child K, Child J and Child M; y-axis: percent occurrence (0-100); x-axis: age in months (one-month bins, from 1-2 to 19-20); legend: reduplicated babbling, variegated babbling, non-reduplicated babbling, non-consonant babbling, glottal babbling, other.]
FIG 5:1-5:4. These Figures show percent occurrence of the phonotactic categories for each of the four infants as a function of age. The dashed line shows the percent occurrence of the subcategory Reduplicated Babbling Prime (RB').
We see that VB is present in the early productions and that it increases towards the end of the first year. The infants appear to divide into two groups with regard to the amount of VB at the end of the study. Infants V and J have a majority (>45%) of VB at the last sampling point, whereas K and M have below fifteen percent.

The NCB category decreases towards the end of the first year. Its maximum peak of occurrence appears about a month before the onset of RB' in three of the infants, V, K and J. The fourth child, M, has an extensive period of vocal play with a maximum peak about 3 months before the onset of RB'.

With regard to glottal utterances, we see that the early glottal dominance is altered by the supraglottal articulations introduced mainly in the second half of the first year, a trend that has already been demonstrated by the shift in place of articulation (see Figure 1).
Summarizing the data presented so far, we find five developmental babbling stages in the period studied. These are:
I the glottal stage
II the velar/uvular stage
III the vocalic stage
IV the reduplicated consonant babbling stage
V the variegated consonant babbling stage

The ages at which the different milestones occur in the infants can be seen in Figure 6. For reference, the results from a Dutch study of 51 infants are superimposed (van der Stelt and Koopmans van Beinum 1986).

[...] first sampling point between eight and twelve weeks of age. This stage is characterized by utterances with glottal consonants and unrounded, often nasalized, central vowels. The second most frequent type of vocalization during this period involves syllabic nasals. There is a certain amount of individual variability as to which of [...] considerable amount of glottal consonants in the babbling. As a result of the shift to supraglottal articulations that comes with the RB stage, the frequency of occurrence of glottals shows a drastic drop.

In stage II we notice a first use of supraglottal articulations that are non-nasal. These are typically articulated at the velar/uvular place of articulation. The consonants typical of this stage are voiced fricatives.
[Figure 6: milestones I-V on the ordinate plotted against age in weeks (0-80) on the abscissa, with markers for infants J, V, M and K.]
FIG 6. This Figure shows the age of occurrence of the milestones I through V for each of the four infants. The mean age of onset of the fourth milestone, RB', is 34 weeks with a standard deviation of seven weeks. The individual ages of onset are J 26, V 33, M 34 and K 43 weeks. Superimposed are the findings of a Dutch study (van der Stelt et al. 1986) of 51 infants, where the mean age of onset for reduplicated babbling was found to be 31 weeks with a standard deviation of six and a half weeks.
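As a check on the Figure 6 caption, the reported mean and standard deviation can be recomputed from the four individual onset ages of RB'. The sketch below uses only the Python standard library. It shows that the reported seven weeks matches the sample standard deviation (n - 1 in the denominator), whereas the population formula would give about six weeks. The conversion to months assumes an average month of 365.25/12 days; that conversion factor is our assumption, not the authors'.

```python
import statistics

# Individual ages of onset of RB' (weeks), from the Figure 6 caption.
onset_weeks = {"J": 26, "V": 33, "M": 34, "K": 43}
ages = list(onset_weeks.values())

mean_weeks = statistics.mean(ages)       # (26 + 33 + 34 + 43) / 4
sample_sd = statistics.stdev(ages)       # sqrt(sum of squared deviations / (n - 1))
population_sd = statistics.pstdev(ages)  # sqrt(sum of squared deviations / n)

DAYS_PER_MONTH = 365.25 / 12             # assumed average month length
mean_months = mean_weeks * 7 / DAYS_PER_MONTH

print(f"mean = {mean_weeks} weeks (about {mean_months:.1f} months)")
print(f"sample SD = {sample_sd:.2f} weeks, population SD = {population_sd:.2f} weeks")
```

The squared deviations from the mean of 34 weeks are 64, 1, 0 and 81, summing to 146. Dividing by n - 1 = 3 and taking the square root gives about 6.98 weeks, which rounds to the reported seven. A mean of 34 weeks corresponds to roughly 7.8 months, in line with the statement that RB' appears at around eight months of age.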
[...] is that of incomplete closures outnumbering complete ones. This stage would correspond to what has been called the "gooing" and "cooing" stage in the literature (Oller 1980, Stark 1980). This milestone occurs as an expansion of velar/uvular articulations, resulting in this place of articulation becoming the second most frequent between fifteen and nineteen weeks of age.
The vocalic stage, which follows the velar/uvular stage, is a period in which the infants produce a large number of non-consonant utterances. These utterances are best described as relatively long vocalizations with non-speech-like intonation patterns (Roug et al. forthcoming), resembling singing patterns rather than speech. Bilabial trills, prolonged and unreleased bilabial stops with lots of saliva, inspiratory uvular trills (snorts) etc. are also introduced during this period. It might be said that during this stage the infant explores both the periodic and non-periodic sound sources of the vocal tract. The vocalizations are varied both in the intensity and frequency domains and in mode of phonation, i.e. voice quality. These modulated vowel utterances are present over the whole period studied, but an expansion in the number of productions, resulting in a period of maximum occurrence of NCBs, occurs in the months preceding the onset of RB'. The stage is defined according to this period.
[...] the child develops the ability to reproduce syllables in sequences. Most of these utterances are produced with a full stop consonant contrasted with an open, central vowel (e.g. adad). These sequences are typically rhythmic in their alternations. Rhythmic behavior, such as rocking and kicking, has been found to develop around six months of age in normal infants (Thelen 1981). Reduplicated babbling can be considered one of these rhythmic [...]

According to the reports of some of the parents, several infants have been observed moving the jaw rhythmically up and down without phonating a few days before the onset of RB'. Utterances with nasal, semi-vowel and fricative reduplications constitute a minor portion of the total number of reduplicated utterances. The emergence of this milestone is defined according to the abrupt onset of reduplicated full stop consonant babbling (RB'). The age of onset of this stage varies in our infants from week 26 to week 43.
The onset of the last stage, the variegated consonant babbling stage, is defined according to the first sampling point after the onset of RB' at which there is a major increase in the number of productions with alternating consonants. The variegated utterances are found throughout the study but increase dramatically towards the end of the first year of life. This type of babbling can be considered an elaborated form of reduplicated babbling, with regard to both segmental and suprasegmental features.

We find that the intonational patterns and the segmental [...] however not the case in variegated consonant babbling. Here the child produces a variety of consonants overlaid on what seems to be a typical sentence-like intonation pattern. The child also varies between syllables that are perceived by the listener as being stressed and unstressed. The extent to which the child explores this type of babbling seems to be highly individual. As mentioned earlier, two of our infants, V and J, have a high percentage of variegated babbling at the last recording session, whereas the other two, K and M, do not seem to favor this type of babbling (see Figure 5). At this point of development it is important to remember that what we here regard as babbles might in fact be early words. The increase of NRB in K at 14 months might be an expansion of her early lexicon. Infant M shows a similar expansion of NRB at 15 months. It might be that K and M have what is known as a more analytic approach to language (Vihman 1986), thus preferring shorter, single-word-oriented utterances rather than the holistic [...]

[...] (Oller 1980, Stark 1980, Koopmans van Beinum et al. 1986) similarities are found. [...] milestones for the studies mentioned. The overall picture is one of general agreement in discerning five stages in the babbling of normal infants during the first year of life. However, some disagreement exists as to the age of onset and duration of the various stages, probably owing to individual variation among the infants in the groups studied and to varying methodological approaches.
Oller (1980) does not mention glottal consonants as a characteristic feature of the first babbling stage; instead he stresses the nasal quality of the vocalizations. However, he does mention the existence of "throaty sounds" (p. 95) when referring to other authors' findings. It might be that these sounds are in fact glottal. Stark (1980) talks of "reflexive vocalizations" as the first stage and states that the consonants of the newborn period "are almost always glottal stops, nasals or liquids" (p. 77). The Dutch data (Koopmans van Beinum et al. 1986) suggest "glottal stops in series" as a main characteristic of this stage. In our study we find mainly glottal consonants but also syllabic nasals. If we consider the anatomical/physiological aspects of the very young infant's vocal tract, we find considerable differences compared with that of the adult. Kent and Murray (1982) list several such important differences and stress the importance of considering these facts when "explaining patterns of change in infant vocalizations". According to the authors, the very young infant is "an obligate nasal breather and an obligate nasal vocalizer". This is explained by the fact that there is "engagement of laryngeal and velopharyngeal structures", leading to mainly nasal exit of air. The separation of these structures develops at four to six months of age, around the time of onset of reduplicated babbling. With regard to glottals, we [...] explains the prevalence of glottal productions in the early repertoire.

[Figure 7: four columns, Oller (1980), Stark (1980), Koopmans van Beinum (KvB) et al. (1986) and Roug et al. (1987), each listing its stages along an ordinate of age in months (1-20).]
FIG 7. This Figure shows a compilation of the findings of four studies of infants' early vocal development. Each column refers to a study. The approximate age of onset and duration of each stage is indicated by the time scale on the ordinate. The stages are separated by dashed lines to indicate the continuity between stages.
In the following babbling stage, the goo- or coo-stage, the consonant sounds "have predominantly a velar place of articulation" according to Stark (1980:95). Similarly, Oller (1980:96) states that there is a "velar consonant preference". In the Dutch study (Koopmans van Beinum et al. 1986) the main characteristic of this stage is considered to be the "onset of articulatory movements". According to our data these movements are produced by raising or retracting the back of the tongue to form mainly incomplete constrictions against the velum or uvula. It has been proposed that the high occurrence of dorsal sounds was due to a specific body posture (cf. Locke 1983). The hypothesis states that in the supine position the effects of gravity upon the soft structures of the vocal tract would result in a predominance of back articulations. Oller and Gavin analyzed ten infants aged one to four months, when gooing is supposed to be predominant. The infants were recorded in both supine and upright positions. The results revealed no support for the body posture hypothesis. The infants did not produce more goo-sounds in the supine position; in fact there was a slight preference for goo-sounds in the upright position. Another possible explanation for the occurrence of babbling stages is to regard them as results of neurological maturation. Accordingly, the velar articulations found in this stage could be accounted for by neurological maturation factors such as the earlier myelinization of the cranial nerves responsible for the muscular control of the posterior part of the tongue (Lecours 1975) and/or the earlier myelinization of the corresponding area in the primary motor cortex (Whitaker 1973, Salus and Salus 1974). However, recent data on early neurological development (Rakic, Bourgeois, Eckenhoff, Zecevic and Goldman-Rakic 1986) suggest that this might not be a correct interpretation. The neurological maturational processes do not appear to develop in a hierarchical order but rather as a global process. Recent data also suggest that myelinization might not necessarily be relevant to the function of the nerve (Foster, Connors and Waxman 1982).
[...] vowel-like articulations is found. Koopmans van Beinum et al. (1986) [...] "[...] the phonatory domain concerning intonation, duration and intensity". Stark (1980) [...] such as pitch level, pitch change and loudness are manipulated. Apart from marginal babbling and vowel-like elements ("fully resonant nuclei"), Oller (1980) talks of raspberry vocalizations, squealing, growling and yelling as typical vocalization categories of this stage. Oller (1986:29) describes these vocalizations as representing an "exploration of a vocal potential" and of an [...] intensity, frequency and phonatory domain. The ability to control these aspects of vocal production could be regarded as vital to later speech development. After having explored its vocal capacities, the child can begin to modify and refine them according to the requirements of the ambient language.
A major phonetic milestone during the first year of life is the production of reduplicated babbling. The onset of this milestone has been found to be fairly sudden, occurring around seven months of age. According to Oller (1980:99), this is the first stage in which the child produces syllables "that conform to mature natural language restrictions", i.e. syllables that could be accepted from a phonological point of view. This is an important achievement, since it means that the infant can now, as a fortuitous consequence of a natural movement, produce vocalizations acceptable as words in an adult language. Consequently, adults now begin to report "words" from their infants, or that they have begun to "talk". Interestingly, in what is known as "Baby Talk", a simplified lexicon is used by adults (or older siblings) when addressing infants. In this lexicon the principle of reduplication is applied, giving rise to forms such as "vovve", "pippi" and "totto" in lieu of Swedish "hund" (dog), "fågel" (bird) and "häst" (horse). The fact that the words used in many languages to denote the two most important people for the child (mommy and daddy) have the same repetitive structure strikes one as being more than a coincidence. Instead, it seems likely that the adult language has chosen phonetic forms similar to those of the child's own production patterns, thereby creating a link of denotative function between the infant's vocalizations and the adult language (cf. Locke 1983). McCune and Vihman (1987) have suggested that the infant, when producing his/her first words, might be selecting adult words on the same basis, i.e. the child only attempts to produce those words that fit the production patterns of that child's babbling repertoire. As mentioned earlier, reduplicated babbling is a rather monotonous type of babbling, both with regard to intonation and to the constituent consonant and vowel segments. Therefore the major characteristic of this stage could be said to be the reduplication itself.
In the following milestone, the variegated (or non-reduplicated) babbling<br />
stage, the child seems to combine certain features of the intonational<br />
variations of the previous vowel stage with the reduplicated utterances.<br />
The result is sentence-like strings of babble with alternating consonants,<br />
vowels and patterns of stress. Oller (1980) refers to this last type of<br />
babbling with contrasts of syllabic stress as "gibberish"; others have used<br />
the term "jargon" (Menn 1978). Stark (1980) differs from Oller by including<br />
non-reduplicated utterances (e.g. V, VC, CVC) in this stage. In line with<br />
Stark, we find that there is an expansion in the number of NRBs<br />
following the onset of reduplicated babbling.<br />
From the above observations the tentative conclusion can be drawn that<br />
infants follow a universal developmental phonetic course in their babbling<br />
life. Babbling develops from what might be considered a proto-syllabic form<br />
into more speech-like phonetic events acceptable as parts of mature natural<br />
languages (cf. Oller 1986). If the phonetic repertoire of the babbler is<br />
compared with the phonetic patterns most commonly found in the languages of<br />
the world, i.e. language universals, clear similarities are found. The<br />
babbler's repertoire parallels these universals in several respects:<br />
open (CV) syllables are preferred over closed (VC) ones,<br />
single consonants over clusters, voiced initial stops over voiceless ones,<br />
and unvoiced final obstruents over voiced ones. In addition,<br />
initial stops tend to be apical rather than dorsal,<br />
and final obstruents tend to be velar or glottal. These parallels are<br />
interesting since they imply continuity between babbling and speech.<br />
If one accepts the continuity view, the questions arise of how and when<br />
babbling begins to show influence from the ambient language. Some<br />
researchers believe in an early detectable influence in the infant's babbling<br />
(de Boysson-Bardies 1982, de Boysson-Bardies, Sagart and Durand 1984, de<br />
Boysson-Bardies, Sagart, Hallé and Durand 1986). Others believe that<br />
language dependencies begin to affect the child's productions at a<br />
relatively late stage (Locke 1983). The similarities between babbling and<br />
language universals have been taken to support the view that early vocal<br />
development is due to innate biological prerequisites for language<br />
(Locke 1983). We consider it important to distinguish between the segmental<br />
and the suprasegmental levels of vocal behavior when discussing phonetic<br />
development. In our view the suprasegmental features specific to a given<br />
language might be acquired earlier than the segmental ones. However, as Vihman et<br />
al. (1986) point out, we do not yet know which features in the babbling of<br />
young infants are due to exposure to a specific language versus language …<br />
Considering the present data, two major questions arise: …<br />
We believe that a tentative answer to the first question lies in the<br />
understanding of man as a communicative being. Communicative behaviors are<br />
found in all flock animals and are necessary for the<br />
members of the flock to function as a whole (Wilson 1980). In species where<br />
the young offspring demands a great deal of attention and care from the<br />
mother, it is of vital importance that the parent-offspring relation be<br />
strong and fundamental. The newborn infant is incapable of caring for<br />
itself; in other words, it is totally dependent on the caregiver for<br />
survival. From this point of view it is not difficult to see why the infant<br />
develops communicative behavior in response to the caregiving treatment.<br />
The function of this behavior is to tie the two individuals emotionally to<br />
each other. We know that infants synchronize their body movements with<br />
adult speech (Condon 1974) and that they are born with the capacity to<br />
imitate facial expressions and identify vocal sounds (Kuhl and Meltzoff<br />
1984b). Further, it has been observed that infants by the end of the first<br />
month of life begin to respond to speech with vocalizations (Trevarthen …).<br />
This, we believe, implies a biological component responsible for the<br />
foundations on which communicative behavior is based.<br />
It is known that infants also vocalize outside of communicative situations.<br />
They babble when playing alone. Therefore babbling cannot be seen as having<br />
an exclusively communicative function, nor can speech (cf. Piaget 1973,<br />
Vygotsky 1971). Individual needs of the infant, such as self-stimulation<br />
and play, must also be considered. Just as play is an important part of<br />
the child's learning to control its constantly growing body, babbling can<br />
be considered as exploratory play aimed at controlling the rapidly changing<br />
apparatus (cf. Fry 196…). It has been suggested that experience is<br />
not essential to the normal development of vocal behavior, as indicated by<br />
reports on infants who for medical reasons have been prevented from<br />
babbling (Lenneberg 1967). Lenneberg refers to a tracheotomized infant who<br />
had a tube inserted at eight months of age and removed at fourteen months.<br />
However, this infant had most probably begun to produce reduplicated babbling<br />
at the time of insertion, in which case the effects of the tube on phonetic<br />
development might have been minor. This question, as well as the question of<br />
the importance of auditory experience in babbling, is still under debate.<br />
The reports stating that deaf infants babble as hearing infants do<br />
(Lenneberg 1967) are contradicted by recent data (Oller 1986, Stoel-Gammon<br />
and Otomo 1986) suggesting that deaf infants do not produce reduplicated<br />
full-stop babbles during the first year of life.<br />
The answer to why the repertoire of the babbler seems to be a universal one<br />
might be found in anatomical and aerodynamic constraints on the vocal<br />
mechanisms, together with considerations of the physiological and<br />
neurological stage of development. If one thinks of the articulators in terms<br />
of a spring and mass system where a certain amount of impedance is present in<br />
each movement, one can explain some articulations as being more economical,<br />
i.e. more easily produced, than others. Economical movements are those<br />
that require the least amount of energy in order to be performed in<br />
relation to the structures involved, a sort of articulatory cost-benefit<br />
analysis (cf. Lindblom, MacNeilage and Studdert-Kennedy, in press). The<br />
selectivity of the babbler in relation to phonetic preferences might be<br />
understood in similar terms. The child produces those articulations that<br />
are the most economical and that are acoustically the most salient. This<br />
implies that normal babbling presupposes intact auditory function.<br />
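As a rough illustration of the spring-and-mass idea, a single articulator can be idealized as a damped mass-spring system. This is a generic textbook model, not one proposed or fitted by the authors. The symbols are assumptions introduced for this example: x is articulator displacement, m its effective mass, b a damping coefficient standing in for the "impedance" mentioned above, k a stiffness, and F(t) the muscular driving force.

```latex
m\,\ddot{x}(t) + b\,\dot{x}(t) + k\,x(t) = F(t),
\qquad
W = \int_{0}^{T} b\,\dot{x}(t)^{2}\,dt
\;\;\xrightarrow{\;\dot{x} = D/T\;}\;\;
W = \frac{b\,D^{2}}{T}
```

Under this idealization, a movement of extent D made at constant velocity over time T dissipates W = bD^2/T in the damping term. Shorter or slower gestures therefore cost less energy, which is one possible reading of an "economical" articulation.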
If one considers the reduplicated babbling stage in similar terms, the<br />
opening and closing movements of the jaw resulting in syllable-like events<br />
might be viewed as an oscillating system, the constraints of which are set<br />
by neurological maturation. The consequence of this reduplicated behavior<br />
might be that the infant, so to speak, discovers the syllable and indirectly<br />
the supraglottal full-stop articulation. In order to voluntarily control<br />
and coordinate a movement it might be necessary first to produce this<br />
movement repetitively (cf. Thelen 1981). In the case of babbling, factors<br />
such as the length of the breath cycle would determine the duration of the<br />
early repetitive vocalizations. Later, as the ability to control movements<br />
voluntarily develops, the child can free itself of such bonds and more<br />
freely determine the number of reduplications.<br />
To conclude, the evidence amassed in the literature as well as that<br />
presented here strongly implies that babbling can be considered a precursor<br />
of speech, in form as well as in function.<br />
The authors are greatly indebted to Professor Björn Lindblom for his<br />
assistance and helpful comments on the manuscript.<br />
We would also like to thank Marilyn Vihman for being a great source of<br />
inspiration and for commenting on the manuscript.<br />
We thank the parents and the infants who participated in this project for<br />
their patience and enthusiasm, without which this project never would have …<br />
We are indebted to Karin Holmgren for her initial work in collecting and<br />
analysing data, and we thank Boel Harlid for her assistance in transcribing<br />
the material.<br />
… docent Birgitta Jalling, doctor Göran Aurelius and doctor Ann-Sofie<br />
Ericsson at Sankt Göran's Children's Hospital for interesting and fruitful<br />
collaboration.<br />
The authors would like to express their thanks to Harved Hellichius for<br />
drawing the Figures and Tables of this article.<br />
FOOTNOTES<br />
1 Sponsored by Förstamajblommans Riksförbund (First of May Flower Annual<br />
Campaign for Children's Health).<br />
2 The children were medically examined at birth, and a psychomotor<br />
development test (The Griffiths Mental Development Scale) was performed at<br />
5, 10 and 19 months of age. On all occasions the children's results were<br />
well above those of a standard group (Norberg 1994).<br />
3 The equipment used was a Sony table microphone and a Uher tape recorder.<br />
4 The signal was digitized and edited by computer with an ILS program called<br />
MIX, developed at the Royal Institute of Technology in Stockholm. The<br />
actual cutting point in the signal was an intensity zero point as<br />
close as possible to where the child's utterance began. We were careful not<br />
to exclude initial or final voiceless segments. We also considered it<br />
important that the clipping did not distort the utterance so that it<br />
sounded unnatural.<br />
Bickley, C. (1983). Acoustic Evidence for Phonological Development of Vowels in Young Children. Speech Communication Group Working Papers, Research Laboratory of Electronics, M.I.T., 111-124.<br />
Bickley, C., Lindblom, B. and Roug, L. (1986). Acoustic Measures of Rhythm in Infants' Babbling. Paper presented at the 12th International Congress on Acoustics, Toronto.<br />
de Boysson-Bardies, B. (1982). Do Babies Babble as Speakers Speak? Paper presented at the International Conference on Infant Studies, Austin, Texas.<br />
de Boysson-Bardies, B., Sagart, L. and Durand, C. (1984). Discernible Differences in the Babbling of Infants According to Target Language. Journal of Child Language, 11, 1-15.<br />
de Boysson-Bardies, B., Sagart, L., Hallé, P. and Durand, C. (1986). Acoustic Investigations of Cross-linguistic Variability in Babbling. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.<br />
Buhr, R.D. (1980). The Emergence of Vowels in an Infant. Journal of Speech and Hearing Research, 23, 73-94.<br />
Bullowa, M. (1979). Before Speech. Cambridge: Cambridge University Press.<br />
Bush, C.N., Edwards, M.L., Lu…<br />
Kent, R.D. (1981). Articulatory-Acoustic Perspectives on Speech Development. In Stark (ed.) Language Behavior in Infancy and Early Childhood. New York: Elsevier North Holland.<br />
Kent, R.D. and Bauer, H.R. (1985). Vocalizations of One-year-olds. Journal of Child Language, 12, 491-526.<br />
Kent, R.D. and Murray, A.D. (1982). Acoustic Features of Infant Vocalic Utterances at 3, 6 and 9 Months. Journal of the Acoustical Society of America, 72, 353-365.<br />
Kuhl, P.K. and Meltzoff, A.N. (1984b). Infants' Recognition of Cross-modal Correspondence for Speech: Is it Based on Physics or Phonetics? Journal of the Acoustical Society of America, 76, Suppl. 1, S80.<br />
Koopmans van Beinum, F.J. and van der Stelt, J.M. (1986). Early Stages in the Development of Speech Movements. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.<br />
Lecours, A.R. (1975). Myelogenetic Correlates of the Development of Speech and Language. In Lenneberg and Lenneberg (eds.) Foundations of Language Development: A Multidisciplinary Approach, Vol. 1. New York: Academic Press.<br />
Lenneberg, E.H. (1967). Biological Foundations of Language. New York: John Wiley & Sons.<br />
Lenneberg, E.H. and Lenneberg, E. (1975). Foundations of Language Development: A Multidisciplinary Approach, Vol. 1. New York: Academic Press.<br />
Lindblom, B., MacNeilage, P. and Studdert-Kennedy, M. (in press). … In Yeni-Komshian, Kavanagh and Ferguson (eds.) Child Phonology, Vol. 1.<br />
Oller, D.K. (1981). Infant Vocalizations: Exploration and Reflexivity. In Stark (ed.) Language Behavior in Infancy and Early Childhood. New York: Elsevier North Holland.<br />
Oller, D.K. (1986). Metaphonology and Infant Vocalizations. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.<br />
Oller, D.K., Wieman, L.A., Doyle, W.J. and Ross, C. (1976). Infant Babbling and Speech. Journal of Child Language, 3, 1-11.<br />
Oller, D.K. and Eilers, R.E. (1982). Similarities of Babbling in Spanish- and English-learning Babies. Journal of Child Language, 9, 565-577.<br />
Piaget, J. (1971). The Language and Thought of the Child. New York: The World Publishing Company. Translation by M. Gabain from the French original "Le Langage et la Pensée chez l'Enfant" (1968).<br />
Pike, K.L. (1943). … Ann Arbor: The University of Michigan Press.<br />
Rakic, P., Bourgeois, J.P., Eckenhoff, M.F., Zecevic, N. and Goldman-Rakic, P.S. (1986). Concurrent Overproduction of Synapses in Diverse Regions of the Primate Cerebral Cortex. Science, 232, 232-235.<br />
Roug, L., Landberg, I. and Lundberg, L.J. (forthcoming). Acoustic Analyses of Four Swedish Infants' Early Vocalizations.<br />
Salus, P.H. and Salus, M.W. (1974). Developmental Neurophysiology and Phonological Acquisition Order. Language, 50, 151-160.<br />
Stark, R.E. (1980). Stages of Speech Development in the First Year of Life. In Yeni-Komshian, Kavanagh and Ferguson (eds.) Child Phonology, Vol. 1: Production. New York: Academic Press.<br />
Stark, R.E. (ed.) (1981). Language Behavior in Infancy and Early Childhood. New York: Elsevier North Holland.<br />
van der Stelt, J.M. and Koopmans van Beinum, F.J. (1986). The Onset of Babbling Related to Gross Motor Development. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.<br />
Stockman, J., Woods, D. and Tishman, A. (1981). Listener Agreement on Phonetic Segments in Early Infant Vocalizations. Journal of Psycholinguistic Research, 10, 593-617.<br />
Stoel-Gammon, C.M. and Otomo, K. (1986). … Journal of Speech and Hearing Research.<br />
Thelen, E. (1981). Rhythmical Behavior in Infancy: An Ethological Perspective. Developmental Psychology, 17, 237-257.<br />
The Principles of the International Phonetic Association. Obtainable from the International Phonetic Association, University College, Gower Street, London.<br />
Vihman, M.M. (1986). Individual Differences in Babbling and Early Speech: Predicting to Age Three. In Lindblom and Zetterström (eds.) Precursors of Early Speech.<br />
Vihman, M.M., Macken, M.A., Miller, R., Simmons, H. and Miller, J. (1985). From Babbling to Speech: A Reassessment of the Continuity Issue. Language, 61, 397-445.<br />
Vihman, M.M., Ferguson, C.A. and Elbert, M. (1986). Phonological Development from Babbling to Speech: Common Tendencies and Individual Differences. Applied Psycholinguistics, 7, 3-40.<br />
Vygotsky, L. (1962). Thought and Language. English translation by E. Hanfmann and G. Vakar of the Russian original (1934). New York: M.I.T. Press.<br />
Wilson, E.O. (1980). … Harvard University Press.<br />
Whitaker, H.A. (1973). Comments on the Innateness of Language. In Shuy (ed.) Some New Directions in Linguistics. Washington D.C.: Georgetown University Press.<br />
Yeni-Komshian, G.H., Kavanagh, J.F. and Ferguson, C.A. (eds.) (1980). Child Phonology, Vol. 1: Production. New York: Academic Press.<br />
A SIMPLE COMPUTERIZED RESPONSE COLLECTION SYSTEM<br />
Johan Stark and Mats Dufberg<br />
1. Introduction<br />
The object of the system described here is to enable automatic response<br />
collection directly from the respondent(s) onto a computer-readable<br />
medium. One important implication is that data no longer have to be<br />
transferred manually from answer forms to a computer for subsequent<br />
analysis. Data will instead be directly<br />
available to the computer. The computer may in turn perform on-line<br />
processing of this data and hence control the data collection procedure.<br />
The computer configuration consists of one main computer and a number of<br />
terminals.<br />
In section 2 an application will be described. The application<br />
shows that the computer system is a useful tool and that it can be<br />
handled by personnel who are inexperienced in computer programming, as<br />
was the case in the project described below. In section 3 possible<br />
future applications will be discussed. In section 4 the hardware, the<br />
software and the necessary programming will be presented.<br />
2. An application<br />
In a project described in McAllister et al. (1987) we wanted subjects to<br />
give judgements on recorded speech material. We decided to use a<br />
computerized method for collecting the responses from the subjects. We<br />
will first very briefly present the project and then describe how we<br />
used the computer system.<br />
For the project, we recorded a number of students before and after<br />
a certain training period. The same material, consisting of sentences<br />
and words, was read at both recordings. The recordings were digitized<br />
and stored on disk. From this material we produced test tapes. On the<br />
test tape the material was presented in pairs. A pair consisted of the<br />
same sentence/word read by the same student at the two times of<br />
recording, before and after the training period. The order within the<br />
pair was random. We then recruited a panel of experts to judge which of<br />
the two members of each pair was the better (McAllister et al. 1987).<br />
Each subject in the listening test, that is, each expert-panel<br />
member, was sitting in front of a terminal with a keyboard and a<br />
monitor. Written information, sent from the main computer, was presented<br />
on each screen. In this case the information consisted of the criteria<br />
that the subjects should use for their judgements. The tape recorder<br />
was started automatically when all terminals had received the written<br />
information. The speech material was played from the tape recorder<br />
and presented through headphones. The tape was specially prepared with the speech material<br />
on channel one and control tone signals on channel two. A tone signal<br />
was placed directly after each pair and triggered the tape<br />
recorder's stop mechanism as soon as the pair had been presented. The<br />
tone signals were also sent to each terminal. The keyboard was locked<br />
until the terminal had "heard" the tone signal. After that,<br />
the terminal reacted only to certain keys, namely the keys for<br />
the three accepted responses, the return key and the backspace key.<br />
That is, the subjects pressed one key for their judgement and then the<br />
return key. They could change their minds by pressing the backspace key<br />
before the return key. All other keys appeared to the subjects to be<br />
"dead". The data from all terminals were then collected by the main<br />
computer, and the procedure was repeated until the end of the test<br />
tape. After each session the data were stored on files, one file for<br />
each subject and tape.<br />
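The keyboard behaviour described above amounts to a small state machine. The terminal stays locked until the control tone arrives, then accepts only a response key, backspace and return. The Python sketch below replays a sequence of events to illustrate that logic; it is not the authors' Basic code. The key names ("1", "2", "3" for the three responses) and the event names TONE, RETURN and BACKSPACE are hypothetical. The sketch also assumes that a second response key pressed before backspace is ignored, which the text does not specify.

```python
from enum import Enum, auto

RESPONSE_KEYS = {"1", "2", "3"}   # hypothetical: the paper does not name the three keys
RETURN, BACKSPACE, TONE = "RETURN", "BACKSPACE", "TONE"  # hypothetical event names

class State(Enum):
    LOCKED = auto()     # waiting for the control tone that follows a stimulus pair
    ACCEPTING = auto()  # tone heard: only response keys, Backspace and Return count

def run_terminal(events):
    """Replay key presses and tone events; return the committed responses in order."""
    state = State.LOCKED
    pending = None          # response typed but not yet confirmed with Return
    committed = []
    for ev in events:
        if state is State.LOCKED:
            if ev == TONE:
                state = State.ACCEPTING
            continue        # every key pressed while locked is "dead"
        if ev in RESPONSE_KEYS and pending is None:
            pending = ev
        elif ev == BACKSPACE:
            pending = None  # the subject changes his or her mind
        elif ev == RETURN and pending is not None:
            committed.append(pending)
            pending = None
            state = State.LOCKED  # wait for the tone after the next pair
        # any other key (or a second response key) is ignored
    return committed
```

For example, the event sequence `TONE, "2", BACKSPACE, "3", RETURN` yields `["3"]`: the subject first chose 2, erased it, and confirmed 3.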
We had decided to use SAS, a statistical program, for statistical<br />
analysis, so our data files had to be compatible with SAS. The<br />
data files that resulted from the test sessions were pure text files,<br />
that is, they contained only ordinary characters. However, the data files<br />
contained only the bare responses; they held no information on<br />
which student or which sentence/word each response referred to. That<br />
information was stored in a special key file, which also recorded<br />
in which order the two sentences/words of each pair<br />
were presented. With the help of this key file and a small computer<br />
program we transformed all data files for each test tape into a SAS-readable<br />
matrix.<br />
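The key-file step can be sketched as a join, trial by trial, between each subject's bare response file and the key file. The Python below is a minimal sketch under stated assumptions. The response coding ("1" = first member better, "2" = second, "3" = no difference), the key-file fields (`student`, `item`, `first`) and the comma-separated output layout are all hypothetical. The paper does not give the actual file formats or the SAS input statement used.

```python
import csv
import io

# Hypothetical response coding; the paper only says there were three alternatives.
CHOICE = {"1": "first", "2": "second", "3": "same"}

def build_matrix(subject, responses, key_rows):
    """Join one subject's bare responses with the key file, trial by trial."""
    if len(responses) != len(key_rows):
        raise ValueError("data file and key file disagree on the number of trials")
    rows = []
    for trial, (resp, key) in enumerate(zip(responses, key_rows), start=1):
        choice = CHOICE[resp]
        if choice == "same":
            better = "same"
        elif choice == "first":
            better = key["first"]  # which recording ("before"/"after") was played first
        else:
            better = "after" if key["first"] == "before" else "before"
        rows.append([subject, trial, key["student"], key["item"], better])
    return rows

def to_text(rows):
    """Write rows as plain comma-separated text, one observation per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Because the key file records which recording (before or after training) was played first, the join can undo the randomized presentation order. Each output row then states directly whether the before or the after version was judged better.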
Some advantages of this computer-based system are:<br />
- The data are stored directly on computer-readable media.<br />
- The response interval can be controlled on line, for example by the<br />
respondents.<br />
- The respondents can never get lost: what they respond to always<br />
corresponds to what they have just heard.<br />
Apart from the computer programs, we had to prepare the test tapes<br />
with tones, the key files, and the<br />
text files containing the information that was written on the<br />
screens of the terminals. We would have had to make test tapes in an<br />
ordinary pencil-and-paper application too, and the key files and text<br />
files were easily made from the command files that produced the test<br />
tapes.<br />
3. Future applications<br />
The computer system could easily be used for a number of applications.<br />
We hope that there will be a library of standard applications so that no<br />
or marginal programming will be necessary for the user in the future. It<br />
is reasonable to expect the following applications to become standard:<br />
- Listening experiments with or without written information, with or<br />
without a limited number of response alternatives.<br />
- Experiments measuring response time to audible and/or written stimuli.<br />
- Demonstration experiments for seminars.<br />
Computerized correction, on line or after the session, could of<br />
course be included in any application.<br />
4. Hard and software<br />
The first prototype system was set up by connecting a number of cheap<br />
personal computers (Micro-Bee 32) via a simple wire interface. This was<br />
done by using the input/output system already present on them<br />
for printer control etc. The printer interface is used as a parallel 8-bit<br />
bus, and 4 bits from the serial interface are used as a control bus.<br />
Altogether these 12 bits are connected to an ordinary flat cable using<br />
standard 25-pin D-sub connectors. Up to 16 machines may be connected in<br />
this way. An ordinary CP/M system (SI-80) is used at the end of the line<br />
as the server of this network. The interface here is equally simple: only<br />
12 bits of digital I/O are used.<br />
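The bus arithmetic can be made concrete. Eight data lines plus four control lines give 12 lines, i.e. 2^12 = 4096 possible line states at any instant. The Python sketch below simply packs and unpacks those 12 bits as one integer. It illustrates the bit count only. Placing the data in the low 8 bits and the control signals in the high 4 bits is an assumption, not the documented wiring.

```python
DATA_BITS = 8      # parallel (printer) port lines
CONTROL_BITS = 4   # lines taken from the serial interface

def pack(data, control):
    """Combine 8 data lines and 4 control lines into one 12-bit value (assumed layout)."""
    if not 0 <= data < 1 << DATA_BITS:
        raise ValueError("data must fit in 8 bits")
    if not 0 <= control < 1 << CONTROL_BITS:
        raise ValueError("control must fit in 4 bits")
    return (control << DATA_BITS) | data

def unpack(word):
    """Split a 12-bit value back into (data, control)."""
    return word & ((1 << DATA_BITS) - 1), word >> DATA_BITS
```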
Data may be transferred between the server and any terminal on<br />
the line. Each terminal has a unique address, which enables it to<br />
communicate independently of the others. All data flow is controlled by<br />
the server system. The data transfer rate is about 20 kBytes/sec, so<br />
the terminal user sees almost no delay: a full terminal screen is<br />
updated virtually instantaneously.<br />
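A quick calculation shows why screen updates feel instantaneous at about 20 kBytes/sec. The sketch assumes 1 kByte = 1024 bytes and a text screen of 64 by 16 characters at one byte per character. The screen size is an assumption, not a figure from this paper; an 80 by 24 screen is included for comparison.

```python
RATE_KBYTES_PER_S = 20   # "about 20 kBytes/sec" from the text
BYTES_PER_KBYTE = 1024   # assumption: binary kilobytes

def transfer_ms(n_bytes, rate=RATE_KBYTES_PER_S, kbyte=BYTES_PER_KBYTE):
    """Time in milliseconds to move n_bytes over the network at the given rate."""
    return 1000 * n_bytes / (rate * kbyte)

small_screen = transfer_ms(64 * 16)        # assumed 64x16 character screen
large_screen = transfer_ms(80 * 24)        # 80x24 screen, for comparison
all_terminals = transfer_ms(16 * 64 * 16)  # all 16 terminals, one after another
print(f"{small_screen:.2f} ms, {large_screen:.2f} ms, {all_terminals:.2f} ms")
```

Under these assumptions, a full screen takes about 50 ms. Refreshing all sixteen screens in turn would still take well under a second.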
Each terminal has a small piece of so-called bootstrap software stored<br />
in resident non-volatile memory: the character generator<br />
EPROM has some unused locations that are used to store this software.<br />
This software initializes network processing in response to a simple<br />
start-up command from the keyboard. The server system has similar software<br />
loadable from a diskette.<br />
On top of this, each terminal can load a piece of software<br />
written in Basic that enables the user to write Basic programs that<br />
communicate with the server system. These network services are easily<br />
programmable and may be extended with whatever commands are needed.<br />
On the server side the command processing is written in Turbo Pascal.<br />
The prototype system has so far the following commands available:<br />
1. Send a block of data to server.<br />
2. Receive a block of data from server.<br />
3. Load a program from server.<br />
4. Save a program in server.<br />
5. Load and start a program.<br />
6. Start a Revox tape recorder (an additional interface is required).<br />
7. Stop a Revox tape recorder on an audio signal.<br />
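The command set can be pictured as a dispatcher on the server that acts on requests from individually addressed terminals. The Python sketch below only models that idea; the original server part was written in Turbo Pascal. The numeric command codes follow the list above but are hypothetical as wire codes. The 0-15 address range is an assumption based on the limit of 16 machines. The tape-recorder commands merely set a flag instead of driving the Revox interface.

```python
from enum import IntEnum

class Command(IntEnum):
    # Numbered as in the list above; the real wire codes are not given in the paper.
    SEND_BLOCK = 1         # terminal -> server
    RECEIVE_BLOCK = 2      # server -> terminal
    LOAD_PROGRAM = 3
    SAVE_PROGRAM = 4
    LOAD_AND_START = 5
    START_TAPE = 6
    STOP_TAPE_ON_TONE = 7

class Server:
    MAX_TERMINALS = 16     # assumption: addresses 0..15 for "up to 16 machines"

    def __init__(self):
        self.inbox = {}    # address -> blocks received from that terminal
        self.outbox = {}   # address -> blocks waiting to be sent to that terminal
        self.programs = {} # program name -> program text
        self.tape_running = False

    def handle(self, address, command, payload=None):
        """Carry out one command on behalf of the terminal at `address`."""
        if not 0 <= address < self.MAX_TERMINALS:
            raise ValueError(f"no terminal at address {address}")
        command = Command(command)  # accept plain integers as well as members
        if command == Command.SEND_BLOCK:
            self.inbox.setdefault(address, []).append(payload)
        elif command == Command.RECEIVE_BLOCK:
            queue = self.outbox.get(address, [])
            return queue.pop(0) if queue else None
        elif command == Command.SAVE_PROGRAM:
            name, text = payload
            self.programs[name] = text
        elif command in (Command.LOAD_PROGRAM, Command.LOAD_AND_START):
            # For LOAD_AND_START the terminal would run the text after loading it.
            return self.programs[payload]
        elif command == Command.START_TAPE:
            self.tape_running = True
        elif command == Command.STOP_TAPE_ON_TONE:
            # The real system waits for the control tone; here it is assumed heard.
            self.tape_running = False
        return None
```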
For a particular experiment the user will have to write an<br />
application program, using the more general software described above as a<br />
library. The application software will consist of a server part written<br />
in Turbo Pascal and a terminal part written in Basic. First the terminal<br />
program is implemented on a stand-alone terminal. The Micro-Bee 32<br />
computers are stand-alone computers with a built-in Basic interpreter,<br />
computer screen and keyboard. Network services may be simulated in Basic<br />
using DATA statements as input and PRINT statements to check output.<br />
Similarly, the server program may be tested by simulating the<br />
terminals. When both programs seem to work satisfactorily, a real version<br />
is set up and tested. If properly done, the program will then handle<br />
several terminals simultaneously. This enables anyone to set up a simple<br />
or more sophisticated data collection procedure for an experiment in a<br />
fairly short time. Parts of the application software may also be used to<br />
extend the general part. After some time of use, this will provide a whole<br />
library of ready-made software for various kinds of data collection<br />
experiments.<br />
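The DATA/PRINT testing strategy works because the terminal logic does not depend on where its input comes from or where its output goes. A minimal Python analogue passes the input and output routines in as parameters. During development they are a canned list (the counterpart of DATA statements) and a list that records everything printed (the counterpart of checking PRINT output). In the networked version they would be the network services instead. The function names and the prompt text are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

VALID = {"1", "2", "3"}  # hypothetical response keys

def terminal_program(read: Callable[[], str], write: Callable[[str], None], n_items: int) -> None:
    """Terminal side of a toy experiment: prompt for each item and record one valid answer."""
    for item in range(1, n_items + 1):
        write(f"ITEM {item}: which was better (1/2/3)?")
        answer = read()
        while answer not in VALID:  # anything else is ignored
            answer = read()
        write(f"ANSWER {item} {answer}")

def simulated_io(data: List[str]) -> Tuple[Callable[[], str], Callable[[str], None], List[str]]:
    """Offline stand-in for the network: read plays back `data`, write records output."""
    source: Iterator[str] = iter(data)
    printed: List[str] = []
    return (lambda: next(source)), printed.append, printed
```

The same `terminal_program` can later be called with read and write functions that talk to the server, so the logic tested offline is the logic that runs in the experiment.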
Since every terminal is also an independent computer with its own<br />
CPU, the system as a whole has considerable processing power. One effect of<br />
this is that each terminal can individually measure the time<br />
taken for a response with very high resolution.<br />
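Measuring reaction time locally means that the result does not depend on when the server polls a given terminal. A sketch of such a per-terminal timer in Python follows. `time.perf_counter` stands in for whatever high-resolution clock the Micro-Bee software used, which the paper does not name. The clock can be replaced by a fake one for testing.

```python
import time

class ResponseTimer:
    """Per-terminal reaction timer: start at the stimulus, stop at the key press."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock   # any callable returning seconds as a float
        self.t0 = None

    def start(self):
        self.t0 = self.clock()

    def stop_ms(self):
        """Return elapsed milliseconds since start() and reset the timer."""
        if self.t0 is None:
            raise RuntimeError("timer was not started")
        elapsed = (self.clock() - self.t0) * 1000.0
        self.t0 = None
        return elapsed
```

A terminal would call `start()` when the stimulus ends (for example on the control tone) and `stop_ms()` when the response key is pressed. It would then send the elapsed time to the server along with the response.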
For users less experienced in programming in Basic and Turbo Pascal,<br />
a general data collection program suited to a number of common<br />
situations could be written by someone more experienced. A simple<br />
set-up file, editable with an ordinary word processor, could then<br />
be used to set the application variables of the data collection<br />
procedure, such as how many terminals are connected, how many responses<br />
to collect and the names of the files to be used for text input and data<br />
storage.<br />
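Such a set-up file could be as simple as a list of `name = value` lines. The Python sketch below reads one. The file layout, the setting names (`terminals`, `responses`, `text_file`, `data_file`) and the `#` comment convention are hypothetical, chosen only to mirror the variables listed above. The terminal count is checked against the limit of 16 machines.

```python
REQUIRED = {"terminals", "responses", "text_file", "data_file"}  # hypothetical names
MAX_TERMINALS = 16   # "up to 16 machines" on one line

def read_setup(text):
    """Parse 'name = value' lines into a settings dict; '#' starts a comment."""
    settings = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected 'name = value'")
        name, value = (part.strip() for part in line.split("=", 1))
        settings[name.lower()] = value
    missing = REQUIRED - set(settings)
    if missing:
        raise ValueError(f"missing settings: {', '.join(sorted(missing))}")
    for key in ("terminals", "responses"):
        settings[key] = int(settings[key])
    if not 1 <= settings["terminals"] <= MAX_TERMINALS:
        raise ValueError("terminals must be between 1 and 16")
    return settings

SAMPLE = """\
# listening test, session A (hypothetical example)
terminals = 8
responses = 40
text_file = INSTR.TXT
data_file = RESULTS.DAT
"""
```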
A second generation of the system is under development, using an<br />
IBM PC-AT as the server computer. This removes the need for a<br />
special data transfer from the present CP/M machine to the more common<br />
MS-DOS format diskettes.<br />
REFERENCES<br />
McAllister, Robert, Dufberg, Mats, and Wallius, Maria (1987): "Experiments with technical aids in pronunciation teaching". Published in PERILUS report no. 5 (this volume). Stockholm: University of Stockholm, Institute of Linguistics.<br />
ABOUT THE AUTHORS<br />
Johan Stark is an engineer and has constructed the hardware and written<br />
the basic software for the computer system.<br />
Mats Dufberg is a graduate student in phonetics and has written the<br />
software and run the system for the application described.<br />
EXPERIMENTS WITH TECHNICAL AIDS IN PRONUNCIATION TEACHING
Robert McAllister, Mats Dufberg and Maria Wallius
1.0 Introduction
This is a summary of experimental research whose aim was to test the utility of technical aids in pronunciation teaching. There have been several attempts in recent years to apply developments in speech technology to various language teaching and learning situations. In particular, there has been interest in a methodological approach that includes the concept of "feedback" as a learning aid (de Bot, 1980). This concept was put to wide practical use with the advent of the so-called "language laboratory" in second and foreign language learning. The relatively modest success of this movement has led to efforts to complement the audio-active-comparative method most often used in the language laboratory. Some of these efforts have been based on the idea that feeding back the speech signal, or some of its components, via alternative sensory channels may be a viable aid, especially in learning to produce the suprasegmental aspects of the phonology of a foreign language.
The teaching and learning of features such as rhythm and intonation has always seemed to present special problems and has proved to be particularly difficult (Crystal, 1975). Unfortunately, this difficulty has often led to the neglect of this important aspect of target language phonology. Teachers have often been at a loss as to how to teach this part of the sound system.
One of the traditions in this field is the use of a systematic visual representation of the prosodic elements as a learning aid (Kelz et al., 1977). Many different symbols and transcription systems have been used, but their common goal has been to augment the written text with an explicit notation of the prosody. This tradition provides the background for the research on technical aids in the teaching of prosody that has been done in the last three decades. The idea of using visual or tactile channels to feed back speech signal information has been used for many years in teaching handicapped learners, or learners who for other reasons cannot make effective use of the auditory feedback channel, which appears to be indispensable in the production of normal speech (Potter et al., 1948; Abberton and Fourcin, 1975; Martony, 1976; Spens, 1984). This work drew the attention of phoneticians and linguists interested in the acoustic and perceptual nature of prosodic elements and in the acquisition of these features by language learners. The basic idea was that isolating acoustic parameters critical to rhythm and intonation, and feeding them back visually, could concentrate the learner's attention on these important and difficult aspects of the target language and thereby facilitate learning them.
Pioneering work in this direction was done as early as 1966 by Harlan Lane (Lane and Buiten, 1966). Since then, there has been considerable interest in the use of technical aids in pronunciation teaching. The subject has been discussed and several studies have been done, a large part of them informal (a few examples: Vardanian (1964), Bannert (1979), Albertson (1982), Baker (1982); for a critical survey see Leon and Martin (1970)). There have, however, been relatively few controlled studies of this methodology. Notable exceptions in recent years are James (1976), Hengstenberg (1980) and de Bot (1983). These researchers found a positive effect of technical aids in the teaching of intonation. Generally speaking, the learners who used the aids were more successful in learning prosodic features such as intonation than those who practiced with traditional language laboratory methods.
This report summarizes research in which the methodology discussed above for the teaching of prosody was used in a slightly different way. Our aim was to test the utility of technical aids and the feedback methodology as an integrated part of a foreign language course. Our basic question was similar to that of the studies already mentioned: do technical aids help in the learning of prosody? Formulated as a null hypothesis: learners who use the technical aids will not achieve a more native-like production of the prosodic features of the target language than learners who do not use the technical aids.
One aspect of this research that differs from the studies mentioned above is the integration of the training program into the course curriculum. Whereas earlier studies often compared performance before and after one or several short training sessions, we tried to simulate an actual course situation, in which training with the technical aids is spread out in time and integrated into the course as a logical part of the overall program. Consequently, we chose to focus on the obviously important "long-term effects" of a methodology whose short-term effects have been shown to be positive.
2.0 Methods
The methods used in this research are presented under the following headings:
1. Apparatus
2. Training sessions and control recordings
3. Progress evaluation through listener judgements
Steps 2 and 3 were carried out twice, in two consecutive experiments. The first training experiment was done as a pilot study but is included in this report since the results of the two experiments were quite similar. The second experiment differed slightly on several methodological points; these are elaborated upon below.
2.1 Apparatus
Our aim in developing technical aids for these experiments was to provide visual and auditory feedback that would concentrate the learner's attention on certain acoustic/auditory features important to natural-sounding rhythm and intonation. Two technical aids were developed to this end. One gave a clearer auditory impression of the prosody of practice utterances by isolating the suprasegmental features. The other presented the learner with a visual representation of isolated acoustic features relevant to prosody, making it possible to compare these features visually between practice attempts and a model utterance.
2.1.1 Auditory feedback: "the Hummer"
This device was developed on the idea that isolating the prosodic features of an utterance may clarify the auditory goals toward which the learner is to strive. In electronic terms the instrument was fairly simple: essentially a variable band-pass filter that the user could manipulate in several ways. When the speech signal was fed through this filter, all segmental information could be eliminated, so that the auditory impression was that of humming the original utterance; the suprasegmental information was thus effectively isolated. The user could vary the center frequency of the filter and so choose how much segmental information remained in the signal. The instrument was placed between the learner's tape recorder and earphones, so that both the model utterances and the learner's own attempts could be filtered and compared, as a complement to the traditional audio-active-comparative method. A schematic representation of this device is shown in figure 1.
FIGURE 1
[Diagram: signal path from IN to OUT via a Butterworth band-pass filter (center frequency F0 = 60-300 Hz; labels 6-pole, 6 dB/oct, 36 dB/oct), with a DIRECT path.]
A SCHEMATIC REPRESENTATION OF THE LEARNING DEVICE FOR AUDITORY FEEDBACK, "THE HUMMER"
In experiment 1 yet another "hum" was used. The speech signal was filtered by means of a computer program developed by Peter Branderud (1979), and the practice utterances were prerecorded so that each practice utterance in the training material was followed by a filtered version of the same utterance.
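The filtering principle behind the Hummer can be sketched in software. This is a minimal illustration under stated assumptions, not the original circuit: instead of the Butterworth design shown in figure 1, it cascades identical second-order band-pass sections (the widely used "audio EQ cookbook" biquad), and the center frequency, Q and sample rate are arbitrary choices. Raising `center_hz` lets more of the higher-frequency segmental energy through, in the spirit of the Hummer's center-frequency control.

```python
import math


def bandpass_coeffs(center_hz, q, fs):
    """Second-order band-pass coefficients (audio-EQ-cookbook form, 0 dB peak gain)."""
    w0 = 2 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)              # feed-forward terms b0, b1, b2
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)  # feedback terms a1, a2
    return b, a


def biquad(signal, b, a):
    """Run one second-order section over a list of samples (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def hum(signal, fs, center_hz=150.0, q=2.0, sections=3):
    """Keep a band around center_hz; cascading sections steepens the filter skirts."""
    b, a = bandpass_coeffs(center_hz, q, fs)
    for _ in range(sections):
        signal = biquad(signal, b, a)
    return signal


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))


fs = 8000
n = range(fs // 2)                                                # 0.5 s of samples
voice = [math.sin(2 * math.pi * 150 * i / fs) for i in n]         # low, voice-like tone
segmental = [math.sin(2 * math.pi * 2000 * i / fs) for i in n]    # high-frequency tone
print(rms(hum(voice, fs)), rms(hum(segmental, fs)))
```

The 150 Hz tone passes almost unchanged while the 2000 Hz tone is attenuated by several orders of magnitude, which is the "humming" effect in miniature.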
2.1.2 Visual feedback
The essential elements of this instrumentation were a fundamental frequency extractor (Martony, 1976) and a two-track storage oscilloscope. In effect, the device functioned in roughly the same manner as the visualizers used and described by the researchers mentioned in section 1 (Lane and Buiten, 1966; James, 1976). The learner could hear the model utterance and see its intonation/rhythm representation on the upper track of the oscilloscope. After storing this image, the learner could try to reproduce the representation as many times as needed, storing each attempt for inspection and comparison with the model utterance before going on to the next attempt at matching the model. A schematic representation of this instrumentation is shown in figure 2.
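The F0 extractor used here was a hardware device (Martony, 1976). As a rough software analogue, the sketch below estimates F0 frame by frame from the autocorrelation peak, a common textbook method that is not necessarily how the original extractor worked. The frame length, hop size, F0 search range (60-400 Hz) and voicing threshold are illustrative assumptions; the resulting contour corresponds to what one track of the storage oscilloscope would display.

```python
import math


def estimate_f0(frame, fs, f0_min=60.0, f0_max=400.0, voicing=0.3):
    """Estimate F0 (Hz) of one frame from its autocorrelation peak; None if unvoiced."""
    min_lag = int(fs / f0_max)
    max_lag = min(int(fs / f0_min), len(frame) - 1)
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]        # remove any DC offset
    energy = sum(s * s for s in x)       # autocorrelation at lag 0
    if energy == 0:
        return None                      # digital silence
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing * energy:
        return None                      # too little periodicity to call it voiced
    return fs / best_lag


def f0_contour(signal, fs, frame_len=800, hop=400):
    """One F0 value (or None) per hop, e.g. for display on one oscilloscope track."""
    return [estimate_f0(signal[i:i + frame_len], fs)
            for i in range(0, len(signal) - frame_len + 1, hop)]


fs = 8000
tone = [math.sin(2 * math.pi * 150 * i / fs) for i in range(int(0.3 * fs))]
print(f0_contour(tone, fs))  # values near 150 Hz
```

Comparing a learner's contour with the model's, as the learners did visually on the two tracks, would then amount to comparing two such lists.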
FIGURE 2
[Diagram with the components TAPE RECORDER, MICROPHONE, AMPLIFIER, F0 EXTRACTOR and a STORAGE OSCILLOSCOPE with CHANNEL 1 / CHANNEL 2 and STORE 1 / STORE 2.]
A SCHEMATIC REPRESENTATION OF THE INSTRUMENTATION FOR THE DEVICE WHICH PROVIDED THE VISUAL FEEDBACK
2.2 Training sessions and control recordings
A graphic representation of the procedural steps in both training experiments is presented in figure 3. Subjects were selected and a screening test was administered to all the students who were to take part in the experiments. This test established the subjects' proficiency in the perception and production of the prosodic categories to be trained. On the basis of its results, the subjects were divided into experimental and control groups, with the aim of creating groups that were as similar as possible in the proficiency of their members before the actual training experiment began. A pre-test was then given to all subjects. This test documented each subject's pre-training proficiency in producing the prosodic categories covered by the training material: a recording was made of each subject's production of the relevant prosodic categories. The subjects then trained in their respective groups, and at the end of the training period a post-test, identical to the pre-test, was administered. It should be stressed again that an important aspect of these experiments was a concerted effort to integrate this training into the language course as a whole.
FIGURE 3
[Flow chart: SUBJECTS (students of English and Swedish) → SCREENING → PRETEST → EXPERIMENT groups (A-D; auditory, visual, auditory (filter) + visual) and CONTROL group (K) → POSTTEST.]
A GRAPHIC REPRESENTATION OF THE PROCEDURAL STEPS IN THE TRAINING EXPERIMENTS
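The text does not state exactly how the screening results were turned into matched groups. One simple way to do it, shown here purely as a hypothetical sketch, is to rank subjects by screening score and deal them out to the groups in serpentine order (A, B, K, then K, B, A, and so on), which tends to keep the group means close. The function names and the scores below are toy illustrations, not data from these experiments.

```python
def matched_groups(scores, groups):
    """Assign subjects to groups so that mean screening scores stay close.

    scores: dict mapping subject id -> screening score
    groups: list of group labels, e.g. ["A", "B", "K"]
    Subjects are ranked by score and dealt out in serpentine order
    (A, B, K, K, B, A, A, B, ...), so no group always gets the stronger pick.
    """
    ranked = sorted(scores, key=lambda s: scores[s], reverse=True)
    assignment = {g: [] for g in groups}
    k = len(groups)
    for i, subject in enumerate(ranked):
        block, pos = divmod(i, k)
        label = groups[pos] if block % 2 == 0 else groups[k - 1 - pos]
        assignment[label].append(subject)
    return assignment


def group_means(assignment, scores):
    """Mean screening score of each non-empty group."""
    return {g: sum(scores[s] for s in members) / len(members)
            for g, members in assignment.items() if members}


# Toy screening scores (illustrative only)
toy = {"s1": 9, "s2": 8, "s3": 8, "s4": 7, "s5": 6, "s6": 6,
       "s7": 5, "s8": 4, "s9": 3}
assignment = matched_groups(toy, ["A", "B", "K"])
print(assignment)
print(group_means(assignment, toy))
```

A random assignment within score strata would be an equally reasonable alternative; the serpentine deal is simply the easiest to check by hand.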
2.2.1 Subjects
The subjects in both experiments were recruited from two language courses at the University of Stockholm. One source was the undergraduate pronunciation courses in the Department of English. These subjects were Swedes who generally had fairly high proficiency in English, owing to the emphasis on English in Swedish schools. The other source was the courses in Swedish as a second language offered by the Institute of English Speaking Students. These were foreign students with many different native languages who were, almost without exception, beginners in their study of Swedish.
2.2.2 Training material
The linguistic practice material used in both training experiments was the same as that used in the regular language courses, except that our subjects used only the prosodic material during the training. Students from the English department used the relevant exercises and prerecorded tapes from "A Course Book in English Pronunciation" (Clerici, 1984) in experiments 1 and 2. These exercises emphasized sentence intonation types in British English as well as vowel reduction. The students of Swedish used the exercise booklet "Uttal" by Marschall and Rosenquist (1983) and the corresponding prerecorded tapes. These exercises emphasized the Swedish word accents in various phrase and sentence contexts, the long-short distinction, and the interaction of these categories with rhythm and intonation at the sentence level, which has been shown to be critical to the realization of Swedish prosody.
2.2.3 Training
In experiment 1 the subjects were divided into 5 groups: group A trained with the "Hummer", group B with the visual feedback device, group C with both the visual feedback instrumentation and the prerecorded "hum" described above (section 2.1.1), group D with the prerecorded hum only, and group K was the control group, which used the same training material in the traditional way, without the technical aids. In experiment 2 there were only 3 groups, corresponding to groups A, B and K in experiment 1: the hummer group (group A), the visual feedback group (group B) and the control group (group K). The subjects in experiment 1 were asked to practice 2 hours a week over a 4-week period. In experiment 2 the training time was increased to 2 hours a week over an 8-week period. As mentioned above in the summary of the training experiments (section 2.2), the pre-test was given at the start of the training period and the post-test at the end of it. These tests were identical and were composed of the same prosodic categories as the practice material.
2.3 Evaluation with listener judgements
The purpose of this procedure was to evaluate each subject's progress in producing the relevant prosodic features of the respective languages, and to establish whether the subjects who used the technical aids progressed differently from the control group.
The evaluation procedure differed slightly between experiments 1 and 2. In the first experiment, the pre- and post-test tapes were recorded into the DEC Eclipse computer at the phonetics lab. Each tape was then edited with the MIX program, an interactive signal editor. After editing there was one signal file for each utterance by each subject, and the MIX program allowed us to play the files back in any order. We made a systematic selection of representative samples for the respective languages, ordered these utterances randomly, and created the tapes for the listening experiments.
A panel of experts was then recruited to listen to these tapes and grade the utterances on a three-point scale, with 1 being the least and 3 the most successful pronunciation of the target language. For the Swedish material, the panel consisted of native Swedes who were either language teachers at the Institute of English Speaking Students or linguists and phoneticians at the Department of Linguistics at the University of Stockholm. The panel that judged the English material consisted either of native speakers of English with experience in teaching English prosody or of Swedes who were teachers of English intonation in the English department at the University of Stockholm.
For the evaluation of the second experiment, the pre- and post-test tapes were edited in a similar way. For each language group we selected approximately half of the available test material. In contrast to the first experiment, the utterances were this time presented to the listeners in pairs. Each pair consisted of the "same" utterance by the same student from the pre-test tape and the post-test tape, respectively, and the pairs were randomly ordered. The members of a pair could be presented to the listeners in two orders: the pre-test utterance followed by the post-test utterance, or the post-test utterance followed by the pre-test utterance. With the help of a simple Basic program we randomized this order within pairs. As in the first experiment, the different language groups were kept separate. The task of the expert panels, which were very similar in composition to those in experiment 1, was now to assess which of the utterances in the pair was the better production with respect to the intended prosodic category, or to indicate that the two were equally good (or bad).
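The randomization step, done in the original with a simple Basic program, might look like the following Python sketch. The tuple layout, file names and the `post_first` flag are assumptions made for illustration. The flag is kept because a judgement of "first" or "second" can only be interpreted as better or worse once the presentation order is known.

```python
import random


def make_presentation_list(pairs, seed=None):
    """Shuffle utterance pairs and randomize pre/post order within each pair.

    pairs: list of (item_id, pre_file, post_file) tuples
    Returns one dict per pair recording which file is played first.
    """
    rng = random.Random(seed)
    trials = []
    for item_id, pre, post in pairs:
        if rng.random() < 0.5:
            trials.append({"item": item_id, "first": pre, "second": post,
                           "post_first": False})
        else:
            trials.append({"item": item_id, "first": post, "second": pre,
                           "post_first": True})
    rng.shuffle(trials)  # random order of the pairs on the listening tape
    return trials


# Hypothetical file names for three utterance pairs
pairs = [("u1", "u1_pre", "u1_post"), ("u2", "u2_pre", "u2_post"),
         ("u3", "u3_pre", "u3_post")]
for trial in make_presentation_list(pairs, seed=42):
    print(trial)
```

Passing a fixed seed makes the listening tape reproducible, which helps if a session ever has to be re-run.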
For the collection of the listener panel judgements we used the DIRIS system (described by Dufberg elsewhere in this volume). At a listening session, each judge used a computer terminal with a screen and a keyboard. Each intended utterance, together with its intended prosodic category, was written on the terminal screen. The recorded utterance pair was then played, and the tape was automatically stopped until all judges had given a response. The tape was then automatically restarted and continued to the next pair. The judgements were automatically stored in data files and recoded so as to allow us to treat them as at least interval data. These data were then organized into matrices compatible with the statistical program SAS. The statistical analysis was done with SAS on QZ's IBM/Guts computer.
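The pause-until-everyone-has-answered logic of a listening session can be sketched as below. This is not DIRIS code: the callback names `read_response`, `stop_tape` and `start_tape` are hypothetical stand-ins for polling a terminal and controlling the tape recorder. For simplicity the latency is measured centrally with `time.perf_counter`, whereas in DIRIS each terminal timed responses itself.

```python
import time


def collect_judgements(trials, judges, read_response, stop_tape, start_tape,
                       clock=time.perf_counter):
    """Collect one response per judge for every trial, holding the tape until all have answered.

    read_response(judge, trial) returns the judge's answer, or None if the judge has
    not answered yet (a stand-in for polling that judge's terminal).
    """
    records = []
    for trial in trials:
        # The utterance pair for `trial` has just been played: hold the tape.
        stop_tape()
        t0 = clock()
        pending = set(judges)
        while pending:  # a real system would wait for terminal input rather than spin
            for judge in sorted(pending):
                answer = read_response(judge, trial)
                if answer is not None:
                    records.append({"trial": trial, "judge": judge,
                                    "response": answer, "latency_s": clock() - t0})
                    pending.discard(judge)
        start_tape()  # every judge has answered: continue to the next pair
    return records
```

The returned list of records maps directly onto a judge-by-trial matrix of the kind that was passed on to SAS.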
3. Results
Since experiments 1 and 2 differed slightly in method, their results are presented separately.
3.1 Experiment 1
Figure 4 summarizes the average grades for all subjects, with pre-test score plotted against post-test score. There appears to be a definite tendency toward improvement in the realization of prosody for students of both languages; a t-test showed that this improvement was statistically significant at the 2.5% level. The figure shows, however, no obvious difference between the experiment and control groups. Indeed, no statistically significant difference could be established between the improvement of the experiment group and that of the control group.
Figure 5 shows the difference between control and experiment groups more explicitly in a bar graph, where the y-axis, labeled DIFF SCORE, is the difference between the average grade (1 to 3) on the pre-test and the average grade (also 1 to 3) on the post-test. Here we can see that, on average, the students using the traditional language laboratory methodology improved more than the students who trained with the technical aids, although this difference was not statistically significant.
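The DIFF SCORE and the significance test can be made concrete with a short sketch. The text does not say which t-test was used; a paired t-test on per-subject average grades is assumed here, and the grades are toy values. The statistic would be compared with the t distribution on n - 1 degrees of freedom (for example with `scipy.stats.ttest_rel`, which also returns a p-value).

```python
import math
from statistics import mean, stdev


def diff_score(pre_grades, post_grades):
    """DIFF SCORE for one subject: mean post-test grade minus mean pre-test grade (1-3 scale)."""
    return mean(post_grades) - mean(pre_grades)


def paired_t(diffs):
    """Paired t statistic: mean difference divided by its standard error (df = n - 1)."""
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Toy listener grades per subject (illustrative only, not results from this study)
pre = [[1, 2, 2, 1], [2, 2, 2, 2], [2, 2, 2, 2, 1]]
post = [[2, 2, 2, 2], [2, 3, 2, 3, 2], [2, 2, 2, 2]]
diffs = [diff_score(a, b) for a, b in zip(pre, post)]
print(diffs, mean(diffs), paired_t(diffs))
```

A positive DIFF SCORE means the subject was graded higher after training; comparing the mean DIFF SCORE of two groups, as in figures 5 and 6, would call for a two-sample rather than a paired test.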
Figure 6 shows the difference scores for the individual experiment groups and the control group. Again, the control group showed greater improvement than any of the experimental groups; none of these differences was statistically significant, however.
FIGURE 4
AVERAGE GRADES FOR EACH SUBJECT
[Scatter plot: POST TEST average grade (y-axis, 1-3) against PRE TEST average grade (x-axis, 1-3); • = EXPERIMENT GROUP, + = CONTROL GROUP.]
A SUMMARY OF THE AVERAGE GRADES FOR ALL SUBJECTS WITH PRE TEST PLOTTED AGAINST POST TEST SCORE. THIS GRAPH SHOWS IMPROVEMENT FOR ALL SUBJECTS WHICH WAS SIGNIFICANT AT THE 2.5% LEVEL. THERE WAS NO STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE CONTROL AND EXPERIMENT GROUPS.
FIGURE 5
[Bar graph: DIFF SCORE (y-axis, 0 to 0.5) for the EXP and CONTROL groups.]
DIFFERENCE BETWEEN THE CONTROL AND EXPERIMENT GROUPS. THE Y-AXIS LABELED DIFF SCORE INDICATES THE DIFFERENCE BETWEEN THE AVERAGE GRADE ON THE PRE TEST (1-3) AND THE AVERAGE GRADE ON THE POST TEST (1-3). THE DIFFERENCE WAS NOT STATISTICALLY SIGNIFICANT.
FIGURE 6
[Bar graph: DIFF SCORE (y-axis, 0 to 0.5) for groups A, B, C, D and K.]
DIFFERENCE SCORES FOR THE INDIVIDUAL EXPERIMENT GROUPS AND THE CONTROL GROUP. DIFF SCORE INDICATES THE DIFFERENCE BETWEEN THE AVERAGE GRADE ON THE PRE TEST (1-3) AND THE AVERAGE GRADE ON THE POST TEST (1-3). NONE OF THESE DIFFERENCES ARE STATISTICALLY SIGNIFICANT.
3.2 Experiment 2
Figure 7 shows the results of training for experiment groups A and B, individually and taken together (EXP), and for the control group K. The two language groups are pooled in these results. Recall that the listener judgements were expressed as better, worse or same. These responses were recoded as +1 for better, -1 for worse and 0 for same, so that a group's average can lie anywhere between -1 and +1. All groups showed a positive result, i.e. all groups on average improved their mastery of target language prosody, which gave a number between 0 and +1 for every group. The y-axis in this bar graph represents the subjects' progress in terms of this number. As in experiment 1, the control group shows the most improvement. In this case, the difference between the experimental group as a whole (group A plus group B) and the control group (K) was significant at the 2% level. The difference between groups B and K was also significant, at the 1% level. Neither the difference between groups A and K nor that between groups A and B was significant.
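A minimal sketch of this recoding and averaging follows. It assumes, as a hypothetical data layout, that each judgement is stored as the group label, the listener's choice (`"first"`, `"second"` or `"same"`) and whether the post-test utterance was played first; the choice is mapped to +1 when the post-test utterance was preferred and -1 when the pre-test utterance was. The judgements are toy values.

```python
from statistics import mean


def recode(choice, post_first):
    """+1 if the post-test utterance was judged better, -1 if worse, 0 if the same."""
    if choice == "same":
        return 0
    if choice not in ("first", "second"):
        raise ValueError(f"unexpected response: {choice!r}")
    post_preferred = (choice == "first") == post_first
    return 1 if post_preferred else -1


def group_progress(judgements):
    """Average recoded score per group; ranges from -1 (all worse) to +1 (all better)."""
    scores = {}
    for group, choice, post_first in judgements:
        scores.setdefault(group, []).append(recode(choice, post_first))
    return {group: mean(values) for group, values in scores.items()}


# Toy judgements (group, choice, post_first); not data from the study
toy = [("A", "first", True), ("A", "second", True), ("A", "same", False),
       ("K", "second", False), ("K", "first", True), ("K", "same", True)]
print(group_progress(toy))
```

Pooling groups A and B into EXP amounts to relabeling their judgements before calling `group_progress`.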
Figures 8 and 9 show the same results for the individual language groups. The English learners' progress reflects the same tendencies as in figure 7: the control group shows the most progress. The difference in figure 8 between the experiment group (A plus B) and the control group was significant at the 5% level, and the difference between groups B and K was significant at the 1% level. No other differences in figure 8 were statistically significant. The Swedish learners (figure 9) show generally the same results, in that the control group shows the most progress; here, however, none of the differences between groups were statistically significant.
FIGURE 7
[Bar graph: PROGRESS (y-axis, 0 to 0.3) for groups A, B, K and EXP.]
THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY AND TAKEN TOGETHER (EXP) AND FOR THE CONTROL GROUP K. THE Y-AXIS EXPRESSES SUBJECTS' PROGRESS IN TERMS OF A NUMBER BETWEEN 0 AND 1 (see text). STATISTICALLY SIGNIFICANT DIFFERENCES: EXP-K 2% level; B-K 1% level.
FIGURE 8
[Bar graph, ENGLISH: PROGRESS (y-axis, 0 to 0.3) for groups A, B, K and EXP.]
THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY AND TAKEN TOGETHER (EXP) AND FOR THE CONTROL GROUP K FOR THE LEARNERS OF ENGLISH. THE Y-AXIS EXPRESSES SUBJECTS' PROGRESS IN TERMS OF A NUMBER BETWEEN 0 AND 1 (see text). STATISTICALLY SIGNIFICANT DIFFERENCES: EXP-K 5% level; B-K 1% level.
FIGURE 9
[Bar graph, SWEDISH: PROGRESS (y-axis, 0 to 0.2) for groups A, B, K and EXP.]
THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY AND TAKEN TOGETHER (EXP) AND FOR THE CONTROL GROUP K FOR THE LEARNERS OF SWEDISH. THE Y-AXIS EXPRESSES SUBJECTS' PROGRESS IN TERMS OF A NUMBER BETWEEN 0 AND 1 (see text). NO STATISTICALLY SIGNIFICANT DIFFERENCES.
4. Discussion
Let us return to the introduction and review our point of departure and the main question of this research. Other researchers have found a positive effect of this methodology in the learning of prosodic elements of a foreign language. Our main question was similar to theirs: "Do technical aids help in the learning of prosody?" Formulated as a null hypothesis, the question becomes: learners who use the technical aids will not achieve a more native-like production of the prosodic features of the target language than learners who have used only traditional language laboratory methods.
Our qualification of these formulations is of considerable importance to this research: we are most interested in the "long-term effects" of this methodology or, to put it somewhat differently, "How does this method work when set within the time and curriculum framework of a typical language course?" The results presented in section 3 seem to make it fairly clear that the technical aids methodology, as we have applied it in this research, does NOT facilitate the learning of the prosody of a foreign language more than traditional language lab methods. In fact, our results allow us to go even further: although statistical significance is often lacking, there are several clear indications that the subjects who used the traditional audio-active-comparative method acquired a more native-like mastery of these features than the subjects who used the technical aid/feedback methodology.
Let us now briefly discuss some possible reasons for these results and for the discrepancy between them and the results expected on the basis of earlier research. It should be mentioned that our informal observation of the subjects' use of the technical aids made us optimistic about the teaching value of this methodology. The students were generally enthusiastic and stimulated by working with these instruments. Because of these observations, and by comparison with the other experiments mentioned above, we do not believe that our subjects were somehow "confused" by the technical instruments, as some of our colleagues have suggested. The operation of our apparatus was no more complicated than in other experiments of this kind, and we therefore find this explanation of our results less than convincing.
This is not to say, of course, that an improvement in the function of our instrumentation would not affect our results. Our instrumentation was, in fact, relatively primitive compared with what is currently available in the form of computerized instructional devices.
A somewhat more appealing explanation of our results, though very general and vague, is that learning the information fed back via the instructional devices is somehow not as closely related to the linguistic aspects of the target features as has been assumed in developing and using these methods. The question then immediately arises why the methods have worked better in other research, where the training was concentrated in short sessions. The proposed explanation, that the training may not be related to the linguistic learning process, should have had the same effect in that research, which was similar in many ways to ours. Perhaps the most obvious difference is that in our experiments the training was spread out in time. At present we cannot see how this time factor could account for the difference in the effects of these methods. The discrepancy between our results and those of previous work thus remains unresolved.
On closer scrutiny, this methodology presents some problems that are related to our difficulty in explaining our results and relating them to earlier research. How much do we really know about the phonetic identity of prosodic elements? The visual display of fundamental frequency in speech does not necessarily reveal to the learner which details of this parameter are critical to producing natural-sounding intonation in a particular language. The same is of course true of the much discussed but little understood phenomenon of rhythm or timing. To use this method effectively, we would actually need to know these details and point them out to the learner, but our knowledge is still very limited.
The use of alternative sensory channels to feed back information in the learning of linguistic features seems to be based largely on a rather vague behavioristic assumption: the feedback is assumed to facilitate a successful production, and the successful production is assumed to reinforce the behavior and thus facilitate learning. This methodology has been used with some success both in foreign language learning and in teaching the handicapped, such as the deaf and hard-of-hearing. This success seems to be a fully adequate motivation for using the methods, but it must be just as difficult to explain as the apparent failure of the methods in our work.
5. Conclusions
The inspiration for this research was the success of the previous work in this field mentioned in section 1 of this report. As phoneticians, we were enthusiastic about the possibility of applying some of the methods we were familiar with from speech research in a practical language teaching setting. Our aim was to test this promising methodology in an actual language course. We have found that the answer to our original question, "Do technical aids help in the learning of prosody?", seems to be that they do NOT, or at least that we have not been able to show such effects in this research. Our null hypothesis is therefore supported: learners who use the technical aids will NOT achieve a more native-like production of the prosodic features of the target language than learners who use the traditional language laboratory methodology. Although the results reported here are somewhat disappointing from the point of view of the phonetician who would like to apply some of his research methods, they are important from another point of view. As was mentioned in the introductory section, there has been considerable research interest in these questions but a lack of well-controlled research. We consider our work a contribution to the research needed to answer definitively our original question about the utility of technical aids in language teaching and how such aids should be designed.
REFERENCES
1. Abberton, E. and Fourcin, A. (1975). Visual feedback and the acquisition of intonation. In: Lenneberg and Lenneberg (eds.) Foundations of Language Development. New York: Academic Press, pp. 157-165.
2. Albertson, K. (1982). Teaching pronunciation with visual feedback. NALLD Journal, 1982.
3. Baker, R. L. (1984). An experience with voice-based learning. CALICO Journal, March 1984.
4. Bannert, R. (1979). Rapport från uttalskliniken. In: Praktisk lingvistik 1. Lund: Lunds universitet, Inst. för lingvistik.
5. Bot, K. de (1980). The role of feedback and feedforward in the teaching of intonation. System Vol. 8, pp. 35-45.
6. Bot, K. de (1983). Visual feedback of intonation I: Effectiveness and induced practice behavior. Language and Speech Vol. 26, part 4, pp. 331-349.
7. Branderud, P. (1979). Blod - a block diagram simulator. PERILUS I. Stockholm: Stockholms universitet, Inst. för lingvistik.
8. Clerici, M. (1981). Pronunciation course ... in English. Stockholm: Stockholms universitet, Engelska institutionen.
9. Crystal, D. (1975). Non-segmental phonology in language acquisition: A review of issues. In: D. Crystal, The English Tone of Voice. London: Edward Arnold, pp. 125-149.
10. Hengstenberg, P. (1979). ... in sprachlichen Lehr- und Lernprozessen. Tübingen: Gunter Narr Verlag.
11. James, I. F. (1976). The acquisition of prosodic features of speech using a speech visualizer. IRAL Vol. XIV/3, pp. 227-243.
12. Kelz, H., Kropp, W., and Kummer, M. (1977). Zur Vereinheitlichung der Intonationskodierung im Fremdsprachenunterricht. In: H. Kelz (ed.) Phonetische Grundlagen der Ausspracheschulung 1. Forum Phoneticum 4. Hamburg: Buske Verlag.
13. Lane, H. and Buiten, R. (1966). A self-instructional device for conditioning accurate prosody. In: A. Valdman (ed.) Trends in Language Teaching. New York: Academic Press.
14. Léon, P. and Martin, P. (1970). Machines and measurements. In: Bolinger (ed.) Intonation. Harmondsworth, pp. 30-47.
15. Marschall, R. and Rosenquist, H. (1983). ... Stockholm: Stockholms universitet, IES.
16. Martony, J. (1976). Om grundtonsfrekvensen hos gravt hörselskadade och döva. CTM-rapport 3.
17. Potter, R., Kopp, G., and Green, H. (1948). Visible Speech. In: M. Joos, Acoustic Phonetics. Language 24, Suppl.
18. Spens, K.-E. (1984). Höra med känseln: Taktila kommunikationshjälpmedel för döva - en forskningsöversikt. TRITA-TM 4-84. Stockholm: Kungl. Tekniska Högskolan, Inst. för Talöverföring och Musikakustik.
19. Vardanian, R. (1964). Teaching English through oscilloscope displays. Language Learning, 3-4, pp. 109-117.