
UNIVERSITY OF STOCKHOLM 

INSTITUTE OF LINGUISTICS 

PERILUS V 

Starting with this issue, we are slightly changing the publication

policy of PERILUS. Earlier issues included experimental efforts of our 

graduate students in connection with their course work in 

experimental phonetics. Results of work on larger projects were, as a 

rule, published elsewhere. In the future we will, of course, continue to

publish our work in international periodicals. It is, however, our 

intention to mirror the entire spectrum of scientific activity in our lab 

through PERILUS. PERILUS can thus be viewed as our department's

working papers in phonetics. We hope that this new PERILUS will serve 

as an effective avenue of communication with our colleagues in the 

field of phonetics. Copies of PERILUS are available from the Institute

of Linguistics, Stockholm University, S-106 91 Stockholm, Sweden.

Olle Engstrand 

Hartmut Traunmüller

THE PHONETICS LABORATORY GROUP 

Ann-Marie Alme

Ulf Andersson

Peter Branderud 

Una Cunningham-Andersson 

Hassan Djamshidpey 

Mats Dufberg



ACKNOWLEDGMENTS 

The research reported in this issue of PERILUS was sponsored in part 

by the following sources: 

THE SWEDISH COUNCIL FOR RESEARCH IN THE HUMANITIES 

AND SOCIAL SCIENCES 

THE SWEDISH NATURAL SCIENCE RESEARCH COUNCIL 

THE TRICENTENNIAL FOUNDATION OF THE BANK OF SWEDEN 

THE SWEDISH BOARD FOR TECHNICAL DEVELOPMENT


PREVIOUS ISSUES OF PERILUS 

PERILUS 1 1978 - 1979 

Page 

1. INTRODUCTION 

Björn Lindblom and James Lubker

4 

2. SOME ISSUES IN RESEARCH ON THE PERCEPTION 

OF STEADY-STATE VOWELS 

Vowel identification and spectral slope 

Eva Agelfors and Mary Graslund 

10 

Why does [a] change to [ɔ] when F0

is increased?:

Interplay between harmonic structure and formant frequency

in the perception of vowel quality 

Åke Florén

13 

Analysis and prediction of difference limen data 

for formant frequencies 

Lennart Nord and Eva Sventelius 

24 

Vowel identification as a function of 

increasing fundamental frequency 

Elisabeth Tenenholtz 

38 

Essentials of a psychoacoustic model of spectral matching 

Hartmut Traunmüller

49 

3. ON THE PERCEPTUAL ROLE OF DYNAMIC FEATURES 

IN THE SPEECH SIGNAL 

Interaction between spectral and durational cues 

in Swedish vowel contrasts 

Anette Bishop and Gunilla Edlund 

64 

On the distribution of [h] in the languages of the world: 

Is the rarity of syllable final [h] due to an asymmetry 

of backward and forward masking? 

Eva Holmberg and Alan Gibson 

68 

On the function of formant transitions 

I. Formant frequency target vs. rate of change in vowel identification 83 

II. Perception of steady vs. dynamic vowel sounds in noise 

92 

Karin Holmgren 

83 

Artificially clipped syllables and the role of formant transitions 

in consonant perception 


105


4. PROSODY AND TOP DOWN PROCESSING 

The importance of timing and fundamental frequency contour 

information in the perception of prosodic categories 

Bertil Lyberg 123 

Speech perception in noise and the evaluation 

of language proficiency 

Alan C. Sheats 

134 

5. BLOD - A BLOCK DIAGRAM SIMULATOR 

Peter Branderud 151 

PERILUS II 1979-1980 

Page 

Introduction 

James Lubker 

A study of anticipatory labial coarticulation in the speech of children

Asa Berlin, Ingrid Landberg and Lilian Persson 2 

Rapid reproduction of vowel-vowel sequences by children 

Åke Florén 19

Production of bite-block vowels by children 

Alan Gibson and Lorrane McPhearson 26 

Laryngeal airway resistance as a function of phonation type 

Eva Holmberg 

44 

The declination effect in Swedish 

Diana Krull and Siv Wandeback 

58 

Compensatory articulation by deaf speakers 

Richard Schulman 74 

Neural and mechanical response time in the speech 

of cerebral palsied subjects 

Elisabeth Tenenholtz 87 

An acoustic investigation of production of plosives 

by cleft palate speakers 

Garda Ericsson 95


PERILUS III 1982 - 1983

Page 

Introduction 

Björn Lindblom

Elicitation and perceptual judgement of disfluency and stuttering 

Anne-Marie Alme 3 

Intelligibility vs redundancy - conditions of dependency 

Sheri Hunnicut 27 

The role of vowel context on the perception 

of place of articulation for stops 

Diana Krull 

45 

Vowel categorization by the bilingual listener 

Richard Schulman 81 

Comprehension of foreign accents. (A cryptic investigation.)

Richard Schulman and Maria Wingstedt 101 

Syntetiskt tal som hjälpmedel vid korrektion av dövas tal [Synthetic speech as an aid in correcting the speech of the deaf]

Anne-Marie Öster 115

PERILUS IV 1984 - 1985 

Page 

Introduction 


Labial coarticulation in stutterers and normal speakers 

Ann-Marie Alme 3 

Movetrack 


20 

Some evidence on rhythmic patterns of spoken French 

Danielle Duez and Yukihiro Nishinuma

30 

On the relation between the acoustic properties of Swedish 

voiced stops and their perceptual processing 

Diana Krull 41 

Descriptive acoustic studies for the synthesis of spoken Swedish 

Francisco Lacerda 51

Frequency discrimination as a function of 

stimulus onset characteristics 

Francisco Lacerda 

66 

Speaker-listener interaction and phonetic variation 

Björn Lindblom and Rolf Lindgren 77

Articulatory targeting and perceptual constancy of loud speech 

Richard Schulman 

86 

The role of the fundamental and the higher formants in 

the perception of speaker size, vocal effort, and vowel openness 


92


RECENT PUBLICATIONS 

AND PUBLICATIONS IN PROGRESS 

Una Cunningham-Andersson 

Durational correlates of post-vocalic voicing in English spoken by English and 

Spanish speakers. In Engstrand, O. (ed.): Papers from the Swedish Phonetics 

Conference held in Uppsala, Oct. 17-18, 1986, pp. 87-92.


Salient features of Lule Sami pronunciation. In C.-C. Elert (ed.): The Sounds of

Lappish. University of Umea (in press). 

Articulatory correlates of stress and speaking rate in Swedish VCV utterances. J. 

Acoust. Soc. Am. (in press). 

The IRIS speech data base - a status report. In Engstrand, O. (ed.): Papers from 

the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp.

121-126.

Diana Krull 

Spectrum and dynamics in the perception of stop consonants. In Engstrand, O. 

(ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 

17-18, 1986, pp. 54-59, and contribution to the French-Swedish Research Meeting

held in Grenoble, March, 1987.

The locus-target relation in spontaneous speech. Contribution to the 

French-Swedish Research Meeting held in Grenoble, March, 1987.

Evaluation of distance metrics using Swedish stop consonants. Proceedings of 

the 11th ICPhS, Tallinn 1987, Vol. 2, pp. 65-68.


A typological study of consonant systems: the role of inventory size. In 

Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in 

Uppsala, Oct. 17-18, 1986, pp. 7-9.

Adaptive variability and absolute constancy in speech signals: two themes in 

the quest for phonetic invariance. Plenary Lecture, Proceedings of the 11th

ICPhS, Tallinn 1987, Vol. 3, pp. 9-18.


Phonetic invariance and the adaptive nature of speech. Lecture presented at 

a symposium on 'Working models of human perception', celebrating the 30th 

anniversary of the Instituut voor Perceptie Onderzoek, Eindhoven, August 26-28, 

1987. Cambridge: Cambridge University Press.

The concept of target and speech timing (with J. Lubker, B. Lyberg, P. 

Branderud and K. Holmgren). Festschrift for Ilse Lehiste. Dordrecht, The

Netherlands: Foris (in press). 

Phonetic universals in consonant systems (with I. Maddieson). In L.M. Hyman 

and C.N. Li (eds. ): Language, brain and mind (in press). 

A model of phonetic variation and selection applied to the evolution of vowel 

systems. Presented in 1984 at a meeting at CASBS, Stanford. In S.-Y. W. Wang

(ed.): Language transmission and change. New York: Blackwell (in press). 

Fonetik. Article submitted to the editorial board of Nationalencyklopedin. 

Evolution of spoken language (with P. MacNeilage and M. Studdert-Kennedy). 

Orlando, Florida: Academic Press (in preparation). 

Språket, Lucy och datorn (with P. af Trampe). Stockholm: Bonniers (in

preparation). 

Rolf Lindgren 

Phonetic reduction in spontaneous speech. Paper given at the TLH meeting in 

Lund, October 1987.

Lennart Nord 

Acoustic studies of vowel reduction in Swedish. STL-QPSR 4/1986, 19-36.

Vowel reduction in Swedish. In Engstrand, O. (ed.): Papers from the Swedish

Phonetics Conference held in Uppsala, Oct. 17-18, 1986, 16-21.

Liselotte Roug 

Early phonetic development in four Swedish infants (with Ingrid Landberg and 

Lars-Johan Lundberg). In Engstrand, O. (ed.): Papers from the Swedish

Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 145-150.



Articulatory dynamics of loud and normal speech. In Engstrand, O. (ed.): 

Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18,

1986, pp. 60-64.


Phase vowels. Psychophysics of speech perception, Dordrecht: 

M. Nijhoff Publ., 1987, pp. 293-305.

Some types of variation and invariant spectral features of vowels. In Engstrand,

O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct.

17-18, 1986, pp. 48-53.

Perceptual relativity in identification of two-formant vowels (with Francisco 

Lacerda). Speech Communication 5 (in press). 

An experiment on the cues to the identification of fricatives. Proceedings of the 

11th ICPhS, Tallinn 1987, Vol. 5, pp. 205-208.

Maria Wingstedt 

Foreign accents and perceptual processing (with Richard Schulman). In

Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in

Uppsala, Oct. 17-18, 1986, pp. 93-97.

DISSERTATIONS 

Garda Eriksson (1987). Analysis and treatment of cleft palate speech: 

Some acoustic-phonetic observations. Linköping University Medical

Dissertations No. 254. ISSN 0345-0082. 

Lennart Nord (1987). Acoustic-phonetic studies of Swedish with an 

excursion into pathological speech. TRITA-TOM-87-1. Department of

Speech Communication and Musical Acoustics, Royal Institute of 

Technology, Stockholm. ISSN 0280-9850.


CONTENTS OF PERILUS V 

Peter Branderud 1 

About the computer-lab 


Adaptive variability and absolute constancy 

in speech signals: two themes in the quest for 

phonetic invariance 2 


Articulatory dynamics of loud 

and normal speech 21 


& Diana Krull 

An experiment on the cues to the 

identification of fricatives 33 

Diana Krull 

Second formant locus patterns 

as a measure of consonant-vowel 

coarticulation 43

Madeleine Wulffson 

Exploring discourse intonation in Swedish 62 

Mats Dufberg 

Why two labialization strategies in Setswana? 78 

Liselotte Roug,

Ingrid Landberg & 

Lars-Johan Lundberg 

Phonetic development in early infancy - 

a study of four Swedish children 

during the first 18 months of life 93

Johan Stark & Mats Dufberg 

A simple computerized response 

collection system 140 

Robert McAllister,

Mats Dufberg & 

Maria Wallius 

Experiments with technical aids in 

pronunciation teaching 

144

ABOUT THE COMPUTER-LAB 


In the last year we have been able to build up a new and modern 

computer system with grants from the Wallenberg Foundation and FRN.

Our old computer system is from 1975-83. It consists of two 

mini-computers with 200 MB disk storage each. It can accommodate four 

users at the same time. There is software for signal processing, 

acoustical analysis/synthesis, simulation of perception/production etc. 

It can A/D convert up to 16 channels at 12-bit resolution into the

computer and D/A convert 2 channels at 16-bit resolution from

the computer.

The new computer system consists of several Apollo workstations that 

are connected by a fast network. We will also connect several PC/XT/AT

machines via an Ethernet network.

Presently we have two Apollo DN3000 with black and white displays and

two DN3000 with color displays. Each workstation has about 4 MB of

primary memory. We have 450 MB of hard disk storage. We use the operating

system Unix 4.2 and the programming languages C, Pascal, Fortran 77 and

Common Lisp. We also plan to get Prolog.

There is a connection between the old and the new computer systems 

that enables fast file transfer between the systems. We can also run

the old system through a window on the Apollo workstations. That makes 

it possible and easy to use the best software on each system on our 

data. 

We are also preparing to install the program package Audlab from the 

Alvey-group. We will continuously transfer the most important programs 

from our old system to the new Apollo system. Some of the workstations

will also be equipped with A/D- and D/A-converters. For the printouts 

we use laser-writers. 

We will continuously expand the system with, for example, hard disks,

laser disks, primary memory, workstations, array processors and

software.

These increased resources will make our lab more modern and complete, 

thereby greatly increasing our capacity to engage in larger

projects and receive more guest researchers. 

In addition, direct interaction with other laboratories around the world 

will become easy and efficient.

ADAPTIVE VARIABILITY AND ABSOLUTE CONSTANCY IN 

SPEECH SIGNALS: 

TWO THEMES IN THE QUEST FOR PHONETIC INVARIANCE* 

Björn Lindblom

ABSTRACT 

Our topic is the classical problem of reconciling the 

physical and linguistic descriptions of speech: the 

invariance issue. Evidence is first presented indicating the 

possibility of defining phonetic invariance at the 

articulatory, acoustic or auditory levels of the speech 

signal. However, as we broaden the scope of our review, we 

find that attempts to define phonetic invariance in terms of 

absolute physical constancies tend to lose ground to 

theories that recognize signal variability as an essentially 

systematic and adaptive consequence of the informational 

mutuality of natural speaker-listener interactions. We reach 

this conclusion not only by examining experimental data on 

on-line speech processes but also by analyzing typological 

evidence on how the phonetic structure of consonant systems 

varies in lawful patterns with inventory size.

INTRODUCTION 

Traditionally the problem of invariance in phonetics 

can be said to be that of proposing physical descriptions of 

linguistic entities that have the characteristic of 

remaining invariant across the large range of contexts that 

the communicatively successful real-life speech acts present 

to us. 

Many of us share the conviction that taking steps 

towards the solution of this problem will be crucial if we 

are to acquire a deeper theoretical understanding of the 

behavior of speakers and listeners as well as develop more 

advanced systems for speech-based man-machine communication 

(Perkell and Klatt 1986).

The present paper will attempt to address some of the 

questions that we typically encounter in the search for 

invariance. We shall do so by summarizing research 

undertaken mostly in our own laboratory in Stockholm. 

Although thus deliberately limiting the scope of our review 

we hope that the issues raised will nevertheless be of 

sufficient interest to stimulate general discussion. 

IS PHONETIC INVARIANCE ARTICULATORY? 

A few decades ago phoneticians began to interpret 

phonetic events by comparing articulations to highly damped 

oscillatory systems. More recently, such models have 

acquired an important role within the framework of action 

theory (Kelso, Saltzman and Tuller 1986). In the sixties it

was hoped that a lot of the variability that speech signals 

*Plenary address to be presented at the XIth International 

Congress of Phonetic Sciences, Tallinn, Estonia, August

1987. 


typically exhibit - e.g. reductions and vowel-consonant

coarticulation (Öhman 1967) - could be explained in terms of

the spatial and temporal overlap of adjacent "motor

commands" (MacNeilage 1970). Articulatory movements were

seen as sluggish responses to an underlying forcing function 

which was assumed to change, usually in a step-wise fashion, 

at the initiation of every new phoneme (Henke 1966). Owing

to variations in, say, stress or speaking tempo, different

contexts would give rise to differences in timing for a

given sequence of phoneme commands. Articulatory and

acoustic goals would not always be reached, the so-called

'undershoot' phenomenon (Stevens and House 1963). But since

such undershoot appeared to be lawfully related to the

duration and context of the gestures (Lindblom 1963), the

underlying articulatory "targets" of any given phoneme -

'die Lautabsicht' - would nevertheless, it was maintained,

remain invariant. Accordingly, at that time it seemed 

possible to argue that phonetic invariance might be 

articulatory. 
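The duration dependence of undershoot can be made concrete with a minimal sketch. It assumes, purely for illustration, that an articulator behaves as a critically damped second-order system driven by a step-wise command, in the spirit of the models cited above. The time constant of 50 ms, the unit target and the function names are hypothetical choices, not values taken from the studies mentioned.

```python
import math

def step_response(target, t, tau=0.05):
    """Position at time t (s) of a critically damped second-order system
    that starts at rest at 0 and receives a step command to `target`.
    tau (s) is an assumed time constant, not a measured one."""
    return target * (1.0 - (1.0 + t / tau) * math.exp(-t / tau))

def undershoot(target, duration, tau=0.05):
    """Distance still missing from the invariant target when the
    command ends after `duration` seconds."""
    return target - step_response(target, duration, tau)

# The target is the same in every case; only the command duration varies.
for dur in (0.05, 0.10, 0.20, 0.40):
    print(f"{dur * 1000:4.0f} ms  undershoot = {undershoot(1.0, dur):.3f}")
```

In this sketch the target stays fixed while the position actually reached depends lawfully on duration, which is the sense in which the invariance was thought to lie upstream of the movement itself.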

Duration-dependent undershoot still seems to be a

phonetically valid notion for biomechanical reasons. But it 

is clearly not as inevitable a phenomenon as was first 

thought. Current experimental information indicates that in 

fast speech articulatory and acoustic goals can be attained 

despite short segment durations (cf. Engstrand 1987, Gay

1978, Kuehn and Moll 1976). Furthermore, undershoot has been

observed in unstressed Swedish vowels that exhibit long

durations owing to 'final lengthening' (Nord 1986). Such

deviations from simple duration-dependence appear to 

highlight the reorganizational abilities of the speech 

production system. One way of resolving the problem posed

by these somewhat contradictory results would be to show

that, when instructed to speak fast, subjects

have a tendency to "overarticulate", thus avoiding

undershoot to some extent, whereas when destressing they are

more prone to "underarticulate" (cf. discussion below of

hypo- and hyper-speech). The demonstration of language-specific

patterns of vowel reduction (cf. Delattre's 1969

discussion of English, French, German and Spanish) becomes

particularly relevant in the context of addressing such 

questions. 

In summary, the original observations of 'undershoot' 

carried the implication that the invariant correlates of 

linguistic units were to be found, not in the speech wave 

nor at an auditory level, but upstream from the level of 

articulatory movement. Phonetic invariance was accordingly 

associated with the constancy of underlying "spatial 

articulatory targets" (for reviews of the target concept see 

e.g. MacNeilage 1970, 1980). However, subsequent

experimentation - some of which we already hinted at above - 

has revealed that the notion of segmental target must be 

given a much more complex interpretation. 

This conclusion is reinforced particularly strongly by

studies of compensatory articulation. Let us summarize some 

results from an experiment using the so-called "bite-block" 

paradigm (Lindblom, Lubker, Lyberg, Branderud and Holmgren, in

press). Native Swedish speakers were asked to pronounce

monosyllables and bi- and trisyllabic words under two 


conditions: normally and with a large bite-block between 

their teeth. They were instructed to try to produce the 

bite-block utterances with the same rhythm and stress 

pattern as the corresponding normal items. Real Swedish 

words as well as "reiterant" nonsense forms were used. To

exemplify, one of the metric patterns was: - '- -. This

pattern would occur in the lists as "begabba" and

/ba'bab:ab/. Measurements were made of the duration of the

consonant and vowel segments of the normal and the bite-block

versions of the reiterant speech samples. The question

was thus whether subjects would be able to achieve the bilabial

closure for the /b/ segments in spite of the abnormally low

and fixed jaw position and whether they would be able to do

so while reproducing the normal durational patterns.

We found that the timing in the bite-block words 

deviated systematically but very little from the normal 

patterns and concluded that our subjects were indeed capable 

of compensating. To explain the results we suggested that a 

representation of the "desired end-product" - the metric

pattern of the word - must be available in some form to

the subjects' speech motor systems and that the successful

compensations implied a reorganization of articulatory

gestures that must have been controlled by such an output-oriented

target representation. These results are in

agreement with those reported earlier by Netsell, Kent and

Abbs (1978). Moreover, they are completely analogous to the

previous demonstrations that naive speakers are capable of 

producing isolated vowels whose formant patterns are normal 

at the first glottal pulse in spite of an unnatural jaw 

opening imposed by the use of a "bite-block" (Lindblom,

Lubker and Gay 1979, Gay, Lindblom and Lubker 1981).

These results bear on the recent discussion of speech

timing as "intrinsically" or "extrinsically" controlled.

Proponents of action theory (Fowler, Rubin, Remez and Turvey

1980) approach the physics of the speech motor system from a

dynamical perspective with a view to reanalyzing many of the 

traditional notions that now require explicit representation 

in extant speech production models such as 'feedback loop', 

'target' etc. Their writings convey the expectation that 

many aspects of the traditional "translation models" will 

simply fall out as consequences of the dynamic properties 

intrinsic to the speech motor system. In the terminology of 

Kelso, Saltzman and Tuller (1986, 55), "..., both time and

timing are deemed to be intrinsic consequences of the

system's dynamical organization." Methodologically, action

theory is commendable since, being committed to interpreting

phonetic phenomena as fortuitous (intrinsic) consequences

rather than as controlled (extrinsic) aspects of a speaker's 

articulatory behavior, it guarantees a maximally thorough 

examination of speech production processes. However, it is 

difficult to see how, applying the action theoretic 

framework to the data on compensatory timing just reviewed, 

we could possibly avoid postulating some sort of "temporal

target" representation which is (i) extrinsic to the

particular structures executing the gestures and which is 

(ii) responsible for extrinsically tuning their dynamics. 

Speech production is a highly versatile process and 

sometimes appears strongly listener-oriented. 


The plasticity of the speech motor system is further 

illustrated by an experiment recently done by Schulman 

(forthcoming) invoking a "natural bite-block" situation. 

This condition is provided by loud speech in which a more 

open mandible tends to be used than in normally spoken 

syllables. 

Whether rounded or not, the vowels of loud test words

produced by Schulman's talkers were found to exhibit almost 

three times as large jaw openings as the corresponding 

segments in the normal words. In the context of compensatory 

articulation two observations call for special comment. Why

do speakers not compensate for the greater jaw opening in

the loud vowels the way they do in the bite-block

experiments? Schulman shows that they do not, since the

fundamental frequency and (as predicted by articulatory-acoustic

nomograms) the first formant of the loud vowels are

shifted upwards by about one Bark whereas the other formants

do not undergo comparable modification. (Below we shall

relate the F1 and F0 shift to the results of a perceptual

experiment.)

The other finding of interest is the fact that loud 

vowel durations increase whereas loud consonant durations 

tend to decrease (cf. Fónagy and Fónagy 1966). What does that

result mean? The normal-loud vowel duration differences look 

suspiciously similar to the durational differences between 

normal open and close vowels which have been observed for 

many languages (Lehiste 1970). Finding that the duration of

the EMG recorded from the anterior belly of the digastric

correlated with both mandibular displacement and vowel

duration, Westbury and Keating (1980) suggest that this

temporal variation among vowels, although non-distinctive, 

must be seen as present in the neuromuscular signals 

controlling their articulation. An alternative 

interpretation would be to regard the differences as 

automatic consequences of an interaction between an 

invariant underlying "vowel duration command" and 

articulatory inertia (cf. Keating 1985 for further

discussion). In Lindblom (1967) we reported some evidence in

favor of the latter interpretation, the "extent of movement

hypothesis" (Fischer-Jørgensen 1964). We also found that the

durational consequences of more extensive articulatory 

gestures were sometimes actively counteracted. 

The question whether the open-close vowel duration 

difference is an intrinsic or extrinsic phonetic phenomenon 

is accordingly somewhat controversial. Schulman's findings 

bear on the problem. He constructed a model of loud speech 

based on the observation that loud movements appear to be 

"exaggerated" versions of the normal movements. Assuming 

that the lips and the jaw are linear mechanical systems and 

that loud differs from normal speech solely in terms of the 

amplitudes of the underlying excitation forces he performed 

a linear scaling of all articulatory parameters recorded for 

normal syllables (vertical displacements of upper and lower 

lips and jaw) and combined the scaled curves so as to derive 

the vertical separation of the lips - the parameter that 

determines the open-closed state of the mouth opening. By 

using the value of this parameter at opening and closing in

the normal syllables as his criterion he was then able to 


predict the durations of vowel and consonant segments for 

loud speech. He found that linear scaling eliminated stop 

closures entirely or produced much too long vowels. 
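The logic of the scaling test can be sketched as follows. This is a toy illustration under stated assumptions, not Schulman's model or data: the sinusoidal trajectories, the 0.4 s syllable period, the 2 mm criterion and all function names are hypothetical. Each articulator's contribution to lip separation is multiplied by the same factor, the contributions are summed, and segment durations are read off against the criterion that held at opening and closing in the normal syllables.

```python
import math

PERIOD = 0.4      # s, one reiterant /ba/ syllable (assumed value)
CRITERION = 2.0   # mm, lip separation at opening/closing in normal speech (assumed)

def lip_separation(t, scale=1.0):
    """Vertical lip separation (mm) as the sum of three linearly scaled
    contributions. The toy curves are chosen so that the jaw stays
    partly lowered even during the closure."""
    phase = 2.0 * math.pi * t / PERIOD
    upper_lip = 1.0 - 1.0 * math.cos(phase)
    lower_lip = 2.0 - 2.0 * math.cos(phase)
    jaw = 4.0 - 3.0 * math.cos(phase)
    return scale * (upper_lip + lower_lip + jaw)

def segment_durations(scale=1.0, dt=0.0005):
    """Return (closure, vowel) durations in seconds over one syllable,
    counting samples at or below the criterion as closure."""
    n = int(round(PERIOD / dt))
    closed = sum(1 for i in range(n)
                 if lip_separation(i * dt, scale) <= CRITERION)
    closure = closed * dt
    return closure, PERIOD - closure

for k in (1.0, 1.5, 3.0):
    closure, vowel = segment_durations(k)
    print(f"scale {k:.1f}: closure {closure * 1000:5.1f} ms, vowel {vowel * 1000:5.1f} ms")
```

With these assumed curves, larger scale factors shorten the closure and eventually remove it. The direction and size of the effect depend entirely on the assumed reference positions, so the sketch shows only how such a prediction is derived, not what the recorded movements looked like.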

The implication of this result is clear: it

attributes the durational differences to a superposition 

effect, that is the interaction arising from the 

superposition of the lip and the jaw movements. Schulman 

concludes that, unless the effect of opening and closing of 

the jaw had been actively counteracted, loud and normal 

vowel durations would have differed even more than they 

actually did. 

Let us remark in the present context that, while it 

appears reasonable to suggest, as do Westbury and Keating, 

that the acoustic vowel duration differences are probably 

reflected at a level of neuromuscular control, there is also 

evidence indicating that the function of neural control 

signals may be a compensatory rather than a positive one, 

that is a function opposite to that suggested by Westbury 

and Keating. 

The preliminary implication of all work touching the 

theme of compensatory articulation appears to be that -

whether we use "target" with reference to segmental

attributes, segment durations or patterns of speech rhythm - 

the term is better defined, not in terms of any simple 

articulatory invariants, but with respect to the acoustic 

output that the talker wants to achieve. If phonetic 

invariance is not articulatory, could it be acoustic then?

IS PHONETIC INVARIANCE ACOUSTIC? 

The suggestion that the speech signal contains absolute 

physical invariants corresponding to phonetic segments and 

features has received a lot of attention thanks to the work 

by Stevens and Blumstein (Stevens and Blumstein 1978, 1981; 

Blumstein and Stevens 1979, 1981). The idea has been

favorably received by many, for instance Fowler in her 

attempts to apply the perspective of direct perception to 

speech (Fowler 1986).

Others have been provoked to emphasize the inadequacy 

of the non-dynamic nature of the Stevens template notion 

(Kewley-Port 1983) and the substantial context-dependence 

that the stop consonants of various languages typically 

display even in samples of carefully enunciated speech 

(Öhman 1966).

Recent work by Krull and Lacerda in our Stockholm 

laboratory uses the method of quantifying the extent of 

consonant-vowel coarticulation in the form of linear "locus 

equations". These relationships are obtained by plotting 

formant frequencies at CV2- and V1C-boundaries as a function

of the formants for V2 and V1 respectively. Acoustic theory

indicates that for the consonant-vowel combinations in

question near-linear relationships should be expected. Such

diagrams show clearly that, although a "locus" pattern can

exhibit considerable variation, it is predictable from 

information on stop consonant identity and adjacent vowel 

context. Here coarticulation stands out as the salient fact 

and the lack rather than the presence of absolute acoustic 

invariance tends to be reinforced. 
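A locus equation of the kind described above can be obtained by an ordinary least-squares fit. The following is a minimal sketch: the F2 values are invented for illustration and are not measurements from the studies cited, and the function names are hypothetical.

```python
def fit_locus_equation(f2_vowel, f2_boundary):
    """Least-squares fit of f2_boundary = slope * f2_vowel + intercept."""
    n = len(f2_vowel)
    mean_x = sum(f2_vowel) / n
    mean_y = sum(f2_boundary) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_vowel, f2_boundary))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def locus_frequency(slope, intercept):
    """Frequency at which boundary and vowel values coincide
    (only defined for slope != 1)."""
    return intercept / (1.0 - slope)

# Invented F2 values (Hz) for one stop consonant in five vowel contexts.
f2_v2 = [2200.0, 1900.0, 1300.0, 900.0, 700.0]   # F2 of the following vowel
f2_cv = [2010.0, 1850.0, 1480.0, 1230.0, 1130.0]  # F2 at the CV boundary

slope, intercept = fit_locus_equation(f2_v2, f2_cv)
print(f"slope = {slope:.2f}, intercept = {intercept:.0f} Hz")
```

A slope near zero would correspond to a fixed locus regardless of vowel context, whereas a slope approaching one indicates that the boundary value follows the vowel, that is, strong coarticulation.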


Incidentally, let us note that, if it exists, acoustic

invariance is a strange notion since talkers can only

monitor it through their senses and listeners can only

access it through their hearing system. Why should sensory

and auditory transduction be assumed to have a transfer

function of one, imposing no transformation? Is it the case

that what people really mean when they talk about acoustic

invariance is in fact "auditory" invariance? Let us look at

some psycho-acoustic results.

IS PHONETIC INVARIANCE AUDITORY? 

We mentioned earlier a perceptual result that offers a

rather curious parallel to Schulman's findings. It is the

"Traunmüller effect", which is a demonstration of the

transforms required to preserve the perceptual constancy of

vowel quality under changes in (i) vocal effort and (ii)

vocal tract size. It is also somewhat reminiscent of the

findings on F0-F1 interrelationships in soprano vowels

(Sundberg 1975).

Effort and vocal tract variations can be dramatically

illustrated by synthetically modifying a naturally spoken

/i/. When all formants and F0 are shifted equally along a

Bark scale an /i/-like vowel is perceived but the voice

changes from an adult's to a child's. When both F1 and F0

are varied in such a way that F1-F0 is kept constant on a

Bark scale - and the upper formant complex is left unchanged

- an /i/-like vowel is perceived. This is remarkable in view

of the fact that F1 reaches a value more typical of a low-pitched

/ /. One's impression is that the speaker remains

the same but that she "shouts".
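The second manipulation, raising F0 and F1 together while keeping their Bark distance constant, can be sketched numerically. The Hz-to-Bark conversion used here is an analytical approximation published later by Traunmüller (1990), adopted as an assumption because the text does not say which conversion was used in the experiment. The starting values for F0 and F1 and the function names are likewise hypothetical.

```python
def hz_to_bark(f):
    """Approximate critical-band rate (Bark) for frequency f in Hz.
    Assumed formula; its low-frequency correction below 2 Bark is omitted."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def shift_on_bark_scale(freqs_hz, delta_bark):
    """Shift every frequency by the same amount along the Bark scale."""
    return [bark_to_hz(hz_to_bark(f) + delta_bark) for f in freqs_hz]

# Hypothetical F0 and F1 (Hz) of a normally spoken /i/.
f0, f1 = 220.0, 310.0
loud_f0, loud_f1 = shift_on_bark_scale([f0, f1], 1.0)

print(f"normal: F0 = {f0:.0f} Hz, F1 = {f1:.0f} Hz, "
      f"F1-F0 = {hz_to_bark(f1) - hz_to_bark(f0):.2f} Bark")
print(f"loud:   F0 = {loud_f0:.0f} Hz, F1 = {loud_f1:.0f} Hz, "
      f"F1-F0 = {hz_to_bark(loud_f1) - hz_to_bark(loud_f0):.2f} Bark")
```

The upper formants are left untouched in this manipulation, so only the two lowest components move. The F1-F0 distance in Hz grows while the distance in Bark does not.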

Note the parallel between Schulman's and Traunmüller's

results. Are the findings causally related? Do we explain

the lack of formant compensation in loud speech in terms of

the Traunmüller effect? Or do we account for the vowel

quality results in terms of the "Schulman" effect?

Of importance for the present discussion is the fact

that behavioral constancies have been demonstrated and that

they imply that at least in this case phonetic invariance

must be defined at a level of auditory representation.

Let us return for a moment to the alleged invariance of

the release spectra of stop consonants. Diana Krull

collected perceptual responses from Swedish listeners to

burst fragments obtained from V1C:V2 words (Krull 1987). One

hundred test words were generated by constructing all

possible combinations of V1 or V2 = short /i e a o u/ with

C: = /b: d: rd: g:/. Confusion matrices for the burst

stimuli demonstrate the drastic coarticulation effects. By

and large, listener responses can be accounted for in terms

of the acoustic properties of the stimuli. This is shown in

her attempts to predict the confusions from auditorily based

"perceptual distance" computations.

A related study has been carried out by Lacerda (1986).

We can characterize one part of his research as variations

on the theme struck by Flanagan in his early "difference

limen" experiments on vowel formant frequencies (Flanagan

1955). Lacerda's question was: How well can listeners

discriminate four-formant stimuli that differ solely in

terms of the frequency of F2? His work permits us to compare

a psycho-acoustic task - the discrimination of F2 in brief

tone bursts with formant patterns static - with a "speech

task": the discrimination of the onset of F2-transitions in

/da/-stimuli.

The results indicate that the subjects' ability to
discriminate on the psycho-acoustic task is in close
agreement with Flanagan's findings whereas their performance
on the /da/-stimuli is drastically impaired. One
interpretation is that the discrimination change is related
to the fact that intra-category discrimination is
considerably worse than inter-category discrimination
(Liberman, Harris, Hoffman and Griffith 1957).

With reference to the invariance issue it is important
to note the following. Krull's results on stop perception
indicate that the coarticulatory spectral variability of the
stop releases is rather accurately reflected in the
confusions that her listeners made of such brief sounds.
This is fully compatible with Lacerda's results on tone
bursts. Note that in Lacerda's speech-task test, however, the
variability does not seem to be as faithfully mirrored in
the listeners' percepts, for apparently they treat stimuli
easily discriminable in psycho-acoustic tests as "the same".
Whether it is the listener invoking the "speech mode" or it
is the interaction of the dynamic stimulus properties and
speech-independent auditory processing is an issue still
worth addressing. However, our main point is this: The
invariance that we discern in these findings is not
acoustic. It clearly presupposes auditory processing.

IMPLICATIONS OF SPEAKING STYLE: THE HYPER-HYPO DIMENSION 

Everyday experience indicates that speaking is a highly
flexible process. We are capable of varying our style of
speech from fast to slow, soft to loud, casual to clear,
intimate to public. We speak in different ways when talking
to foreigners, babies, computers and hard of hearing
persons. And we change our pronunciation as a function of
the social rules that govern speaker-listener interactions
(Labov 1972).

Above we considered principally three types of phonetic
invariance: articulatory, acoustic and auditory invariance.
What are the implications of variations in speaking style
for the invariance issue? For the purpose of our discussion
let us give phonetic invariance a strong literal
interpretation which is rather extreme but nevertheless not
too far from working hypotheses explored previously by
various investigators: "All the information is in the
signal, particularly in its dynamics". For such a view of
invariance to be correct - let us call it the strong version
of absolute physical invariance - the following must be
true: Talkers vary their speaking style and thereby
contribute to increasing the variability of the speech wave,
but in utterances that are intelligible, linguistic units
will always exhibit a core of invariant physical information
that will remain undestroyed so as to be successfully used
by a listener.

We recently undertook a literature survey+ in order to
systematize the types of speech materials that have been
used in acoustic phonetic studies published during the past
ten years in J Acoust Soc Am, J of Phonetics, Language and
Speech, and Phonetica. A total of over 700 articles were
selected as preliminarily relevant. We ended up choosing 216
as meeting our criterion of "descriptive study of speech
based on quantitative acoustic phonetic measurements".

Of special interest to us was to ascertain the relative
proportions of studies investigating "self-generated" speech
(including e g spontaneous conversation) on the one hand and
speech samples chosen by the experimenter (e g list
readings, nonsense words etc) on the other. Not
surprisingly, we found that the majority of studies, over
90%, use experimenter-controlled speech samples. The reason
is clear. A satisfactory experimental design presupposes
good control of the variables involved. This is less of a
problem if the experimenter determines the test items but
for "real speech" with its immense number of variables there
is no established methodology that will guarantee such
control. So rather than drown in an ocean of "unknown
factors" our strategy tends naturally to become one of
resorting to "given" test materials and read speaking modes.

One way of justifying this widely used procedure is to
argue that first we will solve the problem of phonetic
invariance in "lab speech". Then we will get to work on
"natural speech". Another outlook might be to suggest that,
although we lack the supplementary methodology required by
"ecological" speech, the excessive use of "lab speech"
introduces an undesirable bias in our data bases as well as
in our theoretical intuitions about invariance and other key
issues - a bias that might make us underestimate the problem
of speech variability in spite of the fact that it is
readily acknowledged by all workers in the field and has
already, it would appear, been rather massively documented.
Consequently the situation ought to be balanced.

We have recently been persuaded by the latter point of
view and are currently recording (1) "self-generated speech"
produced under natural conditions and (2) parallel "citation
form" speech based on the syllables, words and phrases that
occur in the spontaneous materials. Data are currently being
collected by Rolf Lindgren, Diana Krull and myself using
this two-pronged approach involving comparisons of reference
pronunciations ("citation form" speech) with samples of
"self-generated speech". A few preliminary observations can
be made that bear on the present discussion (cf also
Lindblom and Lindgren 1985).

+ I am indebted to Diana Krull for doing the preliminary
selections and to Natasha Beery of the Phonology Laboratory,
University of California, Berkeley for the statistical
analyses.

The reductions that we have found in spontaneous speech
- and that often escape the trained phonetic ear even after
spectrographic evidence has been examined - are sometimes
drastic. Speaking style has marked effects on the acoustic
patterns of words. The vowel space shrinks in casual style
and is expanded in "hyperspeech" modes. The diphthongization
of tense Swedish vowels is enhanced and is particularly
apparent in clear speech. Contrast in VOT for voiced and
voiceless stops increases and decreases as we compare hyper-
and hypo-forms respectively. Locus equations show a smaller
slope (= less vowel-dependence) for citation form
pronunciations than for spontaneous speech, which we
interpret to indicate that vowel-consonant coarticulation is
counteracted in hyperspeech (more invariance) but tolerated
in hypospeech (less invariance). Although preliminary, the
observations made so far suggest that the prospects for any
strong version of absolute physical invariance to be
substantiated seem most unfavorable.
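To make the locus-equation comparison concrete, here is a minimal sketch of how such a slope can be obtained. It assumes the common formulation in which the F2 value at the consonant-vowel boundary is regressed linearly on the F2 value in the vowel; the text does not spell out the fitting procedure, and all numbers in the demo are invented for illustration.

```python
# Sketch: fitting a locus equation  F2_onset = slope * F2_vowel + intercept
# by ordinary least squares. ASSUMPTION: this regression form and the data
# below are illustrative only; they are not taken from the study.

def fit_locus_equation(f2_vowel_hz, f2_onset_hz):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(f2_vowel_hz)
    mean_x = sum(f2_vowel_hz) / n
    mean_y = sum(f2_onset_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel_hz)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_vowel_hz, f2_onset_hz))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical F2 values (Hz) for one consonant before five vowels.
    vowel = [800.0, 1200.0, 1600.0, 2000.0, 2300.0]
    citation = [0.4 * v + 1000.0 for v in vowel]      # onset varies little
    spontaneous = [0.8 * v + 300.0 for v in vowel]    # onset follows the vowel
    print(fit_locus_equation(vowel, citation))
    print(fit_locus_equation(vowel, spontaneous))
```

A slope near 0 would mean that the consonant onset is the same whatever the vowel, and a slope near 1 that the onset simply follows the vowel. The smaller slope in the made-up citation-form data thus corresponds to less vowel-dependence.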

SPEECH UNDERSTANDING: (IN)DEPENDENCE OF SIGNAL INFORMATION

At the Department of Romance Languages at Stockholm
University a test is used to measure how proficient native
Swedish students are in understanding spoken French. The
task of the students is to listen to triads of stimuli
consisting of two identical sentences and one minimally
different and to indicate the odd case.

Montre-leur ce chapeau s'il te plaît

Montre-leur ce chapeau s'il te plaît

Montre-leur ces chapeaux s'il te plaît

Native speakers of French have no problems of course
with such sentences whereas Swedish listeners knowing no
French have a lot of trouble. However, when the key
information - e g the ce/ce/ces triad - is presented as
fragments gated from the original sentences the performance
of the Swedish subjects improves radically (Dufberg and
steek forthcoming).

This test can serve to remind us that perception is a
product of two things: signal-dependent and
signal-independent information. While I am perfectly capable
of discriminating the French minimal contrasts as auditory
patterns I would quickly lose those patterns in a sentence
context unless I have a sufficiently good command of French
- that is, access to signal-independent 'knowledge' whose
interaction with the signal is a part of the forming of the
final percept.

The speech literature is full of experimental data
indicating that processes not primarily driven by the signal
play an important role in the perception of speech. There
will not be time to do justice to all the research bearing
on this issue.

Let me just recall some well-known paradigms:
Perception of speech in the presence of various disturbances
(noise and distortion). The improvement of identification as
the signal gets linguistically richer (Miller, Heise and
Lichten; Pollack and Pickett 1964; Miller and Isard).
Detection of deliberate mispronunciations (Cole 1973). Word
frequency effects (Howes, Savin). Restoration (Warren 1970,
Ohala and Feder 1986). Phoneme monitoring (Foss and Blank).
Word recognition from word fragments (Grosjean 1980,
Nooteboom 1981). Fluent restorations in shadowing
mispronunciations (Marslen-Wilson and Welsh 1978). Verbal
transformations (Warren). Intelligibility of lip-reading from
videorecordings supplemented by "hummed speech" - an audio
signal processed to contain primarily rhythm and intonation
cues (Risberg 1979). Inferences from historical sound
changes (Ohala 1981).

CONCEPTUALIZING SPEAKER-LISTENER INTERACTIONS 

Our review of experimental evidence bearing on the
invariance issue has been selective but should nevertheless
provide a rough indication of a panoply of alternative
positions and their respective pros and cons. We have
considered the suggestion that the invariance of phonetic
segments be defined: (i) at an articulatory level (e g the
"spatial target" hypothesis), (ii) at an acoustic level (e g
spectral properties of stops), (iii) at an auditory level
(e g perceptual constancy of vowel quality). Which of these
alternatives should we put our money on?

When pursued experimentally articulatory, acoustic or
auditory definitions of invariance have the methodological
virtue of encouraging a maximally thorough search at these
particular levels. But in seeking a broader theoretical
understanding of speech communication we would stand little
to gain from spending effort on choosing between levels.
Such an approach misreads the evidence which, when viewed in
a broader perspective, strongly points to the conclusion
that: The invariance problem is not a phonetic issue at all
for ultimately invariance can be defined only at the level
of listener comprehension.

We can convince ourselves of the correctness of that
point by considering the following phrase in English:
/lesnsevn/. We can hear this utterance either as LESS THAN
SEVEN, or as LESSON SEVEN. In the appropriate contexts (say
"How many are coming?", and "What is our topic to-day?") the
listener will not be aware of any ambiguity. At which
phonetic level do we find the physical correlates of the
initial segments of the word "than"? Needless to say there
are no such correlates in this particular case. The
conclusion seems inescapable: We should not put our money on
any of the above alternatives. We must seek a more general
theory.

The experimental data on production indicates that the
behavior of the speech motor system is shaped primarily by
two forces - plasticity (listener-oriented reorganization)
and economy (talker-oriented simplification) - which
interact on a short-term basis so as to generate signals
that may be "rich or poor" in explicit physical information.

The evidence on perception has identified two major
sources of information: signal-dependent and
signal-independent processes, and suggests that on a
short-term basis percepts arise from the latter (i e
"context") modulating the former in an analogously "rich or
poor" manner.

One possible way of schematizing the logical
possibilities of these conceptual simplifications is shown
in the diagram of the enclosed figure. This is not a very
rigorous scheme but seems useful, at least pedagogically, in
contrasting some of the ideas currently entertained in
phonetics (cf J of Phonetics, January issue 1986).

This graph states that for speech to be intelligible
the sum of explicit physical information and
signal-independent information must be above a threshold,
that is the 135 degree line. In the ideal case this sum
equals a constant, the x- and y-values of specific speech
samples falling right on that line. Points above the line are
associated with what might be termed "over-clear" speech,
points below it with "unintelligible" speech.
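The threshold idea can be restated as a small decision rule. The sketch below is purely schematic: the text gives no units or numerical scale for "information", so the threshold of 1.0, the tolerance band and the sample values are all assumptions introduced for illustration.

```python
# Schematic restatement of the threshold rule described above.
# ASSUMPTION: "information" is expressed on an arbitrary scale on which the
# intelligibility threshold equals 1.0; the tolerance band is hypothetical.

def classify_sample(signal_info, context_info, threshold=1.0, tolerance=0.05):
    """Classify a speech sample by the sum of its two information sources.

    signal_info  -- explicit physical (signal-dependent) information
    context_info -- signal-independent information available to the listener
    """
    total = signal_info + context_info
    if total < threshold - tolerance:
        return "unintelligible"
    if total > threshold + tolerance:
        return "over-clear"
    return "on the line"


if __name__ == "__main__":
    # A poor signal can be rescued by rich context, and vice versa.
    print(classify_sample(0.2, 0.8))   # poor signal, rich context
    print(classify_sample(0.8, 0.2))   # rich signal, poor context
    print(classify_sample(0.2, 0.3))   # too little of both
```

The first two demo samples land on the same line even though their signal information differs by a factor of four, which is the complementarity that the diagram is meant to convey.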

[Figure: MUTUALITY OF SPEAKER-LISTENER INTERACTION. Vertical axis: signal-independent information.]

It appears reasonable to assume that in the real-life 

situations utterances can vary tremendously with respect to 

how socially and communicatively successful they prove to 

be. For our present purposes let us focus on speech samples 

from hypothetically successful real-life speaker-listener 

interactions and assume that they produce data points 

clustering near and above the slant line. What would such a 

result imply? It would mean that there is a complementary 

relation between the amounts of information contributed by 

signal attributes on the one hand and "context" on the

other. When speakers come close to the slant line it would 

indicate first of all that they are capable of varying their 

speech output in a plastic way (cf evidence on hypo-hyper-speech

modes and other instances of reorganization of speech

motor control) and secondly that, while perhaps not being 

perfect 'mind-readers', they are at least capable of 

adapting their speech on-line to the short-term fluctuations 

in the listener's access to "context" or signal-independent

information (cf experimental documentation of numerous cases 

showing that listeners are in fact capable of successfully 

coping with highly context-dependent reduced and 

coarticulated speech stimuli). The possibility of such 

complementarity in real speech emerges also from some recent 

measurements reported by Hunnicutt (1985) as well as from

Lieberman's 1963 study. 

If we hypothesize that this strategy - let us call it 

the STRATEGY OF ADAPTIVE VARIABILITY - comes near the way 

real speakers actually behave when they are communicatively 

successful, we obtain a natural way of resolving some of the 

paradoxes that surround the invariance issue. For it follows 

that intra-speaker phonetic variation - along a hyper-hypo-continuum

as well as along other dimensions - is the

characteristic that we should expect the units of ecological 

speech to exhibit - not absolute physical invariance. 

The proposed way of thinking about the issue does not, 

of course, rule out finding physical speech sound invariance 

in restricted domains of observation but it does explain why 

our quest for a general concept of phonetic invariance has

been largely unsuccessful. And, in a pessimistic vein, it

predicts in fact that it will continue to be so. 

Our reasoning leads us back to a conclusion already 

drawn by MacNeilage in his 1970 review of the invariance 

issue: 

"... the essence of the speech production process
is not an inefficient response to invariant
central signals, but an elegantly controlled
variability of response to the demand for a
relatively constant end (p 184)".

If, as suggested here, we take the "relatively constant
end" to be defined neither articulatorily, acoustically nor
auditorily but specified only with reference to "the level
of listener comprehension", MacNeilage's formulation still
captures the "essence of the speech production process"
satisfactorily.

Let us pause to reflect on some of the implications of 

the two theories contrasted in our discussion: Absolute

Physical Invariance versus Adaptive Variability. The former,

if proved correct, would transform what currently looks like 

instances of massive variability into artefacts. For this 

theory says in fact that there simply IS NO variability of 

linguistic units. There seems to be but that is merely a 

result of our presently inadequate conceptual and 

experimental tools. Further note that if we push the notion 

of absolute constancy to its extreme another implication can

be noted, namely that the transmission of information by

speech - an undeniably biological process - is basically 

non-adaptive. 

The Theory of Adaptive Variability, on the other hand, 

says exactly the opposite. This is a theory for which it is 

easier to find support within the general study of the 

biology of motor control and perception. It is precisely by 

emphasizing the adaptive nature of speech processes that we 

obtain a principled way of investigating phonetic variation

and its origin. 

ON-LINE PROCESSES IN THE LIGHT OF TYPOLOGICAL EVIDENCE ON 

CONSONANT SYSTEMS 

Some time ago Nooteboom did an experiment on word 

retrieval and was able to show that listeners perform better 

if presented with the first halves of words than on the 

corresponding second-half fragments (Nooteboom 1981). For an

explanation he suggested that, since word recognition is a

real-time left-to-right process, word beginnings are less

predictable than word endings. Consequently left-to-right

context can be much more easily used than right-to-left

context. 

He concluded his paper by raising the question whether 

this asymmetry - that he takes to be a universal feature of
the perceptual processing of any language - might have left
its imprint on how lexical information is organized in the
languages of the world. He predicted (p 422) that: "(1) in
the initial position there will be a greater variety of
different phonemes and phoneme combinations than in word
final position, and (2) word initial phonemes will suffer
less than word final phonemes from assimilation and
coarticulation rules."
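The first of these predictions lends itself to a simple count. The sketch below shows one way it could be checked on a phonemically transcribed word list; the toy lexicon is invented for illustration, and counting single phonemes (rather than phoneme combinations as well) is a simplifying assumption.

```python
# Sketch: comparing the variety of word-initial and word-final phonemes.
# ASSUMPTION: the lexicon is a list of words, each a tuple of phoneme
# symbols; the toy entries below are hypothetical.

def positional_variety(lexicon):
    """Return (number of distinct initial phonemes, number of distinct final phonemes)."""
    initials = {word[0] for word in lexicon if word}
    finals = {word[-1] for word in lexicon if word}
    return len(initials), len(finals)


if __name__ == "__main__":
    toy_lexicon = [
        ("p", "a", "n"),
        ("t", "a", "n"),
        ("k", "i", "s"),
        ("m", "u", "n"),
        ("s", "o", "t"),
    ]
    n_initial, n_final = positional_variety(toy_lexicon)
    # The prediction is borne out for a lexicon when n_initial > n_final.
    print(n_initial, n_final)
```

A real test would of course need a representative lexicon for each language and some control for word frequency, neither of which is attempted here.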

One basic assumption is that variations in perceptual 

predictability correlate with signal "distinctiveness".

Hence "the greater variety of different phonemes and phoneme

combinations" in the initial as compared with the final

position of words. Restating the idea we can say that a 

larger paradigm goes with a RICHER signal inventory. The 

other side of the coin is of course that a smaller paradigm 

- such as that attributed to word endings - goes with a 

POORER signal inventory. In suggesting that the presence of 

assimilation and coarticulation should vary inversely with 

the need for keeping items distinct Nooteboom tacitly
formulates a hypothesis that comes close to the theory of
Adaptive Variability described here. Note that the theory of
Absolute Physical Invariance does not offer us any basis at
all for making predictions about a possible interplay
between language structure and on-line processing. Why? As
stated earlier according to that theory there IS no phonetic
variation, there only seems to be. The idea of language

structure adapting to the on-line constraints of speaking 

and listening only becomes a possibility once we recognize 

the existence and systematic nature of phonetic variation. 

Only from that point of departure will we be able to address 

the question of what feeds the processes of phonological 

innovation. 

We shall not be in a position to present the 

typological data needed to test Nooteboom's hypothesis. 

However, we shall conclude our paper by presenting some 

other data that do bear on it and strongly encourage further 

examination of the underlying ideas. 

In collaboration with Ian Maddieson we recently 

undertook an analysis of the consonant inventories of 317 

languages, carefully selected so as to constitute a 

reasonable sample of the "languages of the world". Our 

corpus was that of UPSID, the UCLA Phonetic Segment 

Inventory Database (Maddieson 1984). The data consists of

lists of systems whose elements (allophones of major 

phonemes) are specified in phonetic transcription. 

Inventory sizes range from 6 to 95 consonants per 

system. The materials lend themselves to testing a 

paraphrase of Nooteboom's hypothesis: Is the phonetic 

structure of consonant systems independent of their size? Or 

is it systematically related to that dimension? If there is 

a systematic size-dependence what is it? 

There is neither time nor space to give the details of 

the analysis. They will be published elsewhere (Lindblom, 

MacNeilage and Studdert-Kennedy; Lindblom and Maddieson 

forthcoming). Fortunately, Nooteboom's perspective provides

us with a way of summarizing the main findings. 

It turns out that small paradigms statistically favor 

segments with both phonatory and articulatory properties 

that can be classified as basic or elementary. Medium-sized 

paradigms tend to include consonants invoking more 

elaborated gestures in addition to a core of basic elements. 

The largest systems use both these types but also 

combinations of elaborated gestures that we label complex 

articulations. To exemplify, plain /p t k/ are classified as
"basic" articulations whereas ejective /p' t' k'/ or
aspirated /pʰ tʰ kʰ/ invoke "elaborated" mechanisms. A
segment such as /ʈʰ/ is "complex" since it shows more than
one elaboration: both of place (retroflexion) and source
features (aspiration). Logically a six-consonant system
could use the ejective set for its stop series. Small

systems never do in our material whereas medium-sized and 

large systems do. Moreover, the "complex", multiply 

elaborated segments are most frequent in the large 

inventories. The basic rule is that a less simple consonant 

tends not to be recruited without the presence of parallel 

more simple ("basic" or "elaborated") series (cf the notion 

of 'implicational hierarchy' of traditional terminology). 
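The three-way classification and the implicational rule can be illustrated with a short sketch. It rests on loud assumptions: segments are represented as sets of feature labels, only the three elaborations named in the examples above are listed, and the requirement that the simpler series be "parallel" is ignored. It is not the analysis procedure applied to UPSID.

```python
# Sketch: "basic" / "elaborated" / "complex" classification by counting
# elaborations, plus a simplified check of the implicational rule.
# ASSUMPTION: the feature labels and this set of elaborations are
# hypothetical and cover only the examples mentioned in the text.

ELABORATIONS = frozenset({"ejective", "aspirated", "retroflex"})
RANK = {"basic": 0, "elaborated": 1, "complex": 2}


def complexity_class(features):
    """Classify one segment from its feature labels."""
    n = len(ELABORATIONS & set(features))
    if n == 0:
        return "basic"
    if n == 1:
        return "elaborated"
    return "complex"


def respects_hierarchy(inventory):
    """True if every non-basic class present co-occurs with a simpler class."""
    ranks = {RANK[complexity_class(f)] for f in inventory.values()}
    return all(r == 0 or any(lower in ranks for lower in range(r))
               for r in ranks)


if __name__ == "__main__":
    plain_and_ejective = {"p": set(), "t": set(), "k": set(),
                          "p'": {"ejective"}, "t'": {"ejective"}}
    ejectives_only = {"p'": {"ejective"}, "t'": {"ejective"},
                      "k'": {"ejective"}}
    print(respects_hierarchy(plain_and_ejective))
    print(respects_hierarchy(ejectives_only))
```

The second demo inventory is the logically possible but unattested kind of system mentioned above: an ejective stop series with no plain series alongside it.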

The claim we make is accordingly that we see a positive 

correlation between paradigm size and the number of elements 

that a sound pattern selects from a dimension of 

"articulatory complexity". 

The validity of our analysis naturally hinges on the 

success with which we can give non-circular, independently

motivated definitions of "articulatory complexity". When it

comes to the details of the analysis that problem is a topic

for future quantitative phonetic theory. For the moment we 

believe that the major trends are rather gross effects that 

can be convincingly demonstrated by the force of the 

examples. They permit us to make the following 

generalization: Small consonant paradigms invoke 'unmarked'

phonetics, large paradigms 'marked' phonetics. That is of

course exactly what Nooteboom's hypothesis predicts and it

takes a few steps towards an explanation for why seven-consonant

systems do not show inventories like the following

(Ohala 1980): 

We take the present typological data on consonant 

systems as providing strong evidence in favor of (a) 

language structure evolving as an adaptation to the 

constraints of the on-line processes of speaker-listener 

interaction, and for (b) the correctness of a theory of

Adaptive Variability as an account of those processes. 

REFERENCES 

Blumstein S and Stevens K N (1979): "Acoustic Invariance in
Speech Production: Evidence from Measurement of
the Spectral Characteristics of Stop Consonants",
J Acoust Soc Am 72, 43-30.

Blumstein S and Stevens K N (1981): "Phonetic Features and
Acoustic Invariance in Speech", Cognition 10, 23-32.

Cole R A (1973): "Listening for Mispronunciations: A Measure
of What We Hear during Speech", Perception and
Psychophysics 13, 153-156.

Delattre, P (1969): "The General Phonetic Characteristics of
Languages: An Acoustic and Articulatory Study of
Vowel Reduction in Four Languages", Mimeographed
Report, University of California, Santa Barbara.

Engstrand, O (1987): "Articulatory Correlates of Stress and
Speaking Rate", accepted for publication in J
Acoust Soc Am.

Flanagan, J (1955): "A Difference Limen for Vowel Formant
Frequency", J Acoust Soc Am 27:613-614.

Fischer-Jørgensen E (1964): "Sound Duration and Place of
Articulation", Zeitschrift für Sprachwissenschaft
und Kommunikationsforschung 17:175-207.

Fonagy I and Fonagy J (1966): "Sound Pressure Level and
Duration", Phonetica 15:14-21.

Fowler C A, Rubin P, Remez R E and Turvey M T (1980):
"Implications for Speech Production of a General
Theory of Action", 373-420 in Butterworth, B (ed):
Language Production, vol I, London:Academic Press.

Gay, T (1978): "Effect of Speaking Rate on Vowel Formant
Movements", J Acoust Soc Am 63 (1):223-230.

Gay T, Lindblom B and Lubker J (1981): "Production of Bite
Block Vowels: Acoustic Equivalence by Selective
Compensation", J Acoust Soc Am 69 (3), 802-810.

Grosjean, F (1980): "Spoken Word Recognition and the Gating
Paradigm", Perception and Psychophysics 28, 267-283.

Henke, W J (1966): Dynamic Articulatory Model of Speech
Production Using Computer Simulation, Doctoral
dissertation, M. I. T.

Hunnicutt, S (1985): "Intelligibility versus Redundancy -
Conditions of Dependency", Language and Speech
28 (1):47-56.

Keating, P (1985): "Universal Phonetics and the Organization
of Grammars", 115-132 in Fromkin, V A (ed):
Phonetic Linguistics, Orlando, FL:Academic Press.

Kelso J A S, Saltzman, E L and Tuller, B (1986): "The
Dynamical Perspective on Speech Production: Data
and Theory", J of Phon 14:1, 29-59.

Kewley-Port, D (1983): "Time-varying Features as Correlates
of Place of Articulation in Stop Consonants", J
Acoust Soc Am 73:322-355.

Krull, D (1987): "Evaluation of Distance Metrics Using
Swedish Stop Consonants", paper submitted to the
XIth ICPhS, Tallinn, Estonia.

Kuehn, D P and Moll, K L (1976): "A Cineradiographic Study
of VC and CV Articulatory Velocities", J of Phon
4:303-320.

Labov, W (1972): Sociolinguistic Patterns,
Philadelphia:University of Pennsylvania.

Lehiste, I (1970): Suprasegmentals, Cambridge, MA:MIT Press.

Lieberman, P (1963): "Some Effects of Semantic and
Grammatical Context on the Production and
Perception of Speech", Language and Speech 6:172-187.

Liberman A M, Harris K S, Hoffman H S and Griffith B C
(1957): "The Discrimination of Speech Sounds
within and across Phoneme Boundaries", J of
Experimental Psychology 54:358-368.

Lindblom, B (1963): "Spectrographic Study of Vowel
Reduction", J Acoust Soc Am 35:1773-1781.

Lindblom, B (1967): "Vowel Duration and a Model of Lip
Mandible Coordination", STL-QPSR 4/1967, 1-29 (Dept
of Speech Communication, RIT, Stockholm).

Lindblom B, Lubker J and Gay T (1979): "Formant Frequencies
of Some Fixed-Mandible Vowels and a Model of
Speech Motor Programming by Predictive
Simulation", J of Phonetics 7, 147-161.

Lindblom B, Lubker J, Lyberg B, Branderud P and Holmgren K
(in press): "The Concept of Target and Speech
Timing", to appear in Festschrift for Ilse
Lehiste, (Foris:Dordrecht).

Lindblom, B and Lindgren R (1985): "Speaker-Listener
Interaction and Phonetic Variation", Perilus IV,
Dept of Linguistics, University of Stockholm.

Lindblom B, MacNeilage P and Studdert-Kennedy M
(forthcoming): Evolution of Spoken Language,
Orlando, FL:Academic Press.

Lindblom, B and Maddieson, I (1988): "Phonetic Universals in
Consonant Systems", to appear in Hyman, L M and
Li, C N (eds): Language, Speech and Mind, Croom
Helm.

MacNeilage, P (1970): "Motor Control of Serial Ordering of
Speech", Psychological Review 77:182-196.

MacNeilage, P (1980): "Speech Production", Language and
Speech 23 (1), 3-24.

Maddieson, I (1984): Patterns of Sound, Cambridge:Cambridge
University Press.

Marslen-Wilson, W D and Welsh, A (1978): "Processing
Interactions and Lexical Access during Word
Recognition in Continuous Speech", Cognitive
Psychology 10, 29-63.

Netsell R, Kent, R and Abbs J (1978): "Adjustments of the
Tongue and Lip to Fixed Jaw Positions during
Speech: A Preliminary Report", Conference on
Speech Motor Control, Madison, Wisconsin.

Nooteboom, S G (1981): "Lexical Retrieval from Fragments of
Spoken Words: Beginnings vs Endings", J of
Phonetics 9, 407-424.

Nord, L (1986): "Acoustic Studies of Vowel Reduction in
Swedish", STL-QPSR 4/1986, 19-36 (Dept of Speech
Communication, RIT, Stockholm).

Ohala, J J (1980): "Chairman's Introduction to Symposium on
Phonetic Universals in Phonological Systems and
their Explanation", 184-185 in Proceedings of the
IXth International Congress of Phonetic Sciences
1979, Institute of Phonetics, University of
Copenhagen.

Ohala, J J (1981): "The Listener as a Source of Sound
Change", 178-203 in Masek, C S, Hendrick, R A and
Miller, M F (eds): Papers from the Parasession on
Language and Behavior, Chicago:Chicago Linguistic
Society.

Ohala, J J and Feder, D (1986): "Speech Sound Identification
Influenced by Adjacent "Restored" Phonemes", J
Acoust Soc Am 80, S110.

Öhman, S (1966): "Coarticulation in VCV Utterances:
Spectrographic Measurements", J Acoust Soc Am
39:151-168.

Öhman, S (1967): "Numerical Model of Coarticulation", J
Acoust Soc Am 41:310-320.

Perkell, J and Klatt, D (1986): Invariance and Variability
in Speech Processes, Hillsdale, N J:LEA.

Pollack, I and Pickett, J M (1964): "Intelligibility of
Excerpts from Fluent Speech: Auditory vs
Structural Context", J Verb Learn and Verb Beh
3:79-84.

Risberg, A (1979): Doctoral dissertation, RIT, Stockholm.

Schulman, R (forthcoming): "Articulatory Dynamics of Loud
and Normal Speech", submitted to J Acoust Soc Am.

Stevens, K N and House A S (1963): "Perturbation of Vowel
Articulations by Consonantal Context: An
Acoustical Study", J Speech Hearing Res 6:111-128.

Stevens K N and Blumstein S (1978): "Invariant Cues for
Place of Articulation in Stop Consonants", J
Acoust Soc Am 64, 1358-1368.

Stevens K N and Blumstein S (1981): "The Search for
Invariant Correlates of Phonetic Features", in Eimas,
P and Miller J (eds): Perspectives on the Study of
Speech, Hillsdale, N J:LEA.

Sundberg, J (1975): "Formant Technique in a Professional
Singer", Acustica 32 (2), 89-96.

Traunmüller, H (1981): "Perceptual Dimension of Openness in
Vowels", J Acoust Soc Am 69, 1465-1475.

Warren, R (1970): "Perceptual Restoration of Missing Speech
Sounds", Science 167, 392-393.

Westbury, J and Keating P (1980): "Central Representation of
Vowel Duration", J Acoust Soc Am 67, Suppl 1, S37 (A).

ARTICULATORY DYNAMICS OF LOUD AND NORMAL SPEECH* 


Introduction 

The present study was initiated to compare the movements and timing 

relationships of the lips and jaw for normal and loud speech 

productions. 

Swedish vowels varying in several degrees of openness were 

produced in a bilabial stop context. 

Considering findings reported in 

the literature, we could expect to observe the following for the loud as 

compared with normal productions: 

1) There will be an increase in jaw opening for all vowels 

(Schulman 1985). 

2) Sussman et al. (1973), Gay (1977), and Macchi (1985) have all
found that jaw position during bilabial stops is lower before and
after open vowels than close vowels. This coarticulatory relationship
should be even more pronounced here, given the increase in jaw
openings for all vowels.

3) Folkins and Abbs (1975) reported that when jaw opening was
increased by resistive loading during the closure of bilabial stops,
closure was achieved primarily by compensatory movement of the upper
lip and to a lesser extent by the lower lip, which assumes a more
elevated position with respect to the jaw. Given expectation 1, we
should, in our "natural bite-block" condition, also see such
compensation.

4) In the bite block work of Netsell, Kent and Abbs (1978) 

compensatory lip movement during bilabial closure was accompanied by 

increases in the velocities of the articulators in order to achieve 

closures of the same duration as for normal productions. 

If the analogy 

between bite block and loud speech is a valid one, similar temporal 

characteristics should also be observed. 

What effects this will have on 

the coordination of the individual articulators is uncertain. 

Gay 

(1977) reports specific timing relationships between the articulators 

for bilabial stops. 

Will these relationships be maintained despite the 

increased movement and velocities expected for the loud productions? 

Will similar articulatory patterns be observed for the vowels? 

* Paper presented at the Swedish Phonetics Conference, Uppsala, October 

17-18, 1986 


Procedure 

A magnetometer system (Branderud, 1985) was used to track the 

movements of the lips and jaw. 

Magnetic coils were placed along the mid-sagittal plane on the
vermilion border of the upper and lower lips and at the base of the
incisors. The movements recorded were in the y-plane for all three
articulators and, in addition, in the x-plane for the jaw.

An electroglottograph was used to register the opening and closing 

of the glottis. 

Two channels with different gain settings were used for 

recording the audio signal. 

All signals were recorded simultaneously 

with a Racal seven channel tape recorder at a speed of 30 ips. 

Four 

subjects were used, three male and one female, between the ages of 22 

and 30. 

The speech of all subjects is typical of that for speakers of 

standard Swedish. 

The recordings were made in a sound-treated booth at 

the Phonetics Lab, Stockholm University. 

The speech material consisted of twelve Swedish vowels appearing in
an /i'b_b/ frame. By placing an unstressed /i/ before the first /b/
in the frame, one might induce the jaw to attain a similar degree of
openness during the stop closure regardless of the following vowel's
openness. In other words, due to the high position of the jaw for the
/i/, its start position from the /b/ should be minimally low and
relatively the same for all vowels to follow. One must, however,
acknowledge that this presents us with a conflict of intent, for in
the process of influencing the jaw's position during the bilabial as
expressed here, we would be reducing the right-to-left coarticulatory
effects which we have set out to study (point 2 in the introduction).

Six lists of the words, each in a different order, were written on
separate cards and were held by the author during the actual
recording. On each card were fifteen words, that is, all the twelve
vowels appearing in the frame, plus three additional words. In
Swedish, both [ɛ] and [æ] have the same orthographic representation,
"ä". Therefore, for purposes of clarification, the stimuli containing
these vowels were preceded by real words containing the appropriate
vowel quality: "i bäver" for [ɛ] and "i bär" for [æ]. In addition, one
of the remaining ten words was repeated in list-final position to
eliminate "end of list" pronunciation for the preceding word. The
productions of these three additional stimuli were not examined
during the analysis.

All the lists were read through first with normal vocal effort then 

with loud effort. 

The first list for both conditions was considered a practice list to
be discarded during analysis. Prior to the recording, this first list
was read through several times, allowing the subject to familiarize
himself with the material. Each subject needed several attempts
before feeling comfortable in distinguishing between the two
"ibäb"s. Despite this procedure, subjects were often requested to
retake productions of these words during the actual recording.

Summary of Results 

Though there is decided variability between subjects, in general we 

have found that for loud speech as compared with normal speech the 

following (as illustrated by the data for subject H.H.) holds true: 

1. In vowel production, movement increases dramatically for all
articulators in regular, predictable fashions. For the jaw,
distinctions in vowel height are maintained (Figs. 1a and 1b), while
the lips clearly reflect the differences in degree of rounding and
spreading (Fig. 2).

2. Coarticulation is manifested as a right-to-left effect on jaw
positioning. The increased displacement of the jaw for the loud
stressed vowels had the consequence of causing its highest position
in the preceding bilabial to lower by almost twice its normal amount
(with the exception of speaker C.K.) (Fig. 3). Coarticulation is also
demonstrated by the nearly identical positioning of the jaw for the
initial and stressed /i/ and the following bilabial.

3. The lowered jaw position for bilabial closure provides an 

articulatory setting strongly reminiscent of that induced by applying 

artificial perturbations to the jaw. 

This gives us cause to regard the 

shouting paradigm as providing us with a "natural" bite block. 

4. In deference to this "natural" bite block during the bilabial
closure, motor equivalence (cf. Hughes and Abbs, 1976) is
demonstrated, whereby the upper lip compensates for the lowered jaw
(hence, inferior lower lip position), achieving closure even more
complete than for normal production (see Table I).

5. It was demonstrated that increased articulatory movement cannot
always be depicted as a simple linear amplification of normal
articulation by scale factors, but also entails a more complex
goal-oriented reorganization of specific movements (Fig. 5).

6. Greater displacements are accompanied by increased velocities
(Fig. 6). For loud speech, this results in shorter durations of
intervocalic bilabials. Durations of stressed loud vowels are,
however, somewhat longer than normal productions (Fig. 7). To achieve
durations equal to or shorter than those for normal speech, the
ratios of velocity to displacement would have to have been greater
(as was the case with speaker T.L.). We are thus presented with CV
sequences only slightly longer for loud speech as compared with
normal speech. Both phonological distinctions (short/long) and the
inherent length of vowels associated with openness are maintained
during loud speech.

7. The timing order of the articulators is neither very stable nor 

predictable, across vowel contexts, speech conditions and speakers. 

More synchrony is found between the lower lip and the jaw in the 

production of loud vowels as compared with normal, whereas the converse 

is true for synchrony between the upper lip and the lower lip and jaw. 

References 

Branderud, P. (1985). "Movetrack - A movement tracking system".
PERILUS IV, Stockholm University, 20-29.

Folkins, J. and Abbs, J. (1975). "Lip and jaw motor control during
speech: Responses to resistive loading of the jaw". J. Speech Hearing
Res. 18, 207-220.

Gay, T. (1977). "Articulatory movements in VCV sequences". J. Acoust.
Soc. Am. 62, 183-193.

Hughes, O. and Abbs, J. (1976). "Labial-mandibular coordination in the
production of speech: Implications for the operation of motor
equivalence". Phonetica 33, 199-221.

Macchi, M. (1985). Segmental and suprasegmental features and lip and
jaw articulators. Ph.D. dissertation, New York University.

McAllister, R., Lubker, J. and Carlson, J. (1974). "An EMG study of
some characteristics of the Swedish vowels". Journal of Phonetics 2,
267-278.

Netsell, R., Kent, R. and Abbs, J. (1978). "Adjustments of the tongue
and lips to fixed jaw positions during speech: A preliminary report".
Paper presented at the Conference on Speech Motor Control, University
of Wisconsin, Madison, Wisconsin.

Schulman, R. (1985). "Articulatory targeting and perceptual constancy
of loud speech". PERILUS IV, Stockholm University, 86-91.

Sussman, H.M., MacNeilage, P.F. and Hanson, R.J. (1973). "Labial and
mandibular dynamics during the production of bilabial consonants:
Preliminary observations". J. Speech Hearing Res. 16, 397-420.


TABLE I. Mean displacement in millimeters from rest position of
articulators at point of minimum lip separation during the first
bilabial stop of the /i'bVb/ test words. Values are averaged for
twelve Swedish vowels (5 tokens for each vowel) for normal (X_N),
loud (X_L) and the difference between normal and loud productions
(ΔX_LN).

FIG. 1. Jaw opening during production of stressed Swedish vowels 

plotted according to their traditional phonological classification in 

terms of front, central, back and rounding. 

Normal and loud productions for each speaker appear in separate plots
(1a and 1b, respectively).

FIG. 2. Maximum lip separation for loud speech plotted against normal 

productions for the stressed vowels. 

The traditional terms inrounded, outrounded and spread are used. Two
regression lines are fitted to the data: for outrounded vowels and
for spread vowels. Inrounded vowels are not included in the
calculation of these lines.

FIG. 3. Position of the jaw at minimum displacement during the 

production of the first bilabial stop against the jaw's position at 

maximum displacement during the following vowel. 

Separate regression 

lines are fitted for normal and loud productions. 

FIG. 4. Position of the jaw at maximum displacement during the
production of the frame initial segment ([i]) against the position of
the jaw at minimum displacement in the following bilabial stop.

A single 

regression line is fitted to both normal and loud productions. 

FIG. 5. Averaged movements of the: (a) upper lip component; (b) lower
lip component; (c) jaw component for speaker H.H.'s productions of
/i'bab/. For each articulator the normal (1), loud (2) and normal
multiplied by a scale factor (3) movements are presented. Figure d
shows lip separation for this sequence and is produced by subtracting
the movement curves of Figure a from the sum of the curves in Figures
b and c. The normal and loud audio signals for this sequence are
displayed at the top of each figure.

FIG. 6. Peak velocity of the jaw during movement from the closure of the 

first bilabial stop to its position of maximum displacement during the 

stressed vowel versus displacement of jaw from the position of minimum 

excursion (b1) to maximum excursion (V). One regression line is
fitted to normal and loud productions.

FIG. 7. Acoustic durations are plotted for each vowel context for
normal and loud productions of the initial bilabial stop and the
stressed vowel. The vowel segment (white bar) begins at the
termination of the bilabial stop (black bar).


Table I

ARTICULATORY          H.H.                  T.L.                  P.H.                  C.K.
PARAMETER       X_N    X_L  ΔX_LN     X_N    X_L  ΔX_LN     X_N    X_L  ΔX_LN     X_N    X_L  ΔX_LN

LL-UL         -1.00   0.03   1.03   -1.51  -2.15  -0.64   -2.06  -0.83   1.23   -0.51   2.19   2.70
LL            -1.96  -3.59  -1.63   -2.30  -2.88  -0.58   -4.16  -6.09  -1.93   -0.80   1.94   2.74
JY            -3.09  -5.57  -2.48   -4.16  -5.08  -0.92   -5.81  -8.16  -2.35   -4.46  -3.33   1.13
LL-J           1.13   1.97   0.84    1.87   2.20   0.33    1.65   2.07   0.42    3.66   5.27   1.61
UL            -0.95  -3.62  -2.67   -0.78  -0.73   0.05   -2.10  -5.26  -3.16   -0.30  -0.25   0.05
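As a quick arithmetic check on Table I, the short Python sketch below recomputes the loud-minus-normal differences for speaker H.H. and compares two derived quantities with the tabulated rows. It assumes that LL-J is the lower lip position relative to the jaw (LL minus JY) and that LL-UL is the lower lip position relative to the upper lip (LL minus UL); these readings are inferred from the parameter names and are not stated in the text. The function names are illustrative only.

```python
# (normal, loud) mean displacements in mm for speaker H.H., copied from Table I.
HH = {
    "LL-UL": (-1.00, 0.03),
    "LL": (-1.96, -3.59),
    "JY": (-3.09, -5.57),
    "LL-J": (1.13, 1.97),
    "UL": (-0.95, -3.62),
}


def difference(row):
    """Loud minus normal displacement, rounded to the table's precision."""
    normal, loud = row
    return round(loud - normal, 2)


def minus(a, b):
    """Element-wise a - b for two (normal, loud) pairs."""
    return tuple(round(x - y, 2) for x, y in zip(a, b))


if __name__ == "__main__":
    for name, row in HH.items():
        print(f"{name:6s} loud - normal = {difference(row):+.2f} mm")
    # Assumed relations between the rows (not stated in the paper).
    print("LL - JY:", minus(HH["LL"], HH["JY"]), "tabulated LL-J:", HH["LL-J"])
    print("LL - UL:", minus(HH["LL"], HH["UL"]), "tabulated LL-UL:", HH["LL-UL"])
```

Under these assumptions the derived values agree with the tabulated rows to within 0.01 mm, which is the rounding one would expect from averaged data. For this speaker the upper lip moves down by 2.67 mm in loud speech, more than the jaw (2.48 mm), in line with the compensation described in point 4 of the summary.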


[Figure 1a (normal productions): plot not reproduced.]

[Figure 2: plot not reproduced.]

[Figure 4: plot not reproduced.]

[Figure 6: plot not reproduced; fitted regression line y = 7.1x + 10.9.]

AN EXPERIMENT ON THE CUES TO THE IDENTIFICATION OF FRICATIVES 

HARTMUT TRAUNMÜLLER

DIANA KRULL 

ABSTRACT 

Synthetic fricatives with two spectral peaks scanning a wide range of
frequencies were put into three versions of the context [a_ɛː], also
generated synthetically, and imitating a male speaker (1), a child
(2), and an aroused male speaker (3) with elevated F0 and F1. The
stimuli were presented in two orders, with increasing or decreasing
frequencies of the spectral peaks, to 16 speakers of Swedish who
identified the fricatives as [f], [s], [ɕ], [ʂ], or [ɧ]. In a given
context, the obtained phonetic boundaries followed mainly the
spectral peak lowest in frequency, while the upper peak contributed
only marginally even if it was at a distance less than the "critical
distance" of about 3 Bark. In context (2), as compared with (1), the
phonetic boundaries were shifted up, but less (in Bark) than the
vowel formants.

INTRODUCTION 

It is well known that the characteristic frequencies, i.e., the

frequencies of the formants and the fundamental in speech sounds with a 

given phonetic quality vary with the overall dimensions of the speaker's 

vocal tract. If the characteristic frequencies of vowels are converted 

into a measure of tonotopical place, such as critical band rate (Bark), 

differences in speaker size can be seen to correspond to a tonotopic 

translation of the auditory pattern of excitation [11]. 

Identifications of synthetic two-formant vowels revealed that a uniform 

tonotopic compression of the auditory pattern of excitation with a 

fixed point in the region of F3 also preserves phonetic quality [12]. 

Natural vowels are transformed in this way in shouting and in
whispering [11].

The present investigation is about the transformations the spectra of
voiceless fricatives can be subjected to without affecting their
phonetic quality. It is known that voiceless fricatives can be
synthesized satisfactorily with two resonances and one antiresonance
and that the cues to the phonetic identity of voiceless sibilants
reside mainly in the stationary part of their spectrum, while the
transitions are more important for non-sibilants [5, 7].
One-parameter sibilants can be synthesized using a resonance and an
antiresonance one octave lower in frequency [5]. Such sibilants lack
intrinsic cues to speaker size. In spectrogram reading, the Swedish
voiceless sibilants can be distinguished by the frequency of spectral
energy onset, while there is more variation, even within the same
speaker and context, in the detail above that frequency [6]. A second
characteristic spectral peak can, however, often be discerned, and
one question we address here is whether this second peak is used to
normalize for speaker size. We also investigate how far a vocalic
context can serve this purpose.

METHODS 

Subjects 

The experiments were conducted with a group of 20 native and 6
non-native speakers of Swedish, all employees or students at the
Institute of Linguistics at Stockholm University. None of them
reported auditory handicaps and all were familiar with the phonetics
of Swedish, possessing /f/, /s/, /ɕ/, and /ʃ/. We report here the
results of 16 native speakers with uniform behavior, mostly speakers
of the local variety with the distributional allophones [ʂ] and [ɧ]
for /ʃ/, but including three speakers of southern varieties, who had
no [ʂ] in their own speech.

Stimuli 

The stimuli were synthetic VCV sequences. The vocalic segments had
been obtained by synthetic imitation of a natural [a'sɛː], produced
by a male speaker of Swedish (Stockholm variety). A three-parameter
voice source [3] signal in accordance with that utterance was
generated by the procedure described in [12]. The vocalic as well as
the fricative segments were generated in serial synthesis by use of a
block diagram simulating program (sampling at 16 kHz, 16 bit/sample).
Eight vowel formants were used. Their bandwidths obeyed the standard
relation

    B_i = 0.05 F_i + 50 Hz.

The fricatives were generated by feeding white noise through a
high-pass and a low-pass resonance filter, both of second order and
with Q=10. The two resonance frequencies Fl and Fh were varied in
steps of a factor 4^(1/9) (approx. 1.0 Bark). 42 combinations of Fl
and Fh were used to scan the auditory space as shown in Figure 1. The
fricatives had a duration of 0.20 s, and the intensity onset and
offset of the natural [s] were also imitated.
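The claim that one step of a factor 4^(1/9) corresponds to roughly 1.0 Bark can be checked numerically. The Python sketch below uses the Hz-to-Bark conversion given as Equation [2] in this paper; the probe frequencies and the function names are chosen here for illustration and are not taken from the stimulus specification.

```python
STEP = 4.0 ** (1.0 / 9.0)  # frequency ratio between neighbouring grid points


def hz_to_bark(f_hz: float) -> float:
    """Critical band rate in Bark (Equation [2] of this paper)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_step(f_hz: float) -> float:
    """Bark distance covered by one multiplicative step starting at f_hz."""
    return hz_to_bark(f_hz * STEP) - hz_to_bark(f_hz)


if __name__ == "__main__":
    print(f"step factor = {STEP:.4f}; nine steps = {STEP ** 9:.2f}")
    for f in (1000.0, 2000.0, 4000.0):  # illustrative probe frequencies
        print(f"{f:6.0f} Hz: one step = {bark_step(f):.2f} Bark")
```

Nine such steps span a factor of four, i.e. two octaves. In the region probed, a single step comes out at roughly 0.9 to 1.0 Bark, which fits the approximation stated in the text.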

A second version of the vowel context was obtained by a uniform
translation of all vowel formant frequencies by +2.5 Bark. The voice
source parameters were rescaled in such a way that the mean F0,
weighted according to amplitude, was also translated by +2.5 Bark.
This transformation produces the characteristic frequencies in vowels
of children four to five years of age from those of the same vowels
pronounced by men [11].

A third version of the vowel context was obtained by a uniform
tonotopic compression of all formant frequencies and the weighted
mean F0. The compression is described by Equation [1]:

    z = z_0 + 0.15 (15.5 - z_0)        [1],

where z_0 is the critical band rate of a characteristic peak in the
original version, and z is the corresponding value in the compressed
version. This transformation produces the characteristic frequencies
of shouted vowels from those of the original [11]. Between these
modes of speech, there are, however, additional differences which
have not been imitated in our stimuli, which provoked the impression
of being produced by an aroused speaker rather than by a shouting
one.

For conversion of the vowel formant frequencies f (in Hz) into
critical band rate z (in Bark), Equation [2] was used, which agrees
to within ± 0.05 Bark with the empirical values [13] in the range of
0.2 to 6.7 kHz [10]. Equation [3] was used for reconversion.

    z = (26.81 f / (1960 + f)) - 0.53        [2]

    f = 1960 (z + 0.53) / (26.28 - z)        [3]
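Equations [2] and [3] translate directly into code. The following Python sketch is a minimal transcription of the two formulas; the function names are chosen here for illustration. Equation [3] is the exact algebraic inverse of Equation [2], so a round trip returns the starting frequency up to floating-point error.

```python
def hz_to_bark(f_hz: float) -> float:
    """Equation [2]: critical band rate z (Bark) from frequency f (Hz)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark: float) -> float:
    """Equation [3]: frequency f (Hz) from critical band rate z (Bark)."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


if __name__ == "__main__":
    # Probe frequencies inside the 0.2 to 6.7 kHz range quoted in the text.
    for f in (200.0, 751.0, 2501.0, 6700.0):
        z = hz_to_bark(f)
        print(f"{f:7.1f} Hz -> {z:6.3f} Bark -> {bark_to_hz(z):7.1f} Hz")
```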

The formants, which were stationary, had the frequencies listed in
Table 1 together with the weighted mean F0.


Table 1: The characteristic frequencies of the three versions of the
same vowels (in Hz).

        Neutral male     Neutral child    Aroused male
        [a]    [ɛː]      [a]    [ɛː]      [a]    [ɛː]
F0      102    110       327    337       298    306
F1      751    442      1153    751       945    639
F2     1248   1799      1826   2617      1421   1932
F3     2501   2390      3702   3525      2558   2461
F4     3359   3413      5160   5258      3287   3332
F5     4311   4386      6977   7131      4052   4111
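The two context transformations can be sketched in a few lines of Python: a translation by +2.5 Bark for the child version and the compression of Equation [1] for the aroused version, both applied on the Bark scale of Equations [2] and [3]. The function names are illustrative, and the comparison uses only the F1 and F3 values of the neutral male [a] from Table 1.

```python
def hz_to_bark(f_hz: float) -> float:
    """Equation [2]."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark: float) -> float:
    """Equation [3]."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def translate(f_hz: float, shift_bark: float = 2.5) -> float:
    """Uniform tonotopic translation (child version)."""
    return bark_to_hz(hz_to_bark(f_hz) + shift_bark)


def compress(f_hz: float, factor: float = 0.15, fixed_bark: float = 15.5) -> float:
    """Equation [1]: compression towards a fixed point (aroused version)."""
    z0 = hz_to_bark(f_hz)
    return bark_to_hz(z0 + factor * (fixed_bark - z0))


if __name__ == "__main__":
    # Neutral male [a] values and their tabulated counterparts (Table 1).
    rows = {"F1": (751.0, 1153.0, 945.0), "F3": (2501.0, 3702.0, 2558.0)}
    for name, (male, child, aroused) in rows.items():
        print(f"{name}: child {translate(male):7.1f} (table {child:.0f}), "
              f"aroused {compress(male):7.1f} (table {aroused:.0f})")
```

The computed frequencies fall within about 1 Hz of the tabulated ones, which suggests the table was produced with exactly these formulas. Formants above the fixed point of 15.5 Bark (about 2.9 kHz) move slightly downwards under the compression, as can be seen for F4 and F5 in the aroused male columns.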

After D/A conversion the stimuli were recorded on tape in two different 

orders. First, Fl and Fh started at their highest values, 24 and 25 

log. units. Fl subsequently decreased in steps of 2 u. and Fh in steps 

of 1 u. until the distance between the two peaks reached 7 u. In the 

following descending series of stimuli Fl and Fh started 1 u. below the 

initial values, etc. In the second order Fl and Fh started at their 

lowest values, 7 and 14 u., and ascended in reversal of the first order. 

Each stimulus had a duration of 0.8 s and was presented twice in

succession with an interval of 1.5 s. In the following, any sequence of 

this kind is considered as one "stimulus". Each stimulus was followed by 

a pause of 2.5 s for the subjects to respond. A pause of 5 s was inserted 

before each new series of stimuli. The stimuli were presented in six 

blocks, beginning with the neutral male version in the first (1) order, 

followed by child (2), aroused male (1), neutral male (2), child (1), 

and aroused male (2). 

Procedure 

The subjects were tested in a quiet, sound-treated room and the
stimuli were presented to them via Sennheiser HD414 headphones at a
comfortable listening level. The subjects received answer sheets with
a set of the five symbols "θ, s, tj, rs, sj" for each stimulus. After
the meaning of the symbols ([θ] or [f], [s], [ɕ], [ʂ], [ɧ]) had been
explained and a few stimuli had been presented for acquaintance, the
subjects were asked to mark for each stimulus the symbol of the
fricative they had heard. They were allowed to mark two different
symbols in cases of doubt. Single-symbol responses were counted as
two markings of the same symbol.

Two-dimensional histograms were obtained from the distribution of
assigned labels as a function of the Fl and Fh values. The histograms
were locally normalized with respect to the total number of responses
to each stimulus and smoothed by a spatial cosine filter. "Phonetic
boundaries", say between [s] and [ɕ], were obtained by considering
only the [s] and [ɕ] labels and computing the 50 % level curve.
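The scoring rule and the 50 % criterion can be illustrated with a simplified Python sketch. It reduces the problem to one dimension (stimuli ordered along Fl only), omits the spatial cosine smoothing, whose parameters are not given in the text, and locates the boundary by linear interpolation between neighbouring stimuli. The response data and all function names are hypothetical.

```python
from collections import Counter


def tally(responses):
    """Count markings; a single-symbol response counts as two markings."""
    counts = Counter()
    for marked in responses:  # each response is a tuple of 1 or 2 symbols
        if len(marked) == 1:
            counts[marked[0]] += 2
        else:
            counts.update(marked)
    return counts


def proportion(counts, a, b):
    """Share of label a among a and b markings; None if neither occurs."""
    total = counts[a] + counts[b]
    return counts[a] / total if total else None


def crossing_50(xs, ps):
    """Interpolate the x at which the proportion crosses 0.5."""
    points = list(zip(xs, ps))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 is None or p1 is None or p0 == p1:
            continue
        if (p0 - 0.5) * (p1 - 0.5) <= 0:
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    return None


if __name__ == "__main__":
    # Hypothetical responses of four listeners at four Fl values (in Bark).
    data = {
        14: [("tj",), ("tj",), ("tj",), ("tj",)],
        15: [("tj",), ("tj",), ("tj", "s"), ("s",)],
        16: [("s",), ("s",), ("tj", "s"), ("s",)],
        17: [("s",), ("s",), ("s",), ("s",)],
    }
    xs = sorted(data)
    ps = [proportion(tally(data[x]), "s", "tj") for x in xs]
    print("proportion of 's':", ps)
    print("50 % boundary at", crossing_50(xs, ps), "Bark")
```

Restricting the proportion to the two labels of interest mirrors the procedure described above, in which only the labels on either side of a boundary are considered.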

RESULTS AND DISCUSSION 

Effects of presentation order

"θ"-labels were infrequent and mainly attached at the highest
resonance frequencies and, occasionally, at the very lowest. The
boundaries between the sibilants are shown in Figure 1. The effect of
contrast can clearly be seen at the [ɕ] - [ʂ] boundary, which is
shifted by 0.9 Bark in Fl between the two orders of presentation.
Since contrast presupposes that at least one similar stimulus has
been heard, there is no such effect at the beginning of each series
(shown with thin lines in Figure 1). There, the responses are,
instead, likely to be biased by expectation towards [s] or [ɧ]
responses because the previous series of stimuli began with these
sounds. Outside this region, the [s] - [ɕ] boundary is shifted just
as much as the [ɕ] - [ʂ] boundary. As for the boundary between [ɧ]
and [ʂ], the responses are likely to be biased towards [ʂ], because
this allophone would normally occur in an /aʃɛː/ sequence as
pronounced by most of our subjects. This would explain the deviant
course of this boundary in the second order of presentation.

Effects of intrinsic properties 

The perceptual role of the two spectral peaks in our stimuli can be
understood by studying the slopes of the boundaries in Figure 1. The
boundaries whose slope is not affected by order effects are well
approximated by straight lines. Two of them ([ɕ] - [s] and [ʂ] - [ɧ])
have a course almost perpendicular to the Fl-axis, implying that the
higher resonance Fh is practically irrelevant for these distinctions.
Then, of course, the distance between the spectral peaks is also
irrelevant. Thus, intrinsic properties of these stimuli were not used
to normalize for speaker size.

Phonetic boundaries might possibly be given by a gross center of 

spectral gravity, like perceived "sharpness" [1]. 

Since Fh does affect 

the sharpness of our stimuli - as affirmed by informal listening - the 

results show that sharpness is not an invariant quantity in sibilants 

with a given phonetic quality. 

If the resonances are separated by less than the critical distance of
3.5 Bark observed by Chistovich et al. [2], the phonetic boundaries
might be expected to reflect an integrated spectral peak. The main
part of our [ɕ] - [s] boundary runs through an area where Zh - Zl <
3.5 Bark (see Figure 1). The slope of this line indicates, however,
that this phonetic decision is based only on the pitch of the lower
spectral peak or on the spectral onset of auditory excitation.
Similar results have been obtained in non-phonetic pitch matching
tasks [4, 9] for frequencies below 1 kHz.

Figure 1: Phonetic boundaries between Swedish sibilants. First
(continuous) and second (dashed) order of presentation. Pooled
contexts. [Plot not reproduced.]

The boundaries between [ɕ] and [ʂ] are, however, not completely
independent of Fh. This may be due to the fact that [ɕ] and [ʂ] are
the sibilants for which our synthetic stimuli were closest to the
natural versions, as judged by comparison with measured spectra of
Swedish sibilants [9, 8]. The other phonetic boundaries might have
followed a similar course if the stimuli had been closer imitations
of natural sibilants.

The phonetic boundaries can be described by Equation [4]: 

I· 1. 

[4 ], 

where ki is a factor expressing the perceptual weight of Zhi (see
Table 2), and Ii is a constant characteristic of boundary i. The
factor k might reflect the goodness of fit between the auditory
spectra of the synthetic stimuli and those of natural sibilants, but
it might, alternatively, be a function of (Zh - Zl). In that case the
phonetic boundaries in Figures 1 and 2 should deviate slightly from
linearity. Interestingly, k is most negative for (Zh - Zl) ≈ 3.5
Bark. This is reminiscent of the suggestion by Syrdal et al. [8] to
regard this distance as specific to phoneme boundaries among
sonorants. While our data do not immediately support this for
sibilants (the observed boundaries are not perpendicular to the
(Zh - Zl)-axis), they do show a tendency in this direction.

Table 2: Perceptual weight k of Fh in relation to that of Fl, cf.
Equation [4].

Phonetic boundary
k                    -0.05   -0.20   -0.27   -0.10

Effects of context 

Since intrinsic normalization for speaker size is almost absent in
our results, we would expect such a normalization, which
theoretically would be appropriate, to be mediated by context. Figure
2 illustrates the effects of transforming the spectrum of the vowel
context. We can see that the boundaries between sibilants are
affected by the acoustic properties of the vowel context, whose
phonetic quality was close to invariant.

The extent of the boundary shift between the neutral male and the
child version of the vowels (between +0.7 and +1.3 Bark) is, however,
smaller than the translation of the vowel spectra (+2.5 Bark),
especially at the [ʂ] - [ɧ] boundary.

The boundaries in the aroused male version are shifted from those in
the neutral version about halfway in the same direction as those in
the child version. The [ʂ] - [ɧ] boundary (at 11.6 Bark = 1.6 kHz) is
shifted by roughly +0.3 Bark, i.e., less than the vowel formants in
the same frequency region (+0.6 Bark). Since, further, the upper
vowel formants (above 15.5 Bark = 2.9 kHz) in the aroused male
version are not shifted upwards but slightly downwards, the shift of
the [s] - [ɕ] boundary (at Zl = 19 Bark) cannot have been guided by
the vowel formants in the same frequency region. Apparently, the
sibilant boundaries are shifted about half as much as some weighted
mean of the vowel formants, F2 given the highest weight. This would
hold approximately for both of our context transformations, but the
correlation of the extent of boundary shift with Fl remains an open
question.

Figure 2: Phonetic boundaries between sibilants in contexts of a
man's (continuous), a child's (dashed), and an aroused man's
(dash-dotted) vowels. Pooled orders of presentation. [Plot not
reproduced.]

ACKNOWLEDGEMENT 

This research has been supported by a grant from HSFR, the Council 

for Research in the Humanities and Social 

Sciences. 

REFERENCES 

[1] G. v. Bismarck, Extraktion und Messung von Merkmalen der
Klangfarbenwahrnehmung stationärer Schalle, München 1972.

[2] L. Chistovich and V. Lublinskaya, "The "center of gravity" effect
in vowel spectra and the critical distance between formants", Hearing
Res. 1, 1981, 185-195.

[3] G. Fant, "Glottal source and excitation analysis", STL-QPSR
1/1979, 85-107.

[4] R. Glave, Untersuchungen zur Tonhöhenwahrnehmung stochastischer
Schallsignale, Helmut Buske Verlag, Hamburg, 1973.

[5] J. M. Heinz and K. Stevens, "On the properties of voiceless
fricative consonants", J. Acoust. Soc. Am. 33, 1961, 589-596.

[6] P. Lindblad, Svenskans sje- och tje-ljud i ett allmänfonetiskt
perspektiv, CWK Gleerup, Lund 1980.

[7] J. Martony, "On the synthesis and perception of voiceless
fricatives", STL-QPSR 1/1962, 17-22.

[8] A. K. Syrdal and H. S. Gopal, "A perceptual model of vowel
recognition", J. Acoust. Soc. Am. 79, 1986, 1086-1110.

[9] H. Traunmüller, "Perception of timbre: ...", in R. Carlson and B.
Granström (eds.), The Representation of Speech in the Peripheral
Auditory System, Elsevier Biomed., 1982, pp. 103-108.

[10] H. Traunmüller, "Analytical expressions for the tonotopical
sensory scale", part of Ph.D. thesis, Stockholms Universitet, 1983.

[11] H. Traunmüller, "Some aspects of the sound of speech sounds",
contr. to NATO-ARW on psychophysics of speech perception, Utrecht
1986.

[12] H. Traunmüller and F. Lacerda, "Perceptual relativity in
identification of two-formant vowels", Speech Communication, 1987,
(in print).

[13] E. Zwicker, "Zur Unterteilung des hörbaren Frequenzbereiches in
Frequenzgruppen", Acustica 10, 1960, p. 185.


SECOND FORMANT LOCUS PATTERNS AS A MEASURE OF 

CONSONANT-VOWEL COARTICULATION

Diana Krull 

1. Introduction 

Formant frequencies at the consonant-vowel boundary depend not only
on the place of articulation of the consonant but also on the
adjacent vowel. Fant (1973) measured F2, F3 and F4 at stop
consonant-vowel boundaries of one male Swedish speaker. His results
showed that there is considerable variation of formant frequencies at
CV boundaries, especially in connection with voiced stops; also,
labials and velars demonstrate greater variation when compared to
dentals. The variation is most pronounced for F2: although the
difference measured in Hz is sometimes larger for F3, it will amount
to less on a perceptual scale.

Öhman (1966) used voiced stops between systematically varied
preceding and following vowels and demonstrated that F2 at the CV
boundary is influenced also by the preceding vowel. Both these
studies have shown that there is a strong coarticulation effect from
adjacent vowels on F2 at the CV boundary, thus contradicting the
claim of an invariant F2 locus made by Delattre, Liberman, and Cooper
(1955).

Coarticulation does not work in one direction only, that is, there is
also an influence from the consonant on the vowel. Thus, for example,
F2 measured in the middle of the vowel is lower in /bub/ than in
/dud/, everything else being equal; see Lindblom (1963).

The aim of this investigation was to compare the amount of
coarticulation in spontaneous speech and in isolated words on the
basis of the second formant trajectories. The differences in F2
between two points, one at the CV boundary and another in the middle
of the vowel, should decrease with increasing coarticulation because
the adjacent sounds would become more alike, although we would not be
able to tell whether it was the vowel that had influenced the
consonant or vice versa. F2 was measured at two points on a
Voiceprint spectrogram as shown in Fig. 1: (1) at the CV boundary,
and (2) in the middle of the vowel. We called the first point the
"locus" of the second formant and defined it as the frequency of the
formant at the first pulse of the vowel after consonant release.
Locus was not measured at the moment of consonant release because in
the spontaneous speech there was often no visible burst.
(Measurements at the release may also be difficult due to the rapid
transition.) The second point, measured in the middle of the vowel,
we called "target". Both terms were used in a more concrete sense
than had been done earlier: Delattre et al. (1955) had defined
"locus" as a point on the frequency scale about 50 ms before
consonant release which they considered to be the virtual starting
point of the formant; Lindblom (1963) had used "target" in the sense
of an asymptotic value towards which the formant frequency is aimed.


Fig. 1. Example of measurements on a spectrogram: F2i at the first pulse of the vowel after stop release, and F2t in the middle of the vowel.

The relation between the two points can be expressed in what we call the "locus equation"

F2i = k * F2t + c 

where F2i is the initial locus, F2t the vowel target, and k and c 

are constants. 


The value of k determines the slope of the regression line for the locus frequencies (see Fig. 2). The slope shows the amount of coarticulation: thus, for example, if k=0 then F2i=c and there is no coarticulation at all; the locus is invariant. If, on the other hand, k=1 then the locus is completely dependent on the vowel target, and there is maximal coarticulation. Other studies (Lindblom, 1963; Lindblom and Lacerda, 1985) have shown that the amount of coarticulation varies with consonant place of articulation: the strongest coarticulation is connected with the labials, the weakest with the palatal /g/, while the dentals and the retroflexes lie somewhere in between.
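As a minimal sketch of how the constants k and c of a locus equation could be estimated, the following Python function fits F2i = k * F2t + c by ordinary least squares to paired measurements. The function name `fit_locus_equation` and the sample values are assumptions made for illustration; they are not measurements from this study, and the text does not state how the regression lines were actually computed.

```python
def fit_locus_equation(f2_target, f2_locus):
    """Least-squares fit of F2i = k * F2t + c; returns (k, c)."""
    n = len(f2_target)
    if n != len(f2_locus) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(f2_target) / n
    mean_i = sum(f2_locus) / n
    # Sum of squared deviations of the targets, and the cross-deviation sum.
    sxx = sum((t - mean_t) ** 2 for t in f2_target)
    if sxx == 0:
        raise ValueError("target values must not all be equal")
    sxy = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(f2_target, f2_locus))
    k = sxy / sxx            # slope: amount of coarticulation
    c = mean_i - k * mean_t  # intercept
    return k, c


# Hypothetical F2 values in kHz, invented for this example only.
targets = [0.8, 1.2, 1.6, 2.0]
loci = [1.2, 1.4, 1.6, 1.8]
k, c = fit_locus_equation(targets, loci)
```

With these invented points the locus moves half as far as the target, so the fitted slope lies midway between the invariant-locus case (k=0) and maximal coarticulation (k=1).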

[Fig. 2. Locus-target plot; vertical axis: locus for F2 (kHz).]

How does speech style affect the amount of coarticulation? Lindblom and Lindgren (1985) investigated CV coarticulation for /bV/ and /dV/ by comparing the size of F2 trajectories between locus and target. Their results showed that there is more coarticulation in a neutral speech style than in clear speech.

Would the same kind of difference be found between words occurring in spontaneous speech and the same words spoken in isolation? That is, would words occurring in spontaneous speech in general display more CV coarticulation than words spoken in isolation?

2. Experiment 

The spontaneous speech material used in this investigation consisted of recordings made for the project "Phonetic variation in natural speech" (Lindgren, Lindblom, and Krull, 1986). The recordings were made of two male speakers of Central Swedish. Spontaneous speech was elicited in two ways: firstly, the speakers were asked to retell short stories they had been given to read beforehand; secondly, they conversed freely with each other. The recordings were made in a quiet room at the Phonetics Laboratory of the University of Stockholm.

Only word-initial CV combinations were used for measurements in this study. The first CV combinations consisted of a voiced stop followed by a vowel. Only labial and dental stops were used: /g/ before front vowels is, with few exceptions, pronounced as [j], and the velar samples before back vowels showed too little variation in F2 for meaningful locus equations to be set up. Stops that did not have a complete closure were not used.


For each speaker a list was prepared containing the words in his spontaneous speech sample that had been measured. The speakers were then asked to read the list with a short pause between items. The words were in random order without context, each occurring twice. The second occurrence was meant to be used in case the first reading of the word should present difficulties of measurement. Only one item was measured. We shall refer to the words read in isolation as "reference words" (see word list in Appendix).

The measurements of locus and target frequencies were carried out as shown in Fig. 1. The resulting locus-target plots for dentals can be seen in Fig. 3. For speaker PaT there was clearly more coarticulation in spontaneous speech, where k=.45, while the corresponding value for the reference words was k=.25. For speaker AV there was less of a difference: k=.47 in spontaneous speech and k=.43 for reference words.

A low F2 in a preceding vowel usually lowered the frequency of the initial locus slightly while a high F2 had the opposite effect. To check the amount of influence from preceding vowels, the slope of the regression line for speaker AV was calculated for cases where /d/ was preceded by another dental consonant. The result showed very little difference in slope (k=.49 as compared to k=.47) but there was much less variation in locus frequencies at a given target value.
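One way to put a number on "variation in locus frequencies at a given target value" is the scatter of the measured loci about the regression line. The sketch below computes a residual standard deviation for a given k and c. The function name `residual_sd`, the division by n - 2, and all sample values are assumptions for illustration; the text does not say how the variation was assessed.

```python
import math


def residual_sd(f2_target, f2_locus, k, c):
    """Standard deviation of locus values about the line F2i = k*F2t + c."""
    residuals = [i - (k * t + c) for t, i in zip(f2_target, f2_locus)]
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three points")
    # n - 2 degrees of freedom, the usual choice when k and c
    # were estimated from the same points.
    return math.sqrt(sum(r * r for r in residuals) / (n - 2))


# Hypothetical data in kHz: same line (k=0.5, c=0.8), different scatter.
targets = [1.0, 1.5, 2.0, 2.5]
loci_wide = [1.4, 1.45, 1.9, 1.95]      # about 0.1 kHz off the line
loci_narrow = [1.32, 1.53, 1.82, 2.03]  # about 0.02 kHz off the line

sd_wide = residual_sd(targets, loci_wide, 0.5, 0.8)
sd_narrow = residual_sd(targets, loci_narrow, 0.5, 0.8)
```

Two samples can thus share nearly the same slope, as in the comparison above, while differing clearly in how tightly the loci cluster around the line.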

[Fig. 3. Locus-target plots for dentals, speaker PaT; vertical axis: locus for F2 (kHz). Spontaneous speech: F2LOC = .45X + .81. Reference words: F2LOC = .25X + 1.13.]

The second stop consonant investigated was /b/. Earlier results referred to above and the fact that the tongue is not involved in their production led us to expect more coarticulation with labial consonants, and such was also the case here. There was also more coarticulation in spontaneous speech: for both speakers, the regression line had a lesser slope in the case of reference words (Fig. 4). In the spontaneous speech of speaker PaT the value of k=.96 indicated an almost maximal dependency on the following vowel for most of the cases. However, on the spontaneous speech plots of both speakers, there were locus-target points that formed their own group apart from the other points, their locus frequency being lower than could be expected from the target value. These points, marked with "x" on the plots, all had their origin in the same word: [bɑ:ra] 'only', which had undergone different degrees of reduction, the most extreme case being [ba]. (It should be noted that bara is a word that tends to be reduced more than other content words.)

3. Discussion 

The results show more overlapping of CV in spontaneous speech. Why should this be so? One possible explanation may lie in the time factor: no systematic comparisons of word durations have been carried out yet, but random samples all showed compressions in word length. The word bara, for example, showed great length variation in the spontaneous speech of both speakers, but even the longest item of bara in spontaneous speech was 45% of the reference version for speaker PaT, and 60% for speaker AV; the shortest item for both speakers was only 13% of the reference version.

[Fig. 4. Locus-target plots for labials, speaker PaT; vertical axis: locus for F2 (kHz). Spontaneous speech: F2LOC = .96X + .03. Reference words: F2LOC = .81X + .21.]

A shortening in duration can affect the relative timing, and thus the coarticulation, in adjacent segmental gestures, although some speakers seem to be able to avoid this effect by speeding up their articulatory movements (Kuehn and Moll, 1976; Gay, 1981). An increase of coarticulation with a faster speaking rate has also been shown for Swedish by Engstrand and Nordstrand (1983) through measurements of initial and medial formant frequencies in vowels (corresponding to locus and target in this investigation). Moreover, Engstrand (forthcoming) measured utterances of two Central Swedish speakers, especially /pi pu pa/, on x-ray film. He found relatively little coarticulation when speech rate and stress were normal; at a fast speech rate coarticulation increased, especially in stressed syllables.

It is also possible that a dimension of more or less clear pronunciation, "hypo" and "hyper" speech, has influenced the coarticulation in our experiment. This dimension can be independent of speech rate, though instead dependent on, for example, socially or communicatively determined factors (Lindblom and Lindgren, 1985).

A special explanation is necessary for the locus-target relation in bara: why is there not the same amount of coarticulation here as in the rest of the words beginning with /bV/ or even those beginning with /ba/? In Swedish the phoneme /a/ has two phonologically distinct lengths. The length distinction is accompanied by a difference in timbre whose main acoustic correlate is the frequency of the second formant: the long variant has an F2 of about 1000 Hz for male speakers while the corresponding frequency in the short variant is about 1250 Hz. Differences in timbre between short and long vowels can be perceptually relevant even if the length distinction is removed (Hadding-Koch and Abramson, 1964). Thus when, depending on e.g. the speech tempo, the lengths of a long and a short /a/ overlap, their timbre can still make them clearly perceivable as distinct sounds.

In the word bara the anomaly on the locus plot for spontaneous speech lay in the fact that the target value of the second formant in the reduced version had risen to about 1200-1300 Hz while the locus frequency had retained its value appropriate for the long variant of /a/ (in words with an originally short version of /a/, the locus lay at 1200-1250 Hz). To begin with, we looked for an explanation of the anomaly in the phonetic features of the adjacent sounds. However, an investigation showed that this explanation was insufficient: for speaker AV, for example, bara was in eight of the nine cases preceded by /a/, /e/ or a dental consonant and in all but two cases followed by segments that could not have raised F2: labial consonants, back vowels, a pause. For speaker PaT the word was in four cases preceded by /e/, in one by /a/ and in two by dental consonants. In his case, however, the word was followed by consonants that may have raised F2, but in such cases the locus should have been expected to be raised too (see word lists in Appendix). The same was true if the second formant was raised by the /r/ of which there was sometimes a trace present. Some examples of bara are shown in Fig. 5, together with an example of the word babbel, which originally has a short variant of /a/.

It can be argued that there is no reason for us to expect the locus as a function of the target to form a straight line. There is, however, reason to expect a straight line on theoretical grounds. We could think of the lip opening after /b/ release in terms of a lip rounding. The effect of rounding has been calculated by Fant (1960) for articulatory configurations with constrictions situated from the lip opening to the glottis. If we choose constriction locations from about 5.5 cm to 13 cm from the glottis, that is, corresponding to vowels with palatal to pharyngeal constrictions, and choose two curves within this section, one for an unrounded vowel and another with a lip rounding, we get two curves with a slowly diminishing distance between them, the curve with the rounded values lower in frequency. When the lips open after a labial consonant they are rounded; we can therefore think of the rounded articulation as the locus and the unrounded as target; a locus-target plot in this case would form an almost straight line.

It therefore seems that in the cases where [bɑ:ra] was reduced to [ba] the speakers pronounced an [a] but that subliminally there still was an [ɑ:] present. This raises a question that calls for further investigation: is the change from [bɑ:ra] to [ba] an example of a phonological process and the change by definition discrete; or is it a phonetic transformation and can therefore be continuous? The present data seem to indicate that the change is phonetic: firstly, because the speakers seem to begin to say one sound and continue with another, but also because the disappearance of /r/ does not happen in one step but is gradual.


Fig. 6. The reduced form [va] from the words A: var; B: vad; C, D: as a tag question.

Preliminary investigations indicate that the same reduction effect as in bara also appears in words like vara 'be'; var 'was'; and vad 'what'. In these cases, the consonant is normally deleted in fluent speech and the words are pronounced as [vɑ:] or reduced to [va]. As was the case with bara, the second formant in the reduced form [va] began lower than in words that have a short [a] to begin with. There was an exception, however: if the [va] used appeared as a tag question at the end of a sentence, its F2 locus began at the frequency of the target as in words with an original [a], indicating that [va] (from vad) in this function probably has been lexicalized with the short variant of /a/ (Fig. 6). In the case of /v/ the coarticulation with the following vowel was even stronger than for bilabial stops; therefore there appeared no anomaly on the locus-target plot as for labial stops. However, the loci for the reduced form [va] which were not tag questions always lay below the regression line, while the loci for [va] used as a tag lay on or above the line.
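Whether a given point lies above or below the regression line is simply the sign of its residual. A minimal sketch follows; the function name `side_of_line`, the 0.01 kHz tolerance for counting a point as lying on the line, and the numerical values are all assumptions chosen for illustration.

```python
def side_of_line(f2_target, f2_locus, k, c, tolerance=0.01):
    """Classify a locus-target point as 'above', 'below', or 'on' the line."""
    residual = f2_locus - (k * f2_target + c)
    if abs(residual) <= tolerance:
        return "on"
    return "above" if residual > 0 else "below"


# Hypothetical line and points (kHz), not taken from the plots.
k, c = 0.9, 0.1
reduced_non_tag = side_of_line(1.25, 1.0, k, c)   # locus lower than predicted
tag_question = side_of_line(1.25, 1.3, k, c)      # locus higher than predicted
```

Applied to every [va] token, such a classification would reproduce the grouping described above: non-tag tokens "below", tag tokens "on" or "above".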

Our preliminary investigations have also shown that differences in the amount of coarticulation between spontaneous speech and reference words similar to those in stop consonants exist even for other dental and labial consonants. In the case of nasals the difference may be even greater.

The research reported here has been supported by the Bank of 

Sweden Tercentenary Foundation, grant nr 86/109. 


APPENDIX

WORD LISTS

If there was more than one occurrence, the number is given in parentheses. The CV of the first syllable of the word in the middle column was the one measured; the columns to the left and right give the context.

Speaker AV:

nej          banne              mig
det          bara               vissa
för          bara               noll
hänt         bara               för
inte         bara               perception
jag          bara               och (2)
jag          bara               runt
man          bara               på
skall        bara               sitta
inte         be                 Elis
             Bengtson           behövde
Elis         Bengtson (6)
att          beräkna
inte         betala
mer          betala
också        betalt
liten        bit                från
jag          bor                just
har          bott               i
läser        böcker
på           början
en           dag                att
en           dag                då
från         dag                till
till         dag                **
andra        dagar              så
somliga      dagar              så
då           dags               att
var          dags               att
Bengtsons    dat-               sagan
Bengtsons    datamaskin
oh           datamaskiner
en           dator
och          den                köper
och          den                ligger
ah           den                är
**           det                li-
**           det                ligger
**           det                är inte (de'nte)
**           det                är halvt
ja           det                var
att          det                blir
att          det                borde
att          det                är
haft         det                heller
han          det                att
han          det                och
just         det                och
med          det                där
men          det                ryms
månaden      det                är
och          det                gjorde
på           det                sättet
skriva       det                man
till         det                kunde
ut           det                har
eh           det                visade
han          dit                och
alla         dom                skatter
ja           du                 vill
ett          dugg               skatt
#            då                 blev
den          då                 #
och          då                 blev
och          då                 kom
och          då                 sa
satt         då                 och
väl          då                 dags
är           då                 språk
bott         där                det
den          där                lilla
den          där                maskinen
det          där                #
det          där                det
det          där                och
vägen        där                ungefär
förvånad     därför             att

Speaker PaT:

mycket       babbel (2)         #
             babbla-babbla-babbla
och          badrum
han          bara               en
erat         bara               enkelt
hade         bara               ett
är           bara               grinig
skall        bara               hämta
är           bara               nånting
har          bara               tio
är           bara               tre
har          bara               tio
är           bara               tre
det          bara               är
in           barnen
utfodrar     barnen
andra        barnet
ena          barnet
har          boken
tt           borde
åka          bart               oftare
åker         buss               gör
av           bussen
vid          busshållplatsen
det          bästa
mycket       bättre
det          börjar
en           dag                och
en           dag                så
tt           daghem
på           dagis
på           Danderyd
säga         datum
tt           datum
enligt       den                har
och          det                måste
tt           det                ringer
och          det                stämmer
tt           det                är
med          det                är
så           dog                han
så           du                 har
och          då                 börjar
och          då                 hade
-vis         då                 hem
sig          då                 inte
utsikt       då                 också
och          då                 ser
och          då                 springer


REFERENCES

Delattre, P., Liberman, A.M., and Cooper, F.S. (1955). Acoustic loci and transitional cues for consonants. J. Acoust. Soc. Am. 27, 769-773.

Engstrand, O. (forthcoming). Articulatory correlates of stress and speaking rate in Swedish VCV utterances. J. Acoust. Soc. Am.

Engstrand, O. and Nordstrand, L. (1983). Acoustic features correlating with tenseness, laxness, and stress: preliminary observations. RUUL 11. Dept. of Linguistics, Uppsala University.

Fant, G. (1973). Stops in CV syllables. In Speech sounds and features. MIT Press.

Gay, T. (1981). Mechanisms in the control of speech rate. Phonetica 38, 148-158.

Hadding-Koch, K. and Abramson, A.S. (1964). Duration versus spectrum in Swedish vowels: some perceptual experiments. Studia Linguistica, Lund, 94-107.

Kuehn, D.P. and Moll, K.L. (1976). A cineradiographic study of VC and CV articulatory velocities. J. Phon. 4, 303-320.

Lindblom, B. (1963). Spectrographic study of vowel reduction. J. Acoust. Soc. Am. 35, 1773-1781.

Lindblom, B. and Lacerda, F. (1985). Akustiska uttalsstudier för syntes av svenska II. Projektbeskrivning. Institute of Linguistics, University of Stockholm.

Lindblom, B. and Lindgren, R. (1985). Speaker-listener interaction and phonetic variation. PERILUS, Report IV. Institute of Linguistics, University of Stockholm.

Lindgren, R., Lindblom, B., and Krull, D. (1986). Phonetic variation in natural speech. Status and progress report I. Institute of Linguistics, University of Stockholm.

Öhman, S. (1966). Coarticulation in VCV utterances: Spectrographic measurements. J. Acoust. Soc. Am. 39, 151-168.


AN UNMARKED DIALOG? 

Exploring Discourse Intonation In Swedish 

Madeleine Wulffson 

Is there such a thing as an unmarked dialog? Linguists and phoneticians alike have long spoken of sentence intonation, distinguishing between 'marked' and 'unmarked' versions of any given sentence or utterance. The standard 'unmarked' version of a statement, it has been found, will usually figure with some sort of 'global fall' (Lieberman 1967, Bruce 1984) whereas a 'neutral question' will usually figure with some sort of rising contour, at least towards the end. Studies of 'marked' versions have often been concerned with devices for inviting shifts of focus from one element to another, of the type:

Q: "What color cottage did you stay in last summer?"

A: "We stayed in a RED cottage."

In the case of Swedish, issues relevant to, e.g., the word accents have been greatly illuminated by this type of 'lab language' analysis. But when it comes to live interactive dialog, a different approach is required to handle the communicative value of intonational phenomena. This article proposes just such an approach to the study of intonation in discourse. Developed in England for the English language, it has proven equally effective and applicable to the Swedish language as well (see Wulffson 1987), with only certain technical modifications being required to accommodate the morpho-lexical phenomena of the Swedish tone accents.

The Discourse model of intonation in question, developed by David Brazil at the University of Birmingham, represents a finite set of meaningful variables which are the result of either/or linguistic choices made on the part of the speaker, on the basis of an ongoing assessment of the state of existential, here-and-now convergence or divergence between speaker and hearer, encoded and decoded respectively in real time. These functional oppositions are represented by relatively easily recognizable intonational phenomena of relative pitch level and pitch direction. The purpose of this article is not, however, to present the model per se. This has been done more than adequately in other publications, particularly Brazil 1985, "The Communicative Value of Intonation in English". More recently, a study of the Swedish implications of the model will be found in Wulffson 1987. A brief summary of the meaningful variables subsumed in the model, taken from the latter publication, is to be found in Appendix 1. Further, in Appendix 2, is a summary of the transcription conventions, which have been slightly modified to accommodate the central Swedish tone accents. In the present analysis of Skåne Swedish, these modifications are of lesser consequence due to this dialect's lack of 'dubbeltoppighet' ('double-topped-ness') in words carrying accent 2, so characteristic of central or Stockholm Swedish. Instead the interactive discoursal aspects of intonation in Swedish will be the center of focus.

The purpose of this article is, on the other hand, to explore ways in which the Discourse Model can effectively be used to illuminate a whole conceptual area of interaction in discourse, that supplied by intonation, which has largely been neglected in Swedish up to now. This will be done in the following manner: a snippet of dialog will be compared intonationally with itself, so to speak. That is, two versions of the same dialog will be compared, one being what could be called a 'marked' version, and the other a sort of 'unmarked' version. With the aid of the Discourse Model, the concepts of 'marked-' and 'unmarked-'ness will be seriously questioned, as the title suggests.

A thorough-going analysis of both versions will be attempted, taking into consideration a certain range of the various options open to the speaker at any given moment. Each configuration is an interplay of semantic, grammatical and intonational factors in a unique here-and-now context. Furthermore it should be kept in mind that all of the various oppositions are available for exploitation and manipulation by the speaker. Within the system there are 4 basic factors, PROMINENCE, TONE, KEY, and TERMINATION, which can be divided into 13 subfactors, each of which constitutes a potential meaningful contribution to the discourse. In addition, the exclusion of the non-chosen factors, that is, the fact that one factor (i.e. prominence as opposed to non-prominence) is chosen over the other, can also be said to be meaningful.

The 'marked' version is the spontaneous one, plucked out of a taped conversation where two people, A and B (both from the Skåne part of Sweden), are speculating over a photograph of a woman (whom they don't know) sitting at an outdoor cafe. The discussion has been going on for some time, and the two people have built up a broad picture of the woman's personality and activities before the bit which we have picked out comes in. The question they are discussing at the moment is what this lady does with her free time, and the conclusion is that, although there are not many theaters near where she lives (somewhere in the midwestern part of the U.S.A.), she does enjoy going to the theater. In actuality the photograph was taken in a piazza in Rome, and the subject was a lady Professor of Economics, but this is of no importance. The important thing is that we are dealing with a lively, spontaneous, and natural conversation between two colleagues and friends.

The 'unmarked' version, on the other hand, is an unabashed product of laboratory manipulation, arrived at by the following process: each single utterance of the original bit of dialog was copied down on a separate slip of paper and given to the original speakers in shuffled, random order. The utterances were then re-recorded, one by one, with 'neutral' intonation: the 'this is what is written on the paper' intonation. Subsequently the re-recorded utterances were computer-spliced together again, Humpty Dumpty style, in the same order as the original dialog, with 200 milliseconds' space between each one.
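The splicing step can be pictured with a short sketch: each re-recorded utterance is treated as a sequence of audio samples, and the sequences are joined in the original dialog order with 200 ms of silence in between. The function name `splice_with_gaps`, the representation of audio as plain Python lists, and the toy sample rate are assumptions for illustration; the article does not describe the software actually used.

```python
def splice_with_gaps(utterances, sample_rate, gap_ms=200):
    """Concatenate sample sequences in order, with silence between them."""
    # Number of zero-valued samples that corresponds to gap_ms milliseconds.
    gap = [0.0] * (sample_rate * gap_ms // 1000)
    out = []
    for index, samples in enumerate(utterances):
        if index > 0:
            out.extend(gap)  # silence only between utterances
        out.extend(samples)
    return out


# Toy example: two "utterances" at an unrealistically low 10 Hz sample rate,
# so that 200 ms corresponds to exactly 2 samples.
dialog = splice_with_gaps([[1.0, 1.0], [2.0]], sample_rate=10)
```

Because every gap has the same fixed length, such a procedure cannot reproduce the overlaps and variable pauses of the spontaneous version.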

Now, it is often said that a picture is worth a thousand words. In this case we could say 'a bit of listening is worth a thousand words'. Quite simply, the result of a couple of minutes' listening to the two versions of the dialog can only lead to one conclusion. That is, that the first dialog is utterly normal and natural, and the second is not a dialog at all but merely a conglomeration of separate utterances, strangely cohesive in textual content but totally lacking in any kind of communicative interaction between the speakers. A real linguistic Frankenstein.

So what went wrong? The speakers were the same in both versions. The words were the same, or practically the same. The timing was such that the utterances came in rapid, natural-like succession. Only the intonation was different. Actually, the subjects were rather concerned that, despite the precautions taken, they had 'remembered' the original conversation and had re-recorded the bits with that in mind, thus disturbing the equilibrium of the sought-after 'neutrality' of the second version. These fears were unfounded. The resulting dialog is so 'neutral' as to be ... untenable as a dialog.

The Spontaneous Version 

But first things first. We will start with the spontaneous version in its transcribed entirety:

A: // r+ j' UNDrar om hon inte LÄSer //

B: // r+ MHM //

A: // r roMANer // // r+ MHM //

B: // o hon läser PÅ - // // p DELtid //

A: // p nej jag MENar hon läser roMANer //

B: // p jaså hon läser roMANer //

A: // r+ JA //

(skratt / laugh)

B: // o jag tro' du mena' HON - // p läser på DELtid // p KVÄLLStid //

A: // r+ NEJ // p det MENade jag INte //

B: // p JA ja // p jo hon LÄSer NOG // o så N' Är STORA - //

A: // r+ MHM //

B: (clears throat) // p VERK // p av TolSTOY och DostoYEVsky //

A: // r+ JA // p precis //

A begins with: "I wonder if she reads." Discoursally, she is introducing a new topic for discussion, that of reading as a free-time activity. It is a logical step to suggest, as the two discussants have recently decided that the woman's possibilities for cultural enrichment are limited by her 'small-town' environment. Intonationally the last boundary was marked off by pitch sequencing low termination, so the maximally disjunctive high key on 'UNDrar' clearly sets off the utterance as a new stage in the discourse. The higher pitch level also lends a particularizing function to the segment which might allow for a gloss such as: 'Reading is the item I choose out of a whole set of possible activities she might engage in, such as watching TV, sewing, going out to night clubs with friends, etc.' The tonic, 'LÄSer', also carries the simple rising tone r+ ( ), or dominant referring tone of the Discourse Model. One can say that speaker A 'refers' (R tone) to reading as a possible free-time activity which both speakers are privy to, as well as an assumption that the lady does, indeed, read. But since A is, at the moment, introducing a new element to the discussion, and taking the initiative, she is well justified in her choice of the dominant version of the referring tone. A p ( ) tone, on the other hand, might have projected the sense: 'This is a possibility for consideration, I have absolutely no pre-assumptions on the subject (so tell me what you think).' The further aspect of high termination brings an expected or projected response type into focus. Here the functional significance of the high termination choice could be glossed as: 'Do you, or do you not, think that she reads in her free time?' She is inviting a polar-type or adjudicating response. Built into the high termination is the 'expectation' of a high key, yes or no type rejoinder. Which she gets:

B: // r+ MHM //

Speaker B cooperatively affirms her idea with a high key r+ tone, satisfying both pitch concord expectations and response type. A mid key response might have carried an implication of something less than agreement, like: "yes, or ..." or "maybe". A low key response could possibly have had the effect of mild disagreement, such as, perhaps, "Maybe, but I don't really think so." Low key carries an equivalence function which, combined with a dominant r+ tone and maximal concord breaking, could very likely have the effect of eradicating the suggestion, and meaning something like "We're back to where we started before you came up with this dumb idea."


The relatively equal social status of the two speakers allows for a free and open 'game of catch' with regard to who is in (temporary) charge of the discourse. In fact this is a dialog between two friends and colleagues, a woman and a man, of the same age and in the same line of work. Here he chooses the dominant version of the referring tone, as it is his turn to 'judge'.

In terms of turn-taking (in the Sacks, Schegloff and Jefferson tradition) it could be said that A has yielded the turn to B, who now sets out to elaborate on A's suggestion. But A suddenly realizes that more precision is needed, and hastens to insert this afterthought even at the risk of interrupting B's turn (which he has clearly established and embarked upon). The result is a two-dimensional overlap (there is no overlap in the spliced version of the dialog, which enhances its highly unnatural quality):

A: // r roMANer // // r+ MHM //

B: // o hon läser PÅ // // p DELtid //

With simple referring ( ) tone, A 'refers' to two things: that 'novels' was what she meant to say, and/or that if the woman reads, it is probably novels. She also 'refers', existentially, to her own expectation that this is also B's understanding. Her choice of mid key marks the tone unit as 'additive', an additional bit of information, and simultaneous mid termination projects an expectation of concurrence on B's part.

Meanwhile B has set off on an entirely different 'tack', and while indeed 'agreeing', he is agreeing with the wrong thing! The collision of simultaneous speech causes B to break off his tone unit midway, resulting in an oblique zero tone (→). Satisfied that all is clear by now, A encouragingly chips in with a high key r+ tone on 'MHM', anticipatorily adjudicating B's contribution to be correct. 'Reference' here is to this assumed mutuality.

But then comes the 'bombshell'. As 'läsa' in Swedish means both 'read' and 'study', it turns out that B is presenting, with a high key p ( ) tone, that indeed the woman studies, but only part-time, as the contrastive nature of high key implies, rather than full-time as she might have done otherwise. So A is compelled to jump in with a quick correction:

A: // p nej jag MENar hon läser roMANer //

In this repair sequence, the high key of 'menar' carries contrastive value, which here could be glossed as 'I mean THIS (novels), not THAT (part-time, as you said)'. The proclaiming tone on 'romaner' can be said to present this bit as decidedly new to the discourse, a world-changing increment to the unfolding argument.


But now let's take another brief excursion into the hypothetical, into what might have happened. Had she said instead, for example:

A': * // p jag MENar att hon LÄSer romaner // *

It would have sounded extremely odd in this context, as it would have laid contrastive emphasis on 'READ' rather than on 'NOVELS', as if to say, for example, 'She doesn't READ novels, she WRITES them', which would be quite impossible in the logic of this contextual and interactive setting. The subtle effects of selectivity or non-selectivity in the prominence system can easily make nonsense out of an otherwise perfectly cohesive and coherent text, lexically and grammatically speaking.

But back to our original version. It is indeed the word 'romaner'
which receives prominence, being presented contrastively with 'deltid' as
a correction of a misunderstanding. A's high termination further
projects the expectation of a high key polar, adjudicating response.
Which she gets:

B: // p jaså hon läser roMANer //

The high key falling tone 'proclaims' the whole as 'This is definitely
new and not common ground'. B's high termination in turn 'expects' a
high key yes/no type response. Which he gets:

A: // r+ MHM //

In natural dialog the phenomenon of pitch concord is perhaps one of
the most striking intonational features. In our 'unmarked dialog', on
the other hand, the absence of this interactive play of pitch is one of
the most obvious deficiencies.

B: // o jag tro' du mena' HON - // p läser på DELtid //
A: // r+ NEJ //
B: // p KVÄLLStid //
A: // p det MENade jag INte //

B's contribution here, apart from the o-tone in the first tone unit
(clearly due to verbal planning), is marked by proclaiming tones. Part-time,
evenings, was what he thought she meant, and the proclaiming tones
here underscore the separateness of their two worlds, the lack of common
ground. It is almost impossible to imagine a line of thinking at this
point which would allow for referring tones in this context. A
conceivable, though unlikely, gloss type for a referring tone here would
be some sort of reproachful reminder that, if it was not shared
knowledge, it should have been. If A didn't mean studying part-time, she
should have! But given the cooperative atmosphere of this conversation,
such a reproach would appear distinctly out of character.


Now let us take a closer look at the last two tone units in B's
utterance:

// p läser på DELtid // p KVÄLLStid //

Further back in the conversation, before our snippet begins, A and B have
decided that the woman in the photo is an executive secretary, a real
high-powered, go-getter type. Therefore she must work hard, all day
long. So if she studies part-time, this must, perforce, take place in
the evening, as no other time is available on weekdays. Speaker B drops
to low key in the tone unit // p KVÄLLStid //, clearly, though
subconsciously, exploiting the equivalence function of low key in order
to present the two as existentially one and the same. (A mid key
realization of 'KVÄLLStid', which would have presented 'evenings' as a
new added bit of information, might have sounded odd or even
condescending - the subject of the woman's working having been so
recently discussed and settled upon.) Artificial tone or key switching
can twist the message in such a way that an otherwise perfectly correct
and logical utterance is rendered nearly or totally unintelligible.
It will change the whole psychological effect.

The second tone unit of B's utterance ('läser på DELtid'), being a
potential point of syntactic completion, is a vulnerable place for
interruption or overlap. A has anticipated this 'closing point', and
hastens to compete for the floor with a high key contrastive 'NEJ', which
overlaps with B's 'KVÄLLStid'. It is not 'DELtid' she means, as he
thinks, but 'roMANer'. In the here and now world of A and B's speculative
conversation there is an existential set of 2 possibilities,
either 'novels' or 'part-time'. The Saussurian general paradigm would
never see 'novels' as the opposite of 'part-time', but here these are
the two existentially available choices, the 'existential paradigm' of
the Discourse Model, marked intonationally through both the prominence
and the key systems.

In this case high key on 'NEJ' is chosen of necessity to effect a
repair of the misunderstanding. Since this is decidedly a competitive
moment (A and B are competing for both the turn and their own points of
view), a simple, non-dominant r tone would have conceded a measure
of agreement such that it might very possibly have led to a different
'decision', i.e., that perhaps the woman studied part-time after all.
(Please recall that neither participant actually knows the woman's real
character or activities; rather, they are speculating about what she
might be like or do. The decision could logically go either way.) So the
dominant r+ tone which A has chosen is by far the most
appropriate and effective, establishing social togetherness while at the
same time firmly maintaining control.

She continues:

// p det MENade jag INte //

The p tone is fully appropriate here to clarify and re-state
the nature of the misunderstanding. The high key of 'MENade' (meant)
underscores the polarity pair 'mean' vs 'not mean', but she concludes the
tone unit with mid termination ('INte'), in clear expectation of a
concurring response. A gloss here might be: 'Now I expect that this
little misunderstanding is cleared up and that you are in agreement with
me on this point.' With mid termination she sets up an expectation of a
mid key response as well.

Which she does not get. B agrees, all right, and gives up on his
own idea of part-time studies in favour of novel reading. But he chooses
to adjudicate rather than simply agree, and simultaneously signals (high
key) his intention to introduce a new topic, that of what she
reads. The breaking of the pitch concord expectation is also a feature
of discourse control or dominance, and in fact the turn or move that
follows constitutes a full presentation of the new topic. It is now B
who 'decides' (the '+' or dominance factor) the nature of the reading
material. A chips in, at a pause point while B is clearing his throat
and planning his strategy, with a concurring mid key // r+ MHM // to
re-establish togetherness and encourage B's line of thinking. B
continues with the proposition that it is works of Tolstoy and
Dostoyevsky that the woman reads. These grand authors are presented with
mid key, as constituting additional valuable information, whereas the mid
termination of the tone unit sets up the expectation of a concurring mid
key response.

Which he doesn't get:

// r+ JA // p preCIS //

A retains her part of the controlling role of the discourse by again
breaking pitch concord expectations (an expression of dominance), and
adjudicating (high key) instead of simply concurring, or 'chiming in
agreement' (mid key). In this second by second interactive milieu, she
'decides' (the '+' factor) that this is indeed the type of reading matter
in question. A gloss might be: 'Yes, we are in agreement, but it is I
who am judging that now.' The final tone unit proclaims (p tone) the
correctness of the suggestion with a conciliatory mid termination.
Although, as it happens, the subject is closed after this and a new one
opened, A is so delighted with having 'won' their little competition
that she does not close the subject with low termination, which would
have constituted pitch sequence closure as well. Instead she emphasizes
again the agreement aspect of the exchange by ending her comment
'preCIS' ('precisely', 'right') with mid termination, thus leaving a sense
of concurrence 'in the air' to be savoured during the pause that
follows before the next question is taken up.


The 'Unmarked Version'

Now let's look at our Frankenstein version, which lived but remained
a monster. A non-dialog dialog:

A: // p ja' UNDrar om hon INte LÄSer //
B: // r+ MHM //
A: // p roMANer //
B: // p hon LÄSer på DELtid //
A: // p nej jag MENar att hon läser roMANer //
B: // p JAså // p hon LÄSer roMANer //
A: // p JA //
B: // o jag trodde du MENade - // p hon LÄSer // p på DELtid //
A: // p nej det MENade jag INte //
B: // p JAja // r+ JO // p hon LÄSer NOG // p såna HÄR STORa VERK // p av TolSTOY och DostoYEVsky //
A: // p ja preCIS //

A 'begins' with: // p ja' UNDrar om hon INte LÄSer //. For a start,
she speaks quite slowly and overclearly, in oblique orientation.
There is no 'o' tone, but the tone unit contains three prominent
syllables, as opposed to the usual two in direct interactive orientation,
where the speaker is equally concerned with WHO he/she is communicating
WITH as with WHAT is being said. In oblique orientation it is the
language itself, for one reason or other, which is in focus. Equally
strange is the fact that although A is 'introducing an entirely new
topic', she 'begins' on mid key, as if she were merely adding a thought
of little consequence to the conversation. As 'F0 downdrift' is in
function here, the utterance carries low termination, which in discourse
terms projects no expectations whatever as to the type of response that
would be agreeable or acceptable. This is the usual, normal circumstance
of an out of context, laboratory recording situation, but only occurs
under certain circumstances in live communication. So this stance might
be appropriate if A were, say, an interviewer in a panel discussion,
throwing out a topic for open discussion. But not here.


B's 'reply' is a mid key/termination // r+ MHM //. 'MHM' is most
often associated with 'feedback' or 'backchannelling', and in Swedish
there is a tendency for much feedback to be realized with an r+ tone
(Wulffson 1987). So B's 'lab association' was quite natural. In fact,
due to this imaginary association with a general tendency, B's 'answer'
almost sounds possible. Like a bored husband, perhaps, who's trying to
read his newspaper, hasn't heard a word of what his wife said, but
answers anyway, just to keep the peace.

A's next contribution, // p roMANer //, 'meets' pitch concord
'expectations' by 'adding' that the lady in the photograph reads
novels. In a one word utterance, 'F0 downdrift' has not had time to
take effect.

B then proceeds to close the pitch sequence (and the 'discussion')
by saying:

// p hon LÄSer på DELtid //

with low termination. There is no overlap here, as was to be found in
the original.

But A 'disregards' the fact that the 'discussion' is 'clearly
closed'. (Our hypothetical husband wants to read his paper.) She 'adds'
(mid key) that she meant that the woman read novels. In the real
version, A protested the turn of events by use of contrastive high key,
and simultaneously demanded an active polar type response through use of
high termination on 'romaner'. Our hypothetical 'wife', on the other
hand, simply 'tells' her 'husband' that in fact he was wrong and that's
that (p tone, low termination).

B 'responds' in mid key, 'agreeing' and 'accepting' this 'additional
piece of information' with totally uncalled for equilibrium! He
'continues' by closing off the pitch sequence again with low termination,
foregoing or sacrificing a response. 'Resignation' could be the word to
describe the effect of this unengaged utterance. One could perhaps
object here that, since e.g. 'resignation' too is a human attitude, the
'conversation' is after all possible. But as we have seen and will see,
there is no continuity or coherence of 'attitude' or interplay between
the speakers.

Nor does A now 'feel the need' to 'liven up' the conversation. Her
low key // p JA // projects what is contextually or content-wise a highly
contrastive statement as being equivalent to the last, a foregone
conclusion. Low termination again projects no expectations as to the
type of response which might be agreeable. How about a divorce?

B plods on:

// o jag trodde du MENade - // p hon LÄSer // p på DELtid //

(I thought you meant she studies part time.) For a start, all of the words
are clearly pronounced, whereas in the spontaneous version 'trodde' was
pronounced 'tro'' and 'menade' was 'mena'' - much reduced. Also, oblique
orientation plays a role here. There is no contextual reason for 'läser'
to be prominent. B 'hammers in' a point which was already eminently
'clear'. He 'beats a dead horse', so to speak. This bit does not end
with low termination as the others do, but it is also the only bit with a
reference to another person (du), so it is likely that the 'global fall'
rule was not in effect at the time of the recording. The reference to a
2nd person no doubt influenced the person recording to imagine an
interlocutor and an answer, but having no interlocutor, he merely
'expects' neutral concurrence.

Which is what comes, but not because A 'agrees'. She doesn't at
all. But she also has nobody to 'answer', so why should she
'adjudicate'?

// p nej det MENade jag INte //

This is an additive mid key 'response', where in fact contextually a
strong contrast is being made to correct a mistaken idea. Not here. No,
we've closed the subject again. (Low termination.)

But now finally we get some 'life' out of B:

// p JAja // r+ JO // p hon LÄSer NOG // p såna HÄR STORa VERK // p av TolSTOY och DostoYEVsky //

This being a longer utterance, it has its own internal structure and
beginning, with 'Ja ja, jo', which even out of context suggests a polar,
adjudicating function. So the high key is, for once, quite appropriate.
Except that there are other factors which render the sequence unlikely
all the same.

The main reason we have a high beginning and a low ending here is
related to discourse factors in a mini 'out of context context'. This
is a longer utterance and there is more 'content' in this statement than
in the others, which leads B to read it as a complete presentation of a
mini-topic, without regard to any imaginary interlocutor's reaction. The
point of the matter is that the 'global fall' is a physical
description, whereas the pitch sequence relates to discourse meaning,
which is not by any means automatic. A pitch sequence is phonologically
defined as a run of one or more consecutive tone units which ends in low
termination. It has a number of discourse functions, among them being a
dominance factor and the marking off of discrete, consecutive stages in
the unfolding discourse. The global fall is the pitch sequence by
default, so to speak, due to the lack of any other discourse factors that
would bring about other configurations.
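The definition of the pitch sequence given above (a run of one or more consecutive tone units ending in low termination) can be illustrated with a minimal sketch. The tone-unit labels, the `(label, termination)` representation and the function name are assumptions made for illustration only; they are not part of the Discourse Model's own notation.

```python
from typing import List, Tuple

# A tone unit is represented here (hypothetically) as a (label, termination)
# pair, where termination is one of "high", "mid" or "low".
ToneUnit = Tuple[str, str]


def split_pitch_sequences(tone_units: List[ToneUnit]) -> List[List[ToneUnit]]:
    """Group consecutive tone units into pitch sequences.

    A pitch sequence is closed by the first tone unit that carries low
    termination. Tone units left over at the end, with no closing low
    termination, are returned as an unfinished final group.
    """
    sequences: List[List[ToneUnit]] = []
    current: List[ToneUnit] = []
    for unit in tone_units:
        current.append(unit)
        if unit[1] == "low":
            sequences.append(current)
            current = []
    if current:
        sequences.append(current)
    return sequences


# Illustrative input: four tone units, the first and last with low termination.
example = [("tu1", "low"), ("tu2", "mid"), ("tu3", "mid"), ("tu4", "low")]
sequences = split_pitch_sequences(example)
```

On this illustrative input the first tone unit forms a pitch sequence on its own, and the remaining three form a second one, closed by the final low termination.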

A's final 'reply' sounds very odd indeed:

// p ja preCIS //

The low key on 'precis' presents this bit as equivalent to the last. As
if the 'fact' of the woman reading Tolstoy and Dostoyevsky had already
been negotiated. Approximately as far from the truth of the context as
one could get, as the idea is brand new. Again low termination closes a
new pitch sequence and leaves no expectations as to the 'response'. A
sort of 'I don't care and there's nothing more to say' impression is
conveyed.

In sum, it all sounds very odd. Here are two people 'conversing'
(exchanging utterances) yet not communicating. Or at least not
communicating in a way any of us would be likely to consider
satisfactory. If we were to overhear such a conversation, we would
immediately ask ourselves, 'Are these people Asimovian robots?' or 'Are
they the very worst actors imaginable, in the process of learning lines
they hate or couldn't care less about?'

The lack of pitch level interplay has been mentioned as one of the
principal causes of the above described effects. Another very striking
cause is the near-total lack of referring tones, an obligatory
characteristic of direct orientation. In our spontaneous (direct)
version, we have a good balance between R tones (9 examples) and P tones
(10 examples), plus 3 O tones due to either planning difficulties or
cut-off tone units. In the spliced dialog, on the other hand, there are
only 2 R tones as opposed to 15 P tones (plus one O tone due to reading
aloud difficulties).

Quite a revealing ratio. The referring tone serves not only to
reify mutual worlds of understanding and invoke common ground, but also
to establish social or psychological togetherness on supra-informational
levels. So it is natural and even predictable that when utterances are
withdrawn from an interactive context, the R tones will either
disappear or at least diminish drastically in number. As was the case in our
still-born laboratory dialog.
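The tone counts reported above can be turned into proportions with a short sketch. The counts are those given in the text; the dictionary layout and the function name are illustrative assumptions.

```python
def tone_shares(counts):
    """Return each tone type's share of all tone units in a dialog."""
    total = sum(counts.values())
    return {tone: n / total for tone, n in counts.items()}


# Tone counts as reported in the text for the two versions of the dialog.
spontaneous = {"R": 9, "P": 10, "O": 3}
spliced = {"R": 2, "P": 15, "O": 1}

spontaneous_shares = tone_shares(spontaneous)
spliced_shares = tone_shares(spliced)
```

With these figures, referring tones make up roughly 41% of the tone units in the spontaneous version (9 of 22) but only about 11% in the spliced version (2 of 18).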

Conclusions

It is hoped that the reader will look upon the foregoing analysis as
an attempt to bring out some of the ways in which intonation interacts
with other pragmatic factors to generate the unique universes of
what is called 'local meaning' in the Discourse Model: the entire,
complex, here-and-now setting which speakers deal with according to their
own apprehension of the continually mutating dynamics of verbal
communication. The hypothetical suppositions in the analysis about some
of the things that might have happened are just that: things that
might have happened, out of the myriad other things that might have
happened. Each contextual factor, intonational or otherwise, affects
every other in various subtle ways.

The analysis has also shown that there is perhaps no such thing as
an 'unmarked' dialog. It could even be said that all naturally occurring
interactive speech is, by definition, intonationally 'marked'. Strings
of words and sentences can be 'sewn together' in writing, maybe, but
definitely not in speech. The Discourse Model clearly shows how
intonation represents an entire conceptual dimension of meaning in spoken
language as opposed to written language, and further, how and why
intonation cannot be altered or manipulated with impunity.

Acknowledgements. Many, many thanks to those who assisted in the
realization of this project: my two imaginative informants, those who
gave advice (Gösta Bruce, Robert McAllister, and Jan Anward), technical
assistance (David House and Mats Dufberg) and food for thought
(everybody at the Institutes of Phonetics and Linguistics at Lund and
Stockholm universities).

Selected Bibliography

Brazil, D. 1985. The Communicative Value of Intonation in English.
(Discourse Analysis Monograph no. 8) English Language Research & Bleak
House Books, University of Birmingham.

Brazil, D. 1985. Phonology: Intonation in Discourse. Extract:
Handbook of Discourse Analysis. Academic Press, London.

Brazil, D. 1985. Where is the Edge of Language? Review article.
Semiotica 56 - 3/4.

Brazil, D. 1984. Sentences Read Aloud. Offprint: Intonation, Accent
and Rhythm. Studies in Discourse Phonology. Walter de Gruyter, Berlin,
New York.

Brazil, D. 1982. Impromptuness & Intonation. Offprint from Impromptu
Speech - a Symposium. Åbo Academy.

Bruce, G. 1984. Structure and Functions of Prosody. Stencil. Inst. för
lingvistik, Lunds Universitet.

Bruce, G. 1977. Swedish Word Accents in Sentence Perspective. Gleerups,
Lund.

Bolinger, D. 1986. Intonation and its Parts. Vol. 1. E. Arnold.

Couper-Kuhlen, E. 1986. An Introduction to English Prosody. E. Arnold.

Coulthard & Montgomery (Eds). 1981. Studies in Discourse Analysis.
Routledge and Kegan Paul.

Cruttenden, A. 1985. Intonation. Cambridge University Press.

Lieberman, P. 1967. Intonation, Perception and Language. MIT Press,
Cambridge, Mass.

Linell, P., Gustavsson, L. 1987. Initiativ och respons. Om dialogens
dynamik, dominans och koherens. Linköping.

Levinson, S. 1983. Pragmatics. Cambridge University Press.

Sinclair and Brazil. 1982. Teacher Talk. Oxford University Press.

Wulffson, M. 1987. Discourse Intonation in Swedish. To appear in Lund
Working Papers. Institutionen för Lingvistik, Lund University.

Appendix 1: Brief summary of the Discourse Model

The Discourse Model postulates a finite set of meaningful linguistic
oppositions which can be singled out on a perceptual auditory level from
the more or less constantly varying stream of speech. The meaning
components here described represent the result of a speaker having made
an either/or choice. The independent variables are functional in nature.
For example, "If there is a 'falling pitch', it is not the fall itself
which is of interest but rather the function of the language item that
carries it."

The basic factors which contribute to the realization of the
functional oppositions within each tone unit are: PROMINENCE, TONE, KEY,
and TERMINATION. Further within the domain of these systems are
ORIENTATION ('direct/oblique') and DOMINANCE or DISCOURSE CONTROL.
Reference to the work of Brazil is heartily recommended for the reader
who would like to gain a deeper understanding of the model. In the
meantime, the following will serve as a general guideline:

PROMINENCE refers to 'a selection from sets available at successive
places along the time dimension.' 'An incidence of prominence fixes the
domain of the other variables of tone, key, and termination.' (Brazil
1985) A syllable or stretch of speech may be assigned prominence for the
purpose of sense or intonation selection. For example, if one Swede asks
another 'Vilket kort spelade du?' (Which card did you play?), and the
other replies 'HJÄRTerDAM!' (the queen of hearts), it represents a
selection from an existential set of 52 - the deck of cards. If the
question had been 'Vilken DAM spelade du?' (Which queen did you play?)
with the answer 'HJÄRTerdam', then there are only 4 choices in the
existential set of 'hjärter', 'spader', 'ruter', and 'klöver'. On the
other hand, if the question had still been 'Vilken dam spelade du?', but
the answer had been, for example, 'HJÄRTerDAM', there would seem to be no
motivation for 'DAM' to be prominent. But let's say, for example, that
the speaker wished to concentrate on the card game instead of answering
questions; he might convey this with low termination on the word 'DAM',
in order to be left alone! So prominence may be assigned for the purpose
of making a choice within any of the other intonational systems of tone,
key or termination.
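The two existential sets in the card example can be made concrete with a small sketch. The suit names are those given in the text; the Swedish rank names and the variable names are assumptions added for illustration.

```python
from itertools import product

# Suits as named in the text; rank names are an illustrative assumption.
SUITS = ["hjärter", "spader", "ruter", "klöver"]
RANKS = ["ess", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "knekt", "dam", "kung"]

# 'Vilket kort spelade du?' - the existential set is the whole deck.
deck = list(product(SUITS, RANKS))

# 'Vilken DAM spelade du?' - the existential set shrinks to the four queens,
# so only the suit remains to be selected.
queens = [card for card in deck if card[1] == "dam"]
```

The answer 'hjärterdam' is the same card in both cases; what differs is the size of the set it is selected from (52 against 4), and hence which part of the word needs to be made prominent.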

TONE refers to basic pitch movement types, each of which carries a
distinct abstract meaning increment. The PROCLAIMING TONE, of which
there are two versions, the SIMPLE proclaiming and the DOMINANT
proclaiming (p, p+), stands for the elements in the discourse which
represent a change in the status quo of speaker-hearer understanding.
The REFERRING TONE, on the other hand, also with a SIMPLE and a DOMINANT
version (r, r+), effectively represents the areas of convergence, or
reification of the status quo between speaker and hearer, either on
informational or social levels, or both. The dominant version reinforces
the basic meaning of a tone and/or affects control of the discourse.
The FIFTH TONE is LEVEL (o), and remains outside of the
interactive proclaiming/referring dichotomy. DIRECT ORIENTATION refers to
the discourse situation in which speaker/hearer interaction is in focus
(P/R), whereas OBLIQUE orientation (O/P) functions where the language
itself or linguistic organization is in focus.

KEY and TERMINATION deal with the communicative value of relative
pitch levels, HIGH, MID, or LOW. Key is associated with the onset
syllable, and termination with the tonic, in an extended tone unit; in a
minimal tone unit they fall together on the tonic. Within their domain are
relationships of CONTRASTIVENESS, ADDITIVENESS, and EQUIVALENCE, as well
as the interactive areas of projected and actual responses, ADJUDICATING
(high termination and key) and CONCURRING (mid termination and key), or
no projected expectations (low termination and key). DISCOURSE
STRUCTURING and SEQUENCING are also achieved through key and termination.
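The relation between a speaker's termination and the key of the response it projects can be sketched as a simple lookup. The function names and the use of `None` for "no projected expectation" are illustrative assumptions, not part of the model's own formalism.

```python
from typing import Optional

# Response type projected by each termination level, as summarised above.
PROJECTED_RESPONSE = {
    "high": "adjudicating",
    "mid": "concurring",
    "low": None,  # low termination projects no expectation
}


def expected_key(termination: str) -> Optional[str]:
    """Key of the response projected by a termination (None if no expectation)."""
    return termination if PROJECTED_RESPONSE[termination] is not None else None


def is_pitch_concord(termination: str, response_key: str) -> Optional[bool]:
    """True if the response key meets the projected expectation, False if it
    breaks it, and None if the termination projected no expectation."""
    expected = expected_key(termination)
    if expected is None:
        return None
    return response_key == expected
```

For instance, a mid termination answered in high key (as when an expected concurrence is replaced by adjudication) gives `is_pitch_concord("mid", "high") == False`, while a high termination answered in high key gives `True`.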

The place of operation for these four sets of speaker options is the
TONE UNIT, which can be said to be the building block of verbal
communication. According to Brazil, the speaker 'plans' or 'encodes' the
tone unit, and the hearer 'decodes' it as a whole. A tone unit (TU) in
direct orientation consists of ONE (minimal TU) or TWO (extended TU)
prominent syllables, one of which is TONIC (= carries a major movement in
pitch, or constitutes the beginning of a pitch movement which extends
over the syllables that follow). Key and termination are determined by
the level of pitch in relation to preceding and succeeding prominent
syllables. Key and termination of p and r tones depend on the beginning
of the tone, whereas in the p+ tone it is the peak of the rise-fall, and
in the r+ tone it is the end of the rise, which counts. The tonic
syllable is the only obligatory portion of a TU. A pause always defines
a TU boundary, but a TU is not always defined by a pause. The model
differs substantially from other models in this crucial point. It is the
instance of a set of meaningful functional choices, and their internal
organization, rather than external boundaries, which determine the tone
unit.
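The constraint that a tone unit in direct orientation has one or two prominent syllables can be checked mechanically on a transcription. The sketch below assumes a transcription in which prominent syllables are written in capitals and treats every run of two or more capital letters as one prominent syllable; that heuristic, the regular expression and the function names are illustrative assumptions, and a prominent syllable written with a single capital letter would be missed.

```python
import re

# One prominent syllable = one run of at least two capital letters
# (including the Swedish letters Å, Ä and Ö).
PROMINENT = re.compile(r"[A-ZÅÄÖ]{2,}")


def count_prominent(tone_unit: str) -> int:
    """Count prominent syllables in a transcribed tone unit."""
    return len(PROMINENT.findall(tone_unit))


def fits_direct_orientation(tone_unit: str) -> bool:
    """True if the tone unit has one (minimal TU) or two (extended TU)
    prominent syllables."""
    return count_prominent(tone_unit) in (1, 2)


spontaneous_tu = "nej jag MENar hon läser roMANer"
read_aloud_tu = "ja' UNDrar om hon INte LÄSer"
```

Applied to the two tone units above, the spontaneous one has two prominent syllables, whereas the read-aloud one has three and so falls outside the pattern described for direct orientation.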

Appendix 2

Transcription Conventions for Swedish

1. Tone unit boundaries: //

2. Prominent syllables in capital letters with the tonic underlined:
// oVre ka //

3. Key and termination (relative pitch factors involved at every TU).
Key is associated with the onset syllable, and termination with the
tonic, in an extended TU; in a minimal TU they fall together on the tonic.

a. MID KEY/TERMINATION are not specially marked.

b. HIGH or LOW key/termination are marked with arrows.
In the case of Accent 2 (A2) words in Swedish, where the pitch
switch from mid to high key or termination takes place on the
syllable following grave accent, this is indicated by an arrow
placed above that second non-prominent syllable.

4. Grave accent: (`)

5. P - either /p/ or /p+/, proclaiming. R - either /r/ or /r+/,
referring.


Why Two Labialization Strategies in Setswana? 

Mats Dufberg 

1. Background 

Setswana is a Bantu language spoken in southern Africa, in South Africa
and in Botswana. It has seven vowel phonemes and a large number of
consonant phonemes, many of which are labialized. In this paper I will
discuss the labialized consonants and their non-labialized counterparts,
and the two different realization strategies I have found, as one or two
phonetic segments. In particular I will discuss why there is a bisegmental
realization.

The labialized consonants in Setswana can be found in two different
vowel contexts. The first context is before a back, rounded vowel, where
all consonants are (phonetically) labialized. The second context is
before a non-back, unrounded vowel, where there is a phonematic
opposition between a labialized and a non-labialized consonant.

Setswana has been described both by Tucker (1929) in his book about
a number of related Bantu languages and by Cole (1955) in his Setswana
grammar. Both agree on two important points (Tucker 1929: 74, Cole 1955:
33-34):

- labialized consonants in both vowel contexts are identical, and 

- labialized consonants are to be analyzed as one segment, both 

phonetically and phonologically. 

Tucker's claims concern both Setswana and Sesotho, which is a
closely related language spoken in South Africa, among other places. Let
me make another reference to Sesotho; it will become clear below why it
is relevant. Roux (1981) refers to X-ray and acoustic studies of
labialization, not in Setswana but in Sesotho. His conclusions are
different from Tucker's and Cole's:

- labialized consonants before rounded vowels are different from those
before unrounded vowels, and

- labialized consonants before unrounded vowels end in a labiovelar
semi-vowel, [w], and should at least phonetically be considered
to be two segments.

In Dufberg (1984) I reported on an acoustic study of the labialized 

consonants in Setswana. That study was done in 1984. Tore Janson, 

professor of linguistics, had during his studies of lexical change in 

Setswana in Botswana made recordings of word lists which he asked me to 

use for a study of the acoustic correlates of labialization in Setswana. 

In those recordings, of four informants, the labialization used was 

a monosegmental realization, that is, the labialized consonant was 

clearly one phonetic unit. This is in line with the Tucker-Cole view. 

For the second part of my study we recorded one speaker of Setswana 


from South Africa, not Botswana. In that recording I found two different 

strategies of labialization. One of the four consonants studied was 

pronounced monosegmentally, just like all consonants in the earlier 

recordings. But the other three consonants were pronounced with a bisegmental 

realization. That is, the consonant was followed by a semivowel. 

This is in line with the Roux view. Even though Roux studied 

Sesotho, not Setswana, his findings were felt relevant since my 

informant came from South Africa. This finding gave rise to two new 

questions. 

1) Should /Cʷ/ be analyzed - phonologically - as one or two segments?

2) Is the difference in realization of the labialized consonants, i.e. 

mono vs. bisegmental, a dialectal or idiolectal difference? 

In my report (Dufberg 1984) I did not find any reason to change the 

phonological analysis. But I hypothesized that there was a dialectal 

difference in the pronunciation of the labialized consonants and argued 

against the idiolectal hypothesis. 

In this paper I will first, in section 2, give a brief presentation
of the phonological system of Setswana, firstly because it will help
the reader to understand my work, and secondly because it is of general
interest as a contrast to the Indo-European systems. In section 3 I will
briefly review Dufberg (1984), though section 3.2 on the expected
effects of labialization was not included in that report. In section 4 I
will discuss my new study and present some new data. Finally, in section
5, I will discuss the two realization strategies - mono vs. bisegmental
- and discuss alternatives to the dialect hypothesis.

2. Phonological system of Setswana 

2.1 General description 

Setswana is a tone language but the tones will not be discussed in this 

paper. The syllable structure is the simplest possible; a syllable 

consists of either a consonant plus a vowel, CV, a vowel, V, or a 

syllabic consonant, C. There are no clusters of (non-syllabic) 

consonants, at least not on the phonological level. 

Vowel length is not phonematic in Setswana (Cole 1955: 55) , but 

there are different vowel lengths as a part of the prosodic structure. 

2.2 Vowels 

Setswana has seven vowel phonemes which can be divided into three groups 

(Cole 1955: 4-7) . 


Front, unrounded vowels:

/i/ phonetically very closed.

/e/ phonetically more closed than half closed.

/ɛ/ phonetically half open.

Open, unrounded vowel:

/a/ phonetically open and central.

Back, rounded vowels:

/u/ phonetically very closed.

/o/ phonetically more closed than half closed.

/ɔ/ phonetically half open.

There is phonologically governed variation of the vowel quality of
the four mid vowels, /e, ɛ, o, ɔ/ (Cole 1955: 55), but since it is not
important for the present study I will not discuss it here.

2.3 Consonants 

As we have seen, the number of vowel phonemes is low and the syllable 

structure is simple. The number of consonant phonemes, though, is high 

and there is a complex relationship between labialized and nonlabialized 

consonants. 

In Setswana there are 44 consonant phonemes, of which 17 are 

labialized. (For reasons which will be clear I do not count the semivowel 

/w/ as a labialized consonant here.) 

In the chart below each phoneme is represented by its major allophone. 

Notice that for every labialized consonant there is a consonant 

differing only in the respect that it is non-labialized. Since 

"labialization is a morphophonological process in Setswana" (Janson 

1985) it is really relevant to talk of a labialized consonant and its 

non-labialized counterpart. 

In a few cases the labialized consonant has

two non-labialized counterparts: /tsʷ/ has both /ts/ and /tʃ/ as its

counterparts, /tsʰʷ/ has both /tsʰ/ and /tʃʰ/, and /sʷ/ has both /s/

and /ʃ/.

STOPS:

Plain   Labialized   Aspirated   Asp. & labial.   Place of articulation
/p/     -            /pʰ/        -                Bilabial
/b/     -            -           -                Voiced bilabial
/t/     /tʷ/         /tʰ/        /tʰʷ/            Alveolar
/tl/    /tlʷ/        /tlʰ/       /tlʰʷ/           Alveolar with lateral release
/k/     /kʷ/         /kʰ/        /kʰʷ/            Velar

AFFRICATES:

Plain   Labialized   Aspirated   Asp. & labial.   Place of articulation
/ts/    -            /tsʰ/       -                Alveolar
-       /tsʷ/        -           /tsʰʷ/           Alveolar or prepalatal
/tʃ/    -            /tʃʰ/       -                Prepalatal
/dʒ/    /dʒʷ/        -           -                Voiced prepalatal
-       -            /kxʰ/       -                Velar

FRICATIVES AND LIQUIDS:

Plain   Labialized   Place and manner of articulation
/ɸ/     -            Bilabial or labiodental fricative
/s/     -            Alveolar fricative
-       /sʷ/         Alveolar or prepalatal fricative
/ʃ/     -            Prepalatal fricative
/x/     /xʷ/         Velar fricative
/r/     /rʷ/         Apical trill
/l/     /lʷ/         Alveolar lateral

NASALS:

Bilabial   Alveolar   Prepalatal   Velar   Comment
/m/        /n/        /ɲ/          /ŋ/     Plain
-          /nʷ/       /ɲʷ/         /ŋʷ/    Labialized

SEMI-VOWELS:

/w/: bilabio-velar

/j/: palatal

In Setswana there are a few click sounds, but of marginal 

importance (and only in interjections). For some consonants there are 

restrictions on which vowel can follow, but that is also out of the 

scope of this paper, except for what is relevant for labialized 

consonants. That discussion will follow below. 

2.4 Labialization of consonants 

Before the back (and rounded) vowels, /u, o, ɔ/, all consonants are

(phonetically) labialized due to regressive assimilation (Cole 1955: 33- 

34) , i.e. the consonants are articulated with a distinct liprounding and 

- when it is possible - with the back of the tongue raised towards the 

velum. 


This means that before rounded vowels there is no opposition 

between the labialized and the non-labialized consonants described 

above. The alveolar series /s, ts, tsʰ/ and the prepalatal series /ʃ,

tʃ, tʃʰ/ also collapse into one labialized series, /sʷ, tsʷ, tsʰʷ/,

which is alveolar or prepalatal depending on the dialect (Cole

1955: 35).

Traditionally (Cole 1955) Setswana is described as having the 

labialized phonemes in front of the rounded vowels. That view could of 

course be challenged since there is no opposition between labialized and 

non-labialized consonants in that vowel context. (For an alternative 

analysis see Janson (1985) .) 

The labialized consonants can also be found before the unrounded 

vowels /i, e, ɛ, a/, but in this case there is a phonematic distinction

between the labialized and the non-labialized consonants. Even in this

case there is only one labialized series that corresponds to both the

alveolar and the prepalatal series. It is this last kind of 

labialization that will be discussed in this paper. 

3. An acoustic analysis of /Cʷ/ - a review

In this section I will briefly present my acoustic analysis of the 

labialized consonants originally presented in Dufberg (1984) . It is not, 

however, a pure review. In 3.2 I will discuss the expected effects of

labialization, which were not discussed in the original report.

3.1 Objectives of the study 

The study was explorative and the questions we wanted to find answers to 

were: 

1) What is the acoustic difference between the labialized consonant and 

its non-labialized counterpart? 

2) Is there a common acoustic correlate that corresponds to the 

distinction labialized/non-labialized? 

To be able to answer the first question completely and to give an 

affirmative answer to the second question all consonant pairs have to be 

represented, and in comparable vowel contexts and positions in the 

words. Recall that the contrast between labialized and non-labialized 

consonants only exists before unrounded vowels which is the only context 

I have studied. 

3.2 Expected effects of labialization 

The term labialization implies that the consonant should have some extra

component of the lips, most likely lip rounding. The acoustic effect of

lip rounding depends on the place of articulation. For dental, alveolar,

or palatal consonants we would expect a lowering of the third formant,

or its equivalent, like the effect of rounding an [i] to an [y]. For

velar consonants, on the other hand, we would expect a lowering of the

second formant, like the effect of rounding an [ɯ] to an [u] (Fant

1968: 214).

According to Cole (1955) , though, labialization (of consonants in 

Setswana) means rounding the lips and also raising the back of the 

tongue when that is possible. That is, labialization is then a 

combination of true labialization and velarization. Labialization 

combined with velarization will lower the second formant even in dental, 

alveolar, and palatal consonants, that is, labialization and

velarization will strengthen each other's lowering effect on F2. It

would not be surprising to find velarization combined with

labialization since the two have been found together in other languages

(Jakobson & Waugh 1979: 116-7) . 

In either of these two models, that is, labialization alone or

labialization combined with velarization, we can expect a secondary

effect on the consonantal segment: a lowering of the amplitude (Fant

1968: 204-5), which should be much greater if F2 is lowered than if only

F3 is lowered.
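As a rough numerical illustration of why added lip protrusion lowers resonances, the following sketch treats the vocal tract as a uniform tube closed at the glottis and open at the lips. This is a textbook idealization and not the model used by Fant (1968). The tube length of 17.5 cm, the 1 cm of protrusion and the function name tube_formants are assumptions chosen only for illustration. The sketch captures neither the place-dependent effects on F2 and F3 discussed above nor the effect of a narrowed lip opening.

```python
# Quarter-wavelength resonances of a uniform tube, closed at one end (glottis)
# and open at the other (lips): F_n = (2n - 1) * c / (4 * L).
SPEED_OF_SOUND_CM_PER_S = 35000.0  # approximate value, assumed for illustration


def tube_formants(length_cm, count=3):
    """Return the first `count` resonance frequencies (Hz) of a uniform tube."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)
            for n in range(1, count + 1)]


neutral = tube_formants(17.5)          # hypothetical unrounded tract length
protruded = tube_formants(17.5 + 1.0)  # hypothetical 1 cm of lip protrusion

for n, (f_plain, f_round) in enumerate(zip(neutral, protruded), start=1):
    print(f"F{n}: {f_plain:.0f} Hz -> {f_round:.0f} Hz")
```

In this idealization every resonance is lowered by the same proportion, so in absolute terms the higher resonances drop more than the lower ones.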

3.3 Speech material and analysis method 

For the study we used two different recordings. Recording 1 was recorded 

in the field in Botswana by Tore Janson in 1982. It was a recording of 4 

native speakers of Setswana from Botswana reading a list of 75 words. 

The word list was composed for Janson's lexical change studies. The list 

was not planned for the study of labialization and there were a number 

of problems with the selection of words. Firstly, not all consonant pairs

were represented, secondly, both members of a pair were not always

in a comparable vowel context, and thirdly, some consonants were in the

final syllable, which often underwent devoicing. Of 20 /C-Cʷ/ pairs only

8 could be used for the analysis. Recording 1 clearly did not, even 

theoretically, allow us to answer the two questions presented above. 

The second recording, recording 2, was recorded at the phonetics 

laboratory in Stockholm by Tore Janson and me. It consists of one 

speaker reading a list of 32 words specially selected for the study. The 

speaker is a native speaker of Setswana from South Africa. The selection 

of words in the word list was made after I had done most of the analysis 

of recording 1. This list was limited to four of the eight pairs 

analyzed in recording 1. This limitation was done to keep down the size 

of the study. For each consonant we had four different vowel contexts. 
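The size of the word list for recording 2 follows directly from this design, as the short sketch below shows. The four pairs are the ones reported for recording 2 in section 3.4. That the four vowel contexts were the four unrounded vowels is an assumption made here for illustration, as are all variable names.

```python
from itertools import product

# Four consonant pairs, each with a plain and a labialized member,
# each member in four vowel contexts.
pairs = ["n", "l", "r", "x"]
variants = ["plain", "labialized"]
vowel_contexts = ["i", "e", "ɛ", "a"]  # assumed: the four unrounded vowels

design = list(product(pairs, variants, vowel_contexts))
print(len(design))  # number of words needed to fill every cell once
```

Four pairs times two members times four contexts gives the 32 words of the list.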

The speech material was analyzed on a Kay Digital Sona-Graph 7800 

spectrograph, and all measurements were done by hand on spectrograms with

a bandwidth of 300 Hz. To be able to compare levels, the spectrograms

were normalized with the help of the strongest vowel of each word.
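One way to think about this normalization is to express every level relative to the strongest vowel of the same word, so that words recorded or displayed at different gains become comparable. The sketch below is only an illustration of that idea, not the procedure actually carried out on the spectrograms. It assumes that levels are available in dB, and the function name and all values are invented.

```python
def relative_levels(levels_db, vowel_keys):
    """Express each level in dB relative to the strongest vowel of the word."""
    reference = max(levels_db[key] for key in vowel_keys)
    return {key: level - reference for key, level in levels_db.items()}


# Invented levels for two words displayed at different overall gain.
word_plain = {"consonant": 48.0, "vowel_1": 60.0, "vowel_2": 57.0}
word_labialized = {"consonant": 35.0, "vowel_1": 52.0, "vowel_2": 50.0}

rel_plain = relative_levels(word_plain, ["vowel_1", "vowel_2"])
rel_labialized = relative_levels(word_labialized, ["vowel_1", "vowel_2"])

# After normalization the consonant levels can be compared across words.
print(rel_plain["consonant"], rel_labialized["consonant"])
```

With these invented numbers the labialized consonant comes out 5 dB weaker than the plain one relative to the strongest vowel, although the raw levels differ by 13 dB.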

3.4 Results 

Without too much simplification we can summarize the results from 

recording 1 and 2 in a table. In the table I use the following 

notations: 

duration: Duration of the consonant segment.

F2: Transitions of the second formant from the vowel before to
the consonant and from the consonant to the vowel after.

consonant formant: The lowest peak, in frequency, in the spectrum of the
consonant segment.

?: Some uncertainty of the analysis.

Phoneme 

pair 

Observed differences of /Cʷ/

with respect to /C/

Recording 1 

longer duration 

dipping towards cons.? 

lower amplitude of noise? 

longer duration? 

F2 dipping towards cons.? 

lower amplitude of noise? 

lower consonant formant 

Recording 1 

Recording 2 

F2 dipping towards cons. 

lower cons. formant? 

lower amplitude?


lower cons. formant 

lower amplitude 







ended with a semi-vowel? 





ended with a semi-vowel 



ended with a semi-vowel 






It seems fair to say that labialization has one or more of the 

following effects: 

- Lowering of F2 in transitions from the preceding vowel and to the 

following one. 

- Longer consonant segment. 

- Lower amplitude of noise/formants in the consonant.

- Lower frequency of noise/formants in the consonant.

- Semi-vowel. 

The last effect, the semi-vowel, is rather special. In two consonant

phonemes, /nʷ/ and /lʷ/, in recording 2 the speaker clearly used a

bisegmental realization, that is, he ended the consonant with the semi-vowel

[w]; in one consonant phoneme, /xʷ/, he just as clearly used a monosegmental

realization. The fourth case, /rʷ/, is somewhat unclear, but my

interpretation is that the informant is using the bisegmental

realization even in that case.

In recording 1, on the other hand, I found only the monosegmental 

realization of labialization. That is, the labialized consonant was 

never ended by a semi-vowel. 

3.5 Conclusions 

If we compare the effects of labialization we have found with the 

expected effects we can see firstly, that there seems to be velarization 

as well as labialization since F2 is affected even for front vowel 

contexts. Secondly, that there are effects that are not directly 

connected to the labialization itself. These are the longer duration and 

the occurrence of the semi-vowel. There seems to be a connection at 

least in one direction: the semi-vowel gives a longer total segment. And 

what is special with the semi-vowel is that only the speaker in 

recording 2 has it and that he has it fairly consistently. 

If we try to find a common acoustic correlate that corresponds to

the distinction labialized/non-labialized, there seem to be two good

candidates. The first is the lowering of F2 in the transitions from and

to the surrounding vowels. Some data seem to contradict

it, namely the data for /tl-tlʷ/ and /dʒ-dʒʷ/ in recording 1. But

those data are not very complete, and the lowering of F2 is what we would

expect if labialization is combined with velarization, so I think it is

safe to consider lowering of F2 to be a common correlate.

The second candidate is the lowering of the second formant, or the 

equivalent resonance, in the consonant itself. This is expected, and 

reasonably supported except in nasals. In nasals there is a total 

closure behind the lips, and therefore lip rounding seems to be 

irrelevant for the nasal segment. (The question remains, though, what 

the expected effect of velarization of an alveolar or palatal nasal 

consonant is in the nasal phase.) 

The two effects of labialization discussed above are common 


correlates of labialization, but there are other effects of 

labialization that are not present in all realizations of labialized 

consonants. The most obvious one is the presence of a semi-vowel. We 

have clearly found two different manners of realizing labialization, 

mono and bisegmental. 

The questions that the two different ways of realization gave rise

to, which have already been referred to in section 2.2, were:

1) Should /Cʷ/ be analyzed - phonologically - as one or two segments?

2) Is the difference in realization of the labialized consonants, i.e.

mono vs. bisegmental, a dialectal or an idiolectal difference?

Janson (1985) points out that the strict CV structure of Setswana

is a strong argument against analyzing /Cʷ/ as two phonemes, and I

see no reason to argue against that. On the second question I argued in

Dufberg (1984) that the dialectal hypothesis was the most reasonable. I

will return to that question in the next section.

4. Some new data 

4.1 Objectives of the study 

We wanted to test the hypothesis from Dufberg (1984) that there is a

dialectal difference between mono and bisegmental pronunciation. If this

dialectal difference exists, there are at least two possibilities. The

first is that the difference can be found only in areas in

contact with Sesotho, from which Setswana has borrowed the bisegmental

strategy. This assumes that Roux's analysis of Sesotho is correct,

that is, that /Cʷ/ is ended phonetically by a semi-vowel, [w]. The

second possibility is that the feature has spread to different areas,

maybe independently of Sesotho. Then we would be able to find the

feature in other areas as well, that is, in areas where there is no contact with

Sesotho.

4.2 Speech material and results 

This third recording, recording 3, contains ten speakers of Setswana 

from different areas of Botswana recorded in Botswana in 1985 by Tore 

Janson. The same list of words as for recording 2 is read once by these 

ten speakers. The recordings of three of these speakers have been digitized, stored

on disk, and analyzed with a spectrogram program on the DEC Eclipse

computer in the phonetics laboratory. Since the objective has been to

test the dialectal hypothesis, I have not made any detailed measurements 

but only looked for bi vs. monosegmental realizations of labialization. 

Let me illustrate here with spectrograms: firstly, the effect of 

labialization, and secondly, the difference between mono and bisegmental 

realization. Figures 1-4 contain spectrograms illustrating the effects

of labialization. Figures 1 and 2 are parts of recordings of the same

word, which illustrate the non-labialized /x/, read by two speakers, A

and B. Speaker A is the one speaker who used bisegmental realization of

the labialized consonants (except in /xʷ/). Speaker B is one of the

speakers of recording 3, and he never used bisegmental realization. In

figures 3 and 4, read by the same two speakers, parts of recordings of

another word illustrate labialized /xʷ/. Notice the transition of F2,

and the downward shift in frequency of the strongest resonance of the

consonant. All realizations in figures 1-4 are monosegmental.

Figures 5-8 illustrate bi vs. monosegmental realization of two 

consonant phonemes, /nʷ/ and /lʷ/. Figures 5 and 7 are from recordings

of speaker A, as defined above, realizing his consonants bisegmentally. 

Figures 6 and 8 are from recordings of speaker B, realizing his 

consonants monosegmentally. Notice that F2 stays at a low frequency 

value, forming a [w], in figures 5 and 7, whereas in figures 6 and 8 F2 

rises directly after the end of the nasal and lateral phases, 

respectively. 
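The visual criterion used here, namely whether F2 stays low for a while after the consonant or rises at once, could in principle be stated as a simple rule on an F2 track. The following sketch is a hypothetical formalization only. No such automatic procedure was used in the present study, where the spectrograms were judged by eye, and the threshold of 1000 Hz, the minimum glide duration of 40 ms, the frame step and the F2 values are all invented for illustration.

```python
def low_f2_duration_ms(f2_track_hz, frame_ms, threshold_hz):
    """Time for which F2 stays below the threshold right after consonant offset."""
    frames = 0
    for f2 in f2_track_hz:
        if f2 >= threshold_hz:
            break
        frames += 1
    return frames * frame_ms


def classify_realization(f2_track_hz, frame_ms=10, threshold_hz=1000, min_glide_ms=40):
    """Label a token by how long F2 remains low after the consonant (assumed rule)."""
    duration = low_f2_duration_ms(f2_track_hz, frame_ms, threshold_hz)
    return "bisegmental" if duration >= min_glide_ms else "monosegmental"


# Invented F2 values (Hz), one per 10 ms frame, starting at the end of the
# nasal or lateral phase.
f2_stays_low = [800, 780, 790, 820, 850, 1100, 1500, 1800]
f2_rises_at_once = [900, 1300, 1600, 1800, 1850, 1850, 1850, 1850]

print(classify_realization(f2_stays_low))
print(classify_realization(f2_rises_at_once))
```

A rule of this kind would have to be tuned separately for each speaker and vowel context, since the F2 target of the following vowel differs.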

Even though it can be hard to define the beginning and the end of 

the semi-vowel that ends a labialized consonant with bisegmental 

realization, the difference between a mono and bisegmental realization 

has still been rather clear cut. And in the three speakers, out of ten 

in recording 3, that I have analyzed I have found no examples of bisegmental 

realization. 

5. Discussion and conclusions 

5.1 Possible explanations of bisegmental realization 

Let us now try to account for the different realizations, that is,

bisegmental vs. monosegmental realization. Let me first summarize the

differences between the informants with only monosegmental realization

and the one informant with both mono and bisegmental realization.

Only monosegmental                   Both mono and bisegmental
From Botswana                        From South Africa
Contact with Sesotho less likely     Contact with Sesotho likely
Recorded in field                    Recorded in echo free chamber
Recorded in their own country        Recorded in exile

The following are theoretically possible explanations that we

should consider:

1) Idiolectal peculiarity 

2) Dialectal difference 

3) Spelling pronunciation 

4) Hyper speech due to the formal situation 

In Dufberg (1984) I rejected the first alternative on the grounds

that it is not likely that someone would so consistently have such

different realizations. As long as there are other possible explanations

I think we can safely leave the idiolectal explanation out.

The second alternative, dialectal difference, is the one which I

adopted in Dufberg (1984). I did not then consider alternatives 3 and 4.

What I found in Dufberg (1984) in favor of this hypothesis, and against

the idiolectal hypothesis, was the fact that the informant so

consistently used the bisegmental realization for three consonants and

so consistently the monosegmental for the fourth. Nothing speaks against

the dialect hypothesis, but the support is not very strong either.

The spelling of Setswana, which is fairly standardized, is very

phonematic. A consonant phoneme is in the orthography represented by a

single grapheme, a digraph, a trigraph, or even a quadrigraph. A

labialized consonant is represented by its non-labialized counterpart's

graph plus a w at the end. So the words /xonɛla/ and /xonʷɛla/

are spelled go nela and go nwela, respectively. (Recall

that there are no consonant clusters in Setswana.) But spelling

pronunciation, the third alternative, cannot, at least not alone,

explain the bisegmental realization. Firstly, all informants were

literate and bilingual in Setswana and English, and all were reading the

words from a list written in Setswana standard orthography. Among the

informants that did not show any bisegmental realization were university

students. Secondly, spelling pronunciation cannot explain why the

phoneme /xʷ/ was never realized bisegmentally.

Let us look at the fourth alternative. The one informant who used

bisegmental realization was the only one to be recorded in an echo free

chamber in a phonetics laboratory, which is probably the most formal

place one could be recorded in. The other informants were recorded in

much more relaxed places. The one informant was also the only one

recorded in exile, that is, in Sweden. The other ones were recorded in

their own country, Botswana. The formal situation may have triggered

hyper speech, that is, the opposite of reduced speech. (For a discussion

of hyper speech see Lindblom (1987).) This explanation assumes that

bisegmental realization is, at least potentially, available to the

speaker of Setswana. If this was not the case in the days of Tucker and

Cole, literacy might have made it available to the literate.

The hyper speech hypothesis itself cannot explain why the

informant that used bisegmental realization always realized /rʷ, nʷ, lʷ/

bisegmentally, but never /xʷ/, which was always realized

monosegmentally. But if we assume that labialization also implies

velarization for non-velar consonants, then there is one interesting

fact, namely that /xʷ/ is a velar consonant, and the only velar one of

the four consonants. For the non-velar consonants, the velar gesture,

together with the labial gesture, adds to the complexity of the

consonant, whereas for the velar consonant it is part of a non-complex

consonant. This difference may be the key to why /xʷ/ behaves

differently. If we assume that the secondary articulation is carried out

after the end of the consonant itself, we would get a labio-velar semi-vowel, [w],

after non-velar consonants, that is, a low F2, but only lip rounding

after the velar, which would not affect the F2 of non-back vowels.

This velar explanation is compatible with both the dialect and the 

hyper speech hypothesis. But it makes the hyper speech hypothesis more 

convincing than without the velar explanation. 

5.2 Conclusion 

In this paper I have challenged the original hypothesis that the two

different labialization strategies, mono vs. bisegmental realization,

are connected to dialectal differences. The new hypothesis is a hyper

speech hypothesis, that is, that the different strategies are connected to

style of speech. In hyper speech, that is, the opposite of reduced

speech, we would then get the bisegmental pronunciation.

An explanation of the difference in realization of the labialized

velar consonant, /xʷ/, which was never realized bisegmentally, in

contrast to the other consonants, could perhaps be found in the fact that

it is velar whereas the other consonants analyzed are

alveolar.

REFERENCES

Cole, D. T. (1955): An introduction to Tswana Grammar. London.

Dufberg, Mats (1984): Labialiserade konsonanter i setswana - en akustisk
analys. Unpublished paper. Stockholm: University of Stockholm,
Institute of Linguistics.

Fant, Gunnar (1968): "Analysis and synthesis of speech processes". In
Manual of phonetics, 2nd edition, edited by Bertil Malmberg.
Amsterdam: North-Holland Publishing Company, pp. 173-277.

Jakobson, Roman & Waugh, Linda R. (1979): The sound shape of language.
Brighton, GB: Harvester Press.

Janson, Tore (1985): "Labialisation in Setswana: phonetics and
phonology". In Phonologica Africana 1984 (= Wiener linguistische
Gazette Beiheft 5). Wien: Institut für Sprachwissenschaft der
Universität Wien, pp. 73-84.

Lindblom, Björn (1987): "Adaptive variability and absolute constancy in
speech signals: two themes in the quest for phonetic invariance".
In Proceedings XIth ICPhS from the Eleventh International Congress
of Phonetic Sciences, 1987, vol. 3, pp. 9-18.
Also in Perilus report no 5 (this volume). Stockholm: University of
Stockholm, Institute of Linguistics.

Roux, J. C. (1981): "On the notion 'phonologization': some experimental
phonetic considerations from Sesotho". In Phonologica 1980, edited
by W. Dressler et al., pp. 373-378.

Tucker, A. N. (1929): The comparative phonetics of the Suto-Chuana group
of Bantu languages. London.

ACKNOWLEDGEMENTS 

Thanks to Robert McAllister and Sven Furumark, for insightful 

suggestions and proofreading, to Björn Lindblom for leading my analysis

in the right direction, and to Tore Janson, for supporting my work. 


L. Roug, I. Landberg and L.-J. Lundberg

Department of Linguistics

University of Stockholm

Sweden

During the last decade there has been a growing interest in children's prespeech development (Yeni-Komshian, Kavanagh and Ferguson 1980, Stark 1981, Locke 1983, Lindblom and Zetterström 1986). The view held by Jakobson (1968), which declared babbling and speech to be two unrelated behaviors, stands in contrast with recent studies of prelinguistic vocalizations and early language acquisition indicating a gradual transition (Oller, Wieman, Doyle and Ross 1976, Vihman, Macken, Miller, Simmons and Miller 1985). Jakobson's approach was to postulate a discontinuous step and a universal order of acquisition of phonemes governed by the "laws of irreversible solidarity" (1968:51), laws which in Jakobson's framework underlie phonological universals, the regression of the phonological system in patients with aphasia as well as the acquisition of phonology in the child. The principle of maximal contrast governs the order in which the phonemes are acquired. This means that the infant's earliest language productions will consist of consonant/vowel contrasts of maximally different phonetic events, pa, followed by a nasal/oral contrast, pa/ma. This period of structured phonological development is preceded by a period of random vocal productions, i.e. babbling. These babbled utterances are characterized by "an astonishing quantity and diversity of sound productions" (1968:21). The two types of vocalizations, babbling and speech, may be separated by a short period in which the child is sometimes "completely mute" (1968:29). This silent period marks for the child the functional difference between the two types of vocal behavior. However, not all children turn mute since "for the most part ... one stage merges unobtrusively into the other" (1968:29). Jakobson considers babbling a "purposeless egocentric", and hence non-communicative, type of behavior, parallel to which a "desire for communication" arises and gradually replaces the "biologically oriented tongue delirium" (i.e. the babbling) of the child (1968:24). This view of the young infant as a non-communicative, rather passive individual accords well with the general opinion of infant competence at the time (see the discussion in Bullowa 1979). It is only in recent years that the communicative capacities of the very young infant have begun to be more fully appreciated (Bullowa 1979, Meltzoff 1986). The understanding that a search for precursors of speech must be conducted in the context of a more general perspective on communicative behavior has resulted in a rejection of the discontinuity theory. However, it has been suggested that the order of acquisition proposed by Jakobson might be more true for the prelinguistic period than for the acquisition of early phonology (Vihman, personal communication), thus implying a universal developmental pattern for babbling rather than for speech. The silent period which was reported by Jakobson has not been confirmed by any of the many recent studies of prespeech development (e.g. Murai 1963, Cruttenden 1970, Koopmans van Beinum and van der Stelt 1986, Oller 1980, Stark 1980, Kent and Bauer 1984, Vihman et al. 1985, Holmgren, Lindblom, Aurelius, Jalling and Zetterström 1986). Instead there have been reports of the existence of developmental stages or milestones in the babbling period (Oller 1980, Stark 1980, Koopmans van Beinum et al. 1979, Holmgren et al. 1986) and strong similarities between the phonetic repertoire of a child's babbling and his/her first words (Oller et al. 1976, Locke 1983, Vihman 1986).

Many of the studies have been performed on children in English speaking communities, a circumstance that has served as a stimulus to undertake research to confirm these data on non-English subjects. The present paper presents phonetic data on a group of Swedish infants that by and large corroborate the developmental milestones reported in other studies (Oller 1980, Stark 1980, Koopmans van Beinum et al. 1986).

If we are correct in assuming that babbling and speech are functionally related, observations of babbling should be of clinical interest. A large number of questions can be raised concerning the possibilities of obtaining early indicators of deviant communicative development. The present project (footnote 1) was initiated by Professor Rolf Zetterström and his colleagues at Sankt Göran's Children's Hospital in Stockholm. A major goal of the project has been to obtain a detailed phonetic description of the prespeech development of normal Swedish infants which could serve as a reference data base in the development of methods for the early diagnosis of deviant communicative development.

Eight normal (footnote 2) Swedish infants were audio-recorded (footnote 3) on a bi-weekly basis in their homes from when they were around 5 to 76 weeks of age. Recordings were ended when the child had achieved a ten- to lexicon according to parental reports. The recordings were made in the presence of a close relative (mother or father) or an adult whom the child knew well. The situation in which the recordings were made would vary depending on the age of the child and the time of day. Typical recording situations would be: infant lying in bed falling asleep or awakening, infant playing with toys, infant seated in sofa or at table drawing or reading in book with adult. We would also record during meals or shortly after, in nursing situations such as diaper change and dressing, and in any other interactive situations between parent and child that would occur naturally. As the infants grew older only the last interactive situation remained. In parallel with the audio recordings, notes were made of the various activities taking place, in order to provide contextual information for the interpretation of the vocalizations. An extra recording session was made at the age of around 3.5 years to ensure normal speech development.

From the larger sample of eight infants, a group of four (two boys and two girls) was selected on the basis of quality and regularity of recordings. The recordings of these four infants were first exposed to a crude auditory analysis. This method of selecting vocalization samples consisted in screening the tapes at the approximate age of onset of milestones reported by other investigators (Koopmans van Beinum et al. 1986, Oller 1980, Stark 1980). The weeks at which there were clear changes in the character of the child's vocalizations were selected for further analyses. Also, weeks just preceding and following this point were analysed to secure the stability of the milestone. The chosen recordings were computer edited (footnote 4) in order to exclude all non-comfort and non-infant sounds. In following this procedure, 37 percent (20 hours) of the total number of recorded hours (54) per infant were analysed. The child's phonations were divided into utterances using breath groups as segmentation criteria and were then rerun onto tapes for transcription. The total number of utterances transcribed per child was around 2500, varying between 2200 and 3100 utterances for each of the four children.

The tapes to be analysed were independently transcribed by four students of phonetics using the International Phonetic Alphabet (IPA) (see The Principles of the International Phonetic Association, 1981). Prior to this analysis, transcription training sessions and discussions were held to ensure identical mode of procedure. The IPA is, as the name implies, an international notational system developed to describe speech sounds found in the languages of the world. Each symbol represents a sound of a standard phonetic value. The phonetic symbol of each sound can be analysed into articulatory features such as place and manner of constriction in consonants, and degree of opening of the mouth (i.e. lowering of the tongue), position of the tongue in the horizontal front-center-back dimension, and presence vs. absence of rounding of the lips, for vowels. The features voiced vs. voiceless denote whether or not there is vibration of the vocal cords accompanying the articulation. Additionally there are diacritics allowing for detailed descriptions of each symbol, should it deviate from the standard phonetic value.

In our approac h eac h of t.hese features of the IPA symbo ls we re given a 

number (see Table I) i.e. for conson an ts the different places and manners 

of articulation wou ld be numbered an d the presence vs. absence of voicing. 

Simi la rly for vowels the degree of opening of the mouth, front -center -back 

position of the tongue (see Table II) , rounding of the lips and voicing 

[TABLE I (consonants): numerical coding of the features of the IPA consonant symbols. The table body is not recoverable.]

were coded in numbers. Additionally, direction of breath stream (e.g.
ingressive, egressive) and voice quality (e.g. normal and deviant) were
likewise numerically coded. These numbers formed the basis on which the
frequency counts were made.
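The numerical coding and the frequency counts described above can be sketched as follows. This is only an illustration: the code values, the small symbol inventory and the function names are assumptions made for the example, since the actual codes of Tables I and II are not reproduced here.

```python
from collections import Counter

# Hypothetical numeric codes (the real values of Table I are not known here).
PLACE = {"bilabial": 1, "dental/alveolar": 2, "velar": 3, "glottal": 4}
MANNER = {"plosive": 1, "nasal": 2, "fricative": 3}
VOICING = {"voiceless": 0, "voiced": 1}

# Illustrative lookup from a transcribed symbol to its articulatory features.
# "?" stands in for the glottal stop in plain ASCII.
SYMBOLS = {
    "p": ("bilabial", "plosive", "voiceless"),
    "b": ("bilabial", "plosive", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "d": ("dental/alveolar", "plosive", "voiced"),
    "h": ("glottal", "fricative", "voiceless"),
    "?": ("glottal", "plosive", "voiceless"),
}


def encode(symbol):
    """Return the (place, manner, voicing) codes of one consonant symbol."""
    place, manner, voicing = SYMBOLS[symbol]
    return PLACE[place], MANNER[manner], VOICING[voicing]


def feature_counts(symbols):
    """Count how often each place code and each manner code occurs."""
    places, manners = Counter(), Counter()
    for symbol in symbols:
        place, manner, _ = encode(symbol)
        places[place] += 1
        manners[manner] += 1
    return places, manners


places, manners = feature_counts(["?", "h", "b", "d", "b", "m"])
print(places)   # counts keyed by place code
print(manners)  # counts keyed by manner code
```

Counting codes rather than symbols means that two transcribers who chose different symbols with the same place or manner still contribute to the same count.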

Choosing this approach we can anticipate the following methodological
problems. Since pre-speech cannot be assumed to be organized into discrete
phonemic segments, the first problem is one of representing a continuous
series of events as non-continuous. A related issue is the question of
what notation to use when doing so. The choice of notational system when
transcribing babbling varies in the literature. Although a separate system
has been developed for transcribing babbling (Koopmans van Beinum et
al. 1986), many researchers have chosen to use an expanded form of IPA,
adding diacritics developed to describe non-speech-like sounds (Oller et
al. 1976, Cruttenden 1970, Kent and Bauer 1985, Vihman et al. 1985 amongst
others). Bush, Edwards, Luckau, Stoel, Macken and Petersen (1973) offer
such a set of diacritics for the specification of phonetic modifications of
basic IPA segments. The Dutch model (Koopmans van Beinum et al. 1986) is a
physiologically based transcription procedure. It relates phonatory and
articulatory events to the speech production mechanisms of the vocal tract.
The articulatory notations used can be interpreted as indicating the place
and manner of articulation, degree of opening of the mouth and the
configuration of the lips. This approach is similar to ours,
although the notations used by us were those of the IPA. In our study each
IPA symbol was analysed into its articulatory features. When transcribing,
a given symbol was selected with specific consideration of constituent
articulatory features. In addition to making reference to the overall


phonetic value of the uttered sound we would ask ourselves questions like:
How and where are the consonant-like events of this sequence produced? What
degree of opening do the vowel-like sequences seem to have and what are the
tongue and lip positions? The answers to questions like these would
facilitate the final selection of symbol. Even though we are aware that the
physiology of the infant's vocal tract is quite unlike that of the adult
(Kent and Murray 1982) we would often try to reproduce the sounds to be
transcribed in order better to understand their features (c.f. Pike's term
"imitation label technique" 1943:16). In this sense the way in which we
used the IPA could well be said to be a physiologically based approach.

Invented signs were used for productions for which there are no established
IPA symbols e.g. a bilabial trill (8). Modifications of the IPA symbols
were made if necessary, using diacritics developed for the transcription of
babbling (Bush et al. 1973). Features describing direction of breath
stream (ingressive, egressive) and type of phonation, if deviant (breathy,
squeaky, creaky, rough or pressed) were also included in our analysis.

The choice of the IPA system raises the question of whether or not it is
correct to use a notational system, developed to describe speech sounds, on
non-speech sounds. Pike states (1943:150-151) that "the controlling
mechanisms of non-speech sounds are quite similar to those of speech
sounds". He goes on to say that "points of articulation are similar for the
two groups" and that "types of articulation movements are likewise".
Further he states that "degrees of stricture fall into the same general
classes for both groups, and strictures interrupt the air stream in similar
ways regardless of whether or not the sounds are used in speech". The
similarities between speech and non-speech sound productions are due to the
fact that they both use the same articulators. We feel that these
similarities support our choice of notational system and transcriptional
procedure.


With regard to the Swedish language background of the transcribers and
the possible effects of this on the transcribed material, it cannot be
denied that there might be such an influence. A mitigating
circumstance, however, is that the transcribers were all students of
phonetics and thus trained to disregard language background effects when
transcribing. Admittedly such effects are difficult to control since they
result from unconscious processing of the perceived signal. We do however
feel that the transcribers' awareness of this problem facilitated their
more objective judgement of the vocalizations.

To supplement the subjective judgements of the auditory analyses, acoustic
analyses of the material are being made (Roug, Landberg and Lundberg
forthcoming).

With regard to the general problem of low agreement between transcribers
(see e.g. Oller et al. 1976, Oller and Eilers 1982, Stockman, Woods and
Tishman 1981), we found that the disagreement amongst transcribers
decreased when the transcriptions were compared not on the segmental level,
but in terms of frequency distributions of articulatory features. To make
feature comparisons between transcribers possible, a first step was to
introduce the numerical coding described above. By choosing this approach a
more stable, transcriber-independent picture of the child's production
pattern at a given age is achieved. In Table III transcriptions from three
points in time for a single child are shown. For each of the infants the
correlation coefficients for the four transcribers of consonantal place and
manner of articulation, across three points in time, are seen in Table IV.
We see that there is considerable disagreement with regard to what segments

TABLE III. This Table shows examples of transcriptions by four transcribers.

Transcriber   Week 19    Week 33             Week 54
LR            a?ha       ° ?aaoaoaoaoao      hGpasa
LJ            ?aha?e:    ale:le:llcel        Qbaye:
IL            ?ce?a      ?aoe:01:Iaaur       ma9E
BH            ?ae:?a     )«':!lcelcedlceal   dapayce

to use in the transcriptions but little disagreement as to what features
are involved. This means that the four transcribers would often disagree
about the value of the individual segment but would essentially agree, in
statistical terms, on how and where the favored sound types had occurred in
the vocal tract of a particular child.
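The comparison of transcribers in terms of feature frequency distributions can be illustrated with a minimal sketch. The text does not state which correlation coefficient was used; the Pearson product-moment coefficient is assumed here, and the counts are invented purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical counts per transcriber for one child at one age.
# Order of places: bilabial, dental/alveolar, velar, glottal.
place_counts = {
    "LR": [12, 30, 8, 50],
    "LJ": [15, 27, 10, 48],
    "IL": [10, 33, 9, 48],
    "BH": [14, 28, 6, 52],
}

# One coefficient for each of the six transcriber pairs.
pairwise = {
    (a, b): pearson(place_counts[a], place_counts[b])
    for a, b in combinations(place_counts, 2)
}

for pair, r in pairwise.items():
    print(pair, round(r, 2))
```

Four transcribers give six pairs, so a table with one coefficient per pair and child would hold 24 values for four children.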

In addition to the segmental analysis, utterances were classified with
respect to their sequential patterning of vowel and consonant segments,
i.e. phonotactic structure. Each utterance was classified in terms of two
parameters. One criterion was the phonotactic structure, the other was the
phonetic features of the constituent consonant(s). The first parameter
divided the utterances into five classes, the determining property being
ability to match (part of) the utterance to one of the following five
phonotactic patterns.
(V stands for any string of one or more vowels and C stands for any string
of one or more consonants).

1 Non-consonant utterances                      0
2 Single consonant utterances                   C
3 Open syllable utterances                      CV
4 Polysyllabic utterances with
  repeated consonant part of syllable           CiVCiV
5 Polysyllabic utterances with
  varying consonant part
  (voicing, place or manner)                    CjVCiV
  (Ci ≠ Cj and Cj must be non-glottal)

[TABLE IV. Correlation coefficients between pairs of the four transcribers (LR, LJ, IL, BH) for Child V, Child K, Child J and Child M. The legible coefficients range from 0.76 to 0.99; the assignment of values to cells is not recoverable.]

Since it was only required that part of an utterance match a specific
pattern, an utterance may well contain parts not covered by the pattern.
This means that utterances belonging to any of the classes may contain a
leading vowel without this affecting its classification. Likewise classes 2
through 5 may contain trailing consonants. (Utterances showing trailing
consonant strings containing non-glottal or non-nasal features were in fact
marked specially, which actually induces a further refinement of the
partitioning).

The second parameter divided the utterances into six classes, the
determining property being presence of a specified type of consonant.

A  No consonant
B  Glottal consonant
C  Non-glottal consonant with non-complete closure
D  Sonorants and glides
E  Non-glottal consonants with complete closure
F  Consonant clusters

All glottal consonants are counted as identical. A glottal consonant with
an adjacent non-glottal consonant was considered identical with the
non-glottal. A cluster is defined as a string of two or more consonants.
(Note that from this follows that a string consisting of one glottal and
one non-glottal consonant does not count as a cluster. Further, a glottal
stop and a glottal fricative do not give rise to variation).

The classes are conceived as ordered in the sense that when a particular
utterance can be classified as belonging to more than one class, the higher
class should be chosen (5 rather than 1, F rather than A). The order is
supposed to mirror the development of the child.
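The first parameter and the rule of choosing the higher class can be sketched as below. This is an illustrative reading of the definitions, not the procedure actually used: the vowel inventory, the use of "?" for the glottal stop, the treatment of two glottal consonants as a repetition, and all function names are assumptions.

```python
VOWELS = set("aeiouæə")   # assumed vowel inventory for the example
GLOTTALS = set("?h")      # "?" stands in for the glottal stop


def runs(utterance):
    """Collapse a symbol string into alternating ('C' or 'V', string) runs."""
    out = []
    for ch in utterance:
        kind = "V" if ch in VOWELS else "C"
        if out and out[-1][0] == kind:
            out[-1] = (kind, out[-1][1] + ch)
        else:
            out.append((kind, ch))
    return out


def is_glottal(consonants):
    """True if every consonant in the string is glottal."""
    return all(ch in GLOTTALS for ch in consonants)


def phonotactic_class(utterance):
    """Return the highest of the classes 1-5 that part of the utterance matches."""
    r = runs(utterance)
    kinds = [kind for kind, _ in r]
    if "C" not in kinds:
        return 1                      # non-consonant utterance
    best = 2                          # at least a single consonant string
    for i in range(len(r) - 1):
        if kinds[i] == "C" and kinds[i + 1] == "V":
            best = max(best, 3)       # open syllable CV
            if i + 3 < len(r) and kinds[i + 2] == "C" and kinds[i + 3] == "V":
                first, second = r[i][1], r[i + 2][1]
                if first == second or (is_glottal(first) and is_glottal(second)):
                    best = max(best, 4)   # repeated consonant part
                elif not is_glottal(first):
                    best = max(best, 5)   # varying consonant, first one non-glottal
    return best


for u in ["ai", "am", "na", "dada", "bada", "?aha", "haba"]:
    print(u, phonotactic_class(u))
```

Because only part of an utterance has to match a pattern, a leading vowel as in "am" or a trailing consonant as in "adad" does not change the class in this sketch.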

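Examples of members of the various categories are shown below.

[Table of examples: columns 1 to 5 (0, C, CV, CiVCiV, CjVCiV); rows A (+vowel), B (+glottal) and following. The table body is incomplete in the source and is not recoverable.]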

the various features of the IPA segments. The
following figures show the percent occurrence of the most frequent features
and categories found in our data. It is interesting to note that there are
features and categories that are more frequent at certain times and less so
at others, and some which do not occur at all in the data.

Age is presented on the abscissa and percent occurrence on the ordinate.
The percent occurrence in the curves is presented cumulatively. The letters
refer to the four infants, the two girls: V and K and the two boys: J and
M. The individual curves for each of the infants are numbered 1 to 4, i.e.
V=1, K=2, J=3 and M=4.
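One way to read "presented cumulatively" is that each curve is drawn on top of the previous ones, so that the uppermost curve reaches 100 percent. The sketch below assumes that reading; the counts and the function name are invented for illustration.

```python
from itertools import accumulate


def cumulative_percent(counts):
    """Map each category to the running total of percent occurrence.

    Categories are accumulated in the order in which they appear in `counts`.
    """
    total = sum(counts.values())
    if total == 0:
        return {category: 0.0 for category in counts}
    percents = [100.0 * c / total for c in counts.values()]
    return dict(zip(counts, accumulate(percents)))


# Hypothetical place counts for one infant in one age interval.
counts = {"glottal": 50, "velar": 20, "dental/alveolar": 20, "bilabial": 10}
curves = cumulative_percent(counts)
print(curves)
```

Under this reading the share of a single category is the vertical distance between its curve and the curve below it.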

[...] there is an infinite number of possible places of
articulation in the vocal tract (c.f. Pike 1943). However only a small part
of these places are used in speech. The eleven points of articulation
[...] pharyngeal, uvular, velar, palatal,
[...] alveolo-palatal, retroflex, dental/alveolar, labiodental
[...] in our data in very low
numbers and others not at all.
[...] bilabial, dental/alveolar, velar and glottal constitute 9? percent of the
number of places used by the four infants over the whole period
studied. Palatal and uvular articulations occur in four and
[...] places of articulation (retroflex, palato-alveolar, alveolo-palatal and
[...] do not occur at all in our data. The four most frequently used


places are presented in greater detail below.

In Figures 1:1 through 1:4 the development of place of articulation is
presented. We see that the prevailing place of articulation in the 1-5
months period in three of the infants, V, K and J, is glottal and that this
dominance rapidly declines in the second half of the first year. The fourth
child, M, has a preference for nasals during the first months, resulting in
his glottal peak appearing later (5-7 months). For three of the infants, V, K
and M, there appears to be a following period of velar/uvular productions. A
majority (75%) of these productions are velar; however, the two places of
articulation have been added together since they were often difficult to
distinguish from each other when transcribing. For the fourth child, J, a
prolonged glottal period seems to compensate for the velar/uvular
articulations. As the glottals and velars decline, the bilabial and
dental/alveolar productions take over. There does not seem to be any
general order of [...] to the bilabial and dental/alveolar place of
articulation. Two of the infants, K and M, develop the bilabial articulation
before the dental/alveolar. One infant, V, acquires the dental/alveolar place
first, whereas the last infant, J, lacks a clear preference until the
beginning of the second year of life. This preference is then
dental/alveolar. At the last sampling point three of the infants, V, J and M,
have no clear preferences as far as front place of articulation is
concerned. In the fourth infant, K, however, the earlier bilabial preference
has changed to dental/alveolar.

According to IPA nine different manners are used to describe arbitrary

[Figure, apparently FIG 1:1-1:4: four panels (CHILD V, CHILD K, CHILD J, CHILD M). Ordinate: percent occurrence of features (0-100). Abscissa: age in months (1-3 through 18-20). The caption is not recoverable.]

consonantal speech sounds in any language: plosive, nasal, lateral, lateral
fricative, rolled, flapped, rolled fricative, fricative, frictionless
continuants and semi-vowels. It is of considerable interest to note that in
our data only the following manners occurred: plosive, nasal, lateral,
rolled (trill), fricative and semi-vowel. These differ
considerably in frequency of occurrence. Just as in the case of place of
articulation a few of the categories dominate the scene while others occur
in low numbers or not at all. We find that plosives, nasals and fricatives
constitute 91 percent of the manners used by the four infants over the
whole period studied. Semi-vowels, laterals and vibratory trills constitute
four, three and two percent respectively while lateral fricatives, flaps
and rolled fricatives are non-existent.

The manner of articulation for the four infants is seen in Figures 2:1
through 2:4. The general pattern here is somewhat less dramatic over time
compared to place of articulation. The early productions in the one to five
months period are mainly stops, fricatives and nasals. We see that there is
a major shift in number of full stop consonants around 9-11 months of age
in three of the infants, V, K and M, while the fourth child, J, has his peak in
the 11-13 months period. This increase follows the onset of reduplicated
consonant babbling prime (see page 14). Liquids are present throughout the
study but increase towards the end of the second year of life, at least in
three of the infants, V, K and M; the fourth child, J, has fewer liquids and
the amount does not seem to increase. The increase of liquids in the 7-9
months period is mainly caused by bilabial and uvular trills; again this is
true for three of the infants, K, J and M. Infant V has only laterals at this
point. With regard to semi-vowels, they are comparatively few and do not
exhibit any major changes in number. Infant J has an increase towards the

[Figure: four panels (CHILD V, CHILD K, CHILD J, CHILD M). Ordinate: percent occurrence of features (0-100). Abscissa: age in months (1-3 through 18-20). Legend: STOP, FRICATIVE, NASAL, LIQUID, SEMI-VOWEL.]

FIG 2:1-2:4
The above Figures show percent occurrence of consonant
manner of articulation as a function of age for each of the four
infants.

end of the study while infant M has a decrease. With regard to fricatives
the general trend seems to be that of a gradual decrease towards the end of
[...] life. It should be kept in mind that we do not
differentiate between glottal and supraglottal fricatives in these curves.
Considering the frequency of occurrence of glottals, we suspect that the
large amount of fricatives found in our data is heavily biased by the
glottal productions and that the fricatives decreasing towards the end of
the study are the glottal ones. Concerning nasals, these are more frequent
in the early productions than in the late.

If we compare Figures 1 and 2 a general picture emerges of the segments
used across time in the infants' vocalizations. From the manner and place
curves we conclude that a majority of the fricatives and stops produced in
the one to five months period are glottal. The dental/alveolar productions
during the same period are mainly nasal but fricatives also occur. The
velars are fricatives or nasals and the early bilabial articulations are
semi-vowels, fricatives and nasals. As the child grows older,
dental/alveolar, bilabial and velar stops become more frequent, mainly at
the expense of the fricatives.

In Figures 3:1 through 3:4 we see the degree of opening of the vowel-like
sounds for each of the four infants, plotted as a function of age. The
general pattern is that of non-high, non-low vowels dominating the early
productions. At the end of the study a more diversified picture emerges. If
compared with Figures 4:2 through 4:4, which show the occurrence of back,
center and front vowel articulations, a general pattern emerges of mainly

[Figure: four panels (CHILD V, CHILD K, CHILD J, CHILD M). Ordinate: percent occurrence of features (0-100). Abscissa: age in months (0-3 through 18-21).]

FIG 3:1-3:4
These Figures show percent occurrence of degree of
opening for vowels presented as a function of age for each of the
four infants. Since a majority of the occurring vowels were front or
central, these have been chosen to exemplify degree of opening in
this Figure.

[...] life,
versus more differentiated vowel qualities (i, e, ae, a) in the later part of
the first year. It is interesting to notice the total absence of back vowels
in the early productions. Back vowels begin to appear in the second year of
life. With regard to the features rounded, unrounded, there is a total
dominance of unrounded vowels over the whole period studied. Our results
are similar to those reported in the literature. Kent and Murray (1982)
report, in their acoustic study of 21 infants at 3, 6 and 9 months of age,
that the 3 to 9 months period is dominated by "relatively mid-front or
central articulations". Similarly Cruttenden (1970) in his study of his own
two twin daughters found that vowels of "the (.e) (a) (a) type predominated
throughout the babbling period". Kent and Bauer (1985) analysed five
infants at 13 months of age and report that "central and front vowels were
favored over back vowels, and low vowels predominated over high vowels".
Buhr (1980), who followed a child from the age of 16 to 64 weeks with
biweekly recordings, finds that the acute axis (i-ae) develops before the
grave axis (u-a) and explains this by the earlier development of the jaw
musculature. Bickley (1983) reports similar preferences in the vowels of 14
infants' early word productions between one and two years of age. Her
results show that the F1 dimension (i-a) developed before that of F2 (i-u).
That is, back vowels did not occur before late in the infants' repertoire.

As mentioned earlier, a categorization according to phonotactic structure
was made of the utterances. We were interested in seeing when utterances of
different consonant and vowel structure occurred in the child's productions
and how they developed over time. In Figures 5:1 through 5:4 the

[Figure: three panels (CHILD K, CHILD J, CHILD M). Ordinate: percent occurrence of features (0-100). Abscissa: age in months (0-3 through 18-21). Legend: FRONT, CENTER, BACK.]

FIG 4:2-4:4
The above Figures show percent occurrence of tongue
position presented as a function of age. Data are only available for
three of the four infants.

categories reduplicated babbling (RB) (which contains a
secondarily modified category called reduplicated consonant babbling prime
(RB')), non-reduplicated consonant babbling (NRB), variegated consonant
babbling (VB), non-consonant babbling (NCB) and glottal consonant babbling
(GB) are presented for each of the four infants.

The category reduplicated babbling (RB) consists of utterances where a
non-glottal consonant is repeated one or more times e.g. mamama, lala.
A sub-category of RB contains utterances produced with a complete
supraglottal constriction e.g. dadada, papa. This class will be referred to
as reduplicated consonant babbling prime (RB'). The next class, the
non-reduplicated consonant babbling (NRB), contains utterances with
supraglottal non-reduplicated consonants e.g. aba, na, lal, af, ga. The
category variegated consonant babbling (VB) consists of utterances with
alternating consonants. These alternations may be of manner, place or
voicing e.g. naeda, bada, daeta. The non-consonant babbling category (NCB)
consists, as the name implies, of utterances lacking consonants, i.e.
utterances containing only vowel modulations e.g. a, ai. The category
glottal consonant babbling (GB) consists of utterances containing only
glottal consonants e.g. ?oh, ? ?, hae.

We read from the Figures that NRB is present in the repertoire of all the
infants from an early age and that it increases dramatically shortly after
the onset of RB' at around eight months of age. In contrast to the sudden
onset of RB' we see the more gradual appearance of the category RB as a
whole. The area between the dotted and the solid line of that category
consists of reduplicated utterances with fricatives, nasals, liquids and
semi-vowels as consonants e.g. o aa a, mama, lalala and awa. As can be

[Figure: four panels (CHILD V, CHILD K, CHILD J, CHILD M). Ordinate: percent occurrence of categories (0-100). Abscissa: age in months, in one-month intervals (1-2 through 19-20). Legend: REDUPLICATED BABBLING, VARIEGATED BABBLING, NON-REDUPLICATED BABBLING, NON-CONSONANT BABBLING, GLOTTAL BABBLING, OTHER.]

FIG 5:1-5:4
These Figures show percent occurrence of the phonotactic
categories for each of the four infants as a function of age. The dashed
line shows the percent occurrence of the subcategory Reduplicated
Babbling Prime (RB').

seen, these types of reduplication precede RB'. It is interesting to notice
the scarcity of these babbles. Reduplicated nasal utterances are almost
[...]. In our experience nasals often occur in
discomfort sounds. Our excluding discomfort sounds from the data might
therefore account in part for the low occurrence of reduplicated nasal
utterances.

We see that VB is present in the early productions and that it increases
towards the end of the first year. The infants appear to divide into two
groups with regard to amount of VB at the end of the study. Infants V and J
have a majority (>45%) of VB at the last sampling point, whereas K and M
have below fifteen percent.

The NCB category decreases towards the end of the first year. The maximum
peak of occurrence appears about a month before the onset of RB' in three
of the infants, V, K and J. The fourth child, M, has an extensive period of
vocal play with a maximum peak about 3 months before the onset of RB'.

With regard to glottal utterances we see that the early glottal dominance
is altered by the supraglottal articulations introduced mainly in the
second half of the first year, a trend that has already been demonstrated
by the shift in place of articulation (see Figure 1).


Summarizing the data presented so far we find five developmental babbling
stages in the period studied. These are:

I    the glottal stage
II   the velar/uvular stage
III  the vocalic stage
IV   the reduplicated consonant babbling stage
V    the variegated consonant babbling stage

The ages at which the different milestones occur in the infants can be seen
in Figure 6. For reference the results from a Dutch study of 51 infants are
superimposed (van der Stelt and Koopmans van Beinum 1986).

[...] first sampling point between eight and twelve weeks of age. This stage is
characterized by utterances with glottal consonants and unrounded, often
nasalized, central vowels. The second most frequent type of vocalization
during this period involves syllabic nasals. There is a certain amount of
individual variability as to which of [...]
considerable amount of glottal consonants in the babbling. As a result of
the shift to supraglottal articulations that comes with the RB stage, the
frequency of occurrence of glottals shows a drastic drop.

In stage II we notice a first use of supraglottal articulations which are
non-nasal. These are typically articulated at the velar/uvular place of
articulation. The consonants typical of this stage are voiced fricatives.

[Figure: ordinate: milestones (I through V); abscissa: age in weeks (0-80); data points labelled by infant (V, K, J, M).]

FIG 6
This Figure shows the age of occurrence of the
milestones I through V for each of the four infants. The mean age of onset
of the fourth milestone, RB', is 34 weeks with a standard deviation of seven
weeks. The individual ages of onset are J 26, V 33, M 34 and K 43 weeks.
Superimposed are the findings of a Dutch study (van der Stelt et al. 1986)
of 51 infants where the mean age of onset for reduplicated babbling was
found to be 31 weeks with a standard deviation of six and a half weeks.
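The mean and standard deviation quoted in the caption can be checked from the four individual ages of onset. The sketch below assumes that the reported standard deviation is the sample standard deviation (divisor n - 1); the variable names are illustrative.

```python
from statistics import mean, pstdev, stdev

# Individual ages of onset of RB' in weeks, as given in the caption.
onset_weeks = {"J": 26, "V": 33, "M": 34, "K": 43}
ages = list(onset_weeks.values())

mean_onset = mean(ages)        # arithmetic mean
sd_sample = stdev(ages)        # sample standard deviation, divisor n - 1
sd_population = pstdev(ages)   # population standard deviation, divisor n

print(mean_onset, round(sd_sample, 2), round(sd_population, 2))
```

The sample formula gives a value that rounds to seven weeks, in agreement with the caption, whereas the population formula would round to six.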


[...] is that of incomplete closures outnumbering complete ones. This stage would
correspond to what has been called the "gooing" and "cooing" stage in the
literature (Oller 1980, Stark 1980). This milestone occurs as an
expansion of velar/uvular articulations, resulting in this place of
articulation becoming the second most frequent between fifteen and nineteen
weeks of age.

The vocalic stage, which follows the velar/uvular stage, is a period in
which the infants produce a large number of non-consonant utterances. These
utterances are best described as relatively long vocalizations with
non-speech-like intonation patterns (Roug et al. forthcoming), resembling
singing patterns rather than speech. Bilabial trills, prolonged and
unreleased bilabial stops with lots of saliva, inspiratory uvular trills
(snorts) etc. are also introduced during this period. It might be said that
during this stage the infant explores both the periodic and non-periodic
sound sources of the vocal tract. The vocalizations are varied both in the
intensity and frequency domain as well as in mode of phonation, i.e. voice
quality. These modulated vowel utterances are present over the whole period
studied but there is an expansion in the number of productions, resulting in
a period of maximum occurrence of NCBs, in the months
preceding the onset of RB'. It is according to this period that the stage
is defined.

[...] the child develops
the ability to reproduce syllables in sequences. Most of these utterances
are produced with a full stop consonant contrasted with an open, central
vowel, e.g. adad. These sequences are typically rhythmic in
their alternations. Rhythmic behavior, such as rocking and kicking, has
been found to develop around six months of age in normal infants (Thelen
1981). Reduplicated babbling can be considered one of these rhythmic
behaviors.

122

Several infants have, according to the reports of some of the parents, been observed moving the jaw rhythmically up and down without phonating, a few days before the onset of RB'. The amount of utterances with nasal, semi-vowel and fricative reduplications constitute a minor portion of the total number of reduplicated utterances. The emergence of this milestone is defined according to the abrupt onset of reduplicated full stop consonant babbling (RB'). The age of onset of this stage varies in our infants from week 26 to week 43.

The onset of the last stage, the variegated consonant babbling stage, is defined according to the first sampling point after the onset of RB', at which there is a major increase in the number of productions with alternating consonants. The variegated utterances are found throughout the study but increase dramatically towards the end of the first year of life. This type of babbling can be considered an elaborated form of reduplicated babbling both with regard to segmental and suprasegmental features.

We find that the intonational patterns and the segmental

however not the case in variegated consonant babbling. Here the child produces a variety of consonants overlaid on what seems to be a typical sentence-like intonation pattern. The child also varies between syllables that are perceived by the listener as being stressed and unstressed. The extent to which the child explores this type of babbling seems to be highly individual. As mentioned earlier, two of our infants, V and J, have a high percentage of variegated babbling at the last recording session whereas the other two, K and M, do not seem to favor this type of babbling (see Figure 5). At this point of development it is important to remember that what we here regard as babbles might in fact be early words. The increase of NRB in K at 14 months might be an expansion of her early lexicon. Infant M shows a similar expansion of NRB at 15 months. It might be that K and M have what is known as a more analytic approach to language (Vihman 1986), thus preferring shorter, single word-oriented utterances rather than the holistic

(Oller 1980, Stark 1980, Koopmans van Beinum et al. 1986)

similarities are found.

milestones for the studies mentioned. The overall picture is one of general agreement in discerning five milestones in the babbling of normal infants during the first year of life. However, some disagreement as to age of onset and duration of the various stages exists, probably owing to individual variation of the infants in the groups studied and varying methodological approaches.

Oller (1980) does not mention glottal consonants as a characteristic feature of the first babbling stage; instead he stresses the nasal quality of the vocalizations. However, he does mention the existence of "throaty sounds" (p.95) when referring to other authors' findings. It might be that these sounds are in fact glottal. Stark (1980) talks of "reflexive vocalizations" as the first stage and states that the consonants of the newborn period "are almost always glottal stops, nasals or liquids" (p.77). The Dutch data (Koopmans van Beinum et al. 1986) suggest "glottal stops in series" as a main characteristic of this stage. We find mainly glottal consonants but also syllabic nasals in our study. If we consider the anatomical/physiological aspects of the very young infant's vocal tract we find that there are considerable differences compared with that of the adult. Kent and Murray (1982) list several such important differences and stress the importance of considering these facts when "explaining patterns of change in infant vocalizations". The very young infant is "an obligate nasal breather and an obligate nasal vocalizer" according to the authors. This is explained by the fact that there is "engagement of laryngeal and velopharyngeal structures" leading to mainly nasal exit of air. The separation of these structures develops at four to six months of age around

FIG 7 This Figure shows a compilation of the findings of four studies of infants' early vocal development: Oller (1980), Stark (1980), Koopmans van Beinum et al. (1986) and Roug et al. (1987). Each column refers to a study. The approximate age of onset and duration of each stage is indicated by the time scale on the ordinate (age in months, 1 to 20). The stages are separated by dashed lines to indicate the continuity between stages.


the time of onset of reduplicated babbling. With regard to glottals, we

explains the prevalence of glottal productions in the early repertoire.

In the following babbling stage, the goo- or coo-stage, the consonant sounds "have predominantly a velar place of articulation" according to Stark (1980:95). Similarly Oller (1980:96) states that there is a "velar consonant preference". In the Dutch study (Koopmans van Beinum et al. 1986) the main characteristic of this stage is considered to be the "onset of articulatory movements". According to our data these movements are produced by raising or retracting the back of the tongue to form mainly incomplete constrictions against the velum or uvula. It has been proposed that the high occurrence of dorsal sounds was due to specific body posture (c.f. Locke 1983). The hypothesis states that in the supine position the effects of gravity upon the soft structures of the vocal tract would result in a predominance of back articulations. Oller and Gavin analyzed ten infants in the ages of one to four months, when gooing is supposed to be predominant. The infants were recorded in both supine and upright position. The results revealed no support for the body posture hypothesis. The infants did not produce more goo-sounds in the supine position; in fact there was a slight preference for goo-sounds in the upright position. Another possible explanation for the occurrence of babbling stages is to regard them as results of neurological maturation. Accordingly, the velar articulations found in this stage could be accounted for by neurological maturation factors such as the earlier myelinization of the cranial nerves responsible for the muscular control of the posterior part of the tongue (Lecours 1975) and/or the earlier myelinization of the corresponding area in the primary motor cortex (Whitaker 1973, Salus and Salus 1974). However, recent data on early neurological development (Rakic, Bourgeois, Eckenhoff, Zecevic and Goldman-Rakic 1986) suggest that this might not be a correct interpretation. The neurological maturational processes do not appear to develop in a hierarchical order but rather as a global process. Recent data also suggest that myelinization might not be necessarily relevant to the function of the nerve (Foster, Connors and Waxman 1982).

vowel-like articulations is found. Koopmans van Beinum et al. (1986)

the phonatory domain concerning intonation, duration and intensity". Stark (1980)

such as pitch level, pitch change and loudness are manipulated. Apart from marginal babbling and vowel-like elements ("fully resonant nuclei") Oller (1980) talks of raspberry vocalizations, squealing, growling and yelling as typical vocalization categories of this stage. Oller (1986:29) describes these vocalizations as representing an "exploration of a vocal potential" and of an

intensity, frequency and phonatory domain. The ability to control these aspects of vocal production could be regarded as vital to later speech development. After having explored the vocal capacities the child can begin to modify and refine them according to the requirements of the ambient language.

A major phonetic milestone during the first year of life is the production of reduplicated babbling. The onset of this milestone has been found to be fairly sudden, occurring around seven months of age. According to Oller (1980:99), this is the first stage in which the child produces syllables "that conform to mature natural language restrictions", i.e. syllables that could be accepted from a phonological point of view. This is an important achievement since it means that the infant now, as a fortuitous consequence of a natural movement, can produce vocalizations acceptable as words in an adult language. Consequently adults now begin to report "words" from their infants, or that they have begun to "talk". Interestingly, in what is known as "Baby Talk" a simplified lexicon is used by adults (or older siblings) when addressing infants, where the principle of reduplication is applied, giving rise to forms such as "vovve", "pippi" and "totto" in lieu of Swedish "hund" (dog), "fågel" (bird) and "häst" (horse). The fact that the words used in many languages to denote the two most important people (mommy and daddy) for the child have the same repetitive structure strikes one as being more than a coincidence. Instead it seems likely that the adult language has chosen phonetic forms similar to those of the child's own production patterns, thereby creating a link of denotative function between the infant's vocalizations and the adult language (c.f. Locke 1983). McCune and Vihman (1987) have suggested that the infant when producing his/her first words might be selecting adult words on the same basis, i.e. the child only attempts to produce those words that fit the production patterns of that child's babbling repertoire. As mentioned earlier, reduplicated babbling is a rather monotonous type of babbling, both with regard to intonation and to the constituent consonant and vowel segments. Therefore the major characteristic of this stage could be said to be the reduplication itself.

In the following milestone, the variegated (or non-reduplicated) babbling stage, the child seems to combine certain features of the intonational variations of the previous vowel-stage with the reduplicated utterances, resulting in sentence-like strings of babble with alternating consonants, vowels and patterns of stress. Oller (1980) refers to this last type of babbling with contrasts of syllabic stress as "gibberish"; others have used the term "jargon" (Menn 1978). Stark (1980) differs from Oller by including non-reduplicated utterances (e.g. V, VC, CVC) in this stage. We find, along the lines of Stark, that there is an expansion in the number of NRBs following the onset of reduplicated babbling.

From the above observations the tentative conclusion can be drawn that infants follow a universal developmental phonetic course in their babbling life. Babbling develops from what might be considered a proto-syllabic form into more speech-like phonetic events acceptable as parts of mature natural languages (c.f. Oller 1986). If the phonetic repertoire of the babbler is compared with the phonetic patterns most commonly found in the languages of the world, i.e. language universals, clear similarities are found. The preference for open (CV) syllables over closed (VC), single consonants over clusters, voiced initial stops over voiceless, unvoiced final obstruents over voiced, initial stops being apical rather than dorsal and final obstruents being preferably velar or glottal are examples of such similarities. These parallels are interesting since they imply continuity in babbling and speech.

If one accepts the continuous view, the questions arise of how and when babbling begins to show influence from the ambient language. There are those who believe in an early detectable influence in the infant's babbling (de Boysson-Bardies 1982, de Boysson-Bardies, Sagart and Durand 1984, de Boysson-Bardies, Sagart, Halle and Durand 1986) and those who believe that language dependencies begin to affect the child's productions at a relatively late stage (Locke 1983). The similarities between babbling and language universals have been taken to support the view that early vocal development is due to innate biological prerequisites for language (Locke 1983). We consider it important to distinguish between the segmental and the suprasegmental levels of vocal behavior when discussing phonetic development. In our view the supra-segmental features specific to a given

language might be acquired earlier than the segmental. However, as Vihman et al (1986) point out, we do not yet know what features in the babbling of young infants are due to exposure to a specific language versus language universals.

Considering the present data, two major questions arise,

We believe that a tentative answer to the first question lies in the understanding of man as a communicative being. Communicative behaviors are found in all flock animals and constitute a necessity in order for the members of the flock to function as a whole (Wilson 1980). In species where the young offspring demands a great deal of attention and care from the mother it is of vital importance that the parent-offspring relation be strong and fundamental. The newborn infant is incapable of caring for itself. It is in other words totally dependent on the caregiver for survival. From this point of view it is not difficult to see why the infant develops communicative behavior in response to the caregiving treatment. The function of this behavior is to tie the two individuals emotionally to each other. We know that infants synchronize their body movements to that of adult speech (Condon 1974) and that they are born with the capacity to imitate facial expressions and identify vocal sounds (Kuhl and Meltzoff 1984b). Further it has been observed that infants by the end of the first month of life begin to respond to speech with vocalizations (Trevarthen ). This, we believe, implies a biological component responsible for the foundations on which the communicative behavior is based.

It is known that infants also vocalize outside of communicative situations. They babble when playing alone. Therefore babbling cannot be seen as having an exclusively communicative function, nor can speech (c.f. Piaget 1973, Vygotsky 1971). There are individual needs in the infant such as self stimulation and play to be considered. Just as play is an important part in the child's learning to control its constantly growing body, babbling can be considered as exploratory play aimed at controlling the rapidly changing apparatus (c.f. Fry 196 ). It has been suggested that experience is not essential to the normal development of vocal behavior, as indicated by reports from infants who for medical reasons have been prevented from babbling (Lenneberg 1967). However, the tracheotomized infant referred to by Lenneberg, who had a tube inserted at eight months of age and had it removed at fourteen months, had most probably begun to produce reduplicated babbling at the time of insertion, in which case the effects of the tube on phonetic development might have been minor. This question as well as the question of the importance of auditory experience in babbling is still under debate. The reports stating that deaf infants babble as hearing infants do (Lenneberg 1967) are contradicted by recent data (Oller 1986, Stoel-Gammon and Otomo 1986) suggesting that deaf infants do not produce reduplicated full stop babbles during the first year of life.

The answer to why the repertoire of the babbler seems to be a universal one might be found in anatomical and aerodynamic constraints on the vocal mechanisms with considerations of the physiological and neurological stage of development. If one thinks of the articulators in terms of a spring and mass system where a certain amount of impedance is present in each movement, one can explain some articulations as being more economical, i.e. more easily produced, than others. By economic is meant those movements that require the least amount of energy in order to be performed in relation to the structures involved, a sort of articulatory cost-benefit analysis (c.f. Lindblom, MacNeilage and Studdert-Kennedy, in press). The selectivity of the babbler in relation to phonetic preferences might be understood in similar terms. The child produces those articulations that are the most economical and that acoustically are the most salient. This implies that normal babbling presupposes intact auditory function.

If one considers the reduplicated babbling stage in similar terms, the opening and closing movements of the jaw resulting in syllable-like events might be viewed as an oscillating system, the constraints of which are set by neurological maturation. The consequences of this reduplicated behavior might be that the infant so to speak discovers the syllable and indirectly the supraglottal full stop articulation. In order to voluntarily control and coordinate a movement it might be necessary first to produce this movement repetitively (c.f. Thelen 1981). In the case of babbling, factors such as the length of the breath cycle would determine the duration of the early repetitive vocalizations. Later, as the ability develops to control movements voluntarily, the child can free itself of such bonds and more freely determine the number of reduplications.

To conclude, the evidence amassed in the literature as well as that presented here strongly implies that babbling can be considered a precursor of speech, in form as well as in function.

The authors are greatly indebted to professor Björn Lindblom for his assistance and helpful comments on the manuscript.

We would also like to thank Marilyn Vihman for being a great source of inspiration and for commenting on the manuscript.

We thank the parents and the infants who participated in this project for their patience and enthusiasm without which this project never would have been possible.

We are indebted to Karin Holmgren for her initial work in collecting and analysing data and we thank Boel Harlid for her assistance in transcribing the material.

docent Birgitta Jailing, doctor Göran Aurelius and doctor Ann-Sofie Ericsson at Sankt Göran's Children's Hospital for interesting and fruitful collaboration.

The authors would like to express their thanks to Harved Hellichius for drawing the Figures and Tables of this article.

FOOTNOTES 

1 Sponsored by Första Majblommans Riksförbund (First of May Flower Annual Campaign for Children's Health).

2 The children were medically examined at birth and a psychomotor development test (The Griffith Mental Development Scale) was performed at 5, 10 and 19 months of age. The children's results were found to be well above those of a standard group on all occasions (Norberg 1994).

3 The equipment used was a Sony table microphone and a Uher tape-recorder.

4 The signal was computer digitalized and edited with an ILS program called MIX, developed at the Royal Institute of Technology in Stockholm. The actual cutting point in the signal would be an intensity zero point as close as possible to where the child's utterance began. We were careful not to exclude initial or final voiceless segments and we thought it important that the utterance was not distorted by the clipping so as to sound unnatural.
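The cutting rule in footnote 4 can be illustrated with a short sketch. It is not the MIX program, whose interface is not described here. It assumes that an "intensity zero point" can be approximated by a zero crossing of the sampled waveform, and the function name `nearest_zero_crossing` is hypothetical.

```python
def nearest_zero_crossing(samples, start):
    """Return the index closest to `start` where the signal crosses zero.

    Index i counts as a crossing when samples[i] is zero, or when
    samples[i] and samples[i + 1] have opposite signs.
    Returns `start` unchanged if the signal never crosses zero.
    """
    def is_crossing(i):
        if samples[i] == 0:
            return True
        return i + 1 < len(samples) and samples[i] * samples[i + 1] < 0

    # Search outwards from the requested index, one sample at a time,
    # checking the earlier candidate before the later one.
    for offset in range(len(samples)):
        for i in (start - offset, start + offset):
            if 0 <= i < len(samples) and is_crossing(i):
                return i
    return start


# A toy waveform; the utterance is judged by ear to begin near index 4.
signal = [3, 2, 1, -1, -2, -1, 1, 2]
cut = nearest_zero_crossing(signal, 4)
```

Cutting at such a point avoids an abrupt jump in amplitude at the edit, which is one plausible reason for the rule given in the footnote.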

REFERENCES

Bickley, C. (1983). Acoustic Evidence for Phonological Development of Vowels in Young Children. Speech Communication Group Working Papers, Research Laboratory of Electronics, M.I.T. 111-124.

Bickley, C., Lindblom, B. and Roug, L. (1986). Acoustic Measures of Rhythm in Infant's Babbling. Paper presented at the Proceedings of the 12th International Congress on Acoustics, Toronto.

de Boysson-Bardies, B. (1982). Do Babies Babble as Speakers Speak? Paper presented at the International Conference on Infant Studies, Austin, Texas.

de Boysson-Bardies, B., Sagart, L. and Durand, C. (1984). Discernible Differences in the Babbling of Infants According to Target-Language. Journal of Child Language. 11. 1-15.

de Boysson-Bardies, B., Sagart, L., Halle, P. and Durand, C. (1986). Acoustic Investigations of Cross-linguistic Variability in Babbling. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.

Buhr, R.D. (1980). The Emergence of Vowels in an Infant. Journal of Speech and Hearing Research. 23. 73-94.

Bullowa, M. (1979). Before Speech. Cambridge University Press.

Bush, C.N., Edwards, M.L., Luck

Kent, R.D. (1981). Articulatory-Acoustic Perspectives on Speech Development. In Stark (ed.) Language Behavior in Infancy and Early Childhood. New York: Elsevier North Holland.

Kent, R.D. and Bauer, H.R. (1985). Vocalizations of one-year-olds. Journal of Child Language. 12. 491-526.

Kent, R.D. and Murray, A.D. (1982). Acoustic Features of Infant Vocalic Utterances at 3, 6 and 9 months. Journal of the Acoustical Society of America. 72:2. 353-365.

Kuhl, P.K. and Meltzoff, A.N. (1984b). Infant's Recognition of Cross-modal Correspondence for Speech: Is it based on Physics or Phonetics? Journal of the Acoustical Society of America. 76. Suppl. 1. S80.

Koopmans van Beinum, F.J. and van der Stelt, J.M. (1986). Early Stages in the Development of Speech Movements. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.

Lecours, A.R. (1975). Myelogenetic Correlates of the Development of Speech and Language. In Lenneberg and Lenneberg (eds.) Foundations of Language Development: A Multidisciplinary Approach. Vol. 1. New York: Academic Press.

Lenneberg, E.H. (1967). Biological Foundations of Language. New York: John Wiley & Sons.

Lenneberg, E.H. and Lenneberg, E. (1975). Foundations of Language Development: A Multidisciplinary Approach. Vol. 1. New York: Academic Press.

Lindblom, B., MacNeilage, P. and Studdert-Kennedy, M.

Yeni-Komshian, Kavanagh and Ferguson (eds.) Child Phonology. Vol. 1.

Oller, D.K. (1981). Infant Vocalizations: Exploration and Reflexivity. In Stark (ed.) Language Behavior in Infancy and Early Childhood. New York: Elsevier North Holland.

Oller, D.K. (1986). Metaphonology and Infant Vocalizations. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.

Oller, D.K., Wieman, L.A., Doyle, W.J. and Ross, C. (1976). Infant Babbling and Speech. Journal of Child Language. 3. 1-11.

Oller, D.K. and Eilers, R.E. (1982). Similarities of babbling in Spanish- and English-learning babies. Journal of Child Language. 9. 565-577.

Piaget, J. (1971). The Language and Thought of the Child. New York: The World Publishing Company. Translation by M. Gabain from French original "Le Langage et la Pensée chez l'Enfant" (1968).

Pike, K.L. (1943). Phonetics. Ann Arbor: The University of Michigan Press.

Rakic, P., Bourgeois, J.P., Eckenhoff, M.F., Zecevic, N. and Goldman-Rakic, P.S. (1986). Concurrent Overproduction of Synapses in Diverse Regions of the Primate Cerebral Cortex. Science. 232. 232-234.

Roug, L., Landberg, I. and Lundberg, L.J. (forthcoming). Acoustic Analyses of Four Swedish Infants' Early Vocalizations.

Salus, P.H. and Salus, M.W. (1974). Developmental Neurophysiology and Phonological Acquisition Order. Language. 50. 151-160.

Stark, R.E. (1980). Stages of Speech Development in the First Year of Life. In Yeni-Komshian, Kavanagh and Ferguson (eds.) Child Phonology. Vol. 1: Production. New York: Academic Press.

Stark, R.E. (ed.) (1981). Language Behavior in Infancy and Early Childhood. New York: Elsevier North Holland.

van der Stelt, J.M. and Koopmans van Beinum, F.J. (1986). The Onset of Babbling Related to Gross Motor Development. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.

Stockman, J., Woods, D. and Tishman, A. (1981). Listener Agreement on Phonetic Segments in Early Infant Vocalizations. Journal of Psycholinguistic Research. 10. 593-617.

Stoel-Gammon, C.M. and Otomo (1986). Journal of Speech and Hearing Research.

Thelen, E. (1981). Rhythmical Behavior in Infancy: An Ethological Perspective. Developmental Psychology. 17. 237-257.

The Principles of the International Phonetic Association. Obtainable from the International Phonetic Association, University College, Gower Street, London.

Vihman, M.M. (1986). Individual Differences in Babbling and Early Speech: Predicting to Age Three. In Lindblom and Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.

Vihman, M.M., Macken, M.A., Miller, R., Simmons, H. and Miller, J. (1985). From Babbling to Speech: A Reassessment of the Continuity Issue. Language. 61:2. 397-445.

Vihman, M.M., Ferguson, C.A. and Elbert, M. (1986). Phonological Development from Babbling to Speech: Common Tendencies and Individual Differences. Applied Psycholinguistics. 7. 3-40.

Vygotsky, L. (1962). Thought and Language. New York: M.I.T. Press. English translation by E. Hanfmann and G. Vakar of Russian original (1934).

Wilson, E.O. (1980). Harvard University Press.

Whitaker, H.A. (1973). Comments on the Innateness of Language. In Shuy (ed.) Some New Directions in Linguistics. Washington D.C.: Georgetown University Press.

Yeni-Komshian, G.H., Kavanagh, J.F. and Ferguson, C.A. (eds.) (1980). Child Phonology. Vol. 1: Production. New York: Academic Press.

A SIMPLE COMPUTERIZED RESPONSE COLLECTION SYSTEM 

Johan Stark and Mats Dufberg 

1. Introduction 

The object of the system described here is to enable automatic response 

collection directly from the respondent(s) to a computer-readable

medium. This has several important implications including rendering

unnecessary the manual transfer of the data from answer forms to a 

computer for subsequent analysis. Data will instead be directly 

available to the computer. The computer may in turn perform on line 

processing of this data and hence control the data collection procedure. 

The computer configuration consists of one main computer and a number of 

terminals. 

In section 2 an application will be described. The application 

shows that the computer system is a useful tool and that it can be 

handled by personnel who are inexperienced in computer programming, as

was the case in the project described below. In section 3 possible 

future applications will be discussed. In section 4 the hardware and software

and the necessary programming will be presented. 

2. An application 

In a project described in McAllister et al. (1987) we wanted subjects to 

give judgements on recorded speech material. We decided to use a 

computerized method for collecting the responses from the subjects. We 

will first very briefly present the project and then describe how we 

used the computer system. 

For the project, we recorded a number of students before and after 

a certain training period. The same material, consisting of sentences 

and words, was read at both recordings. The recordings were digitalized 

and recorded to disks. From this material we produced test tapes. On the 

test tape the material was presented in pairs. A pair consisted of the 

same sentence/word read by the same student at the two times of 

recording, before and after the training period. The order within the 

pair was random. Then we recruited a panel of experts to judge which of 

the two members of each pair was the better (McAllister et al. 1987).

Each subject in the listening test, that is, each expert-panel 

member, was sitting in front of a terminal with a keyboard and a 

monitor. We presented written information on each screen which was sent 

from the main computer. The information was, in this case, the standards 

that the subjects should be using for their judgement. The tape recorder 

was automatically started when all terminals had received the written 

information. The speech material was presented through headphones from a 

tape recorder. The tape was specially prepared with the speech material

on channel one and control tone signals on channel two. The tone signals 

were placed directly after each pair. The tone signal triggered the tape 

recorder's stop mechanism immediately after each pair was presented. The 

tone signals were also sent to each terminal. The keyboard was locked 

for key presses until the terminal had "heard" the tone signal. After that,

the terminal reacted only to certain keys, namely those keys that gave

the three accepted responses, the return key, and the back space key. 

That is, the subjects pressed one key for their judgement and then the 

return key. They could change their minds by pressing the back space key 

before the return key. All other keys appeared to the subjects to be 

"dead". The data from all terminals were then collected by the main 

computer and everything was repeated again until the end of the test 

tape. After each session the data were stored on files, one file for 

each subject and tape. 
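
The key handling described above can be sketched as follows. This is a minimal illustration only, not the original Basic terminal program: the key labels "1", "2" and "3" and the function name `collect_response` are hypothetical, and it is an assumption that a second judgement key is ignored until the first has been erased with the back space key.

```python
from typing import Iterable, Optional

# Hypothetical labels for the three accepted response keys.
RESPONSE_KEYS = {"1", "2", "3"}
RETURN, BACKSPACE = "\r", "\b"


def collect_response(keys: Iterable[str], tone_heard: bool) -> Optional[str]:
    """Return the confirmed judgement, or None if none was confirmed."""
    if not tone_heard:
        return None  # keyboard stays locked until the tone signal arrives
    pending = None
    for key in keys:
        if key in RESPONSE_KEYS and pending is None:
            pending = key          # first judgement key is remembered
        elif key == BACKSPACE:
            pending = None         # subject changed his or her mind
        elif key == RETURN and pending is not None:
            return pending         # return key confirms the judgement
        # every other key press is ignored and appears "dead"
    return None


confirmed = collect_response(["x", "2", "\b", "3", "\r"], tone_heard=True)
print(confirmed)
```

In this sketch the stray "x" is ignored, the "2" is erased by the back space key, and the "3" is the judgement that is finally confirmed.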

We had decided to use SAS, a statistical program, for statistical 

analysis, so our data files had to be compatible with the SAS program. The

data files that resulted from the test sessions were pure text files,

that is, they contained only normal characters. But the data files

contained only the pure data, that is, they contained no information on 

which student or which sentence/word the data was connected to. That 

information was stored in a special key file. The key file also 

contained information on which order within the pair the sentences/words 

were presented. With the help of this key file and a small computer 

program we transformed all data files for each test tape into an

SAS-readable matrix.
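
The transformation step can be illustrated with a short sketch. The file layouts are assumptions made for the example only: the key file is taken to hold one line per pair (student, item, order within the pair) and each data file one response per line. The function name `build_matrix` and the column names are hypothetical; the original program is not reproduced here.

```python
import csv
import io


def build_matrix(key_text: str, responses_by_judge: dict) -> list:
    """Join each judge's bare responses with the key file, row by row."""
    key_rows = list(csv.DictReader(io.StringIO(key_text)))
    matrix = []
    for judge, data_text in responses_by_judge.items():
        responses = data_text.split()
        if len(responses) != len(key_rows):
            raise ValueError(f"judge {judge}: expected {len(key_rows)} responses")
        for key, response in zip(key_rows, responses):
            matrix.append(
                [judge, key["student"], key["item"], key["order"], response]
            )
    return matrix


# Invented example content, not data from the project.
key_text = "student,item,order\ns01,sent1,pre-post\ns01,word1,post-pre\n"
rows = build_matrix(key_text, {"j1": "1\n3\n", "j2": "2\n2\n"})
for row in rows:
    print(" ".join(row))  # one space-separated observation per line
```

Each output line then carries everything a statistics package needs to identify the observation: judge, student, item, presentation order and response.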

Some pros of this computer based system are: 

- The data is directly stored on computer readable media. 

- The response interval can be controlled on line, for example by the 

responders. 

- The responders will never get lost. What they respond to will always 

correspond to what they just heard.

What we had to prepare for this application (except for computer 

programs) were the test tapes with tones, the key files, and the

text files that contained the information that was written on the 

screens of the terminals. We would have had to make test tapes in an 

ordinary pencil-and-paper application too, and the key files and text 

files were easily made from the command files that produced the test 

tapes. 

3. Future applications 

The computer system could easily be used for a number of applications. 

We hope that there will be a library of standard applications so that no 

or marginal programming will be necessary for the user in the future. It 

is reasonable to expect the following applications to be standard:

- Listening experiments with or without written information, with or 

without limited number of response alternatives. 

- Experiments measuring response time of audible and/or written stimuli. 

- Demonstration experiments for seminars. 

Computerized correction, on line or after the session, could of 

course be included in any application. 

4. Hardware and software

The first prototype system was set up by connecting a number of cheap 

personal computers (Micro-Bee 32) via a simple wire interface. This was 

done by using the original input/output system already present on them

for printer control etc. The printer interface is used as a parallel

8-bit bus, and 4 bits from the serial interface are used as a control bus.

Altogether these 12 bits are connected to an ordinary flat cable using 

standard 25 pin D-sub connectors. Up to 16 machines may be connected in 

this way. An ordinary CP/M system (SI-80) is used at the end of the line

as a server of this network. The interface here is equally simple: only

12 bits of digital I/O are used.

Data may be transferred between the server and any terminal on

the line. Each terminal has a unique address which enables it to 

communicate independently of the others. All data flow is controlled by 

the server system. The data transfer speed is about 20 kBytes/sec which 

gives almost no delays seen by the terminal user. The updating of one 

full terminal screen will virtually take place in no time at all. 
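
A rough calculation shows why a screen update is hardly noticeable at this speed. The transfer rate is the one quoted above; the screen size of 64 by 16 characters is an assumption made for the example, since the text does not state it.

```python
TRANSFER_RATE = 20_000              # bytes per second ("about 20 kBytes/sec")
SCREEN_COLS, SCREEN_ROWS = 64, 16   # assumed screen size, not given in the text
MAX_TERMINALS = 16                  # upper limit stated for the network

screen_bytes = SCREEN_COLS * SCREEN_ROWS           # one byte per character
seconds_per_screen = screen_bytes / TRANSFER_RATE
all_terminals = MAX_TERMINALS * seconds_per_screen

print(f"{seconds_per_screen * 1000:.0f} ms per screen")
print(f"{all_terminals:.2f} s to refresh all {MAX_TERMINALS} terminals in turn")
```

Under these assumptions one full screen takes about 51 ms, and even refreshing all 16 terminals one after the other stays below one second.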

Each terminal has a portion of so-called bootstrap software stored

in resident non-volatile memory. In fact, the character generator

EPROM has some unused locations that are used to store this software.

This piece of software will initialize network processing by a simple

startup command from the keyboard. The server system has similar software

loadable from a diskette.

On top of this, each terminal can load a portion of software 

written in Basic that enables the user to write Basic programs that 

communicate with the server system. These network services are easily 

programmable and may be extended to whatever commands that are wanted. 

On the server side the command processing is written in Turbo Pascal. 

The prototype system has so far the following commands available: 

1. Send a block of data to server. 

2. Receive a block of data from server. 

3. Load a program from server. 

4. Save a program in server. 

5. Load and start a program. 

6. Start a Revox tape recorder. (An additional interface required. ) 

7. Stop a Revox tape recorder on an audio signal. 
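
The command set listed above can be pictured as a small enumeration together with an addressed request. This is an illustrative sketch only: the frame layout (address byte, command byte, length byte, payload), the address range 0 to 15 and all names are assumptions, and the actual protocol of the prototype is not documented here.

```python
from enum import IntEnum


class Command(IntEnum):
    """The seven commands of the prototype, numbered as in the list."""
    SEND_BLOCK = 1
    RECEIVE_BLOCK = 2
    LOAD_PROGRAM = 3
    SAVE_PROGRAM = 4
    LOAD_AND_START = 5
    START_TAPE = 6
    STOP_TAPE_ON_TONE = 7


def make_request(terminal_address: int, command: Command, payload: bytes = b"") -> bytes:
    """Build a hypothetical frame: address, command, payload length, payload."""
    if not 0 <= terminal_address < 16:   # up to 16 machines on the line
        raise ValueError("terminal address must be in the range 0-15")
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([terminal_address, command, len(payload)]) + payload


def parse_request(frame: bytes):
    """Split a frame back into address, command and payload."""
    address, code, length = frame[0], frame[1], frame[2]
    return address, Command(code), frame[3:3 + length]


frame = make_request(3, Command.SEND_BLOCK, b"2\r")
print(parse_request(frame))
```

Keeping the commands in one enumeration means that new network services can be added by extending the list, in the spirit of the extensibility described above.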

For a particular experiment the user will have to make an 

application program using the more general software described above as a 

library. The application software will consist of a server part written 

in Turbo Pascal and a terminal part written in Basic. First the terminal 

program is implemented on a stand alone terminal. The Micro-Bee 32 

computers are stand alone computers with a built in Basic interpreter, 

computer screen and keyboard. Network services may be simulated in Basic 

using DATA statements as input and PRINT statements to check output. 

Similarly the server system program may be tested by simulating the 

terminals. When both programs seem to work satisfactorily a real version 

is set up and tested. If properly done the program will then handle 

several terminals simultaneously. This enables anyone to set up a simple 

or more sophisticated data collection procedure for his experiment in a 

fairly short time. The application software may also partly be used to 

update the general part thus after some time of use providing a whole 

database of readymade software for various cases of data collection 

experiments. 

Since every terminal also is an independent computer with its own

CPU, the processing power of the system will be large. One effect of 

this is the ability to let each terminal individually measure the time 

for a response with a very high resolution. 

For users less experienced in programming in Basic and Turbo Pascal 

some general data collection program suited for a number of common 

situations could be written by someone more experienced. Then, a simple 

set-up file editable from an ordinary word processor could easily be 

used to determine some application variables in the data collection 

procedure, such as how many terminals are connected, how many responses 

to collect and the names of the files to be used for text input and data 

collection storage. 
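
Such a set-up file could look like the example below. The file format, the section name and the parameter names are hypothetical; they only illustrate the kind of variables mentioned above (number of terminals, number of responses, file names).

```python
import configparser

# Hypothetical set-up file, editable in any word processor or text editor.
SETUP_TEXT = """
[session]
terminals = 8
responses = 40
text_file = screens.txt
data_file = panel1.dat
"""


def read_setup(text: str) -> dict:
    """Read the application variables for one data collection session."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["session"]
    setup = {
        "terminals": section.getint("terminals"),
        "responses": section.getint("responses"),
        "text_file": section["text_file"],
        "data_file": section["data_file"],
    }
    if not 1 <= setup["terminals"] <= 16:   # the network takes up to 16 machines
        raise ValueError("between 1 and 16 terminals can be connected")
    return setup


setup = read_setup(SETUP_TEXT)
print(setup)
```

With the variables kept in a separate file, the same general program can be reused for a new experiment without any programming.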

A second generation of the system is under development using an 

IBM-PC-AT as the server computer. This removes the necessity of a 

special data transfer from the present CP/M machine to the more common

MS-DOS format diskettes. 

REFERENCES 

McAllister, Robert, Dufberg, Mats, and Wallius, Maria (1987):

"Experiments with technical aids in pronunciation teaching".

Published in Perilus report no 5 (this volume). Stockholm:

University of Stockholm, Institute of Linguistics.

ABOUT THE AUTHORS 

Johan Stark is an engineer and has constructed the hardware and written 

the basic software for the computer system. 

Mats Dufberg is a graduate student in phonetics and has written the 

software and run the system for the application described. 

EXPERIMENTS WITH TECHNICAL AIDS IN PRONUNCIATION TEACHING 

Robert McAllister, 

Mats Dufberg and Maria Wallius 

1.0 Introduction 

This is a summary of experimental research whose aim was to

test the utility of technical aids in pronunciation

teaching. There have been several attempts in recent years

to apply developments in speech technology to various

language teaching/learning situations. In particular, there

has been interest in a methodological approach which includes

the concept of "feedback" as a learning aid (de Bot, 1980).

This concept has been put to wide practical use with the 

advent of the so called "language laboratory" and its use in 

the field of second and foreign language learning. The 

relatively modest success of this movement has led to 

efforts to complement the audio active-comparative method 

most often used in the language laboratory. Some of these 

efforts have been based on the idea that feedback of the 

speech signal or some of its components via alternative sensory

channels may be a viable aid, especially in learning

to produce suprasegmental aspects of the phonology of a 

foreign language. 

The teaching and learning of features such 

as rhythm and intonation has always seemed to present 

special problems and has proved to be particularly difficult 

(Crystal, 1975). Unfortunately, this difficulty has often

led to the neglect of this important aspect of the target 

language phonology. Teachers have often been at a loss as 

to how to teach this part of the sound system. 

One of the traditions in this field is the use of a visual

representation of the prosodic elements as a learning aid

(Kelz et al., 1977). Many different symbols and systematic

transcription systems have been used but their common goal

has been to augment the written text with an explicit 

notation of the prosody. This tradition provides the 

background for the research on technical aids in the 

teaching of prosody that has been done in the last three 

decades. The idea of using visual or tactile channels for the

feedback of speech signal information has been used for

many years in the teaching of handicapped learners or 

learners who are for other reasons not able to make 

effective use of the auditory feedback channel which appears 

to be indispensable in the production of normal speech 

(Potter et al., 1948; Abberton and Fourcin, 1975; Martony,

1976; Spens, 1984). This work drew the attention of 

phoneticians and linguists who were interested in the acoustic

and perceptual nature of prosodic elements and the

acquisition of these features by language learners. The 

basic idea here was that isolation and visual feedback of 

acoustic parameters critical to the rhythm and intonation 

could serve to concentrate the learner's attention on these

important and difficult aspects of the target language and 

thereby facilitate the learning of them. 

Pioneering work in this direction was done as early as 1966 by Harlan Lane

(Lane and Buiten, 1966). Since then, there has been

considerable interest in the use of technical aids in 

pronunciation teaching. The subject has been discussed and

several studies have been done, a large part of them being

of an informal nature (a few recent examples: Vardanian

(1964), Bannert (1979), Albertson (1982), Baker (1982); for

a critical survey see Leon and Martin, 1970). There have,

however, been relatively few controlled studies of this 

methodology. 

Notable exceptions to this statement in recent

years are James (1976), Hengstenberg (1980), and de Bot

(1983). These researchers found a positive effect of the

use of technical aids in the teaching of intonation.

Generally speaking, the learners who used the aids were more

successful in learning prosodic features such as intonation

than those who practiced according to traditional language

laboratory methods.

This report is a summary of research in which the 

methodology discussed above for the teaching of prosody has

been used in a slightly different way. Our aim was to test 

the utility of technical aids and the feedback methodology 

as an integrated part of a foreign language course. Our 

basic question was similar to other studies already 

mentioned: Do technical aids help in the learning of 

prosody? Or, formulated as a 0-hypothesis: learners who

use the technical aids will not achieve a more native-like

production of the prosodic features of the target language

than the learners who do not use the technical aids.

Aspects of this research that were somewhat different from the

studies mentioned above include the integration of the

training program in the course curriculum. Whereas earlier 

studies often compared performance before and after one or 

several short training sessions, we have tried to simulate 

an actual course situation where the training with the 

technical aids is more spread in time and integrated in the

course as a logical part of the overall program.

Consequently we have chosen to focus our interest on the

obviously important "long term effects" of this methodology

whose short term effects have been shown to be positive.

2.0 Methods

The methods used in this research will be presented under 

the following headings: 

1. Apparatus 

2. Training sessions and control recordings 

3. Progress evaluation through listener judgements 

It should be pointed out that steps 2 and 3 were carried out 

twice in two consecutive experiments. The first training 

experiment was done as a pilot study but is included in this 

report since the results of the two experiments were quite 

similar. 

The second experiment differed slightly on several

methodological points and this will be elaborated upon in

what follows.

2.1 Apparatus 

Our aim in the development of technical aids in these 

experiments was to provide visual and auditory feedback to 

the learner which would concentrate his or her attention on

certain acoustic/auditory features important to natural

sounding rhythm and intonation. Two technical aids were

developed to this end. One provided a clearer auditory

impression of the prosody in practice utterances by means of

isolation of the suprasegmental features. The other

presented the learner with feedback of a visual representation

of isolated acoustic features relevant to prosody and

made it possible to visually compare these features in

practice efforts with a model utterance.

2.1.1 Auditory feedback: "the Hummer"

This device was developed with the idea that an isolation of 

the prosodic features in an utterance may have the effect of 

clarifying auditory goals toward which the learner was to 

strive. The instrument developed was, in electronic terms, 

a fairly simple one. The essence of this aid was a simple 

variable band pass filter which could be manipulated in 

several ways by the user. When the speech signal was fed 

through this filter it was possible to eliminate all 

segmental information so that the auditory impression was 

that of humming the original utterance thus effectively 

isolating the suprasegmental information. It was possible 

for the user to vary the center frequency of the filter so 

that the amount of segmental information present in the 

signal could be chosen at will. This instrument was located 

between the learner's tape recorder and his earphones so that

both the model utterances and his own efforts could be 

filtered and compared as a complement to the traditional 

audio active comparative method. 

A schematic representation 

of this device is shown in figure 1. 

FIGURE 1

A SCHEMATIC REPRESENTATION OF THE LEARNING DEVICE

FOR AUDITORY FEEDBACK "THE HUMMER"

(Diagram labels: band-pass filter, Butterworth, 6 poles, 36 dB/octave;

60-300 Hz; IN, OUT, DIRECT.)

In experiment 1 still another "hum" was used. The speech 

signal was filtered by means of a computer program developed 

by Peter Branderud (1979) and the practice utterances were 

prerecorded so that each practice utterance in the training 

material was followed by a filtered version of the 

utterance. 
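
The filtering principle behind the "hum" can be sketched digitally. The example below is not the circuit of the Hummer nor the program by Branderud: it uses a single second-order band-pass section (a standard biquad design) instead of the 6-pole Butterworth filter shown in figure 1, and the sampling rate, Q value and test tones are assumptions chosen for illustration.

```python
import math


def bandpass(samples, center_hz, sample_rate, q=1.0):
    """Second-order band-pass filter with unity gain at the centre frequency."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha                      # the b1 coefficient is zero
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = (b0 * x + b2 * x2 - a1 * y1 - a2 * y2) / a0
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def tone(freq_hz, sample_rate=8000, seconds=0.5):
    """A sine tone used as a stand-in for a speech component."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# A component near the fundamental passes; a higher component is attenuated.
low = rms(bandpass(tone(150), center_hz=150, sample_rate=8000))
high = rms(bandpass(tone(2000), center_hz=150, sample_rate=8000))
print(f"150 Hz tone: {low:.2f}   2000 Hz tone: {high:.2f}")
```

With the centre frequency placed in the range of the fundamental, the higher components that carry most of the segmental information are strongly weakened, which is the effect the device relied on. Raising `center_hz` lets more segmental information through, corresponding to the adjustable centre frequency described above.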

2.1.2 Visual feedback

The essential elements of this instrumentation were a

fundamental frequency extractor (Martony, 1976) and a two-track

storage oscilloscope. In effect this device

functioned in roughly the same manner as visualizers used

and described by researchers previously mentioned in section

1 (Lane and Buiten, 1966; James, 1976). The learner was

able to hear the model utterance and see its 

intonation/rhythm representation on the upper track of the 

oscilloscope. 

After storing this image he was able to try

to reproduce this representation with as many tries as were

needed, being able to store an effort for inspection and

comparison with the model utterance before going on to the

next attempt at matching the model. A schematic

representation of this instrumentation can be seen in figure 2.

2.2 Training sessions and control recordings

A graphic representation of the procedural steps in both 

training experiments is presented in figure 3. 

FIGURE 2

A SCHEMATIC REPRESENTATION OF THE INSTRUMENTATION FOR THE DEVICE WHICH

PROVIDED THE VISUAL FEEDBACK

(Diagram labels: tape recorder, microphone, amplifier, F0 extractor,

storage oscilloscope with store 1 and store 2, channel 1 and channel 2.)

FIGURE 3

A GRAPHIC REPRESENTATION OF THE PROCEDURAL

STEPS IN THE TRAINING EXPERIMENTS

(Diagram labels: subjects, students of English and Swedish; screening;

pretest; experiment groups A (auditory, filter), B (visual),

C (auditory + visual), D (auditory); control group K; posttest.)

Subjects were selected and a screening test was administered

to all the students who were to take part in the

experiments. This test was used to establish the

proficiency of the subjects in the perception and production

of the prosodic categories that were to be trained. The

subjects were divided into experimental and control groups

on the basis of the results of this test with the aim of

creating groups that were as similar as possible in terms of

the proficiency of group members prior to the beginning of 

the actual training experiment. A pre-test was then given 

to all subjects. This test consisted of a documentation of 

each individual subject's pre-training proficiency in the 

production of the prosodic categories that were covered by 

the training material. A recording of each subject's 

production of the relevant prosodic categories was made. 

The subjects then trained in their respective groups and at the

end of the training period a post test was administered

which was exactly the same as the pre-test. It should be 

stressed again here that one of the important aspects of 

these experiments was a concerted effort to integrate this 

training into the language course as a whole. 

2.2.1 Subjects 

The subjects in both experiments were recruited from two

language courses at the University of Stockholm. The

undergraduate pronunciation courses in the Department of

English were one source of subjects. These were Swedes who

had, generally speaking, fairly high proficiency in English

due to the emphasis on the learning of English in the

Swedish schools. The other source of subjects was the

courses in Swedish as a second language offered by the

Institute of English Speaking Students. These were foreign

students speaking many different native languages and were,

almost without exception, beginners in their study of

Swedish.

2.2.2 Training material

The linguistic practice material used in both training 

experiments was the same as that used in the regular 

language courses, the difference being that only the prosodic

material was used by our subjects during the training.

Students from the English department used the relevant

exercises and prerecorded tapes from "A Course Book in

English Pronunciation" (Clerici, 1984) in experiments 1 and

2. These exercises emphasized sentence intonation types in

British English as well as vowel reduction.

The students of Swedish used the exercise booklet "Uttal" by 

Marschall and Rosenquist (1983) and the corresponding 

prerecorded tapes. These exercises emphasized Swedish word 

accent in various phrase and sentence contexts, the 

long-short distinction, and the interaction of these 

categories with rhythm and intonation on the sentence level,

which has been shown to be critical to the realization of

Swedish prosody. 

2.2.3 Training

In experiment 1 the subjects were divided into 5 groups:

group A trained with the "Hummer", group B with the device

for visual feedback, group C with both the visual feedback

instrumentation and the prerecorded "hum" described above

(2.1.1), group D with the prerecorded hum only and group K was

the control group who used the same training material but in

the traditional way without the technical aids. In

experiment 2 there were only 3 groups corresponding to

groups A, B and K in experiment 1. That is to say: the

Hummer (group A), visual feedback (group B), and control

group (group K). The subjects in experiment 1 were

requested to practice 2 hours a week over a 4 week period. 

In experiment 2 this training time was increased to 2 hours

a week over an 8 week period. As has been mentioned above

in the summary of the training experiments (2.2) the

pre-test was given at the start of the training period and

the post-test at the end of this period. These tests were

identical and were composed of the same prosodic categories

included in the practice material. 

2.3 Evaluation with listener judgements

The purpose of this procedure was, of course, to evaluate

the progress of each subject in terms of the production of

the relevant prosodic features of the respective languages

and to establish whether or not those subjects who used the

technical aids showed a difference in progress when compared

to the control group.

The evaluation procedure differed slightly between

experiments 1 and 2. In the first

experiment the pre and post test tapes were recorded into

the DEC Eclipse computer at the phonetics lab. Each tape

was then edited by the MIX program, an interactive signal

editor. After editing there was one signal file for each

utterance by each subject. The MIX file now allowed us to

play the files back in any order. We made a systematic

selection of representative samples for the respective 

languages, ordered these utterances randomly, and created 

the tapes for the listening experiments. 

A panel of experts

was then recruited to listen to these tapes and grade the

utterances on a three-point scale with 1 being the least,

and 3 being the most successful pronunciation of the target

language. For the Swedish material, the panel was composed

of Swedish natives who were either language teachers at the

Institute of English Speaking Students or linguists and

phoneticians employed at the Department of Linguistics at

the University of Stockholm. The panel who judged the

English material were either native speakers of English who

had experience in teaching English prosody or Swedes who

were teachers of English intonation in the English

department at the University of Stockholm. For the

evaluation of the second experiment the pre and post test 

tapes were edited in a similar way. 

For each language group 

we selected approximately half of the available test 

material. In contrast to the first experiment, this time 

the utterances were presented to the listeners in pairs. 

The pair consisted of the "same" utterance by the same 

student from the pre test tape and the post test tape 

respectively and these pairs were randomly ordered. The

members of the pair could be presented to the listeners in

two orders: either the pre test utterance followed by the

post test utterance or the post test utterance followed by

the pre test utterance. With the help of a simple Basic

program we randomized this utterance order within pairs.

The different language groups were, of course, kept separate as

in the first experiment. The task of the expert panels, 

which were very similar in composition to those in 

experiment 1, was now to assess which of the utterances in the pair

was the better production with respect to the intended

prosodic category or to indicate that the two were equally 

good (or bad). 
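
The pairing and randomization described above can be sketched as follows. This is not the Basic program that was used: the function name `make_pairs`, the representation of an utterance as a (student, item) tuple and the optional seed are assumptions for the example.

```python
import random


def make_pairs(utterances, seed=None):
    """Pair each utterance's pre and post recordings in random order.

    Returns a randomly ordered list of (utterance, order) tuples, where
    order is ("pre", "post") or ("post", "pre").
    """
    rng = random.Random(seed)
    pairs = []
    for utterance in utterances:
        order = ["pre", "post"]
        rng.shuffle(order)           # random order within the pair
        pairs.append((utterance, tuple(order)))
    rng.shuffle(pairs)               # random order of the pairs on the tape
    return pairs


# Invented identifiers, not the study's material.
utterances = [("s01", "sent1"), ("s02", "sent1"), ("s01", "word3")]
pairs = make_pairs(utterances, seed=1)
for utterance, order in pairs:
    print(utterance, order)
```

The order chosen for each pair has to be saved (for instance in a key file), since it is needed later to decide whether a judgement favoured the pre test or the post test utterance.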

For the collection of the listener panel judgements we used

the DIRIS system (described by Dufberg elsewhere in this

volume). At a listening session, each judge/listener used a 

computer terminal including a screen and a keyboard. Each 

intended utterance, together with the intended prosodic 

category, was written on the terminal screen. Then the 

recorded utterance pair was played and the tape was 

automatically stopped until all judges had given a response. 

The tape was then automatically re-started and continued to

the next pair. The judgements were automatically stored in

data files and recoded so as to allow us to treat them as at

least interval data. These data were then organized into

matrices compatible with the statistical program SAS. The

statistical analysis was done with the SAS program on QZ's

IBM/Guts computer.

3. Results 

Since experiments 1 and 2 differed slightly in terms of 

methods the results will be presented separately. 

3.1 Experiment 1

Figure 4 shows a summary of the average grades for all 

subjects with pre test score plotted against post test 

score. It appears that there is a definite tendency toward 

improvement in the realization of prosody for students of 

both languages. A t-test showed that this improvement was

statistically significant at the 2.5% level. This figure

shows, however, no obvious difference between the experiment and

control groups. Indeed, no statistically significant

difference could be established between the improvement of

the experiment and control groups respectively.

Figure 5 shows more explicitly the difference between 

control and experiment groups in a bar graph where the 

y-axis labeled DIFF SCORE is the difference between the

average grade (1 to 3) on the pre test and average grade on

the post test (also 1 to 3). Here we can see that, on the

average, the students using the traditional language

laboratory methodology improved more than the students who

trained with the technical aids even though this difference

was not statistically significant.
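
The difference score can be written out explicitly. The sketch below assumes that the score is the post test average minus the pre test average, so that a positive value means improvement; the grades are invented for illustration and are not the data of the experiment.

```python
from statistics import mean


def diff_score(pre_grades, post_grades):
    """Average post test grade minus average pre test grade (grades 1 to 3)."""
    return mean(post_grades) - mean(pre_grades)


# Invented grades for illustration only.
grades = {
    "EXP": {"pre": [1, 2, 2, 1], "post": [2, 2, 2, 2]},
    "CONTROL": {"pre": [2, 1, 2, 2], "post": [3, 2, 2, 3]},
}

scores = {group: diff_score(g["pre"], g["post"]) for group, g in grades.items()}
for group, score in scores.items():
    print(group, score)
```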

In figure 6 the difference scores for the individual

experiment groups and the control group are shown. It can be

observed again that the control group showed better

improvement than any of the experimental groups. None of

these differences were statistically significant, however.

FIGURE 4

A SUMMARY OF THE AVERAGE GRADES FOR ALL SUBJECTS WITH PRE TEST

PLOTTED AGAINST POST TEST SCORE. THIS GRAPH SHOWS IMPROVEMENT

FOR ALL SUBJECTS WHICH WAS SIGNIFICANT AT THE 2.5% LEVEL. THERE

WAS NO STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE CONTROL

AND EXPERIMENT GROUPS.

(Axes: PRE TEST and POST TEST, both 1 to 3. Symbols: dot = experiment

group, + = control group.)

FIGURE 5

DIFFERENCE BETWEEN THE CONTROL AND EXPERIMENT GROUPS.

THE Y-AXIS LABELED DIFF SCORE INDICATES THE DIFFERENCE

BETWEEN THE AVERAGE GRADE ON THE PRE TEST (1-3) AND THE

AVERAGE GRADE ON THE POST TEST (1-3). THE DIFFERENCE

WAS NOT STATISTICALLY SIGNIFICANT.

(Axes: GROUP (EXP, CONTROL) and DIFF SCORE, 0 to .5.)

FIGURE 6

DIFFERENCE SCORES FOR THE INDIVIDUAL EXPERIMENT GROUPS AND

THE CONTROL GROUP. DIFF SCORE INDICATES THE DIFFERENCE

BETWEEN THE AVERAGE GRADE ON THE PRE TEST (1-3) AND THE

AVERAGE GRADE ON THE POST TEST (1-3). NONE OF THESE

DIFFERENCES ARE STATISTICALLY SIGNIFICANT.

(Axes: GROUP (A, B, C, D, K) and DIFF SCORE, 0 to .5.)

3.2 Experiment 2

Figure 7 shows the results of training for experiment groups 

A and B individually and taken together (EXP) and for the 

control group K. The two language groups were also taken 

together in these results. It should be recalled here that 

the listener judgements were expressed in terms of better, 

worse, same. These responses were recoded to +1 for better, 

-1 for worse and 0 for same. All groups showed a positive

result, i.e. all groups on the average improved their mastery

of target language prosody. This gave us a positive number

between 0 and +1 for all groups. The y-axis in this bar

graph represents subjects' progress expressed in terms of

this number. As was the case for experiment 1, the control

group shows the most improvement. In this case the

difference between the experimental group as a whole (group

A plus group B) and the control group as a whole (K) was 

significant at the 2% level. The difference between groups 

B and K was also significant at the 1% level. The 

difference between groups A and K was not significant nor 

was the difference between groups A and B. 
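
The recoding of the pair judgements can be sketched as below. It is an assumption of this example that "better" means the post test utterance was preferred, so the order within the pair has to be taken into account; the function names and the judgements are invented for illustration.

```python
def recode(choice, order):
    """Recode one judgement relative to the post test utterance.

    choice: "first", "second" or "same" (which member of the pair was better)
    order:  ("pre", "post") or ("post", "pre") as presented on the tape
    Returns +1 if the post test utterance was preferred, -1 if the pre test
    utterance was preferred, and 0 if the two were judged equal.
    """
    if choice == "same":
        return 0
    chosen = order[0] if choice == "first" else order[1]
    return 1 if chosen == "post" else -1


def progress(judgements):
    """Mean recoded score; a value between 0 and +1 indicates improvement."""
    scores = [recode(choice, order) for choice, order in judgements]
    return sum(scores) / len(scores)


# Invented judgements, not data from the experiment.
judgements = [
    ("second", ("pre", "post")),   # post preferred  -> +1
    ("first", ("post", "pre")),    # post preferred  -> +1
    ("same", ("pre", "post")),     # no difference   ->  0
    ("first", ("pre", "post")),    # pre preferred   -> -1
]
result = progress(judgements)
print(result)
```

Averaging the recoded scores over all judgements for a group gives the kind of number plotted on the y-axis of the bar graphs.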

Figures 8 and 9 show the same results for the individual

language groups. The English learners' progress reflects

the same tendencies as were seen in fig 7. The control

group shows the most progress. The difference seen on the

graph (fig 8) between the experiment group (A plus B) and

the control group was significant at the 5% level. The

difference between groups B and K was significant at

the 1% level. No other differences seen in figure 8 were

statistically significant. The Swedish learners (fig 9)

show generally the same results in that the control group

shows the most progress. None of the differences between

groups were statistically significant here, however.

FIGURE 7

THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY

AND TAKEN TOGETHER (EXP) AND FOR THE CONTROL GROUP K. THE Y-AXIS

EXPRESSES SUBJECTS' PROGRESS IN TERMS OF A NUMBER BETWEEN 0 AND 1

(see text). STATISTICALLY SIGNIFICANT DIFFERENCES: EXP-K 2% level;

B-K 1% level.

FIGURE 8

THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY

AND TAKEN TOGETHER (EXP) AND FOR THE CONTROL GROUP K FOR THE

LEARNERS OF ENGLISH. THE Y-AXIS EXPRESSES SUBJECTS' PROGRESS IN TERMS

OF A NUMBER BETWEEN 0 AND 1 (see text). STATISTICALLY SIGNIFICANT

DIFFERENCES: EXP-K 5% level; B-K 1% level.

FIGURE 9

THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY

AND TAKEN TOGETHER (EXP) AND FOR CONTROL GROUP K FOR THE LEARNERS

OF SWEDISH. THE Y-AXIS EXPRESSES SUBJECTS' PROGRESS IN TERMS OF

A NUMBER BETWEEN 0 AND 1 (see text). NO STATISTICALLY SIGNIFICANT

DIFFERENCES.

4. Discussion 

Let us return to the introduction and review our point of 

departure and main question in this research. Other 

researchers have found a positive effect of this methodology

in the learning of prosodic elements of a foreign language.

Our main question was similar to that of these 

researchers: 

"Do technical aids help in the learning of prosody?" 

Formulated as a null hypothesis, this question could be

expressed as: Learners who use the technical aids will not 

achieve a more native-like production of the prosodic 

features of the target language when compared to learners 

who have used only traditional language laboratory 

methods. 
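
A hypothesis of this one-sided form can be checked in several ways. The sketch below uses a permutation test, which is a technique chosen here for illustration and is not stated in this report to be the test the authors applied. The function name and the scores are hypothetical; the scores are invented values on the 0-to-1 progress scale and are not data from this study.

```python
import random
from statistics import mean


def one_sided_permutation_test(aids, traditional, n_resamples=10_000, seed=0):
    """Estimate how often a random relabelling of the pooled scores gives a
    difference (aids minus traditional) at least as large as the observed one.

    A small value would speak against the null hypothesis that the
    technical-aids group does NOT progress more than the traditional group.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(aids) - mean(traditional)
    pooled = list(aids) + list(traditional)
    n_aids = len(aids)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_aids]) - mean(pooled[n_aids:])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical progress scores (0 to 1); NOT taken from the experiment.
aids_scores = [0.10, 0.15, 0.05, 0.20, 0.12]
traditional_scores = [0.25, 0.30, 0.18, 0.28, 0.22]

observed_diff, p_value = one_sided_permutation_test(aids_scores, traditional_scores)
print(f"observed difference: {observed_diff:.3f}, one-sided p: {p_value:.3f}")
```

With invented scores in which the traditional group progresses more, the observed difference is negative and the one-sided p-value is large, so the null hypothesis as formulated above would not be rejected.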

Our qualification of these formulations is of considerable 

importance to this research. That is, we are most interested 

in the "long term effects" of this methodology or, to put it

somewhat differently, "How does this method work if set

within the time and curriculum framework of a typical 

language course?" The results presented in section 3 seem

to make it fairly clear that the technical aids methodology,

as we have applied it in this research, does NOT

facilitate the learning of the prosody of a foreign language

more than the traditional language lab methods. In fact, we 

could go even further on the basis of our results and say 

that, even though we often lack statistical significance, 

there are several clear indications that the subjects who 

used the traditional audio-active-comparative method acquired

a more native-like mastery of these features than the 

subjects who used the technical aid/feedback methodology. 

Let us now briefly discuss some possible reasons for these 

results and the discrepancy between them and the expected 

results based on earlier research. It should be mentioned 

here that our informal observation of our subjects' use of

the technical aids made us optimistic as to the teaching 

value of this methodology. 

The students were generally enthusiastic and stimulated by

working with these instruments. Due to these observations and comparison with

other such experiments mentioned above, we do not believe 

that our subjects were somehow "confused" by these technical 

instruments as some of our colleagues have suggested. The 

operation of our apparatus was no more complicated than in 

other experiments of this kind and therefore we consider 

this explanation of our results to be less than 

convincing. 

This is not to say, of course, that an improvement in the 

function of our instrumentation would not affect our

results. Our instrumentation was, in fact, relatively

primitive compared with what is currently available in the 

form of computerized instructional devices. 

A somewhat more appealing explanation of our results, though very general

and vague, is that the learning of the information that is 

fed back via the instruction devices is somehow not as 

closely related to the linguistic aspects of the target 

features as has been assumed in the development and use of 

these methods. Then the question immediately arises as to 

why the methods have worked better in other research where 

the training was more concentrated in short sessions. If

the training were not related to the linguistic learning

process, the same lack of effect should have appeared in the

other research, which was similar in many ways to that

presented here. Perhaps the

most obvious difference is that in our experiments the 

training was spread out in time. We cannot at present

see how this time factor could account for the difference

in the effects of these methods. The discrepancy between our

results and the results of this previous work is, then,

unresolved.

On closer scrutiny, this methodology presents some problems

related to our difficulty in explaining our results and 

relating them to earlier research. How much do we really 

know about the phonetic identity of prosodic elements? The 

visual manifestation of fundamental frequency in speech does 

not necessarily reveal to the learner which of the details 

of this parameter are critical to the production of a 

natural sounding intonation in a particular language. The 

same thing is of course true for the much discussed but 

little understood phenomenon of rhythm or timing. Actually,

we would need to know these details and point them out to

the learner for the effective use of this method, but the

fact is that our knowledge is still very limited.

The use of alternative sensory channels for feedback 

information to be used in learning of linguistic features 

seems, in large part, to be based on a rather vague

behavioristic assumption. The feedback is assumed to 

facilitate a successful production and the successful 

production is assumed to reinforce the behavior and thus 

facilitate learning. This methodology has been used with 

some success in both foreign language learning and the 

teaching of the handicapped, such as the deaf and hard of hearing.

It seems that this success is a fully adequate motivation

for the use of the methods but that the success must be

as difficult to explain as the apparent failure of the

methods in our work.

5. Conclusions 

The inspiration for the initiation of this research was the

success of the previous work in this field mentioned in 

section 1 of this report. As phoneticians we were 

enthusiastic about the possibility of using some of the 

methods we were familiar with from speech research in a 

practical way in a language teaching setting. Our aim was 

to test this promising methodology in an actual language 

course situation. We have found that the answer to our 

original question "Do technical aids help in the learning of 

prosody?" seems to be that they do NOT, or at least that we

have not been able to show such effects in this research.

Our null hypothesis therefore cannot be rejected: learners who use

the technical aids will NOT achieve a more native-like 

production of the prosodic features of the target language 

when compared to learners who use the traditional language 

laboratory methodology. Although the results we have 

reported here are somewhat disappointing from the point of 

view of the phonetician who would like to be able to apply

some of his research methods, they are important from 

another. As was mentioned in the introductory section, there

has been considerable research interest in these

questions but a lack of well-controlled research. We consider

our work here to be a contribution to the research

that is needed in order to be able to answer definitively 

our original question as to the utility of technical aids in 

language teaching and how these aids should be designed. 
