perilus v - Stockholms universitet

UNIVERSITY OF STOCKHOLM<br />

INSTITUTE OF LINGUISTICS<br />

PERILUS V<br />

Starting with this issue, we will be changing slightly the publication<br />

policy of PERILUS. Earlier issues included experimental efforts of our<br />

graduate students in connection with their course work in<br />

experimental phonetics. Results of work on larger projects were, as a<br />

rule, published elsewhere. In the future we will, of course, continue to<br />

publish our work in international periodicals. It is, however, our<br />

intention to mirror the entire spectrum of scientific activity in our lab<br />

through PERILUS. PERILUS can thus be viewed as our department's<br />

working papers in phonetics. We hope that this new PERILUS will serve<br />

as an effective avenue of communication with our colleagues in the<br />

field of phonetics. Copies of PERILUS are available from the Institute<br />

of Linguistics, Stockholm University, S-106 91 Stockholm, Sweden.<br />

Olle Engstrand<br />

Hartmut Traunmüller


THE PHONETICS LABORATORY GROUP<br />

Ann-Marie Alme<br />

Ulf Andersson<br />

Peter Branderud<br />

Una Cunningham-Andersson<br />

Hassan Djamshidpey<br />

Mats Dufberg<br />

Olle Engstrand<br />


ACKNOWLEDGMENTS<br />

The research reported in this issue of PERILUS was sponsored in part<br />

by the following sources:<br />

THE SWEDISH COUNCIL FOR RESEARCH IN THE HUMANITIES<br />

AND SOCIAL SCIENCES<br />

THE SWEDISH NATURAL SCIENCE RESEARCH COUNCIL<br />

THE TRICENTENNIAL FOUNDATION OF THE BANK OF SWEDEN<br />

THE SWEDISH BOARD FOR TECHNICAL DEVELOPMENT


PREVIOUS ISSUES OF PERILUS<br />

PERILUS 1 1978 - 1979<br />

Page<br />

1. INTRODUCTION<br />

Björn Lindblom and James Lubker<br />

4<br />

2. SOME ISSUES IN RESEARCH ON THE PERCEPTION<br />

OF STEADY-STATE VOWELS<br />

Vowel identification and spectral slope<br />

Eva Agelfors and Mary Graslund<br />

10<br />

Why does [a] change to [ɔ] when F0<br />

is increased?:<br />

Interplay between harmonic structure and formant frequency<br />

in the perception of vowel quality<br />

Åke Florén<br />

13<br />

Analysis and prediction of difference limen data<br />

for formant frequencies<br />

Lennart Nord and Eva Sventelius<br />

24<br />

Vowel identification as a function of<br />

increasing fundamental frequency<br />

Elisabeth Tenenholtz<br />

38<br />

Essentials of a psychoacoustic model of spectral matching<br />

Hartmut Traunmüller<br />

49<br />

3. ON THE PERCEPTUAL ROLE OF DYNAMIC FEATURES<br />

IN THE SPEECH SIGNAL<br />

Interaction between spectral and durational cues<br />

in Swedish vowel contrasts<br />

Anette Bishop and Gunilla Edlund<br />

64<br />

On the distribution of [h] in the languages of the world:<br />

Is the rarity of syllable final [h] due to an asymmetry<br />

of backward and forward masking?<br />

Eva Holmberg and Alan Gibson<br />

68<br />

On the function of formant transitions<br />

I. Formant frequency target vs. rate of change in vowel identification 83<br />

II. Perception of steady vs. dynamic vowel sounds in noise<br />

92<br />

Karin Holmgren<br />

83<br />

Artificially clipped syllables and the role of formant transitions<br />

in consonant perception<br />

Hartmut Traunmüller<br />

105


4. PROSODY AND TOP DOWN PROCESSING<br />

The importance of timing and fundamental frequency contour<br />

information in the perception of prosodic categories<br />

Bertil Lyberg 123<br />

Speech perception in noise and the evaluation<br />

of language proficiency<br />

Alan C. Sheats<br />

134<br />

5. BLOD - A BLOCK DIAGRAM SIMULATOR<br />

Peter Branderud 151<br />

PERILUS II 1979-1980<br />

Page<br />

Introduction<br />

James Lubker<br />

A study of anticipatory labial coarticulation in the speech of children<br />

Asa Berlin, Ingrid Landberg and Lilian Persson 2<br />

Rapid reproduction of vowel-vowel sequences by children<br />

Åke Florén 19<br />

Production of bite-block vowels by children<br />

Alan Gibson and Lorrane McPhearson 26<br />

Laryngeal airway resistance as a function of phonation type<br />

Eva Holmberg<br />

44<br />

The declination effect in Swedish<br />

Diana Krull and Siv Wandeback<br />

58<br />

Compensatory articulation by deaf speakers<br />

Richard Schulman 74<br />

Neural and mechanical response time in the speech<br />

of cerebral palsied subjects<br />

Elisabeth Tenenholtz 87<br />

An acoustic investigation of production of plosives<br />

by cleft palate speakers<br />

Garda Ericsson 95


PERILUS III 1982 - 1983<br />

Page<br />

Introduction<br />

Björn Lindblom<br />

Elicitation and perceptual judgement of disfluency and stuttering<br />

Anne-Marie Alme 3<br />

Intelligibility vs redundancy - conditions of dependency<br />

Sheri Hunnicut 27<br />

The role of vowel context on the perception<br />

of place of articulation for stops<br />

Diana Krull<br />

45<br />

Vowel categorization by the bilingual listener<br />

Richard Schulman 81<br />

Comprehension of foreign accents. (A cryptic investigation.)<br />

Richard Schulman and Maria Wingstedt 101<br />

Syntetiskt tal som hjälpmedel vid korrektion av dövas tal<br />

Anne-Marie Öster 115<br />

PERILUS IV 1984 - 1985<br />

Page<br />

Introduction<br />

Björn Lindblom<br />

Labial coarticulation in stutterers and normal speakers<br />

Ann-Marie Alme 3<br />

Movetrack<br />

Peter Branderud<br />

20<br />

Some evidence on rhythmic patterns of spoken French<br />

Danielle Duez and Yukihiro Nishinuma<br />

30<br />

On the relation between the acoustic properties of Swedish<br />

voiced stops and their perceptual processing<br />

Diana Krull 41<br />

Descriptive acoustic studies for the synthesis of spoken Swedish<br />

Francisco Lacerda 51


Frequency discrimination as a function of<br />

stimulus onset characteristics<br />

Francisco Lacerda<br />

66<br />

Speaker-listener interaction and phonetic variation<br />

Björn Lindblom and Rolf Lindgren 77<br />

Articulatory targeting and perceptual constancy of loud speech<br />

Richard Schulman<br />

86<br />

The role of the fundamental and the higher formants in<br />

the perception of speaker size, vocal effort, and vowel openness<br />

Hartmut Traunmüller<br />

92


RECENT PUBLICATIONS<br />

AND PUBLICATIONS IN PROGRESS<br />

Una Cunningham-Andersson<br />

Durational correlates of post-vocalic voicing in English spoken by English and<br />

Spanish speakers. In Engstrand, O. (ed.): Papers from the Swedish Phonetics<br />

Conference held in Uppsala, Oct. 17-18, 1986, pp. 87-92.<br />

Olle Engstrand<br />

Salient features of Lule Sami pronunciation. In C.-C. Elert (ed.): The Sounds of<br />

Lappish. University of Umeå (in press).<br />

Articulatory correlates of stress and speaking rate in Swedish VCV utterances. J.<br />

Acoust. Soc. Am. (in press).<br />

The IRIS speech data base - a status report. In Engstrand, O. (ed.): Papers from<br />

the Swedish Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp.<br />

121-126.<br />

Diana Krull<br />

Spectrum and dynamics in the perception of stop consonants. In Engstrand, O.<br />

(ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct.<br />

17-18, 1986, pp. 54-59, and contribution to the French-Swedish Research Meeting<br />

held in Grenoble, March, 1987.<br />

The locus-target relation in spontaneous speech. Contribution to the<br />

French-Swedish Research Meeting held in Grenoble, March, 1987.<br />

Evaluation of distance metrics using Swedish stop consonants. Proceedings of<br />

the 11th ICPhS, Tallinn 1987, Vol. 2, pp. 65-68.<br />

Björn Lindblom<br />

A typological study of consonant systems: the role of inventory size. In<br />

Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in<br />

Uppsala, Oct. 17-18, 1986, pp. 7-9.<br />

Adaptive variability and absolute constancy in speech signals: two themes in<br />

the quest for phonetic invariance. Plenary Lecture, Proceedings of the 11th<br />

ICPhS, Tallinn 1987, Vol. 3, pp. 9-18.


Phonetic invariance and the adaptive nature of speech. Lecture presented at<br />

a symposium on 'Working models of human perception', celebrating the 30th<br />

anniversary of the Instituut voor Perceptie Onderzoek, Eindhoven, August 26-28,<br />

1987. Cambridge: Cambridge University Press.<br />

The concept of target and speech timing (with J. Lubker, B. Lyberg, P.<br />

Branderud and K. Holmgren). Festschrift for Ilse Lehiste. Dordrecht, The<br />

Netherlands: Foris (in press).<br />

Phonetic universals in consonant systems (with I. Maddieson). In L.M. Hyman<br />

and C.N. Li (eds. ): Language, brain and mind (in press).<br />

A model of phonetic variation and selection applied to the evolution of vowel<br />

systems. Presented in 1984 at a meeting at CASBS, Stanford. In S.-Y. W. Wang<br />

(ed.): Language transmission and change. New York: Blackwell (in press).<br />

Fonetik. Article submitted to the editorial board of Nationalencyklopedin.<br />

Evolution of spoken language (with P. MacNeilage and M. Studdert-Kennedy).<br />

Orlando, Florida: Academic Press (in preparation).<br />

Språket, Lucy och datorn (with P. af Trampe). Stockholm: Bonniers (in<br />

preparation).<br />

Rolf Lindgren<br />

Phonetic reduction in spontaneous speech. Paper given at the TLH meeting in<br />

Lund, October 1987.<br />

Lennart Nord<br />

Acoustic studies of vowel reduction in Swedish. STL-QPSR 4/1986, 19-36.<br />

Vowel reduction in Swedish. In Engstrand, O. (ed.): Papers from the Swedish<br />

Phonetics Conference held in Uppsala, Oct. 17-18, 1986, 16-21.<br />

Liselotte Roug<br />

Early phonetic development in four Swedish infants (with Ingrid Landberg and<br />

Lars-Johan Lundberg). In Engstrand, O. (ed. ): Papers from the Swedish<br />

Phonetics Conference held in Uppsala, Oct. 17-18, 1986, pp. 145-150.


Richard Schulman<br />

Articulatory dynamics of loud and normal speech. In Engstrand, O. (ed.):<br />

Papers from the Swedish Phonetics Conference held in Uppsala, Oct. 17-18,<br />

1986, pp. 60-64.<br />

Hartmut Traunmüller<br />

Phase vowels. Psychophysics of speech perception, Dordrecht:<br />

M. Nijhoff Publ., 1987, pp. 293-305.<br />

Some types of variation and invariant spectral features of vowels. In Engstrand,<br />

O. (ed.): Papers from the Swedish Phonetics Conference held in Uppsala, Oct.<br />

17-18, 1986, pp. 48-53.<br />

Perceptual relativity in identification of two-formant vowels (with Francisco<br />

Lacerda). Speech Communication 5 (in press).<br />

An experiment on the cues to the identification of fricatives. Proceedings of the<br />

11th ICPhS, Tallinn 1987, Vol. 5, pp. 205-208.<br />

Maria Wingstedt<br />

Foreign accents and perceptual processing (with Richard Schulman). In<br />

Engstrand, O. (ed.): Papers from the Swedish Phonetics Conference held in<br />

Uppsala, Oct. 17-18, 1986, pp. 93-97.<br />

DISSERTATIONS<br />

Garda Eriksson (1987). Analysis and treatment of cleft palate speech:<br />

Some acoustic-phonetic observations. Linköping University Medical<br />

Dissertations No. 254. ISSN 0345-0082.<br />

Lennart Nord (1987). Acoustic-phonetic studies of Swedish with an<br />

excursion into pathological speech. TRITA-TOM-87-1. Department of<br />

Speech Communication and Musical Acoustics, Royal Institute of<br />

Technology, Stockholm. ISSN 0280-9850.


CONTENTS OF PERILUS V<br />

Peter Branderud 1<br />

About the computer-lab<br />

Björn Lindblom<br />

Adaptive variability and absolute constancy<br />

in speech signals: two themes in the quest for<br />

phonetic invariance 2<br />

Richard Schulman<br />

Articulatory dynamics of loud<br />

and normal speech 21<br />

Hartmut Traunmüller<br />

& Diana Krull<br />

An experiment on the cues to the<br />

identification of fricatives 33<br />

Diana Krull<br />

Second formant locus patterns<br />

as a measure of consonant-vowel<br />

coarticulation 43<br />

Madeleine Wulffson<br />

Exploring discourse intonation in Swedish 62<br />

Mats Dufberg<br />

Why two labialization strategies in Setswana? 78<br />

Liselotte Roug,<br />

Ingrid Landberg &<br />

Lars-Johan Lundberg<br />

Phonetic development in early infancy -<br />

a study of four Swedish children<br />

during the first 18 months of life 93<br />

Johan Stark & Mats Dufberg<br />

A simple computerized response<br />

collection system 140<br />

Robert McAllister,<br />

Mats Dufberg &<br />

Maria Wallius<br />

Experiments with technical aids in<br />

pronunciation teaching<br />

144


ABOUT THE COMPUTER-LAB<br />

Peter Branderud<br />

In the last year we have been able to build up a new and modern<br />

computer system with grants from the Wallenberg Foundation and FRN.<br />

Our old computer system is from 1975-83. It consists of two<br />

mini-computers with 200 MB disk storage each. It can accommodate four<br />

users at the same time. There is software for signal processing,<br />

acoustical analysis/synthesis, simulation of perception/production etc.<br />

It can A/D convert up to 16 channels with 12 bits resolution into the<br />

computer and it can D/A convert 2 channels with 16 bits resolution from<br />

the computer.<br />

The new computer system consists of several Apollo workstations that<br />

are connected by a fast network. We will also connect several PC/XT/AT<br />

via an Ethernet network.<br />

Presently we have two Apollo DN3000 with black and white displays and<br />

two DN3000 with color displays. Each work station has about 4 MB<br />

primary memory. We have 450 MB harddisk memory. We use the operating<br />

system Unix 4.2 and the programming languages C, Pascal, Fortran 77 and<br />

Commonlisp. We also plan to get Prolog.<br />

There is a connection between the old and the new computer systems<br />

that enables a fast file-transfer between the systems. We can also run<br />

the old system through a window on the Apollo workstations. That makes<br />

it possible and easy to use the best software on each system on our<br />

data.<br />

We are also preparing to install the program package Audlab from the<br />

Alvey-group. We will continuously transfer the most important programs<br />

from our old system to the new Apollo system. Some of the work stations<br />

will also be equipped with A/D- and D/A-converters. For the printouts<br />

we use laser-writers.<br />

We will continuously expand the system with, for example, hard disks,<br />

laser disks, primary memory, workstations, array processors and<br />

software.<br />

These increased resources will make our lab more modern and complete,<br />

thereby strongly enlarging our possibilities to engage in larger<br />

projects and receive more guest researchers.<br />

In addition, direct interaction with other laboratories around the world<br />

will become easy and efficient.


ADAPTIVE VARIABILITY AND ABSOLUTE CONSTANCY IN<br />

SPEECH SIGNALS:<br />

TWO THEMES IN THE QUEST FOR PHONETIC INVARIANCE*<br />

Björn Lindblom<br />

ABSTRACT<br />

Our topic is the classical problem of reconciling the<br />

physical and linguistic descriptions of speech: the<br />

invariance issue. Evidence is first presented indicating the<br />

possibility of defining phonetic invariance at the<br />

articulatory, acoustic or auditory levels of the speech<br />

signal. However, as we broaden the scope of our review, we<br />

find that attempts to define phonetic invariance in terms of<br />

absolute physical constancies tend to lose ground to<br />

theories that recognize signal variability as an essentially<br />

systematic and adaptive consequence of the informational<br />

mutuality of natural speaker-listener interactions. We reach<br />

this conclusion not only by examining experimental data on<br />

on-line speech processes but also by analyzing typological<br />

evidence on how the phonetic structure of consonant systems<br />

varies in lawful patterns with inventory size.<br />

INTRODUCTION<br />

Traditionally the problem of invariance in phonetics<br />

can be said to be that of proposing physical descriptions of<br />

linguistic entities that have the characteristic of<br />

remaining invariant across the large range of contexts that<br />

the communicatively successful real-life speech acts present<br />

to us.<br />

Many of us share the conviction that taking steps<br />

towards the solution of this problem will be crucial if we<br />

are to acquire a deeper theoretical understanding of the<br />

behavior of speakers and listeners as well as develop more<br />

advanced systems for speech-based man-machine communication<br />

(Perkell and Klatt 1986).<br />

The present paper will attempt to address some of the<br />

questions that we typically encounter in the search for<br />

invariance. We shall do so by summarizing research<br />

undertaken mostly in our own laboratory in Stockholm.<br />

Although thus deliberately limiting the scope of our review<br />

we hope that the issues raised will nevertheless be of<br />

sufficient interest to stimulate general discussion.<br />

IS PHONETIC INVARIANCE ARTICULATORY?<br />

A few decades ago phoneticians began to interpret<br />

phonetic events by comparing articulations to highly damped<br />

oscillatory systems. More recently, such models have<br />

acquired an important role within the framework of action<br />

theory (Kelso, Saltzman and Tuller 1986) . In the sixties it<br />

was hoped that a lot of the variability that speech signals<br />

*Plenary address to be presented at the XIth International<br />

Congress of Phonetic Sciences, Tallinn, Estonia, August<br />

1987.<br />

typically exhibit - e.g. reductions and vowel-consonant<br />

coarticulation (Öhman 1967) - could be explained in terms of<br />

the spatial and temporal overlap of adjacent "motor<br />

commands" (MacNeilage 1970). Articulatory movements were<br />

seen as sluggish responses to an underlying forcing function<br />

which was assumed to change, usually in a step-wise fashion,<br />

at the initiation of every new phoneme (Henke 1966) . Owing<br />

to variations in say stress or speaking tempo different<br />

contexts would give rise to differences in timing for a<br />

given sequence of phoneme commands. Articulatory and<br />

acoustic goals would not always be reached, the so-called<br />

'undershoot' phenomenon (Stevens and House 1963). But since<br />

such undershoot appeared to be lawfully related to the<br />

duration and context of the gestures (Lindblom 1963) , the<br />

underlying articulatory "targets" of any given phoneme -<br />

'die Lautabsicht' - would nevertheless, it was maintained,<br />

remain invariant. Accordingly, at that time it seemed<br />

possible to argue that phonetic invariance might be<br />

articulatory.<br />
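The duration-dependence of undershoot described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an articulator behaves like a critically damped second-order system driven by a step-wise forcing function; the time constant `tau` and the function names are hypothetical and are not taken from the studies cited.

```python
import math

def step_response(t, tau=0.05):
    """Critically damped second-order response to a unit step applied at t = 0.

    tau (seconds) is a hypothetical articulator time constant.
    Returns the fraction of the target displacement reached at time t.
    """
    if t <= 0:
        return 0.0
    x = t / tau
    return 1.0 - (1.0 + x) * math.exp(-x)

def undershoot(duration, tau=0.05):
    """Fraction of the target NOT yet reached when the next command arrives."""
    return 1.0 - step_response(duration, tau)

if __name__ == "__main__":
    # Shorter segment durations leave the gesture further from its target.
    for dur in (0.05, 0.10, 0.20, 0.40):
        print(f"duration {dur * 1000:4.0f} ms -> undershoot {undershoot(dur):.2f}")
```

In this toy model the target itself never changes; only the time available to approach it does, which is the sense in which an invariant target was held to underlie variable output.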

Duration-dependent undershoot still seems to be a<br />

phonetically valid notion for biomechanical reasons. But it<br />

is clearly not as inevitable a phenomenon as was first<br />

thought. Current experimental information indicates that in<br />

fast speech articulatory and acoustic goals can be attained<br />

despite short segment durations (cf Engstrand 1987, Gay<br />

1978, Kuehn and Moll 1976) . Furthermore undershoot has been<br />

observed in unstressed Swedish vowels that exhibit long<br />

durations owing to 'final lengthening' (Nord 1986) . Such<br />

deviations from simple duration-dependence appear to<br />

highlight the reorganizational abilities of the speech<br />

production system. One way of resolving the problem posed<br />

by these somewhat contradictory results might be obtained if<br />

it were shown that when instructed to speak fast subjects<br />

have a tendency to "overarticulate", thus avoiding<br />

undershoot to some extent, whereas when destressing they are<br />

more prone to "underarticulate" (cf discussion below of<br />

hypo- and hyper-speech). The demonstration of language-specific<br />

patterns of vowel reduction (cf Delattre's 1969<br />

discussion of English, French, German and Spanish) becomes<br />

particularly relevant in the context of addressing such<br />

questions.<br />

In summary, the original observations of 'undershoot'<br />

carried the implication that the invariant correlates of<br />

linguistic units were to be found, not in the speech wave<br />

nor at an auditory level, but upstream from the level of<br />

articulatory movement. Phonetic invariance was accordingly<br />

associated with the constancy of underlying "spatial<br />

articulatory targets" (for reviews of the target concept see<br />

e g MacNeilage 1970, 1980) . However, subsequent<br />

experimentation - some of which we already hinted at above -<br />

has revealed that the notion of segmental target must be<br />

given a much more complex interpretation.<br />

This conclusion is reinforced particularly strongly by<br />

studies of compensatory articulation. Let us summarize some<br />

results from an experiment using the so-called "bite-block"<br />

paradigm (Lindblom, Lubker, Lyberg, Branderud, Holmgren in<br />

press) . Native Swedish speakers were asked to pronounce<br />

monosyllables and bi- and trisyllabic words under two<br />

conditions: normally and with a large bite-block between<br />

their teeth. They were instructed to try to produce the<br />

bite-block utterances with the same rhythm and stress<br />

pattern as the corresponding normal items. Real Swedish<br />

words as well as "reiterant" nonsense forms were used. To<br />

exemplify, one of the metric patterns was: - '- -. This<br />

pattern would occur in the lists as "begabba" and<br />

/ba'bab:ab/. Measurements were made of the duration of the<br />

consonant and vowel segments of the normal and the bite-block<br />

versions of the reiterant speech samples. The question<br />

was thus whether subjects would be able to achieve the bilabial<br />

closure for the /b/ segments in spite of the abnormally low<br />

and fixed jaw position and whether they would be able to do<br />

so reproducing the normal durational patterns.<br />

We found that the timing in the bite-block words<br />

deviated systematically but very little from the normal<br />

patterns and concluded that our subjects were indeed capable<br />

of compensating. To explain the results we suggested that a<br />

representation of the "desired end-product" - the metric<br />

pattern of the word - must be available in some form to the<br />

subjects' speech motor systems and that the successful<br />

compensations implied a reorganization of articulatory<br />

gestures that must have been controlled by such an output-oriented<br />

target representation. These results are in<br />

agreement with those reported earlier by Netsell, Kent and<br />

Abbs (1978) . Moreover, they are completely analogous to the<br />

previous demonstrations that naive speakers are capable of<br />

producing isolated vowels whose formant patterns are normal<br />

at the first glottal pulse in spite of an unnatural jaw<br />

opening imposed by the use of a "bite-block" (Lindblom,<br />

Lubker and Gay 1979, Gay, Lindblom and Lubker 1981).<br />

These results bear on the recent discussion of speech<br />

timing as "intrinsically" or "extrinsically" controlled.<br />

Proponents of action theory (Fowler, Rubin, Remez and Turvey<br />

1980) approach the physics of the speech motor system from a<br />

dynamical perspective with a view to reanalyzing many of the<br />

traditional notions that now require explicit representation<br />

in extant speech production models such as 'feedback loop',<br />

'target' etc. Their writings convey the expectation that<br />

many aspects of the traditional "translation models" will<br />

simply fall out as consequences of the dynamic properties<br />

intrinsic to the speech motor system. In the terminology of<br />

Kelso, Saltzman and Tuller (1986, 55) "..., both time and<br />

timing are deemed to be intrinsic consequences of the<br />

system's dynamical organization." Methodologically, action<br />

theory is commendable since, being committed to interpreting<br />

phonetic phenomena as fortuitous (intrinsic) consequences<br />

rather than as controlled (extrinsic) aspects of a speaker's<br />

articulatory behavior, it guarantees a maximally thorough<br />

examination of speech production processes. However, it is<br />

difficult to see how, applying the action theoretic<br />

framework to the data on compensatory timing just reviewed,<br />

we could possibly avoid postulating some sort of "temporal<br />

target" representation which is (i) extrinsic to the<br />

particular structures executing the gestures and which is<br />

(ii) responsible for extrinsically tuning their dynamics.<br />

Speech production is a highly versatile process and<br />

sometimes appears strongly listener-oriented.<br />

The plasticity of the speech motor system is further<br />

illustrated by an experiment recently done by Schulman<br />

(forthcoming) invoking a "natural bite-block" situation.<br />

This condition is provided by loud speech in which a more<br />

open mandible tends to be used than in normally spoken<br />

syllables.<br />

Whether rounded or not the vowels of loud test words<br />

produced by Schulman's talkers were found to exhibit almost<br />

three times as large jaw openings as the corresponding<br />

segments in the normal words. In the context of compensatory<br />

articulation two observations call for special comments. Why<br />

do not speakers compensate for the greater jaw opening in<br />

the loud vowels the way they do in the bite-block<br />

experiments? Schulman shows that they do not since the<br />

fundamental frequency and (as predicted by articulatory-acoustic<br />

nomograms) the first formant of the loud vowels are<br />

shifted upwards by about one Bark whereas the other formants<br />

do not undergo comparable modification. (Below we shall<br />

relate the F1 and F0 shift to the results of a perceptual<br />

experiment) .<br />
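To give a feeling for the size of a one-Bark shift, the sketch below converts between Hz and Bark. It assumes the analytical approximation z = 26.81 f / (1960 + f) - 0.53 published by Traunmüller in 1990, which postdates this paper; the text does not state which Bark formula was used, and the example F1 value of 500 Hz is hypothetical.

```python
def hz_to_bark(f_hz):
    """Critical-band rate (Bark) from frequency (Hz); assumed approximation."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

def shift_by_bark(f_hz, delta_bark):
    """Frequency obtained by moving f_hz up or down by delta_bark on the Bark scale."""
    return bark_to_hz(hz_to_bark(f_hz) + delta_bark)

if __name__ == "__main__":
    f1_normal = 500.0  # hypothetical F1 of a normally spoken vowel, in Hz
    f1_loud = shift_by_bark(f1_normal, 1.0)
    print(f"F1 {f1_normal:.0f} Hz shifted up by one Bark: {f1_loud:.0f} Hz")
```

Because the Bark scale is compressive, the same one-Bark step corresponds to a larger change in Hz at higher frequencies.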

The other finding of interest is the fact that loud<br />

vowel durations increase whereas loud consonant durations<br />

tend to decrease (cf Fonagy and Fonagy 1966) . What does that<br />

result mean? The normal-loud vowel duration differences look<br />

suspiciously similar to the durational differences between<br />

normal open and close vowels which have been observed for<br />

many languages (Lehiste 1970) . Finding that the duration of<br />

the EMG recorded from the anterior belly of the digastric<br />

correlated with both mandibular displacement and vowel<br />

duration Westbury and Keating (1980) suggest that this<br />

temporal variation among vowels, although non-distinctive,<br />

must be seen as present in the neuromuscular signals<br />

controlling their articulation. An alternative<br />

interpretation would be to regard the differences as<br />

automatic consequences of an interaction between an<br />

invariant underlying "vowel duration command" and<br />

articulatory inertia (cf Keating 1985 for further<br />

discussion) . In (Lindblom 1967) we reported some evidence in<br />

favor of the latter interpretation, the "extent of movement<br />

hypothesis" (Fischer-Jorgensen 1964) . We also found that the<br />

durational consequences of more extensive articulatory<br />

gestures were sometimes actively counteracted.<br />

The question whether the open-close vowel duration<br />

difference is an intrinsic or extrinsic phonetic phenomenon<br />

is accordingly somewhat controversial. Schulman's findings<br />

bear on the problem. He constructed a model of loud speech<br />

based on the observation that loud movements appear to be<br />

"exaggerated" versions of the normal movements. Assuming<br />

that the lips and the jaw are linear mechanical systems and<br />

that loud differs from normal speech solely in terms of the<br />

amplitudes of the underlying excitation forces he performed<br />

a linear scaling of all articulatory parameters recorded for<br />

normal syllables (vertical displacements of upper and lower<br />

lips and jaw) and combined the scaled curves so as to derive<br />

the vertical separation of the lips - the parameter that<br />

determines the open-closed state of the mouth opening. By<br />

using the value of this parameter at opening and closing in<br />

the normal syllables as his criterion he was then able to<br />

predict the durations of vowel and consonant segments for<br />

loud speech. He found that linear scaling eliminated stop<br />

closures entirely or produced much too long vowels.<br />
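The logic of this linear-scaling test can be sketched as a toy simulation. Everything numerical here is an assumption made for illustration: the trajectories are synthetic cosine-shaped movements rather than Schulman's recordings, displacements are in mm relative to an arbitrary fixed reference, and the closure criterion of 2 mm and the function names are hypothetical.

```python
import math

def lip_separation(upper_lip, lower_lip, jaw):
    """Vertical lip separation; the lower lip rides on the jaw."""
    return [u - (l + j) for u, l, j in zip(upper_lip, lower_lip, jaw)]

def closed_and_open_time(separation, dt, criterion):
    """Time (s) spent at or below the closure criterion, and above it."""
    closed = sum(1 for s in separation if s <= criterion) * dt
    return closed, len(separation) * dt - closed

def simulate(k, dt=0.001, period=0.3, criterion=2.0):
    """Scale all three synthetic displacement signals by k and segment the result.

    k = 1 reproduces the assumed 'normal' syllable; k > 1 is the linearly
    scaled 'loud' prediction. The criterion is taken from the normal case.
    """
    n = round(period / dt)
    # 0 at the middle of the closure, 1 at the middle of the vowel
    opening = [(1 - math.cos(2 * math.pi * i / n)) / 2 for i in range(n)]
    upper_lip = [k * (2.0 + 1.0 * o) for o in opening]
    lower_lip = [k * (2.0 - 2.0 * o) for o in opening]
    jaw = [k * (-1.0 - 6.0 * o) for o in opening]
    separation = lip_separation(upper_lip, lower_lip, jaw)
    return closed_and_open_time(separation, dt, criterion)

if __name__ == "__main__":
    for k in (1.0, 1.5, 3.0):
        closed, opened = simulate(k)
        print(f"scale {k:.1f}: closure {closed * 1000:.0f} ms, vowel {opened * 1000:.0f} ms")
```

In this sketch a moderate scaling factor shortens the closure and lengthens the vowel, and a factor of three removes the closure altogether, which mirrors the qualitative outcome reported above without reproducing the actual data.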

The implication of this result is that it clearly<br />

attributes the durational differences to a superposition<br />

effect, that is the interaction arising from the<br />

superposition of the lip and the jaw movements. Schulman<br />

concludes that, unless the effect of opening and closing of<br />

the jaw had been actively counteracted, loud and normal<br />

vowel durations would have differed even more than they<br />

actually did.<br />

Let us remark in the present context that, while it<br />

appears reasonable to suggest, as do Westbury and Keating,<br />

that the acoustic vowel duration differences are probably<br />

reflected at a level of neuromuscular control, there is also<br />

evidence indicating that the function of neural control<br />

signals may be a compensatory rather than a positive one,<br />

that is a function opposite to that suggested by Westbury<br />

and Keating.<br />

The preliminary implication of all work touching the<br />

theme of compensatory articulation appears to be that -<br />

whether we use "target" with reference to segmental<br />

attributes, segment durations or patterns of speech rhythm -<br />

the term is better defined, not in terms of any simple<br />

articulatory invariants, but with respect to the acoustic<br />

output that the talker wants to achieve. If phonetic<br />

invariance is not articulatory could it be acoustic then?<br />

IS PHONETIC INVARIANCE ACOUSTIC?<br />

The suggestion that the speech signal contains absolute<br />

physical invariants corresponding to phonetic segments and<br />

features has received a lot of attention thanks to the work<br />

by Stevens and Blumstein (Stevens and Blumstein 1978, 1981;<br />

Blumstein and Stevens 1979, 1981) . The idea has been<br />

favorably received by many, for instance Fowler in her<br />

attempts to apply the perspective of direct perception to<br />

speech (Fowler 1986) .<br />

Others have been provoked to emphasize the inadequacy<br />

of the non-dynamic nature of the Stevens template notion<br />

(Kewley-Port 1983) and the substantial context-dependence<br />

that the stop consonants of various languages typically<br />

display even in samples of carefully enunciated speech<br />

(Öhman 1966).<br />

Recent work by Krull and Lacerda in our Stockholm<br />

laboratory uses the method of quantifying the extent of<br />

consonant-vowel coarticulation in the form of linear "locus<br />

equations". These relationships are obtained by plotting<br />

formant frequencies at CV2- and V1C-boundaries as a function<br />

of the formants for V2 and V1 respectively. Acoustic theory<br />

indicates that for the consonant-vowel combinations in<br />

question near-linear relationships should be expected. Such<br />

diagrams show clearly that, although a "locus" pattern can<br />

exhibit considerable variation, it is predictable from<br />

information on stop consonant identity and adjacent vowel<br />

context. Here coarticulation stands out as the salient fact<br />

and the lack rather than the presence of absolute acoustic<br />

invariance tends to be reinforced.<br />
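The locus-equation idea can be made concrete with a small numerical sketch. The code below is not Krull and Lacerda's procedure; it is a minimal ordinary least-squares fit, assuming one F2 value measured at the consonant-vowel boundary and one in the vowel for each token. The function name `fit_locus_equation` and the formant values are hypothetical, chosen only for illustration.

```python
def fit_locus_equation(f2_vowel_hz, f2_boundary_hz):
    """Fit F2_boundary = slope * F2_vowel + intercept by least squares."""
    n = len(f2_vowel_hz)
    if n < 2 or n != len(f2_boundary_hz):
        raise ValueError("need two equally long lists with at least 2 tokens")
    mean_x = sum(f2_vowel_hz) / n
    mean_y = sum(f2_boundary_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel_hz)
    if sxx == 0:
        raise ValueError("vowel F2 values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_vowel_hz, f2_boundary_hz))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical F2 values (Hz) for one stop consonant in five vowel contexts.
vowel_f2 = [800, 1200, 1600, 2000, 2400]
boundary_f2 = [1300, 1500, 1700, 1900, 2100]

slope, intercept = fit_locus_equation(vowel_f2, boundary_f2)
print(slope, intercept)
```

On this reading a slope near 0 would mean that the boundary value hardly depends on the vowel (a fixed locus), whereas a slope near 1 would mean that it follows the vowel almost completely.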

Incidentally, let us note that, if it exists, acoustic<br />

invariance is a strange notion since talkers can only<br />

monitor it through their senses and listeners can only<br />

access it through their hearing system. Why should sensory<br />

and auditory transduction be assumed to have a transfer<br />

function of one imposing no transformation? Is it the case<br />

that what people really mean when they talk about acoustic<br />

invariance is in fact "auditory" invariance? Let us look at<br />

some psycho-acoustic results.<br />

IS PHONETIC INVARIANCE AUDITORY?<br />

We mentioned earlier a perceptual result that offers a<br />

rather curious parallel to Schulman's findings. It is the<br />

"Traunmüller effect" which is a demonstration of the<br />

transforms required to preserve the perceptual constancy of<br />

vowel quality under changes in (i) vocal effort and (ii)<br />

vocal tract size. It is also somewhat reminiscent of the<br />

findings on F0-F1 interrelationships in soprano vowels<br />

(Sundberg 197).<br />

Effort and vocal tract variations can be dramatically<br />

illustrated by synthetically modifying a naturally spoken<br />

/i/. When all formants and F0 are shifted equally along a<br />

Bark scale an /i/-like vowel is perceived but the voice<br />

changes from an adult's to a child's. When both F1 and F0<br />

are varied in such a way that F1-F0 is kept constant on a<br />

Bark scale - and the upper formant complex is left unchanged<br />

- an /i/-like vowel is perceived. This is remarkable in view<br />

of the fact that F1 reaches a value more typical of a low-pitched<br />

/ /. One's impression is that the speaker remains<br />

the same but that she "shouts".<br />
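The second manipulation can be sketched numerically. The text does not say which Hz-to-Bark conversion was used; the sketch below assumes the analytical approximation z = 26.81 f / (1960 + f) - 0.53 published later by Traunmüller (1990). The function names and the frequency values are hypothetical. The code computes the F1 that keeps the F1-F0 distance constant in Bark when F0 is raised.

```python
def hz_to_bark(f_hz):
    """Assumed Hz-to-Bark approximation (Traunmueller 1990)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


def bark_to_hz(z_bark):
    """Algebraic inverse of hz_to_bark."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)


def f1_for_constant_bark_distance(f0_hz, f1_hz, new_f0_hz):
    """Return the F1 (Hz) that keeps F1-F0 constant in Bark after F0 moves."""
    distance = hz_to_bark(f1_hz) - hz_to_bark(f0_hz)
    return bark_to_hz(hz_to_bark(new_f0_hz) + distance)


# Hypothetical values: an /i/-like vowel with F0 = 220 Hz and F1 = 300 Hz,
# with F0 raised to 400 Hz to mimic increased vocal effort.
new_f1 = f1_for_constant_bark_distance(220.0, 300.0, 400.0)
print(round(new_f1))
```

Because the Bark scale is compressive, an equal Bark distance corresponds to a larger Hz distance at higher frequencies, so F1 rises by more than F0 does when both are expressed in Hz.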

Note the parallel between Schulman's and Traunmüller's<br />

results. Are the findings causally related? Do we explain<br />

the lack of formant compensation in loud speech in terms of<br />

the Traunmüller effect? Or do we account for the vowel<br />

quality results in terms of the "Schulman" effect?<br />

Of importance for the present discussion is the fact<br />

that behavioral constancies have been demonstrated and that<br />

they imply that at least in this case phonetic invariance<br />

must be defined at a level of auditory representation.<br />

Let us return for a moment to the alleged invariance of<br />

the release spectra of stop consonants. Diana Krull<br />

collected perceptual responses from Swedish listeners to<br />

burst fragments obtained from V1C:V2 words (Krull 1987). One<br />

hundred test words were generated by constructing all<br />

possible combinations of V1 or V2 = short /i e a o u/ with<br />

C: = /b: d: rd: g:/. Confusion matrices for the burst<br />

stimuli demonstrate the drastic coarticulation effects. By<br />

and large, listener responses can be accounted for in terms<br />

of the acoustic properties of the stimuli. This is shown in<br />

her attempts to predict the confusions from auditorily based<br />

"perceptual distance" computations.<br />

A related study has been carried out by Lacerda (1986).<br />

We can characterize one part of his research as variations<br />

on the theme struck by Flanagan in his early "difference<br />

limen" experiments on vowel formant frequencies (Flanagan<br />

19). Lacerda's question was: How well can listeners<br />

discriminate four-formant stimuli that differ solely in<br />

terms of the frequency of F2? His work permits us to compare<br />

a psycho-acoustic task: the discrimination of F2 in brief<br />

tone bursts with formant patterns static - with a "speech<br />

task": the discrimination of the onset of F2-transitions in<br />

/da/-stimuli.<br />

The results indicate that the subjects' ability to<br />

discriminate on the psycho-acoustic task is in close<br />

agreement with Flanagan's findings whereas their performance<br />

on the /da/-stimuli is drastically impaired. One<br />

interpretation is that the discrimination change is related<br />

to the fact that intra-category discrimination is<br />

considerably worse than inter-category discrimination<br />

(Liberman, Harris, Hoffman and Griffith 1957).<br />

With reference to the invariance issue it is important<br />

to note the following. Krull's results on stop perception<br />

indicate that the coarticulatory spectral variability of the<br />

stop releases is rather accurately reflected in the<br />

confusions that her listeners made of such brief sounds.<br />

This is fully compatible with Lacerda's results on tone<br />

bursts. Note that in Lacerda's speech-task test, however, the<br />

variability does not seem to be as faithfully mirrored in<br />

the listeners' percepts for apparently they treat stimuli<br />

easily discriminable in psycho-acoustic tests as "the same".<br />

Whether it is the listener invoking the "speech mode" or it<br />

is the interaction of the dynamic stimulus properties and<br />

speech-independent auditory processing is an issue still<br />

worth addressing. However, our main point is this: The<br />

invariance that we discern in these findings is not<br />

acoustic. It clearly presupposes auditory processing.<br />

IMPLICATIONS OF SPEAKING STYLE: THE HYPER-HYPO DIMENSION<br />

Everyday experience indicates that speaking is a highly<br />

flexible process. We are capable of varying our style of<br />

speech from fast to slow, soft to loud, casual to clear,<br />

intimate to public. We speak in different ways when talking<br />

to foreigners, babies, computers and hard of hearing<br />

persons. And we change our pronunciation as a function of<br />

the social rules that govern speaker-listener interactions<br />

(Labov 1972).<br />

Above we considered principally three types of phonetic<br />

invariance: articulatory, acoustic and auditory invariance.<br />

What are the implications of variations in speaking style<br />

for the invariance issue? For the purpose of our discussion<br />

let us give phonetic invariance a strong literal<br />

interpretation which is rather extreme but nevertheless not<br />

too far from working hypotheses explored previously by<br />

various investigators: "All the information is in the<br />

signal, particularly in its dynamics". For such a view of<br />

invariance to be correct - let us call it the strong version<br />

of absolute physical invariance - the following must be<br />

true: Talkers vary their speaking style and thereby<br />

contribute to increasing the variability of the speech wave<br />

but, in utterances that are intelligible, linguistic units<br />

will always exhibit a core of invariant physical information<br />

that will remain undestroyed so as to be successfully used<br />

by a listener.<br />

We recently undertook a literature survey+ in order to<br />

systematize the types of speech materials that have been<br />

used in acoustic phonetic studies published during the past<br />

ten years in J Acoust Soc Am, J of Phonetics, Language and<br />

Speech, and Phonetica. A total of over 700 articles were<br />

selected as preliminarily relevant. We ended up choosing 216<br />

as meeting our criterion of "descriptive study of speech<br />

based on quantitative acoustic phonetic measurements".<br />

Of special interest to us was to ascertain the relative<br />

proportions of studies investigating "self-generated" speech<br />

(including e g spontaneous conversation) on the one hand and<br />

speech samples chosen by the experimenter (e g list<br />

readings, nonsense words etc) on the other. Not<br />

surprisingly, we found that the majority of studies, over<br />

90%, use experimenter-controlled speech samples. The reason<br />

is clear. A satisfactory experimental design presupposes<br />

good control of the variables involved. This is less of a<br />

problem if the experimenter determines the test items but<br />

for "real speech" with its immense number of variables there<br />

is no established methodology that will guarantee such<br />

control. So rather than drown in an ocean of "unknown<br />

factors" our strategy tends naturally to become one of<br />

resorting to "given" test materials and read speaking modes.<br />

One way of justifying this widely used procedure is to<br />

argue that first we will solve the problem of phonetic<br />

invariance in "lab speech". Then we will get to work on<br />

"natural speech". Another outlook might be to suggest that,<br />

although we lack the supplementary methodology required by<br />

"ecological" speech, the excessive use of "lab speech"<br />

introduces an undesirable bias in our data bases as well as<br />

in our theoretical intuitions about invariance and other key<br />

issues - a bias that might make us underestimate the problem<br />

of speech variability in spite of the fact that it is<br />

readily acknowledged by all workers in the field and has<br />

already, it would appear, been rather massively documented.<br />

Consequently the situation ought to be balanced.<br />

We have recently been persuaded by the latter point of<br />

view and are currently recording (1) "self-generated speech"<br />

produced under natural conditions and (2) parallel "citation<br />

form" speech based on the syllables, words and phrases that<br />

occur in the spontaneous materials. Data are currently being<br />

collected by Rolf Lindgren, Diana Krull and myself using<br />

this two-pronged approach involving comparisons of reference<br />

pronunciations ("citation form" speech) with samples of<br />

"self-generated speech". A few preliminary observations can<br />

be made that bear on the present discussion (cf also<br />

Lindblom and Lindgren 198).<br />

The reductions that we have found in spontaneous speech<br />

- and that often escape the trained phonetic ear even after<br />

spectrographic evidence has been examined - are sometimes<br />

drastic. Speaking style has marked effects on the acoustic<br />

patterns of words. The vowel space shrinks in casual style<br />

and is expanded in "hyperspeech" modes. The diphthongization<br />

+I am indebted to Diana Krull for doing the preliminary<br />

selections and to Natasha Beery of the Phonology Laboratory,<br />

University of California, Berkeley for the statistical<br />

analyses.<br />

of tense Swedish vowels is enhanced and is particularly<br />

apparent in clear speech. Contrast in VOT for voiced and<br />

voiceless stops increases and decreases as we compare hyper-<br />

and hypo-forms respectively. Locus equations show a smaller<br />

slope (=less vowel-dependence) for citation form<br />

pronunciations than for spontaneous speech which we<br />

interpret to indicate that vowel-consonant coarticulation is<br />

counteracted in hyperspeech (more invariance) but tolerated<br />

in hypospeech (less invariance). Although preliminary, the<br />

observations made so far suggest that the prospects for any<br />

strong version of absolute physical invariance to be<br />

substantiated seem most unfavorable.<br />

SPEECH UNDERSTANDING: (IN)DEPENDENCE OF SIGNAL INFORMATION<br />

At the Department of Romance Languages at Stockholm<br />

University a test is used to measure how proficient native<br />

Swedish students are in understanding spoken French. The<br />

task of the students is to listen to triads of stimuli<br />

consisting of two identical sentences and one minimally<br />

different and to indicate the odd case.<br />

Montre-leur ce chapeau s'il te plaît<br />

Montre-leur ce chapeau s'il te plaît<br />

Montre-leur ces chapeaux s'il te plaît<br />

Native speakers of French have no problems of course<br />

with such sentences whereas Swedish listeners knowing no<br />

French have a lot of trouble. However, when the key<br />

information - e g the ce/ce/ces triad - is presented as<br />

fragments gated from the original sentences the performance<br />

of the Swedish subjects improves radically (Dufberg and<br />

steek forthcoming).<br />

This test can serve to remind us that perception is a<br />

product of two things: signal-dependent and signal-independent<br />

information. While I am perfectly capable of<br />

discriminating the French minimal contrasts as auditory<br />

patterns I would quickly lose those patterns in a sentence<br />

context unless I have a sufficiently good command of French<br />

- that is access to signal-independent 'knowledge' whose<br />

interaction with the signal is a part of the forming of the<br />

final percept.<br />

The speech literature is full of experimental data<br />

indicating that processes not primarily driven by the signal<br />

play an important role in the perception of speech. There<br />

will not be time to do justice to all the research bearing<br />

on this issue.<br />

Let me just recall some well-known paradigms:<br />

Perception of speech in the presence of various disturbances<br />

(noise and distortion). The improvement of identification as<br />

the signal gets linguistically richer (Miller, Heise and<br />

Lichten, Pollack and Pickett 1964 and by Miller&Isard).<br />

Detection of deliberate mispronunciations (Cole 1973). Word<br />

frequency effects (Howes, Savin). Restoration (Warren 1970,<br />

Ohala and Feder 1986). Phoneme monitoring (Foss&Blank). Word<br />

recognition from word fragments (Grosjean 1980, Nooteboom<br />

1981). Fluent restorations in shadowing mispronunciations<br />

(Marslen-Wilson and Welsh 1978). Verbal transformations<br />

(Warren). Intelligibility of lip-reading from video recordings<br />

supplemented by "hummed speech" - an audio signal<br />

processed to contain primarily rhythm and intonation cues<br />

(Risberg 1979). Inferences from historical sound changes<br />

(Ohala 1981).<br />

CONCEPTUALIZING SPEAKER-LISTENER INTERACTIONS<br />

Our review of experimental evidence bearing on the<br />

invariance issue has been selective but should nevertheless<br />

provide a rough indication of a panoply of alternative<br />

positions and their respective pros and cons. We have<br />

considered the suggestion that the invariance of phonetic<br />

segments be defined: (i) at an articulatory level (e g the<br />

"spatial target" hypothesis), (ii) at an acoustic level (e g<br />

spectral properties of stops), (iii) at an auditory level (e<br />

g perceptual constancy of vowel quality). Which of these<br />

alternatives should we put our money on?<br />

When pursued experimentally, articulatory, acoustic or<br />

auditory definitions of invariance have the methodological<br />

virtue of encouraging a maximally thorough search at these<br />

particular levels. But in seeking a broader theoretical<br />

understanding of speech communication we would stand little<br />

to gain from spending effort on choosing between levels.<br />

Such an approach misreads the evidence which, when viewed in<br />

a broader perspective, strongly points to the conclusion<br />

that: The invariance problem is not a phonetic issue at all<br />

for ultimately invariance can be defined only at the level<br />

of listener comprehension.<br />

We can convince ourselves of the correctness of that<br />

point by considering the following phrase in English:<br />

/lesnsevn/. We can hear this utterance either as LESS THAN<br />

SEVEN, or as LESSON SEVEN. In the appropriate contexts (say<br />

"How many are coming?", and "What is our topic to-day?") the<br />

listener will not be aware of any ambiguity. At which<br />

phonetic level do we find the physical correlates of the<br />

initial segments of the word "than"? Needless to say there<br />

are no such correlates in this particular case. The<br />

conclusion seems inescapable: We should not put our money on<br />

any of the above alternatives. We must seek a more general<br />

theory.<br />

The experimental data on production indicate that the<br />

behavior of the speech motor system is shaped primarily by<br />

two forces - plasticity (listener-oriented reorganization)<br />

and economy (talker-oriented simplification) - which<br />

interact on a short-term basis so as to generate signals<br />

that may be "rich or poor" in explicit physical information.<br />

The evidence on perception has identified two major<br />

sources of information: signal-dependent and signal-independent<br />

processes and suggests that on a short-term<br />

basis percepts arise from the latter (i e "context")<br />

modulating the former in an analogously "rich or poor"<br />

manner.<br />

One possible way of schematizing the logical<br />

possibilities of these conceptual simplifications is shown<br />

in the diagram of the enclosed figure. This is not a very<br />

rigorous scheme but seems useful, at least pedagogically, in<br />

contrasting some of the ideas currently entertained in<br />

phonetics (cf J of Phonetics, January issue 1986).<br />

This graph states that for speech to be intelligible<br />

the sum of explicit physical information and signal-independent<br />

information must be above a threshold, that is<br />

the 13 degree line. In the ideal case this sum equals a<br />

constant, the x- and y-values of specific speech samples<br />

falling right on that line. Points above the line are<br />

associated with what might be termed "over-clear" speech,<br />

points below it with "unintelligible" speech.<br />
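The relation stated by the graph can be written out as a toy classifier. This is only a sketch: the text gives no units, so the scale (0 to 1), the threshold value, the tolerance band and the function name `classify_sample` are all assumptions made for illustration.

```python
def classify_sample(signal_info, context_info, threshold=1.0, tolerance=0.05):
    """Place a speech sample relative to the constant-sum line.

    signal_info:  explicit physical information in the signal (assumed 0..1)
    context_info: signal-independent information (assumed 0..1)
    threshold:    assumed value of the constant sum needed for intelligibility
    tolerance:    assumed half-width of the band counted as "on the line"
    """
    total = signal_info + context_info
    if total < threshold - tolerance:
        return "unintelligible"
    if total > threshold + tolerance:
        return "over-clear"
    return "on the line"


# Hypothetical samples: rich context with a poor signal, a clear signal
# with rich context, and a poor signal with poor context.
print(classify_sample(0.3, 0.7))
print(classify_sample(0.9, 0.6))
print(classify_sample(0.2, 0.3))
```

The tolerance band is a convenience of the sketch; the text itself only distinguishes points on, above and below the line.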

[Figure: MUTUALITY OF SPEAKER-LISTENER INTERACTION (vertical axis: signal-independent information)]<br />


It appears reasonable to assume that in the real-life<br />

situations utterances can vary tremendously with respect to<br />

how socially and communicatively successful they prove to<br />

be. For our present purposes let us focus on speech samples<br />

from hypothetically successful real-life speaker-listener<br />

interactions and assume that they produce data points<br />

clustering near and above the slant line. What would such a<br />

result imply? It would mean that there is a complementary<br />

relation between the amounts of information contributed by<br />

signal attributes on the one hand and "context" on the<br />

other. When speakers come close to the slant line it would<br />

indicate first of all that they are capable of varying their<br />

speech output in a plastic way (cf evidence on hypo-hyperspeech<br />

modes and other instances of reorganization of speech<br />

motor control) and secondly that, while perhaps not being<br />

perfect 'mind-readers', they are at least capable of<br />

adapting their speech on-line to the short-term fluctuations<br />

in the listener's access to "context" or signal-independent<br />

information (cf experimental documentation of numerous cases<br />

showing that listeners are in fact capable of successfully<br />

coping with highly context-dependent reduced and<br />

coarticulated speech stimuli). The possibility of such<br />

complementarity in real speech emerges also from some recent<br />

measurements reported by Hunnicutt (1985) as well as from<br />

Lieberman's 1963 study.<br />

If we hypothesize that this strategy - let us call it<br />

the STRATEGY OF ADAPTIVE VARIABILITY - comes near the way<br />

real speakers actually behave when they are communicatively<br />

successful, we obtain a natural way of resolving some of the<br />

paradoxes that surround the invariance issue. For it follows<br />

that intra-speaker phonetic variation - along a hyper-hypo continuum<br />

as well as along other dimensions - is the<br />

characteristic that we should expect the units of ecological<br />

speech to exhibit - not absolute physical invariance.<br />

The proposed way of thinking about the issue does not,<br />

of course, rule out finding physical speech sound invariance<br />

in restricted domains of observation but it does explain why<br />

our quest for a general concept of phonetic invariance has<br />

been largely unsuccessful. And, in a pessimistic vein, it<br />

predicts in fact that it will continue to be so.<br />

Our reasoning leads us back to a conclusion already<br />

drawn by MacNeilage in his 1970 review of the invariance<br />

issue:<br />

"... the essence of the speech production process<br />

is not an inefficient response to invariant<br />

central signals, but an elegantly controlled<br />

variability of response to the demand for a<br />

relatively constant end (p 184)".<br />

If, as suggested here, we take the "relatively constant<br />

end" to be defined neither articulatorily, acoustically nor<br />

auditorily but specified only with reference to "the level<br />

of listener comprehension", MacNeilage's formulation still<br />

captures the "essence of the speech production process"<br />

satisfactorily.<br />

Let us pause to reflect on some of the implications of<br />

the two theories contrasted in our discussion: Absolute<br />

Physical Invariance versus Adaptive Variability. The former,<br />

if proved correct, would transform what currently looks like<br />

instances of massive variability into artefacts. For this<br />

theory says in fact that there simply IS NO variability of<br />

linguistic units. There seems to be but that is merely a<br />

result of our presently inadequate conceptual and<br />

experimental tools. Further note that if we push the notion<br />

of absolute constancy to its extreme another implication can<br />

be noted, namely that the transmission of information by<br />

speech - an undeniably biological process - is basically<br />

non-adaptive.<br />

The Theory of Adaptive Variability, on the other hand,<br />

says exactly the opposite. This is a theory for which it is<br />

easier to find support within the general study of the<br />

biology of motor control and perception. It is precisely by<br />

emphasizing the adaptive nature of speech processes that we<br />

obtain a principled way of investigating phonetic variation<br />

and its origin.<br />

ON-LINE PROCESSES IN THE LIGHT OF TYPOLOGICAL EVIDENCE ON<br />

CONSONANT SYSTEMS<br />

Some time ago Nooteboom did an experiment on word<br />

retrieval and was able to show that listeners perform better<br />

if presented with the first halves of words than on the<br />

corresponding second-half fragments (Nooteboom 1981). For an<br />

explanation he suggested that, since word recognition is a<br />

real-time left-to-right process, word beginnings are less<br />

predictable than word endings. Consequently left-to-right<br />

context can be much more easily used than right-to-left<br />

context.<br />

He concluded his paper by raising the question whether<br />

this asymmetry - that he takes to be a universal feature of<br />

the perceptual processing of any language - might have left<br />

its imprint on how lexical information is organized in the<br />

languages of the world. He predicted (p 422) that: "(1) in<br />

the initial position there will be a greater variety of<br />

different phonemes and phoneme combinations than in word<br />

final position, and (2) word initial phonemes will suffer<br />

less than word final phonemes from assimilation and<br />

coarticulation rules."<br />

One basic assumption is that variations in perceptual<br />

predictability correlate with signal "distinctiveness".<br />

Hence "the greater variety of different phonemes and phoneme<br />

combinations" in the initial as compared with the final<br />

position of words. Restating the idea we can say that a<br />

larger paradigm goes with a RICHER signal inventory. The<br />

other side of the coin is of course that a smaller paradigm<br />

- such as that attributed to word endings - goes with a<br />

POORER signal inventory. In suggesting that the presence of<br />

assimilation and coarticulation should vary inversely with<br />

the need for keeping items distinct Nooteboom tacitly<br />

formulates a hypothesis that comes close to the theory of<br />

Adaptive Variability described here. Note that the theory of<br />

Absolute Physical Invariance does not offer us any basis at<br />

all for making predictions about a possible interplay<br />

between language structure and on-line processing. Why? As<br />

stated earlier according to that theory there IS no phonetic<br />

variation, there only seems to be. The idea of language<br />

structure adapting to the on-line constraints of speaking<br />

and listening only becomes a possibility once we recognize<br />

the existence and systematic nature of phonetic variation.<br />

Only from that point of departure will we be able to address<br />

the question of what feeds the processes of phonological<br />

innovation.<br />

We shall not be in a position to present the<br />

typological data needed to test Nooteboom's hypothesis.<br />

However, we shall conclude our paper by presenting some<br />

other data that do bear on it and strongly encourage further<br />

examination of the underlying ideas.<br />

In collaboration with Ian Maddieson we recently<br />

undertook an analysis of the consonant inventories of 317<br />

languages, carefully selected so as to constitute a<br />

reasonable sample of the "languages of the world". Our<br />

corpus was that of UPSID, the UCLA Phonetic Segment<br />

Inventory Database (Maddieson 1984). The data consist of<br />

lists of systems whose elements (allophones of major<br />

phonemes) are specified in phonetic transcription.<br />

Inventory sizes range from 6 to 95 consonants per<br />

system. The materials lend themselves to testing a<br />

paraphrase of Nooteboom's hypothesis: Is the phonetic<br />

structure of consonant systems independent of their size? Or<br />

is it systematically related to that dimension? If there is<br />

a systematic size-dependence what is it?<br />

There is neither time nor space to give the details of<br />

the analysis. They will be published elsewhere (Lindblom,<br />

MacNeilage and Studdert-Kennedy; Lindblom and Maddieson<br />

forthcoming). Fortunately, Nooteboom's perspective provides<br />

us with a way of summarizing the main findings.<br />

It turns out that small paradigms statistically favor<br />

segments with both phonatory and articulatory properties<br />

that can be classified as basic or elementary. Medium-sized<br />

paradigms tend to include consonants invoking more<br />

elaborated gestures in addition to a core of basic elements.<br />

The largest systems use both these types but also<br />

combinations of elaborated gestures that we label complex<br />

articulations. To exemplify, plain /p t k/ are classified as<br />

"basic" articulations whereas ejective /p' t' k'/ or<br />

aspirated /pʰ tʰ kʰ/ invoke "elaborated" mechanisms. A<br />

segment such as /ʈʰ/ is "complex" since it shows more than<br />

one elaboration: both of place (retroflexion) and source<br />

features (aspiration). Logically a six-consonant system<br />

could use the ejective set for its stop series. Small<br />

systems never do in our material whereas medium-sized and<br />

large systems do. Moreover, the "complex", multiply<br />

elaborated segments are most frequent in the large<br />

inventories. The basic rule is that a less simple consonant<br />

tends not to be recruited without the presence of parallel<br />

more simple ("basic" or "elaborated") series (cf the notion<br />

of 'implicational hierarchy' of traditional terminology).<br />
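The three-way classification can be sketched as a count of elaborations per segment. The feature labels and the two toy inventories below are hypothetical and are not taken from UPSID; the sketch only mirrors the definitions given in the text (no elaboration = basic, one = elaborated, more than one = complex).

```python
from collections import Counter


def classify_segment(elaborations):
    """Classify a consonant by how many distinct elaborations it carries."""
    count = len(set(elaborations))
    if count == 0:
        return "basic"
    if count == 1:
        return "elaborated"
    return "complex"


def profile(inventory):
    """Count basic, elaborated and complex segments in an inventory.

    `inventory` maps a segment label to its list of elaborations.
    """
    return Counter(classify_segment(e) for e in inventory.values())


# Hypothetical toy inventories, for illustration only.
small = {"p": [], "t": [], "k": []}
large = {
    "p": [], "t": [], "k": [],
    "p'": ["ejective"], "t'": ["ejective"],
    "th": ["aspiration"],
    "retroflex th": ["retroflexion", "aspiration"],
}

print(profile(small))
print(profile(large))
```

Applied to real inventories, a profile of this kind would let one check whether the share of elaborated and complex segments grows with inventory size, which is the size-dependence the text describes.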

The claim we make is accordingly that we see a positive<br />

correlation between paradigm size and the number of elements<br />

that a sound pattern selects from a dimension of<br />

"articulatory complexity".<br />

The validity of our analysis naturally hinges on the<br />

success with which we can give non-circular, independently<br />

motivated definitions of "articulatory complexity". When it<br />

comes to the details of the analysis that problem is a topic<br />

for future quantitative phonetic theory. For the moment we<br />

believe that the major trends are rather gross effects that<br />

can be convincingly demonstrated by the force of the<br />

examples. They permit us to make the following<br />

generalization: Small consonant paradigms invoke 'unmarked'<br />

phonetics, large paradigms 'marked' phonetics. That is of<br />

course exactly what Nooteboom's hypothesis predicts and it<br />

takes a few steps towards an explanation for why seven-consonant<br />

systems do not show inventories like the following<br />

(Ohala 1980):<br />

We take the present typological data on consonant<br />

systems as providing strong evidence in favor of (a)<br />

language structure evolving as an adaptation to the<br />

constraints of the on-line processes of speaker-listener<br />

interaction, and for (b) the correctness of a theory of<br />

Adaptive Variability as an account of those processes.<br />

REFERENCES<br />

Blumstein S and Stevens K N (1979): "Acoustic Invariance in<br />

Speech Production: Evidence from Measurement of<br />

the Spectral Characteristics of Stop Consonants",<br />

J Acoust Soc Am 72, 43-30.<br />

Blumstein S and Stevens K N (1981): "Phonetic Features and<br />

Acoustic Invariance in Speech", Cognition 10, 23-<br />

32.<br />

Cole R A (1973): "Listening for Mispronunciations: A Measure<br />

of What We Hear during Speech", Perception and<br />

Psychophysics 13, 13- 16.<br />

Delattre, P (1969): "The General Phonetic Characteristics of<br />

Languages: An Acoustic and Articulatory Study of<br />

Vowel Reduction in Four Languages", Mimeographed<br />

Report, University of California, Santa Barbara.<br />

Engstrand, O (1987): "Articulatory Correlates of Stress and<br />

Speaking Rate", accepted for publication in J<br />

Acoust Soc Am.<br />

Flanagan, J (19): "A Difference Limen for Vowel Formant<br />

Frequency", J Acoust Soc Am 27:613-614.<br />

Fischer-Jørgensen E (1964): "Sound Duration and Place of<br />

Articulation", Zeitschrift für Sprachwissenschaft<br />

und Kommunikationsforschung 17: 17-207.<br />

Fonagy I and Fonagy J (1966): "Sound Pressure Level and<br />

Duration", Phonetica 15: 14-21.<br />

Fowler C A, Rubin P, Remez R E and Turvey M T (1980):<br />

"Implications for Speech Production of a General<br />

Theory of Action", 373-420 in Butterworth, B (ed):<br />

Language Production, vol I, London:Academic Press.<br />

Gay, T (1978): "Effect of Speaking Rate on Vowel Formant<br />

Movements", J Acoust Soc Am 63 (1):223-230.<br />

Gay T, Lindblom B and Lubker J (1981): "Production of Bite-<br />

Block Vowels: Acoustic Equivalence by Selective<br />

Compensation", J Acoust Soc Am 69 (3), 802-810.<br />

Grosjean, F (1980): "Spoken Word Recognition and the Gating<br />

Paradigm", Perception and Psychophysics 28, 267-<br />

283.<br />

Henke, W J (1966): Dynamic Articulatory Model of Speech<br />

Production Using Computer Simulation, Doctoral<br />

dissertation, M. I. T.<br />

Hunnicutt, S (1985): "Intelligibility<br />

Conditions of Dependency",<br />

28 ( 1):47-56.<br />

versus<br />

Redundancy<br />

Language and Speech<br />

Keating, P (1985): "Universal Phonetics and the Organization<br />

of Grammars", 115- 132 in Fromkin, V A (ed):<br />

Phonetic Linguisticst Orlando, FL:Academic Press.<br />

Kelso J A S, Saltzman, E L and Tuller, B (1986): NThe<br />

Dynamical Perspective on Speech Production: Data<br />

and Theory", J of Phon 14: 1, 29-59.<br />

Kewley-Port, D (1983): "Time-varying Features as Correlates<br />

of Place of Articulation in stop Consonants", J<br />

Acoust Soc Am 73:322-355.<br />

Krull, D (1987): "Evaluation of Distance Metrics Using<br />

Swedish stop Consonants", paper submitted to the<br />

Xlth ICPhS, Tallinn, Estonia.<br />

Kuehn, D P and Moll, K L (1976): "A Cineradiographic Study<br />

of VC and CV Articulatory Velocities·, J of Phon<br />

4:303-320.<br />

Labov, W (1972): Sociolinguistic Pattern.,<br />

Philadelphia:University of Pennsylvania.<br />

Lehiste, I (1970): Supra.egmentals, Cambridge, MA:MIT Press.<br />

Lieberman, P (1963): "Some Effects of Semantic and<br />

Grammatical Context on the Production and<br />

Perception of Speech", Language and Speech 6: 172-<br />

187.<br />

17


Lib@rman A M, Harris K S, Hoffman H S and Griffith B C<br />

(1957): "The Discrimination of Spe@ch Sounds<br />

within and across Phoneme Boundaries", J of<br />

Exp@rim@ntal Psychology 54:358-368.<br />

Lindblom, B (1963): ·Spectrographic Study of<br />

Reduction·, J Acoust Soc Am 35:1773-1781.<br />

Vowel<br />

Lindblom, B (1967): ·Vowel Duration and a Mod@l of Lip<br />

Mandibl@ Coordination", STL-QPSR 4/1967, 1-29 (Dep<br />

t of Speech Communication, RIT, Stockholm).<br />

Lindblom B, Lubker J and Gay T (1979): -Formant Fr@quencies<br />

of Som@ Fix@d-Mandible Vowels and a Model of<br />

Speech Motor Programming by Predictive<br />

Simulation", J of Phon@tics 7, 147-161.<br />

Lindblom B, Lubker J, Lyberg B, Branderud P and Holmgren K<br />

(in press): NThe Concept of Target and Speech<br />

TimingU, to appear in Festschrift for lise<br />

L@hist@, (Foris:Dordrecht).<br />

Lindblom, B and Lindgr@n R (1985): ·Speaker-Listener<br />

Interaction and Phonetic Variation-, Perilus IV,<br />

Dept of Linguistics, University of Stockholm.<br />

Lindblom B, MacNeilage P and<br />

(forthcoming): Evolution<br />

Orlando, FL:Academic Press.<br />

Studdert-Kennedy<br />

of Spoken Language,<br />

M<br />

Lindblom, B and Maddieson, I (1988): ·Phon@tic Universals in<br />

Consonant Systems·, to appear in Hyman, L M and<br />

Li, C N (eds): Language, Speech and Mind, Croom<br />

Helm.<br />

MacNeilage, P (1970): -Motor Control of S@rial Ordering of<br />

Speech-, Psychological Review 77:182-196.<br />

MacN@ilag@, P (1980): -Speech Productionu, Language and<br />

Spe@ch 23 (1), 3-24.<br />

Maddieson, I (1984): Patterns of Sound, Cambridge:Cambridge<br />

UniverSity Press.<br />

Marslen-Wilson, W D and Welsh, A<br />

Interactions and Lexical<br />

Recognition in Continuous<br />

Psychology 10, 29-63.<br />

(1978): uProcessing<br />

Access during Word<br />

Sp@echu, Cognitive<br />

Netsell R, Kent, R and Abbs J (1978): UAdjustm@nts of th@<br />

Tongue and Lip to Fixed Jaw Positions during<br />

Speech: A Preliminary Reportu, Conference on<br />

Speech Motor Control, Madison, Wisconsin.<br />

Nooteboom, S G (1981): "Lexical Retrieval from Fragments of<br />

Spoken Words: Beginnings vs Endings·, J of<br />

Phonetics 9, 407-424.<br />

18


Nord, L (1986): MAcoustic Studies of Vowel Reduction in<br />

SwedishM, STL-QPSR 4/1986, 19-36 (Dept of Speech<br />

Communication, RIT, Stockholm).<br />

Chala, J J (1980): -Chairman's Introduction to Symposium on<br />

Phonetic Universals in Phonological Systems and<br />

their ExplanationU, 184-18 in Proceedings of the<br />

IXth International Congress of Phonetic Sciences<br />

1979, Institute of PhonetiCS, University of<br />

Copenhagen.<br />

Chala, J J (1981): NThe Listener as a Source of Sound<br />

ChangeN, 178-203 in Masek, C S, Hendrick, R A and<br />

Miller, M F (eds): P.p.r. from the P.r ••••• ion on<br />

L.ngu.g • • nd B.h.vior, Chicago:Chicago Linguistic<br />

Society.<br />

Chala, J J and Feder, D (1986): -Speech Sound Identification<br />

Influenced by Adjacent NRestored- PhonemesN, J<br />

Acoust Soc Am 80. S110.<br />

tlhman, S (1966): MCoarticulation in<br />

Spectrographic MeasurementsN,<br />

39:11-168.<br />

VCV<br />

Utterances:<br />

J Acoust Soc Am<br />

Hhman, S (1967): -Numerical Model of CoarticulationM, J<br />

Acoust Soc Am 41:310-320.<br />

Perkell, J and Klatt, D (1986): Inv.ri.nce .nd V.ri.bility<br />

in Speech Proc ••••• , Hillsdale, N J:LEA.<br />

Pollack, I and Pickett, J M (1964): -Intelligibility of<br />

Excerpts from Fluent Speech: Auditory vs<br />

Structural Context-, J Verb Learn and Verb Beh<br />

3:79-84.<br />

Risberg, A (1979): Doctoral dissertation, RIT, Stockholm.<br />

Schulman, R (forthcoming): ·Articulatory Dynamics of Loud<br />

and Normal Speech-, submitted to J Acoust Soc Am.<br />

Stevens, K N and House A S<br />

Articulations by<br />

Acoustical Study-,<br />

128.<br />

(1963): -Perturbation of Vowel<br />

Consonantal Context: An<br />

J Speech Hearing Res 6:111-<br />

Stevens K N and Blumstein S (1978):<br />

Place of Articulation in<br />

Acoust Soc Am 64, 138-1368.<br />

NInvariant Cues for<br />

Stop Consonants-, J<br />

Stevens K N and Blumstein S (1981): NThe Search for<br />

Invariant Correlates Phonetic Features·, in Eimas,<br />

P and Miller J (eds): Per.p.ctiv •• on the Study of<br />

Spe.ch, Hillsdale, N J:LEA.<br />

Sundberg, J (1975): -Formant Technique in a Professional<br />

Singer·, Acustica 32 (2), 89-96.<br />

19


Traunmdller, H (1981): ·Perceptual Dimension of Openness in<br />

Vowels·, J Acoust Soc Am 69, 146-147.<br />

Warren, R (1970): -Perceptual Restoration of Missing Speech<br />

Sounds·, Science 167, 392-393.<br />

Westbury, J and Keating P (1980): ·Central Representation of<br />

Vowel Duration-, J Acoust Soc Am 67, Suppl 1, S37<br />

(A) •<br />

20


ARTICULATORY DYNAMICS OF LOUD AND NORMAL SPEECH*<br />

Richard Schulman<br />

Introduction<br />

The present study was initiated to compare the movements and timing<br />

relationships of the lips and jaw for normal and loud speech<br />

productions.<br />

Swedish vowels varying in several degrees of openness were<br />

produced in a bilabial stop context.<br />

Considering findings reported in<br />

the literature, we could expect to observe the following for the loud as<br />

compared with normal productions:<br />

1) There will be an increase in jaw opening for all vowels<br />

(Schulman 1985).<br />

2) Sussman et al. (1973), Gay (1977), and Macchi (1985) have all found that jaw position during bilabial stops is lower before and after open vowels than close vowels. This coarticulatory relationship should be even more pronounced here, given the increase in jaw openings for all vowels.<br />

3) Folkins and Abbs (1975) reported that when jaw opening was increased by resistive loading during the closure of bilabial stops, closure was achieved primarily by compensatory movement of the upper lip and to a lesser extent by the lower lip, which assumes a more elevated position with respect to the jaw. Given expectation 1, we should, in our "natural bite-block" condition, also see such compensation.<br />

4) In the bite block work of Netsell, Kent and Abbs (1978)<br />

compensatory lip movement during bilabial closure was accompanied by<br />

increases in the velocities of the articulators in order to achieve<br />

closures of the same duration as for normal productions.<br />

If the analogy<br />

between bite block and loud speech is a valid one, similar temporal<br />

characteristics should also be observed.<br />

What effects this will have on<br />

the coordination of the individual articulators is uncertain.<br />

Gay<br />

(1977) reports specific timing relationships between the articulators<br />

for bilabial stops.<br />

Will these relationships be maintained despite the<br />

increased movement and velocities expected for the loud productions?<br />

Will similar articulatory patterns be observed for the vowels?<br />

* Paper presented at the Swedish Phonetics Conference, Uppsala, October<br />

17-18, 1986<br />



Procedure<br />

A magnetometer system (Branderud, 1985) was used to track the<br />

movements of the lips and jaw.<br />

Magnetic coils were placed along the mid-sagittal plane on the vermilion border of the upper and lower lips and at the base of the incisors. The movements recorded were in the y-plane for all three articulators and, in addition, in the x-plane for the jaw.<br />

An electroglottograph was used to register the opening and closing<br />

of the glottis.<br />

Two channels with different gain settings were used for<br />

recording the audio signal.<br />

All signals were recorded simultaneously<br />

with a Racal seven channel tape recorder at a speed of 30 ips.<br />

Four<br />

subjects were used, three male and one female, between the ages of 22<br />

and 30.<br />

The speech of all subjects is typical of that for speakers of<br />

standard Swedish.<br />

The recordings were made in a sound-treated booth at<br />

the Phonetics Lab, Stockholm University.<br />

The speech material consisted of twelve Swedish vowels appearing in an /i'b_b/ frame. By placing an unstressed /i/ before the first /b/ in the frame, one might induce the jaw to attain a similar degree of openness during the stop closure regardless of the following vowel's openness. In other words, due to the high position of the jaw for the /i/, its start position from the /b/ should be minimally low and relatively the same for all vowels to follow. One must, however, acknowledge that this presents us with a conflict of intent, for in the process of influencing the jaw's position during the bilabial as expressed here, we would be reducing the right to left coarticulatory effects which we have set out to study (point 2 in the introduction).<br />

Six lists of the words, each in a different order, were written on separate cards and were held by the author during the actual recording. On each card were fifteen words, that is, all the twelve vowels appearing in the frame, plus three additional words. In Swedish, both [ɛ] and [æ] have the same orthographic representation, "ä". Therefore, for purposes of clarification, the stimuli containing these vowels were preceded by real words containing the appropriate vowel quality: "i bäver" for [ɛ] and "i bär" for [æ]. In addition, one of the remaining ten words was repeated in list final position to eliminate "end of list" pronunciation for the preceding word. The productions of these three additional stimuli were not examined during the analysis.<br />

All the lists were read through first with normal vocal effort then<br />

with loud effort.<br />

The first list for both conditions was considered a practice list to be discarded during analysis. Prior to the recording, this first list was read through several times, allowing the subject to familiarize himself with the material. Each subject needed several attempts before feeling comfortable in distinguishing between the two "ibäb"s. Despite this procedure, subjects were often requested to retake productions of these words during the actual recording.<br />

Summary of Results<br />

Though there is decided variability between subjects, in general we<br />

have found that for loud speech as compared with normal speech the<br />

following (as illustrated by the data for subject H.H.) holds true:<br />

1. In vowel production, movement increases dramatically for all articulators in regular, predictable fashions. For the jaw, distinctions in vowel height are maintained (Figs. 1a and 1b), while the lips clearly reflect the differences in degree of rounding and spreading (Fig. 2).<br />

2. Coarticulation is manifested as a right to left effect on jaw positioning. The increased displacement of the jaw for the loud stressed vowels had the consequence of causing its highest position in the preceding bilabial to lower by almost twice its normal amount (except for speaker C.K.) (Fig. 3). Coarticulation is also demonstrated by the nearly identical positioning of the jaw for the initial and stressed /i/ and the following bilabial.<br />

3. The lowered jaw position for bilabial closure provides an<br />

articulatory setting strongly reminiscent of that induced by applying<br />

artificial perturbations to the jaw.<br />

This gives us cause to regard the<br />

shouting paradigm as providing us with a "natural" bite block.<br />

4. In deference to this "natural" bite block during the bilabial closure, motor equivalence (cf. Hughes and Abbs, 1976) is demonstrated, whereby the upper lip compensates for the lowered jaw (hence, inferior lower lip position), achieving closure even more complete than for normal production (see Table I).<br />

5. It was demonstrated that increased articulatory movement cannot always be depicted as a simple linear amplification of normal articulation by scale factors, but also entails a more complex goal-oriented reorganization of specific movements (Fig. 4).<br />

6. Greater displacements are accompanied by increased velocities (Fig. 5). For loud speech, this results in shorter durations of intervocalic bilabials. Durations of stressed loud vowels are, however, somewhat longer than in normal productions (Fig. 6). To achieve durations equal to or shorter than those for normal speech, the ratios of velocity to displacement would have to have been greater (as was the case with speaker T.L.). We are thus presented with CV sequences only slightly longer for loud speech as compared with normal speech. Both phonological distinctions (short/long) and the inherent length of vowels associated with openness are maintained during loud speech.<br />

7. The timing order of the articulators is neither very stable nor<br />

predictable, across vowel contexts, speech conditions and speakers.<br />

More synchrony is found between the lower lip and the jaw in the<br />

production of loud vowels as compared with normal, whereas the converse<br />

is true for synchrony between the upper lip and the lower lip and jaw.<br />

References<br />

Branderud, P. (1985). "Movetrack - A movement tracking system". PERILUS IV, Stockholm University, 20-29.<br />

Folkins, J. and Abbs, J. (1975). "Lip and jaw motor control during speech: Responses to resistive loading of the jaw". J. Speech Hearing Res., 207-220.<br />

Gay, T. (1977). "Articulatory movements in VCV sequences". J. Acoust. Soc. Am. 62, 183-193.<br />

Hughes, O. and Abbs, J. (1976). "Labial-mandibular coordination in the production of speech: Implications for the operation of motor equivalence". Phonetica 33, 199-221.<br />

Macchi, M. (1985). Segmental and suprasegmental features and lip and jaw articulators. Ph.D. dissertation, New York University.<br />

McAllister, R., Lubker, J. and Carlson, J. (1974). "An EMG study of some characteristics of the Swedish vowels". Journal of Phonetics 2, 267-278.<br />

Netsell, R., Kent, R. and Abbs, J. (1978). "Adjustments of the tongue and lips to fixed jaw positions during speech: A preliminary report". Paper presented at the Conference on Speech Motor Control, University of Wisconsin, Madison, Wisconsin.<br />

Schulman, R. (1985). "Articulatory targeting and perceptual constancy of loud speech". PERILUS IV, Stockholm University, 86-91.<br />

Sussman, H.M., MacNeilage, P.F. and Hanson, R.J. (1973). "Labial and mandibular dynamics during the production of bilabial consonants: Preliminary observations". J. Speech Hearing Res., 397-420.<br />


TABLE I. Mean displacement in millimeters from rest position of articulators at point of minimum lip separation during the first bilabial stop of the /i'bVb/ test words. Values are averaged for twelve Swedish vowels (5 tokens for each vowel) for normal (X_N) and loud (X_L) productions, together with the difference between normal and loud productions (ΔX_LN).<br />

FIG. 1. Jaw opening during production of stressed Swedish vowels plotted according to their traditional phonological classification in terms of front, central, back and rounding. Normal and loud productions for each speaker appear in separate plots (1a and 1b, respectively).<br />

FIG. 2. Maximum lip separation for loud speech plotted against normal productions for the stressed vowels. The traditional terms inrounded, outrounded and spread are used. Two regression lines are fitted to the data: for outrounded vowels and for spread vowels. Inrounded vowels are not included in the calculation of these lines.<br />

FIG. 3. Position of the jaw at minimum displacement during the<br />

production of the first bilabial stop against the jaw's position at<br />

maximum displacement during the following vowel.<br />

Separate regression<br />

lines are fitted for normal and loud productions.<br />

FIG. 4. Position of the jaw at maximum displacement during the production of the frame initial segment ([i]) against the position of the jaw at minimum displacement in the following bilabial stop. A single regression line is fitted to both normal and loud productions.<br />

FIG. 5. Averaged movements of the (a) upper lip component, (b) lower lip component and (c) jaw component for speaker H.H.'s productions of /i'bab/. For each articulator the normal (1), loud (2) and normal multiplied by a scale factor (3) movements are presented. Figure d shows lip separation for this sequence and is produced by subtracting the movement curves of Figure a from the sum of the curves in Figures b and c. The normal and loud audio signals for this sequence are displayed at the top of each figure.<br />

FIG. 6. Peak velocity of the jaw during movement from the closure of the first bilabial stop to its position of maximum displacement during the stressed vowel versus displacement of the jaw from the position of minimum excursion (b1) to maximum excursion (V). One regression line is fitted to normal and loud productions.<br />

FIG. 7. Acoustic durations are plotted for each vowel context for normal and loud productions of the initial bilabial stop and the stressed vowel. The vowel segment (white bar) begins at the termination of the bilabial stop (black bar).<br />


Table I<br />

ARTICULATORY   H.H.                     T.L.                     P.H.                     C.K.<br />
PARAMETER      X_N     X_L     ΔX_LN    X_N     X_L     ΔX_LN    X_N     X_L     ΔX_LN    X_N     X_L     ΔX_LN<br />
LL-UL          -1.00    0.03    1.03    -1.51   -2.15   -0.64    -2.06   -0.83    1.23    -0.51    2.19    2.70<br />
LL             -1.96   -3.59   -1.63    -2.30   -2.88   -0.58    -4.16   -6.09   -1.93    -0.80    1.94    2.74<br />
JY             -3.09   -5.57   -2.48    -4.16   -5.08   -0.92    -5.81   -8.16   -2.35    -4.46   -3.33    1.13<br />
LL-J            1.13    1.97    0.84     1.87    2.20    0.33     1.65    2.07    0.42     3.66    5.27    1.61<br />
UL             -0.95   -3.62   -2.67    -0.78   -0.73    0.05    -2.10   -5.26   -3.16    -0.30   -0.25    0.05<br />
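The derived rows of Table I can be checked against its primary rows. The sketch below is illustrative only and rests on assumptions the text does not state: that LL, UL and JY denote lower lip, upper lip and vertical jaw displacement, that LL-UL and LL-J are simple differences of those rows, and that each difference column is loud minus normal. The helper names are hypothetical; the numbers are those of speaker H.H.

```python
# Values for speaker H.H. from Table I (mm from rest position).
# The reading of the labels is an assumption: LL = lower lip,
# UL = upper lip, JY = jaw (vertical).
hh = {
    "LL": {"normal": -1.96, "loud": -3.59},
    "UL": {"normal": -0.95, "loud": -3.62},
    "JY": {"normal": -3.09, "loud": -5.57},
}


def derived(values, condition):
    """Return (LL-UL, LL-J) for one condition as simple differences."""
    ll = values["LL"][condition]
    ul = values["UL"][condition]
    jy = values["JY"][condition]
    return round(ll - ul, 2), round(ll - jy, 2)


def loud_minus_normal(values, name):
    """Difference between the loud and the normal mean for one row."""
    return round(values[name]["loud"] - values[name]["normal"], 2)


print(derived(hh, "normal"))
print(derived(hh, "loud"))
print(loud_minus_normal(hh, "UL"))
```

Under these assumptions the recomputed values agree with the tabulated ones to within rounding (0.01 mm), and the upper lip difference shows the downward compensation described in point 4 of the summary.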


Figure 1a. NORMAL<br />

Figure 2.<br />

Figure 4.<br />

Figure 6. Regression line: y = 7.1x + 10.9<br />


AN EXPERIMENT ON THE CUES TO THE IDENTIFICATION OF FRICATIVES<br />

HARTMUT TRAUNMÜLLER<br />

DIANA KRULL<br />

ABSTRACT<br />

Synthetic fricatives with two spectral peaks scanning a wide range of<br />

frequencies were put into three versions of the context [a_ɛ:], also<br />

generated synthetically, and imitating a male speaker (1), a child (2),<br />

and an aroused male speaker (3) with elevated F0 and F1. The stimuli<br />

were presented in two orders, with increasing or decreasing frequencies<br />

of the spectral peaks, to 16 speakers of Swedish who identified the<br />

fricatives as [f], [s], [ɕ], [ʂ], or [ɧ]. In a given context, the<br />

obtained phonetic boundaries followed mainly the spectral peak lowest in<br />

frequency, while the upper peak contributed only marginally even if it<br />

was at a distance less than the "critical distance" of about 3 Bark. In<br />

context (2), as compared with (1), the phonetic boundaries were shifted<br />

up, but less (in Bark) than the vowel formants.<br />

INTRODUCTION<br />

It is well known that the characteristic frequencies, i. e. , the<br />

frequencies of the formants and the fundamental in speech sounds with a<br />

given phonetic quality vary with the overall dimensions of the speaker's<br />

vocal tract. If the characteristic frequencies of vowels are converted<br />

into a measure of tonotopical place, such as critical band rate (Bark),<br />

differences in speaker size can be seen to correspond to a tonotopic<br />

translation of the auditory pattern of excitation [11].<br />

Identifications of synthetic two-formant vowels revealed that a uniform<br />

tonotopic compression of the auditory pattern of excitation with a<br />

fixed point in the region of F3 also preserves phonetic quality [12].<br />

Natural vowels are transformed in this way in shouting and in whispering [11].<br />

The present investigation is about the transformations the spectra of<br />

voiceless fricatives can be subjected to without affecting their phonetic<br />

quality. It is known that voiceless fricatives can be synthesized<br />

satisfactorily with two resonances and one antiresonance and that the<br />



cues to the phonetic identity of voiceless sibilants reside mainly in<br />

the stationary part of their spectrum, while the transitions are more<br />

important for non-sibilants [5, 7]. One-parameter sibilants can be<br />

synthesized using a resonance and an antiresonance one octave lower in<br />

frequency [5] . Such sibilants lack intrinsic cues to speaker size. In<br />

spectrogram reading, the Swedish voiceless sibilants can be distinguished<br />

by the frequency of spectral energy onset while there is more<br />

variation, even within the same speaker and context, in the detail above<br />

that frequency [6]. A second characteristic spectral peak can, however,<br />

often be discerned and one question we address here is whether this second<br />

peak is used to normalize for speaker size. We also investigate to<br />

what extent a vocalic context can serve this purpose.<br />

METHODS<br />

Subjects<br />

The experiments were conducted with a group of 20 native and 6 nonnative<br />

speakers of Swedish, all employees or students at the Institute<br />

of Linguistics at Stockholm University. None of them reported auditory<br />

handicaps and all were familiar with the phonetics of Swedish, possessing<br />

/f/, /s/, /ɕ/, and /ʃ/. We report here the results of 16 native<br />

speakers with uniform behavior, mostly speakers of the local variety<br />

with the distributional allophones [ʂ] and [ɧ] for /ʃ/, but including<br />

three speakers of southern varieties, who had no [ʂ] in their own<br />

speech.<br />

Stimuli<br />

The stimuli were synthetic VCV sequences. The vocalic segments had<br />

been obtained by synthetic imitation of a natural [a's:ɛ:], produced by<br />

a male speaker of Swedish (Stockholm variety). A three parameter voice<br />

source [3] signal in accordance with that utterance was generated by the<br />

procedure described in [12]. The vocalic as well as the fricative segments<br />

were generated in serial synthesis by use of a block diagram<br />

simulating program (sampling at 16 kHz, 16 bit/sample). Eight vowel<br />

formants were used. Their bandwidths obeyed the standard relation<br />

$B_i = 0.05\,F_i + 50$ Hz.<br />

The fricatives were generated by feeding white noise through a high-pass and a low-pass resonance filter, both of second order and with Q=10. The two resonance frequencies Fl and Fh were varied in steps of a factor $4^{1/9}$ (approx. 1.0 Bark). 42 combinations of Fl and Fh were used to scan the auditory space as shown in Figure 1. The fricatives had a duration of 0.20 s, and the intensity onset and offset of the natural [s] were also imitated.<br />

A second version of the vowel context was obtained by a uniform<br />

translation of all vowel formant frequencies by +2.5 Bark. The voice<br />

source parameters were rescaled in such a way that the mean F0, weighted<br />

according to amplitude, was also translated by +2.5 Bark. This transformation<br />

produces the characteristic frequencies in vowels of children<br />

four to five years of age from those of the same vowels pronounced by<br />

men [11].<br />

A third version of the vowel context was obtained by a uniform tonotopic<br />

compression of all formant frequencies and the weighted mean F0.<br />

The compression is described by Equation [1]:<br />

$z = z_0 + 0.15\,(15.5 - z_0)$   [1],<br />

where $z_0$ is the critical band rate of a characteristic peak in the<br />

original version, and $z$ is the corresponding value in the compressed<br />

version. This transformation produces the characteristic frequencies of<br />

shouted vowels from those of the original [11]. Between these modes of<br />

speech, there are, however, additional differences which have not been<br />

imitated in our stimuli which provoked the impression of being produced<br />

by an aroused speaker rather than by a shouting one.<br />

For conversion of the vowel formant frequencies f (in Hz) into critical band rate z (in Bark), Equation [2] was used; it agrees to within ± 0.05 Bark with the empirical values [13] in the range of 0.2 to 6.7 kHz [10]. Equation [3] was used for reconversion.<br />

$z = \frac{26.81\,f}{1960 + f} - 0.53$   [2]<br />

$f = \frac{1960\,(z + 0.53)}{26.28 - z}$   [3]<br />

The formants, which were stationary, had the frequencies listed in<br />

Table 1 together with the weighted mean F0.<br />
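The three transformations described above can be sketched in a few lines of code. This is an illustration rather than the authors' synthesis program: the function names are hypothetical, while the constants are those of Equations [1] to [3] and the +2.5 Bark translation given in the text.

```python
def hz_to_bark(f):
    """Equation [2]: frequency in Hz to critical band rate in Bark."""
    return 26.81 * f / (1960.0 + f) - 0.53


def bark_to_hz(z):
    """Equation [3]: critical band rate in Bark back to frequency in Hz."""
    return 1960.0 * (z + 0.53) / (26.28 - z)


def translate(f, shift=2.5):
    """Uniform tonotopic translation (used for the 'child' version)."""
    return bark_to_hz(hz_to_bark(f) + shift)


def compress(f, amount=0.15, fixed_point=15.5):
    """Equation [1]: compression towards a fixed point ('aroused' version)."""
    z0 = hz_to_bark(f)
    return bark_to_hz(z0 + amount * (fixed_point - z0))


f1_a = 751.0  # F1 of [a] in the neutral male version (Table 1)
print(round(translate(f1_a)), round(compress(f1_a)))
```

For F1 of [a] the two results fall within about 1 Hz of the corresponding child and aroused-male entries in Table 1 (1153 and 945 Hz). The small residual is presumably due to rounding of the tabulated frequencies.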



Table 1: The characteristic frequencies of the three versions of the same vowels (in Hz).<br />

      Neutral male     Neutral child    Aroused male<br />
      [a]    [ɛ:]      [a]    [ɛ:]      [a]    [ɛ:]<br />
F0    102    110       327    337       298    306<br />
F1    751    442       1153   751       945    639<br />
F2    1248   1799      1626   2617      1421   1932<br />
F3    2501   2390      3702   3525      2558   2461<br />
F4    3359   3413      5160   5258      3287   3332<br />
F5    4311   4386      6977   7131      4052   4111<br />

After D/A conversion the stimuli were recorded on tape in two different<br />

orders. First, Fl and Fh started at their highest values, 24 and 25<br />

log. units. Fl subsequently decreased in steps of 2 u. and Fh in steps<br />

of 1 u. until the distance between the two peaks reached 7 u. In the<br />

following descending series of stimuli Fl and Fh started 1 u. below the<br />

initial values, etc. In the second order Fl and Fh started at their<br />

lowest values, 7 and 14 u., and ascended in reversal of the first order.<br />

Each stimulus had a duration of 0.8 s and was presented twice in<br />

succession with an interval of 1.5 s. In the following, any sequence of<br />

this kind is considered as one "stimulus". Each stimulus was followed by<br />

a pause of 2.5 s for the subjects to respond. A pause of 5 s was inserted<br />

before each new series of stimuli. The stimuli were presented in six<br />

blocks, beginning with the neutral male version in the first (1) order,<br />

followed by child (2), aroused male (1), neutral male (2), child (1),<br />

and aroused male (2).<br />
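The descending order of presentation can be reconstructed from the rules given above. The sketch below is one reading of those rules, not the authors' procedure: it assumes that each new series starts with both Fl and Fh lowered by 1 u., and that series are generated until Fl reaches its lowest value of 7 u. The function name and its parameters are hypothetical.

```python
def descending_order(start_low=24, start_high=25, max_distance=7, lowest_low=7):
    """List the series of (Fl, Fh) pairs, in log. units, for the first order."""
    series = []
    low0, high0 = start_low, start_high
    while True:
        low, high = low0, high0
        current = []
        while high - low <= max_distance:
            current.append((low, high))
            low -= 2   # Fl decreases in steps of 2 u.
            high -= 1  # Fh decreases in steps of 1 u.
        series.append(current)
        if current[-1][0] <= lowest_low:
            break
        low0 -= 1      # the next series starts 1 u. below
        high0 -= 1
    return series


series = descending_order()
stimuli = [pair for s in series for pair in s]
ascending = stimuli[::-1]  # second order: reversal of the first
print(len(series), len(stimuli), stimuli[0], stimuli[-1])
# prints: 6 42 (24, 25) (7, 14)
```

Under this reading there are six series of seven stimuli each, which gives the 42 combinations of Fl and Fh mentioned earlier and ends at the stated lowest values of 7 and 14 u.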

Procedure<br />

The subjects were tested in a quiet, sound treated room and the<br />

stimuli were presented to them via Sennheiser HD414 headphones at a<br />

comfortable listening level. The subjects received answer sheets with a<br />

set of the five symbols "θ, s, tj, rs, sj" for each stimulus. After the meaning of the symbols ([θ] or [f], [s], [ɕ], [ʂ], [ɧ]) had been explained and a few stimuli had been presented for acquaintance, the subjects were asked to mark for each stimulus the symbol of the fricative they had heard. They were allowed to mark two different symbols in cases of doubt. Single-symbol responses were counted as two markings of the same symbol.<br />

Two-dimensional histograms were obtained from the distribution of assigned<br />

labels as a function of the Fl and Fh values. The histograms were<br />

locally normalized with respect to the total number of responses to each<br />

stimulus and smoothed by a spatial cosine filter. "Phonetic boundaries",<br />

say between [s] and [ɕ], were obtained by considering only the [s] and<br />

[ɕ] labels and computing the 50 % level curve.<br />
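The counting rule and the 50 % criterion can be illustrated with a simplified, one-dimensional sketch. It is not the authors' analysis: the spatial cosine smoothing is omitted, the boundary is located by linear interpolation along Fl only, the function names are hypothetical, and the responses are invented for the example.

```python
from collections import Counter


def tally(responses):
    """Count markings for one stimulus; a single symbol counts twice."""
    counts = Counter()
    for marks in responses:
        if len(marks) == 1:
            counts[marks[0]] += 2
        else:
            for m in marks:
                counts[m] += 1
    return counts


def proportion(counts, a, b):
    """Share of label a among the a and b markings only."""
    total = counts[a] + counts[b]
    return counts[a] / total if total else None


def crossing(xs, ps, level=0.5):
    """Linearly interpolate where ps crosses the given level along xs."""
    points = list(zip(xs, ps))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if (p0 - level) * (p1 - level) <= 0 and p0 != p1:
            return x0 + (level - p0) * (x1 - x0) / (p1 - p0)
    return None


# Invented responses for three stimuli at Fl = 20, 21 and 22 log. units.
data = {
    20: [("tj",), ("tj",), ("tj", "s")],
    21: [("tj",), ("s",)],
    22: [("s",), ("s",), ("tj", "s")],
}
xs = sorted(data)
ps = [proportion(tally(data[x]), "s", "tj") for x in xs]
boundary = crossing(xs, ps)
print(boundary)
```

With these invented responses the share of "s" markings rises from 1/6 to 5/6, and the interpolated boundary lies at Fl = 21 u., where the two labels are equally frequent.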

RESULTS AND DISCUSSION<br />

Effects of presentation order<br />

"θ"-labels were infrequent and mainly attached at the highest resonance<br />

frequencies and, occasionally, at the very lowest. The boundaries<br />

between the sibilants are shown in Figure 1. The effect of contrast can<br />

clearly be seen at the [ɕ] - [ʂ] boundary which is shifted by 0.9 Bark<br />

in Fl between the two orders of presentation. Since contrast presupposes<br />

that at least one similar stimulus has been heard, there is no such<br />

effect at the beginning of each series (shown with thin lines in Figure<br />

1). There, the responses are, instead, likely to be biased by expectation<br />

towards [s] or [ɧ] responses because the previous series of stimuli<br />

began with these sounds. Outside this region, the [s] - [ɕ] boundary is<br />

shifted just as much as the [ɕ] - [ʂ] boundary. As for the boundary between<br />

[ɧ] and [ʂ], the responses are likely to be biased towards [],<br />

because this allophone would normally occur in an /aʃɛ:/ sequence as<br />

pronounced by most of our subjects. This would explain the deviant<br />

course of this boundary in the second order of presentation.<br />

Effects of intrinsic properties<br />

The perceptual role of the two spectral peaks in our stimuli can be<br />

understood by studying the slopes of the boundaries in Figure 1. The<br />

boundaries whose slope is not affected by order effects are well approximated<br />

by straight lines. Two of them ([] - [s] and [] - [6]) have a<br />

course almost perpendicular to the Fl-axis, implying that the higher<br />

resonance Fh is practically irrelevant for these distinctions. Then, of<br />

course, the distance between the spectral peaks is also irrelevant.<br />

Thus, intrinsic properties of these stimuli were not used to normalize<br />

for speaker size.<br />

Phonetic boundaries might possibly be given by a gross center of<br />

spectral gravity, like perceived "sharpness" [1]. Since Fh does affect<br />

the sharpness of our stimuli - as affirmed by informal listening - the<br />

results show that sharpness is not an invariant quantity in sibilants<br />

with a given phonetic quality.<br />

If the resonances are separated by less than the critical distance of 3.5<br />

Bark observed by Chistovich et al. [2], the phonetic boundaries might be<br />

expected to reflect an integrated spectral peak. The main part of our<br />

L] - [s] boundary runs through an area where Zh - Zl < 3.5 Bark (see<br />

Figure 1). The slope of this line indicates, however, that this phonetic<br />
decision is only based on the pitch of the lower spectral peak or on the<br />
spectral onset of auditory excitation. Similar results have been obtained<br />
in non-phonetic pitch matching tasks [4, 9] for frequencies below<br />
1 kHz.<br />

Figure 1: Phonetic boundaries between Swedish sibilants. First (continuous) and second (dashed) order of presentation. Pooled contexts. Axis units: kHz and Bark.<br />

The boundaries between [] and [] are, however, not completely<br />

independent of Fh. This may be due to the fact that [] and [] are the<br />

sibilants for which our synthetic stimuli were closest to the natural<br />

versions, as judged by comparison with measured spectra of Swedish sibilants<br />

[9, 8]. The other phonetic boundaries might have followed a similar<br />

course if the stimuli had been closer imitations of natural sibilants.<br />

The phonetic boundaries can be described by Equation [4]:<br />

Zli + ki * Zhi = Ii [4],<br />

where ki is a factor expressing the perceptual weight of Zhi (see Table<br />
2) and Ii is a constant characteristic of boundary i. The factor k might<br />
reflect the goodness of fit between the auditory spectra of the synthetic<br />
stimuli and those of natural sibilants, but it might, alternatively,<br />
be a function of (Zh - Zl). In that case the phonetic boundaries in<br />
Figures 1 and 2 should deviate slightly from linearity. Interestingly, k<br />
is most negative for (Zh - Zl) ≈ 3.5 Bark. This recalls the suggestion<br />
by Syrdal et al. [8] to regard this distance as specific to phoneme<br />

boundaries among sonorants. While our data do not immediately support<br />

this for sibilants - the observed boundaries are not perpendicular to<br />

the (Zh - Zl)-axis - they do show a tendency in this direction.<br />

Table 2: Perceptual weight k of Fh in relation to that of Fl, cf. Equation [4].<br />

Phonetic boundary<br />

k   -0.05   -0.20   -0.27   -0.10<br />

Effects of context<br />

Since intrinsic normalization for speaker size is almost absent in<br />

our results, we would expect such a normalization, which theoretically<br />

would be appropriate, to be mediated by context. Figure 2 illustrates<br />

the effects of transforming the spectrum of the vowel context. We can<br />

see that the boundaries between sibilants are affected by the acoustic<br />

properties of the vowel context whose phonetic quality was close to<br />

invariant.<br />

The extent of the boundary shift between the neutral male and the<br />
child version of the vowels (between +0.7 and +1.3 Bark) is, however,<br />
smaller than the translation of the vowel spectra (+2.5 Bark), especially<br />
at the [] - [6] boundary.<br />

The boundaries in the aroused male version are shifted from those in<br />
the neutral version about halfway in the same direction as those in the<br />
child version. The [] - [6] boundary (at 11.6 Bark = 1.6 kHz) is<br />
shifted by roughly +0.3 Bark, i.e., less than the vowel formants in the<br />
same frequency region (+0.6 Bark). Since, further, the upper vowel<br />
formants (above 15.5 Bark = 2.9 kHz) in the aroused male version are not<br />
shifted upwards but slightly downwards, the shift of the [s] - [9]<br />
boundary (at Zl = 19 Bark) cannot have been guided by the vowel formants<br />
in the same frequency region. Apparently, the sibilant boundaries<br />
are shifted about half as much as some weighted mean of the vowel<br />
formants, F2 given the highest weight. This would hold approximately for<br />
both of our context transformations, but the correlation of the extent<br />
of boundary shift with Fl remains an open question.<br />

Figure 2: Phonetic boundaries between sibilants in contexts of a man's (continuous), a child's (dashed), and an aroused man's (dash-dotted) vowels. Pooled orders of presentation. Axis units: kHz and Bark.<br />
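The paired Bark and kHz values quoted in this section can be approximated with a closed-form conversion. The sketch below uses the expression Traunmüller later published (1990); whether the study used exactly this form is an assumption, since reference [10] is an earlier version, and the function names are illustrative only. The end corrections of the published formula for very low and very high values are omitted here.

```python
def hz_to_bark(f_hz):
    """Approximate critical-band rate (Bark) for a frequency in Hz.

    Assumed formula: z = 26.81 * f / (1960 + f) - 0.53 (Traunmueller 1990),
    without the corrections applied below 2 Bark and above 20.1 Bark.
    """
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z_bark):
    """Algebraic inverse of hz_to_bark for the same mid-range formula."""
    return 1960.0 * (z_bark + 0.53) / (26.28 - z_bark)

# Values to compare with the pairs quoted in the text (1.6 kHz and 2.9 kHz).
z_low = hz_to_bark(1600.0)
z_high = hz_to_bark(2900.0)
```

With this formula 2.9 kHz comes out within a tenth of a Bark of the quoted 15.5 Bark, and 1.6 kHz within about a tenth of the quoted 11.6 Bark, so the correspondence is approximate rather than exact.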

ACKNOWLEDGEMENT<br />

This research has been supported by a grant from HSFR, the Council<br />

for Research in the Humanities and Social Sciences.<br />

REFERENCES<br />

[1] G. v. Bismarck, Extraktion und Messung von Merkmalen der Klangfarbenwahrnehmung<br />
stationärer Schalle, München 1972.<br />

[2] L. Chistovich and V. Lublinskaya, "The "center of gravity"<br />
effect in vowel spectra and the critical distance between formants",<br />
Hearing Res. 1, 1981, 185-195.<br />

[3] G. Fant, "Glottal source and excitation analysis", STL-QPSR<br />
1/1979, 85-107.<br />

[4] R. Glave, Untersuchungen zur Tonhöhenwahrnehmung stochastischer<br />
Schallsignale, Helmut Buske Verlag, Hamburg, 1973.<br />

[5] J. M. Heinz and K. Stevens, "On the properties of voiceless<br />
fricative consonants", J. Acoust. Soc. Am. 33, 1961, 589-596.<br />

[6] P. Lindblad, Svenskans sje- och tje-ljud i ett allmänfonetiskt<br />
perspektiv, CWK Gleerup, Lund 1980.<br />

[7] J. Martony, "On the synthesis and perception of voiceless fricatives",<br />
STL-QPSR 1/1962, 17-22.<br />

[8] A. K. Syrdal and H. S. Gopal, "A perceptual model of vowel<br />
recognition", J. Acoust. Soc. Am. 79, 1986, 1086-1110.<br />

[9] H. Traunmüller, "Perception of timbre: ...", in R. Carlson and B.<br />
Granström (eds.), The Representation of Speech in the Peripheral Auditory<br />
System, Elsevier Biomed., 1982, pp. 103-108.<br />

[10] H. Traunmüller, "Analytical expressions for the tonotopical<br />
sensory scale", part of Ph.D. thesis, Stockholms Universitet, 1983.<br />

[11] H. Traunmüller, "Some aspects of the sound of speech sounds",<br />
contr. to NATO-ARW on psychophysics of speech perception, Utrecht 1986.<br />

[12] H. Traunmüller and F. Lacerda, "Perceptual relativity in identification<br />
of two-formant vowels", Speech Communication, 1987 (in print).<br />

[13] E. Zwicker, "Zur Unterteilung des hörbaren Frequenzbereiches in<br />
Frequenzgruppen", Acustica 10, 1960, p. 185.<br />

SECOND FORMANT LOCUS PATTERNS AS A MEASURE OF<br />

CONSONANT-VOWEL COARTICULATION<br />

Diana Krull<br />

1. Introduction<br />

Formant frequencies at the consonant-vowel boundary depend not<br />
only on the place of articulation of the consonant but also on<br />
the adjacent vowel. Fant (1973) measured F2, F3 and F4 at stop<br />
consonant-vowel boundaries of one male Swedish speaker. His results<br />
showed that there is a considerable variation of formant<br />
frequencies at CV boundaries, especially in connection with voiced<br />
stops; also, labials and velars demonstrate greater variation<br />
when compared to dentals. The variation is most pronounced for F2:<br />
although the difference measured in Hz is sometimes larger for<br />
F3, it will amount to less on a perceptual scale.<br />

Öhman (1966) used voiced stops between systematically varied<br />
preceding and following vowels and demonstrated that F2 at the CV<br />
boundary is influenced also by the preceding vowel. Both these<br />
studies have shown that there is a strong coarticulation effect<br />
from adjacent vowels on F2 at the CV boundary, thus contradicting the<br />
claim of an invariant F2 locus made by Delattre, Liberman, and<br />
Cooper (1955).<br />

Coarticulation does not work in one direction only, that is,<br />
there is also an influence from the consonant on the vowel. Thus,<br />
for example, F2 measured in the middle of the vowel is lower in<br />

/bub/ than in /dud/, everything else being equal; see Lindblom<br />
(1963).<br />

The aim of this investigation was to compare the amount of<br />
coarticulation in spontaneous speech and in isolated words on the<br />
basis of the second formant trajectories. The difference in F2<br />
between two points, one at the CV boundary and another in the<br />
middle of the vowel, should decrease with increasing<br />
coarticulation because the adjacent sounds would become more<br />
alike, although we would not be able to tell whether it was the<br />
vowel that had influenced the consonant or vice versa. F2 was<br />
measured at two points on a Voiceprint spectrogram as shown in<br />
Fig. 1: (1) at the CV boundary, and (2) in the middle of the<br />
vowel. We called the first point the "locus" of the second<br />
formant and defined it as the frequency of the formant at the<br />
first pulse of the vowel after consonant release. Locus was not<br />
measured at the moment of consonant release because in the<br />
spontaneous speech there was often no visible burst.<br />
(Measurements at the release may also be difficult due to the<br />
rapid transition.) The second point, measured in the middle of<br />
the vowel, we called "target". Both terms were used in a more<br />
concrete sense than had been done earlier: Delattre et al. (1955)<br />
had defined "locus" as a point on the frequency scale about 50 ms<br />
before consonant release which they considered to be the virtual<br />
starting point of the formant; Lindblom (1963) had used "target"<br />
in the sense of an asymptotic value towards which the formant<br />
frequency is aimed.<br />

Fig. 1. Example of measurements on a spectrogram: F2i at the first<br />

pulse of the vowel after stop release, and F2t in the middle of<br />

the vowel.<br />

The relation between the two points can be expressed in what we<br />

call the "locus equation"<br />

F2i = k * F2t + c<br />

where F2i is the initial locus, F2t the vowel target, and k and c<br />

are constants.<br />
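As a rough illustration of how k and c could be estimated from paired measurements, the following sketch fits the locus equation by ordinary least squares. The function name and the sample values are hypothetical and are not measurements from this study; the fitting method itself is an assumption, since the text does not say how the regression lines were computed.

```python
def fit_locus_equation(f2_targets, f2_loci):
    """Least-squares fit of F2i = k * F2t + c; returns (k, c)."""
    n = len(f2_targets)
    if n < 2 or n != len(f2_loci):
        raise ValueError("need at least two paired measurements")
    mean_t = sum(f2_targets) / n
    mean_i = sum(f2_loci) / n
    # Sum of squares of the targets and cross-products with the loci.
    sxx = sum((t - mean_t) ** 2 for t in f2_targets)
    if sxx == 0:
        raise ValueError("targets must not all be identical")
    sxy = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(f2_targets, f2_loci))
    k = sxy / sxx
    c = mean_i - k * mean_t
    return k, c

# Made-up values in kHz, for illustration only.
targets = [0.8, 1.2, 1.6, 2.0]   # F2t, middle of the vowel
loci = [1.20, 1.32, 1.58, 1.70]  # F2i, first pulse after release
k, c = fit_locus_equation(targets, loci)
```

A slope near 0 would then correspond to an invariant locus and a slope near 1 to a locus that follows the vowel target completely.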

The value of k determines the slope of the regression line for<br />
the locus frequencies (see Fig. 2). The slope shows the amount of<br />
coarticulation: thus, for example, if k=0 then F2i=c and there is<br />
no coarticulation at all; the locus is invariant. If, on the<br />
other hand, k=1 then the locus is completely dependent on the vowel<br />
target, and there is maximal coarticulation. Other studies<br />
(Lindblom, 1963; Lindblom and Lacerda, 1985) have shown that the<br />
amount of coarticulation varies with consonant place of<br />
articulation: the strongest coarticulation is connected with the<br />
labials, the weakest with the palatal /g/, while the dentals and<br />
the retroflexes lie somewhere in between.<br />

[Fig. 2. Vertical axis: locus for F2 (kHz).]<br />


How does speech style affect the amount of coarticulation?<br />

Lindblom and Lindgren (1985) investigated CV coarticulation for<br />
/bV/ and /dV/ by comparing the size of F2 trajectories between<br />
locus and target. Their results showed that there is more<br />
coarticulation in a neutral speech style than in clear<br />
speech.<br />

Would the same kind of difference be found between words<br />
occurring in spontaneous speech and the same words spoken in<br />
isolation? That is, would words occurring in spontaneous speech<br />
in general display more CV coarticulation than words spoken in<br />
isolation?<br />

2. Experiment<br />

The spontaneous speech material used in this investigation<br />
consisted of recordings made for the project "Phonetic variation<br />
in natural speech" (Lindgren, Lindblom, and Krull, 1986). The<br />
recordings were made of two male speakers of Central Swedish.<br />
Spontaneous speech was elicited in two ways: firstly, the<br />
speakers were asked to retell short stories they had been given<br />
to read beforehand; secondly, they conversed freely with each<br />
other. The recordings were made in a quiet room at the Phonetics<br />
Laboratory of the University of Stockholm.<br />

Only word-initial CV combinations were used for measurements in<br />
this study. The first CV combinations consisted of a voiced stop<br />
followed by a vowel. Only labial and dental stops were used: /g/<br />
before front vowels is, with few exceptions, pronounced as [j],<br />
and the velar samples before back vowels showed too little<br />
variation in F2 for meaningful locus equations to be set up.<br />
Stops that did not have a complete closure were not used.<br />

For each speaker a list was prepared containing the words in his<br />
spontaneous speech sample that had been measured. The speakers<br />

were then asked to read the list with a short pause between<br />

items. The words were in random order without context, each<br />

occurring twice. The second occurrence was meant to be used in<br />

case the first reading of the word should present difficulties of<br />

measurement. Only one item was measured. We shall refer to the<br />

words read in isolation as "reference words" (see word list in<br />

Appendix).<br />

The measurements of locus and target frequencies were carried out<br />
as shown in Fig. 1. The resulting locus-target plots for dentals<br />
can be seen in Fig. 3. For speaker PaT there was clearly more<br />
coarticulation in spontaneous speech, where k=.45, while the<br />
corresponding value for the reference words was k=.25. For<br />
speaker AV there was less of a difference: k=.47 in spontaneous<br />
speech and k=.43 for reference words.<br />

A low F2 in a preceding vowel usually lowered the frequency of<br />

the initial locus slightly while a high F2 had the opposite<br />

effect. To check the amount of influence from preceding vowels,<br />

the slope of the regression line for speaker AV was calculated<br />

for cases where /d/ was preceded by another dental consonant. The<br />
result showed very little difference in slope (k=.49 as compared<br />
to k=.47) but there was much less variation in locus<br />
frequencies at a given target value.<br />

[Fig. 3. Locus-target plots for dentals, speaker PaT; vertical axis: locus for F2 (kHz). Panels: DENTAL (SPONT. SPEECH), regression line F2LOC=.45X+.81; DENTAL (REFERENCE WORDS), regression line F2LOC=.25X+1.1'3.]<br />

The second stop consonant investigated was /b/. Earlier results<br />
referred to above and the fact that the tongue is not involved in<br />
their production led us to expect more coarticulation with labial<br />
consonants, and such was also the case here. There was also more<br />
coarticulation in spontaneous speech: for both speakers, the<br />
regression line had a lesser slope in the case of reference words<br />
(Fig. 4). In the spontaneous speech of speaker PaT the value of<br />
k=.96 indicated an almost maximal dependency on the following<br />
vowel for most of the cases. However, on the spontaneous speech<br />
plots of both speakers, there were locus-target points that<br />
formed their own group apart from the other points, their locus<br />
frequency being lower than could be expected from the target<br />
value. These points - marked with "x" on the plots - all had<br />
their origin in the same word: [bɑːra] 'only', which had<br />
undergone different degrees of reduction, the most extreme case<br />
being [ba]. (It should be noted that bara is a word that tends to<br />
be reduced more than other content words.)<br />
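To make the difference in slope concrete, the sketch below evaluates the two regression lines printed in Fig. 4 for the labials of speaker PaT. It assumes that X in those equations is the F2 target in kHz; the slope .96 is confirmed in the text, while the other coefficients are read from the figure labels. The function names and the chosen target range are illustrative only.

```python
# (slope k, intercept c) as printed in Fig. 4 for speaker PaT, labial stops.
LABIAL_PAT = {
    "spontaneous": (0.96, 0.03),
    "reference": (0.81, 0.21),
}

def predicted_locus(f2_target_khz, k, c):
    """Locus equation F2i = k * F2t + c, with all frequencies in kHz."""
    return k * f2_target_khz + c

def locus_span(style, low_target=1.0, high_target=2.0):
    """How far the predicted locus moves when the target moves from low to high."""
    k, c = LABIAL_PAT[style]
    return predicted_locus(high_target, k, c) - predicted_locus(low_target, k, c)

span_spontaneous = locus_span("spontaneous")
span_reference = locus_span("reference")
```

Over the same 1 kHz change in target, the predicted locus moves further in spontaneous speech than in the reference words, which is what the steeper slope expresses.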

3. Discussion<br />

The results show more overlapping of CV in spontaneous speech.<br />
Why should this be so? One possible explanation may lie in the<br />
time factor: no systematic comparisons of word durations have<br />
been carried out yet, but random samples all showed compressions<br />
in word length. The word bara, for example, showed great length<br />
variation in the spontaneous speech of both speakers, but even<br />
the longest item of bara in spontaneous speech was 45% of the<br />
reference version for speaker PaT, and 60% for speaker AV; the<br />
shortest item for both speakers was only 13% of the reference<br />
version. A shortening in duration can affect the relative timing,<br />
and thus the coarticulation, in adjacent segmental gestures,<br />
although some speakers seem to be able to avoid this effect by<br />
speeding up their articulatory movements (Kuehn and Moll, 1976;<br />
Gay, 1981). An increase of coarticulation with a faster speaking<br />
rate has also been shown for Swedish by Engstrand and Nordstrand<br />
(1983) through measurements of initial and medial formant<br />
frequencies in vowels (corresponding to locus and target in this<br />
investigation). Moreover, Engstrand (forthcoming) measured<br />
utterances of two Central Swedish speakers, especially /pi pu<br />
pa/, on x-ray film. He found relatively little coarticulation<br />
when speech rate and stress were normal; at a fast speech rate<br />
coarticulation increased, especially in stressed syllables.<br />

[Fig. 4. Locus-target plots for labials, speaker PaT. Panels: LABIAL (SPONT. SPEECH), regression line F2LOC=.96X+.03; LABIAL (REFERENCE WORDS), regression line F2LOC=.81X+.21.]<br />

It is also possible that a dimension of more or less clear<br />
pronunciation - "hypo" and "hyper" speech - has influenced the<br />
coarticulation in our experiment. This dimension can be<br />
independent of speech rate, depending instead on, for<br />
example, socially or communicatively determined factors (Lindblom<br />
and Lindgren, 1985).<br />

A special explanation is necessary for the locus-target relation<br />
in bara: why is there not the same amount of coarticulation here<br />
as in the rest of the words beginning with /bV/ or even those<br />
beginning with /ba/? In Swedish the phoneme /a/ has two<br />
phonologically distinct lengths. The length distinction is<br />
accompanied by a difference in timbre whose main acoustic<br />
correlate is the frequency of the second formant: the long<br />
variant has an F2 of about 1000 Hz for male speakers while the<br />
corresponding frequency in the short variant is about 1250 Hz.<br />
Differences in timbre between short and long vowels can be<br />
perceptually relevant even if the length distinction is removed<br />
(Hadding-Koch and Abramson, 1964). Thus when, depending on e.g.<br />

the speech tempo, the lengths of a long and a short /a/ overlap,<br />
their timbre can still make them clearly perceivable as distinct<br />
sounds.<br />

In the word bara the anomaly on the locus plot for spontaneous<br />
speech lay in the fact that the target value of the second<br />
formant in the reduced version had risen to about 1200-1300 Hz<br />
while the locus frequency had retained its value appropriate for<br />
the long variant of /a/ (in words with an originally short<br />
version of /a/, the locus lay at 1200-1250 Hz). To begin with, we<br />
looked for an explanation of the anomaly in the phonetic features<br />
of the adjacent sounds. However, an investigation showed that<br />
this explanation was insufficient: for speaker AV, for example,<br />
bara was in eight of the nine cases preceded by /a/, /e/ or a<br />
dental consonant and in all but two cases followed by segments<br />
that could not have raised F2: labial consonants, back vowels, a<br />
pause. For speaker PaT the word was in four cases preceded by<br />
/e/, in one by /a/ and in two by dental consonants. In his case,<br />
however, the word was followed by consonants that may have<br />
raised F2, but in such cases the locus should have been expected<br />
to be raised too (see word lists in Appendix). The same was true<br />
if the second formant was raised by the /r/ of which there was<br />
sometimes a trace present. Some examples of bara are shown in<br />
Fig. 5, together with an example of the word babbel, which<br />
originally has a short variant of /a/.<br />

It can be argued that there is no reason for us to expect the<br />
locus as a function of the target to form a straight line. There<br />
is, however, reason to expect a straight line on theoretical<br />
grounds. We could think of the lip opening after /b/ release in<br />

terms of a lip rounding. The effect of rounding has been<br />
calculated by Fant (1960) for articulatory configurations with<br />
constrictions situated from the lip opening to the glottis. If we<br />
choose constriction locations from about 5.5 cm to 13 cm from the<br />
glottis, that is, corresponding to vowels with palatal to<br />
pharyngeal constrictions, and choose two curves within this<br />
section, one for an unrounded vowel and another with a lip<br />
rounding, we get two curves with a slowly diminishing distance<br />
between them, the curve with the rounded values lower in<br />
frequency. When the lips open after a labial consonant they are<br />
rounded; we can therefore think of the rounded articulation as<br />
the locus and the unrounded as target; a locus-target plot in<br />
this case would form an almost straight line.<br />

It therefore seems that in the cases where [bɑːra] was reduced<br />
to [ba] the speakers pronounced an [a] but that subliminally<br />
there still was an [ɑː] present. This raises a question that<br />
calls for further investigation: is the change from [bɑːra] to<br />
[ba] an example of a phonological process, and the change by<br />
definition discrete, or is it a phonetic transformation that can<br />
therefore be continuous? The present data seem to indicate that<br />
the change is phonetic: firstly, because the speakers seem to<br />
begin to say one sound and continue with another, but also<br />
because the disappearance of /r/ does not happen in one step but<br />
is gradual.<br />

Fig. 6. The reduced form [va] from the words A: var; B: vad; C, D: as a tag question.<br />

Preliminary investigations indicate that the same reduction<br />
effect as in bara also appears in words like vara 'be', var<br />
'was', and vad 'what'. In these cases, the consonant is<br />
normally deleted in fluent speech and the words are pronounced as<br />
[vɑː] or reduced to [va]. As was the case with bara, the second<br />
formant in the reduced form [va] began lower than in words that<br />
have a short [a] to begin with. There was an exception, however:<br />
if the [va] used appeared as a tag question at the end of a<br />
sentence its F2 locus began at the frequency of the target, as in<br />
words with an original [a], indicating that [va] (from vad) in<br />
this function probably has been lexicalized with the short<br />
variant of /a/ (Fig. 6). In the case of /v/ the coarticulation<br />

with the following vowel was even stronger than for bilabial<br />
stops; therefore no anomaly appeared on the locus-target<br />
plot as for labial stops. However, the loci for the reduced form<br />
[va] which were not tag questions always lay below the regression<br />
line while the loci for [va] used as a tag lay on or above the<br />
line.<br />

Our preliminary investigations have also shown that<br />
differences in the amount of coarticulation between spontaneous<br />
speech and reference words similar to those in stop consonants<br />
exist even for other dental and labial consonants. In the case of<br />
nasals the difference may be even greater.<br />

The research reported here has been supported by the Bank of<br />

Sweden Tercentenary Foundation, grant nr 86/109.<br />

APPENDIX<br />

WORD LISTS<br />

If there was more than one occurrence, the number is given in<br />
parentheses. The CV of the first syllable of the word in the middle<br />
column was the one measured; the columns to the left and right give<br />
the context.<br />

Speaker AV:<br />

nej banne mig<br />
det bara vissa<br />
för bara noll<br />
hänt bara för<br />
inte bara perception<br />
jag bara och (2)<br />
jag bara runt<br />
man bara på<br />
skall bara sitta<br />
inte be Elis<br />
Bengtson behövde<br />
Elis Bengtson (6)<br />
att beräkna<br />
inte betala<br />
mer betala<br />
också betalt<br />
liten bit från<br />
jag bor just<br />
har bott i<br />
läser böcker<br />
på början<br />
en dag att<br />
en dag då<br />
från dag till<br />
till dag **<br />
andra dagar så<br />
somliga dagar så<br />
då dags att<br />
var dags att<br />
Bengtsons dat- sagan<br />
Bengtsons datamaskin<br />
oh datamaskiner<br />
en dator<br />
och den köper<br />
och den ligger<br />
ah den är<br />
** det li-<br />
** det ligger<br />
** det är inte (de'nte)<br />
** det är halvt<br />

ja det var<br />
att det blir<br />
att det borde<br />
att det är<br />
haft det heller<br />
han det att<br />
han det och<br />
just det och<br />
med det där<br />
men det ryms<br />
månaden det är<br />
och det gjorde<br />
på det sättet<br />
skriva det man<br />
till det kunde<br />
ut det har<br />
eh det visade<br />
han dit och<br />
alla dom skatter<br />
ja du vill<br />
ett dugg skatt<br />
# då blev<br />
den då #<br />
och då blev<br />
och då kom<br />
och då sa<br />
satt då och<br />
väl då dags<br />
är då språk<br />
bott där det<br />
den där lilla<br />
den där maskinen<br />
det där #<br />
det där det<br />
det där och<br />
vägen där ungefär<br />
förvånad därför att<br />

Speaker PaT:<br />

mycket babbel (2) #<br />
babbla-babbla-babbla<br />
och badrum<br />
han bara en<br />
erat bara enkelt<br />
hade bara ett<br />
är bara grinig<br />
skall bara hämta<br />
är bara nånting<br />
har bara tio<br />
är bara tre<br />
har bara tio<br />
är bara tre<br />
det bara är<br />
in barnen<br />
utfodrar barnen<br />
andra barnet<br />
ena barnet<br />
har boken<br />

tt borde<br />
ka bart oftare<br />
ker buss gör<br />
av bussen<br />
vid busshållplatsen<br />
det bästa<br />
mycket bättre<br />
det börjar<br />
en dag och<br />
en dag så<br />
tt daghem<br />
på dagis<br />
på Danderyd<br />
säga datum<br />
tt datum<br />
enligt den här<br />
och det måste<br />
tt det ringer<br />
och det stämmer<br />
tt det är<br />
med det är<br />
så dog han<br />
så du har<br />
och då börjar<br />
och då hade<br />
-vis då hem<br />
sig då inte<br />
utsikt då också<br />
och då ser<br />
och då springer<br />

REFERENCES<br />

Delattre, P., Liberman, A.M., and Cooper, F.S. (1955). Acoustic<br />
loci and transitional cues for consonants. J. Acoust. Soc. Am.<br />
27, 769-773.<br />

Engstrand, O. (forthcoming). Articulatory correlates of stress<br />
and speaking rate in Swedish VCV utterances. J. Acoust. Soc. Am.<br />

Engstrand, O. and Nordstrand, L. (1983). Acoustic features<br />
correlating with tenseness, laxness, and stress: preliminary<br />
observations. RUUL 11. Dept. of Linguistics, Uppsala University.<br />

Fant, G. (1973). Stops in CV syllables. In Speech sounds and<br />
features. MIT Press.<br />

Gay, T. (1981). Mechanisms in the control of speech rate.<br />
Phonetica 38, 148-158.<br />

Hadding-Koch, K. and Abramson, A.S. (1964). Duration versus<br />
spectrum in Swedish vowels: some perceptual experiments. Studia<br />
Linguistica, Lund, 94-107.<br />

Kuehn, D.P. and Moll, K.L. (1976). A cineradiographic study of VC<br />
and CV articulatory velocities. J. Phon. 4, 303-320.<br />

Lindblom, B. (1963). Spectrographic study of vowel reduction. J.<br />
Acoust. Soc. Am. 35, 1773-1781.<br />

Lindblom, B. and Lacerda, F. (1985). Akustiska uttalsstudier för<br />
syntes av svenska II. Projektbeskrivning. Institute of<br />
Linguistics, University of Stockholm.<br />

Lindblom, B. and Lindgren, R. (1985). Speaker-listener<br />
interaction and phonetic variation. PERILUS, Report IV. Institute<br />
of Linguistics, University of Stockholm.<br />

Lindgren, R., Lindblom, B., and Krull, D. (1986). Phonetic<br />
variation in natural speech. Status and progress report I.<br />
Institute of Linguistics, University of Stockholm.<br />

Öhman, S. (1966). Coarticulation in VCV utterances:<br />
Spectrographic measurements. J. Acoust. Soc. Am. 39, 151-168.<br />

AN UNMARKED DIALOG?<br />

Exploring Discourse Intonation In Swedish<br />

Madeleine Wulffson<br />

Is<br />

there<br />

phoneticians<br />

distinguishing<br />

such<br />

a like<br />

a<br />

thing as<br />

have<br />

long<br />

an<br />

unmarked<br />

spoken<br />

of<br />

dialog? Linguists and<br />

sentence<br />

Intonation,<br />

between 'marked' and 'unmarked' versions of any given<br />

sentence or utterance. The standard 'unmarked' version of a statement,<br />

It has been found, wi I I usually figure with some sort of 'global fal I'<br />

(Lieberman 1967, Bruce 1984) whereas a 'neutral question' wi I I usually<br />

figure with some sort of rising contour,<br />

Studies<br />

at least towards the end.<br />

of 'marked' versions have often been concerned with devices<br />

for Inviting shifts of focus from one element to another,<br />

Q: "What color cottage did you stay In last summer? "<br />

A: "We stayed In a RED cottage. "<br />

of the type:<br />

In the case of Swedish, issues relevant to, e.g., the word accents have been greatly illuminated by this type of 'lab language' analysis. But when it comes to live interactive dialog, a different approach is required to handle the communicative value of intonational phenomena.<br />

This article proposes just such an approach to the study of intonation in discourse. Developed in England for the English language, it has proven equally effective and applicable to the Swedish language (see Wulffson 1987), with only certain technical modifications being required to accommodate the morpho-lexical phenomena of the Swedish tone accents.<br />

The Discourse Model of intonation in question, developed by David Brazil at the University of Birmingham, represents a finite set of meaningful variables which are the result of either/or linguistic choices made on the part of the speaker, on the basis of an ongoing assessment of the state of existential, here-and-now convergence or divergence between speaker and hearer, encoded and decoded respectively in real time. These functional oppositions are represented by relatively easily recognizable intonational phenomena of relative pitch level and pitch direction. The purpose of this article is not, however, to present the model per se. This has been done more than adequately in other publications, particularly Brazil 1985, "The Communicative Value of Intonation in English". More recently, a study of the Swedish implications of the model is to be found in Wulffson 1987. A brief summary of the meaningful variables subsumed in the model, taken from the latter publication, is to be found in Appendix 1. Further, in Appendix 2, is a summary of the transcription conventions, which have been slightly modified to accommodate the central Swedish tone accents. In the present analysis of Skåne Swedish, these modifications are of lesser consequence due to this dialect's lack of 'dubbeltoppighet' ('double-topped-ness') in words carrying accent 2, so characteristic of central or Stockholm Swedish. Instead the interactive discoursal aspects of intonation in Swedish will be the center of focus.<br />

The purpose of this article is, on the other hand, to explore ways in which the Discourse Model can effectively be used to illuminate a whole conceptual area of interaction in discourse, that supplied by intonation, which has largely been neglected in Swedish up to now. This will be done in the following manner: a snippet of dialog will be compared intonationally with itself, so to speak. That is, two versions of the same dialog will be compared, one being what could be called a 'marked' version, and the other a sort of 'unmarked' version. With the aid of the Discourse Model, the concepts of 'marked-' and 'unmarked-'ness will be seriously questioned, as the title suggests.<br />

A thorough-going analysis of both versions will be attempted, taking into consideration a certain range of the various options open to the speaker at any given moment. Each configuration is an interplay of semantic, grammatical and intonational factors in a unique here-and-now context. Furthermore, it should be kept in mind that all of the various oppositions are available for exploitation and manipulation by the speaker. Within the system there are 4 basic factors, PROMINENCE, TONE, KEY, and TERMINATION, which can be divided into 13 subfactors, each of which constitutes a potential meaningful contribution to the discourse. In addition, the exclusion of the non-chosen factors, that is, the fact that one factor (i.e. prominence as opposed to non-prominence) is chosen over the other, can also be said to be meaningful.<br />

The 'marked' version is the spontaneous one, plucked out of a taped conversation where two people, A and B (both from the Skåne part of Sweden), are speculating over a photograph of a woman (whom they don't know) sitting at an outdoor cafe. The discussion has been going on for some time, and the two people have built up a broad picture of the woman's personality and activities before the bit which we have picked out comes in. The question they are discussing at the moment is what this lady does with her free time, and the conclusion is that, although there are not many theaters near where she lives (somewhere in the midwestern part of the U.S.A.), she does enjoy going to the theater. In actuality the photograph was taken in a piazza in Rome, and the subject was a lady Professor of Economics, but this is of no importance. The important thing is that we are dealing with a lively, spontaneous, and natural conversation between two colleagues and friends.<br />

The 'unmarked' version, on the other hand, is an unabashed product of laboratory manipulation, arrived at by the following process: each single utterance of the original bit of dialog was copied down on a separate slip of paper and given to the original speakers in shuffled, random order. The utterances were then re-recorded, one by one, with 'neutral' intonation, the 'this is what is written on the paper' intonation. Subsequently the re-recorded utterances were computer-spliced together again, Humpty Dumpty style, in the same order as the original dialog, with 200 milliseconds of space between each one.<br />
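The shuffle-and-splice procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the article does not describe the software used, so the function names, the sampling rate, and the representation of a recording as a plain list of samples are all hypothetical. Only the random presentation order and the 200 ms gap come from the text.

```python
import random

SAMPLE_RATE = 16_000  # assumed sampling rate in Hz (not stated in the article)
GAP_MS = 200          # silence between utterances, as stated in the text


def shuffled_order(n_utterances, seed=0):
    """Return a random presentation order for the re-recording session."""
    order = list(range(n_utterances))
    random.Random(seed).shuffle(order)
    return order


def splice(recordings, gap_ms=GAP_MS, sample_rate=SAMPLE_RATE):
    """Concatenate re-recorded utterances in their original dialog order.

    `recordings` maps each utterance's position in the original dialog
    to its samples (a list of floats). Silence of `gap_ms` milliseconds
    is inserted between consecutive utterances, never before the first.
    """
    gap = [0.0] * (sample_rate * gap_ms // 1000)
    out = []
    for position in sorted(recordings):
        if out:
            out.extend(gap)
        out.extend(recordings[position])
    return out
```

Note that nothing in such a procedure lets one utterance overlap another, which is one reason the spliced result cannot reproduce the overlaps of the spontaneous dialog.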

Now, it is often said that a picture is worth a thousand words. In this case we could say 'a bit of listening is worth a thousand words'. Quite simply, the result of a couple of minutes' listening to the two versions of the dialog can only lead to one conclusion. That is, that the first dialog is - utterly normal and natural, and the second is - not a dialog at all but merely a conglomeration of separate utterances, strangely cohesive in textual content but totally lacking in any kind of communicative interaction between the speakers. A real linguistic Frankenstein.<br />

So what went wrong? The speakers were the same in both versions. The words were the same, or practically the same. The timing was such that the utterances came in rapid, natural-like succession. Only the intonation was different. Actually, the subjects were rather concerned that, despite the precautions taken, they had 'remembered' the original conversation and had re-recorded the bits with that in mind, thus disturbing the equilibrium of the sought-after 'neutrality' of the second version. These fears were unfounded. The resulting dialog is so 'neutral' as to be ... untenable as a dialog.<br />

The Spontaneous Version<br />

But first things first. We will start with the spontaneous version in its transcribed entirety:<br />

A: // r+ j' UNDrar om hon inte LÄSer //<br />

B: // r+ MHM //<br />

A: // r roMANer // // r+ MHM //<br />

B: // o hon läser PÅ - // // p DELtid //<br />

A: // p nej jag MENar hon läser roMANer //<br />

B: // p jaså hon läser roMANer //<br />

A: // r+ JA // (skratt / laugh)<br />

B: // o jag tro' du mena' HON - // p läser på DELtid // p KVÄLLStid //<br />

A: // r+ NEJ // p det MENade jag INte //<br />

B: // p JA ja // p jo hon LÄSer NOG // o sån' ÄR STORA - //<br />

A: // r+ MHM //<br />

B: (clears throat) // p VERK // p av TolSTOY och DostoYEVsky //<br />

A: // r+ JA // p preCIS //<br />

A begins with: "I wonder if she reads." Discoursally, she is introducing a new topic for discussion, that of reading as a free-time activity. It is a logical step to suggest, as the two discussants have recently decided that the woman's possibilities for cultural enrichment are limited by her 'small-town' environment. Intonationally, the last boundary was marked off by pitch sequencing low termination, so the maximally disjunctive high key on 'UNDrar' clearly sets off the utterance as a new stage in the discourse. The higher pitch level also lends a particularizing function to the segment, which might allow for a gloss such as: 'Reading is the item I choose out of a whole set of possible activities she might engage in, such as watching TV, sewing, going out to night clubs with friends, etc.' The tonic, 'LÄSer', also carries the simple rising tone r+, or dominant referring tone of the Discourse Model. One can say that speaker A 'refers' (R tone) to reading as a possible free-time activity which both speakers are privy to, as well as to an assumption that the lady does, indeed, read. But since A is, at the moment, introducing a new element to the discussion, and taking the initiative, she is well justified in her choice of the dominant version of the referring tone. A p tone, on the other hand, might have projected the sense: 'This is a possibility for consideration, I have absolutely no pre-assumptions on the subject (so tell me what you think).' The further aspect of high termination brings an expected or projected response type into focus. Here the functional significance of the high termination choice could be glossed as: 'Do you, or do you not, think that she reads in her free time?' She is inviting a polar-type or adjudicating response. Built into the high termination is the 'expectation' of a high key, yes or no type rejoinder. Which she gets:<br />

// r+ MHM //<br />

Speaker B cooperatively affirms her idea with a high key r+ tone, satisfying both pitch concord expectations and response type. A mid key response might have carried an implication of something less than agreement, like: "yes, or ..." or "maybe". A low key response could possibly have had the effect of mild disagreement, such as, perhaps, "Maybe, but I don't really think so." Low key carries an equivalence function which, combined with a dominant r+ tone and maximal concord breaking, could very likely have the effect of eradicating the suggestion, and meaning something like "We're back to where we started before you came up with this dumb idea."<br />



The relatively equal social status of the two speakers allows for a free and open 'game of catch' with regard to who is in (temporary) charge of the discourse. In fact this is a dialog between two friends and colleagues, a woman and a man, of the same age and in the same line of work. Here he chooses the dominant version of the referring tone, as it is his turn to 'judge'.<br />

In terms of turn-taking (in the Sacks, Schegloff and Jefferson tradition) it could be said that A has yielded the turn to B, who now sets out to elaborate on A's suggestion. But A suddenly realizes that more precision is needed, and hastens to insert this afterthought even at the risk of interrupting B's turn (which he has clearly established and embarked upon). The result is a two-dimensional overlap (there is no overlap in the spliced version of the dialog, which enhances its highly unnatural quality):<br />

A: // r roMANer // // r+ MHM //<br />

B: // o hon läser PÅ // // p DELtid //<br />

With simple referring tone, A 'refers' to two things: that 'novels' was what she meant to say, and/or that if the woman reads, it is probably novels. She also 'refers', existentially, to her own expectation that this is also B's understanding. Her choice of mid key marks the tone unit as 'additive' - an additional bit of information, and simultaneous mid termination projects an expectation of concurrence on B's part.<br />

Meanwhile B has set off on an entirely different 'tack', and while indeed 'agreeing', he is agreeing with the wrong thing! The collision of simultaneous speech causes B to break off his tone unit midway, resulting in an oblique zero tone (→). Satisfied that all is clear by now, A encouragingly chips in with a high key r+ tone on 'MHM', anticipatorily adjudicating B's contribution to be correct. 'Reference' here is to this assumed mutuality.<br />

But then comes the 'bombshell'. As 'läsa' in Swedish means both 'read' and 'study', it turns out that B is presenting, with a high key p tone, that indeed the woman studies but only part-time, as the contrastive nature of high key implies, rather than full-time as she might have done otherwise. So A is compelled to jump in with a quick correction:<br />

A: // p nej jag MENar hon läser roMANer //<br />

In this repair sequence, the high key of 'menar' carries contrastive value, which here could be glossed as 'I mean THIS (novels), not THAT (part-time, as you said)'. The proclaiming tone on 'romaner' can be said to present this bit as decidedly new to the discourse, a world-changing increment to the unfolding argument.<br />



But now let's take another brief excursion into the hypothetical, into what might have happened. Had she said instead, for example:<br />

A': * // p jag MENar att hon LÄSer romaner // *<br />

it would have sounded extremely odd in this context, as it would have laid contrastive emphasis on 'READ' rather than on 'NOVELS', as if to say, for example, 'She doesn't READ novels, she WRITES them', which would be quite impossible in the logic of this contextual and interactive setting. The subtle effects of selectivity or non-selectivity in the prominence system can easily make nonsense out of an otherwise perfectly cohesive and coherent text, lexically and grammatically speaking.<br />

But back to our original version. It is indeed the word 'romaner' which receives prominence, being presented contrastively with 'deltid' as a correction of a misunderstanding. A's high termination further projects the expectation of a high key polar, adjudicating response. Which she gets:<br />

B: // p jaså hon läser roMANer //<br />

The high key falling tone 'proclaims' the whole as 'This is definitely new and not common ground'. B's high termination in turn 'expects' a high key yes/no type response. Which he gets:<br />

A: // r+ MHM //<br />

In natural dialog the phenomenon of pitch concord is perhaps one of the most striking intonational features. In our 'unmarked dialog', on the other hand, the absence of this interactive play of pitch is one of the most obvious deficiencies.<br />
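The pitch concord expectations used throughout this analysis (high termination projecting a high key, adjudicating response; mid termination projecting a mid key, concurring response; low termination projecting no expectation) can be summarized in a small sketch. The function and table names are hypothetical and serve only as an illustration; they are not part of the Discourse Model's own notation.

```python
# Key that each termination choice projects for the next speaker's response.
# None means that no expectation is projected (low termination).
EXPECTED_KEY = {
    "high": "high",  # invites a polar, adjudicating (yes/no) response
    "mid": "mid",    # invites a concurring response
    "low": None,     # projects no expectation as to the response
}


def concord(termination, response_key):
    """Classify a response key against the preceding termination."""
    expected = EXPECTED_KEY[termination]
    if expected is None:
        return "no expectation"
    if response_key == expected:
        return "concord"
    return "concord broken"


# A's high termination on 'LÄSer' is met by B's high key 'MHM'.
first_exchange = concord("high", "high")

# A's mid termination on 'INte' is answered by B in high key.
later_exchange = concord("mid", "high")
```

In the second case the broken expectation is itself meaningful: the analysis treats it as a feature of discourse control or dominance rather than as an error.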

B: // o jag tro' du mena' HON - // p läser på DELtid // p KVÄLLStid //<br />

A: // r+ NEJ // p det MENade jag INte //<br />

B's contribution here, apart from the o tone in the first tone unit (clearly due to verbal planning), is marked by proclaiming tones. Part-time, evenings, was what he thought she meant, and the proclaiming tones here underscore the separateness of their two worlds, the lack of common ground. It is almost impossible to imagine a line of thinking at this point which would allow for referring tones in this context. A conceivable, though unlikely, gloss type for a referring tone here would be some sort of reproachful reminder that, if it was not shared knowledge, it should have been. If A didn't mean studying part-time, she should have! But given the cooperative atmosphere of this conversation, such a reproach would appear distinctly out of character.<br />



Now let us take a closer look at the last two tone units in B's utterance:<br />

// p läser på DELtid // p KVÄLLStid //<br />

Further back in the conversation, before our snippet begins, A and B have decided that the woman in the photo is an executive secretary, a real high-powered, go-getter type. Therefore she must work hard, all day long. So if she studies part-time, this must, perforce, take place in the evening, as no other time is available on weekdays. Speaker B drops to low key in the tone unit // p KVÄLLStid //, clearly, though subconsciously, exploiting the equivalence function of low key in order to present the two as existentially one and the same. (A mid key realization of 'KVÄLLStid', which would have presented 'evenings' as a new added bit of information, might have sounded odd or even condescending - the subject of the woman's working having been so recently discussed and settled upon.) Artificial tone or key switching can twist the message in such a way that an otherwise perfectly correct and logical utterance is rendered nearly or totally unintelligible. It will change the whole psychological effect.<br />

The second tone unit of B's utterance ('läser på DELtid'), being a potential point of syntactic completion, is a vulnerable place for interruption or overlap. A has anticipated this 'closing point', and hastens to compete for the floor with a high key contrastive 'NEJ', which overlaps with B's 'KVÄLLStid'. It is not 'DELtid' she means, as he thinks, but 'roMANer'. In the here-and-now world of A and B's speculative conversation there is an existential set of 2 possibilities, either 'novels' or 'part-time'. The Saussurian general paradigm would never see 'novels' as the opposite of 'part-time', but here these are the two existentially available choices, the 'existential paradigm' of the Discourse Model, marked intonationally through both the prominence and the key systems.<br />

In this case high key on 'NEJ' is chosen of necessity to effect a repair of the misunderstanding. Since this is decidedly a competitive moment (A and B are competing for both the turn and their own points of view), a simple, non-dominant r tone would have conceded a measure of agreement such that it might very possibly have led to a different 'decision', i.e., that perhaps the woman studied part-time after all. (Please recall that neither participant actually knows the woman's real character or activities, but rather both are speculating about what she might be like or do. The decision could logically go either way.) So the dominant r+ tone which A has chosen is by far the most appropriate and effective, establishing social togetherness while at the same time firmly maintaining control. She continues:<br />

// p det MENade jag INte //<br />

The p tone is fully appropriate here to clarify and re-state the nature of the misunderstanding. The high key of 'MENade' (meant) underscores the polarity pair 'mean' vs 'not mean', but she concludes the tone unit with mid termination ('INte'), in clear expectation of a concurring response. A gloss here might be: 'Now I expect that this little misunderstanding is cleared up and that you are in agreement with me on this point.' With mid termination she sets up an expectation of a mid key response as well.<br />

Which she does not get. B agrees, all right, and gives up on his own idea of part-time studies in favour of novel reading. But he chooses to adjudicate rather than simply agree, and simultaneously signals (high key) his intention to introduce a new topic, that of what she reads. The breaking of the pitch concord expectation is also a feature of discourse control or dominance, and in fact the turn or move that follows constitutes a full presentation of the new topic. It is now B who 'decides' (the '+' or dominance factor) the nature of the reading material. A chips in, at a pause point while B is clearing his throat and planning his strategy, with a concurring mid key // r+ MHM // to re-establish togetherness and encourage B's line of thinking. B continues with the proposition that it is works of Tolstoy and Dostoyevsky that the woman reads.<br />

These grand authors are presented with mid key, as constituting additional valuable information, whereas the mid termination of the tone unit sets up the expectation of a concurring mid key response. Which he doesn't get:<br />

// r+ JA // p preCIS //<br />

A retains her part of the controlling role of the discourse by again breaking pitch concord expectations (an expression of dominance), and adjudicating (high key) instead of simply concurring, or 'chiming in agreement' (mid key). In this second-by-second interactive milieu, she 'decides' (the '+' factor) that this is indeed the type of reading matter in question. A gloss might be: 'Yes, we are in agreement, but it is I who am judging that now.' The final tone unit proclaims (p tone) the correctness of the suggestion with a conciliatory mid termination. Although, as it happens, the subject is closed after this, and a new one opened, A is so delighted with having 'won' their little competition that she does not close the subject with low termination, which would have constituted pitch sequence closure as well. Instead she emphasizes again the agreement aspect of the exchange, by ending her comment 'preCIS' ('precisely', 'right') with mid termination, thus leaving a sense of concurrence 'in the air' to be savoured during the pause that follows before the next question is taken up.<br />



The 'Unmarked Version'<br />

Now let's look at our Frankenstein version, which lived but remained a monster. A non-dialog dialog:<br />

A: // p ja' UNDrar om hon INte LÄSer //<br />

B: // r+ MHM //<br />

A: // p roMANer //<br />

B: // p hon LÄSer på DELtid //<br />

A: // p nej jag MENar att hon läser roMANer //<br />

B: // p JAså // p hon LÄSer roMANer //<br />

A: // p JA //<br />

B: // o jag trodde du MENade - // p hon LÄSer // p på DELtid //<br />

A: // p nej det MENade jag INte //<br />

B: // p JAja // r+ JO // p hon LÄSer NOG // p såna HÄR STORa VERK // p av TolSTOY och DostoYEVsky //<br />

A: // p ja preCIS //<br />

A 'begins' with: // p ja' UNDrar om hon INte LÄSer //. For a start, she speaks quite slowly and overclearly, in oblique orientation. There is no 'o' tone, but the tone unit contains three prominent syllables, as opposed to the usual two in direct interactive orientation, where the speaker is equally concerned with WHO he/she is communicating WITH, as with WHAT is being said. In oblique orientation it is the language itself, for one reason or other, which is in focus. Equally strange is the fact that although A is 'introducing an entirely new topic', she 'begins' on mid key, as if she were merely adding a thought of little consequence to the conversation. As 'F0 downdrift' is in function here, the utterance carries low termination, which in discourse terms projects no expectations whatever as to the type of response that would be agreeable or acceptable. This is the usual, normal circumstance of an out-of-context, laboratory recording situation, but only occurs under certain circumstances in live communication. So this stance might be appropriate if A were, say, an interviewer in a panel discussion, throwing out a topic for open discussion. But not here.<br />



B's 'reply' is a mid key / termination // r+ MHM //. "MHM" is most often associated with 'feedback' or 'backchannelling', and in Swedish there is a tendency for much feedback to be realized with an r+ tone (Wulffson 1987). So B's 'lab association' was quite natural. In fact, due to this imaginary association with a general tendency, B's 'answer' almost sounds possible. Like a bored husband, perhaps, who's trying to read his newspaper, hasn't heard a word of what his wife said, but answers anyway, just to keep the peace.<br />

A's next contribution, // p roMANer //, 'meets' pitch concord 'expectations' by 'adding' that the lady in the photograph reads novels. In a one-word utterance, 'F0 downdrift' has not had time to take effect.<br />

B then proceeds to close the pitch sequence (and the 'discussion') by saying:<br />

// p hon LÄSer på DELtid //<br />

with low termination. There is no overlap here, as was to be found in the original.<br />

But A 'disregards' the fact that the 'discussion' is 'clearly closed'. (Our hypothetical husband wants to read his paper.) She 'adds' (mid key) that she meant that the woman read novels. In the real version, A protested the turn of events by use of contrastive high key, and simultaneously demanded an active polar type response through use of high termination on 'romaner'. Our hypothetical 'wife', on the other hand, simply 'tells' her 'husband' that in fact he was wrong and that's that (p tone, low termination).<br />

B 'responds' in mid key, 'agreeing' and 'accepting' this 'additional piece of information' with totally uncalled-for equilibrium! He 'continues' by closing off the pitch sequence again with low termination, foregoing or sacrificing a response. 'Resignation' could be the word to describe the effect of this unengaged utterance. One could perhaps object here that, since e.g. 'resignation' too is a human attitude, the 'conversation' is after all possible. But as we have seen and will see, there is no continuity or coherence of 'attitude' or interplay between the speakers.<br />

Nor does A now 'feel the need' to 'liven up' the conversation. Her low key // p JA // projects what is contextually or content-wise a highly contrastive statement as being equivalent to the last, a foregone conclusion. Low termination again projects no expectations as to the type of response which might be agreeable. How about a divorce?<br />

B plods on:<br />

// o jag trodde du MENade - // p hon LÄSer // p på DELtid //<br />

(I thought you meant she studies part time.) For a start, all of the words are clearly pronounced, whereas in the spontaneous version 'trodde' was pronounced 'tro'' and 'menade' was 'mena'' - much reduced. Also oblique orientation plays a role here. There is no contextual reason for 'läser' to be prominent. B 'hammers in' a point which was already eminently 'clear'. He 'beats a dead horse', so to speak. This bit does not end with low termination as the others do, but it is also the only bit with a reference to another person (du), so it is likely that the 'global fall' rule was not in effect at the time of the recording. The reference to a 2nd person no doubt influenced the person recording to imagine an interlocutor and an answer, but having no interlocutor, he merely 'expects' neutral concurrence.<br />

Which is what comes, but not because A 'agrees'. She doesn't at all. But she also has nobody to 'answer', so why should she 'adjudicate'?<br />

// p nej det MENade jag INte //<br />

This is an additive mid key 'response', where in fact contextually a strong contrast is being made to correct a mistaken idea. Not here. No, we've closed the subject again. (Low termination.)<br />

But now finally we get some 'life' out of B:<br />

// p JA ja // r+ JO // p hon LÄSer NOG // p såna HÄR STORa VERK // p av TolSTOY och DostoYEVsky //<br />

This being a longer utterance, it has its own internal structure and beginning, with 'Ja ja, jo', which even out of context suggests a polar, adjudicating function. So the high key is, for once, quite appropriate. Except that there are other factors which render the sequence unlikely all the same.<br />

The main reason we have a high beginning and a low ending here is related to discourse factors in a mini 'out-of-context context'. This is a longer utterance and there is more 'content' in this statement than in the others, which leads B to read it as a complete presentation of a mini-topic, without regard to any imaginary interlocutor's reaction. The point of the matter is that the 'global fall' is a physical description, whereas the pitch sequence relates to discourse meaning, which is not by any means automatic. A pitch sequence is phonologically defined as a run of one or more consecutive tone units which ends in low termination. It has a number of discourse functions, among them being a dominance factor, and the marking off of discrete, consecutive stages in the unfolding discourse. The global fall is the pitch sequence by default, so to speak, due to the lack of any other discourse factors that would bring about other configurations.<br />
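The phonological definition of the pitch sequence given above (a run of one or more consecutive tone units ending in low termination) amounts to a simple segmentation rule. The following is a minimal sketch under stated assumptions: tone units are represented as dictionaries with a hypothetical `termination` field, and the example terminations are those reported in the commentary on the first four utterances of the spliced version.

```python
def pitch_sequences(tone_units):
    """Split tone units into pitch sequences.

    Returns a pair: the list of closed sequences (each ending in low
    termination) and any trailing run that has not yet been closed.
    """
    closed, current = [], []
    for unit in tone_units:
        current.append(unit)
        if unit["termination"] == "low":
            closed.append(current)
            current = []
    return closed, current


# Terminations as described in the commentary on the spliced version.
spliced_opening = [
    {"speaker": "A", "text": "ja' UNDrar om hon INte LÄSer", "termination": "low"},
    {"speaker": "B", "text": "MHM", "termination": "mid"},
    {"speaker": "A", "text": "roMANer", "termination": "mid"},
    {"speaker": "B", "text": "hon LÄSer på DELtid", "termination": "low"},
]

closed, still_open = pitch_sequences(spliced_opening)
```

Under these assumptions the very first utterance already closes a pitch sequence on its own, which matches the observation that A 'introduces a new topic' while projecting no expectation of a response.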

A's final 'reply' sounds very odd indeed:<br />

// p ja preCIS //<br />

The low key on 'precis' presents this bit as equivalent to the last. As if the 'fact' of the woman reading Tolstoy and Dostoyevsky had already been negotiated. Approximately as far from the truth of the context as one could get, as the idea is brand new. Again low termination closes a new pitch sequence and leaves no expectations as to the 'response'. A sort of 'I don't care and there's nothing more to say' impression is conveyed.<br />

In sum, it all sounds very odd. Here are two people 'conversing' (exchanging utterances) yet not communicating. Or at least not communicating in a way any of us would be likely to consider satisfactory. If we were to overhear such a conversation, we would immediately ask ourselves, 'Are these people Asimovian robots?' or 'Are they the very worst actors imaginable, in the process of learning lines they hate or couldn't care less about?'<br />

The lack of pitch level interplay has been mentioned as one of the principal causes of the above-described effects. Another very striking cause is the near-total lack of referring tones, an obligatory characteristic of direct orientation. In our spontaneous (direct) version, we have a good balance between R tones (9 examples) and P tones (10 examples), plus 3 O tones due to either planning difficulties or cut-off tone units. In the spliced dialog, on the other hand, there are only 2 R tones as opposed to 15 P tones (plus one O tone due to reading-aloud difficulties).<br />

Quite a revealing ratio. The referring tone serves not only to<br />

reify mutual worlds of understanding and invoke common ground, but also<br />

to establish social or psychological togetherness on supra-informational<br />

levels. So it is natural and even predictable that when utterances are<br />

withdrawn from an interactive context, the R tones will either<br />

disappear or at least diminish drastically in number. As was the case in our<br />

still-born laboratory dialog.<br />
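The tone counts reported above can be turned into proportions to make the contrast explicit. This is a small illustrative calculation; the variable and function names are assumptions, and only the counts themselves (9/10/3 and 2/15/1) come from the text.

```python
from collections import Counter

# Tone counts as reported in the text for the two versions of the dialog.
spontaneous = Counter({"R": 9, "P": 10, "O": 3})
spliced = Counter({"R": 2, "P": 15, "O": 1})


def referring_share(counts: Counter) -> float:
    """Fraction of all tone units that carry a referring (R) tone."""
    return counts["R"] / sum(counts.values())


for name, counts in (("spontaneous", spontaneous), ("spliced", spliced)):
    print(name, round(referring_share(counts), 2))
```

On these figures the share of referring tones drops from roughly two fifths of all tone units in the spontaneous version to roughly one ninth in the spliced one.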

Conclusions<br />

It is hoped that the reader will look upon the foregoing analysis as<br />

an attempt to bring out some of the ways in which intonation interacts<br />

with other pragmatic factors to generate the unique universes of<br />

what is called 'local meaning' in the Discourse Model. The entire,<br />

complex, here-and-now setting which speakers deal with according to their<br />

own apprehension of the continually mutating dynamics of verbal<br />

communication.<br />

The hypothetical suppositions in the analysis about some<br />

of the things that might have happened are - just that. Things that<br />

might have happened. Out of the myriad other things that might have<br />

happened. Each contextual factor, intonational or otherwise, affects<br />

every other in various subtle ways.<br />

The analysis has also shown that there is perhaps no such thing as<br />

an 'unmarked' dialog.<br />

It could even be said that all naturally occurring<br />

interactive speech is, by definition, intonationally 'marked'. Strings<br />

of words and sentences can be 'sewn together' in writing - maybe, but<br />

definitely not in speech. The Discourse Model clearly shows how<br />

intonation represents an entire conceptual dimension of meaning in spoken<br />

language as opposed to written language, and further, how and why<br />

intonation cannot be altered or manipulated with impunity.<br />

Acknowledgements Many many thanks to those who assisted in the<br />

realization of this project. My two imaginative informants, those who<br />

gave advice (Gösta Bruce, Robert McAllister, and Jan Anward), technical<br />

assistance (David House and Mats Dufberg) and food for thought<br />

(everybody at the Institutes of Phonetics and Linguistics at Lund and<br />

Stockholm universities).<br />

Selected Bibliography<br />

Brazil, David. 1985. The Communicative Value of Intonation in English.<br />
(Discourse Analysis Monograph no. 8) English Language Research, & Bleak<br />
House Books. University of Birmingham.<br />

Brazil, D. 1985. Phonology: Intonation in Discourse. Extract:<br />
Handbook of Discourse Analysis. Academic Press, London.<br />

Brazil, D. 1985. Where is the Edge of Language? Review article.<br />
Semiotica 56 - 3/4.<br />

Brazil, D. 1984. Sentences Read Aloud. Offprint: Intonation, Accent<br />
and Rhythm. Studies in Discourse Phonology. Walter de Gruyter, Berlin,<br />
New York.<br />

Brazil, D. 1982. Impromptuness & Intonation. Offprint from Impromptu<br />
Speech - a Symposium. Åbo Academy.<br />

Bruce, G. 1984. Structure and Functions of Prosody. Stencil. Inst. för<br />
lingvistik, Lunds Universitet.<br />

Bruce, G. 1977. Swedish Word Accents in Sentence Perspective. Gleerups,<br />
Lund.<br />

Bolinger, D. 1986. Intonation and Its Parts. Vol. 1. E. Arnold.<br />

Couper-Kuhlen, E. 1986. An Introduction to English Prosody. E. Arnold.<br />

Coulthard & Montgomery (Eds). 1981. Studies in Discourse Analysis.<br />
Routledge and Kegan Paul.<br />

Cruttenden, A. 1985. Intonation. Cambridge University Press.<br />

Lieberman, P. 1967. Intonation, Perception and Language. Cambridge, Mass.<br />
MIT Press.<br />

Linell, P., Gustavsson, L. 1987. Initiativ och respons. Om dialogens<br />
dynamik, dominans, och koherens. Linköping.<br />

Levinson, S. 1983. Pragmatics. Cambridge University Press.<br />

Sinclair and Brazil. 1982. Teacher Talk. Oxford University Press.<br />

Wulffson, M. 1987. Discourse Intonation in Swedish. To appear in Lund<br />
Working Papers. Institutionen för Lingvistik, Lund University.<br />

Appendix 1: Brief summary of the Discourse Model:<br />

The Discourse Model postulates a finite set of meaningful linguistic<br />

oppositions which can be singled out on a perceptual auditory level from<br />

the more or less constantly varying stream of speech. The meaning<br />

components here described represent the result of a speaker having made<br />

an either/or choice. The independent variables are functional in nature.<br />

For example, "If there is a 'falling pitch', it is not the fall itself<br />

which is of interest but rather the function of the language item that<br />

carries it."<br />

The basic factors which contribute to the realization of the<br />

functional oppositions within each tone unit are: PROMINENCE, TONE, KEY,<br />

and TERMINATION. Further within the domain of these systems are<br />

ORIENTATION ('direct/oblique'), and DOMINANCE or DISCOURSE CONTROL.<br />

Reference to the work of Brazil is heartily recommended for the reader<br />

who would like to gain a deeper understanding of the model. In the<br />

meantime, the following will serve as a general guideline:<br />

PROMINENCE refers to 'a selection from sets available at successive<br />

places along the time dimension.' 'An incidence of prominence fixes the<br />

domain of the other variables of tone, key, and termination.' (Brazil<br />

1985) A syllable or stretch of speech may be assigned prominence for the<br />

purpose of sense or intonation selection. For example, if one Swede asks<br />

another 'Vilket kort spelade du?' (Which card did you play?), and the<br />

other replies 'HJÄRTerDAM!' (the queen of hearts), it represents a<br />

selection from an existential set of 52 - the deck of cards. If the<br />

question had been: 'Vilken DAM spelade du?' - (Which queen did you play?)<br />

with the answer, 'HJÄRTerdam', then there are only 4 choices in the<br />

existential set of 'hjärter', 'spader', 'ruter', and 'klöver'. On the<br />

other hand, if the question had still been 'Vilken dam spelade du?', but<br />

the answer had been, for example, 'HJÄRTerDAM', there would seem to be no<br />

motivation for 'DAM' to be prominent. But let's say, for example, that<br />

the speaker wished to concentrate on the card game instead of answering<br />

questions, he might convey this with low termination on the word 'DAM',<br />

in order to be left alone! So prominence may be assigned for the purpose<br />

of making a choice within any of the other intonational systems of tone,<br />

key or termination.<br />
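The two sizes of existential set in the card example (52 versus 4) can be made concrete with a short sketch. The suit names come from the text; the rank labels and the helper names are assumptions introduced only for illustration.

```python
from itertools import product

# Suit names as given in the text.
SUITS = ["hjärter", "spader", "ruter", "klöver"]
# Rank labels are an assumption for illustration (13 ranks per suit).
RANKS = ["ess", "2", "3", "4", "5", "6", "7", "8", "9", "10",
         "knekt", "dam", "kung"]

# 'Which card did you play?' -> the answer selects from the whole deck.
deck = [suit + rank for suit, rank in product(SUITS, RANKS)]

# 'Which queen did you play?' -> 'dam' is given, only the suit is selected.
queens = [card for card in deck if card.endswith("dam")]

print(len(deck), len(queens))
print("hjärterdam" in queens)
```

In the first question both parts of 'hjärterdam' narrow down the set, so both are prominent; in the second only the suit does, which is the reasoning behind 'HJÄRTerdam'.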

TONE refers to basic pitch movement types, each of which carries a<br />

distinct abstract meaning increment. The PROCLAIMING TONE, of which<br />

there are two versions, the SIMPLE proclaiming and the DOMINANT<br />

proclaiming (p, p+), stands for the elements in the discourse which<br />

represent a change in the status quo of speaker-hearer understanding.<br />

The REFERRING TONE, on the other hand, also with a SIMPLE and a DOMINANT<br />

version (r, r+), effectively represents the areas of convergence, or<br />

reification of the status quo between speaker and hearer, either on<br />

informational or social levels, or both. The dominant version reinforces<br />

the basic meaning of a tone and/or affects control of the discourse.<br />

The FIFTH TONE is LEVEL (o), and remains outside of the<br />

interactive proclaiming/referring dichotomy. DIRECT ORIENTATION refers to the<br />

discourse situation in which speaker/hearer interaction is in focus<br />

(P/R), whereas OBLIQUE orientation (O/P) functions where the language<br />

itself or linguistic organization is in focus.<br />

KEY and TERMINATION deal with the communicative value of relative<br />

pitch levels, HIGH, MID, or LOW. Key is associated with the onset<br />

syllable, and termination with the tonic in an extended tone unit.<br />

In a minimal tone unit, both fall together on the tonic. Within their domain are<br />

relationships of CONTRASTIVENESS, ADDITIVENESS, and EQUIVALENCE, as well<br />

as the interactive areas of projected and actual responses, ADJUDICATING<br />

(high termination and key) and CONCURRING (mid termination and key), or<br />

no projected expectations (low termination and key). DISCOURSE<br />

STRUCTURING and SEQUENCING are also achieved through key and termination.<br />

The place of operation for these four sets of speaker options is the<br />

TONE UNIT, which can be said to be the building block of verbal<br />

communication. According to Brazil, the speaker 'plans' or 'encodes' the<br />

tone unit, and the hearer 'decodes' it as a whole. A tone unit (TU) in<br />

direct orientation consists of ONE (minimal TU) or TWO (extended TU)<br />

prominent syllables, one of which is TONIC (= carries a major movement in<br />

pitch, or constitutes the beginning of a pitch movement which extends<br />

over the syllables that follow). Key and termination are determined by<br />

the level of pitch in relation to preceding and succeeding prominent<br />

syllables. Key and termination of p and r tones depend on the beginning<br />

of the tone, whereas in the p+ tone it is the peak of the rise-fall, and<br />

in the r+ tone, it is the end of the rise which counts. The tonic<br />

syllable is the only obligatory portion of a TU. A pause always defines<br />

a TU boundary, but a TU is not always defined by a pause. The model<br />

differs substantially from other models in this crucial point.<br />

It is the<br />

instance of a set of meaningful functional choices, and their internal<br />

organization, rather than external boundaries which determine the tone<br />

unit.<br />
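The structure of a tone unit in direct orientation, as summarized above, can be sketched as a small data type. This is an illustrative reading only: the class and field names are assumptions, the tonic is assumed to be the last prominent syllable, and a minimal tone unit is assumed to carry the same level for key and termination because both fall on the tonic.

```python
from dataclasses import dataclass
from typing import List

LEVELS = {"high", "mid", "low"}
TONES = {"p", "p+", "r", "r+", "o"}


@dataclass
class ToneUnit:
    syllables: List[str]
    prominent: List[int]  # indices of prominent syllables, in order
    tone: str             # one of TONES
    key: str              # pitch level chosen at the onset syllable
    termination: str      # pitch level chosen at the tonic syllable

    def __post_init__(self) -> None:
        if len(self.prominent) not in (1, 2):
            raise ValueError("a tone unit has one or two prominent syllables")
        if any(i < 0 or i >= len(self.syllables) for i in self.prominent):
            raise ValueError("prominent index out of range")
        if self.tone not in TONES:
            raise ValueError("unknown tone")
        if self.key not in LEVELS or self.termination not in LEVELS:
            raise ValueError("key and termination must be high, mid or low")
        if self.is_minimal and self.key != self.termination:
            raise ValueError("in a minimal tone unit key and termination coincide")

    @property
    def is_minimal(self) -> bool:
        return len(self.prominent) == 1

    @property
    def onset(self) -> str:
        return self.syllables[self.prominent[0]]

    @property
    def tonic(self) -> str:
        # Assumption: the tonic is the last prominent syllable.
        return self.syllables[self.prominent[-1]]


reply = ToneUnit(["ja", "pre", "cis"], [2], "p", "low", "low")
print(reply.is_minimal, reply.tonic)
```

Note that nothing in the type refers to pauses: the unit is defined by its internal choices, in line with the point made in the text.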

Appendix 2<br />

Transcription Conventions for Swedish<br />

1. Tone unit boundaries: // //<br />

2. Prominent syllables in capital letters with the tonic underlined:<br />

// oVre ka //<br />

3. Key and termination (relative pitch factors involved at every TU).<br />

Key is associated with the onset syllable,<br />

and termination with the<br />

tonic in an extended TU.<br />

In a minimal TU, both fall together on the tonic.<br />

a. MID-KEY/TERMINATION are not specially marked.<br />

b. HIGH or LOW key-termination are marked with arrows:<br />

In the case of Accent 2 (A2) words in Swedish, where the pitch<br />

switch from mid to high key or termination takes place on the<br />

syllable following grave accent, this is indicated by an arrow<br />

placed above that second non-prominent syllable.<br />

4. Grave accent: (' )<br />

5. P - either /p/ or /p+/ proclaiming. R - either /r/ or /r+/ referring.<br />



Why Two Labialization Strategies in Setswana?<br />

Mats Dufberg<br />

1. Background<br />

Setswana is a Bantu language spoken in southern Africa, in South Africa<br />

and in Botswana. It has seven vowel phonemes and a large number of<br />

consonant phonemes, of which many are labialized. In this paper I will<br />

discuss the labialized consonants and their non-labialized counterparts,<br />

and the two different realization strategies I have found, as one or two<br />

phonetic segments. In particular I will discuss why there is a bisegmental<br />

realization.<br />

The labialized consonant in Setswana can be found in two different<br />

vowel contexts. The first context is before a back, rounded vowel where<br />

all consonants are (phonetically) labialized. The second context is<br />

before a non-back, unrounded vowel where there is a phonematic<br />

opposition between a labialized and a non-labialized consonant.<br />

Setswana has been described both by Tucker (1929) in his book about<br />

a number of related Bantu languages and by Cole (1955) in his Setswana<br />

grammar. Both agree on two important points (Tucker 1929: 74, Cole 1955:<br />

33-34):<br />

- labialized consonants in both vowel contexts are identical, and<br />

- labialized consonants are to be analyzed as one segment, both<br />

phonetically and phonologically.<br />

Tucker's claims concern both Setswana and Sesotho, which is a<br />

closely related language spoken in South Africa, among other places. Let<br />

me make another reference to Sesotho, and below it will be clear why it<br />

is relevant. Roux (1981) refers to X-ray and acoustic studies of<br />

labialization, not in Setswana but in Sesotho. His conclusions are<br />

different from Tucker's and Cole's:<br />

- labialized consonants before rounded vowels are different from those<br />

before unrounded vowels, and<br />

- labialized consonants before unrounded vowels end in a labiovelar<br />

semi-vowel, [w], and should at least phonetically be considered<br />

to be two segments.<br />

In Dufberg (1984) I reported on an acoustic study of the labialized<br />

consonants in Setswana. That study was done in 1984. Tore Janson,<br />

professor of linguistics, had during his studies of lexical change in<br />

Setswana in Botswana made recordings of word lists which he asked me to<br />

use for a study of the acoustic correlates of labialization in Setswana.<br />

In those recordings, of four informants, the labialization used was<br />

a monosegmental realization, that is, the labialized consonant was<br />

clearly one phonetic unit. This is in line with the Tucker-Cole view.<br />

For the second part of my study we recorded one speaker of Setswana<br />



from South Africa, not Botswana. In that recording I found two different<br />

strategies of labialization. One of the four consonants studied was<br />

pronounced monosegmentally, just like all consonants in the earlier<br />

recordings. But the other three consonants were pronounced with a bisegmental<br />

realization. That is, the consonant was followed by a semivowel.<br />

This is in line with the Roux view. Even though Roux studied<br />

Sesotho, not Setswana, his findings were felt to be relevant since my<br />

informant came from South Africa. This finding gave rise to two new<br />

questions.<br />

1) Should /Cʷ/ be analyzed - phonologically - as one or two segments?<br />

2) Is the difference in realization of the labialized consonants, i.e.<br />

mono vs. bisegmental, a dialectal or idiolectal difference?<br />

In my report (Dufberg 1984) I did not find any reason to change the<br />

phonological analysis. But I hypothesized that there was a dialectal<br />

difference in the pronunciation of the labialized consonants and argued<br />

against the idiolectal hypothesis.<br />

In this paper I will first, in section 2, give a brief presentation<br />

of the phonological system of Setswana, firstly because it will help<br />

the reader to understand my work, and secondly because it is of general<br />

interest as a contrast to the Indo-European systems. In section 3 I will<br />

briefly review Dufberg (1984) , though section 3.2 on the expected<br />

effects of labialization was not included in that report. In section 4 I<br />

will discuss my new study and present some new data. Finally, in section<br />

5, I will discuss the two realization strategies - mono vs. bisegmental<br />

- and discuss alternatives to the dialect hypothesis.<br />

2. Phonological system of Setswana<br />

2.1 General description<br />

Setswana is a tone language but the tones will not be discussed in this<br />

paper. The syllable structure is the simplest possible; a syllable<br />

consists of either a consonant plus a vowel, CV, a vowel, V, or a<br />

syllabic consonant, C. There are no clusters of (non-syllabic)<br />

consonants, at least not on the phonological level.<br />
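The three syllable shapes described above (CV, V, and syllabic C) can be illustrated with a small parser over a consonant/vowel skeleton. This is a sketch under a stated assumption: a consonant immediately followed by a vowel is taken to form a CV syllable, and any other consonant is taken to be syllabic. The function name `syllabify` is hypothetical.

```python
from typing import List


def syllabify(skeleton: str) -> List[str]:
    """Split a string of 'C' and 'V' symbols into CV, V and C syllables.

    Assumption for illustration: a consonant directly followed by a
    vowel forms a CV syllable; any other consonant is treated as a
    syllabic consonant on its own.
    """
    syllables: List[str] = []
    i = 0
    while i < len(skeleton):
        segment = skeleton[i]
        if segment not in ("C", "V"):
            raise ValueError("skeleton may only contain 'C' and 'V'")
        if segment == "C" and i + 1 < len(skeleton) and skeleton[i + 1] == "V":
            syllables.append("CV")
            i += 2
        else:
            syllables.append(segment)
            i += 1
    return syllables


print(syllabify("CVVCCV"))
```

Because every consonant is either an onset or a syllable of its own, no parse ever contains a cluster of non-syllabic consonants, which is the property stated in the text.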

Vowel length is not phonematic in Setswana (Cole 1955: 55) , but<br />

there are different vowel lengths as a part of the prosodic structure.<br />

2.2 Vowels<br />

Setswana has seven vowel phonemes which can be divided into three groups<br />

(Cole 1955: 4-7) .<br />



Front, unrounded vowels:<br />

/i/ phonetically very closed.<br />

/e/ phonetically more closed than half closed.<br />

/ɛ/ phonetically half open.<br />

Open, unrounded vowel:<br />

/a/ phonetically open and central.<br />

Back, rounded vowels:<br />

/u/ phonetically very closed.<br />

/o/ phonetically more closed than half closed.<br />

/ɔ/ phonetically half open.<br />

There is phonologically governed variation of the vowel quality of<br />

the four mid vowels, /e, ɛ, o, ɔ/ (Cole 1955: 55), but since it is not<br />

important for the present study I will not discuss it here.<br />

2.3 Consonants<br />

As we have seen, the number of vowel phonemes is low and the syllable<br />

structure is simple. The number of consonant phonemes, though, is high<br />

and there is a complex relationship between labialized and non-labialized<br />

consonants.<br />

In Setswana there are 44 consonant phonemes, of which 17 are<br />

labialized. (For reasons which will be clear I do not count the semivowel<br />

/w/ as a labialized consonant here.)<br />

In the chart below each phoneme is represented by its major allophone.<br />

Notice that for every labialized consonant there is a consonant<br />

differing only in the respect that it is non-labialized. Since<br />

"labialization is a morphophonological process in Setswana" (Janson<br />

1985) it is really relevant to talk of a labialized consonant and its<br />

non-labialized counterpart.<br />

In a few cases the labialized consonant has<br />

two non-labialized counterparts; /tsʷ/ has both /ts/ and /tʃ/ as its<br />

counterparts, /tsʰʷ/ has both /tsʰ/ and /tʃʰ/, and /sʷ/ has both /s/<br />

and /ʃ/.<br />

STOPS:<br />

Plain   Labialized   Aspirated   Asp. & labial.   Place of articulation<br />
/p/     -            /pʰ/        -                Bilabial<br />
/b/     -            -           -                Voiced bilabial<br />
/t/     /tʷ/         /tʰ/        /tʰʷ/            Alveolar<br />
/tl/    /tlʷ/        /tlʰ/       /tlʰʷ/           Alveolar with lateral release<br />
/k/     /kʷ/         /kʰ/        /kʰʷ/            Velar<br />

AFFRICATES:<br />

Plain   Labialized   Aspirated   Asp. & labial.   Place of articulation<br />
/ts/    -            /tsʰ/       -                Alveolar<br />
-       /tsʷ/        -           -                Alveolar or prepalatal<br />
/tʃ/    -            /tʃʰ/       -                Prepalatal<br />
/dʒ/    /dʒʷ/        -           -                Voiced prepalatal<br />
-       -            /kxʰ/       -                Velar<br />

FRICATIVES AND LIQUIDS:<br />

Plain   Labialized   Place and manner of articulation<br />
/ɸ/     -            Bilabial or labiodental fricative<br />
/s/     -            Alveolar fricative<br />
-       /sʷ/         Alveolar or prepalatal fricative<br />
/ʃ/     -            Prepalatal fricative<br />
/x/     /xʷ/         Velar fricative<br />
/r/     /rʷ/         Apical trill<br />
/l/     /lʷ/         Alveolar lateral<br />

NASALS:<br />

Bilabial   Alveolar   Prepalatal   Velar   Comment<br />
/m/        /n/        /ɲ/          /ŋ/     Plain<br />
-          /nʷ/       /ɲʷ/         /ŋʷ/    Labialized<br />

SEMI-VOWELS:<br />

/w/: bilabio-velar<br />

/j/: palatal<br />

In Setswana there are a few click sounds, but of marginal<br />

importance (and only in interjections). For some consonants there are<br />

restrictions on which vowel can follow, but that is also out of the<br />

scope of this paper, except for what is relevant for labialized<br />

consonants. That discussion will follow below.<br />

2.4 Labialization of consonants<br />

Before the back (and rounded) vowels, /u, o, ɔ/, all consonants are<br />

(phonetically) labialized due to regressive assimilation (Cole 1955:<br />

33-34), i.e. the consonants are articulated with a distinct lip rounding and<br />

- when it is possible - with the back of the tongue raised towards the<br />

velum.<br />



This means that before rounded vowels there is no opposition<br />

between the labialized and the non-labialized consonants described<br />

above. The alveolar series /s, ts, tsʰ/ and the prepalatal series /ʃ,<br />

tʃ, tʃʰ/ also collapse into one, labialized series /sʷ, tsʷ, tsʰʷ/,<br />

which is alveolar or prepalatal depending on the dialect (Cole<br />

1955: 35).<br />

Traditionally (Cole 1955) Setswana is described as having the<br />

labialized phonemes in front of the rounded vowels. That view could of<br />

course be challenged since there is no opposition between labialized and<br />

non-labialized consonants in that vowel context. (For an alternative<br />

analysis see Janson (1985) .)<br />

The labialized consonants can also be found before the unrounded<br />

vowels /i, e, ɛ, a/, but in this case there is a phonematic distinction<br />

between the labialized and the non-labialized consonants. Even in this<br />

case there is only one labialized series that corresponds to both the<br />

alveolar and the prepalatal series. It is this last kind of<br />

labialization that will be discussed in this paper.<br />
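The two vowel contexts described in this section can be summarized in a few lines of code. This is a simplified sketch: the vowel sets follow the seven-vowel system of section 2.2, while the function names and the boolean representation of a consonant are assumptions made for illustration.

```python
ROUNDED_VOWELS = {"u", "o", "ɔ"}
UNROUNDED_VOWELS = {"i", "e", "ɛ", "a"}


def labialization_is_contrastive(vowel: str) -> bool:
    """True if labialized and non-labialized consonants contrast before this vowel."""
    if vowel in UNROUNDED_VOWELS:
        return True
    if vowel in ROUNDED_VOWELS:
        return False
    raise ValueError("not one of the seven vowel phonemes")


def surface_labialized(phonemically_labialized: bool, vowel: str) -> bool:
    """Phonetic labialization of a consonant before the given vowel.

    Before rounded vowels every consonant is labialized by regressive
    assimilation; before unrounded vowels only labialized phonemes are.
    """
    if labialization_is_contrastive(vowel):
        return phonemically_labialized
    return True


print(labialization_is_contrastive("a"), labialization_is_contrastive("u"))
print(surface_labialized(False, "o"), surface_labialized(False, "i"))
```

Only the contrastive context, before unrounded vowels, is the one examined in the rest of the paper.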

3. An acoustic analysis of /Cʷ/ - a review<br />

In this section I will briefly present my acoustic analysis of the<br />

labialized consonants originally presented in Dufberg (1984) . It is not,<br />

however, a pure review. In 3.2 I will discuss the expected effects of<br />

labialization, which were not discussed in the original report.<br />

3.1 Objectives of the study<br />

The study was exploratory and the questions we wanted to find answers to<br />

were:<br />

1) What is the acoustic difference between the labialized consonant and<br />

its non-labialized counterpart?<br />

2) Is there a common acoustic correlate that corresponds to the<br />

distinction labialized/non-labialized?<br />

To be able to answer the first question completely and to give an<br />

affirmative answer to the second question all consonant pairs have to be<br />

represented, and in comparable vowel contexts and positions in the<br />

words. Recall that the contrast between labialized and non-labialized<br />

consonants only exists before unrounded vowels which is the only context<br />

I have studied.<br />

3.2 Expected effects of labialization<br />

The term labialization implies that the consonant should have some extra<br />

component of the lips, most likely lip rounding. The acoustic effect of<br />

lip rounding depends on the place of articulation. For dental, alveolar,<br />

or palatal consonants we would expect a lowering of the third formant,<br />

or its equivalent, like the effect of rounding of an [i] to a [y]. For<br />

velar consonants, on the other hand, we would expect a lowering of the<br />

second formant, like the effect of rounding of an [ɯ] to a [u] (Fant<br />

1968: 214).<br />

According to Cole (1955), though, labialization (of consonants in<br />

Setswana) means rounding the lips and also raising the back of the<br />

tongue when that is possible. That is, labialization is then a<br />

combination of true labialization and velarization. Labialization<br />

combined with velarization will lower the second formant even in dental,<br />

alveolar, and palatal consonants, that is, labialization and<br />

velarization will strengthen each other's lowering effect on F2. It<br />

would not be surprising if we found velarization combined with<br />

labialization since the two have been found together in other languages<br />

(Jakobson & Waugh 1979: 116-7).<br />

In either of these two models, that is, labialization alone and<br />

labialization combined with velarization, we can expect a secondary<br />

effect on the consonantal segment, namely a lowering of the<br />

amplitude (Fant 1968: 204-5), which should be much greater if F2 is lowered than if<br />

only F3 is lowered.<br />
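The expectations set out in this section amount to a small decision rule, sketched below. The place labels and the function name are assumptions; the rule itself only restates the text: lip rounding alone should lower F3 for dental, alveolar and palatal consonants and F2 for velars, while added velarization should lower F2 in all cases.

```python
FRONT_PLACES = {"dental", "alveolar", "palatal"}


def expected_lowered_formant(place: str, velarized: bool = False) -> str:
    """Formant expected to be lowered by labialization at a given place."""
    if place not in FRONT_PLACES and place != "velar":
        raise ValueError("place not covered by the expectations in the text")
    if place == "velar" or velarized:
        return "F2"
    return "F3"


for place in ("alveolar", "velar"):
    print(place,
          expected_lowered_formant(place),
          expected_lowered_formant(place, velarized=True))
```

Under this rule, an observed F2 lowering in an alveolar consonant would point to labialization combined with velarization rather than to lip rounding alone.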

3.3 Speech material and analysis method<br />

For the study we used two different recordings. Recording 1 was recorded<br />

in the field in Botswana by Tore Janson in 1982. It was a recording of 4<br />

native speakers of Setswana from Botswana reading a list of 75 words.<br />

The word list was composed for Janson's lexical change studies. The list<br />

was not planned for the study of labialization and there were a number<br />

of problems with the selection of words. Firstly, not all consonant pairs<br />

were represented, secondly, both members of a pair were not always<br />

in a comparable vowel context, and thirdly, some consonants were in the<br />

final syllable which often underwent devoicing. Of 20 /C-Cʷ/ pairs only<br />

8 could be used for the analysis. Recording 1 clearly did not, even<br />

theoretically, allow us to answer the two questions presented above.<br />

The second recording, recording 2, was recorded at the phonetics<br />

laboratory in Stockholm by Tore Janson and me. It consists of one<br />

speaker reading a list of 32 words specially selected for the study. The<br />

speaker is a native speaker of Setswana from South Africa. The selection<br />

of words in the word list was made after I had done most of the analysis<br />

of recording 1. This list was limited to four of the eight pairs<br />

analyzed in recording 1. This limitation was done to keep down the size<br />

of the study. For each consonant we had four different vowel contexts.<br />
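The size of the word list for recording 2 follows from its design: four consonant pairs, two members per pair, and four vowel contexts per consonant. The sketch below checks that arithmetic. The pair labels are taken from the consonants reported in the results; the vowel-context labels `V1` to `V4` are placeholders, since the actual contexts are not listed here.

```python
from itertools import product

pairs = ["x", "n", "l", "r"]               # the four pairs studied in recording 2
members = ["plain", "labialized"]          # both members of each pair
vowel_contexts = ["V1", "V2", "V3", "V4"]  # placeholder labels

# One word per combination of pair, member and vowel context.
word_slots = list(product(pairs, members, vowel_contexts))
print(len(word_slots))
```

The product 4 x 2 x 4 gives the 32 words of the list.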

The speech material was analyzed on a Kay Digital Sona-Graph 7800<br />

spectrograph, and all measurements were done by hand on spectrograms with<br />

a bandwidth of 300 Hz. To be able to compare levels the spectrograms<br />

were normalized with the help of the strongest vowel of each word.<br />
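One way to read the normalization step is as expressing every level relative to the strongest vowel of the same word. The sketch below assumes levels measured in dB and simply subtracts the reference level; the function name, the data layout and the example values are all hypothetical, and the actual procedure applied to the spectrograms may have differed.

```python
from typing import Dict, Iterable


def normalize_levels(levels_db: Dict[str, float],
                     vowels: Iterable[str]) -> Dict[str, float]:
    """Express segment levels relative to the strongest vowel of the word.

    levels_db maps segment labels to levels in dB (assumed unit);
    vowels lists which of those segments are vowels.
    """
    reference = max(levels_db[v] for v in vowels)
    return {segment: level - reference for segment, level in levels_db.items()}


# Hypothetical levels for one word, for illustration only.
word = {"a": 60.0, "i": 54.0, "xw": 40.0}
print(normalize_levels(word, ["a", "i"]))
```

After this step the strongest vowel of every word sits at 0 dB, so consonant levels from different words can be compared directly.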

3.4 Results<br />

Without too much simplification we can summarize the results from<br />

recording 1 and 2 in a table. In the table I use the following<br />

notations:<br />

duration: Duration of the consonant segment.<br />

F2: Transitions of the second formant from the vowel before to<br />

the consonant and from the consonant to the vowel after.<br />

consonant formant: The lowest peak, in frequency, in the spectrum of the<br />

consonant segment.<br />

?: Some uncertainty of the analysis.<br />

Phoneme pair<br />

Observed differences of /Cʷ/ with respect to /C/<br />

Recording 1<br />

longer duration<br />

dipping towards cons.?<br />

lower amplitude of noise?<br />

longer duration?<br />

F2 dipping towards cons.?<br />

lower amplitude of noise?<br />

lower consonant formant<br />

Recording 1<br />

Recording 2<br />

F2 dipping towards cons.<br />

lower cons. formant?<br />

lower amplitude?<br />

F2 dipping towards cons.<br />

lower cons. formant<br />

lower amplitude<br />

F2 dipping towards cons.<br />

F2 dipping towards cons.<br />

lower cons. formant<br />

lower amplitude<br />

F2 dipping towards cons.<br />

lower cons. formant<br />

ended with a semi-vowel?<br />

longer duration<br />

F2 dipping towards cons.<br />

lower cons. formant<br />

lower amplitude<br />

ended with a semi-vowel<br />

longer duration<br />

F2 dipping towards cons.<br />

ended with a semi-vowel<br />

longer duration<br />

F2 dipping towards cons.<br />

lower cons. formant<br />

lower amplitude<br />



It seems fair to say that labialization has one or more of the<br />

following effects:<br />

- Lowering of F2 in transitions from the preceding vowel and to the<br />

following one.<br />

- Longer consonant segment.<br />

- Lower amplitude of noise/formants in the consonant.<br />

- Lower frequency of noise/formants in the consonant.<br />

- Semi-vowel.<br />

The last effect, the semi-vowel, is rather special. In two consonant<br />

phonemes, /nʷ/ and /lʷ/, in recording 2 the speaker clearly used a<br />

bisegmental realization, that is, he ended the consonant with the semi-vowel<br />

[w]. In one consonant phoneme, /xʷ/, he just as clearly used a monosegmental<br />

realization. The fourth case, /rʷ/, is somewhat unclear but my<br />

interpretation is that the informant is using the bisegmental<br />

realization even in that case.<br />

In recording 1, on the other hand, I found only the monosegmental<br />

realization of labialization. That is, the labialized consonant was<br />

never ended by a semi-vowel.<br />
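The distribution of the two strategies just described can be tabulated per speaker. In the sketch below the entries for the recording 2 speaker follow the text, including the author's interpretation of /rʷ/ as bisegmental; the recording 1 entry is a schematic stand-in for any of those speakers, and the function name and labels are assumptions.

```python
from typing import Dict


def classify_speaker(realizations: Dict[str, str]) -> str:
    """Summarize a speaker's labialization strategy.

    realizations maps each labialized consonant to 'mono' or 'bi'.
    """
    kinds = set(realizations.values())
    if not kinds or not kinds <= {"mono", "bi"}:
        raise ValueError("expected a non-empty mapping to 'mono' or 'bi'")
    if kinds == {"mono"}:
        return "monosegmental"
    if kinds == {"bi"}:
        return "bisegmental"
    return "mixed"


# Recording 2 speaker, as described in the text (/rʷ/ by interpretation).
recording_2 = {"nʷ": "bi", "lʷ": "bi", "xʷ": "mono", "rʷ": "bi"}
# Schematic recording 1 speaker: only monosegmental realizations were found.
recording_1 = {"nʷ": "mono", "lʷ": "mono", "xʷ": "mono", "rʷ": "mono"}

print(classify_speaker(recording_2), classify_speaker(recording_1))
```

Seen this way, the recording 2 speaker is not uniformly bisegmental: the velar fricative keeps the monosegmental realization.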

3.5 Conclusions<br />

If we compare the effects of labialization we have found with the<br />

expected effects we can see firstly, that there seems to be velarization<br />

as well as labialization since F2 is affected even for front vowel<br />

contexts. Secondly, that there are effects that are not directly<br />

connected to the labialization itself. These are the longer duration and<br />

the occurrence of the semi-vowel. There seems to be a connection at<br />

least in one direction: the semi-vowel gives a longer total segment. And<br />

what is special with the semi-vowel is that only the speaker in<br />

recording 2 has it and that he has it fairly consistently.<br />

If we try to find a common acoustic correlate that corresponds to<br />

the distinction labialized/non-labialized, there seem to be two good<br />

candidates. The first is the lowering of F2 in the transitions from and<br />

to the surrounding vowels. There is some data that seems to contradict<br />

it, and that is the data of /tl-tlʷ/ and /dʒ-dʒʷ/ in recording 1. But<br />

that data is not very complete and the lowering of F2 is what we would<br />

expect if labialization is combined with velarization, so I think it is<br />

safe to consider lowering of F2 to be a common correlate.<br />

The second candidate is the lowering of the second formant, or the<br />

equivalent resonance, in the consonant itself. This is expected, and<br />

reasonably supported except in nasals. In nasals there is a total<br />

closure behind the lips, and therefore lip rounding seems to be<br />

irrelevant for the nasal segment. (The question remains, though, what<br />

the expected effect of velarization of an alveolar or palatal nasal<br />

consonant is in the nasal phase.)<br />

The two effects of labialization discussed above are common<br />



correlates of labialization, but there are other effects of<br />

labialization that are not present in all realizations of labialized<br />

consonants. The most obvious one is the presence of a semi-vowel. We<br />

have clearly found two different manners of realizing labialization,<br />

mono and bisegmental.<br />

The questions that the two different ways of realization gave rise<br />

to, which have already been referred to in section 1, were:<br />

1) Should /Cʷ/ be analyzed - phonologically - as one or two segments?<br />

2) Is the difference in realization of the labialized consonants, i.e.<br />

mono vs. bisegmental, a dialectal or an idiolectal difference?<br />

Janson (1985) points out that the strict CV structure of Setswana<br />

is a strong argument against analyzing the /Cʷ/ as two phonemes. And I<br />

see no reason to argue against that. On the second question I argued in<br />

Dufberg (1984) that the dialectal hypothesis was the most reasonable. I<br />

will return to that question in the next section.<br />

4. Some new data<br />

4.1 Objectives of the study<br />

We wanted to test the hypothesis from Dufberg (1984) that there is a<br />

dialectal difference between mono and bisegmental pronunciation. If this<br />

dialectal difference exists there are at least two possibilities. The<br />

first is that the dialectal difference is something that could be found only in areas in<br />

contact with Sesotho, from which Setswana has borrowed the bisegmental<br />

strategy. Then we assume that Roux's analysis of Sesotho is correct,<br />

that is, that /Cʷ/ is ended phonetically by a semi-vowel, [w]. The<br />

second possibility is that this feature has spread to different areas,<br />

maybe independently of Sesotho. Then we would be able to find the<br />

feature in other areas, that is, in areas where there is no contact with<br />

Sesotho.<br />

4.2 Speech material and results<br />

This third recording, recording 3, contains ten speakers of Setswana<br />

from different areas of Botswana recorded in Botswana in 1985 by Tore<br />

Janson. The same list of words as for recording 2 is read once by these<br />

ten speakers. Recordings of three of these speakers have been digitized, stored<br />
on disk, and analyzed with a spectrogram program on the DEC Eclipse<br />
computer in the phonetics laboratory. Since the objectives have been to<br />

test the dialectal hypothesis, I have not made any detailed measurements<br />

but only looked for bi vs. monosegmental realizations of labialization.<br />

Let me illustrate here with spectrograms: firstly, the effect of<br />

labialization, and secondly, the difference between mono and bisegmental<br />

realization. Figures 1-4 contain spectrograms illustrating the effects<br />



of labialization. Figures 1 and 2 are parts of recordings of the same<br />
word, which illustrate the non-labialized /x/, read by two speakers, A<br />
and B. Speaker A is the one speaker who used bisegmental realization of<br />
the labialized consonants (except in /x w /). Speaker B is one of the<br />
speakers of recording 3, and he never used bisegmental realization. In<br />
figures 3 and 4, read by the same two speakers, parts of recordings of<br />
another word illustrate labialized /x w /. Notice the transition of F2,<br />
and the downward shift in frequency of the strongest resonance of the<br />
consonant. All realizations in figures 1-4 are monosegmental.<br />

Figures 5-8 illustrate bi vs. monosegmental realization of two<br />

consonant phonemes, /n w / and /l w /. Figures 5 and 7 are from recordings<br />

of speaker A, as defined above, realizing his consonants bisegmentally.<br />

Figures 6 and 8 are from recordings of speaker B, realizing his<br />

consonants monosegmentally. Notice that F2 stays at a low frequency<br />

value, forming a [w], in figures 5 and 7, whereas in figures 6 and 8 F2<br />

rises directly after the end of the nasal and lateral phases,<br />

respectively.<br />

Even though it can be hard to define the beginning and the end of<br />

the semi-vowel that ends a labialized consonant with bisegmental<br />

realization, the difference between a mono and bisegmental realization<br />

has still been rather clear cut. And in the three speakers, out of ten<br />

in recording 3, that I have analyzed I have found no examples of bisegmental<br />

realization.<br />

5. Discussion and conclusions<br />

5.1 Possible explanations of bisegmental realization<br />

Let us now try to account for the different realizations, that is,<br />

bisegmental vs. monosegmental realization. Let me first summarize the<br />

differences between the informants with only monosegmental realization<br />

and the one informant with both mono and bisegmental realization.<br />

Only monosegmental | Both mono and bisegmental<br />
From Botswana | From South Africa<br />
Contact with Sesotho less likely | Contact with Sesotho likely<br />
Recorded in field | Recorded in echo free chamber<br />
Recorded in their own country | Recorded in exile<br />

The following are theoretically possible explanations that we<br />
should consider:<br />

1) Idiolectal peculiarity<br />

2) Dialectal difference<br />

3) Spelling pronunciation<br />

4) Hyper speech due to the formal situation<br />

I rejected in Dufberg (1984) the first alternative on the grounds<br />



that it is not likely that someone would so consistently have such<br />

different realizations. As long as there are other possible explanations<br />

I think we can safely leave the idiolectal explanation out.<br />

The second alternative, dialectal difference, is the one which I<br />

adopted in Dufberg (1984). I did not then consider alternatives 3 and 4.<br />
What I found in Dufberg (1984) in favor of this hypothesis, as against<br />
the idiolectal hypothesis, was the fact that the informant so<br />
consistently used the bisegmental realization for three consonants and<br />
so consistently the monosegmental for the fourth. Nothing speaks against<br />
the dialect hypothesis but the support is not very strong either.<br />

The spelling of Setswana, which is fairly standardized, is very<br />
phonematic. A consonant phoneme is in the orthography represented by a<br />
single grapheme, a digraph, a trigraph, or even a quadrigraph. A<br />
labialized consonant is represented by its non-labialized counterpart's<br />
graph plus a w at the end. So the words /xonEla/ and /xon w Ela/<br />
are spelled go nela and go nwela, respectively. (Recall<br />
that there are no consonant clusters in Setswana.) But spelling<br />
pronunciation, the third alternative, cannot, at least not alone,<br />
explain the bisegmental realization. Firstly, all informants were<br />
literate and bilingual in Setswana and English, and all were reading the<br />
words from a list written in Setswana standard orthography. Among the<br />
informants that did not show any bisegmental realization were university<br />
students. Secondly, spelling pronunciation cannot explain why the<br />
phoneme /x w / was never realized bisegmentally.<br />

Let us look at the fourth alternative. The one informant who used<br />
bisegmental realization was the only one to be recorded in an echo free<br />
chamber in a phonetics laboratory, which is probably the most formal<br />
place one could be recorded in. The other informants were recorded in<br />
much more relaxed places. This informant was also the only one<br />
recorded in exile, that is, in Sweden. The others were recorded in<br />
their own country, Botswana. The formal situation may have triggered<br />
hyper speech, that is, the opposite of reduced speech. (For a discussion<br />
of hyper speech see Lindblom (1987).) This explanation assumes that<br />
bisegmental realization is, at least potentially, available to the<br />
speaker of Setswana. If this was not the case in the days of Tucker and<br />
Cole, literacy might have made it available to the literate.<br />

The hyper speech hypothesis itself cannot explain why the<br />
informant that used bisegmental realization always realized /r w , n w , l w /<br />
bisegmentally, but never /x w /, which was always realized<br />
monosegmentally. But if we assume that labialization also implies<br />
velarization for non-velar consonants, then there is one interesting<br />
fact, namely that /x w / is a velar consonant, and the only velar one of<br />
the four consonants. For the non-velar consonants, the velar gesture,<br />
together with the labial gesture, adds to the complexity of the<br />
consonant, whereas for the velar consonant it is part of a non-complex<br />
consonant. This difference may be the key to why /x w / behaves<br />

differently. If we assume that the secondary articulation is carried out<br />



of the consonant itself we would get a labio-velar semi-vowel, [w],<br />

after non-velar consonants, that is, a low F2. After the velar consonant we would get only lip<br />
rounding, which would not affect the F2 of non-back vowels.<br />

This velar explanation is compatible with both the dialect and the<br />
hyper speech hypotheses. But it makes the hyper speech hypothesis more<br />
convincing than it would be without the velar explanation.<br />

5.2 Conclusion<br />

In this paper I have challenged the original hypothesis that the two<br />

different labialization strategies, mono vs. bisegmental realization,<br />

are connected to dialectal differences. The new hypothesis is a hyper<br />
speech hypothesis, that is, that the different strategies are connected to<br />
style of speech. In hyper speech, that is, the opposite of reduced<br />
speech, we would then get the bisegmental pronunciation.<br />

An explanation for the difference in realization of the labialized<br />
velar consonant, /x w /, which was never realized bisegmentally, in<br />
contrast to the other consonants, could perhaps be found in the fact that<br />
it is velar whereas the other consonants analyzed are<br />
alveolar.<br />

REFERENCES<br />

Cole, D. T. (1955): An introduction to Tswana grammar. London.<br />

Dufberg, Mats (1984): Labialiserade konsonanter i setswana - en akustisk<br />
analys. Unpublished paper. Stockholm: University of Stockholm,<br />
Institute of Linguistics.<br />

Fant, Gunnar (1968): "Analysis and synthesis of speech processes". In<br />
Manual of phonetics, 2nd edition, edited by Bertil Malmberg.<br />
Amsterdam: North-Holland Publishing Company, pp. 173-277.<br />

Jakobson, Roman & Waugh, Linda R. (1979): The sound shape of language.<br />
Brighton, GB: Harvester Press.<br />

Janson, Tore (1985): "Labialisation in Setswana: phonetics and<br />
phonology". In Phonologica Africana 1984 (=Wiener linguistische<br />
Gazette, Beiheft 5). Wien: Institut für Sprachwissenschaft der<br />
Universität Wien, pp. 73-84.<br />

Lindblom, Björn (1987): "Adaptive variability and absolute constancy in<br />
speech signals: two themes in the quest for phonetic invariance".<br />
In Proceedings XIth ICPhS from the Eleventh International Congress<br />
of Phonetic Sciences, 1987, vol. 3, pp. 9-18.<br />
Also in Perilus report no 5 (this volume). Stockholm: University of<br />
Stockholm, Institute of Linguistics.<br />

Roux, J. C. (1981): "On the notion 'phonologization': some experimental<br />
phonetic considerations from Sesotho". In Phonologica 1980, edited<br />
by W. Dressler et al., pp. 373-378.<br />

Tucker, A. N. (1929): The comparative phonetics of the Suto-Chuana group<br />
of Bantu languages. London.<br />

ACKNOWLEDGEMENTS<br />

Thanks to Robert McAllister and Sven Furumark, for insightful<br />

suggestions and proofreading, to Björn Lindblom for leading my analysis<br />

in the right direction, and to Tore Janson, for supporting my work.<br />



L. Roug, I. Landberg and L.-J. Lundberg<br />
Department of Linguistics<br />
University of Stockholm<br />
Sweden<br />

During the last decade there has been a growing interest in children's<br />
prespeech development (Yeni-Komshian, Kavanagh and Ferguson 1980, Stark<br />
1981, Locke 1983, Lindblom and Zetterström 1986). The view held by Jakobson<br />
(1968), which declared babbling and speech to be two unrelated behaviors, stands<br />
in contrast with recent studies of prelinguistic vocalizations and early<br />
language acquisition indicating a gradual transition (Oller, Wieman, Doyle<br />
and Ross 1976, Vihman, Macken, Miller, Simmons and Miller 1985). Jakobson's<br />
approach was to postulate a discontinuous step and a universal order of<br />
acquisition of phonemes governed by the "laws of irreversible solidarity"<br />
(1968:51), laws which in Jakobson's framework underlie phonological<br />
universals, the regression of the phonological system in patients with<br />
aphasia as well as the acquisition of phonology in the child. The principle<br />
of maximal contrast governs the order in which the phonemes are acquired.<br />
This means that the infant's earliest language productions will consist of<br />
consonant/vowel contrasts of maximally different phonetic events, pa,<br />
followed by a nasal/oral contrast, pa/ma. This period of structured<br />
phonological development is preceded by a period of random vocal<br />
productions, i.e. babbling. These babbled utterances are characterized by<br />
"an astonishing quantity and diversity of sound productions" (1968:21). The<br />
two types of vocalizations, babbling and speech, may be separated by means<br />
of a short period in which the child is sometimes "completely mute"<br />
(1968:29). This silent period marks for the child the functional difference<br />
of the two types of vocal behavior. However, not all children turn mute<br />
since "for the most part ... one stage merges unobtrusively into the other"<br />
(1968:29). Jakobson considers babbling a<br />
"purposeless egocentric", and hence non-communicative, type of behavior<br />
parallel to which [...] "desire for communication" [...] and<br />
gradually replaces the "biologically oriented tongue delirium" (i.e. the<br />
babbling) of the child (1968:24). This view of the young infant as a<br />
non-communicative, rather passive individual accords well with the general<br />
opinion of infant competence at the time (see the discussion in Bullowa<br />
1979). It is only in recent years that the communicative capacities of the<br />
very young infant have begun to be more fully appreciated (Bullowa 1979,<br />
Meltzoff 1986). The understanding that a search for precursors of speech<br />
must be conducted in the context of a more general perspective on<br />
communicative behavior has resulted in a rejection of the discontinuity<br />
theory. However, it has been suggested that the order of acquisition<br />
proposed by Jakobson might be more true for the prelinguistic period than<br />
for the acquisition of early phonology (Vihman, personal communication),<br />
thus implying a universal developmental pattern for babbling rather than<br />
for speech. The silent period which was reported by Jakobson has not been<br />
confirmed by any of the many recent studies of prespeech development (e.g.<br />
Murai 1963, Cruttenden 1970, Koopmans van Beinum and van der Stelt 1986,<br />
Oller 1980, Stark 1980, Kent and Bauer 1984, Vihman et al. 1985, Holmgren,<br />
Lindblom, Aurelius, Jalling and Zetterström 1986). Instead there have been<br />
reports of the existence of developmental stages or milestones in the<br />
babbling period (Oller 1980, Stark 1980, Koopmans van Beinum et al. 1979,<br />
Holmgren et al. 1986) and strong similarities between the phonetic<br />
repertoire of a child's babbling and his/her first words (Oller et al.<br />
1976, Locke 1983, Vihman 1986).<br />

Many of the studies have been performed on children in English speaking<br />
communities, a circumstance that has served as a stimulus to undertake<br />
research to confirm these data on non-English subjects. The present paper<br />
presents phonetic data on a group of Swedish infants that by and large<br />
corroborate the developmental milestones reported in other studies (Oller<br />
1980, Stark 1980, Koopmans van Beinum et al. 1986).<br />

If we are correct in assuming that babbling and speech are functionally<br />
related, observations of babbling should be of clinical interest. A large<br />
number of questions can be raised concerning the possibilities of obtaining<br />
early indicators of deviant communicative development. The present project<br />
(footnote 1) was initiated by professor Rolf Zetterström and his colleagues<br />
at Sankt Göran's Children's Hospital in Stockholm. A major goal of the<br />
project has been to obtain a detailed phonetic description of the prespeech<br />
development of normal Swedish infants which could serve as a reference data<br />
base in the development of methods for the early diagnosis of deviant<br />
communicative development.<br />

Eight normal (footnote 2) Swedish infants were audio-recorded (footnote 3)<br />
on a bi-weekly basis in their homes from when they were around 5 to 76<br />
weeks of age. Recordings were ended when the child had achieved a ten- to<br />
[...] lexicon according to parental reports. The recordings were<br />
made in the presence of a close relative (mother or father) or an adult<br />
whom the child knew well. The situation in which the recordings were made<br />
would vary depending on the age of the child and the time of day. Typical<br />
recording situations would be: infant lying in bed falling asleep or<br />
awakening, infant playing with toys, infant seated in sofa or at table<br />
drawing or reading in book with adult. We would also record during meals or<br />
shortly after, in nursing situations such as diaper change and dressing,<br />
and in any other interactive situations between parent and child that would<br />
occur naturally. As the infants grew older only the last interactive<br />
situation remained. In parallel with the audio recordings, notes were made<br />
of the various activities taking place, in order to provide contextual<br />
information for the interpretation of the vocalizations. An extra recording<br />
session was made at the age of around 3.5 years to ensure normal speech<br />
development.<br />

From the larger sample of eight infants, a group of four (two boys and two<br />
girls) were selected on the basis of quality and regularity of recordings.<br />
The recordings of these four infants were first exposed to a crude auditory<br />
analysis. This method of selecting vocalization samples consisted in<br />
screening the tapes at the approximate age of onset of milestones reported<br />
by other investigators (Koopmans van Beinum et al. 1986, Oller 1980, Stark<br />
1980). The weeks at which there were clear changes in the character of the<br />
child's vocalizations were selected for further analyses. Also weeks just<br />
preceding and following this point were analysed to secure the stability of<br />
the milestone. The chosen recordings were computer edited (footnote 4) in<br />
order to exclude all non-comfort and non-infant sounds. In following this<br />
procedure 37 percent (20 hours) of the total number of recorded hours (54)<br />
per infant were analysed. The child's phonations were divided into<br />
utterances using breath groups as segmentation criteria and were then rerun<br />
onto tapes for transcription. The total number of utterances transcribed<br />
per child was around 2500, varying between 2200 and 3100 utterances for<br />
each of the four children.<br />

The tapes to be analysed were independently transcribed by four students of<br />
phonetics using the International Phonetic Alphabet (IPA) (see The<br />
Principles of the International Phonetic Association, 1981). Prior to this<br />
analysis, transcription training sessions and discussions were held to<br />
ensure identical mode of procedure. The IPA is, as the name implies, an<br />
international notational system developed to describe speech sounds found<br />
in the languages of the world. Each symbol represents a sound of a standard<br />
phonetic value. The phonetic symbol of each sound can be analysed into<br />
articulatory features such as place and manner of constriction in<br />
consonants, degree of opening of the mouth (i.e. lowering of the tongue and<br />
[...]), position of the tongue in the horizontal front-center-back dimension<br />
and presence vs. absence of rounding of the lips, for vowels. The features<br />
voiced vs. voiceless denote whether or not there is vibration of the vocal<br />
cords accompanying the articulation. Additionally there are diacritics<br />
allowing for detailed descriptions of each symbol, should it deviate from<br />
the standard phonetic value.<br />

In our approach each of these features of the IPA symbols was given a<br />
number (see Table I), i.e. for consonants the different places and manners<br />
of articulation were numbered, as was the presence vs. absence of voicing.<br />
Similarly for vowels the degree of opening of the mouth, front-center-back<br />
position of the tongue (see Table II), rounding of the lips and voicing<br />
were coded in numbers. Additionally direction of breath stream, e.g.<br />
ingressive, egressive, and voice quality, e.g. normal and deviant, were<br />
likewise numerically coded. These numbers formed the basis on which the<br />
frequency counts were made.<br />

[Table I. Consonants: numerical codes for place of articulation (columns 1-9) and manner of articulation (rows, beginning with 1. Plosive: p b t d ...)]<br />
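As an illustration only, the numerical coding and the frequency counts described above could be sketched in Python along the following lines. The code numbers and the small symbol inventory are hypothetical stand-ins for the values of Tables I and II, "?" stands in for the glottal stop, and the names `encode` and `place_counts` are introduced here and do not come from the project.<br />

```python
from collections import Counter

# Hypothetical numeric codes; the real values are those of Tables I and II.
PLACE = {"bilabial": 1, "dental/alveolar": 2, "velar": 3, "glottal": 4}
MANNER = {"plosive": 1, "nasal": 2, "fricative": 3}
VOICING = {"voiceless": 0, "voiced": 1}

# Illustrative inventory: symbol -> (place, manner, voicing).
CONSONANTS = {
    "p": ("bilabial", "plosive", "voiceless"),
    "b": ("bilabial", "plosive", "voiced"),
    "m": ("bilabial", "nasal", "voiced"),
    "d": ("dental/alveolar", "plosive", "voiced"),
    "h": ("glottal", "fricative", "voiceless"),
    "?": ("glottal", "plosive", "voiceless"),
}


def encode(symbol):
    """Return the (place, manner, voicing) code numbers of one consonant symbol."""
    place, manner, voicing = CONSONANTS[symbol]
    return PLACE[place], MANNER[manner], VOICING[voicing]


def place_counts(transcription):
    """Count place-of-articulation codes over the consonants of a transcription.

    Symbols that are not in the consonant inventory (here: vowels) are skipped.
    """
    return Counter(encode(s)[0] for s in transcription if s in CONSONANTS)


counts = place_counts("?aha?e")
```

Counting feature codes rather than whole symbols is what later allows transcriptions to be compared as frequency distributions instead of segment by segment.<br />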

Choosing this approach we can anticipate the following methodological<br />
problems. Since pre-speech cannot be assumed to be organized into discrete<br />
phonemic segments, the first problem is one of representing a<br />
series of events as non-continuous. A related issue is the question of<br />
what notation to use when doing so. The choice of notational system when<br />
transcribing babbling varies in the literature. Although a separate system<br />
has been developed for transcribing babbling (Koopmans van Beinum et<br />
al. 1986) many researchers have chosen to use an expanded form of IPA<br />
adding diacritics developed to describe non-speech-like sounds (Oller et<br />
al. 1976, Cruttenden 1970, Kent and Bauer 1985, Vihman et al. 1985 amongst<br />
others). Bush, Edwards, Luckau, Stoel, Macken and Petersen (1973) offer<br />
such a set of diacritics for the specification of phonetic modifications of<br />
basic IPA segments. The Dutch model (Koopmans van Beinum et al. 1986) is a<br />
physiologically based transcription procedure. It relates phonatory and<br />
articulatory events to the speech production mechanisms of the vocal tract.<br />
The articulatory notations used can be interpreted as indicating the place<br />
and manner of articulation, degree of opening of the mouth and the<br />
configuration of the lips. This approach is similar to ours<br />
although the notations used by us were those of the IPA. In our study each<br />
IPA symbol was analysed into its articulatory features. When transcribing,<br />
a given symbol was selected with specific consideration of constituent<br />
articulatory features. In addition to making reference to the overall<br />
phonetic value of the uttered sound we would ask ourselves questions like:<br />
How and where are the consonant-like events of this sequence produced? What<br />
degree of opening do the vowel-like sequences seem to have and what are the<br />
tongue and lip positions? The answers to questions like these would<br />
facilitate the final selection of symbol. Even though we are aware that the<br />
physiology of the infant's vocal tract is quite unlike that of the adult<br />
(Kent and Murray 1982) we would often try to reproduce the sounds to be<br />
transcribed in order better to understand their features (c.f. Pike's term<br />
"imitation label technique" 1943:16). In this sense the way in which we<br />
used the IPA could well be said to be a physiologically based approach.<br />

Invented signs were used for productions for which there are no established<br />
IPA symbols, e.g. a bilabial trill (8). Modifications of the IPA symbols<br />
were made if necessary, using diacritics developed for the transcription of<br />
babbling (Bush et al. 1973). Features describing direction of breath<br />
(ingressive, egressive) and type of phonation, if deviant (breathy,<br />
squeaky, creaky, rough or pressed), were also included in our analysis.<br />

The choice of the IPA system raises the question of whether or not it is<br />
correct to use a notational system, developed to describe speech sounds, on<br />
non-speech sounds. Pike states (1943:150-151) that "the controlling<br />
mechanisms of non-speech sounds are quite similar to those of speech<br />
sounds". He goes on to say that "points of articulation are similar for the<br />
two groups" and that "types of articulation movements are likewise".<br />
Further he states that "degrees of stricture fall into the same general<br />
classes for both groups, and strictures interrupt the air stream in similar<br />
ways regardless of whether or not the sounds are used in speech". The<br />
similarities between speech and non-speech sound productions are due to the<br />
fact that they both use the same articulators. We feel that these<br />
similarities support our choice of notational system and transcriptional<br />
procedure.<br />



With regard to the Swedish language background of the transcribers and<br />
the possible effects of this on the transcribed material, it cannot be<br />
denied that there might be such an influence. A mitigating<br />
circumstance, however, is that the transcribers all were students of<br />
phonetics and thus trained to disregard language background effects when<br />
transcribing. Admittedly such effects are difficult to control since they<br />
result from unconscious processing of the perceived signal. We do however<br />
feel that the transcribers' awareness of this problem facilitated their<br />
more objective judgement of the vocalizations.<br />

To supplement the subjective judgements of the auditory analyses, acoustic<br />
analyses of the material are being made (Roug, Landberg and Lundberg<br />
forthcoming).<br />

As for the general problem of low inter-transcriber agreement (see e.g. Oller et<br />
al. 1976, Oller and Eilers 1982, Stockman, Woods and Tishman 1981), we<br />
found that the disagreement amongst transcribers decreased when the<br />
transcriptions were compared not on the segmental level, but in terms of<br />
frequency distributions of articulatory features. To make feature<br />
comparisons between transcribers possible a first step was to introduce the<br />
numerical coding described above. By choosing this approach a<br />
more stable transcriber-independent picture of the child's production<br />
pattern at a given age is achieved. In Table III transcriptions from three<br />
points in time for a single child are shown. For each of the infants the<br />
correlation coefficients for the four transcribers of consonantal place and<br />
manner of articulation across three points in time are seen in Table IV.<br />
We see that there is considerable disagreement with regard to what segments<br />
to use in the transcriptions but little disagreement as to what features<br />
are involved. This means that the four transcribers often would disagree<br />
about the value of the individual segment but would essentially agree, in<br />
statistical terms, on how and where the favored sound types had occurred in<br />
the vocal tract of a particular child.<br />

TABLE III. Examples of transcriptions by the four transcribers (LR, LJ, IL, BH) at three points in time.<br />
Transcriber | Week 19 | Week 33 | Week 54<br />
LR | a?ha | ?aaoaoaoaoao | hGpasa<br />
LJ | ?aha?e: | ale:le:llcel | Qbaye:<br />
IL | ?ce?a | ?aoe:01:Iaaur | ma9E<br />
BH | ?ae:?a | )«':!lcelcedlceal | dapayce<br />
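A comparison of two transcribers at the feature level can be illustrated with a product-moment correlation between their feature frequency distributions. The sketch below is only one plausible way of computing such a coefficient; the text does not say which correlation measure was used, and the counts are invented for the example.<br />

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical counts of four places of articulation
# (bilabial, dental/alveolar, velar, glottal) for two transcribers.
transcriber_1 = [12, 30, 8, 50]
transcriber_2 = [10, 34, 6, 50]

r = pearson(transcriber_1, transcriber_2)
```

Two transcribers may choose different symbols for almost every segment and still obtain a high coefficient, as long as their symbols share place and manner features in similar proportions.<br />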

In addition to the segmental analysis utterances were classified with<br />
respect to their sequential patterning of vowel and consonant segments,<br />
i.e. phonotactic structure. Each utterance was classified in terms of two<br />
parameters. One criterion was the phonotactic structure, the other was the<br />
phonetic features of the constituent consonant(s). The first parameter<br />
divided the utterances into five classes, the determining property being<br />
ability to match (part of) the utterance to one of the following five<br />
phonotactic patterns.<br />
(V stands for any string of one or more vowels and C stands for any string<br />
of one or more consonants).<br />

1 Non-consonant utterances: 0<br />
2 Single consonant utterances: C<br />
3 Open syllable utterances: CV<br />
4 Polysyllabic utterances with repeated consonant part of syllable: CiVCiV<br />
5 Polysyllabic utterances with varying consonant part (voicing, place or manner): CjVCiV<br />
(Ci ≠ Cj and Cj must be non-glottal)<br />
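The five patterns above can be illustrated with a small classifier. This is a sketch under several assumptions that are ours rather than the project's: utterances are plain strings, the vowel set and the glottal symbols ("?" for the glottal stop, "h") are ASCII stand-ins for IPA, all glottal consonants are treated as one and the same consonant, the highest matching pattern is the one chosen, and a consonant string counts as glottal only if all of its members are glottal.<br />

```python
VOWELS = set("aeiouy")   # illustrative ASCII stand-ins for IPA vowels
GLOTTALS = {"?", "h"}    # "?" stands in for the glottal stop


def runs(utterance):
    """Collapse a segment string into alternating ('C' or 'V', labels) runs."""
    out = []
    for seg in utterance:
        kind = "V" if seg in VOWELS else "C"
        # Assumption: all glottal consonants count as identical ("G").
        label = "G" if seg in GLOTTALS else seg
        if out and out[-1][0] == kind:
            out[-1] = (kind, out[-1][1] + (label,))
        else:
            out.append((kind, (label,)))
    return out


def phonotactic_class(utterance):
    """Return 1-5, taking the highest pattern that matches part of the utterance."""
    r = runs(utterance)
    kinds = "".join(kind for kind, _ in r)
    best = 2 if "C" in kinds else 1
    if "CV" in kinds:
        best = 3
    for i in range(len(r) - 3):
        if kinds[i:i + 4] != "CVCV":
            continue
        c_j, c_i = r[i][1], r[i + 2][1]
        if c_i == c_j:
            best = max(best, 4)            # repeated consonant part
        elif any(c != "G" for c in c_j):
            best = 5                       # varying part, first one non-glottal
    return best
```

With these assumptions `phonotactic_class("a?ha")` ignores the leading vowel and treats the two adjacent glottals as a single consonant string followed by a vowel.<br />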



[Table IV. Correlation coefficients between the four transcribers (LR, LJ, IL, BH) for consonantal place and manner of articulation, for each of the four children (V, K, J, M). The coefficients range from 0.76 to 0.99.]<br />


Since it was only required that part of an utterance match a specific<br />
pattern an utterance may well contain parts not covered by the pattern.<br />
This means that utterances belonging to any of the classes may contain a<br />
leading vowel without this affecting its classification. Likewise classes 2<br />
through 5 may contain trailing consonants. (Utterances showing trailing<br />
consonant strings containing non-glottal or non-nasal features were in fact<br />
marked specially, which actually induces a further refinement of the<br />
partitioning).<br />

The second parameter divided the utterances into six classes, the<br />
determining property being presence of specified type of consonant.<br />

A No consonant<br />
B Glottal consonant<br />
C Non-glottal consonant with non-complete closure<br />
D Sonorants and glides<br />
E Non-glottal consonants with complete closure<br />
F Consonant clusters<br />

All glottal consonants are counted as identical. A glottal consonant with<br />
an adjacent non-glottal consonant was considered identical with the<br />
non-glottal. A cluster is defined as a string of two or more consonants.<br />
(Note that from this follows that a string consisting of one glottal and<br />
one non-glottal consonant does not count as a cluster. Further a glottal<br />
stop and a glottal fricative do not give rise to variation).<br />

The classes are conceived as ordered in the sense that when a particular<br />
utterance can be classified as belonging to more than one class, the higher<br />
class should be chosen (5 rather than 1, F rather than A). The order is<br />
supposed to mirror the development of the child.<br />
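The second parameter, together with the rule that the higher class is chosen, can be sketched as follows. The manner inventory is hypothetical (in particular, nasals are placed among the sonorants, which the text does not state), "?" stands in for the glottal stop, and we assume that a glottal consonant neither extends nor interrupts a string of non-glottal consonants.<br />

```python
VOWELS = set("aeiouy")       # illustrative ASCII stand-ins for IPA vowels
GLOTTALS = {"?", "h"}        # class B
FRICATIVES = set("fvsz")     # non-complete closure, class C (hypothetical set)
SONORANTS = set("mnlrjw")    # sonorants and glides, class D (hypothetical set)
STOPS = set("pbtdkg")        # complete closure, class E (hypothetical set)


def consonant_class(utterance):
    """Return the highest consonant class (A-F) present in the utterance."""
    best = "A"
    run = 0  # length of the current string of non-glottal consonants
    for seg in utterance:
        if seg in VOWELS:
            run = 0
            continue
        if seg in GLOTTALS:
            cls = "B"  # assumption: glottals leave the run length unchanged
        else:
            run += 1
            if run >= 2:
                cls = "F"  # two or more non-glottal consonants: a cluster
            elif seg in FRICATIVES:
                cls = "C"
            elif seg in SONORANTS:
                cls = "D"
            elif seg in STOPS:
                cls = "E"
            else:
                raise ValueError(f"unknown segment: {seg!r}")
        best = max(best, cls)  # the higher class wins (F rather than A)
    return best
```

Because the class labels are single letters in alphabetical order, `max` on the labels implements the ordering directly.<br />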

Examples of members of the various categories are shown below.<br />

[Table of examples: phonotactic classes 1-5 (0, C, CV, CiVCiV, CjVCiV) as columns and consonant classes as rows, beginning with A (+vowel) and B (+glottal).]<br />


with various features of the IPA segments. The following figures show the percent occurrence of the most frequent features and categories found in our data. It is interesting to note that there are features and categories that are more frequent at certain times and less so at others, and some which do not occur at all in the data.<br />

Age is presented on the abscissa and percent occurrence on the ordinate. The percent occurrence in the curves is presented cumulatively. The letters refer to the four infants, the two girls, V and K, and the two boys, J and M. The individual curves for each of the infants are numbered 1 to 4, i.e. V=1, K=2, J=3 and M=4.<br />
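Since the curves are cumulative, each curve is the running sum of the percentages of the categories plotted below it. A minimal sketch of that computation for a single age interval follows. The function name and the counts are invented for illustration and are not data from the study.<br />

```python
from itertools import accumulate


def cumulative_percent(counts):
    """Map each category to the running total of percent occurrence, in dict order."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    percents = [100.0 * n / total for n in counts.values()]
    return dict(zip(counts, accumulate(percents)))


# Invented counts for one age interval (illustration only).
counts = {"glottal": 60, "velar": 20, "bilabial": 15, "dental/alveolar": 5}
curves = cumulative_percent(counts)
```

With these counts the last curve reaches 100 percent, and the vertical distance between two neighbouring curves is the share of a single category.<br />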

[...] there is an infinite number of possible places of articulation in the vocal tract (cf. Pike 1943). However only a small part of these places are used in speech. The eleven points of articulation [...] pharyngeal, uvular, velar, palatal, [...] alveolo-palatal, retroflex, dental/alveolar, labiodental [...] in our data in very low numbers and others not at all. [...] bilabial, dental/alveolar, velar and glottal constitute 9? percent of the number of places used by the four infants over the whole period studied. Palatal and uvular articulations occur in four and [...] places of articulation (retroflex, palato-alveolar, alveolo-palatal and [...] do not occur at all in our data. The four most frequently used places are presented in greater detail below.<br />

In Figures 1:1 through 1:4 the development of place of articulation is presented. We see that the prevailing place of articulation in the 1-5 months period in three of the infants, V, K and J, is glottal and that this dominance rapidly declines in the second half of the first year. The fourth child, M, has a preference for nasals during the first months, resulting in his glottal peak appearing later (5-7 months). For three of the infants, V, K and M, there appears to be a following period of velar/uvular productions. A majority (75%) of these productions are velar; however, the two places of articulation have been added since they often were difficult to distinguish from each other when transcribing. For the fourth child, J, a prolonged glottal period seems to compensate for the velar/uvular articulations. As the glottals and velars decline the bilabial and dental/alveolar productions take over. There does not seem to be any general order of [...] to the bilabial and dental/alveolar place of articulation. Two of the infants, K and M, develop the bilabial articulation before the dental/alveolar. One infant, V, acquires the dental/alveolar place first, whereas the last infant, J, lacks a clear preference until the beginning of the second year of life. This preference is then dental/alveolar. At the last sampling point three of the infants, V, J and M, have no clear preferences as far as front place of articulation is concerned. In the fourth infant, K, however, the earlier bilabial preference has changed to dental/alveolar.<br />

[Figures 1:1 through 1:4: one panel each for Child V, Child K, Child J and Child M. Ordinate: percent occurrence of features (0-100). Abscissa: age in months (1-3, 3-5, 5-7, 7-9, 9-11, 11-13, 13-15, 15-18, 18-20).]<br />

According to IPA nine different manners are used to describe arbitrary consonantal speech sounds in any language: plosive, nasal, lateral, lateral fricative, rolled, flapped, rolled fricative, fricative, frictionless continuants and semi-vowels. It is of considerable interest to note that in our data only the following manners occurred: plosive, nasal, lateral, rolled (trill), fricative and semi-vowel. These differ considerably in frequency of occurrence. Just as in the case of place of articulation, a few of the categories dominate the scene while others occur in low numbers or not at all. We find that plosives, nasals and fricatives constitute 91 percent of the manners used by the four infants over the whole period studied. Semi-vowels, laterals and vibratory trills constitute four, three and two percent respectively, while lateral [...] flaps [...] and rolled fricatives are non-existent.<br />

The manner of articulation for the four infants is seen in Figures 2:1 through 2:4. The general pattern here is somewhat less dramatic over time compared to place of articulation. The early productions in the one to five months period are mainly stops, fricatives and nasals. We see that there is a major shift in number of full stop consonants around 9-11 months of age in three of the infants, V, K and M, while the fourth child, J, has his peak in the 11-13 months period. This increase follows the onset of reduplicated consonant babbling prime (see page 14). Liquids are present throughout the study but increase towards the end of the second year of life, at least in three of the infants, V, K and M; the fourth child, J, has fewer liquids and the amount does not seem to increase. The increase of liquids in the 7-9 months period is mainly caused by bilabial and uvular trills; again this is true for three of the infants, K, J and M. Infant V has only laterals at this point. With regard to semi-vowels, they are comparatively few and do not exhibit any major changes in number. Infant J has an increase towards the end of the study while infant M has a decrease. With regard to fricatives the general trend seems to be that of a gradual decrease towards the end of [...] life. It should be kept in mind that we do not differentiate between glottal and supraglottal fricatives in these curves. Considering the frequency of occurrence of glottals, we suspect that the large amount of fricatives found in our data is heavily biased by the glottal productions and that the fricatives decreasing towards the end of the study are the glottal ones. Concerning nasals, these are more frequent in the early productions than in the late.<br />

[Figures 2:1 through 2:4: one panel each for Child V, Child K, Child J and Child M. Ordinate: percent occurrence of features (0-100). Abscissa: age in months (1-3, 3-5, 5-7, 7-9, 9-11, 11-13, 13-15, 15-18, 18-20). Legend: stop, fricative, nasal, liquid, semi-vowel.]<br />

FIG 2:1-2:4 The above Figures show percent occurrence of consonant manner of articulation as a function of age for each of the four infants.<br />

If we compare Figures 1 and 2 a general picture emerges of the segments used across time in the infants' vocalizations. From the manner and place curves we conclude that a majority of the fricatives and stops produced in the one to five month period are glottal. The dental/alveolar productions during the same period are mainly nasal but also fricatives occur. The velars are fricatives or nasals and the early bilabial articulations are semi-vowels, fricatives and nasals. As the child grows older, dental/alveolar, bilabial and velar stops become more frequent, mainly at the expense of the fricatives.<br />

In Figures 3:1 through 3:4 we see the degree of opening of the vowel-like sounds for each of the four infants, plotted as a function of age. The general pattern is that of non-high, non-low vowels dominating the early productions. At the end of the study a more diversified picture emerges. If compared with Figures 4:2 through 4:4, which show the occurrence of back, center and front vowel articulations, a general pattern emerges of mainly [...] life, versus more differentiated vowel qualities (i, e, ae, a) in later part of the first year.<br />

[Figures 3:1 through 3:4: one panel each for Child V, Child K, Child J and Child M. Ordinate: percent occurrence of features (0-100). Abscissa: age in months (0-3, 3-6, 6-9, 9-12, 12-15, 15-18, 18-21).]<br />

FIG 3:1-3:4 These Figures show percent occurrence of degree of opening for vowels presented as a function of age for each of the four infants. Since a majority of the occurring vowels were front or central, these have been chosen to exemplify degree of opening in this Figure.<br />

It is interesting to notice the total absence of back vowels in the early productions. Back vowels begin to appear in the second year of life. With regard to the features rounded, unrounded, there is a total dominance of unrounded vowels over the whole period studied. Our results are similar to those reported in the literature. Kent and Murray (1982) report, in their acoustic study of 21 infants at 3, 6 and 9 months of age, that the 3 to 9 months period is dominated by "relatively mid-front or central articulations". Similarly Cruttenden (1970), in his study of his own two twin daughters, found that vowels of "the (.e) (a) (a) type predominated throughout the babbling period". Kent and Bauer (1985) analysed five infants at 13 months of age and report that "central and front vowels were favored over back vowels, and low vowels predominated over high vowels". Buhr (1980), who followed a child from the age of 16 to 64 weeks with biweekly recordings, finds that the acute axis (i-ae) develops before the grave axis (u-a) and explains this by the earlier development of the jaw musculature. Bickley (1983) reports similar preferences in the vowels of 14 infants' early word productions between one and two years of age. Her results show that the F1 dimension (i-a) developed before that of F2 (i-u). That is, back vowels did not occur before late in the infants' repertoire.<br />

As mentioned earlier a categorization according to phonotactic structure was made of the utterances. We were interested in seeing when utterances of different consonant and vowel structure occurred in the child's productions and how they developed over time. In Figures 5:1 through 5:4 the [...] reduplicated (RB) (which contains a secondarily modified category called reduplicated consonant babbling prime (RB')), non-reduplicated consonant babbling (NRB), variegated consonant babbling (VB), non-consonant babbling (NCB) and glottal consonant babbling (GB) are presented for each of the four infants.<br />

[Figures 4:2 through 4:4: one panel each for Child K, Child J and Child M. Ordinate: percent occurrence of features (0-100). Abscissa: age in months (0-3, 3-6, 6-9, 9-12, 12-15, 15-18, 18-21). Legend: front, center, back.]<br />

FIG 4:2-4:4 The above Figures show percent occurrence of tongue position presented as a function of age. Data are only available for three of the four infants.<br />

The category reduplicated babbling (RB) consists of utterances where a non-glottal consonant is repeated one or more times, e.g. mamama, lala, [...] A sub-category to RB contains utterances produced with a complete supraglottal constriction, e.g. dadada, papa. This class will be referred to as reduplicated consonant babbling prime (RB'). The next class, the non-reduplicated consonant babbling (NRB), contains utterances with supraglottal non-reduplicated consonants, e.g. aba, na, lal, af, ga. The category variegated consonant babbling (VB) consists of utterances with alternating consonants. These alternations may be of manner, place or voicing, e.g. naeda, bada, daeta. The non-consonant babbling category (NCB) consists, as the name implies, of utterances lacking consonants, i.e. utterances containing only vowel modulations, e.g. a, ai. The category glottal consonant babbling (GB) consists of utterances containing only glottal consonants, e.g. ?oh, ? ?, hae.<br />
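The category definitions above can be read as a small decision procedure. The sketch below is one possible reading, not the authors' transcription procedure. The ASCII segment inventory, the function name, and two decisions are assumptions of this sketch: an utterance counts as reduplicated only when at least two consonant-plus-vowel syllables share the same supraglottal consonant (so that "lal" stays in NRB, as in the examples), and any utterance with two or more different supraglottal consonants is treated as variegated.<br />

```python
# Hypothetical, simplified ASCII inventory (an assumption of this sketch).
VOWELS = set("aeiou")
GLOTTALS = {"?", "h"}
STOPS = set("pbtdkg")  # consonants with complete supraglottal closure


def classify_babble(utterance):
    """Return 'NCB', 'GB', 'VB', "RB'", 'RB' or 'NRB' for a transcribed utterance."""
    consonants = [s for s in utterance if s not in VOWELS]
    supraglottal = [c for c in consonants if c not in GLOTTALS]
    if not consonants:
        return "NCB"  # only vowel modulations
    if not supraglottal:
        return "GB"   # only glottal consonants
    if len(set(supraglottal)) > 1:
        return "VB"   # alternating consonants
    consonant = supraglottal[0]
    # Count consonant + vowel syllables built on the single consonant type.
    cv_syllables = sum(
        1
        for seg, nxt in zip(utterance, utterance[1:])
        if seg == consonant and nxt in VOWELS
    )
    if cv_syllables >= 2:
        return "RB'" if consonant in STOPS else "RB"
    return "NRB"


examples = ["mamama", "dadada", "aba", "lal", "bada", "ai", "?oh"]
labels = {u: classify_babble(u) for u in examples}
```

Under these assumptions the examples quoted in the text fall into the categories the text assigns them: mamama is RB, dadada is RB', aba and lal are NRB, bada is VB, ai is NCB and ?oh is GB.<br />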

We read from the Figures that NRB is present in the repertoire of all the infants from an early age and that it increases dramatically shortly after the onset of RB' at around eight months of age. As a contrast to the sudden onset of RB' we see the more gradual appearance of the category RB as a whole. The area between the dotted and the solid line of that category consists of reduplicated utterances with fricatives, nasals, liquids and semi-vowels as consonants, e.g. o aa a, mama, lalala and .. , awa. As can be seen, these types of reduplication precede RB'. It is interesting to notice the scarcity of these babbles. Reduplicated nasal utterances are almost [...] In our experience nasals often occur in discomfort sounds. Our excluding discomfort sounds from the data might therefore account in part for the low occurrence of reduplicated nasal utterances.<br />

[Figures 5:1 through 5:4: one panel each for Child V, Child K, Child J and Child M. Ordinate: percent occurrence of categories (0-100). Abscissa: age in months, in overlapping one-month intervals up to 19-20. Legend: reduplicated babbling, variegated babbling, non-reduplicated babbling, non-consonant babbling, glottal babbling, other.]<br />

FIG 5:1-5:4 These Figures show percent occurrence of the phonotactic categories for each of the four infants as a function of age. The dashed line shows the percent occurrence of the subcategory Reduplicated Babbling Prime (RB').<br />

We see that VB is present in the early productions and that it increases towards the end of the first year. The infants appear to divide into two groups with regard to amount of VB at the end of the study. Infants V and J have a majority (>45%) of VB at the last sampling point, whereas K and M have below fifteen percent.<br />

The NCB category decreases towards the end of the first year. The maximum peak of occurrence appears about a month before the onset of RB' in three of the infants, V, K and J. The fourth child, M, has an extensive period of vocal play with a maximum peak about 3 months before the onset of RB'.<br />

With regard to glottal utterances we see that the early glottal dominance is altered by the supraglottal articulations introduced mainly in the second half of the first year, a trend that has already been demonstrated by the shift in place of articulation (see Figure 1).<br />


Summarizing the data presented so far we find five developmental babbling stages in the period studied. These are:<br />

I the glottal stage<br />
II the velar/uvular stage<br />
III the vocalic stage<br />
IV the reduplicated consonant babbling stage<br />
V the variegated consonant babbling stage<br />

The ages at which the different milestones occur in the infants can be seen in Figure 6. For reference the results from a Dutch study of 51 infants are superimposed (van der Stelt and Koopmans van Beinum 1986).<br />

[...] first sampling point between eight and twelve weeks of age. This stage is characterized by utterances with glottal consonants and unrounded, often nasalized central vowels. The second most frequent type of vocalization during this period involves syllabic nasals. There is a certain amount of individual variability as to which of [...] considerable amount of glottal consonants in the babbling. As a result of the shift to supraglottal articulations that comes with the RB stage, the frequency of occurrence of glottals shows a drastic drop.<br />

In stage II we notice a first use of supraglottal articulations which are non-nasal. These are typically articulated at the [...] place of articulation. The consonants typical of this stage are voiced fricatives.<br />


[Figure 6: milestones I through V on the ordinate plotted against age in weeks (0-80) on the abscissa, with curves labelled for the infants V, K, J and M.]<br />

FIG 6 This Figure shows the age of occurrence of the milestones I through V for each of the four infants. The mean age of onset of the fourth milestone, RB', is 34 weeks with a standard deviation of seven weeks. The individual ages of onset are J 26, V 33, M 34 and K 43 weeks. Superimposed are the findings of a Dutch study (van der Stelt et al. 1986) of 51 infants where the mean age of onset for reduplicated babbling was found to be 31 weeks with a standard deviation of six and a half weeks.<br />
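The summary statistics in the caption can be checked directly from the four individual ages of onset. The short sketch below does so. That the reported standard deviation is the sample standard deviation (n - 1 in the denominator) is an assumption; the variable names are illustrative.<br />

```python
from statistics import mean, pstdev, stdev

# Ages of onset of RB' in weeks, as listed in the caption of Figure 6.
onset_weeks = {"J": 26, "V": 33, "M": 34, "K": 43}
ages = list(onset_weeks.values())

mean_onset = mean(ages)       # arithmetic mean of the four ages
sample_sd = stdev(ages)       # n - 1 in the denominator
population_sd = pstdev(ages)  # n in the denominator
```

The mean is 34 weeks, and the sample standard deviation rounds to seven weeks, in agreement with the caption. The population formula would give about six weeks instead.<br />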

[...] is that of incomplete closures outnumbering complete ones. This stage would correspond to what has been called the "gooing" and "cooing" stage in the literature (Oller 1980, Stark 1980). This milestone occurs as an expansion of velar/uvular articulations resulting in this place of articulation becoming the second most frequent between fifteen and nineteen weeks of age.<br />

The vocalic stage, which follows the velar/uvular stage, is a period in which the infants produce a large number of non-consonant utterances. These utterances are best described as relatively long vocalizations with non-speech-like intonation patterns (Roug et al. forthcoming), resembling singing patterns rather than speech. Bilabial trills, prolonged and unreleased bilabial stops with lots of saliva, inspiratory uvular trills (snorts) etc. are also introduced during this period. It might be said that during this stage the infant explores both the periodic and non-periodic sound sources of the vocal tract. The vocalizations are varied both in the intensity and frequency domain as well as in mode of phonation, i.e. voice quality. These modulated vowel utterances are present over the whole period studied, but there is an expansion in the number of productions, resulting in a period of maximum occurrence of NCBs, in the months preceding the onset of RB'. It is according to this period that the stage is defined.<br />

[...] the child develops the ability to reproduce syllables in sequences. Most of these utterances are produced with a full stop consonant contrasted with an open, central vowel [...] adad [...] These sequences are typically rhythmic in their alternations. Rhythmic behavior, such as rocking and kicking, has been found to develop around six months of age in normal infants (Thelen 1981). Reduplicated babbling can be considered one of these rhythmic [...] Several infants have, according to the reports of some of the parents, been observed moving the jaw rhythmically up and down without phonating, a few days before the onset of RB'. The amount of utterances with nasal, semi-vowel and fricative reduplications constitutes a minor portion of the total number of reduplicated utterances. The emergence of this milestone is defined according to the abrupt onset of reduplicated full stop consonant babbling (RB'). The age of onset of this stage varies in our infants from week 26 to week 43.<br />

The onset of the last stage, the variegated consonant babbling stage, is defined according to the first sampling point after the onset of RB' at which there is a major increase in the number of productions with alternating consonants. The variegated utterances are found throughout the study but increase dramatically towards the end of the first year of life. This type of babbling can be considered an elaborated form of reduplicated [...] both with regard to segmental and suprasegmental features. We find that the intonational patterns and the segmental [...] however not the case in variegated consonant babbling. Here the child produces a variety of consonants overlaid on what seems to be a typical sentence-like intonation pattern. The child also varies between syllables that are perceived by the listener as being stressed and unstressed. The extent to which the child explores this type of babbling seems to be highly individual. As mentioned earlier two of our infants, V and J, have a high percentage of variegated babbling at the last recording session, whereas the other two, K and M, do not seem to favor this type of babbling (see Figure 5). At this point of development it is important to remember that what we here regard as babbles might in fact be early words. The increase of NRB in K at 14 months might be an expansion of her early lexicon. Infant M shows a similar expansion of NRB at 15 months. It might be that K and M have what is known as a more analytic approach to language (Vihman 1986), thus preferring shorter, single word-oriented utterances rather than the holistic [...]<br />


[...] (Oller 1980, Stark 1980, Koopmans van Beinum et al. 1986) [...] similarities are found. [...] milestones for the studies mentioned. The overall picture is one of general agreement in discerning five milestones in the babbling of normal infants during the first year of life. However some disagreement as to age of onset and duration of the various stages exists, probably owing to individual variation of the infants in the groups studied and varying methodological approaches.<br />

Ol ler (1980 ) does not ment ion glottal<br />

consonants as a character istic<br />

feature of the first babb l ing stage , instead he stresses the nasal qual ity<br />

of the vocal i zations. However he does mention the ex i stence of<br />

throaty sounds" (p.95 ) when refe rring to other authors' findings . It might<br />

be that thse sounds are in fact glottal . Stark (1980 ) talks of "ref lexive<br />

voc a l i zat ions" as the first stage and states that the consonants of the<br />

newborn per iod "are almost always glottal stops, nas als or l i q uids" (p.77) .<br />

The Dutch data (Koopmans van Be i num et al . 1986 ) suggsts "glottal stops in<br />

ser ies" as a main character istic of this stage . We find ma i nly glottal<br />

consonants bu t also sy l l ab ic nasa ls in ou r study . If we consider the<br />

anatomical /physiological aspects of the very young infant 's vocal tract we<br />

find that there are considerabl differences compared with that of the<br />

ad ult. Kent and Murray (1982 ) l i st several such important diffrences and<br />

stress the importance of conside ring these facts when "ex plaining patterns<br />

of hange in infant vocal i zat ion s" . The very young infant is "an ob l i g ate<br />

nasa l breat her and an ob l i gate nasal vocal izer " accord ing to the authors .<br />

Th is is exp lai ned by th fact that there is "en gagement of larynga l and<br />

ve l opharyngea l structures " lea ding to mai nly nasa l ex it of air. The<br />

separat ion of these structures deve lop at four to six months of age around<br />

125


[Figure 7. Chart with one column per study: Oller (1980), Stark (1980), KvB et al. (1986), Roug et al. (1987). Ordinate: age in months, 1 to 20. Stage boundaries are drawn as dashed lines.]<br />

FIG 7 This figure shows a compilation of the findings of four<br />
studies of infants' early vocal development. Each column refers to a study.<br />
The approximate age of onset and duration of each stage is indicated by the<br />
time scale on the ordinate. The stages are separated by dashed lines to<br />
indicate the continuity between stages.<br />



the time of onset of reduplicated babbling. With regard to glottals, we<br />
explains the prevalence of glottal productions in the early repertoire.<br />

In the following babbling stage, the goo- or coo-stage, the consonant<br />
sounds "have predominantly a velar place of articulation" according to<br />
Stark (1980: 95). Similarly, Oller (1980: 96) states that there is a "velar<br />
consonant preference". In the Dutch study (Koopmans van Beinum et al. 1986)<br />
the main characteristic of this stage is considered to be the "onset of<br />
articulatory movements". According to our data these movements are produced<br />
by raising or retracting the back of the tongue to form mainly incomplete<br />
constrictions against the velum or uvula. It has been proposed that the<br />
high occurrence of dorsal sounds is due to specific body posture (cf.<br />
Locke 1983). The hypothesis states that in the supine position the effects<br />
of gravity upon the soft structures of the vocal tract would result in a<br />
predominance of back articulations. Oller and Gavin<br />
analyzed ten infants at the ages of one to four months, when gooing is<br />
supposed to be predominant. The infants were recorded in both supine and<br />
upright position. The results revealed no support for the body posture<br />
hypothesis. The infants did not produce more goo-sounds in the supine<br />
position; in fact there was a slight preference for goo-sounds in the<br />
upright position. Another possible explanation for the occurrence of<br />
babbling stages is to regard them as results of neurological maturation.<br />
Accordingly, the velar articulations found in this stage could be accounted<br />
for by neurological maturation factors such as the earlier myelinization of<br />
the cranial nerves responsible for the muscular control of the posterior<br />
part of the tongue (Lecours 1975) and/or the earlier myelinization of<br />
the corresponding area in the primary motor cortex (Whitaker 1973, Salus<br />
and Salus 1974). However, recent data on early neurological development<br />
(Rakic, Bourgeois, Eckenhoff, Zecevic and Goldman-Rakic 1986) suggest that<br />



this might not be a correct interpretation. The neurological maturational<br />
processes do not appear to develop in a hierarchical order but rather as a<br />
global process. Recent data also suggest that myelinization might not be<br />
necessarily relevant to the function of the nerve (Foster, Connors and<br />
Waxman 1982).<br />

vowel-like articulations is found. Koopmans van Beinum et al.<br />
(1986)<br />
the phonatory domain concerning intonation, duration and intensity".<br />
Stark<br />
(1980)<br />
such as pitch level, pitch change and loudness are manipulated. Apart from<br />
marginal babbling and vowel-like elements ("fully resonant nuclei") Oller<br />
(1980) talks of raspberry vocalizations, squealing, growling and yelling as<br />
typical vocalization categories of this stage. Oller (1986: 29) describes<br />
these vocalizations as representing an "exploration of a vocal<br />
potential"<br />
and of an<br />
intensity, frequency and phonatory domain. The ability to control these<br />
aspects of vocal production could be regarded as vital to later speech<br />
development. After having explored the vocal capacities the child can begin<br />
to modify and refine them according to the requirements of the ambient<br />
language.<br />

A major phonetic milestone during the first year of life is the production<br />
of reduplicated babbling. The onset of this milestone has been found to be<br />
fairly sudden, occurring around seven months of age. According to Oller<br />
(1980: 99), this is the first stage in which the child produces syllables<br />
"that conform to mature natural language restrictions", i.e. syllables that<br />
could be accepted from a phonological point of view. This is an important<br />
achievement since it means that the infant now, as a fortuitous consequence<br />



of a natural movement, can produce vocalizations acceptable as words in an<br />
adult language. Consequently adults now begin to report "words" from their<br />
infants, or that they have begun to "talk". Interestingly, in what is known<br />
as "Baby Talk" a simplified lexicon is used by adults (or older siblings)<br />
when addressing infants, where the principle of reduplication is applied,<br />
giving rise to forms such as "vovve", "pippi" and "totto" in lieu of<br />
Swedish "hund" (dog), "fågel" (bird) and "häst" (horse). The fact that the<br />
words used in many languages to denote the two most important people (mommy<br />
and daddy) for the child have the same repetitive structure strikes one as<br />
being more than a coincidence. Instead it seems likely that the adult<br />
language has chosen phonetic forms similar to those of the child's own<br />
production patterns, thereby creating a link of denotative function between<br />
the infant's vocalizations and the adult language (cf. Locke 1983). McCune<br />
and Vihman (1987) have suggested that the infant, when producing his/her<br />
first words, might be selecting adult words on the same basis, i.e. the child<br />
only attempts to produce those words that fit the production patterns of<br />
that child's babbling repertoire. As mentioned earlier, reduplicated<br />
babbling is a rather monotonous type of babbling, both with regard to<br />
intonation and to the constituent consonant and vowel segments. Therefore<br />
the major characteristic of this stage could be said to be the<br />
reduplication itself.<br />

In the following milestone, the variegated (or non-reduplicated) babbling<br />
stage, the child seems to combine certain features of the intonational<br />
variations of the previous vowel-stage with the reduplicated utterances,<br />
resulting in sentence-like strings of babble with alternating consonants,<br />
vowels and patterns of stress. Oller (1980) refers to this last type of<br />
babbling with contrasts of syllabic stress as "gibberish"; others have used<br />
the term "jargon" (Menn 1978). Stark (1980) differs from Oller by including<br />
non-reduplicated utterances (e.g. V, VC, CVC) in this stage. We find, along<br />



the lines of Stark, that there is an expansion in the number of NRBs<br />
following the onset of reduplicated babbling.<br />

From the above observations the tentative conclusion can be drawn that<br />
infants follow a universal developmental phonetic course in their babbling<br />
life. Babbling develops from what might be<br />
considered a proto-syllabic form into more speech-like phonetic events<br />
acceptable as parts of mature natural languages (cf. Oller 1986). If the<br />
phonetic repertoire of the babbler is compared with the phonetic patterns<br />
most commonly found in the languages of the world, i.e. language<br />
universals, clear similarities are found. The preference for open (CV)<br />
syllables over closed (VC), single consonants over clusters, voiced initial<br />
stops over voiceless, unvoiced final obstruents over voiced, initial stops<br />
being apical rather than dorsal and final obstruents being preferably velar<br />
or glottal are examples of such similarities. These parallels are<br />
interesting since they imply continuity in babbling and speech.<br />

If one accepts the continuous view, the questions arise of how and when<br />
babbling begins to show influence from the ambient language. There are<br />
those who believe in an early detectable influence in the infant's babbling<br />
(de Boysson-Bardies 1982, de Boysson-Bardies, Sagart and Durand 1984, de<br />
Boysson-Bardies, Sagart, Halle and Durand 1986) and those who believe that<br />
language dependencies begin to affect the child's productions at a<br />
relatively late stage (Locke 1983). The similarities between babbling and<br />
language universals have been taken to support the view that early vocal<br />
development is due to innate biological prerequisites for language<br />
(Locke 1983). We consider it important to distinguish between the segmental<br />
and the supra-segmental levels of vocal behavior when discussing phonetic<br />
development. In our view the supra-segmental features specific to a given<br />
language might be acquired earlier than the segmental. However, as Vihman et<br />



al. (1986) point out, we do not yet know what features in the babbling of<br />
young infants are due to exposure to a specific language versus language<br />

Considering the present data, two major questions arise,<br />

We believe that a tentative answer to the first question lies in the<br />
understanding of man as a communicative being. Communicative behaviors are<br />
found in all flock animals and constitute a necessity in order for the<br />
members of the flock to function as a whole (Wilson 1980). In species where<br />
the young offspring demand a great deal of attention and care from the<br />
mother it is of vital importance that the parent-offspring relation be<br />
strong and fundamental. The newborn infant is incapable of caring for<br />
itself. It is in other words totally dependent on the caregiver for<br />
survival. From this point of view it is not difficult to see why the infant<br />
develops communicative behavior in response to the caregiving treatment.<br />
The function of this behavior is to tie the two individuals emotionally to<br />
each other. We know that infants synchronize their body movements to that<br />
of adult speech (Condon 1974) and that they are born with the capacity to<br />
imitate facial expressions and identify vocal sounds (Kuhl and Meltzoff<br />
1984b). Further, it has been observed that infants by the end of the first<br />
month of life begin to respond to speech with vocalizations (Trevarthen<br />
). This, we believe, implies a biological component responsible for the<br />
foundations on which the communicative behavior is based.<br />

It is known that infants also vocalize outside of communicative situations.<br />
They babble when playing alone. Therefore babbling cannot be seen as having<br />
an exclusively communicative function, nor can speech (cf. Piaget 1973,<br />
Vygotsky 1971). There are individual needs in the infant such as self-<br />



stimulation and play to be considered. Just as play is an important part in<br />
the child's learning to control its constantly growing body, babbling can<br />
be considered as exploratory play aimed at controlling the rapidly changing<br />
apparatus (cf. Fry 196 ). It has been suggested that experience is<br />
not essential to the normal development of vocal behavior, as indicated by<br />
reports from infants who for medical reasons have been prevented from<br />
babbling (Lenneberg 1967). However, the tracheotomized infant referred to by<br />
Lenneberg, who had a tube inserted at eight months of age and had it removed<br />
at fourteen months, had most probably begun to produce reduplicated babbling<br />
at the time of insertion, in which case the effects of the tube on phonetic<br />
development might have been minor. This question, as well as the question of<br />
the importance of auditory experience in babbling, is still under debate.<br />
The reports stating that deaf infants babble as hearing infants do<br />
(Lenneberg 1967) are contradicted by recent data (Oller 1986, Stoel-Gammon<br />
and Otomo 1986) suggesting that deaf infants do not produce reduplicated<br />
full stop babbles during the first year of life.<br />

The answer to why the repertoire of the babbler seems to be a universal one<br />
might be found in anatomical and aerodynamic constraints on the vocal<br />
mechanisms with considerations of the physiological and neurological stage<br />
of development. If one thinks of the articulators in terms of a spring and<br />
mass system, where a certain amount of impedance is present in each<br />
movement, one can explain some articulations as being more economical, i.e.<br />
more easily produced, than others. By economical is meant those movements<br />
that require the least amount of energy in order to be performed in<br />
relation to the structures involved, a sort of articulatory cost-benefit<br />
analysis (cf. Lindblom, MacNeilage and Studdert-Kennedy, in press). The<br />
selectivity of the babbler in relation to phonetic preferences might be<br />
understood in similar terms. The child produces those articulations that<br />
are the most economical and that acoustically are the most salient. This<br />



implies that normal babbling presupposes intact auditory function.<br />

If one considers the reduplicated babbling stage in similar terms, the<br />
opening and closing movements of the jaw resulting in syllable-like events<br />
might be viewed as an oscillating system, the constraints of which are set<br />
by neurological maturation. The consequences of this reduplicated behavior<br />
might be that the infant so to speak discovers the syllable and indirectly<br />
the supraglottal full stop articulation. In order to voluntarily control<br />
and coordinate a movement it might be necessary first to produce this<br />
movement repetitively (cf. Thelen 1981). In the case of babbling, factors<br />
such as the length of the breath cycle would determine the duration of the<br />
early repetitive vocalizations. Later, as the ability develops to control<br />
movements voluntarily, the child can free itself of such bonds and more<br />
freely determine the number of reduplications.<br />

To conclude, the evidence amassed in the literature as well as that<br />
presented here strongly implies that babbling can be considered a precursor<br />
of speech, in form as well as in function.<br />

The authors are greatly indebted to professor Björn Lindblom for his<br />
assistance and helpful comments on the manuscript.<br />

We would also like to thank Marilyn Vihman for being a great source of<br />
inspiration and for commenting on the manuscript.<br />

We thank the parents and the infants who participated in this project for<br />
their patience and enthusiasm without which this project never would have<br />



We are indebted to Karin Holmgren for her initial work in collecting and<br />
analysing data and we thank Boel Harlid for her assistance in transcribing<br />
the material.<br />

docent Birgitta Jailing, doctor Göran Aurelius and doctor Ann-Sofie<br />
Ericsson at Sankt Göran's Children's Hospital for interesting and fruitful<br />
collaboration.<br />

The authors would like to express their thanks to Harved Hellichius for<br />
drawing the Figures and Tables of this article.<br />



FOOTNOTES<br />

1 Sponsored by Första Majblommans Riksförbund (First of May Flower Annual<br />
Campaign for Children's Health).<br />

2 The children were medically examined at birth and a psychomotor<br />
development test (The Griffith Mental Development Scale) was performed at<br />
5, 10 and 19 months of age. The children's results were found to be well<br />
above those of a standard group on all occasions (Norberg 1984).<br />

3 The equipment used was a Sony table microphone and a Uher tape-recorder.<br />

4 The signal was computer digitalized and edited with an ILS program called<br />
MIX, developed at the Royal Institute of Technology in Stockholm. The<br />
actual cutting point in the signal would be an intensity zero point as<br />
close as possible to where the child's utterance began. We were careful not<br />
to exclude initial or final voiceless segments and we thought it important<br />
that the utterance was not distorted by the clipping so as to sound<br />
unnatural.<br />
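
The cutting rule in footnote 4 can be illustrated with a short sketch. It is<br />
not the MIX program, whose internals are not described here. It assumes that<br />
an "intensity zero point" means a sample that is zero or a sign change<br />
between neighbouring samples, and the function name and the toy waveform are<br />
hypothetical.<br />

```python
def nearest_zero_point(samples, onset_index):
    """Return the index of the zero point closest to onset_index.

    A zero point is taken to be (assumption) either a sample equal to
    zero or a position where the sign changes before the next sample.
    """
    n = len(samples)

    def is_zero_point(i):
        if samples[i] == 0:
            return True
        return i + 1 < n and (samples[i] < 0) != (samples[i + 1] < 0)

    # Search outwards from the marked onset, one sample at a time.
    for distance in range(n):
        for i in (onset_index - distance, onset_index + distance):
            if 0 <= i < n and is_zero_point(i):
                return i
    return onset_index  # no zero point at all: keep the marked onset


# Toy waveform (made up for illustration) with an onset marked by ear at 1.
waveform = [3, 5, 2, -1, -4, -2, 1, 6]
cut = nearest_zero_point(waveform, 1)
```

Cutting at such a point avoids an abrupt jump in the waveform, which is one<br />
plausible reason for the rule; the footnote itself only states that the<br />
utterance should not sound unnatural.<br />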



Bickley, C. (1983). Acoustic Evidence for Phonological Development of<br />
Vowels in Young Children. Speech Communication Group Working Papers,<br />
Research Laboratory of Electronics, M.I.T. 111-124.<br />

Bickley, C., Lindblom, B. and Roug, L. (1986). Acoustic Measures of Rhythm<br />
in Infant's Babbling. Paper Presented at the Proceedings of the 12th<br />
International Congress on Acoustics, Toronto.<br />

de Boysson-Bardies, B. (1982). Do Babies Babble as Speakers Speak? Paper<br />
presented at the International Conference on Infant Studies, Austin,<br />
Texas.<br />

de Boysson-Bardies, B., Sagart, L. and Durand, C. (1984). Discernible<br />
Differences in the Babbling of Infants According to Target-Language.<br />
Journal of Child Language. 11. 1-15.<br />

de Boysson-Bardies, B., Sagart, L., Halle, P. and Durand, C. (1986).<br />
Acoustic Investigations of Cross-linguistic Variability in Babbling. In<br />
Lindblom and Zetterström (eds.) Precursors of Early Speech. New York:<br />
Stockton Press.<br />

Buhr, R.D. (1980). The Emergence of Vowels in an Infant. Journal of Speech<br />
and Hearing Research. 23. 73-94.<br />

Bullowa, M. (1979). Before Speech.<br />
Cambridge University Press.<br />

Bush, C.N., Edwards, M.L., Luck


Kent, R.D. (1981). Articulatory-Acoustic Perspectives on Speech<br />
Development. In Stark (ed.) Language Behavior in Infancy and Early<br />
Childhood. New York: Elsevier North Holland.<br />

Kent, R.D. and Bauer, H.R. (1985). Vocalizations of one-year-olds. Journal<br />
of Child Language. 12. 491-526.<br />

Kent, R.D. and Murray, A.D. (1982). Acoustic Features of Infant Vocalic<br />
Utterances at 3, 6 and 9 months. Journal of the Acoustical Society of<br />
America. 72:2. 353-365.<br />

Kuhl, P.K. and Meltzoff, A.N. (1984b). Infant's Recognition of Cross-modal<br />
Correspondence for Speech: Is it based on Physics or Phonetics? Journal of<br />
the Acoustical Society of America. 76. Suppl. 1. S80.<br />

Koopmans van Beinum, F.J. and van der Stelt, J.M. (1986). Early Stages in<br />
the Development of Speech Movements. In Lindblom and Zetterström (eds.)<br />
Precursors of Early Speech. New York: Stockton Press.<br />

Lecours, A.R. (1975). Myelogenetic Correlates of the Development of Speech<br />
and Language. In Lenneberg and Lenneberg (eds.) Foundations of Language<br />
Development: A Multidisciplinary Approach. Vol. 1. New York: Academic<br />
Press.<br />

Lenneberg, E.H. (1967). Biological Foundations of Language. New York: John<br />
Wiley & Sons.<br />

Lenneberg, E.H. and Lenneberg, E. (1975). Foundations of Language<br />
Development: A Multidisciplinary Approach. Vol. 1. New York: Academic<br />
Press.<br />

Lindblom, B., MacNeilage, P. and Studdert-Kennedy, M.<br />

Yeni-Komshian, Kavanagh and Ferguson (eds.) Child Phonology. Vol. 1.<br />



Oller, D.K. (1981). Infant Vocalizations: Exploration and Reflexivity. In<br />
Stark (ed.) Language Behavior in Infancy and Early Childhood. New York:<br />
Elsevier North Holland.<br />

Oller, D.K. (1986). Metaphonology and Infant Vocalizations. In Lindblom and<br />
Zetterström (eds.) Precursors of Early Speech. New York: Stockton Press.<br />

Oller, D.K., Wieman, L.A., Doyle, W.J. and Ross, C. (1976). Infant Babbling<br />
and Speech. Journal of Child Language. 3. 1-11.<br />

Oller, D.K. and Eilers, R.E. (1982). Similarities of babbling in Spanish-<br />
and English-learning babies. Journal of Child Language. 9. 565-577.<br />

Piaget, J. (1971). Language and Thought of the Child. New York: The World<br />
Publishing Company. Translation by M. Gabain from French original "Le<br />
Langage et la Pensée chez l'Enfant". (1968).<br />

Pike, K.L. (1943). Phonetics. Ann Arbor: The University of Michigan<br />
Press.<br />

Rakic, P., Bourgeois, J.P., Eckenhoff, M.F., Zecevic, N. and Goldman-Rakic,<br />
P.S. (1986). Concurrent Overproduction of Synapses in Diverse Regions of<br />
the Primate Cerebral Cortex. Science. 232-234.<br />

Roug, L., Landberg, I. and Lundberg, L.J. (forthcoming). Acoustic Analyses<br />
of Four Swedish Infants' Early Vocalizations.<br />

Salus, P.H. and Salus, M.W. (1974). Developmental Neurophysiology and<br />
Phonological Acquisition Order. Language. 50. 151-160.<br />

Stark, R.E. (1980). Stages of Speech Development in the First Year of Life.<br />
In Yeni-Komshian, Kavanagh and Ferguson (eds.) Child Phonology. Vol. 1.<br />
Production. New York: Academic Press.<br />

Stark, R.E. (ed.) (1981). Language Behavior in Infancy and Early<br />
Childhood. New York: Elsevier North Holland.<br />

van der Stelt, J.M. and Koopmans van Beinum, F.J. (1986). The Onset of<br />
Babbling Related to Gross Motor Development. In Lindblom and Zetterström<br />
(eds.) Precursors of Early Speech. New York: Stockton Press.<br />

Stockman, J., Woods, D. and Tishman, A. (1981). Listener Agreement on<br />
Phonetic Segments in Early Infant Vocalizations. Journal of<br />
Psycholinguistic Research. 10. 593-617.<br />

Stoel-Gammon, C.M. and Otomo (1986).<br />
Journal of Speech and Hearing Research.<br />

Thelen, E. (1981). Rhythmical Behavior in Infancy: An Ethological<br />
Perspective. Developmental Psychology. 17. 237-257.<br />

The Principles of the International Phonetic Association. Obtainable from<br />
the International Phonetic Association, University College, Gower Street,<br />
London.<br />

Vihman, M.M. (1986). Individual Differences in Babbling and Early Speech:<br />
Predicting to Age Three. In Lindblom and Zetterström (eds.) Precursors of<br />



Vihman, M.M., Macken, M.A., Miller, R., Simmons, H. and Miller, J. (1985).<br />
From Babbling to Speech: A Reassessment of the Continuity Issue. Language.<br />
61:2. 397-445.<br />

Vihman, M.M., Ferguson, C.A. and Elbert, M. (1986). Phonological<br />
Development from Babbling to Speech: Common Tendencies and Individual<br />
Differences. Applied Psycholinguistics. 7. 3-40.<br />

Vygotsky, L. (1962). Thought and Language. New York: M.I.T. Press.<br />
English translation by E. Hanfmann and G. Vakar of Russian original.<br />
(1934).<br />

Wilson, E.O. (1980).<br />
Harvard University Press.<br />

Whitaker, H.A. (1973). Comments on the Innateness of Language. In Shuy<br />
(ed.) Some New Directions in Linguistics. Washington D.C.: Georgetown<br />
University Press.<br />

Yeni-Komshian, G.H., Kavanagh, J.F. and Ferguson, C.F. (eds.) (1980).<br />
Child Phonology. Vol. 1. Production. New York: Academic Press.<br />



A SIMPLE COMPUTERIZED RESPONSE COLLECTION SYSTEM<br />

Johan Stark and Mats Dufberg<br />

1. Introduction<br />

The object of the system described here is to enable automatic response<br />

collection directly from the respondent(s) to a computer-readable<br />

medium. This has several important implications including rendering<br />

unnecessary the manual transfer of the data from answer forms to a<br />

computer for subsequent analysis. Data will instead be directly<br />

available to the computer. The computer may in turn perform on line<br />

processing of this data and hence control the data collection procedure.<br />

The computer configuration consists of one main computer and a number of<br />

terminals.<br />

In section 2 an application will be described. The application<br />

shows that the computer system is a useful tool and that it can be<br />

handled by personnel who are inexperienced in computer programming, as<br />

was the case in the project described below. In section 3 possible<br />

future applications will be discussed. In section 4 the hardware and software<br />

and the necessary programming will be presented.<br />

2. An application<br />

In a project described in McAllister et al. (1987) we wanted subjects to<br />

give judgements on recorded speech material. We decided to use a<br />

computerized method for collecting the responses from the subjects. We<br />

will first very briefly present the project and then describe how we<br />

used the computer system.<br />

For the project, we recorded a number of students before and after<br />

a certain training period. The same material, consisting of sentences<br />

and words, was read at both recordings. The recordings were digitalized<br />

and recorded to disks. From this material we produced test tapes. On the<br />

test tape the material was presented in pairs. A pair consisted of the<br />

same sentence/word read by the same student at the two times of<br />

recording, before and after the training period. The order within the<br />

pair was random. Then we recruited a panel of experts to judge which of<br />

the two members of each pair was the better (McAllister et al. 1987).<br />
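
The pairing scheme can be sketched in a few lines. This is an illustration,<br />
not the command files actually used to produce the test tapes: the function<br />
name, the record fields and the student and item labels are all<br />
hypothetical. Each record keeps the randomly chosen order within the pair,<br />
which is the information the key file described below has to carry.<br />

```python
import random


def build_pairs(items, seed=None):
    """Return one key record per pair, with a random order within the pair.

    items is a list of (student, item) tuples; every tuple stands for the
    same sentence or word read by the same student before and after the
    training period.
    """
    rng = random.Random(seed)
    key = []
    for pair_no, (student, item) in enumerate(items, start=1):
        order = ["before", "after"]
        rng.shuffle(order)  # random order within the pair
        key.append({
            "pair": pair_no,
            "student": student,
            "item": item,
            "first": order[0],
            "second": order[1],
        })
    return key


# Hypothetical material: two students, one sentence and one word each.
material = [("s01", "sentence1"), ("s01", "word1"),
            ("s02", "sentence1"), ("s02", "word1")]
key_records = build_pairs(material, seed=1987)
```

Fixing the seed makes the order reproducible, so the same key records can be<br />
regenerated if a tape has to be produced again.<br />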

Each subject in the listening test, that is, each expert-panel<br />

member, sat in front of a terminal with a keyboard and a<br />

monitor. We presented written information on each screen which was sent<br />

from the main computer. The information was, in this case, the standards<br />

that the subjects should be using for their judgement. The tape recorder<br />

was automatically started when all terminals had received the written<br />

information. The speech material was presented through headphones from a<br />

tape recorder. The tape was specially prepared with the speech material<br />



on channel one and control tone signals on channel two. The tone signals<br />

were placed directly after each pair. The tone signal triggered the tape<br />

recorder's stop mechanism immediately after each pair was presented. The<br />

tone signals were also sent to each terminal. The keyboard was locked<br />

for key presses until the terminal had "heard" the tone signal. Even then<br />

the terminal reacted only to certain keys, namely those keys that gave<br />

the three accepted responses, the return key, and the back space key.<br />

That is, the subjects pressed one key for their judgement and then the<br />

return key. They could change their minds by pressing the back space key<br />

before the return key. All other keys appeared to the subjects to be<br />

"dead". The data from all terminals were then collected by the main<br />

computer and the procedure was repeated until the end of the test<br />

tape. After each session the data were stored on files, one file for<br />

each subject and tape.<br />
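
The keyboard behaviour described above amounts to a small state machine,<br />
sketched here. The class and method names are hypothetical, and the keys<br />
"1", "2" and "3" merely stand in for the three accepted responses, since<br />
the actual keys are not named in the text.<br />

```python
ACCEPTED = {"1", "2", "3"}   # stand-ins for the three accepted responses
RETURN, BACKSPACE = "\r", "\b"


class Terminal:
    """Keyboard logic of one terminal, as described in the text."""

    def __init__(self):
        self.unlocked = False   # keyboard is locked until a tone is heard
        self.pending = None     # judgement typed but not yet confirmed

    def tone(self):
        """The control tone was heard: unlock for one response."""
        self.unlocked = True
        self.pending = None

    def key(self, ch):
        """Handle one key press; return the confirmed response or None."""
        if not self.unlocked:
            return None                       # every key appears dead
        if ch in ACCEPTED and self.pending is None:
            self.pending = ch                 # judgement typed
        elif ch == BACKSPACE:
            self.pending = None               # subject changed their mind
        elif ch == RETURN and self.pending is not None:
            response, self.pending = self.pending, None
            self.unlocked = False             # lock until the next tone
            return response
        return None                           # any other key is ignored


# One trial: keys before the tone are dead, a judgement is changed, then sent.
t = Terminal()
before_tone = t.key("1")
t.tone()
t.key("x")
t.key("1")
t.key(BACKSPACE)
t.key("2")
confirmed = t.key(RETURN)
```

Locking again after the return key is what keeps each response tied to the<br />
pair that was just heard.<br />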

We had decided to use SAS, a statistical program, for statistical<br />

analysis, so our data files had to be compatible with the SAS program. The<br />

data files that resulted from the test sessions were pure text files,<br />

that is, they contained only normal characters. However, the data files<br />

contained only the pure data, that is, they contained no information on<br />

which student or which sentence/word the data was connected to. That<br />

information was stored in a special key file. The key file also<br />

contained information on which order within the pair the sentences/words<br />

were presented. With the help of this key file and a small computer<br />

program we transformed all data files for each test tape into an<br />

SAS-readable matrix.<br />
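
The transformation step can be pictured as follows. This is a sketch under<br />
assumptions, not the small program referred to above: the key-file fields,<br />
the subject labels and the space-separated output layout are all<br />
hypothetical, chosen only because plain columns of this kind are easy for a<br />
statistics package to read.<br />

```python
def build_matrix(key_rows, responses_by_subject):
    """Join each subject's bare responses with the key file, pair by pair.

    key_rows: one dict per pair on the tape, in presentation order.
    responses_by_subject: subject label -> list of responses, one per pair.
    """
    rows = []
    for subject in sorted(responses_by_subject):
        responses = responses_by_subject[subject]
        if len(responses) != len(key_rows):
            raise ValueError("response count does not match the key file")
        for key, response in zip(key_rows, responses):
            rows.append([subject, key["student"], key["item"],
                         key["first"], response])
    return rows


def to_text(rows):
    """Render the matrix as plain text, one space-separated line per row."""
    return "".join(" ".join(row) + "\n" for row in rows)


# Hypothetical key file for a two-pair tape and one subject's data file.
key_rows = [
    {"student": "s01", "item": "w1", "first": "before"},
    {"student": "s02", "item": "w2", "first": "after"},
]
matrix_text = to_text(build_matrix(key_rows, {"subjA": ["1", "2"]}))
```

Keeping the responses and the key in separate files, as described, means<br />
the data files stay free of anything that could reveal the presentation<br />
order during a session.<br />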

Some advantages of this computer-based system are:<br />

- The data are stored directly on computer-readable media.<br />

- The response interval can be controlled on line, for example by the<br />

respondents.<br />

- The respondents will never get lost. What they respond to will always<br />

correspond to what they have just heard.<br />

What we had to prepare for this application (except for computer<br />

programs) were the test tapes with tones, the key files, and the<br />

text files that contained the information that was written on the<br />

screens of the terminals. We would have had to make test tapes in an<br />

ordinary pencil-and-paper application too, and the key files and text<br />

files were easily made from the command files that produced the test<br />

tapes.<br />

3. Future applications<br />

The computer system could easily be used for a number of applications.<br />

We hope that there will be a library of standard applications so that no<br />

or marginal programming will be necessary for the user in the future. It<br />

is reasonable to expect the following applications to be standard:<br />

- Listening experiments with or without written information, with or<br />

without limited number of response alternatives.<br />

- Experiments measuring response time of audible and/or written stimuli.<br />

- Demonstration experiments for seminars.<br />

Computerized correction, on line or after the session, could of<br />

course be included in any application.<br />

4. Hardware and software<br />

The first prototype system was set up by connecting a number of cheap<br />

personal computers (Micro-Bee 32) via a simple wire interface. This was<br />

done by using the original input/output system already present on them<br />

for printer control etc. The printer interface is used as a parallel 8<br />

bit bus, and 4 bits from the serial interface are used as a control bus.<br />

Altogether these 12 bits are connected to an ordinary flat cable using<br />

standard 25 pin D-sub connectors. Up to 16 machines may be connected in<br />

this way. An ordinary CP/M system (SI-80) is used at the end of the line<br />

as a server of this network. The interface here is equally simple, only<br />

12 bits of digital I/O are used.<br />

Data may be transferred to/from the server to/from any terminal on<br />

the line. Each terminal has a unique address which enables it to<br />

communicate independently of the others. All data flow is controlled by<br />

the server system. The data transfer speed is about 20 kBytes/sec which<br />

gives almost no delays seen by the terminal user. The updating of one<br />

full terminal screen will virtually take place in no time at all.<br />
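As a rough check of this claim, the time for one screen update can be estimated from the transfer speed. The screen size of 64 x 16 characters is an assumption (it is not stated in the text), and "about 20 kBytes/sec" is taken as 20 000 bytes per second.

```python
TRANSFER_RATE = 20_000   # bytes per second, "about 20 kBytes/sec" in the text
SCREEN_COLUMNS = 64      # assumption: the screen size is not given in the text
SCREEN_ROWS = 16         # assumption: the screen size is not given in the text


def screen_update_seconds(columns=SCREEN_COLUMNS, rows=SCREEN_ROWS, rate=TRANSFER_RATE):
    """Time to send one full screen of one-byte characters to a terminal."""
    return columns * rows / rate


def all_terminals_seconds(terminals=16):
    """Time to update every terminal in turn, since the server controls all data flow."""
    return terminals * screen_update_seconds()
```

Under these assumptions one screen takes about 0.05 seconds, and updating all 16 terminals one after the other takes somewhat under one second.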

Each terminal has a portion of so-called bootstrap software stored<br />

in resident non-volatile memory. Actually the character generator<br />

EPROM has some unused locations that are used to store this software.<br />

This piece of software will initialize network processing by a simple<br />

startup command from the keyboard. The server system has similar software<br />

loadable from a diskette.<br />

On top of this, each terminal can load a portion of software<br />

written in Basic that enables the user to write Basic programs that<br />

communicate with the server system. These network services are easily<br />

programmable and may be extended with whatever commands are wanted.<br />

On the server side the command processing is written in Turbo Pascal.<br />

The prototype system has so far the following commands available:<br />

1. Send a block of data to server.<br />

2. Receive a block of data from server.<br />

3. Load a program from server.<br />

4. Save a program in server.<br />

5. Load and start a program.<br />

6. Start a Revox tape recorder. (An additional interface is required.)<br />

7. Stop a Revox tape recorder on an audio signal.<br />
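The command set listed above could be represented roughly as follows. This is only a sketch: the real server part is written in Turbo Pascal, the numeric codes simply follow the numbering of the list, and the `Server` class with its in-memory storage is hypothetical.

```python
from enum import IntEnum


class Command(IntEnum):
    """Prototype commands; the numeric codes are assumed, not documented."""
    SEND_BLOCK = 1
    RECEIVE_BLOCK = 2
    LOAD_PROGRAM = 3
    SAVE_PROGRAM = 4
    LOAD_AND_START = 5
    START_TAPE = 6
    STOP_TAPE_ON_SIGNAL = 7


class Server:
    """Toy stand-in for the server side of the command processing."""

    def __init__(self):
        self.blocks = {}     # (terminal address, block name) -> data
        self.programs = {}   # program name -> program text

    def handle(self, terminal, command, name=None, payload=None):
        if command == Command.SEND_BLOCK:
            self.blocks[(terminal, name)] = payload
            return None
        if command == Command.RECEIVE_BLOCK:
            return self.blocks[(terminal, name)]
        if command == Command.SAVE_PROGRAM:
            self.programs[name] = payload
            return None
        if command in (Command.LOAD_PROGRAM, Command.LOAD_AND_START):
            return self.programs[name]
        # The tape recorder commands need the additional interface.
        raise NotImplementedError(command.name)
```

Keeping the commands in one enumeration makes it straightforward to extend the set, which is the property the text emphasizes.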

For a particular experiment the user will have to make an<br />

application program using the more general software described above as a<br />

library. The application software will consist of a server part written<br />

in Turbo Pascal and a terminal part written in Basic. First the terminal<br />

program is implemented on a stand alone terminal. The Micro-Bee 32<br />

computers are stand alone computers with a built in Basic interpreter,<br />

computer screen and keyboard. Network services may be simulated in Basic<br />

using DATA statements as input and PRINT statements to check output.<br />

Similarly the server system program may be tested by simulating the<br />

terminals. When both programs seem to work satisfactorily a real version<br />

is set up and tested. If properly done the program will then handle<br />

several terminals simultaneously. This enables anyone to set up a simple<br />

or more sophisticated data collection procedure for his experiment in a<br />

fairly short time. The application software may also partly be used to<br />

update the general part thus after some time of use providing a whole<br />

database of readymade software for various cases of data collection<br />

experiments.<br />
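The idea of testing a terminal program against simulated network services can be sketched as below. In the system described the simulation is done in Basic with DATA and PRINT statements; here the same idea is shown in Python, with canned input playing the role of the DATA statements and a list of sent blocks playing the role of the PRINT output. All names are hypothetical.

```python
class SimulatedNetwork:
    """Stand-in for the network services, used on a stand-alone terminal."""

    def __init__(self, canned_input):
        self._input = list(canned_input)   # plays the role of DATA statements
        self.sent = []                     # plays the role of PRINT output

    def receive_block(self):
        return self._input.pop(0)

    def send_block(self, data):
        self.sent.append(data)


def terminal_program(network, read_key):
    """Show each prompt from the server and send back the subject's response."""
    while True:
        prompt = network.receive_block()
        if prompt is None:                 # None marks the end of the test
            break
        network.send_block((prompt, read_key(prompt)))
```

Because the terminal program only talks to the `network` object, the simulated version can later be exchanged for the real services without changing the program itself.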

Since every terminal is also an independent computer with its own<br />

CPU, the processing power of the system will be large. One effect of<br />

this is the ability to let each terminal individually measure the time<br />

for a response with a very high resolution.<br />

For users less experienced in programming in Basic and Turbo Pascal<br />

some general data collection program suited for a number of common<br />

situations could be written by someone more experienced. Then, a simple<br />

set-up file editable from an ordinary word processor could easily be<br />

used to determine some application variables in the data collection<br />

procedure, such as how many terminals are connected, how many responses<br />

to collect and the names of the files to be used for text input and data<br />

collection storage.<br />
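Such a set-up file might look like the one parsed in the sketch below. The file format (one `name = value` pair per line, `#` for comments) and the setting names are assumptions made for illustration; no format is specified in the text.

```python
def parse_setup(text):
    """Read a simple 'name = value' set-up file into a dictionary."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        name, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"cannot read set-up line: {line!r}")
        settings[name.strip().lower()] = value.strip()
    return {
        "terminals": int(settings["terminals"]),   # how many terminals are connected
        "responses": int(settings["responses"]),   # how many responses to collect
        "text_file": settings["text_file"],        # file with the screen texts
        "data_file": settings["data_file"],        # file for the collected data
    }
```

A file of this kind can be edited in any word processor that saves plain text, which is the point of the proposal.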

A second generation of the system is under development using an<br />

IBM-PC-AT as the server computer. This removes the necessity of a<br />

special data transfer from the present CP/M machine to the more common<br />

MS-DOS format diskettes.<br />

REFERENCES<br />

McAllister, Robert, Dufberg, Mats, and Wallius, Maria (1987):<br />

"Experiments with technical aids in pronunciation teaching".<br />

Published in Perilus report no 5 (this volume). Stockholm:<br />

University of Stockholm, Institute of Linguistics.<br />

ABOUT THE AUTHORS<br />

Johan Stark is an engineer and has constructed the hardware and written<br />

the basic software for the computer system.<br />

Mats Dufberg is a graduate student in phonetics and has written the<br />

software and run the system for the application described.<br />

EXPERIMENTS WITH TECHNICAL AIDS IN PRONUNCIATION TEACHING<br />

Robert McAllister,<br />

Mats Dufberg and Maria Wallius<br />

1.0 Introduction<br />

This is a summary of experimental research whose aim was to<br />

test the utility of technical aids in pronunciation<br />

teaching. There have been several attempts in recent years<br />

to apply developments in speech technology to various<br />

language teaching/learning situations. In particular, there<br />

has been interest in a methodological approach which includes<br />

the concept of "feedback" as a learning aid (de Bot, 1980).<br />

This concept has been put to wide practical use with the<br />

advent of the so called "language laboratory" and its use in<br />

the field of second and foreign language learning. The<br />

relatively modest success of this movement has led to<br />

efforts to complement the audio active-comparative method<br />

most often used in the language laboratory. Some of these<br />

efforts have been based on the idea that feedback of the<br />

speech signal or some of its components via alternative<br />

sensory<br />

channels may be a viable aid especially in learning<br />

to produce suprasegmental aspects of the phonology of a<br />

foreign language.<br />

The teaching and learning of features such<br />

as rhythm and intonation has always seemed to present<br />

special problems and has proved to be particularly difficult<br />

(Crystal, 1975). Unfortunately, this difficulty has often<br />

led to the neglect of this important aspect of the target<br />

language phonology. Teachers have often been at a loss as<br />

to how to teach this part of the sound system.<br />

One of the traditions in this field is the use of a visual<br />

representation of the prosodic elements as a learning aid<br />

(Kelz et al., 1977). Many different symbols and systematic<br />

transcription systems have been used but their common goal<br />

has been to augment the written text with an explicit<br />

notation of the prosody. This tradition provides the<br />

background for the research on technical aids in the<br />

teaching of prosody that has been done in the last three<br />

decades. The idea of using visual or tactile channels for<br />

the<br />

feedback of speech signal information has been used for<br />

many years in the teaching of handicapped learners or<br />

learners who are for other reasons not able to make<br />

effective use of the auditory feedback channel which appears<br />

to be indispensable in the production of normal speech<br />

(Potter et al., 1948; Abberton and Fourcin, 1975; Martony,<br />

1976; Spens, 1984). This work drew the attention of<br />

phoneticians and linguists who were interested in the<br />

acoustic<br />

and perceptual nature of prosodic elements and the<br />

acquisition of these features by language learners. The<br />

basic idea here was that isolation and visual feedback of<br />

acoustic parameters critical to the rhythm and intonation<br />

could serve to concentrate the learner's attention on these<br />

important and difficult aspects of the target language and<br />

thereby facilitate the learning of them.<br />

Pioneering work in<br />

this direction was done as early as 1966 by Harlan Lane<br />

(Lane and Buiten, 1966). Since then, there has been<br />

considerable interest in the use of technical aids in<br />

pronunciation teaching. The subject has been discussed and<br />

several studies have been done, a large part thereof being<br />

of an informal nature (a few recent examples: Vardanian<br />

(1964), Bannert (1979), Albertson (1982), Baker (1982); for<br />

a critical survey see Leon and Martin, 1970). There have,<br />

however, been relatively few controlled studies of this<br />

methodology.<br />

Notable exceptions to this statement in recent<br />

years are James (1976), Hengstenberg (1980), and de Bot<br />

(1983). These researchers found a positive effect of the<br />

use of technical aids in the teaching of intonation.<br />

Generally<br />

speaking the learners who used the aids were more<br />

successful in learning prosodic features such as<br />

intonation<br />

than those who practiced according to traditional language<br />

laboratory methods.<br />

This report is a summary of research in which the<br />

methodology discussed above for the teaching of prosody has<br />

been used in a slightly different way. Our aim was to test<br />

the utility of technical aids and the feedback methodology<br />

as an integrated part of a foreign language course. Our<br />

basic question was similar to other studies already<br />

mentioned: Do technical aids help in the learning of<br />

prosody? - or formulated as a null hypothesis: learners who<br />

use the technical aids will not achieve a more native-like<br />

production of the prosodic features of the target language<br />

than the learners who do not use the technical aids.<br />

Aspects<br />

of this research that were somewhat different than the<br />

studies mentioned above include the integration of the<br />

training program in the course curriculum. Whereas earlier<br />

studies often compared performance before and after one or<br />

several short training sessions, we have tried to simulate<br />

an actual course situation where the training with the<br />

technical aids is more spread in time and integrated in<br />

the<br />

course as a logical part of the overall program.<br />

Consequently we have chosen to focus our interest on the<br />

obviously<br />

important "long term effects" of this methodology<br />

whose short term effects have been shown to be positive.<br />

2.0 Methods<br />

The methods used in this research will be presented under<br />

the following headings:<br />

1. Apparatus<br />

2. Training sessions and control recordings<br />

3. Progress evaluation through listener judgements<br />

It should be pointed out that steps 2 and 3 were carried out<br />

twice in two consecutive experiments. The first training<br />

experiment was done as a pilot study but is included in this<br />

report since the results of the two experiments were quite<br />

similar.<br />

The second experiment differed slightly on several<br />

methodological points and this will be elaborated upon in<br />

that which follows.<br />

2.1 Apparatus<br />

Our aim in the development of technical aids in these<br />

experiments was to provide visual and auditory feedback to<br />

the learner which would concentrate his or her attention on<br />

certain acoustic/auditory features important to natural<br />

sounding rhythm and intonation. Two technical aids were<br />

developed to this end. One was to provide a clearer auditory<br />

impression of the prosody in practice utterances by means of<br />

isolation of the suprasegmental features. The other was to<br />

present the learner with feedback of a visual representation<br />

of isolated acoustic features relevant to prosody and to<br />

make it possible to visually compare these features in<br />

practice efforts with a model utterance.<br />

2.1.1 Auditory feedback: "the Hummer"<br />

This device was developed with the idea that an isolation of<br />

the prosodic features in an utterance may have the effect of<br />

clarifying auditory goals toward which the learner was to<br />

strive. The instrument developed was, in electronic terms,<br />

a fairly simple one. The essence of this aid was a simple<br />

variable band pass filter which could be manipulated in<br />

several ways by the user. When the speech signal was fed<br />

through this filter it was possible to eliminate all<br />

segmental information so that the auditory impression was<br />

that of humming the original utterance thus effectively<br />

isolating the suprasegmental information. It was possible<br />

for the user to vary the center frequency of the filter so<br />

that the amount of segmental information present in the<br />

signal could be chosen at will. This instrument was located<br />

between the learner's tape recorder and his earphones so that<br />

both the model utterances and his own efforts could be<br />

filtered and compared as a complement to the traditional<br />

audio active comparative method.<br />

A schematic representation<br />

of this device is shown in figure 1.<br />

FIGURE 1<br />

A SCHEMATIC REPRESENTATION OF THE LEARNING DEVICE<br />

FOR AUDITORY FEEDBACK "THE HUMMER"<br />

[Diagram not reproduced. Legible labels: IN; BP-FILTER, BUTTERWORTH, 6 pol., 36 dB/oct; 60-300 Hz; OUT (FILTER / DIRECT).]<br />

In experiment 1 still another "hum" was used. The speech<br />

signal was filtered by means of a computer program developed<br />

by Peter Branderud (1979) and the practice utterances were<br />

prerecorded so that each practice utterance in the training<br />

material was followed by a filtered version of the<br />

utterance.<br />
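The principle of the "hum" can be illustrated digitally. The sketch below is not the Hummer circuit nor the filtering program used in experiment 1: it cascades three identical second-order band-pass sections (six poles in all, but not a true Butterworth design), and the default center frequency of 150 Hz, the Q value and all function names are assumptions. It only shows that a narrow band around the fundamental frequency passes while higher-frequency segmental information is strongly attenuated.

```python
import math


def bandpass_coefficients(center_hz, sample_rate, q=1.0):
    """Second-order band-pass section with unit gain at the center frequency."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def filter_signal(samples, b, a):
    """Apply one second-order section to a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def hum(samples, sample_rate, center_hz=150.0, q=1.0, stages=3):
    """Band-pass the signal so that mainly the region around F0 remains."""
    b, a = bandpass_coefficients(center_hz, sample_rate, q)
    for _ in range(stages):
        samples = filter_signal(samples, b, a)
    return samples


def sine(freq_hz, sample_rate, seconds):
    """Test tone with amplitude 1."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Feeding a 150 Hz tone and a 3000 Hz tone through `hum` leaves the first almost unchanged and reduces the second to a small fraction of its level, which corresponds to the humming impression described above. Changing `center_hz` plays the role of the user-variable center frequency.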

2.1.2 Visual feedback<br />

The essential elements of this instrumentation were a<br />

fundamental frequency extractor (Martony, 1976) and a two-track<br />

storage oscilloscope. In effect this device<br />

functioned in roughly the same manner as visualizers used<br />

and described by researchers previously mentioned in section<br />

1 (Lane and Buiten, 1966; James, 1976). The learner was<br />

able to hear the model utterance and see its<br />

intonation/rhythm representation on the upper track of the<br />

oscilloscope.<br />

After storing this image he was able to try<br />

to reproduce this representation with as many tries as were<br />

needed, being able to store an effort for inspection and<br />

comparison with the model utterance before going on to the<br />

next attempt at matching the model. A schematic<br />

representation of this instrumentation can be seen in figure 2.<br />

2.2 Training sessions and control recordings<br />

A graphic representation of the procedural steps in both<br />

training experiments is presented in figure 3.<br />

Subjects were selected and a screening test was administered<br />

to all the students who were to take part in the<br />

experiments. This test was used to establish the<br />

proficiency of the subjects in the perception and production<br />

of the prosodic categories that were to be trained. The<br />

subjects were divided into experimental and control groups<br />

on the basis of the results of this test with the aim of<br />

creating groups that were as similar as possible in terms of<br />

the proficiency of group members prior to the beginning of<br />

the actual training experiment. A pre-test was then given<br />

to all subjects. This test consisted of a documentation of<br />

each individual subject's pre-training proficiency in the<br />

production of the prosodic categories that were covered by<br />

the training material. A recording of each subject's<br />

production of the relevant prosodic categories was made. The<br />

subjects then trained in their respective groups and at the<br />

end of the training period a post-test was administered<br />

which was exactly the same as the pre-test. It should be<br />

stressed again here that one of the important aspects of<br />

these experiments was a concerted effort to integrate this<br />

training into the language course as a whole.<br />

FIGURE 2<br />

A SCHEMATIC REPRESENTATION OF THE INSTRUMENTATION FOR THE DEVICE WHICH<br />

PROVIDED THE VISUAL FEEDBACK<br />

[Diagram not reproduced. Legible labels: tape recorder, microphone, amplifier, F0-extractor, storage oscilloscope with channel 1 (store 1) and channel 2 (store 2).]<br />

FIGURE 3<br />

A GRAPHIC REPRESENTATION OF THE PROCEDURAL<br />

STEPS IN THE TRAINING EXPERIMENTS<br />

[Diagram not reproduced. Legible labels, in order: subjects (students of English and Swedish); screening; pre-test; experiment groups A (auditory, filter), B (visual), C (auditory + visual), D (auditory) and control group K; post-test.]<br />

2.2.1 Subjects<br />

The subjects in both experiments were recruited from two<br />

language courses at the University of Stockholm. The<br />

undergraduate pronunciation courses in the Department of<br />

English were one source of subjects. These were Swedes who<br />

had, generally speaking, fairly high proficiency in English<br />

due to the emphasis on the learning of English in the<br />

Swedish schools. The other source of subjects was the<br />

courses in Swedish as a second language offered by the<br />

Institute of English Speaking Students. These were foreign<br />

students speaking many different native languages and were,<br />

almost without exception, beginners in their study of<br />

Swedish.<br />

2.2.2 Training material<br />

The linguistic practice material used in both training<br />

experiments was the same as that used in the regular<br />

language courses, the difference being that only the prosodic<br />

material was used by our subjects during the training.<br />

Students from the English department used the relevant<br />

exercises and prerecorded tapes from "A Course Book in<br />

English Pronunciation", Clerici (1984), in experiments 1 and<br />

2. These exercises emphasized sentence intonation types in<br />

British English as well as vowel reduction exercises.<br />

The students of Swedish used the exercise booklet "Uttal" by<br />

Marschall and Rosenquist (1983) and the corresponding<br />

prerecorded tapes. These exercises emphasized Swedish word<br />

accent in various phrase and sentence contexts, the<br />

long-short distinction, and the interaction of these<br />

categories with rhythm and intonation on the sentence<br />

level<br />

which has been shown to be critical to the realization of<br />

Swedish prosody.<br />

2.2.3 Training<br />

In experiment 1 the subjects were divided into 5 groups:<br />

group A trained with the "Hummer", group B with the device<br />

for visual feedback, group C with both the visual feedback<br />

instrumentation and the prerecorded "hum" described above<br />

(2.1.1), group D with the prerecorded hum only and group K was<br />

the control group who used the same training material but in<br />

the traditional way without the technical aids. In<br />

experiment 2 there were only 3 groups corresponding to<br />

groups A, B and K in experiment 1. That is to say: the<br />

hummer (group A), visual feedback (group B), and control<br />

group (group K). The subjects in experiment 1 were<br />

requested to practice 2 hours a week over a 4 week period.<br />

In experiment 2 this training time was increased to 2 hours<br />

a week over an 8 week period. As has been mentioned above<br />

in the summary of the training experiments (2.2) the<br />

pre-test was given at the start of the training period and<br />

the post-test at the end of this period. These tests were<br />

identical<br />

and were composed of the same prosodic categories<br />

included in the practice material.<br />

2.3 Evaluation with listener judgements<br />

The purpose of this procedure was, of course, to evaluate<br />

the progress of each subject in terms of the production of<br />

the relevant prosodic features of the respective languages and<br />

to establish whether or not those subjects who used the<br />

technical aids showed a difference in progress when compared<br />

to the control group.<br />

The evaluation procedure differed slightly between<br />

experiments 1 and 2. In the first<br />

experiment the pre and post test tapes were recorded into<br />

the DEC Eclipse computer at the phonetics lab. Each tape<br />

was then edited by the MIX program, an interactive signal<br />

editor. After editing there was one signal file for each<br />

utterance by each subject. The MIX file now allowed us to<br />

play the files back in any order. We made a systematic<br />

selection of representative samples for the respective<br />

languages, ordered these utterances randomly, and created<br />

the tapes for the listening experiments.<br />

A panel of experts<br />

was then recruited to listen to these tapes and grade the<br />

utterances on a three-point scale with 1 being the least,<br />

and 3 being the most successful pronunciation of the target<br />

language. For the Swedish material, the panel was composed<br />

of Swedish natives who were either language teachers at the<br />

Institute of English Speaking Students or linguists and<br />

phoneticians employed at the Department of Linguistics at<br />

the University of Stockholm. The panel who judged the<br />

English material were either native speakers of English who<br />

had experience in teaching English prosody or Swedes who<br />

were teachers of English intonation in the English<br />

department at the University of Stockholm. For the<br />

evaluation of the second experiment the pre and post test<br />

tapes were edited in a similar way.<br />

For each language group<br />

we selected approximately half of the available test<br />

material. In contrast to the first experiment, this time<br />

the utterances were presented to the listeners in pairs.<br />

The pair consisted of the "same" utterance by the same<br />

student from the pre test tape and the post test tape<br />

respectively and these pairs were randomly ordered. The<br />

members of the pair could be presented to the listeners in<br />

two orders: either the pre test utterance followed by the<br />

post test utterance or the post test utterance followed by<br />

the pre test utterance. With the help of a simple Basic<br />

program we randomized this utterance order within pairs.<br />

The<br />

different language groups were, of course, kept separate as<br />

in the first experiment. The task of the expert panels,<br />

which were very similar in composition to those in<br />

experiment 1, was now to assess which of the utterances in the<br />

pair<br />

was the better production with respect to the intended<br />

prosodic category or to indicate that the two were equally<br />

good (or bad).<br />
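The two randomizations described above (the order of the pairs on the tape and the order of the two utterances within each pair) can be sketched as follows. The original was a simple Basic program whose details are not given; the function name, the order labels and the use of a seed are assumptions for the example.

```python
import random


def build_pair_list(items, seed=None):
    """Return the test items in random order, each with a random within-pair order.

    The order label of each pair has to be kept (for instance in a key file),
    since the judgements can only be interpreted when it is known which
    utterance in the pair came from the pre test.
    """
    rng = random.Random(seed)
    pairs = [(item, rng.choice(("pre-post", "post-pre"))) for item in items]
    rng.shuffle(pairs)
    return pairs
```

With a fixed seed the same list is produced every time, which makes it possible to regenerate the key for a given tape.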

For the collection of the listener panel judgements we<br />

used<br />

the DIRIS system (described by Dufberg elsewhere in this<br />

volume). At a listening session, each judge/listener used a<br />

computer terminal including a screen and a keyboard. Each<br />

intended utterance, together with the intended prosodic<br />

category, was written on the terminal screen. Then the<br />

recorded utterance pair was played and the tape was<br />

automatically stopped until all judges had given a response.<br />

The tape was then automatically re-started and continued<br />

to<br />

the next pair. The judgements were automatically stored in<br />

data files and recoded so as to allow us to treat them as at<br />

least interval data. This data was then organized into<br />

matrices compatible with the statistical program SAS.<br />

The<br />

statistical analysis was done on QZ's IBM/Guts computer's<br />

SAS program.<br />

3. Results<br />

Since experiments 1 and 2 differed slightly in terms of<br />

methods the results will be presented separately.<br />

3.1 Experiment 1<br />

Figure 4 shows a summary of the average grades for all<br />

subjects with pre test score plotted against post test<br />

score. It appears that there is a definite tendency toward<br />

improvement in the realization of prosody for students of<br />

both languages. A t-test showed that this improvement was<br />

statistically significant at the 2.5% level. This figure<br />

shows, however, no obvious difference in experiment and<br />

control groups. Indeed, no statistically significant<br />

difference could be established between the improvement of<br />

the experiment and control groups respectively.<br />

Figure 5 shows more explicitly the difference between<br />

control and experiment groups in a bar graph where the<br />

y-axis labeled DIFF SCORE is the difference between the<br />

average grade (1 to 3) on the pre test and average grade<br />

on<br />

the post test (also 1 to 3). Here we can see that, on the<br />

average, the students using the traditional language<br />

laboratory methodology improved more than the students who<br />

trained with the technical aids even though this<br />

difference<br />

was not statistically significant.<br />
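The quantities used in this section can be made concrete with a small sketch. It assumes that the difference score is taken as post test minus pre test (so that improvement is positive) and that the pre/post comparison is a paired t-test; the text does not say which variant of the t-test was run in SAS, and the grades in any example call are invented, not data from the experiments.

```python
import math
from statistics import mean, stdev


def diff_scores(pre, post):
    """Per-subject difference score: average post test grade minus pre test grade."""
    return [after - before for before, after in zip(pre, post)]


def paired_t(pre, post):
    """t statistic for the mean difference score being zero (n - 1 degrees of freedom)."""
    differences = diff_scores(pre, post)
    n = len(differences)
    return mean(differences) / (stdev(differences) / math.sqrt(n))
```

The bars in the difference score graphs then correspond to `mean(diff_scores(pre, post))` computed separately for each group.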

In figure 6 the difference scores for the individual<br />

experiment groups and the control group are shown.<br />

It can be<br />

observed again that the control group showed better<br />

improvement than any of the experimental groups. None of<br />

these differences were statistically significant however.<br />

FIGURE 4<br />

A SUMMARY OF THE AVERAGE GRADES FOR ALL SUBJECTS WITH PRE TEST<br />

PLOTTED AGAINST POST TEST SCORE. THIS GRAPH SHOWS IMPROVEMENT<br />

FOR ALL SUBJECTS WHICH WAS SIGNIFICANT AT THE 2.5% LEVEL. THERE<br />

WAS NO STATISTICALLY SIGNIFICANT DIFFERENCE BETWEEN THE CONTROL<br />

AND EXPERIMENT GROUPS.<br />

[Scatter plot not reproduced. Title: AVERAGE GRADES FOR EACH SUBJECT. Axes: PRE TEST (x) and POST TEST (y), grades 1 to 3. • = experiment group, + = control group.]<br />

FIGURE 5<br />

DIFFERENCE BETWEEN THE CONTROL AND EXPERIMENT GROUPS.<br />

THE Y-AXIS LABELED DIFF SCORE INDICATES THE DIFFERENCE<br />

BETWEEN THE AVERAGE GRADE ON THE PRE TEST (1-3) AND THE<br />

AVERAGE GRADE ON THE POST TEST (1-3). THE DIFFERENCE<br />

WAS NOT STATISTICALLY SIGNIFICANT.<br />

[Bar graph not reproduced. y-axis: DIFF SCORE (0 to .5); x-axis: GROUP (EXP, CONTROL).]<br />

FIGURE 6<br />

DIFFERENCE SCORES FOR THE INDIVIDUAL EXPERIMENT GROUPS AND<br />

THE CONTROL GROUP. DIFF SCORE INDICATES THE DIFFERENCE<br />

BETWEEN THE AVERAGE GRADE ON THE PRE TEST (1-3) AND THE<br />

AVERAGE GRADE ON THE POST TEST (1-3). NONE OF THESE<br />

DIFFERENCES ARE STATISTICALLY SIGNIFICANT.<br />

[Bar graph not reproduced. y-axis: DIFF SCORE (0 to .5); x-axis: GROUP (A, B, C, D, K).]<br />

3.2 Experiment 2<br />

Figure 7 shows the results of training for experiment groups<br />

A and B individually and taken together (EXP) and for the<br />

control group K. The two language groups were also taken<br />

together in these results. It should be recalled here that<br />

the listener judgements were expressed in terms of better,<br />

worse, same. These responses were recoded to +1 for better,<br />

-1 for worse and 0 for same. All groups showed a positive<br />

result, i.e. all groups on the average improved their mastery<br />

of target language prosody. This gave us a positive number<br />

between 0 and +1 for all groups. The y-axis in this bar<br />

graph represents subjects' progress expressed in terms of<br />

this number. As was the case for experiment 1, the control<br />

group shows the most improvement. In this case the<br />

difference<br />

between the experimental group as a whole (group<br />

A plus group B) and the control group as a whole (K) was<br />

significant at the 2% level. The difference between groups<br />

B and K was also significant at the 1% level. The<br />

difference between groups A and K was not significant nor<br />

was the difference between groups A and B.<br />
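The recoding described above can be sketched as follows. Since the two utterances of a pair were played in randomized order, the judges' choice has to be combined with the presentation order before it can be read as better or worse; the labels `first`, `second`, `pre-post` and `post-pre` and the function names are assumptions made for the example.

```python
def recode(choice, order):
    """Recode one judgement to +1 (post test better), -1 (worse) or 0 (same).

    choice: "first", "second" or "same" (which utterance in the pair was judged better).
    order:  "pre-post" or "post-pre" (presentation order, taken from the key file).
    """
    if choice == "same":
        return 0
    post_position = "second" if order == "pre-post" else "first"
    return 1 if choice == post_position else -1


def progress_score(judgements):
    """Average of the recoded judgements, a number between -1 and +1."""
    values = [recode(choice, order) for choice, order in judgements]
    return sum(values) / len(values)
```

A group whose post test utterances are judged better more often than worse gets a positive progress score, which is the number plotted on the y-axis of the bar graphs for experiment 2.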

Figures 8 and 9 show the same results for the individual<br />

language groups. The English learners' progress reflects<br />

the same tendencies as were seen in fig 7. The control<br />

group shows the most progress. The difference seen on the<br />

graph (fig 8) between the experiment group (A plus B) and<br />

the control group was significant at the 5% level. The<br />

difference between groups B and K was also significant at<br />

the 1% level. No other differences seen i figure 8 were<br />

16 1


FIGURE 7<br />

[Bar graph. Y-axis: PROGRESS; x-axis: GROUP (A, B, K, EXP).]<br />

THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY<br />

AND TAKEN TOGETHER (EXP) AND FOR THE CONTROL GROUP K. THE Y-AXIS<br />

EXPRESSES SUBJECTS' PROGRESS IN TERMS OF A NUMBER BETWEEN 0 AND 1<br />

(see text). STATISTICALLY SIGNIFICANT DIFFERENCES: EXP-K 2% level;<br />

B-K 1% level.<br />



FIGURE 8<br />

[Bar graph, panel ENGLISH. Y-axis: PROGRESS; x-axis: GROUP (A, B, K, EXP).]<br />

THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY<br />

AND TAKEN TOGETHER (EXP) AND FOR THE CONTROL GROUP K FOR THE<br />

LEARNERS OF ENGLISH. THE Y-AXIS EXPRESSES SUBJECTS' PROGRESS IN TERMS<br />

OF A NUMBER BETWEEN 0 AND 1 (see text). STATISTICALLY SIGNIFICANT<br />

DIFFERENCES: EXP-K 5% level; B-K 1% level.<br />

FIGURE 9<br />

[Bar graph, panel SWEDISH. Y-axis: PROGRESS; x-axis: GROUP (A, B, K, EXP).]<br />

THE RESULTS OF TRAINING FOR EXPERIMENT GROUPS A AND B INDIVIDUALLY<br />

AND TAKEN TOGETHER (EXP) AND FOR CONTROL GROUP K FOR THE LEARNERS<br />

OF SWEDISH. THE Y-AXIS EXPRESSES SUBJECTS' PROGRESS IN TERMS OF<br />

A NUMBER BETWEEN 0 AND 1 (see text). NO STATISTICALLY SIGNIFICANT<br />

DIFFERENCES.<br />



statistically significant. The Swedish learners (fig 9)<br />

show generally the same results in that the control group<br />

shows the most progress. None of the differences between<br />

groups were statistically significant here, however.<br />

4. Discussion<br />

Let us return to the introduction and review our point of<br />

departure and main question in this research. Other<br />

researchers have found a positive effect of this methodology<br />

in<br />

the learning of prosodic elements of a foreign language.<br />

Our main question was similar to that of these<br />

researchers:<br />

"Do technical aids help in the learning of prosody?"<br />

Formulated as a null hypothesis, this question could be<br />

expressed as: Learners who use the technical aids will not<br />

achieve a more native-like production of the prosodic<br />

features of the target language when compared to learners<br />

who have used only traditional language laboratory<br />

methods.<br />
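As a purely illustrative formalisation of this hypothesis, let the mean progress score of learners who use the technical aids be written as one symbol and that of learners who use only the traditional language laboratory as another. Both symbols are introduced here for illustration and do not appear in the original text.

```latex
% \mu_{\mathrm{aid}}: mean progress score, technical-aid learners (assumed symbol)
% \mu_{\mathrm{lab}}: mean progress score, traditional language-lab learners (assumed symbol)
H_0:\ \mu_{\mathrm{aid}} \le \mu_{\mathrm{lab}}
\qquad \text{versus} \qquad
H_1:\ \mu_{\mathrm{aid}} > \mu_{\mathrm{lab}}
```

On this reading, the technical aids are shown to help only if the data allow the first statement to be rejected in favour of the second.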

Our qualification of these formulations is of considerable<br />

importance to this research. That is, we are most interested<br />

in the "long term effects" of this methodology or,<br />

to put it<br />

somewhat differently, "How does this method work if set<br />

within the time and curriculum framework of a typical<br />

language course?" The results presented in section 3 seem<br />

to make it fairly clear that the technical aids methodology,<br />

as we have applied it in this research, does NOT<br />

facilitate the learning of the prosody of a foreign language<br />



more than the traditional language lab methods. In fact, we<br />

could go even further on the basis of our results and say<br />

that, even though we often lack statistical significance,<br />

there are several clear indications that the subjects who<br />

used the traditional audio-active-comparative method acquired<br />

a more native-like mastery of these features than the<br />

subjects who used the technical aid/feedback methodology.<br />

Let us now briefly discuss some possible reasons for these<br />

results and the discrepancy between them and the expected<br />

results based on earlier research. It should be mentioned<br />

here that our informal observation of our subjects' use of<br />

the technical aids made us optimistic as to the teaching<br />

value of this methodology. The students were generally<br />

enthusiastic and stimulated by working with these<br />

instruments. Due to these observations and comparison with<br />

other such experiments mentioned above, we do not believe<br />

that our subjects were somehow "confused" by these technical<br />

instruments as some of our colleagues have suggested. The<br />

operation of our apparatus was no more complicated than in<br />

other experiments of this kind and therefore we consider<br />

this explanation of our results to be less than<br />

convincing.<br />

This is not to say, of course, that an improvement in the<br />

function of our instrumentation would not affect our<br />

results. Our instrumentation was, in fact, relatively<br />

primitive compared with what is currently available in the<br />

form of computerized instructional devices.<br />

A somewhat more<br />

appealing explanation of our results, though very general<br />

and vague, is that the learning of the information that is<br />

fed back via the instruction devices is somehow not as<br />



closely related to the linguistic aspects of the target<br />

features as has been assumed in the development and use of<br />

these methods. Then the question immediately arises as to<br />

why the methods have worked better in other research where<br />

the training was concentrated into short sessions. If<br />

the training were not related to the linguistic learning<br />

process, this should have had the same effect in the other<br />

research, which was similar in many ways to that presented<br />

here. Perhaps the<br />

most obvious difference is that in our experiments the<br />

training was spread out in time. We cannot at present<br />

see how this time factor could account<br />

for the difference in the effects of these methods.<br />

The discrepancy between our results and the results of this<br />

previous work is, then, unresolved.<br />

On closer scrutiny, this methodology presents some problems<br />

related to our difficulty in explaining our results and<br />

relating them to earlier research. How much do we really<br />

know about the phonetic identity of prosodic elements? The<br />

visual manifestation of fundamental frequency in speech does<br />

not necessarily reveal to the learner which of the details<br />

of this parameter are critical to the production of a<br />

natural sounding intonation in a particular language. The<br />

same thing is of course true for the much discussed but<br />

little understood phenomenon of rhythm or timing.<br />

Actually, we would need to know these details and point them<br />

out to the learner for the effective use of this method, but<br />

the fact is that our knowledge is still very limited.<br />

The use of alternative sensory channels for feedback<br />



information to be used in learning of linguistic features<br />

seems, in large part, to be based on a rather vague<br />

behavioristic assumption. The feedback is assumed to<br />

facilitate a successful production and the successful<br />

production is assumed to reinforce the behavior and thus<br />

facilitate learning. This methodology has been used with<br />

some success in both foreign language learning and the<br />

teaching of the handicapped, such as the deaf and hard of hearing.<br />

It seems that this success is a fully adequate motivation<br />

for the use of the methods but that the success must be<br />

as difficult to explain as the apparent failure of the<br />

methods in our work.<br />

5. Conclusions<br />

The inspiration for this research was the<br />

success of the previous work in this field mentioned in<br />

section 1 of this report. As phoneticians we were<br />

enthusiastic about the possibility of using some of the<br />

methods we were familiar with from speech research in a<br />

practical way in a language teaching setting. Our aim was<br />

to test this promising methodology in an actual language<br />

course situation. We have found that the answer to our<br />

original question "Do technical aids help in the learning of<br />

prosody?" seems to be that they do NOT, or at least that we<br />

have not been able to show such effects in this research.<br />

Our null hypothesis is therefore supported: learners who use<br />



the technical aids will NOT achieve a more native-like<br />

production of the prosodic features of the target language<br />

when compared to learners who use the traditional language<br />

laboratory methodology. Although the results we have<br />

reported here are somewhat disappointing from the point of<br />

view of the phonetician who would like to be able to apply<br />

some of his research methods, they are important from<br />

another. As was mentioned in the introductory section, there<br />

has been considerable research interest in these<br />

questions but a lack of well-controlled research. We consider<br />

our work here to be a contribution to the research<br />

that is needed in order to be able to answer definitively<br />

our original question as to the utility of technical aids in<br />

language teaching and how these aids should be designed.<br />



