Proceedings Fonetik 2009 - Institutionen för lingvistik


Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

ment in future research will be to have learners practice presentation skills without teacher models.

It is important to point out that we cannot determine from these data that speakers became better presenters as a result of their participation in this study. A successful presentation entails, of course, very many features, and using pitch well is only one of them. Other vocal features that are important are the ability to clearly articulate the sounds of the language, the rate of speech, and the ability to speak with an intensity that is appropriate to the spatial setting. In addition, there are numerous other features regarding the interaction of content, delivery and audience that play a critical role in how the presentation is received. Our presentation data, gathered as they were from real-life classroom settings, are in all likelihood too varied to allow for a study that attempted to find a correlation between pitch variation and, for example, the perceived clarity of a presentation. However, we do wish to explore perceptions of the speakers. We also plan to develop feedback gauges for other intonational features, beginning with rate of speech. We see potential to develop language-specific intonation pattern detectors that could respond to, for example, a speaker’s tendency to use French intonation patterns when speaking English. Such gauges could form a type of toolbox that students and teachers could use as a resource in the preparation and assessment of oral presentations.

Our study contributes to the field in a number of ways. It is, to the best of our knowledge, the first to rely on a synthesis of online fundamental frequency data in relation to learner production. We have not shown the speakers the absolute fundamental frequency itself, but rather how much it has varied over time as represented by the standard deviation.
This variable is known to characterize discourse intended for a large audience (Johns-Lewis, 1986), and is also a variable that listeners can perceive if they are asked to distinguish lively speech from monotone (Hincks, 2005; Traunmüller & Eriksson, 1995). In this paper, we have demonstrated that it is a variable that can effectively stimulate production as well. Furthermore, the variable itself provides a means of measuring, characterizing and comparing speaker intonation. It is important to point out that enormous quantities of data lie behind the values reported in our results. Measurements of fundamental frequency were made 100 times a second, for stretches of speech up to 45 minutes in length, giving tens of thousands of data points per speaker for the training utterances. By converting the Hertz values to the logarithmic semitone scale, we are able to make valid comparisons between speakers with different vocal ranges. This normalization is an aspect that appears to be neglected in commercial pronunciation programs such as Auralog’s Tell Me More series, where pitch curves of speakers of different mean frequencies can be indiscriminately compared. There is a big difference in the perceptual force of a rise in pitch of 30 Hz for a speaker of low mean frequency and one with high mean frequency, for example. These differences are normalized by converting to semitones.

Secondly, our feedback can be used for the production of long stretches of free speech rather than short, system-generated utterances. It is known that intonation must be studied at a higher level than that of the word or phrase in order for speech to achieve proper cohesive force over longer stretches of discourse. By presenting the learners with information about their pitch variation in the previous ten seconds of speech, we are able to incorporate and reflect the vital movement that should occur when a speaker changes topic, for example.
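
The semitone normalization and the ten-second pitch variation measure described above can be illustrated with a minimal sketch. It is not the implementation used in the study. The function names, the 100 Hz reference frequency, the encoding of unvoiced frames as `None` or `0`, and the choice of the population standard deviation are all assumptions made for illustration. Only the sampling rate of 100 measurements per second and the ten-second window come from the text.

```python
import math
from statistics import pstdev

FRAME_RATE_HZ = 100    # F0 measurements per second (stated in the text)
WINDOW_SECONDS = 10    # feedback covers the previous ten seconds (stated in the text)
REFERENCE_HZ = 100.0   # assumed reference; it shifts all values equally and cancels in the SD


def hz_to_semitones(f0_hz, reference_hz=REFERENCE_HZ):
    """Convert a frequency in Hz to semitones relative to reference_hz."""
    return 12.0 * math.log2(f0_hz / reference_hz)


def rise_in_semitones(mean_hz, rise_hz):
    """Size in semitones of a rise of rise_hz starting from mean_hz."""
    return hz_to_semitones(mean_hz + rise_hz, reference_hz=mean_hz)


def windowed_pitch_sd(f0_track_hz, frame_rate=FRAME_RATE_HZ, window_s=WINDOW_SECONDS):
    """Standard deviation, in semitones, of the voiced frames in the latest window.

    Assumption: unvoiced frames are encoded as None or 0 and are skipped.
    Returns None when the window holds fewer than two voiced frames.
    """
    window = f0_track_hz[-frame_rate * window_s:]
    voiced = [hz_to_semitones(f) for f in window if f]
    if len(voiced) < 2:
        return None
    # Population SD is an assumption; the text does not say which estimator was used.
    return pstdev(voiced)
```

Under these assumptions, `rise_in_semitones(100, 30)` is about 4.5 semitones, while `rise_in_semitones(200, 30)` is about 2.4, which is the difference in perceptual force that the Hertz scale hides. Because the conversion is logarithmic, doubling every frequency in a track leaves `windowed_pitch_sd` unchanged, so speakers with different vocal ranges can be compared on the same gauge.
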
In an ideal world, most teachers would have the time to sit with students, examine displays of pitch tracings, and discuss how peaks of the tracings relate to each other with respect to theoretical models such as Brazil’s intonational paragraphs (Levis & Pickering, 2004). Our system cannot approach that level of detail, and in fact cannot make the connection between intonation and its lexical content. However, it can be used by learners on their own, in the production of any content they choose. It also has the potential for future development in the direction of more fine-grained analyses.

A third novel aspect of our feedback is that it is transient and immediate. Our lights flicker and then disappear. This is akin to the way we naturally process speech; not as something that can be captured and studied, but as sound waves that last no longer than the milliseconds it takes to perceive them. It is also more similar to the way we receive auditory and sensory feedback when we produce speech – we only hear and feel what we produce in the very instance we produce it; a moment later it is gone. Though at this point we can only speculate, it would be interesting to test whether transient
