
Proceedings Fonetik 2009 - Institutionen för lingvistik

Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

Automatic classification of segmental second language speech quality using prosodic features

Eero Väyrynen 1, Heikki Keränen 2, Juhani Toivanen 3 and Tapio Seppänen 4
1 2,4 MediaTeam, University of Oulu
3 MediaTeam, University of Oulu & Academy of Finland

Abstract

An experiment is reported exploring whether the general auditorily assessed segmental quality of second language speech can be evaluated with automatic methods, based on a number of prosodic features of the speech data. The results suggest that prosodic features can predict the occurrence of a number of segmental problems in non-native speech.

Introduction

Our research question is: is it possible, by looking into the supra-segmentals of a second language variety, to gain essential information about the segmental aspects, at least in a probabilistic manner? That is, if we know what kinds of supra-segmental features occur in a second language speech variety, can we predict what some of the segmental problems will be?

The aim of this research is to find if suprasegmental speech features can be used to construct a segmental model of Finnish second language speech quality. Multiple nonlinear polynomial regression methods (for general reference see e.g. Khuri (2003)) are used in an attempt to construct a model capable of predicting segmental speech errors based solely on global prosodic features that can be automatically derived from speech recordings.

Speech data

The speech data used in this study was produced by 10 native Finnish speakers (5 male and 5 female), and 5 native English speakers (2 male and 3 female). Each of them read two texts: first, a part of the Rainbow passage, and second, a conversation between two people. Each rendition was then split roughly from the middle into two smaller parts to form a total of 60 speech samples (4 for each person). The data was collected by Emma Österlund, M.A.

Segmental analysis

The human rating of the speech material was done by a linguist who was familiar with the types of problems usually encountered by Finns when learning and speaking English. The rating was not based on a scale rating of the overall fluency or a part thereof, but instead on counting the number of errors in individual segmental or prosodic units. As a guideline for the analysis, the classification by Morris-Wilson (1992) was used to make sure that especially the most common errors encountered by Finns learning English were taken into account.

The main problems for the speakers were, as was expected for native Finnish speakers, problems with voicing (often with the sibilants), missing friction (mostly /v, θ, ð/), voice onset time and aspiration (the plosives /p, t, k, b, d, g/), and affricates (post-alveolar instead of palato-alveolar). There were also clear problems with coarticulation, assimilation, linking, rhythm and the strong/weak form distinction, all of which caused unnatural pauses within word groups.

The errors were divided into two rough categories, segmental and prosodic, the latter comprising any unnatural pauses and word-level errors – problems with intonation were ignored. Subsequently, only the data on the segmental errors was used for the acoustic analysis.

Acoustic analysis

For the speech data, features were calculated using the f0Tool software (Seppänen et al. 2003). The f0Tool is a software package for automatic prosodic analysis of large quanta of speech data. The analysis algorithm first distinguishes between the voiced and voiceless parts of the speech signal using a cepstrum based voicing detection logic (Ahmadi & Spanias 1999) and then determines the f0 contour for the voiced parts of the signal with a high precision time domain pitch detection algorithm (Titze & Haixiang 1993). From the speech signal, over forty acoustic/prosodic parameters were computed automatically. The parameters were:
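
Separately from the parameter list, the modelling idea named in the Introduction (polynomial regression from prosodic features to a segmental error count) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the second-order expansion, the two features and the toy numbers are all assumptions made for illustration, and the fit uses plain least squares via the normal equations.

```python
from itertools import combinations_with_replacement


def poly_features(x, degree=2):
    """Expand one feature vector into all monomials up to `degree`, bias first."""
    terms = [1.0]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            prod = 1.0
            for i in idx:
                prod *= x[i]
            terms.append(prod)
    return terms


def solve_linear(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit_poly_regression(xs, ys, degree=2):
    """Least-squares fit via the normal equations (Phi^T Phi) w = Phi^T y."""
    phi = [poly_features(x, degree) for x in xs]
    k = len(phi[0])
    ata = [[sum(p[i] * p[j] for p in phi) for j in range(k)] for i in range(k)]
    aty = [sum(p[i] * y for p, y in zip(phi, ys)) for i in range(k)]
    return solve_linear(ata, aty)


def predict(w, x, degree=2):
    """Evaluate the fitted polynomial at a new feature vector."""
    return sum(wi * ti for wi, ti in zip(w, poly_features(x, degree)))


# Hypothetical toy data (NOT from the paper): two made-up, pre-scaled prosodic
# features per speech sample and a synthetic segmental error score.
toy_xs = [[a, b] for a in (0.0, 1.0, 2.0) for b in (0.0, 1.0, 2.0)]
toy_ys = [2 + 0.5 * a - b + 0.1 * a * a + 0.25 * a * b for a, b in toy_xs]
toy_w = fit_poly_regression(toy_xs, toy_ys)
```

With real data, each row of `toy_xs` would hold the automatically derived prosodic parameters of one speech sample and `toy_ys` the rater's segmental error counts. With over forty parameters and only 60 samples, a full second-order expansion would have far more terms than observations, so some form of feature selection or regularisation would be needed; the sketch does not address that.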
