
Proceedings Fonetik 2009 - Institutionen för lingvistik



Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

be more general. However, considering that children usually master speech sounds in word-initial and word-final positions later than in word-medial positions (Linell & Jennische, 1980), this limitation should not be disqualifying on its own.

The speech data in this work came from adult speakers. New challenges can be expected when faced with children’s voices, e.g. increased variability in the speech database (Gerosa et al., 2007). Moreover, variability in the speech of the intended user - the child in the therapy room - can also be expected. (Not to mention the variability from child to child in motivation and ability and will to comply with the therapist’s intervention plans.)

The evaluation showed that there is much room for improving naturalness, and fortunately, some improvement strategies can be suggested. First, more manipulations with weighting factors might be a way to assure that the combinations that are ranked the highest are also the ones that sound the best. As of now, this is not always the case. During the course of this investigation, attempts were made at increasing the size of the target corpus, by including word-initial voiceless plosives within utterances as well. However, these efforts did not improve the quality of the output concatenated speech samples. The current system does not involve any spectral smoothing; this might be a way to polish the concatenation joints to improve naturalness.

Looking beyond the context of modified resynthesis to assist therapy with children with phonological impairments, the finding that it is indeed possible to generate natural-sounding concatenations of segments from different speakers might be valuable in concatenative synthesis development in general. This might be useful in the context of extending a speech database if the original speaker is no longer available, e.g. with new phonemes. However, it seems reasonable to assume that the method is only applicable to voiceless segments.

Acknowledgements

This work was funded by The Swedish Graduate School of Language Technology (GSLT).

References

Carlson, R. & Granström, B. (2005) Data-driven multimodal synthesis. Speech Communication 47, 182-193.

Gerosa, M., Gioliani, D. & Brugnara, F. (2007) Acoustic variability and automatic recognition of children’s speech. Speech Communication 49, 847-835.

Hewlett, N. (1992) Processes of development and production. In Grunwell, P. (ed.) Developmental Speech Disorders, 15-38. London: Whurr.

Hunt, A. and Black, A. (1996) Unit selection in a concatenative speech synthesis system using a large speech database. Proceedings of ICASSP 96 (Atlanta, Georgia), 373-376.

Iskra, D., Grosskopf, B., Marasek, K., Van Den Heuvel, H., Diehl, F., and Kiessling, A. (2002) Speecon - speech databases for consumer devices: Database specification and validation.

Linell, P. & Jennische, M. (1980) Barns uttalsutveckling, Stockholm: Liber.

Locke, J.L. & Pearson, D.M. (1992) Vocal Learning and the Emergence of Phonological Capacity. A Neurobiological Approach. In C.A. Ferguson, L. Menn & C. Stoel-Gammon (Eds.), Phonological Development. Models, research, implications., York: York Press.

Protopapas, A. (1998) Modified LPC resynthesis for controlling speech stimulus discriminability. 136th Annual Meeting of the Acoustical Society of America, Norfolk, VA, October 13-16.

Ramus, F. & Mehler, J. (1999) Language identification with suprasegmental cues: A study based on speech resynthesis, Journal of the Acoustical Society of America 105, 512-521.

Shuster, L. I. (1998) The perception of correctly and incorrectly produced /r/. Journal of Speech, Language and Hearing Research 41, 941-950.

Sjölander, K. The Snack sound toolkit, Department of Speech, Music and Hearing, KTH, Stockholm, Sweden. Online: http://www.speech.kth.se/snack/, 1997-2004, accessed on April 12, 2009.

Sjölander, K. (2003) An HMM-based system for automatic segmentation and alignment of speech. Proceedings of Fonetik 2003 (Umeå University, Sweden), PHONUM 9, 93-96.

Taylor, P. (2008) Text-to-Speech Synthesis, Cambridge University Press.

Wik, P. (2004) Designing a virtual language tutor. Proceedings of Fonetik 2004 (Stockholm University, Sweden), 136-139.
