
Proceedings Fonetik 2009 - Institutionen för lingvistik


Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

compared to the 11 hours from the professional speaker.

HMM Contextual Features

The typical HMM synthesis model (Tokuda et al., 2000) can be decomposed into a number of distinct layers:

• At the acoustic level, a parametric source-filter model (MLSA vocoder) is responsible for signal generation.
• Context-dependent HMMs, containing probability distributions for the parameters and their first- and second-order derivatives, are used to generate control parameter trajectories.
• To select context-dependent HMMs, a decision tree takes input from a large feature set and clusters the HMM models.

In this work, we use the standard model for acoustic and HMM level processing, and focus on adapting the feature set of the decision tree to the task of modeling dialectal variation.

The feature set typically used in HMM synthesis includes features on the segment, syllable, word, phrase and utterance levels. Segment level features include immediate context and position in syllable; syllable features include stress and position in word and phrase; word features include part-of-speech tag (content or function word), number of syllables, position in phrase, etc.; phrase features include phrase length in terms of syllables and words; utterance level features include length in syllables, words and phrases.

For our present experiments, we have also added a speaker level to the feature set, since we train a voice on multiple speakers. The only feature in this category at present is dialect group, which is one of Norrland, Dala, Svea, Göta, Gotland and South of Sweden.

In addition, we have chosen to add to the word level a morphological feature stating whether or not the word is a compound, since compound stress patterns are often a significant dialectal feature in Swedish (Bruce et al., 2007).
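As an illustration only, the added speaker-level and word-level features could be represented alongside a few of the standard ones as in the minimal Python sketch below. The class `SegmentContext`, the function `binary_questions` and all field names are hypothetical and are not taken from the system described here; the sketch only assumes that the decision tree splits on yes/no questions about the context, and it covers just a subset of the feature set.

```python
from dataclasses import dataclass

# Dialect groups as listed in the text.
DIALECT_GROUPS = ("Norrland", "Dala", "Svea", "Göta", "Gotland", "South of Sweden")


@dataclass(frozen=True)
class SegmentContext:
    """Hypothetical container for a few contextual features of one segment."""
    phone: str               # segment level: the segment itself
    prev_phone: str          # segment level: immediate left context
    next_phone: str          # segment level: immediate right context
    syllable_stressed: bool  # syllable level: stress
    word_is_content: bool    # word level: content vs. function word
    word_is_compound: bool   # word level: added morphological feature
    dialect_group: str       # speaker level: added dialect feature

    def __post_init__(self):
        # Reject dialect labels outside the six groups named in the text.
        if self.dialect_group not in DIALECT_GROUPS:
            raise ValueError(f"unknown dialect group: {self.dialect_group!r}")


def binary_questions(ctx: SegmentContext) -> dict:
    """Expand a context into yes/no answers that a tree could split on.

    Assumption: one question per boolean feature and one per dialect group.
    """
    answers = {
        "syllable_stressed": ctx.syllable_stressed,
        "word_is_content": ctx.word_is_content,
        "word_is_compound": ctx.word_is_compound,
    }
    for group in DIALECT_GROUPS:
        answers[f"dialect_group=={group}"] = ctx.dialect_group == group
    return answers
```

With such a representation, exactly one dialect question is answered "yes" for any segment, so a tree can separate models by dialect group wherever the training data supports the split.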
At the syllable level we have added explicit information about lexical accent type (accent I, accent II or compound accent).

Training of HMM voices with these feature sets is currently in progress and results will be presented at the conference.

Acknowledgements

The work within the SIMULEKT project is funded by the Swedish Research Council 2007-2009. The data used in this study comes from Norsk Språkbank (http://sprakbanken.uib.no).

References

Bruce, G., Schötz, S., & Granström, B. (2007). SIMULEKT – modelling Swedish regional intonation. Proceedings of Fonetik, TMH-QPSR, 50(1), 121-124.

Lundgren, A. (2005). HMM-baserad talsyntes. Master's thesis, KTH, TMH, CTT.

Megyesi, B. (2002). Data-Driven Syntactic Analysis - Methods and Applications for Swedish. Doctoral dissertation, Department of Speech, Music and Hearing, KTH, Stockholm.

Sjölander, K., & Heldner, M. (2004). Word level precision of the NALIGN automatic segmentation algorithm. In Proc of The XVIIth Swedish Phonetics Conference, Fonetik 2004 (pp. 116-119). Stockholm University.

Taylor, P. (2009). Text-To-Speech Synthesis. Cambridge University Press.

Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., & Kitamura, T. (2000). Speech parameter generation algorithms for HMM-based speech synthesis. In Proceedings of ICASSP 2000 (pp. 1315-1318).

Watts, O., Yamagishi, J., Berkling, K., & King, S. (2008). HMM-Based Synthesis of Child Speech. Proceedings of The 1st Workshop on Child, Computer and Interaction.
