Proceedings Fonetik 2009 - Institutionen för lingvistik

Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

[Figure 6: bar chart. Vertical axis: Responses (%), 0 to 100. Horizontal axis: Rank (Bottom, Middle, Top). Series: Traditional, Adapted, Gesture 70%.]

Figure 6. Rank distributions for the traditional, adapted and gesture 70% systems.

The outcome of the experiment should be considered with some caution due to the selection of the subject group. However, the results indicate that the gesture system has an advantage over the other two systems and that the adapted system is ranked higher than the traditional system. The maximum rankings are 64%, 72% and 71% for the traditional, adapted and gesture systems, respectively. Our initial hypothesis was that these systems would be ranked with the traditional system at the bottom and the gesture system at the top. This is in fact true in 58% of the cases, with a standard deviation of 21%. One subject contradicted this hypothesis in only one out of 12 cases, while another subject did the same in as many as 9 cases. The hypothesis was confirmed by all subjects for one utterance and by only one subject for another one.

The adapted system is based on data from the diphone unit library and was created to form a homogeneous base for combining rule-based and unit-based synthesis as smoothly as possible. It is interesting that even these first steps, creating the adapted system, are regarded as an improvement. The diphone library has not yet been matched to the dialect of the reference speaker, and a number of diphones are missing.

Final remarks

This paper describes our work on building formant synthesis systems based on both rule-generated and database-driven methods.
The technical and perceptual evaluations show that this approach is a very interesting path to explore further, at least in a research environment. The perceptual results showed an advantage in naturalness for the gesture system, which includes both speaker adaptation and a diphone database of formant gestures, compared to both the traditional reference system and the speaker-adapted system. However, it is also apparent from the synthesis quality that a lot of work still needs to be put into the automatic building of a formant unit library.

Acknowledgements

The diphone database was recorded using the WaveSurfer software. David Öhlin contributed to building the diphone database. We thank John Lindberg and Roberto Bresin for making the evaluation software available for the perceptual ranking. The SpeeCon database was made available by Kjell Elenius. We thank all subjects for their participation in the perceptual evaluation.

References

Acero, A. (1999) “Formant analysis and synthesis using hidden Markov models”, In: Proc. of Eurospeech'99, pp. 1047-1050.

Carlson, R., and Granström, B. (2005) “Data-driven multimodal synthesis”, Speech Communication, Volume 47, Issues 1-2, September-October 2005, pp. 182-193.

Carlson, R., Granström, B., and Karlsson, I. (1991) “Experiments with voice modelling in speech synthesis”, Speech Communication, 10, 481-489.

Carlson, R., Granström, B., Hunnicutt, S. (1982) “A multi-language text-to-speech module”, In: Proc. of the 7th International Conference on Acoustics, Speech, and Signal Processing (ICASSP’82), Paris, France, vol. 3, pp. 1604-1607.

Carlson, R., Sigvardson, T., Sjölander, A. (2002) “Data-driven formant synthesis”, In: Proc. of Fonetik 2002, Stockholm, Sweden, STL-QPSR 44, pp. 69-72.

Großkopf, B., Marasek, K., v. d. Heuvel, H., Diehl, F., Kiessling, A. (2002) “SpeeCon - speech data for consumer devices: Database specification and validation”, Proc. LREC.

Hertz, S. (2002) “Integration of Rule-Based Formant Synthesis and Waveform Concatenation: A Hybrid Approach to Text-to-Speech Synthesis”, In: Proc. IEEE 2002 Workshop on Speech Synthesis, 11-13 September 2002, Santa Monica, USA.
