
Proceedings Fonetik 2009 - Institutionen för lingvistik



Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

shape may be explained as children having steeper spectral voice source slope than adults, but there may also be influence from differences in the recording conditions between PF-Star and SPEECON.

[Figure 3 plot: vertical axis in dB (18 to -18), horizontal axis Frequency (0 to 10000); curves: Maximum, Average, Minimum]

Figure 3. Transfer function of the voice source compensation filter as an average over all test speakers and the functions of two extreme speakers.

The model variance scaling factor has an average value of 1.39 with a standard deviation of 0.11. This should not be interpreted as a ratio between the variability among children and that of adults. This value is rather a measure of the remaining mismatch after compensation of the other features.

Conclusion

A tree-based search in the speaker profile space provides recognition accuracy similar to an exhaustive search at a fraction of the computational load and makes it practically possible to perform joint estimation in a larger number of speaker characteristic dimensions. Using four dimensions instead of one increased the recognition accuracy and improved the property estimation. The distribution of the estimates of the individual property features can also provide insight into the function of the recognition process in speech production terms.

Acknowledgements

This work was financed by the Swedish Research Council.

References

Akhil, P. T., Rath, S. P., Umesh, S. and Sanand, D. R. (2008) A Computationally Efficient Approach to Warp Factor Estimation in VTLN Using EM Algorithm and Sufficient Statistics, Proc. Interspeech.

Batliner, A., Blomberg, M., D’Arcy, S., Elenius, D., Giuliani, D. (2002) The PF-STAR Children’s Speech Corpus, Proc. InterSpeech, 2761-2764.

Blomberg, M. and Elenius, D. (2008) Investigating Explicit Model Transformations for Speaker Normalization. Proc. ISCA ITRW Speech Analysis and Processing for Knowledge Discovery, Aalborg, Denmark.

Fant, G. and Kruckenberg, A. (1996) Voice source properties of the speech code. TMH-QPSR 37(4), KTH, Stockholm, 45-56.

Fant, G., Liljencrants, J. and Lin, Q. (1985) A four-parameter model of glottal flow. STL-QPSR 4/1985, KTH, Stockholm, 1-13.

Großkopf, B., Marasek, K., v. d. Heuvel, H., Diehl, F., Kiessling, A. (2002) SPEECON - speech data for consumer devices: Database specification and validation, Proc. LREC.

Lee, L. and Rose, R. C. (1998) A Frequency Warping Approach to Speaker Normalisation, IEEE Trans. on Speech and Audio Processing, 6(1):49-60.

Pitz, M. and Ney, H. (2005) Vocal Tract Normalization Equals Linear Transformation in Cepstral Space, IEEE Trans. on Speech and Audio Processing, 13(5):930-944.

Potamianos, A. and Narayanan, S. (2003) Robust Recognition of Children’s Speech, IEEE Trans. on Speech and Audio Processing, 11(6):603-616.
