Proceedings Fonetik 2009 - Institutionen för lingvistik



Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

The warping factors found for the child and age groups were then applied to the test data to measure their effect on the WER. The result of this experiment is given in Table 2. Introducing phoneme-specific warping did not substantially reduce the number of errors compared to a shared warping factor for all phonemes.

Table 2. Recognition results with an adult model adapted to children using a fixed warping vector for all utterances, with one warp factor per phoneme. Phoneme-dependent and phoneme-independent warping are denoted Pd and Pi, respectively.

Method            WER
Fix Pi            13.7
Fix Pd            13.2
Fix Pd per age    13.2

Discussion

Time-invariant VTLN has in recent years been extended towards phoneme-specific warping. The increase in recognition accuracy in experimental studies has, however, not yet reflected the large reduction in mismatch shown by Fant (1975).

One reason for the discrepancy may be that unconstrained warping of different phonemes can cause unrealistic transformations of the phoneme space. For instance, the lower-left and upper-right regions could swap places if a high and a low warping factor, respectively, were chosen for them.

Conclusion

In theory, phoneme-specific warping has a large potential for improving ASR accuracy. This potential has not yet been turned into significantly increased accuracy in speech recognition experiments. One difficulty is the large search space that results from estimating a large number of parameters. Further research is still needed to explore the remaining approaches for incorporating phoneme-dependent warping into ASR.

Acknowledgements

The authors wish to thank the Swedish Research Council for funding the research presented in this paper.

References

Batliner, A., Blomberg, M., D’Arcy, S., Elenius, D. and Giuliani, D. (2005) The PF_STAR Children’s Speech Corpus. Interspeech 2005, pp. 2761-2764.

Elenius, D. and Blomberg, M. (2005) Adaptation and Normalization Experiments in Speech Recognition for 4 to 8 Year old Children. In Proc. Interspeech 2005, pp. 2749-2752.

Fant, G. (1975) Non-uniform vowel normalization. STL-QPSR, Quarterly Progress and Status Report, Department for Speech, Music and Hearing, Stockholm, Sweden.

Giuliani, D., Gerosa, M. and Brugnara, F. (2006) Improved Automatic Speech Recognition Through Speaker Normalization. Computer Speech & Language, 20(1), pp. 107-123.

Großkopf, B., Marasek, K., v. d. Heuvel, H., Diehl, F. and Kiessling, A. (2002) SpeeCon - speech data for consumer devices: Database specification and validation. Second International Conference on Language Resources and Evaluation 2002.

Lee, L. and Rose, R. (1996) Speaker Normalization Using Efficient Frequency Warping Procedures. In Proc. Int. Conf. on Acoustics, Speech and Signal Processing, 1996, Vol. 1, pp. 353-356.

Maragakis, M. G. and Potamianos, A. (2008) Region-Based Vocal Tract Length Normalization for ASR. Interspeech 2008, pp. 1365-1368.

Miguel, A., Lleida, E., Rose, R. C., Buera, L. and Ortega, A. (2005) Augmented state space acoustic decoding for modeling local variability in speech. In Proc. Int. Conf. Spoken Language Processing, Sep. 2005.

Narayanan, S. and Potamianos, A. (2002) Creating Conversational Interfaces for Children. IEEE Transactions on Speech and Audio Processing, Vol. 10, No. 2, February 2002.

Pitz, M. and Ney, H. (2005) Vocal Tract Normalization Equals Linear Transformation in Cepstral Space. IEEE Trans. on Speech and Audio Processing, 13(5), pp. 930-944.

Potamianos, A. and Narayanan, S. (2003) Robust Recognition of Children’s Speech. IEEE Transactions on Speech and Audio Processing, Vol. 11, No. 6, November 2003, pp. 603-616.

Welling, L., Kanthak, S. and Ney, H. (1999) Improved Methods for Vocal Tract Normalization. ICASSP 99, Vol. 2, pp. 161-164.
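
Illustrative note. The Discussion's remark that unconstrained phoneme-specific warping can swap regions of the phoneme space can be sketched with a small numeric example. This is only a toy illustration and not the procedure used in the paper. It assumes a purely linear warp (each formant frequency multiplied by the warp factor), and the formant positions, warp factors, and the names `warp` and `is_below` are hypothetical, chosen only to make the effect visible.

```python
def warp(formants, alpha):
    """Linearly scale each formant frequency (Hz) by the warp factor alpha.

    Assumption: warped frequency = alpha * original frequency.
    """
    return tuple(alpha * f for f in formants)


def is_below(a, b):
    """True if point a lies below and to the left of point b in (F1, F2)."""
    return all(x < y for x, y in zip(a, b))


# Hypothetical (F1, F2) positions in Hz for two phonemes.
low_left = (400.0, 1000.0)
upper_right = (480.0, 1200.0)

# Phoneme-independent warping: one shared factor keeps the ordering.
shared_alpha = 1.1
pi_low = warp(low_left, shared_alpha)
pi_high = warp(upper_right, shared_alpha)
print("shared factor keeps order:", is_below(pi_low, pi_high))

# Unconstrained phoneme-dependent warping: a high factor for the
# low-left phoneme and a low factor for the upper-right phoneme.
pd_low = warp(low_left, 1.25)
pd_high = warp(upper_right, 0.8)
print("per-phoneme factors swap order:", is_below(pd_high, pd_low))
```

With a shared factor the two points keep their relative positions, whereas the unconstrained per-phoneme factors move the originally lower-left point above and to the right of the other. A constraint on how far the factors of neighbouring phonemes may differ would be one way to rule out such swaps, although the paper does not specify one.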
