
Proceedings Fonetik 2009 - Institutionen för lingvistik


Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

using 16 bits at 16 kHz. Two sets of 60 speakers each were formed, one for training and one for evaluation, to match Pf-Star.

Recognition system

The adaptation scheme was performed using a phone-level HMM (Hidden Markov Model) system for connected digit-string recognition. Each string was assumed to be framed by silence (/sil/) and to consist of an arbitrary number of digit words. These were modeled as concatenations of three-state triphone models ended by an optional short-pause model. The short-pause model consisted of one state, which shared its pdf (probability density function) with the centre state of the silence model.

The distribution of speech features in each state was modeled using GMMs (Gaussian Mixture Models) with 16 mixtures and diagonal covariance matrices. The feature vector consisted of 13 × 3 elements, corresponding to static parameters and their first and second order time derivatives. The static coefficients consisted of the normalized log energy of the signal and MFCCs (Mel Frequency Cepstrum Coefficients). These coefficients were extracted using a cosine transform of a mel-scaled filter bank consisting of 38 channels covering the interval 0 to 7.6 kHz.

Training and recognition experiments were conducted using the HTK speech recognition software package (Young et al., 2005). Phoneme-specific adaptation of the acoustic models and the warp factor search were performed by separate programs. The adaptation was performed by applying, in the model space, the piece-wise linear VTLT corresponding to the one used in the feature space by Pitz and Ney (2005).

Results

The WER (word error rate) of recognition experiments in which unsupervised adaptation to the test utterance was performed is shown in Table 1. The baseline experiment using phoneme-independent warping resulted in a WER of 13.2%.
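
As a reminder of how a WER figure of this kind is typically obtained, the following is a minimal sketch assuming the usual definition: the number of word substitutions, deletions and insertions in the best alignment, divided by the number of words in the reference. The function name `word_error_rate` and the example strings are illustrative only and are not taken from the paper or from HTK.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / number of reference words.

    Assumes whitespace-separated words; this is an illustrative helper,
    not the scoring tool used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words to an empty hypothesis: i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For a digit string, `word_error_rate("one two three four", "one three three")` counts one substitution and one deletion against four reference words.
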
Introducing two groups ({/sil/, /t/, /k/} and {the rest of the models}) with separate warping factors lowered the error rate to 12.9%. This required an exhaustive search over all combinations of the two warping factors. If it was instead assumed that the warping factors could be estimated separately, the performance gain was reduced by 0.2% absolute. Further division by forming a third group with unvoiced fricatives {/s/, /S/, /f/ and /v/} was also attempted, but gave no improvement in recognition over the result above. In this case /v/ in “två” is mainly unvoiced.

Table 1. Recognition results with model group-specific warping factors. Unsupervised likelihood maximization of each test utterance. The group was formed by separating /sil/, /t/ and /k/ from the rest of the models.

Method                            WER (%)
VTLN 1-warping factor             13.2
Speech                            13.4
2 Groups (separate estimation)    13.1
2 Groups (joint maximization)     12.9

Phoneme-specific adaptation of an adult recognizer to children resulted in the warping factors given in Figure 1. The method gave silence a warping factor of 1.0, which is reasonable. In general, voiced phonemes were more strongly warped than unvoiced ones.

[Figure 1: warping factor (vertical axis, 1 to 1.5) per phoneme, in the order sil, t, s, v, S, f, k, O, n, a, m, o:, l, e, i:, r, uh:, y:, e:, U.]

Figure 1. Phoneme-specific warp adapting adult models to children, sorted by increasing warp factor.

Further division of the adaptation data into age groups resulted in the age- and phoneme-specific warping factors shown in Figure 2. In general, less warping of adult models was needed for 8-year-old children than for younger children.

[Figure 2: warping factor (vertical axis, 1 to 1.7) per phoneme, in the order sil, t, v, S, k, s, O, o:, f, e, l, a, uh:, n, r, e:, y:, m, i:, U, with one series per age group, labelled 4, 5, 6, 7 and 8.]

Figure 2. Phoneme- and age-specific warping factors, optimized on the likelihood of the adaptation data. The phonemes are sorted by increasing warp factor for 6-year-old speakers.
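
The difference between the joint and the separate estimation of two group-specific warping factors, as compared in Table 1, can be illustrated with a small sketch. Everything here is an assumption made for illustration: `toy_score` stands in for the utterance log-likelihood (which in the paper comes from the HMM system), the grid of candidate factors is arbitrary, and "separate estimation" is read as optimizing one factor while the other is held at a neutral value of 1.0, which is only one possible interpretation of the text.

```python
import itertools


def joint_search(score, grid):
    """Exhaustive search over all (alpha_a, alpha_b) pairs.

    Needs len(grid) ** 2 evaluations of `score`.
    """
    return max(itertools.product(grid, grid), key=lambda pair: score(*pair))


def separate_search(score, grid, neutral=1.0):
    """Estimate each factor on its own, holding the other at `neutral`.

    Needs only 2 * len(grid) evaluations, but ignores any interaction
    between the two factors.
    """
    alpha_a = max(grid, key=lambda a: score(a, neutral))
    alpha_b = max(grid, key=lambda b: score(neutral, b))
    return alpha_a, alpha_b


def toy_score(alpha_a, alpha_b):
    """Hypothetical stand-in for a log-likelihood, with a cross term so that
    the best value of one factor depends on the other."""
    da = alpha_a - 1.3
    db = alpha_b - 1.1
    return -(da ** 2 + db ** 2 + da * db)


# Hypothetical candidate warping factors from 1.0 to 1.5 in steps of 0.05.
grid = [round(1.0 + 0.05 * i, 2) for i in range(11)]

joint = joint_search(toy_score, grid)
separate = separate_search(toy_score, grid)
```

On this toy surface the two strategies pick different factor pairs, and the jointly maximized pair scores at least as well as the separately estimated one. This mirrors the trade-off described above: the joint search is more expensive but can exploit dependencies between the groups.
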
