Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

Training and recognition experiments were conducted using HTK (Young et al., 2005). Separate software was developed for the transformation and the model tree algorithms.

Results and discussion

Recognition results for the one- and four-element speaker profiles are presented in Table 1 for different search criteria, together with a baseline result for non-transformed models. The error rate of the one-dimensional tree-based search was as low as that of the exhaustive search, at a fraction (25-50%) of the computational load. This result is especially positive, considering that the latter search is guaranteed to find the global maximum-likelihood speaker vector (a schematic sketch of the greedy descent is given below, after Figure 2).

Even the profile-independent root node provides a substantial improvement over the baseline result. Since no estimation procedure is involved, this saves considerable computation.

With the four-dimensional speaker profile, the computational load is less than 1% of that of the exhaustive search. A minimum error rate is reached at stop levels two and three below the root. Four features yield consistent improvements over the single feature, except for the root criterion. Clearly, vocal tract length is very important, but spectral slope and variance scaling also make positive contributions.

Table 1. Number of recognition iterations and word error rate for the one- and four-dimensional speaker profiles.

Search alg.    No. iterations      WER (%)
               1-D      4-D        1-D     4-D
Baseline              1                32.2
Exhaustive      16     8192        11.5     -
Root             1        1        11.9    13.9
Level 1          2       16        12.2    11.1
Level 2          4       32        11.5    10.2
Level 3          6       48        11.2    10.2
Leaf             8       50        11.2    10.4
Path-max         9       51        11.9    11.6

Histograms of warp factors for individual utterances are presented in Figure 1. The distributions for the exhaustive and the 1-dimensional leaf search are very similar, which corresponds well with their small difference in recognition error rate. The 4-dimensional leaf search distribution differs from these, mainly in the peak region. The cause of its bimodal character calls for further investigation. A possible explanation may lie in the fact that the reference models are trained on both male and female speakers. Distinct parts of the trained models have probably been assigned to these two categories. The two peaks might reflect that some utterances are adjusted to the female parts of the models while others are adjusted to the male parts. This might be better captured by the more detailed four-dimensional estimation.

Figure 1. Histogram of estimated frequency warp factors for the three estimation techniques (1-dim exhaustive, 1-dim tree, 4-dim tree); warp factor on the horizontal axis, number of utterances on the vertical axis.

Figure 2 shows scatter diagrams of the average warp factor per speaker vs. body height for the one- and four-dimensional search. The largest difference between the plots occurs for the shortest speakers, for which the four-dimensional search shows more realistic values. This indicates that the latter makes more accurate estimates in spite of its larger deviation from a Gaussian distribution in Figure 1. This is also supported by a stronger correlation between warp factor and height (-0.55 vs. -0.64).

Figure 2. Scatter diagrams of warp factor vs. body height for the one- (left) and four-dimensional (right) search. Each sample point is an average of all utterances of one speaker.
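As a rough illustration of the two search procedures compared in Table 1, the following Python sketch shows one way a greedy model-tree descent over candidate speaker-profile vectors could be organised, next to an exhaustive evaluation of all candidates. It is a minimal sketch under stated assumptions: the node layout, the Profile and Score types, the stop_level argument and the scoring callback (standing for the log-likelihood of the utterance under models transformed with a candidate profile) are illustrative and not taken from the implementation used in the experiments.

```python
# Minimal sketch of greedy tree-based search over candidate speaker profiles,
# compared with exhaustive evaluation of all candidates. Illustration only:
# node layout, types and stop-level handling are assumptions.

from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Profile = Tuple[float, ...]           # e.g. (warp factor, slope, var. scaling, ...)
Score = Callable[[Profile], float]    # log-likelihood of the utterance under
                                      # models transformed with this profile

@dataclass
class Node:
    profile: Profile                  # representative profile for this subtree
    children: List["Node"] = field(default_factory=list)

def exhaustive_search(candidates: List[Profile], score: Score) -> Profile:
    """Score every candidate profile; guaranteed to find the global maximum."""
    return max(candidates, key=score)

def tree_search(root: Node, score: Score, stop_level: Optional[int] = None) -> Profile:
    """Greedy descent: at each level, score the children of the current node and
    follow the best one; return the profile of the node where the descent stops.
    stop_level corresponds to the 'Level n' stop criteria in Table 1; None
    descends to a leaf. (A 'Path-max' criterion would instead return the
    best-scoring node encountered anywhere along the path.)"""
    node, level = root, 0
    while node.children and (stop_level is None or level < stop_level):
        node = max(node.children, key=lambda child: score(child.profile))
        level += 1
    return node.profile
```

With 16 candidate warp factors arranged as a binary tree, such a descent scores two children per level over four levels, on the order of eight likelihood evaluations instead of sixteen; in four dimensions the corresponding figure is a few tens of evaluations against 8192, which is in line with the iteration counts in Table 1.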
The operation of the spectral shape compensation is presented in Figure 3 as an average function over the speakers and for the two speakers with the largest positive and negative deviations from the average. The average function indicates a slope compensation of the frequency region below around 500 Hz.
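To make the idea of such a shape compensation concrete, the sketch below applies a simple low-frequency tilt to log filterbank energies. This is only an illustration under stated assumptions: the piecewise-linear form, the explicit 500 Hz knee and the parameter name slope_db_per_oct are invented for the example, whereas in the experiments the compensation function is estimated per speaker rather than fixed to this shape.

```python
import numpy as np

# Illustrative sketch: tilt the spectrum below a knee frequency by a fixed
# number of dB per octave. Form and parameters are assumptions, not the
# estimated compensation function described in the text.

def compensate_spectrum(log_energies: np.ndarray,
                        centre_freqs_hz: np.ndarray,
                        slope_db_per_oct: float,
                        knee_hz: float = 500.0) -> np.ndarray:
    """Boost (or attenuate) channels below knee_hz by slope_db_per_oct dB/octave."""
    octaves_below_knee = np.maximum(0.0, np.log2(knee_hz / centre_freqs_hz))
    correction_db = slope_db_per_oct * octaves_below_knee
    # log_energies are assumed to be natural-log power; convert dB to that scale
    return log_energies + correction_db * np.log(10.0) / 10.0
```

For example, with channel centre frequencies of 125, 250, 500 and 1000 Hz and a slope of 3 dB per octave, the correction boosts the two lowest channels by 6 and 3 dB respectively and leaves channels at or above the knee unchanged.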
