Proceedings Fonetik 2009 - Institutionen för lingvistik


Proceedings, FONETIK 2009, Dept. of Linguistics, Stockholm University

Steps for a fully functional text-independent system in Praat

First of all, the recordings at hand have to be parameterized. In this first implementation, SPro (Guillaume, 2004) was chosen for parameter extraction, as support for it was already implemented in the Mistral programs. There are two ways to extract parameters: either you choose a folder of audio files (preferably in WAV format, although other formats are supported), or you record a sound directly in Praat. If the recording is of a user of the system (a target), the first option in a scroll list, "New User", can be chosen. This function checks the sampling frequency and resamples the sound if it differs from 16 kHz (currently the default), and performs frame selection by excluding silent segments longer than 100 ms before 19 MFCCs are extracted and stored in a parameter file. The parameters are then automatically energy normalized before storage, and the name of the user is added to the system's list of users. To add more users, you go through the same procedure again. When you are done, you can choose the next option in the scroll list, "Train Users". This procedure checks the list of users and then normalizes and trains the user models from a universal background model (UBM) trained with the Maximum Likelihood criterion. The individual models are trained to maximise the a posteriori probability that the claimed identity is the true identity given the data (MAP training). This procedure requires an already trained UBM. If you do not have one, you can choose the function "Train World", which takes your list of users (unless you have added other speakers to be included solely in the world model) and trains a UBM as a Gaussian mixture model (GMM) with a default of 512 components.
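The frame selection step in "New User" can be illustrated with a minimal sketch. It assumes that per-frame energies in dB are already available, that frames are spaced 10 ms apart, and that a frame counts as silent when its energy falls below a fixed threshold; the function name `frames_to_keep` and the -40 dB threshold are hypothetical and are not taken from the Praat scripts or from SPro. Only runs of silent frames lasting longer than 100 ms are discarded, so short pauses inside speech are kept.

```python
def frames_to_keep(frame_energy_db, threshold_db=-40.0, hop_ms=10.0, max_silence_ms=100.0):
    """Return one boolean per frame: True = keep, False = drop.

    Hypothetical helper: a frame is 'silent' if its energy is below
    threshold_db (an assumed fixed threshold). A run of consecutive silent
    frames is dropped only if it lasts longer than max_silence_ms.
    """
    silent = [energy < threshold_db for energy in frame_energy_db]
    keep = [True] * len(silent)
    max_frames = max_silence_ms / hop_ms  # 100 ms / 10 ms hop = 10 frames
    run_start = None
    # Append a non-silent sentinel so a silent run at the very end is closed too.
    for i, is_silent in enumerate(silent + [False]):
        if is_silent and run_start is None:
            run_start = i  # a silent run begins
        elif not is_silent and run_start is not None:
            if i - run_start > max_frames:  # run longer than 100 ms: drop it
                for j in range(run_start, i):
                    keep[j] = False
            run_start = None
    return keep


# Usage: 5 speech frames, 120 ms of silence, 3 speech frames, 50 ms of silence, 2 speech frames.
energies = [-20.0] * 5 + [-60.0] * 12 + [-20.0] * 3 + [-60.0] * 5 + [-20.0] * 2
mask = frames_to_keep(energies)
print(sum(mask), "of", len(mask), "frames kept")  # 15 of 27 frames kept
```

With a 10 ms hop, a silent run of exactly 10 frames (100 ms) is kept, while a run of 11 frames is removed.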
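MAP training of the user models can be sketched as the common relevance-factor MAP update of the UBM means. This sketch assumes diagonal covariances and adapts only the means; it illustrates the general technique and is not the Mistral implementation, and the relevance factor r = 16 as well as all function names are assumptions. Each component k collects a soft frame count n_k and a posterior-weighted data mean E_k from the enrollment frames, and the adapted mean is α_k E_k + (1 - α_k) μ_k with α_k = n_k / (n_k + r), so components that see little enrollment data stay close to the UBM.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def posteriors(x, weights, means, variances):
    """Responsibility of each UBM component for frame x (log-sum-exp for stability)."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(lp - top) for lp in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only relevance-MAP adaptation of a UBM to one speaker's frames (assumed r = 16)."""
    dim = len(means[0])
    counts = [0.0] * len(weights)          # n_k: soft frame counts per component
    sums = [[0.0] * dim for _ in weights]  # sum over t of gamma_k(t) * x_t
    for x in frames:
        for k, g in enumerate(posteriors(x, weights, means, variances)):
            counts[k] += g
            for d in range(dim):
                sums[k][d] += g * x[d]
    adapted = []
    for k, mu in enumerate(means):
        alpha = counts[k] / (counts[k] + relevance)  # data-dependent adaptation weight
        if counts[k] > 0.0:
            e_k = [s / counts[k] for s in sums[k]]   # posterior-weighted data mean
        else:
            e_k = mu                                 # no data: keep the UBM mean
        adapted.append([alpha * e + (1.0 - alpha) * m for e, m in zip(e_k, mu)])
    return adapted


# Usage: a toy 1-D, 2-component UBM and 16 enrollment frames near the second component.
ubm_weights = [0.5, 0.5]
ubm_means = [[-2.0], [2.0]]
ubm_vars = [[1.0], [1.0]]
speaker_means = map_adapt_means([[2.5]] * 16, ubm_weights, ubm_means, ubm_vars)
print(speaker_means)
```

In this toy case the 16 frames give the second component α ≈ 0.5, so its mean moves roughly halfway from 2 towards 2.5 (to about 2.25), while the first component receives almost no data and stays at about -2.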
The last option in the scroll list is "Recognise User", which tests the recording against all the models trained by the system. A list of raw (not normalised) log-likelihood ratio (LLR) scores gives feedback on how well the recording fits each of the models. In a commercial or fully fledged verification system, you would also have to test and decide on a threshold; as that is not the main purpose here, we only speculate on a possible threshold for this demo system.

Preliminary UBM performance test

To get a first impression of how well the implementation worked, a small pilot study was carried out using two different world models. For this purpose, 13 colleagues (4 female and 9 male) at the Department of Linguistics were recorded using a headset microphone. To enroll them as users, they read a short passage from a well-known text (a comic about a boy who ends up with his head in the mud). The recordings from the reading task were between 25 and 30 seconds long. Three of the speakers were later recorded with the same kind of headset to test the system, and one male and one female speaker were also recorded to serve as impostors. For the test utterances, the subjects were told to produce an utterance close to "Hej, jag heter X, jag skulle vilja komma in, ett två tre fyra fem." ("Hi, I am X, I would like to enter, one two three four five."). The tests were run twice. In the first test, only the enrolled speakers were used to train the UBM. In the second, the UBM was trained on excerpts from interviews with 109 young male speakers from the Swedia dialect database (Eriksson, 2004); the enrolled speakers were not included in this second world model.

Results and discussion

During the enrollment of speakers, some mistakes in the original scripts were discovered, such as the handling of clipping in recordings and the feedback given to the user while models were being trained.
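The raw LLR scores reported by "Recognise User" can be sketched as the average, over the frames of the test recording, of the log-likelihood under a speaker model minus the log-likelihood under the UBM. This is the usual GMM-UBM scoring and is assumed, not confirmed, to match what the Mistral tools compute. The toy models and function names below are hypothetical; each model is given as a (weights, means, variances) tuple with diagonal covariances.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_loglik(x, model):
    """log p(x | GMM) for one frame; model = (weights, means, variances)."""
    weights, means, variances = model
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(lp - top) for lp in logs))


def llr_score(frames, speaker_model, ubm):
    """Raw (unnormalised) LLR: mean over frames of log p(x|speaker) - log p(x|UBM)."""
    total = sum(gmm_loglik(x, speaker_model) - gmm_loglik(x, ubm) for x in frames)
    return total / len(frames)


# Usage with hypothetical toy models (1-D features, 2 components each).
ubm = ([0.5, 0.5], [[-2.0], [2.0]], [[1.0], [1.0]])
models = {
    "A": ([0.5, 0.5], [[-2.0], [2.25]], [[1.0], [1.0]]),  # shifted towards the test data
    "B": ([0.5, 0.5], [[-2.0], [1.5]], [[1.0], [1.0]]),   # shifted away from it
}
test_frames = [[2.5], [2.4], [2.6]]
scores = {name: llr_score(test_frames, model, ubm) for name, model in models.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(score, 3))
```

A positive score means the speaker model explains the recording better than the UBM does; a negative score means it fits worse. Because the scores are raw, they are not directly comparable across recordings without some form of score normalisation.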
The scripts were updated to take care of these issues, and afterwards enrollment was done without problems.

In the first test, only the intended target speakers were used to train a UBM before they were enrolled.

[Figure 1: bar chart "LLR Score Test 1 Speaker RA", showing the LLR (y-axis, -1 to 0.6) for test speaker RA_t against each enrolled model: RA (M), JA (M), PN (M), JV (F), KC (F), AE (M), EB (F), JL (M), TL (M), JaL (M), SS (M), UV (F), HV (M).]

Figure 1. Result for test 1 speaker RA against all enrolled models. Row 1 shows whether the model is male (M) or female (F), row 2 the model name and row 3 the test speaker.

In Figure 1 we can observe that speaker RA is correctly accepted, as the RA model is the only one with a positive LLR (0.44). The next highest score is for the model of speaker JA (-0.08).
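The reasoning behind accepting RA can be made explicit with a simple decision rule, assuming a threshold of 0 on the raw LLR; the demo system does not fix a threshold, so this value is an assumption. Only the two scores stated in the text are used, since the remaining eleven are not given numerically. The helper name `decide` is hypothetical.

```python
def decide(scores, threshold=0.0):
    """Hypothetical decision rule on raw LLR scores (assumed threshold of 0).

    Returns the models whose score exceeds the threshold (best first),
    the best-scoring model, and its margin over the runner-up.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    accepted = [name for name, score in ranked if score > threshold]
    best_name, best_score = ranked[0]
    margin = best_score - ranked[1][1] if len(ranked) > 1 else None
    return accepted, best_name, margin


# The two scores reported for test speaker RA in Figure 1.
scores = {"RA": 0.44, "JA": -0.08}
accepted, best, margin = decide(scores)
print(accepted, best, round(margin, 2))  # ['RA'] RA 0.52
```

With this threshold RA is the only accepted model, and the margin to the runner-up JA is 0.44 - (-0.08) = 0.52. Raising the threshold above 0.44 would reject RA as well, which is why the choice of threshold matters for a deployed system.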
