25.02.2013 Views

A Framework for Evaluating Early-Stage Human - of Marcus Hutter


whether the original utterance contained a target word, and then trained several classifiers on the labeled data. They used these classifiers to classify utterances based on whether they contained the target word. This technique achieved moderate success, but the dataset was small, and it does not produce word boundaries, which is the goal of this work.

This work makes use of the Voting Experts (VE) algorithm. VE was designed to do with discrete token sequences exactly what we are trying to do with real audio. That is, given a large time series, specify all of the logical breaks so as to segment the series into categorical episodes. The major contribution of this paper lies in transforming an audio signal so that the VE model can be applied to it.

Overview of Voting Experts

The VE algorithm is based on the hypothesis that natural breaks in a sequence are usually accompanied by two information theoretic signatures (Cohen, Adams, & Heeringa 2007) (Shannon 1951). These are low internal entropy of chunks, and high boundary entropy between chunks. A chunk can be thought of as a sequence of related tokens. For instance, if we are segmenting text, then the letters can be grouped into chunks that represent the words.

Internal entropy can be understood as the surprise associated with seeing the group of objects together. More specifically, it is the negative log of the probability of those objects being found together. Given a short sequence of tokens taken from a longer time series, the internal entropy of the short sequence is the negative log of the probability of finding that sequence in the longer time series. So the higher the probability of a chunk, the lower its internal entropy.
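As a minimal sketch of this definition, the snippet below estimates the probability of a chunk as its relative frequency among all subsequences of the same length. That estimator, and the function names `ngram_counts` and `internal_entropy`, are assumptions made for illustration; they are not taken from the VE implementation.

```python
import math
from collections import Counter


def ngram_counts(tokens, length):
    """Count every contiguous subsequence of the given length."""
    return Counter(
        tuple(tokens[i:i + length]) for i in range(len(tokens) - length + 1)
    )


def internal_entropy(chunk, tokens):
    """Negative log of the relative frequency of `chunk` in `tokens`.

    Assumption: probability is estimated as count(chunk) divided by the
    number of subsequences of the same length.
    """
    counts = ngram_counts(tokens, len(chunk))
    total = sum(counts.values())
    count = counts[tuple(chunk)]
    if count == 0:
        return math.inf  # an unseen chunk is maximally surprising
    return -math.log(count / total)
```

For example, in the sequence `abab` the chunk `ab` occurs in two of the three length-2 positions, so it has lower internal entropy than `ba`, which occurs once.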

Boundary entropy is the uncertainty at the boundary of a chunk. Given a sequence of tokens, the boundary entropy is the expected information gain of being told the next token in the time series. This is calculated as $H_I(c) = -\sum_{h=1}^{m} P(h,c)\log(P(h,c))$, where $c$ is this given sequence of tokens, $P(h,c)$ is the conditional probability of symbol $h$ following $c$, and $m$ is the number of tokens in the alphabet. Well-formed chunks are groups of tokens that are found together in many different circumstances, so they are somewhat unrelated to the surrounding elements. This means that, given a subsequence, there is no particular token that is very likely to follow that subsequence.
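The boundary entropy formula can be illustrated with a short sketch. It estimates the conditional probability of each symbol following a context by counting followers in the sequence. The function name `boundary_entropy` and the counting estimator are illustrative assumptions, and the natural logarithm is used because the text does not fix a base.

```python
import math
from collections import Counter


def boundary_entropy(context, tokens):
    """Entropy of the distribution over tokens that follow `context`.

    Assumption: P(h, c) is estimated as the fraction of occurrences of
    the context c that are immediately followed by the symbol h.
    """
    context = tuple(context)
    n = len(context)
    followers = Counter(
        tokens[i + n]
        for i in range(len(tokens) - n)
        if tuple(tokens[i:i + n]) == context
    )
    total = sum(followers.values())
    if total == 0:
        return 0.0  # context never seen with a follower
    return -sum((k / total) * math.log(k / total) for k in followers.values())
```

In the toy stream `thecatthedogthepig`, the context `the` is followed by three different letters, so its boundary entropy is high, while `th` is always followed by `e`, so its boundary entropy is zero.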

In order to segment a discrete time series, VE preprocesses the time series to build an n-gram trie, which represents all its possible subsequences of length less than or equal to n. It then passes a sliding window of length n over the series. At each window location, two "experts" vote on how they would break the contents of the window. One expert votes to minimize the internal entropy of the induced chunks, and the other votes to maximize the entropy at the break. The experts use the trie to make these calculations. After all the votes have been cast, the sequence is broken at the "peaks", that is, the locations that received more votes than their neighbors. This algorithm can be run in linear time with respect to the length of the sequence, and can be used to segment very long sequences. For further details, see the journal article (Cohen, Adams, & Heeringa 2007).
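The voting procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published implementation: a dictionary of n-gram counts stands in for the trie, the entropies are not standardized across n-gram lengths, only breaks strictly inside the window are considered, and no effort is made to reach linear running time. All function names are hypothetical.

```python
import math
from collections import Counter


def build_counts(tokens, n):
    """Count all subsequences of length 1..n (a stand-in for the trie)."""
    counts = Counter()
    for length in range(1, n + 1):
        for i in range(len(tokens) - length + 1):
            counts[tuple(tokens[i:i + length])] += 1
    return counts


def internal_entropy(seq, counts, total_tokens):
    """Negative log relative frequency of seq among same-length n-grams."""
    positions = total_tokens - len(seq) + 1
    return -math.log(counts[seq] / positions)


def boundary_entropy(seq, counts, alphabet):
    """Entropy of the symbol that follows seq, estimated from counts."""
    follow = [counts[seq + (h,)] for h in alphabet]
    total = sum(follow)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log(k / total) for k in follow if k)


def vote_breaks(tokens, n=3):
    """Return break positions (a break at j falls before tokens[j]); n >= 2."""
    tokens = list(tokens)
    counts = build_counts(tokens, n)
    alphabet = sorted(set(tokens))
    votes = [0] * (len(tokens) + 1)
    for start in range(len(tokens) - n + 1):
        window = tuple(tokens[start:start + n])
        splits = range(1, n)  # candidate breaks strictly inside the window
        # Expert 1: the split whose two parts have the lowest internal entropy.
        by_internal = min(
            splits,
            key=lambda k: internal_entropy(window[:k], counts, len(tokens))
            + internal_entropy(window[k:], counts, len(tokens)),
        )
        # Expert 2: the split with the highest boundary entropy before it.
        by_boundary = max(
            splits, key=lambda k: boundary_entropy(window[:k], counts, alphabet)
        )
        votes[start + by_internal] += 1
        votes[start + by_boundary] += 1
    # Break at peaks: positions with more votes than both neighbors.
    return [
        j for j in range(1, len(tokens)) if votes[j - 1] < votes[j] > votes[j + 1]
    ]


def split_at(tokens, breaks):
    """Cut tokens into consecutive pieces at the given break positions."""
    bounds = [0] + list(breaks) + [len(tokens)]
    return [tokens[a:b] for a, b in zip(bounds, bounds[1:])]
```

Calling `split_at(text, vote_breaks(text, n=3))` on a string returns its induced chunks. Because peaks are strict local maxima, two breaks are never adjacent, so every chunk is non-empty.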

It is important to emphasize the VE model over the actual implementation of VE. The goal of our work is to segment audio speech based on these information theoretic markers, and to evaluate how well they work for this task. In order to do this, we use a particular implementation of Voting Experts, and transform the audio data into a format it can use. This is not necessarily the best way to apply this model to audio segmentation. But it is one way to use this model to segment audio speech.

The model of segmenting based on low internal entropy and high boundary entropy is also closely related to the work in psychology mentioned above (Saffran et al. 1999). Specifically, they suggest that humans segment audio streams based on conditional probability. That is, given two phonemes A and B, we conclude that AB is part of a word if the conditional probability of B occurring after A is high. Similarly, we conclude that AB is not part of a word if the conditional probability of B given A is low. The information theoretic markers of VE are simply a more sophisticated characterization of exactly this idea. Internal entropy is directly related to the conditional probability inside of words. And boundary entropy is directly related to the conditional probability between words. So we would like to be able to use VE to segment audio speech, both to test this hypothesis and to possibly facilitate natural language learning.
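The conditional-probability idea can be made concrete with a small sketch that estimates P(B | A) for adjacent tokens from bigram counts. The function name `transitional_probabilities` and the toy syllable stream are illustrative assumptions; they are not data from the cited study.

```python
from collections import Counter


def transitional_probabilities(tokens):
    """Estimate P(B | A) as count(A followed by B) / count(A with a follower)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    first_counts = Counter(tokens[:-1])
    return {
        (a, b): count / first_counts[a] for (a, b), count in pair_counts.items()
    }


# Hypothetical syllable stream: "pre" is always followed by "tty", but
# "tty" is followed by several different syllables.
stream = ["pre", "tty", "ba", "by", "pre", "tty", "dog", "gy",
          "pre", "tty", "ba", "by"]
probabilities = transitional_probabilities(stream)
```

Here the pair `("pre", "tty")` has probability 1, suggesting it lies inside a word, while `("tty", "dog")` has probability 1/3, suggesting a word boundary between the two syllables.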

Experimental Procedure

Our procedure can be broken down into three steps. 1) Temporally discretize the audio sequence while retaining the relevant information. 2) Tokenize the discrete sequence. 3) Apply VE to the tokenized sequence to obtain the logical breaks. These three steps are described in detail below, and illustrated in Figure 2.

Figure 1: A voiceprint of the first few seconds of one of our audio datasets. The vertical axis represents 33 frequency bins and the horizontal axis represents time. The intensity of each frequency is represented by the color. Each vertical line of pixels then represents a spectrogram calculated over a short Hamming window at a specific point in time.

Step 1

In order to discretize the sequence, we used the discrete Fourier transform in the Sphinx software package to obtain the spectrogram information (Walker et al. 2004). We also took advantage of the raised cosine windower and the pre-emphasizer in Sphinx. The audio stream was windowed into 26.6 ms wide segments called Hamming windows, taken every 10 ms (i.e. the windows were overlapping). The windower also applied a transformation on the window to emphasize the central samples and de-emphasize those on the edge. Then the pre-emphasizer normalized the volume across the frequency spectrum. This compensates for the natural attenuation (decrease in intensity) of sound as the frequency is increased.
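The windowing described above can be sketched in plain Python. This is not the Sphinx front end: the sample rate, the pre-emphasis factor of 0.97, the Hamming coefficients 0.54 and 0.46, the naive DFT, and the choice to expose pre-emphasis as a separate helper applied to raw samples are all assumptions made for illustration. Only the 26.6 ms window width and the 10 ms step come from the text.

```python
import cmath
import math


def pre_emphasize(samples, factor=0.97):
    """First-order high-pass filter; the factor is an assumed value."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[t] - factor * samples[t - 1] for t in range(1, len(samples))
    ]


def hamming(length):
    """Raised cosine window: weights near 1 at the center, 0.08 at the edges."""
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * k / (length - 1))
        for k in range(length)
    ]


def frames(samples, sample_rate, width_s=0.0266, step_s=0.010):
    """Cut samples into overlapping windowed frames (26.6 ms every 10 ms)."""
    width = round(width_s * sample_rate)
    step = round(step_s * sample_rate)
    window = hamming(width)
    return [
        [s * w for s, w in zip(samples[start:start + width], window)]
        for start in range(0, len(samples) - width + 1, step)
    ]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
```

Applying `magnitude_spectrum` to each frame returned by `frames` gives one column of a spectrogram per 10 ms step. The naive DFT is quadratic in the frame length, so a real pipeline would use an FFT, and this sketch makes no attempt to reproduce the 33 frequency bins shown in Figure 1.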
