A Framework for Evaluating Early-Stage Human - of Marcus Hutter
whether the original utterance contained a target word, and then trained several classifiers on the labeled data. They used these classifiers to classify utterances based on whether they contained the target word. This technique achieved moderate success, but the dataset was small, and it does not produce word boundaries, which is the goal of this work.
This work makes use of the Voting Experts (VE) algorithm. VE was designed to do with discrete token sequences exactly what we are trying to do with real audio. That is, given a large time series, specify all of the logical breaks so as to segment the series into categorical episodes. The major contribution of this paper lies in transforming an audio signal so that the VE model can be applied to it.
Overview of Voting Experts
The VE algorithm is based on the hypothesis that natural breaks in a sequence are usually accompanied by two information-theoretic signatures (Cohen, Adams, & Heeringa 2007) (Shannon 1951). These are low internal entropy of chunks and high boundary entropy between chunks. A chunk can be thought of as a sequence of related tokens. For instance, if we are segmenting text, then the letters can be grouped into chunks that represent the words.
Internal entropy can be understood as the surprise associated with seeing the group of objects together. More specifically, it is the negative log of the probability of those objects being found together. Given a short sequence of tokens taken from a longer time series, the internal entropy of the short sequence is the negative log of the probability of finding that sequence in the longer time series. So the higher the probability of a chunk, the lower its internal entropy.
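The definition above can be illustrated with a minimal sketch. It assumes the probability of a chunk is estimated as its relative frequency among all windows of the same length in the series; the function names `ngram_counts` and `internal_entropy` are hypothetical and are not taken from the VE implementation, which may estimate or normalize these quantities differently.

```python
import math
from collections import Counter

def ngram_counts(series, n):
    """Count every subsequence of length 1..n that occurs in the series."""
    counts = Counter()
    for length in range(1, n + 1):
        for i in range(len(series) - length + 1):
            counts[tuple(series[i:i + length])] += 1
    return counts

def internal_entropy(chunk, series, counts):
    """Negative log of the chunk's relative frequency (an assumed estimator)."""
    chunk = tuple(chunk)
    total = len(series) - len(chunk) + 1  # number of windows of this length
    freq = counts[chunk]
    if freq == 0:
        return math.inf  # a chunk that never occurs is maximally surprising
    return -math.log(freq / total)

series = "abcabcabd"
counts = ngram_counts(series, 3)
frequent = internal_entropy("abc", series, counts)  # "abc" occurs twice
rare = internal_entropy("abd", series, counts)      # "abd" occurs once
```

In this toy series the more frequent chunk `abc` receives a lower internal entropy than `abd`, which matches the statement that higher probability means lower internal entropy.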
Boundary entropy is the uncertainty at the boundary of a chunk. Given a sequence of tokens, the boundary entropy is the expected information gain of being told the next token in the time series. This is calculated as $H_I(c) = -\sum_{h=1}^{m} P(h,c)\log(P(h,c))$, where $c$ is the given sequence of tokens, $P(h,c)$ is the conditional probability of symbol $h$ following $c$, and $m$ is the number of tokens in the alphabet.
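As a rough illustration of this formula, the sketch below estimates $P(h,c)$ from follower counts in a toy series and sums over the symbols that actually follow the context. The helper names `successor_counts` and `boundary_entropy` are hypothetical, and the natural logarithm is an assumption, since the text does not state the base.

```python
import math
from collections import Counter, defaultdict

def successor_counts(series, context_len):
    """Map each context of length context_len to counts of the tokens that follow it."""
    followers = defaultdict(Counter)
    for i in range(len(series) - context_len):
        context = tuple(series[i:i + context_len])
        followers[context][series[i + context_len]] += 1
    return followers

def boundary_entropy(context, followers):
    """Entropy of the empirical next-token distribution after the context."""
    counts = followers.get(tuple(context), Counter())
    total = sum(counts.values())
    # Symbols that never follow the context contribute nothing to the sum.
    return -sum((k / total) * math.log(k / total) for k in counts.values())

series = "abacabad"
followers = successor_counts(series, 1)
after_a = boundary_entropy("a", followers)  # "a" is followed by b, c, b, d
after_b = boundary_entropy("b", followers)  # "b" is always followed by "a"
```

Here the context `a` has a high boundary entropy because several different tokens follow it, while `b` has zero boundary entropy because its successor is always the same.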
Well-formed chunks are groups of tokens that are found together in many different circumstances, so they are somewhat unrelated to the surrounding elements. This means that, given a subsequence, there is no particular token that is very likely to follow that subsequence.
In order to segment a discrete time series, VE preprocesses the time series to build an n-gram trie, which represents all its possible subsequences of length less than or equal to n. It then passes a sliding window of length n over the series. At each window location, two "experts" vote on how they would break the contents of the window. One expert votes to minimize the internal entropy of the induced chunks, and the other votes to maximize the entropy at the break. The experts use the trie to make these calculations. After all the votes have been cast, the sequence is broken at the "peaks", that is, locations that received more votes than their neighbors. This algorithm can be run in linear time with respect to the length of the sequence, and can be used to segment very long sequences. For further details, see the journal article (Cohen, Adams, & Heeringa 2007).
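The voting procedure described above can be sketched in simplified form. This is not the published implementation: it assumes a plain dictionary of counts in place of the trie, raw (unstandardized) entropies, one vote per expert per window, and no vote threshold. All function names are hypothetical.

```python
import math
from collections import Counter

def build_counts(series, n):
    """Count subsequences of length 1..n+1 (a dict stands in for the n-gram trie)."""
    counts = Counter()
    for length in range(1, n + 2):
        for i in range(len(series) - length + 1):
            counts[series[i:i + length]] += 1
    return counts

def internal_entropy(chunk, counts, series_len):
    """Negative log relative frequency of a chunk taken from the series."""
    return -math.log(counts[chunk] / (series_len - len(chunk) + 1))

def boundary_entropy(context, counts, alphabet):
    """Entropy of the empirical next-token distribution after the context."""
    followers = [counts[context + (h,)] for h in alphabet]
    total = sum(followers)
    return -sum((k / total) * math.log(k / total) for k in followers if k > 0)

def voting_experts(series, n=3):
    series = tuple(series)
    counts = build_counts(series, n)
    alphabet = sorted(set(series))
    votes = [0] * (len(series) + 1)  # votes[i] counts votes for a break before token i
    splits = range(1, n)             # candidate break positions inside a window
    for start in range(len(series) - n + 1):
        window = series[start:start + n]
        # Expert 1: the split whose two parts have the lowest total internal entropy.
        k_int = min(splits, key=lambda k: internal_entropy(window[:k], counts, len(series))
                    + internal_entropy(window[k:], counts, len(series)))
        # Expert 2: the split after which the next token is most uncertain.
        k_bnd = max(splits, key=lambda k: boundary_entropy(window[:k], counts, alphabet))
        votes[start + k_int] += 1
        votes[start + k_bnd] += 1
    # Break at peaks: locations with strictly more votes than both neighbors.
    breaks = [i for i in range(1, len(series))
              if votes[i] > votes[i - 1] and votes[i] > votes[i + 1]]
    return votes, breaks

def segment(series, breaks):
    """Cut the series at the given break positions."""
    pieces, prev = [], 0
    for b in breaks + [len(series)]:
        pieces.append(series[prev:b])
        prev = b
    return pieces

text = "thecatsatonthematthecatatetherat"
votes, breaks = voting_experts(text, n=3)
pieces = segment(text, breaks)
```

On a series this short the induced chunks are not expected to match words; the sketch is only meant to show how the two experts cast votes and how peaks in the vote counts become breaks.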
It is important to emphasize the VE model over the actual implementation of VE. The goal of our work is to segment audio speech based on these information-theoretic markers, and to evaluate how well they work for this task. In order to do this, we use a particular implementation of Voting Experts, and transform the audio data into a format it can use. This is not necessarily the best way to apply this model to audio segmentation, but it is one way to use this model to segment audio speech.
The model of segmenting based on low internal entropy and high boundary entropy is also closely related to the work in psychology mentioned above (Saffran et al. 1999). Specifically, they suggest that humans segment audio streams based on conditional probability. That is, given two phonemes A and B, we conclude that AB is part of a word if the conditional probability of B occurring after A is high. Similarly, we conclude that AB is not part of a word if the conditional probability of B given A is low. The information-theoretic markers of VE are simply a more sophisticated characterization of exactly this idea. Internal entropy is directly related to the conditional probability inside of words, and boundary entropy is directly related to the conditional probability between words. So we would like to be able to use VE to segment audio speech, both to test this hypothesis and to possibly facilitate natural language learning.
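The conditional probability of B given A can be estimated from adjacent-pair counts, as in the minimal sketch below. The token stream is a made-up toy example rather than data from the cited study, and `transition_probabilities` is a hypothetical helper name.

```python
from collections import Counter

def transition_probabilities(tokens):
    """Estimate P(B | A) for every adjacent pair (A, B) from simple counts."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    first_counts = Counter(tokens[:-1])  # how often each token has a successor
    return {(a, b): c / first_counts[a] for (a, b), c in pair_counts.items()}

tokens = list("abacab")
probs = transition_probabilities(tokens)
```

In this toy stream `b` is always followed by `a`, so the pair `ba` would be judged likely to sit inside a word, whereas `a` is followed by `c` only a third of the time, which makes `ac` a more plausible place for a boundary.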
Experimental Procedure
Our procedure can be broken down into three steps: 1) Temporally discretize the audio sequence while retaining the relevant information. 2) Tokenize the discrete sequence. 3) Apply VE to the tokenized sequence to obtain the logical breaks. These three steps are described in detail below, and illustrated in Figure 2.
Figure 1: A voiceprint of the first few seconds of one of our audio datasets. The vertical axis represents 33 frequency bins and the horizontal axis represents time. The intensity of each frequency is represented by the color. Each vertical line of pixels then represents a spectrogram calculated over a short Hamming window at a specific point in time.
Step 1
In order to discretize the sequence, we used the discrete Fourier transform in the Sphinx software package to obtain the spectrogram information (Walker et al. 2004). We also took advantage of the raised cosine windower and the pre-emphasizer in Sphinx. The audio stream was windowed into 26.6 ms wide segments called Hamming windows, taken every 10 ms (i.e., the windows were overlapping). The windower also applied a transformation on the window to emphasize the central samples and de-emphasize those on the edge. Then the pre-emphasizer normalized the volume across the frequency spectrum. This compensates for the natural attenuation (decrease in intensity) of sound as the frequency is increased.
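The processing in this step can be approximated without Sphinx, as in the sketch below. It is not the Sphinx front end: the pre-emphasis coefficient of 0.97, the choice to apply pre-emphasis before framing, the naive DFT, and the 1 kHz sample rate in the usage lines are all assumptions made for illustration, and every function name is hypothetical. Only the 26.6 ms window width and 10 ms step come from the text.

```python
import cmath
import math

def pre_emphasize(samples, alpha=0.97):
    """First-order high-pass filter y[t] = x[t] - alpha * x[t-1] (alpha is assumed)."""
    return [samples[0]] + [samples[t] - alpha * samples[t - 1]
                           for t in range(1, len(samples))]

def hamming(length):
    """Raised-cosine (Hamming) coefficients: largest in the center, small at the edges."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]

def frames(samples, sample_rate, width_ms=26.6, step_ms=10.0):
    """Yield overlapping, Hamming-weighted frames of the signal."""
    width = int(round(sample_rate * width_ms / 1000))
    step = int(round(sample_rate * step_ms / 1000))
    window = hamming(width)
    for start in range(0, len(samples) - width + 1, step):
        yield [s * w for s, w in zip(samples[start:start + width], window)]

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins (naive O(N^2) transform)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, sample_rate):
    """One list of bin magnitudes per frame, i.e. one column of the voiceprint."""
    return [dft_magnitudes(f) for f in frames(pre_emphasize(samples), sample_rate)]

# Usage on a synthetic 100 Hz tone sampled at a hypothetical 1 kHz.
rate = 1000
tone = [math.sin(2 * math.pi * 100 * t / rate) for t in range(100)]
spec = spectrogram(tone, rate)
```

Each entry of `spec` corresponds to one vertical line of a voiceprint. A real front end would use a fast Fourier transform and typically reduces the raw bins to a smaller number of frequency bands, which this sketch does not attempt.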