30.05.2013 Views

W10-09


more appropriately viewed as the purpose of traveling). The other effect clauses also appear to be incorrect. This should not come as much of a surprise because the ranking was generated solely from the match score between the input description and the causes in R_d, which are quite relevant.

One potential problem with the naïve selection method is that it ignores information contained in the ranked list R′_d of clauses that are associated with the clauses in R_d. In our experiments, we often observed redundancies in R′_d that captured general properties of the desired inference. Intuitively, content that is shared across elements of R′_d could represent the core meaning of the desired inference result.

In what follows, we describe various re-rankings of R′_d using this shared content. For each model described, the final inference prediction is the top-ranked element of R′_d.

Centroid similarity  To approximate the shared content of discourse units in R′_d, we treat each discourse unit as a vector of TF scores. We then compute the average vector and re-rank all discourse units in R′_d based on their cosine similarity with the average vector. This favors inference results that “agree” with many alternative hypotheses.
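As a rough illustration of the centroid similarity re-ranking, the following Python sketch builds raw term-frequency vectors, averages them, and sorts the discourse units by cosine similarity with the average. The whitespace tokenization, the lowercasing, and the function names (tf_vector, cosine, centroid_rerank) are assumptions made for this example; the text does not specify these details.

```python
import math
from collections import Counter


def tf_vector(text):
    """Raw term-frequency vector (assumed tokenization: lowercase + whitespace split)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as term -> weight mappings."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid_rerank(units):
    """Return (score, unit) pairs sorted by similarity to the average TF vector."""
    vectors = [tf_vector(u) for u in units]
    totals = Counter()
    for vec in vectors:
        totals.update(vec)  # sum term counts across all units
    centroid = {t: c / len(vectors) for t, c in totals.items()}
    scored = [(cosine(vec, centroid), unit) for vec, unit in zip(vectors, units)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Under these assumptions, a unit that shares vocabulary with many other units in the list rises to the top, while a unit with no overlapping terms sinks to the bottom.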

Description score scaling  In this approach, we incorporate the score from R_d into the centroid similarity score, multiplying the two and giving equal weight to each. This captures the intuition that the top-ranked element of R′_d should represent the general content of the list but should also be linked to an element of R_d that bears high similarity to the given state or event description d.
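A minimal sketch of description score scaling follows. It assumes each candidate arrives as a (unit, centroid_sim, description_score) tuple, where centroid_sim is the unit's cosine similarity with the average vector and description_score is the match score of the associated element of R_d. The tuple layout and the function name are hypothetical, chosen only for the example.

```python
def description_scaled_rerank(candidates):
    """Re-rank by the unweighted product of centroid similarity and description score.

    candidates: list of (unit, centroid_sim, description_score) tuples
    (a hypothetical layout assumed for this sketch).
    """
    scored = [(sim * desc, unit) for unit, sim, desc in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because the two factors are multiplied, a unit that is central to the list but linked to a poorly matching element of R_d can be outranked by a slightly less central unit with a strong description match.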

Log-length scaling  When working with the centroid similarity score, we often observed top-ranked elements of R′_d that were only a few words in length. This was typically the case when components from sparse TF vectors in R′_d matched well with components from the centroid vector. Ideally, we would like more lengthy (but not too long) descriptions. To achieve this, we multiplied the centroid similarity score by the logarithm of the word length of the discourse unit in R′_d.
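Log-length scaling can be sketched in a few lines. The base of the logarithm is not stated in the text, so the natural logarithm is assumed here, as are the whitespace word count and the function name.

```python
import math


def log_length_scaled(centroid_sim, unit):
    """Scale a centroid similarity score by the log of the unit's word count.

    Assumptions for this sketch: words are whitespace-separated and the
    logarithm is natural. Empty units score 0.0 to avoid log(0).
    """
    n_words = len(unit.split())
    if n_words == 0:
        return 0.0
    return centroid_sim * math.log(n_words)
```

Because the logarithm grows slowly, longer units are favored without letting very long units dominate. Under the natural-log assumption, a one-word unit scores zero regardless of its similarity.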


Description score/log-length scaling  In this approach, we combine the description score scaling and log-length scaling, multiplying the centroid similarity by both and giving equal weight to all three factors.
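The combined model can be sketched as a product of the three factors. As before, the (unit, centroid_sim, description_score) tuple layout, the natural logarithm, the whitespace word count, and the function names are assumptions made for this example rather than details given in the text.

```python
import math


def combined_score(centroid_sim, description_score, unit):
    """Unweighted product of centroid similarity, description score, and log word count."""
    n_words = len(unit.split())
    if n_words == 0:
        return 0.0  # avoid log(0) for empty units
    return centroid_sim * description_score * math.log(n_words)


def combined_rerank(candidates):
    """candidates: list of (unit, centroid_sim, description_score) tuples (hypothetical layout)."""
    scored = [(combined_score(sim, desc, unit), unit) for unit, sim, desc in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In this sketch, a short unit with high similarity and a strong description match can still lose to a longer unit with moderate scores, which mirrors the stated preference for lengthier descriptions.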

4.2 Evaluating the generated textual inferences

To evaluate the inference re-ranking models described above, we automatically generated forward/backward causal and temporal inferences for five documents (265 sentences) drawn randomly from the story corpus. For simplicity, we generated an inference for each sentence in each document. Each inference re-ranking model is able to generate four textual inferences (forward/backward causal/temporal) for each sentence. In our experiments, we only kept the highest-scoring of the four inferences generated by a model. One of the authors then manually evaluated the final predictions for correctness. This was a subjective process, but it was guided by the following requirements:

1. The generated inference must increase the local coherence of the document. As described by Graesser et al. (1994), readers are typically required to make inferences about the text that lead to a coherent understanding thereof. We required the generated inferences to aid in this task.

2. The generated inferences must be globally valid. To demonstrate global validity, consider the following actual output:

(4) I didn’t even need a jacket (until I got there).

In Example 4, the system-generated forward temporal inference is shown in parentheses. The inference makes sense given its local context; however, it is clear from the surrounding discourse (not shown) that a jacket was not needed at any point in time (it happened to be a warm day). As a result, this prediction was tagged as incorrect.

Table 2 presents the results of the evaluation. As shown in the table, the top-performing models are those that combine centroid similarity with one or both of the other re-ranking heuristics.
