W10-09
more appropriately viewed as the purpose of traveling). The other effect clauses also appear to be incorrect. This should not come as much of a surprise because the ranking was generated solely from the match score between the input description and the causes in R_d, which are quite relevant.
One potential problem with the naïve selection method is that it ignores information contained in the ranked list R′_d of clauses that are associated with the clauses in R_d. In our experiments, we often observed redundancies in R′_d that captured general properties of the desired inference. Intuitively, content that is shared across elements of R′_d could represent the core meaning of the desired inference result.
In what follows, we describe various re-rankings of R′_d using this shared content. For each model described, the final inference prediction is the top-ranked element of R′_d.
Centroid similarity: To approximate the shared content of discourse units in R′_d, we treat each discourse unit as a vector of TF scores. We then compute the average vector and re-rank all discourse units in R′_d based on their cosine similarity with the average vector. This favors inference results that “agree” with many alternative hypotheses.
Description score scaling: In this approach, we incorporate the score from R_d into the centroid similarity score, multiplying the two and giving equal weight to each. This captures the intuition that the top-ranked element of R′_d should represent the general content of the list but should also be linked to an element of R_d that bears high similarity to the given state or event description d.
Log-length scaling: When working with the centroid similarity score, we often observed top-ranked elements of R′_d that were only a few words in length. This was typically the case when components from sparse TF vectors in R′_d matched well with components from the centroid vector. Ideally, we would like more lengthy (but not too long) descriptions. To achieve this, we multiplied the centroid similarity score by the logarithm of the word length of the discourse unit in R′_d.
Description score/log-length scaling: In this approach, we combine the description score scaling and log-length scaling, multiplying the centroid similarity by both and giving equal weight to all three factors.
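The four re-ranking models above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the model-name strings, whitespace tokenization, raw term counts as TF scores, and the natural logarithm are all assumptions made for illustration. "Equal weight" is read here as a plain, unweighted product of the factors. Each candidate is a pair of a discourse unit from R′_d and the description score of the element of R_d it is linked to.

```python
import math
from collections import Counter

# Hypothetical model names, used only in this sketch.
MODELS = ("centroid", "description", "log-length", "description+log-length")


def tf_vector(text):
    """Raw term-frequency vector of one discourse unit (assumes whitespace tokens)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(candidates, model="centroid"):
    """Re-rank (text, description_score) pairs, best first.

    description_score stands in for the match score of the linked element of R_d.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if not candidates:
        return []
    vectors = [tf_vector(text) for text, _ in candidates]
    # Average vector (centroid) of all discourse units in the list.
    total = Counter()
    for vec in vectors:
        total.update(vec)
    center = {term: count / len(vectors) for term, count in total.items()}

    scored = []
    for (text, desc_score), vec in zip(candidates, vectors):
        score = cosine(vec, center)
        if "description" in model:
            score *= desc_score
        if "log-length" in model:
            # Natural log is an assumption; a one-word unit scores 0.
            n_words = len(text.split())
            score *= math.log(n_words) if n_words > 0 else 0.0
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Made-up candidates for illustration only.
candidates = [
    ("we went home", 0.2),
    ("we went home after the game", 0.1),
    ("we went to the game", 0.9),
]
for model in MODELS:
    print(model, "->", rerank(candidates, model)[0])
```

In this toy list the longest unit is closest to the centroid, but it has a low description score, so the two models that include description score scaling promote a different unit to the top.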
4.2 Evaluating the generated textual inferences
To evaluate the inference re-ranking models described above, we automatically generated forward/backward causal and temporal inferences for five documents (265 sentences) drawn randomly from the story corpus. For simplicity, we generated an inference for each sentence in each document. Each inference re-ranking model is able to generate four textual inferences (forward/backward causal/temporal) for each sentence. In our experiments, we only kept the highest-scoring of the four inferences generated by a model. One of the authors then manually evaluated the final predictions for correctness. This was a subjective process, but it was guided by the following requirements:
1. The generated inference must increase the local coherence of the document. As described by Graesser et al. (1994), readers are typically required to make inferences about the text that lead to a coherent understanding thereof. We required the generated inferences to aid in this task.
2. The generated inferences must be globally valid. To demonstrate global validity, consider the following actual output:
(4) I didn’t even need a jacket (until I got there).
In Example 4, the system-generated forward temporal inference is shown in parentheses. The inference makes sense given its local context; however, it is clear from the surrounding discourse (not shown) that a jacket was not needed at any point in time (it happened to be a warm day). As a result, this prediction was tagged as incorrect.
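The selection step described before the requirements, in which only the highest-scoring of a model's four inferences is kept for each sentence, can be sketched as follows. This is a hedged illustration rather than the authors' code: the function name, the dictionary layout, and the sample texts and scores are hypothetical.

```python
def best_inference(scored_inferences):
    """Return (inference_type, text) for the highest-scoring inference.

    scored_inferences maps an inference type to a (text, score) pair,
    one entry per forward/backward causal/temporal inference.
    """
    kind, (text, _score) = max(
        scored_inferences.items(), key=lambda item: item[1][1]
    )
    return kind, text


# Made-up inferences and scores for a single sentence.
four = {
    "forward causal": ("inference a", 0.2),
    "backward causal": ("inference b", 0.7),
    "forward temporal": ("inference c", 0.5),
    "backward temporal": ("inference d", 0.1),
}
print(best_inference(four))
```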
Table 2 presents the results of the evaluation. As shown in the table, the top-performing models are those that combine centroid similarity with one or both of the other re-ranking heuristics.