W10-09
W10-09
W10-09
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Inference accuracy<br />
0.3<br />
0.25<br />
0.2<br />
0.15<br />
0.1<br />
0.05<br />
Re-rankingmodel Inferenceaccuracy(%)<br />
None 10.19<br />
Centroidsimilarity 12.83<br />
Descriptionscorescaling 17.36<br />
Log-lengthscaling 12.83<br />
Descriptionscore/log-lengthscaling 16.60<br />
0<br />
0.2<br />
0.05<br />
0.1<br />
0.15<br />
Table2:Inferencegenerationevaluationresults.<br />
0.4<br />
0.25<br />
0.3<br />
0.35<br />
0.8<br />
0.45<br />
0.5<br />
0.55<br />
0.6<br />
0.65<br />
0.7<br />
0.75<br />
Confidence-ordered percentage of all<br />
inferences<br />
0.85<br />
0.9<br />
0.95<br />
1<br />
None<br />
Centroid similarity<br />
Description score scaling<br />
Log-length scaling<br />
Combined scaling<br />
Figure1: Inferencerateversusaccuracy. Valuesalongthex-axisindicatethatthetop-scoringx%ofallinferences<br />
wereevaluated.Valuesalongthey-axisindicatethepredictionaccuracy.<br />
Theanalysisabovedemonstratestherelativeperformanceofthemodelswhenmakinginferencesfor<br />
allsentences; howeveritisprobably thecase that<br />
manygeneratedinferencesshouldberejecteddueto<br />
theirlowscore.Becausetheoutputscoresofasingle<br />
modelcanbemeaningfullycomparedacrosspredictions,itispossibletoimposeathresholdontheinferencegenerationprocesssuchthatanyprediction<br />
scoring atorbelowthethreshold iswithheld. We<br />
variedthepredictionthresholdfromzerotoavalue<br />
sufficientlylargethatitexcludedallpredictionsfor<br />
amodel. Doingsodemonstrates the trade-off betweenmakingalargenumberoftextualinferences<br />
and making accurate textual inferences. Figure 1<br />
showstheeffectsofthisvariableonthere-ranking<br />
models. Asshown in Figure 1, the highest inferenceaccuracyisreachedbythere-rankerthatcombinesdescriptionscoreandlog-lengthscalingwiththecentroidsimilaritymeasure.Thisaccuracyisat-<br />
48<br />
tainedbykeepingthetop25%mostconfidentinferences.<br />
5 Conclusions<br />
We have presented an approach to commonsense<br />
reasoningthatrelieson(1)theavailabilityofalarge<br />
corpusofpersonalweblogstoriesand(2)theabilitytoanalyzeandperforminferencewiththesestories.Ourcurrentresults,althoughpreliminary,suggestnovelandimportantareasoffutureexploration.<br />
Wegroupourobservationsaccordingtothelasttwo<br />
problemsidentifiedbyGordonandSwanson(2008):<br />
storyanalysisandenvisioningwiththeanalysisresults.<br />
5.1 Storyanalysis<br />
AsinotherNLPtasks,weobservedsignificantperformancedegradationwhenmovingfromthetraininggenre(newswire)tothetestinggenre(Internet