30.05.2013 Views

W10-09

W10-09

W10-09

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Inference accuracy<br />

0.3<br />

0.25<br />

0.2<br />

0.15<br />

0.1<br />

0.05<br />

Re-rankingmodel Inferenceaccuracy(%)<br />

None 10.19<br />

Centroidsimilarity 12.83<br />

Descriptionscorescaling 17.36<br />

Log-lengthscaling 12.83<br />

Descriptionscore/log-lengthscaling 16.60<br />

0<br />

0.2<br />

0.05<br />

0.1<br />

0.15<br />

Table2:Inferencegenerationevaluationresults.<br />

0.4<br />

0.25<br />

0.3<br />

0.35<br />

0.8<br />

0.45<br />

0.5<br />

0.55<br />

0.6<br />

0.65<br />

0.7<br />

0.75<br />

Confidence-ordered percentage of all<br />

inferences<br />

0.85<br />

0.9<br />

0.95<br />

1<br />

None<br />

Centroid similarity<br />

Description score scaling<br />

Log-length scaling<br />

Combined scaling<br />

Figure1: Inferencerateversusaccuracy. Valuesalongthex-axisindicatethatthetop-scoringx%ofallinferences<br />

wereevaluated.Valuesalongthey-axisindicatethepredictionaccuracy.<br />

Theanalysisabovedemonstratestherelativeperformanceofthemodelswhenmakinginferencesfor<br />

allsentences; howeveritisprobably thecase that<br />

manygeneratedinferencesshouldberejecteddueto<br />

theirlowscore.Becausetheoutputscoresofasingle<br />

modelcanbemeaningfullycomparedacrosspredictions,itispossibletoimposeathresholdontheinferencegenerationprocesssuchthatanyprediction<br />

scoring atorbelowthethreshold iswithheld. We<br />

variedthepredictionthresholdfromzerotoavalue<br />

sufficientlylargethatitexcludedallpredictionsfor<br />

amodel. Doingsodemonstrates the trade-off betweenmakingalargenumberoftextualinferences<br />

and making accurate textual inferences. Figure 1<br />

showstheeffectsofthisvariableonthere-ranking<br />

models. Asshown in Figure 1, the highest inferenceaccuracyisreachedbythere-rankerthatcombinesdescriptionscoreandlog-lengthscalingwiththecentroidsimilaritymeasure.Thisaccuracyisat-<br />

48<br />

tainedbykeepingthetop25%mostconfidentinferences.<br />

5 Conclusions<br />

We have presented an approach to commonsense<br />

reasoningthatrelieson(1)theavailabilityofalarge<br />

corpusofpersonalweblogstoriesand(2)theabilitytoanalyzeandperforminferencewiththesestories.Ourcurrentresults,althoughpreliminary,suggestnovelandimportantareasoffutureexploration.<br />

Wegroupourobservationsaccordingtothelasttwo<br />

problemsidentifiedbyGordonandSwanson(2008):<br />

storyanalysisandenvisioningwiththeanalysisresults.<br />

5.1 Storyanalysis<br />

AsinotherNLPtasks,weobservedsignificantperformancedegradationwhenmovingfromthetraininggenre(newswire)tothetestinggenre(Internet

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!