Ontology-Based Information Extraction Systems (OBIES 2008)

in * " is frequent, then " * was * in * " is frequent as well). In order to avoid such too general patterns and at the same time avoiding too specific ones (e.g. "Wolfgang Amadeus * was born in * "), we introduce the following rule for removing more general patterns: if pattern a has all constraints also present in b and one more, b is removed unless SUPPORT(b) is at least 20% higher than SUPPORT(a). This rule is applied starting with the smallest patterns. We experimentally determined that the threshold of 20% leads to a generally rather appropriate set of patterns. The remaining unwanted patterns are left to be eliminated by further filtering.

Relation         Size    Dataset description                               Pmanual  Pclassic  ∆P FIM   ∆P FIMtuned
albumBy          19852   musicians and their musical works                 80.8%    27.4%     -11.6%   -18%
bornInYear       172696  persons and their year of birth                   40.7%    19.5%     +48.4%   +17%
currencyOf       221     countries and their official currency             46.4%    22.8%     -17.6%   +10.9%
headquarteredIn  14762   companies and the country of their headquarter    3%       9.8%      +2.2%    -5.2%
locatedIn        34047   cities and their corresponding country            73%      56.5%     -8.4%    -0.5%
productOf        2650    product names and their manufacturers             64.6%    42.2%     -0.9%    +12%
teamOf           8307    sportspersons and their team or country           30%      8.0%      +1.4%    +0.8%
average                                                                    48.3%    26.6%     +1.9%    +4.7%

Table 1. Relations with precision scores obtained by the classic system (manual evaluation) and differences (∆) measured with the two FIM conditions.

4 Experimental Evaluation

The goal of our experimental evaluation is to demonstrate the advantages of modeling the pattern abstraction subtask of iterative pattern induction as a frequent itemset mining (FIM) problem. We do so by comparing the performance achieved by our itemset-based implementation with the abstraction algorithm used in previous implementations (compare [3]). We do not intend to show the superiority of the approach based on Frequent Itemset Mining to those from the literature as this would require a common benchmark for large-scale Web Relation Extraction or at least a common basis of implementation.

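The rule for removing overly general patterns, stated just before Section 4, can be illustrated with a small sketch. This is not the authors' implementation. It assumes that a pattern is a set of hypothetical (position, token) constraints, that the support counts are made up for illustration, that "20% higher" means SUPPORT(b) >= 1.2 * SUPPORT(a), and that every pattern (including one that has already been removed) may act as the more specific pattern a.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Assumption: a constraint fixes one token at one position; a pattern is a set of constraints.
Constraint = Tuple[int, str]
Pattern = FrozenSet[Constraint]


def prune_general_patterns(support: Dict[Pattern, int],
                           min_gain: float = 0.20) -> Set[Pattern]:
    """Remove a general pattern b when some pattern a = b + one constraint
    has nearly the same support, i.e. SUPPORT(b) < (1 + min_gain) * SUPPORT(a)."""
    kept = set(support)
    # The text says the rule is applied starting with the smallest patterns.
    # Under the assumptions above the order does not change the result,
    # but the loop follows the description anyway.
    for b in sorted(support, key=len):
        for a in support:
            if len(a) == len(b) + 1 and b < a:  # a has all constraints of b and one more
                if support[b] < (1.0 + min_gain) * support[a]:
                    kept.discard(b)
                    break
    return kept


# Hypothetical patterns and support counts, for illustration only.
general = frozenset({(1, "was"), (3, "in")})
medium = frozenset({(1, "was"), (2, "born"), (3, "in")})
specific = frozenset({(0, "Amadeus"), (1, "was"), (2, "born"), (3, "in")})

support_counts = {general: 100, medium: 90, specific: 2}
kept_patterns = prune_general_patterns(support_counts)
```

In this toy input the most general pattern is dropped because its support (100) is less than 20% above that of the pattern with one more constraint (90), while the medium pattern survives because it covers far more occurrences than its specialization. The overly specific pattern is not touched by this rule; as the text says, such leftovers are handled by later filtering.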
Such a standard does not exist due to the diversity of applications and pattern representation formalisms in the literature. Yet, we evaluate our results on a fairly diverse set of non-taxonomic relations to ensure generality. The datasets we use have already been used in [3] and are provided for download by the authors. As in these experiments, we have also used the same 10 seeds selected by hand and the same automatic evaluation procedure.

4.1 Experimental Setup

In our experiments, we rely on the widely used precision and recall measures to evaluate our system's output with respect to the full extension of the relation². To give an objective measure for temporal performance, we use the Extraction Rate, that is, the number of correctly extracted tuples TP over the duration D of the extraction process in seconds (on a dual core machine with 4 GB of RAM):

    Ex = TP / D

² Note that this is different from the evaluation of other similar systems which calculate these measures with respect to a specific corpus, thus yielding higher scores. Also, due to the absence of a closed corpus in our Web scenario, our notion of recall is not comparable. We use "relative recall" in the sense that it reflects extractions compared to the highest yield count obtained over all experimental settings we applied.

Fig. 2. Precision, recall, F-measure and extraction rate for the individual configurations averaged over all relations (left); time (sec.) taken by a run of the classical induction algorithm (squares) and the FIM-based algorithm (circles) over the numbers of sample occurrences (right).

Figure 2 shows precision, recall and F-measure for three configurations of the system: the classic configuration, the FIM configuration, which uses the proposed modeling of the learning problem with all parameters unchanged, and FIM tuned, for which the parameters have been optimized for the new learning algorithm. In particular, as FIM is more efficient than the classic merge procedure, we can process a higher number of tuples, such that we set the number of occurrences downloaded to 200 (versus a decreasing number as used in [3]). All the other parameters of the algorithm have been chosen as described there. Overall, there is a small superiority of FIM over the classic version in terms of precision and recall (29% vs. 27% and 15% vs. 11%). Most importantly, there is a clear superiority in terms of extraction rate (0.19 vs. 0.05 occurrences/second). This difference is statistically significant (two-sided paired Student's t-test with an α-level of 0.05).

Table 1 shows the different relations together with the size of their extension, the precision yielded by a manual evaluation of a sample of 100 tuples of each relation (Pmanual), the precision yielded by the classic pattern induction algorithm (Pclassic), as well as the relative improvements yielded by our formulation of the problem as a frequent itemset mining (FIM) task relative to the precision Pclassic calculated automatically with respect to the relation's extension³. The best results for each relation are highlighted. In general, we see that while the results vary for each relation, overall the FIM version of the algorithm does not deteriorate the results, but even slightly improves them on average (+1.9% for the FIM version and +4.7% for the tuned FIM version).
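As a small illustration of the Extraction Rate Ex = TP / D defined in Section 4.1, the following sketch computes it for a single run. The function name and the numbers in the example are hypothetical and are not taken from the experiments reported here.

```python
def extraction_rate(true_positives: int, duration_seconds: float) -> float:
    """Extraction Rate Ex = TP / D: correctly extracted tuples per second."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return true_positives / duration_seconds


# Hypothetical run: 30 correct tuples extracted in 120 seconds.
example_rate = extraction_rate(30, 120.0)
```

Because the measure divides by wall-clock time, it is only comparable between runs carried out on the same hardware, which is why the machine configuration is stated alongside the definition.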
³ Note here that the precision Pclassic calculated automatically with respect to the datasets is much lower than the precision obtained through sampled manual evaluation (Pmanual). This is due to the incompleteness of the datasets, which is unavoidable in some cases, and to orthographic differences between test data and extraction results.

4.2 Discussion

In principle, there are no reasons for any of the abstraction algorithms to show better precision and recall because they both explore all possible frequently occurring patterns