ontology-based information extraction systems (obies 2008)
Relation         Size    Dataset description                              Pmanual  Pclassic  ∆P FIM  ∆P FIM tuned
albumBy          19852   musicians and their musical works                80.8%    27.4%     -11.6%  -18%
bornInYear       172696  persons and their year of birth                  40.7%    19.5%     +48.4%  +17%
currencyOf       221     countries and their official currency            46.4%    22.8%     -17.6%  +10.9%
headquarteredIn  14762   companies and the country of their headquarters  3%       9.8%      +2.2%   -5.2%
locatedIn        34047   cities and their corresponding country           73%      56.5%     -8.4%   -0.5%
productOf        2650    product names and their manufacturers            64.6%    42.2%     -0.9%   +12%
teamOf           8307    sportspersons and their team or country          30%      8.0%      +1.4%   +0.8%
average                                                                   48.3%    26.6%     +1.9%   +4.7%

Table 1. Relations with precision scores obtained by the classic system (manual evaluation) and
differences (∆) measured with the two FIM conditions.

in * " is frequent, then " * was * in * " is frequent as well). In order to
avoid such overly general patterns and at the same time avoid overly specific ones (e.g.
"Wolfgang Amadeus * was born in * "), we introduce the following rule
for removing more general patterns: if pattern a has all constraints also present in b and
one more, b is removed unless SUPPORT(b) is at least 20% higher than SUPPORT(a).
This rule is applied starting with the smallest patterns. We experimentally determined
that the threshold of 20% leads to a generally rather appropriate set of patterns. The
remaining unwanted patterns are left to be eliminated by further filtering.
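
The pruning rule above can be illustrated with a minimal sketch. It rests on several assumptions that the text does not confirm: a pattern is modelled as a frozenset of constraint strings, "smallest" is read as "fewest constraints", "at least 20% higher" is read as SUPPORT(b) >= 1.2 * SUPPORT(a), and each pattern is compared against all original patterns rather than only the surviving ones. The function name `prune_general_patterns` and the support counts are hypothetical.

```python
def prune_general_patterns(support, min_gain=0.20):
    """Drop general patterns that gain too little support over a more specific one.

    support: dict mapping a pattern (frozenset of constraints) to its support count.
    Pattern b is dropped if some pattern a contains all constraints of b plus
    exactly one more and support[b] is less than (1 + min_gain) * support[a].
    """
    kept = dict(support)
    # Assumption: "smallest patterns first" means fewest constraints first.
    for b in sorted(support, key=len):
        for a in support:
            one_more_constraint = len(a) == len(b) + 1 and b < a
            if one_more_constraint and support[b] < (1 + min_gain) * support[a]:
                del kept[b]  # b is too general for the little support it adds
                break
    return kept


# Hypothetical support counts, for illustration only.
support = {
    frozenset({"was", "in"}): 110,
    frozenset({"was", "born", "in"}): 100,
    frozenset({"Wolfgang", "was", "born", "in"}): 2,
}
kept = prune_general_patterns(support)
```

In this toy input, the two-constraint pattern is dropped because its support is only 10% higher than that of its three-constraint specialization. The overly specific four-constraint pattern is not touched by the rule; in line with the text, it would have to be eliminated by further filtering.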
4 Experimental Evaluation
The goal of our experimental evaluation is to demonstrate the advantages of modeling
the pattern abstraction subtask of iterative pattern induction as a frequent itemset mining
(FIM) problem. We do so by comparing the performance achieved by our itemset-based
implementation with the abstraction algorithm used in previous implementations (compare [3]).
We do not intend to show the superiority of the approach based on Frequent
Itemset Mining to those from the literature as this would require a common benchmark
for large-scale Web Relation Extraction or at least a common basis of implementation.
Such a standard does not exist due to the diversity of applications and pattern
representation formalisms in the literature. Yet, we evaluate our results on a fairly diverse set
of non-taxonomic relations to ensure generality. The datasets we use have already been
used in [3] and are provided for download by the authors. As in these experiments, we
have also used the same 10 seeds selected by hand and the same automatic evaluation
procedure.
4.1 Experimental Setup
In our experiments, we rely on the widely used precision and recall measures to evaluate
our system's output with respect to the full extension of the relation². To give an

² Note that this is different from the evaluation of other similar systems which calculate these
measures with respect to a specific corpus, thus yielding higher scores. Also, due to the
absence of a closed corpus in our Web scenario, our notion of recall is not comparable. We
use "relative recall" in the sense that it reflects extractions compared to the highest yield count
obtained over all experimental settings we applied.
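
As a rough illustration of the two measures, the sketch below computes precision against the full extension of a relation and relative recall across experimental settings. It makes assumptions the text leaves open: "yield" is taken to be the number of correct extractions per setting, and the function names, setting labels and counts are hypothetical.

```python
def precision(extracted, full_extension):
    """Fraction of extracted tuples that are in the relation's full extension."""
    extracted = set(extracted)
    if not extracted:
        return 0.0
    return len(extracted & set(full_extension)) / len(extracted)


def relative_recall(yield_by_setting):
    """Scale each setting's yield by the highest yield over all settings.

    yield_by_setting: dict mapping a setting name to its yield count
    (assumed here to be the number of correct extractions).
    """
    best = max(yield_by_setting.values(), default=0)
    if best == 0:
        return {name: 0.0 for name in yield_by_setting}
    return {name: count / best for name, count in yield_by_setting.items()}


# Hypothetical data, for illustration only.
gold = {("Berlin", "Germany"), ("Paris", "France")}
output = [("Berlin", "Germany"), ("Paris", "Spain")]
p = precision(output, gold)
rr = relative_recall({"classic": 50, "FIM": 100})
```

Because the best setting always receives a relative recall of 1.0, the values are only comparable between settings of the same experiment, not with corpus-based recall figures reported elsewhere.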