20.11.2012 Views

ontology-based information extraction systems (obies 2008)


Relation         Size    Dataset description                              P_manual  P_classic  ∆P_FIM   ∆P_FIM tuned
albumBy          19852   musicians and their musical works                80.8%     27.4%      -11.6%   -18%
bornInYear       172696  persons and their year of birth                  40.7%     19.5%      +48.4%   +17%
currencyOf       221     countries and their official currency            46.4%     22.8%      -17.6%   +10.9%
headquarteredIn  14762   companies and the country of their headquarters  3%        9.8%       +2.2%    -5.2%
locatedIn        34047   cities and their corresponding country           73%       56.5%      -8.4%    -0.5%
productOf        2650    product names and their manufacturers            64.6%     42.2%      -0.9%    +12%
teamOf           8307    sportspersons and their team or country          30%       8.0%       +1.4%    +0.8%
average                                                                   48.3%     26.6%      +1.9%    +4.7%

Table 1. Relations with precision scores obtained by the classic system (manual evaluation) and differences (∆) measured with the two FIM conditions.

in * " is frequent, then " * was * in * " is frequent as well). To avoid such overly general patterns while at the same time avoiding overly specific ones (e.g. "Wolfgang Amadeus * was born in * "), we introduce the following rule for removing more general patterns: if pattern a has all constraints also present in b and one more, b is removed unless SUPPORT(b) is at least 20% higher than SUPPORT(a). This rule is applied starting with the smallest patterns. We experimentally determined that the threshold of 20% leads to a generally rather appropriate set of patterns. The remaining unwanted patterns are left to be eliminated by further filtering.
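The removal rule can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a pattern is represented as a frozenset of constraint strings, that SUPPORT is an absolute count, and that "20% higher" means a relative margin, i.e. SUPPORT(b) >= 1.2 * SUPPORT(a). The function name `prune_general_patterns`, the `margin` parameter, and all support counts below are hypothetical.

```python
from typing import Dict, FrozenSet, Set

Pattern = FrozenSet[str]  # assumption: a pattern is a set of constraints


def prune_general_patterns(support: Dict[Pattern, int],
                           margin: float = 0.20) -> Set[Pattern]:
    """Drop a pattern b if some pattern a with exactly one additional
    constraint exists and b's support is not at least `margin` higher
    (relative, by assumption) than a's support."""
    kept = set(support)
    # Visit the smallest (most general) patterns first, as in the text.
    for b in sorted(support, key=len):
        for a in support:
            # a must contain every constraint of b plus exactly one more.
            if len(a) == len(b) + 1 and b < a:
                if support[b] < (1.0 + margin) * support[a]:
                    kept.discard(b)
                    break
    return kept


# Hypothetical support counts for four nested patterns.
p_in = frozenset({"in"})
p_was_in = frozenset({"was", "in"})
p_was_born_in = frozenset({"was", "born", "in"})
p_specific = frozenset({"Wolfgang", "was", "born", "in"})

support_counts = {
    p_in: 100,
    p_was_in: 95,
    p_was_born_in: 40,
    p_specific: 2,
}

kept = prune_general_patterns(support_counts)
```

In this toy input, the pattern with the single constraint "in" is dropped because its support (100) is less than 20% above that of its one-constraint extension (95). The overly specific pattern survives, which matches the remark that remaining unwanted patterns are left to further filtering.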

4 Experimental Evaluation

The goal of our experimental evaluation is to demonstrate the advantages of modeling the pattern abstraction subtask of iterative pattern induction as a frequent itemset mining (FIM) problem. We do so by comparing the performance achieved by our itemset-based implementation with the abstraction algorithm used in previous implementations (compare [3]). We do not intend to show the superiority of the approach based on Frequent Itemset Mining to those from the literature as this would require a common benchmark for large-scale Web Relation Extraction or at least a common basis of implementation. Such a standard does not exist due to the diversity of applications and pattern representation formalisms in the literature. Yet, we evaluate our results on a fairly diverse set of non-taxonomic relations to ensure generality. The datasets we use have already been used in [3] and are provided for download by the authors. As in these experiments, we have also used the same 10 seeds selected by hand and the same automatic evaluation procedure.

4.1 Experimental Setup

In our experiments, we rely on the widely used precision and recall measures to evaluate our system’s output with respect to the full extension of the relation². To give an

2 Note that this is different from the evaluation of other similar systems which calculate these measures with respect to a specific corpus, thus yielding higher scores. Also, due to the absence of a closed corpus in our Web scenario, our notion of recall is not comparable. We use “relative recall” in the sense that it reflects extractions compared to the highest yield count obtained over all experimental settings we applied.
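The two measures described here can be sketched in Python as follows. This is only an illustration under stated assumptions, not the evaluation procedure used in the experiments: precision is computed against a set standing in for the full extension of the relation, and the "yield count" of a setting is assumed to be its number of correct extractions. The function names, setting names, and toy data are hypothetical.

```python
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]  # one extracted relation instance, e.g. (person, year)


def precision(extracted: Set[Pair], extension: Set[Pair]) -> float:
    """Fraction of extracted pairs that occur in the relation's extension."""
    if not extracted:
        return 0.0
    return len(extracted & extension) / len(extracted)


def relative_recall(yields: Dict[str, int]) -> Dict[str, float]:
    """Yield of each setting divided by the highest yield over all settings."""
    best = max(yields.values(), default=0)
    if best == 0:
        return {name: 0.0 for name in yields}
    return {name: count / best for name, count in yields.items()}


# Toy stand-in for the full extension of a bornInYear-style relation.
extension = {("Mozart", "1756"), ("Bach", "1685"), ("Haydn", "1732")}

# Hypothetical outputs of two experimental settings.
runs = {
    "classic": {("Mozart", "1756"), ("Bach", "1700")},
    "fim": {("Mozart", "1756"), ("Bach", "1685"),
            ("Haydn", "1732"), ("Liszt", "1800")},
}

precisions = {name: precision(out, extension) for name, out in runs.items()}
# Assumption: yield count = number of correct extractions per setting.
yields = {name: len(out & extension) for name, out in runs.items()}
rel_recall = relative_recall(yields)
```

Because the denominator is the best yield observed rather than the size of a closed corpus, the setting with the highest yield always receives a relative recall of 1.0, and the values are only comparable within one set of experiments.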
