ontology-based information extraction systems (obies 2008)
as standard search engine queries using off-the-shelf search engine APIs. This circumvents the need to linearly process the whole Web (see e.g. [3]). Some approaches perform pattern induction in an iterative fashion in a cyclic approach which uses the new
examples derived in one iteration for the induction of new patterns in the next iteration
[4,1]. In this paper we follow this latter approach and in particular examine more in
detail the empirical complexity of the pattern induction step. As in these approaches
the induction of patterns proceeds in a bootstrapping-like fashion, the complexity of the
pattern induction step crucially determines the time complexity of the whole approach.
Earlier implementations of the approach have used greedy strategies for the pairwise
comparison of the occurrences of seed examples. In this paper we show how the Apriori algorithm for discovering frequent itemsets can be used to derive patterns with a
minimal support in linear time. Our empirical evaluation shows that with this approach
pattern induction can be reduced to linear time while maintaining extraction quality
comparable to (and even marginally better than) earlier implementations of the algorithm.
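
As a rough illustration of what discovering frequent itemsets with a minimal support means, the following is a minimal sketch of the textbook Apriori procedure. It is not the implementation evaluated in this paper; the function name `apriori`, its parameters, and the toy transactions are assumptions made purely for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset that is contained
    in at least `min_support` transactions (illustrative sketch only)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count how many transactions contain each single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the transactions per level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data: each transaction is a set of items.
toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(toy, min_support=3)
```

In this toy run every single item and every pair reaches the support threshold of 3, while the triple {a, b, c} occurs in only two transactions and is therefore dropped. How occurrences of seed examples are turned into transactions is left open here.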
The remainder of this paper is organized as follows. In the next section we describe the approach of pattern-based relation extraction using Web search engines in
more detail. In section Pattern Induction as Frequent Itemset Mining, we give a brief
introduction to Frequent Itemset Mining before describing how it is applied in order
to induce patterns for relation extraction. We describe our experimental results in section Experimental Results, before discussing related work and giving some concluding
remarks.
2 Iterative Pattern Induction<br />
The goal of pattern induction is, given a set of seed examples (pairs) $S$ of a relation $R$
as well as occurrences $\mathit{Occ}(S)$ in the corpus (the Web in our case) of these seeds, to
induce a set of patterns $P$ which are general enough to extract many more tuples standing in the relation
$R$ (thus having a good coverage) and which at the same time do not
overgenerate in the sense that they produce too many spurious examples. The challenging issues here are on the one hand that the hypothesis space is huge, corresponding to
the power set of the set of possible patterns $P$ representing abstractions over the set of
occurrences $\mathit{Occ}(S)$. We will denote this hypothesis space as $2^P$. On the other hand,
the complete extension $\mathit{ext}_R$ of the relation $R$ is unknown (it is the goal of the whole
approach to approximate this extension as closely as possible at the end of the cycle),
such that we cannot use it to compute an objective function $o : 2^P \to \mathbb{R}$ to determine
the patterns' accuracy with respect to the extension $\mathit{ext}_R$.
The general algorithm for iterative induction of patterns is presented in Figure 1.
It subsumes many of the approaches mentioned in the introduction which implement
similar bootstrapping-like procedures. The key idea is to co-evolve P (which at the
beginning is assumed to be empty) as well as a constantly growing set of examples S
which at the beginning corresponds to the seed examples. The candidate patterns can be
generated in a greedy fashion by abstracting over the occurrences Occ(S). Abstracting
requires finding common properties, which in principle is a quadratic task as it requires
pairwise comparison between the different occurrences.
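
To make the quadratic cost of pairwise abstraction concrete, the following minimal sketch generalizes every pair of occurrences and embeds that step in a simplified co-evolution loop over P and S. It does not reproduce Figure 1: pattern evaluation and filtering are omitted, and the wildcard representation, the equal-length token windows, the `min_fixed_tokens` threshold, and the callbacks `find_occurrences` and `match_patterns` are all hypothetical choices made for illustration.

```python
from itertools import combinations

WILDCARD = "*"  # assumed placeholder for positions where two occurrences differ


def abstract_pair(occ_a, occ_b):
    """Generalize two equal-length token windows: keep tokens that agree,
    replace disagreeing positions with a wildcard."""
    if len(occ_a) != len(occ_b):
        return None
    return tuple(a if a == b else WILDCARD for a, b in zip(occ_a, occ_b))


def induce_patterns_pairwise(occurrences, min_fixed_tokens=3):
    """Greedy candidate generation by comparing every pair of occurrences.
    Returns the candidate patterns and the number of comparisons made."""
    patterns = set()
    comparisons = 0
    for occ_a, occ_b in combinations(occurrences, 2):
        comparisons += 1  # n occurrences lead to n * (n - 1) / 2 comparisons
        pattern = abstract_pair(occ_a, occ_b)
        if pattern is None:
            continue
        # Discard patterns that are almost entirely wildcards.
        if sum(tok != WILDCARD for tok in pattern) >= min_fixed_tokens:
            patterns.add(pattern)
    return patterns, comparisons


def bootstrap(seeds, find_occurrences, match_patterns, rounds=2):
    """Simplified co-evolution of patterns P and examples S (no filtering)."""
    examples, patterns = set(seeds), set()
    for _ in range(rounds):
        occurrences = find_occurrences(examples)       # Occ(S)
        candidates, _ = induce_patterns_pairwise(occurrences)
        patterns |= candidates                         # grow P
        examples |= match_patterns(patterns)           # grow S
    return patterns, examples


# Hypothetical occurrences with the seed arguments already replaced by slots.
occs = [
    ("ARG1", "is", "located", "in", "ARG2"),
    ("ARG1", "is", "situated", "in", "ARG2"),
    ("ARG1", "lies", "somewhere", "near", "ARG2"),
]
found, n_comparisons = induce_patterns_pairwise(occs)
```

With three occurrences the sketch performs three comparisons; with ten it would perform 45, which is the growth that motivates replacing the pairwise step.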