20.11.2012 Views

Ontology-Based Information Extraction Systems (OBIES 2008)


as standard search engine queries using off-the-shelf search engine APIs. This circumvents the need to linearly process the whole Web (see e.g. [3]). Some approaches perform pattern induction in an iterative fashion in a cyclic approach which uses the new examples derived in one iteration for the induction of new patterns in the next iteration [4,1]. In this paper we follow this latter approach and in particular examine in more detail the empirical complexity of the pattern induction step. As in these approaches the induction of patterns proceeds in a bootstrapping-like fashion, the complexity of the pattern induction step crucially determines the time complexity of the whole approach. Earlier implementations of the approach have used greedy strategies for the pairwise comparison of the occurrences of seed examples. In this paper we show how the Apriori algorithm for discovering frequent itemsets can be used to derive patterns with a minimal support in linear time. Our empirical evaluation shows that with this approach pattern induction can be reduced to linear time while maintaining extraction quality comparable to (and even marginally better than) earlier implementations of the algorithm.
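To make the idea of mining patterns as frequent itemsets concrete, the following is a minimal sketch of level-wise Apriori in Python. It is not the authors' implementation. The encoding of an occurrence as a set of (position, token) items, the toy occurrences, the `ARG1`/`ARG2` slot names, and the helper `render` are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset whose support
    (number of containing transactions) is at least min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count the support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def render(itemset, length):
    """Hypothetical helper: turn (position, token) items into a pattern
    string, using '*' for positions the itemset leaves open."""
    slots = dict(itemset)
    return " ".join(slots.get(i, "*") for i in range(length))


# Toy occurrences of seed pairs, with the pair replaced by argument slots.
occurrences = [
    "flights to ARG1 , ARG2".split(),
    "flights to ARG1 , ARG2 from".split(),
    "hotels in ARG1 , ARG2".split(),
]
# Assumed encoding: each occurrence becomes a set of (position, token) items.
transactions = [set(enumerate(tokens)) for tokens in occurrences]

frequent = apriori(transactions, min_support=3)
largest = max(frequent, key=len)
pattern = render(largest, 5)
print(pattern)
```

In this toy setting, the largest itemset supported by all three occurrences corresponds to the pattern with wildcards at the first two positions. Lowering `min_support` to 2 would yield a more specific pattern that also fixes "flights to".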

The remainder of this paper is organized as follows. In the next section we describe the approach of pattern-based relation extraction using Web search engines in more detail. In section Pattern Induction as Frequent Itemset Mining, we give a brief introduction to Frequent Itemset Mining before describing how it is applied in order to induce patterns for relation extraction. We describe our experimental results in section Experimental Results, before discussing related work and giving some concluding remarks.

2 Iterative Pattern Induction<br />

The goal of pattern induction is, given a set of seed examples (pairs) S of a relation R as well as occurrences Occ(S) in the corpus (the Web in our case) of these seeds, to induce a set of patterns P which are general enough to extract many more tuples standing in the relation R (thus having a good coverage) and which at the same time do not overgenerate in the sense that they produce too many spurious examples. The challenging issues here are on the one hand that the hypothesis space is huge, corresponding to the powerset of the set of possible patterns P representing abstractions over the set of occurrences Occ(S). We will denote this hypothesis space as 2^P. On the other hand, the complete extension ext_R of the relation R is unknown (it is the goal of the whole approach to approximate this extension as closely as possible at the end of the cycle), such that we cannot use it to compute an objective function o : 2^P → ℝ to determine the patterns’ accuracy with respect to the extension ext_R.

The general algorithm for iterative induction of patterns is presented in Figure 1. It subsumes many of the approaches mentioned in the introduction which implement similar bootstrapping-like procedures. The key idea is to co-evolve P (which at the beginning is assumed to be empty) as well as a constantly growing set of examples S which at the beginning corresponds to the seed examples. The candidate patterns can be generated in a greedy fashion by abstracting over the occurrences Occ(S). Abstracting requires finding common properties, which in principle is a quadratic task as it requires pairwise comparison between the different occurrences.
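The co-evolution of P and S, together with the quadratic cost of pairwise abstraction, can be illustrated with a small sketch. It is not a reproduction of the algorithm in Figure 1, which is not shown here. It assumes a tiny in-memory corpus instead of Web search, token-wise generalization of equal-length occurrences, and no pattern filtering step. All function names and the `min_fixed` threshold are hypothetical.

```python
from itertools import combinations


def common_pattern(a, b):
    """Greedy abstraction of two occurrences: keep tokens they share,
    replace differing tokens by '*'. Only equal lengths are compared."""
    if len(a) != len(b):
        return None
    return tuple(x if x == y else "*" for x, y in zip(a, b))


def induce_patterns_pairwise(occurrences, min_fixed=3):
    """Compare every pair of occurrences: n * (n - 1) / 2 comparisons."""
    patterns, comparisons = set(), 0
    for a, b in combinations(occurrences, 2):
        comparisons += 1
        p = common_pattern(a, b)
        # Keep only patterns with enough non-wildcard tokens (assumed filter).
        if p is not None and sum(tok != "*" for tok in p) >= min_fixed:
            patterns.add(p)
    return patterns, comparisons


def find_occurrences(corpus, examples):
    """Occ(S): sentences containing a known pair, with slots substituted."""
    found = []
    for sentence in corpus:
        tokens = sentence.split()
        for x, y in examples:
            if x in tokens and y in tokens:
                found.append(tuple(
                    "ARG1" if t == x else "ARG2" if t == y else t
                    for t in tokens))
    return found


def match(pattern, tokens):
    """Return the (ARG1, ARG2) pair extracted by pattern, or None."""
    if len(pattern) != len(tokens):
        return None
    x = y = None
    for p, t in zip(pattern, tokens):
        if p == "ARG1":
            x = t
        elif p == "ARG2":
            y = t
        elif p != "*" and p != t:
            return None
    return (x, y) if x and y else None


def bootstrap(corpus, seeds, iterations=2):
    """Co-evolve the pattern set P and the example set S."""
    examples, patterns = set(seeds), set()  # P starts empty, S = seeds
    for _ in range(iterations):
        occ = find_occurrences(corpus, examples)
        new_patterns, _ = induce_patterns_pairwise(occ)
        patterns |= new_patterns
        for sentence in corpus:
            tokens = sentence.split()
            for p in patterns:
                pair = match(p, tokens)
                if pair:
                    examples.add(pair)
    return patterns, examples


corpus = [
    "flights to Paris , France today",
    "hotels in Rome , Italy today",
    "trains to Berlin , Germany today",
]
seeds = {("Paris", "France"), ("Rome", "Italy")}
patterns, examples = bootstrap(corpus, seeds, iterations=2)
print(sorted(examples))
```

Starting from two seed pairs, the first iteration induces one pattern from the single pairwise comparison, and that pattern extracts a third pair from the corpus. The number of comparisons grows quadratically with the number of occurrences, which is the cost that the pairwise strategy incurs.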
