A Wordnet from the Ground Up


114 Chapter 4. Extracting Relation Instances

measured by PMI. Later, associating a pattern with a larger number of more reliable instances increases the pattern's reliability.

The reliability of instances is defined symmetrically: replace the reliability of instances with the reliability of patterns $r_\pi(p)$ and the set of instances by the set of patterns $P$:

$$r_\iota(i) = \frac{\sum_{p \in P} \left( \frac{pmi(i,p)}{max_{pmi}} \cdot r_\pi(p) \right)}{|P|} \tag{4.2}$$

PMI originates from Information Theory. It measures the strength of association between two events:

$$pmi(i,p) = \log \frac{|x,p,y| \, |*,*,*|}{|x,*,y| \, |*,p,*|} \tag{4.3}$$

Here the instance $i$ is the pair $(x, y)$: $|x,p,y|$ is the number of occurrences of $x$ and $y$ in contexts matching the pattern $p$, $|x,*,y|$ is the number of co-occurrences of $x$ and $y$ in the corpus regardless of the pattern, and so on.

The definition of pmi presented by Pantel and Pennacchiotti (2006) does not include the constituent $|*,*,*|$ (the number of contexts). The PMI measure, however, should usually be greater than 0, while pmi as defined in (Pantel and Pennacchiotti, 2006) is not. The missing constituent is also suggested by the general definition of PMI:

$$pmi(i,p) = \log \frac{p(I,P)}{p(I)\,p(P)} \tag{4.4}$$

Because PMI is significantly higher when instances and patterns are not numerous (e.g., < 10), PMI is multiplied by a discounting factor proposed in (Pantel and Ravichandran, 2004) that decreases the bias towards infrequent events.

In Espresso, generic patterns are defined as generating 10 times more instances than previously accepted reliable patterns. They extract many instances but are characterised by lower reliability. Generic patterns are not excluded by definition.
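As a rough illustration of (4.2) and (4.3), the following minimal sketch computes PMI from raw co-occurrence counts and averages the normalised PMI values into an instance reliability. The function names, argument names and the example counts are hypothetical, and the discounting factor of Pantel and Ravichandran (2004) is deliberately left out.

```python
import math


def pmi(n_xpy, n_xy, n_p, n_all):
    """PMI as in (4.3): log(|x,p,y| * |*,*,*| / (|x,*,y| * |*,p,*|)).

    All argument names are illustrative, not taken from the source.
    """
    if min(n_xpy, n_xy, n_p, n_all) <= 0:
        raise ValueError("all counts must be positive")
    return math.log((n_xpy * n_all) / (n_xy * n_p))


def instance_reliability(pmi_by_pattern, pattern_reliability, max_pmi):
    """Instance reliability as in (4.2).

    pmi_by_pattern maps every pattern p in P to pmi(i, p);
    pattern_reliability maps p to r_pi(p); max_pmi is the maximum PMI
    used for normalisation.
    """
    if not pmi_by_pattern:
        raise ValueError("the pattern set P must not be empty")
    total = sum(
        (value / max_pmi) * pattern_reliability[p]
        for p, value in pmi_by_pattern.items()
    )
    return total / len(pmi_by_pattern)


# Made-up counts: the pair occurs 4 times with the pattern, 8 times overall;
# the pattern matches 100 contexts out of 1000.
with_total = pmi(4, 8, 100, 1000)      # log(5), positive
without_total = math.log(4 / (8 * 100))  # variant without |*,*,*|, negative
```

With these made-up counts the variant without the number of contexts is negative, while the full formula yields a positive value, which is the motivation given above for restoring $|*,*,*|$.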
Generic patterns increase recall (the number of correct instances extracted), but inevitably decrease precision. In order to prevent an excessive reduction of the precision, an additional measure of confidence of instances has been introduced. It is based on the evaluation of instances against reliable patterns only and the additional data acquired by searching the Web with the queries generated from instances and patterns:

$$S(i) = \sum_{p \in P_R} S_p(i) \cdot \frac{r_\pi(p)}{T} \tag{4.5}$$

$P_R$ is the set of reliable patterns (given a threshold), $S_p(i)$ is the PMI between $i$ and $p$ measured on the data acquired from the Web (using Google queries) and $T$ is the sum over the reliability of reliable patterns.
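The confidence measure in (4.5) can be sketched as a reliability-weighted average of Web-based PMI scores. This is only an illustration under stated assumptions: the function name, the `>=` comparison against the threshold, the treatment of a missing Web score as 0, and the example patterns and numbers are all hypothetical and do not come from the source.

```python
def instance_confidence(web_pmi, pattern_reliability, threshold):
    """Confidence S(i) as in (4.5).

    web_pmi maps a pattern p to S_p(i), the PMI of instance i and p
    measured on Web data; pattern_reliability maps p to r_pi(p).
    """
    # P_R: patterns whose reliability reaches the threshold
    # (using >= here is an assumption of this sketch).
    reliable = {p: r for p, r in pattern_reliability.items() if r >= threshold}
    # T: the sum of the reliabilities of the reliable patterns.
    total_reliability = sum(reliable.values())
    if total_reliability == 0:
        raise ValueError("no reliable patterns for the given threshold")
    # A pattern with no Web score contributes 0 (assumption of this sketch).
    weighted = sum(web_pmi.get(p, 0.0) * r for p, r in reliable.items())
    return weighted / total_reliability


# Made-up values: the low-reliability pattern "of" is ignored
# even though its Web PMI is high.
score = instance_confidence(
    web_pmi={"such as": 2.0, "is a": 1.0, "of": 5.0},
    pattern_reliability={"such as": 0.8, "is a": 0.2, "of": 0.05},
    threshold=0.1,
)
```

In this example only the two patterns above the threshold contribute, so a generic pattern cannot raise the confidence of an instance on its own.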
