Ch7. Word Sense Disambiguation

EE669 Natural Language Processing, Fall 2001 (Notes 4)

Supervised Disambiguation: An Information-Theoretic Approach

• Let t_1, …, t_m be translations for an ambiguous word and x_1, …, x_n be possible values of the indicator.
• The Flip-Flop algorithm is used to disambiguate between the different senses of a word using mutual information:
  – I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x,y) log [p(x,y) / (p(x)p(y))]
  – See Brown et al. for an extension to more than two senses.
• The algorithm searches for a partition of the senses that maximizes the mutual information, and it stops when the increase becomes insignificant.

Mutual Information

• I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X), the mutual information between X and Y, is the reduction in uncertainty of one random variable due to knowing about another; in other words, it is the amount of information one random variable contains about another.
• [Diagram: the relationship among H(X), H(Y), H(X|Y), H(Y|X), H(X,Y), and I(X; Y).]

Mutual Information (cont.)

• I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)
• I(X; Y) is a symmetric, non-negative measure of the common information of two variables.
• Some see it as a measure of dependence between two variables, but it is better to think of it as a measure of independence:
  – I(X; Y) is 0 only when X and Y are independent: H(X|Y) = H(X).
  – For two dependent variables, I grows not only with the degree of dependence but also with the entropy of the two variables.
• H(X) = H(X) − H(X|X) = I(X; X) ⇒ this is why entropy is called self-information.

The Flip-Flop Disambiguation Algorithm

  find a random partition P = {P_1, P_2} of the translations {t_1, …, t_m}
  while (there is a significant improvement) do
    find the partition Q = {Q_1, Q_2} of the indicator values {x_1, …, x_n} that maximizes I(P; Q)
    find the partition P = {P_1, P_2} of the translations {t_1, …, t_m} that maximizes I(P; Q)
  end

• I(X; Y) = Σ_{x∈X} Σ_{y∈Y} p(x,y) log [p(x,y) / (p(x)p(y))]
• Mutual information increases monotonically in the Flip-Flop algorithm, so it is reasonable to stop when there is only an insignificant improvement.
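The pseudocode above lends itself to a direct, if naive, implementation. The following Python sketch is not part of the original notes: the function names (mutual_information, two_way_splits, flip_flop) and the toy counts table are illustrative, and candidate partitions are enumerated exhaustively for clarity, whereas Brown et al.'s actual procedure uses a more efficient search over splits.

```python
# Minimal sketch of the Flip-Flop disambiguation algorithm (two senses).
# Alternates between re-partitioning the indicator values and the
# translations so as to maximize I(P; Q), stopping when the gain is small.
from itertools import combinations
from math import log2

def mutual_information(counts, t_side, x_side):
    """I(P; Q) over the 2x2 table obtained by collapsing translations into
    {t_side, rest} and indicator values into {x_side, rest}.
    counts[(t, x)] is the co-occurrence count of translation t with
    indicator value x."""
    total = sum(counts.values())
    joint = [[0.0, 0.0], [0.0, 0.0]]
    for (t, x), c in counts.items():
        joint[t in t_side][x in x_side] += c   # booleans index as 0/1
    mi = 0.0
    for i in (0, 1):
        for j in (0, 1):
            pxy = joint[i][j] / total
            if pxy > 0:
                px = (joint[i][0] + joint[i][1]) / total
                py = (joint[0][j] + joint[1][j]) / total
                mi += pxy * log2(pxy / (px * py))
    return mi

def two_way_splits(items):
    """Yield one side of every non-trivial binary partition of items.
    Exhaustive (exponential); fine for a toy example only."""
    items = sorted(items)
    for r in range(1, len(items)):
        for side in combinations(items, r):
            yield frozenset(side)

def flip_flop(counts, translations, indicators, tol=1e-6):
    """Return (P1, Q1, I) where P1 and Q1 are one side of the final
    partitions of the translations and indicator values."""
    # arbitrary starting partition of the translations
    p1 = frozenset(sorted(translations)[: max(1, len(translations) // 2)])
    best = 0.0
    while True:
        # fix P, choose the indicator split Q that maximizes I(P; Q)
        q1 = max(two_way_splits(indicators),
                 key=lambda q: mutual_information(counts, p1, q))
        # fix Q, choose the translation split P that maximizes I(P; Q)
        p1 = max(two_way_splits(translations),
                 key=lambda p: mutual_information(counts, p, q1))
        mi = mutual_information(counts, p1, q1)
        if mi - best < tol:   # stop when the improvement is insignificant
            return p1, q1, mi
        best = mi

# Toy usage (hypothetical data): French translations of English "interest"
# with a single neighboring word as the indicator.
counts = {("intérêt", "rate"): 10, ("intérêt", "bank"): 8,
          ("intéressé", "hobby"): 7, ("intéressé", "rate"): 1}
p1, q1, mi = flip_flop(counts, {"intérêt", "intéressé"},
                       {"rate", "bank", "hobby"})
print(p1, q1, round(mi, 3))
```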
