13.07.2015 Views

Collocation Collocations & colligations Defining ... - Indiana University



Corpus Linguistics
Application #2: Collocations

Outline: Collocations; Defining a collocation; Krishnamurthy; Calculating collocations; Practical work

Expected bigram probabilities

If we assumed that sherlock and holmes are independent (i.e., the probability of one is unaffected by the probability of the other), we would get the following table:

             holmes               ¬holmes              Total
sherlock     0.00647 x 0.00099    0.99353 x 0.00099    0.00099
¬sherlock    0.00647 x 0.99901    0.99353 x 0.99901    0.99901
Total        0.00647              0.99353              1.0

◮ This is simply $p_e(w_1, w_2) = p(w_1)\,p(w_2)$

Expected bigram frequencies

Multiplying by 7105 (the total number of bigrams) gives us the expected number of times we should see each bigram:

             holmes    ¬holmes    Total
sherlock     0.05      6.95       7
¬sherlock    45.95     7052.05    7098
Total        46        7059       7105

◮ The values in this chart are the expected frequencies ($f_e$)

Pearson's chi-square test

The chi-square ($\chi^2$) test measures how far the observed values ($f_o$) are from the expected values ($f_e$):

(6) $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$

(7) $\chi^2 = \frac{(7 - 0.05)^2}{0.05} + \frac{(0 - 6.95)^2}{6.95} + \frac{(39 - 45.95)^2}{45.95} + \frac{(7059 - 7052.05)^2}{7052.05}$
    $\approx 966.05 + 6.95 + 1.048 + 0.006$
    $\approx 974.05$

If you look this up in a table, you'll see that a value this large is unlikely to be due to chance.

NB: The $\chi^2$ test does not work well for rare events, i.e., $f_e < 6$

Working with collocations

The question is:
◮ What significant collocations are there that start with the word sweet?
◮ Specifically, what nouns tend to co-occur after sweet?

What do your intuitions say?

Next time, we will work on how to calculate collocations ...
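
The expected frequencies and the χ² statistic for sherlock holmes can be recomputed from the observed counts alone. The following is a minimal sketch rather than the course's own code: the observed counts (7, 0, 39, 7059) are read off the numerators of equation (7), and the function names `expected_frequencies` and `chi_square` are hypothetical helpers introduced only for illustration.

```python
# Observed bigram counts, read off the numerators of equation (7).
# Rows: first word is / is not "sherlock".
# Columns: second word is / is not "holmes".
observed = [[7, 0],
            [39, 7059]]


def expected_frequencies(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson's chi-square: sum over all cells of (f_o - f_e)^2 / f_e."""
    expected = expected_frequencies(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


expected = expected_frequencies(observed)
chi2 = chi_square(observed)

# Expected frequencies rounded to two decimals, then the statistic itself.
print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 2))
```

Because this sketch keeps the expected frequencies unrounded, the statistic comes out at roughly 1075 rather than 974.05. Nearly all of the difference comes from the sherlock holmes cell, whose expected frequency is about 0.045 before it is rounded to 0.05. Either value is far above 3.841, the critical value for one degree of freedom at the 0.05 level, so the conclusion is the same.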
