

8 Stochastic influences on simple language models

8.1 Motivations and background

(1) Our example parsers have tiny dictionaries. If we just add in a big dictionary, we get many structural ambiguities.

Just to illustrate how bad the problem is, the following simple examples from Abney (1996a) have ambiguities that most people would not notice, but our parsing methods will:

a. I know the cows are grazing in the meadow

b. I know John saw Mary

The word are is a noun in a hectare is a hundred ares, and saw can be a noun, so the non-obvious readings of those two sentences are the ones analogous to the natural readings of these:

a. I know the sales force (which is) meeting in the office

b. I know Gatling gun Joe

There are many other readings too, ones which would be spelled differently (if we were careful about quotes, which most people are not) but pronounced the same:

a. I know “The Cows are Grazing in the Meadow”

b. I know “The Cows are Grazing” in the meadow

c. I know “The Cows are” grazing in the meadow

…

… I know ““The Cows are Grazing in the Meadow””

…

This kind of thing is a problem for mimicking, let alone modeling, human recognition capabilities.

Abney concludes:

The problem of how to identify the correct structure from among the in-principle possible structures provides one of the central motivations for the use of weighted grammars in computational linguistics.
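To make the idea of weighting concrete, here is a minimal sketch in Python. The weighted lexicon below is made up for illustration (the categories and numbers are not estimated from any corpus): each word is listed with its alternative categories and a weight for each, an analysis is scored by the product of the weights of the choices it makes, and the unintended category assignments come out ranked well below the intended one.

from itertools import product

# Illustrative weighted lexicon: weight(category | word). The categories and
# numbers are invented for this sketch, not estimated from data.
LEXICON = {
    "i":       {"Pron": 1.0},
    "know":    {"V": 1.0},
    "the":     {"Det": 1.0},
    "cows":    {"N": 1.0},
    "are":     {"V": 0.95, "N": 0.05},   # the unit-of-area noun is rare
    "grazing": {"V": 0.7,  "N": 0.3},
    "in":      {"P": 1.0},
    "meadow":  {"N": 1.0},
}

def scored_taggings(words):
    """Enumerate every category assignment with its product-of-weights score."""
    choices = [list(LEXICON[w].items()) for w in words]
    for combo in product(*choices):
        cats = [cat for cat, _ in combo]
        score = 1.0
        for _, wt in combo:
            score *= wt
        yield score, cats

words = "i know the cows are grazing in the meadow".split()
for score, cats in sorted(scored_taggings(words), reverse=True):
    print(f"{score:.4f}  {' '.join(cats)}")

Running this prints the four category assignments for the first example sentence in descending order of score, with the ordinary verb readings of are and grazing at the top and the noun reading of are (the unit of area) near the bottom; a weighted grammar applies the same idea to whole structures rather than just category assignments.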

(2) Martin Gardner gives us the following amusing puzzle. Insert the minimum number of quotation marks to make the best sense of the following sentence:

Wouldn’t the sentence I want to put a hyphen between the words fish and and and and and chips in my fish and chips sign have looked cleaner if quotation marks had been placed before fish and between fish and and and and and and and and and and and and and and and and and and and and and chips as well as after chips?

In effect, we solve a problem like this every time we interpret a spoken sentence.

(3) Another demonstration of the ambiguity problem comes from studies like Charniak, Goldwater, and Johnson (1998). Applying the grammar of the Penn Treebank II to the Wall Street Journal sentences in that Treebank that are shorter than 40 words, they found that their charts had, on average, 1.2 million items per sentence – obviously, very few of these are actually used in the desired derivation, and the rest come from local and global ambiguities.

They say:

Numbers like this suggest that any approach that offers the possibility of reducing the work load is well worth pursuing…
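To get a feel for how chart items pile up, here is a small sketch of CKY recognition, again in Python and over an invented toy grammar (the categories and rules below are made up for illustration; they are nothing like the full Treebank grammar). It records every (category, start, end) item that recognition enters into the chart; even for the nine-word example above, with only two lexically ambiguous words, many of the items never figure in the intended derivation.

from collections import defaultdict

# Toy grammar in Chomsky normal form, invented for this sketch.
LEX = {
    "i": {"NP"}, "know": {"V"}, "the": {"Det"}, "cows": {"N"},
    "are": {"V", "N"},        # verb, or the (rare) unit-of-area noun
    "grazing": {"VP", "N"},   # participial VP, or a noun (as in "grazing land")
    "in": {"P"}, "meadow": {"N"},
}
BINARY = [
    ("S", "NP", "VP"), ("VP", "V", "VP"), ("VP", "V", "S"),
    ("VP", "V", "NP"), ("VP", "VP", "PP"), ("NP", "Det", "N"),
    ("NP", "NP", "N"), ("NP", "NP", "VP"), ("NP", "NP", "PP"),
    ("PP", "P", "NP"),
]

def cky_items(words):
    """CKY recognition; return every (category, start, end) item entered in the chart."""
    n = len(words)
    chart = defaultdict(set)                  # (start, end) -> set of categories
    for i, word in enumerate(words):
        chart[i, i + 1] |= LEX.get(word, set())
    for span in range(2, n + 1):
        for start in range(n - span + 1):
            end = start + span
            for mid in range(start + 1, end):
                for parent, left, right in BINARY:
                    if left in chart[start, mid] and right in chart[mid, end]:
                        chart[start, end].add(parent)
    return {(cat, s, e) for (s, e), cats in chart.items() for cat in cats}

words = "i know the cows are grazing in the meadow".split()
items = cky_items(words)
print(f"{len(items)} chart items; S over the whole string: {('S', 0, len(words)) in items}")

The same bookkeeping carried out with a broad-coverage grammar over 40-word Wall Street Journal sentences is what yields counts on the order of the 1.2 million items reported above.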

