20.07.2013 Views

Notes on computational linguistics.pdf - UCLA Department of ...

Stabler - Lx 185/209 2003

7.1.3 CKY example 2

Since we can now recognize the language of any context free grammar, we can take grammars written by anyone
else and try them out. For example, we can take the grammar defined by the Penn Treebank and try to parse
with it. In the file wsj_0005.mrg we find the following 3 trees:

[Figure: the three parse trees from wsj_0005.mrg, with co-indexed nodes and traces (NP-SBJ-10, WHNP-10, *T*-10, *-10; WHNP-11, *T*-11). Their yields are:]

J.P. Bolduc , vice chairman of W.R. Grace & Co. , which holds a 83.4 % interest in this energy-services company , was elected a director .

He succeeds Terrence D. Daniels , formerly a W.R. Grace vice chairman , who resigned .

W.R. Grace holds three of Grace Energy 's seven board seats .

Notice that these trees indicate movement relations, with co-indexed traces. If we ignore the movement relations
and just treat the traces as empty, though, we have a CFG, one that will accept all the strings that are
parsed in the treebank plus some others as well.

We will study how to parse movements later, but for the moment, let’s collect the (overgenerating) context
free rules from these trees. Dividing the lexical rules from the others, and showing how many times each rule
is used, we have first:

1 ('SBAR':~['WHNP-11','S']).
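The rule collection described above can be sketched in a few lines. This is a minimal illustration in Python rather than the Prolog used in these notes, and it rests on several assumptions: the bracketed fragment is typed in by hand from the labels in the second tree (it is not read from wsj_0005.mrg, and it omits the extra outer parentheses that treebank files wrap around each tree), the names `tokenize`, `parse`, `collect_rules` and `show` are hypothetical, and "treating the traces as empty" is read as giving the category -NONE- an empty right-hand side.

```python
from collections import Counter

# Hand-typed fragment (an assumption, not read from the treebank file).
FRAGMENT = "(SBAR (WHNP-11 (WP who)) (S (NP-SBJ (-NONE- *T*-11)) (VP (VBD resigned))))"

def tokenize(text):
    # Pad parentheses with spaces so that split() separates them from labels.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens, pos=0):
    """Parse one bracketed tree starting at tokens[pos].
    Returns (tree, next_pos); a tree is (label, children), a word is a str."""
    assert tokens[pos] == "("
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return (label, children), pos + 1

def collect_rules(tree, phrasal, lexical):
    """Count one rule per node, keeping lexical rules apart from the others."""
    label, children = tree
    if label == "-NONE-":
        # Ignore the movement relation: the trace category rewrites to nothing.
        phrasal[(label, ())] += 1
        return
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    if all(isinstance(c, str) for c in children):
        lexical[(label, rhs)] += 1
    else:
        phrasal[(label, rhs)] += 1
    for c in children:
        if not isinstance(c, str):
            collect_rules(c, phrasal, lexical)

def show(rule, count):
    # Print in the style of the listing: count ('LHS':~['RHS1','RHS2']).
    lhs, rhs = rule
    body = ",".join(f"'{sym}'" for sym in rhs)
    return f"{count} ('{lhs}':~[{body}])."

tree, _ = parse(tokenize(FRAGMENT))
phrasal, lexical = Counter(), Counter()
collect_rules(tree, phrasal, lexical)

for rule, n in phrasal.most_common():
    print(show(rule, n))
for rule, n in lexical.most_common():
    print(show(rule, n))
```

Running the same functions over all three trees would accumulate the counts in the two `Counter` objects; printing the lexical rules in the same notation as the others is a choice made here for uniformity and may differ from the format used in the notes.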

