01.04.2015 Views


Chapter 6

Determining the word types

Determining the word types is what part-of-speech (POS) tagging is all about. A POS tagger parses a full sentence with the goal of arranging it into a dependency tree, where each node corresponds to a word and the parent-child relationship determines which word it depends on. With this tree, it can then make more informed decisions; for example, whether the word "book" is a noun ("This is a good book.") or a verb ("Could you please book the flight?").

You might have already guessed that NLTK also plays a role in this area. And indeed, it comes readily packaged with all sorts of parsers and taggers. The POS tagger we will use, nltk.pos_tag(), is actually a full-blown classifier trained on manually annotated sentences from the Penn Treebank Project (http://www.cis.upenn.edu/~treebank). It takes a list of word tokens as input and outputs a list of tuples, each of which contains a token from the original sentence and its part-of-speech tag:

>>> import nltk
>>> nltk.pos_tag(nltk.word_tokenize("This is a good book."))
[('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('book', 'NN'), ('.', '.')]
>>> nltk.pos_tag(nltk.word_tokenize("Could you please book the flight?"))
[('Could', 'MD'), ('you', 'PRP'), ('please', 'VB'), ('book', 'NN'), ('the', 'DT'), ('flight', 'NN'), ('?', '.')]
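Because the tagger returns an ordinary list of (token, tag) tuples, the result can be post-processed with plain Python. The following is a minimal sketch under stated assumptions: it hard-codes the first output shown above instead of calling NLTK, so that it runs without the tagger model installed, and the helper name `group_by_tag` is hypothetical rather than part of NLTK.

```python
from collections import defaultdict

# Output of nltk.pos_tag() copied from the session above, so this sketch
# runs without NLTK or its tagger model being available.
tagged = [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'),
          ('book', 'NN'), ('.', '.')]


def group_by_tag(tagged_tokens):
    """Hypothetical helper: map each POS tag to the tokens carrying it."""
    groups = defaultdict(list)
    for token, tag in tagged_tokens:
        groups[tag].append(token)
    return dict(groups)


groups = group_by_tag(tagged)
print(groups['DT'])  # ['This', 'a']
print(groups['NN'])  # ['book']
```

Grouping by tag like this is one simple way to pull out, say, all nouns or adjectives of a sentence once it has been tagged.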

The POS tag abbreviations are taken from the Penn Treebank Project (adapted from http://americannationalcorpus.org/OANC/penn.html):

POS tag  Description                            Example
CC       coordinating conjunction               or
CD       cardinal number                        2, second
DT       determiner                             the
EX       existential there                      there are
FW       foreign word                           kindergarten
IN       preposition/subordinating conjunction  on, of, like
JJ       adjective                              cool
JJR      adjective, comparative                 cooler
JJS      adjective, superlative                 coolest
LS       list marker                            1)
MD       modal                                  could, will
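The rows above can be turned into a small lookup table to make tagger output easier to read. This is only a sketch: the dictionary `PENN_TAGS` and the helper `describe` are hypothetical names, and the dictionary contains only the tags listed so far, so any other tag (such as VBZ) is reported as unknown.

```python
# Tag descriptions copied from the rows of the table above; tags that are
# not listed there are deliberately left out.
PENN_TAGS = {
    'CC': 'coordinating conjunction',
    'CD': 'cardinal number',
    'DT': 'determiner',
    'EX': 'existential there',
    'FW': 'foreign word',
    'IN': 'preposition/subordinating conjunction',
    'JJ': 'adjective',
    'JJR': 'adjective, comparative',
    'JJS': 'adjective, superlative',
    'LS': 'list marker',
    'MD': 'modal',
}


def describe(tagged_tokens, tag_table=PENN_TAGS):
    """Hypothetical helper: pair each token with a readable tag description.

    Tags missing from the table are reported as 'unknown tag'.
    """
    return [(token, tag_table.get(tag, 'unknown tag'))
            for token, tag in tagged_tokens]


# A few (token, tag) pairs in the format returned by nltk.pos_tag().
tagged = [('This', 'DT'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ')]
for token, description in describe(tagged):
    print(f"{token:5} {description}")
```

If the NLTK "tagsets" data is installed, nltk.help.upenn_tagset() prints the full tag documentation and may be a more complete alternative to a hand-written table.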
