08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho

Clustering – Finding Related Posts

Installing and using NLTK

How to install NLTK on your operating system is described in detail at http://nltk.org/install.html. In short, you need to install two packages: NLTK and PyYAML.

To check whether your installation was successful, open a Python interpreter and type the following:

>>> import nltk

You will find a very nice tutorial for NLTK in the book Python Text Processing with NLTK 2.0 Cookbook. To experiment with a stemmer, you can visit the accompanying web page http://textprocessing.com/demo/stem/.

NLTK comes with several stemmers. This is necessary because every language has its own set of stemming rules. For English, we can use SnowballStemmer.

>>> import nltk.stem
>>> s = nltk.stem.SnowballStemmer('english')
>>> s.stem("graphics")
u'graphic'
>>> s.stem("imaging")
u'imag'
>>> s.stem("image")
u'imag'
>>> s.stem("imagination")
u'imagin'
>>> s.stem("imagine")
u'imagin'

Note that stemming does not necessarily result in valid English words.
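Because every language needs its own stemming rules, it can help to see which languages the Snowball implementation covers. The following is a minimal sketch, assuming a Python 3 installation of NLTK (so results are plain strings without the `u` prefix); the choice of `"german"` as a second language is purely illustrative.

```python
from nltk.stem import SnowballStemmer

# Tuple of language names for which NLTK provides a Snowball stemmer.
supported = SnowballStemmer.languages
print(sorted(supported))

# Build one stemmer per language of interest. The languages picked here
# are an illustrative assumption, not something the text prescribes.
stemmers = {lang: SnowballStemmer(lang) for lang in ("english", "german")}

# The English stemmer behaves as in the interpreter session above.
print(stemmers["english"].stem("graphics"))
```

Passing a name that is not in `SnowballStemmer.languages` raises a `ValueError`, so checking the tuple first is a cheap way to validate user input.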

The English stemmer also works with verbs, as follows:

>>> s.stem("buys")
u'buy'
>>> s.stem("buying")
u'buy'
>>> s.stem("bought")
u'bought'
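To see why stemming is useful for finding related posts, the sketch below groups the example words from the sessions above by their stem. It assumes Python 3 and uses only the words already shown; the `groups` dictionary is a hypothetical helper for illustration, not part of NLTK.

```python
from collections import defaultdict

from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")

# Example words taken from the interpreter sessions above.
words = ["imaging", "image", "imagination", "imagine",
         "buys", "buying", "bought"]

# Map each stem to the list of original words that reduce to it.
groups = defaultdict(list)
for word in words:
    groups[stemmer.stem(word)].append(word)

for stem, members in groups.items():
    print(stem, members)
```

Words that share a stem land in the same group, so they would be counted as the same term. The irregular form "bought" keeps its own group: the stemmer strips suffixes by rule and does not know that "bought" is a form of "buy".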
