08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Chapter 6

Successfully cheating using SentiWordNet

While the linguistic information that we discussed earlier will most likely help
us, there is something better we can do to harvest it: SentiWordNet
(http://sentiwordnet.isti.cnr.it). Simply put, it is a 13 MB file that assigns most
English words a positive and a negative value. More precisely, for every synonym
set (synset), it records both a positive and a negative sentiment value. Some
examples are as follows:

POS  ID        PosScore  NegScore  SynsetTerms            Description
a    00311354  0.25      0.125     studious#1             Marked by care and effort; "made a studious attempt to fix the television set"
a    00311663  0         0.5       careless#1             Marked by lack of attention or consideration or forethought or thoroughness; not careful
n    03563710  0         0         implant#1              A prosthesis placed permanently in tissue
v    00362128  0         0         kink#2 curve#5 curl#1  Form a curl, curve, or kink; "the cigar smoke curled up at the ceiling"

The POS column lets us distinguish between the noun "book" and the verb "book".
PosScore and NegScore together let us determine the neutrality of a word, which
is 1 - PosScore - NegScore. SynsetTerms lists all the words in the set that are
synonyms. The ID and Description columns can be safely ignored for our purposes.
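As a minimal sketch of how one such record could be read, the following assumes that each line of the file is tab-separated with the six columns in the order shown in the table above. The helper names parse_line and neutrality and the dictionary keys are hypothetical and chosen only for illustration.

```python
def parse_line(line):
    """Split one record into its six columns (assumed to be tab-separated)."""
    pos, synset_id, pos_score, neg_score, terms, description = (
        line.rstrip("\n").split("\t")
    )
    return {
        "pos": pos,                      # part of speech: a, n, v, ...
        "id": synset_id,                 # not needed for our purposes
        "pos_score": float(pos_score),
        "neg_score": float(neg_score),
        "terms": terms.split(" "),       # for example ["kink#2", "curve#5", "curl#1"]
        "description": description,      # not needed for our purposes
    }


def neutrality(record):
    """Neutrality as defined in the text: 1 - PosScore - NegScore."""
    return 1.0 - record["pos_score"] - record["neg_score"]


# Sample rows rebuilt from the table above (descriptions shortened).
studious = parse_line("a\t00311354\t0.25\t0.125\tstudious#1\tMarked by care and effort")
careless = parse_line("a\t00311663\t0\t0.5\tcareless#1\tMarked by lack of attention")

print(studious["terms"], neutrality(studious))  # ['studious#1'] 0.625
print(careless["terms"], neutrality(careless))  # ['careless#1'] 0.5
```

A word such as "implant", with both scores at 0, would come out with a neutrality of 1.0 under this definition.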

Each synset term has a number appended because some words occur multiple times,
in different synsets. For example, "fantasize" conveys two quite different
meanings, which also leads to different scores:

POS  ID        PosScore  NegScore  SynsetTerms                        Description
v    01636859  0.375     0         fantasize#2 fantasise#2            Portray in the mind; "he is fantasizing the ideal wife"
v    01637368  0         0.125     fantasy#1 fantasize#1 fantasise#1  Indulge in fantasies; "he is fantasizing when he says that he plans to start his own company"
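Because the same word can appear in several synsets, a lookup keyed by word needs some way of combining the scores. The sketch below strips the appended sense number and averages the scores per (POS, word) pair. Averaging is only one simple option and is an assumption of this sketch, as is the hypothetical helper name average_scores and the tuple layout of its input.

```python
from collections import defaultdict


def average_scores(rows):
    """Average PosScore and NegScore over all synsets a word occurs in.

    rows: iterable of (pos, pos_score, neg_score, synset_terms) tuples,
    where synset_terms is a space-separated string such as "kink#2 curl#1".
    """
    collected = defaultdict(list)
    for pos, pos_score, neg_score, synset_terms in rows:
        for term in synset_terms.split(" "):
            word = term.rsplit("#", 1)[0]  # drop the appended sense number
            collected[(pos, word)].append((pos_score, neg_score))

    averaged = {}
    for key, scores in collected.items():
        n = len(scores)
        averaged[key] = (
            sum(p for p, _ in scores) / n,
            sum(q for _, q in scores) / n,
        )
    return averaged


# The two "fantasize" synsets from the table above.
rows = [
    ("v", 0.375, 0.0, "fantasize#2 fantasise#2"),
    ("v", 0.0, 0.125, "fantasy#1 fantasize#1 fantasise#1"),
]
scores = average_scores(rows)
print(scores[("v", "fantasize")])  # (0.1875, 0.0625)
print(scores[("v", "fantasy")])    # (0.0, 0.125)
```

Keying on the (POS, word) pair keeps the noun and the verb reading of a word such as "book" apart, in line with the role of the POS column described earlier.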
