08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Classification II – Sentiment Analysis

To find out which of the synsets to take, we would have to really understand the meaning of the tweets, which is beyond the scope of this chapter. The field of research that focuses on this challenge is called word sense disambiguation. For our task, we take the easy route and simply average the scores over all the synsets in which a term is found. For "fantasize", PosScore would be 0.1875 and NegScore would be 0.0625.
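
As a minimal sketch of that averaging step, assume "fantasize" occurs in two synsets with the (PosScore, NegScore) pairs shown below. These per-synset values are illustrative assumptions chosen to be consistent with the averages quoted above; they are not figures taken from this page.

```python
# Assumed (PosScore, NegScore) pairs for two synsets that contain "fantasize".
# Illustrative values only, picked to match the averages quoted in the text.
synset_scores = [(0.375, 0.0), (0.0, 0.125)]

# Average each score column over all synsets in which the term is found.
pos_score = sum(pos for pos, _ in synset_scores) / len(synset_scores)
neg_score = sum(neg for _, neg in synset_scores) / len(synset_scores)

print(pos_score, neg_score)  # 0.1875 0.0625
```

Calling np.mean(synset_scores, axis=0) on the same list should give the same two numbers as a NumPy array.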

The following function, load_sent_word_net(), does all that for us, and returns a dictionary where the keys are strings of the form "word type/word", for example "n/implant", and the values are the positive and negative scores:

import collections
import csv
import os

import numpy as np

# DATA_DIR is assumed to be defined elsewhere as the directory that
# holds the SentiWordNet file.

def load_sent_word_net():
    # collect every (PosScore, NegScore) pair seen for a "POS/term" key
    sent_scores = collections.defaultdict(list)
    with open(os.path.join(DATA_DIR,
            "SentiWordNet_3.0.0_20130122.txt"), "r") as csvfile:
        reader = csv.reader(csvfile, delimiter='\t',
                            quotechar='"')
        for line in reader:
            # skip comment lines and lines without the full set of columns
            if line[0].startswith("#"):
                continue
            if len(line) == 1:
                continue
            POS, ID, PosScore, NegScore, SynsetTerms, Gloss = line
            if len(POS) == 0 or len(ID) == 0:
                continue
            for term in SynsetTerms.split(" "):
                # drop number at the end of every term
                term = term.split("#")[0]
                term = term.replace("-", " ").replace("_", " ")
                key = "%s/%s" % (POS, term)
                sent_scores[key].append((float(PosScore),
                                         float(NegScore)))
    # average the scores over all synsets in which a term occurs
    for key, value in sent_scores.items():
        sent_scores[key] = np.mean(value, axis=0)
    return sent_scores
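
To see what the resulting keys and values look like without the real data file, the same parsing steps can be sketched on a few in-memory rows. Everything specific here is an assumption for illustration: the rows in SAMPLE are made up (they are not SentiWordNet data), parse_sent_scores() is a hypothetical helper name, and plain Python averaging stands in for np.mean so the sketch has no dependencies.

```python
import collections
import csv
import io

# Made-up rows in the six-column, tab-separated layout the function unpacks:
# POS, ID, PosScore, NegScore, SynsetTerms, Gloss
SAMPLE = (
    "# made-up sample, not real SentiWordNet data\n"
    "n\t00000001\t0.25\t0\timplant#1 test_term#2\tmade-up gloss\n"
    "n\t00000002\t0.75\t0.5\timplant#2\tmade-up gloss\n"
)


def parse_sent_scores(fileobj):
    """Hypothetical helper: same parsing steps, reading from any file object."""
    collected = collections.defaultdict(list)
    for line in csv.reader(fileobj, delimiter="\t", quotechar='"'):
        # skip blank lines, comment lines and single-column lines
        if not line or line[0].startswith("#") or len(line) == 1:
            continue
        pos, synset_id, pos_score, neg_score, synset_terms, _gloss = line
        if not pos or not synset_id:
            continue
        for term in synset_terms.split(" "):
            # drop the sense number, then turn "-" and "_" into spaces
            term = term.split("#")[0].replace("-", " ").replace("_", " ")
            collected["%s/%s" % (pos, term)].append(
                (float(pos_score), float(neg_score)))
    # average each score column over all synsets a term appeared in
    return {key: tuple(sum(col) / len(col) for col in zip(*pairs))
            for key, pairs in collected.items()}


scores = parse_sent_scores(io.StringIO(SAMPLE))
print(sorted(scores))       # ['n/implant', 'n/test term']
print(scores["n/implant"])  # (0.5, 0.25)
```

The sketch returns a plain dict, so looking up a word that is not in the data raises a KeyError (or returns None with scores.get(...)) instead of silently adding an empty list, as indexing a defaultdict would.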
