
Chapter 6

Our first estimator

Now we have everything in place to create our first vectorizer. The most convenient way to do it is to inherit from BaseEstimator. It requires us to implement the following three methods:

• get_feature_names(): This returns a list of strings of the features that we will return in transform().
• fit(documents, y=None): As we are not implementing a classifier, we can ignore this one and simply return self.
• transform(documents): This returns a numpy.array() of shape (len(documents), len(get_feature_names())). This means that, for every document in documents, it has to return a value for every feature name in get_feature_names().

Let us now implement these methods (a short usage sketch follows the listing):

# numpy, NLTK, and scikit-learn's BaseEstimator are needed by the listing;
# load_sent_word_net() is assumed to be defined earlier in the chapter.
import nltk
import numpy as np
from sklearn.base import BaseEstimator

sent_word_net = load_sent_word_net()

class LinguisticVectorizer(BaseEstimator):
    def get_feature_names(self):
        return np.array(['sent_neut', 'sent_pos', 'sent_neg',
                         'nouns', 'adjectives', 'verbs', 'adverbs',
                         'allcaps', 'exclamation', 'question', 'hashtag',
                         'mentioning'])

    # we don't fit here but need to return the reference
    # so that it can be used like fit(d).transform(d)
    def fit(self, documents, y=None):
        return self

    def _get_sentiments(self, d):
        sent = tuple(d.split())
        # nltk.pos_tag() assigns Penn Treebank tags to each token,
        # e.g. 'NN' for nouns and 'JJ' for adjectives
        tagged = nltk.pos_tag(sent)

        pos_vals = []
        neg_vals = []

        nouns = 0.
        adjectives = 0.
        verbs = 0.
        adverbs = 0.

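As a quick illustrative sketch (not part of the original listing; the sample documents are made up and transform() is assumed to be completed as described above), the finished vectorizer plugs into the usual estimator workflow:

docs = ["This is awesome!!!", "@someone why so sad? #mondays"]
vect = LinguisticVectorizer()
X = vect.fit(docs).transform(docs)
# one row per document, one column per entry in get_feature_names()
print(X.shape)   # expected: (2, 12)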