Building Machine Learning Systems with Python - Richert, Coelho


Classification II – Sentiment Analysis

The only missing piece is to tell GridSearchCV how to determine the best estimator. This is done by passing the desired score function to (surprise!) the score_func parameter; later scikit-learn releases replaced this parameter with scoring. We could either write a score function ourselves or pick one from the sklearn.metrics package. We should certainly not use accuracy (metrics.accuracy_score), because our classes are imbalanced: we have far fewer tweets containing sentiment than neutral ones. Instead, we want good precision and recall on both classes: the tweets with sentiment and the tweets without positive or negative opinions. One metric that combines precision and recall is the F-measure, their harmonic mean, F = 2 * precision * recall / (precision + recall), which is implemented as metrics.f1_score.
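To make the argument against accuracy concrete, here is a minimal, dependency-free sketch. The labels are made up for illustration, and the helper names precision_recall_f1 and accuracy are hypothetical, not part of scikit-learn. It compares a classifier that always predicts "neutral" with one that finds half of the sentiment tweets: both reach the same accuracy, but only the F-measure tells them apart.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, imbalanced labels: 1 = tweet with sentiment, 0 = neutral tweet.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_neutral = [0] * 10                 # never predicts sentiment
mixed = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]    # one hit, one miss, one false alarm

for name, y_pred in [("always_neutral", always_neutral), ("mixed", mixed)]:
    print(name, accuracy(y_true, y_pred), precision_recall_f1(y_true, y_pred))
```

Both predictions get 8 of 10 labels right, yet the always-neutral one has an F-measure of zero on the sentiment class, which is why accuracy would be a poor selection criterion here.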

Putting everything together, we get the following code:

# Written against the scikit-learn API of the book's era. Later releases
# moved GridSearchCV and ShuffleSplit to sklearn.model_selection and
# replaced score_func with the scoring parameter.
from sklearn.cross_validation import ShuffleSplit
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import f1_score

def grid_search_model(clf_factory, X, Y):
    # 10 random train/test splits, each holding out 30% of the data
    cv = ShuffleSplit(
        n=len(X), n_iter=10, test_size=0.3, indices=True, random_state=0)

    # Keys follow the <step>__<parameter> convention: vect__* settings go to
    # the pipeline step named vect, clf__* settings to the step named clf.
    param_grid = dict(vect__ngram_range=[(1, 1), (1, 2), (1, 3)],
                      vect__min_df=[1, 2],
                      vect__stop_words=[None, "english"],
                      vect__smooth_idf=[False, True],
                      vect__use_idf=[False, True],
                      vect__sublinear_tf=[False, True],
                      vect__binary=[False, True],
                      clf__alpha=[0, 0.01, 0.05, 0.1, 0.5, 1],
                      )

    # Rank every parameter combination by its F-measure
    grid_search = GridSearchCV(clf_factory(),
                               param_grid=param_grid,
                               cv=cv,
                               score_func=f1_score,
                               verbose=10)
    grid_search.fit(X, Y)

    return grid_search.best_estimator_
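Before running such a search, it helps to know how much work it requests. The following sketch uses only the standard library to expand the same parameter grid into its individual combinations. The helper iter_grid is a hypothetical illustration, not scikit-learn's own implementation, and the count ignores any final refit on the full data.

```python
from itertools import product

# Same grid as in grid_search_model above.
param_grid = dict(
    vect__ngram_range=[(1, 1), (1, 2), (1, 3)],
    vect__min_df=[1, 2],
    vect__stop_words=[None, "english"],
    vect__smooth_idf=[False, True],
    vect__use_idf=[False, True],
    vect__sublinear_tf=[False, True],
    vect__binary=[False, True],
    clf__alpha=[0, 0.01, 0.05, 0.1, 0.5, 1],
)


def iter_grid(grid):
    """Yield one dict per combination of the listed parameter values."""
    keys = sorted(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


n_combinations = sum(1 for _ in iter_grid(param_grid))
n_splits = 10  # matches the ShuffleSplit setting used above
n_fits = n_combinations * n_splits

print(n_combinations, "parameter combinations,", n_fits, "model fits")
```

The grid contains 3 * 2 * 2 * 2 * 2 * 2 * 2 * 6 = 1,152 combinations, each trained on 10 splits, so trimming even one parameter list shortens the search noticeably.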
