Building Machine Learning Systems with Python - Richert, Coelho

Chapter 6

To keep our experimentation agile, let us wrap everything together in a train_model() function, which takes as a parameter a function that creates the classifier:

import numpy as np
from sklearn.metrics import precision_recall_curve, auc
from sklearn.cross_validation import ShuffleSplit

def train_model(clf_factory, X, Y):
    # setting random_state to get deterministic behavior
    cv = ShuffleSplit(n=len(X), n_iter=10, test_size=0.3,
                      indices=True, random_state=0)

    scores = []
    pr_scores = []

    for train, test in cv:
        X_train, y_train = X[train], Y[train]
        X_test, y_test = X[test], Y[test]

        # build a fresh classifier for every split
        clf = clf_factory()
        clf.fit(X_train, y_train)

        train_score = clf.score(X_train, y_train)
        test_score = clf.score(X_test, y_test)
        scores.append(test_score)

        # probability of the positive class, used for the P/R curve
        proba = clf.predict_proba(X_test)
        precision, recall, pr_thresholds = precision_recall_curve(
            y_test, proba[:, 1])
        pr_scores.append(auc(recall, precision))

    # mean and std of accuracy, then mean and std of the P/R AUC
    summary = (np.mean(scores), np.std(scores),
               np.mean(pr_scores), np.std(pr_scores))
    print("%.3f\t%.3f\t%.3f\t%.3f" % summary)
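The listing above targets the sklearn.cross_validation module, which was deprecated in scikit-learn 0.18 and removed in 0.20. For readers on a newer release, the following is a minimal sketch of the same loop using sklearn.model_selection. The name train_model_modern, the n_splits argument, the returned tuple, and the toy data are assumptions made for illustration; they are not part of the original text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import ShuffleSplit


def train_model_modern(clf_factory, X, Y, n_splits=10):
    """Sketch of train_model() for the sklearn.model_selection API."""
    # random_state is fixed for deterministic splits, as in the original
    cv = ShuffleSplit(n_splits=n_splits, test_size=0.3, random_state=0)

    scores, pr_scores = [], []
    # the splitter is no longer iterable itself; split() yields index arrays
    for train, test in cv.split(X):
        clf = clf_factory()
        clf.fit(X[train], Y[train])
        scores.append(clf.score(X[test], Y[test]))

        # column 1 holds the probability of the positive class
        proba = clf.predict_proba(X[test])
        precision, recall, _ = precision_recall_curve(Y[test], proba[:, 1])
        pr_scores.append(auc(recall, precision))

    # mean and std of accuracy, then mean and std of the P/R AUC
    summary = (np.mean(scores), np.std(scores),
               np.mean(pr_scores), np.std(pr_scores))
    print("%.3f\t%.3f\t%.3f\t%.3f" % summary)
    return summary


# Hypothetical toy data, standing in for the tweet features and labels
rng = np.random.RandomState(0)
X_toy = rng.randn(200, 2)
Y_toy = (X_toy[:, 0] + 0.5 * rng.randn(200) > 0).astype(int)
toy_summary = train_model_modern(LogisticRegression, X_toy, Y_toy)
```

Returning the summary in addition to printing it is a design choice of this sketch; it makes the four numbers available for later comparison between classifiers.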

>>> X, Y = load_sanders_data()
>>> pos_neg_idx = np.logical_or(Y=="positive", Y=="negative")
>>> X = X[pos_neg_idx]
>>> Y = Y[pos_neg_idx]
>>> Y = Y=="positive"
>>> train_model(create_ngram_model, X, Y)
0.805 0.024 0.878 0.016
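The label filtering in the session above relies on NumPy boolean masks. The sketch below walks through the same three steps on a tiny, made-up array so that the intermediate results are easy to inspect. The example tweets and labels are hypothetical and are not taken from the Sanders corpus.

```python
import numpy as np

# Hypothetical toy data standing in for the result of load_sanders_data()
Y = np.array(["positive", "neutral", "negative", "irrelevant", "positive"])
X = np.array(["great phone", "it is a phone", "awful battery",
              "buy followers", "love it"])

# True wherever the label is one of the two sentiment classes
pos_neg_idx = np.logical_or(Y == "positive", Y == "negative")
X = X[pos_neg_idx]
Y = Y[pos_neg_idx]

# Collapse the string labels to booleans: True means "positive"
Y = Y == "positive"

print(X.tolist())  # ['great phone', 'awful battery', 'love it']
print(Y.tolist())  # [True, False, True]
```

After this step the task is a plain binary classification problem, which is what the precision/recall computation on proba[:, 1] in train_model() expects.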
