
Chapter 5

Setting the threshold at 0.63, we see that we can still achieve a precision above 80 percent in detecting good answers if we accept a low recall of 37 percent. This means that we will detect only about one in three good answers, but we can be reasonably sure of those answers that we do detect.
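The text assumes that `thresh80` and `idx80` were computed earlier from a precision-recall curve. As a hedged sketch of how such a threshold could be derived with scikit-learn's `precision_recall_curve` (the dataset, model, and variable names here are illustrative stand-ins, not the book's actual data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Illustrative synthetic data standing in for the answer features
X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# Probabilities for the positive ("good answer") class
probs = clf.predict_proba(X_test)[:, 1]
precision, recall, threshold = precision_recall_curve(y_test, probs)

# Mask of operating points with at least 80 percent precision;
# precision has one more entry than threshold, so drop its last element
idx80 = precision >= 0.8
thresh80 = threshold[idx80[:-1]][0]
print(f"first threshold reaching >=80% precision: {thresh80:.3f}")
```

Note the off-by-one: `precision_recall_curve` returns one more precision/recall value than thresholds, which is why the mask is truncated before indexing.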

To apply this threshold in the prediction process, we have to use predict_proba(), which returns per-class probabilities, instead of predict(), which returns the class itself:

>>> thresh80 = threshold[idx80][0]
>>> probs_for_good = clf.predict_proba(answer_features)[:,1]
>>> answer_class = probs_for_good > thresh80

We can confirm that we are in the desired precision/recall range using classification_report:

>>> from sklearn.metrics import classification_report
>>> print(classification_report(y_test, clf.predict_proba(X_test)[:,1] > 0.63,
...     target_names=['not accepted', 'accepted']))

              precision    recall  f1-score   support

not accepted       0.63      0.93      0.75       108
    accepted       0.80      0.36      0.50        92

 avg / total       0.71      0.67      0.63       200
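To make the effect of such a custom cutoff concrete, here is a small self-contained sketch (synthetic data and illustrative names, not the book's dataset) comparing the implicit 0.5 cutoff of predict() with a stricter one: raising the cutoff can only shrink the set of predicted positives, so recall never increases, while precision typically improves.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Illustrative synthetic data with some label noise
X, y = make_classification(n_samples=400, flip_y=0.1, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
clf = LogisticRegression().fit(X_train, y_train)

default_pred = clf.predict(X_test)                    # implicit 0.5 cutoff
strict_pred = clf.predict_proba(X_test)[:, 1] > 0.8   # stricter cutoff

for name, pred in [("default", default_pred), ("strict", strict_pred)]:
    print("%s: precision=%.2f recall=%.2f"
          % (name, precision_score(y_test, pred), recall_score(y_test, pred)))
```

Every sample above the stricter cutoff is also above 0.5, so the strict predictions are a subset of the default ones; that is the precision/recall trade-off the chapter is exploiting.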

Note that using the threshold will not guarantee that we always stay above the precision and recall values that we determined previously along with it; on new data, the same threshold may perform differently.

