
If instead our goal had been to detect as many good or bad answers as possible, we would be more interested in recall:

recall = TP / (TP + FN)

Chapter 5

The next screenshot shows all the good answers together with the answers that have been classified as good:

In terms of the previous diagram, precision is the intersection as a fraction of the right circle, while recall is the intersection as a fraction of the left circle.
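With made-up labels and predictions (a sketch, not the chapter's data), both measures can be computed directly from the TP, FP, and FN counts:

```python
import numpy as np

# Toy ground truth and predictions (synthetic, for illustration only)
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0])

tp = np.sum((y_true == 1) & (y_pred == 1))  # good answers correctly found
fp = np.sum((y_true == 0) & (y_pred == 1))  # bad answers flagged as good
fn = np.sum((y_true == 1) & (y_pred == 0))  # good answers missed

precision = tp / (tp + fp)  # share of predicted-good answers that are truly good
recall = tp / (tp + fn)     # share of truly good answers that we found
```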

So, how can we optimize for precision? Up to now, we have always used 0.5 as the threshold for deciding whether an answer is good. What we can do now is count the number of TP, FP, and FN instances while varying that threshold between 0 and 1. With these counts, we can then plot precision over recall.
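As a sketch of what such a threshold sweep does under the hood (with made-up scores and labels, and the common convention that precision is 1 when nothing is predicted as good):

```python
import numpy as np

# Synthetic probability scores and true labels (illustration only)
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.7, 0.9, 0.5])

precisions, recalls = [], []
for threshold in np.linspace(0, 1, 11):
    y_pred = scores >= threshold           # classify as "good" above the threshold
    tp = np.sum((y_true == 1) & y_pred)
    fp = np.sum((y_true == 0) & y_pred)
    fn = np.sum((y_true == 1) & ~y_pred)
    # Convention: precision is 1.0 when nothing is predicted as good
    precisions.append(tp / (tp + fp) if tp + fp else 1.0)
    recalls.append(tp / (tp + fn))
```

At threshold 0 everything is labeled good, so recall is perfect but precision is low; as the threshold rises, precision improves at the cost of recall.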

The handy function precision_recall_curve() from the metrics module does all the calculations for us, as shown in the following code:

>>> from sklearn.metrics import precision_recall_curve
>>> precision, recall, thresholds = precision_recall_curve(y_test,
...     clf.predict_proba(X_test)[:, 1])

Note that we pass the predicted probability of the positive class rather than the hard 0/1 predictions, so that the threshold can actually vary.
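The call above relies on clf, X_test, and y_test from earlier in the chapter. A self-contained sketch of the same call, using a stand-in dataset and classifier rather than the chapter's own, might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

# Stand-in data and classifier; in the chapter, clf and the train/test
# splits come from the answer-classification task itself
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Pass probability scores for the positive class, not hard labels,
# so that precision_recall_curve() can sweep the threshold
scores = clf.predict_proba(X_test)[:, 1]
precision, recall, thresholds = precision_recall_curve(y_test, scores)

# To draw precision over recall with matplotlib: plt.plot(recall, precision)
```

precision and recall each have one more entry than thresholds; by construction the curve ends at precision 1 and recall 0.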

