08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Chapter 2

The last few lines select the best model. First, we compare the predictions, pred, with the actual labels, virginica. Computing the mean of these comparisons gives us the fraction of correct results, that is, the accuracy. At the end of the for loop, every possible threshold for every feature has been tested, and the best_fi and best_t variables hold our model. To apply it to a new example, we perform the following:

if example[best_fi] > best_t:
    print('virginica')
else:
    print('versicolor')
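The search described above can be sketched end to end as follows. This is a minimal illustration rather than the book's own listing: it uses plain Python lists instead of NumPy arrays so that it runs with the standard library alone, the function names fit_threshold and classify are hypothetical, and the six data rows are made-up toy values, not real Iris measurements.

```python
# Toy data: made-up values, NOT real Iris measurements.
# Each row is one example; each column is one feature.
features = [
    [4.7, 1.4],
    [4.5, 1.5],
    [4.9, 1.5],
    [5.1, 1.9],
    [5.9, 2.1],
    [5.6, 1.8],
]
# True where the example is Iris Virginica, False for Iris Versicolor.
virginica = [False, False, False, True, True, True]


def fit_threshold(features, labels):
    """Try every observed feature value as a threshold; keep the most accurate."""
    best_acc = -1.0
    best_fi = None
    best_t = None
    for fi in range(len(features[0])):
        column = [row[fi] for row in features]
        for t in column:
            # Predict "virginica" whenever the feature exceeds the threshold.
            pred = [value > t for value in column]
            # The mean of the comparisons is the fraction of correct results.
            acc = sum(p == y for p, y in zip(pred, labels)) / float(len(labels))
            # Strict ">" keeps the first of several equally good thresholds.
            if acc > best_acc:
                best_acc, best_fi, best_t = acc, fi, t
    return best_fi, best_t, best_acc


def classify(example, best_fi, best_t):
    """Apply the learned single-feature threshold to one new example."""
    return 'virginica' if example[best_fi] > best_t else 'versicolor'


best_fi, best_t, best_acc = fit_threshold(features, virginica)
print(classify([5.5, 2.0], best_fi, best_t))
```

With NumPy arrays, the accuracy line would typically collapse to the mean of an elementwise comparison, which is the trick the text refers to.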

What does this model look like? If we run it on the whole dataset, the best model we get splits on the petal length. We can visualize the decision boundary. In the following screenshot, we see two regions: one is white and the other is shaded in grey. Anything that falls in the white region will be classified as Iris Virginica, and anything that falls on the shaded side will be classified as Iris Versicolor:

In a threshold model, the decision boundary will always be a line parallel to one of the axes. The plot in the preceding screenshot shows the decision boundary and the two regions, white and grey, into which the points are classified. It also shows (as a dashed line) an alternative threshold that achieves exactly the same accuracy. Our method chose the first threshold, but that was an arbitrary choice.
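To see how two thresholds can tie, the following sketch scores every candidate threshold on a single feature and lists all of those that reach the best accuracy. It is an illustration under stated assumptions: the accuracy helper is hypothetical, and the petal_length values and labels are made-up toy numbers chosen so that a tie occurs, not real Iris data.

```python
def accuracy(column, labels, t):
    """Fraction of examples classified correctly by the rule value > t."""
    pred = [value > t for value in column]
    return sum(p == y for p, y in zip(pred, labels)) / float(len(labels))


# Toy single-feature data: made-up values, NOT real Iris measurements.
petal_length = [4.5, 4.7, 4.9, 5.1, 5.6, 5.9]
virginica = [False, False, True, False, True, True]

# Score each distinct observed value as a candidate threshold.
candidates = sorted(set(petal_length))
scores = [(t, accuracy(petal_length, virginica, t)) for t in candidates]

best_acc = max(acc for _, acc in scores)
# Every threshold that reaches the best accuracy, in increasing order.
tied = [t for t, acc in scores if acc == best_acc]

print(tied)
```

On this toy data, two thresholds misclassify exactly one example each, so they tie. A search that only updates on a strict improvement reports whichever it meets first, which is why the choice between them is arbitrary. Any value between a chosen threshold and the next larger observed value also gives identical predictions on the training data.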
