08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


In this case, our only options are to get more features, make the model more complex, or change the model.

Chapter 5

Fixing high variance

If, on the contrary, we suffer from high variance, our model is too complex for the data. In this case, we can only try to get more data or decrease the complexity. For our nearest neighbor classifier, this means increasing k so that more neighbors are taken into account, or removing some of the features.
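The effect of k on model complexity can be sketched with a small experiment. The following is a minimal illustration, not the book's own code: it assumes a hand-written nearest neighbor classifier and synthetic two-class data with noisy labels. The helper names `make_data`, `knn_predict`, and `error_rate` are hypothetical.

```python
import random
from collections import Counter


def make_data(n, rng, noise=0.2):
    """Toy 2-class data (an assumption for this sketch): the label is 1 when
    x0 + x1 > 0, and is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = int(x[0] + x[1] > 0)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train, data, k):
    """Fraction of points in `data` that kNN (using `train`) misclassifies."""
    wrong = sum(knn_predict(train, x, k) != y for x, y in data)
    return wrong / len(data)


rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)

# Map each k to its (train error, test error) pair.
results = {}
for k in (1, 5, 15, 45):
    results[k] = (error_rate(train, train, k), error_rate(train, test, k))
    print(f"k={k:2d}  train error={results[k][0]:.3f}  test error={results[k][1]:.3f}")
```

With k=1, every training point is its own nearest neighbor, so the train error is zero and tells us nothing about how well the model generalizes. As k grows, the model becomes smoother, and one would typically expect the gap between train and test error to narrow, although the exact numbers depend on the data and the random seed.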

High bias or low bias

To find out what our problem actually is, we simply have to plot the train and test errors over the dataset size.

High bias is typically revealed by the test error decreasing a bit at the beginning but then settling at a very high value, with the train error approaching it as the dataset size grows. High variance is recognized by a big gap between both curves.
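The numbers behind such a plot can be computed as in the following sketch. It is an illustration under stated assumptions rather than the book's code: the data is synthetic, the 5-nearest-neighbor classifier is hand-written, and the names `make_data`, `knn_error`, and `curve` are hypothetical. The resulting triples could be passed to any plotting library.

```python
import random
from collections import Counter


def make_data(n, rng, noise=0.2):
    """Toy 2-class data (an assumption for this sketch): the label is 1 when
    x0 + x1 > 0, and is flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = int(x[0] + x[1] > 0)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data


def knn_error(train, data, k=5):
    """Fraction of `data` misclassified by a k-nearest-neighbor majority vote."""
    wrong = 0
    for x, y in data:
        nearest = sorted(
            train,
            key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
        )[:k]
        vote = Counter(label for _, label in nearest).most_common(1)[0][0]
        wrong += vote != y
    return wrong / len(data)


rng = random.Random(42)
pool, test = make_data(400, rng), make_data(200, rng)

# One (train size, train error, test error) triple per dataset size.
curve = []
for n in (25, 50, 100, 200, 400):
    subset = pool[:n]
    curve.append((n, knn_error(subset, subset), knn_error(subset, test)))

for n, train_err, test_err in curve:
    gap = test_err - train_err
    print(f"n={n:3d}  train={train_err:.3f}  test={test_err:.3f}  gap={gap:+.3f}")
```

Reading the output follows the rules above: two errors that end up close together at a high value point to high bias, while a gap that stays large as n grows points to high variance.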

Plotting the errors for different dataset sizes for 5NN shows a big gap between the train and test error, hinting at a high variance problem. Refer to the following graph:
