08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho



Getting Started with Python Machine Learning

The models of degree 10 and 100 don't seem to expect a bright future for our startup. They fit the given data so closely that they are clearly useless for extrapolating further. This is called overfitting. On the other hand, the lower-degree models do not seem capable of capturing the data properly. This is called underfitting.

So let us be fair to the models of degree 2 and above and see how they behave if we fit them only to the data of the last week. After all, we believe that the last week says more about the future than the data before it. The result can be seen in the following psychedelic chart, which shows even more clearly how bad the problem of overfitting is:

Still, judging from the errors of the models when trained only on the data from week 3.5 and after, we should choose the most complex one:

Error d=1: 22143941.107618
Error d=2: 19768846.989176
Error d=3: 19766452.361027
Error d=10: 18949339.348539
Error d=100: 16915159.603877
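Error values like these are measured on the same data the models were fitted to, which is why the most complex model tends to look best. The following is a minimal sketch of such a calculation, not the book's own code: the hourly data is synthetic (an assumption standing in for the real traffic data), the helper `error` and the variable `inflection` are illustrative names, and `numpy.polynomial.Polynomial.fit` is used as one well-conditioned way to fit a polynomial of a given degree.

```python
import numpy as np
from numpy.polynomial import Polynomial


def error(model, x, y):
    """Sum of squared residuals of `model` on the points (x, y)."""
    return float(np.sum((model(x) - y) ** 2))


# Synthetic stand-in for four weeks of hourly hit counts (assumption,
# not the data behind the error values quoted in the text).
rng = np.random.default_rng(0)
x = np.arange(4 * 7 * 24, dtype=float)
y = 1000 + 0.5 * x + 0.004 * x**2 + rng.normal(0, 150, x.size)

# Keep only the data from week 3.5 onwards, expressed in hours.
inflection = int(3.5 * 7 * 24)
xb, yb = x[inflection:], y[inflection:]

# Fit one polynomial per degree and measure it on its own training data.
train_errors = {}
for degree in (1, 2, 3, 10):
    model = Polynomial.fit(xb, yb, degree)
    train_errors[degree] = error(model, xb, yb)
    print(f"Error d={degree}: {train_errors[degree]:f}")
```

Because each higher-degree polynomial can reproduce any lower-degree one, the training error in this sketch cannot grow as the degree increases, so it says little about how well a model extrapolates.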

Training and testing

If only we had some data from the future to measure our models against, we could judge our model choice on the resulting approximation error alone.
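One common way to approximate such future data is to hold back part of the existing data, fit on the rest, and measure the error only on the held-out part. The sketch below illustrates the idea under stated assumptions: the data is synthetic, the 30% hold-out fraction is an arbitrary choice, and the names `error`, `train_idx`, and `test_idx` are illustrative rather than taken from the text.

```python
import numpy as np
from numpy.polynomial import Polynomial


def error(model, x, y):
    """Sum of squared residuals of `model` on the points (x, y)."""
    return float(np.sum((model(x) - y) ** 2))


# Synthetic stand-in for the most recent hourly data (assumption).
rng = np.random.default_rng(1)
x = np.arange(84, dtype=float)
y = 2000 + 30 * x + 0.5 * x**2 + rng.normal(0, 200, x.size)

# Hold out a random 30% of the points as test data (fraction is an assumption).
shuffled = rng.permutation(x.size)
n_test = int(0.3 * x.size)
test_idx = np.sort(shuffled[:n_test])
train_idx = np.sort(shuffled[n_test:])

# Fit on the training part only, then measure on the unseen test part.
domain = [x.min(), x.max()]  # shared scaling so test points stay inside it
test_errors = {}
for degree in (1, 2, 3, 10):
    model = Polynomial.fit(x[train_idx], y[train_idx], degree, domain=domain)
    test_errors[degree] = error(model, x[test_idx], y[test_idx])
    print(f"Test error d={degree}: {test_errors[degree]:f}")
```

Unlike the training error, the test error is not guaranteed to shrink as the degree grows, which makes it a more honest basis for choosing between models.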

