Building Machine Learning Systems with Python - Richert, Coelho

Chapter 1

We are missing only 8 out of 743 entries, so we can afford to remove them. Remember that we can index a SciPy array with another array. sp.isnan(y) returns an array of Booleans indicating whether each entry is not a number. Using ~, we logically negate that array so that we select only those elements of x and y where y contains a valid number.

# Filter x first, while y still holds the NaN entries the mask is built from
x = x[~sp.isnan(y)]
y = y[~sp.isnan(y)]
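To make the masking step concrete, here is a minimal sketch on made-up numbers. The six values below are hypothetical and are not taken from the web traffic data, and the sketch calls NumPy's isnan directly, on the assumption that sp.isnan in the text is SciPy's re-export of the same function.

```python
import numpy as np

# Hypothetical toy data: six hourly slots, two of them with missing hit counts.
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2272.0, np.nan, 1386.0, 1365.0, np.nan, 1488.0])

# Count the invalid entries before deciding whether dropping them is acceptable.
n_missing = int(np.isnan(y).sum())

# Boolean mask: True wherever y holds a valid number.
valid = ~np.isnan(y)

# Apply the same mask to both arrays so that they stay aligned.
x_clean = x[valid]
y_clean = y[valid]

print(n_missing)
print(x_clean)
print(y_clean)
```

Computing the mask once and reusing it for both arrays avoids any dependence on the order in which x and y are filtered.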

To get a first impression of our data, let us plot it in a scatter plot using Matplotlib. Matplotlib contains the pyplot package, which tries to mimic Matlab's interface, a very convenient and easy-to-use one (you will find more tutorials on plotting at http://matplotlib.org/users/pyplot_tutorial.html).

import matplotlib.pyplot as plt
# Plot every (hour, hits) pair as a single point
plt.scatter(x,y)
plt.title("Web traffic over the last month")
plt.xlabel("Time")
plt.ylabel("Hits/hour")
# Place one tick per week: x is measured in hours, so a week spans 7*24 units
plt.xticks([w*7*24 for w in range(10)],
           ['week %i'%w for w in range(10)])
# Fit the axis limits tightly around the data
plt.autoscale(tight=True)
plt.grid()
plt.show()
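The two list comprehensions passed to plt.xticks can be easier to read when pulled out and inspected on their own. The following is only a sketch: the helper name weekly_ticks and the constant HOURS_PER_WEEK are hypothetical, and it assumes, as the listing above does, that x is measured in hours.

```python
HOURS_PER_WEEK = 7 * 24  # x is assumed to be in hours


def weekly_ticks(n_weeks):
    """Return tick positions (in hours) and labels for the first n_weeks weeks."""
    positions = [w * HOURS_PER_WEEK for w in range(n_weeks)]
    labels = ['week %i' % w for w in range(n_weeks)]
    return positions, labels


positions, labels = weekly_ticks(4)
print(positions)  # [0, 168, 336, 504]
print(labels)     # ['week 0', 'week 1', 'week 2', 'week 3']
```

The result could then be passed on as plt.xticks(positions, labels), which is equivalent to the inline version in the listing.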

In the resulting chart, we can see that while the traffic stayed more or less the same in the first weeks, the last week shows a steep increase.
