08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Regression – Recommendations Improved

Using the binary matrix of recommendations

One of the interesting conclusions from the Netflix Challenge was an obvious-in-hindsight idea: we can learn a lot about you just from knowing which movies you rated, even without looking at which rating was given! Even with a binary matrix that holds a 1 where a user rated a movie and a 0 where they did not, we can make useful predictions. In hindsight, this makes perfect sense: we do not choose movies to watch completely at random, but instead pick those we already expect to like. We also do not choose at random which movies to rate, but perhaps rate only those we feel most strongly about (naturally, there are exceptions, but on average this is probably true).
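The binary matrix described above can be sketched in a few lines. This is a minimal illustration, assuming a small dense NumPy array of made-up ratings in which 0 stands for "not rated"; the names `ratings` and `binary` are hypothetical and the data is not taken from the Netflix or any other real dataset.

```python
import numpy as np

# Toy ratings (hypothetical data): rows are users, columns are movies.
# 0 means "not rated"; 1 to 5 is the rating that was given.
ratings = np.array([
    [5, 0, 0, 3],
    [0, 0, 4, 0],
    [1, 2, 0, 0],
])

# 1 where the user rated the movie, 0 where they did not.
# The rating value itself is deliberately thrown away.
binary = (ratings > 0).astype(int)
print(binary)
```

Note that a rating of 1 (a strong dislike) and a rating of 5 both become 1 here: the binary matrix records only that the user chose to rate the movie.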

We can visualize the values of the matrix as an image in which each rating is depicted as a little square. Black represents the absence of a rating, and the grey levels represent the rating value. The matrix is sparse: most of the squares are black. We can also see that some users rate many more movies than others and that some movies receive many more ratings than others.
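The same observations can be checked numerically as well as visually. The following is a rough sketch on made-up data, assuming a dense NumPy array where 0 means "not rated"; the variable names are illustrative only, and a real ratings matrix would normally be kept in a sparse format instead.

```python
import numpy as np

# Toy ratings (hypothetical data): rows are users, columns are movies.
ratings = np.array([
    [5, 0, 0, 3, 4],
    [0, 0, 4, 0, 0],
    [1, 2, 0, 0, 5],
    [0, 0, 0, 0, 2],
])

# Fraction of (user, movie) cells that actually hold a rating.
density = np.count_nonzero(ratings) / ratings.size

# How many movies each user rated, and how many ratings each movie received.
ratings_per_user = np.count_nonzero(ratings, axis=1)
ratings_per_movie = np.count_nonzero(ratings, axis=0)

print("density:", density)
print("ratings per user:", ratings_per_user)
print("ratings per movie:", ratings_per_movie)
```

A low density corresponds to the mostly black image, while uneven per-user and per-movie counts correspond to the brighter rows and columns.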

The code to visualize the data is very simple (you can adapt it to show a larger fraction of the matrix than can be shown in this book):

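from matplotlib import pyplot as plt

# reviews is the sparse user-by-movie ratings matrix loaded earlier in the chapter.
# Only the top-left 200x200 corner is converted to a dense array for plotting.
imagedata = reviews[:200, :200].toarray()
# cmap='gray' draws missing ratings (0) as black and higher ratings as lighter grey.
plt.imshow(imagedata, interpolation='nearest', cmap='gray')
plt.show()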

The following screenshot is the output of this code:
