08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Regression – Recommendations Improved<br />

Now we can use this in several ways. A simple one is to select the nearest neighbors<br />

of each user. These are the users that most resemble it. We will use the measured<br />

correlation discussed earlier:<br />

def estimate(user, rest):<br />

'''<br />

estimate movie ratings for 'user' based on the 'rest' of the<br />

universe.<br />

'''<br />

# binary version of user ratings<br />

bu = user > 0<br />

# binary version of rest ratings<br />

br = rest > 0<br />

ws = all_correlations(bu,br)<br />

# select 100 highest values<br />

selected = ws.argsort()[-100:]<br />

# estimate based on the mean:<br />

estimates = rest[selected].mean(0)<br />

# We need to correct estimates<br />

# based on the fact that some movies have more ratings than others:<br />

estimates /= (.1+br[selected].mean(0))<br />

When compared to the estimate obtained over all the users in the dataset, this<br />

reduces the RMSE by 20 percent. As usual, when we look only at those users that<br />

have more predictions, we do better: there is a 25 percent error reduction if the user<br />

is in the top half of the rating activity.<br />

Looking at the movie neighbors<br />

In the previous section, we looked at the users that were most similar. We can also<br />

look at which movies are most similar. We will now build recommendations based on<br />

a nearest neighbor rule for movies: when predicting the rating of a movie M for a user<br />

U, the system will predict that U will rate M <strong>with</strong> the same points it gave to the movie<br />

most similar to M.<br />

Therefore, we proceed <strong>with</strong> two steps: first, we compute a similarity matrix (a matrix<br />

that tells us which movies are most similar); second, we compute an estimate for<br />

each (user-movie) pair.<br />

We use the NumPy zeros and ones functions to allocate arrays (initialized to zeros<br />

and ones respectively):<br />

movie_likeness = np.zeros((nmovies,nmovies))<br />

allms = np.ones(nmovies, bool)<br />

cs = np.zeros(nmovies)<br />

[ 168 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!