08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


import numpy as np

# `es`, `reviews`, and `reg` are assumed to be defined by the earlier code
coefficients = []
# We are now going to run a leave-1-out cross-validation loop
for u in range(reviews.shape[0]):  # for all user ids
    es0 = np.delete(es, u, 1)  # all but user u
    r0 = np.delete(reviews, u, 0)
    P0, P1 = np.where(r0 > 0)  # we only care about actual predictions
    X = es0[:, P0, P1]  # use es0 so the rows line up with r0 and exclude user u
    y = r0[r0 > 0]
    reg.fit(X.T, y)
    coefficients.append(reg.coef_)
    prediction = reg.predict(es[:, u, reviews[u] > 0].T)
    # measure error as before
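The loop above can be tried end to end on synthetic data. What follows is a minimal sketch under stated assumptions, not the chapter's own code: the ratings and the three base predictors are randomly generated stand-ins, ordinary least squares via `np.linalg.lstsq` is swapped in for the `reg` object (whose type is not shown in this section), and the function name `stacked_loo_rmse` is hypothetical.

```python
import numpy as np

def stacked_loo_rmse(es, reviews):
    """Leave-one-user-out stacking (hypothetical helper).

    es      : array (n_predictors, n_users, n_movies) of base estimates
    reviews : array (n_users, n_movies); 0 means "not rated"
    Returns (rmse, mean_coefficients).
    """
    coefficients = []
    squared_errors = []
    for u in range(reviews.shape[0]):
        rated = reviews[u] > 0
        if not rated.any():
            continue  # nothing to evaluate for this user
        es0 = np.delete(es, u, axis=1)      # estimates without user u
        r0 = np.delete(reviews, u, axis=0)  # ratings without user u
        P0, P1 = np.where(r0 > 0)
        X = es0[:, P0, P1].T                # (n_ratings, n_predictors)
        y = r0[P0, P1]
        # Append an intercept column and solve ordinary least squares
        A = np.column_stack([X, np.ones(len(y))])
        w = np.linalg.lstsq(A, y, rcond=None)[0]
        coefficients.append(w[:-1])
        # Predict the held-out user's known ratings and record the errors
        prediction = es[:, u, rated].T.dot(w[:-1]) + w[-1]
        squared_errors.extend((prediction - reviews[u, rated]) ** 2)
    rmse = np.sqrt(np.mean(squared_errors))
    return rmse, np.mean(coefficients, axis=0)

# Synthetic stand-in data (assumption: none of this comes from the chapter)
rng = np.random.RandomState(0)
true = rng.uniform(1, 5, size=(30, 20))
reviews = np.where(rng.rand(30, 20) < 0.5, true, 0.0)
es = np.array([
    true + rng.normal(0, 0.3, true.shape),  # accurate predictor
    true + rng.normal(0, 1.0, true.shape),  # noisy predictor
    rng.uniform(1, 5, true.shape),          # uninformative predictor
])
rmse, mean_coef = stacked_loo_rmse(es, reviews)
print(rmse, mean_coef)
```

On data like this, the accurate predictor should receive most of the weight and the uninformative one a weight near zero, which is the same kind of reading applied to the real coefficients.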

The result is an RMSE of almost exactly 1. We can also analyze the coefficients variable to find out how well our predictors fare:

print(np.mean(coefficients, 0))  # the mean value across all users


The values of the array are [0.25164062, 0.01258986, 0.60827019]. The estimate of the most similar movie has the highest weight (about 0.61), which is not surprising because it was the best individual predictor. The correlation-based method receives a weight of only about 0.01, so we can drop it from the learning process; it has little influence on the final result.
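Dropping a weak predictor amounts to removing its row from the stacked estimates. The following is a minimal sketch: the helper `drop_weak_predictors` and the cut-off of 0.05 are hypothetical choices for illustration, and the `es` array here is a zero-filled placeholder rather than real estimates.

```python
import numpy as np

def drop_weak_predictors(es, mean_coef, threshold=0.05):
    """Keep only base predictors whose mean stacking weight is not negligible.

    The threshold is an arbitrary, hypothetical cut-off.
    """
    keep = np.abs(mean_coef) >= threshold
    return es[keep], keep

# Mean weights quoted in the text, in the same order as the rows of es
mean_coef = np.array([0.25164062, 0.01258986, 0.60827019])
# Placeholder estimates: 3 predictors, 4 users, 5 movies
es = np.zeros((3, 4, 5))
es_reduced, keep = drop_weak_predictors(es, mean_coef)
print(keep.tolist())     # [True, False, True]
print(es_reduced.shape)  # (2, 4, 5)
```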

This setup makes it easy to try a few extra ideas. For example, if the single most similar movie is a good predictor, why not use the five most similar movies in the learning process as well? We can adapt the earlier code to generate the estimate from the k-th most similar movie and then use the stacked learner to learn the weights:

es = np.array([
    usermodel.estimate_all(),
    similar_movie.estimate_all(k=1),
    similar_movie.estimate_all(k=2),
    similar_movie.estimate_all(k=3),
    similar_movie.estimate_all(k=4),
    similar_movie.estimate_all(k=5),
])
# the rest of the code remains as before!
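The body of `similar_movie.estimate_all(k=...)` is not shown in this section. As a rough illustration of what a k-th most similar movie estimate could look like, here is a hedged sketch. The function `kth_similar_estimate` is hypothetical, similarity is taken as the plain correlation between rating columns (with unrated entries left as 0, a simplification), and the ratings are random stand-in data.

```python
import numpy as np

def kth_similar_estimate(reviews, k=1):
    """Estimate each rating from the k-th most similar movie the user rated.

    Hypothetical stand-in, not the chapter's implementation.
    reviews: (n_users, n_movies), 0 = not rated.
    """
    n_users, n_movies = reviews.shape
    # Movie-to-movie similarity: correlation between rating columns
    sim = np.nan_to_num(np.corrcoef(reviews.T))
    np.fill_diagonal(sim, -np.inf)     # a movie is never its own neighbour
    order = np.argsort(-sim, axis=1)   # most similar first, per movie
    est = np.zeros(reviews.shape, dtype=float)
    for u in range(n_users):
        rated = reviews[u] > 0
        # Fall back to the user's mean rating when fewer than k neighbours exist
        fallback = reviews[u, rated].mean() if rated.any() else 0.0
        for m in range(n_movies):
            candidates = [c for c in order[m] if rated[c] and c != m]
            if len(candidates) >= k:
                est[u, m] = reviews[u, candidates[k - 1]]
            else:
                est[u, m] = fallback
    return est

# Random stand-in ratings: 15 users, 10 movies, about 60% rated
rng = np.random.RandomState(0)
reviews = rng.randint(1, 6, size=(15, 10)) * (rng.rand(15, 10) < 0.6)

# One layer of estimates per k, stacked along the first axis
es = np.array([kth_similar_estimate(reviews, k=k) for k in range(1, 6)])
print(es.shape)  # (5, 15, 10)
```

Each value of k contributes one row to `es`, so the stacked learner receives one weight per neighbour rank.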

We gained a lot of freedom in generating new machine learning systems. In this case, the final result is not better, but it was easy to test this new idea.
