Building Machine Learning Systems with Python - Richert, Coelho
Chapter 7
Fortunately, scikit-learn makes it very easy to do the right thing: it has classes named
LassoCV, RidgeCV, and ElasticNetCV, all of which encapsulate a cross-validation
check for the inner parameter. The code is almost identical to the previous one, except
that we do not need to specify any value for alpha:
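import numpy as np
from sklearn.linear_model import ElasticNetCV
# KFold as exposed by current scikit-learn releases (sklearn.model_selection)
from sklearn.model_selection import KFold

# data and target are the arrays used in the previous examples
met = ElasticNetCV(fit_intercept=True)
kf = KFold(n_splits=10)
err = 0.0  # running sum of squared errors over all outer folds
for train, test in kf.split(data):
    # the inner cross-validation chooses alpha using the training fold only
    met.fit(data[train], target[train])
    p = met.predict(data[test])
    e = p - target[test]
    err += np.dot(e, e)
rmse_10cv = np.sqrt(err / len(target))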
This results in a lot of computation, because each of the 10 outer folds runs its own
inner cross-validation over the candidate values of alpha, so you may want to get some
coffee while you are waiting (depending on how fast your computer is).
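To see which value of alpha the inner cross-validation settles on, the same loop can be run on a small synthetic dataset. The following is only a minimal sketch: the data from make_regression, the helper name nested_cv_rmse, and the fold counts are assumptions made for illustration and stand in for the data and target arrays used in this chapter.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold

# Synthetic stand-in for the chapter's data/target arrays (an assumption).
data, target = make_regression(n_samples=200, n_features=20,
                               n_informative=5, noise=10.0,
                               random_state=0)


def nested_cv_rmse(data, target, n_outer=5, n_inner=3):
    """Hypothetical helper: RMSE from an outer K-fold loop in which
    alpha is chosen by an inner cross-validation on each training fold."""
    kf = KFold(n_splits=n_outer, shuffle=True, random_state=0)
    err = 0.0
    alphas = []
    for train, test in kf.split(data):
        # ElasticNetCV searches its own grid of alphas with n_inner folds
        met = ElasticNetCV(fit_intercept=True, cv=n_inner)
        met.fit(data[train], target[train])
        alphas.append(met.alpha_)  # alpha selected for this outer fold
        e = met.predict(data[test]) - target[test]
        err += np.dot(e, e)
    return np.sqrt(err / len(target)), alphas


rmse, alphas = nested_cv_rmse(data, target)
print("nested-CV RMSE: %.2f" % rmse)
print("alpha chosen per outer fold:", np.round(alphas, 3))
```

The selected alpha may differ from one outer fold to the next, which is expected: each fold tunes the parameter on its own training data only, so the held-out fold never influences the choice.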
Rating prediction and recommendations<br />
If you have used any commercial online system in the last 10 years, you have
probably seen recommendations. Some are like Amazon's "customers who
bought X also bought Y." These will be dealt with in the next chapter under the
topic of basket analysis. Others are based on predicting the rating of a product,
such as a movie.
This last problem was made famous with the Netflix Challenge, a million-dollar
public machine learning challenge run by Netflix. Netflix (well known in the U.S.
and U.K., but at the time of writing not available everywhere) is a movie rental
company. Traditionally, you would receive DVDs in the mail; more recently, the
business has focused on online streaming of videos. From the start, one of the
distinguishing features of the service was that it gave every user the option of
rating the films they had seen and then used these ratings to recommend other films.
In this mode, you not only have the information about which films the user saw,
but also their impression of them (including negative impressions).
In 2006, Netflix made available a large number of customer ratings of films in
its database, and the goal was to improve on its in-house algorithm for ratings
prediction. Whoever was able to beat it by 10 percent or more (measured by RMSE)
would win 1 million dollars. In 2009, an international team named BellKor's
Pragmatic Chaos was able to beat that mark and take the prize. They did so just
20 minutes before another team, The Ensemble, passed the 10 percent mark as well!
It was an exciting photo finish for a competition that lasted several years.