08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Chapter 7

Fortunately, scikit-learn makes it very easy to do the right thing: it has classes named LassoCV, RidgeCV, and ElasticNetCV, all of which encapsulate a cross-validation check for the inner parameter. The code is essentially the same as the previous example, except that we do not need to specify any value for alpha:

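    import numpy as np
    from sklearn.linear_model import ElasticNetCV
    # In older scikit-learn releases, KFold lived in sklearn.cross_validation
    from sklearn.model_selection import KFold

    # data and target are the arrays loaded earlier in the chapter
    met = ElasticNetCV(fit_intercept=True)
    kf = KFold(n_splits=10)
    err = 0.0
    for train, test in kf.split(data):
        # alpha is selected by an inner cross-validation on the training fold only
        met.fit(data[train], target[train])
        p = met.predict(data[test])
        e = p - target[test]
        err += np.dot(e, e)  # accumulate the squared error over all folds
    rmse_10cv = np.sqrt(err / len(target))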

This results in a lot of computation, because every outer fold runs its own inner cross-validation over the candidate values of alpha, so you may want to get some coffee while you are waiting (depending on how fast your computer is).
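To make the two levels of cross-validation concrete, here is a minimal, self-contained sketch. It is an illustration under stated assumptions rather than the chapter's own listing: the synthetic dataset from `make_regression` stands in for the `data` and `target` arrays, and the helper name `nested_cv_rmse`, the shuffled outer folds, and the choice of five inner folds are all hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import KFold


def nested_cv_rmse(data, target, n_outer=10, n_inner=5, seed=0):
    """Return the outer-fold RMSE and the alpha chosen in each outer fold.

    Hypothetical helper: the outer loop measures generalization error,
    while ElasticNetCV tunes alpha using only the current training fold.
    """
    outer = KFold(n_splits=n_outer, shuffle=True, random_state=seed)
    err = 0.0
    alphas = []
    for train, test in outer.split(data):
        met = ElasticNetCV(fit_intercept=True, cv=n_inner)
        met.fit(data[train], target[train])   # inner CV happens inside fit
        alphas.append(met.alpha_)             # alpha selected for this fold
        e = met.predict(data[test]) - target[test]
        err += np.dot(e, e)
    return np.sqrt(err / len(target)), alphas


# Synthetic stand-in for the chapter's data and target (an assumption).
data, target = make_regression(n_samples=200, n_features=20,
                               n_informative=5, noise=10.0, random_state=0)
rmse, alphas = nested_cv_rmse(data, target)
print("RMSE on 10-fold CV: {:.3f}".format(rmse))
print("alpha chosen per fold:", np.round(alphas, 3))
```

Printing the per-fold values of `alpha_` shows that the selected penalty can differ from fold to fold, which is why it has to be chosen inside the loop and not once on the full dataset.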

Rating prediction and recommendations

If you have used any commercial online system in the last 10 years, you have probably seen recommendations. Some are like Amazon's "customers who bought X also bought Y." These will be dealt with in the next chapter under the topic of basket analysis. Others are based on predicting the rating of a product, such as a movie.

This last problem was made famous by the Netflix Challenge, a million-dollar public machine learning challenge run by Netflix. Netflix (well known in the U.S. and U.K.) is a movie rental company. Traditionally, you would receive DVDs in the mail; more recently, the business has focused on online streaming of videos. From the start, one of the distinguishing features of the service was that it gave every user the option of rating the films they had seen, and then used these ratings to recommend other films. In this setting, you have not only the information about which films the user saw, but also their impression of them (including negative impressions).

In 2006, Netflix made available a large number of customer ratings of films from its database; the goal was to improve on its in-house algorithm for ratings prediction. Whoever was able to beat it by 10 percent or more would win 1 million dollars. In 2009, an international team named BellKor's Pragmatic Chaos was able to beat that mark and take the prize. They did so just 20 minutes before another team, The Ensemble, passed the 10 percent mark as well! It was an exciting photo finish for a competition that lasted several years.
