08.06.2015

Building Machine Learning Systems with Python - Richert, Coelho



Regression – Recommendations Improved

We can try a weighted average, multiplying each prediction by a given weight before summing it all up. How do we find the best weights, though? We learn them from the data, of course!
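As a minimal sketch of learning the weights from data, assuming NumPy and made-up ratings, the weighted sum can be fitted with ordinary least squares (`numpy.linalg.lstsq`). The three predictors below are hypothetical stand-ins, not the book's models, and the weights are not constrained to be positive or to sum to one.

```python
import numpy as np

# Made-up ratings and three hypothetical predictors' estimates of them
ratings = np.array([4.0, 3.0, 5.0, 2.0, 4.0, 1.0])
pred_a = ratings + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])  # noisy
pred_b = ratings + 1.0                                           # biased upwards by 1
pred_c = np.full(6, 3.0)                                         # always guesses 3
preds = np.column_stack([pred_a, pred_b, pred_c])  # one column per predictor

# One weight per predictor, chosen to minimize the squared error of the weighted sum
weights, *_ = np.linalg.lstsq(preds, ratings, rcond=None)
combined = preds @ weights  # the ensemble's prediction for each rating
```

In this toy case the fit is exact: the third predictor receives a negative weight (about -1/3) that cancels the constant bias of the second, and the noisy first predictor gets a weight of about zero.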

Ensemble learning

We are using a general technique in machine learning called ensemble learning; it is not only applicable to regression. We learn an ensemble (that is, a set) of predictors and then combine them. What is interesting is that we can see each prediction as a new feature: we are now just combining features based on training data, which is what we have been doing all along. Note that we are doing so for regression here, but the same reasoning applies to classification: you create several classifiers and a master classifier, which takes the output of all of them and gives a final prediction. Different forms of ensemble learning differ in how they combine the base predictors. In our case, we reuse the training data that was used to learn the predictors.

[Figure: ensemble learning. Data → Predict 1, Predict 2, Predict 3, … → Final Prediction]
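For classification, the master classifier can itself be any learner trained on the base classifiers' outputs. The sketch below is one possible instance of this idea (often called stacking), assuming NumPy, a hand-rolled logistic regression as the master, and hypothetical 0/1 outputs from three base classifiers; `train_master` and `predict_master` are illustrative names, not a library API.

```python
import numpy as np

def train_master(X, y, lr=0.5, steps=2000):
    """Logistic-regression master classifier fitted with plain gradient descent.

    X holds one column per base classifier (its 0/1 output), y the true labels.
    """
    Xb = np.column_stack([X, np.ones(len(y))])   # add a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))      # predicted P(class = 1)
        w -= lr * Xb.T @ (p - y) / len(y)        # gradient step on mean log-loss
    return w

def predict_master(w, X):
    """Final prediction: class 1 where the master's score is positive."""
    Xb = np.column_stack([X, np.ones(X.shape[0])])
    return (Xb @ w > 0).astype(int)

# Hypothetical outputs of three base classifiers on six training examples
base_outputs = np.array([
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
], dtype=float)
labels = np.array([1, 1, 0, 0, 1, 0], dtype=float)

w = train_master(base_outputs, labels)
final = predict_master(w, base_outputs)
```

Because the first base classifier agrees with every label in this toy data, the master is expected to lean on it most heavily, which is the classification counterpart of a predictor receiving a high weight.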

By having a flexible way to combine multiple methods, we can try any idea we wish by simply adding it to the mix of learners and letting the system give it a weight. We can also use the weights to discover which ideas are good: if an idea gets a high weight, it seems to be adding useful information. Ideas with very low weights can even be dropped for better performance.
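A minimal sketch of using the learned weights to spot and drop ideas that contribute little, assuming NumPy, least-squares weights, and toy data constructed so that the third predictor is useless. The helper names and the cutoff `threshold=0.05` are hypothetical, illustrative choices.

```python
import numpy as np

def learn_weights(preds, target):
    """Least-squares weights for combining the columns of `preds`."""
    weights, *_ = np.linalg.lstsq(preds, target, rcond=None)
    return weights

def prune_predictors(preds, target, threshold=0.05):
    """Fit weights, drop predictors whose |weight| is below `threshold`, then refit."""
    weights = learn_weights(preds, target)
    keep = [i for i, w in enumerate(weights) if abs(w) >= threshold]
    refit = learn_weights(preds[:, keep], target)
    return weights, keep, refit

# Toy predictions (hypothetical): three predictors' ratings for six user-movie pairs
a = np.array([4.0, 3.0, 5.0, 2.0, 4.0, 1.0])
b = np.array([3.0, 3.0, 4.0, 3.0, 5.0, 2.0])
c = np.array([1.0, 5.0, 2.0, 4.0, 3.0, 3.0])  # carries no useful signal here
target = 0.6 * a + 0.4 * b                    # constructed so only a and b matter
preds = np.column_stack([a, b, c])

weights, keep, refit = prune_predictors(preds, target)
print(np.round(weights, 3))  # approximately [0.6, 0.4, 0.0]
print(keep)                  # [0, 1]
```

Comparing weights like this only makes sense when all predictors output values on the same scale (here, all are ratings); a weight near zero suggests, but does not prove, that an idea adds nothing.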

The code for this is very simple, and is as follows:

# We import the code we used in the previous examples:
import similar_movie
import corrneighbors
import usermodel
from sklearn.linear_model import LinearRegression

# Collect the predictions of each base method (one entry per method)
es = [
    usermodel.estimate_all(),
    corrneighbors.estimate_all(),
    similar_movie.estimate_all(),
]
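The snippet above only collects the estimates. As a hedged sketch of how such a list could be combined, assuming each `estimate_all()` returns a users × movies array of predicted ratings and that unrated cells of a ratings matrix `reviews` are stored as 0 (both are assumptions here), the estimates can be stacked into one feature column per method and fitted with least squares plus an intercept. NumPy's `lstsq` stands in for scikit-learn's `LinearRegression`, the helper names are hypothetical, and the toy matrices are made up.

```python
import numpy as np

def stack_estimates(es, reviews):
    """One feature column per estimator, one row per observed rating.

    Assumes unrated cells of `reviews` are stored as 0.
    """
    observed = reviews > 0
    X = np.column_stack([e[observed] for e in es])
    y = reviews[observed]
    return X, y

def fit_combiner(X, y):
    """Least-squares weights; the last entry of the result is the intercept."""
    Xb = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef

def combine(es, coef):
    """Weighted sum of the estimate matrices plus the intercept, for every cell."""
    return sum(w * e for w, e in zip(coef[:-1], es)) + coef[-1]

# Toy stand-ins for the three estimate_all() results (3 users x 4 movies)
e1 = np.array([[4, 3, 5, 1], [2, 4, 3, 5], [5, 1, 2, 4]], dtype=float)
e2 = np.array([[3, 2, 4, 2], [3, 5, 1, 4], [2, 3, 5, 3]], dtype=float)
e3 = np.array([[3, 1, 4, 1], [5, 2, 2, 5], [3, 5, 4, 1]], dtype=float)
es = [e1, e2, e3]

reviews = 0.5 * e1 + 0.5 * e2        # toy "true" ratings
reviews[[0, 1, 2], [3, 1, 2]] = 0.0  # three cells marked as unrated

X, y = stack_estimates(es, reviews)
coef = fit_combiner(X, y)
predicted = combine(es, coef)        # also fills in the unrated cells
```

scikit-learn's `LinearRegression` fits an intercept by default and exposes the learned weights as `coef_` and the intercept as `intercept_`, so `LinearRegression().fit(X, y)` could play the role of `fit_combiner` here.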
