
Regression – Recommendations

Ridge, Lasso, and Elastic nets

These penalized models often go by rather interesting names. The L1-penalized model is often called the Lasso, while an L2-penalized model is known as Ridge regression. Of course, we can combine the two to obtain an Elastic net model.

Both the Lasso and the Ridge result in smaller coefficients than unpenalized regression. However, the Lasso has the additional property that it results in more coefficients being set to zero! This means that the final model does not even use some of its input features; the model is sparse. This is often a very desirable property, as the model performs both feature selection and regression in a single step.

You will notice that whenever we add a penalty, we also add a weight λ, which governs how much penalization we want. When λ is close to zero, we are very close to OLS (in fact, if you set λ to zero, you are just performing OLS), and when λ is large, we have a model which is very different from the OLS one.

The Ridge model is the older of the two, as the Lasso is hard to compute by hand. With modern computers, however, we can use the Lasso as easily as Ridge, or even combine them to form Elastic nets. An Elastic net has two penalties: one on the absolute values of the coefficients and another on their squares.
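As a rough sketch in formula form (writing b for the coefficient vector, X for the input matrix, and y for the target; these symbols do not appear in the text, and scikit-learn's alpha parameter plays the role of λ), the objectives differ only in their penalty term:

\[
\begin{aligned}
\text{OLS:}\quad & \min_b \|y - Xb\|_2^2 \\
\text{Ridge (L2):}\quad & \min_b \|y - Xb\|_2^2 + \lambda \|b\|_2^2 \\
\text{Lasso (L1):}\quad & \min_b \|y - Xb\|_2^2 + \lambda \|b\|_1 \\
\text{Elastic net:}\quad & \min_b \|y - Xb\|_2^2 + \lambda_1 \|b\|_1 + \lambda_2 \|b\|_2^2
\end{aligned}
\]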

Using Lasso or Elastic nets in scikit-learn

Let us adapt the preceding example to use Elastic nets. Using scikit-learn, it is very easy to swap in the Elastic net regressor for the least squares one that we had before:

from sklearn.linear_model import ElasticNet
en = ElasticNet(fit_intercept=True, alpha=0.5)
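The error numbers quoted next come from the book's own example; a minimal sketch of how such training and cross-validation errors could be computed, assuming x and y are NumPy arrays holding the features and target from the preceding example (the data loading itself is not shown here), might look like this:

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Error on the data the model was trained on
en.fit(x, y)
rmse_train = np.sqrt(mean_squared_error(y, en.predict(x)))

# Cross-validated error: fit on the training folds, measure on the held-out fold
kf = KFold(n_splits=5)
fold_mses = []
for train, test in kf.split(x):
    en.fit(x[train], y[train])
    fold_mses.append(mean_squared_error(y[test], en.predict(x[test])))
rmse_cv = np.sqrt(np.mean(fold_mses))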

Now we use en, whereas before we had used lr. This is the only change that is needed. The results are exactly what we would have expected: the training error increases to 5.0 (it was 4.6 before), but the cross-validation error decreases to 5.4 (it was 5.6 before). We have a larger error on the training data, but we gain better generalization. We could have tried an L1 penalty using the Lasso class or an L2 penalty using the Ridge class with the same code.
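If we wanted the pure L1 or L2 versions, the swap is a one-line change; only the class name differs, and the alpha value here is just illustrative:

from sklearn.linear_model import Lasso, Ridge
las = Lasso(fit_intercept=True, alpha=0.5)   # pure L1 penalty
rr = Ridge(fit_intercept=True, alpha=0.5)    # pure L2 penalty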

The next plot shows what happens when we switch from unpenalized regression (shown as a dotted line) to a Lasso regression, which is closer to a flat line. The benefits of a Lasso regression are, however, more apparent when we have many input variables, and we consider this setting next:

