Regression – Recommendations

Ridge, Lasso, and Elastic nets
These penalized models often go by rather interesting names. The L1-penalized model is often called the Lasso, while an L2-penalized model is known as Ridge regression. Of course, we can combine the two, and we obtain an Elastic net model.

Both the Lasso and Ridge result in smaller coefficients than unpenalized regression. However, the Lasso has the additional property that it sets more coefficients exactly to zero! This means that the final model does not even use some of its input features: the model is sparse. This is often a very desirable property, as the model performs both feature selection and regression in a single step.
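A minimal sketch of this sparsity effect (the data and penalty value here are illustrative, not the book's example): we fit OLS and the Lasso on data where only three of twenty features matter, and count the non-zero coefficients each model keeps.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.RandomState(0)
X = rng.randn(100, 20)
# Only the first three features actually influence the target
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] - 1.5 * X[:, 2] + rng.randn(100) * 0.1

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# OLS keeps all 20 coefficients non-zero; the Lasso zeroes out most of
# the irrelevant ones, effectively selecting features for us
print(np.sum(ols.coef_ != 0))    # 20
print(np.sum(lasso.coef_ != 0))  # far fewer than 20
```

The exact number of surviving coefficients depends on the penalty weight, but the pattern — irrelevant features driven to exactly zero — is the Lasso's signature.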
You will notice that whenever we add a penalty, we also add a weight λ, which governs how much penalization we want. When λ is close to zero, we are very close to OLS (in fact, if you set λ to zero, you are just performing OLS), and when λ is large, we have a model which is very different from the OLS one.
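We can see this behavior directly (a quick sketch on synthetic data, not the book's example; scikit-learn calls the penalty weight `alpha` rather than λ): as `alpha` shrinks toward zero, the Ridge coefficients converge to the OLS ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(1)
X = rng.randn(50, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.randn(50) * 0.1

ols = LinearRegression().fit(X, y)
for alpha in [100.0, 1.0, 0.01]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    # distance to the OLS solution shrinks as alpha approaches zero
    print(alpha, np.linalg.norm(ridge.coef_ - ols.coef_))
```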
The Ridge model is older, as the Lasso is hard to compute by hand. With modern computers, however, we can use the Lasso as easily as Ridge, or even combine them to form Elastic nets. An Elastic net has two penalties: one on the absolute values of the coefficients and another on their squares.
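In scikit-learn, the mix between the two penalties is controlled by the `l1_ratio` parameter of `ElasticNet` (a detail beyond what the text states, sketched here on illustrative data): at `l1_ratio=1.0` only the absolute-value penalty remains, so the Elastic net reduces to the Lasso.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.RandomState(2)
X = rng.randn(80, 10)
y = X[:, 0] - 2.0 * X[:, 1] + rng.randn(80) * 0.1

lasso = Lasso(alpha=0.5).fit(X, y)
en_l1 = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X, y)

# With l1_ratio=1.0, the Elastic net coefficients match the Lasso's
print(np.allclose(lasso.coef_, en_l1.coef_))
```

Values of `l1_ratio` between 0 and 1 blend the two penalties, which is what makes the Elastic net a generalization of both models.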
Using Lasso or Elastic nets in scikit-learn

Let us adapt the preceding example to use Elastic nets. Using scikit-learn, it is very easy to swap in the Elastic net regressor for the least squares one that we had before:

from sklearn.linear_model import ElasticNet
en = ElasticNet(fit_intercept=True, alpha=0.5)
Now we use en where before we used lr. This is the only change needed. The results are exactly what we would have expected: the training error increases to 5.0 (it was 4.6 before), but the cross-validation error decreases to 5.4 (from 5.6). We have a larger error on the training data, but we gain better generalization. We could have tried an L1 penalty using the Lasso class or an L2 penalty using the Ridge class with the same code.
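The interchangeability the text mentions can be sketched as follows (the data here is synthetic, not the book's dataset; the penalty value is illustrative): all three penalized regressors share the same fit/predict interface, so only the class name changes.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.RandomState(3)
X = rng.randn(60, 8)
y = X @ rng.randn(8) + rng.randn(60) * 0.5

fitted = {}
for Model in (Ridge, Lasso, ElasticNet):
    # Identical construction and fitting code for all three penalties
    fitted[Model.__name__] = Model(fit_intercept=True, alpha=0.5).fit(X, y)
    print(Model.__name__, fitted[Model.__name__].predict(X[:1]))
```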
The next plot shows what happens when we switch from unpenalized regression (shown as a dotted line) to a Lasso regression, which is closer to a flat line. The benefits of a Lasso regression are, however, more apparent when we have many input variables, and we consider this setting next: