
Chapter 7

Summary

In this chapter, we started with the oldest trick in the book, ordinary least squares. It is still sometimes good enough. However, we also saw that more modern approaches that avoid overfitting can give us better results. We used Ridge, Lasso, and Elastic nets; these are the state-of-the-art methods for regression.
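The following is a minimal sketch of how these regressors can be fitted and compared with scikit-learn. It is not the code from this chapter: the synthetic dataset and the penalty settings (alpha, l1_ratio) are assumptions for illustration only.

# Sketch only: the synthetic data and penalty values are illustrative
# assumptions, not values taken from this chapter.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LinearRegression(), Ridge(alpha=1.0),
              Lasso(alpha=1.0), ElasticNet(alpha=1.0, l1_ratio=0.5)):
    model.fit(X_train, y_train)
    # Score on held-out data, not on the training set
    print(type(model).__name__, model.score(X_test, y_test))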

We once again saw the danger of relying on training error to estimate generalization: it can be overly optimistic, to the point where a model achieves zero training error while being completely useless. When thinking through these issues, we were led to two-level cross-validation, an important point that many in the field still have not completely internalized. Throughout, we were able to rely on scikit-learn to support all the operations we wanted to perform, including an easy way to achieve correct cross-validation.
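As a hedged sketch of what two-level (nested) cross-validation looks like in scikit-learn: the inner loop below selects the penalty strength, while the outer loop measures generalization on data the inner loop never saw. The dataset and fold counts are again illustrative assumptions.

# Two-level (nested) cross-validation sketch: the inner loop (inside
# LassoCV) picks alpha; the outer loop estimates generalization.
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=50, noise=10.0, random_state=0)

inner_model = LassoCV(cv=5)  # inner loop: choose the penalty on inner folds
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(inner_model, X, y, cv=outer_cv)  # outer loop: evaluate
print("Nested CV R^2: %.3f +/- %.3f" % (scores.mean(), scores.std()))

Because the penalty is chosen without ever looking at the outer test fold, the resulting score is an honest estimate of generalization.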

At the end of this chapter, we started to shift gears and look at recommendation problems. For now, we approached these problems with the tools we knew: penalized regression. In the next chapter, we will look at new, better tools for this problem. These will improve our results on this dataset.
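One way to apply penalized regression to ratings, shown below as a rough sketch, is to predict a target user's ratings as a linear combination of all other users' ratings. The rating matrix here is randomly generated, and the zero-means-unrated convention is an assumption for illustration, not this chapter's exact setup.

# Rough sketch: random rating matrix, 0 = unrated (an assumed convention).
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
ratings = rng.randint(0, 6, size=(100, 500)).astype(float)  # users x items

user = 0
others = np.arange(ratings.shape[0]) != user
rated = ratings[user] > 0

# Train on the other users' ratings of the items our user has rated
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(ratings[others][:, rated].T, ratings[user, rated])

# Predict the user's missing ratings from the other users' ratings
predicted = model.predict(ratings[others][:, ~rated].T)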

This recommendation setting also has the disadvantage of requiring that users rate items on a numeric scale, and only a fraction of users actually do so. There is another type of information that is often easier to obtain: which items were purchased together. In the next chapter, we will also see how to leverage this information in a framework called basket analysis.
