08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Regression – Recommendations Improved<br />

However, we do have to be careful to not overfit our dataset. In fact, if we randomly<br />

try too many things, some of them will work well on this dataset but will not<br />

generalize. Even though we are using cross-validation, we are not cross-validating<br />

our design decisions. In order to have a good estimate, and if data is plentiful, you<br />

should leave a portion of the data untouched until you have your final model that<br />

is about to go into production. Then, testing your model gives you an unbiased<br />

prediction of how well you should expect it to work in the real world.<br />

Basket analysis<br />

The methods we have discussed so far work well when you have numeric ratings of<br />

how much a user liked a product. This type of information is not always available.<br />

Basket analysis is an alternative mode of learning recommendations. In this mode,<br />

our data consists only of what items were bought together; it does not contain any<br />

information on whether individual items were enjoyed or not. It is often easier to get<br />

this data rather than ratings data as many users will not provide ratings, while the<br />

basket data is generated as a side effect of shopping. The following screenshot shows<br />

you a snippet of Amazon.com's web page for the book War and Peace, Leo Tolstoy,<br />

which is a classic way to use these results:<br />

This mode of learning is not only applicable to actual shopping baskets, naturally.<br />

It is applicable in any setting where you have groups of objects together and need<br />

to recommend another. For example, recommending additional recipients to a<br />

user writing an e-mail is done by Gmail and could be implemented using similar<br />

techniques (we do not know what Gmail uses internally; perhaps they combine<br />

multiple techniques as we did earlier). Or, we could use these methods to develop an<br />

application to recommend webpages to visit based on your browsing history. Even if<br />

we are handling purchases, it may make sense to group all purchases by a customer<br />

into a single basket independently of whether the items where bought together or on<br />

separate transactions (this depends on the business context).<br />

[ 172 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!