08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Regression – Recommendations Improved

We can see, for example, that there were 80 transactions in which 1378, 1379, and 1380 were bought together. Of these, 57 also included 1269, so the estimated conditional probability is 57/80 ≈ 71%. Compared to the fact that only about 0.3% of all transactions included 1269, this gives us a lift of 255.
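
The arithmetic behind these two numbers can be sketched in a few lines. This is only an illustration: the function name `rule_stats` and its parameters are made up here, not taken from the book's code. Note that plugging in the rounded base rate of 0.3% gives a lift of roughly 237 rather than 255; the quoted value was presumably computed from the unrounded frequency of 1269.

```python
def rule_stats(n_antecedent, n_both, consequent_rate):
    """Return (confidence, lift) for a rule antecedent -> consequent.

    n_antecedent:    transactions containing every antecedent item
    n_both:          of those, transactions that also contain the consequent
    consequent_rate: fraction of all transactions containing the consequent
    """
    confidence = n_both / n_antecedent      # estimated P(consequent | antecedent)
    lift = confidence / consequent_rate     # how much more likely than the base rate
    return confidence, lift


# Counts from the text; 0.003 is the rounded 0.3% base rate (an approximation).
confidence, lift = rule_stats(n_antecedent=80, n_both=57, consequent_rate=0.003)
print(f"confidence = {confidence:.3f}")
print(f"lift (with rounded base rate) = {lift:.1f}")
```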

The need to have a decent number of transactions in these counts in order to be able to make relatively solid inferences is why we must first select frequent itemsets. If we were to generate rules from an infrequent itemset, the counts would be very small; due to this, the relative values would be meaningless (or subject to very large error bars).
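
To get a feel for how quickly the error bars grow, the following sketch compares the standard error of an estimated proportion for the 57/80 case with a hypothetical rule supported by only 4 baskets. It assumes the simple binomial (normal-approximation) standard error, which is itself crude for very small counts and is used here only for illustration; the helper name `proportion_with_error` is made up.

```python
import math


def proportion_with_error(successes, total):
    """Return the estimated proportion and its binomial standard error."""
    p = successes / total
    stderr = math.sqrt(p * (1 - p) / total)
    return p, stderr


# (57, 80) comes from the text; (3, 4) is a hypothetical infrequent itemset.
p_large, se_large = proportion_with_error(57, 80)
p_small, se_small = proportion_with_error(3, 4)

print(f"57/80: {p_large:.2f} +/- {se_large:.2f}")
print(f" 3/4 : {p_small:.2f} +/- {se_small:.2f}")
```

Both estimates are close to 0.7, but the second comes with an uncertainty several times larger, which is why rules built on rare itemsets are hard to trust.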

Note that many more association rules can be discovered from this dataset: 1030 rules are found if we require a support of at least 80 baskets and a minimum lift of 5. This is still a small dataset compared to what is now possible with the Web; when you have millions of transactions, you can expect to generate many thousands, even millions, of rules.

However, for each customer, only a few of them will be relevant at any given time, and so each customer only receives a small number of recommendations.
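
One simple way to pick the relevant rules for a customer is to keep only those whose antecedent is fully contained in the current basket and then rank them by lift. The sketch below assumes rules are stored as `(antecedent, consequent, lift)` tuples; this representation, the `recommend` function, and all rules except the first are hypothetical.

```python
def recommend(rules, basket, max_items=3):
    """Return up to max_items consequents whose rule applies to the basket."""
    basket = set(basket)
    candidates = []
    for antecedent, consequent, lift in rules:
        # A rule applies if the customer already has every antecedent item
        # and does not yet have the item we would recommend.
        if antecedent <= basket and consequent not in basket:
            candidates.append((lift, consequent))
    candidates.sort(reverse=True)  # highest lift first
    ranked = []
    for _, item in candidates:
        if item not in ranked:     # drop duplicate recommendations
            ranked.append(item)
    return ranked[:max_items]


rules = [
    (frozenset({1378, 1379, 1380}), 1269, 255.0),  # rule discussed in the text
    (frozenset({10, 11}), 12, 8.0),                # hypothetical rule
    (frozenset({1378}), 20, 6.0),                  # hypothetical rule
]
recommendations = recommend(rules, basket=[1378, 1379, 1380, 99])
print(recommendations)
```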

More advanced basket analysis<br />

There are now other algorithms for basket analysis that run faster than Apriori. The code we saw earlier was simple, and was good enough for us as we only had circa 100 thousand transactions. If you had many millions, it might be worthwhile to use a faster algorithm (although note that for most applications, learning association rules can be run offline).

There are also methods to work with temporal information, leading to rules that take into account the order in which you have made your purchases. To take an extreme example of why this may be useful, consider that someone buying supplies for a large party may come back for trash bags. Therefore, it may make sense to propose trash bags on the first visit. However, it would not make sense to propose party supplies to everyone who buys a trash bag.
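
The asymmetry in this example shows up as soon as purchases are counted in order. The following toy sketch counts, per customer, which items were bought on a later visit than which other items. It is not one of the published sequential-mining algorithms, only a minimal illustration; the data and the `count_followups` helper are invented.

```python
from collections import Counter


def count_followups(visit_histories):
    """Count customers for whom item a was bought on an earlier visit than item b."""
    counts = Counter()
    for visits in visit_histories:
        pairs = set()  # count each ordered pair at most once per customer
        for i, earlier in enumerate(visits):
            for later in visits[i + 1:]:
                for a in earlier:
                    for b in later:
                        if a != b:
                            pairs.add((a, b))
        counts.update(pairs)
    return counts


# Each customer is a list of visits; each visit is a set of items (toy data).
histories = [
    [{"party supplies"}, {"trash bags"}],
    [{"party supplies", "soda"}, {"trash bags"}],
    [{"trash bags"}, {"milk"}],
]
followups = count_followups(histories)
print(followups[("party supplies", "trash bags")])
print(followups[("trash bags", "party supplies")])
```

An unordered basket analysis would treat the two directions identically, whereas the ordered counts separate "party supplies, then trash bags" from the reverse.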

You can find open source Python implementations of some of these (under the new BSD license, the same license as scikit-learn) in a package called pymining. This package was developed by Barthelemy Dagenais and is available at https://github.com/bartdag/pymining.
