Building Machine Learning Systems with Python - Richert, Coelho
Regression – Recommendations Improved
We can see, for example, that there were 80 transactions in which 1378, 1379,
and 1380 were bought together. Of these, 57 also included 1269, so the estimated
conditional probability is 57/80 ≈ 71%. Compared to the fact that only 0.3% of all
transactions included 1269, this gives us a lift of 255.
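The arithmetic above can be sketched in a few lines of Python. The function names are ours, chosen for illustration, and the base rate of 0.003 is an assumption: it is the rounded 0.3% quoted in the text. With that rounded value the lift comes out near 237 rather than 255; the quoted figure is consistent with an unrounded base rate of roughly 0.28%.

```python
def confidence(n_both, n_antecedent):
    """Estimated P(consequent | antecedent) from raw basket counts."""
    return n_both / n_antecedent


def lift(n_both, n_antecedent, base_rate):
    """Ratio of the conditional probability to the consequent's base rate."""
    return confidence(n_both, n_antecedent) / base_rate


# Counts quoted in the text: 80 baskets contain {1378, 1379, 1380},
# and 57 of those also contain 1269.
conf = confidence(57, 80)

# Assumption: 0.003 is the rounded base rate of item 1269 ("0.3%").
approx_lift = lift(57, 80, 0.003)

print("confidence = {:.1%}".format(conf))
print("lift (rounded base rate) = {:.1f}".format(approx_lift))
```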
The need to have a decent number of transactions in these counts in order to be able<br />
to make relatively solid inferences is why we must first select frequent itemsets.<br />
If we were to generate rules from an infrequent itemset, the counts would be very<br />
small; due to this, the relative values would be meaningless (or subject to very large<br />
error bars).<br />
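To get a rough feeling for how the error bars grow as counts shrink, we can compare the binomial standard error of two estimated proportions. This is only a sketch: the 57-of-80 count comes from the text, the 3-of-4 count is a made-up infrequent itemset, and the normal-approximation standard error used here is itself crude for very small counts.

```python
import math


def proportion_with_error(n_hits, n_total):
    """Return the estimated proportion and its binomial standard error."""
    p = n_hits / n_total
    se = math.sqrt(p * (1.0 - p) / n_total)
    return p, se


# Count from the text: 57 of 80 baskets.
p_frequent, se_frequent = proportion_with_error(57, 80)

# Hypothetical infrequent itemset: 3 of 4 baskets (made-up numbers).
p_rare, se_rare = proportion_with_error(3, 4)

print("frequent: {:.2f} +/- {:.2f}".format(p_frequent, se_frequent))
print("rare:     {:.2f} +/- {:.2f}".format(p_rare, se_rare))
```

Both estimates are above 70%, but the second has a standard error more than four times larger, so a rule built on it would be far less trustworthy.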
Note that many more association rules can be discovered from this dataset:
1030 rules are found when we require support in at least 80 baskets
and a minimum lift of 5. This is still a small dataset when compared to what is now
possible with the Web; with millions of transactions, you can expect to
generate many thousands, even millions, of rules.
However, for each customer, only a few of them will be relevant at any given time,
and so each customer only receives a small number of recommendations.
More advanced basket analysis<br />
There are now other algorithms for basket analysis that run faster than Apriori. The<br />
code we saw earlier was simple, and was good enough for us as we only had circa<br />
100 thousand transactions. If you had many millions, it might be worthwhile to
use a faster algorithm (although note that for most applications, learning association<br />
rules can be run offline).<br />
There are also methods to work with temporal information, leading to rules that take
into account the order in which you have made your purchases. To take an extreme<br />
example of why this may be useful, consider that someone buying supplies for a<br />
large party may come back for trash bags. Therefore it may make sense to propose<br />
trash bags on the first visit. However, it would not make sense to propose party<br />
supplies to everyone who buys a trash bag.<br />
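The asymmetry in this example can be made concrete by counting ordered pairs of items across each customer's visits. The following is a minimal sketch, not one of the sequence-mining methods alluded to above: the function name, the data layout, and the purchase histories are all hypothetical and invented for illustration.

```python
from collections import Counter
from itertools import combinations


def ordered_pair_counts(histories):
    """Count (earlier_item, later_item) pairs across each customer's visits.

    `histories` maps a customer to a list of baskets in chronological order.
    A pair is counted at most once per customer.
    """
    counts = Counter()
    for visits in histories.values():
        seen = set()
        # combinations() preserves list order, so `earlier` precedes `later`.
        for earlier, later in combinations(visits, 2):
            for a in earlier:
                for b in later:
                    if a != b:
                        seen.add((a, b))
        counts.update(seen)
    return counts


# Hypothetical purchase histories (made-up data for illustration only).
histories = {
    "c1": [{"party supplies"}, {"trash bags"}],
    "c2": [{"party supplies", "soda"}, {"trash bags"}],
    "c3": [{"trash bags"}, {"milk"}],
}

counts = ordered_pair_counts(histories)
print(counts[("party supplies", "trash bags")])
print(counts[("trash bags", "party supplies")])
```

In this toy data, party supplies are followed by trash bags for two customers, while the reverse order never occurs, which is the kind of directional evidence an unordered basket analysis cannot see.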
You can find open source Python implementations (under the new BSD license, the same as scikit-learn)
of some of these in a package called pymining. This package was developed by
Barthelemy Dagenais and is available at https://github.com/bartdag/pymining.