Chapter 8
The beer and diapers story
One of the stories often mentioned in the context of basket analysis
is the "diapers and beer" story. It states that when supermarkets first
started to look at their data, they found that diapers were often bought
together with beer. Supposedly, it was the father who would go out to
the supermarket to buy diapers and then pick up some beer as
well. There has been much discussion of whether this is true or just an
urban myth. In this case, it seems that it is true. In the early 1990s, Osco
Drug did discover that, in the early evening, beer and diapers were bought
together, and this did surprise the managers, who had never until then
considered these two products to be related. What is not true is that this
led the store to move the beer display closer to the diaper section. Also,
we have no idea whether it really was fathers who were buying beer and
diapers together more often than mothers (or grandparents).
Obtaining useful predictions<br />
The goal is not just to find "customers who bought X also bought Y", even though that is how
many online retailers phrase it (see the Amazon.com screenshot given earlier); a real
system cannot work like this. Why not? Because such a system would be fooled by
very frequently bought items and would simply recommend whatever is popular,
without any personalization.
For example, at a supermarket, many customers buy bread (say 50 percent of
customers buy bread). So if you focus on any particular item, say dishwasher soap,
and look at what is frequently bought with dishwasher soap, you might find that
bread is frequently bought with soap. In fact, about 50 percent of the time that someone
buys dishwasher soap, they also buy bread. However, bread is frequently bought with
everything else too, simply because so many customers buy bread.
What we are really looking for is "customers who bought X are statistically more likely to
buy Y than the baseline". So if you buy dishwasher soap, you are likely to buy bread,
but not more so than the baseline, which means that soap tells us nothing about bread.
Similarly, a bookstore that simply recommended bestsellers no matter which books you
had already bought would not be doing a good job of personalizing recommendations.
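The comparison against the baseline can be made concrete with a small sketch. The ratio P(Y | X) / P(Y) is commonly called lift; a value close to 1 means that buying X tells us nothing about Y. The function name `lift` and the toy baskets below are made up purely for illustration and do not come from any real dataset.

```python
def lift(baskets, x, y):
    """Estimate P(y | x) / P(y) from a list of baskets (sets of items)."""
    n = len(baskets)
    n_x = sum(1 for b in baskets if x in b)
    n_y = sum(1 for b in baskets if y in b)
    n_xy = sum(1 for b in baskets if x in b and y in b)
    if n == 0 or n_x == 0 or n_y == 0:
        raise ValueError("both items must occur at least once")
    p_y = n_y / float(n)              # baseline: how often y is bought at all
    p_y_given_x = n_xy / float(n_x)   # how often y is bought when x is bought
    return p_y_given_x / p_y


# Hypothetical toy data: bread is in half of all baskets.
baskets = [
    {"bread", "soap", "sponge"},
    {"bread", "soap"},
    {"soap", "sponge"},
    {"soap"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"milk", "eggs"},
]

print(lift(baskets, "soap", "bread"))   # 1.0: no more likely than the baseline
print(lift(baskets, "soap", "sponge"))  # 2.0: twice as likely as the baseline
```

In this toy data, half of the soap buyers also buy bread, but so does half of everybody, so the lift is 1. Sponges are rarer overall, so finding them in half of the soap baskets is informative.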
Analyzing supermarket shopping baskets<br />
As an example, we will look at a dataset consisting of anonymous transactions at a<br />
supermarket in Belgium. This dataset was made available by Tom Brijs at Hasselt<br />
University. The data is anonymized, so each product is identified only by a number and
each basket is a set of such numbers. The data file is available from several online sources
(including the book's companion website) as retail.dat.
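As a minimal sketch of how such a file might be read, the following assumes that retail.dat stores one basket per line, with product numbers separated by whitespace. That layout is an assumption here and should be checked against the file you download. The helper name `parse_baskets` and the sample lines are hypothetical.

```python
from collections import Counter


def parse_baskets(lines):
    """Turn an iterable of text lines into a list of baskets (frozensets of ints).

    Assumes one basket per line, product numbers separated by whitespace.
    Blank lines are skipped.
    """
    baskets = []
    for line in lines:
        tokens = line.split()
        if tokens:
            baskets.append(frozenset(int(tok) for tok in tokens))
    return baskets


# Made-up sample lines standing in for the real file contents.
sample_lines = ["0 1 2 3\n", "1 4\n", "\n", "1 2 5\n"]
baskets = parse_baskets(sample_lines)

# Count in how many baskets each product appears.
counts = Counter(item for basket in baskets for item in basket)
print(len(baskets))           # 3
print(counts.most_common(1))  # [(1, 3)]

# With the real file, under the same format assumption:
# with open("retail.dat") as f:
#     baskets = parse_baskets(f)
```

Storing each basket as a set discards repeated products within a single transaction, which matches the description of a basket as a set of numbers.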