

Chapter 8

The beer and diapers story

One of the stories that is often mentioned in the context of basket analysis is the "diapers and beer" story. It states that when supermarkets first started to look at their data, they found that diapers were often bought together with beer. Supposedly, it was the father who would go out to the supermarket to buy diapers and then would pick up some beer as well. There has been much discussion of whether this is true or just an urban myth. In this case, it seems that it is true. In the early 1990s, Osco Drug did discover that in the early evening beer and diapers were bought together, and it did surprise the managers who had, until then, never considered these two products to be similar. What is not true is that this led the store to move the beer display closer to the diaper section. Also, we have no idea whether it was really that fathers were buying beer and diapers together more than mothers (or grandparents).

Obtaining useful predictions<br />

A useful prediction is not just "customers who bought X also bought Y", even though that is how many online retailers phrase it (see the Amazon.com screenshot given earlier); a real system cannot work like this. Why not? Because such a system would be fooled by very frequently bought items and would simply recommend whatever is popular, without any personalization.

For example, at a supermarket, many customers buy bread (say 50 percent of customers). So if you focus on any particular item, say dishwasher soap, and look at what is frequently bought with it, you might find that bread is frequently bought with soap. In fact, 50 percent of the time that someone buys dishwasher soap, they also buy bread. However, bread is frequently bought with everything else too, simply because so many customers buy bread.

What we are really looking for is "customers who bought X are statistically more likely to buy Y than the baseline". So if you buy dishwasher soap, you are likely to buy bread, but not more so than the baseline: 50 percent in both cases, so the soap purchase tells us nothing about bread. Similarly, a bookstore that simply recommended bestsellers no matter which books you had already bought would not be doing a good job of personalizing recommendations.
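The comparison against the baseline can be made concrete with a small sketch. The ratio computed here, P(Y | X) / P(Y), is commonly called lift; that name, the `lift` function, and the toy baskets are illustrative assumptions and not part of the dataset discussed in this chapter. The toy data is built so that bread appears in 50 percent of all baskets and also in 50 percent of the baskets containing soap.

```python
def lift(baskets, x, y):
    """Estimate P(y | x) / P(y) from a list of baskets (sets of items).

    A value near 1 means buying x says nothing about y beyond the baseline;
    a value well above 1 means y is bought with x more often than usual.
    """
    n = len(baskets)
    n_x = sum(1 for b in baskets if x in b)
    n_y = sum(1 for b in baskets if y in b)
    n_xy = sum(1 for b in baskets if x in b and y in b)
    if n_x == 0 or n_y == 0:
        raise ValueError("both items must appear in at least one basket")
    p_y = n_y / n                # baseline: how often y is bought at all
    p_y_given_x = n_xy / n_x     # how often y is bought when x is bought
    return p_y_given_x / p_y


# Hypothetical toy baskets, made up for illustration only.
baskets = [
    {"bread", "soap"},
    {"bread", "soap", "sponge"},
    {"soap", "sponge"},
    {"soap", "sponge"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"milk", "eggs"},
]

print(lift(baskets, "soap", "bread"))   # bread is popular, but not more so with soap
print(lift(baskets, "soap", "sponge"))  # sponges are bought with soap more than usual
```

With these baskets, bread has a lift of 1 relative to soap even though it is bought with soap half of the time, whereas sponges, which are less popular overall, score higher. A recommender that ranked only by co-occurrence counts would not see this difference.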

Analyzing supermarket shopping baskets<br />

As an example, we will look at a dataset consisting of anonymous transactions at a supermarket in Belgium. This dataset was made available by Tom Brijs at Hasselt University. Because the data is anonymized, we only have a number for each product, and each basket is a set of such numbers. The data file is available from several online sources (including the book's companion website) as retail.dat.
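As a starting point, the file could be read into a list of baskets along the following lines. This is a minimal sketch that assumes one transaction per line with product numbers separated by whitespace; the text above does not spell out the file layout, so check it against your copy of retail.dat. The helper names `load_baskets` and `item_counts` are hypothetical.

```python
from collections import Counter


def load_baskets(path):
    """Read a transactions file into a list of frozensets of product numbers.

    Assumed layout: one basket per line, integer product IDs separated by
    whitespace. Blank lines are skipped.
    """
    baskets = []
    with open(path) as f:
        for line in f:
            tokens = line.split()
            if tokens:
                baskets.append(frozenset(int(tok) for tok in tokens))
    return baskets


def item_counts(baskets):
    """Count in how many baskets each product number appears."""
    counts = Counter()
    for basket in baskets:
        counts.update(basket)
    return counts


# Example usage (assumes retail.dat is in the working directory):
# baskets = load_baskets("retail.dat")
# print(len(baskets), "baskets")
# print(item_counts(baskets).most_common(5))
```

Storing each basket as a set matches the description above and makes membership tests cheap; the per-product counts give the baseline purchase frequencies that any "bought together" statistic has to be compared against.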
