18.02.2015 Views

Berry


Market Basket Analysis and Association Rules

In other words, minimum support pruning eliminates items that do not appear in enough transactions. The threshold criterion applies to each step in the algorithm. The minimum threshold also implies that:

A and B must appear together in at least 10,000 transactions, and
A and C must appear together in at least 10,000 transactions, and
A and D must appear together in at least 10,000 transactions,
and so on.

Each step of the calculation of the co-occurrence table can eliminate combinations of items that do not meet the threshold, reducing its size and the number of combinations to consider during the next pass.
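
The pass-by-pass pruning described above can be sketched in a few lines of Python. This is a simplified illustration, not the book's implementation: the function name `frequent_itemsets`, its parameters, and the five toy transactions are all assumptions made up for the example. After each pass, only items that still appear in some surviving combination are carried into the next pass.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Count item combinations level by level, pruning below min_count.

    Hypothetical helper for illustration; returns {frozenset: count}.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items and drop those that are too rare.
    item_counts = Counter(item for t in transactions for item in t)
    survivors = {item for item, n in item_counts.items() if n >= min_count}
    result = {frozenset([item]): item_counts[item] for item in survivors}

    for size in range(2, max_size + 1):
        counts = Counter()
        for t in transactions:
            # Pruned items are ignored, so fewer combinations are generated.
            for combo in combinations(t & survivors, size):
                counts[frozenset(combo)] += 1

        # Keep only the combinations that meet the threshold.
        level = {s: n for s, n in counts.items() if n >= min_count}
        if not level:
            break
        result.update(level)

        # Only items in a surviving combination go on to the next pass.
        survivors = set().union(*level)

    return result


# Made-up toy data: item D appears once and is pruned in the first pass.
demo = frequent_itemsets(
    [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "D"}],
    min_count=2,
)
for itemset, count in sorted(demo.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In the toy data, the three-item combination A, B, C occurs only once, so it fails the threshold and the loop stops after the second pass.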

Figure 9.11 is an example of how the calculation takes place. In this example, choosing a minimum support level of 10 percent would eliminate all the combinations with three items, and their associated rules, from consideration. This is an example where pruning does not have an effect on the best rule, since the best rule has only two items. In the case of pizza, these toppings are all fairly common, so they are not pruned individually. If anchovies were included in the analysis, and only 15 of the 2,000 pizzas contained them, then a minimum support of 10 percent, or even 1 percent, would eliminate anchovies during the first pass.
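
The anchovy arithmetic can be checked directly. The counts (15 pizzas out of 2,000) come from the text; the helper name `passes_min_support` is a hypothetical one introduced only for this sketch.

```python
def passes_min_support(item_count, total_transactions, min_support):
    """Return True if the item's share of transactions meets the threshold."""
    return item_count / total_transactions >= min_support


# Figures from the text: 15 pizzas with anchovies out of 2,000.
anchovy_support = 15 / 2000  # 0.75 percent of all pizzas

keeps_at_10_percent = passes_min_support(15, 2000, 0.10)
keeps_at_1_percent = passes_min_support(15, 2000, 0.01)

print(f"support = {anchovy_support:.2%}")
print("survives 10% threshold:", keeps_at_10_percent)
print("survives 1% threshold:", keeps_at_1_percent)
```

At 0.75 percent, anchovies fall below both thresholds, so they would be dropped before any two-item combinations are counted.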

The best choice for minimum support depends on the data and the situation. It is also possible to vary the minimum support as the algorithm progresses. For instance, by using different levels at different stages, you can find uncommon combinations of common items (by decreasing the support level for successive steps) or relatively common combinations of uncommon items (by increasing the support level).

The Problem of Big Data<br />

A typical fast food restaurant offers several dozen items on its menu, say 100. To use probabilities to generate association rules, counts have to be calculated for each combination of items. The number of combinations of a given size tends to grow exponentially. A combination with three items might be a small fries, cheeseburger, and medium Diet Coke. On a menu with 100 items, how many combinations are there with three different menu items? There are 161,700! This calculation is based on the binomial formula:

$$\binom{100}{3} = \frac{100!}{3!\,97!} = \frac{100 \times 99 \times 98}{3 \times 2 \times 1} = 161{,}700$$

On the other hand, a typical supermarket has at least 10,000 different items in stock, and more typically 20,000 or 30,000.
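
To get a feel for the scale, the combination counts can be computed with Python's standard `math.comb` (available from Python 3.8). The 100-item menu and the 10,000-item supermarket come from the text. The supermarket triple count is computed here and is not a figure quoted by the source, and the function name `count_combinations` is an assumption for this sketch.

```python
import math


def count_combinations(num_items, size):
    """Number of distinct combinations of `size` different items."""
    return math.comb(num_items, size)


# 100 menu items, combinations of three: the figure quoted in the text.
menu_triples = count_combinations(100, 3)

# 10,000 supermarket items, combinations of three (computed, not from the text).
supermarket_triples = count_combinations(10_000, 3)

print(f"menu triples:        {menu_triples:,}")
print(f"supermarket triples: {supermarket_triples:,}")
```

With 10,000 items, the three-item count is already about 1.7 × 10^11, which is why counting every combination directly is impractical at supermarket scale.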
