28.11.2012 Views

Chapter 06 - Changing Education Paradigm

CHAPTER 6. MINING ASSOCIATION RULES IN LARGE DATABASES

1-var Constraint              Anti-Monotone   Succinct
S θ v, θ ∈ {=, ≤, ≥}          yes             yes
v ∈ S                         no              yes
S ⊇ V                         no              yes
S ⊆ V                         yes             yes
S = V                         partly          yes
min(S) ≤ v                    no              yes
min(S) ≥ v                    yes             yes
min(S) = v                    partly          yes
max(S) ≤ v                    yes             yes
max(S) ≥ v                    no              yes
max(S) = v                    partly          yes
count(S) ≤ v                  yes             weakly
count(S) ≥ v                  no              weakly
count(S) = v                  partly          weakly
sum(S) ≤ v                    yes             no
sum(S) ≥ v                    no              no
sum(S) = v                    partly          no
avg(S) θ v, θ ∈ {=, ≤, ≥}     no              no
(frequency constraint)        (yes)           (no)

Table 6.3: Characterization of 1-variable constraints: anti-monotonicity and succinctness.

either. This property is used at each iteration of the Apriori algorithm to reduce the number of candidate itemsets examined, thereby reducing the search space for association rules.
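The pruning step described above can be sketched roughly as follows. This is a simplified illustration, not the algorithm as the text specifies it: it enumerates combinations of items instead of performing Apriori's self-join, and the function name `prune_candidates` and the toy itemsets are assumptions made for the example.

```python
from itertools import combinations

def prune_candidates(frequent_prev, k):
    """Build size-k candidates from the frequent (k-1)-itemsets, keeping only
    those whose every (k-1)-subset is itself frequent."""
    frequent_prev = {frozenset(s) for s in frequent_prev}
    items = sorted(set().union(*frequent_prev)) if frequent_prev else []
    candidates = set()
    for combo in combinations(items, k):
        # Anti-monotonicity of support: one infrequent subset rules out the superset.
        if all(frozenset(sub) in frequent_prev
               for sub in combinations(combo, k - 1)):
            candidates.add(frozenset(combo))
    return candidates

# Hypothetical frequent 2-itemsets; {b, d} and {c, d} are assumed infrequent.
frequent_2 = [{"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "d"}]
candidates_3 = prune_candidates(frequent_2, 3)
print(sorted(sorted(c) for c in candidates_3))
```

Under these assumed inputs only {a, b, c} survives, because every other 3-itemset contains a pair that is not frequent, so its support never needs to be counted.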

Other examples of anti-monotone constraints include "min(J.price) ≥ 500" and "S.year = 1998". Any itemset that violates either of these constraints can be discarded, since adding more items to such an itemset can never make it satisfy the constraint. A constraint such as "avg(I.price) ≤ 100" is not anti-monotone. For a given set that does not satisfy this constraint, a superset created by adding some (cheap) items may satisfy it. Hence, pushing this constraint inside the mining process will not guarantee completeness of the data mining query response. A list of 1-variable constraints, characterized by anti-monotonicity, is given in the second column of Table 6.3.
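To make the contrast concrete, the brute-force sketch below searches a small item universe for a pair S ⊂ T where S violates a constraint but T satisfies it. Such a pair is exactly what anti-monotonicity forbids. The prices and function names are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical item prices, chosen only for illustration.
PRICES = {"tv": 900, "radio": 150, "pen": 20, "cable": 40}

def min_price_at_least_500(itemset):
    return min(PRICES[i] for i in itemset) >= 500

def avg_price_at_most_100(itemset):
    return sum(PRICES[i] for i in itemset) / len(itemset) <= 100

def nonempty_subsets(items):
    items = sorted(items)
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def find_counterexample(constraint, items):
    """Return (S, T) with S a proper subset of T, S violating and T satisfying
    the constraint, or None if no such pair exists over `items`."""
    all_sets = list(nonempty_subsets(items))
    for s in all_sets:
        if constraint(s):
            continue
        for t in all_sets:
            if s < t and constraint(t):
                return s, t
    return None

print(find_counterexample(min_price_at_least_500, PRICES))
print(find_counterexample(avg_price_at_most_100, PRICES))
```

For the minimum-price constraint the search finds nothing, since adding items can only lower the minimum. For the average-price constraint it returns a violating set together with a superset that satisfies the constraint once a cheap item has been added.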

"What other kinds of constraints can we use for pruning the search space?" Apriori-like algorithms deal with other constraints by first generating candidate sets and then testing them for constraint satisfaction, thereby following a generate-and-test paradigm. Instead, is there a kind of constraint for which we can somehow enumerate all and only those sets that are guaranteed to satisfy the constraint? This property of constraints is called succinctness. If a rule constraint is succinct, then we can directly generate precisely those sets that satisfy it, even before support counting begins. This avoids the substantial overhead of the generate-and-test paradigm. In other words, such constraints are pre-counting prunable. Let's study an example of how succinct constraints can be used in mining association rules.

Example 6.8 Based on Table 6.3, the constraint "min(J.price) ≤ 500" is succinct. This is because we can explicitly and precisely generate all the sets of items satisfying the constraint. Specifically, such a set must contain at least one item whose price is at most $500. It is of the form S1 ∪ S2, where S1 ≠ ∅ is a subset of the set of all those items with prices ≤ $500, and S2, possibly empty, is a subset of the set of all those items with prices > $500. Because there is a precise "formula" to generate all the sets satisfying a succinct constraint, there is no need to iteratively check the rule constraint during the mining process.
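The S1 ∪ S2 construction for "min(J.price) ≤ 500" can be sketched as below and compared against a generate-and-test baseline. The catalogue, prices, and function names are assumptions introduced for the example.

```python
from itertools import chain, combinations

# Hypothetical catalogue; names and prices are illustrative assumptions.
PRICES = {"pen": 20, "radio": 150, "camera": 600, "tv": 900}

def powerset(items, min_size=0):
    items = sorted(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(min_size, len(items) + 1)
    )

def generate_min_at_most(prices, v):
    """Enumerate exactly the sets S1 ∪ S2, with S1 a non-empty subset of the
    items priced <= v and S2 any subset of the items priced > v."""
    cheap = [i for i, p in prices.items() if p <= v]
    costly = [i for i, p in prices.items() if p > v]
    for s1 in powerset(cheap, min_size=1):
        for s2 in powerset(costly):
            yield frozenset(s1) | frozenset(s2)

def generate_and_test(prices, v):
    """Baseline: enumerate every non-empty itemset, then test the constraint."""
    return {
        frozenset(s)
        for s in powerset(prices, min_size=1)
        if min(prices[i] for i in s) <= v
    }

direct = set(generate_min_at_most(PRICES, 500))
print(len(direct), direct == generate_and_test(PRICES, 500))
```

Both routes produce the same family of itemsets, but the direct generator never builds a set that fails the constraint, which is the saving that succinctness is meant to capture.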

What about the constraint "min(J.price) ≥ 500", which occurs in Example 6.7? This is also succinct, since we can generate all sets of items satisfying the constraint. In this case, we simply do not include items whose price is less than $500, since they cannot be in any set that would satisfy the given constraint. □
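For the constraint "min(J.price) ≥ 500", the same idea reduces to filtering the item list before any itemsets are formed. The following is a minimal sketch; the catalogue and the function name `generate_min_at_least` are hypothetical.

```python
from itertools import combinations

# Hypothetical catalogue; names and prices are illustrative assumptions.
PRICES = {"pen": 20, "radio": 150, "camera": 600, "tv": 900}

def generate_min_at_least(prices, v):
    """Enumerate every non-empty itemset whose minimum price is >= v by
    drawing only from the items priced >= v."""
    allowed = sorted(i for i, p in prices.items() if p >= v)
    for r in range(1, len(allowed) + 1):
        for combo in combinations(allowed, r):
            yield frozenset(combo)

satisfying = set(generate_min_at_least(PRICES, 500))
print(sorted(sorted(s) for s in satisfying))
```

With the assumed prices this yields {camera}, {camera, tv}, and {tv}. Items below the threshold are dropped once, up front, instead of being rejected repeatedly inside candidate sets.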

Note that a constraint such as "avg(I.price) ≤ 100" could not be pushed into the mining process, since it is neither anti-monotone nor succinct according to Table 6.3.

Although optimizations associated with succinctness (or anti-monotonicity) cannot be applied to constraints like "avg(I.price) ≤ 100", heuristic optimization strategies are applicable and can often lead to significant pruning.
