28.11.2012 Views

Chapter 06 - Changing Education Paradigm

Chapter 06 - Changing Education Paradigm

Chapter 06 - Changing Education Paradigm

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

30 CHAPTER 6. MINING ASSOCIATION RULES IN LARGE DATABASES<br />

Constraint-based mining allow users to focus the search for rules by providing metarules, (i.e., pattern templates)<br />

and additional mining constraints. Such mining is facilitated with the use of a declarative data mining<br />

query language and user interface, and poses great challenges for mining query optimization. In particular, the<br />

rule constraint properties of anti-monotonicity and succinctness can be used during mining to guide the<br />

process, leading to more e cient and e ective mining.<br />

Exercises<br />

1. The Apriori algorithm makes use of prior knowledge of subset support properties.<br />

(a) Prove that all non-empty subsets of a frequent itemset must also be frequent.<br />

(b) Prove that the support of any non-empty subset s 0 of itemset s must be as great as the support of s.<br />

(c) Given frequent itemset l and subset s of l, prove that the con dence of the rule \s 0 ) (l , s 0 )" cannot be<br />

more than the con dence of \s ) (l , s)", where s 0 is a subset of s.<br />

2. Section 6.2.2 describes a method for generating association rules from frequent itemsets. Propose a more<br />

e cient method. Explain why it is more e cient than the one proposed in Section 6.2.2. (Hint: Consider<br />

incorporating the properties of Question 1b and 1c into your design).<br />

3. Suppose we have the following transactional data.<br />

(INSERT TRANSACTIONAL DATA HERE).<br />

Assume that the minimum support and minimum con dence thresholds are 3% and 60%, respectively.<br />

(a) Find the set of frequent itemsets using the Apriori algorithm. Show the derivation of Ck and Lk for each<br />

iteration, k.<br />

(b) Generate strong association rules from the frequent itemsets found above.<br />

4. In Section 6.2.3, we studied two methods of scan reduction. Can you think of another approach, which trims<br />

transactions by removing items that do not contribute to frequent itemsets? Show the details of this approach<br />

in pseudo-code and with an example.<br />

5. Suppose that a large store has a transaction database that is distributed among four locations. Transactions in<br />

each component database have the same format, namely Tj : fi1;:::;img, where Tj is a transaction identi er,<br />

and ik (1 k m) is the identi er of an item purchased in the transaction. Propose an e cient algorithm to<br />

mine global association rules (without considering multilevel associations). You may present your algorithm in<br />

the form of an outline. Your algorithm should not require shipping all of the data to one site and should not<br />

cause excessive network communication overhead.<br />

6. Suppose that a data relation describing students at Big-University has been generalized to the following generalized<br />

relation R.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!