Chapter 06 - Changing Education Paradigm
Chapter 06 - Changing Education Paradigm
Chapter 06 - Changing Education Paradigm
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
30 CHAPTER 6. MINING ASSOCIATION RULES IN LARGE DATABASES<br />
Constraint-based mining allow users to focus the search for rules by providing metarules, (i.e., pattern templates)<br />
and additional mining constraints. Such mining is facilitated with the use of a declarative data mining<br />
query language and user interface, and poses great challenges for mining query optimization. In particular, the<br />
rule constraint properties of anti-monotonicity and succinctness can be used during mining to guide the<br />
process, leading to more e cient and e ective mining.<br />
Exercises<br />
1. The Apriori algorithm makes use of prior knowledge of subset support properties.<br />
(a) Prove that all non-empty subsets of a frequent itemset must also be frequent.<br />
(b) Prove that the support of any non-empty subset s 0 of itemset s must be as great as the support of s.<br />
(c) Given frequent itemset l and subset s of l, prove that the con dence of the rule \s 0 ) (l , s 0 )" cannot be<br />
more than the con dence of \s ) (l , s)", where s 0 is a subset of s.<br />
2. Section 6.2.2 describes a method for generating association rules from frequent itemsets. Propose a more<br />
e cient method. Explain why it is more e cient than the one proposed in Section 6.2.2. (Hint: Consider<br />
incorporating the properties of Question 1b and 1c into your design).<br />
3. Suppose we have the following transactional data.<br />
(INSERT TRANSACTIONAL DATA HERE).<br />
Assume that the minimum support and minimum con dence thresholds are 3% and 60%, respectively.<br />
(a) Find the set of frequent itemsets using the Apriori algorithm. Show the derivation of Ck and Lk for each<br />
iteration, k.<br />
(b) Generate strong association rules from the frequent itemsets found above.<br />
4. In Section 6.2.3, we studied two methods of scan reduction. Can you think of another approach, which trims<br />
transactions by removing items that do not contribute to frequent itemsets? Show the details of this approach<br />
in pseudo-code and with an example.<br />
5. Suppose that a large store has a transaction database that is distributed among four locations. Transactions in<br />
each component database have the same format, namely Tj : fi1;:::;img, where Tj is a transaction identi er,<br />
and ik (1 k m) is the identi er of an item purchased in the transaction. Propose an e cient algorithm to<br />
mine global association rules (without considering multilevel associations). You may present your algorithm in<br />
the form of an outline. Your algorithm should not require shipping all of the data to one site and should not<br />
cause excessive network communication overhead.<br />
6. Suppose that a data relation describing students at Big-University has been generalized to the following generalized<br />
relation R.