Data Warehousing and Data Mining Announcements (Tue. Dec. 3 ...
The Apriori algorithm
Multiple passes over the transactions
Pass k finds all frequent k-itemsets (i.e., itemsets of size k)
Use the set of frequent k-itemsets found in pass k to construct candidate (k+1)-itemsets to be counted in pass k+1
A (k+1)-itemset is a candidate only if all its subsets of size k are frequent
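The level-wise loop described above can be sketched in Python as follows. This is a minimal illustration, not the lecture's reference implementation: the names `apriori`, `count`, and `min_count` are assumptions, and the join step simply takes unions of two frequent k-itemsets rather than the prefix-based join used in optimized implementations.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset (frozenset): count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the transactions: count each candidate it contains,
        # then keep only the candidates that reach the minimum count.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Pass 1: every single item is a candidate.
    items = {item for t in transactions for item in t}
    frequent_k = count({frozenset([i]) for i in items})
    result = dict(frequent_k)

    k = 1
    while frequent_k:
        # Join: unions of two frequent k-itemsets that have size k + 1.
        joined = {a | b for a in frequent_k for b in frequent_k
                  if len(a | b) == k + 1}
        # Prune: keep a candidate only if all its size-k subsets are frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in frequent_k
                             for s in combinations(c, k))}
        if not candidates:
            break  # nothing left to count, so no further pass is needed
        frequent_k = count(candidates)  # pass k + 1
        result.update(frequent_k)
        k += 1
    return result
```

Calling `apriori([{"x", "y"}, {"x", "y"}, {"x"}, {"z"}], 2)` on this small made-up input returns `{x}` with count 3, `{y}` with count 2, and `{x, y}` with count 2; `{z}` is dropped in pass 1.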
Example: pass 1

Transactions (min. support = 20%, i.e., a count of at least 2 out of 10 transactions)
TID   items
T001  A, B, E
T002  B, D
T003  B, C
T004  A, B, D
T005  A, C
T006  B, C
T007  A, C
T008  A, B, C, E
T009  A, B, C
T010  F

Frequent 1-itemsets
itemset  count
{A}      6
{B}      7
{C}      6
{D}      2
{E}      2
(Itemset {F} is infrequent)
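Pass 1 amounts to a single scan that counts each item. A small sketch using the ten transactions of the example; the variable names are illustrative, and the threshold is written directly as a count (20% of 10 transactions is 2).

```python
from collections import Counter

# The ten transactions from the example.
transactions = {
    "T001": {"A", "B", "E"},
    "T002": {"B", "D"},
    "T003": {"B", "C"},
    "T004": {"A", "B", "D"},
    "T005": {"A", "C"},
    "T006": {"B", "C"},
    "T007": {"A", "C"},
    "T008": {"A", "B", "C", "E"},
    "T009": {"A", "B", "C"},
    "T010": {"F"},
}
min_count = 2  # 20% of the 10 transactions

# Pass 1: one scan, counting every individual item.
item_counts = Counter(item for items in transactions.values() for item in items)

frequent_1 = {item: n for item, n in item_counts.items() if n >= min_count}
infrequent_1 = {item: n for item, n in item_counts.items() if n < min_count}

print(sorted(frequent_1.items()))  # [('A', 6), ('B', 7), ('C', 6), ('D', 2), ('E', 2)]
print(infrequent_1)                # {'F': 1}
```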
Example: pass 2

Transactions (min. support = 20%)
TID   items
T001  A, B, E
T002  B, D
T003  B, C
T004  A, B, D
T005  A, C
T006  B, C
T007  A, C
T008  A, B, C, E
T009  A, B, C
T010  F

Frequent 1-itemsets
itemset  count
{A}      6
{B}      7
{C}      6
{D}      2
{E}      2

Generate candidates, then scan and count

Candidate 2-itemsets
itemset  count
{A,B}    4
{A,C}    4
{A,D}    1
{A,E}    2
{B,C}    4
{B,D}    2
{B,E}    2
{C,D}    0
{C,E}    1
{D,E}    0

Check min. support

Frequent 2-itemsets
itemset  count
{A,B}    4
{A,C}    4
{A,E}    2
{B,C}    4
{B,D}    2
{B,E}    2
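The three steps of pass 2 (generate candidates, scan and count, check min. support) might be written as below. This is a sketch with assumed variable names; because every 1-item subset of a pair is just one of its items, the candidates in this pass are simply all pairs of frequent items.

```python
from itertools import combinations

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"F"},
]
min_count = 2  # 20% of 10 transactions
frequent_items = ["A", "B", "C", "D", "E"]  # result of pass 1

# Generate candidates: every pair of frequent items (10 pairs).
candidates = [frozenset(pair) for pair in combinations(frequent_items, 2)]

# Scan and count: one pass over the transactions.
counts = dict.fromkeys(candidates, 0)
for t in transactions:
    for c in candidates:
        if c <= t:  # candidate is contained in the transaction
            counts[c] += 1

# Check min. support.
frequent_2 = {c: n for c, n in counts.items() if n >= min_count}

for c in candidates:
    status = "frequent" if c in frequent_2 else "dropped"
    print("".join(sorted(c)), counts[c], status)
```

The four dropped pairs are {A,D}, {C,D}, {C,E}, and {D,E}, matching the tables above.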
Example: pass 3

Transactions (min. support = 20%)
TID   items
T001  A, B, E
T002  B, D
T003  B, C
T004  A, B, D
T005  A, C
T006  B, C
T007  A, C
T008  A, B, C, E
T009  A, B, C
T010  F

Frequent 2-itemsets
itemset  count
{A,B}    4
{A,C}    4
{A,E}    2
{B,C}    4
{B,D}    2
{B,E}    2

Generate candidates, then scan and count

Candidate 3-itemsets
itemset  count
{A,B,C}  2
{A,B,E}  2

Check min. support

Frequent 3-itemsets
itemset  count
{A,B,C}  2
{A,B,E}  2
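Only two 3-itemsets are counted in pass 3 because of the subset condition. The sketch below makes the pruning visible by listing, for each 3-item union of two frequent 2-itemsets, which 2-item subsets are not frequent. The helper `name` and the simple union-based join are assumptions made for illustration.

```python
from itertools import combinations

# Frequent 2-itemsets from pass 2.
frequent_2 = {frozenset(s) for s in ("AB", "AC", "AE", "BC", "BD", "BE")}


def name(itemset):
    """Compact label for an itemset, e.g. {'B', 'A'} -> 'AB'."""
    return "".join(sorted(itemset))


# Join step: unions of two frequent 2-itemsets that contain exactly 3 items.
joined = {a | b for a, b in combinations(frequent_2, 2) if len(a | b) == 3}

candidates, pruned = [], {}
for itemset in joined:
    # 2-item subsets of this itemset that are NOT frequent.
    missing = [name(s) for s in combinations(sorted(itemset), 2)
               if frozenset(s) not in frequent_2]
    if missing:
        pruned[name(itemset)] = missing
    else:
        candidates.append(name(itemset))
candidates.sort()

print(candidates)  # ['ABC', 'ABE']
for itemset, missing in sorted(pruned.items()):
    print(itemset, "pruned, infrequent subsets:", missing)
```

For instance, {A,B,D} is never counted because {A,D} was dropped in pass 2, so the scan in pass 3 only has to count {A,B,C} and {A,B,E}.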
Example: pass 4

Transactions (min. support = 20%)
TID   items
T001  A, B, E
T002  B, D
T003  B, C
T004  A, B, D
T005  A, C
T006  B, C
T007  A, C
T008  A, B, C, E
T009  A, B, C
T010  F

Frequent 3-itemsets
itemset  count
{A,B,C}  2
{A,B,E}  2

Generate candidates

Candidate 4-itemsets
itemset  count
(none)

The only 4-itemset that could be formed, {A,B,C,E}, is not a candidate because its subsets {A,C,E} and {B,C,E} are not frequent.
No more itemsets to count!

Example: final answer

Frequent 1-itemsets
itemset  count
{A}      6
{B}      7
{C}      6
{D}      2
{E}      2

Frequent 2-itemsets
itemset  count
{A,B}    4
{A,C}    4
{A,E}    2
{B,C}    4
{B,D}    2
{B,E}    2

Frequent 3-itemsets
itemset  count
{A,B,C}  2
{A,B,E}  2
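As a cross-check of the final answer, the frequent itemsets can also be found by brute force, counting every non-empty subset of the six items. This is deliberately not Apriori (it counts all 63 subsets instead of pruning candidates); it is only a sketch, with assumed variable names, for confirming the tables above.

```python
from itertools import combinations

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"F"},
]
min_count = 2  # 20% of 10 transactions

items = sorted(set().union(*transactions))  # ['A', 'B', 'C', 'D', 'E', 'F']

# Brute force: count every non-empty subset of the items.
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        n = sum(1 for t in transactions if set(combo) <= t)
        if n >= min_count:
            frequent["".join(combo)] = n

for size in (1, 2, 3, 4):
    level = {k: v for k, v in frequent.items() if len(k) == size}
    print(size, level)
```

The output lists 5 frequent 1-itemsets, 6 frequent 2-itemsets, 2 frequent 3-itemsets, and no frequent 4-itemsets, in agreement with the passes above; {A,B,C,E} occurs only in T008.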