30.12.2013 Views

Data Warehousing and Data Mining Announcements (Tue. Dec. 3 ...


The Apriori algorithm

Multiple passes over the transactions

Pass k finds all frequent k-itemsets (i.e., itemsets of size k)

Use the set of frequent k-itemsets found in pass k to construct candidate (k+1)-itemsets to be counted in pass k+1

A (k+1)-itemset is a candidate only if all its subsets of size k are frequent
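The level-wise loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the lecture: the function name `apriori`, the `min_count` parameter (an absolute support threshold), and the toy grocery baskets are all introduced here for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset.

    Hypothetical helper: `min_count` is an absolute support threshold.
    """
    transactions = [frozenset(t) for t in transactions]

    def count_and_filter(candidates):
        # Count how many transactions contain each candidate; keep the frequent ones.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    # Pass 1: every single item is a candidate 1-itemset.
    items = {item for t in transactions for item in t}
    frequent_k = count_and_filter({frozenset([item]) for item in items})
    all_frequent = dict(frequent_k)

    k = 1
    while frequent_k:
        # Join pairs of frequent k-itemsets into (k+1)-itemsets, keeping a
        # candidate only if all of its size-k subsets are frequent.
        candidates = set()
        for a, b in combinations(frequent_k, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in frequent_k for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Pass k+1: count the candidates and keep the frequent ones.
        frequent_k = count_and_filter(candidates)
        all_frequent.update(frequent_k)
        k += 1
    return all_frequent


# Toy data (hypothetical, not from the slides).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

In this toy run the 3-itemset {bread, butter, milk} is never counted, because its subset {butter, milk} occurs in only one basket and is therefore not frequent.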


Example: pass 1

Transactions (minimum support = 20%, i.e., at least 2 of the 10 transactions)

| TID | items |
|---|---|
| T001 | A, B, E |
| T002 | B, D |
| T003 | B, C |
| T004 | A, B, D |
| T005 | A, C |
| T006 | B, C |
| T007 | A, C |
| T008 | A, B, C, E |
| T009 | A, B, C |
| T010 | F |

Frequent 1-itemsets (itemset {F} is infrequent)

| itemset | count |
|---|---|
| {A} | 6 |
| {B} | 7 |
| {C} | 6 |
| {D} | 2 |
| {E} | 2 |
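The counts in pass 1 can be reproduced with a short Python sketch. The variable names and the use of `collections.Counter` are choices made for this illustration; only the transactions and the 20% threshold come from the slide.

```python
from collections import Counter

# Transactions T001..T010 as listed on the slide.
transactions = [
    {"A", "B", "E"},
    {"B", "D"},
    {"B", "C"},
    {"A", "B", "D"},
    {"A", "C"},
    {"B", "C"},
    {"A", "C"},
    {"A", "B", "C", "E"},
    {"A", "B", "C"},
    {"F"},
]
MIN_SUPPORT_PCT = 20  # minimum support as a percentage of all transactions

# Pass 1: one scan, counting every individual item.
item_counts = Counter(item for t in transactions for item in t)

# Keep items whose support reaches the threshold (integer arithmetic avoids
# floating-point rounding in the percentage comparison).
frequent_1 = {
    item: n
    for item, n in sorted(item_counts.items())
    if n * 100 >= MIN_SUPPORT_PCT * len(transactions)
}
print(frequent_1)  # {'A': 6, 'B': 7, 'C': 6, 'D': 2, 'E': 2}
```

Item F appears in a single transaction (10% support), so it falls below the threshold and is dropped.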


Example: pass 2

Transactions: T001 to T010 as listed in pass 1; minimum support = 20%

Frequent 1-itemsets (from pass 1): {A} 6, {B} 7, {C} 6, {D} 2, {E} 2

Generate candidates, then scan and count: candidate 2-itemsets

| itemset | count |
|---|---|
| {A,B} | 4 |
| {A,C} | 4 |
| {A,D} | 1 |
| {A,E} | 2 |
| {B,C} | 4 |
| {B,D} | 2 |
| {B,E} | 2 |
| {C,D} | 0 |
| {C,E} | 1 |
| {D,E} | 0 |

Check min. support: frequent 2-itemsets

| itemset | count |
|---|---|
| {A,B} | 4 |
| {A,C} | 4 |
| {A,E} | 2 |
| {B,C} | 4 |
| {B,D} | 2 |
| {B,E} | 2 |
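The three steps of pass 2 (generate candidates, scan and count, check min. support) can be sketched as below. The names `candidates`, `counts`, and `MIN_COUNT` are assumptions for this illustration; `MIN_COUNT = 2` is the 20% threshold applied to the 10 transactions.

```python
from itertools import combinations

# Transactions T001..T010 as listed on the slide.
transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"F"},
]
frequent_1 = ["A", "B", "C", "D", "E"]  # result of pass 1
MIN_COUNT = 2  # 20% of 10 transactions

# Generate candidates: every pair of frequent items. All size-1 subsets of
# such a pair are frequent by construction, so nothing is pruned here.
candidates = [frozenset(pair) for pair in combinations(frequent_1, 2)]

# Scan and count: number of transactions that contain each candidate.
counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}

# Check min. support.
frequent_2 = {c: n for c, n in counts.items() if n >= MIN_COUNT}

for c, n in counts.items():
    status = "frequent" if c in frequent_2 else "infrequent"
    print("".join(sorted(c)), n, status)
```

Because F was already eliminated in pass 1, no pair containing F is ever counted; that saving is the point of building candidates only from frequent itemsets.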


Example: pass 3

Transactions: T001 to T010 as listed in pass 1; minimum support = 20%

Frequent 2-itemsets (from pass 2): {A,B} 4, {A,C} 4, {A,E} 2, {B,C} 4, {B,D} 2, {B,E} 2

Generate candidates, then scan and count: candidate 3-itemsets

| itemset | count |
|---|---|
| {A,B,C} | 2 |
| {A,B,E} | 2 |

Check min. support: frequent 3-itemsets

| itemset | count |
|---|---|
| {A,B,C} | 2 |
| {A,B,E} | 2 |
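Only two 3-itemsets reach the counting step in pass 3, and the sketch below shows why the others are pruned. The slides do not say how the join is carried out, so this example assumes a simple union-based join (any two frequent 2-itemsets whose union has three items); the names `joined`, `missing`, and `candidates` are illustrative.

```python
from itertools import combinations

# Transactions T001..T010 as listed on the slide.
transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"F"},
]
frequent_2 = {frozenset(p) for p in ["AB", "AC", "AE", "BC", "BD", "BE"]}

# Join step: unions of two frequent 2-itemsets that have exactly three items.
joined = {a | b for a, b in combinations(frequent_2, 2) if len(a | b) == 3}

# Prune step: record which 2-subsets of each joined itemset are not frequent.
missing = {
    c: {frozenset(s) for s in combinations(c, 2)} - frequent_2
    for c in joined
}
candidates = {c for c, gaps in missing.items() if not gaps}

for c in sorted(joined, key=sorted):
    gaps = sorted("".join(sorted(g)) for g in missing[c])
    verdict = "kept" if c in candidates else f"pruned, infrequent subsets: {gaps}"
    print("".join(sorted(c)), verdict)

# Scan and count only the surviving candidates.
counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
```

For instance, {A,C,E} is pruned without being counted because its subset {C,E} was found infrequent in pass 2.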


Example: pass 4

Transactions: T001 to T010 as listed in pass 1; minimum support = 20%

Frequent 3-itemsets (from pass 3): {A,B,C} 2, {A,B,E} 2

Generate candidates: there are no candidate 4-itemsets. The only 4-itemset that can be formed from {A,B,C} and {A,B,E} is {A,B,C,E}, and it is not a candidate because its subsets {A,C,E} and {B,C,E} are not frequent.

No more itemsets to count!

Example: final answer

Frequent 1-itemsets

| itemset | count |
|---|---|
| {A} | 6 |
| {B} | 7 |
| {C} | 6 |
| {D} | 2 |
| {E} | 2 |

Frequent 2-itemsets

| itemset | count |
|---|---|
| {A,B} | 4 |
| {A,C} | 4 |
| {A,E} | 2 |
| {B,C} | 4 |
| {B,D} | 2 |
| {B,E} | 2 |

Frequent 3-itemsets

| itemset | count |
|---|---|
| {A,B,C} | 2 |
| {A,B,E} | 2 |
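As a sanity check on the final answer, the frequent itemsets can also be found by brute force. This sketch is not Apriori: it enumerates every non-empty subset of the six items (63 in total), which is feasible only because the example is tiny. The variable names are assumptions for this illustration.

```python
from itertools import combinations

# Transactions T001..T010 as listed on the slide.
transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}, {"F"},
]
MIN_COUNT = 2  # 20% of 10 transactions
items = sorted(set().union(*transactions))  # ['A', 'B', 'C', 'D', 'E', 'F']

# Count the support of every possible itemset and keep the frequent ones.
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        n = sum(1 for t in transactions if set(combo) <= t)
        if n >= MIN_COUNT:
            frequent["".join(combo)] = n

# Group the frequent itemsets by size for comparison with the tables.
by_size = {}
for name in frequent:
    by_size.setdefault(len(name), []).append(name)
print(by_size)
```

The exhaustive search finds the same 13 frequent itemsets (five of size 1, six of size 2, two of size 3) and none of size 4 or larger, which agrees with the level-wise result.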
