Web Mining and Social Networking: Techniques and ... - tud.ttu.ee
Web Mining and Social Networking: Techniques and ... - tud.ttu.ee
Web Mining and Social Networking: Techniques and ... - tud.ttu.ee
- No tags were found...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
3.1 Association Rule <strong>Mining</strong> 39Table 3.5. AprioriAll C<strong>and</strong>idate Generation L 3 to C 4Large 4-sequences C<strong>and</strong>idate 5-sequences〈b(ac)d〉〈(bc)(ac)d〉〈bcad〉〈d(bc)ad〉〈bdad〉〈d(bc)da〉〈bdcd〉〈d(bc)(ad)〉〈(bc)ad〉〈(bc)(ac)〉〈(bc)cd〉〈c(ac)d〉〈d(ac)d〉〈dbad〉〈d(bc)a〉〈d(bc)d〉〈dcad〉previous pass, different order of combination make different c<strong>and</strong>idates. For example: fromthe items, a <strong>and</strong> b, thr<strong>ee</strong> c<strong>and</strong>idates 〈ab〉, 〈ba〉 <strong>and</strong> 〈(ab)〉 can be generated. But in associationrule mining only 〈(ab)〉 is generated. The reason is that in association rule mining, the timeorder is not taken into account. Obviously the number of c<strong>and</strong>idate sequences in sequentialpattern mining is much larger than the size of the c<strong>and</strong>idate itemsets in association rule miningduring the generation of c<strong>and</strong>idate sequences. Table 3.5 shows how to generate c<strong>and</strong>idate5-sequences by joining large 4-sequences. By scanning the large 4-itemsets, it finds that thefirst itemsets 〈(bc)ad〉 <strong>and</strong> second itemsets 〈(bc)(ac)〉 share their first thr<strong>ee</strong> items, accordingto the join condition of Apriori they are joined to produce the c<strong>and</strong>idate sequence 〈(bc)(ac)d〉.Similarly other c<strong>and</strong>idate 5-sequences are generated.The check process is simple <strong>and</strong> straightforward. Scan the database to count the supportsof those c<strong>and</strong>idate sequences <strong>and</strong> then the frequent sequential patterns can be found.AprioriAll is a the first algorithm on mining sequential patterns <strong>and</strong> its core idea is commonlyapplied by many later algorithms. The disadvantages of AprioriAll are that too manyc<strong>and</strong>idates are generated <strong>and</strong> multiple passes over the database are necessary <strong>and</strong> thus, the costof computation is high.GSPSrikant <strong>and</strong> Agrawal generalized the definition of sequential pattern mining problem in[232] by incorporating some new properties, i.e., time constraints, transaction relaxation, <strong>and</strong>taxonomy. For the time constraints, the maximum gap <strong>and</strong> the minimal gap are defined tospecified the gap betw<strong>ee</strong>n any two adjacent transactions in the sequence. When testing a c<strong>and</strong>idate,if any gap of the c<strong>and</strong>idate falls out of the range betw<strong>ee</strong>n the maximum gap <strong>and</strong> theminimal gap, then the c<strong>and</strong>idate is not a pattern. Furthermore, the authors relaxed the definitionof transaction by using a sliding window, that when the time range betw<strong>ee</strong>n two items issmaller than the sliding window, these two items are considered to be in the same transaction.The taxonomy is used to generate multiple level sequential patterns.