10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

3.1 Association Rule <strong>Mining</strong> 39Table 3.5. AprioriAll C<strong>and</strong>idate Generation L 3 to C 4Large 4-sequences C<strong>and</strong>idate 5-sequences〈b(ac)d〉〈(bc)(ac)d〉〈bcad〉〈d(bc)ad〉〈bdad〉〈d(bc)da〉〈bdcd〉〈d(bc)(ad)〉〈(bc)ad〉〈(bc)(ac)〉〈(bc)cd〉〈c(ac)d〉〈d(ac)d〉〈dbad〉〈d(bc)a〉〈d(bc)d〉〈dcad〉previous pass, different order of combination make different c<strong>and</strong>idates. For example: fromthe items, a <strong>and</strong> b, thr<strong>ee</strong> c<strong>and</strong>idates 〈ab〉, 〈ba〉 <strong>and</strong> 〈(ab)〉 can be generated. But in associationrule mining only 〈(ab)〉 is generated. The reason is that in association rule mining, the timeorder is not taken into account. Obviously the number of c<strong>and</strong>idate sequences in sequentialpattern mining is much larger than the size of the c<strong>and</strong>idate itemsets in association rule miningduring the generation of c<strong>and</strong>idate sequences. Table 3.5 shows how to generate c<strong>and</strong>idate5-sequences by joining large 4-sequences. By scanning the large 4-itemsets, it finds that thefirst itemsets 〈(bc)ad〉 <strong>and</strong> second itemsets 〈(bc)(ac)〉 share their first thr<strong>ee</strong> items, accordingto the join condition of Apriori they are joined to produce the c<strong>and</strong>idate sequence 〈(bc)(ac)d〉.Similarly other c<strong>and</strong>idate 5-sequences are generated.The check process is simple <strong>and</strong> straightforward. Scan the database to count the supportsof those c<strong>and</strong>idate sequences <strong>and</strong> then the frequent sequential patterns can be found.AprioriAll is a the first algorithm on mining sequential patterns <strong>and</strong> its core idea is commonlyapplied by many later algorithms. The disadvantages of AprioriAll are that too manyc<strong>and</strong>idates are generated <strong>and</strong> multiple passes over the database are necessary <strong>and</strong> thus, the costof computation is high.GSPSrikant <strong>and</strong> Agrawal generalized the definition of sequential pattern mining problem in[232] by incorporating some new properties, i.e., time constraints, transaction relaxation, <strong>and</strong>taxonomy. For the time constraints, the maximum gap <strong>and</strong> the minimal gap are defined tospecified the gap betw<strong>ee</strong>n any two adjacent transactions in the sequence. When testing a c<strong>and</strong>idate,if any gap of the c<strong>and</strong>idate falls out of the range betw<strong>ee</strong>n the maximum gap <strong>and</strong> theminimal gap, then the c<strong>and</strong>idate is not a pattern. Furthermore, the authors relaxed the definitionof transaction by using a sliding window, that when the time range betw<strong>ee</strong>n two items issmaller than the sliding window, these two items are considered to be in the same transaction.The taxonomy is used to generate multiple level sequential patterns.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!