Web Mining and Social Networking: Techniques and ... - tud.ttu.ee




3 Algorithms and Techniques

Customer ID   Transaction Time   Items Bought
100           July 3 '07         apple
100           July 6 '07         strawberry
100           July 8 '07         banana, strawberry
100           July 10 '07        pear
100           July 12 '07        apple, banana, strawberry
100           July 16 '07        apple
100           July 21 '07        pear
200           July 4 '07         banana
200           July 7 '07         strawberry, pear
200           July 9 '07         apple
200           July 10 '07        strawberry
200           July 15 '07        banana, pear
300           July 13 '07        pear
300           July 15 '07        banana, strawberry
300           July 21 '07        apple, strawberry
300           July 24 '07        strawberry, pear

Fig. 3.3. Database Sorted by Customer ID and Transaction Time

Large Itemsets   Mapped To
apple            a
banana           b
strawberry       c
pear             d

Fig. 3.4. Large Itemsets

Customer ID   Customer Sequence
100           < ac(bc)d(abc)ad >
200           < b(cd)ac(bd) >
300           < d(bc)(ac)(cd) >

Fig. 3.5. Transformed Database

• Transformation Phase: All the large itemsets are mapped into a series of integers and the original database is converted by replacing each itemset with its mapped value. For example, applying the mapping table in Figure 3.4 yields the transformed database shown in Figure 3.5.
• Sequence Phase: Mine the transformed database and find all frequent sequential patterns.
• Maximal Phase: Prune those patterns that are contained in other sequential patterns. In other words, only the maximal sequential patterns remain.

Since most of the phases are straightforward, research has focused on the sequence phase. AprioriAll [11] was the first algorithm proposed; it is based on the Apriori algorithm in association rule mining [9]. Mining sequential patterns takes two steps: candidate generation and test. The candidate generation process is similar to AprioriGen in [9]. The Apriori property is applied to prune those candidate sequences that have an infrequent subsequence. The difference is that when the authors generate the candidate by joining the frequent patterns in the
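
The transformation phase, the containment test behind support counting, and the pruning of the maximal phase can be sketched in Python as follows. This is a minimal illustration built on the data of Figures 3.3 and 3.4, not the AprioriAll implementation from [11]. The function names (`transform`, `render`, `contains`, `support`, `maximal`) are hypothetical, and two simplifications are assumed: transaction times are encoded as the day of July '07, and items are mapped to the letters of Figure 3.4.

```python
from collections import defaultdict

# Rows of Fig. 3.3 as (customer ID, day of July '07, items bought).
# Encoding the date as a bare day number is an assumption of this sketch.
TRANSACTIONS = [
    (100, 3, ["apple"]), (100, 6, ["strawberry"]),
    (100, 8, ["banana", "strawberry"]), (100, 10, ["pear"]),
    (100, 12, ["apple", "banana", "strawberry"]),
    (100, 16, ["apple"]), (100, 21, ["pear"]),
    (200, 4, ["banana"]), (200, 7, ["strawberry", "pear"]),
    (200, 9, ["apple"]), (200, 10, ["strawberry"]),
    (200, 15, ["banana", "pear"]),
    (300, 13, ["pear"]), (300, 15, ["banana", "strawberry"]),
    (300, 21, ["apple", "strawberry"]), (300, 24, ["strawberry", "pear"]),
]

# Mapping table of Fig. 3.4.
ITEM_MAP = {"apple": "a", "banana": "b", "strawberry": "c", "pear": "d"}


def transform(transactions, item_map):
    """Transformation phase: one time-ordered list of itemsets per customer."""
    sequences = defaultdict(list)
    for customer, _day, items in sorted(transactions, key=lambda t: (t[0], t[1])):
        sequences[customer].append(frozenset(item_map[i] for i in items))
    return dict(sequences)


def render(sequence):
    """Format a sequence in the notation of Fig. 3.5."""
    parts = []
    for itemset in sequence:
        text = "".join(sorted(itemset))
        parts.append(text if len(itemset) == 1 else f"({text})")
    return "< " + "".join(parts) + " >"


def contains(sequence, pattern):
    """True if the itemsets of pattern match, in order, subsets of distinct
    elements of sequence (greedy earliest match)."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True


def support(sequences, pattern):
    """Number of customer sequences that contain pattern."""
    return sum(contains(seq, pattern) for seq in sequences.values())


def maximal(patterns):
    """Maximal phase: drop every pattern contained in a different pattern."""
    return [p for p in patterns
            if not any(p != q and contains(q, p) for q in patterns)]


if __name__ == "__main__":
    db = transform(TRANSACTIONS, ITEM_MAP)
    for customer, seq in db.items():
        print(customer, render(seq))
    print(support(db, [frozenset("b"), frozenset("a")]))
```

Run as a script, this prints the three customer sequences of Figure 3.5, followed by the support of the pattern < ba >, which all three customers contain. Candidate generation, the part of the sequence phase the text goes on to describe, is deliberately left out of the sketch.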
