10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

3.1 Association Rule <strong>Mining</strong> 33{apple,pear,strawberry}{apple,banana,pear,strawberry}Intersect {apple,pear} & {pear,strawberry}{apple,pear} {pear,strawberry}{apple,banana,pear} {apple,banana,strawberry} {apple,pear,strawberry} {banana,pear,strawberry}{apple,banana} {apple,pear} {apple,strawberry} {banana,pear} {banana,strawberry} {pear,strawberry}Intersect {pear} & {strawberry}{apple} {banana} {pear} {strawberry}{apple} {banana} {pear} {strawberry}{ }Original dataset in vertical formatFig. 3.1. Eclat mining process (vertical dataset, support count via intersection)the frequent itemset mining algorithm. Because the cost on finding frequent itemsetsis high <strong>and</strong> accounts for most of the whole performance on discovering associaterules, almost all research so far has b<strong>ee</strong>n focused on the frequent itemsetgeneration stage.EclatThere are many algorithms had b<strong>ee</strong>n proposed based on Apriroi idea, in which Eclat[270, 268] is distinct that it is the first one which proposed to generate all frequentitemsets in a depth-first manner, while employs the vertical database layout <strong>and</strong> usesthe intersection based approach to compute the support of an itemset.Figure 3.1 illustrates the key idea of Eclat on c<strong>and</strong>idate support counting. Whilefirst scanning of the dataset, it converts the original format (i.e., Table 3.1) into verticalTID list format, as shown in Figure 3.1. For example, the TID list of itemset{apple} is {100, 200, 500}, <strong>and</strong> indicates the transactions that the itemset exist inthe original dataset.To count the support of k-c<strong>and</strong>idate itemset, the algorithm intersects its two (k-1)-subset to get the result. For example, as shown in Figure 3.1, to count the supportof the itemset {pear,strawberry}, it intersects the TID lists of {pear} <strong>and</strong>{strawberry}, resulting in {200, 300}. The support is therefore 2.To reduce the memory used to count the support, Eclat proposed to traverse thelattice (as shown in Figure 3.1) in a depth-first manner. The pseudo code of the Eclatalgorithm is presented in Algorithm 3.2.Algorithm 3.2: Eclat - Frequent Itemset <strong>Mining</strong>Input: A transaction database D, a user specified threshold min sup , a set of atoms of

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!