10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

SHOW MORE
SHOW LESS
  • No tags were found...

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

3.1 Association Rule <strong>Mining</strong> 37projection-based pattern growth. More details <strong>and</strong> survey on sequential pattern mining can befound in [249, 172].Sequential Pattern <strong>Mining</strong> ProblemLet I = {i 1 ,i 2 ,...,i k } be a set of items. A subset of I is called an itemset or an element. Asequence, s, is denoted as 〈t 1 ,t 2 ,...,t l 〉, where t j is an itemset, i.e., (t j ⊆ I) for 1 ≤ j ≤ l. Theitemset, t j , is denoted as (x 1 x 2 ...x m ), where x k is an item, i.e., x k ∈ I for 1 ≤ k ≤ m. Forbrevity, the brackets are omitted if an itemset has only one item. That is, itemset (x) is writtenas x. The number of items in a sequence is called the length of the sequence. A sequencewith length l is called an l-sequence. A sequence, s a = 〈a 1 ,a 2 ,...,a n 〉, is contained in anothersequence, s b = 〈b 1 ,b 2 ,...,b m 〉, if there exists integers 1 ≤ i 1 < i 2 < ... < i n ≤ m, such thata 1 ⊆ b i1 , a 2 ⊆ b i2 ,..., a n ⊆ b in . We denote s a a subsequence of s b , <strong>and</strong> s b a supersequence ofs a . Given a sequence s = 〈s 1 ,s 2 ,...,s l 〉, <strong>and</strong> an item α, s ⋄ α denotes that s concatenates withα, which has two possible forms, such as Itemset Extension ( IE), s⋄α=〈s 1 ,s 2 ,...,s l ∪{α}〉,or Sequence Extension ( SE), s ⋄ α=〈s 1 ,s 2 ,...,s l ,{α}〉. Ifs ′ = p ⋄ s, then p is a pre f ix of s ′<strong>and</strong> s is a su f f ix of s ′ .A sequence database, S, is a set of tuples 〈sid,s〉, where sid is a sequence id <strong>and</strong> s isa sequence. A tuple 〈sid,s〉 is said to contain a sequence β, ifβ is a subsequence of s. Thesupport of a sequence, β, in a sequence database, S, is the number of tuples in the databasecontaining β, denoted as support(β). Given a user specified positive integer, ε, a sequence,β, is called a frequent sequential pattern if support(β) ≥ ε.Existing Sequential Pattern <strong>Mining</strong> AlgorithmsSequential pattern mining algorithms can be grouped into two categories. One category isApriori-like algorithm, such as Apriori-all [11], GSP [232], SPADE [269], <strong>and</strong> SPAM [14],the other category is projection-based pattern growth, such as PrefixSpan [202].AprioriALLSequential pattern mining was first introduced by Agrawal in [11] where thr<strong>ee</strong> Apriori basedalgorithms were proposed. Given the transaction database with thr<strong>ee</strong> attributes customer-id,transaction-time <strong>and</strong> purchased-items, the mining process were decomposed into five phases:• Sort Phase: The original transaction database is sorted based on the customer <strong>and</strong> transactiontime. Figure 3.3 shows the sorted transaction data.• L-itemsets Phase: The sorted database is first scanned to obtain those frequent (or large)1-itemsets based on the user specified support threshold. Suppose the minimal support is70%. In this case the minimal support count is 2, <strong>and</strong> the result of large 1-itemsets is listedin Figure 3.4.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!