10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee


3 Algorithms and Techniques

[Fig. 3.10. SPAM S-Step join: the bitmap of {a} is transformed by the S-step and ANDed with the bitmap of {b}, giving Sup{ab} = 2]

The main drawback of SPAM is its huge memory consumption. For example, even if an item α does not occur in a sequence s, SPAM still uses one bit to represent the existence of α in s. This disadvantage keeps SPAM from being the best algorithm for mining large datasets in resource-limited environments.

PrefixSpan

PrefixSpan was proposed in [201]. The main idea of the PrefixSpan algorithm is to apply database projection, which makes the database smaller for the next iteration and improves performance. The authors claimed that PrefixSpan has no need for candidate generation [201].¹ It recursively projects the database by the short patterns already found. Different projection methods were introduced, i.e., level-by-level projection, bi-level projection, and pseudo projection.

The workflow of PrefixSpan is as follows. Assume that the items within transactions are sorted in alphabetical order (this does not affect the discovered patterns). As in other algorithms, the first step of PrefixSpan is to scan the database to get the 1-length patterns. Then the original database is projected into different partitions, one per frequent 1-length pattern, taking the corresponding pattern as the prefix. For example, Figure 3.11 (b) shows the projected databases with the frequent (or large) 1-length patterns as their prefixes. The next step is to scan the projected database of γ, where γ can be any one of the 1-length patterns. After the scan, we obtain the frequent 1-length patterns in the projected database. These patterns, combined with their common prefix γ, are deemed 2-length patterns. The process is executed recursively: the projected database is partitioned by the k-length patterns to find the (k+1)-length patterns, until the projected database is empty or no more frequent patterns can be found.

The strategy just introduced is named level-by-level projection. The main computation cost is the time and space used in constructing and scanning the projected databases, as shown

¹ However, some works (e.g., [263, 264]) have found that PrefixSpan also needs to test candidates, namely those that exist in the projected database.
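The level-by-level projection described above can be illustrated with a minimal Python sketch. It is a simplification, not the implementation from [201]: it assumes each sequence is a list of single items rather than a list of itemsets, it physically copies every projected database (no bi-level or pseudo projection), and the names `prefixspan`, `mine`, `toy_db`, and `min_support` are hypothetical.

```python
from collections import Counter


def prefixspan(database, min_support):
    """Return {pattern: support} for frequent sequential patterns.

    Assumption: every sequence is a list of single items (no itemsets),
    and a pattern is a tuple of items.
    """
    results = {}

    def mine(prefix, projected):
        # Scan the projected database: count each item once per sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            # A frequent item joined with the common prefix is a longer pattern.
            pattern = prefix + (item,)
            results[pattern] = support

            # Project on the new pattern: keep what follows the first
            # occurrence of the item in each suffix, dropping empty suffixes.
            next_db = []
            for suffix in projected:
                if item in suffix:
                    rest = suffix[suffix.index(item) + 1:]
                    if rest:
                        next_db.append(rest)

            # Recurse until the projected database is empty.
            if next_db:
                mine(pattern, next_db)

    mine((), [tuple(seq) for seq in database])
    return results


# Hypothetical toy database, used only for illustration.
toy_db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
patterns = prefixspan(toy_db, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

On this toy database with a minimum support of 2, the sketch finds the three 1-length patterns and the 2-length patterns (a, b), (a, c), and (b, c); the pattern (a, b, c) occurs in only one sequence and is pruned. Copying each projected database is the time and space cost the text refers to.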
