10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee


3.1 Association Rule Mining

SPAM

Ayres et al. [14] proposed the SPAM algorithm, which builds on the key idea of SPADE. The difference is that SPAM uses a bitmap representation of the database instead of the {SID, TID} pairs used in the SPADE algorithm. Hence, by employing bitwise operations, SPAM can perform much better than SPADE and other algorithms.

During the first scan of the database, a vertical bitmap is constructed for each item in the database, and each bitmap has one bit for each itemset (element) of the sequences in the database. If an item appears in an itemset, the bit for that itemset in the item's bitmap is set to one; otherwise, the bit is set to zero. The size of a sequence is the number of itemsets contained in the sequence. Figure 3.9 shows the vertical bitmap table for the database in Figure 3.5. A sequence in the database whose size is between 2^k + 1 and 2^(k+1) is treated as a 2^(k+1)-bit sequence. The bitmap of a sequence is constructed from the bitmaps of the items contained in it.

SID  TID  {a}  {b}  {c}  {d}
100   1    1    0    0    0
100   2    0    0    1    0
100   3    0    1    1    0
100   4    0    0    0    1
100   5    1    1    1    0
100   6    1    0    0    0
100   7    0    0    0    1
200   1    0    1    0    0
200   2    0    0    1    1
200   3    1    0    0    0
200   4    0    0    1    0
200   5    0    1    0    1
300   1    0    0    0    1
300   2    0    1    1    0
300   3    1    0    1    0
300   4    0    0    1    1

Fig. 3.9. Bitmap Vertical Table

To generate and test the candidate sequences, SPAM uses two steps, the S-step and the I-step, based on the lattice concept. As a depth-first approach, the overall process performs the S-step first and then the I-step. To extend a sequence, the S-step appends an item to it as a new last element, and the I-step adds the item to its last element if possible. In the S-step, each bitmap partition of the sequence to be extended is first transformed so that all bits after the first bit with value one are set to one, while that first one-bit and every bit before it are set to zero. The resultant bitmap of the S-step is then obtained by ANDing the transformed bitmap with the bitmap of the appended item. Figure 3.10 illustrates how to join two 1-length patterns, a and b, based on the example database in Figure 3.5. The I-step, on the other hand, simply ANDs the bitmap of the sequence with the bitmap of the appended item to obtain the resultant bitmap; for example, it extends the pattern 〈ab〉 to the candidate 〈a(bc)〉. Support counting then reduces to counting the bitmap partitions that are not all zeros.
