13.07.2015 Views

Comparative Study of Techniques to Discover Frequent ... - IRD India


International Journal on Advanced Computer Theory and Engineering (IJACTE)

GSP Algorithm

Fig. 7. Flowchart for GSP algorithm

Input: S (sessions), F1 (frequent 1-sequences), min_sup (the minimal access count that satisfies the support threshold).
Output: F, the set of all the access patterns.

Table 3. Mining result for GSP algorithm
1-frequent access patterns generated: {A}:4, {B}:5, {C}:6, {D}:3, {E}:3, {G}:2, {I}:4
2-frequent access patterns generated: {AB}:3, {BC}:3, {BE}:2, {CD}:3
3-frequent access patterns generated: {ABC}:2, {ABE}:2, {BCD}:2
4-frequent access patterns generated: {ABCD}:2

C. Depth First Search (DFS)

In this algorithm, the patterns are categorized according to their length and the search is executed on a lattice model. The patterns form a lattice based on pattern length and pattern frequency, and frequent patterns are searched depth first using this lattice.

Lattice Construction: The basic element of the lattice is an atom, i.e. a single page. Each atom or page stands for a length-1 prefix equivalence class. Beginning from the bottom elements, the frequency of upper elements with length n can be calculated by using two patterns of length n-1 belonging to the same class [5]. Example: Consider three atoms corresponding to three web pages P1, P2 and P3. A sample lattice with all length-2 patterns and some length-3 patterns is shown in Fig. 8.

Fig. 8. Pattern Lattice [5]

Session id-timestamp list: After data preprocessing, a series of web pages visited in each session is obtained. The session id-timestamp list keeps the session id and timestamp information for any pattern in all sessions. For patterns with length > 1, the timestamp information keeps the timestamp value of the last atom.
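The session id-timestamp list described above could be built for single pages roughly as follows. This is a minimal sketch and not the paper's implementation: the function name `build_id_timestamp_lists`, the input layout (session id mapped to a list of page and timestamp pairs), and the sample sessions are all assumptions made for illustration.

```python
from collections import defaultdict


def build_id_timestamp_lists(sessions):
    """Map each page (atom) to a sorted list of (session_id, timestamp) pairs."""
    lists = defaultdict(list)
    for session_id, visits in sessions.items():
        for page, timestamp in visits:
            lists[page].append((session_id, timestamp))
    return {page: sorted(entries) for page, entries in lists.items()}


# Hypothetical input: session id -> list of (page, timestamp) visits.
sessions = {
    "s1": [("A", 10), ("B", 12), ("A", 15)],
    "s2": [("B", 3), ("C", 7)],
}

lists = build_id_timestamp_lists(sessions)
for page, entries in sorted(lists.items()):
    print(page, entries)
```

Keeping one list per page means the support of a page is simply the number of distinct session ids in its list, with no further scan of the sessions.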
Example: 4 pages and 3 sessions are given below.

S1 = Page1 → Page2 → Page4 → Page1 → Page3
S2 = Page4 → Page3 → Page1 → Page2
S3 = Page3 → Page4 → Page1

Table 4. Session id-timestamp list

Table 5. Session id-timestamp list for Page3Page1

The count for pattern Page3Page1 is 2/3 since it occurs in two of the three sessions (S2 and S3). The patterns are then pruned based on the minimum support. Fig. 9 shows the flowchart depicting the working of the DFS algorithm.

ISSN (Print) : 2319 – 2526, Volume-2, Issue-3, 2013
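The Page3Page1 count in the example can be reproduced with a short sketch of the list join and the depth-first search. Several things here are assumptions and not taken from the paper: the 1-based position of a page within its session stands in for its timestamp, the join follows the usual prefix-class temporal join, and the names `atom_lists`, `temporal_join`, `session_count`, `mine_depth_first` and `min_count` are hypothetical.

```python
from collections import defaultdict

# Sessions from the example; the 1-based position of a page in its session
# stands in for the timestamp (an assumption, the source gives no timestamps).
SESSIONS = {
    1: ["Page1", "Page2", "Page4", "Page1", "Page3"],
    2: ["Page4", "Page3", "Page1", "Page2"],
    3: ["Page3", "Page4", "Page1"],
}


def atom_lists(sessions):
    """Session id-timestamp list of every single page (atom)."""
    lists = defaultdict(set)
    for sid, pages in sessions.items():
        for ts, page in enumerate(pages, start=1):
            lists[page].add((sid, ts))
    return dict(lists)


def temporal_join(left, right):
    """Keep the entries of `right` that come after some entry of `left`
    in the same session; the kept timestamp is that of the last atom."""
    earliest = {}
    for sid, ts in left:
        earliest[sid] = min(ts, earliest.get(sid, ts))
    return {(sid, ts) for sid, ts in right
            if sid in earliest and earliest[sid] < ts}


def session_count(id_list):
    """Number of distinct sessions in which the pattern occurs."""
    return len({sid for sid, _ in id_list})


def mine_depth_first(sessions, min_count):
    """Depth-first walk over prefix classes; returns {pattern: session count}."""
    frequent = {}

    def expand(prefix_class):
        for pattern, id_list in prefix_class.items():
            frequent[pattern] = session_count(id_list)
            children = {}
            # Join two patterns of the same class to get one that is longer by one page.
            for sibling, sibling_list in prefix_class.items():
                joined = temporal_join(id_list, sibling_list)
                if session_count(joined) >= min_count:  # prune by minimum support
                    children[pattern + sibling[-1:]] = joined
            if children:
                expand(children)

    atoms = {(page,): ids for page, ids in atom_lists(sessions).items()
             if session_count(ids) >= min_count}
    expand(atoms)
    return frequent


atoms = atom_lists(SESSIONS)
page3_page1 = temporal_join(atoms["Page3"], atoms["Page1"])
print(sorted(page3_page1))
print(session_count(page3_page1), "of", len(SESSIONS), "sessions")
print(mine_depth_first(SESSIONS, min_count=2))
```

Under these assumptions the joined list for Page3Page1 holds the entries (2, 3) and (3, 3), that is, sessions S2 and S3, which agrees with the 2/3 count stated in the example. Because each join keeps only strictly later timestamps, patterns cannot grow beyond the length of a session and the recursion terminates.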
