13.07.2015 Views

Comparative Study of Techniques to Discover Frequent ... - IRD India

Comparative Study of Techniques to Discover Frequent ... - IRD India

Comparative Study of Techniques to Discover Frequent ... - IRD India

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

International Journal on Advanced Computer Theory and Engineering (IJACTE)for searching frequent patterns and a few algorithmsbased on construction <strong>of</strong> pattern lattice. Also they havepresented the results <strong>of</strong> the experiments they carried out<strong>to</strong> observe and analyse performances <strong>of</strong> the algorithmsthey have listed[5].Theint Theint Aye explains the importance <strong>of</strong> datapreprocessing <strong>to</strong> improve ease and efficiency <strong>of</strong> miningprocess. It mainly focuses on data preprocessing stagesand explained algorithms for field extraction and datacleaning algorithms that performs the process <strong>of</strong>separating fields from the single line <strong>of</strong> the log file andeliminates inconsistent or unnecessary items in theanalysed data respectively.[1]Robert Cooley et al. have presented a survey <strong>of</strong> theresearch in the area <strong>of</strong> Web usage mining and havebriefly discussed the pattern discovery techniques.III. STAGES IN WEB USAGE MININGBriefly, the stages are noted as follows:i. Obtain data from various sourcesii. Data preprocessingiii. Pattern <strong>Discover</strong>yiv. Pattern Analysiscookies, path analysis, association rules, sequentialpatterns, clustering, decision trees etc.C. Pattern AnalysisFollowing statistics are a few listed ones which arethe end products <strong>of</strong> analysis such as the frequency <strong>of</strong>visits per document, most recent visit per document,who is visiting which documents, frequency <strong>of</strong> use <strong>of</strong>each hyperlink, and most recent use <strong>of</strong> each hyperlink.The common techniques used for pattern analysis arevisualization techniques, OLAP techniques, Data &Knowledge Querying, Usability Analysis.IV. TECHNIQUES TO DISCOVER FREQUENTPATTERNSA. <strong>Frequent</strong> Access Pattern Mining(FAP Mining)FAP algorithm is rooted on FP Growth Algorithmand it also constructs a FAP tree like FP tree. FAPalgorithm is divided in<strong>to</strong> two steps. In the first step, itconstructs frequent access pattern tree (FAP tree) as perthe access paths obtained from the user sessions andrecords the access counts <strong>of</strong> each page. Next step iswhere the function <strong>of</strong> FAP-growth is used <strong>to</strong> mine bothlong and short access patterns on the FAP tree.FP Growth when used in mining association rulesand sequential patterns shows good functionality. Butthere is no sequence among those elements <strong>of</strong> an itemduring mining association rules, whereas access patternmining requires sequential page access. Thus the Fpgrowthhas been revised by X. Wang et al in [7]Table 1. Episode <strong>of</strong> user access path in users session filesFig.2. General Architecture <strong>of</strong> Web Usage Mining[14]A. Data PreprocessingThe data should be preprocessed <strong>to</strong> improve theefficiency and ease <strong>of</strong> the mining process. The main task<strong>of</strong> data preprocessing is <strong>to</strong> prune noisy and irrelevantdata, and <strong>to</strong> reduce data volume for the patterndiscovery phase. Field Extraction and data cleaningalgorithms parse the web log records separating thefields and purging. CoveringB. Pattern discoveryFew techniques <strong>to</strong> discover patterns frompreprocessed data are listed like converting IP addresses<strong>to</strong> domain names, filtering, dynamic site analysis,Fig.3. Flowchart for algorithm <strong>to</strong> construct frequent acesspattern46ISSN (Print) : 2319 – 2526, Volume-2, Issue-3, 2013

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!