Web Mining and Social Networking: Techniques and ... - tud.ttu.ee
6 Web Usage Mining

Fig. 6.7. An example of an index page derived by the PageGather algorithm [204]

... to T.
(b) Else, either go back (backtrack) to the parent of C with some probability, or stop with some probability.

For a set of target pages (T1, T2, ..., Tn), the search follows a similar procedure, except that after the user identifies Ti, the search continues with Ti+1. In this scenario, the hardest task is to distinguish the target pages from other pages simply by inspecting the Web log. In this study, the authors claimed that the target pages can be separated out either by checking whether the targets are content pages or index (navigational) pages, or by setting a time threshold. In the former case, the content pages are most likely to be the target pages for a user. For a portal site, where there is no clear separation between content and index pages and content pages are therefore hard to judge, counting the time spent on a specific page provides a useful hint: pages on which the user spent more time than the time threshold are considered target pages.

To identify the backtrack points, the Web log is analyzed. However, browser caching makes it unexpectedly difficult to detect backtrack points; without caching, a page whose previous and next pages are the same would indicate a backtrack. In this work, rather than disabling browser caching, a new algorithm for detecting backtrack points was devised.
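The time-threshold heuristic described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the session format (page, timestamp) and the 60-second default threshold are assumptions for the example.

```python
from datetime import datetime

def find_target_pages(session, threshold_seconds=60):
    """Return pages whose dwell time exceeds the threshold.

    `session` is a list of (page, datetime) tuples ordered by visit time,
    as might be reconstructed from a Web log. The dwell time of a page is
    the gap until the next request; the last page has no such gap and is
    skipped in this sketch.
    """
    targets = []
    for (page, t), (_, t_next) in zip(session, session[1:]):
        dwell = (t_next - t).total_seconds()
        if dwell > threshold_seconds:
            targets.append(page)  # user lingered: likely a target page
    return targets

# Example session: only "content.html" is viewed longer than 60 seconds.
session = [
    ("index.html", datetime(2015, 7, 10, 10, 0, 0)),
    ("a.html", datetime(2015, 7, 10, 10, 0, 5)),
    ("content.html", datetime(2015, 7, 10, 10, 0, 10)),
    ("b.html", datetime(2015, 7, 10, 10, 2, 0)),
]
print(find_target_pages(session))
```

In practice the threshold would need tuning per site, since typical dwell times differ between portals and content-heavy pages.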
The algorithm is motivated by the fact that if there is no link between P1 and P2, the user must click the browser's "back" button to return from P1 to P2. Detecting the backtrack points therefore becomes the process of checking whether there is a link between each pair of successive pages in the Web log. To do this, the authors built a
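The link-checking idea can be sketched as below. This is an illustrative reduction of the technique, assuming a hypothetical site link graph (`links`, a dict from each page to the set of pages it links to) is already available:

```python
def detect_backtracks(visits, links):
    """Mark positions in a session where the user must have pressed "back".

    `visits` is the ordered list of pages in one session; `links` maps each
    page to the set of pages it links to. If page P2 follows P1 in the log
    but P1 has no link to P2, the transition cannot have been a click on
    P1, so the user must have backtracked to reach P2.
    """
    backtracks = []
    for i, (p1, p2) in enumerate(zip(visits, visits[1:])):
        if p2 not in links.get(p1, set()):
            backtracks.append(i)  # backtrack occurred after position i
    return backtracks

# Example: "A" links to "B" and "C"; "B" links to "D"; "D" links nowhere.
links = {"A": {"B", "C"}, "B": {"D"}}
# The move D -> C has no corresponding link, so a backtrack is detected
# after position 2.
print(detect_backtracks(["A", "B", "D", "C"], links))
```

Note that this only flags that *some* backtracking happened between the two log entries; recovering the exact page the user backtracked through requires walking the session history, which this sketch omits.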
