10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee



5 Web Linkage Mining

between users make the pages that are constrained by Q alone more heterogeneous in content than the pages that are constrained by both U and Q.

Based on a query log, two kinds of implicit link relations are defined according to two different constraints:

1. L_I1: d_i and d_j appear in the same entries constrained by U and Q in the query log. That is, d_i and d_j are clicked by the same user issuing the same query;
2. L_I2: d_i and d_j appear in the same entries constrained by the query Q only. That is, d_i and d_j are clicked according to the same query, but the query may be issued by different users.

It is clear that the constraint for L_I2 is not as strict as that for L_I1. Thus, more links of type L_I2 can be found than of type L_I1; however, they may be noisier.

Similar to the implicit links, three kinds of explicit links are defined based on the hyperlinks among the Web pages according to the following three different conditions:

$$
L_E(i, j) =
\begin{cases}
1 & \text{if } \mathrm{Cond}_E \text{ holds} \\
0 & \text{otherwise}
\end{cases}
\tag{5.13}
$$

1. Cond_E1: there exist hyperlinks from d_j to d_i (in-link to d_i from d_j);
2. Cond_E2: there exist hyperlinks from d_i to d_j (out-link from d_i to d_j);
3. Cond_E3: either Cond_E1 or Cond_E2 holds.

These three types of explicit links under the above conditions are denoted as L_E1, L_E2 and L_E3, respectively. In the above definitions, the in-link and the out-link are distinguished because, given the target Web page, the in-link is the hyperlink created by other Web page editors who refer to the target Web page. In contrast, the out-link is created by the editor of the source Web page. The two may differ when used to describe the target Web page.
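The link definitions above can be illustrated with a minimal Python sketch. It is not the implementation used in [80]. The click-record format `(user, query, page)`, the representation of hyperlinks as `(source, target)` pairs, and the function names `implicit_links` and `explicit_link` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def implicit_links(log, by_user=True):
    """Return the set of unordered page pairs linked implicitly.

    log: iterable of (user, query, page) click records (assumed format).
    by_user=True  gives L_I1: same user AND same query.
    by_user=False gives L_I2: same query, any user.
    """
    groups = defaultdict(set)
    for user, query, page in log:
        key = (user, query) if by_user else query
        groups[key].add(page)

    links = set()
    for pages in groups.values():
        # Every pair of pages clicked under the same constraint is linked.
        for a, b in combinations(sorted(pages), 2):
            links.add((a, b))
    return links


def explicit_link(hyperlinks, i, j, kind):
    """Indicator L_E(i, j) of Eq. (5.13) for kind 1, 2 or 3.

    hyperlinks: set of (source, target) pairs (assumed format).
    """
    in_link = (j, i) in hyperlinks   # Cond_E1: hyperlink from d_j to d_i
    out_link = (i, j) in hyperlinks  # Cond_E2: hyperlink from d_i to d_j
    if kind == 1:
        return int(in_link)
    if kind == 2:
        return int(out_link)
    if kind == 3:
        return int(in_link or out_link)  # Cond_E3
    raise ValueError("kind must be 1, 2 or 3")
```

Because every group keyed by (user, query) is contained in the group keyed by the query alone, the pairs returned for L_I1 are always a subset of those returned for L_I2, which matches the observation that L_I2 yields more, but possibly noisier, links.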
From the above definitions, D. Shen et al. [80] state that the implicit links give the relationships between Web pages from the view of Web users, whereas the explicit links reflect the relationships among Web pages from the view of Web-page editors.

To utilize this link information for classification, D. Shen et al. [80] propose two approaches. One is to predict the label of the target Web page from the labels of its neighbors through majority voting. This algorithm is similar to k-Nearest Neighbor (KNN). However, k is not a constant as in KNN; it is determined by the set of neighbors of the target page. The other enhances classification performance through the links by constructing virtual documents. Given a document, the virtual document is constructed by borrowing some extra text from its neighbors. Although the concept of the virtual document was originally pioneered by [101], the notion is extended here by including different links. After constructing the virtual document through links, any traditional text classification algorithm can be employed to classify the Web pages. In [80], the Naive Bayesian classifier and the Support Vector Machine are used as the classifiers. The experimental results show that implicit links clearly improve the classification performance compared with the methods based on explicit links. More details are in [80].

Summary

In this chapter, we have reported some interesting research topics in the context of web linkage mining. The principal technologies used in linkage analysis have been substantially addressed and the real applications have been covered as well. We start with the theory of co-citation and bibliographic coupling, followed by the two most famous linkage mining algorithms, i.e. PageRank
