Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

5 Web Linkage Mining

5.4.1 Bipartite Cores as Communities

Although broad topics evolve only slowly, more specific topics may be quite ephemeral on the Web. Well-established communities are represented by large bipartite cores, while these emerging topics are usually represented by small bipartite cores. HITS finds dense bipartite graph communities for broad topic queries using eigenvector computation. However, it is too inefficient to enumerate all such communities in the Web graph by eigenvector computation. Kumar et al. [146] proposed a technique to hunt for these emerging communities in a large Web graph, which can be described as follows.

First, a bipartite core is defined as a complete bipartite sub-graph whose nodes can be partitioned into two subsets, denoted F and C. A complete bipartite sub-graph on node sets F (|F| = i) and C (|C| = j) contains all possible directed edges from the vertices of F to the vertices of C. Intuitively, the set F corresponds to pages created by members of the community, pointing to what they believe are valuable pages for that community. For this reason, we refer to the set F as a set of fans and to the set C as centers. Fans are like hubs, and centers are like authorities. Fig. 5.7 shows an example of a (4,3) bipartite core (4 fans and 3 centers).

Fig. 5.7. A (4,3) bipartite core

Given a large Web crawl represented as a directed graph, the procedure proposed in [146] consists of two major steps, pruning and core generation, which can be summarized as follows.

Step 1: Pruning. Pages that cannot belong to any core are removed by several types of pruning, e.g., pruning by in-degree and iterative pruning of fans and centers.

Step 2: Generating all (i, j) cores. After pruning, the remaining pages are used to find cores. First, fixing j, we start with all (1, j) cores, whose fans are exactly the vertices with out-degree at least j. Then, all (2, j) cores can be constructed by checking every other fan that also points to all the centers of a (1, j) core.
All (3, j) cores can be constructed by checking every fan that points to all the centers of a (2, j) core, and so on. This idea is similar to the Apriori algorithm for association rule mining [6], because every proper subset of the fans in any (i, j) core, together with the same centers, forms a core of smaller size.

In [146], a large number of emerging communities were found in a large Web crawl. Note, however, that this method does not find all member pages of the communities, nor does it find the topics of the communities, their hierarchical organization, or the relationships between communities.
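The two steps can be sketched on a toy, in-memory graph as follows. This is a minimal illustration under stated assumptions, not the implementation of Kumar et al. [146]: the pruning rule below (drop links to pages with fewer than i in-links, then drop fans left with fewer than j links, and repeat) is a simplified stand-in for the pruning heuristics of the original work, and the function names prune and enumerate_cores and the toy graph are hypothetical. Core generation follows the level-wise idea described above: each (i, j) core is grown from an (i - 1, j) core by adding one more fan that links to all of its centers.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

Graph = Dict[str, Set[str]]  # page -> set of pages it links to
Core = Tuple[FrozenSet[str], FrozenSet[str]]  # (fans, centers)


def prune(graph: Graph, i: int, j: int) -> Graph:
    """Simplified iterative pruning (an assumption, not the exact rules of [146]).

    A link u -> v is kept only if v has at least i in-links, so v can still be
    a center of an (i, j) core; a fan is kept only if it retains at least j
    such links. Repeat until nothing changes.
    """
    g = {u: set(vs) for u, vs in graph.items()}
    while True:
        indeg = Counter(v for vs in g.values() for v in vs)
        pruned: Graph = {}
        for u, vs in g.items():
            kept = {v for v in vs if indeg[v] >= i}
            if len(kept) >= j:
                pruned[u] = kept
        if pruned == g:
            return g
        g = pruned


def enumerate_cores(graph: Graph, i: int, j: int) -> Set[Core]:
    """Apriori-style generation of all (i, j) cores from (i - 1, j) cores."""
    g = prune(graph, i, j)
    # (1, j) cores: every remaining fan paired with each j-subset of its out-links.
    level: Set[Core] = {
        (frozenset([u]), frozenset(cs))
        for u, vs in g.items()
        for cs in combinations(sorted(vs), j)
    }
    # Grow the fan set by one page per round; a new fan must link to ALL centers.
    for _ in range(i - 1):
        level = {
            (fans | {u}, centers)
            for fans, centers in level
            for u, vs in g.items()
            if u not in fans and centers <= vs
        }
    return level


# Hypothetical toy graph: four fans share three centers; "x" is weakly linked.
toy: Graph = {
    "f1": {"c1", "c2", "c3"},
    "f2": {"c1", "c2", "c3"},
    "f3": {"c1", "c2", "c3", "x"},
    "f4": {"c1", "c2", "c3"},
    "lone": {"x"},
}
print(enumerate_cores(toy, 4, 3))
```

On this toy graph, enumerate_cores(toy, 4, 3) yields exactly one core, with fans f1 to f4 and centers c1 to c3, i.e., a (4,3) core like the one in Fig. 5.7; page x is pruned because only two pages link to it. Because the number of candidate cores grows combinatorially, a sketch like this is only practical after aggressive pruning, which is why pruning comes first.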
5.4.2 Network Flow/Cut-based Notions of Communities

Communities based on bipartite cores are usually very small and do not represent full communities. Flake et al. [93] defined a more general community as a subgraph whose internal link density exceeds its density of connection to nodes outside it by some margin.

Formally, a community is defined as a subset $C \subset V$ such that each $c \in C$ has at least as many neighbors in $C$ as in $V - C$. Finding such a subset is an NP-complete graph partitioning problem. Hence, we need to approximate it and recast it into a less stringent definition based on the network flow model from operations research.

The max-flow/min-cut problem [66] is posed as follows. We are given a graph $G = (V, E)$ with a source node $s \in V$ and a target node $t \in V$. Each edge $(u, v)$ is like a water pipe with a positive integer maximum flow capacity $c(u, v)$. The max-flow algorithm finds the maximum rate of flow from $s$ to $t$ without exceeding the capacity constraint on any edge. By the max-flow/min-cut theorem, this maximum flow equals the total capacity of a minimum cut (min-cut) separating $s$ and $t$.

Flake et al. [93] applied this concept to the Web as follows. Suppose we are given the Web graph $G = (V, E)$ with a node subset $S \subset V$ identified as seed URLs, which are examples of the community the user wishes to discover. An artificial source $s$ is created and connected to all seed nodes $u \in S$, with $c(s, u) = \infty$. Then, every $v \in V - S - \{s, t\}$ is connected to an artificial target $t$ with $c(v, t) = 1$. Each original edge is made undirected, and its capacity is heuristically set to $k$ (usually $k = |S|$). The $s \to t$ max-flow algorithm is applied to the resulting graph, and all nodes on the $s$-side of the min-cut are defined as members of the community $C$.

In reality, we do not have the whole Web graph and must collect the necessary portions by crawling. The crawler begins with the seed set $S$ and finds all in- and out-neighbors of the seed nodes to some fixed depth.
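The s-t construction of Flake et al. [93] described above can be sketched in Python as follows, assuming the relevant part of the Web graph has already been crawled into an edge list. Edmonds-Karp (breadth-first augmenting paths) is used as the max-flow algorithm because the text does not prescribe one; the function name flake_community, the labels __s__ and __t__ for the artificial nodes, and the choice to merge reciprocal links into a single undirected edge are assumptions of this sketch.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Dict, Iterable, Optional, Set, Tuple

SOURCE, TARGET = "__s__", "__t__"  # hypothetical labels for the artificial s and t


def flake_community(
    edges: Iterable[Tuple[str, str]], seeds: Set[str], k: Optional[float] = None
) -> Set[str]:
    """Return the s-side of a minimum s-t cut, following the construction above."""
    seeds = set(seeds)
    k = len(seeds) if k is None else k  # heuristic edge capacity, usually |S|
    # cap[u][v] holds the residual capacity of u -> v.
    cap: DefaultDict[str, DefaultDict[str, float]] = defaultdict(lambda: defaultdict(float))
    nodes = set(seeds)
    # Make every original link undirected with capacity k (assumption: reciprocal links merged).
    for pair in {frozenset((u, v)) for u, v in edges if u != v}:
        u, v = tuple(pair)
        cap[u][v] = cap[v][u] = k
        nodes |= pair
    for u in seeds:
        cap[SOURCE][u] = float("inf")  # c(s, u) = infinity for every seed
    for v in nodes - seeds:
        cap[v][TARGET] = 1.0  # c(v, t) = 1 for every non-seed node

    def bfs() -> Dict[str, Optional[str]]:
        """Breadth-first search over edges with positive residual capacity."""
        parent: Dict[str, Optional[str]] = {SOURCE: None}
        queue = deque([SOURCE])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until t is unreachable.
    while True:
        parent = bfs()
        if TARGET not in parent:
            break
        path = []
        v = TARGET
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(cap[a][b] for a, b in path)  # bottleneck capacity of the path
        for a, b in path:
            cap[a][b] -= delta
            cap[b][a] += delta

    # Nodes still reachable from s in the residual graph form the s-side of the min-cut.
    return set(bfs()) - {SOURCE}


# Hypothetical toy crawl: a dense triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
print(sorted(flake_community(edges, {"a", "b"})))  # ['a', 'b', 'c']
```

After the maximum flow is found, the nodes still reachable from s in the residual graph form the s-side of a minimum cut (the one closest to s when several minimum cuts exist). On the toy edge list, the seeds a and b pull in c, while d and e end up on the t-side.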
The crawled nodes, together with the seed set S, are then used to set up the max-flow problem described above, and the process can be repeated until some stopping condition is satisfied. This crawling process can be thought of as a different form of focused crawling, driven not by textual content but solely by hyperlink structure.

Compared with the bipartite-core approach, max-flow-based community discovery can extract larger, more complete communities. However, it cannot find the themes, the hierarchy, or the relationships of Web communities.

5.4.3 Web Community Chart

The two community-finding algorithms described earlier can only identify groups of pages that belong to Web communities; they cannot derive or infer the relationships between the extracted communities. M. Toyoda and M. Kitsuregawa [243] proposed a technique for constructing a Web community chart that provides not only a set of Web communities but also the relationships between them.

Their technique is based on a link-based related-page algorithm that returns pages related to a given input page. The main idea is to apply the related-page algorithm to a number of pages and investigate how each page derives other pages as related pages. If a page s derives a page t as a related page and page t also derives s as a related page, we say that there is a symmetric derivation relationship between s and t. For example, a fan page i of a baseball team derives other fan pages as related pages, and when we apply the related-page algorithm to another fan page j, page j also derives the original fan page i as a related page. A symmetric derivation relationship between two pages often means that both are pointed to by a similar set of hubs.
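A symmetric derivation check can be sketched as follows. The passage does not specify the related-page algorithm used by Toyoda and Kitsuregawa [243], so the sketch substitutes a simple co-citation ranking as an assumption: the pages related to p are those most often linked to by the same hubs as p, which matches the intuition that symmetrically related pages share a similar set of hubs. The function names related_pages and symmetric_pairs, the top_n cutoff, and the toy hub data are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def related_pages(links: Dict[str, Set[str]], page: str, top_n: int) -> List[str]:
    """Hypothetical stand-in for a link-based related-page algorithm.

    Ranks other pages by how many hubs link to both them and `page`
    (co-citation count); ties are broken alphabetically.
    """
    scores: Counter = Counter()
    for targets in links.values():
        if page in targets:
            scores.update(t for t in targets if t != page)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [p for p, _ in ranked[:top_n]]


def symmetric_pairs(
    links: Dict[str, Set[str]], pages: List[str], top_n: int = 3
) -> Set[Tuple[str, str]]:
    """Pairs (s, t) where s derives t AND t derives s as a related page."""
    derived = {p: set(related_pages(links, p, top_n)) for p in pages}
    return {
        tuple(sorted((s, t)))
        for s in pages
        for t in derived[s]
        if t in derived and s in derived[t]
    }


# Hypothetical hub pages and the pages they link to.
hubs = {
    "hub1": {"fan_i", "fan_j", "team"},
    "hub2": {"fan_i", "fan_j"},
    "hub3": {"fan_i", "news"},
}
pages = ["fan_i", "fan_j", "team", "news"]
print(symmetric_pairs(hubs, pages, top_n=1))  # {('fan_i', 'fan_j')}
```

With top_n=1, only the two fan pages derive each other, mirroring the baseball example; with the default top_n=3 the same data yields four symmetric pairs, so the cutoff strongly affects which relationships are found.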
Guandong Xu • Yanchun Zhang • L