Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

146 7 Extracting and Analyzing Web Social Networks

A Web community differs slightly from a community of people; for example, a Web community may include competing companies. Since a Web community represents a certain topic, we can understand when and how that topic emerged and evolved on the Web.

As introduced in Section 5.4, there are several algorithms for finding Web communities. Here, the extraction of Web communities uses the Web community chart, a graph of communities in which related communities are connected by weighted edges. The main advantage of the Web community chart is that it records the relevance between communities: we can navigate through related communities and locate evolution around a particular community.

M. Toyoda and M. Kitsuregawa explain how Web communities evolve and which metrics, such as growth rate and novelty, can measure the degree of evolution. They first describe in detail the changes that Web communities undergo, and then introduce evolution metrics that can be used for finding patterns of evolution. The notation used in this section is summarized below.

t_1, t_2, ..., t_n: The times at which each archive was crawled. Currently, a month is used as the unit time.
W(t_k): The Web archive at time t_k.
C(t_k): The Web community chart at time t_k.
c(t_k), d(t_k), e(t_k), ...: Communities in C(t_k).

7.1.1 Types of Changes

Emerge: A community c(t_k) emerges in C(t_k) when c(t_k) shares no URLs with any community in C(t_{k-1}). Note that not all URLs in c(t_k) newly appear in W(t_k). Some URLs in c(t_k) may already be included in W(t_{k-1}) without having had enough connectivity to form a community.

Dissolve: A community c(t_{k-1}) in C(t_{k-1}) has dissolved when c(t_{k-1}) shares no URLs with any community in C(t_k). Note that not all URLs in c(t_{k-1}) have disappeared from the archive: some may still be included in W(t_k), having lost their connectivity to any community.

Grow and shrink: When c(t_{k-1}) in C(t_{k-1}) shares URLs only with c(t_k) in C(t_k), and vice versa, only two changes can occur to c(t_{k-1}). The community grows when new URLs appear in c(t_k), and shrinks when URLs disappear from c(t_{k-1}). When the number of appeared URLs is greater than the number of disappeared URLs, the community grows. In the reverse case, it shrinks.

Split: A community c(t_{k-1}) may split into several smaller communities. In this case, c(t_{k-1}) shares URLs with multiple communities in C(t_k). A split is caused by disconnections of URLs in SDG. Split communities may grow and shrink. They may also merge (see the next item) with other communities.

Merge: When multiple communities (c(t_{k-1}), d(t_{k-1}), ...) share URLs with a single community e(t_k), these communities are merged into e(t_k) by connections of their URLs in SDG. Merged communities may grow and shrink. They may also split before merging.

7.1.2 Evolution Metrics

Evolution metrics measure how a particular community c(t_k) has evolved. For example, we can determine how much c(t_k) has grown and how many URLs newly appeared in c(t_k). The proposed metrics can be used for finding the various patterns of evolution described above. To measure the changes of c(t_k), the community at time t_{k-1} that corresponds to c(t_k) is identified first. This corresponding community, c(t_{k-1}), is defined as the community that shares the most URLs with c(t_k). If multiple communities share the same number of URLs, the one with the largest number of URLs is selected.
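The change types and the corresponding-community rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: a community chart is modelled here as a plain dictionary from a community id to its set of URLs, and the names `Chart`, `overlaps`, `corresponding`, `classify` and `emerged` are hypothetical. The text allows a community to split and merge at the same time, whereas this sketch returns a single label, and the "same size" label for equal numbers of appeared and disappeared URLs is an added assumption.

```python
from typing import Dict, List, Optional, Set

# Assumed representation: community id -> set of URLs in that community.
Chart = Dict[str, Set[str]]


def overlaps(urls: Set[str], chart: Chart) -> Dict[str, int]:
    """Communities in `chart` that share URLs with `urls`, with shared counts."""
    return {cid: len(urls & members)
            for cid, members in chart.items() if urls & members}


def corresponding(urls: Set[str], chart: Chart) -> Optional[str]:
    """Community sharing the most URLs; ties go to the larger community."""
    shared = overlaps(urls, chart)
    if not shared:
        return None
    return max(shared, key=lambda cid: (shared[cid], len(chart[cid])))


def classify(cid: str, prev: Chart, cur: Chart) -> str:
    """Label the change of community `cid` in C(t_{k-1}) on the way to C(t_k)."""
    urls = prev[cid]
    successors = overlaps(urls, cur)
    if not successors:
        return "dissolve"          # shares no URLs with any community at t_k
    if len(successors) > 1:
        return "split"             # shares URLs with several communities at t_k
    (succ_id,) = successors        # exactly one successor
    if len(overlaps(cur[succ_id], prev)) > 1:
        return "merge"             # successor also draws URLs from other communities
    appeared = len(cur[succ_id] - urls)
    disappeared = len(urls - cur[succ_id])
    if appeared > disappeared:
        return "grow"
    if appeared < disappeared:
        return "shrink"
    return "same size"             # assumption: case not named in the text


def emerged(prev: Chart, cur: Chart) -> List[str]:
    """Communities in C(t_k) that share no URLs with any community in C(t_{k-1})."""
    return sorted(cid for cid, urls in cur.items() if not overlaps(urls, prev))


# Toy charts with made-up URLs.
prev_chart = {"c": {"a", "b", "c"}, "d": {"x", "y"}, "g": {"q"}}
cur_chart = {"c2": {"a", "b", "c", "e"}, "d1": {"x"},
             "d2": {"y", "z"}, "n": {"new"}}
for cid in prev_chart:
    print(cid, classify(cid, prev_chart, cur_chart))
print("emerged:", emerged(prev_chart, cur_chart))
```

Keeping URLs in sets makes every rule a set intersection or difference, which mirrors the way the definitions are phrased in terms of shared URLs.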
7.1 Extracting Evolution of Web Community from a Series of Web Archive 147

The community at time t_k corresponding to c(t_{k-1}) can be identified in the reverse direction in the same way. When this corresponding community is c(t_k) itself, the pair (c(t_{k-1}), c(t_k)) is called a main line; otherwise, the pair is called a branch line. A main line can be extended to a sequence by tracking such symmetrically corresponding communities over time. A community in a main line is considered to keep its identity, and is a good starting point for finding changes around its topic.

The metrics are defined by differences between c(t_k) and its corresponding community c(t_{k-1}). To define the metrics, the following attributes are used to represent how many URLs the focused community gains or loses.

N(c(t_k)): the number of URLs in c(t_k).
N_sh(c(t_{k-1}), c(t_k)): the number of URLs shared by c(t_{k-1}) and c(t_k).
N_dis(c(t_{k-1})): the number of disappeared URLs, that is, URLs that exist in c(t_{k-1}) but do not exist in any community in C(t_k).
N_sp(c(t_{k-1}), c(t_k)): the number of URLs split from c(t_{k-1}) to communities at t_k other than c(t_k).
N_ap(c(t_k)): the number of newly appeared URLs, that is, URLs that exist in c(t_k) but do not exist in any community in C(t_{k-1}).
N_mg(c(t_{k-1}), c(t_k)): the number of URLs merged into c(t_k) from communities at t_{k-1} other than c(t_{k-1}).

The evolution metrics are then defined as follows. The growth rate, R_grow(c(t_{k-1}), c(t_k)), represents the increase of URLs per unit time. It allows us to find the fastest growing or shrinking communities. Note that when c(t_{k-1}) does not exist, zero is used as N(c(t_{k-1})). The growth rate is defined as

$$R_{grow}(c(t_{k-1}), c(t_k)) = \frac{N(c(t_k)) - N(c(t_{k-1}))}{t_k - t_{k-1}} \tag{7.1}$$

The stability, R_stability(c(t_{k-1}), c(t_k)), represents the number of disappeared, appeared, merged and split URLs per unit time. When there is no change of URLs, the stability becomes zero. Note that c(t_k) may not be stable even if the growth rate of c(t_k) is zero, because c(t_k) may lose and gain the same number of URLs. A stable community on a topic is the best starting point for finding interesting changes around the topic. The stability is defined as

$$R_{stability}(c(t_{k-1}), c(t_k)) = \frac{N(c(t_k)) + N(c(t_{k-1})) - 2N_{sh}(c(t_{k-1}), c(t_k))}{t_k - t_{k-1}} \tag{7.2}$$

The disappearance rate, R_disappear(c(t_{k-1}), c(t_k)), is the number of URLs that disappeared from c(t_{k-1}) per unit time. A higher disappearance rate means that the community has lost URLs mainly by disappearance. The disappearance rate is defined as

$$R_{disappear}(c(t_{k-1}), c(t_k)) = \frac{N_{dis}(c(t_{k-1}))}{t_k - t_{k-1}} \tag{7.3}$$

The merge rate, R_merge(c(t_{k-1}), c(t_k)), is the number of URLs absorbed from other communities by merging per unit time. A higher merge rate means that the community has gained URLs mainly by merging. The merge rate is defined as follows.
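As a rough illustration of the attributes and of Equations (7.1) to (7.3), the following Python sketch computes them for one pair of corresponding communities. It rests on several assumptions that are not part of the text: communities are sets of URLs, a chart is a dictionary of such sets, each URL belongs to at most one community per chart, and the function name `evolution_metrics` and its parameter `dt` (standing for t_k − t_{k-1} in months) are hypothetical. The merge-rate entry follows only the verbal definition given above (absorbed URLs per unit time).

```python
from typing import Dict, Set

# Assumed representation: community id -> set of URLs in that community.
Chart = Dict[str, Set[str]]


def evolution_metrics(prev: Set[str], cur: Set[str],
                      prev_chart: Chart, cur_chart: Chart,
                      dt: float = 1.0) -> Dict[str, float]:
    """Attributes and rates for the pair (c(t_{k-1}), c(t_k)).

    `prev` is c(t_{k-1}) (pass an empty set if it does not exist, so that
    N(c(t_{k-1})) is zero), `cur` is c(t_k), and `dt` is t_k - t_{k-1}.
    """
    all_prev = set().union(*prev_chart.values())  # URLs in any community at t_{k-1}
    all_cur = set().union(*cur_chart.values())    # URLs in any community at t_k

    n_prev, n_cur = len(prev), len(cur)
    n_sh = len(prev & cur)                 # shared URLs
    n_dis = len(prev - all_cur)            # in no community at t_k
    n_sp = len((prev & all_cur) - cur)     # moved to other communities at t_k
    n_ap = len(cur - all_prev)             # in no community at t_{k-1}
    n_mg = len((cur & all_prev) - prev)    # came from other communities at t_{k-1}

    return {
        "N_sh": n_sh, "N_dis": n_dis, "N_sp": n_sp,
        "N_ap": n_ap, "N_mg": n_mg,
        "growth": (n_cur - n_prev) / dt,                 # Eq. (7.1)
        "stability": (n_cur + n_prev - 2 * n_sh) / dt,   # Eq. (7.2)
        "disappear": n_dis / dt,                         # Eq. (7.3)
        "merge": n_mg / dt,                # from the verbal definition only
    }


# Toy charts with made-up URLs, one month apart.
prev_chart = {"c": {"a", "b", "c", "d"}, "d": {"x", "y"}}
cur_chart = {"c": {"a", "b", "x", "n"}, "e": {"c"}}
m = evolution_metrics(prev_chart["c"], cur_chart["c"], prev_chart, cur_chart)
print(m)
```

In this toy case the community loses two URLs and gains two, so its growth rate is zero while its stability is not, which is the situation the stability metric is meant to expose. Under the one-community-per-URL assumption, the numerator of Equation (7.2) equals N_dis + N_sp + N_ap + N_mg.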