Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

6.5 Web Usage Mining Applications

Step 3. Create the graph corresponding to the matrix, and employ a clique-finding (or connected-components) algorithm on the graph.
Step 4. For each cluster found, create new index Web pages by synthesizing the links to the documents of the pages contained in the cluster.

In the first step, an access log, containing a sequence of hits, or requests to the Web server, is taken for processing. Each request typically consists of the time-stamp of the request, the URL requested, and the IP address from which the request originated. The IP address in this case is treated as identifying a single user. Thus a series of hits made within a one-day period, ordered by their time-stamps, is collected as a single session for that user. The obtained user sessions, in the form of session vectors of requested URLs, are used in the second step. To compute the co-occurrence frequencies between pages, the conditional probability of each pair of pages P1 and P2 is calculated. Pr(P1|P2) denotes the probability of a user visiting P1 after having already visited P2, while Pr(P2|P1) is the probability of a user visiting P2 after having visited P1. The co-occurrence frequency between P1 and P2 is the minimum of these two values. The minimum of the two conditional probabilities is used in order to avoid the problem of asymmetrical relationships between two pages that play distinct roles in the Web site. Finally, a matrix of the calculated co-occurrence frequencies is created, and in turn a graph equivalent to the matrix is built up to reflect the connections between pages derived from the log. In the third step, a clique-finding algorithm is applied to the graph to reveal its connected components.
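Steps 2 and 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes sessions are given as lists of requested URLs, and the edge threshold used to turn co-occurrence frequencies into a graph is a hypothetical parameter.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(sessions):
    """Step 2: co-occurrence frequency of each page pair as the
    minimum of the two conditional probabilities Pr(P1|P2), Pr(P2|P1)."""
    visits = defaultdict(int)   # number of sessions containing page P
    pairs = defaultdict(int)    # number of sessions containing both P1 and P2
    for session in sessions:
        pages = set(session)
        for p in pages:
            visits[p] += 1
        for p1, p2 in combinations(sorted(pages), 2):
            pairs[(p1, p2)] += 1
    freq = {}
    for (p1, p2), both in pairs.items():
        # Pr(P1|P2) = both / visits[P2], Pr(P2|P1) = both / visits[P1];
        # take the minimum to avoid asymmetric relationships.
        freq[(p1, p2)] = min(both / visits[p2], both / visits[p1])
    return freq

def connected_components(freq, threshold=0.5):
    """Step 3: build a graph from pairs above the (hypothetical)
    threshold and return its connected components as page clusters."""
    adj = defaultdict(set)
    for (p1, p2), f in freq.items():
        if f >= threshold:
            adj[p1].add(p2)
            adj[p2].add(p1)
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:            # depth-first traversal of one component
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

sessions = [["/home", "/a", "/b"], ["/home", "/a", "/b"], ["/home", "/c"]]
freq = cooccurrence(sessions)
clusters = connected_components(freq, threshold=0.9)
```

Step 4 would then emit, for each set in `clusters`, an index page listing links to its members.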
In this manner, a clique (also called a cluster) is a collection of nodes (i.e. pages) whose members are directly connected by edges. In other words, the subgraph of the clique, in which each pair of nodes is joined by a path, satisfies the property that every node in the clique or cluster is related to at least one other node in the subgraph. Eventually, for each found cluster of pages, a new index page containing links to all the documents in the cluster is generated. From the above description, we can see that the added index pages represent the coherent relationships between pages from the user navigational perspective, thereby providing an additional way for users to visually learn the access intents of other users and to browse directly to the needed pages from the generated index pages. Figure 6.7 depicts an example of an index page derived by the PageGather algorithm [204].

Mining Web Logs to Improve Website Organization

In [233], the authors proposed a novel algorithm to automatically find the Web pages in a website whose location differs from where users expect to find them. The motivation behind the approach is that users backtrack if they do not find a page where they expect it, and the backtrack point is where the expected location of the page should be. Mining Web logs thus provides a possible way to identify the backtrack points and, in turn, to improve the website organization.

The model of a user's search pattern usually follows the procedure below [233]:

Algorithm 6.12: backtrack path finding

For a single target page T, the user is expected to execute the following search strategy:

1. Start from the root.
2.
While (current location C is not the target page T) do
   (a) If any of the links from C is likely to reach T, follow the link that appears most likely
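Under this search model, a backtrack appears in the log as a return to a previously visited page within a session. The following is a minimal detection sketch, not the full algorithm of [233]: it assumes each session is an ordered list of requested URLs and simply flags returns to already-seen pages.

```python
def backtrack_points(session):
    """Return (backtrack_page, abandoned_page) pairs: positions where
    the user returned to an already-visited page, suggesting the
    abandoned page was not where they expected the target to be."""
    seen = set()
    points = []
    prev = None
    for page in session:
        # A revisit (other than a plain reload) means the user backed
        # up from `prev`, so `prev` did not lead toward the target.
        if page in seen and page != prev:
            points.append((page, prev))
        seen.add(page)
        prev = page
    return points

# Example session: the user tries /products, backs up to /home,
# then finds the target under /services.
session = ["/home", "/products", "/home", "/services", "/target"]
```

Here `backtrack_points(session)` flags `/home` as the backtrack point and `/products` as the page where the user gave up, hinting that a link expected under `/products` actually lives elsewhere.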
