10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

128 6 <strong>Web</strong> Usage <strong>Mining</strong>Algorithm 6.7: Variational EM AlgorithmStep 1: (E-step) For each document, find the optimizing values of variational parameters θ ∗ m<strong>and</strong> φ ∗ m .Step 2: (M-step) Maximize the resulting low bound on the likelihood with respect to modelparameters α <strong>and</strong> β. This corresponds to finding the maximum likelihood estimate with theapproximate posterior which is computed in the E-step.The E-step <strong>and</strong> M-step are executed iteratively until a maximum likelihood value reaches.Meanwhile, the calculated estimation parameters can be used to infer topic distribution ofa new document by performing the variational inference. More details with respect to thevariational EM algorithm are referred to [35].<strong>Web</strong> Usage Data ModelIn section 6.1.2, we give a <strong>Web</strong> data model of session-pageview matrix. Here we use the samedata expression, which summarized as:• S = {s i ,i = 1,...,m}: a set of m user sessions.• P = {p j , j = 1,...,n} : a set of n <strong>Web</strong> pageviews.• For each user, a user session is represented as a vector of visited pageviews with correspondingweights: s i = {a ij , j = 1,...,n}, where a ij denotes the weight for page p j visitedin s i user session. The corresponding weight is usually determined by the number of hit orthe amount time spent on the specific page. Here, we use the latter to construct the usagedata for the dataset used in experimental s<strong>tud</strong>y.• SP m×n = {a ij ,i = 1,...,m, j = 1,...,n}: the ultimate usage data in the form of a weightmatrix with a dimensionality of m × n .6.3.2 Modeling User Navigational Task via LDASimilar to capturing the underlying topics over the word vocabulary <strong>and</strong> each document’sprobability distribution over the mixing topic space, LDA is also able to be used to discoverthe hidden access tasks <strong>and</strong> the user preference mixtures over the uncovered task space fromuser navigational observations. That is, from the usage data, LDA identifies the hidden tasks<strong>and</strong> represent each task as a simplex of <strong>Web</strong> pages. Furthermore, LDA characterizes each <strong>Web</strong>user session as a simplex of these discovered tasks. In other words, LDA reveals two aspectsof underlying usage information to us: the hidden task space <strong>and</strong> the task mixture distributionof each <strong>Web</strong> user session, which reflect the underlying correlations betw<strong>ee</strong>n <strong>Web</strong> pages aswell as <strong>Web</strong> user sessions. With the discovered task-simplex expressions, it is viable to modelthe user access patterns in the forms of weighted page vectors, in turn, to predict the targetuser’s potentially interested pages by employing a collaborative recommendation algorithm.In the following part, we first discuss how to discover the user access patterns in terms oftask-simplex expression based on LDA model <strong>and</strong> how to infer a target user’s navigationalpreference by using the estimated task-simplex space, <strong>and</strong> then present a collaborative recommendationalgorithm by incorporating the inferred navigational task distribution of the targetuser with the discovered user access patterns to make <strong>Web</strong> recommendations.Similar to the implementation of document-topic expression in text mining discussedabove, viewing <strong>Web</strong> user sessions as mixtures of tasks makes it possible to formulate theproblem of identifying the set of underlying tasks hidden in a collection of user clicks. Givenm <strong>Web</strong> user sessions containing t hidden tasks expressed over n distinctive pageviews, we can

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!