Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

6.2 Web Usage Mining using Probabilistic Latent Semantic Analysis    119

• P(z_k | s_i) denotes a user-session-specific probability distribution over the latent class factor z_k,
• P(p_j | z_k) denotes the class-conditional probability distribution of pages given a specific latent variable z_k.

Based on these definitions, the probabilistic latent semantic model can be expressed as the following generative process:

• Select a user session s_i with probability P(s_i);
• Pick a hidden factor z_k with probability P(z_k | s_i);
• Generate a page p_j with probability P(p_j | z_k).

As a result, we obtain the occurrence probability of an observed pair (s_i, p_j) by summing over the latent factor variable z_k. Translating this process into a probability model yields the expression

P(s_i, p_j) = P(s_i) \cdot P(p_j | s_i)    (6.13)

where P(p_j | s_i) = \sum_{z \in Z} P(p_j | z) \cdot P(z | s_i).

By applying the Bayesian formula, a re-parameterized version is obtained from the above equations:

P(s_i, p_j) = \sum_{z \in Z} P(z) P(s_i | z) P(p_j | z)    (6.14)

Following the likelihood principle, we can determine the total likelihood of the observation as

L = \sum_{s_i \in S, p_j \in P} m(s_i, p_j) \cdot \log P(s_i, p_j)    (6.15)

where m(s_i, p_j) corresponds to the entry of the session-pageview matrix associated with session s_i and pageview p_j, discussed in the previous section.

To maximize the total likelihood, the conditional probabilities P(z), P(s_i | z) and P(p_j | z) must be re-estimated repeatedly from the usage observation data. As is well known from statistics, the Expectation-Maximization (EM) algorithm is an efficient procedure for maximum likelihood estimation in latent variable models [72].
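The re-parameterized joint probability (6.14) and the log-likelihood (6.15) can be sketched numerically. The array names, toy dimensions, and random initialization below are illustrative choices of ours, not the book's:

```python
import numpy as np

# Toy dimensions (hypothetical): 4 sessions, 5 pageviews, 2 latent factors.
rng = np.random.default_rng(0)
n_sessions, n_pages, n_factors = 4, 5, 2

# Observed session-pageview matrix m(s_i, p_j): visit counts.
m = rng.integers(0, 5, size=(n_sessions, n_pages)).astype(float)

# Randomly initialized parameters, each normalized into a proper distribution.
Pz = np.full(n_factors, 1.0 / n_factors)        # P(z)
Ps_z = rng.random((n_sessions, n_factors))      # P(s_i | z), columns sum to 1
Ps_z /= Ps_z.sum(axis=0)
Pp_z = rng.random((n_pages, n_factors))         # P(p_j | z), columns sum to 1
Pp_z /= Pp_z.sum(axis=0)

# Eq. (6.14): P(s_i, p_j) = sum_z P(z) P(s_i|z) P(p_j|z)
Psp = np.einsum('k,ik,jk->ij', Pz, Ps_z, Pp_z)

# Eq. (6.15): total log-likelihood of the observation.
loglik = np.sum(m * np.log(Psp))
```

Because each factor distribution is normalized, the entries of `Psp` sum to one over all (session, pageview) pairs, as a joint distribution must.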
Generally, two steps alternate in the procedure: (1) the Expectation (E) step, where posterior probabilities of the latent factors are calculated from the current estimates of the conditional probabilities, and (2) the Maximization (M) step, where the estimated conditional probabilities are updated so as to maximize the likelihood based on the posterior probabilities computed in the previous E step.

The whole procedure is given as follows. First, choose randomized initial values of P(z), P(s_i | z) and P(p_j | z). Then, in the E-step, we simply apply the Bayesian formula to the usage observation to obtain

P(z_k | s_i, p_j) = \frac{P(z_k) P(s_i | z_k) P(p_j | z_k)}{\sum_{z_k \in Z} P(z_k) P(s_i | z_k) P(p_j | z_k)}    (6.16)

Furthermore, in the M-step, we compute

P(p_j | z_k) = \frac{\sum_{s_i \in S} m(s_i, p_j) P(z_k | s_i, p_j)}{\sum_{s_i \in S, p'_j \in P} m(s_i, p'_j) P(z_k | s_i, p'_j)}    (6.17)
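The alternating E-step (6.16) and M-step (6.17) can be sketched as a small NumPy routine. `plsa_em` and its arguments are hypothetical names of ours; the M-step updates for P(s_i | z) and P(z) are written by symmetry with Eq. (6.17):

```python
import numpy as np

def plsa_em(m, n_factors, n_iter=50, seed=0):
    """A minimal EM loop for the PLSA model (a sketch, not the book's code).

    m: session-pageview count matrix of shape (n_sessions, n_pages).
    Returns (Pz, Ps_z, Pp_z), each a proper probability distribution.
    """
    rng = np.random.default_rng(seed)
    n_sessions, n_pages = m.shape

    # Randomized initial values of P(z), P(s_i|z), P(p_j|z).
    Pz = rng.random(n_factors); Pz /= Pz.sum()
    Ps_z = rng.random((n_sessions, n_factors)); Ps_z /= Ps_z.sum(axis=0)
    Pp_z = rng.random((n_pages, n_factors)); Pp_z /= Pp_z.sum(axis=0)

    for _ in range(n_iter):
        # E-step, Eq. (6.16): posterior P(z_k | s_i, p_j), normalized over z.
        post = Pz[None, None, :] * Ps_z[:, None, :] * Pp_z[None, :, :]
        post /= post.sum(axis=2, keepdims=True)

        # M-step, Eq. (6.17) and its analogues: weight posteriors by counts.
        weighted = m[:, :, None] * post            # m(s_i,p_j) P(z|s_i,p_j)
        Pp_z = weighted.sum(axis=0) / weighted.sum(axis=(0, 1))
        Ps_z = weighted.sum(axis=1) / weighted.sum(axis=(0, 1))
        Pz = weighted.sum(axis=(0, 1)) / weighted.sum()
    return Pz, Ps_z, Pp_z

# Usage on a tiny count matrix (toy data):
m = np.array([[3., 0., 1.], [0., 2., 2.], [4., 1., 0.]])
Pz, Ps_z, Pp_z = plsa_em(m, n_factors=2)
```

Each M-step re-normalizes against the same denominator structure as Eq. (6.17), so all three outputs remain valid probability distributions after every iteration.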
