10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee



6 Web Usage Mining

$$P(s_i|z_k) = \frac{\sum_{p_j \in P} m(s_i, p_j)\, P(z_k|s_i, p_j)}{\sum_{s'_i \in S,\, p_j \in P} m(s'_i, p_j)\, P(z_k|s'_i, p_j)} \tag{6.18}$$

$$P(z_k) = \frac{1}{R} \sum_{s_i \in S,\, p_j \in P} m(s_i, p_j)\, P(z_k|s_i, p_j) \tag{6.19}$$

where $R = \sum_{s_i \in S,\, p_j \in P} m(s_i, p_j)$.

In essence, substituting equations (6.17)-(6.19) into (6.14) and (6.15) yields a monotonic increase of the total likelihood $L_i$ of the observation data. The E-step and M-step are repeated until $L_i$ converges to a local optimum, at which point the calculated results can be taken as the locally optimal probability estimates of the usage observation data. From the formulation above, the computational complexity of the PLSA model is $O(mnk)$, where $m$, $n$ and $k$ denote the number of user sessions, Web pages and latent factors, respectively.

By now, we have obtained the probability distributions $P(z_k)$, $P(s_i|z_k)$ and $P(p_j|z_k)$ by performing the E-step and M-step iteratively. The estimated distributions, which correspond to a local maximum of the likelihood, contain the information needed for inferring semantic usage factors and for clustering Web user sessions, as described in the next sections.

6.2.2 Constructing User Access Pattern and Identifying Latent Factor with PLSA

As discussed in the previous section, each latent factor $z_k$ represents a specific aspect of the usage co-occurrence activities. In other words, for each factor there may exist a corresponding task-oriented user access pattern. We can thus use the class-conditional probability estimates generated by the PLSA model to produce aggregated user profiles that characterize user navigational behaviors.
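
The class-conditional estimates mentioned here come out of the EM iteration described before this subsection. A minimal NumPy sketch of that iteration is given below, under stated assumptions: the E-step uses the standard PLSA posterior $P(z_k|s_i,p_j) \propto P(z_k)P(s_i|z_k)P(p_j|z_k)$, the update for $P(p_j|z_k)$ is taken to be symmetric to (6.18), and the function name `plsa_em`, the random initialization and the small constant `eps` are illustrative choices that do not come from the text.

```python
import numpy as np

def plsa_em(m, k, n_iter=50, seed=0, eps=1e-12):
    """Fit PLSA to a session-page weight matrix m (sessions x pages).

    Returns P(z), P(s|z), P(p|z) and the log-likelihood recorded at the
    start of every iteration.
    """
    rng = np.random.default_rng(seed)
    n_s, n_p = m.shape
    # Assumed initialization: uniform P(z), random normalized conditionals.
    p_z = np.full(k, 1.0 / k)
    p_s_z = rng.random((n_s, k))
    p_s_z /= p_s_z.sum(axis=0)
    p_p_z = rng.random((n_p, k))
    p_p_z /= p_p_z.sum(axis=0)
    R = m.sum()  # R in (6.19): total weight over all sessions and pages
    log_likelihoods = []
    for _ in range(n_iter):
        # P(z_k) P(s_i|z_k) P(p_j|z_k), shape (n_s, n_p, k)
        joint = p_z[None, None, :] * p_s_z[:, None, :] * p_p_z[None, :, :]
        p_sp = joint.sum(axis=2)  # P(s_i, p_j)
        log_likelihoods.append(float((m * np.log(p_sp + eps)).sum()))
        # E-step (assumed standard PLSA form): P(z_k | s_i, p_j)
        post = joint / (p_sp[:, :, None] + eps)
        # M-step: weighted[i, j, k] = m(s_i, p_j) P(z_k | s_i, p_j)
        weighted = m[:, :, None] * post
        per_factor = weighted.sum(axis=(0, 1))       # shared denominator
        p_p_z = weighted.sum(axis=0) / per_factor    # assumed analogue of (6.18)
        p_s_z = weighted.sum(axis=1) / per_factor    # (6.18)
        p_z = per_factor / R                         # (6.19)
    return p_z, p_s_z, p_p_z, log_likelihoods
```

Each iteration touches every (session, page, factor) triple once, which matches the $O(mnk)$ cost stated above. The dense three-dimensional array is only practical for small inputs; a sparse formulation over the nonzero entries of the weight matrix would be the usual choice for real logs.
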
Conceptually, each aggregated user profile is expressed as a collection of pages, each accompanied by a weight indicating that page's contribution to the user group. Furthermore, sorting the page weights of a generated user profile can reveal common user access interests, such as a dominant or secondary “theme”.

Partitioning User Sessions

First, we begin with the probability $P(s_i|z_k)$, which represents the probability of observing a given user session $s_i$ conditioned on a latent class factor $z_k$. On the other hand, the probability distribution of a specific user session $s_i$ over the factor space reflects that session's access preference over the whole latent factor space, so it can be used to uncover the dominant factors by picking out the top probability values. Therefore, for each user session $s_i$, we can compute a set of probabilities over the latent factor space via Bayes' formula as follows:

$$P(z_k|s_i) = \frac{P(s_i|z_k)\, P(z_k)}{\sum_{z'_k \in Z} P(s_i|z'_k)\, P(z'_k)} \tag{6.20}$$
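
As an illustration of (6.20), the following sketch computes the distribution over latent factors for every session and then picks each session's dominant factors. The function names and the array layout (one row per session, one column per factor) are assumptions made for this example, and it presumes every session has nonzero probability under at least one factor.

```python
import numpy as np

def session_factor_posterior(p_s_z, p_z):
    """P(z_k | s_i) via Bayes' formula (6.20).

    p_s_z: array of shape (n_sessions, k) holding P(s_i | z_k)
    p_z:   array of shape (k,) holding P(z_k)
    """
    joint = p_s_z * p_z  # numerator P(s_i | z_k) P(z_k), per session and factor
    # Normalize each row over the factor space (the denominator of 6.20).
    return joint / joint.sum(axis=1, keepdims=True)

def dominant_factors(p_z_s, top=1):
    """Indices of the `top` most probable factors per session, best first."""
    return np.argsort(-p_z_s, axis=1)[:, :top]
```

For example, with `p_s_z = np.array([[0.6, 0.1], [0.3, 0.2], [0.1, 0.7]])` and `p_z = np.array([0.5, 0.5])`, the second session receives the distribution `[0.6, 0.4]`, so its dominant factor is the first one.
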
