10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee


6.2 Web Usage Mining using Probabilistic Latent Semantic Analysis

In practice, the set of probabilities P(z_k|s_i) tends to be sparse: for a given s_i, typically only a few entries are significant with respect to a predefined threshold. Hence we can classify each user session into the corresponding cluster based on the probabilities that are greater than or equal to the given threshold. Since each user session can be expressed as a page vector in the original n-dimensional space, we can create a mixture representation of the collection of user sessions within the same cluster, associated with the factor z_k, in terms of a collection of weighted pages. The algorithm for partitioning user sessions is described as follows.

Algorithm 6.4: Partitioning user sessions
Input: A set of calculated probability values P(z_k|s_i), a user session-page matrix SP, and a predefined threshold μ.
Output: A set of session clusters SCL = (SCL_1, SCL_2, ···, SCL_k)
Step 1: Set SCL_1 = SCL_2 = ··· = SCL_k = ∅,
Step 2: For each s_i ∈ S, select P(z_k|s_i); if P(z_k|s_i) ≥ μ, then SCL_k = SCL_k ∪ {s_i},
Step 3: If there are still user sessions to be clustered, go back to Step 2,
Step 4: Output session clusters SCL = {SCL_k}.

Characterizing Latent Semantic Factor

As mentioned in the previous section, the core of the PLSA model is the latent factor space. From this point of view, how to characterize the factor space, or explain the semantic meaning of the factors, is a crucial issue in the PLSA model. Similarly, we can utilize another conditional probability distribution obtained by the PLSA model to identify the semantic meaning of each latent factor, by partitioning Web pages into categories associated with the latent factors.

For each hidden factor z_k, we may consider that the pages whose conditional probabilities P(p_j|z_k) are greater than a predefined threshold provide similar functional components corresponding to the latent factor.
In this way, we can select all pages with probabilities exceeding a certain threshold to form a topic-specific page group. By analyzing the URLs of the pages and their weights derived from the conditional probabilities associated with the specific factor, we may characterize and explain the semantic meaning of each factor. In the next section, two examples of the discovered latent factors are presented. The algorithm for generating the topic-oriented Web page groups is briefly described as follows:

Algorithm 6.5: Characterizing latent semantic factors
Input: A set of conditional probabilities P(p_j|z_k), a predefined threshold μ.
Output: A set of latent semantic factors represented by several dominant pages.
Step 1: Set PCL_1 = PCL_2 = ··· = PCL_k = ∅,
Step 2: For each z_k, choose all Web pages such that P(p_j|z_k) ≥ μ and P(z_k|p_j) ≥ μ, then construct PCL_k = {p_j} ∪ PCL_k,
Step 3: If there are still pages to be classified, go back to Step 2,
Step 4: Output PCL = {PCL_k}.
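
As an illustration only, the two thresholding procedures (Algorithms 6.4 and 6.5) might be sketched in Python as follows. The function names, the nested-list layout of the probability tables, the integer indices standing in for sessions and pages, and the toy probability values are all assumptions made for this sketch and do not come from the text. The session-page matrix SP is not needed for the assignment step itself and is omitted here.

```python
from typing import List, Sequence, Set


def partition_sessions(
    p_z_given_s: Sequence[Sequence[float]], mu: float
) -> List[Set[int]]:
    """Sketch of Algorithm 6.4. p_z_given_s[i][k] is assumed to hold P(z_k | s_i)."""
    num_factors = len(p_z_given_s[0]) if p_z_given_s else 0
    # Step 1: start with one empty cluster per latent factor.
    clusters: List[Set[int]] = [set() for _ in range(num_factors)]
    # Steps 2-3: visit every session and add it to each cluster
    # whose probability reaches the threshold.
    for i, row in enumerate(p_z_given_s):
        for k, prob in enumerate(row):
            if prob >= mu:
                clusters[k].add(i)
    # Step 4: return the session clusters.
    return clusters


def characterize_factors(
    p_p_given_z: Sequence[Sequence[float]],
    p_z_given_p: Sequence[Sequence[float]],
    mu: float,
) -> List[Set[int]]:
    """Sketch of Algorithm 6.5.

    p_p_given_z[k][j] is assumed to hold P(p_j | z_k),
    p_z_given_p[j][k] is assumed to hold P(z_k | p_j).
    """
    # Step 1: start with one empty page group per latent factor.
    groups: List[Set[int]] = [set() for _ in p_p_given_z]
    # Steps 2-3: keep a page only if both conditional probabilities
    # reach the threshold, as Step 2 of the algorithm requires.
    for k, row in enumerate(p_p_given_z):
        for j, prob in enumerate(row):
            if prob >= mu and p_z_given_p[j][k] >= mu:
                groups[k].add(j)
    # Step 4: return the page groups.
    return groups


# Toy values, invented purely for illustration.
session_probs = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
session_clusters = partition_sessions(session_probs, mu=0.4)

page_given_factor = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
factor_given_page = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
page_groups = characterize_factors(page_given_factor, factor_given_page, mu=0.3)
```

Because membership is decided by a threshold rather than by the single most probable factor, a session or page in this sketch can belong to several groups, or to none if no probability reaches μ. In the toy data, session 2 falls into both clusters.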
