10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee



4.5 Automatic Topic Extraction from Web Documents

Table 4.1. Notations and Descriptions

Notation        Description
D               a set of documents in the corpus
D               number of documents
d               a document
d′              a document
N_d             number of terms in a document d
L_d             number of outlinks in a document d
W               a vocabulary (i.e. a set of terms)
W               number of terms
ω               a word within a document
Z               a set of latent topics within documents
z_k             a latent variable representing a topic
K               number of latent topics
β               a random variable for P(ω | z)
Ω               a random variable for P(d | z)
θ               a random variable for the joint distribution of a topic mixture
η, α            hyperparameters
Mult(·|θ_d)     Multinomial distribution with parameter θ_d
Mult(·|β_zi)    Multinomial distribution with parameter β_zi

PLSA: Probabilistic Latent Semantic Analysis

[118] proposed a statistical technique for the analysis of two-mode and co-occurrence data, called probabilistic latent semantic analysis or the aspect model. The PLSA model defines a proper generative model of the data. Let the occurrence of a term ω in a document d be an event in the PLSA model, and let z denote a latent variable associated with each event in the model (i.e. the latent topic). The generative process of the PLSA model can be described as follows.

1. Choose a document d according to the distribution P(d).
2. Choose a latent topic z with probability P(z | d).
3. Choose a term ω according to P(ω | z).

Fig. 4.2. Graphical model representation of the PLSA model [118].

The graphical model for this generative process is shown in Figure 4.2(a). The PLSA model postulates that a document d and a term ω are conditionally independent given an unobserved topic z, i.e. the probability of an event P(ω | d) is defined as:
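In the usual statement of the aspect model, this conditional independence assumption gives the mixture P(ω | d) = Σ_z P(ω | z) P(z | d), where the sum runs over the latent topics. The three generative steps and this mixture can be illustrated with a minimal Python sketch. The documents, topics, terms, and probability values below are hypothetical toy numbers chosen only for illustration, and the function names are not taken from the source.

```python
import random

# Toy parameters (hypothetical values, for illustration only).
P_d = {"d1": 0.5, "d2": 0.5}                       # P(d)
P_z_given_d = {"d1": {"z1": 0.9, "z2": 0.1},
               "d2": {"z1": 0.2, "z2": 0.8}}       # P(z | d)
P_w_given_z = {"z1": {"web": 0.7, "graph": 0.3},
               "z2": {"web": 0.1, "graph": 0.9}}   # P(w | z)


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def generate_event(rng):
    """Generate one (document, term) event via the three PLSA steps."""
    d = sample(P_d, rng)              # step 1: choose a document
    z = sample(P_z_given_d[d], rng)   # step 2: choose a latent topic
    w = sample(P_w_given_z[z], rng)   # step 3: choose a term
    return d, w                       # the topic z stays unobserved


def term_given_doc(w, d):
    """Mixture P(w | d) = sum over z of P(w | z) * P(z | d)."""
    return sum(P_w_given_z[z][w] * p_z for z, p_z in P_z_given_d[d].items())
```

For example, `term_given_doc("web", "d1")` combines 0.7 × 0.9 and 0.1 × 0.1, and the values of `term_given_doc` over all terms of a fixed document sum to one, as a conditional distribution should.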
