10.07.2015 Views

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

Web Mining and Social Networking: Techniques and ... - tud.ttu.ee

SHOW MORE
SHOW LESS
  • No tags were found...

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

2.4 Eigenvector, Principal Eigenvector 172.3.1 Correlation-based SimilarityPearson correlation coefficient, which is to calculate the deviations of users’ ratingson various items from their mean ratings on the rated items, is a commonly used similarityfunction in traditional collaborative filtering approaches, where the attributeweight is expressed by a feature vector of numeric ratings on various items, e.g. therating can be from 1 to 5 where 1 st<strong>and</strong>s for the lest like voting <strong>and</strong> 5 for the mostpreferable one. The Pearson correlation coefficient can well deal with collaborativefiltering since all ratings are on a discrete scale rather than on an analogous scale.The measure is described below. Given two users i <strong>and</strong> j, <strong>and</strong> their rating vectors R i<strong>and</strong> R j , the Pearson correlation coefficient is then defined by:sim(i, j)=corr (R i ,R j )= √ n∑n∑k=1k=1(Ri,k − R i)(R j,k − R j)(Ri,k − R i) 2n∑k=1(R j,k − R j) 2(2.1)where R i,k denotes the rating of user i on item k, R i is the average rating of user i.However, this measure is not appropriate in the <strong>Web</strong> mining scenario where thedata type encountered (i.e. user session) is actually a sequence of analogous pageweights. To address this intrinsic property of usage data, the cosine coefficient is abetter choice instead, which is to measure the cosine function of angle betw<strong>ee</strong>n twofeature vectors. Cosine function is widely used in information retrieval research.2.3.2 Cosine-Based SimilaritySince in a vector expression form, any vector could be considered as a line in amultiple-dimensional space, it is intuitive to define the similarity (or distance) betw<strong>ee</strong>ntwo vectors as the cosine function of angle betw<strong>ee</strong>n two “lines”. In this manner,the cosine coefficient can be calculated by the ratio of the dot product of twovectors with respect to their vector norms. Given two vectors A <strong>and</strong> B, the cosinesimilarity is then defined as:( ) −→ −→ −→A −→ A · Bsim(A,B)=cos , B =∣ −→ A ∣ × ∣ −→ B ∣ (2.2)where “·” denotes the dot operation <strong>and</strong> “×” the norm form.2.4 Eigenvector, Principal EigenvectorIn linear algebra, there are two kinds of objects: scalars, which are just numbers,<strong>and</strong> vectors, which can be considered as arrows in a space, <strong>and</strong> which have bothmagni<strong>tud</strong>e <strong>and</strong> direction (though more precisely a vector is a member of a vectorspace). In the context of traditional functions of algebra, the most important functions

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!