Social Choice Theory and Recommender Systems ... - CiteSeerX

More documents

Recommendations

Info

One way to impose SI is to normalize all of the users’ratings to a common scale before applying f. One naturalnormalization is:u iR− min( u i R)u i R′←max u R − min u R( ) ( )This transforms all ratings to the [0,1] range, filtering outany dependence on multiplicative (α i) or additive (β i) scalefactors. 1Another way to ensure SI is to constrain f to dependonly on the relative rank among titles (the ordinalpreferences of users), and not on the magnitude of ratings.Freund et al. [1998] strongly advocate this approach.One important property of [the collaborativefiltering] problem is that the most relevantinformation to be combined represents relativepreferences rather than absolute ratings. In otherwords, even if the ranking of [titles] is expressed byassigning each [title] a numeric score, we would liketo ignore the absolute values of these scores andconcentrate only on their relative order.By ignoring all but relative rank, Freund et al.’s algorithmsatisfies SI. On the other hand, the similarity-basedmethods violate it.Cohen et al. [1999] develop another algorithm forcombining ordinal preferences from multiple experts toform a composite ranking of items, applicable forcollaborative filtering. Their algorithm proceeds in twostages. The first stage actually satisfies all four propertiesdefined in this section: UNIV, UNAM, SI and IIA. As aresult, the first-stage preference relation may containcycles (e.g., title j is preferred to title k, k is preferred to l,and l is preferred to j). The second stage of their algorithmattempts to find the acyclic preference function that mostclosely approximates the stage one preference relation. Thecomplete two-stage algorithm retains invariance to scale(satisfies SI), but may depend on irrelevant alternatives(violates IIA).Different researchers favor one or the other of these fourproperties; the following proposition shows that only onevery restrictive CF function obeys them all.Proposition 1 (Nearest neighbor). Assuming that|NR| > 2, then the only function f of the form (1) thatsatisfies UNIV, UNAM, IIA, and SI is such that:R ij> R ik⇒ P aj> P ak,for all titles j,k ∈ NR, and for one distinguished user i. Thechoice of user i can depend on the ratings{R⋅t j: j ∈ T-NR}, as long as this dependence is invariant toscale, but once the “best” i is determined, his or her ratings1 If max(u i ⋅R) = min(u i ⋅R), then set u i ⋅R′ = 0.iifor the titles in NR must be fully adopted as the activeuser’s predicted ratings.Proof (sketch): Let j be a title in NR. Rewrite f in equation(1) in the following, equivalent, form:P aj= f({R⋅t j: j ∈ T-NR}, { R⋅t j: j ∈ NR})= g({R⋅t j: j ∈ NR}) ,where the choice of function g is itself allowed to dependon {R⋅t j: j ∈ T-NR}. With the exception of the “no rating”value ⊥, the problem has been cast into the same terms asin the Social Choice literature. Doyle and Wellman [1991]point out that Arrow’s original proof does not require thatall users’ preference orderings be complete, and the proofinsists that the aggregate ordering is complete only foritems that some user has expressed a preference over. Withthe additional assumption of minimal functionality (part ofthe definition of UNIV), similar to Doyle and Wellman’s“conflict resolution” condition, standard social choiceproofs become applicable. It follows, from Sen’s [1986] orRobert’s [1980] extension of Arrow’s theorem [1963], thatg, and therefore f, must be of the nearest neighbor formspecified. •If the dictatorial user i does not express a rating for sometitles, then there exists a secondary dictator h whosepreferences are fully adopted among those titles unrated byi and rated by h. This “cascade of dictators” continues untilthe minimal functionality clause of UNIV is satisfied, asshown in Doyle and Wellman’s [1991] axiomatic treatmentof default logic.Weighted Average CollaborativeFilteringWe now examine a slight weakening of the set ofproperties leading to Proposition 1. Under these newconditions, we find that the only possible CF function is aweighted sum: The active user’s predicted rating for eachtitle is a weighted average of the other users’ ratings for thesame title. Our argument is again based on results fromSocial Choice theory; we largely follow Fishburn’s [1987]explication of work originally due to Roberts [1980].We replace the SI property with a weaker one:Property 4 ∗ (TI) Translation Invariance. Consider twoinput ratings matrices, R and R′, such that, for all users i,u i⋅R′ = α (u i⋅R) + β ifor any positive constant α, and anyconstants β i. Then P aj> P akif and only if P′ aj> P′ ak, for alltitles j,k ∈ NR.This condition requires that recommendations remainunchanged when all ratings are multiplied by the sameconstant, and/or when any of the individual ratings areshifted by additive constants. The TI property, like SI, stillhonors the belief that the absolute rating of one title by one
user is not comparable to the absolute rating of anotheruser. Unlike SI, it assumes that the magnitude of ratingsdifferences, (R ij- R ik) and (R hj- R hk), are comparablebetween users i and h.Though they violate SI, the similarity-based methods ofGroupLens, Ringo, and vector similarity obey TI.Proposition 2 (Weighted average). Assuming that|NR| > 2, then the only function f of the form (1) thatsatisfies UNIV, UNAM, IIA, and TI is such that:w⋅R⋅t j> w⋅R⋅t k⇒ P aj> P akfor all titles j,k ∈ NR, where w = is a rowvector of n nonnegative weights, at least one of which ispositive. The specific weights can depend on the ratings{R⋅t j: j ∈ T-NR}.Proof: Follows from Roberts [1980]. •Proposition 2 does not rule out the nearest neighbor policy,as all but one of the w icould be zero.Weighted Average CollaborativeFiltering, … AgainNext, we derive the same conclusion as Proposition 2working from a different axiomatization. This result isadapted from Harsanyi [1955].The derivation requires two assumptions.Property 5 (RRU) Ratings are utilities. Each user’s ratingu i⋅R are a positive linear transformation from his or herutilities. That is, the ratings themselves are expressions ofutility.We also assume that users obey the rationality postulatesof expected utility theory [Savage, 1954; von Neumannand Morgenstern, 1953]. For example, if user i’s ratingsfor three titles are such that R ij> R ik> R il, then there issome probability p for which the user would be indifferentbetween the following two situations: (1) getting title jwith probability p or title l with probability 1 - p, and (2)getting title k for sure.Property 2 *(UnamE) Unanimity of Equality. For allj,k ∈ NR, if R ij= R ikfor all i ≠ a, then P aj= P ak.Proposition 3 (Weighted average, … again). The onlyfunction f of the form (1) that satisfies both RRU andUnamE is such that:P aj= w⋅R⋅t j,for all titles j ∈ NR, where w is an n-dimensional rowvector of real number weights.Proof: Follows from Harsanyi [1955]. •Note that this proposition, unlike the previous, admitsnegative weights.Implications of the AnalysisWhat are the implications of the theoretical limitationshighlighted in Propositions 1–3? First, we believe thatidentifying the connection between CF and Social Choicetheory allows CF researchers to leverage a great deal ofprevious work on preference and utility aggregation. ASocial Choice perspective on combining default reasoningrules has yielded valuable insights for that task [Doyle andWellman, 1991], and similar benefits may accrue for CF.The connection between collaborative filtering andvoting has been recognized informally by many authors;indeed, several use the term “vote” to describe users’ratings. Cohen et al. [1999] make the connection moreexplicit, pointing out the relationship between their rankmergingalgorithm and voting methods formulated as earlyas 1876. In fact, weighted versions of any of the manyproposed voting schemes [Fishburn, 1973] are immediatecandidates for new CF algorithms. One of the goals of thispaper is to extend the analogy beyond terminological andalgorithmic similarity to include axiomatic foundations.Understanding what is theoretically impossible is animportant first step in algorithm design. We believe thatthe results in this paper can help guide CF development inthe future. Though our derivations constrain the type of CFfunction, they do not contain a recommendation as to howexactly to choose the best neighbor, or how to choose theoptimal set of weights. Nonetheless, identifying thefunctional forms themselves can be of value, byconstraining the search among algorithms to one of findingthe best instantiation of a particular form.With regards to real-world applications, CF designersfor Internet commerce applications might typically beinterested more in the predictive performance of a CFalgorithm, rather than in the properties of preferencecoalescence that it does or does not obey. Yet there is noconsensus on how best to measure effectiveness, asevidenced by the proliferation of many proposedevaluation scores. As a result, comparisons among thevarious algorithms are blurred. Even if a standard,accepted evaluation measure is somehow settled upon,empirical performance can be measured only for a limitednumber of special cases, whereas the theoretical resultsapply in all circumstances.ConclusionWe have illustrated a correspondence betweencollaborative filtering (CF) and Social Choice theory. Bothframeworks center on the goal of combining thepreferences (expressed as ratings and utilities, respectively)of a group into a single preference relation. Some of the
Page 1 and 2: Social Choice Theory and Recommende
Page 3: core of a vast literature in Social

Social Choice Theory and Recommender Systems ... - CiteSeerX

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?