Recommender systems: Latent Factor Models - SNAP - Stanford ...

CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu


Training data
  100 million ratings, 480,000 users, 17,770 movies
  6 years of data: 2000-2005
Test data
  Last few ratings of each user (2.8 million)
Evaluation criterion: Root Mean Square Error (RMSE)
Netflix Cinematch RMSE: 0.9514
Competition
  2700+ teams
  $1 million prize for 10% improvement on Cinematch

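For reference, RMSE as used throughout the deck, in a few lines of Python (a generic reference implementation, not code from the course):

```python
import numpy as np

def rmse(predicted, actual):
    """Root Mean Square Error between predicted and actual ratings."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    return np.sqrt(np.mean((predicted - actual) ** 2))

print(rmse([3.5, 4.0, 2.0], [4, 4, 1]))  # ~0.645
```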


[Figure: the Netflix utility matrix R — roughly 480,000 users by 17,770 movies — shown as a sparse grid of 1-5 star ratings with most entries empty.]


[Figure: the same utility matrix with some entries replaced by "?" — the ratings to be predicted.]

Task: predict the missing ratings. The error over a set of predicted entries is measured as

  SSE = \sum_{(u,i)} (\hat{r}_{ui} - r_{ui})^2


Basically the winner of the Netflix Challenge
Multi-scale modeling of the data: combine top-level, regional modeling of the data with a refined, local view:
  Global: overall deviations of users/movies
  Factorization: addressing regional effects
  Collaborative filtering (k-NN): extract local patterns

[Diagram: three nested levels — global effects, factorization, CF/NN.]



Global:
  Mean movie rating: 3.7 stars
  The Sixth Sense is 0.5 stars above average
  Joe rates 0.2 stars below average
  ⇒ Baseline estimate: Joe will rate The Sixth Sense 4 stars
Local neighborhood (CF/NN):
  Joe didn't like the related movie Signs
  ⇒ Final estimate: Joe will rate The Sixth Sense 3.8 stars

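A quick check of the arithmetic behind the baseline step above (the further correction down to 3.8 comes from the neighborhood model, not from this formula):

```latex
b_{ui} = \mu + b_i + b_u = 3.7 + 0.5 - 0.2 = 4.0 \ \text{stars}
```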


Earliest and most popular collaborative filtering method
Derive unknown ratings from those of "similar" movies (item-item variant)
Define a similarity measure s_ij of items i and j
Select the k nearest neighbors and compute the rating:

  \hat{r}_{ui} = \frac{\sum_{j \in N(i;u)} s_{ij}\, r_{uj}}{\sum_{j \in N(i;u)} s_{ij}}

  s_ij … similarity of items i and j
  r_uj … rating of user u on item j
  N(i;u) … items most similar to i that were rated by u

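A minimal sketch of this item-item prediction rule in Python (the variable names and the toy dictionaries are illustrative, not from the slides):

```python
def predict_rating(u, i, ratings, sim, k=2):
    """Item-item CF: weighted average of u's ratings on the k items most similar to i.

    ratings[u] maps item -> rating given by user u; sim[(i, j)] is a precomputed
    similarity s_ij (e.g. Pearson or cosine over the rating columns).
    """
    rated = [j for j in ratings[u] if j != i]
    # N(i; u): the k items rated by u that are most similar to i
    neighbors = sorted(rated, key=lambda j: sim[(i, j)], reverse=True)[:k]
    num = sum(sim[(i, j)] * ratings[u][j] for j in neighbors)
    den = sum(sim[(i, j)] for j in neighbors)
    return num / den if den else None

# Toy example: Joe rated Signs low and The Lion King high
ratings = {"joe": {"signs": 2.0, "lion_king": 4.0}}
sim = {("sixth_sense", "signs"): 0.9, ("sixth_sense", "lion_king"): 0.2}
print(predict_rating("joe", "sixth_sense", ratings, sim))  # ~2.36, pulled towards the similar movie Signs
```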


In practice we get better estimates if we model deviations:

  \hat{r}_{ui} = b_{ui} + \frac{\sum_{j \in N(i;u)} s_{ij}\,(r_{uj} - b_{uj})}{\sum_{j \in N(i;u)} s_{ij}}

  b_ui … baseline estimate for r_ui
  μ   … overall mean rating
  b_u … rating deviation of user u = (average rating of user u) − μ
  b_i … rating deviation of movie i = (average rating of movie i) − μ

Problems:
  1) Similarity measures are arbitrary
  2) Pairwise similarities neglect interdependencies among neighbors
  3) Taking a weighted average is restricting


Use a weighted sum rather than a weighted average:

  \hat{r}_{ui} = b_{ui} + \sum_{j \in N(i;u)} w_{ij}\,(r_{uj} - b_{uj})

How to set w_ij? Allow the weights to be free parameters rather than fixed similarities.
Remember, the error metric is SSE: \sum_{(u,i)} (\hat{r}_{ui} - r_{ui})^2
Find the w_ij that minimize SSE on the training data:

  \min_{w} \sum_{(u,i) \in \text{training}} \Big( b_{ui} + \sum_{j \in N(i;u)} w_{ij}\,(r_{uj} - b_{uj}) - r_{ui} \Big)^2
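A small sketch of how such interpolation weights could be fit by gradient descent on the training SSE (a simplified illustration with a single fixed neighborhood and made-up numbers; the prize solutions solve this per item with more care):

```python
import numpy as np

# Toy setup: predict three ratings from the deviations of two neighbor items j1, j2.
r_train = np.array([4.0, 3.0, 5.0])           # observed target ratings r_ui
b_ui = np.array([3.8, 3.2, 4.5])              # baseline estimates for those ratings
dev = np.array([[0.5, -0.2],                  # (r_uj - b_uj) for neighbors j1, j2
                [-0.4, 0.1],
                [0.6, 0.3]])

w = np.zeros(2)                               # interpolation weights to learn
lr = 0.05
for _ in range(500):
    pred = b_ui + dev @ w                     # \hat{r}_ui = b_ui + sum_j w_ij (r_uj - b_uj)
    err = pred - r_train
    grad = 2 * dev.T @ err                    # gradient of SSE with respect to w
    w -= lr * grad
print(w)                                      # weights that minimize SSE on this toy set
```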


So far:

  \hat{r}_{ui} = b_{ui} + \sum_{j \in N(i;u)} w_{ij}\,(r_{uj} - b_{uj})

  Weights w_ij are derived based on their role; no use of an arbitrary similarity measure (w_ij ≠ s_ij)
  Explicitly account for interrelationships among the neighboring movies
Next: Latent factor model
  Extract "regional" correlations

[Diagram: the same three levels — global effects, factorization, CF/NN.]


[Figure: movies arranged in a two-dimensional space — horizontal axis from "geared towards females" to "geared towards males", vertical axis from "serious" to "funny". Examples: The Color Purple, Sense and Sensibility, The Princess Diaries, Amadeus, The Lion King, Ocean's 11, Braveheart, Independence Day, Lethal Weapon, Dumb and Dumber.]


"SVD" on Netflix data: R ≈ Q · P^T

[Figure: the full rating matrix R (items × users) approximated as the product of a thin item-factor matrix Q and a thin user-factor matrix P^T.]

For now let's assume we can approximate the rating matrix R as a product of "thin" Q · P^T
There are important differences between "SVD" and the real SVD. We will get to them later.


How to estimate the missing rating?

[Figure, repeated over three slides: the rating matrix with one unknown entry "?", shown next to the factor matrices Q and P^T. The unknown entry is estimated from the corresponding item and user factor vectors; in the example the estimate comes out to about 2.4.]

  \hat{r}_{ui} = q_i^T p_u = \sum_f q_{if}\, p_{uf}

  q_i … factor vector of item i (row i of Q)
  p_u … factor vector of user u (column u of P^T)
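In code, the estimate is just a dot product of the two factor vectors (the matrices and indices below are placeholders standing in for learned factors):

```python
import numpy as np

Q = np.array([[1.1, -0.8],              # one row per item: q_i
              [0.3,  2.1]])
P = np.array([[0.5,  1.4],              # one row per user: p_u
              [1.4, -0.3]])

i, u = 0, 1                             # pick an (item, user) pair
r_hat = Q[i] @ P[u]                     # \hat{r}_ui = q_i^T p_u
print(r_hat)
```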


[Figure, repeated over two slides: the same movie map interpreted as the first two latent factors — Factor 1 runs from "geared towards females" to "geared towards males", Factor 2 from "serious" to "funny".]


Remember SVD:
  A: input data matrix
  U: left singular vectors
  V: right singular vectors
  Σ: singular values

  A ≈ U Σ V^T

SVD gives the minimum reconstruction error (MSE):

  \min_{U,\Sigma,V} \sum_{ij} \big( A_{ij} - [U \Sigma V^T]_{ij} \big)^2
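A small numpy illustration of this property on a fully observed toy matrix (illustrative only; the Netflix matrix is mostly missing, which is exactly the problem addressed next):

```python
import numpy as np

A = np.array([[1., 2., 1., 4.],
              [3., 5., 4., 4.],
              [4., 5., 5., 3.]])          # small fully observed toy matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                      # keep the top-k singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # best rank-k approximation in SSE

sse = np.sum((A - A_k) ** 2)               # equals the sum of the dropped s_i^2
print(sse, np.sum(s[k:] ** 2))
```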


[Figure: the (mostly missing) rating matrix next to the factor matrix Q — the decomposition we would like to compute.]

SVD isn't defined when entries are missing
Use specialized methods to find P, Q:

  \min_{P,Q} \sum_{(u,i) \in \text{training}} \big( r_{ui} - q_i^T p_u \big)^2


Want to minimize SSE on the test data
Idea: minimize SSE on the training data
  We want a large f (number of factors) to capture all the signals
  But test SSE begins to rise for f > 2
Regularization is needed:
  Allow a rich model where there are sufficient data
  Shrink aggressively where data are scarce

  \min_{P,Q} \sum_{\text{training}} \big( r_{ui} - q_i^T p_u \big)^2 \;+\; \lambda \Big[ \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \Big]

  The first term is the "error", the bracketed term the "length"; λ is the regularization parameter.

[Figure: the sparse rating matrix with "?" entries, repeated for reference.]


  \min_{P,Q} \sum_{\text{training}} \big( r_{ui} - q_i^T p_u \big)^2 \;+\; \lambda \Big[ \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \Big]

  min over factors of "error" + λ · "length"

[Figure, repeated over four slides: the movie factor map (Factor 1: geared towards females ↔ males, Factor 2: serious ↔ funny), used over several animation steps to illustrate the effect of regularization described above.]


Want to find matrices P and Q:

  \min_{P,Q} \sum_{\text{training}} \big( r_{ui} - q_i^T p_u \big)^2 \;+\; \lambda \Big[ \sum_u \lVert p_u \rVert^2 + \sum_i \lVert q_i \rVert^2 \Big]

Online "stochastic" gradient descent:
  Initialize P and Q (randomly, or using SVD)
  Then iterate over the ratings and update the factors: for each r_ui, compute the error
  ε_ui = r_ui − q_i^T p_u and take a gradient step on q_i and p_u.

(Koren, Bell, Volinsky, IEEE Computer, 2009)
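A compact sketch of this SGD loop (the hyperparameters and toy ratings below are made up; the update rules follow the standard regularized form from the Koren/Bell/Volinsky formulation, not code taken from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, f = 5, 4, 2
# Toy training ratings as (user, item, rating) triples -- illustrative only
train = [(0, 1, 4.0), (0, 2, 1.0), (1, 1, 5.0), (2, 0, 3.0), (3, 3, 2.0), (4, 2, 4.0)]

P = 0.1 * rng.standard_normal((n_users, f))   # user factor vectors p_u
Q = 0.1 * rng.standard_normal((n_items, f))   # item factor vectors q_i
eta, lam = 0.02, 0.1                          # learning rate, regularization

for epoch in range(500):
    for u, i, r in train:
        err = r - Q[i] @ P[u]                 # e_ui = r_ui - q_i^T p_u
        q_old = Q[i].copy()
        Q[i] += eta * (err * P[u] - lam * Q[i])   # gradient step on q_i
        P[u] += eta * (err * q_old - lam * P[u])  # gradient step on p_u

print(Q[1] @ P[0])  # estimate for the first training rating; moves towards 4
```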


  r_ui = μ + b_u + b_i + q_i^T p_u

Baseline predictor (μ + b_u + b_i: user bias and movie bias)
  Separates users and movies
  Benefits from insights into users' behavior
  Among the main practical contributions of the competition
  μ = overall mean rating, b_u = bias of user u, b_i = bias of movie i

User-movie interaction (q_i^T p_u)
  Characterizes the matching between users and movies
  Attracts most research in the field
  Benefits from algorithmic and mathematical innovations


We have expectations on the rating by user u of movie i, even without estimating u's attitude towards movies like i:
  Rating scale of user u
  Values of other ratings the user gave recently (day-specific mood, anchoring, multi-user accounts)
  (Recent) popularity of movie i
  Selection bias; related to the number of ratings the user gave on the same day ("frequency")


Solve:

  \min_{Q,P} \sum_{(u,i) \in R} \big( r_{ui} - (\mu + b_u + b_i + q_i^T p_u) \big)^2 \;+\; \lambda \Big( \sum_i \lVert q_i \rVert^2 + \sum_u \lVert p_u \rVert^2 + \sum_u b_u^2 + \sum_i b_i^2 \Big)

  The first term measures goodness of fit; the second is the regularization.
  λ is typically selected via grid search on a validation set.

Stochastic gradient descent to find the parameters
Note: both the biases (b_u, b_i) and the interactions (q_i, p_u) are treated as parameters (we estimate them)
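Extending the earlier SGD sketch with the bias terms (again a simplified illustration; the update rules follow the standard form for this objective and are not code from the slides):

```python
import numpy as np

def sgd_with_biases(train, n_users, n_items, f=2, eta=0.02, lam=0.1, epochs=200, seed=0):
    """Fit mu, b_u, b_i, P, Q by SGD on the regularized objective above (toy sketch)."""
    rng = np.random.default_rng(seed)
    mu = np.mean([r for _, _, r in train])        # overall mean rating
    b_u = np.zeros(n_users)                       # user biases
    b_i = np.zeros(n_items)                       # movie biases
    P = 0.1 * rng.standard_normal((n_users, f))   # user factors
    Q = 0.1 * rng.standard_normal((n_items, f))   # item factors
    for _ in range(epochs):
        for u, i, r in train:
            err = r - (mu + b_u[u] + b_i[i] + Q[i] @ P[u])
            b_u[u] += eta * (err - lam * b_u[u])
            b_i[i] += eta * (err - lam * b_i[i])
            q_old = Q[i].copy()
            Q[i] += eta * (err * P[u] - lam * Q[i])
            P[u] += eta * (err * q_old - lam * P[u])
    return mu, b_u, b_i, P, Q

# Usage (toy): sgd_with_biases([(0, 1, 4.0), (1, 0, 3.0)], n_users=2, n_items=2)
```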


RMSE comparison:
  Global average: 1.1296
  User average: 1.0651
  Movie average: 1.0533
  Netflix Cinematch: 0.9514
  Basic collaborative filtering: 0.94
  Collaborative filtering++: 0.91
  Latent factors: 0.90
  Latent factors + biases: 0.89
  Final BellKor: 0.869
  Grand Prize: 0.8563


[Plot: RMSE (0.885-0.92) versus millions of parameters (1-1000, log scale) for three models — CF (no time bias), basic latent factors, and latent factors with biases.]


Sudden rise in the average movie rating (early 2004)
  Improvements in the Netflix GUI
  Meaning of rating changed
Movie age
  Users prefer new movies without any particular reason
  Older movies are just inherently better than newer ones

Y. Koren, Collaborative filtering with temporal dynamics, KDD '09


Original model:
  r_ui = μ + b_u + b_i + q_i^T p_u
Add time dependence to the biases:
  r_ui = μ + b_u(t) + b_i(t) + q_i^T p_u
Make the parameters b_u and b_i depend on time:
  (1) Parameterize the time dependence by linear trends
  (2) Or use bins, where each bin corresponds to 10 consecutive weeks
Add temporal dependence to the factors:
  p_u(t) … user preference vector on day t

Y. Koren, Collaborative filtering with temporal dynamics, KDD '09
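A sketch of the binned variant of b_i(t) described in (2) (the day-based timestamps, bin width constant, and lookup scheme are assumptions for illustration; the offsets would be learned by SGD like the other parameters):

```python
import numpy as np

BIN_DAYS = 70  # 10 consecutive weeks per bin

def time_bin(t_days, t0_days=0):
    """Map a rating's day index to its 10-week bin."""
    return (t_days - t0_days) // BIN_DAYS

def item_bias_at(i, t_days, b_i, b_i_bin):
    """Time-dependent item bias: b_i(t) = b_i + b_{i, Bin(t)}.

    b_i[i] is the static bias; b_i_bin[i] holds one learned offset per time bin.
    """
    return b_i[i] + b_i_bin[i][time_bin(t_days)]

# Toy usage: item 0 has static bias 0.1 and a slightly higher bias in its second bin
b_i = np.array([0.1])
b_i_bin = [np.array([0.0, 0.25, 0.0])]
print(item_bias_at(0, t_days=100, b_i=b_i, b_i_bin=b_i_bin))  # 0.1 + 0.25
```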


[Plot: RMSE (0.875-0.92) versus millions of parameters (1-10,000, log scale). Models shown: CF (no time bias), basic latent factors, CF (time bias), latent factors with biases, + linear time factors, + per-day user biases, + CF.]


Many options for modeling
  Variants of the ideas we have seen so far
    Different numbers of factors
    Different ways to model time
    Different ways to handle implicit information
  Other models (not described here)
    Nearest-neighbor models
    Restricted Boltzmann machines
Model averaging is useful: linear model combining (see the sketch below)
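A minimal sketch of linear model combining: fit blend weights over several models' held-out predictions by least squares (a simplification of what the leading teams did; the numbers are made up):

```python
import numpy as np

# Predictions of three models on a held-out ("probe") set -- toy numbers
preds = np.array([[3.8, 4.1, 2.2, 4.9],    # model A
                  [3.6, 4.3, 2.5, 4.7],    # model B
                  [4.0, 3.9, 2.0, 5.1]]).T # model C (transposed: rows = examples)
truth = np.array([3.9, 4.2, 2.3, 5.0])

# Least-squares blend weights
w, *_ = np.linalg.lstsq(preds, truth, rcond=None)
blend = preds @ w
print(w, np.sqrt(np.mean((blend - truth) ** 2)))  # weights and blended RMSE
```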




June 26th submission triggers the 30-day "last call"


Ensemble team formed
  A group of other teams on the leaderboard forms a new team
  Relies on combining their models
  Quickly also gets a qualifying score over 10%
BellKor
  Continues to get small improvements in their scores
  Realizes that they are in direct competition with Ensemble
Strategy
  Both teams carefully monitor the leaderboard
  The only sure way to check for improvement is to submit a set of predictions
  This alerts the other team to your latest score


Submissions limited to one a day
  Only one final submission could be made in the last 24 hours
24 hours before the deadline…
  A BellKor team member in Austria notices (by chance) that Ensemble has posted a score slightly better than BellKor's
Frantic last 24 hours for both teams
  Much computer time spent on final optimization
  Run times carefully calibrated to end about an hour before the deadline
Final submissions
  BellKor submits a little early (on purpose), 40 minutes before the deadline
  Ensemble submits their final entry 20 minutes later
…and everyone waits…




Some slides and plots borrowed from Yehuda Koren, Robert Bell and Padhraic Smyth

Further reading:
  Y. Koren, Collaborative filtering with temporal dynamics, KDD '09
  http://www2.research.att.com/~volinsky/netflix/bpc.html
  http://www.the-ensemble.com/
