Recommender systems: Latent Factor Models
CS246: Mining Massive Datasets
Jure Leskovec, Stanford University
http://cs246.stanford.edu
Training data
  100 million ratings, 480,000 users, 17,770 movies
  6 years of data: 2000-2005
Test data
  Last few ratings of each user (2.8 million)
Evaluation criterion: Root Mean Square Error (RMSE)
  Netflix Cinematch RMSE: 0.9514
Competition
  2,700+ teams
  $1 million prize for 10% improvement on Cinematch

2/6/2012 Jure Leskovec, Stanford C246: Mining Massive Datasets
[Figure: the Netflix utility matrix — 480,000 users × 17,770 movies, sparsely filled with ratings 1-5]
[Figure: the same utility matrix with some known ratings replaced by "?" — the test entries to be predicted]

Evaluation on the test entries:
  SSE = Σ_{(u,i)∈test} (r̂_ui − r_ui)²
Basically the winner of the Netflix Challenge!
Multi-scale modeling of the data:
Combine top-level, regional modeling of the data with a refined, local view:
  Global: overall deviations of users/movies
  Factorization: addressing regional effects
  CF (k-NN): extract local patterns
Global:
  Mean movie rating: 3.7 stars
  The Sixth Sense is 0.5 stars above avg.
  Joe rates 0.2 stars below avg.
  ⇒ Baseline estimate: Joe will rate The Sixth Sense 4 stars
Local neighborhood (CF/NN):
  Joe didn't like related movie Signs
  ⇒ Final estimate: Joe will rate The Sixth Sense 3.8 stars
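The arithmetic above can be sketched directly; the −0.2-star neighborhood correction is an assumed value implied by the slide's final estimate of 3.8 stars.

```python
# A minimal sketch of the multi-scale estimate described above.
mu = 3.7         # mean movie rating
b_movie = 0.5    # The Sixth Sense is 0.5 stars above average
b_user = -0.2    # Joe rates 0.2 stars below average

baseline = mu + b_movie + b_user   # global baseline estimate: 4.0 stars

# Assumed local correction: Joe disliked the related movie Signs.
neighborhood_correction = -0.2
final_estimate = baseline + neighborhood_correction   # 3.8 stars
```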
Earliest and most popular collaborative filtering method
Derive unknown ratings from those of "similar" movies (item-item variant)
Define a similarity measure s_ij of items i and j
Select the k nearest neighbors, compute the rating:

  r̂_ui = ( Σ_{j∈N(i;u)} s_ij · r_uj ) / ( Σ_{j∈N(i;u)} s_ij )

  s_ij … similarity of items i and j
  r_uj … rating of user u on item j
  N(i;u) … items most similar to i that were rated by u
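The prediction rule above can be sketched as follows, assuming a precomputed similarity matrix s and NaN marking missing ratings (the data and names are illustrative):

```python
import numpy as np

def predict(r, s, u, i, k=2):
    """Item-item k-NN: weighted average of user u's ratings on the k items
    most similar to i, weighted by similarity (NaN marks a missing rating)."""
    rated = np.where(~np.isnan(r[u]))[0]              # items user u has rated
    rated = rated[rated != i]
    neighbors = rated[np.argsort(-s[i, rated])][:k]   # N(i; u)
    w = s[i, neighbors]
    return float(np.dot(w, r[u, neighbors]) / np.sum(w))

# Tiny example: one user rated items 0 and 1; estimate item 2.
r = np.array([[4.0, 2.0, np.nan]])
s = np.array([[1.0, 0.1, 0.5],
              [0.1, 1.0, 0.25],
              [0.5, 0.25, 1.0]])
r_hat = predict(r, s, u=0, i=2)   # (0.5*4 + 0.25*2) / (0.5 + 0.25)
```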
In practice we get better estimates if we model deviations:

  r̂_ui = b_ui + ( Σ_{j∈N(i;u)} s_ij · (r_uj − b_uj) ) / ( Σ_{j∈N(i;u)} s_ij )

  b_ui … baseline estimate for r_ui: b_ui = μ + b_u + b_i
  μ … overall mean rating
  b_u … rating deviation of user u = (avg. rating of user u) − μ
  b_i … rating deviation of movie i = (avg. rating of movie i) − μ
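The baseline pieces can be computed straight from the observed ratings; a sketch with NaN marking missing entries (the numbers are made up):

```python
import numpy as np

# Baseline b_ui = mu + b_u + b_i from an observed ratings matrix (NaN = missing).
r = np.array([[5.0, 3.0, np.nan],
              [4.0, np.nan, 2.0],
              [np.nan, 1.0, 3.0]])

mu = np.nanmean(r)                    # overall mean rating
b_u = np.nanmean(r, axis=1) - mu      # avg. rating of each user minus mu
b_i = np.nanmean(r, axis=0) - mu      # avg. rating of each movie minus mu
baseline = mu + b_u[:, None] + b_i[None, :]   # b_ui for every (u, i) pair
```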
Problems:
1) Similarity measures are arbitrary
2) Pairwise similarities neglect interdependencies among neighbors
3) Taking a weighted average is restrictive
Use a weighted sum rather than a weighted avg.:

  r̂_ui = b_ui + Σ_{j∈N(i;u)} w_ij · (r_uj − b_uj)

How to set w_ij?
Remember, the error metric is SSE: Σ_{(u,i)} (r̂_ui − r_ui)²
Find the w_ij that minimize SSE on training data:

  min_w Σ_{(u,i)∈training} ( b_ui + Σ_{j∈N(i;u)} w_ij (r_uj − b_uj) − r_ui )²
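Fitting the weights by minimizing SSE can be sketched with plain gradient descent; the synthetic data and step size below are assumptions, and the real BellKor solver is more involved:

```python
import numpy as np

# Learn weights w that minimize SSE between targets y = (r_ui - b_ui)
# and weighted sums of neighbor deviations X = (r_uj - b_uj).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))          # deviations for 3 neighbor items, 50 ratings
w_true = np.array([0.5, -0.2, 0.3])   # weights used to synthesize the targets
y = X @ w_true

w = np.zeros(3)
eta = 0.05
for _ in range(500):
    grad = -2.0 * X.T @ (y - X @ w) / len(y)   # gradient of the mean squared error
    w -= eta * grad
```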
So far:
  Weights w_ij derived based on their role; no use of an arbitrary similarity measure (w_ij ≠ s_ij)
  Explicitly account for interrelationships among the neighboring movies

  r̂_ui = b_ui + Σ_{j∈N(i;u)} w_ij · (r_uj − b_uj)

Next: Latent factor model
  Extract "regional" correlations
[Figure: movies placed in a 2-D space — one axis running from "geared towards females" (The Color Purple, Sense and Sensibility, The Princess Diaries) to "geared towards males" (Lethal Weapon, Braveheart), the other from "serious" (Amadeus) to "funny" (Dumb and Dumber, Ocean's 11, Independence Day, The Lion King)]
"SVD" on Netflix data: R ≈ Q · P^T

[Figure: the ratings matrix R (users × items) approximated by the product of a thin item-factor matrix Q and a thin user-factor matrix P^T]

For now let's assume we can approximate the rating matrix R as a product of "thin" Q · P^T
There are important differences between "SVD" and the real SVD. We will get to them later.
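Under this factorization a single rating estimate is just a dot product; a sketch with illustrative factor values (f = 3, numbers made up):

```python
import numpy as np

# Estimate r_ui from the factorization R ≈ Q · P^T:
# the dot product of item i's row of Q and user u's row of P.
q_i = np.array([1.1, -0.8, 2.1])   # item i's factor vector (illustrative)
p_u = np.array([-0.4, 0.6, 0.9])   # user u's factor vector (illustrative)
r_hat = float(q_i @ p_u)
```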
How to estimate the missing rating?

[Figure: a missing entry "?" in the ratings matrix, with the corresponding row of Q and column of P^T highlighted]
How to estimate the missing rating?

  r̂_ui = q_i · p_u^T = Σ_f q_if · p_uf

[Figure: the missing entry estimated as 2.4 — the dot product of item i's factor row in Q and user u's factor column in P^T]
[Figure: the same 2-D movie map, now with the latent axes labeled Factor 1 (geared towards females ↔ geared towards males) and Factor 2 (serious ↔ funny)]
Remember SVD:
  A: input data matrix (m × n)
  U: left singular vectors
  Σ: singular values
  V: right singular vectors
SVD gives minimum reconstruction error (MSE!):

  min_{U,Σ,V} Σ_ij ( A_ij − [U Σ V^T]_ij )²
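The optimality claim is easy to check numerically: the SSE of the rank-k truncation equals the sum of the squared singular values that were dropped. A sketch on random data (k is an assumed choice):

```python
import numpy as np

# SVD's rank-k truncation minimizes reconstruction error; its SSE equals
# the sum of the squared discarded singular values.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = (U[:, :k] * S[:k]) @ Vt[:k]     # best rank-k approximation of A
sse = np.sum((A - A_k) ** 2)
```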
[Figure: the ratings matrix, now with missing entries, approximated by Q · P^T]

SVD isn't defined when entries are missing!
Use specialized methods to find P, Q:

  min_{P,Q} Σ_{(u,i)∈R} ( r_ui − q_i · p_u^T )²

  (R … the set of observed ratings)
Want to minimize SSE for test data
Idea: minimize SSE on training data
  Want large f (# of factors) to capture all the signals
  But, test SSE begins to rise for f > 2
Regularization is needed:
  Allow a rich model where there are sufficient data
  Shrink aggressively where data are scarce

  min_{P,Q} Σ_training ( r_ui − q_i^T p_u )² + λ [ Σ_u ‖p_u‖² + Σ_i ‖q_i‖² ]

  "error" + λ · "length"
  λ … regularization parameter
  min_{P,Q} Σ_training ( r_ui − q_i^T p_u )² + λ [ Σ_u ‖p_u‖² + Σ_i ‖q_i‖² ]

  minimize over factors: "error" + λ · "length"

[Figure: the 2-D movie map again; regularization pulls the factor vectors, especially those of movies with few ratings, towards the origin]
Want to find matrices P and Q:

  min_{P,Q} Σ_training ( r_ui − q_i^T p_u )² + λ [ Σ_u ‖p_u‖² + Σ_i ‖q_i‖² ]

Online ("stochastic") gradient descent:
  Initialize P and Q (randomly, or using SVD)
  Then iterate over the ratings and update the factors:
  For each r_ui: update q_i and p_u by a gradient step on that rating's error

Koren, Bell, Volinsky, IEEE Computer, 2009
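The SGD loop above can be sketched as follows; the step size, regularization strength, and toy ratings are assumed values, not the course's:

```python
import numpy as np

# Stochastic gradient descent for
#   min sum (r_ui - q_i . p_u)^2 + lambda (sum ||p_u||^2 + sum ||q_i||^2)
rng = np.random.default_rng(0)
n_users, n_items, f = 5, 4, 2
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 3, 1.0), (3, 2, 2.0)]

P = 0.1 * rng.normal(size=(n_users, f))   # user factor vectors p_u
Q = 0.1 * rng.normal(size=(n_items, f))   # item factor vectors q_i

def sse():
    return sum((r - Q[i] @ P[u]) ** 2 for u, i, r in ratings)

sse_before = sse()
eta, lam = 0.02, 0.1
for _ in range(200):                      # epochs over the training ratings
    for u, i, r in ratings:
        err = r - Q[i] @ P[u]             # error on this single rating
        Q[i] += eta * (err * P[u] - lam * Q[i])   # gradient step for q_i
        P[u] += eta * (err * Q[i] - lam * P[u])   # gradient step for p_u
sse_after = sse()
```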
  r̂_ui = μ + b_u + b_i + q_i · p_u^T

  μ … overall mean rating
  b_u … bias of user u
  b_i … bias of movie i
  q_i · p_u^T … user-movie interaction

Baseline predictor (μ + b_u + b_i):
  Separates users and movies
  Benefits from insights into users' behavior
  Among the main practical contributions of the competition
User-movie interaction (q_i · p_u^T):
  Characterizes the matching between users and movies
  Attracts most research in the field
  Benefits from algorithmic and mathematical innovations
We have expectations on the rating by user u of movie i, even without estimating u's attitude towards movies like i:
  - Rating scale of user u
  - Values of other ratings the user gave recently (day-specific mood, anchoring, multi-user accounts)
  - (Recent) popularity of movie i
  - Selection bias; related to the number of ratings the user gave on the same day ("frequency")
Solve:

  min_{Q,P} Σ_{(u,i)∈R} ( r_ui − (μ + b_u + b_i + q_i^T p_u) )² + λ ( Σ_i ‖q_i‖² + Σ_u ‖p_u‖² + Σ_u b_u² + Σ_i b_i² )

  goodness of fit + regularization
  λ is typically selected via grid search on a validation set

Stochastic gradient descent to find the parameters
Note: both the biases (b_u, b_i) and the interactions (q_i, p_u) are treated as parameters (we estimate them)
  Global average: 1.1296
  User average: 1.0651
  Movie average: 1.0533
  Netflix (Cinematch): 0.9514
  Basic collaborative filtering: 0.94
  Collaborative filtering++: 0.91
  Latent factors: 0.90
  Latent factors + biases: 0.89
  Final BellKor: 0.869
  Grand Prize: 0.8563
[Plot: test RMSE vs. millions of parameters (1 to 1000, log scale); RMSE falls from about 0.92 to about 0.885 across three curves: CF (no time bias), Basic Latent Factors, Latent Factors w/ Biases]
Sudden rise in the average movie rating (early 2004):
  Improvements in Netflix
  GUI improvements
  Meaning of rating changed
Movie age:
  Users prefer new movies without any reason
  Older movies are just inherently better than newer ones

Y. Koren, Collaborative filtering with temporal dynamics, KDD '09
Original model:
  r_ui = μ + b_u + b_i + q_i · p_u^T
Add time dependence to the biases:
  r_ui = μ + b_u(t) + b_i(t) + q_i · p_u^T
  Make the parameters b_u and b_i depend on time:
    (1) Parameterize the time-dependence by linear trends
    (2) Each bin corresponds to 10 consecutive weeks
Add temporal dependence to the factors:
  p_u(t) … user preference vector on day t

Y. Koren, Collaborative filtering with temporal dynamics, KDD '09
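The binned bias in (2) can be sketched as a lookup table; the bin values and the day indexing below are assumptions for illustration:

```python
# Time-dependent movie bias b_i(t) via bins of 10 consecutive weeks:
# each bin holds one learned deviation for this movie.

DAYS_PER_BIN = 70                      # 10 weeks * 7 days

def bin_index(day):
    return day // DAYS_PER_BIN

b_i_bins = [0.10, 0.30, -0.20]         # assumed per-bin deviations for one movie

def b_i(day):
    return b_i_bins[bin_index(day)]
```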
[Plot: test RMSE vs. millions of parameters (1 to 10,000, log scale); RMSE falls from about 0.92 to about 0.875 as models are added: CF (no time bias), Basic Latent Factors, CF (time bias), Latent Factors w/ Biases, + Linear time factors, + Per-day user biases, + CF]
Many options for modeling:
  Variants of the ideas we have seen so far
    Different numbers of factors
    Different ways to model time
    Different ways to handle implicit information
  Other models (not described here)
    Nearest-neighbor models
    Restricted Boltzmann machines
  Model averaging is useful
    Linear model combining
June 26th submission triggers 30-day "last call"
The Ensemble team formed:
  A group of other teams on the leaderboard forms a new team
  Relies on combining their models
  Quickly also gets a qualifying score over 10%
BellKor:
  Continues to get small improvements in their scores
  Realizes they are in direct competition with The Ensemble
Strategy:
  Both teams carefully monitor the leaderboard
  The only sure way to check for improvement is to submit a set of predictions
  This alerts the other team to your latest score
Submissions limited to 1 a day:
  Only 1 final submission could be made in the last 24 hours
24 hours before the deadline…
  A BellKor team member in Austria notices (by chance) that The Ensemble has posted a score slightly better than BellKor's
Frantic last 24 hours for both teams:
  Much computer time spent on final optimization
  Run times carefully calibrated to end about an hour before the deadline
Final submissions:
  BellKor submits a little early (on purpose), 40 minutes before the deadline
  The Ensemble submits their final entry 20 minutes later
…and everyone waits…
Some slides and plots borrowed from Yehuda Koren, Robert Bell and Padhraic Smyth
Further reading:
  Y. Koren, Collaborative filtering with temporal dynamics, KDD '09
  http://www2.research.att.com/~volinsky/netflix/bpc.html
  http://www.the-ensemble.com/