10.11.2016 Views

Learning Data Mining with Python


Recommending Movies Using Affinity Analysis

The format given here represents the full matrix, but in a more compact way. The first row indicates that user #196 reviewed movie #242, giving it a rating of 3 (out of five) on December 4, 1997.

Any combination of user and movie that isn't in this database is assumed not to exist. This saves significant space compared with storing a large number of zeroes in memory. This type of format is called a sparse matrix format. As a rule of thumb, if you expect about 60 percent or more of your dataset to be empty or zero, a sparse format will take less space to store.
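To make the idea concrete, here is a minimal sketch of a sparse store built from a plain Python dictionary keyed by (user, movie) pairs. Only the first entry comes from the text above; the other entries and the helper names `get_rating` and `dense_cell_count` are assumptions made for illustration.

```python
# Sparse storage: keep only the (user_id, movie_id) pairs that were reviewed.
# The first entry is the row described in the text; the rest are illustrative.
ratings = {
    (196, 242): 3,
    (186, 302): 3,
    (22, 377): 1,
}


def get_rating(ratings, user_id, movie_id):
    """Return the stored rating, or None if that user never reviewed the movie."""
    return ratings.get((user_id, movie_id))


def dense_cell_count(ratings):
    """Number of cells a full user-by-movie matrix would need for this data."""
    users = {user_id for user_id, _ in ratings}
    movies = {movie_id for _, movie_id in ratings}
    return len(users) * len(movies)


stored = len(ratings)
dense = dense_cell_count(ratings)
empty_fraction = 1 - stored / dense
print(f"stored entries: {stored}, dense cells: {dense}, empty: {empty_fraction:.0%}")
```

In this toy sample the dense matrix would be mostly empty, which is the situation where the rule of thumb above favors a sparse format.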

When computing on sparse matrices, we don't usually focus on the data we don't have, such as by comparing all of the zeroes. We usually focus on the data we do have and compare those values.
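As a rough sketch of what computing only on the stored data can look like, the following averages the ratings per movie by iterating over the entries that exist, never touching the missing cells. The sample data and the function name `mean_rating_per_movie` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sparse ratings keyed by (user_id, movie_id).
ratings = {
    (196, 242): 3,
    (186, 242): 5,
    (186, 302): 3,
    (22, 377): 1,
}


def mean_rating_per_movie(ratings):
    """Average the ratings of each movie using only the stored entries."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for (_, movie_id), rating in ratings.items():
        totals[movie_id] += rating
        counts[movie_id] += 1
    return {movie_id: totals[movie_id] / counts[movie_id] for movie_id in totals}


print(mean_rating_per_movie(ratings))
```

The work done is proportional to the number of reviews, not to the number of possible user and movie combinations.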

The Apriori implementation

The goal of this chapter is to produce rules of the following form: if a person recommends these movies, they will also recommend this movie. We will also discuss extensions where a person who recommends a set of movies is likely to recommend another particular movie.
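One possible way to picture such a rule in code is as a premise (a set of movies) and a conclusion (a single movie), checked against the set of movies each person recommends. This is only a sketch under assumed data: the user and movie IDs below are made up, and `count_rule` is a hypothetical helper, not the chapter's implementation.

```python
# Hypothetical data: for each user, the set of movie IDs they recommend.
favorable_by_user = {
    1: frozenset({10, 20, 30}),
    2: frozenset({10, 20}),
    3: frozenset({10, 30}),
    4: frozenset({20, 30}),
}


def count_rule(favorable_by_user, premise, conclusion):
    """Count how often a rule applies and how often it also holds.

    Returns (applicable, correct): the number of users who recommend every
    movie in the premise, and how many of those also recommend the conclusion.
    """
    premise = frozenset(premise)
    applicable = 0
    correct = 0
    for movies in favorable_by_user.values():
        if premise <= movies:  # the user recommends all premise movies
            applicable += 1
            if conclusion in movies:
                correct += 1
    return applicable, correct


# Rule: "if a person recommends movie 10, they will also recommend movie 30".
print(count_rule(favorable_by_user, {10}, 30))
# Extension with a set of movies in the premise.
print(count_rule(favorable_by_user, {10, 20}, 30))
```

Comparing the two counts gives a rough sense of how reliable a candidate rule is on the data at hand.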

To do this, we first need to determine if a person recommends a movie. We can do this by creating a new feature Favorable, which is True if the person gave a favorable review to a movie, that is, a rating above 3 (4 or 5 out of five):

all_ratings["Favorable"] = all_ratings["Rating"] > 3

We can see the new feature by viewing the dataset:

all_ratings[10:15]

    UserID  MovieID  Rating             Datetime  Favorable
10      62      257       2  1997-11-12 22:07:14      False
11     286     1014       5  1997-11-17 15:38:45       True
12     200      222       5  1997-10-05 09:05:40       True
13     210       40       3  1998-03-27 21:59:54      False
14     224       29       3  1998-02-21 23:40:57      False
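To show what the comparison does row by row, the following sketch applies the same threshold to the five rows shown above using plain Python records instead of the all_ratings DataFrame. The record layout and the names `FAVORABLE_THRESHOLD` and `add_favorable` are assumptions for illustration.

```python
# The five rows shown above as plain records (Datetime omitted for brevity).
rows = [
    {"UserID": 62, "MovieID": 257, "Rating": 2},
    {"UserID": 286, "MovieID": 1014, "Rating": 5},
    {"UserID": 200, "MovieID": 222, "Rating": 5},
    {"UserID": 210, "MovieID": 40, "Rating": 3},
    {"UserID": 224, "MovieID": 29, "Rating": 3},
]

# A review counts as favorable only when the rating is strictly above this.
FAVORABLE_THRESHOLD = 3


def add_favorable(rows, threshold=FAVORABLE_THRESHOLD):
    """Return copies of the rows with a boolean 'Favorable' field added."""
    return [{**row, "Favorable": row["Rating"] > threshold} for row in rows]


labelled = add_favorable(rows)
for row in labelled:
    print(row["UserID"], row["MovieID"], row["Rating"], row["Favorable"])
```

Note that a rating of exactly 3 is not counted as favorable, which matches the last two rows of the table.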

