10.11.2016 Views

Learning Data Mining with Python


Recommending Movies Using Affinity Analysis

print("Rule: If a person recommends {0} they will also recommend {1}".format(premise, conclusion))
print(" - Confidence: {0:.3f}".format(rule_confidence[(premise, conclusion)]))
print("")

The result is as follows:

Rule #1
Rule: If a person recommends frozenset({64, 56, 98, 50, 7}) they will also recommend 174
 - Confidence: 1.000

Rule #2
Rule: If a person recommends frozenset({98, 100, 172, 79, 50, 56}) they will also recommend 7
 - Confidence: 1.000

Rule #3
Rule: If a person recommends frozenset({98, 172, 181, 174, 7}) they will also recommend 50
 - Confidence: 1.000

Rule #4
Rule: If a person recommends frozenset({64, 98, 100, 7, 172, 50}) they will also recommend 174
 - Confidence: 1.000

Rule #5
Rule: If a person recommends frozenset({64, 1, 7, 172, 79, 50}) they will also recommend 181
 - Confidence: 1.000
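The print statements shown earlier are only a loop body. As a minimal sketch of how such a report could be produced end to end, the following assumes a `rule_confidence` dictionary keyed by `(premise, conclusion)` tuples, as the indexing in the fragment suggests. The confidence values, the `describe_rule` helper, and `top_n` are hypothetical and exist only for illustration; the premise is printed as a sorted list rather than as a raw frozenset.

```python
from operator import itemgetter

# Hypothetical confidences keyed by (premise, conclusion); values are made up.
rule_confidence = {
    (frozenset({7, 50}), 174): 0.9,
    (frozenset({50, 56}), 7): 1.0,
    (frozenset({98}), 50): 0.75,
}

# Sort rules from most to least confident.
sorted_confidence = sorted(rule_confidence.items(), key=itemgetter(1), reverse=True)


def describe_rule(rank, premise, conclusion, confidence):
    """Build the three-line report for one rule (hypothetical helper)."""
    return "\n".join([
        "Rule #{0}".format(rank),
        "Rule: If a person recommends {0} they will also recommend {1}".format(
            sorted(premise), conclusion),
        " - Confidence: {0:.3f}".format(confidence),
    ])


top_n = 2  # number of rules to report; an arbitrary choice for this sketch
for rank, ((premise, conclusion), confidence) in enumerate(
        sorted_confidence[:top_n], start=1):
    print(describe_rule(rank, premise, conclusion, confidence))
    print("")
```

Sorting the premise before printing keeps the output stable between runs, since the display order of a frozenset is not guaranteed.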

The resulting printout shows only the movie IDs, which isn't very helpful without
the names of the movies. The dataset came with a file called u.item, which
stores the movie names and their corresponding MovieID (as well as other
information, such as the genre).

We can load the titles from this file using pandas. Additional information about
the file and categories is available in the README that came with the dataset.
The data in the file is in a CSV-like format, but with fields separated by the | symbol.
The file has no header row, and the encoding must be set explicitly. The column
names were found in the README file.

movie_name_filename = os.path.join(data_folder, "u.item")
movie_name_data = pd.read_csv(movie_name_filename, delimiter="|",
                              header=None, encoding="mac-roman")
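Once the file is loaded, the IDs in the rules can be mapped back to titles. The sketch below is one way to do that, not necessarily the approach taken later in the text. It reads a small in-memory stand-in for u.item so that it runs without the dataset; the sample rows, the three column names, and the `get_movie_name` helper are all assumptions made for illustration. The real file has more columns, listed in the README, and needs the `encoding` argument shown above.

```python
import io

import pandas as pd

# Illustrative stand-in for u.item: pipe-delimited, no header row.
# These rows are made up; only the first three columns are sketched.
sample = io.StringIO(
    "1|Movie A (1995)|01-Jan-1995\n"
    "2|Movie B (1995)|01-Jan-1995\n"
    "3|Movie C (1996)|01-Jan-1996\n"
)

movie_name_data = pd.read_csv(sample, delimiter="|", header=None)

# Assumed column names; take the real ones from the dataset README.
movie_name_data.columns = ["MovieID", "Title", "Release Date"]


def get_movie_name(movie_id):
    """Return the title stored for a MovieID (hypothetical helper)."""
    matches = movie_name_data[movie_name_data["MovieID"] == movie_id]["Title"]
    return matches.values[0]


# Translate a rule's premise and conclusion from IDs to titles.
premise = frozenset({1, 2})
conclusion = 3
premise_names = ", ".join(get_movie_name(idx) for idx in sorted(premise))
print("Rule: If a person recommends {0} they will also recommend {1}".format(
    premise_names, get_movie_name(conclusion)))
```

A lookup of an ID that is not in the table raises an `IndexError` in this sketch; real code may want to handle that case explicitly.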

