10.11.2016 Views

Learning Data Mining with Python


Recommending Movies Using Affinity Analysis

Extracting association rules

After the Apriori algorithm has completed, we have a list of frequent itemsets. These aren't exactly association rules, but they are similar to them. A frequent itemset is a set of items with a minimum support, while an association rule has a premise and a conclusion.

We can make an association rule from a frequent itemset by taking one of the movies in the itemset and denoting it as the conclusion. The other movies in the itemset will be the premise. This yields rules of the following form: if a reviewer recommends all of the movies in the premise, they will also recommend the conclusion.

For each itemset, we can generate a number of association rules by setting each movie to be the conclusion and the remaining movies as the premise.
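
As a minimal sketch of this step for a single itemset, the following helper enumerates every premise/conclusion split. The function name `rules_from_itemset` and the movie IDs are hypothetical, chosen only for illustration.

```python
def rules_from_itemset(itemset):
    """Return every (premise, conclusion) pair obtainable from one itemset."""
    itemset = frozenset(itemset)
    rules = []
    for conclusion in itemset:
        # The premise is the itemset with the chosen conclusion removed.
        premise = itemset - {conclusion}
        rules.append((premise, conclusion))
    return rules


# Hypothetical frequent itemset of three movie IDs.
example_rules = rules_from_itemset({50, 64, 181})
for premise, conclusion in example_rules:
    print(sorted(premise), "->", conclusion)
```

An itemset of n movies yields n candidate rules this way. Note that an itemset containing a single movie would produce a rule with an empty premise, so such itemsets are of little use for this step.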

In code, we first generate a list of all of the rules from each of the frequent itemsets, by iterating over each of the discovered frequent itemsets of each length:

candidate_rules = []
for itemset_length, itemset_counts in frequent_itemsets.items():
    for itemset in itemset_counts.keys():

We then iterate over every movie in this itemset, using it as our conclusion. The remaining movies in the itemset are the premise. We save the premise and conclusion as our candidate rule:

        for conclusion in itemset:
            premise = itemset - set((conclusion,))
            candidate_rules.append((premise, conclusion))

This produces a very large number of candidate rules. We can see some by printing out the first few rules in the list:

print(candidate_rules[:5])

The resulting output shows the rules that were obtained:

[(frozenset({79}), 258), (frozenset({258}), 79), (frozenset({50}),
64), (frozenset({64}), 50), (frozenset({127}), 181)]

In these rules, the first part (the frozenset) is the set of movies in the premise, while the number after it is the conclusion. In the first case, if a reviewer recommends movie 79, they are also likely to recommend movie 258.
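
To make this reading of a rule concrete, here is a small sketch that turns `(premise, conclusion)` tuples into sentences. The helper `describe_rule` is a hypothetical name introduced for illustration; the rules are the first three from the output shown above.

```python
def describe_rule(premise, conclusion):
    """Render one (premise, conclusion) rule as a readable sentence."""
    # Sort the movie IDs so the text is stable regardless of set ordering.
    premise_text = ", ".join(str(movie_id) for movie_id in sorted(premise))
    return (f"If a reviewer recommends movie(s) {premise_text}, "
            f"they are also likely to recommend movie {conclusion}.")


# First three candidate rules from the printed output.
candidate_rules = [(frozenset({79}), 258),
                   (frozenset({258}), 79),
                   (frozenset({50}), 64)]
descriptions = [describe_rule(premise, conclusion)
                for premise, conclusion in candidate_rules]
for line in descriptions:
    print(line)
```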

Next, we compute the confidence of each of these rules. This is performed much like in Chapter 1, Getting Started with Data Mining, with the only changes being those necessary for computing using the new data format.
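
One way the confidence computation might look is sketched below. It assumes, purely for illustration, that the favorable reviews are held in a dictionary mapping each user ID to a frozenset of recommended movie IDs; the name `favorable_reviews_by_user`, the function `rule_confidence`, and the toy data are assumptions, not taken from this section. Confidence is taken here as the fraction of users matching the premise who also recommend the conclusion.

```python
from collections import defaultdict


def rule_confidence(candidate_rules, favorable_reviews_by_user):
    """Return {rule: confidence} for rules whose premise matches some user."""
    correct = defaultdict(int)
    incorrect = defaultdict(int)
    for reviews in favorable_reviews_by_user.values():
        for rule in candidate_rules:
            premise, conclusion = rule
            # The rule only applies to users who recommend every premise movie.
            if premise.issubset(reviews):
                if conclusion in reviews:
                    correct[rule] += 1
                else:
                    incorrect[rule] += 1
    return {rule: correct[rule] / (correct[rule] + incorrect[rule])
            for rule in candidate_rules
            if correct[rule] + incorrect[rule] > 0}


# Toy data (assumed format): user ID -> movies the user recommended.
favorable_reviews_by_user = {
    1: frozenset({79, 258}),
    2: frozenset({79}),
    3: frozenset({79, 258, 50}),
    4: frozenset({258}),
    5: frozenset({258, 50}),
}
rules = [(frozenset({79}), 258), (frozenset({258}), 79)]
confidence = rule_confidence(rules, favorable_reviews_by_user)
for (premise, conclusion), value in confidence.items():
    print(sorted(premise), "->", conclusion, round(value, 3))
```

In the toy data, three users recommend movie 79 and two of them also recommend movie 258, so the first rule has a confidence of 2/3. Rules whose premise never matches are left out rather than assigned an arbitrary value.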

