10.11.2016 Views

Learning Data Mining with Python


Predicting Sports Winners with Decision Trees

We first create a (default) dictionary to store each team's last result:

from collections import defaultdict

won_last = defaultdict(int)
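As a small illustration of why a default dictionary is convenient here, the sketch below shows that looking up a team we have not seen yet returns 0 instead of raising a KeyError. The team names are made up for the example.

```python
from collections import defaultdict

won_last = defaultdict(int)

# A team we have never seen falls back to int(), which is 0 (falsy).
first_lookup = won_last["Example Team A"]  # hypothetical team name

# Once a result is stored, later lookups return the stored value.
won_last["Example Team B"] = True  # hypothetical team name
second_lookup = won_last["Example Team B"]

print(first_lookup, second_lookup)
```

Because 0 is falsy, a team with no recorded game is treated the same as a team that lost its previous game.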

The key of this dictionary will be the team and the value will be whether they won their previous game. We can then iterate over all the rows and update the current row with the team's last result:

for index, row in dataset.iterrows():
    home_team = row["Home Team"]
    visitor_team = row["Visitor Team"]
    # Look up how each team did in its previous game
    row["HomeLastWin"] = won_last[home_team]
    row["VisitorLastWin"] = won_last[visitor_team]
    # Write the updated row back into the dataset
    dataset.loc[index] = row

Note that the preceding code relies on our dataset being in chronological order. Our dataset is in order; however, if you are using a dataset that is not in order, you will need to replace dataset.iterrows() with dataset.sort_values("Date").iterrows().
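A minimal sketch of the sorted iteration, assuming a "Date" column that holds real datetimes (if the dates are stored as strings, the sort would be alphabetical rather than chronological). The three-row frame is invented for the example.

```python
import pandas as pd

# Toy data, deliberately out of chronological order.
games = pd.DataFrame({
    "Date": pd.to_datetime(["2013-11-02", "2013-10-29", "2013-10-30"]),
    "Home Team": ["A", "B", "C"],
})

# Iterate in date order; each row keeps its original index label.
visited = [index for index, row in games.sort_values("Date").iterrows()]

print(visited)
```

The rows come back in date order but keep their original index labels, so writing results back by index still updates the right rows.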

We then update our dictionary with each team's result (from this row) for the next time we see these teams. These lines sit inside the loop, after the row has been written back. The code is as follows:

    won_last[home_team] = row["HomeWin"]
    won_last[visitor_team] = not row["HomeWin"]
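To see the whole loop at work, here is a self-contained sketch on a four-game toy dataset. The team names and results are invented, and the two new columns are assumed to have been initialized to False beforehand. It also writes single cells with dataset.at rather than assigning the whole row back, which is a substitution made for this example and not the approach shown above.

```python
from collections import defaultdict

import pandas as pd

# Toy data, invented for this sketch.
dataset = pd.DataFrame({
    "Home Team": ["A", "B", "A", "C"],
    "Visitor Team": ["B", "C", "C", "A"],
    "HomeWin": [True, False, False, True],
})
# Assumption: the new feature columns start out as False.
dataset["HomeLastWin"] = False
dataset["VisitorLastWin"] = False

won_last = defaultdict(int)

for index, row in dataset.iterrows():
    home_team = row["Home Team"]
    visitor_team = row["Visitor Team"]
    # Record how each team did in its previous game (False if unseen).
    dataset.at[index, "HomeLastWin"] = bool(won_last[home_team])
    dataset.at[index, "VisitorLastWin"] = bool(won_last[visitor_team])
    # Remember this game's result for the next time these teams appear.
    won_last[home_team] = bool(row["HomeWin"])
    won_last[visitor_team] = not row["HomeWin"]

print(dataset)
```

Each team's first game shows False in its "last win" column because nothing had been recorded for it yet.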

After the preceding code runs, we will have two new features: HomeLastWin and VisitorLastWin. We can have a look at the dataset. There isn't much point in looking at the first five games, though: because of the way our code runs, no previous results had been recorded for those teams at that point. Until a team's second game of the season, we won't know their current form. We can instead look at different places in the list. The following code will show the 20th to the 25th games of the season:

dataset.loc[20:25]

Here's the output:
