10.11.2016 Views

Learning Data Mining with Python


Predicting Sports Winners with Decision Trees

Next, we use the cross_val_score function to test the result. First, we extract the dataset:

X_homehigher = dataset[["HomeLastWin", "VisitorLastWin",
                        "HomeTeamRanksHigher"]].values

Then, we create a new DecisionTreeClassifier and run the evaluation:

clf = DecisionTreeClassifier(random_state=14)
scores = cross_val_score(clf, X_homehigher, y_true,
                         scoring='accuracy')
print("Accuracy: {0:.1f}%".format(np.mean(scores) * 100))

This now scores 60.3 percent, even better than our previous result. Can we do better?
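As a rough, self-contained illustration of the evaluation step above, the following sketch runs the same kind of cross-validation on synthetic data. The arrays X_demo and y_demo, the 30 percent label noise, and cv=3 are assumptions made for this example and are not part of the original dataset. It also assumes a recent scikit-learn release, where cross_val_score is imported from sklearn.model_selection.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real feature matrix: three binary features,
# playing the role of HomeLastWin, VisitorLastWin and HomeTeamRanksHigher.
rng = np.random.RandomState(14)
X_demo = rng.randint(0, 2, size=(300, 3))

# Hypothetical labels: the home team wins when it ranks higher,
# except for roughly 30% of games where the label is flipped.
noise = rng.rand(300) < 0.3
y_demo = np.where(noise, 1 - X_demo[:, 2], X_demo[:, 2])

clf = DecisionTreeClassifier(random_state=14)
# cv=3 is set explicitly here because the default number of folds
# differs between scikit-learn versions.
scores = cross_val_score(clf, X_demo, y_demo, scoring="accuracy", cv=3)
mean_accuracy = np.mean(scores) * 100
print("Accuracy: {0:.1f}%".format(mean_accuracy))
```

Because the data is made up, the printed accuracy says nothing about the basketball dataset; the sketch only shows how the pieces fit together.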

Next, let's test which of the two teams won their last match against each other. While rankings give some hint of who will win (the higher-ranked team is more likely to win), some teams play better against particular opponents. There are many reasons for this; for example, a team may have a strategy that works especially well against one specific opponent. Following our previous pattern, we create a dictionary to store the winner of the most recent game between each pair of teams and create a new feature in our data frame. The code is as follows:

last_match_winner = defaultdict(int)
dataset["HomeTeamWonLast"] = 0
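It may help to see what defaultdict(int) returns for a pair of teams that has not met yet. In the minimal sketch below, the key pair is a made-up example; the point is that a missing key yields 0, which never equals a team name, so the feature stays 0 for a first meeting.

```python
from collections import defaultdict

last_match_winner = defaultdict(int)
pair = ("Chicago Bulls", "Miami Heat")  # made-up key for illustration

print(pair in last_match_winner)    # False: nothing stored yet
first_lookup = last_match_winner[pair]
print(first_lookup)                 # 0: the default produced by int()
print(first_lookup == "Miami Heat") # False: 0 never matches a team name
```

Note that the lookup itself stores the default value under the key, which is harmless here because the entry is overwritten once a winner is recorded.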

Then, we iterate over each row and get the home team and visitor team:

for index, row in dataset.iterrows():
    home_team = row["Home Team"]
    visitor_team = row["Visitor Team"]

We want to see who won the last game between these two teams regardless of which team was playing at home. Therefore, we sort the team names alphabetically, giving us a consistent key for those two teams:

    teams = tuple(sorted([home_team, visitor_team]))
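To see why sorting gives a consistent key, here is a small sketch. The helper name pair_key and the team names are hypothetical and used only for illustration.

```python
def pair_key(home_team, visitor_team):
    """Return an order-independent key for a pair of teams."""
    return tuple(sorted([home_team, visitor_team]))

# The same two teams, with home and visitor roles swapped.
key_a = pair_key("Miami Heat", "Chicago Bulls")
key_b = pair_key("Chicago Bulls", "Miami Heat")

print(key_a)           # ('Chicago Bulls', 'Miami Heat')
print(key_a == key_b)  # True
```

A tuple is used rather than a list because dictionary keys must be hashable.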

We look up our dictionary to see who won the last encounter between the two teams. Then, we update the row in the dataset data frame:

    row["HomeTeamWonLast"] = 1 if last_match_winner[teams] == row["Home Team"] else 0
    # .ix has been removed from recent pandas releases; .loc is the label-based equivalent
    dataset.loc[index] = row
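Putting the steps above together, the following is a minimal end-to-end sketch on a tiny made-up schedule. The games data frame and its values are invented for illustration, and the HomeWin column and the final line that records each game's winner are assumptions about how the dictionary gets updated, since that step is not shown in this excerpt. The sketch also writes the single cell with .at instead of assigning the whole row back.

```python
from collections import defaultdict

import pandas as pd

# Tiny made-up schedule; "HomeWin" is an assumed column marking a home victory.
games = pd.DataFrame({
    "Home Team": ["Bulls", "Heat", "Bulls", "Heat"],
    "Visitor Team": ["Heat", "Bulls", "Heat", "Bulls"],
    "HomeWin": [True, False, False, True],
})

last_match_winner = defaultdict(int)
games["HomeTeamWonLast"] = 0

for index, row in games.iterrows():
    home_team = row["Home Team"]
    visitor_team = row["Visitor Team"]
    # Order-independent key for this pair of teams
    teams = tuple(sorted([home_team, visitor_team]))
    # 1 if the current home team won the previous meeting, else 0
    games.at[index, "HomeTeamWonLast"] = int(last_match_winner[teams] == home_team)
    # Assumed follow-up step: remember this game's winner for the next meeting
    last_match_winner[teams] = home_team if row["HomeWin"] else visitor_team

print(games["HomeTeamWonLast"].tolist())  # [0, 0, 1, 1]
```

The first two games score 0 because the pair has no earlier meeting in the first game, and in the second the home team (Heat) lost the previous encounter.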
