
Learning Data Mining with Python


Predicting Sports Winners with Decision Trees

There are many possible features we could use, but we will try to answer the following questions:

• Which team is considered better generally?

• Which team won their last encounter?

We will also try feeding the raw team names into the algorithm to check whether it can learn a model of how different teams play against each other.
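As a rough illustration of the second question, the following sketch tracks the winner of the most recent game between each pair of teams. The team names, results, and column names (HomeTeam, VisitorTeam, HomeWin, HomeTeamWonLast) are assumptions made up for this example; they are not taken from the dataset used in this chapter.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical results in chronological order (not real NBA data).
games = pd.DataFrame({
    "HomeTeam": ["A", "B", "A", "B"],
    "VisitorTeam": ["B", "A", "B", "A"],
    "HomeWin": [True, True, False, True],
})

# Maps an alphabetically sorted pair of teams to the winner of their last game.
last_match_winner = defaultdict(lambda: None)
home_won_last = []

for row in games.itertuples(index=False):
    pair = tuple(sorted((row.HomeTeam, row.VisitorTeam)))
    # Record the feature before updating, so each game only sees the past.
    home_won_last.append(last_match_winner[pair] == row.HomeTeam)
    last_match_winner[pair] = row.HomeTeam if row.HomeWin else row.VisitorTeam

games["HomeTeamWonLast"] = home_won_last
```

Sorting the pair means a game between A and B shares one dictionary entry regardless of which team plays at home. The first meeting of any pair has no history, so its feature value is False.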

Putting it all together

The first feature tells us whether the home team is generally better than the visitors. To compute it, we will load the NBA standings (also called a ladder in some sports) from the previous season. A team will be considered better if it ranked higher in 2013 than the other team.
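One possible way to turn this rule into a column is sketched below. It assumes the ladder has a rank column named Rk (1 being the best team) and a Team column, and that each game records HomeTeam and VisitorTeam; all names and values here are hypothetical placeholders, so adjust them to match the file you download.

```python
import pandas as pd

# Hypothetical ladder: Rk is the rank, with 1 being the best team.
standings = pd.DataFrame({"Rk": [1, 2, 3], "Team": ["A", "B", "C"]})

# Hypothetical fixtures to compute the feature for.
games = pd.DataFrame({
    "HomeTeam": ["A", "C", "B"],
    "VisitorTeam": ["B", "A", "C"],
})

# Look up each team's rank by name.
rank = standings.set_index("Team")["Rk"]
home_rank = games["HomeTeam"].map(rank)
visitor_rank = games["VisitorTeam"].map(rank)

# A smaller rank number means a higher position on the ladder.
games["HomeTeamRanksHigher"] = home_rank < visitor_rank
```

Note that the comparison uses `<` because rank 1 is the top of the ladder. A team missing from the ladder would map to NaN and the comparison would come out False, so it is worth checking that every team name matches.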

To obtain the standings data, perform the following steps:

1. Navigate to http://www.basketball-reference.com/leagues/NBA_2013_standings.html in your web browser.

2. Select Expanded Standings to get a single list for the entire league.

3. Click on the Export link.

4. Save the downloaded file in your data folder.

Back in your IPython Notebook, enter the following lines into a new cell. You'll need to ensure that the file was saved into the location pointed to by the data_folder variable. The code is as follows:

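# Assumes os, pandas (as pd), and data_folder are defined in earlier cells.
standings_filename = os.path.join(data_folder,
    "leagues_NBA_2013_standings_expanded-standings.csv")
# Skip the first two lines of the file, which precede the column header row.
standings = pd.read_csv(standings_filename, skiprows=[0,1])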
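To see what skiprows=[0,1] does, here is a small sketch that reads an in-memory stand-in for the exported file. The layout and contents of raw are invented for illustration (two leading lines followed by the header row); the real export may differ, so inspect your own file before relying on this.

```python
import io

import pandas as pd

# Made-up stand-in for the exported file: two leading lines, then the header.
raw = (
    "Expanded Standings\n"
    ",,Overall\n"
    "Rk,Team,Overall\n"
    "1,Team A,60-22\n"
    "2,Team B,58-24\n"
)

# Lines 0 and 1 are dropped, so the third line becomes the column header.
standings_demo = pd.read_csv(io.StringIO(raw), skiprows=[0, 1])
```

If the columns look wrong after loading your own file, opening it in a text editor will show how many lines come before the header.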

You can view the ladder by typing standings into a new cell and running it:

standings
