10.11.2016 Views

Learning Data Mining with Python


Chapter 3

If you have trouble extracting features of these types, check the pandas documentation at http://pandas.pydata.org/pandas-docs/stable/ for help. Alternatively, you can try an online forum such as Stack Overflow for assistance.

More extreme examples could use player data to estimate the strength of each team's lineup and predict who won. Complex features of this kind are used every day by gamblers and sports betting agencies trying to turn a profit by predicting the outcome of sports matches.

Summary

In this chapter, we extended our use of scikit-learn's classifiers and introduced the pandas library to manage our data. We analyzed real-world data on basketball results from the NBA, saw some of the problems that even well-curated data introduces, and created new features for our analysis. We saw the effect that good features have on performance and used an ensemble algorithm, random forests, to further improve the accuracy.
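As a compact reminder of that workflow, the following is a minimal sketch rather than the chapter's exact code. It assumes a synthetic results table in place of the NBA dataset; the team names, scores, and column names (HomeTeam, VisitorTeam, HomePts, VisitorPts, HomeWin, HomeLastWin, VisitorLastWin) are all chosen here for illustration. It derives one engineered feature per team (whether the team won its previous game) and scores a random forest with cross-validation.

```python
from collections import defaultdict

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a results table; teams and scores are made up.
rng = np.random.default_rng(14)
teams = ["A", "B", "C", "D"]
rows = []
for _ in range(200):
    home, visitor = rng.choice(teams, size=2, replace=False)
    rows.append({
        "HomeTeam": str(home),
        "VisitorTeam": str(visitor),
        "HomePts": int(rng.integers(80, 121)),
        "VisitorPts": int(rng.integers(80, 121)),
    })
results = pd.DataFrame(rows)

# Class label: did the home team win? (A tie counts as a home loss here.)
results["HomeWin"] = results["VisitorPts"] < results["HomePts"]

# Engineered features: did each team win its previous game?
# Teams with no earlier game default to False.
won_last = defaultdict(bool)
home_last_win, visitor_last_win = [], []
for row in results.itertuples(index=False):
    home_last_win.append(won_last[row.HomeTeam])
    visitor_last_win.append(won_last[row.VisitorTeam])
    # Update the record only after reading it, so no feature sees the
    # outcome of the game it is meant to predict.
    won_last[row.HomeTeam] = bool(row.HomeWin)
    won_last[row.VisitorTeam] = not bool(row.HomeWin)
results["HomeLastWin"] = home_last_win
results["VisitorLastWin"] = visitor_last_win

X = results[["HomeLastWin", "VisitorLastWin"]].to_numpy(dtype=int)
y = results["HomeWin"].to_numpy()

# Random forest: an ensemble of decision trees, scored with 5-fold CV.
clf = RandomForestClassifier(n_estimators=100, random_state=14)
scores = cross_val_score(clf, X, y, scoring="accuracy", cv=5)
print(f"Mean accuracy: {scores.mean():.3f}")
```

Because the scores in this sketch are random, the accuracy it prints should hover around chance; the point is the shape of the pipeline, not the number. On real results, the same structure lets you compare feature sets by swapping the columns selected for X.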

In the next chapter, we will extend the affinity analysis that we performed in the first chapter to create a program to find similar books. We will see how to use algorithms for ranking and also use approximation to improve the scalability of data mining.

