08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho

Table of Contents

Tuning the instance 90

Tuning the classifier 90

Fetching the data 91

Slimming the data down to chewable chunks 92

Preselection and processing of attributes 93

Defining what is a good answer 94

Creating our first classifier 95

Starting with the k-nearest neighbor (kNN) algorithm 95

Engineering the features 96

Training the classifier 97

Measuring the classifier's performance 97

Designing more features 98

Deciding how to improve 101

Bias-variance and its trade-off 102

Fixing high bias 102

Fixing high variance 103

High bias or low bias 103

Using logistic regression 105

A bit of math with a small example 106

Applying logistic regression to our post classification problem 108

Looking behind accuracy – precision and recall 110

Slimming the classifier 114

Ship it! 115

Summary 115

Chapter 6: Classification II – Sentiment Analysis 117

Sketching our roadmap 117

Fetching the Twitter data 118

Introducing the Naive Bayes classifier 118

Getting to know the Bayes theorem 119

Being naive 120

Using Naive Bayes to classify 121

Accounting for unseen words and other oddities 124

Accounting for arithmetic underflows 125

Creating our first classifier and tuning it 127

Solving an easy problem first 128

Using all the classes 130

Tuning the classifier's parameters 132

Cleaning tweets 136

Taking the word types into account 138

Determining the word types 139
