08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Table of Contents

Building more complex classifiers

A more complex dataset and a more complex classifier

Learning about the Seeds dataset

Features and feature engineering

Nearest neighbor classification

Binary and multiclass classification

Summary

Chapter 3: Clustering – Finding Related Posts

Measuring the relatedness of posts

How not to do it

How to do it

Preprocessing – similarity measured as similar number of common words

Converting raw text into a bag-of-words

Counting words

Normalizing the word count vectors

Removing less important words

Stemming

Installing and using NLTK

Extending the vectorizer with NLTK's stemmer

Stop words on steroids

Our achievements and goals

Clustering

KMeans

Getting test data to evaluate our ideas on

Clustering posts

Solving our initial challenge

Another look at noise

Tweaking the parameters

Summary

Chapter 4: Topic Modeling

Latent Dirichlet allocation (LDA)

Building a topic model

Comparing similarity in topic space

Modeling the whole of Wikipedia

Choosing the number of topics

Summary

Chapter 5: Classification – Detecting Poor Answers

Sketching our roadmap

Learning to classify classy answers
