08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

Building Machine Learning Systems with Python - Richert, Coelho

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Classification – Detecting Poor Answers<br />

But still, this would mean that we could classify roughly four out of the ten wrong<br />

answers. At least we are heading in the right direction. More features lead to higher<br />

accuracy, which leads us to adding more features. Therefore, let us extend the feature<br />

space <strong>with</strong> even more features:<br />

• AvgSentLen: This feature measures the average number of words in a<br />

sentence. Maybe there is a pattern that particularly good posts don't overload<br />

the reader's brain <strong>with</strong> very long sentences.<br />

• AvgWordLen: This feature is similar to AvgSentLen; it measures the average<br />

number of characters in the words of a post.<br />

• NumAllCaps: This feature measures the number of words that are written in<br />

uppercase, which is considered a bad style.<br />

• NumExclams: This feature measures the number of exclamation marks.<br />

The following charts show the value distributions for average sentences and word<br />

lengths as well as the number of uppercase words and exclamation marks:<br />

[ 100 ]

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!