
Classification – Detecting Poor Answers

But still, this would mean that we could correctly classify only roughly four out of ten wrong answers. At least we are heading in the right direction. Because more features have so far led to higher accuracy, let us extend the feature space with even more features:

• AvgSentLen: This feature measures the average number of words in a sentence. Perhaps particularly good posts tend not to overload the reader's brain with very long sentences.
• AvgWordLen: Similar to AvgSentLen, this feature measures the average number of characters in the words of a post.
• NumAllCaps: This feature measures the number of words written in uppercase, which is considered bad style.
• NumExclams: This feature measures the number of exclamation marks.
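The four definitions above translate fairly directly into code. The following is a minimal sketch, assuming plain-text posts and deliberately naive regex tokenization (a sentence ends at a run of '.', '!' or '?'; a word is a run of word characters). A real pipeline would more likely use a proper tokenizer such as NLTK's, which handles abbreviations like "e.g." correctly. The names `TextFeatures` and `extract_features` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class TextFeatures:
    avg_sent_len: float  # AvgSentLen: mean number of words per sentence
    avg_word_len: float  # AvgWordLen: mean number of characters per word
    num_all_caps: int    # NumAllCaps: words written entirely in uppercase
    num_exclams: int     # NumExclams: number of '!' characters


# Assumed, deliberately simple tokenizers (not NLTK):
# sentences end at runs of '.', '!' or '?'; words are runs of \w characters.
SENT_SPLIT = re.compile(r"[.!?]+")
WORD = re.compile(r"\w+")


def extract_features(text: str) -> TextFeatures:
    # Drop fragments without any word, e.g. the empty string after a final '.'
    sentences = [s for s in SENT_SPLIT.split(text) if WORD.search(s)]
    words = WORD.findall(text)

    # Words per sentence; 0.0 for posts without any words
    avg_sent_len = (
        sum(len(WORD.findall(s)) for s in sentences) / len(sentences)
        if sentences else 0.0
    )
    # Characters per word
    avg_word_len = sum(len(w) for w in words) / len(words) if words else 0.0
    # str.isupper() is True only if the word has cased characters and all of
    # them are uppercase; note that a one-letter word such as "I" also counts
    num_all_caps = sum(1 for w in words if w.isupper())
    num_exclams = text.count("!")
    return TextFeatures(avg_sent_len, avg_word_len, num_all_caps, num_exclams)


post = "This is GREAT! Really. Use numpy."
print(extract_features(post))
```

For the sample post this yields 2.0 words per sentence, one all-caps word ("GREAT"), and one exclamation mark. Whether one-letter words like "I" should count toward NumAllCaps is a design choice; adding a `len(w) > 1` condition would exclude them.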

The following charts show the value distributions for average sentence and word lengths, as well as for the number of uppercase words and exclamation marks:
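Each of these distributions is essentially a histogram of one feature over all posts. As a hedged sketch, assuming the four feature values have already been collected into a NumPy array with one row per post and one column per feature (the column order and the name `feature_histograms` are illustrative assumptions), the counts behind such charts can be computed with `np.histogram`; drawing them is left to a plotting library such as matplotlib (`plt.hist`).

```python
import numpy as np

# Assumed column order of the feature matrix
FEATURE_NAMES = ["AvgSentLen", "AvgWordLen", "NumAllCaps", "NumExclams"]


def feature_histograms(X, bins: int = 10) -> dict:
    """Return {feature name: (counts, bin_edges)} for each column of X.

    X must have one row per post and one column per feature,
    in the order given by FEATURE_NAMES.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != len(FEATURE_NAMES):
        raise ValueError("expected an (n_posts, 4) feature matrix")
    # np.histogram returns a (counts, bin_edges) tuple per column
    return {
        name: np.histogram(X[:, j], bins=bins)
        for j, name in enumerate(FEATURE_NAMES)
    }


# Made-up values for three posts, for illustration only
X = np.array([
    [12.0, 4.5, 0, 0],
    [25.0, 5.1, 3, 2],
    [8.0, 4.2, 0, 1],
])
hists = feature_histograms(X, bins=3)
counts, edges = hists["NumAllCaps"]
print(counts, edges)
```

The bin count matters most for integer-valued features such as NumAllCaps and NumExclams: when there are only a few distinct values, too many bins leave most of them empty.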

