
But this is not enough, and it comes at the price of lower classification runtime performance. Take, for instance, the value of k = 90, where we have a very low test error. To classify a new post, we need to find the 90 nearest other posts to decide whether the new post is a good one or not.
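As a minimal sketch of what this per-post lookup involves, here is scikit-learn's KNeighborsClassifier run on placeholder data (the variable names and random features below are assumptions for illustration, not the chapter's actual code):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for the vectorized posts and their
# good/bad labels.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 20))
y_train = rng.integers(0, 2, 1000)
X_new = rng.random((1, 20))

knn = KNeighborsClassifier(n_neighbors=90)
knn.fit(X_train, y_train)   # "training" essentially just stores the instances

# Every single prediction has to search the stored posts for the 90
# nearest neighbors before it can vote on a label -- that search is
# the runtime cost we pay at classification time.
print(knn.predict(X_new))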

Clearly, we seem to be facing an issue with using the nearest neighbor algorithm for our scenario. It also has another real disadvantage. Over time, we will get more and more posts in our system. As the nearest neighbor method is an instance-based approach, we will have to store all the posts in our system. The more posts we get, the slower prediction becomes. This is different from model-based approaches, where we try to derive a model from the data.

So here we are, with enough reasons now to abandon the nearest neighbor approach and look for better places in the classification world. Of course, we will never know whether there is the one golden feature we just did not happen to think of. But for now, let's move on to another classification method that is known to work great in text-based classification scenarios.

Using logistic regression

Contrary to its name, logistic regression is a classification method, and it is very powerful when it comes to text-based classification. It achieves this by first performing regression on a logistic function, hence the name.
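Before going into the details, here is a minimal sketch of model-based classification with scikit-learn's LogisticRegression, again on placeholder features and labels (the data and names below are assumptions, not the chapter's real post data):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder features and good/bad labels standing in for the posts.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = rng.integers(0, 2, 1000)

clf = LogisticRegression()
clf.fit(X, y)

# Instead of storing all training posts, the fitted model is reduced to
# one coefficient per feature plus an intercept, so prediction stays
# cheap no matter how many posts we have seen.
print(clf.coef_.shape, clf.intercept_)
print(clf.predict_proba(X[:1]))   # class probabilities for one post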

