08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


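Chapter 6

Using Naive Bayes to classify

Given a new tweet, the only part left is to calculate the probabilities:

$$P(C=\text{"pos"} \mid F_1, F_2) = \frac{P(C=\text{"pos"}) \cdot P(F_1 \mid C=\text{"pos"}) \cdot P(F_2 \mid C=\text{"pos"})}{P(F_1, F_2)}$$

$$P(C=\text{"neg"} \mid F_1, F_2) = \frac{P(C=\text{"neg"}) \cdot P(F_1 \mid C=\text{"neg"}) \cdot P(F_2 \mid C=\text{"neg"})}{P(F_1, F_2)}$$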

We also need to choose the class with the higher probability. Because the denominator, $P(F_1, F_2)$, is the same for both classes, we can ignore it without changing the winning class.
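The effect of dropping the shared denominator can be checked with a small sketch. The scores below are made-up illustrative numbers, not values from the text, and the variable names are hypothetical. Within the model, the denominator equals the sum of the per-class numerators, so dividing by it rescales both classes by the same factor.

```python
# Hypothetical per-class numerators P(C) * P(F1|C) * P(F2|C).
# These numbers are invented for illustration only.
unnormalized = {"pos": 0.03, "neg": 0.01}

# Under the model, the shared denominator is the sum of the numerators.
evidence = sum(unnormalized.values())
posterior = {c: score / evidence for c, score in unnormalized.items()}

# Picking the winner before and after dividing by the denominator.
winner_without_denominator = max(unnormalized, key=unnormalized.get)
winner_with_denominator = max(posterior, key=posterior.get)

print(winner_without_denominator, winner_with_denominator)
```

Both selections name the same class, because dividing every score by the same positive number preserves their order.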

Note, however, that we no longer calculate real probabilities. Instead, we estimate which class is more likely given the evidence. This is another reason why Naive Bayes is so robust: it is interested less in the real probabilities than in which class is more likely. In short, we can write it as follows:

$$c_{best} = \arg\max_{c \in C} \; P(C=c) \cdot P(F_1 \mid C=c) \cdot P(F_2 \mid C=c)$$

Here we are calculating the part after argmax for all classes of C ("pos" and "neg" in our case) and returning the class that results in the highest value.
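The argmax rule described above could be sketched as a small function. This is a minimal illustration and not the book's code: the function name `naive_bayes_argmax`, its argument layout, and the example numbers are all assumptions made for this sketch.

```python
from math import prod  # available in Python 3.8+


def naive_bayes_argmax(priors, likelihoods):
    """Return the class c that maximizes P(C=c) * prod_i P(F_i | C=c).

    priors:      dict mapping class -> P(C=c)
    likelihoods: dict mapping class -> list of P(F_i | C=c), one entry
                 per observed feature (hypothetical layout)
    """
    def score(c):
        return priors[c] * prod(likelihoods[c])

    # Evaluate the score for every class and keep the highest one.
    return max(priors, key=score)


# Invented numbers, for illustration only.
example_priors = {"pos": 0.5, "neg": 0.5}
example_likelihoods = {"pos": [0.9, 0.2], "neg": [0.1, 0.3]}
best = naive_bayes_argmax(example_priors, example_likelihoods)
print(best)
```

With many features, the product of small probabilities can underflow, so summing logarithms of the factors is a common alternative to multiplying them directly.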

For the following example, however, let us stick to real probabilities and do some calculations to see how Naive Bayes works. For the sake of simplicity, we will assume that Twitter allows only the two words mentioned earlier, awesome and crazy, and that we have already manually classified a handful of tweets:

Tweet            Class
awesome          Positive
awesome          Positive
awesome crazy    Positive
crazy            Positive
crazy            Negative
crazy            Negative
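As a rough sketch of how the quantities needed by Naive Bayes could be read off this table, the snippet below counts class frequencies and, per class, the fraction of tweets containing each word. Treating each feature as "the word occurs in the tweet" is an assumption of this sketch, and the names `training`, `priors`, and `word_given_class` are hypothetical.

```python
from collections import Counter

# The six manually classified tweets from the table.
training = [
    ("awesome", "Positive"),
    ("awesome", "Positive"),
    ("awesome crazy", "Positive"),
    ("crazy", "Positive"),
    ("crazy", "Negative"),
    ("crazy", "Negative"),
]

# Class priors: share of tweets carrying each label.
class_counts = Counter(label for _, label in training)
priors = {label: n / len(training) for label, n in class_counts.items()}


def word_given_class(word, label):
    """Fraction of tweets labeled `label` that contain `word` (assumed feature)."""
    tweets = [text.split() for text, lab in training if lab == label]
    return sum(word in tokens for tokens in tweets) / len(tweets)


for label in ("Positive", "Negative"):
    print(label, priors[label],
          word_given_class("awesome", label),
          word_given_class("crazy", label))
```

Note that "awesome" never appears in a negative tweet here, so its estimated likelihood for that class is zero; with such a small sample, a single unseen word can drive a whole product to zero.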
