08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho


Classification II – Sentiment Analysis

In this case, we have six tweets in total, of which four are positive and two are negative, which results in the following priors:

$$P(C = \text{"pos"}) = \frac{4}{6} \approx 0.67 \qquad P(C = \text{"neg"}) = \frac{2}{6} \approx 0.33$$

This means that, without knowing anything about the tweet itself, we would be wise to assume that the tweet is positive.
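As a minimal sketch of this step, the priors can be estimated as relative class frequencies. The label list below only encodes the counts stated in the text (four positive, two negative); the function name `class_priors` is a hypothetical helper chosen for illustration.

```python
from collections import Counter

# Class labels of the six training tweets described in the text:
# four positive and two negative.
labels = ["pos"] * 4 + ["neg"] * 2


def class_priors(labels):
    """Return P(C) for each class as its relative frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: n / total for cls, n in counts.items()}


priors = class_priors(labels)
for cls, p in priors.items():
    print(cls, round(p, 2))
```

With these counts, the positive class receives the larger prior, which is why guessing "positive" is the better bet when nothing else is known about a tweet.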

The piece that is still missing is the calculation of $P(F_1 \mid C)$ and $P(F_2 \mid C)$, which are the probabilities for the two features $F_1$ and $F_2$ conditioned on class $C$.

This is calculated as the number of tweets in which we have seen the concrete feature value, divided by the number of tweets that have been labeled with the class $C$. Let's say we want to know the probability of seeing awesome occur once in a tweet, knowing that its class is "positive"; we would have the following:

$$P(F_1 = 1 \mid C = \text{"pos"}) = \frac{\text{number of positive tweets containing awesome once}}{\text{number of positive tweets}} = \frac{3}{4}$$

Since three of the four positive tweets contained the word awesome, the probability of not having awesome in a positive tweet is the complement, as we have seen only tweets with the counts 0 or 1:

$$P(F_1 = 0 \mid C = \text{"pos"}) = 1 - P(F_1 = 1 \mid C = \text{"pos"}) = 0.25$$
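The counting estimate for a conditional feature probability could be sketched as follows. The toy data is an assumption made for illustration: the positive rows match the text (three of the four positive tweets contain awesome once), while the negative rows and the helper name `likelihood` are hypothetical.

```python
# Hypothetical toy data: (count of "awesome" in the tweet, class label).
# The positive rows follow the text; the negative rows are assumed.
tweets = [
    (1, "pos"), (1, "pos"), (1, "pos"), (0, "pos"),
    (0, "neg"), (0, "neg"),
]


def likelihood(tweets, value, cls):
    """Estimate P(F = value | C = cls) by counting within the class."""
    in_class = [count for count, label in tweets if label == cls]
    return in_class.count(value) / len(in_class)


p_awesome_1 = likelihood(tweets, 1, "pos")  # awesome occurs once
p_awesome_0 = likelihood(tweets, 0, "pos")  # awesome does not occur
print(p_awesome_1, p_awesome_0)  # 0.75 0.25
```

Because only the counts 0 and 1 occur in this data, the two estimates for the positive class sum to 1.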

Similarly for the rest (omitting the case in which a word does not occur in a tweet):

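For the sake of completeness, we will also compute the evidence so that we can see real probabilities in the following example tweets. For two concrete values of $F_1$ and $F_2$, we can calculate the evidence as follows:

$$P(F_1, F_2) = P(F_1, F_2 \mid C = \text{"pos"}) \cdot P(C = \text{"pos"}) + P(F_1, F_2 \mid C = \text{"neg"}) \cdot P(C = \text{"neg"})$$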
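A rough sketch of the evidence computation is shown below. It assumes the naive independence of the features within each class, so that the joint likelihood factorizes into the product of the per-feature likelihoods. The priors and the positive-class values for the first feature come from the text; every other number, and the names `evidence`, `lik_f1`, and `lik_f2`, are placeholders assumed purely for illustration.

```python
def evidence(f1, f2, priors, lik_f1, lik_f2):
    """Sum P(C) * P(F1=f1 | C) * P(F2=f2 | C) over all classes."""
    return sum(
        priors[c] * lik_f1[c].get(f1, 0.0) * lik_f2[c].get(f2, 0.0)
        for c in priors
    )


priors = {"pos": 4 / 6, "neg": 2 / 6}

# P(F1 | C): the "pos" entries follow the text; the "neg" entries are assumed.
lik_f1 = {"pos": {1: 0.75, 0: 0.25}, "neg": {1: 0.0, 0: 1.0}}

# P(F2 | C): placeholder values, assumed only to make the sketch runnable.
lik_f2 = {"pos": {1: 0.5, 0: 0.5}, "neg": {1: 1.0, 0: 0.0}}

p_both = evidence(1, 1, priors, lik_f1, lik_f2)
total = sum(evidence(a, b, priors, lik_f1, lik_f2) for a in (0, 1) for b in (0, 1))
print(round(p_both, 4), round(total, 4))
```

Summing the evidence over all combinations of feature values gives 1, which is a quick sanity check that the priors and likelihood tables are proper probability distributions.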
