08.06.2015 Views

Building Machine Learning Systems with Python - Richert, Coelho



Classification II – Sentiment Analysis

The prior and evidence values are easily determined:

• $P(C)$ is the prior probability of class $C$ without knowing about the data. This quantity can be obtained by simply calculating the fraction of all training data instances belonging to that particular class.

• $P(F_1, F_2)$ is the evidence, or the probability of features $F_1$ and $F_2$. This can be retrieved by calculating the fraction of all training data instances having that particular combination of feature values.

• The tricky part is the calculation of the likelihood $P(F_1, F_2 \mid C)$. It is the value describing how likely it is to see feature values $F_1$ and $F_2$ if we know that the class of the data instance is $C$. Estimating it takes a bit more thought.
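As a minimal sketch of the two easy estimates, the snippet below counts fractions over a tiny, made-up training set. The data, the variable names, and the binary feature encoding are assumptions chosen for illustration; they do not come from the book's own example.

```python
from collections import Counter

# Hypothetical toy training set: each row is (F1, F2, class label).
training_data = [
    (1, 0, "pos"),
    (1, 1, "pos"),
    (0, 1, "neg"),
    (1, 1, "neg"),
    (1, 1, "pos"),
]

n = len(training_data)

# Prior P(C): fraction of training instances belonging to each class.
class_counts = Counter(label for _, _, label in training_data)
prior = {label: count / n for label, count in class_counts.items()}

# Evidence P(F1, F2): fraction of training instances that show
# each combination of feature values.
feature_counts = Counter((f1, f2) for f1, f2, _ in training_data)
evidence = {features: count / n for features, count in feature_counts.items()}

print(prior)
print(evidence)
```

Both quantities are plain relative frequencies, which is why they are the easy part; the likelihood needs the additional assumption discussed next.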

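Being naive

From probability theory, we also know the following relationship:

$$P(F_1, F_2 \mid C) = P(F_1 \mid C) \cdot P(F_2 \mid C, F_1)$$

This alone, however, does not help much, since we replace one difficult problem (estimating $P(F_1, F_2 \mid C)$) with another one (estimating $P(F_2 \mid C, F_1)$).

However, if we naively assume that $F_1$ and $F_2$ are independent of each other given the class, $P(F_2 \mid C, F_1)$ simplifies to $P(F_2 \mid C)$ and we can write it as follows:

$$P(F_1, F_2 \mid C) = P(F_1 \mid C) \cdot P(F_2 \mid C)$$

Putting everything together, we get this quite manageable formula:

$$P(C \mid F_1, F_2) = \frac{P(C) \cdot P(F_1 \mid C) \cdot P(F_2 \mid C)}{P(F_1, F_2)}$$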
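To make the combined estimate concrete, here is a small sketch that multiplies the prior by the per-feature likelihoods and divides by the evidence, with every term estimated as a simple fraction. The toy data and the helper names `fraction` and `naive_posterior` are hypothetical, and the sketch assumes every class and feature combination it is asked about occurs at least once in the training data.

```python
# Hypothetical toy training set: each row is (F1, F2, class label).
training_data = [
    (1, 0, "pos"),
    (1, 1, "pos"),
    (0, 1, "neg"),
    (1, 1, "neg"),
    (1, 1, "pos"),
]


def fraction(rows, predicate):
    """Fraction of rows for which predicate(row) is true."""
    return sum(1 for row in rows if predicate(row)) / len(rows)


def naive_posterior(f1, f2, label, data=training_data):
    """Estimate P(C | F1, F2) under the naive independence assumption.

    Assumes the class and the feature combination both appear in the
    data; otherwise a division by zero would occur.
    """
    in_class = [row for row in data if row[2] == label]
    prior = len(in_class) / len(data)                    # P(C)
    like_f1 = fraction(in_class, lambda r: r[0] == f1)   # P(F1 | C)
    like_f2 = fraction(in_class, lambda r: r[1] == f2)   # P(F2 | C)
    evidence = fraction(data, lambda r: (r[0], r[1]) == (f1, f2))  # P(F1, F2)
    return prior * like_f1 * like_f2 / evidence


for label in ("pos", "neg"):
    print(label, naive_posterior(1, 1, label))
```

Each likelihood is estimated per feature and per class, so no joint table over all feature combinations within a class is needed; that is the practical payoff of the naive assumption.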

The interesting thing is that although it is not theoretically correct to simply tweak our assumptions when we are in the mood to do so, in this case it proves to work astonishingly well in real-world applications.
