22.02.2014

Discrete Mathematics University of Kentucky CS 275 Spring ... - MGNet



Let E be the event that an incoming message contains the word w. Let S be the event that an incoming message is spam, and let $\bar{S}$ be the complementary event that it is not spam. Bayes' theorem tells us that the probability that an incoming message containing the word w is spam is

$$p(S \mid E) = \frac{p(E \mid S)\,p(S)}{p(E \mid S)\,p(S) + p(E \mid \bar{S})\,p(\bar{S})}.$$
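As a minimal sketch of how this formula could be evaluated, the Python function below computes p(S|E) from the two conditional probabilities and a prior. The function name `posterior_spam`, its parameter names, and the numeric inputs in the usage lines are illustrative assumptions, not part of the original notes; `p_e_given_not_s` stands for the probability that w appears in a message that is not spam.

```python
def posterior_spam(p_e_given_s: float, p_e_given_not_s: float, p_s: float = 0.5) -> float:
    """Return p(S|E) via Bayes' theorem (hypothetical helper for illustration)."""
    if not 0.0 <= p_s <= 1.0:
        raise ValueError("p_s must be a probability between 0 and 1")
    p_not_s = 1.0 - p_s  # probability that a message is not spam
    numerator = p_e_given_s * p_s
    denominator = numerator + p_e_given_not_s * p_not_s
    if denominator == 0.0:
        # The word never occurs under either hypothesis, so p(S|E) is undefined.
        raise ValueError("p(E) is zero, so p(S|E) is undefined")
    return numerator / denominator


# Illustrative inputs only: the word appears in 12.5% of spam and 0.5% of non-spam.
print(round(posterior_spam(0.125, 0.005, p_s=0.5), 3))  # 0.962
print(round(posterior_spam(0.125, 0.005, p_s=0.2), 3))  # 0.862
```

The second call suggests how a lower prior probability of spam pulls the posterior down even when the word itself is strong evidence.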

If we assume that $p(S) = p(\bar{S}) = 0.5$, i.e., that any incoming message is equally likely to be spam ($S$) or not ($\bar{S}$), then we get the simplified formula

$$p(S \mid E) = \frac{p(E \mid S)}{p(E \mid S) + p(E \mid \bar{S})}.$$
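The simplification can be checked by substituting 0.5 for both priors in Bayes' theorem and cancelling the common factor. The steps are written out below as a sketch, where $\bar{S}$ denotes the event that the message is not spam:

$$
\begin{aligned}
p(S \mid E) &= \frac{p(E \mid S)\cdot 0.5}{p(E \mid S)\cdot 0.5 + p(E \mid \bar{S})\cdot 0.5} \\
&= \frac{0.5\,p(E \mid S)}{0.5\,\bigl[p(E \mid S) + p(E \mid \bar{S})\bigr]} \\
&= \frac{p(E \mid S)}{p(E \mid S) + p(E \mid \bar{S})}.
\end{aligned}
$$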

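We estimate $p(E \mid S)$ by p(w) and $p(E \mid \bar{S})$ by q(w), the fractions of known spam messages and known good messages, respectively, that contain w. So, we estimate $p(S \mid E)$ by

$$r(w) = \frac{p(w)}{p(w) + q(w)}.$$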

If r(w) is greater than some preset threshold, then we classify the incoming message as spam. We can consider a threshold of 0.9 to begin with.


Example: Let w = Rolex. Suppose it occurs in 250 of 2000 spam messages and in 5 of 1000 good messages. We will estimate the probability that an incoming message with Rolex in it is spam, assuming that it is equally likely that the incoming message is spam or not. We know that p(Rolex) = 250 / 2000 = 0.125 and q(Rolex) = 5 / 1000 = 0.005. So,

$$r(\text{Rolex}) = \frac{0.125}{0.125 + 0.005} \approx 0.962 > 0.9.$$

Hence, we would reject the message as spam. (Note that some of us would reject all messages with the word Rolex in them as spam, but that is another case entirely.)
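The Rolex example can be reproduced with a short Python sketch. The helper names `spam_ratio` and `is_spam` are hypothetical and chosen only for illustration; the counts and the 0.9 threshold are the ones used in the example, and the equal-prior assumption is built into the ratio.

```python
def spam_ratio(spam_hits: int, spam_total: int, good_hits: int, good_total: int) -> float:
    """Estimate r(w) = p(w) / (p(w) + q(w)) from message counts (hypothetical helper)."""
    p = spam_hits / spam_total  # p(w): fraction of spam messages containing w
    q = good_hits / good_total  # q(w): fraction of good messages containing w
    if p + q == 0:
        # The word was never seen in either collection, so r(w) is undefined.
        raise ValueError("word does not occur in any message")
    return p / (p + q)


def is_spam(r: float, threshold: float = 0.9) -> bool:
    """Classify as spam when r(w) exceeds the preset threshold."""
    return r > threshold


r = spam_ratio(250, 2000, 5, 1000)
print(round(r, 3), is_spam(r))  # 0.962 True
```

Working from raw counts rather than precomputed fractions keeps the estimate tied to the message collections, so the ratio can be recomputed as more messages are labeled.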

