Discrete Mathematics University of Kentucky CS 275 Spring ... - MGNet

Now we calculate (a): $p(F \mid E) = \frac{p(E \mid F)\,p(F)}{p(E \mid F)\,p(F) + p(E \mid \overline{F})\,p(\overline{F})} = \frac{0.99 \cdot 10^{-5}}{0.99 \cdot 10^{-5} + 0.005 \cdot 0.99999} \approx 0.002$. Roughly 0.2% of people who test positive actually have the disease. Getting a positive result should not be an immediate cause for alarm (famous last words).

Now we calculate (b): $p(\overline{F} \mid \overline{E}) = \frac{p(\overline{E} \mid \overline{F})\,p(\overline{F})}{p(\overline{E} \mid \overline{F})\,p(\overline{F}) + p(\overline{E} \mid F)\,p(F)} = \frac{0.995 \cdot 0.99999}{0.995 \cdot 0.99999 + 0.01 \cdot 10^{-5}} \approx 0.9999999$. Thus, 99.99999% of people who test negative really do not have the disease.

Bayesian spam filters used to be the first line of defense for email programs. Like many good things, the process was run over by spammers in about two years. However, it is an interesting example of useful discrete mathematics. The filtering involves a training period: email messages must be marked as good or bad, and we denote these two sets of messages by G and B. Eventually the filter will mark messages for you, hopefully accurately. The filter finds all of the words in both sets and keeps a running total of each word per set. We construct two functions $n_G(w)$ and $n_B(w)$ that return the number of messages containing the word w in G and B, respectively. We use a uniform distribution, i.e., every message in a set is equally likely. The empirical probability that a spam message contains the word w is $p(w) = n_B(w) / |B|$, and the empirical probability that a nonspam message contains the word w is $q(w) = n_G(w) / |G|$. We can use p and q to estimate whether an incoming message is spam, based on a set of words that we build dynamically over time.
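The training step can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the filter from these notes: it assumes messages arrive as plain strings, and the tokenization (lowercase, then split on whitespace) and the function names `count_messages_containing` and `empirical_probabilities` are hypothetical. Each word is counted at most once per message, so the counts match $n_B(w)$ and $n_G(w)$ as defined above.

```python
from collections import Counter


def count_messages_containing(messages):
    """Return (n, size) where n[w] is the number of messages containing
    word w and size is the number of messages.

    Hypothetical tokenization (an assumption): lowercase, split on whitespace.
    """
    n = Counter()
    size = 0
    for message in messages:
        size += 1
        # Use a set so a word repeated inside one message is counted once.
        n.update(set(message.lower().split()))
    return n, size


def empirical_probabilities(bad, good):
    """Build p(w) = n_B(w)/|B| and q(w) = n_G(w)/|G| from the training sets."""
    n_B, size_B = count_messages_containing(bad)
    n_G, size_G = count_messages_containing(good)
    if size_B == 0 or size_G == 0:
        raise ValueError("both training sets must be nonempty")

    def p(w):
        # Fraction of spam (B) messages that contain w.
        return n_B[w.lower()] / size_B

    def q(w):
        # Fraction of good (G) messages that contain w.
        return n_G[w.lower()] / size_G

    return p, q


if __name__ == "__main__":
    bad = ["buy rolex now", "cheap rolex rolex watches", "win money now"]
    good = ["meeting at noon", "lunch now"]
    p, q = empirical_probabilities(bad, good)
    print(p("rolex"), q("rolex"))
```

Counting messages rather than word occurrences is what makes p(w) an estimate of the probability that a spam message contains w: in the toy data, "rolex" appears twice in the second spam message but contributes only once to $n_B(\text{rolex})$.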
Let E be the event that an incoming message contains the word w, and let S be the event that an incoming message is spam. Bayes' theorem tells us that the probability that an incoming message containing the word w is spam is $p(S \mid E) = \frac{p(E \mid S)\,p(S)}{p(E \mid S)\,p(S) + p(E \mid \overline{S})\,p(\overline{S})}$. If we assume that $p(S) = p(\overline{S}) = 0.5$, i.e., that any incoming message is equally likely to be spam or not, then we get the simplified formula $p(S \mid E) = \frac{p(E \mid S)}{p(E \mid S) + p(E \mid \overline{S})}$. We estimate $p(E \mid S)$ by $p(w)$ and $p(E \mid \overline{S})$ by $q(w)$. So, we estimate $p(S \mid E)$ by $r(w) = \frac{p(w)}{p(w) + q(w)}$. If r(w) is greater than some preset threshold, then we classify the incoming message as spam. We can start with a threshold of 0.9.

Example: Let w = Rolex. Suppose it occurs in 250 of 2000 spam messages and in 5 of 1000 good messages. We estimate the probability that an incoming message containing Rolex is spam, assuming that the incoming message is equally likely to be spam or not. We have $p(\text{Rolex}) = 250/2000 = 0.125$ and $q(\text{Rolex}) = 5/1000 = 0.005$. So, $r(\text{Rolex}) = 0.125 / (0.125 + 0.005) \approx 0.962 > 0.9$. Hence, we would reject the message as spam. (Note that some of us would reject every message containing the word Rolex as spam, but that is another case entirely.)
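The medical-test calculations and the spam score r(w) are instances of the same two-case Bayes' theorem formula, so a small Python sketch can check the numbers in these notes. The function names `bayes_posterior`, `spam_score`, and `is_spam` are hypothetical; `spam_score` is `bayes_posterior` with the prior fixed at 0.5, and the default threshold of 0.9 is the starting value suggested above. The test inputs are read off the formulas: $p(E \mid F) = 0.99$, $p(F) = 10^{-5}$, $p(E \mid \overline{F}) = 0.005$.

```python
def bayes_posterior(p_e_given_f, p_f, p_e_given_not_f):
    """Two-case Bayes' theorem:
    p(F|E) = p(E|F)p(F) / (p(E|F)p(F) + p(E|not F)p(not F)).
    """
    numerator = p_e_given_f * p_f
    return numerator / (numerator + p_e_given_not_f * (1.0 - p_f))


def spam_score(p_w, q_w):
    """r(w) = p(w) / (p(w) + q(w)), i.e. Bayes with p(S) = p(not S) = 0.5."""
    if p_w + q_w == 0:
        # The word never occurred in training, so r(w) is undefined.
        raise ValueError("word does not occur in either training set")
    return p_w / (p_w + q_w)


def is_spam(p_w, q_w, threshold=0.9):
    """Classify as spam when r(w) is strictly greater than the threshold."""
    return spam_score(p_w, q_w) > threshold


if __name__ == "__main__":
    # (a) p(F|E) with p(E|F) = 0.99, p(F) = 10^-5, p(E|not F) = 0.005
    print(bayes_posterior(0.99, 1e-5, 0.005))
    # (b) p(not F|not E): same formula with every event complemented,
    #     p(not E|not F) = 0.995, p(not F) = 0.99999, p(not E|F) = 0.01
    print(bayes_posterior(0.995, 0.99999, 0.01))
    # Rolex example: p = 250/2000, q = 5/1000
    p_w, q_w = 250 / 2000, 5 / 1000
    print(spam_score(p_w, q_w), is_spam(p_w, q_w))
```

Part (b) reuses the same function because the formula is unchanged when F and E are replaced by their complements; likewise `bayes_posterior(p_w, 0.5, q_w)` gives the same value as `spam_score(p_w, q_w)`.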