Notes on computational linguistics.pdf - UCLA Department of ...
Stabler - Lx 185/209 2003
(36) Theorem: $P(\overline{A}) = 1 - P(A)$
Proof: Obviously $A$ and $\overline{A}$ are disjoint, so by axiom iii, $P(A \cup \overline{A}) = P(A) + P(\overline{A})$.
Since $A \cup \overline{A} = \Omega$, axiom ii tells us that $P(A) + P(\overline{A}) = 1$.
(37) Theorem: $P(\emptyset) = 0$
(38) Theorem: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Proof: Since $A$ is the union of disjoint events $(A \cap B)$ and $(\overline{B} \cap A)$, $P(A) = P(A \cap B) + P(\overline{B} \cap A)$.
Since $B$ is the union of disjoint events $(A \cap B)$ and $(\overline{A} \cap B)$, $P(B) = P(A \cap B) + P(\overline{A} \cap B)$.
And finally, since $(A \cup B)$ is the union of disjoint events $(\overline{B} \cap A)$, $(A \cap B)$ and $(\overline{A} \cap B)$, $P(A \cup B) = P(\overline{B} \cap A) + P(A \cap B) + P(\overline{A} \cap B)$.
Now we can calculate $P(A) + P(B) = P(A \cap B) + P(\overline{B} \cap A) + P(A \cap B) + P(\overline{A} \cap B) = P(A \cup B) + P(A \cap B)$, and so $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
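Theorems (36) and (38) can be spot-checked on a small finite sample space. The following is only a minimal sketch: the fair six-sided die, the uniform measure, the events `A` and `B`, and the helper `prob` are all assumptions introduced for illustration, not part of the notes. A check like this confirms the identities for one example; it is not a substitute for the proofs above.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die (illustrative only).
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = frozenset({2, 4, 6})   # hypothetical event: the roll is even
B = frozenset({4, 5, 6})   # hypothetical event: the roll is at least 4
A_bar = OMEGA - A          # complement of A relative to OMEGA

# (36): P(complement of A) = 1 - P(A)
complement_holds = prob(A_bar) == 1 - prob(A)

# (38): P(A u B) = P(A) + P(B) - P(A n B)
union_holds = prob(A | B) == prob(A) + prob(B) - prob(A & B)

print(complement_holds, union_holds)
```

Exact rational arithmetic via `fractions.Fraction` is used so that the equalities are tested exactly rather than up to floating-point error.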
(39) Theorem (Boole’s inequality): $P\left(\bigcup_{i=0}^{\infty} A_i\right) \le \sum_{i=0}^{\infty} P(A_i)$
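Boole's inequality is stated for countably many events, but the finite case (where all remaining events are empty) already shows why the bound can be strict: overlapping outcomes are counted more than once on the right-hand side. The sketch below assumes a fair six-sided die and three hypothetical overlapping events; the names `prob` and `events` are illustrative and not from the notes.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die (illustrative only).
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

# Hypothetical finite family of overlapping events A_0, A_1, A_2.
events = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]

union = frozenset().union(*events)       # A_0 u A_1 u A_2
lhs = prob(union)                        # probability of the union
rhs = sum(prob(e) for e in events)       # sum of the individual probabilities

print(lhs, rhs, lhs <= rhs)
```

Here the outcomes 2 and 3 each belong to two of the events, so the sum on the right exceeds the probability of the union.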
(40) Exercises<br />
a. Prove that if $A \subseteq B$ then $P(A) \le P(B)$
b. In (38), we see what $P(A \cup B)$ is. What is $P(A \cup B \cup C)$?
c. Prove Boole’s inequality.<br />
(41) The conditional probability of $A$ given $B$, $P(A \mid B) =_{df} \frac{P(A \cap B)}{P(B)}$
(42) Bayes’ theorem: $P(A \mid B) = \frac{P(A)P(B \mid A)}{P(B)}$
Proof: From the definition of conditional probability just stated in (41), (i) $P(A \cap B) = P(B)P(A \mid B)$. The
definition of conditional probability (41) also tells us $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, and so (ii) $P(A \cap B) = P(A)P(B \mid A)$.
Given (i) and (ii), we know $P(A)P(B \mid A) = P(B)P(A \mid B)$, from which the theorem follows immediately.
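The definition in (41) and Bayes' theorem in (42) can be compared on a concrete case. This is a minimal sketch under stated assumptions: a fair six-sided die with the uniform measure, two hypothetical events `A` and `B`, and helper names `prob` and `cond_prob` that are introduced here for illustration. It computes $P(A \mid B)$ once directly from the definition and once by the right-hand side of Bayes' theorem.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die (illustrative only).
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(a | b) as in definition (41); undefined when P(b) = 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

A = frozenset({1, 2})      # hypothetical event: the roll is 1 or 2
B = frozenset({2, 4, 6})   # hypothetical event: the roll is even

direct = cond_prob(A, B)                            # definition (41)
via_bayes = prob(A) * cond_prob(B, A) / prob(B)     # right-hand side of (42)

print(direct, via_bayes, direct == via_bayes)
```

Note that $P(A \mid B)$ and $P(B \mid A)$ differ in this example, which is exactly the situation in which Bayes' theorem is useful for converting one into the other.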
The English mathematician Thomas Bayes (1702-1761) was a Presbyterian minister. He distributed some papers,
and published one anonymously, but his influential work on probability, containing a version of the theorem
above, was not published until after his death.
Bayes is also associated with the idea that probabilities may be regarded as degrees of belief, and this has inspired
recent work in models of scientific reasoning. See, e.g., Horwich (1982), Earman (1992), Pearl (1988).
In fact, in a Microsoft advertisement we are told that their Lumiere Project uses “a Bayesian perspective on integrating
information from user background, user actions, and program state, along with a Bayesian analysis of
the words in a user’s query…this Bayesian information-retrieval component of Lumiere was shipped in all of the
Office ’95 products as the Office Answer Wizard…As a user works, a probability distribution is generated over
areas that the user may need assistance with. A probability that the user would not mind being bothered with
assistance is also computed.”
See, e.g., http://www.research.microsoft.com/research/dtg/horvitz/lum.htm.
For entertainment, and more evidence of the Bayesian cult that is sweeping certain subcultures, see, e.g. http://www.afit.af.m
For some more serious remarks on Microsoft’s “Bayesian” spelling correction, and a new proposal inspired by
trigram and Bayesian methods, see e.g. Golding and Schabes (1996).
For some serious proposals about Bayesian methods in perception: Knill and Richards (1996); and in language
acquisition: Brent and Cartwright (1996), de Marcken (1996).
(43) $A$ and $B$ are independent iff $P(A \cap B) = P(A)P(B)$.
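The independence condition in (43) can be tested mechanically on a finite sample space. The sketch below assumes two rolls of a fair die with all 36 ordered pairs equally likely; the events `A`, `B`, `C` and the helpers `prob` and `independent` are hypothetical names chosen for illustration, not taken from the notes.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: ordered pairs from two rolls of a fair die.
OMEGA = frozenset(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def independent(a, b):
    """True iff P(a n b) = P(a) P(b), as in definition (43)."""
    return prob(a & b) == prob(a) * prob(b)

A = frozenset(w for w in OMEGA if w[0] % 2 == 0)   # first roll is even
B = frozenset(w for w in OMEGA if sum(w) == 7)     # the two rolls sum to 7
C = frozenset(w for w in OMEGA if sum(w) == 8)     # the two rolls sum to 8

print(independent(A, B))   # True:  1/12 == 1/2 * 1/6
print(independent(A, C))   # False: 1/12 != 1/2 * 5/36
```

Under these assumptions, knowing that the first roll is even tells us nothing about whether the sum is 7, but it does change the probability that the sum is 8.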