
Notes on computational linguistics.pdf - UCLA Department of ...


Stabler - Lx 185/209 2003

(36) Theorem: $P(\bar{A}) = 1 - P(A)$

Proof: Obviously $A$ and $\bar{A}$ are disjoint, so by axiom iii, $P(A \cup \bar{A}) = P(A) + P(\bar{A})$.

Since $A \cup \bar{A} = \Omega$, axiom ii tells us that $P(A) + P(\bar{A}) = 1$.
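The complement rule in (36) can be checked on a small finite space. The following is only a sketch: the fair six-sided die, the uniform measure, and the names `OMEGA`, `prob`, `A` and `A_bar` are assumptions made for illustration and do not come from the notes.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = frozenset({2, 4, 6})   # hypothetical event: the roll is even
A_bar = OMEGA - A          # the complement of A

disjoint = not (A & A_bar)                  # A and its complement share no outcome
covers = (A | A_bar) == OMEGA               # together they exhaust the sample space
complement_ok = prob(A_bar) == 1 - prob(A)  # the claim of (36)
```

Exact rational arithmetic with `Fraction` avoids floating-point rounding, so the equality test is meaningful.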

(37) Theorem: $P(\emptyset) = 0$

(38) Theorem: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$

Proof: Since $A$ is the union of disjoint events $(A \cap B)$ and $(\bar{B} \cap A)$, $P(A) = P(A \cap B) + P(\bar{B} \cap A)$.

Since $B$ is the union of disjoint events $(A \cap B)$ and $(\bar{A} \cap B)$, $P(B) = P(A \cap B) + P(\bar{A} \cap B)$.

And finally, since $(A \cup B)$ is the union of disjoint events $(\bar{B} \cap A)$, $(A \cap B)$ and $(\bar{A} \cap B)$, $P(A \cup B) = P(\bar{B} \cap A) + P(A \cap B) + P(\bar{A} \cap B)$.

Now we can calculate $P(A) + P(B) = P(A \cap B) + P(\bar{B} \cap A) + P(A \cap B) + P(\bar{A} \cap B)$, which is $P(A \cup B) + P(A \cap B)$, and so $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
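The decomposition used in the proof of (38) can be traced on a concrete case. This is a sketch under assumptions: the fair die, the events `A` and `B`, and the helper `prob` are hypothetical choices, not part of the notes.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

A = frozenset({2, 4, 6})   # hypothetical event: the roll is even
B = frozenset({4, 5, 6})   # hypothetical event: the roll is greater than 3

# The three disjoint pieces of A ∪ B used in the proof.
only_a = A - B             # A intersected with the complement of B
both = A & B               # A intersected with B
only_b = B - A             # B intersected with the complement of A

p_union = prob(A | B)
by_pieces = prob(only_a) + prob(both) + prob(only_b)
by_theorem = prob(A) + prob(B) - prob(both)
```

Both `by_pieces` and `by_theorem` should agree with `p_union`; the subtraction in `by_theorem` removes the one extra count of `both` that arises when `prob(A)` and `prob(B)` are added.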

(39) Theorem (Boole’s inequality): $P\left(\bigcup_{i=0}^{\infty} A_i\right) \le \sum_{i=0}^{\infty} P(A_i)$
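A finite instance of (39) can be checked numerically. This only illustrates the inequality and is not a proof; the fair die and the particular overlapping events in `events` are assumptions chosen for the example.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

# Hypothetical finite family of overlapping events A_0, A_1, A_2.
events = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]

union = frozenset().union(*events)
p_union = prob(union)                          # probability of the union
sum_of_probs = sum(prob(e) for e in events)    # sum of the individual probabilities

boole_holds = p_union <= sum_of_probs
```

The sum exceeds the probability of the union here because outcomes 2 and 3 each belong to two of the events and are counted twice in the sum.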

(40) Exercises

a. Prove that if $A \subseteq B$ then $P(A) \le P(B)$

b. In (38), we see what $P(A \cup B)$ is. What is $P(A \cup B \cup C)$?

c. Prove Boole’s inequality.

(41) The conditional probability of $A$ given $B$, $P(A \mid B) =_{df} \dfrac{P(A \cap B)}{P(B)}$
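Definition (41) translates directly into a small function. The sketch below assumes a fair die and two hypothetical events; the name `cond_prob` and the decision to raise an error when $P(B) = 0$ are illustrative choices, since the definition leaves that case undefined.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b); undefined (here: an error) when P(b) = 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) = 0, so P(A|B) is undefined")
    return prob(a & b) / p_b

A = frozenset({2, 4, 6})   # hypothetical event: the roll is even
B = frozenset({4, 5, 6})   # hypothetical event: the roll is greater than 3

p_a_given_b = cond_prob(A, B)
```

Conditioning on `B` restricts attention to the outcomes 4, 5 and 6, two of which are even.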

(42) Bayes’ theorem: $P(A \mid B) = \dfrac{P(A)\,P(B \mid A)}{P(B)}$

Proof: From the definition of conditional probability just stated in (41), (i) $P(A \cap B) = P(B)\,P(A \mid B)$. The definition of conditional probability (41) also tells us $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)}$, and so (ii) $P(A \cap B) = P(A)\,P(B \mid A)$.

Given (i) and (ii), we know $P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$, from which the theorem follows immediately.
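The two routes to $P(A \mid B)$ in (42) can be compared on a concrete case. As before, this is a sketch: the fair die, the events `A` and `B`, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(a, b):
    """P(a | b) by definition (41); assumes P(b) > 0."""
    return prob(a & b) / prob(b)

A = frozenset({2, 4, 6})   # hypothetical event: the roll is even
B = frozenset({5, 6})      # hypothetical event: the roll is at least 5

# Directly from the definition of conditional probability.
direct = cond_prob(A, B)

# Via Bayes' theorem, using only P(A), P(B|A) and P(B).
via_bayes = prob(A) * cond_prob(B, A) / prob(B)

# Steps (i) and (ii) of the proof: two factorizations of P(A ∩ B).
step_i = prob(B) * cond_prob(A, B)
step_ii = prob(A) * cond_prob(B, A)
```

The practical point of the theorem is that `via_bayes` never conditions on `B` directly, which matters when $P(B \mid A)$ is easier to estimate than $P(A \mid B)$.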

The English mathematician Thomas Bayes (1702-1761) was a Presbyterian minister. He distributed some papers, and published one anonymously, but his influential work on probability, containing a version of the theorem above, was not published until after his death.

Bayes is also associated with the idea that probabilities may be regarded as degrees of belief, and this has inspired recent work in models of scientific reasoning. See, e.g. Horwich (1982), Earman (1992), Pearl (1988).

In fact, in a Microsoft advertisement we are told that their Lumiere Project uses “a Bayesian perspective on integrating information from user background, user actions, and program state, along with a Bayesian analysis of the words in a user’s query…this Bayesian information-retrieval component of Lumiere was shipped in all of the Office ’95 products as the Office Answer Wizard…As a user works, a probability distribution is generated over areas that the user may need assistance with. A probability that the user would not mind being bothered with assistance is also computed.”

See, e.g. http://www.research.microsoft.com/research/dtg/horvitz/lum.htm.

For entertainment, and more evidence of the Bayesian cult that is sweeping certain subcultures, see, e.g. http://www.afit.af.m

For some more serious remarks on Microsoft’s “Bayesian” spelling correction, and a new proposal inspired by trigram and Bayesian methods, see e.g. Golding and Schabes (1996).

For some serious proposals about Bayesian methods in perception: Knill and Richards (1996); and in language acquisition: Brent and Cartwright (1996), de Marcken (1996).

(43) $A$ and $B$ are independent iff $P(A \cap B) = P(A)\,P(B)$.
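Definition (43) gives a direct test for independence. The sketch below assumes a fair die; the events `A`, `B` and `C` and the function name `independent` are hypothetical and chosen so that one pair passes the test and one pair fails it.

```python
from fractions import Fraction

# Assumed sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))

def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def independent(a, b):
    """True iff P(a ∩ b) = P(a) P(b), as in definition (43)."""
    return prob(a & b) == prob(a) * prob(b)

A = frozenset({2, 4, 6})      # hypothetical event: the roll is even
B = frozenset({1, 2, 3, 4})   # hypothetical event: the roll is at most 4
C = frozenset({4, 5, 6})      # hypothetical event: the roll is greater than 3

a_b_independent = independent(A, B)   # P(A ∩ B) = 1/3 and P(A) P(B) = 1/2 * 2/3
a_c_independent = independent(A, C)   # P(A ∩ C) = 1/3 but P(A) P(C) = 1/4
```

Learning that the roll is at most 4 leaves the chance of an even roll at one half, whereas learning that it is greater than 3 raises that chance to two thirds, which is why only the first pair is independent.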

