Notes on computational linguistics.pdf - UCLA Department of ...

Stabler - Lx 185/209 2003

(36) Theorem: P(Ā) = 1 − P(A)
Proof: Obviously A and Ā are disjoint, so by axiom iii, P(A ∪ Ā) = P(A) + P(Ā). Since A ∪ Ā = Ω, axiom ii tells us that P(A) + P(Ā) = 1.

(37) Theorem: P(∅) = 0

(38) Theorem: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Proof: Since A is the union of disjoint events (A ∩ B) and (B̄ ∩ A), P(A) = P(A ∩ B) + P(B̄ ∩ A). Since B is the union of disjoint events (A ∩ B) and (Ā ∩ B), P(B) = P(A ∩ B) + P(Ā ∩ B). And finally, since (A ∪ B) is the union of disjoint events (B̄ ∩ A), (A ∩ B) and (Ā ∩ B), P(A ∪ B) = P(B̄ ∩ A) + P(A ∩ B) + P(Ā ∩ B). Now we can calculate P(A) + P(B) = P(A ∩ B) + P(B̄ ∩ A) + P(A ∩ B) + P(Ā ∩ B) = P(A ∪ B) + P(A ∩ B), and so P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

(39) Theorem (Boole’s inequality): P(⋃_{i=0}^{∞} A_i) ≤ ∑_{i=0}^{∞} P(A_i)

(40) Exercises
a. Prove that if A ⊆ B then P(A) ≤ P(B)
b. In (38), we see what P(A ∪ B) is. What is P(A ∪ B ∪ C)?
c. Prove Boole’s inequality.

(41) The conditional probability of A given B, P(A|B) =df P(A ∩ B) / P(B)

(42) Bayes’ theorem: P(A|B) = P(A)P(B|A) / P(B)
Proof: From the definition of conditional probability just stated in (41), (i) P(A ∩ B) = P(B)P(A|B). The definition of conditional probability (41) also tells us P(B|A) = P(A ∩ B) / P(A), and so (ii) P(A ∩ B) = P(A)P(B|A). Given (i) and (ii), we know P(A)P(B|A) = P(B)P(A|B), from which the theorem follows immediately.

The English mathematician Thomas Bayes (1702-1761) was a Presbyterian minister. He distributed some papers, and published one anonymously, but his influential work on probability, containing a version of the theorem above, was not published until after his death. Bayes is also associated with the idea that probabilities may be regarded as degrees of belief, and this has inspired recent work in models of scientific reasoning. See, e.g. Horwich (1982), Earman (1992), Pearl (1988).
In fact, in a Microsoft advertisement we are told that their Lumiere Project uses “a Bayesian perspective on integrating information from user background, user actions, and program state, along with a Bayesian analysis of the words in a user’s query…this Bayesian information-retrieval component of Lumiere was shipped in all of the Office ’95 products as the Office Answer Wizard…As a user works, a probability distribution is generated over areas that the user may need assistance with. A probability that the user would not mind being bothered with assistance is also computed.” See, e.g. http://www.research.microsoft.com/research/dtg/horvitz/lum.htm. For entertainment, and more evidence of the Bayesian cult that is sweeping certain subcultures, see, e.g. http://www.afit.af.m For some more serious remarks on Microsoft’s “Bayesian” spelling correction, and a new proposal inspired by trigram and Bayesian methods, see e.g. Golding and Schabes (1996). For some serious proposals about Bayesian methods in perception: Knill and Richards (1996); and in language acquisition: Brent and Cartwright (1996), de Marcken (1996).

(43) A and B are independent iff P(A ∩ B) = P(A)P(B).
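The identities in (36), (38), (42) and the independence test in (43) can be checked numerically on a small example. The following is only a minimal sketch under stated assumptions: the sample space (one roll of a fair die with the uniform measure), the events A and B, and the helper names prob, cond_prob and independent are all hypothetical illustrations, not part of the notes.

```python
from fractions import Fraction

# Assumption: a hypothetical finite sample space, one roll of a fair die.
OMEGA = frozenset(range(1, 7))


def prob(event):
    """Uniform probability of an event (a subset of OMEGA), as an exact fraction."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond_prob(a, b):
    """P(A|B) following definition (41); only defined when P(B) > 0."""
    if prob(b) == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b) / prob(b)


def independent(a, b):
    """Independence test following (43): P(A ∩ B) = P(A)P(B)."""
    return prob(a & b) == prob(a) * prob(b)


# Illustrative events: A = "the roll is even", B = "the roll is at most 4".
A = frozenset({2, 4, 6})
B = frozenset({1, 2, 3, 4})

complement_ok = prob(OMEGA - A) == 1 - prob(A)                      # (36)
union_ok = prob(A | B) == prob(A) + prob(B) - prob(A & B)           # (38)
bayes_ok = cond_prob(A, B) == prob(A) * cond_prob(B, A) / prob(B)   # (42)

print(complement_ok, union_ok, bayes_ok, independent(A, B))
```

Exact fractions are used instead of floating point so that the equalities are tested without rounding error. With these particular events, P(A) = 1/2, P(B) = 2/3 and P(A ∩ B) = 1/3, so A and B happen to be independent.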
8.1.5 Random variables

(44) A random (or stochastic) variable on probability space (Ω, 2^Ω, P) is a function X : Ω → ℝ.

(45) Any set of numbers A ∈ 2^ℝ determines (or “generates”) an event, a set of outcomes, namely X⁻¹(A) = {e | X(e) ∈ A}.

(46) So then, for example, P(X⁻¹(A)) = P({e | X(e) ∈ A}) is the probability of an event, as usual.

(47) Many texts use the notation X ∈ A for an event, namely, {e | X(e) ∈ A}. So P(X ∈ A) is just P(X⁻¹(A)), which is just P({e | X(e) ∈ A}). Sometimes you also see P{X ∈ A}, with the same meaning.

(48) Similarly, for some s ∈ ℝ, it is common to see P(X = s), where X = s is the event {e | X(e) = s}.

(49) The range of X is sometimes called the sample space of the stochastic variable X, Ω_X. X is discrete if Ω_X is finite or countably infinite. Otherwise it is continuous.

• Why do things this way? What is the purpose of these functions X? The answer is: the functions X just formalize the classification of events, the sets of outcomes that we are interested in, as explained in (45) and (48). This is a standard way to name events, and once you are practiced with the notation, it is convenient. The events are classified numerically here, that is, they are named by real numbers, but when the set of events Ω_X is finite or countable, obviously we could name these events with any finite or countable set of names.

8.1.6 Stochastic processes and n-gram models of language users

(50) A stochastic process is a function X from times (or “indices”) to random variables. If the time is continuous, then X : ℝ → [Ω → ℝ], where [Ω → ℝ] is the set of random variables. If the time is discrete, then X : ℕ → [Ω → ℝ].

(51) For stochastic processes X, instead of the usual argument notation X(t), we use subscripts X_t, to avoid confusion with the arguments of the random variables. So X_t is the value of the stochastic process X at time t, a random variable. When time is discrete, for t = 0, 1, 2, ... we have the sequence of random variables X_0, X_1, X_2, ...
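To make the notation in (44)-(51) concrete, here is a minimal sketch in which a random variable is an ordinary function on outcomes and an event is the preimage of a set of numbers. Everything specific in it is an assumption made for illustration: the outcomes (two rolls of a fair die, uniform measure), the random variable total, and the helper names prob, preimage, sample_space and roll. The function roll is only a two-step stand-in for a discrete-time process, not a process indexed by all of ℕ.

```python
from fractions import Fraction
from itertools import product

# Assumption: hypothetical outcomes, the ordered results of rolling a fair die twice.
OMEGA = frozenset(product(range(1, 7), repeat=2))


def prob(event):
    """Uniform measure on OMEGA, as an exact fraction."""
    return Fraction(len(event), len(OMEGA))


def total(e):
    """A random variable X: the sum of the two rolls."""
    return e[0] + e[1]


def preimage(rv, values):
    """The event {e | rv(e) in values} generated by a set of numbers, as in (45)."""
    return frozenset(e for e in OMEGA if rv(e) in values)


def sample_space(rv):
    """The range of the random variable, called its sample space in (49)."""
    return frozenset(rv(e) for e in OMEGA)


def roll(t):
    """Index t (0 or 1) mapped to a random variable X_t: the t-th roll."""
    return lambda e: e[t]


p_low = prob(preimage(total, {2, 3}))        # P(X ∈ {2, 3})
p_seven = prob(preimage(total, {7}))         # P(X = 7)
p_first_is_3 = prob(preimage(roll(0), {3}))  # P(X_0 = 3)

print(p_low, p_seven, p_first_is_3, sorted(sample_space(total)))
```

Here P(X ∈ {2, 3}) is 1/12 and P(X = 7) is 1/6, and the sample space of total is the finite set {2, ..., 12}, so this X is discrete in the sense of (49).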
(52) We will consider primarily discrete time stochastic processes, that is, sequences X_0, X_1, X_2, ... of random variables. So X_i is a random variable, namely the one that is the value of the stochastic process X at time i.

(53) X_i = q is interpreted as before as the event (now understood as occurring at time i) which is the set of outcomes {e | X_i(e) = q}. So, for example, P(X_i = q) is just a notation for the probability, at time i, of an outcome that is named q by X_i, that is, P(X_i = q) is short for P({e | X_i(e) = q}).

(54) Notice that it would make perfect sense for all the variables in the sequence to be identical, X_0 = X_1 = X_2 = .... In that case, we still think of the process as one that occurs in time, with the same classification of outcomes available at each time. Let’s call a stochastic process time-invariant (or stationary) iff all of its random variables are the same function. That is, for all q, q′ ∈ ℕ, X_q = X_{q′}.

(55) A finite stochastic process X is one where the sample space of all the stochastic variables, Ω_X = ⋃_{i=0}^{∞} Ω_{X_i}, is finite. The elements of Ω_X name events, as explained in (45) and (48), but in this context the elements of Ω_X are often called states.

Markov chains