It can be shown that $D(p \parallel q) \geq 0$ always, with equality if and only if the two distributions are identical. It follows that $D(p \parallel q) = 0$ iff $p = q$. This justifies thinking of $D(p \parallel q)$ as a pseudo-distance between distributions, or between distributions and models.

Computing $D(p \parallel q)$ exactly presumes knowledge of the full distributions $p$ and $q$, but in typical scenarios at most one is known, e.g., because it is given by a model. For example, we might use the relative entropy to define an estimator for model parameters $\theta$, such that the estimated value is the one that minimizes $D(p \parallel P(\cdot \mid \theta))$, where $p$ is the distribution from which the samples are drawn. Since $p$ is not known, we cannot compute $D(p \parallel P(\cdot \mid \theta))$ directly. However, note that for minimization purposes only the cross-entropy term in the decomposition

$$D(p \parallel q) = \sum_x p(x) \log \frac{p(x)}{q(x)} = -H(p) + H(p, q)$$

is relevant: the entropy $H(p)$ is an unknown constant, but one that remains fixed as $\theta$ is varied. This leaves the cross-entropy

$$H(p, q) = -\sum_x p(x) \log q(x)$$

to be minimized, which is the expected value of $-\log q(x)$ under the true distribution $p$. It can therefore be estimated by averaging over the sample corpus $x_1, \ldots, x_N$:

$$H(p, q) \approx -\frac{1}{N} \sum_{i=1}^{N} \log q(x_i) \qquad (2.10)$$

We thus see that the estimated cross-entropy is proportional (by a factor of $-\frac{1}{N}$) to the log of the likelihood (a numerical sketch of this estimate is given at the end of this section). Therefore, ML estimators are effectively also minimum relative entropy estimators.

2.3 Grammars with hidden variables

Although all grammars considered here generate their samples through a combination of multinomials, the sequence of choices that can give rise to a given sample is not always uniquely determined, unlike for $n$-gram grammars. There, one can uniquely identify the sequence of choices leading to the generation of a complete string by inspecting the $n$-grams occurring in the string in left-to-right order. Knowing the $n$-grams, one can then compute their probabilities, and hence the probability of the string itself (by taking products; see the second sketch below).

A complete sequence of generator events (multinomial samples) that generates string $x$ is called a derivation of $x$. (Thus, for $n$-gram models the only derivation of a string is the string itself.) Grammars that generate strings with more than one derivation are called ambiguous. Each derivation $d$ has a probability, which is the product of the probabilities of the multinomial outcomes making up the derivation. In general, then, a string probability is a sum of derivation probabilities, namely, of all derivations generating the same string:

$$P(x \mid \theta) = \sum_{d \text{ derives } x} P(d \mid \theta)$$
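As a concrete illustration of the estimate in (2.10), the following Python sketch (not from the dissertation; the distributions and the sample corpus are arbitrary placeholders) checks the decomposition $D(p \parallel q) = H(p, q) - H(p)$ numerically and shows that the empirical cross-entropy is exactly $-\frac{1}{N}$ times the log-likelihood of the corpus under the model.

    import math

    # Hypothetical categorical distributions over a small alphabet.
    p = {"a": 0.5, "b": 0.3, "c": 0.2}   # "true" sample-generating distribution
    q = {"a": 0.4, "b": 0.4, "c": 0.2}   # model distribution

    def relative_entropy(p, q):
        """D(p || q) = sum_x p(x) log(p(x)/q(x)); nonnegative, zero iff p == q."""
        return sum(px * math.log(px / q[x]) for x, px in p.items())

    def cross_entropy(p, q):
        """H(p, q) = -sum_x p(x) log q(x)."""
        return -sum(px * math.log(q[x]) for x, px in p.items())

    def entropy(p):
        """H(p) = -sum_x p(x) log p(x)."""
        return -sum(px * math.log(px) for px in p.values())

    # D(p || q) = H(p, q) - H(p): only the cross-entropy term varies with the model.
    assert abs(relative_entropy(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-12

    # Estimate H(p, q) as in (2.10) by averaging -log q(x) over a sample corpus;
    # this equals -1/N times the log-likelihood of the corpus under q.
    corpus = ["a", "a", "b", "c", "a", "b"]     # hypothetical sample drawn from p
    N = len(corpus)
    log_likelihood = sum(math.log(q[x]) for x in corpus)
    print(-log_likelihood / N)   # minimizing this over q maximizes the likelihood

Because $H(p)$ does not depend on the model, driving this sample average down is the same as driving the likelihood up, which is the ML/minimum-relative-entropy equivalence noted above.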
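The $n$-gram case described at the start of Section 2.3 can be sketched in the same style. The bigram table below is hypothetical; the point is only that the derivation of a string is unique, so its probability is the product of the bigram probabilities read off left to right.

    # Hypothetical bigram probabilities P(w_i | w_{i-1}), with sentence
    # boundary markers <s> and </s>.
    bigram = {
        ("<s>", "the"): 0.6,
        ("the", "dog"): 0.2,
        ("dog", "barks"): 0.5,
        ("barks", "</s>"): 0.9,
    }

    def string_prob(words, bigram):
        """P(string) = product of P(w_i | w_{i-1}) over the padded sequence."""
        padded = ["<s>"] + words + ["</s>"]
        prob = 1.0
        for prev, cur in zip(padded, padded[1:]):
            prob *= bigram[(prev, cur)]
        return prob

    print(string_prob(["the", "dog", "barks"], bigram))  # 0.6*0.2*0.5*0.9 = 0.054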
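For an ambiguous grammar, by contrast, the final equation above sums over all derivations of a string. The toy grammar below (rules and probabilities are invented for illustration, not taken from the dissertation) generates the string "ab" by two distinct rule sequences, so its probability is the sum of two derivation probabilities, each a product of rule (multinomial) probabilities.

    # Hypothetical rule probabilities; for each left-hand side they sum to 1.
    rules = {
        "S -> A B": 0.4,   # derivation 1 expands S this way ...
        "S -> a B": 0.6,   # ... derivation 2 uses this rule instead
        "A -> a":   1.0,
        "B -> b":   0.7,
        "B -> c":   0.3,
    }

    # The two derivations of the string "ab", listed explicitly.
    derivations_of_ab = [
        ["S -> A B", "A -> a", "B -> b"],   # derivation 1
        ["S -> a B", "B -> b"],             # derivation 2
    ]

    def derivation_prob(derivation, rules):
        """P(d) = product of the probabilities of the rules used in d."""
        prob = 1.0
        for rule in derivation:
            prob *= rules[rule]
        return prob

    # P(x) = sum over all derivations d of x of P(d)
    p_ab = sum(derivation_prob(d, rules) for d in derivations_of_ab)
    print(p_ab)   # 0.4*1.0*0.7 + 0.6*0.7 = 0.70

Which derivation produced a given string is thus a hidden variable: only the string is observed, while the probability model is defined over derivations.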
