The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 7. N-GRAMS FROM STOCHASTIC CONTEXT-FREE GRAMMARS

                      Training corpus   Test corpus
No. of sentences      2621              364
No. of words          16974             2208
Bigram vocabulary     1064
Bigram coverage       100%              77%
SCFG productions      1177
SCFG vocabulary       655
SCFG coverage         63%              51%

Table 7.1: BeRP corpora and language model statistics. Coverage is measured by the percentage of sentences parsed with non-zero probability by a given language model.

expectations and 108959 linear systems for bigram expectations. The process takes about 9 hours on a SPARCstation 10 using a non-optimized Lisp implementation.[6]

The experiments and results described below overlap with those reported in Jurafsky et al. (1994b).

In experiment 1, the recognizer used bigrams that were estimated directly from the training corpus, without any smoothing, resulting in a word error rate of 33.7%.

In experiment 2, a different set of bigram probabilities was used, computed from the context-free grammar, whose probabilities had previously been estimated from the same training corpus using standard EM techniques. This resulted in a word error rate of 32.9%. This may seem surprisingly good given the low coverage of the underlying CFGs, but notice that the conversion into bigrams is bound to result in a less constraining language model, effectively increasing coverage. For comparison purposes we also ran the same experiment with bigrams computed indirectly by Monte-Carlo sampling from the SCFG, using 200,000 samples.
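As a minimal sketch (not the thesis code), the same unsmoothed relative-frequency routine covers both experiment 1 (bigrams counted from the raw training sentences) and the Monte-Carlo baseline (the identical counting applied to sentences sampled from the SCFG); the function name and toy corpus below are illustrative only.

```python
from collections import defaultdict

def bigram_mle(sentences):
    """Estimate P(w2 | w1) by relative frequency, padding with <s>/</s> markers.
    No smoothing: unseen bigrams get zero probability, as in experiment 1."""
    counts = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            counts[w1][w2] += 1
    # Normalize each row of counts into a conditional distribution.
    return {w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
            for w1, nxt in counts.items()}

# Toy corpus (illustrative, not BeRP data):
corpus = [["i", "want", "food"], ["i", "want", "cheap", "food"]]
probs = bigram_mle(corpus)
# P(want | i) = 1.0, P(cheap | want) = 0.5
```

For the Monte-Carlo comparison, `sentences` would instead be 200,000 samples drawn from the estimated SCFG; rare words the sampler never emits then receive zero counts, which is exactly the shortcoming the exact computation avoids.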
The result was slightly worse (33.3%), confirming that the precise computation has an inherent advantage, as it cannot omit words or constructions to which the SCFG assigns very low probability.

Finally, in experiment 3, the bigrams generated from the SCFG were augmented by those from the raw training data, in a proportion of 200,000 : 2500. We have not attempted to optimize this mixture proportion, e.g., by deleted interpolation (Jelinek & Mercer 1980).[7] With the bigram estimates thus obtained, the word error rate dropped to 29.6%, which represents a statistically significant improvement over experiments 1 and 2.

Table 7.2 summarizes these figures and also adds two more points of comparison: a pure SCFG language model and a mixture model that interpolates between bigram and SCFG. Notice that the latter case is different from experiment 3, where the language model used is a standard bigram, albeit one that was obtained by 'mixing' counts obtained both from the data and from the SCFG. The system referred to here, on

[6] One inefficiency is that the actual number of nonterminals (and hence the rank of the coefficient matrix) is 445, as the grammar is converted to the Simple Normal Form introduced in Chapter 4.

[7] This proportion comes about because in the original system, predating the method described here, bigrams had to be estimated from the SCFG by random sampling. Generating 200,000 sentence samples was found to give good converging estimates for the bigrams. The bigrams from the raw training sentences were then simply added to the randomly generated ones. We later verified that the bigrams estimated from the SCFG were indeed identical to the ones computed directly using the method described here.
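The count pooling of experiment 3 can be sketched as follows: two bigram count tables (one derived from the SCFG, scaled as if from 200,000 sentences; one from the roughly 2,500 raw training sentences) are simply added before normalizing. This is a hedged toy illustration, with all names and numbers below invented for the example, not taken from the thesis.

```python
from collections import defaultdict

def mix_counts(scfg_counts, corpus_counts):
    """Pool two bigram count tables, then renormalize each row into P(w2 | w1).
    Pooling raw counts is distinct from interpolating two probability models."""
    mixed = defaultdict(lambda: defaultdict(float))
    for table in (scfg_counts, corpus_counts):
        for w1, nxt in table.items():
            for w2, c in nxt.items():
                mixed[w1][w2] += c
    return {w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
            for w1, nxt in mixed.items()}

# Toy tables: SCFG-derived counts dominate, mirroring the 200,000 : 2500 ratio
# in spirit (the actual counts are hypothetical).
scfg = {"want": {"food": 80.0, "cheap": 20.0}}
data = {"want": {"cheap": 10.0}}
probs = mix_counts(scfg, data)
# P(food | want) = 80/110, P(cheap | want) = 30/110
```

The mixture model of Table 7.2, by contrast, would interpolate the two models' probabilities at query time rather than merging their counts into a single bigram table.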
