The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS 158

6.6 Implementation Issues

This section briefly discusses some of the experience gained from implementing the probabilistic Earley parser. Implementation is mainly straightforward, and many of the standard techniques for context-free grammars can be used (Graham et al. 1980). However, some aspects are unique due to the addition of probabilities.

6.6.1 Prediction

Due to the collapsing of transitive predictions, this step can be implemented in a very efficient and straightforward manner. As explained in Section 6.4.5, one has to perform a single pass over the current state set, identifying all nonterminals $Z$ occurring to the right of dots, and add states corresponding to all productions $Y \to \nu$ that are reachable through the left-corner relation $Z \Rightarrow_L Y$. As indicated in equation (6.3), contributions to the forward probabilities of new states have to be summed when several paths lead to the same state. However, the summation in equation (6.3) can be mostly eliminated if the $\alpha$ values for all old states with the same nonterminal $Z$ are summed first, and then multiplied by $R(Z \Rightarrow_L Y)$. These quantities are then summed over all nonterminals $Z$, and the result is multiplied once by the rule probability $P(Y \to \nu)$ to give the forward probability for the predicted state.

6.6.2 Completion

Unlike prediction, the completion step still involves iteration. Each complete state derived by completion can potentially feed other completions. An important detail here is to ensure that all contributions to a state's $\alpha$ and $\gamma$ are summed before that state is used as input to further completion steps.

One approach to this problem is to insert complete states into a prioritized queue. The queue orders states by their start indices, highest first. This is because states corresponding to later expansions always have to be completed before they can lead to the completion of earlier expansions.
For each start index, the entries are managed as a first-in-first-out queue, ensuring that the directed dependency graph formed by the states is traversed in breadth-first order.

A completion pass can now be implemented as follows. Initially, all complete states from the previous scanning step are inserted in the queue. States are then removed from the front of the queue, and used to complete other states. Among the new states thus produced, complete ones are again added to the queue. The process iterates until no more states remain in the queue. Because the computation of probabilities already includes chains of unit productions, states derived from such productions need not be queued, which also ensures that the iteration terminates.

A similar queuing scheme, with the start index order reversed, can be used for the reverse completion step needed in the computation of outer probabilities (Section 6.5.2).
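
The queue discipline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the names `CompletionQueue`, `completion_pass`, and `complete_with` are hypothetical, the standard-library `heapq` module stands in for the prioritized queue, and the accumulation of $\alpha$ and $\gamma$ is left out so that only the ordering is shown.

```python
import heapq
from itertools import count


class CompletionQueue:
    """Orders states by start index, highest first; FIFO within one index."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # insertion counter, gives FIFO order among ties

    def push(self, start, state):
        # Negating the start index turns heapq's min-heap into "highest first".
        heapq.heappush(self._heap, (-start, next(self._seq), state))

    def pop(self):
        _, _, state = heapq.heappop(self._heap)
        return state

    def __bool__(self):
        return bool(self._heap)


def completion_pass(initial, complete_with):
    """Run one completion pass and return the states in processing order.

    `initial` is an iterable of (start_index, state) pairs, standing for the
    complete states left by the previous scanning step.  `complete_with` is a
    hypothetical callback: given a complete state, it performs the completion
    and returns the (start_index, state) pairs of newly produced complete
    states that still need to be queued.
    """
    queue = CompletionQueue()
    for start, state in initial:
        queue.push(start, state)

    processed = []
    while queue:
        state = queue.pop()
        processed.append(state)
        for start, new_state in complete_with(state):
            queue.push(start, new_state)
    return processed
```

The insertion counter is what makes each start index behave as a first-in-first-out queue; it also keeps the heap from ever comparing two state objects directly.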

6.6.3 Efficient parsing with large sparse grammars

During work with a moderate-sized, application-specific natural language grammar taken from the BeRP system (Jurafsky et al. 1994b) we had the opportunity to optimize our implementation of the algorithm. Below we relate some of the lessons learned in the process.

6.6.3.1 Speeding up matrix inversions

Both prediction and completion steps make use of a matrix $R$ defined as a geometric series derived from a matrix $P$,

$$R = I + P + P^2 + \cdots = (I - P)^{-1}$$

Both $P$ and $R$ are indexed by the nonterminals in the grammar. The matrix $P$ is derived from the SCFG rules and probabilities (either the left-corner relation or the unit-production relation).

For an application using a fixed grammar the time taken by the precomputation of left-corner and unit-production matrices may not be crucial, since it occurs off-line. There are cases, however, when that cost should be kept minimal, e.g., when rule probabilities are iteratively reestimated.

Even if the matrix $P$ is sparse, the matrix inversion can be prohibitive for large numbers of nonterminals $n$. Empirically, matrices of rank $n$ with a bounded number $p$ of non-zero entries in each row (i.e., $p$ is independent of $n$) can be inverted in time $O(n^2)$, whereas a full matrix of size $n \times n$ would require time $O(n^3)$.

In many cases the grammar has a relatively small number of nonterminals that have productions involving other nonterminals in a left-corner (or the RHS of a unit-production). Only those nonterminals can have non-zero contributions to the higher powers of the matrix $P$. This fact can be used to substantially reduce the cost of the matrix inversion needed to compute $R$.

Let $P'$ be a subset of the entries of $P$, namely, only those elements indexed by nonterminals that have a non-empty row in $P$.
For example, for the left-corner computation, $P'$ is obtained from $P$ by deleting all rows and columns indexed by nonterminals that do not have productions starting with nonterminals. Let $I'$ be the identity matrix over the same set of nonterminals as $P'$. Then $R$ can be computed as

$$R = I + (I' + P' + P'^2 + \cdots) \ast P = I + (I' - P')^{-1} \ast P = I + R' \ast P$$

Here $R'$ is the inverse of $I' - P'$, and $\ast$ denotes a matrix multiplication in which the left operand is first augmented with zero elements to match the dimensions of the right operand, $P$.

The speedups obtained with this technique can be substantial. For a grammar with 789 nonterminals, of which only 132 have nonterminal productions, the left-corner matrix was computed in 12 seconds (including the final multiply with $P$ and addition of $I$). Inversion of the full matrix $I - P$ took 4 minutes 28 seconds.[17]

[17] These figures are not very meaningful for their absolute values. All measurements were obtained on a Sun SPARCstation 2 with a
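
The reduced inversion in Section 6.6.3.1 can be illustrated with a small pure-Python sketch. It is a sketch under stated assumptions rather than the implementation that was timed above: the function names `invert`, `closure_full`, and `closure_reduced` are hypothetical, matrices are plain lists of lists, and a dense Gauss-Jordan routine stands in for whatever inversion method was actually used. The point is only that inverting the small matrix $I' - P'$ and then multiplying by $P$ gives the same $R$ as inverting the full $I - P$.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def invert(m):
    """Dense Gauss-Jordan inversion with partial pivoting (illustration only)."""
    n = len(m)
    # Augment each row of m with the matching row of the identity matrix.
    a = [list(row) + unit for row, unit in zip(m, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        scale = a[col][col]
        a[col] = [x / scale for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def closure_full(P):
    """R = (I - P)^{-1}, inverting the full n x n matrix."""
    n = len(P)
    I = identity(n)
    return invert([[I[i][j] - P[i][j] for j in range(n)] for i in range(n)])


def closure_reduced(P):
    """R = I + R' * P, where R' = (I' - P')^{-1} over non-empty rows only."""
    n = len(P)
    keep = [i for i in range(n) if any(P[i])]  # indices with a non-empty row
    k = len(keep)
    I_sub = identity(k)
    R_sub = invert([[I_sub[a][b] - P[keep[a]][keep[b]] for b in range(k)]
                    for a in range(k)])
    R = identity(n)
    # Multiply the zero-padded R' by P and add the result to I.
    for a, i in enumerate(keep):
        for b, m in enumerate(keep):
            weight = R_sub[a][b]
            if weight != 0.0:
                for j in range(n):
                    R[i][j] += weight * P[m][j]
    return R


# Toy left-corner matrix over the (assumed) nonterminals S, NP, Det, N:
# S -> NP VP (1.0), NP -> NP PP (0.3), NP -> Det N (0.7); Det and N only
# have lexical productions, so their rows are empty.
P_EXAMPLE = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.3, 0.7, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
```

In the toy grammar only two of the four nonterminals have non-empty rows, so `closure_reduced(P_EXAMPLE)` inverts a 2 x 2 matrix where `closure_full(P_EXAMPLE)` inverts a 4 x 4 one. The 789 versus 132 case reported above is the same situation at scale.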