6.6.3.2 Efficient prediction

As discussed in Section 6.4.9, the worst-case run-time on fully parameterized CNF grammars is dominated by the completion step. However, this is not necessarily true of sparse grammars. Our experiments showed that the computation is dominated by the generation of Earley states during the prediction steps. It is therefore worthwhile to minimize the total number of predicted states generated by the parser.

Since predicted states only affect the derivation if they lead to subsequent scanning, we can use the next input symbol to constrain the relevant predictions. To this end, we compute the extended left-corner relation $R_{LT}$, indicating which terminals can appear as left corners of which nonterminals. $R_{LT}$ is a Boolean matrix with rows indexed by nonterminals and columns indexed by terminals. It can be computed as the product
$$R_{LT} = R_L P_{LT},$$
where $P_{LT}$ has a non-zero entry at $(X, a)$ iff there is a production for nonterminal $X$ that starts with terminal $a$, and $R_L$ is the old left-corner relation.

During the prediction step we can ignore incoming states whose RHS nonterminal following the dot cannot have the current input as a left corner, and then eliminate from the remaining predictions all those whose LHS cannot produce the current input as a left corner. These filtering steps are very fast, as they involve only table lookups.
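To make the two filtering steps concrete, here is a minimal sketch in Python. The representation is an assumption for illustration, not the dissertation's implementation (which used Common Lisp/CLOS sparse matrices): grammars are dicts mapping nonterminals to lists of right-hand sides, $R_L$ is a dict mapping each nonterminal to the set of nonterminals reachable from it as a left corner (including itself), Earley states are (lhs, rhs, dot) triples, and the function names are hypothetical.

```python
def extended_left_corner(grammar, nonterminals, R_L):
    """Compute the extended left-corner relation R_LT = R_L P_LT.

    Returns a dict mapping each nonterminal X to the set of terminals
    that can appear as a left corner of X.
    """
    # P_LT has a non-zero entry at (Y, a) iff some production for Y
    # starts with terminal a.
    P_LT = {}
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            if rhs and rhs[0] not in nonterminals:
                P_LT.setdefault(lhs, set()).add(rhs[0])
    # The Boolean matrix product R_L P_LT, realized as a union of
    # P_LT rows over the nonterminals reachable from X.
    return {X: set().union(*(P_LT.get(Y, set()) for Y in R_L[X]))
            for X in R_L}

def predict_filtered(states, grammar, R_L, R_LT, next_input):
    """Prediction step with bottom-up filtering.

    states: Earley states (lhs, rhs, dot) whose dot precedes a
    nonterminal.  Both filters are single table (set) lookups.
    """
    predicted = set()
    for _lhs, rhs, dot in states:
        Z = rhs[dot]  # nonterminal following the dot
        # Filter 1: ignore states whose nonterminal after the dot
        # cannot have the next input symbol as a left corner.
        if next_input not in R_LT.get(Z, ()):
            continue
        for Y in R_L[Z]:
            # Filter 2: keep only predictions whose LHS can itself
            # produce the next input as a left corner.
            if next_input not in R_LT.get(Y, ()):
                continue
            for rhs_Y in grammar.get(Y, ()):
                predicted.add((Y, tuple(rhs_Y), 0))
    return predicted
```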
On a test corpus this technique cut the number of generated predictions to almost 1/4 and sped up parsing by a factor of 3.3. The corpus consisted of 1,143 sentences with an average length of 4.65 words. The top-down prediction alone generated 991,781 states and parsed at a rate of 590 milliseconds per sentence. With bottom-up filtered prediction only 262,287 states were generated, resulting in 180 milliseconds per sentence.

A trivial optimization often found in Earley parsers is to precompute the entire first prediction step, since it does not depend on the input and may eliminate a substantial portion of the total predictions per sentence.[18] We found that with bottom-up filtering this technique lost its edge: scanning the precomputed predicted states turned out to be slower than computing the zeroth state set filtered by the first input.
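Under the same illustrative representation as above, the precomputation amounts to the following sketch; the start symbol 'S' and the dummy start state are assumptions.

```python
def initial_predictions(grammar, R_L, start):
    """Predictions for state set 0.  They depend only on the grammar
    and the start symbol, never on the input, so they can be computed
    once per grammar and copied into each new chart."""
    return {(Y, tuple(rhs), 0)
            for Y in R_L[start]
            for rhs in grammar.get(Y, ())}

# Precomputed once per grammar:
#   state_set_0 = initial_predictions(grammar, R_L, 'S')
# With bottom-up filtering, however, it proved faster to rebuild the
# zeroth state set per sentence, filtered by the first input symbol:
#   predict_filtered([("S'", ('S',), 0)], grammar, R_L, R_LT, words[0])
```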
6.7 Discussion

6.7.1 Relation to finite-state models

Throughout the exposition of the Earley algorithm and its probabilistic extension we have been alluding, in concepts and terminology, to the algorithms used with probabilistic finite-state models, in particular Hidden Markov Models (Rabiner & Juang 1986). Many concepts carry over, if suitably generalized, most notably that of forward probabilities. Prefix probabilities can be computed from forward probabilities by the Earley parser just as in HMMs, because Earley states summarize past history in much the same way as the states in a finite-state model. There are important differences, however. The number of states in an HMM

Footnotes:
… Common Lisp/CLOS implementation of generic sparse matrices that was not particularly optimized for this task.
[18] The first prediction step accounted for roughly 30% of all predictions on our test corpus.