The dissertation of Andreas Stolcke is approved: University of ...

More documents

Recommendations

Info

1CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS 162frameworks. 20Like an Earley parser, LR parsing uses dotted productions, or items, to keep track of the progressof derivations as the input is processed. (The start indices are not part of LR items: we may therefore use theterm item to refer to both LR items and Earley states without start indices.) An Earley parser constructs setsof possible items on the fly, by following all possible partial derivations. An LR parser, on the other hand,has access to a complete list of sets of possible items computed beforehand, and at runtime simply followstransitions between these sets. The item sets are known as the ‘states’ of the LR parser. 21 A grammar issuitable for LR parsing if these transitions can be performed deterministically by considering only the nextinput and the contents of a shift-reduce stack. Generalized LR parsing is an extension that allows paralleltracking of multiple state transitions and stack actions by using a graph-structured stack (Tomita 1986).Probabilistic LR parsing (Wright 1990) is based on LR items augmented with certain conditionalprobabilities. Specifically, the t probability associated with an LR ) ¸item ˜ is, in our terminology, a .M£normalized forward probability:@ ,) ¸ Æ.—£ ˜M0where the denominator is the probability of the current prefix. 22 LR item probabilities, are thus conditioned@A?10 0å…å…å +-,91 @A?0å…å…å 10 .t @ Cforward probabilities, and can be used to compute conditional probabilities of next words: isthe sum of the ’s of all items having to the right of the dot (extra work is required if the item correspondsto a ‘reduce’ state, i.e., if the dot is in final position).Notice that the definition of t is independent of H as well as the start index of the correspondingEarley state. Therefore, to ensure that item probabilities are correct independent of input position, item setswould have to be constructed so that their probabilities are unique within each set. However, this may beimpossible given that the probabilities can take on infinitely many values and in general depend on the historyof the parse. The solution used by Wright (1990) is to collapse items whose probabilities are within a smalltoleranceAand are otherwise identical. The same threshold is used to simplify a number of other technicalproblems, e.g., left-corner probabilities are computed by iterated prediction, until the resulting changes inprobabilities are smaller thanA.Subject to these approximations, then, a probabilistic LR parser can computeprefix probabilities by multiplying successive conditional probabilities for the words it sees. 23As an alternative to the computation of LR transition probabilities from a given SCFG, one mightinstead estimate such probabilities directly from traces of parses on a training corpus. Due to the impreciserelationship between LR probabilities and SCFG probabilities it is not entirely clear if the model thusestimated corresponds to any particular SCFG in the usual sense. However, Briscoe & Carroll (1993) turnt 620 Like Earley parsers, LR parsers can be built using various amounts of lookahead to make the operation of the parser (more)deterministic, and hence more efficient. Only the case of zero-lookahead, LR(0), is considered here; the correspondence between LR()parsers and-lookahead Earley parsers is discussed in the literature (Earley 1970; Aho & Ullman 1972).21 Once more, it is helpful to compare this to a closely related finite-state concept: the states of the LR parser correspond to sets ofEarley states, similar to the way the states of a deterministic FSA correspond to sets of states of an equivalent non-deterministic FSAunder the standard subset construction.22 The identity of this expression with the item probabilities of Wright (1990) can be proved by induction on the steps performed tocompute the ‚ ’s, as shown in Appendix 6.9. Unfortunately, Wright presents this computation without giving a precise definition of whatthese numbers are probabilities of.23 It is not clear what the numerical properties of this approximation are, e.g., how the errors will accumulate over longer parses.
CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS 163this incongruity into an advantage by using the LR parser as a probabilistic model its own right, and showhow LR probabilities can be extended to capture non-context-free contingencies.The problem of capturing more complex distributional constraints in natural language is clearlyimportant, but well beyond the scope of this chapter. We simply remark that it should be possible to define‘interesting’ non-standard probabilities in terms of Earley parser actions so as to better model non-context-freephenomena. Apart from such considerations, the choice between LR methods and Earley parsing is a typicalspace-time tradeoff. Even though an Earley parser runs with the same linear time and space complexity as anLR parser on grammars of the appropriate LR class, the constant factors involved will be much in favor of theLR parser as almost all the work has already been compiled into its transition and action table. However, thesize of LR parser tables can be exponential in the size of the grammar (due to the number of potential itemsubsets). Furthermore, if the generalized LR method is used for dealing with non-deterministic grammars(Tomita 1986) the runtime on arbitrary inputs may also grow exponentially.The bottom line is that each application’s needs have to be evaluated against the pros and consof both approaches to find the best solution. From a theoretical point of view, the Earley approach hasthe inherent appeal of being the more general (and exact) solution to the computation of the various SCFGprobabilities.6.7.4 Other related workThe literature on Earley-based probabilistic parsers is sparse, presumably because of the precedentset by the Inside/Outside algorithm, which is more naturally formulated as a bottom-up algorithm.Schabes (1991) shows that Earley’s algorithm can be augmented with the computation of inner andouter probabilities, in much the same way as presented here. However, the algorithm presented is not fullygeneral as it is restricted to sentences with bounded ambiguity, i.e., there are no provisions for handling unitproduction cycles. Schabes focuses on grammar estimation (using generalized inside and outside probabilities)and only mentions the potential for computing prefix probabilities.Magerman & Marcus (1991) are interested primarily in scoring functions to guide a parser efficientlyto the most promising parses. They use Earley-style top-down prediction only to suggest worthwhile parses,not to compute precise probabilities. 24Both Nakagawa (1987) and Päseler (1988) use a non-probabilistic Earley parser augmented with‘word match’ scoring. Though not truly probabilistic, these algorithms are similar to the Viterbi versiondescribed here, in that they find a parse that optimizes the accumulated matching scores (without regard torule probabilities). Prediction and completion loops do not come into play since no precise inner or forwardprobabilities are computed.Dan Jurafsky (personal communication) had written an Earley parser for the Berkeley RestaurantProject (BeRP) speech understanding system that computed forward probabilities for restricted grammars(without left-corner or unit production recursion). The parser now uses the methods described here to provide24 They argue that precise probabilities are inappropriate in natural language parsing.
Page 1 and 2:
The dissertation of Andreas Stolcke
Page 3 and 4:
Bayesian Learning of Probabilistic
Page 5 and 6:
iAcknowledgmentsLife and work in Be
Page 7 and 8:
iiiContentsList of FiguresList of T
Page 9 and 10:
CONTENTSv4.5.4 Summary and Discussi
Page 14 and 15:
CHAPTER 1. INTRODUCTION 2Instance-b
Page 16 and 17:
CHAPTER 1. INTRODUCTION 4A.0.830.33
Page 18 and 19:
CHAPTER 1. INTRODUCTION 6the ¨ 0 l
Page 20 and 21:
..1 £££1; 450,1 £££1; 450CHAP
Page 22 and 23:
VU=@U@@=U===UCHAPTER 2. FOUNDATIONS
Page 24 and 25:
,,vv,v,v,,directly. However, note t
Page 26 and 27:
4@@@@-@b@6@˜--@@@0@@@@@CHAPTER 2.
Page 28 and 29:
6tt,u ·¥¸¹u ,10ºtu ,2 10Yt ¸
Page 30 and 31:
CHAPTER 2. FOUNDATIONS 18As more da
Page 32 and 33:
CHAPTER 2. FOUNDATIONS 20Global mod
Page 34 and 35:
CHAPTER 2. FOUNDATIONS 22¡An expli
Page 36 and 37:
ÊS==66@N,ÆÆ=NÆ00ÆÊ=S=N0Æ=#@0
Page 38 and 39:
666CHAPTER 2. FOUNDATIONS 262.5.7 P
Page 40 and 41:
It, u¦¸¹u Ù 0w6¬tt,_, u Ù 0
Page 42 and 43:
, uu!¸¹u Ù 0c6,u ,ÔÔ0 ö1 ö1
Page 44 and 45:
CHAPTER 3. HIDDEN MARKOV MODELS 32T
Page 46 and 47:
CHAPTER 3. HIDDEN MARKOV MODELS 34R
Page 48 and 49:
4ÿ= ê•4TÃE0&Ò¢¡? •ç1 Lht
Page 50 and 51:
6666ò U1ò +9,9. 4 20+-,¡ . 4 10C
Page 52 and 53:
2. For each candidate"I!computeLet"
Page 54 and 55:
6\“ç%&ät\“ç tè ä, u¦¸¹u
Page 56 and 57:
, u1 ¸¼u Ù 0 and , u3 ¸¹u Ù 0
Page 58 and 59:
CHAPTER 3. HIDDEN MARKOV MODELS 46l
Page 60 and 61:
CHAPTER 3. HIDDEN MARKOV MODELS 48c
Page 62 and 63:
CHAPTER 3. HIDDEN MARKOV MODELS 50d
Page 64 and 65:
CHAPTER 3. HIDDEN MARKOV MODELS 520
Page 66 and 67:
correlation between initial and fin
Page 68 and 69:
,CHAPTER 3. HIDDEN MARKOV MODELS 56
Page 70 and 71:
Page 72 and 73:
,CHAPTER 3. HIDDEN MARKOV MODELS 60
Page 74 and 75:
CHAPTER 3. HIDDEN MARKOV MODELS 62b
Page 76 and 77:
CHAPTER 3. HIDDEN MARKOV MODELS 64t
Page 78 and 79:
CHAPTER 3. HIDDEN MARKOV MODELS 66t
Page 80 and 81:
CHAPTER 3. HIDDEN MARKOV MODELS 68s
Page 82 and 83:
Page 84 and 85:
Page 86 and 87:
CHAPTER 3. HIDDEN MARKOV MODELS 74b
Page 88 and 89:
domain. 3 In short, we will leave o
Page 90 and 91:
,,,,,£CHAPTER 4. STOCHASTIC CONTEX
Page 92 and 93:
9 ¸)Ô ¸ 9 ¸Ô 1 2 £££;,ÔÔC
Page 94 and 95:
¸= ¸= ¸.1.2¸¸) 1_) 20&6#=,,,,
Page 96 and 97:
CHAPTER 4. STOCHASTIC CONTEXT-FREE
Page 98 and 99:
Page 100 and 101:
Page 102 and 103:
Page 104 and 105:
==Ì==ÌCHAPTER 4. STOCHASTIC CONTE
Page 106 and 107:
,= ===I¸theybÜ„thiscg„\\ ¸¸
Page 108 and 109:
Page 110 and 111:
Page 112 and 113:
Page 114 and 115:
Page 116 and 117:
104Chapter 5Probabilistic Attribute
Page 118 and 119:
,1makingandCHAPTER 5. PROBABILISTIC
Page 120 and 121:
CHAPTER 5. PROBABILISTIC ATTRIBUTE
Page 122 and 123:
CHAPTER 5. PROBABILISTIC ATTRIBUTE
Page 124 and 125: CHAPTER 5. PROBABILISTIC ATTRIBUTE
Page 134 and 135: 122Chapter 6Efficient parsing with
Page 136 and 137: 1z1CHAPTER 6. EFFICIENT PARSING WIT
Page 138 and 139: and each state in set ( -ÏH ):6)
Page 140 and 141: NPDetVTVI P CHAPTER 6. EFFICIENT PA
Page 142 and 143: CHAPTER 6. EFFICIENT PARSING WITH S
Page 144 and 145: ) In particular, the string probabi
Page 146 and 147: H : d=:6) ¸ÆÆÆ:6) ¸ .V=i£Ù )
Page 148 and 149: ©) 6#=©,©,©,,ÆNNö,²NL++and>+
Page 150 and 151: ) The probabilistic unit-production
Page 152 and 153: ¸0 ¸ 29¸¸ 99W9 [;t£ ? 1 ?1u 6
Page 154 and 155: The forward and inner probabilities
Page 156 and 157: 9 itself²²NN++9 ¸0ÌLL++??1£,=C
Page 158 and 159: ,,by nonterminals. Multiplying this
Page 160 and 161: 6CHAPTER 6. EFFICIENT PARSING WITH
Page 162 and 163: description. Again, we ignore this
Page 164 and 165: for all pairs of states d =¸+= Š
Page 166 and 167: ,9 ¸0 : 0¸ £j9¸ A )z9£CHAPTER
Page 170 and 171: are then summed over all nontermina
Page 172 and 173: +CHAPTER 6. EFFICIENT PARSING WITH
Page 178 and 179: = ¸Let t,t) ¸ .V=i£,t,t,6666yyyy
Page 180 and 181: 168Chapter 7-grams from Stochastic
Page 182 and 183: CHAPTER 7. -GRAMS FROM STOCHASTIC
Page 184 and 185: )ÅÆÅÅÅÆÅÅÅÆÅÅÅÆÅÅÅ
Page 186 and 187: -grams CCHAPTER 7. -GRAMS FROM ST
Page 188 and 189: ,?Ó,tÌ?L A0,I 1N I N A A 2 A 3 N
Page 190 and 191: ,,CHAPTER 7. -GRAMS FROM STOCHASTI
Page 192 and 193: Consider the following problem: sta
Page 194 and 195: CHAPTER 8. FUTURE DIRECTIONS 1828.2
Page 196 and 197: 184BibliographyAHO, ALFRED V., RAVI
Page 198 and 199: BIBLIOGRAPHY 186DAGAN, IDO, FERNAND
Page 200 and 201: BIBLIOGRAPHY 188——, & ——. 1
Page 202 and 203: BIBLIOGRAPHY 190QUINLAN, J. ROSS, &
Page 204: BIBLIOGRAPHY 192WALLACE, C. S., & P
show all

The dissertation of Andreas Stolcke is approved: University of ...

Create successful ePaper yourself

Delete template?

Save as template?