CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS

4. How should the parameters (e.g., rule probabilities) be chosen to maximize the probability over a training set of strings?

The incremental model merging algorithm for SCFGs (Chapter 4) requires either (1) or (2) for efficient operation. Traditional grammar parameter estimation is essentially (4), and is typically also used as a post-processing step to model merging (after the grammar structure has been learned). The algorithm described in this chapter can compute solutions to all four of these problems in a single framework, with a number of additional advantages over previously presented isolated solutions. It was originally developed solely as a general and efficient tool and accessory to the model merging algorithm. We then realized that it also solves task (3) in an efficient and elegant fashion, greatly expanding its usefulness, as described below.

Most probabilistic parsers are based on a generalization of bottom-up chart parsing, such as the CYK algorithm. Partial parses are assembled just as in non-probabilistic parsing (modulo possible pruning based on probabilities), while substring probabilities (also known as 'inside' probabilities) can be computed in a straightforward way. Thus, the CYK chart parser underlies the 'standard' solutions to problems (1) and (4) (Baker 1979), as well as (2) (Jelinek 1985). While the Jelinek & Lafferty (1991) solution to problem (3) is not a direct extension of CYK parsing, they nevertheless present their algorithm in terms of its similarities to the computation of inside probabilities.

In our algorithm, computations for tasks (1) and (3) proceed incrementally, as the parser scans its input from left to right; in particular, prefix probabilities are available as soon as the prefix has been seen, and are updated incrementally as it is extended. Tasks (2) and (4) require one more (reverse) pass over the parse table constructed from the input.

Incremental, left-to-right computation of prefix probabilities is particularly important since that is a necessary condition for using SCFGs as a replacement for finite-state language models in many applications, such as speech decoding. As pointed out by Jelinek & Lafferty (1991), knowing the probabilities $P(x_0 \ldots x_i)$ for arbitrary prefixes $x_0 \ldots x_i$ enables probabilistic prediction of possible follow-words $x_{i+1}$, as $P(x_{i+1} \mid x_0 \ldots x_i) = P(x_0 \ldots x_i x_{i+1}) / P(x_0 \ldots x_i)$. These conditional probabilities can then be used as word transition probabilities in a Viterbi-style decoder or to incrementally compute the cost function for a stack decoder (Bahl et al. 1983).

Another application where prefix probabilities play a central role is the extraction of $n$-gram probabilities from SCFGs, a problem that is the subject of Chapter 7. Here, too, efficient incremental computation saves time since the work for common prefix strings can be shared.
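As an illustration (not part of the original text), the relation above turns a prefix probability routine directly into a follow-word predictor. In the sketch below, prefix_probability is a hypothetical stand-in for the Earley-based computation developed in this chapter; any function returning $P(x_0 \ldots x_i)$ for a given prefix would serve.

```python
# Illustrative sketch only: prefix_probability() is a hypothetical stand-in
# for the Earley-based prefix probability computation of this chapter.

def follow_word_distribution(prefix_probability, prefix, vocabulary):
    """Return P(w | prefix) for each candidate next word w, via
    P(x_{i+1} | x_0...x_i) = P(x_0...x_i x_{i+1}) / P(x_0...x_i)."""
    p_prefix = prefix_probability(prefix)
    if p_prefix == 0.0:
        return {}  # the prefix cannot be extended to any string in the language
    return {w: prefix_probability(prefix + [w]) / p_prefix for w in vocabulary}

# The resulting conditional probabilities can serve as word transition
# probabilities in a Viterbi-style decoder, or as increments to the cost
# function of a stack decoder.
```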
The key to most of the features of our algorithm is that it is based on the top-down parsing method for non-probabilistic CFGs developed by Earley (1970). Earley's algorithm is appealing because it runs with best-known efficiency on a number of special classes of grammars. In particular, Earley parsing is more efficient than the bottom-up methods in cases where top-down prediction can rule out potential parses of substrings. The worst-case computational expense of the algorithm (either for the complete input, or incrementally for each new word) is as good as that of the other known specialized algorithms, but can be substantially better on well-known grammar classes.
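For readers unfamiliar with Earley's method, the following is a minimal sketch of a non-probabilistic Earley recognizer showing the three operations (prediction, scanning, completion) that the probabilistic algorithm of this chapter builds on. The grammar encoding and the omission of empty productions are simplifications made for this illustration; it is not the implementation described in the dissertation.

```python
# Minimal non-probabilistic Earley recognizer (illustration only; epsilon
# productions are not handled). A grammar is a dict mapping each nonterminal
# to a list of right-hand sides (lists of symbols); symbols not in the dict
# are treated as terminals.

def earley_recognize(grammar, words, start="S"):
    # A state is (lhs, rhs, dot, origin): the rule lhs -> rhs with the dot
    # before rhs[dot], predicted starting at input position 'origin'.
    chart = [set() for _ in range(len(words) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, tuple(rhs), 0, 0))

    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expand a nonterminal right of the dot (top-down).
                    for prod in grammar[sym]:
                        state = (sym, tuple(prod), 0, i)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
                elif i < len(words) and sym == words[i]:
                    # Scanner: the dot precedes a terminal matching the next word.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: a finished constituent advances the states that predicted it.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        state = (plhs, prhs, pdot + 1, porigin)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(words)])

# Example: a toy grammar accepting "det n v det n".
toy = {"S": [["NP", "VP"]], "NP": [["det", "n"]], "VP": [["v", "NP"]]}
print(earley_recognize(toy, ["det", "n", "v", "det", "n"]))  # True
```

The probabilistic version developed in this chapter keeps this control structure and attaches probabilities to the states, accumulating them during prediction and completion.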
