CHAPTER 4. STOCHASTIC CONTEXT-FREE GRAMMARS

approaches is that they reuse previously hypothesized grammar structures, possibly preventing the algorithm from considering better alternatives.

1. Avoid duplicate samples: incorporate duplicate samples only once, with appropriately adjusted counts. This is a trivial optimization that can never do harm.

2. Try parsing samples first before resorting to the ordinary creation of new productions. If a new sample is parsed successfully, counts on the old productions are updated to reflect the new sample.[8] This method subsumes strategy 1 above. (See Section 6.5.3 for ways to efficiently handle the parsing of bracketed samples, which is needed if this method is to be applied to structured samples.) A code sketch illustrating this strategy appears at the end of this section.

3. To save initial merging of preterminals, reuse existing preterminals where possible. This precludes the creation of grammars with ambiguity at the level of lexical productions.

4. Try to parse the new sample into string fragments using existing rules, and add only a top-level production to link these fragments to the start symbol. This subsumes both strategy 2 and strategy 3. (Section 6.5.4 describes one approach to parsing ungrammatical samples into fragments that can be used here.)

Unless noted otherwise, only strategy 2 was used in obtaining the results reported here.

4.4 Related Work

4.4.1 Bayesian grammar learning by enumeration

We already mentioned Horning (1969) as an early proponent of the Bayesian version of grammar inference by enumeration, as the principle is general enough to be applied (in theory) to any type of probabilistic grammar. Horning's focus was actually on probabilistic CFGs, and the formal device used to enumerate grammars, as well as to assign prior probabilities, was a grammar-generating grammar, or grammar grammar. As expected, enumeration is not practical beyond the simplest target grammars, but Horning's work is theoretically important and was one of the first to point out the use of posterior probabilities as a formalization of the simplicity vs. data fit trade-off.

4.4.2 Merging and chunking based approaches

The idea of combining merging and chunking with a hill-climbing style search procedure to induce CFG structures seems to have been developed independently by several researchers. Below is a list of those we are aware of.

[8] If the sample is ambiguous, the counts could be updated for all derivations according to their respective probabilities, or only for the Viterbi derivation. In either case the likelihood of the sample will be underestimated by the Viterbi-based computation of the posterior probability. Updating according to the Viterbi derivation should favor the creation of unambiguous grammar structures, but no detailed comparisons have been done on this issue.
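To make strategy 2 concrete, here is a minimal Python sketch, not the implementation actually used in this work: the Grammar class, its viterbi_parse and add_new_productions methods, and the incorporate function are all illustrative names, and the toy parser only recognizes the flat productions this sketch itself creates (a real system would use a probabilistic Earley or CKY parser).

```python
from collections import defaultdict

class Grammar:
    """Toy SCFG holding usage counts for each production lhs -> rhs."""

    def __init__(self):
        # counts[lhs][rhs] = how often the production lhs -> rhs was used
        self.counts = defaultdict(lambda: defaultdict(float))

    def viterbi_parse(self, sample):
        """Return a derivation of `sample` as (lhs, rhs) pairs, or None.

        Toy version: it only recognizes the flat S -> preterminal...
        productions created below, and returns the first match rather
        than the most probable derivation. A real implementation would
        be a probabilistic Earley or CKY parser.
        """
        for rhs in list(self.counts["S"]):
            if len(rhs) == len(sample) and all(
                (word,) in self.counts[pre] for pre, word in zip(rhs, sample)
            ):
                return [("S", rhs)] + [
                    (pre, (word,)) for pre, word in zip(rhs, sample)
                ]
        return None

    def add_new_productions(self, sample):
        """Fallback: cover `sample` with fresh productions -- one flat
        top-level rule plus one preterminal per word (preterminals are
        reused across samples, in the spirit of strategy 3)."""
        preterminals = []
        for word in sample:
            pre = f"PRE_{word}"
            self.counts[pre][(word,)] += 1.0
            preterminals.append(pre)
        self.counts["S"][tuple(preterminals)] += 1.0

def incorporate(grammar, sample):
    """Strategy 2: parse first; only create new productions on failure."""
    derivation = grammar.viterbi_parse(sample)
    if derivation is not None:
        # Sample already covered: bump counts along the Viterbi
        # derivation. (Per footnote 8, one could instead distribute
        # fractional counts over all derivations of an ambiguous sample.)
        for lhs, rhs in derivation:
            grammar.counts[lhs][rhs] += 1.0
    else:
        grammar.add_new_productions(sample)

g = Grammar()
incorporate(g, ["the", "dog", "barks"])  # no parse yet: new productions created
incorporate(g, ["the", "dog", "barks"])  # now parses: only counts are updated
assert g.counts["S"][("PRE_the", "PRE_dog", "PRE_barks")] == 2.0
```

Note that a duplicate sample always parses against the productions created for its first occurrence, which is how this strategy subsumes strategy 1.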
