The dissertation of Andreas Stolcke is approved: University of ...

Chapter 4

Stochastic Context-free Grammars

4.1 Introduction and Overview

In this chapter we will look at model merging as applied to the probabilistic version of context-free grammars. The stochastic context-free grammar (SCFG) formalism is a generalization of the HMM, just as non-probabilistic CFGs can be thought of as an extension of finite-state grammars.

Unlike their non-probabilistic counterpart, SCFGs are not a 'mainstream' approach to language modeling yet.[1] In most of today's probabilistic language models, finite-state or even simple n-gram approaches dominate. One reason for this is that although most standard algorithms for probabilistic finite-state models (i.e., HMMs) have generalized versions for SCFGs, they become computationally more demanding, and often intractable in practice (see Section 4.2.2).

A more important problem is that SCFGs may actually be worse at modeling one aspect of language in which simple finite-state models do a surprisingly good job: capturing the short-distance, lexical (as opposed to phrase-structural) contingencies between words. This is a direct consequence of the conditional independence assumptions embodied in SCFGs, and has prompted the investigation of 'mildly context-sensitive' grammars and their probabilistic versions (Resnik 1992; Schabes 1992).
These, however, come at an even greater computational price.

Recent work has shown that probabilistic CFGs can be useful if applied carefully and in the right domain. Lari & Young (1991) discuss various applications of estimated SCFGs for phonetic modeling. Jurafsky et al. (1994b) show that a SCFG built from hand-crafted rules with probabilities estimated from a corpus can improve speech recognition performance over standard n-gram language models, either by directly coupling the SCFG to the speech decoder, or by using the SCFG effectively as a smoothing device to improve the estimates of n-gram probabilities from sparse data. The algorithms that form the basis of these last two approaches are described in the second part of this thesis, in Chapter 6 and Chapter 7, respectively.

[1] While bare CFGs aren't widely used in computational linguistics either, they form the basis or 'backbone' of most of today's feature- and unification-based grammar formalisms, such as LFG (Kaplan & Bresnan 1982), GPSG (Gazdar et al. 1985), and construction grammar (Fillmore 1988).
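To make the conditional independence assumption concrete, the following is a minimal sketch using a hypothetical toy grammar (not one drawn from this thesis): a SCFG assigns a parse tree the product of its rule probabilities, so each nonterminal's expansion is chosen independently of all words outside its own subtree.

```python
# Toy SCFG: for each nonterminal, the probabilities of its rules sum to 1.
# (Illustrative grammar and helper function; assumptions, not the thesis's code.)
rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("the", "N")): 0.6,
    ("NP", ("a", "N")):   0.4,
    ("N",  ("dog",)):     0.5,
    ("N",  ("cat",)):     0.5,
    ("VP", ("barks",)):   0.7,
    ("VP", ("meows",)):   0.3,
}

def parse_prob(tree):
    """Probability of a parse tree = product of the probabilities of the
    rules used in it.  Note that each rule's probability depends only on the
    nonterminal being expanded, never on neighboring words -- this is the
    independence assumption that loses short-distance lexical contingencies."""
    label, children = tree  # tree = (nonterminal, tuple of words/subtrees)
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(label, rhs)]
    for c in children:
        if not isinstance(c, str):  # recurse into subtrees, skip leaf words
            p *= parse_prob(c)
    return p

# "the dog barks":  P = 1.0 * 0.6 * 0.5 * 0.7 = 0.21
tree = ("S", (("NP", ("the", ("N", ("dog",)))), ("VP", ("barks",))))
print(parse_prob(tree))
```

Because P(N -> dog) is the same 0.5 whether the determiner was "the" or "a", the model cannot prefer one word given a nearby word, which is precisely the kind of dependency an n-gram model captures directly.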
