DAGAN, IDO, FERNANDO PEREIRA, & LILLIAN LEE. 1994. Similarity-based estimation of word cooccurrence probabilities. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, New Mexico State University, Las Cruces, NM.

DE SAUSSURE, FERDINAND. 1916. Cours de linguistique générale. Paris: Payot.

DEMPSTER, A. P., N. M. LAIRD, & D. B. RUBIN. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39.1–38.

EARLEY, JAY. 1970. An efficient context-free parsing algorithm. Communications of the ACM 13.94–102.

EVANS, T. G. 1971. Grammatical inference techniques in pattern analysis. In Software engineering, ed. by J. Tou, 183–202. New York: Academic Press.

FASS, LEONA F. 1983. Learning context-free languages from their structured sentences. ACM SIGACT News 15.24–35.

FELDMAN, J., G. LAKOFF, D. BAILEY, S. NARAYANAN, T. REGIER, & A. STOLCKE. 1994. L0—the first four years. AI Review 8. Special issue on Integration of Natural Language and Vision Processing, to appear.

FELDMAN, JEROME A., GEORGE LAKOFF, ANDREAS STOLCKE, & SUSAN HOLLBACH WEBER. 1990. Miniature language acquisition: A touchstone for cognitive science. In Proceedings of the 12th Annual Conference of the Cognitive Science Society, 686–693, MIT, Cambridge, Mass.

FILLMORE, CHARLES J. 1988. The mechanisms of “Construction Grammar”. In Proceedings of the Fourteenth Annual Meeting of the Berkeley Linguistics Society, ed. by Shelley Axmaker, Annie Jaisser, & Helen Singmaster, 35–55, Berkeley, Ca.

FUJISAKI, T., F. JELINEK, J. COCKE, E. BLACK, & T. NISHINO. 1991. A probabilistic parsing method for sentence disambiguation. In Current Issues in Parsing Technology, ed. by Masaru Tomita, chapter 10, 139–152. Boston: Kluwer Academic Publishers.

GAROFOLO, J. S., 1988. Getting Started with the DARPA TIMIT CD-ROM: An Acoustic Phonetic Continuous Speech Database. National Institute of Standards and Technology (NIST), Gaithersburg, Maryland.

GAUVAIN, JEAN-LUC, & CHIN-HUI LEE. 1991. Bayesian learning of Gaussian mixture densities for hidden Markov models. In Proceedings DARPA Speech and Natural Language Processing Workshop, 271–277. Pacific Grove, CA: Defense Advanced Research Projects Agency, Information Science and Technology Office.

GAZDAR, GERALD, E. KLEIN, G. K. PULLUM, & I. A. SAG. 1985. Generalized Phrase Structure Grammar. Cambridge, Mass.: Harvard University Press.

GEMAN, STUART, ELIE BIENENSTOCK, & RENÉ DOURSAT. 1992. Neural networks and the bias/variance dilemma. Neural Computation 4.1–58.
——, & DONALD GEMAN. 1984. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6.721–741.

GRAHAM, SUSAN L., MICHAEL A. HARRISON, & WALTER L. RUZZO. 1980. An improved context-free recognizer. ACM Transactions on Programming Languages and Systems 2.415–462.

GULL, S. F. 1988. Bayesian inductive inference and maximum entropy. In Maximum Entropy and Bayesian Methods in Science and Engineering, Volume 1: Foundations, ed. by G. J. Erickson & C. R. Smith, 53–74. Dordrecht: Kluwer.

HAUSSLER, DAVID, ANDERS KROGH, I. SAIRA MIAN, & KIMMEN SJÖLANDER. 1992. Protein modeling using hidden Markov models: Analysis of globins. Technical Report UCSC-CRL-92-23, Computer and Information Sciences, University of California, Santa Cruz, Ca. Revised Sept. 1992.

HINTON, GEOFFREY E., & TERRENCE J. SEJNOWSKI. 1986. Learning and relearning in Boltzmann machines. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, ed. by David E. Rumelhart & James L. McClelland, volume 1: Foundations, 282–317. Cambridge, Mass.: Bradford Books (MIT Press).

HJELMSLEV, LOUIS. 1953. Prolegomena to a theory of language. Baltimore: Waverly Press. Translated by Francis J. Whitfield from the Danish original Omkring sprogteoriens grundlæggelse, Copenhagen, 1943.

HOPCROFT, JOHN E., & JEFFREY D. ULLMAN. 1979. Introduction to Automata Theory, Languages, and Computation. Reading, Mass.: Addison-Wesley.

HORNING, JAMES JAY. 1969. A study of grammatical inference. Technical Report CS 139, Computer Science Department, Stanford University, Stanford, Ca.

JELINEK, FREDERICK. 1985. Markov source modeling in text generation. In The Impact of Processing Techniques on Communications, ed. by J. K. Skwirzinski. Dordrecht: Nijhoff.

——, & JOHN D. LAFFERTY. 1991. Computation of the probability of initial substring generation by stochastic context-free grammars. Computational Linguistics 17.315–323.

——, ——, & ROBERT L. MERCER. 1992. Basic methods of probabilistic context free grammars. In Speech Recognition and Understanding. Recent Advances, Trends, and Applications, ed. by Pietro Laface & Renato De Mori, volume F75 of NATO Advanced Sciences Institutes Series, 345–360. Berlin: Springer-Verlag. Proceedings of the NATO Advanced Study Institute, Cetraro, Italy, July 1990.

——, & ROBERT L. MERCER. 1980. Interpolated estimation of Markov source parameters from sparse data. In Proceedings Workshop on Pattern Recognition in Practice, 381–397, Amsterdam.

JONES, MARK A., & JASON M. EISNER. 1992a. A probabilistic parser and its applications. In AAAI Workshop on Statistically-Based NLP Techniques, 20–27, San Jose, CA.