CHAPTER 1. INTRODUCTION

…based (or conditioned) on evidence seen in the past, using the framework of probability theory as a consistent mathematical basis.

However, this fundamental feature is only truly useful because probabilistic models are also adaptable: there are effective algorithms for tuning a model based on previously observed data, so as to optimize its predictions on new data (assuming old and new data obey the same statistics).

1.2.1 Probabilistic finite-state models

A case in point is the family of probabilistic finite-state models known as Hidden Markov models (HMMs), routinely used in speech recognition to model the phone sequences making up the words to be recognized. The top part of Figure 1.1 shows a simple word model for "and." Each phonetic realization of the word corresponds to a path through the network of states and transitions, with the probabilities indicated. Given a network structure and a corpus of training data to be modeled, there are standard algorithms for optimizing (or estimating) the probability parameters of the HMM to fit the data.

However, a more fundamental problem is how to obtain a suitable model structure in the first place. The basic idea here will be to construct initial model networks from the observed data (as shown in the bottom part of Figure 1.1), and then gradually transform them into a more compact and general form by a process called model merging. We will see that there is a fundamental tension between optimizing the fit of the model to the observed data and the goal of generalizing to new data. We will use the Bayesian notions of prior and posterior model probabilities to formalize these conflicting goals and derive a combined criterion that allows finding a compromise between them.

Chapter 3 describes this approach to structural learning of HMMs and discusses many of the issues and methods recurring in later chapters.
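To make the merging idea concrete before Chapter 3 develops it fully, the following is a minimal sketch, not the dissertation's actual algorithm: it builds the initial network as one state chain per sample, then greedily merges pairs of same-emission states as long as the combined score (log prior + log likelihood) improves. The simple exponential prior, the Viterbi-style closed-form likelihood computed from transition counts, and all function names are assumptions made for illustration only.

```python
import math
from itertools import combinations
from collections import defaultdict

def make_initial_model(samples):
    """Initial network (cf. bottom of Figure 1.1): one linear chain of
    states per observed sample, sharing a common start (0) and end (1)."""
    emit = {0: "<s>", 1: "</s>"}                    # state -> emitted symbol
    counts = defaultdict(lambda: defaultdict(int))  # transition counts
    nxt = 2
    for sample in samples:
        prev = 0
        for sym in sample:
            emit[nxt] = sym
            counts[prev][nxt] += 1
            prev, nxt = nxt, nxt + 1
        counts[prev][1] += 1
    return emit, counts

def log_likelihood(counts):
    """Maximum-likelihood log P(data | model) in closed form from transition
    counts (a Viterbi-style approximation: each sample is assumed to keep a
    unique best path through the model)."""
    ll = 0.0
    for outs in counts.values():
        total = sum(outs.values())
        ll += sum(c * math.log(c / total) for c in outs.values())
    return ll

def log_prior(emit, alpha=2.0):
    """Toy structural prior favoring compact models; a stand-in for the
    more principled priors derived in the dissertation."""
    return -alpha * len(emit)

def merge_states(emit, counts, keep, drop):
    """Merge state `drop` into state `keep`, summing transition counts."""
    emit2 = {s: e for s, e in emit.items() if s != drop}
    counts2 = defaultdict(lambda: defaultdict(int))
    for s, outs in counts.items():
        for t, c in outs.items():
            counts2[keep if s == drop else s][keep if t == drop else t] += c
    return emit2, counts2

def merge_model(samples):
    """Greedy best-first merging: take the merge that most improves the
    posterior score; stop when no merge improves it."""
    emit, counts = make_initial_model(samples)
    score = log_prior(emit) + log_likelihood(counts)
    while True:
        best = None
        for a, b in combinations(sorted(emit), 2):
            if emit[a] != emit[b]:   # only same-emission states are candidates
                continue
            e2, c2 = merge_states(emit, counts, a, b)
            s2 = log_prior(e2) + log_likelihood(c2)
            if s2 > score and (best is None or s2 > best[0]):
                best = (s2, e2, c2)
        if best is None:
            return emit, counts
        score, emit, counts = best

# Phone sequences for "and" (cf. Figure 1.1): merging collapses the separate
# chains into a compact network while the prior/likelihood trade-off pays off.
samples = [["ae", "n", "d"], ["ae", "n", "d"], ["ae", "n"], ["ix", "n"]]
emit, counts = merge_model(samples)
print(len(emit), "states after merging")
```

The structure of the loop illustrates the tension described above: each merge shrinks the model (raising the prior term) but can only lower the likelihood term, so the score acts as the combined criterion that arbitrates between fit and generality.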
1.2.2 The Miniature Language Learning (L0) Task

An additional motivation for the present work came from a seemingly simple task proposed by Feldman et al. (1990): construct a machine learner that can generalize from usage examples of a natural language fragment to novel instances, for an arbitrary natural language. Figure 1.2 shows the essential elements of this miniature language learning problem, informally known as the "L0" task. The goal is to 'learn' the L0 language from exposure to pairs of corresponding two-dimensional pictures and natural language descriptions. Both the syntax and semantics of the language were intentionally limited to make the problem more manageable. The purpose of the proposal was to highlight certain fundamental problems with traditional cognitive science theories, including issues such as dependence on the underlying conceptual system, grounding of meaning and categorization in perception, and others which are explored in recent and ongoing research (Regier 1992; Feldman et al. 1994).

For our purposes we can abstract a (much simpler) subproblem from this interdisciplinary task: given pairs of sentences and associated idealized semantics (e.g., as first-order logic formulae), construct an adequate formal description of the relation between the two for the given language.
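As an illustration of what this abstracted subproblem takes as input, training pairs might look like the following. These are invented examples in the spirit of L0's picture-description domain; neither the sentences nor the predicate names are taken from the actual task definition.

```python
# Hypothetical (sentence, idealized semantics) training pairs: the learner's
# job is to induce a formal description of how strings map to formulae.
pairs = [
    ("a circle is above a square",
     "exists x y. circle(x) & square(y) & above(x, y)"),
    ("a triangle touches a circle",
     "exists x y. triangle(x) & circle(y) & touches(x, y)"),
    ("a square is below a triangle",
     "exists x y. square(x) & triangle(y) & below(x, y)"),
]
```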
