The dissertation of Andreas Stolcke is approved: University of ...

Chapter 4

Stochastic Context-free Grammars

4.1 Introduction and Overview

In this chapter we will look at model merging as applied to the probabilistic version of context-free grammars. The stochastic context-free grammar (SCFG) formalism is a generalization of the HMM, just as non-probabilistic CFGs can be thought of as an extension of finite-state grammars.

Unlike their non-probabilistic counterpart, SCFGs are not a 'mainstream' approach to language modeling yet.[1] In most of today's probabilistic language models, finite-state or even simple n-gram approaches dominate. One reason for this is that although most standard algorithms for probabilistic finite-state models (i.e., HMMs) have generalized versions for SCFGs, they become computationally more demanding, and often intractable in practice (see Section 4.2.2).

A more important problem is that SCFGs may actually be worse at modeling one aspect of language in which simple finite-state models do a surprisingly good job: capturing the short-distance, lexical (as opposed to phrase-structural) contingencies between words. This is a direct consequence of the conditional independence assumptions embodied in SCFGs, and has prompted the investigation of 'mildly context-sensitive' grammars and their probabilistic versions (Resnik 1992; Schabes 1992).
These, however, come at an even greater computational price.

Recent work has shown that probabilistic CFGs can be useful if applied carefully and in the right domain. Lari & Young (1991) discuss various applications of estimated SCFGs for phonetic modeling. Jurafsky et al. (1994b) show that a SCFG built from hand-crafted rules with probabilities estimated from a corpus can improve speech recognition performance over standard n-gram language models, either by directly coupling the SCFG to the speech decoder, or by using the SCFG effectively as a smoothing device to improve the estimates of n-gram probabilities from sparse data. The algorithms that form the basis of these last two approaches are described in the second part of this thesis, in Chapter 6 and Chapter 7, respectively.

[1] While bare CFGs aren't widely used in computational linguistics either, they form the basis or 'backbone' of most of today's feature- and unification-based grammar formalisms, such as LFG (Kaplan & Bresnan 1982), GPSG (Gazdar et al. 1985), and construction grammar (Fillmore 1988).
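To make the conditional independence assumption concrete, the following is a minimal sketch using a hypothetical toy grammar (not one drawn from this thesis): a SCFG assigns a parse tree the product of its rule probabilities, so each nonterminal's expansion is chosen independently of all words outside its own subtree.

```python
# Toy SCFG: for each nonterminal, the probabilities of its rules sum to 1.
# (Illustrative grammar and helper function; assumptions, not the thesis's code.)
rules = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("the", "N")): 0.6,
    ("NP", ("a", "N")):   0.4,
    ("N",  ("dog",)):     0.5,
    ("N",  ("cat",)):     0.5,
    ("VP", ("barks",)):   0.7,
    ("VP", ("meows",)):   0.3,
}

def parse_prob(tree):
    """Probability of a parse tree = product of the probabilities of the
    rules used in it.  Note that each rule's probability depends only on the
    nonterminal being expanded, never on neighboring words -- this is the
    independence assumption that loses short-distance lexical contingencies."""
    label, children = tree  # tree = (nonterminal, tuple of words/subtrees)
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rules[(label, rhs)]
    for c in children:
        if not isinstance(c, str):  # recurse into subtrees, skip leaf words
            p *= parse_prob(c)
    return p

# "the dog barks":  P = 1.0 * 0.6 * 0.5 * 0.7 = 0.21
tree = ("S", (("NP", ("the", ("N", ("dog",)))), ("VP", ("barks",))))
print(parse_prob(tree))
```

Because P(N -> dog) is the same 0.5 whether the determiner was "the" or "a", the model cannot prefer one word given a nearby word, which is precisely the kind of dependency an n-gram model captures directly.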
