The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 4. STOCHASTIC CONTEXT-FREE GRAMMARS

Sample strings can be enriched to convey some of the phrase structure information. Bracketed samples are ordinary samples with substrings enclosed by balanced, non-overlapping pairs of parentheses. The bracketing need not be complete. For example, a partially bracketed sample for the example language of Section 4.3.2 is

(a a (a b) b b)

It is known that access to completely bracketed samples (equivalent to unlabeled derivation trees) makes learning non-probabilistic CFGs possible and tractable, by applying techniques borrowed from finite-state model induction (Sakakibara 1990). Pereira & Schabes (1992) have shown that providing even partial bracketing information can help the induction of properly structured SCFGs using the standard estimation approach. This raises the question of how bracketed samples can be incorporated into the merging algorithm described so far.

This is indeed possible by a simple extension of the sample incorporation procedure described above. Instead of creating a single top-level production to account for a new sample, the algorithm creates a collection of productions and nonterminals that mirrors the observed bracketing. Thus the sample (a a (a b) b b) is added to the grammar using the productions

S --> A1 A2 X B2 B3 (1)
X --> A3 B1 (1)

plus lexical productions for the preterminals created. Each pair of parentheses generates an intermediate nonterminal, such as X above.

Merging and chunking are then applied to the resulting grammar as before. If the provided sample bracketing is complete, i.e., contains brackets for all phrase boundaries in the target grammar, then chunking becomes unnecessary.
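The incorporation procedure for bracketed samples can be sketched in Python (3.9+). This is a minimal illustration under stated assumptions, not the dissertation's implementation: samples are whitespace-separated lowercase terminals with parentheses; each terminal occurrence receives a fresh preterminal named after the uppercased terminal plus a running index (A1, A2, ..., as in the productions above); each inner bracket pair receives a fresh intermediate nonterminal named X1, X2, ... (the text's X); and a bracket spanning the whole sample is taken to mark the start symbol S. The functions `tokenize`, `parse`, and `incorporate` and the naming scheme are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(sample: str) -> list[str]:
    """Split a bracketed sample into terminals and '(' / ')' tokens."""
    return sample.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: list[str]) -> list:
    """Build nested lists; each inner list is one bracket pair."""
    stack: list[list] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            if len(stack) < 2:
                raise ValueError("unbalanced ')'")
            inner = stack.pop()
            stack[-1].append(inner)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def incorporate(sample: str, start: str = "S") -> Counter:
    """Turn one bracketed sample into productions with counts.

    Hypothetical naming scheme: preterminals are the uppercased terminal
    plus a running index (A1, A2, ...); intermediate nonterminals are
    X1, X2, ... in order of their opening bracket.
    """
    tree = parse(tokenize(sample))
    # Assumption: a bracket spanning the whole sample marks the start symbol.
    if len(tree) == 1 and isinstance(tree[0], list):
        tree = tree[0]

    productions: Counter = Counter()  # (lhs, rhs tuple) -> count
    term_index: defaultdict = defaultdict(int)  # terminal -> last index used
    n_intermediate = 0

    def build(children: list) -> tuple:
        nonlocal n_intermediate
        rhs = []
        for child in children:
            if isinstance(child, list):
                # Each bracket pair yields a fresh intermediate nonterminal.
                n_intermediate += 1
                name = f"X{n_intermediate}"
                body = build(child)
                productions[(name, body)] += 1
                rhs.append(name)
            else:
                # Each terminal occurrence yields a fresh preterminal
                # together with its lexical production.
                term_index[child] += 1
                pre = f"{child.upper()}{term_index[child]}"
                productions[(pre, (child,))] += 1
                rhs.append(pre)
        return tuple(rhs)

    productions[(start, build(tree))] += 1
    return productions


if __name__ == "__main__":
    for (lhs, rhs), count in incorporate("(a a (a b) b b)").items():
        print(f"{lhs} --> {' '.join(rhs)} ({count})")
```

Applied to (a a (a b) b b), the sketch yields S --> A1 A2 X1 B2 B3 (1) and X1 --> A3 B1 (1) plus six lexical productions, i.e., the grammar given above with X renamed X1. Preterminals are numbered left to right across the whole sample, which is why A3 and B1 end up inside the bracketed constituent. When several samples are incorporated, the naming counters would presumably have to be shared across calls so that nonterminals from different samples do not collide; the sketch keeps them local for brevity.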
Merging alone can in principle produce the target grammar in the completely bracketed case, provided that samples for all productions are given.

4.3.4 SCFG priors

In choosing prior distributions for SCFGs we again extend various approaches previously used for HMMs. As before, a model $M$ is decomposed into a discrete structure $M_S$ and a collection of continuous parameters $\theta_M$. Again, we have a choice of narrow or broad parameter priors, depending on whether the identity of non-zero rule probabilities is part of $M_S$ or of $\theta_M$ (Section 3.3.3). However, broad parameter priors become problematic here, since the set of possible rules (and hence parameters) is not limited a priori, as it is for HMMs. As the length of RHSs grows, the number of potential rules over a given set of nonterminals grows exponentially. This makes narrow parameter priors inherently simpler and more natural for SCFGs.

The following combination of priors was used in the experiments reported below. The parameter prior $P(\theta_M \mid M_S)$ is a product of Dirichlet distributions (Section 2.5.5.1), one for each nonterminal. Each Dirichlet thus allocates prior probability over all possible expansions of a single nonterminal. The total prior
