
The dissertation of Andreas Stolcke is approved: University of ...


CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS

where $\nu_1, \nu_2, \ldots, \nu_k$ are strings of terminals and/or nonterminals, $X \rightarrow \lambda$ is a production of $G$, and $\nu_2$ is derived from $\nu_1$ by replacing one occurrence of $X$ with $\lambda$.

b) The string probability $P(X \stackrel{*}{\Rightarrow} x)$ (of $x$ given $X$) is the sum of the probabilities of all left-most derivations $X \Rightarrow \cdots \Rightarrow x$ producing $x$ from $X$. [6]

c) The sentence probability $P(S \stackrel{*}{\Rightarrow} x)$ (of $x$ given $G$) is the string probability given the start symbol $S$ of $G$. By definition, this is also the probability $P(x \mid G)$ assigned to $x$ by the grammar $G$.

d) The prefix probability $P(S \stackrel{*}{\Rightarrow}_L x)$ (of $x$ given $G$) is the sum of the probabilities of all sentence strings having $x$ as a prefix,

$$P(S \stackrel{*}{\Rightarrow}_L x) = \sum_{y \in \Sigma^*} P(S \stackrel{*}{\Rightarrow} xy).$$

(In particular, $P(S \stackrel{*}{\Rightarrow}_L \epsilon) = 1$.)

In the following, we assume that the probabilities in a SCFG are proper and consistent as defined in Booth & Thompson (1973), and that the grammar contains no useless nonterminals (ones that can never appear in a derivation). These restrictions ensure that all nonterminals define probability measures over strings, i.e., $P(X \stackrel{*}{\Rightarrow} x)$ is a proper distribution over $x$ for all $X$. Formal definitions of these conditions are given in Section 6.4.8.

6.4.2 Earley paths and their probabilities

In order to define the probabilities associated with parser operation on a SCFG, we need the concept of a path, or partial derivation, executed by the Earley parser.

Definition 6.2

a) An (unconstrained) Earley path, or simply path, is a sequence of Earley states linked by prediction, scanning, or completion. For the purpose of this definition, we allow scanning to operate in 'generation mode,' i.e., all states with terminals to the right of the dot can be scanned, not just those matching the input. (For completed states, the predecessor state is defined to be the complete state from the same state set contributing to the completion.)

b) A path is said to be constrained by (or generate) a string $x$ if in all scanned states the terminals immediately to the left of the dot, in sequence, form the string $x$.

c) A path is complete if the last state on it matches the first, except that the dot has moved to the end of the RHS.

d) We say that a path starts with nonterminal $X$ if the first state on it is a predicted state with $X$ on the LHS.

[6] In a left-most derivation each step replaces the nonterminal furthest to the left in the partially expanded string. The order of expansion is actually irrelevant for this definition, due to the multiplicative combination of production probabilities. We restrict summation to left-most derivations to avoid counting duplicates, and because left-most derivations will play an important role later.
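As an illustration of the string probability in item b) and of the left-most restriction discussed in the footnote, the following minimal sketch enumerates all left-most derivations of a short string by brute force and sums their probabilities. The toy grammar (S -> S S with probability 2/5, S -> a with probability 3/5) and the function name `string_probability` are assumptions made for this example only; they do not come from the text, and the exhaustive search is not the Earley-based computation developed in this chapter. The sketch also assumes a grammar without empty productions, so that sentential forms never shrink.

```python
from fractions import Fraction

# Hypothetical toy SCFG (an assumption for illustration, not from the text):
#   S -> S S  [2/5]
#   S -> a    [3/5]
GRAMMAR = {
    "S": [(("S", "S"), Fraction(2, 5)), (("a",), Fraction(3, 5))],
}


def string_probability(start, target):
    """Sum the probabilities of all left-most derivations start =>* target.

    Returns (probability, number_of_left_most_derivations).
    Assumes no empty productions, so forms longer than target can be pruned.
    """
    total = Fraction(0)
    count = 0
    # Each stack entry is a sentential form plus the probability of the
    # partial derivation that produced it (product of rule probabilities).
    stack = [((start,), Fraction(1))]
    while stack:
        form, prob = stack.pop()
        # Index of the left-most nonterminal, or None if the form is all terminals.
        idx = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:
            if form == target:
                total += prob
                count += 1
            continue
        # Terminals to the left of the left-most nonterminal are final;
        # they must agree with the target string.
        if form[:idx] != target[:idx]:
            continue
        for rhs, rule_prob in GRAMMAR[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= len(target):
                stack.append((new_form, prob * rule_prob))
    return total, count


prob, n_derivations = string_probability("S", tuple("aaa"))
print(prob, n_derivations)
```

For the string aaa this toy grammar has two left-most derivations, one per parse tree, each using S -> S S twice and S -> a three times, so the sum is 2 (2/5)^2 (3/5)^3 = 216/3125. Counting derivations in every expansion order would count each tree several times, which is the duplication the left-most restriction avoids.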
