The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS

for all pairs of states $i: {}_jY \to \nu.$ and $j: {}_kX \to \lambda.Z\mu$ in the chart, such that the unit-production relation $R_U(Z, Y)$ is non-zero. Then

$$\beta'' \mathrel{+}= \gamma'\,\beta\,R_U(Z, Y)$$

$$\beta' \mathrel{+}= \gamma''\,\beta$$

The first summation is carried out once for each state $j: {}_kX \to \lambda.Z\mu$, whereas the second summation is applied for each choice of $Z$, but only if $X \to \lambda Z\mu$ is not a unit production.

Rationale. This increments $\beta''$ the equivalent of $R_U(Z, Y)$ times, accounting for the infinity of surroundings in which $Y$ can occur if it can be derived through cyclic productions. Note that the computation of $\beta'$ is unchanged, since $\gamma''$ already includes an infinity of cyclically generated subtrees for $Y$, where appropriate.

6.5.3 Parsing bracketed inputs

The estimation procedure described above (and EM-based estimators in general) is only guaranteed to find locally optimal parameter estimates. Unfortunately, it seems that in unconstrained SCFG estimation local maxima present a very real problem, making success dependent on chance and initial conditions (Lari & Young 1990). Pereira & Schabes (1992) showed that partially bracketed input samples can alleviate the problem in certain cases. The bracketing information constrains the parse of the inputs, and therefore the parameter estimates, steering the estimation clear of some of the suboptimal solutions that could otherwise be found.

A second advantage of bracketed inputs is that they potentially allow more efficient processing, since the space of potential derivations (or equivalently, Earley paths) is reduced.
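To make the reduction of the derivation space concrete, the following minimal sketch counts the parses of a string of n a's under the illustrative grammar S → a | SS that are consistent with a given set of brackets. It assumes that "consistent" means no constituent crosses a bracket (overlaps it without nesting), uses a CKY-style dynamic program rather than Earley's algorithm, and represents tokens by 0-based, half-open (start, end) spans; the function names are hypothetical.

```python
from functools import lru_cache


def crosses(span, bracket):
    """True if two half-open spans (start, end) overlap without nesting."""
    (i, j), (c, d) = span, bracket
    return i < c < j < d or c < i < d < j


def count_consistent_trees(n, brackets):
    """Count the parses of a string of n a's under S -> a | S S (i.e., binary
    trees over n leaves) in which no constituent span crosses a bracket."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # A constituent covering tokens i..j-1 is ruled out if it crosses a bracket.
        if any(crosses((i, j), b) for b in brackets):
            return 0
        if j - i == 1:
            return 1  # a single token, derived by S -> a
        # S -> S S: try every split point k strictly between i and j.
        return sum(count(i, k) * count(k, j) for k in range(i + 1, j))

    return count(0, n)


print(count_consistent_trees(4, []))        # unbracketed aaaa     -> 5
print(count_consistent_trees(4, [(1, 3)]))  # bracketed as a(aa)a  -> 2
```

For the bracketed input a(aa)a, the single bracket covers tokens 1 and 2, i.e., the span (1, 3). Of the five binary parses of aaaa, only (a(aa))a and a((aa)a) survive, since every other parse has a constituent over the first two or the last two tokens.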

It is therefore interesting to see how any given parser can incorporate partial bracketing information. This is typically not a big problem, but in the case of Earley's algorithm there is a particularly simple and elegant solution.

Consider again the grammar S → a | SS. A partially bracketed input for this grammar would be a(aa)a. The parentheses indicate phrase boundaries that any candidate parse has to be consistent with; e.g., there cannot be a parse that has a constituent spanning the first and second a, or the third and fourth. The supplied bracketing can be nested, of course, and need not be complete, i.e., within a bracketing there are still potentially several ways of parsing a substring.

The Earley parser can deal efficiently with partial bracketing information as follows. A partially bracketed input is processed as usual, left to right. When a bracketed portion is encountered, the parser invokes itself recursively on the substring delimited by that pair of parentheses. More precisely:

• The recursive parser instance gets to see only the substring as input.
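The recursive treatment of bracketed portions can be sketched as follows. This is a hypothetical skeleton of the control flow only, not the parser described here: it scans a bracketed string left to right and, whenever it meets a parenthesized group, calls itself on the enclosed substring, recording which input each instance receives. The chart operations a real Earley parser would perform on each token are left out, and the function names are illustrative.

```python
def top_level_items(text):
    """Split a bracketed string into top-level items: ("token", char) for a
    bare terminal, ("group", substring) for a parenthesized group, where the
    substring excludes the group's outer parentheses."""
    items, depth, start = [], 0, 0
    for pos, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = pos + 1  # the group's content begins after "("
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unmatched ')' at position {pos}")
            if depth == 0:
                items.append(("group", text[start:pos]))
        elif depth == 0:
            items.append(("token", ch))
    if depth != 0:
        raise ValueError("unmatched '('")
    return items


def run_instance(text, seen):
    """Hypothetical driver for one parser instance over `text`. Each bracketed
    group is handed to a fresh recursive instance that sees only the enclosed
    substring; `seen` collects the input of every instance in call order."""
    seen.append(text)
    for kind, value in top_level_items(text):
        if kind == "group":
            run_instance(value, seen)
        # For kind == "token", a real parser would scan/predict/complete here.
    return seen


print(run_instance("a(aa)a", []))     # -> ['a(aa)a', 'aa']
print(run_instance("a(a(aa))a", []))  # -> ['a(a(aa))a', 'a(aa)', 'aa']
```

Nested brackets simply produce deeper recursion: in a(a(aa))a the outer instance hands a(aa) to a second instance, which in turn hands aa to a third.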
