The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS 139

The existence of $R_U$ is shown in Section 6.4.8.

The modified completion loop in the probabilistic Earley parser can now use the $R_U$ matrix to collapse all unit completions into a single step. Note that we still have to do iterative completion on non-unit productions.

Completion (probabilistic, transitive)

\[
\left.
\begin{array}{l}
i:\ {}_jY \to \nu. \quad [\alpha'', \gamma''] \\
j:\ {}_kX \to \lambda.Z\mu \quad [\alpha, \gamma]
\end{array}
\right\}
\ \Longrightarrow\
i:\ {}_kX \to \lambda Z.\mu \quad [\alpha', \gamma']
\]

for all $Y, Z$ such that $R_U(Z \Rightarrow^* Y)$ is non-zero, and $Y \to \nu$ is not a unit production ($|\nu| > 1$). Then

\[
\alpha' \mathrel{+}= \alpha \, \gamma'' \, R_U(Z \Rightarrow^* Y), \qquad
\gamma' \mathrel{+}= \gamma \, \gamma'' \, R_U(Z \Rightarrow^* Y).
\]

6.4.6 An example

Consider the grammar

\[
\begin{array}{ll}
S \to a & [p] \\
S \to S\,S & [q]
\end{array}
\]

where $q = 1 - p$. This highly ambiguous grammar generates strings of any number of $a$'s, using all possible binary parse trees over the given number of terminals. The states involved in parsing the string $aaa$ are listed in Table 6.2, along with their forward and inner probabilities. The example illustrates how the parser deals with left-recursion and the merging of alternative sub-parses during completion.

Since the grammar has only a single nonterminal, the left-corner matrix $P_L$ has rank 1:

\[
P_L = [q].
\]

Its transitive closure is

\[
R_L = (I - P_L)^{-1} = [p]^{-1} = [p^{-1}].
\]

Consequently, the example trace shows the factor $p^{-1}$ being introduced into the forward probability terms in a number of prediction steps.

The sample string can be parsed as either $(a(aa))$ or $((aa)a)$, each parse having a probability of $p^3 q^2$. The total string probability is thus $2 p^3 q^2$, the computed $\alpha$ and $\gamma$ values for the final state.
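As a quick sanity check on the example grammar of Section 6.4.6 ($S \to a$ with probability $p$, $S \to S\,S$ with probability $q = 1 - p$), the following minimal Python sketch recomputes the left-corner closure and the string probability of $aaa$ by brute force. It is not the Earley parser itself: the function names `left_corner_closure` and `make_inside` are hypothetical, and the inside recursion simply sums over all binary split points, which is assumed here to be equivalent for this one-nonterminal grammar.

```python
from fractions import Fraction
from functools import lru_cache


def left_corner_closure(q):
    """Closure R_L = (I - P_L)^{-1} for the 1x1 left-corner matrix P_L = [q].

    With a single nonterminal the matrix inverse reduces to a scalar inverse.
    """
    return 1 / (1 - q)


def make_inside(p):
    """Return a function inside(n) giving P(S =>* a^n) for the grammar
    S -> a [p], S -> S S [q], with q = 1 - p (illustrative helper)."""
    q = 1 - p

    @lru_cache(maxsize=None)
    def inside(n):
        if n == 1:
            return p  # only S -> a can derive a single terminal
        # S -> S S: sum over every way to split a^n into two non-empty parts
        return q * sum(inside(k) * inside(n - k) for k in range(1, n))

    return inside


if __name__ == "__main__":
    p = Fraction(2, 3)  # arbitrary illustrative value, not taken from the text
    q = 1 - p
    inside = make_inside(p)
    print("R_L       =", left_corner_closure(q), "  1/p       =", 1 / p)
    print("P(aaa)    =", inside(3), "  2 p^3 q^2 =", 2 * p**3 * q**2)
```

Exact rational arithmetic via `Fraction` is used so that the comparison against $p^{-1}$ and $2 p^3 q^2$ is not blurred by floating-point rounding.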
The $\alpha$ values for the scanned states in sets 1, 2 and 3 are the prefix probabilities for $a$, $aa$, and $aaa$, respectively: $P(S \Rightarrow_L a) = 1$, $P(S \Rightarrow_L aa) = q$, $P(S \Rightarrow_L aaa) = (1 + p)\,q^2$.

6.4.7 Null productions

Null productions $X \to \epsilon$ introduce some complications into the relatively straightforward parser operation described so far, some of which are due specifically to the probabilistic aspects of parsing. This
