The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 6. EFFICIENT PARSING WITH STOCHASTIC CONTEXT-FREE GRAMMARS 127

It is easy to see that Earley parser operations are correct, in the sense that each chain of transitions (predictions, scanning steps, completions) corresponds to a possible (partial) derivation. Intuitively, it is also true that a parser that performs these transitions exhaustively is complete, i.e., it finds all possible derivations. Formal proofs of these properties are given in the literature, e.g., Aho & Ullman (1972). The relationship between Earley transitions and derivations will be stated more formally in the next section.

The parse trees for sentences can be reconstructed from the chart contents. We will illustrate this in Section 6.5 when discussing Viterbi parses.

Table 6.1 gives an example of Earley parsing, in the form of a trace of transitions as they are performed by our implementation.

Earley's parser can deal with any type of context-free rule format, even with null or ε-productions, i.e., those that replace a nonterminal with the empty string. Such productions do, however, require special attention and make the algorithm and its description more complicated than would otherwise be necessary. In the following sections we assume that no null productions have to be dealt with, and then summarize the necessary changes in Section 6.4.7.
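The three transitions named above (prediction, scanning, completion) can be illustrated with a minimal, non-probabilistic recognizer. This is a hypothetical Python sketch, not the implementation that produced Table 6.1: the dictionary encoding of the grammar, the dummy start state, the function name `earley_recognize`, and the toy grammar `TOY_GRAMMAR` are all assumptions made for illustration. Like the text, it assumes the grammar has no null productions, and it only decides whether the input is derivable.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: nonterminal -> list of right-hand sides.
# Any symbol that is not a key of the dict is treated as a terminal.
Grammar = Dict[str, List[Tuple[str, ...]]]
# Earley state: (LHS, RHS, dot position, start position of the state)
State = Tuple[str, Tuple[str, ...], int, int]


def earley_recognize(grammar: Grammar, start: str, words: List[str]) -> bool:
    """Return True iff `words` can be derived from `start`.

    Assumes no null (epsilon) productions, as in the text.
    """
    n = len(words)
    chart: List[Set[State]] = [set() for _ in range(n + 1)]
    chart[0].add(("", (start,), 0, 0))  # dummy state "-> . start"

    for i in range(n + 1):
        agenda: List[State] = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs) and rhs[dot] in grammar:
                # Prediction: add an initial state for every expansion
                # of the nonterminal after the dot.
                for expansion in grammar[rhs[dot]]:
                    new = (rhs[dot], expansion, 0, i)
                    if new not in chart[i]:
                        chart[i].add(new)
                        agenda.append(new)
            elif dot < len(rhs):
                # Scanning: move the dot over a terminal matching the next word.
                if i < n and words[i] == rhs[dot]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completion: advance the states at position `origin` that
                # were waiting for the nonterminal `lhs` just recognized.
                for p_lhs, p_rhs, p_dot, p_origin in list(chart[origin]):
                    if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs:
                        new = (p_lhs, p_rhs, p_dot + 1, p_origin)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    # Accept if the dummy state has been completed over the whole input.
    return ("", (start,), 1, 0) in chart[n]


# Hypothetical toy grammar (not the grammar used in Table 6.1).
TOY_GRAMMAR: Grammar = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP"), ("V",)],
    "Det": [("a",), ("the",)],
    "N": [("circle",), ("square",)],
    "V": [("touches",)],
}

print(earley_recognize(TOY_GRAMMAR, "S", "the circle touches a square".split()))  # True
print(earley_recognize(TOY_GRAMMAR, "S", "circle touches".split()))  # False
```

Because null productions are excluded, every completed state spans at least one word, so a completion at position `i` only consults charts at earlier positions that are already closed; this is what lets the sketch skip the special handling that ε-productions would require. Recovering parse trees from the chart would additionally require recording back-pointers at each completion, which the sketch omits.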
One might instead choose to simply preprocess the grammar to eliminate null productions, a process which is also described in Section 6.4.7.

6.4 Probabilistic Earley Parsing

6.4.1 Stochastic context-free grammars

A stochastic context-free grammar (SCFG) extends the standard context-free formalism by adding probabilities to each production:

$$X \to \lambda \quad [p]$$

where the rule probability $p$ is usually written as $P(X \to \lambda)$. This notation to some extent hides the fact that $p$ is a conditional probability, of production $X \to \lambda$ being chosen, given that $X$ is up for expansion. The probabilities of all rules with the same nonterminal $X$ on the LHS must therefore sum to unity. Context-freeness in a probabilistic setting translates into conditional independence of rule choices. As a result, complete derivations have joint probabilities that are simply the products of the rule probabilities involved.

The probabilities of interest mentioned in Section 6.1 can now be defined formally.

Definition 6.1 The following quantities are defined relative to an SCFG $G$, a nonterminal $X$, and a string $x$ over the alphabet of $G$.

a) The probability of a (partial) derivation $\nu_1 \Rightarrow \nu_2 \Rightarrow \cdots$
