12.07.2015 Views

The dissertation of Andreas Stolcke is approved: University of ...



CHAPTER 4. STOCHASTIC CONTEXT-FREE GRAMMARS

The result grammar is weakly equivalent to the target (it generates the same sentences) and has identical lexical rules. However, the sentence-level productions are less structured:[13]

    S  --> NP VC P NP   (28)
       --> NP VI        (39)
       --> NP VT NP     (33)
    NP --> DET N        (161)

The algorithm was thus successful in grouping terminals into appropriate lexical categories and identifying the pervasive NP constituents. The log posterior probability of this grammar (-230.30769) is slightly below that of the target grammar (-230.1136).

However, a more extensive beam search (width 30) finds an alternative grammar exhibiting a deeper phrase structure:

    S  --> NP VP        (100)
    VP --> VI           (39)
       --> X NP         (61)
    X  --> VT           (33)
       --> VC P         (28)
    NP --> DET N        (161)

This structure turns out to have a somewhat higher log posterior probability (-228.658) than the target. Chunking (VC P) is preferred to the standard (P NP) because it allows merging the two VP expansions involving VC and VT, respectively.

Interestingly, the importance of the prior distribution for the production lengths is already evident in this experiment. Were it not for the Poisson prior length distribution, the flat productions in the first grammar above would actually yield a higher posterior probability than the target grammar.

Recursive embedding of constituents. To get PP constituents based only on distributional evidence, the grammar can be enriched, e.g., with topicalized PPs and PPs embedded in NPs.
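Returning to the two induced grammars above (the flat one and the deeper one found with beam width 30), their equivalence at the level of lexical categories can be checked mechanically. The sketch below is only an illustration, not the dissertation's procedure: it turns the usage counts into plain relative-frequency rule probabilities (an assumption; the posterior computation itself is not shown on this page) and expands each grammar into its distribution over lexical-category strings. The names `FLAT`, `DEEP`, and `string_distribution` are hypothetical.

```python
from fractions import Fraction

# Productions and usage counts copied from the text:
# nonterminal -> list of (right-hand side, count).
FLAT = {
    "S": [(("NP", "VC", "P", "NP"), 28), (("NP", "VI"), 39), (("NP", "VT", "NP"), 33)],
    "NP": [(("DET", "N"), 161)],
}
DEEP = {
    "S": [(("NP", "VP"), 100)],
    "VP": [(("VI",), 39), (("X", "NP"), 61)],
    "X": [(("VT",), 33), (("VC", "P"), 28)],
    "NP": [(("DET", "N"), 161)],
}


def string_distribution(grammar, symbols=("S",)):
    """Map each lexical-category string derivable from `symbols` to its probability.

    Rule probabilities are ASSUMED to be count / total count for the same
    left-hand side. The recursion terminates only for non-recursive grammars.
    """
    if not symbols:
        return {(): Fraction(1)}
    head, rest = symbols[0], symbols[1:]
    if head in grammar:
        total = sum(count for _, count in grammar[head])
        head_dist = {}
        for rhs, count in grammar[head]:
            for string, prob in string_distribution(grammar, rhs).items():
                head_dist[string] = head_dist.get(string, 0) + Fraction(count, total) * prob
    else:
        # Symbols without productions are treated as lexical categories.
        head_dist = {(head,): Fraction(1)}
    rest_dist = string_distribution(grammar, rest)
    result = {}
    for h, ph in head_dist.items():
        for t, pt in rest_dist.items():
            result[h + t] = result.get(h + t, 0) + ph * pt
    return result


flat_dist = string_distribution(FLAT)
deep_dist = string_distribution(DEEP)
for string in sorted(flat_dist):
    print(" ".join(string), flat_dist[string], deep_dist[string])
```

Under these assumed estimates both grammars derive the same three category strings and assign each the same probability, for example 61/100 * 33/61 = 33/100 for the transitive pattern in the deeper grammar.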
The changed productions are

    S     --> NP VP            [3/5]
          --> PP COMMA NP Vi   [1/5]
          --> PP Vc NP         [1/5]
    NP    --> Det N            [3/4]
          --> Det N PP         [1/4]
    COMMA --> ,

This extends the range of sentences to samples such as

    above a square is the square
    the circle below the triangle below a circle touches the triangle
    above a triangle , a circle rolls
    below a square , the triangle above the square below the square bounces

[13] Induced grammars in this section are notated with nonterminal labels chosen from the target grammar where possible, so as to enhance readability. The actual nonterminal names are of course irrelevant to the algorithm.
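To see how the enriched grammar gives rise to such samples, here is a minimal sampling sketch. Only the S, NP, and COMMA productions and their probabilities come from the text. The VP and PP productions, the lexical rules, and all of their uniform weights are assumptions filled in for illustration, using only words that occur in the sample sentences. The names `GRAMMAR` and `sample` are hypothetical.

```python
import random

# Rules as nonterminal -> list of (right-hand side, integer weight).
GRAMMAR = {
    # Quoted in the text; weights are proportional to the bracketed probabilities.
    "S": [(("NP", "VP"), 3), (("PP", "COMMA", "NP", "Vi"), 1), (("PP", "Vc", "NP"), 1)],
    "NP": [(("Det", "N"), 3), (("Det", "N", "PP"), 1)],
    "COMMA": [((",",), 1)],
    # ASSUMED for illustration: these rules and their uniform weights are not
    # given on this page.
    "VP": [(("Vi",), 1), (("Vt", "NP"), 1), (("Vc", "PP"), 1)],
    "PP": [(("P", "NP"), 1)],
    "Det": [(("a",), 1), (("the",), 1)],
    "N": [(("circle",), 1), (("square",), 1), (("triangle",), 1)],
    "Vi": [(("rolls",), 1), (("bounces",), 1)],
    "Vt": [(("touches",), 1)],
    "Vc": [(("is",), 1)],
    "P": [(("above",), 1), (("below",), 1)],
}


def sample(symbol, rng):
    """Expand `symbol` top-down, choosing each production by its weight."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal word
    rules = GRAMMAR[symbol]
    rhs = rng.choices([r for r, _ in rules], weights=[w for _, w in rules])[0]
    words = []
    for child in rhs:
        words.extend(sample(child, rng))
    return words


rng = random.Random(0)  # fixed seed only to make runs repeatable
for _ in range(5):
    print(" ".join(sample("S", rng)))
```

Because NP expands to the recursive form Det N PP with probability 1/4 only, the expansion terminates with probability one, while still producing occasional nested PPs of the kind seen in the longer samples.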
