
The dissertation of Andreas Stolcke is approved: University of ...


CHAPTER 4. STOCHASTIC CONTEXT-FREE GRAMMARS

After processing 100 random samples of this grammar (which now generates an unbounded number of sentences), the following phrase structure rules are found.

    S  --> NP VC PP         (22)
       --> NP VI            (10)
       --> NP VT NP         (29)
       --> PP COMMA NP VI   (19)
       --> PP VC NP         (20)
    PP --> P NP             (112)
    NP --> DET N            (241)
       --> NP PP            (51)

The induced rule structure differs in two respects from the target. Firstly, no VP phrase structure is inferred. This is not surprising since each of the VP realizations appears only in a single rule, i.e., there is no generalization or greater succinctness to be gained from positing a VP constituent. Secondly, the embedding of PP in NP is realized in two recursions rather than one: one left-recursive through the NP rule, and one tail-recursive through the PP rule. This redundancy also leads to a slightly suboptimal score compared to the target grammar. Still, the grammar found is clearly weakly equivalent to the target.

Agreement. Another pervasive phenomenon in natural languages is agreement, whereby two lexical items or constituents are forced to share one or more (abstract) features such as number, gender, case marking, etc. For our purposes we can construe the phonologically motivated alternation of English determiners (‘a’ and ‘the’ versus ‘an’ and ‘thee’) as a case of NP-internal agreement.
A minimal extension of the baseline grammar to this end is the following (only nonterminals that differ from the baseline are shown):

    NP   --> Det1 N1
         --> Det2 N2
    Det1 --> a
         --> the
    Det2 --> an
         --> thee
    N1   --> circle
         --> square
         --> triangle
    N2   --> arc
         --> octagon

After merging 40 random samples, these productions were in fact generalized correctly, while the remainder of the grammar was identical to the one learned before from the baseline corpus.

It should be noted that context-free grammars are not particularly suited to the description of agreement phenomena, especially if the agreeing constituents are separated by material not affected by the agreement. A CFG description in this case has to replicate syntactic categories up to the smallest enclosing constituent level. [14]

[14] The standard solution to this problem is to factor agreement into a separate description, usually using a feature-based formalism.
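The effect of splitting determiners and nouns into paired classes can be illustrated with a small Python sketch. It only enumerates the noun phrases licensed by the NP fragment listed above and does not implement the merging procedure itself. The names `GRAMMAR`, `expand`, `NOUN_PHRASES`, and `is_noun_phrase` are hypothetical, chosen for this illustration, and do not come from the dissertation.

```python
# Illustrative encoding of the agreement fragment (probabilities omitted).
# Each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "NP": [("Det1", "N1"), ("Det2", "N2")],
    "Det1": [("a",), ("the",)],
    "Det2": [("an",), ("thee",)],
    "N1": [("circle",), ("square",), ("triangle",)],
    "N2": [("arc",), ("octagon",)],
}


def expand(symbol):
    """Return all word tuples derivable from `symbol`.

    Terminates because this fragment has no recursive rules.
    """
    if symbol not in GRAMMAR:  # anything without rules is a terminal word
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        partial = [()]
        for child in rhs:
            # Concatenate every prefix with every expansion of the next child.
            partial = [p + c for p in partial for c in expand(child)]
        results.extend(partial)
    return results


# The finite set of noun phrases generated by the fragment.
NOUN_PHRASES = {" ".join(words) for words in expand("NP")}


def is_noun_phrase(text):
    """Check membership in the language of the NP fragment."""
    return text in NOUN_PHRASES


if __name__ == "__main__":
    for phrase in ("a circle", "an arc", "a arc", "an square"):
        print(phrase, is_noun_phrase(phrase))
```

Because Det1 combines only with N1 and Det2 only with N2, the fragment licenses 2 x 3 + 2 x 2 = 10 noun phrases. Mismatched pairs such as "a arc" are excluded. The duplicated category pairs (Det1/Det2, N1/N2) are a small-scale instance of the category replication described in the text.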
