Rules learned from smaller samples tend to be useful in structuring larger samples (but not the other way around). Thus the analyses of the previous samples can effectively guide the search for merges based on the longer, more recent ones. An interesting related result from non-probabilistic language induction is that strictly (lexicographically) ordered sample presentation makes the learning problem for certain classes of grammar provably easier (Porat & Feldman 1991).

4.5.4 Summary and Discussion

The above artificial examples show that SCFG learning based purely on distributional evidence and the generic Bayesian simplicity bias can correctly identify several of the common linguistic structures. They also show that this limited evidence can be misleading, especially when it comes to finding the 'right' phrase structures. This is hardly surprising, given that many of the cues for human judgments of linguistic structure presumably come from other sources, such as the semantic referents of the syntactic elements, phonological cues, morphological markers, etc. (Morgan et al. 1987).

In a brief experiment, we applied our algorithm to a 1200-sentence corpus collected with the BeRP speech system (Jurafsky et al. 1994a). The algorithm produced mostly plausible lexical categories and a large number of chunks corresponding to frequent phrases and collocations. However, the generalization achieved was nowhere near what would be required for sufficient coverage of new data. The problems can partly be addressed by simple preprocessing steps, such as tagging of lexical items using standard probabilistic approaches that achieve reasonable performance on data of this sort (Kupiec 1992b). Non-traditional phrase structuring may not be a problem if sentence-level generalization is the main goal for an application. Also, bracketing may be induced separately using bracketing models trained from structured data (Brill 1993). We plan to investigate such hybrid strategies in the future.
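The Bayesian simplicity bias referred to above amounts to trading off a prior that favors compact grammars against the likelihood of the sample under the candidate SCFG, with merges accepted when they improve the resulting posterior. The sketch below illustrates that trade-off; it is a minimal illustration, not the dissertation's implementation. The grammar representation (a list of (lhs, rhs, probability) rules), the one-unit-per-symbol description-length prior, and the helper parameters parse_logprob and candidate_merges are assumptions introduced here for exposition.

```python
def log_prior(grammar):
    """Simplicity bias: a description-length prior that penalizes grammars
    with more (and longer) rules.  The one-unit-per-symbol cost is an
    arbitrary choice for illustration."""
    size = sum(1 + len(rhs) for _, rhs, _ in grammar)
    return -float(size)

def log_likelihood(grammar, corpus, parse_logprob):
    """Total log probability of the sample under the current SCFG.
    parse_logprob(grammar, sentence) stands in for an inside-algorithm
    parser, which is not shown here."""
    return sum(parse_logprob(grammar, sentence) for sentence in corpus)

def log_posterior(grammar, corpus, parse_logprob):
    # Posterior up to a normalizing constant: prior times likelihood, in log space.
    return log_prior(grammar) + log_likelihood(grammar, corpus, parse_logprob)

def greedy_merge_search(grammar, corpus, candidate_merges, parse_logprob):
    """Greedily apply merge operators (e.g. collapsing two nonterminals)
    as long as the posterior improves; stop at a local maximum."""
    best = log_posterior(grammar, corpus, parse_logprob)
    improved = True
    while improved:
        improved = False
        for merged in candidate_merges(grammar):
            score = log_posterior(merged, corpus, parse_logprob)
            if score > best:
                grammar, best, improved = merged, score, True
                break
    return grammar
```

Incremental, ordered presentation fits such a scheme naturally: the grammar obtained from the shorter sentences becomes the starting point when longer ones are added, so merges accepted earlier constrain the later search.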
