The dissertation of Andreas Stolcke is approved: University of ...
CHAPTER 4. STOCHASTIC CONTEXT-FREE GRAMMARS

Rules learned from smaller samples tend to be useful in structuring larger samples (but not the other way around). Thus the analyses of the previous samples can effectively guide the search for merges based on the longer, more recent ones.

An interesting related result from non-probabilistic language induction is that strictly (lexicographically) ordered sample presentation makes the learning problem for certain classes of grammar provably easier (Porat & Feldman 1991).

4.5.4 Summary and Discussion

The above artificial examples show that SCFG learning based purely on distributional evidence and the generic Bayesian simplicity bias can correctly identify several of the common linguistic structures. They also show that this limited evidence can be misleading, especially when it comes to finding the ‘right’ phrase structures. This is hardly surprising, given that a lot of the cues for human judgments of linguistic structure presumably come from other sources, such as the semantic referents of the syntactic elements, phonological cues, morphological markers, etc. (Morgan et al. 1987).

In a brief experiment, we applied our algorithm to a 1200-sentence corpus collected with the BeRP speech system (Jurafsky et al. 1994a). The algorithm produced mostly plausible lexical categories and a large number of chunks corresponding to frequent phrases and collocations. However, the generalization achieved was nowhere near what would be required for sufficient coverage of new data.
The problems can partly be addressed by simple preprocessing steps, such as tagging of lexical items using standard probabilistic approaches that achieve reasonable performance on data of this sort (Kupiec 1992b). Non-traditional phrase structuring may not be a problem if sentence-level generalization is the main goal for an application. Also, bracketing may be induced separately using bracketing models trained from structured data (Brill 1993). We plan to investigate such hybrid strategies in the future.