cannot be adequately described, let alone learned.⁸ The obvious solution to this representational problem is hierarchical feature structures (f-structures), as used by a number of linguistic theories of grammar (Shieber 1986). However, this raises new problems concerning the ‘proper’ probabilistic formulation. For example, the set of hierarchical feature specifications over a finite set of feature names becomes infinite, raising the question of an appropriate prior distribution over this space. Also, new and more varied merging operators would have to be introduced to match the increased expressiveness of the formalism, leading to new intricacies for the search procedure. Still, pursuing such an extension, perhaps not in the full generality of standard f-structures, is a worthwhile subject for future research.

⁸ A restricted version, in which embedding is limited to one level and the semantics are flattened out into several features, can be learned using the operators described in this chapter.

5.5.3 Trade-offs between context-free and feature descriptions

Returning to the issue of generalized feature constraints, there is also a fundamental question as to what evidence such ‘hidden’ feature constraints could be learned from. We have seen in Section 4.5.2 that agreement, for example, can be represented and learned based on posterior probabilities and merging operators, but it became clear that context-free productions are an inadequate formalism for these phenomena. Ideally, one would want to move from a description such as

    S --> NP_sg VP_sg
    S --> NP_pl VP_pl

to

    S --> NP VP
    NP.number = VP.number

In general, there are other cases where context-free rules and features provide alternative models for the same distributional facts. In such cases a description length criterion should be used to decide which is the better formulation. Incidentally, nonterminals themselves may be expressed as features (Gazdar et al. 1985; Shieber 1986), thereby eliminating the need for separate description mechanisms. This could be the basis of a unified description length metric that allows fair comparisons of alternative modeling solutions.

5.6 Summary

In this chapter we examined a minimal extension of stochastic context-free grammars that incorporates simple probabilistic attributes (or features). We saw how a pair of feature-oriented operators (feature merging and attribution), together with the existing SCFG operators, can induce simple grammars in this formalism, and applied the approach to a rudimentary form of semantics found in the L0 miniature language
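To make the description length comparison suggested in Section 5.5.3 concrete, the following is a minimal sketch in Python; it is not part of the dissertation, and the token-based encoding, the description_length function, and the shared-vocabulary bit cost are simplifying assumptions chosen purely for illustration. The sketch compares only the grammar (model) cost of the two agreement descriptions shown above; a full description length comparison would also have to account for how well each grammar encodes the observed sample.

    import math

    def description_length(rules, bits_per_token):
        # Toy model cost: every token in every rule costs a fixed number of bits.
        return sum(len(rule.split()) for rule in rules) * bits_per_token

    # Description 1: agreement expressed by splitting nonterminals.
    split_rules = [
        "S --> NP_sg VP_sg",
        "S --> NP_pl VP_pl",
    ]

    # Description 2: a single context-free rule plus an attribute constraint.
    feature_rules = [
        "S --> NP VP",
        "NP.number = VP.number",
    ]

    # Code both descriptions over the same symbol inventory so the comparison
    # is fair: each token costs log2(V) bits, where V is the combined vocabulary.
    vocab = {tok for rule in split_rules + feature_rules for tok in rule.split()}
    bits = math.log2(len(vocab))

    print(description_length(split_rules, bits))    # 8 tokens -> higher cost
    print(description_length(feature_rules, bits))  # 7 tokens -> lower cost

Under this crude coding the feature-based description comes out shorter, in line with the intuition that the attribute constraint generalizes the split-nonterminal rules; a unified metric of the kind envisioned above would make such comparisons systematic.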
