12.07.2015 Views

The dissertation of Andreas Stolcke is approved: University of ...



CHAPTER 5. PROBABILISTIC ATTRIBUTE GRAMMARS

theory of Markov networks (Pearl 1988) tells us that such marginal probabilities cannot be assigned locally in a consistent fashion.

A partial solution to this particular problem is the imposition of a global total ordering among agreement features, so that the entire system of constraints is again expressible as a product of conditional probabilities. An ordering based on the linguistic notion of phrase head might accomplish this: all RHS features could then only depend on RHS features associated with the distinguished head constituent. In any case, the convenient global bottom-up ordering of feature constraints would have to be given up, and the probabilities of the string component of a derivation would no longer be independent of the featural aspects of the grammar.

Systems of unordered constraints can still be given a probabilistic interpretation using well-known concepts from statistical physics. Instead of directly defining probabilities for local feature assignments, we could instead define an energy function that expresses the ‘badness’ of an assignment as an arbitrary positive number. The energy function can be defined by local components, namely as a product of local contributions, e.g., one for each rule. The above rule would thereby be translated into a term that gives low energy if and only if NP.number = VP.number. The total energy $E(f)$ of a complete feature assignment $f$ is then used to generate probabilities according to the Boltzmann distribution

$$P(f) = \frac{e^{-E(f)/T}}{Z}$$

where $Z$ is the normalizing constant (integral over the numerator), and $T$ is a parameter (the temperature) that controls the ‘peakedness’ of the distribution. [7]

This formulation is elegant and intuitively appealing (although it obviates the traditional concept of derivation). However, it carries a heavy computational price: simply obtaining the probability of a given string/feature assignment pair for various alternative grammars generally requires stochastic simulations in order to compute the constant $Z$. The posterior probabilities of models can also be evaluated using Monte-Carlo techniques (Neal 1993), but the approach seems too inefficient for the generate-and-test search strategies we have explored so far. On the other hand, the formulation also suggests investigating other learning paradigms, such as stochastic optimization via simulated annealing and Boltzmann machine learning (Geman & Geman 1984; Hinton & Sejnowski 1986).

5.5.2 Hierarchical features

In Chapter 4 we saw that the merging algorithm is quite capable of inducing recursive syntactic rules. However, in the PAG formalism there is no matching recursive structure in the feature descriptions: it is constrained to ‘flat’ feature structures. As a result, even the $L_0$ semantics of sentences with simple embedded relative clauses

    a circle is below a square which is to the left of a triangle which ...

[7] At $T \to \infty$ all configurations are equally probable, regardless of their energy, whereas at $T \to 0$ all probability mass is concentrated on the lowest energy configuration.
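
The Boltzmann formulation discussed above, together with the temperature limits noted in footnote 7, can be illustrated with a small sketch. It is not taken from the dissertation: the feature values `sg`/`pl`, the energy constants 1.0 and 5.0, and the function names are all assumptions chosen for illustration. The sketch covers a single agreement rule, so the total energy is just that rule's local term. Because the assignment space here has only four elements, the normalizing constant can be summed exhaustively; this does not contradict the point in the text that realistic grammars generally require stochastic simulation.

```python
import itertools
import math

# Hypothetical feature values for NP.number and VP.number.
NUMBER_VALUES = ("sg", "pl")


def agreement_energy(np_number, vp_number, match=1.0, mismatch=5.0):
    """Local energy term for one rule: low if and only if the numbers agree.

    The constants 1.0 and 5.0 are arbitrary illustrative choices.
    """
    return match if np_number == vp_number else mismatch


def boltzmann(energies, temperature):
    """Map each assignment f to exp(-E(f) / T) / Z.

    Z is computed by exhaustive summation, which is only feasible
    because the assignment space in this toy example is tiny.
    """
    weights = {f: math.exp(-e / temperature) for f, e in energies.items()}
    z = sum(weights.values())
    return {f: w / z for f, w in weights.items()}


# Enumerate every complete feature assignment (NP.number, VP.number).
energies = {
    (np_number, vp_number): agreement_energy(np_number, vp_number)
    for np_number, vp_number in itertools.product(NUMBER_VALUES, repeat=2)
}

if __name__ == "__main__":
    for temperature in (0.05, 1.0, 1e6):
        distribution = boltzmann(energies, temperature)
        print(temperature, distribution)
```

At a very high temperature the four assignments come out nearly equiprobable, while at a low temperature almost all mass falls on the two agreeing assignments, which share the lowest energy.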
