12.07.2015 Views

The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 5. PROBABILISTIC ATTRIBUTE GRAMMARS

    S --> A CIRCLE IS BELOW A SQUARE
        S.tr = CIRCLE.f1
        S.lm = SQUARE.f2
        S.rel = 'below'
    CIRCLE --> circle
        CIRCLE.f1 = 'circle'
    SQUARE --> square
        SQUARE.f2 = 'square'

The nonterminal merging operation merge(CIRCLE, SQUARE) = N then triggers the merging of features f1, f2.

    S --> A N IS BELOW A N
        S.tr = N.f
        S.lm = N.f
        S.rel = 'below'
    N --> circle
        N.f = 'circle'
      --> square
        N.f = 'square'

The nonterminal merging step also renders the two original productions for S identical, which means that the associated feature equations should be reconciled. To this end, the two instances of 'circle' are implicitly attributed to the first and second N.f feature (as part of the combined merging operators discussed in the previous section).

5.3.5 PAG Priors

Prior probability distributions for PAGs can be constructed using the now familiar techniques. The probabilities for the feature equations (involving both fixed values and RHS features) represent multinomial parameter tuples for each LHS feature, and are accordingly covered by a Dirichlet prior distribution. The feature equations themselves can be given priors based on description length. Each equation involving a constant feature value comes at a code length cost of log [...] bits. Equations with features on the RHS are encoded using $\log |F_X| + \log k$ bits, where $X$ is the RHS nonterminal in question, $F_X$ is the set of features attached to nonterminal $X$ throughout the grammar (since only those need to be encoded uniquely), and $k$ is the length of the production RHS.

Using a description length prior favors feature merging and other operations that achieve more compact feature specifications by virtue of collapsing feature names and equations. Alternatively, one could make all feature specifications a priori equally likely [4] and rely solely on the prior of the context-free part of the grammar for a bias for smaller rules. Since feature equations cannot exist independently of context-free productions, this will implicitly also bias the feature descriptions against complexity. Furthermore, redundant

[4] This is not strictly possible for a proper prior since the probability mass has to sum to one, so there has to be some sort of decay or cut-off for large grammars. However, when comparing grammars of limited, roughly equal size we can use a uniform prior as an approximation.
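
The effect of the description length prior on feature merging can be illustrated with a minimal Python sketch. It is not the dissertation's implementation. The tuple encoding of productions and equations, the names `feature_sets` and `equation_bits`, and the parameter `num_values` are all hypothetical. Three points are assumptions: that a constant-valued equation costs the base-2 logarithm of the number of distinct feature values, that the RHS length counts every RHS symbol, and that an intermediate grammar exists in which CIRCLE and SQUARE are already merged into N while features f1 and f2 are still separate.

```python
import math

# Hypothetical encoding (not from the source):
#   equation   = (lhs_feature, term)
#   term       = ("const", value) or ("feat", nonterminal, feature)
#   production = (lhs_nonterminal, rhs_symbols, equations)

def feature_sets(grammar):
    """Collect, per nonterminal, the features attached to it anywhere in the grammar."""
    sets = {}
    for lhs, _rhs, equations in grammar:
        for lhs_feature, term in equations:
            sets.setdefault(lhs, set()).add(lhs_feature)
            if term[0] == "feat":
                _, nonterminal, feature = term
                sets.setdefault(nonterminal, set()).add(feature)
    return sets

def equation_bits(grammar, num_values):
    """Total code length in bits of all feature equations in the grammar."""
    sets = feature_sets(grammar)
    total = 0.0
    for _lhs, rhs, equations in grammar:
        for _lhs_feature, term in equations:
            if term[0] == "const":
                # Assumption: log2 of the number of distinct feature values.
                total += math.log2(num_values)
            else:
                _, nonterminal, _feature = term
                # log of the feature set size of X, plus log of the RHS length.
                total += math.log2(len(sets[nonterminal])) + math.log2(len(rhs))
    return total

S_RHS = ("A", "N", "IS", "BELOW", "A", "N")

# Assumed intermediate state: nonterminals merged, features f1 and f2 not yet merged.
unmerged = [
    ("S", S_RHS, [("tr", ("feat", "N", "f1")),
                  ("lm", ("feat", "N", "f2")),
                  ("rel", ("const", "below"))]),
    ("N", ("circle",), [("f1", ("const", "circle"))]),
    ("N", ("square",), [("f2", ("const", "square"))]),
]

# Grammar after merging f1 and f2 into the single feature f.
merged = [
    ("S", S_RHS, [("tr", ("feat", "N", "f")),
                  ("lm", ("feat", "N", "f")),
                  ("rel", ("const", "below"))]),
    ("N", ("circle",), [("f", ("const", "circle"))]),
    ("N", ("square",), [("f", ("const", "square"))]),
]

NUM_VALUES = 3  # 'below', 'circle', 'square'
bits_unmerged = equation_bits(unmerged, NUM_VALUES)
bits_merged = equation_bits(merged, NUM_VALUES)
print(f"unmerged: {bits_unmerged:.3f} bits, merged: {bits_merged:.3f} bits")
```

Under these assumptions, collapsing the two feature names shrinks the feature set of N from two entries to one. Each of the two equations that refer to an N feature then becomes one bit cheaper, which is the sense in which the description length prior favors feature merging.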
