
The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 5. PROBABILISTIC ATTRIBUTE GRAMMARS

on adjacent parse tree nodes in a way similar to the belief propagation algorithm of Pearl (1988), conditioning on the top-level feature assignments and those given by the choice of SCFG rules. Once the conditional distribution of feature values on adjacent nodes is known, the likelihood is maximized by choosing a feature equation's probability to be proportional to the number of times the two features had the same value.

While this scheme is in principle capable of finding full PAGs of the kind shown above starting from randomly initialized feature probabilities, we found that it is also very prone to local maxima and highly dependent on the initial conditions of the parameters. As expected, the problem is observed increasingly in 'deep' grammars with many intermediate features, and hence hidden variables, as part of a derivation. Thus, the problems with the parameter estimation approach to structural learning that were previously observed with HMMs (Section 3.6.1) and SCFGs (Pereira & Schabes 1992) seem to show up even more severely in the PAG formalism. This gives further motivation to our investigation of merging-based learning approaches.

5.3 PAG Merging

The following sections describe the usual ingredients necessary to specify a merging-based learning algorithm for the PAG formalism.
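The re-estimation step for feature equations can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes the conditional distribution over adjacent-node feature values has already been summarized as, for each candidate equation, the pair of values observed on the two nodes in each derivation. The equation's probability is then set proportional to how often the two features agreed.

```python
from collections import Counter

def equation_probabilities(derivations):
    """Estimate feature-equation probabilities proportional to
    value-agreement counts on adjacent nodes.

    derivations: list of dicts mapping an equation label (hypothetical
    naming) to the (value_on_node_1, value_on_node_2) pair observed
    for that equation in one derivation.
    """
    agree = Counter()
    for d in derivations:
        for eq, (v1, v2) in d.items():
            if v1 == v2:          # the equation "fired": both features agree
                agree[eq] += 1
    total = sum(agree.values())
    # Probability proportional to the number of agreements.
    return {eq: c / total for eq, c in agree.items()}

obs = [{"X.f = Y.f": ("circle", "circle"), "X.f = Z.g": ("circle", "square")},
       {"X.f = Y.f": ("square", "square"), "X.f = Z.g": ("below", "below")}]
probs = equation_probabilities(obs)
```

In a full EM loop these estimates would feed back into the next round of conditional-distribution computation, which is exactly where the local-maxima problem noted above arises.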
Not surprisingly, we can build on the corresponding components of the SCFG merging algorithm.

5.3.1 Sample incorporation

The first step in the learning algorithm is the creation of ad hoc rules that allow new samples to be generated. We use newly created top-level productions to generate the sample strings as before, but augment these with feature value assignments to generate the observed attribute-value pairs. Thus, a sample

    (a circle is below a square, { tr = 'circle', lm = 'square', rel = 'below' })

is incorporated by creating productions

    S --> A1 CIRCLE1 IS1 BELOW1 A2 SQUARE1   (1)
         S.tr = 'circle'                     (1)
         S.lm = 'square'                     (1)
         S.rel = 'below'                     (1)
    A1 --> a                                 (1)
    CIRCLE1 --> circle                       (1)
    IS1 --> is                               (1)
    BELOW1 --> below                         (1)
    A2 --> a                                 (1)
    SQUARE1 --> square                       (1)

We extend our earlier notation to associate counts with both productions and individual feature equations. Counts on feature equations indicate how often an equation has been used in feature derivations, and can be used to compute likelihoods, probability estimates, etc. (Section 3.4.3).
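The incorporation step above can be sketched in a few lines. The function below is an illustrative reconstruction under simplifying assumptions (names and data layout are hypothetical): each word gets a fresh preterminal, a new top-level production covers the sample string, every production carries an initial count of 1, and the observed attribute-value pairs become feature-value assignments on the top-level rule.

```python
def incorporate_sample(words, features):
    """Create ad hoc productions and top-level feature assignments
    for one sample, each with an initial count of 1.

    words: list of terminal strings; features: attribute -> value dict.
    Returns (productions, equations) where productions are
    (lhs, rhs, count) triples and equations are (feature, value, count).
    """
    seen = {}          # per-word counter so repeated words get A1, A2, ...
    productions = []
    rhs = []
    for w in words:
        seen[w] = seen.get(w, 0) + 1
        nt = w.upper() + str(seen[w])   # fresh preterminal for this token
        rhs.append(nt)
        productions.append((nt, [w], 1))
    # New top-level production generating the whole sample string.
    productions.insert(0, ("S", rhs, 1))
    # Feature-value assignments on the top-level rule, count 1 each.
    equations = [("S." + attr, val, 1) for attr, val in features.items()]
    return productions, equations

prods, eqs = incorporate_sample(
    "a circle is below a square".split(),
    {"tr": "circle", "lm": "square", "rel": "below"})
```

Running this on the example sample yields the ten productions and three feature assignments listed above, ready for the subsequent merging steps to generalize.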
