12.07.2015 Views

The dissertation of Andreas Stolcke is approved: University of ...



CHAPTER 5. PROBABILISTIC ATTRIBUTE GRAMMARS

    N --> square [0.5]
          N1.f = ‘square’ [1.0]
      --> circle [0.5]
          N1.f = ‘square’ [1.0]

which preserves the likelihood of the original grammar.

The general strategy in combining feature operations with syntactic merging steps is to preserve determinism in the feature equations where possible, under the constraint that all previously generated samples can still be generated. The algorithm implementing this strategy has a striking similarity with standard feature unification. Thus, we can think of the feature operations in the example (1) as unifying a constant with the newly created feature f2, and (2) as unifying the features f1 and f2. Unlike standard unification we can resolve feature value conflicts by splitting probabilities.

Feature attribution. Although the above method can efficiently find many feature attributions, feature attributions must still be considered independently as potential search steps. A good heuristic to suggest promising candidates is correlation (or mutual information) between occurrences of nonterminals and feature values in the RHSs of productions and feature equations, respectively. Specifically, the operation fattrib(X, v) is considered if nonterminal X and feature value v have close to perfect correlation (or alternatively, large positive mutual information).

When applying an operator of type fattrib(X, v) one has to deal with the nondeterminism arising from multiple occurrences of X nonterminals in production RHSs. (Which instance of X should the value v be attributed to?)
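The correlation heuristic for proposing attribution candidates can be illustrated with a small sketch. The text does not say which statistic is used, so the phi coefficient (Pearson correlation of two presence/absence indicators) is an assumed choice here. The sample format, the function names `correlation_scores` and `candidate_attributions`, and the 0.9 threshold are likewise hypothetical and serve only to make the idea concrete.

```python
import math
from collections import Counter
from itertools import product


def correlation_scores(samples):
    """Phi coefficient between 'symbol occurs in the RHS' and 'value occurs
    in the feature equations', over a list of (rhs, values) samples.
    The choice of phi is an assumption; the source names no statistic."""
    n = len(samples)
    sym_count, val_count, joint_count = Counter(), Counter(), Counter()
    for rhs, values in samples:
        syms, vals = set(rhs), set(values)  # presence/absence only
        sym_count.update(syms)
        val_count.update(vals)
        joint_count.update(product(syms, vals))

    scores = {}
    for (sym, val), n11 in joint_count.items():
        n1, m1 = sym_count[sym], val_count[val]
        denom = math.sqrt(n1 * (n - n1) * m1 * (n - m1))
        # A symbol or value present in every sample carries no information.
        scores[(sym, val)] = (n * n11 - n1 * m1) / denom if denom else 0.0
    return scores


def candidate_attributions(samples, threshold=0.9):
    """Pairs (symbol, value) whose correlation is close to perfect.
    The threshold is a hypothetical cut-off, not taken from the source."""
    scores = correlation_scores(samples)
    return sorted(pair for pair, score in scores.items() if score >= threshold)


# Toy data: RHS symbols paired with the constants in the feature equations.
samples = [
    ("A CIRCLE IS BELOW A SQUARE".split(), {"circle", "square", "below"}),
    ("A CIRCLE IS ABOVE A TRIANGLE".split(), {"circle", "triangle", "above"}),
    ("A SQUARE IS ABOVE A TRIANGLE".split(), {"square", "triangle", "above"}),
    ("A TRIANGLE IS BELOW A CIRCLE".split(), {"triangle", "circle", "below"}),
]
print(candidate_attributions(samples))
```

On this toy data only the matching pairs such as (CIRCLE, circle) pass the threshold. Symbols like A and IS occur in every sample and score zero, so they are never proposed.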
A full search of all alternative attributions would discover that some yield feature merging opportunities that preserve the model likelihood, while others do not, and choose the appropriate alternative on that basis.

When processing samples in incremental mode a simple but effective approach is to delay (buffer) processing of those samples which lead to productions that give rise to ambiguity in the feature attributions. Specifically, by first processing samples with distinct feature values, attributions can be done deterministically. The ambiguities in the held-back samples are then implicitly resolved by merging with existing productions. For example,

    S --> A CIRCLE IS BELOW A SQUARE
          S.tr = ‘circle’
          S.lm = ‘square’
          S.rel = ‘below’
    S --> A CIRCLE IS BELOW A CIRCLE
          S.tr = ‘circle’
          S.lm = ‘circle’
          S.rel = ‘below’

The operation fattrib(CIRCLE, circle) is unambiguous for the first production, but not for the second one. The second production is therefore set aside, whereas the first one eventually (after several attribution steps) results in
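The hold-back step for ambiguous samples can be sketched as follows. This is a simplified illustration under stated assumptions, not the dissertation's implementation. The `Production` class and `schedule_attribution` function are hypothetical names. A production is treated as ambiguous for an attribution when the nonterminal occurs more than once in its RHS, which is an assumed criterion. The later resolution of held-back samples by merging is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Production:
    lhs: str
    rhs: list       # RHS symbols, in order
    features: dict  # feature name -> constant value on the LHS


def schedule_attribution(productions, nonterminal, value):
    """Split productions for an attribution of `value` to `nonterminal`.

    Returns (ready, held_back): productions where the attribution is
    deterministic, and productions that are buffered because the
    nonterminal occurs more than once in the RHS (assumed criterion).
    """
    ready, held_back = [], []
    for p in productions:
        positions = [i for i, sym in enumerate(p.rhs) if sym == nonterminal]
        if not positions or value not in p.features.values():
            continue  # the attribution does not apply to this production
        if len(positions) == 1:
            ready.append(p)
        else:
            held_back.append(p)
    return ready, held_back


# The two productions from the example in the text.
productions = [
    Production("S", "A CIRCLE IS BELOW A SQUARE".split(),
               {"tr": "circle", "lm": "square", "rel": "below"}),
    Production("S", "A CIRCLE IS BELOW A CIRCLE".split(),
               {"tr": "circle", "lm": "circle", "rel": "below"}),
]
ready, held_back = schedule_attribution(productions, "CIRCLE", "circle")
print([" ".join(p.rhs) for p in ready])
print([" ".join(p.rhs) for p in held_back])
```

With the two example productions, the first lands in `ready` and the second in `held_back`, matching the behavior described for fattrib(CIRCLE, circle).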
