It can be shown that $D(p \parallel q) \geq 0$ always, with equality if and only if the two distributions are identical. It follows that $D(p \parallel q) = 0$ iff $p = q$. This justifies thinking of $D(p \parallel q)$ as a pseudo-distance between distributions, or between distributions and models.

Computing $D(p \parallel q)$ exactly presumes knowledge of the full distributions $p$ and $q$, but in typical scenarios at most one is known, e.g., because it is given by a model. For example, we might use the relative entropy to define an estimator for model parameters $\theta$, such that the estimated value is the one that minimizes $D(p \parallel P(\cdot \mid \theta))$, where $p$ is the distribution from which the samples are drawn. Since $p$ is not known, we cannot compute $D(p \parallel P(\cdot \mid \theta))$ directly. However, note that for minimization purposes only the cross-entropy term in the decomposition

$$D(p \parallel q) = \sum_x p(x) \log \frac{p(x)}{q(x)} = -H(p) + H(p, q)$$

is relevant: the entropy $H(p)$ is an unknown constant, but one that remains fixed as $\theta$ is varied. This leaves the cross-entropy

$$H(p, q) = -\sum_x p(x) \log q(x)$$

to be minimized, which is the expected value of $-\log q(x)$ under the true distribution $p$. It can therefore be estimated by averaging over the sample corpus $x_1, \ldots, x_N$:

$$H(p, q) \approx -\frac{1}{N} \sum_{i=1}^{N} \log q(x_i) \qquad (2.10)$$

We thus see that the estimated cross-entropy is proportional (by a factor of $-\frac{1}{N}$) to the log of the likelihood (a numerical sketch of this estimate is given at the end of this section). Therefore, ML estimators are effectively also minimum relative entropy estimators.

2.3 Grammars with hidden variables

Although all grammars considered here generate their samples through a combination of multinomials, the sequence of choices that can give rise to a given sample is not always uniquely determined, unlike for $n$-gram grammars. There, one can uniquely identify the sequence of choices leading to the generation of a complete string by inspecting the $n$-grams occurring in the string in left-to-right order. Knowing the $n$-grams, one can then compute their probabilities, and hence the probability of the string itself (by taking products; see the second sketch below).

A complete sequence of generator events (multinomial samples) that generates string $x$ is called a derivation of $x$. (Thus, for $n$-gram models the only derivation of a string is the string itself.) Grammars that generate strings with more than one derivation are called ambiguous. Each derivation $d$ has a probability, which is the product of the probabilities of the multinomial outcomes making up the derivation. In general, then, a string probability is a sum of derivation probabilities, namely, of all derivations generating the same string:

$$P(x \mid \theta) = \sum_{d \text{ derives } x} P(d \mid \theta)$$
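As a concrete illustration of the estimate in (2.10), the following Python sketch (not from the dissertation; the distributions and the sample corpus are arbitrary placeholders) checks the decomposition $D(p \parallel q) = H(p, q) - H(p)$ numerically and shows that the empirical cross-entropy is exactly $-\frac{1}{N}$ times the log-likelihood of the corpus under the model.

    import math

    # Hypothetical categorical distributions over a small alphabet.
    p = {"a": 0.5, "b": 0.3, "c": 0.2}   # "true" sample-generating distribution
    q = {"a": 0.4, "b": 0.4, "c": 0.2}   # model distribution

    def relative_entropy(p, q):
        """D(p || q) = sum_x p(x) log(p(x)/q(x)); nonnegative, zero iff p == q."""
        return sum(px * math.log(px / q[x]) for x, px in p.items())

    def cross_entropy(p, q):
        """H(p, q) = -sum_x p(x) log q(x)."""
        return -sum(px * math.log(q[x]) for x, px in p.items())

    def entropy(p):
        """H(p) = -sum_x p(x) log p(x)."""
        return -sum(px * math.log(px) for px in p.values())

    # D(p || q) = H(p, q) - H(p): only the cross-entropy term varies with the model.
    assert abs(relative_entropy(p, q) - (cross_entropy(p, q) - entropy(p))) < 1e-12

    # Estimate H(p, q) as in (2.10) by averaging -log q(x) over a sample corpus;
    # this equals -1/N times the log-likelihood of the corpus under q.
    corpus = ["a", "a", "b", "c", "a", "b"]     # hypothetical sample drawn from p
    N = len(corpus)
    log_likelihood = sum(math.log(q[x]) for x in corpus)
    print(-log_likelihood / N)   # minimizing this over q maximizes the likelihood

Because $H(p)$ does not depend on the model, driving this sample average down is the same as driving the likelihood up, which is the ML/minimum-relative-entropy equivalence noted above.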
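The $n$-gram case described at the start of Section 2.3 can be sketched in the same style. The bigram table below is hypothetical; the point is only that the derivation of a string is unique, so its probability is the product of the bigram probabilities read off left to right.

    # Hypothetical bigram probabilities P(w_i | w_{i-1}), with sentence
    # boundary markers <s> and </s>.
    bigram = {
        ("<s>", "the"): 0.6,
        ("the", "dog"): 0.2,
        ("dog", "barks"): 0.5,
        ("barks", "</s>"): 0.9,
    }

    def string_prob(words, bigram):
        """P(string) = product of P(w_i | w_{i-1}) over the padded sequence."""
        padded = ["<s>"] + words + ["</s>"]
        prob = 1.0
        for prev, cur in zip(padded, padded[1:]):
            prob *= bigram[(prev, cur)]
        return prob

    print(string_prob(["the", "dog", "barks"], bigram))  # 0.6*0.2*0.5*0.9 = 0.054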
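For an ambiguous grammar, by contrast, the final equation above sums over all derivations of a string. The toy grammar below (rules and probabilities are invented for illustration, not taken from the dissertation) generates the string "ab" by two distinct rule sequences, so its probability is the sum of two derivation probabilities, each a product of rule (multinomial) probabilities.

    # Hypothetical rule probabilities; for each left-hand side they sum to 1.
    rules = {
        "S -> A B": 0.4,   # derivation 1 expands S this way ...
        "S -> a B": 0.6,   # ... derivation 2 uses this rule instead
        "A -> a":   1.0,
        "B -> b":   0.7,
        "B -> c":   0.3,
    }

    # The two derivations of the string "ab", listed explicitly.
    derivations_of_ab = [
        ["S -> A B", "A -> a", "B -> b"],   # derivation 1
        ["S -> a B", "B -> b"],             # derivation 2
    ]

    def derivation_prob(derivation, rules):
        """P(d) = product of the probabilities of the rules used in d."""
        prob = 1.0
        for rule in derivation:
            prob *= rules[rule]
        return prob

    # P(x) = sum over all derivations d of x of P(d)
    p_ab = sum(derivation_prob(d, rules) for d in derivations_of_ab)
    print(p_ab)   # 0.4*1.0*0.7 + 0.6*0.7 = 0.70

Which derivation produced a given string is thus a hidden variable: only the string is observed, while the probability model is defined over derivations.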
