The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 2. FOUNDATIONS

2.5.5 Structure vs. parameter priors

A grammatical model is here described in two stages:

1. A model structure or topology is specified as a set of states, nonterminals, transitions, productions, etc. (depending on the type of model). These elements represent discrete choices as to which derivations from the grammar can have non-zero probability.
2. Conditional on a given structure, the model's continuous parameters are specified. These are typically multinomial parameters.

We will write $M = (M_S, \theta_M)$ to describe the decomposition of model $M$ into the structure part $M_S$ and the parameter part $\theta_M$. The model prior $P(M)$ can therefore be written as

$$P(M) = P(M_S)\, P(\theta_M \mid M_S)$$

Even this framework leaves some room for choice: as discussed earlier, one may choose to make the structure specification very unconstrained, e.g., by allowing all probability parameters to take on non-zero values, effectively pushing the structure specification into the parameter choice. Examples of this will be discussed in the following chapters.

2.5.5.1 Priors for multinomial parameters

Since the continuous parameters in the grammar types dealt with in this thesis are all from multinomial distributions, it is convenient to discuss some standard priors for this type of distribution at this point.

Each multinomial represents a discrete, finite probabilistic choice of some event.
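The two-stage description can be made concrete with a small data structure. The sketch below is hypothetical: it assumes a context-free grammar in which each nonterminal's expansion is one multinomial choice, and the class `TwoStageGrammar` and its methods are invented for illustration rather than taken from the thesis. The structure part fixes which productions may receive non-zero probability; the parameter part attaches one probability vector to each nonterminal.

```python
import random
from dataclasses import dataclass


@dataclass
class TwoStageGrammar:
    """Hypothetical container for a model M = (M_S, theta_M).

    structure: the discrete part M_S, mapping each nonterminal to the
        right-hand sides it may expand to with non-zero probability.
    params: the continuous part theta_M, one multinomial (probability
        vector) per nonterminal, aligned with structure[lhs].
    """

    structure: dict[str, list[tuple[str, ...]]]
    params: dict[str, list[float]]

    def validate(self, tol: float = 1e-9) -> None:
        """Check that every nonterminal has a proper probability vector."""
        if self.structure.keys() != self.params.keys():
            raise ValueError("structure and params must cover the same nonterminals")
        for lhs, rhss in self.structure.items():
            probs = self.params[lhs]
            if len(probs) != len(rhss):
                raise ValueError(f"{lhs}: {len(rhss)} productions, {len(probs)} parameters")
            if any(p < 0.0 for p in probs) or abs(sum(probs) - 1.0) > tol:
                raise ValueError(f"{lhs}: parameters must be non-negative and sum to 1")

    def num_free_parameters(self) -> int:
        """A multinomial over n choices has n - 1 free parameters."""
        return sum(len(rhss) - 1 for rhss in self.structure.values())

    def expand(self, lhs: str, rng: random.Random) -> tuple[str, ...]:
        """Make one multinomial choice: sample a right-hand side for lhs."""
        return rng.choices(self.structure[lhs], weights=self.params[lhs], k=1)[0]


# Hypothetical toy grammar: two nonterminals, each a two-way multinomial choice.
grammar = TwoStageGrammar(
    structure={"S": [("NP", "VP"), ("VP",)], "NP": [("det", "n"), ("n",)]},
    params={"S": [0.8, 0.2], "NP": [0.6, 0.4]},
)
grammar.validate()
print(grammar.num_free_parameters())  # 2
print(grammar.expand("S", random.Random(0)))
```

Encoding an absent production as a zero-probability entry in a larger structure would express the same support through the parameters instead, which is the alternative the text describes for an unconstrained structure specification.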
Let $n$ be the number of choices in a multinomial and $\theta = (\theta_1, \ldots, \theta_n)$ the probability parameters associated with each choice (only $n - 1$ of these parameters are free, since $\sum_{i=1}^{n} \theta_i = 1$).

A standard prior for multinomials is the Dirichlet distribution

$$P(\theta) = \frac{1}{B(\alpha_1, \ldots, \alpha_n)} \prod_{i=1}^{n} \theta_i^{\alpha_i - 1} \qquad (2.14)$$

where $\alpha_1, \ldots, \alpha_n$ are parameters of the prior which can be given an intuitive interpretation (see below). The normalizing constant $B(\alpha_1, \ldots, \alpha_n)$ is the $n$-dimensional Beta function,

$$B(\alpha_1, \ldots, \alpha_n) = \frac{\Gamma(\alpha_1) \cdots \Gamma(\alpha_n)}{\Gamma(\alpha_0)}$$

where $\alpha_0 = \sum_{i=1}^{n} \alpha_i$ is the total prior weight.

The prior weights $\alpha_i$ determine the bias embodied in the prior: the prior expectation of $\theta_i$ is $\alpha_i / \alpha_0$.

One important reason for the use of the Dirichlet prior in the case of multinomial parameters (Cheeseman et al. 1988; Cooper & Herskovits 1992; Buntine 1992) is its mathematical expediency. It is …
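Equation (2.14), its Beta-function normalizer, and the prior expectation $\alpha_i / \alpha_0$ can be evaluated directly. The following is a minimal Python sketch, assuming a strictly positive parameter vector that sums to 1; the function names are hypothetical. Working in log space with `math.lgamma` avoids overflow of $\Gamma(\alpha_i)$ for large prior weights.

```python
import math
from typing import Sequence


def log_beta(alpha: Sequence[float]) -> float:
    """log B(alpha_1, ..., alpha_n) = sum_i log Gamma(alpha_i) - log Gamma(alpha_0)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_log_pdf(theta: Sequence[float], alpha: Sequence[float]) -> float:
    """log P(theta) under a Dirichlet prior with weights alpha, as in Eq. (2.14)."""
    if len(theta) != len(alpha):
        raise ValueError("theta and alpha must have the same length n")
    if any(a <= 0.0 for a in alpha):
        raise ValueError("prior weights alpha_i must be positive")
    if any(t <= 0.0 for t in theta) or not math.isclose(sum(theta), 1.0, abs_tol=1e-9):
        raise ValueError("theta must be strictly positive and sum to 1")
    return sum((a - 1.0) * math.log(t) for t, a in zip(theta, alpha)) - log_beta(alpha)


def dirichlet_mean(alpha: Sequence[float]) -> list[float]:
    """Prior expectation E[theta_i] = alpha_i / alpha_0, with alpha_0 = sum_i alpha_i."""
    alpha_0 = sum(alpha)
    return [a / alpha_0 for a in alpha]


# Uniform weights (all alpha_i = 1) give a flat density equal to Gamma(n) = (n - 1)!.
print(math.exp(dirichlet_log_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])))
# Larger weight on a choice shifts the prior mean toward it.
print(dirichlet_mean([2.0, 3.0, 5.0]))
```

With all $\alpha_i = 1$ the prior is uniform over the probability simplex; for a fixed mean $\alpha_i / \alpha_0$, a larger total weight $\alpha_0$ concentrates the prior more tightly around that mean.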
