The dissertation of Andreas Stolcke is approved: University of ...

CHAPTER 2. FOUNDATIONS

a conjugate prior, i.e., of the same functional form as the likelihood function for the multinomial. The likelihood for a sample from the multinomial with total observed outcomes $c_1, \ldots, c_n$ is given by equation (2.4). This means that the prior (2.14) and the likelihood (2.4) combine according to Bayes' law to give an expression for the posterior density that is again of the same form, namely:

$$P(\theta \mid c_1, \ldots, c_n) = \frac{1}{B(c_1 + \alpha_1, \ldots, c_n + \alpha_n)} \prod_{i=1}^{n} \theta_i^{c_i + \alpha_i - 1}. \tag{2.15}$$

Furthermore, it is convenient that the integral over the product of (2.14) and (2.4) has a closed-form solution.

$$\int_{\theta} P(\theta)\, P(c_1, \ldots, c_n \mid \theta)\, d\theta = \frac{1}{B(\alpha_1, \ldots, \alpha_n)} \int_{\theta} \prod_{i=1}^{n} \theta_i^{c_i + \alpha_i - 1}\, d\theta = \frac{B(c_1 + \alpha_1, \ldots, c_n + \alpha_n)}{B(\alpha_1, \ldots, \alpha_n)} \tag{2.16}$$

This integral will be used to compute the posterior for a given model structure, as detailed in Sections 2.5.7 and 3.4.3.

To get an intuition for the effect of the Dirichlet prior it is helpful to look at the two-dimensional case. For $n = 2$ there is only one free parameter, say $\theta_1 = p$, which we can identify with the probability of heads in a biased coin flip ($\theta_2 = 1 - p$ being the probability of tails). Assume there is no a priori reason to prefer either outcome, i.e., the prior distribution should be symmetrical about the value $p = 0.5$. This symmetry entails a choice of $\alpha_i$'s which are equal, in our case $\alpha_1 = \alpha_2 = \alpha$. The resulting prior distribution is depicted in Figure 2.4(a), for various values of $\alpha$.
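The conjugate update and the closed-form integral can be illustrated with a short Python sketch. It assumes that the prior (2.14) is the standard Dirichlet density normalized by the multivariate Beta function $B$, and that the likelihood (2.4) is the plain product of parameters raised to the observed counts (no multinomial coefficient); neither equation appears on this page, so both are assumptions. The function names are hypothetical and chosen only for illustration.

```python
import math

def log_beta(alphas):
    """Log of the multivariate Beta function B(a_1, ..., a_n)."""
    return sum(math.lgamma(a) for a in alphas) - math.lgamma(sum(alphas))

def posterior_params(alphas, counts):
    """Dirichlet parameters of the posterior: prior parameter plus observed count."""
    return [a + c for a, c in zip(alphas, counts)]

def log_evidence(alphas, counts):
    """Log of the integral of prior times likelihood: log B(c + alpha) - log B(alpha)."""
    return log_beta(posterior_params(alphas, counts)) - log_beta(alphas)

# Coin-flip case (n = 2) with a symmetric prior alpha_1 = alpha_2 = 2
# and an assumed sample of 3 heads and 1 tail.
alphas = [2.0, 2.0]
counts = [3, 1]
print(posterior_params(alphas, counts))
print(math.exp(log_evidence(alphas, counts)))
```

Working in log space with `math.lgamma` avoids overflow of the Gamma function once the counts become large.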
For $\alpha_i > 1$ the prior has the effect of adding $\alpha_i - 1$ 'virtual' samples to the likelihood expression, resulting in a MAP estimate of

$$\hat{\theta}_i = \frac{c_i + \alpha_i - 1}{\sum_{j=1}^{n} (c_j + \alpha_j - 1)}. \tag{2.17}$$

For $0 < \alpha_i < 1$ the MAP estimate is biased towards the extremes of the parameter space, $\theta_i = 0$ and $\theta_i = 1$. For $\alpha_i = 1$ the prior is uniform and the MAP estimate is identical to the ML estimate.

Figure 2.4(b) shows the effect of varying amounts of data $N$ (total number of samples) on the posterior distribution. With no data ($N = 0$) the posterior is identical to the prior, illustrated here for $\alpha = 2$. As $N$ increases the posterior peaks around the ML parameter setting.

2.5.6 Description Length priors

The MDL framework is most useful for designing priors for the discrete, structural aspects of grammars. Any (prefix-free) coding scheme for models that assigns $M$ a code length $\ell(M)$ can be used to induce a prior distribution over models with

$$P(M) \propto e^{-\ell(M)}.$$

We can take advantage of this fact to design 'natural' priors for many domains. This idea will be extensively used with all grammar types.
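As a minimal sketch of how a code length can induce a prior, the following Python function turns a table of code lengths into normalized prior probabilities. It assumes the lengths are measured in nats, so that the weight of a model is $e^{-\ell(M)}$, and it normalizes over a finite candidate set, which is a simplification made here for illustration. The function name and the three-model example are hypothetical.

```python
import math

def description_length_prior(code_lengths):
    """Map {model: code length in nats} to a normalized prior P(M) ∝ exp(-length)."""
    # Subtract the shortest length before exponentiating to avoid underflow;
    # the shift cancels in the normalization.
    shortest = min(code_lengths.values())
    weights = {m: math.exp(-(length - shortest)) for m, length in code_lengths.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# Hypothetical prefix-free code over three models with codewords of
# 1, 2 and 2 bits; bits are converted to nats by multiplying by ln 2.
lengths_in_bits = {"A": 1, "B": 2, "C": 2}
lengths_in_nats = {m: bits * math.log(2) for m, bits in lengths_in_bits.items()}
prior = description_length_prior(lengths_in_nats)
print(prior)
```

Shorter descriptions receive exponentially more prior mass, so a model whose code is one bit longer is half as probable a priori.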
