12.07.2015 Views

The dissertation of Andreas Stolcke is approved: University of ...



CHAPTER 2. FOUNDATIONS

distinction between language (distribution) and grammar (description), and simply write $P(x \mid M)$.

2.2.1 Interpretation of probabilities

The probabilities assigned to strings by a model can be interpreted as long-term relative frequencies of those strings, assuming conditional independence of samples. This frequentist interpretation becomes problematic as we introduce probabilities of entities that do not correspond to outcomes of repeatable experiments, such as the models themselves.

In such cases it is more plausible to think of probabilities as degrees of belief, the subjectivist interpretation of probabilities. The classic paper by Cox (1946) shows that the two interpretations are compatible in that they observe the same calculus if some simple assumptions about beliefs and their compositional properties are made.

The unification of the frequentist and the subjectivist treatment of probabilities is fundamental to the Bayesian approach described later. It allows probabilities to be used as the common 'currency' relating observations and beliefs about underlying explanations for observations.

2.2.2 An example: $n$-gram models

A simple example both illustrates these notions and introduces a useful standard tool for later use. An $n$-gram model defines the probability of a string $P(x \mid M)$ as a product of conditional probabilities

$$
P(x \mid M) = P(x_1 \mid \$; M)\, P(x_2 \mid \$\, x_1; M) \cdots P(x_i \mid x_{i-n+1} \ldots x_{i-1}; M) \cdots P(x_\ell \mid x_{\ell-n+1} \ldots x_{\ell-1}; M)\, P(\$ \mid x_{\ell-n+2} \ldots x_\ell; M) \tag{2.1}
$$

where $\ell$ is the string length, $x_i$ is the $i$th symbol in string $x$, and \$ is a special delimiter marking beginning and end of a string.

The parameters of an $n$-gram model are thus the probabilities $P(x_n \mid x_1 \ldots x_{n-1})$ for all $x_1, \ldots, x_n$ drawn from the alphabet extended by \$. (It is convenient to allow a prefix of $x_1 \ldots x_{n-1}$ to take on the value \$ to denote the left end of the string. The context always makes it clear which end of a string \$ stands for.)

It is useful to compare definition (2.1) to the expression for $P(x \mid M)$ arrived at by repeated conditioning, which is true of any probability distribution over strings:

$$
P(x \mid M) = P(x_1 \mid \$; M)\, P(x_2 \mid \$\, x_1; M) \cdots P(x_\ell \mid \$\, x_1 \ldots x_{\ell-1}; M)\, P(\$ \mid \$\, x_1 \ldots x_\ell; M) \tag{2.2}
$$

We see that $n$-gram models represent exactly those distributions in which each symbol is independent of the beginning of the string given just the immediately preceding $n - 1$ symbols (including the position of the left string boundary).
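The product in definition (2.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the conditional probabilities are obtained, so the sketch estimates them as relative frequencies from a small sample, and the names `contexts_and_symbols`, `train`, and `string_probability` are hypothetical helpers, not part of the dissertation.

```python
from collections import defaultdict
from fractions import Fraction

BOUNDARY = "$"  # delimiter marking both ends of a string, as in (2.1)


def contexts_and_symbols(string, n):
    """Yield one (context, symbol) pair per factor of definition (2.1)."""
    padded = [BOUNDARY] + list(string) + [BOUNDARY]
    for i in range(1, len(padded)):
        # At most n - 1 preceding symbols; near the left end the context
        # is shorter and begins with the boundary marker.
        start = max(0, i - (n - 1))
        yield tuple(padded[start:i]), padded[i]


def train(samples, n):
    """Count how often each symbol follows each context.

    Assumption: parameters are relative-frequency estimates from the
    samples; the source text does not prescribe an estimation method.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sample in samples:
        for context, symbol in contexts_and_symbols(sample, n):
            counts[context][symbol] += 1
    return counts


def string_probability(string, counts, n):
    """Multiply the conditional probabilities of (2.1) for one string."""
    probability = Fraction(1)
    for context, symbol in contexts_and_symbols(string, n):
        following = counts.get(context, {})
        total = sum(following.values())
        if total == 0:
            return Fraction(0)  # context never observed in the sample
        probability *= Fraction(following.get(symbol, 0), total)
    return probability


if __name__ == "__main__":
    bigram_counts = train(["ab", "ab", "aa"], n=2)
    print(string_probability("ab", bigram_counts, n=2))  # 1/2
    print(string_probability("aa", bigram_counts, n=2))  # 1/16
```

For the bigram case, the string "ab" factors as $P(a \mid \$)\, P(b \mid a)\, P(\$ \mid b) = 1 \cdot \tfrac{2}{4} \cdot 1$. Exact fractions are used so the individual factors stay easy to check by hand.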
