where $n_i$ is the number of occurrences of $i$ in a sample of $n = \sum_{i=1}^{K} n_i$ points from the discrete distribution on $\{1, \dots, K\}$ defined by $X$. Then,

$$X \mid \beta = (n_1, \dots, n_K) \sim \mathrm{Dir}(\alpha + \beta).$$

This relationship is used in Bayesian statistics to estimate the hidden parameters $X$, given a collection of $n$ samples. Intuitively, if the prior is represented as $\mathrm{Dir}(\alpha)$, then $\mathrm{Dir}(\alpha + \beta)$ is the posterior following a sequence of observations with histogram $\beta$.

The Dirichlet process was introduced by Ferguson (1973) as the extension of the Dirichlet distribution from finite dimensions to infinite dimensions. It is a distribution over distributions and has two parameters: the shape parameter $G_0$ is a distribution over a sample space $\Omega$, and the concentration parameter $\alpha_0$ is a positive scalar. They have interpretations similar to those of their counterparts in the Dirichlet distribution. The formal definition is the following:

Definition. The Dirichlet process over a set $\Omega$ is a stochastic process whose sample path is a probability distribution over $\Omega$. For a random distribution $F$ distributed according to a Dirichlet process $\mathrm{DP}(\alpha_0, G_0)$, given any finite measurable partition $A_1, A_2, \dots, A_K$ of the sample space $\Omega$, the random vector $(F(A_1), \dots, F(A_K))$ is distributed as a Dirichlet distribution with parameters $(\alpha_0 G_0(A_1), \dots, \alpha_0 G_0(A_K))$.

Using the results from the Dirichlet distribution, for any measurable set $A$, the random variable $F(A)$ has mean $G_0(A)$ and variance

$$\frac{G_0(A)\,\bigl(1 - G_0(A)\bigr)}{\alpha_0 + 1}.$$

The mean implies that the shape parameter $G_0$ represents the center of a random distribution $F$ drawn from a Dirichlet process $\mathrm{DP}(\alpha_0, G_0)$. Define $a_i \sim F$ as an observation drawn from the distribution $F$.
Because, by definition, $P(a_i \in A \mid F) = F(A)$, we can derive

$$P(a_i \in A \mid G_0) = E\bigl(P(a_i \in A \mid F) \mid G_0\bigr) = E\bigl(F(A) \mid G_0\bigr) = G_0(A).$$

Hence, the shape parameter $G_0$ is also the marginal distribution of an observation $a_i$. The variance implies that the concentration parameter $\alpha_0$ controls how close the random distribution $F$ is to the shape parameter $G_0$. The larger $\alpha_0$ is, the more likely $F$ is to be close to $G_0$, and vice versa.
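The mean and variance of $F(A)$ can be checked numerically on a fixed partition. The following is a minimal sketch, not part of the original derivation: the three-cell partition masses `g0_masses`, the value of `alpha0`, and the function names are illustrative assumptions. It uses the fact from the definition that $(F(A_1), \dots, F(A_K))$ is Dirichlet with parameters $\alpha_0 G_0(A_j)$, and compares Monte Carlo moments with the closed-form expressions.

```python
import numpy as np


def dp_partition_moments(alpha0, g0_masses):
    """Closed-form mean and variance of F(A_j) for F ~ DP(alpha0, G0)."""
    g0 = np.asarray(g0_masses, dtype=float)
    mean = g0
    var = g0 * (1.0 - g0) / (alpha0 + 1.0)
    return mean, var


def sample_partition_masses(alpha0, g0_masses, n_draws, rng):
    """Draw (F(A_1), ..., F(A_K)) from Dir(alpha0 * G0(A_1), ..., alpha0 * G0(A_K))."""
    g0 = np.asarray(g0_masses, dtype=float)
    return rng.dirichlet(alpha0 * g0, size=n_draws)


# Hypothetical setup: G0 assigns these masses to a three-cell partition.
rng = np.random.default_rng(0)
g0_masses = [0.2, 0.3, 0.5]
alpha0 = 4.0

draws = sample_partition_masses(alpha0, g0_masses, 200_000, rng)
mean, var = dp_partition_moments(alpha0, g0_masses)

print("empirical mean:", draws.mean(axis=0), "theory:", mean)
print("empirical var: ", draws.var(axis=0), "theory:", var)
```

Rerunning the sketch with a larger `alpha0` shrinks the variance toward zero, which is the sense in which the concentration parameter pulls $F$ toward $G_0$.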
Suppose there are $n$ observations, $a = (a_1, \dots, a_n)$, drawn from the distribution $F$. Use $\sum_{i=1}^{n} \delta_{a_i}(A_j)$ to represent the number of $a_i$ in the set $A_j$, where $A_1, \dots, A_K$ is a measurable partition of the sample space $\Omega$ and $\delta_{a_i}$ is the Dirac measure,

$$\delta_{a_i}(A_j) = \begin{cases} 1 & \text{if } a_i \in A_j, \\ 0 & \text{if } a_i \notin A_j. \end{cases}$$

Conditional on $(F(A_1), \dots, F(A_K))$, the vector $\left(\sum_{i=1}^{n} \delta_{a_i}(A_1), \dots, \sum_{i=1}^{n} \delta_{a_i}(A_K)\right)$ has a multinomial distribution. By the conjugacy of the Dirichlet distribution to the multinomial distribution, the posterior distribution of $(F(A_1), \dots, F(A_K))$ is still a Dirichlet distribution:

$$(F(A_1), \dots, F(A_K)) \mid a \sim \mathrm{Dir}\left(\alpha_0 G_0(A_1) + \sum_{i=1}^{n} \delta_{a_i}(A_1), \; \dots, \; \alpha_0 G_0(A_K) + \sum_{i=1}^{n} \delta_{a_i}(A_K)\right).$$

Because this result is valid for any finite measurable partition, the posterior of $F$ is still a Dirichlet process by definition, with new parameters $\alpha_0^*$ and $G_0^*$, where

$$\alpha_0^* = \alpha_0 + n,$$

$$G_0^* = \frac{\alpha_0}{\alpha_0 + n}\, G_0 + \frac{n}{\alpha_0 + n} \cdot \frac{\sum_{i=1}^{n} \delta_{a_i}}{n}.$$

The posterior shape parameter, $G_0^*$, is a mixture of the prior shape parameter and the empirical distribution implied by the observations. As $n \to \infty$, the shape parameter of the posterior converges to the empirical distribution. The concentration parameter $\alpha_0^* \to \infty$ implies that the posterior of $F$ converges to the empirical distribution with probability one. Ferguson (1973) showed that a random distribution drawn from a Dirichlet process is almost surely discrete, although the shape parameter $G_0$ can be continuous.
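The posterior update can be illustrated on a single finite partition. The sketch below is illustrative only: the function name `dp_posterior_on_partition`, the prior masses, and the cell counts are assumptions introduced for the example. It computes $\alpha_0^*$, the masses $G_0^*(A_j)$, and the posterior Dirichlet parameters, and confirms that $\alpha_0^* G_0^*(A_j)$ equals $\alpha_0 G_0(A_j)$ plus the count in $A_j$.

```python
import numpy as np


def dp_posterior_on_partition(alpha0, g0_masses, counts):
    """Posterior DP parameters restricted to a finite partition A_1, ..., A_K.

    g0_masses[j] is G0(A_j); counts[j] is the number of observations in A_j.
    Returns (alpha_star, g_star, dirichlet_params).
    """
    g0 = np.asarray(g0_masses, dtype=float)
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()

    # Conjugate Dirichlet update on the partition: prior parameter plus count.
    dirichlet_params = alpha0 * g0 + counts

    # Posterior concentration and shape (prior mixed with empirical distribution).
    alpha_star = alpha0 + n
    empirical = counts / n if n > 0 else np.zeros_like(g0)
    g_star = (alpha0 / alpha_star) * g0 + (n / alpha_star) * empirical
    return alpha_star, g_star, dirichlet_params


# Hypothetical example: three cells, eight observations.
alpha0 = 2.0
g0_masses = [0.25, 0.25, 0.5]
counts = [6, 1, 1]

alpha_star, g_star, params = dp_posterior_on_partition(alpha0, g0_masses, counts)
print("alpha*:", alpha_star)
print("G0*(A_j):", g_star)
print("Dirichlet parameters:", params)
```

With these inputs the weight on the empirical distribution is $n / (\alpha_0 + n) = 0.8$, so the posterior shape is already dominated by the data; increasing the counts pushes that weight toward one.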