
2 Nonparametric Bayesian Analysis

belief. In this case, one would need to use priors from a larger family. Using a prior distribution from a more general family will typically result in the integral in eq. (2.2) being intractable, and hence in increased computational complexity in posterior calculations. Conditionally conjugate priors provide a richer family of distributions while retaining some of the tractability.

Bayesian inference refers to obtaining the posterior distributions for the parameters of interest and extracting information about these parameters from the posterior. After updating our beliefs about the model parameters using Bayes' rule, we can use the model to make predictions about new observations x*, or about any unobserved quantity φ whose distribution depends on the parameters, using the predictive distribution,

\[
P(\phi \mid X) = \int P(\phi \mid \Psi)\, P(\Psi \mid X)\, d\Psi. \tag{2.4}
\]
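In practice, the integral in eq. (2.4) is often approximated by Monte Carlo: draw samples of Ψ from the posterior and average the likelihood of the new observation over them. The sketch below illustrates this; the Gaussian likelihood, the parameter values, and the pretend posterior samples are illustrative assumptions, not part of the text.

```python
# A minimal Monte Carlo sketch of the predictive distribution in eq. (2.4):
# P(x* | X) ~= (1/S) * sum_s P(x* | psi_s), with psi_s ~ P(Psi | X).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Stand-in posterior over a Gaussian mean: N(1.0, 0.2^2); noise sigma known.
psi_samples = rng.normal(loc=1.0, scale=0.2, size=5000)

def predictive_density(x_new, psi_samples, sigma=1.0):
    """Average the likelihood of x_new over the posterior samples."""
    return stats.norm.pdf(x_new, loc=psi_samples, scale=sigma).mean()

print(predictive_density(0.5, psi_samples))  # density of a new observation
```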

We may be asked to give a single prediction value φ̂, referred to as a point estimate, rather than the distribution of the predictions. In this case, we choose a value φ̂ from the predictive distribution that minimizes the expected loss for a given loss function L(φ̂, φ),

\[
\mathbb{E}\!\left[ L(\hat{\phi}, \phi) \right] = \int L(\hat{\phi}, \phi)\, P(\phi \mid X)\, d\phi. \tag{2.5}
\]

The optimal choice φ̂ is referred to as the Bayes estimate.
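As a concrete illustration, a Bayes estimate can be approximated by evaluating the Monte Carlo version of the expected loss in eq. (2.5) on a grid of candidate values and picking the minimizer. This is only a sketch under assumed inputs: the Gamma "posterior" samples, the grid, and the squared error loss are placeholders.

```python
# A minimal sketch: minimize the Monte Carlo expected loss of eq. (2.5)
# over a grid of candidate point estimates.
import numpy as np

rng = np.random.default_rng(0)
phi_samples = rng.gamma(shape=2.0, scale=1.0, size=5000)  # stand-in for P(phi | X)

def bayes_estimate(phi_samples, loss, candidates):
    """Return the candidate with the smallest Monte Carlo expected loss."""
    risks = [loss(c, phi_samples).mean() for c in candidates]
    return candidates[int(np.argmin(risks))]

squared_loss = lambda phi_hat, phi: (phi - phi_hat) ** 2
grid = np.linspace(0.0, 8.0, 801)
print(bayes_estimate(phi_samples, squared_loss, grid))  # close to the mean
print(phi_samples.mean())                               # the posterior mean
```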

In density modeling, we want to model the generating density in the light of the observations. The Kullback-Leibler (KL) divergence is a standard measure of the discrepancy between two distributions. The discrepancy between an estimated density Q(φ) and the true generating density P(φ) is given by

\[
D_{\mathrm{KL}}(P \,\|\, Q) = \int P(\phi) \log \frac{P(\phi)}{Q(\phi)}\, d\phi. \tag{2.6}
\]

Since the generating density P is fixed, only the term −∫ P(φ) log Q(φ) dφ in eq. (2.6) depends on Q; hence maximizing the log-likelihood minimizes the KL divergence.
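This identity is easy to check numerically. In the sketch below, of two candidate discrete distributions, the one with the larger expected log-likelihood under P also has the smaller KL divergence to P; the specific distributions are made-up examples.

```python
# A minimal sketch of eq. (2.6) for discrete distributions:
# D_KL(P || Q) = E_P[log P] - E_P[log Q], and E_P[log P] does not depend
# on Q, so maximizing E_P[log Q] minimizes the KL divergence.
import numpy as np

p = np.array([0.5, 0.3, 0.2])    # "true" generating distribution P
q1 = np.array([0.4, 0.4, 0.2])   # candidate model Q1
q2 = np.array([0.1, 0.1, 0.8])   # candidate model Q2

def kl(p, q):
    return np.sum(p * np.log(p / q))

def expected_loglik(p, q):
    return np.sum(p * np.log(q))

for q in (q1, q2):
    print(kl(p, q), expected_loglik(p, q))
# The candidate with the larger expected log-likelihood has the smaller KL.
```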

Some other widely used loss functions are the squared error loss, L(φ̂, φ) = ‖φ − φ̂‖², which is minimized by the posterior mean; the absolute error loss, L(φ̂, φ) = |φ − φ̂|, minimized by the posterior median; and the zero-one loss,
\[
L(\hat{\phi}, \phi) =
\begin{cases}
1 & \text{if } |\phi - \hat{\phi}| > \varepsilon, \\
0 & \text{otherwise},
\end{cases} \tag{2.7}
\]
which is minimized by the posterior mode.
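Given posterior samples, these three Bayes estimates are easy to read off: the sample mean, the sample median, and (approximately) the mode of a histogram. The sketch below assumes a made-up Gamma posterior; the bin count is an arbitrary choice.

```python
# A minimal sketch of the standard Bayes estimates from posterior samples:
# posterior mean (squared error loss), posterior median (absolute error
# loss), and an approximate posterior mode (zero-one loss) via a histogram.
import numpy as np

rng = np.random.default_rng(0)
phi_samples = rng.gamma(shape=2.0, scale=1.0, size=50000)

post_mean = phi_samples.mean()                       # minimizes squared error
post_median = np.median(phi_samples)                 # minimizes absolute error
counts, edges = np.histogram(phi_samples, bins=200)  # crude mode estimate
post_mode = 0.5 * (edges[np.argmax(counts)] + edges[np.argmax(counts) + 1])

print(post_mean, post_median, post_mode)
```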

Assumptions on the Model Structure

Above, we referred to the set of all unknown variables in a model as the parameters, denoted by Ψ. Some of the parameters in a model may have physical interpretations and therefore their values may be of interest, whereas some are merely necessary for building
