View - Statistics - University of Washington

one particular data value, which causes a spike in the histogram of data values. Since the spike consists of a single discrete data value, the variance of a Gaussian component might shrink as the component tries to model this single value. If the variance is allowed to approach zero, then the likelihood will approach infinity. This problem is solved by imposing a lower bound on the variance of a component. If we imagine that the observed data are a discretized version of a true Gaussian mixture, then an observation Y_i can be thought of as having a round-off error which is unobserved. This means that an observed value of Y_i could arise from a true value in the range (Y_i − 0.5) to (Y_i + 0.5). To capture this variability, I impose a lower bound of 0.5 on the estimate of σ for each component.

With Poisson mixtures, there is only a mean parameter µ, so the variance constraint is not needed. However, Poisson mixtures sometimes have an identifiability problem. If the µ_j values for two components become too similar, then they are modeling the same feature of the data and their mixture proportions become arbitrary. That is, if components A and B have the same mean, then P_A and P_B are not uniquely defined.

There is a milder problem with means in Gaussian mixtures as well. If two components have means which are close, then the component with the larger variance will be split. That is, points close to the common mean of the two segments will be classified into the segment with smaller variance, since it has a higher likelihood. Points farther from the mean, both high and low, will be classified into the component with larger variance; this component will then contain sets of points which are disjoint in grey level. This problem was discussed in section 3.4.2.

There is some question of how to determine when the EM algorithm has converged. Typically, one looks for convergence in the loglikelihood, so the algorithm is stopped when the change in loglikelihood from one iteration to the next falls below a certain threshold. It is not always clear how this threshold should be chosen.
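As a concrete illustration of the variance lower bound, it can be enforced in the M-step by clamping each estimated σ_j at 0.5. The sketch below is a minimal NumPy implementation of EM for a Gaussian mixture; it is not the code used in this work, and the function name, initialization scheme, and iteration count are my own choices for illustration.

```python
import numpy as np

SIGMA_FLOOR = 0.5  # lower bound on sigma, from the round-off argument above

def em_gaussian_mixture(y, k=2, n_iter=100):
    """Illustrative EM for a k-component Gaussian mixture with a floor
    on each component's standard deviation (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    pi = np.full(k, 1.0 / k)                         # mixing proportions
    mu = np.quantile(y, np.linspace(0.25, 0.75, k))  # spread-out initial means
    sigma = np.full(k, y.std())
    for _ in range(n_iter):
        # E-step: responsibility of component j for observation i
        dens = (pi / (np.sqrt(2 * np.pi) * sigma)
                * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * y[:, None]).sum(axis=0) / nk
        var = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nk
        # Without the floor, a component locked onto a single repeated
        # value drives sigma -> 0 and the likelihood -> infinity.
        sigma = np.maximum(np.sqrt(var), SIGMA_FLOOR)
    return pi, mu, sigma
```

On data with a spike (say, many exact copies of one grey level mixed with a Gaussian cluster), the spike component's σ estimate stops at the 0.5 floor instead of collapsing toward zero.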

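For the Poisson case, a similar sketch (again illustrative, not the implementation used here; the helper name and the threshold value are assumptions) shows the loglikelihood-based stopping rule discussed above: the loop ends when the change in loglikelihood between iterations falls below a chosen threshold `tol`, and a comment marks where the identifiability problem would appear.

```python
import numpy as np
from math import lgamma

def poisson_mixture_em(y, k=2, tol=1e-6, max_iter=500):
    """Illustrative EM for a k-component Poisson mixture (hypothetical
    helper), stopped on a loglikelihood-change threshold `tol`."""
    y = np.asarray(y, dtype=float)
    lgam = np.array([lgamma(v + 1.0) for v in y])  # log(y!)
    pi = np.full(k, 1.0 / k)
    mu = np.maximum(np.quantile(y, np.linspace(0.25, 0.75, k)), 1e-6)
    ll_old = -np.inf
    for it in range(max_iter):
        # E-step in log space for numerical stability
        logdens = np.log(pi) + y[:, None] * np.log(mu) - mu - lgam[:, None]
        m = logdens.max(axis=1, keepdims=True)
        dens = np.exp(logdens - m)
        ll = (m.ravel() + np.log(dens.sum(axis=1))).sum()
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step; if two mu_j collapse to the same value, the pi_j
        # become arbitrary -- the identifiability problem noted above
        nk = r.sum(axis=0)
        pi = nk / len(y)
        mu = np.maximum((r * y[:, None]).sum(axis=0) / nk, 1e-6)
        # stopping rule: change in loglikelihood below the threshold
        if ll - ll_old < tol:
            break
        ll_old = ll
    return pi, mu, ll, it + 1
```

The choice tol = 1e-6 is arbitrary, which is exactly the difficulty raised in the text: too loose a threshold stops EM prematurely, while too tight a one wastes iterations near a flat optimum.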