
Chapter 8 State-Space Models

The last equality follows from the fact that

$$0 = \frac{\partial}{\partial\theta}(1)\bigg|_{\theta=\hat\theta} = \frac{\partial}{\partial\theta}\left(\int f(x\,|\,\mathbf{Y};\theta)\,dx\right)\bigg|_{\theta=\hat\theta} = \int \frac{\partial}{\partial\theta} f(x\,|\,\mathbf{Y};\theta)\bigg|_{\theta=\hat\theta}\,dx.$$

The computational advantage of the EM algorithm over direct maximization of the likelihood is most pronounced when the calculation and maximization of the exact likelihood is difficult as compared with the maximization of Q in the M-step. (There are some applications in which the maximization of Q can easily be carried out explicitly.)

Missing Data

The EM algorithm is particularly useful for estimation problems in which there are missing observations. Suppose the complete data set consists of $Y_1,\dots,Y_n$, of which $r$ are observed and $n-r$ are missing. Denote the observed and missing data by $\mathbf{Y} = (Y_{i_1},\dots,Y_{i_r})'$ and $\mathbf{X} = (Y_{j_1},\dots,Y_{j_{n-r}})'$, respectively. Assuming that $\mathbf{W} = (\mathbf{X}',\mathbf{Y}')'$ has a multivariate normal distribution with mean $\mathbf{0}$ and covariance matrix $\Sigma$, which depends on the parameter $\theta$, the log-likelihood of the complete data is given by

$$\ell(\theta;\mathbf{W}) = -\frac{n}{2}\ln(2\pi) - \frac{1}{2}\ln\det(\Sigma) - \frac{1}{2}\mathbf{W}'\Sigma^{-1}\mathbf{W}.$$
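To make the formula concrete, here is a minimal numerical sketch of this log-likelihood in Python (the function name complete_data_loglik is ours, not the text's; numpy is assumed to be available):

```python
import numpy as np

def complete_data_loglik(W, Sigma):
    # Gaussian log-likelihood l(theta; W) of a zero-mean normal vector W
    # with covariance Sigma, following the display above.
    n = W.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)      # ln det(Sigma), numerically stable
    quad = W @ np.linalg.solve(Sigma, W)      # W' Sigma^{-1} W without forming the inverse
    return -0.5 * n * np.log(2.0 * np.pi) - 0.5 * logdet - 0.5 * quad
```

Using slogdet and solve avoids forming $\Sigma^{-1}$ explicitly, which matters when $\Sigma$ is large or ill-conditioned.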

The E-step requires that we compute the expectation of $\ell(\theta;\mathbf{W})$ with respect to the conditional distribution of $\mathbf{W}$ given $\mathbf{Y}$ with $\theta = \theta^{(i)}$. Writing $\Sigma(\theta)$ as the block matrix

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},$$

which is conformable with $\mathbf{X}$ and $\mathbf{Y}$, the conditional distribution of $\mathbf{W}$ given $\mathbf{Y}$ is multivariate normal with mean $\begin{bmatrix}\hat{\mathbf{X}}\\ \mathbf{Y}\end{bmatrix}$ and covariance matrix $\begin{bmatrix}\Sigma_{11|2}(\theta) & 0\\ 0 & 0\end{bmatrix}$, where $\hat{\mathbf{X}} = E_\theta(\mathbf{X}\,|\,\mathbf{Y}) = \Sigma_{12}\Sigma_{22}^{-1}\mathbf{Y}$ and $\Sigma_{11|2}(\theta) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$ (see Proposition A.3.1).
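Both quantities translate directly into code. Continuing the sketch above, and assuming the leading $n-r$ coordinates of $\mathbf{W}$ correspond to the missing $\mathbf{X}$ (the helper name conditional_moments is hypothetical):

```python
def conditional_moments(Sigma, Y, r):
    # Split Sigma conformably with (X', Y')': the leading n - r rows/columns
    # belong to the missing X, the trailing r to the observed Y.
    m = Sigma.shape[0] - r
    S11, S12 = Sigma[:m, :m], Sigma[:m, m:]
    S21, S22 = Sigma[m:, :m], Sigma[m:, m:]
    X_hat = S12 @ np.linalg.solve(S22, Y)                # Sigma_12 Sigma_22^{-1} Y
    Sigma_11_2 = S11 - S12 @ np.linalg.solve(S22, S21)   # Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    return X_hat, Sigma_11_2
```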

Using Problem A.8, we have

$$E_{\theta^{(i)}}\!\left[(\mathbf{X}',\mathbf{Y}')\,\Sigma^{-1}(\theta)\,(\mathbf{X}',\mathbf{Y}')'\,\middle|\,\mathbf{Y}\right] = \operatorname{trace}\!\left(\Sigma_{11|2}\!\left(\theta^{(i)}\right)\Sigma_{11|2}^{-1}(\theta)\right) + \hat{\mathbf{W}}'\,\Sigma^{-1}(\theta)\,\hat{\mathbf{W}},$$

where $\hat{\mathbf{W}} = (\hat{\mathbf{X}}',\mathbf{Y}')'$. It follows that

$$Q\!\left(\theta\,\middle|\,\theta^{(i)}\right) = \ell\!\left(\theta;\hat{\mathbf{W}}\right) - \frac{1}{2}\operatorname{trace}\!\left(\Sigma_{11|2}\!\left(\theta^{(i)}\right)\Sigma_{11|2}^{-1}(\theta)\right).$$
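Combining the two helpers sketched above gives $Q(\theta\,|\,\theta^{(i)})$. Here Sigma_of is a hypothetical user-supplied map from $\theta$ to $\Sigma(\theta)$, not something defined in the text:

```python
def Q(theta, theta_i, Y, r, Sigma_of):
    # Sigma_of: hypothetical map theta -> Sigma(theta); an assumption of this sketch.
    X_hat, S_old = conditional_moments(Sigma_of(theta_i), Y, r)  # E-step at theta^(i)
    W_hat = np.concatenate([X_hat, Y])
    _, S_new = conditional_moments(Sigma_of(theta), Y, r)        # Sigma_{11|2}(theta)
    # trace(S_old S_new^{-1}) = trace(S_new^{-1} S_old) by cyclicity, so one solve suffices.
    trace_term = np.trace(np.linalg.solve(S_new, S_old))
    return complete_data_loglik(W_hat, Sigma_of(theta)) - 0.5 * trace_term
```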

The first term on the right is the log-likelihood based on the complete data, but with $\mathbf{X}$ replaced by its "best estimate" $\hat{\mathbf{X}}$ calculated from the previous iteration. If the increments $\theta^{(i+1)} - \theta^{(i)}$ are small, then the second term on the right is nearly constant ($\approx n - r$) and can be ignored. For ease of computation in this application we shall use the modified version

$$\tilde{Q}\!\left(\theta\,\middle|\,\theta^{(i)}\right) = \ell\!\left(\theta;\hat{\mathbf{W}}\right).$$
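One iteration of this modified EM algorithm then reduces to imputing $\mathbf{X}$ by $\hat{\mathbf{X}}$ under $\theta^{(i)}$ and maximizing the complete-data likelihood. A sketch with a generic numerical optimizer (again using the hypothetical Sigma_of; where the M-step has a closed form, it would replace the call to minimize):

```python
from scipy.optimize import minimize

def em_step_modified(theta_i, Y, r, Sigma_of):
    # E-step: impute the missing X by X_hat = E_{theta^(i)}(X | Y).
    X_hat, _ = conditional_moments(Sigma_of(theta_i), Y, r)
    W_hat = np.concatenate([X_hat, Y])
    # M-step on Q~: maximize l(theta; W_hat) over theta numerically.
    res = minimize(lambda th: -complete_data_loglik(W_hat, Sigma_of(th)), x0=theta_i)
    return res.x
```

Iterating em_step_modified until the increments $\theta^{(i+1)} - \theta^{(i)}$ are negligible yields the approximate maximum likelihood estimate.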
