25.02.2015

Approximation of Hessian Matrix for Second-order SPSA Algorithm ...


2.6 FISHER INFORMATION MATRIX

form in (2.28). Note that in some applications, the observed information matrix at a particular dataset $Z_n$ may be easier to compute and/or preferred from an inference point of view relative to the actual information matrix $F_n(\theta)$ in (2.29). Although the method in this work is described for the determination of $F_n(\theta)$, the efficient Hessian estimation may also be used directly for the determination of $H(\theta \mid Z_n)$ when it is not easy to calculate the Hessian directly.
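To illustrate how a Hessian can be estimated when it is not easy to calculate directly, the sketch below approximates the Hessian of a negative log-likelihood from gradient evaluations alone, using a simultaneous-perturbation estimate of the kind used in second-order SPSA when gradients are available: with a random $\pm 1$ perturbation $\Delta_k$ and $\delta G_k = g(\theta + c\Delta_k) - g(\theta - c\Delta_k)$, each estimate is $\hat{H}_k = \tfrac{1}{2}\left[\frac{\delta G_k}{2c}\Delta_k^{-T} + \left(\frac{\delta G_k}{2c}\Delta_k^{-T}\right)^T\right]$, and the $\hat{H}_k$ are averaged. This is a minimal sketch under stated assumptions, not the estimator developed in this work: the i.i.d. Gaussian model with $\theta = (\mu, v)$, $v = \sigma^2$, the helper names, the perturbation size `c`, and the number of perturbations are hypothetical choices. The code works with the Hessian of the negative log-likelihood (the observed information); if $H(\theta \mid Z_n)$ is defined from the log-likelihood itself, the sign flips.

```python
import numpy as np


def neg_loglik_grad(theta, y):
    """Gradient of -log-likelihood for i.i.d. N(mu, v) data, theta = (mu, v) (hypothetical model)."""
    mu, v = theta
    r = y - mu
    n = y.size
    return np.array([-r.sum() / v, n / (2.0 * v) - (r @ r) / (2.0 * v**2)])


def exact_hessian(theta, y):
    """Analytic Hessian of -log-likelihood, i.e. the observed information at theta."""
    mu, v = theta
    r = y - mu
    n = y.size
    h_mv = r.sum() / v**2
    return np.array([[n / v, h_mv],
                     [h_mv, -n / (2.0 * v**2) + (r @ r) / v**3]])


def sp_hessian(grad, theta, c=1e-3, num_pert=4000, seed=0):
    """Average of symmetrized simultaneous-perturbation Hessian estimates.

    Each perturbation costs one pair of gradient evaluations, whatever dim(theta) is.
    """
    rng = np.random.default_rng(seed)
    p = theta.size
    h_sum = np.zeros((p, p))
    for _ in range(num_pert):
        delta = rng.choice([-1.0, 1.0], size=p)      # Bernoulli +/-1 perturbation
        dg = grad(theta + c * delta) - grad(theta - c * delta)
        # For +/-1 entries, 1/delta_i == delta_i, so this is (dg / 2c) * delta^{-T}.
        h_hat = np.outer(dg / (2.0 * c), delta)
        h_sum += 0.5 * (h_hat + h_hat.T)             # symmetrize each estimate
    return h_sum / num_pert


rng = np.random.default_rng(42)
y = rng.normal(loc=1.0, scale=np.sqrt(2.0), size=200)   # synthetic dataset Z_n
theta = np.array([0.8, 1.5])                             # evaluation point (mu, v)

H_exact = exact_hessian(theta, y)
H_sp = sp_hessian(lambda t: neg_loglik_grad(t, y), theta)
print(np.round(H_exact, 2))
print(np.round(H_sp, 2))
```

A single perturbation gives a noisy estimate because of cross terms of the form $\Delta_i / \Delta_j$ ($i \neq j$); these have zero mean, so averaging over many perturbations drives the estimate toward the exact Hessian.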

2.6.2 Two Key Properties of the Information Matrix: Connections to the Covariance Matrix of Parameter Estimates

Let $\theta^*$ denote the unknown "true" value of $\theta$. The primary rationale for $F_n(\theta)$ as a measure of information about $\theta$ within the data $Z_n$ comes from its connection to the covariance matrix for the estimate of $\theta$ constructed from $Z_n$. The first of the key properties makes this connection via an asymptotic normality result [23]. In particular, for some common forms of estimates $\hat{\theta}_n$ (e.g., maximum likelihood and Bayesian maximum a posteriori), it is known that, under modest conditions,

$$\sqrt{n}\,\left(\hat{\theta}_n - \theta^*\right) \xrightarrow{\text{dist}} N\left(0, \bar{F}^{-1}\right) \tag{2.30}$$

where $\xrightarrow{\text{dist}}$ denotes convergence in distribution and

$$\bar{F} \equiv \lim_{n \to \infty} \frac{F_n(\theta^*)}{n}, \tag{2.31}$$

provided that the indicated limit exists and is invertible. Hence, in practice, for $n$ reasonably large, $F_n(\theta)^{-1}$ can serve as an approximate covariance matrix of the estimate $\hat{\theta}_n$ when $\theta$ is chosen close to the unknown $\theta^*$. Relationship (2.30) also holds for optimal implementations of some recursive algorithms where the data $Z_i$ are processed recursively instead of in a batch mode, as is typical in maximum likelihood. This includes optimal versions of gradient-based stochastic approximation (SA) algorithms, which include popular algorithms such as least mean squares (LMS) and neural-network backpropagation (NN BP) as special cases. The second key property of the information matrix applies in finite samples.
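Relationships (2.30) and (2.31), and the use of $F_n(\theta)^{-1}$ as an approximate covariance matrix, can be checked numerically on a model where everything is available in closed form. Assuming, purely for illustration, i.i.d. exponential data with rate $\lambda$, the maximum likelihood estimate is $\hat{\lambda}_n = 1/\bar{y}$ and $F_n(\lambda) = n/\lambda^2$, so $\bar{F} = 1/\lambda^{*2}$ and (2.30) predicts $\operatorname{var}\left[\sqrt{n}\,(\hat{\lambda}_n - \lambda^*)\right] \approx \lambda^{*2}$. The model, the helper names, the sample sizes, and the seed below are hypothetical choices, not taken from this work; the sketch only compares Monte Carlo spread with the information-based prediction.

```python
import numpy as np


def fisher_info_exponential(lam, n):
    """F_n(lam) for n i.i.d. Exponential observations with rate lam (hypothetical model)."""
    return n / lam**2


def mle_rates(lam_true, n, reps, rng):
    """Draw `reps` independent datasets of size n and return the MLE of the rate for each."""
    # NumPy parameterizes the exponential by its scale, i.e. 1 / rate.
    y = rng.exponential(scale=1.0 / lam_true, size=(reps, n))
    return 1.0 / y.mean(axis=1)


rng = np.random.default_rng(1)
lam_true, n, reps = 2.0, 400, 10_000
lam_hat = mle_rates(lam_true, n, reps, rng)

# Left side of (2.30): scaled estimation errors sqrt(n) * (lam_hat - lam*).
z = np.sqrt(n) * (lam_hat - lam_true)

# (2.31): F_bar = lim F_n(lam*) / n, which for this model equals 1 / lam*^2 for every n.
F_bar = fisher_info_exponential(lam_true, n) / n

# Plug-in covariance F_n(theta)^{-1} evaluated at theta = lam_hat, a value close to lam*.
plug_in_var = 1.0 / fisher_info_exponential(lam_hat, n)

print(f"var of z: {z.var():.3f}   predicted 1/F_bar: {1.0 / F_bar:.3f}")
print(f"var of lam_hat: {lam_hat.var():.5f}   mean plug-in F_n^-1: {plug_in_var.mean():.5f}")
```

The exponential model is chosen only because its information matrix is known exactly; for models without closed-form Hessians, the plug-in step is where an estimated information matrix would be substituted.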
