Approximation of Hessian Matrix for Second-order SPSA Algorithm ...
2.6 FISHER INFORMATION MATRIX
form in (2.28). Note that in some applications, the observed information matrix at a particular dataset $z_n$ may be easier to compute and/or preferred from an inference point of view relative to the actual information matrix $F_n(\theta)$ in (2.29). Although the method in this work is described for the determination of $F_n(\theta)$, the efficient Hessian estimation may also be used directly for the determination of $H(\theta \mid z_n)$ when it is not easy to calculate the Hessian directly.
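As a hedged illustration of the distinction between the observed information at a particular dataset and the actual information matrix, the sketch below uses an i.i.d. Bernoulli model. The model, the function names, and the finite-difference step are assumptions chosen for illustration; none of them come from the text. The finite-difference routine is a generic stand-in for the case where the Hessian is hard to obtain analytically, not the efficient Hessian estimation method of this work.

```python
import math
import random


def log_likelihood(theta, z):
    """Bernoulli log-likelihood of 0/1 data z at success probability theta."""
    s = sum(z)
    return s * math.log(theta) + (len(z) - s) * math.log(1.0 - theta)


def observed_information(theta, z):
    """Negative Hessian of the log-likelihood at the dataset z (analytic form)."""
    s = sum(z)
    return s / theta**2 + (len(z) - s) / (1.0 - theta) ** 2


def fisher_information(theta, n):
    """Expected information F_n(theta) for n i.i.d. Bernoulli observations."""
    return n / (theta * (1.0 - theta))


def observed_information_fd(theta, z, h=1e-4):
    """Central finite-difference approximation to the negative Hessian."""
    second_diff = (
        log_likelihood(theta + h, z)
        - 2.0 * log_likelihood(theta, z)
        + log_likelihood(theta - h, z)
    )
    return -second_diff / h**2


random.seed(0)
theta_true = 0.3  # assumed "true" parameter for the simulated dataset
z = [1 if random.random() < theta_true else 0 for _ in range(500)]
theta_hat = sum(z) / len(z)  # maximum likelihood estimate

for theta in (theta_true, theta_hat):
    print(
        theta,
        observed_information(theta, z),
        observed_information_fd(theta, z),
        fisher_information(theta, len(z)),
    )
```

For this model the observed and expected information generally differ at an arbitrary $\theta$ but coincide at the maximum likelihood estimate, which is a property of the Bernoulli family rather than a general rule.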
2.6.2 Two Key Properties of the Information Matrix: Connections to Covariance Matrix of Parameter Estimates
Let $\theta^*$ denote the unknown “true” value of $\theta$. The primary rationale for $F^{(n)}$ as a measure of information about $\theta$ within the data $Z^{(n)}$ comes from its connection to the covariance matrix for the estimate of $\theta$ constructed from $Z^{(n)}$. The first of the key properties makes this connection via an asymptotic normality result [23]. In particular, for some common forms of estimates $\hat{\theta}_n$ (e.g., maximum likelihood and Bayesian maximum a posteriori), it is known that, under modest conditions,
$$\sqrt{n}\,(\hat{\theta}_n - \theta^*) \xrightarrow{\text{dist}} N(0, F^{-1}) \tag{2.30}$$

where $\xrightarrow{\text{dist}}$ denotes convergence in distribution and

$$F \equiv \lim_{n \to \infty} \frac{F_n(\theta^*)}{n} \tag{2.31}$$
provided that the indicated limit exists and is invertible. Hence, in practice, for $n$ reasonably large, $F_n(\theta)^{-1}$ can serve as an approximate covariance matrix of the estimate $\hat{\theta}_n$ when $\theta$ is chosen close to the unknown $\theta^*$. Relationship (2.30) also holds for optimal implementations of some recursive algorithms where the data $Z_i$ are processed recursively instead of in a batch mode as is typical in maximum likelihood. This includes optimal versions of gradient-based SA algorithms, which include popular algorithms such as LMS and NN BP as special cases. The second key property of the information matrix applies in finite samples.
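The asymptotic normality relationship (2.30) and the limit (2.31) can be checked numerically in a simple scalar case. The sketch below is a Monte Carlo experiment under assumed conditions that do not appear in the text: i.i.d. exponential data with rate $\theta^*$, for which the maximum likelihood estimate is the reciprocal of the sample mean and $F_n(\theta) = n/\theta^2$. The function names, sample size, and number of replications are all hypothetical choices.

```python
import random
import statistics


def mle_rate(z):
    """Maximum likelihood estimate of the rate of an exponential distribution."""
    return len(z) / sum(z)


def fisher_information(theta, n):
    """F_n(theta) = n / theta**2 for n i.i.d. exponential observations."""
    return n / theta**2


random.seed(1)
theta_star = 2.0     # assumed "true" parameter for the experiment
n = 400              # sample size per replication
replications = 2000  # number of independent datasets

estimates = []
for _ in range(replications):
    z = [random.expovariate(theta_star) for _ in range(n)]
    estimates.append(mle_rate(z))

# Left side of (2.30): sqrt(n) * (theta_hat_n - theta_star), one value per dataset.
scaled_errors = [n**0.5 * (t - theta_star) for t in estimates]
empirical_var = statistics.variance(scaled_errors)

# Right side of (2.30): F^{-1}, with F = lim F_n(theta_star) / n as in (2.31).
# For this model F_n(theta_star) / n does not depend on n.
f_limit = fisher_information(theta_star, n) / n
asymptotic_var = 1.0 / f_limit

# Practical use: F_n(theta)^{-1} evaluated at an estimate close to theta_star,
# compared with the spread of the estimates across datasets.
plug_in_var = 1.0 / fisher_information(estimates[0], n)
empirical_var_theta_hat = statistics.variance(estimates)

print(empirical_var, asymptotic_var)
print(empirical_var_theta_hat, plug_in_var)
```

The agreement is only approximate at finite $n$, and the plug-in value depends on which single estimate is used in place of $\theta^*$.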