Approximation of Hessian Matrix for Second-order SPSA Algorithm
1.8 VERSIONS OF SPSA ALGORITHM
the Hessian matrix of L(θ̂_k). Hence, equation (1.10) is a stochastic analogue of the well-known Newton–Raphson algorithm of deterministic optimization. Since ĝ_k(θ̂_k) has a known form, the parallel recursions in equations (1.9) and (1.10) can be implemented once Ĝ_k is specified. The SP gradient approximation requires two measurements of L(·): y_k^(+) and y_k^(−). These represent measurements at design levels θ̂_k + c_k Δ_k and θ̂_k − c_k Δ_k respectively, where c_k is a positive scalar and Δ_k represents a user-generated random vector satisfying certain regularity conditions; e.g., Δ_k being a vector of independent Bernoulli ±1 random variables satisfies these conditions, but a vector of uniformly distributed random variables does not. The "SP" comes from the fact that all elements of θ̂_k are perturbed simultaneously in forming ĝ_k(θ̂_k), as opposed to the finite-difference form, where they are perturbed one at a time. To perform one iteration of (1.9) and (1.10), one additional measurement, say y_k^(0), is required; this measurement represents an observation of L(·) at the nominal design level θ̂_k.
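The SP gradient approximation described above can be sketched in Python. The function name and the use of NumPy are assumptions of this sketch, not part of the text; the perturbation vector is Bernoulli ±1, as the regularity conditions require, and all components of the gradient estimate share the same two loss measurements.

```python
import numpy as np

def spsa_gradient(loss, theta, c_k, rng):
    """SP gradient estimate g_hat_k(theta_hat_k) from two loss measurements.

    loss  -- callable returning a (possibly noisy) measurement y of L(.)
    theta -- current design level theta_hat_k (1-D array)
    c_k   -- positive perturbation scalar
    rng   -- numpy random Generator
    """
    p = theta.size
    # Bernoulli +/-1 perturbation vector: satisfies the regularity conditions
    # (a uniformly distributed vector would not).
    delta = rng.choice([-1.0, 1.0], size=p)
    y_plus = loss(theta + c_k * delta)    # y_k^(+)
    y_minus = loss(theta - c_k * delta)   # y_k^(-)
    # All p elements of theta are perturbed simultaneously, so the same
    # two measurements feed every component of the estimate.
    return (y_plus - y_minus) / (2.0 * c_k * delta)
```

Note that the per-iteration cost is two loss measurements regardless of the dimension p, which is precisely what distinguishes SP from the finite-difference form.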
Main Advantages:
- 1st-SPSA identifies region(s) where the function value is low, which allows one to conjecture in which region(s) the global solution lies.
- 2nd-SPSA is based on a highly efficient approximation of the gradient using loss-function measurements. In particular, on each iteration SPSA needs only three loss measurements to estimate the gradient, regardless of the dimensionality of the problem. Moreover, 2nd-SPSA is grounded in a solid mathematical framework that permits assessing its stochastic properties even for optimization problems affected by noise or uncertainties. Owing to these striking advantages, 2nd-SPSA has recently been used as the optimization engine for adaptive control problems.
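The section references, but does not reproduce, the per-iteration Hessian approximation behind equation (1.10). As an illustration only, the following sketch follows the standard 2nd-order SPSA construction (due to Spall), in which a second, independent perturbation yields one-sided gradient estimates whose difference is symmetrized into a Hessian estimate; the function name, step sizes, and the secondary perturbation are all assumptions of this sketch.

```python
import numpy as np

def spsa_hessian_estimate(loss, theta, c_k, c_t, rng):
    """One per-iteration Hessian estimate in the spirit of 2nd-order SPSA.

    loss  -- callable returning a (possibly noisy) measurement y of L(.)
    theta -- current design level theta_hat_k (1-D array)
    c_k   -- primary perturbation scalar (as in the SP gradient)
    c_t   -- secondary perturbation scalar (an assumption of this sketch)
    """
    p = theta.size
    delta = rng.choice([-1.0, 1.0], size=p)    # primary perturbation
    delta_t = rng.choice([-1.0, 1.0], size=p)  # independent secondary perturbation
    y_plus = loss(theta + c_k * delta)         # y_k^(+)
    y_minus = loss(theta - c_k * delta)        # y_k^(-)
    # One-sided SP gradient estimates at the two perturbed design levels.
    g_plus = (loss(theta + c_k * delta + c_t * delta_t) - y_plus) / (c_t * delta_t)
    g_minus = (loss(theta - c_k * delta + c_t * delta_t) - y_minus) / (c_t * delta_t)
    dg = (g_plus - g_minus) / (2.0 * c_k)
    # Symmetrize: H_hat = 1/2 [ dg * delta^{-T} + (dg * delta^{-T})^T ].
    H = np.outer(dg, 1.0 / delta)
    return 0.5 * (H + H.T)
```

In the full algorithm these per-iteration estimates are averaged across iterations to form Ĝ_k, which is then regularized (e.g. shifted to be positive definite) before it is used in the Newton-like recursion of equation (1.10).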