
1.8 VERSIONS OF SPSA ALGORITHM

…the Hessian matrix of L(θ̂_k). Hence, equation (1.10) is a stochastic analogue of the well-known Newton-Raphson algorithm of deterministic optimization. Since ĝ_k(θ̂_k) has a known form, the parallel recursions in equations (1.9) and (1.10) can be implemented once Ĝ_k is specified.
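
Equations (1.9) and (1.10) are not reproduced in this excerpt. For reference, a Newton-like parameter recursion of the kind described here typically takes the form below; this is a sketch in standard second-order SPSA notation (after Spall), not necessarily the document's exact equations:

\[
\hat{\theta}_{k+1} \;=\; \hat{\theta}_k \;-\; a_k\,\hat{G}_k^{-1}\,\hat{g}_k(\hat{\theta}_k), \qquad a_k > 0,
\]

which, with the exact gradient and Hessian in place of ĝ_k and Ĝ_k and with a_k = 1, reduces to the deterministic Newton-Raphson step θ_{k+1} = θ_k − H(θ_k)^{-1} g(θ_k).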

The SP gradient approximation requires two measurements of L(·): y_k^(+) and y_k^(−). These represent measurements at the design levels θ̂_k + c_kΔ_k and θ̂_k − c_kΔ_k respectively, where c_k is a positive scalar and Δ_k is a user-generated random vector satisfying certain regularity conditions; e.g., Δ_k being a vector of independent Bernoulli ±1 random variables satisfies these conditions, but a vector of uniformly distributed random variables does not. The "SP" comes from the fact that all elements of θ̂_k are perturbed simultaneously in forming ĝ_k(θ̂_k), as opposed to the finite-difference form, where they are perturbed one at a time. To perform one iteration of (1.9) and (1.10), one additional measurement, say y_k^(0), is required; this measurement represents an observation of L(·) at the nominal design level θ̂_k.
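
As a concrete illustration, below is a minimal Python sketch of the SP gradient approximation just described. The names (spsa_gradient, loss, c_k) are illustrative, not from the text; the elementwise formula in the comments is the standard SPSA estimator:

```python
import numpy as np

def spsa_gradient(loss, theta, c_k, rng):
    """SP gradient estimate: two loss measurements, whatever the dimension.

    `loss`, `theta`, and `c_k` are illustrative names, not from the text.
    """
    # Delta_k: independent Bernoulli +/-1 components (these satisfy the
    # regularity conditions; uniformly distributed components would not).
    delta = rng.choice([-1.0, 1.0], size=theta.size)
    y_plus = loss(theta + c_k * delta)    # y_k^(+) at theta_k + c_k * Delta_k
    y_minus = loss(theta - c_k * delta)   # y_k^(-) at theta_k - c_k * Delta_k
    # Every component i is formed from the same two measurements:
    #   g_hat_i = (y_plus - y_minus) / (2 * c_k * Delta_k_i)
    return (y_plus - y_minus) / (2.0 * c_k * delta)

# Usage: gradient estimate of a 50-dimensional quadratic from just two calls.
rng = np.random.default_rng(0)
g = spsa_gradient(lambda t: float(np.sum(t ** 2)), np.ones(50), c_k=0.1, rng=rng)
```

Note how the simultaneous perturbation shows up in the code: a single Δ_k vector perturbs all 50 coordinates at once, so the measurement count stays at two no matter the dimension.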

Main Advantages:

- 1st-SPSA gives region(s) where the function value is low, which allows one to conjecture in which region(s) the global solution lies.

- 2nd-SPSA is based on a highly efficient approximation of the gradient built from loss-function measurements. In particular, on each iteration SPSA needs only three loss measurements to estimate the gradient, regardless of the dimensionality of the problem (see the sketch after this list). Moreover, 2nd-SPSA is grounded in a solid mathematical framework that permits assessing its stochastic properties even for optimization problems affected by noise or uncertainties. Owing to these striking advantages, 2nd-SPSA has recently been used as the optimization engine for adaptive control problems.
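
To make the three-measurements-per-iteration point concrete, here is a hedged sketch of a full iteration loop. The gain sequences a_k = a/(k+1)^0.602 and c_k = c/(k+1)^0.101 follow common choices from the SPSA literature, and the scalar curvature term is a deliberately simplified stand-in for the Hessian estimate Ĝ_k, whose exact construction this excerpt does not show:

```python
import numpy as np

def run_spsa_sketch(loss, theta0, n_iter=500, a=0.5, c=0.1,
                    alpha=0.602, gamma=0.101, seed=0):
    """Hedged sketch of an SPSA loop with a crude Newton-like scaling.

    Uses exactly three loss measurements per iteration: y_k^(+), y_k^(-),
    and y_k^(0). The scalar `curv` is a simplified stand-in for the full
    Hessian estimate G_k, which this excerpt does not define.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    p = theta.size
    for k in range(n_iter):
        a_k = a / (k + 1) ** alpha      # step-size gain sequence
        c_k = c / (k + 1) ** gamma      # perturbation-size sequence
        delta = rng.choice([-1.0, 1.0], size=p)
        # The three loss measurements per iteration, independent of dimension:
        y_plus = loss(theta + c_k * delta)    # y_k^(+)
        y_minus = loss(theta - c_k * delta)   # y_k^(-)
        y_zero = loss(theta)                  # y_k^(0), nominal design level
        g_hat = (y_plus - y_minus) / (2.0 * c_k * delta)
        # Symmetric second difference: (y+ - 2*y0 + y-) / c_k^2 estimates
        # delta^T H delta; dividing by p gives a rough average curvature.
        curv = max((y_plus - 2.0 * y_zero + y_minus) / (c_k ** 2 * p), 1e-8)
        theta -= (a_k / curv) * g_hat         # crude Newton-like step
    return theta

# Example: 20 dimensions, still only three loss calls per iteration.
theta_hat = run_spsa_sketch(lambda t: float(np.sum((t - 1.0) ** 2)), np.zeros(20))
```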

