Approximation of Hessian Matrix for Second-order SPSA Algorithm ...
1.6 FORMULATION OF SPSA ALGORITHM
$$ g(\theta) = \frac{\partial L}{\partial \theta} = 0. \qquad (1.4) $$
For the gradient-free setting, it is assumed that measurements of $L(\theta)$, say $y(\theta)$, are available at various values of $\theta$. These measurements may or may not include random noise. No direct measurements of $g(\theta)$, either with or without noise, are assumed to be available in this setting. In the Robbins-Monro/stochastic gradient (SG) case [9], it is assumed that direct measurements of $g(\theta)$ are available, usually in the presence of added noise. The basic problem is to take the available information (measurements of $L(\theta)$ and/or $g(\theta)$) and attempt to estimate $\theta^*$. This is essentially a local unconstrained optimization problem. The SPSA algorithm is a tool for solving optimization problems in which the cost function is analytically unavailable or difficult to compute. The algorithm is essentially a randomized version of the Kiefer-Wolfowitz method, in which the gradient is estimated using only two measurements of the cost function at each iteration [15][16]. SPSA is particularly efficient in high-dimensional problems and in problems where the cost function must be estimated through expensive simulations. The convergence properties of the algorithm are established in [16]. Consider the problem of finding the minimum of a
real-valued function $L(\theta)$, for $\theta \in D$, where $D$ is an open domain in $\mathbb{R}^p$. The function is not assumed to be explicitly known, but noisy measurements $M(n, \theta)$ of it are available:
$$ M(n, \theta) = L(\theta) + \varepsilon_n(\theta) \qquad (1.5) $$
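The measurement model in (1.5) is easy to simulate. Below is a minimal Python sketch, assuming a hypothetical quadratic cost with minimizer at $(1, -2)$ and Gaussian noise; the cost, noise level, and noise distribution are illustrative choices, not taken from the text (Gaussian noise is used for simplicity, although the text assumes only a zero-mean, uniformly bounded process):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quadratic cost with unique minimizer theta* = (1, -2),
# chosen only to illustrate the measurement model.
def L(theta):
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))

# Noisy measurement M(n, theta) = L(theta) + eps_n(theta), with eps_n
# modeled here as zero-mean Gaussian noise.
def M(theta, noise_std=0.1):
    return L(theta) + noise_std * rng.normal()
```

Because the noise is zero-mean, averaging repeated measurements at a fixed $\theta$ recovers $L(\theta)$; stochastic approximation schemes such as SPSA exploit exactly this property without ever evaluating $L$ directly.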
where $\{\varepsilon_n\}$ is the measurement noise process. We assume that the function $L(\cdot)$ is at least three times continuously differentiable and has a unique minimizer in $D$. The process $\{\varepsilon_n\}$ is zero-mean, uniformly bounded, and smooth in $\theta$ in an appropriate technical sense. The problem is to minimize $L(\cdot)$ using only the noisy measurements $M(\cdot)$. The SPSA algorithm for minimizing functions relies on the SP gradient approximation [16]. At each iteration $k$ of the algorithm, a random perturbation vector $\Delta_k = (\Delta_{k1}, \ldots, \Delta_{kp})^T$ is drawn, where the $\Delta_{ki}$ form a
sequence <strong>of</strong> Bernoulli random variables taking the values ± 1. The perturbations are assumed to<br />
be independent <strong>of</strong> the measurement noise process. In fixed gain <strong>SPSA</strong>, the step size <strong>of</strong> the<br />
perturbation is fixed at, say, some c > 0. To compute the gradient estimate at iteration k, it is<br />
necessary to evaluate M (⋅)<br />
at two values <strong>of</strong> θ :<br />
$$ M_k^+(\theta) = L(\theta + c\Delta_k) + \varepsilon_{2k-1}(\theta + c\Delta_k) \qquad (1.6) $$
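In the standard SP construction, a companion measurement $M_k^-(\theta)$ is taken at $\theta - c\Delta_k$, and the componentwise gradient estimate is $\hat g_{ki}(\theta) = (M_k^+ - M_k^-)/(2c\,\Delta_{ki})$, so only two cost evaluations are needed regardless of dimension. A minimal fixed-gain sketch of this loop follows; it is not the author's implementation, and the step size $a$, the quadratic test cost, and the noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical smooth cost with unique minimizer theta* = (1, -2).
def L(theta):
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))

# Noisy measurement of L (zero-mean Gaussian noise, for illustration).
def M(theta, noise_std=0.01):
    return L(theta) + noise_std * rng.normal()

def spsa_fixed_gain(theta0, a=0.1, c=0.1, n_iter=500):
    """Fixed-gain SPSA: two measurements of M per iteration."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        # Bernoulli +/-1 perturbation vector Delta_k.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Two-sided measurements at theta +/- c*Delta_k.
        m_plus = M(theta + c * delta)
        m_minus = M(theta - c * delta)
        # SP gradient estimate: one shared difference quotient,
        # divided componentwise by the perturbation.
        g_hat = (m_plus - m_minus) / (2.0 * c * delta)
        theta = theta - a * g_hat
    return theta

theta_hat = spsa_fixed_gain(np.zeros(2))
```

Note that every component of $\hat g_k$ reuses the same two measurements; only the divisor $\Delta_{ki}$ changes, which is what makes the per-iteration cost independent of the dimension $p$.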