
1.6 FORMULATION OF SPSA ALGORITHM

$$g(\theta) = \frac{\partial L}{\partial \theta} = 0. \qquad (1.4)$$

For the gradient-free setting, it is assumed that measurements of L(θ), say y(θ), are available at various values of θ. These measurements may or may not include random noise. No direct measurements (either with or without noise) of g(θ) are assumed available in this setting. In the Robbins-Monro/stochastic gradient (SG) case [9], it is assumed that direct measurements of g(θ) are available, usually in the presence of added noise. The basic problem is to take the available information (measurements of L(θ) and/or g(θ)) and attempt to estimate θ*. This is essentially a local unconstrained optimization problem. The SPSA algorithm is a tool for solving optimization problems in which the cost function is analytically unavailable or difficult to compute. The algorithm is essentially a randomized version of the Kiefer-Wolfowitz method, in which the gradient is estimated using only two measurements of the cost function at each iteration [15][16]. SPSA is particularly efficient in problems of high dimension and where the cost function must be estimated through expensive simulations. The convergence properties of the algorithm have been established in [16].

Consider the problem of finding the minimum of a real-valued function L(θ), for θ ∈ D, where D is an open domain in R^P. The function is not assumed to be explicitly known, but noisy measurements M(n, θ) of it are available:

$$M(n, \theta) = L(\theta) + \varepsilon_n(\theta) \qquad (1.5)$$

where {ε_n} is the measurement noise process. We assume that the function L(⋅) is at least three-times continuously differentiable and has a unique minimizer in D. The process {ε_n} is a zero-mean process, uniformly bounded and smooth in θ in an appropriate technical sense. The problem is to minimize L(⋅) using only the noisy measurements M(⋅).
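For concreteness, the following minimal Python sketch implements a measurement oracle of the form (1.5). The quadratic loss and the Gaussian noise are illustrative assumptions only, not part of the formulation above, which requires just a smooth L and a zero-mean, uniformly bounded noise process.

import numpy as np

rng = np.random.default_rng(0)

def L(theta):
    # Hypothetical smooth loss with unique minimizer at theta = (1, ..., 1);
    # stands in for the analytically unavailable cost function.
    return 0.5 * np.sum((theta - 1.0) ** 2)

def M(theta, sigma=0.1):
    # Noisy measurement M(n, theta) = L(theta) + eps_n(theta), cf. (1.5).
    # Gaussian noise is used for simplicity; the text assumes a zero-mean,
    # uniformly bounded noise process.
    return L(theta) + sigma * rng.standard_normal()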

The SPSA algorithm for minimizing functions relies on the SP gradient approximation [16]. At each iteration k of the algorithm, a random perturbation vector ∆_k = (∆_{k1}, …, ∆_{kp})^T is taken, where the ∆_{ki} form a sequence of Bernoulli random variables taking the values ±1. The perturbations are assumed to be independent of the measurement noise process. In fixed-gain SPSA, the step size of the perturbation is fixed at, say, some c > 0. To compute the gradient estimate at iteration k, it is necessary to evaluate M(⋅) at two values of θ:

$$M_k^{+}(\theta) = L(\theta + c\Delta_k) + \varepsilon_{2k-1}(\theta + c\Delta_k) \qquad (1.6)$$
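The text breaks off after the first of the two evaluations. In the standard formulation [16], the companion measurement is M_k^-(θ) = L(θ − c∆_k) + ε_{2k}(θ − c∆_k), and the i-th component of the SP gradient estimate is ĝ_{ki}(θ) = [M_k^+(θ) − M_k^-(θ)] / (2c∆_{ki}). Continuing the sketch above (reusing the illustrative oracle M and the generator rng), one fixed-gain SPSA iteration can be written as follows; the gain a is an assumed illustrative constant.

def spsa_step(theta, c=0.1, a=0.05):
    # One fixed-gain SPSA iteration (illustrative sketch).
    # Bernoulli +/-1 perturbation vector Delta_k, independent of the noise.
    delta = rng.choice([-1.0, 1.0], size=theta.size)
    # Two-sided noisy evaluations, cf. (1.6) and its standard companion.
    m_plus = M(theta + c * delta)
    m_minus = M(theta - c * delta)
    # SP gradient estimate: one scalar difference scaled componentwise by
    # 1/Delta_ki -- only two measurements of M per iteration, regardless of p.
    g_hat = (m_plus - m_minus) / (2.0 * c * delta)
    return theta - a * g_hat

theta = np.zeros(5)
for k in range(200):
    theta = spsa_step(theta)
# theta is now close to the minimizer (1, ..., 1) of the illustrative L.

Note that a single scalar difference drives all p components of the estimate, which is what makes SPSA attractive when each evaluation of M is an expensive simulation.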

