

where $\beta > 0$ depends on the choice of the gain sequences ($a_k$ and $c_k$), $\mu$ depends on both the Hessian and the third derivatives of $L(\theta)$ at $\theta^*$, and $\Sigma$ depends on the Hessian matrix (note that in general $\mu \neq 0$, in contrast to many well-known asymptotic normality results in estimation). Given the restrictions on the gain sequences needed to ensure convergence and asymptotic normality, the fastest allowable rate of convergence of $\hat{\theta}_k$ to $\theta^*$ is $k^{-1/3}$.
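For orientation, the result referred to as (1.8) below has, in Spall's standard statement [18], the following form (restated here as a sketch, assuming the usual gain choices $a_k = a/k^{\alpha}$ and $c_k = c/k^{\gamma}$):

\[
k^{\beta/2}\left(\hat{\theta}_k - \theta^*\right) \;\xrightarrow{\ \mathrm{dist}\ }\; N(\mu, \Sigma), \qquad \beta = \alpha - 2\gamma > 0 ,
\]

so the $k^{-1/3}$ rate quoted above corresponds to the largest allowable value $\beta/2 = 1/3$ (attained, e.g., with $\alpha = 1$ and $\gamma = 1/6$).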

In addition to establishing the formal convergence of SPSA, Spall [18] shows that the probability distribution of an appropriately scaled $\hat{\theta}_k$ is approximately normal (with a specified mean and covariance matrix) for large $k$. Spall [18] uses the asymptotic normality result in (1.8), together with a parallel result for FDSA [9], to establish the relative efficiency of SPSA. This efficiency depends on the shape of $L(\theta)$, the values of $\{a_k\}$ and $\{c_k\}$, and the distributions of the $\{\Delta_k\}$ and the measurement noise terms. There is no single expression that characterizes the relative efficiency in general; however, as discussed in [17], in most practical problems SPSA is asymptotically more efficient than FDSA.

For example, if $a_k$ and $c_k$ are chosen according to the guidelines of Spall [18], then by equating the asymptotic mean squared errors $E\left(\left\| \hat{\theta}_k - \theta^* \right\|^2\right)$ of the SPSA and FDSA algorithms, we find

\[
\frac{\text{number of measurements of } L(\theta) \text{ in SPSA}}{\text{number of measurements of } L(\theta) \text{ in FDSA}} \;\longrightarrow\; \frac{1}{p}
\]

as the number <strong>of</strong> loss measurements in both procedures gets large. Hence, above expression<br />

implies that the p-fold savings per iteration (gradient approximation) translates directly into a<br />

p-fold savings in the overall optimization process despite the complex non-linear ways in which<br />

the sequence <strong>of</strong> gradient approximations manifests itself in the ultimate solutionθˆ k . One<br />

properly chosen simultaneous random change in all the variables in a problem provides as much<br />

in<strong>for</strong>mation <strong>for</strong> optimization as a full set <strong>of</strong> one at time changes <strong>of</strong> each variable.<br />
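To make the per-iteration savings concrete, the following minimal Python sketch contrasts the two gradient estimators; the function names and the quadratic test loss are illustrative assumptions, not taken from the text. SPSA evaluates the loss twice per iteration regardless of the dimension $p$, while two-sided FDSA needs $2p$ evaluations.

```python
import numpy as np

def spsa_gradient(loss, theta, c_k, rng):
    """SPSA gradient estimate: 2 loss measurements, independent of dimension p."""
    p = theta.size
    delta = rng.choice([-1.0, 1.0], size=p)      # symmetric Bernoulli perturbation
    y_plus = loss(theta + c_k * delta)
    y_minus = loss(theta - c_k * delta)
    return (y_plus - y_minus) / (2.0 * c_k * delta)

def fdsa_gradient(loss, theta, c_k):
    """Two-sided finite-difference estimate: 2p loss measurements."""
    p = theta.size
    g = np.zeros(p)
    for i in range(p):
        e_i = np.zeros(p)
        e_i[i] = 1.0                              # perturb one coordinate at a time
        g[i] = (loss(theta + c_k * e_i) - loss(theta - c_k * e_i)) / (2.0 * c_k)
    return g

# Illustrative use on a simple quadratic loss in p = 10 dimensions:
rng = np.random.default_rng(0)
p = 10
loss = lambda th: float(th @ th)                  # L(theta) = ||theta||^2, minimum at 0
theta = np.ones(p)
print(spsa_gradient(loss, theta, c_k=0.1, rng=rng))  # 2 measurements
print(fdsa_gradient(loss, theta, c_k=0.1))           # 2p = 20 measurements
```

For the same asymptotic mean squared error, the ratio of total loss measurements then tends to $1/p$, e.g., a tenfold saving at $p = 10$.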

1.8 Versions of SPSA Algorithm

The standard first-order SA algorithms for estimating $\theta$ involve a simple recursion with a gain sequence $a_k$ applied to a gradient estimate, as sketched below.
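Concretely, the generic recursion can be sketched in its standard form (writing $\hat{g}_k(\cdot)$ for the gradient estimate at iteration $k$, a notational assumption here):

\[
\hat{\theta}_{k+1} = \hat{\theta}_k - a_k\, \hat{g}_k\!\left(\hat{\theta}_k\right),
\]

where SPSA and FDSA differ only in how $\hat{g}_k$ is constructed; the second-order versions discussed in this section additionally estimate the Hessian of $L(\theta)$ in order to precondition this step.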
