Approximation of Hessian Matrix for Second-order SPSA Algorithm ...
1.6 FORMULATION OF SPSA ALGORITHM
$$ g(\theta) = \frac{\partial L}{\partial \theta} = 0. \qquad (1.4) $$
For the gradient-free setting, it is assumed that measurements of $L(\theta)$, say $y(\theta)$, are available at various values of $\theta$. These measurements may or may not include random noise. No direct measurements of $g(\theta)$, either with or without noise, are assumed to be available in this setting. In the Robbins-Monro/stochastic gradient (SG) case [9], it is assumed that direct measurements of $g(\theta)$ are available, usually in the presence of added noise. The basic problem is to take the available information (measurements of $L(\theta)$ and/or $g(\theta)$) and attempt to estimate $\theta^*$. This is essentially a local unconstrained optimization problem. The SPSA algorithm is a tool for solving optimization problems in which the cost function is analytically unavailable or difficult to compute. The algorithm is essentially a randomized version of the Kiefer-Wolfowitz method, in which the gradient is estimated using only two measurements of the cost function at each iteration [15][16]. SPSA is particularly efficient in high-dimensional problems and in problems where the cost function must be estimated through expensive simulations. The convergence properties of the algorithm are established in [16]. Consider the problem of finding the minimum of a
real-valued function $L(\theta)$, for $\theta \in D$, where $D$ is an open domain in $\mathbb{R}^p$. The function is not assumed to be explicitly known, but noisy measurements $M(n, \theta)$ of it are available:
$$ M(n, \theta) = L(\theta) + \varepsilon_n(\theta) \qquad (1.5) $$
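The measurement model in (1.5) is easy to simulate. Below is a minimal Python sketch, assuming a hypothetical quadratic cost with minimizer at $(1, -2)$ and Gaussian noise; the cost, noise level, and noise distribution are illustrative choices, not taken from the text (Gaussian noise is used for simplicity, although the text assumes only a zero-mean, uniformly bounded process):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quadratic cost with unique minimizer theta* = (1, -2),
# chosen only to illustrate the measurement model.
def L(theta):
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))

# Noisy measurement M(n, theta) = L(theta) + eps_n(theta), with eps_n
# modeled here as zero-mean Gaussian noise.
def M(theta, noise_std=0.1):
    return L(theta) + noise_std * rng.normal()
```

Because the noise is zero-mean, averaging repeated measurements at a fixed $\theta$ recovers $L(\theta)$; stochastic approximation schemes such as SPSA exploit exactly this property without ever evaluating $L$ directly.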
where $\{\varepsilon_n\}$ is the measurement noise process. We assume that the function $L(\cdot)$ is at least three times continuously differentiable and has a unique minimizer in $D$. The process $\{\varepsilon_n\}$ is zero-mean, uniformly bounded, and smooth in $\theta$ in an appropriate technical sense. The problem is to minimize $L(\cdot)$ using only the noisy measurements $M(\cdot)$. The SPSA algorithm for minimizing functions relies on the SP gradient approximation [16]. At each iteration $k$ of the algorithm, a random perturbation vector $\Delta_k = (\Delta_{k1}, \ldots, \Delta_{kp})^T$ is drawn, where the $\Delta_{ki}$ form a
sequence <strong>of</strong> Bernoulli random variables taking the values ± 1. The perturbations are assumed to<br />
be independent <strong>of</strong> the measurement noise process. In fixed gain <strong>SPSA</strong>, the step size <strong>of</strong> the<br />
perturbation is fixed at, say, some c > 0. To compute the gradient estimate at iteration k, it is<br />
necessary to evaluate M (⋅)<br />
at two values <strong>of</strong> θ :<br />
$$ M_k^+(\theta) = L(\theta + c\Delta_k) + \varepsilon_{2k-1}(\theta + c\Delta_k) \qquad (1.6) $$
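In the standard SP construction, a companion measurement $M_k^-(\theta)$ is taken at $\theta - c\Delta_k$, and the componentwise gradient estimate is $\hat g_{ki}(\theta) = (M_k^+ - M_k^-)/(2c\,\Delta_{ki})$, so only two cost evaluations are needed regardless of dimension. A minimal fixed-gain sketch of this loop follows; it is not the author's implementation, and the step size $a$, the quadratic test cost, and the noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical smooth cost with unique minimizer theta* = (1, -2).
def L(theta):
    return float(np.sum((theta - np.array([1.0, -2.0])) ** 2))

# Noisy measurement of L (zero-mean Gaussian noise, for illustration).
def M(theta, noise_std=0.01):
    return L(theta) + noise_std * rng.normal()

def spsa_fixed_gain(theta0, a=0.1, c=0.1, n_iter=500):
    """Fixed-gain SPSA: two measurements of M per iteration."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        # Bernoulli +/-1 perturbation vector Delta_k.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Two-sided measurements at theta +/- c*Delta_k.
        m_plus = M(theta + c * delta)
        m_minus = M(theta - c * delta)
        # SP gradient estimate: one shared difference quotient,
        # divided componentwise by the perturbation.
        g_hat = (m_plus - m_minus) / (2.0 * c * delta)
        theta = theta - a * g_hat
    return theta

theta_hat = spsa_fixed_gain(np.zeros(2))
```

Note that every component of $\hat g_k$ reuses the same two measurements; only the divisor $\Delta_{ki}$ changes, which is what makes the per-iteration cost independent of the dimension $p$.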