
2.6 FISHER INFORMATION MATRIX

Obtaining the required first or second derivatives of the log-likelihood function may be a formidable task in some applications, and computing the required expectation of the generally nonlinear multivariate function is often impossible in problems of practical interest. This section outlines a computer resampling approach to estimating F_n(θ) that is useful when analytical methods for computing F_n(θ) are infeasible. The approach makes use of a computationally efficient and easy-to-implement method for Hessian estimation that was described by Spall [24] in the context of optimization.

The computational efficiency follows from the small number of log-likelihood or gradient values needed to produce each Hessian estimate. Although there is no optimization here per se, we use the same basic simultaneous perturbation (SP) formula for Hessian estimation [this is the same SP principle given earlier in Spall [24] for gradient estimation]. However, the way in which the individual Hessian estimates are averaged differs from Spall [24] because of the distinction between the problem of recursive optimization and the problem of estimation of F_n(θ). The essence of the method is to produce a large number of SP estimates of the Hessian matrix of log L(·) and then average the negative of these estimates to obtain an approximation to F_n(θ).
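As a rough illustration only, the averaging step could be sketched in Python as follows. The gradient-based form of the SP Hessian estimate is assumed here (Spall [24] also treats a values-only variant), and grad_loglik_given_data, make_pseudodata, the perturbation size c, and the number of replicates are hypothetical stand-ins and illustrative choices, not the author's implementation.

import numpy as np

def sp_hessian_estimate(grad_loglik, theta, c, rng):
    # One simultaneous perturbation (SP) estimate of the Hessian of log L at theta.
    p = theta.size
    delta = rng.choice([-1.0, 1.0], size=p)              # Bernoulli +/-1 perturbation
    dg = grad_loglik(theta + c * delta) - grad_loglik(theta - c * delta)
    h = np.outer(dg / (2.0 * c), 1.0 / delta)            # outer-product form of the SP estimate
    return 0.5 * (h + h.T)                               # symmetrize

def fisher_information_estimate(theta, make_pseudodata, grad_loglik_given_data,
                                num_estimates=2000, c=1e-4, seed=0):
    # Average the negatives of many SP Hessian estimates, each computed on an
    # independently generated pseudodata vector, to approximate F_n(theta).
    rng = np.random.default_rng(seed)
    acc = np.zeros((theta.size, theta.size))
    for _ in range(num_estimates):
        z_pseudo = make_pseudodata(theta, rng)           # theta treated as "truth"
        grad = lambda t, z=z_pseudo: grad_loglik_given_data(t, z)
        acc += -sp_hessian_estimate(grad, theta, c, rng)
    return acc / num_estimates

Note that the estimates are simply negated and averaged rather than fed into a recursion, reflecting the distinction from recursive optimization noted above.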

This approach is directly motivated by the definition of F_n(θ) as the mean value of the negative Hessian matrix (2.29). To produce the SP Hessian estimates, we generate pseudodata vectors in a Monte Carlo manner. The pseudodata are generated according to a bootstrap resampling scheme that treats the chosen θ as "truth," that is, according to the probability model p_Z(ζ | θ) given in (2.27). So, for example, if it is assumed that the real data Z_n are jointly normally distributed, N(µ(θ), Σ(θ)), then the pseudodata are generated by Monte Carlo according to a normal distribution with mean µ and covariance matrix Σ evaluated at the chosen θ.

Let the i-th pseudodata vector be Z_pseudo(i); the use of Z_pseudo without the argument is a generic reference to a pseudodata vector. This data vector represents a sample of size n from the assumed distribution for the set of data based on
