Approximation of Hessian Matrix for Second-order SPSA Algorithm ...

2.6 FISHER INFORMATION MATRIX

[...] form in (2.28). Note that in some applications, the observed information matrix at a particular dataset $z_n$ may be easier to compute and/or preferred from an inference point of view relative to the actual information matrix $F_n(\theta)$ in (2.29). Although the method in this work is described for the determination of $F_n(\theta)$, the efficient Hessian estimation may also be used directly for the determination of $H(\theta \mid z_n)$ when it is not easy to calculate the Hessian directly.

2.6.2 Two Key Properties of the Information Matrix: Connections to Covariance Matrix of Parameter Estimates

Let $\theta^*$ denote the unknown "true" value of $\theta$. The primary rationale for $F_n(\theta)$ as a measure of information about $\theta$ within the data $Z_n$ comes from its connection to the covariance matrix for the estimate of $\theta$ constructed from $Z_n$. The first of the key properties makes this connection via an asymptotic normality result [23]. In particular, for some common forms of estimates $\hat{\theta}_n$ (e.g., maximum likelihood and Bayesian maximum a posteriori), it is known that, under modest conditions,

$$\sqrt{n}\,(\hat{\theta}_n - \theta^*) \xrightarrow{\text{dist}} N(0, \bar{F}^{-1}) \tag{2.30}$$

where $\xrightarrow{\text{dist}}$ denotes convergence in distribution and

$$\bar{F} \equiv \lim_{n \to \infty} \frac{F_n(\theta^*)}{n} \tag{2.31}$$

provided that the indicated limit exists and is invertible. Hence, in practice, for $n$ reasonably large, $F_n(\theta)^{-1}$ can serve as an approximate covariance matrix of the estimate $\hat{\theta}_n$ when $\theta$ is chosen close to the unknown $\theta^*$. Relationship (2.30) also holds for optimal implementations of some recursive algorithms where the data $Z_n$ are processed recursively instead of in a batch mode, as is typical in maximum likelihood. This includes optimal versions of gradient-based SA algorithms, which include popular algorithms such as LMS and NN BP as special cases.

The second key property of the information matrix applies in finite samples.
CHAPTER 2. PROPOSED SPSA ALGORITHM

If $\hat{\theta}_n$ is any unbiased estimator of $\theta$ [23],

$$\operatorname{cov}(\hat{\theta}_n) \ge F_n(\theta^*)^{-1}, \quad \forall n. \tag{2.32}$$

There is also an expression analogous to (2.32) for biased estimators, but it is not especially useful in practice because it requires knowledge of the gradient of the bias with respect to $\theta$. Expressions (2.30) and (2.32), taken together, point to the close connection between the inverse Fisher information matrix and the covariance matrix of the estimator. While (2.30) is an asymptotic result, (2.32) applies for all sample sizes, subject to the unbiasedness requirement. It is also clear why the name "information matrix" is used for $F_n(\theta)$: a larger $F_n(\theta)$ (in the matrix sense) is associated with a smaller covariance matrix (i.e., more information), while a smaller $F_n(\theta)$ is associated with a larger covariance matrix (i.e., less information).

The calculation of $F_n(\theta)$ is often difficult or impossible in many nonlinear problems. Obtaining the required first or second derivatives of the log-likelihood function may be a formidable task in some applications, and computing the required expectation of the generally nonlinear multivariate function is often impossible in problems of practical interest. To address this difficulty, the subsection below outlines a computer resampling approach to estimating $F_n(\theta)$. This approach is useful when analytical methods for computing $F_n(\theta)$ are infeasible. The approach makes use of an idea introduced for optimization, namely Hessian estimation for SA, even though this problem is not directly one of optimization. The basis for the technique below is to use computational horsepower in lieu of traditional detailed theoretical analysis to determine $F_n(\theta)$. The method here is an example of an MCNR method for producing an estimate. Such methods have become very popular as a means of handling problems that were formerly infeasible.
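The resampling idea described above can be sketched numerically. The following is a minimal illustration, not the estimator of this thesis: it assumes the standard identity $F_n(\theta) = -E[H(\theta \mid Z_n)]$ (valid under the usual regularity conditions), an i.i.d. Gaussian model with $\theta = (\text{mean}, \text{variance})$, Bernoulli $\pm 1$ perturbations, and perturbation sizes `c` and `c_tilde` chosen for convenience. All function names are hypothetical. Each Hessian estimate uses only four log-likelihood evaluations, in the spirit of simultaneous perturbation Hessian estimation, and the estimates are averaged over pseudo-datasets generated at $\theta$.

```python
import numpy as np


def log_likelihood(theta, z):
    """Gaussian log-likelihood with theta = (mean, variance). Illustrative model only."""
    mu, var = theta
    return -0.5 * z.size * np.log(2.0 * np.pi * var) - 0.5 * np.sum((z - mu) ** 2) / var


def sample_data(theta, n, rng):
    """Draw one pseudo-dataset of size n from the assumed model at theta."""
    mu, var = theta
    return rng.normal(mu, np.sqrt(var), size=n)


def sp_hessian(loglik, theta, z, rng, c=1e-3, c_tilde=1e-3):
    """One simultaneous-perturbation Hessian estimate from four log-likelihood values."""
    p = theta.size
    delta = rng.choice([-1.0, 1.0], size=p)    # perturbation for the gradient difference
    delta_t = rng.choice([-1.0, 1.0], size=p)  # perturbation inside each gradient estimate
    grads = []
    for sign in (1.0, -1.0):
        base = theta + sign * c * delta
        diff = loglik(base + c_tilde * delta_t, z) - loglik(base, z)
        grads.append(diff / (c_tilde * delta_t))  # one-sided SP gradient estimate at base
    d_grad = (grads[0] - grads[1]) / (2.0 * c)
    half = np.outer(d_grad, 1.0 / delta)
    return 0.5 * (half + half.T)  # symmetrize the estimate


def fisher_information_mc(loglik, sampler, theta, n, n_rep, rng):
    """Average negative Hessian estimates over n_rep pseudo-datasets drawn at theta."""
    p = theta.size
    h_sum = np.zeros((p, p))
    for _ in range(n_rep):
        z = sampler(theta, n, rng)
        h_sum += sp_hessian(loglik, theta, z, rng)
    return -h_sum / n_rep


rng = np.random.default_rng(0)
theta = np.array([0.0, 1.0])  # assumed (mean, variance)
n = 30
F_hat = fisher_information_mc(log_likelihood, sample_data, theta, n, n_rep=5000, rng=rng)
# Analytic information matrix for this model: n * diag(1/var, 1/(2 var^2)).
F_exact = n * np.diag([1.0 / theta[1], 0.5 / theta[1] ** 2])
print("Monte Carlo estimate:\n", F_hat)
print("Analytic value:\n", F_exact)
```

In this toy model the analytic matrix is available, so the estimate can be checked directly. Inverting `F_hat` gives an approximate covariance matrix for estimates of (mean, variance), to be compared with $\operatorname{diag}(\sigma^2/n,\, 2\sigma^4/n)$, which is the inverse of the analytic matrix. Individual Hessian estimates are noisy, so the accuracy depends on the number of replications `n_rep`.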
Other notable Monte Carlo techniques are the bootstrap method for determining statistical distributions of estimates and the Markov chain Monte Carlo method for producing pseudorandom numbers and related quantities. Part of the appeal of the Monte Carlo method here for estimating $F_n(\theta)$ is that it can be implemented with only evaluations of the log-likelihood.

2.6.3 Estimation of $F_n(\theta)$

The calculation of $F_n(\theta)$ is often difficult or impossible in practical problems. Obtaining the
