Chapter 2 Introduction to Neural network

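The main problem lies with the steepest descent algorithm itself. Steepest descent moves perpendicular to the contours of the error function, so on elongated contours it does not take the closest path to the minimum. Moreover, the more elliptic the contours are, the slower the convergence. The ratio of the ellipse axes is controlled by the eigenvalues λ1, λ2 of the input correlation matrix R_xx (x = input): the larger the spread between the largest and the smallest eigenvalue, the more elongated the contours. Decorrelating the input before it enters the network therefore gives better performance.

[Figure: elliptic error contours whose axes are set by λ1 and λ2, the eigenvalues of R_xx, with the steepest-descent path compared to the closest path to the minimum.]

Decorrelation can be done by an unsupervised algorithm, the Generalized Hebbian Algorithm, which finds the principal components (PCA). We will see this later in this course.

6.8 Momentum update

To descend faster on the error surface, the weight update adds a fraction β of the previous weight change, so that it effectively averages the previous gradients:

$$\Delta w_{ij}^{(k)} = -\frac{\partial E}{\partial w_{ij}} + \underbrace{\beta\,\Delta w_{ij}^{(k-1)}}_{\text{momentum term}}$$

where β (0 ≤ β < 1) is the momentum parameter.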
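A minimal sketch of the momentum update on a two-weight quadratic error surface E(w) = 0.5(λ1 w1² + λ2 w2²), where λ1 = 1 and λ2 = 0.01 stand in for the eigenvalues of R_xx (an eigenvalue spread of 100, i.e. strongly elliptic contours). The update above is written without an explicit step size; the sketch multiplies the gradient by a step size α, as in the adaptive-step methods. The test function, α = 1, β = 0.9 and the starting point are assumptions chosen only for illustration.

```python
# Minimal sketch: steepest descent with and without momentum on an elongated
# quadratic error surface E(w) = 0.5 * (lam1*w1^2 + lam2*w2^2).
# lam1, lam2 stand in for the eigenvalues of R_xx. The step size alpha, the
# momentum beta and the start point are illustrative assumptions.

def gradient(w, lams):
    """Gradient of E(w): component i is lam_i * w_i."""
    return [lam * wi for lam, wi in zip(lams, w)]

def error(w, lams):
    """E(w) = 0.5 * sum(lam_i * w_i^2)."""
    return 0.5 * sum(lam * wi * wi for lam, wi in zip(lams, w))

def descend(lams, alpha, beta, steps, w0=(1.0, 1.0)):
    """Apply dw(k) = -alpha * dE/dw + beta * dw(k-1) `steps` times; return E."""
    w = list(w0)
    dw = [0.0] * len(w)  # previous weight change dw(k-1), zero before the first step
    for _ in range(steps):
        g = gradient(w, lams)
        dw = [-alpha * gi + beta * dwi for gi, dwi in zip(g, dw)]
        w = [wi + dwi for wi, dwi in zip(w, dw)]
    return error(w, lams)

lams = (1.0, 0.01)  # eigenvalue spread of 100: strongly elliptic contours
plain = descend(lams, alpha=1.0, beta=0.0, steps=100)     # beta = 0: plain steepest descent
momentum = descend(lams, alpha=1.0, beta=0.9, steps=100)  # with momentum term
print(f"plain steepest descent: E = {plain:.2e}")
print(f"with momentum beta=0.9: E = {momentum:.2e}")
```

Setting β = 0 recovers plain steepest descent, so the two printed errors compare the same number of updates with and without the momentum term. Along the flat direction (λ2) plain descent shrinks w2 by only a factor 1 − αλ2 = 0.99 per step, which is where the momentum term helps most.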
The backpropagation algorithm can cause the weight updates to follow a fractal pattern (fractal meaning that the pattern follows simple rules that are applied statistically); see figure 8.11. To avoid this, the step size and the momentum term should be decreased during training.

6.9 Adaptive step algorithms

These algorithms modify the step size during training in order to speed up convergence. The idea is to increase the step size α if the sign of the gradient is the same for two consecutive updates, and to decrease it if the sign changes.

[Figure (example): an error function with the starting point and successive steps labelled increase, decrease, increase.]

6.10 Silva and Almeida's algorithm

$$\alpha_i^{(k+1)} = \begin{cases} \alpha_i^{(k)}\,u & \text{if } \nabla_i E^{(k)}\,\nabla_i E^{(k-1)} > 0 \\ \alpha_i^{(k)}\,d & \text{if } \nabla_i E^{(k)}\,\nabla_i E^{(k-1)} < 0 \end{cases}$$

where u > 1 and d < 1 are parameters (e.g. u = 1.1, d = 0.9) and the index i denotes weight number i.

Other similar methods are Delta-bar-delta and Rprop.

6.11 Second-order algorithms

These methods use Newton's method instead of steepest descent:

$$w^{(k+1)} = w^{(k)} - \left[\frac{\partial^2 E}{\partial w^2}\right]^{-1} \nabla_w E$$

with the Hessian ∂²E/∂w² and the gradient ∇_w E evaluated at w^{(k)}.
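A minimal sketch comparing three of the update rules above on the same two-weight quadratic error surface E(w) = 0.5(λ1 w1² + λ2 w2²) with λ1 = 1 and λ2 = 0.01: plain steepest descent with a fixed step, Silva and Almeida's per-weight step adaptation (with the example values u = 1.1, d = 0.9), and a single Newton step. The test function, the initial step size 0.1, the starting point and the choice to leave α_i unchanged when a gradient product is exactly zero are assumptions for illustration. Because this E has the constant Hessian diag(λ1, λ2), the inverse Hessian reduces to dividing each gradient component by λ_i; a real network needs the full, generally non-diagonal, Hessian.

```python
# Minimal sketch comparing, on E(w) = 0.5 * (lam1*w1^2 + lam2*w2^2):
#   - plain steepest descent with a fixed step alpha,
#   - Silva and Almeida's per-weight adaptive step sizes,
#   - one Newton step w - H^{-1} grad E.
# The test function, alpha0, the start point and the zero-product rule are
# illustrative assumptions; u = 1.1 and d = 0.9 are the example values.

def gradient(w, lams):
    """Gradient of E(w): component i is lam_i * w_i."""
    return [lam * wi for lam, wi in zip(lams, w)]

def error(w, lams):
    """E(w) = 0.5 * sum(lam_i * w_i^2)."""
    return 0.5 * sum(lam * wi * wi for lam, wi in zip(lams, w))

def steepest_descent(lams, alpha, steps, w0=(1.0, 1.0)):
    """Fixed-step steepest descent; returns the final error."""
    w = list(w0)
    for _ in range(steps):
        g = gradient(w, lams)
        w = [wi - alpha * gi for wi, gi in zip(w, g)]
    return error(w, lams)

def silva_almeida(lams, alpha0=0.1, u=1.1, d=0.9, steps=100, w0=(1.0, 1.0)):
    """Weight i has its own step alpha_i: multiplied by u when its gradient
    component has the same sign as in the previous iteration, by d when the
    sign changes, and left unchanged if the product is exactly zero."""
    w = list(w0)
    alphas = [alpha0] * len(w)
    prev_g = [0.0] * len(w)  # no previous gradient before the first step
    for _ in range(steps):
        g = gradient(w, lams)
        for i in range(len(w)):
            prod = g[i] * prev_g[i]
            if prod > 0:
                alphas[i] *= u
            elif prod < 0:
                alphas[i] *= d
            w[i] -= alphas[i] * g[i]
        prev_g = g
    return error(w, lams), alphas

def newton_step(w, lams):
    """w - H^{-1} grad E. Here the Hessian is diag(lams), so applying H^{-1}
    reduces to dividing each gradient component by lam_i."""
    g = gradient(w, lams)
    return [wi - gi / lam for wi, gi, lam in zip(w, g, lams)]

lams = (1.0, 0.01)
e_sd = steepest_descent(lams, alpha=1.0, steps=100)
e_sa, final_alphas = silva_almeida(lams)
e_newton = error(newton_step([1.0, 1.0], lams), lams)
print(f"steepest descent: E = {e_sd:.2e}")
print(f"Silva-Almeida:    E = {e_sa:.2e}, final steps = {[f'{a:.3g}' for a in final_alphas]}")
print(f"one Newton step:  E = {e_newton:.2e}")
```

On this axis-aligned surface the adapted steps α_i end up close to 1/λ_i, which is the step a Newton update takes along each axis. Newton's method reaches the minimum of a quadratic in a single step, at the cost of computing and inverting the Hessian.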