Optimization: Gradient and steepest descent

Scientific Computing 2013 

Computer Classes: Worksheet 6: 

Optimization: Gradient and steepest descent 

October 10, 2013 

1 Gradient descent 

An unconstrained minimization problem seeks an argument x that minimizes a function F:

$$\min_x F(x)$$

where $x = (x_1, x_2, \dots, x_n)$, and the point at which the function value is minimal is denoted $x^*$. The classical first-order method for finding a minimum of a function is the gradient descent method. It iteratively moves in the direction opposite to the gradient, $-\nabla F$, until the minimum is found. The algorithm is as follows:

1. Start with an initial guess $x^{(0)}$.

2. Until the maximum number of iterations is reached or the stopping criterion is satisfied:

(a) find a new search direction $\Delta x = -\nabla F(x^{(k)})$;

(b) find a good step size $\alpha$ along the search direction;

(c) make the step $x^{(k+1)} = x^{(k)} + \alpha \Delta x$.

3. The solution $x^* = x^{(k+1)}$ is the vector from the last iteration.

Search direction. 

The gradient is defined through the partial derivatives:

$$\nabla F : \mathbb{R}^n \to \mathbb{R}^n, \qquad \nabla F = \left( \frac{\partial F}{\partial x_1}, \frac{\partial F}{\partial x_2}, \dots, \frac{\partial F}{\partial x_n} \right)$$

At each point x, $\nabla F(x)$ points in the direction of steepest ascent. It can be computed approximately with forward or central differences for the partial derivatives. Take $h = 10^{-7}$ and compute the forward differences

$$\frac{\partial F}{\partial x_i} \approx \frac{F(x_1, \dots, x_i + h, \dots, x_n) - F(x)}{h}$$
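The forward-difference formula above can be sketched in a few lines of Python. This is only an illustration: the helper name `num_grad` is hypothetical, and the test function is the quadratic $x_1^2 + 5x_2^2$ used elsewhere in this worksheet.

```python
import numpy as np

def num_grad(F, x, h=1e-7):
    """Forward-difference approximation of the gradient of F at x."""
    x = np.asarray(x, dtype=float)
    fx = F(x)                       # F(x) is reused for every component
    g = np.zeros_like(x)
    for i in range(x.size):
        xh = x.copy()
        xh[i] += h                  # perturb only the i-th coordinate
        g[i] = (F(xh) - fx) / h
    return g

def F(x):
    return x[0]**2 + 5.0 * x[1]**2

# The analytic gradient is (2*x1, 10*x2), i.e. (-16, -20) at this point.
print(num_grad(F, [-8.0, -2.0]))
```

The approximation costs n + 1 function evaluations per gradient and is accurate only to a few digits, which is usually enough for the tolerance used in this worksheet.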

Figure 1: Gradient descent convergence path for $F(x) = x_1^2 + 5x_2^2$. The plot marks the starting point $x^{(0)}$, the search direction $\Delta x$, the step $\alpha \Delta x$ and the next iterate $x^{(1)}$.

Step size. 

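There are several ways to compute the step size $\alpha$. Two alternatives are:

• exact line search: find the minimum along the line (search direction), $\alpha = \operatorname{argmin}_{\alpha} F(x^{(k)} + \alpha \Delta x)$

• approximate line search: find just some good $\alpha$ that decreases F sufficiently along the line

In this computer class we use the line search from the scipy library, scipy.optimize.line_search. Note that it is an approximate line search: it returns a step size that satisfies the strong Wolfe conditions rather than the exact minimizer along the line.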
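To make the idea of an exact line search concrete, here is a small sketch that minimizes $F$ along a fixed direction numerically and compares the result with the closed-form step that exists for quadratic functions. The use of scipy.optimize.minimize_scalar and the names `phi`, `alpha_num` and `alpha_exact` are choices made for this illustration, not part of the worksheet.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def F(x):
    return x[0]**2 + 5.0 * x[1]**2

def gradF(x):
    return np.array([2.0 * x[0], 10.0 * x[1]])

x = np.array([-8.0, -2.0])
dx = -gradF(x)                      # search direction

# Exact line search: minimize the 1-D function phi(alpha) = F(x + alpha*dx).
phi = lambda alpha: F(x + alpha * dx)
alpha_num = minimize_scalar(phi).x  # Brent's method by default

# For a quadratic F with constant Hessian H the minimizer is known exactly:
# alpha = -(grad . dx) / (dx^T H dx)
H = np.diag([2.0, 10.0])
alpha_exact = -(gradF(x) @ dx) / (dx @ H @ dx)

print(alpha_num, alpha_exact)
```

For general functions no closed form is available, and a full one-dimensional minimization at every iteration is often more work than an approximate search.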

Stopping criterion. Because the gradient is zero at the solution, $\nabla F(x^*) = 0$, one way to check convergence is to verify that the norm of the gradient is small enough:

$$\|\nabla F\| < \varepsilon$$

where the tolerance $\varepsilon$ can be taken as $10^{-3}$. The norm is just the length of the vector, and in Euclidean space it is defined as $\|v\| = \sqrt{\sum_i v_i^2}$.

Task 1 

Implement the gradient descent algorithm. Take, for example, the function $F(x_1, x_2) = x_1^2 + 5x_2^2$ to optimize and the initial guess $x^{(0)} = (-8, -2)$. Draw the descent path as shown in Figure 1 and print the number of iterations.

Hint: the line search can be done with

    import scipy.optimize as opt
    res = opt.line_search(f, gf, x, sdir, gf(x))
    alpha = res[0]  # step size; None if the search did not converge

where f is the function, gf must compute the gradient and sdir is the search direction.

Take the norm from numpy.linalg.norm.
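One possible shape of the loop, combining the hint with the stopping criterion, is sketched below. It is a minimal sketch rather than a reference solution: the function name `gradient_descent`, the `max_iter` default and the handling of a failed line search are assumptions, and plotting is left out.

```python
import numpy as np
import scipy.optimize as opt

def f(x):
    return x[0]**2 + 5.0 * x[1]**2

def gf(x):
    return np.array([2.0 * x[0], 10.0 * x[1]])

def gradient_descent(f, gf, x0, eps=1e-3, max_iter=1000):
    """Gradient descent with scipy's line search; returns x and the path."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(max_iter):
        g = gf(x)
        if np.linalg.norm(g) < eps:     # stopping criterion
            break
        sdir = -g                       # search direction
        alpha = opt.line_search(f, gf, x, sdir, g)[0]
        if alpha is None:               # line search failed: give up (assumption)
            break
        x = x + alpha * sdir
        path.append(x.copy())
    return x, np.array(path)

x_min, path = gradient_descent(f, gf, [-8.0, -2.0])
print("iterations:", len(path) - 1, "x* ~", x_min)
```

The rows of `path` are the iterates $x^{(k)}$, so the descent path can be drawn by plotting `path[:, 0]` against `path[:, 1]`.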

2 Steepest descent 

The gradient descent method zigzags a lot while descending in a valley. We do not have to take the direction opposite to the gradient, $\Delta x = -\nabla F$; any descent direction, that is, any $\Delta x$ with $\Delta x \cdot \nabla F < 0$, can serve as the search direction. One possibility is to modify the gradient search direction as

$$\Delta x^T = -A \cdot \nabla F^T$$

where A is some matrix that scales the gradient vector, thus prioritizing some axes. In the next computer class we will see how to choose A well, using second-order information about F, which results in the Newton method.

Figure 2: Steepest descent paths. The legend reports n_iter = 25 for A = diag(1.0, 1.0), n_iter = 12 for A = diag(2.0, 1.0), and n_iter = 39 for A = diag(1.0, 2.0).
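The effect of the scaling matrix can be explored with a short sketch. To keep it self-contained it uses the closed-form exact step for the quadratic $F(x) = x_1^2 + 5x_2^2$ instead of scipy's line search, so the iteration counts it prints need not match those obtained with another line search or stopping rule. The function name `scaled_descent` is hypothetical.

```python
import numpy as np

H = np.diag([2.0, 10.0])            # Hessian of F(x) = x1^2 + 5*x2^2

def gradF(x):
    return H @ x                    # gradient (2*x1, 10*x2)

def scaled_descent(A, x0, eps=1e-3, max_iter=1000):
    """Count iterations of descent with direction dx = -A @ gradF(x)."""
    x = np.asarray(x0, dtype=float)
    for n_iter in range(max_iter):
        g = gradF(x)
        if np.linalg.norm(g) < eps:
            return n_iter, x
        dx = -A @ g
        assert dx @ g < 0           # dx must be a descent direction
        alpha = -(g @ dx) / (dx @ H @ dx)   # exact step, valid for quadratics
        x = x + alpha * dx
    return max_iter, x

counts = {}
for diag in [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]:
    counts[diag], _ = scaled_descent(np.diag(diag), [-8.0, -2.0])
    print(f"A = diag{diag}: {counts[diag]} iterations")
```

Scaling up the first axis brings the direction closer to the Newton direction for this function, which is why it needs fewer iterations than the unscaled gradient, while scaling up the second axis makes the zigzagging worse.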

Task 2 

In the previous task take

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}, \quad A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},$$

print the number of iterations for each case and draw the path. Plot the results as in Figure 2. As we see, a good direction may greatly affect the number of iterations.

3 Solid mechanics example 

Now we solve a simple problem from solid mechanics¹. Three springs with stiffness coefficients $k_1, k_2, k_3$ (see [1]) are connected together along a line.

[Diagram: springs $k_1$, $k_2$, $k_3$ connected in series along a line from 0 to 3, joined at the points $x_1$ and $x_2$.]

The potential energy of a spring may be taken as $E = \frac{1}{2} k l^2$, where l is the spring length. Thus the total energy of the three springs is

$$E_t = \frac{1}{2} \left( k_1 x_1^2 + k_2 (x_2 - x_1)^2 + k_3 (3 - x_2)^2 \right)$$

Equilibrium is achieved when the energy is minimal, $\operatorname{argmin}_{(x_1, x_2)} E_t$.
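As a starting point, the total energy and its gradient might be coded as follows. The function names `energy` and `grad_energy` and the test point are illustrative assumptions. The gradient is obtained by differentiating $E_t$ term by term and is checked here against central differences.

```python
import numpy as np

def energy(x, k):
    """Total potential energy E_t of the three springs."""
    x1, x2 = x
    k1, k2, k3 = k
    return 0.5 * (k1 * x1**2 + k2 * (x2 - x1)**2 + k3 * (3.0 - x2)**2)

def grad_energy(x, k):
    """Analytic gradient (dE_t/dx1, dE_t/dx2)."""
    x1, x2 = x
    k1, k2, k3 = k
    return np.array([k1 * x1 - k2 * (x2 - x1),
                     k2 * (x2 - x1) - k3 * (3.0 - x2)])

k = (0.1, 10.0, 1.0)
x = np.array([0.5, 1.5])            # arbitrary test point
h = 1e-6
fd = np.array([(energy(x + h * e, k) - energy(x - h * e, k)) / (2 * h)
               for e in np.eye(2)]) # central differences
print(grad_energy(x, k), fd)
```

To use these with a routine that expects functions of x only, the stiffness values can be fixed with a lambda, for example `lambda x: energy(x, k)`.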

Task 3 

Solve the problem with three springs by finding the minimum of $E_t$. Use the gradient descent you implemented in the previous tasks. Take different $k_1, k_2, k_3$, print the number of iterations and draw the paths as in Figure 3.

¹ This example is adapted from [2], Section 5.5.4, "Mechanics interpretation of KKT conditions", by removing the constraints.

Figure 3: Searching for the minimum of the potential energy of the springs. For each set of stiffness values the figure shows the descent path and $\log_{10} \|\nabla f\|$ against the iteration number:

• k1=1.0, k2=1.0, k3=1.0: iter=3, x=(1.0000, 2.0000)

• k1=0.1, k2=1.0, k3=1.0: iter=17, x=(2.4994, 2.7495)

• k1=0.1, k2=10.0, k3=1.0: iter=69, x=(2.7016, 2.7287)

Analytical solution. This problem is simple enough to have an analytical solution. The minimum of a function has zero gradient, $\nabla E_t = 0$, so

$$\frac{\partial E_t}{\partial x_1} = k_1 x_1 - k_2 (x_2 - x_1) = 0$$

$$\frac{\partial E_t}{\partial x_2} = k_2 (x_2 - x_1) - k_3 (3 - x_2) = 0$$

We have a system of two equations

$$(k_1 + k_2) x_1 - k_2 x_2 = 0$$

$$-k_2 x_1 + (k_2 + k_3) x_2 = 3 k_3$$

Solving it yields the answer

$$x_1 = \frac{3 k_3 k_2}{k_1 k_2 + k_2 k_3 + k_1 k_3}, \qquad x_2 = \frac{3 k_3 (k_1 + k_2)}{k_1 k_2 + k_2 k_3 + k_1 k_3}$$

If we substitute the stiffness values $(k_1, k_2, k_3)$ from the example, (1, 1, 1), (0.1, 1, 1) and (0.1, 10, 1), we get x = (1, 2), x = (2.5, 2.75) and x = (2.7027, 2.7297), respectively. The equilibrium for the second case looks as follows:

[Diagram: the three springs $k_1$, $k_2$, $k_3$ at equilibrium for the second case, with $x_1$ and $x_2$ marked between the end points 0 and 3.]
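The analytical result can be cross-checked numerically by solving the 2×2 linear system directly. The sketch below uses numpy.linalg.solve; the function name `equilibrium` is an assumption made for this illustration.

```python
import numpy as np

def equilibrium(k1, k2, k3):
    """Solve the linear system grad E_t = 0 for (x1, x2)."""
    K = np.array([[k1 + k2, -k2],
                  [-k2, k2 + k3]])
    rhs = np.array([0.0, 3.0 * k3])
    return np.linalg.solve(K, rhs)

for k in [(1.0, 1.0, 1.0), (0.1, 1.0, 1.0), (0.1, 10.0, 1.0)]:
    print(k, equilibrium(*k))
```

Comparing these values with the results of gradient descent gives a direct measure of how close the iteration got to the true minimum when it stopped.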

Further reading 

I suggest [3], chapter 5, "Basic Multidimensional Gradient Methods", as a practical and fairly illuminating introduction. Chapter 9, "Unconstrained minimization", of [2] may be even more illuminating but is more technical. Also keep in mind that convex problems have a single global minimum and other nice features, so do not take for granted that every claim from this book carries over to other problems. :) The third book [4] also contains some good chapters on the topic.

References 

[1] Hooke's law. Wikipedia. http://en.wikipedia.org/wiki/Hooke%27s_law

[2] Stephen P. Boyd, Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004. ISBN 978-0-521-83378-3.

[3] Andreas Antoniou, Wu-Sheng Lu. Practical Optimization: Algorithms and Engineering Applications. Springer, 2007.

[4] R. Fletcher. Practical Methods of Optimization. Second edition. John Wiley & Sons, 2000.
