
MAT2101
Applied Mathematics
Faculty of Sciences
Electronic Study Book

Written by
Tony Roberts, David Mander & Tim Passmore
Department of Mathematics & Computing
Faculty of Sciences
The University of Southern Queensland


Preface

This unit emphasises developing applications side by side with mathematical concepts and techniques. Please let us know of any errors in the study book as soon as you suspect them. This feedback will improve our unit year by year.

Some parts of the unit are in the mainstream, and other parts are included for a richer picture. As you read you will see that we have endeavoured to convey the importance of the various concepts and sections. For example, concepts and formulae in the "aims" and the Summaries are the most essential. In support of this, the reading you have been asked to do has been classified by requests to "study", "read" or "peruse", in order of decreasing importance.

For your convenience we have in places suggested specific problems that you should try, sending in your answers to us for feedback. These problems are a minimum that you should be able to do immediately. Our feedback will help you learn the more difficult aspects of the course. Ensure you make use of us. Send in your work by post, by fax or perhaps by e-mailing scanned work.

Associated with this study guide are Matlab scripts to enhance your ability to probe the problems and concepts and thus to improve learning.

As part of our commitment to the highest quality of teaching we also provide this study guide in electronic format. Note several aspects of the electronic form:

• the electronic form is displayed using Adobe's Acrobat Reader (with bookmarks);
• for electronic convenience the page size is different and so the page numbering is totally different to the printed version;
• clickable links allow rapid navigation around the electronic document to make it easier to connect widespread parts of the unit;
• and some links to outside material have also been encoded.

Information about mathematical figures in the history of the topics has been gleaned from various sources including
http://www-groups.dcs.st-and.ac.uk/~history/index.html

Reading 0.A Now read Chapters 1 and 2 in Kreyszig to refresh your memory of aspects of differential equations that were introduced in your previous mathematics.


Table of Contents

Preface ii

I Modelling dynamics with differential equations 1
1 Systems of differential equations 4
2 Scientists must write 43
3 Describing the conservation of material 84
4 The dynamics of momentum 113

II Structure, algebra and approximation of applied functions 126
5 The nature of infinite series 129
6 Series solutions of differential equations give special functions 185
7 Linear transforms and their eigenvectors on inner product spaces 252


Part I
Modelling dynamics with differential equations


Part contents

1 Systems of differential equations 4
  1.1 Systems of linear differential equations 7
  1.2 Qualitative solution of nonlinear, first-order systems of ode's 23
  1.3 Summary 41
2 Scientists must write 43
  2.1 Basics of mathematical writing 45
  2.2 LaTeX 49
3 Describing the conservation of material 84
  3.1 Eulerian description of motion 86
  3.2 Conservation of mass 96
  3.3 Car traffic 100
  3.4 Summary 112
4 The dynamics of momentum 113
  4.1 Conservation of momentum 115
  4.2 Dynamics of ideal gases 119
  4.3 Equations of quasi-one-dimensional blood flow 124
  4.4 Summary 125


Module 1
Systems of differential equations

In the 17th century Isaac Newton published his famous universal laws of motion which, in essence, showed how physical systems could be described by differential equations. The motions of planets, falling apples, billiard balls and flying arrows could all be described in terms of the forces acting to produce changes in motion.

During the last 25 years we have seen how scientists using these same laws of motion, and computers to solve complex systems of differential equations, have been able to navigate the Voyager spacecraft, with amazing precision, to rendezvous in space with Jupiter, Saturn and the outer planets of our solar system. Given the governing differential equations and a set of initial conditions, the future motion can be predicted.

In this module we use differential equations to model physical systems and describe and predict their behaviour under a variety of conditions.

Module contents

1.1 Systems of linear differential equations 7
  1.1.1 Case study: the motion of a mass on a spring 7
  1.1.2 Conversion of the order of differential equations 8
  1.1.3 The phase plane and phase portrait of the mass-spring system 11
  1.1.4 Trajectories in the phase plane of a linear system 14
  1.1.5 Classification and stability of fixed points 17
1.2 Qualitative solution of nonlinear, first-order systems of ode's 23
  1.2.1 Linearisation using the Jacobian 35
  1.2.2 Answers to selected Exercises 40
1.3 Summary 41

The text for this module is Chapter 3 in Kreyszig Advanced Engineering Mathematics, 8th ed, Wiley. References to the text use the format [K,reference].


Main aims:

• to write differential equations as a system of first-order differential equations;
• to classify general solutions near any fixed point or equilibrium;
• to predict the qualitative nature of solutions near fixed points or equilibria;
• to introduce the technique of linearisation;
• to patch together the pictures near each fixed point to obtain a global understanding of the solutions.


1.1 Systems of linear differential equations

You have solved some ordinary differential equations (ode's) in first year mathematics; these differential equations and their solutions are often used to describe the motion of some mechanical or otherwise evolving system. For example, the motion of a mass on a spring is discussed briefly next in §1.1.1. However, for many purposes it is much better to recast a differential equation as a system of first-order differential equations. For example, this is necessary to analyse "chaos."¹ In this section we lay the foundations for the analysis of systems of differential equations.

1.1.1 Case study: the motion of a mass on a spring

Kreyszig shows [K,pp158–9] that the motion of a mass attached to a spring, if there are no friction or damping forces, is governed by the single second-order ode
\[ my'' = -ky , \tag{1.1} \]
where y = y(t) is the displacement at time t of the mass from its rest position where the spring is unstretched, y' = dy/dt, and k and m are constants, m being the mass and k describing the 'stiffness' of the spring.

(This equation comes directly from Newton's Second Law, that applied force = mass × acceleration: the minus sign says that the force of the spring opposes the motion of the mass, and the acceleration is y''.)

¹The topic of chaos is explored in the fourth year course mat4102.


From first-year mathematics we know that this ode may be re-written as
\[ y'' + \frac{k}{m}\,y = 0 , \]
and its general solution is
\[ y(t) = A_1 \cos\sqrt{\frac{k}{m}}\,t + A_2 \sin\sqrt{\frac{k}{m}}\,t , \tag{1.2} \]
for constants A_1 and A_2. This solution describes an unending oscillation in time with constant angular frequency \(\omega = \sqrt{k/m}\).

1.1.2 Conversion of the order of differential equations

To illustrate the approach we take in general, the second-order differential equation (1.1) describing a spring is here re-written as a system of two first-order equations.

Introduce two new variables y_1 and y_2 and put
\[ y_1 = y , \quad\text{and}\quad y_2 = y' . \]
Then using (1.1) we also describe the motion of the spring by the first-order system
\[ y_1' = y_2 , \qquad y_2' = -\frac{k}{m}\,y_1 . \]


In matrix form, with \(\omega = \sqrt{k/m}\),
\[ \begin{bmatrix} y_1' \\ y_2' \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -\omega^2 & 0 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} , \quad\text{or}\quad \mathbf{y}' = A\mathbf{y} , \tag{1.3} \]
where
\[ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \quad\text{and}\quad A = \begin{bmatrix} 0 & 1 \\ -\omega^2 & 0 \end{bmatrix} . \]
Similarly, many higher order differential equations are reduced to first-order systems.

(It is convenient to use the angular frequency ω instead of k/m in the matrix formulation.)
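As an aside (the unit's own scripts are in Matlab; here is an equivalent illustrative sketch in Python with NumPy, using a sample value of ω), the matrix formulation is easy to probe numerically: the coefficient matrix A has the purely imaginary eigenvalue pair ±iω, which is why the mass oscillates forever.

```python
import numpy as np

omega = 2.0  # sample angular frequency, omega = sqrt(k/m); not a value from the text
A = np.array([[0.0, 1.0],
              [-omega**2, 0.0]])  # coefficient matrix of y' = A y in (1.3)

# eigenvalues of A: the purely imaginary pair +/- i*omega
eigenvalues = np.linalg.eigvals(A)
print(sorted(eigenvalues, key=lambda z: z.imag))
```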

Reading 1.A Kreyszig Chapter 3: read §3.0, then study §3.1 [K,p152–8] on modelling with systems of differential equations and their solutions.

Example 1.1: rewrite the following ode as a first-order system:
\[ y''' + 7y'' - 4y' + 8y = 0 . \]

Solution: Define new variables
\[ y_1 = y , \quad y_2 = y' \quad\text{and}\quad y_3 = y'' , \]


then
\[ y_1' = y_2 , \qquad y_2' = y_3 , \qquad y_3' = -8y_1 + 4y_2 - 7y_3 , \]
which is written in matrix form as
\[ \begin{bmatrix} y_1' \\ y_2' \\ y_3' \end{bmatrix} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -8 & 4 & -7 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} . \]
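The matrix above is the companion matrix of the characteristic polynomial λ³ + 7λ² − 4λ + 8, so its eigenvalues are exactly the roots of that polynomial. A quick numerical cross-check (an illustrative Python/NumPy aside, not part of the unit materials):

```python
import numpy as np

# companion matrix from Example 1.1 for y''' + 7y'' - 4y' + 8y = 0
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [-8, 4, -7]], dtype=float)

eig = np.sort_complex(np.linalg.eigvals(A))
# roots of lambda^3 + 7*lambda^2 - 4*lambda + 8
roots = np.sort_complex(np.roots([1, 7, -4, 8]))
print(np.allclose(eig, roots))  # the two computations agree
```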

Exercise 1.2: Convert the following to first-order systems:

(a) y''' + 12y'' − 5y' + 11y = 0;
(b) y'' + αy' + cy = 0, with α and c constants;
(c) y'''' + 7y'' = 9y.

Activity 1.B Do problems from §3.1 [K,p158] and Exercise 1.2 above. Send in to the examiner for feedback at least Q9.

Reading 1.C Read §3.2 [K,p159–161] for some background theory.


1.1.3 The phase plane and phase portrait of the mass-spring system

The point of writing the spring equation (1.1) as the two dimensional matrix system (1.3) is that we now have a 2-D description of the motion of the mass in terms of its position, y_1 = y, and its velocity, y_2 = y'. Plotting y_2 against y_1 gives a graph, known as a phase portrait, of the motion of the mass on the spring. At each point in the phase plane, illustrated by the little pictures in Figure 1.1, the mass-spring system has a specific combination of extension and velocity.

At each time, t, a single point on the phase plane is plotted corresponding to the position and velocity of the mass. Over time the system traverses a path in the phase plane, known as a trajectory or an orbit. For a given set of initial conditions a trajectory for the mass might appear as in Figure 1.1. There are two points, corresponding to the left/right extremes of the ellipse, where the velocity y_2 = 0 and the displacement y_1 is extreme, meaning that the mass is instantaneously at rest and the spring has reached maximum compression/extension. A moment later the mass has changed direction and is picking up speed; at the top/bottom extremes of the ellipse the mass is moving through y_1 = 0 where the spring is unstretched and the speed is maximal. At other times the velocity and displacement have values intermediate between these extremes.

Since there is no friction (an ideal case) the motion just keeps repeating itself indefinitely.


Figure 1.1: mass spring phase plane (horizontal axis y_1 = y, vertical axis y_2 = y') showing that at each point in the phase plane a little picture displays the unique state of the system quantified by its position y = y_1 and velocity y' = y_2. The green ellipse shows a possible orbit or trajectory of the mass-spring system, the path through the states, over time.
over time.


Example 1.3: We show all this by solving (1.3), which is a homogeneous linear system with constant coefficient matrix A. Kreyszig shows [K,p163, Theorem 1] that the general solution will be of the form
\[ \mathbf{y} = c_1 \mathbf{x}^{(1)} e^{\lambda_1 t} + c_2 \mathbf{x}^{(2)} e^{\lambda_2 t} \tag{1.4} \]
where c_j are arbitrary complex constants, λ_j are the eigenvalues of A and x^{(j)} the corresponding eigenvectors. The characteristic equation is
\[ \det(\lambda I - A) = \begin{vmatrix} \lambda & -1 \\ \omega^2 & \lambda \end{vmatrix} = \lambda^2 + \omega^2 = 0 \]
yielding λ_1 = iω and λ_2 = −iω. For the eigenvectors solve
\[ \begin{bmatrix} \lambda & -1 \\ \omega^2 & \lambda \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
to get
\[ \mathbf{x}^{(1)} = \begin{bmatrix} 1 \\ i\omega \end{bmatrix} \text{ for } \lambda = \lambda_1 = i\omega , \quad\text{and}\quad \mathbf{x}^{(2)} = \begin{bmatrix} 1 \\ -i\omega \end{bmatrix} \text{ for } \lambda = \lambda_2 = -i\omega . \]
So the general solution to the system is:
\[ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = c_1 \begin{bmatrix} 1 \\ i\omega \end{bmatrix} e^{i\omega t} + c_2 \begin{bmatrix} 1 \\ -i\omega \end{bmatrix} e^{-i\omega t} . \tag{1.5} \]


Exercise 1.4: Show by choosing
\[ c_1 = \tfrac{1}{2}(A_1 - iA_2) \quad\text{and}\quad c_2 = \tfrac{1}{2}(A_1 + iA_2) \]
in the general solution above, that you recover the solution (1.2).

Normally, the constants c_1 and c_2 will be chosen so that the solution (1.5) is real, in which case the plot of y_1 versus y_2, or of (1.2) and its derivative, will generate an ellipse.
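One way to see the ellipse: along any real solution the quantity ω²y_1² + y_2² stays constant, and that level curve is an ellipse in the phase plane. A short illustrative check (in Python/NumPy rather than the unit's Matlab; the values of ω, A_1 and A_2 are sample choices, not from the text):

```python
import numpy as np

omega, A1, A2 = 2.0, 1.0, 0.5  # sample values for illustration
t = np.linspace(0.0, 10.0, 1000)
y1 = A1*np.cos(omega*t) + A2*np.sin(omega*t)            # solution (1.2)
y2 = omega*(-A1*np.sin(omega*t) + A2*np.cos(omega*t))   # its derivative

# omega^2*y1^2 + y2^2 is conserved along the trajectory: the ellipse
invariant = omega**2 * y1**2 + y2**2
print(invariant.max() - invariant.min() < 1e-9)  # prints True
```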

1.1.4 Trajectories in the phase plane of a linear system

The big advantage of the phase plane is that we qualitatively see how the dynamics of a system will evolve. For example, in the mass-spring system we know
\[ y_1' = y_2 \quad\text{and}\quad y_2' = -\omega^2 y_1 . \]
That is, the rate of change of the position vector y is (y_2, −ω²y_1).² Thus at each point in the phase plane we can tell the direction that the system evolves by drawing an arrow as in the following plot.

²We use the row-vector in parentheses, such as (y_1, y_2, …, y_n), to denote the corresponding column vector.


[Quiver plot of the vector field; horizontal axis y_1 = y, vertical axis y_2 = y'. Generated and animated by the following Matlab:]

[y1,y2] = meshgrid(linspace(-1,1,7));
u = y2; v = -0.88*y1;
quiver(y1,y2,u,v)

% then try the following code to
% simulate evolution of this DE
hold on
y1 = 0.05; y2 = 0.6; dt = 0.01;
pt = plot(y1,y2,'r*','erase','xor');
drawnow
for t = dt:dt:20
  dy1 = y2; dy2 = -0.88*y1;
  y1 = y1 + dt*dy1; y2 = y2 + dt*dy2;
  set(pt,'xdata',y1,'ydata',y2)
end

The green curve shows the trajectory taken by the system: the set of states it goes through as time evolves. See how the evolution arrows are tangent to the trajectory, as they must point along the direction of evolution. In this subsection we look at the few different sorts of pictures generically seen in two dimensions.

Reading 1.D Study Kreyszig §3.3 [K,pp162–9]: take note of the phase plane pictures in Fig. 78–82 [K,pp165–6], and ignore the irrelevant distinction between an "improper node" and a "proper node".


Exercise 1.5: Some systems of differential equations evolve according to the vectors plotted in 2-D below. For each system, visualise the trajectories of the system and classify the origin at the centre point as either a node, saddle, centre or spiral point.

[Six vector-field plots, labelled (a)–(f).]


Activity 1.E Do exercises in Problem Set 3.3 [K,pp169–170] and Exercise 1.5 above. Send in to the examiner for feedback at least Q1 & 10.

1.1.5 Classification and stability of fixed points

These pictures of the dynamics near the origin allow us to answer very important qualitative questions about the solutions of differential equations. In application, we principally concern ourselves with things that can be observed. Thus we need to predict what may be observed and what cannot be observed. This is expressed via the notion of stability. Loosely, a fixed point (or critical point), the origin for linear systems, is stable if all nearby solutions stay nearby for all time and thus could be observed: a pendulum hanging downwards, for example. Whereas a critical point is unstable if at least one nearby solution escapes from the neighbourhood of the critical point, and thus cannot be expected to be observed because we expect the escape to occur: a pencil is impossible to balance on its sharp tip, for example.

Reading 1.F Study §3.4 [K,pp170–5], especially the definition of stability and its consequences.

Kreyszig, as do many texts, writes the conditions for stability and classification in terms of the coefficients of the characteristic polynomial. While this may be slightly more convenient in 2-D, it is usually easier to remember the conditions directly in terms of the eigenvalues. This is for two reasons: the classification then proceeds systematically to higher dimensions; and it is easy to remember the details because the dynamics are simply those of exp(λ_j t).

Thus we urge you to classify the fixed points of two-dimensional linear systems according to the eigenvalues of their coefficient matrix, A. The results are summarised in this table.

Eigenvalues λ_j            Condition                      Fixed point (0, 0)
Complex                    R(λ_j) = 0 for j = 1, 2        stable centre
(R denotes real part)      R(λ_j) > 0 for j = 1, 2        unstable spiral
                           R(λ_j) < 0 for j = 1, 2        stable spiral
Real                       λ_1, λ_2 > 0                   unstable node
                           λ_1, λ_2 < 0                   stable node
                           λ_1 < 0 < λ_2                  unstable saddle

However, cases not covered by the above table, the so-called degenerate cases, have to be considered on their own merits.
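The table translates directly into a small decision procedure on the eigenvalues of A. As an illustrative sketch (Python/NumPy, an aside from the unit's Matlab; degenerate cases simply fall through):

```python
import numpy as np

def classify_fixed_point(A, tol=1e-12):
    """Classify the origin of the 2-D linear system y' = A y per the table above."""
    l1, l2 = np.linalg.eigvals(A)
    if abs(l1.imag) > tol:  # complex conjugate pair of eigenvalues
        if abs(l1.real) <= tol:
            return "stable centre"
        return "unstable spiral" if l1.real > 0 else "stable spiral"
    a, b = sorted([l1.real, l2.real])  # both eigenvalues real
    if a > 0:
        return "unstable node"
    if b < 0:
        return "stable node"
    if a < 0 < b:
        return "unstable saddle"
    return "degenerate: consider on its own merits"

# the mass-spring matrix (1.3) with omega = 2 gives a centre
print(classify_fixed_point(np.array([[0.0, 1.0], [-4.0, 0.0]])))  # prints "stable centre"
```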

Activity 1.G Do Problem Set 3.4 [K,p174–5]. Send in to the examiner for feedback at least Q2, 4 & 14.


In higher dimensions the stability of a fixed point is most easily expressed in terms of the eigenvalues of the corresponding coefficient matrix. Based upon the generic solution [K,p163]
\[ \mathbf{y} = c_1 \mathbf{x}^{(1)} e^{\lambda_1 t} + \cdots + c_n \mathbf{x}^{(n)} e^{\lambda_n t} , \]
and the behaviour of exp(λ_j t) we deduce:

• the fixed point y = 0 is unstable if R(λ_j) > 0 for at least one j, as then at least that component exp(λ_j t) in the solution will grow and lead solutions away from the fixed point;
• the fixed point y = 0 is stable if R(λ_j) ≤ 0 for all j, as then all the components exp(λ_j t) in the solution will decay or just oscillate;
• unless the exceptional degenerate case occurs where R(λ_j) ≤ 0 for all j but two or more pairs of eigenvalues with R(λ_j) = 0 also have equal imaginary part, say ω, when the general solution will have the growing component cte^{iωt} from the degeneracy [K,p167] to cause the fixed point to be unstable.

(Note: the case R(λ_j) = 0 is very delicate as it is exactly on the dividing line between stability and instability, and hence any small effect will tip the dynamics from one to the other.)

Eigenvalues in two or three dimensional problems may be calculated by hand. In higher dimensions we typically resort to computer numerics.

Example 1.6: Determine the stability of the origin in the three-dimensional systems of Problems 7–9 in Problem Set 3.3 [K,p169].


Solution: use Matlab to compute the eigenvalues as follows:

>> a7=[10 -10 -4;-10 1 -14;-4 -14 -2];
>> eig(a7)
ans =
   18.0000
    9.0000
  -18.0000
>> a8=[-3 -1 2; 0 -4 2; 0 1 -5];
>> eig(a8)
ans =
   -3
   -3
   -6
>> a9=[-1 -4 2;2 5 -1;2 2 2];
>> eig(a9)
ans =
   -0.0000
    3.0000
    3.0000

Thus (here all the eigenvalues are real) the origin is:

• unstable in Problem 7 as at least one eigenvalue (here two) is positive;
• stable in Problem 8 as all eigenvalues are negative (the multiple eigenvalue −3 introduces the component te^{−3t} but this still decays);
• unstable in Problem 9 as at least one eigenvalue is positive.
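The same computation works in any numerical environment. Here is an equivalent illustrative sketch in Python/NumPy applying the rule from the bullet points above (unstable as soon as one eigenvalue has positive real part):

```python
import numpy as np

# the three coefficient matrices from Example 1.6
matrices = {
    7: [[10, -10, -4], [-10, 1, -14], [-4, -14, -2]],
    8: [[-3, -1, 2], [0, -4, 2], [0, 1, -5]],
    9: [[-1, -4, 2], [2, 5, -1], [2, 2, 2]],
}
for problem, A in matrices.items():
    eig = np.linalg.eigvals(np.array(A, dtype=float))
    verdict = "unstable" if (eig.real > 1e-9).any() else "stable"
    print(problem, np.round(np.sort(eig.real), 4), verdict)
```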

Exercise 1.7: A function y(t) is governed by the third-order equation
\[ y''' + 5y'' - 2y' - 6y = 0 . \]

(a) By introducing appropriate variables show how this can be expressed as a linear system of first-order ode's.
(b) Write down the general solution of the system.
(c) Describe the nature of the fixed point at (0, 0, 0).

Exercise 1.8: Prepare a phase plane diagram for the following system:
\[ \frac{dx}{dt} = -x - 3y , \qquad \frac{dy}{dt} = 2x - 3y . \]

(a) Find the real solution to this system when x(0) = 4 and y(0) = 1.
(b) Sketch on your phase plane the solution curve above.


1.2 Qualitative solution of nonlinear, first-order systems of ode's

Systems which have complex or physically interesting behaviour are governed by nonlinear differential equations. It is usually impossible to solve such equations algebraically, but phase portraits can give a rough overview of what solutions look like. Near each fixed point of the system, the solution is dominated by the linear terms in the differential equations, and so for each fixed point one of the pictures examined in the previous section applies. After considering each fixed point, all the little pictures are reasonably joined together to give a global overview of the solutions.

The techniques you study here will be developed further in later modules.

Reading 1.H Study §3.5 [K,pp180–6] and all its examples, as the understanding of this section is the main purpose of this module.

Note the main steps used in the analysis of nonlinear systems:

• set up a mathematical model, converting to a system of first-order differential equations if necessary;
• determine all the fixed points (critical points) of the system;
• use linearisation to examine the dynamics near each fixed point;
• fill in the trajectories in phase space in a sensible way.

This last step is often aided by determining isoclines: the curves in the phase plane where trajectories have constant slope.

Example 1.9: Prepare a phase plane diagram of the following system: locate its fixed points and determine the nature of each fixed point by linearisation. Find the approximate linear solution near each of the fixed points. Find some isoclines. Sketch in trajectories.
\[ \frac{dx}{dt} = 3x - xy , \qquad \frac{dy}{dt} = y - x^2 y . \]

Aside: the whole phase plot is easy enough to do with Matlab: below is the sort of picture we will work towards. However, we use mathematical analysis.


[Quiver plot of the vector field over −3 ≤ x ≤ 3, −1 ≤ y ≤ 5, generated by:]

[x,y] = meshgrid(-2:.4:2,-.8:.4:5);
dx = 3*x - x.*y;
dy = y - x.*x.*y;
quiver(x,y,dx,dy);

Solution:

• It is useful to know where the fixed points are before we do the phase plot. So set the right-hand sides to zero:
\[ 3x - xy = x(3 - y) = 0 \;\Rightarrow\; x = 0 \text{ or } y = 3 , \]
\[ y - x^2 y = y(1 - x^2) = 0 \;\Rightarrow\; y = 0 \text{ or } x = \pm 1 . \]
Putting x = 0 from the first equation forces y = 0 from the second, whereas y = 3 from the first equation forces x = ±1 from the second. So there are three fixed points: (0, 0), (1, 3) and (−1, 3). We should make sure the phase plot includes these points as shown below.

[Phase-plane quiver plot over −3 ≤ x ≤ 3, −1 ≤ y ≤ 4, including the three fixed points.]

• Consider each of the fixed points in turn.

  – To linearise near the fixed point (1, 3), make the change of variable
\[ x = 1 + X(t) , \quad\text{and}\quad y = 3 + Y(t) \]
where X(t) and Y(t) are small. Then x' = X', y' = Y' and
\[ X' = 3(1 + X) - (1 + X)(3 + Y) = -Y - XY , \]
\[ Y' = (3 + Y) - (1 + X)^2 (3 + Y) = -6X - 3X^2 - 2XY - X^2 Y . \]
Since X and Y are small, all the nonlinear quadratic and cubic terms in X and Y are negligible compared to the linear terms and we approximate as the linear system
\[ \begin{bmatrix} X' \\ Y' \end{bmatrix} \approx \begin{bmatrix} 0 & -1 \\ -6 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} . \tag{1.6} \]
The coefficient matrix has eigenvalues \(\lambda_1 = \sqrt{6}\) and \(\lambda_2 = -\sqrt{6}\). Thus (1, 3) is a saddle point.

  – Similarly near (−1, 3),
\[ X' = 3(-1 + X) - (-1 + X)(3 + Y) = Y - XY , \]
\[ Y' = (3 + Y) - (-1 + X)^2 (3 + Y) = 6X - 3X^2 + 2XY - X^2 Y , \]
linearises to
\[ \begin{bmatrix} X' \\ Y' \end{bmatrix} \approx \begin{bmatrix} 0 & 1 \\ 6 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} . \]
The eigenvalues are again \(\pm\sqrt{6}\), so (−1, 3) is also a saddle point.

  – Lastly, near (0, 0), x and y are small, so ignoring nonlinear terms in these original variables we have
\[ \begin{bmatrix} x' \\ y' \end{bmatrix} \approx \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} , \]
which has eigenvalues 1 and 3, and hence (0, 0) is an unstable node.
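These three linearisations can be cross-checked at once by evaluating the Jacobian matrix of the right-hand sides (3x − xy, y − x²y) at each fixed point and computing its eigenvalues; an illustrative Python/NumPy aside:

```python
import numpy as np

def jacobian(x, y):
    """Jacobian of the right-hand sides (3x - xy, y - x^2 y) from Example 1.9."""
    return np.array([[3 - y, -x],
                     [-2*x*y, 1 - x**2]])

# eigenvalues at each fixed point: (0,0) node, (1,3) and (-1,3) saddles
for point in [(0.0, 0.0), (1.0, 3.0), (-1.0, 3.0)]:
    eig = np.linalg.eigvals(jacobian(*point))
    print(point, np.round(np.sort(eig.real), 3))
```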

• To predict trajectories it is most important to explore the neighbourhood of each fixed point.<br />

– Here the simplest fixed point is the origin (0, 0), as its linearisation (see above) is simply<br />

\[ x' = 3x \quad\text{and}\quad y' = y . \]

Immediately write down the general solution of each of these basic differential equations separately:<br />

\[ x = c_1 e^{3t} \quad\text{and}\quad y = c_2 e^{t} . \]

Writing this in vector notation,<br />

\[
\begin{bmatrix} x \\ y \end{bmatrix} =
c_1 \begin{bmatrix} 1 \\ 0 \end{bmatrix} e^{3t} +
c_2 \begin{bmatrix} 0 \\ 1 \end{bmatrix} e^{t},
\]


see the eigenvectors are (1, 0) and (0, 1) corresponding to the<br />

eigenvalues 3 and 1 respectively. Thus in the direction (1, 0)<br />

solutions grow three times faster than in the (0, 1) direction.<br />

Hence we draw the little local picture below.<br />

[Figure: local phase portrait near the unstable node at the origin, −3 ≤ x ≤ 3, −1 ≤ y ≤ 4.]<br />

– To find approximate solutions near (±1, 3) simultaneously we need to find the eigenvectors of the coefficient matrix (being careful with the sign of x):<br />

\[
A = \begin{bmatrix} 0 & \mp 1 \\ \mp 6 & 0 \end{bmatrix}.
\]

So solve<br />

\[
(\lambda I - A) v =
\begin{bmatrix} \lambda & \pm 1 \\ \pm 6 & \lambda \end{bmatrix}
\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} =
\begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]

which yields eigenvectors<br />

\[
v^{(1)} = \begin{bmatrix} \mp 1 \\ \sqrt{6} \end{bmatrix}
\quad\text{for } \lambda_1 = \sqrt{6},
\qquad\text{and}\qquad
v^{(2)} = \begin{bmatrix} \pm 1 \\ \sqrt{6} \end{bmatrix}
\quad\text{for } \lambda_2 = -\sqrt{6} .
\]
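To confirm that these really are eigenvectors, one can check Av = λv directly. Here is a quick sketch in Python (not part of the unit's materials) for the fixed point (1, 3), taking the upper signs:

```python
import math

# Coefficient matrix near (1, 3) (upper signs above): A = [[0, -1], [-6, 0]].
A = [[0.0, -1.0], [-6.0, 0.0]]
lam = math.sqrt(6)
v = [-1.0, math.sqrt(6)]                 # claimed eigenvector for lambda_1 = sqrt(6)

Av = [A[0][0] * v[0] + A[0][1] * v[1],   # matrix-vector product A v
      A[1][0] * v[0] + A[1][1] * v[1]]
lam_v = [lam * v[0], lam * v[1]]         # lambda v
print(Av, lam_v)   # both are [-sqrt(6), 6]
```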

The linear solution of (1.6) near (±1, 3) will be [K,p163, Theorem 1]:<br />

\[
\begin{bmatrix} X \\ Y \end{bmatrix} =
c_1 \begin{bmatrix} \mp 1 \\ \sqrt{6} \end{bmatrix} e^{\sqrt{6}\,t} +
c_2 \begin{bmatrix} \pm 1 \\ \sqrt{6} \end{bmatrix} e^{-\sqrt{6}\,t},
\]

or, in terms of the original variables,<br />

\[
\begin{bmatrix} x \\ y \end{bmatrix} =
\begin{bmatrix} \pm 1 \\ 3 \end{bmatrix} +
c_1 \begin{bmatrix} \mp 1 \\ \sqrt{6} \end{bmatrix} e^{\sqrt{6}\,t} +
c_2 \begin{bmatrix} \pm 1 \\ \sqrt{6} \end{bmatrix} e^{-\sqrt{6}\,t},
\]


where c1 and c2 are arbitrary constants. Thus for these two saddle points there will be exponential growth in the direction (∓1, √6), towards the top-left and bottom-right for the fixed point (1, 3), and decay in the direction (±1, √6), the top-right and bottom-left directions for (1, 3). Hence we sketch the local pictures shown below for the two fixed points (±1, 3).<br />

[Figure: local phase portraits near the two saddle points (±1, 3), −3 ≤ x ≤ 3, −1 ≤ y ≤ 4.]<br />

• The isoclines help fill in the picture. The two easiest isoclines are


where the trajectories are horizontal, slope zero, obtained by finding where y′ = 0, and where the trajectories are vertical, infinite slope, obtained by finding where x′ = 0.<br />

– y′ = y − x²y = 0 whenever y = 0 or x = ±1. Thus all trajectories are horizontal when they cross the three red dot-dashed lines shown below.<br />

– x′ = 3x − xy = 0 whenever x = 0 or y = 3. Thus all trajectories are vertical when they cross the two magenta dashed lines plotted below.<br />
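The crossing directions are easy to verify by sampling the vector field. The following Python sketch (not part of the unit's materials) confirms that y′ vanishes on the first family of isoclines, so trajectories cross them horizontally, and that x′ vanishes on the second family, so trajectories cross those vertically:

```python
def field(x, y):
    """The vector field (x', y') = (3x - xy, y - x^2 y)."""
    return 3*x - x*y, y - x*x*y

# On the lines x = +-1 (and y = 0) the second component y' vanishes,
# so the motion there is purely horizontal.
for y in (-1.0, 0.5, 2.0):
    print(field(1.0, y), field(-1.0, y))   # second entry of each pair is 0

# On the lines x = 0 and y = 3 the first component x' vanishes,
# so the motion there is purely vertical.
print(field(0.0, 2.0), field(2.0, 3.0))    # first entry of each pair is 0
```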


[Figure: the isoclines on the phase plane: y = 0 and x = ±1 (red dot-dashed), x = 0 and y = 3 (magenta dashed), −3 ≤ x ≤ 3, −1 ≤ y ≤ 4.]<br />

• Lastly, use all this information to qualitatively sketch in trajectories<br />

such as the solid green lines shown next.


[Figure: the assembled phase portrait with representative trajectories (solid green), −3 ≤ x ≤ 3, −1 ≤ y ≤ 4.]<br />

Activity 1.I Do exercises from Problem Set 3.5 [K,p183]. Send at least Q5 & Q8 to the examiner for feedback.<br />


The qualitative methods developed here generalise to higher dimensional systems, but it is much harder to “fill in” the phase space because of the intricate contortions permitted for trajectories in dimensions higher than two. That is why chaos (in its mathematical sense) only occurs in autonomous differential systems with three or more components.<br />

1.2.1 Linearisation using the Jacobian<br />

When exploring the dynamics of systems we analyse the linear dynamics in the neighbourhood of each fixed point. Previously we found the linear dynamics by a change of variable and subsequent neglect of nonlinear terms. This is a useful technique, but it is a little laborious even in straightforward situations. The alternative we explore here is to obtain the matrix of coefficients simply by evaluating the “derivative” of the differential system (the Jacobian).<br />

Consider a nonlinear, first-order system of ODEs of the form<br />

\[ x' = f(x, y), \qquad y' = g(x, y), \]

where we have taken a 2-D example, but the extension to an n-dimensional system is straightforward. Also assume that the system is autonomous, i.e. there is no explicit t-dependence on the right-hand sides. Suppose that (x0, y0) is a fixed point for the system, i.e. f(x0, y0) = g(x0, y0) = 0, and<br />


consider a point (x, y) nearby. We now appeal to Taylor’s theorem in two dimensions, that we will explore in detail in a later module, §5.4. Taylor’s theorem allows us to approximate f and g in the neighbourhood of the fixed point as<br />

\begin{align*}
f(x, y) &\approx f(x_0, y_0)
 + (x - x_0) \left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0)}
 + (y - y_0) \left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0)}, \\
g(x, y) &\approx g(x_0, y_0)
 + (x - x_0) \left.\frac{\partial g}{\partial x}\right|_{(x_0, y_0)}
 + (y - y_0) \left.\frac{\partial g}{\partial y}\right|_{(x_0, y_0)}.
\end{align*}

Since (x0, y0) is a fixed point, f(x0, y0) = g(x0, y0) = 0; so in the neighbourhood of the fixed point the evolution is governed by the linear system<br />

\begin{align*}
x' &\approx \left.\frac{\partial f}{\partial x}\right|_{(x_0, y_0)} (x - x_0)
 + \left.\frac{\partial f}{\partial y}\right|_{(x_0, y_0)} (y - y_0), \\
y' &\approx \left.\frac{\partial g}{\partial x}\right|_{(x_0, y_0)} (x - x_0)
 + \left.\frac{\partial g}{\partial y}\right|_{(x_0, y_0)} (y - y_0).
\end{align*}

Thus, making the change of variable<br />

\[ x = x_0 + X(t) \;\Rightarrow\; x' = X', \qquad y = y_0 + Y(t) \;\Rightarrow\; y' = Y', \]

the linearised system is then<br />

\[
\begin{bmatrix} X' \\ Y' \end{bmatrix} \approx
\begin{bmatrix}
 \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \\[1ex]
 \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y}
\end{bmatrix}
\begin{bmatrix} X \\ Y \end{bmatrix}
\]


where all the derivatives in the matrix are evaluated at the fixed point (x0, y0). A common notation for the matrix appearing here is<br />

\[
J(x, y) = \frac{\partial(f, g)}{\partial(x, y)} =
\begin{bmatrix}
 \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \\[1ex]
 \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y}
\end{bmatrix},
\]

which is called the Jacobian³ and must be evaluated at the fixed point in question. This Jacobian matrix evaluated at a fixed point is the matrix of coefficients of the linearised dynamics, and hence its eigenvalues and eigenvectors determine the stability and classification of the fixed point.<br />

Example 1.10: Use the Jacobian in Example 1.9. In our previous worked example we had<br />

\[ \dot{x} = f(x, y) = 3x - xy, \qquad \dot{y} = g(x, y) = y - x^2 y, \]

so the Jacobian is<br />

\[
J = \frac{\partial(f, g)}{\partial(x, y)} =
\begin{bmatrix} 3 - y & -x \\ -2xy & 1 - x^2 \end{bmatrix},
\]

which when evaluated at the fixed points (1, 3), (−1, 3) and (0, 0) gives the respective coefficient matrices<br />

\[
\begin{bmatrix} 0 & -1 \\ -6 & 0 \end{bmatrix}, \qquad
\begin{bmatrix} 0 & 1 \\ 6 & 0 \end{bmatrix}
\qquad\text{and}\qquad
\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix},
\]

³ After Carl Gustav Jacob Jacobi (1804–51), a German mathematician and professor at Königsberg, noted for work in elliptic functions, number theory and differential determinants.<br />


as seen previously. The eigenvalues of these matrices determine the stability of each corresponding fixed point, as found earlier.<br />
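As a supplementary numerical cross-check (a Python sketch; Python is not part of this unit's materials), central differences approximate each partial derivative of (f, g) and reproduce the three coefficient matrices at the fixed points:

```python
def f(x, y):
    return 3*x - x*y          # x' = f(x, y)

def g(x, y):
    return y - x*x*y          # y' = g(x, y)

def jacobian(x, y, h=1e-6):
    """Central-difference approximation to the Jacobian of (f, g) at (x, y)."""
    return [[(f(x + h, y) - f(x - h, y)) / (2*h),
             (f(x, y + h) - f(x, y - h)) / (2*h)],
            [(g(x + h, y) - g(x - h, y)) / (2*h),
             (g(x, y + h) - g(x, y - h)) / (2*h)]]

# Agrees with the analytic J = [[3 - y, -x], [-2xy, 1 - x^2]] at each point.
for point in [(1.0, 3.0), (-1.0, 3.0), (0.0, 0.0)]:
    print(point, jacobian(*point))
```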

In n dimensions the Jacobian matrix is<br />

\[
J = \frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_n)} =
\begin{bmatrix}
 \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\[1ex]
 \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\[1ex]
 \vdots & \vdots & \ddots & \vdots \\[1ex]
 \dfrac{\partial f_n}{\partial x_1} & \dfrac{\partial f_n}{\partial x_2} & \cdots & \dfrac{\partial f_n}{\partial x_n}
\end{bmatrix},
\]

and similarly determines the behaviour of the linearised dynamics near any fixed point.<br />

Activity 1.J Do problems 5–11 from Problem Set 3.5 [K,p183] using the Jacobian to determine the coefficient matrices of the linear dynamics about each of the fixed points. Send at least Q7 & Q10 to the examiner for feedback.<br />

Exercise 1.11: Consider the following system of equations:<br />

\[
\frac{dx}{dt} = -4y + y^3, \qquad
\frac{dy}{dt} = 5x - y - xy^2 .
\]


(a) Find the fixed points of this system.<br />

(b) Compute the Jacobian and evaluate it at each fixed point. From your results classify each of the fixed points.<br />

(c) Find a general linearised solution near (0, 0).<br />

Exercise 1.12: A predator-prey population model (not the Lotka-Volterra model) is governed by the equations<br />

\[
y_1' = y_1 (1 - y_1) - \tfrac{3}{2} y_2 , \qquad
y_2' = \tfrac{1}{2} y_1 - y_2 .
\]

(a) Deduce that the only critical points of this system of equations are (0, 0) and (1/4, 1/8).<br />

(b) By linearising the system in the neighbourhood of each critical point, show that these points are a saddle and a stable spiral respectively.<br />

(c) Based upon the eigenvectors at (0, 0) and the nature of the fixed points, sketch some representative trajectories for the system.<br />

(d) Write down the general solution to the linearised system in the neighbourhood of the origin (0, 0).<br />


1.2.2 Answers to selected Exercises<br />

1.11 (a) (0, 0) and ±(2, 2). (b) The origin is a stable spiral as λ = (−1 ± i√79)/2, and ±(2, 2) are saddles as λ = (−9 ± √113)/2 ≈ 0.8151 and −9.8151.<br />

1.12 (b) eigenvalues are λ = ±1/2 and λ = (−1 ± i√3)/4 respectively. (c) v = (3, 1) and v = (1, 1) corresponding to eigenvalues λ = 1/2 and −1/2 respectively. (d) y = c1 (3, 1) e^{t/2} + c2 (1, 1) e^{−t/2}.<br />
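These answers can be cross-checked numerically. The following Python sketch (not part of the unit's materials) computes the eigenvalues of each linearisation from its trace and determinant; the Jacobian of Exercise 1.11, evaluated from the printed system, is [[0, −4], [5, −1]] at the origin and [[0, 8], [1, −9]] at (2, 2).

```python
import cmath

def eigvals2(a, b, c, d):
    """Eigenvalues of the 2x2 matrix [[a, b], [c, d]], from its
    characteristic polynomial lambda^2 - (a + d)*lambda + (a*d - b*c) = 0."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# Exercise 1.11: Jacobian [[0, -4], [5, -1]] at (0, 0) -> stable spiral
print(eigvals2(0, -4, 5, -1))        # (-1 +- i*sqrt(79))/2

# Exercise 1.11: Jacobian [[0, 8], [1, -9]] at (2, 2) -> saddle
print(eigvals2(0, 8, 1, -9))         # about 0.8151 and -9.8151

# Exercise 1.12: linearisations at (0, 0) and at (1/4, 1/8)
print(eigvals2(1, -1.5, 0.5, -1))    # +-1/2: a saddle
print(eigvals2(0.5, -1.5, 0.5, -1))  # (-1 +- i*sqrt(3))/4: stable spiral
```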


1.3 Summary<br />

• The behaviour of physical systems, such as mechanical and electrical systems, is described using differential equations (§1.1.1 for example).<br />

• Higher order differential equations can be reduced to systems of first-order differential equations by introducing more dependent variables (§1.1.2).<br />

• Solutions to 2-D systems of ODEs are graphically represented on the phase plane (§1.1.3). Higher dimensional systems may also be represented using a phase space, but imagination is required above 3-D. For a given set of initial conditions, the system evolves along a trajectory or orbit in the phase space, which describes how the system evolves in time (§1.1.4).<br />

• At a critical point or fixed point, the system undergoes no further evolution in time (§1.1.4).<br />

For a first-order system of ODEs in the form y′ = f(y), a fixed point occurs wherever f(y) = 0 (§1.2).<br />

• Linear first-order systems with constant coefficients are written y′ = Ay for some matrix A. Their general solution (equation (1.4)) is written down in terms of the eigenvalues and eigenvectors of A, provided the eigenvectors form a basis for the phase space (§1.1.4).<br />

Such linear systems have only the origin as a fixed point.<br />


• In particular, in 2-D, the solution to a linear first-order system with constant coefficients must⁴ be one of six basic types: a centre, stable or unstable node, stable or unstable spiral, or a saddle point (§1.1.5).<br />

• Nonlinear systems are capable of more complex behaviour and usually have more than one fixed point. Nevertheless the behaviour of the system near a fixed point is dominated by linear terms in the ODE, and approximate local solutions are found via linearisation. It is usually not possible to write down a general solution for a nonlinear system, so approximations and phase plane methods are useful to build up a working picture of how these systems behave (§1.2).<br />

Activity 1.K Do Chapter 3 Review Problems 1–23, 28–30 and 35–38 [K,pp190–2].<br />

⁴ That is, unless no basis of eigenvectors exists, see [K,p168].<br />


Module 2<br />

Scientists must write<br />

Module contents<br />

2.1 Basics of mathematical writing . . . 45<br />
2.2 LaTeX . . . 49<br />
 2.2.1 Why LaTeX? . . . 50<br />
 2.2.2 Start to use LaTeX . . . 51<br />
 2.2.3 Simple mathematics goes in-line . . . 59<br />
 2.2.4 List environments usefully organise . . . 61<br />
 2.2.5 Complex mathematics is displayed . . . 63<br />
 2.2.6 Figures float . . . 75<br />
 2.2.7 Summary . . . 78<br />
 2.2.8 Many mathematical symbols in LaTeX . . . 79<br />

Developing technical communication is essential as preparation for the workplace and advanced study. In this module we help you to structure, prepare and deliver small documents of technical material. This module is to be studied in parallel with the first module in preparation for your first assignment.<br />

In your assignments you will be required to use your skills in technical writing for certain exercises. These exercises will not only be graded on mathematical content, but also on the style and manner of the technical and English expression.<br />

The first section (§2.1) discusses the composition of mathematical writing. Although mathematical writing has much in common with non-technical writing, there are many distinctions and extensions. Some of the common problem areas are identified and discussed. These and basic aspects of writing will be assessed in the specified exercises in the assignments.<br />

The second section (§2.2) introduces you to LaTeX, the open standard for high quality typesetting of scientific and general documents (there are two alternative pronunciations of LaTeX: either “lay-teck” or “lah-teck”). As well as typesetting documents, LaTeX provides a convenient standard for the communication of mathematics in plain text such as e-mails—your e-mail enquiries to us should be phrased using the syntax and grammar of LaTeX. It is compulsory for you to use LaTeX for the specified exercises.<br />


2.1 Basics of mathematical writing<br />

In this first introduction to the writing of technical documents involving mathematics we focus on incorporating mathematical equations, symbols and structures into a short expository document. This mathematical detail is based upon basic communications concepts that we first summarise and which I expect to be familiar to you.<br />

Basic written communication: Successful writing in any discipline is based on certain elements and these are summarised below. It could be useful to read Chapters 4 and 6 from Communication: A Foundation Course, by S. Tyler, C. Kossen and C. Ryan, rev. edn.<br />

• Analysing the task (what you are being asked to write about).<br />

• Analysing the audience (to whom you are writing).<br />

• Developing a thesis statement (what you intend to prove).<br />

• Deciding on your main points (how you intend to prove or support your thesis statement).<br />

• Logical sequence of points (developing a coherent argument).<br />

I recommend international students also read Chapter 5, “When English is a foreign language”, from Handbook of writing for the mathematical sciences, by N. J. Higham, 2nd edition.<br />


Mathematical writing has special features<br />

Reading 2.A Study, noting the comments below, Chapter 3 “Mathematical writing” from Handbook of writing for the mathematical sciences, by N. J. Higham, 2nd edition.<br />

§3.1 What is a theorem? This is of interest, but we will not worry about whether you call results theorems, propositions, or lemmas. However, we will look for a well structured argument in your writing—you will need to state clearly what your main results are.<br />

§3.2 Proofs It is essential to help readers, and to show you appreciate the role of the various parts of an argument, by annotating the argument accordingly.<br />

§3.3 The role of examples Although I expect this to be irrelevant to your assignments, if necessary, introduce a generality by a preliminary specific example.<br />

§3.4 Definitions Only define terms if they are new and are needed in several places.<br />

§3.5 Notation Endeavour to choose a notation that is consistent and not confusing.<br />


§3.6 Words versus symbols Readers typically have difficulty remembering the meaning of symbols that you have introduced. Even though you know what they mean, use words rather than symbols wherever reasonable.<br />

§3.7 Displaying equations The crucial point in this section is the first sentence: “An equation is displayed when it needs to be numbered, when it would be hard to read if placed in-line, or when it merits special attention …”; otherwise typeset equations in-line.<br />
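This guideline can be sketched in LaTeX source as follows (a small illustrative example; the mathematical content is borrowed from Module 1):

```latex
% Simple mathematics sits in-line, within the sentence, between $ signs:
The eigenvalues are $\lambda = \pm\sqrt{6}$, so the fixed point is a saddle.

% Mathematics that merits special attention is displayed, and the
% equation environment numbers it automatically:
\begin{equation}
  x(t) = c_1 e^{3t} \quad\mbox{and}\quad y(t) = c_2 e^{t} .
\end{equation}
```

Note how the displayed equation still ends with a full stop: displayed or not, mathematics forms part of the sentence.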

§3.8 Parallelism A subtle aspect—we will not assess this.<br />

§3.9 Dos and Don’ts of mathematical writing has lots of good tips.<br />

Punctuating expressions It is important to remember that mathematical content, whether expressions, equations or derivations, must form an integral part of a sentence. Write and punctuate accordingly.<br />

Otiose symbols Avoid gratuitous symbols.<br />

Placement of symbols Endeavour to be clear.<br />

“The” or “A” addresses a small but common error.<br />

Notational synonyms Strive to find, among all readable and clear possibilities, an aesthetically pleasing version of your mathematical expressions.<br />
expressions.


Referencing equations Analogous to §3.6, a descriptive word or two helps remind readers of the subject of an equation that you reference.<br />

Miscellaneous In my opinion the most important of these are:<br />

• standard mathematical functions are set in roman, not italic;<br />

• avoid stacked fractions in superscripts and subscripts;<br />

• avoid tall in-line expressions;<br />

• choose the correct ellipsis;<br />

• avoid ambiguity in slashed fractions.<br />

If you see our study guides failing any of the above, then please inform us.<br />


2.2 LaTeX<br />

Good writing deserves the best reproduction. Here we introduce you to LaTeX, the world’s best package for typesetting technical and other documents. You will use LaTeX for at least the specified exercises.<br />

From: pete@nospam <br />

there is simply nothing better.<br />

i started learning latex on my own few years ago, and at work i<br />

write all my reports using it. every one tells me how good the<br />

reports look and they wonder how i do it, since everyone else uses<br />

word and their report do not half look as good.<br />

i try to keep it a secret, but sometimes i am forced to tell, but<br />

most will not even try it because they think it is hard to do this<br />

way, but they do not know that it is actually easier. in latex i<br />

concentrate on the logic and content <strong>of</strong> my report and let latex<br />

worry about how to format it and typeset it. so with latex i am<br />

much faster than with those gui word processors.<br />

tex and latex makes writing more fun.


2.2.1 Why LaTeX?<br />

• LaTeX is arguably the premier typesetting package in the world. Donald Knuth and Leslie Lamport have distilled for us the wisdom, accumulated over hundreds of years, of many generations of printers.<br />

• The LaTeX system typesets documents with line and page breaks to maximise readability and appeal by avoiding as far as possible poor breaks and hyphenation.<br />

• It is simply the best package for documents containing mathematics. “TeX can print virtually any mathematical thought that comes into your head, and print it beautifully.” (Herbert S. Wilf, 1986)<br />

• It is free on virtually every computer in the world.<br />

• It is portable—stick to the standard commands and everyone can read and exchange documents.<br />

• The source file uses standard keyboard characters so it can be read by eye or posted by e-mail with no problems associated with different versions or binary files.<br />

• LaTeX has the reputation of being hard, but as a mark-up language it is effectively the same as HTML!<br />

• Weakness: it is not usually WYSIWYG.<br />


LaTeX is a very powerful typesetting system. Here we only introduce you to some basics of LaTeX. The idea is to provide you with enough to typeset the specified assignment questions. There is much more that you may learn to extend your use of LaTeX.<br />

As well as the guide written here there is a wealth of support information for LaTeX on the internet. More LaTeX information and many links to further sources are to be found at http://www.sci.usq.edu.au/staff/robertsa/LaTeX/latexintro.<br />

For further reading I suggest:<br />

• Chapter 13 “TeX and LaTeX” of Handbook of writing for the mathematical sciences, by N. J. Higham, 2nd edition; and<br />

• Learning LaTeX by D. F. Griffiths and N. J. Higham.<br />

2.2.2 Start to use LaTeX<br />

Install LaTeX on your computer: if you run Windows, the Department cd-rom set has a copy of LaTeX (called MiKTeX, essential) and the shareware editor WinEdt (helpful, but not essential) for you to install; if you use Linux, LaTeX is included as an optional part of the Linux release; if you use a Macintosh, obtain OzTeX, and I recommend the editor Alpha. Install whichever is appropriate for you. For Windows follow the instructions on the Maths & Computing cd-rom. If the installation fails, still make progress by following the instructions in a subsequent paragraph.<br />


Execute LaTeX on a Windows computer:<br />

1. Prepare a plain text file in any simple editing application such as notepad, or preferably in WinEdt. Your LaTeX source forms the text of this file; name the file with the .tex extension, for example, first.tex.<br />

2. Execute the LaTeX application giving as input your source file, for example, first.tex.<br />

3. If there are errors, correct your source and redo the previous step.<br />

4. View the beautifully typeset “dvi” file generated by LaTeX, for example first.dvi, using the application yap.<br />

If your execution fails, still make progress by following the instructions in the following paragraph.<br />

If you do not have access to a computer with LaTeX: we have provided a web interface to LaTeX. The following are its instructions.<br />

1. Prepare a plain text file in any simple editing application, such as notepad, with your LaTeX source and preferably name it with the .tex extension, for example, first.tex (although notepad likes to insist on a .txt extension, which is also acceptable).<br />


2. Point your internet browser to the web address http://www.sci.usq.edu.au/latex, and enter your usqconnect username and password when requested.<br />

3. Click the Browse... button and navigate around the file system on your computer to your LaTeX source file, for example, first.tex.<br />

4. Click the Submit Document button.<br />

5. Wait, hopefully no more than a few seconds, for a new web page to appear saying “The PDF file is available for download”, in which case click on the link PDF file and view your beautiful document in Adobe Acrobat Reader.<br />

6. If a serious error has occurred, download the log file and use the error messages to guide fixing your document, then return to Step 2. It is a good idea to download the log file and check for non-fatal errors in any case.<br />

Your first document: you need to prepare a text file of the content of your composition interspersed with LaTeX commands. First tell LaTeX the sort of document you will be typesetting. For our straightforward needs in this course you will use the article style typeset in a 12pt font. Around the document text that you wish to typeset, you need<br />

\documentclass[12pt,a4paper]{article}<br />

\begin{document}


...<br />

\end{document}<br />

The three dots above denote the place where the content text is to be placed.<br />

Second, specify the title and author of the document using the \title{...}, \author{...} and \maketitle commands, in the following manner.<br />

\documentclass[12pt,a4paper]{article}<br />

\begin{document}<br />

\title{Assignment 1, Question 3: The<br />

importance of being fractal}<br />

\author{Ben Hall, Q99123456}<br />

...<br />

\end{document}<br />

Just those seven lines form a complete, though pointless, document. Try it. Type the above into a file (perhaps named first.tex), then run it through LaTeX and view the result. These seven lines form the skeleton of all our LaTeX documents. Contact us if there is any problem.<br />

Now put in some information. Simply type the text of your document in place of the three dots in the above skeleton. For example:<br />


\documentclass[12pt,a4paper]{article}<br />

\begin{document}<br />

\title{Assignment 1, Question 3: The<br />

importance of being fractal}<br />

\maketitle<br />

Fractal geometry, largely inspired by Benoit Mandelbrot<br />

during the sixties and seventies, is one of the great<br />

advances in mathematics for two thousand years. Given<br />

the rich and diverse power of developments in<br />

mathematics and its applications, this is a remarkable<br />

claim.<br />

Often presented as being just a part of modern chaos<br />

theory, fractals are momentous in their own right.<br />

Euclid's geometry describes the world around us in<br />

terms of points, lines and planes---for two thousand<br />

years these have formed the somewhat limited repertoire<br />

of basic geometric objects with which to describe the<br />

universe. Fractals immeasurably enhance this<br />

world-view by providing a description of much around us<br />

that is rough and fragmented---of objects that have<br />

structure on many sizes. Examples include: coastlines,<br />

rivers, plant distributions, architecture, wind gusts,<br />

music, and the cardiovascular system.<br />

\end{document}<br />

Paragraphs are indicated by introducing a blank line, which tells LaTeX where one paragraph ends and another begins. All other line breaks in the source are treated as simply blank characters. Line breaks in your source do not correspond to line breaks in the typeset document.¹ Type a few paragraphs, each of a couple of sentences, and typeset your own document.<br />

Longer documents need sections: although you may not need sectioning to answer simple questions, most documents do. In LaTeX automatically numbered sections and their titles are specified by the \section{...} command. Wherever you want to start a new section, just put this command with the title of the section within the braces. See the two sections in the following example.<br />

\documentclass[12pt,a4paper]{article}<br />

\begin{document}<br />

\title{Assignment 1, Question 3: The<br />

importance of being fractal}<br />

¹ In fact, this is one brilliant aspect of TeX: Knuth programmed a sophisticated optimisation scheme to determine the very best line breaks to be made in a paragraph. He incorporated the knowledge of the best printers accumulated over hundreds of years.<br />


Module 2. Scientists must write 57<br />

\author{Ben Hall, Q99123456}<br />

\maketitle<br />

Fractal geometry, largely inspired by Benoit Mandelbrot
during the sixties and seventies, is one of the great
advances in mathematics for two thousand years. Given
the rich and diverse power of developments in
mathematics and its applications, this is a remarkable
claim.
\section{Some fractal models}
Before discussing in detail the common feature of the
previously mentioned examples, I present a few
examples of fractals and the type of physical
applications that they have.
\section{Scaling and dimensionality}
The common theme in these examples is not just that
they have detail on many lengths, but also that the
structure at any scale is much the same at any other
scale---the coastline around a continent looks just
like any small part of the coastline.
\end{document}

However, for many purposes you will want to emphasise the main point of a paragraph by using the \paragraph{...} command to introduce a simple statement at the start. For example,

\paragraph{Construct the Cantor set:} start with a
bar of some length; then remove its middle third to
leave two separate thirds; then remove the middle
thirds of these to leave four separate ninths; then
remove the middle thirds of these to obtain eight
separate twenty-sevenths; and so on. Eventually we
just obtain a scattered dust of points.

produces the following:

  Construct the Cantor set: start with a bar of some length;
  then remove its middle third to leave two separate thirds; then
  remove the middle thirds of these to leave four separate ninths;
  then remove the middle thirds of these to obtain eight separate
  twenty-sevenths; and so on. Eventually we just obtain a scattered
  dust of points.


I like using \paragraph commands. For example, at the start of this subsection they were used to highlight the actions to take if you could install LaTeX on your computer, or not, as the case may be.

Note that the characters slosh, \, and braces, { and }, are special characters to LaTeX as they are used to invoke the typesetting mark-up commands. There are other special characters in LaTeX of which to be wary:

• the percent sign, %, causes LaTeX to ignore the rest of the line, so you may comment the document if needed;
• the dollar, $, used to delineate in-line mathematics;
• the ampersand, &, for tabbing;
• the underscore and caret, _ and ^, for subscripts and superscripts;
• the hash, #, and the tilde, ~.

To actually get any of these last nine characters (except the slosh \) to appear in the final typeset document, just precede them by a slosh (a backslash).
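For instance, a quick sketch of my own (not from the study book) showing some of these escapes in running text:

```latex
% The percent sign starts a comment, so escape it to print it.
Interest rose by 5\% on the \$20 fee for item \#3;
tables use \& between columns, and file\_name.tex
names the source. Braces are printed by \{ and \},
and the slosh itself by \textbackslash.
```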

2.2.3 Simple mathematics goes in-line

Mathematics to be typeset in-line with the text must be contained within matching dollar signs $...$ . For example, Newton's law F = ma is typeset by $F=ma$ . Note the different font used for mathematical letters (called math italic): it is imperative that all mathematics be typeset in a mathematics environment (even if the mathematics is just a single letter symbol), and not in the roman font that is the default for text; equally, it is imperative that non-mathematical text is not placed within a mathematics environment. For example,

Newton's law is $F= m a$
for force $ F $, mass $m$
and acceleration $a$.

is typeset as: Newton's law is F = ma for force F, mass m and acceleration a.

See that in any mathematics environment, blank characters are totally ignored.

Subscripts and superscripts are typeset in a mathematics environment using the underscore and the caret character respectively. For example, $d^{-1}$ and $d^2$ typeset d⁻¹ and d². The theorem of Pythagoras, a² + b² = c², is obtained by $a^2+b^2=c^2$ . Similarly, subscripts are generated by the underscore character _ : for example, the recursion for the Fibonacci numbers, Fₙ₊₂ = Fₙ₊₁ + Fₙ, is typeset by $F_{n+2}=F_{n+1}+F_n$ . Single character scripts need no enclosing braces.
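A couple more script sketches of my own, showing where the braces are needed:

```latex
% Braces group multi-character scripts; single characters need none.
$x_1^2+x_2^2$, $a_{ij}$, $e^{-t^2}$ and $2^{2^n}$
typeset as a sum of squares, a matrix entry, a
Gaussian-style exponential, and a tower of powers.
```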

LaTeX has an enormously wide variety of symbols to help typeset mathematics. For example, \times to get a times sign, ×; \propto to get a proportional-to symbol, ∝; \pi to get the Greek letter π; and similarly for the whole Greek alphabet. The names of these symbols have to be followed by a non-alphabetic character, often a blank. See §2.2.8 for tables of just some of the vast number of symbols available in LaTeX.

Fractions are typeset using \frac{}{} (with two arguments in braces): for example, 1/n − 1 = 3 only if n = 1/(2×2), typeset using $\frac{1}{n}-1=3$ and $n=\frac{1}{2\times 2}$ . For in-line mathematics use only simple expressions in fractions, as otherwise the result becomes too hard to read.

2.2.4 List environments usefully organise

Extremely useful are the list environments, of which I describe two. Use them wherever you have a sequence of steps (perhaps in a mathematical argument) or a list of things (perhaps describing an algorithm). The format for a bulleted list is

\begin{itemize}
\item ...
\item ...
...
\end{itemize}

You might use such a list to clearly structure an argument such as:

\begin{itemize}
\item To solve the differential equation $y''-y'-2y=0$,

\item
substitute the exponential $y=\exp(\lambda t)$,

\item and deduce that $\lambda^2-\lambda-2=0$.
\end{itemize}

(the blank lines are optional) which is typeset as

• To solve the differential equation y′′ − y′ − 2y = 0,
• substitute the exponential y = exp(λt),
• and deduce that λ² − λ − 2 = 0.

For an example of a numbered list, see that I used a numbered list at the beginning of §2.2.2 to advise you of the steps to follow to start using LaTeX. The general format for a numbered list is

\begin{enumerate}
\item ...
\item ...
...
\end{enumerate}

Lists may be nested within lists, to a maximum depth of four.
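For instance, a minimal sketch of my own of one list nested inside another (the inner itemize is indented here only for readability):

```latex
\begin{enumerate}
\item Prepare the source file:
  \begin{itemize}
  \item choose a document class;
  \item type the text, paragraph by paragraph.
  \end{itemize}
\item Run LaTeX to typeset the document.
\end{enumerate}
```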


2.2.5 Complex mathematics is displayed

Recall, to include mathematics in-line with the text, use $...$ . However, often mathematics is sufficiently complicated that it is displayed, centred, on a line by itself. For this purpose use the displaymath or equation environments. The difference between the two is that the equation environment automatically typesets a useful labelling number alongside the mathematics, whereas the displaymath environment does not. See the following examples:

\begin{displaymath}
a^2+b^2=c^2
\end{displaymath}

\begin{equation}
\log x=\int_1^x
\frac{1}{t}dt
\end{equation}

which typeset, respectively, as

  a² + b² = c²

  log x = ∫₁ˣ (1/t) dt        (2.1)

Relations

LaTeX knows to typeset extra space around relations such as = and \approx (≈), including inequalities such as <, >, \leq (≤) and \geq (≥), and also set relations such as \in (∈) or \subset (⊂).


Delimiters

Delimiters, such as parentheses, brackets, and braces, come in various sizes to cope with the different sub-expressions that they surround. The easiest way to get the size of a delimiter nearly correct is to use the modifying commands as in \left(...\right) . Note that \left and \right must be used in pairs so that LaTeX can determine the size of the intervening mathematics. See how the delimiters are of reasonable size in these two examples:

\begin{displaymath}
\left(a+b\right)\left[1
-\frac{b}{a+b}\right]=a\,;
\end{displaymath}

\begin{displaymath}
\sqrt{|xy|}\leq\left|\frac
{x+y}{2}\right|\,.
\end{displaymath}

which typeset as

  (a + b)[1 − b/(a + b)] = a ;

  √|xy| ≤ |(x + y)/2| .

Spacing

In the previous two examples I used \,; and \,. to punctuate at the end of the equations. Both in and out of mathematics, LaTeX provides the commands:

• \, to typeset a thin space;
• \␣ to typeset a normal space;
• \quad to typeset a quad space;
• and \! to typeset some negative space!

Use these to space the mathematics where needed. Integrals often need a bit of help with their spacing, as in

\begin{displaymath}
\int\!\!\!\int xy^2\,dx\,dy
=\frac{1}{6}x^2y^3\,,
\end{displaymath}

which typesets as ∬ xy² dx dy = (1/6) x² y³ ,

whereas vector problems often lead to statements such as

\begin{displaymath}
u=\frac{-y}{x^2+y^2}
\quad\mbox{and}\quad
v=\frac{x}{x^2+y^2}\,.
\end{displaymath}

which typesets as u = −y/(x² + y²) and v = x/(x² + y²) .


Use:

• thin spaces, \,, to separate the infinitesimals from each other and from the integrand in integrals, and to separate punctuation from an equation or expression;
• some negative space, \!\!\!, in multi-dimensional integrals to bring the integral signs closer together;
• \quad to separate two or more equations or text on the one line.

Observe the use of the \mbox{...} command to include a few words of ordinary (roman) text, with its spacing, within a mathematics environment.

Arrays

Frequently we need to set mathematics in a tabular format. For example, arrays are typeset within a mathematics environment by the array environment. The structure is

\begin{array}{argument}
... & ... & ... & ... \\
... & ... & ... & ... \\
... & ... & ... & ...
\end{array}


for an array of three rows and four columns. The argument consists of the letters r, c or l to indicate that the corresponding columns are to be typeset right justified, centred or left justified. An array example is

\begin{displaymath}
\left[\begin{array}{ccc}
1 & x & 0 \\
0 & 1 & -1
\end{array}\right]
\left[\begin{array}{c}
1 \\ y \\ 1
\end{array}\right]
=\left[ \begin{array}{c}
1+xy \\
y-1
\end{array} \right]\,,
\end{displaymath}

which typesets as

  [ 1  x   0 ] [ 1 ]   [ 1 + xy ]
  [ 0  1  -1 ] [ y ] = [ y - 1  ] ,
               [ 1 ]

or in a case statement such as

\begin{displaymath}
|x|=\left\{\begin{array}{rl}
x\,, & \mbox{if }x\geq 0\,, \\
-x\,, & \mbox{if }x< 0\,.
\end{array}\right.
\end{displaymath}

which typesets as

  |x| = {  x ,  if x ≥ 0 ,
        { -x ,  if x < 0 .

Many arrays have lots of dots all over the place, as in

\begin{displaymath}
\left[\begin{array}{ccccc}
-2&1&0&\cdots&0 \\
1&-2&1&\cdots&0 \\
0&1&-2&\ddots&\vdots \\
\vdots&\vdots&\ddots&\ddots&1\\
0&0&\cdots&1&-2
\end{array}\right]
\end{displaymath}

which typesets as

  [ -2   1   0   ⋯   0 ]
  [  1  -2   1   ⋯   0 ]
  [  0   1  -2   ⋱   ⋮ ]
  [  ⋮   ⋮   ⋱   ⋱   1 ]
  [  0   0   ⋯   1  -2 ]

Equation arrays

Often we want to align related equations together, or to align each line of a multi-line derivation. The eqnarray mathematics environment does this. The format is the same as an array environment, except that the eqnarray environment automatically assumes three columns: the left column right justified; the centre column centred; and the right column left justified:

\begin{eqnarray}
... & ... & ... \\
... & ... & ... \\
... & ... & ...
\end{eqnarray}

Each line will be numbered by LaTeX, unless you specify \nonumber in a line, or unless you use the * form of eqnarray.

For example, in the flow of a fluid film we may report

\begin{eqnarray}
u&=&\epsilon^2 k_{xxx}\sin y\,,\\
v&=&\epsilon^3 k_{xxx} y\,, \\
p&=&\epsilon k_{xx}\,.
\end{eqnarray}

which typesets as

  u = ε² k_xxx sin y ,    (2.2)
  v = ε³ k_xxx y ,        (2.3)
  p = ε k_xx .            (2.4)

Alternatively, the curl of a vector field (u, v, w) may be written with only one equation number:

\begin{eqnarray}
\omega_1 & = &
\frac{\partial w}{\partial y}
-\frac{\partial v}{\partial z}
\,, \nonumber \\
\omega_2 & = &
\frac{\partial u}{\partial z}
-\frac{\partial w}{\partial x}
\,, \\
\omega_3 & = &
\frac{\partial v}{\partial x}
-\frac{\partial u}{\partial y}
\,. \nonumber
\end{eqnarray}

which typesets as

  ω₁ = ∂w/∂y − ∂v/∂z ,
  ω₂ = ∂u/∂z − ∂w/∂x ,    (2.5)
  ω₃ = ∂v/∂x − ∂u/∂y .

Whereas a derivation may look like

\begin{eqnarray*}
&&
(p\wedge q)\vee(p\wedge\neg q)\\
& = & p\wedge(q\vee\neg q)
\quad\mbox{distributive law}\\
& = & p\wedge T
\quad\mbox{excluded middle} \\
& = & p
\quad\mbox{by identity}
\end{eqnarray*}

which typesets as

    (p ∧ q) ∨ (p ∧ ¬q)
  = p ∧ (q ∨ ¬q)    distributive law
  = p ∧ T           excluded middle
  = p               by identity

Functions

LaTeX knows how to typeset a lot of mathematical functions.

• Trigonometric and other elementary functions are defined by the obvious corresponding command name. Two examples are \sin x and \exp(i\theta) . Observe that trigonometric and other elementary functions are typeset properly, in roman, even to the extent of automatically providing a thin space if followed by a single symbol argument:

\begin{displaymath}
\exp(i\theta)=\cos\theta
+i\sin\theta\,,
\end{displaymath}

\begin{displaymath}
\sinh(\log x)=\frac{1}{2}
\left(x-\frac{1}{x}\right)\,.
\end{displaymath}

which typeset as

  exp(iθ) = cos θ + i sin θ ,

  sinh(log x) = (1/2)(x − 1/x) .

• Subscripts on more complicated functions, such as \lim_{...} and \max_{...}, are appropriately placed under the function name.

\begin{displaymath}
\lim_{q\to\infty}\|f(x)\|_q
=\max_{x}|f(x)|\,.
\end{displaymath}

which typesets as

  lim_{q→∞} ‖f(x)‖_q = max_x |f(x)| .

• And the same goes for both sub- and superscripts on large operators such as \sum, \prod, etc.

\begin{displaymath}
e^x=\sum_{n=0}^\infty
\frac{x^n}{n!}\,,\quad
n!=\prod_{i=1}^n i\,.
\end{displaymath}

which typesets as

  eˣ = Σ_{n=0}^∞ xⁿ/n! ,    n! = ∏_{i=1}^n i .

In in-line mathematics, however, the scripts are automatically placed to the side in order to conserve vertical space and to strive for uniform vertical spacing, as in 1/(1 − x) = Σ_{n=0}^∞ xⁿ, obtained from

$1/(1-x)=\sum_{n=0}^\infty x^n$

Accents

Common mathematical accents over a single character, say a, are: \bar a for ā; \tilde a for ã; \hat a for â; \dot a for ȧ; \ddot a for ä; and \vec a for a⃗ . Two examples:

\begin{displaymath}
\bar f=\frac{1}{L}
\int_0^L f(x)\,dx\,,
\end{displaymath}

\begin{displaymath}
\dot{\vec \omega}=
\vec r\times\vec I\,.
\end{displaymath}

which typeset as

  f̄ = (1/L) ∫₀ᴸ f(x) dx ,

  ω⃗̇ = r⃗ × I⃗ .

Command definition

LaTeX provides a facility for you to define your very own commands. Most useful commands involve arguments; I give three of my favourites. The first two, with two arguments, define partial derivative commands

\newcommand{\D}[2]{\frac{\partial #2}{\partial #1}}
\newcommand{\DD}[2]{\frac{\partial^2 #2}{\partial #1^2}}
\renewcommand{\vec}[1]{{\bf #1}}

and the last, with one argument, redefines the \vec command to denote vectors by boldface characters (rather than an arrow accent). Note that within a definition, #n denotes a placeholder for the nth supplied argument. This vector identity will serve nicely to illustrate two of the new commands:

\begin{displaymath}
\nabla\times\vec q
=\vec i\left(\D yw-\D zv\right)
+\vec j\left(\D zu-\D xw\right)
+\vec k\left(\D xv-\D yu\right)
\end{displaymath}

which typesets as

  ∇ × q = i(∂w/∂y − ∂v/∂z) + j(∂u/∂z − ∂w/∂x) + k(∂v/∂x − ∂u/∂y)

with q, i, j and k in boldface. You will have noticed that LaTeX is very verbose. Many people define their own abbreviations for the common commands so that they are quicker to type. My advice is: do not do this; it makes your LaTeX much less portable and harder to read. Instead, set up your editor to cater for the verbosity; use command definitions only to give you new logical facilities, such as the partial differentiation.
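The second-derivative command \DD was defined above but not used in the identity; a quick sketch of my own showing both commands together:

```latex
% The one-dimensional diffusion equation via \D and \DD:
\begin{displaymath}
\D tc = \kappa\DD xc\,,
\end{displaymath}
% \D tc expands to \frac{\partial c}{\partial t} and
% \DD xc to \frac{\partial^2 c}{\partial x^2}, so the
% derivatives appear as proper partial-derivative fractions.
```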

2.2.6 Figures float

Often we illustrate or support discussion with a figure. Usually figures are big, so they make a mess of the pagination of a document. The solution adopted by professional printers and LaTeX is generally to place figures at the top or bottom of a page, or on a page by themselves, near where the author specifies. That is, the location of a figure "floats". The usual way to include a figure in LaTeX is as follows.

1. Create a postscript file of the drawing from whatever application is being used to generate the figure. For example, in Matlab draw the figure then execute the command

print -depsc2 filename.eps

to store in the file filename.eps the encapsulated postscript to draw the figure. Users of Windows may have trouble generating postscript from other applications as Microsoft generally does not distribute the postscript printer driver; if needed, get it.

2. Then place in the preamble (that part of the document between the \documentclass command and the \begin{document} environment) the command \usepackage{graphicx} . This tells LaTeX to load information about how to include graphics.

3. Somewhere near where you want the figure, include the figure environment

\begin{figure}
\centerline{\includegraphics{...}}
\caption{...}
\end{figure}

where the argument of the \includegraphics command is the full filename, and the argument to the \caption command is text describing the figure.

4. Or use this version to scale the picture up/down to the width of the page:

\begin{figure}
\includegraphics[width=0.9\textwidth]{...}
\caption{...}
\end{figure}

The optional [width=0.9\textwidth] scales the figure to 90% of the width of the typeset text: change it if desired; leave it out in order to reproduce the figure unscaled.

For example, the following commands draw and place Figure 2.1 somewhere near here (on page 78), but not precisely here, as LaTeX has chosen a better place for it.

\begin{figure}
\centerline{\includegraphics{cantor.eps}}
\caption{steps in the construction of a Cantor set.}
\end{figure}


[Figure 2.1: steps in the construction of a Cantor set.]

Note: I strongly recommend that you generate any graphic at about the same size as it is to appear. This is so that the title, label and legend information is actually readable and the line thicknesses are creditable. Astonishingly, some people shrink a figure by a factor of about three and still expect the captions and labelling to be readable!

2.2.7 Summary

• LaTeX is the best.
• In §2.2.2 you saw how to prepare and process simple documents in LaTeX, complete with titles and sectioning. Then simple mathematics could go in-line, as seen in §2.2.3.
• Note that lists, §2.2.4, provide a useful structure for mathematical derivations as well as for lists of points.
• Complicated mathematics is displayed. As discussed in §2.2.5, there is a multitude of LaTeX structures and commands to help you do this. Make your displayed mathematics a work of art, but keep it as simple as possible so LaTeX works for you.
• Lastly, we often need to typeset a figure in a mathematical document, as described in §2.2.6. LaTeX floats these to a reasonable position.

LaTeX can do so much more for you too: automatic cross-referencing, tables, table of contents, footnotes, marginal notes, hypertext links, bibliographies, indexing, two-sided printing, two-column printing, colour, different fonts, a vast number of mathematical symbols, music, etc. Here we have presented the basics.

2.2.8 Many mathematical symbols in LaTeX


α \alpha      β \beta        γ \gamma      δ \delta
ɛ \epsilon    ε \varepsilon  ζ \zeta       η \eta
θ \theta      ϑ \vartheta    ι \iota       κ \kappa
λ \lambda     µ \mu          ν \nu         ξ \xi
π \pi         ϖ \varpi       ρ \rho        ϱ \varrho
σ \sigma      ς \varsigma    τ \tau        υ \upsilon
φ \phi        ϕ \varphi      χ \chi        ψ \psi
ω \omega

Table 2.1: Lowercase Greek letters

Γ \Gamma      ∆ \Delta       Θ \Theta      Λ \Lambda
Ξ \Xi         Π \Pi          Σ \Sigma      Υ \Upsilon
Φ \Phi        Ψ \Psi         Ω \Omega

Table 2.2: Uppercase Greek letters

± \pm         ∩ \cap         ⋄ \diamond          ⊕ \oplus
∓ \mp         ∪ \cup         △ \bigtriangleup    ⊖ \ominus
× \times      ⊎ \uplus       ▽ \bigtriangledown  ⊗ \otimes
÷ \div        ⊓ \sqcap       ⊳ \triangleleft     ⊘ \oslash
∗ \ast        ⊔ \sqcup       ⊲ \triangleright    ⊙ \odot
⋆ \star       ∨ \vee         ∧ \wedge            ○ \bigcirc
† \dagger     \ \setminus    ∐ \amalg            ◦ \circ
‡ \ddagger    · \cdot        ≀ \wr               • \bullet

Table 2.3: Binary operation symbols


≤ \leq        ≥ \geq         ≡ \equiv      |= \models
≺ \prec       ≻ \succ        ∼ \sim        ⊥ \perp
≼ \preceq     ≽ \succeq      ≃ \simeq      | \mid
≪ \ll         ≫ \gg          ≍ \asymp      ‖ \parallel
⊂ \subset     ⊃ \supset      ≈ \approx     ⊲⊳ \bowtie
⊆ \subseteq   ⊇ \supseteq    ≅ \cong       ⊲⊳ \Join
⊑ \sqsubseteq ⊒ \sqsupseteq  ≠ \neq        ⌣ \smile
∈ \in         ∋ \ni          ≐ \doteq      ⌢ \frown
⊢ \vdash      ⊣ \dashv       ∝ \propto

Table 2.4: Binary relations

ℵ \aleph      ′ \prime       ∀ \forall     ∞ \infty
ℏ \hbar       ∅ \emptyset    ∃ \exists
ı \imath      ∇ \nabla       ¬ \neg
ȷ \jmath      √ \surd        ♭ \flat       △ \triangle
ℓ \ell        ⊤ \top         ♮ \natural    ♣ \clubsuit
℘ \wp         ⊥ \bot         ♯ \sharp      ♦ \diamondsuit
ℜ \Re         ‖ \|           \ \backslash  ♥ \heartsuit
ℑ \Im         ∠ \angle       ∂ \partial    ♠ \spadesuit

Table 2.5: Miscellaneous symbols


← \leftarrow        ⟵ \longleftarrow         ↑ \uparrow
⇐ \Leftarrow        ⟸ \Longleftarrow         ⇑ \Uparrow
→ \rightarrow       ⟶ \longrightarrow        ↓ \downarrow
⇒ \Rightarrow       ⟹ \Longrightarrow        ⇓ \Downarrow
↔ \leftrightarrow   ⟷ \longleftrightarrow    ↕ \updownarrow
⇔ \Leftrightarrow   ⟺ \Longleftrightarrow    ⇕ \Updownarrow
↦ \mapsto           ⟼ \longmapsto            ↗ \nearrow
↩ \hookleftarrow    ↪ \hookrightarrow        ↘ \searrow
↼ \leftharpoonup    ⇀ \rightharpoonup        ↙ \swarrow
↽ \leftharpoondown  ⇁ \rightharpoondown      ↖ \nwarrow
⇌ \rightleftharpoons

Table 2.6: Arrow symbols

( (           ) )            ↑ \uparrow
[ [           ] ]            ↓ \downarrow
{ \{          } \}           ↕ \updownarrow
⌊ \lfloor     ⌋ \rfloor      ⇑ \Uparrow
⌈ \lceil      ⌉ \rceil       ⇓ \Downarrow
⟨ \langle     ⟩ \rangle      ⇕ \Updownarrow
/ /           \ \backslash
| |           ‖ \|

Table 2.7: Delimiters


∑ \sum        ⋂ \bigcap      ⊙ \bigodot
∏ \prod       ⋃ \bigcup      ⊗ \bigotimes
∐ \coprod     ⊔ \bigsqcup    ⊕ \bigoplus
∫ \int        ⋁ \bigvee      ⊎ \biguplus
∮ \oint       ⋀ \bigwedge

Table 2.8: Variable-sized symbols

û \hat{u}     ú \acute{u}    ū \bar{u}     u̇ \dot{u}
ǔ \check{u}   ù \grave{u}    u⃗ \vec{u}     ü \ddot{u}
ŭ \breve{u}   ũ \tilde{u}

Table 2.9: Math accents


Module 3

Describing the conservation of material

In this module we begin to learn how to mathematically model the flow of material such as water and air. On our human scale, such material appears smooth and continuous, albeit made of uncounted billions of tiny molecules. We are led to treat it as smooth in a mathematical description. The first task is to discover how to describe the movement of material. Only then do we move on to encode physical principles in mathematical terms that tell us how the various properties of the material interact and evolve. Solutions of the resulting mathematical models exhibit the rich variety of behaviour we see and use in our everyday life.


Module contents

3.1 Eulerian description of motion . . . . . . . . . . . . . . . 86
    3.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 89
3.2 Conservation of mass . . . . . . . . . . . . . . . . . . . . 96
    3.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 97
3.3 Car traffic . . . . . . . . . . . . . . . . . . . . . . . . 100
    3.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . 105
    3.3.2 Age-structured populations are another application . . 108
    3.3.3 Answers to selected Exercises . . . . . . . . . . . . 110
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 112

The text for this module is by A. J. Roberts, A one-dimensional introduction to continuum mechanics, World Scientific. References to the text use the format [R,reference].


3.1 Eulerian description of motion

What does it mean to say: "the tide moves water in an estuary with a velocity 5 cos(x/100 − t/3) km/h"? Where will the fallout from the Chernobyl nuclear reactor accident be carried by the wind? Answers to such questions require an understanding of how the bulk movement of a material may be described by mathematics. In this section we begin to do this.

Main aims: the most important aims of the section are to

• understand the Eulerian description¹ of movement [R,§1.3], and
• introduce, understand and use the material derivative [R,§1.4]

  Df/Dt = ∂f/∂t + v ∂f/∂x ,    (3.1)

where v denotes the velocity of the material.

Reading 3.A Read all of Chapter 1 [R,pp1–10]. Especially study §1.3–4 and work through Examples 1.2–4.

¹ Leonhard Euler (1707–83), born in Switzerland, developed the application of differential equations to the world around us. He worked prolifically in hydraulics, ballistics, geometry, optics, magnetism and electricity. He also introduced much modern notation, such as i = √−1.
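To see definition (3.1) in action, here is a small worked illustration of my own (not from [R]): take the field f = x e⁻ᵗ carried by the velocity v = x. Then

```latex
\begin{displaymath}
  \frac{Df}{Dt}=\frac{\partial f}{\partial t}+v\frac{\partial f}{\partial x}
  = -x\,e^{-t} + x\,e^{-t} = 0\,,
\end{displaymath}
```

so this f is constant following the moving material, even though an Eulerian observer at fixed x sees f decay in time.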


Module 3. Describing the conservation <strong>of</strong> material 87<br />

The discussion in §1.2 [R,p6–7] and the Example 1.2–3 shown in Figure 1.4<br />

is readily realised. Get a rubber band and initially hold lightly tensioned<br />

between your two hands. Then move your hands apart. This is roughly the<br />

deformation discussed in Example 1.2–3. Use a bit <strong>of</strong> “liquid paper” or a<br />

texta to put some dots on the rubber band. Watch how the dots move as<br />

you stretch the rubber band. These could be the Lagrangian particle paths<br />

discussed in Example 1.2.<br />

Oceanographers <strong>of</strong>ten drop “floats” to drift with the ocean currents. These<br />

floaters are Lagrangian because they are carried with the moving water.<br />

They are used to help determine ocean circulation which, for example, in<br />

turn helps us model the ocean-atmosp<strong>here</strong> system to predict the nature <strong>of</strong><br />

global warming.<br />

The nature of the material derivative is also illustrated in car traffic. An observer sitting on the side of the road is an Eulerian observer of the traffic. He/she would see, for example, a tight bunch of cars quickly passing by, and so the observed change in time of the density, the time derivative, would be quite high. However, a driver in one of the cars in the bunch is a Lagrangian observer: stuck in the bunch for a long time, the moving driver observes rates of change in time which are much lower. Using the material derivative, a stationary observer is able to determine how the moving driver will see the surrounding traffic, and vice versa.
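In symbols, the material derivative that converts between the two viewpoints is, for any field $f(x,t)$,

```latex
% Material (Lagrangian) rate of change = local (Eulerian) rate of change
% plus the change advected past the fixed point by the velocity v:
\frac{Df}{Dt} = \frac{\partial f}{\partial t} + v\,\frac{\partial f}{\partial x}\,.
```

This is exactly the form that reappears in the momentum equation of Module 4.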

Example 3.1: worked Problem 1.4 Here I outline the steps in an answer to Problem 1.4 [R,p10]. Work through the details. (“ξ” is the Greek letter “xi” and corresponds to the English “x”.)

(a) $dx_L/dt = v_L$ is the velocity of the particle at $x_L$ at time $t$, which $= v_E(x_L, t)$ by its definition.

(b) From part (a)

• $\dfrac{dx_L}{dt} = v_E(x_L, t) = \dfrac{2t\,x_L}{1+t^2} + 1 + t^2$, which is a linear, first-order, ordinary differential equation for $x_L$.

• Recall [K,§1.7] that we multiply by an integrating factor (what is it?) to solve this class of differential equations analytically, to find

• $x_L = (1+t^2)(t+C)$ is the general solution for some integration constant $C$.

• But you know that at time $t = 0$ particles have their initial position, namely $x_L(\xi, 0) = \xi$, which determines the integration constant to be just $C = \xi$.

• Thus $x_L = (1+t^2)(t+\xi)$.

• Then use this analytic solution to find that the endpoints of $[0, 1]$ get carried to the endpoints of $[10, 15]$.

(c) Straightforwardly check your answers are:

• $\xi_E = \dfrac{x}{1+t^2} - t$ by rearranging $x_L$;

• $v_L = 1 + 3t^2 + 2t\xi$ by differentiating $x_L$;

• $a_L = 6t + 2\xi$ by differentiating $v_L$;

• and then confirm that $a_L(\xi_E, t) = \dfrac{Dv_E}{Dt}$.
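For part (b), the integrating-factor working runs as follows (a sketch; fill in each step yourself):

```latex
% Rewrite as  dx_L/dt - \frac{2t}{1+t^2}\,x_L = 1 + t^2 .
% The integrating factor is
% \exp\bigl(-\int \tfrac{2t}{1+t^2}\,dt\bigr) = \tfrac{1}{1+t^2}, so
\frac{d}{dt}\!\left[\frac{x_L}{1+t^2}\right]
  = \frac{1}{1+t^2}\frac{dx_L}{dt} - \frac{2t}{(1+t^2)^2}\,x_L
  = \frac{1+t^2}{1+t^2} = 1\,,
\qquad\text{whence}\qquad
\frac{x_L}{1+t^2} = t + C\,.
```

For part (c), check that $\dfrac{Dv_E}{Dt} = \dfrac{\partial v_E}{\partial t} + v_E\dfrac{\partial v_E}{\partial x} = 4t + \dfrac{2x}{1+t^2}$, which equals $a_L(\xi_E,t) = 6t + 2\bigl(\frac{x}{1+t^2} - t\bigr)$, as required.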



3.1.1 Exercises<br />

Activity 3.B Do Problems 1.1 [R,p5], 1.3, 1.5 [R,pp10–1], and the Exercises<br />

3.2–3.6. Send in to the examiner for feedback at least Problem 1.3<br />

[R,p10] and Exercise 3.2.<br />

Ex. 3.2: Sketch the velocity field, v(x), corresponding to the particle paths shown in the following picture. Note that $v = \dfrac{dx}{dt} = \dfrac{1}{dt/dx} = \dfrac{1}{\text{slope}}$, and so v(x) at any x is the reciprocal of the slope of the particle path through that x.

[Particle paths in the tx-plane: t from 0 to 5, x from 0 to 10.]

Ex. 3.3: This time the velocity field, v(x, t), depends upon time. For particle<br />

paths shown below, sketch the velocity field at times t = 2 and<br />

t = 4.<br />

[Particle paths in the tx-plane: t from 0 to 5, x from 0 to 10.]

Ex. 3.4: Consider the movement <strong>of</strong> some material in a one-dimensional continuum.<br />

Sketch the velocity field, v(x, t), at times t = 1.5 and t = 3.5<br />

corresponding to the particle paths shown below.


[Particle paths in the tx-plane: t from 0 to 4, x from 0 to 10.]

Ex. 3.5: For the steady (time independent) velocity field shown below, sketch<br />

the acceleration field obtained from the material derivative.


[Velocity field v(x): v ranging over −0.5 to 1, x from 0 to 8.]

Ex. 3.6: Suppose particles of a continuum accelerate at a = sin 2x; use the material derivative to determine the corresponding steady velocity field v(x), given that v = 0 at x = 0.

Ex. 3.7: Some particle paths are shown in the following picture:


[Particle paths in the tx-plane: t from 0 to 5, x from 0 to 10.]

Given that the material was <strong>of</strong> uniform density at time t = 0, say<br />

ρ(x, 0) = 1, sketch the density <strong>of</strong> the material at time t = 3. Also<br />

sketch a graph <strong>of</strong> the particles’ velocity versus x at time t = 4.<br />

Example 3.8: worked Problem 1.2 Problem 1.2 [R,p5] leads into Section 3.3 where we model the flow of car traffic as a continuum. This problem is a little difficult, but shows how some algebra leads us to deduce that we may treat car traffic on a road as a material continuum!

(a) The probability of having n cars in a stretch of road of length $x + \delta x$, $P_n(x+\delta x)$, equals the probability of n cars in length x and none in length δx, together with the probability of n − 1 cars in length x and one car in length δx.

• Hence $P_n(x+\delta x) = P_n(x)(1 - \lambda\,\delta x) + P_{n-1}(x)\,\lambda\,\delta x$.

• Rearrange to $\dfrac{P_n(x+\delta x) - P_n(x)}{\delta x} + \lambda P_n = \lambda P_{n-1}$.

• Thus as $\delta x \to 0$, $\dfrac{dP_n}{dx} + \lambda P_n = \lambda P_{n-1}$ is a differential equation for $P_n$.

• Substitute the expression $P_n = (\lambda x)^n e^{-\lambda x}/n!$ to show that it satisfies the differential equation.

• One should also show that $\int_0^\infty P_n(x)\,dx = 1$ in order for $P_n$ to be a proper probability distribution. Use induction on n and integration by parts to do so. (Should any other property be checked?)

(b) Deduce:

(i) $P_n(0) = 0$, for $n \ge 1$, is the probability of n cars fitting on a stretch of road of length 0;

(ii) $P_0(L) = e^{-\lambda L}$ is the exponentially decaying probability of no cars on a length L;

(iii) $P_1(L) = \lambda L\,e^{-\lambda L}$ is the probability of finding just one car in a length L; it reasonably rises from zero with L, but over lengths bigger than $1/\lambda$ it decays to zero as more and more cars are likely on long lengths of road.

(c) The expected number of cars on a length x of road is
$$\begin{aligned}
\bar n(x) &= \sum_{n=0}^{\infty} n\,P_n(x) &&\text{by definition of expectation}\\
&= \sum_{n=1}^{\infty} n\,\frac{(\lambda x)^n e^{-\lambda x}}{n!} &&\text{by the value of } P_n\\
&= \lambda x\, e^{-\lambda x} \sum_{n=1}^{\infty} \frac{(\lambda x)^{n-1}}{(n-1)!} &&\text{rearranging factors}\\
&= \lambda x &&\text{by the Taylor series for } e^{\lambda x}.
\end{aligned}$$
By similar machinations deduce that the variance $\sigma^2(x) = \overline{(n - \bar n)^2} = \overline{n^2} - \bar n^2$ is also just $\lambda x$.

Thus using an averaging length L, the average density of cars is $\rho = \bar n/L = \lambda$. Since $\bar n$ typically has random fluctuations of size $\sigma = \sqrt{\lambda L}$, this estimate of the density has fluctuations of size $\sigma/L = \sqrt{\lambda/L} \to 0$ for large L. Thus averaging works, and so cars on a road may be viewed as a continuum!
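The “similar machinations” for the variance go most cleanly through the factorial moment (a worked sketch; verify each sum):

```latex
\overline{n(n-1)}
  = \sum_{n=2}^{\infty} n(n-1)\,\frac{(\lambda x)^n e^{-\lambda x}}{n!}
  = (\lambda x)^2 e^{-\lambda x} \sum_{n=2}^{\infty} \frac{(\lambda x)^{n-2}}{(n-2)!}
  = (\lambda x)^2 ,
```

so that $\sigma^2 = \overline{n^2} - \bar n^2 = \overline{n(n-1)} + \bar n - \bar n^2 = (\lambda x)^2 + \lambda x - (\lambda x)^2 = \lambda x$, as claimed.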


3.2 Conservation of mass

A biologist, physicist and mathematician were in a bar. They watch two people enter a house across the street. A little later, they see three people leave the house. The biologist says, “They must have reproduced.” The physicist says, “We must have misinterpreted the initial input.” The mathematician says, “If one more person enters the house, there will be no one inside.”

Once we know how to describe motion and properties, we then have to deduce how these relate to each other. Principles based upon conservation enable us to do this. Based upon the conservation of mass we deduce a differential equation of widespread importance.

Main aims: the most important aims of this section are:

• to understand how identifying physical processes in and on a slice of the continuum leads to a partial differential equation to solve;

• the derivation of the continuity equation
$$\frac{\partial\rho}{\partial t} + \frac{\partial}{\partial x}(\rho v) = 0\,. \qquad(3.2)$$

Reading 3.C Study all of Section 2.1 [R,pp13–16].
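A sketch of where (3.2) comes from (the full argument is in [R,§2.1]): the mass in any interval $[a,b]$ changes only through the flux $\rho v$ carried across its ends, so

```latex
\frac{d}{dt}\int_a^b \rho\,dx
  = \rho v\big|_{x=a} - \rho v\big|_{x=b}
  = -\int_a^b \frac{\partial}{\partial x}(\rho v)\,dx\,.
% Since the interval [a,b] is arbitrary, the integrands must balance pointwise:
\int_a^b \left[\frac{\partial\rho}{\partial t}
  + \frac{\partial}{\partial x}(\rho v)\right] dx = 0
\quad\Longrightarrow\quad
\frac{\partial\rho}{\partial t} + \frac{\partial}{\partial x}(\rho v) = 0\,.
```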



3.2.1 Exercises<br />

Activity 3.D Do Problem 2.1 [R,p15] and Exercises 3.9–3.12. Send in to the examiner for feedback at least Exercises 3.10–3.12.

Ex. 3.9: At some time the density <strong>of</strong> a material just happens to be constant<br />

in x and the velocity field is as drawn below<br />

[Velocity field v(x): v ranging over −0.5 to 1, x from 0 to 8.]

Use the continuity equation (3.2) to identify w<strong>here</strong> the density is increasing<br />

in time and w<strong>here</strong> the density is decreasing.<br />

Ex. 3.10: Suppose the density at some fixed station x evolves in time according<br />

to the following picture


[Density ρ at the fixed station against time t: ρ from 0.5 to 2.5, t from 0 to 6.]

Can you justifiably deduce anything from the continuity equation about<br />

the velocity field v in the neighbourhood <strong>of</strong> x? If so, what?<br />

Ex. 3.11: Suppose the density and velocity at some time t are such that the<br />

product ρv is as shown in the following picture<br />

[The product ρv against x: ρv from 0.5 to 1.5, x from 0 to 6.]

Can you justifiably deduce anything about the evolution <strong>of</strong> the density<br />

field ρ? If so, what?<br />

Ex. 3.12: Consider an interval [a, b] <strong>of</strong> a continuum and investigate the rate<br />

at which material is carried into and out <strong>of</strong> the interval. Suppose the<br />

velocity and density at the left-hand side is v(a) = 1 + t 2 and ρ(a) = 2,<br />

while that at the right-hand side is v(b) = (1+t) 2 and ρ(b) = 1/(1+t 2 ).<br />

At what net rate is matter being carried into the interval? How much<br />

is carried in between times t = 0 and t = 1?<br />

Ex. 3.13: Suppose the density field of a one-dimensional continuum is ρ = exp[sin(t − x)] and the velocity field is v = cos(t − x). What is the flux of material past x = 0 as a function of time? How much material passes x = 0 in the time interval [0, π/2]?



3.3 Car traffic<br />

As far as the laws <strong>of</strong> mathematics refer to reality, they are not<br />

certain, and as far as they are certain, they do not refer to reality.<br />

Albert Einstein<br />

One application <strong>of</strong> continuum modelling is to car traffic. We explore the<br />

modelling <strong>here</strong>, and from the mathematical model deduce phenomena that<br />

are seen on the roads.<br />

Main aims:<br />

the most important aims <strong>of</strong> this section are:<br />

• to appreciate the continuum modelling <strong>of</strong> car traffic;<br />

• the use <strong>of</strong> experimental results to formulate a complete problem;<br />

• the use <strong>of</strong> the classic approach <strong>of</strong> seeking equilibria and then linearisation<br />

to gain a preliminary understanding <strong>of</strong> the dynamics (as in Module<br />

1 but in vastly higher dimension).<br />

• and to introduce the basic features <strong>of</strong> the method <strong>of</strong> characteristics for<br />

solving nonlinear partial differential equations.<br />

Reading 3.E Study all of Section 2.2 [R,pp16–37].


Note that the method of characteristics is not just an algebraic technique: geometry and graphical drawing play a crucial role. This is a feature that many people find difficult, as they predominantly see mathematics purely as algebraic manipulation. But for the method of characteristics the graphical element is essential.

Example 3.14: Given the car flux–density relation $Q(\rho) = \rho(1 - \rho/150)$ cars per minute, where ρ is measured in cars per km, $0 \le \rho \le 150$:

(a) Draw the graph of characteristics for the evolution of a denser patch of cars for which the initial density is $\rho_0(x) = 25 + 50\,e^{-x^2/2}$ cars per km.

(b) Hence graph the predicted solution ρ(x, t) at times t = 0, 1, 2 and 3 minutes.

Solution: First, deduce the wave speed $c(\rho) = Q'(\rho) = 1 - \rho/75$ km per minute. Then tabulate the initial density field, the wave speed (the “slope” of the characteristics passing through all the initial points) and thus also the equation of the characteristic $x = s + c_0(s)\,t$:


   s    ρ₀      c₀ = c[ρ₀]   characteristic      t = 1    t = 2    t = 3
  −4    25.02   0.6664       x = −4 + 0.67 t     −3.33    −2.67    −2.00
  −3    25.56   0.6593       x = −3 + 0.66 t     −2.34    −1.68    −1.02
  −2    31.77   0.5764       x = −2 + 0.58 t     −1.42    −0.85    −0.27
  −1    55.33   0.2623       x = −1 + 0.26 t     −0.74    −0.47    −0.21
   0    75      0            x = 0 + 0 t          0        0        0
   1    55.33   0.2623       x = 1 + 0.26 t       1.26     1.52     1.79
   2    31.77   0.5764       x = 2 + 0.58 t       2.58     3.15     3.73
   3    25.56   0.6593       x = 3 + 0.66 t       3.66     4.32     4.98
   4    25.02   0.6664       x = 4 + 0.67 t       4.67     5.33     6.00

Also tabulated is the position, x value, <strong>of</strong> each characteristic at three<br />

later times. At these positions the density is the value <strong>of</strong> ρ 0 in the same<br />

row. On each <strong>of</strong> the characteristics, plotted below in a characteristic<br />

diagram, the density is constant, namely the value it had initially, as<br />

tabulated in the legend.


[Characteristic diagram in the tx-plane, 0 ≤ t ≤ 3, −4 ≤ x ≤ 6; the legend lists the constant density carried by each characteristic: 25.02, 25.56, 31.77, 55.33, 75, 55.33, 31.77, 25.56, 25.02.]

s = (-4:4)';                  % labels s of the characteristics
rho0 = 25+50*exp(-s.^2/2);    % initial density rho_0(s) on each characteristic
c0 = 1-rho0/75;               % wave speed c(rho_0), the slope dx/dt
t = 0:3;
x = s*ones(size(t))+c0*t;     % characteristics x = s + c_0(s) t
plot(x,t)                     % each row of x plotted against t
legend(num2str(rho0))
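For readers checking the tabulated values without Matlab, the same computation in Python (a sketch assuming numpy is available; it mirrors the Matlab snippet above):

```python
import numpy as np

s = np.arange(-4, 5)              # characteristic labels s = -4, ..., 4
rho0 = 25 + 50*np.exp(-s**2/2)    # initial density rho_0(s), cars per km
c0 = 1 - rho0/75                  # wave speed c(rho_0) = 1 - rho_0/75, km/min
for t in (1, 2, 3):
    # position x = s + c_0(s) t of each characteristic at time t
    print("t =", t, np.round(s + c0*t, 2))
```

The printed positions reproduce the last three columns of the table, for example −3.33 for s = −4 at t = 1.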

Then at any time t compute, from the equation for each characteristic given above, the value of x at which you will find the density ρ₀(s). Plotting these points gives the following curves for the density ρ(x, t) at each time t.


[Density profiles ρ(x, t) at times t = 0, 1, 2, 3: ρ axis from 0 to 70 cars/km, x from −4 to 4.]

s = linspace(-4,4)';          % fine grid of characteristic labels
rho0 = 25+50*exp(-s.^2/2);    % initial density rho_0(s)
c0 = 1-rho0/75;               % wave speed c(rho_0)
t = 0:3;
x = s*ones(size(t))+c0*t;     % position of each characteristic at each time
plot(x,rho0)                  % density rho_0(s) against x, one curve per time

Observe how, over time, the density steepens at the back <strong>of</strong> a bunch <strong>of</strong><br />

cars, and lessens at the front.<br />

In the traffic light example in the textbook you have to imagine that all values of density occur at the mathematical point x = 0. Physically, the initial density will rise smoothly from 0 some distance in front of the traffic lights to the jamming density some distance behind them. The reason is that density is only defined by choosing some averaging length; when this averaging length contains part queue and part empty road in front of the lights, you get an intermediate value of the density. However, this physically smooth transition occurs in the mathematical model at the mathematical point x = 0. Thus draw characteristics corresponding to every value of density emanating from x = 0.

3.3.1 Exercises<br />

Activity 3.F Do Problems 2.2–6 [R,pp34–6] and Exercises 3.15–3.16. Send in to the examiner for feedback at least Problems 2.2 & 2.3(a) [R,pp34–5], and Ex. 3.16(a) below. (Note: in the last line of Prob. 2.6(d), “dr/dt” should be “dρ/dt”.)

Ex. 3.15: Assume the flux Q(ρ) = ρ(1 − ρ/150)(1 − ρ/300) cars per minute<br />

w<strong>here</strong> the density ρ is in cars per km.<br />

(a) A uniform stream <strong>of</strong> cars is travelling at 50 km/hr. Approximately<br />

what density is the car traffic? At location x = 0, say, a group<br />

<strong>of</strong> cars leave the road to view the scenery, fill up with petrol, etc.<br />

Because they leave, the car traffic density is decreased locally: at<br />

what speed does the patch <strong>of</strong> low density travel in the car traffic?<br />

(b) Repeat the above for the case when the cars travel at 20 km/hr<br />

(because they are at a higher density).


Ex. 3.16: (a) Show that a constant $\rho(x, t) = \rho_*$ is an equilibrium solution (a fixed point) of
$$\frac{\partial\rho}{\partial t} = c(\rho)\,\frac{\partial^2\rho}{\partial x^2}\,.$$
Argue that small fluctuations to ρ about $\rho_*$, say $\hat\rho(x, t)$, then obey the differential equation $\dfrac{\partial\hat\rho}{\partial t} = c_*\dfrac{\partial^2\hat\rho}{\partial x^2}$ (approximately), where $c_* = c(\rho_*)$.

(b) Repeat the above for the differential equation
$$\frac{\partial\rho}{\partial t} = \frac{\partial}{\partial x}\!\left[c(\rho)\,\frac{\partial\rho}{\partial x}\right].$$

Ex. 3.17: The initial value problem $\dfrac{\partial\rho}{\partial t} + \rho\dfrac{\partial\rho}{\partial x} = 0$ such that $\rho(x, 0) = \rho_0(x)$ has solution $\rho = \rho_0(s)$ on the characteristics $x = s + \rho_0(s)t$. Regard $x = s + \rho_0(s)t$ as an implicit equation for the function $s(x, t)$, then differentiate it to find implicit formulae for $\dfrac{\partial s}{\partial t}$ and $\dfrac{\partial s}{\partial x}$. Hence show that $\rho = \rho_0[s(x, t)]$ does indeed satisfy the governing differential equation $\dfrac{\partial\rho}{\partial t} + \rho\dfrac{\partial\rho}{\partial x} = 0$.

Ex. 3.18: In Figure 3.1 is drawn the wave speed c(ρ) as a function <strong>of</strong> density<br />

ρ for car traffic along some road. Sketch the corresponding car<br />

flux-density relation Q(ρ). Estimate the value <strong>of</strong> the density corresponding<br />

to the maximum flux <strong>of</strong> cars.<br />

Also shown in Figure 3.1 is a plot <strong>of</strong> some initial car density field<br />

ρ 0 (x). Draw, with a little care, in the tx-plane characteristic curves<br />

for the subsequent evolution <strong>of</strong> car traffic; draw enough so that you<br />

then can draw a predicted density field ρ(x, t) at time t = 1 minute.


[Top panel: wave speed c(ρ) in km/min, ranging over −0.5 to 1, against density ρ from 0 to 150 cars/km. Bottom panel: initial car density ρ₀(x), from 0 to 150 cars/km, against x from −3 to 3 km.]

Figure 3.1:



Approximately w<strong>here</strong> and when do you estimate a “traffic shock” will<br />

form? Give reasons.<br />

3.3.2 Age-structured populations: another application

This example of age-structured populations is introduced to show a slightly different use of the continuity equation. The same important concepts are used in a different application.

Consider a population <strong>of</strong> individuals <strong>of</strong> some species, either plant or animal.<br />

We study the structure <strong>of</strong> ages <strong>of</strong> the individuals in the population (not their<br />

spatial structure as in car traffic and other applications explored later).<br />

• Let x denote the age <strong>of</strong> individuals, in years say, and use fractions <strong>of</strong><br />

years by letting the age x be a real number (not just integers). Let<br />

t denote time as usual, in years also. Then the density ρ(x, t) is the<br />

average number <strong>of</strong> individuals in the population who at time t have age<br />

approximately x.<br />

• Now, the individuals who cross an age x are precisely those that are at age x; hence the flux is q = ρ. Equivalently, imagine that individuals are aging at a rate v = 1 year per year (obviously!), and so again the flux q = ρv = ρ.
again the flux q = ρv = ρ.



• Individual plants or animals will die due to accidents, disease, old age, etc. For simplicity, here just assume death only by accident with a constant probability; ignore old age and other mortal enemies. Then this is an example of the process introduced in Problem 2.1 where individuals are removed at some rate. The expected number of individuals to die at age x, and hence be removed from the population, is proportional to the number at that age, namely ρ. Thus include a “source” term r = −λρ on the right-hand side of the continuity equation (negative because deaths remove individuals).

• A continuity equation for the age structure ($\partial\rho/\partial t + \partial q/\partial x = r$) is thus
$$\frac{\partial\rho}{\partial t} + \frac{\partial\rho}{\partial x} = -\lambda\rho\,.$$

• For example, a steady age population is found by assuming ∂ρ/∂t = 0<br />

and solving this equation. Then ρ = Ce −λx which shows the exponentially<br />

decreasing numbers <strong>of</strong> individuals at age x as fatal accidents<br />

almost inevitably happen to an individual sooner or later.<br />

• But what is the integration constant C? A partial differential equation generally needs boundary conditions, and so far I have not supplied it with any. Here we need to specify some birth-rate. A constant rate of births could reflect a scientist continually preparing new young cultures to place in the population: specified by, say, ρ(0, t) = C.

A more sophisticated model says that the number of births depends upon the number and age of the parent population. One simple example arises by assuming that each individual, constantly and independently of age, gives rise to new individuals: then $\rho(0, t) = \mu\int_0^\infty \rho\,dx$ for some birth-rate μ. Determine the integration constant C as a function of the birth-rate μ.
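For the sophisticated model, substituting the steady distribution into the birth condition gives a consistency requirement (my own working; check it against your answer):

```latex
C = \rho(0,t)
  = \mu \int_0^\infty C\,e^{-\lambda x}\,dx
  = \frac{\mu C}{\lambda}\,,
```

so a non-zero steady age structure exists only when the birth-rate balances the death-rate, $\mu = \lambda$; the constant $C$ is then not fixed by $\mu$ but by the total population $N = \int_0^\infty \rho\,dx = C/\lambda$, that is, $C = \lambda N$.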

3.3.3 Answers to selected Exercises

3.6 $v = \sqrt{1 - \cos 2x}$

3.10 No.

3.11 Yes. The rate of change in density with time has the opposite sign to the slope of ρv.

3.12 Rate of matter increase is $1 + 2t^2 - \dfrac{2t}{1+t^2}$. Total is $1 + \tfrac{2}{3} - \log 2$.

3.15 (a) ρ = 17.33 cars per km and c = 40.40 km per hour; (b) ρ = 81.39 cars per km and c = −11.17 km per hour.
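The quoted values for Exercise 3.15 can be checked numerically; a Python sketch (my own check, with the flux Q taken from the exercise statement):

```python
import numpy as np

def V(rho):
    """Car speed V = Q/rho = (1 - rho/150)(1 - rho/300), km per minute."""
    return (1 - rho/150) * (1 - rho/300)

def c(rho):
    """Wave speed c = Q'(rho), where Q = rho - rho**2/100 + rho**3/45000."""
    return 1 - rho/50 + rho**2/15000

def density_for_speed(v_kmh):
    """Smaller root of V(rho) = v: the physical branch with rho <= 150."""
    v = v_kmh/60                              # convert km/hr to km/min
    return min(np.roots([1/45000, -1/100, 1 - v]).real)

for v_kmh in (50, 20):
    rho = density_for_speed(v_kmh)
    print(v_kmh, round(rho, 2), round(60*c(rho), 2))  # rho cars/km, c km/hr
```

This reproduces ρ = 17.33, c = 40.40 km/hr in part (a), and ρ = 81.39, c = −11.17 km/hr in part (b).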

Prob. 1.1 (a) $\rho = \frac{1}{L}(2n+1)$ where $n = \left[\frac{L}{2}\right]$; (b) $p = \frac{2n+1}{L}\left\{2 + 10^{-6}\,n(n+1)/3\right\}$.

Prob. 1.3 (a) Plot $x = 1/(1+t)$; (b) $v_L = -\xi^2/(1+\xi t)^2$ and $a_L = 2\xi^3/(1+\xi t)^3$; (c) $v_E = -x^2$; (d) determine $Dv_E/Dt = 2x^3$.

Prob. 1.5 The label ξ is constant for each particle.

Prob. 2.5 (a) $\dfrac{\partial s}{\partial x} = \dfrac{1}{1 + c_0'(s)t}$ and $\dfrac{\partial s}{\partial t} = \dfrac{-c_0(s)}{1 + c_0'(s)t}$; (b) a shock!

Prob. 2.6 (a) Because the radioactive material is conserved. (b) Characteristics are $x = (s+t)(1+t^2)$. (c) Follows from $s = x/(1+t^2) - t$. (d) Characteristics stay the same, but there is decay along each characteristic.



3.4 Summary<br />

• The continuum approximation leads to describing density, velocity and<br />

stress/pressure fields, for example, as functions <strong>of</strong> position x and time t<br />

(§3.1).<br />

• Conservation of material leads to the continuity equation (§3.2)
$$\frac{\partial\rho}{\partial t} + \frac{\partial}{\partial x}(\rho v) = 0\,.$$

• Typically, experimental observations are needed to complete the set <strong>of</strong><br />

continuum equations. For example, v = V (ρ) for cars (§3.3).<br />

• Linearisation <strong>of</strong> dynamical equations about convenient equilibria leads<br />

to approximate solutions which allow us to make useful predictions<br />

about the dynamics that occur in the applications.<br />

• The method <strong>of</strong> characteristics, based upon the chain rule and a graphical<br />

approach, leads to exact solutions <strong>of</strong> the partial differential equations<br />

describing nonlinear waves and shocks.


Module 4<br />

The dynamics <strong>of</strong> momentum<br />

Mass conservation is just one powerful principle in modelling the dynamics<br />

<strong>of</strong> material. Another fundamental principle for mechanical systems is the<br />

conservation <strong>of</strong> momentum. This is explored in this module and applied to<br />

the dynamics <strong>of</strong> gases and blood.<br />

Module contents<br />

4.1 Conservation <strong>of</strong> momentum . . . . . . . . . . . . . . . 115<br />

4.1.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 116<br />

4.2 Dynamics <strong>of</strong> ideal gases . . . . . . . . . . . . . . . . . . 119



4.2.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 120<br />

4.3 Equations <strong>of</strong> quasi-one-dimensional blood flow . . . . 124<br />

4.3.1 Answers to selected Exercises . . . . . . . . . . . . . . . 124<br />

4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 125



4.1 Conservation <strong>of</strong> momentum<br />

Balls in flight and other bodies obey Newton’s laws <strong>of</strong> motion, in particular<br />

F = ma, or in words, the net applied force F causes a body with mass m<br />

to move with acceleration a. These rules apply when the body is rigid. But<br />

many bodies are not. Many materials flex and compress or expand. What<br />

rules apply then? We find out in this section that Newton’s laws still apply<br />

in a novel manner.<br />

Main aims: This section largely repeats the arguments for conservation of mass (§3.2), although applied in a more sophisticated fashion. The most important aims are:

• to understand how identifying the physical processes in and on a slice of continuum leads to a partial differential equation;

• to derive the momentum equation
$$\rho\,\frac{Dv}{Dt} = \rho\left(\frac{\partial v}{\partial t} + v\,\frac{\partial v}{\partial x}\right) = F + \frac{\partial\sigma}{\partial x}\,. \qquad(4.1)$$

Reading 4.A Study Section 3.1 [R,pp47–51].
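A sketch of how (4.1) arises (details in [R,§3.1]): apply Newton's second law to a thin material slice of width δx, with body force F per unit length and stress σ pulling on its two ends:

```latex
% mass of slice: \rho\,\delta x ;  acceleration: Dv/Dt ;
% forces: body force F\,\delta x plus net stress \sigma(x+\delta x)-\sigma(x).
\rho\,\delta x\,\frac{Dv}{Dt}
  = F\,\delta x + \sigma(x+\delta x) - \sigma(x)\,;
% divide by \delta x and let \delta x \to 0 :
\rho\,\frac{Dv}{Dt} = F + \frac{\partial \sigma}{\partial x}\,.
```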



4.1.1 Exercises<br />

Activity 4.B Do Problems 3.1–2 [R,pp51–2] and the exercises below. Send in to the examiner for feedback at least Exercises 4.1 and 4.3 below.

Ex. 4.1: Consider a body extending over some range <strong>of</strong> x which is initially<br />

at rest and then is accelerated into motion by the gravitational body<br />

force F = −gρ. Show that v = −gt (independent <strong>of</strong> x) satisfies the<br />

momentum equation (4.1) and explain why this describes a body falling<br />

freely.<br />

Ex. 4.2: A material moves according to the velocity field $v = \dfrac{2tx}{1+t^2} + 1 + t^2$ and has a constant density field ρ. How much momentum is in the interval [0, 1] of the material? As a function of time t, what is the rate at which momentum enters the interval [0, 1] through being carried across the ends x = 0 and x = 1?

Ex. 4.3: A material body has no applied body force, F = 0, but has an internal<br />

pattern <strong>of</strong> stress σ(x) shown below. Sketch the resultant material<br />

acceleration. For simplicity assume the body has constant density in x<br />

at this particular time.


[Stress σ(x): σ ranging over −1 to 1, x from 0 to 8.]

Example 4.4: worked Problem 3.3 In outline [R,p52].

(a) The continuity equation $\frac{\partial\rho}{\partial t} + \frac{\partial}{\partial x}(\rho v) = 0$ when ρ is constant reduces to just $\frac{\partial v}{\partial x} = 0$, which implies v may not depend upon x and hence only depends upon t.

(b) With stress σ = −p, body force F = −Cv, and v independent of x, the momentum equation (4.1) reduces to

• $\dfrac{\partial p}{\partial x} = -\left(Cv + \rho\dfrac{\partial v}{\partial t}\right)$, which, since the right-hand side is independent of x, is integrated to

• $p = -\left(Cv + \rho\dfrac{\partial v}{\partial t}\right)x + D$ for some integration constant D independent of x,

• and thus observe the pressure is linear in x.

(c) Substituting x = 0 determines D = p₀(t). Substituting x = L then determines the given differential equation for v(t).

(d) The differential equation is a linear, first-order differential equation and so may be solved by multiplying by the integrating factor $e^{Ct/\rho}$. Treating $\frac{p_0 - p_L}{L}$ as a constant, obtain the solution
$$v = \frac{p_0 - p_L}{LC}\left(1 - e^{-Ct/\rho}\right).$$
Interpret the solution to see that the flow exponentially quickly approaches the steady, long-term flow-rate of $(p_0 - p_L)/(LC)$, representing a balance between the driving pressure drop, $p_0 - p_L$, and the total viscous drag, LC.
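The integrating-factor steps for part (d), writing the pressure drop as $p_0 - p_L$ consistent with $D = p_0$ at $x = 0$ from part (c) (a worked sketch):

```latex
% From parts (b)-(c): \rho\,dv/dt + C v = (p_0 - p_L)/L , with v(0) = 0.
% Multiply by the integrating factor e^{Ct/\rho}:
\frac{d}{dt}\!\left( v\,e^{Ct/\rho} \right)
  = \frac{p_0 - p_L}{\rho L}\, e^{Ct/\rho}\,,
\qquad\text{so}\qquad
v\,e^{Ct/\rho} = \frac{p_0 - p_L}{LC}\left( e^{Ct/\rho} - 1 \right),
\quad
v = \frac{p_0 - p_L}{LC}\left( 1 - e^{-Ct/\rho} \right).
```

The flow is positive when $p_0 > p_L$, that is, when it is driven from the high-pressure end at $x = 0$.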



4.2 Dynamics <strong>of</strong> ideal gases<br />

Perhaps the next simplest mechanical continuum is that formed by ideal<br />

gases. For example, air is an ideal gas to a very good approximation. Here<br />

we show how to use the two conservation equations, one for material and one<br />

for momentum, to deduce the nature and propagation <strong>of</strong> sound. We extend<br />

the analysis to a description <strong>of</strong> a sonic boom such as that generated by a<br />

supersonic plane.<br />

Main aims:

• to understand the need to supplement the partial differential equations by an equation of state;

• to see the wave equation arise in the linearised dynamics of the mathematical model
$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}\,. \qquad(4.2)$$

• to show further use of the basics of the method of characteristics.

Reading 4.C Study Section 3.2 [R,pp52–8]. (Note: in [R,p55], twice $g(x - c_* t)$ should read $g(x + c_* t)$.)
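The corrected sign is transparent in the general (d'Alembert) solution of the wave equation (4.2): a right-travelling part is a function of $x - c_* t$ and a left-travelling part of $x + c_* t$,

```latex
u(x,t) = f(x - c_* t) + g(x + c_* t)\,,
\qquad\text{since}\qquad
\frac{\partial^2 u}{\partial t^2}
  = c_*^2\left( f'' + g'' \right)
  = c_*^2\,\frac{\partial^2 u}{\partial x^2}\,.
```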



4.2.1 Exercises<br />

Activity 4.D Do Problem 3.4 [R,p58].

Example 4.5: worked Problem 3.5 In outline [R,p58], fill in the details.<br />

(a) Reproduce the argument on [R,pp24–5] for velocity v instead <strong>of</strong> ρ,<br />

and for k + v instead <strong>of</strong> c(ρ).<br />

(b) Draw a characteristic diagram for 0 ≤ x ≤ 4.5 and 0 ≤ t ≤ 4 as in<br />

Figure 4.1.<br />

• Characteristics emanating from the x-axis (x > 0) have slope<br />

k(= 1) as the velocity v = 0 on them all from the initial state.<br />

For example, the characteristic emanating from (x, t) = (1, 0)<br />

is the line x = 1 + t, and on this line we know that v = 0.<br />

• Characteristics emanating from the t-axis (t > 0) have differing<br />

slopes <strong>of</strong> 1 + v for the prescribed v. For example,<br />

the characteristic emanating from (x, t) = (0, 1/2) is the line<br />

x = ( )<br />

1 + 1<br />

2π (t − 1/2), and on this line v = 1/2π.<br />

By looking at the value <strong>of</strong> v on each characteristic at each <strong>of</strong><br />

the two times t = 2 and t = 4, draw the solution curves for the<br />

velocity v as seen in Figure 4.2. Evidently from the intersection<br />

<strong>of</strong> the characteristics the first shock needs to form at some time<br />

between roughly t = 2 and t = 2.5.<br />

A Matlab<br />

program to animate<br />

the characteristic<br />

solution is in<br />

fanreal.m, try it.


t<br />

Module 4. The dynamics <strong>of</strong> momentum 121<br />

4<br />

3.5<br />

3<br />

2.5<br />

2<br />

1.5<br />

1<br />

0.5<br />

0<br />

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5<br />

x<br />

Figure 4.1: characteristic diagram for a fan blowing air into a long pipe.


Module 4. The dynamics <strong>of</strong> momentum 122<br />

0.2<br />

t=2<br />

0.1<br />

v<br />

0<br />

-0.1<br />

-0.2<br />

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5<br />

x<br />

0.2<br />

t=4<br />

0.1<br />

v<br />

0<br />

-0.1<br />

-0.2<br />

0 0.5 1 1.5 2 2.5 3 3.5 4 4.5<br />

x<br />

Figure 4.2: velocity field predicted at two times by the characteristic solution.<br />

Note the multi-valued solution indicating the need for “shocks”.


Module 4. The dynamics <strong>of</strong> momentum 123<br />

Exercise 4.6: For an ideal gas with γ = 1 the continuity and momentum<br />

equations are<br />

∂ρ<br />

∂t + ∂(ρv)<br />

[ ∂v<br />

∂x = 0 and ρ ∂t + v ∂v ]<br />

= −k 2 ∂ρ<br />

∂x ∂x .<br />

Linearise about the fixed point v = 0 and ρ = ρ ∗ and then combine the<br />

linearised equations to deduce that sound, density-velocity fluctuations,<br />

obey the wave equation<br />

∂ 2ˆv<br />

∂t 2<br />

= k2<br />

∂2ˆv<br />

∂x 2 .


Module 4. The dynamics <strong>of</strong> momentum 124<br />

4.3 Equations <strong>of</strong> quasi-one-dimensional blood flow<br />

Main aims:<br />

• to generalise the continuity and momentum equations to situations<br />

w<strong>here</strong> the cross-sectional area <strong>of</strong> a continuum varies in space and time;<br />

• to see how to model the dynamics <strong>of</strong> blood flowing through an elastic<br />

artery by the forced wave equation.<br />

Reading 4.E Study Section 5.1–2, [R,pp111–123].<br />

Activity 4.F Do Problems 5.1–2, [R,pp123–4]. Send in to the examiner for<br />

feedback at least Prob. 5.1.<br />

4.3.1 Answers to selected Exercises<br />

4.2 ρ ( t<br />

+ 1 + t 2) , −ρ [ ]<br />

2t + 4t2<br />

1+t 2 (1+t 2 ) 2<br />

Prob. 3.1 (a) F i+1 = −(4 − i)mg<br />

(b) σ = −(L − x)ρg<br />

Prob. 3.2 ∂(ρv) + ∂(ρv2 )<br />

∂t ∂x<br />

= F + ∂σ<br />

∂x + ru<br />

Prob. 3.4 (a) ρ = [ ρ 2/5<br />

0 − 2 5 gx/k2] 5/2<br />

(b) ∂v + ( ρ 1/5<br />

∂t ∗ k + 6v/5 ) ∂v<br />

= 0 ∂x


Module 4. The dynamics <strong>of</strong> momentum 125<br />

4.4 Summary<br />

• The principle <strong>of</strong> conservation <strong>of</strong> momentum leads to the momentum<br />

equation (§4.1) ( ∂v<br />

ρ<br />

∂t + v ∂v )<br />

= F + ∂σ<br />

∂x ∂x .<br />

• Typically, experimental observations are needed to complete the set <strong>of</strong><br />

continuum equations. For example, σ = −p ∝ −ρ γ−1 for gasses (§4.2).<br />

• Linearisation <strong>of</strong> dynamical equations about convenient equilibria leads<br />

to approximate solutions which allow us to make useful predictions<br />

about the dynamics that occur in the applications.<br />

• In situations w<strong>here</strong> the continuum varies in cross-sectional area, say<br />

A(x, t), the continuity equation becomes (§4.3)<br />

∂<br />

∂t (Aρ) + ∂<br />

∂x (Aρv) = 0 ,<br />

and the momentum equation is<br />

[ ∂v<br />

Aρ<br />

∂t + v ∂v ]<br />

= F 1 + ∂<br />

∂x ∂x (Aσ) .<br />

• The material and muscles <strong>of</strong> an artery suggest a linear model, Hooke’s<br />

law for arteries, p = p ∗ +α(R−R ∗ )+P (x, t), relating the pressure (−σ)<br />

to the varying radius <strong>of</strong> the artery and including the muscle applied<br />

pressure P (§4.3).


Part II<br />

Structure, algebra and<br />

approximation <strong>of</strong> applied functions


Part contents<br />

5 The nature <strong>of</strong> infinite series 129<br />

5.1 Introduction to summing an infinite series . . . . . . . . . . . 132<br />

5.2 Establishing when a series converges . . . . . . . . . . . . . . 142<br />

5.3 Power series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147<br />

5.4 Taylor’s theorem in n-dimensions . . . . . . . . . . . . . . . . 166<br />

5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183<br />

6 Series solutions <strong>of</strong> differential equations give special functions<br />

185<br />

6.1 Power series method leads to Legendre polynomials . . . . . . 189<br />

6.2 Frobenius method is needed to describe Bessel functions . . . 195<br />

6.3 Computer algebra for repetitive tasks . . . . . . . . . . . . . . 206


PART CONTENTS 128<br />

6.4 The orthogonal solutions to second order differential equations 245<br />

6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249<br />

7 Linear transforms and their eigenvectors on inner product<br />

spaces 252<br />

7.1 Inner product spaces . . . . . . . . . . . . . . . . . . . . . . . 255<br />

7.2 The nature <strong>of</strong> linear transformations . . . . . . . . . . . . . . 269<br />

7.3 Revision <strong>of</strong> eigenvalues and eigenvectors . . . . . . . . . . . . 284<br />

7.4 Diagonalisation transformation . . . . . . . . . . . . . . . . . 289<br />

7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312


Module 5<br />

The nature <strong>of</strong> infinite series<br />

Quite <strong>of</strong>ten we use power series to approximately solve differential equations<br />

(see the next module). For example, an exact solution to the differential<br />

equation y ′′ = 6y 2 is y = 1/(1+x) 2 . But suppose, using techniques developed<br />

in the next module, we knew only the power series approximate solution<br />

y = 1 − 2x + 3x 2 − 4x 3 + 5x 4 − · · · : how can we sensibly ascribe a value to<br />

such an infinite sum?<br />

This module will focus on the following question:<br />

How is it possible, quite generally, to add up infinitely many numbers<br />

and still obtain a sum which is finite and sensible?


Module 5. The nature <strong>of</strong> infinite series 130<br />

A set <strong>of</strong> infinitely many numbers will be called a ‘sequence’ and when the<br />

numbers in a sequence are added together they are said to form an ‘infinite<br />

series’. If an infinite series has a finite sum it is said to ‘converge’ and if not,<br />

to ‘diverge’.<br />

Though not couched in these terms, our question has a long history in mathematics,<br />

beginning with the work <strong>of</strong> the Greek philosopher Zeno <strong>of</strong> Elea<br />

in the 5th century B.C. Zeno is noted for having posed four ‘paradoxes’<br />

which showed that in order to understand fundamental concepts like motion,<br />

change, continuity and infinity, one must resolve questions like the one we<br />

have before us. In turn, it was essential that these concepts be placed on<br />

a firm mathematical foundation to allow the complete development <strong>of</strong> differential<br />

and integral calculus, begun by Newton and Leibnitz in the 17th<br />

century.<br />

Module contents<br />

5.1 Introduction to summing an infinite series . . . . . . 132<br />

5.1.1 Zeno’s Second Paradox: Achilles and the Tortoise . . . . 133<br />

5.1.2 Case studies: using partial sums . . . . . . . . . . . . . 134<br />

5.1.3 Case study: the harmonic series diverges . . . . . . . . . 139<br />

5.2 Establishing when a series converges . . . . . . . . . . 142<br />

5.2.1 Absolute and conditional convergence . . . . . . . . . . 143<br />

5.2.2 Tests for the convergence <strong>of</strong> series . . . . . . . . . . . . 145


Module 5. The nature <strong>of</strong> infinite series 131<br />

5.3 Power series . . . . . . . . . . . . . . . . . . . . . . . . . 147<br />

5.3.1 Functions from power series . . . . . . . . . . . . . . . . 151<br />

5.3.2 Taylor and Maclaurin Series . . . . . . . . . . . . . . . . 152<br />

5.3.3 Truncation error for Taylor series . . . . . . . . . . . . . 155<br />

5.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 164<br />

5.4 Taylor’s theorem in n-dimensions . . . . . . . . . . . . 166<br />

5.4.1 Identify local maxima and minima . . . . . . . . . . . . 170<br />

5.4.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 180<br />

5.4.3 Answers to selected Exercises . . . . . . . . . . . . . . . 182<br />

5.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 183


Module 5. The nature <strong>of</strong> infinite series 132<br />

5.1 Introduction to summing an infinite series<br />

Suppose then that we have an infinite sequence <strong>of</strong> real numbers and, for<br />

simplicity, that all the numbers are positive. Intuitively, it is clear that if<br />

all the numbers remain about the same size, or if they progressively increase<br />

in size, such as 1, 2, 3, 4,. . . , then when the numbers are added together<br />

their sum will grow without limit. On the other hand, if the numbers grow<br />

progressively smaller in size, so that when the numbers are added together<br />

each successive number contributes less and less to the overall sum, such as<br />

1, 1/2, 1/4, 1/8, . . . , then it might be possible for the sum to remain finite.<br />

Now suppose that we allow the sequence to contain both positive and negative<br />

numbers. The negative numbers will tend to cancel out the contributions<br />

which the positive numbers make to the sum. If the negative numbers are<br />

randomly interspersed among the positive numbers, then the effect that they<br />

might have on the sum is difficult to assess. However, in many practical<br />

situations, the negative terms alternate with the positive ones and this is<br />

easier to handle. These intuitive ideas will be developed more fully below.<br />

Main aims:<br />

• introduce some examples <strong>of</strong> summing an infinite series;<br />

• show examples <strong>of</strong> when a sum cannot be found.


Module 5. The nature <strong>of</strong> infinite series 133<br />

5.1.1 Zeno’s Second Paradox: Achilles and the Tortoise<br />

Consider just one <strong>of</strong> Zeno’s paradoxes which, in modern units, could be<br />

expressed as follows.<br />

Achilles, who runs 10 times faster than a tortoise, set <strong>of</strong>f to chase one<br />

100 metres away. At the same time the tortoise began to crawl away from<br />

him. By the time Achilles reached the point w<strong>here</strong> the tortoise started, the<br />

tortoise was 10 m away. Achilles continued the chase but, upon reaching the<br />

tortoises previous position, the tortoise had moved and was now 1 m away.<br />

Achilles continued for another metre, but yet again the tortoise had moved<br />

further. This apparently continues forever: the tortoise has always moved<br />

by the time Achilles had reached w<strong>here</strong> it last was. Evidently, Achilles was<br />

never able to catch the tortoise.<br />

This conclusion is clearly absurd. We know from experience that tortoises<br />

are relatively easy to catch. Zeno was concerned with finding the fault in<br />

his logic. Had he the use <strong>of</strong> a modern number system, much <strong>of</strong> his problem<br />

would have disappeared: the total distance run by Achilles in chasing the<br />

tortoise is<br />

100 + 10 + 1 + 0.1 + 0.01 + 0.001 + · · · = 111.11111 . . . metres.<br />

To suggest that Achilles could not cover this distance is to say that he would<br />

never be able to run 112m, which he certainly could. The problem is with the<br />

word never. The ancient Greeks apparently thought that it was impossible


Module 5. The nature <strong>of</strong> infinite series 134<br />

to sum an infinite set <strong>of</strong> numbers and arrive at a finite sum. This is refuted<br />

in our number system by the commonplace notion <strong>of</strong> a recurring decimal, for<br />

example<br />

1<br />

3 = 0.333333 . . .<br />

= 3 10 + 3<br />

10 2 + 3<br />

10 3 + 3<br />

10 4 + 3<br />

10 5 + · · ·<br />

w<strong>here</strong> the right-hand side is the sum <strong>of</strong> an infinite series and the left-hand<br />

side is its clearly finite sum. Thus this infinite series converges to the value<br />

1/3.<br />

This is an example<br />

<strong>of</strong> a convergent<br />

geometric series.<br />

Definition 5.1 Given an infinite sequence <strong>of</strong> numbers that we wish to sum,<br />

say z 1 , z 2 , z 3 , . . . , we define the partial sums S n = ∑ n<br />

k=1 z k and say that the<br />

infinite series, the infinite sum, ∑ ∞<br />

k=1 z k converges to the value lim n→∞ S n if<br />

this limit exists.<br />

5.1.2 Case studies: using partial sums<br />

Example 5.1: establishing convergence from partial sums.<br />

show that:<br />

∞∑ 1<br />

k(k + 1) = 1 .<br />

k=1<br />

Here I


Module 5. The nature <strong>of</strong> infinite series 135<br />

• Begin by considering the sequence {S n } <strong>of</strong> partial sums:<br />

S 1 =<br />

S 2 =<br />

S 3 =<br />

1<br />

(the first term)<br />

1 × 2<br />

1<br />

1 × 2 + 1 (the sum <strong>of</strong> the first 2 terms)<br />

2 × 3<br />

1<br />

1 × 2 + 1<br />

2 × 3 + 1 (the sum <strong>of</strong> the first 3 terms)<br />

3 × 4<br />

.<br />

S n =<br />

1<br />

1 × 2 + 1<br />

2 × 3 + 1<br />

3 × 4 + · · · + 1<br />

n(n + 1) .<br />

• Finding a sum for this series requires that we find a limit for the<br />

sequence {S n }. To proceed, note that<br />

then<br />

S n =<br />

1<br />

n(n + 1) = 1 n − 1<br />

n + 1<br />

( 1<br />

1 − 1 ( 1<br />

+<br />

2)<br />

2 − 1 ( 1<br />

+ · · · +<br />

3)<br />

n − 1 − 1 ) ( 1<br />

+<br />

n n − 1 )<br />

.<br />

n + 1<br />

• Clearly, all terms cancel except the first and last, a process known<br />

as a telescopic sum and this leaves:<br />

S n = 1 − 1<br />

n + 1 .


Module 5. The nature <strong>of</strong> infinite series 136<br />

• It follows that:<br />

(<br />

lim S n = lim 1 − 1 )<br />

= 1 .<br />

n→∞ n→∞ n + 1<br />

Thus ∑ ∞ 1<br />

k=1<br />

below.<br />

k(k+1)<br />

converges and its sum is 1, as is seen in the table<br />

n nth term z n partial sum S n<br />

1 0.5 0.5<br />

10 0.0090909090909 0.909090909090<br />

50 0.0003921568627 0.980392156862<br />

100 0.0000990099009 0.990099009900<br />

200 0.0000248756218 0.995024875621<br />

500 0.0000039920159 0.998003992015<br />

1000 0.0000009990009 0.999000999000<br />

10 000 0.0000000099990 0.999900009999<br />

This result is also displayed graphically using Matlab


Module 5. The nature <strong>of</strong> infinite series 137<br />

1<br />

0.9<br />

S n<br />

0.8<br />

0.7<br />

0.6<br />

S n<br />

0.5<br />

0 5 10 15 20 25 30 35 40 45 50<br />

n<br />

1<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1<br />

1/n<br />

n=50;<br />

k=1:n;<br />

s=cumsum(1./(k.*(k+1)));<br />

subplot(2,1,1)<br />

plot(k,s,’+’,k,1+zeros(size(k)),’--’)<br />

subplot(2,1,2)<br />

plot(1./k,s,’.’,0,1,’o’)<br />

The top plot shows the partial sums converging to the limit as n → ∞<br />

and the bottom plot shows the same limit, plotted as a circle in the<br />

top left corner, but perhaps more convincingly as 1/n → 0 (equivalent<br />

to n → ∞) by plotting S n against 1/n.<br />

Example 5.2: establishing divergence from partial sums. The series<br />

∞∑<br />

(−1) k+1 = 1 − 1 + 1 − 1 + · · ·<br />

k=1


Module 5. The nature <strong>of</strong> infinite series 138<br />

is divergent, for the partial sum<br />

n∑<br />

S n = (−1) k+1 = 1 − 1 + 1 − 1 + · · · + (−1) n+1 =<br />

k=1<br />

{<br />

0 , if n is even,<br />

1 , if n is odd.<br />

Thus the sequence <strong>of</strong> partial sums is {1, 0, 1, 0, . . .} which has no limit.<br />

This example provides a good illustration <strong>of</strong> the absurdities which can<br />

arise from supposing that a limit exists when, in fact, it does not.<br />

• Suppose that the above series has a limit, say S, then<br />

S = 1 − 1 + 1 − 1 + 1 − 1 + · · ·<br />

= 1 − (1 − 1 + 1 − 1 + 1 − · · ·)<br />

= 1 − S<br />

⇒ 2S = 1<br />

⇒ S = 1 2 .<br />

• However, it is equally valid (actually equally invalid) to argue<br />

S = 1 − 1 + 1 − 1 + 1 − 1 + · · ·<br />

= (1 − 1) + (1 − 1) + (1 − 1) + · · ·<br />

= 0 ,<br />

• or again,<br />

S = 1 − (1 − 1) − (1 − 1) − (1 − 1) − · · ·<br />

= 1 .<br />

In this context<br />

divergence has<br />

nothing to do with a<br />

differential operator!<br />

It means that an


Module 5. The nature <strong>of</strong> infinite series 139<br />

• We cannot sensibly ascribe any particular value to the sum and<br />

hence we say that the series is divergent.<br />

5.1.3 Case study: the harmonic series diverges<br />

The infinite series<br />

∞∑<br />

k=1<br />

1<br />

k = 1 + 1 2 + 1 3 + 1 4 + · · ·<br />

is called the harmonic series. We will show in a moment that the harmonic<br />

series diverges, which is important in connection with Kreyszig’s caution<br />

[K,p735] that the terms in a series getting inexorably smaller, z k → 0, is not<br />

a sufficient condition for the series to converge.<br />

Example 5.3: The harmonic series is divergent The pro<strong>of</strong> is by contradiction,<br />

i.e. assume that the series converges to some value H, and<br />

show that this leads to a contradiction.<br />

Let H = 1 + 1 2 + 1 3 + 1 4 + 1 5 + 1 6 + 1 7 + · · · ,<br />

E = 1 2 + 1 4 + 1 6 + · · · ,<br />

O = 1 + 1 3 + 1 5 + 1 7 + · · · .<br />

E represents the<br />

sum <strong>of</strong> the even<br />

terms and O the<br />

sum <strong>of</strong> the odd<br />

terms. Since they<br />

are both sub-sets <strong>of</strong><br />

H, they must<br />

converge if H does.


Module 5. The nature <strong>of</strong> infinite series 140<br />

Now observe three facts which form the contradiction.<br />

• Since the harmonic series has simply been partitioned into a series<br />

<strong>of</strong> its even terms and a series <strong>of</strong> its odd terms, we must have<br />

H = E + O .<br />

• Since for all n, the nth term <strong>of</strong> O is larger than the nth term <strong>of</strong><br />

E, it follows that<br />

O > E<br />

which means that O contributes more than half <strong>of</strong> the total <strong>of</strong> H,<br />

so that E must contribute less than half <strong>of</strong> the total.<br />

• Taking a common factor <strong>of</strong> 1/2 out <strong>of</strong> each term <strong>of</strong> E allows us to<br />

rewrite E as<br />

E = 1 (1 + 1 2 2 + 1 3 + 1 4 + 1 5 + 1 ·)<br />

6 + · · ,<br />

or E = 1 2 H ,<br />

which contradicts the previous observation that E must be less<br />

than half H.<br />

In spite <strong>of</strong> the fact that 1/k → 0 as k → ∞, the harmonic series is<br />

divergent, a famous example originally discovered by Nicole d’Oresme<br />

in the 14th century. It should be noted however, that the harmonic<br />

series diverges very slowly: after fifteen thousand terms the sum has


Module 5. The nature <strong>of</strong> infinite series 141<br />

grown to 10.1931 and after one million terms, to only 14.3927, yet it<br />

does diverge!<br />

5<br />

4<br />

S n<br />

3<br />

2<br />

S n<br />

1<br />

0 5 10 15 20 25 30 35 40 45 50<br />

n<br />

4.5<br />

4<br />

3.5<br />

3<br />

2.5<br />

2<br />

1.5<br />

1<br />

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1<br />

1/n<br />

n=50;<br />

k=1:n;<br />

s=cumsum(1./k);<br />

subplot(2,1,1)<br />

plot(k,s,’+’,k,log(2*k+1),’--’)<br />

subplot(2,1,2)<br />

plot(1./k,s,’+’)


Module 5. The nature <strong>of</strong> infinite series 142<br />

5.2 Establishing when a series converges<br />

Main aims:<br />

• introduce the two types <strong>of</strong> convergence when summing an infinite series:<br />

absolute convergence is robust, and conditional convergence which is,<br />

in a sense, marginal.<br />

• develop and use three tests for convergence <strong>of</strong> the sum <strong>of</strong> a series.<br />

Reading 5.A Study Section 14.1 in Kreyszig [K,pp732–40].<br />

Note:<br />

• Chapter 14 deals with sequences and series <strong>of</strong> complex numbers, but<br />

the same theory applies if the numbers are real.<br />

• Remember the distinction between a sequence and a series: an infinite<br />

series is summed to give a sequence <strong>of</strong> partial sums.<br />

• Cauchy’s convergence principle for series also applies to a sequence in<br />

the form, paraphrasing that on [K,p735], that


Module 5. The nature <strong>of</strong> infinite series 143<br />

Theorem 5.2 A sequence S n converges if and only if for every ɛ > 0<br />

(no matter how small), we can find an N (depending upon ɛ in general)<br />

such that |S n − S m | < ɛ for all n, m > N.<br />

Cauchy’s principle is extremely useful, especially in more difficult problems,<br />

because we can test rigorously for convergence without actually<br />

knowing the value <strong>of</strong> the limit to which the sequence or series converges!<br />

But we will see little <strong>of</strong> this aspect in this unit.<br />

5.2.1 Absolute and conditional convergence<br />

The notions <strong>of</strong> absolute convergence and conditional convergence are well<br />

illustrated by contrasting the harmonic series, which diverges, with the alternating<br />

harmonic series,<br />

∞∑<br />

(−1) k+1 1<br />

k=1<br />

k = 1 − 1 2 + 1 3 − 1 4 + · · · ,<br />

which converges [K,p736, Example 3], but only just.


Module 5. The nature <strong>of</strong> infinite series 144<br />

1<br />

0.9<br />

S n<br />

0.8<br />

0.7<br />

0.6<br />

S n<br />

0.5<br />

0 5 10 15 20 25 30 35 40 45 50<br />

n<br />

1<br />

0.9<br />

0.8<br />

0.7<br />

0.6<br />

0.5<br />

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1<br />

1/n<br />

n=50;<br />

k=1:n;<br />

s=cumsum((-1).^(k-1)./k);<br />

subplot(2,1,1)<br />

plot(k,s,’+’,k,log(2)+zeros(1,n),’--’)<br />

subplot(2,1,2)<br />

plot(1./k,s,’+’,0,log(2),’o’)<br />

Essentially, the alternation <strong>of</strong> sign produces some degree <strong>of</strong> cancellation in<br />

successive terms which is sufficient to allow the series to converge w<strong>here</strong>as<br />

the harmonic series, which has terms <strong>of</strong> the same size but all positive, fails<br />

to converge. In this situation the convergence is conditional.<br />

On the other hand, an absolutely convergent series such as<br />

∞∑<br />

(−1) k+1 1 k 2<br />

k=1<br />

converges absolutely because the sum <strong>of</strong> the absolute values <strong>of</strong> terms<br />

∞∑<br />

1 ∣ ∣∣∣ ∞∑ 1<br />

∣ (−1)k+1 =<br />

k 2 k 2<br />

k=1<br />

k=1


Module 5. The nature <strong>of</strong> infinite series 145<br />

converges, though obviously to a different sum (see Using the Comparison<br />

test in §§5.2.2). Here its terms z k → 0 fast enough to ensure convergence<br />

even though all terms are positive.<br />

5.2.2 Tests for the convergence <strong>of</strong> series<br />

The comparison test, ratio test and root test which Kreyszig establishes in<br />

Theorems 5–10 <strong>of</strong> §14.1 are very useful tools in determining whether a given<br />

series converges. Notice that they do not tell you what the sum <strong>of</strong> the series<br />

may be, other methods are needed for that. The ratio test is the most<br />

important <strong>of</strong> these.<br />

Often geometric series are useful in applications <strong>of</strong> the comparison test since<br />

their convergence is easily established [K,Theorem 9, p739].<br />

Example 5.4: using the Comparison test. The series ∑ ∞<br />

k=1 1/k 2 is convergent,<br />

for<br />

• a sneaky way to write this is<br />

∞∑<br />

k=1<br />

• Now observe that<br />

1<br />

k 2 = 1 + ∞ ∑<br />

k=2<br />

1<br />

k 2 = 1 + ∞ ∑<br />

k=1<br />

1<br />

(k + 1) 2 < 1<br />

k(k + 1) ,<br />

1<br />

(k + 1) 2 .


Module 5. The nature <strong>of</strong> infinite series 146<br />

• then ∑ ∞<br />

k=1 1/(k + 1) 2 converges by comparison with ∑ ∞<br />

k=1 1/[k(k + 1)]<br />

which was shown to converge in the worked example in §§5.1.2.<br />

• Hence the series ∑ ∞<br />

k=1 1/k 2 converges as displayed below.<br />

1.8<br />

1.6<br />

S n<br />

1.4<br />

1.2<br />

S n<br />

1<br />

0 5 10 15 20 25 30 35 40 45 50<br />

n<br />

1.8<br />

1.6<br />

1.4<br />

1.2<br />

1<br />

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1<br />

1/n<br />

n=50;<br />

k=1:n;<br />

s=cumsum(1./k.^2);<br />

subplot(2,1,1)<br />

plot(k,s,’+’,k,pi^2/6+zeros(1,n),’--’)<br />

subplot(2,1,2)<br />

plot(1./k,s,’+’,0,pi^2/6,’o’)<br />

Activity 5.B Do examples from Problem Set 14.1 [K,p730]. Send in to the<br />

examiner for feedback at least Q3, 7, 12 & 13.


Module 5. The nature <strong>of</strong> infinite series 147<br />

5.3 Power series<br />

We are interested in power series such as the “solution” y = 1 − 2x + 3x 2 −<br />

4x 3 + 5x 4 − · · · <strong>of</strong> the differential equation (1 + x) 2 y ′′ = 6y. This power<br />

series and its properties will depend upon x, for example: at x = 0 it is<br />

y = 1 − 0 + 0 − 0 + 0 − · · · which trivially converges to y = 1; at x = 1 it<br />

clearly diverges as the terms in the series 1 − 2 + 3 − 4 + 5 − · · · increase in<br />

magnitude; at x = 1/2 it converges and so we might say y(1/2) ≈ 1 − 1 +<br />

3/4 − 1/2 + 5/16 = 9/16, but what then is the error? how good may we<br />

expect the linear approximation, y = 1 − 2x? This section addresses these<br />

questions:<br />

• how does convergence depend upon x in such a power series?<br />

• what sort <strong>of</strong> error may we expect in any finite truncation <strong>of</strong> the infinite<br />

series?<br />

Main aims:<br />

• to show that within their domain <strong>of</strong> convergence, power series define<br />

well-behaved functions <strong>of</strong> x (or z);<br />

• conversely, the Taylor or Maclaurin series <strong>of</strong> a function generally converges<br />

to the function in some domain;


Module 5. The nature <strong>of</strong> infinite series 148<br />

• to deduce an expression that usefully estimates the error in using a<br />

Taylor series approximation.<br />

A power series is an infinite series with terms that involve a variable; Kreyszig<br />

uses a complex variable z, but the theory applies equally to real power series,<br />

w<strong>here</strong> we might use x to represent a real variable. Thus a power series like<br />

∞∑<br />

a n z n = a 0 + a 1 z + a 2 z 2 + · · ·<br />

n=0<br />

involves both constant coefficients, a 0 , a 1 , a 2 ,. . . , and increasing powers <strong>of</strong><br />

a complex or real variable z, roughly like an “infinite polynomial”. Notice<br />

that we start the summation at n = 0 to allow for a constant term a 0 , not<br />

depending on z, but the convergence or divergence <strong>of</strong> the resulting series is<br />

determined by the value <strong>of</strong> z, as well as by the coefficients.<br />

Reading 5.C Study [K,pp741–5, §14.2], particularly Radius <strong>of</strong> convergence.<br />

Example 5.5: Write down the centre and determine the radius <strong>of</strong> convergence<br />

<strong>of</strong> the power series 1 − 2x + 3x 2 − 4x 3 + 5x 4 − · · ·.


Module 5. The nature <strong>of</strong> infinite series 149<br />

Solution: Clearly this has centre <strong>of</strong> expansion x = 0 as it is written<br />

in powers <strong>of</strong> x = (x − 0). To determine its radius <strong>of</strong> convergence note<br />

that the power series is ∑ ∞<br />

n=0 (n + 1)(−1) n x n ; that is, its nth coefficient<br />

is a n = (−1) n (n + 1). Use the ratio test<br />

a n+1 x n+1 ∣ ∣∣∣∣ =<br />

∣ a n x n<br />

∣<br />

(−1) n+1 (n + 2)x n+1<br />

(−1) n (n + 1)x n ∣ ∣∣∣∣<br />

=<br />

∣ n + 2 ∣∣∣ ∣<br />

n + 1 x → |x| as n → ∞ ,<br />

which is less than 1 if and only if |x| < 1. Thus the radius <strong>of</strong> convergence<br />

is R = 1 and we expect the power series to usefully converge for −1 <<br />

x < 1.<br />

This analysis holds<br />

if x is either real or<br />

complex.<br />

Sometimes a power series only involves only even or odd powers <strong>of</strong> x−c (or x)<br />

in which case the radius <strong>of</strong> convergence is best determined from that in terms<br />

<strong>of</strong> (x − c) 2 (or x 2 ). The following example shows the sort <strong>of</strong> considerations<br />

that could be applied.<br />

Example 5.6: convergence in x 2 Consider the power series for<br />

sin x = x − 1 6 x3 + 1<br />

120 x5 − · · · = ∑<br />

n odd<br />

(−1) (n−1)/2<br />

and show it converges for all x. A direct application <strong>of</strong> the ratio test<br />

fails because the ratio <strong>of</strong> consecutive terms is either 0 or ∞ as all the<br />

n!<br />

x n ,


Module 5. The nature <strong>of</strong> infinite series 150<br />

terms in even powers are zero! However, recast the series as<br />

sin x =<br />

= x<br />

∞∑ (−1) n<br />

(2n + 1)! x2n+1<br />

n=0<br />

∞∑<br />

n=0<br />

1<br />

(2n + 1)! zn<br />

upon letting z = −x 2 and extracting the common factor <strong>of</strong> x from the<br />

series. Then it is straightforward to show that the series ∑ ∞ 1<br />

n=0 (2n+1)! zn<br />

converges for all z from the ratio test:<br />

a n+1 z n+1 ∣ ∣∣∣∣ =<br />

∣ a n z n<br />

∣<br />

z n+1 /(2n + 3)!<br />

z n /(2n + 1)!<br />

∣ ∣∣∣∣<br />

∣ = z<br />

(2n + 3)(2n + 2) ∣ → 0 as n → ∞ ,<br />

for all z. Since it converges for all z = −x 2 , the original series must<br />

correspondingly converge for all x.<br />

Other substitutions may be used to analyse the convergence <strong>of</strong> power series<br />

with other patterns <strong>of</strong> zero terms.<br />

Activity 5.D Do problems in Problem Set 14.2 [K,p745]. Send in to the<br />

examiner for feedback at least Q2 & 4.


Module 5. The nature <strong>of</strong> infinite series 151<br />

5.3.1 Functions from power series<br />

The key point <strong>of</strong> this subsection is that at every point z for which such a<br />

power series converges, we can use its sum to define the value <strong>of</strong> a function<br />

f(z).<br />

∞∑<br />

f(z) = a n z n = a 0 + a 1 z + a 2 z 2 + · · · (5.1)<br />

n=0<br />

Kreyszig shows that such functions f(z), called analytic functions, have nice
properties: they are continuous, differentiable and integrable at every point
inside their radius of convergence. Also their derivatives and integrals are
found exactly as you would hope, by differentiating or integrating the
power series term-by-term.

Reading 5.E Study all of §14.3 [K,pp746–8] except for the subsection Power
series represent analytic functions which you need only read.
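Term-by-term differentiation can be seen at work on the geometric series ∑ z^n = 1/(1 − z): differentiating each term gives ∑ n z^{n−1}, which should sum to 1/(1 − z)^2. A small Python check (for illustration only; the function names are ours):

```python
def geometric(z, terms=200):
    """Partial sum of sum_{n>=0} z**n, which converges to 1/(1-z) for |z| < 1."""
    return sum(z ** n for n in range(terms))

def geometric_derivative(z, terms=200):
    """Term-by-term derivative of the geometric series: sum_{n>=1} n*z**(n-1)."""
    return sum(n * z ** (n - 1) for n in range(1, terms))

z = 0.3
assert abs(geometric(z) - 1 / (1 - z)) < 1e-12
# Differentiating term-by-term reproduces the derivative of the sum, 1/(1-z)**2.
assert abs(geometric_derivative(z) - 1 / (1 - z) ** 2) < 1e-12
```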

Exercise 5.7: Suppose function f(x) is defined by a power series in (x − c)
with some nonzero radius of convergence R:

    f(x) = ∑_{k=0}^∞ a_k (x − c)^k
         = a_0 + a_1 (x − c) + · · · + a_n (x − c)^n + · · ·
    ∀x such that |x − c| < R .

(Recall that ∀ is short for “for all.”)



By differentiating f repeatedly with respect to x and evaluating each
derivative at x = c, show that

    f^{(n)}(c)/n! = a_n    for n = 0, 1, 2, . . . .

Given that we have finally established convergence of an infinite sum, and
that we can differentiate a power series, this exercise can now be done.
Most importantly, it establishes that the power series representation of any
function f(x) about x = c is unique and is its Taylor series.

Activity 5.F Do the above exercise and problems in Problem Set 14.3
[K,pp750–1]. Send in to the examiner for feedback at least Q3 & 4.

5.3.2 Taylor and Maclaurin Series

The English mathematician Brook Taylor (1685–1731) and the Scottish
mathematician Colin Maclaurin (1698–1746) pioneered this work for real
power series. Taylor presented his results for power series in (x − c), while
Maclaurin’s name is associated with power series in x.



If a function f can be represented by a power series in (x − c), with radius
of convergence R, then

    f(x) = f(c) + f′(c)(x − c) + f′′(c)/2! (x − c)^2 + · · · + f^{(n)}(c)/n! (x − c)^n + · · ·

for all x such that |x − c| < R. This series representation is called the Taylor
series in (x − c) of the function f.

You have shown in Exercise 5.7 that if f has a power series representation
then it must be the Taylor series, i.e. there is only one power series in (x − c)
corresponding to a given function f.

When c = 0, the Taylor series gives a power series in x called the Maclaurin
series. The Maclaurin series representation of function f is:

    f(x) = f(0) + f′(0)x + f′′(0)/2! x^2 + · · · + f^{(n)}(0)/n! x^n + · · ·

for all x such that |x| < R. Note that in the Maclaurin series, all the
derivatives of f are evaluated at 0, and the interval of convergence has its
centre at 0.

The Taylor series in (x − c) of a function f is usually referred to as the ‘Taylor
series expansion of f about c’, while the Maclaurin series of f is the ‘Taylor
series expansion of f about 0’.



Example 5.8: finding a Maclaurin series. Assuming that f(x) = e^x can
be represented by a power series in x, we find its Maclaurin series as
follows. Firstly, find f and its derivatives at x = 0:

    f(x) = e^x     ⇒ f(0) = 1
    f′(x) = e^x    ⇒ f′(0) = 1
    f′′(x) = e^x   ⇒ f′′(0) = 1
    ...

Hence the Maclaurin series is:

    f(x) = e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + · · · = ∑_{k=0}^∞ x^k/k! .

Reading 5.G Study the part of §14.4 [K,pp754–7] from Power series as Taylor
series to the end of the section inclusive.

Note the pivotal role of the power series properties of uniqueness,
differentiability and integrability.

Exercise 5.9: Find the radius and interval of convergence for the power
series

    f(x) = ∑_{n=0}^∞ (x − 2)^{n+1} / ((n + 1) 3^{n+1}) .



Find the sum of the series for f(x), thus writing an expression for f(x)
not involving an infinite series. Hint: consider f′(x).

Activity 5.H Do problems from Problem Set 14.4 [K,pp757–9]. Send in to
the examiner for feedback at least Q2, 10 & 19.

5.3.3 Truncation error for Taylor series

Some of the earliest work on power series was done by the Scots mathematician
James Gregory (1638–1675). He developed a power series method for
interpolating table values for functions. The idea of using power series to
estimate function values remained a prime motivation for later workers like
Taylor. For example, putting x = 1 in the Maclaurin series for e^x we obtain:

    e = exp(1) = 1 + 1 + 1/2! + 1/3! + 1/4! + · · · = ∑_{k=0}^∞ 1/k!
      ≈ 2.718281828459045235360287 . . . .

Now e is a transcendental number, i.e. it is not the root of any algebraic
equation and its value is an infinite, non-recurring decimal. In fact the only



way of representing the number e exactly is as the sum of an infinite series.
To estimate its value, though, we have to take a partial sum of the series,
and in doing so we make a truncation error. With computers, it is now
possible to compute e to hundreds, thousands, or even millions of decimal
places. This is far greater accuracy than was ever dreamed of by Gregory,
but every expansion involves an error and we should know something about
these errors.
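How quickly the partial sums close in on e can be seen numerically. A short Python sketch (an illustration only; the function name e_partial_sum is ours, and the bound used is the first omitted term, which dominates the tail of this fast-converging series):

```python
import math

def e_partial_sum(n):
    """Partial sum 1 + 1/1! + ... + 1/n! of the series for e."""
    return sum(1 / math.factorial(k) for k in range(n + 1))

# The truncation error shrinks factorially fast with n: the first
# omitted term 1/(n+1)! dominates the tail of the series.
for n in (5, 10, 15):
    error = abs(math.e - e_partial_sum(n))
    assert error < 2 / math.factorial(n + 1)
```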

Consider the Taylor series of a function f about c:

    f(x) = f(c) + f′(c)(x − c) + f′′(c)/2! (x − c)^2 + · · · + f^{(n)}(c)/n! (x − c)^n + · · ·
    ∀x such that |x − c| < R .    (5.2)

Truncate the series after terms up to order n to form an nth degree polynomial
approximation to f(x):

    P_n(x) = f(c) + f′(c)(x − c) + f′′(c)/2! (x − c)^2 + · · · + f^{(n)}(c)/n! (x − c)^n ,

where P_n(x) is called the Taylor polynomial of degree n for f at c. (Taylor
polynomials are like the partial sums of a series.) The truncation error made
in such an approximation is:

    R_n(x) = f(x) − P_n(x) ,

where R_n(x) is called the remainder term for an nth order approximation.



Example 5.10: Taylor polynomials approximate the function. Consider
the power series 1 − 2x + 3x^2 − 4x^3 + 5x^4 − · · · discussed earlier,
which we claimed is the power series for y = 1/(1 + x)^2. The first few
Taylor polynomials are:

    P_0(x) = 1 ,
    P_1(x) = 1 − 2x ,
    P_2(x) = 1 − 2x + 3x^2 ,
    P_3(x) = 1 − 2x + 3x^2 − 4x^3 .

These are plotted below with 1/(1 + x)^2 plotted dashed:

[Figure: “f(x) and approximations”: P_0(x), P_1(x), P_2(x) and P_3(x)
plotted over −0.5 ≤ x ≤ 0.5, with f(x) = 1/(1 + x)^2 dashed. Drawn in
Matlab by:]

    x=linspace(-0.5,0.5);
    p=[ones(size(x))
    -2*x
    3*x.^2
    -4*x.^3];
    p=cumsum(p);
    plot(x',p',x,1./(1+x).^2,'--')

Observe that all Taylor polynomials are accurate sufficiently close to
the centre of expansion x = 0. The error, or remainder, away from
x = 0 is given by the distance from a curve to the exact dashed line
and is different for each polynomial.

Example 5.11: A 1st order Taylor polynomial for f(x) = e^x about
x = 1:

    f(x) = f(c) + f′(c)(x − c) + R_1(x) .

Here f(c) = f′(c) = e^1 = e, so

    e^x = e + e(x − 1) + R_1(x)
        = e x + R_1(x) .

The following theorem shows one way to estimate the remainder, R_n(x).

Theorem 5.3 (Lagrange’s remainder) Let f be a function which has n + 1
derivatives that are continuous on some interval I containing c. Then, for
every x ∈ I, there exists a number, u, between x and c, such that:

    f(x) = f(c) + f′(c)(x − c) + · · · + f^{(n)}(c)/n! (x − c)^n + R_n(x)
         = P_n(x) + R_n(x)

where Lagrange’s remainder is

    R_n(x) = f^{(n+1)}(u)/(n + 1)! (x − c)^{n+1} .    (5.3)

Example 5.12: Lagrange’s remainder. Examine the simple example of
the cubic f(x) = 1 + x + x^3. It has a Taylor series about x = 0
which is just itself (this is why this example is simple). The linear
Taylor polynomial approximation to f(x) is simply P_1(x) = 1 + x.
By inspection we know that its error is the remainder R_1(x) = x^3.
However, in complicated cases we will not know this and we have to
see what the theorem can tell us. Here it tells us that there exists a u,
0 ≤ u ≤ x, such that

    R_1(x) = f′′(u)/2! x^2 = (6u/2) x^2 = 3u x^2 .

Here, because we already know R_1(x) = x^3, we identify the correct
u = x/3, which is indeed between 0 and x. In general we will not know
R_1(x) exactly, but because 0 ≤ u ≤ x we will be able to say that the
remainder, the error, R_1(x) ≤ 3x^3, as 3u x^2 ≤ 3x^3 for 0 ≤ u ≤ x. Thus
we can often place a bound on the error in a Taylor polynomial.
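The bound in Example 5.12 is easily confirmed numerically. A minimal Python check (illustration only; the names f and p1 are ours):

```python
def f(x):
    """The cubic of Example 5.12."""
    return 1 + x + x ** 3

def p1(x):
    """Its linear Taylor polynomial about 0."""
    return 1 + x

for x in (0.1, 0.5, 1.0):
    remainder = f(x) - p1(x)          # exactly x**3 for this cubic
    assert abs(remainder - x ** 3) < 1e-12
    # Lagrange: remainder = 3*u*x**2 for some 0 <= u <= x,
    # so it is at most 3*x**3.
    assert remainder <= 3 * x ** 3
```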

Proof:


Module 5. The nature <strong>of</strong> infinite series 160<br />

• Since x is a fixed point in I with x ≠ c, let g be a function of t, defined
  as follows:

      g(t) = f(x) − f(t) − f′(t)(x − t) − f′′(t)/2! (x − t)^2 − · · ·
             − f^{(n)}(t)/n! (x − t)^n − R_n(x) (x − t)^{n+1}/(x − c)^{n+1} .

  The reason for defining g in this way is that differentiating with respect
  to t has a telescoping effect. For example:

      d/dt [−f(t) − f′(t)(x − t)] = −f′(t) + f′(t) − f′′(t)(x − t)
                                  = −f′′(t)(x − t) .

• The net result is that g′(t) simplifies to:

      g′(t) = −f^{(n+1)}(t)/n! (x − t)^n + (n + 1) R_n(x) (x − t)^n/(x − c)^{n+1}

  for all t between x and c. Also note that, for fixed x,

      g(c) = f(x) − P_n(x) − R_n(x) = 0 ,

  and

      g(x) = f(x) − f(x) − 0 − · · · − 0 = 0 .

  Thus we have g(c) = g(x) = 0 and g is differentiable between x and c.
  Moreover, g is continuous throughout I, since f and its derivatives are
  continuous. This includes c, x and all points in between.

• Therefore, g satisfies the conditions for Rolle’s theorem¹, and it follows
  that there is a number u between x and c for which g′(u) = 0. Now
  substituting t = u in g′(t) gives:

      g′(u) = −f^{(n+1)}(u)/n! (x − u)^n + (n + 1) R_n(x) (x − u)^n/(x − c)^{n+1} = 0
      ⇒ R_n(x) = f^{(n+1)}(u)/(n + 1)! (x − c)^{n+1} .    ♠

Note that when applying this result, we do not expect to be able to find the
exact value of u. If we could do that, then making an approximation to f
would not have been necessary. Rather, we try to find bounds for f^{(n+1)}(u)
from which we can estimate how large the remainder R_n(x) might become,
as in the worked example below.

Lastly, suppose we approximate a function f by some Taylor polynomial, so
that:

    f(x) = P_n(x) + R_n(x) ,

¹ Those unfamiliar with Rolle’s theorem should consult either of the following:
  – Mizrahi & Sullivan: Calculus & Analytic Geometry (3rd Edition); Wadsworth (1990), Chapter 11.
  – Larson, Hostetler & Edwards: Calculus (5th Edition); Heath (1994), Chapter 8.



or equivalently,

    P_n(x) = f(x) − R_n(x) .

Taking limits as n → ∞, the left-hand side will give the whole Taylor series
for f, and on the right, f(x) does not depend on n. Thus a necessary and
sufficient condition for the Taylor series to converge to f is that:

    lim_{n→∞} R_n(x) = lim_{n→∞} f^{(n+1)}(u)/(n + 1)! (x − c)^{n+1} = 0 .

Example 5.13: determining the accuracy of an approximation. Use a Taylor
polynomial of degree 5 for sin x about x = 0 to estimate sin(0.1) and
bound the accuracy of the approximation using Lagrange’s remainder.

• Start by calculating derivatives:

      f(x) = sin x        ⇒ f(0) = 0
      f′(x) = cos x       ⇒ f′(0) = 1
      f′′(x) = −sin x     ⇒ f′′(0) = 0
      f′′′(x) = −cos x    ⇒ f′′′(0) = −1
      f^{(4)}(x) = sin x  ⇒ f^{(4)}(0) = 0
      f^{(5)}(x) = cos x  ⇒ f^{(5)}(0) = 1
      f^{(6)}(x) = −sin x .

• Now

      sin x ≈ P_5(x) = x − x^3/3! + x^5/5!

  and

      R_5(x) = f^{(6)}(u)/6! x^6 = −sin u · x^6/6!
      for some number u with 0 ≤ u ≤ 0.1 .

• Using the above to approximate sin(0.1):

      sin(0.1) ≈ P_5(0.1) = 0.1 − (0.1)^3/3! + (0.1)^5/5!
               = 0.1 − 0.000166667 + 0.000000083
               = 0.099833416

  and the remainder is given by

      R_5(0.1) = −sin u · (0.1)^6/6! .

• Since the sine function is increasing on the interval [0, 0.1] we must
  have 0 ≤ sin u < 1, so

      −0.000000001 ≈ −(0.1)^6/6! < R_5(0.1) = −sin u · (0.1)^6/6! ≤ 0

  and we conclude that

      0.099833416 − 0.000000001 ≤ sin(0.1) ≤ 0.099833416

  or

      0.099833415 ≤ sin(0.1) ≤ 0.099833416 .
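A numerical check of this calculation, illustrated in Python (the name p5 is ours; the book's own computations use Matlab):

```python
import math

def p5(x):
    """Degree-5 Taylor polynomial of sin about 0."""
    return x - x ** 3 / math.factorial(3) + x ** 5 / math.factorial(5)

estimate = p5(0.1)
bound = 0.1 ** 6 / math.factorial(6)   # |R_5(0.1)| <= (0.1)^6/6! since |sin u| <= 1

# The true value lies within the Lagrange bound of the estimate, and here
# the remainder is negative, so the polynomial over-shoots very slightly.
assert abs(math.sin(0.1) - estimate) <= bound
assert math.sin(0.1) <= estimate
```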



Activity 5.I Do problems 5.14–5.16 from Exercises 5.3.4.

5.3.4 Exercises

Ex. 5.14: Bound the error on the Taylor polynomial P_2(x) (about x = 0) as
an approximation to 1/(1 + x)^2 over the interval −1/2 < x < 1/2. What
would the bound be if we were only interested in 0 ≤ x < 1/2?

Ex. 5.15: Use a Taylor polynomial of degree 2 about x = 0 for e^x to estimate
e^{0.1} and bound the accuracy of the approximation using Lagrange’s
remainder.

Ex. 5.16: Use a Taylor polynomial of degree 3 to estimate f(x) = e^{2x} at
x = 0.1, and use Lagrange’s remainder theorem to determine an error
bound for your estimate.

Ex. 5.17: Use a Taylor polynomial of degree 4 about x = 0 for log(1 + x) to
estimate log(1.2) and bound the accuracy of the approximation using
Lagrange’s remainder (note: log denotes the natural logarithm).
Lagrange’s remainder (note: log denotes the natural logarithm).



Ex. 5.18: Find the Maclaurin series for the function f(x) = arctan x and
determine its radius of convergence. Hint: the Maclaurin series for
1/(1 + x^2) = 1 − x^2 + x^4 − x^6 + · · · .

Ex. 5.19: Consider the function defined by the infinite series

    g(x) = ∑_{n=1}^∞ [ 1/(n 2^n) + ((−1)^n + 1)/2^n ] (x + 1)^n .

Find the region in which this series converges.



5.4 Taylor’s theorem in n-dimensions

It is useful to generalise Taylor’s result to functions of several variables. An
outline of the three variable case is presented below, from which generalisation
to other cases is straightforward.

Main aims:

• generalise Taylor series to many independent variables;

• use this generalisation to find and characterise maxima and minima of
  functions of many variables.

Given a function f(x, y, z) we seek an expansion for f(x + h, y + p, z + q)
at some ‘nearby’ point, where the expansion is written in terms of f and its
derivatives and powers of h, p and q.

Exercise 5.20: By setting x − c = h in equation (5.2) show that the Taylor
series of a real function, f(x), centred at x can be written

    f(x + h) = f(x) + h f′(x) + h^2/2! f′′(x) + · · · + h^n/n! f^{(n)}(x) + · · ·

assuming, of course, that |h| < R, the radius of convergence of the
power series at x.



The implication of this expansion is that the value of an analytic function
at points x + h ‘nearby’ to x is entirely determined by the values
of f and its derivatives at the point x, and the separation, h. This is
useful, particularly if the radius of convergence about x is not small.
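This shift property can be sketched numerically: given only the derivatives of f at x, we can recover f(x + h). A Python illustration (the function name taylor_shift is ours; for f = exp every derivative at x equals e^x, making the check easy):

```python
import math

def taylor_shift(derivs_at_x, h):
    """Evaluate f(x+h) from derivatives of f at x: sum_n h^n/n! f^(n)(x)."""
    return sum(d * h ** n / math.factorial(n)
               for n, d in enumerate(derivs_at_x))

# For f = exp, every derivative at x equals e^x, so the list is constant.
x, h = 1.0, 0.3
derivs = [math.exp(x)] * 30
assert abs(taylor_shift(derivs, h) - math.exp(x + h)) < 1e-9
```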

Outline of a Taylor series for a function of three variables: begin by
using the single variable Taylor expansion derived in Exercise 5.20.

• First, vary x only, holding y and z constant; then

      f(x + h, y + p, z + q) = f + h ∂f/∂x + h^2/2! ∂^2f/∂x^2 + h^3/3! ∂^3f/∂x^3 + · · ·

  where f and all derivatives are evaluated at (x, y + p, z + q). (Since
  only one variable changes, all derivatives are partial derivatives.)

• Now hold x and z constant in this series and do the expansion for y + p;
  f and its derivatives are now evaluated at (x, y, z + q).

• Now hold x and y constant and do the expansion for z + q. Collect together
  all terms with the same total order of differentiation and obtain
  the following result.
the following result.



f(x + h, y + p, z + q)
    = ( f + h ∂f/∂x + p ∂f/∂y + q ∂f/∂z )
    + 1/2! ( h^2 ∂^2f/∂x^2 + p^2 ∂^2f/∂y^2 + q^2 ∂^2f/∂z^2
             + 2hp ∂^2f/∂x∂y + 2hq ∂^2f/∂x∂z + 2pq ∂^2f/∂y∂z )
    + 1/3! ( h^3 ∂^3f/∂x^3 + 2 similar terms + 3h^2 p ∂^3f/∂x^2∂y
             + 5 similar terms + 6hpq ∂^3f/∂x∂y∂z )
    + · · ·

where f and all its derivatives are evaluated at (x, y, z). This is expressed
more compactly in terms of the displacement vector H = h i + p j + q k as:

    f(x + h, y + p, z + q) = f + (H · ∇)f + 1/2! (H · ∇)^2 f + 1/3! (H · ∇)^3 f
                             + · · · + 1/n! (H · ∇)^n f + · · ·

where

    H · ∇ ≡ ( h ∂/∂x + p ∂/∂y + q ∂/∂z )

and (H · ∇)^n f means do the operation H · ∇ to f, then to the result, then
to the result of that, etc., until the operation has been done n times.
(Recall from first year mathematics that the gradient of f is
∇f = i ∂f/∂x + j ∂f/∂y + k ∂f/∂z.)

Our work on extrema requires only the terms up to second order.



Example 5.21: Find up to the second-order terms of the multi-variable
Taylor series of f(x, y) = cos x e^{2y} about (x, y) = (0, 0).

Solution: “Up to the second-order terms” includes (H · ∇)^2 f but
excludes all third derivative terms. Now, using subscripts to denote
partial differentiation:

• f(0, 0) = 1;
• f_x = −sin x e^{2y} so f_x(0, 0) = 0;
• f_y = 2 cos x e^{2y} so f_y(0, 0) = 2;
• f_xx = −cos x e^{2y} so f_xx(0, 0) = −1;
• f_xy = −2 sin x e^{2y} so f_xy(0, 0) = 0;
• f_yy = 4 cos x e^{2y} so f_yy(0, 0) = 4.

Hence the second-order truncation of the Taylor series is

    f(h, p) ≈ f + (h f_x + p f_y) + 1/2 ( h^2 f_xx + 2hp f_xy + p^2 f_yy )
            = 1 + 2p − 1/2 h^2 + 2p^2 .

Note: as f(x, y) is the product of a function of x and a function of y,
namely cos x and e^{2y}, this answer is quite sensibly the product of the
two single variable, second-order Taylor polynomials, namely 1 − x^2/2
and 1 + 2y + 2y^2.
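The quadratic truncation of Example 5.21 can be tested numerically: near the origin its error should be third order in the displacement. A Python sketch (illustration only; the names f and quadratic_approx, and the rough third-order bound, are ours):

```python
import math

def f(x, y):
    """The function of Example 5.21."""
    return math.cos(x) * math.exp(2 * y)

def quadratic_approx(h, p):
    """Second-order Taylor truncation of cos(x) e^(2y) about (0, 0)."""
    return 1 + 2 * p - h ** 2 / 2 + 2 * p ** 2

# For these small displacements the error stays below (|h|+|p|)^3,
# consistent with a third-order remainder.
for h, p in ((0.1, 0.05), (-0.05, 0.1), (0.02, -0.03)):
    assert abs(f(h, p) - quadratic_approx(h, p)) < (abs(h) + abs(p)) ** 3
```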



Activity 5.J Do Problem 5.23 in Exercises 5.4.2 [p180]. Send in to the
examiner for feedback at least part (b).

5.4.1 Identify local maxima and minima

The 3D surface plotted in the following graph contains several peaks and
a trough. The highest peak is a global maximum, the trough is a global
minimum, and the two smaller peaks are called local maxima. Collectively,
such points are known as extrema. A local maximum is higher than all points
nearby, but a global maximum is the highest of all points on the surface.
Minima are defined analogously.



[Figure: a 3D surface with several peaks and a trough. Drawn in Matlab by:]

    surfc(peaks(40)+8)

The location and study of extrema is frequently important. For example,
suppose the height z of the surface above the xy-plane represents the
temperature of a chemical reaction as quantities x and y of two reactants are
added; it may be essential to know how high or low the temperature can go
in order to properly contain the reaction.

Mathematically, a 3D surface is represented explicitly as z = f(x, y), or
implicitly by F(x, y, z) = C for some constant C. In first-year mathematics
courses we saw that local extrema occur at stationary points where

    ∂f/∂x = ∂f/∂y = 0 ,



so that all directional derivatives of f vanish at a stationary point, or
equivalently the tangent plane to the surface is horizontal, which means that
the normal to the surface must be in the same direction as the z-axis: that is,
parallel, ∇F ‖ k. There are stationary points, called saddle points, which
satisfy these conditions but are neither minima nor maxima. In the following
figure the origin (0, 0, 0) is a saddle point. In the plane x = 0, moving
along the dashed line, the origin appears to be a local maximum, but in the
plane y = 0, along the solid line, a local minimum. The behaviour of nearby
points depends on the direction in which (0, 0, 0) is approached, which defines
a saddle point. It is neither a local minimum nor local maximum.

[Figure: the saddle surface z = 2x^2 − 5y^2 over −5 ≤ x, y ≤ 5, with the
section in the plane x = 0 dashed and the section in the plane y = 0 solid.
Drawn in Matlab by:]

    x=linspace(-5,5); y=x;
    [X Y]=meshgrid(x,y);
    Z=2*X.^2-5*Y.^2;
    surfl(X,Y,Z)
Activity 5.K Do Problem 5.24 in Exercises 5.4.2.



Algebraically, extrema are characterised using Taylor’s formula in n-dimensions.
For example in 2-D, suppose (a, b) is a local extremum of f(x, y); then compare
the value of f(a, b) with nearby points f(a + h, b + p), where h, p are
small:

• if all nearby values of f are greater than f(a, b) then (a, b) is a local
  minimum;

• if all nearby values of f are less than f(a, b) then (a, b) is a local
  maximum;

• otherwise (a, b) is a saddle point.

Taylor’s theorem gives

    f(a + h, b + p) = f(a, b) + h f_x(a, b) + p f_y(a, b)
        + 1/2! ( h^2 f_xx(a, b) + p^2 f_yy(a, b) + 2hp f_xy(a, b) )
        + higher order terms.

(Subscripts x and y to a function f are used to denote partial derivatives
with respect to the subscript variable.)

Now f_x(a, b) = f_y(a, b) = 0, since (a, b) is an extremum, and terms which
are cubic and higher order in (h, p) are negligible compared to the quadratic
term, so

    f(a + h, b + p) − f(a, b) ≈ 1/2 Q(h, p)    (5.4)



where the quadratic terms

    Q = f_xx h^2 + 2 f_xy hp + f_yy p^2
      = [ h p ] [ f_xx  f_xy ] [ h ]
                [ f_yx  f_yy ] [ p ]
      = h^T H h ,    (5.5)

where all the second-order derivatives are evaluated at (a, b), and where the
vector h = (h, p).

Definition 5.4 In (5.5) Q(h) has been written as the quadratic form Q = h^T H h:

• h^T H h is called the Hessian² of f at the point (a, b);

• the symmetric matrix H of second derivatives is called the Hessian
  matrix;

• such a quadratic form, Q, is said to be positive definite if Q(h) > 0 for
  all h ≠ 0;

• and is said to be negative definite if Q(h) < 0 for all h ≠ 0.

From (5.4):

• if Q is positive definite then f(a + h, b + p) − f(a, b) > 0 (at least near
  enough to (a, b)) and so (a, b) is a local minimum;
enough to (a, b)) and so (a, b) is a local minimum;<br />

2 Ludwig Otto Hesse introduced these in 1884.



• if Q is negative definite then (a, b) is a local maximum;

• otherwise, (a, b) could be a saddle point, but it could also mean that we
  need information from the “higher order terms” neglected in forming
  the approximation (5.4).

Observe that the Hessian matrix, in n-D,

    H = [ ∂^2f/∂x_i∂x_j ]

      = [ ∂^2f/∂x_1^2      ∂^2f/∂x_1∂x_2   · · ·   ∂^2f/∂x_1∂x_n ]
        [ ∂^2f/∂x_2∂x_1    ∂^2f/∂x_2^2     · · ·   ∂^2f/∂x_2∂x_n ]
        [      ...              ...         ...         ...      ]
        [ ∂^2f/∂x_n∂x_1    ∂^2f/∂x_n∂x_2   · · ·   ∂^2f/∂x_n^2   ]

(evaluated at a stationary point) is symmetric and so has real eigenvalues
and orthogonal eigenvectors. Recall from first-year mathematics that we can
thus diagonalise H = P D P^T, where the columns of P are the normalised
eigenvectors of H and where the matrix D is diagonal with the eigenvalues
of H along its diagonal. (See Kreyszig §7.5 [K,p392–8] for another summary
of diagonalisation.) Make a change of variable so that the axes of the
r = (r, s) coordinate system are aligned along the principal directions of the
quadratic Q. An example is seen in the graph below where the r and s axes
are chosen to fit to the nature of the quadratic (whose contours are shown)
with Hessian matrix

    H = [ −8  4 ]
        [  4 −4 ] .



[Figure: contours of the quadratic Q in the (h, p) plane, with the rotated
r and s axes aligned along the principal directions of Q.]

The appropriate change of variable is

    r = P^T h , equivalently h = P r ,

so that in the new coordinate system the quadratic simplifies to give

    Q = h^T H h = r^T P^T H P r = r^T D r .

But D is diagonal with diagonal entries the eigenvalues of H: namely D =
diag(λ_1, . . . , λ_n) in n dimensions. Thus in the r coordinate system the
quadratic is

    Q = λ_1 r_1^2 + · · · + λ_n r_n^2 .    (5.6)

From this we readily deduce the shape of the quadratic and hence the nature
of the stationary point:

• if all eigenvalues of H are positive then Q is positive definite, as all
  terms in (5.6) are positive, and the stationary point is a local minimum;

• if all eigenvalues are negative then Q is negative definite, as all terms
  in (5.6) are negative, and the stationary point is a local maximum;

• if some eigenvalues are positive and some are negative then the stationary
  point is a saddle point, as we can increase the value of Q by
  moving in some directions and decrease the value of Q by moving in
  other directions;

• lastly, if the eigenvalues are all positive or all negative except some that
  are precisely zero, then the neglected higher order terms in f need to
  be taken into account.
be taken into account.



Example 5.22: analyse the behaviour of z = f(x, y) = x^3 + 4xy − 2y^2 + 8
at its stationary points.

Before beginning the analysis, Matlab draws the following surface z =
f(x, y):

[Figure: the surface z = x^3 + 4xy − 2y^2 + 8 over −3 ≤ x ≤ 2, −3 ≤ y ≤ 2.]



Solution:<br />

First find the stationary points:<br />

∂f<br />

∂x = 3x2 + 4y<br />

∂f<br />

∂y<br />

= 4x − 4y<br />

setting both <strong>of</strong> these equal to 0 gives x = y and 3x 2 +4x = x(3x+4) = 0.<br />

So the stationary points are (0, 0) and (− 4, − 4 ). Now find the second<br />

3 3<br />

order derivatives:<br />

Thus<br />

∂ 2 f<br />

∂x 2 = 6x ,<br />

∂ 2 f<br />

∂y 2 = −4 ,<br />

∂ 2 f<br />

∂x∂y = 4 .<br />

(0, 0) the Hessian matrix is<br />

H =<br />

[<br />

0 4<br />

4 −4<br />

]<br />

and hence the characteristic polynomial is<br />

|λI − H| = λ 2 + 4λ − 16 .<br />

This is an upwards parabola which is −16 at λ = 0 and hence<br />

t<strong>here</strong> must be one 0 for negative λ and one for positive λ. Hence<br />

the two eigenvalues have opposite sign and so (0, 0) is a saddle<br />

point.


Module 5. The nature <strong>of</strong> infinite series 180<br />

(−4/3, −4/3) the Hessian matrix is<br />

H = [ −8  4 ]<br />
    [  4 −4 ]<br />

and hence the characteristic polynomial is<br />

|λI − H| = λ^2 + 12λ + 16 .<br />

This is an upwards parabola which is +16 at λ = 0 and hence<br />

both zeros have to occur for λ of the same sign. Since the slope of the<br />

parabola is positive when λ = 0, namely 12, then both zeros occur<br />

for negative λ. Hence both (all) eigenvalues are negative and so<br />

(−4/3, −4/3) is a local maximum.<br />
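The two classifications above are easily checked by machine. The following is a minimal sketch (an illustration, not part of the study book) that classifies a stationary point of a two-variable function from the determinant and trace of its 2 × 2 Hessian, since for a symmetric 2 × 2 matrix the eigenvalues have product det H and sum trace H:

```python
def classify(fxx, fyy, fxy):
    """Classify a stationary point from its 2x2 Hessian [[fxx, fxy], [fxy, fyy]].

    The two eigenvalues multiply to det(H) and sum to trace(H), so their
    signs follow from the determinant and trace alone."""
    det = fxx * fyy - fxy ** 2
    if det < 0:
        return "saddle point"      # eigenvalues of opposite sign
    if det > 0:
        return "local minimum" if fxx + fyy > 0 else "local maximum"
    return "degenerate"            # a zero eigenvalue: need higher-order terms

# Hessians of f(x, y) = x^3 + 4xy - 2y^2 + 8 from Example 5.22:
print(classify(0, -4, 4))    # at (0, 0) → saddle point
print(classify(-8, -4, 4))   # at (-4/3, -4/3) → local maximum
```

This reproduces the conclusions reached by hand above.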

Activity 5.L Do problems 5.24–5.25 from Exercises 5.4.2. Send in to the<br />

examiner for feedback at least Ex. 5.25(a) and (d).<br />

5.4.2 Exercises<br />

Ex. 5.23: Find up to the second-order terms of the multi-variable Taylor<br />

series of the following functions about the specified points:



(a) f(x, y) = cos x e^{2y} about (π/2, 0);<br />

(b) f(x, y) = (x + y)/(1 + y) about (0, 0);<br />

(c) f(x, y, z) = e^x √(1 + y^2 + z^2) about (0, 2, 2).<br />

Ex. 5.24: If z = f(x, y) = 2x^2 − 5y^2 show that f_x(0, 0) = f_y(0, 0) = 0 and<br />

hence that (0, 0) is a stationary point.<br />

Re-write the equation for the surface in the form F(x, y, z) = C, for<br />

some constant C, and show that ∇F ‖ k at (0, 0, 0), proving again it<br />

is a stationary point.<br />

Ex. 5.25: Find the stationary points of the given functions and then determine<br />

whether they are local maxima, local minima, or saddle points.<br />

(a) f(x, y) = x^2 − y^2 + xy<br />

(b) f(x, y) = x^2 + y^2 − xy<br />

(c) f(x, y) = x^2 − 3xy + 5x − 2y + 6y^2 + 8<br />

(d) f(x, y) = log(x^2 + y^2 + 1).<br />

(e) f(x, y) = x^5 y + xy^5 + xy .<br />

Ex. 5.26: Find the three stationary points of f(x, y) = x^2 + y^2 + 2 cos(x + y)<br />

and classify the stationary point at the origin.<br />

Ex. 5.27: Analyse the behaviour of f(x, y) = x^3 + 6xy + 3y^2 + 5 at its<br />

stationary points.



5.4.3 Answers to selected Exercises<br />

5.18 x − (1/3)x^3 + (1/5)x^5 − · · · with radius of convergence 1.<br />

5.19 −3 < x < 1<br />

5.23 (a) f(x, y) ≈ −x − 2xy<br />

(b) x + y − xy − y^2<br />

(c) 3 + 3x + (2/3)y + (2/3)z + (3/2)x^2 + (5/54)y^2 + (5/54)z^2 + (2/3)xy + (2/3)xz − (4/27)yz<br />

5.24 For example F = z − 2x^2 + 5y^2 = 0 whence ∇F = −4xi + 10yj + k = k<br />

at x = y = z = 0.<br />

5.25 (a) (0, 0), a saddle point<br />

(b) (0, 0), a local minimum<br />

(c) (−18/5, −11/15), a local minimum<br />

(d) (0, 0), a local minimum<br />

(e) (0, 0), a saddle point.



5.5 Summary<br />

• Tests like the Comparison test, Ratio test and Root test (§§5.2.2) are<br />

useful in determining whether a given infinite series converges or diverges,<br />

but they do not establish what its sum may be. For that it may<br />

be necessary to use direct arguments based on partial sums (§§5.1.2),<br />

or to resort to direct numerical evaluation.<br />

• Infinite series that converge absolutely are “robust”, whereas series<br />

that converge conditionally rely on delicate cancellation of terms in the<br />

series (§§5.2.1).<br />

• Power series are complex/real infinite series with terms involving increasing<br />

powers of a complex/real variable z/x, around a given centre,<br />

c (§5.3). Generally, they converge (absolutely) within a disc of the<br />

complex plane, or an interval of the real line, centred at c. The radius<br />

of this disc is called the radius of convergence, but convergence is not<br />

guaranteed (and conditional at best) on the edge of the disc.<br />

• Power series are used to define functions which are continuous, differentiable<br />

and integrable within their radii of convergence (5.3.1). Conversely,<br />

a given analytic function, f(z), can be represented by a power<br />

series expansion about some centre, c, and this expansion is unique,<br />

being the Taylor series of f about c, or when c = 0, the Maclaurin<br />

series (§§5.3.2).



• Truncated Taylor series, or Taylor polynomials, are used to compute<br />

approximate values for functions (§§5.3.3). The accuracy of these approximations<br />

may be estimated with Lagrange’s remainder (5.3).<br />

• Taylor series are generalised to functions of more than one variable<br />

(§5.4). This is used, for example, in describing the nature of the stationary<br />

points of functions of several variables, where the first-order<br />

derivatives vanish (§§5.4.1). Such points will be local minima, local<br />

maxima or saddle points depending upon the eigenvalues of the Hessian<br />

matrix of second-order derivatives.<br />

Activity 5.M Do representatives of Problems 1–5 and 16–35 from the Chapter<br />

14 Review [K,pp767–8].


Module 6<br />

Series solutions of differential<br />

equations give special functions<br />

“Although this may seem a paradox, all exact science is dominated<br />

by the idea of approximation”<br />

Bertrand Russell<br />

We have seen how linear ordinary differential equations (ode’s) are solved if<br />

they have constant coefficients (Module 1). Higher-order ode’s are first represented<br />

as linear systems of first-order ode’s and then the general solution<br />

will be typically of the form (1.4). The power series solution is the standard<br />

method for solving linear ordinary differential equations with variable


Module 6. Series solutions of differential equations give special functions 186<br />

coefficients. It gives solutions in the form of power series, hence the name.<br />

Power series are also the paramount method for solving otherwise intractable<br />

nonlinear differential equations.<br />

How do variable coefficients arise in differential equations? Perhaps it is best<br />

to first explain how constant coefficients arise. Constant coefficients arise<br />

because one part of space looks very much like another; thus the mathematical<br />

expression of the processes at each point in space is the same and hence<br />

the differential equation modelling the processes is everywhere the same. We<br />

saw this in earlier modules on continuum mechanics. Conversely, differential<br />

equations with variable coefficients arise when different points in space have<br />

different properties. I give two examples:<br />

• look at the waves near a beach. They curve in towards the beach,<br />

steepen and break. Let x measure distance from the beach and h(x)<br />

denote the depth of the water (small near the shore and larger further<br />

away), then the height, y(x), of the waves satisfies a differential equation<br />

of the form h(x)y′′ + h′(x)y′ = · · · with coefficients depending upon<br />

the local water depth;<br />

• in finance, the Black–Scholes equation is used to estimate the current<br />

value of future transactions (see the course on Advanced Mathematics).<br />

Letting s denote the price of a stock, then the value v(s) satisfies a<br />

differential equation of the form rsv′ + (1/2)β^2 s^2 v′′ = · · · where r is the bank<br />

interest rate and β measures how volatile the stock is. The variable<br />

coefficients, rs and (1/2)β^2 s^2, arise because returns are relative to the<br />

investment.



These are two examples of where variable coefficient differential equations<br />

arise. This module supplies tools for the analytic solution of such variable<br />

coefficient differential equations.<br />

In this module we develop not only the general principles and methods, but<br />

also apply them to differential equations that commonly arise in physical<br />

problems. In practice all we do is simply try a power series solution and<br />

see what solutions we obtain, §6.1. This works except when the coefficient<br />

of the highest derivative is zero; in this case we are more inventive, §6.2.<br />

The solutions of these important differential equations have special properties<br />

that make them widely useful, though perhaps not quite so useful as<br />

trigonometric and exponential functions. Called Legendre polynomials and<br />

Bessel functions, these are examples of a wide class of special functions. Inspired<br />

by these examples we then develop Sturm–Liouville theory, in §6.4, to<br />

tell us useful and general properties about the solutions of a wide class of<br />

differential equations.<br />

We also introduce a little computer algebra, §6.3, to help with the repetitive<br />

analysis of this module and to attack nonlinear ode’s.<br />

Module contents<br />

6.1 Power series method leads to Legendre polynomials 189<br />

6.1.1 Introduction to the power series method . . . . . . . . . 190<br />

6.1.2 Legendre’s equation and Legendre polynomials . . . . . 192



6.2 Frobenius method is needed to describe Bessel functions<br />

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195<br />

6.2.1 Frobenius extends the method . . . . . . . . . . . . . . 196<br />

6.2.2 Bessel functions are used in circular geometries . . . . . 201<br />

6.3 Computer algebra for repetitive tasks . . . . . . . . . 206<br />

6.3.1 Introducing reduce . . . . . . . . . . . . . . . . . . . . 208<br />

6.3.2 Introduction to the iterative method . . . . . . . . . . . 211<br />

6.3.3 Iteration is very flexible . . . . . . . . . . . . . . . . . . 222<br />

6.3.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 239<br />

6.3.5 Summary of some reduce commands . . . . . . . . . . 242<br />

6.4 The orthogonal solutions to second order differential<br />

equations . . . . . . . . . . . . . . . . . . . . . . . . . . 245<br />

6.4.1 Answers to selected Exercises . . . . . . . . . . . . . . . 247<br />

6.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 249



6.1 Power series method leads to Legendre polynomials<br />

In this first section we introduce the fundamental ideas of the power series<br />

method. These ideas are applied to standard differential equations that we<br />

could readily solve other ways. Do not be misled: this is only so that we can<br />

compare the results to the known solutions. The power series method is very<br />

powerful and is applied to even immensely difficult mathematical problems.<br />

Main aims:<br />

• use the uniqueness of power series representations to derive power series<br />

solutions of differential equations;<br />

• see how the method leads to linearly independent power series solutions;<br />

• find the polynomial solutions of Legendre’s equation as an example of<br />

the method.<br />

Note: in this module we will generally seek a solution y as a function of the<br />

independent variable x.



6.1.1 Introduction to the power series method<br />

To find power series solutions to differential equations we simply substitute<br />

a power series and see the logical consequences. In particular, see how neatly<br />

we get two linearly independent solutions <strong>of</strong> a second order ode.<br />

Reading 6.A Study Kreyszig §4.1 [K,pp194–8] and note especially how the<br />

examples work.<br />

Activity 6.B Do problems from Problem Set 4.1 [K,p198]—find the general<br />

solutions in terms <strong>of</strong> arbitrary “integration” constants. Verify for a few<br />

<strong>of</strong> these that the power series method yields the Taylor series expansion<br />

<strong>of</strong> the general analytic solution obtained by well known methods. Send<br />

in to the examiner for feedback at least Q1 & 7.<br />

Most <strong>of</strong> the theoretical basis for using power series to represent functions was<br />

developed in Module 5.<br />

Reading 6.C Read §4.2 [K,pp198–204], but make sure you review the sections<br />

on Shifting summation indices [K,pp202–3] and Existence of power<br />

series solutions [K,pp203–4].<br />

Four important points are the following.



• By the uniqueness of power series coefficients, the zero function must<br />

have zero coefficients. Thus when we compute the left-hand side of a<br />

differential equation as a power series and the right-hand side is zero,<br />

then the coefficient of each power on the left-hand side has to be zero.<br />

This determines the equations for the power series coefficients.<br />

• Being able to shift summation indices is an important skill to learn in<br />

order to quickly develop power series solutions.<br />

• Power series solutions to linear ordinary differential equations exist<br />

and converge for some non-zero radius provided that the coefficient<br />

functions of the differential equation are well-behaved: namely they<br />

can all be expanded in convergent Taylor series and the coefficient of<br />

the highest derivative in the ode does not vanish at the expansion<br />

point.<br />

• Well-behaved functions are called analytic.<br />

Example 6.1: shifting summation indices Perhaps the easiest way to<br />

learn how to shift summation indices is to: write out the first few terms<br />

in the sum; then rewrite as a new sum in the desired form. Usually<br />

the aim is to make the exponent of x the variable of summation. For<br />

example, consider the second derivative<br />

y′′ = Σ_{m=0}^{∞} m(m − 1)a_m x^{m−2} ;



writing out the first 7 terms,<br />

= 0 + 0 + 2·1·a_2 + 3·2·a_3 x + 4·3·a_4 x^2<br />

+ 5·4·a_5 x^3 + 6·5·a_6 x^4 + · · ·<br />

then, rewriting in terms of the exponent of x,<br />

= Σ_{m=0}^{∞} (m + 2)(m + 1)a_{m+2} x^m .<br />

We may use the same summation variable m, or something different<br />

if we wish, because m is a parameter to the sum: it has no meaning<br />

outside of the sum in which it is used, and thus is allowed to mean<br />

different things in different sums.<br />
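As a quick sanity check (an illustration, not from the study book), the shifted and unshifted forms of the sum can be compared numerically for an arbitrary, hypothetical truncated coefficient sequence:

```python
# Compare sum_{m>=2} m(m-1) a_m x^(m-2) with the index-shifted form
# sum_{m>=0} (m+2)(m+1) a_{m+2} x^m for a made-up finite sequence a_0..a_6.
a = [3.0, -1.0, 0.5, 2.0, -0.25, 1.5, 4.0]
x = 0.7

unshifted = sum(m * (m - 1) * a[m] * x ** (m - 2) for m in range(2, len(a)))
shifted = sum((m + 2) * (m + 1) * a[m + 2] * x ** m for m in range(len(a) - 2))

print(abs(unshifted - shifted) < 1e-12)  # → True: the two forms agree
```

The two sums contain exactly the same terms, merely indexed differently, so they agree to rounding error.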

Activity 6.D Do problems from Problem Set 4.2 [K,pp204–5]. Send in to<br />

the examiner for feedback at least Q5, 15 & 23.<br />

6.1.2 Legendre’s equation and Legendre polynomials<br />

In many applications of mathematics we often have a need to solve problems<br />

in a spherical domain or on the surface of a sphere. This might be because we<br />

study the internal dynamics of a star, the weather in the global atmosphere,



the dynamics of a ball, or the deformation of a near spherical drop of water.<br />

In all these cases the differential equations describing the material take a<br />

similar form because of the spherical symmetry. This form leads to Legendre’s<br />

equation,<br />

(1 − x^2)y′′ − 2xy′ + n(n + 1)y = 0 , (6.1)<br />

whose solutions we now explore using the techniques of power series.<br />

When solving problems on a sphere such as the earth: x = sin(latitude)<br />

so that x = ±1 corresponds to the North and South poles and x = 0 the<br />

equator. Consequently, in applications we require that the solutions are well<br />

behaved (analytic) at x = ±1. See that this is an essential ingredient in the<br />

analysis.<br />

We will concentrate on the solutions of Legendre’s equation for integer n.<br />

For example:<br />

n = 1 y = P_1(x) = x satisfies (1 − x^2)y′′ − 2xy′ + 2y = 0 ;<br />

n = 2 y = P_2(x) = (1/2)(3x^2 − 1) satisfies (1 − x^2)y′′ − 2xy′ + 6y = 0 .<br />

But what is the other independent solution for each case? And what about<br />

other values of n?<br />

(The classic differential equations in spherical geometry will be derived and<br />

discussed in the course on Vector calculus and partial differential equations.)<br />

Reading 6.E Study Kreyszig §4.3 [K,pp205–8]. Note that Legendre polynomials<br />

arise as solutions when the parameter n to Legendre’s equation<br />

is integral, n ≥ 0.



Legendre polynomials and associated Legendre functions are readily computed<br />

with Matlab. See below for code to plot Legendre polynomials.<br />

x=linspace(-1,1);<br />
pp=[];<br />
for n=1:4<br />
  p=legendre(n,x);<br />
  pp(n,:)=p(1,:);<br />
end<br />
plot(x,pp)<br />

[Figure: plot of the Legendre polynomials P_1(x), P_2(x), P_3(x) and P_4(x) for −1 ≤ x ≤ 1.]<br />
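The first few Legendre polynomials can also be generated without Matlab, via Bonnet's recurrence (k + 1)P_{k+1}(x) = (2k + 1)x P_k(x) − k P_{k−1}(x). A minimal pure-Python sketch (an illustration, not part of the study book):

```python
def legendre_P(n, x):
    """Evaluate the Legendre polynomial P_n(x) by Bonnet's recurrence:
    (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}, starting from P_0 = 1, P_1 = x."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

# P_2(x) should match the (1/2)(3x^2 - 1) quoted earlier for n = 2:
print(legendre_P(2, 0.5))          # → -0.125
print((3 * 0.5 ** 2 - 1) / 2)      # → -0.125
```

The recurrence is exactly how numerical libraries build these polynomials for plotting.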

Activity 6.F Do problems 1–9, 11 & 12 from Problem Set 4.3 [K,pp209–10].<br />

Send in to the examiner for feedback at least Q1, 4 & 8.



6.2 Frobenius method is needed to describe Bessel<br />

functions<br />

We are often interested in mathematically formulating and solving problems<br />

in a circular geometry, for example: the vibrations of a drum; the development<br />

of blood flow along nearly circular arteries and veins; and propagation of<br />

light down an optical fibre. In these circumstances we use polar coordinates<br />

(r, θ) to describe the cross-sectional structures in these circular domains.<br />

Then the unknown fields, say u, are expressed as u = f(r) cos nθ where integer<br />

n parametrises the structure around the circular domain, whence we are<br />

led to solve ode’s for f(r) of the form<br />

f′′ + (1/r)f′ − (n^2/r^2)f = 0 .<br />

Not only does such an equation have variable coefficients, it also has badly<br />

behaved coefficients as r → 0, the very centre of the domain! In this section<br />

we extend the power series method to cope with these interesting sorts of<br />

problems.<br />

(Such solutions of partial differential equations are developed in the course mat2102.)<br />

Main aims:<br />

• generalise the power series method to cope with singular differential<br />

equations via the indicial equation;



• see how the different cases that can arise lead to the different Bessel<br />

function solutions of Bessel’s equation.<br />

6.2.1 Frobenius extends the method<br />

The key to analysing such more general problems, called the Frobenius method,<br />

is to seek a power series in a slightly more general form. For a problem expressed<br />

as an ode for y(x) all we need do is to introduce a prefactor to the<br />

power series of x^r where r is some real or complex number to be determined<br />

as needed.¹ That is, we seek solutions in the form<br />

y(x) = x^r Σ_{m=0}^{∞} a_m x^m = x^r (a_0 + a_1 x + a_2 x^2 + a_3 x^3 + · · ·) . (6.2)

Example 6.2: Find the first few terms in a generalised power series solution<br />

to the ode 4x^2 y′′ + x^2 y′ + y = 0 expanded about the centre x = 0.<br />

Solution:<br />

Substitute the more general power series form<br />

y(x) = a_0 x^r + a_1 x^{r+1} + a_2 x^{r+2} + · · · ,<br />

¹ In more tricky problems still we may resort to not only having a prefactor of x^r, but<br />

also expanding in non-integral powers of x. Trying y(x) = Σ_{m=0}^{∞} a_m x^{r+qm} for some real<br />

or complex r and q is very powerful. But we will not explore this.



noting that its derivatives are<br />

y′ = r a_0 x^{r−1} + (r + 1)a_1 x^r + (r + 2)a_2 x^{r+1} + · · ·<br />

y′′ = r(r − 1)a_0 x^{r−2} + (r + 1)r a_1 x^{r−1} + (r + 2)(r + 1)a_2 x^r + · · · ,<br />

then the ode becomes<br />

4r(r − 1)a_0 x^r + 4(r + 1)r a_1 x^{r+1} + 4(r + 2)(r + 1)a_2 x^{r+2} + · · ·<br />

+ r a_0 x^{r+1} + (r + 1)a_1 x^{r+2} + · · ·<br />

+ a_0 x^r + a_1 x^{r+1} + a_2 x^{r+2} + · · · = 0 .<br />

As before, the fundamental principle is that the complicated generalised<br />

power series on the left-hand side can only be equal to the zero on<br />

the right-hand side if all the coefficients of each power of x vanish.<br />

Grouping all terms in x^r, x^{r+1} and x^{r+2} we must have:<br />

[4r(r − 1) + 1] a_0 = 0 ,<br />

[4(r + 1)r + 1] a_1 + r a_0 = 0 ,<br />

and [4(r + 2)(r + 1) + 1] a_2 + (r + 1)a_1 = 0 .<br />

• Now, without loss of generality we may assume that a_0 ≠ 0.² Thus<br />

we arrive at the indicial equation for r, that 4r(r − 1) + 1 = 0.<br />

This is simply a quadratic for r which factors to (2r − 1)^2 = 0,<br />

thus r = 1/2 and the prefactor to the power series must be simply<br />

√x. a_0 is not constrained (other than being non-zero).<br />

² If a_0 = 0 then we are effectively seeking a power series of the form y = x^{r+1}(a_1 +<br />

a_2 x + · · ·) which is not any different in principle.



• The second equation above, from coefficients of x^{r+1}, says that<br />

a_1 = −r a_0/[4(r + 1)r + 1]. But we know r = 1/2 and hence this<br />

determines a_1 = −a_0/8.<br />

• Similarly, the third equation above, from coefficients of x^{r+2}, says<br />

that a_2 = −(r + 1)a_1/[4(r + 2)(r + 1) + 1]. Hence a_2 = −3a_1/32 =<br />

+3a_0/256.<br />

Thus a power series solution to the ode is<br />

y_1(x) = a_0 √x (1 − (1/8)x + (3/256)x^2 + · · ·) ,<br />

where a_0 is an arbitrary constant.<br />
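The coefficient relations above extend to all orders: the coefficient of x^{r+m} gives [4(r + m)(r + m − 1) + 1]a_m + (r + m − 1)a_{m−1} = 0 for m ≥ 1. A short sketch (an illustration, not from the study book) iterating this recurrence exactly with r = 1/2:

```python
from fractions import Fraction

# Recurrence for 4x^2 y'' + x^2 y' + y = 0 with y = x^r sum a_m x^m, r = 1/2:
# [4(r+m)(r+m-1) + 1] a_m + (r+m-1) a_{m-1} = 0   for m >= 1.
r = Fraction(1, 2)
a = [Fraction(1)]  # take a_0 = 1
for m in range(1, 6):
    s = r + m
    a.append(-(s - 1) * a[m - 1] / (4 * s * (s - 1) + 1))

print(a[1], a[2])  # → -1/8 3/256, matching the hand computation
```

Exact rational arithmetic makes it easy to confirm a_1 = −1/8 and a_2 = 3/256.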

This example leads to two questions: when does the Frobenius method work?<br />

And what happened to the second (linearly independent) solution that must<br />

exist for this second-order ode?

Reading 6.G Study Kreyszig §4.4 [K,pp211–6].<br />

Example 6.3: Find the first few orders in the expansion of a second linearly<br />

independent solution of the ode in Example 6.2.



Solution: See that Example 6.2 is an example of Case 2, when the<br />

indicial equation has a double root. Hence expect a second linearly<br />

independent solution to be<br />

y_2(x) = y_1(x) log x + √x (b_1 x + b_2 x^2 + · · ·) .<br />

Note the omission of b_0 in this expansion in order to avoid introducing<br />

an arbitrary multiple of y_1—we could leave b_0 in, but we would pointlessly<br />

reproduce some of the earlier analysis. Differentiating y_2 leads<br />

to<br />

y′_2 = y′_1 log x + y_1 x^{−1} + (3/2)b_1 x^{1/2} + (5/2)b_2 x^{3/2} + · · · ,<br />

y′′_2 = y′′_1 log x + 2y′_1 x^{−1} − y_1 x^{−2} + (3/4)b_1 x^{−1/2} + (15/4)b_2 x^{1/2} + · · · .<br />

Substitute these into the differential equation:<br />

4x^2 y′′_2 :  4x^2 y′′_1 log x + 8xy′_1 − 4y_1 + 3b_1 x^{3/2} + 15b_2 x^{5/2} + · · ·<br />

x^2 y′_2 :  + x^2 y′_1 log x + xy_1 + (3/2)b_1 x^{5/2} + · · ·<br />

y_2 :  + y_1 log x + b_1 x^{3/2} + b_2 x^{5/2} + · · · = 0 .<br />

• The three terms involving log x immediately cancel because y_1(x)<br />

satisfies the ode.<br />

• Also 8xy′_1 − 4y_1 + xy_1 (upon setting a_0 = 1 in y_1 for simplicity)<br />

becomes just x^{5/2}/16 + · · · —the x^{1/2} term disappears because the<br />

indicial equation has a double root, and the x^{3/2} term disappears<br />

by chance.



• Thus grouping all terms in x^{3/2} and setting its coefficient to zero<br />

leads to 4b_1 = 0, that is b_1 = 0.<br />

• Grouping all terms in x^{5/2} and setting its coefficient to zero leads<br />

to 1/16 + (3/2)b_1 + 16b_2 = 0. Hence b_2 = −1/256.<br />

A second linearly independent solution is thus<br />

y_2 = y_1(x) log x + √x (−(1/256)x^2 + · · ·) .

Note:<br />

• A regular point of a linear ode is any point where all the coefficient<br />

functions are analytic, namely they all have Taylor series expansions<br />

that have a non-zero radius of convergence, and the coefficient of the<br />

highest derivative is non-zero.<br />

If a point is not regular, then it is called a singular point. Singular<br />

points for an ode often arise because of a degeneracy of the coordinate<br />

system and have nothing to do with the subject of the application of<br />

the mathematics. For example, in polar coordinates the point r = 0<br />

is degenerate because all angles θ meet there, but the centre of a<br />

circular domain is usually completely undistinguished, just an ordinary<br />

point of the domain, in the application.



• Convergent Taylor series centred about a regular point can always be<br />

found for the general solution <strong>of</strong> an ode. At singular points, a more<br />

general power series expansion may be needed.<br />

• The Frobenius method straightforwardly applies to higher order differential<br />

equations as well.<br />

Activity 6.H Do problems from Problem Set 4.4 [K,pp216–7]. Send in to<br />

the examiner for feedback at least Q4 & 7.<br />

6.2.2 Bessel functions are used in circular geometries<br />

Bessel’s equation,<br />

x^2 y′′ + xy′ + (x^2 − ν^2)y = 0 , (6.3)<br />

arises in circular or cylindrical geometries (where the variable x would represent<br />

the radial distance). For example, y(x) could represent the deflection,<br />

as a function of radius, of the membrane of a circular drum; or y(x) could<br />

represent the cross-pipe structure in the blood flow along a near circular<br />

artery. Indeed the differential equation mentioned in Example 6.2 is a variant<br />

of Bessel’s differential equation. We now solve this sort of equation using<br />

Frobenius’ method. The solutions for integer ν that we find, Bessel functions<br />

of the first kind, are plotted below.<br />

(The letter “ν” is the Greek letter “nu”, corresponding to the English “n”.)<br />



[Figure: plot of the Bessel functions J_0(x), J_1(x), J_2(x), J_3(x) and J_4(x)<br />

for 0 ≤ x ≤ 10, produced by the Matlab code:]<br />

x=linspace(0,10);<br />
j=besselj((0:4)',x);<br />
plot(x,j)<br />

Reading 6.I Study Kreyszig §4.5 [K,pp218–225].<br />

Positive order Bessel functions are relevant. In applications the Bessel<br />

functions of order ν ≥ 0 are the ones of interest. Observe that as x → 0, the<br />

Bessel functions J_ν(x) ∼ a_0 x^ν, which tends to zero if the order ν is positive,<br />

but goes to infinity if the order ν is negative. In most applications the variable<br />

x is the radius r. Thus x → 0 corresponds to approaching the centre of the<br />

domain. The general solution to Bessel’s equation is y = c_1 J_ν(x) + c_2 J_{−ν}(x),<br />

but in applications we usually cannot tolerate solutions going to infinity and



so the arbitrary constant c_2 = 0 in order to eliminate the bad behaviour of<br />

Bessel functions of negative order. This just leaves the physically interesting<br />

solution to be y = c_1 J_ν(x) for ν ≥ 0.<br />

Variable transforms are useful. Now that we are investigating ode’s with<br />

variable coefficients we find a much richer range of possible ode’s. Some<br />

of these may be transformed into a well-studied ode such as Bessel’s or<br />

Legendre’s equations. For example, if we can deduce that the solutions to a<br />

strange ode are J_ν(x^2) or P_n(√x) then we immediately know lots about the<br />

solutions. Thus one useful technique is that of transforming an ode from one<br />

form into another, hopefully well-known, form.<br />

Example 6.4: transform an ode to Bessel’s equation Consider, as an<br />

example, Problem 3 in Problem Set 4.5 of Kreyszig, p226. The task is<br />

to transform the ode in y(x) into Bessel’s ode for y(z) where z = x^2.<br />

Then we would be able to say that the solution is known to be y ∝<br />

J_ν(z) for some ν, and hence know the solution to the original ode is<br />

y ∝ J_ν(x^2) .<br />

The challenge is to transform the x-derivatives in the original ode,<br />

x^2 y′′ + xy′ + (4x^4 − 1/4)y = 0, into derivatives with respect to z. We do<br />

this using the chain rule. Among many equally valid routes, see the<br />

logic in the following for both the first and second derivatives.<br />

dy/dx = dy/dz × dz/dx    by the chain rule



= (dy/dz) 2x    as z = x^2<br />

= 2√z dy/dz    as x = √z .<br />

Then<br />

d^2y/dx^2 = d/dx (dy/dx)<br />

= d/dx (2√z dy/dz)    by the above expression for dy/dx<br />

= d/dz (2√z dy/dz) × dz/dx    by the chain rule<br />

= 2√z d/dz (2√z dy/dz)    as z = x^2<br />

= 2 dy/dz + 4z d^2y/dz^2    by the derivative of a product.<br />

Then substitute these derivatives into the original ode to deduce the equivalent ode

x² (2 dy/dz + 4z d²y/dz²) + x (2√z dy/dz) + (4x⁴ − 1/4) y = 0 ;

that is, using x = √z,

4z² d²y/dz² + 4z dy/dz + (4z² − 1/4) y = 0 ;

upon dividing by 4,

z² d²y/dz² + z dy/dz + (z² − 1/16) y = 0 .

This is Bessel's ode for y(z) with parameter ν = 1/4. Thus its solutions are, for example, y ∝ J_1/4(z) = J_1/4(x²) .
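The chain-rule identity derived in this example, d²y/dx² = 2 dy/dz + 4z d²y/dz² at z = x², is easy to sanity-check numerically. The following Python sketch (our own illustration only; the unit's software is reduce) compares a finite-difference second derivative of y(x²) against the identity for the sample choice y(z) = sin z:

```python
import math

def f(x):
    # f(x) = y(x^2) with the sample choice y(z) = sin z
    return math.sin(x * x)

def d2f_numeric(x, h=1e-4):
    # central second difference approximating d^2f/dx^2
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def d2f_identity(x):
    # the derived identity: d^2y/dx^2 = 2*dy/dz + 4*z*d^2y/dz^2 at z = x^2,
    # using dy/dz = cos z and d^2y/dz^2 = -sin z for y = sin z
    z = x * x
    return 2.0 * math.cos(z) - 4.0 * z * math.sin(z)

for x in (0.3, 0.7, 1.1):
    assert abs(d2f_numeric(x) - d2f_identity(x)) < 1e-5
print("chain-rule identity confirmed")
```

Any smooth y(z) would do here; sin z is convenient because both z-derivatives are known exactly.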

Activity 6.J Do problems from Problem Set 4.5 [K, 226–7]. Send in to the examiner for feedback at least Q4 & 11.

Exercise 6.5: Consider the differential equation 2y″ − 4xy′ + (4x² − 6)y = 0 .

(a) Briefly explain why you would expect it to have power series solutions in the form of the Maclaurin series y = Σ_{n=0}^∞ aₙxⁿ .

(b) Hence construct the first few terms in a power series, with errors O(x⁴), of the solution to the differential equation with y(0) = 1 and y′(0) = 0.



6.3 Computer algebra for repetitive tasks

The whole of the developments and operations of analysis are now capable of being executed by machinery. . . . As soon as an Analytical Engine exists, it will necessarily guide the future course of science. Charles Babbage in Passages from the Life of a Philosopher (London 1864)

"On two occasions I have been asked [by members of Parliament!], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question." Charles Babbage

Software packages for computer algebra can do incredibly sophisticated analysis. However, mostly we want computers to do the tedious repetitive tasks—those where it is worth investing our time to make sure the computer is doing what we want. Developing power series solutions of differential equations is an ideal application.

Main aims:

• see how computer algebra can be usefully employed to do tedious tasks;

• use simple iteration to develop power series solutions of linear and nonlinear differential equations;

• make iteration more flexible by basing it upon the residual of the governing equations.

We will use the free demonstration copies of reduce³ available from:

windows PC: ftp://ftp.maths.bath.ac.uk/pub/algebra, download in binary demored.exe and demored.img;

linux: ftp://ftp.zib.de/pub/reduce/demo/linux;

Macintosh: ftp://ftp.maths.bath.ac.uk/pub/algebra, download and unpack demored.hqx.

Check that you can start and run reduce; it should open up a window saying something like

REDUCE 3.6, patched to 30 Aug 98...
1:

³ There is always a limitation in using free, demonstration copies. Here the main restriction on the demonstration version is that "garbage collection" is disabled in reduce. What that means in practice is that only small to medium amounts of computer algebra can be done before having to restart reduce. It is probably best to solve one problem at a time, restarting reduce in between each problem.

(We generally use such a coloured, teletype font for computer instructions and dialogue.)


The "1:" is a prompt for a command: type quit; followed by the return or enter key for reduce to finish. If this works, you can run reduce.

• If you cannot get reduce to execute on your computer system, contact us for help. However, in the meantime you may start your work by using a telnet application to connect over the internet to the computer marlene.zib.de,⁴ login as reducet and with an empty password. A reduce session will start for you; it is a little slow but at least you can make progress with your work.

• A summary of the reduce commands that we will use is given in §6.3.5.

• A simple introduction to reduce is given in the following Section 6.3.1.

• http://www.zib.de/Symbolik/reduce/Overview/Overview.html is an on-line overview of the capabilities of reduce.

• http://www.uni-koeln.de/cgi-bin/redref/redr_dir.html gives extensive online help on the commands and syntax of reduce.

6.3.1 Introducing reduce

• Start reduce in Unix by typing reduce in a command window. To exit from reduce type the command quit; followed by the enter key.

⁴ Courtesy of Konrad-Zuse-Zentrum für Informationstechnik, Berlin

• Note: all reduce statements must be terminated with a semi-colon. Do not forget. They are subsequently executed by pressing the enter key.

• reduce uses exact arithmetic by default: for example, to find 100! in full gory detail type factorial(100); then enter (I will not mention the enter key again unless necessary).

• Identifiers, usually single letters, denote either variables or expressions: in f:=2*x^2+3*x-5; the identifier x is a variable whereas f, after the assignment with :=, contains the above expression; similarly after g:=x^2-x-6; then g contains an algebraic expression.

• Expressions may be added with f+g; subtracted with f-g; multiplied with f*g; divided with f/g; exponentiated with f^3; etc.

• Straightforward equations may be solved (by default equal to zero): solve(x^2-x-6,x); or through using an expression previously found such as solve(f,x); .
such as solve(f,x); .


Systems of equations may be solved by giving a list (enclosed in braces) of equations and a list of variables to be determined. For example, solve({x-y=2,a*x+y=0},{x,y}); returns the solution parametrised by a.

• Basic calculus is a snap:

differentiation uses the function df, as in df(f,x); to find the first derivative; or df(g,x,x); for the second; or df(sin(x*y),x,y); for a mixed derivative. The product rule for differentiation is verified for the above two functions by df(f*g,x)-df(f,x)*g-f*df(g,x); reducing to zero.

integration is similar, int(f,x); giving the integral of the polynomial in f, without an integration constant, but perhaps more impressive is the almost instant integration of int(x^5*cos(x^2),x); . Note that repeated integration must be done by repeated invocations of int, not by further arguments as for df. Instead, for example, int(f,x,0,2); will give you the definite integral from 0 to 2.

• One can substitute an expression for a variable in another expression. For example, the composition f(g(x)) is computed by sub(x=g,f); .

• reduce allows you to use many lines for the one command: a command is not terminated until the semi-colon is typed. reduce alerts you to the fact that you are still entering the one command by displaying the prompt again. Thus if you forget the semi-colon, just type a semi-colon at the new prompt and then the enter key to execute what you had typed on the previous lines.

• If reduce displays an error message along the lines of Declare xxx operator ? then you have probably mistyped something and the best answer is to type N then enter.
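The df and int demonstrations above are easy to mimic for polynomials in any language. The following Python sketch (our own illustration, not part of the unit's software) represents a polynomial by its list of coefficients and checks the same product-rule identity, with exact rational arithmetic from the standard library:

```python
from fractions import Fraction

# A polynomial is a list of coefficients [a0, a1, a2, ...], lowest power first.

def dif(p):
    # analogue of reduce's df(p,x): term-by-term derivative
    return [Fraction(n) * c for n, c in enumerate(p)][1:]

def integ(p):
    # analogue of int(p,x): the antiderivative that is zero at x = 0
    return [Fraction(0)] + [c / (n + 1) for n, c in enumerate(p)]

def mul(p, q):
    # polynomial product by convolution of coefficients
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def sub(p, q):
    # p - q, padding the shorter list with zeros
    n = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (n - len(p))
    q = list(q) + [Fraction(0)] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

f = [Fraction(-5), Fraction(3), Fraction(2)]   # f = 2x^2 + 3x - 5
g = [Fraction(-6), Fraction(-1), Fraction(1)]  # g = x^2 - x - 6

# product rule: df(f*g,x) - df(f,x)*g - f*df(g,x) reduces to zero
residual = sub(sub(dif(mul(f, g)), mul(dif(f), g)), mul(f, dif(g)))
assert all(c == 0 for c in residual)

# and integration undoes differentiation (the integration constant here is zero)
assert dif(integ(f)) == f
print("product rule verified")
```

The same coefficient-list machinery is reused in the iteration examples of the next subsection.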

6.3.2 Introduction to the iterative method

Computers are extremely good at repeating the same thing many times over. We use this aspect to find power series solutions of some simple differential equations, and then some "horrible" nonlinear differential equations. The ideas are developed by example.

Example 6.6: The solution to y″ + y = 0, y(0) = 1 and y′(0) = 0 is y = cos x. Find the Maclaurin series solution by iteration, first by hand and secondly using computer algebra.

Solution: Rearrange this ode to y″ = −y and then formally integrate twice to y = −∫∫ y dx dx. These integrals on the right-hand side are indefinite integrals, so implicit constants of integration, say a + bx, should appear on the right-hand side. But we know that the cosine solution


Module 6. Series solutions <strong>of</strong> differential equations give special functions 212<br />

to y ′′ + y = 0 has y(0) = 1 and y ′ (0) = 0 so surely we should set a = 1<br />

and b = 0 to account for these initial conditions. Thus<br />

∫∫<br />

y = 1 − y dx dx (6.4)<br />

w<strong>here</strong> <strong>here</strong> the integrals are implicitly the definite integral from 0 to<br />

x. This rearrangement incorporates the information <strong>of</strong> the ode and its<br />

initial conditions.<br />

In this form we readily find its power series solution by iteration: given<br />

an approximation y n (x) we find a new approximation by evaluating<br />

∫∫<br />

y n+1 = 1 − y n dx dx .<br />

First try by hand starting from y₀ = 1:

• y₁ = 1 − ∫∫ 1 dx dx = 1 − (1/2)x² ;

• y₂ = 1 − ∫∫ (1 − (1/2)x²) dx dx = 1 − (1/2)x² + (1/24)x⁴ .

See these are the first few terms in the Maclaurin series for cos x. (Interestingly, this is Picard iteration, which is also used to prove existence of solutions to ode's.) Now try using reduce to do the algebra:

• first type the three commands on div;, off allfac; and on revpri; (do not forget the semi-colon to logically terminate each command and the return or enter key to get reduce to actually execute the line you have typed)—these commands tell reduce to format its output in a nice way for power series;


• second set a variable to the first approximation by typing y:=1; which assigns the value one to the variable y;

• type y:=1-int(int(y,x),x); to assign the first approximation, y₁ = 1 − x²/2, to the variable y—int(y,x) computes an integral with respect to x of whatever is in y; fortunately for us, for polynomial y it computes the integral which is zero at x = 0;

• type y:=1-int(int(y,x),x); again to compute y₂, etc;

• iterative loops are standard in computer languages and computer algebra is no exception, so type for n:=3:8 do y:=1-int(int(y,x),x); to compute further iterations. But nothing was printed, so finally type y; to see the resulting power series for cos x.

The entire dialogue should look like this:

1: on div;

2: off allfac;

3: on revpri;

4: y:=1;

y := 1

5: y:=1-int(int(y,x),x);

          1   2
y := 1 - ---*x
          2

6: y:=1-int(int(y,x),x);

          1   2    1    4
y := 1 - ---*x  + ----*x
          2        24

8: for n:=3:8 do y:=1-int(int(y,x),x);

9: y;

     1   2    1    4     1    6      1      8       1        10
1 - ---*x  + ----*x  - -----*x  + -------*x  - ---------*x
     2        24        720        40320        3628800

       1        12         1         14            1          16
 + -----------*x   - -------------*x   + ----------------*x
    479001600         87178291200         20922789888000
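The same iteration is easy to replicate outside reduce. Here is a Python sketch (our own illustration, using exact rational arithmetic from the standard library) of the iteration yₙ₊₁ = 1 − ∫∫ yₙ, truncating the series at x¹⁶:

```python
from fractions import Fraction
from math import factorial

def double_integral(p):
    # definite double integral from 0 to x of [a0, a1, ...]:
    # each x^n becomes x^(n+2)/((n+1)(n+2))
    return [Fraction(0), Fraction(0)] + [c / ((n + 1) * (n + 2)) for n, c in enumerate(p)]

N = 16                      # truncate the series at x^N
y = [Fraction(1)]           # initial approximation y0 = 1
for _ in range(10):         # iterate y <- 1 - int(int(y,x),x)
    yy = double_integral(y)[:N + 1]
    y = [Fraction(1) - yy[0]] + [-c for c in yy[1:]]

# the iterates settle on the Maclaurin series of cos x:
# the coefficient of x^(2k) is (-1)^k/(2k)! and odd coefficients vanish
for k in range(9):
    assert y[2 * k] == Fraction((-1) ** k, factorial(2 * k))
assert all(y[m] == 0 for m in range(1, N + 1, 2))
print("matches the series of cos x up to x^16")
```

Each pass of the loop fixes two more orders of the series, just as each reduce iterate above gained the next term.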

Example 6.7: Find the general Maclaurin series solution to y″ + y = 0 using computer algebra (reduce).


Solution: In the previous example we built in the specific initial conditions appropriate to y = cos x, namely y(0) = 1 and y′(0) = 0. If we make the integration constants arbitrary, by iterating

y = a + bx − ∫∫ y dx dx ,

then we recover the general solution parametrised by a and b where y(0) = a and y′(0) = b. Let's do it. Start reduce:

• type factor a,b; to get reduce to group all terms in a and all terms in b;

• set the initial value to something simple satisfying the initial conditions: y:=a+b*x;

• iterate for n:=1:4 do write y:=a+b*x-int(int(y,x),x); using the write command to print each iterate.

The dialogue is (always remember to start reduce with on div; off allfac; on revpri;):

4: factor a,b;

5: y:=a+b*x;

y := b*x + a

6: for n:=1:4 do write y:=a+b*x-int(int(y,x),x);

            1   3            1   2
y := b*(x - ---*x ) + a*(1 - ---*x )
             6                2

            1   3     1    5            1   2    1    4
y := b*(x - ---*x  + -----*x ) + a*(1 - ---*x  + ----*x )
             6         120               2        24

            1   3     1    5      1    7
y := b*(x - ---*x  + -----*x  - ------*x )
             6         120        5040

           1   2    1    4     1    6
 + a*(1 - ---*x  + ----*x  - -----*x )
           2        24        720

            1   3     1    5      1    7       1     9
y := b*(x - ---*x  + -----*x  - ------*x  + --------*x )
             6         120        5040        362880

           1   2    1    4     1    6      1      8
 + a*(1 - ---*x  + ----*x  - -----*x  + -------*x )
           2        24        720        40320

See how easily this generates the Maclaurin series for y = a cos x + b sin x.
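By linearity the a-part and the b-part of this iteration can be developed separately. A Python sketch (again our own illustration) of the b-part alone, iterating y ← x − ∫∫ y from y₀ = x, reproduces the sin x series:

```python
from fractions import Fraction
from math import factorial

def double_integral(p):
    # definite double integral from 0 to x: x^n -> x^(n+2)/((n+1)(n+2))
    return [Fraction(0), Fraction(0)] + [c / ((n + 1) * (n + 2)) for n, c in enumerate(p)]

N = 15
y = [Fraction(0), Fraction(1)]          # y0 = x, that is a = 0 and b = 1
for _ in range(10):                     # iterate y <- x - int(int(y,x),x)
    yy = double_integral(y)[:N + 1]
    base = [Fraction(0), Fraction(1)] + [Fraction(0)] * (len(yy) - 2)
    y = [a - c for a, c in zip(base, yy)]

# the coefficient of x^(2k+1) is (-1)^k/(2k+1)!, the Maclaurin series of sin x
for k in range(8):
    assert y[2 * k + 1] == Fraction((-1) ** k, factorial(2 * k + 1))
print("matches the series of sin x up to x^15")
```

Starting from 1 instead of x recovers the cos x series of the previous example, and the general solution is a times one run plus b times the other.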

Now let's try something rather hard—in fact, almost impossible to solve quantitatively except via power series methods. We now use precisely the same iteration to solve a nonlinear ode!

Example 6.8: Find the Maclaurin series solution to the nonlinear ode y″ = 6y², y(0) = 1 and y′(0) = −2.

Before solving this as a power series (by my design its exact solution just happens to be y = 1/(1 + x)²), investigate it qualitatively using techniques developed earlier by considering it as a system of first-order differential equations. Introduce z(x) = y′; then the equivalent system is

y′ = z ,   z′ = 6y² .

Hence the evolution in the phase plane is dictated by the arrows shown below, with the particular trajectory starting from the initial condition (1, −2) shown in green:
(1, −2) shown in green:


Module 6. Series solutions <strong>of</strong> differential equations give special functions 218<br />

z=y'<br />

0<br />

-0.2<br />

-0.4<br />

-0.6<br />

-0.8<br />

-1<br />

-1.2<br />

-1.4<br />

-1.6<br />

-1.8<br />

-2<br />

-0.5 0 0.5 1<br />

y<br />

[y,z]=meshgrid(-.5:.1:1,-2:.1:0);<br />

quiver(y,z,z,6*y.^2)<br />

hold on<br />

x=linspace(0,7);<br />

y=1./(1+x).^2;<br />

z=-2./(1+x).^3;<br />

plot(y,z,’g’)<br />

hold <strong>of</strong>f<br />

Solution: Now we find its power series solution! As before, recast the ode in the following form that also incorporates the initial conditions, by formally integrating the ode twice:

y = 1 − 2x + 6 ∫∫ y² dx dx ,   (6.5)

where again the repeated x integral is assumed done so that each integral is zero at x = 0. Then iterate, starting from y₀ = 1 − 2x say:

y₁ = 1 − 2x + 6 ∫∫ (1 − 4x + 4x²) dx dx
   = 1 − 2x + 3x² − 4x³ + 2x⁴ ;

y₂ = 1 − 2x + 6 ∫∫ (1 − 2x + 3x² − 4x³ + 2x⁴)² dx dx
   = 1 − 2x + 6 ∫∫ (1 − 4x + 10x² − 20x³ + 29x⁴ − 32x⁵ + 28x⁶ − 16x⁷ + 4x⁸) dx dx
   = 1 − 2x + 3x² − 4x³ + 5x⁴ − 6x⁵ + (29/5)x⁶ − (32/7)x⁷ + 3x⁸ − (4/3)x⁹ + (4/15)x¹⁰ .

This is quickly becoming horrible. But that is just why computers are made. Before rushing in to use reduce, observe that here the quadratic nonlinearity y² is going to generate very high powers of x, most of which we do not want. For example, in y₂ above the terms up to x⁵ are correct, but all the higher powers are as yet wrong.⁵ Another

⁵ The quadratic nonlinearity y² rapidly generates high powers of x in the expressions. However, the iteration plods along, only getting one or two orders of x more accurate each iteration.


iteration would generate a 22nd order polynomial for y₃, of which only the first 8 coefficients are correct; the rest are rubbish. In reduce we discard such high order terms in a power series by using, for example, the command let x^10=>0; which tells reduce to discard, set to zero, or otherwise ignore all terms with a power of x of ten or more. This is just what we want. Thus here the dialogue would be:

5: let x^10=>0;

6: y:=1-2*x;

y := 1 - 2*x

7: for n:=1:5 do write y:=1-2*x+6*int(int(y^2,x),x);

                2      3      4
y := 1 - 2*x + 3*x  - 4*x  + 2*x

                2      3      4      5    29    6    32    7      8
y := 1 - 2*x + 3*x  - 4*x  + 5*x  - 6*x  + ----*x  - ----*x  + 3*x
                                            5         7

    4   9
 - ---*x
    3

                2      3      4      5      6      7    306    8
y := 1 - 2*x + 3*x  - 4*x  + 5*x  - 6*x  + 7*x  - 8*x  + -----*x
                                                          35

    316    9
 - -----*x
     35

                2      3      4      5      6      7      8       9
y := 1 - 2*x + 3*x  - 4*x  + 5*x  - 6*x  + 7*x  - 8*x  + 9*x  - 10*x

                2      3      4      5      6      7      8       9
y := 1 - 2*x + 3*x  - 4*x  + 5*x  - 6*x  + 7*x  - 8*x  + 9*x  - 10*x

See how the iteration settles on the correct power series, with all terms of power ten or higher neglected. Check this satisfies the ode by computing the residual df(y,x,x)-6*y^2; (df(y,x) computes the derivative of y with respect to x and df(y,x,x) computes the second derivative); the result is zero except for two terms in x⁸ and x⁹ which would cancel with the second derivative of the absent tenth and eleventh order terms. We thus triumphantly write the solution of this nonlinear ode as

y = 1 − 2x + 3x² − 4x³ + 5x⁴ − 6x⁵ + 7x⁶ − 8x⁷ + 9x⁸ − 10x⁹ + O(x¹⁰) ,

where O(x¹⁰) (read "order of x¹⁰") tells us that the error in the power series, the neglected terms, are x¹⁰ or higher powers.
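A Python sketch of the same truncated iteration (our own illustration; the list truncation plays the role of let x^10=>0;) confirms the pattern of coefficients:

```python
from fractions import Fraction

N = 10  # discard x^10 and higher, mimicking "let x^10=>0;"

def mul(p, q):
    # product of coefficient lists, truncated at x^N
    r = [Fraction(0)] * min(N, len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < N:
                r[i + j] += a * b
    return r

def double_integral(p):
    # definite double integral from 0 to x, truncated at x^N
    return ([Fraction(0), Fraction(0)] +
            [c / ((n + 1) * (n + 2)) for n, c in enumerate(p)])[:N]

y = [Fraction(1), Fraction(-2)]              # y0 = 1 - 2x
for _ in range(8):                           # y <- 1 - 2x + 6*int(int(y^2))
    yy = double_integral(mul(y, y))
    y = [Fraction(1) + 6 * yy[0], Fraction(-2) + 6 * yy[1]] + [6 * c for c in yy[2:]]

# the exact solution 1/(1+x)^2 has coefficients (-1)^n (n+1)
assert y == [Fraction((-1) ** n * (n + 1)) for n in range(N)]
print(y)
```

As in the reduce run, the iterates stop changing once every retained coefficient is correct.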

In the above three examples we have developed the Taylor series about x = 0, the Maclaurin series. To find Taylor series about any point x = c it is simply a matter of changing the independent variable to, for example, t = x − c and then finding the Maclaurin series in t. We will continue to find only Maclaurin series because that is all we need to also find other power series solutions.

Activity 6.K Do Problems 6.13–6.17 in the Exercises 6.3.4, p239. Send in to the examiner for feedback at least Ex. 6.13 & 6.14.

6.3.3 Iteration is very flexible

So far we have simply rearranged an ode in order to derive an iteration that will generate the desired power series solution.⁶ In this subsection we discuss why this strategy works at all, and what extension we need in order to solve a very wide range of differential equations.

The iteration works because integration is basically a smoothing operation. This smoothing tends to reduce errors in a power series. For example, suppose an error was O(x³), so that it is roughly about 10⁻³ when x = 0.1 say:

⁶ What we have done is rather remarkable. In the course on Numerical Computing you will learn about fixed point iteration as a method of solving linear and nonlinear equations. In fact we have done precisely fixed point iteration here. The remarkable difference is that in Numerical Computing you will simply find the one number that satisfies a given equation; here you have found the function, via its power series, that satisfies the given differential equation—a much more difficult task. Nonetheless the strategy of appropriately rearranging the equation and iterating works.


then integrating it twice will lead to an error O(x⁵) in the integral, which is much smaller in magnitude, roughly 10⁻⁵ when x = 0.1. Conversely, differentiation magnifies errors: two derivatives of an error O(x³) becomes an error O(x) which, at roughly 10⁻¹ when x = 0.1, is much larger. To make errors smaller, equivalently to push them to higher powers in x, we generally need to integrate. Thus an integral reformulation of an ode is the basis for a successful iterative solution.

The other question is: how do we know how many iterations should be performed? The answer here is simple: keep iterating until there is no more change to the solution. One consequence of the answer, though, is that we have to keep track of the change in the approximations. A good way to find the change in an approximation is to solve for it explicitly. But first we have to find an equation for the small change in the approximate solution at each iteration. This leads us to a powerful iterative framework, based upon the residual of the ode, which we develop and explore by example.

Example 6.9: Legendre functions. Use iteration to find the general Maclaurin series solutions to Legendre's equation (6.1), written here as

(1 − x²)y″ − 2xy′ + ky = 0   for k = n(n + 1) ,

to an error O(x¹⁰) for initial conditions y(0) = 1 and y′(0) = 0.

Solution: Immediately an initial approximation is

y₀ = 1 ,


as this satisfies the initial conditions. The iterative challenge is: given a known approximation yₙ, find an improved solution

yₙ₊₁(x) = yₙ(x) + ŷₙ(x) ,

(ŷ is read as "y-hat") where ŷₙ is the as yet unknown change in the approximation that we have to find. Now substitute this form for yₙ₊₁ into the ode and rearrange to put all the known terms on the right-hand side and all the unknown on the left:

−(1 − x²)ŷₙ″ + 2xŷₙ′ − kŷₙ = (1 − x²)yₙ″ − 2xyₙ′ + kyₙ .

This looks like a differential equation for the as yet unknown change ŷₙ, forced by the known right-hand side, the residual of Legendre's equation evaluated at the current approximation, Rₙ = (1 − x²)yₙ″ − 2xyₙ′ + kyₙ. For example, the first residual, from y₀ = 1, is R₀ = k. But this ode for the change is far too complicated—indeed if we could solve it exactly then the problem would be over immediately. Instead we seek a simplification to make the ode for ŷₙ tractable while still useful. The general principles of the simplification are that in any terms involving ŷₙ:

• near the point of expansion x = 0, x is much smaller than 1 and x² is even smaller still, thus we neglect higher powers of x relative to lower powers—so in this example we replace the (1 − x²) factor by 1 because the x² is negligible in comparison to 1 for the small x near the point of expansion;
near the point <strong>of</strong> expansion;


Module 6. Series solutions <strong>of</strong> differential equations give special functions 225<br />

• also, though be careful, because differentiation increases errors as<br />

differentiation by x corresponds roughly to lowering the power<br />

<strong>of</strong> x by 1 (equivalently it roughly corresponds to dividing by x)<br />

we neglect low order derivatives <strong>of</strong> ŷ n (provided they are not also<br />

divided by x)—so in this example xŷ n ′ is roughly <strong>of</strong> the same “size”<br />

as ŷ n because the derivative makes it larger but the multiplication<br />

by x cancels this effect, but both <strong>of</strong> these terms are smaller than<br />

ŷ n ′′ which is roughly 1/x2 times larger.<br />

After this simplification, the ode for the change then reduces to

−ŷₙ″ = Rₙ(x) = (1 − x²)yₙ″ − 2xyₙ′ + kyₙ .

In the first iteration, as R₀ = k, ŷ₀″ = −k, which upon integrating twice leads to the requisite change being ŷ₀ = −kx²/2.

But what about the constants of integration? In this approach the initial approximation satisfies the initial conditions y(0) = 1 and y′(0) = 0. We ensure these are satisfied by all approximations by ensuring all the changes ŷₙ satisfy the corresponding homogeneous initial conditions ŷₙ(0) = ŷₙ′(0) = 0. Thus, for example, the change ŷ₀ above is indeed correct. Hence the next approximation is y₁ = 1 − kx²/2.

We could continue doing this by hand, but the plan is to use computer algebra to do the tediously repetitious iteration.

• The initial approximation is set simply by y:=1;

• We wish to discard any powers generated of O(x¹⁰), so include the declaration let x^10=>0;

• To iterate until the change is negligible use the repeat loop, namely repeat ... until r=0; where we will use r to store the residual and the change.

• The repeat-until construct in reduce, unlike many other computing languages, expects only a single statement between the repeat and the until—we bracket the multiple statements needed inside with a begin ... end

• Inside the loop:

– compute the residual, r:=(1-x^2)*df(y,x,x)-2*x*df(y,x)+k*y;

– compute the change, r:=-int(int(r,x),x);

– update the approximation, write y:=y+r;
The reduce dialogue might be:<br />

4: y:=1;<br />

y := 1<br />

5: let x^10=>0;<br />

6: repeat begin<br />

6: r:=(1-x^2)*df(y,x,x)-2*x*df(y,x)+k*y;<br />

6: r:=-int(int(r,x),x);<br />

6: write y:=y+r;


Module 6. Series solutions <strong>of</strong> differential equations give special functions 227<br />

6: end until r=0;<br />

1 2<br />

y := 1 - ---*k*x<br />

2<br />

1 2 1 4 1 2 4<br />

y := 1 - ---*k*x - ---*k*x + ----*k *x<br />

2 4 24<br />

1 2 1 4 1 6 1 2 4 13 2 6<br />

y := 1 - ---*k*x - ---*k*x - ---*k*x + ----*k *x + -----*k *x<br />

2 4 6 24 360<br />

1 3 6<br />

- -----*k *x<br />

720<br />

1 2 1 4 1 6 1 8 1 2 4<br />

y := 1 - ---*k*x - ---*k*x - ---*k*x - ---*k*x + ----*k *x<br />

2 4 6 8 24<br />

13 2 6 101 2 8 1 3 6 17 3 8<br />

+ -----*k *x + ------*k *x - -----*k *x - -------*k *x<br />

360 3360 720 10080<br />

1 4 8<br />

+ -------*k *x<br />

40320


Module 6. Series solutions <strong>of</strong> differential equations give special functions 228<br />

1 2 1 4 1 6 1 8 1 2 4<br />

y := 1 - ---*k*x - ---*k*x - ---*k*x - ---*k*x + ----*k *x<br />

2 4 6 8 24<br />

13 2 6 101 2 8 1 3 6 17 3 8<br />

+ -----*k *x + ------*k *x - -----*k *x - -------*k *x<br />

360 3360 720 10080<br />

1 4 8<br />

+ -------*k *x<br />

40320<br />

It is painful having to retype the entire loop any time one typing mistake is made. Instead prepare a file, called say leg.red, containing the reduce commands (including an extra end; at the end):

on div; off allfac; on revpri;
factor x;
y:=1;
let x^10=>0;
repeat begin
r:=(1-x^2)*df(y,x,x)-2*x*df(y,x)+k*y;
r:=-int(int(r,x),x);
write y:=y+r;
end until r=0;
end;
end;


then start reduce and get all these commands executed by typing in "leg.red"; The output gives the desired Maclaurin series to be

y = 1 − (1/2)k x² + ((1/24)k² − (1/4)k) x⁴ − ((1/720)k³ − (13/360)k² + (1/6)k) x⁶
  + ((1/40320)k⁴ − (17/10080)k³ + (101/3360)k² − (1/8)k) x⁸ + O(x¹⁰) .
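The repeat-until loop above translates directly into other languages. In this Python sketch (our own illustration) we fix the parameter at k = 6, that is n = 2—our choice, made because then the even series must terminate: the loop converges to y = 1 − 3x², proportional to the Legendre polynomial P₂(x):

```python
from fractions import Fraction

N = 10                 # work to errors O(x^10), mimicking "let x^10=>0;"
k = Fraction(6)        # k = n(n+1) with n = 2

def pad(p, n):
    return list(p) + [Fraction(0)] * (n - len(p))

def dif(p):
    return [Fraction(m) * c for m, c in enumerate(p)][1:]

def residual(y):
    # R = (1 - x^2) y'' - 2x y' + k y, truncated at x^N
    d1, d2, y = pad(dif(y), N), pad(dif(dif(y)), N), pad(y, N)
    return [d2[m] - (d2[m - 2] if m >= 2 else 0)
            - 2 * (d1[m - 1] if m >= 1 else 0) + k * y[m] for m in range(N)]

def neg_double_integral(p):
    # the change: minus the double integral, zero at x = 0, truncated
    return ([Fraction(0), Fraction(0)] +
            [-c / ((m + 1) * (m + 2)) for m, c in enumerate(p)])[:N]

y = [Fraction(1)]      # initial approximation: y(0) = 1, y'(0) = 0
while True:            # repeat ... until r = 0
    change = neg_double_integral(residual(y))
    if all(c == 0 for c in change):
        break
    y = [a + b for a, b in zip(pad(y, N), change)]

assert y[:3] == [Fraction(1), Fraction(0), Fraction(-3)]
assert all(c == 0 for c in y[3:])
print(y[:3])
```

Substituting the symbol k back for 6 in the general series above reproduces the same first terms: with k = 6 the x⁴, x⁶ and x⁸ coefficients all vanish.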

Example 6.10: Find the Maclaurin series solution, to errors O(x¹⁰), to the nonlinear ode y″ + (1 + x)y′ − 6y² = 0 such that y(0) = 1 and y′(0) = −1.

Solution: Again immediately write down an initial approximation consistent with the initial conditions: namely y₀ = 1 − x. Then, given a known approximation, say yₙ(x), seek an improved approximation yₙ₊₁(x) = yₙ(x) + ŷₙ(x) where ŷₙ(x) is the as yet unknown change. Substitute into the differential equation and rearrange to deduce the following ode for the change:

−ŷₙ″ − (1 + x)ŷₙ′ + 6ŷₙ² + 12yₙŷₙ = Rₙ = yₙ″ + (1 + x)yₙ′ − 6yₙ² ,

where here, as always, Rₙ(x) is the known residual evaluated for the current approximation. Now simplify the left-hand side:
current approximation. Now simplify the left-hand side:


Module 6. Series solutions <strong>of</strong> differential equations give special functions 230<br />

• since x is “small” (in the power series expansion) 1 + x ≈ 1 and<br />

similarly y n ≈ 1 from the initial condition y(0) = 1 so the lefthand<br />

side first simplifies to<br />

−ŷ ′′<br />

n − ŷ′ n + 6ŷ2 n + 12ŷ n ;<br />

• but also the change ŷ n must be small (as each ŷ n is to make a small<br />

improvement in the solution) and so ŷn 2 must be much smaller still<br />

and should be neglected—for example we typically expect the first<br />

change ŷ 0 to be O(x 2 ) whence ŷ0 2 = O(x4 ) which is much smaller<br />

and negligible—hence the left-hand side simplifies further to<br />

\[ -\hat y_n'' - \hat y_n' + 12\hat y_n\,; \]<br />

• lastly, differentiation effectively decreases the order of any term so<br />

that the second derivative term dominates the others above and<br />

so the ode for the change becomes simply<br />

\[ -\hat y_n'' = R_n = y_n'' + (1+x)y_n' - 6y_n^2\,. \]<br />

For example, the first iteration starts by computing the residual<br />

R 0 = 0 + (1 + x)(−1) − 6(1 − x) 2 = −7 + 11x − 6x 2 .<br />

Then changing sign and integrating twice gives the first change<br />

\[ \hat y_0 = -\iint R_0\,dx\,dx = \tfrac{7}{2}x^2 - \tfrac{11}{6}x^3 + \tfrac{1}{2}x^4\,, \]<br />



after recalling that we need to satisfy homogeneous initial conditions<br />

ŷ ′ n(0) = ŷ n (0) = 0 for the changes in order to ensure the solution<br />

satisfies the specified initial conditions. Thus the new approximation<br />

is<br />

\[ y_1 = 1 - x + \tfrac{7}{2}x^2 - \tfrac{11}{6}x^3 + \tfrac{1}{2}x^4\,. \]<br />

Now investigate further with computer algebra. First create a file, say<br />

nod.red with<br />

on div; off allfac; on revpri;<br />

y:=1-x;<br />

let x^10=>0;<br />

repeat begin<br />

r:=df(y,x,x)+(1+x)*df(y,x)-6*y^2;<br />

r:=-int(int(r,x),x);<br />

y:=y+r;<br />

end until r=0;<br />

y:=y;<br />

end;<br />

Second, executing the commands using the in statement produces the<br />

output below<br />

2: in "nod.red";<br />

on div;<br />

off allfac;<br />



on revpri;<br />

y:=1-x;<br />

y := 1 - x<br />

let x^10=>0;<br />

repeat begin<br />

r:=df(y,x,x)+(1+x)*df(y,x)-6*y^2;<br />

r:=-int(int(r,x),x);<br />

y:=y+r;<br />

end until r=0;<br />

y:=y;<br />

y := 1 - x + (7/2)*x^2 - 3*x^3 + (25/6)*x^4 - (257/60)*x^5 + (219/40)*x^6<br />

     - (1433/252)*x^7 + (6355/1008)*x^8 - (199277/30240)*x^9<br />

end;<br />

Thus conclude that the Maclaurin series solution is<br />

\[ y = 1 - x + \tfrac{7}{2}x^2 - 3x^3 + \tfrac{25}{6}x^4 - \tfrac{257}{60}x^5 + \tfrac{219}{40}x^6 - \tfrac{1433}{252}x^7 + \tfrac{6355}{1008}x^8 - \tfrac{199277}{30240}x^9 + O(x^{10})\,. \]<br />

The following are the principles seen in this iterative approach to finding<br />

power series solutions to linear and nonlinear ode’s.<br />

• Make an initial approximation consistent with the initial conditions of<br />

the ode.<br />

• Seek as simple an ode as possible for successive corrections by substituting y n+1 =<br />

y n + ŷ n into the differential equation, grouping all the known terms<br />

into the residual R n , and then neglecting all but the dominant terms<br />

involving the change ŷ n :<br />

– neglect all nonlinear terms in the small change ŷ n ;<br />

– approximate all coefficient factors by the lowest order term in x;<br />

– and, counting each derivative with respect to x as equivalent to a<br />

division by x, keep only those terms <strong>of</strong> lowest order in x.<br />

This process is close kin to the linearisation that we employed in Module<br />

1 and will employ in later modules.<br />

• Iteratively make changes as guided by the residuals until the changes<br />

are zero to some order of error in x. This is handily done by computer<br />

algebra.
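These principles are not tied to reduce. As an illustration only (not part of the course software, and with helper names of our own invention), the following Python sketch mirrors the iteration of Example 6.10: it stores a Maclaurin series as a list of exact fraction coefficients, discards terms of order x 10 and beyond, and corrects by the residual until the change vanishes.<br />

```python
from fractions import Fraction as F

N = 10  # work to errors O(x^N), mirroring the statement  let x^10=>0;

def trunc(p):
    """Discard coefficients of x^N and higher."""
    return p[:N]

def add(p, q):
    r = [F(0)] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i] += c
    for i, c in enumerate(q):
        r[i] += c
    return trunc(r)

def mul(p, q):
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trunc(r)

def diff(p):
    """d/dx of a coefficient list."""
    return [F(i) * c for i, c in enumerate(p)][1:] or [F(0)]

def integ(p):
    """Integral with zero constant of integration."""
    return [F(0)] + [c / (i + 1) for i, c in enumerate(p)]

# initial approximation y0 = 1 - x, consistent with y(0) = 1, y'(0) = -1
y = [F(1), F(-1)]
for _ in range(20):  # a bounded loop guards against accidental infinite iteration
    # residual R = y'' + (1+x)y' - 6y^2 of the current approximation
    r = add(add(diff(diff(y)), mul([F(1), F(1)], diff(y))),
            [-6 * c for c in mul(y, y)])
    change = trunc([-c for c in integ(integ(r))])  # solve -yhat'' = R
    y = add(y, change)
    if all(c == 0 for c in change):
        break

print(y)  # coefficients start 1, -1, 7/2, -3, 25/6, matching the series above
```

The arithmetic is identical to that of nod.red, so the loop reproduces the coefficients through the x 9 term of the series stated above.<br />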



Warning: when testing computer algebra code, do not use the repeat-until<br />

loop; while testing use a for-do loop to ensure that you do not get<br />

stuck in an infinite loop. Only when you are sure that your code works<br />

do you replace the for-do loop with a repeat-until loop.<br />

Applying these principles becomes more involved when we develop<br />

power series about a singular point of an ode. We investigate a<br />

couple of examples.<br />

Example 6.11: Bessel function of order 0. Find the power series solution<br />

of x 2 y ′′ + xy ′ + x 2 y = 0 that is well-behaved at x = 0 to an error<br />

O(x 10 )—namely find the low-orders of a power series proportional to<br />

the Bessel function J 0 (x).<br />

Solution: First find and solve the indicial equation by substituting<br />

y = x r + O(x r+1 ). Here the ode becomes<br />

\[ x^2y'' + xy' + x^2y = r(r-1)x^r + rx^r + x^{r+2} + O(x^{r+1}) = r^2x^r + O(x^{r+1})\,, \]<br />

as the x r+2 term is absorbed into the error term O(x r+1 ).<br />

The only way this can be zero for all small x is if r 2 = 0. This leads, as<br />

discussed in Kreyszig [K,§4.4], to the homogeneous solutions of the ode<br />

being approximately y ≈ a+b log x. The logarithm is not well-behaved<br />

as x → 0 hence we set b = 0 and just seek solutions that tend to a<br />

constant as x → 0. Without loss of generality, because we can multiply<br />

by a constant later, we choose to find solutions such that y(0) = 1.



Second we make an initial approximation to the solution. After the<br />

above discussion of the indicial equation, choose y 0 = 1.<br />

Third, given a known approximation y n (x), seek an improved approximation<br />

y n+1 (x) = y n (x) + ŷ n (x) where ŷ n (x) is some small change.<br />

Substitute this into the ode, neglect x 2 ŷ n because it is two orders of x<br />

smaller than either x 2 ŷ n ′′ or xŷ′ n , and deduce that ŷ n must satisfy<br />

\[ -x^2\hat y_n'' - x\hat y_n' = R_n = x^2y_n'' + xy_n' + x^2y_n\,. \tag{6.6} \]<br />

Solving this for the correction ŷ n is no longer simply a matter of integrating<br />

twice.<br />

However, rearranging the form of the ode (6.6) we again express the<br />

solution in terms of two integrations. All we need to do is to notice<br />

that the left-hand side is identical to −x(xŷ ′ n ) ′ whence<br />

\[ -x(x\hat y_n')' = R_n \iff x\hat y_n' = -\int \frac{R_n}{x}\,dx \iff \hat y_n = -\int \frac{1}{x}\int \frac{R_n}{x}\,dx\,dx\,. \]<br />

Apply this iteration here.<br />

(a) In the first iteration y 0 = 1 so the residual R 0 = x 2 . Thus<br />

\[ \hat y_0 = -\int \frac{1}{x}\int \frac{x^2}{x}\,dx\,dx \]<br />



\[ = -\int \frac{1}{x}\left(\tfrac{1}{2}x^2 + b\right)dx = -\tfrac{1}{4}x^2 - b\log x + a \]<br />

for integration constants a and b.<br />

Note the freedom to include a − b log x into ŷ 0 , but we cannot<br />

tolerate any component in log x, as it behaves badly at x = 0, so<br />

b = 0, and a has to be chosen zero in order to ensure y n (0) = 1.<br />

(This argument applies at all iterations.) Hence y 1 = 1 − x 2 /4.<br />

(b) In the second iteration R 1 = −x 4 /4. Thus, setting the integration<br />

constants to zero as before,<br />

\[ \hat y_1 = -\int \frac{1}{x}\int \frac{-x^4/4}{x}\,dx\,dx = -\int \frac{1}{x}\left(-\frac{x^4}{16}\right)dx = \frac{x^4}{64}\,. \]<br />

Hence y 2 = 1 − x 2 /4 + x 4 /64.<br />

For a computer algebra program, proceed as in earlier examples but<br />

modify the two integrations as in<br />

y:=1;<br />

let x^10=>0;



repeat begin<br />

r:=x^2*df(y,x,x)+x*df(y,x)+x^2*y;<br />

r:=-int(int(r/x,x)/x,x);<br />

write y:=y+r;<br />

end until r=0;<br />

Execute this code and see the solution is<br />

\[ y = J_0(x) = 1 - \tfrac{1}{4}x^2 + \tfrac{1}{64}x^4 - \tfrac{1}{2304}x^6 + \tfrac{1}{147456}x^8 + O(x^{10})\,. \]<br />
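As a cross-check without reduce (again an illustrative Python sketch with invented helper names, not the course code), the modified double integration (dividing by x inside each integral) can be carried out on exact fraction coefficients:<br />

```python
from fractions import Fraction as F

N = 10  # work to errors O(x^N)

def trunc(p):
    return p[:N]

def add(p, q):
    r = [F(0)] * max(len(p), len(q))
    for i, c in enumerate(p):
        r[i] += c
    for i, c in enumerate(q):
        r[i] += c
    return trunc(r)

def diff(p):
    return [F(i) * c for i, c in enumerate(p)][1:] or [F(0)]

def integ(p):
    return [F(0)] + [c / (i + 1) for i, c in enumerate(p)]

def xmul(p, k):
    """Multiply a series by x^k."""
    return trunc([F(0)] * k + list(p))

def xdiv(p):
    """Divide by x; valid here because the constant term vanishes."""
    assert p[0] == 0
    return p[1:]

y = [F(1)]  # initial approximation y0 = 1, from the indicial discussion
for _ in range(20):  # bounded loop while experimenting
    # residual R = x^2 y'' + x y' + x^2 y
    r = add(add(xmul(diff(diff(y)), 2), xmul(diff(y), 1)), xmul(y, 2))
    # change = -int( (1/x) int( R/x dx ) dx ), integration constants zero
    change = trunc([-c for c in integ(xdiv(integ(xdiv(r))))])
    y = add(y, change)
    if all(c == 0 for c in change):
        break

# even coefficients: 1, -1/4, 1/64, -1/2304, 1/147456 (the J0 series above)
print(y)
```

The changes arrive at orders x 2 , x 4 , x 6 , x 8 , exactly as in the two hand iterations above, before the residual vanishes to the working order.<br />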

Example 6.12: Bessel functions <strong>of</strong> order 0. Find the power series expansion<br />

about x = 0, to errors O(x 10), of the general solution to Bessel’s<br />

equation with ν = 0, namely x 2 y ′′ + xy ′ + x 2 y = 0.<br />

Solution: The indicial equation shows that in general the dominant<br />

component in the solution is a + b log x for any a and b. (See that these<br />

were also naturally obtained in the integration constants of the previous<br />

example.) Use this as the first approximation y 0 and see what ensues.<br />

The derivation of the equation for the iterative changes, Eqn (6.6),<br />

remains the same.<br />

Including the command factor b,a,log; to improve the appearance<br />

of the printing and setting the initial approximation to y 0 = a + b log x,<br />

the reduce code is as before, namely



factor b,a,log;<br />

y:=a+b*log(x);<br />

let x^10=>0;<br />

repeat begin<br />

r:=x^2*df(y,x,x)+x*df(y,x)+x^2*y;<br />

r:=-int(int(r/x,x)/x,x);<br />

write y:=y+r;<br />

end until r=0;<br />

Run this code to see the result<br />

y := a*(1 - (1/4)*x^2 + (1/64)*x^4 - (1/2304)*x^6 + (1/147456)*x^8)<br />

     + b*((1/4)*x^2 - (3/128)*x^4 + (11/13824)*x^6 - (25/1769472)*x^8)<br />

     + log(x)*b*(1 - (1/4)*x^2 + (1/64)*x^4 - (1/2304)*x^6 + (1/147456)*x^8)<br />

That is, as Kreyszig assures us for double roots [K,p213], the general<br />

solution is of the form y = ay 1 (x) + by 2 (x) where here<br />

\[ y_1 = 1 - \tfrac{1}{4}x^2 + \tfrac{1}{64}x^4 - \tfrac{1}{2304}x^6 + \tfrac{1}{147456}x^8 + O(x^{10})\,, \]<br />

\[ y_2 = y_1(x)\log x + \tfrac{1}{4}x^2 - \tfrac{3}{128}x^4 + \tfrac{11}{13824}x^6 - \tfrac{25}{1769472}x^8 + O(x^{10})\,. \]<br />



This framework of using residuals to improve approximate solutions, getting<br />

computers to do the tedious algebra, can be adapted to a wide variety of<br />

problems. The iteration will improve an approximation provided the changes<br />

deduced from the residuals are appropriate, that is, provided a simple and sensible<br />

approximation to the equation for the changes has been derived. But the ultimate<br />

result depends only upon being able to evaluate the residuals correctly<br />

and being able to drive them to zero to some level of accuracy.<br />

Activity 6.L Do problems 6.18–6.23 in the Exercises set 6.3.4, p239. Send<br />

in to the examiner for feedback at least 6.18 & 6.20.<br />

6.3.4 Exercises<br />

Ex. 6.13: Modify the iteration of Example 6.6 to find the Maclaurin series<br />

solution to the ode y ′′ − 2y = 0 such that y(0) = 1 and y ′ (0) = 0 using<br />

reduce, to errors O(x 10 ).<br />

Ex. 6.14: Similarly use reduce to find the Maclaurin series solution to<br />

errors O(x 15 ) to the ode y ′′ + xy = 0 such that y(0) = a and y ′ (0) = b<br />

(remember to factor b,a;). The Maclaurin series multiplied by a and<br />

b are those of two linearly independent solutions to Airy’s equation<br />

mentioned in Kreyszig [K,p198,p958–60].



Ex. 6.15: Use reduce to find the Maclaurin series of the solution to y ′ =<br />

cos(x)y such that y(0) = 1 to errors O(x 10 ). Hint: replace cos x<br />

in the code by its Maclaurin series; you may use that factorial(n)<br />

in reduce computes n!. Compare your answer to that <strong>of</strong> the exact<br />

analytic solution obtained by recognising the ode is separable.<br />

Ex. 6.16: Modify the analysis <strong>of</strong> Example 6.8 to use reduce to find the<br />

Maclaurin series solution to errors O(x 10 ) of the nonlinear ode y ′′ = 6y 2<br />

such that y(0) = 1 and y ′ (0) = b where b is some arbitrary constant.<br />

Note: because this is a nonlinear ode the solution depends nonlinearly<br />

upon b, in contrast to linear ode’s which would show a linear<br />

dependence only.<br />

Ex. 6.17: Use reduce to find the Maclaurin series solution of the nonlinear<br />

ode y ′′ = (1+x)y 3 to errors O(x 10 ) such that y(0) = 2 and y ′ (0) = −3.<br />

Ex. 6.18: Modify the reduce computer algebra of Example 6.9 to find the<br />

Maclaurin series of the general solution to Legendre’s equation in the<br />

specific case k = 3 to an error O(x 10 ).<br />

Ex. 6.19: Modify the arguments and the reduce computer algebra of Example<br />

6.9 to find the Maclaurin series, to an error O(x 10 ), of the general<br />

solution to the following three odes:<br />

(a) (x − 2)y ′ = xy ;<br />

(b) (1 − x 2 )y ′ = 2xy ;



(c) y ′′ − 4xy ′ + (4x 2 − 2)y = 0 .<br />

Ex. 6.20: Modify the computer algebra code for Example 6.11 to find the<br />

Maclaurin series, to errors O(x 10 ), of the well-behaved solution of the<br />

nonlinear ode x 2 y ′′ + x 2 y ′ + xy 3 = 0 such that y(0) = 2.<br />

Ex. 6.21: Use reduce to help you find the power series about x = 0, to<br />

errors O(x 10 ), of the well-behaved solutions of the ode x 2 y ′′ + x 3 y ′ +<br />

(x 2 − 2)y = 0. Hint: x 2 y ′′ − 2y = (x 4 (y/x 2 ) ′ ) ′ . Then modify your<br />

reduce code to find the power series of the one parameter family of<br />

well-behaved solutions to the nonlinear ode x 2 y ′′ +x 3 y ′ +(x 2 −2)y+y 2 =<br />

0.<br />

Ex. 6.22: Use reduce to help find the power series about x = 0, to errors<br />

O(x 20 ), of the well-behaved solutions of the ode xy ′′ + 3y ′ + 3x 2 y = 0.<br />

Hint: xy ′′ + 3y ′ = (x 3 y ′ ) ′ /x 2 .<br />

Ex. 6.23: Find the power series expansions about x = 0, to errors O(x 10 ),<br />

for the two parameter general solution to the linear ode x 2 y ′′ −sin(x)y ′ +<br />

y = 0, with the aid of computer algebra. Hint: expand sin x in a<br />

Maclaurin series and write x 2 y ′′ − xy ′ in the form x 2−p (x p y ′ ) ′ .<br />

Ex. 6.24: Following is some reduce code to iteratively find a power series<br />

solution to an ode: what is the differential equation it purports to<br />

solve? and its initial conditions? what is the value of y after the first<br />

iteration of the repeat loop? what is the order of error in the computed<br />

power series after the loop terminates?



on div; <strong>of</strong>f allfac; on revpri;<br />

y:=2*x;<br />

let x^20=>0;<br />

repeat begin<br />

r:=(1-x^3)*df(y,x,x)-(y^2-x^2)*df(y,x);<br />

r:=-int(int(r,x),x);<br />

write y:=y+r;<br />

end until r=0;<br />

6.3.5 Summary <strong>of</strong> some reduce commands<br />

“the different branches <strong>of</strong> Arithmetic—Ambition, Distraction, Uglification<br />

and Derision.” the Mock Turtle in Alice in Wonderland<br />

by Lewis Carroll<br />

• reduce instructions must be terminated and separated by a semicolon.<br />

• quit; or bye; terminates reduce execution.<br />

• Use on div;, off allfac; and on revpri; to improve the printing<br />

of power series.<br />

• := is the assignment operator.



• The normal arithmetic operators are: +, -, *, / and ^ for addition,<br />

subtraction, multiplication, division and exponentiation respectively.<br />

• write will display the result of an expression, although reduce automatically<br />

displays the results of each command that is not in a loop.<br />

• int(y,x) will provide an integral of the expression in y with respect<br />

to the variable x, provided reduce can actually do the integral.<br />

• df(y,x) returns the derivative of the expression in y with respect to<br />

the variable x; df(y,x,z) will return the second derivative of y with<br />

respect to x and z.<br />

• factorial(n) returns the value of n!.<br />

• for n:=2:5 do, for example, will repeat whatever statement follows<br />

for values of the variable used, here n, over the range specified in the<br />

command, here from 2 to 5.<br />

• The let statement does pattern matching and replacement; for example<br />

let x^15=>0; tells reduce to subsequently discard any term<br />

involving x to the power fifteen or more.<br />

• repeat...until... will repeatedly execute a statement until the given<br />

condition is true.<br />

• begin...end is used to group statements into one; end; is also used<br />

to terminate reading in a file of reduce commands.<br />



• in "..."; tells reduce to execute the commands contained in the<br />

specified file.



6.4 The orthogonal solutions to second order differential<br />

equations<br />

Power series give us very powerful methods of deriving solutions to specific<br />

differential equations. But in order to guide us we need to know more about<br />

the structure of solutions to ode’s. Sturm-Liouville theory tells us how different<br />

solutions of an ode relate to each other (they are orthogonal), and<br />

something about their nature. This then allows us to usefully write functions<br />

in terms of families of solutions to an ode.<br />

In this section we identify patterns that occur across a wide range of ode’s.<br />

This is mathematics at a higher level—it brings together into the framework<br />

of Sturm-Liouville theory a variety of ode’s and their solutions. The<br />

task here is not the solution of actual problems, but the appreciation of the<br />

synthesis of wide-ranging phenomena in the solutions of ode’s.<br />

Main aims:<br />

• see that Legendre and Bessel equations are examples of Sturm-Liouville<br />

equations;<br />

• show that important properties such as reality of eigenvalues and orthogonality<br />

of eigenfunctions can be deduced from the differential equation.<br />



The simplest examples of functions displaying the properties that we investigate<br />

are the trigonometric functions and their harmonics, sin nx and cos nx<br />

for integer n. The properties are derived from their differential equation<br />

y ′′ + n 2 y = 0.<br />

The family of ode’s we consider are those in the form of the Sturm-Liouville<br />

equation<br />

[r(x)y ′ ] ′ + [q(x) + λp(x)]y = 0 , (6.7)<br />

where p, q and r are given functions and λ is a constant which is often a<br />

parameter to the problem. Many second-order ode’s are put into this form.<br />
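For example, Legendre’s equation fits this template after a rewrite into self-adjoint form (the same form quoted in the Summary, §6.5), and the harmonic equation y ′′ + n 2 y = 0 above is already of this form:<br />

```latex
% Legendre:  (1 - x^2)y'' - 2xy' + n(n+1)y = 0  rearranges to
\[ \bigl[(1-x^2)\,y'\bigr]' + n(n+1)\,y = 0\,, \]
% which is (6.7) with r(x) = 1 - x^2, q(x) = 0, p(x) = 1 and
% eigenvalue parameter lambda = n(n+1).
%
% Harmonics:  y'' + n^2 y = 0  is (6.7) with r(x) = 1, q(x) = 0,
% p(x) = 1 and lambda = n^2.
```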

Reading 6.M Study Kreyszig §4.7 [K,pp233–8], including the proof of Reality<br />

of eigenvalues in Appendix 4 [K,pA70].<br />

Recall that orthogonality is just a grand word for being at right angles. These<br />

properties of the orthogonality of eigenfunctions and the reality of eigenvalues<br />

λ are reminiscent of the similar properties for eigenvectors and eigenvalues<br />

of symmetric matrices. This is no accident and the connection is explored<br />

further in Module 7.<br />

Orthogonality implies oscillations: consider how a family of functions y n (x)<br />

can all be orthogonal to each other. First y 0 (x) can be fairly boring, such as<br />

the constant P 0 (x) or cos(0·x). Secondly, y 1 (x) has to change sign somewhere,<br />



as seen in P 1 (x) or cos(x), so that the integral ∫ b a y 0 (x)y 1 (x) dx can be zero by<br />

orthogonality. Thirdly, y 2 (x) has to be orthogonal to both y 0 (x) and y 1 (x)<br />

so it must oscillate a couple of times, as seen in P 2 (x). And so on—as we<br />

consider further y n (x) we find that successive y n (x) must have more and more<br />

oscillations in order to maintain orthogonality. This is seen for example in the<br />

families P n (x) and cos(nx). It holds very widely: solutions of Sturm-Liouville<br />

problems have more oscillations the higher the value of the corresponding<br />

eigenvalue. (This can be proved, but we will not do so here.)<br />
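This orthogonality is quickly checked computationally for the first few Legendre polynomials (an illustrative Python sketch of ours, using the standard polynomials P 0 = 1, P 1 = x, P 2 = (3x 2 − 1)/2 and exact arithmetic):<br />

```python
from fractions import Fraction as F

# The first three Legendre polynomials as coefficient lists [a0, a1, ...].
P0 = [F(1)]
P1 = [F(0), F(1)]
P2 = [F(-1, 2), F(0), F(3, 2)]        # P2(x) = (3x^2 - 1)/2

def integrate_product(p, q):
    """Exact integral of p(x)*q(x) over [-1, 1]."""
    prod = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += a * b
    # x^m integrates over [-1,1] to 2/(m+1) for even m, and to 0 for odd m
    return sum((c * 2) / (m + 1) for m, c in enumerate(prod) if m % 2 == 0)

print(integrate_product(P0, P1),
      integrate_product(P0, P2),
      integrate_product(P1, P2))      # each cross-integral vanishes
print(integrate_product(P2, P2))      # but P2 is not orthogonal to itself
```

The three cross-integrals vanish, as the argument above requires, while ∫ P 2 2 dx is the non-zero value 2/5.<br />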

Activity 6.N Do problems from Problem Set 4.7 [K,pp238–9]. Send in to<br />

the examiner for feedback at least Q4, 7 & 15.<br />

6.4.1 Answers to selected Exercises<br />

6.5 (a) expect power series solution because the coefficient functions are all<br />

well behaved at x = 0 and the leading coefficient of y ′′ is not zero.<br />

(b) \( y = 1 + \tfrac{3}{2}x^4 + O(x^4) \)<br />

6.13 \( y = 1 + x^2 + \tfrac{1}{6}x^4 + \tfrac{1}{90}x^6 + \tfrac{1}{2520}x^8 + O(x^{10}) \)<br />

6.14 \( y = b\left(x - \tfrac{1}{12}x^4 + \tfrac{1}{504}x^7 - \tfrac{1}{45360}x^{10} + \tfrac{1}{7076160}x^{13}\right) + a\left(1 - \tfrac{1}{6}x^3 + \tfrac{1}{180}x^6 - \tfrac{1}{12960}x^9 + \tfrac{1}{1710720}x^{12}\right) + O(x^{15}) \)<br />



6.15 \( y = 1 + x + \tfrac{1}{2}x^2 - \tfrac{1}{8}x^4 - \tfrac{1}{15}x^5 - \tfrac{1}{240}x^6 + \tfrac{1}{90}x^7 + \tfrac{31}{5760}x^8 + \tfrac{1}{5670}x^9 + O(x^{10}) \)<br />

6.16 \( y = 1 + 3x^2 + 3x^4 + 3x^6 + \tfrac{18}{7}x^8 + b\left(x + 2x^3 + 3x^5 + \tfrac{24}{7}x^7 + \tfrac{25}{7}x^9\right) + b^2\left(\tfrac{1}{2}x^4 + x^6 + \tfrac{45}{28}x^8\right) + b^3\left(\tfrac{1}{7}x^7 + \tfrac{5}{14}x^9\right) + O(x^{10}) \)<br />

6.17 \( y = 2 - 3x + 4x^2 - \tfrac{14}{3}x^3 + \tfrac{11}{2}x^4 - \tfrac{25}{4}x^5 + \tfrac{211}{30}x^6 - \tfrac{47}{6}x^7 + \tfrac{2081}{240}x^8 - \tfrac{41243}{4320}x^9 + O(x^{10}) \)<br />

6.18 \( y = a\left(1 - \tfrac{3}{2}x^2 - \tfrac{3}{8}x^4 - \tfrac{17}{80}x^6 - \tfrac{663}{4480}x^8\right) + b\left(x - \tfrac{1}{6}x^3 - \tfrac{3}{40}x^5 - \tfrac{27}{560}x^7 - \tfrac{159}{4480}x^9\right) + O(x^{10}) \)<br />

6.20 \( y = 2 - 4x + 4x^2 - \tfrac{32}{9}x^3 + \tfrac{26}{9}x^4 - \tfrac{56}{25}x^5 + \tfrac{3404}{2025}x^6 - \tfrac{832}{675}x^7 + \tfrac{1199}{1350}x^8 - \tfrac{4142}{6561}x^9 + O(x^{10}) \)<br />

6.21 Well behaved solutions are proportional to \( y = x^2 - \tfrac{3}{10}x^4 + \tfrac{3}{56}x^6 - \tfrac{1}{144}x^8 + O(x^{10}) \). The nonlinear solutions parametrised by a, the coefficient of<br />

the quadratic term, are \( y = a\left(x^2 - \tfrac{3}{10}x^4 + \tfrac{3}{56}x^6 - \tfrac{1}{144}x^8\right) + a^2\left(-\tfrac{1}{10}x^4 + \tfrac{11}{280}x^6 - \tfrac{661}{75600}x^8\right) + a^3\left(\tfrac{1}{140}x^6 - \tfrac{11}{3150}x^8\right) - \tfrac{17}{37800}a^4x^8 + O(x^{10}) \).<br />

6.22 Well behaved solutions are proportional to \( y = 1 - \tfrac{1}{5}x^3 + \tfrac{1}{80}x^6 - \tfrac{1}{2640}x^9 + \tfrac{1}{147840}x^{12} - \tfrac{1}{12566400}x^{15} + \tfrac{1}{1507968000}x^{18} + O(x^{20}) \)<br />

6.23 \( y = (a + b\log x)\left(x - \tfrac{1}{24}x^3 + \tfrac{7}{3840}x^5 - \tfrac{89}{1161216}x^7 + \tfrac{6721}{2229534720}x^9\right) + b\left(\tfrac{1}{23040}x^5 + \tfrac{11}{11612160}x^7 - \tfrac{5951}{44590694400}x^9\right) + O(x^{10}) \).<br />

6.24 \( (1 - x^3)y'' - (y^2 - x^2)y' = 0 \), such that y(0) = 0 and y ′ (0) = 2.<br />

\( y^{(1)} = 2x + \tfrac{1}{2}x^4 \). The ultimate error is \( O(x^{20}) \).<br />



6.5 Summary<br />

• Power series give a powerful general method for solving linear and<br />

nonlinear ordinary differential equations (ode’s). At a regular point<br />

(§§6.2.1) solutions of an ode are developed in the form of Taylor or<br />

Maclaurin series (§§6.1.1):<br />

\[ y(x) = \sum_{m=0}^{\infty} a_m (x-c)^m = a_0 + a_1(x-c) + a_2(x-c)^2 + \cdots\,. \]<br />

Because of the uniqueness of a power series representation, the constants<br />

a m are determined by equating coefficients of like powers of x − c<br />

(§§6.1.1).<br />

• Legendre polynomials, P n (x), are an example of special functions:<br />

– are the only non-singular solutions of Legendre’s equation which<br />

is, in Sturm-Liouville form, [(1 − x 2 )y ′ ] ′ + n(n + 1)y = 0 (§§6.1.2);<br />

– and are orthogonal over the interval [−1, 1] with weight function<br />

p(x) = 1 (§§6.4).<br />

• At a singular point (but not “too” singular §§6.2.1) Frobenius asserts<br />

solutions may be developed in the modified power series:<br />

\[ y(x) = (x-c)^r \sum_{m=0}^{\infty} a_m (x-c)^m = a_0(x-c)^r + a_1(x-c)^{r+1} + a_2(x-c)^{r+2} + \cdots\,. \]<br />



The exponent r is determined from the indicial equation obtained from<br />

the term of lowest order after substituting into the ode.<br />

• In applying the Frobenius method (§§6.2.1) to second-order ode’s there are<br />

generally two roots r 1 ≥ r 2 to the indicial equation and consequently<br />

three cases are distinguished (taking c = 0 for simplicity):<br />

– distinct roots not differing by an integer are straightforward—a<br />

basis for the solutions is<br />

\[ y_1(x) = x^{r_1}\left(a_0 + a_1x + a_2x^2 + \cdots\right), \qquad y_2(x) = x^{r_2}\left(b_0 + b_1x + b_2x^2 + \cdots\right); \]<br />

– a double root, r 1 = r 2 , when a basis is<br />

\[ y_1(x) = x^{r_1}\left(a_0 + a_1x + a_2x^2 + \cdots\right), \qquad y_2(x) = y_1(x)\log x + x^{r_1}\left(b_1x + b_2x^2 + \cdots\right); \]<br />

– roots differing by an integer when a basis is<br />

\[ y_1(x) = x^{r_1}\left(a_0 + a_1x + a_2x^2 + \cdots\right), \qquad y_2(x) = k\,y_1(x)\log x + x^{r_2}\left(b_0 + b_1x + b_2x^2 + \cdots\right); \]<br />

• Bessel functions, J ν (x) and Y ν (x), are special functions and<br />

– are solutions of Bessel’s equation (§§6.2.2) x 2 y ′′ + xy ′ + (x 2 − ν 2 )y =<br />

0 or, in Sturm-Liouville form, \( [xy']' + \left(x - \tfrac{\nu^2}{x}\right)y = 0 \);<br />



– are orthogonal over intervals with x > 0 in several senses (§§6.4)<br />

• The iterative construction of power series solutions is an ideal application<br />

of computer algebra (§§6.3.2) for linear and nonlinear problems<br />

provided we discard unwanted high-order terms.<br />

• A good iterative method is (§§6.3.3): given an approximate solution<br />

y(x), to seek small changes ŷ(x) so that y(x) + ŷ(x) is a better approximation.<br />

Such changes are determined from the residual of the<br />

governing equations.<br />

• Many ode’s of importance may be written in the form of the Sturm-<br />

Liouville equation (6.7), [r(x)y ′ ] ′ + [q(x) + λp(x)]y = 0:<br />

– non-zero solutions only exist for particular values of λ = λ n , called<br />

the eigenvalues, which are necessarily real;<br />

– the corresponding eigenfunctions are all orthogonal with weight<br />

function p(x) (§§6.4).<br />

Activity 6.O Do problems from Chapter 4 Review [K,pp247–8].


Module 7<br />

Linear transforms and their<br />

eigenvectors on inner product<br />

spaces<br />

Recall the work on differential equations and their orthogonal solutions that<br />

we finished in Module 6. Many of the properties we touched upon there are<br />

very similar to some that you have met in linear algebra before. The time<br />

has come to bring these two strands together.<br />

But solutions of differential equations involve the infinite flexibility of functions.<br />

We will see that functions act very much like vectors. But on any


Module 7. Linear transforms and their eigenvectors on inner product spaces 253<br />

finite interval there are not just an infinite number of functions, there is an<br />

infinite variety of functions. For example, in §7.4.3 we use the infinite number<br />

of solutions to Sturm-Liouville problems to describe any other solution function.<br />

But “infinity” is a slippery concept, so we now are very careful about<br />

how to establish the mathematical basis. First we create a basic structure<br />

for space, then the properties of mappings between spaces, and lastly the<br />

representation of these mappings by simple matrices of coefficients.<br />

Ultimately the development of a common setting allows us to draw simple<br />

vector pictures even when discussing concepts in extremely complicated situations<br />

such as the space of all continuous functions.<br />

Sturm-Liouville theory introduced in §6.4 is very close to properties of eigenvalues<br />

and eigenvectors of matrices. In this module we bring both within<br />

a unified view using the abstract theory of inner product spaces. We then<br />

extend the combined view a little further.<br />

Module contents<br />

7.1 Inner product spaces . . . . . . . . . . . . . . . . . . . 255<br />

7.1.1 Vector spaces form the universe . . . . . . . . . . . . . . 255<br />

7.1.2 Inner products give distances and angles . . . . . . . . . 262<br />

7.1.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 266<br />

7.2 The nature <strong>of</strong> linear transformations . . . . . . . . . . 269<br />

7.2.1 The universe <strong>of</strong> linear transformations . . . . . . . . . . 269



7.2.2 Adjoint operators . . . . . . . . . . . . . . . . . . . . . . 273<br />

7.2.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 282<br />

7.3 Revision <strong>of</strong> eigenvalues and eigenvectors . . . . . . . 284<br />

7.3.1 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 287<br />

7.4 Diagonalisation transformation . . . . . . . . . . . . . 289<br />

7.4.1 Adjoint eigenvectors diagonalise operators . . . . . . . . 290<br />

7.4.2 Orthogonal eigenvectors <strong>of</strong> self-adjoint operators . . . . 300<br />

7.4.3 Expansions in orthogonal eigenfunctions . . . . . . . . . 303<br />

7.4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . 308<br />

7.4.5 Answers to selected Exercises . . . . . . . . . . . . . . . 310<br />

7.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . 312



7.1 Inner product spaces

Here we establish the basic abstract structure of the spaces in which the analysis of linear algebra and of differential and integral equations takes place. The abstract concepts are supported by examples met in your earlier mathematics. The approach is to build up the structures and properties that are needed from an axiomatic base.

Main aims:

• to develop vector spaces and their properties from their basic axioms, and to understand how functions and IR^n are unified in this framework;
• to see how the definition of inner products leads to a unified view of the useful notions of length, angles and orthogonality;
• to show how familiar relations and inequalities generalise to many situations.

7.1.1 Vector spaces form the universe

The first step is to define the fundamental axioms of vector spaces.¹

¹ An entertaining and accurate introduction to vector spaces is available at http://ciips.ee.uwa.edu.au/~gregg/Linalg/node86.html



Reading 7.A Study the first three pages of §6.8 in Kreyszig [K, pp358–60]. Note the basic properties of vector addition and scalar multiplication on a vector space: closure, commutativity, associativity, distributivity, and the existence of the zero vector 0, the negative of a vector, and the multiplicative identity 1. As for ordinary vectors, there exist the concepts of linear combination, linear independence, dimensionality (both finite and infinite), and a basis.

In our study we will stay within the realm of real vector spaces.

Example 7.1: quadratic polynomials. Show that the set V of all quadratic polynomials (including those with zero coefficients) forms a vector space under the usual operations of addition and scalar multiplication, write down a basis for the vector space, and deduce that it is of dimension 3.

Solution: Denote by, for example, a the quadratic polynomial a_0 + a_1 x + a_2 x^2.

• Then "vector" (polynomial) addition a + b = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 clearly gives another quadratic polynomial in V, and so V is closed under addition.

• By definition and commutativity of ordinary addition:
  a + b = (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2
        = (b_0 + a_0) + (b_1 + a_1)x + (b_2 + a_2)x^2
        = b + a .

• Similarly for associativity:
  (u + v) + w = [(u_0 + v_0) + w_0] + [(u_1 + v_1) + w_1]x + [(u_2 + v_2) + w_2]x^2
              = [u_0 + (v_0 + w_0)] + [u_1 + (v_1 + w_1)]x + [u_2 + (v_2 + w_2)]x^2
              = u + (v + w) .

• Clearly the zero vector, 0, is the zero polynomial 0 + 0x + 0x^2, as a + 0 = (a_0 + 0) + (a_1 + 0)x + (a_2 + 0)x^2 = a.

• Now scalar multiplication, defined as ca = (ca_0) + (ca_1)x + (ca_2)x^2, clearly gives another quadratic and so V is closed under scalar multiplication.

• By definition and distributivity of ordinary multiplication:
  c(a + b) = c(a_0 + b_0) + c(a_1 + b_1)x + c(a_2 + b_2)x^2
           = (ca_0 + cb_0) + (ca_1 + cb_1)x + (ca_2 + cb_2)x^2
           = (ca_0) + (ca_1)x + (ca_2)x^2 + (cb_0) + (cb_1)x + (cb_2)x^2
           = (ca) + (cb) .

• Similarly for
  (c + d)a = (c + d)a_0 + (c + d)a_1 x + (c + d)a_2 x^2
           = (ca_0 + da_0) + (ca_1 + da_1)x + (ca_2 + da_2)x^2
           = (ca_0) + (ca_1)x + (ca_2)x^2 + (da_0) + (da_1)x + (da_2)x^2
           = (ca) + (da) .

• Again, by definition and associativity of ordinary multiplication,
  c(da) = c[(da_0) + (da_1)x + (da_2)x^2]
        = c[d(a_0 + a_1 x + a_2 x^2)]
        = (cd)a .

• Lastly, the number 1 clearly serves as the identity for scalar multiplication.

Thus this system forms a vector space. A basis for the vector space could be simply the powers of x in {1, x, x^2}, which in fact we used to show the vector space properties. Note that 1, x and x^2 are linearly independent quadratics because one cannot find a non-trivial linear combination of them that is the zero quadratic, that is, zero for all x. Another basis for the vector space could be the first three Legendre polynomials: P_0(x) = 1, P_1(x) = x and P_2(x) = -1/2 + (3/2)x^2. Since the number of basis vectors is necessarily three, then so is the dimensionality.
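Though not part of the printed notes, the coefficient view of Example 7.1 is easy to check numerically: a quadratic a_0 + a_1 x + a_2 x^2 corresponds to the coefficient triple (a_0, a_1, a_2), and the vector space operations become componentwise arithmetic. In the following Python sketch the representation and helper names are our own; it also expresses x^2 in the Legendre basis {P_0, P_1, P_2}.

```python
# Represent the quadratic a0 + a1*x + a2*x^2 by its coefficient triple.
def add(a, b):
    return tuple(ai + bi for ai, bi in zip(a, b))

def scale(c, a):
    return tuple(c * ai for ai in a)

a, b = (1.0, 2.0, 3.0), (4.0, -1.0, 0.5)
commutes = add(a, b) == add(b, a)                 # a + b = b + a
distributes = scale(2.0, add(a, b)) == add(scale(2.0, a), scale(2.0, b))

# First three Legendre polynomials as coefficient triples:
P0, P1, P2 = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-0.5, 0.0, 1.5)
# x^2 = (1/3)*P0 + (2/3)*P2, so the Legendre triple also spans V.
x_squared = add(scale(1.0 / 3.0, P0), scale(2.0 / 3.0, P2))
```

The last line recovers the coefficient triple of x^2, illustrating that either basis serves equally well.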



Example 7.2: Show that sets with set union (∪) as the addition operator cannot form a vector space.

Solution: Denote the "vectors", namely the subsets of some universal set U, by capital letters such as A and B.

(a) Clearly A + B = A ∪ B is a set in U, so "vector addition" is closed.

(b) Also clearly, A + B = A ∪ B = B ∪ A = B + A, so "vector addition" satisfies commutativity.

(c) Set union is associative, so (A + B) + C = (A ∪ B) ∪ C = A ∪ (B ∪ C) = A + (B + C) ensures the associativity of this "vector addition."

(d) Now A + 0 = A ∪ 0 = A can only hold for all sets A if the zero vector is 0 = ∅, the empty set.

(e) But then there is no negative for every set A, as clearly there is in general no set B (which we would like to denote by −A) such that A + B = A ∪ B = ∅.

Because of the failure of this property, we cannot form a vector space.

Definition 7.1 A square integrable function on the interval [a, b] is a function, say f(x), for which ∫_a^b [f(x)]^2 dx is finite valued. The set of all square integrable functions on [a, b] is denoted L_2[a, b].



Example 7.3: Argue that L_2[a, b] is a vector space under the usual addition and scalar multiplication of functions.

Solution: Denote the "vectors" by lower case letters such as f, g and h for the functions f(x), g(x) and h(x) respectively. Consider each property in turn.

• Define f + g to be the function with the value f(x) + g(x) for all x ∈ [a, b]. But is it necessarily in L_2[a, b], namely square integrable? Note the following inequality for any numbers a and b (remember this inequality: it comes from the parallelogram equality):
  (a + b)^2 = 2a^2 + 2b^2 − (a − b)^2 ≤ 2a^2 + 2b^2 .
  Apply this pointwise to the functions f and g:
  ∫_a^b (f + g)^2 dx ≤ ∫_a^b 2f^2 + 2g^2 dx = 2∫_a^b f^2 dx + 2∫_a^b g^2 dx ,
  and since the right-hand side is a finite upper bound for the non-negative integral on the left, f + g must be in L_2[a, b] and addition is closed.

• Also commutativity, f + g = g + f, follows from pointwise commutativity, f(x) + g(x) = g(x) + f(x).

• Similarly for associativity.

• Clearly the "zero vector" is the zero function, as f(x) + 0 = f(x) for all x.

• The "negative" of f is simply its pointwise negative −f(x); −f is clearly in L_2[a, b] if f is.

• L_2[a, b] is closed under scalar multiplication as ∫_a^b [cf(x)]^2 dx = c^2 ∫_a^b f^2 dx, which is finite for all finite c and square integrable f.

• As above, distributivity and associativity of scalar multiplication follow immediately from pointwise properties.

• Lastly, the identity for scalar multiplication is the function that is 1 for all x ∈ [a, b], as 1.f(x) = f(x).
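As a quick numeric illustration of the closure argument above (our own sketch, not part of the printed text), a crude midpoint rule confirms the bound ∫(f + g)^2 dx ≤ 2∫f^2 dx + 2∫g^2 dx for two sample functions on [0, 1]:

```python
import math

def integrate(h, a=0.0, b=1.0, n=1000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    dx = (b - a) / n
    return sum(h(a + (i + 0.5) * dx) for i in range(n)) * dx

f, g = math.sin, lambda x: x * x
lhs = integrate(lambda x: (f(x) + g(x)) ** 2)        # int (f+g)^2
rhs = 2 * integrate(lambda x: f(x) ** 2) + 2 * integrate(lambda x: g(x) ** 2)
```

Here lhs comes out below rhs, exactly as the pointwise inequality (a + b)^2 ≤ 2a^2 + 2b^2 guarantees.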

Often we only want to consider subsets of a vector space. For example, when solving a differential equation with boundary conditions we only need to consider those "vectors" in the vector space of functions which satisfy the boundary conditions. The notion of a subspace of a vector space is very useful from time to time. The proof of the following theorem follows directly from the properties of a vector space.

Theorem 7.2 A subset U of a vector space V is a vector space itself if it is closed under vector addition and scalar multiplication. Such a subset is then called a vector subspace.

Example 7.4: The set of vectors lying on any one line through the origin in the plane forms a vector subspace. Clearly, given a fixed line U through the origin: any two vectors lying in the line U add to another vector in U; any scalar multiple of a vector in the line U is also a vector in U. Thus such a line U is closed, is a subset of the vector space of the plane, and therefore is a vector subspace.

Activity 7.B Do Problems 1–12 in Problem Set 6.8 [K, p364], and 7.6–7.10 from Exercises 7.1.3. Send in to the examiner for feedback at least Q1, Q7 and Ex. 7.6.

7.1.2 Inner products give distances and angles

One of our fundamental needs is the notion of distance and angles. For example, only then can we determine the errors in an approximation. A generalisation of the vector dot product to an inner product serves this purpose in any vector space.

Reading 7.C Study the brief subsection on Inner Product Spaces in Kreyszig §6.8 [K, pp361–2].

Note, Kreyszig uses round brackets (parentheses) to denote a general inner product, (u, v), whereas I prefer the angle brackets, 〈u, v〉, as they are less likely to be mistaken for a vector with two components, and will use them throughout this study guide. Inner products occur so extensively in mathematics that one often uses many different types of brackets for different inner products on different vector spaces.

Definition 7.3 An inner product on a real vector space V is a real function 〈u, v〉 of each u and v in V such that the following properties hold:

1. linearity: 〈au + bv, w〉 = a〈u, w〉 + b〈v, w〉 for all real a and b, and all vectors u, v and w in V;

2. symmetry: 〈u, v〉 = 〈v, u〉 for all vectors u and v in V;

3. positivity: 〈v, v〉 ≥ 0 for all v, with equality holding only if v = 0.

A vector space with an inner product is called an inner product space.

Example 7.5: For functions f and g in L_2[a, b], determine whether 〈f, g〉 = ∫_a^b fg dx forms an inner product.

Solution: Since fg = (1/4)[(f + g)^2 − (f − g)^2], thus
∫_a^b fg dx = (1/4)[ ∫_a^b (f + g)^2 dx − ∫_a^b (f − g)^2 dx ] ,
which is always a finite real number as f ± g are in L_2[a, b].

linearity: for all c, d and functions f, g and h in L_2[a, b]:
〈cf + dg, h〉 = ∫_a^b (cf + dg)h dx = ∫_a^b cfh dx + ∫_a^b dgh dx = c〈f, h〉 + d〈g, h〉 .

symmetry: Clearly 〈f, g〉 = ∫_a^b fg dx = ∫_a^b gf dx = 〈g, f〉.

positivity: Also clearly 〈f, f〉 = ∫_a^b f^2 dx ≥ 0 as the integrand f^2 ≥ 0. However, 〈f, f〉 can be 0 without f being precisely zero. (Infinite dimensional function spaces are tricky.) For example, consider f(x) = 0 everywhere on [a, b] except for a finite number of points at which it takes some non-zero value; then 〈f, f〉 = ∫_a^b f^2 dx = 0 but f is not zero. Strictly speaking this 〈 , 〉 is not an inner product on L_2[a, b].

However, we can patch the definitions. Refine the definition of square integrable functions so that a "vector" f in L_2[a, b] is the set of all functions which are the same except at some number of isolated points. Then all the necessary properties of an inner product space follow, including that 〈f, f〉 = 0 only if f is the zero "vector" (the set of functions such that ∫_a^b f^2 dx = 0).

With an inner product defined, the definition of distance between two vectors follows immediately.

Definition 7.4 For vectors u and v in an inner product space, the length or norm of u is ‖u‖ = √〈u, u〉. (Thus the distance between u and v is ‖u − v‖.) A vector of norm 1 is called a unit vector.

Note especially the consequent Schwarz inequality, also known as the Cauchy-Schwarz inequality, the triangle inequality, and the parallelogram equality. These relations are familiar in 2- and 3-dimensional geometry, and now we know they also hold even for very esoteric vector spaces. It means that schematic diagrams we draw on paper are still relevant to infinite dimensional inner product spaces.

Inner products not only provide the notion of distance, they are also intimately tied up with the notion of angles and hence orthogonality. This underpins the orthogonality we discussed (§6.4) in the infinite number of eigenfunctions of Sturm-Liouville problems.



Definition 7.5 The angle θ between two vectors u and v in an inner product space is determined from

〈u, v〉 = ‖u‖ ‖v‖ cos θ , that is, θ = arccos( 〈u, v〉 / (‖u‖ ‖v‖) ) . (7.1)

Consequently, two vectors are orthogonal if their inner product 〈u, v〉 = 0.

Observe how the Cauchy-Schwarz inequality ensures that there is always a well defined angle between any two non-zero vectors of an inner product space. This leads to being able to characterise vectors for which 〈u, v〉 = 0 as being orthogonal, that is, at right angles, and leads to a generalised Pythagoras theorem.
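For instance (a numeric sketch of our own, using the L_2 inner product on [0, π]), sin x and cos x come out orthogonal, so the angle formula (7.1) returns a right angle:

```python
import math

def inner(f, g, a, b, n=2000):
    """Midpoint-rule approximation of <f, g> = int_a^b f g dx."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) * g(a + (i + 0.5) * dx)
               for i in range(n)) * dx

def norm(f, a, b):
    return math.sqrt(inner(f, f, a, b))

ip = inner(math.sin, math.cos, 0.0, math.pi)      # approximately 0
theta = math.acos(ip / (norm(math.sin, 0.0, math.pi)
                        * norm(math.cos, 0.0, math.pi)))
```

Here theta equals π/2 to within the quadrature error, confirming the orthogonality.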

Activity 7.D Do problems from Exercises 7.11–7.13. Send in to the examiner for feedback at least Ex. 7.13.

7.1.3 Exercises

Ex. 7.6: Show that sets of objects with set intersection, ∩, as the addition operator cannot form a vector space.

Ex. 7.7: Argue that the set of infinite sequences, denoted by IR^∞ and composed of elements of the form a = (a_1, a_2, a_3, ...), forms a vector space.

Ex. 7.8: Determine whether the following functions are in L_2 on the given interval:
(a) sin x on [0, π];
(b) cos x on (−∞, ∞);
(c) e^{−x} on [0, ∞);
(d) x^{−1/4} on [0, 1];
(e) 1/√x on [0, 4];
(f) x^{−3/4} on [1, ∞).

Ex. 7.9: Let L_2^w[a, b] denote the set of functions for which the weighted integral ∫_a^b w(x)[f(x)]^2 dx is finite for some positive weight function w(x). Argue that L_2^w[a, b] is a vector space under the usual addition and scalar multiplication of functions.

Ex. 7.10: Argue that the space C^n[a, b],² of all functions with n continuous derivatives on [a, b], forms a vector space under addition and scalar multiplication of functions.

Ex. 7.11: By considering ‖u + v‖^2 and using the Cauchy-Schwarz inequality, prove the triangle inequality ‖u + v‖ ≤ ‖u‖ + ‖v‖ for all vectors in an inner product space.

Ex. 7.12: Argue that the subset U of IR^∞ for which Σ_{i=1}^∞ a_i^2 is finite forms a subspace with the inner product 〈a, b〉 = Σ_{i=1}^∞ a_i b_i.

Ex. 7.13: Let u = x and v = x^2 with the inner product 〈f, g〉 = ∫_0^1 fg dx. What are the norms of u and v? What is the angle between u and v?

² Often the space C^0[a, b] of continuous functions on [a, b] is written as just C[a, b].



7.2 The nature of linear transformations

We need to start considering functions defined on vector spaces. The simplest examples are functions of many variables. But we will have to move to dealing with functions of an infinite number of variables and even functions of functions! In fact you are already intimately familiar with the examples of differentiation and integration: d/dx sin x = cos x, d/dx e^{x^2} = 2x e^{x^2} and d/dx (2√x) = 1/√x, so differentiation takes a function as an argument, such as sin x, and returns a function as a result, such as cos x. Here we investigate the simplest functions on a vector space, the linear transformations. They are the "straight lines" of vector spaces that in later units will form a basis for understanding quite general transformations.

Main aims:

• to show that familiar operations on functions are examples of linear transformations;
• to see that the adjoint is the general analogue of the transpose.

7.2.1 The universe of linear transformations

Reading 7.E Study the last part, Linear Transformations, of Kreyszig §6.8 [K, pp362–4].



Definition 7.6 If F : V → W is a function from the vector space V into the vector space W, then F is called a linear transformation if

1. F(u + v) = F(u) + F(v) for all vectors u and v in V, and

2. F(cu) = cF(u) for all vectors u in V and scalars c.

A linear transformation is also called a linear operator.

Example 7.14: Show that the differential operator L = d^2/dx^2 + x d/dx is a linear transformation from C^2[a, b] into C^0[a, b].

Solution: Since L involves at most the second derivative, the range and domain are clearly appropriate.

(a) Observe
L(f + g) = d^2/dx^2 (f + g) + x d/dx (f + g)
         = d^2 f/dx^2 + d^2 g/dx^2 + x df/dx + x dg/dx
         = d^2 f/dx^2 + x df/dx + d^2 g/dx^2 + x dg/dx
         = Lf + Lg ,

(b) and
L(cf) = d^2/dx^2 (cf) + x d/dx (cf)
      = c d^2 f/dx^2 + cx df/dx
      = cLf ,

which are the requisite properties for any functions f and g and any scalar c.
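The two linearity properties can also be checked numerically (our own sketch, not from the printed notes: L is approximated by central finite differences at a sample point, and since finite differencing is itself linear in the function values, both identities hold to within roundoff):

```python
import math

def L(f, x, h=1e-3):
    """Central-difference approximation of (d^2/dx^2 + x d/dx) f at x."""
    d2 = (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2
    d1 = (f(x + h) - f(x - h)) / (2.0 * h)
    return d2 + x * d1

f, g, c, x0 = math.sin, math.exp, 3.0, 0.7
additive = L(lambda t: f(t) + g(t), x0) - (L(f, x0) + L(g, x0))
homogeneous = L(lambda t: c * f(t), x0) - c * L(f, x0)
```

Both residuals are tiny: the discretisation inherits the linearity of L exactly, up to floating-point roundoff.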

Example 7.15: Argue that the integral L(f) = ∫_a^b K(x, y)f(y) dy is a linear operator from L_2[a, b] into itself, that is, from and to square integrable functions, provided that K is bounded, |K(x, y)| ≤ k, for x and y in the interval [a, b].

For example, if a = 0, b = 1 and K(x, y) = x − y (the bound is k = 1), then L(x^2) = x/3 − 1/4 and L(sin πx) = (2x − 1)/π.

Solution: As with many infinite dimensional vector space problems, the overwhelming difficulty lies in confirming the range of the function L. Thus we first dispense with the straightforward part of showing that it is linear:
is linear:



(a)
L(f + g) = ∫_a^b K(x, y)(f(y) + g(y)) dy
         = ∫_a^b K(x, y)f(y) + K(x, y)g(y) dy
         = ∫_a^b K(x, y)f(y) dy + ∫_a^b K(x, y)g(y) dy
         = L(f) + L(g) ;

(b)
L(cf) = ∫_a^b K(x, y)cf(y) dy
      = c ∫_a^b K(x, y)f(y) dy
      = cL(f) .

For f ∈ L_2[a, b] we know ∫_a^b f^2(x) dx is finite. We now need to prove that g(x) = ∫_a^b K(x, y)f(y) dy is also in L_2[a, b]. To help, we define the inner product for any f and g, 〈f, g〉 = ∫_a^b fg dx, and use the Cauchy-Schwarz inequality, 〈f, g〉^2 ≤ ‖f‖^2 ‖g‖^2. Consider

∫_a^b g^2 dx = ∫_a^b L(f)L(f) dx
  = ∫_a^b ∫_a^b K(x, y)f(y) dy ∫_a^b K(x, z)f(z) dz dx   (as any variable may be used for y in L(f))
  = ∫_a^b ∫_a^b [ ∫_a^b K(x, y)K(x, z) dx ] f(y)f(z) dy dz   (rearranging)
  ≤ ∫_a^b ∫_a^b [ ∫_a^b |K(x, y)|.|K(x, z)| dx ] |f(y)|.|f(z)| dy dz   (the inner integral is ≤ (b − a)k^2)
  ≤ k^2 (b − a) ∫_a^b |f(y)| dy ∫_a^b |f(z)| dz
  = k^2 (b − a) 〈1, |f|〉^2   (by definition of the inner product)
  ≤ k^2 (b − a) ‖1‖^2 ‖f‖^2   (by Cauchy-Schwarz)
  = k^2 (b − a)^2 ∫_a^b f^2(x) dx   (by definition of the norms).

This bound on the integral is finite and thus g = L(f) is necessarily square integrable, that is, in L_2[a, b].
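The worked values quoted above are easy to confirm with a crude quadrature (our own check, not in the printed notes): with K(x, y) = x − y on [0, 1], L(x^2) should match x/3 − 1/4 and L(sin πy) should match (2x − 1)/π.

```python
import math

def L(f, x, n=4000):
    """Midpoint-rule approximation of int_0^1 (x - y) f(y) dy."""
    dy = 1.0 / n
    return sum((x - (i + 0.5) * dy) * f((i + 0.5) * dy)
               for i in range(n)) * dy

x = 0.3
v1 = L(lambda y: y * y, x)                  # expect x/3 - 1/4
v2 = L(lambda y: math.sin(math.pi * y), x)  # expect (2x - 1)/pi
```

Both quadrature values agree with the closed forms to several decimal places.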

Activity 7.F Do Exercise 7.23 herein.

7.2.2 Adjoint operators

Recall that the transpose of a matrix often crops up in solving matrix problems. For example, the least-squares solution of an overdetermined system Ax = b is found by solving A^T Ax = A^T b. Also, the eigenvalues of a symmetric matrix, one for which A^T = A, are always real. For general linear transforms we define the equivalent notion of an adjoint operator.

Definition 7.7 The adjoint of a linear operator L mapping a subspace V into a subspace U of an inner product space W is the operator L† such that 〈u, Lv〉 = 〈L†u, v〉 for all vectors u ∈ U and v ∈ V.

If L† = L and U = V then L is called self-adjoint.

Example 7.16: A† = A^T. The adjoint of a matrix is its transpose using the usual inner product 〈u, v〉 = u^T v. For all u and v, consider:

〈u, Av〉 = u^T Av   (by the inner product definition)
        = (A^T u)^T v   (by transpose properties)
        = 〈A^T u, v〉   (by the inner product definition),

and hence the adjoint is A^T. Clearly a symmetric matrix is self-adjoint.
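A small concrete check of this identity (our own sketch, with an arbitrary 2 × 2 matrix and test vectors) that 〈u, Av〉 = 〈A^T u, v〉:

```python
A = [[1.0, 2.0], [3.0, 4.0]]
u, v = [0.5, -1.0], [2.0, 0.25]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

At = [[A[j][i] for j in range(2)] for i in range(2)]   # the transpose
lhs = dot(u, matvec(A, v))      # <u, Av>
rhs = dot(matvec(At, u), v)     # <A^T u, v>
```

The two inner products agree exactly, as the transpose argument predicts.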

Theorem 7.8 Some straightforward properties of the adjoint follow. For any linear operators L and M:

1. (L†)† = L;

2. (L + M)† = L† + M†;

3. (LM)† = M†L†;

4. the adjoint depends upon the inner product: if the inner product is changed then so does the adjoint.

The proofs of these properties are left as Exercise 7.26.

Example 7.17: The shear transformation of the plane in the horizontal x-direction with parameter k is T(x, y) = (x + ky, y). This has matrix

A = [ 1  k ]
    [ 0  1 ]

so that T(x, y) = A[x; y]. Thus from Example 7.16 its adjoint must have matrix

A^T = [ 1  0 ]
      [ k  1 ]

so that T†(x, y) = A^T [x; y] = [x; kx + y]. Thus T† is the shear transformation in the vertical y-direction with parameter k.

But what is T† if we use a weighted inner product? Say the inner product on the plane is defined as 〈(u, v), (x, y)〉 = 2xu + yv, so that we weight the horizontal direction more than the vertical direction. Then for all u, v, x and y

〈(u, v), T(x, y)〉 = 〈(u, v), (x + ky, y)〉
                  = 2u(x + ky) + vy
                  = 2ux + (2ku + v)y
                  = 〈(u, 2ku + v), (x, y)〉

and so T†(x, y) = (x, 2kx + y), which is again a shear in the vertical but now with parameter 2k. The adjoint depends upon the choice of inner product.
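The weighted-adjoint claim can be verified numerically too (our own sketch; the value k = 1.5 and the test points are arbitrary): with 〈(u, v), (x, y)〉 = 2xu + yv, the operator T†(x, y) = (x, 2kx + y) satisfies the defining identity.

```python
k = 1.5

def ip(p, q):
    """Weighted inner product <(u,v),(x,y)> = 2*x*u + y*v on the plane."""
    return 2.0 * p[0] * q[0] + p[1] * q[1]

def T(p):        # horizontal shear, parameter k
    return (p[0] + k * p[1], p[1])

def T_dag(p):    # claimed adjoint: vertical shear, parameter 2k
    return (p[0], 2.0 * k * p[0] + p[1])

uv, xy = (0.7, -2.0), (3.0, 0.5)
lhs = ip(uv, T(xy))        # <(u,v), T(x,y)>
rhs = ip(T_dag(uv), xy)    # <T_dag(u,v), (x,y)>
```

Changing the weights in `ip` changes which operator satisfies the identity, illustrating point 4 of Theorem 7.8.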

Example 7.18: Find the adjoint of the linear operator

Lf = ∫_a^b K(x, y)f(y) dy .

Solution: Using the inner product 〈f, g〉 = ∫_a^b f(x)g(x) dx, we have

〈f, Lg〉 = ∫_a^b f(x) ∫_a^b K(x, y)g(y) dy dx   (by the definitions)
        = ∫_a^b [ ∫_a^b f(x)K(x, y) dx ] g(y) dy   (swapping the order of integration)
        = ∫_a^b [ ∫_a^b K(y, x)f(y) dy ] g(x) dx   (swapping the roles of x and y)
        = 〈 ∫_a^b K(y, x)f(y) dy , g 〉 ,

and so the adjoint is L†f = ∫_a^b K(y, x)f(y) dy. (This is analogous to the matrix transpose.) It is not the same as L because the arguments of K are interchanged, unless K(y, x) = K(x, y) for all x and y, in which case K is called symmetric and then the operator L is self-adjoint.

Example 7.19: the adjoint of d/dx is almost −d/dx! Consider this differentiation operator over C^1[a, b] with the usual inner product 〈f, g〉 = ∫_a^b f(x)g(x) dx. Now

〈f, dg/dx〉 = ∫_a^b f (dg/dx) dx
           = [fg]_a^b − ∫_a^b (df/dx) g dx   (by integration by parts)
           = f(b)g(b) − f(a)g(a) + 〈−df/dx, g〉 .

The inner product appearing here indeed suggests that (d/dx)† = −d/dx, but the exact identity required by the definition of the adjoint actually does not hold unless we also have f(b)g(b) − f(a)g(a) = 0. (This is a usual difficulty in function spaces.)



• One way to ensure this is to restrict the set of functions on which the adjoint is defined to the subspace U of C^1[a, b] of functions that are zero at a and b; with f(a) = f(b) = 0, then f(b)g(b) − f(a)g(a) = 0 and hence (d/dx)† = −d/dx on U.

• Another way, more aesthetically pleasing, is to restrict d/dx to the subspace V of functions zero at a, and restrict (d/dx)† to the subspace U (redefined) of functions zero at b (or vice versa); then f(b)g(b) − f(a)g(a) = 0 as g(a) = 0 and f(b) = 0, and hence (d/dx)† = −d/dx.
dx<br />

The previous example begins to show that initial and/or boundary conditions are an integral part of operators and their adjoints.

Example 7.20: L = d^2/dx^2 is self-adjoint on the subspace V of C^2[a, b] of functions that are zero at x = a and b, using the usual inner product.

Solution: Just consider (using dashes for derivatives)

〈f, Lg〉 = ∫_a^b fg″ dx
        = [fg′]_a^b − ∫_a^b f′g′ dx   (integrating by parts once)
        = [fg′ − f′g]_a^b + ∫_a^b f″g dx   (integrating by parts again)
        = [fg′ − f′g]_a^b + 〈Lf, g〉
        = [fg′]_a^b + 〈Lf, g〉   (as g(a) = g(b) = 0 for g ∈ V)
        = 〈Lf, g〉

for f ∈ V, as then f(a) = f(b) = 0. Thus L† = L on V and L is self-adjoint.

Example 7.21: Find the adjoint of L = d^2/dx^2 + x d/dx on the subspace V = {g ∈ C^2[a, b] | g′(a) = g(b) = 0}, using the usual inner product.

Solution: Consider

〈f, Lg〉 = ∫_a^b f(g″ + xg′) dx
        = ∫_a^b fg″ + xfg′ dx
        = [fg′ + xfg]_a^b − ∫_a^b f′g′ + (xf)′g dx   (integrating by parts)
        = f(b)g′(b) − af(a)g(a) − ∫_a^b f′g′ + (xf′ + f)g dx   (as g′(a) = g(b) = 0)
        = f(b)g′(b) − af(a)g(a) − [f′g]_a^b + ∫_a^b (f″ − xf′ − f)g dx   (integrating f′g′ by parts)
        = f(b)g′(b) − af(a)g(a) + f′(a)g(a) + 〈f″ − xf′ − f, g〉   (since g(b) = 0).

Thus L† = d^2/dx^2 − x d/dx − 1 provided we restrict it to the subspace U of functions f such that f(b)g′(b) + [−af(a) + f′(a)]g(a) = 0. Now we cannot control g′(b) nor g(a), as these may vary over any values in V. Thus we require that f(b) = 0 and f′(a) = af(a): the adjoint is L† = d^2/dx^2 − x d/dx − 1 over the subspace

U = {f ∈ C^2[a, b] | f′(a) − af(a) = f(b) = 0} .

Example 7.22: Sturm-Liouville Show that Sturm-Liouville operators are<br />

self-adjoint in the usual inner product on suitable subsets <strong>of</strong> C 2 [a, b].<br />

Solution:<br />

As seen in §6.4 the Sturm-Liouville equation is<br />

Lg = [r(x)g ′ ] ′ + [q(x) + λp(x)]g = 0 ,<br />

for some functions p, q and r. Let’s consider the subspace <strong>of</strong> functions<br />

satisfying the quite general boundary conditions kg(a) + lg ′ (a) = 0 and


Module 7. Linear transforms and their eigenvectors on inner product spaces281<br />

mg(b) + ng ′ (b) = 0. Then<br />

〈f, Lg〉 =<br />

=<br />

∫ b<br />

a<br />

∫ b<br />

a<br />

f{(rg ′ ) ′ + (q + λp)g} dx<br />

f(rg ′ ) ′ + (q + λp)fg dx<br />

∫ b<br />

= [rfg ′ − rf ′ g] b a + (f ′ r) ′ g + (q + λp)fg dx<br />

a<br />

after integrating f(rg ′ ) ′ by parts twice<br />

= [r(fg ′ − f ′ g)] b a<br />

+ 〈Lf, g〉 .<br />

Thus for L to be self-adjoint we either need r(x) to be zero at the endpoints or, the case we consider here, fg′ − f′g = 0 at the two ends. If l ≠ 0 then, where all functions are evaluated at x = a, g′ = −kg/l and hence
fg′ − f′g = −kfg/l − f′g = −(kf + lf′)g/l = 0
if and only if kf + lf′ = 0, since g could have any value. Similarly for the other end point x = b. If it happens that l = 0 then even easier arguments apply. We have shown that the functions f for the adjoint satisfy the same boundary conditions as for L itself. Thus Sturm-Liouville operators are self-adjoint.

Observe in these examples how a differential operator together with suitable boundary conditions is very naturally complemented by the adjoint and its boundary conditions. If the operator and boundary conditions are used to form a well-posed differential equation, then so do the adjoint and its boundary conditions form a well-posed differential equation. Soon we will see that solutions to an ode are usefully related to those of its adjoint. The relationship is particularly useful when the operator is self-adjoint.

7.2.3 Exercises<br />

Ex. 7.23: Argue that the set of linear transforms from a vector space U to a vector space V, L : U → V, is itself a vector space under operator addition, where L + M is the transformation that, applied to any vector u ∈ U, gives L(u) + M(u), and the operation of scalar multiplication, where cL is the transformation that, applied to any vector u, gives cL(u). (For example, all the transformations of the plane, possibly represented as all 2 × 2 matrices, themselves form a vector space.)

Ex. 7.24: Show that the adjoint of a matrix A under the weighted inner product 〈u, v〉 = uᵀBv, for some suitable weight matrix B, is A† = (BAB⁻¹)ᵀ.
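A quick numerical experiment makes the claimed formula plausible by checking the defining property 〈u, Av〉 = 〈A†u, v〉. This is a sketch in Python with NumPy (not the unit's Matlab); the particular matrices A and B here are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))          # an arbitrary matrix
M = rng.standard_normal((3, 3))
B = M @ M.T + 3 * np.eye(3)              # a symmetric positive-definite weight matrix

A_adj = (B @ A @ np.linalg.inv(B)).T     # the claimed adjoint under <u, v> = u^T B v

u = rng.standard_normal(3)
v = rng.standard_normal(3)
print(np.isclose(u @ B @ (A @ v), (A_adj @ u) @ B @ v))   # True
```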

Ex. 7.25: Describe the adjoint of the transformation of the plane, T, which is rotation by an angle θ.

Ex. 7.26: Use the definition of the adjoint to prove the first three properties listed in Theorem 7.8.



Ex. 7.27: Find the adjoint of Lf = ∫_a^b K(x, y)f(y) dy under the weighted inner product 〈f, g〉 = ∫_a^b w(x)f(x)g(x) dx.

Ex. 7.28: Find the adjoints of the following differential operators L and the subspaces they operate on:
(a) Lf = df/dx such that 2f(0) + 5f(1) = 0;
(b) Lf = f′′ + 3f′ + 4f such that f(0) = 0 and f′(1) = 0;
(c) Lf = f′′′ + f′ such that f(0) = 0, f′(0) = 2f′′(1), f′(1) = 3f(1).
Use the inner product 〈f, g〉 = ∫_0^1 fg dx.

Ex. 7.29: Find the adjoints of the following differential operators L and the subspaces they operate on:
(a) Lf = df/dx such that f(0) = 3f(1);
(b) Lf = f′′ + 2f′ + f such that f(0) = f(1) = 0;
(c) Lf = f′′′ such that f(0) = 0, f′(0) = 2f′′(1), f(1) = f′(1).
Use the inner product 〈f, g〉 = ∫_0^1 fg dx.



7.3 Revision of eigenvalues and eigenvectors

Reading 7.G Start by recalling familiar properties of eigenvalues and eigenvectors of matrices by revising the material in Kreyszig §7.1–2 [K, pp371–81].

The critical facets:
• a matrix A has a non-zero eigenvector v and eigenvalue λ if Av = λv, that is, if the action of A is simply to stretch or compress v (possibly reversing direction if λ < 0);
• eigenvalues are the solutions of the characteristic equation det(λI − A) = 0; the spectrum of A is the set of eigenvalues of A;
• for any given eigenvector v, any scalar multiple is also an eigenvector, so each eigenvector is a basis for a subspace; thus we seek linearly independent eigenvectors to avoid duplication;
• counted according to their multiplicity, there are precisely n eigenvalues (possibly complex) of an n × n matrix;
• if the n eigenvalues of an n × n matrix are distinct, then there are precisely n linearly independent eigenvectors; however, if one or more eigenvalues are repeated, then the matrix may have fewer than n linearly independent eigenvectors;



• in Matlab,
– poly(a) returns the coefficients of the characteristic polynomial det(λI − A),
– eig(a) returns a vector of eigenvalues,
– whereas [p,d]=eig(a) returns a diagonal matrix D of eigenvalues and a matrix P whose columns are eigenvectors, so that A = PDP⁻¹.
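For those working without Matlab, the same computation can be sketched in Python with NumPy (an assumption of convenience; numpy.linalg.eig plays the role of eig):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
evals, P = np.linalg.eig(A)      # like Matlab's [p,d] = eig(a)
D = np.diag(evals)
print(np.sort(evals))            # the spectrum of A: eigenvalues 2 and 4
print(np.allclose(A, P @ D @ np.linalg.inv(P)))   # True: A = P D P⁻¹
```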

Activity 7.H Ensure you can do the problems in Kreyszig Problem Sets 7.1<br />

and 7.2 [K,pp375–6 & pp379–81], and Exercises 7.31 and 7.32. Send in<br />

to the examiner for feedback at least Ex. 7.31(a) & 7.32(a).<br />

Many of the above properties hold on function spaces. However, the determinant is not defined there, so other methods have to be used to find the eigenvalues and eigenvectors.

Example 7.30: Find the eigenvalues and eigenfunctions corresponding to non-zero eigenvalues of the linear transformation ∫_0^1 (x + y)f(y) dy over the space of continuously differentiable functions.
Solution:
We seek non-trivial solutions to
∫_0^1 (x + y)f(y) dy = λf(x).



• Expanding the integral on the left-hand side, observe
x ∫_0^1 f(y) dy + ∫_0^1 y f(y) dy = λf(x).
• As the left-hand side is a linear function of x, then so must be the right-hand side and hence so is f(x) (unless the left-hand side is zero, which then implies λ = 0). Try f(x) = Ax + B in the equation to deduce
x(A/2 + B) + (A/3 + B/2) = λAx + λB.
• This has to hold for all x, and so the coefficients on the two sides must be equal: A/2 + B = λA and A/3 + B/2 = λB, which in matrix form is the matrix eigenproblem
[ 1/2 1 ; 1/3 1/2 ] (A, B) = λ(A, B).
• The characteristic equation for this 2 × 2 matrix is (λ − 1/2)² − 1/3 = 0, with solutions the non-zero eigenvalues λ = 1/2 ± 1/√3.
• The corresponding eigenvectors of the 2 × 2 problem are clearly proportional to (±√3, 1), which in terms of the functions in the function space are simply those proportional to the eigenfunctions f = ±√3 x + 1.
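These eigenvalues can be checked numerically by discretising the integral operator as a matrix. The sketch below is in Python with NumPy (rather than the unit's Matlab); the midpoint rule and the grid size n = 400 are arbitrary choices for illustration.

```python
import numpy as np

# Midpoint-rule discretisation of (Lf)(x) = integral_0^1 (x+y) f(y) dy as an n-by-n matrix
n = 400
y = (np.arange(n) + 0.5) / n
K = (y[:, None] + y[None, :]) / n
evals = np.sort(np.linalg.eigvals(K).real)
print(evals[0], evals[-1])   # approx 1/2 - 1/sqrt(3) and 1/2 + 1/sqrt(3); the rest approx 0
```

Because the kernel x + y has rank two, only two eigenvalues are (numerically) non-zero, matching the analysis above.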



7.3.1 Exercises<br />

Ex. 7.31: Each of the four pictures plotted below shows the effect on vectors in the plane of a different transformation of the plane obtained by multiplying by a different 2 × 2 matrix. In each picture there are seven different coloured dashed vectors terminated by open circles; call them u_i. In each picture the vectors resulting from multiplying by a matrix A are also plotted, say v_i = Au_i, and drawn as solid lines terminated by "*". Using the fact that the action of a matrix is to just stretch its eigenvectors by a factor λ (and reverse direction if λ < 0), draw on each picture your best estimate of the two directions corresponding to the two different families of eigenvectors of a 2 × 2 transformation. Label them with a rough estimate of the corresponding eigenvalue.



[Figure: four panels (a)–(d), each showing the seven dashed vectors u_i and the corresponding solid vectors v_i = Au_i for a different 2 × 2 matrix A.]

Ex. 7.32: Find the only non-zero eigenvalues and the corresponding eigenfunctions of the following linear transformations:
(a) Lf = ∫_0^1 2(x − y)f(y) dy;
(b) Lf = ∫_a^b exp(x − y)f(y) dy;
(c) Lf = ∫_0^π cos(x + y)f(y) dy.



7.4 Diagonalisation transformation<br />

The orthogonal solutions of a Sturm-Liouville problem can be used to solve simply inhomogeneous differential equations of the form [r(x)y′]′ + [q(x) + µp(x)]y = f(x). The trick is to express the right-hand side f(x) as a sum over the eigenfunctions of the differential operator. We now explore how this is analogous to the diagonalisation of matrices and proceed to develop further properties.
Main aims:
• show that eigenvectors (eigenfunctions) of the adjoint operator are used to find expansions in the eigenvectors (eigenfunctions) of an operator;
• because the adjoint eigenvectors are orthogonal to the eigenvectors, see that eigenvectors of a self-adjoint operator are orthogonal;
• an eigenfunction expansion solution to the Sturm-Liouville type problem [r(x)y′]′ + [q(x) + µp(x)]y = f(x) is the linear combination of eigenfunctions ∑_j y_j(x) 〈f/p, y_j〉 / (µ − λ_j) for any specific value of the parameter µ.



7.4.1 Adjoint eigenvectors diagonalise operators<br />

Example 7.33: Consider solving the simple linear equation
Ax = [ 3 1 ; 1 3 ] x = (3, −1) = (1, 1) + 2(1, −1).
The rearrangement on the very right of this equation is motivated because I happen to know that (1, ±1) are eigenvectors of the matrix A; they correspond to eigenvalues 4 and 2. Knowing this we solve the linear equation using the "method of undetermined coefficients" by guessing a solution in the form of a linear combination of the two eigenvectors:
x = a(1, 1) + b(1, −1).
Substituting this into the linear equation and noting that A(1, 1) is 4(1, 1) and A(1, −1) is 2(1, −1),
Ax = (3, −1) becomes 4a(1, 1) + 2b(1, −1) = (1, 1) + 2(1, −1).
Equating coefficients on both sides shows 4a = 1 and 2b = 2, that is, a = 1/4 and b = 1; hence the solution is
x = (1/4)(1, 1) + (1, −1) = (5/4, −3/4).
4 1 −1 −3/4



The process just given here is the same as that we use in the eigenfunction expansion of solutions of odes in §7.4.3. Look at and compare with Example 7.37. This process is intimately tied up with the diagonalisation of matrices and linear operators, because the solution of linear equations with a diagonal operator is near trivial, as shown above.

Reading 7.I Study Kreyszig §7.5 [K,pp392–6] (overlook Theorem 4 and Example<br />

3).<br />

Using diagonalisation, A = PDP⁻¹, the solution of the system Ax = b is written as x = PD⁻¹P⁻¹b, which is easy to compute as D⁻¹ is simply the diagonal matrix of reciprocals of the eigenvalues. More explicitly we might write this as x = Pa = ∑_j v_j a_j, where the "amplitudes" a_j of the eigenvectors v_j in the solution are given by a_j = (w_j · b)/λ_j, where w_j comes from the jth row of P⁻¹. Implicitly, this general result is used in the introductory example.
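In code, the diagonalisation formula reads off directly. A sketch in Python with NumPy, applied to the matrix of Example 7.33:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, -1.0])
evals, P = np.linalg.eig(A)
x = P @ np.diag(1.0 / evals) @ np.linalg.inv(P) @ b   # x = P D^{-1} P^{-1} b
print(x)   # [ 1.25 -0.75], agreeing with Example 7.33
```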

Activity 7.J Do problems 1–6 and 10–22 from Problem Set 7.5 [K,pp397–8]<br />

Theorem 7.11 on deriving the eigenfunction expansion of solutions to inhomogeneous Sturm-Liouville problems, seen in action in Example 7.37, is equivalent to using a diagonalisation.
We proceed to determine how to "diagonalise operators", not just matrices. The general setting is of linear transforms A acting on some vector space with an inner product 〈 , 〉. The definition of eigenvalues and eigenvectors proceeds as before. However, there is one new twist that is useful.



Definition 7.9 For a linear transformation A the eigenvectors of A† are called the left-eigenvectors or adjoint eigenvectors of A.
They are called left-eigenvectors because, in the case of a matrix A, the defining equation Aᵀw = λw is, upon transposing, equivalent to wᵀA = λwᵀ, in which wᵀ appears to the left of A. Three important properties are the following.
identical spectrum The spectrum of the adjoint A†, the set of eigenvalues, is the same as that of A. This is easily seen in finite dimensions because the characteristic polynomials of a matrix and its transpose are the same.

orthogonal eigenvectors Any left-eigenvector and ordinary eigenvector corresponding to distinct eigenvalues are orthogonal.
To see this, suppose v_i is an eigenvector corresponding to eigenvalue λ_i and w_j is a left-eigenvector corresponding to eigenvalue λ_j ≠ λ_i. Then consider
λ_i 〈w_j, v_i〉 = 〈w_j, λ_i v_i〉
= 〈w_j, Av_i〉 by definition of eigenvector
= 〈A†w_j, v_i〉 by definition of adjoint
= 〈λ_j w_j, v_i〉 by definition of left-eigenvector
= λ_j 〈w_j, v_i〉.



Since λ_i ≠ λ_j, the only way the extreme sides of this equation can be equal is if the common inner product factor is zero. Thus w_j and v_i are orthogonal.

eigen-expansion Thus, if we have a complete set of eigenvectors and we normalise the left-eigenvectors so that the inner product with each partner eigenvector is 〈w_j, v_j〉 = 1, then any vector may be decomposed as the linear combination of the eigenvectors u = ∑_i 〈w_i, u〉 v_i.
Since the eigenvectors are complete, there exists some linear combination u = ∑_i a_i v_i. Take the inner product of this with w_j to determine
〈w_j, u〉 = 〈w_j, ∑_i a_i v_i〉
= ∑_i a_i 〈w_j, v_i〉 by linearity of the inner product
= ∑_i a_i δ_ij by orthonormalisation of w_j (a common convention is that δ_ij is 1 if i = j and 0 otherwise)
= a_j as all other terms are 0.
Hence the "amplitude" of v_i in the linear combination is a_i = 〈w_i, u〉 as claimed.

For a matrix A one may form the matrices of eigenvectors,
P = [v₁ | v₂ | … | v_n] and Q = [w₁ | w₂ | … | w_n];
then, using the usual dot product as the inner product, observe that QᵀP is the matrix of inner products 〈w_i, v_j〉, which is zero everywhere except on the diagonal, where each w_i has been normalised so the diagonal entries are 1. Thus Qᵀ = P⁻¹; that is, the rows of P⁻¹ are the normalised left-eigenvectors of A.

Example 7.34: Compute the eigenvalues, eigenvectors and left-eigenvectors of the matrix appearing in the linear equation
[ 1 −1 ; −4 1 ] u = (2, 2),
and hence solve the linear equation.

Solution:
Call the matrix A; then the characteristic equation is
det(A − λI) = (λ − 1)² − 4 = 0,
which has as solutions the eigenvalues λ₁ = −1 and λ₂ = 3.
• Any eigenvector corresponding to λ₁ = −1 solves
(A − λ₁I)v₁ = [ 2 −1 ; −4 2 ] v₁ = 0,
with all solutions proportional to v₁ = (1, 2).



• Any eigenvector corresponding to λ₂ = 3 solves
(A − λ₂I)v₂ = [ −2 −1 ; −4 −2 ] v₂ = 0,
with all solutions proportional to v₂ = (1, −2).
• The left-eigenvectors satisfy the transposed equations, so for λ₁ = −1:
(Aᵀ − λ₁I)w₁ = [ 2 −4 ; −1 2 ] w₁ = 0,
with all solutions proportional to w₁ = (2, 1). Observe that w₁ · v₂ = 0, as assured by theory, and that w₁ · v₁ = 1 upon scaling w₁ to w₁ = (1/2, 1/4).
• Similarly the left-eigenvector corresponding to λ₂ = 3 solves
(Aᵀ − λ₂I)w₂ = [ −2 −4 ; −1 −2 ] w₂ = 0,
with all solutions proportional to w₂ = (−2, 1). Observe that w₂ · v₁ = 0, as assured by theory, and that w₂ · v₂ = 1 upon scaling w₂ to w₂ = (1/2, −1/4).
Thus the inner products of the left-eigenvectors with the given right-hand side are w₁ · (2, 2) = 3/2 and w₂ · (2, 2) = 1/2, so we know the right-hand side as this linear combination of the eigenvectors:
(2, 2) = (3/2)(1, 2) + (1/2)(1, −2).



Divide each term in the linear combination by the corresponding eigenvalue to obtain the solution
u = −(3/2)(1, 2) + (1/6)(1, −2) = (−4/3, −10/3).
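Example 7.34's left-eigenvector machinery is exactly what the inverse of the eigenvector matrix provides numerically. A sketch in Python with NumPy (eigenvalue ordering from eig may differ from the example's, but the final solution does not depend on it):

```python
import numpy as np

A = np.array([[1.0, -1.0], [-4.0, 1.0]])
b = np.array([2.0, 2.0])
evals, P = np.linalg.eig(A)          # columns of P: eigenvectors v_j
W = np.linalg.inv(P)                 # rows of P^{-1}: normalised left-eigenvectors w_j
amps = (W @ b) / evals               # amplitudes <w_j, b> / lambda_j
u = P @ amps
print(u)                             # approx [-4/3, -10/3]
print(np.allclose(A @ u, b))         # True
```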

Example 7.35: Find the eigenvalues, eigenvectors (eigenfunctions) and normalised adjoint eigenvectors for the linear operator Ly = −d²y/dx² with boundary conditions y(0) = 0 and y′(π) = ½y′(0).
Solution: Use the usual inner product on the domain of the differential equation, namely 〈f, g〉 = ∫_0^π f(x)g(x) dx.
• First, solve the eigenproblem −y′′ = λy such that y(0) = 0 and y′(π) = ½y′(0). The ode has constant coefficients so we expect exponential or trigonometric solutions. Exponential solutions cannot occur because if they did they would have to be of the form y(x) = sinh(√(−λ) x) to satisfy y(0) = 0, but the derivative of this, y′(x) ∝ cosh(√(−λ) x), is monotonic increasing for positive x, and so y′(π) cannot be ½y′(0). Similarly λ = 0 cannot give rise to an eigenfunction. In the last case, to satisfy y(0) = 0 trigonometric



solutions must be of the form y(x) = sin(√λ x). Then the other boundary condition gives
y′(π) = ½y′(0) ⇔ √λ cos(√λ π) = ½√λ
⇔ cos(√λ π) = ½
⇔ √λ = 1/3, 5/3, 7/3, 11/3, …
⇔ √λ_j = j − ½ + (−1)ʲ/6, j = 1, 2, 3, …
⇔ λ_j = (j − ½ + (−1)ʲ/6)², j = 1, 2, 3, …
The corresponding eigenfunctions are v_j(x) = sin[(j − ½ + (−1)ʲ/6)x], plotted below.



[Figure: the eigenfunctions v₁, v₂, v₃, v₄, v₅ plotted against x over 0 ≤ x ≤ π.]

• Second, derive and solve the adjoint. Consider
〈w, Lv〉 = ∫_0^π −wv′′ dx
= [−wv′ + w′v]_0^π + ∫_0^π −w′′v dx
= w′(π)v(π) + [w(0) − ½w(π)]v′(0) + 〈−w′′, v〉,
and therefore the adjoint is L†w = −d²w/dx² with boundary conditions w′(π) = 0 and w(0) = ½w(π). Observe that although the differential part of the adjoint is the same, the operator L is not self-adjoint because the boundary conditions for the adjoint are different to those for L.
In Matlab the eigenfunctions are plotted by
x=linspace(0,pi);
[j,x]=meshgrid(1:5,x);
k=(j-.5+(-1).^j/6);
v=sin(k.*x);
plot(x,v)



By a similar argument to the above, the solutions to the adjoint eigenproblem L†w = λw must be trigonometric. To satisfy w′(π) = 0 the solutions must be of the form w = cos[√λ(π − x)]. Then the other boundary condition gives
w(0) = ½w(π) ⇔ cos(√λ π) = ½ ⇔ √λ = 1/3, 5/3, 7/3, 11/3, …
as for L. The spectrum for L and its adjoint must be the same. The left-eigenfunctions are then found to be w_j(x) ∝ cos[(j − ½ + (−1)ʲ/6)(π − x)]. To normalise, observe
〈cos[(j − ½ + (−1)ʲ/6)(π − x)], v_j(x)〉 = (π/2) sin[(j − ½ + (−1)ʲ/6)π] = (π√3/4)(−1)ʲ⁻¹.
Thus choose
w_j(x) = [4(−1)ʲ⁻¹/(π√3)] cos[(j − ½ + (−1)ʲ/6)(π − x)],
plotted below. A little algebra also confirms that the w_j(x) are orthogonal to the eigenfunctions v_i(x) for i ≠ j.



[Figure: the normalised left-eigenfunctions w₁, w₂, w₃, w₄, w₅ plotted against x over 0 ≤ x ≤ π, generated in Matlab by]
w=cos(k.*(pi-x)) ...
.*(-1).^(j-1)*4/(pi*sqrt(3));
plot(x,w)
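The biorthogonality 〈w_j, v_i〉 = δ_ij claimed in Example 7.35 can be confirmed by quadrature. This is a sketch in Python with NumPy mirroring the Matlab snippets above; the grid size and trapezoidal rule are arbitrary choices.

```python
import numpy as np

x = np.linspace(0.0, np.pi, 20001)
h = x[1] - x[0]

def inner(f, g):
    """Trapezoidal rule for <f, g> = integral of f*g over [0, pi]."""
    fg = f * g
    return h * (fg.sum() - 0.5 * (fg[0] + fg[-1]))

k = lambda j: j - 0.5 + (-1) ** j / 6
w = lambda j: 4 * (-1) ** (j - 1) / (np.pi * np.sqrt(3)) * np.cos(k(j) * (np.pi - x))
v = lambda i: np.sin(k(i) * x)

G = np.array([[inner(w(j), v(i)) for i in range(1, 6)] for j in range(1, 6)])
print(np.round(G, 6))   # the 5x5 identity: <w_j, v_i> = delta_ij
```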

Activity 7.K Do Exercises 7.39–7.40 in §7.4.4. Send in to the examiner for<br />

feedback at least Ex. 7.39(a) & 7.40.<br />

7.4.2 Orthogonal eigenvectors of self-adjoint operators

One immediate consequence of the work in the previous subsection concerns self-adjoint transforms. Clearly, if a transformation is self-adjoint then the left-eigenfunctions, or the left-eigenvectors of a symmetric matrix, are identical to the ordinary eigenvectors. This is because they satisfy precisely the same equations. This and other rather special properties hold for self-adjoint transformations (symmetric matrices).

Reading 7.L Study the properties of orthogonal and symmetric matrices in Kreyszig §7.3 [K, pp381–4].
The theorems apply not just to symmetric matrices but also to self-adjoint operators. As an example, consider the proof of the reality of the eigenvalues.

Theorem 7.10 The eigenvalues of a self-adjoint (real) linear transformation are all real.
Proof: Let L be a self-adjoint linear transformation on some inner product space with eigenvalue λ and corresponding eigenvector v: thus Lv = λv. (This proof is better when set in a complex vector space, but here we compromise by allowing complex eigenvalues and eigenvectors without actually giving a proper setting for them.)
• Take the complex conjugate of this equation to deduce L̄v̄ = λ̄v̄, where the over bar denotes complex conjugation. Since L is real, L̄ = L. Thus λ̄ and v̄ must also be an eigenvalue and eigenvector of L.
• Now consider 〈v̄, Lv〉. On the one hand,
〈v̄, Lv〉 = 〈v̄, λv〉 as v is an eigenvector of L
= λ〈v̄, v〉.
On the other hand,
〈v̄, Lv〉 = 〈Lv̄, v〉 as L is self-adjoint
= 〈λ̄v̄, v〉 as v̄ is an eigenvector of L
= λ̄〈v̄, v〉.
• Hence λ〈v̄, v〉 = λ̄〈v̄, v〉, equivalently
(λ − λ̄)〈v̄, v〉 = 0.
• Under all useful definitions of an inner product 〈v̄, v〉 ≠ 0, and indeed it is real and positive. Thus the only way the previous equation can be satisfied is if λ = λ̄. That is, the eigenvalue must be real. ♠
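The matrix version of this theorem is easy to observe numerically. A sketch in Python with NumPy; the particular random matrix is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
S = M + M.T                          # a real symmetric, hence self-adjoint, matrix
evals = np.linalg.eigvals(S)
print(np.max(np.abs(evals.imag)))    # essentially zero: the spectrum is real
```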

This proof directly echoes that for the reality of the eigenvalues of the Sturm-Liouville problem. The only difference is the generalisation in the Sturm-Liouville problem to differential equations of the form Ly = λp(x)y for some "weight function" p(x). The general theory can of course be extended to cover such more general cases, but the details are more involved so here we do not do so.
do not do so.



Example 7.36: Observe that the linear transformation in Exercise 7.32(c) is self-adjoint, because K(x, y) = K(y, x), and you will have found that the eigenvalues are real and the particular eigenvectors you found were orthogonal.
However, the eigenvalues of the linear transformation in Exercise 7.32(a) are complex valued; this is allowed because it is not self-adjoint.

That an n × n symmetric matrix is (orthogonally) diagonalisable, because it always has n (orthogonal) eigenvectors, is mirrored by the claim of completeness made for eigenfunctions of the Sturm-Liouville problem: that a matrix has n eigenvectors means we can write any vector in ℝⁿ in terms of the eigenvectors; that the eigenfunctions are complete means that we can represent any function on the domain as a linear combination of the eigenfunctions.

Activity 7.M Do problems 1–6 from Problem Set 7.3 in Kreyszig [K, p384], and Exercises 7.42–7.44 herein.

7.4.3 Expansions in orthogonal eigenfunctions<br />

Having seen that we can obtain sets of eigenfunctions as solutions of differential equations, we now show that these can be used to produce a new representation of almost arbitrarily complicated functions. This is advantageous in many circumstances.

For an introductory example, use the Legendre polynomials to solve the ode
(1 − x²)y′′ − 2xy′ + y = 1 + x + x²,
such that y(x) is well-behaved at x = ±1. First, rewrite the right-hand side in terms of Legendre polynomials [K, p208]:
(4/3)P₀(x) + P₁(x) + (2/3)P₂(x) = 4/3 + x + (2/3)·½(3x² − 1) = 1 + x + x².
Second, try a solution in the form y = aP₀(x) + bP₁(x) + cP₂(x) for some constants a, b and c to be determined. Because Legendre polynomials satisfy (1 − x²)Pₙ′′ − 2xPₙ′ = −n(n + 1)Pₙ, the left-hand side of the ode simplifies immensely to just aP₀ − bP₁ − 5cP₂. Lastly, equate coefficients of the Legendre polynomials on the two sides of the equation to deduce a = 4/3, b = −1 and c = −2/15. Hence the solution is
y = (4/3)P₀(x) − P₁(x) − (2/15)P₂(x) = 21/15 − x − (1/5)x².
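Substituting back confirms the solution. A sketch in Python using NumPy's Polynomial class (exact rational arithmetic is not needed here):

```python
import numpy as np
from numpy.polynomial import Polynomial as P

x = P([0, 1])
y = P([21/15, -1, -1/5])    # y = 21/15 - x - x^2/5
lhs = (1 - x**2) * y.deriv(2) - 2 * x * y.deriv(1) + y
print(lhs.trim(1e-12).coef)   # [1. 1. 1.], i.e. 1 + x + x^2
```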

Of course, here this could have been obtained more straightforwardly by simply guessing this polynomial form. But the approach introduced here is much more general, as seen below.

Reading 7.N Study Kreyszig §4.8 [K,pp240–6].



Activity 7.O Do problems from Problem Set 4.8 [K,pp246–7]. Send in to<br />

the examiner for feedback at least Q3 & 7.<br />

Now generalise the earlier example.<br />

Theorem 7.11 The solution to the ode [r(x)y′]′ + [q(x) + µp(x)]y = f(x), subject to some homogeneous boundary conditions, for some constant µ may be written
y = ∑_m [〈f/p, y_m〉 / (µ − λ_m)] y_m(x),
provided µ ≠ λ_m, where λ_m are the eigenvalues and y_m(x) are the orthonormal eigenfunctions of the associated Sturm-Liouville problem.
Proof: Try a solution in the form y(x) = ∑_m a_m y_m(x). Because [r y_m′]′ + q y_m = −λ_m p y_m, the ode becomes
∑_m a_m (µ − λ_m) p(x) y_m(x) = f(x).
Multiply by y_n(x) for any n and integrate over the domain, say [a, b], to deduce
∑_m a_m (µ − λ_m) ∫_a^b p(x) y_m(x) y_n(x) dx = ∫_a^b f(x) y_n(x) dx.
The right-hand side is identical to the inner product, 〈f/p, y_n〉, of f/p and y_n. Because the eigenfunctions are orthonormal with weight p(x), the integral on the left-hand side is 1 if m = n and 0 otherwise. Thus the equation simplifies to a_n(µ − λ_n) = 〈f/p, y_n〉, from which we deduce a_n = 〈f/p, y_n〉/(µ − λ_n) for all n, provided µ ≠ λ_n. ♠

Example 7.37: Use eigenfunction expansion to solve the ode y ′′ + 2y = 1<br />

such that y(0) = y(π) = 0.<br />

Solution: First find the eigenfunctions <strong>of</strong> the associated problem: y ′′ +<br />

λy = 0 such that y(0) = y(π) = 0. Fortunately this is well known to<br />

us: the eigenvalues are √λ n = n 2 and the complete set <strong>of</strong> orthonormal<br />

eigenfunctions are y n = 2/π sin nx.<br />

Second, write the right-hand side, here f(x) = 1 for 0 < x < π, in terms
of the eigenfunctions. From Example 1 [K, pp. 241–2] we know that we can
do this as

    f(x) = 1 = (4/π)(sin x + (1/3) sin 3x + (1/5) sin 5x + · · ·)
    for 0 < x < π .

Lastly, substitution shows that a solution expressed as a sum of the
eigenfunctions just involves dividing each term appearing above by the
corresponding 2 − λ_n; since λ_n = n² for sin nx, the divisors are 1, −7
and −23, and so

    y(x) = (4/π)(sin x − (1/21) sin 3x − (1/115) sin 5x − · · ·) .
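It is worth checking those coefficients numerically. The sketch below (Python, standard library only; the closed-form comparison solution, found by elementary means as y = 1/2 + A cos √2x + B sin √2x fitted to the boundary conditions, is our own cross-check, not part of the study book) sums the series and compares it with the direct solution of y″ + 2y = 1, y(0) = y(π) = 0:

```python
import math

def y_series(x, terms=200):
    # eigenfunction expansion with mu = 2 and lambda_n = n^2:
    # y = (4/pi) * sum over odd n of sin(n x) / (n * (2 - n^2))
    s = sum(math.sin(n * x) / (n * (2 - n * n))
            for n in range(1, 2 * terms, 2))
    return 4 / math.pi * s

def y_exact(x):
    # direct solution of y'' + 2y = 1 with y(0) = y(pi) = 0:
    # y = 1/2 - (1/2) cos(sqrt(2) x) + B sin(sqrt(2) x)
    r = math.sqrt(2)
    B = (math.cos(r * math.pi) - 1) / (2 * math.sin(r * math.pi))
    return 0.5 - 0.5 * math.cos(r * x) + B * math.sin(r * x)

# the partial sum of the expansion matches the exact solution
for x in (0.5, math.pi / 2, 2.5):
    assert abs(y_series(x) - y_exact(x)) < 1e-4
```

The terms decay like 1/n³, so a couple of hundred terms already agree with the exact solution to several decimal places.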


Example 7.38: In Example 7.35 we computed eigenvalues, eigenvectors
(eigenfunctions) and normalised adjoint eigenvectors for the linear
operator Ly = −d²y/dx² with boundary conditions y(0) = 0 and
y′(π) = (1/2) y′(0). Write down the formal eigenfunction expansion of the
solution to the problem −y″ = h(x) with the same boundary conditions.

Solution: We may expand any function in terms of the eigenfunctions as
h(x) = ∑_{j=1}^∞ ⟨w_j, h⟩ v_j(x). Then the formal solution to −y″ = h(x)
is

    y = ∑_{j=1}^∞ (⟨w_j, h⟩ / λ_j) v_j(x) .
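The finite-dimensional analogue makes this formula concrete. Assuming the eigen-data of the matrix [[0, 1], [−6, 5]] from Exercise 7.39(a) (confirmed in the answers: λ = 2, 3, right eigenvectors v₁ = (1, 2), v₂ = (1, 3), and left eigenvectors w₁ = (3, −1), w₂ = (−2, 1), which happen to satisfy w_j · v_j = 1), the sketch solves Ax = h via x = ∑_j (w_j · h)/λ_j v_j:

```python
# Finite-dimensional analogue of the eigenfunction-expansion solution:
# with right eigenvectors v_j and left eigenvectors w_j normalised so
# that w_j . v_k = delta_jk, the solution of A x = h is
#   x = sum_j (w_j . h) / lambda_j * v_j .
A = [[0, 1], [-6, 5]]          # matrix of Exercise 7.39(a)
lams = [2.0, 3.0]
vs = [(1.0, 2.0), (1.0, 3.0)]  # right eigenvectors
ws = [(3.0, -1.0), (-2.0, 1.0)]  # left eigenvectors, w_j . v_j = 1

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, x):
    return tuple(dot(row, x) for row in M)

def solve(h):
    # accumulate the expansion term by term
    x = [0.0, 0.0]
    for lam, v, w in zip(lams, vs, ws):
        c = dot(w, h) / lam
        x = [xi + c * vi for xi, vi in zip(x, v)]
    return tuple(x)

h = (1.0, 1.0)
x = solve(h)
assert all(abs(a - b) < 1e-12 for a, b in zip(matvec(A, x), h))
```

Replacing the dot product by the inner product ⟨w_j, h⟩, and the sum over two eigenvectors by an infinite sum, gives exactly the formal solution of the example.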

You should wonder what occurs if the possibility of dividing by a zero
µ − λ_m eventuates in the formal solution of Theorem 7.11. Just as for the
solution of linear algebraic equations, such division by zero indicates
that the Sturm-Liouville differential operator is "singular". Thus if for
any m it happens that µ − λ_m = 0, then either the ode is inconsistent,
indicated by the inner product ⟨f/p, y_m⟩ ≠ 0, or the ode is consistent,
when ⟨f/p, y_m⟩ = 0, and the solution can include an arbitrary multiple of
the corresponding eigenfunction, that is, Ay_m(x) could be added for any
constant A.
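A two-by-two matrix illustrates the alternative. With A = diag(2, 3) and µ = 2 (our own toy example, not from the study book), (A − µI)x = f has no solution unless f is orthogonal to the eigenvector e₁ = (1, 0) for λ = 2, and when it is, any multiple of e₁ may be added to the solution:

```python
# Matrix analogue of the singular case mu = lambda_m: take A = diag(2, 3)
# and mu = 2 equal to the first eigenvalue.  Then (A - mu*I) x = f reads
#   0 * x1 = f1 ,   1 * x2 = f2 ,
# so the system is inconsistent unless <f, e1> = f1 = 0.
mu = 2.0

def residual(x, f):
    # components of (A - mu*I) x - f for A = diag(2, 3)
    return ((2.0 - mu) * x[0] - f[0], (3.0 - mu) * x[1] - f[1])

f_bad = (1.0, 4.0)   # <f, e1> = 1 != 0: row one can never be satisfied
assert all(residual((a, 4.0), f_bad)[0] != 0.0 for a in (0.0, 1.0, -5.0))

f_ok = (0.0, 4.0)    # consistent: x = (a, 4) solves it for ANY constant a,
for a in (0.0, 1.0, -5.0):   # i.e. an arbitrary multiple of e1 is free
    assert residual((a, 4.0), f_ok) == (0.0, 0.0)
```

This is the Fredholm alternative in miniature: solvability requires the forcing to have no component in the singular eigenspace.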



7.4.4 Exercises

Ex. 7.39: Find the eigenvalues, eigenvectors and left-eigenvectors of the
following matrices, and then verify the orthogonality between eigenvectors
and left-eigenvectors:

    (a)  [  0   1 ]
         [ −6   5 ]

    (b)  [ 11  −6 ]
         [ 18 −10 ]

    (c)  [  0   1   3 ]
         [  1   6   9 ]
         [ −1  −5  −8 ]

Ex. 7.40: Deduce the adjoint eigenfunctions that correspond to the nonzero
eigenvalues of the linear operators in Exercise 7.32 (b) and (c). Where
appropriate, verify the orthogonality among the eigenfunctions and these
adjoint eigenfunctions.

Ex. 7.41: Prove that if L is a self-adjoint linear transformation in some
inner product, then eigenvectors from different eigenspaces are orthogonal.

Ex. 7.42: Show that the differential operator Ly = d²/dx²[r(x) d²y/dx²]
such that y(0) = y′(0) = y(L) = y′(L) = 0 is self-adjoint and hence deduce
it has real eigenvalues and a complete set of eigenfunctions. For your
interest, note that the ode Ly = h(x) with these boundary conditions
describes the deflection under a distributed load h(x) of a beam of
varying shape, encoded by r(x), with clamped ends.

Ex. 7.43: Show that the eigenfunctions of the Sturm-Liouville system

    −y″ = λy ,   y(0) = 0 ,   y(1) − 2y′(1) = 0 ,

are sin(√λ x), where the eigenvalues are the positive solutions to
tan √λ = 2√λ. By sketching a graph show that λ_j ≈ (2j − 1)²π²/4 for
large j. Why do we not need to worry about the possibility of complex
eigenvalues?
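A quick numerical check complements the graphical argument. The sketch below (Python; the bisection bracket (jπ, (2j + 1)π/2) comes from the graph you are asked to sketch, and with this labelling of the roots the asymptote reads √λ_j ≈ (2j + 1)π/2, the same statement as the exercise's with the index shifted by one):

```python
import math

def g(u):
    # tan(u) = 2u rewritten as g(u) = sin(u) - 2u*cos(u) = 0
    # to avoid the poles of tan
    return math.sin(u) - 2 * u * math.cos(u)

def nth_root(j, tol=1e-12):
    # The j-th positive root of tan(u) = 2u lies in (j*pi, (2j+1)*pi/2),
    # where tan increases from 0 towards +infinity past the line 2u;
    # g changes sign on that bracket, so bisect.
    lo, hi = j * math.pi + 1e-9, (2 * j + 1) * math.pi / 2 - 1e-9
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# each root satisfies the eigenvalue condition tan(sqrt(lambda)) = 2 sqrt(lambda)
u1 = nth_root(1)
assert abs(math.tan(u1) - 2 * u1) < 1e-6

# and sqrt(lambda_j) approaches the asymptote (2j + 1)*pi/2 as j grows
u20 = nth_root(20)
assert abs(u20 / ((2 * 20 + 1) * math.pi / 2) - 1) < 1e-3
```

The roots sit just below the asymptotes of tan, which is exactly what the sketch shows.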

Ex. 7.44: Similarly find the eigenfunctions of the Sturm-Liouville system

    −y″ = λy ,   y(0) = y′(0) ,   y(π) = 0 ,

and approximate values for the eigenvalues.

Ex. 7.45: A linear operator L is defined by

    Lf = f″ + 4f′ + 3f ,   where f(0) = f′(1) = 0 ,

on a vector space with inner product ⟨f, g⟩ = ∫_0^1 fg dx.

(a) Find the adjoint of L.

(b) Show that f_n(x) = e^(−2x) sin(ω_n x) are eigenfunctions of L
    provided ω_n = 2 tan(ω_n). (For example, ω₁ = 4.2748, ω₂ = 7.5965,
    etc.) What are the corresponding eigenvalues?
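Part (b) can be checked numerically before attempting the algebra. The sketch below (Python) locates ω₁ by bisection on ω cos ω − 2 sin ω = 0, an equivalent form of ω = 2 tan ω that avoids the pole of tan, then confirms the boundary conditions and that Lf₁ = −(1 + ω₁²) f₁ by central differences; the eigenvalue −(1 + ω_n²) used here is our own computed value, and deriving it is the point of the exercise:

```python
import math

def omega1(tol=1e-12):
    # bisection for the first root of omega = 2*tan(omega) beyond pi,
    # written as h(w) = w*cos(w) - 2*sin(w) = 0 to avoid the tan pole
    lo, hi = math.pi, 1.5 * math.pi
    h = lambda w: w * math.cos(w) - 2 * math.sin(w)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

w = omega1()                      # about 4.2748, as the exercise states
f = lambda x: math.exp(-2 * x) * math.sin(w * x)

# boundary conditions: f(0) = 0 and f'(1) = 0
assert f(0) == 0
dfdx1 = math.exp(-2) * (-2 * math.sin(w) + w * math.cos(w))
assert abs(dfdx1) < 1e-9

# L f = f'' + 4f' + 3f should equal -(1 + w^2) f; check by differences
def Lf(x, h=1e-5):
    d1 = (f(x + h) - f(x - h)) / (2 * h)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2
    return d2 + 4 * d1 + 3 * f(x)

for x in (0.2, 0.5, 0.9):
    assert abs(Lf(x) - (-(1 + w * w) * f(x))) < 1e-4
```

Such a finite-difference check is a useful habit: it catches sign slips in adjoint and eigenvalue calculations before they propagate.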


7.4.5 Answers to selected Exercises

7.8  (a) yes; (b) no; (c) yes; (d) yes; (e) no; (f) yes.

7.25 The adjoint is rotation by −θ.

7.27 L†f = ∫_a^b w(y) w(x)^(−1) K(y, x) f(y) dy.

7.28 (a) L†f = −df/dx such that f(0) + 2f(1) = 0; (b) L†f = f″ − 3f′ + 4f
     such that f′(1) = 0 and f(0) = 0; (c) L†f = −f‴ − f′ such that
     f(0) = 0, f(1) + 2f′(0) = 0 and 3f″(1) − f′(1) + 3f(1) = 0.

7.29 (a) L†f = −df/dx such that f(1) = 3f(0); (b) L†f = f″ − 2f′ + f such
     that f(−1) = f(1) = 0; (c) L†f = −f‴ such that
     f(0) = f(1) + f′(0) = 0 and f″(1) = f′(1).

7.31 (a) λ = 1, λ = 2; (b) λ = 1, λ = 3; (c) λ = −1.5, λ = 1;
     (d) λ = −0.8, λ = 1.8.

7.32 (a) λ = ±i/√3 with eigenfunctions f(x) = 2x + (−1 ± i/√3);
     (b) λ = b − a and the eigenfunctions are f(x) = e^x; (c) λ = ±π/2
     corresponding to eigenfunctions cos x and sin x.

7.39 (a) λ = 2 and 3, v₁ = (1, 2), v₂ = (1, 3), w₁ = (3, −1) and
     w₂ = (−2, 1); (b) λ = 2 and −1, v₁ = (2, 3), v₂ = (1, 2),
     w₁ = (2, −1) and w₂ = (−3, 2); (c) λ = 1, −1 and −2,
     v₁ = (−1, 2, −1), v₂ = (2, 1, −1), v₃ = (−1, −1, 1), w₁ = (0, 1, 1),
     w₂ = (1, 2, 3) and w₃ = (1, 3, 5).

7.40 (b) w(x) = e^(−x); (c) w₁(x) = (2/π) cos x and w₂(x) = (2/π) sin x.


7.5 Summary

• A vector space with the operations of vector addition and scalar
  multiplication is the foundation for the study of transformations in
  both finite and infinite dimensions (§7.1.1). The ten axioms of a
  vector space are: closure and associativity of vector addition and
  scalar multiplication; commutativity of vector addition; distributivity
  of scalar multiplication over vector addition and of scalar
  multiplication over scalar addition; the existence of a zero vector and
  a negative; and the identity of scalar multiplication by 1.

• A subset of a vector space is a subspace if it is closed under vector
  addition and scalar multiplication.

• The dimension of a vector space is the maximum number of linearly
  independent basis vectors.

• An inner product imbues a vector space with distances, lengths and
  angles (§7.1.2). The three axioms of an inner product (Defn. 7.3) are
  linearity, symmetry and positivity. Inequalities, familiar from plane
  geometry, follow in general.

• Distances and lengths are given by the norm ‖u‖ = √⟨u, u⟩ (Defn. 7.4).
  The angle θ between two vectors is given by
  cos θ = ⟨u, v⟩/(‖u‖ ‖v‖); they are orthogonal if the inner product is
  zero (§7.1.2).


• Matrix multiplication, differential and integral operators are examples
  of linear transformations (§7.2.1).

• A linear transform (operator) is neatly complemented by its adjoint
  (§7.2.2), defined by ⟨u, Lv⟩ = ⟨L†u, v⟩ (Defn. 7.7). A self-adjoint
  transformation, such as the Sturm-Liouville operator, generalises the
  concept of a symmetric matrix.

• The spectrum of a linear transformation and its adjoint are the same,
  and the adjoint or left-eigenvectors are orthogonal to the ordinary
  eigenvectors (§7.4.1). This allows them to be used to extract the
  component of each eigenvector or eigenfunction in any given vector or
  function.

• Thus the eigenvectors of a self-adjoint linear transformation
  (symmetric matrix) are necessarily orthogonal and complete (§7.4.2).
  Solutions of Sturm-Liouville systems provide an important example of
  this property.

• The solution to the ode [r(x)y′]′ + [q(x) + µp(x)]y = f(x), subject to
  some homogeneous boundary conditions, for some constant µ may be
  written

      y = ∑_m ⟨f/p, y_m⟩/(µ − λ_m) y_m(x) ,

  provided µ ≠ λ_m, where λ_m are eigenvalues and y_m(x) are the
  orthonormal eigenfunctions of the associated Sturm-Liouville problem.
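These inner-product notions are concrete enough to compute. A minimal sketch (Python, with the Euclidean dot product standing in for a general inner product; the particular vectors are our own illustration) exercises the norm, the angle formula, and the inequalities named above:

```python
import math

def inner(u, v):
    # the Euclidean dot product; any rule satisfying linearity,
    # symmetry and positivity would serve as an inner product
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    # ||u|| = sqrt(<u, u>)
    return math.sqrt(inner(u, u))

u, v = (1.0, 2.0, 2.0), (2.0, 0.0, -1.0)

# angle from cos(theta) = <u, v> / (||u|| ||v||); here <u, v> = 0,
# so u and v are orthogonal and theta = pi/2
cos_theta = inner(u, v) / (norm(u) * norm(v))
theta = math.acos(cos_theta)
assert abs(theta - math.pi / 2) < 1e-12

# the Cauchy-Schwarz and triangle inequalities follow from the axioms
assert abs(inner(u, v)) <= norm(u) * norm(v) + 1e-12
w = tuple(a + b for a, b in zip(u, v))
assert norm(w) <= norm(u) + norm(v) + 1e-12
```

Swapping `inner` for a weighted integral turns the same few lines into a check on function spaces.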


Index

:=, 242
array, 66, 69
article, 53
author, 54
begin-end, 243
bye, 242
caption, 77
df, 221, 243
displaymath, 63
documentclass, 53
document, 53
end, 243
eqnarray, 68, 69
equation, 63
factorial, 243
factor, 215
for, 213, 243
graphicx, 76
includegraphics, 77
int, 213, 243
in, 244
left, 64
let, 243
maketitle, 54
mbox, 66
nonumber, 69
paragraph, 58
quad, 66
quit, 242
repeat-until, 226, 243
right, 64
section, 56
title, 54
until, 243
usepackage, 76
write, 215, 243


absolute convergence, 143
accents, 73
Achilles, 133
adjoint, 274, 274–283, 292, 298, 299, 313
adjoint eigenfunction, 308
adjoint eigenvectors, 292, 296, 307
age structured populations, 108
Airy's equation, 239
alternating harmonic series, 143
ampersand, 59
analytic, 191, 193
analytic function, 151, 167
angle, 262, 265, 266, 266, 282
array, 67
associativity, 256–261, 312
autonomous, 35
Babbage, 206
basis, 256, 258
Bessel function, 202, 203, 234, 250
Bessel functions of the first kind, 201
Bessel's equation, 201, 202, 237, 250
brace, 59, 60, 64
bracket, 64
bulleted list, 61
car traffic, 93, 94, 100, 105
caret, 59, 60
case statement, 67
Cauchy's convergence, 142
Cauchy-Schwarz inequality, 265, 266, 267, 272
characteristic diagram, 102, 120
characteristic equation, 284
circular geometry, 195
closed, 256, 257, 259–262, 312
coefficients, 148, 191, 195, 197, 198, 220, 304
command definition, 74, 75
commutativity, 256, 257, 259, 260, 312
comparison test, 145
complete, 293, 303, 306, 308
complex conjugate, 301
computer algebra, 206, 207, 211, 214, 231, 233, 234, 240–242
conditional convergence, 143
conservation of mass, 96


conservation of momentum, 125
continuity equation, 96–98, 108, 109, 117, 125
converges, 134
critical point, 17
critical points, 23
definition, 46
definition, command, 74
degenerate case, 18, 19
delimiter, 64
diagonalisation, 291
dimension, 256, 312
dimensionality, 256, 258
directional derivatives, 172
displayed equations, 47
displayed mathematics, 63
distance, 262, 265, 265
distributivity, 256, 257, 261, 312
dollar, 59
dot product, 262
dots, 68
eigenfunction, 246, 251, 285, 287–289, 296, 297, 299, 303, 305–309, 313
eigenfunction expansion, 291, 306, 307
eigenspace, 308
eigenvalue, 37, 246, 251, 284–288, 290–292, 294, 296, 301–303, 305–309, 313
eigenvector, 37, 246, 284, 285, 287, 290–296, 301, 303, 307, 308
elastic artery, 124
elementary functions, 71
ellipsis, 48
empty set, 259
equation of state, 119
equilibrium, 106
Euler, 86
Eulerian description, 86
even powers, 149, 150
example, 46
exponential, 31, 296
extrema, 168, 170, 171, 173
figure, 75, 76
fixed point, 17–19, 23–26, 28, 31, 35–37, 106
float, 76


font, 53, 60
fraction, 61
Frobenius method, 196, 198, 201, 250
geometric series, 145
global maximum, 170
global minimum, 170
harmonic series, 139, 140, 143, 144
hash, 59
Hessian, 174
Hessian matrix, 174, 175, 179, 180
higher dimensions, 18, 19
html, 50
ideal gas, 119
identity, 256, 258, 261, 312
in-line mathematics, 59, 61, 73
indicial equation, 197, 199, 234, 235, 237, 250
infinite series, 134
infinite sum, 134
initial conditions, 212, 215, 223–225, 229, 231
inner product, 262, 263, 263, 265, 268, 272, 274–280, 282, 283, 305, 312
inner product space, 263, 265–267, 274
install LaTeX, 51
isoclines, 24, 24, 31
iteration, 211, 212, 217, 220–223, 225, 230, 235, 236, 239
iterative construction, 251
Jacobi, 37
Jacobian, 35, 37, 37, 38
Lagrange's remainder, 159, 162
Lagrangian particle paths, 87
LaTeX, 49
LaTeX, document, 53
LaTeX, install, 51
LaTeX, web access, 52
least-squares solution, 273
left-eigenvectors, 292, 293–295, 301, 308
Legendre polynomials, 194, 249, 258, 304


Legendre's equation, 193, 223, 224, 240, 249
length, 265
line breaks, 56
linear combination, 256, 258, 289, 293, 295
linear independence, 256
linear operator, 270, 271, 274, 276, 291, 296, 307, 308
linear transform, 270, 282, 285, 288, 291, 301, 303, 308
linear transformation, 269, 270, 313
linearisation, 24, 24, 28, 100, 112, 119, 125, 233
linearity, 263, 312
linearly independent, 258
list environment, 61
local maximum, 170, 172, 173, 175, 177, 180
local minimum, 172, 173, 177, 182
logarithm, 234
Maclaurin series, 153, 211, 212, 214, 216, 217, 221–223, 229, 232, 239–241
mass on a spring, 7
material derivative, 86, 87, 91, 92
mathematical functions, 48, 71
mathematics environment, 60, 66, 68
method of characteristics, 100, 101, 119
method of undetermined coefficients, 290
momentum equation, 115, 117, 125
multiplicity, 284
negative, 256, 259, 261, 312
negative definite, 174, 177
negative space, 65, 66
nonlinear differential equations, 23, 207
nonlinear ode, 217, 217, 221, 229, 233, 240, 241
norm, 265, 265, 312
normal space, 65
notation, 46
numbered list, 62
odd powers, 149


orbit, 11
order of, 221, 230, 233
orthogonal, 245, 246, 249, 251, 265, 266, 266, 289, 292, 293, 299, 303, 308, 312
paragraphs, 56
parallelogram equality, 265
parentheses, 64
partial derivative command, 74
partial sums, 134, 134, 135, 137, 138, 156
percent, 59
phase plane, 11, 11, 14, 24, 217
phase portrait, 11, 23
population model, 108
positive definite, 174, 174, 177
positivity, 263, 312
postscript, 76
power series, 147, 148, 155, 190, 191, 193, 196–198, 201, 211–213, 217, 218, 220–222, 230, 233, 234, 237, 241, 242, 249, 251
power series method, 189, 195, 216
preamble, 76
proof, 46
proof by contradiction, 139
punctuate, 47, 64
Pythagoras theorem, 266
quad space, 65
quadratic form, 174, 174
quadratic polynomials, 256
radius of convergence, 148, 151, 153, 166
range, 270, 271
ratio test, 145, 145, 149, 150
redefine, 74
reduce, 207, 242
regular point, 200, 201, 249
relations, 63
residual, 221, 224, 224, 226, 229, 230, 233, 235, 239, 251
Rolle's theorem, 161
root test, 145
rubber band, 87
saddle point, 27, 28, 31, 172, 172, 173, 175, 177, 179, 181


scalar multiplication, 256, 256–258, 260, 261, 267, 282, 312
Schwarz inequality, 265
sections, 56
self-adjoint, 274, 274, 277–282, 298, 300, 301, 303, 308, 313
sequence, 130, 132, 134, 135, 138, 142
set union, 259
shear transformation, 275
shift summation indices, 191
singular, 249, 307
singular point, 200, 234, 249
slosh, 59
sonic boom, 119
sound, 119
space, 65
special functions, 249, 250
spectrum, 284, 292, 299
square integrable, 259, 259–261, 264, 271, 273
stability, 17, 17, 19, 37
stable, 17, 19, 21
stationary points, 171, 172, 178, 179, 181
structure, 195, 245
Sturm-Liouville, 302, 303
Sturm-Liouville equation, 246, 251, 280
Sturm-Liouville operators, 280, 281
Sturm-Liouville problem, 265, 289, 291, 302, 305, 313
Sturm-Liouville theory, 245
subscript, 59, 60, 72
subspace, 261, 268, 274, 278–280, 283, 284, 312
superscript, 59, 60, 72
symbols, 47, 60, 79
symmetric matrix, 274, 301, 303
symmetry, 263, 312
tabular format, 66
tangent plane, 172
Taylor polynomial, 156, 157–159, 161, 162
Taylor series, 152, 153, 153, 156, 162, 169, 180, 191, 200, 201, 221
201, 221


Taylor's series, multivariable, 166
Taylor's theorem, 166, 173
telescopic sum, 135
telnet, 208
testing, 234
thin space, 65, 66, 71
tilde, 59
trajectory, 11, 11, 15
triangle inequality, 265, 267
truncation error, 156, 156
underscore, 59, 60
uniqueness, 154, 191, 249
unit vector, 265
unstable, 17, 19–21, 28, 42
vector addition, 256, 259, 261, 312
vector space, 255, 256, 258–263, 265–267, 269, 271, 282, 291, 312
vector subspace, 261, 261, 262
wave equation, 119, 124
wave speed, 101
Zeno of Elea, 130
Zeno's Second Paradox, 133
zero vector, 256, 257, 259, 260, 312
