
ECON 381 SC Foundations Of Economic Analysis

2009<br />

John Hillas and Dmitriy Kvasov<br />

University of Auckland


Contents

Chapter 1. Logic, Sets, Functions, and Spaces
  1. Logic
  2. Sets
  3. Binary Relations
  4. Functions
  5. Spaces
  6. Metric Spaces and Continuous Functions
  7. Open Sets, Compact Sets, and the Weierstrass Theorem
  8. Sequences and Subsequences
  9. Linear Spaces

Chapter 2. Linear Algebra
  1. The Space Rⁿ
  2. Linear Functions from Rⁿ to Rᵐ
  3. Matrices and Matrix Algebra
  4. Matrices as Representations of Linear Functions
  5. Linear Functions from Rⁿ to Rⁿ and Square Matrices
  6. Inverse Functions and Inverse Matrices
  7. Changes of Basis
  8. The Trace and the Determinant
  9. Calculating and Using Determinants
  10. Eigenvalues and Eigenvectors

Chapter 3. Consumer Behaviour: Optimisation Subject to the Budget Constraint
  1. Constrained Maximisation
  2. The Implicit Function Theorem
  3. The Theorem of the Maximum
  4. The Envelope Theorem
  5. Applications to Microeconomic Theory

Chapter 4. Topics in Convex Analysis
  1. Convexity
  2. Support and Separation


CHAPTER 1<br />

Logic, Sets, Functions, and Spaces<br />

1. Logic<br />

All the aspects of logic that we describe in this section are part of what is called
propositional (or sentential) logic.

We start by supposing that we have a number of atomic statements, which we<br />

denote by lower case letters, p, q, r. Examples of such statements might be<br />

Consumer 1 is a utility maximiser<br />

the apple is green<br />

the price of good 3 is 17.<br />

We assume that each atomic statement is either true or false.<br />

Given these atomic statements we can form other statements using logical connectives.<br />

If p is a statement then ¬p, read not p, is the statement that is true precisely<br />

when p is false. If both p and q are statements then p ∧ q, read p and q, is the<br />

statement that is true when both p and q are true and false otherwise. If both p<br />

and q are statements then p ∨ q, read p or q, is the statement that is true when
either p or q is true, that is, the statement that is false only if both p and q are
false.

We could make do with these three symbols together with brackets to group<br />

symbols and tell us what to do first. For example we could have the complicated<br />

statement ((p ∧ q) ∨ (p ∧ r)) ∨ ¬s. This means that at least one of two statements<br />

is true. The first is that either both p and q are true or both p and r are true. The<br />

second is that s is not true.<br />

Exercise 1. Think about the meaning of the statement we have just considered.<br />

Can you see a more straightforward statement that would mean the same<br />

thing?<br />

While we don’t strictly need any more symbols it is certainly convenient to<br />

have at least a couple more. If both p and q are statements then p ⇒ q, read if p<br />

then q or p implies q or p is sufficient for q or q is necessary for p, is the statement<br />

that is false when p is true and q is false and is true otherwise. Many people find<br />

this a bit nonintuitive. In particular, one might wonder about the truth of this<br />

statement when p is false and q is true. A simple (and correct) answer is that this<br />

is a definition. It is simply what we mean by the symbol and there isn’t any point<br />

in arguing about definitions. However there is a sense in which the definition is<br />

what is implied by the informal statements. When we say “if p then q” we are<br />

saying that in any situation or state in which p is true then q is also true. We are<br />

not making any claim about what might or might not be the case when p is not<br />

true. So, in states in which p is not true we make no claim about q and so our<br />

statement is true whether q is true or false. Instead of p ⇒ q we can write q ⇐ p.<br />

In this case we are most likely to read the statement as q if p.




If p ⇒ q and p ⇐ q (that is q ⇒ p) then we say that p if and only if q or p is<br />

necessary and sufficient for q and write p ⇔ q.<br />

One powerful method of analysing logical relationships is by means of truth<br />

tables. A truth table lists all possible combinations of the truth values of the<br />

atomic statements and the associated truth values of the compound statements.<br />

If we have two atomic statements then the following table gives the four possible<br />

combinations of truth values.<br />

p q<br />

T T<br />

F T<br />

T F<br />

F F<br />

Now, we can add a column that would, for each combination of truth values of<br />

p and q, give the truth value of p ⇒ q, just as described above.<br />

p q p ⇒ q<br />

T T T<br />

F T T<br />

T F F<br />

F F T<br />

Such truth tables allow us to see the logical relationship between various statements.<br />

Suppose we have two compound statements A and B and we form a truth<br />

table showing the truth values of A and B for each possible profile of truth values<br />

of the atomic statements that constitute A and B. If in each row in which A is true<br />

B is also true then statement A implies statement B. If statements A and B have<br />

the same truth value in each row then statements A and B are logically equivalent.

For example I claim that the statement p ⇒ q we have just considered is logically<br />

equivalent to ¬p ∨ q. We can see this by adding columns to the truth table we have<br />

just considered. Let me add a column for ¬p and then one for ¬p ∨ q. (We add
the column for ¬p only to make the final column easier to compute.)

p q p ⇒ q ¬p ¬p ∨ q<br />

T T T F T<br />

F T T T T<br />

T F F F F<br />

F F T T T<br />

Since the third column and the fifth column contain exactly the same truth values<br />

we see that the two statements, p ⇒ q and ¬p ∨ q are indeed logically equivalent.<br />
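The truth-table comparison we have just carried out is entirely mechanical, so it can be delegated to a computer. The following Python sketch (our illustration, not part of the original notes) enumerates the four rows and compares the two columns:

```python
from itertools import product

def implies(p, q):
    # p => q is false exactly when p is true and q is false
    return not (p and not q)

# Compare the columns for p => q and (not p) or q over all four
# combinations of truth values, just as in the truth table above.
rows = list(product([True, False], repeat=2))
col_implies = [implies(p, q) for p, q in rows]
col_disj = [(not p) or q for p, q in rows]
print(col_implies == col_disj)  # True: the two statements are logically equivalent
```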

Exercise 2. Construct the truth table for the statement ¬(¬p ∨ ¬q). Is it<br />

possible to write this statement using fewer logical connectives? Hint: why not<br />

start with just one?<br />

Exercise 3. Prove that the following statements are equivalent:<br />

(i) (p ∨ ¬q) ⇒ ((¬p) ∧ q) and ¬(q ⇒ p),<br />

(ii) p ⇒ q and ¬q ⇒ ¬p.<br />

In part (ii) the second statement is called the contrapositive of the first statement.<br />

Often if you are asked to prove that p implies q it will be easier to show the

contrapositive, that is, that not q implies not p.<br />

Exercise 4. Prove that the following statements are equivalent:<br />

(i) ¬(p ∧ q) and ¬p ∨ ¬q,<br />

(ii) ¬(p ∨ q) and ¬p ∧ ¬q.



These two equivalences are known as De Morgan’s Laws.<br />

A tautology is a statement that is necessarily true. For example if the statements<br />

A and B are logically equivalent then the statement A ⇔ B is a tautology.<br />

If A logically implies B then A ⇒ B is a tautology. We can check whether a compound<br />

statement is a tautology by writing a truth table for this statement. If the<br />

statement is a tautology then its truth value should be T in each row of its truth<br />

table.<br />

A contradiction is a statement that is necessarily false, that is, a statement A<br />

such that ¬A is a tautology. Again, we can see whether a statement is a contradiction<br />

by writing a truth table for the statement.<br />
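Row-by-row checking works for any compound statement, so the test for a tautology or a contradiction can also be sketched generically in Python (our illustration; encoding statements as boolean functions is our own choice, not the notes'):

```python
from itertools import product

def is_tautology(statement, n_vars):
    """True if the compound statement (a function of boolean
    arguments) is true in every row of its truth table."""
    return all(statement(*vals) for vals in product([True, False], repeat=n_vars))

def is_contradiction(statement, n_vars):
    # A is a contradiction exactly when not-A is a tautology
    return is_tautology(lambda *vals: not statement(*vals), n_vars)

print(is_tautology(lambda p: p or not p, 1))       # True
print(is_contradiction(lambda p: p and not p, 1))  # True
print(is_tautology(lambda p, q: p or q, 2))        # False: fails when p, q both false
```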

2. Sets<br />

Set theory was developed in the second half of the 19th century and is at the<br />

very foundation of modern mathematics. But we shall not be concerned here with<br />

the development of the theory. Rather we shall only give the basic language of set<br />

theory and outline some of the very basic operations on sets.<br />

We start by defining a set to be a collection of objects or elements. We will<br />

usually denote sets by capital letters and their elements by lower case letters. If<br />

the element a is in the set A we write a ∈ A. If every element of the set B is also<br />

in the set A we call B a subset of the set A and write B ⊂ A. We shall also say<br />

that A contains B. If A and B have exactly the same elements then we say they

are equal or identical. Alternatively we could say A = B if and only if A ⊂ B and<br />

B ⊂ A. If B ⊂ A and B ≠ A then we say that B is a proper subset of A or that A<br />

strictly contains B.<br />

Exercise 5. How many subsets does a set with N elements have?

In order to avoid paradoxes, such as Russell’s paradox of “the set of all sets that are not members of themselves”,

we shall always assume that in whatever situation we are discussing there is some<br />

given set U called the universal set which contains all of the sets with which we<br />

shall deal.<br />

We customarily enclose our specification of a set by braces. In order to specify<br />

a set one may simply list the elements. For example to specify the set D which<br />

contains the numbers 1,2, and 3 we may write D = {1, 2, 3}. Alternatively we may<br />

define the set by specifying a property that identifies the elements. For example<br />

we may specify the same set D by D = {x | x is an integer and 0 < x < 4}. Notice<br />

that this second method is more powerful. We could not, for example, list all<br />

the integers. (Since there are an infinite number of them we would die before we<br />

finished.)<br />

For any two sets A and B we define the union of A and B to be the set which<br />

contains exactly all of the elements of A and all the elements of B. We denote the<br />

union of A and B by A ∪ B. Similarly we define the intersection of A and B to<br />

be that set which contains exactly those elements which are in both A and B. We<br />

denote the intersection of A and B by A ∩ B. Thus we have<br />

A ∪ B = {x | x ∈ A or x ∈ B}<br />

A ∩ B = {x | x ∈ A and x ∈ B}.<br />

Exercise 6. Are the oldest mathematician among chess players and the oldest
chess player among mathematicians the same person or (possibly) different people?

Exercise 7. Are the best mathematician among chess players and the best chess
player among mathematicians the same person or (possibly) different people?



Exercise 8. Every tenth mathematician is a chess player and every fourth
chess player is a mathematician. Are there more mathematicians or chess players,
and by what factor?

Exercise 9. Prove the distributive laws for operations of union and intersection.<br />

(i) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)<br />

(ii) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)<br />

Just as the number zero is extremely useful, so too is the concept of a set that
has no elements. This set we call the empty set or the null set and denote by ∅.
To see one use of the empty set, notice that having such a concept allows the
intersection of two sets to be well defined whether or not the sets have any
elements in common.

We also introduce the concept of a Cartesian product. If we have two sets, say<br />

A and B, the Cartesian product, A × B, is the set of all ordered pairs, (a, b) such<br />

that a is an element of A and b is an element of B. Symbolically we write<br />

A × B = {(a, b) | a ∈ A and b ∈ B}.<br />
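As an aside, Python’s built-in set type mirrors these operations directly, which makes small finite examples easy to experiment with (our illustration, not part of the notes):

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4}

print(A | B)   # union A ∪ B: {1, 2, 3, 4}
print(A & B)   # intersection A ∩ B: {3}
print(B <= A)  # is B a subset of A? False, since 4 is not in A

AxB = set(product(A, B))  # Cartesian product A × B as a set of ordered pairs
print(len(AxB))           # 3 * 2 = 6 pairs
```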

3. Binary Relations<br />

There are a number of ways of formulating the notion of a binary relation. We<br />

shall pursue one, defining a binary relation on a set X simply as a subset of X × X,<br />

the Cartesian product of X with itself.<br />

Definition 1. A binary relation R on the set X is a subset of X × X. If the<br />

point (x, y) ∈ R we shall often write xRy instead of (x, y) ∈ R.<br />

Since we have already defined the notions of Cartesian product and subset,<br />

there is really nothing new here. However the structure and properties of binary<br />

relations that we shall now study is motivated by the informal notion of a “relation”<br />

between the elements of X.<br />

Example 1. Suppose that X is a set of boys and girls and the relation xSy is<br />

“x is a sister of y.”<br />

Example 2. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.<br />

There are binary relations >, ≥, and =.<br />

Example 3. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.<br />

The relations R, P , and I are defined by<br />

xRy if and only if x + 1 ≥ y,
xPy if and only if x > y + 1, and
xIy if and only if −1 ≤ x − y ≤ 1.

Definition 2. The following properties of binary relations have been defined<br />

and found to be useful.<br />

(BR1) Reflexivity: For all x in X xRx.<br />

(BR2) Irreflexivity: For all x in X not xRx.<br />

(BR3) Completeness: For all x and y in X either xRy or yRx (or both). 1<br />

(BR4) Transitivity: For all x, y, and z in X if xRy and yRz then xRz.<br />

(BR5) Negative Transitivity: For all x, y, and z in X if xRy then either<br />

xRz or zRy (or both).<br />

(BR6) Symmetry: For all x and y in X if xRy then yRx.<br />

(BR7) Anti-Symmetry: For all x and y in X if xRy and yRx then x = y.<br />

(BR8) Asymmetry: For all x and y in X if xRy then not yRx.<br />

1 We shall always implicitly include “or both” when we say “either. . . or.”
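Since a binary relation on a finite set X is literally a finite set of ordered pairs, the properties above can be verified by brute force. A Python sketch of ours, using the familiar relation ≥ on a three-element set:

```python
# Store the relation >= on X = {1, 2, 3} as a set of ordered pairs
# and check a few of the properties (BR1), (BR3), (BR4), (BR6) directly.
X = {1, 2, 3}
R = {(x, y) for x in X for y in X if x >= y}

reflexive = all((x, x) in R for x in X)
complete = all((x, y) in R or (y, x) in R for x in X for y in X)
transitive = all((x, z) in R
                 for (x, y1) in R for (y2, z) in R if y1 == y2)
symmetric = all((y, x) in R for (x, y) in R)

# >= is reflexive, complete, and transitive, but not symmetric
print(reflexive, complete, transitive, symmetric)  # True True True False
```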



Exercise 10. Show that completeness implies reflexivity, that asymmetry implies<br />

anti-symmetry, and that asymmetry implies irreflexivity.<br />

Exercise 11. Which properties does the relation described in Example 1 satisfy?<br />

Exercise 12. Which properties do the relations described in Example 2 satisfy?<br />

Exercise 13. Which properties do the relations described in Example 3 satisfy?<br />

We now define a few particularly important classes of binary relations.<br />

Definition 3. A weak order is a binary relation that satisfies transitivity and<br />

completeness.<br />

Definition 4. A strict partial order is a binary relation that satisfies transitivity<br />

and asymmetry.<br />

Definition 5. An equivalence is a binary relation that satisfies transitivity<br />

and symmetry.<br />

You have almost certainly already met examples of such binary relations in<br />

your study of Economics. We normally assume that weak preference, strict preference,

and indifference of a consumer are weak orders, strict partial orders, and<br />

equivalences, though we actually typically assume a little more about the strict<br />

preference.<br />

The following construction is also motivated by the idea of preference. Let<br />

us consider some binary relation R which we shall informally think of as a weak<br />

preference relation, though we shall not, for the moment, make any assumptions<br />

about the properties of R. Consider the relation P defined by xPy if and only if
xRy and not yRx and the relation I defined by xIy if and only if xRy and yRx.

Exercise 14. Show that if R is a weak order then P is a strict partial order<br />

and I is an equivalence.<br />

We could also think of starting with a strict preference P and defining the weak<br />

preference R in terms of P. We could do so either by defining R as xRy if and only
if not yPx or by defining R as xRy if and only if either xPy or not yPx.

Exercise 15. Show that these two definitions of R coincide if P is asymmetric.<br />

Exercise 16. Show by example that P may be a strict partial order (so, by<br />

the previous result, the two definitions of R coincide) but R not a weak order.<br />

[Hint: If you cannot think of another example consider the binary relations defined<br />

in Example 3.]<br />

Exercise 17. Show that if P is asymmetric and negatively transitive then<br />

(i) P is transitive (and hence a strict partial order), and<br />

(ii) R is a weak order.<br />

4. Functions<br />

Let X and Y be two sets. A function (or a mapping) f from the set X to the<br />

set Y is a rule that assigns to each x in X a unique element in Y , denoted by f(x).<br />

The notation

f : X → Y



is standard. The set X is called the domain of f and the set Y is called the<br />

codomain of f. The set of all values taken by f, i.e. the set<br />

{y ∈ Y | there exists x in X such that y = f(x)}<br />

is called the range of f. The range of a function need not coincide with its codomain<br />

Y .<br />

There are several useful ways of visualising functions. A function can be thought<br />

of as a machine that operates on elements of the set X and transforms an input<br />

x into a unique output f(x). Note that the machine is not required to produce<br />

different outputs from different inputs. This analogy helps to distinguish between<br />

the function itself, f, and its particular value, f(x). The former is the machine,<br />

the latter is the output!² One of the reasons for this confusion is that in practice,

to avoid being verbose, people often say things like ‘consider a function U(x, y) =
x^α y^β’ instead of saying ‘consider a function defined for every pair (x, y) in R²
by the equation U(x, y) = x^α y^β’.

A function can also be thought of as a transformation, or a mapping, of the set<br />

X into the set Y . In line with this interpretation is the common terminology, it is<br />

said that f(x) is the image of x under the function f. Again, it is important to<br />

remember that there may be points of Y which are the images of no point of X and<br />

that there may be different points of X which have the same images in Y . What is<br />

absolutely prohibited, however, is for a point from X to have several images in Y !<br />

Part of the definition of a function is the specification of its domain. However,
in applications, functions are quite often defined by an algebraic formula, without
explicit specification of the domain. For example, a function may be defined as

f(x) = sin x + 145x².

The function f is then the rule that assigns the value sin x + 145x² to each value of

x. The convention in such cases is that the domain of f is the set of all values of x<br />

for which the formula gives a unique value. Thus, if you come, for instance, across<br />

the function f(x) = 1/x you should assume that its domain is (−∞, 0) ∪ (0, ∞),<br />

unless specified otherwise.<br />

For any subset A of X, the subset f(A) of Y consisting of those y such that
y = f(x) for some x in A is called the image of A under f, that is,

f(A) = {y ∈ Y | there exists x in A such that y = f(x)}.<br />

Thus, the range of f can be written as f(X). Similarly, one can define the

inverse image. For any subset B of Y , the inverse image f −1 (B) of B is the set of<br />

x in X such that f(x) is in B, that is,<br />

f −1 (B) = {x ∈ X | f(x) ∈ B}.<br />

A function f is called a function onto Y (or surjection) if the range of f is Y ,<br />

i.e., if for every y ∈ Y there is (at least) one x ∈ X such that y = f(x). In other<br />

words, each element of Y is the image of (at least) one element of X. A function f is<br />

called one-to-one (or injection) if f(x₁) = f(x₂) implies x₁ = x₂, that is, for every
element y of f(X) there is a unique element x of X such that y = f(x). In other
words, a one-to-one function maps different elements of X into different elements of
Y. When a function f : X → Y is both onto and one-to-one it is called a bijection.
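For functions between finite sets these definitions can be checked by direct enumeration: compare the range with the codomain, and count distinct images. A Python sketch of ours (the particular functions are invented for illustration):

```python
def is_surjective(f, domain, codomain):
    # onto: the range {f(x) : x in domain} exhausts the codomain
    return {f(x) for x in domain} == set(codomain)

def is_injective(f, domain):
    # one-to-one: no two elements of the domain share an image
    images = [f(x) for x in domain]
    return len(images) == len(set(images))

X = {-2, -1, 0, 1, 2}
print(is_surjective(lambda x: x * x, X, {0, 1, 4}))  # True
print(is_injective(lambda x: x * x, X))              # False: f(-1) == f(1)
print(is_injective(lambda x: x + 1, X))              # True
```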

Exercise 18. Suppose that a set X has m elements and a set Y has n ≥ m<br />

elements. How many different functions are there from X to Y ? from Y to X?<br />

How many of them are surjective? How many are injective? How many are
bijective?

2 Mathematician Robert Bartle put it as follows: “Only a fool would confuse a sausage-grinder
with a sausage; however, enough people have confused functions with their values...”



Exercise 19. Find a function f : N → N which is<br />

(i) surjective but not injective,<br />

(ii) injective but not surjective,<br />

(iii) neither surjective nor injective,<br />

(iv) bijective.

If a function f is a bijection then it is possible to define a function g : Y → X
such that g(y) = x where y = f(x). Thus, to each element y of Y is assigned an

element x in X whose image under f is y. Since f is onto, g is defined for every y<br />

of Y and since f is one-to-one g(y) is unique. The function g is called the inverse of<br />

f and is usually written as f⁻¹. In that case, however, it is not immediately clear
what f⁻¹(B) means for a subset B of Y. Is it the inverse image of B under f or the
image of B under f⁻¹? Happily enough they are the same if f⁻¹ exists!

Exercise 20. Prove that when a function f⁻¹ exists it is both onto and one-to-one
and that the inverse of f⁻¹ is the function f itself.

If f : X → Y and g : Y → Z, then the function h : X → Z, defined as<br />

h(x) = g(f(x)), is called the composition of g with f and denoted by g ◦ f. Note
that even if f ◦ g is well defined it is usually different from g ◦ f.
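A one-line Python illustration of ours of why the order of composition matters:

```python
def compose(g, f):
    # (g ∘ f)(x) = g(f(x))
    return lambda x: g(f(x))

f = lambda x: x + 1
g = lambda x: 2 * x

gf = compose(g, f)   # x -> 2(x + 1)
fg = compose(f, g)   # x -> 2x + 1
print(gf(3), fg(3))  # 8 7: the two compositions differ
```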

Exercise 21. Let f : X → Y. Prove that there exist a surjection g : X → A
where A ⊆ X and an injection h : A → Y such that f = h ◦ g. In other words, prove
that any function can be written as a composition of a surjection and an injection.

The set G ⊂ X ×Y of ordered pairs (x, f(x)) is called the graph of the function<br />

f 3 . <strong>Of</strong> course, the fact that something is called a graph does not necessarily mean<br />

that it can be drawn!<br />

5. Spaces<br />

Sets are reasonably interesting mathematical objects to study. But to make<br />

them even more interesting (and useful for applications) sets are usually endowed<br />

with some additional properties, or structures. These new objects are called spaces.<br />

The structures are often modeled after the familiar properties of space we live in and<br />

reflect (in axiomatic form) such notions as order, distance, addition, multiplication,<br />

etc.<br />

Probably one of the most intuitive spaces is the space of the real numbers, R.<br />

We will briefly look at the axiomatic way of describing some of its properties.<br />

Given the set of real numbers R, the operation of addition is the function<br />

+ : R × R → R that maps any two elements x and y in R to an element denoted<br />

by x + y and called the sum of x and y. The addition satisfies the following axioms<br />

for all real numbers x, y, and z.<br />

A1: x + y = y + x.<br />

A2: (x + y) + z = x + (y + z).<br />

A3: There exists an element, denoted by 0, such that x + 0 = x.

A4: For each x there exists an element, denoted by −x, such that x + (−x) = 0.

All the remaining properties of the addition can be proven using these axioms.<br />

Note also that we can define another operation x − y as x + (−y) and call it<br />

subtraction.<br />

3 Some people like the idea of the graph of a function so much that they define a function to<br />

be its graph.



Exercise 22. Prove that the axioms for addition imply the following statements.<br />

(i) The element 0 is unique.<br />

(ii) If x + y = x + z then y = z (a cancellation law).

(iii) −(−x) = x.<br />

The operation of multiplication can be axiomatised in a similar way. Given the<br />

set of real numbers, R, the operation of multiplication is the function · : R × R → R<br />

that maps any two elements x and y in R to an element denoted by x · y and called<br />

the product of x and y. The multiplication satisfies the following axioms for all real<br />

numbers x, y, and z.<br />

A5: x · y = y · x.<br />

A6: (x · y) · z = x · (y · z).<br />

A7: There exists an element, denoted by 1, such that x · 1 = x.

A8: For each x ≠ 0 there exists an element, denoted by x⁻¹, such that
x · x⁻¹ = 1.

One more axiom (a distributive law) brings these two operations, addition and<br />

multiplication 4 , together.<br />

A9: x(y + z) = xy + xz for all x, y, and z in R.<br />

Another structure possessed by the real numbers has to do with the fact that<br />

the real numbers are ordered. The notion of x less than y can be axiomatised as<br />

follows. For any two distinct elements x and y either x < y or y < x and, in<br />

addition, if x < y and y < z then x < z.<br />

Another example of a space (a very important and useful one) is n-dimensional
real space⁵. Given a natural number n, define Rⁿ to be the set of all possible
ordered n-tuples of real numbers, with generic element denoted by x = (x₁, . . . , xₙ).
Thus, the space Rⁿ is the n-fold Cartesian product of the set R with itself. The
real numbers x₁, . . . , xₙ are called the coordinates of the vector x. Two vectors
x and y are equal if and only if x₁ = y₁, . . . , xₙ = yₙ. The operation of addition
of two vectors is defined as

x + y = (x₁ + y₁, . . . , xₙ + yₙ).

Exercise 23. Prove that the addition of vectors in R n satisfies the axioms of<br />

addition.<br />

The role of multiplication in this space is played by the operation of multiplication
by a real number, defined for all x in Rⁿ and all α in R by

αx = (αx₁, . . . , αxₙ).
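Both operations are coordinatewise, which a short Python sketch makes concrete (our illustration, with tuples standing in for vectors in Rⁿ):

```python
def vec_add(x, y):
    # coordinatewise addition: (x + y)_i = x_i + y_i
    assert len(x) == len(y)
    return tuple(xi + yi for xi, yi in zip(x, y))

def scal_mult(alpha, x):
    # multiplication by a real number: (alpha x)_i = alpha * x_i
    return tuple(alpha * xi for xi in x)

x = (1.0, 2.0, 3.0)
y = (4.0, 5.0, 6.0)
print(vec_add(x, y))      # (5.0, 7.0, 9.0)
print(scal_mult(2.0, x))  # (2.0, 4.0, 6.0)
# Axiom A1 (commutativity of addition) for these two vectors:
print(vec_add(x, y) == vec_add(y, x))  # True
```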

Exercise 24. Prove that multiplication by a real number satisfies a distributive law.

6. Metric Spaces and Continuous Functions<br />

The notion of metric is the generalisation of the notion of distance between two<br />

real numbers.<br />

Let X be a set and d : X ×X → R a function. The function d is called a metric<br />

if it satisfies the following properties for all x, y, and z in X.<br />

1. d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y,<br />

2. d(x, y) = d(y, x),<br />

3. d(x, y) ≤ d(x, z) + d(z, y).<br />

4 From now on, to go easy on notation we will follow the standard convention not to write<br />

the symbol for multiplication, that is to write xy instead of x · y, etc.<br />

5 We haven’t defined what the word dimension means yet, so just treat it as a (fancy) name.



The set X together with the function d is called a metric space, elements of X<br />

are usually called points, and the number d(x, y) is called the distance between x<br />

and y. The last property of a metric is called the triangle inequality.

Exercise 25. Let X be a non-empty set and d : X × X → R be the function<br />

that satisfies the following two properties for all x, y, and z in X.<br />

(i) d(x, y) = 0 if and only if x = y,<br />

(ii) d(x, y) ≤ d(x, z) + d(y, z).

Prove that d is a metric.<br />

Exercise 26. Prove that d(x, y)+d(w, z) ≤ d(x, w)+d(x, z)+d(y, w)+d(y, z)<br />

for all x, y, w, and z in X, where d is some metric on X.<br />

An obvious example of a metric space is the set of real numbers, R, together
with the ‘usual’ distance d(x, y) = |x − y|. Another example is the n-dimensional
Euclidean space Rⁿ with metric

d(x, y) = √((x₁ − y₁)² + · · · + (xₙ − yₙ)²).

Note that the same set can be endowed with different metrics, resulting in
different metric spaces! For example, the set of all n-tuples of real numbers can
be made into a metric space by use of the (non-Euclidean) metric

d_T(x, y) = |x₁ − y₁| + · · · + |xₙ − yₙ|,

which gives a metric space different from Euclidean Rⁿ. This metric is sometimes
called the Manhattan (or taxicab) metric. Another curious metric is the so-called
French railroad metric, defined by

d_F(x, y) = 0 if x = y,
d_F(x, y) = d(x, P) + d(y, P) if x ≠ y,

where P is a particular point of Rⁿ (called Paris) and d is the Euclidean distance.

Exercise 27. Prove that the French railroad metric d F is a metric.<br />

Exercise 28. Let X be a non-empty set and d : X × X → R be the function
defined by

d(x, y) = 1 if x ≠ y,
d(x, y) = 0 if x = y.

Prove that d is a metric. (This metric is called the discrete metric.)
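All of these metrics are easy to implement, and the triangle inequality can at least be spot-checked numerically. In the Python sketch below (ours, not part of the notes) the point chosen as Paris is arbitrary:

```python
import itertools
import math

def euclidean(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def manhattan(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

PARIS = (0.0, 0.0)  # an arbitrary choice of the distinguished point P

def french_railroad(x, y):
    # all travel between distinct points passes through Paris
    return 0.0 if x == y else euclidean(x, PARIS) + euclidean(y, PARIS)

# Spot-check the triangle inequality d(x, y) <= d(x, z) + d(z, y)
# on all triples drawn from a few sample points.
points = [(0.0, 0.0), (1.0, 2.0), (-3.0, 1.0), (2.0, 2.0)]
for d in (euclidean, manhattan, french_railroad):
    for x, y, z in itertools.product(points, repeat=3):
        assert d(x, y) <= d(x, z) + d(z, y) + 1e-12
print("triangle inequality holds on the sample points")
```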

Using the notion of a metric it is possible to generalise the idea of a continuous
function.

Suppose (X, d_X) and (Y, d_Y) are metric spaces, x₀ ∈ X, and f : X → Y is a
function. Then f is continuous at x₀ if for every ε > 0 there exists a δ > 0 such
that

d_Y(f(x₀), f(x)) < ε

for all points x ∈ X for which d_X(x₀, x) < δ.

The function f is continuous on X if f is continuous at every point of X.<br />

Let us prove that the function f(x) = x is continuous on R using the above definition.
For all x₀ ∈ R, we have |f(x₀) − f(x)| = |x₀ − x| < ε as long as |x₀ − x| < δ = ε.
That is, given any ε > 0 we are always able to find a δ, namely δ = ε, such that
all points which are closer to x₀ than δ will have images which are closer to f(x₀)
than ε.



Exercise 29. Let f : R → R be the function defined by

f(x) = 1/x if x ≠ 0,
f(x) = 0 if x = 0.

Prove that f is continuous at every point of R, with the exception of 0.

7. Open sets, Compact Sets, and the Weierstrass Theorem<br />

Let x be a point in a metric space X and let r > 0. The open ball B(x, r) of radius

r centred at x is the set of all y ∈ X such that d(x, y) < r. Thus, the open ball is<br />

the set of all points whose distance from the centre is strictly less than r. The ball<br />

is closed if the inequality is weak, d(x, y) ≤ r.<br />

A set S in a metric space is open if for all x ∈ S there exists r ∈ R, r > 0, such
that B(x, r) ⊂ S. A set S is closed if its complement Sᶜ = {x ∈ X | x ∉ S} is open.

Exercise 30. Prove that an open ball is an open set.<br />

Exercise 31. Prove that the intersection of any finite number of open sets is
an open set.

A set S is bounded if there exists a closed ball of finite radius that contains it.<br />

Formally, S is bounded if there exists a closed ball B(x, r) such that S ⊂ B(x, r).<br />

Exercise 32. Prove that the set S is bounded if and only if there exists a
real number p > 0 such that d(x, x′) ≤ p for all x and x′ in S.

Exercise 33. Prove that the union of two bounded sets is a bounded set.<br />

A collection (possibly infinite) of open sets U₁, U₂, . . . in a metric space is an
open cover of the set S if S is contained in its union.

A set S is compact if every open cover of S has a finite subcover. That is, from
any open cover one can select a finite number of sets Uᵢ that still cover S.

Note that the definition does not say that a set is compact if there is a finite<br />

open cover! That wouldn’t be a good definition as you can cover any set with the<br />

whole space, which is just one open set.<br />

Let’s see how to use this definition to show that something is not compact.<br />
Consider the set (0, 1) ⊂ R. To prove that it is not compact we need to find an<br />
open cover of (0, 1) from which we cannot select a finite subcover. The collection of<br />
open intervals (1/n, 1) for all integers n ≥ 2 is an open cover of (0, 1), because for<br />
any point x ∈ (0, 1) it is always possible to find an integer n such that n > 1/x, and thus<br />
x ∈ (1/n, 1). But no finite subcover will do! Let (1/N, 1) be the maximal interval<br />
in a candidate subcover; then it is always possible to find a point x ∈ (0, 1) such<br />
that N < 1/x.<br />
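The argument above can be checked mechanically. The following Python sketch (the function and variable names are ours, for illustration only; these notes contain no code) takes a finite subfamily of the cover {(1/n, 1)} and exhibits a point of (0, 1) that it misses:<br />

```python
# Illustration (hypothetical names): a finite subfamily of the open cover
# {(1/n, 1) : n >= 2} of (0, 1) always misses points near 0.

def covers(point, intervals):
    """True if point lies in at least one of the open intervals (a, b)."""
    return any(a < point < b for a, b in intervals)

# A candidate finite subcover: (1/n, 1) for n = 2, ..., 10.
finite_subcover = [(1.0 / n, 1.0) for n in range(2, 11)]

# The largest interval is (1/10, 1), so any x in (0, 1/10] is missed.
x_missed = 1.0 / 20
print(covers(0.5, finite_subcover))       # True
print(covers(x_missed, finite_subcover))  # False: (0, 1) is not covered
```

Any finite subfamily has a largest interval (1/N, 1), and the point 1/(2N) is always left uncovered, which is exactly the argument in the text.<br />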

While this definition of compactness is quite useful for showing that a given set<br />
is not compact, it is less useful for verifying that a set is indeed compact.<br />
A much more convenient characterisation of compact sets in finite-dimensional<br />
Euclidean space, R n , is given by the following theorem.<br />

Theorem 1. Any closed and bounded subset of R n is compact.<br />

But why are we interested in compactness at all? Because of the following extremely<br />

important theorem the first version of which was proved by Carl Weierstrass<br />

around 1860.<br />

Theorem 2. Let S be a compact set in a metric space and f : S → R be a<br />
continuous function. Then the function f attains its maximum and its minimum on S.



And why is this theorem important for us? Because many economic problems<br />
are concerned with finding a maximal (or a minimal) value of a function on some set.<br />
The Weierstrass theorem provides conditions under which such a search is meaningful!<br />

This theorem and its implications will be much dwelt upon later in the notes, so<br />

we just give here one example. The consumer utility maximisation problem is the<br />
problem of finding the maximum of the utility function subject to the budget constraint.<br />
According to the Weierstrass theorem, this problem has a solution if the utility function is<br />
continuous and the budget set is compact.<br />

8. Sequences and Subsequences<br />

Let us consider again some metric space (X, d). An infinite sequence of points<br />

in (X, d) is simply a list<br />

x 1 , x 2 , x 3 , . . . ,<br />

where . . . indicates that the list continues “forever.”<br />

We can be a bit more formal about this. We first consider the set of natural<br />

numbers (or counting numbers) 1, 2, 3, . . . , which we denote N. We can now define<br />

an infinite sequence in the following way.<br />
<br />
Definition 6. An infinite sequence of elements of X is a function from N to X.<br />

Notation. If we look at the previous definition we see that we might have<br />

a sequence s : N → X which would define s(1), s(2), s(3), . . . or in other words<br />

would define s(n) for any natural number n. Typically when we are referring to<br />

sequences we use subscripts (or sometimes superscripts) instead of parentheses and<br />

write s 1 , s 2 , s 3 , . . . and s n instead of s(1), s(2), s(3), . . . and s(n). Also rather than<br />

saying that s : N → X is a sequence we say that {s n } is a sequence or even that<br />

{s n } ∞ n=1 is a sequence.<br />

Let us now examine a few examples.<br />

Example 4. Suppose that (X, d) is R the real numbers with the usual metric<br />

d(x, y) = |x − y|. Then {n}, { √ n}, and {1/n} are sequences.<br />

Example 5. Again, suppose that (X, d) is R the real numbers with the usual<br />

metric d(x, y) = |x − y|. Consider the sequence {x n } where<br />
<br />
x n = { 1 if n is odd<br />
      { 0 if n is even<br />

We see that {n} and { √ n} get arbitrarily large as n gets larger, while in the last<br />

example x n “bounces” back and forth between 0 and 1 as n gets larger. However for<br />

{1/n} the elements of the sequence get closer and closer to 0 (and indeed arbitrarily<br />

close to 0). We say, in this case, that the sequence converges to zero or that the<br />

sequence has limit 0. This is a particularly important concept and so we shall give<br />

a formal definition.<br />

Definition 7. Let {x n } be a sequence of points in (X, d). We say that the<br />

sequence converges to x 0 ∈ X if for any ε > 0 there is N ∈ N such that if n > N<br />

then d(x n , x 0 ) < ε.<br />

Informally we can describe this by saying that if n is large then the distance<br />

from x n to x 0 is small.<br />

If the sequence {x n } converges to x 0 , then we often write x n → x 0 as n → ∞<br />

or lim n→∞ x n = x 0 .
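Definition 7 can also be illustrated numerically. The following Python sketch (the names are ours, not part of the notes) exhibits, for the sequence x n = 1/n and a given ε, an index N as required by the definition:<br />

```python
# Illustration (hypothetical names): for x_n = 1/n, which converges to 0,
# exhibit for a given eps > 0 an index N with d(x_n, 0) < eps for all n > N.
import math

def witness_N(eps):
    """One valid choice of N in Definition 7 for x_n = 1/n: N = ceil(1/eps)."""
    return math.ceil(1 / eps)

eps = 0.25
N = witness_N(eps)          # here N = 4
# Every term after the N-th is within eps of the limit 0.
print(all(abs(1.0 / n - 0.0) < eps for n in range(N + 1, N + 1000)))  # True
```

The point of the definition is exactly that such an N can be produced for every ε > 0, not merely for one particular ε.<br />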



Exercise 34. Show that if the sequence {x n } converges to x 0 then it does not<br />
converge to any value other than x 0 . Another way of saying this is that if<br />
the sequence converges then its limit is unique.<br />

We have now seen a number of examples of sequences. In some the sequence<br />

“runs off to infinity;” in others it “bounces around;” while in others it converges to<br />

a limit. Could a sequence do anything else? Could a sequence, for example, settle<br />
down, each element getting closer and closer to all future elements in the sequence,<br />
but not converge to any particular limit? In fact, depending on what the space<br />
X is, this is indeed possible.<br />

First let us recall the notion of a rational number. A rational number is a<br />

number that can be expressed as the ratio of two integers; that is, r is rational if<br />
r = a/b with a and b integers and b ≠ 0. We usually denote the set of all rational<br />
numbers Q (since we have already used R for the real numbers). We now consider<br />
an example in which the underlying space X is Q. Consider the sequence of<br />
rational numbers defined in the following way:<br />
<br />
x 1 = 1<br />
x n+1 = (x n + 2)/(x n + 1).<br />

This kind of definition is called a recursive definition. Rather than writing, as a<br />

function of n, what x n is we write what x 1 is and then what x n+1 is as a function<br />

of what x n is. We can obviously find any element of the sequence that we need, as<br />

long as we sequentially calculate each previous element. In our case we’d have<br />

x 1 = 1<br />
x 2 = (1 + 2)/(1 + 1) = 3/2 = 1.5<br />
x 3 = (3/2 + 2)/(3/2 + 1) = 7/5 = 1.4<br />
x 4 = (7/5 + 2)/(7/5 + 1) = 17/12 ≈ 1.416667<br />
x 5 = (17/12 + 2)/(17/12 + 1) = 41/29 ≈ 1.413793<br />
x 6 = (41/29 + 2)/(41/29 + 1) = 99/70 ≈ 1.414286<br />

We see that the sequence goes up and down but that it seems to be “converging.”<br />

What is it converging to? Let us suppose that it is converging to some value x 0 .<br />
Recall that<br />
<br />
x n+1 = (x n + 2)/(x n + 1).<br />

We’ll see later that if f is a continuous function then lim n→∞ f(x n ) = f(lim n→∞ x n ).<br />
In this case that means that<br />
<br />
x 0 = lim n→∞ x n+1 = lim n→∞ (x n + 2)/(x n + 1) = (x 0 + 2)/(x 0 + 1).<br />

Thus we have<br />
<br />
x 0 = (x 0 + 2)/(x 0 + 1)<br />



and if we solve this we obtain x 0 = ± √ 2. Clearly if x n > 0 then x n+1 > 0, so<br />
our sequence can’t be converging to − √ 2, so we must have x 0 = √ 2. But √ 2 is<br />
not in Q. Thus we have a sequence of elements of Q that are getting very close to<br />
each other but are not converging to any element of Q. (Of course the sequence is<br />
converging to a point in R. In fact one construction of the real number system is<br />
in terms of such sequences in Q.)<br />
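The iteration above is easy to carry out exactly. This Python sketch (ours, not part of the notes) uses exact rational arithmetic to reproduce the terms computed above: every term is in Q, yet the terms approach √2, which is not.<br />

```python
# Illustration: iterate x_{n+1} = (x_n + 2)/(x_n + 1) in exact rational
# arithmetic, starting from x_1 = 1.
from fractions import Fraction
import math

x = Fraction(1)
terms = [x]
for _ in range(5):
    x = (x + 2) / (x + 1)
    terms.append(x)

print(terms)  # [Fraction(1, 1), Fraction(3, 2), Fraction(7, 5), ...]
# The sixth term 99/70 is already within about 7e-5 of sqrt(2).
print(abs(float(terms[-1]) - math.sqrt(2)) < 1e-3)  # True
```

Using Fraction rather than floating point makes the point of the example visible: each term really is a ratio of integers (3/2, 7/5, 17/12, 41/29, 99/70, ...), so the whole sequence lives in Q.<br />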

Definition 8. Let {x n } be a sequence of points in (X, d). We say that the<br />

sequence is a Cauchy sequence if for any ε > 0 there is N ∈ N such that if n, m > N<br />

then d(x n , x m ) < ε.<br />

Exercise 35. Show that if {x n } converges then {x n } is a Cauchy sequence.<br />

A metric space (X, d) in which every Cauchy sequence converges to a limit in<br />

X is called a complete metric space. The space of real numbers R is a complete<br />

metric space, while the space of rationals Q is not.<br />

Exercise 36. Is N, the space of natural or counting numbers with metric d<br />
given by d(x, y) = |x − y|, a complete metric space?<br />

In Section 6 we defined the notion of a function being continuous at a point.<br />

It is possible to give that definition in terms of sequences.<br />

Definition 9. Suppose (X, d X ) and (Y, d Y ) are metric spaces, x 0 ∈ X, and<br />

f : X → Y is a function. Then f is continuous at x 0 if for every sequence {x n } that<br />

converges to x 0 in (X, d X ) the sequence {f(x n )} converges to f(x 0 ) in (Y, d Y ).<br />

Exercise 37. Show that the function f(x) = (x + 2)/(x + 1) is continuous at<br />
any point x ≠ −1. Show that this means that if x n → x 0 as n → ∞ then<br />
<br />
lim n→∞ (x n + 2)/(x n + 1) = (x 0 + 2)/(x 0 + 1).<br />

We can also define the concept of a closed set (and hence the concepts of open<br />

sets and compact sets) in terms of sequences.<br />

Definition 10. Let (X, d) be a metric space. A set S ⊂ X is closed if for any<br />
convergent sequence {x n } with x n ∈ S for all n we have lim n→∞ x n ∈ S. A set is<br />
open if its complement is closed.<br />

Given a sequence {x n } we can define a new sequence by taking only some of<br />

the elements of the original sequence. In the example we considered earlier in which<br />

x n was 1 if n was odd and 0 if n was even we could take only the odd n and thus<br />

obtain a sequence that did converge. The new sequence is called a subsequence of<br />

the old sequence.<br />

Definition 11. Let {x n } be some sequence in (X, d). Let {n j } ∞ j=1 be a<br />

sequence of natural numbers such that for each j we have n j < n j+1 , that is<br />

n 1 < n 2 < n 3 < . . . . The sequence {x nj } ∞ j=1 is called a subsequence of the original<br />

sequence.<br />

The notion of a subsequence is often useful. We often use it in the way that<br />

we briefly referred to above. We initially have a sequence that may not converge,<br />

but we are able to take a subsequence that does converge. Such a subsequence is<br />

called a convergent subsequence.<br />

Definition 12. A subset of a metric space with the property that every sequence<br />
in the subset has a subsequence converging to a point of the subset is called sequentially compact.<br />

Theorem 3. In any metric space any compact set is sequentially compact.



If we restrict attention to finite dimensional Euclidean spaces the situation is<br />

even better behaved.<br />

Theorem 4. Any subset of R n is sequentially compact if and only if it is<br />

compact.<br />

Exercise 38. Verify the following limits.<br />
<br />
(i) lim n→∞ n/(n + 1) = 1<br />
(ii) lim n→∞ (n + 3)/ √ (n 2 + 1) = 0<br />
(iii) lim n→∞ ( √ (n + 1) − √ n ) = 0<br />
(iv) lim n→∞ (a n + b n ) 1/n = max{a, b} (for a, b > 0)<br />

Exercise 39. Consider a sequence {x n } in R. What can you say about the<br />
sequence if it converges and, for each n, x n is an integer?<br />

Exercise 40. Consider the sequence<br />
<br />
1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, . . . .<br />
<br />
For which values z ∈ R is there a subsequence converging to z?<br />

Exercise 41. Prove that if a subsequence of a Cauchy sequence converges to<br />

a limit z then so does the original Cauchy sequence.<br />

Exercise 42. Prove that any subsequence of a convergent sequence converges.<br />

Finally one somewhat less trivial exercise.<br />

Exercise 43. Prove that if lim n→∞ x n = z then<br />
<br />
lim n→∞ (x 1 + · · · + x n )/n = z.<br />
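Before attempting Exercise 43 it may help to see the claim numerically. A Python sketch (illustrative only; the names are ours): for x n = n/(n + 1), which converges to z = 1, the running averages approach the same limit, though more slowly:<br />

```python
# Illustration (hypothetical names): the averages (x_1 + ... + x_n)/n of a
# convergent sequence converge to the same limit z.

def running_average(seq_fn, n):
    """Average of the first n terms x_1, ..., x_n."""
    return sum(seq_fn(k) for k in range(1, n + 1)) / n

x = lambda k: k / (k + 1)          # x_n -> 1
avg = running_average(x, 10_000)
print(abs(avg - 1) < 0.01)          # True: the averages also approach 1
```

Note that the converse fails: the averages of the sequence 1, 0, 1, 0, . . . converge to 1/2 even though the sequence itself does not converge.<br />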

9. Linear Spaces<br />

The notion of a linear space is the axiomatic way of looking at the familiar linear<br />
operations: addition and multiplication. A trivial example of a linear space is the<br />
set of real numbers, R.<br />
What is the operation of addition? One way of answering this question is to<br />
say that the operation of addition is just the list of its properties. So, we will<br />
define the addition of elements from some set X as the operation that satisfies the<br />
following four axioms.<br />

A1: x + y = y + x for all x and y in X.<br />

A2: x + (y + z) = (x + y) + z, for all x, y, and z in X.<br />

A3: There exists an element, denoted by 0, such that x + 0 = x for all x in<br />

X.<br />

A4: For every x in X there exists an element y in X, called the inverse of x,<br />
such that x + y = 0.<br />

And, to make things more interesting, we will also introduce the operation of<br />
‘multiplication by a number’ by adding two more axioms.<br />

A5: 1x = x for all x in X.<br />

A6: α(βx) = (αβ)x for all x in X and for all α and β in R.<br />

Finally, two more axioms relating addition and multiplication.<br />

A7: α(x + y) = αx + αy for all x and y in X and for all α in R.<br />

A8: (α + β)x = αx + βx for all x in X and for all α and β in R.


9. LINEAR SPACES 15<br />

Elements x, y, . . . , w are linearly dependent if there exist real numbers α, β, . . . , λ,<br />
not all of them equal to zero, such that<br />
<br />
αx + βy + · · · + λw = 0.<br />
<br />
Otherwise, the elements x, y, . . . , w are linearly independent.<br />

If in a space L it is possible to find n linearly independent elements, but any<br />

n + 1 elements are linearly dependent then we say that the space L has dimension<br />

n.<br />

A nonempty subset L ′ of a linear space L is called a linear subspace if L ′ forms<br />
a linear space in itself. In other words, L ′ is a linear subspace of L if for any x and<br />
y in L ′ and all α and β in R<br />
<br />
αx + βy ∈ L ′ .<br />


CHAPTER 2<br />

Linear Algebra<br />

1. The Space R n<br />

In the previous chapter we introduced the concept of a linear space or a vector<br />

space. We shall now examine in some detail one example of such a space. This is<br />

the space of all ordered n-tuples (x 1 , x 2 , . . . , x n ) where each x i is a real number.<br />

We call this space n-dimensional real space and denote it R n .<br />

Remember from the previous chapter that to define a vector space we not only<br />

need to define the points in that space but also to define how we add such points<br />

and how we multiply such points by scalars. In the case of R n we do this element<br />

by element in the n-tuple or vector. That is,<br />

(x 1 , x 2 , . . . , x n ) + (y 1 , y 2 , . . . , y n ) = (x 1 + y 1 , x 2 + y 2 , . . . , x n + y n )<br />
and<br />
α(x 1 , x 2 , . . . , x n ) = (αx 1 , αx 2 , . . . , αx n ).<br />

Let us consider the case that n = 2, that is, the case of R 2 . In this case we can<br />

visualise the space as in the following diagram. The vector (x 1 , x 2 ) is represented<br />

by the point that is x 1 units along from the point (0, 0) in the horizontal direction<br />

and x 2 units up from (0, 0) in the vertical direction.<br />

[Figure 1: the vector (1, 2) plotted in the plane, with x 1 on the horizontal axis and x 2 on the vertical axis.]<br />

Let us for the moment continue our discussion in R 2 . Notice that we are<br />

implicitly writing a vector (x 1 , x 2 ) as a sum x 1 × v 1 + x 2 × v 2 where v 1 is the<br />

unit vector in the first direction and v 2 is the unit vector in the second direction.<br />

Suppose that instead we considered the vectors u 1 = (2, 1) = 2 × v 1 + 1 × v 2 and<br />


u 2 = (1, 2) = 1 × v 1 + 2 × v 2 . We could have written any vector (x 1 , x 2 ) instead<br />

as z 1 × u 1 + z 2 × u 2 where z 1 = (2x 1 − x 2 )/3 and z 2 = (2x 2 − x 1 )/3. That is, for<br />

any vector in R 2 we can uniquely write that vector in terms of u 1 and u 2 . Is there<br />

anything that is special about u 1 and u 2 that allows us to make this claim? There<br />

must be since we can easily find other vectors for which this would not have been<br />

true. (For example, (1, 2) and (2, 4).)<br />
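The coordinate formulas above are easy to verify directly. A Python sketch (the names are ours): compute z 1 = (2x 1 − x 2 )/3 and z 2 = (2x 2 − x 1 )/3 and check that z 1 u 1 + z 2 u 2 recovers (x 1 , x 2 ):<br />

```python
# Illustration: coordinates of (x1, x2) with respect to u1 = (2, 1), u2 = (1, 2).

def coordinates(x1, x2):
    """The z1, z2 from the text: (x1, x2) = z1*u1 + z2*u2."""
    return (2 * x1 - x2) / 3, (2 * x2 - x1) / 3

def combine(z1, z2):
    u1, u2 = (2, 1), (1, 2)
    return (z1 * u1[0] + z2 * u2[0], z1 * u1[1] + z2 * u2[1])

z1, z2 = coordinates(5.0, 4.0)
print((z1, z2))            # (2.0, 1.0)
print(combine(z1, z2))     # (5.0, 4.0): the original vector is recovered
```

No such formulas exist for the pair (1, 2) and (2, 4), since the second vector is twice the first: that is exactly the failure of independence discussed next.<br />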

The property of the pair of vectors u 1 and u 2 is that they are independent. That<br />

is, we cannot write either as a multiple of the other. More generally in n dimensions<br />

we would say that we cannot write any of the vectors as a linear combination of<br />

the others, or equivalently as the following definition.<br />

Definition 13. The vectors x 1 , . . . , x k all in R n are linearly independent if it<br />

is not possible to find scalars α 1 , . . . , α k not all zero such that<br />

α 1 x 1 + · · · + α k x k = 0.<br />

Notice that we do not as a matter of definition require that k = n or even that<br />

k ≤ n. We state as a result that if k > n then the collection x 1 , . . . , x k cannot<br />

be linearly independent. (In a real maths course we would, of course, have proved<br />

this.)<br />

Comment 1. If you examine the definition above you will notice that there<br />

is nowhere that we actually need to assume that our vectors are in R n . We can<br />

in fact apply the same definition of linear independence to any vector space. This<br />

allows us to define the concept of the dimension of an arbitrary vector space as the<br />

maximal number of linearly independent vectors in that space. In the case of R n<br />

we obtain that the dimension is in fact n.<br />

Exercise 44. Suppose that x 1 , . . . , x k all in R n are linearly independent and<br />

that the vector y in R n is equal to β 1 x 1 + · · · + β k x k . Show that this is the only<br />

way that y can be expressed as a linear combination of the x i ’s. (That is show that<br />

if y = γ 1 x 1 + · · · + γ k x k then β 1 = γ 1 , . . . , β k = γ k .)<br />

The set of all vectors that can be written as a linear combination of the vectors<br />

x 1 , . . . , x k is called the span of those vectors. If x 1 , . . . , x k are linearly independent<br />

and if the span of x 1 , . . . , x k is all of R n then the collection { x 1 , . . . , x k } is called<br />

a basis for R n . (<strong>Of</strong> course, in this case we must have k = n.) Any vector in R n<br />

can be uniquely represented as a linear combination of the vectors x 1 , . . . , x k . We<br />

shall later see that it can sometimes be useful to choose a particular basis in which<br />

to represent the vectors with which we deal.<br />

It may be that we have a collection of vectors { x 1 , . . . , x k } whose span is not<br />

all of R n . In this case we call the span of { x 1 , . . . , x k } a linear subspace of R n .<br />

Alternatively we say that X ⊂ R n is a linear subspace of R n if X is closed under<br />

vector addition and scalar multiplication. That is, if for all x, y ∈ X the vector<br />

x + y is also in X and for all x ∈ X and α ∈ R the vector αx is in X. If the span<br />

of x 1 , . . . , x k is X and if x 1 , . . . , x k are linearly independent then we say that these<br />

vectors are a basis for the linear subspace X. In this case the dimension of the<br />

linear subspace X is k. In general the dimension of the span of x 1 , . . . , x k is equal<br />

to the maximum number of linearly independent vectors in x 1 , . . . , x k .<br />

Finally, we comment that R n is a metric space with metric d : R 2n → R +<br />

defined by<br />

d((x 1 , . . . , x n ), (y 1 , . . . , y n )) = √ (x 1 − y 1 ) 2 + · · · + (x n − y n ) 2 .<br />

There are many other metrics we could define on this space but this is the standard<br />

one.



2. Linear Functions from R n to R m<br />

In the previous section we introduced the space R n . Here we shall discuss<br />

functions from one such space to another (possibly of different dimension). The<br />

concept of continuity that we introduced for metric spaces is immediately applicable<br />

here. We shall be mainly concerned here with an even narrower class of functions,<br />

namely, the linear functions.<br />

Definition 14. A function f : R n → R m is said to be a linear function if it<br />

satisfies the following two properties.<br />

(1) f(x + y) = f(x) + f(y) for all x, y ∈ R n , and<br />

(2) f(αx) = αf(x) for all x ∈ R n and α ∈ R.<br />

Comment 2. When considering functions of a single real variable, that is,<br />
functions from R to R, functions of the form f(x) = ax + b, where a and b are<br />
fixed constants, are sometimes called linear functions. It is easy to see that if b ≠ 0<br />
then such functions do not satisfy the conditions given above. We shall call such<br />
functions affine functions. More generally we shall call a function g : R n → R m an<br />
affine function if it is the sum of a linear function f : R n → R m and a constant<br />
b ∈ R m . That is, if g(x) = f(x) + b for all x ∈ R n .<br />

Let us now suppose that we have two linear functions f : R n → R m and<br />

g : R n → R m . It is straightforward to show that the function (f + g) : R n → R m<br />

defined by (f + g)(x) = f(x) + g(x) is also a linear function. Similarly if we have a<br />

linear function f : R n → R m and a constant α ∈ R the function (αf) : R n → R m<br />

defined by (αf)(x) = αf(x) is a linear function. If f : R n → R m and g : R m →<br />

R k are linear functions then the composite function g ◦ f : R n → R k defined by<br />

g ◦ f(x) = g(f(x)) is again a linear function. Finally, if f : R n → R n is not only<br />

linear, but also one-to-one and onto so that it has an inverse f −1 : R n → R n then<br />

the inverse function is also a linear function.<br />

Exercise 45. Prove the facts stated in the previous paragraph.<br />

Recall in the previous section we defined the notion of a linear subspace. A<br />

linear function f : R n → R m defines two important subspaces, the image of f,<br />

denoted Im(f) ⊂ R m , and the kernel of f, denoted Ker(f) ⊂ R n . The image of f<br />

is the set of all vectors in R m such that f maps some vector in R n to that vector,<br />

that is,<br />

Im(f) = { y ∈ R m | ∃x ∈ R n such that y = f(x) }.<br />

The kernel of f is the set of all vectors in R n that are mapped by the function f<br />

to the zero vector in R m , that is,<br />

Ker(f) = { x ∈ R n | f(x) = 0 }.<br />

The kernel of f is sometimes called the null space of f.<br />

It is intuitively clear that the dimension of Im(f) is no more than n. (It is of<br />

course no more than m since it is contained in R m .) Of course, in general it may be<br />

less than n, for example if m < n or if f mapped all points in R n to the zero vector<br />

in R m . (You should satisfy yourself that this function is indeed a linear function.)<br />

However if the dimension of Im(f) is indeed less than n it means that the function<br />

has mapped the n-dimensional space R n into a linear space of lower dimension and<br />

that in the process some dimensions have been lost. The linearity of f means that<br />

a linear subspace of dimension equal to the number of dimensions that have been<br />

lost must have been collapsed to the zero vector (and that translates of this linear<br />

subspace have been collapsed to single points). Thus we can say that<br />

dim(Im(f)) + dim(Ker(f)) = n.
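This dimension identity (known as the rank-nullity theorem) can be checked on a concrete matrix. A sketch, assuming NumPy is available (the matrix is a made-up example; anticipating the next sections, rank(A) equals dim(Im(f)) for f(x) = Ax):<br />

```python
# Illustration (assumes NumPy): for f(x) = Ax, rank(A) = dim(Im f) and
# n - rank(A) = dim(Ker f), so the two dimensions sum to n.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # a map from R^3 to R^2; the rows are proportional
n = A.shape[1]

dim_image = int(np.linalg.matrix_rank(A))
dim_kernel = n - dim_image

print(dim_image, dim_kernel)          # 1 2
print(dim_image + dim_kernel == n)    # True
```

Here a whole two-dimensional subspace of R 3 is collapsed to the zero vector, so the image is only a line in R 2 : two dimensions are "lost," exactly as described above.<br />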



In the following section we shall introduce the notion of a matrix and define<br />

various operations on matrices. If you are like me when I first came across matrices,<br />

these definitions may seem somewhat arbitrary and mysterious. However, we shall<br />

see that matrices may be viewed as representations of linear functions and that when<br />

viewed in this way the operations we define on matrices are completely natural.<br />

3. Matrices and Matrix Algebra<br />

A matrix is defined as a rectangular array of numbers. If the matrix contains<br />

m rows and n columns it is called an m × n matrix (read “m by n” matrix). The<br />

element in the ith row and the jth column is called the ijth element. We typically<br />

enclose a matrix in square brackets [ ] and write it as<br />

⎡ a 11 . . . a 1n ⎤<br />
⎢ .   . . .   .  ⎥<br />
⎣ a m1 . . . a mn ⎦ .<br />

In the case that m = n we call the matrix a square matrix. If m = 1 the matrix<br />

contains a single row and we call it a row vector. If n = 1 the matrix contains<br />

a single column and we call it a column vector. For most purposes we do not<br />

distinguish between a 1 × 1 matrix [a] and the scalar a.<br />

Just as we defined the operation of vector addition and the multiplication of<br />

a vector by a scalar we define similar operations for matrices. In order to be able<br />

to add two matrices we require that the matrices be of the same dimension. That<br />

is, if matrix A is of dimension m × n we shall be able to add the matrix B to it<br />

if and only if B is also of dimension m × n. If this condition is met then we add<br />

matrices simply by adding the corresponding elements of each matrix to obtain the<br />

new m × n matrix A + B. That is,<br />

⎡ a 11 . . . a 1n ⎤   ⎡ b 11 . . . b 1n ⎤   ⎡ a 11 + b 11 . . . a 1n + b 1n ⎤<br />
⎢ .   . . .   .  ⎥ + ⎢ .   . . .   .  ⎥ = ⎢ .           . . .           . ⎥<br />
⎣ a m1 . . . a mn ⎦   ⎣ b m1 . . . b mn ⎦   ⎣ a m1 + b m1 . . . a mn + b mn ⎦ .<br />

We can see that this definition of matrix addition satisfies many of the same<br />

properties of the addition of scalars. If A, B, and C are all m × n matrices then<br />

(1) A + B = B + A,<br />

(2) (A + B) + C = A + (B + C),<br />

(3) there is a zero matrix 0 such that for any m×n matrix A we have A+0 =<br />

0 + A = A, and<br />

(4) there is a matrix −A such that A + (−A) = (−A) + A = 0.<br />

<strong>Of</strong> course, the zero matrix referred to in 3 is simply the m×n matrix consisting<br />

of all zeros (this is called a null matrix) and the matrix −A referred to in 4 is the<br />

matrix obtained from A by replacing each element of A by its negative, that is,<br />

  ⎡ a 11 . . . a 1n ⎤   ⎡ −a 11 . . . −a 1n ⎤<br />
− ⎢ .   . . .   .  ⎥ = ⎢ .     . . .     . ⎥<br />
  ⎣ a m1 . . . a mn ⎦   ⎣ −a m1 . . . −a mn ⎦ .<br />

Now, given a scalar α in R and an m × n matrix A we define the product of α<br />

and A which we write αA to be the matrix in which each element is replaced by α<br />

times that element, that is,<br />

  ⎡ a 11 . . . a 1n ⎤   ⎡ αa 11 . . . αa 1n ⎤<br />
α ⎢ .   . . .   .  ⎥ = ⎢ .     . . .     . ⎥<br />
  ⎣ a m1 . . . a mn ⎦   ⎣ αa m1 . . . αa mn ⎦ .<br />



So far the definitions of matrix operations have all seemed the most natural<br />

ones. We now come to defining matrix multiplication. Perhaps here the definition<br />

seems somewhat less natural. However in the next section we shall see that the definition<br />

we shall give is in fact very natural when we view matrices as representations<br />

of linear functions.<br />

We define matrix multiplication of A times B written as AB where A is an<br />

m × n matrix and B is a p × q matrix only when n = p. In this case the product<br />

AB is defined to be an m × q matrix in which the element in the ith row and jth<br />

column is ∑ n k=1 a ik b kj . That is, to find the term to go in the ith row and the jth<br />

column of the product matrix AB we take the ith row of the matrix A which will<br />

be a row vector with n elements and the jth column of the matrix B which will be<br />

a column vector with n elements. We then multiply each element of the first vector<br />

by the corresponding element of the second and add all these products. Thus<br />

⎡ a 11 . . . a 1n ⎤ ⎡ b 11 . . . b 1q ⎤   ⎡ ∑ n k=1 a 1k b k1 . . . ∑ n k=1 a 1k b kq ⎤<br />
⎢ .   . . .   .  ⎥ ⎢ .   . . .   .  ⎥ = ⎢ .              . . .              . ⎥<br />
⎣ a m1 . . . a mn ⎦ ⎣ b n1 . . . b nq ⎦   ⎣ ∑ n k=1 a mk b k1 . . . ∑ n k=1 a mk b kq ⎦ .<br />

For example<br />
<br />
[ a b c ]   ⎡ p q ⎤   [ ap + br + ct   aq + bs + cv ]<br />
[ d e f ]   ⎢ r s ⎥ = [ dp + er + ft   dq + es + fv ] .<br />
            ⎣ t v ⎦<br />
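The definition of the product can be transcribed directly into code. A Python sketch (ours, for illustration): entry (i, j) of AB is ∑ k a ik b kj , defined only when A has as many columns as B has rows:<br />

```python
# Illustration: matrix multiplication exactly as defined above.

def matmul(A, B):
    """A is m x n, B is n x q; entry (i, j) of AB is sum over k of A[i][k]*B[k][j]."""
    m, n, q = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(m)]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2
print(matmul(A, B))    # [[58, 64], [139, 154]], a 2 x 2 matrix
```

Note that BA would also be defined here (a 3 x 3 matrix) but is not equal to AB: matrix multiplication is not commutative.<br />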

We define the identity matrix of order n to be the n × n matrix that has 1’s on<br />

its main diagonal and zeros elsewhere that is, whose ijth element is 1 if i = j and<br />

zero if i ≠ j. We denote this matrix by I n or, if the order is clear from the context,<br />

simply I. That is,<br />

    ⎡ 1 0 . . . 0 ⎤<br />
I = ⎢ 0 1 . . . 0 ⎥<br />
    ⎢ . .  . . . . ⎥<br />
    ⎣ 0 0 . . . 1 ⎦ .<br />

It is easy to see that if A is an m × n matrix then AI n = A and I m A = A. In fact,<br />

we could equally well define the identity matrix to be that matrix that satisfies<br />

these properties for all such matrices A in which case it would be easy to show that<br />

there was a unique matrix satisfying this property, namely, the matrix we defined<br />

above.<br />

Consider an m × n matrix A. The columns of A are m-dimensional vectors,<br />

that is, elements of R m and the rows of A are elements of R n . Thus we can ask<br />

if the n columns are linearly independent and similarly if the m rows are linearly<br />

independent. In fact we ask: What is the maximum number of linearly independent<br />

columns of A? It turns out that this is the same as the maximum number of linearly<br />

independent rows of A. We call the number the rank of the matrix A.



4. Matrices as Representations of Linear Functions<br />

Let us suppose that we have a particular linear function f : R n → R m . We have<br />

suggested in the previous section that such a function can necessarily be represented<br />

as multiplication by some matrix. We shall now show that this is true. Moreover<br />

we shall do so by explicitly constructing the appropriate matrix.<br />

Let us write the n-dimensional vector x as the column vector x = (x 1 , x 2 , . . . , x n ).<br />
Now, notice that we can write the vector x as the sum ∑ n i=1 x i e i , where e i is the ith<br />
unit vector, that is, the vector with 1 in the ith place and zeros elsewhere. That is,<br />
<br />
(x 1 , x 2 , . . . , x n ) = x 1 (1, 0, . . . , 0) + x 2 (0, 1, . . . , 0) + · · · + x n (0, 0, . . . , 1).<br />
<br />
Now from the linearity of the function f we can write<br />
<br />
f(x) = f ( ∑ n i=1 x i e i ) = ∑ n i=1 f(x i e i ) = ∑ n i=1 x i f(e i ).<br />
<br />
But, what is f(e i )? Remember that e i is a unit vector in R n and that f maps<br />
vectors in R n to vectors in R m . Thus f(e i ) is the image in R m of the vector e i . Let<br />
us write f(e i ) as the column vector (a 1i , a 2i , . . . , a mi ). Thus<br />
<br />
f(x) = ∑ n i=1 x i f(e i )<br />
= x 1 (a 11 , a 21 , . . . , a m1 ) + x 2 (a 12 , a 22 , . . . , a m2 ) + · · · + x n (a 1n , a 2n , . . . , a mn )<br />
= ( ∑ n i=1 a 1i x i , ∑ n i=1 a 2i x i , . . . , ∑ n i=1 a mi x i )<br />


and this is exactly what we would have obtained had we multiplied the matrices<br />

⎡ a 11 a 12 . . . a 1n ⎤ ⎡ x 1 ⎤<br />
⎢ a 21 a 22 . . . a 2n ⎥ ⎢ x 2 ⎥<br />
⎢  .    .   . . .  .   ⎥ ⎢  .  ⎥<br />
⎣ a m1 a m2 . . . a mn ⎦ ⎣ x n ⎦ .<br />

Thus we have not only shown that a linear function is necessarily represented by<br />
multiplication by a matrix, we have also shown how to find the appropriate matrix.<br />
It is precisely the matrix whose n columns are the images under the function of the<br />
n unit vectors in R n .<br />
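The construction just described can be sketched in a few lines of Python (the map f below is an arbitrary example, not one from the text): the representing matrix is assembled column by column from the images of the unit vectors.<br />

```python
# Sketch: the matrix representing a linear map f: R^n -> R^m has f(e_i)
# as its ith column; lists of lists stand in for matrices (rows first).

def matrix_of(f, n):
    unit = lambda i: [1.0 if j == i else 0.0 for j in range(n)]
    images = [f(unit(i)) for i in range(n)]      # f(e_1), ..., f(e_n)
    m = len(images[0])
    # place f(e_i) in the ith column
    return [[images[i][row] for i in range(n)] for row in range(m)]

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# an arbitrary linear map from R^3 to R^2, for illustration only
f = lambda x: [2 * x[0] + x[2], x[1] - x[2]]

A = matrix_of(f, 3)
print(A)                      # [[2.0, 0.0, 1.0], [0.0, 1.0, -1.0]]
print(mat_vec(A, [1, 2, 3]))  # agrees with f([1, 2, 3])
```

Multiplication by A then reproduces f on every vector, which is exactly the claim proved above.<br />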

Exercise 46. Find the matrices that represent the following linear functions<br />

from R 2 to R 2 .<br />

(1) a clockwise rotation of π/2 (90 ◦ ),<br />

(2) a reflection in the x 1 axis,<br />

(3) a reflection in the line x 2 = x 1 (that is, the 45 ◦ line),<br />

(4) a counter clockwise rotation of π/4 (45 ◦ ), and<br />

(5) a reflection in the line x 2 = x 1 followed by a counter clockwise rotation of<br />

π/4.<br />

Recall that in Section 2 we defined, for any f, g : R n → R m and α ∈ R, the<br />

functions (f + g) and (αf). In Section 3 we defined the sum of two m × n matrices<br />

A and B, and the product of a scalar α with the matrix A. Let us instead define<br />

the sum of A and B as follows.<br />

Let f : R n → R m be the linear function represented by the matrix A and<br />

g : R n → R m be the linear function represented by the matrix B. Now define<br />

the matrix (A + B) to be the matrix that represents the linear function (f + g).<br />

Similarly let the matrix αA be the matrix that represents the linear function (αf).<br />

Exercise 47. Prove that the matrices (A + B) and αA defined in the previous<br />

paragraph coincide with the matrices defined in Section 3.<br />

We can also see that the definition we gave of matrix multiplication is precisely<br />
the right definition if we take multiplication of matrices to mean the composition of<br />
the linear functions that the matrices represent. To be more precise, let f : R n → R m<br />

and g : R m → R k be linear functions and let A and B be the m × n and k × m<br />

matrices that represent them. Let (g ◦ f) : R n → R k be the composite function<br />

defined in Section 2. Now let us define the product BA to be that matrix that<br />

represents the linear function (g ◦ f).



Now since the matrix A represents the function f and B represents g we have<br />

(g ◦ f)(x) = g(f(x))<br />

    ⎛ ⎡ a 11 a 12 . . . a 1n ⎤ ⎡ x 1 ⎤ ⎞<br />
= g ⎜ ⎢ a 21 a 22 . . . a 2n ⎥ ⎢ x 2 ⎥ ⎟<br />
    ⎜ ⎢  .    .   . . .  .   ⎥ ⎢  .  ⎥ ⎟<br />
    ⎝ ⎣ a m1 a m2 . . . a mn ⎦ ⎣ x n ⎦ ⎠<br />

    ⎛ ⎡ ∑_{i=1}^{n} a 1i x i ⎤ ⎞<br />
= g ⎜ ⎢ ∑_{i=1}^{n} a 2i x i ⎥ ⎟<br />
    ⎜ ⎢          .           ⎥ ⎟<br />
    ⎝ ⎣ ∑_{i=1}^{n} a mi x i ⎦ ⎠<br />

  ⎡ b 11 b 12 . . . b 1m ⎤ ⎡ ∑_{i=1}^{n} a 1i x i ⎤<br />
= ⎢ b 21 b 22 . . . b 2m ⎥ ⎢ ∑_{i=1}^{n} a 2i x i ⎥<br />
  ⎢  .    .   . . .  .   ⎥ ⎢          .           ⎥<br />
  ⎣ b k1 b k2 . . . b km ⎦ ⎣ ∑_{i=1}^{n} a mi x i ⎦<br />

  ⎡ ∑_{j=1}^{m} b 1j ∑_{i=1}^{n} a ji x i ⎤<br />
= ⎢ ∑_{j=1}^{m} b 2j ∑_{i=1}^{n} a ji x i ⎥<br />
  ⎢                   .                   ⎥<br />
  ⎣ ∑_{j=1}^{m} b kj ∑_{i=1}^{n} a ji x i ⎦<br />

  ⎡ ∑_{i=1}^{n} ∑_{j=1}^{m} b 1j a ji x i ⎤<br />
= ⎢ ∑_{i=1}^{n} ∑_{j=1}^{m} b 2j a ji x i ⎥<br />
  ⎢                   .                   ⎥<br />
  ⎣ ∑_{i=1}^{n} ∑_{j=1}^{m} b kj a ji x i ⎦<br />

  ⎡ ∑_{j=1}^{m} b 1j a j1   ∑_{j=1}^{m} b 1j a j2   . . .   ∑_{j=1}^{m} b 1j a jn ⎤ ⎡ x 1 ⎤<br />
= ⎢ ∑_{j=1}^{m} b 2j a j1   ∑_{j=1}^{m} b 2j a j2   . . .   ∑_{j=1}^{m} b 2j a jn ⎥ ⎢ x 2 ⎥<br />
  ⎢          .                        .             . . .            .            ⎥ ⎢  .  ⎥<br />
  ⎣ ∑_{j=1}^{m} b kj a j1   ∑_{j=1}^{m} b kj a j2   . . .   ∑_{j=1}^{m} b kj a jn ⎦ ⎣ x n ⎦ .<br />

And this last is the product of the matrix we defined in Section 3 to be BA with<br />
the column vector x. As we have claimed the definition of matrix multiplication<br />
we gave in Section 3 was not arbitrary but rather was forced on us by our decision<br />
to regard the multiplication of two matrices as corresponding to the composition<br />
of the linear functions the matrices represented.<br />

Recall that the columns of the matrix A that represented the linear function<br />
f : R n → R m were precisely the images of the unit vectors in R n under f. The<br />
linearity of f means that the image of any point in R n is in the span of the images<br />
of these unit vectors and similarly that any point in the span of the images is the<br />
image of some point in R n . Thus Im(f) is equal to the span of the columns of<br />
A. Now, the dimension of the span of the columns of A is equal to the maximum<br />
number of linearly independent columns in A, that is, to the rank of A.<br />
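That the product BA represents the composition g ◦ f can also be checked numerically. The matrices A and B below are hypothetical, chosen only for illustration.<br />

```python
# Sketch: (BA)x equals g(f(x)) when A represents f and B represents g.

def mat_mul(B, A):
    k, m, n = len(B), len(A), len(A[0])
    return [[sum(B[r][j] * A[j][c] for j in range(m)) for c in range(n)]
            for r in range(k)]

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 0],
     [0, 1, 1]]      # represents f : R^3 -> R^2
B = [[1, -1],
     [2, 0],
     [0, 3]]         # represents g : R^2 -> R^3

x = [1, 2, 3]
lhs = mat_vec(mat_mul(B, A), x)   # (BA)x
rhs = mat_vec(B, mat_vec(A, x))   # g(f(x))
print(lhs, rhs)                   # the two agree
```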

5. Linear Functions from R n to R n and Square Matrices<br />

In the remainder of this chapter we look more closely at an important subclass<br />

of linear functions and the matrices that represent them, viz the functions that<br />

map R n to itself. From what we have already said we see immediately that the<br />

matrix representing such a linear function will have the same number of rows as it<br />

has columns. We call such a matrix a square matrix.



If the linear function f : R n → R n is one-to-one and onto then the function f<br />

has an inverse f −1 . In Exercise 45 you showed that this function too was linear.<br />

A matrix that represents a linear function that is one-to-one and onto is called a<br />

nonsingular matrix. Alternatively we can say that an n × n matrix is nonsingular<br />

if the rank of the matrix is n. To see that these two statements are equivalent, note<br />
first that if f is one-to-one then Ker(f) = {0}. (This is the trivial direction of<br />
Exercise 48.) But this means that dim(Ker(f)) = 0 and so, since dim(Ker(f)) +<br />
dim(Im(f)) = n, that dim(Im(f)) = n. And, as we argued at the end of the previous<br />
section, this is the same as the rank of the matrix that represents f.<br />

Exercise 48. Show that the linear function f : R n → R m is one-to-one if and<br />

only if Ker(f) = {0}.<br />

Exercise 49. Show that the linear function f : R n → R n is one-to-one if and<br />

only if it is onto.<br />

6. Inverse Functions and Inverse Matrices<br />

In the previous section we discussed briefly the idea of the inverse of a linear<br />

function f : R n → R n . This allows us a very easy definition of the inverse of a<br />

square matrix A. The inverse of A is the matrix that represents the linear function<br />

that is the inverse function of the linear function that A represents. We write the<br />

inverse of the matrix A as A −1 . Thus a matrix will have an inverse if and only if<br />

the linear function that the matrix represents has an inverse, that is, if and only<br />

if the linear function is one-to-one and onto. We saw in the previous section that<br />

this will occur if and only if the kernel of the function is {0} which in turn occurs<br />

if and only if the image of f is of full dimension, that is, is all of R n . This is the<br />

same as the matrix being of full rank, that is, of rank n.<br />

As with the ideas we have discussed earlier we can express the idea of a matrix<br />

inverse purely in terms of matrices without reference to the linear function that<br />

they represent. Given an n × n matrix A we define the inverse of A to be a matrix<br />

B such that BA = I n where I n is the n × n identity matrix discussed in Section 3.<br />

Such a matrix B will exist if and only if the matrix A is nonsingular. Moreover, if<br />

such a matrix B exists then it is also true that AB = I n , that is, (A −1 ) −1 = A.<br />

In Section 9 we shall see one method for calculating inverses of general n × n<br />

matrices. Here we shall simply describe how to calculate the inverse of a 2 × 2<br />

matrix. Suppose that we have the matrix<br />

A = ⎡ a b ⎤<br />
    ⎣ c d ⎦ .<br />

The inverse of this matrix is<br />

( 1/(ad − bc) ) ⎡  d −b ⎤<br />
                ⎣ −c  a ⎦ .<br />

Exercise 50. Show that the matrix A is of full rank if and only if ad − bc ≠ 0.<br />

Exercise 51. Check that the matrix given is, in fact, the inverse of A.<br />
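As a quick sketch (the matrix used at the end is an arbitrary example), the 2 × 2 formula can be coded directly, with ad − bc ≠ 0 from Exercise 50 serving as the test for nonsingularity.<br />

```python
# Sketch of the 2 x 2 inverse formula: (1/(ad - bc)) [[d, -b], [-c, a]].

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (ad - bc = 0)")
    s = 1.0 / det
    return [[s * d, -s * b], [-s * c, s * a]]

B = inv2([[3, 1], [1, 2]])
print(B)   # entries 0.4, -0.2, -0.2, 0.6 up to rounding
```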

7. Changes of Basis<br />

We have until now implicitly assumed that there is no ambiguity when we<br />

speak of the vector (x 1 , x 2 , . . . , x n ). Sometimes there may indeed be an obvious<br />

meaning to such a vector. However when we define a linear space all that are really<br />

specified are “what straight lines are” and “where zero is.” In particular, we do<br />

not necessarily have defined in an unambiguous way “where the axes are” or “what<br />




a unit length along each axis is.” In other words we may not have a set of basis<br />

vectors specified.<br />

Even when we do have, or have decided on, a set of basis vectors we may wish<br />

to redefine our description of the linear space with which we are dealing so as to<br />

use a different set of basis vectors. Let us suppose that we have an n-dimensional<br />

space, even R n say, with a given set of basis vectors v 1 , v 2 , . . . , v n and that we<br />

wish instead to describe the space in terms of the linearly independent vectors<br />

b 1 , b 2 , . . . , b n where<br />

b i = b 1i v 1 + b 2i v 2 + · · · + b ni v n .<br />

Now, if we have the description of a point in terms of the new basis vectors,<br />
e.g., as<br />

z 1 b 1 + z 2 b 2 + · · · + z n b n<br />

then we can easily convert this to a description in terms of the original basis vectors.<br />
We simply substitute the formula for b i in terms of the v j ’s into the previous<br />
formula, giving<br />

( ∑_{i=1}^{n} b 1i z i ) v 1 + ( ∑_{i=1}^{n} b 2i z i ) v 2 + · · · + ( ∑_{i=1}^{n} b ni z i ) v n<br />

or, in our previous notation,<br />

⎡ ∑_{i=1}^{n} b 1i z i ⎤<br />
⎢ ∑_{i=1}^{n} b 2i z i ⎥<br />
⎢          .           ⎥<br />
⎣ ∑_{i=1}^{n} b ni z i ⎦ .<br />

But this is simply the product<br />

⎡ b 11 b 12 . . . b 1n ⎤ ⎡ z 1 ⎤<br />
⎢ b 21 b 22 . . . b 2n ⎥ ⎢ z 2 ⎥<br />
⎢  .    .   . . .  .   ⎥ ⎢  .  ⎥<br />
⎣ b n1 b n2 . . . b nn ⎦ ⎣ z n ⎦ .<br />

That is, if we are given an n-tuple of real numbers that describes a vector in terms<br />
of the new basis vectors b 1 , b 2 , . . . , b n and we wish to find the n-tuple that describes<br />
the vector in terms of the original basis vectors, we simply multiply the n-tuple we<br />
are given, written as a column vector, by the matrix whose columns are the new<br />
basis vectors b 1 , b 2 , . . . , b n . We shall call this matrix B. We see among other things<br />

that changing the basis is a linear operation.<br />

Now, if we were given the information in terms of the original basis vectors<br />

and wanted to write it in terms of the new basis vectors what should we do? Since<br />

we don’t have the original basis vectors written in terms of the new basis vectors<br />

this is not immediately obvious. However we do know that if we were to do it and<br />

then were to carry out the operation described in the previous paragraph we would<br />

be back with what we started. Further we know that the operation is a linear<br />

operation that maps n-tuples to n-tuples and so is represented by multiplication<br />

by an n × n matrix. That is we multiply the n-tuple written as a column vector by<br />

the matrix that when multiplied by B gives the identity matrix, that is, the matrix<br />

B −1 . If we are given a vector of the form<br />

x 1 v 1 + x 2 v 2 + · · · + x n v n<br />



and we wish to express it in terms of the vectors b 1 , b 2 , . . . , b n we calculate<br />

⎡ b 11 b 12 . . . b 1n ⎤ −1 ⎡ x 1 ⎤<br />
⎢ b 21 b 22 . . . b 2n ⎥    ⎢ x 2 ⎥<br />
⎢  .    .   . . .  .   ⎥    ⎢  .  ⎥<br />
⎣ b n1 b n2 . . . b nn ⎦    ⎣ x n ⎦ .<br />
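Both conversions can be sketched in code. The basis vectors b 1 and b 2 below are hypothetical, chosen only to illustrate multiplying by B and by B −1 .<br />

```python
# Sketch: B (columns = new basis vectors) converts new coordinates to old;
# B^{-1} converts old coordinates back to new. 2-D example for brevity.

def mat_vec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

b1, b2 = [2, 1], [1, 1]
B = [[b1[0], b2[0]],
     [b1[1], b2[1]]]          # columns are the new basis vectors

z = [3, -1]                   # coordinates with respect to b_1, b_2
x = mat_vec(B, z)             # the same point in the original basis
z_back = mat_vec(inv2(B), x)  # and back again
print(x, z_back)              # [5, 2] [3.0, -1.0]
```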

Suppose now that we consider a linear function f : R n → R n and that we have<br />

originally described R n in terms of the basis vectors v 1 , v 2 , . . . , v n where v i is the<br />

vector with 1 in the ith place and zeros elsewhere. Suppose that with these basis<br />

vectors f is represented by the matrix<br />

    ⎡ a 11 a 12 . . . a 1n ⎤<br />
A = ⎢ a 21 a 22 . . . a 2n ⎥ .<br />
    ⎢  .    .   . . .  .   ⎥<br />
    ⎣ a n1 a n2 . . . a nn ⎦<br />

If we now describe R n in terms of the vectors b 1 , b 2 , . . . , b n how will the linear<br />
function f be represented? Let us think about what we want. We shall be given<br />

a vector described in terms of the basis vectors b 1 , b 2 , . . . , b n and we shall want<br />

to know what the image of this vector under the linear function f is, where we<br />

shall again want our answer in terms of the basis vectors b 1 , b 2 , . . . , b n . We shall<br />

know how to do this when we are given the description in terms of the vectors<br />

e 1 , e 2 , . . . , e n . Thus the first thing we shall do with our vector is to convert it from<br />

a description in terms of b 1 , b 2 , . . . , b n to a description in terms of e 1 , e 2 , . . . , e n . We<br />

do this by multiplying the n-tuple by the matrix B. Thus if we call our original<br />

n-tuple z we shall now have a description of the vector in terms of e 1 , e 2 , . . . , e n ,<br />

viz Bz. Given this description we can find the image of the vector in question<br />

under f by multiplying by the matrix A. Thus we shall have A(Bz) = (AB)z.<br />

Remember however this will have given us the image vector in terms of the basis<br />

vectors e 1 , e 2 , . . . , e n . In order to convert this to a description in terms of the vectors<br />

b 1 , b 2 , . . . , b n we must multiply by the matrix B −1 . Thus our final n-tuple will be<br />

(B −1 AB)z.<br />

Recapitulating, suppose that we know that the linear function f : R n → R n is<br />

represented by the matrix A when we describe R n in terms of the standard basis<br />

vectors e 1 , e 2 , . . . , e n and that we have a new set of basis vectors b 1 , b 2 , . . . , b n . Then<br />

when R n is described in terms of these new basis vectors the linear function f will<br />

be represented by the matrix B −1 AB.<br />
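A small numerical sketch of this recapitulation, with arbitrarily chosen matrices (deliberately not those of the exercises below):<br />

```python
# Sketch: if A represents f in the standard basis and B has the new basis
# vectors as its columns, then B^{-1} A B represents f in the new basis.

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1],
     [0, 3]]      # f in the standard basis
B = [[1, 1],
     [1, -1]]     # new basis vectors as columns

M = mat_mul(inv2(B), mat_mul(A, B))
print(M)          # [[3.0, -1.0], [0.0, 2.0]]
# the trace is unchanged, an example of an invariant property
print(M[0][0] + M[1][1], A[0][0] + A[1][1])
```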

Exercise 52. Let f : R n → R m be a linear function. Suppose that with the<br />

standard bases for R n and R m the function f is represented by the matrix A. Let<br />

b 1 , b 2 , . . . , b n be a new set of basis vectors for R n and c 1 , c 2 , . . . , c m be a new set of<br />

basis vectors for R m . What is the matrix that represents f when the linear spaces<br />

are described in terms of the new basis vectors?<br />

Exercise 53. Let f : R 2 → R 2 be a linear function. Suppose that with the<br />
standard basis for R 2 the function f is represented by the matrix<br />

⎡ 3 1 ⎤<br />
⎣ 1 2 ⎦ .<br />

Let<br />

⎡ 3 ⎤       ⎡ 1 ⎤<br />
⎣ 2 ⎦  and  ⎣ 1 ⎦<br />

be a new set of basis vectors for R 2 . What is the matrix that represents f when<br />
R 2 is described in terms of the new basis vectors?<br />



Properties of a square matrix that depend only on the linear function that the<br />

matrix represents and not on the particular choice of basis vectors for the linear<br />

space are called invariant properties. We have already seen one example of an<br />

invariant property, the rank of a matrix. The rank of a matrix is equal to the<br />

dimension of the image space of the function that the matrix represents which<br />

clearly depends only on the function and not on the choice of basis vectors for the<br />

linear space.<br />

The idea of a property being invariant can be expressed also in terms only of<br />

matrices without reference to the idea of linear functions. A property is invariant<br />

if whenever an n × n matrix A has the property then for any nonsingular n × n<br />

matrix B the matrix B −1 AB also has the property. We might think of rank as a<br />

function that associates to any square matrix a nonnegative integer. We shall say<br />

that such a function is an invariant if the property of having the function take a<br />

particular value is invariant for all particular values we may choose.<br />

Two particularly important invariants are the trace of a square matrix and the<br />

determinant of a square matrix. We examine these in more detail in the following<br />

section.<br />

8. The Trace and the Determinant<br />

In this section we define two important real valued functions on the space<br />

of n × n matrices, the trace and the determinant. Both of these concepts have<br />

geometric interpretations. However, while the trace is easy to calculate (much easier<br />

than the determinant) its geometric interpretation is rather hard to see. Thus we<br />

shall not go into it. On the other hand the determinant while being somewhat<br />

harder to calculate has a very clear geometric interpretation. In Section 9 we shall<br />

examine in some detail how to calculate determinants. In this section we shall be<br />

content to discuss one definition and the geometric intuition of the determinant.<br />

Given an n × n matrix A, the trace of A, written tr(A), is the sum of the elements<br />
on the main diagonal, that is,<br />

   ⎡ a 11 a 12 . . . a 1n ⎤<br />
tr ⎢ a 21 a 22 . . . a 2n ⎥ = ∑_{i=1}^{n} a ii .<br />
   ⎢  .    .   . . .  .   ⎥<br />
   ⎣ a n1 a n2 . . . a nn ⎦<br />

Exercise 54. For the matrices given in Exercise 53 confirm that tr(A) =<br />

tr(B −1 AB).<br />

It is easy to see that the trace is a linear function on the space of all n × n<br />

matrices, that is, that for all n × n matrices A and B and for all α ∈ R<br />

(1) tr(A + B) = tr(A) + tr(B),<br />

and<br />

(2) tr(αA) = αtr(A).<br />

We can also see that if A and B are both n×n matrices then tr(AB) = tr(BA).<br />

In fact, if A is an m × n matrix and B is an n × m matrix this is still true. This<br />

will often be extremely useful in calculating the trace of a product.<br />
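A quick numerical sketch of this fact, with arbitrarily chosen rectangular matrices:<br />

```python
# Sketch: tr(AB) = tr(BA) even when A is m x n and B is n x m.

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 3],
     [4, 5, 6]]    # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]       # 3 x 2

print(trace(mat_mul(A, B)), trace(mat_mul(B, A)))  # 28 28
```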

Exercise 55. From the definition of matrix multiplication show that if A is an<br />
m × n matrix and B is an n × m matrix then tr(AB) = tr(BA). [Hint: Look at the<br />
definition of matrix multiplication in Section 3. Then write the trace of the<br />
product matrix using summation notation. Finally change the order of summation.]<br />



The determinant, unlike the trace, is not a linear function of the matrix. It does<br />

however have some linear structure. If we fix all columns of the matrix except one<br />

and look at the determinant as a function of only this column then the determinant<br />

is linear in this single column. Moreover this is true whatever the column we choose.<br />

Let us write the determinant of the n × n matrix A as det(A). Let us also write<br />

the matrix A as [a 1 , a 2 , . . . , a n ] where a i is the ith column of the matrix A. Thus<br />

our claim is that for all n × n matrices A, for all i = 1, 2, . . . n, for all n vectors b,<br />

and for all α ∈ R<br />

(3)<br />

det([a 1 , . . . , a i−1 , a i + b, a i+1 , . . . , a n ]) = det([a 1 , . . . , a i−1 , a i , a i+1 , . . . , a n ])<br />

+ det([a 1 , . . . , a i−1 , b, a i+1 , . . . , a n ])<br />

and<br />

(4) det([a 1 , . . . , a i−1 , αa i , a i+1 , . . . , a n ]) = α det([a 1 , . . . , a i−1 , a i , a i+1 , . . . , a n ]).<br />

We express this by saying that the determinant is a multilinear function.<br />

Also the determinant is such that any n × n matrix that is not of full rank,<br />

that is, of rank n, has a zero determinant. In fact, given that the determinant<br />
is a multilinear function, if we simply say that any matrix in which one column is<br />
the same as one of its neighbours has a zero determinant, this implies the stronger<br />
statement that we made. We already see one use of calculating determinants. A<br />

matrix is nonsingular if and only if its determinant is nonzero.<br />

The two properties of being multilinear and zero whenever two neighbouring<br />

columns are the same already almost uniquely identify the determinant. Notice<br />

however that if the determinant satisfies these two properties then so does any<br />

constant times the determinant. To uniquely define the determinant we “tie down”<br />

this constant by assuming that det(I) = 1.<br />

Though we haven’t proved that it is so, these three properties uniquely define<br />

the determinant. That is, there is one and only one function with these three<br />

properties. We call this function the determinant. In Section 9 we shall discuss a<br />

number of other useful properties of the determinant. Remember that these additional<br />
properties are not really additional facts about the determinant. They can<br />

all be derived from the three properties we have given here.<br />

Let us now look to the geometric interpretation of the determinant. Let us<br />

first think about what linear transformations can do to the space R n . Since we<br />

have already said that a linear transformation that is not onto is represented by a<br />

matrix with a zero determinant let us think about linear transformations that are<br />

onto, that is, that do not map R n into a linear space of lower dimension. Such<br />

transformations can rotate the space around zero. They can “stretch” the space in<br />

different directions. And they can “flip” the space over. In the latter case all objects<br />

will become “mirror images” of themselves. We call linear transformations that<br />

make such a mirror image orientation reversing and those that don’t orientation<br />

preserving. A matrix that represents an orientation preserving linear function has a<br />

positive determinant while a matrix that represents an orientation reversing linear<br />

function has a negative determinant. Thus we have a geometric interpretation of<br />

the sign of the determinant.<br />

The absolute size of the determinant represents how much bigger or smaller the<br />

linear function makes objects. More precisely it gives the “volume” of the image<br />

of the unit hypercube under the transformation. The word volume is in quotes<br />

because it is the volume with which we are familiar only when n = 3. If n = 2 then<br />

it is area, while if n > 3 then it is the full dimensional analog in R n of volume in<br />

R 3 .
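A tiny sketch of this geometric reading, using two hypothetical 2 × 2 matrices:<br />

```python
# Sketch: the sign of the determinant records orientation; its absolute
# value is the area scale factor applied to the unit square.

def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

stretch = [[2, 0],
           [1, 1]]   # orientation preserving, doubles area
flip = [[0, 1],
        [1, 0]]      # a reflection in the 45 degree line

print(det2(stretch), det2(flip))  # 2 -1
```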



Exercise 56. Consider the matrix<br />

⎡ 3 1 ⎤<br />
⎣ 1 2 ⎦ .<br />

In a diagram show the image under the linear function that this matrix represents<br />
of the unit square, that is, the square whose corners are the points (0,0), (1,0),<br />
(0,1), and (1,1). Calculate the area of that image. Do the same for the matrix<br />

⎡  4 1 ⎤<br />
⎣ −1 1 ⎦ .<br />

In the light of Exercise 53, comment on the answers you calculated.<br />

9. Calculating and Using Determinants<br />

We have already used the concepts of the inverse of a matrix and the determinant<br />

of a matrix. The purpose of this section is to cover some of the “cookbook”<br />

aspects of calculating inverses and determinants.<br />

Suppose that we have an n × n matrix<br />

    ⎡ a 11 . . . a 1n ⎤<br />
A = ⎢  .   . . .  .   ⎥ ;<br />
    ⎣ a n1 . . . a nn ⎦<br />

then we shall use |A| or<br />

∣ a 11 . . . a 1n ∣<br />
∣  .   . . .  .   ∣<br />
∣ a n1 . . . a nn ∣<br />

as an alternative notation for det(A). Always remember that<br />

∣ a 11 . . . a 1n ∣<br />
∣  .   . . .  .   ∣<br />
∣ a n1 . . . a nn ∣<br />

is not a matrix but rather a real number. For the case n = 2 we define<br />

det(A) = ∣ a 11 a 12 ∣<br />
         ∣ a 21 a 22 ∣<br />

as a 11 a 22 − a 21 a 12 . It is possible to also give a convenient formula for the determinant<br />
of a 3 × 3 matrix. However, rather than doing this, we shall immediately<br />
consider the case of an n × n matrix.<br />

By the minor of an element of the matrix A we mean the determinant (remember,<br />
a real number) of the matrix obtained from the matrix A by deleting the<br />
row and column containing the element in question. We denote the minor of the<br />
element a ij by the symbol |M ij |. Thus, for example,<br />

          ∣ a 22 . . . a 2n ∣<br />
|M 11 | = ∣  .   . . .  .   ∣ .<br />
          ∣ a n2 . . . a nn ∣<br />

Exercise 57. Write out the minors of a general 3 × 3 matrix.<br />

We now define the cofactor of an element to be either plus or minus the minor<br />

of the element, being plus if the sum of indices of the element is even and minus<br />

if it is odd. We denote the cofactor of the element a ij by the symbol |C ij |. Thus<br />

|C ij | = |M ij | if i + j is even and |C ij | = −|M ij | if i + j is odd. Or,<br />

|C ij | = (−1) i+j |M ij |.



We now define the determinant of an n × n matrix A,<br />

det(A) = |A| = ∣ a 11 . . . a 1n ∣<br />
               ∣  .   . . .  .   ∣<br />
               ∣ a n1 . . . a nn ∣ ,<br />

to be ∑_{j=1}^{n} a 1j |C 1j |. This is the sum of n terms, each one of which is the product<br />
of an element of the first row of the matrix and the cofactor of that element.<br />
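This definition translates directly into a short recursive sketch (the matrix at the end is an arbitrary example, not one of the exercises):<br />

```python
# Sketch: expand along the first row; the cofactor sign for the (0-based)
# jth entry of that row is (-1)**j, and a 1 x 1 matrix is the base case.

def minor(A, i, j):
    # delete row i and column j
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))

print(det([[2, 0, 1],
           [1, 3, 0],
           [0, 1, 1]]))   # 7
```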

Exercise 58. Define the determinant of the 1 × 1 matrix [a] to be a. (What<br />

else could we define it to be?) Show that the definition given above corresponds<br />

with the definition we gave earlier for 2 × 2 matrices.<br />

Exercise 59. Calculate the determinants of the following 3 × 3 matrices.<br />

    ⎡ 1 2 3 ⎤      ⎡ 1 5 2 ⎤      ⎡ 1 1 0 ⎤<br />
(a) ⎢ 3 6 9 ⎥  (b) ⎢ 1 4 3 ⎥  (c) ⎢ 5 4 1 ⎥<br />
    ⎣ 4 5 7 ⎦      ⎣ 0 1 2 ⎦      ⎣ 2 3 2 ⎦<br />

    ⎡ 1 0 0 ⎤      ⎡ 2 5 2 ⎤<br />
(d) ⎢ 0 1 0 ⎥  (e) ⎢ 1 5 3 ⎥<br />
    ⎣ 0 0 1 ⎦      ⎣ 0 1 3 ⎦<br />

Exercise 60. Show that the determinant of the identity matrix, det(I n ) is 1<br />

for all values of n. [Hint: Show that it is true for I 2 . Then show that if it is true<br />

for I n−1 then it is true for I n .]<br />

One might ask what was special about the first row: we took elements of<br />
that row, multiplied them by their cofactors, and added them up. Why not the<br />
second row, or the first column? It will follow from a number of properties of<br />

determinants we list below that in fact we could have used any row or column and<br />

we would have arrived at the same answer.<br />

Exercise 61. Expand the matrix given in Exercise 59(b) in terms of the 2nd<br />

and 3rd rows and in terms of each column and check that the resulting answer<br />

agrees with the answer you obtained originally.<br />

We now have a way of calculating the determinant of any matrix. To find<br />

the determinant of an n × n matrix we have to calculate n determinants of size<br />

(n−1)×(n−1). This is clearly a fairly computationally costly procedure. However<br />

there are often ways to economise on the computation.<br />

Exercise 62. Evaluate the determinants of the following matrices.<br />

    ⎡ 1  8 0  7 ⎤      ⎡ 4  7 0 4 ⎤<br />
(a) ⎢ 2  3 4  6 ⎥  (b) ⎢ 5  6 1 8 ⎥<br />
    ⎢ 1  6 0 −1 ⎥      ⎢ 0  0 9 0 ⎥<br />
    ⎣ 0 −5 0  8 ⎦      ⎣ 1 −3 1 4 ⎦<br />

[Hint: Think carefully about which column or row to use in the expansion.]<br />

We shall now list a number of properties of determinants. These properties<br />

imply that, as we stated above, it does not matter which row or column we use to<br />

expand the determinant. Further these properties will give us a series of transformations<br />

we may perform on a matrix without altering its determinant. This will<br />

allow us to calculate a determinant by first transforming the matrix to one whose<br />

determinant is easier to calculate and then calculating the determinant of the easier<br />

matrix.<br />




Property 1. The determinant of a matrix equals the determinant of its transpose.<br />

|A| = |A ′ |<br />

Property 2. Interchanging two rows (or two columns) of a matrix changes<br />

its sign but not its absolute value. For example,<br />

∣ c d ∣                             ∣ a b ∣<br />
∣ a b ∣ = cb − ad = −(ad − cb) = − ∣ c d ∣ .<br />

Property 3. Multiplying one row (or column) of a matrix by a constant λ<br />

will change the value of the determinant λ-fold. For example,<br />

∣ λa 11 . . . λa 1n ∣       ∣ a 11 . . . a 1n ∣<br />
∣   .   . . .   .   ∣ = λ ∣  .   . . .  .   ∣ .<br />
∣  a n1 . . . a nn  ∣       ∣ a n1 . . . a nn ∣<br />

Exercise 63. Check Property 3 for the cases n = 2 and n = 3.<br />

Corollary 1. |λA| = λ n |A| (where A is an n × n matrix).<br />

Corollary 2. | − A| = |A| if n is even. | − A| = −|A| if n is odd.<br />

Property 4. Adding a multiple of any row (column) to any other row (column)<br />

does not alter the value of the determinant.<br />

Exercise 64. Check that<br />

∣ 1 5 2 ∣   ∣ 1 5 + 3 × 2 2 ∣<br />
∣ 1 4 3 ∣ = ∣ 1 4 + 3 × 3 3 ∣<br />
∣ 0 1 2 ∣   ∣ 0 1 + 3 × 2 2 ∣<br />

  ∣ 1 + (−2) × 1   5 + (−2) × 4   2 + (−2) × 3 ∣<br />
= ∣      1              4              3       ∣ .<br />
  ∣      0              1              2       ∣<br />

Property 5. If one row (or column) is a constant times another row (or<br />
column) then the determinant of the matrix is zero.<br />

Exercise 65. Show that Property 5 follows from Properties 3 and 4.<br />

We can strengthen Property 5 to obtain the following.<br />

Property 5 ′ . The determinant of a matrix is zero if and only if the matrix is<br />

not of full rank.<br />

Exercise 66. Explain why Property 5 ′ is a strengthening of Property 5, that<br />

is, why 5 ′ implies 5.<br />

These properties allow us to calculate determinants more easily. Given an n×n<br />

matrix A the basic strategy one follows is to use the above properties, particularly<br />

Property 4 to find a matrix with the same determinant as A in which one row (or<br />

column) has only one non-zero element. Then, rather than calculating n determinants<br />

of size (n − 1) × (n − 1) one only needs to calculate one. One then does the<br />

same thing for the (n − 1) × (n − 1) determinant that needs to be calculated, and<br />

so on.<br />
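This strategy can be sketched as a small routine; the elimination below is an illustration in floating point, not a procedure prescribed by the text.<br />

```python
# Sketch: Property 4 lets us zero out entries below each pivot without
# changing the determinant; Property 2 flips the sign on a row swap; the
# determinant of the resulting triangular matrix is the product of its
# diagonal entries.

def det_by_elimination(A):
    A = [row[:] for row in A]   # work on a copy
    n, sign, det = len(A), 1, 1.0
    for col in range(n):
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return 0.0          # not of full rank (Property 5')
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
        det *= A[col][col]
    return sign * det

print(det_by_elimination([[2, 1], [4, 5]]))  # 6.0
```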

There are a number of reasons we are interested in determinants. One is that<br />

they give us one method of calculating the inverse of a nonsingular matrix. (Recall<br />

that there is no inverse of a singular matrix.) They also give us a method, known<br />

as Cramer’s Rule, for solving systems of linear equations. Before proceeding with<br />

this, it is useful to state one further property of determinants.<br />




Property 6. If one expands a matrix in terms of one row (or column) and<br />
the cofactors of a different row (or column) then the answer is always zero. That is,<br />

∑_{j=1}^{n} a ij |C kj | = 0<br />

whenever i ≠ k. Also<br />

∑_{i=1}^{n} a ij |C ik | = 0<br />

whenever j ≠ k.<br />

Exercise 67. Verify Property 6 for the matrix<br />

⎡ 4 1 2 ⎤<br />
⎢ 5 2 1 ⎥ .<br />
⎣ 1 0 3 ⎦<br />

Let us define the matrix of cofactors C to be the matrix [|C ij |] whose ijth<br />

element is the cofactor of the ijth element of A. Now we define the adjoint matrix<br />

of A to be the transpose of the matrix of cofactors of A. That is<br />

adj(A) = C ′ .<br />

It is straightforward to see (using Property 6) that A adj(A) = |A|I n = adj(A)A.<br />

That is, A −1 = (1/|A|) adj(A). Notice that this is well defined if and only if |A| ≠ 0.<br />

We now have a method of finding the inverse of any nonsingular square matrix.<br />
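The adjoint construction translates directly into code. The following Python sketch (ours; the helper names minor, det, and inverse are invented for this illustration) builds the matrix of cofactors, transposes it to get the adjoint, and divides by the determinant:<br />

```python
def minor(a, i, j):
    # delete row i and column j of a
    return [row[:j] + row[j+1:] for k, row in enumerate(a) if k != i]

def det(a):
    # cofactor (Laplace) expansion along the first row
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j))
               for j in range(len(a)))

def inverse(a):
    """A^{-1} = (1/|A|) adj(A), where adj(A) is the transposed matrix
    of cofactors; defined only when |A| != 0."""
    n, d = len(a), det(a)
    if d == 0:
        raise ValueError("singular matrix: no inverse")
    cof = [[(-1) ** (i + j) * det(minor(a, i, j)) for j in range(n)]
           for i in range(n)]
    # transpose the cofactor matrix to get the adjoint, then divide by |A|
    return [[cof[j][i] / d for j in range(n)] for i in range(n)]
```

Multiplying A by inverse(A) should return the identity matrix, which is just the statement A adj(A) = |A| I n in code.<br />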

Exercise 68. Use this method to find the inverses of the following matrices<br />

(a) ⎡ 3 −1 2 ⎤   (b) ⎡ 4 −2 1 ⎤   (c) ⎡ 1 5 2 ⎤<br />
    ⎢ 1 0 3 ⎥       ⎢ 7 3 3 ⎥       ⎢ 1 4 3 ⎥<br />
    ⎣ 4 0 2 ⎦       ⎣ 2 0 1 ⎦       ⎣ 0 1 2 ⎦ .<br />

Knowing how to invert matrices we thus know how to solve a system of n linear<br />

equations in n unknowns. For we can express the n equations in matrix notation as<br />

Ax = b where A is an n × n matrix of coefficients, x is an n × 1 vector of unknowns,<br />

and b is an n × 1 vector of constants. Thus we can solve the system of equations<br />

as x = A −1 Ax = A −1 b.<br />

Sometimes, particularly if we are not interested in all of the x’s, it is convenient<br />

to use another method of solving the equations. This method is known as Cramer’s<br />

Rule. Let us suppose that we wish to solve the above system of equations, that is,<br />

Ax = b. Let us define the matrix A i to be the matrix obtained from A by replacing<br />

the ith column of A by the vector b. Then the solution is given by<br />

x i = |A i | / |A| .<br />
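Cramer’s Rule is equally short to state in code. This Python sketch (ours; the names det and cramer are invented) forms each A i by splicing b into the ith column:<br />

```python
def det(a):
    # cofactor expansion along the first row
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([r[:j] + r[j+1:] for r in a[1:]])
               for j in range(len(a)))

def cramer(a, b):
    """Solve Ax = b by x_i = |A_i| / |A|, where A_i replaces the ith
    column of A with the vector b; requires |A| != 0."""
    d = det(a)
    return [det([row[:i] + [b[k]] + row[i+1:] for k, row in enumerate(a)]) / d
            for i in range(len(a))]

print(cramer([[2, 1], [1, 3]], [3, 5]))  # [0.8, 1.4]
```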

Exercise 69. Derive Cramer’s Rule. [Hint: We know that the solution to the<br />

system of equations is solved by x = (1/|A|)adj(A)b. This gives a formula for x i .<br />

Show that this formula is the same as that given by x i = |A i |/|A|.]<br />

Exercise 70. Solve the following system of equations (i) by matrix inversion<br />

and (ii) by Cramer’s Rule:<br />

(a) 2x 1 − x 2 = 2<br />
    3x 2 + 2x 3 = 16<br />
    5x 1 + 3x 3 = 21<br />

(b) −x 1 + x 2 + x 3 = 1<br />
    x 1 − x 2 + x 3 = 1<br />
    x 1 + x 2 + x 3 = 1 .<br />



Exercise 71. Recall that we claimed that the determinant was an invariant.<br />

Confirm this by calculating (directly) det(A) and det(B −1 AB) where<br />

B = ⎡ 1 0 1 ⎤   and   A = ⎡ 1 0 0 ⎤<br />
    ⎢ 1 −1 2 ⎥           ⎢ 0 2 0 ⎥<br />
    ⎣ 2 1 −1 ⎦           ⎣ 0 0 3 ⎦ .<br />

Exercise 72. An nth order determinant of the form<br />

∣ a 11   0     0     . . .  0    ∣<br />
∣ a 21   a 22  0     . . .  0    ∣<br />
∣ a 31   a 32  a 33  . . .  0    ∣<br />
∣ .      .     .     . . .  .    ∣<br />
∣ a n1   a n2  a n3  . . .  a nn ∣<br />

is called triangular. Evaluate this determinant. [Hint: Expand the determinant in<br />

terms of its first row. Expand the resulting (n − 1) × (n − 1) determinant in terms<br />

of its first row, and so on.]<br />
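As a numerical companion to the hint (ours, and no substitute for doing the expansion), one can compare the cofactor-expansion determinant of a concrete lower triangular matrix with the product of its diagonal entries — the value the repeated first-row expansions produce:<br />

```python
import math

def det(a):
    # cofactor expansion along the first row
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det([r[:j] + r[j+1:] for r in a[1:]])
               for j in range(len(a)))

# a lower triangular matrix: zeros above the diagonal
t = [[2, 0, 0, 0],
     [5, 3, 0, 0],
     [1, 7, 4, 0],
     [6, 2, 9, 5]]

diag_product = math.prod(t[i][i] for i in range(4))  # 2 * 3 * 4 * 5 = 120
assert det(t) == diag_product
```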

10. Eigenvalues and Eigenvectors<br />

Suppose that we have a linear function f : R n → R n . When we look at<br />

how f deforms R n one natural question is: Where does f send some<br />

linear subspace? In particular we might ask if there are any linear subspaces that<br />

f maps to themselves. We call such linear subspaces invariant linear subspaces.<br />

Of course the space R n itself and the zero dimensional space {0} are invariant<br />

linear subspaces. The real question is whether there are any others. Clearly, for<br />

some linear transformations there are no other invariant subspaces. For example,<br />

a clockwise rotation of π/4 in R 2 has no invariant subspaces other than R 2 itself<br />

and {0}.<br />

A particularly important class of invariant linear subspaces are the one dimensional<br />

ones. A one dimensional linear subspace is specified by one nonzero vector,<br />

say ¯x. Then the subspace is {λ¯x | λ ∈ R}. Let us call this subspace L(¯x). If L(¯x)<br />

is an invariant linear subspace of f and if x ∈ L(¯x) then there is some value λ such<br />

that f(x) = λx. Moreover the value of λ for which this is true will be the same<br />

whatever value of x we choose in L(¯x).<br />

Now if we fix the set of basis vectors and thus the matrix A that represents f<br />

we have that if x is in a one dimensional invariant linear subspace of f then there<br />

is some λ ∈ R such that<br />

Ax = λx.<br />

Again we can define this notion without reference to linear functions. Given a<br />

matrix A if we can find a pair x, λ with x ≠ 0 that satisfy the above equation we<br />

call x an eigenvector of the matrix A and λ the associated eigenvalue. (Sometimes<br />

these are called characteristic vectors and values.)<br />
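Concretely, for a 2 × 2 matrix an eigenvalue must make A − λI 2 singular, which (as the next paragraphs explain) gives the quadratic λ 2 − tr(A)λ + det(A) = 0. A Python sketch (ours; eigen_2x2 is an invented name):<br />

```python
import math

def eigen_2x2(a):
    """Eigenvalues of a 2x2 matrix: lambda solves
    lambda^2 - tr(A)*lambda + det(A) = 0."""
    tr = a[0][0] + a[1][1]
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = tr * tr - 4 * d
    if disc < 0:
        raise ValueError("complex eigenvalues")
    root = math.sqrt(disc)
    return (tr + root) / 2, (tr - root) / 2

a = [[2, 1], [1, 2]]            # symmetric, so the eigenvalues are real
lam1, lam2 = eigen_2x2(a)       # 3.0 and 1.0
# x = (a12, lambda - a11) solves (A - lambda*I)x = 0 whenever a12 != 0
v1 = (a[0][1], lam1 - a[0][0])  # (1, 1.0) for lambda = 3
# check A v = lambda v for this eigenpair
Av = (a[0][0] * v1[0] + a[0][1] * v1[1],
      a[1][0] * v1[0] + a[1][1] * v1[1])
assert Av == (lam1 * v1[0], lam1 * v1[1])
```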

Exercise 73. Show that the eigenvalues of a matrix are an invariant, that<br />

is, that they depend only on the linear function the matrix represents and not on<br />

the choice of basis vectors. Show also that the eigenvectors of a matrix are not<br />

an invariant. Explain why the dependence of the eigenvectors on the particular<br />

basis is exactly what we would expect and argue that in some sense they are indeed<br />

invariant.<br />

Now we can rewrite the equation Ax = λx as<br />

(A − λI n )x = 0.



If x, λ solve this equation and x ≠ 0 then we have a nonzero linear combination of<br />

the columns of A − λI n equal to zero. This means that the columns of A − λI n are<br />

not linearly independent and so det(A − λI n ) = 0, that is,<br />

det ⎡ a 11 − λ   a 12       . . .  a 1n      ⎤<br />
    ⎢ a 21       a 22 − λ   . . .  a 2n      ⎥<br />
    ⎢ .          .          . . .  .         ⎥<br />
    ⎣ a n1       a n2       . . .  a nn − λ  ⎦ = 0.<br />

Now, the left hand side of this last equation is a polynomial of degree n in<br />

λ, that is, a polynomial in λ in which n is the highest power of λ that appears<br />

with nonzero coefficient. It is called the characteristic polynomial and the equation<br />

is called the characteristic equation. Now this equation may, or may not, have a<br />

solution in real numbers. In general, by the fundamental theorem of algebra the<br />

equation has n solutions, perhaps not all distinct, in the complex numbers. If the<br />

matrix A happens to be symmetric (that is, if a ij = a ji for all i and j) then all of<br />

its eigenvalues are real. If the eigenvalues are all distinct (that is, different from<br />

each other) then we are in a particularly well behaved situation. As a prelude we<br />

state the following result.<br />

Theorem 5. Given an n×n matrix A suppose that we have m eigenvectors of A<br />

x 1 , x 2 , . . . , x m with corresponding eigenvalues λ 1 , λ 2 , . . . , λ m . If λ i ≠ λ j whenever<br />

i ≠ j then x 1 , x 2 , . . . , x m are linearly independent.<br />

An implication of this theorem is that an n × n matrix cannot have more than<br />

n eigenvectors with distinct eigenvalues. Further this theorem allows us to see that<br />

if an n × n matrix has n distinct eigenvalues then it is possible to find a basis<br />

for R n in which the linear function that the matrix represents is represented by<br />

a diagonal matrix. Equivalently we can find a matrix B such that B −1 AB is a<br />

diagonal matrix.<br />

To see this let b 1 , b 2 , . . . , b n be n linearly independent eigenvectors with associated<br />

eigenvalues λ 1 , λ 2 , . . . , λ n . Let B be the matrix whose columns are the vectors<br />

b 1 , b 2 , . . . , b n . Since these vectors are linearly independent the matrix B has an<br />

inverse. Now<br />

B −1 AB = B −1 [Ab 1 Ab 2 . . . Ab n ]<br />

= B −1 [λ 1 b 1 λ 2 b 2 . . . λ n b n ]<br />

= [λ 1 B −1 b 1 λ 2 B −1 b 2 . . . λ n B −1 b n ]<br />

= ⎡ λ 1  0    . . .  0    ⎤<br />
  ⎢ 0    λ 2  . . .  0    ⎥<br />
  ⎢ .    .    . . .  .    ⎥<br />
  ⎣ 0    0    . . .  λ n  ⎦ .<br />
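This diagonalisation can be checked numerically. In the following Python sketch (ours) we take the symmetric matrix A = [[2, 1], [1, 2]], whose eigenvalues 3 and 1 have eigenvectors (1, 1) and (1, −1), put the eigenvectors into the columns of B, and compute B −1 AB:<br />

```python
A = [[2, 1], [1, 2]]
B = [[1, 1], [1, -1]]   # columns are eigenvectors of A

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv_2x2(M):
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

# B^{-1} A B should be diagonal, with the eigenvalues on the diagonal
D = matmul(inv_2x2(B), matmul(A, B))
print(D)  # [[3.0, 0.0], [0.0, 1.0]]
```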


CHAPTER 3<br />

Consumer Behaviour: Optimisation Subject to the<br />

Budget Constraint<br />

1. Constrained Maximisation<br />

1.1. Lagrange Multipliers. Consider the problem of a consumer who seeks<br />

to distribute his income across the purchase of the two goods that he consumes,<br />

subject to the constraint that he spends no more than his total income. Let us<br />

denote the amount of the first good that he buys x 1 and the amount of the second<br />

good x 2 , the prices of the two goods p 1 and p 2 , and the consumer’s income y.<br />

The utility that the consumer obtains from consuming x 1 units of good 1 and x 2<br />

of good 2 is denoted u(x 1 , x 2 ). Thus the consumer’s problem is to maximise<br />

u(x 1 , x 2 ) subject to the constraint that p 1 x 1 + p 2 x 2 ≤ y. (We shall soon write<br />

p 1 x 1 + p 2 x 2 = y, i.e., we shall assume that the consumer must spend all of his<br />

income.) Before discussing the solution of this problem lets write it in a more<br />

“mathematical” way.<br />

(5)  max x 1 ,x 2  u(x 1 , x 2 )<br />

     subject to p 1 x 1 + p 2 x 2 = y<br />

We read this “Choose x 1 and x 2 to maximise u(x 1 , x 2 ) subject to the constraint<br />

that p 1 x 1 + p 2 x 2 = y.”<br />

Let us assume, as usual, that the indifference curves (i.e., the sets of points<br />

(x 1 , x 2 ) for which u(x 1 , x 2 ) is a constant) are convex to the origin. Let us also<br />

assume that the indifference curves are nice and smooth. Then the point (x ∗ 1, x ∗ 2)<br />

that solves the maximisation problem (5) is the point at which the indifference<br />

curve is tangent to the budget line as given in Figure 1.<br />

One thing we can say about the solution is that at the point (x ∗ 1, x ∗ 2) it must be<br />

true that the marginal utility with respect to good 1 divided by the price of good 1<br />

must equal the marginal utility with respect to good 2 divided by the price of good<br />

2. For if this were not true then the consumer could, by decreasing the consumption<br />

of the good for which this ratio was lower and increasing the consumption of the<br />

other good, increase his utility. Marginal utilities are, of course, just the partial<br />

derivatives of the utility function. Thus we have<br />

(6)  (∂u/∂x 1 )(x ∗ 1 , x ∗ 2 ) / p 1 = (∂u/∂x 2 )(x ∗ 1 , x ∗ 2 ) / p 2 .<br />

The argument we have just made seems very “economic.” It is easy to give an<br />

alternate argument that does not explicitly refer to the economic intuition. Let x u 2<br />

be the function that defines the indifference curve through the point (x ∗ 1, x ∗ 2), i.e.,<br />

u(x 1 , x u 2(x 1 )) ≡ ū ≡ u(x ∗ 1, x ∗ 2).<br />

Now, totally differentiating this identity gives<br />

(∂u/∂x 1 )(x 1 , x u 2 (x 1 )) + (∂u/∂x 2 )(x 1 , x u 2 (x 1 )) · (dx u 2 /dx 1 )(x 1 ) = 0.<br />



Figure 1. The budget line p 1 x 1 + p 2 x 2 = y and the indifference curve u(x 1 , x 2 ) = ū, tangent at the point (x ∗ 1 , x ∗ 2 ).<br />

That is,<br />

(dx u 2 /dx 1 )(x 1 ) = − (∂u/∂x 1 )(x 1 , x u 2 (x 1 )) / (∂u/∂x 2 )(x 1 , x u 2 (x 1 )) .<br />

Now x u 2 (x ∗ 1 ) = x ∗ 2 . Thus the slope of the indifference curve at the point (x ∗ 1 , x ∗ 2 ) is<br />

(dx u 2 /dx 1 )(x ∗ 1 ) = − (∂u/∂x 1 )(x ∗ 1 , x ∗ 2 ) / (∂u/∂x 2 )(x ∗ 1 , x ∗ 2 ) .<br />

Also, the slope of the budget line is −p 1 /p 2 . Combining these two results again gives<br />

result (6).<br />

Since we also have another equation that (x ∗ 1, x ∗ 2) must satisfy, viz<br />

(7) p 1 x ∗ 1 + p 2 x ∗ 2 = y<br />

we have two equations in two unknowns and we can (if we know what the utility<br />

function is and what p 1 , p 2 , and y are) go happily away and solve the problem.<br />

(This isn’t quite true but we shall not go into that at this point.) What we shall<br />

develop is a systematic and useful way to obtain the conditions (6) and (7). Let us<br />

first denote the common value of the ratios in (6) by λ. That is,<br />

(∂u/∂x 1 )(x ∗ 1 , x ∗ 2 ) / p 1 = λ = (∂u/∂x 2 )(x ∗ 1 , x ∗ 2 ) / p 2 ,<br />

and we can rewrite this and (7) as<br />

(8)  (∂u/∂x 1 )(x ∗ 1 , x ∗ 2 ) − λp 1 = 0<br />
     (∂u/∂x 2 )(x ∗ 1 , x ∗ 2 ) − λp 2 = 0<br />
     y − p 1 x ∗ 1 − p 2 x ∗ 2 = 0.<br />



Now we have three equations in x ∗ 1, x ∗ 2, and the new artificial or auxiliary variable<br />

λ. Again we can, perhaps, solve these equations for x ∗ 1, x ∗ 2, and λ. Consider the<br />

following function<br />

(9) L(x 1 , x 2 , λ) = u(x 1 , x 2 ) + λ(y − p 1 x 1 − p 2 x 2 )<br />

This function is known as the Lagrangian. Now, if we calculate ∂L/∂x 1 , ∂L/∂x 2 , and ∂L/∂λ,<br />

and set the results equal to zero we obtain exactly the equations given in (8). We<br />

now describe this technique in a somewhat more general way.<br />

Suppose that we have the following maximisation problem<br />

(10)  max x 1 ,...,x n  f(x 1 , . . . , x n )<br />

      subject to g(x 1 , . . . , x n ) = c<br />

and we let<br />

(11) L(x 1 , . . . , x n , λ) = f(x 1 , . . . , x n ) + λ(c − g(x 1 , . . . , x n ))<br />

then if (x ∗ 1 , . . . , x ∗ n ) solves (10) there is a value of λ, say λ ∗ , such that<br />

(12)  ∂L/∂x i (x ∗ 1 , . . . , x ∗ n , λ ∗ ) = 0,   i = 1, . . . , n<br />

(13)  ∂L/∂λ (x ∗ 1 , . . . , x ∗ n , λ ∗ ) = 0.<br />

Notice that the conditions (12) are precisely the first order conditions for<br />

choosing x 1 , . . . , x n to maximise L, once λ ∗ has been chosen. This provides an<br />

intuition into this method of solving the constrained maximisation problem. In<br />

the constrained problem we have told the decision maker that he must satisfy<br />

g(x 1 , . . . , x n ) = c and that he should choose among all points that satisfy this constraint<br />

the point at which f(x 1 , . . . , x n ) is greatest. We arrive at the same answer<br />

if we tell the decision maker to choose any point he wishes but that for each unit by<br />

which he violates the constraint g(x 1 , . . . , x n ) = c we shall take away λ units from<br />

his payoff. Of course we must be careful to choose λ to be the correct value. If we<br />

choose λ too small the decision maker may choose to violate his constraint—e.g.,<br />

if we made the penalty for spending more than the consumer’s income very small<br />

the consumer would choose to consume more goods than he could afford and to<br />

pay the penalty in utility terms. On the other hand if we choose λ too large the<br />

decision maker may violate his constraint in the other direction, e.g., the consumer<br />

would choose not to spend any of his income and just receive λ units of utility for<br />

each unit of his income.<br />
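To make the conditions (8) concrete, here is a small numerical check in Python. The utility function u(x 1 , x 2 ) = x 1 x 2 is our own illustrative choice (it is not a utility used in the text), and its demands follow from solving the system (8) by hand:<br />

```python
# Illustrative two-good example; u(x1, x2) = x1 * x2 is an assumed
# utility, not one from the text. Then
#   du/dx1 = x2,  du/dx2 = x1,
# and solving the three equations in (8) by hand gives
#   x1 = y / (2 p1),  x2 = y / (2 p2),  lambda = y / (2 p1 p2).
p1, p2, y = 2.0, 1.0, 8.0
x1, x2 = y / (2 * p1), y / (2 * p2)
lam = y / (2 * p1 * p2)

# all three equations in (8) hold at the solution
assert abs(x2 - lam * p1) < 1e-12           # du/dx1 - lambda p1 = 0
assert abs(x1 - lam * p2) < 1e-12           # du/dx2 - lambda p2 = 0
assert abs(y - p1 * x1 - p2 * x2) < 1e-12   # budget constraint
print(x1, x2, lam)  # 2.0 4.0 2.0
```

Note that λ here is exactly the “price” of violating the budget constraint discussed above: at the solution each good yields λ units of utility per dollar spent.<br />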

It is possible to give a more general statement of this technique, allowing for<br />

multiple constraints. (Of course, we should always have fewer constraints than we<br />

have variables.) Suppose we have more than one constraint. Consider the problem<br />

max x 1 ,...,x n  f(x 1 , . . . , x n )<br />
subject to g 1 (x 1 , . . . , x n ) = c 1<br />
           . . .<br />
           g m (x 1 , . . . , x n ) = c m .<br />

Again we construct the Lagrangian<br />

(14)  L(x 1 , . . . , x n , λ 1 , . . . , λ m ) = f(x 1 , . . . , x n )<br />
      + λ 1 (c 1 − g 1 (x 1 , . . . , x n )) + · · · + λ m (c m − g m (x 1 , . . . , x n ))<br />



and again if (x ∗ 1 , . . . , x ∗ n ) solves this problem there are values of λ, say λ ∗ 1 , . . . , λ ∗ m , such that<br />

(15)  ∂L/∂x i (x ∗ 1 , . . . , x ∗ n , λ ∗ 1 , . . . , λ ∗ m ) = 0,   i = 1, . . . , n<br />

      ∂L/∂λ j (x ∗ 1 , . . . , x ∗ n , λ ∗ 1 , . . . , λ ∗ m ) = 0,   j = 1, . . . , m.<br />

1.2. Caveats and Extensions. Notice that we have been referring to the set<br />

of conditions which a solution to the maximisation problem must satisfy. (We call<br />

such conditions necessary conditions.) So far we have not even claimed that there<br />

necessarily is a solution to the maximisation problem. There are many examples of<br />

maximisation problems which have no solution. One example of an unconstrained<br />

problem with no solution is<br />

(16)  max x  2x,<br />

that is, maximise over the choice of x the function 2x. Clearly the greater we make x the<br />

greater is 2x, and so, since there is no upper bound on x there is no maximum.<br />

Thus we might want to restrict maximisation problems to those in which we choose<br />

x from some bounded set. Again, this is not enough. Consider the problem<br />

(17)  max 0≤x≤1  1/x .<br />

The smaller we make x the greater is 1/x and yet at zero 1/x is not even defined.<br />

We could define the function to take on some value at zero, say 7. But then the<br />

function would not be continuous. Or we could leave zero out of the feasible set<br />

for x, say 0 < x ≤ 1. Then the set of feasible x is not closed. Since there would<br />

obviously still be no solution to the maximisation problem in these cases we shall<br />

want to restrict maximisation problems to those in which we choose x to maximise<br />

some continuous function from some closed (and because of the previous example)<br />

bounded set. (We call a set of numbers, or more generally a set of vectors, that<br />

is both closed and bounded a compact set.) Is there anything else that could go<br />

wrong? No! The following result says that if the function to be maximised is<br />

continuous and the set over which we are choosing is both closed and bounded, i.e.,<br />

is compact, then there is a solution to the maximisation problem.<br />

Theorem 6 (The Weierstrass Theorem). Let S be a compact set. Let f be a<br />

continuous function that takes each point in S to a real number. (We usually write:<br />

let f : S → R be continuous.) Then there is some x ∗ in S at which the function is<br />

maximised. More precisely, there is some x ∗ in S such that f(x ∗ ) ≥ f(x) for any<br />

x in S.<br />
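The theorem can be illustrated numerically: a continuous function on a compact interval attains its maximum, and a fine grid gets arbitrarily close to the maximiser. A Python sketch (ours) for f(x) = x(1 − x) on [0, 1]:<br />

```python
# f(x) = x(1 - x) is continuous on the compact set [0, 1], so the
# Weierstrass Theorem guarantees a maximiser; a grid search locates it.
n = 10_000
best = max(range(n + 1), key=lambda i: (i / n) * (1 - i / n))
x_star = best / n
print(x_star)  # 0.5
```

Compactness matters here: the same grid idea has nothing to find for max 2x over all of R (problem (16)), or for 1/x on 0 < x ≤ 1 (problem (17)), where no maximiser exists.<br />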

Notice that in defining such compact sets we typically use inequalities, such<br />

as x ≥ 0. However in Section 1 we did not consider such constraints, but rather<br />

considered only equality constraints. However, even in the example of utility maximisation<br />

at the beginning of Section 1, there were implicit constraints on x 1<br />

and x 2 of the form<br />

x 1 ≥ 0, x 2 ≥ 0.<br />

A truly satisfactory treatment would make such constraints explicit. It is possible<br />

to explicitly treat the maximisation problem with inequality constraints, at the<br />

price of a little additional complexity. We shall return to this question later in the<br />

book.<br />

Also, notice that had we wished to solve a minimisation problem we could<br />

have transformed the problem into a maximisation problem by simply multiplying<br />

the objective function by −1. That is, if we wish to minimise f(x) we could do<br />

so by maximising −f(x). As an exercise write out the conditions analogous to



the conditions (8) for the case that we wanted to minimise u(x). Notice that if<br />

x ∗ 1, x ∗ 2, and λ satisfy the original equations then x ∗ 1, x ∗ 2, and −λ satisfy the new<br />

equations. Thus we cannot tell whether there is a maximum at (x ∗ 1, x ∗ 2) or a<br />

minimum. This corresponds to the fact that in the case of a function of a single<br />

variable over an unconstrained domain at a maximum we require the first derivative<br />

to be zero, but that to know for sure that we have a maximum we must look at the<br />

second derivative. We shall not develop the analogous conditions for the constrained<br />

problem with many variables here. However, again, we shall return to it later in<br />

the book.<br />

2. The Implicit Function Theorem<br />

In the previous section we said things like: “Now we have three equations<br />

in x ∗ 1, x ∗ 2, and the new artificial or auxiliary variable λ. Again we can, perhaps,<br />

solve these equations for x ∗ 1, x ∗ 2, and λ.” In this section we examine the question<br />

of when we can solve a system of n equations to give n of the variables in terms<br />

of the others. Let us suppose that we have n endogenous variables x 1 , . . . , x n ,<br />

m exogenous variables or parameters, b 1 , . . . , b m , and n equations or equilibrium<br />

conditions<br />

(18)  f 1 (x 1 , . . . , x n , b 1 , . . . , b m ) = 0<br />
      f 2 (x 1 , . . . , x n , b 1 , . . . , b m ) = 0<br />
      . . .<br />
      f n (x 1 , . . . , x n , b 1 , . . . , b m ) = 0,<br />

or, using vector notation,<br />

f(x, b) = 0,<br />

where f : R n+m → R n , x ∈ R n , that is it is an n vector, b ∈ R m , and 0 ∈ R n .<br />

When can we solve this system to obtain functions giving each x i as a function<br />

of b 1 , . . . , b m ? As we’ll see below we only give an incomplete answer to this question,<br />

but first let’s look at the case that the function f is a linear function.<br />

Suppose that our equations are<br />

a 11 x 1 + · · · + a 1n x n + c 11 b 1 + · · · + c 1m b m = 0<br />

a 21 x 1 + · · · + a 2n x n + c 21 b 1 + · · · + c 2m b m = 0<br />

. . .<br />

a n1 x 1 + · · · + a nn x n + c n1 b 1 + · · · + c nm b m = 0.<br />

We can write this, in matrix notation, as<br />

[A | C] ⎡ x ⎤ = 0,<br />
        ⎣ b ⎦<br />

where A is an n × n matrix, C is an n × m matrix, x is an n × 1 (column) vector,<br />

and b is an m × 1 vector.<br />

This we can rewrite as<br />

Ax + Cb = 0,<br />

and solve this to give<br />

x = −A −1 Cb.<br />

And we can do this as long as the matrix A can be inverted, that is, as long as the<br />

matrix A is of full rank.<br />
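In the linear case the comparative statics can thus be computed directly. A 2 × 2 Python sketch (ours; the particular A, C, and b are made-up numbers):<br />

```python
# With A of full rank, the system Ax + Cb = 0 solves as x = -A^{-1} C b.
A = [[2.0, 1.0], [1.0, 3.0]]
C = [[1.0], [0.0]]
b = [5.0]

d = A[0][0] * A[1][1] - A[0][1] * A[1][0]        # det A = 5
Ainv = [[A[1][1] / d, -A[0][1] / d], [-A[1][0] / d, A[0][0] / d]]
Cb = [C[0][0] * b[0], C[1][0] * b[0]]
x = [-(Ainv[0][0] * Cb[0] + Ainv[0][1] * Cb[1]),
     -(Ainv[1][0] * Cb[0] + Ainv[1][1] * Cb[1])]
print(x)  # [-3.0, 1.0]

# check that Ax + Cb = 0
assert A[0][0] * x[0] + A[0][1] * x[1] + Cb[0] == 0.0
assert A[1][0] * x[0] + A[1][1] * x[1] + Cb[1] == 0.0
```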

Our answer to the general question in which the function f may not be linear<br />

is that if there are some values (¯x, ¯b) for which f(¯x, ¯b) = 0 and if, when we take<br />

a linear approximation to f we can solve the approximate linear system as we did<br />




above, then we can solve the true nonlinear system, at least in a neighbourhood of<br />

(¯x, ¯b). By this last phrase we mean that if b is not close to ¯b we may not be able to<br />

solve the system, and that for a particular value of b there may be many values of<br />

x that solve the system, but there is only one close to ¯x.<br />

To see why we can’t, in general, do better than this consider the equation<br />

f : R 2 → R given by f(x, b) = g(x)−b, where the function g is graphed in Figure 2.<br />

Notice that the values (¯x, ¯b) satisfy the equation f(x, b) = 0. For all values of b<br />

close to ¯b we can find a unique value of x close to ¯x such that f(x, b) = 0. However,<br />

(1) for each value of b there are other values of x far away from ¯x that also satisfy<br />

f(x, b) = 0, and (2) there are values of b, such as ˜b for which there are no values of<br />

x that satisfy f(x, b) = 0.<br />

Figure 2. The graph of g. The point (¯x, ¯b) lies on the graph; for values of b near ¯b there is a unique solution of g(x) = b near ¯x, while for values such as ˜b there is no solution at all.<br />

Let us consider again the system of equations (18). We say that the function f<br />

is C 1 on some open set A ⊂ R n+m if f has partial derivatives everywhere in A and<br />

these partial derivatives are continuous on A.<br />

Theorem 7. Suppose that f : R n+m → R n is a C 1 function on an open set<br />

A ⊂ R n+m and that (¯x, ¯b) in A is such that f(¯x, ¯b) = 0. Suppose also that<br />

∂f(x, b)/∂x = ⎡ ∂f 1 (x, b)/∂x 1   · · ·   ∂f 1 (x, b)/∂x n ⎤<br />
              ⎢ .                  . . .   .                ⎥<br />
              ⎣ ∂f n (x, b)/∂x 1   · · ·   ∂f n (x, b)/∂x n ⎦<br />

is of full rank at (¯x, ¯b). Then there are open sets A 1 ⊂ R n and A 2 ⊂ R m with ¯x in A 1 and<br />

¯b in A 2 and A 1 × A 2 ⊂ A such that for each b in A 2 there is exactly one g(b) in A 1<br />

such that f(g(b), b) = 0. Moreover, g : A 2 → A 1 is a C 1 function and<br />

∂g(b)/∂b = − [ ∂f(g(b), b)/∂x ] −1 [ ∂f(g(b), b)/∂b ] .<br />
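A one-equation check of the derivative formula in Theorem 7 (ours; the function f(x, b) = x 2 − b is chosen just for this sketch). Here n = m = 1, the solution near (¯x, ¯b) = (1, 1) is g(b) = √b, and the formula reduces to g ′ (b) = −(∂f/∂x) −1 (∂f/∂b) = 1/(2x):<br />

```python
import math

# One-equation illustration (f chosen for this sketch): f(x, b) = x^2 - b,
# with f(1, 1) = 0. Near (1, 1) the solution is g(b) = sqrt(b), and the
# theorem gives g'(b) = -(df/dx)^{-1} (df/db) = -(2x)^{-1} (-1).
b = 1.21
g = math.sqrt(b)                  # the x near 1 with f(x, b) = 0
formula = -(-1.0) / (2.0 * g)     # the theorem's formula at (g(b), b)

# compare with a finite-difference derivative of g
h = 1e-6
numeric = (math.sqrt(b + h) - math.sqrt(b - h)) / (2 * h)
assert abs(formula - numeric) < 1e-8
```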



Exercise 74. Consider the general utility maximisation problem<br />

(19)  max x 1 ,x 2 ,...,x n  u(x 1 , x 2 , . . . , x n )<br />

      subject to p 1 x 1 + p 2 x 2 + · · · + p n x n = w.<br />

Suppose that for some price vector ¯p the maximisation problem has a utility maximising<br />

bundle ¯x. Find conditions on the utility function such that in a neighbourhood<br />

of (¯x, ¯p) we can solve for the demand functions x(p). Find the derivatives of<br />

the demand functions, ∂x/∂p.<br />

Exercise 75. Now suppose that there are only two goods and the utility<br />

function is given by<br />

u(x 1 , x 2 ) = (x 1 ) 1/3 (x 2 ) 2/3 .<br />

Solve this utility maximisation problem, as you learned to do in Section 1 of this<br />

Chapter, and then differentiate the demand functions that you find to find the<br />

partial derivative with respect to p 1 , p 2 , and w of each demand function.<br />

Also find the same derivatives using the method of the previous exercise.<br />

3. The Theorem of the Maximum<br />

Often in economics we are not so much interested in what the solution to a<br />

particular maximisation problem is but rather wish to know how the solution to a<br />

parameterised problem depends on the parameters. Thus in our first example of<br />

utility maximisation we might be interested not so much in what the solution to the<br />

maximisation problem is when p 1 = 2, p 2 = 7, and y = 25, but rather in how the<br />

solution depends on p 1 , p 2 , and y. (That is, we might be interested in the demand<br />

function.) Sometimes we shall also be interested in how the maximised function<br />

depends on the parameters—in the example how the maximised utility depends on<br />

p 1 , p 2 , and y.<br />

This raises a number of questions. In order for us to speak meaningfully of a<br />

demand function it should be the case that the maximisation problem has a unique<br />

solution. Further, we would like to know if the “demand” function is continuous—<br />

or even if it is differentiable. Consider again the problem with multiple constraints<br />

from Section 1, but this time let us explicitly add some parameters.<br />

(20)  max x 1 ,...,x n  f(x 1 , . . . , x n , a 1 , . . . , a k )<br />

      subject to g 1 (x 1 , . . . , x n , a 1 , . . . , a k ) = c 1<br />
                 . . .<br />
                 g m (x 1 , . . . , x n , a 1 , . . . , a k ) = c m<br />

In order to be able to say whether or not the problem has a unique solution<br />

it is useful to know something about the shape or curvature of the functions f<br />

and g. We say a function is concave if for any two points in the domain of the<br />

function the value of the function at a weighted average of the two points is at least<br />

as great as the weighted average of the values of the function at the two points. We say<br />

the function is convex if the value of the function at the average is no greater than the<br />

average of the values. The following definition makes this a little more explicit. (In<br />

both definitions x = (x 1 , . . . , x n ) is a vector.)<br />

Definition 15. A function f is concave if for any x and x ′ with x ≠ x ′ and<br />

for any t such that 0 < t < 1 we have f(tx + (1 − t)x ′ ) ≥ tf(x) + (1 − t)f(x ′ ). The<br />

function is strictly concave if f(tx + (1 − t)x ′ ) > tf(x) + (1 − t)f(x ′ ).<br />

A function f is convex if for any x and x ′ with x ≠ x ′ and for any t such that<br />

0 < t < 1 we have f(tx + (1 − t)x ′ ) ≤ tf(x) + (1 − t)f(x ′ ). The function is strictly<br />

convex if f(tx + (1 − t)x ′ ) < tf(x) + (1 − t)f(x ′ ).<br />
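The defining inequality is easy to probe numerically. This Python sketch (ours; looks_concave is an invented name, and for simplicity it treats functions of one variable) samples random pairs of points and weights — a screen for concavity, not a proof:<br />

```python
import random

def looks_concave(f, lo, hi, trials=1000):
    """Sample the inequality f(tx + (1-t)x') >= t f(x) + (1-t) f(x')
    at random points in [lo, hi]; a numerical screen, not a proof."""
    random.seed(0)
    for _ in range(trials):
        x, xp = random.uniform(lo, hi), random.uniform(lo, hi)
        t = random.uniform(0, 1)
        if f(t * x + (1 - t) * xp) < t * f(x) + (1 - t) * f(xp) - 1e-12:
            return False
    return True

assert looks_concave(lambda x: -x * x, -10, 10)      # concave
assert not looks_concave(lambda x: x * x, -10, 10)   # convex, not concave
```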




The result we are about to give is most conveniently stated when our statement<br />

of the problem is in terms of inequality constraints rather than equality constraints.<br />

As mentioned earlier we shall examine this kind of problem later in this course.<br />

However for the moment in order to proceed with our discussion of the problem<br />

involving equality constraints we shall assume that all of the functions with which<br />

we are dealing are increasing in the x variables. (See Exercise 1 for a formal<br />

definition of what it means for a function to be increasing.) In this case if f is<br />

strictly concave and g j is convex for each j then the problem has a unique solution.<br />

In fact the concepts of concavity and convexity are somewhat stronger than is<br />

required. We shall see later in the course that they can be replaced by the concepts<br />

of quasi-concavity and quasi-convexity. In some sense these latter concepts are the<br />

“right” concepts for this result.<br />

Theorem 8. Suppose that f and g j are increasing in (x 1 , . . . , x n ). If f is<br />

strictly concave in (x 1 , . . . , x n ) and g j is convex in (x 1 , . . . , x n ) for j = 1, . . . , m<br />

then for each value of the parameters (a 1 , . . . , a k ) if problem (20) has a solution<br />

(x ∗ 1, . . . , x ∗ n) that solution is unique.<br />

Now let v(a 1 , . . . , a k ) be the maximised value of f when the parameters are<br />

(a 1 , . . . , a k ). Let us suppose that the problem is such that the solution is unique and<br />

that (x ∗ 1(a 1 , . . . , a k ), . . . , x ∗ n(a 1 , . . . , a k )) are the values that maximise the function<br />

f when the parameters are (a 1 , . . . , a k ) then<br />

(21) v(a 1 , . . . , a k ) = f(x ∗ 1(a 1 , . . . , a k ), . . . , x ∗ n(a 1 , . . . , a k ), a 1 , . . . , a k ).<br />

(Notice however that the function v is uniquely defined even if there is not a unique<br />

maximiser.)<br />

The Theorem of the Maximum gives conditions on the problem under which<br />

the function v and the functions x ∗ 1, . . . , x ∗ n are continuous. The constraints in the<br />

problem (20) define a set of feasible vectors x over which the function f is to be<br />

maximised. Let us call this set G(a 1 , . . . , a k ), i.e.,<br />

(22) G(a 1 , . . . , a k ) = {(x 1 , . . . , x n ) | g j (x 1 , . . . , x n , a 1 , . . . , a k ) = c j ∀j}<br />

Now we can restate the problem as<br />

(23)  max x 1 ,...,x n  f(x 1 , . . . , x n , a 1 , . . . , a k )<br />

      subject to (x 1 , . . . , x n ) ∈ G(a 1 , . . . , a k ).<br />

Notice that both the function f and the feasible set G depend on the parameters<br />

a, i.e., both may change as a changes. The Theorem of the Maximum requires<br />

both that the function f be continuous as a function of x and a and that the<br />

feasible set G(a 1 , . . . , a k ) change continuously as a changes. We already know—<br />

or should know—what it means for f to be continuous but the notion of what it<br />

means for a set to change continuously is less elementary. We call G a set valued<br />

function or a correspondence. G associates with any vector (a 1 , . . . , a k ) a subset of<br />

the vectors (x 1 , . . . , x n ). The following two definitions define what we mean by a<br />

correspondence being continuous. First we define what it means for two sets to be<br />

close.<br />

Definition 16. Two sets of vectors A and B are within ɛ of each other if for<br />

any vector x in one set there is a vector x ′ in the other set such that x ′ is within ɛ<br />

of x.<br />
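For finite sets of vectors Definition 16 can be computed directly. A Python sketch (ours; within_eps and dist are invented names):<br />

```python
def dist(x, y):
    # Euclidean distance between two vectors
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def within_eps(A, B, eps):
    """Definition 16 for finite sets: every point of either set has a
    point of the other set within eps of it."""
    return (all(min(dist(x, y) for y in B) <= eps for x in A) and
            all(min(dist(x, y) for x in A) <= eps for y in B))

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(0.1, 0.0), (1.0, 0.1)]
assert within_eps(A, B, 0.15)        # every point has a partner within 0.15
assert not within_eps(A, B, 0.05)    # but not within 0.05
```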

We can now define the continuity of the correspondence G in essentially the<br />

same way that we define the continuity of a single valued function.



Definition 17. The correspondence G is continuous at (a 1 , . . . , a k ) if for any<br />

ɛ > 0 there is δ > 0 such that if (a ′ 1, . . . , a ′ k ) is within δ of (a 1, . . . , a k ) then<br />

G(a ′ 1, . . . , a ′ k ) is within ɛ of G(a 1, . . . , a k ).<br />

It is, unfortunately, not the case that the continuity of the functions g j necessarily<br />

implies the continuity of the feasible set. (Exercise 2 asks you to construct a<br />

counterexample.)<br />

Remark 1. It is possible to define two weaker notions of continuity, which we<br />

call upper hemicontinuity and lower hemicontinuity. A correspondence is in fact<br />

continuous in the way we have defined it if it is both upper hemicontinuous and<br />

lower hemicontinuous.<br />

We are now in a position to state the Theorem of the Maximum. We assume<br />

that f is a continuous function, that G is a continuous correspondence, and that<br />

for any (a 1 , . . . , a k ) the set G(a 1 , . . . , a k ) is compact. The Weierstrass Theorem<br />

thus guarantees that there is a solution to the maximisation problem (23) for any<br />

(a 1 , . . . , a k ).<br />

Theorem 9 (Theorem of the Maximum). Suppose that f(x 1 , . . . , x n , a 1 , . . . , a k )<br />

is continuous (in (x 1 , . . . , x n , a 1 , . . . , a k )), that G(a 1 , . . . , a k ) is a continuous correspondence,<br />

and that for any (a 1 , . . . , a k ) the set G(a 1 , . . . , a k ) is compact. Then<br />

(1) v(a 1 , . . . , a k ) is continuous, and<br />

(2) if (x ∗ 1(a 1 , . . . , a k ), . . . , x ∗ n(a 1 , . . . , a k )) are (single valued) functions then<br />

they are also continuous.<br />

Later in the course we shall see how the Implicit Function Theorem allows us<br />

to identify conditions under which the functions v and x ∗ are differentiable.<br />
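To make the theorem concrete, here is a small numerical sketch (the objective f(x, a) = −(x − a)^2 and the feasible set G(a) = [0, 1] are assumed for illustration; they are not from the text). Both v(a) and x ∗ (a) vary continuously with the parameter a, exactly as the theorem predicts.<br />

```python
# A numerical sketch (assumed example): f(x, a) = -(x - a)^2 maximised
# over the fixed compact feasible set G(a) = [0, 1].  The Theorem of the
# Maximum predicts that v and x* are continuous in the parameter a.
from scipy.optimize import minimize_scalar

def v_and_xstar(a):
    # maximise f by minimising -f over the compact set [0, 1]
    res = minimize_scalar(lambda x: (x - a) ** 2,
                          bounds=(0.0, 1.0), method="bounded")
    return -res.fun, res.x

for a in [-0.5, 0.0, 0.5, 1.0, 1.5]:
    v, x = v_and_xstar(a)
    print(f"a = {a:+.1f}   v(a) = {v:+.4f}   x*(a) = {x:.4f}")
```

Tabulating more values of a shows v(a) and x ∗ (a) tracing continuous paths: kinks appear at a = 0 and a = 1, but no jumps.<br />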

Exercises.<br />

Exercise 76. We say that the function f(x 1 , . . . , x n ) is nondecreasing if x ′ i ≥<br />

x i for each i implies that f(x ′ 1, . . . , x ′ n) ≥ f(x 1 , . . . , x n ), is increasing if x ′ i > x i<br />

for each i implies that f(x ′ 1, . . . , x ′ n) > f(x 1 , . . . , x n ) and is strictly increasing if<br />

x ′ i ≥ x i for each i and x ′ j > x j for at least one j implies that f(x ′ 1, . . . , x ′ n) ><br />

f(x 1 , . . . , x n ). Show that if f is nondecreasing and strictly concave then it must be<br />

strictly increasing. [Hint: This is very easy.]<br />

Exercise 77. Show by example that even if the functions g j are continuous<br />

the correspondence G may not be continuous. [Hint: Use the case n = m = k = 1.]<br />

4. The Envelope Theorem<br />

In this section we examine a theorem that is particularly useful in the study<br />

of consumer and producer theory. There is in fact nothing mysterious about this<br />

theorem. You will see that the proof of this theorem is simply calculation and a<br />

number of substitutions. Moreover the theorem has a very clear intuition. It is this:<br />

Suppose we are at a maximum (in an unconstrained problem) and we change the<br />

data of the problem by a very small amount. Now both the solution of the problem<br />

and the value at the maximum will change. However at a maximum the function<br />

is flat (the first derivative is zero). Thus when we want to know by how much the<br />
maximised value has changed it does not matter (very much) whether or not we<br />
take account of how the maximiser changes. See Figure 2. The intuition for<br />

a constrained problem is similar and only a little more complicated.<br />
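The flatness intuition can be checked numerically on an assumed unconstrained example (not from the text): f(x, a) = ax − x^2 has maximiser x ∗ (a) = a/2 and value function v(a) = a^2/4, and dv/da coincides with the partial derivative ∂f/∂a held at the maximiser.<br />

```python
# A numerical sketch of the envelope intuition (assumed example):
# f(x, a) = a*x - x**2 has maximiser x*(a) = a/2 and value v(a) = a**2/4.
# The total derivative dv/da equals the partial derivative of f with
# respect to a at x = x*(a) -- the effect through the maximiser washes out.

def v(a):
    x_star = a / 2.0            # analytic maximiser of a*x - x**2
    return a * x_star - x_star ** 2

a, h = 1.3, 1e-6
dv_da = (v(a + h) - v(a - h)) / (2 * h)   # numerical total derivative
df_da_at_xstar = a / 2.0                  # partial df/da = x, at x = x*(a)
print(dv_da, df_da_at_xstar)              # both approximately 0.65
```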

To motivate our discussion of the Envelope Theorem we will first consider a<br />

particular case, viz., the relation between short and long run average cost curves.<br />

Recall that, in general we assume that the average cost of producing some good is



[Figure 2: the curves f(·, a) and f(·, a ′ ), with x on the horizontal axis; the maximisers x ∗ (a) and x ∗ (a ′ ) are marked on the horizontal axis and the values f(x ∗ (a), a), f(x ∗ (a), a ′ ), and f(x ∗ (a ′ ), a ′ ) on the vertical axis.]<br />
Figure 2<br />

a function of the amount of the good to be produced. The short run average cost<br />

function is defined to be the function which for any quantity, Q, gives the average<br />

cost of producing that quantity, taking as given the scale of operation, i.e., the size<br />

and number of plants and other fixed capital which we assume cannot be changed<br />

in the short run (whatever that is). The long run average cost function on the<br />

other hand gives, as a function of Q, the average cost of producing Q units of the<br />

good, with the scale of operation selected to be the optimal scale for that level of<br />

production.<br />

That is, if we let the scale of operation be measured by a single variable k,<br />

say, and we let the short run average cost of producing Q units when the scale is<br />

k be given by SRAC(Q, k) and the long run average cost of producing Q units by<br />

LRAC(Q) then we have<br />
LRAC(Q) = min_k SRAC(Q, k).<br />

Let us denote, for a given value Q, the optimal level of k by k(Q). That is, k(Q) is<br />

the value of k that minimises the right hand side of the above equation.<br />

Graphically, for any fixed level of k the short run average cost function can be<br />

represented by a curve (normally assumed to be U-shaped) drawn in two dimensions<br />

with quantity on the horizontal axis and cost on the vertical axis. Now think about<br />

drawing one short run average cost curve for each of the (infinite) possible values of<br />

k. One way of thinking about the long run average cost curve is as the “bottom” or<br />

envelope of these short run average cost curves. Suppose that we consider a point<br />

on this long run or envelope curve. What can be said about the slope of the long<br />
run average cost curve at this point? A little thought should convince you that it<br />

should be the same as the slope of the short run curve through the same point.<br />

(If it were not then that short run curve would come below the long run curve, a



contradiction.) That is,<br />
d LRAC(Q)/dQ = ∂ SRAC(Q, k(Q))/∂Q.<br />
See Figure 3.<br />

[Figure 3: the long run average cost curve LRAC as the lower envelope of the short run average cost curves SRAC, with quantity Q on the horizontal axis and cost on the vertical axis; at the quantity ¯Q the two curves touch, so that LRAC( ¯Q) = SRAC( ¯Q, k( ¯Q)).]<br />
Figure 3<br />
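The tangency can be illustrated with an assumed functional form (not from the text): with SRAC(Q, k) = k + Q^2/k the optimal scale is k(Q) = Q, so LRAC(Q) = 2Q, and the slopes of the two curves agree at every Q.<br />

```python
# An illustrative sketch (the functional form is assumed, not from the
# text): SRAC(Q, k) = k + Q**2 / k.  Minimising over k gives k(Q) = Q,
# so LRAC(Q) = 2*Q, and d LRAC/dQ equals dSRAC/dQ evaluated at k = k(Q).

def srac(Q, k):
    return k + Q ** 2 / k

def lrac(Q):
    return srac(Q, Q)           # optimal scale is k(Q) = Q for this form

Q, h = 5.0, 1e-6
slope_lrac = (lrac(Q + h) - lrac(Q - h)) / (2 * h)
slope_srac = (srac(Q + h, Q) - srac(Q - h, Q)) / (2 * h)  # k fixed at k(Q)
print(slope_lrac, slope_srac)   # both approximately 2
```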

The envelope theorem is a general statement of the result of which this is a<br />

special case. We will consider not only cases in which Q and k are vectors, but also<br />

cases in which the maximisation or minimisation problem includes some constraints.<br />

Let us consider again the maximisation problem (20). Recall:<br />
max x 1 ,...,x n f(x 1 , . . . , x n , a 1 , . . . , a k )<br />
subject to g 1 (x 1 , . . . , x n , a 1 , . . . , a k ) = c 1<br />
⋮<br />
g m (x 1 , . . . , x n , a 1 , . . . , a k ) = c m .<br />

Again let L(x 1 , . . . , x n , λ 1 , . . . , λ m ; a 1 , . . . , a k ) be the Lagrangian function:<br />
(24) L(x 1 , . . . , x n , λ 1 , . . . , λ m ; a 1 , . . . , a k ) = f(x 1 , . . . , x n , a 1 , . . . , a k ) + ∑_{j=1}^{m} λ j (c j − g j (x 1 , . . . , x n , a 1 , . . . , a k )).<br />

Let (x ∗ 1(a 1 , . . . , a k ), . . . , x ∗ n(a 1 , . . . , a k )) and (λ 1 (a 1 , . . . , a k ), . . . , λ m (a 1 , . . . , a k )) be<br />

the values of x and λ that solve this problem. Now let<br />

(25) v(a 1 , . . . , a k ) = f(x ∗ 1(a 1 , . . . , a k ), . . . , x ∗ n(a 1 , . . . , a k ), a 1 , . . . , a k )<br />

That is, v(a 1 , . . . , a k ) is the maximised value of the function f when the parameters<br />

are (a 1 , . . . , a k ). The envelope theorem says that the derivative of v is equal to the<br />

derivative of L at the maximising values of x and λ. Or, more precisely



Theorem 10 (The Envelope Theorem). If all functions are defined as above<br />
and the problem is such that the functions x ∗ and λ are well defined then<br />
∂v/∂a h (a 1 , . . . , a k ) = ∂L/∂a h (x ∗ 1 (a 1 , . . . , a k ), . . . , x ∗ n (a 1 , . . . , a k ), λ 1 (a 1 , . . . , a k ), . . . , λ m (a 1 , . . . , a k ), a 1 , . . . , a k )<br />
= ∂f/∂a h (x ∗ 1 (a 1 , . . . , a k ), . . . , x ∗ n (a 1 , . . . , a k ), a 1 , . . . , a k ) − ∑_{j=1}^{m} λ j (a 1 , . . . , a k ) ∂g j /∂a h (x ∗ 1 (a 1 , . . . , a k ), . . . , x ∗ n (a 1 , . . . , a k ), a 1 , . . . , a k )<br />
for all h.<br />

In order to show the advantages of using matrix and vector notation we shall<br />

restate the theorem in that notation before returning to give a proof of the theorem.<br />

(In proving the theorem we shall return to using mainly scalar notation.)<br />

Theorem 10 (The Envelope Theorem). Under the same conditions as above<br />
∂v/∂a (a) = ∂L/∂a (x ∗ (a), λ(a), a)<br />
= ∂f/∂a (x ∗ (a), a) − λ(a) ∂g/∂a (x ∗ (a), a).<br />

Proof. From the definition of the function v we have<br />
(26) v(a 1 , . . . , a k ) = f(x ∗ 1 (a 1 , . . . , a k ), . . . , x ∗ n (a 1 , . . . , a k ), a 1 , . . . , a k ).<br />
Thus<br />
(27) ∂v/∂a h (a) = ∂f/∂a h (x ∗ (a), a) + ∑_{i=1}^{n} ∂f/∂x i (x ∗ (a), a) · ∂x ∗ i /∂a h (a).<br />

Now, from the first order conditions (12) we have<br />
∂f/∂x i (x ∗ (a), a) − ∑_{j=1}^{m} λ j (a) ∂g j /∂x i (x ∗ (a), a) = 0.<br />
Or<br />
(28) ∂f/∂x i (x ∗ (a), a) = ∑_{j=1}^{m} λ j (a) ∂g j /∂x i (x ∗ (a), a).<br />

Also, since x ∗ (a) satisfies the constraints we have, for each j,<br />
g j (x ∗ 1 (a), . . . , x ∗ n (a), a 1 , . . . , a k ) ≡ c j .<br />
And, since this holds as an identity, we may differentiate both sides with respect<br />
to a h giving<br />
∑_{i=1}^{n} ∂g j /∂x i (x ∗ (a), a) · ∂x ∗ i /∂a h (a) + ∂g j /∂a h (x ∗ (a), a) = 0.<br />
Or<br />
(29) ∑_{i=1}^{n} ∂g j /∂x i (x ∗ (a), a) · ∂x ∗ i /∂a h (a) = − ∂g j /∂a h (x ∗ (a), a).<br />
Substituting (28) into (27) gives<br />
∂v/∂a h (a) = ∂f/∂a h (x ∗ (a), a) + ∑_{i=1}^{n} [ ∑_{j=1}^{m} λ j (a) ∂g j /∂x i (x ∗ (a), a) ] · ∂x ∗ i /∂a h (a).<br />



Changing the order of summation gives<br />
(30) ∂v/∂a h (a) = ∂f/∂a h (x ∗ (a), a) + ∑_{j=1}^{m} λ j (a) [ ∑_{i=1}^{n} ∂g j /∂x i (x ∗ (a), a) · ∂x ∗ i /∂a h (a) ].<br />
And now substituting (29) into (30) gives<br />
∂v/∂a h (a) = ∂f/∂a h (x ∗ (a), a) − ∑_{j=1}^{m} λ j (a) ∂g j /∂a h (x ∗ (a), a),<br />
which is the required result. □<br />
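As a quick numerical check of the result (the example problem is assumed, not from the text): maximise x 1 x 2 subject to g(x, a) = x 1 + x 2 − a = 0 (so c = 0). Analytically x ∗ (a) = (a/2, a/2), v(a) = a^2/4, and the multiplier is λ(a) = a/2; the theorem then gives ∂v/∂a = −λ(a) ∂g/∂a = λ(a).<br />

```python
# A numerical check on an assumed example (not from the text):
# maximise x1*x2 subject to g(x, a) = x1 + x2 - a = 0 (so c = 0).
# Analytically x*(a) = (a/2, a/2), v(a) = a**2/4, lambda(a) = a/2.
# Envelope theorem: dv/da = -lambda(a) * dg/da = -lambda(a) * (-1) = a/2.

def v(a):
    return (a / 2.0) * (a / 2.0)    # value at the analytic maximiser

a, h = 2.0, 1e-6
dv_da = (v(a + h) - v(a - h)) / (2 * h)
lam = a / 2.0                        # multiplier from the first order conditions
print(dv_da, lam)                    # both approximately 1.0
```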

Exercises.<br />

Exercise 78. Rewrite this proof using matrix notation. Go through your proof<br />

and identify the dimension of each of the vectors or matrices you use. For example<br />

f x is a 1 × n vector, g x is an m × n matrix.<br />

5. Applications to Microeconomic Theory<br />

5.1. Utility Maximisation. Let us again consider the utility maximisation problem<br />
max x 1 ,x 2 u(x 1 , x 2 )<br />
subject to p 1 x 1 + p 2 x 2 − y = 0.<br />

Let v(p 1 , p 2 , y) be the maximised value of u when prices and income are p 1 , p 2 , and<br />

y. Let us consider the effect of a change in y with p 1 and p 2 remaining constant.<br />

By the Envelope Theorem<br />
∂v/∂y = ∂/∂y { u(x 1 , x 2 ) + λ(y − p 1 x 1 − p 2 x 2 ) } = 0 + λ · 1 = λ.<br />
This is the familiar result that λ is the marginal utility of income.<br />

5.2. Expenditure Minimisation. Let us consider the problem of minimising<br />
expenditure subject to attaining a given level of utility, i.e.,<br />
min x 1 ,...,x n ∑_{i=1}^{n} p i x i<br />
subject to u(x 1 , . . . , x n ) − u 0 = 0.<br />

Let the minimised value of expenditure be denoted by<br />
e(p 1 , . . . , p n , u 0 ). Then by the Envelope Theorem we obtain<br />
∂e/∂p i = ∂/∂p i { ∑_{k=1}^{n} p k x k + λ(u 0 − u(x 1 , . . . , x n )) } = x i − λ · 0 = x i<br />
when evaluated at the point which solves the minimisation problem, which we write<br />
as h i (p 1 , . . . , p n , u 0 ) to distinguish this (compensated) value of the demand for good<br />
i as a function of prices and utility from the (uncompensated) value of the demand<br />
for good i as a function of prices and income. This result is known as Hotelling's<br />
Theorem.<br />



5.3. The Hicks-Slutsky Equations. It can be shown that the compensated<br />
demand at utility u 0 , i.e., h i (p 1 , . . . , p n , u 0 ), is equal to the uncompensated demand<br />
at income e(p 1 , . . . , p n , u 0 ), i.e., x i (p 1 , . . . , p n , e(p 1 , . . . , p n , u 0 )). (This result<br />
is known as the duality theorem.) That is,<br />
x i (p 1 , . . . , p n , e(p 1 , . . . , p n , u 0 )) ≡ h i (p 1 , . . . , p n , u 0 ).<br />
Thus totally differentiating the identity with respect to p k we obtain<br />
∂x i /∂p k + (∂x i /∂y)(∂e/∂p k ) = ∂h i /∂p k ,<br />
which by Hotelling's Theorem gives<br />
∂x i /∂p k + (∂x i /∂y) h k = ∂h i /∂p k .<br />
So<br />
∂x i /∂p k = ∂h i /∂p k − h k (∂x i /∂y)<br />
for all i, k = 1, . . . , n. These are the Hicks-Slutsky equations.<br />

5.4. The Indirect Utility Function. Again let v(p 1 , . . . , p n , y) be the indirect<br />
utility function, that is, the maximised value of utility as described in Application<br />
(1). Then by the Envelope Theorem<br />
∂v/∂p i = ∂u/∂p i − λx i (p 1 , . . . , p n , y) = −λx i (p 1 , . . . , p n , y)<br />
since ∂u/∂p i = 0. Now, since we have already shown that λ = ∂v/∂y (in Section 5.1) we<br />
have<br />
x i (p 1 , . . . , p n , y) = − (∂v/∂p i ) / (∂v/∂y).<br />
This is known as Roy's Theorem.<br />

5.5. Profit functions. Now consider the problem of a firm that maximises<br />

profits subject to technology constraints. Let x = (x 1 , . . . , x n ) be a vector of<br />

netputs, i.e., x i is positive if the firm is a net supplier of good i, negative if the firm<br />

is a net user of that good. Let us assume that we can write the technology constraints<br />

as F (x) = 0. Thus the firm’s problem is<br />

max x 1 ,...,x n ∑_{i=1}^{n} p i x i<br />
subject to F (x 1 , . . . , x n ) = 0.<br />

Let ϕ i (p) be the value of x i that solves this problem, i.e., the net supply of<br />

commodity i when prices are p. (Here p is a vector.) We call the maximised value<br />

the profit function, which is given by<br />
Π(p) = ∑_{i=1}^{n} p i ϕ i (p).<br />
And so by the Envelope Theorem<br />
∂Π/∂p i = ϕ i (p).<br />

This result is known as Hotelling’s lemma.
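A numerical sketch with an assumed technology (not from the text): one output y = √l produced from one input l, at prices (p, w). Then the profit function is Π(p, w) = p^2/(4w), and Hotelling's lemma says ∂Π/∂p equals the net supply of output y ∗ = p/(2w).<br />

```python
# A numerical sketch with an assumed technology (not from the text):
# output y = sqrt(l) from input l, prices (p, w).  Profit p*y - w*l is
# maximised at l* = (p/(2w))**2, giving Pi(p, w) = p**2/(4w) and net
# output y* = p/(2w).  Hotelling's lemma: dPi/dp = y*.
import math

def profit(p, w):
    l_star = (p / (2.0 * w)) ** 2        # optimal input from the FOC
    return p * math.sqrt(l_star) - w * l_star

p, w, h = 3.0, 1.0, 1e-6
dPi_dp = (profit(p + h, w) - profit(p - h, w)) / (2 * h)
y_star = p / (2.0 * w)
print(dPi_dp, y_star)                    # both approximately 1.5
```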



5.6. Cobb-Douglas Example. We consider a particular Cobb-Douglas example<br />
of the utility maximisation problem<br />
(31) max x 1 ,x 2 √x 1 √x 2<br />
subject to p 1 x 1 + p 2 x 2 = w.<br />
The Lagrangean is<br />
(32) L(x 1 , x 2 , λ) = √x 1 √x 2 + λ(w − p 1 x 1 − p 2 x 2 )<br />

and the first order conditions are<br />
(33) ∂L/∂x 1 = (1/2) x 1 ^(−1/2) x 2 ^(1/2) − p 1 λ = 0<br />
(34) ∂L/∂x 2 = (1/2) x 1 ^(1/2) x 2 ^(−1/2) − p 2 λ = 0<br />
(35) ∂L/∂λ = w − p 1 x 1 − p 2 x 2 = 0.<br />

If we divide equation (33) by equation (34) we obtain<br />
x 2 /x 1 = p 1 /p 2<br />
or<br />
p 1 x 1 = p 2 x 2 ,<br />
and if we substitute this into equation (35) we obtain<br />
w − p 1 x 1 − p 1 x 1 = 0<br />
or<br />
(36) x 1 = w/(2p 1 ).<br />
Similarly,<br />
(37) x 2 = w/(2p 2 ).<br />

Substituting equations (36) and (37) into the utility function gives<br />
(38) v(p 1 , p 2 , w) = √( w^2 /(4p 1 p 2 ) ) = w/(2 √(p 1 p 2 )).<br />

Here we can check some known properties of the indirect utility<br />
function. For example it is homogeneous of degree zero, that is, if we multiply p 1 ,<br />
p 2 , and w by the same positive constant, say α, we do not change the value of v.<br />
You should confirm that this is the case.<br />

We now calculate the optimal value of λ from the first order conditions by<br />
substituting equations (36) and (37) into (33), giving<br />
(1/2) (w/(2p 1 ))^(−1/2) (w/(2p 2 ))^(1/2) − p 1 λ = 0<br />
or<br />
(1/2) √(2p 1 /w) √(w/(2p 2 )) = p 1 λ<br />
or<br />
(1/2) √(p 1 /p 2 ) · (1/p 1 ) = λ<br />
or<br />
λ = 1/(2 √(p 1 p 2 )).<br />



Our first application of the Envelope Theorem told us that this value of λ could<br />

be found as the derivative of the indirect utility function with respect to w. We<br />

confirm this by differentiating the function we found above with respect to w:<br />
∂v/∂w = ∂/∂w [ w/(2 √(p 1 p 2 )) ] = 1/(2 √(p 1 p 2 )),<br />
as we had found directly above.<br />

Now let us, for the same utility function, consider the expenditure minimisation<br />
problem<br />
min x 1 ,x 2 p 1 x 1 + p 2 x 2<br />
subject to √x 1 √x 2 = u.<br />
The Lagrangian is<br />
(39) L(x 1 , x 2 , λ) = p 1 x 1 + p 2 x 2 + λ(u − √x 1 √x 2 )<br />

and the first order conditions are<br />
(40) ∂L/∂x 1 = p 1 − λ (1/2) x 1 ^(−1/2) x 2 ^(1/2) = 0<br />
(41) ∂L/∂x 2 = p 2 − λ (1/2) x 1 ^(1/2) x 2 ^(−1/2) = 0<br />
(42) ∂L/∂λ = u − √x 1 √x 2 = 0.<br />

Dividing equation (40) by equation (41) gives<br />
p 1 /p 2 = x 2 /x 1<br />
or<br />
(43) x 2 = p 1 x 1 /p 2 .<br />
And, if we substitute equation (43) into equation (42) we obtain<br />
u − x 1 √(p 1 /p 2 ) = 0<br />
or<br />
x 1 = u √(p 2 /p 1 ).<br />
Similarly,<br />
x 2 = u √(p 1 /p 2 ),<br />

and if we substitute these values back into the objective function we obtain the<br />
expenditure function<br />
e(p 1 , p 2 , u) = p 1 u √(p 2 /p 1 ) + p 2 u √(p 1 /p 2 ) = 2u √(p 1 p 2 ).<br />

Hotelling's Theorem tells us that if we differentiate this expenditure function<br />
with respect to p i we should obtain the Hicksian demand function h i :<br />
∂e(p 1 , p 2 , u)/∂p 1 = ∂/∂p 1 [ 2u √(p 1 p 2 ) ] = 2u · (1/2) √(p 2 /p 1 ) = u √(p 2 /p 1 ),<br />
as we had already found. And similarly for h 2 .<br />



Let us summarise what we have found so far. The Marshallian demand functions<br />
are<br />
x 1 (p 1 , p 2 , w) = w/(2p 1 )   and   x 2 (p 1 , p 2 , w) = w/(2p 2 ).<br />
The indirect utility function is<br />
v(p 1 , p 2 , w) = w/(2 √(p 1 p 2 )).<br />
The Hicksian demand functions are<br />
h 1 (p 1 , p 2 , u) = u √(p 2 /p 1 )   and   h 2 (p 1 , p 2 , u) = u √(p 1 /p 2 ),<br />
and the expenditure function is<br />
e(p 1 , p 2 , u) = 2u √(p 1 p 2 ).<br />
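These closed forms can be sanity-checked numerically (the particular prices and income below are assumed for illustration): maximising √(x 1 x 2 ) on the budget line reproduces the Marshallian demand and the indirect utility given above.<br />

```python
# A numerical sanity check of the closed forms summarised above
# (the specific prices and income are assumed for illustration).
import math
from scipy.optimize import minimize

p1, p2, w = 2.0, 3.0, 12.0

# maximise sqrt(x1*x2) on the budget line by substituting x2 from the budget
res = minimize(lambda x: -math.sqrt(x[0] * (w - p1 * x[0]) / p2),
               x0=[1.0], bounds=[(1e-6, w / p1 - 1e-6)])
x1_num = res.x[0]

x1_formula = w / (2 * p1)                  # Marshallian demand
v_formula = w / (2 * math.sqrt(p1 * p2))   # indirect utility
print(x1_num, x1_formula)                  # both approximately 3.0
print(-res.fun, v_formula)                 # both approximately sqrt(6)
```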

We now look at the third application concerning the Hicks-Slutsky decomposition.<br />
First let us confirm that if we substitute the expenditure function for w in<br />
the Marshallian demand function we do obtain the Hicksian demand function:<br />
x 1 (p 1 , p 2 , e(p 1 , p 2 , u)) = e(p 1 , p 2 , u)/(2p 1 ) = 2u √(p 1 p 2 )/(2p 1 ) = u √(p 2 /p 1 ),<br />
as required.<br />

Similarly, if we plug the indirect utility function v into the Hicksian demand<br />

function h i we obtain the Marshallian demand function x i . Confirmation of this is<br />

left as an exercise. [You should do this exercise. If you understand properly it is<br />

very easy. If you understand a bit then doing the exercise will solidify your understanding.<br />

If you can’t do it then it is a message to get some further explanation.]<br />

Let us now check the Hicks-Slutsky decomposition for the effect of a change in<br />
the price of good 2 on the demand for good 1. The Hicks-Slutsky decomposition<br />
tells us that<br />
∂x 1 /∂p 2 = ∂h 1 /∂p 2 − h 2 ∂x 1 /∂w.<br />
Calculating these partial derivatives we have<br />
∂x 1 /∂p 2 = 0,<br />
∂x 1 /∂w = 1/(2p 1 ),<br />
∂h 1 /∂p 2 = (u/√p 1 ) × (1/2) × (1/√p 2 ) = u/(2 √(p 1 p 2 )),<br />



and<br />
h 2 = u √(p 1 /p 2 ).<br />

Substituting into the right hand side of the Hicks-Slutsky equation above gives<br />
RHS = u/(2 √(p 1 p 2 )) − u √(p 1 /p 2 ) · 1/(2p 1 ) = 0,<br />
which is exactly what we had found for the left hand side of the Hicks-Slutsky<br />
equation.<br />

Finally we check Roy's Theorem, which tells us that the Marshallian demand<br />
for good 1 can be found as<br />
x 1 (p 1 , p 2 , w) = − (∂v/∂p 1 ) / (∂v/∂w).<br />
In this case we obtain<br />
x 1 (p 1 , p 2 , w) = − [ −(w/4) p 1 ^(−3/2) p 2 ^(−1/2) ] / [ 1/(2 √(p 1 p 2 )) ] = w/(2p 1 ),<br />
as required.<br />
Exercises.<br />
Exercise 79. Consider the direct utility function<br />
u(x) = ∑_{i=1}^{n} β i log(x i − γ i ),<br />
where β i and γ i , i = 1, . . . , n are, respectively, positive and nonpositive parameters.<br />

(1) Derive the indirect utility function and show that it is decreasing in its<br />

arguments.<br />

(2) Verify Roy’s Theorem.<br />

(3) Derive the expenditure function and show that it is homogeneous of degree<br />

one and nondecreasing in prices.<br />

(4) Verify Hotelling’s Theorem.<br />

Exercise 80. For the utility function defined in Exercise 79,<br />

(1) Derive the Slutsky equation.<br />

(2) Let d i (p, y) be the demand for good i derived from the above utility function.<br />

Goods i and j are said to be gross substitutes if ∂d i (p, y)/∂p j > 0<br />

and gross complements if ∂d i (p, y)/∂p j < 0. For this utility function are<br />

the various goods gross substitutes, gross complements, or can we not say?<br />

(The two previous exercises are taken from R. Robert Russell and Maurice<br />

Wilkinson, Microeconomics: A Synthesis of Modern and Neoclassical Theory, New<br />

York, John Wiley & Sons, 1979.)<br />

Exercise 81. An electric utility has two generating plants in which total costs<br />

per hour are c 1 and c 2 respectively, where<br />
c 1 = 80 + 2x 1 + 0.001b x 1 ^2 ,   b > 0,<br />
c 2 = 90 + 1.5x 2 + 0.002x 2 ^2



where x i is the quantity generated in the i-th plant. If the utility is required to produce<br />

2000 megawatts in a particular hour, how should it allocate this load between<br />

the plants so as to minimise costs? Use the Lagrangian method and interpret the<br />

multiplier. How do total costs vary as b changes? (That is, what is the derivative<br />
of the minimised cost with respect to b?)
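A numerical sketch of this cost minimisation for the assumed value b = 1 (this does not replace the Lagrangian analysis the exercise asks for):<br />

```python
# A numerical sketch of the cost minimisation in the exercise above
# (b = 1 is assumed; the analytic Lagrangian treatment is still required).
from scipy.optimize import minimize

b = 1.0
cost = lambda x: (80 + 2 * x[0] + 0.001 * b * x[0] ** 2
                  + 90 + 1.5 * x[1] + 0.002 * x[1] ** 2)
# load constraint: x1 + x2 = 2000 megawatts
res = minimize(cost, x0=[1000.0, 1000.0],
               constraints=[{"type": "eq",
                             "fun": lambda x: x[0] + x[1] - 2000}])
print(res.x)        # approximately [1250, 750]
print(res.fun)      # minimised total cost per hour
```

Equating marginal costs across the plants (the interpretation of the multiplier) gives the same split analytically.<br />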


CHAPTER 4<br />

Topics in Convex <strong>Analysis</strong><br />

1. Convexity<br />

Convexity is one of the most important mathematical properties in economics.<br />

For example, without convexity of preferences, demand and supply functions<br />
need not be continuous, and so competitive markets may fail to have equilibrium<br />

points. The economic interpretation of convex preference sets in consumer theory is<br />

diminishing marginal rates of substitution; the interpretation of convex production<br />

sets is constant or decreasing returns to scale. Considerably less is known about<br />

general equilibrium models that allow non-convex production sets (e.g., economies<br />

of scale) or non-convex preferences (e.g., the consumer prefers a pint of beer or a<br />

shot of vodka alone to any mixture of the two).<br />

Another set of mathematical results closely connected to the notion of convexity<br />

is the so-called separation and support theorems. These theorems are frequently used in<br />
economics to obtain a price system that leads consumers and producers to choose<br />
a Pareto-efficient allocation. That is, given the prices, producers are maximizing<br />

profits, and given those profits as income, consumers are maximizing utility subject<br />

to their budget constraints.<br />

1.1. Convex Sets. Given two points x, y ∈ R n , a point z = ax + (1 − a) y,<br />

where 0 ≤ a ≤ 1, is called a convex combination of x and y.<br />

The set of all possible convex combinations of x and y, denoted by [x, y], is<br />

called the interval with endpoints x and y (or, the line segment connecting x and<br />

y):<br />

[x, y] = {ax + (1 − a) y : 0 ≤ a ≤ 1} .<br />

Definition 18. A set S ⊆ R n is convex iff for any x, y ∈ S the interval<br />

[x, y] ⊆ S.<br />

In words: a set is convex if it contains the line segment connecting any two of<br />

its points; or, more loosely speaking, a set is convex if along with any two points it<br />

contains all points between them.<br />

Convex sets in R 2 include interiors of triangles, squares, circles, ellipses, and<br />
hosts of other sets. Note also that, for example in R 3 , while the interior of a cube is<br />
a convex set, its boundary is not. The quintessential convex set in Euclidean space<br />
R n for any n > 1 is the n−dimensional open ball S R (a) of radius R > 0 about the point<br />
a ∈ R n , given by<br />
S R (a) = {x : x ∈ R n , |x − a| < R}.<br />

More examples of convex sets:<br />

1. Is the empty set convex? Is a singleton convex? Is R n convex?<br />

There are also several standard ways of forming convex sets from convex sets:<br />

2. Let A, B ⊆ R n be sets. The Minkowski sum A + B ⊆ R n is defined as<br />

A + B = {x + y : x ∈ A, y ∈ B} .<br />

When B = {b} is a singleton, the set A + b is called a translation of A. Prove that<br />

A + B is convex if A and B are convex.<br />




3. Let A ⊆ R n be a set and α ∈ R be a number. The scaling αA ⊆ R n is<br />

defined as<br />

αA = {αx : x ∈ A} .<br />

When α > 0, the set αA is called a dilation of A. Prove that αA is convex if A is<br />

convex.<br />

4. Prove that the intersection ∩ i∈I S i of any number of convex sets is convex.<br />

5. Show by example that the union of convex sets need not be convex.<br />

It is also possible to define a convex combination of an arbitrary (but finite) number<br />

of points.<br />

Definition 19. Let x 1 , ..., x k be a finite set of points from R n . A point<br />
x = ∑_{i=1}^{k} α i x i ,<br />
where α i ≥ 0 for i = 1, ..., k and ∑_{i=1}^{k} α i = 1, is called a convex combination of<br />
x 1 , ..., x k .<br />

Note that the definition of a convex combination of two points is a special case<br />

of this definition. (Prove it.)<br />
Can we generate 'superconvex' sets using Definition 19? No, as the following<br />
lemma shows.<br />

Lemma 1. A set S ⊆ R n is convex iff every convex combination of points of S<br />

is in S.<br />

Proof. If a set contains all convex combinations of its points it is obviously<br />
convex, because it also contains convex combinations of all pairs of its points. Thus,<br />
we need to show that a convex set contains any convex combination of its points.<br />
The proof is by induction on the number of points of S in a convex combination.<br />
By definition, a convex set contains all convex combinations of any two of its points.<br />
Suppose that S contains any convex combination of n or fewer points and consider<br />
a convex combination of n + 1 points, x = ∑_{i=1}^{n+1} α i x i . Since not all α i = 1, we can relabel them so<br />
that α n+1 < 1. Then<br />
x = (1 − α n+1 ) ∑_{i=1}^{n} [ α i /(1 − α n+1 ) ] x i + α n+1 x n+1 = (1 − α n+1 ) y + α n+1 x n+1 ,<br />
where y = ∑_{i=1}^{n} [ α i /(1 − α n+1 ) ] x i . Note that y ∈ S by the induction hypothesis (as a convex combination of n points of<br />
S) and, as a result, so is x, being a convex combination of two points in S. □<br />

But, using Definition 19, we can generate convex sets from non-convex sets!<br />

This operation is very useful, so the resulting set deserves a special name.<br />

Definition 20. Given a set S ⊆ R n the set of all convex combinations of<br />

points from S, denoted convS, is called the convex hull of S.<br />

Note: convince yourself that the adjective ‘convex’ in the term ‘convex hull’ is<br />

well-deserved by proving that the convex hull is indeed convex! Now Lemma 1 can<br />
be stated more succinctly: S = convS iff S is convex.<br />
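As a computational aside (not from the text), deciding whether a point lies in the convex hull of finitely many points is a linear feasibility problem in the weights α i , which scipy's linprog can solve:<br />

```python
# A computational sketch (not from the text): x is in conv{x1, ..., xk}
# iff there exist weights a_i >= 0 with sum a_i = 1 and sum a_i x_i = x.
# This is a linear feasibility problem, decidable with scipy's linprog.
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, x):
    pts = np.asarray(points, dtype=float)    # k points, each in R^n
    k, n = pts.shape
    # equality constraints: sum_i a_i * pts[i] = x  and  sum_i a_i = 1
    A_eq = np.vstack([pts.T, np.ones(k)])
    b_eq = np.append(np.asarray(x, dtype=float), 1.0)
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * k)
    return bool(res.success)

square = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(in_convex_hull(square, (0.5, 0.5)))    # True: centre of the square
print(in_convex_hull(square, (1.5, 0.5)))    # False: outside the hull
```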

1.2. Convex Hulls. The next theorem deals with the following interesting property<br />

of convex hulls: the convex hull of a set S is the intersection of all convex sets<br />

containing S. Thus, in a natural sense, the convex hull of a set S is the ‘smallest’<br />

convex set containing S. In fact, many authors define convex hulls in that way and<br />

then prove our Definition 20 as a theorem.



Theorem 11. Let S ⊆ R n be a set. Then any convex set containing S also<br />
contains convS.<br />
Proof. Let A be a convex set such that S ⊆ A. By Lemma 1, A contains all<br />
convex combinations of its points and, in particular, all convex combinations of<br />
points of its subset S; the set of such combinations is convS.<br />

□<br />

The next property is quite obvious and, again, frustrates attempts to generate<br />

‘superconvex’ sets, this time by trying to take convex hulls of convex hulls.<br />

1. Prove that convconvS = convS for any S.<br />

2. Prove that if A ⊂ B then convA ⊂ convB.<br />

The next property relates the operation of taking convex hulls to that of taking<br />
Minkowski sums. It does not matter in which order you apply these operations.<br />

3. Prove that conv (A + B) = (convA) + (convB).<br />

4. Prove that conv (A ∩ B) ⊆ (convA) ∩ (convB).<br />

5. Prove that (convA) ∪ (convB) ⊆ conv (A ∪ B).<br />

1.3. Caratheodory's Theorem. Definition 20 implies that any point x<br />
in the convex hull of S is representable as a convex combination of (finitely) many<br />
points of S, but it places no restriction on the number of points of S required<br />
to make the combination. Caratheodory's Theorem puts an upper bound on the<br />
number of points required: in R n the number of points never has to be more than<br />
n + 1.<br />

Theorem 12 (Caratheodory, 1907). Let S ⊆ R n be a non-empty set. Then every<br />
x ∈ convS can be represented as a convex combination of (at most) n + 1 points<br />
from S.<br />

Note that the theorem does not 'identify' the points used in the representation;<br />
their choice depends on x.<br />

Show by example that the constant n + 1 in Caratheodory’s theorem cannot<br />

be improved. That is, exhibit a set S ⊆ R n and a point x ∈ convS that cannot be<br />

represented as a convex combination of fewer than n + 1 points from S.<br />
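A small illustration of the bound in R 2 (the triangle below is an assumed example): for any x in the convex hull of three affinely independent points, the n + 1 = 3 weights are recovered by solving a single linear system whose columns stack each vertex with a 1.<br />

```python
# A small sketch in R^2 (the points are assumed for illustration): any x in
# the convex hull of the triangle with vertices (0,0), (1,0), (0,1) is a
# convex combination of these n + 1 = 3 points.  The weights solve the
# linear system  [x; 1] = M [a1; a2; a3]  with columns (vertex, 1).
import numpy as np

vertices = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x = np.array([0.2, 0.3])

M = np.vstack([vertices.T, np.ones(3)])    # 3x3: coordinates plus the sum row
weights = np.linalg.solve(M, np.append(x, 1.0))
print(weights)    # [0.5 0.2 0.3]: all nonnegative and summing to 1
```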

1.4. Polytopes. The simplest convex sets are those which are convex hulls of<br />

a finite set of points, that is, sets of the form S = conv{x 1 , x 2 , ..., x m }. The convex<br />

hull of a finite set of points in R n is called a polytope.<br />

1. Prove that the set<br />

∆ = {x ∈ R n+1 : x 1 + x 2 + ... + x n+1 = 1 and x i ≥ 0 for every i}<br />

is a polytope. This polytope is called the standard n−dimensional simplex.<br />

2. Prove that the set<br />

C = {x ∈ R n : 0 ≤ x i ≤ 1 for every i}<br />

is a polytope. This polytope is called an n−dimensional cube.<br />

3. Prove that the set<br />

O = {x ∈ R n : |x 1 | + |x 2 | + ... + |x n | ≤ 1}<br />

is a polytope. This polytope is called a (hyper)octahedron.<br />
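As a sketch toward exercise 2 (an illustration added to these notes, with the cube taken in R n ): every point x of the cube is an explicit convex combination of the 2 n vertices, where the vertex v receives the weight ∏ i (x i if v i = 1 and 1 − x i if v i = 0).<br />

```python
import itertools
import numpy as np

def cube_vertex_weights(x):
    """Express x in [0,1]^n as a convex combination of the 2^n vertices
    of the cube: vertex v gets weight prod_i (x_i if v_i == 1 else 1 - x_i)."""
    verts = np.array(list(itertools.product([0.0, 1.0], repeat=len(x))))
    wts = np.array([np.prod([xi if vi == 1.0 else 1.0 - xi
                             for vi, xi in zip(v, x)]) for v in verts])
    return verts, wts
```

The weights sum to 1 because ∏ i (x i + (1 − x i )) = 1, and they reproduce x coordinate by coordinate.<br />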

1.5. Topology of Convex Sets.<br />

(1) The closure of a convex set is a convex set.<br />

(2) The interior of a convex set (possibly empty) is convex.


60 4. TOPICS IN CONVEX ANALYSIS<br />

1.6. Aside: Helly’s Theorem. While there are not so many applications of<br />

Helly’s theorem in economics (in fact, I am aware of only one paper that uses<br />

Helly’s theorem in an economic context), it is definitely one of the most famous results<br />

in convexity.<br />

Theorem 13 (Helly, 1913). Let A 1 , A 2 , ..., A m ⊆ R n be a finite family of convex<br />

sets with m ≥ n + 1. Suppose that every n + 1 of the sets have a nonempty intersection.<br />

Then all the sets have a nonempty intersection.<br />
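For intuition, consider the case n = 1, where convex sets are intervals and the hypothesis is that every pair (n + 1 = 2) of intervals meets: then the largest left endpoint cannot exceed the smallest right endpoint, so all intervals share a point. A minimal sketch with made-up intervals:<br />

```python
# Helly's theorem in R^1: pairwise-intersecting closed intervals
# have a common point.
intervals = [(0.0, 3.0), (1.0, 4.0), (2.0, 5.0), (2.5, 6.0)]

# hypothesis: every pair of intervals intersects
pairwise = all(max(a1, a2) <= min(b1, b2)
               for a1, b1 in intervals for a2, b2 in intervals)

lo = max(a for a, b in intervals)   # largest left endpoint
hi = min(b for a, b in intervals)   # smallest right endpoint
common_point_exists = lo <= hi      # every point of [lo, hi] is in all intervals
```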

To prove Helly’s theorem with elegance we first formulate a very useful<br />

result obtained by J. Radon.<br />

Theorem 14 (Radon, 1921). Let S ⊆ R n be a set of at least n + 2 points.<br />

Then there are two non-intersecting subsets R ⊂ S (‘red points’) and B ⊂ S (‘blue<br />

points’) such that<br />

convR ∩ convB ≠ ∅.<br />

Proof. Let x 1 , ..., x m be m ≥ n + 2 distinct points from S. Consider the<br />

system of n + 1 homogeneous linear equations in the variables γ 1 , ..., γ m<br />

γ 1 x 1 + ... + γ m x m = 0 and γ 1 + ... + γ m = 0.<br />

Since m ≥ n + 2, there is a nontrivial solution to this system. Let<br />

R = {x i : γ i > 0} and B = {x i : γ i < 0}.<br />

Then R ∩ B = ∅. Let β = ∑ i:γ i >0 γ i ; then β > 0 and ∑ i:γ i <0 γ i = −β, since the γ’s sum<br />

up to zero. Moreover,<br />

∑ i:γ i >0 γ i x i = − ∑ i:γ i <0 γ i x i ,<br />

since ∑ γ i x i = 0. Let<br />

x = ∑ i:γ i >0 (γ i /β) x i = ∑ i:γ i <0 (−γ i /β) x i .<br />

Each of the two sums is a convex combination: the coefficients are non-negative and sum to 1.<br />

The first expresses x as a convex combination of points of R, the second as a convex combination<br />

of points of B. Hence x ∈ convR ∩ convB, so convR ∩ convB ≠ ∅.<br />

□<br />
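The proof is constructive and can be mirrored numerically. In the sketch below (an added illustration; the function name is made up and the null vector is obtained from an SVD) the four vertices of the unit square are split into two diagonal pairs whose convex hulls meet at the centre.<br />

```python
import numpy as np

def radon_partition(points, tol=1e-10):
    """Split m >= n+2 points into 'red' and 'blue' groups whose convex
    hulls intersect, following the proof: find a nontrivial gamma with
    sum_i g_i x_i = 0 and sum_i g_i = 0, and split by the sign of g_i."""
    P = np.asarray(points, dtype=float)      # m points in R^n, as rows
    m, n = P.shape
    assert m >= n + 2, "Radon's theorem needs at least n + 2 points"
    A = np.vstack([P.T, np.ones(m)])         # (n+1) x m homogeneous system
    g = np.linalg.svd(A)[2][-1]              # nontrivial null vector
    beta = g[g > tol].sum()
    x_red = (g[g > tol] / beta) @ P[g > tol]        # point of conv R
    x_blue = (-g[g < -tol] / beta) @ P[g < -tol]    # same point, in conv B
    return P[g > tol], P[g < -tol], x_red, x_blue

square = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R, B, x_red, x_blue = radon_partition(square)
```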


2. SUPPORT AND SEPARATION 61<br />

2. Support and Separation<br />

2.1. Hyperplanes. The concept of hyperplane in R n is a straightforward generalisation<br />

of the notion of a line in R 2 and of a plane in R 3 . A line in R 2 can be<br />

described by an equation<br />

p 1 x 1 + p 2 x 2 = α<br />

where p = (p 1 , p 2 ) is some non-zero vector and α is some scalar. A plane in R 3 can<br />

be described by an equation<br />

p 1 x 1 + p 2 x 2 + p 3 x 3 = α<br />

where p = (p 1 , p 2 , p 3 ) is some non-zero vector and α is some scalar. Similarly, a<br />

hyperplane in R n can be described by an equation<br />

p 1 x 1 + p 2 x 2 + ... + p n x n = α<br />

where p = (p 1 , p 2 , ..., p n ) is some non-zero vector in R n and α is some scalar. It can<br />

be written more concisely using scalar (also known as inner, or dot) product notation.<br />

Definition 21. A hyperplane is the set<br />

H(p, α) = {x ∈ R n : p · x = α}<br />

where p ∈ R n is a non-zero vector and α is a scalar. The vector p is called the<br />

normal to the hyperplane H.<br />

Suppose that there are two points x ∗ , y ∗ ∈ H(p, α). Then by definition p·x ∗ = α<br />

and p · y ∗ = α. Hence p · (x ∗ − y ∗ ) = 0. In other words, the vector p is orthogonal to<br />

the vector x ∗ − y ∗ ; in this sense p is orthogonal to H(p, α) itself.<br />

Given a hyperplane H ⊂ R n , points in R n can be classified according to their<br />

positions relative to the hyperplane. A (closed) half-space determined by the hyperplane<br />

H(p, α) is either the set of points ‘below’ H or the set of points ‘above’ H,<br />

i.e., either the set {x ∈ R n : p · x ≤ α} or the set {x ∈ R n : p · x ≥ α}. Open<br />

half-spaces are defined by strict inequalities. Prove that a closed half-space is closed<br />

and an open half-space is open.<br />

A straightforward economic example of a half-space is the budget set {x ∈<br />

R n : p · x ≤ α} of a consumer with income α facing the vector of prices p. (It was<br />

rather neat to call the normal vector p, wasn’t it?). By the way, hyperplanes and<br />

half-spaces are convex sets (Prove it).<br />
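These definitions can be checked on a small made-up example (the numbers are arbitrary, purely for illustration): two points on a hyperplane in R 3 , the orthogonality of the normal to their difference, and a budget-set membership test.<br />

```python
import numpy as np

p, alpha = np.array([2.0, 1.0, 3.0]), 6.0     # hyperplane H(p, alpha) in R^3
x_star = np.array([3.0, 0.0, 0.0])            # two points of H ...
y_star = np.array([0.0, 6.0, 0.0])
on_H = np.isclose(p @ x_star, alpha) and np.isclose(p @ y_star, alpha)
orthogonal = np.isclose(p @ (x_star - y_star), 0.0)  # p is normal to H

# the budget set {x : p.x <= alpha} is the closed half-space 'below' H
bundle = np.array([1.0, 1.0, 1.0])
affordable = p @ bundle <= alpha              # costs 2 + 1 + 3 = 6 <= 6
```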

2.2. Support Functions. In this section we give a description of what is<br />

called a dual structure. Consider the set of all closed convex subsets of R n . We<br />

will show that to each such set S we can associate an extended-real valued function<br />

µ S : R n → R ∪ {−∞}, that is, a function that maps each vector in R n to either a real<br />

number or to −∞. Not all such functions can be arrived at in this way. In fact<br />

we shall show that any such function must be concave and homogeneous of degree<br />

1. But once we restrict attention to functions that can be arrived at as a “support<br />

function” for some such closed convex set we have another set of objects that we<br />

can analyse and perhaps make useful arguments about the original sets in which<br />

we were interested.<br />

In fact, we shall define the function µ S for any subset of R n , not just the closed<br />

and convex ones. However, if the original set S is not a closed convex one we shall<br />

lose some information about S in going to µ S . In particular, µ S only depends on<br />

the closed convex hull of S, that is, if two sets have the same closed convex hull<br />

they will lead to the same function µ S .<br />

We define µ S : R n → R ∪ {−∞} as<br />

µ S (p) = inf{p · x | x ∈ S},



where inf denotes the infimum or greatest lower bound. It is a property of the<br />

extended real numbers that any non-empty set of real numbers has an infimum (possibly −∞). Thus µ S (p) is well<br />

defined for any non-empty set S. If the minimum exists, for example if the set S is compact,<br />

then the infimum is the minimum. In other cases the minimum may not exist. To<br />

take a simple one-dimensional example, suppose that the set S is the subset of R<br />

consisting of the numbers 1/n for n = 1, 2, . . . and that p = 2. Then clearly p·x = px<br />

does not have a minimum on the set S. However, 0 is less than px = 2x for every value<br />

of x in S, but for any number a greater than 0 there is a value of x in S such that<br />

px < a. Thus 0 is in this case the infimum of the set {p · x | x ∈ S}.<br />
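A quick numerical illustration of this example (a sketch only, since a computer can sample only a finite truncation of the infinite set S):<br />

```python
# S = {1/n : n = 1, 2, ...} with p = 2: inf{p*x : x in S} = 0, never attained.
S = [1.0 / n for n in range(1, 100001)]   # a finite truncation of S
vals = [2.0 * x for x in S]
smallest = min(vals)
```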

Recall that we have not assumed that S is convex. However, if we do assume<br />

that S is both convex and closed then the function µ S contains all the information<br />

needed to reconstruct S.<br />

Given any extended-real valued function µ : R n → R ∪ {−∞} let us define the<br />

set S µ as<br />

S µ = {x ∈ R n | p · x ≥ µ(p) for every p ∈ R n }.<br />

That is, for each p with µ(p) > −∞ we define the closed half space<br />

{x ∈ R n | p · x ≥ µ(p)}.<br />

Notice that if µ(p) = −∞ then p · x ≥ µ(p) for any x and so the above set will be<br />

R n rather than a half space. The set S µ is the intersection of all these closed half<br />

spaces. Since the intersection of convex sets is convex and the intersection of closed<br />

sets is closed, the set S µ is, for any function µ, a closed convex set.<br />

Suppose that we start with a set S, define µ S as above and then use µ S to<br />

define the set S µS . If the set S was a closed convex set then S µS will be exactly<br />

equal to S. Since we have seen that S µS is a closed convex set, it must be that if<br />

S is not a closed convex set it will not be equal to S µS . However S will always be<br />

a subset of S µS , and indeed S µS will be the smallest closed convex set containing S,<br />

that is, S µS is the closed convex hull of S.<br />
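For a finite set S the infimum defining µ S is just a minimum over the points of S, so the two properties claimed above, degree-1 homogeneity and concavity (for degree-1 homogeneous functions, concavity is equivalent to superadditivity), can be checked directly (a sketch with an arbitrary triangle):<br />

```python
import numpy as np

def mu(S, p):
    """Support function mu_S(p) = inf{p.x : x in S}; a minimum for finite S."""
    return min(float(np.dot(p, x)) for x in S)

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # vertices of a triangle
p, q = np.array([2.0, -1.0]), np.array([0.5, 3.0])

homogeneous = np.isclose(mu(S, 3.0 * p), 3.0 * mu(S, p))   # mu(tp) = t mu(p), t >= 0
superadditive = mu(S, p + q) >= mu(S, p) + mu(S, q)        # concavity, given homogeneity
```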

2.3. Separation. We now consider the notion of ‘separating’ two sets by a<br />

hyperplane.<br />

Definition 22. A hyperplane H separates sets A and B if A is contained in<br />

one closed half-space and B is contained in the other. A hyperplane H strictly<br />

separates sets A and B if A is contained in one open half-space and B is contained<br />

in the other.<br />

It is clear that strict separation requires the two sets to be disjoint. For example,<br />

consider two (externally) tangent circles in a plane. Their common tangent line<br />

separates them but does not separate them strictly. On the other hand, although it<br />

is necessary for two sets to be disjoint in order to strictly separate them, this condition<br />

is not sufficient, even for closed convex sets. Let A = {x ∈ R 2 : x 1 > 0 and<br />

x 1 x 2 ≥ 1} and B = {x ∈ R 2 : x 1 ≥ 0 and x 2 = 0} then A and B are disjoint<br />

closed convex sets but they cannot be strictly separated by a hyperplane (line in<br />

R 2 ). Thus the problem of the existence of a separating hyperplane is more involved<br />

than it may appear at first.<br />

We start with separation of a set and a point.<br />

Theorem 15. Let S ⊆ R n be a convex set and x 0 /∈ S be a point. Then S and<br />

x 0 can be separated. If S is closed then S and x 0 can be strictly separated.<br />

Idea of proof. The proof proceeds in two steps. The first step establishes the<br />

existence of a point a in the closure of S which is closest to x 0 . The second step<br />

constructs the separating hyperplane using the point a.<br />



STEP 1. There exists a point a ∈ ¯S (the closure of S) such that d(x 0 , a) ≤ d(x 0 , x)<br />

for all x ∈ ¯S, and d(x 0 , a) > 0.<br />

Let ¯B(x 0 ) be a closed ball with centre at x 0 that intersects the closure of S.<br />

Let A = ¯B(x 0 ) ∩ ¯S. The set A is nonempty, closed and bounded (hence<br />

compact). According to the Weierstrass theorem, the continuous distance function<br />

d(x 0 , x) achieves its minimum on A. That is, there exists a ∈ A such that d(x 0 , a) ≤<br />

d(x 0 , x) for all x ∈ ¯S. Note that d(x 0 , a) > 0.<br />

STEP 2. There exists a hyperplane H(p, α) = {x ∈ R n : p · x = α} such that<br />

p · x ≥ α for all x ∈ ¯S and p · x 0 < α.<br />

Construct a hyperplane which goes through the point a ∈ ¯S and has normal<br />

p = a − x 0 . The proof that this hyperplane is the separating one is done by<br />

contradiction. Suppose there exists a point y ∈ ¯S which is strictly on the same side<br />

of H as x 0 . Consider the point y ′ ∈ [a, y] such that the vector y ′ − x 0 is orthogonal<br />

to y − a. Since d(x 0 , y) ≥ d(x 0 , a), the point y ′ lies between a and y. By convexity, y ′ ∈ ¯S,<br />

and d(x 0 , y ′ ) < d(x 0 , a), which contradicts the choice of a. When S = ¯S, that is, S<br />

is closed, the separation can be made strict by passing the hyperplane through a point<br />

strictly between a and x 0 instead of through a. This is always possible because d(x 0 , a) > 0.<br />

□<br />
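The two steps can be mirrored numerically when the closest point is available in closed form, for example for a closed ball (a made-up illustration, not part of the proof):<br />

```python
import numpy as np

def separate_from_ball(c, r, x0):
    """Hyperplane H(p, alpha) separating the closed ball B(c, r) from an
    outside point x0, built exactly as in the proof: a is the closest
    point of the ball to x0, and p = a - x0 is the normal."""
    c, x0 = np.asarray(c, float), np.asarray(x0, float)
    d = np.linalg.norm(x0 - c)
    assert d > r, "x0 must lie outside the ball"
    a = c + r * (x0 - c) / d      # STEP 1: closest point of the ball to x0
    p = a - x0                    # STEP 2: normal of the separating hyperplane
    return p, float(p @ a)        # H = {x : p.x = alpha} with alpha = p.a

p, alpha = separate_from_ball([0.0, 0.0], 1.0, [3.0, 4.0])
```

Here every point x of the ball satisfies p · x ≥ α, while p · x 0 < α.<br />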

Theorem 15 is very useful because separation of a pair of sets can always<br />

be reduced to separation of a set and a point.<br />

Lemma 2. Let A and B be non-empty sets. A and B can be separated<br />

(strictly separated) iff A − B and 0 can be separated (strictly separated).<br />

Proof. If A and B are convex then A − B is convex. If A is compact and B<br />

is closed then A − B is closed. And 0 /∈ A − B iff A ∩ B = ∅.<br />

□<br />

Theorem 16 (Minkowski, 1911). Let A and B be non-empty convex sets<br />

with A ∩ B = ∅. Then A and B can be separated. If A is compact and B is closed<br />

then A and B can be strictly separated.<br />

2.4. Support. Closely related (though not in the topological sense) to the notion of a<br />

separating hyperplane is the notion of a supporting hyperplane.<br />

Definition 23. The hyperplane H supports the set S at the point x 0 ∈ S if<br />

x 0 ∈ H and S is a subset of one of the half-spaces determined by H.<br />

A convex set can be supported at any of its boundary points; this is an immediate<br />

consequence of Theorem 16. To prove it, consider the sets int A and B = {x 0 },<br />

where x 0 is a boundary point of A.<br />

Theorem 17. Let S ⊆ R n be a convex set with nonempty interior and x 0 ∈ S<br />

be a boundary point of S. Then there exists a supporting hyperplane for S at x 0 .<br />

Note that if the boundary of a convex set is smooth (‘differentiable’) at the<br />

given point x 0 then the supporting hyperplane is unique and is just the tangent<br />

hyperplane. If, however, the boundary is not smooth then there can be many<br />

supporting hyperplanes passing through the given point. It is important to note<br />

that conceptually the supporting theorems are connected to calculus. But the<br />

supporting theorems are more powerful (they do not require smoothness), more direct,<br />

and more set-theoretic.<br />

Certain points on the boundary of a convex set carry a lot of information about<br />

the set.<br />

Definition 24. A point x of a convex set S is an extreme point of S if x is<br />

not an interior point of any line segment in S.



The extreme points of a closed ball in R 3 are its boundary points; those of a closed<br />

cube in R 3 are its eight vertices. A half-space has no extreme points even<br />

if it is closed.<br />

An interesting property of extreme points is that an extreme point can be<br />

deleted from the set without destroying convexity of the set. That is, a point x in<br />

a convex set S is an extreme point iff the set S\{x} is convex.<br />

The next theorem is a finite-dimensional version of a quite general and powerful<br />

result by M. G. Krein and D. P. Milman.<br />

Theorem 18 (Krein & Milman, 1940). Let S ⊆ R n be convex and compact.<br />

Then S is the convex hull of its extreme points.
