ECON 381 SC Foundations Of Economic Analysis
2009
John Hillas and Dmitriy Kvasov
University of Auckland
Contents

Chapter 1. Logic, Sets, Functions, and Spaces
1. Logic
2. Sets
3. Binary Relations
4. Functions
5. Spaces
6. Metric Spaces and Continuous Functions
7. Open Sets, Compact Sets, and the Weierstrass Theorem
8. Sequences and Subsequences
9. Linear Spaces

Chapter 2. Linear Algebra
1. The Space R^n
2. Linear Functions from R^n to R^m
3. Matrices and Matrix Algebra
4. Matrices as Representations of Linear Functions
5. Linear Functions from R^n to R^n and Square Matrices
6. Inverse Functions and Inverse Matrices
7. Changes of Basis
8. The Trace and the Determinant
9. Calculating and Using Determinants
10. Eigenvalues and Eigenvectors

Chapter 3. Consumer Behaviour: Optimisation Subject to the Budget Constraint
1. Constrained Maximisation
2. The Implicit Function Theorem
3. The Theorem of the Maximum
4. The Envelope Theorem
5. Applications to Microeconomic Theory

Chapter 4. Topics in Convex Analysis
1. Convexity
2. Support and Separation
CHAPTER 1

Logic, Sets, Functions, and Spaces

1. Logic
All the aspects of logic that we describe in this section are part of what is called propositional (or sentential) logic.
We start by supposing that we have a number of atomic statements, which we denote by lower case letters, p, q, r. Examples of such statements might be

  Consumer 1 is a utility maximiser
  the apple is green
  the price of good 3 is 17.

We assume that each atomic statement is either true or false.
Given these atomic statements we can form other statements using logical connectives. If p is a statement then ¬p, read not p, is the statement that is true precisely when p is false. If both p and q are statements then p ∧ q, read p and q, is the statement that is true when both p and q are true and false otherwise. If both p and q are statements then p ∨ q, read p or q, is the statement that is true when either p or q is true, that is, the statement that is false only if both p and q are false.
We could make do with these three symbols together with brackets to group symbols and tell us what to do first. For example we could have the complicated statement ((p ∧ q) ∨ (p ∧ r)) ∨ ¬s. This means that at least one of two statements is true. The first is that either both p and q are true or both p and r are true. The second is that s is not true.
Exercise 1. Think about the meaning of the statement we have just considered. Can you see a more straightforward statement that would mean the same thing?
While we don’t strictly need any more symbols it is certainly convenient to have at least a couple more. If both p and q are statements then p ⇒ q, read if p then q or p implies q or p is sufficient for q or q is necessary for p, is the statement that is false when p is true and q is false and is true otherwise. Many people find this a bit nonintuitive. In particular, one might wonder about the truth of this statement when p is false and q is true. A simple (and correct) answer is that this is a definition. It is simply what we mean by the symbol and there isn’t any point in arguing about definitions. However there is a sense in which the definition is what is implied by the informal statements. When we say “if p then q” we are saying that in any situation or state in which p is true, q is also true. We are not making any claim about what might or might not be the case when p is not true. So, in states in which p is not true we make no claim about q and so our statement is true whether q is true or false. Instead of p ⇒ q we can write q ⇐ p. In this case we are most likely to read the statement as q if p.
If p ⇒ q and p ⇐ q (that is, q ⇒ p) then we say that p if and only if q or p is necessary and sufficient for q and write p ⇔ q.
One powerful method of analysing logical relationships is by means of truth tables. A truth table lists all possible combinations of the truth values of the atomic statements and the associated truth values of the compound statements. If we have two atomic statements then the following table gives the four possible combinations of truth values.

  p  q
  T  T
  F  T
  T  F
  F  F

Now, we can add a column that would, for each combination of truth values of p and q, give the truth value of p ⇒ q, just as described above.

  p  q  p ⇒ q
  T  T    T
  F  T    T
  T  F    F
  F  F    T
Such truth tables allow us to see the logical relationship between various statements. Suppose we have two compound statements A and B and we form a truth table showing the truth values of A and B for each possible profile of truth values of the atomic statements that constitute A and B. If in each row in which A is true B is also true then statement A implies statement B. If statements A and B have the same truth value in each row then statements A and B are logically equivalent.
For example I claim that the statement p ⇒ q we have just considered is logically equivalent to ¬p ∨ q. We can see this by adding columns to the truth table we have just considered. Let me add a column for ¬p and then one for ¬p ∨ q. (We add the column for ¬p only to make the calculation easier.)

  p  q  p ⇒ q  ¬p  ¬p ∨ q
  T  T    T     F     T
  F  T    T     T     T
  T  F    F     F     F
  F  F    T     T     T

Since the third column and the fifth column contain exactly the same truth values we see that the two statements, p ⇒ q and ¬p ∨ q, are indeed logically equivalent.
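The row-by-row comparison is mechanical enough to automate. The following short Python sketch (our own illustration, not part of the notes; the helper name `implies` is ours) enumerates every truth-value profile and confirms that p ⇒ q and ¬p ∨ q agree in each row:

```python
from itertools import product

# p => q is false exactly when p is true and q is false.
def implies(p, q):
    return not (p and not q)

# All four rows of the truth table for two atomic statements.
rows = list(product([True, False], repeat=2))

# Compare p => q with ¬p ∨ q in every row.
equivalent = all(implies(p, q) == ((not p) or q) for p, q in rows)
print(equivalent)  # True
```

The same loop, with a different expression on each side, can check any claimed equivalence or tautology over a handful of atomic statements.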
Exercise 2. Construct the truth table for the statement ¬(¬p ∨ ¬q). Is it possible to write this statement using fewer logical connectives? Hint: why not start with just one?

Exercise 3. Prove that the following statements are equivalent:
(i) (p ∨ ¬q) ⇒ ((¬p) ∧ q) and ¬(q ⇒ p),
(ii) p ⇒ q and ¬q ⇒ ¬p.

In part (ii) the second statement is called the contrapositive of the first statement. Often if you are asked to prove that p implies q it will be easier to show the contrapositive, that is, that not q implies not p.

Exercise 4. Prove that the following statements are equivalent:
(i) ¬(p ∧ q) and ¬p ∨ ¬q,
(ii) ¬(p ∨ q) and ¬p ∧ ¬q.
These two equivalences are known as De Morgan’s Laws.
A tautology is a statement that is necessarily true. For example if the statements A and B are logically equivalent then the statement A ⇔ B is a tautology. If A logically implies B then A ⇒ B is a tautology. We can check whether a compound statement is a tautology by writing a truth table for this statement. If the statement is a tautology then its truth value should be T in each row of its truth table.
A contradiction is a statement that is necessarily false, that is, a statement A such that ¬A is a tautology. Again, we can see whether a statement is a contradiction by writing a truth table for the statement.
2. Sets

Set theory was developed in the second half of the 19th century and is at the very foundation of modern mathematics. But we shall not be concerned here with the development of the theory. Rather we shall only give the basic language of set theory and outline some of the very basic operations on sets.
We start by defining a set to be a collection of objects or elements. We will usually denote sets by capital letters and their elements by lower case letters. If the element a is in the set A we write a ∈ A. If every element of the set B is also in the set A we call B a subset of the set A and write B ⊂ A. We shall also say that A contains B. If A and B have exactly the same elements then we say they are equal or identical. Alternatively we could say A = B if and only if A ⊂ B and B ⊂ A. If B ⊂ A and B ≠ A then we say that B is a proper subset of A or that A strictly contains B.

Exercise 5. How many subsets does a set with N elements have?

In order to avoid paradoxes, such as Russell’s paradox, we shall always assume that in whatever situation we are discussing there is some given set U called the universal set which contains all of the sets with which we shall deal.
We customarily enclose our specification of a set by braces. In order to specify a set one may simply list the elements. For example to specify the set D which contains the numbers 1, 2, and 3 we may write D = {1, 2, 3}. Alternatively we may define the set by specifying a property that identifies the elements. For example we may specify the same set D by D = {x | x is an integer and 0 < x < 4}. Notice that this second method is more powerful. We could not, for example, list all the integers. (Since there are an infinite number of them we would die before we finished.)
For any two sets A and B we define the union of A and B to be the set which contains exactly all of the elements of A and all the elements of B. We denote the union of A and B by A ∪ B. Similarly we define the intersection of A and B to be that set which contains exactly those elements which are in both A and B. We denote the intersection of A and B by A ∩ B. Thus we have

A ∪ B = {x | x ∈ A or x ∈ B}
A ∩ B = {x | x ∈ A and x ∈ B}.

Exercise 6. Are the oldest mathematician among chess players and the oldest chess player among mathematicians the same person or (possibly) different people?

Exercise 7. Are the best mathematician among chess players and the best chess player among mathematicians the same person or (possibly) different people?
Exercise 8. Every tenth mathematician is a chess player and every fourth chess player is a mathematician. Are there more mathematicians or chess players, and by what factor?

Exercise 9. Prove the distributive laws for the operations of union and intersection.
(i) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
(ii) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)

Just as the number zero is extremely useful, so the concept of a set that has no elements is extremely useful also. This set we call the empty set or the null set and denote by ∅. To see one use of the empty set notice that having such a concept allows the intersection of two sets to be well defined whether or not the sets have any elements in common.
We also introduce the concept of a Cartesian product. If we have two sets, say A and B, the Cartesian product, A × B, is the set of all ordered pairs, (a, b), such that a is an element of A and b is an element of B. Symbolically we write

A × B = {(a, b) | a ∈ A and b ∈ B}.
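These operations are easy to experiment with on small finite sets. The sketch below (the particular sets A, B, C are our own illustrative choices) exercises union, intersection, and the Cartesian product, and checks the distributive laws of Exercise 9 on these particular sets; such a check is evidence, not a proof:

```python
from itertools import product

A = {1, 2, 3}
B = {3, 4}
C = {2, 4, 5}

# Union and intersection, exactly as defined above.
assert A | B == {1, 2, 3, 4}   # A ∪ B
assert A & B == {3}            # A ∩ B

# The distributive laws of Exercise 9, checked on these particular sets.
assert (A & B) | C == (A | C) & (B | C)
assert (A | B) & C == (A & C) | (B & C)

# The Cartesian product A × B as a set of ordered pairs (a, b).
AxB = set(product(A, B))
print(len(AxB))  # 6, i.e. |A| times |B|
```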
3. Binary Relations

There are a number of ways of formulating the notion of a binary relation. We shall pursue one, defining a binary relation on a set X simply as a subset of X × X, the Cartesian product of X with itself.

Definition 1. A binary relation R on the set X is a subset of X × X. If the point (x, y) ∈ R we shall often write xRy instead of (x, y) ∈ R.

Since we have already defined the notions of Cartesian product and subset, there is really nothing new here. However the structure and properties of binary relations that we shall now study are motivated by the informal notion of a “relation” between the elements of X.

Example 1. Suppose that X is a set of boys and girls and the relation xSy is “x is a sister of y.”

Example 2. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }. There are binary relations >, ≥, and =.

Example 3. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }. The relations R, P, and I are defined by
xRy if and only if x + 1 ≥ y,
xPy if and only if x > y + 1, and
xIy if and only if −1 ≤ x − y ≤ 1.

Definition 2. The following properties of binary relations have been defined and found to be useful.
(BR1) Reflexivity: For all x in X, xRx.
(BR2) Irreflexivity: For all x in X, not xRx.
(BR3) Completeness: For all x and y in X either xRy or yRx (or both).¹
(BR4) Transitivity: For all x, y, and z in X if xRy and yRz then xRz.
(BR5) Negative Transitivity: For all x, y, and z in X if xRy then either xRz or zRy (or both).
(BR6) Symmetry: For all x and y in X if xRy then yRx.
(BR7) Anti-Symmetry: For all x and y in X if xRy and yRx then x = y.
(BR8) Asymmetry: For all x and y in X if xRy then not yRx.

¹ We shall always implicitly include “or both” when we say “either. . . or.”
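On a finite piece of X each of these properties can be checked by brute force. The sketch below (the helper functions are our own) tests the relations R, P, and I of Example 3 on the finite set {1, . . . , 6}; a property verified on a finite subset is only evidence about the infinite set, though a failure there is a genuine counterexample:

```python
# Property checkers for a binary relation rel on a finite set X,
# following (BR1)-(BR8) above.
def complete(rel, X):
    return all(rel(x, y) or rel(y, x) for x in X for y in X)

def transitive(rel, X):
    return all(not (rel(x, y) and rel(y, z)) or rel(x, z)
               for x in X for y in X for z in X)

def symmetric(rel, X):
    return all(not rel(x, y) or rel(y, x) for x in X for y in X)

def asymmetric(rel, X):
    return all(not (rel(x, y) and rel(y, x)) for x in X for y in X)

X = range(1, 7)  # a finite piece of the natural numbers of Example 3
R = lambda x, y: x + 1 >= y
P = lambda x, y: x > y + 1
I = lambda x, y: -1 <= x - y <= 1

print(complete(R, X), transitive(R, X))    # R is complete but not transitive
print(asymmetric(P, X), transitive(P, X))  # P is asymmetric and transitive
print(symmetric(I, X), transitive(I, X))   # I is symmetric but not transitive
```

For instance 1R2 and 2R3 hold but 1R3 fails, which is the failure of transitivity the last column reports.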
Exercise 10. Show that completeness implies reflexivity, that asymmetry implies anti-symmetry, and that asymmetry implies irreflexivity.

Exercise 11. Which properties does the relation described in Example 1 satisfy?

Exercise 12. Which properties do the relations described in Example 2 satisfy?

Exercise 13. Which properties do the relations described in Example 3 satisfy?
We now define a few particularly important classes of binary relations.

Definition 3. A weak order is a binary relation that satisfies transitivity and completeness.

Definition 4. A strict partial order is a binary relation that satisfies transitivity and asymmetry.

Definition 5. An equivalence is a binary relation that satisfies transitivity and symmetry.

You have almost certainly already met examples of such binary relations in your study of Economics. We normally assume that the weak preference, strict preference, and indifference relations of a consumer are weak orders, strict partial orders, and equivalences, though we actually typically assume a little more about the strict preference.
The following construction is also motivated by the idea of preference. Let us consider some binary relation R which we shall informally think of as a weak preference relation, though we shall not, for the moment, make any assumptions about the properties of R. Consider the relations P defined by xPy if and only if xRy and not yRx, and I defined by xIy if and only if xRy and yRx.

Exercise 14. Show that if R is a weak order then P is a strict partial order and I is an equivalence.

We could also think of starting with a strict preference P and defining the weak preference R in terms of P. We could do so either by defining R as xRy if and only if not yPx, or by defining R as xRy if and only if either xPy or not yPx.

Exercise 15. Show that these two definitions of R coincide if P is asymmetric.

Exercise 16. Show by example that P may be a strict partial order (so, by the previous result, the two definitions of R coincide) but R not a weak order. [Hint: If you cannot think of another example consider the binary relations defined in Example 3.]

Exercise 17. Show that if P is asymmetric and negatively transitive then
(i) P is transitive (and hence a strict partial order), and
(ii) R is a weak order.
4. Functions

Let X and Y be two sets. A function (or a mapping) f from the set X to the set Y is a rule that assigns to each x in X a unique element in Y, denoted by f(x). The notation

f : X → Y

is standard. The set X is called the domain of f and the set Y is called the codomain of f. The set of all values taken by f, i.e., the set

{y ∈ Y | there exists x in X such that y = f(x)}

is called the range of f. The range of a function need not coincide with its codomain Y.
There are several useful ways of visualising functions. A function can be thought of as a machine that operates on elements of the set X and transforms an input x into a unique output f(x). Note that the machine is not required to produce different outputs from different inputs. This analogy helps to distinguish between the function itself, f, and its particular value, f(x). The former is the machine, the latter is the output!² One of the reasons for this confusion is that in practice, to avoid being verbose, people often say things like ‘consider a function U(x, y) = x^α y^β’ instead of saying ‘consider a function defined for every pair (x, y) in R^2 by the equation U(x, y) = x^α y^β’.
A function can also be thought of as a transformation, or a mapping, of the set X into the set Y. In line with this interpretation is the common terminology: it is said that f(x) is the image of x under the function f. Again, it is important to remember that there may be points of Y which are the images of no point of X and that there may be different points of X which have the same images in Y. What is absolutely prohibited, however, is for a point from X to have several images in Y!
Part of the definition of a function is the specification of its domain. However, in applications, functions are quite often defined by an algebraic formula, without explicit specification of the domain. For example, a function may be defined as

f(x) = sin x + 145x^2.

The function f is then the rule that assigns the value sin x + 145x^2 to each value of x. The convention in such cases is that the domain of f is the set of all values of x for which the formula gives a unique value. Thus, if you come, for instance, across the function f(x) = 1/x you should assume that its domain is (−∞, 0) ∪ (0, ∞), unless specified otherwise.
For any subset A of X, the subset f(A) of Y consisting of those y such that y = f(x) for some x in A is called the image of A under f, that is,

f(A) = {y ∈ Y | there exists x in A such that y = f(x)}.

Thus, the range of f can be written as f(X). Similarly, one can define the inverse image. For any subset B of Y, the inverse image f⁻¹(B) of B is the set of x in X such that f(x) is in B, that is,

f⁻¹(B) = {x ∈ X | f(x) ∈ B}.
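For finite sets these definitions translate directly into set comprehensions. The sketch below (our own illustration, with f(x) = x² on a small domain chosen by us) computes an image and an inverse image, and shows that different points of X may share an image:

```python
X = {-2, -1, 0, 1, 2}
f = lambda x: x * x  # f : X -> Y, here squaring

def image(f, A):
    # f(A) = {y | y = f(x) for some x in A}
    return {f(x) for x in A}

def inverse_image(f, B, X):
    # f^{-1}(B) = {x in X | f(x) in B}
    return {x for x in X if f(x) in B}

print(image(f, {-1, 1, 2}))      # {1, 4}
print(inverse_image(f, {1}, X))  # {-1, 1}: two points with the same image
print(image(f, X))               # the range f(X) = {0, 1, 4}
```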
A function f is called a function onto Y (or a surjection) if the range of f is Y, i.e., if for every y ∈ Y there is (at least) one x ∈ X such that y = f(x). In other words, each element of Y is the image of (at least) one element of X. A function f is called one-to-one (or an injection) if f(x_1) = f(x_2) implies x_1 = x_2, that is, for every element y of f(X) there is a unique element x of X such that y = f(x). In other words, a one-to-one function maps different elements of X into different elements of Y. When a function f : X → Y is both onto and one-to-one it is called a bijection.
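On finite sets both properties are decidable by inspection. A minimal sketch (helper names and the example function are ours): a function is injective exactly when its list of values has no repeats, and surjective exactly when its range equals the codomain:

```python
def is_injective(f, X):
    # No two elements of X may share an image.
    values = [f(x) for x in X]
    return len(values) == len(set(values))

def is_surjective(f, X, Y):
    # The range f(X) must be all of Y.
    return {f(x) for x in X} == set(Y)

square = lambda x: x * x

print(is_injective(square, {0, 1, 2}))        # True on this domain
print(is_surjective(square, {0, 1, 2}, {0, 1, 4}))  # True: every y is hit
print(is_injective(square, {-1, 0, 1}))       # False: -1 and 1 share an image
```

Note how the verdict depends on the domain and codomain, not just the formula; the same rule x ↦ x² is injective on one domain and not on another.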
Exercise 18. Suppose that a set X has m elements and a set Y has n ≥ m elements. How many different functions are there from X to Y? From Y to X? How many of them are surjective? How many are injective? How many are bijective?
² Mathematician Robert Bartle put it as follows: “Only a fool would confuse a sausage-grinder with a sausage; however, enough people have confused functions with their values...”
Exercise 19. Find a function f : N → N which is
(i) surjective but not injective,
(ii) injective but not surjective,
(iii) neither surjective nor injective,
(iv) bijective.
If a function f is a bijection then it is possible to define a function g : Y → X such that g(y) = x where y = f(x). Thus, to each element y of Y is assigned the element x in X whose image under f is y. Since f is onto, g is defined for every y of Y, and since f is one-to-one, g(y) is unique. The function g is called the inverse of f and is usually written as f⁻¹. In that case, however, it’s not immediately clear what f⁻¹(x) means. Is it the inverse image of x under f or the image of x under f⁻¹? Happily enough they are the same if f⁻¹ exists!

Exercise 20. Prove that when a function f⁻¹ exists it is both onto and one-to-one and that the inverse of f⁻¹ is the function f itself.

If f : X → Y and g : Y → Z, then the function h : X → Z, defined as h(x) = g(f(x)), is called the composition of g with f and denoted by g ◦ f. Note that even if f ◦ g is well defined it is usually different from g ◦ f.
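A finite bijection can be stored as a dictionary of pairs, and then its inverse is literally the set of flipped pairs, which makes the identities f⁻¹ ◦ f = id and f ◦ f⁻¹ = id easy to verify. A small sketch (the particular sets and values are our own illustration):

```python
# A finite bijection f : {1,2,3} -> {'a','b','c'}, stored explicitly.
f = {1: 'a', 2: 'b', 3: 'c'}
f_inv = {y: x for x, y in f.items()}  # the inverse just flips each pair

assert all(f_inv[f[x]] == x for x in f)      # f^{-1} after f is the identity on X
assert all(f[f_inv[y]] == y for y in f_inv)  # f after f^{-1} is the identity on Y

# Composition h = g ◦ f, i.e. h(x) = g(f(x)).
g = {'a': 10, 'b': 20, 'c': 30}  # g : Y -> Z
h = {x: g[f[x]] for x in f}
print(h)  # {1: 10, 2: 20, 3: 30}
```

Flipping the pairs of a non-injective function would assign two preimages to one key, which is exactly why only bijections have inverses.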
Exercise 21. Let f : X → Y. Prove that there exist a surjection g : X → A where A ⊆ X and an injection h : A → Y such that f = h ◦ g. In other words, prove that any function can be written as a composition of a surjection and an injection.

The set G ⊂ X × Y of ordered pairs (x, f(x)) is called the graph of the function f.³ Of course, the fact that something is called a graph does not necessarily mean that it can be drawn!
5. Spaces

Sets are reasonably interesting mathematical objects to study. But to make them even more interesting (and useful for applications) sets are usually endowed with some additional properties, or structures. These new objects are called spaces. The structures are often modeled after the familiar properties of the space we live in and reflect (in axiomatic form) such notions as order, distance, addition, multiplication, etc.
Probably one of the most intuitive spaces is the space of the real numbers, R. We will briefly look at the axiomatic way of describing some of its properties.
Given the set of real numbers R, the operation of addition is the function + : R × R → R that maps any two elements x and y in R to an element denoted by x + y and called the sum of x and y. Addition satisfies the following axioms for all real numbers x, y, and z.

A1: x + y = y + x.
A2: (x + y) + z = x + (y + z).
A3: There exists an element, denoted by 0, such that x + 0 = x.
A4: For each x there exists an element, denoted by −x, such that x + (−x) = 0.

All the remaining properties of addition can be proven using these axioms. Note also that we can define another operation x − y as x + (−y) and call it subtraction.

³ Some people like the idea of the graph of a function so much that they define a function to be its graph.
Exercise 22. Prove that the axioms for addition imply the following statements.
(i) The element 0 is unique.
(ii) If x + y = x + z then y = z (a cancellation law).
(iii) −(−x) = x.
The operation of multiplication can be axiomatised in a similar way. Given the set of real numbers, R, the operation of multiplication is the function · : R × R → R that maps any two elements x and y in R to an element denoted by x · y and called the product of x and y. Multiplication satisfies the following axioms for all real numbers x, y, and z.

A5: x · y = y · x.
A6: (x · y) · z = x · (y · z).
A7: There exists an element, denoted by 1, such that x · 1 = x.
A8: For each x ≠ 0 there exists an element, denoted by x⁻¹, such that x · x⁻¹ = 1.

One more axiom (a distributive law) brings these two operations, addition and multiplication⁴, together.

A9: x(y + z) = xy + xz for all x, y, and z in R.

Another structure possessed by the real numbers has to do with the fact that the real numbers are ordered. The notion of x less than y can be axiomatised as follows. For any two distinct elements x and y either x < y or y < x and, in addition, if x < y and y < z then x < z.
Another example of a space (a very important and useful one) is n-dimensional real space⁵. Given a natural number n, define R^n to be the set of all possible ordered n-tuples of real numbers, with generic element denoted by x = (x_1, . . . , x_n). Thus, the space R^n is the n-fold Cartesian product of the set R with itself. The real numbers x_1, . . . , x_n are called the coordinates of the vector x. Two vectors x and y are equal if and only if x_1 = y_1, . . . , x_n = y_n. The operation of addition of two vectors is defined as

x + y = (x_1 + y_1, . . . , x_n + y_n).

Exercise 23. Prove that the addition of vectors in R^n satisfies the axioms of addition.

The role of multiplication in this space is played by the operation of multiplication by a real number, defined for all x in R^n and all α in R by

αx = (αx_1, . . . , αx_n).

Exercise 24. Prove that multiplication by a real number satisfies a distributive law.
6. Metric Spaces and Continuous Functions

The notion of a metric is the generalisation of the notion of distance between two real numbers.
Let X be a set and d : X × X → R a function. The function d is called a metric if it satisfies the following properties for all x, y, and z in X.

1. d(x, y) ≥ 0, and d(x, y) = 0 if and only if x = y,
2. d(x, y) = d(y, x),
3. d(x, y) ≤ d(x, z) + d(z, y).

⁴ From now on, to go easy on notation, we will follow the standard convention of not writing the symbol for multiplication, that is, of writing xy instead of x · y, etc.
⁵ We haven’t defined what the word dimension means yet, so just treat it as a (fancy) name.
The set X together with the function d is called a metric space, elements of X are usually called points, and the number d(x, y) is called the distance between x and y. The last property of a metric is called the triangle inequality.

Exercise 25. Let X be a non-empty set and d : X × X → R be a function that satisfies the following two properties for all x, y, and z in X.
(i) d(x, y) = 0 if and only if x = y,
(ii) d(x, y) ≤ d(x, z) + d(y, z).
Prove that d is a metric.

Exercise 26. Prove that d(x, y) + d(w, z) ≤ d(x, w) + d(x, z) + d(y, w) + d(y, z) for all x, y, w, and z in X, where d is some metric on X.
An obvious example of a metric space is the set of real numbers, R, together with the ‘usual’ distance, d(x, y) = |x − y|. Another example is n-dimensional Euclidean space R^n with the metric

d(x, y) = √((x_1 − y_1)² + · · · + (x_n − y_n)²).

Note that the same set can be endowed with different metrics, resulting in different metric spaces! For example, the set of all n-tuples of real numbers can be made into a metric space by use of the (non-Euclidean) metric

d_T(x, y) = |x_1 − y_1| + · · · + |x_n − y_n|,

which gives a metric space different from Euclidean R^n. This metric is sometimes called the Manhattan (or taxicab) metric. Another curious metric is the so-called French railroad metric, defined by

d_F(x, y) = 0 if x = y, and d_F(x, y) = d(x, P) + d(y, P) if x ≠ y,

where P is a particular point of R^n (called Paris) and the function d is the Euclidean distance.
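All three metrics are one-liners on tuples. In the sketch below (our own code; we place Paris at the origin purely for illustration) the same pair of points is measured three different ways, which makes concrete the claim that one set can carry several metric-space structures:

```python
import math

def d_euclid(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def d_taxi(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

PARIS = (0.0, 0.0)  # the distinguished point P, chosen here for illustration

def d_french(x, y):
    if x == y:
        return 0.0
    # Any trip between distinct points goes through Paris.
    return d_euclid(x, PARIS) + d_euclid(y, PARIS)

a = (3.0, 4.0)
print(d_euclid(a, (0.0, 0.0)))   # 5.0
print(d_taxi(a, (0.0, 0.0)))     # 7.0
print(d_french(a, (3.0, 4.0)))   # 0.0: the points coincide
print(d_french(a, (1.0, 0.0)))   # 6.0: 5 to Paris plus 1 from Paris
```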
Exercise 27. Prove that the French railroad metric d_F is a metric.

Exercise 28. Let X be a non-empty set and d : X × X → R be the function defined by

d(x, y) = 1 if x ≠ y, and d(x, y) = 0 if x = y.

Prove that d is a metric. (This metric is called the discrete metric.)
Using the notion of a metric it is possible to generalise the idea of a continuous function.
Suppose (X, d_X) and (Y, d_Y) are metric spaces, x_0 ∈ X, and f : X → Y is a function. Then f is continuous at x_0 if for every ε > 0 there exists a δ > 0 such that

d_Y(f(x_0), f(x)) < ε

for all points x ∈ X for which d_X(x_0, x) < δ.
The function f is continuous on X if f is continuous at every point of X.
Let’s prove that the function f(x) = x is continuous on R using the above definition. For all x_0 ∈ R, we have |f(x_0) − f(x)| = |x_0 − x| < ε as long as |x_0 − x| < δ = ε. That is, given any ε > 0 we are always able to find a δ, namely δ = ε, such that all points which are closer to x_0 than δ will have images which are closer to f(x_0) than ε.
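The argument above can be probed numerically. The sketch below (our own helper; it samples finitely many points, so it is supporting evidence for the ε-δ argument rather than a proof of it) takes f(x) = x and the choice δ = ε, samples points strictly inside (x_0 − δ, x_0 + δ), and confirms that every sampled image lies within ε of f(x_0):

```python
f = lambda x: x  # the identity function, as in the proof above

def check_continuity_at(f, x0, eps, delta, samples=1000):
    # Sample points strictly inside (x0 - delta, x0 + delta) and test
    # whether every image lies within eps of f(x0).
    step = 2 * delta / samples
    points = [x0 - delta + (k + 0.5) * step for k in range(samples)]
    return all(abs(f(x0) - f(x)) < eps for x in points)

print(check_continuity_at(f, x0=2.0, eps=0.1, delta=0.1))  # True
```

Running the same check on the function of Exercise 29 at x_0 = 0 fails for small ε no matter how small δ is taken, which is the numerical shadow of its discontinuity there.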
Exercise 29. Let f : R → R be the function defined by

f(x) = 1/x if x ≠ 0, and f(x) = 0 if x = 0.

Prove that f is continuous at every point of R, with the exception of 0.
7. Open Sets, Compact Sets, and the Weierstrass Theorem

Let x be a point in a metric space and r > 0. The open ball B(x, r) of radius r centred at x is the set of all y ∈ X such that d(x, y) < r. Thus, the open ball is the set of all points whose distance from the centre is strictly less than r. The ball is closed if the inequality is weak, d(x, y) ≤ r.
A set S in a metric space is open if for all x ∈ S there exists r ∈ R, r > 0, such that B(x, r) ⊂ S. A set S is closed if its complement

S^c = {x ∈ X | x ∉ S}

is open.
Exercise 30. Prove that an open ball is an open set.

Exercise 31. Prove that the intersection of any finite number of open sets is an open set.

A set S is bounded if there exists a closed ball of finite radius that contains it. Formally, S is bounded if there exists a closed ball B(x, r) such that S ⊂ B(x, r).

Exercise 32. Prove that the set S is bounded if and only if there exists a real number p > 0 such that d(x, x′) ≤ p for all x and x′ in S.

Exercise 33. Prove that the union of two bounded sets is a bounded set.
A collection (possibly infinite) of open sets U 1 , U 2 , . . . in a metric space is an<br />
open cover of the set S if S is contained in its union.<br />
A set S is compact if every open cover of S has a finite subcover. That is from<br />
any open cover can select a finite number of sets U i that still cover S.<br />
Note that the definition does not say that a set is compact if there is a finite<br />
open cover! That wouldn’t be a good definition as you can cover any set with the<br />
whole space, which is just one open set.<br />
Let’s see how to use this definition to show that something is not compact.<br />
Consider the set (0, 1) ∈ R. To prove that it is not compact we need to find an<br />
open cover of (0, 1) from which we cannot select a finite cover. The collection of<br />
open intervals (1/n, 1) for all integers n ≥ 2 is an open cover of (0, 1), because for<br />
any point x ∈ (0, 1) it is always possible to find an integer n such that n > 1/x, thus<br />
x ∈ (1/n, 1). But no finite subcover will do! Let (1/N, 1) be the maximal interval<br />
in a candidate finite subcover; then it is always possible to find a point x ∈ (0, 1) such<br />
that N < 1/x, so that x /∈ (1/N, 1).<br />
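The argument above can be checked numerically. In the following sketch (ours, not part of the notes) we take the finite subcover {(1/n, 1) : 2 ≤ n ≤ N} and exhibit a point of (0, 1) that it misses.<br />

```python
def covered(x, N):
    """True if x lies in some interval (1/n, 1) with 2 <= n <= N."""
    return any(1.0 / n < x < 1.0 for n in range(2, N + 1))

for N in (2, 10, 1000):
    witness = 1.0 / (N + 1)   # a point of (0, 1) below every 1/n in the subcover
    assert 0 < witness < 1
    assert not covered(witness, N)
```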
While this definition of compactness is quite useful for showing that the set<br />
under question is not compact, it is less useful for verifying that a set is indeed compact.<br />
A much more convenient characterisation of compact sets in finite-dimensional<br />
Euclidean space, R n , is given by the following theorem.<br />
Theorem 1. Any closed and bounded subset of R n is compact.<br />
But why are we interested in compactness at all? Because of the following extremely<br />
important theorem, the first version of which was proved by Karl Weierstrass<br />
around 1860.<br />
Theorem 2. Let S be a compact set in a metric space and f : S → R be a<br />
continuous function. Then f attains its maximum and minimum on S.
And why is this theorem important for us? Because many economic problems<br />
are concerned with finding a maximal (or a minimal) value of a function on some set.<br />
The Weierstrass theorem provides conditions under which such a search is meaningful!<br />
This theorem and its implications will be much dwelt upon later in the notes, so<br />
we just give one example here. The consumer utility maximisation problem is the<br />
problem of finding the maximum of a utility function subject to the budget constraint.<br />
According to the Weierstrass theorem, this problem has a solution if the utility function is<br />
continuous and the budget set is compact.<br />
8. Sequences and Subsequences<br />
Let us consider again some metric space (X, d). An infinite sequence of points<br />
in (X, d) is simply a list<br />
x 1 , x 2 , x 3 , . . . ,<br />
where . . . indicates that the list continues “forever.”<br />
We can be a bit more formal about this. We first consider the set of natural<br />
numbers (or counting numbers) 1, 2, 3, . . . , which we denote N. We can now define<br />
an infinite sequence in the following way.<br />
Definition 6. An infinite sequence of elements of X is a function from N to X.<br />
Notation. If we look at the previous definition we see that we might have<br />
a sequence s : N → X which would define s(1), s(2), s(3), . . . or in other words<br />
would define s(n) for any natural number n. Typically when we are referring to<br />
sequences we use subscripts (or sometimes superscripts) instead of parentheses and<br />
write s 1 , s 2 , s 3 , . . . and s n instead of s(1), s(2), s(3), . . . and s(n). Also rather than<br />
saying that s : N → X is a sequence we say that {s n } is a sequence or even that<br />
{s n } ∞ n=1 is a sequence.<br />
Let’s now examine a few examples.<br />
Example 4. Suppose that (X, d) is R the real numbers with the usual metric<br />
d(x, y) = |x − y|. Then {n}, { √ n}, and {1/n} are sequences.<br />
Example 5. Again, suppose that (X, d) is R the real numbers with the usual<br />
metric d(x, y) = |x − y|. Consider the sequence {x n } where<br />
x n = 1 if n is odd, and x n = 0 if n is even.<br />
We see that {n} and { √ n} get arbitrarily large as n gets larger, while in the last<br />
example x n “bounces” back and forth between 0 and 1 as n gets larger. However for<br />
{1/n} the element of the sequence gets closer and closer to 0 (and indeed arbitrarily<br />
close to 0). We say, in this case, that the sequence converges to zero or that the<br />
sequence has limit 0. This is a particularly important concept and so we shall give<br />
a formal definition.<br />
Definition 7. Let {x n } be a sequence of points in (X, d). We say that the<br />
sequence converges to x 0 ∈ X if for any ε > 0 there is N ∈ N such that if n > N<br />
then d(x n , x 0 ) < ε.<br />
Informally we can describe this by saying that if n is large then the distance<br />
from x n to x 0 is small.<br />
If the sequence {x n } converges to x 0 , then we often write x n → x 0 as n → ∞<br />
or lim n→∞ x n = x 0 .
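To make Definition 7 concrete, here is a small sketch (our own illustration, not part of the notes) that, for the sequence x n = 1/n and a given ε, finds an N beyond which d(x n , 0) < ε.<br />

```python
def find_N(eps):
    """Smallest N such that |1/n - 0| < eps for every n > N."""
    N = 1
    while 1.0 / (N + 1) >= eps:
        N += 1
    return N

for eps in (0.5, 0.1, 0.001):
    N = find_N(eps)
    # every later term is within eps of the limit 0
    assert all(abs(1.0 / n) < eps for n in range(N + 1, N + 1000))
```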
Exercise 34. Show that if the sequence {x n } converges to x 0 then it does not<br />
converge to any other value. Another way of saying this is that if<br />
the sequence converges then its limit is unique.<br />
We have now seen a number of examples of sequences. In some the sequence<br />
“runs off to infinity;” in others it “bounces around;” while in others it converges to<br />
a limit. Could a sequence do anything else? Could a sequence, for example, settle<br />
down, each element getting closer and closer to all future elements in the sequence,<br />
but not converge to any particular limit? In fact, depending on what the space<br />
X is, this is indeed possible.<br />
First let us recall the notion of a rational number. A rational number is a<br />
number that can be expressed as the ratio of two integers, that is r is rational if<br />
r = a/b with a and b integers and b ≠ 0. We usually denote the set of all rational<br />
numbers Q (since we have already used R for the real numbers). We now consider<br />
an example in which the underlying space X is Q. Consider the sequence of<br />
rational numbers defined in the following way:<br />
x 1 = 1, x n+1 = (x n + 2)/(x n + 1).<br />
This kind of definition is called a recursive definition. Rather than writing, as a<br />
function of n, what x n is we write what x 1 is and then what x n+1 is as a function<br />
of what x n is. We can obviously find any element of the sequence that we need, as<br />
long as we sequentially calculate all the preceding elements. In our case we’d have<br />
x 1 = 1<br />
x 2 = (1 + 2)/(1 + 1) = 3/2 = 1.5<br />
x 3 = (3/2 + 2)/(3/2 + 1) = 7/5 = 1.4<br />
x 4 = (7/5 + 2)/(7/5 + 1) = 17/12 ≈ 1.416667<br />
x 5 = (17/12 + 2)/(17/12 + 1) = 41/29 ≈ 1.413793<br />
x 6 = (41/29 + 2)/(41/29 + 1) = 99/70 ≈ 1.414286<br />
We see that the sequence goes up and down but that it seems to be “converging.”<br />
What is it converging to? Let’s suppose that it is converging to some value x 0 .<br />
Recall that<br />
x n+1 = (x n + 2)/(x n + 1).<br />
We’ll see later that if f is a continuous function then lim n→∞ f(x n ) = f(lim n→∞ x n ).<br />
In this case that means that<br />
x 0 = lim n→∞ x n+1 = lim n→∞ (x n + 2)/(x n + 1) = (x 0 + 2)/(x 0 + 1).<br />
Thus we have<br />
x 0 = (x 0 + 2)/(x 0 + 1)
and if we solve this we obtain x 0 = ± √ 2. Clearly if x n > 0 then x n+1 > 0 so<br />
our sequence can’t be converging to − √ 2 so we must have x 0 = √ 2. But √ 2 is<br />
not in Q. Thus we have a sequence of elements in Q that are getting very close to<br />
each other but are not converging to any element of Q. (Of course the sequence is<br />
converging to a point in R. In fact one construction of the real number system is<br />
in terms of such sequences in Q.)<br />
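A quick sketch (ours, not the notes’) iterating this recursion in exact rational arithmetic shows the terms approaching √ 2 even though every term is rational:<br />

```python
from fractions import Fraction
import math

x = Fraction(1)
for _ in range(20):
    x = (x + 2) / (x + 1)   # every iterate stays in Q

# after 20 steps the rational iterate is extremely close to sqrt(2),
# which itself is not in Q
assert abs(float(x) - math.sqrt(2)) < 1e-12
```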
Definition 8. Let {x n } be a sequence of points in (X, d). We say that the<br />
sequence is a Cauchy sequence if for any ε > 0 there is N ∈ N such that if n, m > N<br />
then d(x n , x m ) < ε.<br />
Exercise 35. Show that if {x n } converges then {x n } is a Cauchy sequence.<br />
A metric space (X, d) in which every Cauchy sequence converges to a limit in<br />
X is called a complete metric space. The space of real numbers R is a complete<br />
metric space, while the space of rationals Q is not.<br />
Exercise 36. Is N, the space of natural or counting numbers with metric d<br />
given by d(x, y) = |x − y|, a complete metric space?<br />
In Section 6 we defined the notion of a function being continuous at a point.<br />
It is possible to give that definition in terms of sequences.<br />
Definition 9. Suppose (X, d X ) and (Y, d Y ) are metric spaces, x 0 ∈ X, and<br />
f : X → Y is a function. Then f is continuous at x 0 if for every sequence {x n } that<br />
converges to x 0 in (X, d X ) the sequence {f(x n )} converges to f(x 0 ) in (Y, d Y ).<br />
Exercise 37. Show that the function f(x) = (x + 2)/(x + 1) is continuous at<br />
any point x ≠ −1. Show that this means that if x n → x 0 as n → ∞ (with all x n<br />
and x 0 unequal to −1) then<br />
lim n→∞ (x n + 2)/(x n + 1) = (x 0 + 2)/(x 0 + 1).<br />
We can also define the concept of a closed set (and hence the concepts of open<br />
sets and compact sets) in terms of sequences.<br />
Definition 10. Let (X, d) be a metric space. A set S ⊂ X is closed if for any<br />
convergent sequence {x n } with x n ∈ S for all n we have lim n→∞ x n ∈ S. A set is<br />
open if its complement is closed.<br />
Given a sequence {x n } we can define a new sequence by taking only some of<br />
the elements of the original sequence. In the example we considered earlier in which<br />
x n was 1 if n was odd and 0 if n was even we could take only the odd n and thus<br />
obtain a sequence that did converge. The new sequence is called a subsequence of<br />
the old sequence.<br />
Definition 11. Let {x n } be some sequence in (X, d). Let {n j } ∞ j=1 be a<br />
sequence of natural numbers such that for each j we have n j < n j+1 , that is<br />
n 1 < n 2 < n 3 < . . . . The sequence {x nj } ∞ j=1 is called a subsequence of the original<br />
sequence.<br />
The notion of a subsequence is often useful. We often use it in the way that<br />
we briefly referred to above. We initially have a sequence that may not converge,<br />
but we are able to take a subsequence that does converge. Such a subsequence is<br />
called a convergent subsequence.<br />
Definition 12. A subset of a metric space with the property that every sequence<br />
in the subset has a subsequence converging to a point of the subset is called sequentially compact.<br />
Theorem 3. In any metric space any compact set is sequentially compact.
If we restrict attention to finite dimensional Euclidean spaces the situation is<br />
even better behaved.<br />
Theorem 4. Any subset of R n is sequentially compact if and only if it is<br />
compact.<br />
Exercise 38. Verify the following limits.<br />
(i) lim n→∞ n/(n + 1) = 1<br />
(ii) lim n→∞ (n + 3)/ √ (n 2 + 1) = 1<br />
(iii) lim n→∞ ( √ (n + 1) − √ n) = 0<br />
(iv) lim n→∞ (a n + b n ) 1/n = max{a, b} (for a, b > 0)<br />
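These limits can be checked numerically; the following sketch (ours) evaluates each expression at a large n and compares it with the claimed limit.<br />

```python
import math

n = 10**6
assert abs(n / (n + 1) - 1) < 1e-5                    # (i)
assert abs((n + 3) / math.sqrt(n**2 + 1) - 1) < 1e-5  # (ii)
assert abs(math.sqrt(n + 1) - math.sqrt(n)) < 1e-2    # (iii)

a, b = 2.0, 3.0                                       # (iv)
m = 100  # a moderate exponent, to keep a**m and b**m within float range
assert abs((a**m + b**m) ** (1.0 / m) - max(a, b)) < 1e-2
```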
Exercise 39. Consider a sequence {x n } in R. What can you say about the<br />
sequence if it converges and, for each n, x n is an integer?<br />
Exercise 40. Consider the sequence<br />
1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, . . . .<br />
For which values z ∈ R is there a subsequence converging to z?<br />
Exercise 41. Prove that if a subsequence of a Cauchy sequence converges to<br />
a limit z then so does the original Cauchy sequence.<br />
Exercise 42. Prove that any subsequence of a convergent sequence converges.<br />
Finally one somewhat less trivial exercise.<br />
Exercise 43. Prove that if lim n→∞ x n = z then<br />
lim n→∞ (x 1 + · · · + x n )/n = z.<br />
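A numerical sketch (ours) of Exercise 43 for x n = 1 + 1/n, whose limit is z = 1: the running averages approach 1 as well, though more slowly.<br />

```python
def cesaro_mean(seq):
    """Average of the first len(seq) terms of a sequence."""
    return sum(seq) / len(seq)

xs = [1 + 1.0 / n for n in range(1, 100001)]
assert abs(xs[-1] - 1.0) < 1e-4           # the sequence itself converges to 1
assert abs(cesaro_mean(xs) - 1.0) < 1e-3  # and so does the sequence of averages
```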
9. Linear Spaces<br />
The notion of linear space is the axiomatic way of looking at the familiar linear<br />
operations: addition and multiplication. A trivial example of a linear space is the<br />
set of real numbers, R.<br />
What is the operation of addition? One way of answering the question is<br />
to say that the operation of addition is just the list of its properties. So, we will<br />
define the addition of elements from some set X as the operation that satisfies the<br />
following four axioms.<br />
A1: x + y = y + x for all x and y in X.<br />
A2: x + (y + z) = (x + y) + z, for all x, y, and z in X.<br />
A3: There exists an element, denoted by 0, such that x + 0 = x for all x in<br />
X.<br />
A4: For every x in X there exists an element y in X, called the inverse of x, such<br />
that x + y = 0.<br />
And, to make things more interesting, we will also introduce the operation of<br />
‘multiplication by a number’ by adding two more axioms.<br />
A5: 1x = x for all x in X.<br />
A6: α(βx) = (αβ)x for all x in X and for all α and β in R.<br />
Finally, two more axioms relating addition and multiplication.<br />
A7: α(x + y) = αx + αy for all x and y in X and for all α in R.<br />
A8: (α + β)x = αx + βx for all x in X and for all α and β in R.
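As a sketch (ours, not part of the notes), one can spot-check axioms A1–A8 for the space X = R 2 with coordinatewise operations; the values chosen make every computation exact in floating point:<br />

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def smul(a, x):
    return (a * x[0], a * x[1])

x, y, z = (1.0, 2.0), (3.0, -1.0), (0.5, 4.0)
a, b = 2.0, -3.0
zero = (0.0, 0.0)

assert add(x, y) == add(y, x)                             # A1
assert add(x, add(y, z)) == add(add(x, y), z)             # A2
assert add(x, zero) == x                                  # A3
assert add(x, smul(-1.0, x)) == zero                      # A4
assert smul(1.0, x) == x                                  # A5
assert smul(a, smul(b, x)) == smul(a * b, x)              # A6
assert smul(a, add(x, y)) == add(smul(a, x), smul(a, y))  # A7
assert add(smul(a, x), smul(b, x)) == smul(a + b, x)      # A8
```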
Elements x, y, . . . , w are linearly dependent if there exist real numbers α, β, . . . , λ,<br />
not all of them equal to zero, such that<br />
αx + βy + · · · + λw = 0.<br />
Otherwise, the elements x, y, . . . , w are linearly independent.<br />
If in a space L it is possible to find n linearly independent elements, but any<br />
n + 1 elements are linearly dependent, then we say that the space L has dimension<br />
n.<br />
A nonempty subset L ′ of a linear space L is called a linear subspace if L ′ forms<br />
a linear space in itself. In other words, L ′ is a linear subspace of L if for any x and<br />
y in L ′ and all α and β in R<br />
αx + βy ∈ L ′ .
CHAPTER 2<br />
Linear Algebra<br />
1. The Space R n<br />
In the previous chapter we introduced the concept of a linear space or a vector<br />
space. We shall now examine in some detail one example of such a space. This is<br />
the space of all ordered n-tuples (x 1 , x 2 , . . . , x n ) where each x i is a real number.<br />
We call this space n-dimensional real space and denote it R n .<br />
Remember from the previous chapter that to define a vector space we not only<br />
need to define the points in that space but also to define how we add such points<br />
and how we multiply such points by scalars. In the case of R n we do this element<br />
by element in the n-tuple or vector. That is,<br />
(x 1 , x 2 , . . . , x n ) + (y 1 , y 2 , . . . , y n ) = (x 1 + y 1 , x 2 + y 2 , . . . , x n + y n )<br />
and<br />
α(x 1 , x 2 , . . . , x n ) = (αx 1 , αx 2 , . . . , αx n ).<br />
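A minimal sketch (ours) of these coordinatewise operations, with vectors in R n as Python tuples:<br />

```python
def vadd(x, y):
    """Coordinatewise sum of two vectors in R^n."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def vscale(alpha, x):
    """Multiply a vector in R^n by the scalar alpha."""
    return tuple(alpha * xi for xi in x)

assert vadd((1, 2, 3), (4, 5, 6)) == (5, 7, 9)
assert vscale(2, (1, 2, 3)) == (2, 4, 6)
```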
Let us consider the case that n = 2, that is, the case of R 2 . In this case we can<br />
visualise the space as in the following diagram. The vector (x 1 , x 2 ) is represented<br />
by the point that is x 1 units along from the point (0, 0) in the horizontal direction<br />
and x 2 units up from (0, 0) in the vertical direction.<br />
[Figure 1: the vector (1, 2) in R 2 , plotted with horizontal axis x 1 and vertical axis x 2 .]<br />
Let us for the moment continue our discussion in R 2 . Notice that we are<br />
implicitly writing a vector (x 1 , x 2 ) as a sum x 1 × v 1 + x 2 × v 2 where v 1 is the<br />
unit vector in the first direction and v 2 is the unit vector in the second direction.<br />
Suppose that instead we considered the vectors u 1 = (2, 1) = 2 × v 1 + 1 × v 2 and<br />
u 2 = (1, 2) = 1 × v 1 + 2 × v 2 . We could have written any vector (x 1 , x 2 ) instead<br />
as z 1 × u 1 + z 2 × u 2 where z 1 = (2x 1 − x 2 )/3 and z 2 = (2x 2 − x 1 )/3. That is, for<br />
any vector in R 2 we can uniquely write that vector in terms of u 1 and u 2 . Is there<br />
anything that is special about u 1 and u 2 that allows us to make this claim? There<br />
must be since we can easily find other vectors for which this would not have been<br />
true. (For example, (1, 2) and (2, 4).)<br />
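The coefficients z 1 = (2x 1 − x 2 )/3 and z 2 = (2x 2 − x 1 )/3 can be verified directly; this sketch (ours) checks that z 1 u 1 + z 2 u 2 reproduces (x 1 , x 2 ):<br />

```python
u1, u2 = (2.0, 1.0), (1.0, 2.0)

def coords_in_u(x1, x2):
    """Coordinates of (x1, x2) relative to the basis {u1, u2}."""
    return (2 * x1 - x2) / 3, (2 * x2 - x1) / 3

for x1, x2 in [(1.0, 2.0), (-3.0, 0.5), (4.0, 4.0)]:
    z1, z2 = coords_in_u(x1, x2)
    v = (z1 * u1[0] + z2 * u2[0], z1 * u1[1] + z2 * u2[1])
    assert abs(v[0] - x1) < 1e-12 and abs(v[1] - x2) < 1e-12
```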
The key property of the pair of vectors u 1 and u 2 is that they are linearly independent.<br />
That is, we cannot write either as a multiple of the other. More generally, in n dimensions<br />
we would say that we cannot write any of the vectors as a linear combination of<br />
the others, or, equivalently, as in the following definition.<br />
Definition 13. The vectors x 1 , . . . , x k all in R n are linearly independent if it<br />
is not possible to find scalars α 1 , . . . , α k not all zero such that<br />
α 1 x 1 + · · · + α k x k = 0.<br />
Notice that we do not as a matter of definition require that k = n or even that<br />
k ≤ n. We state as a result that if k > n then the collection x 1 , . . . , x k cannot<br />
be linearly independent. (In a real maths course we would, of course, have proved<br />
this.)<br />
Comment 1. If you examine the definition above you will notice that there<br />
is nowhere that we actually need to assume that our vectors are in R n . We can<br />
in fact apply the same definition of linear independence to any vector space. This<br />
allows us to define the concept of the dimension of an arbitrary vector space as the<br />
maximal number of linearly independent vectors in that space. In the case of R n<br />
we obtain that the dimension is in fact n.<br />
Exercise 44. Suppose that x 1 , . . . , x k all in R n are linearly independent and<br />
that the vector y in R n is equal to β 1 x 1 + · · · + β k x k . Show that this is the only<br />
way that y can be expressed as a linear combination of the x i ’s. (That is show that<br />
if y = γ 1 x 1 + · · · + γ k x k then β 1 = γ 1 , . . . , β k = γ k .)<br />
The set of all vectors that can be written as a linear combination of the vectors<br />
x 1 , . . . , x k is called the span of those vectors. If x 1 , . . . , x k are linearly independent<br />
and if the span of x 1 , . . . , x k is all of R n then the collection { x 1 , . . . , x k } is called<br />
a basis for R n . (Of course, in this case we must have k = n.) Any vector in R n<br />
can be uniquely represented as a linear combination of the vectors x 1 , . . . , x k . We<br />
shall later see that it can sometimes be useful to choose a particular basis in which<br />
to represent the vectors with which we deal.<br />
It may be that we have a collection of vectors { x 1 , . . . , x k } whose span is not<br />
all of R n . In this case we call the span of { x 1 , . . . , x k } a linear subspace of R n .<br />
Alternatively we say that X ⊂ R n is a linear subspace of R n if X is closed under<br />
vector addition and scalar multiplication. That is, if for all x, y ∈ X the vector<br />
x + y is also in X and for all x ∈ X and α ∈ R the vector αx is in X. If the span<br />
of x 1 , . . . , x k is X and if x 1 , . . . , x k are linearly independent then we say that these<br />
vectors are a basis for the linear subspace X. In this case the dimension of the<br />
linear subspace X is k. In general the dimension of the span of x 1 , . . . , x k is equal<br />
to the maximum number of linearly independent vectors in x 1 , . . . , x k .<br />
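Using NumPy (an assumption of this sketch; the notes themselves do not use any software), the dimension of the span of a collection of vectors is the rank of the matrix whose rows are those vectors:<br />

```python
import numpy as np

vectors = np.array([
    [1.0, 2.0, 0.0],
    [2.0, 4.0, 0.0],   # a multiple of the first vector
    [0.0, 1.0, 1.0],
])

# dimension of the span = maximal number of linearly independent vectors
assert np.linalg.matrix_rank(vectors) == 2
```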
Finally, we comment that R n is a metric space with metric d : R n × R n → R +<br />
defined by<br />
d((x 1 , . . . , x n ), (y 1 , . . . , y n )) = √ ((x 1 − y 1 ) 2 + · · · + (x n − y n ) 2 ).<br />
There are many other metrics we could define on this space but this is the standard<br />
one.
2. Linear Functions from R n to R m<br />
In the previous section we introduced the space R n . Here we shall discuss<br />
functions from one such space to another (possibly of different dimension). The<br />
concept of continuity that we introduced for metric spaces is immediately applicable<br />
here. We shall be mainly concerned here with an even narrower class of functions,<br />
namely, the linear functions.<br />
Definition 14. A function f : R n → R m is said to be a linear function if it<br />
satisfies the following two properties.<br />
(1) f(x + y) = f(x) + f(y) for all x, y ∈ R n , and<br />
(2) f(αx) = αf(x) for all x ∈ R n and α ∈ R.<br />
Comment 2. When considering functions of a single real variable, that is,<br />
functions from R to R, functions of the form f(x) = ax + b, where a and b are<br />
fixed constants are sometimes called linear functions. It is easy to see that if b ≠ 0<br />
then such functions do not satisfy the conditions given above. We shall call such<br />
functions affine functions. More generally we shall call a function g : R n → R m an<br />
affine function if it is the sum of a linear function f : R n → R m and a constant<br />
b ∈ R m . That is, if for any x ∈ R n g(x) = f(x) + b.<br />
Let us now suppose that we have two linear functions f : R n → R m and<br />
g : R n → R m . It is straightforward to show that the function (f + g) : R n → R m<br />
defined by (f + g)(x) = f(x) + g(x) is also a linear function. Similarly if we have a<br />
linear function f : R n → R m and a constant α ∈ R the function (αf) : R n → R m<br />
defined by (αf)(x) = αf(x) is a linear function. If f : R n → R m and g : R m →<br />
R k are linear functions then the composite function g ◦ f : R n → R k defined by<br />
g ◦ f(x) = g(f(x)) is again a linear function. Finally, if f : R n → R n is not only<br />
linear, but also one-to-one and onto so that it has an inverse f −1 : R n → R n then<br />
the inverse function is also a linear function.<br />
Exercise 45. Prove the facts stated in the previous paragraph.<br />
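A numerical sketch (ours) illustrating the closure properties just listed, for two particular linear maps f, g : R 2 → R 2 (the maps are our own examples):<br />

```python
def f(x):
    return (x[0] + 2 * x[1], 3 * x[0])   # a linear map R^2 -> R^2

def g(x):
    return (x[1], x[0] - x[1])           # another linear map R^2 -> R^2

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def smul(a, x):
    return (a * x[0], a * x[1])

x, y, a = (1.0, -2.0), (0.5, 3.0), 4.0

# the composition g o f is again linear; for example it respects
# vector addition and scalar multiplication at these points:
h = lambda v: g(f(v))
assert h(add(x, y)) == add(h(x), h(y))
assert h(smul(a, x)) == smul(a, h(x))
```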
Recall in the previous section we defined the notion of a linear subspace. A<br />
linear function f : R n → R m defines two important subspaces, the image of f,<br />
denoted Im(f) ⊂ R m , and the kernel of f, denoted Ker(f) ⊂ R n . The image of f<br />
is the set of all vectors in R m such that f maps some vector in R n to that vector,<br />
that is,<br />
Im(f) = { y ∈ R m | ∃x ∈ R n such that y = f(x) }.<br />
The kernel of f is the set of all vectors in R n that are mapped by the function f<br />
to the zero vector in R m , that is,<br />
Ker(f) = { x ∈ R n | f(x) = 0 }.<br />
The kernel of f is sometimes called the null space of f.<br />
It is intuitively clear that the dimension of Im(f) is no more than n. (It is of<br />
course no more than m since it is contained in R m .) <strong>Of</strong> course, in general it may be<br />
less than n, for example if m < n or if f mapped all points in R n to the zero vector<br />
in R m . (You should satisfy yourself that this function is indeed a linear function.)<br />
However if the dimension of Im(f) is indeed less than n it means that the function<br />
has mapped the n-dimensional space R n into a linear space of lower dimension and<br />
that in the process some dimensions have been lost. The linearity of f means that<br />
a linear subspace of dimension equal to the number of dimensions that have been<br />
lost must have been collapsed to the zero vector (and that translates of this linear<br />
subspace have been collapsed to single points). Thus we can say that<br />
dim(Im(f)) + dim(Ker(f)) = n.
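With NumPy (again an assumption of this sketch, not something the notes use) the identity dim(Im(f)) + dim(Ker(f)) = n can be checked for the linear map represented by a matrix A:<br />

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])   # represents a map from R^3 to R^2
n = A.shape[1]

s = np.linalg.svd(A, compute_uv=False)
dim_im = int(np.sum(s > 1e-10))   # rank of A = dim(Im f)
dim_ker = n - dim_im              # nullity, read off from the same rank

# the last rows of Vt span the kernel: A maps each of them to zero
_, _, Vt = np.linalg.svd(A)
for v in Vt[dim_im:]:
    assert np.allclose(A @ v, 0.0)

assert dim_im == 1 and dim_ker == 2
assert dim_im + dim_ker == n
```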
In the following section we shall introduce the notion of a matrix and define<br />
various operations on matrices. If you are like me when I first came across matrices,<br />
these definitions may seem somewhat arbitrary and mysterious. However, we shall<br />
see that matrices may be viewed as representations of linear functions and that when<br />
viewed in this way the operations we define on matrices are completely natural.<br />
3. Matrices and Matrix Algebra<br />
A matrix is defined as a rectangular array of numbers. If the matrix contains<br />
m rows and n columns it is called an m × n matrix (read “m by n” matrix). The<br />
element in the ith row and the jth column is called the ijth element. We typically<br />
enclose a matrix in square brackets [ ] and write it as<br />
⎡ a 11 . . . a 1n ⎤<br />
⎢ .    . ..    . ⎥<br />
⎣ a m1 . . . a mn ⎦ .<br />
In the case that m = n we call the matrix a square matrix. If m = 1 the matrix<br />
contains a single row and we call it a row vector. If n = 1 the matrix contains<br />
a single column and we call it a column vector. For most purposes we do not<br />
distinguish between a 1 × 1 matrix [a] and the scalar a.<br />
Just as we defined the operation of vector addition and the multiplication of<br />
a vector by a scalar we define similar operations for matrices. In order to be able<br />
to add two matrices we require that the matrices be of the same dimension. That<br />
is, if matrix A is of dimension m × n we shall be able to add the matrix B to it<br />
if and only if B is also of dimension m × n. If this condition is met then we add<br />
matrices simply by adding the corresponding elements of each matrix to obtain the<br />
new m × n matrix A + B. That is,<br />
⎡ a 11 . . . a 1n ⎤   ⎡ b 11 . . . b 1n ⎤   ⎡ a 11 + b 11 . . . a 1n + b 1n ⎤<br />
⎢ .    . ..    . ⎥ + ⎢ .    . ..    . ⎥ = ⎢ .           . ..           . ⎥<br />
⎣ a m1 . . . a mn ⎦   ⎣ b m1 . . . b mn ⎦   ⎣ a m1 + b m1 . . . a mn + b mn ⎦ .<br />
We can see that this definition of matrix addition satisfies many of the same<br />
properties of the addition of scalars. If A, B, and C are all m × n matrices then<br />
(1) A + B = B + A,<br />
(2) (A + B) + C = A + (B + C),<br />
(3) there is a zero matrix 0 such that for any m×n matrix A we have A+0 =<br />
0 + A = A, and<br />
(4) there is a matrix −A such that A + (−A) = (−A) + A = 0.<br />
Of course, the zero matrix referred to in (3) is simply the m×n matrix consisting<br />
of all zeros (this is called a null matrix) and the matrix −A referred to in 4 is the<br />
matrix obtained from A by replacing each element of A by its negative, that is,<br />
⎡ a 11 . . . a 1n ⎤   ⎡ −a 11 . . . −a 1n ⎤<br />
− ⎢ .    . ..    . ⎥ = ⎢ .      . ..      . ⎥<br />
⎣ a m1 . . . a mn ⎦   ⎣ −a m1 . . . −a mn ⎦ .<br />
Now, given a scalar α in R and an m × n matrix A we define the product of α<br />
and A which we write αA to be the matrix in which each element is replaced by α<br />
times that element, that is,<br />
⎡ a 11 . . . a 1n ⎤   ⎡ αa 11 . . . αa 1n ⎤<br />
α ⎢ .    . ..    . ⎥ = ⎢ .      . ..      . ⎥<br />
⎣ a m1 . . . a mn ⎦   ⎣ αa m1 . . . αa mn ⎦ .
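A sketch (ours) of matrix addition and multiplication by a scalar, with matrices as nested lists:<br />

```python
def mat_add(A, B):
    """Entrywise sum of two m-by-n matrices (same dimensions required)."""
    assert len(A) == len(B) and len(A[0]) == len(B[0])
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(alpha, A):
    """Multiply every entry of A by the scalar alpha."""
    return [[alpha * a for a in row] for row in A]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert mat_add(A, B) == [[6, 8], [10, 12]]
assert mat_scale(2, A) == [[2, 4], [6, 8]]
```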
So far the definitions of matrix operations have all seemed the most natural<br />
ones. We now come to defining matrix multiplication. Perhaps here the definition<br />
seems somewhat less natural. However in the next section we shall see that the definition<br />
we shall give is in fact very natural when we view matrices as representations<br />
of linear functions.<br />
We define matrix multiplication of A times B written as AB where A is an<br />
m × n matrix and B is a p × q matrix only when n = p. In this case the product<br />
AB is defined to be an m × q matrix in which the element in the ith row and jth<br />
column is ∑ n k=1 a ik b kj . That is, to find the term to go in the ith row and the jth<br />
column of the product matrix AB we take the ith row of the matrix A which will<br />
be a row vector with n elements and the jth column of the matrix B which will be<br />
a column vector with n elements. We then multiply each element of the first vector<br />
by the corresponding element of the second and add all these products. Thus<br />
⎡ a 11 . . . a 1n ⎤ ⎡ b 11 . . . b 1q ⎤   ⎡ ∑ n k=1 a 1k b k1 . . . ∑ n k=1 a 1k b kq ⎤<br />
⎢ .    . ..    . ⎥ ⎢ .    . ..    . ⎥ = ⎢ .              . ..              . ⎥<br />
⎣ a m1 . . . a mn ⎦ ⎣ b n1 . . . b nq ⎦   ⎣ ∑ n k=1 a mk b k1 . . . ∑ n k=1 a mk b kq ⎦ .<br />
For example<br />
          ⎡ p q ⎤<br />
⎡ a b c ⎤ ⎢ r s ⎥ = ⎡ ap + br + ct aq + bs + cv ⎤<br />
⎣ d e f ⎦ ⎣ t v ⎦   ⎣ dp + er + ft dq + es + fv ⎦ .<br />
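The worked product above can be checked mechanically; this sketch (ours) implements the row-by-column rule and tests it on the same 2 × 3 and 3 × 2 matrices with numerical entries:<br />

```python
def mat_mul(A, B):
    """Product of an m-by-n matrix A and an n-by-q matrix B."""
    n = len(A[0])
    assert n == len(B)
    q = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(q)]
            for i in range(len(A))]

a, b, c, d, e, f = 1, 2, 3, 4, 5, 6
p, q, r, s, t, v = 7, 8, 9, 10, 11, 12

AB = mat_mul([[a, b, c], [d, e, f]], [[p, q], [r, s], [t, v]])
assert AB == [[a*p + b*r + c*t, a*q + b*s + c*v],
              [d*p + e*r + f*t, d*q + e*s + f*v]]
```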
We define the identity matrix of order n to be the n × n matrix that has 1’s on<br />
its main diagonal and zeros elsewhere, that is, whose ijth element is 1 if i = j and<br />
zero if i ≠ j. We denote this matrix by I n or, if the order is clear from the context,<br />
simply I. That is,<br />
⎡ 1 0 . . . 0 ⎤<br />
I = ⎢ 0 1 . . . 0 ⎥<br />
⎢ . .  . ..  . ⎥<br />
⎣ 0 0 . . . 1 ⎦ .<br />
It is easy to see that if A is an m × n matrix then AI n = A and I m A = A. In fact,<br />
we could equally well define the identity matrix to be that matrix that satisfies<br />
these properties for all such matrices A in which case it would be easy to show that<br />
there was a unique matrix satisfying this property, namely, the matrix we defined<br />
above.<br />
Consider an m × n matrix A. The columns of A are m-dimensional vectors,<br />
that is, elements of R m and the rows of A are elements of R n . Thus we can ask<br />
if the n columns are linearly independent and similarly if the m rows are linearly<br />
independent. In fact we ask: What is the maximum number of linearly independent<br />
columns of A? It turns out that this is the same as the maximum number of linearly<br />
independent rows of A. We call the number the rank of the matrix A.
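The equality of the maximal numbers of independent rows and columns can be illustrated with NumPy (an assumption of this sketch):<br />

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])   # third row = first + second, so rank 2

# the rank of A equals the rank of its transpose:
# max independent columns = max independent rows
assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T) == 2
```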
4. Matrices as Representations of Linear Functions<br />
Let us suppose that we have a particular linear function f : R n → R m . We have<br />
suggested in the previous section that such a function can necessarily be represented<br />
as multiplication by some matrix. We shall now show that this is true. Moreover<br />
we shall do so by explicitly constructing the appropriate matrix.<br />
Let us write the n-dimensional vector x as a column vector<br />
⎡ x 1 ⎤<br />
x = ⎢ x 2 ⎥<br />
⎢ .   ⎥<br />
⎣ x n ⎦ .<br />
Now, notice that we can write the vector x as a sum ∑ n i=1 x i e i , where e i is the ith<br />
unit vector, that is, the vector with 1 in the ith place and zeros elsewhere. That is,<br />
⎡ x 1 ⎤      ⎡ 1 ⎤      ⎡ 0 ⎤            ⎡ 0 ⎤<br />
⎢ x 2 ⎥ = x 1 ⎢ 0 ⎥ + x 2 ⎢ 1 ⎥ + · · · + x n ⎢ 0 ⎥<br />
⎢ .   ⎥      ⎢ . ⎥      ⎢ . ⎥            ⎢ . ⎥<br />
⎣ x n ⎦      ⎣ 0 ⎦      ⎣ 0 ⎦            ⎣ 1 ⎦ .<br />
Now from the linearity of the function f we can write<br />
f(x) = f( ∑ n i=1 x i e i ) = ∑ n i=1 f(x i e i ) = ∑ n i=1 x i f(e i ).<br />
But, what is f(e i )? Remember that e i is a unit vector in R n and that f maps<br />
vectors in R n to vectors in R m . Thus f(e i ) is the image in R m of the vector e i . Let<br />
us write f(e i ) as<br />
⎡ a 1i ⎤<br />
f(e i ) = ⎢ a 2i ⎥<br />
⎢ .    ⎥<br />
⎣ a mi ⎦ .<br />
Thus<br />
f(x) = ∑ n i=1 x i f(e i )<br />
⎡ a 11 ⎤      ⎡ a 12 ⎤            ⎡ a 1n ⎤<br />
= x 1 ⎢ a 21 ⎥ + x 2 ⎢ a 22 ⎥ + · · · + x n ⎢ a 2n ⎥<br />
⎢ .    ⎥      ⎢ .    ⎥            ⎢ .    ⎥<br />
⎣ a m1 ⎦      ⎣ a m2 ⎦            ⎣ a mn ⎦<br />
⎡ ∑ n i=1 a 1i x i ⎤<br />
= ⎢ ∑ n i=1 a 2i x i ⎥<br />
⎢ .              ⎥<br />
⎣ ∑ n i=1 a mi x i ⎦
and this is exactly what we would have obtained had we multiplied the matrices<br />
⎡ a 11 a 12 . . . a 1n ⎤ ⎡ x 1 ⎤<br />
⎢ a 21 a 22 . . . a 2n ⎥ ⎢ x 2 ⎥<br />
⎢ .    .    . ..    . ⎥ ⎢ .  ⎥<br />
⎣ a m1 a m2 . . . a mn ⎦ ⎣ x n ⎦ .<br />
Thus we have not only shown that a linear function is necessarily represented by<br />
multiplication by a matrix we have also shown how to find the appropriate matrix.<br />
It is precisely the matrix whose n columns are the images under the function of the<br />
n unit vectors in R n .<br />
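This construction is easy to carry out in code. The sketch below (ours; the linear map chosen is just an illustration) recovers the matrix of f : R 2 → R 2 , f(x) = (x 1 + 2x 2 , 3x 1 ), from the images of the unit vectors:<br />

```python
def f(x):
    """An example linear map R^2 -> R^2."""
    return (x[0] + 2 * x[1], 3 * x[0])

e1, e2 = (1, 0), (0, 1)

# the columns of the representing matrix are f(e1) and f(e2)
col1, col2 = f(e1), f(e2)
A = [[col1[0], col2[0]],
     [col1[1], col2[1]]]
assert A == [[1, 2], [3, 0]]

# multiplying by A reproduces f
x = (4, -1)
Ax = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])
assert Ax == f(x)
```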
Exercise 46. Find the matrices that represent the following linear functions<br />
from R 2 to R 2 .<br />
(1) a clockwise rotation of π/2 (90 ◦ ),<br />
(2) a reflection in the x 1 axis,<br />
(3) a reflection in the line x 2 = x 1 (that is, the 45 ◦ line),<br />
(4) a counter clockwise rotation of π/4 (45 ◦ ), and<br />
(5) a reflection in the line x 2 = x 1 followed by a counter clockwise rotation of<br />
π/4.<br />
Recall that in Section 2 we defined, for any f, g : R n → R m and α ∈ R, the functions (f + g) and (αf). In Section 3 we defined the sum of two m × n matrices A and B, and the product of a scalar α with the matrix A. Let us instead define the sum of A and B as follows.

Let f : R n → R m be the linear function represented by the matrix A and g : R n → R m be the linear function represented by the matrix B. Now define the matrix (A + B) to be the matrix that represents the linear function (f + g). Similarly let the matrix αA be the matrix that represents the linear function (αf).

Exercise 47. Prove that the matrices (A + B) and αA defined in the previous paragraph coincide with the matrices defined in Section 3.
We can also see that the definition we gave of matrix multiplication is precisely the right definition if we take the multiplication of matrices to mean the composition of the linear functions that the matrices represent. To be more precise, let f : R n → R m and g : R m → R k be linear functions and let A and B be the m × n and k × m matrices that represent them. Let (g ◦ f) : R n → R k be the composite function defined in Section 2. Now let us define the product BA to be the matrix that represents the linear function (g ◦ f).
Now since the matrix A represents the function f and B represents g we have
\[
(g \circ f)(x) = g(f(x))
= g\left( \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \right)
= g\left( \begin{bmatrix} \sum_{i=1}^n a_{1i}x_i \\ \sum_{i=1}^n a_{2i}x_i \\ \vdots \\ \sum_{i=1}^n a_{mi}x_i \end{bmatrix} \right)
\]
\[
= \begin{bmatrix} b_{11} & b_{12} & \dots & b_{1m} \\ b_{21} & b_{22} & \dots & b_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ b_{k1} & b_{k2} & \dots & b_{km} \end{bmatrix}
\begin{bmatrix} \sum_{i=1}^n a_{1i}x_i \\ \sum_{i=1}^n a_{2i}x_i \\ \vdots \\ \sum_{i=1}^n a_{mi}x_i \end{bmatrix}
= \begin{bmatrix} \sum_{j=1}^m b_{1j} \sum_{i=1}^n a_{ji}x_i \\ \sum_{j=1}^m b_{2j} \sum_{i=1}^n a_{ji}x_i \\ \vdots \\ \sum_{j=1}^m b_{kj} \sum_{i=1}^n a_{ji}x_i \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^n \sum_{j=1}^m b_{1j}a_{ji}x_i \\ \sum_{i=1}^n \sum_{j=1}^m b_{2j}a_{ji}x_i \\ \vdots \\ \sum_{i=1}^n \sum_{j=1}^m b_{kj}a_{ji}x_i \end{bmatrix}
\]
\[
= \begin{bmatrix} \sum_{j=1}^m b_{1j}a_{j1} & \sum_{j=1}^m b_{1j}a_{j2} & \dots & \sum_{j=1}^m b_{1j}a_{jn} \\ \sum_{j=1}^m b_{2j}a_{j1} & \sum_{j=1}^m b_{2j}a_{j2} & \dots & \sum_{j=1}^m b_{2j}a_{jn} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^m b_{kj}a_{j1} & \sum_{j=1}^m b_{kj}a_{j2} & \dots & \sum_{j=1}^m b_{kj}a_{jn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
\]
And this last is the product of the matrix we defined in Section 3 to be BA with the column vector x. As we have claimed, the definition of matrix multiplication we gave in Section 3 was not arbitrary but rather was forced on us by our decision to regard the multiplication of two matrices as corresponding to the composition of the linear functions the matrices represent.

Recall that the columns of the matrix A that represents the linear function f : R n → R m are precisely the images of the unit vectors in R n under f. The linearity of f means that the image of any point in R n is in the span of the images of these unit vectors, and similarly that any point in the span of the images is the image of some point in R n . Thus Im(f) is equal to the span of the columns of A. Now, the dimension of the span of the columns of A is equal to the maximum number of linearly independent columns in A, that is, to the rank of A.
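This conclusion is easy to check numerically. The following sketch (NumPy; the random matrices and the seed are our own arbitrary choices) verifies that applying f and then g agrees with multiplying once by the product BA:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))   # represents f : R^2 -> R^3
B = rng.standard_normal((4, 3))   # represents g : R^3 -> R^4

x = rng.standard_normal(2)
# g(f(x)) computed by applying the two functions in turn...
composed = B @ (A @ x)
# ...agrees with multiplying x by the single k x n matrix BA.
print(np.allclose(composed, (B @ A) @ x))  # True
```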
5. Linear Functions from R n to R n and Square Matrices

In the remainder of this chapter we look more closely at an important subclass of linear functions and the matrices that represent them, viz. the functions that map R n to itself. From what we have already said we see immediately that the matrix representing such a linear function will have the same number of rows as it has columns. We call such a matrix a square matrix.
If the linear function f : R n → R n is one-to-one and onto then the function f has an inverse f −1 . In Exercise 45 you showed that this function too is linear. A matrix that represents a linear function that is one-to-one and onto is called a nonsingular matrix. Alternatively we can say that an n × n matrix is nonsingular if the rank of the matrix is n. To see that these two statements are equivalent, note first that if f is one-to-one then Ker(f) = {0}. (This is the trivial direction of Exercise 48.) But this means that dim(Ker(f)) = 0 and so dim(Im(f)) = n. And, as we argued at the end of the previous section, this is the same as the rank of the matrix that represents f.

Exercise 48. Show that the linear function f : R n → R m is one-to-one if and only if Ker(f) = {0}.

Exercise 49. Show that the linear function f : R n → R n is one-to-one if and only if it is onto.
6. Inverse Functions and Inverse Matrices

In the previous section we discussed briefly the idea of the inverse of a linear function f : R n → R n . This allows us a very easy definition of the inverse of a square matrix A. The inverse of A is the matrix that represents the linear function that is the inverse of the linear function that A represents. We write the inverse of the matrix A as A −1 . Thus a matrix will have an inverse if and only if the linear function that the matrix represents has an inverse, that is, if and only if the linear function is one-to-one and onto. We saw in the previous section that this will occur if and only if the kernel of the function is {0}, which in turn occurs if and only if the image of f is of full dimension, that is, is all of R n . This is the same as the matrix being of full rank, that is, of rank n.

As with the ideas we have discussed earlier, we can express the idea of a matrix inverse purely in terms of matrices without reference to the linear functions that they represent. Given an n × n matrix A we define the inverse of A to be a matrix B such that BA = I n where I n is the n × n identity matrix discussed in Section 3. Such a matrix B will exist if and only if the matrix A is nonsingular. Moreover, if such a matrix B exists then it is also true that AB = I n , that is, (A −1 ) −1 = A.

In Section 9 we shall see one method for calculating inverses of general n × n matrices. Here we shall simply describe how to calculate the inverse of a 2 × 2 matrix. Suppose that we have the matrix
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
\]
The inverse of this matrix is
\[
A^{-1} = \left( \frac{1}{ad - bc} \right) \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.
\]
Exercise 50. Show that the matrix A is of full rank if and only if ad − bc ≠ 0.

Exercise 51. Check that the matrix given is, in fact, the inverse of A.
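As a numerical sanity check (a NumPy sketch; the particular entries are an arbitrary example of ours), the formula above can be compared against a library inverse:

```python
import numpy as np

a, b, c, d = 3.0, 1.0, 1.0, 2.0            # an arbitrary example with ad - bc != 0
A = np.array([[a, b], [c, d]])

# The formula from the text: (1/(ad - bc)) [[d, -b], [-c, a]].
A_inv = (1.0 / (a * d - b * c)) * np.array([[d, -b], [-c, a]])

print(np.allclose(A @ A_inv, np.eye(2)))     # True
print(np.allclose(A_inv, np.linalg.inv(A)))  # True: agrees with NumPy's inverse
```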
7. Changes of Basis

We have until now implicitly assumed that there is no ambiguity when we speak of the vector (x 1 , x 2 , . . . , x n ). Sometimes there may indeed be an obvious meaning to such a vector. However when we define a linear space all that is really specified is "what straight lines are" and "where zero is." In particular, we do not necessarily have defined in an unambiguous way "where the axes are" or "what a unit length along each axis is." In other words we may not have a set of basis vectors specified.

Even when we do have, or have decided on, a set of basis vectors we may wish to redefine our description of the linear space with which we are dealing so as to use a different set of basis vectors. Let us suppose that we have an n-dimensional space, even R n say, with a given set of basis vectors v 1 , v 2 , . . . , v n and that we wish instead to describe the space in terms of the linearly independent vectors b 1 , b 2 , . . . , b n where
\[
b_i = b_{1i}v_1 + b_{2i}v_2 + \cdots + b_{ni}v_n.
\]
Now, if we had the description of a point in terms of the new coordinate vectors, e.g., as
\[
z_1 b_1 + z_2 b_2 + \cdots + z_n b_n,
\]
then we can easily convert this to a description in terms of the original basis vectors. We would simply substitute the formula for b i in terms of the v j 's into the previous formula, giving
\[
\left( \sum_{i=1}^n b_{1i}z_i \right) v_1 + \left( \sum_{i=1}^n b_{2i}z_i \right) v_2 + \cdots + \left( \sum_{i=1}^n b_{ni}z_i \right) v_n
\]
or, in our previous notation,
\[
\begin{bmatrix} \sum_{i=1}^n b_{1i}z_i \\ \sum_{i=1}^n b_{2i}z_i \\ \vdots \\ \sum_{i=1}^n b_{ni}z_i \end{bmatrix}.
\]
But this is simply the product
\[
\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nn} \end{bmatrix}
\begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix}.
\]
That is, if we are given an n-tuple of real numbers that describes a vector in terms of the new basis vectors b 1 , b 2 , . . . , b n and we wish to find the n-tuple that describes the vector in terms of the original basis vectors, we simply multiply the n-tuple we are given, written as a column vector, by the matrix whose columns are the new basis vectors b 1 , b 2 , . . . , b n . We shall call this matrix B. We see, among other things, that changing the basis is a linear operation.

Now, if we were given the information in terms of the original basis vectors and wanted to write it in terms of the new basis vectors what should we do? Since we don't have the original basis vectors written in terms of the new basis vectors this is not immediately obvious. However we do know that if we were to do it and then were to carry out the operation described in the previous paragraph we would be back where we started. Further, we know that the operation is a linear operation that maps n-tuples to n-tuples and so is represented by multiplication by an n × n matrix. That is, we multiply the n-tuple, written as a column vector, by the matrix that when multiplied by B gives the identity matrix, that is, the matrix B −1 . If we are given a vector of the form
\[
x_1 v_1 + x_2 v_2 + \cdots + x_n v_n
\]
and we wish to express it in terms of the vectors b 1 , b 2 , . . . , b n we calculate
\[
\begin{bmatrix} b_{11} & b_{12} & \dots & b_{1n} \\ b_{21} & b_{22} & \dots & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \dots & b_{nn} \end{bmatrix}^{-1}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
\]
Suppose now that we consider a linear function f : R n → R n and that we have originally described R n in terms of the standard basis vectors e 1 , e 2 , . . . , e n where e i is the vector with 1 in the ith place and zeros elsewhere. Suppose that with these basis vectors f is represented by the matrix
\[
A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix}.
\]
If we now describe R n in terms of the vectors b 1 , b 2 , . . . , b n how will the linear function f be represented? Let us think about what we want. We shall be given a vector described in terms of the basis vectors b 1 , b 2 , . . . , b n and we shall want to know what the image of this vector under the linear function f is, where we shall again want our answer in terms of the basis vectors b 1 , b 2 , . . . , b n . We know how to do this when we are given the description in terms of the vectors e 1 , e 2 , . . . , e n . Thus the first thing we shall do with our vector is to convert it from a description in terms of b 1 , b 2 , . . . , b n to a description in terms of e 1 , e 2 , . . . , e n . We do this by multiplying the n-tuple by the matrix B. Thus if we call our original n-tuple z we shall now have a description of the vector in terms of e 1 , e 2 , . . . , e n , viz. Bz. Given this description we can find the image of the vector in question under f by multiplying by the matrix A. Thus we shall have A(Bz) = (AB)z. Remember however that this will give us the image vector in terms of the basis vectors e 1 , e 2 , . . . , e n . In order to convert this to a description in terms of the vectors b 1 , b 2 , . . . , b n we must multiply by the matrix B −1 . Thus our final n-tuple will be (B −1 AB)z.

Recapitulating, suppose that we know that the linear function f : R n → R n is represented by the matrix A when we describe R n in terms of the standard basis vectors e 1 , e 2 , . . . , e n and that we have a new set of basis vectors b 1 , b 2 , . . . , b n . Then when R n is described in terms of these new basis vectors the linear function f will be represented by the matrix B −1 AB.
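The recapitulated recipe is easy to sketch numerically (NumPy; the matrices here are arbitrary examples of ours, not those of the exercises below):

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 1.0]])   # f in the standard basis
B = np.array([[1.0, 1.0], [1.0, 2.0]])   # columns are the new basis vectors b_1, b_2

A_new = np.linalg.inv(B) @ A @ B          # f in the new basis: B^{-1} A B

# Check on a coordinate vector z: convert to standard coordinates, apply f,
# and convert back; this must agree with applying A_new directly.
z = np.array([1.0, -2.0])
image_in_new_basis = np.linalg.inv(B) @ (A @ (B @ z))
print(np.allclose(image_in_new_basis, A_new @ z))  # True
```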
Exercise 52. Let f : R n → R m be a linear function. Suppose that with the standard bases for R n and R m the function f is represented by the matrix A. Let b 1 , b 2 , . . . , b n be a new set of basis vectors for R n and c 1 , c 2 , . . . , c m be a new set of basis vectors for R m . What is the matrix that represents f when the linear spaces are described in terms of the new basis vectors?
Exercise 53. Let f : R 2 → R 2 be a linear function. Suppose that with the standard basis for R 2 the function f is represented by the matrix
\[
\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}.
\]
Let
\[
\begin{bmatrix} 3 \\ 2 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
be a new set of basis vectors for R 2 . What is the matrix that represents f when R 2 is described in terms of the new basis vectors?
Properties of a square matrix that depend only on the linear function that the matrix represents, and not on the particular choice of basis vectors for the linear space, are called invariant properties. We have already seen one example of an invariant property, the rank of a matrix. The rank of a matrix is equal to the dimension of the image space of the function that the matrix represents, which clearly depends only on the function and not on the choice of basis vectors for the linear space.

The idea of a property being invariant can also be expressed in terms only of matrices, without reference to the idea of linear functions. A property is invariant if whenever an n × n matrix A has the property then for any nonsingular n × n matrix B the matrix B −1 AB also has the property. We might think of rank as a function that associates to any square matrix a nonnegative integer. We shall say that such a function is an invariant if the property of having the function take a particular value is invariant for all particular values we may choose.
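For instance (a NumPy sketch; the rank-2 matrix and the seed are our own choices), the rank survives conjugation by any nonsingular B:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.diag([1.0, 2.0, 0.0])               # rank 2: one zero on the diagonal
B = rng.standard_normal((3, 3))            # a random (almost surely nonsingular) B

similar = np.linalg.inv(B) @ A @ B         # B^{-1} A B
print(np.linalg.matrix_rank(A) == np.linalg.matrix_rank(similar))  # True
```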
Two particularly important invariants are the trace of a square matrix and the determinant of a square matrix. We examine these in more detail in the following section.
8. The Trace and the Determinant

In this section we define two important real valued functions on the space of n × n matrices, the trace and the determinant. Both of these concepts have geometric interpretations. However, while the trace is easy to calculate (much easier than the determinant) its geometric interpretation is rather hard to see. Thus we shall not go into it. On the other hand the determinant, while being somewhat harder to calculate, has a very clear geometric interpretation. In Section 9 we shall examine in some detail how to calculate determinants. In this section we shall be content to discuss one definition and the geometric intuition of the determinant.
Given an n × n matrix A, the trace of A, written tr(A), is the sum of the elements on the main diagonal, that is,
\[
\operatorname{tr} \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} = \sum_{i=1}^n a_{ii}.
\]
Exercise 54. For the matrices given in Exercise 53 confirm that tr(A) = tr(B −1 AB).

It is easy to see that the trace is a linear function on the space of all n × n matrices, that is, that for all n × n matrices A and B and for all α ∈ R
(1) tr(A + B) = tr(A) + tr(B),
and
(2) tr(αA) = α tr(A).

We can also see that if A and B are both n × n matrices then tr(AB) = tr(BA). In fact, if A is an m × n matrix and B is an n × m matrix this is still true. This will often be extremely useful in calculating the trace of a product.
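A quick numerical illustration of tr(AB) = tr(BA) for non-square A and B (a NumPy sketch with arbitrary random matrices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))   # m x n
B = rng.standard_normal((5, 2))   # n x m

# AB is 2 x 2 while BA is 5 x 5, yet the two traces agree.
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True
```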
Exercise 55. From the definition of matrix multiplication show that if A is an m × n matrix and B is an n × m matrix then tr(AB) = tr(BA). [Hint: Look at the definition of matrix multiplication in Section 3. Then write the trace of the product matrix using summation notation. Finally change the order of summation.]
The determinant, unlike the trace, is not a linear function of the matrix. It does however have some linear structure. If we fix all columns of the matrix except one and look at the determinant as a function of only this column then the determinant is linear in this single column. Moreover this is true whichever column we choose. Let us write the determinant of the n × n matrix A as det(A). Let us also write the matrix A as [a 1 , a 2 , . . . , a n ] where a i is the ith column of the matrix A. Thus our claim is that for all n × n matrices A, for all i = 1, 2, . . . , n, for all n-vectors b, and for all α ∈ R
\[
(3) \quad \det([a_1, \dots, a_{i-1}, a_i + b, a_{i+1}, \dots, a_n]) = \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]) + \det([a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n])
\]
and
\[
(4) \quad \det([a_1, \dots, a_{i-1}, \alpha a_i, a_{i+1}, \dots, a_n]) = \alpha \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]).
\]
We express this by saying that the determinant is a multilinear function.

Also the determinant is such that any n × n matrix that is not of full rank, that is, not of rank n, has a zero determinant. In fact, given that the determinant is a multilinear function, if we simply require that any matrix in which one column is the same as one of its neighbours has a zero determinant, this implies the stronger statement that we made. We already see one use of calculating determinants: a matrix is nonsingular if and only if its determinant is nonzero.
The two properties of being multilinear and of being zero whenever two neighbouring columns are the same already almost uniquely identify the determinant. Notice however that if the determinant satisfies these two properties then so does any constant times the determinant. To uniquely define the determinant we "tie down" this constant by assuming that det(I) = 1.

Though we haven't proved that it is so, these three properties uniquely define the determinant. That is, there is one and only one function with these three properties. We call this function the determinant. In Section 9 we shall discuss a number of other useful properties of the determinant. Remember that these additional properties are not really additional facts about the determinant. They can all be derived from the three properties we have given here.
Let us now look at the geometric interpretation of the determinant. Let us first think about what linear transformations can do to the space R n . Since we have already said that a linear transformation that is not onto is represented by a matrix with a zero determinant, let us think about linear transformations that are onto, that is, that do not map R n into a linear space of lower dimension. Such transformations can rotate the space around zero. They can "stretch" the space in different directions. And they can "flip" the space over. In the latter case all objects will become "mirror images" of themselves. We call linear transformations that make such a mirror image orientation reversing and those that don't orientation preserving. A matrix that represents an orientation preserving linear function has a positive determinant while a matrix that represents an orientation reversing linear function has a negative determinant. Thus we have a geometric interpretation of the sign of the determinant.
The absolute size of the determinant represents how much bigger or smaller the linear function makes objects. More precisely it gives the "volume" of the image of the unit hypercube under the transformation. The word volume is in quotes because it is the volume with which we are familiar only when n = 3. If n = 2 then it is area, while if n > 3 then it is the full dimensional analogue in R n of volume in R 3 .
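For n = 2 this is easy to verify directly (a NumPy sketch; both matrices are our own examples, chosen to avoid those in the exercises): the determinant equals the signed area of the image of the unit square, and a reflection reverses orientation:

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])    # an arbitrary 2x2 example
a1, a2 = A[:, 0], A[:, 1]                  # images of the unit vectors e_1, e_2

# Signed area of the parallelogram spanned by a_1 and a_2 (the image of the
# unit square under A).
signed_area = a1[0] * a2[1] - a1[1] * a2[0]
print(np.isclose(signed_area, np.linalg.det(A)))  # True

# A reflection in the line x_2 = x_1 is orientation reversing: its
# determinant is negative.
R = np.array([[0.0, 1.0], [1.0, 0.0]])
print(np.linalg.det(R) < 0)                # True
```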
Exercise 56. Consider the matrix
\[
\begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}.
\]
In a diagram show the image under the linear function that this matrix represents of the unit square, that is, the square whose corners are the points (0,0), (1,0), (0,1), and (1,1). Calculate the area of that image. Do the same for the matrix
\[
\begin{bmatrix} 4 & 1 \\ -1 & 1 \end{bmatrix}.
\]
In the light of Exercise 53, comment on the answers you calculated.
9. Calculating and Using Determinants

We have already used the concepts of the inverse of a matrix and the determinant of a matrix. The purpose of this section is to cover some of the "cookbook" aspects of calculating inverses and determinants.
Suppose that we have an n × n matrix
\[
A = \begin{bmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{bmatrix};
\]
then we shall use |A| or
\[
\begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix}
\]
as an alternative notation for det(A). Always remember that
\[
\begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix}
\]
is not a matrix but rather a real number. For the case n = 2 we define
\[
\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{21}a_{12}.
\]
It is possible to also give a convenient formula for the determinant of a 3 × 3 matrix. However, rather than doing this, we shall immediately consider the case of an n × n matrix.
By the minor of an element of the matrix A we mean the determinant (remember, a real number) of the matrix obtained from the matrix A by deleting the row and column containing the element in question. We denote the minor of the element a ij by the symbol |M ij |. Thus, for example,
\[
|M_{11}| = \begin{vmatrix} a_{22} & \dots & a_{2n} \\ \vdots & \ddots & \vdots \\ a_{n2} & \dots & a_{nn} \end{vmatrix}.
\]
Exercise 57. Write out the minors of a general 3 × 3 matrix.

We now define the cofactor of an element to be either plus or minus the minor of the element, being plus if the sum of the indices of the element is even and minus if it is odd. We denote the cofactor of the element a ij by the symbol |C ij |. Thus |C ij | = |M ij | if i + j is even and |C ij | = −|M ij | if i + j is odd. Or,
\[
|C_{ij}| = (-1)^{i+j} |M_{ij}|.
\]
We now define the determinant of an n × n matrix A,
\[
\det(A) = |A| = \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix},
\]
to be \( \sum_{j=1}^n a_{1j}|C_{1j}| \). This is the sum of n terms, each one of which is the product of an element of the first row of the matrix and the cofactor of that element.
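The definition translates directly into a recursive procedure (a Python sketch of ours; the test matrix is an arbitrary example, and np.linalg.det is used only as an independent check):

```python
import numpy as np

def det_by_cofactors(A):
    """Expand along the first row: det(A) = sum_j a_{1j} |C_{1j}|."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor |M_{1,j+1}|: delete the first row and the (j+1)th column.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # Cofactor sign: (-1)^{1+(j+1)} = (-1)^j.
        total += A[0, j] * (-1) ** j * det_by_cofactors(minor)
    return total

M = np.array([[2.0, 0.0, 1.0], [1.0, 3.0, 2.0], [0.0, 1.0, 4.0]])
print(det_by_cofactors(M))                                # 21.0
print(np.isclose(det_by_cofactors(M), np.linalg.det(M)))  # True
```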
Exercise 58. Define the determinant of the 1 × 1 matrix [a] to be a. (What else could we define it to be?) Show that the definition given above corresponds with the definition we gave earlier for 2 × 2 matrices.
Exercise 59. Calculate the determinants of the following 3 × 3 matrices.
\[
\text{(a)} \begin{bmatrix} 1 & 2 & 3 \\ 3 & 6 & 9 \\ 4 & 5 & 7 \end{bmatrix} \quad
\text{(b)} \begin{bmatrix} 1 & 5 & 2 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{bmatrix} \quad
\text{(c)} \begin{bmatrix} 1 & 1 & 0 \\ 5 & 4 & 1 \\ 2 & 3 & 2 \end{bmatrix}
\]
\[
\text{(d)} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \quad
\text{(e)} \begin{bmatrix} 2 & 5 & 2 \\ 1 & 5 & 3 \\ 0 & 1 & 3 \end{bmatrix}
\]
Exercise 60. Show that the determinant of the identity matrix, det(I n ), is 1 for all values of n. [Hint: Show that it is true for I 2 . Then show that if it is true for I n−1 then it is true for I n .]
One might ask what was special about the first row, that we took the elements of that row, multiplied them by their cofactors, and added them up. Why not the second row, or the first column? It will follow from a number of properties of determinants we list below that in fact we could have used any row or column and we would have arrived at the same answer.

Exercise 61. Expand the determinant of the matrix given in Exercise 59(b) in terms of the 2nd and 3rd rows and in terms of each column and check that the resulting answer agrees with the answer you obtained originally.
We now have a way of calculating the determinant of any matrix. To find the determinant of an n × n matrix we have to calculate n determinants of size (n − 1) × (n − 1). This is clearly a fairly computationally costly procedure. However there are often ways to economise on the computation.
Exercise 62. Evaluate the determinants of the following matrices
\[
\text{(a)} \begin{bmatrix} 1 & 8 & 0 & 7 \\ 2 & 3 & 4 & 6 \\ 1 & 6 & 0 & -1 \\ 0 & -5 & 0 & 8 \end{bmatrix} \quad
\text{(b)} \begin{bmatrix} 4 & 7 & 0 & 4 \\ 5 & 6 & 1 & 8 \\ 0 & 0 & 9 & 0 \\ 1 & -3 & 1 & 4 \end{bmatrix}
\]
[Hint: Think carefully about which column or row to use in the expansion.]
We shall now list a number of properties of determinants. These properties imply that, as we stated above, it does not matter which row or column we use to expand the determinant. Further, these properties will give us a series of transformations we may perform on a matrix without altering its determinant. This will allow us to calculate a determinant by first transforming the matrix into one whose determinant is easier to calculate and then calculating the determinant of the easier matrix.
Property 1. The determinant of a matrix equals the determinant of its transpose: |A| = |A′|.
Property 2. Interchanging two rows (or two columns) of a matrix changes its sign but not its absolute value. For example,
\[
\begin{vmatrix} c & d \\ a & b \end{vmatrix} = cb - ad = -(ad - cb) = -\begin{vmatrix} a & b \\ c & d \end{vmatrix}.
\]
Property 3. Multiplying one row (or column) of a matrix by a constant λ will change the value of the determinant λ-fold. For example,
\[
\begin{vmatrix} \lambda a_{11} & \dots & \lambda a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix}
= \lambda \begin{vmatrix} a_{11} & \dots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nn} \end{vmatrix}.
\]
Exercise 63. Check Property 3 for the cases n = 2 and n = 3.

Corollary 1. |λA| = λ^n |A| (where A is an n × n matrix).

Corollary 2. |−A| = |A| if n is even. |−A| = −|A| if n is odd.

Property 4. Adding a multiple of any row (column) to any other row (column) does not alter the value of the determinant.
Exercise 64. Check that
\[
\begin{vmatrix} 1 & 5 & 2 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{vmatrix}
= \begin{vmatrix} 1 & 5 + 3 \times 2 & 2 \\ 1 & 4 + 3 \times 3 & 3 \\ 0 & 1 + 3 \times 2 & 2 \end{vmatrix}
= \begin{vmatrix} 1 + (-2) \times 1 & 5 + (-2) \times 4 & 2 + (-2) \times 3 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{vmatrix}.
\]
Property 5. If one row (or column) is a constant times another row (or column) then the determinant of the matrix is zero.

Exercise 65. Show that Property 5 follows from Properties 3 and 4.

We can strengthen Property 5 to obtain the following.

Property 5′. The determinant of a matrix is zero if and only if the matrix is not of full rank.

Exercise 66. Explain why Property 5′ is a strengthening of Property 5, that is, why 5′ implies 5.
These properties allow us to calculate determinants more easily. Given an n × n matrix A, the basic strategy one follows is to use the above properties, particularly Property 4, to find a matrix with the same determinant as A in which one row (or column) has only one non-zero element. Then, rather than calculating n determinants of size (n − 1) × (n − 1), one only needs to calculate one. One then does the same thing for the (n − 1) × (n − 1) determinant that needs to be calculated, and so on.
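This strategy is exactly what a computer does. Below is a sketch (our own Python implementation, applied to an arbitrary example matrix of ours) that reduces a matrix to triangular form using Property 4, tracking sign changes from row swaps (Property 2):

```python
import numpy as np

def det_by_elimination(A):
    """Reduce A to upper triangular form; Property 4 leaves the determinant
    unchanged, and each row swap (Property 2) flips its sign."""
    A = np.asarray(A, dtype=float).copy()
    n = A.shape[0]
    sign = 1.0
    for k in range(n):
        pivot = k + np.argmax(np.abs(A[k:, k]))
        if np.isclose(A[pivot, k], 0.0):
            return 0.0                        # no pivot: the matrix is not of full rank
        if pivot != k:
            A[[k, pivot]] = A[[pivot, k]]     # row swap changes the sign
            sign = -sign
        for i in range(k + 1, n):
            A[i] -= (A[i, k] / A[k, k]) * A[k]   # Property 4: determinant unchanged
    return sign * float(np.prod(np.diag(A)))     # product of the pivots, sign-adjusted

M = np.array([[1.0, 2.0, 0.0], [3.0, 1.0, 4.0], [2.0, 0.0, 5.0]])
print(np.isclose(det_by_elimination(M), np.linalg.det(M)))  # True
```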
There are a number of reasons we are interested in determinants. One is that they give us one method of calculating the inverse of a nonsingular matrix. (Recall that there is no inverse of a singular matrix.) They also give us a method, known as Cramer's Rule, for solving systems of linear equations. Before proceeding with this it is useful to state one further property of determinants.
Property 6. If one expands a matrix in terms of one row (or column) and the cofactors of a different row (or column) then the answer is always zero. That is,
\[
\sum_{j=1}^n a_{ij}|C_{kj}| = 0
\]
whenever i ≠ k. Also
\[
\sum_{i=1}^n a_{ij}|C_{ik}| = 0
\]
whenever j ≠ k.
Exercise 67. Verify Property 6 for the matrix
\[
\begin{bmatrix} 4 & 1 & 2 \\ 5 & 2 & 1 \\ 1 & 0 & 3 \end{bmatrix}.
\]
Let us define the matrix of cofactors C to be the matrix [|C ij |] whose ijth element is the cofactor of the ijth element of A. Now we define the adjoint matrix of A to be the transpose of the matrix of cofactors of A. That is,
\[
\operatorname{adj}(A) = C'.
\]
It is straightforward to see (using Property 6) that A adj(A) = |A| I n = adj(A) A. That is,
\[
A^{-1} = \frac{1}{|A|} \operatorname{adj}(A).
\]
Notice that this is well defined if and only if |A| ≠ 0. We now have a method of finding the inverse of any nonsingular square matrix.
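Sketched in Python (our own code; the example matrix is arbitrary, and the minors' determinants are computed with np.linalg.det for brevity):

```python
import numpy as np

def adjugate(A):
    """adj(A): the transpose of the matrix of cofactors of A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[3.0, 0.0, 2.0], [2.0, 0.0, -2.0], [0.0, 1.0, 1.0]])
A_inv = adjugate(A) / np.linalg.det(A)      # A^{-1} = adj(A) / |A|
print(np.allclose(A @ A_inv, np.eye(3)))    # True
```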
Exercise 68. Use this method to find the inverses of the following matrices
\[
\text{(a)} \begin{bmatrix} 3 & -1 & 2 \\ 1 & 0 & 3 \\ 4 & 0 & 2 \end{bmatrix} \quad
\text{(b)} \begin{bmatrix} 4 & -2 & 1 \\ 7 & 3 & 3 \\ 2 & 0 & 1 \end{bmatrix} \quad
\text{(c)} \begin{bmatrix} 1 & 5 & 2 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{bmatrix}.
\]
Knowing how to invert matrices we thus know how to solve a system of n linear equations in n unknowns. For we can express the n equations in matrix notation as Ax = b where A is an n × n matrix of coefficients, x is an n × 1 vector of unknowns, and b is an n × 1 vector of constants. Thus we can solve the system of equations as x = A −1 Ax = A −1 b.

Sometimes, particularly if we are not interested in all of the x's, it is convenient to use another method of solving the equations. This method is known as Cramer's Rule. Let us suppose that we wish to solve the above system of equations, that is, Ax = b. Let us define the matrix A i to be the matrix obtained from A by replacing the ith column of A by the vector b. Then the solution is given by
\[
x_i = \frac{|A_i|}{|A|}.
\]
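The rule is straightforward to sketch in code (our own Python example; the 2 × 2 system is arbitrary and is not one of those in Exercise 70):

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b by Cramer's Rule: x_i = |A_i| / |A|."""
    A = np.asarray(A, dtype=float)
    det_A = np.linalg.det(A)
    x = np.empty(A.shape[0])
    for i in range(A.shape[0]):
        A_i = A.copy()
        A_i[:, i] = b                       # replace the ith column of A by b
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
print(cramer_solve(A, b))                   # [1. 3.]
```

The answer agrees with solving by matrix inversion, x = A −1 b.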
Exercise 69. Derive Cramer's Rule. [Hint: We know that the solution to the system of equations is given by x = (1/|A|) adj(A) b. This gives a formula for x i . Show that this formula is the same as that given by x i = |A i |/|A|.]
Exercise 70. Solve the following system of equations (i) by matrix inversion<br />
and (ii) by Cramer’s Rule<br />
(a)<br />
2x 1 − x 2 = 2<br />
3x 2 + 2x 3 = 16<br />
5x 1 + 3x 3 = 21<br />
(b)<br />
−x 1 + x 2 + x 3 = 1<br />
x 1 − x 2 + x 3 = 1<br />
x 1 + x 2 + x 3 = 1.
Exercise 71. Recall that we claimed that the determinant was an invariant.<br />
Confirm this by calculating (directly) det(A) and det(B −1 AB) where<br />
B = ⎡ 1  0  1 ⎤   and   A = ⎡ 1 0 0 ⎤<br />
    ⎢ 1 −1  2 ⎥           ⎢ 0 2 0 ⎥<br />
    ⎣ 2  1 −1 ⎦           ⎣ 0 0 3 ⎦ .<br />
Exercise 72. An nth order determinant of the form<br />
∣ a 11   0     0   . . .   0  ∣<br />
∣ a 21  a 22   0   . . .   0  ∣<br />
∣ a 31  a 32  a 33 . . .   0  ∣<br />
∣  ⋮     ⋮     ⋮    ⋱      ⋮  ∣<br />
∣ a n1  a n2  a n3 . . . a nn ∣<br />
is called triangular. Evaluate this determinant. [Hint: Expand the determinant in<br />
terms of its first row. Expand the resulting (n − 1) × (n − 1) determinant in terms<br />
of its first row, and so on.]<br />
10. Eigenvalues and Eigenvectors<br />
Suppose that we have a linear function f : R n → R n . When we look at<br />
how f deforms R n one natural question to look at is: Where does f send some<br />
linear subspace? In particular we might ask if there are any linear subspaces that<br />
f maps to themselves. We call such linear subspaces invariant linear subspaces.<br />
Of course the space R n itself and the zero dimensional space {0} are invariant<br />
linear subspaces. The real question is whether there are any others. Clearly, for<br />
some linear transformations there are no other invariant subspaces. For example,<br />
a clockwise rotation of π/4 in R 2 has no invariant subspaces other than R 2 itself<br />
and {0}.<br />
A particularly important class of invariant linear subspaces are the one dimensional<br />
ones. A one dimensional linear subspace is specified by one nonzero vector,<br />
say ¯x. Then the subspace is {λ¯x | λ ∈ R}. Let us call this subspace L(¯x). If L(¯x)<br />
is an invariant linear subspace of f and if x ∈ L(¯x) then there is some value λ such<br />
that f(x) = λx. Moreover the value of λ for which this is true will be the same<br />
whatever value of x we choose in L(¯x).<br />
Now if we fix the set of basis vectors and thus the matrix A that represents f<br />
we have that if x is in a one dimensional invariant linear subspace of f then there<br />
is some λ ∈ R such that<br />
Ax = λx.<br />
Again we can define this notion without reference to linear functions. Given a<br />
matrix A, if we can find a pair x, λ with x ≠ 0 that satisfies the above equation we<br />
call x an eigenvector of the matrix A and λ the associated eigenvalue. (Sometimes<br />
these are called characteristic vectors and values.)<br />
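For a 2 × 2 matrix the defining equation Ax = λx can be verified directly: the eigenvalues are the roots of the quadratic characteristic polynomial, and an eigenvector can be read off from a row of A − λI. The following sketch (example matrix assumed, not from the text) does exactly that.

```python
# For a 2x2 matrix the eigenvalues solve the characteristic equation
# lam^2 - tr(A)*lam + det(A) = 0, and an eigenvector for each eigenvalue
# can be read off from the first row of A - lam*I.
import math

A = [[2.0, 1.0],
     [1.0, 2.0]]   # a symmetric example, so the eigenvalues are real

tr = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
disc = tr * tr - 4.0 * det          # discriminant of the characteristic polynomial
lams = [(tr + math.sqrt(disc)) / 2.0, (tr - math.sqrt(disc)) / 2.0]

for lam in lams:
    # (A - lam*I)x = 0: the first row gives (a11 - lam)x1 + a12*x2 = 0,
    # so x = (a12, lam - a11) is an eigenvector (when it is nonzero).
    x = (A[0][1], lam - A[0][0])
    Ax = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])
    # Verify Ax = lam*x componentwise.
    assert abs(Ax[0] - lam * x[0]) < 1e-9 and abs(Ax[1] - lam * x[1]) < 1e-9
```

Here the eigenvalues are 3 and 1 with eigenvectors (1, 1) and (1, −1): the two one dimensional invariant subspaces are the diagonals of the plane.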
Exercise 73. Show that the eigenvalues of a matrix are an invariant, that<br />
is, that they depend only on the linear function the matrix represents and not on<br />
the choice of basis vectors. Show also that the eigenvectors of a matrix are not<br />
an invariant. Explain why the dependence of the eigenvectors on the particular<br />
basis is exactly what we would expect and argue that in some sense they are indeed<br />
invariant.<br />
Now we can rewrite the equation Ax = λx as<br />
(A − λI n )x = 0.
If x, λ solve this equation and x ≠ 0 then we have a nonzero linear combination of<br />
the columns of A − λI n equal to zero. This means that the columns of A − λI n are<br />
not linearly independent and so det(A − λI n ) = 0, that is,<br />
det ⎡ a 11 − λ    a 12    . . .    a 1n   ⎤<br />
    ⎢  a 21    a 22 − λ  . . .    a 2n   ⎥<br />
    ⎢   ⋮          ⋮       ⋱        ⋮    ⎥<br />
    ⎣  a n1      a n2    . . . a nn − λ  ⎦ = 0.<br />
Now, the left hand side of this last equation is a polynomial of degree n in<br />
λ, that is, a polynomial in λ in which n is the highest power of λ that appears<br />
with nonzero coefficient. It is called the characteristic polynomial and the equation<br />
is called the characteristic equation. Now this equation may, or may not, have a<br />
solution in real numbers. In general, by the fundamental theorem of algebra the<br />
equation has n solutions, perhaps not all distinct, in the complex numbers. If the<br />
matrix A happens to be symmetric (that is, if a ij = a ji for all i and j) then all of<br />
its eigenvalues are real. If the eigenvalues are all distinct (that is, different from<br />
each other) then we are in a particularly well behaved situation. As a prelude we<br />
state the following result.<br />
Theorem 5. Given an n × n matrix A, suppose that we have m eigenvectors<br />
x 1 , x 2 , . . . , x m of A with corresponding eigenvalues λ 1 , λ 2 , . . . , λ m . If λ i ≠ λ j whenever<br />
i ≠ j then x 1 , x 2 , . . . , x m are linearly independent.<br />
An implication of this theorem is that an n × n matrix cannot have more than<br />
n eigenvectors with distinct eigenvalues. Further this theorem allows us to see that<br />
if an n × n matrix has n distinct eigenvalues then it is possible to find a basis<br />
for R n in which the linear function that the matrix represents is represented by<br />
a diagonal matrix. Equivalently we can find a matrix B such that B −1 AB is a<br />
diagonal matrix.<br />
To see this let b 1 , b 2 , . . . , b n be n linearly independent eigenvectors with associated<br />
eigenvalues λ 1 , λ 2 , . . . , λ n . Let B be the matrix whose columns are the vectors<br />
b 1 , b 2 , . . . , b n . Since these vectors are linearly independent the matrix B has an<br />
inverse. Now<br />
B −1 AB = B −1 [Ab 1 Ab 2 . . . Ab n ]<br />
= B −1 [λ 1 b 1 λ 2 b 2 . . . λ n b n ]<br />
= [λ 1 B −1 b 1 λ 2 B −1 b 2 . . . λ n B −1 b n ]<br />
= ⎡ λ 1   0  . . .  0  ⎤<br />
  ⎢  0   λ 2 . . .  0  ⎥<br />
  ⎢  ⋮    ⋮    ⋱    ⋮  ⎥<br />
  ⎣  0    0  . . . λ n ⎦ .<br />
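The calculation above can be checked numerically. The sketch below (example matrix assumed, not from the text) builds B from two eigenvectors of a 2 × 2 matrix and confirms that B −1 AB comes out diagonal with the eigenvalues on the diagonal.

```python
# A sketch of the diagonalisation argument for a 2x2 case: with B built
# from linearly independent eigenvectors, B^-1 A B is diagonal.
A = [[4.0, 1.0],
     [2.0, 3.0]]
# Eigenvalues of A: lam^2 - 7*lam + 10 = 0, so lam = 5 and lam = 2 (distinct).
# Eigenvectors from the first row of A - lam*I: x = (a12, lam - a11).
b1 = (1.0, 1.0)    # eigenvector for lam = 5
b2 = (1.0, -2.0)   # eigenvector for lam = 2
B = [[b1[0], b2[0]], [b1[1], b2[1]]]   # columns are the eigenvectors

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    # Inverse of a 2x2 matrix via the adjoint formula.
    d = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / d, -M[0][1] / d], [-M[1][0] / d, M[0][0] / d]]

D = matmul(inv2(B), matmul(A, B))
# D should be diag(5, 2): off-diagonal entries (numerically) zero.
assert abs(D[0][0] - 5.0) < 1e-9 and abs(D[1][1] - 2.0) < 1e-9
assert abs(D[0][1]) < 1e-9 and abs(D[1][0]) < 1e-9
```

The same check with any other basis B of non-eigenvectors would leave off-diagonal entries, which is the point of choosing eigenvectors as the new basis.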
CHAPTER 3<br />
Consumer Behaviour: Optimisation Subject to the<br />
Budget Constraint<br />
1. Constrained Maximisation<br />
1.1. Lagrange Multipliers. Consider the problem of a consumer who seeks<br />
to distribute his income across the purchase of the two goods that he consumes,<br />
subject to the constraint that he spends no more than his total income. Let us<br />
denote the amount of the first good that he buys x 1 and the amount of the second<br />
good x 2 , the prices of the two goods p 1 and p 2 , and the consumer’s income y.<br />
The utility that the consumer obtains from consuming x 1 units of good 1 and x 2<br />
units of good 2 is denoted u(x 1 , x 2 ). Thus the consumer’s problem is to maximise<br />
u(x 1 , x 2 ) subject to the constraint that p 1 x 1 + p 2 x 2 ≤ y. (We shall soon write<br />
p 1 x 1 + p 2 x 2 = y, i.e., we shall assume that the consumer must spend all of his<br />
income.) Before discussing the solution of this problem let’s write it in a more<br />
“mathematical” way.<br />
(5)    max x 1 ,x 2  u(x 1 , x 2 )   subject to   p 1 x 1 + p 2 x 2 = y<br />
We read this “Choose x 1 and x 2 to maximise u(x 1 , x 2 ) subject to the constraint<br />
that p 1 x 1 + p 2 x 2 = y.”<br />
Let us assume, as usual, that the indifference curves (i.e., the sets of points<br />
(x 1 , x 2 ) for which u(x 1 , x 2 ) is a constant) are convex to the origin. Let us also<br />
assume that the indifference curves are nice and smooth. Then the point (x ∗ 1, x ∗ 2)<br />
that solves the maximisation problem (5) is the point at which the indifference<br />
curve is tangent to the budget line as given in Figure 1.<br />
One thing we can say about the solution is that at the point (x ∗ 1, x ∗ 2) it must be<br />
true that the marginal utility with respect to good 1 divided by the price of good 1<br />
must equal the marginal utility with respect to good 2 divided by the price of good<br />
2. For if this were not true then the consumer could, by decreasing the consumption<br />
of the good for which this ratio was lower and increasing the consumption of the<br />
other good, increase his utility. Marginal utilities are, of course, just the partial<br />
derivatives of the utility function. Thus we have<br />
(6)    [∂u/∂x 1 (x ∗ 1 , x ∗ 2 )] / p 1 = [∂u/∂x 2 (x ∗ 1 , x ∗ 2 )] / p 2 .<br />
The argument we have just made seems very “economic.” It is easy to give an<br />
alternate argument that does not explicitly refer to the economic intuition. Let x u 2<br />
be the function that defines the indifference curve through the point (x ∗ 1, x ∗ 2), i.e.,<br />
u(x 1 , x u 2(x 1 )) ≡ ū ≡ u(x ∗ 1, x ∗ 2).<br />
Now, totally differentiating this identity gives<br />
∂u/∂x 1 (x 1 , x u 2 (x 1 )) + ∂u/∂x 2 (x 1 , x u 2 (x 1 )) · dx u 2 /dx 1 (x 1 ) = 0.<br />
[Figure 1: The budget line p 1 x 1 + p 2 x 2 = y and the indifference curve u(x 1 , x 2 ) = ū, tangent at the point (x ∗ 1 , x ∗ 2 ).]<br />
That is,<br />
dx u 2 /dx 1 (x 1 ) = − [∂u/∂x 1 (x 1 , x u 2 (x 1 ))] / [∂u/∂x 2 (x 1 , x u 2 (x 1 ))] .<br />
Now x u 2 (x ∗ 1 ) = x ∗ 2 . Thus the slope of the indifference curve at the point (x ∗ 1 , x ∗ 2 ) is<br />
dx u 2 /dx 1 (x ∗ 1 ) = − [∂u/∂x 1 (x ∗ 1 , x ∗ 2 )] / [∂u/∂x 2 (x ∗ 1 , x ∗ 2 )] .<br />
Also, the slope of the budget line is −p 1 /p 2 . Combining these two results again gives<br />
result (6).<br />
Since we also have another equation that (x ∗ 1, x ∗ 2) must satisfy, viz<br />
(7) p 1 x ∗ 1 + p 2 x ∗ 2 = y<br />
we have two equations in two unknowns and we can (if we know what the utility<br />
function is and what p 1 , p 2 , and y are) go happily away and solve the problem.<br />
(This isn’t quite true but we shall not go into that at this point.) What we shall<br />
develop is a systematic and useful way to obtain the conditions (6) and (7). Let us<br />
first denote the common value of the ratios in (6) by λ. That is,<br />
[∂u/∂x 1 (x ∗ 1 , x ∗ 2 )] / p 1 = λ = [∂u/∂x 2 (x ∗ 1 , x ∗ 2 )] / p 2<br />
and we can rewrite this and (7) as<br />
(8)    ∂u/∂x 1 (x ∗ 1 , x ∗ 2 ) − λp 1 = 0<br />
       ∂u/∂x 2 (x ∗ 1 , x ∗ 2 ) − λp 2 = 0<br />
       y − p 1 x ∗ 1 − p 2 x ∗ 2 = 0.<br />
Now we have three equations in x ∗ 1, x ∗ 2, and the new artificial or auxiliary variable<br />
λ. Again we can, perhaps, solve these equations for x ∗ 1, x ∗ 2, and λ. Consider the<br />
following function<br />
(9) L(x 1 , x 2 , λ) = u(x 1 , x 2 ) + λ(y − p 1 x 1 − p 2 x 2 )<br />
This function is known as the Lagrangian. Now, if we calculate ∂L/∂x 1 , ∂L/∂x 2 , and ∂L/∂λ,<br />
and set the results equal to zero we obtain exactly the equations given in (8). We<br />
now describe this technique in a somewhat more general way.<br />
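The first order conditions (6) and (7) can be verified numerically for a concrete utility function. In the sketch below the utility function u(x 1 , x 2 ) = (x 1 x 2 ) 1/2 , the prices, and the income are all assumed for illustration (they are not from the text); for this u the demands are x 1 = y/(2p 1 ) and x 2 = y/(2p 2 ), and the code checks that the two marginal-utility-to-price ratios agree (their common value is λ) and that the budget constraint holds.

```python
# Numerical check of the first order conditions at the maximiser, for the
# assumed utility u(x1, x2) = sqrt(x1 * x2) and assumed prices and income.
import math

p1, p2, y = 2.0, 5.0, 40.0

def u(x1, x2):
    return math.sqrt(x1 * x2)

def partial(f, i, x1, x2, h=1e-6):
    # Central-difference approximation to the partial derivative of f.
    if i == 1:
        return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)
    return (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)

# For u = (x1*x2)^(1/2) the demands split income equally between the goods:
# x1* = y/(2*p1), x2* = y/(2*p2).  We take these demands as given here.
x1s, x2s = y / (2 * p1), y / (2 * p2)

lam1 = partial(u, 1, x1s, x2s) / p1    # marginal utility of good 1 per dollar
lam2 = partial(u, 2, x1s, x2s) / p2    # marginal utility of good 2 per dollar
assert abs(lam1 - lam2) < 1e-6              # condition (6): the ratios agree
assert abs(p1 * x1s + p2 * x2s - y) < 1e-9  # condition (7): the budget holds
```

Perturbing x1s or x2s away from the demand bundle (while staying on the budget line) makes the two ratios diverge, which is exactly the "economic" argument given above.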
Suppose that we have the following maximisation problem<br />
(10)    max x 1 ,...,x n  f(x 1 , . . . , x n )   subject to   g(x 1 , . . . , x n ) = c<br />
and we let<br />
(11) L(x 1 , . . . , x n , λ) = f(x 1 , . . . , x n ) + λ(c − g(x 1 , . . . , x n ))<br />
then if (x ∗ 1 , . . . , x ∗ n ) solves (10) there is a value of λ, say λ ∗ , such that<br />
(12)    ∂L/∂x i (x ∗ 1 , . . . , x ∗ n , λ ∗ ) = 0,    i = 1, . . . , n<br />
(13)    ∂L/∂λ (x ∗ 1 , . . . , x ∗ n , λ ∗ ) = 0.<br />
Notice that the conditions (12) are precisely the first order conditions for<br />
choosing x 1 , . . . , x n to maximise L, once λ ∗ has been chosen. This provides an<br />
intuition into this method of solving the constrained maximisation problem. In<br />
the constrained problem we have told the decision maker that he must satisfy<br />
g(x 1 , . . . , x n ) = c and that he should choose among all points that satisfy this constraint<br />
the point at which f(x 1 , . . . , x n ) is greatest. We arrive at the same answer<br />
if we tell the decision maker to choose any point he wishes but that for each unit by<br />
which he violates the constraint g(x 1 , . . . , x n ) = c we shall take away λ units from<br />
his payoff. Of course we must be careful to choose λ to be the correct value. If we<br />
choose λ too small the decision maker may choose to violate his constraint—e.g.,<br />
if we made the penalty for spending more than the consumer’s income very small<br />
the consumer would choose to consume more goods than he could afford and to<br />
pay the penalty in utility terms. On the other hand if we choose λ too large the<br />
decision maker may violate his constraint in the other direction, e.g., the consumer<br />
would choose not to spend any of his income and just receive λ units of utility for<br />
each unit of his income.<br />
It is possible to give a more general statement of this technique, allowing for<br />
multiple constraints. (Of course, we should always have fewer constraints than we<br />
have variables.) Suppose we have more than one constraint. Consider the problem<br />
max x 1 ,...,x n  f(x 1 , . . . , x n )<br />
subject to g 1 (x 1 , . . . , x n ) = c 1<br />
⋮<br />
g m (x 1 , . . . , x n ) = c m .<br />
Again we construct the Lagrangian<br />
(14)<br />
L(x 1 , . . . , x n , λ 1 , . . . , λ m ) = f(x 1 , . . . , x n )<br />
+ λ 1 (c 1 − g 1 (x 1 , . . . , x n )) + · · · + λ m (c m − g m (x 1 , . . . , x n ))
and again if (x ∗ 1 , . . . , x ∗ n ) solves this problem there are values of the multipliers, say λ ∗ 1 , . . . , λ ∗ m , such that<br />
(15)    ∂L/∂x i (x ∗ 1 , . . . , x ∗ n , λ ∗ 1 , . . . , λ ∗ m ) = 0,    i = 1, . . . , n<br />
        ∂L/∂λ j (x ∗ 1 , . . . , x ∗ n , λ ∗ 1 , . . . , λ ∗ m ) = 0,    j = 1, . . . , m.<br />
1.2. Caveats and Extensions. Notice that we have been referring to the set<br />
of conditions which a solution to the maximisation problem must satisfy. (We call<br />
such conditions necessary conditions.) So far we have not even claimed that there<br />
necessarily is a solution to the maximisation problem. There are many examples of<br />
maximisation problems which have no solution. One example of an unconstrained<br />
problem with no solution is<br />
(16)    max x 2x,<br />
that is, maximise over the choice of x the function 2x. Clearly the greater we make x the<br />
greater is 2x, and so, since there is no upper bound on x there is no maximum.<br />
Thus we might want to restrict maximisation problems to those in which we choose<br />
x from some bounded set. Again, this is not enough. Consider the problem<br />
(17)    max 0≤x≤1 1/x.<br />
The smaller we make x the greater is 1/x and yet at zero 1/x is not even defined.<br />
We could define the function to take on some value at zero, say 7. But then the<br />
function would not be continuous. Or we could leave zero out of the feasible set<br />
for x, say 0 < x ≤ 1. Then the set of feasible x is not closed. Since there would<br />
obviously still be no solution to the maximisation problem in these cases we shall<br />
want to restrict maximisation problems to those in which we choose x to maximise<br />
some continuous function from some closed (and because of the previous example)<br />
bounded set. (We call a set of numbers, or more generally a set of vectors, that<br />
is both closed and bounded a compact set.) Is there anything else that could go<br />
wrong? No! The following result says that if the function to be maximised is<br />
continuous and the set over which we are choosing is both closed and bounded, i.e.,<br />
is compact, then there is a solution to the maximisation problem.<br />
Theorem 6 (The Weierstrass Theorem). Let S be a compact set. Let f be a<br />
continuous function that takes each point in S to a real number. (We usually write:<br />
let f : S → R be continuous.) Then there is some x ∗ in S at which the function is<br />
maximised. More precisely, there is some x ∗ in S such that f(x ∗ ) ≥ f(x) for any<br />
x in S.<br />
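The content of the Weierstrass Theorem can be illustrated by brute force: on a compact set a continuous function really does attain its maximum, and a fine grid over the set gets arbitrarily close to it. The function and the set below are assumed for illustration (not from the text); a grid search is only an approximation device, not part of the theorem.

```python
# Illustration of the Weierstrass Theorem: the continuous function
# f(x) = x(1 - x) on the compact set S = [0, 1] attains its maximum,
# at x* = 1/2 with f(x*) = 1/4.  A fine grid over S locates it.
def f(x):
    return x * (1.0 - x)

n = 10_000
grid = [i / n for i in range(n + 1)]   # a grid over the compact set [0, 1]
x_star = max(grid, key=f)

assert abs(x_star - 0.5) < 1e-9
assert abs(f(x_star) - 0.25) < 1e-9
```

Dropping compactness breaks this: on (0, 1] the function 1/x has no maximiser, and the grid maximum would simply grow without bound as the grid approaches 0.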
Notice that in defining such compact sets we typically use inequalities, such<br />
as x ≥ 0. In Section 1, however, we did not consider such constraints, but rather<br />
considered only equality constraints. Yet even in the example of utility maximisation<br />
at the beginning of this chapter, there were implicitly constraints on x 1<br />
and x 2 of the form<br />
x 1 ≥ 0, x 2 ≥ 0.<br />
A truly satisfactory treatment would make such constraints explicit. It is possible<br />
to explicitly treat the maximisation problem with inequality constraints, at the<br />
price of a little additional complexity. We shall return to this question later in the<br />
book.<br />
Also, notice that had we wished to solve a minimisation problem we could<br />
have transformed the problem into a maximisation problem by simply multiplying<br />
the objective function by −1. That is, if we wish to minimise f(x) we could do<br />
so by maximising −f(x). As an exercise write out the conditions analogous to
the conditions (8) for the case that we wanted to minimise u(x). Notice that if<br />
x ∗ 1, x ∗ 2, and λ satisfy the original equations then x ∗ 1, x ∗ 2, and −λ satisfy the new<br />
equations. Thus we cannot tell whether there is a maximum at (x ∗ 1, x ∗ 2) or a<br />
minimum. This corresponds to the fact that in the case of a function of a single<br />
variable over an unconstrained domain at a maximum we require the first derivative<br />
to be zero, but that to know for sure that we have a maximum we must look at the<br />
second derivative. We shall not develop the analogous conditions for the constrained<br />
problem with many variables here. However, again, we shall return to it later in<br />
the book.<br />
2. The Implicit Function Theorem<br />
In the previous section we said things like: “Now we have three equations<br />
in x ∗ 1, x ∗ 2, and the new artificial or auxiliary variable λ. Again we can, perhaps,<br />
solve these equations for x ∗ 1, x ∗ 2, and λ.” In this section we examine the question<br />
of when we can solve a system of n equations to give n of the variables in terms<br />
of the others. Let us suppose that we have n endogenous variables x 1 , . . . , x n ,<br />
m exogenous variables or parameters, b 1 , . . . , b m , and n equations or equilibrium<br />
conditions<br />
(18)    f 1 (x 1 , . . . , x n , b 1 , . . . , b m ) = 0<br />
        f 2 (x 1 , . . . , x n , b 1 , . . . , b m ) = 0<br />
        ⋮<br />
        f n (x 1 , . . . , x n , b 1 , . . . , b m ) = 0,<br />
or, using vector notation,<br />
f(x, b) = 0,<br />
where f : R n+m → R n , x ∈ R n (that is, x is an n vector), b ∈ R m , and 0 ∈ R n .<br />
When can we solve this system to obtain functions giving each x i as a function<br />
of b 1 , . . . , b m ? As we’ll see below we only give an incomplete answer to this question,<br />
but first let’s look at the case that the function f is a linear function.<br />
Suppose that our equations are<br />
a 11 x 1 + · · · + a 1n x n + c 11 b 1 + · · · + c 1m b m = 0<br />
a 21 x 1 + · · · + a 2n x n + c 21 b 1 + · · · + c 2m b m = 0<br />
⋮<br />
a n1 x 1 + · · · + a nn x n + c n1 b 1 + · · · + c nm b m = 0.<br />
We can write this, in matrix notation, as<br />
[A | C] ⎡ x ⎤ = 0,<br />
        ⎣ b ⎦<br />
where A is an n × n matrix, C is an n × m matrix, x is an n × 1 (column) vector,<br />
and b is an m × 1 vector.<br />
This we can rewrite as<br />
Ax + Cb = 0,<br />
and solve this to give<br />
x = −A −1 Cb.<br />
And we can do this as long as the matrix A can be inverted, that is, as long as the<br />
matrix A is of full rank.<br />
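The linear case is a one-line computation once A is inverted. The sketch below (matrices assumed for illustration, not from the text) carries it out for n = 2, m = 1 with exact rational arithmetic and checks that Ax + Cb = 0 holds at the solution x = −A −1 Cb.

```python
# The linear case: solve Ax + Cb = 0 as x = -A^-1 C b for an assumed
# example with n = 2 endogenous variables and m = 1 parameter.
from fractions import Fraction as F

A = [[F(2), F(1)], [F(1), F(3)]]    # n x n, full rank (det = 5)
C = [[F(4)], [F(-1)]]               # n x m
b = [F(2)]                          # the parameter vector

# Invert the 2x2 matrix A directly via the adjoint formula.
dA = A[0][0] * A[1][1] - A[0][1] * A[1][0]
Ainv = [[A[1][1] / dA, -A[0][1] / dA], [-A[1][0] / dA, A[0][0] / dA]]

# x = -A^-1 C b
Cb = [sum(C[i][j] * b[j] for j in range(1)) for i in range(2)]
x = [-sum(Ainv[i][k] * Cb[k] for k in range(2)) for i in range(2)]

# Check that the original equations Ax + Cb = 0 hold exactly.
for i in range(2):
    assert sum(A[i][k] * x[k] for k in range(2)) + Cb[i] == 0
```

Because the solution is x = −A −1 Cb, each x i here is a linear function of the parameters b, which is exactly what fails to hold globally in the nonlinear case discussed next.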
Our answer to the general question in which the function f may not be linear<br />
is that if there are some values (¯x, ¯b) for which f(¯x, ¯b) = 0, and if we can solve the<br />
linear approximation to f around (¯x, ¯b) as we did<br />
above, then we can solve the true nonlinear system, at least in a neighbourhood of<br />
(¯x, ¯b). By this last phrase we mean that if b is not close to ¯b we may not be able to<br />
solve the system, and that for a particular value of b there may be many values of<br />
x that solve the system, but there is only one close to ¯x.<br />
To see why we can’t, in general, do better than this consider the function<br />
f : R 2 → R given by f(x, b) = g(x) − b, where the function g is graphed in Figure 2.<br />
Notice that the values (¯x, ¯b) satisfy the equation f(x, b) = 0. For all values of b<br />
close to ¯b we can find a unique value of x close to ¯x such that f(x, b) = 0. However,<br />
(1) for each value of b there are other values of x far away from ¯x that also satisfy<br />
f(x, b) = 0, and (2) there are values of b, such as ˜b for which there are no values of<br />
x that satisfy f(x, b) = 0.<br />
[Figure 2: The graph of g, which takes the value ¯b at ¯x (and at other points far from ¯x), and a value ˜b that g never attains.]<br />
Let us consider again the system of equations (18). We say that the function f<br />
is C 1 on some open set A ⊂ R n+m if f has partial derivatives everywhere in A and<br />
these partial derivatives are continuous on A.<br />
Theorem 7. Suppose that f : R n+m → R n is a C 1 function on an open set<br />
A ⊂ R n+m and that (¯x, ¯b) in A is such that f(¯x, ¯b) = 0. Suppose also that<br />
∂f(x, b)/∂x = ⎡ ∂f 1 (x, b)/∂x 1   · · ·   ∂f 1 (x, b)/∂x n ⎤<br />
              ⎢        ⋮             ⋱            ⋮        ⎥<br />
              ⎣ ∂f n (x, b)/∂x 1   · · ·   ∂f n (x, b)/∂x n ⎦<br />
is of full rank. Then there are open sets A 1 ⊂ R n and A 2 ⊂ R m with ¯x in A 1 and<br />
¯b in A2 and A 1 × A 2 ⊂ A such that for each b in A 2 there is exactly one g(b) in A 1<br />
such that f(g(b), b) = 0. Moreover, g : A 2 → A 1 is a C 1 function and<br />
∂g(b)/∂b = − [∂f(g(b), b)/∂x] −1 [∂f(g(b), b)/∂b] .<br />
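The derivative formula of Theorem 7 is easy to check in the scalar case n = m = 1. In the sketch below the function f(x, b) = x³ − b is assumed for illustration (not from the text); its implicit solution near any b > 0 is g(b) = b^(1/3), and the theorem's formula reduces to g′(b) = 1/(3x²) at x = g(b).

```python
# One-equation check of the formula dg/db = -(df/dx)^-1 (df/db) from
# Theorem 7, with the assumed example f(x, b) = x^3 - b, so g(b) = b^(1/3).
def g(b):
    return b ** (1.0 / 3.0)

b = 8.0
x = g(b)                 # g(8) = 2, and f(g(b), b) = 0

# With df/dx = 3x^2 and df/db = -1 the formula gives g'(b) = 1/(3x^2).
formula = -(1.0 / (3.0 * x * x)) * (-1.0)

# Compare with a finite-difference derivative of g.
h = 1e-6
numeric = (g(b + h) - g(b - h)) / (2 * h)
assert abs(formula - numeric) < 1e-6
```

Note that the formula is only valid where ∂f/∂x = 3x² ≠ 0; at x = 0 (that is, b = 0) the full-rank condition of the theorem fails, and indeed g(b) = b^(1/3) is not differentiable there.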
Exercise 74. Consider the general utility maximisation problem<br />
(19)    max x 1 ,x 2 ,...,x n  u(x 1 , x 2 , . . . , x n )   subject to   p 1 x 1 + p 2 x 2 + · · · + p n x n = w.<br />
Suppose that for some price vector ¯p the maximisation problem has a utility maximising<br />
bundle ¯x. Find conditions on the utility function such that in a neighbourhood<br />
of (¯x, ¯p) we can solve for the demand functions x(p). Find the derivatives of<br />
the demand functions, ∂x/∂p.<br />
Exercise 75. Now suppose that there are only two goods and the utility<br />
function is given by<br />
u(x 1 , x 2 ) = (x 1 ) 1/3 (x 2 ) 2/3 .<br />
Solve this utility maximisation problem, as you learned to do in Section 1 of this<br />
Chapter, and then differentiate the demand functions that you find to find the<br />
partial derivative with respect to p 1 , p 2 , and w of each demand function.<br />
Also find the same derivatives using the method of the previous exercise.<br />
3. The Theorem of the Maximum<br />
Often in economics we are not so much interested in what the solution to a<br />
particular maximisation problem is but rather wish to know how the solution to a<br />
parameterised problem depends on the parameters. Thus in our first example of<br />
utility maximisation we might be interested not so much in what the solution to the<br />
maximisation problem is when p 1 = 2, p 2 = 7, and y = 25, but rather in how the<br />
solution depends on p 1 , p 2 , and y. (That is, we might be interested in the demand<br />
function.) Sometimes we shall also be interested in how the maximised function<br />
depends on the parameters—in the example how the maximised utility depends on<br />
p 1 , p 2 , and y.<br />
This raises a number of questions. In order for us to speak meaningfully of a<br />
demand function it should be the case that the maximisation problem has a unique<br />
solution. Further, we would like to know if the “demand” function is continuous—<br />
or even if it is differentiable. Consider again the multi-constraint problem of Section 1, but this time let us<br />
explicitly add some parameters.<br />
(20)    max x 1 ,...,x n  f(x 1 , . . . , x n , a 1 , . . . , a k )<br />
        subject to g 1 (x 1 , . . . , x n , a 1 , . . . , a k ) = c 1<br />
        ⋮<br />
        g m (x 1 , . . . , x n , a 1 , . . . , a k ) = c m<br />
In order to be able to say whether or not the problem has a unique solution<br />
it is useful to know something about the shape or curvature of the functions f<br />
and g. We say a function is concave if for any two points in the domain of the<br />
function the value of the function at a weighted average of the two points is at least as great<br />
as the weighted average of the values of the function at the two points. We say<br />
the function is convex if the value of the function at the average is less than the<br />
average of the values. The following definition makes this a little more explicit. (In<br />
both definitions x = (x 1 , . . . , x n ) is a vector.)<br />
Definition 15. A function f is concave if for any x and x ′ with x ≠ x ′ and<br />
for any t such that 0 < t < 1 we have f(tx + (1 − t)x ′ ) ≥ tf(x) + (1 − t)f(x ′ ). The<br />
function is strictly concave if f(tx + (1 − t)x ′ ) > tf(x) + (1 − t)f(x ′ ).<br />
A function f is convex if for any x and x ′ with x ≠ x ′ and for any t such that<br />
0 < t < 1 we have f(tx + (1 − t)x ′ ) ≤ tf(x) + (1 − t)f(x ′ ). The function is strictly<br />
convex if f(tx + (1 − t)x ′ ) < tf(x) + (1 − t)f(x ′ ).<br />
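Definition 15 can be checked pointwise for a concrete function. In the sketch below the functions −x² (strictly concave) and x² (strictly convex) and the sample points are assumed for illustration; for these functions the gap between the two sides of the inequality is t(1 − t)(x − x′)², which is strictly positive whenever x ≠ x′ and 0 < t < 1.

```python
# Pointwise check of Definition 15: f(x) = -x^2 is strictly concave and
# g(x) = x^2 is strictly convex.  Sample points are assumed for illustration.
points = [(-3.0, 5.0), (0.5, 2.0), (-7.0, -1.0)]   # pairs with x != x'
ts = [0.25, 0.5, 0.9]                               # weights with 0 < t < 1

for x, xp in points:
    for t in ts:
        avg = t * x + (1.0 - t) * xp
        # Strict concavity of -x^2: the value at the average strictly
        # exceeds the average of the values.
        assert -avg ** 2 > t * (-x ** 2) + (1.0 - t) * (-xp ** 2)
        # Strict convexity of x^2: the inequality reverses.
        assert avg ** 2 < t * x ** 2 + (1.0 - t) * xp ** 2
```

A function that is linear, such as f(x) = 2x, satisfies both weak inequalities with equality, so it is concave and convex but neither strictly.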
The result we are about to give is most conveniently stated when our statement<br />
of the problem is in terms of inequality constraints rather than equality constraints.<br />
As mentioned earlier we shall examine this kind of problem later in this course.<br />
However for the moment in order to proceed with our discussion of the problem<br />
involving equality constraints we shall assume that all of the functions with which<br />
we are dealing are increasing in the x variables. (See Exercise 76 for a formal<br />
definition of what it means for a function to be increasing.) In this case if f is<br />
strictly concave and g j is convex for each j then the problem has a unique solution.<br />
In fact the concepts of concavity and convexity are somewhat stronger than is<br />
required. We shall see later in the course that they can be replaced by the concepts<br />
of quasi-concavity and quasi-convexity. In some sense these latter concepts are the<br />
“right” concepts for this result.<br />
Theorem 8. Suppose that f and g j are increasing in (x 1 , . . . , x n ). If f is<br />
strictly concave in (x 1 , . . . , x n ) and g j is convex in (x 1 , . . . , x n ) for j = 1, . . . , m<br />
then for each value of the parameters (a 1 , . . . , a k ) if problem (20) has a solution<br />
(x ∗ 1, . . . , x ∗ n) that solution is unique.<br />
Now let v(a 1 , . . . , a k ) be the maximised value of f when the parameters are<br />
(a 1 , . . . , a k ). Let us suppose that the problem is such that the solution is unique and<br />
that (x ∗ 1(a 1 , . . . , a k ), . . . , x ∗ n(a 1 , . . . , a k )) are the values that maximise the function<br />
f when the parameters are (a 1 , . . . , a k ) then<br />
(21) v(a 1 , . . . , a k ) = f(x ∗ 1(a 1 , . . . , a k ), . . . , x ∗ n(a 1 , . . . , a k ), a 1 , . . . , a k ).<br />
(Notice however that the function v is uniquely defined even if there is not a unique<br />
maximiser.)<br />
The Theorem of the Maximum gives conditions on the problem under which<br />
the function v and the functions x ∗ 1, . . . , x ∗ n are continuous. The constraints in the<br />
problem (20) define a set of feasible vectors x over which the function f is to be<br />
maximised. Let us call this set G(a 1 , . . . , a k ), i.e.,<br />
(22) G(a 1 , . . . , a k ) = {(x 1 , . . . , x n ) | g j (x 1 , . . . , x n , a 1 , . . . , a k ) = c j ∀j}<br />
Now we can restate the problem as<br />
(23)    max x 1 ,...,x n  f(x 1 , . . . , x n , a 1 , . . . , a k )   subject to   (x 1 , . . . , x n ) ∈ G(a 1 , . . . , a k ).<br />
Notice that both the function f and the feasible set G depend on the parameters<br />
a, i.e., both may change as a changes. The Theorem of the Maximum requires<br />
both that the function f be continuous as a function of x and a and that the<br />
feasible set G(a 1 , . . . , a k ) change continuously as a changes. We already know—<br />
or should know—what it means for f to be continuous but the notion of what it<br />
means for a set to change continuously is less elementary. We call G a set valued<br />
function or a correspondence. G associates with any vector (a 1 , . . . , a k ) a subset of<br />
the vectors (x 1 , . . . , x n ). The following two definitions define what we mean by a<br />
correspondence being continuous. First we define what it means for two sets to be<br />
close.<br />
Definition 16. Two sets of vectors A and B are within ɛ of each other if for<br />
any vector x in one set there is a vector x ′ in the other set such that x ′ is within ɛ<br />
of x.<br />
We can now define the continuity of the correspondence G in essentially the<br />
same way that we define the continuity of a single valued function.
Definition 17. The correspondence G is continuous at (a 1 , . . . , a k ) if for any<br />
ɛ > 0 there is δ > 0 such that if (a ′ 1, . . . , a ′ k ) is within δ of (a 1, . . . , a k ) then<br />
G(a ′ 1, . . . , a ′ k ) is within ɛ of G(a 1, . . . , a k ).<br />
It is, unfortunately, not the case that the continuity of the functions g j necessarily<br />
implies the continuity of the feasible set. (Exercise 77 asks you to construct a<br />
counterexample.)<br />
Remark 1. It is possible to define two weaker notions of continuity, which we<br />
call upper hemicontinuity and lower hemicontinuity. A correspondence is in fact<br />
continuous in the way we have defined it if it is both upper hemicontinuous and<br />
lower hemicontinuous.<br />
We are now in a position to state the Theorem of the Maximum. We assume<br />
that f is a continuous function, that G is a continuous correspondence, and that<br />
for any (a 1 , . . . , a k ) the set G(a 1 , . . . , a k ) is compact. The Weierstrass Theorem<br />
thus guarantees that there is a solution to the maximisation problem (23) for any<br />
(a 1 , . . . , a k ).<br />
Theorem 9 (Theorem of the Maximum). Suppose that f(x 1 , . . . , x n , a 1 , . . . , a k )<br />
is continuous (in (x 1 , . . . , x n , a 1 , . . . , a k )), that G(a 1 , . . . , a k ) is a continuous correspondence,<br />
and that for any (a 1 , . . . , a k ) the set G(a 1 , . . . , a k ) is compact. Then<br />
(1) v(a 1 , . . . , a k ) is continuous, and<br />
(2) if (x ∗ 1(a 1 , . . . , a k ), . . . , x ∗ n(a 1 , . . . , a k )) are (single valued) functions then<br />
they are also continuous.<br />
Later in the course we shall see how the Implicit Function Theorem allows us<br />
to identify conditions under which the functions v and x ∗ are differentiable.<br />
Exercises.<br />
Exercise 76. We say that the function f(x 1 , . . . , x n ) is nondecreasing if x ′ i ≥<br />
x i for each i implies that f(x ′ 1, . . . , x ′ n) ≥ f(x 1 , . . . , x n ), is increasing if x ′ i > x i<br />
for each i implies that f(x ′ 1, . . . , x ′ n) > f(x 1 , . . . , x n ) and is strictly increasing if<br />
x ′ i ≥ x i for each i and x ′ j > x j for at least one j implies that f(x ′ 1, . . . , x ′ n) ><br />
f(x 1 , . . . , x n ). Show that if f is nondecreasing and strictly concave then it must be<br />
strictly increasing. [Hint: This is very easy.]<br />
Exercise 77. Show by example that even if the functions g j are continuous<br />
the correspondence G may not be continuous. [Hint: Use the case n = m = k = 1.]<br />
4. The Envelope Theorem<br />
In this section we examine a theorem that is particularly useful in the study<br />
of consumer and producer theory. There is in fact nothing mysterious about this<br />
theorem. You will see that the proof of this theorem is simply calculation and a<br />
number of substitutions. Moreover the theorem has a very clear intuition. It is this:<br />
Suppose we are at a maximum (in an unconstrained problem) and we change the<br />
data of the problem by a very small amount. Now both the solution of the problem<br />
and the value at the maximum will change. However at a maximum the function<br />
is flat (the first derivative is zero). Thus when we want to know by how much the
maximised value has changed it does not matter (very much) whether or not we
take account of how the maximiser changes. See Figure 2. The intuition for
a constrained problem is similar and only a little more complicated.<br />
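This intuition can be checked numerically on a hypothetical unconstrained problem (my example, not from the notes): take f(x, a) = −(x − a)² + 2a, so x*(a) = a and v(a) = 2a. Because f is flat in x at the maximum, the derivative of v equals the partial derivative of f with respect to a with x frozen at x*(a):

```python
# Hypothetical example: f(x, a) = -(x - a)**2 + 2*a, so x*(a) = a, v(a) = 2*a.
# The envelope intuition: dv/da equals df/da holding x fixed at x*(a).

def f(x, a):
    return -(x - a) ** 2 + 2 * a

def v(a, steps=40000, lo=-5.0, hi=5.0):
    """Maximise f(., a) over a fine grid."""
    return max(f(lo + (hi - lo) * i / steps, a) for i in range(steps + 1))

a, h = 1.0, 1e-4
dv_da = (v(a + h) - v(a - h)) / (2 * h)        # total derivative of the value
df_da = (f(a, a + h) - f(a, a - h)) / (2 * h)  # partial derivative, x frozen at x*(a) = a
assert abs(dv_da - 2.0) < 1e-2
assert abs(df_da - 2.0) < 1e-6
```

The two finite differences agree even though the first lets the maximiser move and the second does not.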
To motivate our discussion of the Envelope Theorem we will first consider a<br />
particular case, viz, the relation between short and long run average cost curves.<br />
Recall that, in general we assume that the average cost of producing some good is
[Figure 2: the graphs of f(·, a) and f(·, a′) against x, with maximisers x*(a) and x*(a′), showing the values f(x*(a), a), f(x*(a), a′), and f(x*(a′), a′).]
a function of the amount of the good to be produced. The short run average cost<br />
function is defined to be the function which for any quantity, Q, gives the average<br />
cost of producing that quantity, taking as given the scale of operation, i.e., the size<br />
and number of plants and other fixed capital which we assume cannot be changed<br />
in the short run (whatever that is). The long run average cost function on the<br />
other hand gives, as a function of Q, the average cost of producing Q units of the<br />
good, with the scale of operation selected to be the optimal scale for that level of<br />
production.<br />
That is, if we let the scale of operation be measured by a single variable k,<br />
say, and we let the short run average cost of producing Q units when the scale is<br />
k be given by SRAC(Q, k) and the long run average cost of producing Q units by<br />
LRAC(Q) then we have<br />
LRAC(Q) = min_k SRAC(Q, k).
Let us denote, for a given value Q, the optimal level of k by k(Q). That is, k(Q) is<br />
the value of k that minimises the right hand side of the above equation.<br />
Graphically, for any fixed level of k the short run average cost function can be<br />
represented by a curve (normally assumed to be U-shaped) drawn in two dimensions<br />
with quantity on the horizontal axis and cost on the vertical axis. Now think about<br />
drawing one short run average cost curve for each of the (infinite) possible values of<br />
k. One way of thinking about the long run average cost curve is as the “bottom” or<br />
envelope of these short run average cost curves. Suppose that we consider a point<br />
on this long run or envelope curve. What can be said about the slope of the long<br />
run average cost curve at this point? A little thought should convince you that it
should be the same as the slope of the short run curve through the same point.<br />
(If it were not then that short run curve would come below the long run curve, a
contradiction.) That is,

d LRAC(Q)/dQ = ∂SRAC(Q, k(Q))/∂Q.

See Figure 3.
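The tangency can be checked numerically with a hypothetical cost specification (mine, not the notes'): SRAC(Q, k) = k + Q²/k. Minimising over k gives k(Q) = Q and LRAC(Q) = 2Q, so both slopes at any Q should equal 2:

```python
# Hypothetical specification: SRAC(Q, k) = k + Q**2 / k, so k(Q) = Q,
# LRAC(Q) = 2*Q, and the long and short run slopes coincide at k = k(Q).

def srac(q, k):
    return k + q * q / k

def lrac(q):
    # minimise over a fine grid of scales k
    return min(srac(q, 0.01 + i * 0.001) for i in range(10000))

q, h = 3.0, 1e-3
slope_lrac = (lrac(q + h) - lrac(q - h)) / (2 * h)
slope_srac = (srac(q + h, q) - srac(q - h, q)) / (2 * h)  # k fixed at k(q) = q
assert abs(slope_lrac - 2.0) < 1e-2
assert abs(slope_srac - 2.0) < 1e-9
```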
[Figure 3: cost against Q; the long run average cost curve LRAC as the lower envelope of the short run average cost curves. The SRAC curve for scale k(Q̄) is tangent to LRAC at Q̄, where LRAC(Q̄) = SRAC(Q̄, k(Q̄)).]
The envelope theorem is a general statement of the result of which this is a<br />
special case. We will consider not only cases in which Q and k are vectors, but also<br />
cases in which the maximisation or minimisation problem includes some constraints.<br />
Let us consider again the maximisation problem (20). Recall:<br />
max_{x_1,…,x_n} f(x_1, …, x_n, a_1, …, a_k)
subject to g_1(x_1, …, x_n, a_1, …, a_k) = c_1
⋮
g_m(x_1, …, x_n, a_1, …, a_k) = c_m.

Again let L(x_1, …, x_n, λ_1, …, λ_m; a_1, …, a_k) be the Lagrangian function:

(24) L(x_1, …, x_n, λ_1, …, λ_m; a_1, …, a_k) = f(x_1, …, x_n, a_1, …, a_k) + Σ_{j=1}^m λ_j (c_j − g_j(x_1, …, x_n, a_1, …, a_k)).
Let (x*_1(a_1, …, a_k), …, x*_n(a_1, …, a_k)) and (λ_1(a_1, …, a_k), …, λ_m(a_1, …, a_k)) be
the values of x and λ that solve this problem. Now let

(25) v(a_1, …, a_k) = f(x*_1(a_1, …, a_k), …, x*_n(a_1, …, a_k), a_1, …, a_k).
That is, v(a 1 , . . . , a k ) is the maximised value of the function f when the parameters<br />
are (a 1 , . . . , a k ). The envelope theorem says that the derivative of v is equal to the<br />
derivative of L at the maximising values of x and λ. Or, more precisely
483. CONSUMER BEHAVIOUR: OPTIMISATION SUBJECT TO THE BUDGET CONSTRAINT<br />
Theorem 10 (The Envelope Theorem). If all functions are defined as above
and the problem is such that the functions x* and λ are well defined then

∂v/∂a_h (a_1, …, a_k) = ∂L/∂a_h (x*_1(a_1, …, a_k), …, x*_n(a_1, …, a_k), λ_1(a_1, …, a_k), …, λ_m(a_1, …, a_k), a_1, …, a_k)

= ∂f/∂a_h (x*_1(a_1, …, a_k), …, x*_n(a_1, …, a_k), a_1, …, a_k) − Σ_{j=1}^m λ_j(a_1, …, a_k) ∂g_j/∂a_h (x*_1(a_1, …, a_k), …, x*_n(a_1, …, a_k), a_1, …, a_k)

for all h.
In order to show the advantages of using matrix and vector notation we shall<br />
restate the theorem in that notation before returning to give a proof of the theorem.<br />
(In proving the theorem we shall return to using mainly scalar notation.)<br />
Theorem 10 (The Envelope Theorem). Under the same conditions as above

∂v/∂a (a) = ∂L/∂a (x*(a), λ(a), a) = ∂f/∂a (x*(a), a) − λ(a) ∂g/∂a (x*(a), a).
Proof. From the definition of the function v we have

(26) v(a_1, …, a_k) = f(x*_1(a_1, …, a_k), …, x*_n(a_1, …, a_k), a_1, …, a_k).

Thus

(27) ∂v/∂a_h (a) = ∂f/∂a_h (x*(a), a) + Σ_{i=1}^n ∂f/∂x_i (x*(a), a) ∂x*_i/∂a_h (a).

Now, from the first order conditions (12) we have

∂f/∂x_i (x*(a), a) − Σ_{j=1}^m λ_j(a) ∂g_j/∂x_i (x*(a), a) = 0.

Or

(28) ∂f/∂x_i (x*(a), a) = Σ_{j=1}^m λ_j(a) ∂g_j/∂x_i (x*(a), a).

Also, since x*(a) satisfies the constraints we have, for each j,

g_j(x*_1(a), …, x*_n(a), a_1, …, a_k) ≡ c_j.

And, since this holds as an identity, we may differentiate both sides with respect
to a_h, giving

Σ_{i=1}^n ∂g_j/∂x_i (x*(a), a) ∂x*_i/∂a_h (a) + ∂g_j/∂a_h (x*(a), a) = 0.

Or

(29) Σ_{i=1}^n ∂g_j/∂x_i (x*(a), a) ∂x*_i/∂a_h (a) = −∂g_j/∂a_h (x*(a), a).

Substituting (28) into (27) gives

∂v/∂a_h (a) = ∂f/∂a_h (x*(a), a) + Σ_{i=1}^n [Σ_{j=1}^m λ_j(a) ∂g_j/∂x_i (x*(a), a)] ∂x*_i/∂a_h (a).
Changing the order of summation gives

(30) ∂v/∂a_h (a) = ∂f/∂a_h (x*(a), a) + Σ_{j=1}^m λ_j(a) [Σ_{i=1}^n ∂g_j/∂x_i (x*(a), a) ∂x*_i/∂a_h (a)].

And now substituting (29) into (30) gives

∂v/∂a_h (a) = ∂f/∂a_h (x*(a), a) − Σ_{j=1}^m λ_j(a) ∂g_j/∂a_h (x*(a), a),

which is the required result. □
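The theorem can be checked numerically on a small hypothetical constrained problem (my example, not from the notes): maximise x_1 x_2 subject to x_1 + a x_2 = 2. The first order conditions give x_1 = 1, x_2 = 1/a, and λ = 1/a, so v(a) = 1/a, and the theorem predicts ∂v/∂a = ∂L/∂a = −λ ∂g/∂a = −λ x_2:

```python
# Hypothetical problem: max x1*x2 subject to g(x, a) = x1 + a*x2 = 2.

def solve(a):
    # FOCs: x2 = lam, x1 = lam*a; with the constraint this gives x1 = 1, x2 = 1/a
    x1, x2 = 1.0, 1.0 / a
    lam = x2                      # since df/dx1 = x2 = lam * dg/dx1
    return x1, x2, lam

def v(a):
    x1, x2, _ = solve(a)
    return x1 * x2                # maximised value, v(a) = 1/a

a, h = 2.0, 1e-5
dv_da = (v(a + h) - v(a - h)) / (2 * h)   # numerical derivative of the value
_, x2, lam = solve(a)
envelope = -lam * x2                      # dL/da = -lam * dg/da, with dg/da = x2
assert abs(dv_da - envelope) < 1e-6
```

Note that the check never differentiates x*(a): only the partial derivative of the Lagrangian with respect to the parameter is needed.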
Exercises.<br />
Exercise 78. Rewrite this proof using matrix notation. Go through your proof
and identify the dimension of each of the vectors or matrices you use. For example
f_x is a 1 × n vector and g_x is an m × n matrix.
5. Applications to Microeconomic Theory<br />
5.1. Utility Maximisation. Let us again consider the problem

max_{x_1,x_2} u(x_1, x_2)
subject to p_1 x_1 + p_2 x_2 − y = 0.

Let v(p_1, p_2, y) be the maximised value of u when prices and income are p_1, p_2, and
y. Let us consider the effect of a change in y with p_1 and p_2 remaining constant.
By the Envelope Theorem

∂v/∂y = ∂/∂y {u(x_1, x_2) + λ(y − p_1 x_1 − p_2 x_2)} = 0 + λ · 1 = λ.

This is the familiar result that λ is the marginal utility of income.
5.2. Expenditure Minimisation. Let us consider the problem of minimising
expenditure subject to attaining a given level of utility, i.e.,

min_{x_1,…,x_n} Σ_{i=1}^n p_i x_i
subject to u(x_1, …, x_n) − u_0 = 0.

Let the minimised value of expenditure, the expenditure function, be denoted by
e(p_1, …, p_n, u_0). Then by the Envelope Theorem we obtain

∂e/∂p_i = ∂/∂p_i {Σ_{i=1}^n p_i x_i + λ(u_0 − u(x_1, …, x_n))} = x_i − λ · 0 = x_i

when evaluated at the point which solves the minimisation problem, which we write
as h_i(p_1, …, p_n, u_0) to distinguish this (compensated) value of the demand for good
i as a function of prices and utility from the (uncompensated) value of the demand
for good i as a function of prices and income. This result is known as Hotelling’s
Theorem.
5.3. The Hicks-Slutsky Equations. It can be shown that the compensated
demand at utility u_0, i.e., h_i(p_1, …, p_n, u_0), is equal to the uncompensated demand
at income e(p_1, …, p_n, u_0), i.e., x_i(p_1, …, p_n, e(p_1, …, p_n, u_0)). (This result
is known as the duality theorem.) Thus, totally differentiating the identity

x_i(p_1, …, p_n, e(p_1, …, p_n, u_0)) ≡ h_i(p_1, …, p_n, u_0)

with respect to p_k we obtain

∂x_i/∂p_k + (∂x_i/∂y)(∂e/∂p_k) = ∂h_i/∂p_k,

which by Hotelling’s Theorem gives

∂x_i/∂p_k + (∂x_i/∂y) h_k = ∂h_i/∂p_k.

So

∂x_i/∂p_k = ∂h_i/∂p_k − h_k (∂x_i/∂y)

for all i, k = 1, …, n. These are the Hicks-Slutsky equations.
5.4. The Indirect Utility Function. Again let v(p_1, …, p_n, y) be the indirect
utility function, that is, the maximised value of utility as described in Section 5.1.
Then by the Envelope Theorem

∂v/∂p_i = ∂u/∂p_i − λ x_i(p_1, …, p_n, y) = −λ x_i(p_1, …, p_n, y)

since ∂u/∂p_i = 0. Now, since we have already shown (in Section 5.1) that λ = ∂v/∂y,
we have

x_i(p_1, …, p_n, y) = −(∂v/∂p_i)/(∂v/∂y).

This is known as Roy’s Theorem.
5.5. Profit Functions. Now consider the problem of a firm that maximises
profits subject to technology constraints. Let x = (x_1, …, x_n) be a vector of
netputs, i.e., x_i is positive if the firm is a net supplier of good i, negative if the firm
is a net user of that good. Let us assume that we can write the technology constraints
as F(x) = 0. Thus the firm’s problem is

max_{x_1,…,x_n} Σ_{i=1}^n p_i x_i
subject to F(x_1, …, x_n) = 0.

Let φ_i(p) be the value of x_i that solves this problem, i.e., the net supply of
commodity i when prices are p. (Here p is a vector.) We call the maximised value
the profit function, which is given by

Π(p) = Σ_{i=1}^n p_i φ_i(p).

And so by the Envelope Theorem

∂Π/∂p_i = φ_i(p).

This result is known as Hotelling’s lemma.
5.6. Cobb-Douglas Example. We consider a particular Cobb-Douglas example
of the utility maximisation problem

(31) max_{x_1,x_2} √x_1 √x_2
subject to p_1 x_1 + p_2 x_2 = w.

The Lagrangean is

(32) L(x_1, x_2, λ) = √x_1 √x_2 + λ(w − p_1 x_1 − p_2 x_2)

and the first order conditions are

(33) ∂L/∂x_1 = (1/2) x_1^{−1/2} x_2^{1/2} − p_1 λ = 0
(34) ∂L/∂x_2 = (1/2) x_1^{1/2} x_2^{−1/2} − p_2 λ = 0
(35) ∂L/∂λ = w − p_1 x_1 − p_2 x_2 = 0.
If we divide equation (33) by equation (34) we obtain

x_2/x_1 = p_1/p_2

or

p_1 x_1 = p_2 x_2,

and if we substitute this into equation (35) we obtain

w − p_1 x_1 − p_1 x_1 = 0

or

(36) x_1 = w/(2p_1).

Similarly,

(37) x_2 = w/(2p_2).
Substituting equations (36) and (37) into the utility function gives

(38) v(p_1, p_2, w) = √(w²/(4 p_1 p_2)) = w/(2√(p_1 p_2)).

As a check we can confirm some known properties of the indirect utility
function. For example, it is homogeneous of degree zero, that is, if we multiply p_1,
p_2, and w by the same positive constant, say α, we do not change the value of v.
You should confirm that this is the case.
We now calculate the optimal value of λ from the first order conditions by
substituting equations (36) and (37) into (33), giving

(1/2) (w/(2p_1))^{−1/2} (w/(2p_2))^{1/2} − p_1 λ = 0

or

(1/2) √((2p_1)/w) √(w/(2p_2)) = p_1 λ

or

(1/2) √(p_1/p_2) · (1/p_1) = λ

or

λ = 1/(2√(p_1 p_2)).
Our first application of the Envelope Theorem told us that this value of λ could
be found as the derivative of the indirect utility function with respect to w. We
confirm this by differentiating the function we found above with respect to w:

∂v/∂w = ∂/∂w (w/(2√(p_1 p_2))) = 1/(2√(p_1 p_2)),
as we had found directly above.<br />
Now let us, for the same utility function, consider the expenditure minimisation
problem

min_{x_1,x_2} p_1 x_1 + p_2 x_2
subject to √x_1 √x_2 = u.

The Lagrangian is

(39) L(x_1, x_2, λ) = p_1 x_1 + p_2 x_2 + λ(u − √x_1 √x_2)
and the first order conditions are

(40) ∂L/∂x_1 = p_1 − λ (1/2) x_1^{−1/2} x_2^{1/2} = 0
(41) ∂L/∂x_2 = p_2 − λ (1/2) x_1^{1/2} x_2^{−1/2} = 0
(42) ∂L/∂λ = u − √x_1 √x_2 = 0.
Dividing equation (40) by equation (41) gives

p_1/p_2 = x_2/x_1

or

(43) x_2 = p_1 x_1 / p_2.

And, if we substitute equation (43) into equation (42) we obtain

u − x_1 √(p_1/p_2) = 0

or

x_1 = u √(p_2/p_1).

Similarly,

x_2 = u √(p_1/p_2),
and if we substitute these values back into the objective function we obtain the
expenditure function

e(p_1, p_2, u) = p_1 u √(p_2/p_1) + p_2 u √(p_1/p_2) = 2u √(p_1 p_2).
Hotelling’s Theorem tells us that if we differentiate this expenditure function
with respect to p_i we should obtain the Hicksian demand function h_i:

∂e(p_1, p_2, u)/∂p_1 = ∂/∂p_1 (2u √(p_1 p_2)) = 2u · (1/2) √(p_2/p_1) = u √(p_2/p_1)

as we had already found. And similarly for h_2.
Let us summarise what we have found so far. The Marshallian demand functions
are

x_1(p_1, p_2, w) = w/(2p_1)
x_2(p_1, p_2, w) = w/(2p_2).

The indirect utility function is

v(p_1, p_2, w) = w/(2√(p_1 p_2)).

The Hicksian demand functions are

h_1(p_1, p_2, u) = u √(p_2/p_1)
h_2(p_1, p_2, u) = u √(p_1/p_2),

and the expenditure function is

e(p_1, p_2, u) = 2u √(p_1 p_2).
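These formulas can also be checked numerically at arbitrarily chosen prices and income. The sketch below (standard library only) verifies the duality between e and v, and confirms Hotelling's Theorem and Roy's Theorem by finite differences:

```python
import math

# Numerical checks of the Cobb-Douglas formulas summarised above.

def x1(p1, p2, w): return w / (2 * p1)          # Marshallian demands
def v(p1, p2, w):  return w / (2 * math.sqrt(p1 * p2))  # indirect utility
def h1(p1, p2, u): return u * math.sqrt(p2 / p1)        # Hicksian demand
def e(p1, p2, u):  return 2 * u * math.sqrt(p1 * p2)    # expenditure function

p1, p2, w, eps = 2.0, 5.0, 30.0, 1e-5
u = v(p1, p2, w)

# duality: Marshallian demand at income e(p, u) is the Hicksian demand
assert abs(x1(p1, p2, e(p1, p2, u)) - h1(p1, p2, u)) < 1e-9
# e and v are inverse to one another
assert abs(e(p1, p2, v(p1, p2, w)) - w) < 1e-9
# Hotelling: de/dp1 = h1
de_dp1 = (e(p1 + eps, p2, u) - e(p1 - eps, p2, u)) / (2 * eps)
assert abs(de_dp1 - h1(p1, p2, u)) < 1e-6
# Roy: x1 = -(dv/dp1)/(dv/dw)
dv_dp1 = (v(p1 + eps, p2, w) - v(p1 - eps, p2, w)) / (2 * eps)
dv_dw = (v(p1, p2, w + eps) - v(p1, p2, w - eps)) / (2 * eps)
assert abs(-dv_dp1 / dv_dw - x1(p1, p2, w)) < 1e-6
```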
We now look at the third application, concerning the Hicks-Slutsky decomposition.
First let us confirm that if we substitute the expenditure function for w in
the Marshallian demand function we do obtain the Hicksian demand function:

x_1(p_1, p_2, e(p_1, p_2, u)) = e(p_1, p_2, u)/(2p_1) = 2u√(p_1 p_2)/(2p_1) = u √(p_2/p_1),

as required.
Similarly, if we plug the indirect utility function v into the Hicksian demand<br />
function h i we obtain the Marshallian demand function x i . Confirmation of this is<br />
left as an exercise. [You should do this exercise. If you understand properly it is<br />
very easy. If you understand a bit then doing the exercise will solidify your understanding.<br />
If you can’t do it then it is a message to get some further explanation.]<br />
Let us now check the Hicks-Slutsky decomposition for the effect of a change in
the price of good 2 on the demand for good 1. The Hicks-Slutsky decomposition
tells us that

∂x_1/∂p_2 = ∂h_1/∂p_2 − h_2 (∂x_1/∂w).

Calculating these partial derivatives we have

∂x_1/∂p_2 = 0,
∂x_1/∂w = 1/(2p_1),
∂h_1/∂p_2 = (u/√p_1) · (1/2) · (1/√p_2) = u/(2√(p_1 p_2)),
and

h_2 = u √(p_1/p_2).

Substituting into the right hand side of the Hicks-Slutsky equation above gives

RHS = u/(2√(p_1 p_2)) − u √(p_1/p_2) · (1/(2p_1)) = 0,

which is exactly what we had found for the left hand side of the Hicks-Slutsky
equation.
Finally we check Roy’s Theorem, which tells us that the Marshallian demand
for good 1 can be found as

x_1(p_1, p_2, w) = −(∂v/∂p_1)/(∂v/∂w).

In this case we obtain

x_1(p_1, p_2, w) = −(−(w/4) p_1^{−3/2} p_2^{−1/2}) / ((1/2) p_1^{−1/2} p_2^{−1/2}) = w/(2p_1),

as required.

Exercises.

Exercise 79. Consider the direct utility function

u(x) = Σ_{i=1}^n β_i log(x_i − γ_i),

where β_i and γ_i, i = 1, …, n, are, respectively, positive and nonpositive parameters.
(1) Derive the indirect utility function and show that it is decreasing in its<br />
arguments.<br />
(2) Verify Roy’s Theorem.<br />
(3) Derive the expenditure function and show that it is homogeneous of degree<br />
one and nondecreasing in prices.<br />
(4) Verify Hotelling’s Theorem.<br />
Exercise 80. For the utility function defined in Exercise 79,
(1) Derive the Slutsky equation.<br />
(2) Let d i (p, y) be the demand for good i derived from the above utility function.<br />
Goods i and j are said to be gross substitutes if ∂d i (p, y)/∂p j > 0<br />
and gross complements if ∂d i (p, y)/∂p j < 0. For this utility function are<br />
the various goods gross substitutes, gross complements, or can we not say?<br />
(The two previous exercises are taken from R. Robert Russell and Maurice<br />
Wilkinson, Microeconomics: A Synthesis of Modern and Neoclassical Theory, New<br />
York, John Wiley & Sons, 1979.)<br />
Exercise 81. An electric utility has two generating plants in which total costs
per hour are c_1 and c_2 respectively, where

c_1 = 80 + 2x_1 + 0.001b x_1², with b > 0, and
c_2 = 90 + 1.5x_2 + 0.002x_2²,
where x_i is the quantity generated in the i-th plant. If the utility is required to produce
2000 megawatts in a particular hour, how should it allocate this load between
the plants so as to minimise costs? Use the Lagrangian method and interpret the
multiplier. How do total costs vary as b changes? (That is, what is the derivative
of the minimised cost with respect to b?)
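A numerical sketch of a solution check for this exercise (the closed forms below are my own working from equating marginal costs across the plants, not given in the notes):

```python
# Minimise c1 + c2 subject to x1 + x2 = load. At an interior optimum the
# marginal costs are equal: 2 + 0.002*b*x1 = 1.5 + 0.004*x2, which solves to
# the allocation below; the multiplier is the common marginal cost.

def allocation(b, load=2000.0):
    x1 = (0.004 * load - 0.5) / (0.002 * b + 0.004)
    return x1, load - x1

def total_cost(b, load=2000.0):
    x1, x2 = allocation(b, load)
    return (80 + 2 * x1 + 0.001 * b * x1 ** 2) + (90 + 1.5 * x2 + 0.002 * x2 ** 2)

b = 1.0
x1, x2 = allocation(b)
assert abs(x1 - 1250.0) < 1e-9 and abs(x2 - 750.0) < 1e-9
# the multiplier: marginal cost of one more megawatt, equal across plants
lam = 2 + 0.002 * b * x1
assert abs(lam - (1.5 + 0.004 * x2)) < 1e-9
# envelope theorem: d(min cost)/db = dL/db = 0.001 * x1**2
h = 1e-6
dcost_db = (total_cost(b + h) - total_cost(b - h)) / (2 * h)
assert abs(dcost_db - 0.001 * x1 ** 2) < 1e-2
```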
CHAPTER 4<br />
Topics in Convex <strong>Analysis</strong><br />
1. Convexity<br />
Convexity is one of the most important mathematical properties in economics.<br />
For example, without convexity of preferences, demand and supply functions
need not be continuous, and so competitive markets generally do not have equilibrium
points. The economic interpretation of convex preference sets in consumer theory is
diminishing marginal rates of substitution; the interpretation of convex production<br />
sets is constant or decreasing returns to scale. Considerably less is known about<br />
general equilibrium models that allow non-convex production sets (e.g., economies<br />
of scale) or non-convex preferences (e.g., the consumer prefers a pint of beer or a<br />
shot of vodka alone to any mixture of the two).<br />
Another set of mathematical results closely connected to the notion of convexity<br />
is so-called separation and support theorems. These theorems are frequently used in<br />
economics to obtain a price system that leads consumers and producers to choose<br />
a Pareto-efficient allocation. That is, given the prices, producers are maximizing
profits, and given those profits as income, consumers are maximizing utility subject<br />
to their budget constraints.<br />
1.1. Convex Sets. Given two points x, y ∈ R n , a point z = ax + (1 − a) y,<br />
where 0 ≤ a ≤ 1, is called a convex combination of x and y.<br />
The set of all possible convex combinations of x and y, denoted by [x, y], is<br />
called the interval with endpoints x and y (or, the line segment connecting x and<br />
y):<br />
[x, y] = {ax + (1 − a) y : 0 ≤ a ≤ 1} .<br />
Definition 18. A set S ⊆ R n is convex iff for any x, y ∈ S the interval<br />
[x, y] ⊆ S.<br />
In words: a set is convex if it contains the line segment connecting any two of<br />
its points; or, more loosely speaking, a set is convex if along with any two points it<br />
contains all points between them.<br />
Convex sets in R² include the interiors of triangles, squares, circles, ellipses, and
a host of other sets. Note also that, for example in R³, while the interior of a cube is
a convex set, its boundary is not. The quintessential convex set in Euclidean space
R^n for any n > 1 is the n-dimensional open ball S_R(a) of radius R > 0 about the point
a ∈ R^n, given by

S_R(a) = {x : x ∈ R^n, |x − a| < R}.
More examples of convex sets:<br />
1. Is the empty set convex? Is a singleton convex? Is R n convex?<br />
There are also several standard ways of forming convex sets from convex sets:<br />
2. Let A, B ⊆ R n be sets. The Minkowski sum A + B ⊆ R n is defined as<br />
A + B = {x + y : x ∈ A, y ∈ B} .<br />
When B = {b} is a singleton, the set A + b is called a translation of A. Prove that<br />
A + B is convex if A and B are convex.<br />
57
58 4. TOPICS IN CONVEX ANALYSIS<br />
3. Let A ⊆ R n be a set and α ∈ R be a number. The scaling αA ⊆ R n is<br />
defined as<br />
αA = {αx : x ∈ A} .<br />
When α > 0, the set αA is called a dilation of A. Prove that αA is convex if A is<br />
convex.<br />
4. Prove that the intersection ∩ i∈I S i of any number of convex sets is convex.<br />
5. Show by example that the union of convex sets need not be convex.<br />
It is also possible to define a convex combination of an arbitrary (but finite) number
of points.
Definition 19. Let x_1, …, x_k be a finite set of points from R^n. A point

x = Σ_{i=1}^k α_i x_i,

where α_i ≥ 0 for i = 1, …, k and Σ_{i=1}^k α_i = 1, is called a convex combination of
x_1, …, x_k.
Note that the definition of a convex combination of two points is a special case
of this definition. (Prove it.)

Can we generate ‘superconvex’ sets using Definition 19? No, as the following
lemma shows.
Lemma 1. A set S ⊆ R n is convex iff every convex combination of points of S<br />
is in S.<br />
Proof. If a set contains all convex combinations of its points it is obviously
convex, because it then contains the convex combinations of all pairs of its points. Thus,
we need to show that a convex set contains every convex combination of its points.
The proof is by induction on the number of points in the convex combination.
By definition, a convex set contains all convex combinations of any two of its points.
Suppose that S contains any convex combination of n or fewer of its points and consider
a convex combination of n + 1 points, x = Σ_{i=1}^{n+1} α_i x_i. Since not all α_i = 1, we can relabel them so
that α_{n+1} < 1. Then

x = (1 − α_{n+1}) Σ_{i=1}^n [α_i/(1 − α_{n+1})] x_i + α_{n+1} x_{n+1} = (1 − α_{n+1}) y + α_{n+1} x_{n+1},

where y denotes the first sum. Note that y ∈ S by the induction hypothesis (as a convex combination of n points of
S) and, as a result, so is x, being a convex combination of two points of S. □
But, using Definition 19, we can generate convex sets from non-convex sets!
This operation is very useful, so the resulting set deserves a special name.

Definition 20. Given a set S ⊆ R^n, the set of all convex combinations of
points from S, denoted conv S, is called the convex hull of S.

Note: convince yourself that the adjective ‘convex’ in the term ‘convex hull’ is
well-deserved by proving that the convex hull is indeed convex! Now Lemma 1 can
be stated more succinctly: S = conv S iff S is convex.
1.2. Convex Hulls. The next theorem deals with the following interesting property
of convex hulls: the convex hull of a set S is the intersection of all convex sets
containing S. Thus, in a natural sense, the convex hull of a set S is the ‘smallest’
convex set containing S. In fact, many authors define convex hulls in that way and
then prove our Definition 20 as a theorem.
1. CONVEXITY 59<br />
Theorem 11. Let S ⊆ R^n be a set. Then any convex set containing S also
contains conv S.

Proof. Let A be a convex set such that S ⊆ A. By Lemma 1, A contains all
convex combinations of its points and, in particular, all convex combinations of
points of its subset S; the set of the latter is conv S. □
The next property is quite obvious and, again, frustrates attempts to generate<br />
‘superconvex’ sets, this time by trying to take convex hulls of convex hulls.<br />
1. Prove that convconvS = convS for any S.<br />
2. Prove that if A ⊂ B then convA ⊂ convB.<br />
The next property relates the operation of taking convex hulls to that of taking
Minkowski sums. It does not matter in which order you apply these operations.

3. Prove that conv (A + B) = (conv A) + (conv B).
4. Prove that conv (A ∩ B) ⊆ (convA) ∩ (convB).<br />
5. Prove that (convA) ∪ (convB) ⊆ conv (A ∪ B).<br />
1.3. Caratheodory’s Theorem. Definition 20 implies that any point x
in the convex hull of S is representable as a convex combination of (finitely) many
points of S, but it places no restriction on the number of points of S required
to make the combination. Caratheodory’s Theorem puts an upper bound on the
number of points required: in R^n the number of points never has to be more than
n + 1.
Theorem 12 (Caratheodory, 1907). Let S ⊆ R^n be a non-empty set. Then every
x ∈ conv S can be represented as a convex combination of (at most) n + 1 points
from S.

Note that the theorem does not ‘identify’ the points used in the representation;
their choice depends on x.
Show by example that the constant n + 1 in Caratheodory’s theorem cannot<br />
be improved. That is, exhibit a set S ⊆ R n and a point x ∈ convS that cannot be<br />
represented as a convex combination of fewer than n + 1 points from S.<br />
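To see the bound in action, here is a small illustration in R² with hypothetical data: the centre of a square lies in the convex hull of its four vertices, yet three vertices already suffice, matching n + 1 = 3 for n = 2. The barycentric-coordinate helper below is standard and exact over the rationals:

```python
from fractions import Fraction as F
from itertools import combinations

square = [(F(0), F(0)), (F(2), F(0)), (F(0), F(2)), (F(2), F(2))]
p = (F(1), F(1))  # the centre of the square

def coeffs(p, a, b, c):
    """Barycentric coordinates (u, v, w) of p in triangle abc, or None if degenerate."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    if det == 0:
        return None
    v = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    w = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return (1 - v - w, v, w)

found = []
for a, b, c in combinations(square, 3):
    cs = coeffs(p, a, b, c)
    if cs is not None and all(t >= 0 for t in cs):
        u, v, w = cs
        # p = u*a + v*b + w*c is a convex combination of just three points of S
        assert (u * a[0] + v * b[0] + w * c[0], u * a[1] + v * b[1] + w * c[1]) == p
        found.append((a, b, c))
assert found  # at most n + 1 = 3 points suffice, as the theorem guarantees
```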
1.4. Polytopes. The simplest convex sets are those which are convex hulls of<br />
a finite set of points, that is, sets of the form S = conv{x 1 , x 2 , ..., x m }. The convex<br />
hull of a finite set of points in R n is called a polytope.<br />
1. Prove that the set

∆ = {x ∈ R^{n+1} : Σ_{i=1}^{n+1} x_i = 1 and x_i ≥ 0 for all i}

is a polytope. This polytope is called the standard n-dimensional simplex.
2. Prove that the set

C = {x ∈ R^n : 0 ≤ x_i ≤ 1 for all i}

is a polytope. This polytope is called an n-dimensional cube.
3. Prove that the set

O = {x ∈ R^n : Σ_{i=1}^n |x_i| ≤ 1}

is a polytope. This polytope is called a (hyper)octahedron.
1.5. Topology of Convex Sets.<br />
(1) The closure of a convex set is a convex set.<br />
(2) The interior of a convex set (possibly empty) is convex.
60 4. TOPICS IN CONVEX ANALYSIS<br />
1.6. Aside: Helly’s Theorem. While there are not many applications of
Helly’s theorem in economics (in fact, I am aware of only one paper that uses
Helly’s theorem in an economic context), it is definitely one of the most famous results
in convexity.
Theorem 13 (Helly, 1913). Let A 1 , A 2 , ..., A m ⊆ R n be a finite family of convex<br />
sets with m ≥ n + 1. Suppose that every n + 1 sets have a nonempty intersection.<br />
Then all sets have a nonempty intersection.<br />
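For n = 1 the theorem is easy to visualise: convex subsets of the line are intervals, and n + 1 = 2, so pairwise-intersecting intervals must all share a common point. A quick check on a hypothetical family of intervals:

```python
from itertools import combinations

# Helly's theorem in R**1: if every pair of these intervals intersects,
# then they all have a common point.
intervals = [(0, 5), (1, 7), (3, 9), (4, 6), (2, 8)]

# every pair intersects ...
for (a, b), (c, d) in combinations(intervals, 2):
    assert max(a, c) <= min(b, d)

# ... hence they all do: the common part is [max of lows, min of highs]
lo = max(a for a, _ in intervals)
hi = min(b for _, b in intervals)
assert lo <= hi  # here the common intersection is [4, 5]
```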
To prove Helly’s theorem with elegance we first need to formulate a very useful
result obtained by J. Radon.
Theorem 14 (Radon, 1921). Let S ⊆ R n be a set of at least n + 2 points.<br />
Then there are two non-intersecting subsets R ⊂ S (‘red points’) and B ⊂ S (‘blue<br />
points’) such that<br />
convR ∩ convB ≠ ∅.<br />
Proof. Let x_1, …, x_m be m ≥ n + 2 distinct points from S. Consider the
system of n + 1 homogeneous linear equations in the variables γ_1, …, γ_m:

γ_1 x_1 + … + γ_m x_m = 0 and γ_1 + … + γ_m = 0.

Since m ≥ n + 2, there is a nontrivial solution to this system. Let

R = {x_i : γ_i > 0} and B = {x_i : γ_i < 0}.

Then R ∩ B = ∅. Let β = Σ_{i: γ_i > 0} γ_i; then β > 0 and Σ_{i: γ_i < 0} γ_i = −β, since the γ’s sum
up to zero. Moreover,

Σ_{i: γ_i > 0} γ_i x_i = − Σ_{i: γ_i < 0} γ_i x_i,

since Σ γ_i x_i = 0. Let

x = Σ_{i: γ_i > 0} (γ_i/β) x_i = Σ_{i: γ_i < 0} (−γ_i/β) x_i.

Then x is a convex combination of points of R and also a convex combination of
points of B, so x ∈ conv R ∩ conv B. □
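A concrete instance of the construction in the proof, with hypothetical data: for the four corners of a square in R² (so m = 4 ≥ n + 2), γ = (1, −1, −1, 1) solves the homogeneous system, and both convex combinations in the proof land on the same point (1/2, 1/2):

```python
from fractions import Fraction as F

pts = [(F(0), F(0)), (F(1), F(0)), (F(0), F(1)), (F(1), F(1))]
gamma = [F(1), F(-1), F(-1), F(1)]

# the n + 1 = 3 homogeneous equations from the proof
assert sum(gamma) == 0
assert sum(g * p[0] for g, p in zip(gamma, pts)) == 0
assert sum(g * p[1] for g, p in zip(gamma, pts)) == 0

beta = sum(g for g in gamma if g > 0)
# convex combination of the 'red' points {(0,0), (1,1)} ...
red = tuple(sum((g / beta) * p[k] for g, p in zip(gamma, pts) if g > 0)
            for k in (0, 1))
# ... and of the 'blue' points {(1,0), (0,1)}
blue = tuple(sum((-g / beta) * p[k] for g, p in zip(gamma, pts) if g < 0)
             for k in (0, 1))
assert red == blue == (F(1, 2), F(1, 2))  # a common point of the two hulls
```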
2. Support and Separation<br />
2.1. Hyperplanes. The concept of a hyperplane in R^n is a straightforward
generalisation of the notion of a line in R² and of a plane in R³. A line in R² can be
described by an equation<br />
p 1 x 1 + p 2 x 2 = α<br />
where p = (p 1 , p 2 ) is some non-zero vector and α is some scalar. A plane in R 3 can<br />
be described by an equation<br />
p 1 x 1 + p 2 x 2 + p 3 x 3 = α<br />
where p = (p 1 , p 2 , p 3 ) is some non-zero vector and α is some scalar. Similarly, a<br />
hyperplane in R n can be described by an equation<br />
Σ_{i=1}^n p_i x_i = α

where p = (p_1, p_2, …, p_n) is some non-zero vector in R^n and α is some scalar. This can
be written in a more concise way using scalar (also known as inner, or dot) product notation.
Definition 21. A hyperplane is the set<br />
H(p, α) = {x ∈ R n : p · x = α}<br />
where p ∈ R n is a non-zero vector and α is a scalar. The vector p is called the<br />
normal to the hyperplane H.<br />
Suppose that there are two points x ∗ , y ∗ ∈ H(p, α). Then by definition p·x ∗ = α<br />
and p · y ∗ = α. Hence p · (x ∗ − y ∗ ) = 0. In other words, the vector p is orthogonal to<br />
the vector x ∗ − y ∗ , and hence to any line segment lying in H(p, α).<br />
Given a hyperplane H ⊂ R n , points in R n can be classified according to their<br />
positions relative to the hyperplane. The (closed) half-space determined by the hyperplane<br />
H(p, α) is either the set of points ‘below’ H or the set of points ‘above’ H,<br />
i.e., either the set {x ∈ R n : p · x ≤ α} or the set {x ∈ R n : p · x ≥ α}. Open<br />
half-spaces are defined by strict inequalities. (Prove that a closed half-space is closed<br />
and an open half-space is open.)<br />
A straightforward economic example of a half-space is the budget set {x ∈<br />
R n : p · x ≤ α} of a consumer with income α facing the vector of prices p. (It was<br />
rather neat to call the normal vector p, wasn’t it?) By the way, hyperplanes and<br />
half-spaces are convex sets. (Prove it.)<br />
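A quick numerical illustration of classifying bundles relative to the budget hyperplane. The prices p and income α below are hypothetical, chosen only for the example:

```python
import numpy as np

p = np.array([2.0, 3.0])   # prices: the normal to the budget hyperplane
alpha = 12.0               # income

def side(x):
    """Classify a bundle x relative to H(p, alpha): 'below' (affordable
    with slack), 'on' (exactly exhausts income), or 'above' (unaffordable)."""
    v = float(p @ np.asarray(x, dtype=float))
    if np.isclose(v, alpha):
        return "on"
    return "below" if v < alpha else "above"
```

The bundle (3, 2) costs 2·3 + 3·2 = 12 and lies on the hyperplane; (1, 1) lies in the open half-space below it.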
2.2. Support Functions. In this section we give a description of what is<br />
called a dual structure. Consider the set of all closed convex subsets of R n . We<br />
will show that to each such set S we can associate an extended-real valued function<br />
µ S : R n → R ∪ {−∞}, that is, a function that maps each vector in R n to either a real<br />
number or to −∞. Not all such functions can be arrived at in this way. In fact<br />
we shall show that any such function must be concave and homogeneous of degree<br />
1. But once we restrict attention to functions that can be arrived at as a “support<br />
function” for some such closed convex set, we have another set of objects that we<br />
can analyse, and perhaps use to make useful arguments about, the original sets in which<br />
we were interested.<br />
In fact, we shall define the function µ S for any subset of R n , not just the closed<br />
and convex ones. However, if the original set S is not a closed convex one we shall<br />
lose some information about S in going to µ S . In particular, µ S only depends on<br />
the closed convex hull of S, that is, if two sets have the same closed convex hull<br />
they will lead to the same function µ S .<br />
We define µ S : R n → R ∪ {−∞} as<br />
µ S (p) = inf{p · x | x ∈ S},
where inf denotes the infimum or greatest lower bound. It is a property of the<br />
real numbers that any nonempty set of real numbers that is bounded below has an<br />
infimum; for a set that is not bounded below we take the infimum to be −∞. Thus<br />
µ S (p) is well defined for any set S. If the minimum exists, for example if the set S is<br />
compact, then the infimum is the minimum. In other cases the minimum may not exist.<br />
To take a simple one-dimensional example, suppose that the set S is the subset of R<br />
consisting of the numbers 1/n for n = 1, 2, . . . and that p = 2. Then clearly p·x = px<br />
does not have a minimum on the set S. However, 0 is less than px = 2x for every<br />
x in S, but for any number a greater than 0 there is an x in S such that<br />
px < a. Thus 0 is in this case the infimum of the set {p · x | x ∈ S}.<br />
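For a finite set S the infimum is attained, so µ_S is just a minimum over the points of S. The sketch below is our own; it computes µ_S for the corners of the unit square and checks the two properties claimed above: homogeneity of degree 1, and concavity via superadditivity (inf over x of (p+q)·x is at least inf p·x plus inf q·x):

```python
import numpy as np

def mu(S, p):
    """Support function mu_S(p) = min{p . x : x in S} for a finite set S."""
    return float(np.min(np.asarray(S, dtype=float) @ np.asarray(p, dtype=float)))

square = [[0, 0], [1, 0], [0, 1], [1, 1]]
p, q = np.array([1.0, -2.0]), np.array([-3.0, 1.0])

# Positive homogeneity of degree 1: mu(t p) = t mu(p) for t > 0.
homog = np.isclose(mu(square, 5 * p), 5 * mu(square, p))
# Superadditivity, which together with homogeneity gives concavity.
superadd = mu(square, p + q) >= mu(square, p) + mu(square, q)
```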
Recall that we have not assumed that S is convex. However, if we do assume<br />
that S is both convex and closed then the function µ S contains all the information<br />
needed to reconstruct S.<br />
Given any extended-real valued function µ : R n → R ∪ {−∞} let us define the<br />
set S µ as<br />
S µ = {x ∈ R n | p · x ≥ µ(p) for every p ∈ R n }.<br />
That is, for each p with µ(p) > −∞ we define the closed half space<br />
{x ∈ R n | p · x ≥ µ(p)}.<br />
Notice that if µ(p) = −∞ then p · x ≥ µ(p) for any x and so the above set will be<br />
R n rather than a half space. The set S µ is the intersection of all these closed half<br />
spaces. Since the intersection of convex sets is convex and the intersection of closed<br />
sets is closed, the set S µ is, for any function µ, a closed convex set.<br />
Suppose that we start with a set S, define µ S as above and then use µ S to<br />
define the set S µS . If the set S was a closed convex set then S µS will be exactly<br />
equal to S. Since we have seen that S µS is a closed convex set, it must be that if<br />
S is not a closed convex set it will not be equal to S µS . However, S will always be<br />
a subset of S µS , and indeed S µS will be the smallest closed convex set containing S,<br />
that is, S µS is the closed convex hull of S.<br />
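Membership in S_µS can be checked numerically by sampling directions p: a point of the closed convex hull satisfies p·x ≥ µ_S(p) for every p, while for a point outside, some sampled direction violates the inequality with high probability. This sketch is ours; the tolerance and the number of trials are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def looks_in_S_mu(x, S, trials=1000):
    """Test p . x >= mu_S(p) for randomly sampled directions p.
    A False result certifies x is outside the closed convex hull of S;
    True only means no separating direction was found in the sample."""
    S = np.asarray(S, dtype=float)
    x = np.asarray(x, dtype=float)
    for _ in range(trials):
        p = rng.standard_normal(S.shape[1])
        if p @ x < np.min(S @ p) - 1e-9:   # a half-space that excludes x
            return False
    return True
```

For S the four corners of the unit square, (0.5, 0.5) passes every sampled half-space test, while (2, 2) is quickly separated (any p with both coordinates negative works).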
2.3. Separation. We now consider the notion of ‘separating’ two sets by a<br />
hyperplane.<br />
Definition 22. A hyperplane H separates sets A and B if A is contained in<br />
one closed half-space and B is contained in the other. A hyperplane H strictly<br />
separates sets A and B if A is contained in one open half-space and B is contained<br />
in the other.<br />
It is clear that strict separation requires the two sets to be disjoint. For example,<br />
consider two (externally) tangent circles in a plane. Their common tangent line<br />
separates them but does not separate them strictly. On the other hand, although it<br />
is necessary for two sets to be disjoint in order to strictly separate them, this condition<br />
is not sufficient, even for closed convex sets. Let A = {x ∈ R 2 : x 1 > 0 and<br />
x 1 x 2 ≥ 1} and B = {x ∈ R 2 : x 1 ≥ 0 and x 2 = 0}. Then A and B are disjoint<br />
closed convex sets but they cannot be strictly separated by a hyperplane (a line in<br />
R 2 ). Thus the problem of the existence of a separating hyperplane is more involved<br />
than it may appear to be at first.<br />
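The failure in this example comes from the fact that, although A and B are disjoint, the distance between them is zero: the points (t, 1/t) ∈ A and (t, 0) ∈ B get arbitrarily close as t grows, so no gap is available for two open half-spaces. A two-line check of our own:

```python
# d((t, 1/t), (t, 0)) = 1/t shrinks to 0 as t grows, which is why
# strict separation of the hyperbola region and the axis fails.
gaps = [1.0 / t for t in (1, 10, 100, 1000)]
```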
We start with separation of a set and a point.<br />
Theorem 15. Let S ⊆ R n be a convex set and x 0 /∈ S be a point. Then S and<br />
x 0 can be separated. If S is closed then S and x 0 can be strictly separated.<br />
Idea of proof. The proof proceeds in two steps. The first step establishes the<br />
existence of a point a in the closure of S which is the closest to x 0 . The second step<br />
constructs the separating hyperplane using the point a.
STEP 1. There exists a point a ∈ ¯S (the closure of S) such that d(x 0 , a) ≤ d(x 0 , x)<br />
for all x ∈ ¯S, and d(x 0 , a) > 0.<br />
Let ¯B(x 0 ) be a closed ball with centre at x 0 that intersects the closure of S, and<br />
let A = ¯B(x 0 ) ∩ ¯S. The set A is nonempty, closed and bounded (hence<br />
compact). According to Weierstrass’s theorem, the continuous distance function<br />
d(x 0 , ·) achieves its minimum on A at some point a. Since every x ∈ ¯S outside the<br />
ball is farther from x 0 than any point of A, d(x 0 , a) ≤ d(x 0 , x) for all x ∈ ¯S. Note<br />
that d(x 0 , a) > 0.<br />
STEP 2. There exists a hyperplane H(p, α) = {x ∈ R n : p · x = α} such that<br />
p · x ≥ α for all x ∈ ¯S and p · x 0 < α.<br />
Construct a hyperplane which goes through the point a ∈ ¯S and has normal<br />
p = a − x 0 . The proof that this hyperplane is the separating one is done by<br />
contradiction. Suppose there exists a point y ∈ ¯S which is strictly on the same side<br />
of H as x 0 . Consider the point y ′ ∈ [a, y] such that the vector y ′ − x 0 is orthogonal<br />
to y − a. Since d(x 0 , y) ≥ d(x 0 , a), the point y ′ lies between a and y. Thus y ′ ∈ ¯S,<br />
by convexity of ¯S, and d(x 0 , y ′ ) < d(x 0 , a), which contradicts the choice of a. When<br />
S = ¯S, that is, when S is closed, the separation can be made strict by passing the<br />
hyperplane through a point strictly between a and x 0 instead of through a. This is<br />
always possible because d(x 0 , a) > 0.<br />
□<br />
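For sets onto which projection is easy to compute, the two-step construction in this proof is completely explicit. The sketch below is our own illustration: it separates a point from an axis-aligned box, a closed convex set whose nearest point is obtained by clipping coordinate by coordinate:

```python
import numpy as np

def separate_from_box(x0, lo, hi):
    """Separating hyperplane H(p, alpha) between a point x0 outside the
    box [lo, hi]^n and the box itself, following the two steps above:
    a = nearest point of the box to x0, normal p = a - x0, alpha = p . a.
    Then p . x >= alpha on the box while p . x0 < alpha.
    """
    x0 = np.asarray(x0, dtype=float)
    a = np.clip(x0, lo, hi)      # Step 1: the closest point of the box
    p = a - x0                   # Step 2: normal pointing from x0 toward the box
    return p, float(p @ a)
```

For x0 = (2, 3) and the box [0, 1]², the nearest point is a = (1, 1), giving p = (−1, −2) and α = −3: over the box p·x attains its minimum −3 at (1, 1), while p·x0 = −8 < −3.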
Theorem 15 is very useful because separation of a pair of sets can always be<br />
reduced to separation of a set and a point.<br />
Lemma 2. Let A and B be non-empty sets. A and B can be separated<br />
(strictly separated) iff A − B and 0 can be separated (strictly separated).<br />
Proof. The hyperplane H(p, α) separates A and B iff p · a ≥ α ≥ p · b for all a ∈ A<br />
and b ∈ B, and such an α exists iff p · (a − b) ≥ 0 for all a ∈ A and b ∈ B, i.e., iff a<br />
hyperplane through 0 with normal p separates A − B and 0 (and similarly for strict<br />
separation). Moreover, if A and B are convex then A − B is convex; if A is compact<br />
and B is closed then A − B is closed; and 0 /∈ A − B iff A ∩ B = ∅.<br />
□<br />
Theorem 16 (Minkowski, 1911). Let A and B be non-empty convex sets<br />
with A ∩ B = ∅. Then A and B can be separated. If A is compact and B is closed<br />
then A and B can be strictly separated.<br />
2.4. Support. Closely related (though not in a topological sense) to the notion of a<br />
separating hyperplane is the notion of a supporting hyperplane.<br />
Definition 23. The hyperplane H supports the set S at the point x 0 ∈ S if<br />
x 0 ∈ H and S is a subset of one of the half-spaces determined by H.<br />
A convex set can be supported at any of its boundary points; this is an immediate<br />
consequence of Theorem 16. To prove it, consider the sets A and B = {x 0 },<br />
where x 0 is a boundary point of A.<br />
Theorem 17. Let S ⊆ R n be a convex set with nonempty interior and x 0 ∈ S<br />
be a boundary point of S. Then there exists a supporting hyperplane for S at x 0 .<br />
Note that if the boundary of a convex set is smooth (‘differentiable’) at the<br />
given point x 0 then the supporting hyperplane is unique and is just the tangent<br />
hyperplane. If, however, the boundary is not smooth then there can be many<br />
supporting hyperplanes passing through the given point. It is important to note<br />
that conceptually the supporting theorems are connected to calculus. But the<br />
supporting theorems are more powerful (they don’t require smoothness), more direct,<br />
and more set-theoretic.<br />
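For the unit disk, which has a smooth boundary, the supporting hyperplane at a boundary point x0 is the tangent line with normal p = x0 and α = p·x0 = 1: by the Cauchy–Schwarz inequality, x0·x ≤ ‖x0‖‖x‖ ≤ 1 for every x in the disk. A numerical spot-check of our own, sampling the disk by radially pulling Gaussian draws inside:

```python
import numpy as np

x0 = np.array([0.6, 0.8])          # a boundary point of the unit disk (norm 1)
rng = np.random.default_rng(1)

# Sample points of the closed unit disk: draws with norm > 1 are
# rescaled onto the unit circle, the rest are left where they are.
pts = rng.standard_normal((500, 2))
pts /= np.maximum(np.linalg.norm(pts, axis=1, keepdims=True), 1.0)

# The whole sample lies in the half-space {x : x0 . x <= 1}, while
# x0 itself lies on the hyperplane x0 . x = 1.
supported = bool(np.all(pts @ x0 <= 1 + 1e-9))
```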
Certain points on the boundary of a convex set carry a lot of information about<br />
the set.<br />
Definition 24. A point x of a convex set S is an extreme point of S if x is<br />
not an interior point of any line segment in S.
The extreme points of a closed ball and of a closed cube in R 3 are, respectively, its<br />
boundary points and its eight vertices. A half-space has no extreme points even<br />
if it is closed.<br />
An interesting property of extreme points is that an extreme point can be<br />
deleted from the set without destroying convexity of the set. That is, a point x in<br />
a convex set S is an extreme point iff the set S\{x} is convex.<br />
The next theorem is a finite-dimensional version of a quite general and powerful<br />
result by M. G. Krein and D. P. Milman.<br />
Theorem 18 (Krein & Milman, 1940). Let S ⊆ R n be convex and compact.<br />
Then S is the convex hull of its extreme points.
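For the cube [0, 1]^n the theorem is fully constructive: every point is a convex combination of the 2^n vertices (the extreme points), with the product weights used below. The sketch is our own illustration:

```python
import itertools
import numpy as np

def as_vertex_combination(x):
    """Write a point x of the cube [0, 1]^n as a convex combination of
    the cube's extreme points (its vertices), as Theorem 18 guarantees.
    Vertex v receives weight prod_i x_i^{v_i} (1 - x_i)^{1 - v_i};
    these weights are non-negative and sum to one."""
    x = np.asarray(x, dtype=float)
    verts = np.array(list(itertools.product((0, 1), repeat=len(x))), dtype=float)
    weights = np.array([np.prod(np.where(v == 1, x, 1 - x)) for v in verts])
    return verts, weights
```

For x = (0.3, 0.8) the weights on the vertices (0,0), (0,1), (1,0), (1,1) are 0.14, 0.56, 0.06, 0.24, which sum to one and recombine to x.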