X - UCSD DSP Lab
X - UCSD DSP Lab
X - UCSD DSP Lab
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Some Linear Algebra Concepts<br />
Ken Kreutz-Delgado<br />
Nuno Vasconcelos<br />
ECE 175A – Winter 2011 – <strong>UCSD</strong>
Vector spaces<br />
• Definition: a vector space is a set H where<br />
– addition and scalar multiplication are defined and satisfy:<br />
1) x+(x’+x’’) = (x+x’)+x” 5) lx H<br />
2) x+x’ = x’+x H 6) 1x = x<br />
3) 0 H, 0 + x = x 7) l(l’ x) = (ll’)x<br />
4) –x H, -x + x = 0 8) l(x+x’) = lx + lx’<br />
(l=scalar) 9) (l+l’)x = lx + l’x<br />
• the canonical example is d with standard<br />
vector addition and scalar multiplication<br />
e 2<br />
e d<br />
x<br />
x’<br />
x+x’<br />
e 1<br />
e 2<br />
e d<br />
x<br />
ax<br />
e 1
Vector spaces<br />
• But there are much more interesting examples<br />
• E.g., the space of functions f:X with<br />
(f+g)(x) = f(x) + g(x) (lf)(x) = lf(x)<br />
• d is a vector space of<br />
finite dimension,e.g.<br />
– f = (f 1, ..., f d)<br />
• When d goes to infinity<br />
we have a function<br />
– f = f(t)<br />
• The space of all functions<br />
is an infinite dimensional<br />
vector space
Data<br />
• In this course we will talk a lot about “data”<br />
• In all cases data will be represented in a vector space:<br />
– an example is really just a point (“datapoint”) on such a space<br />
– from above we know how to perform basic operations on<br />
datapoints<br />
– this is nice, because datapoints can be quite abstract<br />
– e.g. images:<br />
an image is a function<br />
on the image plane<br />
it assigns a color f(x,y) to<br />
each each image<br />
location (x,y)<br />
the space Y of images<br />
is a vector space (note: assumes<br />
that images can be negative)<br />
this image is a point in Y
Images<br />
• Because of this we can manipulate images by<br />
manipulating their equivalent vector representations<br />
• E.g., Suppose one wants to “morph” a(x,y) into b(x,y):<br />
– One way to do this is via the path along the line from a to b.<br />
c(a) = a + a(b-a)<br />
= (1-a) a + a b<br />
– for a = 0 we have a<br />
– for a = 1 we have b<br />
– for a in (0,1) we have a point<br />
on the line between a and b<br />
• To morph an image we can simply<br />
apply this rule to the image vector<br />
representations!<br />
b<br />
b-a<br />
a<br />
a(b-a)
Images<br />
• When we make<br />
c(x,y) = (1-a) a(x,y) + a b(x,y)<br />
we get “image morphing”:<br />
a=0 a=0.2 a=0.4<br />
a=0.6 a=0.8 a=1<br />
• The point is that this is possible because we are in a<br />
vector space.<br />
b<br />
b-a<br />
a<br />
a(b-a)
Images<br />
• Images are approximately represented as points in d<br />
– Sample (discretize) image on a finite grid to get array of pixels<br />
a(x,y) a(i,j)<br />
– Images are always stored like this on computers<br />
– Stack all the rows into a vector, e.g. a 3 x 3 image is converted<br />
into a 9 x 1 vector<br />
<br />
– A n x m image vector is transformed into a nm x 1 vector<br />
– Note that this is yet another vector space<br />
• The point is that there are multiple isomorphic vector<br />
spaces in which the data can be represented
Text<br />
• Another common type<br />
of data is text<br />
• Documents are<br />
represented by<br />
word counts:<br />
– associate a counter<br />
with each word<br />
– slide a window through<br />
the text<br />
– whenever the word<br />
occurs increment<br />
its counter<br />
• This is the way search<br />
engines represent<br />
web pages
Text<br />
• E.g. word counts for three<br />
documents in a certain corpus<br />
(only 12 words shown for clarity)<br />
• Note that:<br />
– Each document is a 12<br />
dimensional vector<br />
– If I add two word count vectors (documents), I get a new word<br />
count vector (document)<br />
– If I multiply a word count vector (document) by a scalar, I get a<br />
word count vector<br />
– Note: once again we assume word counts could be negative (to<br />
make this happen we can simply subtract the average value)<br />
• This means:<br />
– We are once again on a vector space (positive subset of d )<br />
– A document is a point in this space
Bilinear forms<br />
• One reason to use vector spaces is that they allow us to<br />
measure distances between data points<br />
• We will see that this is crucial for classification<br />
• The main tool for this is the inner product (dot-product).<br />
• We can define the dot-product using the notion of a<br />
bilinear form (assuming a real vector space).<br />
• Definition: a bilinear form on a vector space H is a<br />
bilinear mapping<br />
Q: H x H <br />
(x,x’) Q(x,x’)<br />
“Bi-linear” means that "x,x’,x’’ H<br />
i) Q[(lx+lx’),x”] = lQ(x,x”) + l’Q(x’,x”)<br />
ii) Q[x”,(lx+lx’)] = lQ(x”,x) + l’Q(x”,x’)
Inner Products<br />
• Definition: an inner product on a real vector space H<br />
is a bilinear form<br />
: H x H <br />
(x,x’) <br />
such that<br />
i) 0, "x H<br />
ii) = 0 if and only if x = 0<br />
iii) = for all x and y<br />
• The positive-definiteness conditions i) and ii) make the<br />
inner product a natural measure of similarity<br />
• This becomes more precise with introduction of a norm
Inner Products and Norms<br />
• Any inner product induces a norm via the definition<br />
||x|| 2 = <br />
• By definition, any norm must obey the following properties<br />
– Positive-definiteness: ||x|| 0, & ||x|| = 0 iff x =0<br />
– Homogeneity: ||l x|| = |l| ||x||<br />
– Triangle Inequality: ||x + y|| ≤ ||x|| + ||y||<br />
• A norm defines a corresponding metric<br />
d(x,y) = ||x-y||<br />
which is a measure of the distance between x and y<br />
• Always remember that the induced norm changes with a<br />
different choice of inner product!
Inner Product<br />
• Back to our examples:<br />
– In d the standard (or unweighted) inner product is<br />
d<br />
T<br />
x, y = x y = <br />
i=<br />
1<br />
– Which leads to the standard Euclidean norm in d<br />
x<br />
=<br />
x<br />
T<br />
x<br />
=<br />
– The distance between two vectors is the standard Euclidean<br />
distance in d<br />
d(<br />
x,<br />
y)<br />
=<br />
x y<br />
=<br />
x<br />
i<br />
d<br />
<br />
i=<br />
1<br />
( x y)<br />
T<br />
y<br />
x<br />
i<br />
2<br />
i<br />
( x y)<br />
=<br />
d<br />
<br />
i=<br />
1<br />
( x<br />
i<br />
<br />
y<br />
i<br />
2<br />
)
Inner Products and Norms<br />
• Note that this immediately gives<br />
a measure of similarity<br />
between web pages<br />
– compute word count vector x i<br />
from page i, for all i<br />
– distance between page i and<br />
page j is simply<br />
T<br />
d( xi,<br />
x j ) =<br />
xi<br />
x j = ( xi<br />
x j ) ( xi<br />
x j )<br />
– this allows us to find, in the web, the most similar page i to any<br />
given page j<br />
• This is very close to the measure of similarity used by<br />
most search engines!<br />
• What about functions, e.g. on images?
Inner Products on Function Spaces<br />
• Recall that the space of functions is an infinite<br />
dimensional vector space<br />
– The standard (unweighted) inner product is the natural<br />
extension of that in d (just replace summations by integrals)<br />
f ( x), g( x) = f ( x) g( x) dx<br />
– the norm is related to the “energy” of the function<br />
2 2<br />
f ( x) = f ( x) dx<br />
– and the distance between functions is related to the energy of<br />
the difference between them<br />
2 2<br />
d( f ( x), g( x)) = f ( x) g( x) = [ f ( x) g( x)] dx
Basis<br />
• We know how to measure distances in a vector space<br />
• Another interesting property is that we can fully<br />
characterize the space by one of its bases<br />
• A set of vectors x 1, …, x k are a basis of a vector space H<br />
if and only if (iff)<br />
– they are linearly independent<br />
<br />
i<br />
c x = 0 c = 0,<br />
" i<br />
i<br />
i<br />
– And they span H : for any v in H, v can be written as<br />
v<br />
=<br />
<br />
i<br />
i i x c<br />
• These two conditions mean that any can be<br />
uniquely represented in this form.<br />
i<br />
v
Basis<br />
• Note that<br />
– by making the canonical representation x i the columns of a matrix<br />
X, these two conditions can be compactly written as<br />
– Condition 1. The vectors x i are linear independent:<br />
Xc<br />
= 0 c =<br />
– Condition 2. The vectors x i span H<br />
• Also, all bases of H have the same number of<br />
vectors, which is called the dimension of H<br />
– This is valid for any vector space!<br />
0<br />
" v 0, c 0| v =<br />
Xc
Basis<br />
• example<br />
– A basis<br />
of the vector<br />
space of images<br />
of faces<br />
– The figure<br />
only show the<br />
first 16 basis<br />
vectors but<br />
there actually<br />
more<br />
– These vectors are<br />
orthonormal
Orthogonality<br />
• Two vectors are orthogonal iff their inner product is zero<br />
– e.g.<br />
2p 2<br />
<br />
0 0<br />
in the space of functions defined on [0,2p], cos(ax) and sin(ax)<br />
are orthogonal<br />
• two subspaces V and W are orthogonal if every vector in<br />
V is orthogonal to every vector in W<br />
• a set of vectors x 1, …, x k is called<br />
– orthogonal if all pairs of vectors are orthogonal.<br />
– orthonormal if all of the orthogonal vectors also have unit norm.<br />
2p<br />
sin ax<br />
sin( ax)cos( ax) dx = = 0<br />
2a<br />
xi, x j<br />
0,<br />
if<br />
= <br />
1,<br />
if<br />
i<br />
i<br />
<br />
=<br />
j<br />
j
Matrix<br />
• an m x n matrix represents a linear operator that maps a vector<br />
from the domain X = R n to a vector in the codomain Y = R m<br />
• E.g. the equation y = Ax<br />
sends x in R n to y in R m<br />
according to<br />
y<br />
<br />
<br />
<br />
<br />
y<br />
a<br />
<br />
=<br />
<br />
<br />
<br />
<br />
<br />
am<br />
X Y<br />
e 2<br />
e n<br />
<br />
x<br />
e 1<br />
A<br />
1<br />
m<br />
11<br />
1<br />
e m<br />
<br />
y<br />
a<br />
a<br />
e 1<br />
1n<br />
<br />
mn<br />
<br />
x<br />
<br />
<br />
<br />
<br />
<br />
x<br />
1<br />
n
Matrix vector multiplication<br />
• Consider y = Ax, i.e. yi = n<br />
j=1 aijxj • This is equivalent to<br />
• where “(– ai –)” means the ith , i = 1,…,m<br />
<br />
x1<br />
n <br />
<br />
y<br />
<br />
i ai1 a<br />
<br />
in aijx <br />
j ai x<br />
<br />
<br />
=<br />
<br />
= = <br />
<br />
j=<br />
1<br />
x <br />
n <br />
<br />
row of A. Hence<br />
– the i th component of y is the inner product of (– a i –) and x.<br />
– The m components of y are obtained by “projecting” x onto (i.e., taking<br />
n<br />
the inner product with) the m rows of A in the domain space<br />
e 2<br />
e m<br />
=<br />
x A’s action in X<br />
e 1<br />
n<br />
y m<br />
-a m-<br />
y2 -a2- y 1<br />
x<br />
(m rows)<br />
-a 1-<br />
=
Matrix vector multiplication<br />
• But there is more. Let y = Ax, i.e. y i = j=1 n aijx j , now written as<br />
y<br />
<br />
<br />
<br />
<br />
y<br />
1<br />
m<br />
<br />
<br />
<br />
<br />
<br />
= <br />
<br />
<br />
<br />
n<br />
<br />
j=<br />
1<br />
a<br />
ij<br />
<br />
a<br />
<br />
<br />
x<br />
<br />
j =<br />
<br />
<br />
<br />
<br />
a<br />
<br />
11<br />
m1<br />
x<br />
x<br />
1<br />
1<br />
<br />
a<br />
<br />
a<br />
where a i with “|” above and below means the i th column of A.<br />
– x i is the i th component of y in the codomain (column space)<br />
spanned by the n columns of A<br />
– I.e, y is a linear combination of the n columns of A in the codomain<br />
e 2<br />
e n<br />
<br />
<br />
=<br />
x<br />
e 1<br />
n<br />
<br />
1n<br />
mn<br />
A maps from X to Y<br />
x<br />
n<br />
x<br />
n<br />
| | <br />
<br />
a<br />
<br />
x <br />
<br />
a<br />
<br />
<br />
=<br />
1<br />
1 <br />
n <br />
x<br />
<br />
<br />
| <br />
<br />
| <br />
|<br />
an |<br />
x n<br />
=<br />
m<br />
x 1<br />
y<br />
n<br />
|<br />
a1 |<br />
=<br />
m<br />
=<br />
m
Matrix vector multiplication<br />
• Thus there are two alternative (dual) pictures of y = Ax:<br />
– “Coordinates of y” = “x „projected‟ onto row space of A” (The X = R n viewpoint)<br />
Domain X = R n<br />
e 2<br />
e n<br />
x A<br />
e 1<br />
– “Components of x” = “ „coordinates‟ of y in column space of A” (Y = R m viewpoint)<br />
y m<br />
|<br />
an |<br />
x n<br />
-a m-<br />
y2 -a2- Domain X = R n viewpoint<br />
y 1<br />
x<br />
-a 1-<br />
x 1<br />
y<br />
|<br />
a1 |<br />
<br />
<br />
y<br />
<br />
i ai x<br />
<br />
<br />
=<br />
<br />
<br />
<br />
( m rows)<br />
<br />
Codomain Y = R m viewpoint<br />
| | <br />
y =<br />
<br />
a<br />
<br />
x <br />
<br />
a<br />
<br />
x<br />
11 nn | |
A cool trick<br />
• the matrix multiplication formula<br />
C<br />
=<br />
AB<br />
<br />
c<br />
ij<br />
=<br />
<br />
also applies to “block matrices” when these are defined<br />
to be conformal.<br />
• for example, if A,B,C,D,E,F,G,H are conformal matrices,<br />
• To be conformal means that the sizes of the matrices<br />
A,B,C,D,E,F,G,H have to be such that the intermediate<br />
operations make sense!<br />
k<br />
a<br />
A B E F AE BGAFBH <br />
C D<br />
<br />
G H<br />
= <br />
CE DGCFDH <br />
<br />
ik<br />
b<br />
kj
Matrix vector multiplication<br />
• This makes it easy to derive the two alternative pictures<br />
• The row space picture (or viewpoint):<br />
<br />
x1<br />
<br />
<br />
<br />
=<br />
<br />
<br />
yi <br />
=<br />
<br />
ain<br />
ain<br />
<br />
<br />
i 1xn<br />
nx1<br />
i<br />
<br />
<br />
<br />
<br />
<br />
x <br />
<br />
<br />
<br />
n<br />
<br />
a <br />
x = a <br />
Scalar multiplication between the row blocks (–a i-) and x<br />
• The column space picture (or viewpoint):<br />
<br />
<br />
<br />
=<br />
<br />
<br />
yi<br />
<br />
ain<br />
<br />
<br />
<br />
<br />
<br />
a<br />
<br />
in<br />
<br />
<br />
x<br />
<br />
<br />
<br />
<br />
<br />
x<br />
1<br />
n<br />
<br />
| | <br />
<br />
<br />
= a1<br />
an<br />
<br />
<br />
| <br />
<br />
|<br />
<br />
<br />
<br />
mx1<br />
mx1<br />
x <br />
x <br />
Inner products between blocks given by the (scalar)<br />
blocks x i and the column blocks of A.<br />
1<br />
n<br />
<br />
1x1<br />
1x1<br />
<br />
x<br />
<br />
<br />
<br />
<br />
<br />
| <br />
<br />
<br />
= ai<br />
xi<br />
i<br />
<br />
|
Square nxn matrices<br />
• in this case m = n and the row and column subspaces are<br />
both equal to (copies of) R n<br />
e 2<br />
e n<br />
x A<br />
e 1<br />
y n<br />
|<br />
an |<br />
x n<br />
-a n-<br />
y2 -a2- x 2<br />
y 1<br />
x<br />
-a 1-<br />
x 1<br />
|<br />
a2 |<br />
y<br />
|<br />
a1 |
Orthogonal matrices<br />
• A matrix is called orthogonal if it is square and has<br />
orthonormal columns.<br />
• Important properties:<br />
– 1) The inverse of an orthogonal matrix is its transpose<br />
this can be easily shown with the block matrix trick. (Also see later.)<br />
1 0 0<br />
<br />
<br />
| <br />
T T <br />
<br />
0 1 0<br />
<br />
<br />
A A = ai a<br />
1n<br />
<br />
<br />
j<br />
=<br />
<br />
<br />
| <br />
n1<br />
<br />
001 – 2) A proper (det(A) = 1) orthogonal matrix is a rotation matrix<br />
this follows from the fact that it is unitary, i.e., does not change the<br />
norms (“sizes”) of the vectors on which it operates,<br />
2 T T T T<br />
2<br />
Ax = ( Ax) ( Ax) = x A Ax = x x =<br />
|| x || ,<br />
and does not induce a reflection.
Rotation matrices<br />
• The combination of<br />
1. “operator” interpretation<br />
2. “block matrix trick”<br />
is useful in many situations<br />
• Example:<br />
– “what is the matrix R that rotates R 2 by degrees?”<br />
e 2<br />
<br />
e 1
Rotation matrices<br />
• The key is to consider how the matrix operates on the<br />
vectors e i of the canonical basis<br />
– note that R sends e 1 to e’ 1<br />
e'<br />
1<br />
r<br />
= <br />
r<br />
11<br />
21<br />
1<br />
<br />
<br />
0<br />
– using the column space picture<br />
e'<br />
1<br />
r<br />
= <br />
r<br />
11<br />
21<br />
r<br />
r<br />
12<br />
22<br />
r<br />
1<br />
<br />
r<br />
12<br />
22<br />
r<br />
0<br />
= <br />
r<br />
– from which we have the first column of the matrix<br />
r<br />
R = e'1<br />
r<br />
12<br />
22<br />
cos<br />
= <br />
sin<br />
<br />
r<br />
r<br />
12<br />
22<br />
11<br />
21<br />
<br />
<br />
<br />
<br />
<br />
<br />
e 2<br />
sin <br />
<br />
cos <br />
e 1
Rotation matrices<br />
• and we do the same for e 2<br />
– R sends e 2 to e’ 2<br />
e'<br />
2<br />
– from which<br />
– check<br />
R T<br />
r<br />
= <br />
r<br />
R<br />
R<br />
=<br />
11<br />
21<br />
r<br />
r<br />
12<br />
22<br />
0<br />
r<br />
<br />
= <br />
1<br />
r<br />
11<br />
21<br />
r<br />
0<br />
<br />
r<br />
e'1 e'2<br />
= <br />
<br />
sin<br />
cos<br />
<br />
cos<br />
= <br />
<br />
sin<br />
12<br />
22<br />
cos sin<br />
<br />
r<br />
1<br />
= <br />
r<br />
sin<br />
cos<br />
sin<br />
<br />
cos<br />
<br />
<br />
sin<br />
cos<br />
<br />
=<br />
12<br />
22<br />
<br />
<br />
<br />
-sin <br />
I<br />
e 2<br />
<br />
cos <br />
sin <br />
<br />
cos <br />
e 1
Projections<br />
• What if A is not orthogonal?<br />
– y = A T x and x’ = A y<br />
– x’ = x for all x if and only if AA T = I !<br />
– this means that A has to be orthogonal to have x’ = x<br />
• What happens when this is not the case? Note<br />
T<br />
AA<br />
T<br />
= AA<br />
T<br />
AA<br />
2<br />
x' column<br />
space – E.g., if , then is idempotent (and also<br />
obviously symmetric) so we get an orthogonal projection of x<br />
onto the column space of A<br />
10<br />
x1<br />
<br />
e.g., let<br />
10<br />
0<br />
x1<br />
<br />
A = , then<br />
<br />
01<br />
y = =<br />
<br />
x2<br />
<br />
<br />
00<br />
01<br />
0<br />
x2<br />
<br />
x<br />
<br />
<br />
x 3<br />
e<br />
e2 3<br />
10<br />
x1<br />
0 x1<br />
<br />
<br />
and<br />
<br />
x <br />
1 <br />
= =<br />
<br />
e<br />
x’<br />
x'=<br />
<br />
01<br />
<br />
0 x2<br />
<br />
x<br />
1<br />
2<br />
<br />
x2<br />
<br />
00<br />
0 0 <br />
0 <br />
column space of A =<br />
row space of A T
Null Space of a Matrix<br />
• What happens to the part that is lost?<br />
• For the previous example this is<br />
the “null space” of A T<br />
TT = | = 0<br />
N A x A x<br />
– in the example, this is comprised of all vectors of the type 0 since<br />
A T<br />
0<br />
<br />
10<br />
0<br />
0<br />
x =<br />
<br />
0<br />
<br />
= =<br />
01<br />
0<br />
<br />
<br />
a<br />
0<br />
<br />
<br />
<br />
<br />
a<br />
<br />
• FACT: N(A) is always orthogonal to the row space of A:<br />
– x is in the null space iff it is orthogonal to all rows of A<br />
– For the previous example this means that N(A T ) is orthogonal to the<br />
column space of A<br />
0<br />
e 3<br />
e 1<br />
e 2<br />
x<br />
x’<br />
column space of A =<br />
row space of A T<br />
0<br />
<br />
<br />
<br />
<br />
a <br />
null space<br />
of A T
Orthogonal Matrices – Cont.<br />
• An orthogonal matrix has linearly independent columns and<br />
therefore must have an inverse.<br />
T<br />
• Note that A A= I (proven earlier) and the existence of an<br />
1<br />
inverse AA = I implies<br />
1 1 T 1<br />
T T<br />
A = I A = A AA = A I = A .<br />
Thus<br />
• This means that<br />
T T<br />
A A = AA =<br />
I<br />
– A has n orthonormal columns and rows<br />
– Each of these two sets of vectors span all of R n<br />
– There is no extra room for an orthogonal subspace in the rowspace<br />
– The null space of A T has to be empty<br />
– The square matrix A has full rank
The Four Fundamental Subspaces<br />
• These exist for any matrix:<br />
– Column Space: space spanned by the columns<br />
– Row Space: space spanned by the rows<br />
– Nullspace: space of vectors orthogonal to all rows (also known<br />
as the orthogonal complement of the row space)<br />
– Left Nullspace: space of vectors orthogonal to all columns (also<br />
known as the orthogonal complement of the column space)<br />
• Special Case I: Square Symmetric Matrices (A = A T ):<br />
– Column Space is equal to the Row Space<br />
– Nullspace is equal to the Left Nullspace, and is therefore<br />
orthogonal to the Column Space<br />
• Special Case II: nxn Orthogonal Matrices (A T A = AA T = I)<br />
– Column Space = Row Space = R n<br />
– Nullspace = Left Nullspace = {0} = the Trivial Subspace
End