
Discrete Mathematics
University of Kentucky CS 275
Spring, 2007
Professor Craig C. Douglas
http://www.mgnet.org/~douglas/Classes/discrete-math/notes/2007s.pdf

Material Covered (Spring 2007)

Tuesday  Pages     Thursday  Pages
                   1/11      1-9
1/16     9-24      1/18      24-33
1/23     34-45     1/25      46-52
1/30     53-65     2/1       Exam 1
2/6      66-73     2/8       74-83
2/13     84-92     2/15      92-94
2/20     95-106    2/22      106-115
2/27     116-124   3/1       Exam 2
3/6      125-132   3/8       No class
3/13     Spring    3/15      Break
3/20     132-142   3/22      No class
3/26     142-156   3/28      Exam 3
4/3      157-169   4/5       170-177
4/10     178-185   4/12      186-197
4/17     198-210   4/19      Exam 4
4/24     211-217   4/26      Rama: review
5/1      No class  5/3       Final: 8-10 AM

The final exam will cover Chapters 1-10.

2


Course Outline

1. Logic Principles
2. Sets, Functions, Sequences, and Sums
3. Algorithms, Integers, and Matrices
4. Induction and Recursion
5. Simple Counting Principles
6. Discrete Probability
7. Advanced Counting Principles
8. Relations
9. Graphs
10. Trees
11. Boolean Algebra
12. Modeling Computation

3

Logic Principles

Basic values: T or F, representing true or false, respectively. In a computer, T and
F may be represented by 1 or 0 bits.

Basic items:

• Propositions
  o Logic and Equivalences
• Truth tables
• Predicates
• Quantifiers
• Rules of Inference
• Proofs
  o Concrete, outlines, hand waving, and false

4


Definition: A proposition is a statement of a true or false fact (but not both).

Examples:

• 2+2 = 4 is a proposition because this is a fact.
• x+1 = 2 is not a proposition unless a specific value of x is stated.

Definition: The negation of a proposition p, denoted by ¬p and pronounced "not
p," means "it is not the case that p." The truth values for ¬p are the opposite
of those for p.

Examples:

• p: Today is Thursday. ¬p: Today is not Thursday.
• p: At least a foot of snow falls in Boulder on Fridays. ¬p: Less than a foot
  of snow falls in Boulder on Fridays.

5

Definition: The conjunction of propositions p and q, denoted p∧q, is true if
both p and q are true, otherwise false.

Definition: The disjunction of propositions p and q, denoted p∨q, is true if
either p or q is true, otherwise false.

Definition: The exclusive or of propositions p and q, denoted p⊕q, is true if
only one of p and q is true, otherwise false.

Truth tables:

p   ¬p   q  p∧q  p∨q  p⊕q
T   F    T   T    T    F
T*  F*   F   F    T    T
F*  T*   T   F    T    T
F   T    F   F    F    F

* The truth table for p and ¬p is really a 2×2 table.
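Truth tables like the one above can be generated mechanically by enumerating all assignments. A minimal Python sketch (illustrative, not part of the notes):

```python
from itertools import product

# Enumerate every truth assignment for p and q and tabulate the connectives.
# In Python, "and"/"or" model conjunction/disjunction; p != q models
# exclusive or on booleans.
rows = []
for p, q in product([True, False], repeat=2):
    rows.append((p, q, p and q, p or q, p != q))

for row in rows:
    print(row)
```

Each printed tuple is one row (p, q, p∧q, p∨q, p⊕q) of the table.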

6


Concepts so far can be extended to Boolean variables and bit strings.

Definition: A bit is a binary digit. Hence, it has two possible values: 0 and 1.

Definition: A bit string is a sequence of zero or more bits. The length of a bit
string is the number of bits.

Definition: The bitwise operators OR, AND, and XOR are defined based on
∨, ∧, and ⊕, bit by bit in a bit string.

Examples:

• 010111 is a bit string of length 6
• 010111 OR 110000 = 110111
• 010111 AND 110000 = 010000
• 010111 XOR 110000 = 100111

7
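The bitwise examples above can be reproduced directly; a short Python sketch (not from the notes) that applies the chosen connective position by position:

```python
# Bitwise OR/AND/XOR on equal-length bit strings, digit by digit,
# mirroring the examples in the notes.
def bitwise(op, s, t):
    pairs = list(zip(s, t))
    if op == "OR":
        return "".join("1" if a == "1" or b == "1" else "0" for a, b in pairs)
    if op == "AND":
        return "".join("1" if a == "1" and b == "1" else "0" for a, b in pairs)
    if op == "XOR":
        return "".join("1" if a != b else "0" for a, b in pairs)
    raise ValueError(op)

print(bitwise("OR", "010111", "110000"))   # 110111
print(bitwise("AND", "010111", "110000"))  # 010000
print(bitwise("XOR", "010111", "110000"))  # 100111
```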

Definition: The conditional statement is an implication, denoted p→q, and is
false when p is true and q is false, otherwise it is true. In this case p is known
as the hypothesis (or antecedent or premise) and q is known as the conclusion
(or consequence).

Definition: The biconditional statement is a bi-implication, denoted p↔q, and
is true if and only if p and q have the same truth values.

Truth tables:

p  q  p→q  p↔q
T  T   T    T
T  F   F    F
F  T   T    F
F  F   T    T

8


We can compound logical operators to make complicated propositions. In
general, using parentheses makes the expressions clearer, even though more
symbols are used. However, there is a well defined operator precedence
accepted in the field. Lower numbered operators take precedence over higher
numbered operators.

Examples:

• ¬p∧q = (¬p)∧q
• p∧q∨r = (p∧q)∨r

Operator  Precedence
¬         1
∧         2
∨         3
→         4
↔         5

9

Definition: A compound proposition that is always true is a tautology. One that
is always false is a contradiction. One that is neither is a contingency.

Example:

p  ¬p  p∧¬p  p∨¬p
T  F    F     T
F  T    F     T

contingencies  contradiction  tautology

Definition: Compound propositions p and q are logically equivalent if p↔q is a
tautology and is denoted p≡q (sometimes written as p⇔q instead).

10
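Classifying a compound proposition as a tautology, contradiction, or contingency is an exhaustive check over all assignments. A small Python sketch (illustrative, not from the notes), where a proposition is passed in as a boolean function:

```python
from itertools import product

# Exhaustively evaluate a compound proposition over all 2**nvars
# truth assignments and classify it.
def classify(f, nvars):
    values = [f(*vs) for vs in product([True, False], repeat=nvars)]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "contingency"

print(classify(lambda p: p or not p, 1))   # tautology (p∨¬p)
print(classify(lambda p: p and not p, 1))  # contradiction (p∧¬p)
print(classify(lambda p, q: p and q, 2))   # contingency (p∧q)
```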


Theorem: ¬(p∨q) ≡ ¬p ∧ ¬q.

Proof: Construct a truth table.

p  q  ¬(p∨q)  ¬p  ¬q  ¬p∧¬q
T  T    F     F   F     F
T  F    F     F   T     F
F  T    F     T   F     F
F  F    T     T   T     T

qed

Theorem: ¬(p∧q) ≡ ¬p ∨ ¬q.

Proof: Construct a truth table similar to the previous theorem.

These two theorems are known as DeMorgan's laws and can be extended to any
number of propositions:

¬(p₁∨p₂∨…∨pₖ) ≡ ¬p₁ ∧ ¬p₂ ∧ … ∧ ¬pₖ
¬(p₁∧p₂∧…∧pₖ) ≡ ¬p₁ ∨ ¬p₂ ∨ … ∨ ¬pₖ

11

Theorem: p→q ≡ ¬p∨q.

Proof: Construct a truth table.

p  q  p→q  ¬p  ¬p∨q
T  T   T   F    T
T  F   F   F    F
F  T   T   T    T
F  F   T   T    T

qed

These proofs are examples of concrete proofs, done using an exhaustive
search of all possibilities. As the number of propositions grows, the number of
possibilities grows like 2ᵏ for k propositions.

The distributive laws are an example when k = 3.

12


Theorem: p∨(q∧r) ≡ (p∨q)∧(p∨r).

Proof: Construct a truth table.

p  q  r  p∨(q∧r)  p∨q  p∨r  (p∨q)∧(p∨r)
T  T  T     T      T    T       T
T  T  F     T      T    T       T
T  F  T     T      T    T       T
T  F  F     T      T    T       T
F  T  T     T      T    T       T
F  T  F     F      T    F       F
F  F  T     F      F    T       F
F  F  F     F      F    F       F

qed

Theorem: p∧(q∨r) ≡ (p∧q)∨(p∧r).

Proof: Construct a truth table similar to the previous theorem.

13

Some well known logical equivalences include the following laws:

≡                                         Law
p∧T ≡ p, p∨F ≡ p                          Identity
p∨T ≡ T, p∧F ≡ F                          Domination
p∨p ≡ p, p∧p ≡ p                          Idempotent
¬(¬p) ≡ p                                 Double negation
p∨¬p ≡ T, p∧¬p ≡ F                        Negation
p∨q ≡ q∨p, p∧q ≡ q∧p                      Commutative
(p∨q)∨r ≡ p∨(q∨r), (p∧q)∧r ≡ p∧(q∧r)      Associative

14


≡                                         Law
p∨(q∧r) ≡ (p∨q)∧(p∨r)                     Distributive
p∧(q∨r) ≡ (p∧q)∨(p∧r)
¬(p∨q) ≡ ¬p∧¬q                            DeMorgan
¬(p∧q) ≡ ¬p∨¬q
p∨(p∧q) ≡ p                               Absorption
p∧(p∨q) ≡ p

All of these laws can be proven concretely using truth tables. It is a good
exercise to see if you can prove some.

15

Well known logical equivalences involving conditional statements:

p→q ≡ ¬p∨q
p→q ≡ ¬q→¬p
p∨q ≡ ¬p→q
p∧q ≡ ¬(p→¬q)
¬(p→q) ≡ p∧¬q
(p→q)∧(p→r) ≡ p→(q∧r)
(p→r)∧(q→r) ≡ (p∨q)→r
(p→q)∨(p→r) ≡ p→(q∨r)
(p→r)∨(q→r) ≡ (p∧q)→r

Well known logical equivalences involving biconditional statements:

p↔q ≡ (p→q)∧(q→p)
p↔q ≡ ¬p↔¬q
p↔q ≡ (p∧q) ∨ (¬p∧¬q)
¬(p↔q) ≡ p↔¬q

16


Propositional logic is pretty limited. Almost anything you really are interested in
requires a more sophisticated form of logic: predicate logic with quantifiers (or
predicate calculus).

Definition: P(x) is a propositional function if substituting a specific value for x
in the expression P(x) gives us a proposition. The part of the expression
referring to x is known as the predicate.

Examples:

• P(x): x > 24. P(2) = F, P(102) = T.
• P(x): x = y + 1. P(x) = T for one value only (y is an unbound variable).
• P(x,y): x = y + 1. P(2,1) = T, P(102,-14) = F.

Definition: A statement of the form P(x₁,x₂,…,xₙ) is the value of the
propositional function P at the n-tuple (x₁,x₂,…,xₙ). P is also known as an
n-place (or n-ary) predicate.

17

Definition: The universal quantification of P(x) is the statement "P(x) is true for
all values of x in some domain," denoted by ∀x P(x).

Definition: The existential quantification of P(x) is the statement "P(x) is true for
at least one value of x in some domain," denoted by ∃x P(x).

Definition: The uniqueness quantification of P(x) is the statement "P(x) is true for
exactly one value of x in some domain," denoted by ∃!x P(x).

There is an infinite number of quantifiers that can be constructed, but the three
above are among the most important and common.

Examples: Assume x belongs to the real numbers.

• ∀x (x < 0). The negative real numbers form the domain.
• ∃!x (x + 1223 = 0).

18


∀ and ∃ have higher precedence than the logical operators.

Example: ∀x P(x)∧Q(x) means (∀x P(x))∧Q(x).

Definition: When a variable is used in a quantification, it is said to be bound.
Otherwise the variable is free.

Example: ∃x (x = y + 1). Here x is bound and y is free.

Definition: Statements involving predicates and quantifiers are logically
equivalent if and only if they have the same truth value no matter which
predicates are substituted and which domains are used. Notation: S ≡ T.

DeMorgan's Laws for Negation:

• ¬∃x P(x) ≡ ∀x ¬P(x).
• ¬∀x P(x) ≡ ∃x ¬P(x).

19
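Over a finite domain, ∀ and ∃ become `all()` and `any()`, so DeMorgan's laws for negation can be checked directly. A Python sketch (illustrative only; the predicate and domain are invented):

```python
# Over a finite domain, ∀x P(x) is all(...) and ∃x P(x) is any(...).
domain = range(-5, 6)
P = lambda x: x > 0

# ¬∃x P(x)  ≡  ∀x ¬P(x)
lhs = not any(P(x) for x in domain)
rhs = all(not P(x) for x in domain)
print(lhs == rhs)  # True

# ¬∀x P(x)  ≡  ∃x ¬P(x)
lhs2 = not all(P(x) for x in domain)
rhs2 = any(not P(x) for x in domain)
print(lhs2 == rhs2)  # True
```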

Nested quantifiers just means that more than one quantifier is in a statement. The
order of quantifiers is important.

Examples: Assume x and y belong to the real numbers.

• ∀x∃y (x + y = 0).
• ∀x∀y (x < 0) ∧ (y > 0) → xy < 0.

Quantification of two variables:

Statement     When True?                     When False?
∀x∀y P(x,y)   For all x and y, P(x,y)=T.     There is a pair of x and y such
                                             that P(x,y)=F.
∀x∃y P(x,y)   For all x there is a y such    There is an x such that for all y,
              that P(x,y)=T.                 P(x,y)=F.
∃x∀y P(x,y)   There is an x such that for    For all x there is a y such that
              all y, P(x,y)=T.               P(x,y)=F.
∃x∃y P(x,y)   There is a pair x and y        For all x and y, P(x,y)=F.
              such that P(x,y)=T.

20


Rules <strong>of</strong> Inference are used instead <strong>of</strong> truth tables in many instances. For n<br />

variables, there are 2 n rows in a truth table, which gets out <strong>of</strong> hand quickly.<br />

Definition: A propositional logic argument is a sequence <strong>of</strong> propositions. The<br />

last proposition is the conclusion. The earlier ones are the premises. An<br />

argument is valid if the truth <strong>of</strong> the premises implies the truth <strong>of</strong> the conclusion.<br />

Definition: A propositional logic argument form is a sequence <strong>of</strong> compound<br />

propositions involving propositional variables. An argument form is valid if no<br />

matter what particular propositions are substituted for the proposition variables<br />

in its premises, the conclusion remains true if the premises are all true.<br />

Translation: An argument form with premises p 1 , p 2 , …, p n and conclusion q is<br />

valid when (p 1 !p 2 !…!p n ) % q is a tautology.<br />

21<br />
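The translation above suggests a brute-force validity check: enumerate all assignments and look for one where every premise holds but the conclusion fails. A Python sketch (illustrative, not from the notes):

```python
from itertools import product

# An argument form is valid when (premises conjoined) → conclusion is a
# tautology. The implication p→q is encoded as (not p) or q.
def valid(premises, conclusion, nvars):
    for vs in product([True, False], repeat=nvars):
        if all(p(*vs) for p in premises) and not conclusion(*vs):
            return False  # found a counterexample assignment
    return True

# Modus ponens: p, p→q, therefore q.
modus_ponens = valid([lambda p, q: p, lambda p, q: (not p) or q],
                     lambda p, q: q, 2)
print(modus_ponens)  # True

# The fallacy of affirming the consequent: q, p→q, therefore p.
affirming_consequent = valid([lambda p, q: q, lambda p, q: (not p) or q],
                             lambda p, q: p, 2)
print(affirming_consequent)  # False
```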

There are eight basic rules of inference.

Rule        Tautology                     Name
p
p→q         [p∧(p→q)] → q                 Modus ponens
∴ q

¬q
p→q         [¬q∧(p→q)] → ¬p               Modus tollens
∴ ¬p

p→q
q→r         [(p→q)∧(q→r)] → (p→r)         Hypothetical syllogism
∴ p→r

p∨q
¬p          [(p∨q)∧¬p] → q                Disjunctive syllogism
∴ q

p
∴ p∨q       p → (p∨q)                     Addition

22


Rule        Tautology                     Name
p∧q
∴ p         (p∧q) → p                     Simplification

p
q           [(p)∧(q)] → (p∧q)             Conjunction
∴ p∧q

p∨q
¬p∨r        [(p∨q)∧(¬p∨r)] → (q∨r)        Resolution
∴ q∨r

23

Rules <strong>of</strong> Inference for Quantified Statements:<br />

Rule <strong>of</strong> Inference<br />

(x P(x)<br />

*P(c)<br />

P(c) for an arbitrary c<br />

*(x P(x)<br />

(x (P(x) % Q(x))<br />

P(a), where a is a particular element in the domain<br />

*Q(a)<br />

(x (P(x) % Q(x))<br />

¬Q(a), where a is a particular element in the domain<br />

*¬P(a)<br />

)x P(x)<br />

*P(c) for some c<br />

P(c) for some c<br />

*)x P(x)<br />

Name<br />

Universal instantiation<br />

Universal generalization<br />

Universal modus ponens<br />

Universal modus tollens<br />

Existential instantiation<br />

Existential generalization<br />

24


Sets, Functions, Sequences, and Sums

Definition: A set is a collection of unordered elements.

Examples:

• Z = {…, -3, -2, -1, 0, 1, 2, 3, …}
• N = {1, 2, 3, …} and N₀ = {0, 1, 2, 3, …} (slightly different than the text)
• Q = {p/q | p,q∈Z, q≠0}
• R = {reals}

Definition: The cardinality of a set S is denoted |S|. If |S| = n, where n∈Z, then
the set S is a finite set. Otherwise it is an infinite set (|S| = ∞).

Example: The cardinality of Z, N, N₀, Q, and R is infinite.

25

Definition: If |S| = |N|, then S is a countable set. Otherwise it is an uncountable
set.

Examples:

• Q is countable.
• R is uncountable.

Definition: Two sets S and T are equal, denoted S = T, if and only if
∀x(x∈S ↔ x∈T).

Examples:

• Let S = {0, 1, 2} and T = {2, 0, 1}. Then S = T. Order does not count.
• Let S = {0, 1, 2} and T = {0, 1, 3}. Then S ≠ T. Only the elements count.

Definition: The empty set is denoted by ∅. Note that ∀S(∅⊆S).

26


Definition: A set S is a subset of a set T if ∀x∈S(x∈T), denoted S⊆T. S is
a proper subset of T if S⊆T but S≠T, denoted S⊂T.

Example: S = {1, 0} and T = {0, 1, 2}. Then S⊂T.

Theorem: ∀S(S⊆S).

Proof: By definition, ∀x∈S(x∈S).

27

Definition: The Power Set of a set S, denoted P(S), is the set of all possible
subsets of S.

Theorem: If |S| = n, then |P(S)| = 2ⁿ.

Example: S = {0, 1}. Then P(S) = {∅, {0}, {1}, {0,1}}.

Definition: The Cartesian product of n sets Aᵢ is defined by ordered elements
from the Aᵢ and is denoted A₁×A₂×…×Aₙ = {(a₁,a₂,…,aₙ) | aᵢ∈Aᵢ}.

Example: Let S = {0, 1} and T = {a, b}. Then S×T = {(0,a), (0,b), (1,a), (1,b)}.

Definition: The union of n sets Aᵢ is defined by

⋃ᵢ₌₁ⁿ Aᵢ = A₁∪A₂∪…∪Aₙ = {x | ∃i x∈Aᵢ}.

Definition: The intersection of n sets Aᵢ is defined by

⋂ᵢ₌₁ⁿ Aᵢ = A₁∩A₂∩…∩Aₙ = {x | ∀i x∈Aᵢ}.

28
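Power sets, Cartesian products, unions, and intersections all have direct counterparts in Python; a sketch using the standard library (illustrative, not part of the notes):

```python
from itertools import chain, combinations, product

# Power set of S: all subsets of every size r = 0, 1, ..., |S|.
# The theorem |P(S)| = 2**|S| can then be checked by counting.
def power_set(s):
    s = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

S, T = {0, 1}, {"a", "b"}
print(len(power_set(S)))       # 4, i.e. 2**2
print(sorted(product(S, T)))   # Cartesian product S × T as ordered pairs
print(S | {1, 2}, S & {1, 2})  # union and intersection
```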


Definition: n sets Aᵢ are disjoint if A₁∩A₂∩…∩Aₙ = ∅.

Definition: The complement of set S with respect to T, denoted T−S, is defined
by T−S = {x∈T | x∉S}. T−S is also called the difference of T and S.

Definitions: The universal set is denoted U. The universal complement of S is
S̄ = U−S.

29

Examples:

• Let S = {1, 0} and T = {0, 1, 2}. Then
  o S⊂T.
  o S∩T = S.
  o S∪T = T.
  o T−S = {2}.
  o Let U = N₀. Then S̄ = {2, 3, …}.
• Let S = {0, 1} and T = {2, 3}. Then
  o S⊄T.
  o S∩T = ∅.
  o S∪T = {0, 1, 2, 3}.
  o T−S = {2, 3}.
  o Let U = R. Then S̄ is the set of all reals except the integers 0 and 1, i.e.,
    S̄ = {x∈R | x≠0 ∧ x≠1}.

30


The textbook has a large number of set identities in a table.

Identity                                              Law(s)
A∪∅ = A, A∩U = A                                      Identity
A∪U = U, A∩∅ = ∅                                      Domination
A∪A = A, A∩A = A                                      Idempotent
(Ā)‾ = A                                              Complementation
A∪B = B∪A, A∩B = B∩A                                  Commutative
A∪(B∪C) = (A∪B)∪C, A∩(B∩C) = (A∩B)∩C                  Associative
A∩(B∪C) = (A∩B)∪(A∩C)                                 Distributive
A∪(B∩C) = (A∪B)∩(A∪C)
(A∩B)‾ = Ā∪B̄, (A∪B)‾ = Ā∩B̄                            DeMorgan
A∪(A∩B) = A, A∩(A∪B) = A                              Absorption
A∪Ā = U, A∩Ā = ∅                                      Complement

Many of these are simple to prove from very basic laws.

31

Definition: A function f:A→B maps a set A to a set B, denoted f(a) = b for a∈A
and b∈B, where the mapping (or transformation) is unique.

Definition: If f:A→B, then

• If ∀b∈B ∃a∈A (f(a) = b), then f is a surjective function, or onto.
• If f(a) = f(b) implies a = b, then f is one-to-one (1-1), or injective.
• A function f is a bijection or a one-to-one correspondence if it is 1-1 and
  onto.

Definition: Let f:A→B. A is the domain of f. The minimal set B such that
f:A→B is onto is the image of f.

Definitions: Some compound functions include

• (∑ᵢ₌₁ⁿ fᵢ)(a) = ∑ᵢ₌₁ⁿ fᵢ(a). We can substitute + if we expand the summation.
• (∏ᵢ₌₁ⁿ fᵢ)(a) = ∏ᵢ₌₁ⁿ fᵢ(a). We can substitute * if we expand the product.

32

32


Definition: The composition of n functions fᵢ: Aᵢ→Aᵢ₊₁ is defined by

(f₁∘f₂∘…∘fₙ)(a) = f₁(f₂(…(fₙ(a))…)),

where a∈A₁.

Definition: If f: A→B, then the inverse of f, denoted f⁻¹: B→A, exists if and only
if ∀b∈B ∃a∈A (f(a) = b ∧ f⁻¹(b) = a).

Examples:

• Let A = [0,1] ⊂ R, B = [0,2] ⊂ R.
  o f(a) = a² and g(a) = a+1. Then f+g: A→B and f*g: A→B.
  o f(a) = 2*a and g(a) = a-1. Then neither f+g: A→B nor f*g: A→B.
• Let B = A = [0,1] ⊂ R.
  o f(a) = a² and g(a) = 1-a. Then f+g: A→A and f*g: A→A. Both
    compound functions are bijections.
  o f(a) = a³ and g(a) = a^(1/3). Then g∘f(a): A→A is a bijection.
• Let A = [-1, 1] and B = [0, 1]. Then
  o f(a) = a³ and g(a) = {x>0 | x = a^(1/3)}. Then g∘f(a): A→B is onto.

33

Definition: The graph of a function f is {(a,f(a)) | a∈A}.

Example: A = {0, 1, 2, 3, 4, 5} and f(a) = a². The figures (omitted here) show
(a) graph(f,A) and (b) an approximation to graph(f,[0,5]).

34


Definitions: The floor and ceiling functions are defined by

• ⌊x⌋ = largest integer smaller than or equal to x.
• ⌈x⌉ = smallest integer larger than or equal to x.

Examples:

• ⌊2.99⌋ = 2, ⌈2.99⌉ = 3
• ⌊-2.99⌋ = -3, ⌈-2.99⌉ = -2

Definition: A sequence is a function from either N or a subset of N to a set A
whose elements aᵢ are the terms of the sequence.

Definitions: A geometric progression is a sequence of the form {arⁱ, i=0, 1, …}.
An arithmetic progression is a sequence of the form {a+id, i=0, 1, …}.

Translation: f(a,r,i) = arⁱ and f(a,d,i) = a + id are the corresponding functions.

35
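Python's `math.floor` and `math.ceil` match these definitions, including the behavior on negative arguments shown above (a quick check, not part of the notes):

```python
import math

# Floor rounds toward -infinity, ceiling toward +infinity,
# matching the definitions of ⌊x⌋ and ⌈x⌉.
print(math.floor(2.99), math.ceil(2.99))    # 2 3
print(math.floor(-2.99), math.ceil(-2.99))  # -3 -2
```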

There are a number of interesting summations that have closed form solutions.

Theorem: If a,r∈R, then

∑ᵢ₌₀ⁿ arⁱ = (n+1)a if r = 1, and (arⁿ⁺¹ - a)/(r - 1) otherwise.

Proof: If r = 1, then we are left summing a n+1 times. Hence, the r = 1 case is
trivial. Suppose r ≠ 1. Let S = ∑ᵢ₌₀ⁿ arⁱ. Then

rS = r ∑ᵢ₌₀ⁿ arⁱ                  Substituting the formula for S.
   = ∑ᵢ₌₁ⁿ⁺¹ arⁱ                  Simplifying.
   = ∑ᵢ₌₀ⁿ arⁱ + (arⁿ⁺¹ - a)      Removing the n+1 term and adding the 0 term.
   = S + (arⁿ⁺¹ - a)              Substituting S for the formula.

Solve for S in rS = S + (arⁿ⁺¹ - a) to get the desired formula.

qed

36
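The closed form can be checked numerically against the direct sum; a small Python sketch (illustrative, not from the notes):

```python
# Closed form for the geometric sum from the theorem above:
# (n+1)a when r = 1, otherwise (a*r**(n+1) - a)/(r - 1).
def geometric_sum(a, r, n):
    if r == 1:
        return (n + 1) * a
    return (a * r ** (n + 1) - a) / (r - 1)

# Compare against summing a*r**i for i = 0..n directly.
a, r, n = 3, 2, 10
print(geometric_sum(a, r, n))                     # closed form
print(sum(a * r ** i for i in range(n + 1)))      # direct sum
```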


Some other common summations with closed form solutions are

Sum                       Closed Form Solution
∑ᵢ₌₁ⁿ i                   n(n+1)/2
∑ᵢ₌₁ⁿ i²                  n(n+1)(2n+1)/6
∑ᵢ₌₁ⁿ i³                  n²(n+1)²/4
∑ᵢ₌₀^∞ xⁱ, |x| < 1        1/(1-x)


Theorem: If fᵢ(x) is O(gᵢ(x)), for 1 ≤ i ≤ n, then

∑ᵢ₌₁ⁿ fᵢ(x) is O(max{|g₁(x)|, |g₂(x)|, …, |gₙ(x)|}).

Proof: Let g(x) = max{|g₁(x)|, |g₂(x)|, …, |gₙ(x)|} and Cᵢ the constants associated
with O(gᵢ(x)). Then

|∑ᵢ₌₁ⁿ fᵢ(x)| ≤ ∑ᵢ₌₁ⁿ Cᵢ|gᵢ(x)| ≤ ∑ᵢ₌₁ⁿ Cᵢ|g(x)| = |g(x)| ∑ᵢ₌₁ⁿ Cᵢ = C|g(x)|.

Theorem: If fᵢ(x) is O(gᵢ(x)), for 1 ≤ i ≤ n, then ∏ᵢ₌₁ⁿ fᵢ(x) is O(∏ᵢ₌₁ⁿ gᵢ(x)).

Proof: Let g(x) = |g₁(x)|·|g₂(x)|·…·|gₙ(x)| and Cᵢ the constants associated with
O(gᵢ(x)). Then

|∏ᵢ₌₁ⁿ fᵢ(x)| ≤ ∏ᵢ₌₁ⁿ Cᵢ|gᵢ(x)| ≤ C ∏ᵢ₌₁ⁿ |gᵢ(x)|.

39

Definition: Let f and g be functions from either Z or R to R. Then f(x) is
Ω(g(x)) if there are constants C and k such that |f(x)| ≥ C|g(x)| whenever x > k.

Definition: Let f and g be functions from either Z or R to R. Then f(x) is
Θ(g(x)) if f(x) = O(g(x)) and f(x) = Ω(g(x)). In this case, we say that f(x) is of
order g(x).

Comment: f(x) = O(g(x)) notation is great in the limit, but does not always
provide the right bounds for all values of x. Ω, called Big Omega, is used to
provide lower bounds. Θ, called Big Theta, is used to provide both lower and
upper bounds.

Example: f(x) = ∑ᵢ₌₀ⁿ aᵢxⁱ with aₙ ≠ 0 is of order xⁿ.

40


Notation: Timing, as a function of the number of elements, falls into the field of
complexity.

Complexity            Terminology
Θ(1)                  Constant
Θ(log(n))             Logarithmic
Θ(n)                  Linear
Θ(n log(n))           n log(n)
Θ(nᵏ)                 Polynomial
Θ(nᵏ log(n))          Polylog
Θ(kⁿ), where k > 1    Exponential
Θ(n!)                 Factorial

Notation: Problems are tractable if they can be solved in polynomial time and
are intractable otherwise.

41

Algorithms, Integers, and Matrices

Definition: An algorithm is a finite set of precise instructions for solving a
problem.

Computational algorithms should have these properties:

• Input: Values from a specified set.
• Output: Results using the input from a specified set.
• Definiteness: The steps in the algorithm are precise.
• Correctness: The output produced from the input is the right solution.
• Finiteness: The results are produced using a finite number of steps.
• Effectiveness: Each step must be performable and in a finite amount of time.
• Generality: The procedure should accept all input from the input set, not
  just special cases.

42


Algorithm: Find the maximum value of a₁, a₂, …, aₙ, where n is finite.

procedure max(a₁, a₂, …, aₙ : integers)
  max := a₁
  for i := 2 to n
    if max < aᵢ then max := aᵢ
{max is the largest element}

Proof of correctness: We use induction.

1. Suppose n = 1; then max := a₁, which is the correct result.
2. Suppose the result is true for k = 1, 2, …, i-1. Then at step i, we know that
   max is the largest element in a₁, a₂, …, aᵢ₋₁. In the if statement, either max is
   already larger than aᵢ or it is set to aᵢ. Hence, max is the largest element in
   a₁, a₂, …, aᵢ. Since i was arbitrary, we are done. qed

This algorithm's input and output are well defined and the overall algorithm can
be performed in O(n) time since n is finite. There are no restrictions on the input
set other than that the elements are integers.

43
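A direct Python transcription of the max procedure (0-based indexing replaces the 1-based pseudocode; illustrative, not from the notes):

```python
# Scan once, keeping the largest element seen so far, exactly as in
# the pseudocode. Runs in O(n) time.
def find_max(a):
    m = a[0]
    for i in range(1, len(a)):
        if m < a[i]:
            m = a[i]
    return m

print(find_max([3, 1, 4, 1, 5, 9, 2, 6]))  # 9
```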

Algorithm: Find a value in a sorted, distinct valued a₁, a₂, …, aₙ, where n is
finite.

There are many, many search algorithms.

procedure linear_search(x, a₁, a₂, …, aₙ : integers)
  i := 1
  while (i ≤ n and x ≠ aᵢ)
    i := i + 1
  if i ≤ n then location := i else location := 0
{location is the subscript of the aᵢ equal to x, or 0 if x is not in a₁, a₂, …, aₙ}

We can prove that this algorithm is correct using an induction argument. This
algorithm does not rely on either distinctness or sorted elements.

Linear search works, but it is very slow in comparison to many other searching
algorithms. It takes 2n+2 comparisons in the worst case, i.e., O(n) time.

44
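The same procedure in Python, keeping the 1-based location convention of the pseudocode (illustrative, not from the notes):

```python
# Linear search: scan until x is found or the list is exhausted.
# Returns the 1-based location, or 0 when x is absent.
def linear_search(x, a):
    i = 1
    while i <= len(a) and x != a[i - 1]:
        i += 1
    return i if i <= len(a) else 0

print(linear_search(19, [2, 3, 5, 7, 11, 13, 17, 19]))  # 8
print(linear_search(4, [2, 3, 5, 7, 11, 13, 17, 19]))   # 0
```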


procedure binary_search(x, a₁, a₂, …, aₙ : integers)
  i := 1
  j := n
  while ( i < j )
    m := ⌊(i+j)/2⌋
    if x > aₘ then i := m+1 else j := m
  if x = aᵢ then location := i else location := 0
{location is the subscript of the aᵢ equal to x, or 0 if x is not in a₁, a₂, …, aₙ}

We can prove that this algorithm is correct using an induction argument.

This algorithm is much, much faster than linear_search on average. It is O(log n)
in time. The average time for linear search to find a member of a₁, a₂, …, aₙ can
be proven to be of order n.

45
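A Python transcription of the binary search pseudocode, keeping the 1-based indices and the ⌊(i+j)/2⌋ midpoint (illustrative, not from the notes; requires a sorted list):

```python
# Binary search: halve the candidate range [i, j] until it is a single
# index, then test that index. O(log n) comparisons.
def binary_search(x, a):
    i, j = 1, len(a)
    while i < j:
        m = (i + j) // 2          # ⌊(i+j)/2⌋
        if x > a[m - 1]:
            i = m + 1
        else:
            j = m
    return i if a and x == a[i - 1] else 0

print(binary_search(13, [2, 3, 5, 7, 11, 13, 17, 19]))  # 6
print(binary_search(4, [2, 3, 5, 7, 11, 13, 17, 19]))   # 0
```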

Algorithm: Sort the distinct valued a₁, a₂, …, aₙ into increasing order, where n is
finite.

There are many, many sorting algorithms.

procedure bubble_sort(a₁, a₂, …, aₙ : reals, n > 1)
  for i := 1 to n-1
    for j := 1 to n-i
      if aⱼ > aⱼ₊₁ then swap aⱼ and aⱼ₊₁
{a₁, a₂, …, aₙ is in increasing order}

This is one of the simplest sorting algorithms. It is expensive, but quite
easy to understand and implement. Only one temporary is needed for the
swapping and two loop variables as extra storage. The worst case time is O(n²).

46
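The same double loop in Python (illustrative, not from the notes); pass i bubbles the largest remaining element to the end of the unsorted prefix:

```python
# Bubble sort as in the pseudocode: adjacent out-of-order pairs are
# swapped; O(n**2) comparisons in the worst case.
def bubble_sort(a):
    a = list(a)            # work on a copy
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(bubble_sort([3, 2, 4, 1, 5]))  # [1, 2, 3, 4, 5]
```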


procedure insertion_sort(a₁, a₂, …, aₙ : reals, n > 1)
  for j := 2 to n
    i := 1
    while aⱼ > aᵢ
      i := i + 1
    t := aⱼ
    for k := 0 to j-i-1
      aⱼ₋ₖ := aⱼ₋ₖ₋₁
    aᵢ := t
{a₁, a₂, …, aₙ is in increasing order}

This is not a very efficient sorting algorithm either. However, it is easy to see
that at the jth step the jth element is put into the correct spot. The worst case
time is O(n²). In fact, insertion_sort is trivially slower than bubble_sort.

47
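A Python transcription of the insertion sort pseudocode (illustrative, not from the notes): find the slot for element j by linear search, shift the larger elements right, then insert:

```python
# Insertion sort following the pseudocode: element a[j] is placed into
# its correct position within the already-sorted prefix a[0..j-1].
def insertion_sort(a):
    a = list(a)                    # work on a copy
    for j in range(1, len(a)):
        i = 0
        while a[j] > a[i]:         # find the insertion slot
            i += 1
        t = a[j]
        for k in range(j - i):     # shift a[i..j-1] right by one
            a[j - k] = a[j - k - 1]
        a[i] = t
    return a

print(insertion_sort([3, 2, 4, 1, 5]))  # [1, 2, 3, 4, 5]
```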

Number theory is a rich field of mathematics. We will study four aspects briefly:

1. Integers and division
2. Primes and greatest common divisors
3. Integers and algorithms
4. Applications of number theory

Most of the theorems quoted in this part of the textbook require knowledge of
mathematical induction to prove rigorously, a topic covered in detail in the next
chapter.

48


Definition: If a,b∈Z and a≠0, we say that a divides b if ∃c∈Z(b=ac), denoted by
a | b. When a divides b, we say a is a factor of b and b is a multiple of a.
When a does not divide b, we write a ∤ b.

Theorem: Let a,b,c∈Z. Then

1. If a | b and a | c, then a | (b+c).
2. If a | b, then a | (bc).
3. If a | b and b | c, then a | c.

Proof: Since a | b, ∃s∈Z(b=as).

1. Since a | c it follows that ∃t∈Z(c=at). Hence, b+c = as + at = a(s+t).
   Therefore, a | (b+c).
2. bc = (as)c = a(sc). Therefore, a | (bc).
3. Since b | c it follows that ∃t∈Z(c=bt). c = bt = (as)t = a(st). Therefore, a | c.

Corollary: Let a,b,c∈Z. If a | b and a | c, then a | (mb+nc) for all m,n∈Z.

49

Theorem (Division Algorithm): Let a,d∈Z (d > 0). Then ∃!q,r∈Z (a = dq + r
with 0 ≤ r < d).

Definition: In the division algorithm, a is the dividend, d is the divisor, q is the
quotient, and r is the remainder. We write q = a div d and r = a mod d.

Examples:

• Consider 101 divided by 9: 101 = 11·9 + 2.
• Consider -11 divided by 3: -11 = 3(-4) + 1.

Definition: Let a,b,m∈Z (m > 0). Then a is congruent to b modulo m if m | (a-b),
denoted a ≡ b (mod m). The set of integers congruent to an integer a modulo m
is called the congruence class of a modulo m.

Theorem: Let a,b,m∈Z (m > 0). Then a ≡ b (mod m) if and only if a mod m = b
mod m.

50
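Python's `divmod` returns exactly the (q, r) pair of the division algorithm for d > 0, with 0 ≤ r < d even for negative dividends; a quick check of the two examples (not from the notes):

```python
# q = a div d and r = a mod d, with 0 <= r < d for d > 0.
print(divmod(101, 9))   # (11, 2):  101 = 11*9 + 2
print(divmod(-11, 3))   # (-4, 1):  -11 = 3*(-4) + 1

# The corollary on modular arithmetic can also be spot-checked:
a, b, m = 7, 11, 5
print((a + b) % m == ((a % m) + (b % m)) % m)  # True
print((a * b) % m == ((a % m) * (b % m)) % m)  # True
```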


Examples:

• Does 17 ≡ 5 (mod 6)? Yes, since 17 - 5 = 12 and 6 | 12.
• Does 24 ≡ 14 (mod 6)? No, since 24 - 14 = 10, which is not divisible by 6.

Theorem: Let a,b,m∈Z (m > 0). Then a ≡ b (mod m) if and only if
∃k∈Z(a=b+km).

Proof: If a ≡ b (mod m), then m | (a-b). So, there is a k such that a-b = km, or a =
b+km. Conversely, if there is a k such that a = b + km, then km = a-b. Hence,
m | (a-b), or a ≡ b (mod m).

Theorem: Let a,b,c,d,m∈Z (m > 0). If a ≡ b (mod m) and c ≡ d (mod m), then
a+c ≡ b+d (mod m) and ac ≡ bd (mod m).

Corollary: Let a,b,m∈Z (m > 0). Then (a+b) mod m = ((a mod m)+(b mod m))
mod m and (ab) mod m = ((a mod m)(b mod m)) mod m.

51

Some applications involving congruence include<br />

• Hashing functions: h(k) = k mod m.<br />

• Pseudorandom numbers: x_{n+1} = (a·x_n + c) mod m.<br />

o c = 0 is known as a pure multiplicative generator.<br />

o c ≠ 0 is known as a linear congruential generator.<br />

• Cryptography<br />
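A linear congruential generator is only a few lines of code; the sketch below uses the constants from glibc's rand() purely as an illustrative choice — the notes do not prescribe particular values of a, c, and m.<br />

```python
# Minimal linear congruential generator sketch.  The constants a, c, m
# are the ones glibc's rand() uses and are illustrative only.
def lcg(seed, a=1103515245, c=12345, m=2**31):
    x = seed
    while True:
        x = (a * x + c) % m   # x_{n+1} = (a*x_n + c) mod m
        yield x

gen = lcg(42)
first = [next(gen) for _ in range(3)]
assert all(0 <= x < 2**31 for x in first)
```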

Definition: A positive integer a > 1 is a prime if it is divisible only by 1 and a. It is a<br />

composite otherwise.<br />

Fundamental Theorem of Arithmetic: Every positive integer greater than 1 can<br />

be written uniquely as a prime or the product of two or more primes, where the<br />

prime factors are written in nondecreasing order.<br />

Theorem: If a is a composite number, then a has a prime divisor less than or<br />

equal to a^{1/2}.<br />

52


Theorem: There are infinitely many primes.<br />

Prime Number Theorem: The ratio of the number of primes not exceeding a to<br />

a/ln(a) approaches 1 as a → ∞.<br />

Example: The odds of a randomly chosen positive integer n being prime are given<br />

by (n/ln(n))/n = 1/ln(n) asymptotically.<br />

There are still a number of open questions regarding the distribution of primes.<br />

Definition: Let a,b ∈ Z (a and b not both 0). The largest integer d such that d | a<br />

and d | b is the greatest common divisor of a and b, denoted by gcd(a,b).<br />

Example: gcd(24,36) = 12.<br />

Definition: The integers a and b are relatively prime if gcd(a,b) = 1.<br />

53<br />

Definition: The integers a_1, a_2, …, a_n are pairwise relatively prime if gcd(a_i, a_j) = 1<br />

whenever 1 ≤ i < j ≤ n.<br />


Integers can be expressed uniquely in any base.<br />

Theorem: Let b ∈ Z (b > 1). Then every n ∈ N has a unique expression<br />

n = a_k b^k + a_{k-1} b^{k-1} + … + a_1 b + a_0, where k ∈ N_0, a_k ≠ 0, and 0 ≤ a_i < b.<br />

• Base 2 to base 2^k, k > 1, is really easy. Just group k bits together and<br />

convert each group to the corresponding base 2^k symbol.<br />

• Base 10 to any base 2^k is a pain.<br />

• Base 2^k to base 10 is also a pain.<br />
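The expansion in the theorem can be computed by repeated division by the base; a small sketch (digit lists are most-significant first):<br />

```python
# Base conversion by repeated division: each remainder is one digit a_i
# of the expansion n = a_k b^k + ... + a_1 b + a_0.
def to_base(n, b):
    digits = []
    while n > 0:
        n, r = divmod(n, b)
        digits.append(r)
    return digits[::-1] or [0]

assert to_base(241, 2) == [1, 1, 1, 1, 0, 0, 0, 1]
assert to_base(241, 16) == [15, 1]   # 0xF1
```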

56


Algorithm: Addition of integers<br />

procedure add(a, b: integers)<br />

(a_{n-1} a_{n-2} … a_1 a_0)_2 := base_2_expansion(a)<br />

(b_{n-1} b_{n-2} … b_1 b_0)_2 := base_2_expansion(b)<br />

c := 0<br />

for j := 0 to n-1<br />

d := ⌊(a_j + b_j + c)/2⌋<br />

s_j := a_j + b_j + c – 2d<br />

c := d<br />

s_n := c<br />

{the binary expansion of the sum is (s_n s_{n-1} … s_1 s_0)_2}<br />

Questions:<br />

• What is the complexity <strong>of</strong> this algorithm?<br />

• Is this the fastest way to compute the sum?<br />
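A direct transcription of the procedure into Python, on little-endian bit lists, makes the carry arithmetic concrete (as to the second question: in hardware, carry-lookahead adders beat this ripple-carry scheme):<br />

```python
# Ripple-carry addition on little-endian bit lists, a direct transcription
# of the procedure above (a sketch, not production code).
def add_bits(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    s, c = [], 0
    for j in range(n):
        d = (a[j] + b[j] + c) // 2          # carry out of position j
        s.append(a[j] + b[j] + c - 2 * d)   # sum bit s_j
        c = d
    s.append(c)                             # s_n := final carry
    return s

# (10)_2 + (11)_2 = (101)_2; little-endian: [0,1] + [1,1] -> [1,0,1]
assert add_bits([0, 1], [1, 1]) == [1, 0, 1]
```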

57<br />

Algorithm: Multiplication of integers<br />

procedure multiply(a, b: integers)<br />

(a_{n-1} a_{n-2} … a_1 a_0)_2 := base_2_expansion(a)<br />

(b_{n-1} b_{n-2} … b_1 b_0)_2 := base_2_expansion(b)<br />

for j := 0 to n-1<br />

if b_j = 1 then c_j := a shifted j places else c_j := 0<br />

{c_0, c_1, …, c_{n-1} are the partial products}<br />

p := 0<br />

for j := 0 to n-1<br />

p := p + c_j<br />

{p is the value of ab}<br />

Examples:<br />

• (10)_2 · (11)_2 = (110)_2. Note that there are more bits than in the original integers.<br />

• (11)_2 · (11)_2 = (1001)_2. Twice as many binary digits!<br />
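The same procedure in Python, using shifts instead of an explicit bit list (a sketch):<br />

```python
# Shift-and-add multiplication, mirroring the partial-product procedure:
# for each set bit b_j, add a shifted j places into the running sum p.
def multiply(a, b):
    p, j = 0, 0
    while b >> j:
        if (b >> j) & 1:      # bit b_j is 1
            p += a << j       # partial product c_j = a shifted j places
        j += 1
    return p

assert multiply(0b10, 0b11) == 0b110
assert multiply(0b11, 0b11) == 0b1001
```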

58


Algorithm: Compute div and mod<br />

procedure division(a: integer, d: positive integer)<br />

q := 0<br />

r := |a|<br />

while r ≥ d<br />

r := r – d<br />

q := q + 1<br />

if a < 0 and r > 0 then<br />

r := d – r<br />

q := -(q + 1)<br />

{q = a div d is the quotient and r = a mod d is the remainder}<br />
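In Python the procedure looks as follows; the final elif covers a case (a < 0 with remainder exactly 0) that the pseudocode above leaves untouched:<br />

```python
# Division by repeated subtraction, following the procedure above.
def division(a, d):
    q, r = 0, abs(a)
    while r >= d:
        r -= d
        q += 1
    if a < 0 and r > 0:       # fix up sign for negative dividends
        r = d - r
        q = -(q + 1)
    elif a < 0:               # r == 0: sign flip the pseudocode omits
        q = -q
    return q, r

assert division(101, 9) == (11, 2)     # 101 = 11*9 + 2
assert division(-11, 3) == (-4, 1)     # -11 = 3*(-4) + 1
```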

Notes:<br />

• The complexity of the multiplication algorithm is O(n^2). Much more<br />

efficient algorithms exist, including one that is O(n^1.585) using a divide and<br />

conquer technique we will see later in the course.<br />

• There are O(log(a)·log(d)) complexity algorithms for division.<br />

59<br />

Modular exponentiation, b^k mod m, where b, k, and m are large integers, is<br />

important to compute efficiently in the field of cryptology.<br />

Algorithm: Modular exponentiation<br />

procedure modular_exponentiation(b: integer, k,m: positive integers)<br />

(a_{n-1} a_{n-2} … a_1 a_0)_2 := base_2_expansion(k)<br />

y := 1<br />

power := b mod m<br />

for i := 0 to n-1<br />

if a_i = 1 then y := (y · power) mod m<br />

power := (power · power) mod m<br />

{y = b^k mod m}<br />

Note: The complexity is O((log m)^2 log k) bit operations, which is fast.<br />
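The procedure in Python; the built-in pow(b, k, m) performs the same computation:<br />

```python
# Binary (square-and-multiply) modular exponentiation, as in the
# procedure above, consuming the bits of k from least significant up.
def mod_exp(b, k, m):
    y, power = 1, b % m
    while k:
        if k & 1:                      # current bit a_i of k is 1
            y = (y * power) % m
        power = (power * power) % m    # square for the next bit
        k >>= 1
    return y

assert mod_exp(3, 644, 645) == pow(3, 644, 645)
assert mod_exp(7, 0, 11) == 1
```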

60


Euclidean Algorithm: Compute gcd(a,b)<br />

procedure gcd(a,b: positive integers)<br />

x := a<br />

y := b<br />

while y ≠ 0<br />

r := x mod y<br />

x := y<br />

y := r<br />

{gcd(a,b) is x}<br />

Correctness of this algorithm is based on<br />

Lemma: Let a = bq + r, where a,b,q,r ∈ Z. Then gcd(a,b) = gcd(b,r).<br />

The complexity will be studied after we master mathematical induction.<br />
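The loop translates directly (math.gcd is the library version):<br />

```python
# The Euclidean algorithm exactly as above.
def gcd(a, b):
    x, y = a, b
    while y != 0:
        x, y = y, x % y   # gcd(x, y) = gcd(y, x mod y) by the lemma
    return x

assert gcd(24, 36) == 12
assert gcd(101, 9) == 1
```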

61<br />

Useful results from number theory<br />

Theorem: If a,b ∈ N, then ∃s,t ∈ Z (gcd(a,b) = sa + tb).<br />

Lemma: If a,b,c ∈ N, gcd(a,b) = 1, and a | bc, then a | c.<br />

Note: This lemma makes proving the prime factorization theorem doable.<br />

Lemma: If p is a prime and p | a_1 a_2 … a_n, where each a_i ∈ Z, then p | a_i for some i.<br />

Theorem: Let m ∈ N and let a,b,c ∈ Z. If ac ≡ bc (mod m) and gcd(c,m) = 1, then<br />

a ≡ b (mod m).<br />

Definition: A linear congruence is a congruence of the form ax ≡ b (mod m),<br />

where m ∈ N, a,b ∈ Z, and x is a variable.<br />

Definition: An inverse of a modulo m is an integer ā such that āa ≡ 1 (mod m).<br />

62


Theorem: If a and m are relatively prime integers and m>1, then an inverse of a<br />

modulo m exists and is unique modulo m.<br />

Proof: Since gcd(a,m) = 1, ∃s,t ∈ Z (1 = sa + tm). Hence, sa + tm ≡ 1 (mod m). Since<br />

tm ≡ 0 (mod m), it follows that sa ≡ 1 (mod m). Thus, s is an inverse of a<br />

modulo m. The uniqueness argument is made by assuming there are two<br />

inverses and deriving a contradiction.<br />
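An inverse can be computed with the extended Euclidean algorithm, which produces the s and t of the theorem above; a sketch (Python 3.8+ also accepts pow(a, -1, m)):<br />

```python
# Modular inverse via the extended Euclidean algorithm: track the
# coefficient s in the invariant old_r = old_s*a + (something)*m.
def inverse_mod(a, m):
    old_r, r = a, m
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:
        raise ValueError("a and m are not relatively prime")
    return old_s % m

assert (inverse_mod(3, 7) * 3) % 7 == 1
assert inverse_mod(3, 7) == 5
```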

Systems of linear congruences are used in large integer arithmetic. The basis for<br />

the arithmetic goes back to China 1700 years ago.<br />

Puzzle Sun Tzu (or Sun Zi): There are certain things whose number is unknown.<br />

• When divided by 3, the remainder is 2.<br />

• When divided by 5, the remainder is 3, and<br />

• When divided by 7, the remainder is 2.<br />

What will be the number of things? (Answer: 23… stay tuned for why.)<br />

63<br />

Chinese Remainder Theorem: Let m_1, m_2, …, m_n ∈ N be pairwise relatively prime.<br />

Then the system x ≡ a_i (mod m_i), 1 ≤ i ≤ n, has a unique solution modulo m = Π_{i=1}^{n} m_i.<br />

Existence Proof: The proof is by construction. Let M_k = m / m_k, 1 ≤ k ≤ n. Then<br />

gcd(M_k, m_k) = 1 (from the pairwise relatively prime condition). By the previous<br />

theorem we know that there is a y_k which is an inverse of M_k modulo m_k, i.e.,<br />

M_k y_k ≡ 1 (mod m_k). To construct the solution, form the sum<br />

x = a_1 M_1 y_1 + a_2 M_2 y_2 + … + a_n M_n y_n.<br />

Note that M_j ≡ 0 (mod m_k) whenever j ≠ k. Hence,<br />

x ≡ a_k M_k y_k ≡ a_k (mod m_k), 1 ≤ k ≤ n.<br />

We have shown that x is a simultaneous solution to the n congruences. qed<br />

64


Sun Tzu’s Puzzle: The a_k ∈ {2, 3, 2} from 2 pages earlier. Next,<br />

m_k ∈ {3, 5, 7}, m = 3·5·7 = 105, and M_k = m/m_k ∈ {35, 21, 15}.<br />

The inverses y_k are<br />

1. y_1 = 2 (M_1 = 35 ≡ 2 modulo 3).<br />

2. y_2 = 1 (M_2 = 21 ≡ 1 modulo 5).<br />

3. y_3 = 1 (M_3 = 15 ≡ 1 modulo 7).<br />

The solutions to this system are those x such that<br />

x ≡ a_1 M_1 y_1 + a_2 M_2 y_2 + a_3 M_3 y_3 = 2·35·2 + 3·21·1 + 2·15·1 = 233.<br />

Finally, 233 ≡ 23 (mod 105).<br />
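The constructive proof turns into a short routine; pow(M_k, -1, m_k) supplies each inverse y_k (Python 3.8+):<br />

```python
# CRT construction from the existence proof: x = sum(a_k*M_k*y_k) mod m.
def crt(residues, moduli):
    m = 1
    for m_k in moduli:
        m *= m_k
    x = 0
    for a_k, m_k in zip(residues, moduli):
        M_k = m // m_k
        y_k = pow(M_k, -1, m_k)    # inverse of M_k modulo m_k
        x += a_k * M_k * y_k
    return x % m

# Sun Tzu's puzzle: x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)
assert crt([2, 3, 2], [3, 5, 7]) == 23
```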

65<br />

Definition: An m×n matrix is a rectangular array of numbers with m rows and n<br />

columns. The elements of a matrix A are denoted by A_ij or a_ij. A matrix with m=n<br />

is a square matrix. If two matrices A and B have the same number of rows and<br />

columns and all of the elements A_ij = B_ij, then A = B.<br />

Definition: The transpose of an m×n matrix A = [A_ij], denoted A^T, is A^T = [A_ji]. A<br />

matrix is symmetric if A = A^T and skew symmetric if A = -A^T.<br />

Definition: The i-th row of an m×n matrix A is [A_i1, A_i2, …, A_in]. The j-th column<br />

is [A_1j, A_2j, …, A_mj]^T.<br />

Definition: Matrix arithmetic is not exactly the same as scalar arithmetic:<br />

• C = A + B: c_ij = a_ij + b_ij, where A and B are m×n.<br />

• C = A – B: c_ij = a_ij - b_ij, where A and B are m×n.<br />

• C = AB: c_ij = Σ_{p=1}^{k} a_ip b_pj, where A is m×k, B is k×n, and C is m×n.<br />

66


Theorem: A+B = B+A, but AB ≠ BA in general.<br />

Definition: The identity matrix I_n is n×n with I_ii = 1 and I_ij = 0 if i ≠ j.<br />

Theorem: If A is n×n, then AI_n = I_nA = A.<br />

Definition: A^r = AA … A (r times).<br />

Definition: Zero-one matrices are matrices A = [a_ij] such that all a_ij ∈ {0, 1}.<br />

Boolean operations are defined on m×n zero-one matrices A = [a_ij] and B = [b_ij]<br />

by<br />

• Meet of A and B: A∧B = [a_ij ∧ b_ij], 1 ≤ i ≤ m and 1 ≤ j ≤ n.<br />

• Join of A and B: A∨B = [a_ij ∨ b_ij], 1 ≤ i ≤ m and 1 ≤ j ≤ n.<br />

• The Boolean product of A and B, C = A⊙B, where A is m×k, B is k×n,<br />

and C is m×n, is defined by c_ij = (a_i1∧b_1j)∨(a_i2∧b_2j)∨…∨(a_ik∧b_kj).<br />

Definition: The Boolean power of an n×n matrix A is defined by A^[r] =<br />

A⊙A⊙…⊙A (r times), where A^[0] = I_n.<br />
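The Boolean product definition translates directly to code on lists of 0/1 rows (a sketch):<br />

```python
# Boolean product of zero-one matrices: c_ij is 1 exactly when some p
# has a_ip = b_pj = 1, i.e. the OR of ANDs in the definition above.
def bool_product(A, B):
    m, k, n = len(A), len(B), len(B[0])
    return [[int(any(A[i][p] and B[p][j] for p in range(k)))
             for j in range(n)] for i in range(m)]

A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 1]]
assert bool_product(A, B) == [[0, 1], [1, 1]]   # I ⊙ B = B
```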

67<br />

Induction and Recursion<br />

Principle of Mathematical Induction: Given a propositional function P(n), n ∈ N,<br />

we prove that P(n) is true for all n ∈ N by verifying<br />

1. (Basis) P(1) is true.<br />

2. (Induction) P(k) → P(k+1), ∀k ∈ N.<br />

Notes:<br />

• Equivalent to [P(1) ∧ ∀k ∈ N (P(k) → P(k+1))] → ∀n ∈ N P(n).<br />

• We do not actually assume P(k) is true. It is shown that if it is assumed that<br />

P(k) is true, then P(k+1) is also true. This is a subtle grammatical point with<br />

mathematical implications.<br />

• Mathematical induction is a form <strong>of</strong> deductive reasoning, not inductive<br />

reasoning. The latter tries to make conclusions based on observations and<br />

rules that may lead to false conclusions.<br />

• Sometimes P(1) is not the basis, but some other P(k), k ∈ Z.<br />

68


• Sometimes P(k) holds only for a (possibly infinite) subset of N or Z.<br />

• Sometimes P(k-1) → P(k) is easier to prove than P(k) → P(k+1).<br />

• Being flexible, but staying within the guiding principle usually works.<br />

• There are many ways of proving false results using subtly wrong induction<br />

arguments. Usually there is a disconnect between the basis and induction<br />

parts of the proof.<br />

• Examples 10, 11, and 12 in your textbook are worth studying until you<br />

really understand each.<br />

Lemma: Σ_{i=1}^{n} (2i-1) = n^2 (sum of odd numbers).<br />

Proof: (Basis) Take k = 1, so 1 = 1.<br />

(Induction) Assume 1+3+5+…+(2k-1) = k^2 for an arbitrary k ≥ 1. Add 2k+1 to<br />

both sides. Then (1+3+5+…+(2k-1))+(2k+1) = k^2+(2k+1) = (k+1)^2.<br />

69<br />

Lemma: Σ_{i=0}^{n} 2^i = 2^{n+1} - 1.<br />

Proof: (Basis) Take k = 0, so 2^0 = 1 = 2^1 – 1.<br />

(Induction) Assume Σ_{i=0}^{k} 2^i = 2^{k+1} - 1 for an arbitrary k ≥ 0. Add 2^{k+1} to both<br />

sides. Then<br />

Σ_{i=0}^{k} 2^i + 2^{k+1} = 2^{k+1} - 1 + 2^{k+1},<br />

which simplifies to<br />

Σ_{i=0}^{k+1} 2^i = 2^{k+2} - 1.<br />

Principle of Strong Induction: Given a propositional function P(n), n ∈ N, we<br />

prove that P(n) is true for all n ∈ N by verifying<br />

1. (Basis) P(1) is true.<br />

2. (Induction) [P(1)∧P(2)∧…∧P(k)] → P(k+1) is true ∀k ∈ N.<br />

70


Example: Infinite ladder with reachable rungs. For mathematical or strong<br />

induction, we need to verify the following:<br />

Basis (both): We can reach the first rung.<br />

Induction (mathematical): If we can reach an arbitrary rung k, then we can reach<br />

rung k+1.<br />

Induction (strong): ∀k ∈ N, if we can reach all k rungs, then we can reach<br />

rung k+1.<br />

We cannot prove that you can climb an infinite ladder using mathematical<br />

induction. Using strong induction, however, you can prove this result using a<br />

trick: since you can prove that you can climb to rungs 1, 2, …, k, it follows that<br />

you can climb 2 rungs arbitrarily, which gets you from rung k-1 to rung k+1.<br />

Rule of thumb: Always use mathematical induction if P(k) → P(k+1) can be proven ∀k ∈ N.<br />

Only resort to strong induction when that fails.<br />

71<br />

Fundamental Theorem of Arithmetic: Every n ∈ N (n > 1) is the product of primes.<br />

Proof: Let P(n) be the proposition that n can be written as the product of primes.<br />

(Basis) P(2) is true: 2 = 2, the product of 1 prime.<br />

(Induction) Assume P(j) is true ∀j ≤ k. We must verify that P(k+1) is true.<br />

Case 1: k+1 is a prime. Hence, P(k+1) is true.<br />

Case 2: k+1 is a composite. Hence, k+1 = a·b, where 2 ≤ a ≤ b < k+1. By the<br />

induction hypothesis, a and b are products of primes, and hence so is k+1. qed<br />


Example: Every postage amount ≥ $.12 can be formed using $.04 and $.05<br />

stamp combinations only. We can prove this using modified strong induction.<br />

(Basis) Consider 4 specific cases:<br />

Postage Number of $.04’s Number of $.05’s<br />

$.12 3 0<br />

$.13 2 1<br />

$.14 1 2<br />

$.15 0 3<br />

Hence, P(j) is true for 12 ≤ j ≤ 15.<br />

(Induction) Assume P(j) is true for 12 ≤ j ≤ k, where k ≥ 15. Since P(k-3) is true,<br />

adding one more $.04 stamp to that combination forms postage k+1, so P(k+1) is true.<br />

• h(0) = 1, h(n) = n·h(n-1) = n!<br />

• Fibonacci numbers: f_0 = 0, f_1 = 1, f_n = f_{n-1} + f_{n-2}, n > 1.<br />

n      0   1    2    3    4<br />

f(n)   1   6    16   36   76<br />

g(n)   12  1   -10  -21  -32<br />

h(n)   1   1    2    6    24<br />

f_n    0   1    1    2    3<br />
74


Theorem: Whenever n ≥ 3, f_n > α^{n-2}, where α = (1+√5)/2.<br />

The proof is by modified strong induction.<br />

Lamé’s Theorem: Let a,b ∈ N (a ≤ b). Then the number of divisions used by the<br />

Euclidean algorithm to compute gcd(a,b) is at most five times the number of<br />

decimal digits in a, i.e., O(log a).<br />


Definition: A recursive algorithm solves a problem by reducing it to an instance<br />

of the same problem with smaller input(s).<br />

Note: Recursive algorithms can be proven correct using mathematical induction<br />

or modified strong induction.<br />

Examples:<br />

• n! = n•(n-1)!<br />

• a^n = a·(a^{n-1})<br />

• gcd(a,b) with a,b ∈ N (a < b): gcd(a,b) = gcd(b mod a, a)<br />


• Fibonacci numbers<br />

procedure fib(n: n ∈ N_0)<br />

if n = 0 then fib(0) := 0<br />

else if n = 1 then fib(1) := 1<br />

else fib(n) := fib(n-1) + fib(n-2)<br />

or it can be defined iteratively:<br />

procedure fib(n: n ∈ N_0)<br />

if n = 0 then y := 0<br />

else<br />

x := 0, y := 1<br />

for i := 1 to n-1<br />

z := x+y<br />

x := y<br />

y := z<br />

{y is f_n}<br />
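The iterative procedure in Python; unlike the naive recursive version, it uses only O(n) additions:<br />

```python
# Iterative Fibonacci, following the second procedure above.
def fib(n):
    if n == 0:
        return 0
    x, y = 0, 1
    for _ in range(n - 1):   # n-1 passes of z := x+y; x := y; y := z
        x, y = y, x + y
    return y

assert [fib(n) for n in range(8)] == [0, 1, 1, 2, 3, 5, 8, 13]
```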

79<br />

Graphs and trees are important concepts that we will spend a lot <strong>of</strong> time<br />

considering later in the course.<br />

• A graph is made up of vertices and edges that connect some of the vertices.<br />

• A tree is a special form of a graph, namely it is a connected undirected<br />

graph with no simple circuits.<br />

• A rooted tree is a tree with one vertex that is the root and every edge is<br />

directed away from the root.<br />

• An m-ary tree is a rooted tree such that every internal vertex has no more<br />

than m children. If m = 2, it is a binary tree.<br />

• The height of a rooted tree T, denoted h(T), is the maximum number of<br />

levels (vertices along a longest path from the root).<br />

• A balanced rooted tree T has all of its leaves at levels h(T) or h(T)-1.<br />

Let T_1, T_2, …, T_m be rooted trees with roots r_1, r_2, …, r_m. Let r be another root.<br />

Connecting r to the roots r_1, r_2, …, r_m constructs another rooted tree T. We can<br />

reformulate this concept using the recursive set methodology.<br />

80


Merge sort is a balanced binary tree method that first breaks a list up recursively<br />

into two lists until each sublist has only one element. Then the sublists are<br />

recombined, two at a time in sorted order, until only one sorted list remains.<br />

Note: The height of the tree formed in merge sort is O(log_2 n) for n elements.<br />

Notes:<br />

10, 4, 7, 1<br />

10, 4 7, 1<br />

10 4 7 1<br />

4, 10 1, 7<br />

1, 4, 7, 10<br />

• First three rows do the sublist splitting.<br />

• Last two rows do the merging.<br />

• There are two distinct algorithms at work.<br />

81<br />

procedure merge_sort(L = a_1, a_2, …, a_n)<br />

if n > 1 then<br />

m := ⌊n/2⌋<br />

L_1 := a_1, a_2, …, a_m<br />

L_2 := a_{m+1}, a_{m+2}, …, a_n<br />

L := merge(merge_sort(L_1), merge_sort(L_2))<br />

{L is now the sorted list a_1, a_2, …, a_n}<br />

procedure merge(L_1, L_2: sorted lists)<br />

L := empty list<br />

while L_1 and L_2 are both nonempty<br />

remove the smaller of the first elements of L_1 and L_2 and append it to<br />

the end of L<br />

if either L_1 or L_2 is empty, append the other list to the end of L<br />

{L is the merged, sorted list}<br />
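Both procedures in Python (a compact sketch using slices rather than explicit element removal):<br />

```python
# Merge sort as above: split at the midpoint, sort halves, merge.
def merge_sort(L):
    if len(L) <= 1:
        return L
    m = len(L) // 2
    return merge(merge_sort(L[:m]), merge_sort(L[m:]))

def merge(L1, L2):
    out, i, j = [], 0, 0
    while i < len(L1) and j < len(L2):
        if L1[i] <= L2[j]:
            out.append(L1[i]); i += 1
        else:
            out.append(L2[j]); j += 1
    return out + L1[i:] + L2[j:]   # append whichever list is left over

assert merge_sort([10, 4, 7, 1]) == [1, 4, 7, 10]
```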

82


Theorem: If n_i = |L_i|, i=1,2, then merge requires at most n_1+n_2-1 comparisons. If<br />

n = |L|, then merge_sort requires O(n log_2 n) comparisons.<br />

Quick sort is another sorting algorithm that breaks an initial list into many<br />

sublists, but using a different heuristic than merge sort. If L = a_1, a_2, …, a_n with<br />

distinct elements, then quick sort recursively constructs two lists: L_1 for all a_i <<br />

a_1 and L_2 for all a_i > a_1, with a_1 appended to the end of L_1. This continues<br />

recursively until each sublist has only one element. Then the sublists are<br />

recombined in order to get a sorted list.<br />

Note: On average, the number of comparisons is O(n log_2 n) for n elements, but<br />

can be O(n^2) in the worst case. Quick sort is one of the most popular sorting<br />

algorithms used in academia.<br />

Exercise: Google “quick sort, C++” to see many implementations or look in<br />

many of the 200+ C++ primers. Defining quick sort is in Rosen’s exercises.<br />
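Quick sort as described, pivoting on the first element; the sketch assumes distinct elements, as the text does:<br />

```python
# Quick sort: partition around the first element, then recurse.
def quick_sort(L):
    if len(L) <= 1:
        return L
    pivot, rest = L[0], L[1:]
    L1 = [a for a in rest if a < pivot]
    L2 = [a for a in rest if a > pivot]
    return quick_sort(L1) + [pivot] + quick_sort(L2)

assert quick_sort([3, 1, 4, 5, 9, 2, 6]) == [1, 2, 3, 4, 5, 6, 9]
```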

83<br />

Counting, Permutations, and Combinations<br />

Product Rule Principle: Suppose a procedure can be broken down into a<br />

sequence of k tasks. If there are n_i, 1 ≤ i ≤ k, ways to do the i-th task, then there are<br />

Π_{i=1}^{k} n_i ways to do the procedure.<br />

Sum Rule Principle: Suppose a procedure can be carried out by doing exactly one<br />

of k alternative tasks. If there are n_i, 1 ≤ i ≤ k, ways to do the i-th task, with each way unique,<br />

then there are Σ_{i=1}^{k} n_i ways to do the procedure.<br />

Exclusion (Inclusion) Principle: If the sum rule cannot be applied because the<br />

ways are not unique, we use the sum rule and subtract the number of duplicate<br />

ways.<br />

Note: Mapping the individual ways onto a rooted tree and counting the leaves is<br />

another method for summing. The trees are not unique, however.<br />

84


Examples:<br />

• Consider 3 students in a classroom with 10 seats. There are 10·9·8 = 720<br />

ways to assign the students to the seats.<br />

• We want to appoint 1 person to fill out many, many forms that the<br />

administration wants filled in by today. There are 3 students and 2 faculty<br />

members who can fill out the forms. There are 3+2 = 5 ways to choose 1<br />

person. (Duck fast.)<br />

• How many variables are legal in the original Dartmouth BASIC computer<br />

language? Variables are 1 or 2 alphanumeric characters long, begin with A-<br />

Z, case independent, and are not one of the 5 two character reserved words<br />

in BASIC. We use a combination of the three counting principles:<br />

o 1 character variables: V_1 = 26<br />

o 2 character variables: V_2 = 26·36 - 5 = 931<br />

o Total: V = V_1 + V_2 = 957<br />

85<br />

Pigeonhole Principle: If there are k ∈ N boxes and at least k+1 objects placed in<br />

the boxes, then there is at least one box with more than one object in it.<br />

Theorem: If f: D→E is a function with |D| > k and |E| = k, then f is not 1-1.<br />

The proof is by the pigeonhole principle.<br />

Theorem (Generalized Pigeonhole Principle): If N objects are placed in k boxes,<br />

then at least one box contains at least ⌈N/k⌉ objects.<br />

Proof: First recall that ⌈N/k⌉ < (N/k)+1. Now suppose that none of the boxes<br />

contains more than ⌈N/k⌉ - 1 objects. Hence, the total number of objects has to<br />

be at most<br />

k(⌈N/k⌉ - 1) < k(((N/k)+1)-1) = N,<br />

a contradiction. Hence, the theorem must be true (proof by contradiction).<br />

Theorem: Every sequence of n^2+1 distinct real numbers contains a subsequence<br />

of length n+1 that is either strictly increasing or strictly decreasing.<br />

86


Examples: From a standard 52 card playing deck.<br />

• How many cards must be dealt to guarantee that k = 4 cards from the same<br />

suit are dealt?<br />

o The GPP Theorem, with the 4 suits as boxes, requires ⌈N/4⌉ ≥ 4, giving N = 13.<br />

o This is the real minimum, since 12 cards could contain exactly 3 from each suit.<br />

• How many cards must be dealt to guarantee that 4 clubs are dealt?<br />

o The GPP Theorem does not apply.<br />

o The product rule and inclusion principles apply: 3·13+4 = 43 since all<br />

of the hearts, spades, and diamonds could be dealt before any clubs.<br />

Definition: A permutation of a set of distinct objects is an ordered arrangement of<br />

these objects. An r-permutation is an ordered arrangement of r of these objects.<br />

Example: Given S = {0,1,2}, then {2,1,0} is a permutation and {0,2} is a 2-<br />

permutation of S.<br />

87<br />

Theorem: If n,r ∈ N, then there are P(n,r) = n·(n-1)·(n-2)·…·(n-r+1) = n!/(n-r)!<br />

r-permutations of a set of n distinct elements. Further, P(n,0) = 1.<br />

The proof is by the product rule for 1 ≤ r ≤ n.<br />


Theorem: The number of r-combinations of a set with n elements, with n,r ∈ N_0, is<br />

C(n,r) = n! / (r!(n-r)!).<br />

Proof: The r-permutations can be formed by choosing one of the C(n,r) r-combinations<br />

and then ordering each r-combination, which can be done in P(r,r) ways. So,<br />

P(n,r) = C(n,r)·P(r,r)<br />

or<br />

C(n,r) = P(n,r) / P(r,r) = (n!/(n-r)!) / (r!/(r-r)!) = n! / (r!(n-r)!).<br />

Theorem: C(n,r) = C(n,n-r) for 0 ≤ r ≤ n.<br />

Definition: A combinatorial proof of an identity is a proof that uses counting<br />

arguments to prove that both sides of the identity count the same objects, but in<br />

different ways.<br />

89<br />

Binomial Theorem: Let x and y be variables. Then for n ∈ N,<br />

(x+y)^n = Σ_{j=0}^{n} C(n,j) x^{n-j} y^j.<br />

Proof: Expanding the product, all terms are of the form x^{n-j} y^j for<br />

j=0,1,…,n. To count the number of terms for x^{n-j} y^j, note that we have to choose<br />

n-j x’s from the n sums so that the other j terms in the product are y’s. Hence,<br />

the coefficient for x^{n-j} y^j is C(n,n-j) = C(n,j).<br />

Example: What is the coefficient of x^12 y^13 in (x+y)^25? C(25,13) = 5,200,300.<br />

Corollary: Let n ∈ N_0. Then Σ_{k=0}^{n} C(n,k) = 2^n.<br />

Proof: 2^n = (1+1)^n = Σ_{k=0}^{n} C(n,k) 1^k 1^{n-k} = Σ_{k=0}^{n} C(n,k).<br />

90


Corollary: Let n ∈ N_0. Then Σ_{k=0}^{n} (-1)^k C(n,k) = 0.<br />

Proof: 0 = 0^n = ((-1)+1)^n = Σ_{k=0}^{n} C(n,k) (-1)^k 1^{n-k} = Σ_{k=0}^{n} (-1)^k C(n,k).<br />

Corollary: C(n,0) + C(n,2) + C(n,4) + … = C(n,1) + C(n,3) + C(n,5) + …<br />

Corollary: Let n ∈ N_0. Then Σ_{k=0}^{n} 2^k C(n,k) = 3^n.<br />

Theorem (Pascal’s Identity): Let n,k ∈ N with n ≥ k. Then C(n+1,k) = C(n,k-1) + C(n,k).<br />
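Pascal's identity is exactly the rule for building one row of Pascal's triangle from the previous one; a sketch (math.comb is the library routine for C(n,k)):<br />

```python
from math import comb

# Build row n of Pascal's triangle by repeatedly applying
# C(n+1, k) = C(n, k-1) + C(n, k).
def pascal_row(n):
    row = [1]
    for _ in range(n):
        row = [1] + [row[k - 1] + row[k] for k in range(1, len(row))] + [1]
    return row

assert pascal_row(4) == [1, 4, 6, 4, 1]
assert pascal_row(25)[13] == comb(25, 13) == 5200300
```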


If we allow repetitions in the permutations, then all of the previous theorems and<br />

corollaries no longer apply. We have to start over.<br />

Theorem: The number of r-permutations of a set with n objects and repetition is<br />

n^r.<br />

Proof: There are n ways to select an element of the set for each of the r positions<br />

in the r-permutation. Using the product principle completes the proof.<br />

Theorem: There are C(n+r-1,r) = C(n+r-1,n-1) r-combinations from a set with n<br />

elements when repetition is allowed.<br />

Example: How many solutions are there to x_1+x_2+x_3 = 9 for x_i ∈ N_0? C(3+9-1,9) =<br />

C(11,9) = C(11,2) = 55. Only when constraints are placed on the x_i can we<br />

possibly find a unique solution.<br />

Definition: The multinomial coefficient is C(n; n_1, n_2, …, n_k) = n! / (n_1! n_2! … n_k!).<br />

93<br />

Theorem: The number of different permutations of n objects, where there are n_i,<br />

1 ≤ i ≤ k, indistinguishable objects of type i, is C(n; n_1, n_2, …, n_k).<br />

Theorem: The number of ways to distribute n distinguishable objects in k<br />

distinguishable boxes so that n_i objects are placed into box i, 1 ≤ i ≤ k, is<br />

C(n; n_1, n_2, …, n_k).<br />

Theorem: The number of ways to distribute n distinguishable objects into k<br />

indistinguishable boxes is<br />

Σ_{j=1}^{k} (1/j!) Σ_{i=0}^{j-1} (-1)^i C(j,i) (j-i)^n.<br />

Multinomial Theorem: If n ∈ N, then<br />

(x_1 + x_2 + … + x_k)^n = Σ_{n_1+n_2+…+n_k=n} C(n; n_1, n_2, …, n_k) x_1^{n_1} x_2^{n_2} … x_k^{n_k}.<br />

94


Generating permutations and combinations is useful and sometimes important.<br />

Note: We can place any n-set into a 1-1 correspondence with the first n natural<br />

numbers. All permutations can be listed using {1, 2, …, n} instead of the actual<br />

set elements. There are n! possible permutations.<br />

Definition: In the lexicographic (or dictionary) ordering, the permutation<br />

a_1 a_2 … a_n of {1,2,…,n} precedes b_1 b_2 … b_n if and only if a_k < b_k at the first<br />

position k where the two permutations differ.<br />

Examples:<br />

• 5 elements. The permutation 21435 precedes 21543.<br />

• Given 362541, then 364125 is the next permutation lexicographically.<br />

95<br />

Algorithm: Generate the next permutation in lexicographic order.<br />

procedure next_perm(a_1 a_2 … a_n: a_i ∈ {1,2,…,n} and distinct)<br />

j := n – 1<br />

while a_j > a_{j+1}<br />

j := j – 1<br />

{j is the largest subscript with a_j < a_{j+1}}<br />

k := n<br />

while a_j > a_k<br />

k := k – 1<br />

{a_k is the smallest integer greater than a_j to the right of a_j}<br />

Swap a_j and a_k<br />

r := n, s := j+1<br />

while r > s<br />

Swap a_r and a_s<br />

r := r – 1, s := s + 1<br />

{This puts the tail end of the permutation after the j-th position in<br />

increasing order}<br />
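The same procedure on 0-indexed Python lists (a sketch):<br />

```python
# Next permutation in lexicographic order, following the procedure above.
def next_perm(a):
    a = list(a)
    j = len(a) - 2
    while a[j] > a[j + 1]:           # largest j with a[j] < a[j+1]
        j -= 1
    k = len(a) - 1
    while a[j] > a[k]:               # smallest entry right of j above a[j]
        k -= 1
    a[j], a[k] = a[k], a[j]
    a[j + 1:] = reversed(a[j + 1:])  # tail is descending; reverse it
    return a

assert next_perm([3, 6, 2, 5, 4, 1]) == [3, 6, 4, 1, 2, 5]
```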

96


Algorithm: Generating the next r-combination in lexicographic order.<br />

procedure next_r_combination({a_1, a_2, …, a_r}: proper subset of {1,2,…,n}<br />

with a_1 < a_2 < … < a_r)<br />

i := r<br />

while a_i = n-r+i<br />

i := i – 1<br />

a_i := a_i + 1<br />

for j := i+1 to r<br />

a_j := a_i + j - i<br />

Example: Let S = {1, 2, …, 6}. Given the 4-combination {1, 2, 5, 6}, the next 4-<br />

combination is {1, 3, 4, 5}.<br />
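And next_r_combination on a 0-indexed Python list of 1-based values (a sketch):<br />

```python
# Next r-combination in lexicographic order; a is sorted ascending and
# n is the size of the underlying set {1, ..., n}.
def next_r_combination(a, n):
    a = list(a)
    r = len(a)
    i = r - 1
    while a[i] == n - r + i + 1:    # a_i already at its maximum value
        i -= 1
    a[i] += 1
    for j in range(i + 1, r):
        a[j] = a[i] + j - i         # a_j := a_i + (j - i)
    return a

assert next_r_combination([1, 2, 5, 6], 6) == [1, 3, 4, 5]
```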

97<br />

<strong>Discrete</strong> Probability<br />

Definition: An experiment is a procedure that yields one of a given set of<br />

possible outcomes.<br />

Definition: The sample space of the experiment is the set of (all) possible<br />

outcomes.<br />

Definition: An event is a subset of the sample space.<br />

First Assumption: We begin by only considering finitely many possible<br />

outcomes.<br />

Definition: If S is a finite sample space of equally likely outcomes and E ⊆ S is<br />

an event, then the probability of E is p(E) = |E| / |S|.<br />

98


Examples:<br />

• I randomly chose an exam1 to grade. What is the probability that it is one of<br />

the Davids? Thirty-one students took exam1, of which five were Davids. So,<br />

p(David) = 5 / 31 ≈ 0.16.<br />

• Suppose you are allowed to choose 6 numbers from the first 50 natural<br />

numbers. The probability of picking the correct 6 numbers in a lottery<br />

drawing is 1/C(50,6) = (44!·6!) / 50! ≈ 6.3·10^{-8}. This lottery is just a<br />

regressive tax designed for suckers and starry eyed dreamers.<br />

Definition: When sampling, there are two possible methods: with and without<br />

replacement. In the former, the full sample space is always available. In the<br />

latter, the sample space shrinks with each sampling.<br />

99<br />

Example: Let S = {1, 2, …, 50}. What is the probability of sampling {1, 14, 23,<br />

32, 49}?<br />

• Without replacement: p({1,14,23,32,49}) = 1 / (50·49·48·47·46) ≈<br />

3.93·10^{-9}.<br />

• With replacement: p({1,14,23,32,49}) = 1 / (50·50·50·50·50) = 3.20·10^{-9}.<br />

Definition: If E is an event, then Ē = S − E is the complementary event.<br />
Theorem: p(Ē) = 1 − p(E) for a sample space S.<br />
Proof: p(Ē) = (|S| − |E|) / |S| = 1 − |E| / |S| = 1 − p(E).<br />

Example: Suppose we generate n random bits. What is the probability that one of the bits is 0? Let E be the event that a bit string has at least one 0 bit. Then Ē is the event that all n bits are 1. p(E) = 1 − p(Ē) = 1 − 2^-n = (2^n − 1) / 2^n.<br />
Note: Proving the example directly for p(E) is much more work.<br />

100


Theorem: Let E and F be events in a sample space S. Then<br />
p(E∪F) = p(E) + p(F) − p(E∩F).<br />
Proof: Recall that |E∪F| = |E| + |F| − |E∩F|. Hence,<br />
p(E∪F) = |E∪F| / |S| = (|E| + |F| − |E∩F|) / |S| = p(E) + p(F) − p(E∩F).<br />

Example: What is the probability that an element of the set {1, 2, …, 100} is divisible by 2 or 3? Let E and F represent the elements divisible by 2 and 3, respectively. Then |E| = 50, |F| = 33, and |E∩F| = 16 (the elements divisible by 6). Hence, p(E∪F) = (50 + 33 − 16) / 100 = 0.67.<br />
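The counts in this inclusion–exclusion example can be verified by brute force over {1, …, 100}; a quick illustrative check in Python:

```python
S = range(1, 101)
E = {x for x in S if x % 2 == 0}     # divisible by 2
F = {x for x in S if x % 3 == 0}     # divisible by 3

# Direct count of the union vs. inclusion-exclusion.
p_union = len(E | F) / len(S)
p_incl_excl = (len(E) + len(F) - len(E & F)) / len(S)

print(len(E), len(F), len(E & F), p_union)
```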

101<br />

Second Assumption: Now suppose that the probability of an event is not 1 / |S|. In this case we must assign probabilities for each possible event, either by setting a specific value or defining a function.<br />

Definition: For a sample space S with a finite or countable number of outcomes, we assign a probability p(s) to each outcome s ∈ S such that<br />
(1) 0 ≤ p(s) ≤ 1 for all s ∈ S, and<br />
(2) Σ_{s∈S} p(s) = 1.<br />

Notes:<br />

1. When |S| = n, the formulas (1) and (2) can be rewritten using n.<br />

2. When |S| = ∞ and S is uncountable, integral calculus is required for (2).<br />
3. When |S| = ∞ and S is countable, the sum in (2) is true in the limit.<br />

102


Example: Coin flipping with events H and T.<br />

• S = {H, T} for a fair coin. Hence, p(H) = p(T) = 0.5.<br />

• S = {H, H, T} for a weighted coin. Then p(H) = 2/3 ≈ 0.67 and p(T) = 1/3 ≈ 0.33.<br />

Definition: Suppose that S is a set with n elements. The uniform distribution<br />

assigns the probability 1/n to each element in S.<br />

Definition: The probability of the event E is the sum of the probabilities of the outcomes in E, i.e., p(E) = Σ_{s∈E} p(s).<br />

Note: When |E| = ∞, the sum Σ_{s∈E} p(s) must be convergent in the limit.<br />

Definition: The experiment of selecting an element from a sample space S with a uniform distribution is known as selecting an element from S at random.<br />
We can prove that (1) p(Ē) = 1 − p(E) and (2) p(E∪F) = p(E) + p(F) − p(E∩F) using the more general probability definitions.<br />

103<br />

Definition: Let E and F be events with p(F) > 0. The conditional probability of E given F is defined by p(E|F) = p(E∩F) / p(F).<br />
Example: A bit string of length 3 is generated at random. What is the probability that there are two 0 bits in a row given that the first bit is 0? Let F be the event that the first bit is 0. Let E be the event that there are two 0 bits in a row. Note that E∩F = {000, 001} and p(F) = 0.5. Hence, p(E|F) = 0.25 / 0.5 = 0.5.<br />
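Since the sample space has only 8 outcomes, the conditional probability can be checked by enumeration (an illustrative Python sketch):

```python
from itertools import product

strings = [''.join(bits) for bits in product('01', repeat=3)]
F = [s for s in strings if s[0] == '0']     # first bit is 0
E_and_F = [s for s in F if '00' in s]       # ... and two 0's in a row

p_F = len(F) / len(strings)                 # 4/8 = 0.5
p_E_and_F = len(E_and_F) / len(strings)     # {000, 001} -> 0.25
print(p_E_and_F / p_F)                      # p(E|F)
```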

Definition: The events E and F are independent if p(E∩F) = p(E)p(F).<br />
Note: When p(F) > 0, independence is equivalent to having p(E|F) = p(E).<br />

Example: Suppose E is the event that a bit string begins with a 1 and F is the event that there is a positive, even number of 1’s. Suppose the bit strings are of length 3. There are 4 bit strings beginning with 1: {100, 101, 110, 111}. There are 3 strings with a positive, even number of 1’s: {011, 101, 110}. Hence, p(E) = 0.5 and p(F) = 0.375. E∩F = {101, 110}, so p(E∩F) = 0.25. Thus, p(E∩F) ≠ p(E)p(F). Hence, E and F are not independent.<br />

104


Note: For bit strings of length 4, with F the event that the number of 1’s is even (zero 1’s counted as even), 0.25 = p(E∩F) = (0.5)·(0.5) = p(E)p(F), so the events are independent. We can speculate on whether the exact definition of F or the even/odd length of the bit strings plays a part in the independence characteristic.<br />

Definition: Each performance of an experiment with exactly two outcomes, denoted success (S) and failure (F), is a Bernoulli trial.<br />
Definition: The binomial distribution is denoted b(k; n,p) = C(n,k)·p^k·q^(n-k), where q = 1 − p.<br />
Theorem: The probability of exactly k successes in n independent Bernoulli trials, with probability of success p and of failure q = 1 − p, is b(k; n,p).<br />
Proof: When n Bernoulli trials are carried out, the outcome is an n-tuple (t_1, t_2, …, t_n) with each t_i ∈ {S, F}. Due to the trials’ independence, the probability of each outcome having k successes and n−k failures is p^k·q^(n-k). There are C(n,k) possible tuples that contain exactly k successes and n−k failures.<br />

105<br />

Example: Suppose we generate bit strings of length 10 such that p(0) = 0.7 and p(1) = 0.3 and the bits are generated independently. Then<br />
• b(8; 10,0.7) = C(10,8)·(0.7)^8·(0.3)^2 = 45·0.05764801·0.09 ≈ 0.2335<br />
• b(7; 10,0.7) = C(10,7)·(0.7)^7·(0.3)^3 = 120·0.0823543·0.027 ≈ 0.2668<br />

Theorem: Σ_{k=0}^{n} b(k; n,p) = 1.<br />
Proof: Σ_{k=0}^{n} b(k; n,p) = Σ_{k=0}^{n} C(n,k)·p^k·q^(n-k) = (p+q)^n = 1^n = 1 by the binomial theorem.<br />
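A numerical sanity check of this theorem, and of the individual binomial probabilities, sketched in Python; the helper name b mirrors the notation above.

```python
from math import comb

def b(k, n, p):
    """b(k; n,p) = C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.7
total = sum(b(k, n, p) for k in range(n + 1))   # should be 1
print(total, round(b(8, n, p), 4), round(b(7, n, p), 4))
```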

Definition: A random variable is a function from the sample space of an experiment to the set of reals.<br />
Notes:<br />
• A random variable assigns a real number to each possible outcome.<br />
• Despite its name, a random variable is neither random nor a variable.<br />

106


Example: Flip a fair coin twice. Let X(t) be the random variable that equals the number of tails that appear when t is the outcome. Then<br />
X(HH) = 0, X(HT) = X(TH) = 1, and X(TT) = 2.<br />

Definition: The distribution of a random variable X on a sample space is the set of pairs (r, p(X=r)) for all r ∈ X(S), where p(X=r) is the probability that X takes the value r.<br />
Note: A distribution is usually described by specifying p(X=r) for all r ∈ X(S).<br />

Example: For our coin flip example above, each outcome has probability 0.25.<br />

Hence,<br />

p(X=0) = 0.25, p(X=1) = 0.5, and p(X=2) = 0.25.<br />
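The distribution can be tallied mechanically from the four equally likely outcomes (an illustrative Python sketch):

```python
from itertools import product
from collections import Counter

outcomes = list(product('HT', repeat=2))            # HH, HT, TH, TT
tails = Counter(flips.count('T') for flips in outcomes)
dist = {r: c / len(outcomes) for r, c in tails.items()}
print(dist)
```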

107<br />

Definition: The expected value (or expectation) of the random variable X(s) in the sample space S is E(X) = Σ_{s∈S} p(s)X(s).<br />
Note: If S = {x_i}_{i=1}^{n}, then E(X) = Σ_{i=1}^{n} p(x_i)X(x_i).<br />

Example: Roll a die. Let the random variable X take the values 1, 2, …, 6 with probability 1/6 each. Then E(X) = Σ_{i=1}^{6} i·(1/6) = 21/6 = 3.5. This is not really what you would like to see, since the die does not have a 3.5 face.<br />

Theorem: If X is a random variable and p(X=r) is the probability that X = r, so that p(X=r) = Σ_{s∈S, X(s)=r} p(s), then E(X) = Σ_{r∈X(S)} p(X=r)·r.<br />
Proof: Suppose X is a random variable with range X(S) and let p(X=r) be the probability that X takes the value r. Then p(X=r) is the sum of the probabilities of the outcomes s such that X(s) = r. Finally, E(X) = Σ_{r∈X(S)} p(X=r)·r.<br />

108


Theorem: If X_i, 1 ≤ i ≤ n, are random variables on S and if a, b ∈ R, then<br />
1. E(X_1 + X_2 + … + X_n) = E(X_1) + E(X_2) + … + E(X_n)<br />
2. E(aX_i + b) = aE(X_i) + b<br />
Proof: Use mathematical induction (the base case is n = 2) for 1 and the definitions for 2.<br />
Note: The linearity of E is extremely convenient and useful.<br />
Theorem: The expected number of successes when n Bernoulli trials are performed, where p is the probability of success on each trial, is np.<br />
Proof: Apply 1 from the previous theorem.<br />

109<br />

Notes:<br />

• The average case complexity of an algorithm can be interpreted as the expected value of a random variable. Let S = {a_i}, where each possible input is an a_i. Let X be the random variable such that X(a_i) = b_i, the number of operations for the algorithm with input a_i. We assign to each input a_i a probability p(a_i). Then the average case complexity is E(X) = Σ_{a_i∈S} p(a_i)X(a_i).<br />

• Estimating the average complexity of an algorithm tends to be quite difficult to do directly. Even if the best and worst cases can be estimated easily, there is no guarantee that the average case can be estimated without a great deal of work. Frankly, the average case is sometimes too difficult to estimate. Using the expected value of a random variable sometimes simplifies the process enough to make it doable.<br />

110


Example of linear search average complexity: See page 44 in the class notes for the algorithm and worst case complexity bound. We want to find x in a distinct set {a_i}_{i=1}^{n}. If x = a_i, then there are 2i+1 comparisons. If x ∉ {a_i}_{i=1}^{n}, then there are 2n+2 comparisons. There are n+1 input types: x = a_i for some i, or x ∉ {a_i}_{i=1}^{n}. Let p be the probability that x ∈ {a_i}_{i=1}^{n} and let q = 1 − p. Clearly, p(a_i) = p/n. So,<br />
E = (p/n)·Σ_{i=1}^{n} (2i+1) + (2n+2)q<br />
= (p/n)(n^2 + 2n) + (2n+2)q<br />
= p(n+2) + (2n+2)q.<br />
There are three cases of interest, namely,<br />
• p = 1, q = 0: E = n + 2<br />
• p = q = 0.5: E = (3n + 4) / 2<br />
• p = 0, q = 1: E = 2n + 2<br />
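The case analysis can be confirmed by evaluating the expectation directly; the sketch below assumes, as in the example, 2i+1 comparisons for a hit at position i and 2n+2 for a miss.

```python
from math import isclose

def expected_comparisons(n, p):
    """Average comparisons for linear search over n distinct items;
    p is the probability the target is present, uniform over positions."""
    q = 1 - p
    hit = (p / n) * sum(2 * i + 1 for i in range(1, n + 1))  # found at position i
    miss = (2 * n + 2) * q                                   # not in the list
    return hit + miss

n = 10
print(expected_comparisons(n, 1.0), expected_comparisons(n, 0.5))
assert isclose(expected_comparisons(n, 1.0), n + 2)
assert isclose(expected_comparisons(n, 0.5), (3 * n + 4) / 2)
assert isclose(expected_comparisons(n, 0.0), 2 * n + 2)
```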

111<br />

Definition: A random variable X has a geometric distribution with parameter p if p(X=k) = (1−p)^(k-1)·p for k = 1, 2, …<br />
Note: Geometric distributions occur in studies about the time required before an event happens (e.g., time to finding a particular item or a defective item, etc.).<br />
Theorem: If the random variable X has a geometric distribution with parameter p, then E(X) = 1/p.<br />

Proof:<br />
E(X) = Σ_{i=1}^{∞} i·p(X=i)<br />
= Σ_{i=1}^{∞} i(1−p)^(i-1)·p<br />
= p·Σ_{i=1}^{∞} i(1−p)^(i-1)<br />
= p·p^(-2) (since Σ_{i=1}^{∞} i·x^(i-1) = 1/(1−x)^2 with x = 1−p)<br />
= 1/p<br />
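The series converges quickly, so the claim E(X) = 1/p can be checked with a truncated sum (the value p = 0.25 below is just an illustrative choice):

```python
p = 0.25
# Truncate E(X) = sum_{i>=1} i (1-p)^(i-1) p; the neglected tail is tiny.
E = sum(i * (1 - p)**(i - 1) * p for i in range(1, 2000))
print(E)   # very close to 1/p = 4
```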

112


Definition: The random variables X and Y on a sample space are independent if p(X(s)=r_1 and Y(s)=r_2) = p(X(s)=r_1)·p(Y(s)=r_2).<br />

Theorem: If X and Y are independent random variables on a space S, then E(XY) = E(X)E(Y).<br />
Proof: From the definition of expected value and since X and Y are independent random variables,<br />
E(XY) = Σ_{s∈S} X(s)Y(s)p(s)<br />
= Σ_{r∈X(S), t∈Y(S)} r·t·p(X(s)=r and Y(s)=t)<br />
= Σ_{r∈X(S), t∈Y(S)} r·t·p(X(s)=r)·p(Y(s)=t)<br />
= (Σ_{r∈X(S)} r·p(X(s)=r))·(Σ_{t∈Y(S)} t·p(Y(s)=t))<br />
= E(X)E(Y).<br />
113<br />

Third Assumption: Not all problems can be solved using deterministic algorithms. We want to assess the probability of an event based on partial evidence.<br />
Note: Some algorithms need to make random choices and may produce an answer that is wrong, together with a probability of correctness or an error estimate. Monte Carlo algorithms are examples of probabilistic algorithms.<br />

Example: Consider a city with a lattice of streets. A drunk walks home from a bar. At each intersection, the drunk must choose between continuing or turning left or right. Hopefully, the drunk gets home eventually. However, there is no absolute guarantee.<br />

114


Example: You receive n items. Sometimes all n items are guaranteed to be good.<br />

However, not all shipments have been checked. The probability that an item is<br />

bad in an unchecked batch is 0.1. We want to determine whether or not a<br />

shipment has been checked, but are not willing to check all items. So we test items at random until either we find a bad item or the probability that the shipment is unchecked falls to 0.001. How many items do we need to check? The probability that an item from an unchecked batch is good is 1 − 0.1 = 0.9. Hence, after the k-th check without finding a bad item, the probability that the items come from an unchecked shipment is (0.9)^k. Since (0.9)^66 ≈ 0.001, we need to check only 66 items per shipment.<br />
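The 66 comes from the smallest k with (0.9)^k ≤ 0.001, which a short loop can confirm:

```python
threshold = 0.001
k, prob = 0, 1.0
while prob > threshold:   # prob = (0.9)^k after k good tests
    k += 1
    prob *= 0.9
print(k, prob)
```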

Theorem: If the probability that an element of a set S has a particular property is in (0,1), then there exists an element in S with this property.<br />

115<br />

Bayes Theorem: Suppose that E and F are events from a sample space S such that p(E) ≠ 0 and p(F) ≠ 0. Then<br />
p(F|E) = p(E|F)p(F) / (p(E|F)p(F) + p(E|F̄)p(F̄)).<br />

Generalized Bayes Theorem: Suppose that E is an event from a sample space S and that F_1, F_2, …, F_n are mutually exclusive events such that ∪_{i=1}^{n} F_i = S. Assume that p(E) ≠ 0 and p(F_i) ≠ 0 for 1 ≤ i ≤ n. Then<br />
p(F_j|E) = p(E|F_j)p(F_j) / Σ_{i=1}^{n} p(E|F_i)p(F_i).<br />

116


Example: We have 2 boxes. The first box contains 2 green and 7 red balls. The<br />

second box contains 4 green and 3 red balls. We select a box at random, then a<br />

ball at random. If we picked a red ball, what is the probability that it came from<br />

the first box?<br />

• Let E be the event that we chose a red ball. Thus, Ē is the event that we chose a green ball. Let F be the event that we chose a ball from the first box. Thus, F̄ is the event that we chose a ball from the second box. p(F) = p(F̄) = 0.5 since we pick a box at random.<br />
• We want to calculate p(F|E) = p(E∩F) / p(E), which we will do in stages.<br />
• p(E|F) = 7/9 since there are 7 red balls out of 9 total in box 1. p(E|F̄) = 3/7 since there are 3 red balls out of a total of 7 in box 2.<br />
• p(E∩F) = p(E|F)p(F) = 7/18 ≈ 0.389 and p(E∩F̄) = p(E|F̄)p(F̄) = 3/14.<br />
• We need to find p(E). We do this by observing that E = (E∩F)∪(E∩F̄), where E∩F and E∩F̄ are disjoint sets. So, p(E) = p(E∩F) + p(E∩F̄) ≈ 0.603.<br />
• p(F|E) = p(E∩F) / p(E) ≈ 0.389 / 0.603 ≈ 0.645, which is greater than the prior probability p(F) = 0.5 from the first bullet above. We have improved our estimate!<br />
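Exact arithmetic makes the staged Bayes computation easy to verify; an illustrative sketch with Python's fractions module:

```python
from fractions import Fraction

p_F = Fraction(1, 2)             # box 1 chosen
p_E_given_F = Fraction(7, 9)     # red | box 1 (7 red of 9)
p_E_given_notF = Fraction(3, 7)  # red | box 2 (3 red of 7)

p_E = p_E_given_F * p_F + p_E_given_notF * (1 - p_F)
p_F_given_E = p_E_given_F * p_F / p_E
print(p_F_given_E, float(p_F_given_E))
```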

117<br />

Example: Suppose one person in 100,000 has a particular rare disease and that<br />

there is an accurate diagnostic test for this disease. The test is 99% accurate when<br />

given to someone with the disease and is 99.5% accurate when given to someone<br />

who does not have the disease. We can calculate<br />

(a) the probability that someone who tests positive has the disease, and<br />

(b) the probability that someone who tests negative does not have the disease.<br />

Let F be the event that a person has the disease and let E be the event that this person tests positive. We will use Bayes theorem to calculate (a) and (b), so we have to calculate p(F), p(F̄), p(E|F), and p(E|F̄).<br />
• p(F) = 1 / 100000 = 10^-5 and p(F̄) = 1 − p(F) = 0.99999.<br />
• p(E|F) = 0.99 since someone who has the disease tests positive 99% of the time. Similarly, the false negative probability is p(Ē|F) = 0.01. Further, p(Ē|F̄) = 0.995 since the test is 99.5% accurate for someone who does not have the disease.<br />
• p(E|F̄) = 0.005, which is the probability of a false positive (100% − 99.5%).<br />

118


Now we calculate (a):<br />
p(F|E) = p(E|F)p(F) / (p(E|F)p(F) + p(E|F̄)p(F̄)) = (0.99·10^-5) / (0.99·10^-5 + 0.005·0.99999) ≈ 0.002.<br />
Roughly 0.2% of people who test positive actually have the disease. Getting a positive should not be an immediate cause for alarm (famous last words).<br />
Now we calculate (b):<br />
p(F̄|Ē) = p(Ē|F̄)p(F̄) / (p(Ē|F̄)p(F̄) + p(Ē|F)p(F)) = (0.995·0.99999) / (0.995·0.99999 + 0.01·10^-5) ≈ 0.9999999.<br />
Thus, 99.99999% of people who test negative really do not have the disease.<br />
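Both answers follow from plugging the four conditional probabilities into Bayes theorem; the variable names below are illustrative only.

```python
p_disease = 1e-5                      # p(F): one person in 100,000
p_pos_given_disease = 0.99            # p(E|F): sensitivity
p_neg_given_healthy = 0.995           # accuracy on the healthy

p_healthy = 1 - p_disease
p_pos_given_healthy = 1 - p_neg_given_healthy   # false positive rate
p_neg_given_disease = 1 - p_pos_given_disease   # false negative rate

# (a) p(disease | positive) and (b) p(healthy | negative)
a = (p_pos_given_disease * p_disease) / (
    p_pos_given_disease * p_disease + p_pos_given_healthy * p_healthy)
b = (p_neg_given_healthy * p_healthy) / (
    p_neg_given_healthy * p_healthy + p_neg_given_disease * p_disease)
print(round(a, 3), b)
```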

119<br />

Bayesian Spam Filters used to be the first line of defense for email programs. Like many good things, the spammers ran right over the process in about two years. However, it is an interesting example of useful discrete mathematics.<br />

The filtering involves a training period. Email messages need to be marked as<br />

Good or Bad messages, which we will denote as being the G or B sets.<br />

Eventually the filter will mark messages for you, hopefully accurately.<br />

The filter finds all of the words in both sets and keeps a running total of each word per set. We construct two functions n_G(w) and n_B(w) that return the number of messages containing the word w in the G and B sets, respectively.<br />
We use a uniform distribution. The empirical probability that a spam message contains the word w is p(w) = n_B(w) / |B|. The empirical probability that a non-spam message contains the word w is q(w) = n_G(w) / |G|.<br />
We can use p and q to estimate whether an incoming message is or is not spam based on a set of words that we build dynamically over time.<br />

120


Let E be the event that an incoming message contains the word w. Let S be the event that an incoming message is spam. Bayes theorem tells us that the probability that an incoming message containing the word w is spam is<br />
p(S|E) = p(E|S)p(S) / (p(E|S)p(S) + p(E|S̄)p(S̄)).<br />
If we assume that p(S) = p(S̄) = 0.5, i.e., that any incoming message is equally likely to be spam or not, then we get the simplified formula<br />
p(S|E) = p(E|S) / (p(E|S) + p(E|S̄)).<br />
We estimate p(E|S) = p(w) and p(E|S̄) = q(w). So, we estimate p(S|E) by<br />
r(w) = p(w) / (p(w) + q(w)).<br />
If r(w) is greater than some preset threshold, then we classify the incoming message as spam. We can consider a threshold of 0.9 to begin with.<br />

121<br />

Example: Let w = Rolex. Suppose it occurs in 250 / 2000 spam messages and in<br />

5 / 1000 good messages. We will estimate the probability that an incoming<br />

message with Rolex in it is spam assuming that it is equally likely that the<br />

incoming message is spam or not. We know that p(Rolex) = 250 / 2000 = 0.125<br />

and q(Rolex) = 5 / 1000 = 0.005. So,<br />

r(Rolex) = 0.125 / (0.125 + 0.005) = 0.962 > 0.9.<br />

Hence, we would reject the message as spam. (Note that some of us would reject all messages with the word Rolex in them as spam, but that is another case entirely.)<br />
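The single-word estimator r(w) is one line of arithmetic; the helper name spam_score below is illustrative.

```python
def spam_score(n_bad, total_bad, n_good, total_good):
    """r(w) = p(w) / (p(w) + q(w)) from the empirical counts."""
    p = n_bad / total_bad       # p(w): fraction of spam containing w
    q = n_good / total_good     # q(w): fraction of good mail containing w
    return p / (p + q)

r = spam_score(250, 2000, 5, 1000)      # the Rolex counts above
print(round(r, 3))
```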

122


Using just one word to determine if a message is spam or not leads to excessive numbers of false positives and negatives. We actually have to use the generalized Bayes theorem with a large set of words w_1, …, w_k (assuming the events E_i, that the message contains w_i, are independent):<br />
p(S | ∩_{i=1}^{k} E_i) = Π_{i=1}^{k} p(E_i|S) / (Π_{i=1}^{k} p(E_i|S) + Π_{i=1}^{k} p(E_i|S̄)),<br />
which we estimate, assuming equal probability that an incoming message is spam or not, by<br />
r(w_1, w_2, …, w_k) = Π_{i=1}^{k} p(w_i) / (Π_{i=1}^{k} p(w_i) + Π_{i=1}^{k} q(w_i)).<br />
123<br />

Example: The word w_1 = stock appears in 400 / 2000 spam messages and in just 60 / 1000 good messages. The word w_2 = undervalued appears in 200 / 2000 spam messages and in just 25 / 1000 good messages. Estimate the likelihood that an incoming message with both words in it is spam. We know p(stock) = 0.2 and q(stock) = 0.06. Similarly, p(undervalued) = 0.1 and q(undervalued) = 0.025. So,<br />
r(stock, undervalued) = p(stock)p(undervalued) / (p(stock)p(undervalued) + q(stock)q(undervalued)) = (0.2·0.1) / (0.2·0.1 + 0.06·0.025) ≈ 0.930 > 0.9.<br />
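The multi-word estimator is the same arithmetic with products; a hypothetical Python sketch:

```python
from math import prod

def combined_spam_score(p_list, q_list):
    """r(w1,...,wk) = prod p(wi) / (prod p(wi) + prod q(wi))."""
    return prod(p_list) / (prod(p_list) + prod(q_list))

r = combined_spam_score([0.2, 0.1], [0.06, 0.025])   # stock, undervalued
print(round(r, 3))
```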

Note: Looking for particular pairs or triplets of words and treating each as a single entity is another method for filtering. For example, enhance performance probably indicates spam to almost anyone, but high performance computing probably does not indicate spam to someone in computational sciences (but probably will for someone working in, say, Maytag repair).<br />

124


Advanced Counting Principles<br />

Definition: A recurrence relation for the sequence {a_n} is an equation that expresses a_n in terms of one or more of the previous terms in the sequence. A sequence is called a solution to a recurrence relation if its terms satisfy the recurrence relation. The initial conditions specify the values of the sequence before the first term where the recurrence relation takes effect.<br />
Note: Recursion and recurrence relations have a connection. A recursive algorithm provides a solution to a problem of size n in terms of one or more instances of the same problem, but of smaller size. Complexity analysis of the recursive algorithm is a recurrence relation on the number of operations.<br />

Example: Suppose we have {a_n} with a_n = 3n, n ∈ N. Is this a solution for a_n = 2a_{n-1} − a_{n-2} for n ≥ 2? Yes: 2a_{n-1} − a_{n-2} = 2·3(n−1) − 3(n−2) = 3n = a_n.<br />


Fibonacci Example: A young pair of rabbits (1 male, 1 female) arrive on a deserted island. They can breed after they are two months old and produce another pair. Thereafter each pair at least two months old can breed once a month. How many pairs f_n of rabbits are there after n months?<br />
• n = 1: f_1 = 1 (initial condition)<br />
• n = 2: f_2 = 1 (initial condition)<br />
• n > 2: f_n = f_{n-1} + f_{n-2} (recurrence relation)<br />
The n > 2 formula is true since each new pair comes from a pair at least 2 months old.<br />

Example: For bit strings of length n, find the recurrence relation and initial conditions for the number of bit strings that do not have two consecutive 0’s.<br />
• n = 1: a_1 = 2 (initial condition: {0,1})<br />
• n = 2: a_2 = 3 (initial condition: {01,10,11})<br />
• n > 2: a_n = a_{n-1} + a_{n-2} (recurrence relation)<br />
For n > 2, there are two cases: strings ending in 1 (thus, examine the n−1 case) and strings ending in 10 (thus, examine the n−2 case).<br />
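The recurrence and initial conditions can be cross-checked against a brute-force count of the strings themselves (an illustrative Python sketch):

```python
from itertools import product

def count_no_00(n):
    """Count length-n bit strings with no two consecutive 0's."""
    return sum('00' not in ''.join(bits) for bits in product('01', repeat=n))

a = {1: 2, 2: 3}                   # initial conditions
for n in range(3, 11):
    a[n] = a[n - 1] + a[n - 2]     # the recurrence
assert all(count_no_00(n) == a[n] for n in range(1, 11))
print(a[10])
```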

127<br />

Definition: A linear homogeneous recurrence relation of degree k with constant coefficients is a recurrence relation of the form<br />
a_n = c_1·a_{n-1} + c_2·a_{n-2} + … + c_k·a_{n-k},<br />
where c_1, …, c_k ∈ R and c_k ≠ 0.<br />

Motivation for study: This type of recurrence relation occurs often and can be systematically solved. Slightly more general ones can be, too. The solution methods are related to solving certain classes of ordinary differential equations.<br />

Notes:<br />
• Linear because the right hand side is a sum of multiples of previous terms.<br />
• Homogeneous because no terms occur that are not multiples of the a_j’s.<br />
• Constant because no coefficient is a function of n.<br />
• Degree k because a_n is defined in terms of the previous k sequential terms.<br />

128


Examples: Typical ones include<br />
• P_n = 1.15·P_{n-1} is degree 1.<br />
• f_n = f_{n-1} + f_{n-2} is degree 2.<br />
• a_n = a_{n-5} is degree 5.<br />

Examples: Ones that fail the definition include<br />
• a_n = a_{n-1} + (a_{n-2})^2 is nonlinear.<br />
• H_n = 2H_{n-1} + 1 is nonhomogeneous.<br />
• B_n = nB_{n-1} is variable coefficient.<br />

We will get to nonhomogeneous recurrence relations shortly.<br />

129<br />

Solving a recurrence relation usually starts by assuming that the solution has the form<br />
a_n = r^n,<br />
where r ∈ C. Then a_n = r^n is a solution if and only if<br />
r^n = c_1·r^(n-1) + c_2·r^(n-2) + … + c_k·r^(n-k).<br />
Dividing both sides by r^(n-k) and collecting terms, we get<br />
Definition: The characteristic equation is<br />
r^k − c_1·r^(k-1) − c_2·r^(k-2) − … − c_k = 0.<br />
Then {a_n} with a_n = r^n is a solution if and only if r is a root of the characteristic equation. The proof is quite involved.<br />
The k = 2 case is much easier to understand, yet still has multiple cases.<br />

130


Theorem: Assume c_1, c_2, α_1, α_2 ∈ R and r_1, r_2 ∈ C. Suppose that r^2 − c_1·r − c_2 = 0 has two distinct roots r_1 and r_2. Then the sequence {a_n} is a solution to the recurrence relation a_n = c_1·a_{n-1} + c_2·a_{n-2} if and only if a_n = α_1·r_1^n + α_2·r_2^n for n ∈ N_0.<br />
Example: a_0 = 2, a_1 = 7, and a_n = a_{n-1} + 2a_{n-2} for n ≥ 2. The characteristic equation is r^2 − r − 2 = 0, with roots r_1 = 2 and r_2 = −1. The initial conditions give α_1 + α_2 = 2 and 2α_1 − α_2 = 7, so α_1 = 3 and α_2 = −1. Hence, a_n = 3·2^n − (−1)^n.<br />
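For a_0 = 2, a_1 = 7, a_n = a_{n-1} + 2a_{n-2}, the characteristic roots 2 and −1 give the closed form a_n = 3·2^n − (−1)^n; a quick check of the first terms:

```python
a = [2, 7]                                  # a_0, a_1
for n in range(2, 15):
    a.append(a[n - 1] + 2 * a[n - 2])       # the recurrence

closed = [3 * 2**n - (-1)**n for n in range(15)]   # from roots 2 and -1
assert a == closed
print(a[:5])
```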


Now comes the second case for k = 2.<br />
Theorem: Assume c_1, c_2, α_1, α_2 ∈ R and r_0 ∈ C. Suppose that r^2 − c_1·r − c_2 = 0 has one root r_0 with multiplicity 2. Then the sequence {a_n} is a solution to the recurrence relation a_n = c_1·a_{n-1} + c_2·a_{n-2} if and only if a_n = α_1·r_0^n + α_2·n·r_0^n for n ∈ N_0.<br />
Example: a_0 = 1, a_1 = 6, and a_n = 6a_{n-1} − 9a_{n-2} for n ≥ 2. The characteristic equation is r^2 − 6r + 9 = (r − 3)^2 = 0, so r_0 = 3 with multiplicity 2. The initial conditions give α_1 = 1 and 3α_1 + 3α_2 = 6, so α_2 = 1. Hence, a_n = (1 + n)·3^n.<br />


Theorem: Let c_1, …, c_k, α_{i,j} ∈ R and r_1, …, r_t ∈ C. Suppose the characteristic equation r^k − c_1·r^(k-1) − … − c_k = 0 has t distinct roots r_i, 1 ≤ i ≤ t, with multiplicities m_i ∈ N such that Σ_{i=1}^{t} m_i = k. Then the sequence {a_n} is a solution of the recurrence relation a_n = c_1·a_{n-1} + c_2·a_{n-2} + … + c_k·a_{n-k} if and only if<br />
a_n = (α_{1,0} + α_{1,1}·n + … + α_{1,m_1-1}·n^(m_1-1))·r_1^n + … + (α_{t,0} + α_{t,1}·n + … + α_{t,m_t-1}·n^(m_t-1))·r_t^n<br />
for n ∈ N_0 and all α_{i,j}, 1 ≤ i ≤ t and 0 ≤ j ≤ m_i − 1.<br />
Example: Suppose the roots of the characteristic equation are 2, 2, 3, 3, 3, 5. Then the general solution form is<br />
(α_{1,0} + α_{1,1}·n)·2^n + (α_{2,0} + α_{2,1}·n + α_{2,2}·n^2)·3^n + α_{3,0}·5^n.<br />
With given initial conditions, we can even compute the α’s.<br />

135<br />

Definition: A linear nonhomogeneous recurrence relation of degree k with constant coefficients is a recurrence relation of the form<br />
a_n = c_1·a_{n-1} + c_2·a_{n-2} + … + c_k·a_{n-k} + F(n),<br />
where c_1, …, c_k ∈ R.<br />

Theorem: If {a_n^(p)} is a particular solution of the recurrence relation with constant coefficients a_n = c_1·a_{n-1} + c_2·a_{n-2} + … + c_k·a_{n-k} + F(n), then every solution is of the form {a_n^(p) + a_n^(h)}, where {a_n^(h)} is a solution of the associated homogeneous recurrence relation (i.e., F(n) = 0).<br />
Note: Finding particular solutions for given F(n)’s is loads of fun unless F(n) is rather simple. Usually you solve the homogeneous form first, then try to find a particular solution from that.<br />

136


Theorem: Assume {b_i}, {c_i} ⊆ R. Suppose that {a_n} satisfies the nonhomogeneous recurrence relation<br />
a_n = c_1·a_{n-1} + c_2·a_{n-2} + … + c_k·a_{n-k} + F(n)<br />
and<br />
F(n) = (b_t·n^t + b_{t-1}·n^(t-1) + … + b_1·n + b_0)·s^n.<br />
When s is not a root of the characteristic equation of the associated homogeneous recurrence relation, there is a particular solution of the form<br />
(p_t·n^t + p_{t-1}·n^(t-1) + … + p_1·n + p_0)·s^n.<br />
When s is a root of multiplicity m of the characteristic equation, there is a particular solution of the form<br />
n^m·(p_t·n^t + p_{t-1}·n^(t-1) + … + p_1·n + p_0)·s^n.<br />
Note: If s = 1, then things get even more complicated.<br />

137<br />

Example: Let a_n = 6a_{n-1} − 9a_{n-2} + F(n). When F(n) = 0, the characteristic equation is (r − 3)^2 = 0. Thus, r_0 = 3 with multiplicity 2.<br />
• F(n) = 3^n: particular solution is n^2·p_0·3^n.<br />
• F(n) = n·3^n: particular solution is n^2·(p_1·n + p_0)·3^n.<br />
• F(n) = n^2·2^n: particular solution is (p_2·n^2 + p_1·n + p_0)·2^n.<br />
• F(n) = (n+1)·3^n: particular solution is n^2·(p_1·n + p_0)·3^n.<br />

Definition: Suppose a recursive algorithm divides a problem of size n into a subproblems of size n/b each. Also suppose that g(n) extra operations are required to combine the a subproblems into a solution of the problem of size n. If f(n) is the cost of solving a problem of size n, then the divide and conquer recurrence relation is f(n) = a·f(n/b) + g(n).<br />
We can easily work out a general cost for the divide and conquer recurrence relation using Big-Oh notation.<br />

138


Divide and Conquer Theorem: Let a, b, c, d ∈ R be nonnegative. The solution to the recurrence relation<br />
f(n) = c for n = 1, and f(n) = a·f(n/b) + c·n^d for n > 1,<br />
for n a power of b is<br />
f(n) = O(n^d) for a < b^d,<br />
f(n) = O(n^d·log n) for a = b^d,<br />
f(n) = O(n^(log_b a)) for a > b^d.<br />
Proof: If n is a power of b, then for r = a/b^d, f(n) = c·n^d·Σ_{i=0}^{log_b n} r^i. There are 3 cases:<br />
• a < b^d: Then Σ_{i=0}^{∞} r^i converges, so f(n) = O(n^d).<br />
• a = b^d: Then each term in the sum is 1, so f(n) = O(n^d·log n).<br />
• a > b^d: Then c·n^d·Σ_{i=0}^{log_b n} r^i = c·n^d·(r^(1+log_b n) − 1)/(r − 1), which is O(a^(log_b n)) or O(n^(log_b a)).<br />

139<br />

Example: Recall binary search (see page 45 in the class notes). Searching for an element in a set requires 2 comparisons to determine which half of the set to search further. The search keeps halving the size of the set until at most 1 element is left. Hence, f(n) = f(n/2) + 2. Using the Divide and Conquer theorem, we see that the cost is O(log n) comparisons.<br />

Example: Recall merge sort (see pages 81-83 in the class notes). This sorts halves of sets of elements and requires less than n comparisons to put the two sorted sublists into a sorted list of size n. Hence, f(n) = 2f(n/2) + n. Using the Divide and Conquer theorem, we see that the cost is O(n·log n) comparisons.<br />
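For n a power of b, both recurrences can be evaluated exactly; the sketch below takes f(1) = c as in the theorem and treats the additive cost as c·n^d.

```python
def f(n, a, b, c, d):
    """Evaluate f(n) = a f(n/b) + c n^d with f(1) = c, n a power of b."""
    if n == 1:
        return c
    return a * f(n // b, a, b, c, d) + c * n**d

# Binary search: f(n) = f(n/2) + 2, i.e. a=1, b=2, c=2, d=0.
assert f(1024, 1, 2, 2, 0) == 2 * (10 + 1)       # 2(log2 n + 1)
# Merge sort: f(n) = 2 f(n/2) + n, i.e. a=2, b=2, c=1, d=1.
assert f(1024, 2, 2, 1, 1) == 1024 * 10 + 1024   # n log2 n + n
print(f(1024, 1, 2, 2, 0), f(1024, 2, 2, 1, 1))
```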

Multiplying integers can be done recursively based on a binary decomposition of the two numbers to get a fast algorithm. The patent on this technique, implemented in hardware, made a computer company several billion dollars back when a billion dollars was real money (cf. a trillion dollars today).<br />
Why stop with integers? The technique extends to multiplying matrices, too, with real, complex, or integer entries.<br />

140


Example (funny integer multiplication): Suppose a and b have length-2n binary<br />
representations a = (a_{2n−1} a_{2n−2} … a_1 a_0)_2 and b = (b_{2n−1} b_{2n−2} … b_1 b_0)_2.<br />
We will divide a and b into left and right halves. The trick is to notice that<br />

a = 2^n A_1 + A_0 and b = 2^n B_1 + B_0, where<br />
A_1 = (a_{2n−1} a_{2n−2} … a_{n+1} a_n)_2 and A_0 = (a_{n−1} a_{n−2} … a_1 a_0)_2,<br />
B_1 = (b_{2n−1} b_{2n−2} … b_{n+1} b_n)_2 and B_0 = (b_{n−1} b_{n−2} … b_1 b_0)_2.<br />

Then<br />

ab = (2^{2n} + 2^n) A_1 B_1 + 2^n (A_1 − A_0)(B_0 − B_1) + (2^n + 1) A_0 B_0.<br />

Only 3 multiplies plus adds, subtracts, and shifts are required. So, f(2n) = 3f(n) + Cn,<br />
where C is the cost of the adds, subtracts, and shifts. The Divide and Conquer<br />
theorem tells us this is O(n^(log_2 3)), which is about O(n^1.6). The standard<br />
algorithm is O(n^2). It might not seem like much of an improvement, but it<br />
actually is when lots of integers are multiplied together. The trick can be applied<br />
recursively on the three multiplies in the ab line (halving 2n in the recursion).<br />
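The identity above can be exercised directly on Python integers. A sketch; the function name and the small-operand cutoff are my own choices, not from the notes:

```python
import random

def karatsuba(a, b):
    """Recursive 3-multiply integer multiplication using the ab identity above."""
    if a < 0 or b < 0:                    # reduce to nonnegative operands
        sign = -1 if (a < 0) != (b < 0) else 1
        return sign * karatsuba(abs(a), abs(b))
    if a < (1 << 16) or b < (1 << 16):
        return a * b                      # small operands: hardware multiply
    n = min(a.bit_length(), b.bit_length()) // 2
    A1, A0 = a >> n, a & ((1 << n) - 1)   # left and right halves of a
    B1, B0 = b >> n, b & ((1 << n) - 1)   # left and right halves of b
    high = karatsuba(A1, B1)              # A1*B1
    low = karatsuba(A0, B0)               # A0*B0
    mid = karatsuba(A1 - A0, B0 - B1)     # (A1-A0)(B0-B1)
    # ab = 2^(2n) A1B1 + 2^n (A1B1 + (A1-A0)(B0-B1) + A0B0) + A0B0
    return (high << (2 * n)) + ((high + mid + low) << n) + low

x, y = random.getrandbits(300), random.getrandbits(300)
print(karatsuba(x, y) == x * y)
```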

141<br />

Example (Strassen-Winograd Matrix-Matrix multiplication): We want to<br />

multiply A: m×k by B: k×n to get C: m×n. The matrix elements can be reals,<br />
complex numbers, or integers. When m = k = n, this takes O(n^3) operations<br />

using the standard matrix-matrix multiplication algorithm. However, Strassen<br />

first proposed a divide and conquer algorithm that reduced the exponent. The<br />

belief is that someday, someone will devise an O(n 2 ) algorithm. Some hope it<br />

will even be plausible to use such an algorithm. The variation <strong>of</strong> Strassen’s<br />

algorithm that is most commonly implemented by computer vendors in high<br />

performance math libraries is the Winograd variant. It computes the product as<br />

[A_11 A_12; A_21 A_22] [B_11 B_12; B_21 B_22] = [C_11 C_12; C_21 C_22].<br />

C is computed in 22 steps involving the submatrices <strong>of</strong> A, B, and intermediate<br />

temporary submatrices. An interesting question for many years was how little<br />

extra memory was needed to implement the Strassen-Winograd algorithm (see<br />

C. C. Douglas, M. Heroux, G. Slishman, and R. M. Smith, GEMMW: A<br />

142


portable Level 3 BLAS Winograd variant <strong>of</strong> Strassen's matrix-matrix multiply<br />

algorithm, Journal <strong>of</strong> Computational Physics, 110 (1994), pp. 1-10 for an<br />

answer).<br />

The 22 steps are the following (each result is stored in one of the work areas<br />
W_mk and W_kn or directly in a block of C):<br />

Step Operation<br />
1 S_7 = B_22 − B_12<br />
2 S_3 = A_11 − A_21<br />
3 M_4 = S_3 S_7<br />
4 S_1 = A_21 + A_22<br />
5 S_5 = B_12 − B_11<br />
6 M_5 = S_1 S_5<br />
7 S_6 = B_22 − S_5<br />
8 S_2 = S_1 − A_11<br />
9 M_1 = S_2 S_6<br />
10 S_4 = A_12 − S_2<br />

143<br />

11 M_6 = S_4 B_22<br />
12 T_3 = M_5 + M_6<br />
13 M_2 = A_11 B_11<br />
14 T_1 = M_1 + M_2<br />
15 C_12 = T_1 + T_3<br />
16 T_2 = T_1 + M_4<br />
17 S_8 = S_6 − B_21<br />
18 M_7 = A_22 S_8<br />
19 C_21 = T_2 − M_7<br />
20 C_22 = T_2 + M_5<br />
21 M_3 = A_12 B_21<br />
22 C_11 = M_2 + M_3<br />
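The algebra of the 22 steps can be verified directly. In this sketch the blocks are scalars (plain numbers), which is enough to check that the scheme reproduces the 2×2 product; in the real algorithm each step is a submatrix operation:

```python
def winograd_2x2(A, B):
    """One level of the 22-step Winograd scheme; blocks here are scalars."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    s7 = b22 - b12              # step 1
    s3 = a11 - a21              # step 2
    m4 = s3 * s7                # step 3
    s1 = a21 + a22              # step 4
    s5 = b12 - b11              # step 5
    m5 = s1 * s5                # step 6
    s6 = b22 - s5               # step 7
    s2 = s1 - a11               # step 8
    m1 = s2 * s6                # step 9
    s4 = a12 - s2               # step 10
    m6 = s4 * b22               # step 11
    t3 = m5 + m6                # step 12
    m2 = a11 * b11              # step 13
    t1 = m1 + m2                # step 14
    c12 = t1 + t3               # step 15
    t2 = t1 + m4                # step 16
    s8 = s6 - b21               # step 17
    m7 = a22 * s8               # step 18
    c21 = t2 - m7               # step 19
    c22 = t2 + m5               # step 20
    m3 = a12 * b21              # step 21
    c11 = m2 + m3               # step 22
    return ((c11, c12), (c21, c22))

print(winograd_2x2(((1, 2), (3, 4)), ((5, 6), (7, 8))))
```

Note that only 7 of the 22 steps (the M_i) are multiplies; the rest are adds and subtracts.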

There are four tricky steps in the table above, depending on whether k is even<br />
or odd. Each step makes certain that we do not use more memory than is<br />
allocated for a submatrix or temporary. For example,<br />

144

• In step 4, we have to take care with S_1. (a) If k is odd, then copy the first column of A_21 into W_mk. (b) Complete S_1.<br />
• In step 10, we have to take care with S_4. (a) If k is odd, then pretend the first column of A_21 = 0 in W_mk. (b) Complete S_4.<br />
• In step 11, we have to take care with M_6. (a) If m is odd, then save the first row of M_5. (b) Calculate most of M_6. (c) Complete M_6 using (a) based on whether or not m is odd.<br />
• In step 21, we have to take care with M_3. (a) Calculate M_3 using an index shift.<br />

This all sounds very complicated. However, the code GEMMW that is readily<br />
available on the Web is effectively implemented in 27 calls to subroutines that<br />
do the matrix operations and actually implements<br />

C = α·op(A)·op(B) + β·C,<br />

where op(X) is either X, X transpose, X conjugate, or X conjugate transpose.<br />

145<br />

What is the total cost?<br />

• There are 7 submatrix-submatrix multiplies and 15 submatrix-submatrix adds or subtracts. So the cost is f(n) = 7f(n/2) + 15n^2/4 when m = k = n. This is an O(n^(log_2 7)) = O(n^2.807) algorithm, since log_2 7 ≈ 2.807.<br />
• The work area W_mk needs 7((m+1)max(k,n)+m+4)/48 space.<br />
• The work area W_kn needs 7((k+1)n+n+4)/48 space.<br />
• If C overlaps A or B in memory, an additional mn space is needed to save C before calculating β·C when β ≠ 0.<br />
• The maximum amount of extra memory is bounded by (m·max(k,n)+kn)/3+(m+max(k,n)+k+3n)/2+32+mn. Hence, the overall extra storage is cN^2/3, where c ∈ {2, 5}.<br />
• Typical memory usage when m = k = n is<br />
o β ≠ 0 or A or B overlap with C: 1.67N^2.<br />
o β = 0 and A and B do not overlap with C: 0.67N^2.<br />

146


Definition: The (ordinary) generating function for a sequence a_0, a_1, …, a_k, … of<br />
real numbers is the infinite series G(x) = Σ_{k=0}^∞ a_k x^k. For a finite sequence<br />
{a_k}_{k=0}^n, the generating function is G(x) = Σ_{k=0}^n a_k x^k.<br />

Examples:<br />

1. a_k = 3: G(x) = 3 Σ_{k=0}^∞ x^k.<br />
2. a_k = k+1: G(x) = Σ_{k=0}^∞ (k+1) x^k.<br />
3. a_k = 2^k: G(x) = Σ_{k=0}^∞ (2x)^k.<br />
4. a_k = 1, 0 ≤ k ≤ 2: G(x) = Σ_{k=0}^2 x^k = (x^3 − 1)/(x − 1).<br />

Notes:<br />

• x is a placeholder, so that G(1) being undefined in example 4 above does not matter.<br />
• We do not have to worry about convergence of the series, either.<br />

147<br />

• When evaluating a series using calculus, however, the region of convergence for the x's must be known.<br />

Lemma: f(x) = (1 − ax)^(−1) is the generating function for the sequence 1, (ax), (ax)^2, …, (ax)^k, …, since for a ≠ 0 and |ax| < 1, 1/(1 − ax) = Σ_{k=0}^∞ (ax)^k.<br />


Definition: The extended binomial coefficient C(u,k) for u ∈ R and k ∈ N_0 is defined by<br />

C(u,k) = u(u−1)⋯(u−k+1)/k! if k > 0, and C(u,k) = 1 if k = 0.<br />

Extended Binomial Theorem: If u, x ∈ R such that |x| < 1, then<br />

(1 + x)^u = Σ_{k=0}^∞ C(u,k) x^k.<br />


Note: Generating functions can be used to solve many counting problems.<br />

Examples:<br />

• How many solutions are there to the constrained problem a + b = 9 for 3 ≤ a ≤ 5 and 4 ≤ b ≤ 6? The number of solutions with the constraints is the coefficient of x^9 in (x^3 + x^4 + x^5)(x^4 + x^5 + x^6). We choose x^a and x^b from the two factors, respectively, so that a + b = 9. By inspection, there are only 3 choices for a and b.<br />

• How many ways can 8 CPUs be distributed among 3 servers if each server gets 2-4 CPUs? The generating function is f(x) = (x^2 + x^3 + x^4)^3. We need the coefficient of x^8 in f(x). Expansion of f(x) gives us 6 ways.<br />

Note: Maple or Mathematica is really useful in the examples above.<br />
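The same coefficient extractions can be done with a few lines of code; here is a sketch using plain coefficient lists (the helper name is mine):

```python
def poly_mul(p, q):
    """Multiply two polynomials stored as coefficient lists (index = power)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

# a + b = 9 with 3 <= a <= 5 and 4 <= b <= 6:
f1 = [0, 0, 0, 1, 1, 1]        # x^3 + x^4 + x^5
f2 = [0, 0, 0, 0, 1, 1, 1]     # x^4 + x^5 + x^6
print(poly_mul(f1, f2)[9])     # -> 3

# 8 CPUs among 3 servers, 2-4 CPUs each:
g = [0, 0, 1, 1, 1]            # x^2 + x^3 + x^4
print(poly_mul(poly_mul(g, g), g)[8])  # -> 6
```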

151<br />

Note: Generating functions are useful in solving recurrence relations, too.<br />

Example: a_k = 3a_{k−1}, k > 0, with a_0 = 2. Let f(x) = Σ_{k=0}^∞ a_k x^k be the generating<br />
function for {a_k}. Then x·f(x) = Σ_{k=1}^∞ a_{k−1} x^k. Using the recurrence relation<br />
directly, we have<br />

f(x) − 3x·f(x) = Σ_{k=0}^∞ a_k x^k − 3 Σ_{k=1}^∞ a_{k−1} x^k = a_0 + Σ_{k=1}^∞ (a_k − 3a_{k−1}) x^k = a_0 = 2.<br />

Hence, f(x) − 3x·f(x) = (1 − 3x)·f(x) = 2, or f(x) = 2/(1 − 3x). Using the identity for<br />
(1 − ax)^(−1), we see that<br />

f(x) = Σ_{k=0}^∞ 2·3^k x^k, or a_k = 2·3^k.<br />

152


Example: a_n = 8a_{n−1} + 10^{n−1} with a_0 = 1, which gives us a_1 = 9. Find a_n in closed<br />
form. First multiply the recurrence relation by x^n to give us<br />
a_n x^n = 8a_{n−1} x^n + 10^{n−1} x^n. If f(x) = Σ_{k=0}^∞ a_k x^k, then<br />

f(x) − 1 = Σ_{k=1}^∞ a_k x^k = Σ_{k=1}^∞ (8a_{k−1} x^k + 10^{k−1} x^k) = 8x·f(x) + x/(1 − 10x).<br />

Hence,<br />

f(x) = (1 − 9x) / ((1 − 8x)(1 − 10x)) = (1/2) (1/(1 − 8x) + 1/(1 − 10x)) = Σ_{k=0}^∞ (1/2)(8^k + 10^k) x^k,<br />

or a_n = (8^n + 10^n)/2.<br />
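Both closed forms can be checked mechanically against their recurrences; a quick sketch (helper name mine):

```python
def a_closed(n):
    """Closed form a_n = (8^n + 10^n)/2 derived above (exact integers)."""
    return (8 ** n + 10 ** n) // 2

a = 1                                # a_0 = 1
for n in range(1, 25):
    a = 8 * a + 10 ** (n - 1)        # the recurrence a_n = 8 a_{n-1} + 10^{n-1}
    assert a == a_closed(n)

b = 2                                # a_0 = 2 from the previous example
for k in range(1, 25):
    b = 3 * b                        # the recurrence a_k = 3 a_{k-1}
    assert b == 2 * 3 ** k

print(a_closed(1), a_closed(2))      # -> 9 82
```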

153<br />

Note: It is possible to prove many identities using generating functions.<br />

Inclusion-Exclusion Theorem: Given sets A_i, 1 ≤ i ≤ n, the number of elements in<br />
the union is<br />

|A_1 ∪ A_2 ∪ ⋯ ∪ A_n| = Σ_{i=1}^n |A_i| − Σ_{1≤i&lt;j≤n} |A_i ∩ A_j| + Σ_{1≤i&lt;j&lt;k≤n} |A_i ∩ A_j ∩ A_k| − ⋯ + (−1)^{n+1} |A_1 ∩ A_2 ∩ ⋯ ∩ A_n|.<br />

154


Example: A factory produces vehicles that are car or truck based: 2000 could be<br />
cars, 4000 could be trucks, and 3200 are SUVs, which can be car or truck based<br />
(depending on the frames). How many vehicles were produced? Let A_1 be the<br />
set of cars and A_2 be the set of trucks. There are<br />

|A_1 ∪ A_2| = |A_1| + |A_2| − |A_1 ∩ A_2| = 2000 + 4000 − 3200 = 2800.<br />

Theorem: The number of onto functions from a set of m elements to a set of n<br />
elements with m, n ∈ N is<br />

n^m − C(n,1)(n−1)^m + C(n,2)(n−2)^m − ⋯ + (−1)^{n−1} C(n,n−1)·1^m.<br />

155<br />

Definition: A derangement is a permutation <strong>of</strong> objects such that no object is in<br />

its original position.<br />

Theorem: The number of derangements of a set of n elements is<br />

D_n = n! (1 − 1/1! + 1/2! − ⋯ + (−1)^n/n!) = n! Σ_{k=0}^n (−1)^k/k!.<br />

Example: I hand back graded exams randomly. What is the probability that no<br />
student gets his or her own exam? It is P_n = D_n/n! since there are n! possible<br />
permutations. As n → ∞, P_n → e^(−1).<br />
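The formula agrees with brute-force enumeration for small n; a sketch (function names mine):

```python
from itertools import permutations
from math import factorial

def derangements_formula(n):
    """D_n = n! * sum_{k=0}^{n} (-1)^k / k!, computed exactly in integers."""
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

def derangements_brute(n):
    """Count permutations of {0, ..., n-1} with no fixed point."""
    return sum(1 for p in permutations(range(n))
               if all(p[i] != i for i in range(n)))

for n in range(1, 8):
    assert derangements_formula(n) == derangements_brute(n)

print(derangements_formula(4))   # -> 9
```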

156


Relations<br />

Definition: A relation on a set A is a subset of A×A.<br />

Definition: A binary relation between two sets A and B is a subset of A×B. It is<br />
a set R of ordered pairs, denoted aRb when (a,b) ∈ R (and not aRb when (a,b) ∉ R).<br />

Definition: An n-ary relation on n sets A_1, …, A_n is a subset of A_1×⋯×A_n. Each<br />
A_i is a domain of the relation and n is the degree of the relation.<br />

Examples:<br />

• Let f: A→B be a function. Then the ordered pairs (a, f(a)), ∀a ∈ A, form a binary relation.<br />
• Let A = {Springfield} and B = {U.S. state | ∃ a Springfield in the state}. Then (Springfield, U.S. states) is a relation with about 44 elements (the so-called Simpsons relation).<br />

157<br />

Theorem: Let A be a set with n elements. There are 2^(n^2) unique relations on A.<br />

Proof: We know there are n^2 elements in A×A and that there are 2^m possible<br />
subsets of a set with m elements. Hence, the result.<br />

Definitions: Consider a relation R on a set A. Then<br />

• R is reflexive if (a,a) ∈ R, ∀a ∈ A.<br />
• R is symmetric if (a,b) ∈ R implies (b,a) ∈ R, ∀a,b ∈ A.<br />
• R is antisymmetric if (a,b) ∈ R and (b,a) ∈ R, then a = b, ∀a,b ∈ A.<br />
• R is transitive if (a,b) ∈ R and (b,c) ∈ R, then (a,c) ∈ R, ∀a,b,c ∈ A.<br />

Theorem: Let A be a set with n elements. There are 2^(n(n−1)) unique reflexive<br />
relations on A.<br />

Proof: Each of the n pairs (a,a) must be in R. The remaining n(n−1) pairs may or<br />
may not be in R. The product rule gives the result.<br />

158


Examples: Let A = {1, 2, 3, 4}.<br />

• R 1 = {(1,1), (1,2), (2,1), (2,2), (3,4), (4,1), (4,4)} is<br />

o just a relation<br />

• R 2 = {(1,1), (1,2), (2,1)} is<br />

o symmetric<br />

• R 3 = {(1,1), (1,2), (1,4), (2,1), (2,2), (3,3), (4,1), (4,4)} is<br />

o reflexive and symmetric<br />

• R 4 = {(2,1), (3,1), (3,2), (4,1), (4,2), (4,3)} is<br />

o antisymmetric and transitive<br />

• R 5 = {(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4), (3,3), (3,4), (4,1),<br />

(4,4)} is<br />

o reflexive, antisymmetric, and transitive<br />

• R 6 = {(3,4)} is<br />

o antisymmetric<br />

Note: We will come back to these examples when we get around to<br />

representations <strong>of</strong> relations that work in a computer.<br />
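The four properties are easy to test mechanically, which is one reason the computer representations below matter. A sketch (function names mine), checked against two of the relations above:

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_antisymmetric(R):
    return all(a == b for (a, b) in R if (b, a) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

A = {1, 2, 3, 4}
R3 = {(1,1), (1,2), (1,4), (2,1), (2,2), (3,3), (4,1), (4,4)}
R4 = {(2,1), (3,1), (3,2), (4,1), (4,2), (4,3)}

print(is_reflexive(R3, A), is_symmetric(R3))       # R3: reflexive and symmetric
print(is_antisymmetric(R4), is_transitive(R4))     # R4: antisymmetric and transitive
```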

159<br />

Note: We can combine two or more relations to get another relation. We use the<br />
standard set operations (e.g., ∩, ∪, ⊕, −, …).<br />

Definition: Let R be a relation from a set A to B and S a relation from B to a set C.<br />
Then the composite of R and S is the relation S∘R such that if (a,b) ∈ R and<br />
(b,c) ∈ S, then (a,c) ∈ S∘R, where a ∈ A, b ∈ B, and c ∈ C.<br />

Definition: Let R be a relation on a set A. Then R^n is defined recursively: R^1 = R<br />
and R^n = R^(n−1)∘R, n > 1.<br />

Theorem: The relation R is transitive if and only if R^n ⊆ R, n = 1, 2, 3, ….<br />


Representation: The relation R from a set A to a set B can be represented by a<br />
zero-one matrix M_R = [m_ij], where<br />

m_ij = 1 if (a_i, b_j) ∈ R, and m_ij = 0 if (a_i, b_j) ∉ R.<br />

Notes:<br />

• This is particularly useful on computers, particularly ones with hardware bit operations for packed words.<br />
• M_R contains I for reflexive relations.<br />
• M_R = M_R^T for symmetric relations.<br />
• m_ij = 0 or m_ji = 0 when i ≠ j for antisymmetric relations.<br />

161<br />

Examples:<br />

• M_R = [1 1 0; 1 1 1; 0 1 1] is reflexive and symmetric.<br />

• M_R = [0 1 0; 0 0 0; 0 1 0] is antisymmetric.<br />

162


Representation: A relation can be represented as a directed graph (or digraph).<br />

For (a,b) ∈ R, a and b are vertices (or nodes) in the graph and a directional edge<br />

runs from a to b.<br />

Example: The digraph with vertices a, b, c and directed edges a→b, b→c, c→a,<br />
and c→b represents {(a,b), (b,c), (c,a), (c,b)}.<br />

What about all <strong>of</strong> those examples on page 159 <strong>of</strong> the class notes? We can do all<br />

<strong>of</strong> them over in either representation.<br />

163<br />

Examples (from page 159):<br />

• M_R1 = [1 1 0 0; 1 1 0 0; 0 0 0 1; 1 0 0 1]<br />

• M_R2 = [1 1 0 0; 1 0 0 0; 0 0 0 0; 0 0 0 0]<br />

• M_R3 = [1 1 0 1; 1 1 0 0; 0 0 1 0; 1 0 0 1], or as a digraph on vertices a_1, …, a_4.<br />

164


• M_R4 = [0 0 0 0; 1 0 0 0; 1 1 0 0; 1 1 1 0]<br />

• M_R5 = [1 1 1 1; 1 1 1 1; 0 0 1 1; 1 0 0 1]<br />

• M_R6 = [0 0 0 0; 0 0 0 0; 0 0 0 1; 0 0 0 0], or as a digraph on vertices a_3 and a_4.<br />

165<br />

Definition: A relation on a set A is an equivalence relation if it is reflexive,<br />
symmetric, and transitive. Two elements a and b that are related by an<br />
equivalence relation are called equivalent and denoted a ~ b.<br />

Examples:<br />

• Let A = Z. Define aRb if and only if either a = b or a = −b.<br />
o reflexive: aRa since a = a.<br />
o symmetric: aRb ⇒ bRa since a = ±b.<br />
o transitive: aRb and bRc ⇒ aRc since a = ±b = ±c.<br />
• Let A = R. Define aRb if and only if a − b ∈ Z.<br />
o reflexive: aRa since a − a = 0 ∈ Z.<br />
o symmetric: aRb ⇒ bRa since a − b ∈ Z ⇒ −(a − b) = b − a ∈ Z.<br />
o transitive: aRb and bRc ⇒ aRc since (a − b) + (b − c) ∈ Z ⇒ a − c ∈ Z.<br />
o transitive: aRb and bRc E aRc since (a4b)+(b4c) ,Z E a4c,Z.<br />

166


Definition: Let R be an equivalence relation on a set A. The set of all elements<br />
that are related to an element a ∈ A is called the equivalence class of a and is<br />
denoted by [a]_R. When R is obvious, it is just [a]. If b ∈ [a]_R, b is called a<br />
representative of this equivalence class.<br />

Example: Let A = Z. Define aRb if and only if either a = b or a = −b. There are<br />
two cases for the equivalence class:<br />

• [0] = {0}<br />
• [a] = {a, −a} if a ≠ 0.<br />
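The classes can be computed directly on a finite slice of Z. A sketch (names mine, not from the notes):

```python
def eq_class(a, S, related):
    """Equivalence class of a within the finite set S under the given relation."""
    return frozenset(b for b in S if related(a, b))

S = range(-3, 4)                            # a finite slice of Z
related = lambda a, b: a == b or a == -b    # the a = +/- b relation above
classes = {eq_class(a, S, related) for a in S}

print(sorted(sorted(c) for c in classes))   # {0}, {+-1}, {+-2}, {+-3}
```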

167<br />

Theorem: Let R be an equivalence relation on a set A. For a, b ∈ A, the following<br />
are equivalent:<br />

1. aRb<br />
2. [a] = [b]<br />
3. [a] ∩ [b] ≠ ∅.<br />

Proof: 1 ⇒ 2 ⇒ 3 ⇒ 1.<br />

• 1 ⇒ 2: Assume aRb. Suppose c ∈ [a]. Then aRc. Due to symmetry, we know that bRa. Knowing that bRa and aRc, by transitivity, bRc. Hence, c ∈ [b]. A similar argument shows that if c ∈ [b], then c ∈ [a]. Hence, [a] = [b].<br />
• 2 ⇒ 3: Assume that [a] = [b]. Since a ∈ A and R is reflexive, a ∈ [a], so [a] ∩ [b] ≠ ∅.<br />
• 3 ⇒ 1: Assume [a] ∩ [b] ≠ ∅. So there is a c ∈ [a] and c ∈ [b], too. So, aRc and bRc. By symmetry, cRb. By transitivity, aRc and cRb, so aRb.<br />

Lemma: For any equivalence relation R on a set A, ⋃_{a∈A} [a]_R = A.<br />

Proof: For all a ∈ A, a ∈ [a]_R.<br />

168


Definition: A partition of a set S is a collection of disjoint nonempty sets whose union is S.<br />

Theorem: Let R be an equivalence relation on a set S. Then the equivalence<br />

classes <strong>of</strong> R form a partition <strong>of</strong> S. Conversely, given a partition {A i | i,I} <strong>of</strong> the<br />

set S, there is an equivalence relation R that has the sets A i , i,I, as its<br />

equivalence classes.<br />

169<br />

Graphs<br />

Definition: A graph G = (V,E) consists <strong>of</strong> a nonempty set <strong>of</strong> vertices V and a set<br />

<strong>of</strong> edges E. Each edge has either one or two vertices as endpoints. An edge<br />

connects its endpoints.<br />

Note: We will only study finite graphs (|V| < ∞).<br />

Categorizations:<br />

• A simple graph is one in which each edge connects two different vertices and no two edges connect the same pair of vertices.<br />
• A multigraph may have multiple edges connecting the same pair of vertices.<br />
• A loop is an edge from a vertex back to itself.<br />
• A pseudograph is a multigraph that may also contain loops.<br />
• An undirected graph is a graph in which the edges do not have direction.<br />
• A mixed graph has both directed and undirected edges.<br />

170


Definition: Two vertices u and v in an undirected graph G are adjacent (or<br />

neighbors) in G if u and v are endpoints <strong>of</strong> an edge e in G. Edge e is incident to<br />

{u,v} and e connects u and v.<br />

Definition: The degree <strong>of</strong> a vertex v, denoted deg(v), in an undirected graph is<br />

the number <strong>of</strong> edges incident with it except that loops contribute twice to the<br />

degree <strong>of</strong> that vertex. If deg(v) = 0, then it is isolated. If deg(v) = 1, then it is a<br />

pendant.<br />

Handshaking Theorem: If G = (V,E) is an undirected graph with e edges, then<br />

e = (Σ_{v∈V} deg(v)) / 2.<br />

Proof: Each edge contributes 2 to the sum since it is incident to 2 vertices.<br />

Example: Let G = (V,E). Suppose |V| = 100,000 and deg(v) = 4 for all v ∈ V.<br />
Then there are (4×100,000)/2 = 200,000 edges.<br />
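The degree convention for loops and the handshaking identity can be checked on a tiny graph. A sketch (helper name mine):

```python
def degrees(V, E):
    """Degree of each vertex; a loop (v, v) contributes 2, as defined above."""
    deg = {v: 0 for v in V}
    for (u, v) in E:
        deg[u] += 1
        deg[v] += 1
    return deg

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "d")]  # last edge is a loop
deg = degrees(V, E)
print(deg, sum(deg.values()) == 2 * len(E))   # handshaking: sum of degrees = 2e
```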

171<br />

Theorem: An undirected graph has an even number of vertices of odd degree.<br />

Definition: Let (u,v) ∈ E in a directed graph G = (V,E). Then u and v are the initial<br />
and terminal vertices of (u,v), respectively. The initial and terminal vertices of a<br />
loop (u,u) are both u.<br />

Definition: The in-degree of a vertex v, denoted deg⁻(v), is the number of edges<br />
with v as their terminal vertex. The out-degree of a vertex v, denoted deg⁺(v), is<br />
the number of edges with v as their initial vertex.<br />

Theorem: For a directed graph G = (V,E), Σ_{v∈V} deg⁻(v) = Σ_{v∈V} deg⁺(v) = |E|.<br />

172


Examples of Simple Graphs:<br />

• A complete graph K_n has an edge between every pair of distinct vertices.<br />
• A cycle C_n, n ≥ 3, has vertices v_1, v_2, …, v_n and edges {v_1,v_2}, {v_2,v_3}, …, {v_{n−1},v_n}, {v_n,v_1}.<br />


Representation: For graphs without multiple edges we can use adjacency lists or<br />
matrices. For general graphs we can use incidence matrices.<br />

Definition: Let G = (V,E) have no multiple edges. The adjacency list L_G = {a_v}_{v∈V},<br />
where a_v = adj(v) = {w ∈ V | w is adjacent to v}.<br />

Definition: Let G = (V,E) have no multiple edges. The adjacency matrix A_G = [a_ij]<br />
is<br />

a_ij = 1 if {v_i, v_j} is an edge of G, and a_ij = 0 otherwise.<br />

Example: The graph with vertices v_1, v_2, v_3, v_4 and edges {v_1,v_2}, {v_1,v_3},<br />
{v_2,v_4}, {v_3,v_4} results in<br />

A_G = [0 1 1 0; 1 0 0 1; 1 0 0 1; 0 1 1 0]<br />

and L_G given by v_1: v_2, v_3; v_2: v_1, v_4; v_3: v_1, v_4; v_4: v_2, v_3.<br />

175<br />

Note: For an undirected graph, A_G = A_G^T. However, this is not necessarily true<br />
for a directed graph.<br />

Definition: The incidence matrix M = [m_ij] for G = (V,E) is<br />

m_ij = 1 when edge e_i is incident with v_j, and m_ij = 0 otherwise.<br />

Definition: The simple graphs G = (V,E) and H = (W,F) are isomorphic if there is<br />
an isomorphism f: V→W, a one to one, onto function, such that a and b are<br />
adjacent in G if and only if f(a) and f(b) are adjacent in H for all a, b ∈ V.<br />

176


Examples:<br />

• [Figure: two graphs on vertices v_1, v_2, v_3, v_4] are not isomorphic.<br />
• [Figure: two graphs on vertices v_1, v_2, v_3, v_4] are isomorphic.<br />

Note: Isomorphic simple graphs have the same number of vertices and edges.<br />

Definition: A property preserved by graph isomorphism is called a graph<br />
invariant.<br />

Note: Determining whether or not two graphs are isomorphic has exponential<br />
worst case complexity, but linear average case complexity using the best<br />
algorithms known.<br />

177<br />

Definition: Let G = (V,E) be an undirected graph and n ∈ N. A path of length n<br />
from u to v, u, v ∈ V, is a sequence of edges e_1, e_2, …, e_n ∈ E with associated<br />
vertices in V of u = x_0, x_1, …, x_n = v. A circuit is a path with u = v. A path or<br />
circuit is simple if all of the edges are distinct.<br />

Notes:<br />

• We already defined these terms for directed graphs.<br />

• The terminal vertex <strong>of</strong> the first edge in a path is the initial vertex <strong>of</strong> the<br />

second edge. We can define a path using a recursive definition.<br />

Definition: An undirected graph is connected if there is a path between every<br />

pair <strong>of</strong> distinct vertices in the graph.<br />

178


Theorem: There is a simple path between every distinct pair <strong>of</strong> vertices <strong>of</strong> a<br />

connected undirected graph G = (V,E).<br />

Proof: Let u, v ∈ V such that u ≠ v. Since G is connected, there is a path from u to<br />
v that has minimum length n. Suppose this path is not simple. Then in this<br />
minimum length path, there is some pair of vertices x_i = x_j ∈ V for some<br />
0 ≤ i &lt; j ≤ n. Deleting the edges of the circuit from x_i to x_j leaves a shorter path<br />
from u to v, contradicting minimality. Hence, the minimum length path is simple.<br />


Theorem: Let G = (V,E) be a graph with adjacency matrix A. The number of<br />
different paths of length n from v_i to v_j, where v_i, v_j ∈ V and n ∈ N, is the (i,j)<br />
entry in A^n.<br />

Example: For the graph with vertices v_1, v_2, v_3, v_4 and edges {v_1,v_2}, {v_1,v_3},<br />
{v_2,v_4}, {v_3,v_4},<br />

A = [0 1 1 0; 1 0 0 1; 1 0 0 1; 0 1 1 0] and A^4 = [8 0 0 8; 0 8 8 0; 0 8 8 0; 8 0 0 8].<br />

Note: The theorem can be used to find the shortest path between any two<br />

vertices and also to determine if a graph is connected.<br />
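The A^4 entries above can be reproduced with an ordinary matrix product. A sketch (helper name mine):

```python
def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[0, 1, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 0, 1],
     [0, 1, 1, 0]]

A2 = mat_mul(A, A)
A4 = mat_mul(A2, A2)
print(A4)   # entry (i, j) counts paths of length 4 from v_{i+1} to v_{j+1}
```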

181<br />

Definition: Let G = (V,E) have an associated weighting function w(u,v): V×V→R.<br />
G is called a weighted graph. The weighted length of a path in G is the sum of<br />
the weights for the edges in the path.<br />

Example: Let G = (V,E) be a weighted graph where V represents airports. Then<br />

some interesting weighting functions include the following between pairs <strong>of</strong><br />

distinct airports:<br />

• Distance<br />

• Flight times<br />

• Airfares<br />

• Frequent flier miles<br />

• Frequent flier qualification miles<br />

Note: Weighted graphs are extremely important in analyzing transportation <strong>of</strong><br />

goods and people and trying to minimize time and expenses.<br />

182


Dijkstra’s Algorithm (Shortest Path) – [published in 1959]<br />

Procedure Dijkstra( G = (V,E) with w: V×V→R⁺. G is a weighted connected<br />
simple graph,<br />
a, z ∈ V: initial and terminal vertices )<br />

for i := 1 to n<br />
L(i) := ∞<br />
L(a) := 0<br />
S := ∅<br />
while z ∉ S<br />
u := a vertex not in S with L(u) minimal<br />
S := S ∪ {u}<br />
for all v ∈ V such that v ∉ S<br />
if L(u) + w(u,v) < L(v) then L(v) := L(u) + w(u,v)<br />
{ L(z) = length of shortest path from a to z. }<br />
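A runnable version of the same idea, using a heap to pick the minimal-L vertex instead of the linear scan in the pseudocode (graph layout and names are my own example):

```python
import heapq

def dijkstra(adj, a, z):
    """Length of the shortest path from a to z; adj maps v -> {u: w(v, u)}."""
    dist = {a: 0}
    done = set()                     # the set S from the pseudocode
    heap = [(0, a)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == z:
            return d
        for v, w in adj[u].items():  # relax every edge out of u
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

adj = {
    "a": {"b": 4, "c": 2},
    "b": {"a": 4, "c": 1, "d": 5},
    "c": {"a": 2, "b": 1, "d": 8},
    "d": {"b": 5, "c": 8},
}
print(dijkstra(adj, "a", "d"))   # -> 8, via a -> c -> b -> d
```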

183<br />

Theorem: Dijkstra’s algorithm finds the length <strong>of</strong> the shortest path between two<br />

vertices in a connected simple undirected weighted graph. The algorithm uses<br />

O(n^2) comparison and addition operations.<br />

Traveling Salesman Problem: Find the circuit <strong>of</strong> minimum total weight in a<br />

weighted complete undirected graph that visits every vertex exactly once and<br />

returns to its starting vertex.<br />

Note: There are n! possible circuits to consider, which is intractable when n is<br />

sufficiently large. A tremendous amount <strong>of</strong> research has been devoted to finding<br />

fast approximate solution algorithms. The best ones can produce a circuit<br />
through 1,000 vertices in a few seconds and still be within 2% of the optimum circuit.<br />

184


Definition: A coloring of a simple graph is the assignment of a color to each<br />
vertex of the graph so that no two adjacent vertices are assigned the same color.<br />

Definition: The chromatic number χ(G) is the least number of colors needed<br />
for a coloring of the graph G = (V,E).<br />

Definition: A planar graph is a graph that can be drawn in a plane with no edges<br />
crossing in the picture.<br />

Four Color Theorem: If G is a planar graph, then χ(G) ≤ 4.<br />

Note: The Four Color Conjecture was made in the 1850’s and not proven until<br />

1976. Like Fermat’s last theorem, this theorem became famous partly for how<br />

many wrong pro<strong>of</strong>s (some quite ingenious) were either published or submitted<br />

for publication.<br />

185<br />

Trees<br />

Definition: A tree is a connected undirected graph with no simple circuits. A<br />

weighted tree is a tree with weights associated with the edges.<br />

Uses:<br />

• An efficient data structure for searching a list.<br />

o Useful in encoding data for transmission.<br />

o Computational complexity easily determined for algorithms using trees.<br />

• Weighted trees have edges with weights.<br />

o Useful in decision making.<br />

o Used by telecoms to dynamically connect calls cheaply.<br />

Historical Note: Trees were first developed in the context <strong>of</strong> this course to<br />

describe molecules in chemistry, where atoms were the vertices and bonds were<br />

the edges.<br />

186


Theorem: An undirected graph T = (V,E) is a tree if and only if there is a unique<br />

simple path between any two <strong>of</strong> its distinct vertices.<br />

Pro<strong>of</strong>:<br />

1. Assume T is a tree, so it has no simple circuits. Since T is connected, for all<br />
distinct u, v ∈ V there is a simple path between u and v. It is unique: otherwise<br />
there would be a second simple path, and combining the two simple paths<br />
yields a circuit, which contradicts that T is a tree.<br />

2. Assume that there is a unique simple path between any two distinct vertices<br />
u, v ∈ V. Then T is connected. T has no simple circuits, since a simple circuit<br />
would give two simple paths between some pair of vertices u and v (thus<br />
forming a circuit), which is a contradiction.<br />

Definition: A rooted tree is a tree with one vertex designated as the root and<br />

every edge is directed away from the root.<br />

Note: Any tree can become a rooted tree by designating any vertex as the root.<br />

187<br />

Terminology/Definitions: Let T = (V,E) be a rooted tree. Then<br />

• If v ∈ V is not the root, the parent of v is the unique vertex u ∈ V with an edge<br />

directed at v, and v is a child of u.<br />

• If vertices v_i ∈ V are children of the same u ∈ V, they are siblings.<br />

• The ancestors v_i ∈ V of u ∈ V are any vertices in V, except u itself, which are<br />

in the path from the root to u.<br />

• The descendants v_i ∈ V of u ∈ V are all vertices with u as an ancestor.<br />

• A leaf v ∈ V is a vertex with no children.<br />

• An internal vertex v ∈ V has children.<br />

• A subtree is the subgraph formed from a vertex a ∈ V and all of its descendants<br />

and the edges incident to these descendants.<br />

• The height of a rooted tree T, denoted h(T), is the maximum level of any<br />

vertex, i.e., the length of the longest path from the root to a leaf.<br />

• A balanced rooted tree T has all of its leaves at level h(T) or h(T)−1.<br />

188


Definition: An m-ary tree is a rooted tree such that every internal vertex has no<br />

more than m children. A full m-ary tree is a rooted tree such that every internal<br />

vertex has exactly m children. If m = 2, it is a (full) binary tree.<br />

Definition: An ordered rooted tree is a rooted tree in which the children of the<br />

root and of every internal vertex are ordered, say from left to right.<br />

Examples:<br />

• Management charts<br />

• Directory based file or memory systems<br />

Theorem: A tree with n vertices has n−1 edges.<br />

The proof is by mathematical induction.<br />

Theorem: A full m-ary tree with i internal vertices contains n = mi+1 vertices.<br />

Proof: There are mi children plus the root.<br />

189<br />

Theorem: A full m-ary tree with<br />

• n vertices has i = (n−1)/m internal vertices and q = [(m−1)n+1]/m leaves.<br />

• i internal vertices has n = mi+1 vertices and q = (m−1)i + 1 leaves.<br />

• q leaves has n = (mq−1) / (m−1) vertices and i = (q−1) / (m−1) internal<br />

vertices.<br />
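These relationships can be checked numerically. A minimal sketch (the helper name full_mary_counts is mine, not from the notes):<br />

```python
# Count vertices (n) and leaves (q) of a full m-ary tree directly from
# i, the number of internal vertices, then check the stated identities.

def full_mary_counts(m, i):
    n = m * i + 1          # total vertices: mi children plus the root
    q = n - i              # every vertex is either internal or a leaf
    return n, q

m, i = 3, 4                # a full 3-ary tree with 4 internal vertices
n, q = full_mary_counts(m, i)

assert i == (n - 1) // m                   # i = (n-1)/m
assert q == ((m - 1) * n + 1) // m         # q = [(m-1)n + 1]/m
assert n == (m * q - 1) // (m - 1)         # n = (mq-1)/(m-1)
assert i == (q - 1) // (m - 1)             # i = (q-1)/(m-1)
print(n, q)                                # 13 9
```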

Theorem: There are at most m^h leaves in an m-ary tree of height h.<br />

The proof uses mathematical induction.<br />

Corollary: If an m-ary tree of height h has q leaves, then h ≥ ⌈log_m q⌉. For a full<br />

and balanced m-ary tree, h = ⌈log_m q⌉.<br />

190


Definition: A binary search tree T = (V,E) is a binary tree with a key for each<br />

vertex. The keys are ordered such that a key for a vertex is greater in value than<br />

all keys associated with its left subtree and less in value than all keys associated<br />

with its right subtree. The key for vertex v ∈ V is denoted by label(v).<br />

Note: Recursive algorithms search a binary tree of height h with n vertices for a<br />

key in O(h) operations; for a balanced tree this is O(log n).<br />

Notation: Let T = (V,E) be a binary tree.<br />

• Let root(T) be the root vertex in T.<br />

• Let left_child(v) and right_child(v) refer to the left or right child of a root<br />

or internal vertex v in a binary tree.<br />

• Let add_new_vertex(parent, value) add a new left or right vertex to the<br />

parent vertex with a key of value. The details are left intentionally fuzzy.<br />

Note: One of the most common operations on a binary tree is to search it.<br />

Another is to search a binary tree for a key and add it if it is missing.<br />

191<br />

procedure insertion( T = (V,E): binary search tree, x: item )<br />

v := root(T)<br />

while v ≠ null and label(v) ≠ x<br />

if x < label(v) then<br />

if left_child(v) ≠ null then<br />

v := left_child(v)<br />

else<br />

add_new_vertex(left_child(v), x) and v := null<br />

else<br />

if right_child(v) ≠ null then<br />

v := right_child(v)<br />

else<br />

add_new_vertex(right_child(v), x) and v := null<br />

if root(T) = null then<br />

add_new_vertex(T, x)<br />

else if v = null or label(v) = null then<br />

label the new vertex x and set v := the new vertex<br />

{ v = location of x. }<br />
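The pseudocode above translates almost line for line into a runnable sketch; the Node class and insert function are my own names, and null becomes Python's None:<br />

```python
class Node:
    """One vertex of a binary search tree."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, x):
    """Insert key x if it is missing; return the (possibly new) root."""
    if root is None:
        return Node(x)
    v = root
    while True:
        if x == v.key:
            return root                  # already present
        elif x < v.key:
            if v.left is None:           # empty slot: attach x here
                v.left = Node(x)
                return root
            v = v.left
        else:
            if v.right is None:
                v.right = Node(x)
                return root
            v = v.right

root = None
for k in [8, 3, 10, 1, 6]:
    root = insert(root, k)
print(root.key, root.left.key, root.right.key)   # 8 3 10
```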

192


Definition: A decision tree is a rooted tree in which each internal vertex<br />

corresponds to a decision and its children correspond to the possible outcomes<br />

of that decision.<br />

Note: There is usually a weighting associated with a decision tree. The keys may<br />

not be unique.<br />

Definition: A prefix code is an encoding based on bit strings representing<br />

symbols such that a symbol, as a bit string, never occurs as the first part of<br />

another symbol’s bit string.<br />

Example: We can normally represent a-z in 5 bits and a-zA-Z in 6 bits. Suppose<br />

we only have 3 letters: a = 0, c = 10, and t = 11. Then cat = 10011. Wowee! We<br />

saved one whole bit!!!<br />

Representation: Prefix codes form a binary tree.<br />

193<br />

Example: The prefix code for a = 0, c = 10, and t = 11 is stored as<br />

•<br />

0 1<br />

a • •<br />

0 1<br />

c • t •<br />

Definition: A Huffman coding takes the frequencies of the symbols and is a<br />

prefix code encoding them with the smallest possible total number of bits.<br />

Note: Huffman coding was a course project by a graduate student at MIT in the<br />

1950’s. Needless to say, his professor was stunned.<br />

194


procedure Huffman( a_i: symbols, w_i: frequencies, 1 ≤ i ≤ n )<br />

F := forest of n rooted trees, each with a single vertex a_i with weight w_i<br />

while F is not a single tree<br />

Replace the rooted trees T and T’ of least weights from F with w(T) <<br />

w(T’) with a tree T’’ having a new root that has T and T’ as its left and<br />

right children. Label the edge to T as 0 and the edge to T’ as 1.<br />

Assign w(T) + w(T’) to the new tree T’’<br />

{ Huffman encoding tree is complete. }<br />
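The forest-merging procedure above can be sketched with a heap standing in for the forest F. The names and the tie-breaking counter are my own; ties in weights mean several equally optimal codes exist:<br />

```python
import heapq

def huffman(freqs):
    """Prefix code {symbol: bit string} from symbol frequencies.

    Each forest entry is (weight, tiebreak, [symbols in that subtree]);
    merging two trees prepends 0 to one side's codes and 1 to the other's.
    """
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    code = {s: "" for s in freqs}
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)   # tree of least weight: edge 0
        w2, _, s2 = heapq.heappop(heap)   # next least weight:    edge 1
        for s in s1:
            code[s] = "0" + code[s]
        for s in s2:
            code[s] = "1" + code[s]
        heapq.heappush(heap, (w1 + w2, tiebreak, s1 + s2))
        tiebreak += 1
    return code

print(huffman({"a": 1, "c": 2, "t": 3}))  # {'a': '10', 'c': '11', 't': '0'}
```

This code differs from the worked example below only in which equal-weight tree lands on which side; both use 9 bits in total for the weighted symbols.<br />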

195<br />

Example: Given {(a,1), (c,2), (t,3)} as (symbol,frequency). What is the Huffman<br />

coding?<br />

Initial forest • (a,1) • (c,2) • (t,3)<br />

Step 1 • 3 • (t,3)<br />

0 1<br />

• a • c<br />

Step 2 • 6<br />

0 1<br />

• • t<br />

0 1<br />

• a • c<br />

The resulting Huffman code is a = 00, c = 01, and t = 1; note that t, the most<br />

frequent symbol, gets the shortest bit string.<br />

196


Note: Game trees are another highly studied class of trees.<br />

Definition (Minimax Strategy): The value of a vertex in a game tree is defined<br />

recursively as:<br />

1. The value of a leaf is the payoff to the first player when the game terminates<br />

in the position represented by this leaf.<br />

2. The value of an internal vertex at an even level is the maximum of the<br />

values of its children. The value of an internal vertex at an odd level is the<br />

minimum of the values of its children.<br />

Theorem: The value of a vertex v of a game tree tells us the payoff to the first<br />

player if both players follow the Minimax strategy and play starts from the<br />

position represented by vertex v.<br />

Notes: Game trees are<br />

• Enormous (not just slightly, but really, really enormous)<br />

• Lead to optimal solutions (if you can compute them)<br />

• Basically intractable using standard computers<br />
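The recursive definition above can be sketched directly. Nested lists stand in for game trees (a representation I chose for brevity), with numbers as leaf payoffs:<br />

```python
def minimax(node, maximizing=True):
    """Value of a game-tree vertex: leaves are payoffs to the first
    player; internal vertices alternate max (even levels) and min
    (odd levels)."""
    if isinstance(node, (int, float)):     # a leaf: its payoff
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Root (level 0, first player maximizes) with two min-level children.
tree = [[3, 5], [2, 9]]
print(minimax(tree))   # max(min(3,5), min(2,9)) = 3
```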

197<br />

Note: Tree traversal is extremely important for accessing data. There are many<br />

algorithms, each with a plus and a minus. We will study three traversal<br />

algorithms:<br />

• Preorder<br />

• Inorder<br />

• Postorder<br />

These traversal methods are used not only for data storage, but also for<br />

representing arithmetic in a form useful to compilers.<br />

Definition: The universal addressing system is defined recursively for an<br />

ordered rooted tree T = (V,E). The root r ∈ V is labeled 0 and its k children are<br />

labeled 1, …, k. For each vertex v ∈ V labeled A_v, its n children are labeled<br />

A_v.1, A_v.2, …, A_v.n.<br />

198


Example: Given a tree T = (V,E) with keys ordered 0 < 1 < 1.1 < 2 < 2.1 < 2.2 <<br />

2.2.1 < 2.3, we represent it as<br />

• 0<br />

• 1 • 2<br />

• 1.1 • 2.1 • 2.2 • 2.3<br />

• 2.2.1<br />

We will use this example for quite some time.<br />

199<br />

Definition (Preorder Traversal): Let T be an ordered rooted tree with root r. If T<br />

consists only of r, then r is the preorder traversal of T. Otherwise, suppose T_1,<br />

T_2, …, T_n are subtrees at r from left to right in T. Then the preorder traversal<br />

begins at r and continues by traversing T_1 in preorder, T_2 in preorder, …, and T_n<br />

in preorder.<br />

Example: In the tree example at the top of page 199, the preorder traversal order<br />

is 0, 1, 1.1, 2, 2.1, 2.2, 2.2.1, and 2.3.<br />

Definition (Inorder Traversal): Let T be an ordered rooted tree with root r. If T<br />

consists only of r, then r is the inorder traversal of T. Otherwise, suppose T_1, T_2,<br />

…, T_n are subtrees at r from left to right in T. Then the inorder traversal begins<br />

by traversing T_1 in inorder, then r, and continues with T_2 in inorder, …, and T_n<br />

in inorder.<br />

Example: In the tree example at the top of page 199, the inorder traversal order<br />

is 1.1, 1, 0, 2.1, 2, 2.2.1, 2.2, and 2.3.<br />

200


Definition (Postorder Traversal): Let T be an ordered rooted tree with root r. If T<br />

consists only of r, then r is the postorder traversal of T. Otherwise, suppose T_1,<br />

T_2, …, T_n are subtrees at r from left to right in T. Then the postorder traversal<br />

begins by traversing T_1 in postorder, T_2 in postorder, …, T_n in postorder, and r.<br />

Example: In the tree example at the top of page 199, the postorder traversal<br />

order is 1.1, 1, 2.1, 2.2.1, 2.2, 2.3, 2, and 0.<br />

Notation: Let add_to_list(v) be a global function to append a vertex v to a list.<br />

The list must be initialized to the empty list at some point before use.<br />

Note: The tree traversal algorithms are all easily defined recursively using a<br />

global list that must be initialized first.<br />

201<br />

procedure preorder_traversal( T: ordered rooted tree )<br />

r := root(T)<br />

add_to_list(r)<br />

for each child c of r from left to right<br />

T(c) := subtree with c as its root<br />

preorder_traversal( T(c) )<br />

procedure inorder_traversal( T: ordered rooted tree )<br />

r := root(T)<br />

if r is a leaf then add_to_list(r)<br />

else<br />

q := first child of r from left to right<br />

T(q) := subtree with q as its root<br />

inorder_traversal( T(q) )<br />

add_to_list(r)<br />

for each remaining child c of r from left to right<br />

T(c) := subtree with c as its root<br />

inorder_traversal( T(c) )<br />

202


procedure postorder_traversal( T: ordered rooted tree )<br />

r := root(T)<br />

for each child c of r from left to right<br />

T(c) := subtree with c as its root<br />

postorder_traversal( T(c) )<br />

add_to_list(r)<br />
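The three procedures can be checked against the example at the top of page 199. A minimal sketch using (label, children) tuples for ordered rooted trees (my representation, not the notes'):<br />

```python
def preorder(t, out):
    """t is (label, [children]); visit the root, then each subtree."""
    label, children = t
    out.append(label)
    for c in children:
        preorder(c, out)

def inorder(t, out):
    """Leftmost subtree first, then the root, then the rest."""
    label, children = t
    if not children:
        out.append(label)
        return
    inorder(children[0], out)
    out.append(label)
    for c in children[1:]:
        inorder(c, out)

def postorder(t, out):
    """All subtrees first, the root last."""
    label, children = t
    for c in children:
        postorder(c, out)
    out.append(label)

# The running example from page 199.
T = ("0", [("1", [("1.1", [])]),
           ("2", [("2.1", []),
                  ("2.2", [("2.2.1", [])]),
                  ("2.3", [])])])
for f in (preorder, inorder, postorder):
    out = []
    f(T, out)
    print(f.__name__, out)
```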

Definition: Logic and arithmetic expressions can be represented using binary<br />

trees. Listing the expression via an inorder, preorder, or postorder traversal of<br />

the binary tree is known as infix, prefix, or postfix notation, respectively.<br />

Note: The best known is postfix notation, otherwise known as reverse Polish<br />

notation (RPN). This was used in the first pocket sized scientific calculator, the<br />

HP-35 (1972). This notation is valuable in writing compilers, too. See<br />

• http://glow.sourceforge.net/tutorial/lesson7/side_rpn.html<br />

• http://www.hpmuseum.org/rpn.htm<br />

203<br />

Examples: Parentheses disappear completely. It is best to think of an RPN<br />

calculator as a stack machine where data is in the stack and arithmetic operates<br />

on the top elements <strong>of</strong> the stack.<br />

• The expression 2+3 is written as 2 3 + in RPN.<br />

• The expression [(9+3) * (4/2)] - [(3x) + (2-y)] is written as 9 3 + 4 2 / * 3 x<br />

* 2 y - + - in RPN, where x and y are numbers.<br />
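The stack-machine view of RPN described above can be sketched in a few lines (the function name and the variable environment parameter are my own):<br />

```python
def eval_rpn(expr, env=None):
    """Evaluate a postfix (RPN) string with a stack: operands are
    pushed; an operator pops the top two entries and pushes the result."""
    env = env or {}
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    stack = []
    for tok in expr.split():
        if tok in ops:
            b = stack.pop()              # right operand is on top
            a = stack.pop()
            stack.append(ops[tok](a, b))
        else:
            stack.append(env[tok] if tok in env else float(tok))
    return stack.pop()

print(eval_rpn("2 3 +"))                                    # 5.0
print(eval_rpn("9 3 + 4 2 / * 3 x * 2 y - + -",
               env={"x": 1.0, "y": 0.0}))                   # 19.0
```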

Tree representation: Labels are the operations on internal vertices or the root and<br />

values (constants or simple variables) on the leaves.<br />

Example: 4 * 3 + 2 in RPN is 4 3 * 2 +, or<br />

• +<br />

• * • 2<br />

• 4 • 3<br />

204


Definition: Let G = (V,E) be a simple graph. A spanning tree of G is a subgraph<br />

of G that is a tree containing every vertex in G.<br />

Example: Your instructor wants his town, the states of Connecticut and New<br />

York, and New York City to keep the roads and highways connecting his house<br />

and LaGuardia airport cleared of ice and snow. A graph connecting each of<br />

the relevant endpoints and connecting points can be made. The relevant agencies<br />

can use this graph when deciding how to keep roads open after a storm.<br />

• G • G • G<br />

• PC • PC • PC<br />

• RB • RB • RB<br />

• S • S • S<br />

WB • • LGA WB • • LGA WB • • LGA<br />

205<br />

Theorem: A simple graph G is connected if and only if it has a spanning tree T.<br />

Example: Multicasting over networks.<br />

Note: Constructing a spanning tree can be done in many different ways,<br />

including some very inefficient ones. Two common ways are depth first and<br />

breadth first searches.<br />

Notation: Let visit(v) mean that we keep track of when we first go to vertex v<br />

until we return to v using a backtrack.<br />

procedure visit( v: vertex of G = (V,E), T: tree )<br />

for each w ∈ V adjacent to v and not yet in T<br />

add w and edge {v,w} to T<br />

visit( w, T )<br />

206


procedure depth_first( G = (V,E): connected graph )<br />

T := tree containing only some single v ∈ V<br />

visit( v, T )<br />

{ T is a spanning tree. }<br />

procedure breadth_first( G = (V,E): connected graph )<br />

T := tree containing only some single v ∈ V<br />

L := list containing v<br />

while L ≠ empty list<br />

Remove the first vertex v from L<br />

for each neighbor w ∈ V of v<br />

if w ∉ L and w ∉ T then<br />

Add w to the end of L<br />

Add w and edge {v,w} to T<br />

{ T is a spanning tree. }<br />
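The breadth first procedure above can be sketched with an adjacency-list dictionary (my representation); the depth first version is the same idea with recursion in place of the queue:<br />

```python
from collections import deque

def bfs_spanning_tree(adj, start):
    """Breadth-first spanning tree of a connected graph.
    adj maps each vertex to its neighbors; returns the tree edges."""
    tree, seen, L = [], {start}, deque([start])
    while L:
        v = L.popleft()                  # remove the first vertex of L
        for w in adj[v]:
            if w not in seen:            # w is in neither L nor T yet
                seen.add(w)
                L.append(w)
                tree.append((v, w))      # add w and edge {v,w} to T
    return tree

adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
print(bfs_spanning_tree(adj, "a"))   # [('a', 'b'), ('a', 'c'), ('c', 'd')]
```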

207<br />

Theorem: Let G = (V,E) be a connected graph with |V| = n. Then either depth<br />

first or breadth first search takes O(|E|), or O(n^2), steps to construct a spanning tree.<br />

Proof: For a simple graph, |E| ≤ n(n−1)/2.<br />

Backtracking applications:<br />

• Graph coloring: can a graph be colored with n colors?<br />

• n-Queens problem: find places on an n×n board so n queens are toothless<br />

(no queen can attack another)<br />

• Sums of subsets: given {x_i}, i = 1, …, n, where each x_i ∈ N, find a subset<br />

whose sum is M<br />

• Web crawlers: search all hyperlinks on a network efficiently<br />

208


Definition: A minimum spanning tree in a connected weighted graph is a<br />

spanning tree that has the smallest possible sum <strong>of</strong> weights on its edges.<br />

procedure Prim( G = (V,E): weighted connected undirected graph )<br />

T := a minimum weight edge<br />

for i := 1 to |V|−2<br />

e := an edge of minimum weight incident to a vertex in T and not<br />

forming a simple circuit in T if it is added to T<br />

T := T with e added<br />

{ T is a minimum spanning tree. }<br />

procedure Kruskal(G = (V,E): weighted connected undirected graph )<br />

T := empty graph<br />

for i := 1 to |V|−1<br />

e := an edge in G of minimum weight that does not form a simple<br />

circuit in T if it is added to T<br />

T := T with e added<br />

{ T is a minimum spanning tree. }<br />
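Kruskal's algorithm above needs a cheap "no simple circuit" test; a union-find structure (my implementation choice, not from the notes) provides one:<br />

```python
def kruskal(n, edges):
    """Minimum spanning tree by Kruskal's algorithm.
    edges: (weight, u, v) triples over vertices 0..n-1; two endpoints
    with the same union-find root would form a circuit if joined."""
    parent = list(range(n))

    def find(x):                      # path-compressing root lookup
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):     # edges in order of minimum weight
        ru, rv = find(u), find(v)
        if ru != rv:                  # adding (u,v) forms no circuit
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

edges = [(1, 0, 1), (4, 0, 2), (2, 1, 2), (3, 2, 3)]
print(kruskal(4, edges))   # [(1, 0, 1), (2, 1, 2), (3, 2, 3)]
```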

209<br />

Theorem: The cost of Prim’s algorithm is O(|E|log|V|). The cost of Kruskal’s<br />

algorithm is O(|E|log|E|).<br />

Definition: A graph G = (V,E) is sparse if |E| is very small with respect to |V| 2 .<br />

Comment: Sparse is ill defined intentionally. There are different degrees of<br />

sparseness, too (highly sparse, very sparse, somewhat sparse, hardly sparse, not<br />

sparse, and the Scottish favorite, a wee bit sparse). Matrices can also be<br />

categorized as (fill in the blank type) sparse based on their graphs.<br />

Note: When G is sparse, Kruskal’s algorithm is much less expensive than Prim’s<br />

algorithm.<br />

210


Boolean Algebra<br />

Definition: Let B = { 0, 1 } and B^n = B×B×…×B (n times). A Boolean<br />

variable x ∈ B. A Boolean function of degree n is a function f: B^n → B.<br />

Notation: For x,y ∈ B, define<br />

• x + y = x ∨ y<br />

• x · y = x ∧ y<br />

• x̄ = ¬x<br />

using the logic predicate notation from the class notes (circa pages 5-6).<br />

Definition: A Boolean algebra is a set B with binary operators ∨ and ∧, the<br />

unary operator ¬, elements 0 and 1, and the following laws holding for all<br />

elements <strong>of</strong> B: identity, complement, associative, commutative, and distributive.<br />

211<br />

Logic gates: Boolean algebra is used to model electronic logic gates, such as<br />

AND, OR, NOT, NAND, XOR, … We design functions with Boolean algebras<br />

and operators. Then we build them using the right gates and wiring patterns.<br />

Typical symbols for AND, OR, and NOT are the following:<br />

AND: OR: NOT:<br />

These are two input AND and OR gates. Versions <strong>of</strong> these gates exist for more<br />

than two inputs and perform the expected operation on all of the inputs to get<br />

one output.<br />

Definition: A simple output circuit takes the input(s) and has one output. A<br />

multiple output circuit takes input(s) and has multiple outputs.<br />

Example: The gates above are simple output circuits.<br />

212


Examples: Most circuits are of the multiple output variety.<br />

• A half adder adds two bits producing a single bit sum plus a single bit carry:<br />

S := (x∨y) ∧ ¬(x∧y) = x⊕y and C_out := x∧y. A half adder has two AND<br />

gates, one OR gate, and one NOT gate.<br />

• A full adder computes the complete two bit sum and carry out:<br />

S := (x⊕y)⊕C_in, where C_in is the incoming carry. The carry is quite<br />

complicated: C_out := (x·y) + (y·C_in) + (C_in·x). A full adder has two half<br />

adders and an OR gate.<br />

• Ripple adders, lookahead adders, and lookahead carry circuits use many<br />

bits as input to implement integer adders.<br />

Half adder<br />

Full adder<br />
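The half and full adder formulas above can be checked with Python's bitwise operators standing in for the gates (the function names are mine):<br />

```python
def half_adder(x, y):
    """Sum and carry of two bits: S = x XOR y, C_out = x AND y."""
    return x ^ y, x & y

def full_adder(x, y, c_in):
    """Two half adders plus an OR gate, as described above."""
    s1, c1 = half_adder(x, y)       # first half adder: x + y
    s, c2 = half_adder(s1, c_in)    # second half adder: add carry in
    return s, c1 | c2               # a carry from either stage

for bits in [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]:
    print(bits, "->", full_adder(*bits))
```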

213<br />

Note: Minimizing the Boolean algebra function means a less complicated<br />

circuit. Simpler circuits are cheaper to make, take up less space, and are usually<br />

faster. Add in how many devices are made and there is potentially a lot of<br />

money involved in saving even a small amount <strong>of</strong> circuitry.<br />

There are two basic methods for simplifying Boolean algebra functions:<br />

• Karnaugh maps (or K-maps) provide a graphical or table driven technique<br />

that works up to about 6 variables before it becomes too complicated.<br />

• The Quine-McCluskey algorithm works with any number of variables.<br />

Going to Google and searching on Karnaugh map software leads to a number of<br />

programs to do some <strong>of</strong> the work for you.<br />

Definition: A literal of a Boolean variable is the variable or its complement. A<br />

minterm of Boolean variables x_1, x_2, …, x_n is a Boolean product y_1 y_2 … y_n,<br />

where each y_i is either x_i or its complement.<br />

Note: A minterm is just the product of n literals.<br />

214


Karnaugh maps: The area of a K-map rectangle is determined by the number of<br />

variables (n) and how many (k) are used in a Boolean expression: 2^(n−k).<br />

Common arrangements are<br />

• 2 variables: 2×2,<br />

• 3 variables: 4×2, and<br />

• 4 variables: 4×4.<br />

Each variable contributes two possibilities to each possibility of every other<br />

variable in the system. K-maps are organized so that all the possibilities of the<br />

system are arranged in a grid form and between two adjacent boxes only one<br />

variable can change value. Each square in a K-map corresponds to a minterm.<br />

Cover the ones on the map by rectangles that contain a number of boxes equal<br />

to a power of 2 (e.g., 4 boxes in a line, 4 boxes in a square, 8 boxes in a<br />

rectangle, etc.). Once the ones are covered, a term of a sum of products is<br />

produced by finding the variables that do not change throughout the entire<br />

covering, and taking a 1 to mean that variable and a 0 as the complement of that<br />

variable. Doing this for every covering produces a matching function.<br />

215<br />

Given a Boolean function f with inputs x_1, …, x_n, make a table with all possible<br />

inputs and outputs. Then create a K-map with the variables on the left and top<br />

sides <strong>of</strong> the rectangle. Look for 1’s. The rectangle is a torus, so look for wrap<br />

arounds, too.<br />

Example: f: B^4 → B with a corresponding K-map of<br />

               x1 x2<br />

           00  01  11  10<br />

       00   0   0   1   1<br />

x3 x4  01   0   0   1   1<br />

       11   0   0   0   1<br />

       10   0   1   1   1<br />

The K-map is colored to try to find patterns in the Boolean expression that can<br />

be simplified. It is quite common to eliminate some of the Boolean variables<br />

using this approach. Use high quality software if you use the K-map approach.<br />
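Before any K-map or Quine-McCluskey minimization, the starting sum of products can be read straight off the truth table, one minterm per 1 output. A small sketch (the names are mine; ~ marks a complemented literal):<br />

```python
from itertools import product

def sum_of_minterms(f, names):
    """Unminimized sum-of-products for f: B^n -> B, read straight off
    the truth table: one minterm per input with output 1. A K-map or
    Quine-McCluskey would then shrink this expression."""
    terms = []
    for bits in product((0, 1), repeat=len(names)):
        if f(*bits):
            lits = [v if b else "~" + v for v, b in zip(names, bits)]
            terms.append("".join(lits))
    return " + ".join(terms)

# Example: majority of three inputs.
maj = lambda x, y, z: int(x + y + z >= 2)
print(sum_of_minterms(maj, ["x", "y", "z"]))
# ~xyz + x~yz + xy~z + xyz
```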

216


Definition: An implicant is a product term covering one or more minterms<br />

in a sum of products. A prime implicant of a function is an implicant that cannot<br />

be covered by a more reduced (i.e., one with fewer literals) implicant.<br />

Note: Suppose f is a Boolean function and P is a product term. Then P is an<br />

implicant of f if f takes the value 1 whenever P takes the value 1. This is<br />

sometimes written as P ≤ f in the natural ordering of the Boolean algebra.<br />

Quine-McCluskey: This algorithm has two steps:<br />

1. Find all prime implicants of the function.<br />

2. Use those prime implicants in a prime implicant chart to find the essential<br />

prime implicants of the function as well as other prime implicants that are<br />

necessary to cover the function.<br />

The algorithm constructs a table and then simplifies the table. The method leads<br />

to computer implementations for large numbers <strong>of</strong> variables. Use high quality<br />

software if you use the Quine-McCluskey approach.<br />

217
