Information Theory and Coding - HW 3

V Balakrishnan

Department of ECE

Johns Hopkins University

October 1, 2006

1 Fano's inequality

Let us first maximize H(p) subject to

p_1 = 1 - P_e

and

\sum_{i=2}^{m} p_i = P_e

So the unconstrained Lagrangian is

-(1 - P_e)\log(1 - P_e) - \sum_{i=2}^{m} p_i \log(p_i) - \lambda\left(\sum_{i=2}^{m} p_i - P_e\right)

Differentiating with respect to p_i for i > 1, we get that p_i = K, where K is a constant, which means that

p_i = \frac{P_e}{m - 1}

So we have

-(1 - P_e)\log(1 - P_e) - \sum_{i=2}^{m} p_i \log(p_i) \le -(1 - P_e)\log(1 - P_e) - P_e \log\left(\frac{P_e}{m - 1}\right)

which gives

H(P_e) + P_e \log(m - 1) \ge H(p)

Since H(P_e) \le 1, this can be weakened to

P_e \ge \frac{H(p) - 1}{\log(m - 1)}

where

H(p) = -\sum_{i=1}^{m} p_i \log(p_i)
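As a quick sanity check, here is a minimal Python sketch (m = 8 and P_e = 0.3 are arbitrary example values) verifying numerically that H(p) ≤ H(P_e) + P_e log(m − 1) whenever p_1 = 1 − P_e, and that the maximizing distribution p_i = P_e/(m − 1) attains the bound.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
m, P_e = 8, 0.3                                    # arbitrary example values
bound = entropy([P_e, 1 - P_e]) + P_e * np.log2(m - 1)

for _ in range(1000):                              # random p with p_1 = 1 - P_e
    tail = rng.random(m - 1)
    p = np.concatenate(([1 - P_e], P_e * tail / tail.sum()))
    assert entropy(p) <= bound + 1e-12

# the maximizing distribution p_i = P_e/(m-1), i >= 2, attains the bound
p_star = np.concatenate(([1 - P_e], np.full(m - 1, P_e / (m - 1))))
assert np.isclose(entropy(p_star), bound)
```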



2 Logical order of ideas

2.1 Part a

Clearly the conditional version of mutual information is derived from the conditional version of entropy. Though the chain rule for relative entropy is derived independently, it is not as widely used (nor does it have as many practical implications) as mutual information or entropy. So the order would be

1 chain rule for H(X_1, X_2, . . . , X_n)

2 chain rule for I(X_1, X_2, . . . , X_n; Y)

3 chain rule for D(p(x_1, . . . , x_n)||q(x_1, . . . , x_n))

2.2 Part b

Clearly Jensen's inequality is the strongest; then we have relative entropy, followed by mutual information. This is because we derive the fact that I(X; Y) ≥ 0 by rewriting it as D(p(x, y)||p(x)p(y)). So the order is

1 Jensen's inequality

2 D(f||g) ≥ 0

3 I(X; Y) ≥ 0

3 Entropy of Missorted file

Let X be any permutation of the numbers 1, 2, . . . , n. We can easily observe the following distribution:

P(X) =
\begin{cases}
\frac{1}{n^2} & \text{Case I: } X(i)=j,\ X(j)=i \text{ and } X(k)=k \text{ for } k \ne i,\ k \ne j,\ |i - j| > 1 \\
\frac{2}{n^2} & \text{Case II: } X(i)=j,\ X(j)=i \text{ and } X(k)=k \text{ for } k \ne i,\ k \ne j,\ |i - j| = 1 \\
\frac{1}{n} & \text{Case III: } X(k)=k \text{ for all } k
\end{cases}

Imagine the n numbers placed in a row with n + 1 dots surrounding them; simply speaking, (dot)1(dot)2(dot)3 . . . (dot)n(dot). If we pick a number, we don't want to place it in either of its 2 adjacent dots (so as not to recover the initial configuration). We can therefore remove any of the n numbers and place it in any of the n − 1 remaining dots. This gives n(n − 1) cases, of which 2(n − 1) are adjacent cases (Case II); this is because there are n − 1 adjacent pairs and each adjacent case arises in 2 ways.

So there are (n − 2)(n − 1) instances of Case I, n − 1 instances of Case II and 1 instance of Case III. (As a check, the probabilities sum to (n − 2)(n − 1)/n^2 + 2(n − 1)/n^2 + 1/n = 1.) Hence the entropy is

\frac{(n - 2)(n - 1)}{n^2}\log(n^2) + \frac{2(n - 1)}{n^2}\log\!\left(\frac{n^2}{2}\right) + \frac{1}{n}\log(n)

which simplifies to

\left(\frac{2n - 1}{n}\right)\log(n) - \frac{n - 1}{n^2}\log(4)
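A minimal Python sketch that computes the entropy directly from the three-case distribution above and checks it against the simplified closed form for a few values of n:

```python
import numpy as np

def missort_entropy_direct(n):
    """Entropy (bits) of the three-case distribution: (n-2)(n-1) permutations with
    probability 1/n^2, n-1 with probability 2/n^2, and the identity with probability 1/n."""
    probs = np.array([1 / n**2] * ((n - 2) * (n - 1)) + [2 / n**2] * (n - 1) + [1 / n])
    assert abs(probs.sum() - 1.0) < 1e-12           # the cases exhaust all outcomes
    return -np.sum(probs * np.log2(probs))

def missort_entropy_closed(n):
    """The simplified form ((2n-1)/n) log n - ((n-1)/n^2) log 4, in bits."""
    return (2 * n - 1) / n * np.log2(n) - (n - 1) / n**2 * np.log2(4)

for n in (3, 5, 10, 50):
    assert abs(missort_entropy_direct(n) - missort_entropy_closed(n)) < 1e-9
```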

4 More Huffman Codes

The first step is to club 2/15 and 2/15 to get 4/15, so we are left with 1/3, 4/15, 1/5, 1/5. Next we club the two 1/5's to get 2/5, leaving 2/5, 1/3, 4/15. Finally we club 1/3 and 4/15 to get 3/5, leaving 2/5, 3/5. So the codes would be

Probability   Code
1/3           01
1/5           11
1/5           10
2/15          000
2/15          001

Since 2^2 < 5, it is obvious that the maximum depth of the tree is 3. Since we need at least 2 leaves of equal length (measured from the root node), we would have at least 2 leaves of code-length 3. That leaves 4 remaining leaves. After we club the two 1/5's, only one completion of the tree is possible, which brings us to the codes above representing 1/3, 4/15, 1/5, 1/5.
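To double-check the construction, here is a small Python sketch that runs the standard Huffman merging procedure on these five probabilities and reports the codeword lengths, which come out as 2, 2, 2, 3, 3:

```python
import heapq
from fractions import Fraction

def huffman_lengths(probs):
    """Codeword length of each symbol under Huffman coding."""
    # heap entries: (subtree probability, tie-breaker, symbols in the subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)             # club the two least likely subtrees
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                           # every symbol below the merge gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [Fraction(1, 3), Fraction(1, 5), Fraction(1, 5), Fraction(2, 15), Fraction(2, 15)]
print(huffman_lengths(probs))                       # [2, 2, 2, 3, 3]
```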

5 Huffman 20 Questions

There are 2^n possible sequences. Let b_k be the bit representation of the number k (in decimal), and let b_{ki} be its i-th bit (0 or 1), counting from the LSB. So the probability of the k-th combination (of defective and good) is

P_k^n = \prod_{i=1}^{n} p_i^{b_{ki}} (1 - p_i)^{1 - b_{ki}}

In general, let

P_k^j = \prod_{i=1}^{j} p_i^{b_{ki}} (1 - p_i)^{1 - b_{ki}}



We can write the entropy of the distribution as

-\sum_{k=1}^{2^n} P_k^n \log_2(P_k^n)

It is a simple observation that

P_k^n =
\begin{cases}
p_n \, P_{k \bmod 2^{n-1}}^{n-1} & \text{if } k \le 2^{n-1} \\
(1 - p_n) \, P_{k \bmod 2^{n-1}}^{n-1} & \text{if } k > 2^{n-1}
\end{cases}

So the entropy can be rewritten as

-\sum_{k=1}^{2^n} P_k^n \log_2(P_k^n) = -\sum_{k=1}^{2^{n-1}} p_n P_k^{n-1} \log(p_n P_k^{n-1}) - \sum_{k=1}^{2^{n-1}} (1 - p_n) P_k^{n-1} \log((1 - p_n) P_k^{n-1})

which can be written as

-\sum_{k=1}^{2^n} P_k^n \log_2(P_k^n) = -p_n \log(p_n) - (1 - p_n) \log(1 - p_n) - \sum_{k=1}^{2^{n-1}} P_k^{n-1} \log_2(P_k^{n-1})

This is a recursion, and the obvious solution to the recursion is

-\sum_{i=1}^{n} \left( p_i \log(p_i) + (1 - p_i) \log(1 - p_i) \right)

Since the expected number of questions of any questioning strategy is at least the entropy, the entropy is a good lower bound. Hence the answer for part (a) is

-\sum_{i=1}^{n} \left( p_i \log(p_i) + (1 - p_i) \log(1 - p_i) \right)
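The recursion can be checked numerically. The following minimal Python sketch (the defect probabilities are arbitrary example values) builds all 2^n outcome probabilities by exactly this product and confirms that their entropy equals the sum of the n binary entropies:

```python
import numpy as np

def joint_probs(p):
    """Probabilities of all 2^n good/defective combinations, built one object at a time."""
    probs = np.ones(1)
    for pi in p:
        # each existing outcome splits into "object good" and "object defective"
        probs = np.concatenate((probs * (1 - pi), probs * pi))
    return probs

p = [0.1, 0.2, 0.3, 0.4, 0.25]                      # example defect probabilities
P = joint_probs(p)
H_joint = -np.sum(P * np.log2(P))
H_sum = -sum(pi * np.log2(pi) + (1 - pi) * np.log2(1 - pi) for pi in p)
assert np.isclose(H_joint, H_sum)                   # entropy of the 2^n outcomes = sum of binary entropies
```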

5.1 Part b<br />

If we do huffman coding,the longest question will be the one that has got the least probability.<br />

Clearly the fact that ”all objects are defective” has the lowest probability. The second lowest<br />

probability is for the fact that all objects,except the n th object are defective.<br />

So our last question to nature(god) would be<br />

”is the n th object good atleast ”<br />



5.2 Part c

Clearly the upper bound is attained when we are forced to assign integer lengths to all the questions. So the upper bound is

\sum_{k=1}^{2^n} P_k^n \lceil \log_2(1/P_k^n) \rceil
\le \sum_{k=1}^{2^n} P_k^n \left( \log_2(1/P_k^n) + 1 \right)
= -\sum_{k=1}^{2^n} P_k^n \log_2(P_k^n) + 1
= -\sum_{i=1}^{n} \left( p_i \log(p_i) + (1 - p_i) \log(1 - p_i) \right) + 1
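A quick numerical check of this chain, reusing the example defect probabilities from the sketch above: the expected number of questions under the integer lengths ⌈log_2(1/P_k^n)⌉ lies between the entropy and the entropy plus one.

```python
import numpy as np

p = [0.1, 0.2, 0.3, 0.4, 0.25]                      # same example defect probabilities as before
probs = np.ones(1)
for pi in p:                                         # joint distribution over the 2^n outcomes
    probs = np.concatenate((probs * (1 - pi), probs * pi))

H = -np.sum(probs * np.log2(probs))                  # lower bound from part (a)
L = np.sum(probs * np.ceil(np.log2(1 / probs)))      # expected number of questions with integer lengths
assert H <= L <= H + 1
```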

6 The game of Hi-Lo

6.1 Part a

The only constraint in this problem is that our question tree can have a maximum depth of 6. The number of nodes is 1 + 2 + 2^2 + 2^3 + 2^4 + 2^5 = 63, so we can identify 63 possible numbers.

Now the question boils down to which 63 numbers to choose. If we choose some 63 numbers, let I_63(x) = 1 if x is one of our chosen numbers and 0 otherwise. Then the expected income becomes

\sum_{x=1}^{100} p(x) v(x) I_{63}(x)

Obviously we will choose the 63 numbers for which p(x)v(x) is highest. So the algorithm is

• Compute the effective value of each number as p(x)v(x).

• Take the top 63 numbers according to their effective value.

• Sort these numbers into ascending order, giving x_1 < x_2 < x_3 . . . < x_63 with 1 ≤ x_i ≤ 100.

• The first question is "Is X = x_32?"

• If the answer is "too high", the next question is "Is X = x_16"; if the answer is "too low", the next question is "Is X = x_48", and so on.

To be more precise, we will construct a binary tree with these chosen 63 numbers as the nodes and ask questions accordingly.
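A minimal Python sketch of this selection-and-questioning scheme (p and v here are hypothetical example inputs): keep the 63 numbers with the largest p(x)v(x), sort them, and generate the balanced question tree, whose first question concerns the 32nd smallest chosen number.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random(101); p[0] = 0; p /= p.sum()         # hypothetical p(x) for x = 1..100 (index 0 unused)
v = rng.integers(1, 1000, size=101); v[0] = 0       # hypothetical prize values v(x)

# keep the 63 numbers with the largest effective value p(x)v(x), in ascending order
chosen = sorted(np.argsort(p * v)[-63:])

def questions(lo=0, hi=63, depth=1):
    """Yield (depth, number asked about) for the balanced question tree over `chosen`."""
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    yield depth, int(chosen[mid])                   # "Is X = chosen[mid]?"
    yield from questions(lo, mid, depth + 1)        # answer "too high": X is smaller
    yield from questions(mid + 1, hi, depth + 1)    # answer "too low": X is larger

asked = list(questions())
assert max(d for d, _ in asked) == 6                # all 63 numbers reachable within 6 questions
print(asked[0])                                     # first question: the 32nd smallest chosen number
```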

6.2 Part b

Let y be some permutation of 1, 2, . . . , 100; our objective is to decide this permutation. We will keep asking questions, for i = 1 to 99, of the form "Is X = y(i)?"

The expected return is simply

\sum_{i=1}^{99} p(y(i)) \left( v(y(i)) - i \right) + p(y(100)) \left( v(y(100)) - 99 \right)

We have subtracted 99 in the last term because once we have asked 99 questions and received "no" every time, we know for sure that the answer is y(100). The expected return can be written as

\sum_{i=1}^{100} p(y(i)) v(y(i)) - \left( \sum_{i=1}^{99} i \, p(y(i)) + 99 \, p(y(100)) \right)

Since the first term \sum_{i=1}^{100} p(y(i)) v(y(i)) is a constant, our objective boils down to minimizing

\sum_{i=1}^{99} i \, p(y(i)) + 99 \, p(y(100))

By the rearrangement inequality, if we have a_1 ≤ a_2 ≤ a_3 ≤ . . . ≤ a_n and b_1 ≤ b_2 ≤ b_3 ≤ . . . ≤ b_n, then for any permutation (r_1, r_2, . . . , r_n) of (b_1, b_2, . . . , b_n) we have

a_1 b_1 + . . . + a_n b_n \ge a_1 r_1 + a_2 r_2 + . . . + a_n r_n \ge a_1 b_n + a_2 b_{n-1} + . . . + a_n b_1

Clearly the numbers 1, 2, . . . , 99, 99 are our a's and p(1), p(2), . . . , p(100) are our r's. Since the a's are increasing, the minimum is attained by pairing them with the p's sorted in descending order. More formally, we permute 1, 2, . . . , 100 to y(1), y(2), . . . , y(100) such that

p(y(1)) ≥ p(y(2)) ≥ . . . ≥ p(y(100))

and ask questions in the order y(1), y(2), . . ., i.e. the first question is "Is X = y(1)?" and so on. The maximum expected return is then

\sum_{i=1}^{100} p(i) v(i) - \left( \sum_{i=1}^{99} i \, p(y(i)) + 99 \, p(y(100)) \right)
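As a brute-force check of the rearrangement argument on a small analogue (5 numbers, made-up probabilities, question-cost weights 1, 2, 3, 4, 4), the ordering that asks in decreasing probability achieves the minimum deduction over all permutations:

```python
from itertools import permutations

p = {1: 0.05, 2: 0.35, 3: 0.10, 4: 0.30, 5: 0.20}   # made-up p(x) for a 5-number game
weights = [1, 2, 3, 4, 4]                            # cost of the i-th guess; the last outcome needs no extra question

def deduction(order):
    return sum(w * p[x] for w, x in zip(weights, order))

best = min(deduction(order) for order in permutations(p))
by_prob = tuple(sorted(p, key=lambda x: -p[x]))      # ask in order of decreasing probability
assert abs(best - deduction(by_prob)) < 1e-12        # decreasing-probability order is optimal
```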



6.3 Part c

Now let q(i) = p(y(i)) just for notational convenience, so that q(1) ≥ q(2) ≥ q(3) ≥ . . . ≥ q(100), and we already have

1 < 2 < 3 < . . . < 98 < 99 ≤ 99

By Chebyshev's sum inequality, if a_1 ≥ a_2 ≥ a_3 ≥ . . . ≥ a_n and b_1 ≤ b_2 ≤ b_3 ≤ . . . ≤ b_n, then

n \sum_{k=1}^{n} a_k b_k \le \left( \sum_{k=1}^{n} a_k \right) \left( \sum_{k=1}^{n} b_k \right)

Setting a_k = q(k), b_k = k for k = 1, . . . , 99 and b_{100} = 99, we get

100 \left( \sum_{k=1}^{99} k \, q(k) + 99 \, q(100) \right) \le \left( \sum_{k=1}^{99} k + 99 \right) \left( \sum_{k=1}^{100} q(k) \right)

We also have

\sum_{k=1}^{100} q(k) = 1

as they are probabilities, so we get

\sum_{k=1}^{99} k \, q(k) + 99 \, q(100) \le \left( \sum_{k=1}^{99} k + 99 \right) \frac{1}{100}

Now if q(i) = 1/100, we see that the upper bound is achieved. Hence q(i) = 1/100 is the distribution that the computer will choose. In other words, the computer will choose the distribution having the maximum entropy, i.e. the uniform distribution.
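A short numerical check of this conclusion: for the weights 1, 2, . . . , 99, 99, the deduction under the player's optimal (decreasing-probability) ordering never exceeds the Chebyshev bound for any distribution, and the uniform distribution attains it.

```python
import numpy as np

weights = np.concatenate((np.arange(1, 100), [99]))      # 1, 2, ..., 99, 99

def deduction(q):
    """Deduction when the player asks in decreasing order of probability."""
    return float(np.dot(weights, np.sort(q)[::-1]))

bound = weights.sum() / 100                               # Chebyshev upper bound
assert np.isclose(deduction(np.full(100, 1 / 100)), bound)   # uniform attains the bound

rng = np.random.default_rng(2)
for _ in range(1000):                                     # no other distribution exceeds it
    q = rng.random(100); q /= q.sum()
    assert deduction(q) <= bound + 1e-9
```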

