Information Theory and Coding - HW 3

V Balakrishnan

Department of ECE

Johns Hopkins University

October 1, 2006

1 Fano's inequality

Let us first maximize H(p) subject to

p_1 = 1 - P_e

and

\sum_{i=2}^{m} p_i = P_e

So the unconstrained Lagrangian is

-(1 - P_e)\log(1 - P_e) - \sum_{i=2}^{m} p_i \log(p_i) - \lambda\left(\sum_{i=2}^{m} p_i - P_e\right)

Differentiating with respect to p_i for i > 1, we get that p_i = K, where K is a constant, which means that

p_i = \frac{P_e}{m - 1}

So we have

-(1 - P_e)\log(1 - P_e) - \sum_{i=2}^{m} p_i \log(p_i) \le -(1 - P_e)\log(1 - P_e) - P_e \log\left(\frac{P_e}{m - 1}\right)

which gives

H(P_e) + P_e \log(m - 1) \ge H(p)

Since H(P_e) \le 1, this can be weakened to

P_e \ge \frac{H(p) - 1}{\log(m - 1)}

where

H(p) = -\sum_{i=1}^{m} p_i \log(p_i)
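As a quick sanity check, here is a minimal Python sketch (m = 8 and P_e = 0.3 are arbitrary example values) verifying numerically that H(p) ≤ H(P_e) + P_e log(m − 1) whenever p_1 = 1 − P_e, and that the maximizing distribution p_i = P_e/(m − 1) attains the bound.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
m, P_e = 8, 0.3                                    # arbitrary example values
bound = entropy([P_e, 1 - P_e]) + P_e * np.log2(m - 1)

for _ in range(1000):                              # random p with p_1 = 1 - P_e
    tail = rng.random(m - 1)
    p = np.concatenate(([1 - P_e], P_e * tail / tail.sum()))
    assert entropy(p) <= bound + 1e-12

# the maximizing distribution p_i = P_e/(m-1), i >= 2, attains the bound
p_star = np.concatenate(([1 - P_e], np.full(m - 1, P_e / (m - 1))))
assert np.isclose(entropy(p_star), bound)
```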



2 Logical order of ideas

2.1 Part a

Clearly the conditional version of mutual information is derived from the conditional version of entropy. Though the chain rule for relative entropy is derived independently, it is not as widely used (nor does it have as many practical implications) as mutual information or entropy. So the order would be

1 chain rule for H(X_1, X_2, . . . , X_n)

2 chain rule for I(X_1, X_2, . . . , X_n; Y)

3 chain rule for D(p(x_1, . . . , x_n)||q(x_1, . . . , x_n))

2.2 Part b

Clearly Jensen's inequality is the strongest; then we have relative entropy, followed by mutual information. This is because we derive the fact that I(X; Y) ≥ 0 by rewriting it as D(p(x, y)||p(x)p(y)). So the order is

1 Jensen's inequality

2 D(f||g) ≥ 0

3 I(X; Y) ≥ 0

3 Entropy of Missorted file

Let X be any permutation of the numbers 1, 2, . . . , n. We can easily observe the following distribution:

P(X) =
\begin{cases}
\frac{1}{n^2} & \text{Case I: } X(i)=j,\ X(j)=i \text{ and } X(k)=k \text{ for } k \ne i,\ k \ne j,\ |i - j| > 1 \\
\frac{2}{n^2} & \text{Case II: } X(i)=j,\ X(j)=i \text{ and } X(k)=k \text{ for } k \ne i,\ k \ne j,\ |i - j| = 1 \\
\frac{1}{n} & \text{Case III: } X(k)=k \text{ for all } k
\end{cases}

Imagine the n numbers placed in a row with n + 1 dots surrounding them; simply speaking, (dot)1(dot)2(dot)3 . . . (dot)n(dot). If we pick a number, we don't want to place it in either of its 2 adjacent dots (so as not to recover the initial configuration). We can therefore remove any of the n numbers and place it in any of the n − 1 remaining dots. This gives n(n − 1) cases, of which 2(n − 1) are adjacent cases (Case II); this is because there are n − 1 adjacent pairs and each adjacent case arises in 2 ways.

So there are (n − 2)(n − 1) instances of Case I, n − 1 instances of Case II and 1 instance of Case III. (As a check, the probabilities sum to (n − 2)(n − 1)/n^2 + 2(n − 1)/n^2 + 1/n = 1.) Hence the entropy is

\frac{(n - 2)(n - 1)}{n^2}\log(n^2) + \frac{2(n - 1)}{n^2}\log\!\left(\frac{n^2}{2}\right) + \frac{1}{n}\log(n)

which simplifies to

\left(\frac{2n - 1}{n}\right)\log(n) - \frac{n - 1}{n^2}\log(4)
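A minimal Python sketch that computes the entropy directly from the three-case distribution above and checks it against the simplified closed form for a few values of n:

```python
import numpy as np

def missort_entropy_direct(n):
    """Entropy (bits) of the three-case distribution: (n-2)(n-1) permutations with
    probability 1/n^2, n-1 with probability 2/n^2, and the identity with probability 1/n."""
    probs = np.array([1 / n**2] * ((n - 2) * (n - 1)) + [2 / n**2] * (n - 1) + [1 / n])
    assert abs(probs.sum() - 1.0) < 1e-12           # the cases exhaust all outcomes
    return -np.sum(probs * np.log2(probs))

def missort_entropy_closed(n):
    """The simplified form ((2n-1)/n) log n - ((n-1)/n^2) log 4, in bits."""
    return (2 * n - 1) / n * np.log2(n) - (n - 1) / n**2 * np.log2(4)

for n in (3, 5, 10, 50):
    assert abs(missort_entropy_direct(n) - missort_entropy_closed(n)) < 1e-9
```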

4 More Huffman Codes

The first step is to club 2/15 and 2/15 to get 4/15, so we are left with 1/3, 4/15, 1/5, 1/5. Next we club the two 1/5's to get 2/5, leaving 2/5, 1/3, 4/15. Finally we club 1/3 and 4/15 to get 3/5, leaving 2/5, 3/5. So the codes would be

Probability   Code
1/3           01
1/5           11
1/5           10
2/15          000
2/15          001

Since 2^2 < 5, it is obvious that the maximum depth of the tree is 3. Since we need at least 2 leaves of equal length (measured from the root node), we would have at least 2 leaves of code-length 3. That leaves 4 remaining leaves. After we club the two 1/5's, only one completion of the tree is possible, which brings us to the codes above representing 1/3, 4/15, 1/5, 1/5.
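To double-check the construction, here is a small Python sketch that runs the standard Huffman merging procedure on these five probabilities and reports the codeword lengths, which come out as 2, 2, 2, 3, 3:

```python
import heapq
from fractions import Fraction

def huffman_lengths(probs):
    """Codeword length of each symbol under Huffman coding."""
    # heap entries: (subtree probability, tie-breaker, symbols in the subtree)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)             # club the two least likely subtrees
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                           # every symbol below the merge gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

probs = [Fraction(1, 3), Fraction(1, 5), Fraction(1, 5), Fraction(2, 15), Fraction(2, 15)]
print(huffman_lengths(probs))                       # [2, 2, 2, 3, 3]
```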

5 Huffman 20 Questions

There are 2^n possible sequences. Let b_k be the bit representation of the number k (in decimal), and let b_{ki} be its i-th bit (0 or 1), counting from the LSB. So the probability of the k-th combination (of defective and good) is

P_k^n = \prod_{i=1}^{n} p_i^{b_{ki}} (1 - p_i)^{1 - b_{ki}}

In general, let

P_k^j = \prod_{i=1}^{j} p_i^{b_{ki}} (1 - p_i)^{1 - b_{ki}}



We can write the entropy of the distribution as

-\sum_{k=1}^{2^n} P_k^n \log_2(P_k^n)

It is a simple observation that

P_k^n =
\begin{cases}
p_n \, P_{k \bmod 2^{n-1}}^{n-1} & \text{if } k \le 2^{n-1} \\
(1 - p_n) \, P_{k \bmod 2^{n-1}}^{n-1} & \text{if } k > 2^{n-1}
\end{cases}

So the entropy can be rewritten as

-\sum_{k=1}^{2^n} P_k^n \log_2(P_k^n) = -\sum_{k=1}^{2^{n-1}} p_n P_k^{n-1} \log(p_n P_k^{n-1}) - \sum_{k=1}^{2^{n-1}} (1 - p_n) P_k^{n-1} \log((1 - p_n) P_k^{n-1})

which can be written as

-\sum_{k=1}^{2^n} P_k^n \log_2(P_k^n) = -p_n \log(p_n) - (1 - p_n) \log(1 - p_n) - \sum_{k=1}^{2^{n-1}} P_k^{n-1} \log_2(P_k^{n-1})

This is a recursion, and the obvious solution to the recursion is

-\sum_{i=1}^{n} \left( p_i \log(p_i) + (1 - p_i) \log(1 - p_i) \right)

Since the expected number of questions of any questioning strategy is at least the entropy, the entropy is a good lower bound. Hence the answer for part (a) is

-\sum_{i=1}^{n} \left( p_i \log(p_i) + (1 - p_i) \log(1 - p_i) \right)
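The recursion can be checked numerically. The following minimal Python sketch (the defect probabilities are arbitrary example values) builds all 2^n outcome probabilities by exactly this product and confirms that their entropy equals the sum of the n binary entropies:

```python
import numpy as np

def joint_probs(p):
    """Probabilities of all 2^n good/defective combinations, built one object at a time."""
    probs = np.ones(1)
    for pi in p:
        # each existing outcome splits into "object good" and "object defective"
        probs = np.concatenate((probs * (1 - pi), probs * pi))
    return probs

p = [0.1, 0.2, 0.3, 0.4, 0.25]                      # example defect probabilities
P = joint_probs(p)
H_joint = -np.sum(P * np.log2(P))
H_sum = -sum(pi * np.log2(pi) + (1 - pi) * np.log2(1 - pi) for pi in p)
assert np.isclose(H_joint, H_sum)                   # entropy of the 2^n outcomes = sum of binary entropies
```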

5.1 Part b<br />

If we do huffman coding,the longest question will be the one that has got the least probability.<br />

Clearly the fact that ”all objects are defective” has the lowest probability. The second lowest<br />

probability is for the fact that all objects,except the n th object are defective.<br />

So our last question to nature(god) would be<br />

”is the n th object good atleast ”<br />



5.2 Part c

Clearly the upper bound is attained when we are forced to assign integer lengths to all the questions. So the upper bound is

\sum_{k=1}^{2^n} P_k^n \lceil \log_2(1/P_k^n) \rceil
\le \sum_{k=1}^{2^n} P_k^n \left( \log_2(1/P_k^n) + 1 \right)
= -\sum_{k=1}^{2^n} P_k^n \log_2(P_k^n) + 1
= -\sum_{i=1}^{n} \left( p_i \log(p_i) + (1 - p_i) \log(1 - p_i) \right) + 1
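A quick numerical check of this chain, reusing the example defect probabilities from the sketch above: the expected number of questions under the integer lengths ⌈log_2(1/P_k^n)⌉ lies between the entropy and the entropy plus one.

```python
import numpy as np

p = [0.1, 0.2, 0.3, 0.4, 0.25]                      # same example defect probabilities as before
probs = np.ones(1)
for pi in p:                                         # joint distribution over the 2^n outcomes
    probs = np.concatenate((probs * (1 - pi), probs * pi))

H = -np.sum(probs * np.log2(probs))                  # lower bound from part (a)
L = np.sum(probs * np.ceil(np.log2(1 / probs)))      # expected number of questions with integer lengths
assert H <= L <= H + 1
```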

6 The game of Hi-Lo

6.1 Part a

The only constraint in this problem is that our question tree can have a maximum depth of 6. The number of nodes is 1 + 2 + 2^2 + 2^3 + 2^4 + 2^5 = 63, so we can identify 63 possible numbers.

Now the question boils down to which 63 numbers to choose. If we choose some 63 numbers, let I_63(x) = 1 if x is one of our chosen numbers and 0 otherwise. Then the expected income becomes

\sum_{x=1}^{100} p(x) v(x) I_{63}(x)

Obviously we will choose the 63 numbers for which p(x)v(x) is highest. So the algorithm is

• Compute the effective value of each number as p(x)v(x).

• Take the top 63 numbers according to their effective value.

• Sort these numbers into ascending order, giving x_1 < x_2 < x_3 . . . < x_63 with 1 ≤ x_i ≤ 100.

• The first question is "Is X = x_32?"

• If the answer is "too high", the next question is "Is X = x_16"; if the answer is "too low", the next question is "Is X = x_48", and so on.

To be more precise, we will construct a binary tree with these chosen 63 numbers as the nodes and ask questions accordingly.
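A minimal Python sketch of this selection-and-questioning scheme (p and v here are hypothetical example inputs): keep the 63 numbers with the largest p(x)v(x), sort them, and generate the balanced question tree, whose first question concerns the 32nd smallest chosen number.

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random(101); p[0] = 0; p /= p.sum()         # hypothetical p(x) for x = 1..100 (index 0 unused)
v = rng.integers(1, 1000, size=101); v[0] = 0       # hypothetical prize values v(x)

# keep the 63 numbers with the largest effective value p(x)v(x), in ascending order
chosen = sorted(np.argsort(p * v)[-63:])

def questions(lo=0, hi=63, depth=1):
    """Yield (depth, number asked about) for the balanced question tree over `chosen`."""
    if lo >= hi:
        return
    mid = (lo + hi) // 2
    yield depth, int(chosen[mid])                   # "Is X = chosen[mid]?"
    yield from questions(lo, mid, depth + 1)        # answer "too high": X is smaller
    yield from questions(mid + 1, hi, depth + 1)    # answer "too low": X is larger

asked = list(questions())
assert max(d for d, _ in asked) == 6                # all 63 numbers reachable within 6 questions
print(asked[0])                                     # first question: the 32nd smallest chosen number
```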

6.2 Part b

Let y be some permutation of 1, 2, . . . , 100; our objective is to decide this permutation. We will keep asking questions, for i = 1 to 99, of the form "Is X = y(i)?"

The expected return is simply

\sum_{i=1}^{99} p(y(i)) \left( v(y(i)) - i \right) + p(y(100)) \left( v(y(100)) - 99 \right)

We have subtracted 99 in the last term because once we have asked 99 questions and received "no" every time, we know for sure that the answer is y(100). The expected return can be written as

\sum_{i=1}^{100} p(y(i)) v(y(i)) - \left( \sum_{i=1}^{99} i \, p(y(i)) + 99 \, p(y(100)) \right)

Since the first term \sum_{i=1}^{100} p(y(i)) v(y(i)) is a constant, our objective boils down to minimizing

\sum_{i=1}^{99} i \, p(y(i)) + 99 \, p(y(100))

By the rearrangement inequality, if we have a_1 ≤ a_2 ≤ a_3 ≤ . . . ≤ a_n and b_1 ≤ b_2 ≤ b_3 ≤ . . . ≤ b_n, then for any permutation (r_1, r_2, . . . , r_n) of (b_1, b_2, . . . , b_n) we have

a_1 b_1 + . . . + a_n b_n \ge a_1 r_1 + a_2 r_2 + . . . + a_n r_n \ge a_1 b_n + a_2 b_{n-1} + . . . + a_n b_1

Clearly the numbers 1, 2, . . . , 99, 99 are our a's and p(1), p(2), . . . , p(100) are our r's. Since the a's are increasing, the minimum is attained by pairing them with the p's sorted in descending order. More formally, we permute 1, 2, . . . , 100 to y(1), y(2), . . . , y(100) such that

p(y(1)) ≥ p(y(2)) ≥ . . . ≥ p(y(100))

and ask questions in the order y(1), y(2), . . ., i.e. the first question is "Is X = y(1)?" and so on. The maximum expected return is then

\sum_{i=1}^{100} p(i) v(i) - \left( \sum_{i=1}^{99} i \, p(y(i)) + 99 \, p(y(100)) \right)
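As a brute-force check of the rearrangement argument on a small analogue (5 numbers, made-up probabilities, question-cost weights 1, 2, 3, 4, 4), the ordering that asks in decreasing probability achieves the minimum deduction over all permutations:

```python
from itertools import permutations

p = {1: 0.05, 2: 0.35, 3: 0.10, 4: 0.30, 5: 0.20}   # made-up p(x) for a 5-number game
weights = [1, 2, 3, 4, 4]                            # cost of the i-th guess; the last outcome needs no extra question

def deduction(order):
    return sum(w * p[x] for w, x in zip(weights, order))

best = min(deduction(order) for order in permutations(p))
by_prob = tuple(sorted(p, key=lambda x: -p[x]))      # ask in order of decreasing probability
assert abs(best - deduction(by_prob)) < 1e-12        # decreasing-probability order is optimal
```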



6.3 Part c

Now let q(i) = p(y(i)) just for notational convenience, so that q(1) ≥ q(2) ≥ q(3) ≥ . . . ≥ q(100), and we already have

1 < 2 < 3 < . . . < 98 < 99 ≤ 99

By Chebyshev's sum inequality, if a_1 ≥ a_2 ≥ a_3 ≥ . . . ≥ a_n and b_1 ≤ b_2 ≤ b_3 ≤ . . . ≤ b_n, then

n \sum_{k=1}^{n} a_k b_k \le \left( \sum_{k=1}^{n} a_k \right) \left( \sum_{k=1}^{n} b_k \right)

Setting a_k = q(k), b_k = k for k = 1, . . . , 99 and b_{100} = 99, we get

100 \left( \sum_{k=1}^{99} k \, q(k) + 99 \, q(100) \right) \le \left( \sum_{k=1}^{99} k + 99 \right) \left( \sum_{k=1}^{100} q(k) \right)

We also have

\sum_{k=1}^{100} q(k) = 1

as they are probabilities, so we get

\sum_{k=1}^{99} k \, q(k) + 99 \, q(100) \le \left( \sum_{k=1}^{99} k + 99 \right) \frac{1}{100}

Now if q(i) = 1/100, we see that the upper bound is achieved. Hence q(i) = 1/100 is the distribution that the computer will choose. In other words, the computer will choose the distribution having the maximum entropy, i.e. the uniform distribution.
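A short numerical check of this conclusion: for the weights 1, 2, . . . , 99, 99, the deduction under the player's optimal (decreasing-probability) ordering never exceeds the Chebyshev bound for any distribution, and the uniform distribution attains it.

```python
import numpy as np

weights = np.concatenate((np.arange(1, 100), [99]))      # 1, 2, ..., 99, 99

def deduction(q):
    """Deduction when the player asks in decreasing order of probability."""
    return float(np.dot(weights, np.sort(q)[::-1]))

bound = weights.sum() / 100                               # Chebyshev upper bound
assert np.isclose(deduction(np.full(100, 1 / 100)), bound)   # uniform attains the bound

rng = np.random.default_rng(2)
for _ in range(1000):                                     # no other distribution exceeds it
    q = rng.random(100); q /= q.sum()
    assert deduction(q) <= bound + 1e-9
```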

