Source Coding

Prof. Ja-Ling Wu
Department of Computer Science and Information Engineering
National Taiwan University

Source → (sequence of source symbols $u_i$) → Encoder → (sequence of code symbols)

Source: $X \triangleq \{U, P\}$, where

Source alphabet: $U = \{u_1, u_2, \ldots, u_M\}$
Probabilities: $P = \{p_1, p_2, \ldots, p_M\}$

Code alphabet: $A = \{a_1, a_2, \ldots, a_n\}$

Each message $u_i$ is mapped to a codeword:

$$u_i \rightarrow X_1^{(i)}, X_2^{(i)}, \ldots, X_{N_i}^{(i)} \triangleq X_i$$

where $X_k^{(i)} \in A$, $k = 1, 2, \ldots, N_i$, $i = 1, \ldots, M$

$X_i$ : codeword
$N_i$ : length of the codeword $X_i$

Average length of codeword:

$$\bar{N} = \sum_{i=1}^{M} p(X_i)\, N_i = \sum_{i=1}^{M} p(u_i)\, N_i = \sum_{i=1}^{M} p_i N_i$$

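Ex:

$$X = \begin{Bmatrix} u_1 & u_2 & u_3 & u_4 & u_5 & u_6 & u_7 & u_8 \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{8} & \frac{1}{8} & \frac{1}{16} & \frac{1}{16} & \frac{1}{16} & \frac{1}{16} \end{Bmatrix}$$

Codewords:

$X_1 = 00$, $X_2 = 01$ (probability $2^{-2}$ each)
$X_3 = 100$, $X_4 = 101$ (probability $2^{-3}$ each)
$X_5 = 1100$, $X_6 = 1101$, $X_7 = 1110$, $X_8 = 1111$ (probability $2^{-4}$ each)

[Figure: binary code tree with branches labeled 0 and 1; $X_1$, $X_2$ are leaves at depth 2, $X_3$, $X_4$ at depth 3, and $X_5$ to $X_8$ at depth 4.]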

Received sequence: 01 1110 1100 101 00 100
Decoded as: $u_2\ u_7\ u_5\ u_4\ u_1\ u_3$

: uniquely decodable and instantaneously decodable

$$\bar{N} = \frac{1}{4} \times 2 + \frac{1}{4} \times 2 + \frac{1}{8} \times 3 + \frac{1}{8} \times 3 + \frac{1}{16} \times 4 \times 4 = 2.75 \text{ bits}$$
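The decoding and the average-length computation above can be sketched in Python. This is only an illustration under the assumptions of the example: the tables `CODE` and `PROB` copy the codewords and probabilities from the slide, while the helper names `decode` and `average_length` are hypothetical and not part of the lecture.

```python
from fractions import Fraction

# Codeword table from the example (symbol -> codeword).
CODE = {
    "u1": "00", "u2": "01", "u3": "100", "u4": "101",
    "u5": "1100", "u6": "1101", "u7": "1110", "u8": "1111",
}
# Source probabilities from the example.
PROB = {
    "u1": Fraction(1, 4), "u2": Fraction(1, 4),
    "u3": Fraction(1, 8), "u4": Fraction(1, 8),
    "u5": Fraction(1, 16), "u6": Fraction(1, 16),
    "u7": Fraction(1, 16), "u8": Fraction(1, 16),
}

def decode(bits, code=CODE):
    """Emit a symbol as soon as the bits read so far form a codeword.

    This greedy rule is only valid for prefix-free codes.
    """
    inverse = {word: sym for sym, word in code.items()}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            symbols.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return symbols

def average_length(code=CODE, prob=PROB):
    """Average codeword length: sum of p_i * N_i."""
    return sum(prob[s] * len(code[s]) for s in code)

received = "".join(["01", "1110", "1100", "101", "00", "100"])
decoded = decode(received)
avg = average_length()
print(decoded, float(avg))
```

Because no codeword is a prefix of another, the decoder never needs to look ahead, which is what "instantaneously decodable" means in practice.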

Entropy of information sources:

$$H(x) = -\sum_{i=1}^{8} p_i \log p_i = 2.75 \text{ bits/symbol}$$

(logarithm to base 2, for the source of the example)

In general,

$$\bar{N} \ge \frac{H(x)}{\log n}, \qquad n : \text{size of the code alphabet}$$

⇒ Entropy provides the lower bound of the average codeword length for a source code.
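As a quick numerical check of the entropy value, here is a minimal Python sketch. The function names `entropy` and `length_lower_bound` are hypothetical; the probabilities are those of the eight-symbol example, and the entropy is measured in bits.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p_i log2 p_i (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def length_lower_bound(probs, n=2):
    """Lower bound H / log n on the average codeword length,
    for a code alphabet of size n (both logs taken to base 2)."""
    return entropy(probs) / math.log2(n)

probs = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16]
h = entropy(probs)
bound_binary = length_lower_bound(probs, n=2)
print(h, bound_binary)
```

For a binary code alphabet the bound equals the entropy itself, so the code of the example, with average length 2.75 bits, meets the bound.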

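Only when $p_i = n^{-N_i}$ ⇒ $\bar{N} = \dfrac{H(x)}{\log n}$

pf:

$$\begin{aligned}
H(x) &= -\sum_{i=1}^{M} p_i \log p_i \\
     &= -\sum_{i=1}^{M} n^{-N_i} \log n^{-N_i} \\
     &= \sum_{i=1}^{M} N_i\, n^{-N_i} (\log n) \\
     &= (\log n) \sum_{i=1}^{M} N_i\, n^{-N_i} \\
     &= (\log n) \sum_{i=1}^{M} N_i\, p_i = (\log n)\, \bar{N}
\end{aligned}$$

$$\therefore\ \bar{N} = \frac{H(x)}{\log n}$$

100% efficiency can occur only when every probability in the distribution is a negative integer power of $n$.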
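The efficiency statement can be illustrated numerically. The sketch below is an assumption-laden illustration: `efficiency` is a hypothetical helper that computes $H(x) / (\bar{N} \log n)$, and the second distribution is a non-dyadic one chosen here only for contrast.

```python
import math

def efficiency(probs, lengths, n=2):
    """Ratio H / (N_bar * log n); equals 1 when p_i = n**(-N_i) for all i."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)      # entropy in bits
    avg_len = sum(p * l for p, l in zip(probs, lengths))    # average length
    return h / (avg_len * math.log2(n))

# Dyadic probabilities: p_i = 2**(-N_i), so efficiency is 100%.
dyadic = efficiency(
    [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16],
    [2, 2, 3, 3, 4, 4, 4, 4],
)
# Non-dyadic probabilities: efficiency stays below 100%.
non_dyadic = efficiency([0.4, 0.2, 0.2, 0.1, 0.1], [1, 2, 3, 4, 4])
print(dyadic, non_dyadic)
```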

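[Figure: Venn diagram of code classes. The instantaneously decodable codes form a subset of the uniquely decodable codes; uniquely decodable codes outside that subset are not instantaneously decodable; codes outside the uniquely decodable set are not uniquely decodable.]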

Theorem:

Let a code have codeword lengths $N_1, N_2, \ldots, N_M$ and have $n$ symbols in the code alphabet. If the code is uniquely decodable, then the Kraft inequality

$$\sum_{i=1}^{M} n^{-N_i} \le 1$$

must be satisfied.
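The Kraft inequality is easy to test for a given list of codeword lengths. The following Python sketch uses exact rational arithmetic; the names `kraft_sum` and `satisfies_kraft` are hypothetical helpers introduced for illustration.

```python
from fractions import Fraction

def kraft_sum(lengths, n=2):
    """Exact value of sum over i of n**(-N_i)."""
    return sum(Fraction(1, n ** l) for l in lengths)

def satisfies_kraft(lengths, n=2):
    """True if the lengths satisfy the Kraft inequality for alphabet size n."""
    return kraft_sum(lengths, n) <= 1

ok_sum = kraft_sum([2, 2, 3, 3, 4, 4, 4, 4])   # lengths of the earlier example
bad = satisfies_kraft([1, 1, 2])               # 1/2 + 1/2 + 1/4 > 1
print(ok_sum, bad)
```

By the theorem, a set of lengths that fails this check, such as 1, 1, 2 over a binary alphabet, cannot belong to any uniquely decodable code.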

Lemma 1:

A uniquely decodable code is a prefix code (prefix-free code) if it has the prefix property, which requires that no codeword be a proper prefix of any other codeword.

Lemma 2: An instantaneously decodable code

(i) is uniquely decodable (prefix-free code)
(ii) satisfies the Kraft inequality

        C2     C3     C4      C5
S1      00     0      0       1
S2      01     10     01      01
S3      10     110    011     011
S4      11     111    0111    0111

uniquely decodable + prefix property ⇒ instantaneously decodable (prefix-free code)

prefix-free code → uniquely decodable; the converse does not hold (counterexample: C4)

uniquely decodable → Kraft inequality; the converse does not hold (counterexample: C5)
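The roles of C4 and C5 as counterexamples can be checked with a short Python sketch. The codeword lists are taken from the table; the helper names `is_prefix_free`, `kraft_sum`, and `parsings` are hypothetical. Note that `parsings` only exhibits ambiguity for one given string; finding a single parsing for one string does not by itself prove unique decodability.

```python
from fractions import Fraction

CODES = {
    "C2": ["00", "01", "10", "11"],
    "C3": ["0", "10", "110", "111"],
    "C4": ["0", "01", "011", "0111"],
    "C5": ["1", "01", "011", "0111"],
}

def is_prefix_free(words):
    """True if no codeword is a proper prefix of another codeword."""
    return not any(a != b and b.startswith(a) for a in words for b in words)

def kraft_sum(words, n=2):
    """Exact Kraft sum computed from the codeword lengths."""
    return sum(Fraction(1, n ** len(w)) for w in words)

def parsings(words, s):
    """Return every way of splitting the string s into codewords."""
    if s == "":
        return [[]]
    result = []
    for w in words:
        if s.startswith(w):
            result += [[w] + rest for rest in parsings(words, s[len(w):])]
    return result

prefix_free = {name: is_prefix_free(ws) for name, ws in CODES.items()}
kraft = {name: kraft_sum(ws) for name, ws in CODES.items()}
c4_parsings = parsings(CODES["C4"], "011")
c5_parsings = parsings(CODES["C5"], "011")
print(prefix_free, kraft, c4_parsings, c5_parsings)
```

C5 satisfies the Kraft inequality, yet the string 011 can be read either as the single codeword 011 or as 01 followed by 1, so C5 is not uniquely decodable.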

Encoding Algorithm (based on the Kraft inequality)

Assume $l_1 \le l_2 \le \cdots \le l_q$ (codeword lengths; $r$ is the size of the code alphabet).

Let

$$S_1 = 0, \qquad S_{i+1} = S_i + r^{-l_i} \quad (1 \le i < q)$$

⇒ $S_i < S_j$ for $i < j$

$$S_{i+1} = S_i + r^{-l_i}$$

⇓

$X_i$ is not a prefix of $X_{i+1}$,

since the digit $C_{-l_i}^{(i)}$ will be changed due to the addition of $r^{-l_i}$.

$$\begin{aligned}
S_1 &= 0 \\
S_2 &= S_1 + r^{-l_1} = r^{-l_1} \\
S_3 &= S_2 + r^{-l_2} = r^{-l_1} + r^{-l_2} \\
    &\ \,\vdots \\
S_q &= S_{q-1} + r^{-l_{q-1}} = r^{-l_1} + r^{-l_2} + \cdots + r^{-l_{q-1}}
     = \sum_{i=1}^{q-1} r^{-l_i}
     = \left( \sum_{i=1}^{q} r^{-l_i} \right) - r^{-l_q}
\end{aligned}$$

By assumption (Kraft inequality ← uniquely decodable):

$$\sum_{i=1}^{q} r^{-l_i} \le 1$$

$$\Rightarrow\ S_q = \sum_{i=1}^{q} r^{-l_i} - r^{-l_q} \le 1 - r^{-l_q} < 1$$

$$\Rightarrow\ S_i < 1 \quad \text{for all } i,\ r \ge 2$$

$r$-ary number representation of $S_i$:

$$S_i = C_{-1}^{(i)} r^{-1} + C_{-2}^{(i)} r^{-2} + \cdots + C_{-l_i}^{(i)} r^{-l_i} + \cdots$$

where $C_z^{(i)} \in \{0, 1, \ldots, r-1\}$

or

$$S_i = \left( .\, C_{-1}^{(i)}, C_{-2}^{(i)}, \ldots, C_{-l_i}^{(i)}, \ldots \right)_r$$

codeword:

$$X_i = C_{-1}^{(i)} C_{-2}^{(i)} \cdots C_{-l_i}^{(i)}$$

So this algorithm is guaranteed to produce instantaneously decodable codes.

Ex: $l_i = 2, 2, 3, 3, 4, 4, 4, 4$; $r = 2$

check: $2 \times 2^{-2} + 2 \times 2^{-3} + 4 \times 2^{-4} = 1 \le 1$

$S_1 = 0 = (.0000\ldots)_2$, $X_1 = 00$
$S_2 = S_1 + 2^{-2} = (.0100\ldots)_2$, $X_2 = 01$
$S_3 = S_2 + 2^{-2} = (.1000\ldots)_2$, $X_3 = 100$
$S_4 = S_3 + 2^{-3} = (.1010\ldots)_2$, $X_4 = 101$
$S_5 = S_4 + 2^{-3} = (.1100\ldots)_2$, $X_5 = 1100$
$S_6 = S_5 + 2^{-4} = (.11010\ldots)_2$, $X_6 = 1101$
$S_7 = S_6 + 2^{-4} = (.1110\ldots)_2$, $X_7 = 1110$
$S_8 = S_7 + 2^{-4} = (.11110\ldots)_2$, $X_8 = 1111$

Remark: (1) instantaneously decodable
(2) this encoding procedure is independent of $p_i$
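The construction above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name `kraft_code` is hypothetical, exact fractions are used for the partial sums $S_i$, and digits are written as decimal characters, so the sketch assumes $r \le 10$.

```python
from fractions import Fraction

def kraft_code(lengths, r=2):
    """Build codewords from lengths via S_1 = 0, S_{i+1} = S_i + r**(-l_i).

    The codeword X_i is the first l_i digits of the r-ary expansion of S_i.
    Assumes r <= 10 so that each digit fits in one decimal character.
    """
    lengths = sorted(lengths)                      # l_1 <= l_2 <= ... <= l_q
    if sum(Fraction(1, r ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    codewords = []
    s = Fraction(0)                                # S_1 = 0
    for l in lengths:
        x, digits = s, []
        for _ in range(l):                         # extract l r-ary digits of S_i
            x *= r
            d = int(x)                             # integer part (x >= 0)
            digits.append(str(d))
            x -= d
        codewords.append("".join(digits))
        s += Fraction(1, r ** l)                   # S_{i+1} = S_i + r**(-l_i)
    return codewords

words = kraft_code([2, 2, 3, 3, 4, 4, 4, 4], r=2)
print(words)
```

Only the lengths enter the computation, which reflects remark (2): the procedure does not depend on the probabilities $p_i$.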

Noise in Huffman Coding Probabilities:

Suppose that the estimates of the probabilities $p_i$ are not accurate. How much does the average code length suffer?

Let $p_i$ be the original Huffman code design probabilities, and let $p'_i = p_i + e_i$ be the probabilities of the source that is actually used.

Since $\sum_{i=1}^{q} p'_i = \sum_{i=1}^{q} p_i = 1$, clearly

$$\frac{1}{q} \sum_{i=1}^{q} e_i = 0 \qquad (1)$$

As one measure of the size of the error, we get

$$\frac{1}{q} \sum_{i=1}^{q} e_i^2 = \sigma^2 \qquad (2)$$

Now the new average length per symbol is

$$L' = \sum_{i=1}^{q} p'_i l_i = \sum_{i=1}^{q} p_i l_i + \sum_{i=1}^{q} e_i l_i = L + q \left( \frac{1}{q} \sum_{i=1}^{q} e_i l_i \right)$$

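With the two conditions on the $e_i$, (1) and (2), we resort to the method of Lagrange multipliers to find the extreme cases:

$$\mathcal{L} = \frac{1}{q} \sum_{i=1}^{q} e_i l_i - \lambda \left( \frac{1}{q} \sum_{i=1}^{q} e_i \right) - \mu \left( \frac{1}{q} \sum_{i=1}^{q} e_i^2 - \sigma^2 \right)$$

$$\frac{\partial \mathcal{L}}{\partial e_i} = \frac{1}{q} \left( l_i - \lambda - 2\mu e_i \right) = 0 \qquad (i = 1, 2, \ldots, q)$$

Averaging this condition over $i$ and using (1), then multiplying it by $e_i$, averaging, and using (1) and (2):

$$\Rightarrow\ \lambda = \frac{1}{q} \sum_{i=1}^{q} l_i \quad \text{and} \quad 2\mu\sigma^2 = \frac{1}{q} \sum_{i=1}^{q} e_i l_i$$

Multiplying the same condition by $l_i$ and averaging gives $\frac{1}{q} \sum l_i^2 - \lambda \frac{1}{q} \sum l_i = 2\mu \cdot \frac{1}{q} \sum e_i l_i$, so that

$$\Rightarrow\ \left( \frac{1}{q} \sum_{i=1}^{q} e_i l_i \right)^2 = \sigma^2 \left[ \frac{1}{q} \sum_{i=1}^{q} l_i^2 - \left( \frac{1}{q} \sum_{i=1}^{q} l_i \right)^2 \right] = (\text{variance of } e_i) \cdot (\text{variance of } l_i)$$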

The more variable the $l_i$, the more harm the errors in the estimates of the $p_i$ can cause to the average codeword length.
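The extreme case can be checked numerically. The Python sketch below assumes the stationarity condition $l_i - \lambda - 2\mu e_i = 0$, which gives $e_i = \sigma (l_i - \lambda) / \sqrt{\mathrm{var}(l)}$ with the unweighted mean $\lambda$ and unweighted variance of the lengths. The function name `worst_case_errors`, the lengths, and the value of `sigma` are illustrative choices, and the sketch does not check that the perturbed probabilities $p_i + e_i$ stay within $[0, 1]$.

```python
import math

def worst_case_errors(lengths, sigma):
    """Errors e_i that maximize (1/q) * sum e_i l_i under constraints (1), (2).

    Returns the list of e_i and the unweighted variance of the lengths.
    """
    q = len(lengths)
    mean_l = sum(lengths) / q                               # lambda
    var_l = sum((l - mean_l) ** 2 for l in lengths) / q     # variance of l_i
    if var_l == 0:
        raise ValueError("equal lengths: the errors cannot change the average")
    scale = sigma / math.sqrt(var_l)                        # equals 1 / (2 mu)
    return [scale * (l - mean_l) for l in lengths], var_l

lengths = [1, 2, 3, 4, 4]
sigma = 0.01
e, var_l = worst_case_errors(lengths, sigma)
q = len(lengths)
mean_e = sum(e) / q                                   # constraint (1): 0
mean_e2 = sum(x * x for x in e) / q                   # constraint (2): sigma**2
shift = sum(x * l for x, l in zip(e, lengths)) / q    # (1/q) * sum e_i l_i
print(mean_e, mean_e2, shift, sigma * math.sqrt(var_l))
```

The printed shift equals $\sigma \sqrt{\mathrm{var}(l)}$, so its square is the product of the two variances, and the actual change in average length is $q$ times this shift.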

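Ex: $p_1 = 0.4$, $p_2 = 0.2$, $p_3 = 0.2$, $p_4 = 0.1$, $p_5 = 0.1$

[Approach 1]

Huffman reduction: 0.1 + 0.1 → 0.2; 0.2 + 0.2 → 0.4; 0.4 + 0.2 → 0.6; 0.6 + 0.4 → 1

Symbol   Probability   Codeword
u1       0.4           1
u2       0.2           01
u3       0.2           000
u4       0.1           0010
u5       0.1           0011

$$L = 0.4 \times 1 + 0.2 \times 2 + 0.2 \times 3 + 0.1 \times 4 + 0.1 \times 4 = 2.2 \text{ bits}$$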

[Approach 2]

Huffman reduction: 0.1 + 0.1 → 0.2; 0.2 + 0.2 → 0.4; 0.4 + 0.2 → 0.6; 0.6 + 0.4 → 1

Symbol   Probability   Codeword
u1       0.4           00
u2       0.2           10
u3       0.2           11
u4       0.1           010
u5       0.1           011

$$L = 0.4 \times 2 + 0.2 \times 2 + 0.2 \times 2 + 0.1 \times 3 + 0.1 \times 3 = 2.2 \text{ bits}$$

Variance of the codeword lengths for Approach 1:

$$\mathrm{var}(1) = 0.4\,(1 - 2.2)^2 + 0.2\,(2 - 2.2)^2 + 0.2\,(3 - 2.2)^2 + 0.1\,(4 - 2.2)^2 + 0.1\,(4 - 2.2)^2 = 1.36$$

but, for Approach 2:

$$\mathrm{var}(2) = 0.4\,(2 - 2.2)^2 + 0.2\,(2 - 2.2)^2 + 0.2\,(2 - 2.2)^2 + 0.1\,(3 - 2.2)^2 + 0.1\,(3 - 2.2)^2 = 0.16$$

⇒ Approach 2 is preferable due to its smaller variance!
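The two averages and variances can be reproduced exactly with a few lines of Python. The codeword lists are those of the two approaches; the helper name `length_stats` is hypothetical, and the variance computed here is the probability-weighted one used in the example.

```python
from fractions import Fraction

PROBS = [Fraction(s) for s in ("0.4", "0.2", "0.2", "0.1", "0.1")]
APPROACH_1 = ["1", "01", "000", "0010", "0011"]
APPROACH_2 = ["00", "10", "11", "010", "011"]

def length_stats(probs, codewords):
    """Probability-weighted mean and variance of the codeword lengths."""
    mean = sum(p * len(w) for p, w in zip(probs, codewords))
    var = sum(p * (len(w) - mean) ** 2 for p, w in zip(probs, codewords))
    return mean, var

mean1, var1 = length_stats(PROBS, APPROACH_1)
mean2, var2 = length_stats(PROBS, APPROACH_2)
print(float(mean1), float(var1), float(mean2), float(var2))
```

Both codes have the same average length, so the variance is the only quantity that distinguishes them here.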
