
Embedding Information in Grayscale Images

Marten van Dijk and Frans Willems

Philips Research Laboratories, Eindhoven

Abstract

Grayscale images can be represented as sequences of integer-valued symbols. If such a symbol has alphabet $\{0, 1, \dots, 2^B - 1\}$ it could be represented by $B$ binary digits. To embed information in these symbols, we are allowed to distort them. The distortion measure that we consider here is mean squared error. In this setup there is a so-called "rate-distortion function" that tells us what the largest embedding rate is, given a certain distortion level. First we determine this rate-distortion function. Next we compare the performance of LSB modulation to this rate-distortion function. Then several embedding codes are proposed: (i) codes that only affect the LSB of a symbol, (ii) so-called ternary codes in which a symbol can be changed by at most one, and (iii) codes that process a pair of symbols each time.

1 System description

[Block diagram: a discrete memoryless source emits $X_1^N$ to the encoder, which also receives the message $W$; the encoder output $Y_1^N$ goes to the destination and to a decoder that produces $\hat{W}$.]

Figure 1: A model of an information embedding system.

The embedding system that we study is shown in figure 1. A discrete memoryless source emits the host sequence $x_1^N = (x_1, x_2, \dots, x_N)$. The symbols $x_n$ for $n = 1, \dots, N$ assume values from the alphabet $\mathcal{X}$. Their distribution is $\{P(x), x \in \mathcal{X}\}$. A message source produces the message index $w \in \{1, 2, \dots, M\}$ with probability $1/M$, independent of $x_1^N$. Based on the message index $w$ and the host sequence $x_1^N$ the encoder (embedder) produces the composite sequence

$$y_1^N = f(x_1^N, w).$$


The symbols from $y_1^N = (y_1, y_2, \dots, y_N)$ assume values in the alphabet $\mathcal{Y}$. We require that from the composite sequence $y_1^N$ the embedded message can be reconstructed, i.e. there is a decoder producing the estimate $\hat{W} = g(y_1^N)$ such that

$$\Pr\{\hat{W} \neq W\} = 0.$$

Moreover the sequence $y_1^N$ must be "close" to $x_1^N$, i.e. the expected distortion must satisfy

$$\bar{D} = E[d(X_1^N, Y_1^N)] = \sum_{x_1^N, w} \Pr\{X_1^N = x_1^N\} \Pr\{W = w\}\, d(x_1^N, f(x_1^N, w)) \le \Delta,$$

where $\Delta$ is the maximum allowable distortion and

$$d(x_1^N, y_1^N) = \frac{1}{N} \sum_{n=1}^{N} D(x_n, y_n)$$

is the distortion between $x_1^N$ and $y_1^N$, for some specified distortion matrix $D(\cdot, \cdot)$. Finally we define the embedding rate $R$ as

$$R = \frac{1}{N} \log_2 M.$$

2 The "rate-distortion" function

We are obviously interested in finding codes that combine a large embedding rate $R$ with a small distortion $\bar{D}$. Only recently several authors, e.g. [3] and [1], realized that this setup is strongly related to the "defect channel" problem, which was first investigated in a general form by Gelfand and Pinsker [2]. In [4] the so-called "rate-distortion function" that corresponds to a slightly more general embedding model than the one we study here was determined. This function gives the maximum embedding rate $R$ for a given distortion level $\Delta$. For our model ($Z \equiv Y$) this function turns out to be

$$R(\Delta) = \max_{P(y|x)\,:\ \sum_{x,y} P(x)P(y|x)D(x,y) \le \Delta} H(Y|X).$$

Note that this is not a rate-distortion function in the usual sense.

3 Grayscale symbols, mean squared error distortion

[Diagram: the levels $0, 1, 2, \dots, 2^B - 1$ for $x$ and $y$, with $D(x, y) = (x - y)^2$.]

Figure 2: The grayscale alphabet $G$ and mean squared error distortion.

A grayscale symbol assumes a value from an integer alphabet $G$. The cardinality of $G$ should not be too small. E.g. let $G = \{0, 1, \dots, 2^B - 1\}$ for some positive integer $B$, then each symbol can be represented by a vector of $B$ binary digits. Now let $\mathcal{X} = \mathcal{Y} = G$. If a grayscale symbol $x \in G$ is changed into a grayscale symbol $y \in G$ then the resulting squared error distortion is

$$D(x, y) = (y - x)^2,$$

see figure 2. To determine $R(\Delta)$ for this case, let $p_i = \sum_{y - x = i} P(x)P(y|x)$ for all integer $i$, then

$$H(Y|X) = H(Y - X|X) \le H(Y - X) = \sum_i p_i \log_2 \frac{1}{p_i}$$

and

$$\sum_{x,y} P(x)P(y|x)(y - x)^2 = \sum_i p_i i^2.$$

The corresponding bound for $R(\Delta)$ can be achieved by taking $i = y - x$ independent of $x$, which is possible if we assume that $|G|$ is large and we neglect boundary effects. Now we have to maximize $\sum_i p_i \log_2 \frac{1}{p_i}$ under the constraint $\sum_i p_i i^2 \le \Delta$. Therefore consider for some $\alpha > 0$ the "Gaussian" distribution $\{p_i^*\}$ where

$$p_i^* = \exp(-\alpha i^2)/\beta,$$

with $\beta = \sum_i \exp(-\alpha i^2)$ and having variance $\sum_i p_i^* i^2 = \Delta$. Then for any distribution $\{p_i\}$ with variance $\sum_i p_i i^2 \le \Delta$ we can write

$$\sum_i p_i \ln \frac{1}{p_i^*} = \sum_i p_i \ln(\beta \exp(\alpha i^2)) = \ln \beta + \alpha \sum_i p_i i^2 \le \ln \beta + \alpha\Delta,$$

and therefore its entropy (in nats) satisfies

$$\sum_i p_i \ln \frac{1}{p_i} = \sum_i p_i \ln \frac{p_i^*}{p_i} + \sum_i p_i \ln \frac{1}{p_i^*} \le \ln \beta + \alpha\Delta,$$

where the first sum on the right-hand side is a negative divergence and hence at most zero.

Equality is achieved only for $\{p_i\}$ equal to the Gaussian distribution $\{p_i^*\}$. Therefore to compute the rate-distortion function we only need to vary $\alpha$ and compute the entropy and variance of $\{p_i^*\}$. We can now make a plot of $R(\Delta)$, see figure 3. This plot shows that to achieve a rate of 1 bit/symbol we need an average distortion of at least $\Delta \approx 0.22$.
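The recipe above (vary $\alpha$, then compute the entropy and variance of $\{p_i^*\}$) can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the function names, the truncation of the support to $|i| \le 50$ and the bisection bounds are choices made here, not part of the paper.

```python
import math

def discrete_gaussian(alpha, span=50):
    """Return {i: p_i*} with p_i* proportional to exp(-alpha * i^2).

    The support is truncated to |i| <= span (an assumption of this sketch).
    """
    weights = {i: math.exp(-alpha * i * i) for i in range(-span, span + 1)}
    beta = sum(weights.values())
    return {i: w / beta for i, w in weights.items()}

def rate_and_distortion(alpha, span=50):
    """Entropy in bits and variance of the discrete Gaussian for a given alpha."""
    p = discrete_gaussian(alpha, span)
    rate = -sum(pi * math.log2(pi) for pi in p.values() if pi > 0.0)
    distortion = sum(pi * i * i for i, pi in p.items())
    return rate, distortion

def distortion_for_rate(target_rate, lo=1e-3, hi=50.0, iters=100):
    """Bisect on alpha; the entropy decreases as alpha grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        rate, _ = rate_and_distortion(mid)
        if rate > target_rate:
            lo = mid  # too much entropy, so alpha must be larger
        else:
            hi = mid
    return rate_and_distortion(0.5 * (lo + hi))[1]

if __name__ == "__main__":
    print(distortion_for_rate(1.0))
```

For a target of 1 bit per symbol the bisection should return a distortion a little above 0.22, in line with the value read from figure 3.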

4 Least-significant bit(s) modulation

Traditional embedding methods assume that a grayscale symbol $x \in G$ is represented as a binary vector $b_{B-1}, \dots, b_1, b_0$ such that $x = \sum_{i=0}^{B-1} b_i 2^i$. Messages are now embedded only in the least-significant bits (LSBs) of this binary representation. Therefore this form of embedding is called least-significant bits modulation. Assume that message $w$ is embedded in the $R$ LSBs ($R$ is a positive integer). The following tables show what happens for $R = 1$. Observe that for $R = 1$ two of the four equally likely combinations of host LSB and message bit give squared error 1, so the distortion is $\bar{D} = \frac{1}{4} \cdot 2 = \frac{1}{2}$.


Figure 3: $R(\Delta)$ in bits per symbol as a function of the distortion per symbol (horizontally), together with two LSB distortion-rate pairs (o) and time-sharing the $R = 1$ LSB method (...).

Composite symbol y:

x           w = 0    w = 1
...
8 = 1000    y = 8    y = 9
9 = 1001    y = 8    y = 9
...

Distortion D(x, y):

x           w = 0    w = 1
...
8 = 1000    D = 0    D = 1
9 = 1001    D = 1    D = 0
...

Similarly for $R = 2$ the average distortion is $\bar{D} = \frac{1}{16}(6 \cdot 1 + 4 \cdot 4 + 2 \cdot 9) = 5/2$. The distortion-rate pairs $(\bar{D}, R) = (1/2, 1)$ and $(5/2, 2)$ are plotted in figure 3, denoted by o's. Using the LSB modulation scheme $(\bar{D}, R) = (1/2, 1)$ only for a fraction of the symbols, we achieve

$$R(\bar{D}) = 2\bar{D}, \quad \text{for } 0 \le \bar{D} \le 1/2.$$

The resulting distortion-rate pairs are plotted in the figure with a dotted line. The problem we want to address next is: "How can we do better than $R(\bar{D})/\bar{D} = 2$?"
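The two LSB distortion values can be checked by enumerating all combinations of host LSBs and message bits. The Python sketch below assumes uniformly distributed host LSBs and messages; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction

def lsb_embed(x, w, r):
    """Overwrite the r least-significant bits of x with the r-bit message w."""
    return ((x >> r) << r) | w

def lsb_average_distortion(r):
    """Average squared error when host LSBs and messages are uniform."""
    size = 1 << r
    total = sum(
        (lsb_embed(x, w, r) - x) ** 2
        for x in range(size)
        for w in range(size)
    )
    return Fraction(total, size * size)

if __name__ == "__main__":
    for r in (1, 2):
        print(r, lsb_average_distortion(r))
```

Only the `r` low bits of `x` matter for the distortion, which is why the enumeration runs over `range(size)` rather than over the whole alphabet.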

5 Coded LSB modulation

In this section we consider coding methods that affect only the LSB of each grayscale symbol. The first method is based on Hamming codes; after that we consider a code based on the binary Golay code.

5.1 Embedding based on binary Hamming codes

To describe the embedding code consider the (7, 4, 3) Hamming code. Fix a certain coset $C_i$, for some $i = 0, 1, \dots, 7$; the coset index is selected by the message. Consider only the vector of LSBs of seven host symbols. Denote this vector by $x_1^7 = x_1, x_2, \dots, x_7$. Determine a binary vector $y_1^7 = y_1, y_2, \dots, y_7 \in C_i$ which is closest to $x_1^7$ in the Hamming sense. To obtain the composite sequence just replace the LSB vector $x_1^7$ by $y_1^7$.

Next we determine the average distortion $\bar{D}$ of this method. The Hamming code is perfect with $d_H = 3$, thus we will find a word $y_1^7 \in C_i$ at distance 1 from $x_1^7$ with probability 7/8 and a word at distance 0 with probability 1/8 (some smoothness of $P(x)$ is assumed here). Hence $\bar{D} = (7/8)/7 = 1/8$. The decoder can detect the coset to which the composite sequence $y_1^7$ belongs, hence reliable transmission is possible with rate $R = (\log_2 8)/7 = 3/7$ bit/symbol. Thus we achieve $(\bar{D}, R) = (1/8, 3/7)$. The ratio $R/\bar{D} = 24/7 \approx 3.429$, which is a factor 1.714 higher than LSB modulation. A Hamming code of length $2^m - 1$ for $m = 2, 3, 4, \dots$ gives

$$R = \frac{m}{2^m - 1} \quad \text{and} \quad \bar{D} = \frac{1}{2^m}.$$

Hence the ratio

$$R/\bar{D} = \frac{m 2^m}{2^m - 1},$$

which is approximately equal to $m$ for large $m$ (see figure 4).
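The coset construction for the (7, 4, 3) code can be sketched with syndromes. The Python fragment below assumes one particular parity-check matrix (column $j$ is the number $j$ written in binary) and identifies the coset index with the syndrome; the paper does not fix these details, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def syndrome(bits):
    """Syndrome of a length-7 bit vector.

    Assumption: column j (1-based) of the parity-check matrix is j in binary,
    so the syndrome is the XOR of the positions that hold a 1.
    """
    s = 0
    for j, b in enumerate(bits, start=1):
        if b:
            s ^= j
    return s

def embed(host, w):
    """Change at most one LSB of seven host symbols so the LSB vector is in coset w."""
    lsbs = [x & 1 for x in host]
    flip = syndrome(lsbs) ^ w  # 0 means the LSB vector already lies in coset w
    out = list(host)
    if flip:
        out[flip - 1] ^= 1  # toggling the LSB moves the symbol by +1 or -1
    return out

def extract(composite):
    """The decoder reads the message as the syndrome of the composite LSBs."""
    return syndrome([y & 1 for y in composite])

def average_distortion():
    """Average squared error per symbol over all LSB vectors and messages."""
    total = 0
    for lsbs in product((0, 1), repeat=7):
        for w in range(8):
            composite = embed(list(lsbs), w)
            total += sum((y - x) ** 2 for x, y in zip(lsbs, composite))
    return Fraction(total, 128 * 8 * 7)

if __name__ == "__main__":
    host = [8, 9, 8, 8, 9, 9, 8]
    composite = embed(host, 5)
    print(composite, extract(composite), average_distortion())
```

Because the code is perfect, a single flip always suffices, and the enumeration in `average_distortion` reproduces $\bar{D} = 1/8$ for uniform LSB vectors and messages.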

5.2 Embedding based on the binary Golay code

Instead of the binary Hamming code we can use the (23, 12, 7) binary Golay code. This leads to

$$R = \frac{11}{23} \approx 0.478 \quad \text{and} \quad \bar{D} = \frac{\binom{23}{1} \cdot 1 + \binom{23}{2} \cdot 2 + \binom{23}{3} \cdot 3}{2048 \cdot 23} \approx 0.124.$$

Now the ratio $R/\bar{D} \approx 3.86$, which is almost a factor two better than what can be achieved with LSB modulation (see figure 4).

Figure 4: Rate-distortion curve, time-sharing $R = 1$ LSB modulation (...), binary Hamming codes (o), and the binary Golay code (x).


6 Ternary embedding methods

We start this section with a definition. A grayscale symbol $x$ is said to be in class $w$ iff $x \bmod 3 = w$, for $w = 0, 1, 2$. Now we are ready to discuss uncoded ternary embedding and after that again two coded embedding methods.

6.1 Uncoded ternary modulation

Suppose that message $w \in \{0, 1, 2\}$ is to be embedded in the grayscale symbol $x$. The decoder determines the message $w$ simply by looking at the class of $y$. If $x$ is in class $w$ then $y = x$ is chosen, otherwise we change it into the symbol $y$ in class $w$ such that $D(x, y) = (y - x)^2$ is minimal, see the tables below.

Composite symbol y:

x     w = 0    w = 1    w = 2
...
9     y = 9    y = 10   y = 8
10    y = 9    y = 10   y = 11
11    y = 12   y = 10   y = 11
...

Distortion D(x, y):

x     w = 0    w = 1    w = 2
...
9     D = 0    D = 1    D = 1
10    D = 1    D = 0    D = 1
11    D = 1    D = 1    D = 0
...

The obtained embedding rate is $R = \log_2 3 \approx 1.585$. The corresponding average distortion is $\bar{D} = \frac{2}{3} \cdot 1 = 2/3$. This results in the ratio $R/\bar{D} = \frac{3}{2} \log_2 3 \approx 2.377$, which is quite good!
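The uncoded ternary rule amounts to moving $x$ by at most one step to the nearest symbol of the wanted class. A minimal Python sketch, with illustrative function names and with boundary effects at the ends of the alphabet neglected as in the text:

```python
def ternary_embed(x, w):
    """Return the symbol nearest to x whose class (value mod 3) equals w.

    Boundary effects are ignored: the result may fall outside the alphabet
    when x is the smallest or largest grayscale value.
    """
    shift = (w - x) % 3  # 0, 1 or 2 steps upward reach class w
    if shift == 2:
        shift = -1  # one step down is closer than two steps up
    return x + shift

def ternary_extract(y):
    """The decoder reads the message as the class of y."""
    return y % 3

if __name__ == "__main__":
    for x in (9, 10, 11):
        print(x, [ternary_embed(x, w) for w in range(3)])
```

Run on $x = 9, 10, 11$, the function reproduces the rows of the table of composite symbols above.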

6.2 Embedding with ternary Hamming codes

We can also design codes for modulating the class, based on ternary Hamming codes. For a given value $m$, i.e. the number of parity check symbols, the codeword length is $(3^m - 1)/2$. Therefore

$$R = \frac{2m \log_2 3}{3^m - 1} \quad \text{and} \quad \bar{D} = \frac{2}{3^m}.$$

Hence

$$R/\bar{D} = \frac{m 3^m \log_2 3}{3^m - 1},$$

see figure 5.

6.3 Embedding using the ternary Golay code

If we use the (11, 6, 5) ternary Golay code we get the following rate and distortion:

$$R = \frac{5 \log_2 3}{11} \approx 0.7204 \quad \text{and} \quad \bar{D} = \frac{\binom{11}{1} \cdot 1 \cdot 2 + \binom{11}{2} \cdot 2 \cdot 4}{243 \cdot 11} = \frac{42}{243} \approx 0.1728.$$

Hence $R/\bar{D} \approx 4.1682$ (see figure 5).
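The rate and distortion figures quoted for the Hamming and Golay codes all follow the same counting argument: in a perfect code every coset has a unique leader inside the covering radius, and a leader of weight $j$ costs squared error $j$. The Python helper below is a sketch of that arithmetic only; its name and parameters are hypothetical, and it assumes uniformly distributed cosets and a cost of 1 per changed symbol.

```python
import math
from fractions import Fraction

def perfect_code_point(n, redundancy, q, radius):
    """Return (R, D) for coset embedding with a q-ary perfect code.

    n: code length, redundancy: number of parity symbols n - k,
    radius: covering radius. Assumes each changed symbol costs squared
    error 1 and that all cosets are equally likely.
    """
    cosets = q ** redundancy
    # Number of error patterns of weight j is C(n, j) * (q - 1)^j.
    ball = sum(math.comb(n, j) * (q - 1) ** j for j in range(radius + 1))
    if ball != cosets:
        raise ValueError("parameters do not describe a perfect code")
    cost = sum(math.comb(n, j) * (q - 1) ** j * j for j in range(radius + 1))
    distortion = Fraction(cost, cosets * n)
    rate = redundancy * math.log2(q) / n
    return rate, distortion

if __name__ == "__main__":
    print(perfect_code_point(7, 3, 2, 1))    # binary Hamming, m = 3
    print(perfect_code_point(23, 11, 2, 3))  # binary Golay
    print(perfect_code_point(11, 5, 3, 2))   # ternary Golay
```

The sphere-packing check guards against parameter sets that are not perfect, for which the simple counting of coset leaders would not apply.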


Figure 5: Rate-distortion curve, time-sharing $R = 1$ LSB modulation (...), binary Hamming codes (o), ternary Hamming codes (*), and both Golay codes (x).

7 Two-dimensional codes

Finally we consider a method that modifies pairs of grayscale symbols $(x_a, x_b)$. To this end we "color" the two-dimensional rectangular grid with five colors. A point and its four nearest neighbors all have different colors, see the table below.

        x_a →
        1 2 3 4 0 1 2 3 4 0 1 2 3 4 0
x_b     4 0 1 2 3 4 0 1 2 3 4 0 1 2 3
 ↓      2 3 4 0 1 2 3 4 0 1 2 3 4 0 1
        0 1 2 3 4 0 1 2 3 4 0 1 2 3 4
        3 4 0 1 2 3 4 0 1 2 3 4 0 1 2

We want to embed in $x_a$ and $x_b$ a message $w \in \{0, 1, 2, 3, 4\}$. The decoder just looks at the color of the pair $(y_a, y_b)$. If $(x_a, x_b)$ does not have color $w$, change it into the nearest neighbor $(y_a, y_b)$ with color $w$. Now

$$\bar{D} = \frac{1 \cdot 0 + 4 \cdot 1}{5 \cdot 2} = 2/5 \quad \text{and} \quad R = \frac{\log_2 5}{2} \approx 1.16096,$$

and $R/\bar{D} \approx 2.90241$. This basic code can be used together with 5-ary Hamming codes. For a given $m$, i.e. the number of parity checks, we get a code length of $(5^m - 1)/4$ 5-ary symbols. This code processes $(5^m - 1)/2$ grayscale symbols, the embedding rate is

$$R = \frac{2m \log_2 5}{5^m - 1},$$

and the distortion is $\bar{D} = 2/5^m$.

The resulting distortion-rate pairs are shown in figure 6.
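The basic five-color code can be sketched as follows. The coloring $(x_a + 3x_b) \bmod 5$ used here is an assumption: it gives the same pattern as the table (the color increases by 1 along $x_a$ and by 3 along $x_b$) up to a shift of the origin, which the table does not specify. Function names are illustrative and boundary effects are neglected.

```python
def color(xa, xb):
    """Color of a grid point; assumed formula matching the table up to a shift."""
    return (xa + 3 * xb) % 5

def embed_pair(xa, xb, w):
    """Keep (xa, xb) or move to a nearest neighbor so that the color equals w."""
    candidates = [
        (xa, xb),
        (xa + 1, xb),
        (xa - 1, xb),
        (xa, xb + 1),
        (xa, xb - 1),
    ]
    for ya, yb in candidates:
        if color(ya, yb) == w:
            return ya, yb
    # Not reachable: the five candidates carry five distinct colors.
    raise AssertionError("no candidate with the requested color")

def extract_pair(ya, yb):
    """The decoder reads the message as the color of the composite pair."""
    return color(ya, yb)

if __name__ == "__main__":
    print([embed_pair(10, 20, w) for w in range(5)])
```

Exactly one of the five candidates has each color, so four of the five messages cost a total squared error of 1 over two symbols, which gives $\bar{D} = 2/5$.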


Figure 6: Rate-distortion curve, time-sharing $R = 1$ LSB modulation (...), binary Hamming codes (o), ternary Hamming codes (*), the Golay codes (x), and the 5-ary codes (+).

8 Conclusion

We have only looked at small distortions here. Moreover we have concentrated on perfect codes since this was so easy. It can be shown that Hamming codes are quite good for $\bar{D} \to 0$. For a moderate rate we have seen that Golay codes are rather good, but what about even better codes for even higher rates? How do we generalize the coloring results to a larger number of colors and more dimensions?

References

[1] J. Chou, S. Pradhan, L. El Ghaoui and K. Ramchandran, "A Robust Optimization Solution to the Data Hiding Problem Using Distributed Source Coding Principles," preprint, 2000.

[2] S. Gelfand and M. Pinsker, "Coding for a Channel with Random Parameters," Problems of Control and Information Theory, vol. 9, pp. 19-31, 1980.

[3] P. Moulin and J. O'Sullivan, "Information-theoretic Analysis of Information Hiding," preprint, 1999.

[4] F.M.J. Willems, "An Informationtheoretical Approach to Information Embedding," Proc. 21st Symp. Inform. Theory in the Benelux, pp. 255-260, Wassenaar, May 25-26, 2000.
