Embedding Information in Grayscale Images - Signal Processing ...

Embedding Information in Grayscale Images 

Marten van Dijk and Frans Willems 

Philips Research Laboratories, Eindhoven 

Abstract 

Grayscale images can be represented as sequences of integer-valued symbols. If 

such a symbol has alphabet {0, 1, · · · , 2 B − 1} it could be represented by B binary 

digits. To embed information in these symbols, we are allowed to distort them. The 

distortion measure that we consider here is mean squared error. In this setup there is 

a so-called “rate-distortion function” that tells us what the largest embedding rate is, 

given a certain distortion level. First we determine this rate-distortion function. Next 

we compare the performance of LSB modulation to this rate-distortion function. Then 

several embedding codes are proposed: (i) codes that only affect the LSB of a symbol, 

(ii) so-called ternary codes in which a symbol can be changed by at most one, and (iii) 

codes that process a pair of symbols each time. 

1 System description 

discrete 

memoryless 

source 

X N 1 

✲ 

encoder 

Y N 

1 

✲ 

destination 

✻ 

W 

✲ 

decoder 

✲ 

Ŵ 

Figure 1: A model of an information embedding system. 

The embedding system that we study is shown in figure 1. A discrete memoryless source 

emits the host sequence x1 

N = (x 1, x 2 , · · · , x N ). The symbols x n for n = 1, N assume values 

from the alphabet X . Their distribution is {P(x), x ∈ X }. A message source produces the 

message index w ∈ {1, 2, · · · , M} with probability 1/M, independent of x1 N . Based on the 

message index w and the host sequence x1 

N the encoder (embedder) produces the composite 

sequence 

= f (x 1 N , w). 

y N 1 

1

The symbols from y1 

N = (y 1, y 2 , · · · , y N ) assume values in the alphabet Y. We require that 

from the composite sequence y1 

N the embedded message can be reconstructed, i.e. there is a 

decoder producing the estimate Ŵ = g(y1 N ) such that 

Pr{Ŵ ̸= W } = 0. 

Moreover the sequences y N 1 

must be “close” to x N 1 

, i.e. the expected distortion 

¯D = E[d(X1 N , Y 1 N )] = ∑ Pr{X1 N = x 1 N } Pr{W = w}d(x 1 N , f (x 1 N , w)) ≤ , 

x1 N ,w 

where is the maximum allowable distortion and 

d(x1 N , y 1 N ) = 1 ∑ 

N 

n=1,N 

D(x n , y n ) 

is the distortion between x1 

N and y 1 N 

define the embedding-rate R as 

, for some specified distortion matrix D(·, ·). Finally we 

R = 1 N log 2 (M). 

2 The ”rate-distortion” function 

We are obviously interested in finding codes that combine a large embedding rate R with a 

small distortion ¯D. Only recently several authors, e.g. [3] and [1], realized that this setup 

is strongly related to the “defect channel”-problem which was first investigated in a general 

form by Gelfand and Pinsker [2]. In [4] the so-called ”rate-distortion function”, that 

corresponds to a slightly more general embedding model than the one we study here, was 

determined. This function gives the maximum embedding rate R for a given distortion level 

. For our model (Z ≡ Y ) this function turns out to be 

R() = 

max 

{P(y|x): ∑ H(Y |X). 

x,y P(x)P(y|x)D(x,y)≤} 

Note that this is not a rate-distortion function in the usual sense. 

3 Grayscale symbols, mean squared error distortion 

0 

1 

2 

x y 

2 B − 1 

❅ ❅ 

 

D(x, y) = (x − y) 2 

Figure 2: The grayscale alphabet G and mean squared error distortion. 

A grayscale symbol assumes a value from an integer alphabet G. The cardinality of G 

should no be too small. E.g. let G = {0, 1, · · · , 2 B − 1} for some positive integer B, then

each symbol can be represented by a vector of B binary digits. Now let X = Y = G. If a 

grayscale symbol x ∈ G is changed into a grayscale symbol y ∈ G then the resulting squared 

error distortion is 

D(x, y) = (y − x) 2 , 

 

see figure 2. To determine R() for this case, let p i = 

∑y−x=i P(x)P(y|x) for all integer 

i, then 

H(Y |X) = H(Y − X|X) ≤ H(Y − X) = ∑ 1 

p i log 2 

p 

i 

i 

and 

∑ 

P(x)P(y|x)(y − x) 2 = ∑ 

x,y 

i 

p i i 2 . 

The corresponding bound for R() can be achieved by taking i = y − x independent of x, 

which is possible if we assume that |G| is large and we neglect boundary-effects. Now we 

have to maximize ∑ i p 1 

i log 2 p i 

under the constraint ∑ i p ii 2 ≤ . Therefore consider for 

some α > 0 the “Gaussian” distribution {pi ∗} where 

p ∗ i 

 

= exp(−αi 2 )/β, 

with β = ∑ i exp(−αi 2 ) and having variance ∑ i p∗ i i 2 = . Then for any distribution {p i } 

with variance ∑ i p ii 2 ≤ we can write 

∑ 

p i ln 1 pi 

∗ 

i 

= ∑ i 

p i ln(β exp(αi 2 )) = ln β + α ∑ i 

p i i 2 ≤ ln β + α, 

and therefore its entropy (in nats) 

∑ 

p i ln 1 = ∑ p 

i i 

i 

p i ln p∗ i 

p i 

+ ∑ i 

p i ln 1 p ∗ i 

≤ ln β + α. 

Equality is achieved only for {p i } equal to the Gaussian distribution {pi ∗ }. Therefore to 

compute the rate-distortion function we only need to vary α and compute the entropy and 

variance of {pi ∗ }. We can now make a plot of R(), see figure 3. This plot shows that to 

achieve a rate of 1 bit/symb. we need an average distortion of at least ≈ 0.22. 

4 Least-significant bit(s) modulation 

Traditional embedding methods assume that a grayscale symbol x ∈ G is represented as a 

binary vector b B−1 , · · · , b 1 , b 0 such that x = ∑ i=0,B−1 b i2 i . Messages are now embedded 

only in the least-significant bits (LSB’s) in this binary representation. Therefore this form of 

embedding is called least-significant bits modulation. Assume that message w is embedded 

in the R LSB’s (R is a positive integer). The following tables now show what happens for 

R = 1. Observe that for R = 1 the distortion ¯D = 1/4 · 2 = 1/2.

3 

2.5 

2 

1.5 

1 

0.5 

0 

0 0.5 1 1.5 2 2.5 3 

Figure 3: R() in bits per symbol as a function of the distortion per symbol (horizontally), 

together with two LSB distortion-rate pairs (o) and time-sharing the R = 1-LSB method (...). 

x w = 0 1 

· · · 

8=1000 y = 8 9 

9=1001 8 9 

· · · 

x w = 0 1 

· · · 

8=1000 D = 0 1 

9=1001 1 0 

· · · 

Similarly for R = 2 the average distortion is ¯D = 1/16(6 · 1 + 4 · 4 + 2 · 9) = 5/2. The 

distortion-rate pairs ( ¯D, R) = (1/2, 1) and (5/2, 2) are plotted in figure 3 denoted by o’-s. 

Using the LSB modulation scheme ( ¯D, R) = (1/2, 1) only for a fraction of the symbols, we 

achieve 

R( ¯D) = 2 ¯D, for 0 ≤ ¯D ≤ 1/2. 

The resulting distortion-rate pairs are plotted in the figure with a dotted line. The problem 

we want to address next is: ”How can we do better than R( ¯D)/ ¯D = 2?” 

5 Coded LSB modulation 

In this section we consider coding methods that affect only the LSB of each grayscale symbol. 

The first method is based on Hamming codes, after that we consider a code based on the 

binary Golay code. 

5.1 Embedding based on binary Hamming codes 

To describe the embedding code consider the (7, 4, 3) Hamming code. Fix a certain coset C i , 

for some i = 0, 1, · · · , 7. Consider only the vector of LSB’s of seven host symbols. Denote 

this vector by x 7 1 = x 1, x 2 , · · · , x 7 . Determine a binary vector y 7 1 = y 1, y 2 , · · · , y 7 ∈ C i

which is closest to x1 7 in Hamming-sense. To obtain the composite sequence just replace the 

LSB vector x1 7 by y7 1 . 

Next we determine the average distortion ¯D of this method. The Hamming code is perfect 

with d H = 3, thus we will find a word y1 7 ∈ C i at distance 1 from x1 7 with probability 7/8 and 

a word at distance 0 with probability 1/8 (some smoothness of P(x) is assumed here). Hence 

¯D = (7/8)/7 = 1/8. The decoder can detect the coset to which the composite sequence y1 

7 

belongs, hence reliable transmission is possible with rate R = (log 2 8)/7 = 3/7 bit/symbol. 

Thus we achieve ( ¯D, R) = (1/8, 3/7). The R/ ¯D-ratio = 24/7 ≈ 3.428 which is a factor 

1.714 higher than LSB modulation. A Hamming code of length 2 m − 1 for m = 2, 3, 4, · · · 

gives 

R = 

m 

2 m − 1 and ¯D = 1 

2 m . 

Hence the ratio 

R/ ¯D = m2m 

2 m − 1 , 

which is approximately equal to m for large m (see figure 4). 

5.2 Embedding based on the binary Golay code 

Instead of the binary Hamming code we can use the (23, 12, 7) binary Golay code. This 

leads to 

R = 11 

( 23 

) ( 

23 ≈ 0.478 and 1 · 1 + 23 

) ( 

2 · 2 + 23 

) 

3 · 3 

¯D = 

≈ 0.124. 

2048 · 23 

Now the ratio R/ ¯D ≈ 3.85, which is almost a factor two better what can be achieved with 

LSB modulation (see figure 4). 

1 

0.9 

0.8 

0.7 

0.6 

0.5 

0.4 

0.3 

0.2 

0.1 

0 

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 

Figure 4: Rate-distortion curve, time-sharing R = 1-LSB modulation (...), binary Hamming 

codes (o), and the binary Golay code (x).

6 Ternary embedding methods 

We start this section with a definition. A grayscale symbol x is said to be in class w iff x 

mod 3 = w for w = 0, 1, 2. Now we are ready to discuss uncoded ternary embedding and 

after that again two coded embedding methods. 

6.1 Uncoded ternary modulation 

Suppose that message w ∈ {0, 1, 2} is to be embedded in the grayscale symbol x. The 

decoder determines the message w simply by looking at the class of y. If x is in class 

w then y = x is chosen, otherwise we change it into the symbol y in class w, such that 

D(x, y) = (y − x) 2 is minimal, see the tables below. 

x w = 0 1 2 

· · · 

9 y = 9 10 8 

10 9 10 11 

11 12 10 11 

· · · 

x w = 0 1 2 

· · · 

9 D = 0 1 1 

10 1 0 1 

11 1 1 0 

· · · 

The obtained embedding rate R = log 2 3 ≈ 1.585. The corresponding average distortion 

¯D = 2/3 · 1 = 2/3. This results in the ratio R/ ¯D = 3/2 log 2 3 ≈ 2.378 which is quite good! 

6.2 Embedding with ternary Hamming codes 

We can also design codes for the modulating the class, based on ternary Hamming codes. For 

a given value m, i.e. the number of parity check symbols, the codeword length is (3 m − 1)/2. 

Therefore 

R = 2m log 2 3 

3 m − 1 and ¯D = 2 

3 m . 

Hence 

see figure 5. 

R/ ¯D = m3m log 2 3 

3 m − 1 , 

6.3 Embedding using the ternary Golay code 

If we use the (11, 6, 5) ternary Golay code we get the following rate and distortion: 

R = 5 log 2 3 

11 

≈ 0.7204 and ¯D = 

( 11 

) ( 

1 · 1 · 2 + 11 

) 

2 · 2 · 4 

243 · 11 

= 42 

243 ≈ 0.1728. 

Hence R/ ¯D ≈ 4.1682 (see figure 5).

1 

0.9 

0.8 

0.7 

0.6 

0.5 

0.4 

0.3 

0.2 

0.1 

0 

0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 

Figure 5: Rate-distortion curve, time-sharing R = 1-LSB modulation (...), binary Hamming 

codes (o), ternary Hamming codes (*), and both Golay codes (x). 

7 Two-dimensional codes 

Finally we consider a method that modifies pairs of grayscale-symbols (x a , x b ). Therefore 

we ”color” the two-dimensional rectangular grid with five colors. A point and its nearest 

neighbors all have a different color, see table below. 

x a → 

1 2 3 4 0 1 2 3 4 0 1 2 3 4 0 

x b 4 0 1 2 3 4 0 1 2 3 4 0 1 2 3 

↓ 2 3 4 0 1 2 3 4 0 1 2 3 4 0 1 

0 1 2 3 4 0 1 2 3 4 0 1 2 3 4 

3 4 0 1 2 3 4 0 1 2 3 4 0 1 2 

We want to embed in x a and x b a message w ∈ {0, 1, 2, 3, 4}. The decoder just looks at the 

color of pair (y a , y b ). If (x a , x b ) does not have color w change it into the nearest neighbor 

(y a , y b ) with color w. Now 

¯D = 1 · 0 + 4 · 1 

5 · 2 

= 2/5 and R = log 2 5 

2 

= 1.16096. 

and R/ ¯D = 2.90241. This basic code can be used together with 5-ary Hamming codes. 

For a given m, i.e. the number of parity checks, we get a code length of (5 m − 1)/4 5-ary 

symbols. This code processes (5 m − 1)/2 grayscale-symbols and the embedding rate is 

R = 2m log 2 (5) 

5 m − 1 

and the distortion ¯D = 2 

5 m . 

The resulting distortion-rate pairs are shown in figure 6.

1 

0.9 

0.8 

0.7 

0.6 

0.5 

0.4 

0.3 

0.2 

0.1 

0 

0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 

Figure 6: Rate-distortion curve, time-sharing R = 1 LSB modulation (...), binary Hamming 

codes (o), ternary Hamming codes (*), the Golay codes (x), and the 5-ary codes (+). 

8 Conclusion 

We have only looked at small distortions here. Moreover we have concentrated on perfect 

codes since this was so easy. It can be shown that Hamming codes are quite good for D → 0. 

For a moderate rate we have seen that Golay codes are rather good, but what about even better 

codes for even higher rates? How do we generalize the coloring results to larger number of 

colors and more dimensions? 

References 

[1] J. Chou, S. Pradhan, L. El Ghaoui and K. Ramchandran, “A Robust Optimization Solution 

to the Data Hiding Problem Using Distributed Source Coding Principles,” preprint, 

2000. 

[2] S. Gelfand and M. Pinsker, “Coding for a Channel with Random Parameters”, Problems 

of Control and Information Theory, vol. 9, pp. 19-31, 1980. 

[3] P. Moulin and J. O’Sullivan, “Information-theoretic Analysis of Information Hiding,” 

preprint, 1999. 

[4] F.M.J. Willems, ”An Informationtheoretical Approach to Information Embedding,” 

Proc. 21st Symp. Inform. Theory in the Benelux, pp. 255-260, Wassenaar, May 25-26, 

2000.

Embedding Information in Grayscale Images - Signal Processing ...

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?