22.06.2013 Views

a la physique de l'information - Lisa - Université d'Angers

3978 F. Chapeau-Blondeau, D. Rousseau / Physica A 388 (2009) 3969–3984

[Figure 4. Panel A: total description length (×10^6 bits) versus number of bins K, curves (2), (3) and (4). Panel B: probability density versus signal amplitude.]

Fig. 4. Panel A: Total description length Ltotal in bits, as a function of the number of bins K, for a data set with N = 10^5 points and Linitial = 16N = 1 600 000 bits, drawn from the probability density f(·), which is the bi-Gaussian mixture 0.5N(0, 1) + 0.5N(3, 0.5): (2) L2 from Eq. (35) with lossy coding of the model, (3) L3 from Eq. (31) with lossless coding of the model, (4) L4 from Eq. (33) with more efficient lossless coding of the model. The minimum of Ltotal is shown by a marker on each curve: (◦) for (K = 58, L2(K) = 1 516 513 bits), (△) for (K = 53, L3(K) = 1 516 722 bits), and a third marker for (K = 62, L4(K) = 1 516 477 bits). Panel B: Histogram model at the optimum number of bins K = 62 minimizing L4 of Eq. (33), superimposed on the true bi-Gaussian probability density f(·).

In Fig. 3A, the total description length L1 from Eq. (30) steadily decreases with K, as anticipated, because L1 does not incorporate the model description length. By contrast, the total description lengths L2, L3 and L4, which do incorporate the model description length, exhibit a minimum at an optimal value of K. The length L3, which comes from a relatively poor coding strategy for the parameters, is consistently larger than the lengths L2 and L4. In the region of the minimum in Fig. 3A, the lengths L2 and L4 assume very close values, although they are based on distinct coding strategies. L2 from Eq. (35) is based on a lossy approximate coding of the parameters: this gives a shorter code length for the parameters, at the price of a less accurate (longer) coding of the data. L4 from Eq. (33), by contrast, is based on an exact lossless coding of the parameters: this costs a longer code length for the parameters, but allows a more accurate (shorter) coding of the data. These two complementary effects tend to compensate in the region of the minimum in Fig. 3A, which leads to close values of the total description length. There is nevertheless a slight superiority of L4 over L2 in Fig. 3A: at its optimal setting (K = 23, L4(K) = 150 692 bits), L4 achieves both a slightly shorter minimal total length and a higher resolution in the histogram definition. Fig. 3B shows the optimal histogram model estimated for the probability density f(·) of the data set, at K = 23.
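
The trade-off described above, between a data cost that shrinks as K grows and a model cost that grows with K, can be sketched numerically. The following is only an illustrative sketch, not the authors' computation: the model cost is a generic (K/2) log2(N) penalty standing in for Eqs. (31), (33) and (35), which are not reproduced here; the test density is the bi-Gaussian mixture quoted in the caption of Fig. 4, with the second argument of N(3, 0.5) assumed to be a standard deviation; and the function names are hypothetical.

```python
import numpy as np


def sample_bigaussian(n, rng):
    """Draw n points from 0.5*N(0, 1) + 0.5*N(3, 0.5).

    Assumption: 0.5 in N(3, 0.5) is read as a standard deviation.
    """
    pick_second = rng.random(n) < 0.5
    return np.where(pick_second,
                    rng.normal(3.0, 0.5, n),
                    rng.normal(0.0, 1.0, n))


def total_length_bits(x, K, bits_per_sample=16):
    """Two-part code length in bits for a regular K-bin histogram of x."""
    n = x.size
    counts, _ = np.histogram(x, bins=K)  # K regular bins over [min(x), max(x)]
    nz = counts[counts > 0]              # empty bins contribute no data cost
    # Data cost: a point in bin k is coded with probability n_k / n, spread
    # uniformly over the 2**bits_per_sample / K quantization cells of that bin.
    data_bits = -np.sum(nz * np.log2(nz / n)) + n * (bits_per_sample - np.log2(K))
    # Model cost: generic penalty, NOT the paper's Eq. (31), (33) or (35).
    model_bits = 0.5 * K * np.log2(n)
    return data_bits + model_bits


def best_bin_count(x, k_max=170):
    """Return the K in 1..k_max with the shortest total length, and all lengths."""
    lengths = {K: total_length_bits(x, K) for K in range(1, k_max + 1)}
    return min(lengths, key=lengths.get), lengths


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = sample_bigaussian(100_000, rng)
    K_opt, lengths = best_bin_count(x)
    print(K_opt, round(lengths[K_opt]), 16 * x.size)
```

Because the penalty is a stand-in, the optimal K and the minimal length obtained this way should not be expected to reproduce the values reported for L2, L3 or L4; the sketch only shows how a minimum in K arises once the model cost is included.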

A second example is presented in Fig. 4, for data points drawn from a Gaussian mixture density. The total description lengths L2, L3 and L4 in Fig. 4 show the same overall behavior as in Fig. 3: the length L3 is always larger, while L2 and L4 take close values in the region of the minimum. Also in Fig. 4, both the shortest description length and the highest histogram resolution K are achieved by L4, at the optimal setting (K = 62, L4(K) = 1 516 477 bits). This is a double benefit associated with L4: the shortest minimal code length together with the highest optimal resolution K. Although the length L2 is close to L4 in the region of the minimum, and both L2 and L4 fluctuate in this region from one data set to another of the same size N, the double benefit observed with L4 in Fig. 4 was never exchanged between L4 and L2. This was the rule for all the configurations we tested, for all the densities in this Section 9.
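
To put the minima quoted for Fig. 4 on a common scale, the short sketch below expresses each of them in bits per data point, as a fraction of Linitial = 16N, and as an excess over the shortest of the three. The numbers are copied from the caption of Fig. 4; the dictionary layout and the function name are illustrative choices, not part of the paper.

```python
N = 10**5                 # number of data points in Fig. 4
L_INITIAL = 16 * N        # initial description length, 16 bits per point

# (optimal K, minimal total length in bits), as quoted in the caption of Fig. 4
MINIMA = {
    "L2": (58, 1_516_513),
    "L3": (53, 1_516_722),
    "L4": (62, 1_516_477),
}


def summarize(minima, n, l_initial):
    """Per-point length, ratio to the initial length, and excess over the best."""
    best = min(length for _, length in minima.values())
    return {
        name: {
            "K": K,
            "bits_per_point": length / n,
            "ratio_to_initial": length / l_initial,
            "excess_over_best_bits": length - best,
        }
        for name, (K, length) in minima.items()
    }


if __name__ == "__main__":
    for name, row in summarize(MINIMA, N, L_INITIAL).items():
        print(name, row)
```

With these values, L4 saves 83 523 bits relative to Linitial (about 5.2%), and is shorter than L2 by 36 bits and than L3 by 245 bits, which makes concrete how close the three minima are compared with the overall code length.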

We also tested probability densities that can be accurately estimated with a very small number of regular bins. For uniform densities, for which a single bin is adequate, estimation based on each of the total lengths L2, L3 and L4 generally yields the optimal number of bins K = 1, with the shortest code length generally afforded by L4(K). Comparable conditions are presented in Fig. 5, with a density that is constant over two separate intervals of equal width, separated by an interval with zero probability.

As can be seen in Fig. 5, the total lengths L2, L3 and L4 all yield the appropriate number of bins K = 3, while the shortest code length is afforded by L4(K) = 15 434 bits.

Fig. 6 presents the example of a density that is constant over two separate intervals of unequal widths, separated by an interval with zero probability. The total lengths L2, L3 and L4 all yield the appropriate number of regular bins K = 4, while the shortest code length is afforded by L4(K) = 15 531 bits.
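
One way to see why K = 1, K = 3 and K = 4 are the appropriate bin counts in these examples is to look for the smallest regular partition whose bin edges coincide with every point where the density changes value. The sketch below does this with exact rational arithmetic. The interval endpoints are hypothetical, chosen only to match the qualitative descriptions above (they are not taken from Figs. 5 and 6), and the function name is an illustrative choice.

```python
from fractions import Fraction


def smallest_aligned_bin_count(lo, hi, breakpoints, k_max=64):
    """Smallest K such that every breakpoint lies on an edge of the
    K regular bins covering [lo, hi]; None if no K <= k_max works."""
    lo, hi = Fraction(lo), Fraction(hi)
    for K in range(1, k_max + 1):
        width = (hi - lo) / K
        # A breakpoint sits on a bin edge when its offset from lo is an
        # integer number of bin widths.
        if all(((Fraction(b) - lo) / width).denominator == 1
               for b in breakpoints):
            return K
    return None


if __name__ == "__main__":
    # Hypothetical layouts (assumptions, not read from the figures):
    print(smallest_aligned_bin_count(0, 1, []))      # uniform on [0, 1]
    print(smallest_aligned_bin_count(0, 3, [1, 2]))  # [0, 1] and [2, 3], gap [1, 2]
    print(smallest_aligned_bin_count(0, 4, [1, 2]))  # [0, 1] and [2, 4], gap [1, 2]
```

For these hypothetical layouts the function returns 1, 3 and 4 respectively. This only characterizes the coarsest regular histogram that can represent the density exactly; it does not by itself compute any of the lengths L2, L3 or L4.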

10. Application to measured data<br />

This section presents an application of histogram estimation by MDL to measured data. The data xn are formed by the intensities of gray-level images with size N = 512 × 512 pixels. These intensities are initially measured over 256 levels,
