
One seeks to estimate the probability density f(x) from the N data points x_n of Eq. (1). For this purpose, a histogram model is introduced for the unknown density f(x), under the common form of an approximation by a piecewise constant function. This histogram model is denoted M and is defined as follows. The density f(x) is modeled by K constant plateaus of value f_k, for k = 1 to K, each of these plateaus being defined in the abscissa between x_min and x_max over a regular bin of width

$$\delta x = \frac{x_{\max} - x_{\min}}{K} = \frac{\Delta x}{K}, \tag{3}$$

with x_min and x_max respectively the minimum and maximum values of the x_n's over the data set x of Eq. (1). In particular, consistency of the probability density model imposes

$$\sum_{k=1}^{K} f_k \, \delta x = 1. \tag{4}$$

The probability P(x) of Eq. (2), based on the histogram model M for the density f(x), is expressible as

$$P(x) = dx^{N} \prod_{k=1}^{K} f_k^{N_k}, \tag{5}$$

where N_k is the number of data points x_n of the data set x that fall within bin number k, verifying $\sum_{k=1}^{K} N_k = N$.
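As a concrete illustration, here is a minimal Python sketch of the quantities entering Eq. (5). All names are hypothetical, and dx stands for the fixed measurement resolution entering Eq. (2), which is defined outside this excerpt:

```python
import numpy as np

def bin_counts(x, K):
    """Counts N_k of data points falling in each of K regular bins
    between x_min and x_max, so that sum_k N_k = N."""
    edges = np.linspace(x.min(), x.max(), K + 1)
    Nk, _ = np.histogram(x, bins=edges)
    return Nk

def log_P(x, fk, dx):
    """log P(x) = N log(dx) + sum_k N_k log(f_k), the log of Eq. (5)."""
    Nk = bin_counts(x, len(fk))
    nz = Nk > 0  # empty bins contribute nothing to the product
    return len(x) * np.log(dx) + np.sum(Nk[nz] * np.log(fk[nz]))
```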

3. Maximum-likelihood histogram estimation<br />

When the number of bins K is fixed, the density model M is specified by the K parameters f_k for k = 1 to K. To determine these parameters from the data, a standard approach is the maximum-likelihood method [39], which consists in selecting those values of the parameters f_k that maximize the probability P(x) in Eq. (5) of the observed data set x. Maximizing P(x) of Eq. (5) under the constraint of Eq. (4) is achieved by the well-known maximum-likelihood solution

$$f_k = \frac{N_k}{N \, \delta x}, \qquad k = 1, \ldots, K. \tag{6}$$

The maximum-likelihood solution of Eq. (6) completely specifies, for the probability density f(x), the histogram model with a fixed number K of regular bins.
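A hedged sketch of the maximum-likelihood solution of Eq. (6), reusing the hypothetical bin_counts helper above; the assertion checks the normalization constraint of Eq. (4):

```python
def ml_histogram(x, K):
    """Maximum-likelihood plateau values f_k = N_k / (N delta_x), Eq. (6)."""
    N = len(x)
    delta_x = (x.max() - x.min()) / K   # regular bin width, Eq. (3)
    Nk = bin_counts(x, K)
    fk = Nk / (N * delta_x)
    assert np.isclose(fk.sum() * delta_x, 1.0)  # normalization, Eq. (4)
    return fk, delta_x, Nk
```

For instance, ml_histogram(x, 20) on a sample x of 1000 standard normal draws returns a 20-bin density estimate whose plateaus, weighted by the bin width, sum to one.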

4. Minimum description length

Another point of view can be adopted to arrive at the solution of Eq. (6). Information theory stipulates that to code data x_n appearing with probability P(x_n), the optimal code assigns a codeword with length −log P(x_n). To code the whole data set x of Eq. (1), the optimal code assigns a length −log P(x), which by the probability model of Eq. (5) is

$$L_{\text{data}} = -\log P(x) = -\log(dx^{N}) - \sum_{k=1}^{K} N_k \log(f_k). \tag{7}$$

The maximum-likelihood solution of Eq. (6) maximizes the likelihood P(x) of Eq. (5) and, equivalently, the log-likelihood log P(x). Therefore, the solution of Eq. (6) also minimizes the code length L_data = −log P(x) of Eq. (7). The solution of Eq. (6) selects from the data the K parameters f_k of the probability density model M, so that the optimal code designed for the data from this density model achieves the minimal code length. This is the rationale of the MDL principle: to select the parameters of the model that allow the shortest coding of the complete data. This guarantees that the selected model is the best (within its class) at capturing the structures and regularities in the data.
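This equivalence is straightforward to check numerically. The sketch below (again with the hypothetical helpers from above, and an assumed resolution dx) compares the code length L_data of Eq. (7) at the maximum-likelihood solution with that of a perturbed histogram still satisfying Eq. (4):

```python
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
K, dx = 20, 1e-3                  # dx: assumed measurement resolution

fk, delta_x, Nk = ml_histogram(x, K)
L_ml = -log_P(x, fk, dx)          # code length of Eq. (7) at the ML solution

# Any other plateau values satisfying Eq. (4) code the data less efficiently.
gk = fk * rng.uniform(0.5, 1.5, size=K)
gk /= gk.sum() * delta_x          # renormalize to satisfy Eq. (4)
L_alt = -log_P(x, gk, dx)
assert L_ml <= L_alt
```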

We can add here that the minimum of the description length (7) achieved by the solution of Eq. (6) can be expressed as

$$L_{\min} = N H(\{p_k\}) - N \log(K) + N \log\!\left(\frac{\Delta x}{dx}\right), \tag{8}$$

where we have introduced the entropy

$$H(\{p_k\}) = -\sum_{k=1}^{K} p_k \log(p_k) \tag{9}$$

of the empirical probabilities p_k = f_k δx = N_k/N deduced from Eq. (6).
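The identity of Eq. (8) can likewise be checked with a short sketch (hypothetical names as above): compute L_min from the entropy of the empirical probabilities p_k, and compare it with −log P(x) evaluated at the ML solution of Eq. (6):

```python
def L_min(x, K, dx):
    """Minimum description length of Eq. (8), via the entropy of Eq. (9)."""
    N = len(x)
    pk = bin_counts(x, K) / N              # empirical probabilities p_k = N_k / N
    nz = pk > 0                            # convention: 0 log 0 = 0
    H = -np.sum(pk[nz] * np.log(pk[nz]))   # entropy H({p_k}), Eq. (9)
    return N * H - N * np.log(K) + N * np.log((x.max() - x.min()) / dx)

assert np.isclose(L_min(x, K, dx), L_ml)   # matches Eq. (7) at the solution (6)
```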

Here, when the number of bins K of the histogram model is fixed in an a priori way, the MDL solution coincides with the maximum-likelihood solution of Eq. (6). However, the MDL principle can be extended to also optimally select the number of bins K of the model from the data, along with the K parameter values f_k for k = 1 to K. This extension proceeds in the
