a la physique de l'information - Lisa - Université d'Angers
a la physique de l'information - Lisa - Université d'Angers
a la physique de l'information - Lisa - Université d'Angers
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
Author's personal copy<br />
F. Chapeau-Blon<strong>de</strong>au, D. Rousseau / Physica A 388 (2009) 3969–3984 3971<br />
One seeks to estimate the probability <strong>de</strong>nsity f (x) from the N data points xn of Eq. (1). For this purpose, a histogram mo<strong>de</strong>l<br />
is introduced for the unknown <strong>de</strong>nsity f (x) un<strong>de</strong>r the common form of an approximation by a piecewise constant function.<br />
This histogram mo<strong>de</strong>l is <strong>de</strong>noted M and is <strong>de</strong>fined as follows. The <strong>de</strong>nsity f (x) is mo<strong>de</strong>led by K constant p<strong>la</strong>teaus of value<br />
fk, for k = 1 to K , each of these p<strong>la</strong>teaus being <strong>de</strong>fined in the abscissa between xmin and xmax over a regu<strong>la</strong>r bin of width<br />
δx = xmax − xmin<br />
K<br />
= x<br />
, (3)<br />
K<br />
with xmin and xmax respectively the minimum and maximum values of the xn’s over the data set x of Eq. (1). Especially,<br />
consistency of the probability <strong>de</strong>nsity mo<strong>de</strong>l imposes<br />
K<br />
fkδx = 1. (4)<br />
k=1<br />
The probability P(x) of Eq. (2), based on the histogram mo<strong>de</strong>l M for the <strong>de</strong>nsity f (x), is expressible as<br />
P(x) = dx N<br />
K<br />
k=1<br />
f Nk k , (5)<br />
where Nk is the number of data points xn of the data set x that fall within bin number k, verifying K<br />
k=1 Nk = N.<br />
3. Maximum-likelihood histogram estimation<br />
When the number of bins K is fixed, the <strong>de</strong>nsity mo<strong>de</strong>l M is specified by the K parameters fk for k = 1 to K . To <strong>de</strong>termine<br />
these parameters from the data, a standard approach is the maximum-likelihood method [39] which consists in selecting<br />
those values of the parameters fk that maximize the probability P(x) in Eq. (5) of the observed data set x. Maximizing P(x)<br />
of Eq. (5) un<strong>de</strong>r the constraint of Eq. (4) is achieved by the well-known maximum-likelihood solution<br />
fk = Nk<br />
, k = 1, . . . K. (6)<br />
Nδx<br />
The maximum-likelihood solution of Eq. (6) completely specifies, for the probability <strong>de</strong>nsity f (x), the histogram mo<strong>de</strong>l with<br />
a fixed number K of regu<strong>la</strong>r bins.<br />
4. Minimum <strong>de</strong>scription length<br />
Another point of view can be adopted to arrive at the solution of Eq. (6). Information theory stipu<strong>la</strong>tes that to co<strong>de</strong> data<br />
xn appearing with probability P(xn), the optimal co<strong>de</strong> assigns a co<strong>de</strong>word with length − log P(xn). To co<strong>de</strong> the whole data<br />
set x of Eq. (1), the optimal co<strong>de</strong> assigns a length − log P(x), which by the probability mo<strong>de</strong>l of Eq. (5) is<br />
Ldata = − log P(x) = − log(dx N ) −<br />
K<br />
Nk log(fk). (7)<br />
k=1<br />
The maximum-likelihood solution of Eq. (6) maximizes the likelihood P(x) of Eq. (5) and equivalently the loglikelihood<br />
log P(x). Therefore, the solution of Eq. (6) also minimizes the co<strong>de</strong> length Ldata = − log P(x) of Eq. (7). The solution of Eq. (6)<br />
selects from the data, the K parameters fk of the probability <strong>de</strong>nsity mo<strong>de</strong>l M, so that the optimal co<strong>de</strong> <strong>de</strong>signed for the<br />
data from this <strong>de</strong>nsity mo<strong>de</strong>l, achieves the minimal co<strong>de</strong> length. This is the rationale of the MDL principle: to select the<br />
parameters of the mo<strong>de</strong>l that allow the shortest coding of the complete data. This guarantees that the selected mo<strong>de</strong>l is the<br />
best (within its c<strong>la</strong>ss) at capturing the structures and regu<strong>la</strong>rities in the data.<br />
We can add here, that the minimum of the <strong>de</strong>scription length (7) achieved by the solution of Eq. (6) can be expressed as<br />
<br />
x<br />
Lmin = NH({pk}) − N log(K) + N log , (8)<br />
dx<br />
where we have introduced the entropy<br />
H({pk}) = −<br />
K<br />
pk log(pk) (9)<br />
k=1<br />
of the empirical probabilitiespk = fkδx = Nk/N <strong>de</strong>duced from Eq. (6).<br />
Here, when the number of bins K of the histogram mo<strong>de</strong>l is fixed in an a priori way, the MDL solution coinci<strong>de</strong>s with the<br />
maximum-likelihood solution of Eq. (6). However, the MDL principle can be exten<strong>de</strong>d to also optimally select the number<br />
of bins K of the mo<strong>de</strong>l from the data, along with the K parameter values fk for k = 1 to K . This extension proceeds in the<br />
152/197