a la physique de l'information - Lisa - Université d'Angers
3972 F. Chapeau-Blondeau, D. Rousseau / Physica A 388 (2009) 3969–3984
following way. The complete coding of the data should here include two parts. The first part is the coding of the data based on a definite probability density model used to assign the code lengths. For a given data set $x$, the description length needed by this first part is $L_{\mathrm{data}}$ of Eq. (7), which we can also write $L_{\mathrm{data}} = L(x|M)$, the description length of the data given a definite model $M$ of probability density. The second part needed for a complete coding of the data is the description of the parameters that completely specify the underlying probability density model $M$. These parameters include the number of bins $K$ along with the $K$ values $f_k$ for $k = 1$ to $K$. The description length needed by this second part, which codes the parameters of the model $M$, is denoted $L_{\mathrm{model}} = L(M)$; we shall soon see how to quantify this description length $L(M)$ explicitly. The complete coding of the data set $x$ then has a total description length $L_{\mathrm{total}}$ which sums the two parts as

$$L_{\mathrm{total}} = L(x|M) + L(M), \qquad (10)$$

signifying that the total description length of the data is the description length of the data given the model plus the description length of the model.
For a given data set $x$, the MDL principle then dictates selecting the model parameters $\{K; f_k, k = 1, \ldots, K\}$ so as to minimize the total description length $L_{\mathrm{total}}$ of Eq. (10), i.e.

$$\{K; f_k, k = 1, \ldots, K\} = \arg\min_{\{K; f_k\}} L_{\mathrm{total}} = \arg\min_{\{K; f_k\}} \left[ L(x|M) + L(M) \right]. \qquad (11)$$
This is an optimization principle based on optimal coding and information theory. In a prescribed class of models (here, histograms with regular bins), the best model for the data is the model that, when known, enables the most efficient (shortest) coding of these data.
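The selection rule of Eq. (11) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `select_model` and the two callables `data_length` and `model_length` are hypothetical stand-ins for $L(x|M)$ and $L(M)$, whose actual expressions depend on the model class.

```python
from typing import Callable, Iterable, Tuple, TypeVar

Model = TypeVar("Model")


def select_model(
    candidates: Iterable[Model],
    data_length: Callable[[Model], float],   # hypothetical stand-in for L(x|M)
    model_length: Callable[[Model], float],  # hypothetical stand-in for L(M)
) -> Tuple[Model, float]:
    """Return the candidate minimizing L(x|M) + L(M), and that minimum."""
    # Total description length of Eq. (10), evaluated for every candidate.
    scored = [(data_length(m) + model_length(m), m) for m in candidates]
    if not scored:
        raise ValueError("no candidate model supplied")
    # Two-part MDL choice of Eq. (11): keep the shortest total coding.
    best_total, best_model = min(scored, key=lambda pair: pair[0])
    return best_model, best_total
```

As a usage example with made-up lengths, `select_model(range(1, 11), lambda K: 100.0 / K, lambda K: 4.0 * K)` trades a data cost that decreases with the number of bins `K` against a model cost that grows with it, and returns the `K` at which their sum is smallest.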
5. Description length for the data
As already stated, the description length $L(x|M)$ for the data given the model is supplied by Eq. (7). The term $-\log(dx^N)$ in Eq. (7) is a constant common to all models. For the purpose of discriminating among models, this constant is often omitted from the description length, with no impact on the final model choice. Here, however, we prefer to keep this term, in order to track the complete value of the description length and to convey some additional insight into the modeling process beyond the choice of the model itself. So, equivalently, the description length of Eq. (7) for the data given the model is written as

$$L(x|M) = -\sum_{k=1}^{K} N_k \log(f_k\, dx). \qquad (12)$$
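As an illustration, Eq. (12) can be evaluated directly from the bin counts $N_k$ and the density values $f_k$. The sketch below makes assumptions that this passage does not state: the function name `data_description_length` is hypothetical, the logarithm is taken as the natural logarithm (lengths in nats), and an empty bin ($N_k = 0$) is taken to contribute zero.

```python
import math
from typing import Sequence


def data_description_length(
    counts: Sequence[int],       # N_k, number of data points in bin k
    densities: Sequence[float],  # f_k, histogram density value in bin k
    dx: float,                   # the quantity dx appearing in Eq. (12)
) -> float:
    """Evaluate L(x|M) = -sum_k N_k log(f_k dx), in nats (assumed base e)."""
    if len(counts) != len(densities):
        raise ValueError("counts and densities must have the same length")
    total = 0.0
    for n_k, f_k in zip(counts, densities):
        if n_k == 0:
            continue  # assumed convention: an empty bin contributes 0
        total -= n_k * math.log(f_k * dx)
    return total
```

Because the counts sum to $N$, the value returned equals $-\sum_k N_k \log f_k - N \log dx$, which makes explicit the model-independent constant $-\log(dx^N)$ discussed above.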
Next, we have to address the quantification of the description length $L(M)$ for the model.
6. Description length for the model parameters as independent real variables
To quantify the description length $L(M)$ of the model, one possibility is to use a procedure derived from Ref. [28]. This approach considers the $K$ model parameters $f_k$ as $K$ independent real (continuously-valued) variables, which need to be quantized to finite precision in order to allow their coding. The histogram model for the density of the data assigns a probability $p_k = f_k \delta x$ to bin $k$ of width $\delta x$. Under this model, the number $N_k$ of data points falling in bin $k$ has expected value $E(N_k) = N p_k = N f_k \delta x$ and standard deviation $\sigma(N_k) = [N f_k \delta x (1 - f_k \delta x)]^{1/2}$, according to the properties of the binomial distribution [40]. Therefore, since $f_k = E(N_k)/(N \delta x)$ for all $k$, estimating $f_k$ is equivalent to estimating the mean $E(N_k)$ of the random variable $N_k$ with standard deviation $\sigma(N_k)$. The value $\sigma(f_k) = \sigma(N_k)/(N \delta x) = [f_k (1 - f_k \delta x)/(N \delta x)]^{1/2}$ fixes a natural precision with which $f_k$ can be estimated and needs to be coded. This determines $\sigma(f_k)$ as the quantization step relevant for coding the model parameters $f_k$. One has the probability $p_k \in [0, 1]$ and the density $f_k = p_k \delta x^{-1} \in [0, \delta x^{-1}]$. The parameter $f_k$ can therefore take its values in the interval $[0, \delta x^{-1}]$ and is estimated and quantized with the precision $\sigma(f_k)$. Accordingly, a total number $\delta x^{-1}/\sigma(f_k)$ of different values for $f_k$ can be distinguished and need to be coded separately, at a code length $\log[\delta x^{-1}/\sigma(f_k)]$. For the $K$ parameters $f_k$ the resulting code length is

$$L(\{f_k\}) = \sum_{k=1}^{K} \log\left[\frac{\delta x^{-1}}{\sigma(f_k)}\right] = \frac{K}{2} \log(N) - \frac{1}{2} \sum_{k=1}^{K} \log[f_k \delta x (1 - f_k \delta x)]. \qquad (13)$$
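The two forms of Eq. (13) can be checked numerically against each other. The following sketch is an illustration only: the function names and the sample values are hypothetical, `bin_width` stands for $\delta x$, the logarithm is assumed natural, and every bin is assumed to satisfy $0 < f_k \delta x < 1$ so that all logarithms are finite.

```python
import math
from typing import Sequence


def sigma_f(f_k: float, n: int, bin_width: float) -> float:
    """Quantization step sigma(f_k) = [f_k (1 - f_k dx) / (N dx)]^(1/2)."""
    return math.sqrt(f_k * (1.0 - f_k * bin_width) / (n * bin_width))


def param_length_sum(densities: Sequence[float], n: int, bin_width: float) -> float:
    """First form of Eq. (13): sum over bins of log[(1/dx) / sigma(f_k)]."""
    return sum(
        math.log((1.0 / bin_width) / sigma_f(f_k, n, bin_width))
        for f_k in densities
    )


def param_length_closed(densities: Sequence[float], n: int, bin_width: float) -> float:
    """Second form of Eq. (13): (K/2) log N - (1/2) sum log[f_k dx (1 - f_k dx)]."""
    k = len(densities)
    return 0.5 * k * math.log(n) - 0.5 * sum(
        math.log(f_k * bin_width * (1.0 - f_k * bin_width)) for f_k in densities
    )


# Hypothetical example: K = 4 bins of width 0.25, bin probabilities 0.1, 0.3, 0.4, 0.2.
example_densities = [0.4, 1.2, 1.6, 0.8]
lengths = (
    param_length_sum(example_densities, n=100, bin_width=0.25),
    param_length_closed(example_densities, n=100, bin_width=0.25),
)
```

Both entries of `lengths` agree to floating-point precision, reflecting the identity $\delta x^{-1}/\sigma(f_k) = N^{1/2} [f_k \delta x (1 - f_k \delta x)]^{-1/2}$ that links the two forms.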
An alternative, comparable approach to quantifying the cost of coding continuously-valued parameters is described in Ref. [1], based on a slightly more involved mathematical formulation. It turns out that quantifying the coding cost of continuously-valued model parameters is an important and recurrent step when applying the MDL principle. We review this alternative approach from Ref. [1] in the Appendix, for a better appreciation of the different existing variants for applying the MDL principle. With the present approach derived from Ref. [28] and proceeding through Eq. (13), the description length