July 24, 2009
A Rebuttal to Christopher Hillar and Friedrich Sommer’s Comment on Distilling Laws from Data
Michael Schmidt¹, Hod Lipson²,³
Introduction<br />
Hillar and Sommer recently published a comment [3] on the validity of the fitness criterion used to search for free-form natural laws in experimental data [1]. The comment presents experimental and theoretical arguments that this fitness criterion always evaluates to zero, and asserts that the algorithm described in the paper cannot accomplish its stated goals. The authors further propose an alternative metric that they claim is better suited.
Rebuttal
We have carefully reviewed the arguments by Hillar and Sommer and make the following observations:
1. We identified a significant mistake made by Hillar et al. in their calculation of our fitness function [ref 2, Equation S8]. Hillar et al. assumed that every variable depends on all other variables. This assumption produces a degenerate fitness of zero in all cases, which in turn led to their incorrect conclusion.
2. In Section S2 of our supplementary materials [ref 2, Page 4], we explicitly state that one “… should not assume that every variable is interdependent on all others.” We further provide an example for the three-dimensional case [ref 2, Equations S6 and S7] in which all but one pair of variables are independent.
3. We have taken the liberty of correcting their accompanying Matlab code posted online [4] so that it performs the correct calculation (see Figure 1) and produces the correct result (see Figure 2).
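The degeneracy described in point 1 can be checked directly from the original lines shown in Figure 1. As a sketch in our own notation, write f_x and f_v for the partial derivatives of the candidate function, and p and q for the values returned by pvd(x,v,t) and pvd(v,x,t). We assume, as the call pattern in [4] suggests, that these are reciprocal data estimates of Δx/Δv and Δv/Δx, so that pq = 1. The quantities below correspond to Dfx and Dfy in the original code:

```latex
\begin{aligned}
\tilde D_x &= f_x + f_v\,q, \qquad \tilde D_y = f_v + f_x\,p, \qquad p\,q = 1,\\[2pt]
\tilde D_y &= f_v\,(p\,q) + f_x\,p = p\,(f_v\,q + f_x) = p\,\tilde D_x,\\[2pt]
\log\Bigl(1 + \Bigl|\,p - \frac{\tilde D_y}{\tilde D_x}\Bigr|\Bigr) &= \log 1 = 0
  \quad \text{at every sample, for every } (a, b, c) \text{ with } \tilde D_x \neq 0.
\end{aligned}
```

Every term of the sum vanishes, so the fitness is zero regardless of the candidate. When Dfx = f_x and Dfy = f_v are ordinary partial derivatives with x and v held independent, the residual p - f_v/f_x changes with the candidate. The fitness can then distinguish one candidate from another.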
Further, in the Technical Comment [3], Hillar et al. propose a Hamiltonian fitness metric, HFit [ref 3, Equation 3], as an alternative to our fitness metric [ref 2, Equation S8]. We must point out two important difficulties with the HFit metric:
1. As Hillar et al. note, their Hamiltonian fitness metric “would require additional measures to bias the search away from functions that are nearly constant” [ref 3, Page 2]. As we discuss in detail in our original paper [1], avoiding such trivial solutions is the first key challenge in searching for invariant equations, and our method is designed to address it. Hillar’s metric would therefore likely be inadequate in a computational search.
2. The HFit proposed by Hillar et al. handles only a special case and could not be used to identify invariants of arbitrary form. The search for free-form invariants is the second key challenge we address with our method [1]. Although Hillar’s metric may be useful in certain circumstances, it is less general. It would likely be inadequate for an open-ended computational search seeking new invariants that do not necessarily take a Hamiltonian form.
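The trivial-solution problem raised in point 1 is easy to reproduce with any score that rewards small variation of f along the data. The following sketch uses a hypothetical variance score. It is our own construction, not HFit, which we do not reproduce here. It is evaluated on exact samples of the harmonic oscillator x'' = -x, which is also our choice of test data, using the candidate family f = a v² + b x v² + c x² implied by the lines in [4]. For comparison, the sketch includes a score built from the ratio of partial derivatives. Here we write the level-curve slope as -f_v/f_x, following the implicit-function theorem. This sign convention is our assumption and may not match how pvd orients its estimate.

```python
import numpy as np

# Hypothetical test data (ours): exact samples of x'' = -x with x(0) = 1, v(0) = 0.
t = np.linspace(0.0, 10.0, 2001)
x, v = np.cos(t), -np.sin(t)
data_slope = np.gradient(x, t) / np.gradient(v, t)  # finite-difference estimate of dx/dv


def candidate(a, b, c):
    """f = a*v**2 + b*x*v**2 + c*x**2 on the samples, with its two partial derivatives."""
    f = a * v**2 + b * x * v**2 + c * x**2
    f_x = 2 * c * x + b * v**2
    f_v = 2 * a * v + 2 * b * v * x
    return f, f_x, f_v


def naive_score(a, b, c):
    """Hypothetical invariance score (not HFit): variance of f along the data, 0 is best."""
    f, _, _ = candidate(a, b, c)
    return float(np.var(f))


def slope_score(a, b, c):
    """Ratio-of-partials score: data slope vs level-curve slope -f_v/f_x, 0 is best."""
    _, f_x, f_v = candidate(a, b, c)
    return float(-np.mean(np.log1p(np.abs(data_slope - (-f_v / f_x)))))


wrong = (1.0, 0.5, 2.0)  # not an invariant of this oscillator
for eps in (1.0, 1e-3, 1e-6):
    scaled = tuple(eps * k for k in wrong)
    print(eps, naive_score(*scaled), slope_score(*scaled))
print("constant f:", naive_score(0.0, 0.0, 0.0))  # a trivial candidate scores a perfect 0
```

Shrinking a wrong candidate toward zero drives the naive score toward its best value, and the all-zero candidate reaches it exactly. A search built on such a score therefore needs the extra measures Hillar et al. mention. The ratio-based score is unchanged by the rescaling, because the constant cancels in f_v/f_x.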
¹ Computational Biology, Cornell University, Ithaca, NY 14853, USA
² School of Mechanical & Aerospace Engineering, Cornell University, Ithaca, NY 14853, USA
³ Computing & Information Science, Cornell University, Ithaca, NY 14853, USA
Conclusions
In conclusion, an incorrect assumption about variable dependence by Hillar et al. led to a degenerate fitness calculation and to their incorrect conclusion. Our supplementary materials explicitly warn against this mistake. By editing three lines of the code posted online by Hillar et al., we were able to make it perform the correct calculation and yield the correct result. Finally, the alternative function proposed by Hillar et al. is inadequate for two reasons: it lacks generality, and it cannot avoid trivial invariants. These are the two key challenges addressed in our original paper.
Hillar et al. code [4]:
    function fffit = fffitness(a,b,c,x,v,t)
    % Total derivatives: each variable treated as a function of the other (the mistake)
    Dfx = 2*c*x + b*v.*v + (2*a*v+2*b*v.*x).*pvd(v,x,t);
    Dfy = (2*a*v+2*b*v.*x) + (2*c*x + b*v.*v).*pvd(x,v,t);
    fffit = -sum(log(1+abs(pvd(x,v,t)-Dfy./Dfx)));
Corrected code:
    function fffit = fffitness(a,b,c,x,v,t)
    % Partial derivatives of f = a*v.^2 + b*x.*v.^2 + c*x.^2, with x and v independent
    Dfx = 2*c*x + b*v.*v;
    Dfy = 2*a*v + 2*b*v.*x;
    fffit = -mean(log(1+abs(pvd(x,v,t)-Dfy./Dfx)));
Figure 1. The section of Hillar’s code, published online with the Technical Comment, where the mistake occurs. Hillar et al. assume that all variables depend on each other. For a 2D system, however, both variables must be treated as independent when taking the partial derivatives of the candidate function. With these three lines corrected, the code produces the correct fitness (see Figure 2).
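To see the effect of the three-line change on data, the Python sketch below mirrors both versions of the lines in Figure 1, with NumPy arrays standing in for MATLAB's element-wise operators. The test system is a set of exact samples of the harmonic oscillator x'' = -x. This is our choice, not the data used in [3, 4]. The helper slope_estimate is a hypothetical stand-in for pvd, whose source we do not reproduce. It estimates dy/dz as (dy/dt)/(dz/dt) using finite differences. The corrected score compares this estimate with the level-curve slope -f_v/f_x. The minus sign comes from the implicit-function theorem and is our assumption about the sign convention.

```python
import numpy as np


def oscillator_samples(n=2001, t_end=10.0):
    """Exact samples of x'' = -x with x(0) = 1, v(0) = 0 (hypothetical test data)."""
    t = np.linspace(0.0, t_end, n)
    return np.cos(t), -np.sin(t), t


def slope_estimate(y, z, t):
    """Hypothetical stand-in for pvd: finite-difference estimate of dy/dz = (dy/dt)/(dz/dt)."""
    return np.gradient(y, t) / np.gradient(z, t)


def partials(a, b, c, x, v):
    """Partial derivatives of f = a*v**2 + b*x*v**2 + c*x**2, holding the other variable fixed."""
    f_x = 2 * c * x + b * v * v
    f_v = 2 * a * v + 2 * b * v * x
    return f_x, f_v


def corrected_fitness(a, b, c, x, v, t):
    """x and v independent: compare the data slope dx/dv with the level-curve slope -f_v/f_x."""
    f_x, f_v = partials(a, b, c, x, v)
    residual = slope_estimate(x, v, t) - (-f_v / f_x)
    return -np.mean(np.log1p(np.abs(residual)))


def degenerate_fitness(a, b, c, x, v, t):
    """Mirrors the original lines: total derivatives that reuse the data slopes."""
    f_x, f_v = partials(a, b, c, x, v)
    p = slope_estimate(x, v, t)  # estimate of dx/dv
    q = slope_estimate(v, x, t)  # estimate of dv/dx, numerically 1/p
    dfx = f_x + f_v * q
    dfv = f_v + f_x * p
    # dfv equals p * dfx identically, so every residual is 0 up to rounding
    # (only evaluated for non-invariants: for a true invariant both totals vanish).
    return -np.mean(np.log1p(np.abs(p - dfv / dfx)))


x, v, t = oscillator_samples()
for abc in [(1.0, 0.0, 1.0), (1.0, 0.0, 2.0), (1.0, 0.5, 2.0)]:
    print("corrected", abc, corrected_fitness(*abc, x, v, t))
for abc in [(1.0, 0.0, 2.0), (1.0, 0.5, 2.0)]:
    print("degenerate", abc, degenerate_fitness(*abc, x, v, t))
```

In this setup, the corrected score is close to 0 for the invariant (1, 0, 1), that is, H = x² + v², and clearly negative for the other candidates. The degenerate score, by contrast, is 0 up to rounding for every non-invariant candidate tested, which is the zero fitness reported in [3].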
Figure 2. The fitness landscapes of the harmonic oscillator reproduced from [3, 4]. The blue surface is our correctly implemented fitness function [2], and the green surface is the HFit [3] of Hillar et al. Our fitness function identifies all invariants of the harmonic oscillator in this space, which appear as a diagonal line of optima. These optima correspond to the Hamiltonian at different multiples or energies (e.g., if H is constant, so is cH for any real c). Note also that the two surfaces are scaled differently, but both have favorable gradients toward the optima.
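The diagonal line of optima in Figure 2 follows from the fact that a ratio of partial derivatives does not change when f is multiplied by a constant. The following sketch runs a small grid scan over (a, c) with b = 0. It uses hypothetical exact harmonic-oscillator samples and a corrected-style score written by us, not the code behind Figure 2. The level-curve slope -f_v/f_x again uses our assumed sign convention. For each a, the best c lies on the diagonal c = a:

```python
import numpy as np

# Hypothetical test data (ours): exact samples of x'' = -x with x(0) = 1, v(0) = 0.
t = np.linspace(0.0, 10.0, 2001)
x, v = np.cos(t), -np.sin(t)
# Finite-difference estimate of dx/dv along the trajectory, (dx/dt)/(dv/dt).
data_slope = np.gradient(x, t) / np.gradient(v, t)


def fitness(a, c, b=0.0):
    """Corrected-style score for f = a*v**2 + b*x*v**2 + c*x**2 (0 is the best possible value)."""
    f_x = 2 * c * x + b * v * v
    f_v = 2 * a * v + 2 * b * v * x
    model_slope = -f_v / f_x  # slope of the level curves of f (implicit-function theorem)
    return -np.mean(np.log1p(np.abs(data_slope - model_slope)))


grid = [0.5, 1.0, 1.5, 2.0]
scores = np.array([[fitness(a, c) for c in grid] for a in grid])  # rows: a, columns: c
best_c = [grid[int(np.argmax(row))] for row in scores]
print(best_c)  # prints [0.5, 1.0, 1.5, 2.0]
```

Every point with a = c describes a multiple of H = x² + v², so all of them score near 0. This is the line of optima described in the caption.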
References
[1] Schmidt M., Lipson H. (2009) "Distilling Free-Form Natural Laws from Experimental Data," Science, Vol. 324, no. 5923, pp. 81-85.
[2] Schmidt M., Lipson H. (2009) "Distilling Free-Form Natural Laws from Experimental Data," Supplementary online materials.
[3] Hillar C., Sommer F. "On the article 'Distilling free-form natural laws from experimental data'," (accessed July 24, 2009) http://www.msri.org/people/members/chillar/files/hs09b.pdf
[4] Hillar C., Sommer F. "On the article 'Distilling free-form natural laws from experimental data'," Matlab files (accessed July 24, 2009) http://www.msri.org/people/members/chillar/articles.html