Physics for Geologists, Second edition
Some dangers of mathematical statistics
Figure 12.9 contains the results of the first two series of experiments conducted by Darcy, and of the series conducted independently by Ritter and reported by Darcy. It is fairly clear that the lines representing the best fit of the data are good, but not perfect, and that the relationship is indeed linear, as Darcy's Equations (12.10) and (12.11) indicate. The best check on linearity is to do what Reynolds (of Reynolds number fame) did: plot the logarithms of the data (natural or base 10), or compute the linear regression of the logarithms; the slope of the line will indicate the order of the association. (Zeros and negative numbers in the data can be eliminated by adding a constant to all the data.) For a power-law relationship $y = bx^m$, taking logarithms gives $\ln y = m \ln x + \ln b$; for a linear relationship the slope $m$ should be 1, so that $\ln y = \ln x + \ln b$. If the slope is far from 1, there is little point in proceeding with the linear regression analysis, but the order of the association will be evident from the logarithmic data. The logarithms of Darcy's data are plotted in Figure 13.1, and it is clear that although the slopes are not all exactly 1 (marked by the dashed line), they are close to 1, and we can regard Darcy's equation as being linear over the range of values he measured. You must never extrapolate beyond the measured range without very good reason.
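A minimal sketch of the logarithmic check, in Python. It fits $\ln y = m \ln x + \ln b$ by ordinary least squares and returns the slope $m$ (the order of the association) and the coefficient $b$. It assumes strictly positive data (shift the data first, as described above, if they contain zeros or negative values); the function name `log_log_slope` and the example values are hypothetical and illustrative only, not Darcy's measurements.

```python
import math

def log_log_slope(xs, ys):
    """Fit ln y = m ln x + ln b by least squares; return (m, b).

    Assumes every x and y is strictly positive. A slope m close to 1
    indicates a linear relationship; m close to 2 a quadratic one, etc.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    m = sxy / sxx                        # slope of the log-log line
    b = math.exp(mean_y - m * mean_x)    # intercept ln b, converted back to b
    return m, b

# Hypothetical data, for illustration only.
xs = list(range(1, 10))
m, b = log_log_slope(xs, [2 * x for x in xs])   # y = 2x, linear
print(round(m, 3), round(b, 3))                 # → 1.0 2.0
m, b = log_log_slope(xs, [x * x for x in xs])   # y = x**2, quadratic
print(round(m, 3), round(b, 3))                 # → 2.0 1.0
```

A slope of 2 for the second set shows at once that the association is of second order, so a linear regression of the raw values would be pointless there.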
The data in Table 13.1 were obtained experimentally. Using a pocket calculator, it is found that the linear regression equation is $y = 3x - 3.66$ and that the correlation coefficient, r, is 0.97. This coefficient is extremely significant. Student's t, which is one method of assessing significance, is 14.0 for seven degrees of freedom (there are nine pairs of data, but two points will always lie on a straight line), whereas there is a probability of only about 1 per cent (0.01) that t will be 3.5 or larger by chance, and perhaps one in a million that it will be 14 or larger. You might therefore have great confidence that the relationship is linear, and you would be wrong.
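The regression, the correlation coefficient and Student's t can be sketched as follows. The code assumes that t comes from the usual test of a correlation coefficient, $t = r\sqrt{n-2}/\sqrt{1-r^2}$ with $n-2$ degrees of freedom; the text does not say which formula was used, and the helper names `t_from_r` and `fit_line` are hypothetical. Because Table 13.1 is not reproduced here, the example uses made-up data that follow $y = x^2$ exactly: a relationship that is certainly not linear, yet gives r of about 0.975 and a t well above 3.5.

```python
import math

def t_from_r(r, n):
    """Student's t for a correlation coefficient r from n pairs (n - 2 df)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

def fit_line(xs, ys):
    """Least-squares line y = a*x + c; returns (a, c, r, t, degrees of freedom)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    c = mean_y - a * mean_x            # intercept
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation coefficient
    return a, c, r, t_from_r(r, n), n - 2

# Hypothetical data following y = x**2, i.e. NOT a linear relationship.
xs = list(range(1, 10))
ys = [x * x for x in xs]
a, c, r, t, df = fit_line(xs, ys)
print(round(a, 2), round(c, 2), round(r, 3), round(t, 1), df)  # → 10.0 -18.33 0.975 11.7 7
print(t > 3.5)                                                  # → True
print(round(t_from_r(0.97, 9), 1))                              # → 10.6
```

Note how sensitive t is to r near 1: with this formula, r = 0.97 gives t of about 10.6, whereas r of about 0.983 gives t of about 14. Either way, a large t shows only that x and y are associated, not that the association is linear.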
There are a few clues. If you subtract the observed values of y from the values calculated from the regression equation, you will see that there is a systematic pattern to the errors. If you plot the data and the regression line, this pattern is evident (Figure 13.2).
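The check on the errors can be sketched in the same way, again with hypothetical $y = x^2$ data rather than Table 13.1. Following the text, each error is the calculated value minus the observed value. The helper `sign_runs` (a hypothetical name) counts runs of errors with the same sign, a simple heuristic in the spirit of the Wald–Wolfowitz runs test: errors scattered at random about a true straight line change sign often, whereas a curved relationship produces a few long runs.

```python
def fitted_minus_observed(xs, ys):
    """Values calculated from the least-squares line minus the observed y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx                      # slope of the regression line
    c = mean_y - a * mean_x            # intercept of the regression line
    return [(a * x + c) - y for x, y in zip(xs, ys)]

def sign_runs(errors):
    """Return the sign pattern of the errors and the number of same-sign runs."""
    signs = "".join("+" if e > 0 else "-" for e in errors)
    runs = 1 + sum(1 for s, u in zip(signs, signs[1:]) if s != u)
    return signs, runs

# Hypothetical data following y = x**2.
xs = list(range(1, 10))
ys = [x * x for x in xs]
errors = fitted_minus_observed(xs, ys)
print([round(e, 2) for e in errors])
print(sign_runs(errors))   # → ('--+++++--', 3)
```

With four negative and five positive errors, the expected number of runs under random ordering is $2n_1n_2/(n_1+n_2) + 1 \approx 5.4$; only three long runs, negative at both ends and positive in the middle, is the signature of a curved relationship.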
Copyright 2002 by Richard E. Chapman<br />
[Figure 13.1 plot: axis labelled "Logarithm of flow in litres/minute"]
Figure 13.1 Darcy's data, plotted as logarithms. The slopes are close to 1 (marked by the dashed line), indicating a linear relationship between the difference of head, Δh, and the discharge, Q.