thesis - Faculty of Information and Communication Technologies ...
Chapter 5. Growth Dynamics
mean, median, or standard deviation. Unfortunately, these descriptive
statistics provide little useful information about the distribution
of the data, particularly if it is skewed, as is common with many software
metrics [21,55,118,223,270,299]. Furthermore, the long-tailed distributions
typical of software metrics make precise interpretation with standard
descriptive statistical measures difficult.
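To illustrate how a long tail distorts these measures, the following Python sketch uses a small set of hypothetical per-class method counts (the values are invented for illustration and are not drawn from the systems studied here). A few very large classes pull the mean well above the typical class, and the standard deviation ends up larger than the mean itself.

```python
import statistics

# Hypothetical per-class metric values (e.g. method counts); NOT data from this study.
# A few very large classes form the long tail typical of software metrics.
method_counts = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 8, 13, 40, 120]

mean = statistics.mean(method_counts)
median = statistics.median(method_counts)
stdev = statistics.pstdev(method_counts)  # population standard deviation

# Fraction of classes whose value is at or below the mean.
share_below_mean = sum(v <= mean for v in method_counts) / len(method_counts)

print(f"mean={mean:.2f} median={median} pstdev={stdev:.2f}")
print(f"{share_below_mean:.0%} of classes are at or below the mean")
```

With these values the median is 3 while the mean is roughly 14.6, so only two of the fourteen classes exceed the "average" class.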
Commonly used summary measures such as the “arithmetic mean” and
“variance” capture the central tendency and the spread of a given data set.
However, where the distribution is strongly skewed, they become much less
reliable in helping us understand the shape of the underlying distribution
and how it changes. Additional problems may also arise from changes
in both the degree of concentration of individual values and the population
size. In particular, these summary measures are influenced by the
population size, which tends to increase as software systems evolve.
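The effect of population size can be sketched in the same way. In the hypothetical example below (values invented for illustration), a second release keeps every class of the first release unchanged and merely adds twenty small classes; both the mean and the variance shift noticeably even though no existing class was modified.

```python
import statistics

# Hypothetical release snapshots (NOT data from this study): release 2 keeps
# every class from release 1 unchanged and only adds 20 small classes with
# 2 methods each.
release_1 = [2, 3, 5, 8, 12, 20, 35, 60]
release_2 = release_1 + [2] * 20

for name, values in (("release 1", release_1), ("release 2", release_2)):
    print(f"{name}: n={len(values)} "
          f"mean={statistics.mean(values):.2f} "
          f"pvariance={statistics.pvariance(values):.2f}")
```

Here the mean falls from about 18.1 to about 6.6 purely because the population grew, which is why such measures are hard to compare across releases of an evolving system.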
Descriptive statistics such as the median and variance are also likely to be
misleading, given the nature of the underlying distribution. Specifically,
we found that the median does not change substantially over time,
which reduces its effectiveness for understanding software
evolution. An example of this is illustrated in Figure 5.2, where the
median of three different metrics is shown for PMD. As can be seen
in the figure, the median value is very stable over a period of nearly
5 years of evolution. Although the median does change somewhat, in
absolute terms its value does not convey sufficient information about
the nature and dynamics of the evolution.
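The stability of the median can be reproduced with a toy example. The three hypothetical snapshots below are invented values, not the PMD measurements shown in Figure 5.2. The system grows in class count, total size, and maximum class size, yet the median stays at 3 in every release because most classes remain small.

```python
import statistics

# Hypothetical metric snapshots for three releases (NOT the PMD data):
# most classes stay small, while a handful of classes grow much larger.
releases = {
    "1.0": [1, 2, 2, 3, 3, 4, 6, 10],
    "2.0": [1, 1, 2, 2, 3, 3, 4, 6, 15, 40],
    "3.0": [1, 1, 1, 2, 2, 3, 3, 3, 4, 6, 25, 90, 150],
}

for version, values in releases.items():
    print(f"{version}: n={len(values):2d} "
          f"median={statistics.median(values)} "
          f"max={max(values)} total={sum(values)}")
```

The total grows roughly ninefold across the three snapshots while the median does not move, so tracking the median alone would suggest almost no evolution.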
Additional statistics such as skewness, which measures the asymmetry
of the data, and kurtosis, which measures its peakedness,
may be applied, but they are ineffective for comparing systems
with different population sizes: these measures are unbounded
and change depending on the size of the underlying population, making
relative comparisons unreliable [221]. Given this situation, it is not
surprising that metric use in industry is not widespread [137]. Nor is
the situation helped by the current generation of software metric
tools, as many commercial and open source tools [47, 51, 196, 203,