20.01.2014

thesis - Faculty of Information and Communication Technologies ...


Chapter 5. Growth Dynamics

mean, median, or standard deviation. Unfortunately, these descriptive statistics provide little useful information about the distribution of the data, particularly if it is skewed, as is common with many software metrics [21,55,118,223,270,299]. Furthermore, the long-tailed distributions typical of software metrics make precise interpretation with standard descriptive statistical measures difficult.
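A small numeric illustration of this point, written as a minimal Python sketch: the values below are a hypothetical right-skewed sample of a per-class metric, invented for illustration rather than taken from any system studied here. Two large classes pull the mean far above the median, and the standard deviation exceeds the mean, so none of the three numbers describes either a typical class or the shape of the tail.

```python
from statistics import mean, median, pstdev


def summarise(values):
    """Return (median, mean, population standard deviation) of a sample."""
    return median(values), mean(values), pstdev(values)


# Hypothetical per-class metric values (e.g. methods per class) for a small,
# right-skewed system: most classes are small, two are very large.
# Illustrative numbers only, not measurements from PMD or any other system.
values = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 40, 120]

med, avg, sd = summarise(values)
print(f"median={med:.2f}  mean={avg:.2f}  stdev={sd:.2f}")
```

For this sample the median is 3.5, while the mean is about 14.9 (209/14).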

Commonly used summary measures such as the “arithmetic mean” and “variance” capture the central tendency and the spread of a given data set. However, where the distribution is strongly skewed, they become much less reliable in helping understand the shape of, and changes in, the underlying distribution. Moreover, additional problems may arise from changes in both the degree of concentration of individual values and the population size. Specifically, these summary measures are influenced by the population size, which tends to increase in evolving software systems.
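The dependence on population size can be sketched with synthetic data. The example below assumes, purely for illustration, that per-class metric values follow a Pareto distribution with shape alpha = 1.5 and minimum 1; this distribution and its parameters are not taken from the thesis, and the helper names are hypothetical. Instead of random draws it evaluates the Pareto quantile function at evenly spaced probabilities, so the output is reproducible. Because a Pareto distribution with alpha of 2 or less has infinite variance, the standard deviation keeps growing as the hypothetical system gains classes, even though the shape of the distribution never changes.

```python
from statistics import fmean, pstdev


def pareto_quantile_sample(n, alpha=1.5, x_min=1.0):
    """Deterministic, Pareto-shaped sample of size n (hypothetical helper).

    Evaluates the Pareto quantile function x = x_min * (1 - u) ** (-1 / alpha)
    at evenly spaced probabilities u = i / (n + 1), so every run is identical.
    """
    return [x_min * (1 - i / (n + 1)) ** (-1 / alpha) for i in range(1, n + 1)]


def summary_by_size(sizes):
    """Mean and population standard deviation for each hypothetical system size."""
    rows = []
    for n in sizes:
        sample = pareto_quantile_sample(n)
        rows.append((n, fmean(sample), pstdev(sample)))
    return rows


for n, avg, sd in summary_by_size([100, 1_000, 10_000, 100_000]):
    print(f"n={n:>7}  mean={avg:.3f}  stdev={sd:.3f}")
```

Under these assumptions the mean drifts slowly upward toward its theoretical value of 3, while the standard deviation rises several-fold between 100 and 100,000 classes. Comparing standard deviations between a small and a large system would therefore partly reflect their sizes rather than their shapes.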

Descriptive statistics such as the median and variance are also likely to be misleading, given the nature of the underlying distribution. Specifically, we found that the median does not change substantially over time, reducing its effectiveness when applied to understanding software evolution. An example of this is illustrated in Figure 5.2, where the median of three different metrics is shown for PMD. As can be seen in the figure, the median value is very stable over a period of nearly 5 years of evolution. Although the median does change somewhat, in absolute terms its value does not convey sufficient information about the nature and dynamics of the evolution.
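The limitation of the median can be shown with two hypothetical releases. In the minimal sketch below (invented numbers, not the PMD data behind Figure 5.2), the second release adds three small classes below the old median and three very large ones above it; the `describe` helper is hypothetical.

```python
from statistics import median

# Two hypothetical releases of one system; each value is a per-class metric
# such as methods per class. Illustrative numbers only, not PMD data.
release_1 = [1, 2, 2, 3, 4, 5, 7, 9, 15]
# Release 2 adds three small classes below the old median and three very
# large classes above it, so the median itself does not move.
release_2 = release_1 + [1, 1, 2, 80, 250, 600]


def describe(values):
    """Class count, median, total and maximum for one release."""
    return len(values), median(values), sum(values), max(values)


for name, values in (("release 1", release_1), ("release 2", release_2)):
    count, med, total, biggest = describe(values)
    print(f"{name}: classes={count} median={med} total={total} max={biggest}")
```

Both releases report a median of 4, while the total grows from 48 to 982 and the largest class from 15 to 600; the median alone would suggest that little had changed.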

Additional statistics such as skewness, which measures the asymmetry of the data, and kurtosis, which measures its peakedness, may be applied, but they are ineffective for comparing systems with different population sizes: these measures are unbounded and change depending on the size of the underlying population, which makes relative comparisons unreliable [221]. Given this situation, it is not surprising that the use of metrics in industry is not widespread [137]. This situation is also not helped by the current generation of software metric tools as many commercial and open source tools [47, 51, 196, 203,

