thesis - Faculty of Information and Communication Technologies ...
Chapter 5. Growth Dynamics
• How do the profile and shape of this distribution change as software
systems evolve?
• Are the rate and nature of change erratic?
• Do large and complex classes become bigger and more complex as
software systems evolve?
The typical method to answer these questions is to compute traditional
descriptive statistical measures such as the arithmetic mean (referred to as
“mean” in this thesis to improve readability), median and standard
deviation on a set of size and complexity measures, and then analyze
their changes over time. However, it has been shown that software size
and complexity metric distributions are non-Gaussian and highly
skewed with long tails [21, 55, 270]. This asymmetry limits
the effectiveness of traditional descriptive statistics such as the
mean and standard deviation, because these values are heavily influenced
by the samples in the tail, making it hard to derive meaningful inferences.
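To illustrate how a long tail distorts the traditional measures, the following minimal Python sketch compares the mean, median and standard deviation of a small set of class sizes with and without a single extreme class. The data values and the helper `summarize` are hypothetical and chosen for illustration; they are not measurements from this study.

```python
import statistics


def summarize(values):
    """Return (mean, median, sample standard deviation) of a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
    )


# Hypothetical metric values (e.g. number of methods per class) for ten
# classes; the single value 250 forms the long right tail.
class_sizes = [3, 4, 5, 5, 6, 7, 8, 10, 12, 250]
without_tail = class_sizes[:-1]  # the same classes minus the extreme one

for label, data in (("with tail", class_sizes), ("without tail", without_tail)):
    mean, median, stdev = summarize(data)
    print(f"{label:>12}: mean={mean:6.2f}  median={median:5.2f}  stdev={stdev:6.2f}")
```

Adding the one extreme class shifts the median only from 6 to 6.5, but it moves the mean from about 6.67 to 31 and raises the sample standard deviation by more than an order of magnitude.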
A recently advocated alternative method for analyzing metric distributions
[21,55,118,223,270,299] involves fitting metric data to a known probability
distribution. For instance, statistical techniques can be used to
determine whether the metric data fits a log-normal distribution [55]. Once a
strong fit is found, we can gain some insight into the software system
from the distribution parameters. Unfortunately, the approach of fitting
data to a known distribution is more complex, and the metric data
may not fit any known and well-understood probability distribution
without a transformation of the data.
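As an illustration of what such a fit can involve, the following Python sketch estimates log-normal parameters by maximum likelihood (the mean and population standard deviation of the log-transformed values) and computes the Kolmogorov-Smirnov (KS) distance between the data and the fitted distribution. This is one common approach, assumed here for illustration; it is not necessarily the procedure used in [55], and the function names and sample data are hypothetical. The sketch reports only the KS statistic. Because the parameters are estimated from the same sample, standard KS critical values do not apply directly, so a Lilliefors-style correction or a bootstrap would be needed to judge significance.

```python
import math
import statistics


def fit_lognormal(values):
    """Maximum-likelihood estimates (mu, sigma) of a log-normal distribution.

    If X is log-normal, ln(X) is normal, so the MLEs are the mean and the
    population standard deviation of the log-transformed data.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log-normal fitting requires strictly positive values")
    logs = [math.log(v) for v in values]
    mu, sigma = statistics.fmean(logs), statistics.pstdev(logs)
    if sigma == 0.0:
        raise ValueError("values must not all be equal")
    return mu, sigma


def lognormal_cdf(x, mu, sigma):
    """CDF of the log-normal distribution with parameters mu and sigma."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(values, mu, sigma):
    """Largest gap between the empirical CDF and the fitted log-normal CDF."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = lognormal_cdf(x, mu, sigma)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


class_sizes = [3, 4, 5, 5, 6, 7, 8, 10, 12, 250]  # hypothetical metric values
mu, sigma = fit_lognormal(class_sizes)
print(f"mu={mu:.3f}  sigma={sigma:.3f}  KS D={ks_statistic(class_sizes, mu, sigma):.3f}")
```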
Software metrics, it turns out, are distributed like wealth in society,
where a few individuals hold a high concentration of wealth while the
majority are dispersed across a broad range from very poor to what
are considered middle class. To take advantage of this property, we analyze
software metrics using the Gini coefficient, a bounded higher-order
statistic [191] widely used in the field of socio-economics to study the
distribution of wealth and how it changes over time. Specifically, it is