thesis - Faculty of Information and Communication Technologies ...


Chapter 5. Growth Dynamics

• How does the profile and shape of this distribution change as software systems evolve?
• Is the rate and nature of change erratic?
• Do large and complex classes become bigger and more complex as software systems evolve?

The typical method to answer these questions is to compute traditional descriptive statistical measures such as the arithmetic mean (referred to as “mean” in this thesis to improve readability), median and standard deviation on a set of size and complexity measures and then analyze their changes over time. However, it has been shown that software size and complexity metric distributions are non-Gaussian and are highly skewed with long tails [21, 55, 270]. This asymmetric nature limits the effectiveness of traditional descriptive statistical measures such as mean and standard deviation, as these values are heavily influenced by the samples in the tail, making it hard to derive meaningful inferences.
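
The effect of the tail on these measures can be illustrated with a small sketch. The data below is hypothetical (an invented list of methods-per-class counts, not taken from any system studied in this thesis), and the helper name `summarize` is an assumption made for illustration only.

```python
import statistics

# Hypothetical methods-per-class counts: most classes are small,
# and a single class in the tail is very large.
method_counts = [1, 2, 2, 3, 3, 4, 5, 6, 8, 150]


def summarize(values):
    """Return the mean, median and population standard deviation."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.pstdev(values),
    }


with_tail = summarize(method_counts)
without_tail = summarize(method_counts[:-1])  # drop the one large class

print("with tail:   ", with_tail)
print("without tail:", without_tail)
```

In this invented sample, removing one class out of ten changes the mean and standard deviation substantially, while the median barely moves. This is the sense in which the tail dominates the traditional measures.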

A recently advocated alternative method to analyze metric distributions [21, 55, 118, 223, 270, 299] involves fitting metric data to a known probability distribution. For instance, statistical techniques can be used to determine if the metric data fits a log-normal distribution [55]. Once a strong fit is found, we can gain some insight into the software system from the distribution parameters. Unfortunately, the approach of fitting data to a known distribution is more complex, and the metric data may not fit any known and well-understood probability distribution without a transformation of the data.
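
As a rough sketch of what such a fitting step might look like, the example below estimates log-normal parameters from the logarithms of the data and measures how far the data lies from the fitted distribution using a Kolmogorov-Smirnov distance. This is one possible technique chosen for illustration, not necessarily the procedure used in the cited work. The sample is synthetic, and the function names `fit_lognormal` and `ks_distance` are assumptions.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(values):
    """Estimate log-normal parameters (mu, sigma) from positive values.

    Uses the maximum-likelihood estimates: the mean and population
    standard deviation of the natural logarithms. Zero or negative
    values would first need a transformation (for example, a shift).
    """
    logs = [math.log(v) for v in values]
    return fmean(logs), pstdev(logs)


def ks_distance(values, mu, sigma):
    """Largest gap between the empirical CDF and the fitted CDF."""
    model = NormalDist(mu, sigma)  # distribution of log(values)
    logs = sorted(math.log(v) for v in values)
    n = len(logs)
    worst = 0.0
    for i, x in enumerate(logs, start=1):
        cdf = model.cdf(x)
        worst = max(worst, i / n - cdf, cdf - (i - 1) / n)
    return worst


# Synthetic stand-in for a size metric; real data would be measured
# from the classes of a software system.
rng = random.Random(42)
sample = [rng.lognormvariate(2.0, 0.8) for _ in range(2000)]

mu, sigma = fit_lognormal(sample)
distance = ks_distance(sample, mu, sigma)
print(f"mu={mu:.3f} sigma={sigma:.3f} KS distance={distance:.4f}")
```

A small distance suggests a plausible fit, but because the parameters are estimated from the same data, standard Kolmogorov-Smirnov critical values do not apply directly. The need to handle zero-valued metrics before taking logarithms is one example of the data transformation mentioned above.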

Software metrics, it turns out, are distributed like wealth in society, where a few individuals have a high concentration of wealth, while the majority are dispersed across a broad range from very poor to what are considered middle class. To take advantage of this nature, we analyze software metrics using the Gini coefficient, a bounded higher-order statistic [191] widely used in the field of socio-economics to study the distribution of wealth and how it changes over time. Specifically it is
