thesis - Faculty of Information and Communication Technologies ...
Chapter 5. Growth Dynamics
bution parameters as they summarise the data and can gain an insight
into the evolution by observing changes to the distribution parameters
over time.

Some of the early work on understanding object-oriented software metric
data by fitting it to a distribution was conducted by Tamai et al.
[269,270], who observed that the sizes of methods and classes (measured
in lines of code) within a hierarchy fit the negative binomial
distribution. More recently, researchers inspired by work in complex
systems [209, 287] (especially real-world networks) have attempted to
understand software metric distributions as power laws. Baxter et al. [21]
studied 17 metrics in a number of Java software systems and showed that
some metrics fit a log-normal distribution, others fit a power-law
distribution, and some fit neither. Potanin et al. [223] investigated
object graphs by analysing run-time data and found that incoming and
outgoing references fit a power-law distribution. Wheeldon et al. [299]
investigated the Java Development Kit and found that 12 metrics fit a
power-law distribution. In a detailed case study of VisualWorks Smalltalk,
the Java Development Kit and the Eclipse IDE, Concas et al. [54] observed
that out-degree measures of the class graph and class lines of code fit a
log-normal distribution, while method lines of code and in-degree measures
of the class graph fit a Pareto distribution. Herraiz [118] investigated
the distribution of SLOC (Source Lines of Code) in 12,010 packages
available for the FreeBSD software system and found that SLOC fit a
double Pareto distribution.

The common element in all of these studies is that software metric
distributions are non-Gaussian and tend to be positively skewed with
long tails. Unfortunately, these studies have not been able to identify
a consistent probability distribution that can be expected for a given
metric.

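The positive skew that these studies report can be checked on raw metric
data before any distribution is fitted. The following is a minimal Python
sketch and is not taken from any of the cited studies: the `skewness`
helper and the `in_degree` sample are hypothetical, with made-up values
chosen only to resemble a long-tailed metric. It computes the moment
coefficient of skewness and compares the mean with the median; a positive
coefficient and a mean well above the median are the usual signs of a
long right tail.

```python
from statistics import mean, median


def skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5.

    Positive for a long right tail, zero for symmetric data.
    """
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mu) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Hypothetical in-degree counts for 20 classes (illustrative values only).
in_degree = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 25, 60]

if __name__ == "__main__":
    print("mean     :", mean(in_degree))
    print("median   :", median(in_degree))
    print("skewness :", skewness(in_degree))
```

A check of this kind says only that the data is skewed; it does not
indicate which of the candidate distributions, if any, fits the metric.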
Although these studies consistently find skewed distributions whenever a
robust fit is obtained, the methods used to fit the distributions have
certain inherent weaknesses and limitations. In order to fit many of these
distributions, the raw data is often transformed, since software metric
data has a large number of zero values. For instance, it is common to have
a set of classes with no dependents or methods with no branching state-