20.01.2014 Views

thesis - Faculty of Information and Communication Technologies ...

Chapter 5. Growth Dynamics

bution parameters as they summarise the data and can gain an insight into the evolution by observing changes to the distribution parameters over time.

Some of the early work on understanding object-oriented software metric data by fitting it to a distribution was conducted by Tamai et al. [269, 270], who observed that the size of methods and classes (measured in lines of code) within a hierarchy fits the negative-binomial distribution. Recently, researchers inspired by work in complex systems [209, 287] (especially real-world networks) have attempted to understand software metric distributions as power laws. Baxter et al. [21] studied 17 metrics in a number of Java software systems and showed that some metrics fit a log-normal distribution, others fit a power-law distribution, and some fit neither of these distributions. Potanin et al. [223] investigated object graphs by analysing run-time data, and found that incoming and outgoing references fit a power-law distribution. Wheeldon et al. [299] investigated the Java Development Kit and found that 12 metrics fit a power-law distribution. In a detailed case study of VisualWorks Smalltalk, the Java Development Kit and the Eclipse IDE, Concas et al. [54] observed that out-degree measures of the class graph and class lines of code fit a log-normal distribution, while method lines of code and in-degree measures of the class graph fit a Pareto distribution. Herraiz [118] investigated the distribution of SLOC (Source Lines of Code) in 12,010 packages available for the FreeBSD software system and found that SLOC fitted a double Pareto distribution.

The common element in all of these studies is that software metric distributions are non-Gaussian and tend to be positively skewed with long tails. Unfortunately, these studies have not been able to identify a consistent probability distribution that can be expected for a given metric.
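As a rough illustration of what "positively skewed with long tails" means in practice, the following sketch computes the moment-based skewness of a sample. It is not taken from any of the cited studies: the data is synthetic (log-normal draws standing in for a size metric), and the names `sample_skewness` and `synthetic_metric_sample`, as well as the parameters 3.0 and 1.0, are assumptions chosen only for the example.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Return the moment-based skewness g1 = m3 / m2**1.5 of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return 0.0  # all values identical: no asymmetry to measure
    return m3 / m2 ** 1.5


def synthetic_metric_sample(size=5000, seed=42):
    """Hypothetical stand-in for a size metric: log-normal draws, not real data."""
    rng = random.Random(seed)
    return [rng.lognormvariate(3.0, 1.0) for _ in range(size)]


if __name__ == "__main__":
    sample = synthetic_metric_sample()
    # A long right tail pulls the mean above the median.
    print("mean    :", round(statistics.mean(sample), 2))
    print("median  :", round(statistics.median(sample), 2))
    print("skewness:", round(sample_skewness(sample), 2))
    # Taking logs of log-normal data yields roughly symmetric values.
    logs = [math.log(v) for v in sample]
    print("skewness of log-values:", round(sample_skewness(logs), 2))
```

A skewness well above zero together with a mean larger than the median is the signature shared by the studies above; it says nothing about which specific distribution (log-normal, Pareto, double Pareto) fits best.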

Despite consistent results that find skewed distributions when a robust fit is found, the methods used to fit the distributions have certain inherent weaknesses and limitations. In order to fit many of these distributions, the raw data is often transformed since software metric data has a large number of zero values. For instance, it is common to have a set of classes with no dependents or methods with no branching state-

