20.01.2014

thesis - Faculty of Information and Communication Technologies ...


Chapter 5. Growth Dynamics

plifying models constructed to understand software. Similar to Succi et al. [265], we computed Spearman's rank correlation coefficient ρ and applied the t-test to check whether the reported coefficient differs from zero at a significance level of 0.05, for all 10 measures in all systems. The t-test checks whether the reported relationship between the Gini Coefficient and Age (days since birth) can be considered statistically significant, while the correlation coefficient reports the strength of that relationship. We selected the non-parametric Spearman's correlation coefficient over Pearson's correlation coefficient because it makes no assumptions about the distribution of the underlying data [279]; in particular, it does not assume that the data has a Gaussian distribution.
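The correlation step above can be sketched in plain Python. This is a minimal, dependency-free illustration rather than the thesis's own implementation: the helper names (`average_ranks`, `pearson`, `spearman_rho`, `t_statistic`) are hypothetical, tied values are assumed to receive average ranks, and the significance test is assumed to be the common approximation t = ρ√((n − 2)/(1 − ρ²)) with n − 2 degrees of freedom. Turning t into a p-value needs the Student's t distribution (for example `2 * scipy.stats.t.sf(abs(t), n - 2)` for a two-sided test), which the sketch leaves out.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def t_statistic(rho: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(rho) >= 1.0:
        return math.copysign(math.inf, rho)  # perfect monotonic relationship
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


if __name__ == "__main__":
    age = [30, 90, 180, 365, 730]          # hypothetical ages in days
    gini = [0.41, 0.45, 0.44, 0.52, 0.58]  # hypothetical Gini values
    rho = spearman_rho(age, gini)
    print(rho, t_statistic(rho, len(age)))
```

Because ρ depends only on ranks, it captures any monotonic trend of Gini against Age without assuming the values themselves are Gaussian, which is the property the text relies on.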

5.3.3 Checking Shape of Metric Data Distribution

A consistent finding by other researchers [21, 55, 223, 270, 299] studying software metric distributions has been that this data is positively skewed with long tails. Can we confirm this finding in our own data? Further, does this shape assumption hold when metric data is observed over time? We undertook this step to provide additional support for the current expectation that metric data is highly skewed.

For a population with values $x_i$, $i = 1, \ldots, n$, with mean $\mu$ and standard deviation $\sigma$, the moment skewness is

$$\mathrm{MomentSkewness} = \frac{1}{n} \sum_{i=1}^{n} \frac{(x_i - \mu)^3}{\sigma^3} \tag{5.3.1}$$
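Equation 5.3.1 translates directly into code. The following is a minimal sketch (the function name `moment_skewness` is hypothetical) that assumes σ is the population standard deviation, computed with divisor n to match the 1/n factor in the equation. A positive result indicates a long right tail, the shape reported for software metrics, while a symmetric sample gives 0.

```python
import math
from typing import Sequence


def moment_skewness(values: Sequence[float]) -> float:
    """Moment skewness of a population, following Equation 5.3.1.

    Uses the population mean and the population standard deviation
    (dividing by n, not n - 1).
    """
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)


if __name__ == "__main__":
    print(moment_skewness([1, 2, 3]))       # 0.0 (symmetric)
    print(moment_skewness([1, 1, 1, 10]))   # about 1.155 (long right tail)
```

A single large value among many small ones, as in the second call, is enough to push the measure well above zero, which is why heavy-tailed metric data yields clearly positive skewness.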

In our analysis, we tested the metric data for each release over the entire evolution history, using the Shapiro-Wilk goodness-of-fit test for normality [279] at a significance level of 0.05, to check whether the data followed a Gaussian distribution. The expectation is that the test will show that the metric data is not normally distributed. Additionally, to confirm that the distribution can be considered skewed, we computed the descriptive measure of moment skewness (see Equation
