thesis - Faculty of Information and Communication Technologies ...
Chapter 5. Growth Dynamics
plifying models constructed to understand software. Similar to Succi
et al. [265], we computed Spearman’s rank correlation coefficient ρ
and applied the t-test to check whether the reported coefficient differs from
zero at a significance level of 0.05, for all 10 measures in all systems.
The t-test checks whether the reported relationship between the Gini Coefficient
and Age (days since birth) can be considered statistically
significant, while the correlation coefficient reports the strength of the
relationship between the Gini Coefficient and Age. We selected the non-parametric
Spearman’s correlation coefficient over Pearson’s correlation
coefficient because it makes no assumptions about the
distribution of the underlying data [279]; specifically, it does not assume
that the data follow a Gaussian distribution.
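The computation described above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions, not the procedure actually used in the study: Spearman’s ρ is computed as Pearson’s r on average ranks (tied values share the mean rank), and significance is assessed with the usual statistic t = ρ√((n − 2)/(1 − ρ²)) on n − 2 degrees of freedom. To keep the example self-contained, the two-sided p-value is obtained by integrating the Student t density numerically with Simpson’s rule; that numerical shortcut, all function names, and the sample Age/Gini values are hypothetical illustrations.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density over [0, |t|] with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def spearman_t_test(x, y, alpha=0.05):
    """Return (rho, p, significant) for H0: rho = 0, using t on n - 2 df."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    rho = spearman_rho(x, y)
    if 1.0 - rho * rho <= 1e-12:  # (near-)perfect monotonic relation
        return rho, 0.0, True
    t = rho * math.sqrt((n - 2) / (1.0 - rho * rho))
    p = two_sided_p(t, n - 2)
    return rho, p, p < alpha


# Hypothetical example: Age (days since birth) vs. Gini Coefficient.
age = [0, 30, 60, 90, 120, 150, 180, 210]
gini = [0.61, 0.63, 0.62, 0.65, 0.66, 0.68, 0.67, 0.70]
rho, p, significant = spearman_t_test(age, gini)
print(f"rho={rho:.3f} p={p:.5f} significant={significant}")
```

For these hypothetical values ρ = 20/21 ≈ 0.952, and the hypothesis ρ = 0 is rejected at the 0.05 level. In practice, `scipy.stats.spearmanr` returns the coefficient together with a two-sided p-value and would normally be preferred; the hand-written version only makes the ρ → t → p chain explicit.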
5.3.3 Checking Shape of Metric Data Distribution
A consistent finding by other researchers [21, 55, 223, 270, 299] studying
software metric distributions has been that this data is positively
skewed with long tails. Can we confirm this finding in our own data?
Further, does this shape assumption still hold when metric data is observed
over time? We undertook this step to provide additional support
for the current expectation that metric data is highly skewed.
For a population with values $x_i$, $i = 1, \ldots, n$, with mean $\mu$ and
(population) standard deviation $\sigma$, the moment skewness is

$$\text{MomentSkewness} = \frac{1}{n}\sum_{i=1}^{n}\frac{(x_i - \mu)^3}{\sigma^3} \tag{5.3.1}$$
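As a minimal sketch, assuming the metric values of a single release are available as a list of numbers, the skewness measure of Equation (5.3.1) (the standard moment coefficient of skewness) can be computed directly. The function name and the sample data below are hypothetical. Both $\mu$ and $\sigma$ divide by $n$, matching the population wording of the equation; a value near zero indicates a symmetric distribution, while a positive value indicates a long right tail.

```python
import math


def moment_skewness(values):
    """Skewness as in Equation (5.3.1): (1/n) * sum((x_i - mu)^3) / sigma^3.

    mu is the population mean and sigma the population standard deviation
    (both divide by n, matching the 'population' wording of the equation).
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mu = sum(values) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in values) / n)
    if sigma == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - mu) ** 3 for x in values) / (n * sigma ** 3)


# Hypothetical metric values for one release (e.g. methods per class):
# a few large classes produce a long right tail and a positive skewness.
methods_per_class = [1, 1, 2, 2, 3, 4, 20]
print(moment_skewness(methods_per_class))  # > 0: positively skewed
```

As a cross-check, `scipy.stats.skew` with its default `bias=True` computes this same population (biased) estimator.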
In our analysis, we tested the metric data of each release over the entire
evolution history to confirm that it did not follow a Gaussian
distribution, using the Shapiro-Wilk test for normality
[279] at a significance level of 0.05. We expected the test
to show that the metric data is not normally distributed. Additionally,
to confirm that the distribution can be considered skewed, we computed
the descriptive measure of moment skewness (see Equation