thesis - Faculty of Information and Communication Technologies ...
Chapter 3. Data Selection Methodology<br />
The size and skill of development teams, though helpful, were criteria<br />
that we removed after an initial pass at selecting systems, mainly because<br />
it was not possible to obtain this information accurately. In some<br />
of our projects, the software used to host the source control repositories<br />
changed during the evolutionary history of the project, and many projects<br />
archive older contribution logs at regular intervals, removing<br />
access to this data. These aspects limited our ability to determine the<br />
number of active, contributing developers on a project, specifically<br />
during the early part of its evolution. Another facet that could not be<br />
accurately determined was the level of contribution from different<br />
developers. That is, we were not able to identify reliably whether some developers<br />
contribute more code than others. Further, some project members<br />
contributed artwork or documentation, or organised and conducted meetings,<br />
while others focused on testing. These non-code contributors were<br />
often not visible as active contributors in the source code repository.<br />
Another interesting finding during our investigation was that developers<br />
who have not contributed any material code for many years are still<br />
shown as members of the project. These limitations, including the observation<br />
that a small subset of developers is responsible<br />
for a large share of the changes and additions to the source code<br />
in open source software, have been noted by Capiluppi et al. [39].<br />
The observation by Capiluppi et al. [39] that few developers contribute most of the code,<br />
together with the variance in contribution levels over<br />
time, indicates that we require a measure that can meaningfully identify<br />
the normalised number of developers working on a project at any<br />
given point in time. However, no such metric has yet been developed<br />
and widely accepted as effective, and hence we did not rely on<br />
development team size as a variable in our study.<br />
3.5 Selected Systems - An Overview<br />
Using the selection criteria, we initially identified hundreds of software systems<br />
that satisfy them. However, we focused on a smaller, representative<br />
subset so that we could study each of the selected systems<br />
in greater depth. Our final data set comprises forty software sys-<br />