20.01.2014 Views

thesis - Faculty of Information and Communication Technologies ...


Chapter 5. Growth Dynamics

(see Figure 5.13 showing the Number of Branches Gini Coefficient values for JabRef). JabRef is a tool that helps manage bibliography databases, specifically BibTeX files. Interestingly, the developers initially used a hand-written parser and in RSN 6 introduced a machine-generated parser. The introduction of the machine-generated parser for BibTeX files caused the Number of Branches Gini Coefficient value to increase from 0.74 to approximately 0.91 (see Figure 5.13).

High Gini Coefficient values indicate that these systems have a small set of very large and complex classes (wealthy in terms of size and complexity). As noted earlier, human developers rarely write code in which Gini Coefficients for specific measures go past 0.80, as they find it hard to write and maintain methods with very high algorithmic complexity [74,263]. The ability to detect and flag machine-generated code is valuable since it signals the possible need for additional expertise in order to maintain or enhance an existing code base and to meet strategic objectives.
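To make the 0.80 threshold concrete, the following is a minimal sketch of a Gini Coefficient computed over per-class branch counts. It uses the standard rank-weighted formula for sorted, non-negative data, which is an assumption here since this section does not restate the exact formula used in the thesis. The function name `gini` and both sample distributions are hypothetical and are not data from the study.

```python
def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal,
    values near 1 mean the total is concentrated in a few items."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over ascending data, with ranks starting at 1.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n


# Hypothetical branch counts per class (illustrative only).
evenly_spread = [5, 5, 5, 5]          # every class has the same branch count
one_huge_class = [2] * 19 + [1000]    # one class dominates, as a generated parser might

even_gini = gini(evenly_spread)
skewed_gini = gini(one_huge_class)
```

In this sketch a single very large class among many small ones is enough to push the coefficient well above 0.80, which matches the intuition behind flagging such values for inspection.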

Interestingly, we did notice a higher Number of Branches value (Gini Coefficient around 0.8) in two XML parsers, Xerces and Xalan, both of which contained human-written parsers. We determined that they were hand-written by studying repository access logs and comments left within the code. When machines generate parsers, they create a full set of classes rather than partially adjusting a few lines to fix defects; this is visible in the repository logs related to those files. Further, the code files written by machine use variable names that are alphanumeric, with the numeric component increasing in value for each new variable, rather than names in a human-readable form (for example, the variables generated by the machine-generated parser take the form jj_123, jj_124, jj_125, etc.).
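The naming pattern described above could also be checked mechanically. The sketch below is a hypothetical heuristic, not the procedure used in this study (which relied on manual inspection of repository logs and code comments): it looks for identifiers made of an alphabetic prefix, an underscore, and a number, and reports prefixes whose numbers form a consecutive run. The function name `numbered_name_runs`, the regular expression, and the `min_run` threshold of 3 are all assumptions chosen for illustration.

```python
import re

# Assumed identifier shape: alphabetic prefix, underscore, digits (e.g. jj_123).
NUMBERED_NAME = re.compile(r"\b([A-Za-z]+)_(\d+)\b")


def numbered_name_runs(source, min_run=3):
    """Map each prefix to its longest run of consecutive numeric suffixes,
    keeping only prefixes whose longest run is at least `min_run`."""
    suffixes = {}
    for prefix, digits in NUMBERED_NAME.findall(source):
        suffixes.setdefault(prefix, set()).add(int(digits))

    flagged = {}
    for prefix, numbers in suffixes.items():
        longest = run = 0
        previous = None
        for n in sorted(numbers):
            # Extend the run if n directly follows the previous suffix.
            run = run + 1 if previous is not None and n == previous + 1 else 1
            longest = max(longest, run)
            previous = n
        if longest >= min_run:
            flagged[prefix] = longest
    return flagged


# Hypothetical snippets for illustration.
generated_like = "int jj_123 = 0; int jj_124 = 1; int jj_125 = 2; int total_1 = 0;"
hand_written_like = "int count = index + 1; String entryKey = readKey(reader);"

generated_hits = numbered_name_runs(generated_like)
hand_written_hits = numbered_name_runs(hand_written_like)
```

A check like this would only be a first filter: hand-written code can also contain numbered names, so flagged files would still need the kind of manual confirmation described above.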

Surprisingly, in software with human-written parsers, the developers chose to centralise the functionality of the parser into a few classes with large and complex methods rather than distributing the complexity among a large number of classes. Though these classes did not match the size or complexity of machine-generated code, they influenced the overall distribution sufficiently to cause a higher Gini value.
