thesis - Faculty of Information and Communication Technologies ...
Chapter 5. Growth Dynamics<br />
(see Figure 5.13 showing the Number of Branches Gini Coefficient values<br />
for JabRef). JabRef is a tool that helps manage bibliography databases,<br />
specifically BibTeX files. Interestingly, the developers initially used a<br />
hand-written parser and in RSN 6 introduced a machine-generated<br />
parser. The introduction of the machine-generated parser for BibTeX<br />
files caused the Number of Branches Gini Coefficient value to increase<br />
from 0.74 to approximately 0.91 (see Figure 5.13).<br />
High Gini Coefficient values indicate that these systems have a small<br />
set of very large and complex classes (wealthy in terms of size and complexity).<br />
As noted earlier, human developers rarely write code in which<br />
Gini Coefficients for specific measures exceed 0.80, as they find it<br />
hard to write and maintain methods with very high algorithmic complexity<br />
[74,263]. The ability to detect and flag machine-generated<br />
code is valuable since it signals the possible need for additional expertise<br />
in order to maintain or enhance an existing code base and to meet<br />
strategic objectives.<br />
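The flagging idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in this study: the function names `gini` and `exceeds_human_range` are hypothetical, the per-class branch counts are invented sample data, and the formula is the common rank-weighted form of the Gini Coefficient, which may differ in detail from the formulation used elsewhere in this thesis. Only the 0.80 threshold comes from the text.

```python
def gini(values):
    """Gini Coefficient of non-negative values.

    0 means the metric is spread evenly across classes; values near 1
    mean a few classes hold most of it.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the values in ascending order.
    weighted = sum((rank + 1) * x for rank, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1) / n


# Threshold taken from the text: hand-written code rarely goes past 0.80.
HUMAN_RANGE_LIMIT = 0.80


def exceeds_human_range(values, limit=HUMAN_RANGE_LIMIT):
    """Return True when the distribution is more concentrated than `limit`."""
    return gini(values) > limit


# Invented Number of Branches counts per class (illustrative only).
hand_written = [3, 5, 4, 6, 2, 8, 5, 4, 7, 3]
with_generated_parser = hand_written + [900]  # one very large generated class

print(exceeds_human_range(hand_written))
print(exceeds_human_range(with_generated_parser))
```

A value above the limit is only a prompt for closer inspection; as the discussion of hand-written parsers in this section shows, human-written code can also approach 0.8.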
Interestingly, we did notice a higher Number of Branches value (Gini Coefficient<br />
around 0.8) in two XML parsers, Xerces and Xalan, both of which<br />
contained human-written parsers. We determined that they were hand-written<br />
by studying repository access logs and comments left within<br />
the code. When machines generate parsers, they create a full set of<br />
classes rather than partially adjusting a few lines to fix defects; this is<br />
visible in the repository logs related to those files. Further, the code<br />
files written by machine use variable names that are alphanumeric,<br />
with the numeric component increasing in value for each new variable,<br />
rather than in a human-readable form (for example, the variables generated<br />
by the machine-generated parser take the form jj_123, jj_124,<br />
jj_125, etc.).<br />
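The naming pattern described above lends itself to a simple automated check. The sketch below is an assumption-laden illustration and not the procedure used in this study, where the determination was made by reading repository logs and code comments. The function names and the `min_run` cut-off of three are hypothetical choices. The check looks for identifiers that share a prefix and carry consecutive numeric suffixes.

```python
import re
from collections import defaultdict

# An identifier made of a letter/underscore prefix followed by digits,
# for example jj_123 (prefix "jj_", number 123).
_NUMBERED_IDENTIFIER = re.compile(r"\b([A-Za-z_]+)(\d+)\b")


def longest_numbered_run(source):
    """Length of the longest run of consecutive numeric suffixes
    that share one prefix (jj_123, jj_124, jj_125 gives 3)."""
    numbers_by_prefix = defaultdict(set)
    for prefix, digits in _NUMBERED_IDENTIFIER.findall(source):
        numbers_by_prefix[prefix].add(int(digits))

    best = 0
    for numbers in numbers_by_prefix.values():
        for n in numbers:
            if n - 1 in numbers:
                continue  # n is not the start of a run
            length = 1
            while n + length in numbers:
                length += 1
            best = max(best, length)
    return best


def looks_machine_generated(source, min_run=3):
    """Heuristic only: min_run is an arbitrary, hypothetical cut-off."""
    return longest_numbered_run(source) >= min_run


generated_snippet = "int jj_123; int jj_124; int jj_125;"
hand_written_snippet = "int lineCount; int total1; String name;"

print(looks_machine_generated(generated_snippet))
print(looks_machine_generated(hand_written_snippet))
```

Hand-written code can also contain short numbered sequences (for example x1, x2, x3), so a check like this would complement, not replace, inspection of the repository history.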
Surprisingly, in software with human-written parsers, the developers<br />
chose to centralise the functionality of the parser into a few classes<br />
with large and complex methods rather than distributing the complexity<br />
among a large number of classes. Though these classes did not match<br />
the size or complexity of machine-generated code, they influenced the<br />
overall distribution sufficiently to cause a higher Gini value.<br />