20.01.2014 Views

thesis - Faculty of Information and Communication Technologies ...


Chapter 5. Growth Dynamics

Another dimension of machine-generated code was observed when we investigated two systems, PMD and Checkstyle, that had very high Gini Coefficient values for Number of Attributes and Number of Methods. These systems had a substantial number of classes with either zero fields or zero methods, and this zero inflation increases the Gini Coefficient value. But what type of design choices cause such an unusual shape? Interestingly, both PMD and Checkstyle serve a similar purpose: they help developers check that Java source code adheres to specified coding conventions and standards. This functionality is provided via a set of configurable rules around coding standards, formatting rules and source code metrics.
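The zero-inflation effect described above can be illustrated with a small numerical sketch. The function below uses the standard sorted-data formula for the Gini coefficient; the method counts are hypothetical values chosen for illustration and are not data from the study.

```python
def gini(values):
    """Gini coefficient of non-negative numbers (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-data formula: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical method counts per class (illustrative only).
baseline = [2, 3, 4, 5, 6]
# The same classes plus fifteen classes that declare no methods.
zero_inflated = [0] * 15 + baseline

print(round(gini(baseline), 3))       # 0.2
print(round(gini(zero_inflated), 3))  # 0.8
```

Adding the zero-valued classes leaves the total number of methods unchanged but concentrates it in a quarter of the classes, which is why the coefficient rises so sharply in this example.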

A deeper investigation showed that both Checkstyle and PMD represent the Java source code that they process by constructing an Abstract Syntax Tree (AST) and use the visitor pattern to walk the nodes in the tree. In both systems, the rules to check source code were developed by extending an abstract rule class, with the child class providing the rule implementation. The rules that developers could configure and add in these tools were written without making use of any fields and were often encapsulated within a single method to satisfy the visitor pattern. This design approach created a large number of "poor" classes, making those that actually possessed more functionality look unusually rich and pushing the Gini coefficient value closer to 1.
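The rule design described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual PMD or Checkstyle API: the names `Node`, `AbstractRule`, `EmptyCatchBlockRule` and `ShortVariableNameRule` are hypothetical, and the tree walk is a simplified stand-in for the visitor pattern (no double dispatch).

```python
from abc import ABC, abstractmethod


class Node:
    """Minimal AST node: a kind label, optional text, and child nodes."""

    def __init__(self, kind, text="", children=None):
        self.kind = kind
        self.text = text
        self.children = children or []


class AbstractRule(ABC):
    """Hypothetical base class: owns the tree walk and the violations."""

    def __init__(self):
        self.violations = []

    def apply(self, node):
        # Depth-first walk that lets the rule inspect every node.
        self.visit(node)
        for child in node.children:
            self.apply(child)
        return self.violations

    def report(self, node, message):
        self.violations.append((node.kind, node.text, message))

    @abstractmethod
    def visit(self, node):
        """Inspect one node; concrete rules implement only this method."""


class EmptyCatchBlockRule(AbstractRule):
    # Declares no fields of its own and a single method.
    def visit(self, node):
        if node.kind == "catch" and not node.children:
            self.report(node, "empty catch block")


class ShortVariableNameRule(AbstractRule):
    def visit(self, node):
        if node.kind == "variable" and len(node.text) < 3:
            self.report(node, "variable name too short")


# Hypothetical tree for a method with a short variable and an empty catch.
tree = Node("method", "run", [
    Node("variable", "x"),
    Node("try", children=[Node("statement", "doWork()")]),
    Node("catch"),
])

empty_catch = EmptyCatchBlockRule().apply(tree)
short_names = ShortVariableNameRule().apply(tree)

# Methods declared directly on the rule class (inherited ones excluded).
own_methods = [name for name, member in vars(EmptyCatchBlockRule).items()
               if callable(member)]
```

Each concrete rule in this sketch contributes one method and zero fields to the metric data, so a system with many such rules accumulates many classes at the low end of both distributions.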

5.6 Summary<br />

Evolving software tends to exhibit growth. In this chapter our focus was on how maintenance effort is distributed, which we studied by analyzing class size and complexity metric distributions. The data we collected for our study showed that size and complexity are unevenly distributed, confirming similar observations made by other researchers. However, these skewed distributions cause standard statistical summary measures such as mean and standard deviation to provide misleading information, making their application for comparative analysis challenging. In order to overcome this limitation, we analyzed the data using the Gini coefficient, a higher-order statistic widely used in economics to study the distribution
