thesis - Faculty of Information and Communication Technologies ...
Chapter 4. Measuring Evolving Software
4.5.1 JAR Extraction
We begin the metric extraction process by extracting the compiled class files from the set of Java Archives (JAR files) associated with each individual version. JAR files are the standard packaging and distribution format used by all of the Java projects under analysis. The set of JAR files provided as input to this extraction step was constructed manually, and all known external libraries (also packaged as JAR files) were manually tagged for removal (cf. Section 3.6.2 for a discussion of the rationale for removing external libraries).
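The extraction step itself can be sketched with the standard java.util.jar API. This is a minimal illustration under stated assumptions, not the tooling used in this study: the class name JarClassExtractor, the helper buildSampleJar (which builds a small in-memory JAR so the example runs without any file on disk) and the sample entry names are all hypothetical. The sketch keeps only entries ending in .class and skips directories, manifests and resource files.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;

// Hypothetical helper: pulls the compiled classes out of one JAR file.
class JarClassExtractor {

    /** Maps each compiled class entry (e.g. "org/example/app/Main.class") to its bytecode. */
    static Map<String, byte[]> extractClasses(InputStream jarStream) {
        Map<String, byte[]> classes = new LinkedHashMap<>();
        try (JarInputStream jar = new JarInputStream(jarStream)) {
            JarEntry entry;
            while ((entry = jar.getNextJarEntry()) != null) {
                // Keep bytecode only; skip directories, manifests and resource files.
                if (!entry.isDirectory() && entry.getName().endsWith(".class")) {
                    classes.put(entry.getName(), jar.readAllBytes()); // reads the current entry only
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return classes;
    }

    /** Builds a small in-memory JAR (one class, one resource) so the sketch is self-contained. */
    static byte[] buildSampleJar() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(out)) {
            for (String name : new String[] {"org/example/app/Main.class", "org/example/app/app.properties"}) {
                jar.putNextEntry(new JarEntry(name));
                jar.write(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE}); // placeholder bytes
                jar.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        Map<String, byte[]> classes = extractClasses(new ByteArrayInputStream(buildSampleJar()));
        System.out.println(classes.keySet()); // prints [org/example/app/Main.class]
    }
}
```

In practice the input stream would come from each JAR file of a version (for example via java.nio.file.Files.newInputStream), with the tagged external-library JARs already excluded.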
JAR files were tagged as potential external libraries based on the package names of the classes they contain. Package names proved an effective way to detect potential external libraries because Java developers tend to follow the recommended package naming convention and embed the name of the project, organisation or team in the package name [235]. For example, all classes developed by the Hibernate project have package names that start with org.hibernate. We exploited this convention by sorting the package names, clustering them manually, and then identifying potential third-party JAR files. Once a potentially distinct set of packages was identified, we used a Google search to check whether a distinct project with its own source code repository, matching the identified package signature, was available on the web.
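The sort-and-cluster step could be approximated in code as follows. This is a hypothetical sketch, not the procedure used in the study (where clustering was done by hand): it assumes that the first two segments of a package name (e.g. org.hibernate) identify the contributing project. The names PackageClusterer, packageOf and cluster, and the com.acme sample packages, are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical heuristic: group sorted package names by a fixed-length prefix.
class PackageClusterer {

    /** Converts a class entry such as "org/hibernate/cfg/Configuration.class" to "org.hibernate.cfg". */
    static String packageOf(String classEntry) {
        int slash = classEntry.lastIndexOf('/');
        return slash < 0 ? "" : classEntry.substring(0, slash).replace('/', '.'); // "" = default package
    }

    /** Groups package names by their first {@code depth} segments; TreeMap/TreeSet keep them sorted. */
    static Map<String, Set<String>> cluster(List<String> classEntries, int depth) {
        Map<String, Set<String>> clusters = new TreeMap<>();
        for (String entry : classEntries) {
            String pkg = packageOf(entry);
            String[] parts = pkg.split("\\.");
            String prefix = String.join(".", Arrays.copyOf(parts, Math.min(depth, parts.length)));
            clusters.computeIfAbsent(prefix, k -> new TreeSet<>()).add(pkg);
        }
        return clusters;
    }

    public static void main(String[] args) {
        List<String> entries = List.of(
                "org/hibernate/cfg/Configuration.class",
                "org/hibernate/Session.class",
                "com/acme/app/Main.class",
                "com/acme/app/ui/Window.class");
        cluster(entries, 2).forEach((prefix, pkgs) -> System.out.println(prefix + " " + pkgs));
        // prints:
        // com.acme [com.acme.app, com.acme.app.ui]
        // org.hibernate [org.hibernate, org.hibernate.cfg]
    }
}
```

A prefix depth of two suits names like org.hibernate, but a library such as Apache Commons (org.apache.commons) would need a depth of three, which is one reason a human review of the sorted list remains useful.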
Using this external library identification technique on our data set, we were able to identify separate project web sites as well as source code repositories for many third-party libraries within the software systems. Once a distinct open-source project was identified as the primary contributor of an external library, we created a regular expression to match the package names of known third-party libraries and used this regular expression to identify and remove external library JAR files from all versions of the software system. For example, the pattern created to identify the use of the Apache Commons library matched package names of the form org.apache.commons.*.
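A regular-expression filter of this kind might look like the sketch below. It is an illustration under stated assumptions: apart from the org.apache.commons.* example given above, the pattern list (including the org.hibernate pattern), the JAR names, the class name ExternalLibraryFilter and the rule that a JAR counts as external only when every one of its packages matches a known-library pattern are hypothetical choices, not the exact criteria used in this study.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical filter that drops JARs whose packages all belong to known third-party libraries.
class ExternalLibraryFilter {

    // Hypothetical pattern list; org.apache.commons.* is the example given in the text.
    static final List<Pattern> KNOWN_LIBRARIES = List.of(
            Pattern.compile("org\\.apache\\.commons\\..*"),
            Pattern.compile("org\\.hibernate(\\..*)?"));

    /** Assumption: a JAR is external only if it is non-empty and every package matches a known pattern. */
    static boolean isExternal(List<String> packages) {
        return !packages.isEmpty() && packages.stream()
                .allMatch(pkg -> KNOWN_LIBRARIES.stream().anyMatch(p -> p.matcher(pkg).matches()));
    }

    /** Returns the JARs kept for metric extraction, in input order, with external libraries removed. */
    static List<String> keepProjectJars(Map<String, List<String>> packagesByJar) {
        List<String> kept = new ArrayList<>();
        packagesByJar.forEach((jar, pkgs) -> {
            if (!isExternal(pkgs)) {
                kept.add(jar);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        Map<String, List<String>> version = new LinkedHashMap<>();
        version.put("myapp-core.jar", List.of("com.acme.app", "com.acme.app.ui"));
        version.put("commons-lang.jar", List.of("org.apache.commons.lang", "org.apache.commons.lang.text"));
        version.put("hibernate3.jar", List.of("org.hibernate", "org.hibernate.cfg"));
        System.out.println(keepProjectJars(version)); // prints [myapp-core.jar]
    }
}
```

Requiring every package to match, rather than any, is a conservative choice: a project JAR that bundles a copy of a third-party package is kept rather than silently discarded.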