20.01.2014 Views

thesis - Faculty of Information and Communication Technologies ...


Chapter 4. Measuring Evolving Software

4.5.1 Jar Extraction

We begin our metric extraction process by first extracting the compiled class files from inside the set of Java Archives (JAR files) associated with an individual version. JAR files are a standard packaging and distribution method used by all Java projects under analysis. The set of JAR files provided as input into this extraction step was manually constructed, and all known external libraries (also packaged as JAR files) were tagged manually for removal (cf. Section 3.6.2 for a discussion of the rationale for removing external libraries).

JAR files were tagged as potential external libraries based on the package names of classes inside the JAR file. We found that using the package names was an effective method to detect potential external libraries because Java developers tend to follow the recommended standard package naming convention and embed the name of the project, organisation or team in the package name [235]. For example, all classes developed by the Hibernate project have a package name that starts with org.hibernate. We used this common naming convention to cluster package names manually (after a simple sort) and then identify potential third-party JAR files. Once a potentially distinct set of packages was identified, we used a Google search to check whether a distinct project with its own source code repository, matching the identified package signature, was available on the web.
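
The clustering described above was done by hand. The following Python sketch only shows one way the sorted listing that supports such a manual inspection could be produced; the function names and the `depth` parameter are assumptions introduced for illustration.

```python
from collections import Counter


def package_of(entry):
    """Map 'org/hibernate/cfg/Configuration.class' to 'org.hibernate.cfg'."""
    directory, _, _ = entry.rpartition("/")
    return directory.replace("/", ".")


def prefix_counts(entries, depth=2):
    """Count classes per package prefix and return the result sorted.

    `depth` is the number of leading package segments kept, so depth=2
    groups 'org.hibernate.cfg' and 'org.hibernate.impl' under 'org.hibernate'.
    """
    counts = Counter()
    for entry in entries:
        package = package_of(entry)
        if package:  # skip classes in the default (unnamed) package
            prefix = ".".join(package.split(".")[:depth])
            counts[prefix] += 1
    return sorted(counts.items())
```

A depth of 2 separates org.hibernate from org.apache, while a depth of 3 would be needed to tell org.apache.commons apart from other Apache projects; the right depth depends on how each organisation structures its package names.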

Using this external library identification technique on our data set, we were able to identify separate project web sites as well as source code repositories for many third-party libraries within the software systems. Once a distinct open-source project was identified as the primary contributor of the external library, we created a regular expression to match the package names of known third-party libraries and used this regular expression to identify and remove external library JAR files from all versions of the software system. An example of such a pattern is the one created to identify use of the Apache Commons library, where the package names have the format org.apache.commons.*.
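
A minimal Python sketch of such a pattern-based filter is given below. The exact regular expressions used in this work are not reproduced here, so the pattern is one possible reading of org.apache.commons.*. The rule that a JAR file counts as external only when all of its packages match is likewise an assumption made for the sketch.

```python
import re

# Hypothetical pattern list; one regular expression per known library.
EXTERNAL_PATTERNS = [
    re.compile(r"org\.apache\.commons(\..*)?"),
]


def is_external_package(package):
    """Return True if the package name matches a known third-party pattern."""
    return any(p.fullmatch(package) for p in EXTERNAL_PATTERNS)


def is_external_jar(packages):
    """Treat a JAR as external when every package in it matches.

    This all-packages rule is an assumption of the sketch, chosen so that
    a project JAR that merely bundles a few library classes is kept.
    """
    packages = list(packages)
    return bool(packages) and all(is_external_package(p) for p in packages)
```

Applying `is_external_jar` to the package names found in each JAR file of every version would then give the list of files to remove before metrics are computed.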
