
SEKE 2012 Proceedings - Knowledge Systems Institute


that holds a model of each software project internally. Using an<br />

XML file, Maven describes each project’s model as a set of<br />

source code and third-party libraries (Java Archives or JARs).<br />

Moreover, it is able to detect transitive dependencies<br />

originating from the third-party libraries and include them in<br />

the project. These files are downloaded automatically from<br />

public or private repositories. Note that the model might<br />

include metadata for each dependency, such as the<br />

dependency’s license name and text.<br />
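
As a minimal sketch of how such metadata could be read, the snippet below parses the `licenses` element of a Maven POM with Python's standard library. The sample POM, its coordinates (`com.example`, `demo-library`) and the helper name `declared_licenses` are assumptions made for illustration; they are not taken from the implemented tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven POM 4.0.0 documents.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical POM, invented for this example.
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>demo-library</artifactId>
  <version>1.0.0</version>
  <licenses>
    <license>
      <name>The Apache Software License, Version 2.0</name>
      <url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
    </license>
  </licenses>
</project>
"""


def declared_licenses(pom_text):
    """Return (name, url) pairs for every license declared in a POM."""
    root = ET.fromstring(pom_text)
    result = []
    for lic in root.findall("m:licenses/m:license", POM_NS):
        name = lic.findtext("m:name", default="", namespaces=POM_NS).strip()
        url = lic.findtext("m:url", default="", namespaces=POM_NS).strip()
        result.append((name, url))
    return result


if __name__ == "__main__":
    for name, url in declared_licenses(SAMPLE_POM):
        print(name, url)
```

A POM without a `licenses` element simply yields an empty list, which corresponds to the case where the metadata is missing and the license has to be discovered by other means.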

The detection of each dependency’s license is perhaps the<br />

most difficult part in the process towards license decisions. As<br />

described, license discovery can be implemented either by<br />

searching each dependency’s file(s) or by looking it up in a<br />

repository. Although public repositories are available, the<br />

existing repositories are not designed for providing license<br />

information. Additionally, there is no guarantee that every JAR<br />

file contains license information. In order to evaluate the<br />

situation, a crawler that indexed a large number of JAR files in<br />

the central Maven repository was built. A list of the most<br />

common file names that include licensing information is<br />

presented in [4]. Out of the 31476 JAR files studied, exactly<br />

14000 contained a file that might hold license information<br />

(approximately 44.5% of the whole set). The most commonly<br />

used word in the name of files that hold licensing information<br />

was the obvious suspect, i.e., “license”, with 13601 of the JAR<br />

files containing at least one such file. The second name in order<br />

was “readme” with only 708 appearances. These numbers<br />

indicate that in most cases license detection can be performed<br />

without the need for a dedicated repository. However, there are<br />

cases in which licensing information cannot be determined for<br />

a large number of dependencies. Consequently, although in the<br />

implemented case study a license repository is not necessary, it<br />

could offer a significant improvement in the results of the tool.<br />
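
The file-name check described above can be sketched as follows. This is an illustration under stated assumptions, not the crawler used in the study: the hint list contains only the two name fragments reported in the text, and the function names and the in-memory demo archive are hypothetical.

```python
import io
import zipfile

# The two most frequent file-name fragments reported in the text.
LICENSE_HINTS = ("license", "readme")


def license_candidates(jar_file):
    """Return the JAR entries whose file name contains a license hint."""
    hits = []
    with zipfile.ZipFile(jar_file) as jar:
        for entry in jar.namelist():
            if entry.endswith("/"):
                continue  # skip directory entries
            base_name = entry.rsplit("/", 1)[-1].lower()
            if any(hint in base_name for hint in LICENSE_HINTS):
                hits.append(entry)
    return hits


def build_demo_jar():
    """Build a tiny in-memory archive so the example runs without real JARs."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as jar:
        jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
        jar.writestr("META-INF/LICENSE.txt", "placeholder license text\n")
        jar.writestr("com/example/Demo.class", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(license_candidates(build_demo_jar()))
```

A name match only indicates that a file might hold license information; a class such as `LicenseManager.class` would also match, so the content of each candidate still has to be inspected.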

The final step in the case study was to create the enriched<br />

system model and extract possible licenses. The missing<br />

information for performing this task is the license compatibility<br />

model. In the implemented tool, the licenses and their<br />

compatibilities were captured in an XML format. Multiple<br />

models have been discussed in the current paper; however, in<br />

the current stage we opted for the approach of [10]. Although it<br />

does not focus on individual rights and obligations expressed in<br />

a license, it offers the required abstraction for understanding<br />

which licenses can be adopted and which cannot. The file can be<br />

modified, in order to introduce additional licenses or change<br />

the relations of existing ones. The XML representation is easily<br />

coupled with the simple graph approach captured in machine<br />

readable notation. Moreover, it can also offer a graphical<br />

representation of the system’s individual components and the<br />

way their interconnections affect licenses, allowing a custom<br />

XML parser to detect permissive combinations. Using this<br />

XML license-dependency file together with the set of<br />

licenses originating from the project’s dependencies, as<br />

collected from the first phase of license detection, appropriate<br />

license decisions can be taken.<br />
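
One way to picture this step is as a reachability query on a directed graph, where an edge from A to B means that code under A may be combined into a work released under B. The sketch below assumes a hypothetical XML schema (`license`, `compatible-with`) and a small illustrative set of edges; neither reproduces the tool's actual file nor the full model of [10], and the edges should not be read as legal guidance.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema and illustrative edges, invented for this example.
COMPAT_XML = """
<licenses>
  <license id="MIT"><compatible-with ref="BSD-3-Clause"/></license>
  <license id="BSD-3-Clause"><compatible-with ref="Apache-2.0"/></license>
  <license id="Apache-2.0"><compatible-with ref="GPL-3.0"/></license>
  <license id="GPL-3.0"/>
</licenses>
"""


def load_graph(xml_text):
    """Map each license id to the set of licenses it points to directly."""
    graph = {}
    for lic in ET.fromstring(xml_text).findall("license"):
        graph[lic.get("id")] = {
            edge.get("ref") for edge in lic.findall("compatible-with")
        }
    return graph


def reachable(graph, start):
    """Return every license reachable from start, including start itself."""
    seen = {start}
    stack = [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def candidate_licenses(graph, dependency_licenses):
    """Licenses reachable from every dependency license in the project."""
    sets = [reachable(graph, lic) for lic in dependency_licenses]
    return set.intersection(*sets) if sets else set(graph)


if __name__ == "__main__":
    graph = load_graph(COMPAT_XML)
    print(sorted(candidate_licenses(graph, ["MIT", "Apache-2.0"])))
```

An empty result would signal that, under the modeled relations, no single license satisfies all dependencies, which is the situation a license decision tool needs to report.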

VII. CONCLUSIONS<br />

In this paper, licenses in free open source software have<br />

been discussed along with representative approaches that guide<br />

the license compatibility process. License identification deals<br />

primarily with the extraction of license information from<br />

existing source code and binary files, whereas software<br />

repositories can be exploited for assisting the storing and fast<br />

retrieval of license-related metadata linked to software<br />

products. License discovery is then feasible by modeling the<br />

relations among licenses and performing reasoning actions on<br />

these relations. These aspects were demonstrated through a<br />

prototype implementation for license identification in the<br />

Maven build tool. Through the presented study, it can be<br />

concluded that it is important for software engineers to detect<br />

the correct licenses and avoid integrating prohibited licenses or<br />

ones that cause violations in license dependency chains. This<br />

decision is especially vital nowadays, when a wide variety of<br />

open source, already tested software is available at our<br />

fingertips. Through the use case demonstration it can be seen<br />

that such approaches can be combined with existing tools. We<br />

are currently working towards completing the prototype system<br />

with the aim of covering more file types and including intelligent<br />

reasoning capabilities towards the provision of a complete<br />

“license assistant” tool.<br />

REFERENCES<br />

[1] K. W. Miller, J. Voas and T. Costello, “Free and Open Source<br />

Software,” IT Professional, vol. 12, no. 6, pp. 14-16, Nov.-Dec. 2010.<br />

[2] V. Lindberg, “Intellectual Property and Open Source: A Practical Guide<br />

to Protecting Code,” O'Reilly Media, 2008.<br />

[3] T. Tuunanen, J. Koskinen and T. Karkkainen, “Automated software<br />

license analysis,” Automated Software Eng., vol. 16, no. 3-4, pp. 455-<br />

490, Dec. 2009.<br />

[4] D. M. German, Y. Manabe and K. Inoue, “A sentence-matching method<br />

for automatic license identification of source code files,” in Proc. of the<br />

IEEE/ACM int’l conf. on Automated software engineering, ACM Press,<br />

2010.<br />

[5] Y. Landman, “How to Use Continuous Integration to Protect Your<br />

Projects from Open-Source License Violations,”<br />

http://weblogs.java.net/blog/yoavl/archive/2010/12/16/how-usecontinuous-integration-protect-your-projects-open-source-licen,<br />

2010.<br />

[6] J. Howison, M. Conklin and K. Crowston, “FLOSSmole: A<br />

collaborative repository for FLOSS research data and analyses,” Int’l<br />

Journal of Information Technology and Web Eng., vol. 1, no. 3, pp. 17-<br />

26, July 2006.<br />

[7] J. Bevan, E. J. Whitehead Jr., S. Kim and M. Godfrey, “Facilitating<br />

Software Evolution Research with Kenyon,” in Proc. of the 10th<br />

European software engineering conference held jointly with 13th ACM<br />

SIGSOFT int’l symposium on Foundations of software engineering,<br />

ACM Press, 2005.<br />

[8] R. Gobeille, “The FOSSology project,” in Proc. of the 2008 Int’l<br />

working conf. on Mining software repositories, ACM Press, 2008, pp.<br />

47-50.<br />

[9] M. Conklin, “Project Entity Matching across FLOSS Repositories,” in<br />

Proc. of the 3rd int’l Conference on Open Source Systems, Springer-<br />

Verlag, vol. 234, 2007, pp. 45-57.<br />

[10] D.A. Wheeler, “The Free-Libre/Open Source Software (FLOSS) License<br />

Slide,” http://www.dwheeler.com/essays/floss-license-slide.pdf, 2007.<br />

[11] D. M. German and A. E. Hassan, “License integration patterns:<br />

Addressing license mismatches in component-based development,” in<br />

Proc. of the IEEE 31st int’l Conference on Software Engineering, 2009,<br />

pp. 188-198.<br />

[12] T. A. Alspaugh, W. Scacchi and H. U. Asuncion, “Software licenses in<br />

context: The challenge of heterogeneously-licensed systems,” Journal of<br />

the Association for Information Systems, vol. 11, no. 11, pp. 730-755,<br />

Nov. 2010.<br />

[13] T. F. Gordon, “Report on a Prototype Decision Support System for OSS<br />

License Compatibility Issues,” Qualipso (IST-FP6-IP-034763),<br />

Deliverable A1.D2.1.3, 2010.<br />
