
SEKE 2012 Proceedings - Knowledge Systems Institute


The important aspect is to provide repositories with “intelligence”: they should store not only source code, but also sufficient information to allow useful conclusions to be drawn, as described by the Fact Extractor Support of Kenyon [7]. Repositories of this kind should be accompanied by metadata in a machine-understandable format that can be parsed, analyzed and categorized quickly and efficiently. Metadata may indicate useful parameters, such as file names and sizes, versions and modification permissions, and license-related details. When concentrating on license information and dependencies, input to the software repository can be provided by agents that extract license information from source code, libraries and text files. The stored information, in turn, is maintained in a suitable format so that it can be provided to tools that reason over the license details. Reasoning mechanisms produce results that usually lead to an implementation decision for a specific information system or point out conflicts related to license use. Another important parameter in software repositories is how different versions of the same entry are handled. Information about changes in license dependencies needs to be kept up to date, since such changes may lead to changes in implementation decisions.
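As an illustration of the kind of machine-readable metadata discussed above, the following minimal sketch records file names, sizes and license identifiers per version and reports how the license set changes between two versions. All class, field and project names (`FileRecord`, `EntryVersion`, `license_changes`, `demo`) are hypothetical and are not taken from Kenyon or any other tool cited here.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass(frozen=True)
class FileRecord:
    """Metadata for one file; field names are hypothetical."""
    name: str
    size_bytes: int
    licenses: tuple = ()  # license identifiers reported by an extraction agent


@dataclass
class EntryVersion:
    """One version of a repository entry."""
    project: str
    version: str
    files: list = field(default_factory=list)

    def license_set(self):
        # Union of all license identifiers found in this version's files.
        return {lic for f in self.files for lic in f.licenses}

    def to_json(self):
        # Machine-readable form that a reasoning tool could parse.
        return json.dumps(asdict(self), sort_keys=True)


def license_changes(old, new):
    """Report licenses that appeared or disappeared between two versions."""
    before, after = old.license_set(), new.license_set()
    return {"added": sorted(after - before), "removed": sorted(before - after)}


# Hypothetical sample data: version 1.1 adds a file under a different license.
v1 = EntryVersion("demo", "1.0", [FileRecord("core.c", 2048, ("MIT",))])
v2 = EntryVersion("demo", "1.1", [FileRecord("core.c", 2100, ("MIT",)),
                                  FileRecord("codec.c", 900, ("GPL-3.0",))])
print(license_changes(v1, v2))
print(v2.to_json())
```

A change report of this kind is one possible trigger for re-running the reasoning step, since a newly added license may invalidate an earlier implementation decision.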

One attempt at storing license information in a repository can be found in FOSSology [8]. The study of FLOSS was initiated at Hewlett Packard more than a decade ago, but the final version of FOSSology was released only recently. FOSSology presents a list of all license types identified for each project stored in the repository, and it contains a tool that analyzes the input files by matching their text against license templates. Currently FOSSology supports about 260 different licenses (counting different license versions separately). A match percentage, indicating the probability that the discovered license is indeed the one included in the file under examination, is also provided. This is needed because, in practice, project authors may have changed words or phrases of the original license text. The presence of such a percentage is thus desirable in all license repositories, as is the presence of license taxonomies that help identify which license category each project belongs to. Taxonomies can further be used to identify licenses that fall outside the acceptable license set defined in the policies of an organization. An example in which licenses are matched against a set of management rules can be found in Artifactory (footnote 6), an internal repository that acts as a proxy between a build tool (e.g., Maven, Ant) and the outside world. Artifactory supports five statuses for the licenses found: unapproved, approved, unknown, not found (when no license information exists) and neutral (used to indicate unapproved licenses when an approved license also exists).
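The two ideas above, a match percentage against license templates and a status assigned under an organizational policy, can be sketched as follows. This is only an illustration: it uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure and does not reproduce FOSSology's matching algorithm or Artifactory's API. The template excerpts, the function names, and the reading of "unknown" as "an identifier outside the known set" are assumptions.

```python
from difflib import SequenceMatcher

# Abbreviated excerpts used as stand-in templates (assumption); a real
# repository would hold the full license texts.
TEMPLATES = {
    "MIT": "Permission is hereby granted, free of charge, "
           "to any person obtaining a copy",
    "GPL-3.0": "This program is free software: you can redistribute it and/or "
               "modify it under the terms of the GNU General Public License",
}


def match_percentage(text, template):
    """Word-level similarity between two texts, scaled to 0..100."""
    a, b = text.lower().split(), template.lower().split()
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def best_match(text, templates=TEMPLATES):
    """Return (license_name, percentage) for the closest template."""
    scored = ((name, match_percentage(text, tpl)) for name, tpl in templates.items())
    return max(scored, key=lambda pair: pair[1])


def classify(found, approved, known):
    """Assign one of the five statuses described above to each license found."""
    if not found:
        return [(None, "not found")]  # no license information exists
    any_approved = any(lic in approved for lic in found)
    result = []
    for lic in found:
        if lic in approved:
            status = "approved"
        elif lic not in known:
            status = "unknown"  # assumption: identifier not recognized
        elif any_approved:
            status = "neutral"  # unapproved, but an approved license also exists
        else:
            status = "unapproved"
        result.append((lic, status))
    return result


# A header whose author replaced "free of charge" with "at no cost".
edited = "Permission is hereby granted, at no cost, to any person obtaining a copy"
print(best_match(edited))
print(classify(["MIT", "GPL-3.0"], approved={"MIT"}, known={"MIT", "GPL-3.0"}))
```

Comparing word sequences rather than raw characters keeps the score tolerant of reflowed whitespace, which is common when license headers are copied between files.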

Based on the existing approaches and generic repository structures, an example of what should form part of the license association metadata for a software entry is illustrated in Fig. 2. This information links projects with one or more licenses and should be available for each license encountered (apart from the sixth one, which provides statistical information). Note that the repository can be either centralized or distributed [9].

Figure 2. License metadata in a software repository

6 http://www.jfrog.com/

V. DISCOVERING POSSIBLE LICENSES<br />

The previous techniques offer the means for extracting license information and for storing it in repositories. However, licenses and components cannot be viewed in isolation; as part of the license compatibility process, they should be put in context with the rest of the entities that comprise the software system. This can be performed through a two-step procedure.

A. Modeling license relations<br />

The first step lies in identifying the set of all possible licenses and modeling their relations. The whole set of open source licenses is quite large: 69 licenses currently have approval from the Open Source Initiative (OSI, footnote 7) and 88 from the Free Software Foundation (FSF, footnote 8). Therefore, recent works in the literature focus on the best-known and most widely used open source licenses, such as the GNU GPL, GNU Lesser General Public License (LGPL) (version 2), Apache License (version 2.0), New BSD licenses, MIT license and Eclipse Public License. A practical approach that presents compatibility among some of the most popular open source licenses can be found in [10]. In general, license A can be considered compatible with license B if software that contains components under both licenses can be licensed under license B. Following this line of thought, all known licenses and their compatibilities can be represented through a directed graph G(V, E), such as the one depicted in Fig. 3. Each node in V represents one of the known licenses, while each edge in E represents compatibility between the adjacent licenses. The direction of the edge indicates which license should be used for software containing components with licenses from both nodes. This approach is based on manual interpretation of each license (through the corresponding legal text) and provides general rules for reasoning on compatibility among licenses. A similar compatibility model is discussed in [11].
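The directed-graph model described above can be sketched in a few lines. In the example below, an edge A -> B means that software combining components under A and B may be licensed under B. The edge set is a deliberately small and incomplete illustration chosen for this sketch; it is not the content of Fig. 3. Treating compatibility as transitive along edges is also an assumption of the sketch, as are the function names and the SPDX-style identifiers.

```python
from collections import deque

# Illustrative, incomplete edge set (assumption): A -> B means a work that
# combines components under A and B may be released under B.
COMPATIBLE_INTO = {
    "MIT": ["BSD-3-Clause"],
    "BSD-3-Clause": ["Apache-2.0"],
    "Apache-2.0": ["LGPL-3.0"],
    "LGPL-3.0": ["GPL-3.0"],
    "GPL-3.0": [],
    "EPL-1.0": [],
}


def reachable(graph, start):
    """Licenses reachable from `start` by following edges (including itself)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def candidate_licenses(graph, component_licenses):
    """Licenses under which every listed component could be combined."""
    sets = [reachable(graph, lic) for lic in component_licenses]
    return set.intersection(*sets) if sets else set()


print(sorted(candidate_licenses(COMPATIBLE_INTO, ["MIT", "Apache-2.0"])))
print(sorted(candidate_licenses(COMPATIBLE_INTO, ["GPL-3.0", "EPL-1.0"])))
```

An empty result signals a conflict within the modeled graph: no single license is reachable from all component licenses, so the combination needs manual review.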

More detailed solutions, which integrate the notions of rights and obligations linked with a license, aim at modeling

7 http://www.opensource.org/

8 http://www.fsf.org/
