29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...

Generic Provenance Management for Linked Dataspaces

André Freitas, Seán O’Riain, Edward Curry
Digital Enterprise Research Institute (DERI)
National University of Ireland, Galway
{firstname.lastname@deri.org}

Abstract

The emergence of Linked Data accelerates the demand for mechanisms that support users in assessing data quality and trustworthiness on the Web. The ability to track the historical trail behind information resources on the Web plays a fundamental role in this assessment. Applications consuming or generating Linked Data on the Web need to become provenance-aware, i.e., able to capture and consume provenance descriptors associated with the data. This makes provenance a key requirement for a wide spectrum of applications. This work focuses on the creation of the supporting infrastructure, describing a generic provenance management framework for the Web that uses Semantic Web tools and standards to address the core challenges of provenance management on the Web.

1. Introduction

The advent of Linked Data in recent years as the de facto standard for publishing data on the Web, and its uptake by early adopters, defines a clear trend towards a Web where users will be able to easily aggregate, consume and republish data. With Linked Data, Web information can be repurposed with a new level of granularity and scale. In this scenario, tracking the historical trail behind an information artifact plays a fundamental role for data consumers, allowing users to determine the suitability and quality of a piece of information.

To provide the additional provenance data, Linked Data applications will demand mechanisms to track and manage provenance information. This new common requirement is inherent to the level of data integration provided by Linked Data and is not found in most systems consuming information from “data silos”, where the relationship among data sources and applications is, in general, more rigid. This work focuses on providing a provenance management infrastructure to support Linked Data applications.

2. Description of the Approach

The central goal behind this work is to provide a set of core functionalities that enable users to develop provenance-aware applications, both from the consumption (discovery/query/access) and from the capture (logging/publishing) perspectives. The architecture of the proposed approach maximizes the encapsulation of provenance capture and consumption functionalities, separating provenance into a distinct layer.

Two strong requirements in the construction of the framework are minimizing the software adaptations (changes to the original system needed to make it provenance-aware) and providing an expressive provenance model. To satisfy these requirements, the proposed approach employed the following strategies:

- Provenance capture by software reflection and annotation.
- An expressive and accessible API for provenance queries.
- Use of Semantic Web standards and tools.
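
The first strategy, capture by reflection and annotation, can be illustrated with a minimal Java sketch. This is not the Prov4J API: the `@Traced` annotation, the `NewsAggregator` class, the `invokeTraced` helper and the log format are all hypothetical names chosen for illustration, and the `wasGeneratedBy`/`used` wording only loosely follows OPM vocabulary. The sketch shows how an annotated method could be made provenance-aware without changing its body.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProvenanceSketch {

    // Hypothetical marker annotation (an assumption, not part of Prov4J).
    // RUNTIME retention is required so reflection can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Traced {
        String process();
    }

    // Hypothetical application class; only the annotation is added to it.
    public static class NewsAggregator {
        @Traced(process = "aggregate")
        public String aggregate(String a, String b) {
            return a + " | " + b;
        }
    }

    // In-memory stand-in for a provenance store.
    public static final List<String> LOG = new ArrayList<>();

    // Invokes a public method by name and, if it is annotated with @Traced,
    // records which process generated the result and which inputs it used.
    public static Object invokeTraced(Object target, String methodName, Object... args)
            throws Exception {
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == args.length) {
                Object result = m.invoke(target, args);
                Traced traced = m.getAnnotation(Traced.class);
                if (traced != null) {
                    LOG.add(result + " wasGeneratedBy " + traced.process()
                            + " used " + Arrays.toString(args));
                }
                return result;
            }
        }
        throw new NoSuchMethodException(methodName);
    }

    public static void main(String[] args) throws Exception {
        Object out = invokeTraced(new NewsAggregator(), "aggregate", "news", "opinion");
        System.out.println(out);
        LOG.forEach(System.out::println);
    }
}
```

In this sketch the capture logic lives entirely in `invokeTraced`, so the application code changes only by one annotation, which is the kind of minimal adaptation the requirement above asks for.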

The final solution includes a provenance management framework (Prov4J) [2] and a provenance ontology (W3P) [1]. Further details on both artifacts can be found in [1] and [2].

3. Evaluation

The framework was evaluated using a provenance dataset generated from aggregated business news and opinions collected from data sources on the Web. These data elements defined the ground artifacts, which were further curated and analyzed in a financial analysis workflow simulator. Based on the generated provenance workflow, a provenance query set was created, covering 51 types of provenance queries. The framework was evaluated in terms of query expressivity and completeness, query execution time, and coverage of the W3C Provenance Incubator Group requirements for mapping provenance on the Web [3].

The framework achieved high query expressivity (80% of the queries were addressed), with an average query execution time of 250 ms. The approach also presented medium coverage of the W3C Provenance Incubator Group requirements.

References

[1] André Freitas, Tomas Knap, Sean O’Riain, Edward Curry, W3P: Building an OPM based provenance model for the Web. In Future Generation Computer Systems, 2010.

[2] André Freitas, Arnaud Legendre, Sean O’Riain, Edward Curry, Prov4J: A Semantic Web Framework for Generic Provenance Management. In The Second International Workshop on Role of Semantic Web in Provenance Management (SWPM 2010), Springer, Workshop at International Semantic Web Conference (ISWC), Shanghai, China, 2010.

[3] Requirements for Provenance on the Web, http://www.w3.org/2005/Incubator/prov/wiki/User_Requirements
