NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...
Generic Provenance Management for Linked Dataspaces
Abstract
The emergence of Linked Data accelerates the demand
for mechanisms that support users in assessing
data quality and trustworthiness on the Web. In this
process, the ability to track the historical trail behind
information resources plays a fundamental role in the
Linked Data Web. Applications consuming or
generating Linked Data on the Web need to become
provenance-aware, i.e., able to capture and consume
provenance descriptors associated with the data. This
makes provenance a key requirement for a wide
spectrum of applications. This work focuses on the
creation of the supporting infrastructure, describing a
generic provenance management framework for the
Web that uses Semantic Web tools and standards to
address the core challenges of provenance
management on the Web.
1. Introduction<br />
The advent of Linked Data in recent years as the de facto
standard for publishing data on the Web, and its
uptake by early adopters, defines a clear trend towards a
Web where users will be able to easily aggregate,<br />
consume and republish data. With Linked Data, Web<br />
information can be repurposed with a new level of<br />
granularity and scale. In this scenario, tracking the<br />
historical trail behind an information artifact plays a<br />
fundamental role for data consumers, allowing users to<br />
determine the suitability and quality of a piece of<br />
information.<br />
To provide the additional provenance data, Linked Data<br />
applications will demand mechanisms to track and<br />
manage provenance information. This new common<br />
requirement is inherent to the level of data integration<br />
provided by Linked Data and it is not found in most<br />
systems consuming information from “data silos”,<br />
where the relationship among data sources and<br />
applications is, in general, more rigid. This work
focuses on providing a provenance management
infrastructure motivated by the need to support Linked
Data applications.
2. Description of the Approach<br />
The central goal behind this work is to provide a set of<br />
core functionalities that enable users to develop<br />
provenance-aware applications, both from the<br />
consumption (discovery/query/access) and from the<br />
capture (logging/publishing) perspectives. The
architecture of the proposed approach maximizes the
encapsulation of provenance capture and consumption
functionalities, separating provenance into a distinct
layer.
André Freitas, Seán O’Riain, Edward Curry
Digital Enterprise Research Institute (DERI)
National University of Ireland, Galway
{firstname.lastname@deri.org}
Two strong requirements in the construction of the
framework are the minimization of the software
adaptation effort (changes in the original system to make
it provenance-aware) and the provision of an
expressive provenance model. To satisfy these
requirements, the proposed approach employs
the following strategies:
- Provenance capture by software reflection and<br />
annotation.<br />
- An expressive and accessible API for
provenance queries.
- Use of Semantic Web standards and tools.
The final solution includes a provenance management
framework (Prov4J) [2] and a provenance ontology
(W3P) [1]. Further details on both artifacts can be found
in [1] and [2].
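The first strategy, provenance capture by reflection and annotation, can be illustrated with a minimal Java sketch. It is not Prov4J's actual API, which is described in [2]; the annotation `Traced`, the interface `NewsCurator`, the class `ProvenanceRecord` and the helper `capture` are all hypothetical names introduced here. The sketch assumes that the application exposes its operations through an interface, so that a standard `java.lang.reflect.Proxy` can record each annotated call without any change to the business logic.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: every name below is hypothetical, not part of Prov4J.
class ProvenanceSketch {

    /** Marks a method whose invocations should be logged as a provenance process. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Traced {
        String process();
    }

    /** Example application interface; implementations never touch the log. */
    interface NewsCurator {
        @Traced(process = "curation")
        String curate(String article);
    }

    /** One captured record: which process turned which input into which output. */
    static final class ProvenanceRecord {
        final String process;
        final String input;
        final String output;

        ProvenanceRecord(String process, String input, String output) {
            this.process = process;
            this.input = input;
            this.output = output;
        }

        @Override
        public String toString() {
            return process + ": '" + input + "' -> '" + output + "'";
        }
    }

    /** Wraps target in a dynamic proxy that records every call to a @Traced method. */
    @SuppressWarnings("unchecked")
    static <T> T capture(Class<T> iface, T target, List<ProvenanceRecord> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            // Delegate to the real object first, then inspect the annotation.
            Object result = method.invoke(target, args);
            Traced traced = method.getAnnotation(Traced.class);
            if (traced != null) {
                String input = (args == null || args.length == 0) ? "" : String.valueOf(args[0]);
                log.add(new ProvenanceRecord(traced.process(), input, String.valueOf(result)));
            }
            return result;
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }

    public static void main(String[] args) {
        List<ProvenanceRecord> log = new ArrayList<>();
        // The lambda stands in for existing application code.
        NewsCurator curator = capture(NewsCurator.class, a -> a.trim().toUpperCase(), log);
        curator.curate("  markets rally ");
        for (ProvenanceRecord r : log) {
            System.out.println(r);
        }
    }
}
```

In this sketch the only change to the original system is the annotation on the interface method, which is the kind of low adaptation effort the requirements call for. A real framework would serialize the captured records against a provenance ontology rather than keep them in an in-memory list.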
3. Evaluation<br />
The framework was evaluated using a provenance<br />
dataset which was generated using aggregated business<br />
news and opinions collected from data sources on the<br />
Web. These data elements defined the ground artifacts<br />
which were further curated and analyzed in a financial<br />
analysis workflow simulator. Based on the generated<br />
provenance workflow, a provenance query set was<br />
created, covering 51 types of provenance queries. The<br />
framework was evaluated in terms of query expressivity<br />
and completeness, query execution time and coverage of<br />
the W3C Provenance Incubator group requirements for<br />
mapping Provenance on the Web [3].<br />
The framework achieved high query expressivity (80%<br />
of the queries were addressed), with an average query<br />
execution time of 250 ms. The approach also achieved
medium coverage of the W3C Provenance
Incubator Group requirements.
References<br />
[1] André Freitas, Tomas Knap, Sean O’Riain, Edward Curry, W3P:
Building an OPM based provenance model for the Web. In Future
Generation Computer Systems, 2010.
[2] André Freitas, Arnaud Legendre, Sean O'Riain, Edward Curry,<br />
Prov4J: A Semantic Web Framework for Generic Provenance<br />
Management. In The Second International Workshop on Role of<br />
Semantic Web in Provenance Management (SWPM 2010), Springer,<br />
Workshop at International Semantic Web Conference (ISWC),<br />
Shanghai, China, 2010.<br />
[3] Requirements for Provenance on the Web,<br />
http://www.w3.org/2005/Incubator/prov/wiki/User_Requirements