29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...


Provenance in the Web of Data: a building block for user profiling and trust in online communities

Fabrizio Orlandi, Alexandre Passant
Digital Enterprise Research Institute
National University of Ireland, Galway
fabrizio.orlandi@deri.org - alexandre.passant@deri.org

Abstract

Online collaborative knowledge bases such as Wikipedia provide an extensive source of information, not only to their readers but also to a wide range of applications and Web services. For example, DBpedia, one of the largest datasets on the Web of Data, is widely used as a reference for data interlinking and as a basis for applications employing Semantic Web technologies. Yet its dataset, derived directly from Wikipedia articles, could contain errors due to the inexperience or anonymity of contributors. By analysing the Wikipedia edit history and users' contributions, we provide detailed provenance information for DBpedia statements and make this information publicly available on the Web of Data. The resulting dataset then serves as a foundation for analysing users' activities and interests and for computing trust measures.

Collaborative websites such as Wikipedia have recently shown the benefit of being able to create and manage very large public knowledge bases. However, one of the most common concerns about these types of information sources is the trustworthiness of their content, which can be arbitrarily edited by anyone. The DBpedia project¹, which aims at converting Wikipedia content into structured knowledge, is therefore not exempt from this concern, especially considering that one of the main objectives of DBpedia is to build a dataset against which Semantic Web technologies can be employed. This makes it possible not only to formulate sophisticated queries against Wikipedia, but also to link it to other datasets on the Web, or to create new applications or mashups. Thanks to its large dataset and its cross-domain nature, DBpedia has become one of the most important and interlinked datasets on the Web of Data. Providing information about where DBpedia data comes from and how it was extracted and processed is therefore crucial. This type of information is called provenance, and it describes the entire data life cycle, from its origin to its subsequent processing history.

Having provenance information about Wikipedia data allows us to identify quality measures for Wikipedia articles and to estimate the trustworthiness of their content. Then, since the DBpedia content is directly extracted from Wikipedia, the same trust and quality values can be propagated to the DBpedia dataset. We apply this process to DBpedia, but this is just one particular use case: the same considerations about provenance apply to every dataset on the Web of Data. The benefits of using data provenance to develop trust on the Web, and on the Semantic Web in particular, have already been widely described in the state of the art. Provenance data provides useful information such as the timeliness and authorship of data. It can serve as a basis for various applications and use cases, such as identifying trust values for pages or page fragments, or measuring users' expertise by analysing their contributions and then personalizing trust metrics based on a person's user profile for a particular topic. Moreover, providing provenance metadata as RDF and making it available on the Web of Data offers more interchange possibilities and transparency. This would let people link to provenance information from other sources, giving them the opportunity to compare these sources and choose the most appropriate one or the one with the highest quality. In our specific context of DBpedia, for example, indicating by whom and when an RDF triple was created (or contributed) could let any application flag, reject or approve this statement based on particular criteria.
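
As a minimal sketch of that last idea, the following Python fragment shows how an application might flag, reject or approve a statement once it knows who contributed it and when. The class `ProvenancedTriple`, the function `review`, the sample identifiers and the three criteria themselves are hypothetical illustrations, not part of the framework described here; a real application would choose its own criteria.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ProvenancedTriple:
    """An RDF-like statement annotated with who contributed it and when.

    Hypothetical structure for illustration only.
    """
    subject: str
    predicate: str
    obj: str
    contributor: str  # account name, or "" for an anonymous edit
    created: date


def review(triple, trusted_users, not_before):
    """Return 'approve', 'reject' or 'flag' for a statement.

    Assumed example criteria:
    - anonymous contributions are rejected;
    - contributions by a trusted user made on or after `not_before`
      are approved;
    - everything else is flagged for manual inspection.
    """
    if not triple.contributor:
        return "reject"
    if triple.contributor in trusted_users and triple.created >= not_before:
        return "approve"
    return "flag"


# Made-up sample data.
triple = ProvenancedTriple(
    "dbpedia:Galway", "dbo:country", "dbpedia:Ireland",
    contributor="alice", created=date(2011, 3, 1),
)
print(review(triple, trusted_users={"alice"}, not_before=date(2010, 1, 1)))
```

The decision depends only on the provenance fields, never on the content of the statement, which is the point made above: authorship and timeliness alone already give an application something to act on.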

In our work [1][2] we propose a modelling solution to semantically represent information about the provenance of data in DBpedia, and an extraction framework capable of computing provenance for DBpedia statements using Wikipedia edits. The framework consists of: (i) a lightweight modelling solution to semantically represent the provenance of both DBpedia resources and Wikipedia content; (ii) an information extraction process and a provenance-computation system combining Wikipedia articles' history with DBpedia information; (iii) a set of scripts that make provenance information about DBpedia statements directly available when browsing this source; (iv) a publicly available web service that exposes our provenance dataset in RDF as Linked Open Data, letting software agents and developers consume it.
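
One simplified way to picture the provenance computation in (ii) is to walk an article's edit history and find the revision that introduced a statement still present in the latest version. The sketch below is an assumption made for illustration, not the algorithm of the framework: the `Revision` type, the `current_origin` function and the sample history are hypothetical, and each revision is assumed to come with the set of (property, value) pairs already extracted from it.

```python
from typing import NamedTuple, Optional


class Revision(NamedTuple):
    """One Wikipedia edit, reduced to what this sketch needs (hypothetical)."""
    rev_id: int
    user: str
    timestamp: str          # ISO 8601 string
    statements: frozenset   # (property, value) pairs extracted from this revision


def current_origin(history, statement) -> Optional[Revision]:
    """Find the revision that introduced `statement` as it stands today.

    `history` must be ordered oldest first. If the statement was removed
    and later re-added, the re-adding revision is returned. If the
    statement is absent from the latest revision, the result is None.
    """
    origin = None
    for rev in history:
        if statement in rev.statements:
            if origin is None:
                origin = rev  # start of an uninterrupted run
        else:
            origin = None     # statement was removed; forget earlier runs
    return origin


# Made-up history of a single article.
history = [
    Revision(1, "alice", "2010-01-01T00:00:00Z",
             frozenset({("population", "72000")})),
    Revision(2, "bob", "2010-02-01T00:00:00Z",
             frozenset({("population", "72000"), ("country", "Ireland")})),
    Revision(3, "carol", "2010-03-01T00:00:00Z",
             frozenset({("country", "Ireland")})),
    Revision(4, "dave", "2010-04-01T00:00:00Z",
             frozenset({("country", "Ireland"), ("population", "75000")})),
]

origin = current_origin(history, ("country", "Ireland"))
print(origin.user, origin.timestamp)
```

The contributor and timestamp of the returned revision are exactly the "by whom and when" details that a provenance record for the corresponding DBpedia statement would need.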

References

[1] Orlandi F., Champin P-A., Passant A., “Semantic Representation of Provenance in Wikipedia”, Semantic Web Provenance Management workshop at ISWC2010, CEUR-WS, Shanghai, 2010.

[2] Orlandi F., Passant A., “Modelling Provenance of DBpedia Resources Using Wikipedia Contributions”, Journal of Web Semantics, (to be published), 2011.

* This work has been funded in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2) and by an IRCSET Scholarship.

¹ http://dbpedia.org/
