29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

1. Situation<br />

Querying Live Linked Data<br />

Jürgen Umbrich, Aidan Hogan, Axel Polleres, Stefan Decker<br />

DERI, National University of Ireland, <strong>Galway</strong><br />

email: firstname.lastname@deri.org<br />

A growing amount of Linked Data enables advanced<br />

data integration and decision-making applications. Typical<br />

systems operating on Linked Data collect (crawl) and preprocess<br />

(index) large amounts of data, and evaluate queries<br />

against a centralised repository. Given that crawling and<br />

indexing are time-consuming operations, the data in the<br />

centralised index may be out of date at query execution<br />

time. An idealised query answering system for Linked<br />

Data should return live answers in a reasonable amount of<br />

time, even on corpora as large as the Web. In such a live<br />

query system source selection - determining which sources<br />

contribute answers to a query - is a crucial step to avoid<br />

unnecessary fetch operations.<br />

Most current approaches enabling query processing over<br />

RDF data are based on materialised based approaches<br />

(MAT). Centralised approaches provide excellent query<br />

response times due to extensive preprocessing carried<br />

out, but suffer from drawbacks like the freshness of the<br />

aggregated data or data providers have to give up sole<br />

sovereignty on their data. On the other end of the spectrum,<br />

distributed query processing approaches (DQP) typically<br />

assume processing power attainable at the sources themselves,<br />

which could be leveraged in parallel for query<br />

processing. Such distributed or federated approaches offer<br />

up-to-date query results, however, they cannot give strict<br />

guarantees on query performance and the response time is<br />

very slow compared to MAT approaches.<br />

Our current research direction is based on 1) the investigation<br />

of a middle ground approach between the<br />

two extremes of MAT and DQP and propose the use of<br />

histogram-based data summaries [1] and 2) the studies of<br />

the dynamics of Linked Data sources [2], [3]. The results<br />

clearly show that a query approach which combines MAT<br />

and DQP allows a novel and powerful way to guarantee<br />

fast query response times and up-to-date answers. The<br />

remainder of this abstract highlights the next steps to<br />

enable novel ways to querying live Linked Data.<br />

2. The System<br />

Figure 1 depicts the proposed solution architecture<br />

which consists of three main components: 1) the query<br />

processor, 2) detecting dynamic patterns and 3) the source<br />

selection components. The system can be plugged on top of<br />

any RDF store which allows data access via the SPARQL<br />

protocol. The query processor decides during the runtime<br />

which parts of the query involve dynamic information and<br />

122<br />

live results<br />

SPARQL<br />

Query<br />

Processor<br />

non-dynamic<br />

query patterns<br />

dynamic<br />

query patterns<br />

MAT<br />

detect dynamics<br />

LinkedData<br />

Web<br />

source<br />

selection<br />

Any local or<br />

remote index<br />

providing data<br />

access via<br />

t h e S P A R Q L<br />

protocol<br />

Fig. 1. Combined Query of RDF store(s) and the Linked Data Web.<br />

which are rather static. Clearly, the system will evaluate<br />

static query parts with the MAT approach over the local<br />

index and dynamic parts with an DQP approach directly<br />

over the Linked Data Web.<br />

3. Dynamic Patterns<br />

Our current research heavily focuses in detecting dynamic<br />

patterns in Linked Data sources which are then<br />

incorporated into the query processor and the source selection<br />

component (cf. “detect dynamics” in Figure 1). As<br />

a direct consequence, the next research step is to integrate<br />

the source selection component and the knowledge about<br />

dynamic patterns into the query processor to decide at<br />

runtime which parts of a query are evaluated over which<br />

sources (either local or Web sources). One of the crucial<br />

steps is to decide if we base our access decision on the<br />

knowledge about dynamic patterns or static patterns and<br />

how to represent such patterns.<br />

4. Acknowledgement<br />

The work presented in this paper has been funded<br />

in part by Science Foundation Ireland under Grant No.<br />

SFI/08/CE/I1380 (Lion-2).<br />

References<br />

[1] J. Umbrich, K. Hose, M. Karnstedt, A. Harth, and A. Polleres,<br />

“Comparing data summaries for processing live queries over linked<br />

data,” World Wide Web Journal, vol. accepted, 2011, to appear.<br />

[2] J. Umbrich, M. Hausenblas, A. Hogan, A. Polleres, and S. Decker,<br />

“Towards Dataset Dynamics: Change Frequency of Linked Open<br />

Data Sources,” in Proceedings of the Linked Data on the Web<br />

Workshop (LDOW2010) at WWW2010, 2010.<br />

[3] J. Umbrich, M. Karnstedt, and S. Land, “Towards Understanding the<br />

Changing Web: Mining the Dynamics of Linked-Data Sources and<br />

Entities,” in Proceedings of the Knowledge Discovery, Data Mining,<br />

and Machine Learning Workshop (KDML 2010) at LWA 2010, 2010.

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!