NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...
NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...
NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
1. Situation<br />
Querying Live Linked Data<br />
Jürgen Umbrich, Aidan Hogan, Axel Polleres, Stefan Decker<br />
DERI, National University of Ireland, <strong>Galway</strong><br />
email: firstname.lastname@deri.org<br />
A growing amount of Linked Data enables advanced<br />
data integration and decision-making applications. Typical<br />
systems operating on Linked Data collect (crawl) and preprocess<br />
(index) large amounts of data, and evaluate queries<br />
against a centralised repository. Given that crawling and<br />
indexing are time-consuming operations, the data in the<br />
centralised index may be out of date at query execution<br />
time. An idealised query answering system for Linked<br />
Data should return live answers in a reasonable amount of<br />
time, even on corpora as large as the Web. In such a live<br />
query system source selection - determining which sources<br />
contribute answers to a query - is a crucial step to avoid<br />
unnecessary fetch operations.<br />
Most current approaches enabling query processing over<br />
RDF data are based on materialised based approaches<br />
(MAT). Centralised approaches provide excellent query<br />
response times due to extensive preprocessing carried<br />
out, but suffer from drawbacks like the freshness of the<br />
aggregated data or data providers have to give up sole<br />
sovereignty on their data. On the other end of the spectrum,<br />
distributed query processing approaches (DQP) typically<br />
assume processing power attainable at the sources themselves,<br />
which could be leveraged in parallel for query<br />
processing. Such distributed or federated approaches offer<br />
up-to-date query results, however, they cannot give strict<br />
guarantees on query performance and the response time is<br />
very slow compared to MAT approaches.<br />
Our current research direction is based on 1) the investigation<br />
of a middle ground approach between the<br />
two extremes of MAT and DQP and propose the use of<br />
histogram-based data summaries [1] and 2) the studies of<br />
the dynamics of Linked Data sources [2], [3]. The results<br />
clearly show that a query approach which combines MAT<br />
and DQP allows a novel and powerful way to guarantee<br />
fast query response times and up-to-date answers. The<br />
remainder of this abstract highlights the next steps to<br />
enable novel ways to querying live Linked Data.<br />
2. The System<br />
Figure 1 depicts the proposed solution architecture<br />
which consists of three main components: 1) the query<br />
processor, 2) detecting dynamic patterns and 3) the source<br />
selection components. The system can be plugged on top of<br />
any RDF store which allows data access via the SPARQL<br />
protocol. The query processor decides during the runtime<br />
which parts of the query involve dynamic information and<br />
122<br />
live results<br />
SPARQL<br />
Query<br />
Processor<br />
non-dynamic<br />
query patterns<br />
dynamic<br />
query patterns<br />
MAT<br />
detect dynamics<br />
LinkedData<br />
Web<br />
source<br />
selection<br />
Any local or<br />
remote index<br />
providing data<br />
access via<br />
t h e S P A R Q L<br />
protocol<br />
Fig. 1. Combined Query of RDF store(s) and the Linked Data Web.<br />
which are rather static. Clearly, the system will evaluate<br />
static query parts with the MAT approach over the local<br />
index and dynamic parts with an DQP approach directly<br />
over the Linked Data Web.<br />
3. Dynamic Patterns<br />
Our current research heavily focuses in detecting dynamic<br />
patterns in Linked Data sources which are then<br />
incorporated into the query processor and the source selection<br />
component (cf. “detect dynamics” in Figure 1). As<br />
a direct consequence, the next research step is to integrate<br />
the source selection component and the knowledge about<br />
dynamic patterns into the query processor to decide at<br />
runtime which parts of a query are evaluated over which<br />
sources (either local or Web sources). One of the crucial<br />
steps is to decide if we base our access decision on the<br />
knowledge about dynamic patterns or static patterns and<br />
how to represent such patterns.<br />
4. Acknowledgement<br />
The work presented in this paper has been funded<br />
in part by Science Foundation Ireland under Grant No.<br />
SFI/08/CE/I1380 (Lion-2).<br />
References<br />
[1] J. Umbrich, K. Hose, M. Karnstedt, A. Harth, and A. Polleres,<br />
“Comparing data summaries for processing live queries over linked<br />
data,” World Wide Web Journal, vol. accepted, 2011, to appear.<br />
[2] J. Umbrich, M. Hausenblas, A. Hogan, A. Polleres, and S. Decker,<br />
“Towards Dataset Dynamics: Change Frequency of Linked Open<br />
Data Sources,” in Proceedings of the Linked Data on the Web<br />
Workshop (LDOW2010) at WWW2010, 2010.<br />
[3] J. Umbrich, M. Karnstedt, and S. Land, “Towards Understanding the<br />
Changing Web: Mining the Dynamics of Linked-Data Sources and<br />
Entities,” in Proceedings of the Knowledge Discovery, Data Mining,<br />
and Machine Learning Workshop (KDML 2010) at LWA 2010, 2010.