29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...


Natural Language Queries on Enterprise Linked Dataspaces: A Vocabulary Independent Approach

André Freitas, João Gabriel Oliveira, Edward Curry, Seán O’Riain

Digital Enterprise Research Institute (DERI)

National University of Ireland, Galway

{firstname.lastname@deri.org}

Abstract

This work describes Treo, a natural language query mechanism for Linked Data that provides a precise and scalable semantic matching approach between natural language queries and distributed, heterogeneous Linked Datasets. Treo's semantic matching approach combines three key elements: entity search, a Wikipedia-based semantic relatedness measure and spreading activation search. Entity search allows Treo to cope with queries over high-volume, distributed data, while the combination of entity search and spreading activation search guided by a Wikipedia-based semantic relatedness measure provides a flexible way of handling the semantic match between natural language queries and Linked Data.

1. Introduction

Linked Data promises to add a new dimension to the Web, in which the availability of Web-scale data could drive a paradigmatic transformation of the Web and its applications. Along with these opportunities, however, Linked Data brings inherent challenges in the way users and applications consume the available data. End-users consuming Linked Data on the Web or on corporate intranets should be able to query data spread over a potentially large number of heterogeneous, complex and distributed datasets. The freedom and universality provided by search engines on the Web of Documents were fundamental to maximizing the value of the information available on the Web. Linked Data consumers, however, need prior knowledge of the available vocabularies in order to execute expressive queries over Linked Datasets. This constraint strongly limits the visibility and value of Linked Data. Ideally, a query mechanism for Linked Data should shield users from the way the data is represented. This work investigates a query mechanism that addresses this challenge by providing a vocabulary-independent natural language query approach for Linked Data.

2. Description of the Approach

To address this problem, an approach combining entity search, a Wikipedia-based semantic relatedness measure and spreading activation is proposed. Combining these three elements in a query mechanism for Linked Data is a new contribution in this space. At the core of the approach is a Wikipedia-based semantic relatedness measure, used as the key element for matching query terms to vocabulary terms, which addresses an existing gap in the literature. Wikipedia-based relatedness measures overcome limitations of existing works that rely on WordNet-based similarity measures or term expansion.
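The section does not say which Wikipedia-based relatedness measure Treo uses. As an illustrative assumption, the sketch below follows Explicit Semantic Analysis (ESA), a well-known Wikipedia-based measure: each word is mapped to a TF-IDF weighted vector over Wikipedia articles (treated as concepts), and the relatedness of two terms is the cosine of their vectors. The four-article corpus and all function names are hypothetical stand-ins for a real Wikipedia index.

```python
import math
from collections import Counter

# HYPOTHETICAL toy corpus standing in for Wikipedia: article title -> article text.
# A real Wikipedia-based measure would index a full Wikipedia dump.
ARTICLES = {
    "Marriage": "wife husband spouse married wedding spouse",
    "Abraham Lincoln": "lincoln president wife mary todd spouse",
    "Olympic Games": "olympic games host city summer athletes",
    "Capital city": "capital city country government europe",
}

# Term frequencies per article, and the number of articles (for IDF).
TERM_COUNTS = {title: Counter(text.split()) for title, text in ARTICLES.items()}
NUM_ARTICLES = len(ARTICLES)


def idf(word: str) -> float:
    """Inverse document frequency of a word across the toy articles."""
    df = sum(1 for counts in TERM_COUNTS.values() if word in counts)
    return math.log(NUM_ARTICLES / df) if df else 0.0


def concept_vector(phrase: str) -> Counter:
    """Represent a phrase as a TF-IDF weighted vector over article titles (concepts)."""
    vector = Counter()
    for word in phrase.lower().split():
        weight = idf(word)
        for title, counts in TERM_COUNTS.items():
            if counts[word]:
                vector[title] += counts[word] * weight
    return vector


def relatedness(a: str, b: str) -> float:
    """Cosine similarity between the concept vectors of two phrases (0.0 if either is empty)."""
    va, vb = concept_vector(a), concept_vector(b)
    dot = sum(weight * vb[title] for title, weight in va.items())
    norm = math.sqrt(sum(w * w for w in va.values())) * math.sqrt(sum(w * w for w in vb.values()))
    return dot / norm if norm else 0.0


print(round(relatedness("wife", "spouse"), 3))  # 0.949
print(relatedness("wife", "olympic"))           # 0.0
```

Because the two terms are compared through the articles they co-occur in rather than through a hand-built taxonomy, a query term such as "wife" can score highly against a vocabulary term such as `spouse` without any shared string.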

The resulting query processing approach provides an opportunity to revisit cognitively inspired spreading activation models over semantic networks through a contemporary lens. The recent availability of Linked Data, large Web corpora and hardware resources, together with a better understanding of the principles behind information retrieval, can provide the resources needed to enable practical applications built on cognitively inspired architectures.
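A minimal sketch of constrained spreading activation over a toy semantic network follows. It assumes the search starts from a pivot entity (the "Pivot Error" class in the evaluation suggests such a starting point, but its selection is not detailed here) and that each hop follows only edges whose labels are sufficiently related to the next query term. The graph, the relatedness scores, the `threshold` and `decay` values and the one-term-per-hop policy are all hypothetical choices for illustration, not Treo's documented algorithm.

```python
# HYPOTHETICAL toy semantic network: node -> list of (edge label, neighbour node).
GRAPH = {
    "Abraham_Lincoln": [
        ("spouse", "Mary_Todd_Lincoln"),
        ("birthPlace", "Hodgenville"),
        ("party", "Republican_Party"),
    ],
    "Mary_Todd_Lincoln": [("birthPlace", "Lexington")],
}

# HYPOTHETICAL precomputed relatedness between query terms and edge labels;
# in practice these scores would come from a Wikipedia-based relatedness measure.
TOY_RELATEDNESS = {
    ("wife", "spouse"): 0.95,
    ("wife", "birthPlace"): 0.10,
    ("wife", "party"): 0.05,
    ("born", "birthPlace"): 0.90,
}


def relatedness(term: str, label: str) -> float:
    return TOY_RELATEDNESS.get((term, label), 0.0)


def spread_activation(pivot, query_terms, threshold=0.5, decay=0.9):
    """Propagate activation from a pivot node, consuming one query term per hop.

    An edge passes activation only if its label's relatedness to the current
    query term reaches `threshold`; activation is multiplied by the relatedness
    score and by `decay` at each hop. Returns the activated nodes of the last hop.
    """
    activation = {pivot: 1.0}
    frontier = [pivot]
    for term in query_terms:
        next_frontier = []
        for node in frontier:
            for label, neighbour in GRAPH.get(node, []):
                score = relatedness(term, label)
                if score < threshold:
                    continue  # prune edges unrelated to the current query term
                value = activation[node] * score * decay
                if value > activation.get(neighbour, 0.0):
                    activation[neighbour] = value
                    if neighbour not in next_frontier:
                        next_frontier.append(neighbour)
        frontier = next_frontier
    return {node: activation[node] for node in frontier}


# "who was the wife of president Lincoln?" with Abraham_Lincoln as the pivot
print(spread_activation("Abraham_Lincoln", ["wife"]))
```

The threshold is what keeps the search tractable: only the `spouse` edge survives for "wife", so the other neighbours of the pivot are never expanded.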

3. Evaluation

A prototype, Treo, was developed and evaluated in terms of the quality of its results using the QALD Workshop DBpedia training query set [1], which contains 50 natural language queries over DBpedia. Example queries from the dataset include “who was the wife of president Lincoln?” and “which capitals in Europe were host cities of the summer Olympic games?”. Mean reciprocal rank (MRR), precision and recall were measured.
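The three measures can be computed per query and then averaged, as in the sketch below. The source does not state how answers are matched or how scores are aggregated; set-based matching, macro-averaging over queries and scoring an unanswered query as 0 are assumptions of this sketch, and the example runs are invented rather than taken from QALD.

```python
def reciprocal_rank(ranked_answers, gold):
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, answer in enumerate(ranked_answers, start=1):
        if answer in gold:
            return 1.0 / rank
    return 0.0


def precision_recall(answers, gold):
    """Set-based precision and recall for a single query."""
    answers, gold = set(answers), set(gold)
    hits = len(answers & gold)
    precision = hits / len(answers) if answers else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def evaluate(runs):
    """Macro-average MRR, precision and recall over (ranked answers, gold set) pairs."""
    n = len(runs)
    mrr = sum(reciprocal_rank(ranked, gold) for ranked, gold in runs) / n
    scores = [precision_recall(ranked, gold) for ranked, gold in runs]
    precision = sum(p for p, _ in scores) / n
    recall = sum(r for _, r in scores) / n
    return mrr, precision, recall


# HYPOTHETICAL runs for three queries (not QALD data)
runs = [
    (["Mary_Todd_Lincoln"], {"Mary_Todd_Lincoln"}),
    (["Paris", "London"], {"London"}),
    ([], {"Athens"}),  # unanswered query
]
print(evaluate(runs))  # (0.5, 0.5, 0.6666666666666666)
```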

The proposed approach answered 70% of the queries, with final values of MRR=0.492, precision=0.395 and recall=0.451. The relatedness measure was able to cope with non-taxonomic variations between query and vocabulary terms and showed high average discrimination in the node selection process: the average difference between the relatedness value of answer nodes and the mean relatedness is 2.81 σ.
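One way to read the 2.81 σ figure is as an average z-score: the distance of each answer node's relatedness from the mean relatedness of the candidate nodes, expressed in units of their standard deviation. The source does not specify which nodes form the candidate pool or whether population or sample σ is used; the sketch assumes population σ over the candidates at one selection step, and the scores are invented.

```python
from statistics import mean, pstdev


def discrimination(candidate_scores, answer_scores):
    """Mean distance, in standard deviations, of answer-node relatedness from the candidate mean.

    candidate_scores: relatedness values of all candidate nodes at a selection step.
    answer_scores: relatedness values of the nodes that lead to correct answers.
    """
    mu = mean(candidate_scores)
    sigma = pstdev(candidate_scores)  # population standard deviation (an assumption)
    if sigma == 0:
        return 0.0  # all candidates equally related: no discrimination
    return mean((score - mu) / sigma for score in answer_scores)


# HYPOTHETICAL relatedness scores of four candidate edges; the last one leads to the answer
candidates = [0.10, 0.10, 0.10, 0.90]
print(round(discrimination(candidates, [0.90]), 3))  # 1.732
```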

The results for each query were analyzed, and the queries with errors were categorized into five error classes. Removing the queries whose errors are considered addressable in the short term (Partial Ordered Dependency Structure Error, Pivot Error and Relatedness Error) gives projected estimates of precision=0.64, recall=0.75 and MRR=0.82.
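The projection can be read as recomputing the averaged measures after excluding the queries that fall into the three addressable error classes. This interpretation, the hypothetical `QueryResult` record and the example data are assumptions of the sketch; the two remaining error classes are not named in the text, so a placeholder label marked as hypothetical is used for them.

```python
from dataclasses import dataclass
from typing import Optional

# Error classes the evaluation considers addressable in the short term.
ADDRESSABLE = {
    "Partial Ordered Dependency Structure Error",
    "Pivot Error",
    "Relatedness Error",
}


@dataclass
class QueryResult:  # HYPOTHETICAL per-query record
    error_class: Optional[str]  # None if the query had no error
    precision: float
    recall: float
    reciprocal_rank: float


def projected_scores(results):
    """Average (precision, recall, MRR) over queries whose errors are not short-term addressable."""
    kept = [r for r in results if r.error_class not in ADDRESSABLE]
    if not kept:
        raise ValueError("no queries left after filtering")
    n = len(kept)
    return (
        sum(r.precision for r in kept) / n,
        sum(r.recall for r in kept) / n,
        sum(r.reciprocal_rank for r in kept) / n,
    )


# HYPOTHETICAL per-query results (not the QALD evaluation data)
results = [
    QueryResult(None, 1.0, 1.0, 1.0),
    QueryResult("Pivot Error", 0.0, 0.0, 0.0),
    QueryResult(None, 0.5, 1.0, 0.5),
    QueryResult("Other error class (hypothetical)", 0.0, 0.0, 0.0),
]
print(projected_scores(results))
```

Queries in non-addressable classes stay in the denominator, so the projection only removes failures that are expected to be fixed.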

References

[1] 1st Workshop on Question Answering over Linked Data (QALD-1), ://www.sc.cit-ec.uni-bielefeld.de/qald-1, 2011.
