29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...


Using the Web to Enhance Desktop Data

Laura Dragan, Renaud Delbru, Siegfried Handschuh, Stefan Decker

Digital Enterprise Research Institute, National University of Ireland, Galway

firstname.lastname@deri.org

Abstract

The Semantic Web and the Semantic Desktop provide frameworks to structure and interlink data, using similar stacks of technologies, but the integration of the resulting datasets is not straightforward. We describe a system that finds Web aliases for desktop resources and uses them to enhance the desktop information. The process is based on type and property mappings and on matching the values found online with the known values from the desktop.

The Semantic Web proposes a common framework that gives meaning to data so that machines can understand, structure and interlink it better. With the emergence of Linked [Open] Data, the Semantic Web has gained momentum, and vast amounts of structured data have become available online.

The Semantic Desktop is the result of applying Semantic Web technologies to the desktop. It promised to bring elegant solutions to the problems of information overload and application data silos by using standard vocabularies and representations for the desktop data.

The Semantic Web and the Semantic Desktop each provide solutions to interlink data within their own environment, but the result is two separate islands of knowledge: they contain similar data, related to the same topics of interest to the user, yet are disconnected from each other. For instance, DBLP contains information about publications, authors and conferences. This information can be used to enhance the desktop information about a fellow researcher, or about a conference of interest. Other online data sources, like MusicBrainz and the BBC Music Project, can help enhance the desktop information about the user's music collection. The challenge is to find links between these two systems and ultimately integrate them.

We describe here an approach to automatically find Semantic Web aliases for Semantic Desktop resources. An alias is a Semantic Web resource that represents the same real-world thing as the initial desktop resource. The type of information available about the aliases might differ between sources. For a person, for example, it includes the email address, telephone number, homepage, blog URL and date of birth; for researchers, it also includes affiliation, the papers they authored and the projects they are involved in. This information might not all be available from one source, but be spread over multiple sources, each one containing overlapping information. Some information might only be available online, while some can only be found on the desktop.


The service we propose runs on the desktop. It uses the information available on the desktop about resources to find Web aliases for those resources. The process consists of three steps, as follows:

1. Gather local data for the current resource on the desktop. This extends to all the values of properties of other connected resources that might be relevant.

2. Query Web sources based on the label and type of the current resource. The Web sources can be Semantic Web search engines like Sindice, or accessible SPARQL endpoints like the one provided by DBpedia.

3. Filter the Web results based on property matches. For this step we use a set of mappings between desktop and Web ontologies, and string matching algorithms to rank the results.
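The filtering step can be illustrated with a minimal sketch. It assumes that candidate descriptions have already been fetched from a Web source as plain dictionaries, and it uses the similarity ratio from Python's `difflib` as a stand-in, since the text does not name the string matching algorithms actually used. The names `DESKTOP_TO_WEB`, `score_candidate` and `rank_candidates`, the `nco:emailAddress` to `foaf:mbox` pairing, and the example data are all hypothetical.

```python
from difflib import SequenceMatcher

# Illustrative property mapping: desktop property -> Web property.
# Only nco:fullname -> foaf:name comes from the text; the second pair is assumed.
DESKTOP_TO_WEB = {
    "nco:fullname": "foaf:name",
    "nco:emailAddress": "foaf:mbox",
}


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] (difflib ratio, a stand-in metric)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def score_candidate(local: dict, candidate: dict, property_map: dict) -> float:
    """Average similarity over the mapped properties present on both sides."""
    scores = []
    for desktop_prop, web_prop in property_map.items():
        if desktop_prop in local and web_prop in candidate:
            scores.append(similarity(local[desktop_prop], candidate[web_prop]))
    return sum(scores) / len(scores) if scores else 0.0


def rank_candidates(local: dict, candidates: dict, property_map: dict = DESKTOP_TO_WEB) -> list:
    """Return (uri, score) pairs sorted from best to worst match."""
    scored = [
        (uri, score_candidate(local, props, property_map))
        for uri, props in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up desktop values and Web candidates, for illustration only.
local = {"nco:fullname": "Jane Doe", "nco:emailAddress": "jane.doe@example.org"}
candidates = {
    "http://example.org/people/2": {"foaf:name": "John Dow"},
    "http://example.org/people/1": {
        "foaf:name": "Jane Doe",
        "foaf:mbox": "jane.doe@example.org",
    },
}
ranked = rank_candidates(local, candidates)
```

Averaging over only the properties present on both sides is one possible design choice; a real ranking might weight identifying properties such as email addresses more heavily than names.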

The vocabularies used on the desktop differ from those used on the Semantic Web, so we created mappings for some of the most popular vocabularies to reconcile the two systems. The mappings between the desktop ontologies and the vocabularies used on the Web are of two kinds: type mappings and property mappings. Type mappings state, for each desktop resource type (e.g. pimo:Person), which types can be found online (e.g. foaf:Person, foaf:Actor, akt:Person). Property mappings specify the correspondence between desktop and Web properties of resources, e.g. nco:fullname corresponds to foaf:name. Composite properties are also handled.
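As a rough illustration, the two kinds of mappings could be held in simple lookup tables. The sketch below reuses the examples from the text; the table names, the helper functions, and in particular the reading of "composite property" as several Web properties joined into one desktop value (here `foaf:firstName` plus `foaf:lastName`) are assumptions, since the text does not say how composite properties are handled.

```python
# Type mappings: desktop type -> Web types that may describe the same thing.
# The entry below is the example given in the text.
TYPE_MAP = {
    "pimo:Person": ["foaf:Person", "foaf:Actor", "akt:Person"],
}

# Simple property mappings: Web property -> desktop property.
WEB_TO_DESKTOP = {
    "foaf:name": "nco:fullname",
}

# Composite mappings (assumed form): one desktop property built by joining
# the values of several Web properties, in order.
COMPOSITE_MAP = {
    "nco:fullname": ("foaf:firstName", "foaf:lastName"),
}


def web_types_for(desktop_type: str) -> list:
    """Web types to look for when searching aliases of a desktop type."""
    return TYPE_MAP.get(desktop_type, [])


def to_desktop(web_props: dict) -> dict:
    """Translate a Web description into desktop property names."""
    result = {}
    # Direct one-to-one property correspondences first.
    for web_prop, value in web_props.items():
        if web_prop in WEB_TO_DESKTOP:
            result[WEB_TO_DESKTOP[web_prop]] = value
    # Fall back to composite mappings when no direct value was found.
    for desktop_prop, parts in COMPOSITE_MAP.items():
        if desktop_prop not in result and all(p in web_props for p in parts):
            result[desktop_prop] = " ".join(web_props[p] for p in parts)
    return result
```

With these tables, `to_desktop({"foaf:firstName": "Jane", "foaf:lastName": "Doe"})` yields a single `nco:fullname` value that can be compared directly with the value stored on the desktop.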

We process the desktop resources for which there is a high probability of finding related information online, like people and events. For each resource, the result is a list of Web aliases, each with a score that indicates how close the match was determined to be. If the score is above an upper threshold, the alias is saved. If the score is below a lower threshold, the alias is discarded. For scores in between, the user is asked to decide which aliases to keep.
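The two-threshold rule can be sketched in a few lines. The threshold values below are placeholders chosen for illustration, as the text does not give the actual values, and the names `Decision` and `decide` are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    SAVE = "save"
    ASK_USER = "ask_user"
    DISCARD = "discard"


# Placeholder values; the actual thresholds are not given in the text.
UPPER_THRESHOLD = 0.9
LOWER_THRESHOLD = 0.5


def decide(score: float,
           upper: float = UPPER_THRESHOLD,
           lower: float = LOWER_THRESHOLD) -> Decision:
    """Save above the upper threshold, discard below the lower one, else ask."""
    if score > upper:
        return Decision.SAVE
    if score < lower:
        return Decision.DISCARD
    return Decision.ASK_USER
```

In this sketch a score exactly equal to either threshold falls into the middle band, so the user is asked; that boundary choice is an assumption.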

The system is designed to be modular: more Web sources can be plugged in at any time, and more mappings can be added.
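One way to picture this modularity is a small common interface that every Web source implements. The sketch below is an assumption about how such a plug-in point might look, not a description of the actual system: `WebSource`, `InMemorySource` and `AliasFinder` are hypothetical names, and the in-memory source merely stands in for a search engine such as Sindice or a SPARQL endpoint.

```python
from typing import Dict, List, Protocol


class WebSource(Protocol):
    """Anything that can return candidate descriptions for a label and types."""

    def search(self, label: str, types: List[str]) -> Dict[str, dict]:
        ...


class InMemorySource:
    """Stand-in source backed by a dictionary instead of a remote service."""

    def __init__(self, records: Dict[str, dict]) -> None:
        self._records = records

    def search(self, label: str, types: List[str]) -> Dict[str, dict]:
        wanted = label.lower()
        return {
            uri: props
            for uri, props in self._records.items()
            if props.get("rdf:type") in types
            and wanted in props.get("rdfs:label", "").lower()
        }


class AliasFinder:
    """Collects candidates from every registered source."""

    def __init__(self) -> None:
        self._sources: List[WebSource] = []

    def register(self, source: WebSource) -> None:
        self._sources.append(source)

    def candidates(self, label: str, types: List[str]) -> Dict[str, dict]:
        merged: Dict[str, dict] = {}
        for source in self._sources:
            merged.update(source.search(label, types))
        return merged


# Made-up records, for illustration only.
finder = AliasFinder()
finder.register(InMemorySource({
    "http://example.org/a": {"rdf:type": "foaf:Person", "rdfs:label": "Jane Doe"},
}))
finder.register(InMemorySource({
    "http://example.org/b": {"rdf:type": "foaf:Person", "rdfs:label": "Jane Doe (researcher)"},
    "http://example.org/c": {"rdf:type": "foaf:Document", "rdfs:label": "Jane Doe's thesis"},
}))
found = finder.candidates("Jane Doe", ["foaf:Person"])
```

Adding a new source then only requires a class with a matching `search` method and one call to `register`.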

Finding Web aliases for desktop resources helps enrich desktop data with new information from the Web, and provides a way to close the gap between the two knowledge islands.

Acknowledgments. The work presented here was supported by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Líon-2).
