UNIVERSITY OF JYVÄSKYLÄ

Lecture 10: Storing and querying RDF data

ITKS544 Semantic Web and Ontology Engineering
Spring 2012
Version 1.0
Michal Nagy


Environment variables (practical)

• The OS stores simple variables in its memory
• Applications can ask the OS for their values
• Each variable has a name and a simple textual value
• To see variables and their values, run (Win):
  – See all: set
  – See a concrete variable: set VARNAME
• Scope:
  – Global (valid for the whole OS)
  – Local (valid in the current command line window)
• To change or set a new local variable, run:
  set VARNAME=some text here
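Applications read these variables through an OS call. The following is a minimal Python sketch of the local scope described above, using only the standard os module. VARNAME is the slide's placeholder name, and JAVA_HOME may or may not be set on a given machine.

```python
import os

# VARNAME is the placeholder name from the slide, not a real system variable.
# Assigning here affects only this process and the processes it starts,
# which corresponds to the "local" scope.
os.environ["VARNAME"] = "some text here"

# .get() returns a default instead of raising KeyError when a variable is unset.
value = os.environ.get("VARNAME")
java_home = os.environ.get("JAVA_HOME", "<not set>")

print("VARNAME =", value)
print("JAVA_HOME =", java_home)
```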

27/02/2012 ITKS544 - Lecture 10


More about literals

• Plain literals
  – Represent strings
  – Optional language tag
  – Example: “John”@en
• Typed literals
  – Represent non-string values (e.g. booleans, numbers, …)
  – Lexical form of the literal + URI of the datatype
  – Example: “true”^^

@prefix f: .
@prefix xsd: .

f:john f:age "25"^^xsd:int ;
       f:firstname "John"@en ;
       f:taxesPayed "true"^^xsd:boolean .
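To illustrate the "lexical form + datatype URI" idea, here is a small Python sketch that turns a literal into a native value. The function name literal_to_python is hypothetical and only three XML Schema datatypes are handled; real RDF libraries cover many more.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

def literal_to_python(lexical, datatype=None, language=None):
    """Map a literal's lexical form to a Python value (illustrative helper)."""
    if datatype is None:
        # Plain literal: a string; the language tag does not change the value.
        return lexical
    if datatype == XSD + "boolean":
        if lexical in ("true", "1"):
            return True
        if lexical in ("false", "0"):
            return False
        raise ValueError("not a boolean lexical form: " + lexical)
    if datatype in (XSD + "int", XSD + "integer"):
        return int(lexical)
    raise ValueError("datatype not handled in this sketch: " + datatype)

age = literal_to_python("25", XSD + "int")
paid = literal_to_python("true", XSD + "boolean")
name = literal_to_python("John", language="en")
print(age, paid, name)
```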


Storing RDF data


Storing of RDF

• Small datasets (a few triples)
  – RDF file published on the web or stored locally
• Large datasets (thousands to millions of triples)
  – A database-based solution is better
  – Usually in the form of an RDF store
• Legacy data
  – Keep in original form
  – Provide a mapping to RDF
  – Expose as RDF to the outside world


Database-based solutions

• Jena (open source)
  – Link: http://incubator.apache.org/jena/
• Joseki (open source)
  – Link: http://www.joseki.org/
• Sesame by Aduna (open source)
  – Link: http://www.openrdf.org/
• RDFLib (open source, BSD 2-Clause license)
  – Link: http://code.google.com/p/rdflib/
  – Doc: http://readthedocs.org/docs/rdflib/en/latest/
• Mulgara (open source, Open Software License)
  – Link: http://www.mulgara.org/
• Open Anzo (open source, Eclipse Public License 1.0)
  – Link: http://www.openanzo.org/
• Virtuoso Universal Server by OpenLink Software
  – Link: http://virtuoso.openlinksw.com/


Sesame

• Framework for processing RDF data
• Home: http://www.openrdf.org/
• Features:
  – Parsing
    • Supports all major notations
  – Storing
    • In-memory, RDBMS-backed, file-based
  – Inferencing
    • Rule-based, ontology-based
  – Querying
    • SPARQL, SeRQL
• Java-based API + tools
• Elmo: http://semanticweb.org/wiki/Elmo


Sesame architecture

• Rio (RDF I/O)
  – Parsers and writers for various notations
• Sail (Storage And Inference Layer)
  – Low-level system API
  – Abstraction for storage and inferencing
• Repository API
  – Higher-level API
  – Developer-oriented methods for handling RDF data
• HTTP Server
  – Accessing Sesame through HTTP

Image source: http://www.openrdf.org/doc/sesame2/users/ch03.html


Installing Sesame workbench (practical)

• Simple web interface for storing and querying RDF data
• Install steps (no admin rights needed):
  1. Download and unzip the newest Sesame and Tomcat
  2. Copy all *.war files from Sesame’s war folder to Tomcat’s webapps folder
  3. Start Tomcat
     • From the bin folder by running startup.sh (UNIX) or startup.bat (Win)
     • You may need to set the JAVA_HOME variable (it should point to the JDK or JRE main folder)
  4. Go to http://localhost:8080/openrdf-workbench


Jena

• Collection of tools and Java libraries to simplify the development of semantic web and linked-data applications
• Home: http://incubator.apache.org/jena/index.html
• Includes:
  – API for processing RDF data in various notations
  – Ontology API for OWL and RDFS
  – Rule-based inference engine
  – RDF store
  – SPARQL query processor (called ARQ)


ARQ (practical)

• Installation:
  – Download and unzip Jena
  – Set the JENAROOT environment variable to the folder where you unzipped Jena
  – In bat (Win) or bin (UNIX) you will find the arq executable
• Usage:
  – Prepare a SPARQL query and save it into a file (here: query.sparql)
  – Prepare a data file (if needed), e.g. data.rdf
  – Execute the query on top of the data by running:
    arq --query=query.sparql --data=data.rdf
• In case of problems use arq --help


Joseki (practical)

• SPARQL server for Jena
• Features:
  – RDF data from files and databases
  – HTTP (GET and POST) implementation of the SPARQL protocol
• Installation:
  1. Download and unzip Joseki
  2. Check that you can run java from the command line
     • If not, add java to the PATH environment variable
  3. Start the Joseki server from the Joseki folder by running:
     java -cp "%CLASSPATH%;lib\*" joseki.rdfserver joseki-config.ttl
  4. Open http://localhost:2020/
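In the GET form of the SPARQL protocol, the query text travels URL-encoded in a parameter named query. The sketch below only builds such a request URL with Python's standard library and sends nothing. The endpoint path /sparql is an assumption for illustration; the actual service path depends on the server configuration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint: the port comes from the slide, the /sparql path is a guess.
ENDPOINT = "http://localhost:2020/sparql"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

def build_get_url(endpoint, query):
    """Return the GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = build_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded)
```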


RDFlib

• Python-based
• Link: http://code.google.com/p/rdflib/
• Documentation: http://readthedocs.org/docs/rdflib/
• Simple control over RDF graphs within Python code
  – Parsing and serializing
• Storing of RDF data using a relational DB
• Querying using SPARQL


RDFlib plugins

Image source: http://readthedocs.org/docs/rdflib/en/latest/


Mulgara (practical)

• RDF database
• Written entirely in Java
• Link: http://www.mulgara.org/
• Query languages: SPARQL and its own TQL
• TQL language:
  – Interpreted querying and command language
  – Used to manage Mulgara storage
• REST interface for TQL and SPARQL
• Starting the server:
  – Download and unzip the binary package
  – Inside the mulgara folder run:
    java -jar mulgara-2.1.13.jar
  – TQL REST interface: http://localhost:8080/tql/
  – SPARQL REST interface: http://localhost:8080/sparql/


Querying RDF data


Example data set

@prefix f: .
@prefix xsd: .

f:john f:age "25"^^xsd:int .
f:bill f:age "30"^^xsd:int .
f:mary f:age "24"^^xsd:int .
f:jane f:age "26"^^xsd:int .
f:john f:loves f:mary .
f:bill f:loves f:jane .
f:john f:hasFriend f:bill .

[Figure: the data set drawn as a graph. f:bill (f:age 30) f:loves f:jane (f:age 26); f:john (f:age 25) f:loves f:mary (f:age 24).]


Simple SPARQL queries (1)

• Show me the property f:age of resource f:mary

Data
[Figure: example data set graph]

Query
SELECT ?a
WHERE { ?a }

PREFIX f:
SELECT ?a
WHERE { f:mary f:age ?a }

[Figure: query pattern f:mary f:age ?a]

Result
a
24


Simple SPARQL queries (2)

• Show me f:age of all resources

Data
[Figure: example data set graph]

Query
PREFIX f:
SELECT ?r ?a
WHERE { ?r f:age ?a }

[Figure: query pattern ?r f:age ?a]

Result
r       a
f:bill  30
f:jane  26
f:john  25
f:mary  24


Simple SPARQL queries (3)

• Show me all things that are loved. Also show me their age (f:age)

Data
[Figure: example data set graph]

Query
PREFIX f:
SELECT ?t ?a
WHERE {
  ?x f:loves ?t .
  ?t f:age ?a
}

[Figure: query pattern ?x f:loves ?t, ?t f:age ?a]

Result
t       a
f:jane  26
f:mary  24
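The way the two triple patterns share the variable ?t can be sketched in plain Python. This is a naive illustration of pattern matching over the example data, not how Sesame, Jena, or any other engine is implemented; the helper names match_pattern and evaluate are made up for this sketch.

```python
# The example data set as (subject, predicate, object) tuples.
TRIPLES = [
    ("f:john", "f:age", 25),
    ("f:bill", "f:age", 30),
    ("f:mary", "f:age", 24),
    ("f:jane", "f:age", 26),
    ("f:john", "f:loves", "f:mary"),
    ("f:bill", "f:loves", "f:jane"),
    ("f:john", "f:hasFriend", "f:bill"),
]

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if isinstance(term, str) and term.startswith("?"):
            # A variable must agree with any value it was bound to earlier.
            if term in result and result[term] != value:
                return None
            result[term] = value
        elif term != value:
            return None
    return result

def evaluate(patterns, triples):
    """Match all patterns in order, carrying variable bindings along."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in triples:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions

solutions = evaluate([("?x", "f:loves", "?t"), ("?t", "f:age", "?a")], TRIPLES)
rows = sorted((s["?t"], s["?a"]) for s in solutions)
print(rows)
```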


SPARQL: Optional condition

• Show me each person and their age (f:age). If there is information about that person loving somebody, show it as well.

Data
[Figure: example data set graph]

Query
PREFIX f:
SELECT ?p ?a ?q
WHERE {
  ?p f:age ?a .
  OPTIONAL { ?p f:loves ?q }
}

[Figure: query pattern ?p f:age ?a, optionally ?p f:loves ?q]

Result
p       a   q
f:bill  30  f:jane
f:john  25  f:mary
f:mary  24
f:jane  26
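OPTIONAL behaves like a left join: every solution of the required part is kept, and the optional variable stays unbound when nothing matches. A rough Python sketch over the example data follows; the function name optional_join and the data layout are assumptions made for illustration.

```python
# Required part: ?p f:age ?a
AGES = {"f:john": 25, "f:bill": 30, "f:mary": 24, "f:jane": 26}
# Optional part: ?p f:loves ?q
LOVES = [("f:john", "f:mary"), ("f:bill", "f:jane")]

def optional_join(required, optional_pairs):
    """Keep every required row; add ?q when a match exists, else None."""
    rows = []
    for person, age in required.items():
        partners = [q for p, q in optional_pairs if p == person]
        if partners:
            for q in partners:
                rows.append((person, age, q))
        else:
            rows.append((person, age, None))  # ?q stays unbound
    return rows

rows = optional_join(AGES, LOVES)
for row in rows:
    print(row)
```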


SPARQL: Filtering

• Show me people and their age for people older than 25.

Data
[Figure: example data set graph]

Query
PREFIX f:
SELECT ?p ?a
WHERE {
  ?p f:age ?a .
  FILTER (?a > 25)
}

If ?a is not bound to a number, the comparison fails and that solution is dropped from the result

Result
p       a
f:bill  30
f:jane  26
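The note about non-numeric values can be mimicked in Python, where comparing a string with a number also raises an error. In this sketch the error simply removes the solution, which mirrors how a failing FILTER expression eliminates a solution. The resource f:tom and its value are hypothetical additions, not part of the example data.

```python
def filter_older_than(solutions, threshold):
    """Keep (person, age) pairs whose age exceeds the threshold."""
    kept = []
    for person, age in solutions:
        try:
            if age > threshold:
                kept.append((person, age))
        except TypeError:
            # The comparison is undefined for this value: drop the solution.
            continue
    return kept

solutions = [
    ("f:john", 25),
    ("f:bill", 30),
    ("f:mary", 24),
    ("f:jane", 26),
    ("f:tom", "unknown"),  # hypothetical non-numeric age
]

result = filter_older_than(solutions, 25)
print(result)
```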


SPARQL: Logical OR (union)

• Show me all people who have a friend together with all the people that are younger than 25

Data
[Figure: example data set graph]

Query
PREFIX f:
SELECT ?p
WHERE { ?p f:age ?a . FILTER (?a < 25) }

+

PREFIX f:
SELECT ?p
WHERE { ?p f:hasFriend ?q }

Combined:

PREFIX f:
SELECT ?p
WHERE {
  { ?p f:age ?a . FILTER (?a < 25) }
  UNION { ?p f:hasFriend ?q }
}

Result
p
f:mary
f:john
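UNION evaluates each branch separately and then puts the two solution lists together. A small Python sketch of that idea on the example data is given below. Duplicates are not removed, so a person matching both branches would appear twice.

```python
AGES = {"f:john": 25, "f:bill": 30, "f:mary": 24, "f:jane": 26}
FRIENDS = [("f:john", "f:bill")]

# Branch 1: ?p f:age ?a . FILTER (?a < 25)
younger = [{"?p": p, "?a": a} for p, a in AGES.items() if a < 25]
# Branch 2: ?p f:hasFriend ?q
has_friend = [{"?p": p, "?q": q} for p, q in FRIENDS]

# UNION: all solutions of branch 1 followed by all solutions of branch 2.
union = younger + has_friend

# SELECT ?p: project every solution onto ?p.
result = [solution["?p"] for solution in union]
print(result)
```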


SPARQL: Solution set modifiers

• Example:
  PREFIX f:
  SELECT ?p ?a
  WHERE { ?p f:age ?a }
  ORDER BY ?a
• Others:
  – ORDER BY DESC(?x)
    • Arrange in descending order
  – LIMIT n
    • Include only the first n solutions
  – OFFSET n
    • Include solutions starting from index n+1
  – SELECT DISTINCT
    • Remove duplicate solutions
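Each modifier has a close analogue in ordinary list handling. The following Python sketch applies them to the age solutions of the example data; it is only an analogy and the variable names are chosen for illustration.

```python
solutions = [("f:john", 25), ("f:bill", 30), ("f:mary", 24), ("f:jane", 26)]

# ORDER BY ?a
ordered = sorted(solutions, key=lambda s: s[1])
# ORDER BY DESC(?a)
descending = sorted(solutions, key=lambda s: s[1], reverse=True)
# LIMIT 2: only the first two solutions
limited = ordered[:2]
# OFFSET 1: skip the first solution, start from the second
after_offset = ordered[1:]
# SELECT DISTINCT: drop repeated solutions, keeping the first occurrence
distinct = list(dict.fromkeys([25, 30, 25, 24, 30]))

print(ordered)
print(descending)
print(limited)
print(after_offset)
print(distinct)
```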


SPARQL: Constructing graphs

• Annotate people with age below 26 as young people

Data
[Figure: example data set graph]

“Query”
PREFIX f:
PREFIX rdf:
CONSTRUCT { ?p rdf:type f:YoungPerson }
WHERE { ?p f:age ?a . FILTER (?a < 26) }

Result
f:john rdf:type f:YoungPerson .
f:mary rdf:type f:YoungPerson .
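CONSTRUCT fills a triple template once for every solution of the WHERE part and returns the resulting triples as a new graph. A minimal Python sketch of that step over the example ages follows, with the data held in a plain list as an assumption of the sketch.

```python
TRIPLES = [
    ("f:john", "f:age", 25),
    ("f:bill", "f:age", 30),
    ("f:mary", "f:age", 24),
    ("f:jane", "f:age", 26),
]

# WHERE { ?p f:age ?a . FILTER (?a < 26) }
matches = [s for s, p, o in TRIPLES if p == "f:age" and o < 26]

# CONSTRUCT { ?p rdf:type f:YoungPerson }: one new triple per solution.
constructed = [(person, "rdf:type", "f:YoungPerson") for person in matches]

for triple in constructed:
    print(triple)
```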


SPARQL: several sources (1)

http://users.jyu.fi/~minagy/itks544/rdf/people.rdf
f:john f:age "25"^^xsd:integer .
f:bill f:age "30"^^xsd:integer .
f:mary f:age "24"^^xsd:integer .
f:jane f:age "26"^^xsd:integer .
f:john f:loves f:mary .
f:bill f:loves f:jane .
f:john f:hasFriend f:bill .

http://users.jyu.fi/~minagy/itks544/rdf/jobs.rdf
j:teacher rdf:type j:EducationJob .
j:seniorResearcher rdf:type j:ResearchJob .
j:juniorResearcher rdf:type j:ResearchJob .
j:professor rdf:type j:ResearchJob, j:EducationJob .

http://users.jyu.fi/~minagy/itks544/rdf/employment.rdf
f:john e:worksAs j:teacher .
f:mary e:worksAs j:seniorResearcher .
f:jane e:worksAs j:juniorResearcher .
f:bill e:worksAs j:professor .

Prefixes
@prefix j: .
@prefix e: .
@prefix f: .
@prefix rdf: .
@prefix xsd: .

From now on prefixes will be omitted to save space


SPARQL: several sources (2)

• Show me all people who work in an education job, together with their age and job

Query
SELECT ?per ?ag ?job
FROM
FROM
FROM
WHERE {
  ?per f:age ?ag .
  ?per e:worksAs ?job .
  ?job rdf:type j:EducationJob .
}

Result
per     ag  job
f:bill  30  j:professor
f:john  25  j:teacher


SPARQL: Update

• Also called SPARUL or SPARQL/Update
• W3C Working Draft as of January 2012
  – http://www.w3.org/TR/sparql11-update/
• Insert, update and delete functionality
• Example (taken from the W3C document):

PREFIX dc:
PREFIX xsd:
DELETE { ?book ?p ?v }
WHERE {
  ?book dc:date ?date .
  FILTER ( ?date > "1970-01-01T00:00:00-02:00"^^xsd:dateTime )
  ?book ?p ?v
}


More on SPARQL

• Official specification:
  – http://www.w3.org/TR/rdf-sparql-query/
• SPARQL update
  – http://www.w3.org/TR/sparql11-update/
• Tutorials:
  – http://jena.sourceforge.net/ARQ/Tutorial/index.html
• Validator:
  – http://www.w3.org/RDF/Validator/


Homework

1. Download and start Sesame yourself
2. Create an RDF file
3. Import it into Sesame
4. Try a few queries
