D4.2.2 State-of-the-art survey on Ontology Merging ... - STI Innsbruck


EU-IST Integrated Project (IP) IST-2003-506826 SEKT

SEKT: Semantically Enabled Knowledge Technologies

D4.2.2 State-of-the-art survey on Ontology Merging and Aligning V2

Livia Predoiu (DERI)
Cristina Feier (DERI)
Francois Scharffe (DERI)
Jos de Bruijn (DERI)
Francisco Martín-Recuerda (DERI)
Dimitar Manov (SIRMA)
Marc Ehrig (UKARL)

Abstract.
EU-IST Integrated Project (IP) IST-2003-506826 SEKT
Deliverable D4.2.2 (WP4)
This deliverable contains a comprehensive state-of-the-art survey on Ontology Merging and Aligning methods, tools and specification languages.
Keyword list: Ontology Mediation, state-of-the-art survey, ontology merging, ontology aligning, ontology mapping

WP4: Ontology Mediation
Nature of the Deliverable: Report
Dissemination level: PU
Contractual date of delivery: 2005-12-31
Actual date of delivery: 2005-12-31

Copyright © 2006 Digital Enterprise Research Institute, University of Innsbruck


SEKT Consortium

This document is part of a research project partially funded by the IST Programme of the Commission of the European Communities as project number IST-2003-506826.

British Telecommunications plc.
Orion 5/12, Adastral Park
Ipswich IP5 3RE
UK
Tel: +44 1473 609583, Fax: +44 1473 609832
Contact person: John Davies
E-mail: john.nj.davies@bt.com

Jozef Stefan Institute
Jamova 39
1000 Ljubljana
Slovenia
Tel: +386 1 4773 778, Fax: +386 1 4251 038
Contact person: Marko Grobelnik
E-mail: marko.grobelnik@ijs.si

University of Sheffield
Department of Computer Science
Regent Court, 211 Portobello St.
Sheffield S1 4DP
UK
Tel: +44 114 222 1891, Fax: +44 114 222 1810
Contact person: Hamish Cunningham
E-mail: hamish@dcs.shef.ac.uk

Intelligent Software Components S.A.
Pedro de Valdivia, 10
28006 Madrid
Spain
Tel: +34 913 349 797, Fax: +49 34 913 349 799
Contact person: Richard Benjamins
E-mail: rbenjamins@isoco.com

Ontoprise GmbH
Amalienbadstr. 36
76227 Karlsruhe
Germany
Tel: +49 721 50980912, Fax: +49 721 50980911
Contact person: Hans-Peter Schnurr
E-mail: schnurr@ontoprise.de

Vrije Universiteit Amsterdam (VUA)
Department of Computer Sciences
De Boelelaan 1081a
1081 HV Amsterdam
The Netherlands
Tel: +31 20 444 7731, Fax: +31 84 221 4294
Contact person: Frank van Harmelen
E-mail: frank.van.harmelen@cs.vu.nl

Empolis GmbH
Europaallee 10
67657 Kaiserslautern
Germany
Tel: +49 631 303 5540, Fax: +49 631 303 5507
Contact person: Ralph Traphöner
E-mail: ralph.traphoener@empolis.com

University of Karlsruhe, Institute AIFB
Englerstr. 28
D-76128 Karlsruhe
Germany
Tel: +49 721 608 6592, Fax: +49 721 608 6580
Contact person: York Sure
E-mail: sure@aifb.uni-karlsruhe.de

University of Innsbruck
Institute of Computer Science
Technikerstraße 13
6020 Innsbruck
Austria
Tel: +43 512 507 6475, Fax: +43 512 507 9872
Contact person: Jos de Bruijn
E-mail: jos.de-bruijn@deri.ie

Kea-pro GmbH
Tal
6464 Springen
Switzerland
Tel: +41 41 879 00, Fax: 41 41 879 00 13
Contact person: Tom Bösser
E-mail: tb@keapro.net

Sirma AI EOOD (Ltd.)
135 Tsarigradsko Shose
Sofia 1784
Bulgaria
Tel: +359 2 9768, Fax: +359 2 9768 311
Contact person: Atanas Kiryakov
E-mail: naso@sirma.bg

Universitat Autonoma de Barcelona
Edifici B, Campus de la UAB
08193 Bellaterra (Cerdanyola del Vallès)
Barcelona
Spain
Tel: +34 93 581 22 35, Fax: +34 93 581 29 88
Contact person: Pompeu Casanovas Romeu
E-mail: pompeu.casanovas@uab.es


Changes

Version  Date        Author    Changes
0.1      2005-12-01  Cristina  creation and fill in a lot of the content of D4.2.1
0.2      2005-12-05  Livia     added more content
0.3      2005-12-10  Cristina  added more content
0.4      2005-12-19  Livia     added more content
0.5      2005-12-20  Cristina  Version for QA
0.6      2006-01-31  Livia     final Version for QA


Executive Summary

This report provides a state-of-the-art survey of Ontology Merging and Aligning methods, tools and techniques. In this deliverable, we provide a framework for the evaluation of these approaches, as well as a categorization of approaches for ontology merging and aligning.

We categorize the approaches into methods and tools, integration systems and specific techniques. We compare the approaches according to the evaluation framework and to a set of generic use cases for ontology mediation in order to evaluate the applicability of each approach to the ontology mediation problem on the Semantic Web.

This is the second version of this deliverable in the course of the SEKT project. We kept much of the content that was delivered in the first version because the systems did not change, and parts of the state of the art thus did not change either. In this version we have refined the definitions of the terms commonly encountered in the deliverable and we have updated the classification of the integration approaches. We have also added descriptions of five new systems that are remarkable and make a significant contribution to the field of Ontology Mediation, and updated the descriptions of the previous systems where necessary.


Contents

1 Introduction
  1.1 Terminology
  1.2 The Ontology Mapping Process
  1.3 Ontology Mismatches
    1.3.1 Ontology-level Mismatches
    1.3.2 Language-level mismatches
  1.4 Integration approaches
  1.5 Wrappers and Mediators
2 Motivational Use Cases
  2.1 Use Cases for Instance Mediation
    2.1.1 Instance Transformation
    2.1.2 Instance Unification
    2.1.3 Query Rewriting
  2.2 Ontology Merging
  2.3 Creating Ontology Mappings
    2.3.1 Finding Similarities
    2.3.2 Specifying Mappings
3 The Evaluation Framework
4 The Survey
  4.1 Methods and Tools
    4.1.1 MAFRA
    4.1.2 RDFT
    4.1.3 PROMPT
    4.1.4 GLUE
    4.1.5 Semantic Matching
    4.1.6 OntoMap
    4.1.7 RDFDiff
    4.1.8 OntoMerge
    4.1.9 OMEN
    4.1.10 WSMT Data mediation module
    4.1.11 DOME mapping module
  4.2 Integrated Systems
    4.2.1 InfoSleuth
    4.2.2 ONION
    4.2.3 OBSERVER
    4.2.4 MOMIS
    4.2.5 INFOMIX
    4.2.6 AutoMed
  4.3 Specific Techniques
    4.3.1 FCA-Merge
    4.3.2 OntoMorph
    4.3.3 QOM Quick Ontology Mapping
5 Comparison of the Methods
  5.1 Ontology Languages
  5.2 Mapping Language
  5.3 Mapping Patterns
  5.4 Automation Support
  5.5 Applicability to Use Cases
  5.6 Implementation
  5.7 Experiences
6 Conclusions
Bibliography


Chapter 1

Introduction

This report presents a state-of-the-art survey of ontology mapping and merging methods and tools with an emphasis on ontology mapping and inter-operability over the Semantic Web.

This is the second version of this deliverable in the course of the SEKT project. We kept much of the content that was delivered in the first version unchanged because the systems did not change, and parts of the state of the art thus did not change either. In this version we have refined the definitions of the terms commonly encountered in the deliverable and we have updated the classification of the integration approaches. We have also added descriptions of five new systems that are remarkable and make a significant contribution to the field of Ontology Mediation, and updated the descriptions of the previous systems where necessary. The five new systems are OMEN, the WSMT Data Mediation module, the DOME mapping module, INFOMIX and AutoMed.

The issue of inter-operability between information systems has existed for many years. With the recent advent of the Semantic Web [BLHL01], the problem has only grown because of the abundance, heterogeneity and independence of the various data sources.

Traditional data integration systems focus on inter-operability between data sources and applications within enterprises. Within enterprises a certain coherence between data sources can be expected, although data integration within enterprises still faces many challenges which remain to be resolved.

We focus on information integration on the Semantic Web. This means that we not only take into account data integration within organizations¹, but also explicitly address integration across organizational boundaries. Between organizations, even more heterogeneity between the data sources can be expected.

¹ Arguably, the intranet of an organization is an isolated part of the Web and could be part of the Semantic Web in the near future.

Fortunately, ontologies [Fen03], the backbone of the Semantic Web, can help us with a part of the integration problem. Because ontologies are explicit and formal specifications of knowledge, they help in disambiguating data and in finding correspondences between data sources.

On the Semantic Web, data is annotated using ontologies. Concepts (also called classes)² in ontologies give meaning to data on the Web. Because ontologies are shared specifications, the same ontologies can be used for the annotation of multiple data sources: not only Web pages, but also collections of XML documents, relational databases, etc. This enables a certain degree of inter-operation between data sources using the same ontologies, because of their shared terminology³. However, it cannot be expected that all individuals and organizations on the Semantic Web will ever agree on using one common terminology or ontology. It can be expected that many different ontologies will appear and, in order to enable inter-operation, mediation is required between these ontologies.
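
The kind of mediation described above can be sketched in a few lines. This is a minimal illustration only, assuming two hypothetical ontologies (prefixed `src:` and `tgt:`) whose concept and property names are invented for the example; the hand-written mapping tables and the `mediate` function are not taken from any of the surveyed systems.

```python
# Hypothetical correspondences between a source and a target ontology.
# All concept and property names are illustrative assumptions.
CONCEPT_MAP = {"src:Person": "tgt:Human", "src:Firm": "tgt:Company"}
PROPERTY_MAP = {"src:surname": "tgt:familyName", "src:worksFor": "tgt:employedBy"}


def mediate(instance: dict) -> dict:
    """Rewrite an instance of the source ontology in target-ontology terms.

    Properties without a known correspondence are dropped rather than guessed.
    """
    return {
        "type": CONCEPT_MAP[instance["type"]],
        "properties": {
            PROPERTY_MAP[name]: value
            for name, value in instance["properties"].items()
            if name in PROPERTY_MAP
        },
    }


source_instance = {
    "type": "src:Person",
    "properties": {"src:surname": "Doe", "src:shoeSize": 42},
}
target_instance = mediate(source_instance)
print(target_instance)
```

In this sketch the mapping is a simple one-to-one renaming; real ontology mappings may also need to restructure or combine concepts, which a flat lookup table cannot express.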

As was argued in [VC98, Usc00], it is very hard to create standard ontologies. In fact, even inside organizations the standardization of a terminology is not feasible, because of the lengthy process of standardization and because the use of a big standard impedes changes in the organization (any change would require consensus among a large group of people, which is hard to achieve). Across organizations this problem becomes more severe, because the group of people who need to reach consensus is much bigger and conflicts of interest are more likely to occur. Therefore, it is likely that there will be many different heterogeneous ontologies on the Semantic Web and, in order to enable inter-operability between applications on the Semantic Web, mediation is required between different representations (ontologies) of knowledge in the same domain.

This report presents a survey on the state of the art of ontology mapping, merging and mediation. It includes both well-known approaches in database integration and recent approaches specifically addressing ontology mapping on the Semantic Web. The Semantic Web has a number of distinguishing features relative to older data integration systems:

• The Semantic Web relies heavily on standardization of both the protocols for the transport of data (HTTP) and the syntax for the specification of data and knowledge (RDF [LS99] and OWL [DS04]).

• In contrast with the database schemas in many data integration systems, the semantics of the data are made explicit through a logic-based language.

• Ontologies capture knowledge in a way understandable to humans and processable by machines. Furthermore, ontologies ideally represent a consensual view of a particular domain, which is shared among a group of people.

² We use the words concept and class interchangeably in this document.

³ Note that this is not the end of the story. For many applications it is necessary to detect whether pieces of data, coming from different data sources annotated with the same ontology, actually refer to the same thing. This is a challenge we also address in the course of the SEKT project; it is described in more detail in Section 2.1.2 of this report.



These features help in the task of ontology mediation. For example, because of the standardization of the languages on the Semantic Web, syntax does not play a big role, so the mediation can focus on the semantics of the data.
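
As a rough illustration of what explicit semantics over a shared syntax makes possible, the sketch below stores statements as plain (subject, predicate, object) triples in the style of RDF and derives class membership from explicit subclass statements. The `ex:` vocabulary and the simplified `subClassOf` and `type` predicates are assumptions made for this example; it does not use actual RDF or OWL tooling.

```python
# Statements as (subject, predicate, object) triples; the ex: vocabulary is hypothetical.
TRIPLES = {
    ("ex:Professor", "subClassOf", "ex:Employee"),
    ("ex:Employee", "subClassOf", "ex:Person"),
    ("ex:alice", "type", "ex:Professor"),
}


def superclasses(cls, triples):
    """Return all direct and indirect superclasses of cls."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(individual, triples):
    """Return asserted types plus those implied by the subclass hierarchy."""
    asserted = {o for s, p, o in triples if s == individual and p == "type"}
    inferred = set()
    for cls in asserted:
        inferred |= superclasses(cls, triples)
    return asserted | inferred


alice_types = types_of("ex:alice", TRIPLES)
print(sorted(alice_types))
```

Because every source uses the same triple form, nothing in this code deals with parsing or format conversion; only the meaning of the statements is processed.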

Note that Semantic Web technologies can be put to use not only on the World Wide Web. They can also be employed within company intranets in order to achieve inter-operability between applications within an organization.

The remainder of this chapter is structured as follows. We first clarify the terminology used in this survey in Section 1.1. In Section 1.2 we explain the ontology mapping process as we see it, which is used throughout the survey to identify the role of particular methods in this process. Then, in Section 1.3, we present a list of mismatches that can occur between ontologies. In Section 1.4 we summarize different approaches to integrating multiple heterogeneous data sources. We conclude in Section 1.5 with a few remarks about the wrapper/mediator architecture, which is used in several of the approaches in this survey.

1.1 Terminology

This section clarifies the terminology used in this survey. We deem this necessary because the terminology is understood in many different ways in the literature.

Ontology: A widespread definition of the notion of ontology is “an explicit specification of a conceptualization” [Gru94]. In the following we consider an ontology to be a 4-tuple 〈C, R, I, A〉, where C is a set of concepts, R is a set of relations, I is a set of instances and A is a set of axioms. A concept in an ontology is a category of real or virtual objects of interest, which we call instances. Relations specify how different concepts and instances are interconnected, and axioms are formulae that further specify interdependencies between all of the ontology entities mentioned. All concepts, relations, instances, and axioms are specified in some logical language. Note that the sets C and I are not necessarily disjoint (e.g. the same term can denote both a class and an instance), although the ontology language often requires them to be disjoint.

In an ontology, concepts are usually organized in a subclass hierarchy, through the subclass-of (or subconcept-of) relationship. More general concepts reside higher in the hierarchy.
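The 4-tuple can be made concrete with a small data structure. The following Python sketch is only an illustration and is not taken from any of the surveyed systems: the class name `Ontology`, its field names, the decision to store axioms as subclass-of pairs only, and the sample entities are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy container for the 4-tuple <C, R, I, A> (illustrative names only)."""
    concepts: set = field(default_factory=set)     # C: concept names
    relations: set = field(default_factory=set)    # R: (name, domain, range) triples
    instances: dict = field(default_factory=dict)  # I: instance -> set of its concepts
    axioms: set = field(default_factory=set)       # A: here only (sub, super) pairs

    def superconcepts(self, concept):
        """Return every concept above `concept` in the subclass-of hierarchy."""
        found = set()
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for sub, sup in self.axioms:
                if sub == current and sup not in found:
                    found.add(sup)
                    frontier.append(sup)
        return found


# Hypothetical sample ontology
onto = Ontology(
    concepts={"Thing", "Person", "Student"},
    relations={("enrolledAt", "Student", "Thing")},
    instances={"alice": {"Student"}},
    axioms={("Student", "Person"), ("Person", "Thing")},
)
print(onto.superconcepts("Student"))
```

Walking the subclass-of pairs upwards from `Student` collects the more general concepts `Person` and `Thing`, which is the sense in which more general concepts reside higher in the hierarchy.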

Instance Base: Although instances are logically part of an ontology, it is often useful to distinguish between an ontology describing a collection of instances and the collection of instances described by the ontology. We refer to this collection of instances as the Instance Base. Instance bases are sometimes used to discover similarities between concepts in different ontologies (e.g. [SM01], [DMDH04]). Note that this does not rule out the situation where instances use several ontologies for their description. However, most approaches in this survey that make use of instances assume a collection of instances described by one ontology.

Ontology Language: The ontology language is the language which is used to represent the ontology. Popular ontology languages for the Semantic Web are RDFS [BG04] and OWL [DS04]. Semantic Web ontology languages can be split up into two parts: the logical and the extra-logical parts. The logical part usually amounts to a theory in some logical language, which can be used for reasoning. The logical part basically consists of a number of logical axioms, which form the class (concept) definitions, property (relation) definitions, instance definitions, etc.

The extra-logical part of the language typically consists of non-functional properties (e.g. author name, creation date, natural language comments, multi-lingual labels) and other extra-logical statements, such as namespace declarations, ontology imports, versioning, etc.

Non-functional properties are typically intended only for the human reader, whereas many of the other extra-logical statements are machine-processable. For example, namespaces can be resolved by the machine, and the importing of ontologies can be achieved automatically by either (a) appending the logical part of the imported ontology to the logical part of the importing ontology to create one logical theory or (b) using a mediator, which resolves the heterogeneity between the two ontologies (see also the definition of Ontology Mediation below).

Ontology Mediation: Ontology mediation is the process of reconciling differences between heterogeneous ontologies in order to achieve inter-operation between data sources annotated with and applications using these ontologies. This includes the discovery and specification of ontology mappings, as well as the use of these mappings for certain tasks, such as query rewriting and instance transformation. Furthermore, the merging of ontologies also falls under the term ontology mediation.

Ontology Mapping: An ontology mapping M is a (declarative) specification of the semantic overlap between two ontologies O_S and O_T. The correspondences between different entities of the two ontologies are typically expressed using some axioms formulated in a specific mapping language. Mappings can be unidirectional (specifying how terms in one ontology can be expressed using terms from the other ontology) or bidirectional (working in both directions, i.e. terms from one ontology are expressed using terms from the other and vice versa).

Mapping Language: The mapping language is the language used to represent the ontology mapping M. It is important here to distinguish between a specification of the similarities of entities between ontologies and an actual ontology mapping. The specification of similarities between ontologies usually expresses a level of confidence (typically between 0 and 1) in the similarity of entities, whereas an ontology mapping actually specifies the relationship between the entities in the ontologies. This is typically an exact specification and typically far more powerful than simple similarity measures. Mapping languages often allow arbitrary transformations between ontologies, often using a rule-based formalism and typically allowing arbitrary value transformations.
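The difference between a similarity specification and a mapping can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the syntax of any real mapping language: the classes `Similarity` and `MappingRule`, the helper `apply_rule`, and the property names `heightInCm` and `heightInM` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Similarity:
    """A similarity statement: how confident we are that two entities correspond."""
    source_entity: str
    target_entity: str
    confidence: float  # assumed to lie between 0 and 1


@dataclass(frozen=True)
class MappingRule:
    """One mapping rule: an exact relationship plus a value transformation."""
    source_property: str
    target_property: str
    transform: Callable[[float], float]


def apply_rule(rule, instance):
    """Translate one source instance (a dict) into the target vocabulary."""
    value = instance[rule.source_property]
    return {rule.target_property: rule.transform(value)}


# The similarity only says the two properties are probably related ...
hint = Similarity("heightInCm", "heightInM", 0.8)
# ... whereas the rule states exactly how values are converted.
rule = MappingRule("heightInCm", "heightInM", lambda cm: cm / 100)
print(apply_rule(rule, {"heightInCm": 180.0}))
```

The `Similarity` object cannot be used to transform data, whereas the `MappingRule` can, which is the sense in which a mapping is more powerful than a similarity measure.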

Mapping Pattern: Although not often used in current approaches to ontology mediation, patterns can play an important role in the specification of ontology mappings, because they have the potential to make mappings more concise and easier to understand, and to reduce the number of errors (cf. [PGM98]). A mapping pattern can be seen as a template for mappings which occur very often. Patterns can range from very simple (e.g. a mapping between a concept and a relation) to very complex, in which case the pattern captures comprehensive substructures of the ontologies, which are related in a certain way.

Matching: We define ontology matching as the process of discovering similarities between two source ontologies. The result of a matching operation is a specification of similarities between two ontologies. Ontology matching is done through application of the Match operator (cf. [RB01]). Any schema matching or ontology matching algorithm can be used to implement the Match operator, e.g. [DMDH04, GSY04, MBR01, MRB03].

The specification of similarities typically serves as an input to the ontology mapping (see also Section 1.2).
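As a rough illustration of what a Match operator returns, the sketch below compares entity names pairwise. It is a deliberately naive stand-in and not one of the cited algorithms: the function name `match`, the use of string similarity from Python's standard `difflib` module, the threshold of 0.7 and the sample entity names are assumptions made for this example.

```python
from difflib import SequenceMatcher


def match(source_entities, target_entities, threshold=0.7):
    """Naive Match: compare entity names pairwise and keep the likely pairs."""
    similarities = []
    for s in source_entities:
        for t in target_entities:
            # Case-insensitive string similarity in [0, 1]
            score = SequenceMatcher(None, s.lower(), t.lower()).ratio()
            if score >= threshold:
                similarities.append((s, t, score))
    # Highest-confidence candidates first
    return sorted(similarities, key=lambda triple: triple[2], reverse=True)


candidates = match(["Person", "Author"], ["person", "Writer", "Authors"])
for source, target, score in candidates:
    print(source, target, round(score, 2))
```

Note that a purely lexical matcher of this kind misses the pair `Author` and `Writer`; real matching algorithms typically combine several sources of evidence, such as structure or instance data.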

Merging Ontologies: Merging refers to the creation of one ontology from two or more source ontologies. The new ontology will unify and in general replace the original source ontologies (see also Section 1.4).

Aligning Ontologies: Aligning ontologies means specifying how the concepts in the different ontologies are related in a logical sense. This means that the original ontologies are not changed, but that additional axioms describe the relationship between the concepts. Leaving the original ontologies unchanged often implies that only a part of the integration can be done, because major differences may require adaptation of the ontologies. This notion is very similar to Ontology Mapping, but it is more general: two ontologies can be aligned by creating an ontology mapping.

1.2 The Ontology Mapping Process

In order to clarify the role of many of the methods, tools and techniques in this survey, we explain in this section what we see as the ontology mapping process. Many of the tools and techniques in this survey form a part of the overall mapping process, while the integration systems typically form a superset of it, i.e. they typically incorporate the complete mapping process but also offer additional functionality, such as the use of the mapping to perform the actual querying and data integration.

Figure 1.1: The Ontology Mapping Process

First, we have to note that for simplicity we assume only two⁴ different ontologies O_1 and O_2, which describe the same or similar domains, as input to the mapping process. The outcome of the mapping process is either a mapping M, which describes how O_1 and O_2 are related, or a new ontology O_M, which is the merger of O_1 and O_2.

Figure 1.1 depicts the different phases in the generic mapping process as we see it. Not all phases are necessarily incorporated in every mapping tool, and several phases in the process are optional. We distinguish the following phases (in temporal order) in the mapping process:

1. Import of ontologies: Ontologies can be specified in different languages, so they need to be converted to a common format before the mapping can be specified. Furthermore, the ontologies need to be imported into the tool that is used to specify the mapping.

2. Finding Similarities: Many systems use the Match operator to (semi-)automatically find similarities between schemas or ontologies. For any two source ontologies, the Match operator returns the similarities between the ontologies. We distinguish this phase in the mapping process only when the similarities are determined in an automatic fashion. If the mapping process is completely manual, this phase is skipped.

3. Specifying Mapping/Merging: After (potential) similarities between ontologies have been found, the mapping between the ontologies needs to be specified. This specification is usually a manual process, but it can be aided by a tool. PROMPT [NM00b], for example, comes up with concrete proposals for merge operations, so that for many operations the user only needs to say “execute” instead of having to specify the complete operation.

In many cases (e.g. PROMPT), there is a feedback loop from this phase to the previous phase. Typically, the tool can offer more precise similarity measures when the user has already specified part of the mapping. Many matching algorithms do not include this feedback loop. However, these algorithms can often be readily embedded in an overall algorithm that executes the match algorithm in each iteration of the process.

4 It is straightforward to scale up this approach to more than two ontologies.

The three phases of the mapping process are specified at a very high level. Many of the approaches in this survey provide a more detailed description of (part of) this mapping process (e.g. PROMPT in Section 4.1.3 and MOMIS in Section 4.2.4).
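The three phases and the feedback loop between the last two can be sketched as a small loop in Python. This is a simplified illustration, not the algorithm of PROMPT or of any other surveyed tool: all function names, the lower-casing used as the "common format", the string-similarity matcher, the threshold and the `accept` callback standing in for the user are assumptions. The feedback here is also a weak form of the one described above, since confirmed pairs merely remove their entities from later proposals.

```python
from difflib import SequenceMatcher


def import_ontology(raw_terms):
    """Phase 1: bring terms into a common format (here: stripped, lower-case)."""
    return {term.strip().lower() for term in raw_terms}


def find_similarities(o1, o2, mapping, threshold=0.6):
    """Phase 2: propose pairs for entities that are not mapped yet."""
    mapped_sources = set(mapping)
    mapped_targets = set(mapping.values())
    proposals = []
    for a in sorted(o1 - mapped_sources):
        for b in sorted(o2 - mapped_targets):
            score = SequenceMatcher(None, a, b).ratio()
            if score >= threshold:
                proposals.append((score, a, b))
    return sorted(proposals, reverse=True)  # best proposal first


def map_ontologies(raw1, raw2, accept):
    """Phase 3 with feedback: confirm one proposal, then run the matcher again."""
    o1, o2 = import_ontology(raw1), import_ontology(raw2)
    mapping = {}
    while True:
        proposals = [p for p in find_similarities(o1, o2, mapping)
                     if accept(p[1], p[2])]
        if not proposals:
            return mapping
        _, a, b = proposals[0]
        mapping[a] = b  # the confirmed pair constrains the next iteration


# A "user" who accepts every proposal
result = map_ontologies(["Person ", "Car"], ["person", "Cars", "Carpet"],
                        lambda a, b: True)
print(result)
```

Because `car` is paired with `cars` in one iteration, the weaker candidate `carpet` is never proposed for it again; a real tool would typically use the confirmed part of the mapping to refine the similarity scores themselves.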

1.3 Ontology Mismatches

Different types of mismatches can occur between different ontologies. It is important to identify which kinds of mismatches can and do occur between ontologies, in order to resolve these mismatches in the mapping or the merge of ontologies. The classification of ontology mismatches is also important for determining which kinds of mismatches can be resolved with a particular mapping formalism (language) and which kinds of mismatches can be detected with a particular matching algorithm.

Klein [Kle01] identifies two levels of mismatches between ontologies. The first level is the ontology language or meta-model level. These mismatches include syntactic mismatches, differences in the meaning of primitives in the different languages, and differences in the expressivity of the languages. We will describe these mismatches in more detail in Section 1.3.2. The second level of mismatches is the ontology or model level, which is described below.

1.3.1 Ontology-level Mismatches

Where mismatches at the language level include differences in encoding and meaning of language constructs, mismatches at the ontology level include mismatches in the meaning or encoding of concepts in different ontologies. Klein follows the basic types of ontology mismatches identified in [VJBCS97]:

• Conceptualization mismatches are mismatches between different conceptualizations of the same domain.

• Explication mismatches are mismatches in the way a conceptualization is specified.

[Kle01] distinguishes two different conceptualization mismatches:

Scope mismatch: Two classes have some overlap in the extension (the set of instances), but the extensions are not exactly the same. [VJBCS97] call this a class mismatch and work it out further for classes and relations.

Model coverage and granularity: This mismatch is a difference in the part of the domain that is covered by the ontologies, or in the level of detail with which the domain is modeled.

Klein furthermore distinguishes different types of explication mismatches. First, there are two mismatches in the style of modeling:

Paradigm. These mismatches occur when different paradigms are used for the explication of the same concept. For example, one ontology might represent time using intervals, while another ontology might use points to represent time.

Concept description. Mismatches in the way a concept is described. For example, differences in the way the subclass-of hierarchy is built or when in one ontology several subclasses are defined for groups of instances, while in the other ontology subclasses are created for these different groups.

Then there are the terminological mismatches:

Synonym terms. Two terms are synonyms when they are semantically equivalent but are represented by different names. It is possible to use dictionaries or thesauri to resolve this problem, but one should be aware of possible scope differences (see the first conceptualization mismatch above).

Homonym terms. This problem occurs when semantically different concepts have the same name.

Finally, the last type of difference:

Encoding. Values in different ontologies might be encoded in a different way. For example, one ontology might define distance in kilometers, while another uses miles.

Inter-ontology relationships. [MIKS00] takes a slightly different approach. This paper identifies different types of inter-ontology relationships (based on relationships identified in [HM93]) that should be taken into account by ontology mapping systems:

Synonym. Two terms in different ontologies have the same semantics. This corresponds to the synonym terms mismatch mentioned above.

Hyponym. A term is less general than another one in a different ontology. This is a special kind of scope mismatch and can also be seen as a concept description mismatch.

Hypernym. A term is more general than another one in a different ontology. This is a special kind of scope mismatch and can also be seen as a concept description mismatch.


Overlap. There is an intersection in the abstraction represented by two terms. This corresponds to the scope mismatch.

Disjoint. There is no intersection in the abstraction represented by two terms.

Covering. The abstraction represented by a term in one ontology is the same as the abstraction represented by the union of other given abstractions which are subsumed individually by the term. This corresponds to the granularity mismatch identified by Klein.
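The relationship types listed above can be illustrated with plain set operations. The following is a minimal sketch, assuming that the "abstraction" of a term is approximated by its extension (a set of instance identifiers); this extensional reading, and the names `Relationship`, `classify` and `is_covering`, are illustrative assumptions and are not taken from [MIKS00].

```python
from enum import Enum
from typing import AbstractSet, Iterable


class Relationship(Enum):
    SYNONYM = "synonym"
    HYPONYM = "hyponym"    # first term is less general than the second
    HYPERNYM = "hypernym"  # first term is more general than the second
    OVERLAP = "overlap"
    DISJOINT = "disjoint"


def classify(ext_a: AbstractSet[str], ext_b: AbstractSet[str]) -> Relationship:
    """Classify term A relative to term B by comparing their extensions."""
    if ext_a == ext_b:
        return Relationship.SYNONYM
    if ext_a < ext_b:          # proper subset
        return Relationship.HYPONYM
    if ext_a > ext_b:          # proper superset
        return Relationship.HYPERNYM
    if ext_a & ext_b:          # non-empty intersection, neither contains the other
        return Relationship.OVERLAP
    return Relationship.DISJOINT


def is_covering(term_ext: AbstractSet[str], part_exts: Iterable[AbstractSet[str]]) -> bool:
    """True if every part is subsumed by the term and the union of the parts equals it."""
    parts = [frozenset(p) for p in part_exts]
    union = frozenset().union(*parts)
    return all(p <= term_ext for p in parts) and union == frozenset(term_ext)


# Hypothetical extensions from two ontologies describing publications.
publications = {"p1", "p2", "p3", "p4"}
books = {"p1", "p2"}
articles = {"p3", "p4"}
recent = {"p2", "p3", "p9"}

print(classify(books, publications))               # Relationship.HYPONYM
print(classify(recent, books))                     # Relationship.OVERLAP
print(is_covering(publications, [books, articles]))  # True
```

In practice the extensions of two independently developed ontologies are rarely available in a shared instance space, so a mapping system would have to establish such relationships by other means; the sketch only makes the definitions concrete.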

1.3.2 Language-level mismatches

Typically, ontology mapping requires the source and target ontologies to be represented in the same language, so the ontologies are first translated into a common representation. This translation may already resolve most of the language issues that can occur. Typical language-level mismatches are syntax, logical representation, semantics of primitives and language expressivity [Kle01]. Most systems presented in this survey perform such a translation but do not say whether and how the issues surrounding language-level mismatches are resolved in the translation to the internal representation.

Also, many methods and tools for matching and mapping require the source ontologies to be expressed in a certain representational format. Although there typically exists a translation from any ontology language into this particular representation, the preservation of semantics still cannot be guaranteed.

We will now go over the above-mentioned language-level mismatches and describe what the (potential) issues are with the current systems and techniques:

• Certainly, differences in syntax would be resolved by any such translation to an internal representation, since both ontologies then use the same syntax.

• Differences in logical representation occur when syntactically different, but logically equivalent statements are used to represent the same thing. An example of this is the way disjointness is expressed in the OWL Lite species of the Web Ontology Language OWL [MvH04], compared to the way disjointness is usually expressed in the OWL DL species (see footnote 5).

Arguably, this is not really an issue with the language itself, but rather an issue with the use of the language. However, when a language allows the user to model the same thing in different ways, it is easy for a user to mistakenly model certain things in an inconvenient way, and it is harder for a user to understand a model created by a different user or indeed created by him/herself in the past. When the ontology language used by the technique/tool/system allows such different logical representations of equivalent statements, this mismatch still needs to be taken into account in the ontology mapping process. In order to overcome these issues, one could think of a normalization step before the start of the mapping process, or of reasoning during the mapping process in order to detect equivalence in logical expressions.

Footnote 5: The OWL Lite statement Class(owl:Nothing complete A B), although also valid in OWL DL, is usually modeled as DisjointClasses(A B) in OWL DL.

• When the semantics of primitives is different in different ontology languages, i.e. a syntactically equivalent construct has a different meaning in the different languages, the translation to the common representation needs to take this into account. Fortunately, this problem can be resolved in the translation to the common representation. If both ontologies already use the common representation and this common representation does not allow ambiguous statements, this mismatch does not occur.

• Differences in expressivity of the languages are resolved in the translation to the common representation language. However, if the expressivity of the common representation language is not a superset of that of the source ontology's language, some semantics might get lost in the translation, as was pointed out in [MWK00].
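The normalization step suggested for logical representation mismatches can be sketched for the disjointness case. This is a minimal sketch under stated assumptions: axioms are encoded as plain tuples that loosely mirror the OWL abstract syntax, the function name `normalize` and the canonical tag `"Disjoint"` are hypothetical, and only the two-class case is handled (for more than two classes an empty intersection and pairwise disjointness are not the same statement).

```python
def normalize(axiom):
    """Map two idioms for the disjointness of a pair of classes to one canonical form.

    Axioms are tuples such as
      ("Class", "owl:Nothing", "complete", "A", "B")   # intersection of A and B is empty
      ("DisjointClasses", "A", "B")
    Anything else is returned unchanged.
    """
    kind, *args = axiom
    if (kind == "Class" and len(args) == 4
            and args[0] == "owl:Nothing" and args[1] == "complete"):
        return ("Disjoint", frozenset(args[2:]))
    if kind == "DisjointClasses" and len(args) == 2:
        return ("Disjoint", frozenset(args))
    return axiom


# Hypothetical axioms taken from two ontologies that model the same constraint.
lite_style = ("Class", "owl:Nothing", "complete", "Person", "Organization")
dl_style = ("DisjointClasses", "Organization", "Person")

# After normalization the two statements compare equal.
print(normalize(lite_style) == normalize(dl_style))  # True
```

A purely syntactic normalization like this only catches idioms that were anticipated in advance; detecting arbitrary logically equivalent expressions would require reasoning, as noted above.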

As we can see, the issues with language mismatches are less severe if there is a translation to a common representation. However, all ontology mismatches mentioned previously need to be taken into account when creating any mapping between ontologies.

1.4 Integration approaches

We identify two major paradigms in information integration: (1) merging data models into a central model and (2) aligning and mapping models. In the ontology engineering community these approaches are known as Ontology Merging and Ontology Aligning.

We distinguish two approaches to ontology merging. In the first approach the input of the merging process is a collection of ontologies and the outcome is one new, merged ontology which captures the original ontologies (see Figure 1.2). A prominent example of this approach is PROMPT [NM00b], which is an algorithm and a tool for interactively merging ontologies. In the second approach (see Figure 1.3) the original ontologies are not replaced; rather, a 'view', called a bridge ontology, is created which imports the original ontologies and specifies the correspondences using bridge axioms. OntoMerge [DMQ02] is a prominent example of this approach.

[NM99] clarifies the difference between ontology merging and ontology aligning. When merging two ontologies, a single coherent ontology is created that is a merged version of the two original ontologies. When aligning two ontologies, the two original ontologies persist, with a number of links established between them, allowing the aligned ontologies to reuse information from one another. The alignment of ontologies is usually part of the ontology merging process.

Figure 1.2: Output of the merging process - case 1

Figure 1.3: Output of the merging process - case 2

Solutions can be further classified along two dimensions: a run-time and a design-time dimension. The run-time dimension concerns the way the user views the data in the system during operation. The design-time dimension concerns the way the models of the disparate data sources are integrated.

In the run-time, or user-centered, dimension we distinguish two approaches: (1) the local model and (2) the global model approach. The difference between these two approaches is whether, in interactions with the system, the user can use his/her own local data model, or whether the user needs to conform to a global model when interacting with the system:

The Local Model/Ontology Approach. In this case the user is represented by an agent in the system and this agent represents the user with its own local data model. The agent performs the translation between the user's local model and either the global model or other local models in order to allow interaction with multiple data sources in the system.

The Global Model/Ontology Approach. The user views the system through the global data model using a mediator, which is "a system that supports an integrated view over multiple information sources" [Hul97]. Note that in the local model approach, a user agent will in most cases also contact a mediator in order to allow interoperation with the system, which contains multiple information sources. An example is the approach taken in the COG project [dB04].

In the design-time dimension we distinguish:

One-to-one mapping of ontologies. Mappings are created between pairs of ontologies. Problems with this approach arise when many such mappings need to be created, which is often the case in organizations where many different applications are in use. The complexity of the ontology mapping for the one-to-one approach is O(n^2), where n is the number of ontologies. An example of the one-to-one approach is OBSERVER [MIKS00] (see also Section 4.2.3). Figure 1.4 illustrates one-to-one mapping of ontologies. There exists a mapping between every pair of ontologies. In the worst case, these mappings are only one-way. This means that a single mapping can only translate from one model to another, not the other way around.

Figure 1.4: The one-to-one approach for ontology integration

Mapping to a global ontology. Each ontology is mapped to a central shared ontology (Figure 1.5). Drawbacks of using a global ontology to which all other ontologies are mapped are similar to those of using any standard [VC98]. For example, it is hard to reach a consensus on a standard shared by many people who use different terminologies for the same domain (this is always a lengthy process), and a standard impedes changes in an organization (because the evolution of standards suffers from the same problems as the development of standards). An example of the global ontology approach is MOMIS [BCVB01] (see also Section 4.2.4).

Figure 1.5: The global approach for ontology integration

Hybrid approach. On the Web, not much control over the use of ontologies can be enforced and the global integration scenario does not scale, because different organizations use different ontologies and do not want to commit to a single new ontology. However, the one-to-one integration approach does not scale either, because it requires the maintenance of too many mappings between ontologies. Therefore, a hybrid approach is suitable, in which there are several "islands" around influential domain ontologies. Within each island there is a form of global integration: one ontology is the global ontology of the island and a number of local ontologies are mapped to this global ontology. Mappings between the islands are established, as illustrated in Figure 1.6.
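The scaling argument behind the three design-time approaches can be made concrete by counting mappings. The following is a rough sketch, not a formula from the survey: the function names are hypothetical, and the hybrid count assumes one mapping per local ontology to its island's global ontology plus one mapping between every pair of islands, which is only one possible reading of "mappings between the islands are established".

```python
from typing import Sequence


def one_to_one_mappings(n: int, one_way: bool = True) -> int:
    """Mappings needed to connect every pair of n ontologies.

    With one-way mappings each pair needs two of them (worst case in the text).
    """
    pairs = n * (n - 1) // 2
    return 2 * pairs if one_way else pairs


def global_mappings(n: int) -> int:
    """Each of the n ontologies is mapped once to the shared global ontology."""
    return n


def hybrid_mappings(island_sizes: Sequence[int]) -> int:
    """island_sizes[i] is the number of local ontologies in island i,
    not counting the island's own global ontology (assumed layout)."""
    k = len(island_sizes)
    return sum(island_sizes) + k * (k - 1) // 2


# 12 ontologies in total; in the hybrid case: 3 islands, each with 1 global and 3 local ontologies.
print(one_to_one_mappings(12))                 # 132
print(one_to_one_mappings(12, one_way=False))  # 66
print(global_mappings(12))                     # 12
print(hybrid_mappings([3, 3, 3]))              # 12
```

The counts show why the one-to-one approach grows quadratically while the global and hybrid approaches grow roughly linearly; they say nothing about the effort of agreeing on a global ontology, which is the main drawback discussed above.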

Figure 1.6: The hybrid approach for ontology integration

Note that many methods, tools and techniques in the survey do not have a bias for one of the above approaches. They can often be used in all scenarios, although an ontology merging tool such as PROMPT [NM00b] does seem to have a bias towards using a shared ontology.

The more comprehensive integration systems typically prescribe which paradigm should be used. MOMIS [BCVB01], for example, prescribes the use of a global merged ontology for the integration of data sources, whereas OBSERVER [MIKS00] prescribes loosely coupled component ontologies with mappings between the ontologies.

The approaches that fall in the global-approach category can be further classified by the directionality of the mappings between the local ontologies and the shared ontology. This distinction is inherited from the database setting, where the data integration task is seen as a process by which several databases with associated local schemas are integrated to form a virtual database (thus it conforms to the global approach), and views connect local schemas with the global virtual schema. The terms LAV, GAV, and GLAV were introduced for describing different approaches in database integration [Len]:

GAV (Global as view). The constructs of a global schema are described as views over the local schemas (in database terminology) or the mappings describe entities from the shared ontology in terms of entities from the local ontologies (in ontology mediation terminology).

In the case where there are no integrity constraints in the global schema, query processing is rather simple and consists of unfolding the original query. Unfolding means expanding the terms of the query, which are constructs of the global schema, according to the mappings, in order to obtain a query which contains only constructs from local schemas. Unfortunately, in the presence of integrity constraints (which is the case when the global schema is an ontology), additional steps must be performed for reasoning [CDGL01].

One disadvantage <str<strong>on</strong>g>of</str<strong>on</strong>g> this method is that whenever a source changes or a new <strong>on</strong>e<br />

is added, <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema needs to be rec<strong>on</strong>sidered. Ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r disadvantage menti<strong>on</strong>ed<br />

in [EA] is that it cannot fully capture <str<strong>on</strong>g>the</str<strong>on</strong>g> data integrati<strong>on</strong> semantics where<br />

a source schema c<strong>on</strong>struct can be defined by a n<strong>on</strong>-reversible functi<strong>on</strong> over global<br />

schema c<strong>on</strong>structs.<br />

LAV (Local as view). The constructs of local schemas are described as views over the
global schema (in database terminology); equivalently, the mappings define entities from
the local ontologies in terms of entities from the global ontology (in ontology mediation
terminology). This approach is characterized by high modularity and extensibility
(the local schemas are expected to change more frequently than the global schema,
and changing a local schema does not require changes in the global one). A disadvantage
of this method is the complexity of reasoning. As with GAV, there are situations where
the approach is not expressive enough to fully capture the data integration semantics [EA].



GLAV (Global-local as view). This approach can be seen as an extension of both LAV
and GAV. In this case a view on the local schema is defined with respect to a view on
the global schema (a mapping is an inclusion/equivalence relation between a query
over a local ontology and a query over the shared ontology). Query processing
is complex in this case as well, since there is no direct information about which
data satisfy the global schema. One reasoning technique is to transform a GLAV
rule into a GAV rule and an integrity constraint over the global schema [AC01]. An
advantage of this method is that it is more expressive than both LAV and GAV (see footnote 6).
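
The GAV unfolding step described above can be illustrated with a minimal Python sketch. It assumes conjunctive queries represented as lists of atoms, exactly one view definition per global predicate, and no integrity constraints on the global schema. The predicate names (Person, worksFor, s1_employee, s2_dept) are hypothetical and are not taken from any of the surveyed systems.

```python
from itertools import count

# An atom is (predicate, (arg1, arg2, ...)); arguments are variable names.
# Hypothetical GAV mapping: each global predicate is defined as a view, given
# as (head variables, body atoms over local predicates).
GAV_MAPPING = {
    "Person": (("x",), [("s1_employee", ("x", "d"))]),
    "worksFor": (("x", "y"), [("s1_employee", ("x", "d")),
                              ("s2_dept", ("d", "y"))]),
}


def unfold(query, mapping):
    """Replace every global atom by the body of its view definition."""
    fresh = count()
    unfolded = []
    for predicate, args in query:
        head_vars, body = mapping[predicate]
        n = next(fresh)
        # Bind the view's head variables to the arguments used in the query.
        subst = dict(zip(head_vars, args))
        for local_predicate, local_args in body:
            # Variables that occur only in the body get a fresh suffix so that
            # separate unfoldings do not share them by accident.
            renamed = tuple(subst.get(v, f"{v}_{n}") for v in local_args)
            unfolded.append((local_predicate, renamed))
    return unfolded


# Query over the global schema: Person(p) AND worksFor(p, org)
query = [("Person", ("p",)), ("worksFor", ("p", "org"))]
unfolded = unfold(query, GAV_MAPPING)
print(unfolded)
```

The result mentions only local predicates, so it could be handed to the sources directly. The sketch does not handle views defined as unions, and it does nothing for the case with integrity constraints, where the text notes that additional reasoning is needed.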

Recently, as part of the AutoMed framework (see Section 4.2.6), a new approach for
connecting schemas called BAV (Both as view) was introduced [MP03], [EA] that is more
expressive than LAV/GAV/GLAV. Below we give an overview of this approach:

BAV (Both as view). This approach is based on the use of reversible sequences of primitive
schema transformations, called transformation pathways.

Schemas are incrementally transformed by applying to them a sequence of primitive
transformation steps t1, ..., tn, where each such step adds, extends, deletes,
contracts or renames a single schema construct. A query expressed in the intermediate
query language (IQL) is attached to each add or delete step and specifies
the extent of the new or the deleted construct in terms of the rest of the constructs
in the schema.

[MP03] shows how the LAV and GAV definitions of views can be fully derived from
BAV schema transformation sequences, and how BAV transformation sequences
may be partially derived from LAV or GAV definitions. A network of pathways may
join different schemas together. Thus, this approach for specifying mappings could
also be applied for the case of one-to-one mapping and for the hybrid approach.
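
The idea of a reversible transformation pathway can be sketched in a few lines of Python. This is only an illustration under stated assumptions: schemas are reduced to sets of construct names, the queries attached to add and delete steps are opaque placeholder strings rather than real IQL, and the pairing of inverse steps (add with delete, extend with contract, rename with the opposite rename) is assumed here. All construct names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Step:
    kind: str                       # "add", "delete", "extend", "contract" or "rename"
    construct: str                  # the single schema construct the step affects
    query: Optional[str] = None     # placeholder for the query attached to add/delete
    new_name: Optional[str] = None  # used by rename steps only


# Assumed pairing of inverse primitive steps.
INVERSE_KIND = {"add": "delete", "delete": "add",
                "extend": "contract", "contract": "extend"}


def invert(step):
    """Return the step that undoes the given step."""
    if step.kind == "rename":
        return Step("rename", step.new_name, new_name=step.construct)
    return Step(INVERSE_KIND[step.kind], step.construct, step.query)


def reverse_pathway(pathway):
    """Invert every step and apply them in the opposite order."""
    return [invert(step) for step in reversed(pathway)]


def apply_pathway(schema, pathway):
    """Apply the steps to a schema modelled as a set of construct names."""
    constructs = set(schema)
    for step in pathway:
        if step.kind in ("add", "extend"):
            constructs.add(step.construct)
        elif step.kind in ("delete", "contract"):
            constructs.remove(step.construct)
        else:  # rename
            constructs.remove(step.construct)
            constructs.add(step.new_name)
    return frozenset(constructs)


local_schema = frozenset({"staff", "dept"})
pathway = [
    Step("add", "Person", query="<query over staff>"),
    Step("rename", "dept", new_name="Department"),
    Step("delete", "staff", query="<query over Person>"),
]
global_schema = apply_pathway(local_schema, pathway)
round_trip = apply_pathway(global_schema, reverse_pathway(pathway))
print(sorted(global_schema), sorted(round_trip))
```

Because every step has an inverse, one stored pathway can be traversed in either direction, which is what allows a network of pathways to connect several schemas.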

1.5 Wrappers and Mediators

In the wrapper/mediator architecture, the main components are wrappers, of which there
typically exists one for each data source, and one or more mediators, which mediate
between the differences in the individual data sources. In the global integration paradigm,
there is typically one mediator which is accessed by the user for querying and information
retrieval. In this case the mediator typically has one global schema along with
mappings to all the local schemas, where each data source has one local schema. Each
data source has a wrapper associated with it, which provides the translation between the
representation of the data source and the system representation (this is typically between
a database representation and the ontology representation) and translates queries from the
system representation to the data source representation (typically, ontology queries would
be translated to SQL queries to be executed on the individual database). An example
of this approach is MOMIS [BCVB01] (see also Section 4.2.4). This wrapper/mediator
approach with one global mediator is illustrated in Figure 1.7.

Footnote 6: see http://www.science.unitn.it/coopis/talks/Wednesday/Lenzerini.pdf for an example.

Figure 1.7: The wrapper/mediator architecture in the case of global integration. Note that
the drums represent, as usual, databases and the sheets represent other data sources.

In a one-to-one integration scheme, there are typically several mediators, which mediate
between the representations in the individual sources. The setting here is similar
to a peer-to-peer setting, where each peer could have a number of data sources and an
ontology, which describes the data of those sources. The user would be one peer and would
use that peer’s ontology. If the user wants to query a different peer, the mediator has to
mediate the differences between the ontologies. One example of this case is OBSERVER
[MIKS00] (see also Section 4.2.3), where each peer has its own mediator, which does
the query rewriting and querying of other peers, although one central mediator (called
the Inter-ontology Relationship Manager, IRM) still keeps track of the relationships between
the ontologies. This central mediator is queried by other mediators to find out about
related peers and the differences in representation.

Figure 1.8 illustrates the use of wrappers and mediators in the case of one-to-one
integration. Note that in this case, all mediators need to be aware of all other mediators in
order to perform query rewriting and to achieve effective query answering.
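
As a rough illustration of the division of labour between wrappers and a single global mediator, consider the following Python sketch. It is not modelled on MOMIS or OBSERVER: the data sources are in-memory lists of records, the "system representation" is a flat set of attribute names, and all class, field and attribute names are hypothetical.

```python
class DictWrapper:
    """Wraps one data source, here simply a list of records (dicts)."""

    def __init__(self, rows, field_map):
        self.rows = rows
        self.field_map = field_map  # system attribute -> source field name

    def query(self, attribute, value):
        # Translate the query from the system representation to the source.
        source_field = self.field_map.get(attribute)
        if source_field is None:
            return []  # this source holds nothing about the attribute
        # Translate matching records back into the system representation.
        inverse = {src: sys for sys, src in self.field_map.items()}
        return [
            {inverse[field]: v for field, v in row.items() if field in inverse}
            for row in self.rows
            if row.get(source_field) == value
        ]


class Mediator:
    """Single global mediator: forwards a query to every wrapper."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, attribute, value):
        results = []
        for wrapper in self.wrappers:
            results.extend(wrapper.query(attribute, value))
        return results


source_a = DictWrapper(
    [{"nm": "Ada", "town": "London"}],
    {"name": "nm", "city": "town"},
)
source_b = DictWrapper(
    [{"full_name": "Alan", "location": "London"},
     {"full_name": "Kurt", "location": "Vienna"}],
    {"name": "full_name", "city": "location"},
)
mediator = Mediator([source_a, source_b])
answers = mediator.query("city", "London")
print(answers)
```

The user only ever sees the system-level attribute names; each wrapper hides its source's own field names. In a one-to-one scheme there would instead be several mediators, each needing to know about the others.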

This report is further structured as follows. Chapter 2 presents the use cases which
we have identified as crucial for ontology mediation on the Semantic Web. These use
cases are later used to identify if and how the approaches in the survey would fit into
a Semantic Web context. The framework we use for evaluating the approaches in this
survey is presented in Chapter 3. The survey itself is presented in Chapter 4. Chapter 5
compares the approaches in the survey and Chapter 6 presents some conclusions.



Figure 1.8: The wrapper/mediator architecture in the case of one-to-one integration


Chapter 2

Motivational Use Cases

In this chapter we present a number of generic use cases which capture the functionality
required for ontology mediation on the Semantic Web. Any application of ontology mediation
is expected to use all these use cases to some extent. Therefore it is interesting
to see to what extent each of the approaches in this survey supports these use cases in
order to evaluate their applicability to the ontology mediation problem on the Semantic
Web. The Ontology Mediation framework developed in the SEKT project should support
as many features as possible from the ones described here.

We distinguish three use cases, which are detailed in the remainder of this chapter:

• Instance Mediation
• Ontology Merging
• Creating Ontology Mappings

The first use case, Instance Mediation, addresses the tasks of instance transformation,
unification and query rewriting. The second use case, Ontology Merging, addresses the
way two source ontologies can be merged into one target ontology. The third use case,
Creating Ontology Mappings, is about actually finding similarities between ontologies
and creating mappings between the ontologies.

The generic use cases correspond to three orthogonal dimensions in ontology mediation.
Each application scenario can make use of all three use cases to some extent.

2.1 Use Cases for Instance Mediation

The following use cases are the typical use cases for instance mediation, where the emphasis
is on instance transformation and unification.


Figure 2.1: Instance Transformation

Definition 1. We define instance mediation as the process of reconciling differences between
two instance bases, each described by an ontology. This includes the discovery and
specification of ontology mappings, as well as the use of these mappings for certain tasks,
such as query rewriting and instance transformation.

As we can see in the definition, instance mediation also requires the discovery and
specification of ontology mappings. This makes apparent the inter-dependencies between
the different use cases. We do not describe the discovery and specification of ontology
mappings here; instead, these use cases are discussed later, because they are also used in
other areas of ontology mediation.

2.1.1 Instance Transformation

For the instance transformation use case we assume two separate applications with separate
instance stores, both described by ontologies. The task to be performed is the transformation
of an instance of a source ontology, say OS, to an instance of the target ontology
OT. Figure 2.1 illustrates the process of instance transformation. An instance i1, which
refers to ontology O1, is transformed into instance i2, which refers to ontology O2. What
is important to note here is that the transformation itself is derived from the mapping between
the two ontologies, and that both the original and the transformed instance provide
information about the same real-world referent. Note that a real-world referent is not necessarily
a physical object, but can also be, for example, a brand-name product, a date, an
event or a message.

This kind of transformation needs to be supported by the ontology mapping in the
sense that the ontology mapping specifies the relationship between instances of the source
ontology OS and instances of the target ontology OT.

Different application scenarios have different requirements on the precision and coverage
of the transformation. By precision in this context we mean the degree to which
the intended meaning of the instance is preserved in the transformation. By coverage
we mean the fraction of the instances intended to be transformed that are actually
transformed. The requirements of the application determine what these measures look
like.

When an instance has been transformed from OS to OT, it is often necessary to detect
whether the transformed instance corresponds to an existing instance in the instance store
of the target application in order to avoid duplication of information and in order to find
out more about the instances in the knowledge base. We discuss this issue below.
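
A small Python sketch may make the notions of mapping-driven transformation and coverage more concrete. It rests on strong simplifying assumptions: instances are flat attribute/value records, the ontology mapping is reduced to a renaming of attributes from OS to OT, and an instance counts as transformed when at least one of its attributes is mapped. The attribute names are hypothetical. Precision is not computed, since judging whether the intended meaning is preserved depends on the application.

```python
def transform(instance, mapping):
    """Transform a source instance into a target instance.

    `mapping` renames source attributes to target attributes. Attributes
    without a mapping are dropped. Returns None when nothing can be mapped.
    """
    target = {mapping[attr]: value
              for attr, value in instance.items()
              if attr in mapping}
    return target or None


def coverage(instances, mapping):
    """Fraction of the given instances that are actually transformed."""
    if not instances:
        return 0.0
    transformed = [t for t in (transform(i, mapping) for i in instances)
                   if t is not None]
    return len(transformed) / len(instances)


# Hypothetical mapping from attributes of OS to attributes of OT.
mapping = {"name": "fullName", "mail": "email"}
source_instances = [
    {"name": "Ada", "mail": "ada@example.org"},
    {"nick": "al"},                       # no mapped attribute at all
    {"name": "Kurt", "phone": "123"},     # "phone" is lost in transformation
]
first = transform(source_instances[0], mapping)
ratio = coverage(source_instances, mapping)
print(first, ratio)
```

The third instance shows why coverage alone is not enough: it is counted as transformed although part of its information is lost, which is exactly what a precision measure would have to capture.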

2.1.2 Instance Unificati<strong>on</strong><br />

The instance unificati<strong>on</strong> problem can be summarized as follows:<br />

Say we have an <strong>on</strong>tology O, and two instances I1 and I2 <str<strong>on</strong>g>of</str<strong>on</strong>g> that <strong>on</strong>tology. We want to<br />

check whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r I1 and I2 refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant. In this case we need to unify<br />

I1 and I2 into a newly created instance I0, which is <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> I1 and I2 1 . Therefore,<br />

<str<strong>on</strong>g>the</str<strong>on</strong>g> instance unificati<strong>on</strong> task can be decomposed into (1) <str<strong>on</strong>g>the</str<strong>on</strong>g> identificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances<br />

referring to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant and (2) taking <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two instances in<br />

order to obtain <str<strong>on</strong>g>the</str<strong>on</strong>g> unified instance.<br />

If <str<strong>on</strong>g>the</str<strong>on</strong>g> instances I1 and I2 have been identified as referring to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant,<br />

but c<strong>on</strong>tain c<strong>on</strong>tradictory informati<strong>on</strong>, it is not possible to create a unified instance<br />

and <str<strong>on</strong>g>the</str<strong>on</strong>g> user should be informed <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> inc<strong>on</strong>sistency.<br />

Figure 2.2 illustrates <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> instance unificati<strong>on</strong>. Two instances (i1 and i2) in<br />

<str<strong>on</strong>g>the</str<strong>on</strong>g> same <strong>on</strong>tology O1, which refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant r1, are unified into <strong>on</strong>e<br />

new instance, i0, which is also an instance in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology O1 and also describes <str<strong>on</strong>g>the</str<strong>on</strong>g> same<br />

real-world referant r1. 2<br />

We identify two general means <str<strong>on</strong>g>of</str<strong>on</strong>g> detecting whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r two instances refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same<br />

real-world referant:<br />

• In <str<strong>on</strong>g>the</str<strong>on</strong>g> ‘exact’ case, <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping specifies precise, exact c<strong>on</strong>diti<strong>on</strong>s which<br />

unambiguously specify in which cases two instances refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world<br />

referant and in which cases <str<strong>on</strong>g>the</str<strong>on</strong>g>y refer to different real-world referant. In o<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />

words, in which cases <str<strong>on</strong>g>the</str<strong>on</strong>g> instances are unifiable.<br />

• In <str<strong>on</strong>g>the</str<strong>on</strong>g> ‘probabilistic’ case, a similarity measure is created <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> basis <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />

mapping. The similarity measure expresses <str<strong>on</strong>g>the</str<strong>on</strong>g> probability that both instances<br />

1Note that I0 could coincide with ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r I1 or I2, which would be a less general case <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>e described<br />

here.<br />

2Note that c<strong>on</strong>tradictory informati<strong>on</strong> may result from unstated assumpti<strong>on</strong>s in an <strong>on</strong>tology or instance<br />

base. Unificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>tradictory statements could be performed by adding in <str<strong>on</strong>g>the</str<strong>on</strong>g> appropriate assumpti<strong>on</strong><br />

as a c<strong>on</strong>diti<strong>on</strong>.<br />

For example, ”Canada [a real-world object] requires inbound travellers to carry passports” is true if <str<strong>on</strong>g>the</str<strong>on</strong>g><br />

assumpti<strong>on</strong> is that travellers are EU citizens, but not if <str<strong>on</strong>g>the</str<strong>on</strong>g>y are US citizens entering from <str<strong>on</strong>g>the</str<strong>on</strong>g> US.<br />

Such techniques are not explored in this deliverable.


CHAPTER 2. MOTIVATIONAL USE CASES 24<br />

Figure 2.2: Instance Unificati<strong>on</strong><br />

refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same object. A threshold could be used to decide whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r to unify <str<strong>on</strong>g>the</str<strong>on</strong>g><br />

instances. Ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r possibility is to have <str<strong>on</strong>g>the</str<strong>on</strong>g> user decide about <str<strong>on</strong>g>the</str<strong>on</strong>g> unificati<strong>on</strong>, which<br />

is clearly undesirable in <str<strong>on</strong>g>the</str<strong>on</strong>g> general case, but could be useful when <str<strong>on</strong>g>the</str<strong>on</strong>g> accuracy <str<strong>on</strong>g>of</str<strong>on</strong>g><br />

<str<strong>on</strong>g>the</str<strong>on</strong>g> resulting <strong>on</strong>tology is crucial or when dealing with very few instances.<br />

Instance transformation and instance unification are often required in a querying scenario where an application A queries another application B and the query results (consisting of instances) are transformed to the representation of A and unified with instances in the instance base of A.

In order to be able to query a data source which uses a different (unknown) ontology, the query originally formulated in terms of the application’s ontology needs to be rewritten in terms of the other ontology. The next section describes the generic query rewriting use case.

2.1.3 Query Rewriting

An operation occurring very frequently in Knowledge Management applications is the querying of information sources. We want to allow an application to query different heterogeneous information sources without actually knowing about all the ontologies. In order to achieve this, a query written in terms of the application’s ontology needs to be rewritten using the terms in the target data source’s ontology.

Suppose we have an application A, which uses an ontology OA for its information representation. Suppose now that this application wants to query a different data source, which uses ontology OB, but A does not know about the structure of this ontology. The application A now formulates a query QA in terms of ontology OA. In order to execute this query on the target data source, it needs to be rewritten into a query QB, which is formulated in terms of ontology OB. This rewriting process is illustrated in Figure 2.3.

Figure 2.3: Query Rewriting

After execution of the query, the results are transformed back to the OA representation and unified with the local instances using the techniques for instance transformation and unification described above.
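
The round trip from QA to QB and back can be sketched as follows. The sketch assumes that a query is a list of triple patterns and that the mapping between OA and OB is a simple one-to-one renaming of terms; the term names and both helper functions are hypothetical, and real mappings are usually more complex than a renaming.

```python
# Hypothetical one-to-one term mapping from ontology OA to ontology OB.
MAPPING_A_TO_B = {"Person": "Human", "name": "fullName", "worksFor": "employer"}


def rewrite_query(query, mapping):
    """Rewrite QA into QB by replacing every mapped term.

    Variables (such as "?x") and unmapped terms pass through unchanged.
    """
    return [tuple(mapping.get(term, term) for term in pattern) for pattern in query]


def transform_results(rows, mapping):
    """Transform result rows from the OB vocabulary back to the OA vocabulary."""
    inverse = {b: a for a, b in mapping.items()}
    return [{inverse.get(key, key): value for key, value in row.items()} for row in rows]


# QA, formulated in terms of OA.
query_a = [("?x", "type", "Person"), ("?x", "name", "?n")]
# QB, formulated in terms of OB, ready to be sent to the target data source.
query_b = rewrite_query(query_a, MAPPING_A_TO_B)
```

The rows returned by the target data source would then be passed through `transform_results` before being unified with the local instances of A.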

2.2 Ontology Merging

Besides instance transformation and unification and query rewriting, we see another major use case for ontology mediation: Ontology Merging.

In the case of Ontology Merging [NM00b], two source ontologies are merged into one target ontology. In the general case, the source ontologies disappear or become unavailable and only the target (merged) ontology remains. A special case is when the source ontologies remain, along with mappings to the merged ontology. Note that the target (merged) ontology could coincide with one of the source ontologies.

In the general case where the source ontologies become unavailable after the merge, the complete instance stores of the source ontologies have to be merged. In the special case, the source ontologies can maintain their instance stores, and during run-time of the application, instance transformation and instance unification (cf. the previous subsection) might be necessary (see footnote 3). We can compare these two distinct cases with notions developed in the field of database integration, namely the notions of materialised and virtual views [Hul97], respectively.
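
The difference between the two cases can be made concrete with a small sketch. It assumes that an instance store is a list of attribute dictionaries and that each source comes with a simple renaming into the merged vocabulary; `to_merged`, `materialise` and `VirtualView` are illustrative names, not taken from [Hul97] or from any surveyed system.

```python
def to_merged(instance, mapping):
    """Rename the attributes of a source instance into the merged vocabulary."""
    return {mapping.get(key, key): value for key, value in instance.items()}


def materialise(stores):
    """Materialised case: transform every source instance once and keep the result.

    `stores` is a list of (instances, mapping) pairs, one per source ontology.
    """
    return [to_merged(inst, mapping) for instances, mapping in stores for inst in instances]


class VirtualView:
    """Virtual case: the sources keep their stores; instances are transformed at query time."""

    def __init__(self, stores):
        self.stores = stores

    def query(self, predicate):
        for instances, mapping in self.stores:
            for inst in instances:
                merged = to_merged(inst, mapping)
                if predicate(merged):
                    yield merged
```

In this sketch, an instance added to a source store after the merge is visible through `VirtualView.query` but not in the list returned earlier by `materialise`, which mirrors the trade-off between the two kinds of views.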

Of course, when the source ontologies do not have instance stores associated with them, these problems do not occur. However, in the general case an ontology will have one or more instance stores associated with it. In special cases, such as the (distributed) development of ontologies, there will not be instance stores.

3 Note that mappings can only be derived when there is a semantic overlap. If there is no semantic overlap, the set of mappings is empty and thus instance transformations and instance unifications are not performed.



2.3 Creating Ontology Mappings

In order to support the previously mentioned use cases, a mapping generally needs to be created between the source and the target ontology. This does not apply to the case of ontology merging where the source ontologies do not remain: because the source ontologies disappear, no ontology mapping is needed between these sources and the new merged ontology. However, the techniques for finding concepts to be merged in different ontologies and finding mappings between concepts in different ontologies are the same, since they are both based on the similarity of concepts. In fact, a mapping between two ontologies can be used as a basis for the merged ontology. In the case of Ontology Merging where the source ontologies remain, a mapping needs to be created between each source ontology and the merged ontology.

We split the “Creating Ontology Mappings” use case into two distinct use cases: finding similarities between ontologies and specifying mappings between ontologies.

2.3.1 Finding Similarities

In order to determine which mappings need to be created, the similarity between ontologies needs to be established. This can either be established manually or automatically using the so-called Match operator (cf. [RB01]). The Match operator takes as input two ontologies and returns as output a list of similarities between entities in the two source ontologies. These similarities can now be used as a starting point to semi-automatically create a mapping between the ontologies or to merge the two ontologies (cf. [NM00b]).
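
The input and output of such an operator can be sketched as follows. This is not the Match operator of [RB01]; it is a minimal stand-in which assumes that an ontology is given as a list of entity names and that similarity is purely lexical. The function name `match` and the cut-off `min_score` are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import product


def match(entities_a, entities_b, min_score=0.6):
    """Return (entity_a, entity_b, score) triples, best candidates first.

    Only name similarity is used here; actual matchers typically also
    exploit structure, instances or external lexicons.
    """
    pairs = []
    for a, b in product(entities_a, entities_b):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= min_score:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

The returned list is only a set of candidate correspondences; a user or a mapping tool would still have to confirm them and turn them into actual mappings.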

2.3.2 Specifying Mappings

After having defined the similarities between entities in the different ontologies, a mapping needs to be specified between the similar entities of the ontologies. The requirements of this mapping depend on the application scenario (cf. the various scenarios described in the next section) and in general on the requirements of ontology mediation, as mentioned in the introduction.


Chapter 3

The Evaluation Framework

This chapter presents the framework used for evaluating and comparing different approaches in ontology merging and aligning, as well as data integration using ontologies. This framework is set up in such a way that it enables us to evaluate the applicability of the approaches to an ontology mediation setting in the Semantic Web context. Each of the approaches in the survey is described according to these criteria. If one of the criteria is not applicable to the approach, it will be omitted.

Summary of the approach. We first summarize the approach to give the reader a feeling for what the approach is all about.

Ontology Languages. For each tool or method we describe which ontology languages are supported as sources and targets of the mapping between ontologies. Furthermore, we describe how the ontology languages relate to the mapping language employed by the approach. In many cases the same language is used for both the ontologies and the mappings. In some cases, this can have drawbacks if the ontology language is not expressive enough to capture all the required ontology mappings.

Mapping language. An important aspect in ontology mediation is the mapping language which is used to actually specify the mapping. The mapping language determines to some extent the complexity of creating mappings and also the possibilities of automation in creating the mappings and in transforming and unifying instances. The most important aspects of an ontology mapping language are its expressivity (i.e. what kind of relations between the ontologies can be expressed) and its usability.

An important aspect of a mapping language is the types of mappings that are supported, in other words its expressivity. We can distinguish several types of mappings here. The following is an (incomplete) list of types of mappings:

• Class mappings
• Property (i.e. relation) mappings
• Instance mappings
• Axioms / rules / constraints
• Value transformations (for properties)
• Conditional mapping
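
Several of these mapping types can be illustrated together in a short sketch. It assumes that instances are attribute dictionaries with a `type` key; the classes `PropertyMapping` and `ClassMapping` and the function `apply_mapping` are hypothetical and do not correspond to the mapping language of any particular surveyed approach.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PropertyMapping:
    """Property mapping with an optional value transformation."""
    source: str
    target: str
    transform: Callable = lambda value: value  # identity by default


@dataclass
class ClassMapping:
    """Class mapping with its property mappings and an optional condition."""
    source: str
    target: str
    properties: list
    condition: Callable = lambda instance: True  # unconditional by default


def apply_mapping(mapping: ClassMapping, instance: dict):
    """Transform an instance of the source class, or return None if it does not apply."""
    if instance.get("type") != mapping.source or not mapping.condition(instance):
        return None
    result = {"type": mapping.target}
    for prop in mapping.properties:
        if prop.source in instance:
            result[prop.target] = prop.transform(instance[prop.source])
    return result


# Hypothetical example: Person maps to Human only for adults,
# and heights are converted from centimetres to metres.
person_to_human = ClassMapping(
    source="Person",
    target="Human",
    properties=[
        PropertyMapping("name", "fullName"),
        PropertyMapping("height_cm", "height_m", transform=lambda cm: cm / 100),
    ],
    condition=lambda inst: inst.get("age", 0) >= 18,
)
```

Axioms, rules and instance mappings are left out of this sketch, since they generally require a logical language rather than a simple data structure.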

As was pointed out in Section 1.3, there are several mismatches between ontologies, both on the language and the ontology level. A mapping language needs to take these mismatches into account. These mismatches mostly concern the ontology level, although there are still some issues remaining on the language level, as was pointed out in Section 1.3.

Two notes about the mapping language with respect to the approaches in the survey are in order here. Firstly, an ontology merging tool (e.g. PROMPT [NM00b]) does not produce a mapping and therefore does not need a mapping language. Secondly, we describe several methods and tools for ontology matching in this survey. These approaches typically do not produce a mapping, but rather a specification of similarities between entities in the ontologies.

Mapping Patterns. One of the major goals of Work Package 4 in the SEKT project is to investigate the use of patterns for the creation of ontology mappings. One of the tasks is to find such patterns. Therefore, it would be interesting to see if and how existing approaches cope with this and how mapping patterns could be integrated. This issue is very closely related to the mapping language.

Automation support. We describe the type of automation that is supported and the degree to which it is supported during creation of the ontology mapping. Ontology mapping cannot be fully automated; the mapping process will always be an interactive one.

One important aspect in the automation support is the use of external information sources, such as domain-specific lexicons or existing ontologies or data schemas.

Applicability to use cases. In order to see if and how an approach can be applied to our setting of ontology mediation in the Semantic Web, we analyze the applicability of each approach to the use cases presented in Chapter 2. More specifically, we relate each approach to the following use cases:

• Instance Transformation
• Instance Unification
• Query Rewriting
• Ontology Merging

We will not treat the applicability of each of the approaches to each of the use cases in detail, but rather give an indication about the (in)applicability to each of the use cases.



Implementation. For each approach we describe the tool support developed for the particular method. We distinguish the following two categories of tools:

• Tools that support the user in creating the mappings (and merging the ontologies). These tools fall into two categories: (1) components that implement the Match operator to find similarities between ontologies and (2) GUI tools that aid the user in specifying the mappings between the ontologies.
• Tools that do the run-time mediation. These tools take care of query rewriting, data transformation, etc.

An important aspect of the implementation is the maturity of the tool(set). An academic prototype that has just been built to support a PhD thesis would be less stable and less usable than a product that has undergone much development over the years and is exploited by a commercial organization.

Experiences with the approach. We summarize the experiences that have been reported in the literature for each approach. These experiences are very valuable, because they show the applicability of the methods to real ontology mapping and information integration problems. They also show the usability and limitations of the tools that have been developed for the method.

We structure the description of each of the approaches in the survey in Chapter 4 according to this evaluation framework. Furthermore, we provide a comparison of the approaches in the survey based on the presented evaluation framework in Chapter 5.


Chapter 4

The Survey

This chapter presents the actual survey on ontology merging and aligning approaches. We evaluate the approaches according to the criteria identified in the evaluation framework in Chapter 3.

In order to structure the survey, we have grouped the approaches into three categories:

• Methods and Tools. We describe several special-purpose methods and tools. The purpose of the approaches in this section ranges from ontology matching (GLUE, Semantic Matching) to ontology merging (PROMPT) and ontology mapping (MAFRA, RDFT, WSMT). Sometimes the lines between the purposes of the approaches become blurred, because, for example, the authors of MAFRA [MMSV02] also describe a way to do ontology matching. Also, in the case of PROMPT we not only describe the ontology merging tool, but also related tools in the area of matching (even PROMPT itself has a matching algorithm) and ontology versioning (PROMPTDiff).

• Data Integration Systems. We describe five approaches to data integration using ontologies, namely InfoSleuth, ONION, INFOMIX, MOMIS and OBSERVER. These integration systems are all comprehensive in the sense that they typically have different types of functionality. For example, both ONION and MOMIS have matching tools, which aid in creating mappings between ontologies. All data integration systems described in this survey support querying of the underlying data sources based on queries posed against an ontology; they typically implement the wrapper/mediator architecture, which was described in Section 1.5.

• Specific Techniques. We briefly describe a few specific techniques, which we do not evaluate according to the criteria in the evaluation framework. FCA-Merge is a method for ontology merging based on formal concept analysis. OntoMorph is a system for syntactic and semantic rewriting of ontologies. QOM (Quick Ontology Mapping) is a method and tool for the discovery of ontology mappings, based on a combined similarity measure. OMEN is a system that describes mappings with a probability, by means of a Bayesian Network.

CHAPTER 4. THE SURVEY

4.1 Methods and Tools<br />

4.1.1 MAFRA<br />

Summary. MAFRA (MApping FRAmework for distributed ontologies) [MMSV02, SaR03b] is a framework for mapping distributed ontologies on the Semantic Web, based on the idea that complex mappings and reasoning about those mappings are the best approach in a decentralized environment like the Web. MAFRA has been implemented as a plug-in of KAON¹ and introduces two interesting new concepts: Semantic Bridges and the service-centric approach. A Semantic Bridge is defined as “a declarative representation of a semantic relation between source and target ontologies entities” [SaR03b]. A Semantic Bridge provides the necessary mechanisms to transform instances and property fillers of a particular source ontology into instances and property fillers of a particular target ontology. Semantic Bridges are similar to the notion of articulation structures (“the points of linkage between two aligned ontologies”) in [Kle01] and to articulation ontologies in ONION [MWK00, MW01] (see also Section 4.2.2).

The other novelty is the service-centric approach that the MAFRA Toolkit introduces [SaR03b]:

“Each semantic bridge has an associated transformation service that determines the transformation procedure and the information the user must provide to the transformation engine. Each service is characterized by a set of arguments, which in turn are characterized by name, type, optionality and location (whether it is a source, target or condition argument). Services are not only responsible for the transformation capabilities but also for the validation of argument values and semi-automatic mapping.”
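The argument model in the quoted passage can be pictured as a small data structure. The following Python sketch is only an illustration under assumptions: the class names Location, Argument and Service, the validate method, and the concatenate example with its argument names are hypothetical and are not taken from the MAFRA Toolkit.

```python
from dataclasses import dataclass
from enum import Enum


class Location(Enum):
    """Where an argument is bound (the three locations named in the quote)."""
    SOURCE = "source"
    TARGET = "target"
    CONDITION = "condition"


@dataclass(frozen=True)
class Argument:
    """One service argument: name, type, optionality and location."""
    name: str
    type: type
    optional: bool
    location: Location


@dataclass
class Service:
    """Hypothetical transformation service described by its arguments."""
    name: str
    arguments: list

    def validate(self, values: dict) -> list:
        """Return a list of problems found in the supplied argument values."""
        problems = []
        for arg in self.arguments:
            if arg.name not in values:
                if not arg.optional:
                    problems.append(f"missing required argument: {arg.name}")
            elif not isinstance(values[arg.name], arg.type):
                problems.append(f"wrong type for argument: {arg.name}")
        return problems


# Illustrative service; the argument names are assumptions for this sketch.
concatenate = Service(
    name="concatenate",
    arguments=[
        Argument("first", str, False, Location.SOURCE),
        Argument("second", str, False, Location.SOURCE),
        Argument("separator", str, True, Location.CONDITION),
        Argument("result", str, False, Location.TARGET),
    ],
)
```

Calling concatenate.validate(...) with a complete set of string values returns an empty list; a missing required argument or a value of the wrong type is reported instead of being passed on to a transformation engine.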

The service-oriented approach complements the Semantic Bridge mechanism by providing the transformation services necessary to perform the mapping transformations. Silva and colleagues proposed a decentralized solution in which independent transformation modules are attached to the system. An overview of the architecture of the MAFRA toolkit can be seen in Figure 4.1, where some transformation modules are included (copy instance, copy relation, concatenate, split, etc.).

Figure 4.1: MAFRA Toolkit System Architecture [SaR03b]

Figure 4.2 outlines the conceptual architecture of the MAFRA system. In the conceptual architecture, a set of modules is identified and organized along two dimensions. Horizontal modules correspond to five fundamental phases in the ontology mapping process (Lift & Normalization, Similarity, Semantic Bridging, Execution and Postprocessing). The vertical modules (Evolution, Domain Knowledge & Constraints, Cooperative Consensus Building and GUI) interact with the horizontal phases during the entire ontology mapping process.

Figure 4.2: MAFRA Conceptual Architecture [MMSV02]

¹ KAON is an Ontology Management tool developed by the University of Karlsruhe, http://kaon.semanticweb.org/

The main phases of the mapping process, described along the horizontal dimension, are:

• Lift & Normalization. Lifting refers to the process of importing the ontologies into a uniform representation formalism (RDF Schema in this case) to facilitate later operations. The next step after lifting is the normalization of the vocabularies of the ontologies by eliminating lexical and semantic differences (like special characters, upper case letters and acronyms).

• Similarity. In this step, similarities between ontology entities are calculated as a support for mapping discovery. A combination of different similarity algorithms is employed in MAFRA [MMSV02] [Sil02] in order to improve the identification of equivalent terms.

• Semantic Bridging. After identifying the similarities between entities from different ontologies, the similar entities are semantically bridged, i.e. correspondences are established between them to enable the transformation of instances of one source entity into instances of one target entity. Several sub-steps can be identified in this phase. We describe these sub-steps in more detail after the enumeration of the mapping phases and of the vertical modules.

MAFRA includes a formal representation to specify the mappings. The formalism that is used to describe the Semantic Bridges is based on a DAML+OIL ontology, called the Semantic Bridging Ontology (SBO). The result is close to the notion of articulation ontology in ONION [MWK00] (see also Section 4.2.2). A mapping is a set of instances of the semantic bridges described by this ontology. We describe this ontology in more detail later.

• Execution. Once the mappings/bridges are specified, the next step is to exploit them in a meaningful way. As already mentioned, MAFRA addresses only the task of instance transformation [MMSV02]. This module transforms instances from the source ontology representation into the representation of the target ontology by evaluating the transformation functions associated with the bridges defined in the previous stage. There are two possible operational modes: offline (all the transformations are executed once) and online (the transformations are continuously executed, and modifications in the source or target ontologies are immediately reflected).

• Postprocessing. Based on the execution results, the mapping specification is analyzed again, e.g. to discover object identities, in order to improve the quality of the instance transformation task.
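The Lift & Normalization and Similarity phases listed above can be illustrated with a small Python sketch. It is not MAFRA's algorithm: the function names, the two similarity measures (a character-level ratio from the standard difflib module and a token-set Jaccard overlap) and the equal weights are assumptions chosen only to show how normalized labels and a combination of measures fit together. Acronym expansion, which the normalization step also mentions, is not covered here.

```python
import re
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lower-case a label and split camelCase and special characters into tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", label)
    tokens = re.findall(r"[a-z0-9]+", spaced.lower())
    return " ".join(tokens)


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two token sets, in [0, 1]."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def combined_similarity(a: str, b: str, weights=(0.5, 0.5)) -> float:
    """Weighted combination of both measures on the normalized labels."""
    norm_a, norm_b = normalize(a), normalize(b)
    w_string, w_token = weights
    return (w_string * string_similarity(norm_a, norm_b)
            + w_token * token_similarity(norm_a, norm_b))
```

With this sketch, the labels Birth_Date and birthDate normalize to the same string and therefore receive the maximum combined score, whereas unrelated labels such as Individual and Marriage stay below 0.5 because they share no token.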

The vertical dimension comprises the following modules:

• Evolution. Synchronizes changes in the source and target ontologies with the Semantic Bridges defined by the Semantic Bridging module.

• Cooperative Consensus Building. From multiple alternative mappings, the tool helps to establish a consensus between the proposals of the people involved in the mapping task.

• Domain Constraints and Background Knowledge. The tool allows users to include extra information (e.g. lexical ontologies like WordNet can help in the identification of synonyms) in order to improve the quality of the mapping.

• GUI. Visualizing the elements of the source and target ontologies makes the mapping task considerably easier, as does visualizing the Semantic Bridges established to represent the mapping between entities.

The Semantic Bridging Ontology (SBO) (see Figure 4.3) is a taxonomy of generic bridges that are instantiated to define concrete bridges/mappings. We give an overview of the dimensions along which a bridge can be described in MAFRA, followed by a brief description of the classes from SBO that allow one to express such bridges.

Figure 4.3: Semantic Bridging Ontology (SBO) in UML [MMSV02]

A bridge can be described along five dimensions:

• entity dimension: pertains to the entities related by a bridge, which may be concepts (modeling classes of objects in the real world), relations, and attributes.

• cardinality dimension: pertains to the number of ontology entities at both sides of the semantic bridge (usually 1:n or m:1; m:n is seldom required and can usually be decomposed into m:1:n).

• structural dimension: pertains to the way elementary bridges may be combined into a more complex bridge (relations that may hold between bridges: specialization, alternatives, composition, abstraction).

• transformation dimension: describes how instances are transformed by means of an associated transformation function.

• constraint dimension: allows one to express conditions upon whose fulfilment the bridge evaluation depends. The transformation rule associated with the bridge is not executed unless these conditions hold.

The abstract class SemanticBridge describes a generic bridge, upon which there are no restrictions regarding the entity types that the bridge connects or the cardinality. To support the composition modelling primitive, this class defines a relation hasBridge. The class SemanticBridgeAlt supports the alternative modelling primitive by grouping several mutually exclusive semantic bridges. SemanticBridge is further specialized in the SBO according to the entity type; the ontology defines as subclasses of this class RelationBridge, ConceptBridge, and AttributeBridge. There is no specialization of the SemanticBridge class to a class which bridges between individuals.

Rule is a class for describing generic rules. Condition and Transformation are its subclasses, which are responsible for describing the condition necessary for the execution of a bridge and the transformation function of a bridge, respectively. The Service class maps the bridge parameters to the arguments of the transformation procedures and to the corresponding procedure calls.

The main goal in MAFRA is to transform instances of the source ontology into instances of the target ontology. Semantic Bridges specify how to perform these transformations and are categorized into concept bridges and property bridges. Concept bridges define the transformations between source instances and target instances, whereas property bridges specify the transformations between source properties and target properties. The Semantic Bridging phase defines, in the following steps, the necessary structures to describe the mapping between two ontologies:

1. Based on the analysis of the similarities that were discovered in the Similarity phase, the first step is to select the pairs of entities to be bridged (which could be concepts, relations and attributes) that correspond to concept bridges. MAFRA allows relations of different cardinality between source and target entities. Thus, a source or target entity can belong to one or more Semantic Bridges.

2. The property bridging step specifies matching properties for each concept bridge. The authors of MAFRA distinguish two types of properties: attributes and relations. If the types of the source and target properties differ, transformation specification information is required, and the domain expert is asked to supply this information. According to MAFRA’s definition, an attribute defines a relation between a concept and a data type value, and a relation defines a relation between two concepts.

3. This step (together with the next one) is part of a refinement process to improve the matching results, and focuses on looking for mapping alternatives where there are no target entities. If it is not possible to find a target entity for a source entity, the algorithm analyzes the hierarchy of the source ontology and proposes an equivalent mapping of some of the parents of the unmapped source entity. The source entity is thus mapped to the same target entities as some of its parents.

4. As part of the refinement process mentioned previously, in this step the system tries to improve the quality of the bridges between source concepts and target concepts. It can be viewed as a complementary routine to the Similarity phase.

5. Associate transformation procedures with the mapping relations identified in the previous steps. Although one of the main goals of the authors of MAFRA is to provide a high level of automation in the mapping procedure, they recognize that in this step the intervention of an expert is highly recommended.
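The refinement in step 3 above can be read as a fallback over the source hierarchy. The Python sketch below makes simplifying assumptions that the source does not state: each source entity has at most one parent, and the nearest mapped ancestor is the one whose targets are reused. The function name, the dictionary-based representation and the entity names in the example are hypothetical.

```python
def propose_fallback_bridges(parents, bridges):
    """Propose targets for unmapped source entities from their ancestors.

    parents: dict mapping each source entity to its parent (None for roots).
    bridges: dict mapping already bridged source entities to sets of targets.
    Returns a dict of proposals for entities that have no bridge yet.
    """
    proposals = {}
    for entity in parents:
        if bridges.get(entity):
            continue  # already bridged, nothing to refine
        seen = {entity}  # guards against cycles in malformed input
        ancestor = parents.get(entity)
        while ancestor is not None and ancestor not in seen:
            if bridges.get(ancestor):
                # Reuse the targets of the nearest mapped ancestor.
                proposals[entity] = set(bridges[ancestor])
                break
            seen.add(ancestor)
            ancestor = parents.get(ancestor)
    return proposals


# Hypothetical source hierarchy and one existing bridge.
source_parents = {"Event": None, "FamilyEvent": "Event", "Marriage": "FamilyEvent"}
existing_bridges = {"Event": {"o2:Event"}}
fallbacks = propose_fallback_bridges(source_parents, existing_bridges)
```

In this example both FamilyEvent and Marriage receive the proposal o2:Event. As in the text, these are proposals only; a domain expert would still confirm or reject them.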

To conclude this brief description of MAFRA, we present an example from [MMSV02] (see Figure 4.4). The goal of this exercise is to map two ontologies: the source ontology (o1) describes the structure of a family, with events categorized into family events (marriage and divorce) and individual events (birth date, death date); the target ontology (o2) characterizes individuals as Man and Woman. A Concept Bridge is defined to map o1:Individual to o2:Individual. All the attributes are mapped using property/attribute bridges except for o1:Individual-sex. This attribute is mapped using an alternative semantic bridge with two concept bridges that map o1:Individual-sex to o2:Man and o2:Woman.
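The alternative bridge in this example can be sketched in Python as follows. This is an illustration only and does not reproduce the SBO or the MAFRA Toolkit: the class bodies, the dictionary representation of instances, the attribute names and the values "male" and "female" are assumptions. Only the names ConceptBridge, SemanticBridgeAlt, o1:Individual, o2:Man and o2:Woman come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


def always(instance: dict) -> bool:
    """Default condition: the bridge applies to every instance."""
    return True


@dataclass
class ConceptBridge:
    """Maps instances of one source concept to one target concept."""
    source: str
    target: str
    condition: Callable[[dict], bool] = always
    # Attribute bridges as a simple source-name to target-name mapping.
    attribute_bridges: dict = field(default_factory=dict)

    def apply(self, instance: dict) -> Optional[dict]:
        # The transformation is not executed unless the condition holds.
        if not self.condition(instance):
            return None
        result = {"type": self.target}
        for src_attr, tgt_attr in self.attribute_bridges.items():
            if src_attr in instance:
                result[tgt_attr] = instance[src_attr]
        return result


@dataclass
class SemanticBridgeAlt:
    """Groups mutually exclusive bridges; the first applicable one is used."""
    alternatives: list

    def apply(self, instance: dict) -> Optional[dict]:
        for bridge in self.alternatives:
            result = bridge.apply(instance)
            if result is not None:
                return result
        return None


# Hypothetical attribute names; the sex attribute selects the target concept.
attrs = {"name": "name", "birthDate": "birthDate"}
sex_bridge = SemanticBridgeAlt([
    ConceptBridge("o1:Individual", "o2:Man",
                  lambda i: i.get("sex") == "male", attrs),
    ConceptBridge("o1:Individual", "o2:Woman",
                  lambda i: i.get("sex") == "female", attrs),
])
```

Applying sex_bridge to an o1 instance whose sex attribute is "female" yields an o2:Woman instance carrying the bridged attributes, while an instance without a sex value matches neither alternative and is left untransformed.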

Ontology Languages. MAFRA needs the Lift & Normalization module to translate the ontologies that participate in the mapping process into RDF(S). More precisely, the terminology specification is transformed to RDF Schema and the instances to RDF.


Figure 4.4: UML representation of the semantic bridge defined to map the ontologies of the example [MMSV02]

Mapping Language. Semantic Bridges (SBs), in conjunction with the services of the transformation modules, provide all the functionality that a mapping language requires. The semantics of this mapping formalism is unambiguously specified through the SBO (Semantic Bridging Ontology).

Another important characteristic of MAFRA is that it supports several types of mapping, such as Class mappings, Property (i.e. relation) mappings and Instance mappings.

Mapping Patterns. MAFRA does not support the use of mapping patterns in ontology mappings. However, one could see a Semantic Bridge as an elementary mapping pattern, and a specific combination of a number of Semantic Bridges as a mapping pattern. Therefore, it should be possible to incorporate the use of mapping patterns into MAFRA.

Automation Support. One of the main goals of the designers of MAFRA is to achieve a high level of automation support. Unfortunately, the papers that describe the tool do not indicate precisely which steps are automatic and which are not. MAFRA also gives the user the opportunity to define Semantic Bridges manually. The modules that are directly involved in the mapping process (the horizontal dimension) offer the following levels of automation:

• Lift & Normalization can probably work without user involvement to provide a uniform representation of the ontologies that will be mapped.


• The calculation of similarities inside a multi-strategy process appears to be fully automatic.

• The generation of Semantic Bridges is partially automated. The specification of mappings between properties (the property bridging step) and the association of transformation procedures with mapping relations require the participation of a domain expert.

• The execution engine, implemented in Java, is fully automated and carries out the transformations defined in the Semantic Bridges.

• The postprocessing module is not further elaborated in the papers that describe MAFRA, and its level of automation is not specified.

The Domain and Background Knowledge module provides mechanisms to include background knowledge and domain constraints, for example by using glossaries or lexical ontologies. These features can considerably improve the quality of the results of the similarity module and the semantic bridge module.

Applicability to Use Cases. One of the goals of MAFRA is to support instance transformation through transformation procedures that are associated with Semantic Bridges. The postprocessing module tries to provide support for instance unification (recognizing that two instances represent the same real-world object), but the authors acknowledge that this is a very challenging task and do not guarantee that it is fully implemented. On the other hand, Semantic Bridges explicitly define mappings between entities of two ontologies, and MAFRA provides a semantic specification for these mechanisms.

Finally, in the papers [SaR03a] and [SaR03b], the authors outline a mechanism that is close to the idea of query rewriting. It retrieves all the instances matching a query that are stored in several ontologies which have mapping specifications between one another.
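The idea of answering one query over several mapped ontologies can be sketched as follows. This is only a minimal illustration in Python, not the mechanism described in [SaR03a, SaR03b]: the ontology names, class names, instance identifiers and the `CLASS_BRIDGES` table are hypothetical, and only class-level bridges are considered.

```python
# Hypothetical class bridges: (ontology, class) -> corresponding (ontology, class) pairs.
CLASS_BRIDGES = {
    ("A", "Hotel"): [("B", "Lodging")],
}

# Hypothetical instance stores, one per ontology.
INSTANCES = {
    "A": {"Hotel": {"a:GrandHotel"}},
    "B": {"Lodging": {"b:SeasideInn", "b:CityHostel"}},
}


def rewrite(ontology, cls):
    """Expand a class query into the corresponding queries on the mapped ontologies."""
    return [(ontology, cls)] + CLASS_BRIDGES.get((ontology, cls), [])


def instances_of(ontology, cls):
    """Collect the instances that answer the query across all mapped ontologies."""
    found = set()
    for target_ontology, target_cls in rewrite(ontology, cls):
        found |= INSTANCES.get(target_ontology, {}).get(target_cls, set())
    return found


print(sorted(instances_of("A", "Hotel")))
```

A query posed against ontology A is expanded with the classes it is bridged to, and the partial answers are merged. A real system would also have to transform the retrieved instances back into the vocabulary of the querying ontology.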

Implementation. As we mentioned in the summary description of this tool, the MAFRA toolkit was implemented as a plug-in of KAON. Silva and colleagues continue the development of this mapping system, and the latest versions can be found at http://mafra-toolkit.sourceforge.net. Some examples and documentation are also available on this site. MAFRA's current approach is being used and tested in the Harmonise project². Harmonise intends to overcome the interoperability problems that occur between tourism operators due to the use of different information representation standards. The MAFRA Toolkit was adopted as the core representation and transformation engine technology for the Harmonise project. Harmonise uses an “Interoperability Minimum Harmonisation Ontology” (IMHO) as lingua franca between agents. The MAFRA Toolkit is responsible for the acquisition, representation and execution of the ontology mapping between each agent-specific ontology and IMHO [SaR03a].

² http://www.harmonise.org


Experiences. Silva and colleagues provide an informal evaluation of the performance of MAFRA in their papers ([SaR03a, SaR03b]) and compare their results with OntoMerge [DMQ02], a tool for ontology mapping and merging.

4.1.2 RDFT<br />

Summary. Omelayenko and Fensel [OF01] present an approach to the integration of product information over the web by exploiting the data model of RDF [LS99], which is based on directed labeled graphs. In their approach, Omelayenko and Fensel assume that product catalogs from different organizations are specified in XML documents. The problem they sketch is that different organizations use different representations for their product catalogs. They intend to mediate between these different representations with the use of RDF triples³.

The approach to the integration of product catalogs is called two-layered because the product information itself is represented in XML, whereas the transformation between different representations is done in RDF. The general idea is that an XML document, whose structure is described by a DTD (Document Type Definition) or XML Schema, is (1) abstracted to an RDF graph, which in turn is described by an ontology, which could be specified using the RDF Schema [BG04] ontology language. The RDF document is then (2) transformed into a target representation, which is also described by an ontology. Then, the target RDF is (3) refined to the target XML representation, which can be used by applications at the target vendor. All three transformation steps are performed with the XML transformation language XSLT [Cla99]. The process of abstraction, transformation and refinement is illustrated in Figure 4.5.
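The three steps can be illustrated with a small sketch. It is written in plain Python rather than XSLT, which is what the approach itself uses, and it is only meant to show the data flow. The catalog layout, the `src:` and `tgt:` prefixes and the `PROPERTY_MAP` table are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical source catalog of one vendor.
SOURCE_XML = "<catalog><item id='p1'><title>Laptop</title><cost>900</cost></item></catalog>"

# Hypothetical mapping from source properties to target properties.
PROPERTY_MAP = {"src:title": "tgt:name", "src:cost": "tgt:price"}


def abstract(xml_text):
    """Step 1: abstract the source XML document to (subject, predicate, object) triples."""
    triples = []
    for item in ET.fromstring(xml_text).findall("item"):
        subject = "item:" + item.get("id")
        for field in item:
            triples.append((subject, "src:" + field.tag, field.text))
    return triples


def transform(triples, property_map):
    """Step 2: transform the triples into the vocabulary of the target ontology."""
    return [(s, property_map[p], o) for s, p, o in triples]


def refine(triples):
    """Step 3: refine the target triples to the target XML representation."""
    root = ET.Element("products")
    products = {}
    for subject, predicate, obj in triples:
        if subject not in products:
            ref = subject.split(":", 1)[1]
            products[subject] = ET.SubElement(root, "product", {"ref": ref})
        ET.SubElement(products[subject], predicate.split(":", 1)[1]).text = obj
    return ET.tostring(root, encoding="unicode")


target_xml = refine(transform(abstract(SOURCE_XML), PROPERTY_MAP))
print(target_xml)
```

Only step 2 depends on the relationship between the two vocabularies. Steps 1 and 3 each depend on a single vendor's XML format, which is the point of separating the two layers.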

[Ome02b] proposes a mapping meta-ontology for describing the transformation between RDF documents. This mapping meta-ontology, called RDFT (RDF Transformation), is specified using RDF Schema [BG04] and is used to describe the mapping between two RDFS ontologies. We describe this ontology and its use in more detail below.

[Ome02a] describes a technique for discovering semantic correspondences between different product classification schemes based on a Naive Bayes classifier. The mappings between the different product classifications are represented using the bridges from the RDFT meta-ontology.

Ontology Languages. Omelayenko [Ome02b] not only describes a way to map between different RDF Schema ontologies, but also describes the way to transform XML documents to RDF using RDFT, thereby effectively specifying the way to perform the abstraction step.

³ An RDF triple consists of a subject, a predicate and an object. Subjects and objects form the nodes of the graph, whereas predicates form the edges. An object in a triple can also occur as a subject or an object of a different triple.


Figure 4.5: Two-layered integration of XML catalogs using RDF

RDFT can be used to express mappings between arbitrary ontologies specified in the RDF Schema ontology language. Furthermore, it can be used to specify the transformation between XML documents and the RDF representation.

Mapping Language. We will now give a short overview of the RDFT mapping meta-ontology. The RDFT meta-ontology is used to describe three types of mappings, denoted by classes in RDFT:

• An EventMap is used to specify the relationship between different events. Events in this context correspond to activity occurrences, such as sending or receiving a message. These events can be used, for example, to connect descriptions of two web services, described using the Web Service Definition Language WSDL⁴.

• A DocumentMap specifies the relationship between an XML and an RDF representation of a catalog.

• A VocabularyMap specifies the actual relationships between two ontologies.

For our purposes, the most interesting type of mapping is the vocabulary mapping (VocabularyMap).

⁴ http://www.w3.org/TR/wsdl

A mapping between two ontologies (vocabularies) is expressed using a number of bridges. Bridges in RDFT are subclasses of the RDFBridge class. RDFT distinguishes two types of RDF bridges, namely Class2Class and Property2Property bridges. Class2Class bridges are used to describe the mapping between two classes and the transformation of instances of these classes. The instance transformation is specified using XPath [CD99] expressions. Property2Property bridges are used to describe the mapping between two properties in the ontologies. Again, XPath can be used to specify instance transformations.
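The bridge classes can be pictured with a small data-structure sketch. The class names follow RDFT, but the field names (`source`, `target`, `xpath`), the `apply_bridges` helper and the sample instance are assumptions made for this illustration and are not taken from the RDFT specification. The XPath expressions are evaluated with Python's ElementTree, which supports only a subset of XPath.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class RDFBridge:
    source: str  # name of the source class or property
    target: str  # name of the target class or property
    xpath: str   # expression that selects the value in a source instance


@dataclass
class Class2Class(RDFBridge):
    pass


@dataclass
class Property2Property(RDFBridge):
    pass


def apply_bridges(instance, class_bridge, property_bridges):
    """Transform one source instance (an XML element) into a target record."""
    if instance.tag != class_bridge.source:
        return None
    record = {"type": class_bridge.target}
    for bridge in property_bridges:
        value = instance.findtext(bridge.xpath)
        if value is not None:
            record[bridge.target] = value
    return record


# Hypothetical source instance and bridges.
notebook = ET.fromstring(
    "<Notebook><label>UltraBook 13</label><price><amount>900</amount></price></Notebook>"
)
class_bridge = Class2Class("Notebook", "Laptop", ".")
property_bridges = [
    Property2Property("label", "name", "label"),
    Property2Property("price", "cost", "price/amount"),
]

print(apply_bridges(notebook, class_bridge, property_bridges))
```

Keeping the declarative part of a bridge (which class or property corresponds to which) separate from the procedural part (the XPath expression) means the same bridge can document a correspondence and drive an instance transformation.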

The types of mappings in RDFT (class-to-class and property-to-property) are probably sufficient in the domain of e-Marketplaces, which was the original target application domain [OF01], because ontologies can be expected to have a similar level of granularity and the goals of the different ontologies are similar. However, if ontologies are more diverse, different types of mappings, e.g. classes-to-instances, classes-to-properties, etc., will be necessary.

In the approach taken by Omelayenko (cf. [OF01, Ome02b]), the steps of abstraction, transformation, and refinement all use the XML transformation language XSLT [Cla99] for specifying the transformations between XML and RDF documents, as well as transformations between different RDF representations. XSLT is certainly expressive enough to express arbitrary transformations between XML documents, and can therefore also transform RDF documents represented in the RDF/XML [Bec03] serialization into a different representation. However, it is not well suited for the specification of RDF transformations, because it does not take into account the data model of RDF, which is graph-based, whereas the data model of XML is tree-based. Therefore, the RDF data model needs to be encoded, in a sense, in the tree-based XML model in each single XSLT transformation.
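The mismatch between the tree model and the graph model can be made concrete with a small example. The two RDF/XML documents below are hypothetical and describe the same two triples, once with a nested description and once with a reference. The `parse` function is a deliberately tiny triple extractor written for this illustration; it handles only the constructs used here and is not a complete RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF, "ex": "http://example.org/"}

NESTED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                     xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/p1">
    <ex:soldBy>
      <rdf:Description rdf:about="http://example.org/v1">
        <ex:name>Acme</ex:name>
      </rdf:Description>
    </ex:soldBy>
  </rdf:Description>
</rdf:RDF>"""

FLAT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                   xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/v1">
    <ex:name>Acme</ex:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/p1">
    <ex:soldBy rdf:resource="http://example.org/v1"/>
  </rdf:Description>
</rdf:RDF>"""


def collect(node, triples):
    """Add the triples of one rdf:Description element, following nested descriptions."""
    subject = node.get(f"{{{RDF}}}about")
    for prop in node:
        resource = prop.get(f"{{{RDF}}}resource")
        if resource is not None:          # object given by reference
            triples.add((subject, prop.tag, resource))
        elif len(prop):                   # object given as a nested description
            triples.add((subject, prop.tag, prop[0].get(f"{{{RDF}}}about")))
            collect(prop[0], triples)
        else:                             # literal object
            triples.add((subject, prop.tag, (prop.text or "").strip()))


def parse(document):
    triples = set()
    for description in ET.fromstring(document):
        collect(description, triples)
    return triples


# A tree path that works for one serialization only.
PATH = "rdf:Description/ex:soldBy/rdf:Description/ex:name"

print(parse(NESTED) == parse(FLAT))
print(ET.fromstring(NESTED).find(PATH, NS) is not None)
print(ET.fromstring(FLAT).find(PATH, NS) is not None)
```

Both documents yield the same set of triples, but a tree path written against the nested form finds nothing in the flat form. A transformation expressed over the XML tree therefore has to anticipate every serialization variant, whereas a transformation expressed over triples does not.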

Automation Support. [Ome02a] describes a way to discover similarity between classes based on the instance information for each class, using a machine-learning approach. In the use case, the classes were product categories and the instance data consisted of the product descriptions.
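The general technique can be sketched with a minimal multinomial Naive Bayes classifier over the words of product descriptions. This is a generic textbook formulation with add-one smoothing, not the implementation of [Ome02a]; the training descriptions and the category labels are invented for this example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest posterior log-probability."""
    class_counts, word_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical product descriptions already classified in the target scheme.
TRAINING = [
    ("laptop computer with 15 inch screen", "Computers"),
    ("desktop computer with keyboard", "Computers"),
    ("ballpoint pen blue ink", "Office Supplies"),
    ("paper notebook with lined pages", "Office Supplies"),
]

model = train(TRAINING)
print(classify("laptop with large screen", model))
```

In a mapping scenario, the descriptions of the products filed under one source category would be classified in this way, and the target category that receives most of them becomes a candidate for a class-to-class bridge.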

The RDFT meta-ontology was presented as the preferred way to specify mappings between ontologies, based on the similarities discovered by a matching tool, but no explicit support is provided for this.

Applicability to Use Cases. RDFT tackles the use case of instance transformation through the XPath specifications attached to the RDF Bridges. RDFT does not offer a solution for instance unification, nor for query rewriting, although the declarative mapping between classes and properties could be used for this purpose. The scope of RDFT is limited to the transformation of XML documents between different representations. The use case of ontology merging is not addressed, although a specification of relationships between ontologies in terms of the RDFT meta-ontology could help in merging different ontologies, because it specifies the relationship between classes.


Implementation. A prototype tool was developed for creating mappings based on the RDFT meta-ontology.

Experiences. RDFT as well as the classification method proposed in [Ome02a] have been used for the discovery and specification of mappings between product classification schemes in the course of the GoldenBullet [DKO+02] project.

4.1.3 PROMPT<br />

Summary. The PROMPT suite consists of a set of tools that have had an important impact in the area of merging, alignment and versioning of ontologies. A relevant result of this development is the definition of a global strategy that seeks to take advantage of the synergies generated by combining tools that in the past were considered independent. The PROMPT suite [NM03b] includes an ontology merging tool (iPROMPT, formerly known as PROMPT [NM00b]), a tool for finding additional points of similarity between ontologies for other tools like iPROMPT (AnchorPROMPT, [NM00a]), an ontology versioning tool (PROMPTDiff, [NM03a]), and a tool for factoring out semantically complete subontologies (PROMPTFactor, [NM03b]).

The work of Natasha Noy and colleagues shows that different tasks in multiple ontology management⁵, like looking for differences between versions of an ontology or looking for similarities between two ontologies in a merging process, are closely interrelated and share several components and heuristics (see Figure 4.6). Thus, tools that support some of the tasks in the context of multiple ontology management can benefit greatly from their integration with others [NM03a].

The key components of the PROMPT suite have been developed as extensions (plugins) of the Protégé 2000 ontology development environment⁶. We can distinguish the following components:

• iPROMPT is an interactive ontology merging tool, which helps users in the ontology merging task by providing suggestions about which elements can be merged, by identifying inconsistencies and potential problems, and by suggesting possible strategies to resolve these problems and inconsistencies.

• AnchorPROMPT extends the performance of tools like iPROMPT by determining additional points of similarity between ontologies that are not identified by iPROMPT.

⁵ Noy and colleagues [NM03b] define multiple ontology management as a set of concrete tasks for dealing with multiple ontologies, such as maintaining libraries of ontologies, import and reuse of ontologies, translating ontologies to other formalisms, ontology versioning support, ontology merging, mapping and alignment support, inference across multiple ontologies and query across multiple ontologies.

⁶ http://protege.semanticweb.org/


Figure 4.6: The PROMPT suite infrastructure and interactions between tools [NM03b]


• PROMPTDiff compares two versions of an ontology and identifies the structural differences between them.

• PROMPTFactor is a tool that enables users to create a new ontology by factoring out part of an existing ontology. In this process, the tool guarantees that the terms of the resulting subontology are well-defined (for instance, every concept of the subontology includes, as appropriate, the superconcepts/subconcepts required for its specification).
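To make the notion of a structural difference between two versions more tangible, the following sketch compares two versions of a toy ontology by class name. It is not the PROMPTDiff algorithm, which relies on its own heuristics; the dictionary representation, the class names and the slot names are assumptions made for this example.

```python
def structural_diff(old, new):
    """Compare two versions given as {class name: {"parent": ..., "slots": set}}."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }


# Two hypothetical versions of the same small ontology.
VERSION_1 = {
    "Product": {"parent": None, "slots": {"name"}},
    "Laptop": {"parent": "Product", "slots": {"weight"}},
    "Typewriter": {"parent": "Product", "slots": set()},
}
VERSION_2 = {
    "Product": {"parent": None, "slots": {"name"}},
    "Laptop": {"parent": "Product", "slots": {"weight", "screen_size"}},
    "Tablet": {"parent": "Product", "slots": {"screen_size"}},
}

print(structural_diff(VERSION_1, VERSION_2))
```

A name-based comparison like this one cannot recognize a renamed class: it reports one removal and one addition instead. Detecting such cases requires comparing the structure around the classes as well.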

One of the major contributions to the development of the PROMPT suite was the identification of an important overlap in the functionality of its tools and the implementation of an integrated approach where all these tools benefit from each other. For instance, some of the components that were originally created for the interface of iPROMPT were reused in the implementation of the interfaces of the other tools of the suite. In addition, the initial sets of related terms between two ontologies that AnchorPROMPT requires as a starting point for a deeper analysis of similarities can be provided by iPROMPT. In return, AnchorPROMPT can supply an additional set of related terms that can be used by iPROMPT to improve the results of the merging process. A final example of this integrated approach can be found in the design of PROMPTDiff and iPROMPT. PROMPTDiff uses some of the heuristics that were initially developed in iPROMPT for the comparison of concept names, slots attached to concepts, domains and ranges of slots, and so on.<br />

As mentioned above, iPROMPT [NM00b] is an interactive tool implemented as an extension of Protégé-2000. iPROMPT guides users through the process of merging two ontologies (see an example of the user interface in Figure 4.7). The tool was originally developed to handle ontologies specified in OKBC [CFF+98], but there are significant efforts to adapt the tool to support the OWL ontology language [DS04]. The central element of iPROMPT is the algorithm that defines a set of steps for the interactive merging process (see also Figure 4.8). The first step is to identify potential merge candidates based on class-name similarities. The result is presented to the user as a list of potential merge operations. The second step is initiated by the user, who chooses one of the suggested operations from the list or specifies an operation directly. The system performs the requested action and automatically executes additional changes derived from the action. It then makes a new list of suggested actions for the user based on the new structure of the ontology, determines conflicts introduced by the last action, finds possible solutions to these conflicts and displays these to the user.<br />
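The cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the iPROMPT implementation: the function names, the use of `difflib.SequenceMatcher` as the name-similarity measure and the 0.8 threshold are all hypothetical, and the conflict detection step is omitted.<br />

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def suggest_merges(classes_a, classes_b, threshold=0.8):
    """Step 1: propose class pairs whose names look alike, best first."""
    scored = [(name_similarity(a, b), a, b) for a in classes_a for b in classes_b]
    return [(a, b) for score, a, b in sorted(scored, reverse=True) if score >= threshold]


def merge_loop(classes_a, classes_b, choose):
    """Repeat: suggest, let the user pick, apply, recompute the suggestions."""
    left, right, merged = set(classes_a), set(classes_b), []
    while True:
        suggestions = suggest_merges(left, right)
        choice = choose(suggestions)   # the user picks a pair or returns None to stop
        if choice is None:
            break
        a, b = choice
        merged.append((a, b))          # perform the requested merge operation
        left.discard(a)                # merged classes leave the candidate pool
        right.discard(b)
    return merged


# Hypothetical class names; this "user" always accepts the top suggestion.
result = merge_loop(
    ["Person", "Organization"],
    ["person", "Organisation", "Vehicle"],
    lambda suggestions: suggestions[0] if suggestions else None,
)
print(result)
```

Passing the user's decision in as the `choose` callback keeps the loop itself independent of any user interface, which is one simple way to model the interactive step.<br />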

Initially, PROMPT identified a set of ontology merging operations (merge classes, merge slots, merge bindings between a slot and a class, etc.) and a set of possible conflicts for these operations (name conflicts, dangling references, redundancy in the class hierarchy and slot-value restrictions that violate class inheritance). These lists of ontology merging operations and possible conflicts have been extended by the authors of the tool as part of an evolution process in the design of the system.<br />
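One of the conflict types listed above, the dangling reference, is easy to illustrate. The sketch below assumes a hypothetical representation in which each frame name maps to the list of frame names it refers to; it is not taken from the PROMPT code base.<br />

```python
def dangling_references(frames):
    """Return (frame, target) pairs where the target is not defined.

    `frames` maps a frame name to the names it refers to
    (an assumed, simplified representation of a merged ontology).
    """
    return sorted(
        (name, ref)
        for name, refs in frames.items()
        for ref in refs
        if ref not in frames
    )


# Hypothetical merged ontology: "University" was never copied over.
merged = {
    "Thing": [],
    "Person": ["Thing"],
    "Student": ["Person", "University"],
}
print(dangling_references(merged))
```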

The goal of AnchorPROMPT [NM00a] is to augment the results of methods that


CHAPTER 4. THE SURVEY 46<br />

Figure 4.7: An example of ontology merging in iPROMPT<br />

Figure 4.8: The flow of the iPROMPT algorithm [NM00b]



analyze only local context in ontology structures, such as Chimaera [MFRW00] and iPROMPT [NM00b], by finding additional possible points of similarity between ontologies. To do this, AnchorPROMPT requires that another tool or the user provide an initial set of related terms. Taking a graph perspective, the tool establishes a set of paths that connect the terms of one ontology that are related to terms of the other. The algorithm takes two pairs of related terms as input and analyzes the elements on the path that connects the two terms within one ontology together with the elements on the corresponding path in the other ontology. So, we have two paths (one for each ontology) and the terms that make up these paths. The analysis looks for terms along one path that might be similar to terms along the other path, which belongs to the other ontology, assuming that the elements of those paths are often similar as well. The new potentially related terms that the algorithm discovers are marked with a similarity score that can be modified during the evaluation of other paths in which these terms are also involved. Terms with high similarity scores are presented to the user to improve the set of possible suggestions in, for example, a merging process in iPROMPT.<br />
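The path-based analysis can be sketched as follows. This is a simplified illustration, not the published algorithm: it assumes each ontology is a directed graph given as an adjacency dictionary, compares only paths with the same number of nodes, and adds one point to a pair of terms each time they occupy the same position on two such paths. All names and the example data are hypothetical.<br />

```python
from collections import defaultdict


def paths(graph, start, end, max_len):
    """All simple paths from start to end with at most max_len edges."""
    found, stack = [], [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == end and len(path) > 1:
            found.append(path)
            continue
        if len(path) > max_len:        # path already has max_len edges
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:        # keep paths simple (no repeated node)
                stack.append(path + [nxt])
    return found


def anchor_scores(g1, g2, anchors, max_len=3):
    """Score term pairs that sit at the same position on equally long paths."""
    scores = defaultdict(int)
    for i, (a1, a2) in enumerate(anchors):
        for b1, b2 in anchors[i + 1:]:
            for p1 in paths(g1, a1, b1, max_len):
                for p2 in paths(g2, a2, b2, max_len):
                    if len(p1) != len(p2):
                        continue
                    # Skip the anchors themselves; score the inner terms.
                    for x, y in zip(p1[1:-1], p2[1:-1]):
                        scores[(x, y)] += 1
    return dict(scores)


# Hypothetical ontologies and anchor pairs.
g1 = {"Trial": ["Protocol"], "Protocol": ["Person"]}
g2 = {"Study": ["Design"], "Design": ["Investigator"]}
anchors = [("Trial", "Study"), ("Person", "Investigator")]
print(anchor_scores(g1, g2, anchors))
```

Because scores accumulate over every pair of paths, a pair of terms that co-occurs on several paths ends up with a higher score, which mirrors the score updates described above.<br />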

If the two ontologies that we compare present important differences in the number of levels of their hierarchy or in the number of relations between classes, the algorithm does not work well.<br />

The third element of the suite is PROMPTDiff [NM03a], which compares the structure of two versions of a particular ontology and identifies the frames (i.e. classes, slots or instances) that have no changes, frames with changes only in their properties, and frames that have also changed in other parts of their definitions. The name of the tool, PROMPTDiff, is influenced by tools like CVS, a version control system that is used to maintain the history of program source code files. CVS includes facilities to discover changes between versions of a document (finding a diff).<br />
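The three categories above can be illustrated with a small structural comparison. The sketch assumes a hypothetical frame representation (a dictionary with a `slots` entry for properties plus other entries such as `parents`) and matches frames by name only; PROMPTDiff itself relies on additional heuristics that are not reproduced here.<br />

```python
def classify_frame(old, new):
    """Classify one frame that exists in both versions."""
    if old == new:
        return "unchanged"
    other_keys = (old.keys() | new.keys()) - {"slots"}
    if all(old.get(k) == new.get(k) for k in other_keys):
        return "properties changed"      # only the slots differ
    return "definition changed"          # something besides the slots differs


def structural_diff(old_version, new_version):
    """Classify every frame that occurs, under the same name, in both versions."""
    shared = old_version.keys() & new_version.keys()
    return {
        name: classify_frame(old_version[name], new_version[name])
        for name in sorted(shared)
    }


# Hypothetical versions of a small ontology.
v1 = {
    "Person": {"parents": ["Thing"], "slots": ["name"]},
    "Student": {"parents": ["Person"], "slots": ["id"]},
    "Course": {"parents": ["Thing"], "slots": ["title"]},
}
v2 = {
    "Person": {"parents": ["Thing"], "slots": ["name"]},
    "Student": {"parents": ["Person"], "slots": ["id", "year"]},
    "Course": {"parents": ["Event"], "slots": ["title"]},
}
print(structural_diff(v1, v2))
```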

The last element of the PROMPT suite we will describe here is the tool PROMPTFactor [NM03b], which allows users to extract from a larger ontology the elements that they are interested in, in a way that also copies all the terms required for preserving the semantics of the descriptions. The authors of the tool call this process “factoring subontologies”.<br />
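The idea of copying every term that a selected element depends on can be sketched as a closure computation. The example below is a minimal illustration that assumes the only dependencies are superclass links, stored in a hypothetical dictionary from class name to superclass names; the real tool considers more kinds of references.<br />

```python
def factor(superclasses, selected):
    """Return the selected classes plus every superclass they depend on.

    `superclasses` maps a class name to its direct superclasses
    (an assumed, simplified view of an ontology).
    """
    keep, todo = set(), list(selected)
    while todo:
        term = todo.pop()
        if term in keep:
            continue
        keep.add(term)
        todo.extend(superclasses.get(term, ()))   # follow the dependencies
    return {term: list(superclasses.get(term, ())) for term in sorted(keep)}


# Hypothetical ontology; the user is interested in "Dog" only.
ontology = {
    "Animal": [],
    "Mammal": ["Animal"],
    "Dog": ["Mammal"],
    "Vehicle": [],
    "Car": ["Vehicle"],
}
print(factor(ontology, ["Dog"]))
```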

During the analysis of the PROMPT suite, we concluded that the suite has some limitations in the area of ontology versioning and evolution. We present a summary of some of the most relevant conclusions of our study (some of them were confirmed by Natasha Noy):<br />

• PROMPTDiff only detects differences between two versions using a structural diff. In [Kle04], we can find several complementary alternatives (change logs, conceptual relations and transformation sets) that can give us a richer description of the changes that the original ontology has undergone.



• The description of the differences between two versions of an ontology that PROMPTDiff offers is limited. For this reason, Klein extended PROMPTDiff to support richer semantic descriptions of changes. He introduced a more complex classification of types of changes (implicitly-changed, directly-changed, changed, isomorphic and unchanged, see [Kle04]) and provides a high-level description of the changes based on the idea of a minimal transformation set and on an ontology of changes (again, see [Kle04]).<br />

PROMPTDiff can find differences between ontologies, but this does not mean that there is explicit support for versioning. PROMPTDiff does not allow the user to identify versions or to indicate that there is a versioning relationship between ontologies. Therefore, the user has to find a way to manage different versions of an ontology and to record that a particular ontology is a version of another ontology.<br />

Ontology Languages The knowledge model underlying PROMPT is the Open Knowledge Base Connectivity (OKBC) protocol [CFF+98]. OKBC is frame-based: frames are the main elements in this knowledge model for building ontologies, and three types of frames can be distinguished: classes, slots and instances. A class is a set of entities, and the elements of such a set are called instances, which may be classes or slots as well as individuals. Slots define binary relations between pairs of entities, whether they be classes, instances, or primitive objects (such as a string or a number).<br />
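The frame-based model can be pictured with a small data structure. The sketch below is an assumption-laden illustration of the three frame types described above; the class `Frame`, its fields and the helper `instances_of` are hypothetical and do not correspond to the OKBC API.<br />

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    name: str
    kind: str                                         # "class", "slot" or "instance"
    types: list = field(default_factory=list)         # classes this frame is an instance of
    slot_values: dict = field(default_factory=dict)   # slot name -> value


def instances_of(frames, class_name):
    """Names of all frames that are instances of the given class."""
    return sorted(f.name for f in frames if class_name in f.types)


# Hypothetical frames: a class, a slot, and an individual with a primitive slot value.
frames = [
    Frame("Person", "class"),
    Frame("age", "slot"),
    Frame("alice", "instance", types=["Person"], slot_values={"age": 30}),
]
print(instances_of(frames, "Person"))
```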

There has been considerable effort to provide PROMPT with RDF and OWL support [NM03b]. In the tutorial on the Protégé OWL plug-in given at ISWC03 (footnote 7), it was mentioned that PROMPT can be used in conjunction with this plug-in (footnote 8) [KFNM]. Thus, OWL support for PROMPT is achieved through this plug-in.<br />

Mapping Language iPROMPT and AnchorPROMPT do not include a language that specifies the mapping. We assume that an internal representation of the mapping results must exist, because there is a strong interaction between the tools of the suite and they need to share these results, but the related literature does not describe such a formalism. Michel Klein (see [Kle04]) implemented an extension of PROMPTDiff that provides a language for change specification that characterizes differences between two ontologies. This language was originally defined using OKBC and then translated to and extended in OWL.<br />

iPROMPT in combination with AnchorPROMPT can map classes, properties and instances using linguistic and structural similarity techniques.<br />

Mapping Patterns Currently there is no support for mapping patterns in the PROMPT suite.<br />

7 http://iswc2003.semanticweb.org/pdf/Protege-OWL-Tutorial-ISWC03.pdf<br />

8 http://protege.stanford.edu/plugins/owl/



Automation Support iPROMPT is an interactive merging tool that guides users in the process of merging two ontologies. iPROMPT proposes to the user a set of ontology merging operations and a set of possible conflicts for these operations. Then, the user has two choices: select one of the suggestions generated by the tool, or specify the desired operation directly. After that, iPROMPT performs the operation and automatically executes additional changes that the operation requires. Finally, the previous list of suggestions is modified as a result of the changes that the executed operation produced. This cycle is repeated until the merging process finishes or the user decides to abort it.<br />

Applicability to Use Cases The PROMPT suite is a set of tools that provides several solutions for ontology mediation, versioning and factoring. iPROMPT covers the complete merging process, and can also generate a list of initial similarities, which AnchorPROMPT refines into a new list of related terms on which the mapping can be based.<br />

Implementation All the tools of the PROMPT suite are plug-ins or extensions to the Protégé-2000 ontology development environment. Protégé-2000 provides an intuitive graphical user interface for ontology development, a rich knowledge model based on OKBC, and an extensible architecture that provides API access both to the Protégé-2000 knowledge bases and to its user interface components [NM03b]. One of the disadvantages of having Protégé-2000 as a base for PROMPT stems from the difficulties of supporting OWL on top of Protégé-2000. These are mentioned in [KFNM] and are due to the differences between OKBC, Protégé-2000’s internal model, and OWL.<br />

The PROMPT suite is clearly user-oriented: the main goal is to support the user in creating the mappings (and merging the ontologies). The suite of tools provides a common user interface that follows the design of the Protégé-2000 GUI, and components that implement the Match operator to find similarities between ontologies.<br />

The PROMPT suite was developed and improved in the context of several projects during the last 5 years, with the collaboration of many users who continuously evaluate and exploit the tools, providing valuable feedback for the developers.<br />

Experiences The papers that describe iPROMPT [NM00b], AnchorPROMPT [NM00a] and PROMPTDiff [NM03a] include evaluation tests to show the accuracy of these tools.<br />

In the case of iPROMPT [NM00b], the authors tested the tool using two ontologies with 134 class and slot frames in total. The first ontology was developed for the unified problem solving method development language (UPML) [FMvH+03] and the second ontology for the method description language (MDL) [GGM98]. The evaluation showed that human experts followed 90% of iPROMPT’s suggestions and 75% of the conflict resolution strategies. Of the operations that the users performed during the merging process, 74% had been suggested by iPROMPT.



AnchorPROMPT was also tested in [NM00a]. The results show that the accuracy of AnchorPROMPT decreases as the length of the paths considered increases. For example, with path length 2 the accuracy is 100%, and with path length 4 the accuracy decreases to 67%. Noy and colleagues also tested AnchorPROMPT with the same ontologies as iPROMPT. They discovered an important limitation of the tool: the algorithm does not provide good results when the structures of the ontologies differ considerably. The UPML ontology has a large number of classes distributed over many different levels. The MDL ontology, on the other hand, has a simpler structure with fewer classes and with only two levels in the hierarchy.<br />

Finally, the accuracy of PROMPTDiff [NM03a] was evaluated using several versions of two ontologies from two different projects: the EON project and the PharmGKB project. The tool identified 96% of the possible matches (recall), and 93% of the identified matches were correct (precision).
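To make the two measures concrete, the following is a minimal Python sketch of how recall and precision are computed from a set of found matches and a set of reference matches. It is not part of PROMPTDiff; the function name `precision_recall` and the example pairs are hypothetical.

```python
def precision_recall(found, reference):
    """Return (precision, recall) for a set of found matches.

    precision = correct found matches / all found matches
    recall    = correct found matches / all reference matches
    """
    found, reference = set(found), set(reference)
    correct = found & reference
    precision = len(correct) / len(found) if found else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical matches between two ontology versions (old name, new name).
reference = {("Drug", "Drug"), ("Dose", "Dosage"), ("Gene", "Gene"), ("Trial", "Study")}
found = {("Drug", "Drug"), ("Dose", "Dosage"), ("Gene", "Allele")}

precision, recall = precision_recall(found, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three found matches are correct, and two of the four reference matches were found.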

4.1.4 GLUE

Summary. GLUE [DMDH04] is a system that employs machine learning techniques to semi-automatically create mappings between heterogeneous ontologies, where an ontology is seen as a taxonomy of concepts. With GLUE, the authors port their previous work on matching database schemas (called LSD) [DMDH02] to the Semantic Web domain. GLUE focuses on finding 1-to-1 mappings between concepts in taxonomies, although the authors say that extending matching to relations and attributes and involving more complex mappings (such as 1-to-n and n-to-1 mappings) is the subject of ongoing research.

The similarity of two concepts A and B in the two taxonomies O1 and O2 is based on the sets of instances that overlap between the two concepts. In order to determine whether an instance of concept B is also an instance of concept A, a classifier is first built using the instances of concept A as the training set. This classifier is then used to classify the instances of concept B: for each instance of B, it decides whether that instance is also an instance of A or not.

Based on these classifications, four probabilities are computed, namely P(A, B), P(A, ¬B), P(¬A, B) and P(¬A, ¬B), where, for example, P(A, ¬B) is the probability that an instance in the domain belongs to A, but not to B. Together these four probabilities form the joint probability distribution of the concepts A and B. The similarity measure is a user-supplied function that takes these four probabilities as parameters. [DMDH04] describes two possible similarity functions. The first is the Jaccard coefficient, where the similarity measure is computed by dividing the probability that an instance is in the intersection of the two concepts by the probability that an instance is in the union of the concepts (P(A ∩ B)/P(A ∪ B)), which intuitively corresponds to the fraction of the instances belonging to A or B that belong to both. The second is the “most-specific-parent” measure, where the similarity is positive (i.e. not 0) for any parent B of A and is highest for the most specific parent, i.e. the concept B_MSP, which represents the smallest superset of A.
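The Jaccard coefficient can be written directly in terms of the four probabilities, since P(A ∩ B) = P(A, B) and P(A ∪ B) = P(A, B) + P(A, ¬B) + P(¬A, B). The following Python sketch illustrates this under simplifying assumptions: the probabilities are estimated by plain counting over a list of (in A, in B) membership flags, and the names `joint_distribution` and `jaccard` are hypothetical, not taken from the GLUE implementation.

```python
from collections import Counter

CELLS = [(True, True), (True, False), (False, True), (False, False)]


def joint_distribution(memberships):
    """Estimate P(A,B), P(A,notB), P(notA,B), P(notA,notB) by counting.

    `memberships` is a list of (in_a, in_b) boolean pairs, one per instance.
    """
    if not memberships:
        raise ValueError("at least one instance is required")
    counts = Counter(memberships)
    total = len(memberships)
    return {cell: counts[cell] / total for cell in CELLS}


def jaccard(dist):
    """Jaccard coefficient P(A and B) / P(A or B) from the joint distribution."""
    p_intersection = dist[(True, True)]
    p_union = p_intersection + dist[(True, False)] + dist[(False, True)]
    return p_intersection / p_union if p_union > 0 else 0.0


# Hypothetical classification outcome for ten instances.
sample = ([(True, True)] * 3 + [(True, False)] * 1
          + [(False, True)] * 2 + [(False, False)] * 4)
similarity = jaccard(joint_distribution(sample))
print(round(similarity, 3))
```

Here three of the six instances that belong to A or B belong to both, so the coefficient is one half. The coefficient is 1 when the two concepts have exactly the same instances and 0 when they are disjoint.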

The general architecture of the GLUE system is as follows:

• The Distribution Estimator takes as input the two taxonomies O1 and O2, together with their instances, and applies machine learning to compute the four aforementioned probabilities P(A, B), P(A, ¬B), P(¬A, B) and P(¬A, ¬B). Currently, the distribution estimator uses a content learner, which learns a classifier based on the textual content of the instances, and a name learner, which learns a classifier based on the name of the instance. It is possible to plug in different learners for different aspects using a meta-learner, which uses a certain function to incorporate the predictions from all learners into an overall prediction.

• The Similarity Estimator applies a user-supplied function, such as the Jaccard coefficient or the most-specific-parent measure mentioned above, and computes a similarity value for each pair of concepts 〈A ∈ O1, B ∈ O2〉.

• The Relaxation Labeler takes as input the similarity values for the concepts from the taxonomies and searches for the best mapping configuration, exploiting user-supplied domain-specific constraints and heuristics.
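The role of the meta-learner in the Distribution Estimator can be sketched as follows. The text above only says that "a certain function" combines the learners' predictions; the weighted average used here is an assumption chosen for illustration, and the name `combine_predictions` as well as the learner names and weights are hypothetical.

```python
def combine_predictions(predictions, weights):
    """Combine per-learner confidence scores into one overall score.

    `predictions` maps a learner name to its confidence (0..1) that an
    instance belongs to a concept; `weights` maps learner names to
    non-negative weights. A weighted average is assumed here.
    """
    total_weight = sum(weights[name] for name in predictions)
    if total_weight <= 0:
        raise ValueError("weights of the used learners must sum to a positive value")
    weighted_sum = sum(weights[name] * score for name, score in predictions.items())
    return weighted_sum / total_weight


# Hypothetical scores of two base learners for one instance and one concept.
predictions = {"content_learner": 0.8, "name_learner": 0.4}
weights = {"content_learner": 3.0, "name_learner": 1.0}

overall = combine_predictions(predictions, weights)
print(round(overall, 3))
```

Because the combination is a separate function, further learners can be plugged in by adding entries to the two dictionaries, which mirrors the pluggable design described above.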

All in all, GLUE can be seen as an implementation of the Match operator and can be fitted into the overall mapping process as illustrated in Section 1.2.

Ontology Languages. The GLUE matcher uses two taxonomies, in which the nodes correspond to concepts and the edges correspond to subclass-of relationships in the ontologies. Clearly, such a taxonomy can easily be extracted from an ontology represented in any ontology language, although many of the relationships in the ontology are then not taken into account. This is not a big problem for the approach, however, since the matching is mostly based on instance information.

Mapping Language. The result of the matching done in GLUE is not a mapping between the two ontologies, but rather a set of similarity measures, stating which concepts in one ontology O1 are similar to concepts in the other ontology O2.

Mapping Patterns. Mapping patterns are not an issue in GLUE, since it is only concerned with discovering similarities between concepts based on their instances. Nor could GLUE be used for matching patterns with an ontology, since a pattern does not have instances.



Automation Support. GLUE has a semi-automatic algorithm for specifying the mapping between two ontologies. Ontologies are seen as taxonomies and the problem of matching is reduced to: “for each concept node in one taxonomy, find the most similar node in the other taxonomy”.
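The reduced matching problem quoted above amounts to picking, for every concept of one taxonomy, the concept of the other taxonomy with the highest similarity value. The Python sketch below shows only this selection step; it leaves out the relaxation labeling that GLUE applies on top of the raw similarities, and the function name `most_similar` and the concept names are hypothetical.

```python
def most_similar(similarity):
    """For each concept of the first taxonomy, pick the best partner.

    `similarity` maps (concept_in_O1, concept_in_O2) pairs to a similarity
    value. Returns a dict: concept_in_O1 -> (concept_in_O2, value).
    """
    best = {}
    for (a, b), value in similarity.items():
        if a not in best or value > best[a][1]:
            best[a] = (b, value)
    return best


# Hypothetical similarity values between two small course-catalog taxonomies.
similarity = {
    ("Faculty", "Academic Staff"): 0.82,
    ("Faculty", "Courses"): 0.05,
    ("Courses", "Academic Staff"): 0.03,
    ("Courses", "Courses"): 0.91,
}

for concept, (partner, value) in sorted(most_similar(similarity).items()):
    print(f"{concept} -> {partner} ({value})")
```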

The input from the user in the matching process consists of the function to be used for computing the overall similarity value, based on the joint distribution of the concepts, and the domain-specific constraints and heuristics, which are used in the relaxation labeling process.

GLUE takes a one-shot approach to determining the similarities between taxonomies, which means that there is no user interaction during the matching process. The user has to take the outcome of the matching process as-is and use it as a basis for creating a mapping between the ontologies. In other words, GLUE implements the “find similarities” step in the mapping process (Section 1.2), but does not provide support for the iteration step.

Applicability to Use Cases. GLUE aids in creating mappings between ontologies in the sense that it makes the work of the human user easier by finding similarities between concepts in two ontologies based on their instances.

Implementation. A prototypical implementation of GLUE was created and the performance of each of the components in the architecture was evaluated. The main components evaluated were the different types of learners used for the classification and the relaxation labeler, which applies domain constraints and heuristics in order to come up with better matches. It turned out that combining several classifiers with domain heuristics can achieve a significant improvement in accuracy, which can go up to 97% in some domains.

Experiences. [DMDH04] reports only on small evaluations of the performance of the system for taxonomies in the domains of (university) course catalogs and company profiles. The matching accuracy for the chosen examples was typically between 70 and 90 percent. However, experiments on a broader scale need to be done to see whether GLUE works in other domains and to evaluate the scalability of the approach.

4.1.5 Semantic Matching

Summary. Semantic Matching [GS04] is an approach to matching classification hierarchies. The problem addressed by Semantic Matching is the following: say you have two different classification hierarchies, where each hierarchy is used to describe a set of documents, i.e. each term in the classification hierarchy describes a set of documents. How do the terms in one hierarchy relate to the terms in the other hierarchy?



Semantic Matching can also be seen as an implementation of the Match operator for purely tree-structured ontologies. The authors define Match as follows: “Match is an operator that takes two graph-like structures (e.g. database schemas or ontologies) and produces a mapping between elements of the two graphs that correspond semantically to each other”. This definition is similar to the definition provided in Section 1.1. However, in Semantic Matching the definition is limited to the graph representation format for ontologies. This distinction is fundamental to the Semantic Matching approach, since it performs matching based on the nodes and the edges between the nodes in a graph.

Until now, Semantic Matching has mostly been developed and tested for the task of matching classification hierarchies. Classification hierarchies are tree-structured graphs in which each node has only one parent. A property of classification hierarchies is that there is only one type of relationship, namely a more-specific-term relation, which subsumes the subclass-of relationship. It is currently not clear if and exactly how Semantic Matching can be applied to the problem of ontology matching, because most ontologies have different types of relationships between concepts, and the subclass-of relationship in ontologies is a formal relationship, often interpreted as a strict logical implication or a subset relationship (as is the case for the semantics of Description Logics).

Of course, an ontology can usually be rewritten as a graph with labelled edges, although some information (e.g. axioms and higher-order relations) might be lost in the rewriting. Concepts could be the nodes and relationships between concepts could be the (labelled) edges; the label of an edge would denote the type of the relationship. This is similar to the labelled graphs used in ONION (see Section 4.2.2). Work is currently underway to incorporate the semantics of the relationships into the Semantic Matching algorithm, but this work is still in its early stages.

The authors of [GS04] have argued that almost all earlier approaches to schema and ontology matching have been syntactic matching approaches, as opposed to semantic matching. In syntactic matching, the labels and sometimes the syntactical structure of the graph are matched and typically some similarity coefficient in [0, 1] is obtained, which indicates the similarity between the two nodes. Semantic Matching instead computes a set-based relation between the nodes, taking into account the meaning of each node. The possible relations returned by the Semantic Matching algorithm are equality (=), overlap (∩), mismatch (⊥), more general (⊆) or more specific (⊇). The correspondence of the symbols with set theory is not a coincidence, since each concept in the classification hierarchies represents a set of documents.
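The set-theoretic reading of these relations can be illustrated with a short Python sketch that compares the document sets of two nodes directly. This only illustrates the meaning of the five relations; it is not how S-Match works, since the algorithm derives the relations from labels and graph structure rather than from explicit document sets. The function name `set_relation`, the neutral labels "subset" and "superset", and the example documents are hypothetical.

```python
def set_relation(docs_a, docs_b):
    """Return the set-based relation between two sets of documents."""
    a, b = set(docs_a), set(docs_b)
    if a == b:
        return "equality"   # same documents
    if a < b:
        return "subset"     # every document of A is also in B
    if a > b:
        return "superset"   # every document of B is also in A
    if a & b:
        return "overlap"    # some, but not all, documents are shared
    return "mismatch"       # no shared documents


# Hypothetical document identifiers classified under four nodes.
europe = {"d1", "d2", "d3", "d4"}
italy = {"d1", "d2"}
wine = {"d2", "d7"}
asia = {"d8", "d9"}

print(set_relation(italy, europe))  # subset
print(set_relation(europe, italy))  # superset
print(set_relation(italy, wine))    # overlap
print(set_relation(europe, asia))   # mismatch
```

Unlike a similarity coefficient, the result is one of a fixed number of relations, which is what distinguishes the semantic from the syntactic approach described above.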

We will now briefly sketch the Semantic Matching (also called S-Match) algorithm for graph matching.

Two levels of granularity for matching are distinguished in S-Match, namely element-level matching and structure-level matching. At the element level, which is concerned with individual nodes, the authors distinguish techniques with weak semantics and techniques with strong semantics. Techniques with weak semantics correspond to the syntactic matching which has been proposed in most previous literature (for an overview see [RB01]). Element-level matching with strong semantics is done using thesauri (e.g. WordNet [Fel99]), which typically contain synonym and hypernym relations between terms. These relations can be used to find semantic relations between nodes in the graphs.

In the next phase, structure-level matching, the matching problem, i.e. the two graphs together with the mapping query, is translated into a propositional formula, which is then checked for validity (a formula is valid exactly when its negation is unsatisfiable, so the check can be delegated to a satisfiability test). A mapping query is a pair of nodes and a semantic relationship between the pair of nodes. If the propositional formula is valid, we know that the semantic relationship between the two nodes in the query holds, and it can thus be added to the mapping result.
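The validity check for a single mapping query can be sketched in Python with a brute-force truth table. This is a toy illustration, not the S-Match implementation, which would use a SAT solver; the encoding of nodes as conjunctions of label atoms, the background fact that "images" and "pictures" are synonyms, and all function names are assumptions made for the example.

```python
from itertools import product


def is_valid(formula, variables):
    """Return True if `formula` holds under every truth assignment."""
    return all(
        formula(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


def implies(p, q):
    return (not p) or q


VARIABLES = ["images", "pictures", "europe"]


def background(v):
    # Assumed thesaurus knowledge: "images" and "pictures" are synonyms.
    return v["images"] == v["pictures"]


def node1(v):
    # Hypothetical node "Images" below "Europe" in the first hierarchy.
    return v["images"] and v["europe"]


def node2(v):
    # Hypothetical node "Pictures" in the second hierarchy.
    return v["pictures"]


def query_node1_within_node2(v):
    # Query: given the background knowledge, is node1 contained in node2?
    return implies(background(v), implies(node1(v), node2(v)))


def query_node2_within_node1(v):
    # The reverse query, which should not hold.
    return implies(background(v), implies(node2(v), node1(v)))


print(is_valid(query_node1_within_node2, VARIABLES))
print(is_valid(query_node2_within_node1, VARIABLES))
```

The first query is valid, so the containment relation would be added to the mapping result; the second fails for documents about pictures that are not about Europe. The truth table grows exponentially with the number of atoms, which hints at the cost of running such a check for every pair of nodes.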

A potential problem with this algorithm is that the propositional satisfiability check (propositional satisfiability is NP-complete) has to be performed for every pair of nodes from the two graphs. Clearly, this does not scale to large graphs.

Ontology Languages. Currently, semantic matching can work with classification hierarchies, but also with directed acyclic graphs (DAGs) in general. Classification hierarchies can often be extracted from ontologies by treating classes in the ontology as nodes and the subclass-of relationships as edges, but all other relationships are lost in the translation. Note that multiple classification hierarchies can be extracted from an ontology in which concepts have more than one direct superclass. This of course does not rule out the use of the result of the algorithm as the input to a mapping process for the complete ontologies. Also, work is underway to extend semantic matching to labeled graphs, taking the semantics of the different relationships into account.

This does not mean that the algorithm in its current form is useless; on the contrary, many classification schemes are currently in use, such as dmoz⁹ and Yahoo¹⁰, along with many other (specialized) classification hierarchies. However, for arbitrary ontology matching on the Semantic Web, it has not been shown that the algorithm performs well. That the algorithm works well has been shown only for the case of classification hierarchies with more-specific-term relationships. Ontologies typically implement both the formal subclass-of relation and many other types of relationships.

Mapping Language S-Match is a matching algorithm and as such does not have a<br />

language for <str<strong>on</strong>g>the</str<strong>on</strong>g> actual specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings, <strong>on</strong>ly for <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />

similarities, although in this case <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities comes close to a real<br />

mapping specificati<strong>on</strong>.<br />

As we have menti<strong>on</strong>ed earlier, <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cepts is d<strong>on</strong>e<br />

using set-based primitives, denoting <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ships <str<strong>on</strong>g>of</str<strong>on</strong>g> equality, disjointness, overlap<br />

and sub/superset. In later work (e.g. [GSY04]), <str<strong>on</strong>g>the</str<strong>on</strong>g> authors use <str<strong>on</strong>g>the</str<strong>on</strong>g> symbols comm<strong>on</strong>ly<br />

found in descripti<strong>on</strong> logics, i.e. 〈A, B, ⊓〉 for overlap, 〈A, B, ⊑〉 for subset, 〈A, B, ⊒〉<br />

9 http://www.dmoz.org/<br />

10 http://www.yahoo.com/


CHAPTER 4. THE SURVEY 55<br />

for superset and 〈A, B, ⊥〉 for disjointness <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts. These relati<strong>on</strong>s could be<br />

translated to Descripti<strong>on</strong> Logic [BCM + 03] axioms, i.e. A ⊓ B �≡ ⊥, A ⊑ B, B ⊑ A, and<br />

A ⊓ B ≡ ⊥.<br />
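As a minimal illustration of this translation, the following Python sketch maps a relation triple 〈A, B, R〉 to the corresponding axiom written as a string. The function name `to_dl_axiom`, the template table and the string output format are assumptions made for this example; they are not part of S-Match.

```python
# Hypothetical helper for illustration only; not part of S-Match.
# Each relation symbol is paired with the axiom given in the text above.
AXIOM_TEMPLATES = {
    "⊓": "{a} ⊓ {b} ≢ ⊥",  # overlap: the intersection is not empty
    "⊑": "{a} ⊑ {b}",      # subset: A is subsumed by B
    "⊒": "{b} ⊑ {a}",      # superset: B is subsumed by A
    "⊥": "{a} ⊓ {b} ≡ ⊥",  # disjointness: the intersection is empty
}


def to_dl_axiom(a, b, relation):
    """Return the axiom string for the relation triple (a, b, relation)."""
    if relation not in AXIOM_TEMPLATES:
        raise ValueError(f"unsupported relation symbol: {relation!r}")
    return AXIOM_TEMPLATES[relation].format(a=a, b=b)


if __name__ == "__main__":
    for symbol in AXIOM_TEMPLATES:
        print(symbol, "->", to_dl_axiom("A", "B", symbol))
```

Note that the superset case swaps the two concepts, so both subsumption directions are expressed with the single symbol ⊑.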

Mapping Patterns. Currently, there is no use of mapping patterns in Semantic Matching. It might be worthwhile to see if mapping patterns can help to find similarities, although this is not a straightforward task. Perhaps it is possible to match ontologies against mapping patterns in order to find out if a certain mapping pattern might be applicable, but the authors do not give any hints as to if and how we can incorporate mapping patterns into the matching algorithm.

Automation Support. The proposed algorithm is an automatic one-pass algorithm: it requires no user interaction during the execution of the matching and returns all similarities it can find between the two graphs.

It cannot be assumed that the mapping returned by the algorithm is either correct or complete. Therefore, the result of the S-Match algorithm can serve as a first step in the overall ontology mapping process. It can serve as the input for the next phase in the mapping process, in which the user validates the result of the matching, corrects any mistakes, and makes the necessary additions in order to make the mapping correct and complete¹¹.

Applicability to Use Cases. For the purpose of ontology mediation on the Semantic Web, S-Match could play a role in the discovery phase of the mappings between ontologies. Since S-Match provides an implementation of the Match operator, it fits into the “find similarities” step in the mapping process.

Implementation. [GSY04] presents S-Match, an algorithm and implementation of Semantic Matching. It also compares the performance of the S-Match implementation in terms of speed, precision and recall with available implementations of existing syntactic matching approaches: COMA [DR02], Cupid [MBR01] and Similarity Flooding [MGMR02], which was implemented in the RONDO system [MRB03].

It turned out that for most applications, S-Match outperformed the other systems in terms of precision and recall. However, the other systems typically outperformed S-Match in terms of the time required to perform the actual matching. One possible explanation is that the S-Match implementation has not really been optimized. However, optimization can only go so far: the S-Match implementation uses a propositional SAT solver, and propositional satisfiability is NP-complete. Currently, there are no known algorithms that require less than exponential time in the worst case for satisfiability checking.
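To illustrate why satisfiability checking can dominate the running time, the following sketch decides propositional entailment by brute force, enumerating all 2^n truth assignments over n variables. It assumes, for illustration only, that a relation between two concepts is verified by testing whether a propositional formula is unsatisfiable. The function names and the car/vehicle example are hypothetical, and this is not the S-Match implementation, which uses a SAT solver.

```python
from itertools import product


def is_satisfiable(formula, variables):
    """Brute-force check: try every one of the 2**n truth assignments."""
    for values in product([False, True], repeat=len(variables)):
        if formula(dict(zip(variables, values))):
            return True
    return False


def entails(axioms, conclusion, variables):
    """axioms entail conclusion iff (axioms and not conclusion) is unsatisfiable."""
    return not is_satisfiable(
        lambda v: axioms(v) and not conclusion(v), variables
    )


if __name__ == "__main__":
    # Hypothetical background knowledge: every car is a vehicle.
    variables = ["car", "vehicle", "red"]
    axioms = lambda v: (not v["car"]) or v["vehicle"]
    # Is "red car" more specific than "vehicle"?
    red_car_is_vehicle = lambda v: (not (v["car"] and v["red"])) or v["vehicle"]
    print(entails(axioms, red_car_is_vehicle, variables))
```

The loop visits up to 2^n assignments, which is the exponential worst case mentioned above; practical SAT solvers prune this search but offer no better worst-case guarantee.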

¹¹ Of course, it can also never be guaranteed that the outcome of the human mapping will be either correct or complete.



Experiences. Semantic matching has so far only been tested with some toy examples. However, the results presented in [GSY04] do look promising with respect to the precision and recall achieved by the system compared to other existing matchers. Furthermore, S-Match is currently in the early stages of its development; there are plans to apply S-Match in other settings, which will show whether S-Match works for real-world problems on the Semantic Web.

4.1.6 OntoMap

OntoMap ([KSD01a]) is a knowledge representation formalism, reasoner, and web portal¹² for upper-level ontologies and lexical semantics. The project was developed by Ontotext Lab. in cooperation with the Bulgarian Academy of Sciences. Note that the Ontology Mapping system used in SEKT is also named OntoMap but is mainly developed by the company Ontoprise and is a completely different system. The portal provides access to the most popular upper-level ontologies and lexical resources, together with hand-crafted mappings between them. It facilitates the evaluation and comparison of upper-level ontologies and lexical knowledge bases. The portal is based on a unified representation of the resources, a proprietary inference engine, and a mapping methodology. It includes a number of alternative viewers: HTML, DHTML, and a stand-alone GUI application.

In order to provide a uniform representation of the ontologies and the mappings between them, OntoMap introduces a relatively simple meta-ontology called OntoMapO. The knowledge representation language is more complex than RDF(S) and similar to OWL Lite⁻ [dBPF04], but it also includes specific primitives for ontology mapping.

The following upper-level ontologies are hosted:

• Upper Cyc Ontology
• EuroWordnet Top Ontology
• EuroWordnet Meta-Ontology
• WordNet Meta-Ontology
• WordNet Tops (the top 41 classes)
• MikroKosmos Top (the top 13 classes)
• OntoMap Meta-Ontology
• Protege Meta-Ontology
• Simple Ontology of Business Entities
• SENSUS Top (the top 257 classes)

¹² http://www.ontotext.com/projects/OntoMap.html

Mappings between EuroWordnet Top and the other ontologies were created. There are almost no direct mappings between the other ontologies, but the equivalence and subsumption relations are automatically propagated through the mapping to EuroWordnet Top.
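The propagation through the hub ontology can be pictured as composing two mappings that share a hub concept. The sketch below is an assumption about how such a composition could work, not the OntoMap reasoner: relations are limited to equivalence (`=`), more specific (`<`) and more general (`>`), the hierarchy inside the hub ontology is ignored, and the `Student` entry is invented for the example.

```python
# Illustrative sketch only; relation symbols and function names are hypothetical.
# A link (r, h) attached to concept x reads "x r h", with h a hub concept.
INVERSE = {"=": "=", "<": ">", ">": "<"}


def compose(r1, r2):
    """Relation of X to Y given X r1 H and H r2 Y, or None if nothing follows."""
    if r1 == "=":
        return r2
    if r2 == "=":
        return r1
    # Two subsumptions compose only when they point in the same direction.
    return r1 if r1 == r2 else None


def propagate(source_to_hub, target_to_hub):
    """Derive source-to-target relations from two mappings into the same hub."""
    derived = set()
    for x, x_links in source_to_hub.items():
        for r1, hub1 in x_links:
            for y, y_links in target_to_hub.items():
                for r2, hub2 in y_links:
                    if hub1 != hub2:
                        continue
                    # y r2 hub2 is turned around into hub2 INVERSE[r2] y.
                    relation = compose(r1, INVERSE[r2])
                    if relation is not None:
                        derived.add((x, relation, y))
    return derived


if __name__ == "__main__":
    sensus = {"Person": [("=", "Human")]}
    cyc = {"Person": [("=", "Human")], "Student": [("<", "Human")]}  # Student is invented
    print(sorted(propagate(sensus, cyc)))
```

With a single hub, n ontologies need only n hand-crafted mappings instead of one for every pair, which is the practical benefit of routing everything through EuroWordnet Top.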

Mapping Language. The full description of OntoMapO can be found in [KSD01b]. Here we present just its mapping primitives, as follows:

• MuchMoreSpecific - the first concept is much more specific than the second one; transitive relation. Inverse of MuchMoreGeneral and a specialization of ChildOf;
• MuchMoreGeneral - the first concept is much more general than the second one; transitive relation. Inverse of MuchMoreSpecific and a specialization of ParentOf;
• TopInstance - the first concept is the most general instance of the second one, which is a meta-concept. Inverse of ExactClass and a specialization of InstanceOf;
• ExactClass - the first concept is a meta-concept, the second concept is the most general instance of the first one. Inverse of TopInstance and a specialization of ClassOf;
• ParentAsInstance - the first concept is more general than all the instances of the second one, which is a meta-concept. Inverse of ChildAsClass;
• ChildAsClass - the first concept is a meta-concept (class), all its instances are more specific than the second concept. Inverse of ParentAsInstance.
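The inverse and specialization relationships listed above can be collected in a small lookup table. The following sketch is only an illustration of that structure; the `Primitive` record and the `invert` helper are hypothetical names, and the authoritative definition of OntoMapO is the one in [KSD01b].

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Primitive:
    inverse: str                       # name of the inverse primitive
    specializes: Optional[str] = None  # more general relation, if one is stated
    transitive: bool = False


# Table transcribed from the list of mapping primitives above.
PRIMITIVES = {
    "MuchMoreSpecific": Primitive("MuchMoreGeneral", "ChildOf", transitive=True),
    "MuchMoreGeneral": Primitive("MuchMoreSpecific", "ParentOf", transitive=True),
    "TopInstance": Primitive("ExactClass", "InstanceOf"),
    "ExactClass": Primitive("TopInstance", "ClassOf"),
    "ParentAsInstance": Primitive("ChildAsClass"),
    "ChildAsClass": Primitive("ParentAsInstance"),
}


def invert(statement):
    """Rewrite (first, primitive, second) as the equivalent inverse statement."""
    first, primitive, second = statement
    return (second, PRIMITIVES[primitive].inverse, first)


if __name__ == "__main__":
    # Concept names here are placeholders, not taken from any hosted ontology.
    print(invert(("ConceptA", "MuchMoreSpecific", "ConceptB")))
```

Because every primitive names its inverse, a mapping stated in one direction can always be read in the other, so only one direction needs to be stored.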

Automation support. OntoMap does not automatically create mappings. It assumes that either a mapping exists or it may be created manually. Although it may seem that automatic mapping would reduce the effort, in the case of upper-level ontologies the typical heuristics used for domain ontologies can play only a very limited role. This is explained in detail in [KSD01b]. Once a mapping to one of the ontologies it supports is created, OntoMap could automatically create a mapping to any of the other ontologies.

Applicability to use cases. OntoMap could be used for (semi-)automatic creation of ontology mappings between other domain ontologies and existing ones, but it requires that a mapping exists to one of the supported ontologies. The different upper-level ontologies are suited for different purposes; thus, a domain ontology may naturally map to one of these, and then OntoMap will automatically provide a mapping to the rest. Although OntoMap does not directly address the use case of instance transformation, the mappings it creates could be used for such tasks. It is important to mention that OntoMap handles classes and instances in a uniform fashion and thus could transform instances to classes and vice versa (via the ParentAsInstance and ChildAsClass mapping primitives).

Tool support.

• The OntoMap web portal (http://ontomap.ontotext.com) requires users to register (registration is free) and then allows the browsing of ontologies via a DHTML Tree View. Searching for concepts across one or more ontologies is also supported. The portal allows the export of the ontologies to DAML+OIL.

• CYC to EWN-Top mapping. An online service, hosted at http://demo.ontotext.com, allows the browsing of the EuroWordnet Top ontology and its mapping into the Upper Cyc Ontology. The corresponding Cyc concepts are presented with their glosses, direct and indirect super-classes (#$genls), and direct and indirect classes (#$isa). The mapping itself is expressed in terms of a CycL microtheory encoding of the EuroWordNet Top Ontology on top of the publicly available part of the Cyc knowledge base. This approach was chosen because such a mapping is impossible by means of equivalence and subsumption relations only. However, a simplified relational view that is sufficient for many purposes is also provided. More theoretical details can be found in [KS00].

• The OntoMap Viewer is a standalone Java application, which offers the main functionality of the OntoMap web portal. OntoMap Viewer is distributed for all popular platforms: Windows, Linux, Solaris, MacOS (http://www.ontotext.com/projects/OntoMapViewer/install.htm). All of the previously mentioned ontologies are encoded in the OntoMapO language. The viewer allows browsing and searching by concepts from any ontology. An example, shown in Figure 4.9, illustrates the supported mappings between the upper-level ontologies. The user chooses an ontology and then selects a concept from it, e.g. Person from the SENSUS Top ontology. Then the viewer shows any equivalent, super- and sub-concepts from all ontologies, including the current one but also the others, if there are equivalence and/or subsumption relations (in this case the concept is equivalent to Human from EuroWordnet Top and Person from UpperCyc).

Summary. OntoMap provides a mapping model for upper-level ontologies, and a few of the most popular ones are encoded in it. Using the mapping to the EuroWordnet Top ontology and a reasoner to support the knowledge representation language, a mapping between all of the ontologies is available. Thus, a new mapping from a domain ontology to one of the supported upper-level ontologies could be automatically extended to each of the other ones. However, OntoMap is focused on the evaluation and the comparison of the ontologies, which are encoded into OntoMapO, rather than on ontology mapping or instance transformation services.



Figure 4.9: OntoMap Viewer - concept “Person” in different upper-level ontologies



4.1.7 RDFDiff

RDFDiff (formerly known as OntoView [KKOF02]) is an algorithm and a tool, developed jointly by VU Amsterdam and Ontotext Lab. It aims to detect and depict changes in RDF-encoded knowledge (such as ontologies encoded in RDF(S), DAML+OIL, and OWL). It is very much inspired by the CVS diff and the UNIX diff. It has been tuned for RDF(S) and compares the ontologies on a structural level, i.e. it compares the data model of different RDF(S) ontologies rather than their syntactic encoding. RDFDiff utilizes change-classification rules, which are intended to serve as a basis for further semantic and structural analysis of the differences. Such rules can be classified as:

• robust categorization of the changes
• evaluation of the compatibility between the versions
• data-transformation or mapping

The change detection in RDFDiff includes:

• Matching anonymous resources and their descriptions
• Detection of renamed resources, based on the definition of the resource
• Detection of renamed resources, based on the usage of the resource (detection of sub-graphs which do not change, given that the old ID/URI has been substituted with the new one)
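The second item, rename detection based on the definition of a resource, can be sketched as follows. The sketch assumes, purely for illustration, that a resource's definition is the set of (predicate, object) pairs of the statements having that resource as subject, and that triples are plain Python tuples. The function names and example URIs are hypothetical, and this is not the RDFDiff implementation.

```python
from collections import defaultdict


def definitions(triples):
    """Group statements by subject: resource -> frozenset of (predicate, object)."""
    grouped = defaultdict(set)
    for subject, predicate, obj in triples:
        grouped[subject].add((predicate, obj))
    return {resource: frozenset(pairs) for resource, pairs in grouped.items()}


def renamed_by_definition(old_triples, new_triples):
    """Pair a deleted and an added resource when their definitions are identical."""
    old_defs, new_defs = definitions(old_triples), definitions(new_triples)
    removed, added = defaultdict(list), defaultdict(list)
    for resource, definition in old_defs.items():
        if resource not in new_defs:
            removed[definition].append(resource)
    for resource, definition in new_defs.items():
        if resource not in old_defs:
            added[definition].append(resource)
    renames = {}
    for definition, old_names in removed.items():
        new_names = added.get(definition, [])
        # Report a rename only when the match is unambiguous on both sides.
        if len(old_names) == 1 and len(new_names) == 1:
            renames[old_names[0]] = new_names[0]
    return renames


if __name__ == "__main__":
    old = {("ex:Auto", "rdf:type", "rdfs:Class"), ("ex:Auto", "rdfs:label", "car")}
    new = {("ex:Car", "rdf:type", "rdfs:Class"), ("ex:Car", "rdfs:label", "car")}
    print(renamed_by_definition(old, new))
```

Detection based on usage would work analogously on the statements in which the resource appears as object; it is omitted here to keep the sketch short.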

The tool resembles PROMPTDiff [NM03a] in the following characteristics:

• change detection based on graphs
• an extensible set of rules to classify a change (called “heuristics” in both PROMPTDiff and RDFDiff)

It differs from PROMPTDiff in being able to compare XML serializations of the RDF(S) ontology as well. RDFDiff detects changes mainly using the RDF-graph model, but it presents the results in a diff-like (textual) way, while trying to preserve the order of the statements in the XML-serialized files. It is focused on RDF(S) and thus it handles its specifics well (e.g. anonymous resource matching). However, the rules model the semantics of the logical language (i.e. Description Logics for DAML+OIL and OWL).



RDFDiff overview. Essentially, the algorithm takes as input two RDF-encoded files, compares them, and produces a list of changes, organized into added, changed and deleted items.
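The shape of this output can be sketched on two sets of triples. The sketch makes several assumptions that do not come from the RDFDiff description: the files have already been parsed into (subject, predicate, object) tuples, an "item" is a resource appearing as subject, and a resource counts as changed when the set of statements about it differs. The function name `diff_resources` is hypothetical.

```python
from collections import defaultdict


def statements_by_subject(triples):
    """Group triples by subject: resource -> frozenset of (predicate, object)."""
    grouped = defaultdict(set)
    for subject, predicate, obj in triples:
        grouped[subject].add((predicate, obj))
    return {resource: frozenset(pairs) for resource, pairs in grouped.items()}


def diff_resources(old_triples, new_triples):
    """Classify resources as added, changed or deleted between two graphs."""
    old_defs = statements_by_subject(old_triples)
    new_defs = statements_by_subject(new_triples)
    shared = old_defs.keys() & new_defs.keys()
    return {
        "added": sorted(new_defs.keys() - old_defs.keys()),
        "changed": sorted(r for r in shared if old_defs[r] != new_defs[r]),
        "deleted": sorted(old_defs.keys() - new_defs.keys()),
    }


if __name__ == "__main__":
    old = {("ex:A", "rdfs:label", "a"), ("ex:B", "rdfs:label", "b")}
    new = {("ex:A", "rdfs:label", "a"), ("ex:B", "rdfs:label", "bee")}
    print(diff_resources(old, new))
```

Because the comparison works on sets of statements rather than on text lines, reordering statements in the serialized file does not show up as a change.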

It is important to mention that RDFDiff does not implement much semantics. The rules may implement the semantics of the data model of the RDF-encoded ontology, but they need not be used. RDFDiff does not need to be run on two ontologies (it could compare arbitrary RDF graphs), but it was developed to aid ontology comparison.

In its first version RDFDiff was conceived as a diff for an old and a new version of an ontology, and thus the description of the algorithm, as well as the user interface of the tool implementing it, refers to the two ontologies being compared as "old" and "new". However, except for the visualization of the results, RDFDiff does not depend on the fact that the RDF graphs are two versions of the same graph, and therefore it could well be used for any two ontologies.

The algorithm treats an RDF-encoded ontology as a sequence of resource definitions, where the definition of a resource R is assumed to be the list of all statements in which R is the subject. This grouping of the statements into resource definitions is optional, because the user might require the original grouping from the source file. If the statements are not grouped into "resource definitions", they are handled as grouped in first-level XML elements. The resource definitions are ordered according to the new file, considering the position of the first statement of each definition.
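The grouping and ordering described above can be sketched as follows. This is a minimal illustration, not the RDFDiff implementation: triples are assumed to be plain (subject, predicate, object) tuples, the function names are hypothetical, and placing definitions that occur only in the old file at the end is an assumption of the sketch.

```python
from collections import OrderedDict

def resource_definitions(triples):
    """Group (subject, predicate, object) triples by subject.

    Groups appear in the order of each subject's first statement.
    """
    definitions = OrderedDict()
    for s, p, o in triples:
        definitions.setdefault(s, []).append((s, p, o))
    return definitions

def order_like(old_defs, new_defs):
    """Reorder the old definitions to follow the order of the new file.

    Subjects found only in the old file keep their relative order and
    are placed at the end (an assumption of this sketch).
    """
    ordered = OrderedDict()
    for subject in new_defs:
        if subject in old_defs:
            ordered[subject] = old_defs[subject]
    for subject, statements in old_defs.items():
        ordered.setdefault(subject, statements)
    return ordered

old = [("ex:B", "rdf:type", "rdfs:Class"),
       ("ex:A", "rdf:type", "rdfs:Class"),
       ("ex:B", "rdfs:label", "b")]
new = [("ex:A", "rdf:type", "rdfs:Class"),
       ("ex:B", "rdf:type", "rdfs:Class")]
old_defs = order_like(resource_definitions(old), resource_definitions(new))
print(list(old_defs))
```

After reordering, definitions of the same resource sit at the same position in both sequences, which is what makes a side-by-side textual presentation possible.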

Statements are considered to be changed if and only if the subject and the predicate match unambiguously across the two versions, i.e. when there is a single statement with that subject and that predicate in each version, while the values may differ. Because the resource definitions of the first file are reordered according to the second file, definitions of the same terms that occur in different places in the two files can be found.
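To make the "changed" criterion concrete, the following sketch classifies statements as added, deleted or changed. It is an illustration under stated assumptions rather than the tool's code: triples are plain tuples, the function name is hypothetical, and a statement counts as changed only when each version holds exactly one statement for the (subject, predicate) pair.

```python
from collections import defaultdict

def diff_statements(old, new):
    """Classify triples as added, deleted or changed."""
    def index(triples):
        by_key = defaultdict(set)
        for s, p, o in triples:
            by_key[(s, p)].add(o)
        return by_key

    old_idx, new_idx = index(old), index(new)
    added, deleted, changed = [], [], []
    for key in sorted(set(old_idx) | set(new_idx)):
        old_vals = old_idx.get(key, set())
        new_vals = new_idx.get(key, set())
        if len(old_vals) == 1 and len(new_vals) == 1:
            # Unambiguous match on subject and predicate.
            (o_old,), (o_new,) = old_vals, new_vals
            if o_old != o_new:
                changed.append((key[0], key[1], o_old, o_new))
            continue
        # Ambiguous or one-sided: fall back to added/deleted.
        added += [(key[0], key[1], o) for o in sorted(new_vals - old_vals)]
        deleted += [(key[0], key[1], o) for o in sorted(old_vals - new_vals)]
    return added, deleted, changed

old = [("ex:Cat", "rdfs:subClassOf", "ex:Animal"),
       ("ex:Cat", "rdfs:label", "cat"),
       ("ex:Dog", "rdfs:label", "dog")]
new = [("ex:Cat", "rdfs:subClassOf", "ex:Mammal"),
       ("ex:Cat", "rdfs:label", "cat"),
       ("ex:Cow", "rdfs:label", "cow")]
added, deleted, changed = diff_statements(old, new)
```

When a subject and predicate occur with several values in either version, the sketch reports the differences as additions and deletions, because no unambiguous pairing exists.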

Mapping language The change-classification rules are defined via a simple RDFS schema, which contains two classes and a few properties. The classes are Rule and Triplet. Each rule is a set of triplets defining relationships between some triples via common variables used in place of 'subject', 'predicate' or 'object'. The main task is to find all possible solutions (that is, all possible bindings of the variables used in the triplets) for which there exist triples that match the patterns of the whole set of triplets. Then all solutions found in the older version of the resource definition are compared against those found in its newer version, and the equal pairs are removed.
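The search for solutions can be pictured as a small pattern-matching join. The sketch below is only an illustration: the function names are hypothetical, triples and triplets are plain tuples, and terms starting with "$" are treated as variables.

```python
def match_triplet(triplet, triple, binding):
    """Extend a binding so that the triplet pattern matches the triple.

    Returns the extended binding, or None if the match fails.
    """
    extended = dict(binding)
    for term, value in zip(triplet, triple):
        if term.startswith("$"):
            # A variable must keep the value it was first bound to.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def solutions(triplets, triples):
    """All bindings under which every triplet matches some triple."""
    bindings = [{}]
    for triplet in triplets:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_triplet(triplet, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

def remove_equal_pairs(old_solutions, new_solutions):
    """Drop solutions present in both versions; return what is left."""
    old_left = [s for s in old_solutions if s not in new_solutions]
    new_left = [s for s in new_solutions if s not in old_solutions]
    return old_left, new_left

rule = [("$X", "rdfs:subClassOf", "$Z")]
old = [("ex:Cat", "rdfs:subClassOf", "ex:Animal")]
new = [("ex:Cat", "rdfs:subClassOf", "ex:Mammal")]
old_left, new_left = remove_equal_pairs(solutions(rule, old),
                                        solutions(rule, new))
```

The solutions that remain after removing equal pairs are the ones a rule has to classify.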

Each instance of the Rule class can be connected with several instances of the Triplet class by the property use, which is defined as:

<br />

The following properties indicate which variable is to be observed for possible changes and which label is to be generated when a solution is added, removed or changed:



<br />

The instances of the Triplet class can be connected with predicates specifying a subject, a predicate and an object of a triple for matching. Each value of these predicates that starts with "$" is treated as the name of a variable; all other values are treated as resource URIs or literals.

<br />

The solutions are compared by the values bound to the variables used: a resource is compared by its URI, a literal is compared lexically, and an anonymous resource is compared by the object value of the triple with the predicate "daml:onProperty". If there is at least one such solution in either version, the rule is applicable and its 'label' is added. If there are solutions that differ only in the value of the 'checkVar' variable, the generated 'label' is the one given by the literal connected to that rule instance through a 'rule:onChanged' property. If there are no solutions in the older version, then for each 'extra' solution that is left unmatched in the newer version, a 'label' is generated from the literal connected to the rule through a 'rule:onAdded' property. Finally, for unmatched solutions left over from the older version, the label is generated from the literal connected to the rule through a 'rule:onRemoved' property.
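The label generation can be sketched as below. Several points are assumptions made for illustration only: solutions are dictionaries of variable bindings, the function name `classify` is hypothetical, the observed variable inside a label template is replaced by its bound value, and every unmatched solution in the newer version is reported as added (the text states this for the case where the older version has no solutions).

```python
def classify(old_solutions, new_solutions, check_var, labels):
    """Generate the labels of one rule from its solutions in both versions.

    labels maps "onChanged", "onAdded" and "onRemoved" to label templates.
    """
    def rest(solution):
        # Bindings of all variables except the observed one.
        return tuple(sorted((k, v) for k, v in solution.items()
                            if k != check_var))

    old_left = [s for s in old_solutions if s not in new_solutions]
    new_left = [s for s in new_solutions if s not in old_solutions]
    result = []
    for new in list(new_left):
        partner = next((o for o in old_left if rest(o) == rest(new)), None)
        if partner is not None:
            # Same bindings except for the observed variable: a change.
            old_left.remove(partner)
            new_left.remove(new)
            result.append(labels["onChanged"].replace(check_var, new[check_var]))
    result += [labels["onAdded"].replace(check_var, s[check_var])
               for s in new_left]
    result += [labels["onRemoved"].replace(check_var, s[check_var])
               for s in old_left]
    return result

labels = {"onChanged": "subClassOf.changed to $Z",
          "onAdded": "subClassOf.added to $Z",
          "onRemoved": "subClassOf.removed to $Z"}
old_s = [{"$X": "ex:Cat", "$Z": "ex:Animal"}]
new_s = [{"$X": "ex:Cat", "$Z": "ex:Mammal"},
         {"$X": "ex:Cow", "$Z": "ex:Animal"}]
generated = classify(old_s, new_s, "$Z", labels)
```

Pairing solutions on everything except the observed variable is what separates a change from an unrelated addition plus removal.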

An example rule to detect the change of the parent class follows:

[Rule listing: the RDF/XML markup is not preserved. The remaining values, in order, are: $X; $Z; $Z; subClassOf.changed to $Z; subClassOf.added to $Z; subClassOf.removed to $Z]

Application to use-cases When applied to two ontologies, RDFDiff will find structural similarities. It does not utilize semantic (S-Match) or linguistic (PROMPT) similarities.



The change-classification rules could be a powerful declarative syntax for the automatic detection of changes or for the automatic creation of a mapping between two ontologies. Instance transformation is also possible, but all of these applications require extensions of the allowed actions in the rules.

Tool support The RDFDiff tool is web-based. It implements the RDFDiff algorithm and, in addition, allows one to specify some handy compare options:

• pairs of namespaces treated as equal;

• an ignore-list of properties (e.g. ignoring rdfs:comment).
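One way to picture these two options is as a normalization pass applied to both triple sets before they are compared. The sketch below is hypothetical: the function and parameter names are not taken from the tool, and rewriting every term by string prefix (including literal objects) is a simplification.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

def normalize(triples, equal_namespaces=(), ignored_properties=()):
    """Apply both compare options to a list of triples before diffing.

    equal_namespaces: (alias, canonical) pairs of namespace URIs.
    ignored_properties: property URIs whose statements are dropped.
    """
    def rewrite(term):
        for alias, canonical in equal_namespaces:
            if term.startswith(alias):
                return canonical + term[len(alias):]
        return term

    return [(rewrite(s), rewrite(p), rewrite(o))
            for s, p, o in triples
            if rewrite(p) not in ignored_properties]

triples = [
    ("http://old.example/ns#Cat", RDFS + "comment", "A cat"),
    ("http://old.example/ns#Cat", RDFS + "subClassOf",
     "http://old.example/ns#Animal"),
]
normalized = normalize(
    triples,
    equal_namespaces=[("http://old.example/ns#", "http://new.example/ns#")],
    ignored_properties={RDFS + "comment"},
)
```

With such a pass, a pure namespace migration or an edited comment no longer shows up as a change.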

Figure 4.10: RDFDiff - an example<br />

Summary RDFDiff is a diff-like tool oriented to the comparison of XML-serialized RDF(S) graphs. It could be used for change detection between two versions of an ontology, or for a comparison of two arbitrary ontologies. Although not directly suited for mediation use-cases, its change-classification rules allow for applications in the automatic creation of mappings or in instance transformation.

4.1.8 OntoMerge<br />

Dou et al. [DMQ02] introduce an approach to ontology mediation called "ontology translation by ontology merging and automated reasoning". In this approach, ontologies are merged by taking the union of both ontologies, where all terms are kept apart by their different namespaces. So-called Bridging Axioms are used to connect the overlapping part of the two ontologies.

In general, when merging ontologies, one would either create a new namespace for the merged ontology or import one ontology into the other, so that the merged ontology uses the namespace of the importing ontology. Ending up with an ontology that uses different namespaces in its definitions can be very confusing for the user, since an ontology is intended to be shared among multiple parties. Furthermore, the bridging axioms in the merged ontology might also be very confusing for the user, since they serve no other purpose than linking together related terms in the ontology. Thus, the merged ontology contains a lot of clutter, which makes the ontology hard to understand and hard to use. The clutter in the ontology consists of: (1) terms with different namespaces, (2) similar and equivalent terms that coexist in the ontology and (3) bridging axioms between the related terms. These three factors impede usability and especially sharing of the ontology.

On the other hand, Dou et al. [DMQ02] do not propose to use the merged ontologies as such, but merely to use them for three different tasks:

1. Dataset translation (cf. instance transformation in [dBP04]). Dataset translation is the problem of translating a set of data (instances) from one representation to the other.

2. Ontology extension generation. This is the problem of generating an extension O2s, given two related ontologies O1 and O2 and an extension (subontology) O1s of ontology O1. The example given by the authors is to generate a WSDL extension based on an OWL-S description of the corresponding Web Service.

3. Querying different ontologies. This relates very much to the query rewriting described in [dBP04]. However, query rewriting is a technique for solving the problem of querying different ontologies, whereas Dou et al. [DMQ02] merely mention the problem.

As we have also suggested in [dBP04], OntoMerge uses mappings between ontologies in order to enable the translation. In fact, the ontology translation (except for the extension generation) can be seen as run-time mediation [dBP04].

[DMQ02] presents an internal representation for the ontologies, called Web-PDDL, which is a typed first-order logic language. The import and export of DAML+OIL and OWL are already supported. Because Web-PDDL can capture many different ontology languages, importers and exporters for other languages could be written as well.



Dataset translation OntoMerge [DMQ02] performs dataset translation in two distinct steps. First, given the source dataset (a set of facts) and the merged ontology, all possible inferences are drawn from the source facts. Secondly, the results are projected on the target vocabulary, retaining only the results expressed in terms of the target ontology. These two steps guarantee that a maximal translation is performed, with respect to the merged ontology and the source dataset.
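The two steps can be illustrated with a deliberately small sketch. It is not how OntoMerge works internally: instead of a first-order theorem prover over Web-PDDL it uses naive forward chaining, axioms are restricted to the hypothetical form "body predicate implies head predicate over the same arguments", and all names and prefixes are invented for the example.

```python
def translate(source_facts, axioms, target_namespace):
    """Two-step dataset translation: infer everything, then project.

    source_facts: set of tuples (predicate, arg1, ...).
    axioms: (body_predicate, head_predicate) pairs taken from the
            merged ontology, including the bridging axioms.
    """
    facts = set(source_facts)
    changed = True
    while changed:  # step 1: forward chaining up to a fixpoint
        changed = False
        for body, head in axioms:
            for predicate, *args in list(facts):
                derived = (head, *args)
                if predicate == body and derived not in facts:
                    facts.add(derived)
                    changed = True
    # step 2: keep only facts expressed in the target vocabulary
    return {f for f in facts if f[0].startswith(target_namespace)}

axioms = [("src:Car", "tgt:Automobile"),      # bridging axiom
          ("src:owner", "tgt:hasOwner"),      # bridging axiom
          ("tgt:Automobile", "tgt:Vehicle")]  # axiom of the target ontology
facts = {("src:Car", "c1"), ("src:owner", "c1", "alice")}
translated = translate(facts, axioms, "tgt:")
```

Because inference runs to a fixpoint before projection, consequences that are reachable only through several axioms (here `tgt:Vehicle`) are part of the translation as well.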

In their practical evaluation of the system, the authors work only with very small datasets consisting of several thousand facts. The fact that they use a theorem prover leaves open questions about scalability to large numbers of facts.

Ontology extension generation Suppose we have two related ontologies O1 and O2 and a subontology O1s of O1. It is now possible, using the relationships between the two ontologies, to automatically generate a subontology O2s of O2 which corresponds to O1s.

The subontology O2s will contain new predicates with identifiers generated using Skolem constants. The new predicates are made sub-predicates of the existing predicates.

The major disadvantage of this approach to ontology extension generation, identified by the authors, is that the generated subontology only contains subproperty axioms, whereas many subontologies might be specified using general axioms.
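A toy sketch of this idea follows. It is purely illustrative and rests on assumptions that the text does not confirm: the subontology is reduced to a map from each new predicate to its parent predicate, the bridging axioms are reduced to a one-to-one predicate correspondence, and fresh identifiers stand in for Skolem constants. All names are hypothetical.

```python
from itertools import count

def generate_extension(sub_predicates, bridges, prefix="o2s:sk"):
    """Generate sub-predicates for O2s from the sub-predicates of O1s.

    sub_predicates: {O1s predicate: its parent predicate in O1}
    bridges: {O1 predicate: corresponding O2 predicate}
    Returns {generated O2s predicate: its parent predicate in O2}.
    """
    fresh = count(1)
    extension = {}
    for predicate, parent in sub_predicates.items():
        if parent in bridges:
            # A fresh identifier plays the role of a Skolem constant.
            extension[f"{prefix}{next(fresh)}"] = bridges[parent]
    return extension

o1s = {"o1s:deliversBook": "o1:delivers"}
bridges = {"o1:delivers": "o2:ships"}
o2s = generate_extension(o1s, bridges)
```

The result contains nothing but sub-predicate relationships, which mirrors the limitation noted by the authors.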

Querying through Different Ontologies Querying is done in OntoMerge by selecting the merged ontology that merges the query ontology with the other ontology. Then, a query selection and reformulation module (not described in detail) is used to select and reformulate subqueries. Each subquery is executed on its respective knowledge base, and the results are combined.

In fact, what we call an ontology mapping is very similar to a set of bridging axioms in [DMQ02]. However, we do not presume that the source and target ontologies use the same language as the mapping, whereas OntoMerge requires the merged ontology to consist of the source and target ontologies and the bridging axioms.

A major drawback of OntoMerge is that bridging axioms need to be written in a first-order language. Only very few people are familiar with first-order logic.

4.1.9 OMEN<br />

OMEN (Ontology Mapping Enhancer) [MNJ04] is a tool for describing mappings with probabilities and inferring new mappings by means of Bayesian Network inference mechanisms. The motivation for using Bayesian Networks for this purpose is that mappings, especially when they are discovered automatically by heuristics or machine learning techniques, are imprecise and carry a certain degree of uncertainty. Experts, too, are sometimes unsure about the exact match between ontological elements of different ontologies. Describing mappings with probabilities allows them to assign a certainty rating to a match.

The inference of new mappings happens by means of meta-rules, based on the semantics of the ontology relations, that express how each mapping affects other related mappings. These meta-rules are used by the Bayesian Network.

Within this tool, a simple ontology model is assumed. Similar to RDF Schema, only the following components are used to express ontologies:

• Classes are concepts in the domain that are organized in a hierarchy of subclass-superclass relationships with multiple inheritance.

• Properties describe attributes of classes and relationships between classes. Properties are considered to have one or more domains and one or more ranges. Domains are classes to which the property can be applied, i.e. the union of the possible domains is taken as the final domain. Ranges restrict the classes for the values of the property, i.e. the intersection of the possible ranges is taken as the final range.
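The treatment of multiple domains and ranges can be made concrete with a short sketch. As an assumption for illustration, each class is represented by the set of its instances, and the function names are hypothetical.

```python
def final_domain(domains):
    """Union of all declared domains (each a set of instances)."""
    return set().union(*domains)

def final_range(ranges):
    """Intersection of all declared ranges (each a set of instances)."""
    ranges = list(ranges)
    return set.intersection(*ranges) if ranges else set()

person = {"alice", "bob"}
company = {"acme"}
employee = {"bob", "carol"}

# A property declared with domains Person and Company applies to
# instances of either class.
domain = final_domain([person, company])
# A property declared with ranges Person and Employee takes only values
# that belong to both classes.
value_range = final_range([person, employee])
```

The asymmetry follows from the wording of the model: each domain adds classes the property can be applied to, while each range adds a restriction on its values.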

Construction of the Bayesian Network The nodes in the Bayesian Network correspond to individual matches, i.e. matches between individual pairs of classes or properties from different ontologies. Arrows in the Bayesian Network represent the influences between the nodes. Each node is associated with a conditional probability table (CPT) that represents the influence of the node's parents on the node itself.

Note that not all possible pairs of classes or properties can be represented as nodes
in the Bayesian Network, because there are too many pairs and the inference would not
be efficient. Therefore, the Bayesian Network is pruned. For details on the pruning see
[MNJ04]. The number of parents of each node is also limited, because too many parents
significantly increase the necessary computations and make the inference intractable.
Furthermore, all cycles that may occur among the arrows are broken by rejecting the edges
from the parents whose matching information is minimal.<br />
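One possible reading of the cycle-breaking step is sketched below, assuming Java 8 or later. It is not taken from OMEN: the class `CycleBreakingSketch`, the `Edge` type and the per-edge `weight` (standing in for the "matching information" of the parent) are hypothetical. Edges are considered in order of decreasing weight, and an edge is rejected when it would close a directed cycle, so the weakest edge of each cycle is the one that is dropped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not OMEN code: one way to keep the influence graph acyclic.
final class CycleBreakingSketch {

    /** Directed influence edge; weight is an assumed stand-in for "matching information". */
    static final class Edge {
        final int from;
        final int to;
        final double weight;

        Edge(int from, int to, double weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    /** Keeps edges in order of decreasing weight and rejects any edge that would close a cycle. */
    static List<Edge> breakCycles(int nodeCount, List<Edge> candidates) {
        List<Edge> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Edge e) -> e.weight).reversed());

        List<List<Integer>> successors = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            successors.add(new ArrayList<>());
        }

        List<Edge> kept = new ArrayList<>();
        for (Edge e : sorted) {
            // Adding from -> to closes a cycle exactly when "from" is already reachable from "to".
            if (!reachable(successors, e.to, e.from, new boolean[nodeCount])) {
                successors.get(e.from).add(e.to);
                kept.add(e);
            }
        }
        return kept;
    }

    /** Depth-first search over the edges kept so far. */
    private static boolean reachable(List<List<Integer>> successors, int start, int goal, boolean[] seen) {
        if (start == goal) {
            return true;
        }
        seen[start] = true;
        for (int next : successors.get(start)) {
            if (!seen[next] && reachable(successors, next, goal, seen)) {
                return true;
            }
        }
        return false;
    }
}
```

For three nodes with candidate edges 0→1 (0.9), 1→2 (0.8) and 2→0 (0.1), the sketch keeps the first two edges and rejects the third.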

For inferencing, the nodes whose initial probability is above a given threshold
are considered to be part of the evidence. The CPTs are computed by means of a set
of generic meta-rules that enable the automatic generation of CPTs for each particular
pair of ontologies. The implementation of the OMEN system is parametrized with respect
to the meta-rules: meta-rules can be added and removed in order to evaluate which ones
work best for a particular knowledge model.<br />
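The evidence selection can be sketched as a simple filter. This is an illustration only: the class name, the array representation of the nodes and the threshold value in `main` are assumptions, not details of the OMEN implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not OMEN code: picks the nodes that count as evidence.
final class EvidenceSelectionSketch {

    /** Returns the indices of all nodes whose initial probability is strictly above the threshold. */
    static List<Integer> evidenceNodes(double[] initialProbabilities, double threshold) {
        List<Integer> evidence = new ArrayList<>();
        for (int i = 0; i < initialProbabilities.length; i++) {
            if (initialProbabilities[i] > threshold) {
                evidence.add(i);
            }
        }
        return evidence;
    }

    public static void main(String[] args) {
        // Made-up initial probabilities for three candidate matches.
        System.out.println(evidenceNodes(new double[] {0.95, 0.4, 0.8}, 0.75));
    }
}
```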

Afterwards, the inference on the Bayesian Network can be started; the output is a
new set of matches.<br />



The meta-rules used in the implementation include, for example, the following:<br />

• Say there are two concepts C and C′ that match, and there is a relationship between
C and another concept C1 in the ontology O and a relationship between C′ and C′1 in
the ontology O′. Furthermore, these two relationships match. Then the probability
of the match between C1 and C′1 is increased.<br />

• Say there are two concepts C and C′ that match, and there is a relationship between
C and another concept C1 in the ontology O and a relationship between C′ and C′1
in the ontology O′. Furthermore, these two relationships do not match. Then the
probability of the match between C1 and C′1 is decreased.<br />

• Say there are two properties that match and each of them has a single range. Then
the probability of a match between the classes that represent the ranges is increased.<br />

• Say there are two properties that match and the first property has a range that is a
union of the classes C1 and C2. The other property has a single range corresponding
to C′. Then the probability that C1 is a specialization of C′ can be increased.
Analogously, the probability that C2 is a specialization of C′ can be increased.<br />

• Say there are mappings between the superconcepts of two concepts, each belonging
to a different ontology, and between all of their siblings. Then the probability of a
match between the two remaining concepts is increased.<br />
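The first two meta-rules can be read as a recipe for filling in the CPT of the child node (the pair C1, C′1) given its parent node (the pair C, C′). The sketch below illustrates that reading only. The class name is hypothetical, and the probabilities 0.9, 0.1 and 0.5 are made-up values, not figures from OMEN or [MNJ04].

```java
// Hypothetical sketch, not OMEN code: a generic meta-rule turned into CPT entries.
final class MetaRuleSketch {

    /**
     * Returns { P(child matches | parent matches), P(child matches | parent does not match) }
     * for the child pair (C1, C1') whose parent is the pair (C, C').
     */
    static double[] childCpt(boolean relationshipsMatch) {
        // Parent matches: matching relationships raise the probability of the
        // child match (first meta-rule), non-matching relationships lower it (second).
        double givenParentMatch = relationshipsMatch ? 0.9 : 0.1;
        // Parent does not match: in this sketch the rule gives no evidence either way.
        double givenParentMismatch = 0.5;
        return new double[] {givenParentMatch, givenParentMismatch};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(childCpt(true)));
        System.out.println(java.util.Arrays.toString(childCpt(false)));
    }
}
```

Because the rules are generic, the same function can produce CPTs for any pair of ontologies; only the structural test (do the relationships match?) depends on the input.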

Note that in the OMEN system probabilistic influences are combined as follows: if
a node in the Bayesian Network has two parents, the conditional probability tables are
combined for the child under the assumption that the two parents are independent, i.e.<br />

P(N|P1, P2) = P(N|P1) P(N|P2)<br />

When the matches of the two parent pairs influence each other, this assumption does
not hold. However, the system obtained encouraging results even with this simplifying
assumption.<br />
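As a small worked illustration of the combination rule, the sketch below multiplies two single-parent conditionals. The class and method names are hypothetical and the input values are made up; this is not code from the OMEN implementation.

```java
// Hypothetical sketch, not OMEN code: the independence assumption as a product.
final class CptCombinationSketch {

    /** Combines two single-parent conditionals as P(N|P1,P2) = P(N|P1) * P(N|P2). */
    static double combine(double pGivenP1, double pGivenP2) {
        if (pGivenP1 < 0 || pGivenP1 > 1 || pGivenP2 < 0 || pGivenP2 > 1) {
            throw new IllegalArgumentException("probabilities must lie in [0, 1]");
        }
        return pGivenP1 * pGivenP2;
    }

    public static void main(String[] args) {
        // Illustrative values only: P(N|P1) = 0.8 and P(N|P2) = 0.5.
        System.out.println(combine(0.8, 0.5));
    }
}
```

With P(N|P1) = 0.8 and P(N|P2) = 0.5 the combined entry is 0.8 · 0.5 = 0.4.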

Experiments were conducted in order to see whether the inference of new mappings
works. For this purpose, two ontologies expressed in RDF using RDF Schema were
used. The experiments showed that new mappings can be inferred by OMEN: e.g. when
only 3 out of 11 matches were given, up to 7 missing matches could be generated.<br />

4.1.10 WSMT Data mediation module<br />

The Web Services Modelling Toolkit (WSMT) [Ker05] has different modules to model
Semantic Web Services based on the WSMO framework [DR05]. The data mediation
module [Moc05] presented in this section contains three sub-modules or components: a
graphical user interface for defining the mappings, a component that reads the mappings
from the storage and generates the appropriate mapping rules, and an environment that
provides the means for executing the mapping rules. The first module represents the
design-time component, while the latter two modules represent the run-time component.
Figure 4.11 shows how these components interact.<br />

Overview of the Data Mediation Module<br />

Design Time Component<br />

Figure 4.11: Overview of the WSMT Data Mediation Module<br />

The mapping tools offer graphical interfaces in order to assist the user in creating
mappings and to reduce his or her effort to simple choices and validations.<br />

The graphical interface is built using Java 1.5<sup>13</sup> and the default ontology language
supported is WSML. Future versions are expected to provide various wrappers that can
translate ontologies expressed in other languages into WSML (a simplified one for
Flora-2 already exists). The object model behind the mapping tool is fully compatible
with the WSMO API and WSMO4J (and this applies to the run-time component as well).
The WSMO API provides a set of interfaces for manipulating WSMO entities, including
ontologies. WSMO4J provides an implementation of these interfaces and a parser for
WSML<sup>14</sup>. The mapping tool was integrated as a plug-in in the Web Service Modelling
Toolkit, which also offers as plug-ins a WSML editor and a WSMX invoker (see Figure
4.12). The editor can be used by domain experts to operate on the ontologies they are
mapping, and the framework offers basic synchronization functions such as reloading the
ontologies into the mapping tool or invalidating mapping rules affected by changes in the
ontologies. The WSMX invoker can be used to test mappings with run-time components
deployed as Web Services, directly from the mapping tool (more details about this are
provided in section 4.1.10).<br />

13 see http://java.sun.com for more information<br />

14 for more details see https://sourceforge.net/projects/wsmo4j<br />

Figure 4.12: Screenshots of WSMT Plug-ins (from back to front: the Preferences Panel,
WSMX Invoker, WSML Editor and Mapping Tool)<br />

As the mapping tool and WSMT are two of the efforts surrounding WSMO
and WSMX, they are part of the WSMX open source project<sup>15</sup>. Windows and Linux
installers for WSMT can be found on SourceForge<sup>16</sup>.<br />

15 see https://sourceforge.net/projects/wsmx<br />

16 downloads available at: http://sourceforge.net/project/showfiles.php?group_id=113321<br />



Method Summary<br />

Map mediate(Ontology sourceOntology, Ontology targetOntology, Set data)<br />
Transforms a set of source ontology instances into instances of the target ontology.<br />

List mediate(Ontology sourceOntology, Ontology targetOntology, Identifiable data)<br />
Transforms a given source ontology instance into instances of the target ontology.<br />

StringBuffer mediate(Ontology sourceOntology, Ontology targetOntology, StringBuffer payload)<br />
Transforms source ontology instances into instances of the target ontology. The payload
represents a WSML document containing the instances to be mediated. It is parsed, and
after the mediation takes place a new WSML document containing the target instances is
created.<br />

Table 4.1: Interfaces of the Run-time Mediator<br />

The mappings are stored in an external storage, in this case a relational database, from
where they can be loaded by the mapping rules generator module. By means of the
graphical interface, the user can also load existing mappings from the external storage for
further refinement or as support in computing suggestions.<br />

Run-time component<br />

The Mapping Rules Creator implements the grounding mechanism described in Section
4.2, while the Rule Execution Environment has the role of executing the mapping rules
against the incoming source ontology instances. The Run-time component is designed to
be part of the WSMX architecture [MZ05], and as a consequence it offers well-defined
interfaces explaining how it can be invoked. These interfaces are presented in Table 4.1.<br />

The first two methods refer to the ontological entities in terms of WSMO API
objects, while the last method manipulates WSML documents embedded in StringBuffer
objects.<br />
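The three signatures of Table 4.1 can be written down as a Java interface, as sketched below (Java 8 or later). Several things here are assumptions made for illustration: `Ontology` and `Identifiable` are placeholder stand-ins for the WSMO API types of the same name, the generic type arguments are guesses (the table lists raw `Map`, `List` and `Set`), and `RunTimeMediatorSketch`, `ToyMediator` and `MediatorDemo` are hypothetical names. The toy implementation does not execute mapping rules; it only shows how the overloads relate to each other.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Placeholder stand-ins (assumptions) for the WSMO API types of the same name.
interface Identifiable {
    String getId();
}

interface Ontology extends Identifiable {
}

// The three overloads of Table 4.1; the generic type arguments are assumed.
interface RunTimeMediatorSketch {
    Map<Identifiable, List<Identifiable>> mediate(Ontology sourceOntology, Ontology targetOntology, Set<Identifiable> data);

    List<Identifiable> mediate(Ontology sourceOntology, Ontology targetOntology, Identifiable data);

    StringBuffer mediate(Ontology sourceOntology, Ontology targetOntology, StringBuffer payload);
}

// Toy implementation: "mediation" only prefixes the instance id with the
// target ontology id. A real mediator would execute mapping rules instead.
final class ToyMediator implements RunTimeMediatorSketch {

    @Override
    public List<Identifiable> mediate(Ontology sourceOntology, Ontology targetOntology, Identifiable data) {
        List<Identifiable> result = new ArrayList<>();
        result.add(MediatorDemo.instance(targetOntology.getId() + "#" + data.getId()));
        return result;
    }

    @Override
    public Map<Identifiable, List<Identifiable>> mediate(Ontology sourceOntology, Ontology targetOntology, Set<Identifiable> data) {
        // The set variant applies the single-instance variant to every element.
        Map<Identifiable, List<Identifiable>> result = new LinkedHashMap<>();
        for (Identifiable instance : data) {
            result.put(instance, mediate(sourceOntology, targetOntology, instance));
        }
        return result;
    }

    @Override
    public StringBuffer mediate(Ontology sourceOntology, Ontology targetOntology, StringBuffer payload) {
        // Parsing and serializing WSML needs a parser such as the one in WSMO4J.
        throw new UnsupportedOperationException("WSML parsing is outside this sketch");
    }
}

final class MediatorDemo {
    static Identifiable instance(String id) {
        return () -> id;
    }

    static Ontology ontology(String id) {
        return () -> id;
    }

    public static void main(String[] args) {
        RunTimeMediatorSketch mediator = new ToyMediator();
        List<Identifiable> mediated = mediator.mediate(ontology("src"), ontology("tgt"), instance("i1"));
        System.out.println(mediated.get(0).getId());
    }
}
```

The split mirrors the text: the first two overloads work on object-model entities, while the third takes and returns serialized WSML.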

Outside the WSMX architecture, the Run-time component can be used in the following
ways:<br />

• as a standalone application able to connect to the provided mapping storage and
to perform mediation of instances provided as WSML documents. It offers a small
graphical interface where the user can set the source and the target ontology, provide
the data to be mediated and retrieve the mediated data.<br />

Figure 4.13: The Graphical Interface of the Stand-alone Run-time Mediator<br />

• as a deployed Web Service that can be invoked with the source and target ontology
IDs and the data to be mediated, and that returns the mediated data. The mapping tool
can directly invoke such Web Services by using the WSMT Invoker.<br />



4.1.11 DOME mapping module<br />

The Distributed Ontology Management Environment (DOME [Hen05])<sup>17</sup> is an ongoing
project aiming at building a suite of tools to manage distributed ontologies. Available as
a stand-alone application or as an Eclipse plug-in, it includes three modules: an ontology
editor, a mapping editor, and a versioning editor. DOME is developed using concepts from
Human Interface Design and ontology engineering to make ontology management
intuitive and easy to use. Figure 4.14 presents a screenshot of the software.<br />

Figure 4.14: Screenshot of the Distributed Ontology Management Environment (DOME)<br />

Figure 4.15: DOME screenshot<br />

DOME is based on WSMO4J for editing WSML ontologies.<br />

The mapping editor is based on an abstract ontology mapping language [SdB05] and
presents an original mapping document tree view. It allows mapping between classes,
attributes, relations and instances, and exploits the capability of the abstract mapping
language to relate many entities to many entities by operator selection. It is, to our
knowledge, the only mapping tool providing this capability.<br />

17 The software is available at http://dome.sourceforge.net<br />



In order to create a mapping, the user selects one or more entities in the source ontology
and drags and drops them onto entities in the target ontology. If several entities
were selected, they can be combined via a selected operator. It is then possible to
map, for example, the conjunction of the male and female classes to the person class using a
single mapping rule. The mapping document tree editor makes it possible to add
conditions to the rules; for example, mapping person to female requires the condition that
the gender attribute of person is equal to "female". Figure 4.16 presents the mapping
document tree editor.<br />
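The person-to-female example can be sketched as a mapping rule guarded by an attribute condition. Everything in the sketch is hypothetical: DOME's mapping language and API are not shown in this survey, so the `Instance` type and the method name are invented for illustration (Java 8 or later).

```java
import java.util.Optional;

// Hypothetical sketch, not DOME code: a class mapping with an attribute condition.
final class ConditionalMappingSketch {

    /** Minimal instance: a class name plus a single "gender" attribute. */
    static final class Instance {
        final String className;
        final String gender;

        Instance(String className, String gender) {
            this.className = className;
            this.gender = gender;
        }
    }

    /** Maps a source "person" to the target class "female" only if gender equals "female". */
    static Optional<Instance> mapPersonToFemale(Instance source) {
        boolean conditionHolds = "person".equals(source.className)
                && "female".equals(source.gender);
        if (conditionHolds) {
            return Optional.of(new Instance("female", source.gender));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(mapPersonToFemale(new Instance("person", "female")).isPresent());
        System.out.println(mapPersonToFemale(new Instance("person", "male")).isPresent());
    }
}
```

Without the condition, the rule would wrongly map every person instance to female; the condition restricts it to the instances for which the mapping is valid.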

Figure 4.16: Mapping document tree editor<br />

It is interesting to note that the mapping module is fully integrated into the ontology
management environment: to create a mapping, the user only has to open another
ontology and can then directly relate the entities.<br />



4.2 Integrated Systems<br />

4.2.1 InfoSleuth<br />

Summary InfoSleuth [FNPB99, NFK+00] is an agent-based system which supports the
construction of complex ontologies from smaller component ontologies, so that tools
tailored for one component ontology can be used in many application domains. The purpose
of the system is to provide an interface to very dynamic data sources which can appear in
and disappear from the system at any given time. Examples of reused ontologies include units
of measure, chemistry knowledge, geographic metadata, and so on. Mappings are explicitly
specified among these ontologies as relationships between concepts in one ontology and
related concepts in other ontologies.<br />

All mappings between ontologies are maintained by a special class of agents known
as resource agents. A resource agent encapsulates a set of information about the ontology
mapping rules, and presents that information to the other agents in terms of the concepts of
one or more ontologies (called domain ontologies). All mapping is encapsulated within the
resource agents. Ontologies are represented in OKBC (Open Knowledge Base Connectivity)
[CFF+98] format and stored in an OKBC server by a special class of agents called
ontology agents, which provide ontology specifications to users (for request formulation)
and to resource agents (for mapping).<br />

The InfoSleuth architecture [NFK+00] (Figure 4.17) consists of a number of different
types of agents. User agents and resource agents are the main agents in the system.
User agents request information to fulfil the user's information needs and resource agents
provide that information. The remaining agents in the system provide the "glue" (or
mediation) between the two.<br />

• The user agents act on behalf of the user and maintain the user's state. They provide
a system interface that enables users to communicate with the system.<br />

• The resource agents wrap and activate databases and other repositories of information.
They translate queries and data stored in external repositories between their
local forms and their InfoSleuth forms. There are resource agents for different types
of data sources, including relational databases, flat files, and images.<br />

• Service agents provide internal information for the operation of the agent system.
Service agents include Broker agents, which collectively maintain the information
the agents advertise about themselves, Ontology agents, which maintain a knowledge
base of the different ontologies used for specifying requests, and Monitor
agents, which monitor the operation of the system.<br />

• Query and analysis agents fuse and/or analyze information from one or more resources
into single (one-time) results. Query and analysis agents include Multi-resource
query agents, which process queries that span multiple data sources, Deviation
detection agents, which monitor streams of data to detect deviations, and
other data mining agents.<br />

Multi-resource query agents query multiple heterogeneous resources. The queries
posed to the agent are specified in terms of some domain ontology. In InfoSleuth,
applications can use several domain ontologies. However, a query is always posed
over one domain-specific ontology.<br />

Figure 4.17: The InfoSleuth architecture<br />

• Planning and temporal agents guide the request through some processing which may take place over a period of time, such as a long-term plan, a workflow, or the detection of complex events. Planning and temporal agents include Subscription agents, which monitor how a set of information (in a data source) changes over time, Task planning and execution agents, which plan the processing of user requests in the system, and Sentinel agents, which monitor the information and event stream for complex events.

• Value mapping agents provide value mappings, i.e. mappings between values, among equivalent representations of the same information.

InfoSleuth uses as its query language a variant of SQL in which a query consists of a select, a from and a where clause. Functions are allowed in the select and where clauses, and the syntax is consistent with that used in popular relational database management systems. For user queries, a layer on top of this query language has been developed, called Template-based Query Markup Language (TQML), which uses templates and materialized views to aid the user in creating queries.

When agents come online, they advertise their capabilities to a specific broker agent in terms of the InfoSleuth ontology. This ontology is a special ontology used for the advertisement and querying of agents. When a query is posed to a broker agent, the brokering process is initiated. First, syntactic matching is done, for example to determine which resource agents speak the desired language. Then, semantic matching is done in order to find out which resources contain information about the desired concepts. Finally, pragmatic matching is done to restrict the set of resources to those that, for example, have the correct access permissions.
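The three brokering stages can be read as successive filters over the advertisements held by the broker. The following is a minimal sketch under that reading, not InfoSleuth's actual implementation: the `Advertisement` fields, the `broker` function and the sample agents are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    """What a resource agent advertises to the broker (hypothetical fields)."""
    agent: str
    languages: frozenset      # query languages the agent speaks
    concepts: frozenset       # ontology concepts it holds information about
    allowed_users: frozenset  # users with access permission


def broker(ads, language, concepts, user):
    """Return the names of the agents that survive all three matching stages."""
    # 1. Syntactic matching: the agent must speak the desired language.
    syntactic = [a for a in ads if language in a.languages]
    # 2. Semantic matching: the agent must cover at least one desired concept.
    semantic = [a for a in syntactic if a.concepts & concepts]
    # 3. Pragmatic matching: the requester must have access permission.
    pragmatic = [a for a in semantic if user in a.allowed_users]
    return [a.agent for a in pragmatic]


# Illustrative advertisements (invented data).
ADS = [
    Advertisement("water_db", frozenset({"sql"}),
                  frozenset({"Site", "Contaminant"}), frozenset({"alice"})),
    Advertisement("soil_db", frozenset({"sql"}),
                  frozenset({"Site"}), frozenset({"bob"})),
    Advertisement("docs", frozenset({"keyword"}),
                  frozenset({"Site"}), frozenset({"alice"})),
]

matches = broker(ADS, "sql", frozenset({"Site"}), "alice")
```

Ordering the filters from cheapest to most specific mirrors the order given in the text; each stage only sees the candidates left by the previous one.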

Resource agents in InfoSleuth function as wrappers of the underlying data sources. A resource agent advertises the part of the overall domain ontology that it supports, advertises its query capabilities, performs the query rewriting, and transforms the retrieved data to facts of the domain ontology.

Mappings between different value domains are encapsulated in value mapping agents, which perform simple and complex mappings between domains. Examples of complex mappings are sophisticated functions (e.g. differences in time intervals) and incorporating values from (multiple) external ontologies.
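To make the distinction between simple and complex value mappings concrete, the sketch below contrasts a table lookup with a computed mapping over a time interval. The code table and function names are hypothetical and serve only as an illustration; the survey does not describe how InfoSleuth encodes such mappings.

```python
from datetime import date

# Simple mapping: a one-to-one lookup between two value domains
# (illustrative codes, not taken from any InfoSleuth ontology).
STATE_NAMES = {"TX": "Texas", "CA": "California"}


def map_state(code):
    """Map a state code in one representation to the name used in another."""
    return STATE_NAMES[code]


def interval_days(start, end):
    """Complex mapping: derive a duration in days from a start and end date."""
    return (end - start).days


duration = interval_days(date(2000, 1, 1), date(2000, 1, 31))
```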

The execution of queries is done by the query agent. This query agent decomposes a query into a number of subqueries, one for each resource agent involved in the query. Furthermore, it creates a number of global queries for fusing the results of the subqueries, so that the overall query result contains no redundancy.
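A minimal sketch of this decompose-then-fuse pattern follows. It assumes that each resource agent advertises a set of concepts and that fusion amounts to a duplicate-free union of rows; InfoSleuth itself expresses fusion through global queries, so the functions `decompose` and `fuse` are illustrative simplifications rather than its actual mechanism.

```python
def decompose(concepts, resources):
    """Build one subquery per resource agent that supports a requested concept.

    `resources` maps an agent name to the set of concepts it advertises.
    """
    return {
        name: concepts & supported
        for name, supported in resources.items()
        if concepts & supported
    }


def fuse(partial_results):
    """Union the rows of all subquery results, keeping first-seen order."""
    seen, fused = set(), []
    for rows in partial_results:
        for row in rows:
            if row not in seen:
                seen.add(row)
                fused.append(row)
    return fused


# Illustrative data: two agents know about "Site", one does not.
RESOURCES = {"a": {"Site"}, "b": {"Site", "Permit"}, "c": {"Well"}}
subqueries = decompose({"Site"}, RESOURCES)
result = fuse([[("s1",), ("s2",)], [("s2",), ("s3",)]])
```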

Ontology Languages. Any ontology agent that provides an OKBC interface to its knowledge base can be connected to the InfoSleuth agent system. All ontologies within InfoSleuth are expressed using the OKBC knowledge model. Each resource agent must wrap an external information source and provide a mapping to the domain ontologies currently in use in the InfoSleuth system.

Mapping Language. [NFK+00] reports no mapping between ontologies in InfoSleuth; this was seen as future work. However, because InfoSleuth is mostly a data integration system, it is more relevant that a mapping between data sources and the domain ontologies is possible. InfoSleuth does not provide a mapping language, but it does provide a number of Java templates for the development of wrappers, which contain a procedural mapping between the data schema and the domain ontologies in the agent system. An important point here is that it is possible to map to multiple domain ontologies, and it would be very interesting to combine this with actual mappings between ontologies, as is done in ONION [MWK00], for example.



Mapping Patterns. Although some aid in the creation of mappings is offered to the user through Java templates, there is no concept of mapping patterns in InfoSleuth. Extensions of InfoSleuth that enable mapping between ontologies would benefit from the use of mapping patterns. However, we are not aware of any continuation of the work on InfoSleuth after the work reported in [NFK+00].

Automation Support. There is no automation support for creating mappings between data schemas and ontologies. However, query rewriting and data fusion are completely automated, based on the mappings between the data schemas and the ontology. A query written in terms of a domain ontology is automatically decomposed in terms of the resources, and after execution the results are automatically fused by a different decomposition of the original query.

Applicability to Use Cases. As stated above, the resource agents take care of transforming data from the underlying sources to the ontology representation of the system and also of rewriting the query in terms of the data schema.

The querying agent fuses query results from different sources in order to remove redundancies. The fusion of query results is based on a different decomposition of the user query, which defines a union of the query results and eliminates any redundancy in the results.

Implementation. The InfoSleuth agent system has been implemented in two prototype projects.

Java templates are available to make the development of new agents easier. To create a resource agent using such a template, it is generally sufficient to supply a configuration and a mapping file to complete the agent [NFK+00]. It is possible to use different ontologies in an InfoSleuth system. Each OKBC-compliant knowledge base can be used in InfoSleuth by wrapping it using an ontology agent.

Experiences. [NFK+00] reports the use of InfoSleuth in two prototype projects. The first is the EDEN (Environmental Data Exchange Network) project. The aim of the EDEN project was to provide integrated access to environmental information resources over the Web. EDEN posed many challenges in the area of the integration of legacy databases and mappings of values of different representations of similar information.

Another prototype project in which InfoSleuth was applied is MCC’s Competitive Intelligence System [NFK+00].



4.2.2 ONION

Summary. ONION (ONtology compositION) [MWK00, MW01] is an architecture based on a sound formalism to support a scalable framework for ontology integration that uses a graph-oriented model for the representation of the ontologies. The special feature of this system is that it separates the logical inference engine from the representation model (the graph representation) of the ontologies as much as possible. This allows for the accommodation of different inference engines in the architecture.

In ONION there are two types of ontologies: individual ontologies, referred to as source ontologies, and articulation ontologies, which contain the concepts and relationships expressed as articulation rules (rules that provide links across domains). Articulation rules are established to enable knowledge inter-operability and to bridge the semantic gap between heterogeneous sources. They indicate which concepts, individually or in conjunction, are related in the source ontologies [MWK00]. SKAT (the Semantic Knowledge Articulation Tool) [MWJ99] uses the structure of these graphs together with term-matching and other rules to propose articulation rules for the articulation ontologies. The source ontologies are reflected in the system by the use of wrappers.

The mapping between ontologies is executed by an ontology algebra [Wie94, MW01]. This algebra consists of three operations, namely intersection, union and difference. The objective of the ontology algebra is to provide the capability for interrogating many largely semantically disjoint knowledge resources, given the ontology algebra as input. The description of the algebra operators is as follows:

• The intersection produces an ontology graph, which is the intersection of the two source ontologies with respect to a set of articulation rules, generated by an articulation generator function. The nodes in the intersection ontology are those that appear in the articulation rules. The edges are those edges between nodes in the intersection ontology that appear in the source ontologies or have been established as an articulation rule. The intersection determines the portions of knowledge bases that deal with similar concepts.

• The union operator generates a unified ontology graph comprising the two original ontology graphs connected by the articulation. The union presents a coherent, connected and semantically sound unified ontology if the original ontologies are consistent.

• The difference operator, which distinguishes the difference between two ontologies (O1 − O2), is defined as the concepts and relationships of the first ontology that have not been determined to exist in the second. This operation allows a local ontology maintainer to determine the extent of their ontology that remains independent of the articulation with other domain ontologies, so that it can be independently manipulated without having to update the articulation.



Figure 4.18: The components of the ONION system

ONION tries to separate as much as possible the logical inference engine from the representation model of the ontologies, allowing the accommodation of different inference engines. It also uses articulations of ontologies to inter-operate among ontologies. These articulation ontologies can be organized in a hierarchical fashion. For example, an articulation ontology can be created for two other articulation ontologies that unify different source ontologies. The ontology mapping is based on the graph mapping, and at the same time, domain experts can define a variety of fuzzy matching.

Ontology Language. Before ontologies are integrated in the ONION system, they are translated to the ONION graph-based conceptual model. An ontology O = (G, R) is represented as a directed labeled graph G and a set of rules R. The graph G = (V, E) consists of a finite set of nodes V and a finite set of edges E. Nodes in the graph correspond to concepts in the ontology. Edges correspond to semantic relationships between the concepts.

In the ONION conceptual model, there are several semantic relationships with a built-in meaning, namely {SubClassOf, PartOf, AttributeOf, InstanceOf, ValueOf}. Furthermore, the user can create user-defined semantic relationships. The user then has to axiomatize the meaning of the relationship. The better the meaning of the relationship is axiomatized, the more accurate the articulation will be. A more detailed description of the meaning of the built-in semantic relationships can be found in [MW01].

An ontology graph can be represented in the Semantic Web language RDF [LS99], because RDF has a graph-based data model. The set of logical rules R is expressed as Horn clauses.
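As an illustration of how a Horn clause in R can act on the graph G, the sketch below applies a single rule of the form label(x, z) :- label(x, y), label(y, z) until no new edges are derived. Treating SubClassOf as transitive is an assumption made here for the example, and the function `apply_transitivity` and the sample edges are hypothetical; ONION's actual rule set and inference engine are described in [MW01].

```python
def apply_transitivity(edges, label):
    """Close `edges` under  label(x, z) :- label(x, y), label(y, z)."""
    closed = set(edges)
    while True:
        derived = {
            (x, label, z)
            for (x, l1, y) in closed
            for (y2, l2, z) in closed
            if l1 == label and l2 == label and y == y2
        }
        if derived <= closed:  # fixpoint reached: nothing new was derived
            return closed
        closed |= derived


# Illustrative edge set E of an ontology graph.
EDGES = {
    ("Sedan", "SubClassOf", "Car"),
    ("Car", "SubClassOf", "Vehicle"),
    ("Wheel", "PartOf", "Car"),
}
closure = apply_transitivity(EDGES, "SubClassOf")
```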

An ontology in any source language can be translated to the graph-based model using a custom wrapper. During the translation to the ONION conceptual model, some semantic information may be lost. This information can no longer be used for the articulation of relationships with other ontologies; however, the user can still access this information by querying the underlying ontology directly.

Mapping Language. Inter-operation in ONION is achieved through the use of articulation ontologies. An articulation ontology denotes the semantic intersection of two source ontologies. The intersection is an operation in the so-called ontology algebra [Wie94]. The articulation ontology is constructed based on so-called articulation rules. An articulation rule specifies the relationship between concepts in the source ontologies. An articulation rule is a rule of the form P ⇒ Q, which can be intuitively read as “P semantically implies Q”. In other words, P is a specialization of Q, or “P is subsumed by Q”.

ONION distinguishes between simple and compound rules. A simple articulation rule, which specifies the relationship between nodes in two ontology graphs, is of the form O1.A ⇒ O2.B, where A denotes a node in ontology O1 and B denotes a node in ontology O2. The rule states that A is a specialization of B. This rule translates to the simplest semantic bridge, the semantic implication bridge, which is an edge (A, “SIBridge”, B) connecting the two nodes. Compound rules incorporate conjunction and/or disjunction. Such rules are modeled by adding one or more nodes to the articulation ontology and creating the appropriate semantic implication bridges between the nodes in the source ontologies and the new node in the articulation ontology. For more information, see [MWK00].<br />
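The translation of rules into bridge edges can be sketched as follows. This is a minimal illustration and not ONION's actual code: the `Edge` class and the three helper functions are hypothetical names, and the way compound rules are encoded (a new node that implies every conjunct, or is implied by every disjunct) is an assumption made for the example; [MWK00] defines the actual construction.

```python
from dataclasses import dataclass

SI_BRIDGE = "SIBridge"  # label of the semantic implication bridge


@dataclass(frozen=True)
class Edge:
    """A labeled edge (source, label, target) in the articulation ontology graph."""
    source: str
    label: str
    target: str


def simple_rule(a, b):
    """O1.A => O2.B becomes the single edge (A, "SIBridge", B)."""
    return [Edge(a, SI_BRIDGE, b)]


def conjunction_rule(premises, conclusion, new_node):
    """(P1 and P2 and ...) => Q, using a new articulation node.

    Assumed encoding: new_node stands for the conjunction, so it implies
    every premise as well as the conclusion.
    """
    edges = [Edge(new_node, SI_BRIDGE, p) for p in premises]
    edges.append(Edge(new_node, SI_BRIDGE, conclusion))
    return edges


def disjunction_rule(premises, conclusion, new_node):
    """(P1 or P2 or ...) => Q, using a new articulation node.

    Assumed encoding: every premise implies new_node, which implies Q.
    """
    edges = [Edge(p, SI_BRIDGE, new_node) for p in premises]
    edges.append(Edge(new_node, SI_BRIDGE, conclusion))
    return edges
```

For example, `simple_rule("O1.Car", "O2.Vehicle")` yields one bridge edge, while a compound rule over two premises yields three edges that all pass through the newly added node.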

In order to allow for value transformations, ONION offers the possibility of associating a function with an edge in the articulation ontology. Examples of such functions are currency conversion and conversion between different distance measures.<br />
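A function attached to an edge might look like the sketch below, which converts between two distance measures. The class name `FunctionalEdge`, the node names and the helper `translate_value` are hypothetical and chosen only for illustration; the source does not describe how ONION represents such functions internally.

```python
from dataclasses import dataclass
from typing import Callable

KM_PER_MILE = 1.609344  # exact by definition of the international mile


@dataclass(frozen=True)
class FunctionalEdge:
    """An articulation edge that carries a value transformation function."""
    source: str
    target: str
    convert: Callable[[float], float]


# Hypothetical edge between two nodes that measure distance in different units.
miles_to_km = FunctionalEdge(
    source="O1.DistanceInMiles",
    target="O2.DistanceInKm",
    convert=lambda miles: miles * KM_PER_MILE,
)


def translate_value(edge, value):
    """Apply the edge's function to move a value from source to target form."""
    return edge.convert(value)
```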

Automation Support The articulation rules are created in a semi-automatic process with SKAT [MWJ99] (Semantic Knowledge Articulation Tool), which proposes articulation rules to the expert. The expert can either accept or decline these proposals and can also specify rules which are not proposed by the tool.<br />

SKAT matches the two source ontologies using both term matching and structural matching.<br />

Applicability to use cases ONION is a system for the unification of heterogeneous ontologies through the use of articulation ontologies with the purpose of query processing. The resulting articulation ontology is presented to the user and is used (together with the source ontologies) by the user for querying. The ONION query system translates a query on the articulation ontology to the actual source ontologies and executes the query on the underlying ontologies. The results are then translated back to the representation of the articulation ontology.<br />

ONION does not propose a strategy for unifying instances. The ontology obtained from applying the union operator can be seen as the result of a merge operation.<br />

The complete mapping process is included in ONION. In fact, the mapping is just one aspect of ONION, because ONION also provides the run-time environment for data integration.<br />

Implementation The ONION architecture [MWK00, MW01] (Figure 4.18, taken from [MW01]) consists of four components:<br />

• The ONION data layer. This layer contains the wrappers for the external sources and the articulation ontologies that form the Semantic Bridges between the sources.<br />

• The ONION viewer. This is the user interface component of the system. The viewer visualizes both the source and the articulation ontologies.<br />

• The ONION query system. The query system translates queries formulated in terms of an articulation ontology into a query execution plan and executes the query.<br />

• The Articulation Engine. The articulation generator takes articulation rules proposed by SKAT [MWJ99], the Semantic Knowledge Articulation Tool, and generates sets of articulation rules, which are forwarded to the expert for confirmation.<br />

The different components in the architecture have been implemented as a research prototype to support a PhD thesis.<br />

Experiences [MWK00, MW01] do not report any real-world experience with the application of ONION beyond the toy examples described in the papers.<br />

4.2.3 OBSERVER<br />

Summary OBSERVER (Ontology Based System Enhanced with Relationships for Vocabulary hEterogeneity Resolution) [MIKS00] is a system intended to overcome problems with heterogeneity between distributed data repositories by using component ontologies and the explicit relationships between these components. OBSERVER presents an architecture consisting of component nodes, each of which has its own ontology, and the Inter-ontology Relationship Manager (IRM), which maintains mappings between the ontologies at the different component nodes. Besides the ontology, each component node contains a number of data repositories along with mappings to the ontology, to enable semantic querying of data residing in these repositories. When other components need to be queried, the IRM provides mappings to ontologies of other component nodes in order to enable querying. The user views the data in the system through their own local ontology, located at the user’s component node.<br />

OBSERVER uses a component-based approach to ontology mapping. It provides brokering capabilities across domain ontologies to enhance distributed ontology querying, thus avoiding the need to have a global schema or collection of concepts.<br />

OBSERVER uses multiple pre-existing ontologies to access heterogeneous, distributed and independently developed data repositories. Each repository is described by means of one or more ontologies expressed using the Description Logic (DL) system CLASSIC. The information requested from OBSERVER is expressed according to the user’s domain ontology, also expressed using DL. DL allows matching the query with the available relevant data repositories, as well as translating it to the ontologies used in the local repositories.<br />

The system contains a number of component nodes, one of which is the user node. Each node has an ontology server that provides definitions for the terms in the ontology and retrieves data underlying the ontology in the component node. If the user wants to expand a query over different ontology servers, the original query needs to be translated from the vocabulary of the user’s ontology into the vocabulary of another component’s ontology. This is done in the following way: the source and target ontologies are integrated, after which all terms in the query for which a synonym exists in the target ontology are replaced by this synonym, and all other terms are replaced by semantically similar expressions that could lead to a full translation of the user query. Each such term for which there is no synonym in the component ontology is replaced by the intersection of its immediate parents or the union of its immediate children, recursively, until the rewritten query contains only terms from the component ontology. Several alternative translations might be generated in this process; they are not exact when terms were replaced with semantically similar expressions rather than synonyms.<br />
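The term replacement procedure can be sketched as below. This is an illustration under stated assumptions rather than OBSERVER's implementation: the integrated ontology is assumed to be given as plain dictionaries of immediate parents and children, the hierarchy is assumed to be acyclic, a direction is abandoned as soon as one branch cannot be translated, and all function names are hypothetical.

```python
from typing import Optional, Union

# A translated expression: a target term, or ("AND" | "OR", (sub-expressions...)).
Expr = Union[str, tuple]


def exact_match(term, synonyms, target_terms) -> Optional[str]:
    """Return the term or its synonym if either belongs to the target ontology."""
    if term in target_terms:
        return term
    synonym = synonyms.get(term)
    return synonym if synonym in target_terms else None


def approximate(term, neighbours, operator, synonyms, target_terms) -> Optional[Expr]:
    """Recursively replace a term by a combination of its immediate neighbours.

    With neighbours=parents and operator="AND" this builds the intersection of
    parents; with neighbours=children and operator="OR" the union of children.
    Assumes the hierarchy is acyclic, otherwise the recursion would not end.
    """
    exact = exact_match(term, synonyms, target_terms)
    if exact is not None:
        return exact
    parts = [approximate(n, neighbours, operator, synonyms, target_terms)
             for n in neighbours.get(term, ())]
    if not parts or any(part is None for part in parts):
        return None  # simplification: give up on this direction
    return parts[0] if len(parts) == 1 else (operator, tuple(parts))


def translate_term(term, parents, children, synonyms, target_terms) -> list:
    """Return the alternative translations of one query term."""
    exact = exact_match(term, synonyms, target_terms)
    if exact is not None:
        return [exact]  # exact translation, no loss of information
    candidates = [
        approximate(term, parents, "AND", synonyms, target_terms),
        approximate(term, children, "OR", synonyms, target_terms),
    ]
    return [candidate for candidate in candidates if candidate is not None]
```

With a hypothetical user term `Book` whose only parent `Publication` is a target term and whose children `Novel` and `Textbook` translate to `Fiction-Book` and `Textbook`, the function returns two alternatives: the more general `Publication` and the more specific union of the two children. Neither is exact, which is why the loss of information has to be estimated.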

A method for evaluating the information loss for the case of inexact translations was developed, with the purpose of enabling the system to choose “the best” among alternative translations. The user can opt for translations for which the information loss is smaller than a certain value by setting a threshold. By setting this parameter to 0, the user specifies that no loss at all is acceptable. The loss-of-information threshold is used by the query processor, which discards translated queries exceeding the threshold.<br />

The information loss is computed based on two metrics commonly encountered in Information Retrieval, precision and recall. The idea is to take both metrics into account, that is, the loss in precision (percentage of retrieved objects which are not relevant) and the loss in recall (percentage of relevant objects which are not retrieved), because the user’s preference for one of these measures is not known a priori. The formula for computing information loss based on precision loss and recall loss contains a weight, which can be set by the user and expresses the bias towards one of the two metrics.<br />
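The text above does not give the formula itself. As an illustration only, the sketch below combines precision and recall with a user-set weight in the style of van Rijsbergen's E-measure, a standard Information Retrieval way of doing this; whether OBSERVER uses exactly this form, and the names `information_loss`, `gamma` and `acceptable_translations`, are assumptions of the example.

```python
def information_loss(precision, recall, gamma=0.5):
    """Weighted combination of precision and recall into a loss in [0, 1].

    gamma is the user-set bias: 1.0 considers only precision, 0.0 only recall.
    Assumed form (E-measure): loss = 1 - 1 / (gamma/precision + (1-gamma)/recall).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    denominator = 0.0
    for weight, value in ((gamma, precision), (1.0 - gamma, recall)):
        if weight > 0.0:
            if value <= 0.0:
                return 1.0  # a weighted metric of zero means total loss
            denominator += weight / value
    return 1.0 - 1.0 / denominator


def acceptable_translations(translations, threshold, gamma=0.5):
    """Keep the names of translations whose estimated loss is within the threshold.

    translations is a list of (name, precision, recall) tuples.
    """
    return [name for name, precision, recall in translations
            if information_loss(precision, recall, gamma) <= threshold]
```

An exact translation has precision and recall of 1 and therefore a loss of 0, so it passes even a threshold of 0; every inexact alternative is discarded at that setting.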



However, these metrics, i.e. precision and recall, had to be adapted in order to give higher priority to semantic relationships than to those suggested by the underlying extensions. This problem arises because the extensions come from different ontologies and the extensional relationships may not reflect the semantic relationships. For example, a term in a user ontology which semantically subsumes a term in the target ontology may have a smaller extension than the child term.<br />

An Inter-ontology Relationship Manager (IRM) provides the translations between the terms among the different component ontologies. The IRM effectively contains a one-to-one mapping between any two component ontologies. This module is able to deal with (intensional) Synonym, Hyponym, Hypernym, Overlap, Disjoint and Covering inter-ontology relationships. Furthermore, the IRM is also able to deal with extensional relationships between ontologies through the use of so-called transformer functions.<br />

The user submits a query to the query processor in their own component node (in fact, each component node has a query processor). The query processor first uses the local ontology server to translate the query into queries on the local data repositories and then executes them, after which the user can choose to incrementally expand the query to multiple ontologies. The query processor then uses the IRM to translate the query into terms used by the other component ontologies and retrieves the results from the ontology servers at the other component nodes.<br />

Querying in OBSERVER consists of the following three steps:<br />

1. The user constructs the query using terms from the user’s ontology.<br />

2. The query processor uses the ontology server to access the underlying data at the user’s node. The query is executed against the local data repositories.<br />

3. In a process of controlled query expansion to new ontologies, the user can specify whether he/she is satisfied with the query results or whether the query should be expanded to other component ontologies. In this case, the inter-ontology relationships retrieved from the IRM are used to rewrite queries and to transform query results.<br />
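The three steps can be sketched as a simple control loop. The sketch is a simplification under stated assumptions: ontology servers are modeled as plain callables that return results for a query, `rewrite` stands in for the IRM-based query rewriting, `is_satisfied` stands in for the user's decision, and the transformation of query results is omitted. All names are hypothetical.

```python
def run_query(query, local_server, remote_servers, rewrite, is_satisfied):
    """Answer a query locally, then expand it node by node on request.

    local_server:   callable(query) -> iterable of results (step 2)
    remote_servers: dict of node name -> callable(query) -> iterable of results
    rewrite:        callable(query, node name) -> rewritten query or None
    is_satisfied:   callable(results) -> bool, the user's decision (step 3)
    """
    # Step 2: execute the query against the user's own node first.
    results = set(local_server(query))

    # Step 3: controlled expansion to other component ontologies.
    for node_name, server in remote_servers.items():
        if is_satisfied(results):
            break
        rewritten = rewrite(query, node_name)
        if rewritten is None:
            continue  # no usable translation for this component ontology
        results |= set(server(rewritten))
    return results
```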

The ontology server can be queried in two ways. Information about the ontology itself can be retrieved, and the ontology server can answer queries formulated over an ontology using the mappings to the different data sources and the wrappers available for each data source. The ontology server consults the query capabilities of each data source, creates a query plan and invokes the wrappers to retrieve the data from the sources.<br />

In principle, only the local ontology server is queried initially. The user can then choose to incrementally expand the query over multiple ontologies in order to retrieve more results for the query.<br />

Ontology Languages Ontologies, as well as mappings between ontologies, are specified using the Description Logic system CLASSIC.<br />



Figure 4.19: The general OBSERVER architecture [MIKS00]



Mapping Language In OBSERVER, there exist two types of mappings, namely the mappings between data repositories and ontologies and the mappings between the ontologies. We will first describe the mappings between data repositories and ontologies, after which we describe the specification of the inter-ontology relationships.<br />

A data source is seen as consisting of entities and attributes (in the Entity-Relationship [Che79] sense of the terms). The mapping between data sources and the ontology is represented by associating each term in the ontology with a number of Extended Relational Algebra (ERA) expressions. ERA is used as an intermediate language between the Description Logic expressions of the ontology and the underlying data repositories. The wrapper is responsible for the translation between ERA and the data repository itself, which is straightforward if the data source is a relational database. It is interesting to note that in the mapping to roles in the ontology, ERA allows functions, which typically represent value transformations.<br />

OBSERVER deals with a number <str<strong>on</strong>g>of</str<strong>on</strong>g> types <str<strong>on</strong>g>of</str<strong>on</strong>g> inter-<strong>on</strong>tology relati<strong>on</strong>ships in order to<br />

enable inter-operability:<br />

• Syn<strong>on</strong>ym. Two syn<strong>on</strong>ymous terms have <str<strong>on</strong>g>the</str<strong>on</strong>g> same semantics, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> same intended<br />

meaning. This does not guarantee that <str<strong>on</strong>g>the</str<strong>on</strong>g>y have <str<strong>on</strong>g>the</str<strong>on</strong>g> same extensi<strong>on</strong>.<br />

• Hyp<strong>on</strong>ym. A term is a hyp<strong>on</strong>ym <str<strong>on</strong>g>of</str<strong>on</strong>g> ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r term if it is less general, i.e. a term in<br />

<strong>on</strong>e <strong>on</strong>tology subsumes a term in ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology.<br />

• Hypernym. A term is a hypernym <str<strong>on</strong>g>of</str<strong>on</strong>g> ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r term if it is more general, i.e. a term<br />

in <strong>on</strong>e <strong>on</strong>tology is subsumed by a term in ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology.<br />

• Overlap. This means the two terms have an overlap, i.e. a non-empty intersection.<br />
In OBSERVER, the overlap between terms is usually indicated with a percentage,<br />
which can be used to estimate the loss of information in a query translation.<br />

• Disjoint. This means there is no intersection between the two terms.<br />

• Covering. A term in one ontology corresponds to a union of terms in the other<br />
ontology, i.e. the meaning of the term in one ontology is covered by the union of<br />
terms from the other ontology. No object represented by the term in the one<br />
ontology falls outside the union of the given child terms.<br />
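The three relationships above can be illustrated on small sets of instances.<br />
The following is a minimal sketch, not OBSERVER code: the function names, the sample data<br />
and the definition of the overlap percentage (share of the first term's instances that also<br />
belong to the second) are assumptions made for illustration only.<br />

```python
from typing import Iterable, Set


def overlap_percentage(a: Set[str], b: Set[str]) -> float:
    """Percentage of a's instances that also belong to b (assumed definition)."""
    if not a:
        return 0.0
    return 100.0 * len(a & b) / len(a)


def is_disjoint(a: Set[str], b: Set[str]) -> bool:
    """True if the two terms share no instance."""
    return a.isdisjoint(b)


def is_covered(term: Set[str], children: Iterable[Set[str]]) -> bool:
    """True if every instance of `term` appears in the union of `children`."""
    union: Set[str] = set()
    for child in children:
        union |= child
    return term <= union


# Hypothetical instance sets for terms taken from two ontologies.
publications = {"p1", "p2", "p3", "p4"}
books = {"p1", "p2"}
articles = {"p3", "p4"}
reports = {"p4", "r9"}

print(overlap_percentage(publications, reports))   # one of four instances is shared
print(is_disjoint(books, articles))
print(is_covered(publications, [books, articles]))
```

The sketch works on instance sets purely for readability; the relationships listed above are declared between terms and are not computed from data in this way.<br />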

The inter-ontology relationships mentioned above make explicit the intensional relationship<br />
between terms in two ontologies. However, when an intensional relationship between<br />
terms holds, it does not follow that the relationship also holds for the extensions (i.e. sets<br />
of instances) of the ontologies. At the extensional level, a set of transformer functions<br />
between roles in different ontologies is used. These functions are used for both instance<br />
transformation and instance unification.<br />


CHAPTER 4. THE SURVEY 86<br />

The Inter-Ontology Relationship Manager (IRM) can be used to discover sets of related component<br />
ontologies, to retrieve related terms between ontologies and to perform value transformations<br />
from one ontology representation to the other.<br />

Automation Support. There seems to be no automation in creating the mappings between<br />
ontologies and/or data sources. However, known matching algorithms could readily be<br />
used to identify similarities between ontologies.<br />

The query processing, on the other hand, is completely automated, with the exception<br />
that the user is still required in the incremental querying process to specify whether other<br />
component nodes need to be queried. The user may also intervene to set parameters<br />
such as the bias towards precision or recall in the computation of information loss and<br />
the threshold for information loss in a query translation. As already mentioned, if the<br />
user desires the query to be answered by accessing other components as well, the query<br />
is translated into the ontology language(s) of the other components. If there is no exact<br />
translation, the system generates and evaluates alternative translations and chooses<br />
the optimal one among those that do not surpass the user-defined threshold for<br />
information loss.<br />
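The selection step described above can be sketched as follows. This is an illustrative<br />
sketch under stated assumptions, not the OBSERVER implementation: the `Translation` class,<br />
the function names and the sample values are hypothetical, and the loss estimate uses<br />
van Rijsbergen's E-measure as one plausible way to combine precision and recall with a bias;<br />
the text above does not confirm that OBSERVER uses exactly this formula.<br />

```python
from dataclasses import dataclass
from typing import List, Optional


def estimate_loss(precision: float, recall: float, bias: float = 0.5) -> float:
    """E-measure: 1 - 1 / (bias/precision + (1 - bias)/recall).

    `bias` close to 1 emphasises precision, close to 0 emphasises recall.
    Assumed here as the loss estimate; not taken from the surveyed text.
    """
    if precision <= 0.0 or recall <= 0.0:
        return 1.0
    return 1.0 - 1.0 / (bias / precision + (1.0 - bias) / recall)


@dataclass
class Translation:
    expression: str  # query rewritten in the target ontology's terms
    loss: float      # estimated information loss, 0 (none) to 1 (total)


def choose_translation(candidates: List[Translation],
                       threshold: float) -> Optional[Translation]:
    """Return the lowest-loss candidate within the threshold, or None."""
    admissible = [c for c in candidates if c.loss <= threshold]
    if not admissible:
        return None
    return min(admissible, key=lambda c: c.loss)


# Hypothetical alternative translations of one user query.
candidates = [
    Translation("Book OR Proceedings", estimate_loss(0.9, 0.9)),
    Translation("Document", estimate_loss(0.8, 0.5)),
]

best = choose_translation(candidates, threshold=0.2)
print(best.expression if best else "no admissible translation")
```

Returning `None` when every candidate exceeds the threshold leaves the decision to the caller, for example to skip the component ontology or to ask the user to relax the threshold.<br />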

Applicability to Use Cases. Instance transformation and instance unification are both<br />
performed in the querying process. The query processor is responsible for transforming<br />
and correlating query results from the target ontology into the user ontology.<br />

The emphasis in OBSERVER is on query rewriting. The relationships between<br />
the ontologies, expressed using Description Logics, are used to rewrite the queries from<br />
the user’s ontology to the component ontology.<br />

OBSERVER is a data integration system and as such provides no explicit support for<br />
ontology merging, although during query processing ontologies are automatically integrated<br />
based on the inter-ontology relationships retrieved from the IRM.<br />

Implementation. The OBSERVER architecture, depicted in Figure 4.19, consists of a<br />
number of component nodes and the IRM node. A component node contains an Ontology<br />
Server, which provides for the interaction with the ontologies and the data sources. It uses<br />
a repository of mappings to relate the ontologies and the data sources and to be able to<br />
translate queries on the ontology to queries on the underlying data sources. The architecture<br />
contains one Inter-Ontology Relationship Manager (IRM), which enables semantic<br />
inter-operation between component nodes by maintaining the relationships between the<br />
ontologies.<br />

OBSERVER has been implemented as a prototype for accessing distributed heterogeneous<br />
data sources in the area of bibliographic data.<br />



Experiences. The authors have reused different existing ontologies in the area of bibliographic<br />
data and represented them in CLASSIC for integration in the OBSERVER architecture.<br />
Real-life ontologies and data repositories were used in the prototype. It turned<br />
out that the time required to access the distributed data repositories was the bottleneck for<br />
the prototype.<br />

4.2.4 MOMIS<br />

Summary. The MOMIS (Mediator envirOnment for Multiple Information Sources) approach<br />
[BCV99, BCVB01] integrates heterogeneous data sources<br />
using a global ontology, which is the result of a merge of the local data schemas.<br />

The goal of MOMIS is to give the user a global virtual view (cf. [Hul97]) of the<br />
information coming from heterogeneous information sources. MOMIS creates a global<br />
mediation schema (ontology) for the structured and semi-structured heterogeneous data<br />
sources, in order to provide the user with a uniform query interface to these sources.<br />

The first step in the creation of the global mediation schema is the creation of the<br />
Common Thesaurus from the disparate data sources. To do this, first a wrapper is created<br />
for each data source in the ODL<sub>I<sup>3</sup></sub> language. ODL<sub>I<sup>3</sup></sub> is an object-oriented language with<br />
an underlying Description Logic language, OLCD, which enables making inferences (e.g.<br />
subsumption) about the classes expressed in that language.<br />

Using the disparate schemas, a Common Thesaurus is created, which describes intra- and<br />
inter-schema knowledge about ODL<sub>I<sup>3</sup></sub> classes and attributes of source schemas. The<br />
Common Thesaurus is built in an incremental process in which relationships (between<br />
classes) are added based on the structure of the source schemas, lexical properties of the<br />
source classes and attributes (e.g. WordNet [Fel99] can be used to identify possible synonyms),<br />
relationships supplied by the designer, and relationships inferred by the inference<br />
engine.<br />

Once the Common Thesaurus has been created, a tree of affinity clusters is created,<br />
in which concepts are clustered based on their (name and structural) affinity. The name<br />
affinity coefficient is calculated based on the terminological relationships between two<br />
classes. The structural affinity coefficient between two classes is calculated based on the<br />
level of matching of attribute relationships in the Common Thesaurus. The sum of these<br />
two coefficients is the global affinity coefficient, which is used to construct the affinity<br />
tree, in which concepts with a high affinity are clustered together.<br />
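The clustering step can be sketched as follows. This is a simplified illustration, not the<br />
ARTEMIS algorithm: the class names, the coefficient values, the threshold and the<br />
single-linkage merge strategy are all assumptions, and the sketch returns flat clusters<br />
rather than the full affinity tree.<br />

```python
from itertools import combinations
from typing import Callable, FrozenSet, List


def global_affinity(name_affinity: float, structural_affinity: float) -> float:
    """Global affinity as the sum of the two coefficients, as described above."""
    return name_affinity + structural_affinity


def affinity_clusters(classes: List[str],
                      affinity: Callable[[str, str], float],
                      threshold: float) -> List[FrozenSet[str]]:
    """Agglomerative (single-linkage) clustering on pairwise global affinity.

    Repeatedly merges the two clusters with the highest affinity until no
    pair reaches `threshold` (a hypothetical cut-off).
    """
    clusters = [frozenset([c]) for c in classes]

    def link(a: FrozenSet[str], b: FrozenSet[str]) -> float:
        return max(affinity(x, y) for x in a for y in b)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: link(*pair))
        if link(a, b) < threshold:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters


# Hypothetical coefficients for classes from two source schemas S1 and S2.
NAME = {
    frozenset({"S1.Person", "S2.Employee"}): 0.8,
    frozenset({"S1.Course", "S2.Seminar"}): 0.7,
}
STRUCT = {
    frozenset({"S1.Person", "S2.Employee"}): 0.6,
    frozenset({"S1.Course", "S2.Seminar"}): 0.5,
}


def affinity(x: str, y: str) -> float:
    key = frozenset({x, y})
    return global_affinity(NAME.get(key, 0.0), STRUCT.get(key, 0.0))


clusters = affinity_clusters(
    ["S1.Person", "S2.Employee", "S1.Course", "S2.Seminar"], affinity, 1.0)
for cluster in clusters:
    print(sorted(cluster))
```

Recording each merge together with its affinity value, instead of only the final clusters, would yield the tree structure that the designer inspects when creating global classes.<br />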

For each cluster in the affinity tree, a global class is (interactively) created. For each<br />
global class a mapping (expressed in ODL<sub>I<sup>3</sup></sub>) is maintained to all the source classes.<br />

Compared with MOMIS, OBSERVER takes a minimalist<br />
approach to the specification of inter-ontology relationships, specifying only the<br />
bare minimum required for query processing, whereas MOMIS tries to identify all possible<br />
relationships between a set of ontologies, integrating them in one global ontology.<br />



Ontology Languages. A wrapper translates each data schema to the ODL<sub>I<sup>3</sup></sub> representation.<br />
MOMIS also deals with semi-structured data by extracting object patterns, which<br />
are used as schema information for the source to generate the corresponding ODL<sub>I<sup>3</sup></sub> description.<br />
ODL<sub>I<sup>3</sup></sub> is an object-oriented language with a translation to the OLCD Description<br />
Logic language in order to allow inferencing. OLCD is a KL-ONE-like [BS85] ontology<br />
language, which allows classes (types), binary roles (attributes), disjunction and negation, and<br />
also has a number of base data types (integer, string, Boolean, real).<br />

Mapping Language. Source schemas and object patterns are first translated into ODL<sub>I<sup>3</sup></sub><br />
descriptions. This translation is performed by a wrapper. Then, a Common Thesaurus is<br />
created based on the ODL<sub>I<sup>3</sup></sub> class descriptions and attributes. The Common Thesaurus<br />
consists of four kinds of relationships, which are added to the thesaurus in the following<br />
phases:<br />

1. Schema-derived relationships. In this phase, only intra-schema relationships are<br />
considered. Relationships within one particular schema are extracted, e.g. by<br />
exploiting foreign and primary key relationships in order to infer related and<br />
broader/narrower term relationships.<br />

2. Lexical-derived relationships. Lexical relationships between names in different<br />
schemas are exploited to extract inter-schema relationships. WordNet [Fel99] is<br />
used to extract synonyms and hypo/hypernyms. Furthermore, synonymous terms<br />
are also extracted from attributes with similar names.<br />

3. Designer-supplied relationships. In this phase, the designer can supply relationships<br />
between schemas. A Description Logic reasoner, such as ODB-tools, can be used to<br />
check the relationships for inconsistency.<br />

4. Inferred relationships. Description Logic reasoning is used to infer new relationships<br />
between ODL<sub>I<sup>3</sup></sub> classes, based on relationships specified in the previous<br />
phases.<br />
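The four phases can be pictured as successively filling one relationship store. The sketch<br />
below is a simplified stand-in, not the MOMIS implementation: the labels SYN, BT and NT<br />
(synonym, broader term, narrower term), the sample terms and the closure rules are<br />
assumptions, and the hand-written rules only approximate what a Description Logic reasoner<br />
would derive in the fourth phase.<br />

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]            # (source term, kind, target term)
Entry = Tuple[str, str, str, str]        # triple plus the phase that added it


def infer(thesaurus: Iterable[Entry]) -> Dict[Triple, str]:
    """Close the thesaurus under three simple rules (assumed, for illustration):

    * SYN is symmetric,
    * BT and NT are inverses of each other,
    * BT is transitive.
    Derived triples are tagged with the origin "inferred".
    """
    known: Dict[Triple, str] = {(s, k, t): o for (s, k, t, o) in thesaurus}
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, k, t) in known:
            if k == "SYN":
                new.add((t, "SYN", s))
            elif k == "NT":
                new.add((t, "BT", s))
            elif k == "BT":
                new.add((t, "NT", s))
                for (s2, k2, t2) in known:
                    if k2 == "BT" and s2 == t:
                        new.add((s, "BT", t2))
        for triple in new:
            if triple not in known:
                known[triple] = "inferred"
                changed = True
    return known


# Hypothetical entries from the first three phases; "a BT b" reads "a is broader than b".
thesaurus = [
    ("Person", "BT", "Employee", "schema"),
    ("Employee", "SYN", "Worker", "lexical"),
    ("Employee", "BT", "Manager", "designer"),
]

closed = infer(thesaurus)
for triple, origin in sorted(closed.items()):
    print(triple, origin)
```

Keeping the originating phase with each relationship makes it possible to tell designer-supplied statements apart from derived ones when an inconsistency has to be traced.<br />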

In each of the phases, intensional relationships are added to the Common Thesaurus.<br />
The designer can strengthen these intensional relationships by creating extensional relationships,<br />
thereby enabling subsumption reasoning and consistency checking. An intensional<br />
relationship can be seen as saying “there exists in general a relationship between<br />
these terms”, whereas an extensional relationship can be seen as saying “this relationship<br />
holds for these particular data sources”.<br />

MOMIS employs hierarchical clustering based on an affinity measure, which indicates<br />
the similarity between classes in a cluster. The affinity measure is based on both the name<br />
and structural similarity.<br />



The clusters in the hierarchy are used to interactively create new classes for the merged<br />
ontology. Generally, a union is taken of all classes in a particular cluster in order to come<br />
up with the new global class. During the process of creating the global class, mapping<br />
rules between the attributes in the local classes and the global class are established and<br />
stored in the global ontology for later use. Because the global ontology is created on the<br />
basis of the local ontologies, MOMIS takes the global-as-view approach [Lev00], which<br />
means that the global schema is created as a view over the local schemas and all queries<br />
to the global schema can be easily translated to the local representation because of the<br />
presence of mapping rules.<br />
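The global-as-view idea can be sketched as follows: a global class is defined over local<br />
classes through attribute mapping rules, and a query on the global class is answered by<br />
unfolding it over each local class. This is a minimal sketch under stated assumptions; the<br />
class names, attribute names, data and the `query_global` function are hypothetical and do<br />
not reflect the actual MOMIS mapping syntax.<br />

```python
from typing import Dict, List, Optional

# Mapping rules: global class -> local class -> {global attribute: local attribute}.
MAPPING: Dict[str, Dict[str, Dict[str, str]]] = {
    "Person": {
        "S1.Employee": {"name": "full_name", "email": "mail"},
        "S2.Staff": {"name": "nm", "email": "e_mail"},
    }
}

# Hypothetical local data, standing in for what the wrappers would return.
LOCAL_DATA: Dict[str, List[Dict[str, str]]] = {
    "S1.Employee": [{"full_name": "Ada", "mail": "ada@example.org"}],
    "S2.Staff": [{"nm": "Bob", "e_mail": "bob@example.org"}],
}


def query_global(global_class: str,
                 attributes: List[str]) -> List[Dict[str, Optional[str]]]:
    """Answer a projection on a global class by unfolding it over local classes.

    Each local row is renamed into global attribute names using the mapping
    rules; attributes without a rule for a given source are returned as None.
    """
    results: List[Dict[str, Optional[str]]] = []
    for local_class, rule in MAPPING[global_class].items():
        for row in LOCAL_DATA[local_class]:
            results.append(
                {g: (row[rule[g]] if g in rule else None) for g in attributes}
            )
    return results


print(query_global("Person", ["name"]))
```

Because the view definition already lists the contributing local classes, unfolding is a simple lookup; the price is that adding a source means revisiting the global class definitions, which matches the concern about integrating new sources.<br />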

It is not clear how new sources can be integrated once the system is in place. It<br />
seems that the global schema has to be re-created from the local schemas, although the<br />
computed affinity clusters can of course be reused. If the classes in the new schema to<br />
be integrated fit inside the existing affinity clusters, only the mapping rules between the<br />
new schema and the global schema need to be created.<br />

Automation Support. Automation support in the ontology merging task is provided by<br />
the ARTEMIS tool [CdA99], which supports the matching task<br />
by determining the (name and structural) affinity between terms in the ontologies.<br />

Applicability to Use Cases. In MOMIS, the Query Manager does query rewriting based<br />
on the mapping rules in the global ontology. The wrapper of each data source rewrites<br />
the query from its ODL<sub>I<sup>3</sup></sub> representation to the representation of the data source and also<br />
transforms the query results back to the ODL<sub>I<sup>3</sup></sub> representation. The Query Manager then<br />
uses the mapping rules to translate the query results back to the global representation in<br />
order to present the results to the user. It is not exactly clear if and how the Query Manager<br />
fuses the query results from the different sources in case of overlap in the result sets.<br />

Implementation. A number of components are used to enable the MOMIS architecture.<br />
These components are (see Figure 4.20, taken from [BCVB01]):<br />

• A wrapper performs the translation of the individual data source into the ODL<sub>I<sup>3</sup></sub><br />
language (and translates the queries back).<br />

• The mediator consists of the query manager (QM) and the global schema builder<br />
(GSB). The QM component breaks up global ODL<sub>I<sup>3</sup></sub> queries into sub-queries for<br />
the different data sources. The GSB is an offline component, which aids<br />
in ontology merging, whereas the QM is a run-time component, which performs the<br />
queries.<br />

• The ARTEMIS tool environment performs the classification (affinity and synthesis)<br />
of classes for the synthesis of the global classes.<br />

• The ODB-tools engine performs schema validation and inferences for the generation<br />
of the Common Thesaurus, as well as query optimization for the Query Manager.<br />

Figure 4.20: Architecture of the MOMIS system [BCVB01]<br />

The architecture in Figure 4.20 shows the main tools used to support the overall architecture.<br />
A disadvantage is that there is no integrated tool environment. Any data source<br />
can be connected to the architecture, as long as an ODL<sub>I<sup>3</sup></sub> wrapper is created.<br />

Experience. As far as we are aware, MOMIS has not been used in any major industrial<br />
project and is mainly an academic research activity, with toy examples. However,<br />
[CAdV01] reports the application of ARTEMIS in a research project in the Italian Public<br />
Administration domain.<br />

4.2.5 INFOMIX<br />

INFOMIX [LGR+05b, LGR+05a, LLR02] is a system that supports information integration<br />
by utilizing advanced reasoning capabilities. The INFOMIX system is built in cooperation<br />
with RODAN Systems<sup>18</sup>, a commercial database management system<br />
developer. It uses the DLV reasoning system [LPF+05] for its reasoning tasks. DLV<br />
is a disjunctive datalog reasoning system that has been developed independently<br />
from INFOMIX.<br />

<sup>18</sup> The web site of RODAN Systems can be found at http://www.rodan.pl/en/.<br />

Figure 4.21: Architecture of the INFOMIX system [LGR+05b]<br />

The INFOMIX architecture is depicted in Figure 4.21. It is divided into three levels<br />

and supports two modes:<br />

• a design mode<br />
Here, the global schema, the source schema and the mapping between them are<br />
specified. Also, the wrappers for the data sources are created or imported. The data<br />
sources consist of relational and XML data.<br />

• a query mode<br />
Here, query answering facilities are provided at run time, including data acquisition,<br />
integration, answer computation and presentation to the user.<br />

In both modes, the INFOMIX system is divided into three levels:

• Information Service Level

Here, a direct interface is provided to the user at run time and to the designer at design time. This level deals with global data and provides the necessary interfaces (e.g. for global schema definition and for query formulation).

This level comprises two modules, the Information Model Manager and the Query Formulation module:

– The Information Model Manager handles the definition of the global schema and the local schemas, as well as the mapping. User-friendly interfaces for these tasks, including schema browsers, are provided. Automatic support is given for verifying the coherency, redundancy and adequacy of the application specification. Finally, this module presents query results in a suitable form to the user.

– The Query Formulation module provides a graphical, user-friendly interface for query formulation over the global schema, together with query validation facilities. These facilities check the interactions between the user query and the global integrity constraints to guarantee that query answering is always decidable.

• Internal Integration Level

This level is based on computational logic and deductive database technology. It is composed of three modules:

– The Query Rewriter reformulates the user query according to the global integrity constraints. It makes use of a submodule to verify data consistency. This submodule exploits the mapping, unfolds the user query over the source relations and activates the corresponding wrappers to retrieve the relevant data. Afterwards, the submodule checks whether there are integrity constraint violations. If no violations occur, the reformulation produced by the rewriter is a simple (disjunction-free) Datalog program. Otherwise, a suitable disjunctive Datalog program is generated that performs automatic repair of the data, in such a way that the cautious answers to this program evaluated over the data sources correspond to the certain answers to the query.

– The Query Optimizer provides several optimization strategies which enhance the efficiency of the system. In particular, the module exploits focusing techniques that isolate the portion of the source database that is relevant for answering a user query. This is done by pushing constants in the query towards the sources, i.e. the part of the query that contains constants is processed first. To this end, an optimized (possibly disjunctive) Datalog program is generated by applying advanced binding propagation techniques similar to the magic sets algorithms that are well known in the database area.

– The Query Evaluator takes the optimized program, first loads data from the Internal Data Store and then invokes the DLV system in order to compute consistent answers. The results are then sent to the Information Model Manager for suitable presentation to the user.

• Data Acquisition and Transformation Level

This level provides access to external data sources. The INFOMIX architecture allows for the integration of heterogeneous types of data sources. The primary types are relational, XML, HTML, and object-oriented data sources. However, it is claimed that arbitrary other types of data sources can be incorporated easily. All data sources are conceptually transformed into a uniform source data format, which is a fragment of XML Schema. Data encoded in this uniform source data format can be browsed. The acquisition and transformation of data is done by wrappers. A query plan is generated for executing suitable wrappers, which load data into the Internal Data Store. Constants are pushed to the query wrappers whenever possible in order to reduce the amount of data retrieved.

Currently, INFOMIX offers three classes of wrappers, which provide different levels of support for query formulation and wrapper code generation:

– Code wrappers are basically a definition of an API and the code implementing it. The internals and characteristics of code wrappers are therefore inaccessible to INFOMIX.

– Query wrappers propagate queries to external data sources and treat the result as a logical data source.

– Visual wrappers support interactive development of wrappers at design time. Currently, there is support for developing LiXto wrappers and pipes as well as for Rodan's Data Extractor.
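The repair semantics used by the Query Rewriter can be illustrated with a small sketch. The following Python code is not how INFOMIX or DLV work internally. It assumes a single hypothetical relation `player(id, team)` with a key constraint on `id`, enumerates the repairs explicitly (each repair keeps exactly one of the conflicting tuples per key value), and returns only those answers that hold in every repair, which is the idea behind cautious answers corresponding to certain answers. All names and data are illustrative assumptions.

```python
from itertools import product


def repairs(tuples, key_index=0):
    """Enumerate repairs of a relation under a key constraint:
    each repair keeps exactly one tuple per key value."""
    groups = {}
    for t in tuples:
        groups.setdefault(t[key_index], []).append(t)
    for choice in product(*groups.values()):
        yield set(choice)


def certain_answers(tuples, query, key_index=0):
    """Cautious reasoning: keep only answers returned in every repair."""
    result = None
    for repair in repairs(tuples, key_index):
        answers = set(query(repair))
        result = answers if result is None else result & answers
    return result or set()


# Hypothetical retrieved data: id is the key, but id 2 has two conflicting teams.
player = [(1, "Roma"), (2, "Lazio"), (2, "Milan"), (3, "Roma")]


def roma_ids(rel):
    """Query: ids of players whose team is Roma."""
    return {pid for (pid, team) in rel if team == "Roma"}


def all_ids(rel):
    """Query: ids of all players."""
    return {pid for (pid, _team) in rel}


print(certain_answers(player, roma_ids))
print(certain_answers(player, all_ids))
```

In this example, ids 1 and 3 are certain answers to `roma_ids`. Id 2 is still a certain answer to `all_ids`, because it occurs in every repair even though its team is uncertain.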

Ontology Languages The INFOMIX system supports the following kinds of external data sources:

• Relational data accessible via the ODBC standard, i.e. in particular via SQL, is supported.

• Any object-oriented database that conforms to the ODMG standard (cf. http://www.odmg.org/) and is accessible through the ODMG 2.0 Java binding will be supported. In particular, the object-oriented database must be accessible by OQL queries (more specifically, by the subset of OQL supported by the ODMG Java binding). However, multimedia data types will not be supported, although object-oriented database management systems usually support them. Currently, support for object-oriented databases is not available.

• Pure (semi-structured) XML data (according to the XML 1.0 standard) as well as XML data structured according to a corresponding XML Schema specification is supported. However, the INFOMIX system does not check the validity of structured XML data. It also does not take DTD specifications into account.

• Valid web pages conforming to the HTML 2.0, 3.2 and 4.01 standards are supported. However, the INFOMIX system does not check the validity of HTML data. Note that HTML-rendered versions of embedded data formats such as comma-separated values (CSV), Microsoft Word (planned), Excel and PowerPoint (planned) documents as well as PDF documents are supported. INFOMIX does not support multimedia data such as image files, sound or movie data.

Data extraction and preprocessing of the data sources is done by LiXto [BFG01] wrappers and tools.

The data is stored internally in a format called ISDF (INFOMIX source data format), which has an XML-like syntax and is specified by a subset of XML Schema. Essentially, ISDF amounts to a small extension of the relational format that allows nesting of data and multiple attribute values. This format is translated by the query reformulator into disjunctive Datalog programs extended by aggregate functions.

Mapping Language The mapping is a Global-As-View (GAV) mapping. That is, there is a global schema G and a source schema S, which comprises the schemas of all the sources to be integrated. The mapping consists of a set of logical implications in Datalog. Each implication has a conjunction of atoms over S as antecedent and a relation of G as consequent. All free variables of the body of the implication are quantified. This means that each global relation is associated with a union of conjunctive queries. Clearly, this mapping language is not as general as our mapping language, which can be grounded to different kinds of logics and is much more expressive.
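A GAV mapping of this kind can be pictured as a set of Datalog rules whose heads are global relations and whose bodies are conjunctions of source atoms. The following Python sketch evaluates such rules naively over in-memory source relations. The relation names (`s1_emp`, `s2_staff`, `s3_loc`), the rule encoding and the convention that capitalised arguments are variables are assumptions made for illustration and are not taken from INFOMIX.

```python
def eval_body(body, sources):
    """Evaluate a conjunction of source atoms and return all variable bindings."""
    bindings = [{}]
    for relation, args in body:
        next_bindings = []
        for binding in bindings:
            for row in sources[relation]:
                candidate = dict(binding)
                ok = True
                for arg, value in zip(args, row):
                    if arg[:1].isupper():
                        # Variable: bind it, or check it against the earlier binding.
                        if candidate.setdefault(arg, value) != value:
                            ok = False
                            break
                    elif arg != value:
                        # Constant: must match the value in the source tuple.
                        ok = False
                        break
                if ok:
                    next_bindings.append(candidate)
        bindings = next_bindings
    return bindings


def global_relation(rules, sources):
    """A global relation is the union of the conjunctive queries defining it."""
    tuples = set()
    for head_vars, body in rules:
        for binding in eval_body(body, sources):
            tuples.add(tuple(binding[v] for v in head_vars))
    return tuples


# Hypothetical source relations.
sources = {
    "s1_emp": {("ann", "sales"), ("bob", "it")},
    "s2_staff": {("carl", "it")},
    "s3_loc": {("sales", "graz"), ("it", "vienna")},
}

# employee(N, D) :- s1_emp(N, D).      employee(N, D) :- s2_staff(N, D).
employee_rules = [
    (("N", "D"), [("s1_emp", ("N", "D"))]),
    (("N", "D"), [("s2_staff", ("N", "D"))]),
]
# works_in(N, C) :- s1_emp(N, D), s3_loc(D, C).
works_in_rules = [(("N", "C"), [("s1_emp", ("N", "D")), ("s3_loc", ("D", "C"))])]
# it_staff(N) :- s1_emp(N, it).
it_staff_rules = [(("N",), [("s1_emp", ("N", "it"))])]

print(global_relation(employee_rules, sources))
print(global_relation(works_in_rules, sources))
```

The `employee` relation shows the union of two conjunctive queries, `works_in` shows a join over a shared variable, and `it_staff` shows a constant in a rule body.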

Mapping Patterns A user-friendly interface and schema browsers are available for the global schema, the local schemas and the mappings. However, INFOMIX has no concept of mapping patterns. We are not aware of planned extensions to use mapping patterns.

Automation Support There is no automation support for the creation of mappings between the local schemas and the global schema. However, query rewriting and data fusion are completely automated, based on the mappings between the local schemas and the global schema as well as on the wrapper specifications that relate the original data source formats to the internal storage format.

Applicability to Use Cases Instance transformation and unification can be performed by the INFOMIX system. Query rewriting is naturally performed by the INFOMIX system as well.

The INFOMIX system does not perform ontology merging. However, it translates the relevant data into a uniform data format, which can be viewed in some respects as ontology merging.

Implementation The kernel of the system is developed in C++ to achieve better performance. The user interfaces and data-extraction modules are based on Java components.

Experience There is no experience report available yet.

4.2.6 AutoMed

AutoMed [BKL+04] is an integrated system that incorporates the BAV (both as view) approach. This means that it is not restricted to the LAV (local as view), GAV (global as view) or GLAV (global local as view) approach. BAV subsumes the expressive power of all of these approaches.

Figure 4.22 shows the architecture of the AutoMed system. The metadata repository can be viewed as the core, and the other components of the software architecture are implemented upon it. When a data source is wrapped, a definition of the schema for that data source is added to the repository. The schema matching tool can then identify related objects in various data sources by means of the query processor, which retrieves data from schema objects. The template transformation tool is used to generate transformations between the data sources. A GUI is available for these components.

The repository has two logical components:

• The model definitions repository (MDR) defines how a data modelling language is represented in the hypergraph data model (HDM). It can be used to configure AutoMed to handle a particular data modelling language.

• The schema transformation repository (STR) defines schemas in terms of the data modelling concepts in the MDR, together with the transformations that are specified between those schemas. Most tools and users will be concerned with editing this repository when new databases are added to it.

The MDR and STR may be held in the same or in separate persistent storages.

Figure 4.22: Architecture of the AutoMed system [BKL+04]

Figure 4.23 gives an overview of the key objects in the repository. The STR contains a set of Schema descriptions. Each Schema description contains a set of SchemaObject instances. Each SchemaObject instance must be based on a Construct instance that exists in the MDR. This Construct describes how the SchemaObject can be constructed in terms of strings and references to other schema objects, and the relationship of the construct to the HDM. Schemas are thus easily translatable into HDM, which is the common underlying representation of all the data modelling languages handled in AutoMed. Note that each Schema may contain SchemaObjects from more than one data modelling language. In this way, mappings between different data modelling languages can be described. Schemas may be related to each other using transformation instances.
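The relationship between the STR and MDR objects described above can be sketched as a small data structure. The following Python classes are a simplified illustration and not the AutoMed API; the attribute names and the `hdm_kind` label are assumptions. The sketch only enforces the rule that every SchemaObject must be based on a Construct that exists in the MDR, and shows that one Schema may mix objects from several modelling languages.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Construct:
    """MDR entry: one construct of a data modelling language."""
    language: str   # hypothetical language name, e.g. "sql" or "xml"
    name: str       # e.g. "table" or "element"
    hdm_kind: str   # assumed label for how the construct relates to the HDM


@dataclass(frozen=True)
class SchemaObject:
    """STR entry: built from strings and references, based on a Construct."""
    construct: Construct
    scheme: tuple


@dataclass
class Schema:
    name: str
    objects: set = field(default_factory=set)

    def languages(self):
        """Modelling languages that contribute objects to this schema."""
        return {obj.construct.language for obj in self.objects}


@dataclass
class Repository:
    mdr: set = field(default_factory=set)        # known Construct instances
    schemas: dict = field(default_factory=dict)  # STR: schema name -> Schema

    def add_object(self, schema_name, obj):
        # Rule from the text: the Construct must exist in the MDR.
        if obj.construct not in self.mdr:
            raise ValueError("unknown construct: " + obj.construct.name)
        schema = self.schemas.setdefault(schema_name, Schema(schema_name))
        schema.objects.add(obj)
        return schema


sql_table = Construct("sql", "table", "node")
xml_element = Construct("xml", "element", "node")
repo = Repository(mdr={sql_table, xml_element})
repo.add_object("integrated", SchemaObject(sql_table, ("employee",)))
mixed = repo.add_object("integrated", SchemaObject(xml_element, ("staff",)))
print(mixed.languages())
```

Adding an object whose Construct is not registered in the MDR raises a `ValueError` in this sketch, which mirrors the dependency of the STR on the MDR.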

For this deliverable, the schema matching architecture is the most relevant part. Figure 4.24 shows the architecture of the schema mapping module.

Within AutoMed, five types of semantic relationships between schema objects are distinguished based on the comparison of their intentional domains, i.e. the sets of real-world entities represented by the schema objects. These are: equivalence, subsumption, intersection, disjointness and incompatibility. Transformation rules that should be performed on the schemas based on the discovered semantic relationships are defined in [Riz03]. In AutoMed, the automatic discovery of the semantic relationships is performed by means of a bidirectional comparison of schema objects.

Figure 4.23: Architecture of the AutoMed repository schema [BKL+04]

The relationship discovery process is implemented by means of the architecture in Figure 4.24. The architecture consists of several modules that exploit different types of information in order to compute bidirectional similarity degrees of schema objects. The currently implemented modules compare schema object names, instances, statistical data over the instances, data types, value ranges and lengths. There are two types of modules: relationship identification modules attempt to discover compatible pairs of schema objects, and relationship clarification modules attempt to specify the type of the semantic relationship in each compatible pair. Initially in the schema matching process, the bidirectional similarity degrees produced by the modules are combined by a Filter that uses the average aggregation strategy in order to separate the compatible from the incompatible pairs of schema objects. Afterwards, the Aggregator component combines the similarity degrees of the compatible schema objects with the product aggregation strategy and indicates their semantic relationships. The output of the Aggregator becomes the input of the Degree Combinator, which is based on the relationship clarification modules and on the values of the similarity degrees. The Degree Combinator outputs the semantic relationships it has discovered. The user can then validate or reject the discovered relationships and proceed to the data integration process.
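The Filter and Aggregator steps described above can be sketched as follows. This Python code is only an illustration under stated assumptions: the module outputs are invented, each bidirectional similarity degree is modelled as a `(forward, backward)` pair, and the thresholds and the rule that turns degrees into a relationship type are hypothetical stand-ins for the Degree Combinator, whose actual rules are not given in the text.

```python
from math import prod
from statistics import mean


def average(degrees):
    """Average aggregation (Filter) over [(forward, backward), ...]."""
    return mean([d[0] for d in degrees]), mean([d[1] for d in degrees])


def product(degrees):
    """Product aggregation (Aggregator) over [(forward, backward), ...]."""
    return prod(d[0] for d in degrees), prod(d[1] for d in degrees)


def classify(forward, backward, high=0.8, low=0.2):
    """Assumed stand-in for the Degree Combinator."""
    if forward >= high and backward >= high:
        return "equivalence"
    if forward >= high or backward >= high:
        return "subsumption"
    if forward <= low and backward <= low:
        return "disjointness"
    return "intersection"


def match(identification, clarification, threshold=0.5):
    """Filter incompatible pairs, then classify the compatible ones."""
    result = {}
    for pair, degrees in identification.items():
        if max(average(degrees)) < threshold:
            result[pair] = "incompatibility"
            continue
        forward, backward = product(clarification[pair])
        result[pair] = classify(forward, backward)
    return result


# Hypothetical outputs of two identification and two clarification modules.
identification = {
    ("staff.name", "employee.name"): [(0.9, 0.9), (0.8, 0.7)],
    ("staff.name", "invoice.total"): [(0.1, 0.2), (0.0, 0.1)],
    ("manager.id", "employee.id"): [(0.9, 0.6), (0.7, 0.6)],
}
clarification = {
    ("staff.name", "employee.name"): [(0.95, 0.9), (0.9, 0.95)],
    ("manager.id", "employee.id"): [(0.95, 0.5), (0.9, 0.6)],
}

relationships = match(identification, clarification)
for pair, relationship in relationships.items():
    print(pair, relationship)
```

In a real setting the resulting relationships would be presented to the user for validation, as described above, before any data integration step.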

Ontology Languages The AutoMed system can use different ontology languages. It supports RDF and RDFS as well as XML and common relational database formats.

Mapping Language The mapping language is the hypergraph data model (HDM). This is the internal language that is used.

Mapping Patterns Mapping patterns are not mentioned.

Automation Support There is automatic support for the creation of mappings. The mapping creation architecture shown in Figure 4.24 displays the components that are used to create mappings automatically.

Applicability to Use Cases AutoMed is applicable to all use cases.



Figure 4.24: Architecture of the AutoMed mapping system [BKL+04]

Implementation The AutoMed system, including the individual components of its architecture, has been implemented in Java.

Experience We are not aware of any experience reports on the AutoMed system.

4.3 Specific Techniques

In this section we briefly describe some specific techniques for ontology merging, such as FCA-Merge, OntoMorph, and QOM.

4.3.1 FCA-Merge<br />

FCA-Merge [SM01] is a method for merging ontologies based on Formal Concept Analysis [GW99]. FCA-Merge is a bottom-up approach, which means that it is based on application-specific instances of the two ontologies that need to be merged. A set of documents that are relevant to both ontologies is provided as input. Using linguistic analysis, instances are extracted from the documents for both ontologies. A pruned concept lattice is then generated using the similarity in instances for both ontologies. These first two steps (linguistic analysis and generation of the concept lattice) are carried out fully automatically. In the third and last step of the method, the merged target ontology is created interactively (i.e. semi-automatically).
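The lattice-generation step can be illustrated with a small sketch. It is not the FCA-Merge implementation: the toy context, the `O1:`/`O2:` prefixes and all function names are assumptions made for illustration, the enumeration is brute force, and the pruning of the lattice is reduced to keeping formal concepts whose intent mixes concepts from both ontologies.

```python
from itertools import combinations

# Hypothetical toy context: for each document, the ontology concepts for which
# linguistic analysis found an instance. Prefixes keep the two ontologies apart.
context = {
    "doc1": {"O1:Hotel", "O2:Accommodation"},
    "doc2": {"O1:Hotel", "O2:Accommodation", "O2:Hotel"},
    "doc3": {"O1:Event", "O2:Concert"},
}

def extent(intent, context):
    """Documents whose concept set contains every concept in `intent`."""
    return frozenset(d for d, concepts in context.items() if intent <= concepts)

def intent_of(docs, context):
    """Concepts shared by all documents in `docs` (all concepts if `docs` is empty)."""
    result = set(frozenset().union(*context.values()))
    for d in docs:
        result &= context[d]
    return frozenset(result)

def formal_concepts(context):
    """All (extent, intent) pairs, found by closing every subset of documents.
    Exponential in the number of documents, so suitable for toy inputs only."""
    docs = list(context)
    found = set()
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            shared = intent_of(subset, context)
            found.add((extent(shared, context), shared))
    return found

def merge_candidates(context):
    """Formal concepts supported by at least one document whose intent
    contains concepts from both ontologies."""
    out = []
    for ext, shared in formal_concepts(context):
        sources = {c.split(":", 1)[0] for c in shared}
        if ext and {"O1", "O2"} <= sources:
            out.append((sorted(ext), sorted(shared)))
    return sorted(out)
```

In this toy context, `O1:Hotel` and `O2:Accommodation` occur together in `doc1` and `doc2`, so they appear in one formal concept and would be offered to the user as a merge candidate in the interactive third step.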

4.3.2 OntoMorph

The OntoMorph system aims to facilitate ontology merging and the rapid generation of knowledge base translators [Cha00]. It combines two powerful mechanisms to describe KB transformations. The first of these mechanisms is syntactic rewriting via pattern-directed rewrite rules that allow the concise specification of sentence-level transformations based on pattern matching. The second mechanism is semantic rewriting, which modulates syntactic rewriting via semantic models and logical inference. The integration of ontologies can be based on any mixture of syntactic and semantic criteria.

In the syntactic rewriting process, input expressions are first tokenized into lexemes and then represented as syntax trees. Internally, these trees are stored as flat sequences of tokens, so their structure exists only logically. OntoMorph's pattern language and execution model are strongly influenced by Plisp [Smi90]. The pattern language can match and de-structure arbitrarily nested syntax trees in a direct and concise fashion. Rewrite rules are applied to the execution model.
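Pattern-directed rewriting of nested syntax trees can be sketched as follows. This is a minimal illustration and not OntoMorph's pattern language: the `?variable` convention, the rule and the KB expressions are hypothetical, and trees are held as nested Python lists rather than as flat token sequences.

```python
def match(pattern, tree, bindings=None):
    """Return variable bindings if `tree` matches `pattern`, otherwise None.
    Strings starting with '?' are pattern variables (an assumed convention)."""
    bindings = dict(bindings or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings and bindings[pattern] != tree:
            return None  # the same variable must always bind the same subtree
        bindings[pattern] = tree
        return bindings
    if isinstance(pattern, list):
        if not isinstance(tree, list) or len(pattern) != len(tree):
            return None
        for p, t in zip(pattern, tree):
            bindings = match(p, t, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == tree else None

def substitute(template, bindings):
    """Instantiate `template` by replacing bound variables with their values."""
    if isinstance(template, list):
        return [substitute(t, bindings) for t in template]
    if isinstance(template, str):
        return bindings.get(template, template)
    return template

def rewrite(tree, rules):
    """Rewrite children first, then apply the first matching rule at this node."""
    if isinstance(tree, list):
        tree = [rewrite(child, rules) for child in tree]
    for pattern, template in rules:
        found = match(pattern, tree)
        if found is not None:
            return substitute(template, found)
    return tree

# Hypothetical rule: turn a source-KB concept definition into a subclass statement.
rules = [(["defconcept", "?name", ["is-a", "?parent"]],
          ["subclass-of", "?name", "?parent"])]
source = ["kb", ["defconcept", "Car", ["is-a", "Vehicle"]], ["defrelation", "owns"]]
target = rewrite(source, rules)
```

The rule destructures the nested `is-a` clause in a single pattern, which is the kind of conciseness the text attributes to pattern-directed rules; expressions that match no rule are passed through unchanged.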

For the semantic rewriting process, OntoMorph is built on top of the PowerLoom knowledge representation system (see footnote 19), which is a successor to the Loom system. Using semantic import rules, the precise image of the source KB semantics can be established within PowerLoom (limited only by the expressiveness of first-order logic).

4.3.3 QOM: Quick Ontology Mapping

The QOM tool [ES04a] represents an approach that considers both the quality of mapping results and the run-time complexity. The hypothesis is that mapping algorithms may be streamlined such that the loss of quality (compared to a standard baseline) is marginal, but the improvement in efficiency is so large that it allows for the ad-hoc mapping of large, light-weight ontologies. To substantiate the hypothesis, a number of practical experiments were performed.

The goal is to present an efficient mapping algorithm. The outcome is QOM (Quick Ontology Mapping). It is defined by the steps of a process model as shown in Figure 4.25. Mapping one ontology onto another means that for each entity (concept C, relation R, or instance I) in ontology O1, one tries to find a corresponding entity, which has the same intended meaning, in ontology O2.

19 http://www.isi.edu/isd/LOOM/PowerLoom/

Figure 4.25: QOM Mapping Process

1. First, QOM uses RDF triples as features.

2. Second, instead of comparing all entities of the first ontology with all entities of the second ontology, QOM uses heuristics to lower the number of candidate mappings, which is a major problem for run-time complexity. In this dynamic programming approach we only choose promising candidate mappings.

3. The actual similarity computation is done by using a wide range of similarity functions [ES04b]. An entity is described by the kind of appearance that is found to hold for this entity for characteristics like: identifiers such as URIs, RDF/S primitives such as subclass and instance relations, or domain-specific features, e.g. a hashcode-of-file in a file sharing domain. These features of ontological entities are compared using String Similarity and SimSet for set comparisons. For efficiency reasons the similarity computation was disburdened by removing extremely costly feature-measure combinations such as the comparison of all subclasses of two concepts.

4. These individual measures are all input to the similarity aggregation. Instead of applying linear aggregation functions, QOM applies a sigmoid function, which emphasizes high individual similarities and de-emphasizes low individual similarities.

5. From the similarity values we derive the actual mappings. A threshold to discard spurious evidence of similarity is applied. Further mappings are assigned based on a greedy strategy that starts with the largest similarity values first.

6. Through several iteration rounds the quality of the results rises considerably. Eventually, the output returned is a mapping table representing the relation map_{O1,O2}. The table is represented in a proprietary format, but can easily be transformed into a standardized format.
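Steps 2 to 5 of the process can be sketched roughly as follows. This is a simplified illustration under stated assumptions and not the QOM implementation: the first-letter candidate heuristic, the use of `difflib` for string similarity, Jaccard overlap as a stand-in for SimSet, the equal weights, the sigmoid steepness and the toy ontologies are all hypothetical, and the iteration of step 6 is omitted.

```python
import math
from difflib import SequenceMatcher

def string_sim(a, b):
    """String similarity in [0, 1]; difflib's ratio is a stand-in measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def set_sim(a, b):
    """Jaccard overlap of two sets, used here in place of SimSet."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def sigmoid(x, steepness=10.0):
    """Shifted sigmoid: pushes values above 0.5 towards 1 and values below towards 0."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - 0.5)))

def candidate_pairs(o1, o2):
    """Step 2 (assumed heuristic): only pair entities whose labels share a first letter."""
    buckets = {}
    for e2, d2 in o2.items():
        buckets.setdefault(d2["label"][:1].lower(), []).append(e2)
    for e1, d1 in o1.items():
        for e2 in buckets.get(d1["label"][:1].lower(), []):
            yield e1, e2

def aggregated_sim(d1, d2):
    """Steps 3 and 4: individual similarities, aggregated after a sigmoid transform."""
    sims = [string_sim(d1["label"], d2["label"]),
            set_sim(d1["parents"], d2["parents"])]
    return sum(sigmoid(s) for s in sims) / len(sims)

def extract_mappings(o1, o2, threshold=0.5):
    """Step 5: discard pairs below the threshold, then assign greedily, best first."""
    scored = [(aggregated_sim(o1[a], o2[b]), a, b) for a, b in candidate_pairs(o1, o2)]
    mapping, used = {}, set()
    for sim, a, b in sorted(scored, reverse=True):
        if sim >= threshold and a not in mapping and b not in used:
            mapping[a] = b
            used.add(b)
    return mapping

# Hypothetical toy ontologies: each entity has a label and a set of parent labels.
o1 = {"o1:Car": {"label": "Car", "parents": {"Vehicle"}},
      "o1:Person": {"label": "Person", "parents": {"Agent"}}}
o2 = {"o2:car": {"label": "car", "parents": {"Vehicle"}},
      "o2:persons": {"label": "Persons", "parents": {"Agent"}},
      "o2:Pet": {"label": "Pet", "parents": {"Animal"}}}
mapping = extract_mappings(o1, o2)
```

Only three of the six possible pairs are compared at all, which is where the saving over an all-against-all comparison comes from; the pair `o1:Person`/`o2:Pet` is compared but falls below the threshold.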

The evaluation was very promising. Depending on the scenario, QOM reaches high-quality mapping levels very quickly. QOM is on a par with other good state-of-the-art algorithms concerning the quality of proposed mappings, while outperforming them with respect to efficiency: in terms of run-time complexity (O(n · log(n)) instead of O(n^2)) and in terms of the experiments that have been performed on the QOM tool (by a factor of 10 to 100).
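To give a feeling for the gap between the two complexity classes, the ratio of the two growth terms can be worked out for two sample sizes. This is plain arithmetic on the asymptotic terms, assuming a base-2 logarithm and ignoring constant factors; it is not a reproduction of the measured speed-up.

```latex
\[
  \frac{n^{2}}{n \log_{2} n} = \frac{n}{\log_{2} n},
  \qquad
  n = 100:\ \frac{100}{6.64} \approx 15,
  \qquad
  n = 1000:\ \frac{1000}{9.97} \approx 100 .
\]
```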


Chapter 5

Comparison of the Methods

5.1 Ontology Languages

We compare the ontology languages supported by the approaches included in this survey. Because many of the systems included in the survey are database integration systems, the ontology language is not the only language that counts. For these systems, it also matters which database schema languages are supported.

In these integration systems, the data schema is often lifted to the ontology level before the actual integration takes place. Therefore, for the core mapping task we are usually only concerned with the ontology languages. Because this lifting step is so common in database integration systems, the lifting methods employed by the different systems also need to be compared. This small sub-comparison evaluates to what extent the schema is actually translated to the ontology. In other words, what we need to know is: (1) is the translation sound, i.e. is the translation from the schema to the ontology correct and semantics-preserving? and (2) is the translation complete, i.e. is the schema translated completely?
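The two questions can be made concrete with a small sketch. It is only a structural approximation under assumptions: the schema and lifting dictionaries and all names are hypothetical, completeness is reduced to "every table and column has a counterpart", and soundness in the semantic sense cannot be checked this way, so the second function only flags one kind of obvious structural mismatch.

```python
def schema_elements(schema):
    """All tables and qualified columns of a schema given as {table: [columns]}."""
    elems = set(schema)
    for table, cols in schema.items():
        elems |= {f"{table}.{c}" for c in cols}
    return elems

def untranslated(schema, lifting):
    """Completeness check: schema elements that have no counterpart in the ontology."""
    return sorted(schema_elements(schema) - set(lifting))

def structure_violations(schema, lifting):
    """Structural proxy for soundness: columns whose lifted property is not
    attached to the class that their table was lifted to."""
    bad = []
    for table, cols in schema.items():
        cls = lifting.get(table)
        for c in cols:
            prop = lifting.get(f"{table}.{c}")
            if prop is not None and (cls is None or not prop.startswith(cls + ".")):
                bad.append(f"{table}.{c}")
    return sorted(bad)

# Hypothetical relational schema and its lifting to ontology classes and properties.
schema = {"person": ["id", "name"], "hotel": ["city"]}
lifting = {"person": "Person", "person.id": "Person.id",
           "person.name": "Person.name", "hotel": "Hotel"}
missing = untranslated(schema, lifting)
violations = structure_violations(schema, lifting)
```

Here the lifting is incomplete because `hotel.city` has no counterpart, while no structural violation is reported for the elements that were translated.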

Table 5.1 enumerates the ontology languages supported by the various approaches. It turns out that integration systems typically do not focus on inter-operability with other ontology tools. This makes sense, because all tasks (mainly querying) are performed in a closed environment. In a Semantic Web setting, use of standards is very important to enable inter-operability. We can see that tools created especially for the Semantic Web (e.g. MAFRA, RDFT, PROMPT, OntoMap, OMEN) support RDFS and PROMPT also supports OWL. On the other hand, matchers such as GLUE and S-Match use their own internal representation. Usually it is not a problem to convert an ontology in any language to such a representation, which is typically done in more comprehensive tool environments such as PROMPT.


| Approach | Ontology Language | Remarks |
|---|---|---|
| **Methods and Tools** | | |
| MAFRA | RDFS | |
| RDFT | RDFS | |
| PROMPT | Protégé-2000 supported | Includes support for RDFS, OWL, etc. |
| GLUE | taxonomies | |
| S-Match | DAGs | |
| OntoMap | proprietary language similar to OWL Lite− | supports export to RDFS |
| RDFDiff | RDF | |
| OMEN | proprietary language similar to RDF Schema | |
| WSMX Mediation | Flora (soon WSML based) | Integrated in the Web Services Modeling Toolkit. |
| DOME Mediation | WSML | Based on an abstract mapping language |
| **Integration Systems** | | |
| InfoSleuth | OKBC | wrappers are used to integrate data sources |
| ONION | Directed labeled graphs and Horn rules | source schemas are translated using wrappers |
| OBSERVER | Description Logics (CLASSIC) | The ontology server maintains mappings between data schemas and ontologies |
| MOMIS | ODLI3 | relational and semi-structured sources are translated to ODLI3 using custom wrappers |
| INFOMIX | a set of logical implications in Disjunctive Datalog | INFOMIX has a very rich support for different interfaces and languages |
| AutoMed | HDM | XML, RDF(S) and relational databases are supported |

Table 5.1: Ontology Languages



5.2 Mapping Language

Table 5.2 enumerates the mapping languages used by the different approaches. In mapping languages we can see three general approaches:

• The ontology language and the mapping language are the same. This is the case in MOMIS and, to some extent, in OntoMap, OMEN, OBSERVER and ONION.

• The mapping language is different from the ontology language. This is the case in MAFRA, RDFT and AutoMed and, to some extent, in OBSERVER and INFOMIX.

  MAFRA and RDFT both use a meta-ontology to describe types of bridges, which explicate the relationship between the ontologies. These types of bridges can be seen as the vocabulary for the mapping languages.

  OBSERVER uses to some extent the same languages for the specification of both the ontologies and the mappings. However, the mapping also allows transformer functions, which go beyond the ontology language, and the mappings between the ontologies and the data schemas are specified using ERA (Extended Relational Algebra).

• There is no real mapping language. The output of the tool is a similarity measure between concepts in the ontologies. This is the case for matchers, such as GLUE and S-Match. The purpose of these matchers is not to create an ontology mapping as such, but to discover similarities between the ontologies.

MAFRA, RDFT and OntoMap each describe an ontology of bridges, called SBO (Semantic Bridge Ontology), RDFT (RDF-Transformations) and OntoMapO (OntoMap Ontology), respectively. These bridges are instantiated in the actual ontology mapping in order to realize the mapping specification. In this context, MAFRA has the most elaborate bridge ontology, i.e. MAFRA has the most expressive mapping language. However, MAFRA does not support mappings between classes and instances, which are supported by OntoMap. The specification of such a bridge ontology has many advantages; the main advantage is that it makes the type of mappings clearer and more understandable to the user and allows the user to find suitable mappings between ontologies more easily.
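The idea of instantiating bridge types to obtain a concrete mapping can be sketched as follows. The two classes below are a hypothetical miniature and do not reproduce the vocabulary of SBO, RDFT or OntoMapO; the field names, the entity names and the `translate` helper are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PropertyBridge:
    """Bridge type relating a source property to a target property."""
    source_property: str
    target_property: str

@dataclass(frozen=True)
class ConceptBridge:
    """Bridge type relating a source concept to a target concept,
    together with the property bridges that apply to its instances."""
    source_concept: str
    target_concept: str
    property_bridges: tuple = ()

def translate(instance, bridges):
    """Translate one source instance using the first applicable concept bridge.
    Properties without a property bridge are dropped; returns None if no bridge applies."""
    for bridge in bridges:
        if bridge.source_concept == instance["type"]:
            renames = {p.source_property: p.target_property
                       for p in bridge.property_bridges}
            out = {"type": bridge.target_concept}
            for key, value in instance.items():
                if key != "type" and key in renames:
                    out[renames[key]] = value
            return out
    return None

# A concrete mapping is a set of instantiated bridges (hypothetical entity names).
mapping = [ConceptBridge("o1:Person", "o2:Human",
                         (PropertyBridge("o1:name", "o2:fullName"),))]
result = translate({"type": "o1:Person", "o1:name": "Ada", "o1:age": 36}, mapping)
```

Because every mapping is an instance of a named bridge type, a user or a tool can tell at a glance what kind of correspondence is being stated, which is the understandability advantage described above.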

We have not mentioned PROMPT and RDFDiff yet. PROMPT is not used to create a mapping between ontologies, but to merge ontologies. Therefore, the output of the tool is not a mapping specification, but a merged ontology, which, in this case, can be exported to any ontology language supported by Protégé, such as RDFS or OWL. RDFDiff returns as its output not the mapping between two different ontologies, but the structural difference between two versions of an ontology in the form of changed, added and deleted triples in the RDF document.
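A triple-level difference of this kind can be sketched with plain set operations. This is not the RDFDiff algorithm: the representation of triples as Python tuples, the example data and in particular the rule for "changed" (exactly one deleted and one added triple sharing subject and predicate) are assumptions made for illustration.

```python
from collections import defaultdict

def diff_triples(old, new):
    """Compare two versions given as collections of (subject, predicate, object) tuples."""
    old, new = set(old), set(new)
    added, deleted = new - old, old - new

    # Group by (subject, predicate) to spot triples whose object was replaced.
    added_by_key, deleted_by_key = defaultdict(list), defaultdict(list)
    for t in added:
        added_by_key[t[:2]].append(t)
    for t in deleted:
        deleted_by_key[t[:2]].append(t)

    changed = []
    for key in sorted(deleted_by_key):
        # Assumed rule: one deletion plus one addition on the same key is a change.
        if len(deleted_by_key[key]) == 1 and len(added_by_key.get(key, [])) == 1:
            changed.append((deleted_by_key[key][0], added_by_key[key][0]))
    for old_t, new_t in changed:
        deleted.discard(old_t)
        added.discard(new_t)
    return {"changed": changed, "added": sorted(added), "deleted": sorted(deleted)}

# Hypothetical versions of a small ontology.
v1 = {("ex:Car", "rdfs:subClassOf", "ex:Thing"),
      ("ex:Car", "rdfs:label", "car"),
      ("ex:Bike", "rdf:type", "rdfs:Class")}
v2 = {("ex:Car", "rdfs:subClassOf", "ex:Vehicle"),
      ("ex:Car", "rdfs:label", "car"),
      ("ex:Vehicle", "rdf:type", "rdfs:Class")}
delta = diff_triples(v1, v2)
```

The result describes how one version evolved into the other and says nothing about correspondences between two independently developed ontologies, which is why RDFDiff is treated separately from the mapping tools.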



| Approach | Mapping Language | Remarks |
|---|---|---|
| **Methods and Tools** | | |
| MAFRA | Semantic Bridge Ontology (SBO) | SBO is a meta-ontology of semantic bridges. It allows arbitrary mappings between concepts, relations, and attributes, as well as conditional mappings and procedural transformations |
| RDFT | RDFT | RDFT is a meta-ontology, which describes types of mappings (bridges). Only allows class-to-class and property-to-property bridges |
| PROMPT | - | not applicable; PROMPT merges ontologies |
| GLUE | similarity measures | |
| S-Match | set-based (equal, disjoint, subset, superset) | |
| OntoMap | OntoMapO | OntoMapO allows specification of relationships between classes and also between classes and instances. |
| RDFDiff | changed, added, deleted | RDFDiff detects changed, added and deleted triples between versions of an RDF document |
| OMEN | Bayesian Network with individual pairs of matches as nodes | |
| WSMX Mediation | SEKT Abstract Mapping Language | |
| DOME | SEKT Abstract Mapping Language | |
| **Integration Systems** | | |
| InfoSleuth | wrappers | no ontology mapping; just mapping data schemas to ontologies |
| ONION | Articulation rules | |
| OBSERVER | Extended Relational Algebra for mapping ontology-DB and DL and transformer functions for mapping between ontologies | |
| MOMIS | ODLI3 | wrappers are used to integrate data sources |
| INFOMIX | Disjunctive Datalog | wrappers are used to integrate data sources |
| AutoMed | HDM | wrappers are used to integrate data sources |

Table 5.2: Mapping Language



5.3 Mapping Patterns

None of the approaches in this survey uses mapping patterns in the way proposed by [PGM98]. However, the types of mappings often present in specific mapping languages (e.g. the bridges in RDFT and MAFRA and articulation rules in ONION) can be seen as elementary mapping patterns. One could combine a number of these bridges to create more complex mapping patterns. However, there is no explicit support for such combinations in existing approaches.

5.4 Automation Support

Table 5.3 enumerates the automation support provided by the different approaches. Both ontology matchers (GLUE and S-Match) are completely automated, in the sense that after inputting two ontologies, the similarities between concepts in the ontologies are returned without any user interaction. However, the matching of ontologies is just one step in the overall mapping process (see Section 1.2). Therefore, these approaches automate only part of the mapping process.

Mapping (or merging) ontologies is often an interactive process (e.g. in PROMPT), where the tool suggests mapping or merging actions to the user and the user can choose to perform the suggested action, to discard it or to perform a different action. After the user interaction, the tool has more information to come up with more accurate suggestions. It is not clear if and how one-shot matchers such as GLUE and S-Match could fit in such an interactive process.

The integration systems ONION and MOMIS use specific tools (SKAT and ARTEMIS, respectively) for the discovery of similarity between ontologies. These tools are typically integrated in the system, which allows user interaction in the mapping process.

In the context of both MAFRA and RDFT, techniques for ontology matching were described. MAFRA exploits the terms and the structure of the ontologies for the matching, whereas RDFT exploits the instance descriptions associated with the ontologies to find similarities.

5.5 Applicability to Use Cases

The integration systems (InfoSleuth, MOMIS, OBSERVER and ONION) in this survey typically support all instance mediation use cases presented in Chapter 2. This is because the typical use case for data integration systems is the integrated querying of multiple data sources using a unified view (ontology). The querying of a unified view can be decomposed into query rewriting (the query in terms of the global ontology has to be


CHAPTER 5. COMPARISON OF THE METHODS 106<br />

| Approach | Automation Support | Remarks |
|---|---|---|
| Methods and Tools | | |
| MAFRA | lexical and structural matching and semi-automatic creation of mappings | |
| RDFT | discovery of similarities based on instance data | |
| PROMPT | name & structural matching | semi-automatic ontology merging, where merge actions are suggested based on similarities |
| GLUE | multi-strategy machine learning approach | |
| S-Match | matching based on synsets from thesauri, using a SAT solver | |
| OntoMap | - | automation is supported when two ontologies are mapped to a common ontology |
| RDFDiff | changes between versions are detected automatically | |
| OMEN | mappings that depend on other mappings can be inferred automatically | |
| WSMX Mediation | String Matching, Wordnet similarity and real time user feedback | |
| DOME | No specific automation support. The use of the Mapping Language enables the use of automation tools exporting in this language. | |
| Integration Systems | | |
| InfoSleuth | - | |
| ONION | term and structural matching using SKAT | |
| OBSERVER | - | |
| MOMIS | name and structural matching using ARTEMIS | affinities computed by ARTEMIS are used to identify candidates for classes in the global ontology |
| INFOMIX | - | |
| AutoMed | bidirectional similarity degrees | |

Table 5.3: Automation Support



rewritten in terms of the local data source), instance transformation (query results need to be translated from the local representation to the global representation) and instance unification (duplicates and redundancy have to be removed from the query results when the results from different data sources are combined).
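The three-step decomposition can be sketched in a few lines. This is an illustrative toy, not the query processor of any surveyed system: the source names, attribute mappings and data below are hypothetical, and queries are reduced to simple attribute/value filters.

```python
# Hypothetical mappings from global-ontology attributes to each source's local attributes.
SOURCE_MAPPINGS = {
    "library_a": {"title": "name", "author": "creator"},
    "library_b": {"title": "book_title", "author": "written_by"},
}

# Toy local data sources; a real system would query databases or ontologies.
SOURCE_DATA = {
    "library_a": [{"name": "Dune", "creator": "Frank Herbert"}],
    "library_b": [
        {"book_title": "Dune", "written_by": "Frank Herbert"},
        {"book_title": "Emma", "written_by": "Jane Austen"},
    ],
}


def rewrite_query(global_query, mapping):
    """Query rewriting: express a global attribute/value filter in local terms."""
    return {mapping[attr]: value for attr, value in global_query.items()}


def run_local(local_query, rows):
    """Evaluate the rewritten filter against one local source."""
    return [r for r in rows if all(r.get(k) == v for k, v in local_query.items())]


def transform_instance(row, mapping):
    """Instance transformation: translate a local result into the global representation."""
    inverse = {local: glob for glob, local in mapping.items()}
    return {inverse[k]: v for k, v in row.items() if k in inverse}


def unify(instances):
    """Instance unification: remove duplicates that arrive from different sources."""
    seen, unique = set(), []
    for inst in instances:
        key = tuple(sorted(inst.items()))
        if key not in seen:
            seen.add(key)
            unique.append(inst)
    return unique


def answer(global_query):
    """Answer a query posed against the unified view."""
    results = []
    for source, mapping in SOURCE_MAPPINGS.items():
        local_query = rewrite_query(global_query, mapping)
        for row in run_local(local_query, SOURCE_DATA[source]):
            results.append(transform_instance(row, mapping))
    return unify(results)
```

Calling `answer({"author": "Frank Herbert"})` queries both sources and returns a single record for "Dune", because the two local copies become identical once transformed to the global representation. Real instance unification is harder than exact-match deduplication, since the same entity is often described with differing values.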

Both MAFRA and RDFT have specific support for instance transformations. In MAFRA, it is possible to attach an executable piece of code to a Semantic Bridge. In RDFT, it is possible to associate an XPath expression with a bridge. Because RDFT is used for transforming XML documents and RDF documents in their XML representation, the XPath language can be used to express such transformations.
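The idea of attaching a path expression to a bridge can be illustrated as follows. This sketch does not use RDFT's actual vocabulary or serialization; the bridge table, the property names and the sample document are hypothetical, and the paths are restricted to the XPath subset supported by Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document in a local XML vocabulary.
SOURCE_XML = """
<catalog>
  <item><label>Laser printer</label><cost currency="EUR">199</cost></item>
  <item><label>Ink cartridge</label><cost currency="EUR">25</cost></item>
</catalog>
"""

# Hypothetical bridges: target property -> path expression, relative to one source item.
BRIDGES = {"productName": "./label", "price": "./cost"}


def transform(xml_text, instance_path, bridges):
    """Build one target instance per source node by evaluating each bridge's path."""
    root = ET.fromstring(xml_text)
    instances = []
    for node in root.findall(instance_path):
        instance = {}
        for target_property, path in bridges.items():
            match = node.find(path)
            if match is not None:
                instance[target_property] = match.text
        instances.append(instance)
    return instances


products = transform(SOURCE_XML, "./item", BRIDGES)
```

Each bridge here only selects a value. A bridge that also converts values (for example, units or currencies) needs executable logic in addition to the path expression, which is closer to MAFRA's approach of attaching code to a Semantic Bridge.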

A form of ontology merging is performed in both the ONION and the MOMIS systems. In both systems, a global ontology is created, based on the local ontologies and database schemas. The global ontology is, in both cases, a virtual view over the underlying data sources; the local sources remain and mapping rules between the global ontology and the local ontologies are specified inside the global ontology.

PROMPT, on the other hand, is a pure ontology merging tool. The outcome of the PROMPT tool is a merged version of the source ontologies; no mappings between the sources and the merged ontology are created; the merged ontology is assumed to replace the original ontologies.

5.6 Implementation

Table 5.4 enumerates the types of implementations that have been made for the different approaches.

As we can see from the table, most approaches have only been implemented as academic prototypes. For most approaches we are not aware of any planned further development of the tools. Exceptions are PROMPT, which is currently under active development and has recently been adapted for the Protégé OWL plugin; S-Match, which is currently being extended to take the different semantics of different relations in the ontologies into account and to optimize the performance of the implementation; and RDFDiff, which will be further developed in the course of the SEKT project.

5.7 Experiences

Table 5.5 enumerates the experiences with the various approaches reported in the literature. Most of the experiences reported in the literature concern toy problems; we feel that real experience with ontology mapping and ontology-based information integration is lacking. A cause of this problem is that the Semantic Web has not gained any real



| Approach | Implementation | Remarks |
|---|---|---|
| Methods and Tools | | |
| MAFRA | Two prototypes have been implemented | |
| RDFT | Research Prototype | |
| PROMPT | Version 2.1.1 | PROMPT is still undergoing active development |
| GLUE | Research Prototype | |
| S-Match | First prototype | Work is still under way to improve the implementation |
| OntoMap | Prototype; under development since 2001 | |
| RDFDiff | Research Prototype | The tool will be further developed in the course of the SEKT project |
| OMEN | first research prototype | |
| WSMX Mediation | Research Prototype | |
| DOME | Research Prototype | |
| Integration Systems | | |
| InfoSleuth | Project Prototype | |
| ONION | Research Prototype | |
| OBSERVER | Research Prototype | |
| MOMIS | Research Prototype | |
| INFOMIX | INFOMIX IST Project | |
| AutoMed | AutoMed project, LSDI project, ISPIDER project, RoDEX project | |

Table 5.4: Implementation



| Approach | Experience | Remarks |
|---|---|---|
| Methods and Tools | | |
| MAFRA | Toy problems | |
| RDFT | Mapping product classification schemes in GoldenBullet project | |
| PROMPT | HPKB project; evaluation using example ontologies | |
| GLUE | Toy problems | |
| Semantic Matching | Toy problems | Was evaluated against other matchers |
| OntoMap | Applied to most upper-level ontologies | |
| RDFDiff | Toy problems | |
| OMEN | Toy problems | |
| WSMX Mediation | DIP European project | |
| DOME | DIP European project | |
| Integration Systems | | |
| InfoSleuth | Two elaborate case studies | |
| ONION | Toy examples | |
| OBSERVER | Prototype with real-life bibliographic data | |
| MOMIS | Toy examples | ARTEMIS (part of MOMIS) has been applied in the domain of Italian Public Administration |
| INFOMIX | - | |
| AutoMed | some case studies | |

Table 5.5: Experience

momentum as yet. Therefore, there are currently not many ontologies on the Web, although there is some experience with real-life data sources, such as bibliographic sources.


Chapter 6

Conclusions

In this survey we have evaluated and compared several approaches to ontology mapping, ontology matching, ontology merging and data integration.

Comparing different types of approaches in this survey has made it clear that none of the approaches exactly fits all our criteria for ontology mediation on the Semantic Web. The purposes of the approaches in this survey tend to vary. The integration systems such as MOMIS [BCVB01] and ONION [MWK00, MW01] have the aim of providing query answering services over multiple data sources to the user. Matchers such as GLUE [DMDH04] and S-Match [GS04] have the more specific goal of finding similarities between schemas or ontologies. Integration systems often use matchers for the discovery of mappings (e.g. ONION uses SKAT; MOMIS uses ARTEMIS).

MAFRA [MMSV02] and RDFT [Ome02b] provide meta-ontologies for the specification of mappings between ontologies. These mappings can be used for instance transformations. In fact, both approaches include specific means to enable transformation of instances between different representations. However, whereas InfoSleuth provides methods for the fusion of query results (instance unification), neither MAFRA nor RDFT handles this situation. Query rewriting is also not explicitly handled, but this should not be a problem because bridges between entities in the ontologies are explicitly present and can be readily used for query rewriting.

This state-of-the-art survey has made clear that work still needs to be done in the area of ontology mediation on the Semantic Web. We can learn from the data integration systems, which provide services for query answering over distributed heterogeneous data sources. However, the current setting of these integration systems is inside the enterprise, which is still a more-or-less controlled area. On the Web, not much control over the use of ontologies can be expected and the global integration scenario is not expected to scale, because eventually different organizations will use different ontologies and will not want to commit to a new ontology. However, the one-to-one integration approach is also not expected to scale, because it would require the maintenance of too many mappings


Figure 6.1: Ontology “islands”: large ellipses depict locally global ontologies; small ellipses depict locally local ontologies

between ontologies. Therefore, we expect a hybrid approach to emerge, with several “islands” around influential domain ontologies. Within each island there is a form of global integration: one ontology is the global ontology of the island and a number of local ontologies are mapped to this global ontology. Then, there would be mappings between the islands, as illustrated in Figure 6.1. In the ontology mediation solution to be developed within SEKT, we need to take this into account and we need to combine global integration approaches, such as the ones supported by MOMIS and ONION, with one-to-one mappings, which are supported by, for example, MAFRA.
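The scaling argument can be made concrete by counting the mappings that each topology has to maintain. The following sketch rests on simplifying assumptions that are not part of the survey: one mapping per pair of connected ontologies, every pair of ontologies mapped in the one-to-one case, and every pair of island global ontologies mapped in the hybrid case.

```python
def one_to_one_mappings(n_ontologies):
    """Pairwise integration: one mapping for every pair of ontologies."""
    return n_ontologies * (n_ontologies - 1) // 2


def global_mappings(n_local):
    """Global integration: each local ontology is mapped to the single global ontology."""
    return n_local


def island_mappings(local_counts):
    """Hybrid 'islands': local-to-global mappings within each island,
    plus pairwise mappings between the global ontologies of the islands."""
    n_islands = len(local_counts)
    within = sum(local_counts)
    between = n_islands * (n_islands - 1) // 2
    return within + between


# 100 ontologies in total: either all mapped pairwise, or grouped into
# 5 islands of 1 global ontology and 19 local ontologies each.
pairwise = one_to_one_mappings(100)
hybrid = island_mappings([19] * 5)
```

Under these assumptions the pairwise topology needs 4950 mappings, whereas the hybrid topology needs 105, without requiring all 100 parties to commit to a single global ontology.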

From the mapping process (Section 1.2) we can already see that we need different types of methods and tools for its realization. Most notably, we need an ontology matcher in order to identify similarities between ontologies and we need a mapping language and a mapping tool for the specification of ontology mappings. Examples of matchers are GLUE and S-Match. MAFRA and OntoMap provide mapping languages for the specification of mappings between ontologies.

From the comparison in Chapter 5 we can see that current approaches to ontology mapping have mostly been applied to toy problems and cannot be expected to scale both



in the number of ontologies to be mapped and the number of instances to be transformed at execution time. Within SEKT we need to take into account the different approaches that are out there and especially look into database integration approaches, which can overcome some of the scalability issues with large sets of instances.

Within SEKT the Ontology Mediation plugin called OntoMap is being developed. Note that OntoMap shares its name with the mediation system developed by Sirma (cf. Section 4.1.6), but is a completely different system. The OntoMap plugin uses F-Logic as its Ontology Mapping Language and is integrated in OntoStudio, which yields an integrated system. The Ontobroker reasoner is used for reasoning with the ontologies, mediation rules and data. OntoStudio has support for different kinds of databases and source formats and thus is competitive with the presented integrated systems for mediation. It also provides facilities for Query Answering. Furthermore, the following use cases can be implemented by OntoMap:

• Instance Transformation
• Instance Unification
• Ontology Merging

OntoMap supports Mapping Patterns, which are not available in any of the systems that have been surveyed in this state-of-the-art survey.

Thus, OntoMap, as a plugin for OntoStudio, can be considered at least competitive with the presented systems.


Bibliography<br />

[AC01] Andrea Calì, Giuseppe De Giacomo, and Maurizio Lenzerini. Models for information integration: turning local-as-view into global-as-view. In Proc. of the Workshop on Foundation of Models for Information Integration (FMII 2001), pages 270–284, 2001.

[BCM+03] Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi, and Peter F. Patel-Schneider, editors. The Description Logic Handbook. Cambridge University Press, 2003.

[BCV99] Sonia Bergamaschi, Silvana Castano, and Maurizio Vincini. Semantic integration of semistructured and structured data sources. SIGMOD Record Special Issue on Semantic Interoperability in Global Information, 28(1), March 1999.

[BCVB01] Sonia Bergamaschi, Silvana Castano, Maurizio Vincini, and Domenico Beneventano. Semantic integration of heterogeneous information sources. Special Issue on Intelligent Information Integration, Data & Knowledge Engineering, 36(1):215–249, 2001.

[Bec03] Dave Beckett. RDF/XML syntax specification (revised). Recommendation 10 February 2004, W3C, 2003.

[BFG01] R. Baumgartner, S. Flesca, and G. Gottlob. Visual web information extraction with LiXto. In Proceedings of VLDB, 2001.

[BG04] Dan Brickley and Ramanathan V. Guha. RDF vocabulary description language 1.0: RDF schema. Recommendation 10 February 2004, W3C, 2004. Available from http://www.w3.org/TR/rdf-schema/.

[BKL+04] M. Boyd, S. Kittivoravitkul, P. Lazanitis, P.J. McBrien, and N. Rizopoulos. AutoMed: A BAV data integration system for heterogeneous data sources. In Proceedings of CAiSE04, 2004.

[BLHL01] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, 284(5):34–43, May 2001.


http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&ref=sciam.

[BS85] Ronald J. Brachman and James G. Schmolze. An overview of the KL-ONE knowledge representation system. Cognitive Science, 9(2):171–216, 1985.

[CAdV01] Silvana Castano, Valeria De Antonellis, and Sabrina De Capitani di Vimercati. Global viewing of heterogeneous data sources. IEEE Transactions on Knowledge and Data Engineering, 13(2):277–297, 2001.

[CD99] James Clark and Steve DeRose. XML path language (XPath) version 1.0. Recommendation 16 November 1999, W3C, 1999.

[CdA99] Silvana Castano and Valeria de Antonellis. A schema analysis and reconciliation tool environment for heterogeneous databases. In Proceedings of the 1999 International Symposium on Database Engineering & Applications. IEEE Computer Society, 1999.

[CDGL01] Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. A framework for ontology integration. In Proc. of the 2001 Int. Semantic Web Working Symposium (SWWS 2001), pages 303–316, 2001.

[CFF+98] Vinay K. Chaudhri, Adam Farquhar, Richard Fikes, Peter D. Karp, and James P. Rice. OKBC: A programmatic foundation for knowledge base interoperability. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), pages 600–607, Madison, Wisconsin, USA, 1998. MIT Press.

[Cha00] Hans Chalupsky. OntoMorph: A translation system for symbolic knowledge. In Anthony G. Cohn, Fausto Giunchiglia, and Bart Selman, editors, KR 2000, Principles of Knowledge Representation and Reasoning Proceedings of the Seventh International Conference, pages 471–482, Breckenridge, Colorado, USA, 2000. Morgan Kaufmann Publishers.

[Che79] P. Chen. The entity relationship model - toward a unified view of data. ACM Transactions on Database Systems, 1(1):9–36, 1979.

[Cla99] James Clark. XSL transformations (XSLT) version 1.0. Recommendation 16 November 1999, W3C, 1999.

[dB04] Jos de Bruijn. Semantic integration of disparate data sources in the COG project. In Proceedings of the 6th International Conference on Enterprise Information Systems (ICEIS2004), Porto, Portugal, 2004.

[dBP04] Jos de Bruijn and Axel Polleres. Towards an ontology mapping specification language for the semantic web. Technical Report DERI-2004-06-30, DERI, 2004.



[dBPF04] Jos de Bruijn, Axel Polleres, and Dieter Fensel. OWL Lite−. Deliverable d20v0.1, WSML, 2004. Available from http://www.wsmo.org/2004/d20/v0.1/.

[DKO+02] Ying Ding, M. Korotkiy, Borys Omelayenko, V. Kartseva, V. Zykov, Michel Klein, Ellen Schulten, and Dieter Fensel. GoldenBullet: Automated classification of product data in e-commerce. In Witold Abramowicz, editor, Proceedings of BIS 2002, Poznan, Poland, 2002.

[DMDH02] AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy. Learning to map between ontologies on the semantic web. In Proceedings of the World-Wide Web Conference, 2002.

[DMDH04] AnHai Doan, Jayant Madhavan, Pedro Domingos, and Alon Halevy. Ontology matching: A machine learning approach. In Steffen Staab and Rudi Studer, editors, Handbook on Ontologies in Information Systems, pages 397–416. Springer-Verlag, 2004.

[DMQ02] Dejing Dou, Drew McDermott, and Peishen Qi. Ontology translation by ontology merging and automated reasoning. In Proc. EKAW2002 Workshop on Ontologies for Multi-Agent Systems, pages 3–18, 2002.

[DR02] Hong-Hai Do and Erhard Rahm. COMA – a system for flexible combination of schema matching approaches. In Proceedings of the VLDB’02, pages 610–621, 2002.

[DR05] D. Roman, U. Keller, and H. Lausen (eds.). Web service modeling ontology. Technical report, DERI, 2005.

[DS04] Mike Dean and Guus Schreiber, editors. OWL Web Ontology Language Reference. 2004. W3C Recommendation 10 February 2004.

[EA] E. Jasper, N. Tong, P. McBrien, and A. Poulovassilis. Generating and optimising views from both as view data integration rules. In Proc. 6th Baltic Conference on Database and Information Systems (DBIS’04), Riga, June 2004.

[ES04a] Marc Ehrig and Steffen Staab. QOM - quick ontology mapping. Hiroshima, Japan, November 2004.

[ES04b] Marc Ehrig and York Sure. Ontology mapping - an integrated approach. In Proceedings of the First European Semantic Web Symposium, Heraklion, Greece, May 2004.

[Fel99] Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, 1999.



[Fen03] Dieter Fensel. Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce, 2nd edition. Springer-Verlag, Berlin, 2003.

[FMvH+03] Dieter Fensel, Enrico Motta, Frank van Harmelen, V. Richard Benjamins, Stefan Decker, Mauro Gaspari, Rix Groenboom, William Grosso, Mark A. Musen, Enric, Guus Schreiber, Rudi Studer, and Bob Wielinga. The unified problem-solving method development language upml. Knowledge and Information Systems (KAIS) journal, 5(1), 2003.

[FNPB99] Jerry Fowler, Marian Nodine, Brad Perry, and Bruce Bargmeyer. Agent-based semantic interoperability in infosleuth. SIGMOD Record, 28(1), 1999.

[GGM98] John H. Gennari, William Grosso, and Mark A. Musen. A method-description language: An initial ontology with examples. In Proceedings of the Eleventh Banff Knowledge Acquisition for Knowledge-Based Systems Workshop, Banff, Canada, 1998.

[Gru94] T. Gruber. An ontology for engineering mathematics. Technical report, Knowledge Systems Laboratory, Stanford University, 1994. Report KSL-94-18.

[GS04] Fausto Giunchiglia and Pavel Shvaiko. Semantic matching. The Knowledge Engineering Review, 18(3):265–280, 2004.

[GSY04] Fausto Giunchiglia, Pavel Shvaiko, and Mikalai Yatskevich. S-match: an algorithm and an implementation of semantic matching. In Proceedings of ESWS’04, number 3053 in LNCS, pages 61–75, Heraklion, Greece, 2004. Springer-Verlag.

[GW99] Bernhard Ganter and Rudolf Wille. Formal concept analysis: Mathematical Foundations. Springer, Berlin-Heidelberg, 1999.

[Hen05] Jan Henke. Omwg d8.2 editing and browsing design. Technical report,<br />

2005.<br />

[HM93] Joachim Hammer and Dennis McLeod. An approach to resolving semantic heterogeneity in a federation of autonomous, heterogeneous, database systems. International Journal on Intelligent and Cooperative Information Systems, 2(1):51–83, 1993.

[Hul97] Richard Hull. Managing semantic heterogeneity in databases: A theoretical perspective. In ACM Symposium on Principles of Database Systems, pages 51–61, Tucson, Arizona, USA, 1997.

[Ker05] M. Kerrigan. Web service modeling toolkit (wsmt). Technical report,<br />

WSMX Working Draft, 2005.



[KFNM] Holger Knublauch, Ray W. Fergerson, Natalya F. Noy, and Mark A. Musen. The protégé owl plugin: An open development environment for semantic-web applications. In J. Allenet et al., editors, Third International Semantic Web Conference - ISWC 2004, Hiroshima, Japan.

[KKOF02] Michel Klein, Atanas Kiryakov, Damyan Ognyanov, and Dieter Fensel. Finding and characterizing changes in ontologies. In Proceedings of the 21st International Conference on Conceptual Modeling (ER2002), Tampere, Finland, 2002.

[Kle01] Michel Klein. Combining and relating ontologies: an analysis of problems and solutions. In Asuncion Gomez-Perez, Michael Gruninger, Heiner Stuckenschmidt, and Michael Uschold, editors, Workshop on Ontologies and Information Sharing, IJCAI’01, Seattle, USA, August 4–5, 2001.

[Kle04] Michel Klein. Change Management for Distributed Ontologies. PhD thesis, Free University of Amsterdam, 2004.

[KS00] Atanas Kiryakov and Kiril Iv. Simov. Mapping of eurowordnet top ontology to upper cyc ontology. In Proceedings of Ontologies and Text workshop, EKAW 2000, Juan-les-Pins, French Riviera, 2000.

[KSD01a] Atanas Kiryakov, Kiril Iv. Simov, and Marin Dimitrov. Ontomap: The upper-ontology portal. In Proceedings of "Formal Ontology in Information Systems", Ogunquit, Maine, 2001.

[KSD01b] Atanas Kiryakov, Kiril Iv. Simov, and Marin Dimitrov. Tr1. ontomap: The upper-ontology portal. revision 2, Ontotext, 2001.

[Len] Maurizio Lenzerini. Data integration: A theoretical perspective. In PODS 2002, pages 233–246.

[Lev00] Alon Y. Levy. Logic-Based Techniques in Data Integration, pages 575–595. Kluwer Publishers, 2000.

[LGR+05a] N. Leone, G. Gottlob, R. Rosati, G. Greco, G. Ianni, V. Lio, V. Terracina, T. Eiter, W. Faber, M. Fink, D. Lembo, M. Lenzerini, M. Ruzzi, E. Kalka, B. Nowicki, and W. Staniszkis. Data integration by logic programming: The INFOMIX system. In Proceedings of the Eighth International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR), 2005.

[LGR+05b] N. Leone, G. Gottlob, R. Rosati, G. Greco, G. Ianni, V. Lio, V. Terracina, T. Eiter, W. Faber, M. Fink, D. Lembo, M. Lenzerini, M. Ruzzi, E. Kalka, B. Nowicki, and W. Staniszkis. The INFOMIX system for advanced integration of incomplete and inconsistent data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 915–917, 2005.



[LLR02] Domenico Lembo, Maurizio Lenzerini, and Riccardo Rosati. Functional specification of the infomix system. Technical Report D2.1, University of Rome, 2002.

[LPF+05] N. Leone, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello. The dlv system for knowledge representation and reasoning. In ACM Transactions on Computational Logic, 2005.

[LS99] Ora Lassila and Ralph R. Swick. Resource description framework (RDF) model and syntax specification. W3C recommendation, W3C, 1999. http://www.w3.org/TR/1999/REC-rdf-syntax-19990222.

[MBR01] Jayant Madhavan, Philip A. Bernstein, and Erhard Rahm. Generic schema matching with cupid. In Proc. 27th Int. Conf. on Very Large Data Bases (VLDB), 2001.

[MFRW00] Deborah L. McGuinness, Richard Fikes, James Rice, and Steve Wilder. An environment for merging and testing large ontologies. In Proc. 7th Intl. Conf. On Principles of Knowledge Representation and Reasoning (KR2000), Colorado, USA, April 2000.

[MGMR02] Sergey Melnik, Hector Garcia-Molina, and Erhard Rahm. Similarity flooding: A versatile graph matching algorithm. In Proceedings of ICDE, pages 117–128, 2002.

[MIKS00] Eduardo Mena, Arantza Illarramendi, Vipul Kashyap, and Amit P. Sheth. OBSERVER: An approach for query processing in global information systems based on interoperation across pre-existing ontologies. Distributed and Parallel Databases, 8(2):223–271, 2000.

[MMSV02] Alexander Maedche, Boris Motik, Nuno Silva, and Raphael Volz. Mafra - a mapping framework for distributed ontologies. In Proceedings of the 13th European Conference on Knowledge Engineering and Knowledge Management EKAW-2002, Madrid, Spain, 2002.

[MNJ04] P. Mitra, N. F. Noy, and A. R. Jaiswal. Omen: A probabilistic ontology mapping tool. In Workshop on Meaning coordination and negotiation at the Third International Conference on the Semantic Web (ISWC), Hiroshima, Japan, 2004.

[Moc05] A. Mocan. Wsmx data mediation. Technical report, WSMX Working Draft, 2005.

[MP03] P.J. McBrien and A. Poulovassilis. Data integration by bi-directional schema transformation rules. In Proceedings of ICDE03, pages 227–238. IEEE, 2003.



[MRB03] Sergey Melnik, Erhard Rahm, and Philip A. Bernstein. Developing metadata-intensive applications with rondo. Journal of Web Semantics, 1(1), December 2003.

[MvH04] Deborah L. McGuinness and Frank van Harmelen. OWL web ontology language overview. Recommendation 10 February 2004, W3C, 2004. Available from http://www.w3.org/TR/owl-features/.

[MW01] Prasenjit Mitra and Gio Wiederhold. An algebra for semantic interoperability of information sources. In IEEE International Conference on Bioinformatics and Biomedical Engineering, pages 174–182, 2001.

[MWJ99] Prasenjit Mitra, Gio Wiederhold, and Jan Jannink. Semi-automatic integration of knowledge sources. In Proceedings of Fusion 99, Sunnyvale, California, USA, July 1999.

[MWK00] Prasenjit Mitra, Gio Wiederhold, and Martin L. Kersten. A graph-oriented model for articulation of ontology interdependencies. In Proceedings of Conference on Extending Database Technology (EDBT 2000), Konstanz, Germany, March 2000.

[MZ05] M. Moran and Michal Zaremba. Wsmx architecture. Technical report, WSMO Working Draft, 2005.

[NFK+00] Marian H. Nodine, Jerry Fowler, Tomasz Ksiezyk, Brad Perry, Malcolm Taylor, and Amy Unruh. Active information gathering in infosleuth. International Journal of Cooperative Information Systems, 9(1-2):3–28, 2000.

[NM99] Natalya F. Noy and Mark A. Musen. Smart: Automated support for ontology merging and alignment. Technical Report SMI-1999-0813, Stanford Medical Informatics, 1999.

[NM00a] Natalya F. Noy and Mark A. Musen. Anchor-prompt: Using non-local context for semantic matching. In Proceedings of the Workshop on Ontologies and Information Sharing at the Seventeenth International Joint Conference on Artificial Intelligence (IJCAI-2001), Seattle, WA, USA, 2000.

[NM00b] Natalya F. Noy and Mark A. Musen. Prompt: Algorithm and tool for automated ontology merging and alignment. In Proc. 17th Natl. Conf. On Artificial Intelligence (AAAI2000), Austin, Texas, USA, July/August 2000.

[NM03a] Natalya F. Noy and Mark A. Musen. Ontology versioning as an element of an ontology-management framework. To be published in IEEE Intelligent Systems, 2003.



[NM03b] Natalya F. Noy and Mark A. Musen. The PROMPT suite: Interactive tools for ontology merging and mapping. International Journal of Human-Computer Studies, 59(6):983–1024, 2003.

[OF01] Borys Omelayenko and Dieter Fensel. A two-layered integration approach for product information in B2B e-commerce. In Proceedings of the Second International Conference on Electronic Commerce and Web Technologies (EC WEB-2001), Munich, Germany, 2001. Springer-Verlag.

[Ome02a] Borys Omelayenko. Integrating vocabularies: Discovering and representing vocabulary maps. In Proceedings of the First International Semantic Web Conference (ISWC2002), Sardinia, Italy, 2002.

[Ome02b] Borys Omelayenko. RDFT: A mapping meta-ontology for business integration. In Proceedings of the Workshop on Knowledge Transformation for the Semantic Web (KTSW 2002) at the 15th European Conference on Artificial Intelligence, pages 76–83, Lyon, France, 2002.

[PGM98] John Y. Park, John H. Gennari, and Mark A. Musen. Mappings for reuse in knowledge-based systems. In Proceedings of the 11th Workshop on Knowledge Acquisition, Modelling and Management (KAW 98), Banff, Canada, 1998.

[RB01] Erhard Rahm and Philip A. Bernstein. A survey of approaches to automatic schema matching. VLDB Journal: Very Large Data Bases, 10(4):334–350, 2001.

[Riz03] N. Rizopoulos. Bav transformations on relational schemas based on semantic relationships between attributes. Technical Report 22, Imperial College London, 2003.

[SaR03a] Nuno Silva and João Rocha. Ontology mapping for interoperability in semantic web. In Proceedings of the IADIS International Conference WWW/Internet 2003 (ICWI’2003), Algarve, Portugal, 2003.

[SaR03b] Nuno Silva and João Rocha. Service-oriented ontology mapping system. In Proceedings of the Workshop on Semantic Integration of the International Semantic Web Conference (ISWC2003), Sanibel Island, USA, 2003.

[SdB05] François Scharffe and Jos de Bruijn. A language to specify correspondences between ontologies. In Proceedings of the IEEE Signal Image and Internet Based Systems (SITIS-05), 2005.

[Sil02] N. Silva. Discovering mappings between distributed ontologies. Internal Report July, University of Karlsruhe, 2002.



[SM01] Gerd Stumme and Alexander Maedche. Fca-merge: Bottom-up merging of ontologies. In 17th Intl. Conf. on Artificial Intelligence (IJCAI ’01), pages 225–230, Seattle, WA, USA, 2001.

[Smi90] D.C. Smith. Plisp Users Manual. Apple Computers, august, 1990 edition, 1990.

[Usc00] Mike Uschold. Creating, integrating, and maintaining local and global ontologies. In Proceedings of the First Workshop on Ontology Learning (OL-2000) in conjunction with the 14th European Conference on Artificial Intelligence (ECAI-2000), Berlin, Germany, August 2000.

[VC98] Pepijn R. S. Visser and Zhan Cui. On accepting heterogeneous ontologies in distributed architectures. In Proceedings of the ECAI98 workshop on applications of ontologies and problem-solving methods, Brighton, UK, 1998.

[VJBCS97] Pepijn R. S. Visser, Dean M. Jones, T. J. M. Bench-Capon, and M. J. R. Shave. An analysis of ontological mismatches: Heterogeneity versus interoperability. In AAAI 1997 Spring Symposium on Ontological Engineering, Stanford, USA, 1997.

[Wie94] Gio Wiederhold. An algebra for ontology composition. In Proceedings of 1994 Monterey Workshop on formal Methods, pages 56–61, U.S. Naval Postgraduate School, Monterey CA, 1994.
