D4.2.2 State-of-the-art survey on Ontology Merging ... - STI Innsbruck
D4.2.2 State-of-the-art survey on Ontology Merging ... - STI Innsbruck
D4.2.2 State-of-the-art survey on Ontology Merging ... - STI Innsbruck
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
EU-IST Integrated Project (IP) IST-2003-506826 SEKT<br />
SEKT: Semantically Enabled Knowledge Technologies<br />
<str<strong>on</strong>g>D4.2.2</str<strong>on</strong>g> <str<strong>on</strong>g>State</str<strong>on</strong>g>-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> <strong>on</strong><br />
<strong>Ontology</strong> <strong>Merging</strong> and Aligning V2<br />
Livia Predoiu (DERI)<br />
Cristina Feier (DERI)<br />
Francois Scharffe (DERI)<br />
Jos de Bruijn (DERI)<br />
Francisco M<str<strong>on</strong>g>art</str<strong>on</strong>g>ín-Recuerda (DERI)<br />
Dimitar Manov (SIRMA)<br />
Marc Ehrig (UKARL)<br />
Abstract.<br />
EU-IST Integrated Project (IP) IST-2003-506826 SEKT<br />
Deliverable <str<strong>on</strong>g>D4.2.2</str<strong>on</strong>g> (WP4)<br />
This deliverable c<strong>on</strong>tains a comprehensive state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> <strong>on</strong> <strong>Ontology</strong> <strong>Merging</strong> and<br />
Aligning methods, tools and specificati<strong>on</strong> languages.<br />
Keyword list: <strong>Ontology</strong> Mediati<strong>on</strong>, state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g>, <strong>on</strong>tology merging, <strong>on</strong>tology aligning,<br />
<strong>on</strong>tology mapping<br />
WP4: <strong>Ontology</strong> Mediati<strong>on</strong><br />
Nature <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Deliverable: Report Disseminati<strong>on</strong> level: PU<br />
C<strong>on</strong>tractual date <str<strong>on</strong>g>of</str<strong>on</strong>g> delivery: 2005-12-31 Actual date <str<strong>on</strong>g>of</str<strong>on</strong>g> delivery: 2005-12-31<br />
Copyright c○ 2006 Digital Enterprise Research Institute, University <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>Innsbruck</strong>
SEKT C<strong>on</strong>sortium<br />
This document is p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> a research project p<str<strong>on</strong>g>art</str<strong>on</strong>g>ially funded by <str<strong>on</strong>g>the</str<strong>on</strong>g> IST Programme <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Commissi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> European<br />
Communities as project number IST-2003-506826.<br />
British Telecommunicati<strong>on</strong>s plc.<br />
Ori<strong>on</strong> 5/12, Adastral Park<br />
Ipswich IP5 3RE<br />
UK<br />
Tel: +44 1473 609583, Fax: +44 1473 609832<br />
C<strong>on</strong>tactpers<strong>on</strong>: John Davies<br />
E-mail: john.nj.davies@bt.com<br />
Jozef Stefan Institute<br />
Jamova 39<br />
1000 Ljubljana<br />
Slovenia<br />
Tel: +386 1 4773 778, Fax: +386 1 4251 038<br />
C<strong>on</strong>tactpers<strong>on</strong>: Marko Grobelnik<br />
E-mail: marko.grobelnik@ijs.si<br />
University <str<strong>on</strong>g>of</str<strong>on</strong>g> Sheffield<br />
Dep<str<strong>on</strong>g>art</str<strong>on</strong>g>ment <str<strong>on</strong>g>of</str<strong>on</strong>g> Computer Science<br />
Regent Court, 211 Portobello St.<br />
Sheffield S1 4DP<br />
UK<br />
Tel: +44 114 222 1891, Fax: +44 114 222 1810<br />
C<strong>on</strong>tactpers<strong>on</strong>: Hamish Cunningham<br />
E-mail: hamish@dcs.shef.ac.uk<br />
Intelligent S<str<strong>on</strong>g>of</str<strong>on</strong>g>tware Comp<strong>on</strong>ents S.A.<br />
Pedro de Valdivia, 10<br />
28006 Madrid<br />
Spain<br />
Tel: +34 913 349 797, Fax: +49 34 913 349 799<br />
C<strong>on</strong>tactpers<strong>on</strong>: Richard Benjamins<br />
E-mail: rbenjamins@isoco.com<br />
Ontoprise GmbH<br />
Amalienbadstr. 36<br />
76227 Karlsruhe<br />
Germany<br />
Tel: +49 721 50980912, Fax: +49 721 50980911<br />
C<strong>on</strong>tactpers<strong>on</strong>: Hans-Peter Schnurr<br />
E-mail: schnurr@<strong>on</strong>toprise.de<br />
Vrije Universiteit Amsterdam (VUA)<br />
Dep<str<strong>on</strong>g>art</str<strong>on</strong>g>ment <str<strong>on</strong>g>of</str<strong>on</strong>g> Computer Sciences<br />
De Boelelaan 1081a<br />
1081 HV Amsterdam<br />
The Ne<str<strong>on</strong>g>the</str<strong>on</strong>g>rlands<br />
Tel: +31 20 444 7731, Fax: +31 84 221 4294<br />
C<strong>on</strong>tactpers<strong>on</strong>: Frank van Harmelen<br />
E-mail: frank.van.harmelen@cs.vu.nl<br />
Empolis GmbH<br />
Europaallee 10<br />
67657 Kaiserslautern<br />
Germany<br />
Tel: +49 631 303 5540, Fax: +49 631 303 5507<br />
C<strong>on</strong>tactpers<strong>on</strong>: Ralph Traphöner<br />
E-mail: ralph.traphoener@empolis.com<br />
University <str<strong>on</strong>g>of</str<strong>on</strong>g> Karlsruhe, Institute AIFB<br />
Englerstr. 28<br />
D-76128 Karlsruhe<br />
Germany<br />
Tel: +49 721 608 6592, Fax: +49 721 608 6580<br />
C<strong>on</strong>tactpers<strong>on</strong>: York Sure<br />
E-mail: sure@aifb.uni-karlsruhe.de<br />
University <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>Innsbruck</strong><br />
Institute <str<strong>on</strong>g>of</str<strong>on</strong>g> Computer Science<br />
Techikerstraße 13<br />
6020 <strong>Innsbruck</strong><br />
Austria<br />
Tel: +43 512 507 6475, Fax: +43 512 507 9872<br />
C<strong>on</strong>tactpers<strong>on</strong>: Jos de Bruijn<br />
E-mail: jos.de-bruijn@deri.ie<br />
Kea-pro GmbH<br />
Tal<br />
6464 Springen<br />
Switzerland<br />
Tel: +41 41 879 00, Fax: 41 41 879 00 13<br />
C<strong>on</strong>tactpers<strong>on</strong>: Tom Bösser<br />
E-mail: tb@keapro.net<br />
Sirma AI EOOD (Ltd.)<br />
135 Tsarigradsko Shose<br />
S<str<strong>on</strong>g>of</str<strong>on</strong>g>ia 1784<br />
Bulgaria<br />
Tel: +359 2 9768, Fax: +359 2 9768 311<br />
C<strong>on</strong>tactpers<strong>on</strong>: Atanas Kiryakov<br />
E-mail: naso@sirma.bg<br />
Universitat Aut<strong>on</strong>oma de Barcel<strong>on</strong>a<br />
Edifici B, Campus de la UAB<br />
08193 Bellaterra (Cerdanyola del Vallès)<br />
Barcel<strong>on</strong>a<br />
Spain<br />
Tel: +34 93 581 22 35, Fax: +34 93 581 29 88<br />
C<strong>on</strong>tactpers<strong>on</strong>: Pompeu Casanovas Romeu<br />
E-mail: pompeu.casanovasquab.es
Changes<br />
Versi<strong>on</strong> Date Author Changes<br />
0.1 2005-12-01 Cristina creati<strong>on</strong> and fill in a lot <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>tent <str<strong>on</strong>g>of</str<strong>on</strong>g> D4.2.1<br />
0.2 2005-12-05 Livia added more c<strong>on</strong>tent<br />
0.3 2005-12-10 Cristina added more c<strong>on</strong>tent<br />
0.4 2005-12-19 Livia added more c<strong>on</strong>tent<br />
0.5 2005-12-20 Cristina Versi<strong>on</strong> for QA<br />
0.6 2006-01-31 Livia final Versi<strong>on</strong> for QA
Executive Summary<br />
This report provides a state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>Ontology</strong> <strong>Merging</strong> and Aligning methods,<br />
tools and techniques. In this deliverable, we provide a framework for <str<strong>on</strong>g>the</str<strong>on</strong>g> evaluati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>se approaches, as well as a categorizati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> approaches for <strong>on</strong>tology merging and<br />
aligning.<br />
We categorize <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in methods and tools, integrati<strong>on</strong> systems and specific<br />
techniques. We compare <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches according to <str<strong>on</strong>g>the</str<strong>on</strong>g> evaluati<strong>on</strong> framework and to a<br />
set <str<strong>on</strong>g>of</str<strong>on</strong>g> generic use cases for <strong>on</strong>tology mediati<strong>on</strong> in order to evaluate <str<strong>on</strong>g>the</str<strong>on</strong>g> applicability <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> approach to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong> problem <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web.<br />
This is <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d versi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> this deliverable in <str<strong>on</strong>g>the</str<strong>on</strong>g> course <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> SEKT project. We<br />
kept much <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>tent that was delivered in <str<strong>on</strong>g>the</str<strong>on</strong>g> first versi<strong>on</strong> because <str<strong>on</strong>g>the</str<strong>on</strong>g> systems did not<br />
change and p<str<strong>on</strong>g>art</str<strong>on</strong>g>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> thus did not change. In this versi<strong>on</strong> we have refined<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong>s for <str<strong>on</strong>g>the</str<strong>on</strong>g> comm<strong>on</strong>-encountered terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> deliverable and we have updated<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> classificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> approaches. We have also added <str<strong>on</strong>g>the</str<strong>on</strong>g> descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
five new systems that are remarkable and make a big c<strong>on</strong>tributi<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> field <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>Ontology</strong><br />
Mediati<strong>on</strong> and updated <str<strong>on</strong>g>the</str<strong>on</strong>g> descripti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> previous systems, where necessarily.
C<strong>on</strong>tents<br />
1 Introducti<strong>on</strong> 3<br />
1.1 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5<br />
1.2 The <strong>Ontology</strong> Mapping Process . . . . . . . . . . . . . . . . . . . . . . 7<br />
1.3 <strong>Ontology</strong> Mismatches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9<br />
1.3.1 <strong>Ontology</strong>-level Mismatches . . . . . . . . . . . . . . . . . . . . 9<br />
1.3.2 Language-level mismatches . . . . . . . . . . . . . . . . . . . . 11<br />
1.4 Integrati<strong>on</strong> approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 12<br />
1.5 Wrappers and Mediators . . . . . . . . . . . . . . . . . . . . . . . . . . 18<br />
2 Motivati<strong>on</strong>al Use Cases 21<br />
2.1 Use Cases for Instance Mediati<strong>on</strong> . . . . . . . . . . . . . . . . . . . . . 21<br />
2.1.1 Instance Transformati<strong>on</strong> . . . . . . . . . . . . . . . . . . . . . . 22<br />
2.1.2 Instance Unificati<strong>on</strong> . . . . . . . . . . . . . . . . . . . . . . . . 23<br />
2.1.3 Query Rewriting . . . . . . . . . . . . . . . . . . . . . . . . . . 24<br />
2.2 <strong>Ontology</strong> <strong>Merging</strong> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25<br />
2.3 Creating <strong>Ontology</strong> Mappings . . . . . . . . . . . . . . . . . . . . . . . . 26<br />
2.3.1 Finding Similarities . . . . . . . . . . . . . . . . . . . . . . . . . 26<br />
2.3.2 Specifying Mappings . . . . . . . . . . . . . . . . . . . . . . . . 26<br />
3 The Evaluati<strong>on</strong> Framework 27<br />
4 The Survey 30<br />
4.1 Methods and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31<br />
4.1.1 MAFRA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31<br />
4.1.2 RDFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40<br />
4.1.3 PROMPT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43<br />
4.1.4 GLUE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50<br />
4.1.5 Semantic Matching . . . . . . . . . . . . . . . . . . . . . . . . . 52<br />
4.1.6 OntoMap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56<br />
4.1.7 RDFDiff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60<br />
4.1.8 OntoMerge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63<br />
4.1.9 OMEN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65<br />
4.1.10 WSMT Data mediati<strong>on</strong> module . . . . . . . . . . . . . . . . . . 67<br />
1
CONTENTS 2<br />
4.1.11 DOME mapping module . . . . . . . . . . . . . . . . . . . . . . 72<br />
4.2 Integrated Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74<br />
4.2.1 InfoSleuth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74<br />
4.2.2 ONION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78<br />
4.2.3 OBSERVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81<br />
4.2.4 MOMIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87<br />
4.2.5 INFOMIX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90<br />
4.2.6 AutoMed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95<br />
4.3 Specific Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98<br />
4.3.1 FCA-Merge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98<br />
4.3.2 OntoMorph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99<br />
4.3.3 QOM Quick <strong>Ontology</strong> Mapping . . . . . . . . . . . . . . . . . . 99<br />
5 Comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Methods 101<br />
5.1 <strong>Ontology</strong> Languages . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101<br />
5.2 Mapping Language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103<br />
5.3 Mapping Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105<br />
5.4 Automati<strong>on</strong> Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105<br />
5.5 Applicability to Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . 105<br />
5.6 Implementati<strong>on</strong> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107<br />
5.7 Experiences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107<br />
6 C<strong>on</strong>clusi<strong>on</strong>s 110<br />
Bibliography 113
Chapter 1<br />
Introducti<strong>on</strong><br />
This report presents a state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mapping and merging methods<br />
and tools with an emphasis <strong>on</strong> <strong>on</strong>tology mapping and inter-operability over <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic<br />
Web.<br />
This is <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d versi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> this deliverable in <str<strong>on</strong>g>the</str<strong>on</strong>g> course <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> SEKT project. We<br />
kept a lot <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>tent that was delivered in <str<strong>on</strong>g>the</str<strong>on</strong>g> first versi<strong>on</strong> unchanged because <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
systems did not change and p<str<strong>on</strong>g>art</str<strong>on</strong>g>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> thus did not change as well. In<br />
this versi<strong>on</strong> we have refined <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong>s for <str<strong>on</strong>g>the</str<strong>on</strong>g> comm<strong>on</strong>-encountered terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> deliverable<br />
and we have updated <str<strong>on</strong>g>the</str<strong>on</strong>g> classificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> approaches. We have<br />
also added <str<strong>on</strong>g>the</str<strong>on</strong>g> descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> five new systems that are remarkable and make a big c<strong>on</strong>tributi<strong>on</strong><br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> field <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>Ontology</strong> Mediati<strong>on</strong> and updated <str<strong>on</strong>g>the</str<strong>on</strong>g> descripti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> previous<br />
systems, where necessarily. The five new systems which were added are OMEN, WSMT<br />
Data Mediati<strong>on</strong> module, Dome mapping module, INFOMIX and AutoMed.<br />
The issue <str<strong>on</strong>g>of</str<strong>on</strong>g> inter-operability between informati<strong>on</strong> systems has already existed for<br />
many years. With <str<strong>on</strong>g>the</str<strong>on</strong>g> recent advent <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web [BLHL01], <str<strong>on</strong>g>the</str<strong>on</strong>g> issues have <strong>on</strong>ly<br />
increased, because <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> abundance, heterogeneity and independence <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> various data<br />
sources.<br />
Traditi<strong>on</strong>al data integrati<strong>on</strong> systems focus <strong>on</strong> inter-operability between data sources<br />
and applicati<strong>on</strong>s within enterprises. Within enterprises a certain coherence between data<br />
sources can be expected, although data integrati<strong>on</strong> within enterprises still faces many<br />
challenges which remain to be resolved.<br />
We focus <strong>on</strong> informati<strong>on</strong> integrati<strong>on</strong> <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web. This means that we not<br />
<strong>on</strong>ly take into account data integrati<strong>on</strong> within organizati<strong>on</strong>s 1 , but also explicitly address<br />
integrati<strong>on</strong> across organizati<strong>on</strong>al boundaries. Between organizati<strong>on</strong>s, even more heterogeneity<br />
between <str<strong>on</strong>g>the</str<strong>on</strong>g> data sources can be expected.<br />
Fortunately, <strong>on</strong>tologies [Fen03], <str<strong>on</strong>g>the</str<strong>on</strong>g> backb<strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web, can help us with a<br />
1 Arguably, <str<strong>on</strong>g>the</str<strong>on</strong>g> intranet <str<strong>on</strong>g>of</str<strong>on</strong>g> an organizati<strong>on</strong> is an isolated p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Web and could be p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic<br />
Web in <str<strong>on</strong>g>the</str<strong>on</strong>g> near future<br />
3
CHAPTER 1. INTRODUCTION 4<br />
p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> problem. Because <strong>on</strong>tologies are explicit and formal specificati<strong>on</strong>s<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> knowledge, <str<strong>on</strong>g>the</str<strong>on</strong>g>y help in disambiguating data and can help in finding corresp<strong>on</strong>dences<br />
between data sources because <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> explicit specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> knowledge in an <strong>on</strong>tology.<br />
On <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web, data is annotated using <strong>on</strong>tologies. C<strong>on</strong>cepts (also called<br />
classes) 2 in <strong>on</strong>tologies give meaning to data <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Web. Because <strong>on</strong>tologies are shared<br />
specificati<strong>on</strong>s, <str<strong>on</strong>g>the</str<strong>on</strong>g> same <strong>on</strong>tologies can be used for <str<strong>on</strong>g>the</str<strong>on</strong>g> annotati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> multiple data sources,<br />
not <strong>on</strong>ly limited to Web pages, but also collecti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> xml documents, relati<strong>on</strong>al databases,<br />
etc. This enables a certain degree <str<strong>on</strong>g>of</str<strong>on</strong>g> inter-operati<strong>on</strong> between data sources using <str<strong>on</strong>g>the</str<strong>on</strong>g> same<br />
<strong>on</strong>tologies, because <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir shared terminology 3 . However, it cannot be expected that all<br />
individuals and organizati<strong>on</strong>s <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web will ever agree <strong>on</strong> using <strong>on</strong>e comm<strong>on</strong><br />
terminology or <strong>on</strong>tology. It can be expected that many different <strong>on</strong>tologies will appear<br />
and, in order to enable inter-operati<strong>on</strong>, mediati<strong>on</strong> is required between <str<strong>on</strong>g>the</str<strong>on</strong>g>se <strong>on</strong>tologies.<br />
As was argued in [VC98, Usc00], it is very hard to create standard <strong>on</strong>tologies. In fact,<br />
even inside organizati<strong>on</strong>s <str<strong>on</strong>g>the</str<strong>on</strong>g> standardizati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a terminology is not feasible, because<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> lengthy process <str<strong>on</strong>g>of</str<strong>on</strong>g> standardizati<strong>on</strong> and because <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> a big standard impedes<br />
changes in <str<strong>on</strong>g>the</str<strong>on</strong>g> organizati<strong>on</strong> (any change would require c<strong>on</strong>sensus am<strong>on</strong>g a large group<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> people, which is hard to achieve). Across organizati<strong>on</strong>s this problem becomes more<br />
severe, because <str<strong>on</strong>g>the</str<strong>on</strong>g> group <str<strong>on</strong>g>of</str<strong>on</strong>g> people which need to reach c<strong>on</strong>sensus is much bigger and<br />
c<strong>on</strong>flicts <str<strong>on</strong>g>of</str<strong>on</strong>g> interest are more likely to occur. Therefore, it is likely that <str<strong>on</strong>g>the</str<strong>on</strong>g>re will be many<br />
different heterogeneous <strong>on</strong>tologies <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web and in order to enable interoperability<br />
between applicati<strong>on</strong>s <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web, mediati<strong>on</strong> is required between<br />
different representati<strong>on</strong>s (<strong>on</strong>tologies) <str<strong>on</strong>g>of</str<strong>on</strong>g> knowledge in <str<strong>on</strong>g>the</str<strong>on</strong>g> same domain.<br />
This report presents a <str<strong>on</strong>g>survey</str<strong>on</strong>g> <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> state <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mapping, merging and<br />
mediati<strong>on</strong>. It includes both well-known approaches in database integrati<strong>on</strong> and recent approaches<br />
specifically addressing <strong>on</strong>tology mapping <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web. The Semantic<br />
Web has a number <str<strong>on</strong>g>of</str<strong>on</strong>g> distinguishing features relative to older data integrati<strong>on</strong> systems:<br />
• The Semantic Web relies heavily <strong>on</strong> standardizati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> both <str<strong>on</strong>g>the</str<strong>on</strong>g> protocols for <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
transport <str<strong>on</strong>g>of</str<strong>on</strong>g> data (HTTP) and <str<strong>on</strong>g>the</str<strong>on</strong>g> syntax for <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> data and knowledge<br />
(RDF [LS99] and OWL [DS04]).<br />
• In c<strong>on</strong>trast with <str<strong>on</strong>g>the</str<strong>on</strong>g> database schemas in many data integrati<strong>on</strong> systems, <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> data are made explicit through a logic-based language.<br />
• Ontologies capture knowledge in a way understandable to humans and processable<br />
by machines. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, <strong>on</strong>tologies ideally represent a c<strong>on</strong>sensual view <str<strong>on</strong>g>of</str<strong>on</strong>g> a<br />
p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular domain, which is shared am<strong>on</strong>g a group <str<strong>on</strong>g>of</str<strong>on</strong>g> people.<br />
2 We use <str<strong>on</strong>g>the</str<strong>on</strong>g> words c<strong>on</strong>cept and class interchangeably in this document.<br />
3 Note that this is not <str<strong>on</strong>g>the</str<strong>on</strong>g> end <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> story. For many applicati<strong>on</strong>s it is necessary to detect whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r pieces<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> data, coming from different data sources annotated with <str<strong>on</strong>g>the</str<strong>on</strong>g> same <strong>on</strong>tology, actually refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same<br />
thing. This is a challenge we also address in <str<strong>on</strong>g>the</str<strong>on</strong>g> course <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> SEKT project; it is described in more detail<br />
in Secti<strong>on</strong> 2.1.2 <str<strong>on</strong>g>of</str<strong>on</strong>g> this report.
CHAPTER 1. INTRODUCTION 5<br />
These features help in <str<strong>on</strong>g>the</str<strong>on</strong>g> task <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong>. For example, because <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
standardizati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> languages <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web, syntax does not play a big role, so<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> mediati<strong>on</strong> can focus <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> data.<br />
Note that Semantic Web technologies can not <strong>on</strong>ly be put to use <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> World Wide<br />
Web. They can also be employed within company intranets in order to achieve interoperability<br />
between applicati<strong>on</strong>s within an organizati<strong>on</strong>.<br />
This chapter is fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r structured as follows. We first clarify <str<strong>on</strong>g>the</str<strong>on</strong>g> terminology used in<br />
this <str<strong>on</strong>g>survey</str<strong>on</strong>g> in Secti<strong>on</strong> 1.1. In Secti<strong>on</strong> 1.2 we explain <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping process as<br />
we see it, which is used during <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> to identify <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> certain methods in this<br />
process. Then, we present a list <str<strong>on</strong>g>of</str<strong>on</strong>g> mismatches, which can occur between <strong>on</strong>tologies, in<br />
Secti<strong>on</strong> 1.3. In Secti<strong>on</strong> 1.4 we summarize different approaches to achieve integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
multiple heterogeneous data sources. We c<strong>on</strong>clude with a few remarks about <str<strong>on</strong>g>the</str<strong>on</strong>g> wrapper/mediator<br />
architecture, which is used in several <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in this <str<strong>on</strong>g>survey</str<strong>on</strong>g>, in<br />
Secti<strong>on</strong> 1.5.<br />
1.1 Terminology<br />
This secti<strong>on</strong> provides some clarificati<strong>on</strong> <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> terminology used in this <str<strong>on</strong>g>survey</str<strong>on</strong>g>. We deem<br />
this necessary, because <str<strong>on</strong>g>the</str<strong>on</strong>g>re exist many different understandings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> terminology in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> literature.<br />
<strong>Ontology</strong> A widespread definiti<strong>on</strong> for <str<strong>on</strong>g>the</str<strong>on</strong>g> noti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology is ”an explicit specificati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> a c<strong>on</strong>ceptualizati<strong>on</strong>”[Gru94]. In <str<strong>on</strong>g>the</str<strong>on</strong>g> following we c<strong>on</strong>sider an <strong>on</strong>tology to be a<br />
4-tuple 〈C, R, I, A〉, where C is a set <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cepts, R is a set <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>s, I is a set <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
instances and A is a set <str<strong>on</strong>g>of</str<strong>on</strong>g> axioms. A c<strong>on</strong>cept in an <strong>on</strong>tology is a category <str<strong>on</strong>g>of</str<strong>on</strong>g> real or<br />
virtual objects <str<strong>on</strong>g>of</str<strong>on</strong>g> interest, which we call instances. Relati<strong>on</strong>s specify how different<br />
c<strong>on</strong>cepts and instances are interc<strong>on</strong>nected, and axioms are formulae that allow to<br />
fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r specify interdependencies between all menti<strong>on</strong>ed <strong>on</strong>tology entities. All c<strong>on</strong>cepts,<br />
relati<strong>on</strong>s, instances, and axioms are specified in some logical language. Note<br />
that <str<strong>on</strong>g>the</str<strong>on</strong>g> sets C and I are not necessarily disjoint (e.g. <str<strong>on</strong>g>the</str<strong>on</strong>g> same term can denote<br />
both a class and an instance), although <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology language <str<strong>on</strong>g>of</str<strong>on</strong>g>ten requires this.<br />
In an <strong>on</strong>tology, c<strong>on</strong>cepts are usually organized in a subclass hierarchy, through <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
subclass-<str<strong>on</strong>g>of</str<strong>on</strong>g> (or subc<strong>on</strong>cept-<str<strong>on</strong>g>of</str<strong>on</strong>g>)relati<strong>on</strong>ship. More general c<strong>on</strong>cepts reside higher in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> hierarchy.<br />
Instance Base Although instances are logically p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology, it is <str<strong>on</strong>g>of</str<strong>on</strong>g>ten useful to<br />
separate between an <strong>on</strong>tology describing a collecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances and <str<strong>on</strong>g>the</str<strong>on</strong>g> collecti<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> instances described by <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology. We refer to this collecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances as <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Instance Base. Instance bases are sometimes used to discover similarities between<br />
c<strong>on</strong>cepts in different <strong>on</strong>tologies (e.g. [SM01], [DMDH04]). Note that this does not
CHAPTER 1. INTRODUCTION 6<br />
rule out <str<strong>on</strong>g>the</str<strong>on</strong>g> situati<strong>on</strong> where instances use several <strong>on</strong>tologies for <str<strong>on</strong>g>the</str<strong>on</strong>g>ir descripti<strong>on</strong>.<br />
However, most approaches in this <str<strong>on</strong>g>survey</str<strong>on</strong>g> which make use <str<strong>on</strong>g>of</str<strong>on</strong>g> instances assume a<br />
collecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances described by <strong>on</strong>e <strong>on</strong>tology.<br />
<strong>Ontology</strong> Language The <strong>on</strong>tology language is <str<strong>on</strong>g>the</str<strong>on</strong>g> language which is used to represent<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology. Popular <strong>on</strong>tology languages for <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web are RDFS [BG04]<br />
and OWL [DS04]. Semantic Web <strong>on</strong>tology languages can be split up into two<br />
p<str<strong>on</strong>g>art</str<strong>on</strong>g>s: <str<strong>on</strong>g>the</str<strong>on</strong>g> logical and <str<strong>on</strong>g>the</str<strong>on</strong>g> extra-logical p<str<strong>on</strong>g>art</str<strong>on</strong>g>s. The logical p<str<strong>on</strong>g>art</str<strong>on</strong>g> usually amounts to a<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>ory in some logical language, which can be used for reas<strong>on</strong>ing. The logical p<str<strong>on</strong>g>art</str<strong>on</strong>g><br />
basically c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> a number <str<strong>on</strong>g>of</str<strong>on</strong>g> logical axioms, which form <str<strong>on</strong>g>the</str<strong>on</strong>g> class (c<strong>on</strong>cept)<br />
definiti<strong>on</strong>s, property (relati<strong>on</strong>) definiti<strong>on</strong>s, instance definiti<strong>on</strong>s, etc.<br />
The extra-logical p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> language typically c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> n<strong>on</strong>-functi<strong>on</strong>al properties<br />
(e.g. author name, creati<strong>on</strong> date, natural language comments, multi-lingual<br />
labels) and o<str<strong>on</strong>g>the</str<strong>on</strong>g>r extra-logical statements, such as namespace declarati<strong>on</strong>s, <strong>on</strong>tology<br />
imports, versi<strong>on</strong>ing, etc.<br />
N<strong>on</strong>-functi<strong>on</strong>al properties are typically <strong>on</strong>ly for <str<strong>on</strong>g>the</str<strong>on</strong>g> human reader, whereas many <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r extra-logical statements are machine-processable. For example, namespaces<br />
can be resolved by <str<strong>on</strong>g>the</str<strong>on</strong>g> machine and <str<strong>on</strong>g>the</str<strong>on</strong>g> importing <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies can be<br />
achieved automatically by ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r (a) appending <str<strong>on</strong>g>the</str<strong>on</strong>g> logical p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> imported <strong>on</strong>tology<br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> logical p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> importing <strong>on</strong>tology to create <strong>on</strong>e logical <str<strong>on</strong>g>the</str<strong>on</strong>g>ory or<br />
(b) using a mediator, which resolves <str<strong>on</strong>g>the</str<strong>on</strong>g> heterogeneity between <str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies<br />
(see also <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>Ontology</strong> Mediati<strong>on</strong> below).<br />
<strong>Ontology</strong> Mediati<strong>on</strong> <strong>Ontology</strong> mediati<strong>on</strong> is <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> rec<strong>on</strong>ciling differences between<br />
heterogeneous <strong>on</strong>tologies in order to achieve inter-operati<strong>on</strong> between data<br />
sources annotated with and applicati<strong>on</strong>s using <str<strong>on</strong>g>the</str<strong>on</strong>g>se <strong>on</strong>tologies. This includes <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
discovery and specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mappings, as well as <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se mappings<br />
for certain tasks, such as query rewriting and instance transformati<strong>on</strong>. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> merging <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies also falls under <str<strong>on</strong>g>the</str<strong>on</strong>g> term <strong>on</strong>tology mediati<strong>on</strong>.<br />
<strong>Ontology</strong> Mapping An <strong>on</strong>tology mapping M is a (declarative) specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic<br />
overlap between two <strong>on</strong>tologies OS and OT . The corresp<strong>on</strong>dences between<br />
different entities <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies are typically expressed using some axioms<br />
formulated in a specific mapping language. Mappings can be unidirecti<strong>on</strong>al (that<br />
specify how terms in <strong>on</strong>e <strong>on</strong>tology can be expressed using terms from <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
<strong>on</strong>tology) or bidirecti<strong>on</strong>al (that work in both ways, i.e terms from <strong>on</strong>e <strong>on</strong>tology are<br />
expressed using terms from <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r and <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r way around).<br />
Mapping Language The mapping language is <str<strong>on</strong>g>the</str<strong>on</strong>g> language used to represent <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
mapping M. It is important here to distinguish between a specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> similarities <str<strong>on</strong>g>of</str<strong>on</strong>g> entities between <strong>on</strong>tologies and an actual <strong>on</strong>tology mapping. The<br />
specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities between <strong>on</strong>tologies is usually a level <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>fidence (usually<br />
between 0 and 1) <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity <str<strong>on</strong>g>of</str<strong>on</strong>g> entities, whereas an <strong>on</strong>tology mapping
CHAPTER 1. INTRODUCTION 7<br />
actually specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between <str<strong>on</strong>g>the</str<strong>on</strong>g> entities in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. This is typically<br />
an exact specificati<strong>on</strong> and typically far more powerful than simple similarity<br />
measures. Mapping languages <str<strong>on</strong>g>of</str<strong>on</strong>g>ten allow arbitrary transformati<strong>on</strong> between <strong>on</strong>tologies,<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g>ten using a rule-based formalism and typically allowing arbitrary value<br />
transformati<strong>on</strong>s.<br />
Mapping Pattern Although not <str<strong>on</strong>g>of</str<strong>on</strong>g>ten used in current approaches to <strong>on</strong>tology mediati<strong>on</strong>,<br />
patterns can play an important role in <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mappings, because<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>y have <str<strong>on</strong>g>the</str<strong>on</strong>g> potential to make mappings more c<strong>on</strong>cise, better understandable<br />
and reduce <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g> errors (cf. [PGM98]). A mapping pattern can be seen<br />
as a template for mappings which occur very <str<strong>on</strong>g>of</str<strong>on</strong>g>ten. Patterns can range from very<br />
simple (e.g. a mapping between a c<strong>on</strong>cept and a relati<strong>on</strong>) to very complex, in which<br />
case <str<strong>on</strong>g>the</str<strong>on</strong>g> pattern captures comprehensive substructures <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies, which are<br />
related in a certain way.<br />
Matching We define <strong>on</strong>tology matching as <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> discovering similarities between<br />
two source <strong>on</strong>tologies. The result <str<strong>on</strong>g>of</str<strong>on</strong>g> a matching operati<strong>on</strong> is a specificati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> similarities between two <strong>on</strong>tologies. <strong>Ontology</strong> matching is d<strong>on</strong>e through<br />
applicati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Match operator (cf. [RB01]). Any schema matching or <strong>on</strong>tology<br />
matching algorithm can be used to implement <str<strong>on</strong>g>the</str<strong>on</strong>g> Match operator, e.g.<br />
[DMDH04, GSY04, MBR01, MRB03].<br />
The specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities typically serves as an input to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping<br />
(see also Secti<strong>on</strong> 1.2).<br />
<strong>Merging</strong> Ontologies <strong>Merging</strong> refers to <str<strong>on</strong>g>the</str<strong>on</strong>g> creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e <strong>on</strong>tology from two or more<br />
source <strong>on</strong>tologies. The new <strong>on</strong>tology will unify and in general replace <str<strong>on</strong>g>the</str<strong>on</strong>g> original<br />
source <strong>on</strong>tologies (see also Secti<strong>on</strong> 1.4).<br />
Aligning Ontologies Specifying how <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts in <str<strong>on</strong>g>the</str<strong>on</strong>g> different <strong>on</strong>tologies are related<br />
in a logical sense. This means that <str<strong>on</strong>g>the</str<strong>on</strong>g> original <strong>on</strong>tologies have not changed, but<br />
that additi<strong>on</strong>al axioms describe <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts. Leaving<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> original <strong>on</strong>tologies unchanged <str<strong>on</strong>g>of</str<strong>on</strong>g>ten implies that <strong>on</strong>ly a p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong><br />
can be d<strong>on</strong>e, because major differences may require adaptati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
This c<strong>on</strong>cept is very similar with <strong>Ontology</strong> Mapping, however it is a more general<br />
c<strong>on</strong>cept: two <strong>on</strong>tologies can be aligned by creating an <strong>on</strong>tology mapping.<br />
1.2 The <strong>Ontology</strong> Mapping Process<br />
In order to clarify <str<strong>on</strong>g>the</str<strong>on</strong>g> role <str<strong>on</strong>g>of</str<strong>on</strong>g> many <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> methods, tools and techniques in this <str<strong>on</strong>g>survey</str<strong>on</strong>g>,<br />
we will explain in this secti<strong>on</strong> what we see as <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping process. Many <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> tools and techniques in this <str<strong>on</strong>g>survey</str<strong>on</strong>g> form a p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> overall mapping process and <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
integrati<strong>on</strong> systems typically form a superset <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g>y typically
CHAPTER 1. INTRODUCTION 8<br />
Figure 1.1: The <strong>Ontology</strong> Mapping Process<br />
incorporate <str<strong>on</strong>g>the</str<strong>on</strong>g> complete mapping process, but also <str<strong>on</strong>g>of</str<strong>on</strong>g>fer additi<strong>on</strong>al functi<strong>on</strong>ality, such as<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping to perform <str<strong>on</strong>g>the</str<strong>on</strong>g> actual querying and data integrati<strong>on</strong>.<br />
First, we have to note that for simplicity we assume <strong>on</strong>ly two 4 different <strong>on</strong>tologies O1<br />
and O2, which describe <str<strong>on</strong>g>the</str<strong>on</strong>g> same or similar domains, as input to <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process.<br />
The outcome <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process is ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r a mapping M, which describes how O1 and<br />
O2 are related, or a new <strong>on</strong>tology OM, which is <str<strong>on</strong>g>the</str<strong>on</strong>g> merger <str<strong>on</strong>g>of</str<strong>on</strong>g> O1 and O2.<br />
Figure 1.1 depicts <str<strong>on</strong>g>the</str<strong>on</strong>g> different phases in <str<strong>on</strong>g>the</str<strong>on</strong>g> generic mapping process as we see it.<br />
Not all phases are necessarily incorporated in every mapping tool and several phases in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> process are opti<strong>on</strong>al. We distinguish <str<strong>on</strong>g>the</str<strong>on</strong>g> following phases (in temporal order) in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
mapping process:<br />
1. Import <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies Ontologies can be specified in different languages, which indicates<br />
a need to c<strong>on</strong>vert <str<strong>on</strong>g>the</str<strong>on</strong>g>m to a comm<strong>on</strong> format in order to be able to specify<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> mapping. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies need to be imported in <str<strong>on</strong>g>the</str<strong>on</strong>g> tool, which is<br />
used to specify <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping.<br />
2. Finding Similarities Many systems use <str<strong>on</strong>g>the</str<strong>on</strong>g> Match operator to (semi-)automatically<br />
find similarities between schemas or <strong>on</strong>tologies. For any two source <strong>on</strong>tologies,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> Match operator returns <str<strong>on</strong>g>the</str<strong>on</strong>g> similarities between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. We distinguish<br />
this phase in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process <strong>on</strong>ly when <str<strong>on</strong>g>the</str<strong>on</strong>g> similarities are determined in<br />
an automatic fashi<strong>on</strong>. If <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process is completely manual, this phase is<br />
skipped.<br />
3. Specifying Mapping/<strong>Merging</strong> After (potential) similarities between <strong>on</strong>tologies have<br />
been found, <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies needs to be specified. This specificati<strong>on</strong><br />
is usually a manual process, but it can be aided by a tool. PROMPT<br />
[NM00b], for example, comes up with c<strong>on</strong>crete proposals for merge operati<strong>on</strong>s,<br />
so that for many operati<strong>on</strong>s <str<strong>on</strong>g>the</str<strong>on</strong>g> user <strong>on</strong>ly needs to say “execute”, instead <str<strong>on</strong>g>of</str<strong>on</strong>g> having<br />
to specify <str<strong>on</strong>g>the</str<strong>on</strong>g> complete operati<strong>on</strong>.<br />
In many cases (e.g. PROMPT), <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a feedback loop from this phase to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
previous phase. Typically, <str<strong>on</strong>g>the</str<strong>on</strong>g> tool can <str<strong>on</strong>g>of</str<strong>on</strong>g>fer more precise similarity measures when<br />
4 It is straightforward to scale up this approach to more than two <strong>on</strong>tologies.
CHAPTER 1. INTRODUCTION 9<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> user has already specified p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping. Many matching algorithms<br />
do not include this feedback loop. However, <str<strong>on</strong>g>the</str<strong>on</strong>g>se algorithms can <str<strong>on</strong>g>of</str<strong>on</strong>g>ten be readily<br />
applied in an overall algorithm which executes <str<strong>on</strong>g>the</str<strong>on</strong>g> match algorithm in each iterati<strong>on</strong><br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> process.<br />
The three phases <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process are specified at a very high level. Many <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in this <str<strong>on</strong>g>survey</str<strong>on</strong>g> provide a more detailed descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> (p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g>) this mapping<br />
process (e.g. PROMPT, Secti<strong>on</strong> 4.1.3, MOMIS, Secti<strong>on</strong> 4.2.4).<br />
1.3 <strong>Ontology</strong> Mismatches<br />
Different types <str<strong>on</strong>g>of</str<strong>on</strong>g> mismatches can occur between different <strong>on</strong>tologies. It is important<br />
to identify which kind <str<strong>on</strong>g>of</str<strong>on</strong>g> mismatches can and do occur between <strong>on</strong>tologies, in order to<br />
resolve <str<strong>on</strong>g>the</str<strong>on</strong>g>se mismatches in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping or <str<strong>on</strong>g>the</str<strong>on</strong>g> merge <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies. The classificati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mismatches is also important to denote which kind <str<strong>on</strong>g>of</str<strong>on</strong>g> mismatches can be<br />
resolved with a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular mapping formalism (language) and which kind <str<strong>on</strong>g>of</str<strong>on</strong>g> mismatches<br />
can be detected with a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular matching algorithm.<br />
Klein [Kle01] identifies two levels <str<strong>on</strong>g>of</str<strong>on</strong>g> mismatches between <strong>on</strong>tologies. The first level<br />
is <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology language or meta-model level. These mismatches include syntactic mismatches,<br />
differences in <str<strong>on</strong>g>the</str<strong>on</strong>g> meaning <str<strong>on</strong>g>of</str<strong>on</strong>g> primitives in <str<strong>on</strong>g>the</str<strong>on</strong>g> different languages, and differences<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> expressivity <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> languages. We will describe <str<strong>on</strong>g>the</str<strong>on</strong>g>se mismatches in more<br />
detail in Secti<strong>on</strong> 1.3.2. The sec<strong>on</strong>d level <str<strong>on</strong>g>of</str<strong>on</strong>g> mismatches is <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology or model level,<br />
which is described below.<br />
1.3.1 <strong>Ontology</strong>-level Mismatches<br />
Where mismatches at <str<strong>on</strong>g>the</str<strong>on</strong>g> language level include differences in encoding and meaning <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
language c<strong>on</strong>structs, mismatches at <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology level include mismatches in <str<strong>on</strong>g>the</str<strong>on</strong>g> meaning<br />
or encoding <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cepts in different <strong>on</strong>tologies. Klein follows <str<strong>on</strong>g>the</str<strong>on</strong>g> basic types <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology<br />
mismatches identified in [VJBCS97]:<br />
• C<strong>on</strong>ceptualizati<strong>on</strong> mismatches are mismatches between different c<strong>on</strong>ceptualizati<strong>on</strong>s<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> same domain.<br />
• Explicati<strong>on</strong> mismatches are mismatches in <str<strong>on</strong>g>the</str<strong>on</strong>g> way a c<strong>on</strong>ceptualizati<strong>on</strong> is specified.<br />
[Kle01] distinguishes two different c<strong>on</strong>ceptualizati<strong>on</strong> mismatches:<br />
Scope mismatch Two classes have some overlap in <str<strong>on</strong>g>the</str<strong>on</strong>g> extensi<strong>on</strong> (<str<strong>on</strong>g>the</str<strong>on</strong>g> set <str<strong>on</strong>g>of</str<strong>on</strong>g> instances),<br />
but <str<strong>on</strong>g>the</str<strong>on</strong>g> extensi<strong>on</strong>s are not exactly <str<strong>on</strong>g>the</str<strong>on</strong>g> same. [VJBCS97] call this a class mismatch<br />
and work it out fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r for classes and relati<strong>on</strong>s.
CHAPTER 1. INTRODUCTION 10<br />
Model coverage and granularity This mismatch is a difference in <str<strong>on</strong>g>the</str<strong>on</strong>g> p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> domain<br />
that is covered by both <strong>on</strong>tologies or <str<strong>on</strong>g>the</str<strong>on</strong>g> level <str<strong>on</strong>g>of</str<strong>on</strong>g> detail with which <str<strong>on</strong>g>the</str<strong>on</strong>g> model is<br />
covered.<br />
Klein fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore distinguishes different types <str<strong>on</strong>g>of</str<strong>on</strong>g> explicati<strong>on</strong> mismatches. First, <str<strong>on</strong>g>the</str<strong>on</strong>g>re<br />
are two mismatches in <str<strong>on</strong>g>the</str<strong>on</strong>g> style <str<strong>on</strong>g>of</str<strong>on</strong>g> modeling:<br />
Paradigm These mismatches occur when different paradigms are used for <str<strong>on</strong>g>the</str<strong>on</strong>g> explicati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> same c<strong>on</strong>cept. For example, <strong>on</strong>e <strong>on</strong>tology might represent time using<br />
intervals, while ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology might use points to represent time.<br />
C<strong>on</strong>cept descripti<strong>on</strong> Mismatches in <str<strong>on</strong>g>the</str<strong>on</strong>g> way a c<strong>on</strong>cept is described. For example, differences<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> way <str<strong>on</strong>g>the</str<strong>on</strong>g> subclass-<str<strong>on</strong>g>of</str<strong>on</strong>g> hierarchy is built or when in <strong>on</strong>e <strong>on</strong>tology several<br />
subclasses are defined for groups <str<strong>on</strong>g>of</str<strong>on</strong>g> instances, while in <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology subclasses<br />
are created for <str<strong>on</strong>g>the</str<strong>on</strong>g>se different groups.<br />
Then <str<strong>on</strong>g>the</str<strong>on</strong>g>re are <str<strong>on</strong>g>the</str<strong>on</strong>g> terminological mismatches:<br />
Syn<strong>on</strong>ym terms Two terms are equivalent when <str<strong>on</strong>g>the</str<strong>on</strong>g>y are semantically equivalent, but<br />
are represented by different names. It is possible to use dicti<strong>on</strong>aries or <str<strong>on</strong>g>the</str<strong>on</strong>g>sauri to<br />
resolve this problem, but <strong>on</strong>e should be aware <str<strong>on</strong>g>of</str<strong>on</strong>g> possible scope differences (see <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
first c<strong>on</strong>ceptualizati<strong>on</strong> mismatch above).<br />
Hom<strong>on</strong>ym terms This problem occurs when semantically different c<strong>on</strong>cepts have <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
same name.<br />
Finally, <str<strong>on</strong>g>the</str<strong>on</strong>g> last type <str<strong>on</strong>g>of</str<strong>on</strong>g> difference:<br />
Encoding Values in different <strong>on</strong>tologies might be encoded in a different way. For example,<br />
<strong>on</strong>e <strong>on</strong>tology might define distance in kilometers, while ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r uses miles.<br />
Inter-<strong>on</strong>tology relati<strong>on</strong>ships [MIKS00] takes a slightly different approach. This paper<br />
identifies different types <str<strong>on</strong>g>of</str<strong>on</strong>g> inter-<strong>on</strong>tology relati<strong>on</strong>ships (based <strong>on</strong> relati<strong>on</strong>ships identified<br />
in [HM93]) that should be taken into account by <strong>on</strong>tology mapping systems:<br />
Syn<strong>on</strong>ym Two terms in different <strong>on</strong>tologies have <str<strong>on</strong>g>the</str<strong>on</strong>g> same semantics. This corresp<strong>on</strong>ds<br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> syn<strong>on</strong>ym terms mismatch menti<strong>on</strong>ed above.<br />
Hyp<strong>on</strong>ym A term is less general than ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>e in a different <strong>on</strong>tology. This is a special<br />
kind <str<strong>on</strong>g>of</str<strong>on</strong>g> scope mismatch and can also be seen as a c<strong>on</strong>cept descripti<strong>on</strong> mismatch.<br />
Hypernym A term is more general than ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>e in a different <strong>on</strong>tology. This is<br />
a special kind <str<strong>on</strong>g>of</str<strong>on</strong>g> scope mismatch and can also be seen as a c<strong>on</strong>cept descripti<strong>on</strong><br />
mismatch.
CHAPTER 1. INTRODUCTION 11<br />
Overlap There is an intersecti<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g> abstracti<strong>on</strong> represented by two terms. This corresp<strong>on</strong>ds<br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> scope mismatch.<br />
Disjoint There is no intersecti<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g> abstracti<strong>on</strong> represented by two terms.<br />
Covering The abstracti<strong>on</strong> represented by a term in <strong>on</strong>e <strong>on</strong>tology is <str<strong>on</strong>g>the</str<strong>on</strong>g> same as <str<strong>on</strong>g>the</str<strong>on</strong>g> abstracti<strong>on</strong><br />
represented by <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r given abstracti<strong>on</strong>s which are subsumed<br />
individually by <str<strong>on</strong>g>the</str<strong>on</strong>g> term. This corresp<strong>on</strong>ds to <str<strong>on</strong>g>the</str<strong>on</strong>g> granularity mismatch identified<br />
by Klein.<br />
1.3.2 Language-level mismatches<br />
Typically, <strong>on</strong>tology mappings require <str<strong>on</strong>g>the</str<strong>on</strong>g> source and target <strong>on</strong>tologies to be represented<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> same language. This translati<strong>on</strong> may already resolve most <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> language issues,<br />
which can occur. Typical language level mismatches are syntax, logical representati<strong>on</strong>,<br />
semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> primitives and language expressivity [Kle01]. Most systems presented in this<br />
<str<strong>on</strong>g>survey</str<strong>on</strong>g> do such a translati<strong>on</strong> but do not say if and how <str<strong>on</strong>g>the</str<strong>on</strong>g> issues surrounding language<br />
level mismatches are resolved in <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> internal representati<strong>on</strong>.<br />
Also, many methods and tools for matching and mapping require <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies<br />
to be expressed in a certain representati<strong>on</strong>al format. Although <str<strong>on</strong>g>the</str<strong>on</strong>g>re typically exists a<br />
translati<strong>on</strong> from any <strong>on</strong>tology language into this p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular representati<strong>on</strong>, <str<strong>on</strong>g>the</str<strong>on</strong>g> preservati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> semantics can still not be guaranteed.<br />
We will now go over <str<strong>on</strong>g>the</str<strong>on</strong>g> above menti<strong>on</strong>ed language level mismatches and describe<br />
what <str<strong>on</strong>g>the</str<strong>on</strong>g> (potential) issues are with <str<strong>on</strong>g>the</str<strong>on</strong>g> current systems and techniques:<br />
• Certainly, differences in syntax would be resolved by any such translati<strong>on</strong> to an<br />
internal representati<strong>on</strong>, since both <strong>on</strong>tologies <str<strong>on</strong>g>the</str<strong>on</strong>g>n use <str<strong>on</strong>g>the</str<strong>on</strong>g> same syntax.<br />
• Differences in logical representati<strong>on</strong> occur when syntactically different, but logically<br />
equivalent statements are used to represent <str<strong>on</strong>g>the</str<strong>on</strong>g> same thing. An example <str<strong>on</strong>g>of</str<strong>on</strong>g> this<br />
is <str<strong>on</strong>g>the</str<strong>on</strong>g> way disjointness is expressed in <str<strong>on</strong>g>the</str<strong>on</strong>g> OWL Lite species <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Web <strong>Ontology</strong><br />
Language OWL [MvH04], compared to <str<strong>on</strong>g>the</str<strong>on</strong>g> way disjointness is usually expressed<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> OWL DL species 5 .<br />
Arguably, this is not really an issue with <str<strong>on</strong>g>the</str<strong>on</strong>g> language itself, but ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r an issue with<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> language. However, when a language allows <str<strong>on</strong>g>the</str<strong>on</strong>g> user to model <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
same thing in different ways, it is easy for a user to mistakenly model certain things<br />
in an inc<strong>on</strong>venient way and it is harder for a user to understand <str<strong>on</strong>g>the</str<strong>on</strong>g> model created<br />
by a different user or indeed created by him/herself in <str<strong>on</strong>g>the</str<strong>on</strong>g> past. When <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
language used by <str<strong>on</strong>g>the</str<strong>on</strong>g> technique/tool/system allows such different logical representati<strong>on</strong>s<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> equivalent statements, this mismatch still needs to be taken into account<br />
5 The OWL Lite statement Class(owl:Nothing complete A B), although also valid in OWL<br />
DL, is usually modeled as DisjointClasses(A B) in OWL DL
CHAPTER 1. INTRODUCTION 12<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping process. In order to overcome <str<strong>on</strong>g>the</str<strong>on</strong>g>se issues, <strong>on</strong>e could think<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> a normalizati<strong>on</strong> step before <str<strong>on</strong>g>the</str<strong>on</strong>g> st<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process or reas<strong>on</strong>ing during<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process in order to detect equivalence in logical expressi<strong>on</strong>s.<br />
• When <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> primitives is different in different <strong>on</strong>tology languages, i.e. a<br />
syntactically equivalent c<strong>on</strong>struct has a different meaning in <str<strong>on</strong>g>the</str<strong>on</strong>g> different languages,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> comm<strong>on</strong> representati<strong>on</strong> needs to take this into account. Fortunately,<br />
this problem can be resolved in <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> comm<strong>on</strong> representati<strong>on</strong>.<br />
If both <strong>on</strong>tologies already use <str<strong>on</strong>g>the</str<strong>on</strong>g> comm<strong>on</strong> representati<strong>on</strong> and this comm<strong>on</strong><br />
representati<strong>on</strong> does not allow ambiguous statements, this mismatch does not occur.<br />
• Differences in expressivity <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> languages are resolved in <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
comm<strong>on</strong> representati<strong>on</strong> language. However, if <str<strong>on</strong>g>the</str<strong>on</strong>g> expressivity <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> comm<strong>on</strong> representati<strong>on</strong><br />
language is not a superset <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> language <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tology, some<br />
semantics might get lost in <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong>, as was pointed out in [MWK00].<br />
As we can see, <str<strong>on</strong>g>the</str<strong>on</strong>g> issues with language mismatches are less severe if <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a translati<strong>on</strong><br />
to a comm<strong>on</strong> representati<strong>on</strong>. However, all <strong>on</strong>tology mismatches menti<strong>on</strong>ed previously<br />
need to be taken into account when creating any mapping between <strong>on</strong>tologies.<br />
1.4 Integrati<strong>on</strong> approaches<br />
We identify two major paradigms in informati<strong>on</strong> integrati<strong>on</strong>: (1) merging data models<br />
into a central model and (2) aligning and mapping models. In <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology engineering<br />
community <str<strong>on</strong>g>the</str<strong>on</strong>g>se approaches are known as <strong>Ontology</strong> <strong>Merging</strong> and <strong>Ontology</strong> Aligning.<br />
We distinguish two distinct approaches in <strong>on</strong>tology merging. In <str<strong>on</strong>g>the</str<strong>on</strong>g> first approach <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
input <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> merging process is a collecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> outcome is <strong>on</strong>e new,<br />
merged, <strong>on</strong>tology which captures <str<strong>on</strong>g>the</str<strong>on</strong>g> original <strong>on</strong>tologies (see Figure 1.2). A prominent<br />
example <str<strong>on</strong>g>of</str<strong>on</strong>g> this approach is PROMPT [NM00b], which is an algorithm and a tool for<br />
interactively merging <strong>on</strong>tologies. In <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d approach (see Figure 1.3) <str<strong>on</strong>g>the</str<strong>on</strong>g> original<br />
<strong>on</strong>tologies are not replaced, but ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r a ‘view’, called bridge <strong>on</strong>tology, is created which<br />
imports <str<strong>on</strong>g>the</str<strong>on</strong>g> original <strong>on</strong>tologies and specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> corresp<strong>on</strong>dences using bridge axioms.<br />
OntoMerge [DMQ02] is a prominent example <str<strong>on</strong>g>of</str<strong>on</strong>g> this approach.<br />
[NM99] clarifies <str<strong>on</strong>g>the</str<strong>on</strong>g> difference between <strong>on</strong>tology merging and <strong>on</strong>tology aligning.<br />
When merging two <strong>on</strong>tologies, a single coherent <strong>on</strong>tology is created that is a merged<br />
versi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two original <strong>on</strong>tologies. When aligning two <strong>on</strong>tologies, <str<strong>on</strong>g>the</str<strong>on</strong>g> two original<br />
<strong>on</strong>tologies persist, with a number <str<strong>on</strong>g>of</str<strong>on</strong>g> links established between <str<strong>on</strong>g>the</str<strong>on</strong>g>m, allowing <str<strong>on</strong>g>the</str<strong>on</strong>g> aligned<br />
<strong>on</strong>tologies to reuse informati<strong>on</strong> from <strong>on</strong>e ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r. The alignment <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies is usually<br />
p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology merging process.<br />
Soluti<strong>on</strong>s can be fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r classified al<strong>on</strong>g two dimensi<strong>on</strong>s: a run-time and a design-time<br />
dimensi<strong>on</strong>. The run-time dimensi<strong>on</strong> c<strong>on</strong>cerns with <str<strong>on</strong>g>the</str<strong>on</strong>g> way <str<strong>on</strong>g>the</str<strong>on</strong>g> user views <str<strong>on</strong>g>the</str<strong>on</strong>g> data in <str<strong>on</strong>g>the</str<strong>on</strong>g>
CHAPTER 1. INTRODUCTION 13<br />
Figure 1.2: Output <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> merging process - case 1<br />
Figure 1.3: Output <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> merging process - case 2<br />
system during operati<strong>on</strong>. The design-time dimensi<strong>on</strong> c<strong>on</strong>cerns with <str<strong>on</strong>g>the</str<strong>on</strong>g> way <str<strong>on</strong>g>the</str<strong>on</strong>g> models<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> disparate data sources are integrated.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> run-time, or user-centered dimensi<strong>on</strong> we distinguish two approaches: (1) <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
local model and (2) <str<strong>on</strong>g>the</str<strong>on</strong>g> global model approach. The difference between <str<strong>on</strong>g>the</str<strong>on</strong>g>se two approaches<br />
is whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r, in interacti<strong>on</strong>s with <str<strong>on</strong>g>the</str<strong>on</strong>g> system, <str<strong>on</strong>g>the</str<strong>on</strong>g> user can use his/her own local<br />
data model, or whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g> user needs to c<strong>on</strong>form to a global model when interacting with<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> system:<br />
The Local Model/<strong>Ontology</strong> Approach. In this case <str<strong>on</strong>g>the</str<strong>on</strong>g> user is represented by an agent<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> system and this agent represents <str<strong>on</strong>g>the</str<strong>on</strong>g> user with its own local data model. The<br />
agent performs <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> between <str<strong>on</strong>g>the</str<strong>on</strong>g> user’s local model and ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g> global<br />
model or o<str<strong>on</strong>g>the</str<strong>on</strong>g>r local models in order to allow interacti<strong>on</strong> with multiple data sources<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> system.<br />
The Global Model/<strong>Ontology</strong> Approach. The user views <str<strong>on</strong>g>the</str<strong>on</strong>g> system through <str<strong>on</strong>g>the</str<strong>on</strong>g> global<br />
data model using a mediator, which is ”a system that supports an integrated view<br />
over multiple informati<strong>on</strong> sources”[Hul97]. Note that in <str<strong>on</strong>g>the</str<strong>on</strong>g> local model approach,<br />
a user agent will in most cases also c<strong>on</strong>tact a mediator in order to allow interoperati<strong>on</strong><br />
with <str<strong>on</strong>g>the</str<strong>on</strong>g> system, which c<strong>on</strong>tains multiple informati<strong>on</strong> sources. An example<br />
is <str<strong>on</strong>g>the</str<strong>on</strong>g> approach taken in <str<strong>on</strong>g>the</str<strong>on</strong>g> COG project[dB04].<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> design-time dimensi<strong>on</strong> we distinguish:<br />
One-to-<strong>on</strong>e mapping <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies. Mappings are created between pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
Problems with this approach arise when many such mappings need to be created,
CHAPTER 1. INTRODUCTION 14<br />
Figure 1.4: The <strong>on</strong>e-to-<strong>on</strong>e approach for <strong>on</strong>tology integrati<strong>on</strong><br />
which is <str<strong>on</strong>g>of</str<strong>on</strong>g>ten <str<strong>on</strong>g>the</str<strong>on</strong>g> case in organizati<strong>on</strong>s where many different applicati<strong>on</strong>s are in<br />
use. The complexity <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping for <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>e-to-<strong>on</strong>e approach is O(n 2 ),<br />
where n is <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies. An example <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>e-to-<strong>on</strong>e approach is<br />
OBSERVER [MIKS00] (see also Secti<strong>on</strong> 4.2.3). Figure 1.4 illustrates <strong>on</strong>e-to-<strong>on</strong>e<br />
mapping <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies. There exists a mapping between every pair <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies. In<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> worst case, <str<strong>on</strong>g>the</str<strong>on</strong>g>se mappings are <strong>on</strong>ly <strong>on</strong>e-way. This means that a single mapping<br />
can <strong>on</strong>ly translate from <strong>on</strong>e model to ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r, not <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r way around.<br />
Mapping to a global <strong>on</strong>tology. Each <strong>on</strong>tology is mapped to a central shared <strong>on</strong>tology<br />
(Figure 1.5. Drawbacks <str<strong>on</strong>g>of</str<strong>on</strong>g> using a global <strong>on</strong>tology to which all o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tologies<br />
are mapped are similar to those <str<strong>on</strong>g>of</str<strong>on</strong>g> using any standard [VC98]. For example, it<br />
is hard to reach a c<strong>on</strong>sensus <strong>on</strong> a standard shared by many people (it is always<br />
a lengthy process), who use different terminologies for <str<strong>on</strong>g>the</str<strong>on</strong>g> same domain and a<br />
standard impedes changes in an organizati<strong>on</strong> (because evoluti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> standards suffers
CHAPTER 1. INTRODUCTION 15<br />
Figure 1.5: The global approach for <strong>on</strong>tology integrati<strong>on</strong><br />
from <str<strong>on</strong>g>the</str<strong>on</strong>g> same problems as <str<strong>on</strong>g>the</str<strong>on</strong>g> development <str<strong>on</strong>g>of</str<strong>on</strong>g> standards). An example <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> global<br />
<strong>on</strong>tology approach is MOMIS [BCVB01] (see also Secti<strong>on</strong> 4.2.4).<br />
Hybrid approach. On <str<strong>on</strong>g>the</str<strong>on</strong>g> Web, not much c<strong>on</strong>trol over <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies can be enforced<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> global integrati<strong>on</strong> scenario does not scale, because different organizati<strong>on</strong>s<br />
use different <strong>on</strong>tologies and do not want to commit to a single new <strong>on</strong>tology.<br />
However, <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>e-to-<strong>on</strong>e integrati<strong>on</strong> approach also does not to scale, because it requires<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> maintenance <str<strong>on</strong>g>of</str<strong>on</strong>g> too many mappings between <strong>on</strong>tologies. Therefore, a<br />
hybrid approach is suitable, where <str<strong>on</strong>g>the</str<strong>on</strong>g>re are several ”islands” around influential<br />
domain <strong>on</strong>tologies, where within <str<strong>on</strong>g>the</str<strong>on</strong>g> island <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a form <str<strong>on</strong>g>of</str<strong>on</strong>g> global integrati<strong>on</strong>;<br />
<strong>on</strong>e <strong>on</strong>tology would be <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> islands and a number <str<strong>on</strong>g>of</str<strong>on</strong>g> local<br />
<strong>on</strong>tologies are mapped to this global <strong>on</strong>tology. Mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> islands are<br />
established, as illustrated in Figure 1.6.<br />
Note that many methods, tools and techniques in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> do not have a bias for <strong>on</strong>e
CHAPTER 1. INTRODUCTION 16<br />
Figure 1.6: The hybrid approach for <strong>on</strong>tology integrati<strong>on</strong>
CHAPTER 1. INTRODUCTION 17<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> above approaches. They can <str<strong>on</strong>g>of</str<strong>on</strong>g>ten be used in all scenarios, although an <strong>on</strong>tology<br />
merging tool such as PROMPT [NM00b] does seem to have a bias towards using a shared<br />
<strong>on</strong>tology.<br />
The more comprehensive integrati<strong>on</strong> systems typically prescribe which paradigm<br />
should be used. MOMIS [BCVB01], for example, prescribes <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> a global merged<br />
<strong>on</strong>tology for <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> data sources, whereas OBSERVER [MIKS00] prescribes<br />
loosely coupled comp<strong>on</strong>ent <strong>on</strong>tologies with mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
The approaches that fall in <str<strong>on</strong>g>the</str<strong>on</strong>g> global-approach category can be fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r classified by<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> directi<strong>on</strong>ality <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> local <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> shared <strong>on</strong>tology.<br />
This distincti<strong>on</strong> is inherited from <str<strong>on</strong>g>the</str<strong>on</strong>g> database setting, where <str<strong>on</strong>g>the</str<strong>on</strong>g> data integrati<strong>on</strong> task is<br />
seen as a process by which several databases with associated local schemas are integrated<br />
to form a virtual database (thus it c<strong>on</strong>forms to <str<strong>on</strong>g>the</str<strong>on</strong>g> global approach), and views c<strong>on</strong>nect<br />
local schemas with <str<strong>on</strong>g>the</str<strong>on</strong>g> global virtual schema. The terms LAV, GAV, and GLAV were<br />
introduced for describing different approaches in database integrati<strong>on</strong> [Len]:<br />
GAV (Global as view). The c<strong>on</strong>structs <str<strong>on</strong>g>of</str<strong>on</strong>g> a global schema are described as views over<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas (in database terminology) or <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings describe entities from<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> shared <strong>on</strong>tology in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> entities from <str<strong>on</strong>g>the</str<strong>on</strong>g> local <strong>on</strong>tologies (in <strong>on</strong>tology mediati<strong>on</strong><br />
terminology).<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> case where <str<strong>on</strong>g>the</str<strong>on</strong>g>re are no integrity c<strong>on</strong>straints in <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema, query<br />
processing is ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r simple, c<strong>on</strong>sisting in unfolding <str<strong>on</strong>g>the</str<strong>on</strong>g> original query. Unfolding<br />
means expanding <str<strong>on</strong>g>the</str<strong>on</strong>g> terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> query, which are c<strong>on</strong>structs <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema,<br />
according to <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings, in order to obtain a query which c<strong>on</strong>tains <strong>on</strong>ly c<strong>on</strong>structs<br />
from local schemas. Unfortunately, in <str<strong>on</strong>g>the</str<strong>on</strong>g> presence <str<strong>on</strong>g>of</str<strong>on</strong>g> integrity c<strong>on</strong>straints (which is<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> case when <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema is an <strong>on</strong>tology), additi<strong>on</strong>al steps must be performed<br />
for reas<strong>on</strong>ing[CDGL01].<br />
One disadvantage <str<strong>on</strong>g>of</str<strong>on</strong>g> this method is that whenever a source changes or a new <strong>on</strong>e<br />
is added, <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema needs to be rec<strong>on</strong>sidered. Ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r disadvantage menti<strong>on</strong>ed<br />
in [EA] is that it cannot fully capture <str<strong>on</strong>g>the</str<strong>on</strong>g> data integrati<strong>on</strong> semantics where<br />
a source schema c<strong>on</strong>struct can be defined by a n<strong>on</strong>-reversible functi<strong>on</strong> over global<br />
schema c<strong>on</strong>structs.<br />
LAV (Local as view). The c<strong>on</strong>structs <str<strong>on</strong>g>of</str<strong>on</strong>g> local schemas are described as views over <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
global schema (in database terminology) or <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings define entities from <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
local <strong>on</strong>tologies in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> entities from <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology (in <strong>on</strong>tology mediati<strong>on</strong><br />
terminology). This approach is characterized by a high modularity and extensibility<br />
(<str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas are expected to be <str<strong>on</strong>g>the</str<strong>on</strong>g> subject <str<strong>on</strong>g>of</str<strong>on</strong>g> change more frequently than <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
global schema and changing a local schema does not require changes in <str<strong>on</strong>g>the</str<strong>on</strong>g> global<br />
<strong>on</strong>e). A disadvantage <str<strong>on</strong>g>of</str<strong>on</strong>g> this method is <str<strong>on</strong>g>the</str<strong>on</strong>g> complexity <str<strong>on</strong>g>of</str<strong>on</strong>g> reas<strong>on</strong>ing. Like GAV,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>re are situati<strong>on</strong>s where <str<strong>on</strong>g>the</str<strong>on</strong>g> approach is not expressive enough for fully capturing<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> data integrati<strong>on</strong> semantics [EA].
CHAPTER 1. INTRODUCTION 18<br />
GLAV (Global-local as view). This approach can be seen as an extensi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> both LAV<br />
and GAV. In this case a view <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> local schema is defined with respect to a view <strong>on</strong><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> global schema (a mapping is an inclusi<strong>on</strong>/equivalence relati<strong>on</strong> between a query<br />
over <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology and a query over a shared <strong>on</strong>tology). Query processing<br />
is complex in this case as well, since <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no direct informati<strong>on</strong> about which<br />
data satisfy <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema. One reas<strong>on</strong>ing technique is to transform a GLAV<br />
rule into a GAV rule and an integrity c<strong>on</strong>straint over <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema[AC01]. An<br />
advantage <str<strong>on</strong>g>of</str<strong>on</strong>g> this method is that it is more expressive than both LAV and GAV 6<br />
Recently, as p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> AutoMed framework (see Secti<strong>on</strong> 4.2.6), a new approach for<br />
c<strong>on</strong>necting schemas called BAV(Both as view) was introduced [MP03],[EA] that is more<br />
expressive than LAV/GAV/GLAV. Below we give an overview <str<strong>on</strong>g>of</str<strong>on</strong>g> this approach:<br />
BAV (Both as view). This approach is based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> reversible sequences <str<strong>on</strong>g>of</str<strong>on</strong>g> primitive<br />
schema transformati<strong>on</strong>s, called transformati<strong>on</strong>s pathways.<br />
Schemas are incrementally transformed by applying to <str<strong>on</strong>g>the</str<strong>on</strong>g>m a sequence <str<strong>on</strong>g>of</str<strong>on</strong>g> primitive<br />
transformati<strong>on</strong> steps t1, . . . , tn, where each such step adds, extends, deletes,<br />
c<strong>on</strong>tracts or renames a single schema c<strong>on</strong>cept. A query expressed in <str<strong>on</strong>g>the</str<strong>on</strong>g> intermediate<br />
query language (IQL) is attached to each add or delete step and specifies<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> extent <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> new or <str<strong>on</strong>g>the</str<strong>on</strong>g> deleted c<strong>on</strong>struct in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> rest <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>structs<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> schema.<br />
[MP03] shows how <str<strong>on</strong>g>the</str<strong>on</strong>g> LAV and GAV definiti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> views can be fully derived from<br />
BAV schema transformati<strong>on</strong> sequences, and how BAV transformati<strong>on</strong> sequences<br />
may be p<str<strong>on</strong>g>art</str<strong>on</strong>g>ially derived from LAV or GAV definiti<strong>on</strong>s. A network <str<strong>on</strong>g>of</str<strong>on</strong>g> pathways may<br />
join different schema toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r. Thus, this approach for specifying mappings could<br />
also be applied for <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e-to-<strong>on</strong>e mapping and for <str<strong>on</strong>g>the</str<strong>on</strong>g> hybrid approach.<br />
1.5 Wrappers and Mediators<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> wrapper/mediator architecture, <str<strong>on</strong>g>the</str<strong>on</strong>g> main comp<strong>on</strong>ents are wrappers; <str<strong>on</strong>g>the</str<strong>on</strong>g>re typically<br />
exists <strong>on</strong>e wrapper for each data source, and <strong>on</strong>e (or more) mediator(s), which mediate<br />
between <str<strong>on</strong>g>the</str<strong>on</strong>g> differences in <str<strong>on</strong>g>the</str<strong>on</strong>g> individual data sources. In <str<strong>on</strong>g>the</str<strong>on</strong>g> global integrati<strong>on</strong> paradigm,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>re is typically <strong>on</strong>e mediator which is accessed by <str<strong>on</strong>g>the</str<strong>on</strong>g> user for querying and informati<strong>on</strong><br />
retrieval. In this case <str<strong>on</strong>g>the</str<strong>on</strong>g> mediator typically has <strong>on</strong>e global schema al<strong>on</strong>g with<br />
mappings to all <str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas, where each data source has <strong>on</strong>e local schema. Each<br />
data source has a wrapper associated with it, which provides <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> between <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> data source and <str<strong>on</strong>g>the</str<strong>on</strong>g> system representati<strong>on</strong> (this is typically between<br />
a database representati<strong>on</strong> and <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology representati<strong>on</strong>) and translates queries from <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
system representati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> data source representati<strong>on</strong> (typically, <strong>on</strong>tology queries would<br />
6 see http://www.science.unitn.it/coopis/talks/Wednesday/Lenzerini.pdf for an example
CHAPTER 1. INTRODUCTION 19<br />
be translated to SQL queries to be executed <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> individual database). An example<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> this approach is MOMIS [BCVB01] (see also Secti<strong>on</strong> 4.2.4). This wrapper/mediator<br />
approach with <strong>on</strong>e global mediator is illustrated in Figure 1.7.<br />
Figure 1.7: The wrapper/mediator architecture in <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> global integrati<strong>on</strong>. Note that<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> drums represent as usual databases and <str<strong>on</strong>g>the</str<strong>on</strong>g> sheets represent o<str<strong>on</strong>g>the</str<strong>on</strong>g>r data sources.<br />
In a <strong>on</strong>e-to-<strong>on</strong>e integrati<strong>on</strong> scheme, <str<strong>on</strong>g>the</str<strong>on</strong>g>re are typically several mediators, which mediate<br />
between <str<strong>on</strong>g>the</str<strong>on</strong>g> representati<strong>on</strong>s in <str<strong>on</strong>g>the</str<strong>on</strong>g> individual sources. The setting here is similar<br />
to a peer-to-peer setting, where each peer could have a number <str<strong>on</strong>g>of</str<strong>on</strong>g> data sources and an<br />
<strong>on</strong>tology, which describes <str<strong>on</strong>g>the</str<strong>on</strong>g> data <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source. The user would be <strong>on</strong>e peer and would<br />
use that peer’s <strong>on</strong>tology. If <str<strong>on</strong>g>the</str<strong>on</strong>g> user wants to query a different peer, <str<strong>on</strong>g>the</str<strong>on</strong>g> mediator has to<br />
mediate <str<strong>on</strong>g>the</str<strong>on</strong>g> differences between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. One example <str<strong>on</strong>g>of</str<strong>on</strong>g> this case is OBSERVER<br />
[MIKS00] (see also Secti<strong>on</strong> 4.2.3), where each peer has its own mediator, which does<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> query rewriting and querying <str<strong>on</strong>g>of</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r peers, although <strong>on</strong>e central mediator (called<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> Inter-<strong>on</strong>tology Relati<strong>on</strong>ship Manager IRM) still keeps track <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ships between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. This central mediator is queried by o<str<strong>on</strong>g>the</str<strong>on</strong>g>r mediators to find out about<br />
related peers and <str<strong>on</strong>g>the</str<strong>on</strong>g> differences in representati<strong>on</strong>.<br />
Figure 1.8 illustrates <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> wrappers and mediators in <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e-to-<strong>on</strong>e<br />
integrati<strong>on</strong>. Note that in this case, all mediators need to be aware <str<strong>on</strong>g>of</str<strong>on</strong>g> all o<str<strong>on</strong>g>the</str<strong>on</strong>g>r mediators in<br />
order to perform query rewriting and to achieve effective query answering.<br />
This report is fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r structured as follows. Chapter 2 presents <str<strong>on</strong>g>the</str<strong>on</strong>g> use cases which<br />
we have identified as crucial for <strong>on</strong>tology mediati<strong>on</strong> <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web. These use<br />
cases are later used to identify if and how <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> would fit into<br />
a Semantic Web c<strong>on</strong>text. The framework we use for evaluating <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in this<br />
<str<strong>on</strong>g>survey</str<strong>on</strong>g> is presented in Chapter 3. The <str<strong>on</strong>g>survey</str<strong>on</strong>g> itself is presented in Chapter 4. Chapter 5<br />
compares <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> and Chapter 6 presents some c<strong>on</strong>clusi<strong>on</strong>s.
CHAPTER 1. INTRODUCTION 20<br />
Figure 1.8: The wrapper/mediator architecture in <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e-to-<strong>on</strong>e integrati<strong>on</strong>
Chapter 2<br />
Motivati<strong>on</strong>al Use Cases<br />
In this chapter we present a number <str<strong>on</strong>g>of</str<strong>on</strong>g> generic use cases which capture <str<strong>on</strong>g>the</str<strong>on</strong>g> functi<strong>on</strong>ality<br />
required for <strong>on</strong>tology mediati<strong>on</strong> <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web. Any applicati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong><br />
is expected to use all <str<strong>on</strong>g>the</str<strong>on</strong>g>se use cases to some extent. Therefore it is interesting<br />
to see to what extent each <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in this <str<strong>on</strong>g>survey</str<strong>on</strong>g> supports <str<strong>on</strong>g>the</str<strong>on</strong>g>se use cases in<br />
order to evaluate <str<strong>on</strong>g>the</str<strong>on</strong>g>ir applicability to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong> problem <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic<br />
Web. The <strong>Ontology</strong> Mediati<strong>on</strong> framework developed in <str<strong>on</strong>g>the</str<strong>on</strong>g> SEKT project should support<br />
as many features as possible from <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>es described here.<br />
We distinguish three use cases, which are detailed in <str<strong>on</strong>g>the</str<strong>on</strong>g> remainder <str<strong>on</strong>g>of</str<strong>on</strong>g> this chapter:<br />
• Instance Mediati<strong>on</strong><br />
• <strong>Ontology</strong> <strong>Merging</strong><br />
• Creating <strong>Ontology</strong> Mappings<br />
The first use case, Instance Mediati<strong>on</strong>, addresses <str<strong>on</strong>g>the</str<strong>on</strong>g> tasks <str<strong>on</strong>g>of</str<strong>on</strong>g> instance transformati<strong>on</strong>,<br />
unificati<strong>on</strong> and query rewriting. The sec<strong>on</strong>d use case, <strong>Ontology</strong> <strong>Merging</strong>, addresses <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
way two source <strong>on</strong>tologies can be merged into <strong>on</strong>e target <strong>on</strong>tology. The third use case,<br />
Creating <strong>Ontology</strong> Mappings, is about actually finding similarities between <strong>on</strong>tologies<br />
and creating mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
The generic use cases corresp<strong>on</strong>d to three orthog<strong>on</strong>al dimensi<strong>on</strong>s in <strong>on</strong>tology mediati<strong>on</strong>.<br />
Each applicati<strong>on</strong> scenario can make use <str<strong>on</strong>g>of</str<strong>on</strong>g> all three use cases to some extent.<br />
2.1 Use Cases for Instance Mediati<strong>on</strong><br />
The following use cases are <str<strong>on</strong>g>the</str<strong>on</strong>g> typical use cases for instance mediati<strong>on</strong>, where <str<strong>on</strong>g>the</str<strong>on</strong>g> emphasis<br />
is <strong>on</strong> instance transformati<strong>on</strong> and unificati<strong>on</strong>.<br />
21
CHAPTER 2. MOTIVATIONAL USE CASES 22<br />
Figure 2.1: Instance Transformati<strong>on</strong><br />
Definiti<strong>on</strong> 1 We define instance mediati<strong>on</strong> as <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> rec<strong>on</strong>ciling differences between<br />
two instance bases, each described by an <strong>on</strong>tology. This includes <str<strong>on</strong>g>the</str<strong>on</strong>g> discovery and<br />
specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mappings, as well as <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se mappings for certain tasks,<br />
such as query rewriting and instance transformati<strong>on</strong>.<br />
As we can see in <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong>, instance mediati<strong>on</strong> also requires <str<strong>on</strong>g>the</str<strong>on</strong>g> discovery and<br />
specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mappings. This makes apparent <str<strong>on</strong>g>the</str<strong>on</strong>g> inter-dependencies between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> different use cases. We do not describe <str<strong>on</strong>g>the</str<strong>on</strong>g> discovery and specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology<br />
mappings here; instead, <str<strong>on</strong>g>the</str<strong>on</strong>g>se use cases are discussed later, because <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir use in different<br />
o<str<strong>on</strong>g>the</str<strong>on</strong>g>r areas <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong>.<br />
2.1.1 Instance Transformati<strong>on</strong><br />
For <str<strong>on</strong>g>the</str<strong>on</strong>g> instance transformati<strong>on</strong> use case we assume two separate applicati<strong>on</strong>s with separate<br />
instance stores both described by <strong>on</strong>tologies. The task to be performed is <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> an instance <str<strong>on</strong>g>of</str<strong>on</strong>g> a source <strong>on</strong>tology, say OS, to an instance <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology<br />
OT . Figure 2.1 illustrates <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> instance transformati<strong>on</strong>. An instance i1, which<br />
refers to <strong>on</strong>tology O1, is transformed into instance i2, which refers to <strong>on</strong>tology O2. What<br />
is important to note here is that <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> itself is derived from <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies, and that both <str<strong>on</strong>g>the</str<strong>on</strong>g> original and <str<strong>on</strong>g>the</str<strong>on</strong>g> transformed instance provide<br />
informati<strong>on</strong> about <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant. Note that a real-world referant is not necessarily<br />
a physical object, but can also be, for example, a brand-name product, a date, an<br />
event or a message.<br />
This kind <str<strong>on</strong>g>of</str<strong>on</strong>g> transformati<strong>on</strong> needs to be supported by <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
sense that <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source<br />
<strong>on</strong>tology OS and instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology OT .<br />
Different applicati<strong>on</strong> scenarios have different requirements <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> precisi<strong>on</strong> and coverage<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong>. With precisi<strong>on</strong> in this c<strong>on</strong>text we mean <str<strong>on</strong>g>the</str<strong>on</strong>g> degree to which<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> intended meaning <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> instance is preserved in <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong>. With coverage
CHAPTER 2. MOTIVATIONAL USE CASES 23<br />
we mean <str<strong>on</strong>g>the</str<strong>on</strong>g> fracti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances that are intended to be transformed, which are actually<br />
transformed. The requirements <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> applicati<strong>on</strong> determine what <str<strong>on</strong>g>the</str<strong>on</strong>g>se measures look<br />
like.<br />
When an instance has been translated from OS to OT , it is <str<strong>on</strong>g>of</str<strong>on</strong>g>ten necessary to detect<br />
whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g> transformed instance corresp<strong>on</strong>ds to an existing instance in <str<strong>on</strong>g>the</str<strong>on</strong>g> instance store<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target applicati<strong>on</strong> in order to avoid duplicati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong> and in order to find<br />
out more about <str<strong>on</strong>g>the</str<strong>on</strong>g> instances in <str<strong>on</strong>g>the</str<strong>on</strong>g> knowledge base. We discuss this issue below.<br />
2.1.2 Instance Unificati<strong>on</strong><br />
The instance unificati<strong>on</strong> problem can be summarized as follows:<br />
Say we have an <strong>on</strong>tology O, and two instances I1 and I2 <str<strong>on</strong>g>of</str<strong>on</strong>g> that <strong>on</strong>tology. We want to<br />
check whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r I1 and I2 refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant. In this case we need to unify<br />
I1 and I2 into a newly created instance I0, which is <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> I1 and I2 1 . Therefore,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> instance unificati<strong>on</strong> task can be decomposed into (1) <str<strong>on</strong>g>the</str<strong>on</strong>g> identificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances<br />
referring to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant and (2) taking <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two instances in<br />
order to obtain <str<strong>on</strong>g>the</str<strong>on</strong>g> unified instance.<br />
If <str<strong>on</strong>g>the</str<strong>on</strong>g> instances I1 and I2 have been identified as referring to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant,<br />
but c<strong>on</strong>tain c<strong>on</strong>tradictory informati<strong>on</strong>, it is not possible to create a unified instance<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> user should be informed <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> inc<strong>on</strong>sistency.<br />
Figure 2.2 illustrates <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> instance unificati<strong>on</strong>. Two instances (i1 and i2) in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> same <strong>on</strong>tology O1, which refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world referant r1, are unified into <strong>on</strong>e<br />
new instance, i0, which is also an instance in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology O1 and also describes <str<strong>on</strong>g>the</str<strong>on</strong>g> same<br />
real-world referant r1. 2<br />
We identify two general means <str<strong>on</strong>g>of</str<strong>on</strong>g> detecting whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r two instances refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same<br />
real-world referant:<br />
• In <str<strong>on</strong>g>the</str<strong>on</strong>g> ‘exact’ case, <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping specifies precise, exact c<strong>on</strong>diti<strong>on</strong>s which<br />
unambiguously specify in which cases two instances refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same real-world<br />
referant and in which cases <str<strong>on</strong>g>the</str<strong>on</strong>g>y refer to different real-world referant. In o<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
words, in which cases <str<strong>on</strong>g>the</str<strong>on</strong>g> instances are unifiable.<br />
• In <str<strong>on</strong>g>the</str<strong>on</strong>g> ‘probabilistic’ case, a similarity measure is created <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> basis <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
mapping. The similarity measure expresses <str<strong>on</strong>g>the</str<strong>on</strong>g> probability that both instances<br />
1Note that I0 could coincide with ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r I1 or I2, which would be a less general case <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>e described<br />
here.<br />
2Note that c<strong>on</strong>tradictory informati<strong>on</strong> may result from unstated assumpti<strong>on</strong>s in an <strong>on</strong>tology or instance<br />
base. Unificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>tradictory statements could be performed by adding in <str<strong>on</strong>g>the</str<strong>on</strong>g> appropriate assumpti<strong>on</strong><br />
as a c<strong>on</strong>diti<strong>on</strong>.<br />
For example, ”Canada [a real-world object] requires inbound travellers to carry passports” is true if <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
assumpti<strong>on</strong> is that travellers are EU citizens, but not if <str<strong>on</strong>g>the</str<strong>on</strong>g>y are US citizens entering from <str<strong>on</strong>g>the</str<strong>on</strong>g> US.<br />
Such techniques are not explored in this deliverable.
CHAPTER 2. MOTIVATIONAL USE CASES 24<br />
Figure 2.2: Instance Unificati<strong>on</strong><br />
refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> same object. A threshold could be used to decide whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r to unify <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
instances. Ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r possibility is to have <str<strong>on</strong>g>the</str<strong>on</strong>g> user decide about <str<strong>on</strong>g>the</str<strong>on</strong>g> unificati<strong>on</strong>, which<br />
is clearly undesirable in <str<strong>on</strong>g>the</str<strong>on</strong>g> general case, but could be useful when <str<strong>on</strong>g>the</str<strong>on</strong>g> accuracy <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> resulting <strong>on</strong>tology is crucial or when dealing with very few instances.<br />
Instance transformati<strong>on</strong> and instance unificati<strong>on</strong> are <str<strong>on</strong>g>of</str<strong>on</strong>g>ten required in a querying scenario<br />
where an applicati<strong>on</strong> A queries ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r applicati<strong>on</strong> B and <str<strong>on</strong>g>the</str<strong>on</strong>g> query results (c<strong>on</strong>sisting<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> instances) are transformed to <str<strong>on</strong>g>the</str<strong>on</strong>g> representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> A and unified with instances<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> instance base <str<strong>on</strong>g>of</str<strong>on</strong>g> A.<br />
In order to be able to query a data source which uses a different (unknown) <strong>on</strong>tology,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> query originally formulated in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> applicati<strong>on</strong>’s <strong>on</strong>tology needs to be rewritten<br />
in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology. The next secti<strong>on</strong> describes <str<strong>on</strong>g>the</str<strong>on</strong>g> generic query rewriting use<br />
case.<br />
2.1.3 Query Rewriting<br />
An operati<strong>on</strong> occurring very frequently in Knowledge Management applicati<strong>on</strong>s is querying<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong> sources. We want to allow an applicati<strong>on</strong> to query different heterogeneous<br />
informati<strong>on</strong> sources without actually knowing about all <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. In order to<br />
achieve this, a query written in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> applicati<strong>on</strong>’s <strong>on</strong>tology needs to be rewritten<br />
using <str<strong>on</strong>g>the</str<strong>on</strong>g> terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> target data source’s <strong>on</strong>tology.<br />
Say, we have an applicati<strong>on</strong> A, which uses an <strong>on</strong>tology OA for its informati<strong>on</strong> representati<strong>on</strong>.<br />
Say now that this applicati<strong>on</strong> wants to query a different data source, which uses<br />
<strong>on</strong>tology OB, but A does not know about <str<strong>on</strong>g>the</str<strong>on</strong>g> structure <str<strong>on</strong>g>of</str<strong>on</strong>g> this <strong>on</strong>tology. The applicati<strong>on</strong>
CHAPTER 2. MOTIVATIONAL USE CASES 25<br />
Figure 2.3: Query Rewriting<br />
A now formulates a query QA in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology OA. In order to execute this query<br />
<strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> target data source, it needs to be rewritten <strong>on</strong>to query QB, which is formulated in<br />
terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology OB. This rewriting process is illustrated in Figure 2.3.<br />
After executi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> query, <str<strong>on</strong>g>the</str<strong>on</strong>g> results are transformed back to <str<strong>on</strong>g>the</str<strong>on</strong>g> OA representati<strong>on</strong><br />
and unified with <str<strong>on</strong>g>the</str<strong>on</strong>g> local instances using <str<strong>on</strong>g>the</str<strong>on</strong>g> techniques for instance transformati<strong>on</strong> and<br />
unificati<strong>on</strong> described above.<br />
2.2 <strong>Ontology</strong> <strong>Merging</strong><br />
Besides <str<strong>on</strong>g>the</str<strong>on</strong>g> instance transformati<strong>on</strong> and unificati<strong>on</strong> and query rewriting, we see ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
major use case for <strong>on</strong>tology mediati<strong>on</strong>: <strong>Ontology</strong> <strong>Merging</strong>.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>Ontology</strong> <strong>Merging</strong> [NM00b], two source <strong>on</strong>tologies shall be merged<br />
into <strong>on</strong>e target <strong>on</strong>tology based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies. In <str<strong>on</strong>g>the</str<strong>on</strong>g> general case, <str<strong>on</strong>g>the</str<strong>on</strong>g> source<br />
<strong>on</strong>tologies would disappear or become unavailable and <strong>on</strong>ly <str<strong>on</strong>g>the</str<strong>on</strong>g> target (merged) <strong>on</strong>tology<br />
remains. A special case is when <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies remain, al<strong>on</strong>g with mappings to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
merged <strong>on</strong>tology. Note that <str<strong>on</strong>g>the</str<strong>on</strong>g> target (merged) <strong>on</strong>tology could coincide with <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
source <strong>on</strong>tologies.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> general case where <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies become unavailable after <str<strong>on</strong>g>the</str<strong>on</strong>g> merge,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> complete instance stores <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies have to be merged. In <str<strong>on</strong>g>the</str<strong>on</strong>g> special<br />
case, <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies can maintain <str<strong>on</strong>g>the</str<strong>on</strong>g>ir instance stores and during run-time <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
applicati<strong>on</strong>, processes <str<strong>on</strong>g>of</str<strong>on</strong>g> instance transformati<strong>on</strong> and instance unificati<strong>on</strong> (cf. <str<strong>on</strong>g>the</str<strong>on</strong>g> previous<br />
subsecti<strong>on</strong>) might be necessary 3 . We can compare <str<strong>on</strong>g>the</str<strong>on</strong>g>se two distinct cases with noti<strong>on</strong>s<br />
developed in <str<strong>on</strong>g>the</str<strong>on</strong>g> field <str<strong>on</strong>g>of</str<strong>on</strong>g> database integrati<strong>on</strong>, namely, <str<strong>on</strong>g>the</str<strong>on</strong>g> noti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> materialised and<br />
virtual views [Hul97] respectively.<br />
Of course, when <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies do not have instance stores associated with<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>m, <str<strong>on</strong>g>the</str<strong>on</strong>g>se problems do not occur. However, in <str<strong>on</strong>g>the</str<strong>on</strong>g> general case an <strong>on</strong>tology will have<br />
<strong>on</strong>e or more instance stores associated with it. In special cases, such as <str<strong>on</strong>g>the</str<strong>on</strong>g> (distributed)<br />
development <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies, <str<strong>on</strong>g>the</str<strong>on</strong>g>re will not be instance stores.<br />
3 Note that mappings <strong>on</strong>ly can be derived when <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a semantical overlap. If <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no semantical<br />
overlap, <str<strong>on</strong>g>the</str<strong>on</strong>g> set <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings is empty and thus instance transformati<strong>on</strong>s and and instance unificati<strong>on</strong>s are<br />
not performed.
CHAPTER 2. MOTIVATIONAL USE CASES 26<br />
2.3 Creating <strong>Ontology</strong> Mappings<br />
In order to be able to support <str<strong>on</strong>g>the</str<strong>on</strong>g> previously menti<strong>on</strong>ed use cases, generally, a mapping<br />
needs to be created between <str<strong>on</strong>g>the</str<strong>on</strong>g> source and <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology. This does not apply to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
case <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology merging where <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies do not remain. Because <str<strong>on</strong>g>the</str<strong>on</strong>g> source<br />
<strong>on</strong>tologies disappear, <str<strong>on</strong>g>the</str<strong>on</strong>g>re needs to be no <strong>on</strong>tology mapping between <str<strong>on</strong>g>the</str<strong>on</strong>g>se sources and<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> new merged <strong>on</strong>tology. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> techniques for finding c<strong>on</strong>cepts to be merged in<br />
different <strong>on</strong>tologies and finding mappings between c<strong>on</strong>cepts in different <strong>on</strong>tologies are<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> same, since <str<strong>on</strong>g>the</str<strong>on</strong>g>y are both based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cepts. In fact, a mapping<br />
between two <strong>on</strong>tologies can be used as a basis for <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology. In <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<strong>Ontology</strong> <strong>Merging</strong> where <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies remain, a mapping needs to be created<br />
between each source <strong>on</strong>tology and <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology.<br />
We split <str<strong>on</strong>g>the</str<strong>on</strong>g> ”Creating <strong>Ontology</strong> Mappings” use case into two distinct use cases: finding<br />
similarities between <strong>on</strong>tologies and specifying mappings between <strong>on</strong>tologies.<br />
2.3.1 Finding Similarities<br />
In order to determine which mappings need to be created, <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity between <strong>on</strong>tologies<br />
needs to be established. This can ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r be established manually or automatically<br />
using <str<strong>on</strong>g>the</str<strong>on</strong>g> so-called Match operator (cf. [RB01]). The Match operator takes as input two<br />
<strong>on</strong>tologies and returns as output a list <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities between entities in <str<strong>on</strong>g>the</str<strong>on</strong>g> two source<br />
<strong>on</strong>tologies. These similarities can now be used as a st<str<strong>on</strong>g>art</str<strong>on</strong>g>ing point to semi-automatically<br />
create a mapping between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies or to merge <str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies (cf. [NM00b]).<br />
2.3.2 Specifying Mappings<br />
After having defined <str<strong>on</strong>g>the</str<strong>on</strong>g> similarities between entities in <str<strong>on</strong>g>the</str<strong>on</strong>g> different <strong>on</strong>tologies, a mapping<br />
needs to be specified between <str<strong>on</strong>g>the</str<strong>on</strong>g> similar entities <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. The requirements<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> this mapping depend <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> applicati<strong>on</strong> scenario (cf. <str<strong>on</strong>g>the</str<strong>on</strong>g> various scenarios described in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> next secti<strong>on</strong>) and in general <str<strong>on</strong>g>the</str<strong>on</strong>g> requirements <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong>, as menti<strong>on</strong>ed in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> introducti<strong>on</strong>.
Chapter 3<br />
The Evaluati<strong>on</strong> Framework<br />
This chapter presents <str<strong>on</strong>g>the</str<strong>on</strong>g> framework used for evaluating and comparing different approaches<br />
in <strong>on</strong>tology merging and aligning, as well as data integrati<strong>on</strong> using <strong>on</strong>tologies.<br />
This framework is set up in such a way that it enables us to evaluate <str<strong>on</strong>g>the</str<strong>on</strong>g> applicability <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> approaches to an <strong>on</strong>tology mediati<strong>on</strong> setting in <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web c<strong>on</strong>text. Each <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> is described according to <str<strong>on</strong>g>the</str<strong>on</strong>g>se criteria. If <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> criteria<br />
is not applicable to <str<strong>on</strong>g>the</str<strong>on</strong>g> approach, it will be omitted.<br />
Summary <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approach We first summarize <str<strong>on</strong>g>the</str<strong>on</strong>g> approach to give <str<strong>on</strong>g>the</str<strong>on</strong>g> reader a feeling<br />
for what <str<strong>on</strong>g>the</str<strong>on</strong>g> approach is all about.<br />
<strong>Ontology</strong> Languages For each tool or method we describe which <strong>on</strong>tology languages are<br />
supported as sources and targets <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between <strong>on</strong>tologies. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore,<br />
we describe how <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology languages relate to <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping language employed<br />
by <str<strong>on</strong>g>the</str<strong>on</strong>g> approach. In many cases <str<strong>on</strong>g>the</str<strong>on</strong>g> same language is used for both <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings. In some cases, this can have drawbacks if <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology language<br />
is not expressive enough to capture all <str<strong>on</strong>g>the</str<strong>on</strong>g> required <strong>on</strong>tology mappings.<br />
Mapping language An important aspect in <strong>on</strong>tology mediati<strong>on</strong> is <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping language<br />
which is used to actually specify <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping. The mapping language determines<br />
to some extent <str<strong>on</strong>g>the</str<strong>on</strong>g> complexity <str<strong>on</strong>g>of</str<strong>on</strong>g> creating mappings and also <str<strong>on</strong>g>the</str<strong>on</strong>g> possibilities <str<strong>on</strong>g>of</str<strong>on</strong>g> automati<strong>on</strong><br />
in creating <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings and in transforming and unifying instances. The<br />
most important aspects <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology mapping language are its expressivity (i.e.<br />
what kind <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>s between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies can be expressed) and its usability.<br />
An important aspect <str<strong>on</strong>g>of</str<strong>on</strong>g> a mapping language is <str<strong>on</strong>g>the</str<strong>on</strong>g> types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings that are supported,<br />
in o<str<strong>on</strong>g>the</str<strong>on</strong>g>r words <str<strong>on</strong>g>the</str<strong>on</strong>g> expressivity. We can distinguish several types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings<br />
here. The following is an (incomplete) list <str<strong>on</strong>g>of</str<strong>on</strong>g> types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings:<br />
• Class mappings<br />
• Property (i.e. relati<strong>on</strong>) mappings<br />
27
CHAPTER 3. THE EVALUATION FRAMEWORK 28<br />
• Instance mappings<br />
• Axioms / rules / c<strong>on</strong>straints<br />
• Value transformati<strong>on</strong>s (for properties)<br />
• C<strong>on</strong>diti<strong>on</strong>al mapping<br />
As was pointed out in Secti<strong>on</strong> 1.3, <str<strong>on</strong>g>the</str<strong>on</strong>g>re are several mismatches between <strong>on</strong>tologies,<br />
both <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> language and <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology level. A mapping language needs to take<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>se mismatches into account. These mismatches mostly c<strong>on</strong>cern <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
level, although <str<strong>on</strong>g>the</str<strong>on</strong>g>re are still some issues remaining <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> language level, as was<br />
pointed out in Secti<strong>on</strong> 1.3.<br />
Two notes about <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping language with respect to <str<strong>on</strong>g>the</str<strong>on</strong>g> approach in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g><br />
are in order here. Firstly, an <strong>on</strong>tology merging tool (e.g. PROMPT [NM00b]) does<br />
not produce a mapping and <str<strong>on</strong>g>the</str<strong>on</strong>g>refore does not need a mapping language. Sec<strong>on</strong>dly,<br />
we describe several methods and tools for <strong>on</strong>tology matching in this <str<strong>on</strong>g>survey</str<strong>on</strong>g>. These<br />
approaches typically do not produce a mapping, but ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r a specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities<br />
between entities in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
Mapping Patterns One <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> major goals <str<strong>on</strong>g>of</str<strong>on</strong>g> Work Package 4 in <str<strong>on</strong>g>the</str<strong>on</strong>g> SEKT project is to<br />
investigate <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> patterns for <str<strong>on</strong>g>the</str<strong>on</strong>g> creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mappings. One <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
tasks is to find such patterns. Therefore, it would be interesting to see if and how<br />
existing approaches cope with this and how mapping patterns could be integrated.<br />
This issue is very closely related to <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping language.<br />
Automati<strong>on</strong> support We describe <str<strong>on</strong>g>the</str<strong>on</strong>g> type <str<strong>on</strong>g>of</str<strong>on</strong>g> automati<strong>on</strong> that is supported and <str<strong>on</strong>g>the</str<strong>on</strong>g> degree<br />
to which it is supported during creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping. <strong>Ontology</strong><br />
mapping can not be fully automated; <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process will always be an interactive<br />
<strong>on</strong>e.<br />
One important aspect in <str<strong>on</strong>g>the</str<strong>on</strong>g> automati<strong>on</strong> support is <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> external informati<strong>on</strong><br />
sources, such as domain-specific lexic<strong>on</strong>s or existing <strong>on</strong>tologies or data schemas.<br />
Applicability to use cases In order to see if and how an approach can be applied to our<br />
setting <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web we analyze <str<strong>on</strong>g>the</str<strong>on</strong>g> applicability <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
each approach to <str<strong>on</strong>g>the</str<strong>on</strong>g> use cases presented in Chapter 2. More specifically, we relate<br />
each approach to <str<strong>on</strong>g>the</str<strong>on</strong>g> following use cases:<br />
• Instance Transformati<strong>on</strong><br />
• Instance Unificati<strong>on</strong><br />
• Query Rewriting<br />
• <strong>Ontology</strong> <strong>Merging</strong><br />
We will not treat <str<strong>on</strong>g>the</str<strong>on</strong>g> applicability <str<strong>on</strong>g>of</str<strong>on</strong>g> each <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches to each <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> use cases<br />
in detail, but ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r give an indicati<strong>on</strong> about <str<strong>on</strong>g>the</str<strong>on</strong>g> (in)applicability to each <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> use<br />
cases.
CHAPTER 3. THE EVALUATION FRAMEWORK 29<br />
Implementati<strong>on</strong> For each approach we describe <str<strong>on</strong>g>the</str<strong>on</strong>g> tool support developed for <str<strong>on</strong>g>the</str<strong>on</strong>g> p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular<br />
method. We distinguish <str<strong>on</strong>g>the</str<strong>on</strong>g> following two categories <str<strong>on</strong>g>of</str<strong>on</strong>g> tools:<br />
• Tools that support <str<strong>on</strong>g>the</str<strong>on</strong>g> user in creating <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings (and merging <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies).<br />
These tools fall in two categories: (1) comp<strong>on</strong>ents that implement <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Match operator to find similarities between <strong>on</strong>tologies and (2) GUI tools that<br />
aid <str<strong>on</strong>g>the</str<strong>on</strong>g> user in specifying <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
• Tools that do <str<strong>on</strong>g>the</str<strong>on</strong>g> run-time mediati<strong>on</strong>. These tools take care <str<strong>on</strong>g>of</str<strong>on</strong>g> query-rewriting,<br />
data transformati<strong>on</strong>, etc. . .<br />
An important aspect <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> implementati<strong>on</strong> is <str<strong>on</strong>g>the</str<strong>on</strong>g> maturity <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tool(set). An<br />
academic prototype that has just been built to support a PhD <str<strong>on</strong>g>the</str<strong>on</strong>g>sis would be less<br />
stable and less usable than a product that has underg<strong>on</strong>e much development over<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> years and is exploited by a commercial organizati<strong>on</strong>.<br />
Experiences with <str<strong>on</strong>g>the</str<strong>on</strong>g> approach We summarize <str<strong>on</strong>g>the</str<strong>on</strong>g> experiences that have been reported<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> literature for each approach. These experiences are very valuable, because<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>y show <str<strong>on</strong>g>the</str<strong>on</strong>g> applicability <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> methods to real <strong>on</strong>tology mapping and informati<strong>on</strong><br />
integrati<strong>on</strong> problems. They also show <str<strong>on</strong>g>the</str<strong>on</strong>g> usability and limitati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tools<br />
that have been developed for <str<strong>on</strong>g>the</str<strong>on</strong>g> method.<br />
We structure <str<strong>on</strong>g>the</str<strong>on</strong>g> descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> each <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> in Chapter 4<br />
according to this evaluati<strong>on</strong> framework. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, we provide a comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
approaches in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> presented evaluati<strong>on</strong> framework in Chapter 5.
Chapter 4<br />
The Survey<br />
This chapter presents <str<strong>on</strong>g>the</str<strong>on</strong>g> actual <str<strong>on</strong>g>survey</str<strong>on</strong>g> <strong>on</strong> <strong>on</strong>tology merging and aligning approaches. We<br />
evaluate <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches according to <str<strong>on</strong>g>the</str<strong>on</strong>g> criteria identified in <str<strong>on</strong>g>the</str<strong>on</strong>g> evaluati<strong>on</strong> framework in<br />
Chapter 3.<br />
In order to structure <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g>, we have grouped <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches into three categories:<br />
• Methods and Tools. We describe several special-purpose methods and tools.<br />
The purpose <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in this secti<strong>on</strong> ranges from <strong>on</strong>tology matching<br />
(GLUE, Semantic Matching) to <strong>on</strong>tology merging (PROMPT) and <strong>on</strong>tology mapping<br />
(MAFRA, RDFT, WSMT). Sometimes <str<strong>on</strong>g>the</str<strong>on</strong>g> lines between <str<strong>on</strong>g>the</str<strong>on</strong>g> purpose <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> approaches becomes blurred, because, for example, <str<strong>on</strong>g>the</str<strong>on</strong>g> authors <str<strong>on</strong>g>of</str<strong>on</strong>g> MAFRA<br />
[MMSV02] also describe a way to do <strong>on</strong>tology matching. Also, in <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
PROMPT we not <strong>on</strong>ly describe <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology merging tool, but also related tools in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> area <str<strong>on</strong>g>of</str<strong>on</strong>g> matching (even PROMPT itself has a matching algorithm) and <strong>on</strong>tology<br />
versi<strong>on</strong>ing (PROMPTDiff).<br />
• Data Integrati<strong>on</strong> Systems. We describe four approaches to data integrati<strong>on</strong> using<br />
<strong>on</strong>tologies, namely InfoSleuth, ONION, INFOMIX, MOMIS and OBSERVER.<br />
These integrati<strong>on</strong> systems are all comprehensive in <str<strong>on</strong>g>the</str<strong>on</strong>g> sense that <str<strong>on</strong>g>the</str<strong>on</strong>g>y typically<br />
have different types <str<strong>on</strong>g>of</str<strong>on</strong>g> functi<strong>on</strong>ality. For example, both ONION and MOMIS have<br />
matching tools, which aid in creating mappings between <strong>on</strong>tologies. All data integrati<strong>on</strong><br />
systems described in this <str<strong>on</strong>g>survey</str<strong>on</strong>g> support querying <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying data<br />
sources based <strong>on</strong> querying posed against an <strong>on</strong>tology; <str<strong>on</strong>g>the</str<strong>on</strong>g>y typically implement <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
wrapper/mediator architecture, which was described in Secti<strong>on</strong> 1.5.<br />
• Specific Techniques. We briefly describe a few specific techniques, which we do<br />
not evaluate according to <str<strong>on</strong>g>the</str<strong>on</strong>g> criteria in <str<strong>on</strong>g>the</str<strong>on</strong>g> evaluati<strong>on</strong> framework. FCA-Merge is a<br />
method for <strong>on</strong>tology merging, based <strong>on</strong> formal c<strong>on</strong>cept analysis. OntoMorph is a<br />
system for syntactic and semantic rewriting <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies. QOM (Quick <strong>Ontology</strong><br />
Mapping) is a method and tool for <str<strong>on</strong>g>the</str<strong>on</strong>g> discovery <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mapping, based <strong>on</strong> a<br />
30
CHAPTER 4. THE SURVEY 31<br />
combined similarity measure. OMEN is a system that describes mappings with a<br />
probability and a by means <str<strong>on</strong>g>of</str<strong>on</strong>g> a Bayesian Network.<br />
4.1 Methods and Tools<br />
4.1.1 MAFRA<br />
Summary MAFRA (MApping FRAmework for distributed <strong>on</strong>tologies) [MMSV02,<br />
SaR03b] is a framework defined for mapping distributed <strong>on</strong>tologies <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web<br />
based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> idea that complex mappings and reas<strong>on</strong>ing about those mappings is <str<strong>on</strong>g>the</str<strong>on</strong>g> best<br />
approach in a decentralized envir<strong>on</strong>ment like <str<strong>on</strong>g>the</str<strong>on</strong>g> Web. MAFRA has been implemented<br />
as a plug-in <str<strong>on</strong>g>of</str<strong>on</strong>g> KAON 1 and introduces two interesting new c<strong>on</strong>cepts: Semantic Bridges<br />
and service-centric approaches. A Semantic Bridge is defined as ”a declarative representati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> a semantic relati<strong>on</strong> between source and target <strong>on</strong>tologies entities” [SaR03b]. A<br />
Semantic Bridge provides <str<strong>on</strong>g>the</str<strong>on</strong>g> necessary mechanisms to transform instances and property<br />
fillers <str<strong>on</strong>g>of</str<strong>on</strong>g> a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular source <strong>on</strong>tology into instances and property fillers <str<strong>on</strong>g>of</str<strong>on</strong>g> a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular<br />
target <strong>on</strong>tology. Semantic Bridges are similar to <str<strong>on</strong>g>the</str<strong>on</strong>g> noti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> structures (“<str<strong>on</strong>g>the</str<strong>on</strong>g><br />
points <str<strong>on</strong>g>of</str<strong>on</strong>g> linkage between two aligned <strong>on</strong>tologies”) in [Kle01] and <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tologies<br />
in ONION [MWK00, MW01] (also Secti<strong>on</strong> 4.2.2).<br />
The o<str<strong>on</strong>g>the</str<strong>on</strong>g>r novelty is <str<strong>on</strong>g>the</str<strong>on</strong>g> service-centric approach that <str<strong>on</strong>g>the</str<strong>on</strong>g> MAFRA Toolkit introduces<br />
[SaR03b]:<br />
Each semantic bridge has an associated transformati<strong>on</strong> service that determines<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> procedure and <str<strong>on</strong>g>the</str<strong>on</strong>g> informati<strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> user must provide<br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> engine. Each service is characterized by a set <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
arguments, which in turn are characterized by name, type, opti<strong>on</strong>ality and locati<strong>on</strong><br />
(whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r it is a source, target or c<strong>on</strong>diti<strong>on</strong> argument). Services are not<br />
<strong>on</strong>ly resp<strong>on</strong>sible for <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> capabilities but also for <str<strong>on</strong>g>the</str<strong>on</strong>g> validati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> argument values and semi-automatic mapping.<br />
The service oriented approach complements <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Bridges mechanism providing<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> services necessary to perform <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping transformati<strong>on</strong>s. Silva<br />
and colleagues proposed a decentralized soluti<strong>on</strong> where independent transformati<strong>on</strong> modules<br />
are attached to <str<strong>on</strong>g>the</str<strong>on</strong>g> system. An overview <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> MAFRA toolkit can<br />
be seen in Figure 4.1, where some transformati<strong>on</strong> modules are included (copy instance,<br />
copy relati<strong>on</strong>, c<strong>on</strong>catenate, split, etc.).<br />
Figure 4.2 outlines <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>ceptual architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> MAFRA System. In <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>ceptual<br />
architecture a set <str<strong>on</strong>g>of</str<strong>on</strong>g> modules is identified and organized al<strong>on</strong>g two dimensi<strong>on</strong>s. Horiz<strong>on</strong>tal<br />
modules corresp<strong>on</strong>d to five fundamental phases in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mapping process<br />
1 KAON is an <strong>Ontology</strong> Management tool developed by <str<strong>on</strong>g>the</str<strong>on</strong>g> University <str<strong>on</strong>g>of</str<strong>on</strong>g> Karlsruhe,<br />
http://ka<strong>on</strong>.semanticweb.org/
CHAPTER 4. THE SURVEY 32<br />
Figure 4.1: MAFRA Toolkit System Architecture [SaR03b]
CHAPTER 4. THE SURVEY 33<br />
(Lift & Normalizati<strong>on</strong>, Similarity, Semantic Bridging, Executi<strong>on</strong> and Postprocessing).<br />
The vertical modules (Evoluti<strong>on</strong>, Domain Knowledge & C<strong>on</strong>straints, Cooperative C<strong>on</strong>sensus<br />
building and GUI) interact with <str<strong>on</strong>g>the</str<strong>on</strong>g> horiz<strong>on</strong>tal phases during <str<strong>on</strong>g>the</str<strong>on</strong>g> entire <strong>on</strong>tology<br />
mapping process.<br />
Figure 4.2: MAFRA C<strong>on</strong>ceptual Architecture [MMSV02]<br />
The main phases <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process described al<strong>on</strong>g <str<strong>on</strong>g>the</str<strong>on</strong>g> horiz<strong>on</strong>tal dimensi<strong>on</strong><br />
are:<br />
• Lift & Normalizati<strong>on</strong>. Lifting refers to <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> importing <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies in a<br />
uniform representati<strong>on</strong> formalism (RDF-Schema in this case) for facilitating later<br />
operati<strong>on</strong>s. The next step after lifting is <str<strong>on</strong>g>the</str<strong>on</strong>g> normalizati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> vocabularies <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>tologies by eliminating lexical and semantic differences (like special characters,<br />
upper case letters and acr<strong>on</strong>yms).<br />
• Similarity In this step, similarities between <strong>on</strong>tology entities are calculated as a<br />
support for mapping discovery. A combinati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> different similarity algorithms<br />
is employed in MAFRA[MMSV02][Sil02] in order to improve <str<strong>on</strong>g>the</str<strong>on</strong>g> result <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
identificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> equivalent terms.<br />
• Semantic Bridging After identifying <str<strong>on</strong>g>the</str<strong>on</strong>g> similarities between entities from different<br />
<strong>on</strong>tologies, <str<strong>on</strong>g>the</str<strong>on</strong>g> similar entities are semantically bridged, i.e. corresp<strong>on</strong>dences are<br />
established between <str<strong>on</strong>g>the</str<strong>on</strong>g>m for enabling <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e source<br />
entity to instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e target entity. Several sub-steps can be identified in this
CHAPTER 4. THE SURVEY 34<br />
phase. We describe <str<strong>on</strong>g>the</str<strong>on</strong>g>se sub-steps in more detail after <str<strong>on</strong>g>the</str<strong>on</strong>g> enumerati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
mapping phases and <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> vertical modules.<br />
MAFRA includes a formal representati<strong>on</strong> to specify <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings. The formalism<br />
that is used to describe <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Bridges is based <strong>on</strong> a DAML+OIL <strong>on</strong>tology,<br />
called <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Bridging <strong>Ontology</strong> (SBO). The result is close to <str<strong>on</strong>g>the</str<strong>on</strong>g> noti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology in ONION[MWK00] (see also Secti<strong>on</strong> 4.2.2). A mapping is<br />
a set <str<strong>on</strong>g>of</str<strong>on</strong>g> instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic bridges described by this <strong>on</strong>tology. We describe<br />
this <strong>on</strong>tology in more detail later.<br />
• Executi<strong>on</strong>. When <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings/bridges are specified, <str<strong>on</strong>g>the</str<strong>on</strong>g> next step is to exploit<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>m in a meaningful way. As already menti<strong>on</strong>ed, MAFRA addresses <strong>on</strong>ly <str<strong>on</strong>g>the</str<strong>on</strong>g> task<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> instance transformati<strong>on</strong> [MMSV02]. This module actually transforms instances<br />
from <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tology representati<strong>on</strong> into <str<strong>on</strong>g>the</str<strong>on</strong>g> representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology<br />
by by evaluating <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> functi<strong>on</strong>s associated with <str<strong>on</strong>g>the</str<strong>on</strong>g> bridges<br />
defined in <str<strong>on</strong>g>the</str<strong>on</strong>g> previous stage. There are two possible operati<strong>on</strong>al modes: <str<strong>on</strong>g>of</str<strong>on</strong>g>fline<br />
(all <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong>s are executed <strong>on</strong>e time) and <strong>on</strong>line (<str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong>s are<br />
c<strong>on</strong>tinuously executed, and modificati<strong>on</strong>s in <str<strong>on</strong>g>the</str<strong>on</strong>g> source or target <strong>on</strong>tologies are immediately<br />
reflected).<br />
• Postprocessing Based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> executi<strong>on</strong> results, <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping specificati<strong>on</strong> is again<br />
analyzed, e.g. for discovering object identities, in order to improve <str<strong>on</strong>g>the</str<strong>on</strong>g> quality <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> instance transformati<strong>on</strong> task.<br />
The vertical dimensi<strong>on</strong> comprises <str<strong>on</strong>g>the</str<strong>on</strong>g> following modules:<br />
• Evoluti<strong>on</strong> Synchr<strong>on</strong>ize <str<strong>on</strong>g>the</str<strong>on</strong>g> changes in <str<strong>on</strong>g>the</str<strong>on</strong>g> source and target <strong>on</strong>tologies with <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Semantic Bridges defined by <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Bridge module.<br />
• Cooperative C<strong>on</strong>sensus Building From multiple alternative possible mappings <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
tool helps to set up a c<strong>on</strong>sensus between <str<strong>on</strong>g>the</str<strong>on</strong>g> various proposals <str<strong>on</strong>g>of</str<strong>on</strong>g> people involved<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping task.<br />
• Domain c<strong>on</strong>straints and Background Knowledge The tool allows users to include<br />
extra informati<strong>on</strong> (e.g. lexical <strong>on</strong>tologies like Wordnet can help in <str<strong>on</strong>g>the</str<strong>on</strong>g> identificati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> syn<strong>on</strong>yms) in order to improve <str<strong>on</strong>g>the</str<strong>on</strong>g> quality <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping.<br />
• GUI Visualizati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> elements <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source and target <strong>on</strong>tologies makes <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
mapping task a lot easier in <str<strong>on</strong>g>the</str<strong>on</strong>g> same way as do <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Bridges established<br />
to represent <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between entities.<br />
The Semantic Bridging <strong>Ontology</strong> (SBO) (see Figure 4.3) is a tax<strong>on</strong>omy <str<strong>on</strong>g>of</str<strong>on</strong>g> generic<br />
bridges that are going to be instantiated for defining c<strong>on</strong>crete bridges/mappings. We give<br />
an overview <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> dimensi<strong>on</strong>s al<strong>on</strong>g which a bridge can be described in MAFRA, followed<br />
by a shallow descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> classes from SBO that allows <strong>on</strong>e to express such<br />
bridges.
CHAPTER 4. THE SURVEY 35<br />
Figure 4.3: Semantic Bridging <strong>Ontology</strong> (SBO) in UML [MMSV02]
CHAPTER 4. THE SURVEY 36<br />
A bridge can be described al<strong>on</strong>g five dimensi<strong>on</strong>s:<br />
• entity dimensi<strong>on</strong>: pertains to <str<strong>on</strong>g>the</str<strong>on</strong>g> entities related by a bridge which may be c<strong>on</strong>cepts<br />
(modeling classes <str<strong>on</strong>g>of</str<strong>on</strong>g> objects in <str<strong>on</strong>g>the</str<strong>on</strong>g> real world), relati<strong>on</strong>s, and attributes.<br />
• cardinality dimensi<strong>on</strong>: pertains to <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology entities at both sides <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> semantic bridge(usually 1:n or m:1, m:n is seldom required and it can be usually<br />
decomposed into m:1:n)<br />
• structural dimensi<strong>on</strong>: pertains to <str<strong>on</strong>g>the</str<strong>on</strong>g> way elementary bridges may be combined into<br />
a more complex bridge (relati<strong>on</strong>s that may hold between bridges: specializati<strong>on</strong>,<br />
alternatives, compositi<strong>on</strong>, abstracti<strong>on</strong>)<br />
• transformati<strong>on</strong> dimensi<strong>on</strong>: describes how instances are transformed by means <str<strong>on</strong>g>of</str<strong>on</strong>g> an<br />
associated transformati<strong>on</strong> functi<strong>on</strong>.<br />
• c<strong>on</strong>straint dimensi<strong>on</strong>: allows <strong>on</strong>e to express c<strong>on</strong>diti<strong>on</strong>s up<strong>on</strong> whose fulfilment <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
bridge evaluati<strong>on</strong> depends. The transformati<strong>on</strong> rule associated with <str<strong>on</strong>g>the</str<strong>on</strong>g> bridge is<br />
not executed unless <str<strong>on</strong>g>the</str<strong>on</strong>g>se c<strong>on</strong>diti<strong>on</strong>s hold.<br />
The abstract class SemanticBridge describes a generic bridge, up<strong>on</strong> which <str<strong>on</strong>g>the</str<strong>on</strong>g>re<br />
are no restricti<strong>on</strong>s regarding <str<strong>on</strong>g>the</str<strong>on</strong>g> entity types that <str<strong>on</strong>g>the</str<strong>on</strong>g> bridge c<strong>on</strong>nects or <str<strong>on</strong>g>the</str<strong>on</strong>g> cardinality.<br />
For supporting <str<strong>on</strong>g>the</str<strong>on</strong>g> compositi<strong>on</strong> modelling primitive, this class has defined a relati<strong>on</strong><br />
hasBridge. The class SemanticBridgeAlt supports <str<strong>on</strong>g>the</str<strong>on</strong>g> alternative modelling<br />
primitive by grouping several mutual exclusive semantic bridges. SemanticBridge is<br />
fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r specialized in <str<strong>on</strong>g>the</str<strong>on</strong>g> SBO according to <str<strong>on</strong>g>the</str<strong>on</strong>g> entity type, <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology defining as subclasses<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> this class: Relati<strong>on</strong>Bridge, C<strong>on</strong>ceptBridge, and AtributeBridge.<br />
There is no specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> SemanticBridge class to a class which bridges between<br />
individuals.<br />
Rule is a class for describing generic rules. C<strong>on</strong>diti<strong>on</strong> and Transformati<strong>on</strong><br />
are its subclasses which are resp<strong>on</strong>sible for describing <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>diti<strong>on</strong> necessary for <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
executi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a bridge, and <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> functi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a bridge, respectively. The<br />
Service class maps <str<strong>on</strong>g>the</str<strong>on</strong>g> bridge parameters with <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> procedure arguments<br />
and call to procedures.<br />
The main goal in MAFRA is to transform instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tology into instances<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology. Semantic Bridges specify how to perform <str<strong>on</strong>g>the</str<strong>on</strong>g>se transformati<strong>on</strong>s<br />
and categorize <str<strong>on</strong>g>the</str<strong>on</strong>g>m between c<strong>on</strong>cept bridges and property bridges. C<strong>on</strong>cept<br />
bridges define <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong>s between source instances and target instances, whereas<br />
property bridges specify <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong>s between source properties and target properties.<br />
The Semantic Bridge phase defines in <str<strong>on</strong>g>the</str<strong>on</strong>g> following steps <str<strong>on</strong>g>the</str<strong>on</strong>g> necessary structures<br />
to describe <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between two <strong>on</strong>tologies:<br />
1. Based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> analysis <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities that were discovered in <str<strong>on</strong>g>the</str<strong>on</strong>g> Similarity phase,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> first step is to select <str<strong>on</strong>g>the</str<strong>on</strong>g> pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> entities, which could be c<strong>on</strong>cepts, relati<strong>on</strong>s
CHAPTER 4. THE SURVEY 37<br />
and attributes, to be bridged that corresp<strong>on</strong>d with c<strong>on</strong>cept bridges. MAFRA allows<br />
relati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> different cardinality between source and target entities. Thus, a source<br />
or target entity can bel<strong>on</strong>g to <strong>on</strong>e or more Semantic Bridges.<br />
2. The property bridging step specifies matching properties for each c<strong>on</strong>cept bridge.<br />
The authors <str<strong>on</strong>g>of</str<strong>on</strong>g> MAFRA distinguish two types <str<strong>on</strong>g>of</str<strong>on</strong>g> properties: attributes and relati<strong>on</strong>s.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> case that <str<strong>on</strong>g>the</str<strong>on</strong>g> type <str<strong>on</strong>g>of</str<strong>on</strong>g> source and target properties is different <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong><br />
specificati<strong>on</strong> informati<strong>on</strong> is required, and <str<strong>on</strong>g>the</str<strong>on</strong>g> domain expert is asked to supply<br />
this informati<strong>on</strong>. According to MAFRA’s definiti<strong>on</strong>, an attribute defines a relati<strong>on</strong><br />
between a c<strong>on</strong>cept and a data type value and a relati<strong>on</strong> defines a relati<strong>on</strong> between<br />
two c<strong>on</strong>cepts.<br />
3. This step (toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r with <str<strong>on</strong>g>the</str<strong>on</strong>g> next <strong>on</strong>e) is p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> a refinement process to improve <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
matching results, and focuses <strong>on</strong> looking for mapping alternatives where <str<strong>on</strong>g>the</str<strong>on</strong>g>re is<br />
no target entities. If it is not possible to find a target entity for a source entity, <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
algorithm analyzes <str<strong>on</strong>g>the</str<strong>on</strong>g> hierarchy <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tology and proposes an equivalent<br />
mapping <str<strong>on</strong>g>of</str<strong>on</strong>g> some <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> parents <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> unmapped source entity. So <str<strong>on</strong>g>the</str<strong>on</strong>g> source entity<br />
is mapped to <str<strong>on</strong>g>the</str<strong>on</strong>g> same target entities as some <str<strong>on</strong>g>of</str<strong>on</strong>g> its parents.<br />
4. As a p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> refinement process menti<strong>on</strong>ed previously, in this step <str<strong>on</strong>g>the</str<strong>on</strong>g> system<br />
tries to improve <str<strong>on</strong>g>the</str<strong>on</strong>g> quality <str<strong>on</strong>g>of</str<strong>on</strong>g> bridges between source c<strong>on</strong>cepts and target c<strong>on</strong>cepts.<br />
It can be viewed as a complementary routine to <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity phase.<br />
5. Associate transformati<strong>on</strong> procedures with <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping relati<strong>on</strong>s identified in previous<br />
phases. Although <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> main goals <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> authors <str<strong>on</strong>g>of</str<strong>on</strong>g> MAFRA is to provide<br />
an elevated level <str<strong>on</strong>g>of</str<strong>on</strong>g> automati<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping procedure, <str<strong>on</strong>g>the</str<strong>on</strong>g>y recognize that in<br />
this step <str<strong>on</strong>g>the</str<strong>on</strong>g> interventi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> an expert is highly recommended.<br />
To finish this brief descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> MAFRA, we present an example from [MMSV02]<br />
(see Figure 4.4). The goal <str<strong>on</strong>g>of</str<strong>on</strong>g> this exercise is to map two <strong>on</strong>tologies: <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tology<br />
(o1) describes <str<strong>on</strong>g>the</str<strong>on</strong>g> structure <str<strong>on</strong>g>of</str<strong>on</strong>g> a family and its events are categorized in family events<br />
(marriage and divorce) and individual events (birth date, death date); and <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology<br />
(o2) characterizes individuals as Man and Woman. A C<strong>on</strong>cept Bridge is defined to<br />
map o1 : Individual with o2 : Individual. All <str<strong>on</strong>g>the</str<strong>on</strong>g> attribute bridges are mapped using<br />
property/attribute bridges except for o1 : Individual - sex. This attribute is mapped using<br />
an alternative semantic bridge with two c<strong>on</strong>cept bridges that map o1 : Individual -<br />
sex with o2 : Man and o2 : W oman.<br />
<strong>Ontology</strong> Languages MAFRA needs <str<strong>on</strong>g>the</str<strong>on</strong>g> Lift & Normalizati<strong>on</strong> module to translate <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>tologies that p<str<strong>on</strong>g>art</str<strong>on</strong>g>icipate in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process into RDF (S). Precisely, <str<strong>on</strong>g>the</str<strong>on</strong>g> terminology<br />
specificati<strong>on</strong> is transformed to RDF Schema and <str<strong>on</strong>g>the</str<strong>on</strong>g> instances to RDF.
CHAPTER 4. THE SURVEY 38<br />
Figure 4.4: UML representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic bridge defined to map <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> example [MMSV02]<br />
Mapping Language Semantic Bridges (SBs) in c<strong>on</strong>juncti<strong>on</strong> with transformati<strong>on</strong>s modules<br />
services provide all <str<strong>on</strong>g>the</str<strong>on</strong>g> functi<strong>on</strong>ality that a mapping language requires. The semantics<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> this mapping formalism is unambiguously specified through <str<strong>on</strong>g>the</str<strong>on</strong>g> SBO (Semantic<br />
Bridging <strong>Ontology</strong>).<br />
Ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r important characteristic <str<strong>on</strong>g>of</str<strong>on</strong>g> MAFRA is that it supports several types <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping<br />
like Class mappings, Property (i.e. relati<strong>on</strong>) mappings and Instance mappings.<br />
Mapping Patterns MAFRA does not support <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping patterns in <strong>on</strong>tology<br />
mappings. However, <strong>on</strong>e could see a Semantic Bridge as an elementary mapping pattern<br />
and a specific combinati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a number <str<strong>on</strong>g>of</str<strong>on</strong>g> Semantic Bridges can be seen as a mapping<br />
pattern. Therefore, it should be possible to incorporate <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping patterns into<br />
MAFRA.<br />
Automati<strong>on</strong> Support One <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> main goals <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> designers <str<strong>on</strong>g>of</str<strong>on</strong>g> MAFRA is to get a<br />
high level <str<strong>on</strong>g>of</str<strong>on</strong>g> automati<strong>on</strong> support. Unfortunately, <str<strong>on</strong>g>the</str<strong>on</strong>g> papers that describe <str<strong>on</strong>g>the</str<strong>on</strong>g> tool do not<br />
indicate precisely which steps are automatic and which are not. Also MAFRA gives <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
user <str<strong>on</strong>g>the</str<strong>on</strong>g> opportunity to define Semantic Bridges manually. The modules that are directly<br />
involved in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process (horiz<strong>on</strong>tal dimensi<strong>on</strong>) present <str<strong>on</strong>g>the</str<strong>on</strong>g> following level <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
automati<strong>on</strong>:<br />
• Lift & Normalizati<strong>on</strong> is probably a module that can work independently from users<br />
to provide a uniform representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies that will be mapped.
CHAPTER 4. THE SURVEY 39<br />
• The calculati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities inside a multi-strategy process looks like it is fully<br />
automatic.<br />
• The generati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> Semantic Bridges is p<str<strong>on</strong>g>art</str<strong>on</strong>g>ially automated. The specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
mappings between properties (property bridging step) and <str<strong>on</strong>g>the</str<strong>on</strong>g> associati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> transformati<strong>on</strong><br />
procedures with mapping relati<strong>on</strong>s require <str<strong>on</strong>g>the</str<strong>on</strong>g> p<str<strong>on</strong>g>art</str<strong>on</strong>g>icipati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a domain<br />
expert.<br />
• The executi<strong>on</strong> engine, implemented in Java, is fully automated, and achieves <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
transformati<strong>on</strong>s defined in <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Bridges.<br />
• The postprocessing module is not fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r elaborated in <str<strong>on</strong>g>the</str<strong>on</strong>g> papers that described<br />
MAFRA, and <str<strong>on</strong>g>the</str<strong>on</strong>g> level <str<strong>on</strong>g>of</str<strong>on</strong>g> automati<strong>on</strong> is not specified.<br />
The module Domain and Background Knowledge provides mechanisms to include<br />
background knowledge and domain c<strong>on</strong>straints by using for example glossaries or lexical<br />
<strong>on</strong>tologies. This features can c<strong>on</strong>siderably improved <str<strong>on</strong>g>the</str<strong>on</strong>g> quality <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> results <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
similarity module and <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic bridge module.<br />
Applicability to Use Cases One <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> goals <str<strong>on</strong>g>of</str<strong>on</strong>g> MAFRA is to support instance transformati<strong>on</strong><br />
through transformati<strong>on</strong> procedures that are associated to Semantic Bridges. The<br />
postprocessing module tries to provide support for instance unificati<strong>on</strong> (recognizing that<br />
two instances represent <str<strong>on</strong>g>the</str<strong>on</strong>g> same real word object), but <str<strong>on</strong>g>the</str<strong>on</strong>g> authors recognized that it is<br />
a very challenging task and do not guarantee that it is fully implemented. On <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
hand, Semantic Bridges define explicitly mappings between entities <str<strong>on</strong>g>of</str<strong>on</strong>g> two <strong>on</strong>tologies,<br />
and MAFRA provides a semantic specificati<strong>on</strong> for <str<strong>on</strong>g>the</str<strong>on</strong>g>se mechanisms.<br />
Finally, in <str<strong>on</strong>g>the</str<strong>on</strong>g> papers [SaR03a] and [SaR03b], <str<strong>on</strong>g>the</str<strong>on</strong>g> authors outline a mechanism is<br />
close to <str<strong>on</strong>g>the</str<strong>on</strong>g> idea <str<strong>on</strong>g>of</str<strong>on</strong>g> query rewriting to retrieve all <str<strong>on</strong>g>the</str<strong>on</strong>g> instances <str<strong>on</strong>g>of</str<strong>on</strong>g> a query that are stored<br />
in several <strong>on</strong>tologies which have mapping specificati<strong>on</strong>s between each o<str<strong>on</strong>g>the</str<strong>on</strong>g>r.<br />
Implementati<strong>on</strong> As we menti<strong>on</strong>ed in <str<strong>on</strong>g>the</str<strong>on</strong>g> summary descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> this tool, <str<strong>on</strong>g>the</str<strong>on</strong>g> MAFRA<br />
toolkit was implemented as plug-in <str<strong>on</strong>g>of</str<strong>on</strong>g> KAON. Silva and colleagues c<strong>on</strong>tinuous with<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> development <str<strong>on</strong>g>of</str<strong>on</strong>g> this mapping system, and <str<strong>on</strong>g>the</str<strong>on</strong>g> latest versi<strong>on</strong>s can be founded at<br />
http://mafra-toolkit.sourceforge.net. Also some examples and documentati<strong>on</strong> are available<br />
<strong>on</strong> this site. MAFRA’s current approach is being used and tested under <str<strong>on</strong>g>the</str<strong>on</strong>g> Harm<strong>on</strong>ise<br />
project 2 . Harm<strong>on</strong>ise intends to overcome <str<strong>on</strong>g>the</str<strong>on</strong>g> interoperability problems occurring<br />
between tourism operators due to <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> different informati<strong>on</strong> representati<strong>on</strong> standards.<br />
The MAFRA Toolkit was adopted as <str<strong>on</strong>g>the</str<strong>on</strong>g> representati<strong>on</strong> and transformati<strong>on</strong> engine<br />
core technology for <str<strong>on</strong>g>the</str<strong>on</strong>g> Harm<strong>on</strong>ise project. Harm<strong>on</strong>ise uses an “Interoperability Minimum<br />
Harm<strong>on</strong>isati<strong>on</strong> <strong>Ontology</strong>” (IMHO) as lingua franca between agents. The MAFRA<br />
Toolkit is resp<strong>on</strong>sible for <str<strong>on</strong>g>the</str<strong>on</strong>g> acquisiti<strong>on</strong>, representati<strong>on</strong> and executi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
mapping between each agent specific <strong>on</strong>tology and IMHO [SaR03a].<br />
2 http://www.harm<strong>on</strong>ise.org
CHAPTER 4. THE SURVEY 40<br />
Experiences Silva and colleagues provided an informal evaluati<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g>ir papers<br />
([SaR03a, SaR03b]) <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> performance <str<strong>on</strong>g>of</str<strong>on</strong>g> MAFRA and <str<strong>on</strong>g>the</str<strong>on</strong>g>y compare <str<strong>on</strong>g>the</str<strong>on</strong>g>ir results with<br />
OntoMerge [DMQ02], a tool for mapping and merge.<br />
4.1.2 RDFT<br />
Summary Omelayenko and Fensel [OF01] present an approach to <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
product informati<strong>on</strong> over <str<strong>on</strong>g>the</str<strong>on</strong>g> web by exploiting <str<strong>on</strong>g>the</str<strong>on</strong>g> data model <str<strong>on</strong>g>of</str<strong>on</strong>g> RDF [LS99], which<br />
is based <strong>on</strong> directed labeled graphs. In <str<strong>on</strong>g>the</str<strong>on</strong>g>ir approach, Omelayenko and Fensel assume<br />
product catalogs from different organizati<strong>on</strong>s specified in XML documents. The problem<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>y sketch is different organizati<strong>on</strong>s using different representati<strong>on</strong>s for <str<strong>on</strong>g>the</str<strong>on</strong>g>ir product<br />
catalogs. They intend to mediate between <str<strong>on</strong>g>the</str<strong>on</strong>g>se different representati<strong>on</strong>s with <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
RDF triples 3 .<br />
The approach to <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> product catalogs is called two-layered because <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
product informati<strong>on</strong> itself is represented in XML, whereas <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> between<br />
different representati<strong>on</strong>s is d<strong>on</strong>e in RDF. The general idea is that an XML document,<br />
whose structure is described by a DTD (Document Type Definiti<strong>on</strong>) or XML Schema, is<br />
(1) abstracted to an RDF graph, which in turn is described by an <strong>on</strong>tology, which could<br />
be specified using <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF Schema [BG04] <strong>on</strong>tology language. The RDF document is<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>n (2) transformed into a target representati<strong>on</strong>, which is also described by an <strong>on</strong>tology.<br />
Then, <str<strong>on</strong>g>the</str<strong>on</strong>g> target RDF is (3) refined to <str<strong>on</strong>g>the</str<strong>on</strong>g> target XML representati<strong>on</strong>, which can be used<br />
by applicati<strong>on</strong>s at <str<strong>on</strong>g>the</str<strong>on</strong>g> target vendor. All three transformati<strong>on</strong> steps are performed with <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
XML transformati<strong>on</strong> language XSLT [Cla99]. The process <str<strong>on</strong>g>of</str<strong>on</strong>g> abstracti<strong>on</strong>, transformati<strong>on</strong><br />
and refinement is illustrated in Figure 4.5.<br />
[Ome02b] proposes a mapping meta-<strong>on</strong>tology for describing <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> between<br />
RDF documents. This mapping meta-<strong>on</strong>tology, called RDFT (RDF Transformati<strong>on</strong>)<br />
is specified using RDF Schema [BG04] and is used to describe <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between<br />
two RDFS <strong>on</strong>tologies. We describe this <strong>on</strong>tology and its use in more detail below.<br />
[Ome02a] describes a technique for discovering semantic corresp<strong>on</strong>dence between<br />
different product classificati<strong>on</strong> schemes based <strong>on</strong> a Naive-Bayes classifier. The mappings<br />
between <str<strong>on</strong>g>the</str<strong>on</strong>g> different product classificati<strong>on</strong>s are represented using <str<strong>on</strong>g>the</str<strong>on</strong>g> bridges from <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
RDFT meta-<strong>on</strong>tology.<br />
<strong>Ontology</strong> Languages Omelayenko [Ome02b] not <strong>on</strong>ly describes a way to map between<br />
different RDF Schema <strong>on</strong>tologies, but also describes <str<strong>on</strong>g>the</str<strong>on</strong>g> way to transform XML documents<br />
to RDF using RDFT, <str<strong>on</strong>g>the</str<strong>on</strong>g>reby effectively specifying <str<strong>on</strong>g>the</str<strong>on</strong>g> way to perform <str<strong>on</strong>g>the</str<strong>on</strong>g> abstracti<strong>on</strong><br />
step.<br />
3 An RDF triple c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> a subject, a predicate and an object. Subjects and objects form <str<strong>on</strong>g>the</str<strong>on</strong>g> nodes <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> graph, whereas predicates form <str<strong>on</strong>g>the</str<strong>on</strong>g> edges. An object in a triple can also occur as a subject or an object<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> a different triple.
CHAPTER 4. THE SURVEY 41<br />
Figure 4.5: Two-layered integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> XML catalogs using RDF<br />
RDFT can be used to express mappings between arbitrary <strong>on</strong>tologies specified in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
RDF Schema <strong>on</strong>tology language. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, it can be used to specify <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong><br />
between XML documents and <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF representati<strong>on</strong>.<br />
Mapping language We will now give a short overview <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> RDFT mapping meta<strong>on</strong>tology.<br />
The RDFT meta-<strong>on</strong>tology is used to describe three types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings denoted by<br />
classes in RDFT:<br />
• An EventMap is used to specify <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between different events. Events<br />
in this c<strong>on</strong>text corresp<strong>on</strong>d to activity occurrences, such as sending or receiving a<br />
message. These events can be used, for example, to c<strong>on</strong>nect descripti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> two<br />
web services, described using <str<strong>on</strong>g>the</str<strong>on</strong>g> Web Service Definiti<strong>on</strong> Language WSDL 4 .<br />
• A DocumentMap specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between an XML and an RDF representati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> a catalog.<br />
• A VocabularyMap specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> actual relati<strong>on</strong>ships between two <strong>on</strong>tologies.<br />
For our purposes, <str<strong>on</strong>g>the</str<strong>on</strong>g> most interesting type <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping is <str<strong>on</strong>g>the</str<strong>on</strong>g> vocabulary mapping<br />
(VocabularyMap).<br />
A mapping between two <strong>on</strong>tologies (vocabularies) is expressed using a number <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
bridges. Bridges in RDFT are subclasses <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> RDFBridge class. RDFT distinguishes<br />
two types <str<strong>on</strong>g>of</str<strong>on</strong>g> RDF bridges, namely Class2Class and Property2Property<br />
4 http://www.w3.org/TR/wsdl
CHAPTER 4. THE SURVEY 42<br />
bridges. Class2Class bridges are used to describe <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between two classes<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se classes. The instance transformati<strong>on</strong> is specified<br />
using XPath [CD99] expressi<strong>on</strong>s. Property2Property bridges are used to describe<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between two properties in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. Again, XPath can be used<br />
to specify instance transformati<strong>on</strong>s.<br />
The types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings in RDFT (class-to-class and property-to-property) are probably<br />
sufficient in <str<strong>on</strong>g>the</str<strong>on</strong>g> domain <str<strong>on</strong>g>of</str<strong>on</strong>g> e-Marketplaces, which was <str<strong>on</strong>g>the</str<strong>on</strong>g> original target applicati<strong>on</strong><br />
domain [OF01], because <strong>on</strong>tologies can be expected to have a similar level <str<strong>on</strong>g>of</str<strong>on</strong>g> granularity<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> goals <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> different <strong>on</strong>tologies are similar. However, if <strong>on</strong>tologies are more<br />
diverse, different types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings, e.g. classes-to-instances, classes-to-properties, etc.<br />
will be necessary.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> approach taken by Omelayenko (cf. [OF01, Ome02b]), <str<strong>on</strong>g>the</str<strong>on</strong>g> steps <str<strong>on</strong>g>of</str<strong>on</strong>g> abstracti<strong>on</strong>,<br />
transformati<strong>on</strong>, and refinement all use <str<strong>on</strong>g>the</str<strong>on</strong>g> XML Transformati<strong>on</strong> language XSLT [Cla99]<br />
for specifying <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong>s between XML and RDF documents, as well as transformati<strong>on</strong>s<br />
between different RDF representati<strong>on</strong>s. While certainly XSLT is expressive<br />
enough to express arbitrary transformati<strong>on</strong>s between XML documents, and can <str<strong>on</strong>g>the</str<strong>on</strong>g>refore<br />
also transform RDF documents represented in <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF/XML [Bec03] serializati<strong>on</strong> into<br />
a different representati<strong>on</strong>, it is not well-suited for <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> RDF transformati<strong>on</strong>s,<br />
because it does not take <str<strong>on</strong>g>the</str<strong>on</strong>g> data model <str<strong>on</strong>g>of</str<strong>on</strong>g> RDF, which is graph based into account,<br />
whereas <str<strong>on</strong>g>the</str<strong>on</strong>g> data model <str<strong>on</strong>g>of</str<strong>on</strong>g> XML is tree based. Therefore, <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF data model needs to<br />
be in a sense encoded in <str<strong>on</strong>g>the</str<strong>on</strong>g> tree based XML model in each single XSLT transformati<strong>on</strong>.<br />
Automati<strong>on</strong> Support [Ome02a] describes a way to discover similarity between classes<br />
based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> instance informati<strong>on</strong> for this class, using a machine-learning approach. In <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
use case, <str<strong>on</strong>g>the</str<strong>on</strong>g> class was a product classifier and <str<strong>on</strong>g>the</str<strong>on</strong>g> instance data c<strong>on</strong>sisted <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> product<br />
descripti<strong>on</strong>s.<br />
The RDFT meta-<strong>on</strong>tology was presented as <str<strong>on</strong>g>the</str<strong>on</strong>g> preferred way to specify mappings between<br />
<strong>on</strong>tologies, based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> similarities discovered by a matching tool, but no explicit<br />
support is provided for this.<br />
Applicability to Use Cases RDFT tackles <str<strong>on</strong>g>the</str<strong>on</strong>g> use case <str<strong>on</strong>g>of</str<strong>on</strong>g> instance transformati<strong>on</strong><br />
through <str<strong>on</strong>g>the</str<strong>on</strong>g> XPath specificati<strong>on</strong>s attached to <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF Bridges. RDFT does not <str<strong>on</strong>g>of</str<strong>on</strong>g>fer a<br />
soluti<strong>on</strong> for instance unificati<strong>on</strong>, nor for query rewriting, although <str<strong>on</strong>g>the</str<strong>on</strong>g> declarative mapping<br />
between classes and properties could be used for this purpose. The scope <str<strong>on</strong>g>of</str<strong>on</strong>g> RDFT<br />
is limited to <str<strong>on</strong>g>the</str<strong>on</strong>g> transformati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> XML documents between different representati<strong>on</strong>s.<br />
The use case <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology merging is not addressed, although a specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ships<br />
between <strong>on</strong>tologies in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> RDFT meta-<strong>on</strong>tology could help in merging<br />
different <strong>on</strong>tologies, because it specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between classes.
CHAPTER 4. THE SURVEY 43<br />
Implementati<strong>on</strong> A prototype tool was created to create mappings based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> RDFT<br />
meta-<strong>on</strong>tology.<br />
Experiences RDFT as well as <str<strong>on</strong>g>the</str<strong>on</strong>g> classificati<strong>on</strong> method proposed in [Ome02a] have<br />
been used for <str<strong>on</strong>g>the</str<strong>on</strong>g> discovery and specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings between product classificati<strong>on</strong><br />
schemes in <str<strong>on</strong>g>the</str<strong>on</strong>g> course <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> GoldenBullet [DKO + 02] project.<br />
4.1.3 PROMPT<br />
Summary The PROMPT suite c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> a set <str<strong>on</strong>g>of</str<strong>on</strong>g> tools that have had an important impact<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> area <str<strong>on</strong>g>of</str<strong>on</strong>g> merging, alignment and versi<strong>on</strong>ing <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies. A relevant result<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> this development is <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a global strategy that looks to take advantage <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> synergies that have been generated by <str<strong>on</strong>g>the</str<strong>on</strong>g> combinati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> tools that in <str<strong>on</strong>g>the</str<strong>on</strong>g> past where<br />
c<strong>on</strong>sidered independent. The PROMPT suite [NM03b] includes an <strong>on</strong>tology merging<br />
tool (iPROMPT, formerly known as PROMPT [NM00b]), an <strong>on</strong>tology tool for finding<br />
additi<strong>on</strong>al points <str<strong>on</strong>g>of</str<strong>on</strong>g> similarity between <strong>on</strong>tologies for o<str<strong>on</strong>g>the</str<strong>on</strong>g>r tools like iPROMPT (AnchorPROMPT,<br />
[NM00a]), an <strong>on</strong>tology versi<strong>on</strong>ing tool (PROMPTDiff, [NM03a]), and a<br />
tool for factoring out semantically complete sub<strong>on</strong>tologies (PROMPTFactor, [NM03b]).<br />
The work <str<strong>on</strong>g>of</str<strong>on</strong>g> Natasha Noy and colleagues proves that different tasks in multiple <strong>on</strong>tology<br />
management 5 ,like looking for differences between versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology or looking<br />
for similarities between two <strong>on</strong>tologies in a merging process, are closely interrelated and<br />
share several comp<strong>on</strong>ents and heuristics (see Figure 4.6). Thus tools for supporting some<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tasks in <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>text <str<strong>on</strong>g>of</str<strong>on</strong>g> multiple <strong>on</strong>tology management can benefit greatly from <str<strong>on</strong>g>the</str<strong>on</strong>g>ir<br />
integrati<strong>on</strong> with o<str<strong>on</strong>g>the</str<strong>on</strong>g>rs [NM03a].<br />
The key comp<strong>on</strong>ents <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> PROMPT suite have been developed as extensi<strong>on</strong>s (plugins)<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Protégé 2000 <strong>on</strong>tology development envir<strong>on</strong>ment 6 . We can distinguish <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
following comp<strong>on</strong>ents:<br />
• iPROMPT is an interactive <strong>on</strong>tology merging tool, which helps users in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
merging task by providing suggesti<strong>on</strong>s about which elements can be merged, by<br />
identifying inc<strong>on</strong>sistencies and potential problems and suggesting possible strategies<br />
to resolve <str<strong>on</strong>g>the</str<strong>on</strong>g>se problems and inc<strong>on</strong>sistencies.<br />
• AnchorPROMPT extends <str<strong>on</strong>g>the</str<strong>on</strong>g> performances <str<strong>on</strong>g>of</str<strong>on</strong>g> tools like iPROMPT determining<br />
additi<strong>on</strong>al points <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities between <strong>on</strong>tologies that are not identified by<br />
iPROMPT.<br />
5 Noy and colleagues [NM03b] define multiple <strong>on</strong>tology management as a set <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>crete tasks for<br />
dealing with multiple <strong>on</strong>tologies such as maintaining libraries <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies, import and reuse <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies,<br />
translating <strong>on</strong>tologies to o<str<strong>on</strong>g>the</str<strong>on</strong>g>r formalisms, <strong>on</strong>tology versi<strong>on</strong>ing support, <strong>on</strong>tology merging-mappingalignment<br />
support, inference across multiple <strong>on</strong>tologies and query across multiple <strong>on</strong>tologies<br />
6 http://protege.semanticweb.org/
CHAPTER 4. THE SURVEY 44<br />
Figure 4.6: The PROMPT suite infrastructure and interacti<strong>on</strong>s between tools [NM03b]
CHAPTER 4. THE SURVEY 45<br />
• PROMPTDiff compares two versi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology and identifies structural differences<br />
between different versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> same <strong>on</strong>tology.<br />
• PROMPTFactor is a tool that enables users to create a new <strong>on</strong>tology factoring out<br />
p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> an existing <strong>on</strong>tology. In this process, <str<strong>on</strong>g>the</str<strong>on</strong>g> tool guarantees that <str<strong>on</strong>g>the</str<strong>on</strong>g> terms<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> resulting sub<strong>on</strong>tology are well-defined (for instance, every c<strong>on</strong>cept <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
sub<strong>on</strong>tology includes as appropriate <str<strong>on</strong>g>the</str<strong>on</strong>g> superc<strong>on</strong>cepts/subc<strong>on</strong>cepts required for its<br />
specificati<strong>on</strong>).<br />
One <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> major c<strong>on</strong>tributi<strong>on</strong>s to <str<strong>on</strong>g>the</str<strong>on</strong>g> development <str<strong>on</strong>g>of</str<strong>on</strong>g> PROMPT suite was <str<strong>on</strong>g>the</str<strong>on</strong>g> identificati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> an important overlap in <str<strong>on</strong>g>the</str<strong>on</strong>g> functi<strong>on</strong>ality <str<strong>on</strong>g>of</str<strong>on</strong>g> its tools and <str<strong>on</strong>g>the</str<strong>on</strong>g> implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
an integrated approach where all <str<strong>on</strong>g>the</str<strong>on</strong>g>se tools benefit from each o<str<strong>on</strong>g>the</str<strong>on</strong>g>r. For instance, some <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> comp<strong>on</strong>ents that were originally created for <str<strong>on</strong>g>the</str<strong>on</strong>g> interface <str<strong>on</strong>g>of</str<strong>on</strong>g> iPROMPT were reused in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> interfaces <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r tools <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> suite. In additi<strong>on</strong>, <str<strong>on</strong>g>the</str<strong>on</strong>g> initial<br />
sets <str<strong>on</strong>g>of</str<strong>on</strong>g> related terms between two <strong>on</strong>tologies that AnchorPROMPT requires as a st<str<strong>on</strong>g>art</str<strong>on</strong>g>ing<br />
point for a deeper analysis <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities can be provided by iPROMPT. In return, AnchorPROMPT<br />
can supply an additi<strong>on</strong>al set <str<strong>on</strong>g>of</str<strong>on</strong>g> related terms that can be used by iPROMPT<br />
to improve <str<strong>on</strong>g>the</str<strong>on</strong>g> results <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> merging process. A final example <str<strong>on</strong>g>of</str<strong>on</strong>g> this integrated approach<br />
can be found in <str<strong>on</strong>g>the</str<strong>on</strong>g> design <str<strong>on</strong>g>of</str<strong>on</strong>g> PROMPTDiff and iPROMPT. PROMPTDiff uses some<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> heuristics that were initially developed in iPROMPT for comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cept<br />
names, slots attached to c<strong>on</strong>cepts, domains and range <str<strong>on</strong>g>of</str<strong>on</strong>g> slots and so <strong>on</strong>.<br />
As menti<strong>on</strong>ed above, iPROMPT [NM00b] is an interactive tool implemented as an<br />
extensi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> Protégé 2000. iPROMPT guides users in <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> merging two <strong>on</strong>tologies<br />
(see an example <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> user interface in Figure 4.7). The tool was originally<br />
developed to handle <strong>on</strong>tologies specified in OKBC [CFF + 98], but <str<strong>on</strong>g>the</str<strong>on</strong>g>re are significant<br />
efforts to adapt <str<strong>on</strong>g>the</str<strong>on</strong>g> tool in order to support <str<strong>on</strong>g>the</str<strong>on</strong>g> OWL <strong>on</strong>tology language [DS04]. The<br />
central element <str<strong>on</strong>g>of</str<strong>on</strong>g> iPROMPT is <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm that defines a set <str<strong>on</strong>g>of</str<strong>on</strong>g> steps for <str<strong>on</strong>g>the</str<strong>on</strong>g> interactive<br />
merging process, see also Figure 4.8. The first step is to identify potential merge candidates<br />
based <strong>on</strong> class-name similarities. The result is presented to <str<strong>on</strong>g>the</str<strong>on</strong>g> user as a list <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
potential merge operati<strong>on</strong>s. The sec<strong>on</strong>d step is initiated by <str<strong>on</strong>g>the</str<strong>on</strong>g> user who chooses <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> suggested operati<strong>on</strong>s from <str<strong>on</strong>g>the</str<strong>on</strong>g> list or specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> operati<strong>on</strong> directly. The system performs<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> requested acti<strong>on</strong> and automatically executes additi<strong>on</strong>al changes derived from<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> acti<strong>on</strong>. It <str<strong>on</strong>g>the</str<strong>on</strong>g>n makes a new list <str<strong>on</strong>g>of</str<strong>on</strong>g> suggested acti<strong>on</strong>s for <str<strong>on</strong>g>the</str<strong>on</strong>g> user based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> new<br />
structure <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology, determines c<strong>on</strong>flicts introduced by <str<strong>on</strong>g>the</str<strong>on</strong>g> last acti<strong>on</strong>, finds possible<br />
soluti<strong>on</strong>s to <str<strong>on</strong>g>the</str<strong>on</strong>g>se c<strong>on</strong>flicts and displays <str<strong>on</strong>g>the</str<strong>on</strong>g>se to <str<strong>on</strong>g>the</str<strong>on</strong>g> user.<br />
Initially, PROMPT identified a set <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology merging operati<strong>on</strong>s (merge classes,<br />
merge slots, merge bindings between a slot and a class, etc) and a set <str<strong>on</strong>g>of</str<strong>on</strong>g> possible c<strong>on</strong>flicts<br />
for <str<strong>on</strong>g>the</str<strong>on</strong>g>se operati<strong>on</strong>s (name c<strong>on</strong>flicts, dangling references, redundancy in <str<strong>on</strong>g>the</str<strong>on</strong>g> class hierarchy<br />
and slot-value restricti<strong>on</strong>s that violate class inheritance). These lists <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology<br />
merging operati<strong>on</strong>s and possible c<strong>on</strong>flict operati<strong>on</strong>s have been extended by <str<strong>on</strong>g>the</str<strong>on</strong>g> authors <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> tool as a p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> an evoluti<strong>on</strong> process in <str<strong>on</strong>g>the</str<strong>on</strong>g> design <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> system.<br />
The goal <str<strong>on</strong>g>of</str<strong>on</strong>g> AnchorPROMPT [NM00a] is to augment <str<strong>on</strong>g>the</str<strong>on</strong>g> results <str<strong>on</strong>g>of</str<strong>on</strong>g> methods that
CHAPTER 4. THE SURVEY 46<br />
Figure 4.7: An example <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology merging in iPROMPT<br />
Figure 4.8: The flow <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> iPROMPT algorithm [NM00b]
CHAPTER 4. THE SURVEY 47<br />
analyze <strong>on</strong>ly local c<strong>on</strong>text in <strong>on</strong>tology structures, such as Chimaera [MFRW00] and<br />
iPROMPT [NM00b], by finding additi<strong>on</strong>al possible points <str<strong>on</strong>g>of</str<strong>on</strong>g> similarity between <strong>on</strong>tologies.<br />
To do this AnchorPROMPT requires that <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r tool or <str<strong>on</strong>g>the</str<strong>on</strong>g> user provides an initial<br />
set <str<strong>on</strong>g>of</str<strong>on</strong>g> related terms. Following a graph perspective, <str<strong>on</strong>g>the</str<strong>on</strong>g> tool establishes a set <str<strong>on</strong>g>of</str<strong>on</strong>g> paths that<br />
c<strong>on</strong>nects <str<strong>on</strong>g>the</str<strong>on</strong>g> terms <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology that are related with <str<strong>on</strong>g>the</str<strong>on</strong>g> terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>e. The<br />
algorithm takes two pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> related terms as input and analyzes <str<strong>on</strong>g>the</str<strong>on</strong>g> elements that are included<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> path that c<strong>on</strong>nect <str<strong>on</strong>g>the</str<strong>on</strong>g> elements <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> same <strong>on</strong>tology with <str<strong>on</strong>g>the</str<strong>on</strong>g> elements <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> equivalence path <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology. So, we have two paths (<strong>on</strong>e for each <strong>on</strong>tology)<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> terms that compound <str<strong>on</strong>g>the</str<strong>on</strong>g>se paths. The analysis looks for terms al<strong>on</strong>g <str<strong>on</strong>g>the</str<strong>on</strong>g> paths<br />
that might be similar to <str<strong>on</strong>g>the</str<strong>on</strong>g> terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r path, which bel<strong>on</strong>gs to <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology,<br />
assuming that <str<strong>on</strong>g>the</str<strong>on</strong>g> elements <str<strong>on</strong>g>of</str<strong>on</strong>g> those paths are <str<strong>on</strong>g>of</str<strong>on</strong>g>ten similar as well. These new potentially<br />
related terms <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm discovers are marked with a similarity score that can<br />
be modified during <str<strong>on</strong>g>the</str<strong>on</strong>g> evaluati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r paths in which <str<strong>on</strong>g>the</str<strong>on</strong>g>se terms are also involved.<br />
Terms with high similar scores will be presented to <str<strong>on</strong>g>the</str<strong>on</strong>g> user to improve <str<strong>on</strong>g>the</str<strong>on</strong>g> set <str<strong>on</strong>g>of</str<strong>on</strong>g> possible<br />
suggesti<strong>on</strong>s in, for example, a merging process in iPROMPT.<br />
If <str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies that we compare present important differences in <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
levels <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir hierarchy or in <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>s between classes, <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm does<br />
not work well.<br />
The third element <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> suite is PROMPTDiff [NM03a], which is used to compare<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> structure <str<strong>on</strong>g>of</str<strong>on</strong>g> two versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular <strong>on</strong>tology and which identifies <str<strong>on</strong>g>the</str<strong>on</strong>g> frames<br />
(i.e. classes, slots or instances) that have no changes, frames with <strong>on</strong>ly changes in <str<strong>on</strong>g>the</str<strong>on</strong>g>ir<br />
properties, and frames that have also changed in o<str<strong>on</strong>g>the</str<strong>on</strong>g>r p<str<strong>on</strong>g>art</str<strong>on</strong>g>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir definiti<strong>on</strong>s. The name<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tool, PROMPTDiff, is influenced by tools like CVS, which is a versi<strong>on</strong> c<strong>on</strong>trol<br />
system that is used to maintain <str<strong>on</strong>g>the</str<strong>on</strong>g> history <str<strong>on</strong>g>of</str<strong>on</strong>g> program source code files. This tool includes<br />
facilities to discover changes between versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> a document (finding a diff).<br />
The last element <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> PROMPT suite we will describe here is <str<strong>on</strong>g>the</str<strong>on</strong>g> tool PROMPT-<br />
Factor [NM03b] which allows users to extract from a larger <strong>on</strong>tology <str<strong>on</strong>g>the</str<strong>on</strong>g> elements that<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> user is interested in, in a way that also copies all <str<strong>on</strong>g>the</str<strong>on</strong>g> terms required for preserving<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> descripti<strong>on</strong>s. The authors <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tool call this process “factoring<br />
sub<strong>on</strong>tologies”.<br />
During <str<strong>on</strong>g>the</str<strong>on</strong>g> analysis <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> PROMPT suite, we c<strong>on</strong>cluded that <str<strong>on</strong>g>the</str<strong>on</strong>g> tool has some limitati<strong>on</strong>s<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> area <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology versi<strong>on</strong>ing and evoluti<strong>on</strong>. We present a summary <str<strong>on</strong>g>of</str<strong>on</strong>g> some<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> most relevant c<strong>on</strong>clusi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> our study (some <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>m where c<strong>on</strong>firmed by Natasha<br />
Noy):<br />
• PROMPTDiff <strong>on</strong>ly detects differences between two versi<strong>on</strong>s using a structural diff.<br />
In [Kle04], we can find several complementary alternatives (change logs, c<strong>on</strong>ceptual<br />
relati<strong>on</strong>s and transformati<strong>on</strong> set) that can give us a richer descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
changes that <str<strong>on</strong>g>the</str<strong>on</strong>g> original <strong>on</strong>tology has underg<strong>on</strong>e.
CHAPTER 4. THE SURVEY 48<br />
• The descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> differences between two versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology that<br />
PROMPTDiff <str<strong>on</strong>g>of</str<strong>on</strong>g>fers is limited. For this reas<strong>on</strong>, Klein extended PROMPTDiff to<br />
support richer semantic descripti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> changes. He introduced a more complex<br />
classificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> type <str<strong>on</strong>g>of</str<strong>on</strong>g> changes (implicitly-changed, directly-changed, changed,<br />
isomorphic and unchanged, see [Kle04]) and provides a high level descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> changes based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> idea <str<strong>on</strong>g>of</str<strong>on</strong>g> minimal transformati<strong>on</strong> set and <strong>on</strong> an <strong>on</strong>tology <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
changes (again see, [Kle04]).<br />
PROMPTDiff can find difference between <strong>on</strong>tologies but it does not mean that <str<strong>on</strong>g>the</str<strong>on</strong>g>re<br />
is explicit support for versi<strong>on</strong>ing. PROMPTDiff does not allow <str<strong>on</strong>g>the</str<strong>on</strong>g> user to identify<br />
versi<strong>on</strong>s or to indicate that <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a versi<strong>on</strong>ing relati<strong>on</strong>ship between <strong>on</strong>tologies.<br />
Therefore, <str<strong>on</strong>g>the</str<strong>on</strong>g> user has to find a way to manage different versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology<br />
and to identify that a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular <strong>on</strong>tology is a versi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology.<br />
<strong>Ontology</strong> Languages The knowledge model underlying PROMPT is <str<strong>on</strong>g>the</str<strong>on</strong>g> Open Knowledge<br />
Base C<strong>on</strong>nectivity (OKBC) protocol [CFF + 98]. OKBC is frame-based: frames are<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> main elements in this knowledge model for building <strong>on</strong>tologies, and three types <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
frames can be distinguished: classes, slots and instances. A class is a set <str<strong>on</strong>g>of</str<strong>on</strong>g> entities and<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> elements <str<strong>on</strong>g>of</str<strong>on</strong>g> such a set are called instances, which may be classes or slots as well as individuals.<br />
Slots define binary relati<strong>on</strong>s between pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> entities, whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g>y be classes,<br />
instances, or primitive objects (such as a string or a number).<br />
There has been a c<strong>on</strong>siderable effort to provide PROMPT with RDF and OWL<br />
support[NM03b]. In <str<strong>on</strong>g>the</str<strong>on</strong>g> tutorial about <str<strong>on</strong>g>the</str<strong>on</strong>g> Protégé OWL plug-in given at ISWC03 7 it was<br />
menti<strong>on</strong>ed that PROMPT can be used in c<strong>on</strong>juncti<strong>on</strong> with this plug-in 8 [KFNM]. Thus <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
OWL support for PROMPT is achieved through this plug-in.<br />
Mapping Language iPROMPT and AnchorPROMPT do not include a language that<br />
specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping. We understand that <str<strong>on</strong>g>the</str<strong>on</strong>g>re should exist an internal representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> mapping results because <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a str<strong>on</strong>g interacti<strong>on</strong> between <str<strong>on</strong>g>the</str<strong>on</strong>g> tools <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> suite, and<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>y need to share <str<strong>on</strong>g>the</str<strong>on</strong>g>se results, but <str<strong>on</strong>g>the</str<strong>on</strong>g> related bibliography does not describe this possible<br />
formalism. Michel Klein (see [Kle04]) implemented an extensi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> PROMPTDiff that<br />
provides a language for change specificati<strong>on</strong> that characterizes differences between two<br />
<strong>on</strong>tologies. This language was originally defined using OKBC and <str<strong>on</strong>g>the</str<strong>on</strong>g>n translated and<br />
extended in OWL.<br />
iPROMPT in combinati<strong>on</strong> with AnchorPROMPT can map classes, properties and instances<br />
using linguistic and structural similarity techniques.<br />
Mapping Patterns Currently <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no support for mapping patterns in <str<strong>on</strong>g>the</str<strong>on</strong>g> PROMPT<br />
suite.<br />
7 http://iswc2003.semanticweb.org/pdf/Protege-OWL-Tutorial-ISWC03.pdf<br />
8 http://protege.stanford.edu/plugins/owl/
CHAPTER 4. THE SURVEY 49<br />
Automati<strong>on</strong> Support iPROMPT is an interactive merging tool that guides users in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
process <str<strong>on</strong>g>of</str<strong>on</strong>g> merging two <strong>on</strong>tologies. iPROMPT proposes to <str<strong>on</strong>g>the</str<strong>on</strong>g> user a set <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology<br />
merging operati<strong>on</strong>s and a set <str<strong>on</strong>g>of</str<strong>on</strong>g> possible c<strong>on</strong>flicts for <str<strong>on</strong>g>the</str<strong>on</strong>g>se operati<strong>on</strong>s. Then, <str<strong>on</strong>g>the</str<strong>on</strong>g> user has<br />
two choices: select <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> suggesti<strong>on</strong>s generated by <str<strong>on</strong>g>the</str<strong>on</strong>g> tool, or specify <str<strong>on</strong>g>the</str<strong>on</strong>g> desired operati<strong>on</strong><br />
directly. After that, iPROMPT performs <str<strong>on</strong>g>the</str<strong>on</strong>g> operati<strong>on</strong> and automatically executes<br />
additi<strong>on</strong>al changes that <str<strong>on</strong>g>the</str<strong>on</strong>g> operati<strong>on</strong> requires. Finally <str<strong>on</strong>g>the</str<strong>on</strong>g> previous list <str<strong>on</strong>g>of</str<strong>on</strong>g> suggesti<strong>on</strong>s is<br />
modified as a result <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> changes that <str<strong>on</strong>g>the</str<strong>on</strong>g> executed operati<strong>on</strong> produced. This cycle is<br />
repeated until <str<strong>on</strong>g>the</str<strong>on</strong>g> merging process finishes, or <str<strong>on</strong>g>the</str<strong>on</strong>g> user decides to abort it.<br />
Applicability to Use Cases The PROMPT suite is a set <str<strong>on</strong>g>of</str<strong>on</strong>g> tools that provides several soluti<strong>on</strong>s<br />
for <strong>on</strong>tology mediati<strong>on</strong>, versi<strong>on</strong>ing and factoring. iPROMPT covers <str<strong>on</strong>g>the</str<strong>on</strong>g> complete<br />
merging process, and can also generate a list <str<strong>on</strong>g>of</str<strong>on</strong>g> initial similarities that AnchorPROMPT<br />
improves in generating a new list <str<strong>on</strong>g>of</str<strong>on</strong>g> related terms <strong>on</strong> which <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping could be based.<br />
Implementati<strong>on</strong> All <str<strong>on</strong>g>the</str<strong>on</strong>g> tools <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> PROMPT suite are plug-ins or extensi<strong>on</strong>s to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Protégé-2000 <strong>on</strong>tology development envir<strong>on</strong>ment. Protégé-2000 provides an intuitive<br />
graphical user interface for <strong>on</strong>tology development, a rich knowledge model based <strong>on</strong><br />
OKBC, and an extensible architecture that provides API access both to <str<strong>on</strong>g>the</str<strong>on</strong>g> Protégé-2000<br />
knowledge bases and to its user interface comp<strong>on</strong>ents [NM03b]. One <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> disadvantages<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> having Protégé-2000 as a base for PROMPT stems from <str<strong>on</strong>g>the</str<strong>on</strong>g> difficulties <str<strong>on</strong>g>of</str<strong>on</strong>g> supporting<br />
OWL <strong>on</strong> top <str<strong>on</strong>g>of</str<strong>on</strong>g> Protégé-2000. These are menti<strong>on</strong>ed in [KFNM] and are due to <str<strong>on</strong>g>the</str<strong>on</strong>g> differences<br />
between OKBC, Protégé-2000’s internal model, and OWL.<br />
The PROMPT suite is clearly user oriented where <str<strong>on</strong>g>the</str<strong>on</strong>g> main goal is to support <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
user in creating <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings (and merging <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies).The suite <str<strong>on</strong>g>of</str<strong>on</strong>g> tools provides a<br />
comm<strong>on</strong> user interface that follows <str<strong>on</strong>g>the</str<strong>on</strong>g> schema <str<strong>on</strong>g>of</str<strong>on</strong>g> Protégé-2000 GUI, and comp<strong>on</strong>ents<br />
that implement <str<strong>on</strong>g>the</str<strong>on</strong>g> Match operator to find similarities between <strong>on</strong>tologies.<br />
The PROMPT suite was developed and improved in <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>text <str<strong>on</strong>g>of</str<strong>on</strong>g> several projects<br />
during <str<strong>on</strong>g>the</str<strong>on</strong>g> last 5 years, with <str<strong>on</strong>g>the</str<strong>on</strong>g> collaborati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> many users who c<strong>on</strong>tinuously evaluate<br />
and exploit <str<strong>on</strong>g>the</str<strong>on</strong>g> tools providing valuable feedback for <str<strong>on</strong>g>the</str<strong>on</strong>g> developers.<br />
Experiences The papers that describe iPROMPT [NM00b], AnchorPROMPT [NM00a]<br />
and PROMPTDiff [NM03a] include evaluati<strong>on</strong> tests to show <str<strong>on</strong>g>the</str<strong>on</strong>g> accuracy <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se tools.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> iPROMPT [NM00b], <str<strong>on</strong>g>the</str<strong>on</strong>g> authors tested <str<strong>on</strong>g>the</str<strong>on</strong>g> tool using two <strong>on</strong>tologies<br />
with 134 class and slot frames in total. The first <strong>on</strong>tology was developed for <str<strong>on</strong>g>the</str<strong>on</strong>g> unified<br />
problem solving method development language (UPML) [FMvH + 03] and <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d <strong>on</strong>tology<br />
for <str<strong>on</strong>g>the</str<strong>on</strong>g> method descripti<strong>on</strong> language (MDL) [GGM98]. The evaluati<strong>on</strong> showed<br />
that human experts followed 90% <str<strong>on</strong>g>of</str<strong>on</strong>g> iPROMPT’s suggesti<strong>on</strong>s and 75% <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>flict<br />
resoluti<strong>on</strong> strategies. The users performed 74% <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> operati<strong>on</strong>s suggested by iPROMPT<br />
during <str<strong>on</strong>g>the</str<strong>on</strong>g> merging process.
CHAPTER 4. THE SURVEY 50<br />
AnchorPROMPT was also tested in [NM00a]. The results show that <str<strong>on</strong>g>the</str<strong>on</strong>g> accuracy <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
AnchorPROMPT is directly proporti<strong>on</strong>al to <str<strong>on</strong>g>the</str<strong>on</strong>g> length <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> paths c<strong>on</strong>sidered. For example<br />
with path length 2 <str<strong>on</strong>g>the</str<strong>on</strong>g> accuracy is 100% and with path length 4 <str<strong>on</strong>g>the</str<strong>on</strong>g> accuracy decreases<br />
to 67%. Noy and colleagues also tested AnchorPROMPT with <str<strong>on</strong>g>the</str<strong>on</strong>g> same <strong>on</strong>tologies as<br />
iPROMPT. They discovered an important limitati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tool: <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm does not<br />
provide good results when <str<strong>on</strong>g>the</str<strong>on</strong>g> structures <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies differ c<strong>on</strong>siderably. The UPML<br />
<strong>on</strong>tology has a large number <str<strong>on</strong>g>of</str<strong>on</strong>g> classes distributed <strong>on</strong> many different levels. On <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
hand, <str<strong>on</strong>g>the</str<strong>on</strong>g> MDL <strong>on</strong>tology has a simpler structure with fewer classes and with <strong>on</strong>ly two<br />
levels in <str<strong>on</strong>g>the</str<strong>on</strong>g> hierarchy.<br />
Finally, <str<strong>on</strong>g>the</str<strong>on</strong>g> accuracy <str<strong>on</strong>g>of</str<strong>on</strong>g> PROMPTDiff [NM03a] was evaluated using several versi<strong>on</strong>s<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> two <strong>on</strong>tologies <str<strong>on</strong>g>of</str<strong>on</strong>g> two different projects: EON project and PharmGKB project. The<br />
tool identified 96% <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> possible matches (recall) and 93% <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> identified matches<br />
were correct (precisi<strong>on</strong>).<br />
4.1.4 GLUE<br />
Summary GLUE [DMDH04] is a system which employs machine learning technologies<br />
to semi-automatically create mappings between heterogeneous <strong>on</strong>tologies, where an<br />
<strong>on</strong>tology is seen as a tax<strong>on</strong>omy <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cepts. With GLUE, <str<strong>on</strong>g>the</str<strong>on</strong>g> authors port <str<strong>on</strong>g>the</str<strong>on</strong>g>ir previous<br />
work <strong>on</strong> matching database schemas (called LSD) [DMDH02] to <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web<br />
domain. GLUE focuses <strong>on</strong> finding 1-to-1 mappings between c<strong>on</strong>cepts in tax<strong>on</strong>omies, although<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> authors say that extending matching to relati<strong>on</strong>s and attributes and involving<br />
more complex mappings (such as 1-to-n and n-to-1 mappings) is <str<strong>on</strong>g>the</str<strong>on</strong>g> subject <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>going<br />
research.<br />
The similarity <str<strong>on</strong>g>of</str<strong>on</strong>g> two c<strong>on</strong>cepts A and B in <str<strong>on</strong>g>the</str<strong>on</strong>g> two tax<strong>on</strong>omies O1 and O2 is based <strong>on</strong><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> sets <str<strong>on</strong>g>of</str<strong>on</strong>g> instances that overlap between <str<strong>on</strong>g>the</str<strong>on</strong>g> two c<strong>on</strong>cepts. In order to determine whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
an instance <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cept B is also an instance <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cept A, first a classifier is built using<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> instances <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cept A as <str<strong>on</strong>g>the</str<strong>on</strong>g> training set. This classifier is now used to classify <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
instances <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cept B. The classifier <str<strong>on</strong>g>the</str<strong>on</strong>g>n decides for each instance <str<strong>on</strong>g>of</str<strong>on</strong>g> B, whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r it is<br />
also an instance <str<strong>on</strong>g>of</str<strong>on</strong>g> A or not.<br />
Based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g>se classificati<strong>on</strong>s, four probabilities are computed, namely P (A, B),<br />
P (A, B), P (A, B) and P (A, B), where, for example, P (A, B) is <str<strong>on</strong>g>the</str<strong>on</strong>g> probability that<br />
an instance in <str<strong>on</strong>g>the</str<strong>on</strong>g> domain bel<strong>on</strong>gs to A, but not to B. These four probabilities can now<br />
be used to compute <str<strong>on</strong>g>the</str<strong>on</strong>g> joint probability distributi<strong>on</strong> for <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts A and B, which is a<br />
user supplied functi<strong>on</strong>, using <str<strong>on</strong>g>the</str<strong>on</strong>g>se four probabilities as parameters. [DMDH04] describes<br />
two possible functi<strong>on</strong>s for <str<strong>on</strong>g>the</str<strong>on</strong>g> joint probability distributi<strong>on</strong>. The first example is <str<strong>on</strong>g>the</str<strong>on</strong>g> Jaccard<br />
coefficient, where <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity measure is computed by dividing <str<strong>on</strong>g>the</str<strong>on</strong>g> probability<br />
that an instance is in <str<strong>on</strong>g>the</str<strong>on</strong>g> intersecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> two c<strong>on</strong>cepts by <str<strong>on</strong>g>the</str<strong>on</strong>g> probability that an instance<br />
is in <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts (P (A ∩ B)/P (A ∪ B)), which intuitively corresp<strong>on</strong>ds to<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> functi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> relevant instances, which are both in A and B. The sec<strong>on</strong>d example is <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
“most-specific-parent”, where <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity measure is positive (i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> measure is not
CHAPTER 4. THE SURVEY 51<br />
0) for any parent B <str<strong>on</strong>g>of</str<strong>on</strong>g> A and it is <str<strong>on</strong>g>the</str<strong>on</strong>g> highest for <str<strong>on</strong>g>the</str<strong>on</strong>g> most specific parent, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cept<br />
BMSP , which represents <str<strong>on</strong>g>the</str<strong>on</strong>g> smallest superset <str<strong>on</strong>g>of</str<strong>on</strong>g> A.<br />
The general architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> GLUE system is as follows:<br />
• The Distributi<strong>on</strong> Estimator takes as input <str<strong>on</strong>g>the</str<strong>on</strong>g> two tax<strong>on</strong>omies O1 and O2, toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
with <str<strong>on</strong>g>the</str<strong>on</strong>g>ir instances and applies machine learning to compute <str<strong>on</strong>g>the</str<strong>on</strong>g> four aforementi<strong>on</strong>ed<br />
probabilities P (A, B), P (A, B), P (A, B) and P (A, B). Currently, <str<strong>on</strong>g>the</str<strong>on</strong>g> distributi<strong>on</strong><br />
estimator uses a c<strong>on</strong>tent learner, which learns a classifier based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
textual c<strong>on</strong>text <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> instances, and a name learner, which learns a classifier based<br />
<strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> name <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> instance. It is possible to plug in different learners for different<br />
aspects using a meta-learner which uses a certain functi<strong>on</strong> to incorporate <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
predicti<strong>on</strong>s from all learners into an overall predicti<strong>on</strong>.<br />
• The Similarity Estimator applies a user supplied functi<strong>on</strong>, such as <str<strong>on</strong>g>the</str<strong>on</strong>g> menti<strong>on</strong>ed<br />
Jaccard coefficient or <str<strong>on</strong>g>the</str<strong>on</strong>g> most-specific-parent, and computes a similarity value for<br />
each pair <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cepts 〈A ∈ O1, B ∈ O2〉.<br />
• The Relaxati<strong>on</strong> Labeler takes as input <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity values for <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts from<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> tax<strong>on</strong>omies and searches for <str<strong>on</strong>g>the</str<strong>on</strong>g> best mapping c<strong>on</strong>figurati<strong>on</strong>, exploiting user<br />
supplied domain specific c<strong>on</strong>straints and heuristics.<br />
All in all, GLUE can be seen as an implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Match operator and can be<br />
fit into <str<strong>on</strong>g>the</str<strong>on</strong>g> overall mapping process as illustrated in Secti<strong>on</strong> 1.2.<br />
<strong>Ontology</strong> Languages The GLUE matcher uses two tax<strong>on</strong>omies, in which <str<strong>on</strong>g>the</str<strong>on</strong>g> nodes<br />
corresp<strong>on</strong>d to c<strong>on</strong>cepts, and edges corresp<strong>on</strong>d to subclass-<str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ships in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
Clearly, such a tax<strong>on</strong>omy can be easily extracted from an <strong>on</strong>tology represented in<br />
any <strong>on</strong>tology language, although a lot <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ships in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology are not taken<br />
into account. This, though, is not such a big problem for <str<strong>on</strong>g>the</str<strong>on</strong>g> approach, since <str<strong>on</strong>g>the</str<strong>on</strong>g> matching<br />
is mostly based <strong>on</strong> instance informati<strong>on</strong>.<br />
Mapping Language The result <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> matching d<strong>on</strong>e in GLUE is not a mapping between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies, but ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r a set <str<strong>on</strong>g>of</str<strong>on</strong>g> similarity measures, stating which c<strong>on</strong>cepts<br />
in <strong>on</strong>e <strong>on</strong>tology O1 are similar to c<strong>on</strong>cepts in <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology O2.<br />
Mapping Patterns Mapping patterns are not an issue in GLUE, since it is <strong>on</strong>ly c<strong>on</strong>cerned<br />
with discovering similarities between c<strong>on</strong>cepts based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir instances. GLUE<br />
could also not be used for matching patterns with an <strong>on</strong>tology, since a pattern does not<br />
have instances.
CHAPTER 4. THE SURVEY 52<br />
Automati<strong>on</strong> Support GLUE has a semi-automatic algorithm for specifying <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping<br />
between two <strong>on</strong>tologies. Ontologies are seen as tax<strong>on</strong>omies and <str<strong>on</strong>g>the</str<strong>on</strong>g> problem <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
matching is reduced to: “for each c<strong>on</strong>cept node in <strong>on</strong>e tax<strong>on</strong>omy, find <str<strong>on</strong>g>the</str<strong>on</strong>g> most similar<br />
node in <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r tax<strong>on</strong>omy”.<br />
The input from <str<strong>on</strong>g>the</str<strong>on</strong>g> user in <str<strong>on</strong>g>the</str<strong>on</strong>g> matching process c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> functi<strong>on</strong> to be used for<br />
computing <str<strong>on</strong>g>the</str<strong>on</strong>g> overall similarity value, based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> joint distributi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts, and<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> domain specific c<strong>on</strong>straints and heuristics, which are used for <str<strong>on</strong>g>the</str<strong>on</strong>g> relaxati<strong>on</strong> labeling<br />
process.<br />
GLUE takes a <strong>on</strong>e-shot approach at determining <str<strong>on</strong>g>the</str<strong>on</strong>g> similarities between tax<strong>on</strong>omies,<br />
which means that <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no user interacti<strong>on</strong> during <str<strong>on</strong>g>the</str<strong>on</strong>g> matching process. The user has to<br />
use <str<strong>on</strong>g>the</str<strong>on</strong>g> outcome <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> matching process as-is and use it as a basis for creating a mapping<br />
between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. In o<str<strong>on</strong>g>the</str<strong>on</strong>g>r words, GLUE implements <str<strong>on</strong>g>the</str<strong>on</strong>g> “find similarities” step in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process (Secti<strong>on</strong> 1.2), but does not provide support for <str<strong>on</strong>g>the</str<strong>on</strong>g> iterati<strong>on</strong> step.<br />
Applicability to Use Cases GLUE aids in creating mappings between <strong>on</strong>tologies in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
sense that it makes <str<strong>on</strong>g>the</str<strong>on</strong>g> work <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> human user easier by finding similarities between<br />
c<strong>on</strong>cepts in two <strong>on</strong>tologies based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir instances.<br />
Implementati<strong>on</strong> A prototypical implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> GLUE was created and <str<strong>on</strong>g>the</str<strong>on</strong>g> performance<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> each <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> comp<strong>on</strong>ents in <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture was evaluated. The main comp<strong>on</strong>ents<br />
to be evaluated were <str<strong>on</strong>g>the</str<strong>on</strong>g> different types <str<strong>on</strong>g>of</str<strong>on</strong>g> learners used for <str<strong>on</strong>g>the</str<strong>on</strong>g> classificati<strong>on</strong> and <str<strong>on</strong>g>the</str<strong>on</strong>g> relaxati<strong>on</strong><br />
labeler, which applied domain c<strong>on</strong>straints and heuristics in order to come up<br />
with better matches. It turned out that <str<strong>on</strong>g>the</str<strong>on</strong>g> combinati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> several combined classifiers toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
with domain heuristics can achieve significant performance enhancement in terms<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> accuracy, which can go up to 97% in some domains.<br />
Experiences [DMDH04] reports <strong>on</strong>ly <strong>on</strong> small evaluati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> performance <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir<br />
system for tax<strong>on</strong>omies in <str<strong>on</strong>g>the</str<strong>on</strong>g> domain <str<strong>on</strong>g>of</str<strong>on</strong>g> (university) course catalogs and company pr<str<strong>on</strong>g>of</str<strong>on</strong>g>iles.<br />
The matching accuracy for <str<strong>on</strong>g>the</str<strong>on</strong>g>ir chosen examples was typically between 70 and 90<br />
percent. However, experiments <strong>on</strong> a broader scale need to be d<strong>on</strong>e to see if GLUE works<br />
in o<str<strong>on</strong>g>the</str<strong>on</strong>g>r domains and to evaluate <str<strong>on</strong>g>the</str<strong>on</strong>g> scalability <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approach.<br />
4.1.5 Semantic Matching<br />
Summary Semantic Matching [GS04] is an approach to matching classificati<strong>on</strong> hierarchies.<br />
The problem addressed by Semantic Matching is <str<strong>on</strong>g>the</str<strong>on</strong>g> following: say you have<br />
two different classificati<strong>on</strong> hierarchies, where each hierarchy is used to describe a set <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
documents, i.e. each term in <str<strong>on</strong>g>the</str<strong>on</strong>g> classificati<strong>on</strong> hierarchy describes a set <str<strong>on</strong>g>of</str<strong>on</strong>g> documents.<br />
How do <str<strong>on</strong>g>the</str<strong>on</strong>g> terms in <strong>on</strong>e hierarchy relate to <str<strong>on</strong>g>the</str<strong>on</strong>g> terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r hierarchy?
CHAPTER 4. THE SURVEY 53<br />
Semantic Matching can also be seen as an implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Match operator for<br />
purely tree-structures <strong>on</strong>tologies. The authors define Match as follows: “Match is an<br />
operator that takes two graph-like structures (e.g. database schemas or <strong>on</strong>tologies) and<br />
produces a mapping between elements <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two graphs that corresp<strong>on</strong>d semantically to<br />
each o<str<strong>on</strong>g>the</str<strong>on</strong>g>r”. This definiti<strong>on</strong> is similar to <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong> provided in Secti<strong>on</strong> 1.1. However,<br />
in Semantic Matching <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong> is limited to <str<strong>on</strong>g>the</str<strong>on</strong>g> graph representati<strong>on</strong> format for<br />
<strong>on</strong>tologies. This distincti<strong>on</strong> is fundamental to <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Matching approach, since it<br />
performs matching based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> nodes and <str<strong>on</strong>g>the</str<strong>on</strong>g> edges between <str<strong>on</strong>g>the</str<strong>on</strong>g> nodes in a graph.<br />
Until now Semantic Matching has been mostly developed and tested for <str<strong>on</strong>g>the</str<strong>on</strong>g> task <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
matching classificati<strong>on</strong> hierarchies. Classificati<strong>on</strong> hierarchies are tree-structured graphs in<br />
which each node has <strong>on</strong>ly <strong>on</strong>e parent. A property <str<strong>on</strong>g>of</str<strong>on</strong>g> classificati<strong>on</strong> hierarchies is that <str<strong>on</strong>g>the</str<strong>on</strong>g>re<br />
is <strong>on</strong>ly <strong>on</strong>e type <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ship, which is a more-specific-term relati<strong>on</strong> which subsumes<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> subclass-<str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ship. It is currently not clear if and exactly how Semantic Matching<br />
can be applied to <str<strong>on</strong>g>the</str<strong>on</strong>g> problem <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology matching, because most <strong>on</strong>tologies typically<br />
have different types <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ships between c<strong>on</strong>cepts and <str<strong>on</strong>g>the</str<strong>on</strong>g> subclass-<str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ship in<br />
<strong>on</strong>tologies is a formal relati<strong>on</strong>ship, interpreted <str<strong>on</strong>g>of</str<strong>on</strong>g>ten as a strict logical implicati<strong>on</strong> or a<br />
subset relati<strong>on</strong>ship (as is <str<strong>on</strong>g>the</str<strong>on</strong>g> case for <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> Descripti<strong>on</strong> Logics).<br />
Of course, an <strong>on</strong>tology can usually be rewritten as a graph with labelled edges, although<br />
some informati<strong>on</strong> (e.g. axioms and higher-order relati<strong>on</strong>s) might be lost in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
rewriting. C<strong>on</strong>cepts could be <str<strong>on</strong>g>the</str<strong>on</strong>g> nodes and relati<strong>on</strong>ships between c<strong>on</strong>cepts could be <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
(labelled) edges; <str<strong>on</strong>g>the</str<strong>on</strong>g> label <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> edge would denote <str<strong>on</strong>g>the</str<strong>on</strong>g> type <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship. This is<br />
similar to <str<strong>on</strong>g>the</str<strong>on</strong>g> labelled graphs used in ONION (see Secti<strong>on</strong> 4.2.2). There is currently work<br />
underway to incorporate <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ships in <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Matching algorithm,<br />
but this work is still in <str<strong>on</strong>g>the</str<strong>on</strong>g> early stages.<br />
The authors <str<strong>on</strong>g>of</str<strong>on</strong>g> [GS04] have argued that almost all earlier approaches to schema and<br />
<strong>on</strong>tology matching have been syntactic matching approaches, as opposed to semantic<br />
matching. In syntactic matching, <str<strong>on</strong>g>the</str<strong>on</strong>g> labels and sometimes <str<strong>on</strong>g>the</str<strong>on</strong>g> syntactical structure <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> graph is matched and typically some similarity coefficient [0, 1] is obtained, which<br />
indicates <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity between <str<strong>on</strong>g>the</str<strong>on</strong>g> two nodes. Semantic Matching computes a set-based<br />
relati<strong>on</strong> between <str<strong>on</strong>g>the</str<strong>on</strong>g> nodes, taking into account <str<strong>on</strong>g>the</str<strong>on</strong>g> meaning <str<strong>on</strong>g>of</str<strong>on</strong>g> each node. The possible<br />
relati<strong>on</strong>s returned by <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Matching algorithm are equality (=), overlap (∩),<br />
mismatch (⊥), more general (⊆) or more specific (⊇). The corresp<strong>on</strong>dence <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> symbols<br />
with set <str<strong>on</strong>g>the</str<strong>on</strong>g>ory is not a coincidence, since each c<strong>on</strong>cept in <str<strong>on</strong>g>the</str<strong>on</strong>g> classificati<strong>on</strong> hierarchies<br />
represents a set <str<strong>on</strong>g>of</str<strong>on</strong>g> documents.<br />
We will now briefly sketch <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Matching (also S-Match) algorithm for graph<br />
matching.<br />
Two levels <str<strong>on</strong>g>of</str<strong>on</strong>g> granularity for matching are distinguished in S-Match, namely elementlevel<br />
matching and structure-level matching. At <str<strong>on</strong>g>the</str<strong>on</strong>g> element level, which is c<strong>on</strong>cerned<br />
with individual nodes, <str<strong>on</strong>g>the</str<strong>on</strong>g> authors distinguish techniques with weak semantics and techniques<br />
with str<strong>on</strong>g semantics. Techniques with weak semantics corresp<strong>on</strong>d to <str<strong>on</strong>g>the</str<strong>on</strong>g> syntactic<br />
matching which has been proposed in most previous literature (for an overview
CHAPTER 4. THE SURVEY 54<br />
see [RB01]). Element-level matching with str<strong>on</strong>g semantics is d<strong>on</strong>e using <str<strong>on</strong>g>the</str<strong>on</strong>g>sauri (e.g.<br />
WordNet [Fel99]), which typically c<strong>on</strong>tain syn<strong>on</strong>ym and hypernym relati<strong>on</strong>s between<br />
terms. These relati<strong>on</strong>s can be used to find semantic relati<strong>on</strong>s between nodes in <str<strong>on</strong>g>the</str<strong>on</strong>g> graphs.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> next phase, <str<strong>on</strong>g>the</str<strong>on</strong>g> structure-level matching, <str<strong>on</strong>g>the</str<strong>on</strong>g> matching problem, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> two<br />
graphs toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r with <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping query are translated into a propositi<strong>on</strong>al formula and<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>n checked for validity (i.e. satisfiability). A mapping query is a pair <str<strong>on</strong>g>of</str<strong>on</strong>g> nodes and a<br />
semantic relati<strong>on</strong>ship between <str<strong>on</strong>g>the</str<strong>on</strong>g> pair <str<strong>on</strong>g>of</str<strong>on</strong>g> nodes. If <str<strong>on</strong>g>the</str<strong>on</strong>g> propositi<strong>on</strong>al sentence is valid, we<br />
know that <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic relati<strong>on</strong>ship between <str<strong>on</strong>g>the</str<strong>on</strong>g> two nodes in <str<strong>on</strong>g>the</str<strong>on</strong>g> query holds and thus<br />
can be added to <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping result.<br />
A potential problem with this algorithm is that <str<strong>on</strong>g>the</str<strong>on</strong>g> propositi<strong>on</strong>al satisfiability check<br />
(which is known to have n<strong>on</strong>deterministic polynomial complexity) has to be performed<br />
for every pair <str<strong>on</strong>g>of</str<strong>on</strong>g> nodes from <str<strong>on</strong>g>the</str<strong>on</strong>g> two graphs. Clearly, this does not scale for large graphs.<br />
<strong>Ontology</strong> Languages Currently, <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic matching can work with classificati<strong>on</strong> hierarchies,<br />
but also directed acyclic graphs (DAGs) in general. Classificati<strong>on</strong> hierarchies<br />
can <str<strong>on</strong>g>of</str<strong>on</strong>g>ten be extracted from <strong>on</strong>tologies by treating classes in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology as nodes and<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> subclass-<str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ships as edges, but all o<str<strong>on</strong>g>the</str<strong>on</strong>g>r relati<strong>on</strong>ships are lost in <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong>.<br />
Note that multiple classificati<strong>on</strong> hierarchies can be extracted from an <strong>on</strong>tology in<br />
which c<strong>on</strong>cepts have more than <strong>on</strong>e direct superclass. This <str<strong>on</strong>g>of</str<strong>on</strong>g> course does not rule out<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> result <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm as <str<strong>on</strong>g>the</str<strong>on</strong>g> input to a mapping process for <str<strong>on</strong>g>the</str<strong>on</strong>g> complete<br />
<strong>on</strong>tologies. Also, <str<strong>on</strong>g>the</str<strong>on</strong>g>re is work underway to extend <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic matching to work with<br />
labeled graphs, taking <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> different relati<strong>on</strong>ships into account.<br />
This does not mean that <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm in its current form is useless, <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>trary.<br />
There are currently many classificati<strong>on</strong> schemes around, such as dmoz 9 , Yahoo 10 , and<br />
many o<str<strong>on</strong>g>the</str<strong>on</strong>g>r (specialized) classificati<strong>on</strong> hierarchies are in use. However, for arbitrary <strong>on</strong>tology<br />
matching <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web, it has not been shown that <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm performs<br />
well. That <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm works well has been shown <strong>on</strong>ly for <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> classificati<strong>on</strong><br />
hierarchies with more-specific-term relati<strong>on</strong>ships. Ontologies typically implement both<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> formal subclass-<str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong> and many o<str<strong>on</strong>g>the</str<strong>on</strong>g>r types <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ships.<br />
Mapping Language S-Match is a matching algorithm and as such does not have a<br />
language for <str<strong>on</strong>g>the</str<strong>on</strong>g> actual specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings, <strong>on</strong>ly for <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
similarities, although in this case <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities comes close to a real<br />
mapping specificati<strong>on</strong>.<br />
As we have menti<strong>on</strong>ed earlier, <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cepts is d<strong>on</strong>e<br />
using set-based primitives, denoting <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ships <str<strong>on</strong>g>of</str<strong>on</strong>g> equality, disjointness, overlap<br />
and sub/superset. In later work (e.g. [GSY04]), <str<strong>on</strong>g>the</str<strong>on</strong>g> authors use <str<strong>on</strong>g>the</str<strong>on</strong>g> symbols comm<strong>on</strong>ly<br />
found in descripti<strong>on</strong> logics, i.e. 〈A, B, ⊓〉 for overlap, 〈A, B, ⊑〉 for subset, 〈A, B, ⊒〉<br />
9 http://www.dmoz.org/<br />
10 http://www.yahoo.com/
CHAPTER 4. THE SURVEY 55<br />
for superset and 〈A, B, ⊥〉 for disjointness <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts. These relati<strong>on</strong>s could be<br />
translated to Descripti<strong>on</strong> Logic [BCM + 03] axioms, i.e. A ⊓ B �≡ ⊥, A ⊑ B, B ⊑ A, and<br />
A ⊓ B ≡ ⊥.<br />
Mapping Patterns Currently, <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no use <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping patterns in Semantic Matching.<br />
It might be worthwhile to see if mapping patterns can help to find similarities, although<br />
this is not a straightforward task. Perhaps it is possible to match <strong>on</strong>tologies against<br />
mapping patterns in order to find out if a certain mapping pattern might be applicable, but<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> authors do not give any hints as to if and how we can incorporate mapping patterns<br />
into <str<strong>on</strong>g>the</str<strong>on</strong>g> matching algorithm.<br />
Automati<strong>on</strong> Support Clearly, <str<strong>on</strong>g>the</str<strong>on</strong>g> proposed algorithm is an automatic <strong>on</strong>e-pass (i.e.<br />
no user interacti<strong>on</strong>) algorithm, which returns all similarities it can find between <str<strong>on</strong>g>the</str<strong>on</strong>g> two<br />
graphs. There is no user interacti<strong>on</strong> during <str<strong>on</strong>g>the</str<strong>on</strong>g> executi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> matching.<br />
It cannot be assumed that <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping returned by <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm is ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r correct<br />
or complete. Therefore, <str<strong>on</strong>g>the</str<strong>on</strong>g> result <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> S-Match algorithm can serve as a first step in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> overall <strong>on</strong>tology mapping process. It can serve as <str<strong>on</strong>g>the</str<strong>on</strong>g> input for <str<strong>on</strong>g>the</str<strong>on</strong>g> next phase in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process, in which <str<strong>on</strong>g>the</str<strong>on</strong>g> user validates <str<strong>on</strong>g>the</str<strong>on</strong>g> result <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> matching and corrects<br />
any mistakes and does <str<strong>on</strong>g>the</str<strong>on</strong>g> necessary additi<strong>on</strong>s in order to make <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping correct and<br />
complete 11 .<br />
Applicability to Use Cases For <str<strong>on</strong>g>the</str<strong>on</strong>g> purpose <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong> <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic<br />
Web, <str<strong>on</strong>g>the</str<strong>on</strong>g> role which can be played by S-Match could be in <str<strong>on</strong>g>the</str<strong>on</strong>g> discovery phase <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
mappings between <strong>on</strong>tologies. Since S-Match provides an implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Match<br />
operator, it fits into <str<strong>on</strong>g>the</str<strong>on</strong>g> “find similarities” step in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process.<br />
Implementati<strong>on</strong> [GSY04] presents S-Match, an algorithm and implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> Semantic<br />
Matching. It also compares <str<strong>on</strong>g>the</str<strong>on</strong>g> performance <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> S-Match implementati<strong>on</strong><br />
in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> speed, precisi<strong>on</strong> and recall with available implementati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> existing approaches<br />
in syntactic matchings COMA [DR02], Cupid [MBR01] and Similarity Flooding<br />
[MGMR02], which was implemented in <str<strong>on</strong>g>the</str<strong>on</strong>g> RONDO system [MRB03].<br />
It turned out that for most applicati<strong>on</strong>s, S-Match outperformed <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r systems in<br />
terms <str<strong>on</strong>g>of</str<strong>on</strong>g> precisi<strong>on</strong> and recall. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r systems typically outperformed S-Match<br />
in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> time required to perform <str<strong>on</strong>g>the</str<strong>on</strong>g> actual matching. One possible explanati<strong>on</strong> is that<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> S-Match implementati<strong>on</strong> has not really been optimized. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> S-Match implementati<strong>on</strong><br />
uses a propositi<strong>on</strong>al SAT solver, which can not be efficiently implemented,<br />
because <str<strong>on</strong>g>the</str<strong>on</strong>g> problem is known to be NP-Hard. Currently, <str<strong>on</strong>g>the</str<strong>on</strong>g>re are no known algorithms<br />
that require less than exp<strong>on</strong>ential time for satisfiability checking.<br />
11 Of course, it can also never be guaranteed that <str<strong>on</strong>g>the</str<strong>on</strong>g> outcome <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> human mapping will be ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r correct<br />
or complete.
CHAPTER 4. THE SURVEY 56<br />
Experiences Semantic matching has so far <strong>on</strong>ly been tested with some toy examples.<br />
However, <str<strong>on</strong>g>the</str<strong>on</strong>g> results presented in [GSY04] do look promising with respect to <str<strong>on</strong>g>the</str<strong>on</strong>g> precisi<strong>on</strong><br />
and recall achieved by <str<strong>on</strong>g>the</str<strong>on</strong>g> system compared to o<str<strong>on</strong>g>the</str<strong>on</strong>g>r existing matchers. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, S-<br />
Match is currently in <str<strong>on</strong>g>the</str<strong>on</strong>g> early stages <str<strong>on</strong>g>of</str<strong>on</strong>g> its development; <str<strong>on</strong>g>the</str<strong>on</strong>g>re are plans to apply S-Match<br />
in o<str<strong>on</strong>g>the</str<strong>on</strong>g>r settings, which will show whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r S-Match works for real-world problems <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Semantic Web.<br />
4.1.6 OntoMap<br />
OntoMap ([KSD01a]) is a knowledge representati<strong>on</strong> formalism, reas<strong>on</strong>er, and web portal<br />
12 for upper-level <strong>on</strong>tologies and lexical semantics. The project was developed by Ontotext<br />
Lab. in cooperati<strong>on</strong> with <str<strong>on</strong>g>the</str<strong>on</strong>g> Bulgarian Academy <str<strong>on</strong>g>of</str<strong>on</strong>g> Sciences. Note that <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>Ontology</strong><br />
Mapping system used in SEKT is also named OntoMap but is mainly developed by <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
company Ontoprise and is a completely different system. The portal provides access to<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> most popular upper-level <strong>on</strong>tologies and lexical resources, toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r with hand-crafted<br />
mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g>m. It facilitates <str<strong>on</strong>g>the</str<strong>on</strong>g> evaluati<strong>on</strong> and comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> upper-level <strong>on</strong>tologies<br />
and lexical knowledge bases. The portal is based <strong>on</strong> a unified representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> resources, a proprietary inference engine, and a mapping methodology. It includes a<br />
number <str<strong>on</strong>g>of</str<strong>on</strong>g> alternative viewers: HTML, DHTML, a stand-al<strong>on</strong>e GUI applicati<strong>on</strong>.<br />
In order to provide a uniform representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>m, OntoMap introduces a relatively simple meta-<strong>on</strong>tology called OntoMapO.<br />
The knowledge representati<strong>on</strong> language is more complex than RDF(S) and similar to<br />
OWL Lite − [dBPF04], but it also includes specific primitives for <strong>on</strong>tology-mapping.<br />
The following upper-level <strong>on</strong>tologies are hosted:<br />
• Upper Cyc <strong>Ontology</strong><br />
• EuroWordnet Top <strong>Ontology</strong><br />
• EuroWordnet Meta-<strong>Ontology</strong><br />
• WordNet Meta-<strong>Ontology</strong><br />
• WordNet Tops (<str<strong>on</strong>g>the</str<strong>on</strong>g> top 41 classes)<br />
• MikroKosmos Top (<str<strong>on</strong>g>the</str<strong>on</strong>g> top 13 classes)<br />
• OntoMap Meta-<strong>Ontology</strong><br />
• Protege Meta-<strong>Ontology</strong><br />
• Simple <strong>Ontology</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> Business Entities<br />
12 http://www.<strong>on</strong>totext.com/projects/OntoMap.html
CHAPTER 4. THE SURVEY 57<br />
• SENSUS Top (<str<strong>on</strong>g>the</str<strong>on</strong>g> top 257 classes)<br />
Mappings between EuroWordnet Top and <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tologies were created. There are<br />
almost no direct mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tologies, but <str<strong>on</strong>g>the</str<strong>on</strong>g> equivalence and subsumpti<strong>on</strong><br />
relati<strong>on</strong>s are automatically propagated through <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping to EuroWordnet<br />
Top.<br />
Mapping Language The full descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> OntoMapO could be found in<br />
[KSD01b]. Here we present just its mapping primitives, as follows:<br />
• MuchMoreSpecific - <str<strong>on</strong>g>the</str<strong>on</strong>g> 1st c<strong>on</strong>cept is much more specific than <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d <strong>on</strong>e;<br />
transitive relati<strong>on</strong>. Inverse <str<strong>on</strong>g>of</str<strong>on</strong>g> MuchMoreGeneral and a specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> ChildOf;<br />
• MuchMoreGeneral - <str<strong>on</strong>g>the</str<strong>on</strong>g> 1st c<strong>on</strong>cept is much more general than <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d <strong>on</strong>e;<br />
transitive relati<strong>on</strong>. Inverse <str<strong>on</strong>g>of</str<strong>on</strong>g> MuchMoreSpecific and a specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> ParentOf;<br />
• TopInstance - <str<strong>on</strong>g>the</str<strong>on</strong>g> 1st c<strong>on</strong>cept is <str<strong>on</strong>g>the</str<strong>on</strong>g> most general instance <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d <strong>on</strong>e, which<br />
is a meta-c<strong>on</strong>cept. Inverse <str<strong>on</strong>g>of</str<strong>on</strong>g> ExactClass and a specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> InstanceOf;<br />
• ExactClass - <str<strong>on</strong>g>the</str<strong>on</strong>g> 1st c<strong>on</strong>cept is a meta-c<strong>on</strong>cept, <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d c<strong>on</strong>cept is <str<strong>on</strong>g>the</str<strong>on</strong>g> most<br />
general instance <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> first <strong>on</strong>e. Inverse <str<strong>on</strong>g>of</str<strong>on</strong>g> TopInstance and a specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
ClassOf;<br />
• ParentAsInstance - <str<strong>on</strong>g>the</str<strong>on</strong>g> 1st c<strong>on</strong>cept is more general than all <str<strong>on</strong>g>the</str<strong>on</strong>g> instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
sec<strong>on</strong>d <strong>on</strong>e which is a meta-c<strong>on</strong>cept. Inverse <str<strong>on</strong>g>of</str<strong>on</strong>g> ChildAsClass;<br />
• ChildAsClass - <str<strong>on</strong>g>the</str<strong>on</strong>g> 1st c<strong>on</strong>cept is a meta-c<strong>on</strong>cept (class), all its instances are more<br />
specific than <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d c<strong>on</strong>cept. Inverse <str<strong>on</strong>g>of</str<strong>on</strong>g> ParentAsInstance.<br />
Automati<strong>on</strong> support OntoMap does not automatically create mappings. It assumes<br />
that ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r a mapping exists or it may be created manually. Although it may seem that<br />
automatic mapping may reduce <str<strong>on</strong>g>the</str<strong>on</strong>g> efforts, in <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> upper-level <strong>on</strong>tologies <str<strong>on</strong>g>the</str<strong>on</strong>g> typical<br />
heuristics involved for domain <strong>on</strong>tologies can play a very limited role. This is explained<br />
in detail in [KSD01b]. Once a mapping to <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies it supports is created,<br />
OntoMap could automatically create a mapping to any <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tologies.<br />
Applicability to use cases OntoMap could be used for (semi-)automatic creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<strong>Ontology</strong> Mappings between o<str<strong>on</strong>g>the</str<strong>on</strong>g>r domain <strong>on</strong>tologies and existing <strong>on</strong>es, but it requires<br />
that a mapping exists to <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> supported <strong>on</strong>tologies. The different upper-level <strong>on</strong>tologies<br />
are suited for different purposes, thus, a domain <strong>on</strong>tology may naturally map to <strong>on</strong>e<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se, and <str<strong>on</strong>g>the</str<strong>on</strong>g>n OntoMap will automatically provide a mapping to <str<strong>on</strong>g>the</str<strong>on</strong>g> rest. Although<br />
OntoMap does not directly address <str<strong>on</strong>g>the</str<strong>on</strong>g> use-case <str<strong>on</strong>g>of</str<strong>on</strong>g> instance transformati<strong>on</strong>, <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings<br />
it creates could be used for such tasks. It is important to menti<strong>on</strong> that OntoMap handles
CHAPTER 4. THE SURVEY 58<br />
classes and instances in an uniform fashi<strong>on</strong> and thus could transform instances to classes<br />
and vice-versa (via ParentAsInstance and ChildAsClass mapping primitives).<br />
Tool support<br />
• The OntoMap web portal (http://<strong>on</strong>tomap.<strong>on</strong>totext.com) requires <str<strong>on</strong>g>the</str<strong>on</strong>g> users to register<br />
(it is free) and <str<strong>on</strong>g>the</str<strong>on</strong>g>n it allows <str<strong>on</strong>g>the</str<strong>on</strong>g> browsing <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies via a handy DHTML<br />
Tree View. The search for c<strong>on</strong>cepts throughout <strong>on</strong>e or more <strong>on</strong>tologies is also supported.<br />
The portal allows <str<strong>on</strong>g>the</str<strong>on</strong>g> export <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies to DAML+OIL.<br />
• CYC to EWN-Top mapping. An <strong>on</strong>line service, hosted at http://demo.<strong>on</strong>totext.com,<br />
allows <str<strong>on</strong>g>the</str<strong>on</strong>g> browsing <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> EuroWordnet Top <strong>on</strong>tology and its mapping into Upper<br />
Cyc <strong>Ontology</strong>. The corresp<strong>on</strong>ding Cyc c<strong>on</strong>cepts are represented with: <str<strong>on</strong>g>the</str<strong>on</strong>g>ir glosses,<br />
direct and indirect super-classes (#$genls), direct and indirect classes (#$isa). The<br />
mapping itself is expressed in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> a CycL micro<str<strong>on</strong>g>the</str<strong>on</strong>g>ory encoding <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> EuroWordNet<br />
Top <strong>Ontology</strong> <strong>on</strong> top <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> publicly available p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Cyc knowledge<br />
base. This approach was chosen because such a mapping is impossible by means <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
equivalence and subsumpti<strong>on</strong> relati<strong>on</strong>s <strong>on</strong>ly. However, a simplified relati<strong>on</strong>al view<br />
that is sufficient for many purposes, is also provided. More <str<strong>on</strong>g>the</str<strong>on</strong>g>oretical details can<br />
be found in [KS00].<br />
• The OntoMap Viewer is a standal<strong>on</strong>e java applicati<strong>on</strong>, which represents<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> main functi<strong>on</strong>ality <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> OntoMap web portal. OntoMap Viewer is<br />
distributed for all popular platforms: Windows, Linux, Solaris, MacOS<br />
(http://www.<strong>on</strong>totext.com/projects/OntoMapViewer/install.htm). All <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> previously<br />
menti<strong>on</strong>ed <strong>on</strong>tologies are encoded into OntoMapO language. The viewer<br />
allows <str<strong>on</strong>g>the</str<strong>on</strong>g> browsing and searching by c<strong>on</strong>cepts from any <strong>on</strong>tology. An example,<br />
shown in Figure 4.9, illustrates <str<strong>on</strong>g>the</str<strong>on</strong>g> supported mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> upper-level <strong>on</strong>tologies.<br />
The user chooses an <strong>on</strong>tology and <str<strong>on</strong>g>the</str<strong>on</strong>g>n selects a c<strong>on</strong>cept from it, e.g. Pers<strong>on</strong><br />
from SENSUS Top <strong>on</strong>tology. Then <str<strong>on</strong>g>the</str<strong>on</strong>g> viewer shows any equivalence, superand<br />
sub-c<strong>on</strong>cepts from all <strong>on</strong>tologies including <str<strong>on</strong>g>the</str<strong>on</strong>g> current <strong>on</strong>e, but also <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>rs,<br />
if <str<strong>on</strong>g>the</str<strong>on</strong>g>re are equivalence and/or subsumpti<strong>on</strong> relati<strong>on</strong>s (in this case <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cept is<br />
equivalent to Human from EuroWordnet Top and Pers<strong>on</strong> from UpperCyc).<br />
Summary OntoMap provides a mapping model for upper-level <strong>on</strong>tologies, and a few<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> most popular <strong>on</strong>es are encoded in it. Using <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping to <str<strong>on</strong>g>the</str<strong>on</strong>g> EuroWordnet Top<br />
<strong>on</strong>tology and a reas<strong>on</strong>er to support <str<strong>on</strong>g>the</str<strong>on</strong>g> knowledge representati<strong>on</strong> language, a mapping<br />
between all <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies is available. Thus, a new mapping from a domain <strong>on</strong>tology<br />
to <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> supported upper-level <strong>on</strong>tologies could be automatically mapped to each <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>es. However, OntoMap is focused <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> evaluati<strong>on</strong> and <str<strong>on</strong>g>the</str<strong>on</strong>g> comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies, which are encoded into OntoMapO, ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r than <strong>on</strong> <strong>Ontology</strong> mapping or<br />
instance transformati<strong>on</strong> services.
CHAPTER 4. THE SURVEY 59<br />
Figure 4.9: OntoMap Viewer - c<strong>on</strong>cept “Pers<strong>on</strong>” in different upper-level <strong>on</strong>tologies
CHAPTER 4. THE SURVEY 60<br />
4.1.7 RDFDiff<br />
RDFDiff (formerly known as OntoView [KKOF02]) is an algorithm and a tool, developed<br />
jointly by VU Amsterdam and Ontotext Lab. It aims to detect and depict changes<br />
in RDF-encoded knowledge (such as <strong>on</strong>tologies encoded in RDF(S), DAML+OIL, and<br />
OWL). Basically, it is very much inspired by <str<strong>on</strong>g>the</str<strong>on</strong>g> CVS diff and <str<strong>on</strong>g>the</str<strong>on</strong>g> UNIX diff. It has been<br />
tuned for RDF(S) and compares <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies <strong>on</strong> a structural level, i.e. ra<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g> data<br />
model <str<strong>on</strong>g>of</str<strong>on</strong>g> different RDF(S) <strong>on</strong>tologies than <str<strong>on</strong>g>the</str<strong>on</strong>g> syntactical encoding. RDFDiff utilizes<br />
change-classificati<strong>on</strong> rules, which are intended to serve as a basis for fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r semantic<br />
and structural analysis <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> differences. Such rules can be classified as:<br />
• robust categorizati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> changes<br />
• evaluati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> compatibility between <str<strong>on</strong>g>the</str<strong>on</strong>g> versi<strong>on</strong>s<br />
• data-transformati<strong>on</strong> or mapping<br />
The change detecti<strong>on</strong> in RDFDiff includes:<br />
• Matching an<strong>on</strong>ymous resources and <str<strong>on</strong>g>the</str<strong>on</strong>g>ir descripti<strong>on</strong>s<br />
• Detecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> renamed resources, based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> resource<br />
• Detecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> renamed resources, based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> usage <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> resource (detecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
sub-graphs, which do not change, given that <str<strong>on</strong>g>the</str<strong>on</strong>g> old ID/URI has been substituted<br />
with <str<strong>on</strong>g>the</str<strong>on</strong>g> new <strong>on</strong>e)<br />
The tool resembles PROMPTDiff [NM03a] in <str<strong>on</strong>g>the</str<strong>on</strong>g> following characteristics:<br />
• change detecti<strong>on</strong> based <strong>on</strong> graphs<br />
• an extensible set <str<strong>on</strong>g>of</str<strong>on</strong>g> rules to classify a change (called “heuristics” in both PROMPT-<br />
Diff and RDFDiff)<br />
It differs from PROMPTDiff in being able to compare XML-serializati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF(S)<br />
<strong>on</strong>tology as well. RDFDiff detects changes mainly using <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF-graph model, but it<br />
presents <str<strong>on</strong>g>the</str<strong>on</strong>g> results in a diff-like way (textual), while trying to preserve <str<strong>on</strong>g>the</str<strong>on</strong>g> order <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
statements in <str<strong>on</strong>g>the</str<strong>on</strong>g> XML serialized files. It is focused <strong>on</strong> RDF(S) and thus it handles its<br />
specifics well (e.g. an<strong>on</strong>ymous resource matching). However, <str<strong>on</strong>g>the</str<strong>on</strong>g> rules model <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> logical language (i.e. Descripti<strong>on</strong> Logics for DAML+OIL and OWL).
CHAPTER 4. THE SURVEY 61<br />
RDFDiff overview Essentially, <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm takes as input two RDF-encoded files,<br />
compares <str<strong>on</strong>g>the</str<strong>on</strong>g>m, and produces a list <str<strong>on</strong>g>of</str<strong>on</strong>g> changes, organized into added, changed and deleted<br />
items.<br />
It is important to menti<strong>on</strong> that RDFDiff does not implement much semantics. The<br />
rules may implement <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> data model <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF-encoded <strong>on</strong>tology but<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>y need not to be used. RDFDiff does not need to be run <strong>on</strong> two <strong>on</strong>tologies (it could<br />
compare any arbitrary RDF-graphs) but it was developed to aid <strong>on</strong>tology comparis<strong>on</strong>.<br />
In its first versi<strong>on</strong> RDFDiff was perceived as a diff for an old and a new versi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> an<br />
<strong>on</strong>tology, and thus <str<strong>on</strong>g>the</str<strong>on</strong>g> descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> algorithm, as well as <str<strong>on</strong>g>the</str<strong>on</strong>g> user interface <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
tool implementing it, refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies being compared as ”old” and ”new”.<br />
However, except for <str<strong>on</strong>g>the</str<strong>on</strong>g> visualizati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> results, RDFDiff does not depend <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> fact<br />
that <str<strong>on</strong>g>the</str<strong>on</strong>g> RDF graphs are two versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> same graph and <str<strong>on</strong>g>the</str<strong>on</strong>g>refore it could well be<br />
used for any two <strong>on</strong>tologies.<br />
The algorithm treats an RDF-encoded <strong>on</strong>tology as a sequence <str<strong>on</strong>g>of</str<strong>on</strong>g> resource definiti<strong>on</strong>s,<br />
where a definiti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> resource R is assumed to be <str<strong>on</strong>g>the</str<strong>on</strong>g> list <str<strong>on</strong>g>of</str<strong>on</strong>g> all statements where R is<br />
a subject. This grouping <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> statements into resource definiti<strong>on</strong>s is opti<strong>on</strong>al, because<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> user might require <str<strong>on</strong>g>the</str<strong>on</strong>g> original grouping from <str<strong>on</strong>g>the</str<strong>on</strong>g> source file. In <str<strong>on</strong>g>the</str<strong>on</strong>g> case where <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
statements are not grouped into ”resource definiti<strong>on</strong>s”, <str<strong>on</strong>g>the</str<strong>on</strong>g>y will be handled as grouped in<br />
first level XML elements. The resource definiti<strong>on</strong>s are ordered according to <str<strong>on</strong>g>the</str<strong>on</strong>g> new file,<br />
c<strong>on</strong>sidering <str<strong>on</strong>g>the</str<strong>on</strong>g> positi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> first statement from <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong>.<br />
<str<strong>on</strong>g>State</str<strong>on</strong>g>ments are c<strong>on</strong>sidered to be changed if and <strong>on</strong>ly if <str<strong>on</strong>g>the</str<strong>on</strong>g> subject and <str<strong>on</strong>g>the</str<strong>on</strong>g> predicate in<br />
each versi<strong>on</strong>s match unambiguously, but <str<strong>on</strong>g>the</str<strong>on</strong>g> values might be different, i.e. when <str<strong>on</strong>g>the</str<strong>on</strong>g>re is<br />
a single statement with such a subject and such a predicate in both versi<strong>on</strong>s. Because <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
resource definiti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> first file are reordered according to <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d file, definiti<strong>on</strong>s<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> same terms that occur in different places in <str<strong>on</strong>g>the</str<strong>on</strong>g> two files can be found.<br />
Mapping language The change-classificati<strong>on</strong> rules are defined via a simple RDFS<br />
schema, which c<strong>on</strong>tains two classes and a few properties. The classes are Rule and Triplet.<br />
Each rule is a set <str<strong>on</strong>g>of</str<strong>on</strong>g> triplets defining relati<strong>on</strong>ships between some triples via comm<strong>on</strong> variables<br />
used in place <str<strong>on</strong>g>of</str<strong>on</strong>g> ’subject’, ’predicate’ or ’object’. The main task is to find all <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
possible soluti<strong>on</strong>s (read possible bindings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> variables used in <str<strong>on</strong>g>the</str<strong>on</strong>g> triplets) where <str<strong>on</strong>g>the</str<strong>on</strong>g>re<br />
exist triples that match <str<strong>on</strong>g>the</str<strong>on</strong>g> patterns <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> whole set <str<strong>on</strong>g>of</str<strong>on</strong>g> triplets. Then all soluti<strong>on</strong>s found<br />
with <str<strong>on</strong>g>the</str<strong>on</strong>g> older versi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> resource definiti<strong>on</strong> are compared against those found in its<br />
newer versi<strong>on</strong>, and <str<strong>on</strong>g>the</str<strong>on</strong>g> equal pairs are removed.<br />
Each instance <str<strong>on</strong>g>of</str<strong>on</strong>g> a Rule class can be c<strong>on</strong>nected with several instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Triplet<br />
by <str<strong>on</strong>g>the</str<strong>on</strong>g> property use, which is defined as:<br />
<br />
These are <str<strong>on</strong>g>the</str<strong>on</strong>g> properties that indicate which variable is to be observed for possible<br />
changes and what label should be added, removed or changed:
CHAPTER 4. THE SURVEY 62<br />
<br />
The instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Triplet class can be c<strong>on</strong>nected with predicates specifying a subject,<br />
a predicate and an object <str<strong>on</strong>g>of</str<strong>on</strong>g> a triple for matching. Each value <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se predicates,<br />
st<str<strong>on</strong>g>art</str<strong>on</strong>g>ing with ”$”, can be treated as a name <str<strong>on</strong>g>of</str<strong>on</strong>g> a variable, and <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>rs as resource URIs<br />
or literals.<br />
<br />
The soluti<strong>on</strong>s are compared by <str<strong>on</strong>g>the</str<strong>on</strong>g> values bound to <str<strong>on</strong>g>the</str<strong>on</strong>g> used variables: if a value is a<br />
resource, by its URI, if it is a literal, lexically, and in <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> an an<strong>on</strong>ymous resource,<br />
by <str<strong>on</strong>g>the</str<strong>on</strong>g> object value <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> triple with a predicate ”daml:<strong>on</strong>Property”. If <str<strong>on</strong>g>the</str<strong>on</strong>g>re is at least<br />
<strong>on</strong>e such soluti<strong>on</strong> in any <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> versi<strong>on</strong>s, <str<strong>on</strong>g>the</str<strong>on</strong>g> rule is applicable and its ‘label’ is added.<br />
If <str<strong>on</strong>g>the</str<strong>on</strong>g>re are soluti<strong>on</strong>s which <strong>on</strong>ly differ by <str<strong>on</strong>g>the</str<strong>on</strong>g> value <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> ’checkVar’ variable: <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
generated ’label’ is <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>e prepared by <str<strong>on</strong>g>the</str<strong>on</strong>g> literal, c<strong>on</strong>nected with a ’rule:<strong>on</strong>Changed’<br />
property to that rule instance. If <str<strong>on</strong>g>the</str<strong>on</strong>g>re are no soluti<strong>on</strong>s in <str<strong>on</strong>g>the</str<strong>on</strong>g> older versi<strong>on</strong>, <str<strong>on</strong>g>the</str<strong>on</strong>g>n for<br />
each ’extra’ soluti<strong>on</strong> that is left unmatched in <str<strong>on</strong>g>the</str<strong>on</strong>g> newer versi<strong>on</strong>, a ’label’ is generated<br />
from <str<strong>on</strong>g>the</str<strong>on</strong>g> literal, c<strong>on</strong>nected with a ’rule:<strong>on</strong>Added’ property to <str<strong>on</strong>g>the</str<strong>on</strong>g> rule. And in <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
unmatched soluti<strong>on</strong>s left from <str<strong>on</strong>g>the</str<strong>on</strong>g> older versi<strong>on</strong>, <str<strong>on</strong>g>the</str<strong>on</strong>g> rule label is generated with <str<strong>on</strong>g>the</str<strong>on</strong>g> literal<br />
c<strong>on</strong>nected to <str<strong>on</strong>g>the</str<strong>on</strong>g> rule with a rule:<strong>on</strong>Removed property.<br />
An example rule to detect <str<strong>on</strong>g>the</str<strong>on</strong>g> change <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> parent class follows:<br />
<br />
<br />
<br />
$X<br />
<br />
$Z<br />
<br />
<br />
$Z<br />
subClassOf.changed to $Z<br />
subClassOf.added to $Z<br />
subClassOf.removed to $Z<br />
<br />
Applicati<strong>on</strong> to use-cases When applied to two <strong>on</strong>tologies, RDFDiff will find structural<br />
similarities. It does not utilize semantic (S-Match) or linguistic (PROMPT) similarities.
CHAPTER 4. THE SURVEY 63<br />
The change-classificati<strong>on</strong> rules could be a powerful declarative syntax for <str<strong>on</strong>g>the</str<strong>on</strong>g> automatic<br />
detecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> changes or for <str<strong>on</strong>g>the</str<strong>on</strong>g> automatic creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> a mapping between two <strong>on</strong>tologies.<br />
Instance transformati<strong>on</strong> is also possible, but all <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> applicati<strong>on</strong>s require extensi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> allowed acti<strong>on</strong>s in <str<strong>on</strong>g>the</str<strong>on</strong>g> rules.<br />
Tool support The RDFDiff tool is web-based. It implements <str<strong>on</strong>g>the</str<strong>on</strong>g> RDFDiff algorithm,<br />
and in additi<strong>on</strong> it allows <strong>on</strong>e to specify some handy compare opti<strong>on</strong>s:<br />
• pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> namespaces treated as equal;<br />
• an ignore-list <str<strong>on</strong>g>of</str<strong>on</strong>g> properties (e.g. ignoring rdf:Comment).<br />
Figure 4.10: RDFDiff - an example<br />
Summary RDFDiff is a diff-like tool oriented to <str<strong>on</strong>g>the</str<strong>on</strong>g> comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> XML-serialized<br />
RDF(S) graphs. It could be used for change detecti<strong>on</strong> between two versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology,<br />
or for a comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> two arbitrary <strong>on</strong>tologies. Although not directly suited for<br />
mediati<strong>on</strong> use-cases, its change classificati<strong>on</strong> rules allow for applicati<strong>on</strong>s for automatic<br />
creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings or instance transformati<strong>on</strong>.<br />
4.1.8 OntoMerge<br />
Dou et al.[DMQ02] introduce an approach to <strong>on</strong>tology mediati<strong>on</strong> “<strong>on</strong>tology translati<strong>on</strong> by<br />
<strong>on</strong>tology merging and automated reas<strong>on</strong>ing”. In this approach, <strong>on</strong>tologies are merged by
CHAPTER 4. THE SURVEY 64<br />
taking <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> both <strong>on</strong>tologies, where all terms are separated through <str<strong>on</strong>g>the</str<strong>on</strong>g> differences<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> namespace. So-called Bridging Axioms are used to c<strong>on</strong>nect <str<strong>on</strong>g>the</str<strong>on</strong>g> overlapping p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies.<br />
In general, when merging <strong>on</strong>tologies, <strong>on</strong>e would ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r create a new namespace for<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology or import <strong>on</strong>e <strong>on</strong>tology into <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r, so that <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology<br />
uses <str<strong>on</strong>g>the</str<strong>on</strong>g> namespace <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> importing <strong>on</strong>tology. Having in <str<strong>on</strong>g>the</str<strong>on</strong>g> end an <strong>on</strong>tology which<br />
uses different namespaces in its definiti<strong>on</strong>s can be very c<strong>on</strong>fusing for <str<strong>on</strong>g>the</str<strong>on</strong>g> user, since an<br />
<strong>on</strong>tology is intended to be shared am<strong>on</strong>g multiple p<str<strong>on</strong>g>art</str<strong>on</strong>g>ies. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, <str<strong>on</strong>g>the</str<strong>on</strong>g> bridging<br />
axioms in <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology might also be very c<strong>on</strong>fusing for <str<strong>on</strong>g>the</str<strong>on</strong>g> user, since <str<strong>on</strong>g>the</str<strong>on</strong>g>y serve<br />
no o<str<strong>on</strong>g>the</str<strong>on</strong>g>r purpose than linking toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r related terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology. Thus, <str<strong>on</strong>g>the</str<strong>on</strong>g> merged<br />
<strong>on</strong>tology c<strong>on</strong>tains a lot <str<strong>on</strong>g>of</str<strong>on</strong>g> clutter, which makes <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology hard to understand and hard<br />
to use. The clutter in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g>: (1) terms with different namespaces, (2)<br />
similar and equivalent terms exist in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology and (3) bridging axioms between <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
related terms. These three factors impede usability and especially sharing <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology.<br />
On <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r hand, Dou et al.[DMQ02] does not propose to use <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tologies<br />
as such, but to merely use <str<strong>on</strong>g>the</str<strong>on</strong>g>m for three different tasks:<br />
1. Dataset translati<strong>on</strong> (cf. instance transformati<strong>on</strong> in [dBP04]). Dataset translati<strong>on</strong> is<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> problem <str<strong>on</strong>g>of</str<strong>on</strong>g> translating a set <str<strong>on</strong>g>of</str<strong>on</strong>g> data (instances) from <strong>on</strong>e representati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
o<str<strong>on</strong>g>the</str<strong>on</strong>g>r.<br />
2. <strong>Ontology</strong> extensi<strong>on</strong> generati<strong>on</strong>. The problem <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology extensi<strong>on</strong> generati<strong>on</strong> is<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> problem <str<strong>on</strong>g>of</str<strong>on</strong>g> generating an extensi<strong>on</strong> O2s, given two related <strong>on</strong>tologies O1 and<br />
O2 and an extensi<strong>on</strong> (sub<strong>on</strong>tology) O1s <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology O1. The example given by <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
authors is to generate a WSDL extensi<strong>on</strong> based <strong>on</strong> an OWL-S descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
corresp<strong>on</strong>ding Web Service.<br />
3. Querying different <strong>on</strong>tologies. This relates very much to <str<strong>on</strong>g>the</str<strong>on</strong>g> query rewriting described<br />
in [dBP04]. However, query rewriting is a technique for solving <str<strong>on</strong>g>the</str<strong>on</strong>g> problem<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> querying different <strong>on</strong>tologies, whereas Dou et al. [DMQ02] merely menti<strong>on</strong><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> problem.<br />
As we have also suggested in [dBP04], OntoMerge uses mappings between <strong>on</strong>tologies<br />
in order to enable <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong>. In fact, <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology translati<strong>on</strong> (except for <str<strong>on</strong>g>the</str<strong>on</strong>g> extensi<strong>on</strong><br />
generati<strong>on</strong>) can be seen as run-time mediati<strong>on</strong> [dBP04].<br />
[DMQ02] presents an internal representati<strong>on</strong> for <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies, called Web-PDDL,<br />
which is a typed first-order logic language. The import and export <str<strong>on</strong>g>of</str<strong>on</strong>g> DAML+OIL and<br />
OWL is already supported. Due to <str<strong>on</strong>g>the</str<strong>on</strong>g> fact that Web-PDDL can capture many different<br />
<strong>on</strong>tology languages, importers and exporters for o<str<strong>on</strong>g>the</str<strong>on</strong>g>r languages could be written as well.
CHAPTER 4. THE SURVEY 65<br />
Dataset translati<strong>on</strong> OntoMerge [DMQ02] performs dataset translati<strong>on</strong> in two distinct<br />
steps. First, given <str<strong>on</strong>g>the</str<strong>on</strong>g> source dataset (a set <str<strong>on</strong>g>of</str<strong>on</strong>g> facts) and <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology, all possible<br />
inferences are drawn from <str<strong>on</strong>g>the</str<strong>on</strong>g> source facts. Sec<strong>on</strong>dly, <str<strong>on</strong>g>the</str<strong>on</strong>g> results are projected <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
target vocabulary, retaining <strong>on</strong>ly <str<strong>on</strong>g>the</str<strong>on</strong>g> results expressed in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology.<br />
These two steps guarantee that a maximal translati<strong>on</strong> is performed, with respect to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
merged <strong>on</strong>tology and <str<strong>on</strong>g>the</str<strong>on</strong>g> source dataset.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g>ir practical evaluati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> system, <str<strong>on</strong>g>the</str<strong>on</strong>g> authors <strong>on</strong>ly work with very small<br />
datasets c<strong>on</strong>sisting <str<strong>on</strong>g>of</str<strong>on</strong>g> several thousand facts. The fact that <str<strong>on</strong>g>the</str<strong>on</strong>g>y use a <str<strong>on</strong>g>the</str<strong>on</strong>g>orem prover<br />
leaves open questi<strong>on</strong>s about scalability for large numbers <str<strong>on</strong>g>of</str<strong>on</strong>g> facts.<br />
<strong>Ontology</strong> extensi<strong>on</strong> generati<strong>on</strong> Say you have two related <strong>on</strong>tologies O1 and O2 and<br />
a sub<strong>on</strong>tology O1s <str<strong>on</strong>g>of</str<strong>on</strong>g> O1. It is now possible, using <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ships between <str<strong>on</strong>g>the</str<strong>on</strong>g> two<br />
<strong>on</strong>tologies, to automatically generate a sub<strong>on</strong>tology O2s <str<strong>on</strong>g>of</str<strong>on</strong>g> O2 which corresp<strong>on</strong>ds with<br />
O1s.<br />
The sub<strong>on</strong>tology O2s will c<strong>on</strong>tain new predicates with identifiers generated using<br />
skolem c<strong>on</strong>stants. The new predicates are made sub-predicates <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> existing predicates.<br />
The major disadvantage to this approach for <strong>on</strong>tology extensi<strong>on</strong> generati<strong>on</strong>, identified<br />
by <str<strong>on</strong>g>the</str<strong>on</strong>g> authors, is that <str<strong>on</strong>g>the</str<strong>on</strong>g> generated sub<strong>on</strong>tology <strong>on</strong>ly c<strong>on</strong>tains subproperty axioms,<br />
whereas many sub<strong>on</strong>tologies might be specified using general axioms.<br />
Querying through Different Ontologies Querying is d<strong>on</strong>e in OntoMerge by selecting<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology which merges <str<strong>on</strong>g>the</str<strong>on</strong>g> query <strong>on</strong>tology and <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology. Then, a<br />
query selecti<strong>on</strong> and reformulati<strong>on</strong> module (not described in detail) is used to select subqueries<br />
and reformulate <str<strong>on</strong>g>the</str<strong>on</strong>g> subqueries. Each subquery is executed <strong>on</strong> respective knowledge<br />
bases and <str<strong>on</strong>g>the</str<strong>on</strong>g> results are combined.<br />
In fact, what we call an <strong>Ontology</strong> mapping is very similar to a set <str<strong>on</strong>g>of</str<strong>on</strong>g> bridging axioms<br />
in [DMQ02]. However, we do not presume <str<strong>on</strong>g>the</str<strong>on</strong>g> source and target <strong>on</strong>tologies use <str<strong>on</strong>g>the</str<strong>on</strong>g> same<br />
language as <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping, whereas OntoMerge requires <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology to c<strong>on</strong>sist <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> source and target <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> bridging axioms.<br />
A major drawback <str<strong>on</strong>g>of</str<strong>on</strong>g> OntoMerge is that bridging axioms need to be written using a<br />
first-order language. Only very few people are familiar with <str<strong>on</strong>g>the</str<strong>on</strong>g> first-order logic.<br />
4.1.9 OMEN<br />
OMEN (<strong>Ontology</strong> Mapping Enhancer) [MNJ04] is a tool for describing mappings with<br />
probabilities and infer new mappings by means <str<strong>on</strong>g>of</str<strong>on</strong>g> Bayesian Network inference mechanisms.<br />
The motivati<strong>on</strong> for using Bayesian Networks for this purpose is that especially
CHAPTER 4. THE SURVEY 66<br />
when <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings are discovered automatically by heuristics or machine learning techniques,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>y are imprecise and attached with a certain uncertainty. Also experts are sometimes<br />
unsure about <str<strong>on</strong>g>the</str<strong>on</strong>g> exact match between <strong>on</strong>tological elements <str<strong>on</strong>g>of</str<strong>on</strong>g> different <strong>on</strong>tologies.<br />
This allows <str<strong>on</strong>g>the</str<strong>on</strong>g>m to assign some certainty rating to a match.<br />
The inference <str<strong>on</strong>g>of</str<strong>on</strong>g> new mappings happens by means <str<strong>on</strong>g>of</str<strong>on</strong>g> meta-rules based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> semantics<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology relati<strong>on</strong>s that expresses how each mapping affects o<str<strong>on</strong>g>the</str<strong>on</strong>g>r related<br />
mappings. These meta-rules are used by <str<strong>on</strong>g>the</str<strong>on</strong>g> Bayesian Network.<br />
Within this tool, a simple <strong>on</strong>tology model is assumed. Similar to RDF Schema, <strong>on</strong>ly<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> following comp<strong>on</strong>ents are used to express <strong>on</strong>tologies:<br />
• Classes are c<strong>on</strong>cepts in <str<strong>on</strong>g>the</str<strong>on</strong>g> domain that are organized in a hierarchy <str<strong>on</strong>g>of</str<strong>on</strong>g> subclass<br />
superclass relati<strong>on</strong>ships with multiple inheritance.<br />
• Properties describe attributes <str<strong>on</strong>g>of</str<strong>on</strong>g> classes and relati<strong>on</strong>ships between classes. Properties<br />
are c<strong>on</strong>sidered to have <strong>on</strong>e or more domains and <strong>on</strong>e or more ranges. Domains<br />
are classes to which <str<strong>on</strong>g>the</str<strong>on</strong>g> property can be applied, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> domains that<br />
are possible is taken as final domain. Ranges restrict <str<strong>on</strong>g>the</str<strong>on</strong>g> classes for <str<strong>on</strong>g>the</str<strong>on</strong>g> values <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> property, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> intersecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> ranges that are possible is taken as final<br />
range.<br />
C<strong>on</strong>structi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Bayesian Network The nodes in <str<strong>on</strong>g>the</str<strong>on</strong>g> Bayesian Network corresp<strong>on</strong>d<br />
to individual pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> matches, i.e. matches between individual pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> classes<br />
or properties from different <strong>on</strong>tologies. Arrows in <str<strong>on</strong>g>the</str<strong>on</strong>g> Bayesian Network represent <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
influences between <str<strong>on</strong>g>the</str<strong>on</strong>g> nodes in <str<strong>on</strong>g>the</str<strong>on</strong>g> Bayesian Network. Each node is associated with a<br />
c<strong>on</strong>diti<strong>on</strong>al probability table (CPT) that represents <str<strong>on</strong>g>the</str<strong>on</strong>g> influences <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> parents <str<strong>on</strong>g>of</str<strong>on</strong>g> a node<br />
that affect <str<strong>on</strong>g>the</str<strong>on</strong>g> node itself.<br />
Note that not all possible pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> classes or properties can be represented as nodes<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> Bayesian Network because <str<strong>on</strong>g>the</str<strong>on</strong>g>re are too many pairs and <str<strong>on</strong>g>the</str<strong>on</strong>g> inference would not<br />
be efficient. Therefore, <str<strong>on</strong>g>the</str<strong>on</strong>g> Bayesian Network is pruned. For details <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> pruning see<br />
[MNJ04]. Also <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g> parents that each nodes has is adjusted because too many<br />
parents significantly increase <str<strong>on</strong>g>the</str<strong>on</strong>g> computati<strong>on</strong>s that are necessary and make <str<strong>on</strong>g>the</str<strong>on</strong>g> inference<br />
intractable. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, all cycles that may occur with <str<strong>on</strong>g>the</str<strong>on</strong>g> arrows are broken by rejecting<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> edges from <str<strong>on</strong>g>the</str<strong>on</strong>g> parents whose matching informati<strong>on</strong> is minimum.<br />
For inferencing, <str<strong>on</strong>g>the</str<strong>on</strong>g> nodes that have an initial probability that is above a given threshold<br />
are c<strong>on</strong>sidered to be p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> evidence. The CPTs are computed by means <str<strong>on</strong>g>of</str<strong>on</strong>g> a set<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> generic meta-rules that enables <str<strong>on</strong>g>the</str<strong>on</strong>g> automatic generati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> CPTs for each p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular<br />
pair <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies. The implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> OMEN system is parametrized with respect<br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> meta-rules. Meta-rules can be added and removed in order to enable a better<br />
evaluati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> which <strong>on</strong>e work best for a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular knowledge model.<br />
Afterwards <str<strong>on</strong>g>the</str<strong>on</strong>g> inference <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Bayesian Network can be st<str<strong>on</strong>g>art</str<strong>on</strong>g>ed and <str<strong>on</strong>g>the</str<strong>on</strong>g> output is a<br />
new set <str<strong>on</strong>g>of</str<strong>on</strong>g> matches.
CHAPTER 4. THE SURVEY 67<br />
The set <str<strong>on</strong>g>of</str<strong>on</strong>g> meta-rules used in <str<strong>on</strong>g>the</str<strong>on</strong>g> implementati<strong>on</strong> are for example <str<strong>on</strong>g>the</str<strong>on</strong>g> following metarules:<br />
• Say <str<strong>on</strong>g>the</str<strong>on</strong>g>re are two c<strong>on</strong>cepts C and C ′ that match and <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a relati<strong>on</strong>ship between<br />
C and ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r c<strong>on</strong>cept C1 in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology O and a relati<strong>on</strong>ship between C ′ and C ′ 1 in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology O ′ . Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore <str<strong>on</strong>g>the</str<strong>on</strong>g>se two relati<strong>on</strong>ships match. Then, <str<strong>on</strong>g>the</str<strong>on</strong>g> probability<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> match between C1 and C ′ 1 is increased.<br />
• Say <str<strong>on</strong>g>the</str<strong>on</strong>g>re are two c<strong>on</strong>cepts C and C ′ that match and <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a relati<strong>on</strong>ship between<br />
C and ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r c<strong>on</strong>cept C1 in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology O and a relati<strong>on</strong>ship between C ′ and C ′ 1<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology O ′ . Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore <str<strong>on</strong>g>the</str<strong>on</strong>g>se two relati<strong>on</strong>ships do not match. Then, <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
probability <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> match between C1 and C ′ 1 is decreased.<br />
• Say <str<strong>on</strong>g>the</str<strong>on</strong>g>re are two properties that match and each <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>m has a single range. Then,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> probability <str<strong>on</strong>g>of</str<strong>on</strong>g> a match between <str<strong>on</strong>g>the</str<strong>on</strong>g> classes that represent <str<strong>on</strong>g>the</str<strong>on</strong>g> ranges is increased.<br />
• Say <str<strong>on</strong>g>the</str<strong>on</strong>g>re are two properties that match and <str<strong>on</strong>g>the</str<strong>on</strong>g> first property has a range that is a<br />
uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> classes C1 and C2. The o<str<strong>on</strong>g>the</str<strong>on</strong>g>r property has a single range corresp<strong>on</strong>ding<br />
to C ′ . Then, <str<strong>on</strong>g>the</str<strong>on</strong>g> probability that C1 is a specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> C ′ can be increased.<br />
Analogously, <str<strong>on</strong>g>the</str<strong>on</strong>g> probability that C2 is a specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> C ′ can be increased.<br />
• Say <str<strong>on</strong>g>the</str<strong>on</strong>g>re are mappings between superc<strong>on</strong>cepts <str<strong>on</strong>g>of</str<strong>on</strong>g> two certain c<strong>on</strong>cepts, each bel<strong>on</strong>ging<br />
to a different <strong>on</strong>tology, and all <str<strong>on</strong>g>the</str<strong>on</strong>g> siblings. Then, <str<strong>on</strong>g>the</str<strong>on</strong>g> probability <str<strong>on</strong>g>of</str<strong>on</strong>g> a<br />
match between <str<strong>on</strong>g>the</str<strong>on</strong>g> remaining c<strong>on</strong>cepts is increased.<br />
Note that in <str<strong>on</strong>g>the</str<strong>on</strong>g> OMEN system probabilistic influences are combined as follows: If<br />
a node in a Bayesian Network has two parents, <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>diti<strong>on</strong>al probability tables are<br />
combined for <str<strong>on</strong>g>the</str<strong>on</strong>g> child using <str<strong>on</strong>g>the</str<strong>on</strong>g> assumpti<strong>on</strong> that <str<strong>on</strong>g>the</str<strong>on</strong>g> two parents are independent. I.e.<br />
P (N|P1, P2) = P (N|P1)P (N|P2)<br />
When <str<strong>on</strong>g>the</str<strong>on</strong>g> match <str<strong>on</strong>g>of</str<strong>on</strong>g> two pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> parents influences each o<str<strong>on</strong>g>the</str<strong>on</strong>g>r, this assumpti<strong>on</strong> is<br />
not true. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> system obtained encouraging results even with this simplifying<br />
assumpti<strong>on</strong>.<br />
Experiments were c<strong>on</strong>ducted in order to see whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g> inference <str<strong>on</strong>g>of</str<strong>on</strong>g> new mappings<br />
works. For this purpose, two <strong>on</strong>tologies expressed in RDF using RDF Schema have been<br />
used. The experiments showed that new mappings can be inferred by OMEN. E.g. by<br />
giving <strong>on</strong>ly 3 out <str<strong>on</strong>g>of</str<strong>on</strong>g> 11 matches, up to 7 missing matches were able to be generated.<br />
4.1.10 WSMT Data mediati<strong>on</strong> module<br />
The Web Services Modelling Toolkit (WSMT) [Ker05] has different modules to model<br />
Semantic Web Services based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> WSMO framework [DR05]. The data mediati<strong>on</strong><br />
module [Moc05] presented in this secti<strong>on</strong> c<strong>on</strong>tains three sub-modules or comp<strong>on</strong>ents: a<br />
graphical user interface for defining <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings, a comp<strong>on</strong>ent that reads <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings
CHAPTER 4. THE SURVEY 68<br />
from <str<strong>on</strong>g>the</str<strong>on</strong>g> storage and generates <str<strong>on</strong>g>the</str<strong>on</strong>g> appropriate mapping rules and an envir<strong>on</strong>ment that<br />
provides <str<strong>on</strong>g>the</str<strong>on</strong>g> means for executing <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping rules. The first module represents <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
design-time comp<strong>on</strong>ent while <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d two modules represent <str<strong>on</strong>g>the</str<strong>on</strong>g> run-time comp<strong>on</strong>ent.<br />
Figure4.11 shows how <str<strong>on</strong>g>the</str<strong>on</strong>g>se comp<strong>on</strong>ents interact.<br />
Overview <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Data Mediati<strong>on</strong> Module<br />
Design Time Comp<strong>on</strong>ent<br />
Figure 4.11: Overview <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> WSMT Data Mediati<strong>on</strong> Module<br />
The mapping tools <str<strong>on</strong>g>of</str<strong>on</strong>g>fer graphical interfaces in order to assist <str<strong>on</strong>g>the</str<strong>on</strong>g> user in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping<br />
creati<strong>on</strong> and to reduce his or her effort to simple choices and validati<strong>on</strong>s.<br />
The graphical interface is built using Java 1.5 13 and <str<strong>on</strong>g>the</str<strong>on</strong>g> default <strong>on</strong>tology language<br />
supported is WSML. Future versi<strong>on</strong>s are expected to propose various wrappers that could<br />
transliterate <strong>on</strong>tologies expressed in o<str<strong>on</strong>g>the</str<strong>on</strong>g>r languages into WSML (a simplified <strong>on</strong>e for<br />
Flora-2 already exists). The object model behind <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping tool is fully compatible<br />
with <str<strong>on</strong>g>the</str<strong>on</strong>g> WSMO API and WSMO4J (and this applies to <str<strong>on</strong>g>the</str<strong>on</strong>g> runtime comp<strong>on</strong>ent as well).<br />
The WSMO API provides a set <str<strong>on</strong>g>of</str<strong>on</strong>g> interface for manipulating WSMO entities, including<br />
<strong>on</strong>tologies. WSMO4J provides an implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se interfaces and a parser for<br />
WSML 14 . The mapping tool was integrated as a plug-in in <str<strong>on</strong>g>the</str<strong>on</strong>g> Web Service Modelling<br />
13 see http://java.sun.com for more informati<strong>on</strong><br />
14 for more details see https://sourceforge.net/projects/wsmo4j
CHAPTER 4. THE SURVEY 69<br />
Toolkit, which also <str<strong>on</strong>g>of</str<strong>on</strong>g>fers as plug-ins a WSML editor and a WSMX invoker (see Figure<br />
4.12). The editor can be used by <str<strong>on</strong>g>the</str<strong>on</strong>g> domain expert to operate <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies <str<strong>on</strong>g>the</str<strong>on</strong>g>y are<br />
mapping and <str<strong>on</strong>g>the</str<strong>on</strong>g> framework <str<strong>on</strong>g>of</str<strong>on</strong>g>fers basic synchr<strong>on</strong>izati<strong>on</strong> functi<strong>on</strong>s such as reloading <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>tologies into <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping tool or invalidating mapping rules affected by changes in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>tologies. The WSML invoker can be used to test mappings with run-time comp<strong>on</strong>ents<br />
deployed as Web Services, directly from <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping tool (more details about this are<br />
provided in secti<strong>on</strong> 4.1.10).<br />
Figure 4.12: Screenshots <str<strong>on</strong>g>of</str<strong>on</strong>g> WSMT Plug-ins (from back to fr<strong>on</strong>t: <str<strong>on</strong>g>the</str<strong>on</strong>g> Preferences Panel,<br />
WSMX Invoker, WSML Editor and Mapping Tool)<br />
As <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping tool and WSMT are two <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> surrounding efforts around WSMO<br />
and WSMX, <str<strong>on</strong>g>the</str<strong>on</strong>g>y are p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> WSMX open source project 15 - Windows and Linux<br />
installers for WSMT can be found <strong>on</strong> SourceForge 16 .<br />
15 see https://sourceforge.net/projects/wsmx<br />
16 downloads available at: http://sourceforge.net/project/showfiles.php?group id=113321
CHAPTER 4. THE SURVEY 70<br />
Map mediate(<strong>Ontology</strong> source<strong>Ontology</strong>,<br />
<strong>Ontology</strong> target<strong>Ontology</strong>,<br />
Set data)<br />
Transforms a set <str<strong>on</strong>g>of</str<strong>on</strong>g> source <strong>on</strong>tology<br />
instances into instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target<br />
<strong>on</strong>tology.<br />
List mediate(<strong>Ontology</strong> source<strong>Ontology</strong>,<br />
<strong>Ontology</strong> target<strong>Ontology</strong>,<br />
Identifiable data)<br />
Transforms a given source <strong>on</strong>tology<br />
instance into instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target<br />
<strong>on</strong>tology.<br />
StringBuffer mediate(<strong>Ontology</strong> source<strong>Ontology</strong>,<br />
<strong>Ontology</strong> target<strong>Ontology</strong>,<br />
StringBuffer payload)<br />
Transforms source <strong>on</strong>tology instances<br />
into instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology.<br />
The payload represents a WSML document<br />
c<strong>on</strong>taining <str<strong>on</strong>g>the</str<strong>on</strong>g> instances to be mediated.<br />
It will be parsed and after <str<strong>on</strong>g>the</str<strong>on</strong>g> mediati<strong>on</strong><br />
takes places a new WSML document is created<br />
c<strong>on</strong>taining <str<strong>on</strong>g>the</str<strong>on</strong>g> target instances.<br />
Table 4.1: Interfaces <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Run-time Mediator Method Summary<br />
The mappings are stored in an external storage, in this case a relati<strong>on</strong>al database, from<br />
where <str<strong>on</strong>g>the</str<strong>on</strong>g>y can be loaded by <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping rules generator module. Also, by means <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
this graphical interface, <str<strong>on</strong>g>the</str<strong>on</strong>g> user can load existing mappings from <str<strong>on</strong>g>the</str<strong>on</strong>g> external storage for<br />
fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r refinements or as support in computing <str<strong>on</strong>g>the</str<strong>on</strong>g> suggesti<strong>on</strong>s.<br />
Run-time comp<strong>on</strong>ent<br />
The Mapping Rules Creator implements <str<strong>on</strong>g>the</str<strong>on</strong>g> grounding mechanism described in Secti<strong>on</strong><br />
4.2 while <str<strong>on</strong>g>the</str<strong>on</strong>g> Rule Executi<strong>on</strong> Envir<strong>on</strong>ment has <str<strong>on</strong>g>the</str<strong>on</strong>g> role <str<strong>on</strong>g>of</str<strong>on</strong>g> executing <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping rules<br />
against <str<strong>on</strong>g>the</str<strong>on</strong>g> incoming source <strong>on</strong>tology instances. The Run-time comp<strong>on</strong>ent is designed to<br />
be p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> WSMX architecture [MZ05], and as a c<strong>on</strong>sequence it <str<strong>on</strong>g>of</str<strong>on</strong>g>fers well-defined<br />
interfaces explaining how it can be invoked. These interfaces are presented in Table 4.1.<br />
The first two methods refer to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tological entities in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> WSMO API<br />
objects, while <str<strong>on</strong>g>the</str<strong>on</strong>g> last method manipulates WSML documents embedded in StringBuffer<br />
objects.<br />
Outside <str<strong>on</strong>g>the</str<strong>on</strong>g> WSMX architecture <str<strong>on</strong>g>the</str<strong>on</strong>g> Run-time comp<strong>on</strong>ent can be used in <str<strong>on</strong>g>the</str<strong>on</strong>g> following
CHAPTER 4. THE SURVEY 71<br />
ways:<br />
• as a standal<strong>on</strong>e applicati<strong>on</strong> able to c<strong>on</strong>nect to <str<strong>on</strong>g>the</str<strong>on</strong>g> provided mapping storage and<br />
to perform mediati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> instances provided as WSML documents. It <str<strong>on</strong>g>of</str<strong>on</strong>g>fers a small<br />
graphical interface where <str<strong>on</strong>g>the</str<strong>on</strong>g> user can set <str<strong>on</strong>g>the</str<strong>on</strong>g> source and <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology, provide<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> data to be mediated and retrieve <str<strong>on</strong>g>the</str<strong>on</strong>g> mediated data.<br />
Figure 4.13: The Graphical Interface <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Stand-al<strong>on</strong>e Run-time Mediator<br />
• as a deployed Web Service that can be invoked with <str<strong>on</strong>g>the</str<strong>on</strong>g> source and target <strong>on</strong>tology<br />
IDs and data to be mediated and that returns <str<strong>on</strong>g>the</str<strong>on</strong>g> mediated data. The mapping tool<br />
can directly invoke such Web Services by using <str<strong>on</strong>g>the</str<strong>on</strong>g> WSMT Invoker.
CHAPTER 4. THE SURVEY 72<br />
4.1.11 DOME mapping module<br />
The Distributed <strong>Ontology</strong> Management Envir<strong>on</strong>ment(DOME [Hen05]) 17 is an <strong>on</strong>going<br />
project aiming at building a suite <str<strong>on</strong>g>of</str<strong>on</strong>g> tools to manage distributed <strong>on</strong>tologies. Proposed as<br />
a stand-al<strong>on</strong>e applicati<strong>on</strong> or as an Eclipse plug-in, it includes three modules: an <strong>on</strong>tology<br />
editor, a mapping editor, and a versi<strong>on</strong>ing editor. DOME is developed using c<strong>on</strong>cepts from<br />
Human Interface Design and <strong>on</strong>tology engineering to facilitate <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology management<br />
in being easily usable and intuitive. Figure 4.14 present a screen-shot <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> s<str<strong>on</strong>g>of</str<strong>on</strong>g>tware.<br />
Figure 4.14: Screenshot <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Distributed <strong>Ontology</strong> Management Envir<strong>on</strong>ment(DOME)<br />
Figure 4.15: DOME screenshot<br />
Dome is based <strong>on</strong> WSMO4j to editing WSML <strong>on</strong>tologies.<br />
The mapping editor is based <strong>on</strong> an abstract <strong>on</strong>tology mapping language[SdB05] and<br />
presents an original mapping document tree view. It allows mapping between classes,<br />
attributes, relati<strong>on</strong>s and instance, and exploits <str<strong>on</strong>g>the</str<strong>on</strong>g> capability <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> abstract mapping language<br />
to relate many to many entities by operator selecti<strong>on</strong>. It is to our knowledge <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>ly mapping tool providing this capability.<br />
17 The s<str<strong>on</strong>g>of</str<strong>on</strong>g>tware is available at http://dome.sourceforge.net
CHAPTER 4. THE SURVEY 73<br />
In order to realize a mapping, <str<strong>on</strong>g>the</str<strong>on</strong>g> user select <strong>on</strong>e or many entities in <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tology,<br />
and drags’n’drops <str<strong>on</strong>g>the</str<strong>on</strong>g>m to <str<strong>on</strong>g>the</str<strong>on</strong>g> entities in <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tologies. If many entities<br />
were selected, <str<strong>on</strong>g>the</str<strong>on</strong>g>y can be combined via a selected operator. It is in <str<strong>on</strong>g>the</str<strong>on</strong>g>n possible to<br />
map, for example, <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>juncti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> male and female classes to <str<strong>on</strong>g>the</str<strong>on</strong>g> pers<strong>on</strong> class using a<br />
unique mapping rule. The mapping document tree editor gives <str<strong>on</strong>g>the</str<strong>on</strong>g> possibility <str<strong>on</strong>g>of</str<strong>on</strong>g> adding<br />
c<strong>on</strong>diti<strong>on</strong>s to <str<strong>on</strong>g>the</str<strong>on</strong>g> rules, for example mapping pers<strong>on</strong> to female require <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>diti<strong>on</strong> that<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> gender attribute <str<strong>on</strong>g>of</str<strong>on</strong>g> pers<strong>on</strong> is equal to ”female”. Figure 4.16 presents <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping<br />
document tree editor.<br />
Figure 4.16: Mapping document tree editor<br />
It is interesting to note that <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping module is fully integrated in <str<strong>on</strong>g>the</str<strong>on</strong>g> whole <strong>on</strong>tology<br />
management envir<strong>on</strong>ment, to realize <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping <str<strong>on</strong>g>the</str<strong>on</strong>g> user <strong>on</strong>ly has to open ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
<strong>on</strong>tology and can directly relate <str<strong>on</strong>g>the</str<strong>on</strong>g> entities.
CHAPTER 4. THE SURVEY 74<br />
4.2 Integrated Systems<br />
4.2.1 InfoSleuth<br />
Summary InfoSleuth [FNPB99, NFK + 00] is an agent-based system, which supports<br />
c<strong>on</strong>structi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> complex <strong>on</strong>tologies from smaller comp<strong>on</strong>ent <strong>on</strong>tologies so that tools tailored<br />
for <strong>on</strong>e comp<strong>on</strong>ent <strong>on</strong>tology can be used in many applicati<strong>on</strong> domains. The purpose<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> system is to provide an interface to very dynamic data sources which can appear and<br />
disappear from <str<strong>on</strong>g>the</str<strong>on</strong>g> system at any given time. Examples <str<strong>on</strong>g>of</str<strong>on</strong>g> reused <strong>on</strong>tologies include units<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> measure, chemistry knowledge, geographic metadata, and so <strong>on</strong>. Mapping is explicitly<br />
specified am<strong>on</strong>g <str<strong>on</strong>g>the</str<strong>on</strong>g>se <strong>on</strong>tologies as relati<strong>on</strong>ships between c<strong>on</strong>cepts in <strong>on</strong>e <strong>on</strong>tology and<br />
related c<strong>on</strong>cepts in o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tologies.<br />
All mappings between <strong>on</strong>tologies are maintained by a special class <str<strong>on</strong>g>of</str<strong>on</strong>g> agents known<br />
as resource agents. A resource agent encapsulates a set <str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong> about <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
mapping rules, and presents that informati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r agents in c<strong>on</strong>cepts <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e<br />
or more <strong>on</strong>tologies (called domain <strong>on</strong>tologies). All mapping is encapsulated within <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
resource agents. Ontologies are represented in OKBC (Open Knowledge Base C<strong>on</strong>nectivity)<br />
[CFF + 98] format and stored in an OKBC server by a special class <str<strong>on</strong>g>of</str<strong>on</strong>g> agents called<br />
<strong>on</strong>tology agents, which provide <strong>on</strong>tology specificati<strong>on</strong>s to users (for request formulati<strong>on</strong>)<br />
and to resource agents (for mapping).<br />
The InfoSleuth architecture [NFK + 00] (Figure 4.17) c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> a number <str<strong>on</strong>g>of</str<strong>on</strong>g> different<br />
types <str<strong>on</strong>g>of</str<strong>on</strong>g> agents. User agents and resource agents are <str<strong>on</strong>g>the</str<strong>on</strong>g> main agents in <str<strong>on</strong>g>the</str<strong>on</strong>g> system.<br />
User agents request informati<strong>on</strong> to fulfil <str<strong>on</strong>g>the</str<strong>on</strong>g> user’s informati<strong>on</strong> needs and resource agents<br />
provide that informati<strong>on</strong>. The remaining agents in <str<strong>on</strong>g>the</str<strong>on</strong>g> system provide <str<strong>on</strong>g>the</str<strong>on</strong>g> “glue” (or<br />
mediati<strong>on</strong>) between <str<strong>on</strong>g>the</str<strong>on</strong>g> two.<br />
• The user agents act <strong>on</strong> behalf <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> user and maintain <str<strong>on</strong>g>the</str<strong>on</strong>g> user’s state. They provide<br />
a system interface that enables users to communicate with <str<strong>on</strong>g>the</str<strong>on</strong>g> system.<br />
• The resource agents wrap and activate databases and o<str<strong>on</strong>g>the</str<strong>on</strong>g>r repositories <str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong>.<br />
They translate queries and data stored in external repositories between <str<strong>on</strong>g>the</str<strong>on</strong>g>ir<br />
local forms and <str<strong>on</strong>g>the</str<strong>on</strong>g>ir InfoSleuth forms. There are resource agents for different types<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> data sources, including relati<strong>on</strong>al databases, flat files, and images.<br />
• Service agents provide internal informati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> operati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> agent system.<br />
Service agents include Broker agents, which collectively maintain <str<strong>on</strong>g>the</str<strong>on</strong>g> informati<strong>on</strong><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> agents advertise about <str<strong>on</strong>g>the</str<strong>on</strong>g>mselves, <strong>Ontology</strong> agents, which maintain a knowledge<br />
base <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> different <strong>on</strong>tologies used for specifying requests, and M<strong>on</strong>itor<br />
agents, which m<strong>on</strong>itor <str<strong>on</strong>g>the</str<strong>on</strong>g> operati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> system.<br />
• Query and analysis agents fuse and/or analyze informati<strong>on</strong> from <strong>on</strong>e or more resources<br />
into single (<strong>on</strong>e-time) results. Query and analysis agents include Multiresource<br />
query agents, which process queries that span multiple data sources, De-
CHAPTER 4. THE SURVEY 75<br />
Figure 4.17: The InfoSleuth architecture<br />
viati<strong>on</strong> detecti<strong>on</strong> agents, which m<strong>on</strong>itor streams <str<strong>on</strong>g>of</str<strong>on</strong>g> data to detect deviati<strong>on</strong>s, and<br />
o<str<strong>on</strong>g>the</str<strong>on</strong>g>r data mining agents.<br />
Multi-resource query agents query multiple heterogeneous resources. The queries<br />
posed to <str<strong>on</strong>g>the</str<strong>on</strong>g> agent are specified in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> some domain <strong>on</strong>tology. In InfoSleuth,<br />
applicati<strong>on</strong>s can use several domain <strong>on</strong>tologies. However, a query is always posed<br />
over <strong>on</strong>e domain-specific <strong>on</strong>tology.<br />
• Planning and temporal agents guide <str<strong>on</strong>g>the</str<strong>on</strong>g> request through some processing which<br />
may take place over a period <str<strong>on</strong>g>of</str<strong>on</strong>g> time, such as a l<strong>on</strong>g-term plan, a workflow, or <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
detecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> complex events. Planning and temporal agents include Subscripti<strong>on</strong><br />
agents, which m<strong>on</strong>itor how a set <str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong> (in a data source) changes over<br />
time, Task planning and executi<strong>on</strong> agents plan <str<strong>on</strong>g>the</str<strong>on</strong>g> processing <str<strong>on</strong>g>of</str<strong>on</strong>g> user requests in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
system, and Sentinel agents m<strong>on</strong>itor <str<strong>on</strong>g>the</str<strong>on</strong>g> informati<strong>on</strong> and event stream for complex<br />
events.<br />
• Value mapping agents provide value mappings, i.e. mappings between values,<br />
am<strong>on</strong>g equivalent representati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> same informati<strong>on</strong>.<br />
InfoSleuth uses a variant <str<strong>on</strong>g>of</str<strong>on</strong>g> SQL in which a query c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> a select, from and where<br />
clause as its query language. Functi<strong>on</strong>s are allowed in <str<strong>on</strong>g>the</str<strong>on</strong>g> select and where clauses and<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> syntax is c<strong>on</strong>sistent with that used in popular relati<strong>on</strong>al database management systems.<br />
For <str<strong>on</strong>g>the</str<strong>on</strong>g> user queries, a layer <strong>on</strong> top <str<strong>on</strong>g>of</str<strong>on</strong>g> this query language has been developed,
CHAPTER 4. THE SURVEY 76<br />
called Template-based Query Markup Language (TQML), which uses templates and materialized<br />
views to aid <str<strong>on</strong>g>the</str<strong>on</strong>g> user in creating queries.<br />
When agents come <strong>on</strong>line <str<strong>on</strong>g>the</str<strong>on</strong>g>y advertise <str<strong>on</strong>g>the</str<strong>on</strong>g>ir capabilities to a specific broker agent<br />
in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> InfoSleuth <strong>on</strong>tology. This <strong>on</strong>tology is a special <strong>on</strong>tology used for advertisement<br />
and querying <str<strong>on</strong>g>of</str<strong>on</strong>g> agents. When a query is posed to a broker agent, <str<strong>on</strong>g>the</str<strong>on</strong>g> brokering<br />
process is initiated. First, syntactic matching is d<strong>on</strong>e for example to, determine which resource<br />
agents speak <str<strong>on</strong>g>the</str<strong>on</strong>g> desired language. The semantic matching is d<strong>on</strong>e in order to find<br />
out which resources c<strong>on</strong>tain informati<strong>on</strong> about <str<strong>on</strong>g>the</str<strong>on</strong>g> desired c<strong>on</strong>cepts. Finally, pragmatic<br />
matching is d<strong>on</strong>e to restrict <str<strong>on</strong>g>the</str<strong>on</strong>g> set <str<strong>on</strong>g>of</str<strong>on</strong>g> resources to those that, for example, have <str<strong>on</strong>g>the</str<strong>on</strong>g> correct<br />
access permissi<strong>on</strong>s.<br />
Resource agents in InfoSleuth functi<strong>on</strong> as a wrapper <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying data source. A<br />
resource agent advertises <str<strong>on</strong>g>the</str<strong>on</strong>g> p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> overall domain <strong>on</strong>tology that it supports, advertises<br />
its query capabilities and does <str<strong>on</strong>g>the</str<strong>on</strong>g> query rewriting and transforms <str<strong>on</strong>g>the</str<strong>on</strong>g> retrieved data<br />
to facts <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> domain <strong>on</strong>tology.<br />
Mappings between different value domains are encapsulated in value mapping agents,<br />
which perform simple and complex mappings between domains. Examples <str<strong>on</strong>g>of</str<strong>on</strong>g> complex<br />
mappings are sophisticated functi<strong>on</strong>s (e.g. differences in time intervals) and incorporating<br />
values from (multiple) external <strong>on</strong>tologies.<br />
The executi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> queries is d<strong>on</strong>e by <str<strong>on</strong>g>the</str<strong>on</strong>g> query agent. This query agent decomposes<br />
a query into a number <str<strong>on</strong>g>of</str<strong>on</strong>g> subqueries, <strong>on</strong>e for each resource agent involved in <str<strong>on</strong>g>the</str<strong>on</strong>g> query.<br />
Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, it creates a number <str<strong>on</strong>g>of</str<strong>on</strong>g> global queries for fusing <str<strong>on</strong>g>the</str<strong>on</strong>g> results <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> subqueries<br />
in order not to have redundancy in <str<strong>on</strong>g>the</str<strong>on</strong>g> overall query result.<br />
<strong>Ontology</strong> Languages <strong>Ontology</strong> agents which provide an OKBC interface to <str<strong>on</strong>g>the</str<strong>on</strong>g> knowledge<br />
base can all be c<strong>on</strong>nected to <str<strong>on</strong>g>the</str<strong>on</strong>g> InfoSleuth agent system. All <strong>on</strong>tologies within<br />
InfoSleuth are expressed using <str<strong>on</strong>g>the</str<strong>on</strong>g> OKBC knowledge model. Each resource agent must<br />
wrap an external informati<strong>on</strong> source and provide a mapping with <str<strong>on</strong>g>the</str<strong>on</strong>g> domain <strong>on</strong>tologies<br />
currently in use in <str<strong>on</strong>g>the</str<strong>on</strong>g> InfoSleuth system.<br />
Mapping Language [NFK + 00] reports no mapping between <strong>on</strong>tologies in InfoSleuth.<br />
In fact, this was seen as future work. However, because InfoSleuth is mostly a data integrati<strong>on</strong><br />
system, it is more relevant that a mapping between data sources and <str<strong>on</strong>g>the</str<strong>on</strong>g> domain<br />
<strong>on</strong>tologies is possible. InfoSleuth does not provide a mapping language, but does provide<br />
a number <str<strong>on</strong>g>of</str<strong>on</strong>g> Java templates, which can be used for <str<strong>on</strong>g>the</str<strong>on</strong>g> development <str<strong>on</strong>g>of</str<strong>on</strong>g> wrappers, which<br />
c<strong>on</strong>tain a procedural mapping between <str<strong>on</strong>g>the</str<strong>on</strong>g> data schema and <str<strong>on</strong>g>the</str<strong>on</strong>g> domain <strong>on</strong>tologies in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
agent system. An important point here is that it is possible to map to multiple domain<br />
<strong>on</strong>tologies and it would be very interesting to combine this with actual mappings between<br />
<strong>on</strong>tologies, as is d<strong>on</strong>e in ONION [MWK00], for example.
CHAPTER 4. THE SURVEY 77<br />
Mapping Patterns Although some aid in <str<strong>on</strong>g>the</str<strong>on</strong>g> creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings through <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
Java templates is <str<strong>on</strong>g>of</str<strong>on</strong>g>fered to <str<strong>on</strong>g>the</str<strong>on</strong>g> user, <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no c<strong>on</strong>cept <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping patterns in InfoSleuth.<br />
Extensi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> InfoSleuth, which would enable mapping between <strong>on</strong>tologies, would<br />
benefit from <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping patterns. However, we are not aware <str<strong>on</strong>g>of</str<strong>on</strong>g> any c<strong>on</strong>tinuati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> work <strong>on</strong> InfoSleuth after <str<strong>on</strong>g>the</str<strong>on</strong>g> work reported in [NFK + 00].<br />
Automati<strong>on</strong> Support There is no automati<strong>on</strong> support in creating mappings between<br />
data schemas and <strong>on</strong>tologies. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> query rewriting and data fusi<strong>on</strong> is completely<br />
automated, based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> data schemas and <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology. A query<br />
written in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> a domain <strong>on</strong>tology is automatically decomposed in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> resources,<br />
and after executi<strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> results are automatically fused by a different decompositi<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> original query.<br />
Applicability to Use Cases As stated above, <str<strong>on</strong>g>the</str<strong>on</strong>g> resource agents take care <str<strong>on</strong>g>of</str<strong>on</strong>g> transforming<br />
data from <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying sources to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> system and also<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> rewriting <str<strong>on</strong>g>the</str<strong>on</strong>g> query in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> data schema.<br />
The querying agent fuses query results from different sources in order to remove redundancies.<br />
The fusi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> query results is based <strong>on</strong> a different decompositi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> user<br />
query, which defines a uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> query results and eliminates any redundancy in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
results.<br />
Implementati<strong>on</strong> The InfoSleuth agent system has been implemented in two prototype<br />
projects.<br />
There are Java templates available to make <str<strong>on</strong>g>the</str<strong>on</strong>g> development <str<strong>on</strong>g>of</str<strong>on</strong>g> new agents easier. To<br />
create a resource agent using such a template, it is generally sufficient to just supply a<br />
c<strong>on</strong>figurati<strong>on</strong> and a mapping file to complete <str<strong>on</strong>g>the</str<strong>on</strong>g> agent [NFK + 00]. It is possible to use<br />
different <strong>on</strong>tologies in an InfoSleuth system. Each OKBC-compliant Knowledge Base<br />
can be used in InfoSleuth by wrapping it using an <strong>on</strong>tology agent.<br />
Experiences [NFK + 00] reports <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> InfoSleuth in two prototype projects. The<br />
first is <str<strong>on</strong>g>the</str<strong>on</strong>g> EDEN (Envir<strong>on</strong>mental Data Exchange Network) project. The aim <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> EDEN<br />
project was to provide integrated access to envir<strong>on</strong>mental informati<strong>on</strong> resources over <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Web. EDEN posed many challenges in <str<strong>on</strong>g>the</str<strong>on</strong>g> area <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> legacy databases and<br />
mappings <str<strong>on</strong>g>of</str<strong>on</strong>g> values <str<strong>on</strong>g>of</str<strong>on</strong>g> different representati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> similar informati<strong>on</strong>.<br />
Ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r prototype project in which InfoSleuth was applied is MCC’s Competitive<br />
Intelligence System [NFK + 00].
CHAPTER 4. THE SURVEY 78<br />
4.2.2 ONION<br />
Summary ONION (ONtology compositION) [MWK00, MW01] is an architecture<br />
based <strong>on</strong> a sound formalism to support a scalable framework for <strong>on</strong>tology integrati<strong>on</strong><br />
that uses a graph-oriented model for <str<strong>on</strong>g>the</str<strong>on</strong>g> representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. The special feature<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> this system is that it separates <str<strong>on</strong>g>the</str<strong>on</strong>g> logical inference engine from <str<strong>on</strong>g>the</str<strong>on</strong>g> representati<strong>on</strong><br />
model (<str<strong>on</strong>g>the</str<strong>on</strong>g> graph representati<strong>on</strong>) <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies as much as possible. This allows for<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> accommodati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> different inference engines in <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture.<br />
In ONION <str<strong>on</strong>g>the</str<strong>on</strong>g>re are two types <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies, individual <strong>on</strong>tologies, referred to as<br />
source <strong>on</strong>tologies and <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tologies, which c<strong>on</strong>tain <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts and relati<strong>on</strong>ships<br />
expressed as <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules (rules that provide links across domains). Articulati<strong>on</strong><br />
rules are established to enable knowledge inter-operability, and to bridge <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic<br />
gap between heterogeneous sources. They indicate which c<strong>on</strong>cepts individually<br />
or in c<strong>on</strong>juncti<strong>on</strong>, are related in <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies [MWK00]. SKAT (<str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic<br />
Knowledge Articulati<strong>on</strong> Tool) [MWJ99] uses <str<strong>on</strong>g>the</str<strong>on</strong>g> structure <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se graphs toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r with<br />
term-matching and o<str<strong>on</strong>g>the</str<strong>on</strong>g>r rules to propose <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules for <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tologies.<br />
The source <strong>on</strong>tologies are reflected in <str<strong>on</strong>g>the</str<strong>on</strong>g> system by <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> wrappers.<br />
The mapping between <strong>on</strong>tologies is executed by <strong>on</strong>tology algebra [Wie94, MW01].<br />
Such algebra c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> three operati<strong>on</strong>s, namely, intersecti<strong>on</strong>, uni<strong>on</strong> and difference. The<br />
objective <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology algebra is to provide <str<strong>on</strong>g>the</str<strong>on</strong>g> capability for interrogating many largely<br />
semantically disjoint knowledge resources, given <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology algebra as input. The descripti<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> algebra operators is as follows:<br />
• The intersecti<strong>on</strong> produces an <strong>on</strong>tology graph, which is <str<strong>on</strong>g>the</str<strong>on</strong>g> intersecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two<br />
source <strong>on</strong>tologies with respect to a set <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules, generated by an <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong><br />
generator functi<strong>on</strong>. The nodes in <str<strong>on</strong>g>the</str<strong>on</strong>g> intersecti<strong>on</strong> <strong>on</strong>tology are those that<br />
appear in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules. The edges are those edges between nodes in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
intersecti<strong>on</strong> <strong>on</strong>tology that appear in <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies or have been established<br />
as an <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rule. The intersecti<strong>on</strong> determines <str<strong>on</strong>g>the</str<strong>on</strong>g> porti<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> knowledge bases<br />
that deal with similar c<strong>on</strong>cepts.<br />
• The uni<strong>on</strong> operator generates a unified <strong>on</strong>tology graph comprising <str<strong>on</strong>g>the</str<strong>on</strong>g> two original<br />
<strong>on</strong>tology graphs c<strong>on</strong>nected by <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong>. The uni<strong>on</strong> presents a coherent,<br />
c<strong>on</strong>nected and semantically sound unified <strong>on</strong>tology if <str<strong>on</strong>g>the</str<strong>on</strong>g> original <strong>on</strong>tologies are<br />
c<strong>on</strong>sistent.<br />
• The difference operator, to distinguish <str<strong>on</strong>g>the</str<strong>on</strong>g> difference between two <strong>on</strong>tologies<br />
(O1 −O2) is defined as <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts and relati<strong>on</strong>ships <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> first <strong>on</strong>tology that have<br />
not been determined to exist in <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d. This operati<strong>on</strong> allows a local <strong>on</strong>tology<br />
maintainer to determine <str<strong>on</strong>g>the</str<strong>on</strong>g> extent <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e’s <strong>on</strong>tology that remains independent <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> with o<str<strong>on</strong>g>the</str<strong>on</strong>g>r domain <strong>on</strong>tologies so that it can be independently manipulated<br />
without having to update <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong>.
CHAPTER 4. THE SURVEY 79<br />
Figure 4.18: The comp<strong>on</strong>ents <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> ONION system<br />
ONION tries to separate as much as possible <str<strong>on</strong>g>the</str<strong>on</strong>g> logical inference engine from <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
representati<strong>on</strong> model <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies, allowing <str<strong>on</strong>g>the</str<strong>on</strong>g> accommodati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> different inference<br />
engines. It also uses <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies to inter-operate am<strong>on</strong>g <strong>on</strong>tologies.<br />
These <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tologies can be organized in a hierarchical fashi<strong>on</strong>. For example, an<br />
<str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology can be created for two o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tologies that unify different<br />
source <strong>on</strong>tologies. The <strong>on</strong>tology mapping is based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> graph mapping, and at <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
same time, domain experts can define a variety <str<strong>on</strong>g>of</str<strong>on</strong>g> fuzzy matching.<br />
<strong>Ontology</strong> Language Before <strong>on</strong>tologies are integrated in <str<strong>on</strong>g>the</str<strong>on</strong>g> ONION system, <str<strong>on</strong>g>the</str<strong>on</strong>g>y are<br />
translated to <str<strong>on</strong>g>the</str<strong>on</strong>g> ONION graph-based c<strong>on</strong>ceptual model. An <strong>on</strong>tology O = (G, R) is<br />
represented as a directed labeled graph G and a set <str<strong>on</strong>g>of</str<strong>on</strong>g> rules R. The graph G = (V, E)<br />
c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> a finite set <str<strong>on</strong>g>of</str<strong>on</strong>g> nodes V and a finite set <str<strong>on</strong>g>of</str<strong>on</strong>g> edges E. Nodes in <str<strong>on</strong>g>the</str<strong>on</strong>g> graph corresp<strong>on</strong>d<br />
to c<strong>on</strong>cepts in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology. Edges corresp<strong>on</strong>d to semantic relati<strong>on</strong>ships between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cepts.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> ONION c<strong>on</strong>ceptual model, <str<strong>on</strong>g>the</str<strong>on</strong>g>re are several semantic relati<strong>on</strong>ships with a builtin<br />
meaning, namely {SubClassOf, P <str<strong>on</strong>g>art</str<strong>on</strong>g>Of, AttributeOf, InstanceOf, V alueOf}.<br />
Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, <str<strong>on</strong>g>the</str<strong>on</strong>g> user can create user-defined semantic relati<strong>on</strong>ships. The user <str<strong>on</strong>g>the</str<strong>on</strong>g>n has<br />
to axiomatize <str<strong>on</strong>g>the</str<strong>on</strong>g> meaning <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship. The better <str<strong>on</strong>g>the</str<strong>on</strong>g> meaning <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship<br />
is axiomatized, <str<strong>on</strong>g>the</str<strong>on</strong>g> more accurate <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> will be. A more detailed descripti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> meaning <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> built-in semantic relati<strong>on</strong>ships can be found in [MW01].<br />
An <strong>on</strong>tology graph can be represented in <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web language RDF [LS99],<br />
because RDF has a graph-based data model. The set <str<strong>on</strong>g>of</str<strong>on</strong>g> logical rules R are expressed as
CHAPTER 4. THE SURVEY 80<br />
Horn clauses.<br />
An <strong>on</strong>tology in any source language can be translated to <str<strong>on</strong>g>the</str<strong>on</strong>g> graph-based model using<br />
a custom wrapper. It could happen that during <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> ONION c<strong>on</strong>ceptual<br />
model, some semantic informati<strong>on</strong> is lost. This informati<strong>on</strong> can no l<strong>on</strong>ger be used for <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ships with o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tologies, however, <str<strong>on</strong>g>the</str<strong>on</strong>g> user can still access this<br />
informati<strong>on</strong> by querying <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying <strong>on</strong>tology directly.<br />
Mapping Language Inter-operati<strong>on</strong> in ONION is achieved through <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong><br />
<strong>on</strong>tologies. An <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology denotes <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic intersecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> two source<br />
<strong>on</strong>tologies. The intersecti<strong>on</strong> is an operati<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g> so-called <strong>on</strong>tology algebra [Wie94].<br />
The <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology is c<strong>on</strong>structed based <strong>on</strong> so-called <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules. An <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong><br />
rule specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between c<strong>on</strong>cepts in <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies. An<br />
<str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rule is a rule <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> form P ⇒ Q, which can be intuitively read as “P semantically<br />
implies Q”. In o<str<strong>on</strong>g>the</str<strong>on</strong>g>r words, P is a specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> Q, or “P is subsumed by<br />
Q”.<br />
ONION distinguishes between simple and compound rules. A simple <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong><br />
rule, which specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between nodes in two <strong>on</strong>tology graphs, is <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
form O1.A ⇒ O2.B, where A depicts a node in <strong>on</strong>tology O1 and B depicts a node in<br />
<strong>on</strong>tology O2. The rule specifies <str<strong>on</strong>g>the</str<strong>on</strong>g> fact that A is a specializati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> B. This rule translates<br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> simplest semantic bridge, <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic implicati<strong>on</strong> bridge, which is an edge<br />
(A, “SIBridge”, B), c<strong>on</strong>necting <str<strong>on</strong>g>the</str<strong>on</strong>g> two nodes. Compound rules incorporate c<strong>on</strong>juncti<strong>on</strong><br />
and/or disjuncti<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g> rule. Such rules are modeled by adding <strong>on</strong>e or more nodes to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology and creating <str<strong>on</strong>g>the</str<strong>on</strong>g> appropriate semantic implicati<strong>on</strong> bridges between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> nodes in <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> new node in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology. For more<br />
informati<strong>on</strong>, see [MWK00].<br />
In order to allow for value transformati<strong>on</strong>s, ONION <str<strong>on</strong>g>of</str<strong>on</strong>g>fers <str<strong>on</strong>g>the</str<strong>on</strong>g> possibility <str<strong>on</strong>g>of</str<strong>on</strong>g> associating<br />
a functi<strong>on</strong> with an edge in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology. Examples <str<strong>on</strong>g>of</str<strong>on</strong>g> such functi<strong>on</strong>s are<br />
currency c<strong>on</strong>versi<strong>on</strong> and c<strong>on</strong>versi<strong>on</strong> between different distance measures.<br />
Automati<strong>on</strong> Support The <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules are created in a semi-automatic process<br />
with SKAT [MWJ99] (Semantic Knowledge Annotati<strong>on</strong> Tool), which proposes <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong><br />
rules to <str<strong>on</strong>g>the</str<strong>on</strong>g> expert and <str<strong>on</strong>g>the</str<strong>on</strong>g> expert can ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r accept or decline <str<strong>on</strong>g>the</str<strong>on</strong>g>se proposals and also<br />
specify rules which are not proposed by <str<strong>on</strong>g>the</str<strong>on</strong>g> tool.<br />
SKAT does matching <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two source <strong>on</strong>tologies using both term matching and<br />
structural matching.<br />
Applicability to use cases ONION is a system for <str<strong>on</strong>g>the</str<strong>on</strong>g> unificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> heterogeneous<br />
<strong>on</strong>tologies through <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tologies with <str<strong>on</strong>g>the</str<strong>on</strong>g> purpose <str<strong>on</strong>g>of</str<strong>on</strong>g> query processing.<br />
The resulting <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology is presented to <str<strong>on</strong>g>the</str<strong>on</strong>g> user and is used (toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r with <str<strong>on</strong>g>the</str<strong>on</strong>g>
CHAPTER 4. THE SURVEY 81<br />
source <strong>on</strong>tologies) by <str<strong>on</strong>g>the</str<strong>on</strong>g> user for querying. The ONION query system translates a query<br />
<strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology to <str<strong>on</strong>g>the</str<strong>on</strong>g> actual source <strong>on</strong>tologies and executes <str<strong>on</strong>g>the</str<strong>on</strong>g> query <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
underlying <strong>on</strong>tologies. The results are <str<strong>on</strong>g>the</str<strong>on</strong>g>n translated back to <str<strong>on</strong>g>the</str<strong>on</strong>g> representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology.<br />
ONION does not propose a strategy for unifying instances. The <strong>on</strong>tology obtained<br />
from applying <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> operator can be seen as <str<strong>on</strong>g>the</str<strong>on</strong>g> result <str<strong>on</strong>g>of</str<strong>on</strong>g> a merge operati<strong>on</strong>.<br />
The complete mapping process is included in ONION. In fact, <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping is just<br />
<strong>on</strong>e aspect <str<strong>on</strong>g>of</str<strong>on</strong>g> ONION, because ONION also provides <str<strong>on</strong>g>the</str<strong>on</strong>g> run-time envir<strong>on</strong>ment for data<br />
integrati<strong>on</strong>.<br />
Implementati<strong>on</strong> The ONION architecture [MWK00, MW01] (Figure 4.18, taken from<br />
[MW01]) c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> four comp<strong>on</strong>ents:<br />
• The ONION data layer. This layer c<strong>on</strong>tains <str<strong>on</strong>g>the</str<strong>on</strong>g> wrappers for <str<strong>on</strong>g>the</str<strong>on</strong>g> external sources<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tologies that form <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Bridges between <str<strong>on</strong>g>the</str<strong>on</strong>g> sources.<br />
• The ONION viewer. This is <str<strong>on</strong>g>the</str<strong>on</strong>g> user interface comp<strong>on</strong>ent <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> system. The viewer<br />
visualizes both <str<strong>on</strong>g>the</str<strong>on</strong>g> source and <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tologies.<br />
• The ONION query system. The query system translates queries formulated in term<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> an <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <strong>on</strong>tology into a query executi<strong>on</strong> plan and executes <str<strong>on</strong>g>the</str<strong>on</strong>g> query.<br />
• The Articulati<strong>on</strong> Engine. The <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> generator takes <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules proposed<br />
by SKAT [MWJ99], <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Knowledge Articulati<strong>on</strong> Tool, and generates<br />
sets <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules, which are forwarded to <str<strong>on</strong>g>the</str<strong>on</strong>g> expert for c<strong>on</strong>firmati<strong>on</strong>.<br />
The different comp<strong>on</strong>ents in <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture have been implemented as a research<br />
prototype to support a PhD <str<strong>on</strong>g>the</str<strong>on</strong>g>sis.<br />
Experiences [MWK00, MW01] do not show any real experiences with <str<strong>on</strong>g>the</str<strong>on</strong>g> applicati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> ONION besides toy examples described in <str<strong>on</strong>g>the</str<strong>on</strong>g> papers.<br />
4.2.3 OBSERVER<br />
Summary OBSERVER (<strong>Ontology</strong> Based System Enhanced with Relati<strong>on</strong>ships for Vocabulary<br />
hEterogeneity Resoluti<strong>on</strong>) [MIKS00] is a system which is intended to overcome<br />
problems with heterogeneity between distributed data repositories by using comp<strong>on</strong>ent<br />
<strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> explicit relati<strong>on</strong>ships between <str<strong>on</strong>g>the</str<strong>on</strong>g>se comp<strong>on</strong>ents. OBSERVER<br />
presents an architecture c<strong>on</strong>sisting <str<strong>on</strong>g>of</str<strong>on</strong>g> comp<strong>on</strong>ent nodes, each <str<strong>on</strong>g>of</str<strong>on</strong>g> which has its own <strong>on</strong>tology,<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> Inter-<strong>on</strong>tology Relati<strong>on</strong>ship Manager (IRM), which maintains mappings<br />
between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies at <str<strong>on</strong>g>the</str<strong>on</strong>g> different comp<strong>on</strong>ent nodes. Besides <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology, each comp<strong>on</strong>ent<br />
node c<strong>on</strong>tains a number <str<strong>on</strong>g>of</str<strong>on</strong>g> data repositories al<strong>on</strong>g with mappings to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology,
CHAPTER 4. THE SURVEY 82<br />
to enable semantic querying <str<strong>on</strong>g>of</str<strong>on</strong>g> data residing in <str<strong>on</strong>g>the</str<strong>on</strong>g>se repositories. When o<str<strong>on</strong>g>the</str<strong>on</strong>g>r comp<strong>on</strong>ents<br />
need to be queried, <str<strong>on</strong>g>the</str<strong>on</strong>g> IRM provides mappings to <strong>on</strong>tologies <str<strong>on</strong>g>of</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r comp<strong>on</strong>ent<br />
nodes in order to enable querying. The user views <str<strong>on</strong>g>the</str<strong>on</strong>g> data in <str<strong>on</strong>g>the</str<strong>on</strong>g> system through its own<br />
local <strong>on</strong>tology, located at <str<strong>on</strong>g>the</str<strong>on</strong>g> user’s comp<strong>on</strong>ent node.<br />
OBSERVER uses a comp<strong>on</strong>ent-based approach to <strong>on</strong>tology mapping. It provides<br />
brokering capabilities across domain <strong>on</strong>tologies to enhance distributed <strong>on</strong>tology querying,<br />
thus avoiding <str<strong>on</strong>g>the</str<strong>on</strong>g> need to have a global schema or collecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>cepts.<br />
OBSERVER uses multiple pre-existing <strong>on</strong>tologies to access heterogeneous, distributed<br />
and independently developed data repositories. Each repository is described by<br />
means <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>e or more <strong>on</strong>tologies expressed using <str<strong>on</strong>g>the</str<strong>on</strong>g> Descripti<strong>on</strong> Logic (DL) system<br />
CLASSIC. The informati<strong>on</strong> requested from OBSERVER is expressed according to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
user’s domain <strong>on</strong>tology, also expressed using DL. DL allows matching <str<strong>on</strong>g>the</str<strong>on</strong>g> query with <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
available relevant data repositories, as well as translating it to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies used in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
local repositories.<br />
The system c<strong>on</strong>tains a number <str<strong>on</strong>g>of</str<strong>on</strong>g> comp<strong>on</strong>ent nodes, <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> which is <str<strong>on</strong>g>the</str<strong>on</strong>g> user node.<br />
Each node has an <strong>on</strong>tology server that provides definiti<strong>on</strong>s for <str<strong>on</strong>g>the</str<strong>on</strong>g> terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
and retrieves data underlying <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology in <str<strong>on</strong>g>the</str<strong>on</strong>g> comp<strong>on</strong>ent node. If <str<strong>on</strong>g>the</str<strong>on</strong>g> user wants to<br />
expand his query over different <strong>on</strong>tology servers, <str<strong>on</strong>g>the</str<strong>on</strong>g> original query needs to be translated<br />
from <str<strong>on</strong>g>the</str<strong>on</strong>g> vocabulary <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> user’s <strong>on</strong>tology into <str<strong>on</strong>g>the</str<strong>on</strong>g> vocabulary <str<strong>on</strong>g>of</str<strong>on</strong>g> ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r’s comp<strong>on</strong>ent <strong>on</strong>tology.<br />
This is d<strong>on</strong>e in <str<strong>on</strong>g>the</str<strong>on</strong>g> following way: <str<strong>on</strong>g>the</str<strong>on</strong>g> source and target <strong>on</strong>tologies are integrated,<br />
after which all terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> query for which a syn<strong>on</strong>ym exists in <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology, are<br />
replaced by this syn<strong>on</strong>ym and all o<str<strong>on</strong>g>the</str<strong>on</strong>g>r terms are replaced by semantically similar expressi<strong>on</strong><br />
that could lead to a full translati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> user query. Each such term for which <str<strong>on</strong>g>the</str<strong>on</strong>g>re<br />
is no syn<strong>on</strong>ym in <str<strong>on</strong>g>the</str<strong>on</strong>g> comp<strong>on</strong>ent <strong>on</strong>tology is replaced by <str<strong>on</strong>g>the</str<strong>on</strong>g> intersecti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> its immediate<br />
parents or <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> its immediate children, recursively, until <str<strong>on</strong>g>the</str<strong>on</strong>g> rewritten query c<strong>on</strong>tains<br />
<strong>on</strong>ly terms from <str<strong>on</strong>g>the</str<strong>on</strong>g> comp<strong>on</strong>ent <strong>on</strong>tology. Several alternative translati<strong>on</strong>s might be<br />
generated in this process, which are not exact in <str<strong>on</strong>g>the</str<strong>on</strong>g> case where terms were replace with<br />
semantically similar expressi<strong>on</strong> and not syn<strong>on</strong>yms.<br />
A method for evaluating <str<strong>on</strong>g>the</str<strong>on</strong>g> informati<strong>on</strong> loss for <str<strong>on</strong>g>the</str<strong>on</strong>g> case <str<strong>on</strong>g>of</str<strong>on</strong>g> inexact translati<strong>on</strong>s was<br />
developed, with <str<strong>on</strong>g>the</str<strong>on</strong>g> purpose <str<strong>on</strong>g>of</str<strong>on</strong>g> enabling <str<strong>on</strong>g>the</str<strong>on</strong>g> system to choose ”<str<strong>on</strong>g>the</str<strong>on</strong>g> best” am<strong>on</strong>g alternative<br />
translati<strong>on</strong>s. The user can opt for translati<strong>on</strong>s for which <str<strong>on</strong>g>the</str<strong>on</strong>g> informati<strong>on</strong> loss is smaller<br />
than a certain value by setting a certain threshold. By setting this parameter to 0, <str<strong>on</strong>g>the</str<strong>on</strong>g> user<br />
specifies no loss at all. The loss <str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong> threshold is used by <str<strong>on</strong>g>the</str<strong>on</strong>g> query processor,<br />
which discards queries exceeding <str<strong>on</strong>g>the</str<strong>on</strong>g> threshold.<br />
The informati<strong>on</strong> loss is computed based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> comm<strong>on</strong>ly encountered metrics in<br />
Informati<strong>on</strong> Retrieval, precisi<strong>on</strong> and recall. The idea is to take into account both <str<strong>on</strong>g>the</str<strong>on</strong>g>se<br />
metrics, that is <str<strong>on</strong>g>the</str<strong>on</strong>g> loss in precisi<strong>on</strong> (percentage <str<strong>on</strong>g>of</str<strong>on</strong>g> retrieved objects which are not relevant)<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> loss in recall (percentage <str<strong>on</strong>g>of</str<strong>on</strong>g> relevant objects which are not retrieved), because it is<br />
not known a priori <str<strong>on</strong>g>the</str<strong>on</strong>g> preference <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> user towards <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se measures. The formula<br />
for computing informati<strong>on</strong> loss based <strong>on</strong> precisi<strong>on</strong> loss and recall loss c<strong>on</strong>tains a weight<br />
which expresses <str<strong>on</strong>g>the</str<strong>on</strong>g> bias towards <strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two metrics, which can be set by <str<strong>on</strong>g>the</str<strong>on</strong>g> user.
CHAPTER 4. THE SURVEY 83<br />
However <str<strong>on</strong>g>the</str<strong>on</strong>g>se metrics, i.e. precisi<strong>on</strong> and recall, had to be adapted in order to give<br />
higher priority to semantic relati<strong>on</strong>ships than to those suggested by <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying extensi<strong>on</strong>s.<br />
This problem arises due to <str<strong>on</strong>g>the</str<strong>on</strong>g> fact that extensi<strong>on</strong>s are coming from different<br />
<strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> extensi<strong>on</strong>al relati<strong>on</strong>ships may not reflect <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic relati<strong>on</strong>ships.<br />
For example, a term in a user <strong>on</strong>tology which semantically subsumes a term in <str<strong>on</strong>g>the</str<strong>on</strong>g> target<br />
<strong>on</strong>tology may have a smaller extensi<strong>on</strong> than <str<strong>on</strong>g>the</str<strong>on</strong>g> child term.<br />
An Inter-<strong>on</strong>tology Relati<strong>on</strong>ship Manager (IRM) provides <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong>s between <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
terms am<strong>on</strong>g <str<strong>on</strong>g>the</str<strong>on</strong>g> different comp<strong>on</strong>ent <strong>on</strong>tologies. The IRM effectively c<strong>on</strong>tains a <strong>on</strong>eto-<strong>on</strong>e<br />
mapping between any two comp<strong>on</strong>ent <strong>on</strong>tologies. This module is able to deal<br />
with (intenti<strong>on</strong>al) Syn<strong>on</strong>ym, Hyp<strong>on</strong>ym, Hypernym, Overlap, Disjoint and Covering inter<strong>on</strong>tology<br />
relati<strong>on</strong>ships. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, <str<strong>on</strong>g>the</str<strong>on</strong>g> IRM is also able to deal with extensi<strong>on</strong>al relati<strong>on</strong>ships<br />
between <strong>on</strong>tologies through <str<strong>on</strong>g>the</str<strong>on</strong>g> use <str<strong>on</strong>g>of</str<strong>on</strong>g> so-called transformer functi<strong>on</strong>s.<br />
The user submits a query to <str<strong>on</strong>g>the</str<strong>on</strong>g> query processor in its own comp<strong>on</strong>ent node (in fact,<br />
each comp<strong>on</strong>ent node has a query processor). The query processor first uses <str<strong>on</strong>g>the</str<strong>on</strong>g> local<br />
<strong>on</strong>tology server to translate <str<strong>on</strong>g>the</str<strong>on</strong>g> query into queries <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> local data repositories and <str<strong>on</strong>g>the</str<strong>on</strong>g>n<br />
execute <str<strong>on</strong>g>the</str<strong>on</strong>g>m, after which <str<strong>on</strong>g>the</str<strong>on</strong>g> user can choose to incrementally increase <str<strong>on</strong>g>the</str<strong>on</strong>g> query to<br />
multiple <strong>on</strong>tologies. The query processor <str<strong>on</strong>g>the</str<strong>on</strong>g>n uses <str<strong>on</strong>g>the</str<strong>on</strong>g> IRM to translate <str<strong>on</strong>g>the</str<strong>on</strong>g> query into<br />
terms used by <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r comp<strong>on</strong>ent <strong>on</strong>tologies and retrieve <str<strong>on</strong>g>the</str<strong>on</strong>g> results from <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
servers at <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r comp<strong>on</strong>ent nodes.<br />
Querying in OBSERVER c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> following three steps:<br />
1. The user c<strong>on</strong>structs <str<strong>on</strong>g>the</str<strong>on</strong>g> query using terms from <str<strong>on</strong>g>the</str<strong>on</strong>g> user’s <strong>on</strong>tology.<br />
2. The query processor uses <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology server to access <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying data at <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
user’s node. The query is executed against <str<strong>on</strong>g>the</str<strong>on</strong>g> local data repositories.<br />
3. In a process <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>trolled query expansi<strong>on</strong> to new <strong>on</strong>tologies <str<strong>on</strong>g>the</str<strong>on</strong>g> user can specify<br />
whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r he/she is satisfied with <str<strong>on</strong>g>the</str<strong>on</strong>g> query results or whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g> query should be expanded<br />
to o<str<strong>on</strong>g>the</str<strong>on</strong>g>r comp<strong>on</strong>ent <strong>on</strong>tologies. In this case, <str<strong>on</strong>g>the</str<strong>on</strong>g> inter-<strong>on</strong>tology relati<strong>on</strong>ships<br />
retrieved from <str<strong>on</strong>g>the</str<strong>on</strong>g> IRM are used to rewrite queries and to transform query results.<br />
The <strong>on</strong>tology server can be queried in two ways. Informati<strong>on</strong> about <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology itself<br />
can be retrieved and <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology server can answer queries formulated over an <strong>on</strong>tology<br />
using <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings to <str<strong>on</strong>g>the</str<strong>on</strong>g> different data sources and <str<strong>on</strong>g>the</str<strong>on</strong>g> wrappers available for each data<br />
source. The query capabilities <str<strong>on</strong>g>of</str<strong>on</strong>g> each data source are c<strong>on</strong>sulted by <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology server,<br />
which creates a query plan and invokes <str<strong>on</strong>g>the</str<strong>on</strong>g> wrappers to retrieve <str<strong>on</strong>g>the</str<strong>on</strong>g> data from <str<strong>on</strong>g>the</str<strong>on</strong>g> sources.<br />
In principle, <strong>on</strong>ly <str<strong>on</strong>g>the</str<strong>on</strong>g> local <strong>on</strong>tology server is queried initially. The user can <str<strong>on</strong>g>the</str<strong>on</strong>g>n<br />
choose to incrementally expand <str<strong>on</strong>g>the</str<strong>on</strong>g> query over multiple <strong>on</strong>tologies in order to retrieve<br />
more results for <str<strong>on</strong>g>the</str<strong>on</strong>g> query.<br />
<strong>Ontology</strong> Languages Ontologies, as well as mappings between <strong>on</strong>tologies are specified<br />
using <str<strong>on</strong>g>the</str<strong>on</strong>g> Descripti<strong>on</strong> Logic system CLASSIC.
CHAPTER 4. THE SURVEY 84<br />
Figure 4.19: The general OBSERVER architecture [MIKS00]
CHAPTER 4. THE SURVEY 85<br />
Mapping Language In OBSERVER, <str<strong>on</strong>g>the</str<strong>on</strong>g>re exist two types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings, namely <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
mappings between data repositories and <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
We will first describe <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between data repositories and <strong>on</strong>tologies,<br />
after which we describe <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> inter-<strong>on</strong>tology relati<strong>on</strong>ships.<br />
A data source is seen as c<strong>on</strong>sisting <str<strong>on</strong>g>of</str<strong>on</strong>g> entities and attributes (in <str<strong>on</strong>g>the</str<strong>on</strong>g> Entity-Relati<strong>on</strong>ship<br />
[Che79] sense <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> terms). Mapping between data sources and <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology is represented<br />
by associating each term in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology with a number <str<strong>on</strong>g>of</str<strong>on</strong>g> Extended Relati<strong>on</strong><br />
Algebra (ERA) expressi<strong>on</strong>s. ERA is used as an intermediate language between <str<strong>on</strong>g>the</str<strong>on</strong>g> Descripti<strong>on</strong><br />
Logic expressi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology and <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying data repositories. The<br />
wrapper is resp<strong>on</strong>sible for <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> between ERA and <str<strong>on</strong>g>the</str<strong>on</strong>g> data repository itself,<br />
which is straightforward if <str<strong>on</strong>g>the</str<strong>on</strong>g> data source is a relati<strong>on</strong>al database. It is interesting to<br />
note that in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping to roles in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology, ERA allows functi<strong>on</strong>s, which typically<br />
represent value transformati<strong>on</strong>s.<br />
OBSERVER deals with a number <str<strong>on</strong>g>of</str<strong>on</strong>g> types <str<strong>on</strong>g>of</str<strong>on</strong>g> inter-<strong>on</strong>tology relati<strong>on</strong>ships in order to<br />
enable inter-operability:<br />
• Syn<strong>on</strong>ym. Two syn<strong>on</strong>ymous terms have <str<strong>on</strong>g>the</str<strong>on</strong>g> same semantics, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> same intended<br />
meaning. This does not guarantee that <str<strong>on</strong>g>the</str<strong>on</strong>g>y have <str<strong>on</strong>g>the</str<strong>on</strong>g> same extensi<strong>on</strong>.<br />
• Hyp<strong>on</strong>ym. A term is a hyp<strong>on</strong>ym <str<strong>on</strong>g>of</str<strong>on</strong>g> ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r term if it is less general, i.e. a term in<br />
<strong>on</strong>e <strong>on</strong>tology subsumes a term in ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology.<br />
• Hypernym. A term is a hypernym <str<strong>on</strong>g>of</str<strong>on</strong>g> ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r term if it is more general, i.e. a term<br />
in <strong>on</strong>e <strong>on</strong>tology is subsumed by a term in ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology.<br />
• Overlap. This means <str<strong>on</strong>g>the</str<strong>on</strong>g> two terms have an overlap, i.e. a n<strong>on</strong>-empty intersecti<strong>on</strong>.<br />
In OBSERVER, <str<strong>on</strong>g>the</str<strong>on</strong>g> overlap between terms is usually indicated with a percentage,<br />
which can be used to estimate <str<strong>on</strong>g>the</str<strong>on</strong>g> loss <str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong> in a query translati<strong>on</strong>.<br />
• Disjoint. This means <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no intersecti<strong>on</strong> between <str<strong>on</strong>g>the</str<strong>on</strong>g> two terms.<br />
• Covering. When a term in <strong>on</strong>e <strong>on</strong>tology corresp<strong>on</strong>ds to a uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
<strong>on</strong>tology, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> meaning <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> term in <strong>on</strong>e <strong>on</strong>tology is covered by <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
terms from <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r <strong>on</strong>tology. There does not exist an object represented by <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
term in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>e <strong>on</strong>tology, which is not represented by <str<strong>on</strong>g>the</str<strong>on</strong>g> uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> given children<br />
terms.<br />
The above menti<strong>on</strong>ed inter-<strong>on</strong>tology relati<strong>on</strong>ships explicate <str<strong>on</strong>g>the</str<strong>on</strong>g> intenti<strong>on</strong>al relati<strong>on</strong>ship<br />
between terms in two <strong>on</strong>tologies. However, when an intenti<strong>on</strong>al relati<strong>on</strong>ship between<br />
terms is true, it does not mean that this relati<strong>on</strong>ship holds also for <str<strong>on</strong>g>the</str<strong>on</strong>g> extensi<strong>on</strong>s (i.e. sets<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> instances) <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. For <str<strong>on</strong>g>the</str<strong>on</strong>g> extensi<strong>on</strong>al level, a set <str<strong>on</strong>g>of</str<strong>on</strong>g> transformer functi<strong>on</strong>s<br />
between roles in different <strong>on</strong>tologies is used. These functi<strong>on</strong>s are used for both instance<br />
transformati<strong>on</strong> and instance unificati<strong>on</strong>.
CHAPTER 4. THE SURVEY 86<br />
The Inter-<strong>on</strong>tology Relati<strong>on</strong>ship Manager can be used to discover sets <str<strong>on</strong>g>of</str<strong>on</strong>g> related comp<strong>on</strong>ent<br />
<strong>on</strong>tologies, to retrieve related terms between <strong>on</strong>tologies and to perform value transformati<strong>on</strong>s<br />
from <strong>on</strong>e <strong>on</strong>tology representati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r.<br />
Automati<strong>on</strong> Support There seems to be no automati<strong>on</strong> in creating <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between<br />
<strong>on</strong>tologies and/or data sources. However, known matching algorithms could be<br />
easily used to identify similarities between <strong>on</strong>tologies.<br />
The query processing, <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r hand, is completely automated with <str<strong>on</strong>g>the</str<strong>on</strong>g> excepti<strong>on</strong><br />
that <str<strong>on</strong>g>the</str<strong>on</strong>g> user is still required in <str<strong>on</strong>g>the</str<strong>on</strong>g> incremental querying process to specify whe<str<strong>on</strong>g>the</str<strong>on</strong>g>r o<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
comp<strong>on</strong>ent nodes need to be queried and he/she might also intervene to set some parameters<br />
like <str<strong>on</strong>g>the</str<strong>on</strong>g> bias towards precisi<strong>on</strong> or recall in <str<strong>on</strong>g>the</str<strong>on</strong>g> computati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong> loss and<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> threshold for informati<strong>on</strong> loss in a query translati<strong>on</strong>. As already menti<strong>on</strong>ed, if <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
used desires <str<strong>on</strong>g>the</str<strong>on</strong>g> query to be answered by accessing o<str<strong>on</strong>g>the</str<strong>on</strong>g>r comp<strong>on</strong>ents as well, <str<strong>on</strong>g>the</str<strong>on</strong>g> query<br />
is translated into <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology language(s) <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r comp<strong>on</strong>ents. If <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no exact<br />
translati<strong>on</strong>, <str<strong>on</strong>g>the</str<strong>on</strong>g>re might be alternative translati<strong>on</strong>s which are generated and evaluated by<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> system and <str<strong>on</strong>g>the</str<strong>on</strong>g> optimal <strong>on</strong>e, which does not surpass <str<strong>on</strong>g>the</str<strong>on</strong>g> threshold set by <str<strong>on</strong>g>the</str<strong>on</strong>g> user for<br />
informati<strong>on</strong> loss, is chosen.<br />
Applicability to Use Cases Instance transformati<strong>on</strong> and instance unificati<strong>on</strong> are both<br />
performed in <str<strong>on</strong>g>the</str<strong>on</strong>g> querying process. The query processor is resp<strong>on</strong>sible for transforming<br />
and correlating query results from <str<strong>on</strong>g>the</str<strong>on</strong>g> target <strong>on</strong>tology into <str<strong>on</strong>g>the</str<strong>on</strong>g> user <strong>on</strong>tology.<br />
The emphasis in OBSERVER is really <strong>on</strong> query rewriting. The relati<strong>on</strong>ships between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies, expressed using Descripti<strong>on</strong> Logics, are used to rewrite <str<strong>on</strong>g>the</str<strong>on</strong>g> queries from<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> user’s <strong>on</strong>tology to <str<strong>on</strong>g>the</str<strong>on</strong>g> comp<strong>on</strong>ent <strong>on</strong>tology.<br />
OBSERVER is a data integrati<strong>on</strong> system and as such provides no explicit support for<br />
<strong>on</strong>tology merging, although in <str<strong>on</strong>g>the</str<strong>on</strong>g> query processing <strong>on</strong>tologies are automatically integrated<br />
based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> inter-<strong>on</strong>tology relati<strong>on</strong>ships retrieved from <str<strong>on</strong>g>the</str<strong>on</strong>g> IRM.<br />
Implementati<strong>on</strong> The OBSERVER architecture, depicted in Figure 4.19, c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> a<br />
number <str<strong>on</strong>g>of</str<strong>on</strong>g> comp<strong>on</strong>ent nodes and <str<strong>on</strong>g>the</str<strong>on</strong>g> IRM node. A comp<strong>on</strong>ent node c<strong>on</strong>tains an <strong>Ontology</strong><br />
Server which provides for <str<strong>on</strong>g>the</str<strong>on</strong>g> interacti<strong>on</strong> with <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> data sources. It uses<br />
a repository <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings to relate <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> data sources and to be able to<br />
translate queries <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology to queries <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying data sources. The architecture<br />
c<strong>on</strong>tains <strong>on</strong>e Inter-<strong>Ontology</strong> Relati<strong>on</strong>ship Manager (IRM), which enables semantic<br />
inter-operati<strong>on</strong> between comp<strong>on</strong>ent nodes by maintaining <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ships between <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>tologies.<br />
OBSERVER has been implemented as a prototype for <str<strong>on</strong>g>the</str<strong>on</strong>g> access <str<strong>on</strong>g>of</str<strong>on</strong>g> distributed heterogeneous<br />
data sources in <str<strong>on</strong>g>the</str<strong>on</strong>g> area <str<strong>on</strong>g>of</str<strong>on</strong>g> bibliographic data.
CHAPTER 4. THE SURVEY 87<br />
Experiences The authors have reused different existing <strong>on</strong>tologies in <str<strong>on</strong>g>the</str<strong>on</strong>g> area <str<strong>on</strong>g>of</str<strong>on</strong>g> bibliographic<br />
data and represented <str<strong>on</strong>g>the</str<strong>on</strong>g>m in CLASSIC for integrati<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g> OBSERVER architecture.<br />
Real-life <strong>on</strong>tologies and data repositories were used in <str<strong>on</strong>g>the</str<strong>on</strong>g> prototype. It turned<br />
out that <str<strong>on</strong>g>the</str<strong>on</strong>g> time required to access <str<strong>on</strong>g>the</str<strong>on</strong>g> distributed data repositories was <str<strong>on</strong>g>the</str<strong>on</strong>g> bottleneck for<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> prototype.<br />
4.2.4 MOMIS<br />
Summary MOMIS (Mediator envirOnment for Multiple Informati<strong>on</strong> Sources) approach<br />
[BCV99, BCVB01] is an approach to <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> heterogeneous data sources<br />
using a global <strong>on</strong>tology, which is <str<strong>on</strong>g>the</str<strong>on</strong>g> result <str<strong>on</strong>g>of</str<strong>on</strong>g> a merge <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> local data schemas.<br />
The goal <str<strong>on</strong>g>of</str<strong>on</strong>g> MOMIS is to give <str<strong>on</strong>g>the</str<strong>on</strong>g> user a global virtual view (cf. [Hul97]) <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
informati<strong>on</strong> coming from heterogeneous informati<strong>on</strong> sources. MOMIS creates a global<br />
mediati<strong>on</strong> schema (<strong>on</strong>tology) for <str<strong>on</strong>g>the</str<strong>on</strong>g> structured and semi-structured heterogeneous data<br />
sources, in order to provide to <str<strong>on</strong>g>the</str<strong>on</strong>g> user a uniform query interface to <str<strong>on</strong>g>the</str<strong>on</strong>g>se sources.<br />
The first step in <str<strong>on</strong>g>the</str<strong>on</strong>g> creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> global mediati<strong>on</strong> schema is <str<strong>on</strong>g>the</str<strong>on</strong>g> creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Comm<strong>on</strong> Thesaurus from <str<strong>on</strong>g>the</str<strong>on</strong>g> disparate data sources. To do this, first a wrapper is created<br />
for each data source in <str<strong>on</strong>g>the</str<strong>on</strong>g> ODL I 3 language. ODL I 3 is an object-oriented language with<br />
an underlying Descripti<strong>on</strong> Logic language OLCD, which enables making inferences (e.g.<br />
subsumpti<strong>on</strong>) about <str<strong>on</strong>g>the</str<strong>on</strong>g> classes expressed in that language.<br />
Using <str<strong>on</strong>g>the</str<strong>on</strong>g> disparate schemas, a Comm<strong>on</strong> Thesaurus is created, which describes intraand<br />
inter-schema knowledge about ODL I 3 classes and attributes <str<strong>on</strong>g>of</str<strong>on</strong>g> source schemas. The<br />
Comm<strong>on</strong> Thesaurus is built in an incremental process in which relati<strong>on</strong>ships (between<br />
classes) are added based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> structure <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source schemas, lexical properties <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
source classes and attributes (e.g. WordNet [Fel99] can be used to identify possible syn<strong>on</strong>yms),<br />
relati<strong>on</strong>ships supplied by <str<strong>on</strong>g>the</str<strong>on</strong>g> designer, and relati<strong>on</strong>ships inferred by <str<strong>on</strong>g>the</str<strong>on</strong>g> inference<br />
engine.<br />
Once <str<strong>on</strong>g>the</str<strong>on</strong>g> Comm<strong>on</strong> Thesaurus has been created, a tree <str<strong>on</strong>g>of</str<strong>on</strong>g> affinity clusters is created,<br />
in which c<strong>on</strong>cepts are clustered based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir (name and structural) affinity. The name<br />
affinity coefficient is calculated based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> terminological relati<strong>on</strong>ships between two<br />
classes. The structural affinity coefficient between two classes is calculated based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
level <str<strong>on</strong>g>of</str<strong>on</strong>g> matching <str<strong>on</strong>g>of</str<strong>on</strong>g> attribute relati<strong>on</strong>ships in <str<strong>on</strong>g>the</str<strong>on</strong>g> Comm<strong>on</strong> Thesaurus. The sum <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se<br />
two coefficients is <str<strong>on</strong>g>the</str<strong>on</strong>g> global affinity coefficient, which is used to c<strong>on</strong>struct <str<strong>on</strong>g>the</str<strong>on</strong>g> affinity<br />
tree, in which c<strong>on</strong>cepts with a high affinity are clustered toge<str<strong>on</strong>g>the</str<strong>on</strong>g>r.<br />
For each cluster in <str<strong>on</strong>g>the</str<strong>on</strong>g> affinity tree, a global class is (interactively) created. For each<br />
global class a mapping (expressed in ODL I 3) is maintained to all <str<strong>on</strong>g>the</str<strong>on</strong>g> source classes.<br />
If we compare MOMIS with OBSERVER we can say that OBSERVER takes a minimalist<br />
approach to <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> inter-<strong>on</strong>tology relati<strong>on</strong>ships, specifying <strong>on</strong>ly <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
bare minimum required for query processing, whereas MOMIS tries to identify all possible<br />
relati<strong>on</strong>ships between a set <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies, integrating <str<strong>on</strong>g>the</str<strong>on</strong>g>m in <strong>on</strong>e global <strong>on</strong>tology.
CHAPTER 4. THE SURVEY 88<br />
<strong>Ontology</strong> Languages A wrapper translates each data schema to <str<strong>on</strong>g>the</str<strong>on</strong>g> ODL I 3 representati<strong>on</strong>.<br />
MOMIS also deals with semi-structured data by extracting object patterns, which<br />
are used as schema informati<strong>on</strong> for <str<strong>on</strong>g>the</str<strong>on</strong>g> source to generate <str<strong>on</strong>g>the</str<strong>on</strong>g> corresp<strong>on</strong>ding ODL I 3 descripti<strong>on</strong>.<br />
ODL I 3 is an object-oriented language with a translati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> OLCD Descripti<strong>on</strong><br />
Logic languages in order to allow inferencing. OLCD is a KL-ONE [BS85] like <strong>on</strong>tology<br />
language, which allows classes (types), binary roles (attributes), disjuncti<strong>on</strong>, negati<strong>on</strong> and<br />
also has a number <str<strong>on</strong>g>of</str<strong>on</strong>g> base data types (integer, string, Boolean, real).<br />
Mapping Language Source schemas and object patterns are first translated into ODL I 3<br />
descripti<strong>on</strong>s. This translati<strong>on</strong> is performed by a wrapper. Then, a Comm<strong>on</strong> Thesaurus is<br />
created based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> ODL I 3 class descripti<strong>on</strong>s and attributes. The Comm<strong>on</strong> Thesaurus<br />
c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> four kinds <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ships, which are added to <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>saurus in <str<strong>on</strong>g>the</str<strong>on</strong>g> following<br />
phases:<br />
1. Schema-derived relati<strong>on</strong>ships In this phase, <strong>on</strong>ly intra-schema relati<strong>on</strong>ships are<br />
c<strong>on</strong>sidered. Relati<strong>on</strong>ships within <strong>on</strong>e p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular schema are extracted, e.g. by<br />
exploiting foreign and primary key relati<strong>on</strong>ships in order to infer related and<br />
broader/narrower term relati<strong>on</strong>ships.<br />
2. Lexical-derived relati<strong>on</strong>ships Lexical relati<strong>on</strong>ships between names in different<br />
schemas are exploited to extract inter-schema relati<strong>on</strong>ships. WordNet [Fel99] is<br />
used to extract syn<strong>on</strong>yms and hypo/hypernyms. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, syn<strong>on</strong>ymous terms<br />
are also extracted from attributes with similar names.<br />
3. Designer-supplied relati<strong>on</strong>ships In this phase, <str<strong>on</strong>g>the</str<strong>on</strong>g> designer can supply relati<strong>on</strong>ships<br />
between schemas. A Descripti<strong>on</strong> Logic reas<strong>on</strong>er, such as ODB, can be used to<br />
check <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ships for inc<strong>on</strong>sistency.<br />
4. Inferred relati<strong>on</strong>ships Descripti<strong>on</strong> Logic reas<strong>on</strong>ing is used to infer new relati<strong>on</strong>ships<br />
between ODL I 3 classes, based <strong>on</strong> relati<strong>on</strong>ships specified in <str<strong>on</strong>g>the</str<strong>on</strong>g> previous<br />
phases.<br />
In each <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> phases, intenti<strong>on</strong>al relati<strong>on</strong>ships are added to <str<strong>on</strong>g>the</str<strong>on</strong>g> Comm<strong>on</strong> Thesaurus.<br />
The designer can streng<str<strong>on</strong>g>the</str<strong>on</strong>g>n <str<strong>on</strong>g>the</str<strong>on</strong>g>se intenti<strong>on</strong>al relati<strong>on</strong>ships by creating extensi<strong>on</strong>al relati<strong>on</strong>ships,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>reby enabling subsumpti<strong>on</strong> reas<strong>on</strong>ing and c<strong>on</strong>sistency checking. An intenti<strong>on</strong>al<br />
relati<strong>on</strong>ship can be seen as saying “<str<strong>on</strong>g>the</str<strong>on</strong>g>re exists in general a relati<strong>on</strong>ship between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>se terms”, whereas an extensi<strong>on</strong>al relati<strong>on</strong>ship can be seen as saying “this relati<strong>on</strong>ship<br />
holds for <str<strong>on</strong>g>the</str<strong>on</strong>g>se p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular data sources”.<br />
MOMIS employs hierarchical clustering based <strong>on</strong> an affinity measure, which indicates<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> similarity between classes in a cluster. The affinity measure is based <strong>on</strong> both <str<strong>on</strong>g>the</str<strong>on</strong>g> name<br />
and structural similarity.
CHAPTER 4. THE SURVEY 89<br />
The clusters in <str<strong>on</strong>g>the</str<strong>on</strong>g> hierarchy are used to interactively create new classes for <str<strong>on</strong>g>the</str<strong>on</strong>g> merged<br />
<strong>on</strong>tology. Generally, a uni<strong>on</strong> is taken <str<strong>on</strong>g>of</str<strong>on</strong>g> all classes in a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular cluster in order to come<br />
up with <str<strong>on</strong>g>the</str<strong>on</strong>g> new global class. During <str<strong>on</strong>g>the</str<strong>on</strong>g> process <str<strong>on</strong>g>of</str<strong>on</strong>g> creating <str<strong>on</strong>g>the</str<strong>on</strong>g> global class, mapping<br />
rules between <str<strong>on</strong>g>the</str<strong>on</strong>g> attributes in <str<strong>on</strong>g>the</str<strong>on</strong>g> local classes and <str<strong>on</strong>g>the</str<strong>on</strong>g> global class are established and<br />
stored in <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology for later use. Because <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology is created <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
basis <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> local <strong>on</strong>tologies, MOMIS takes <str<strong>on</strong>g>the</str<strong>on</strong>g> global-as-view approach [Lev00], which<br />
means that <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema is created as a view over <str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas and all queries<br />
to <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema can be easily translated to <str<strong>on</strong>g>the</str<strong>on</strong>g> local representati<strong>on</strong> because <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
presence <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping rules.<br />
It is not clear how new sources can really be integrated <strong>on</strong>ce <str<strong>on</strong>g>the</str<strong>on</strong>g> system is in place. It<br />
seems that <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema has to be re-created from <str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas, although <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
computed affinity clusters can <str<strong>on</strong>g>of</str<strong>on</strong>g> course be reused and if <str<strong>on</strong>g>the</str<strong>on</strong>g> classes in <str<strong>on</strong>g>the</str<strong>on</strong>g> new schema to<br />
be integrated fit inside <str<strong>on</strong>g>the</str<strong>on</strong>g> existing affinity clusters, <strong>on</strong>ly <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping rules between <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
new schema and <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema need to be created.<br />
Automati<strong>on</strong> Support Automati<strong>on</strong> support in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology merging task is provided by<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> ARTEMIS tool [CdA99]. The ARTEMIS tool provides support in <str<strong>on</strong>g>the</str<strong>on</strong>g> matching task<br />
by determining <str<strong>on</strong>g>the</str<strong>on</strong>g> (name and structural) affinity between terms in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
Applicability to Use Cases In MOMIS, <str<strong>on</strong>g>the</str<strong>on</strong>g> Query Manager does query rewriting based<br />
<strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping rules in <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology. The wrapper <str<strong>on</strong>g>of</str<strong>on</strong>g> each data source rewrites<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> query from its ODL I 3 representati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> data source and also<br />
transforms <str<strong>on</strong>g>the</str<strong>on</strong>g> query results back to <str<strong>on</strong>g>the</str<strong>on</strong>g> ODL I 3 representati<strong>on</strong>. The query manager <str<strong>on</strong>g>the</str<strong>on</strong>g>n<br />
uses <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping rules to translate <str<strong>on</strong>g>the</str<strong>on</strong>g> query results back to <str<strong>on</strong>g>the</str<strong>on</strong>g> global representati<strong>on</strong> in<br />
order to present <str<strong>on</strong>g>the</str<strong>on</strong>g> results to <str<strong>on</strong>g>the</str<strong>on</strong>g> user. It is not exactly clear if and how <str<strong>on</strong>g>the</str<strong>on</strong>g> Query Manager<br />
fuses <str<strong>on</strong>g>the</str<strong>on</strong>g> query results from <str<strong>on</strong>g>the</str<strong>on</strong>g> different sources in case <str<strong>on</strong>g>of</str<strong>on</strong>g> overlap in <str<strong>on</strong>g>the</str<strong>on</strong>g> result sets.<br />
Implementati<strong>on</strong> A number <str<strong>on</strong>g>of</str<strong>on</strong>g> comp<strong>on</strong>ents are used to enable <str<strong>on</strong>g>the</str<strong>on</strong>g> MOMIS architecture.<br />
These comp<strong>on</strong>ents are (see Figure 4.20, taken from [BCVB01]):<br />
• A wrapper performs <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> individual data source into <str<strong>on</strong>g>the</str<strong>on</strong>g> ODL I 3<br />
language (and translates <str<strong>on</strong>g>the</str<strong>on</strong>g> queries back).<br />
• The mediator c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> query manager (QM) and <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema builder<br />
(GSB). The QM comp<strong>on</strong>ent breaks up global ODL I 3 queries into sub-queries for<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> different data sources. Therefore, <str<strong>on</strong>g>the</str<strong>on</strong>g> GSB is an <str<strong>on</strong>g>of</str<strong>on</strong>g>fline comp<strong>on</strong>ent, which aids<br />
in <strong>on</strong>tology merging and <str<strong>on</strong>g>the</str<strong>on</strong>g> QM is a run-time comp<strong>on</strong>ent, which performs <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
queries.<br />
• The ARTEMIS tool envir<strong>on</strong>ment performs <str<strong>on</strong>g>the</str<strong>on</strong>g> classificati<strong>on</strong> (affinity and syn<str<strong>on</strong>g>the</str<strong>on</strong>g>sis)<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> classes for <str<strong>on</strong>g>the</str<strong>on</strong>g> syn<str<strong>on</strong>g>the</str<strong>on</strong>g>sis <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> global classes.
CHAPTER 4. THE SURVEY 90<br />
• The ODB-tools engine performs schema validati<strong>on</strong> and inferences for <str<strong>on</strong>g>the</str<strong>on</strong>g> generati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Comm<strong>on</strong> Thesaurus, as well as query optimizati<strong>on</strong> for <str<strong>on</strong>g>the</str<strong>on</strong>g> Query Manager.<br />
Figure 4.20: Architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> MOMIS system [BCVB01]<br />
The architecture in Figure 4.20 shows <str<strong>on</strong>g>the</str<strong>on</strong>g> main tools used to support <str<strong>on</strong>g>the</str<strong>on</strong>g> overall architecture.<br />
A disadvantage is that <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no integrated tool envir<strong>on</strong>ment. Any data source<br />
can be c<strong>on</strong>nected to <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture, as l<strong>on</strong>g as an ODL I 3 wrapper is created.<br />
Experience As far as we are aware, MOMIS has not been used in any major industrial<br />
project and is mainly an academic research activity, with toy examples. However,<br />
[CAdV01] reports <str<strong>on</strong>g>the</str<strong>on</strong>g> applicati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> ARTEMIS in a research project in <str<strong>on</strong>g>the</str<strong>on</strong>g> Italian Public<br />
Administrati<strong>on</strong> domain.<br />
4.2.5 INFOMIX<br />
INFOMIX [LGR + 05b, LGR + 05a, LLR02] is a system that supports informati<strong>on</strong> integrati<strong>on</strong><br />
by utilizing advanced reas<strong>on</strong>ing capabilities. The INFOMIX system is built in cooperati<strong>on</strong><br />
with RODAN systems 18 which is a commercial database management system<br />
developer. It uses <str<strong>on</strong>g>the</str<strong>on</strong>g> DLV reas<strong>on</strong>ing system [LPF + 05] for <str<strong>on</strong>g>the</str<strong>on</strong>g> reas<strong>on</strong>ing tasks. The DLV<br />
system is a disjunctive datalog reas<strong>on</strong>ing system and has been delevoped independently<br />
18 The web site <str<strong>on</strong>g>of</str<strong>on</strong>g> RODAN systems can be found at http://www.rodan.pl/en/.
CHAPTER 4. THE SURVEY 91<br />
from INFOMIX. However, INFOMIX uses <str<strong>on</strong>g>the</str<strong>on</strong>g> DLV system for solving its reas<strong>on</strong>ing<br />
tasks.<br />
Figure 4.21: Architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> INFOMIX system [LGR + 05b]<br />
The INFOMIX architecture is depicted in Figure 4.21. It is divided into three levels<br />
and supports two modes:<br />
• a design mode<br />
Here, <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema, <str<strong>on</strong>g>the</str<strong>on</strong>g> source schema and <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between <str<strong>on</strong>g>the</str<strong>on</strong>g>m are<br />
specified. Also, <str<strong>on</strong>g>the</str<strong>on</strong>g> wrappers for <str<strong>on</strong>g>the</str<strong>on</strong>g> data sources are created or imported. The data<br />
sources c<strong>on</strong>sist <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>al and XML data.<br />
• a query mode<br />
Here, query answering facilities are provided at run time, including data acquisiti<strong>on</strong>,<br />
integrati<strong>on</strong>, answer computati<strong>on</strong> and presentati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> user.<br />
In both modes, <str<strong>on</strong>g>the</str<strong>on</strong>g> INFOMIX system is divided into three levels:<br />
• Informati<strong>on</strong> Service Level<br />
Here, a direct interface to <str<strong>on</strong>g>the</str<strong>on</strong>g> user is provided at run time and to <str<strong>on</strong>g>the</str<strong>on</strong>g> designer at
CHAPTER 4. THE SURVEY 92<br />
design time. This level deals with global data and provides <str<strong>on</strong>g>the</str<strong>on</strong>g> interfaces that are<br />
necessary (e.g. for global schema definiti<strong>on</strong> and for query formulati<strong>on</strong>.<br />
This level comprises two modules: <str<strong>on</strong>g>the</str<strong>on</strong>g> Informati<strong>on</strong> Model Manager and <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Query Formulati<strong>on</strong> module:<br />
– The Informati<strong>on</strong> Model Manager handles <str<strong>on</strong>g>the</str<strong>on</strong>g> definiti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas, as well as <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping. User-friendly interfaces for<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>se tasks, including schema browsers, are provided. Automatic support<br />
for <str<strong>on</strong>g>the</str<strong>on</strong>g> verificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> coherency, redundancy and adequacy <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> applicati<strong>on</strong><br />
specificati<strong>on</strong> is given. Finally, this module presents query results in a suitable<br />
form to <str<strong>on</strong>g>the</str<strong>on</strong>g> user.<br />
– The Query Formulati<strong>on</strong> module provides a graphical, user-friendly interface<br />
for query formulati<strong>on</strong> over <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema and query validati<strong>on</strong> facilities.<br />
The query validati<strong>on</strong> facilities check <str<strong>on</strong>g>the</str<strong>on</strong>g> interacti<strong>on</strong>s between <str<strong>on</strong>g>the</str<strong>on</strong>g> user query<br />
and global integrity c<strong>on</strong>straints to guarantee that query answering is always<br />
decidable.<br />
• Internal Integrati<strong>on</strong> Level<br />
This level is based <strong>on</strong> computati<strong>on</strong>al logic and deductive database technology. It is<br />
composed by three modules:<br />
– The Query Rewriter reformulates <str<strong>on</strong>g>the</str<strong>on</strong>g> user query according to global integrity<br />
c<strong>on</strong>straints. It makes use <str<strong>on</strong>g>of</str<strong>on</strong>g> a submodule to verify data c<strong>on</strong>sistency. This<br />
submodule exploits <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping and unfolds <str<strong>on</strong>g>the</str<strong>on</strong>g> user query over <str<strong>on</strong>g>the</str<strong>on</strong>g> source<br />
relati<strong>on</strong>s and activates <str<strong>on</strong>g>the</str<strong>on</strong>g> corresp<strong>on</strong>ding wrappers to retrieve relevant data.<br />
Afterwards, <str<strong>on</strong>g>the</str<strong>on</strong>g> submodule checks we<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g>re are integrity c<strong>on</strong>straint violati<strong>on</strong>s.<br />
If no violati<strong>on</strong>s occur, <str<strong>on</strong>g>the</str<strong>on</strong>g> reformulati<strong>on</strong> produced by <str<strong>on</strong>g>the</str<strong>on</strong>g> rewriter is a<br />
simple (disjuncti<strong>on</strong> free) Datalog program. O<str<strong>on</strong>g>the</str<strong>on</strong>g>rwise, a suitable disjunctive<br />
Datalog program is generated that performs automatic repair <str<strong>on</strong>g>of</str<strong>on</strong>g> data, in a way<br />
such that cautious answers to this program evaluated over <str<strong>on</strong>g>the</str<strong>on</strong>g> data sources<br />
corresp<strong>on</strong>d to <str<strong>on</strong>g>the</str<strong>on</strong>g> certain answers to <str<strong>on</strong>g>the</str<strong>on</strong>g> query.<br />
– The Query Optimizer provides several optimizati<strong>on</strong> strategies which enhance<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> efficiency <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> system. In p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular, <str<strong>on</strong>g>the</str<strong>on</strong>g> module exploits some<br />
focusing techniques which are able to isolate <str<strong>on</strong>g>the</str<strong>on</strong>g> porti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source database<br />
that is relevant to answer a user query. This is d<strong>on</strong>e by pushing c<strong>on</strong>stants in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> query towards <str<strong>on</strong>g>the</str<strong>on</strong>g> sources, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> query that c<strong>on</strong>tains c<strong>on</strong>stants<br />
is processed first. To this aim, an optimized (possibly disjunctive) Datalog<br />
program is generated by applying advanced binding propagati<strong>on</strong> techniques<br />
similar to <str<strong>on</strong>g>the</str<strong>on</strong>g> magic sets algorithms that are well-known in <str<strong>on</strong>g>the</str<strong>on</strong>g> database area.<br />
– The Query Evaluator takes <str<strong>on</strong>g>the</str<strong>on</strong>g> optimized program and first loads data from<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> Internal Data Store and <str<strong>on</strong>g>the</str<strong>on</strong>g>n invokes <str<strong>on</strong>g>the</str<strong>on</strong>g> DLV system in order to com-
CHAPTER 4. THE SURVEY 93<br />
pute c<strong>on</strong>sistent answers. The results are <str<strong>on</strong>g>the</str<strong>on</strong>g>n sent to <str<strong>on</strong>g>the</str<strong>on</strong>g> Informati<strong>on</strong> Model<br />
Manager for suitable presentati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> user.<br />
• Data Acquisiti<strong>on</strong> and Transformati<strong>on</strong> Level<br />
This level provides access to external data sources. INFOMIX has an architecture<br />
which allows for <str<strong>on</strong>g>the</str<strong>on</strong>g> integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> heterogenous types <str<strong>on</strong>g>of</str<strong>on</strong>g> data sources. The primary<br />
types <str<strong>on</strong>g>of</str<strong>on</strong>g> data sources are relati<strong>on</strong>al, XML, HTML, and object-oriented data sources.<br />
However, it is claimed that arbitrary o<str<strong>on</strong>g>the</str<strong>on</strong>g>r types <str<strong>on</strong>g>of</str<strong>on</strong>g> data sources can be incorporated<br />
easily. All data sources are c<strong>on</strong>ceptually transformed into a uniform source data<br />
format, which is a fragment <str<strong>on</strong>g>of</str<strong>on</strong>g> XML Schema. Data encoded in this uniform source<br />
data format can be browsed. The acquisiti<strong>on</strong> and transformati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> data is d<strong>on</strong>e by<br />
wrappers. A query plan for executing suitable wrappers is generated, which load<br />
data into <str<strong>on</strong>g>the</str<strong>on</strong>g> Internal Data Store. C<strong>on</strong>stants are pushed to <str<strong>on</strong>g>the</str<strong>on</strong>g> query wrappers whenever<br />
it is possible in order to reduce <str<strong>on</strong>g>the</str<strong>on</strong>g> amount <str<strong>on</strong>g>of</str<strong>on</strong>g> data retrieved.<br />
Currently, INFOMIX <str<strong>on</strong>g>of</str<strong>on</strong>g>fers three classes <str<strong>on</strong>g>of</str<strong>on</strong>g> wrappers, which provide different levels<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> support for query formulati<strong>on</strong> and wrapper code generati<strong>on</strong>:<br />
– Code wrappers are basically a definiti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> an API and <str<strong>on</strong>g>the</str<strong>on</strong>g> code implementing<br />
it. The internals and characteristics <str<strong>on</strong>g>of</str<strong>on</strong>g> code wrappers are <str<strong>on</strong>g>the</str<strong>on</strong>g>refore inaccessible<br />
to INFOMIX.<br />
– Query wrappers propagate queries to external data sources and treat <str<strong>on</strong>g>the</str<strong>on</strong>g> result<br />
as a logical data source.<br />
– Visual wrappers support interactive development <str<strong>on</strong>g>of</str<strong>on</strong>g> wrappers at design time.<br />
Currently, <str<strong>on</strong>g>the</str<strong>on</strong>g>re is support for developing LiXto wrappers and pipes as well as<br />
for Rodans Data Extractor.<br />
<strong>Ontology</strong> Languages The INFOMIX system supports <str<strong>on</strong>g>the</str<strong>on</strong>g> following kinds <str<strong>on</strong>g>of</str<strong>on</strong>g> external<br />
data sources:<br />
• Relati<strong>on</strong>al data in <str<strong>on</strong>g>the</str<strong>on</strong>g> ODBC standard, i.e. especially SQL, is supported.<br />
• Any object-oriented database that c<strong>on</strong>forms to <str<strong>on</strong>g>the</str<strong>on</strong>g> ODMG standard (cf<br />
http://www.odmg.org/) and which is accessible by <str<strong>on</strong>g>the</str<strong>on</strong>g> ODMG 2.0 Java binding will<br />
be supported. In p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular, <str<strong>on</strong>g>the</str<strong>on</strong>g> object-oriented database must be accessible by OQL<br />
queries (more specifically, by <str<strong>on</strong>g>the</str<strong>on</strong>g> subset <str<strong>on</strong>g>of</str<strong>on</strong>g> OQL supported by <str<strong>on</strong>g>the</str<strong>on</strong>g> ODMG Java<br />
binding). However, no multimedia data types will be supported although usually<br />
supported by object-oriented database management systems. Currently <str<strong>on</strong>g>the</str<strong>on</strong>g> support<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> object-oriented databases is not available.<br />
• Pure (semi-structured) XML data (according to <str<strong>on</strong>g>the</str<strong>on</strong>g> XML 1.0 standard) as well as<br />
XML data structured according to a corresp<strong>on</strong>ding XML Schema specificati<strong>on</strong> is<br />
supported. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> INFOMIX system does not check <str<strong>on</strong>g>the</str<strong>on</strong>g> validity <str<strong>on</strong>g>of</str<strong>on</strong>g> structured<br />
XML data. It also does not take DTD specificati<strong>on</strong>s into account.
CHAPTER 4. THE SURVEY 94<br />
• Valid web pages c<strong>on</strong>forming to <str<strong>on</strong>g>the</str<strong>on</strong>g> HTML 2.0, 3.2 and 4.1 standards are supported.<br />
However, <str<strong>on</strong>g>the</str<strong>on</strong>g> INFOMIX system does not check <str<strong>on</strong>g>the</str<strong>on</strong>g> validity <str<strong>on</strong>g>of</str<strong>on</strong>g> HTML data. Note<br />
that HTML rendered versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> embedded data formats like comma separated values<br />
(CVS), Micros<str<strong>on</strong>g>of</str<strong>on</strong>g>t Word (planned), Excel and Powerpoint (planned) documents<br />
as well PDF documents are supported. INFOMIX does not support multi-media<br />
data such as image files, sound or movie data.<br />
Data extracti<strong>on</strong> and preprocessing <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> data sources is d<strong>on</strong>e by LiXto [BFG01] wrappers<br />
and tools.<br />
The data is internally stored in a language called ISDF (INFOMIX source data format)<br />
and has an XML-like syntax. It is specified by a subset <str<strong>on</strong>g>of</str<strong>on</strong>g> XML Schema. Essentially,<br />
ISDF amounts to a small extensi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>al format, by allowing nesting <str<strong>on</strong>g>of</str<strong>on</strong>g> data<br />
and multiple attribute values. This format is translated by <str<strong>on</strong>g>the</str<strong>on</strong>g> query reformulator into<br />
disjunctive Datalog programs extended by aggregate functi<strong>on</strong>s.<br />
Mapping Language The mapping is a Global-As-View (GAV) mapping. I.e. <str<strong>on</strong>g>the</str<strong>on</strong>g>re is a<br />
global schema G and a source schema S which comprises <str<strong>on</strong>g>the</str<strong>on</strong>g> schemas <str<strong>on</strong>g>of</str<strong>on</strong>g> all <str<strong>on</strong>g>the</str<strong>on</strong>g> sources to<br />
be integrated. It c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> a set <str<strong>on</strong>g>of</str<strong>on</strong>g> logical implicati<strong>on</strong>s in Datalog. Such an implicati<strong>on</strong><br />
has a c<strong>on</strong>juncti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> atoms in S as antecedent and as c<strong>on</strong>sequent a relati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> G. All<br />
free variables <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> body <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> implicati<strong>on</strong> are quantified. This means that each global<br />
relati<strong>on</strong> is associated with a uni<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> c<strong>on</strong>junctive queries. Clearly, this mapping language<br />
is not as general as our mapping language which can be grounded to different kinds <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
logics and is much more expressive.<br />
Mapping Patterns There are a user-friendly interface and schema browsers for <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
global schema, <str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas and <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings. However, <str<strong>on</strong>g>the</str<strong>on</strong>g>re does not exist a<br />
c<strong>on</strong>cept <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping patterns. We are not aware <str<strong>on</strong>g>of</str<strong>on</strong>g> planned extensi<strong>on</strong>s to use mapping<br />
patterns.<br />
Automati<strong>on</strong> Support There is no automati<strong>on</strong> support in <str<strong>on</strong>g>the</str<strong>on</strong>g> creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings between<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas and <str<strong>on</strong>g>the</str<strong>on</strong>g> global schema. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> query rewriting and <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
data fusi<strong>on</strong> is completely automated based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> local schemas<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> global schemas as well as <str<strong>on</strong>g>the</str<strong>on</strong>g> wrapper specificati<strong>on</strong>s between <str<strong>on</strong>g>the</str<strong>on</strong>g> original data<br />
source formats and <str<strong>on</strong>g>the</str<strong>on</strong>g> internal storage format.<br />
Applicability to Use Cases Instance transformati<strong>on</strong> and unificati<strong>on</strong> can be performed<br />
by <str<strong>on</strong>g>the</str<strong>on</strong>g> INFOMIX system. Query rewriting is d<strong>on</strong>e by <str<strong>on</strong>g>the</str<strong>on</strong>g> INFOMIX system as well<br />
naturally.<br />
The INFOMIX system does not perform <strong>on</strong>tology merging. However, it translates <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
relevant data into a uniform data format. This can be viewed in some respect as <strong>on</strong>tology
CHAPTER 4. THE SURVEY 95<br />
merging.<br />
Implementati<strong>on</strong> The kernel <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> system is be developed in C++ to achieve better performance.<br />
The user interfaces and data-extracti<strong>on</strong> modules are based <strong>on</strong> Java comp<strong>on</strong>ents.<br />
Experience There is no experience report available yet.<br />
4.2.6 AutoMed<br />
AutoMed [BKL + 04] is an integrated system that incorporates <str<strong>on</strong>g>the</str<strong>on</strong>g> BAV (both as view)<br />
approach. This means that it does not adhere to ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r <str<strong>on</strong>g>the</str<strong>on</strong>g> LAV (local as view) approach<br />
or <str<strong>on</strong>g>the</str<strong>on</strong>g> GAV (global as view) or GLAV (global local as view) approach. BAV subsumes<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> expressive power <str<strong>on</strong>g>of</str<strong>on</strong>g> all <str<strong>on</strong>g>the</str<strong>on</strong>g>se menti<strong>on</strong>ed approaches.<br />
Figure 4.22 shows <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> AutoMed system. The meta data repository<br />
can be viewed as <str<strong>on</strong>g>the</str<strong>on</strong>g> core and o<str<strong>on</strong>g>the</str<strong>on</strong>g>r comp<strong>on</strong>ents <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> s<str<strong>on</strong>g>of</str<strong>on</strong>g>tware architecture are implemented<br />
up<strong>on</strong> it. When a data source is wrapped, a definiti<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> schema for that data<br />
source is added to <str<strong>on</strong>g>the</str<strong>on</strong>g> repository. The schema matching tool can <str<strong>on</strong>g>the</str<strong>on</strong>g>n identify related<br />
objects in various data sources by means <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> query processor that retrieves data from<br />
schema objects. The template transformati<strong>on</strong> tool is used to generate transformati<strong>on</strong>s<br />
between <str<strong>on</strong>g>the</str<strong>on</strong>g> data sources. A GUI is available for <str<strong>on</strong>g>the</str<strong>on</strong>g>se comp<strong>on</strong>ents.<br />
The repository has two logical comp<strong>on</strong>ents:<br />
• The model defniti<strong>on</strong>s repository (MDR) defines how a data modelling language<br />
is represented (in <str<strong>on</strong>g>the</str<strong>on</strong>g> hypergraph data model (HDM)). It can be used to c<strong>on</strong>figure<br />
AutoMed to handle a p<str<strong>on</strong>g>art</str<strong>on</strong>g>icular data modelling language.<br />
• The schema transformati<strong>on</strong> repository (STR) defines schemas in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
data modelling c<strong>on</strong>cepts in <str<strong>on</strong>g>the</str<strong>on</strong>g> MDR and transformati<strong>on</strong>s that will be specified<br />
between those schemas. Most tools and users will be c<strong>on</strong>cerned with editing this<br />
repository when new databases are added to <str<strong>on</strong>g>the</str<strong>on</strong>g> repository.<br />
The MDR and STR may be held in <str<strong>on</strong>g>the</str<strong>on</strong>g> same or in separate persistent storages.<br />
Figure 4.23 gives an overview <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> key objects in <str<strong>on</strong>g>the</str<strong>on</strong>g> repository. The STR c<strong>on</strong>tains<br />
a set <str<strong>on</strong>g>of</str<strong>on</strong>g> Schema descripti<strong>on</strong>s. Each such Schema descripti<strong>on</strong> c<strong>on</strong>tains a set <str<strong>on</strong>g>of</str<strong>on</strong>g> SchemaObject<br />
instances. Each such SchemaObject instance must be based <strong>on</strong> a C<strong>on</strong>struct instance<br />
that exists in <str<strong>on</strong>g>the</str<strong>on</strong>g> MDR. This C<strong>on</strong>struct describes how <str<strong>on</strong>g>the</str<strong>on</strong>g> SchemaObject can be c<strong>on</strong>structed<br />
in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> strings and references to o<str<strong>on</strong>g>the</str<strong>on</strong>g>r schema objects, and <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>struct to <str<strong>on</strong>g>the</str<strong>on</strong>g> HDM. Schemas are thus easily translatable into HDM which is<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> comm<strong>on</strong> underlying representati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> all <str<strong>on</strong>g>the</str<strong>on</strong>g> data modelling languages handled in<br />
AutoMed. Note that each Schema may c<strong>on</strong>tain SchemaObjects from more that <strong>on</strong>e data
CHAPTER 4. THE SURVEY 96<br />
Figure 4.22: Architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> AutoMed system [BKL + 04]<br />
modelling language. In this way, mappings between different data modelling languages<br />
can be described. Schemas may be related to each o<str<strong>on</strong>g>the</str<strong>on</strong>g>r using transformati<strong>on</strong> instances.<br />
For this deliverable <str<strong>on</strong>g>the</str<strong>on</strong>g> schema matching architecture is most relevant and most important.<br />
Figure 4.24 shows <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> schema mapping module.<br />
Within AutoMed, five types <str<strong>on</strong>g>of</str<strong>on</strong>g> semantic relati<strong>on</strong>ships between schema objects are distinguished<br />
based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>ir intenti<strong>on</strong>al domains, i.e. <str<strong>on</strong>g>the</str<strong>on</strong>g> sets <str<strong>on</strong>g>of</str<strong>on</strong>g> real-world<br />
entities represented by <str<strong>on</strong>g>the</str<strong>on</strong>g> schema objects. These are: equivalence, subsumpti<strong>on</strong>, intersecti<strong>on</strong>,<br />
disjointness and incompatibility. Transformati<strong>on</strong> rules that should be performed<br />
<strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> schemas based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> discovered semantic relati<strong>on</strong>ships are defined in [Riz03]. In<br />
AutoMed, <str<strong>on</strong>g>the</str<strong>on</strong>g> automatic discovery <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic relati<strong>on</strong>ships are performed by means<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> a bidirecti<strong>on</strong>al comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> schema objects.<br />
The relati<strong>on</strong>ship discovery process is implemented by means <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture in<br />
Figure 4.24. The architecture c<strong>on</strong>sists <str<strong>on</strong>g>of</str<strong>on</strong>g> several modules that exploit different types <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
informati<strong>on</strong> in order to compute bidirecti<strong>on</strong>al similarity degrees <str<strong>on</strong>g>of</str<strong>on</strong>g> schema objects. The<br />
currently implemented modules compare schema object names, instances, statistical data<br />
over <str<strong>on</strong>g>the</str<strong>on</strong>g> instances, data types, value ranges and lengths. There are two types <str<strong>on</strong>g>of</str<strong>on</strong>g> modules:<br />
relati<strong>on</strong>ship identificati<strong>on</strong> modules attempt to discover compatible pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> schema objects,<br />
and relati<strong>on</strong>ship clarificati<strong>on</strong> modules attempt to specify <str<strong>on</strong>g>the</str<strong>on</strong>g> type <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic<br />
relati<strong>on</strong>ship in each compatible pair. Initially in <str<strong>on</strong>g>the</str<strong>on</strong>g> schema matching process, <str<strong>on</strong>g>the</str<strong>on</strong>g> bidirecti<strong>on</strong>al<br />
similarity degrees produced by <str<strong>on</strong>g>the</str<strong>on</strong>g> modules are combined by a Filter that uses <str<strong>on</strong>g>the</str<strong>on</strong>g>
CHAPTER 4. THE SURVEY 97<br />
Figure 4.23: Architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> AutoMed repository schema [BKL + 04]<br />
average aggregati<strong>on</strong> strategy in order to separate <str<strong>on</strong>g>the</str<strong>on</strong>g> compatible from <str<strong>on</strong>g>the</str<strong>on</strong>g> incompatible<br />
pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> schema objects. Afterwards, <str<strong>on</strong>g>the</str<strong>on</strong>g> Aggregator comp<strong>on</strong>ent combines <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity<br />
degrees <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> compatible schema objects with <str<strong>on</strong>g>the</str<strong>on</strong>g> product aggregati<strong>on</strong> strategy and indicates<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>ir semantic relati<strong>on</strong>ships. The output <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Aggregator becomes <str<strong>on</strong>g>the</str<strong>on</strong>g> input <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> Degree Combinator, which is based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship clarificati<strong>on</strong> modules and <strong>on</strong><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> values <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity degrees. The Degree Combinator has as output <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic<br />
relati<strong>on</strong>ships it has discovered. The user can <str<strong>on</strong>g>the</str<strong>on</strong>g>n validate or reject <str<strong>on</strong>g>the</str<strong>on</strong>g> discovered<br />
relati<strong>on</strong>ships and step ahead to <str<strong>on</strong>g>the</str<strong>on</strong>g> data integrati<strong>on</strong> process.<br />
<strong>Ontology</strong> Languages The AutoMed system can use different <strong>Ontology</strong> languages. It<br />
supports RDF and RDFS as well as XML and comm<strong>on</strong> relati<strong>on</strong>al Databases formats.<br />
Mapping Language The mapping language is <str<strong>on</strong>g>the</str<strong>on</strong>g> hypergraph data model (HDM). This<br />
is <str<strong>on</strong>g>the</str<strong>on</strong>g> internal language that is used.<br />
Mapping Patterns Mapping Patterns are not menti<strong>on</strong>ed.<br />
Automati<strong>on</strong> Support There is automatic support for <str<strong>on</strong>g>the</str<strong>on</strong>g> creati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings. The<br />
mappings creati<strong>on</strong> architecture shown in Figure 4.24 displays <str<strong>on</strong>g>the</str<strong>on</strong>g> comp<strong>on</strong>ents that are<br />
used to create automatic mappings.<br />
Applicability to Use Cases All use cases can be applied.
CHAPTER 4. THE SURVEY 98<br />
Figure 4.24: Architecture <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> AutoMed mapping system [BKL + 04]<br />
Implementati<strong>on</strong> The AutoMed system has been implemented in Java. The different<br />
comp<strong>on</strong>ents <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> architecture have been implemented in Java as well.<br />
Experience The authors are not aware <str<strong>on</strong>g>of</str<strong>on</strong>g> experience reports <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> AutoMed system.<br />
4.3 Specific Techniques<br />
In this secti<strong>on</strong> we briefly describe some specific techniques for <strong>on</strong>tology merging such as<br />
FCA-Merge, OntoMorph, and QOM.<br />
4.3.1 FCA-Merge<br />
FCA-Merge [SM01] is a method for merging <strong>on</strong>tologies based <strong>on</strong> Formal C<strong>on</strong>cept Analysis<br />
[GW99]. The FCA-Merge approach is a bottom-up approach, which means that it<br />
is based <strong>on</strong> applicati<strong>on</strong>-specific instances <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> two <strong>on</strong>tologies that need to be merged.
CHAPTER 4. THE SURVEY 99<br />
A set <str<strong>on</strong>g>of</str<strong>on</strong>g> documents that are relevant to both <strong>on</strong>tologies are provided as input. Using linguistic<br />
analysis, instances are extracted from <str<strong>on</strong>g>the</str<strong>on</strong>g> documents for both <strong>on</strong>tologies. Now a<br />
pruned c<strong>on</strong>cept lattice is generated using <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity in instances for both <strong>on</strong>tologies.<br />
These first two steps (lexical analysis and generating <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cept lattice) are carried out<br />
fully automatically. In <str<strong>on</strong>g>the</str<strong>on</strong>g> third and last step <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> method, <str<strong>on</strong>g>the</str<strong>on</strong>g> merged target <strong>on</strong>tology is<br />
created interactively (i.e. semi-automatically).<br />
4.3.2 OntoMorph<br />
The OntoMorph system aims to facilitate <strong>on</strong>tology merging and <str<strong>on</strong>g>the</str<strong>on</strong>g> rapid generati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
knowledge base translators [Cha00]. It combines two powerful mechanisms to describe<br />
KB transformati<strong>on</strong>s. The first <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se mechanisms is syntactic rewriting via patterndirected<br />
rewrite rules that allow <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>cise specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> sentence-level transformati<strong>on</strong>s<br />
based <strong>on</strong> pattern matching, and <str<strong>on</strong>g>the</str<strong>on</strong>g> sec<strong>on</strong>d mechanism involves semantic rewriting<br />
which modulates syntactic rewriting via semantic models and logical inference. The integrati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies can be based <strong>on</strong> any mixture <str<strong>on</strong>g>of</str<strong>on</strong>g> syntactic and semantic criteria.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> syntactic rewriting process, input expressi<strong>on</strong>s are first tokenized into lexemes<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g>n represented as syntax trees, which are represented internally as flat sequences<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> tokens and <str<strong>on</strong>g>the</str<strong>on</strong>g>ir structure <strong>on</strong>ly exists logically. OntoMorph’s pattern language and<br />
executi<strong>on</strong> model is str<strong>on</strong>gly influenced by Plisp [Smi90]. The pattern language can match<br />
and de-structure arbitrarily nested syntax trees in a direct and c<strong>on</strong>cise fashi<strong>on</strong>. Rewrite<br />
rules are applied to <str<strong>on</strong>g>the</str<strong>on</strong>g> executi<strong>on</strong> model.<br />
For <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic rewriting process, OntoMorph is built <strong>on</strong> top <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> PowerLoom 19<br />
knowledge representati<strong>on</strong> system, which is a successor to <str<strong>on</strong>g>the</str<strong>on</strong>g> Loom system. Using semantic<br />
import rules, <str<strong>on</strong>g>the</str<strong>on</strong>g> precise image <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source KB semantics can be established<br />
within PowerLoom (limited <strong>on</strong>ly by <str<strong>on</strong>g>the</str<strong>on</strong>g> expressiveness <str<strong>on</strong>g>of</str<strong>on</strong>g> first-order logic).<br />
4.3.3 QOM Quick <strong>Ontology</strong> Mapping<br />
The QOM tool [ES04a] represents an approach that c<strong>on</strong>siders both <str<strong>on</strong>g>the</str<strong>on</strong>g> quality <str<strong>on</strong>g>of</str<strong>on</strong>g> mapping<br />
results as well as <str<strong>on</strong>g>the</str<strong>on</strong>g> run-time complexity. The hypo<str<strong>on</strong>g>the</str<strong>on</strong>g>sis is that mapping algorithms<br />
may be streamlined such that <str<strong>on</strong>g>the</str<strong>on</strong>g> loss <str<strong>on</strong>g>of</str<strong>on</strong>g> quality (compared to a standard baseline) is<br />
marginal, but <str<strong>on</strong>g>the</str<strong>on</strong>g> improvement <str<strong>on</strong>g>of</str<strong>on</strong>g> efficiency is so tremendous that it allows for <str<strong>on</strong>g>the</str<strong>on</strong>g> ad-hoc<br />
mapping <str<strong>on</strong>g>of</str<strong>on</strong>g> large-size, light-weight <strong>on</strong>tologies. To substantiate <str<strong>on</strong>g>the</str<strong>on</strong>g> hypo<str<strong>on</strong>g>the</str<strong>on</strong>g>sis, a number<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> practical experiments were performed.<br />
The goal is to present an efficient mapping algorithm. The outcome is QOM — Quick<br />
<strong>Ontology</strong> Mapping. It is defined by <str<strong>on</strong>g>the</str<strong>on</strong>g> steps <str<strong>on</strong>g>of</str<strong>on</strong>g> a process model as shown in Figure 4.25.<br />
Mapping <strong>on</strong>e <strong>on</strong>tology <strong>on</strong>to ano<str<strong>on</strong>g>the</str<strong>on</strong>g>r means that for each entity (c<strong>on</strong>cept C, relati<strong>on</strong> R, or<br />
19 http://www.isi.edu/isd/LOOM/PowerLoom/
CHAPTER 4. THE SURVEY 100<br />
instance I) in <strong>on</strong>tology O1, <strong>on</strong>e tries to find a corresp<strong>on</strong>ding entity, which has <str<strong>on</strong>g>the</str<strong>on</strong>g> same<br />
intended meaning, in <strong>on</strong>tology O2.<br />
Figure 4.25: QOM Mapping Process<br />
1. Firstly, QOM uses RDF triples as features.<br />
2. Sec<strong>on</strong>d, instead <str<strong>on</strong>g>of</str<strong>on</strong>g> comparing all entities <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> first <strong>on</strong>tology with all entities <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
sec<strong>on</strong>d <strong>on</strong>tology, QOM uses heuristics to lower <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g> candidate mappings,<br />
which is a major problem for run-time complexity. In this dynamic programming<br />
approach we <strong>on</strong>ly choose promising candidate mappings.<br />
3. The actual similarity computati<strong>on</strong> is d<strong>on</strong>e by using a wide range <str<strong>on</strong>g>of</str<strong>on</strong>g> similarity functi<strong>on</strong>s<br />
[ES04b]. An entity is described by <str<strong>on</strong>g>the</str<strong>on</strong>g> kind <str<strong>on</strong>g>of</str<strong>on</strong>g> appearance that is found to<br />
hold for this entity for characteristics like: identifiers such as URIs, RDF/S primitives<br />
such as subclass and instance relati<strong>on</strong>s, or domain specific features e.g. a<br />
hashcode-<str<strong>on</strong>g>of</str<strong>on</strong>g>-file in a file sharing domain. These features <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tological entities are<br />
compared using String Similarity and SimSet for set comparis<strong>on</strong>s. For efficiency<br />
reas<strong>on</strong>s <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity computati<strong>on</strong> was disburdened by removing extremely costly<br />
feature-measure combinati<strong>on</strong>s such as <str<strong>on</strong>g>the</str<strong>on</strong>g> comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> all subclasses <str<strong>on</strong>g>of</str<strong>on</strong>g> two c<strong>on</strong>cepts.<br />
4. These individual measures are all input to <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity aggregati<strong>on</strong>. Instead <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
applying linear aggregati<strong>on</strong> functi<strong>on</strong>s, QOM applies a sigmoid functi<strong>on</strong>, which emphasizes<br />
high individual similarities and de-emphasizes low individual similarities.<br />
5. From <str<strong>on</strong>g>the</str<strong>on</strong>g> similarity values we derive <str<strong>on</strong>g>the</str<strong>on</strong>g> actual mappings. A threshold to discard<br />
spurious evidence <str<strong>on</strong>g>of</str<strong>on</strong>g> similarity is applied. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r mappings are assigned based <strong>on</strong><br />
a greedy strategy that st<str<strong>on</strong>g>art</str<strong>on</strong>g>s with <str<strong>on</strong>g>the</str<strong>on</strong>g> largest similarity values first.<br />
6. Through several iterati<strong>on</strong> rounds <str<strong>on</strong>g>the</str<strong>on</strong>g> quality <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> results rises c<strong>on</strong>siderably. Eventually,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> output returned is a mapping table representing <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong> mapO1,O2.<br />
The table is represented in a proprietary format, but can easily be transformed into<br />
a standardized format.<br />
The evaluati<strong>on</strong> was very promising. Depending <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> scenario QOM reaches high<br />
quality mapping levels very quickly. QOM is <strong>on</strong> a par with o<str<strong>on</strong>g>the</str<strong>on</strong>g>r good state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g><br />
algorithms c<strong>on</strong>cerning <str<strong>on</strong>g>the</str<strong>on</strong>g> quality <str<strong>on</strong>g>of</str<strong>on</strong>g> proposed mappings, while outperforming <str<strong>on</strong>g>the</str<strong>on</strong>g>m with<br />
respect to efficiency — in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> run-time complexity (O(n · log(n)) instead <str<strong>on</strong>g>of</str<strong>on</strong>g> O(n 2 ))<br />
and in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> experiments that have been performed <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> QOM tool (by a factor<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> 10 to 100).
Chapter 5<br />
Comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Methods<br />
5.1 <strong>Ontology</strong> Languages<br />
We compare <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology languages supported by <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches included in this <str<strong>on</strong>g>survey</str<strong>on</strong>g>.<br />
Because many <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> systems included in <str<strong>on</strong>g>the</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> are database integrati<strong>on</strong> systems, <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>tology language is not <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>ly language that counts. For <str<strong>on</strong>g>the</str<strong>on</strong>g>se systems, it also matters<br />
which database schema languages are supported.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g>se integrati<strong>on</strong> systems, <str<strong>on</strong>g>the</str<strong>on</strong>g> data schema is <str<strong>on</strong>g>of</str<strong>on</strong>g>ten lifted to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology level before<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> actual integrati<strong>on</strong> takes place. Therefore, for <str<strong>on</strong>g>the</str<strong>on</strong>g> core mapping task we are usually<br />
<strong>on</strong>ly c<strong>on</strong>cerned with <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology languages. This lifting process <str<strong>on</strong>g>of</str<strong>on</strong>g>ten employed<br />
in database integrati<strong>on</strong> systems indicates <str<strong>on</strong>g>the</str<strong>on</strong>g> need for a comparis<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> lifting methods<br />
employed by <str<strong>on</strong>g>the</str<strong>on</strong>g> different systems. This small sub-comparis<strong>on</strong> evaluates to what extent<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> schema is actually translated to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology. In o<str<strong>on</strong>g>the</str<strong>on</strong>g>r words, what we need to know<br />
is: (1) is <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> sound?, i.e. is <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> from <str<strong>on</strong>g>the</str<strong>on</strong>g> schema to <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology<br />
correct and semantics preserving? and (2) is <str<strong>on</strong>g>the</str<strong>on</strong>g> translati<strong>on</strong> complete?, i.e. is <str<strong>on</strong>g>the</str<strong>on</strong>g> schema<br />
translated completely?<br />
Table 5.1 enumerates <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology languages supported by <str<strong>on</strong>g>the</str<strong>on</strong>g> various approaches. It<br />
turns out that integrati<strong>on</strong> systems typically do not focus <strong>on</strong> inter-operability with o<str<strong>on</strong>g>the</str<strong>on</strong>g>r<br />
<strong>on</strong>tology tools. This makes sense, because all tasks (mainly querying) are performed<br />
in a closed envir<strong>on</strong>ment. In a Semantic Web setting, use <str<strong>on</strong>g>of</str<strong>on</strong>g> standards is very important<br />
to enable inter-operability. We can see that tools created especially for <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic<br />
Web (e.g. MAFRA, RDFT, PROMPT, OntoMap, OMEN) support RDFS and PROMPT<br />
also supports OWL. On <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r hand, matchers such as GLUE and S-Match use <str<strong>on</strong>g>the</str<strong>on</strong>g>ir<br />
own internal representati<strong>on</strong>. Usually it is not a problem to c<strong>on</strong>vert an <strong>on</strong>tology in any<br />
language to such a representati<strong>on</strong>, which is typically d<strong>on</strong>e in more comprehensive tool<br />
envir<strong>on</strong>ments such as PROMPT.<br />
101
CHAPTER 5. COMPARISON OF THE METHODS 102<br />
Approach <strong>Ontology</strong> Language Remarks<br />
Methods and Tools<br />
MAFRA RDFS<br />
RDFT RDFS<br />
PROMPT Protégé-2000 supported Includes support for RDFS, OWL, etc.<br />
GLUE tax<strong>on</strong>omies<br />
S-Match DAGs<br />
OntoMap proprietary language similar<br />
to OWL Lite− supports export to RDFS<br />
RDFDiff RDF<br />
OMEN proprietary language similar<br />
to RDF Schema<br />
WSMX Mediati<strong>on</strong> Flora (so<strong>on</strong> WSML based) Integrated in <str<strong>on</strong>g>the</str<strong>on</strong>g> Web Services Modeling<br />
Toolkit.<br />
DOME Mediati<strong>on</strong> WSML Based <strong>on</strong> an abstract mapping language<br />
Integrati<strong>on</strong> Systems<br />
InfoSleuth OKBC wrappers are used to integrate data<br />
sources<br />
ONION Directed labeled graphs and source schemas are translated using<br />
Horn rules<br />
wrappers<br />
OBSERVER Descripti<strong>on</strong> Logics (CLAS-<br />
SIC)<br />
The <strong>on</strong>tology server maintains mappings<br />
between data schemas and <strong>on</strong>tologies<br />
MOMIS ODLI3 relati<strong>on</strong>al and semi-structured sources<br />
are translated to ODLI3 using custom<br />
wrappers<br />
INFOMIX a set <str<strong>on</strong>g>of</str<strong>on</strong>g> logical implicati<strong>on</strong>s INFOMIX has a very rich support for<br />
in Disjunctive Datalog different interfaces and languages<br />
AutoMed HDM XML, RDF(S) and relati<strong>on</strong>al databases<br />
are supported<br />
Table 5.1: <strong>Ontology</strong> Languages
CHAPTER 5. COMPARISON OF THE METHODS 103<br />
5.2 Mapping Language<br />
Table 5.2 enumerates <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping languages used by <str<strong>on</strong>g>the</str<strong>on</strong>g> different approaches. In mapping<br />
languages we can see three general approaches:<br />
• The <strong>on</strong>tology language and <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping language are <str<strong>on</strong>g>the</str<strong>on</strong>g> same. This is <str<strong>on</strong>g>the</str<strong>on</strong>g> case in<br />
MOMIS, to some extent in OntoMap and OMEN, to some extent in OBSERVER<br />
and to some extent in ONION.<br />
• The mapping language is different from <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology language. This is <str<strong>on</strong>g>the</str<strong>on</strong>g> case in<br />
MAFRA, RDFT and AutoMed and to some extent in OBSERVER and INFOMIX.<br />
MAFRA and RDFT both use a meta-<strong>on</strong>tology to describe types <str<strong>on</strong>g>of</str<strong>on</strong>g> bridges, which<br />
explicate <str<strong>on</strong>g>the</str<strong>on</strong>g> relati<strong>on</strong>ship between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. These types <str<strong>on</strong>g>of</str<strong>on</strong>g> bridges can be<br />
seen as <str<strong>on</strong>g>the</str<strong>on</strong>g> vocabulary for <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping languages.<br />
OBSERVER uses to some extent <str<strong>on</strong>g>the</str<strong>on</strong>g> same languages for <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> both<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping also allows transformer<br />
functi<strong>on</strong>s, which are bey<strong>on</strong>d <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology language and <str<strong>on</strong>g>the</str<strong>on</strong>g> mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>tologies and <str<strong>on</strong>g>the</str<strong>on</strong>g> data schemas are specified using ERA (Extended Relati<strong>on</strong>al Algebra).<br />
• There is no real mapping language. The output <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tool is a similarity measure<br />
between c<strong>on</strong>cepts in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies. This is <str<strong>on</strong>g>the</str<strong>on</strong>g> case for matchers, such as GLUE<br />
and S-Match. The purpose <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se matchers is not to create an <strong>on</strong>tology mapping<br />
as such, but to discover similarities between <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies.<br />
MAFRA, RDFT and OntoMap describe an <strong>on</strong>tology <str<strong>on</strong>g>of</str<strong>on</strong>g> bridges, called SBO (Semantic<br />
Bridge <strong>Ontology</strong>), RDFT (RDF-Transformati<strong>on</strong>s) and OntoMapO (OntoMap <strong>Ontology</strong>),<br />
respectively. These bridges are instantiated in <str<strong>on</strong>g>the</str<strong>on</strong>g> actual <strong>on</strong>tology mapping in order to<br />
realize <str<strong>on</strong>g>the</str<strong>on</strong>g> actual mapping specificati<strong>on</strong>. In this c<strong>on</strong>text, MAFRA has <str<strong>on</strong>g>the</str<strong>on</strong>g> most elaborate<br />
bridge <strong>on</strong>tology, i.e. MAFRA has <str<strong>on</strong>g>the</str<strong>on</strong>g> most expressive mapping language. However,<br />
MAFRA does not support mappings between classes and instances, which is supported<br />
by OntoMap. The specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> such a bridge <strong>on</strong>tology has many advantages; <str<strong>on</strong>g>the</str<strong>on</strong>g> main<br />
advantage is that it makes <str<strong>on</strong>g>the</str<strong>on</strong>g> type <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings clearer and more understandable to <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
user and it allows <str<strong>on</strong>g>the</str<strong>on</strong>g> user to more easily find suitable mappings between <strong>on</strong>tologies.<br />
We have not menti<strong>on</strong>ed PROMPT and RDFDiff yet. PROMPT is not used to create a<br />
mapping between <strong>on</strong>tologies, but to merge <strong>on</strong>tologies. Therefore, <str<strong>on</strong>g>the</str<strong>on</strong>g> output <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tool is<br />
not a mapping specificati<strong>on</strong>, but a merged <strong>on</strong>tology, which, in this case, can be exported to<br />
any <strong>on</strong>tology language supported by Protégé, such as RDFS or OWL. RDFDiff returns as<br />
its output not <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping between two different <strong>on</strong>tologies, but <str<strong>on</strong>g>the</str<strong>on</strong>g> structural difference<br />
between two versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> an <strong>on</strong>tology in <str<strong>on</strong>g>the</str<strong>on</strong>g> form <str<strong>on</strong>g>of</str<strong>on</strong>g> changed, added and deleted triples in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> RDF document.
CHAPTER 5. COMPARISON OF THE METHODS 104<br />
Approach Mapping Language Remarks<br />
Methods and Tools<br />
MAFRA Semantic Bridge <strong>Ontology</strong><br />
(SBO)<br />
SBO is a meta-<strong>on</strong>tology <str<strong>on</strong>g>of</str<strong>on</strong>g> semantic<br />
bridges. It allows arbitrary mappings<br />
between c<strong>on</strong>cepts, relati<strong>on</strong>s, and attributes,<br />
as well as c<strong>on</strong>diti<strong>on</strong>al mappings<br />
and procedural transformati<strong>on</strong>s<br />
RDFT RDFT RDFT is a meta-<strong>on</strong>tology, which<br />
describes types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings (bridges).<br />
Only allows class-to-class and<br />
property-to-property bridges<br />
PROMPT - not applicable; PROMPT merges <strong>on</strong>tologies<br />
GLUE similarity measures<br />
S-Match set-based (equal, disjoint,<br />
subset, superset)<br />
OntoMap OntoMapO OntoMapO allows specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> relati<strong>on</strong>ships<br />
between classes and also<br />
between classes and instances.<br />
RDFDiff changed, added, deleted RDFDiff detects changed, added and<br />
deleted triples between versi<strong>on</strong>s <str<strong>on</strong>g>of</str<strong>on</strong>g> an<br />
RDF document<br />
OMEN Bayesian Network with individual<br />
pairs <str<strong>on</strong>g>of</str<strong>on</strong>g> matches as<br />
nodes<br />
WSMX Mediati<strong>on</strong> SEKT Abstract Mapping<br />
Language<br />
DOME SEKT Abstract Map[ping<br />
Language<br />
Integrati<strong>on</strong> Systems<br />
InfoSleuth wrappers no <strong>on</strong>tology mapping; just mapping<br />
data schemas to <strong>on</strong>tologies<br />
ONION Articulati<strong>on</strong> rules<br />
OBSERVER Extended Relati<strong>on</strong>al Algebra<br />
for mapping <strong>on</strong>tology-<br />
DB and DL and transformer<br />
functi<strong>on</strong>s for mapping between<br />
<strong>on</strong>tologies<br />
MOMIS ODL I 3 wrappers are used to integrate data<br />
sources<br />
INFOMIX Disjunctive Datalog wrappers are used to integrate data<br />
sources<br />
AutoMed HDM wrappers are used to integrate data<br />
sources<br />
Table 5.2: Mapping Language
CHAPTER 5. COMPARISON OF THE METHODS 105<br />
5.3 Mapping Patterns<br />
N<strong>on</strong>e <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in this <str<strong>on</strong>g>survey</str<strong>on</strong>g> uses mapping patterns in <str<strong>on</strong>g>the</str<strong>on</strong>g> way proposed by<br />
[PGM98]. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> types <str<strong>on</strong>g>of</str<strong>on</strong>g> mappings <str<strong>on</strong>g>of</str<strong>on</strong>g>ten present in specific mapping languages<br />
(e.g. <str<strong>on</strong>g>the</str<strong>on</strong>g> bridges in RDFT and MAFRA and <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> rules in ONION) can be seen<br />
as elementary mapping patterns. One could combine a number <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se bridges to create<br />
more complex mapping patterns. However, <str<strong>on</strong>g>the</str<strong>on</strong>g>re is no explicit support for such combinati<strong>on</strong>s<br />
in existing approaches.<br />
5.4 Automati<strong>on</strong> Support<br />
Table 5.3 enumerates <str<strong>on</strong>g>the</str<strong>on</strong>g> automati<strong>on</strong> support provided by <str<strong>on</strong>g>the</str<strong>on</strong>g> different approaches. Both<br />
<strong>on</strong>tology matchers (GLUE and S-Match) are completely automated, in <str<strong>on</strong>g>the</str<strong>on</strong>g> sense that after<br />
inputting two <strong>on</strong>tologies, <str<strong>on</strong>g>the</str<strong>on</strong>g> similarities between c<strong>on</strong>cepts in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology are returned<br />
without any user interacti<strong>on</strong>s. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> matching <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies is just <strong>on</strong>e step in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
overall mapping process (see Secti<strong>on</strong> 1.2). Therefore, <str<strong>on</strong>g>the</str<strong>on</strong>g>se approaches automate <strong>on</strong>ly<br />
p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process.<br />
Mapping (or merging) <strong>on</strong>tologies is <str<strong>on</strong>g>of</str<strong>on</strong>g>ten an interactive process (e.g. in PROMPT),<br />
where <str<strong>on</strong>g>the</str<strong>on</strong>g> tool suggests mapping or merging acti<strong>on</strong>s to <str<strong>on</strong>g>the</str<strong>on</strong>g> user and <str<strong>on</strong>g>the</str<strong>on</strong>g> user can choose<br />
to ei<str<strong>on</strong>g>the</str<strong>on</strong>g>r perform <str<strong>on</strong>g>the</str<strong>on</strong>g> suggested acti<strong>on</strong>, to discard it or to perform a different acti<strong>on</strong>. After<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> user interacti<strong>on</strong>, <str<strong>on</strong>g>the</str<strong>on</strong>g> tool has more informati<strong>on</strong> to come up with more accurate suggesti<strong>on</strong>s.<br />
It is not clear if and how such <strong>on</strong>e-shot matchers as GLUE and S-Match could<br />
fit in such an interactive process.<br />
The integrati<strong>on</strong> systems ONION and MOMIS use specific tools (SKAT and<br />
ARTEMIS, respectively) for <str<strong>on</strong>g>the</str<strong>on</strong>g> discovery <str<strong>on</strong>g>of</str<strong>on</strong>g> similarity between <strong>on</strong>tologies. These tools<br />
are typically integrated in <str<strong>on</strong>g>the</str<strong>on</strong>g> system, which allows user interacti<strong>on</strong> in <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process.<br />
In <str<strong>on</strong>g>the</str<strong>on</strong>g> c<strong>on</strong>text <str<strong>on</strong>g>of</str<strong>on</strong>g> both MAFRA and RDFT, techniques were described to do <strong>on</strong>tology<br />
matching. In this c<strong>on</strong>text, MAFRA exploits <str<strong>on</strong>g>the</str<strong>on</strong>g> terms and <str<strong>on</strong>g>the</str<strong>on</strong>g> structure <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies<br />
for <str<strong>on</strong>g>the</str<strong>on</strong>g> matching and RDFT exploits <str<strong>on</strong>g>the</str<strong>on</strong>g> instance descripti<strong>on</strong>s associated with <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
<strong>on</strong>tologies to find similarities.<br />
5.5 Applicability to Use Cases<br />
The integrati<strong>on</strong> systems (InfoSleuth, MOMIS, OBSERVER and ONION) in this <str<strong>on</strong>g>survey</str<strong>on</strong>g><br />
typically support all instance mediati<strong>on</strong> use cases presented in Chapter 2. This is because<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> typical use case for data integrati<strong>on</strong> systems is <str<strong>on</strong>g>the</str<strong>on</strong>g> integrated querying <str<strong>on</strong>g>of</str<strong>on</strong>g> multiple<br />
data sources using a unified view (<strong>on</strong>tology). The querying <str<strong>on</strong>g>of</str<strong>on</strong>g> a unified view can be<br />
decomposed into query rewriting (<str<strong>on</strong>g>the</str<strong>on</strong>g> query in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology has to be
CHAPTER 5. COMPARISON OF THE METHODS 106<br />
Approach Automati<strong>on</strong> Support Remarks<br />
Methods and Tools<br />
MAFRA lexical and structural matching<br />
and semi-automatic creati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> mappings<br />
RDFT discovery <str<strong>on</strong>g>of</str<strong>on</strong>g> similarities<br />
based <strong>on</strong> instance data<br />
PROMPT name & structural matching semi-automatic <strong>on</strong>tology merging,<br />
where merge acti<strong>on</strong>s are suggested<br />
based <strong>on</strong> similarities<br />
GLUE multi-strategy machine<br />
learning approach<br />
S-Match matching based <strong>on</strong> synsets<br />
from <str<strong>on</strong>g>the</str<strong>on</strong>g>sauri, using a SAT<br />
solver<br />
OntoMap - automati<strong>on</strong> is supported when two <strong>on</strong>tologies<br />
are mapped to a comm<strong>on</strong> <strong>on</strong>tology<br />
RDFDiff changes between versi<strong>on</strong>s are<br />
detected automatically<br />
OMEN mappings that depend <strong>on</strong><br />
o<str<strong>on</strong>g>the</str<strong>on</strong>g>r mappings can be inferred<br />
automatically<br />
WSMX Mediati<strong>on</strong> String Matching, Wordnet<br />
similarity and real time user<br />
feedback<br />
DOME No specific automati<strong>on</strong> support.<br />
The use <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Mapping<br />
Language enable <str<strong>on</strong>g>the</str<strong>on</strong>g> possibility<br />
to use automati<strong>on</strong> tools<br />
exporting in this language.<br />
Integrati<strong>on</strong> Systems<br />
InfoSleuth -<br />
ONION term and structural matching<br />
using SKAT<br />
OBSERVER -<br />
MOMIS name and structural matching<br />
using ARTEMIS<br />
INFOMIX -<br />
AutoMed bidirecti<strong>on</strong>al similarity degrees<br />
Table 5.3: Automati<strong>on</strong> Support<br />
affinities computed by ARTEMIS are<br />
used to identify candidates for classes<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology
CHAPTER 5. COMPARISON OF THE METHODS 107<br />
rewritten in terms <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> local data source), instance transformati<strong>on</strong> (query results need<br />
to be translated from <str<strong>on</strong>g>the</str<strong>on</strong>g> local representati<strong>on</strong> to <str<strong>on</strong>g>the</str<strong>on</strong>g> global representati<strong>on</strong>) and instance<br />
unificati<strong>on</strong> (duplicates and redundancy have to be removed from <str<strong>on</strong>g>the</str<strong>on</strong>g> query results when<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> results from different data sources are combined).<br />
Both MAFRA and RDFT have specific support for instance transformati<strong>on</strong>s. In<br />
MAFRA, it is possible to attach an executable piece <str<strong>on</strong>g>of</str<strong>on</strong>g> code to a Semantic Bridge. In<br />
RDFT, it is possible to associate an XPath expressi<strong>on</strong> with a bridge. Because RDFT is<br />
used for transforming XML documents and RDF documents in <str<strong>on</strong>g>the</str<strong>on</strong>g>ir XML representati<strong>on</strong>,<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> XPath language can be used to express such transformati<strong>on</strong>s.<br />
A form <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology merging is performed in both <str<strong>on</strong>g>the</str<strong>on</strong>g> ONION and <str<strong>on</strong>g>the</str<strong>on</strong>g> MOMIS systems.<br />
In both systems, a global <strong>on</strong>tology is created, based <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> local <strong>on</strong>tologies and<br />
database schemas. The global <strong>on</strong>tology is, in both cases, a virtual view over <str<strong>on</strong>g>the</str<strong>on</strong>g> underlying<br />
data sources; <str<strong>on</strong>g>the</str<strong>on</strong>g> local sources remain and mapping rules between <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology<br />
and <str<strong>on</strong>g>the</str<strong>on</strong>g> local <strong>on</strong>tologies are specified inside <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology.<br />
PROMPT, <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> o<str<strong>on</strong>g>the</str<strong>on</strong>g>r hand, is a pure <strong>on</strong>tology merging tool. The outcome <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
PROMPT tool is a merged versi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> source <strong>on</strong>tologies; no mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
sources and <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology are created; <str<strong>on</strong>g>the</str<strong>on</strong>g> merged <strong>on</strong>tology is assumed to replace<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> original <strong>on</strong>tologies.<br />
5.6 Implementati<strong>on</strong><br />
Table 5.4 enumerates <str<strong>on</strong>g>the</str<strong>on</strong>g> type <str<strong>on</strong>g>of</str<strong>on</strong>g> implementati<strong>on</strong>s that have been made for <str<strong>on</strong>g>the</str<strong>on</strong>g> different<br />
approaches.<br />
As we can see from <str<strong>on</strong>g>the</str<strong>on</strong>g> table, most approaches have <strong>on</strong>ly been implemented as academic<br />
prototypes. For most approaches we are not aware <str<strong>on</strong>g>of</str<strong>on</strong>g> any planned fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r development<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> tools. Excepti<strong>on</strong>s are PROMPT, which is currently under active development<br />
and has recently been adapted for <str<strong>on</strong>g>the</str<strong>on</strong>g> Protégé OWL plugin; S-Match, which is currently<br />
being extended to take <str<strong>on</strong>g>the</str<strong>on</strong>g> different semantics <str<strong>on</strong>g>of</str<strong>on</strong>g> different relati<strong>on</strong>s in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies into<br />
account and to optimize <str<strong>on</strong>g>the</str<strong>on</strong>g> performance <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> implementati<strong>on</strong>; and RDFDiff, which will<br />
be fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r developed in <str<strong>on</strong>g>the</str<strong>on</strong>g> course <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> SEKT project.<br />
5.7 Experiences<br />
Table 5.5 enumerates <str<strong>on</strong>g>the</str<strong>on</strong>g> experiences with <str<strong>on</strong>g>the</str<strong>on</strong>g> various approaches reported in <str<strong>on</strong>g>the</str<strong>on</strong>g> literature.<br />
Most <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> experiences reported in <str<strong>on</strong>g>the</str<strong>on</strong>g> literature are really toy problems; we feel<br />
that real experience with <strong>on</strong>tology mapping and <strong>on</strong>tology-based informati<strong>on</strong> integrati<strong>on</strong><br />
is lacking. A cause <str<strong>on</strong>g>of</str<strong>on</strong>g> this problem is that <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web has not gained any real
CHAPTER 5. COMPARISON OF THE METHODS 108<br />
Approach Implementati<strong>on</strong> Remarks<br />
Methods and Tools<br />
MAFRA Two prototypes have been<br />
implemented<br />
RDFT Research Prototype<br />
PROMPT Versi<strong>on</strong> 2.1.1 PROMPT is still undergoing active development<br />
GLUE Research Prototype<br />
S-Match First prototype Work is still under way to improve <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
implementati<strong>on</strong><br />
OntoMap Prototype; under development<br />
since 2001<br />
RDFDiff Research Prototype The tool will be fur<str<strong>on</strong>g>the</str<strong>on</strong>g>r developed in<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> course <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> SEKT project<br />
OMEN first research prototype<br />
WSMX Mediati<strong>on</strong> Research Prototype<br />
DOME Research Prototype<br />
Integrati<strong>on</strong> Systems<br />
InfoSleuth Project Prototype<br />
ONION Research Prototype<br />
OBSERVER Research Prototype<br />
MOMIS Research Prototype<br />
INFOMIX INFOMIX IST Project<br />
AutoMed AutoMed project, LSDI<br />
project, ISPIDER project,<br />
RoDEX project<br />
Table 5.4: Implementati<strong>on</strong>
CHAPTER 5. COMPARISON OF THE METHODS 109<br />
Approach Experience Remarks<br />
Methods and Tools<br />
MAFRA Toy Problems<br />
RDFT Mapping product classificati<strong>on</strong><br />
schemes in GoldenBullet<br />
project<br />
PROMPT HPKB project; evaluati<strong>on</strong> using<br />
example <strong>on</strong>tologies<br />
GLUE Toy problems<br />
Semantic Matching Toy problems Was evaluated against o<str<strong>on</strong>g>the</str<strong>on</strong>g>r matchers<br />
OntoMap Applied to most upper-level<br />
<strong>on</strong>tologies<br />
RDFDiff Toy problems<br />
OMEN Toy problems<br />
WSMX Mediati<strong>on</strong> DIP European Project<br />
DOME DIP European project<br />
Integrati<strong>on</strong> Systems<br />
InfoSleuth Two elaborate case studies<br />
ONION Toy Examples<br />
OBSERVER Prototype with real-life bibliographic<br />
data<br />
MOMIS Toy Examples ARTEMIS (p<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> MOMIS) has been<br />
applied in <str<strong>on</strong>g>the</str<strong>on</strong>g> domain <str<strong>on</strong>g>of</str<strong>on</strong>g> Italian Public<br />
Administrati<strong>on</strong><br />
INFOMIX -<br />
AutoMed some case studies<br />
Table 5.5: Experience<br />
momentum as yet. Therefore, <str<strong>on</strong>g>the</str<strong>on</strong>g>re are currently not so many <strong>on</strong>tologies <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Web, although<br />
<str<strong>on</strong>g>the</str<strong>on</strong>g>re is some experience with real-life data sources, such as bibliographic sources.
Chapter 6<br />
C<strong>on</strong>clusi<strong>on</strong>s<br />
In this <str<strong>on</strong>g>survey</str<strong>on</strong>g> we have evaluated and compared several approaches to <strong>on</strong>tology mapping,<br />
<strong>on</strong>tology matching, <strong>on</strong>tology merging and data integrati<strong>on</strong>.<br />
Comparing different types <str<strong>on</strong>g>of</str<strong>on</strong>g> approaches in this <str<strong>on</strong>g>survey</str<strong>on</strong>g> has made it clear that n<strong>on</strong>e<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches exactly fits all our criteria for <strong>on</strong>tology mediati<strong>on</strong> <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic<br />
Web. The purposes <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> approaches in this <str<strong>on</strong>g>survey</str<strong>on</strong>g> tend to vary. The integrati<strong>on</strong> systems<br />
such as MOMIS [BCVB01] and ONION [MWK00, MW01] have <str<strong>on</strong>g>the</str<strong>on</strong>g> aim <str<strong>on</strong>g>of</str<strong>on</strong>g> providing<br />
query answering services over multiple data sources to <str<strong>on</strong>g>the</str<strong>on</strong>g> user. Matchers such as GLUE<br />
[DMDH04] and S-Match [GS04] have <str<strong>on</strong>g>the</str<strong>on</strong>g> more specific goal <str<strong>on</strong>g>of</str<strong>on</strong>g> finding similarities between<br />
schemas or <strong>on</strong>tologies. Integrati<strong>on</strong> systems <str<strong>on</strong>g>of</str<strong>on</strong>g>ten use matchers for <str<strong>on</strong>g>the</str<strong>on</strong>g> discovery <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
mappings (e.g. ONION uses SKAT; MOMIS uses ARTEMIS).<br />
MAFRA [MMSV02] and RDFT [Ome02b] provide meta-<strong>on</strong>tologies for <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong>s<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> mappings between <strong>on</strong>tologies. These mappings can be used for instance transformati<strong>on</strong>s.<br />
In fact, both approaches include specific means to enable transformati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> instances between different representati<strong>on</strong>s. However, whereas InfoSleuth provides<br />
methods for <str<strong>on</strong>g>the</str<strong>on</strong>g> fusi<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> query results (instance unificati<strong>on</strong>), nei<str<strong>on</strong>g>the</str<strong>on</strong>g>r MAFRA nor RDFT<br />
handle this situati<strong>on</strong>. Query rewriting is also not explicitly handled, but this should not<br />
be a problem because bridges between entities in <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies are explicitly present and<br />
can be readily used for query rewriting.<br />
This state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g> has made clear that work still needs to be d<strong>on</strong>e in <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
area <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong> <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web. We can learn from <str<strong>on</strong>g>the</str<strong>on</strong>g> data integrati<strong>on</strong><br />
systems, which provide services for query answering over distributed heterogeneous data<br />
sources. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> current setting <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g>se integrati<strong>on</strong> systems is inside <str<strong>on</strong>g>the</str<strong>on</strong>g> enterprise,<br />
which is still a more-or-less c<strong>on</strong>trolled area. On <str<strong>on</strong>g>the</str<strong>on</strong>g> Web, not much c<strong>on</strong>trol over <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
use <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies can be expected and <str<strong>on</strong>g>the</str<strong>on</strong>g> global integrati<strong>on</strong> scenario is not expected to<br />
scale, because eventually different organizati<strong>on</strong>s will use different <strong>on</strong>tologies and will not<br />
want to commit to a new <strong>on</strong>tology. However, <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>e-to-<strong>on</strong>e integrati<strong>on</strong> approach is also<br />
not expected to scale, because it would require <str<strong>on</strong>g>the</str<strong>on</strong>g> maintenance <str<strong>on</strong>g>of</str<strong>on</strong>g> too many mappings<br />
110
CHAPTER 6. CONCLUSIONS 111<br />
Figure 6.1: <strong>Ontology</strong> “islands”: large ellipses depict locally global <strong>on</strong>tologies; small<br />
ellipses depict locally local <strong>on</strong>tologies<br />
between <strong>on</strong>tologies. Therefore, we expect a hybrid approach will appear, where we have<br />
several “islands” around influential domain <strong>on</strong>tologies, where within <str<strong>on</strong>g>the</str<strong>on</strong>g> island <str<strong>on</strong>g>the</str<strong>on</strong>g>re is<br />
a form <str<strong>on</strong>g>of</str<strong>on</strong>g> global integrati<strong>on</strong>; <strong>on</strong>e <strong>on</strong>tology would be <str<strong>on</strong>g>the</str<strong>on</strong>g> global <strong>on</strong>tology <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> islands<br />
and a number <str<strong>on</strong>g>of</str<strong>on</strong>g> local <strong>on</strong>tologies are mapped to this global <strong>on</strong>tology. Then, <str<strong>on</strong>g>the</str<strong>on</strong>g>re would<br />
be mappings between <str<strong>on</strong>g>the</str<strong>on</strong>g> islands, as illustrated in Figure 6.1. In <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tology mediati<strong>on</strong><br />
soluti<strong>on</strong> to be developed within SEKT, we need to take this into account and we need<br />
to combine global integrati<strong>on</strong> approaches, such as <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>es supported by MOMIS and<br />
ONION, with <strong>on</strong>e-to-<strong>on</strong>e mappings, which are supported by, for example, MAFRA.<br />
From <str<strong>on</strong>g>the</str<strong>on</strong>g> mapping process (Secti<strong>on</strong> 1.2) we can already see that we need different<br />
types <str<strong>on</strong>g>of</str<strong>on</strong>g> methods and tools for its realizati<strong>on</strong>. Most notably, we need an <strong>on</strong>tology matcher<br />
in order to identify similarities between <strong>on</strong>tologies and we need a mapping language and a<br />
mapping tool for <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology mapping. Examples <str<strong>on</strong>g>of</str<strong>on</strong>g> matchers are GLUE<br />
and S-Match. MAFRA and OntoMap provide mapping languages for <str<strong>on</strong>g>the</str<strong>on</strong>g> specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
mappings between <strong>on</strong>tologies.<br />
From <str<strong>on</strong>g>the</str<strong>on</strong>g> comparis<strong>on</strong> in Chapter 5 we can see that current approaches to <strong>on</strong>tology<br />
mapping have mostly been applied to toy problems and cannot be expected to scale both
CHAPTER 6. CONCLUSIONS 112<br />
in <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies to be mapped and <str<strong>on</strong>g>the</str<strong>on</strong>g> number <str<strong>on</strong>g>of</str<strong>on</strong>g> instances to <str<strong>on</strong>g>the</str<strong>on</strong>g> transformed<br />
at executi<strong>on</strong> time. Within SEKT we need to take into account <str<strong>on</strong>g>the</str<strong>on</strong>g> different approaches<br />
that are out <str<strong>on</strong>g>the</str<strong>on</strong>g>re and especially look into database integrati<strong>on</strong> approaches, which can<br />
overcome some <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> scalability issues with large sets <str<strong>on</strong>g>of</str<strong>on</strong>g> instances.<br />
Within SEKT <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>Ontology</strong> Mediati<strong>on</strong> plugin called OntoMap is being developed.<br />
Note that OntoMap shares <str<strong>on</strong>g>the</str<strong>on</strong>g> name with <str<strong>on</strong>g>the</str<strong>on</strong>g> mediati<strong>on</strong> system developed by Sirma (cf.<br />
Secti<strong>on</strong> 4.1.6), but is a completely different system. The OntoMap plugin uses F-Logic as<br />
<strong>Ontology</strong> Mapping Language and is integrated in OntoStudio which yields an integrated<br />
system. The Ontobroker reas<strong>on</strong>er is used for reas<strong>on</strong>ing with <str<strong>on</strong>g>the</str<strong>on</strong>g> <strong>on</strong>tologies, mediati<strong>on</strong><br />
rules and data. OntoStudio has support for different kinds <str<strong>on</strong>g>of</str<strong>on</strong>g> databases and source formats<br />
and thus is competitive with <str<strong>on</strong>g>the</str<strong>on</strong>g> presented integrated systems for mediati<strong>on</strong>. It also<br />
provides facilities for Query Answering. Fur<str<strong>on</strong>g>the</str<strong>on</strong>g>rmore, <str<strong>on</strong>g>the</str<strong>on</strong>g> following use cases can be<br />
implemented by OntoMap:<br />
• Instance Transformati<strong>on</strong><br />
• Instance Unificati<strong>on</strong><br />
• <strong>Ontology</strong> <strong>Merging</strong><br />
OntoMap supports Mapping Patterns which is not available in any <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> systems that<br />
have been <str<strong>on</strong>g>survey</str<strong>on</strong>g>ed in this state-<str<strong>on</strong>g>of</str<strong>on</strong>g>-<str<strong>on</strong>g>the</str<strong>on</strong>g>-<str<strong>on</strong>g>art</str<strong>on</strong>g> <str<strong>on</strong>g>survey</str<strong>on</strong>g>.<br />
Thus, OntoMap as a plugin for Ontostudio, can be c<strong>on</strong>sidered at least competitive<br />
with <str<strong>on</strong>g>the</str<strong>on</strong>g> presented systems.
Bibliography<br />
[AC01] Maurizio Lenzerini Andrea Cal, Giuseppe De Giacomo. Models for informati<strong>on</strong><br />
integrati<strong>on</strong>: turning local-as-view into global-as-view. In Proc. <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<str<strong>on</strong>g>the</str<strong>on</strong>g> Workshop <strong>on</strong> Foundati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> Models for Informati<strong>on</strong> Integrati<strong>on</strong> (FMII<br />
2001), pages 270–284, 2001.<br />
[BCM + 03] Franz Baader, Diego Calvanese, Deborah L. McGuinness, Daniele Nardi,<br />
and Peter F. Patel-Schneider, editors. The Descripti<strong>on</strong> Logic Handbook.<br />
Cambridge University Press, 2003.<br />
[BCV99] S<strong>on</strong>ia Bergamaschi, Silvana Castano, and Maurizio Vincini. Semantic integrati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> semistructured and structured data sources. SIGMOD Record<br />
Special Issua <strong>on</strong> Semantic Interoperability in Global Informati<strong>on</strong>, 28(1),<br />
March 1999.<br />
[BCVB01] S<strong>on</strong>ia Bergamaschi, Silvana Castano, Maurizio Vincini, and Domenico Beneventano.<br />
Semantic integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> heterogeneous informati<strong>on</strong> sources. Special<br />
Issue <strong>on</strong> Intelligent Informati<strong>on</strong> Integrati<strong>on</strong>, Data & Knowledge Engineering,<br />
36(1):215–249, 2001.<br />
[Bec03] Dave Beckett. RDF/XML syntax specificati<strong>on</strong> (revised). Recommendati<strong>on</strong><br />
10 February 2004, W3C, 2003.<br />
[BFG01] R. Baumg<str<strong>on</strong>g>art</str<strong>on</strong>g>ner, S. Flesca, and G. Gottlob. Visual web informati<strong>on</strong> extracti<strong>on</strong><br />
with LiXto. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> VLDB, 2001.<br />
[BG04] Dan Brickley and Ramanathan V. Guha. RDF vocabulary descripti<strong>on</strong> language<br />
1.0: RDF schema. Recommendati<strong>on</strong> 10 February 2004, W3C, 2004.<br />
Available from http://www.w3.org/TR/rdf-schema/.<br />
[BKL + 04] M. Boyd, S. Kittivoravitkul, P. Lazanitis, P.J. McBrien, and N. Rizopoulos.<br />
Automed: A bav data integrati<strong>on</strong> system for heterogeneous data sources. In<br />
Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> CAiSE04, 2004.<br />
[BLHL01] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic<br />
web. Scientific American, 284(5):34–43, May 2001.<br />
113
BIBLIOGRAPHY 114<br />
http://www.sciam.com/<str<strong>on</strong>g>art</str<strong>on</strong>g>icle.cfm?<str<strong>on</strong>g>art</str<strong>on</strong>g>icleID=00048144-10D2-1C70-<br />
84A9809EC588EF21&ref=sciam.<br />
[BS85] R<strong>on</strong>ald J. Brachman and James G. Schmolze. An overview <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> KL-ONE<br />
knowledge representati<strong>on</strong> system. Cognitive Science, 9(2):171–216, 1985.<br />
[CAdV01] Silvana Castano, Valeria De Ant<strong>on</strong>ellis, and Sabrina De Capitani di Vimercati.<br />
Global viewing <str<strong>on</strong>g>of</str<strong>on</strong>g> heterogeneous data sources. IEEE Transacti<strong>on</strong>s <strong>on</strong><br />
Knowledge and Data Engineering, 13(2):277–297, 2001.<br />
[CD99] James Clark and Steve DeRose. XML path language (XPath) versi<strong>on</strong> 1.0.<br />
Recommendati<strong>on</strong> 16 November 1999, W3C, 1999.<br />
[CdA99] Silvana Castano and Valeria de Ant<strong>on</strong>ellis. A schema analysis and rec<strong>on</strong>ciliati<strong>on</strong><br />
tool envir<strong>on</strong>ment for heterogeneous databases. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
1999 Internati<strong>on</strong>al Symposium <strong>on</strong> Database Engineering & Applicati<strong>on</strong>s.<br />
IEEE Computer Society, 1999.<br />
[CDGL01] Diego Calvanese, Giuseppe De Giacomo, and Maurizio Lenzerini. A framework<br />
for <strong>on</strong>tology integrati<strong>on</strong>. In Proc. <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> 2001 Int. Semantic Web Working<br />
Symposium (SWWS 2001), pages 303–316, 2001.<br />
[CFF + 98] Vinay K. Chaudhri, Adam Farquhar, Richard Fikes, Peter D. Karp, and<br />
James P. Rice. OKBC: A programmatic foundati<strong>on</strong> for knowledge base<br />
interoperability. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Fifteenth Nati<strong>on</strong>al C<strong>on</strong>ference <strong>on</strong> Artificial<br />
Intelligence (AAAI-98), pages 600–607, Madis<strong>on</strong>, Wisc<strong>on</strong>sin, USA,<br />
1998. MIT Press.<br />
[Cha00] Hans Chalupsky. OntoMorph: A translati<strong>on</strong> system for symbolic knowledge.<br />
In Anth<strong>on</strong>y G. Cohn, Fausto Giunchiglia, and B<str<strong>on</strong>g>art</str<strong>on</strong>g> Selman, editors,<br />
KR 2000, Principles <str<strong>on</strong>g>of</str<strong>on</strong>g> Knowledge Representati<strong>on</strong> and Reas<strong>on</strong>ing Proceedings<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Seventh Internati<strong>on</strong>al C<strong>on</strong>ference, pages 471–482, Breckenridge,<br />
Colorado, USA, 2000. Morgan Kaufmann Publishers.<br />
[Che79] P. Chen. The entity relati<strong>on</strong>ship model - toward a unified view <str<strong>on</strong>g>of</str<strong>on</strong>g> data. ACM<br />
Transacti<strong>on</strong>s <strong>on</strong> Database Systems, 1(1):9–36, 1979.<br />
[Cla99] James Clark. XSL transformati<strong>on</strong>s (XSLT) versi<strong>on</strong> 1.0. Recommendati<strong>on</strong><br />
16 November 1999, W3C, 1999.<br />
[dB04] Jos de Bruijn. Semantic integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> disparate data sources in <str<strong>on</strong>g>the</str<strong>on</strong>g> cog<br />
project. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> 6th Internati<strong>on</strong>al C<strong>on</strong>ference <strong>on</strong> Enterprise<br />
Informati<strong>on</strong> Systems (ICEIS2004), Porto, Portugal, 2004.<br />
[dBP04] Jos de Bruijn and Axel Polleres. Towards and <strong>on</strong>tology mapping specificati<strong>on</strong><br />
language for <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic web. Technical Report DERI-2004-06-30,<br />
DERI, 2004.
BIBLIOGRAPHY 115<br />
[dBPF04] Jos de Bruijn, Axel Polleres, and Dieter Fensel. OWL lite − . Deliverable<br />
d20v0.1, WSML, 2004.<br />
Available from http://www.wsmo.org/2004/d20/v0.1/.<br />
[DKO + 02] Ying Ding, M. Korotkiy, Borys Omelayenko, V. K<str<strong>on</strong>g>art</str<strong>on</strong>g>seva, V. Zykov, Michel<br />
Klein, Ellen Schulten, and Dieter Fensel. GoldenBullet: Automated classificati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> product data in e-commerce. In Withold Abramowicz, editor,<br />
Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> BIS 2002, Poznan, Poland, 2002.<br />
[DMDH02] AnHai Doan, Jazant Madhavan, Pedro Domingos, and Al<strong>on</strong> Halevy. Learning<br />
to map between <strong>on</strong>tologies <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> semantic web. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
World-Wide Web C<strong>on</strong>ference, 2002.<br />
[DMDH04] AnHai Doan, Jazant Madhaven, Pedro Domingos, and Al<strong>on</strong> Halevy. <strong>Ontology</strong><br />
matching: A machine learning approach. In Steffen Staab and Rudi<br />
Studer, editors, Handbook <strong>on</strong> Ontologies in Informati<strong>on</strong> Systems, pages<br />
397–416. Springer-Verlag, 2004.<br />
[DMQ02] Dejing Dou, Drew McDermott, and Peishen Qi. <strong>Ontology</strong> translati<strong>on</strong> by<br />
<strong>on</strong>tology merging and automated reas<strong>on</strong>ing. In Proc. EKAW2002 Workshop<br />
<strong>on</strong> Ontologies for Multi-Agent Systems, pages 3–18, 2002.<br />
[DR02] H<strong>on</strong>g-Hai Do and Erhard Rahm. COMA – a system for flexible combinati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> schema matching approaches. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> VLDB’02, pages<br />
610–621, 2002.<br />
[DR05] H. Lausen (eds.) D. Roman, U. Keller. Web service modeling <strong>on</strong>tology.<br />
Technical report, DERI, 2005.<br />
[DS04] Mike Dean and Guus Schreiber, editors. OWL Web <strong>Ontology</strong> Language<br />
Reference. 2004. W3C Recommendati<strong>on</strong> 10 February 2004.<br />
[EA] P.McBrien E.Jasper, N.T<strong>on</strong>g and A.Poulovassilis. Generating and optimising<br />
views from both as view data integrati<strong>on</strong> rules. In Proc. 6th Baltic C<strong>on</strong>ference<br />
<strong>on</strong> Database and Informati<strong>on</strong> Systems (DBIS’04), Riga, June 2004.<br />
[ES04a] Marc Ehrig and Steffen Staab. QOM - quick <strong>on</strong>tology mapping. Hiroshima,<br />
Japan, November 2004.<br />
[ES04b] Marc Ehrig and York Sure. <strong>Ontology</strong> mapping - an integrated approach.<br />
In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> First European Semantic Web Symposium, Herakli<strong>on</strong>,<br />
Greece, May 2004.<br />
[Fel99] Christiane Fellbaum, editor. WordNet: An Electr<strong>on</strong>ic Lexical Database.<br />
MIT Press, 1999.
BIBLIOGRAPHY 116<br />
[Fen03] Dieter Fensel. Ontologies: Silver Bullet for Knowledge Management and<br />
Electr<strong>on</strong>ic Commerce, 2nd editi<strong>on</strong>. Springer-Verlag, Berlin, 2003.<br />
[FMvH + 03] Dieter Fensel, Enrico Motta, Frank van Harmelen, V. Richard Benjamins,<br />
Stefan Decker, Mauro Gaspari, Rix Groenboom, William Grosso, Mark A.<br />
Musen, Enric, Guus Schreiber, Rudi Studer, and Bob Wielinga. The unified<br />
problem-solving method development language upml. Knowledge and<br />
Informati<strong>on</strong> Systems(KAIS) journal, 5(1), 2003.<br />
[FNPB99] Jerry Fowler, Marian Nodine, Brad Perry, and Bruce Bargmeyer. Agentbased<br />
semantic interoperability in infosleuth. SIGMOD Record, 28(1),<br />
1999.<br />
[GGM98] John H. Gennari, William Grosso, and Mark A. Musen. A methoddescripti<strong>on</strong><br />
language: An initial <strong>on</strong>tology with examples. In Proceedings<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Eleventh Banff Knowledge Acquisiti<strong>on</strong> for Knowledge-Bases Systems<br />
Workshop, Banff, Canada, 1998.<br />
[Gru94] T. Gruber. An <strong>on</strong>tology for engineering ma<str<strong>on</strong>g>the</str<strong>on</strong>g>matics. Technical report,<br />
Knowledge Systems Laboratory, Stanford University, 1994. Report KSL-<br />
94-18.<br />
[GS04] Fausto Giunchiglia and Pavel Shvaiko. Semantic matching. The Knowledge<br />
Engineering Review, 18(3):265–280, 2004.<br />
[GSY04] Fausto Giunchiglia, Pavel Shvaiko, and Mikalai Yatskevich. S-match: an<br />
algorithm and an implementati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> semantic matching. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
ESWS’04, number 3053 in LNCS, pages 61–75, Herakli<strong>on</strong>, Greece, 2004.<br />
Springer-Verlag.<br />
[GW99] Bernhard Ganter and Rudolph Wille. Formal c<strong>on</strong>cept analysis: Ma<str<strong>on</strong>g>the</str<strong>on</strong>g>matical<br />
Foundati<strong>on</strong>s. Springer, Berlin-Heidelberg, 1999.<br />
[Hen05] Jan Henke. Omwg d8.2 editing and browsing design. Technical report,<br />
2005.<br />
[HM93] Joachim Hammer and Dennis McLeod. An approach to resolving semantic<br />
heterogeneity in a federati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> aut<strong>on</strong>omous, heterogeneous, database<br />
systems. Internati<strong>on</strong>al Journal <strong>on</strong> Intelligent and Cooperative Informati<strong>on</strong><br />
Systems, 2(1):51–83, 1993.<br />
[Hul97] Richard Hull. Managing semantic heterogeneity in databases: A <str<strong>on</strong>g>the</str<strong>on</strong>g>oretical<br />
perspective. In ACM Symposium <strong>on</strong> Principles <str<strong>on</strong>g>of</str<strong>on</strong>g> Database Systems, pages<br />
51–61, Tusc<strong>on</strong>, Ariz<strong>on</strong>a, USA, 1997.<br />
[Ker05] M. Kerrigan. Web service modeling toolkit (wsmt). Technical report,<br />
WSMX Working Draft, 2005.
BIBLIOGRAPHY 117<br />
[KFNM] Holger Knublauch, Ray W. Fergers<strong>on</strong>, Natalya F. Noy, and Mark A. Musen.<br />
The protégé owl plugin: An open development envir<strong>on</strong>ment for semanticweb<br />
applicati<strong>on</strong>s. In J. Allenet et al., editors, Third Internati<strong>on</strong>al Semantic<br />
Web C<strong>on</strong>ference - ISWC 2004, Hiroshima, Japan.<br />
[KKOF02] Michel Klein, Atanas Kiryakov, Damyan Ognyanov, and Dieter Fensel.<br />
Finding and characterizing changes in <strong>on</strong>tologies. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
21st Internati<strong>on</strong>al C<strong>on</strong>ference <strong>on</strong> C<strong>on</strong>ceptual Modeling (ER2002), Tampere,<br />
Finland, 2002.<br />
[Kle01] Michel Klein. Combining and relating <strong>on</strong>tologies: an analysis <str<strong>on</strong>g>of</str<strong>on</strong>g> problems<br />
and soluti<strong>on</strong>s. In Asunci<strong>on</strong> Gomez-Perez, Michael Gruninger, Heiner<br />
Stuckenschmidt, and Michael Uschold, editors, Workshop <strong>on</strong> Ontologies<br />
and Informati<strong>on</strong> Sharing, IJCAI’01, Seattle, USA, August 4–5, 2001.<br />
[Kle04] Michel Klein. Change Management for Distributed Ontologies. PhD <str<strong>on</strong>g>the</str<strong>on</strong>g>sis,<br />
Free University <str<strong>on</strong>g>of</str<strong>on</strong>g> Amsterdam, 2004.<br />
[KS00] Atanas Kiryakov and Kiril Iv. Simov. Mapping <str<strong>on</strong>g>of</str<strong>on</strong>g> eurowordnet top <strong>on</strong>tology<br />
to upper cyc <strong>on</strong>tology. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> Ontologies and Text workshop,<br />
EKAW 2000, Juan-les-Pins, French Riviera, 2000.<br />
[KSD01a] Atanas Kiryakov, Kiril Iv. Simov, and Marin Dimitrov. Ontomap: The<br />
upper-<strong>on</strong>tology portal. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> ”Formal <strong>Ontology</strong> in Informati<strong>on</strong><br />
Systems”, Ogunquit, Maine, 2001.<br />
[KSD01b] Atanas Kiryakov, Kiril Iv. Simov, and Marin Dimitrov. Tr1. <strong>on</strong>tomap: The<br />
upper-<strong>on</strong>tology portal. revisi<strong>on</strong> 2, Ontotext, 2001.<br />
[Len] Maurizio Lenzerini. Data integrati<strong>on</strong>: A <str<strong>on</strong>g>the</str<strong>on</strong>g>oretical perspective. In PODS<br />
2002, pages 233–246.<br />
[Lev00] Al<strong>on</strong> Y. Levy. Logic-Based Techniques in Data Integrati<strong>on</strong>, pages 575–595.<br />
Kluwer Publishers, 2000.<br />
[LGR + 05a] N. Le<strong>on</strong>e, G. Gottlob, R. Rosati, G. Greco, G. Ianni, V. Lio, V. Terracina,<br />
T. Eiter, W. Faber, M. Fink, D. Lembo, M. Lenzerini, M. Ruzzi, E. Kalka,<br />
B. Nowicki, and W. Staniszkis. Data integrati<strong>on</strong> by logic programming: The<br />
INFOMIX system. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Eighth Internati<strong>on</strong>al C<strong>on</strong>ference<br />
<strong>on</strong> Logic Programming and N<strong>on</strong>m<strong>on</strong>ot<strong>on</strong>ic Reas<strong>on</strong>ing (LPNMR), 2005.<br />
[LGR + 05b] N. Le<strong>on</strong>e, G. Gottlob, R. Rosati, G. Greco, G. Ianni, V. Lio, V. Terracina,<br />
T. Eiter, W. Faber, M. Fink, D. Lembo, M. Lenzerini, M. Ruzzi, E. Kalka,<br />
B. Nowicki, and W. Staniszkis. The INFOMIX system for advanced integrati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> incomplete and inc<strong>on</strong>sistent data. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> ACM<br />
SIGMOD Internati<strong>on</strong>al C<strong>on</strong>ference <strong>on</strong> Management <str<strong>on</strong>g>of</str<strong>on</strong>g> Data, pages 915–<br />
917, 2005.
BIBLIOGRAPHY 118<br />
[LLR02] Domenico Lembo, Maurizio Lenzerini, and Ricardo Rosati. Functi<strong>on</strong>al<br />
specificati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> infomix system. Technical Report D2.1, University <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
Rome, 2002.<br />
[LPF + 05] N. Le<strong>on</strong>e, G. Pfeifer, W. Faber, T. Eiter, G. Gottlob, S. Perri, and F. Scarcello.<br />
The dlv system for knowledge representati<strong>on</strong> and reas<strong>on</strong>ing. In ACM<br />
Transacti<strong>on</strong>s <strong>on</strong> Computati<strong>on</strong>al Logic, 2005.<br />
[LS99] Ora Lassila and Ralph R. Swick. Resource descripti<strong>on</strong> framework (RDF)<br />
model and syntax specificati<strong>on</strong>. W3c recommendati<strong>on</strong>, W3C, 1999.<br />
http://www.w3.org/TR/1999/REC-rdf-syntax-19990222.<br />
[MBR01] Jayant Madhavan, Philip A. Bernstein, and Erhard Rahm. Generic schema<br />
matching with cupid. In Proc. 27th Int. C<strong>on</strong>f. <strong>on</strong> Very Large Data Bases<br />
(VLDB), 2001.<br />
[MFRW00] Deborah L. McGuinness, Richard Fikes, James Rice, and Steve Wilder.<br />
An envir<strong>on</strong>ment for merging and testing large <strong>on</strong>tologies. In Proc. 7th<br />
Intl. C<strong>on</strong>f. On Principles <str<strong>on</strong>g>of</str<strong>on</strong>g> Knowledge Representati<strong>on</strong> and Reas<strong>on</strong>ing<br />
(KR2000), Colorado, USA, April 2000.<br />
[MGMR02] Sergey Melnik, Hector Garcia-Molina, and Erhard Rahm. Similarity flooding:<br />
A versatile graph matching algorithm. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> ICDE, pages<br />
117–128, 2002.<br />
[MIKS00] Eduardo Mena, Arantza Illarramendi, Vipul Kashyap, and Amit P. Sheth.<br />
OBSERVER: An approach for query processing in global informati<strong>on</strong> systems<br />
based <strong>on</strong> interoperati<strong>on</strong> across pre-existing <strong>on</strong>tologies. Distributed and<br />
Parallel Databases, 8(2):223–271, 2000.<br />
[MMSV02] Alexander Maedche, Boris Motik, Nuno Silva, and Raphael Volz. Mafra a<br />
mapping framework for distributed <strong>on</strong>tologies. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> 13th<br />
European C<strong>on</strong>ference <strong>on</strong> Knowledge Engineering and Knowledge Management<br />
EKAW-2002, Madrid, Spain, 2002.<br />
[MNJ04] P. Mitra, N. F. Noy, and A. R. Jaiswal. Omen: A probabilistic <strong>on</strong>tology<br />
mapping tool. In Workshop <strong>on</strong> Meaning coordinati<strong>on</strong> and negotiati<strong>on</strong> at <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Third Internati<strong>on</strong>al C<strong>on</strong>ference <strong>on</strong> <str<strong>on</strong>g>the</str<strong>on</strong>g> Semantic Web (ISWC), Hiroshima,<br />
Japan, 2004.<br />
[Moc05] A. Mocan. Wsmx data mediati<strong>on</strong>. Technical report, WSMX Working Draft,<br />
2005.<br />
[MP03] P.J. McBrien and A. Poulovassilis. Data integrati<strong>on</strong> by bi-directi<strong>on</strong>al schema<br />
transformati<strong>on</strong> rules. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> ICDE03, pages 227 – 238. IEEE,<br />
2003.
BIBLIOGRAPHY 119<br />
[MRB03] Sergey Melnik, Erhard Rahm, and Philip A. Bernstein. Developing<br />
metadata-intensive applicati<strong>on</strong>s with r<strong>on</strong>do. Journal <str<strong>on</strong>g>of</str<strong>on</strong>g> Web Semantics, 1(1),<br />
December 2003.<br />
[MvH04] Deborah L. McGuinness and Frank van Harmelen. OWL web <strong>on</strong>tology language<br />
overview. Recommendati<strong>on</strong> 10 February 2004, W3C, 2004. Available<br />
from http://www.w3.org/TR/owl-features/.<br />
[MW01] Prasenjit Mitra and Gio Wiederhold. An algebra for semantic interoperability<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> informati<strong>on</strong> sources. In IEEE Internati<strong>on</strong>al C<strong>on</strong>ference <strong>on</strong> Bioinformatics<br />
and Biomedical Egineering, pages 174–182, 2001.<br />
[MWJ99] Prasenjit Mitra, Gio Wiederhold, and Jan Jannink. Semi-automatic integrati<strong>on</strong><br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> knowledge sources. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> Fusi<strong>on</strong> 99, Sunnydale,<br />
California, USA, July 1999.<br />
[MWK00] Prasenjit Mitra, Gio Wiederhold, and M<str<strong>on</strong>g>art</str<strong>on</strong>g>in L. Kersten. A graph-oriented<br />
model for <str<strong>on</strong>g>art</str<strong>on</strong>g>iculati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tology interdependencies. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
C<strong>on</strong>ference <strong>on</strong> Extending Database Technology (EDBT 2000), K<strong>on</strong>stanz,<br />
Germany, March 2000.<br />
[MZ05] M. Moran Michal Zaremba. Wsmx architecture. Technical report, WSMO<br />
Working Draft, 2005.<br />
[NFK + 00] Marian H. Nodine, Jerry Fowler, Tomasz Ksiezyk, Brad Perry, Malcolm<br />
Taylor, and Amy Unruh. Active informati<strong>on</strong> ga<str<strong>on</strong>g>the</str<strong>on</strong>g>ring in infosleuth. Internati<strong>on</strong>al<br />
Journal <str<strong>on</strong>g>of</str<strong>on</strong>g> Cooperative Informati<strong>on</strong> Systems, 9(1-2):3–28, 2000.<br />
[NM99] Natalya F. Noy and Mark A. Musen. Sm<str<strong>on</strong>g>art</str<strong>on</strong>g>: Automated support for <strong>on</strong>tology<br />
merging and alignment. Technical Report SMI-1999-0813, Stanford<br />
Medical Informatics, 1999.<br />
[NM00a] Natalya F. Noy and Mark A. Musen. Anchor-prompt: Using n<strong>on</strong>-local c<strong>on</strong>text<br />
for semantic matching. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Workshop <strong>on</strong> Ontologies<br />
and Informati<strong>on</strong> Sharing at <str<strong>on</strong>g>the</str<strong>on</strong>g> Seventeenth Internati<strong>on</strong>al Joint C<strong>on</strong>ference<br />
<strong>on</strong> Artificial Intelligence (IJCAI-2001), Seattle, WA, USA, 2000.<br />
[NM00b] Natalya F. Noy and Mark A. Musen. Prompt: Algorithm and tool for automated<br />
<strong>on</strong>tology merging and alignment. In Proc. 17th Natl. C<strong>on</strong>f. On<br />
Artificial Intelligence (AAAI2000), Austin, Texas, USA, July/August 2000.<br />
[NM03a] Natalya F. Noy and Mark A. Musen. <strong>Ontology</strong> versi<strong>on</strong>ing as an element <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
an <strong>on</strong>tology-management framework. To be published in IEEE Intelligent<br />
Systems, 2003.
BIBLIOGRAPHY 120<br />
[NM03b] Natalya F. Noy and Mark A. Musen. The PROMPT suite: Interactive<br />
tools for <strong>on</strong>tology merging and mapping. Internati<strong>on</strong>al Journal <str<strong>on</strong>g>of</str<strong>on</strong>g> Human-<br />
Computer Studies, 59(6):983–1024, 2003.<br />
[OF01] Borys Omelayenko and Dieter Fensel. A two-layered integrati<strong>on</strong> approach<br />
for product informati<strong>on</strong> in B2B e-commerce. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Sec<strong>on</strong>d<br />
Intenati<strong>on</strong>al C<strong>on</strong>ference <strong>on</strong> Electr<strong>on</strong>ic Commerce and Web Technologies<br />
(EC WEB-2001), Munich, Germany, 2001. Springer-Verlag.<br />
[Ome02a] Borys Omelayenko. Integrating vocabularies: Discovering and representing<br />
vocabulary maps. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> First Internati<strong>on</strong>al Semantic Web<br />
C<strong>on</strong>ference (ISWC2002), Sardinia, Italy, 2002.<br />
[Ome02b] Borys Omelayenko. RDFT: A mapping meta-<strong>on</strong>tology for business integrati<strong>on</strong>.<br />
In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Workshop <strong>on</strong> Knowledge Transformati<strong>on</strong> for <str<strong>on</strong>g>the</str<strong>on</strong>g><br />
Semantic Web (KTSW 2002) at <str<strong>on</strong>g>the</str<strong>on</strong>g> 15-th European C<strong>on</strong>ference <strong>on</strong> Artificial<br />
Intelligence, pages 76–83, Ly<strong>on</strong>, France, 2002.<br />
[PGM98] John Y. Park, John H. Gennari, and Mark A. Musen. Mappings for reuse in<br />
knowledge-based systems. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> 11th Workshop <strong>on</strong> Knowledge<br />
Acquisiti<strong>on</strong>, Modelling and Management (KAW 98), Banff, Canada,<br />
1998.<br />
[RB01] Erhard Rahm and Philip A. Bernstein. A <str<strong>on</strong>g>survey</str<strong>on</strong>g> <str<strong>on</strong>g>of</str<strong>on</strong>g> approaches to automatic<br />
schema matching. VLDB Journal: Very Large Data Bases, 10(4):334–350,<br />
2001.<br />
[Riz03] N. Rizopoulos. Bav transformati<strong>on</strong>s <strong>on</strong> relati<strong>on</strong>al schemas based <strong>on</strong> semantic<br />
relati<strong>on</strong>ships between attributes. Technical Report Technical Report 22,<br />
Imperial College L<strong>on</strong>d<strong>on</strong>, 2003.<br />
[SaR03a] Nuno Silva and Jo ao Rocha. <strong>Ontology</strong> mapping for interoperability in<br />
semantic web. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> IADIS Internati<strong>on</strong>al C<strong>on</strong>ference<br />
WWW/Internet 2003 (ICWI’2003), Algarve, Portugal, 2003.<br />
[SaR03b] Nuno Silva and Jo ao Rocha. Service-oriented <strong>on</strong>tology mapping system. In<br />
Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Workshop <strong>on</strong> Semantic Integrati<strong>on</strong> <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> Internati<strong>on</strong>al<br />
Semantic Web C<strong>on</strong>ference (ISWC2003), Sanibel Island, USA, 2003.<br />
[SdB05] François Scharffe and Jos de Bruijn. A language to specify corresp<strong>on</strong>dences<br />
between <strong>on</strong>tologies. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> IEEE Signal Image and Internet<br />
Based Systems (SITIS-05), 2005.<br />
[Sil02] N. Silva. Discovering mappings between distributed <strong>on</strong>tologies. Internal<br />
Report July, University <str<strong>on</strong>g>of</str<strong>on</strong>g> Karlsruhe, 2002.
BIBLIOGRAPHY 121<br />
[SM01] Gerd Stumme and Alexander Maedche. Fca-merge: Bottom-up merging <str<strong>on</strong>g>of</str<strong>on</strong>g><br />
<strong>on</strong>tologies. In 7th Intl. C<strong>on</strong>f. <strong>on</strong> Artificial Intelligence (IJCAI ’01), pages<br />
225–230, Seattle, WA, USA, 2001.<br />
[Smi90] D.C. Smith. Plisp Users Manual. Apple Computers, august, 1990 editi<strong>on</strong>,<br />
1990.<br />
[Usc00] Mike Uschold. Creating, integrating, and maintaining local and global <strong>on</strong>tologies.<br />
In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> First Workshop <strong>on</strong> <strong>Ontology</strong> Learning (OL-<br />
2000) in c<strong>on</strong>juncti<strong>on</strong> with <str<strong>on</strong>g>the</str<strong>on</strong>g> 14th European C<strong>on</strong>ference <strong>on</strong> Artificial Intelligence<br />
(ECAI-2000), Berlin, Germany, August 2000.<br />
[VC98] Pepijn R. S. Visser and Zhan Cui. On accepting heterogeneous <strong>on</strong>tologies<br />
in distributed architectures. In Proceedings <str<strong>on</strong>g>of</str<strong>on</strong>g> <str<strong>on</strong>g>the</str<strong>on</strong>g> ECAI98 workshop <strong>on</strong> applicati<strong>on</strong>s<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tologies and problem-solving methods, Bright<strong>on</strong>, UK, 1998.<br />
[VJBCS97] Pepijn R. S. Visser, Dean M. J<strong>on</strong>es, T. J. M. Bench-Cap<strong>on</strong>, and M. J. R.<br />
Shave. An analysis <str<strong>on</strong>g>of</str<strong>on</strong>g> <strong>on</strong>tological mismatches: Heterogeneity versus interoperability.<br />
In AAAI 1997 Spring Symposium <strong>on</strong> Ontological Engineering,<br />
Stanford, USA, 1997.<br />
[Wie94] Gio Wiederhold. An algebra for <strong>on</strong>tology compositi<strong>on</strong>. In Proceedings<br />
<str<strong>on</strong>g>of</str<strong>on</strong>g> 1994 M<strong>on</strong>terey Workshop <strong>on</strong> formal Methods, pages 56–61, U.S. Naval<br />
Postgraduate School, M<strong>on</strong>terey CA, 1994.