
SEKE 2012 Proceedings - Knowledge Systems Institute


fying the equivalences between concepts, associating them in those three models. For example: a) the XML element Writer is equivalent to the OWL concept Author, which is represented by the table Author; b) the XML element name is equivalent to the OWL concept fullName, which is represented by the column fullName.
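The two equivalences above can be pictured as records that link one name in each of the three models. The following is a minimal sketch in Python, not the format of the actual mapping file: the class `Equivalence`, its field names, and the helper `lookup` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Equivalence:
    """One equivalence across the three models (hypothetical structure)."""
    xml_element: str  # element name as it appears in an XML document
    owl_concept: str  # concept in the OWL ontology
    rdb_target: str   # table or column name in the relational schema
    rdb_kind: str     # "table" or "column"


# The two examples given in the text.
EQUIVALENCES = [
    Equivalence("Writer", "Author", "Author", "table"),
    Equivalence("name", "fullName", "fullName", "column"),
]


def lookup(xml_element: str) -> Optional[Equivalence]:
    """Return the equivalence for an XML element name, or None if unmapped."""
    for eq in EQUIVALENCES:
        if eq.xml_element == xml_element:
            return eq
    return None
```

A lookup such as `lookup("Writer")` then yields the ontology concept and the relational target in one step, which is the role the mapping plays in the later levels.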

The output of the first level is the mapping file (XML OWL RDB (Relational Database) Mapping). The XML descriptor is an auxiliary file created by the programmer using the Castor library. Section 6 describes further details.

The third level is in charge of interpreting the mapping file with the XML descriptor. As a result, objects are created in level 4. The manager is responsible for interpreting the XML files according to the mapping objects. If a conflict occurs, conflict resolution becomes necessary to resolve the data inconsistency. Section 5 specifically addresses the conflicts that can occur between XML files. With the mapping objects in memory, the manager is responsible for creating the header objects for inclusion into the DB along with the respective tuples, as shown in Figure 3. Finally, level 5 generates the DML¹ file to insert the content of the XML instances into tables and columns.
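To make the last step concrete, the sketch below turns one XML instance into INSERT statements using the Writer/Author and name/fullName equivalences mentioned earlier. It is an illustration under stated assumptions, not the XMap implementation: the dictionaries, the functions `sql_literal` and `generate_dml`, and the sample document are hypothetical, and the header objects, Castor descriptor, and conflict resolution are not modelled.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical in-memory form of the mapping (element name -> relational name).
TABLE_FOR_ELEMENT = {"Writer": "Author"}
COLUMN_FOR_ELEMENT = {"name": "fullName"}


def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def generate_dml(xml_text: str) -> List[str]:
    """Emit one INSERT statement per XML element that is mapped to a table."""
    root = ET.fromstring(xml_text)
    statements = []
    for elem in root.iter():
        table = TABLE_FOR_ELEMENT.get(elem.tag)
        if table is None:
            continue  # element is not mapped to a table
        columns, values = [], []
        for child in elem:
            column = COLUMN_FOR_ELEMENT.get(child.tag)
            if column is not None and child.text:
                columns.append(column)
                values.append(sql_literal(child.text.strip()))
        if columns:
            statements.append(
                "INSERT INTO {} ({}) VALUES ({});".format(
                    table, ", ".join(columns), ", ".join(values)
                )
            )
    return statements


# Made-up sample instance, used only to exercise the sketch.
SAMPLE = "<publication><Writer><name>Ana O'Neil</name></Writer></publication>"
DML = generate_dml(SAMPLE)
```

A production generator would normally use parameterized statements rather than string literals; literals are used here only because the stated output is a DML file.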

Figure 3. Header layout

Next, we present the input artifacts produced by OntoGen (ontology) and OntoRel (relational schema), as well as three XML documents. These artifacts are important for understanding the XMap approach and the remainder of the paper.

4.1 Input Artifacts

Each XML file can have a particular structure. Figure 4 illustrates three XML files that contain data about authors, publications and institutions. Although these files belong to the same application domain (papers published in scientific events), they have some structural differences, such as:

• In documents “A” and “C” (line 10), the author concept is ; in document “B” (line 10), the same concept is ;

• In document “A” (line 10), the author concept is lexical²; in documents “B” and “C” (line 10), this concept is non-lexical³;

¹ DML: Data Manipulation Language.

² Atomic element directly represented in the computer, such as a string.

³ Complex element, such as an author composed of a name and an address.

Figure 4. Sample XML documents

Querying data in such heterogeneous structures would require individual extraction processes that could become a bottleneck for the system. To store the content of these files in an RDB, our work uses an ontology as a global schema to handle the structural distinctions between the documents.
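The two structural differences listed above (different element names for the same concept, and lexical versus non-lexical authors) can be absorbed by a single extraction routine once the equivalences are known. The sketch below illustrates the idea only. The tag names `author`, `writer`, `name` and `address`, the function `author_names`, and the three sample documents are assumptions made for illustration and are not taken from Figure 4.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumed element names that denote the same author concept.
AUTHOR_TAGS = {"author", "writer"}
NAME_TAG = "name"


def author_names(xml_text: str) -> List[str]:
    """Collect author names whether the author element is lexical or not."""
    root = ET.fromstring(xml_text)
    names = []
    for elem in root.iter():
        if elem.tag not in AUTHOR_TAGS:
            continue
        name_child = elem.find(NAME_TAG)
        if name_child is not None:
            text = name_child.text  # non-lexical: the name is a child element
        else:
            text = elem.text        # lexical: the name is the element's own text
        if text and text.strip():
            names.append(text.strip())
    return names


# Made-up documents mirroring the differences described in the text.
DOC_A = "<paper><author>J. Silva</author></paper>"
DOC_B = ("<paper><writer><name>J. Silva</name>"
         "<address>Some City</address></writer></paper>")
DOC_C = "<paper><author><name>J. Silva</name></author></paper>"
```

All three documents yield the same list of names, which is the effect the global schema is meant to achieve: one uniform view over structurally different sources.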

Figure 5 shows the resulting OWL ontology, considering the sample XML files as input to the OntoGen component. Details of the ontology generation are found in [7].

Then the ontology is mapped to a relational schema by the OntoRel component using some mapping rules:

