25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

2 Preliminaries and Existing Techniques

Because these goals of integration stem from different application use cases, a variety of integration approaches and realizations has emerged. To delimit the focus of this thesis precisely, we classify those integration approaches; integration flows are only one category among them.

2.1.1 Classification of Integration Approaches

Integration approaches can be classified along several partially overlapping aspects: (1) application area, (2) data location, (3) time aspect and consistency model, (4) event model, (5) topology, and (6) specification method. We explicitly exclude orthogonal integration aspects such as schema matching, model management, master data management, lineage tracing, and the integration of semi-structured and unstructured data, because techniques from these areas are typically not designed for one specific integration approach.

[Figure 2.1: Classification of Integration Approaches. Rows: Application Area (GUI Integration, Process Integration, Application Integration, Information Integration); Data Location (virtual, materialized); System Aspect (Portals, Mashups, WfMS, WSMS, BPEL Engines, EAI servers, MOM systems, ETL Tools, Publish/Subscribe, DSMS, PDMS, VDBMS/FDBMS); Specification Method (User-Interface-Oriented, Integration Flows, Query-Based).]

Our overall classification, shown in Figure 2.1, comprises the aspects (1) application area, (2) data location, and (6) specification method. In addition, we place the major types of integration systems (system aspect) into the context of this classification. Regarding the application area, we distinguish between information integration (data and function integration), application integration, process integration, and GUI integration.
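To make the classification aspects concrete, the following is a minimal sketch of how a system type could be recorded along some of them. The class names, field names, and the `systems_by_location` helper are assumptions introduced for illustration only and do not come from the thesis. The three sample entries use only the properties this section states for information integration systems (data location, event model, topology); the specification method is left out because the text does not assign it per system.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class ApplicationArea(Enum):
    """The four application areas distinguished in the text."""
    INFORMATION = "information integration"
    APPLICATION = "application integration"
    PROCESS = "process integration"
    GUI = "GUI integration"


class DataLocation(Enum):
    """Data location aspect: virtual view vs. physically stored data."""
    VIRTUAL = "virtual"
    MATERIALIZED = "materialized"


@dataclass(frozen=True)
class Classification:
    """Hypothetical record placing one system type in the classification."""
    system_type: str
    application_area: ApplicationArea
    data_location: DataLocation
    event_model: Optional[str] = None  # free text, e.g. "ad-hoc queries"
    topology: Optional[str] = None     # free text, e.g. "hierarchical"


# Sample entries restricted to properties stated in this section.
CATALOG: List[Classification] = [
    Classification("FDBMS", ApplicationArea.INFORMATION, DataLocation.VIRTUAL,
                   event_model="ad-hoc queries", topology="hierarchical"),
    Classification("PDBMS", ApplicationArea.INFORMATION, DataLocation.VIRTUAL,
                   event_model="ad-hoc queries", topology="peer-to-peer"),
    Classification("DSMS", ApplicationArea.INFORMATION, DataLocation.MATERIALIZED,
                   event_model="data-driven"),
]


def systems_by_location(catalog: List[Classification],
                        location: DataLocation) -> List[str]:
    """Return the system types in the catalog with the given data location."""
    return [c.system_type for c in catalog if c.data_location is location]


if __name__ == "__main__":
    print(systems_by_location(CATALOG, DataLocation.VIRTUAL))
```

Keeping event model and topology as optional free-text fields reflects that the aspects overlap only partially and that not every aspect is specified for every system type.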

Information integration refers to the area of data-centric integration approaches, where huge amounts of data are integrated. In this context, we typically distinguish between virtual and materialized integration [DD99] based on the data location. First, for virtual integration, a global (virtual) view over the distributed data sources is provided and data is not physically consolidated. Examples of this type of integration approach are virtual DBMS (VDBMS), federated DBMS (FDBMS), and Peer Database Management Systems (PDBMS). The difference is that VDBMS/FDBMS use a hierarchical topology, while PDBMS use a peer-to-peer topology. Typically, both system types provide an event model of ad-hoc queries in order to allow dynamic integration. Second, for materialized integration, the data is physically stored or exchanged with the aim of data consolidation or synchronization. In this context, we mainly distinguish two system categories based on their event model. On the one hand, DSMS (Data Stream Management Systems) and Publish/Subscribe systems follow a data-driven model where tuples are processed by standing queries or subscription trees, respectively. While DSMS execute many rather
