Cost-Based Optimization of Integration Flows - Datenbanken ...
2 Preliminaries and Existing Techniques<br />
Because these integration goals stem from different application use cases,<br />
a variety of integration approaches and realizations has emerged. In order to<br />
precisely delimit the focus of this thesis, we classify these integration approaches;<br />
integration flows are only one category among others.<br />
2.1.1 Classification of Integration Approaches<br />
Integration approaches can be classified along several, partially overlapping aspects,<br />
which include the (1) application area, (2) data location, (3) time aspect and consistency<br />
model, (4) event model, (5) topology, and (6) specification method. We explicitly exclude<br />
orthogonal integration aspects such as schema matching, model management, master data<br />
management, lineage tracing, and the integration of semi-structured and unstructured data<br />
because techniques from these areas are typically not designed for one specific<br />
integration approach.<br />
[Figure: classification matrix. Application area: GUI integration, process integration, application integration, information integration. Data location: virtual, materialized. System aspect: portals, mashups, WfMS, WSMS, BPEL engines, EAI servers, MOM systems, ETL tools, publish/subscribe, DSMS, PDMS, VDBMS/FDBMS. Specification method: user-interface-oriented, integration flows, query-based.]<br />
Figure 2.1: Classification of Integration Approaches<br />
Our overall classification, shown in Figure 2.1, comprises the aspects (1) application<br />
area, (2) data location, and (6) specification method. In addition, we put the major types<br />
of integration systems (system aspect) into the context of this classification. Regarding<br />
the application area, we distinguish between information integration (data and function<br />
integration), application integration, process integration, and GUI integration.<br />
Information integration refers to the area of data-centric integration approaches, where<br />
large amounts of data are integrated. In this context, we typically distinguish between<br />
virtual and materialized integration [DD99] based on the data location. First, for virtual<br />
integration, a global (virtual) view over the distributed data sources is provided and<br />
data is not physically consolidated. Examples of this type of integration approach are<br />
virtual DBMS (VDBMS), federated DBMS (FDBMS), and Peer Database Management<br />
Systems (PDBMS). The difference is that VDBMS/FDBMS use a hierarchical topology,<br />
while PDBMS use a peer-to-peer topology. Typically, both system types provide an event<br />
model of ad-hoc queries in order to allow dynamic integration. Second, for materialized<br />
integration, the data is physically stored or exchanged with the aim of data consolidation<br />
or synchronization. In this context, we mainly distinguish two categories of systems<br />
based on their event model. On the one hand, DSMS (Data Stream Management Systems)<br />
and Publish/Subscribe systems follow a data-driven model where tuples are processed<br />
by standing queries or subscription trees, respectively. While DSMS execute many rather<br />