Cost-Based Optimization of Integration Flows - Datenbanken ...

2 Preliminaries and Existing Techniques

control-flow modeling [MMLW05]. In detail, SQL activities, Retrieve Set activities, and Atomic SQL Sequence activities (as a transactional boundary for other SQL activities) have been introduced. For example, this concept was implemented as the BPEL extension ii4BPEL (information integration for BPEL) [IBM05b] within the IBM Business Integration Suite, which is based on the WebSphere application server. See the survey by Vrhovnik et al. [VSRM08] for a detailed comparison of SQL support by workflow products. To summarize, this BPEL/SQL approach provides hybrid modeling semantics in terms of control flow and data flow. In addition, it enables the efficient execution of data-intensive processes because potentially less data has to be transferred. We will revisit this approach from an optimization perspective in Subsection 2.1.5.
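
The effect of reduced data transfer can be illustrated with a minimal sketch. It is not ii4BPEL code: it uses Python's built-in `sqlite3` module as a stand-in for the external database, and the `orders` table and all function names are hypothetical. The first variant fetches every row into the process and aggregates there; the second lets the database evaluate the aggregate, so only the result rows cross the boundary.

```python
import sqlite3


def build_orders_db(n_rows=1000):
    """Create an in-memory database with a hypothetical 'orders' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    rows = [(i, f"c{i % 10}", float(i % 7)) for i in range(n_rows)]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def totals_in_process(conn):
    """Transfer all rows to the process and aggregate on the process side."""
    fetched = conn.execute("SELECT customer, amount FROM orders").fetchall()
    totals = {}
    for customer, amount in fetched:
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals, len(fetched)


def totals_in_database(conn):
    """Let the database aggregate; only one row per customer is transferred."""
    fetched = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    return dict(fetched), len(fetched)
```

Both functions return the same totals together with the number of rows that were transferred, which makes the difference between the two variants directly comparable.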

In contrast to the BPEL/SQL activities, the BPEL-DT (Data Transitions) [HRP+07, Hab09] approach uses the traditional control-flow semantics enriched with so-called data transitions. It is based on the concept of data-grey-box Web services [HPL+07] that represent a specific data framework in order to propagate data between Web services by reference. Data transitions express data-intensive data dependencies between external Web service interactions within a BPEL process. Such a data transition is modeled and executed as an integration flow using arbitrary integration platforms such as ETL tools. In conclusion, arbitrary combinations of the control-flow-oriented BPEL specification with data-flow-oriented integration flows can be realized.
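
The idea of propagating data by reference can be sketched as follows. This is only an illustration under simplifying assumptions, not the BPEL-DT implementation: the `DataLayer` class, the two services, and the `data_transition` function are hypothetical, and a plain Python function stands in for the integration flow that an ETL tool would execute. The point is that the process itself only passes references, while the data moves directly between the data layers of the services.

```python
class DataLayer:
    """Hypothetical data layer of a Web service; data is addressed by reference."""

    def __init__(self):
        self._tables = {}

    def put(self, ref, rows):
        self._tables[ref] = list(rows)
        return ref

    def get(self, ref):
        return self._tables[ref]


def export_service(layer):
    """Source service: stores its result locally and returns only a reference."""
    return layer.put("customers", [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}])


def data_transition(source, source_ref, target, target_ref):
    """Stand-in for an integration flow that moves and transforms the data."""
    rows = [{"id": r["id"], "name": r["name"].upper()} for r in source.get(source_ref)]
    return target.put(target_ref, rows)


def import_service(layer, ref):
    """Target service: resolves the reference within its own data layer."""
    return len(layer.get(ref))


def process():
    """Control flow of the process; only references flow through it."""
    source_layer, target_layer = DataLayer(), DataLayer()
    source_ref = export_service(source_layer)
    target_ref = data_transition(source_layer, source_ref, target_layer, "customers_in")
    messages = [source_ref, target_ref]  # everything the process itself handled
    return import_service(target_layer, target_ref), messages
```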

However, both approaches are still control-flow-oriented because the control-flow-oriented BPEL is enriched with data-centric constructs, but still exhibits temporal dependencies, which is the major classification criterion between data-flow and control-flow semantics.

B. Hybrid Flow Structure<br />

In addition to the mentioned hybrid modeling semantics, there are also arguments for a hybrid modeling structure. For example, when using directed graphs or the hierarchy of sequences, aspects like complex control flow branches, specific variable initialization or the preparation of complex queries to external systems are difficult to model. In contrast, when using source code structures, the overall process specification is hidden and too fine-grained. For these reasons, a combination of both aspects is advantageous for certain integration flows.

BPELJ (BPEL for Java) [BEA04] enables such a hybrid modeling structure by combining the hierarchy of sequences and source code. To this end, arbitrary Java code snippets (expressions or small blocks of Java code) can be included in BPEL process specifications. This is done via so-called bpelj:snippet activities, where the complete code block is included in the activity and the activity can be used within the overall process specification.
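
A hybrid modeling structure of this kind can be sketched in a few lines. The sketch is written in Python rather than BPELJ and does not reproduce the bpelj:snippet syntax; the classes `Assign`, `Snippet`, and `Sequence` as well as the `build_query` function are hypothetical. It shows a coarse-grained sequence of activities in which one activity wraps a code block, here used for the preparation of a query to an external system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]


@dataclass
class Assign:
    """Declarative activity: binds a value to a process variable."""
    target: str
    value: object

    def run(self, ctx: Context) -> None:
        ctx[self.target] = self.value


@dataclass
class Snippet:
    """Activity that wraps an arbitrary code block, analogous to a code snippet."""
    code: Callable[[Context], None]

    def run(self, ctx: Context) -> None:
        self.code(ctx)


@dataclass
class Sequence:
    """Structured activity: executes its child activities in order."""
    activities: List[object]

    def run(self, ctx: Context) -> Context:
        for activity in self.activities:
            activity.run(ctx)
        return ctx


def build_query(ctx: Context) -> None:
    """Fine-grained logic that would be awkward to model as a graph."""
    columns = ", ".join(ctx["columns"])
    ctx["query"] = f"SELECT {columns} FROM {ctx['table']}"


process = Sequence([
    Assign("table", "orders"),
    Assign("columns", ["id", "amount"]),
    Snippet(build_query),
])
```

Calling `process.run({})` executes the three activities in order and leaves the prepared query in the `query` variable of the returned context. The overall process stays visible as a sequence, while the string handling is confined to the snippet.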

C. Model-Driven Development of Integration Flows

In the context of complex integration flows, we observe a trend towards applying model-driven development techniques from the area of software technology.

First steps towards model-driven development of integration flows were made by Simitsis et al. in the sense of separating ETL flow modeling into conceptual [VSS02a], logical [VSS02b] and physical [TVS07] models as well as using transformation rules [Sim05] to describe the transition from one model to another.

The Orchid project [DHW+08] enables the generation of ETL jobs from declarative schema mapping specifications. A so-called Operator Hub Model (OHM) is used in order
