Cost-Based Optimization of Integration Flows - Datenbanken ...
2 Preliminaries and Existing Techniques
control-flow modeling [MMLW05]. In detail, SQL activities, Retrieve Set activities, and
Atomic SQL Sequence activities (as transactional boundaries for other SQL activities) have
been introduced. For example, this concept was implemented as the BPEL extension
ii4BPEL (information integration for BPEL) [IBM05b] within the IBM Business Integration
Suite, which is based on the WebSphere application server. See the survey by Vrhovnik
et al. [VSRM08] for a detailed comparison of SQL support in workflow products. To
summarize, this BPEL/SQL approach provides hybrid modeling semantics in terms of
control flow and data flow. In addition, it enables the efficient execution of data-intensive
processes because it potentially reduces the amount of transferred data. We will revisit this
approach from an optimization perspective in Subsection 2.1.5.
In contrast to the BPEL/SQL activities, the BPEL-DT (Data Transitions) [HRP+07,
Hab09] approach uses the traditional control-flow semantics enriched with so-called data
transitions. It is based on the concept of data-grey-box Web services [HPL+07] that
represent a specific data framework in order to propagate data between Web services
by reference. Data transitions express data-intensive data dependencies
between external Web service interactions within a BPEL process. Such a data transition
is modeled and executed as an integration flow using arbitrary integration platforms such
as ETL tools. In conclusion, arbitrary combinations of the control-flow-oriented BPEL
specification with data-flow-oriented integration flows can be realized.
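The idea of propagating data by reference can be sketched as follows. This is a simplified illustration, not the BPEL-DT implementation. The class DataLayer, the service functions, and the in-memory dictionary that stands in for an integration platform are all hypothetical.

```python
import uuid

class DataLayer:
    """Hypothetical shared data layer; stands in for an integration platform."""

    def __init__(self):
        self._store = {}

    def put(self, rows):
        ref = str(uuid.uuid4())
        self._store[ref] = list(rows)
        return ref  # only this reference travels through the process

    def get(self, ref):
        return self._store[ref]

def extract_service(layer):
    # First service interaction: produces data, returns a reference to it.
    return layer.put({"id": i, "amount": i * 10} for i in range(1, 6))

def data_transition(layer, ref, transform):
    # Data-flow step between two service interactions, run on the data layer.
    return layer.put(transform(row) for row in layer.get(ref))

def load_service(layer, ref):
    # Second service interaction: resolves the reference and consumes the data.
    return sum(row["doubled"] for row in layer.get(ref))

def run_process():
    # Control flow of the process: it handles references, never the rows.
    layer = DataLayer()
    ref_a = extract_service(layer)
    ref_b = data_transition(
        layer, ref_a, lambda r: {"id": r["id"], "doubled": r["amount"] * 2}
    )
    return load_service(layer, ref_b)
```

In this sketch the process still orders the steps, which mirrors the observation that the approach keeps its control-flow orientation while the data itself moves along a separate path.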
However, both approaches remain control-flow-oriented: the control-flow-oriented
BPEL is enriched with data-centric constructs but still exhibits temporal dependencies,
which are the major criterion for distinguishing data-flow from control-flow semantics.
B. Hybrid Flow Structure
In addition to the hybrid modeling semantics discussed above, there are also arguments for a
hybrid modeling structure. For example, when using directed graphs or the hierarchy of
sequences, aspects such as complex control-flow branches, specific variable initialization, or
the preparation of complex queries to external systems are difficult to model.
In contrast, when using source code structures, the overall process specification is hidden
and too fine-grained. For these reasons, a combination of both is advantageous
for certain integration flows.
BPELJ (BPEL for Java) [BEA04] enables such a hybrid modeling structure by combining
the hierarchy of sequences with source code. To this end, arbitrary Java code snippets
(expressions or small blocks of Java code) can be included in BPEL process specifications.
This is done via so-called bpelj:snippet activities: the complete code block is embedded
in the activity, which can then be used within the overall process specification.
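The hybrid structure can be sketched in a few lines. The sketch uses Python rather than Java and does not reproduce BPELJ syntax. The classes Invoke, Snippet, and Sequence as well as the example query are hypothetical and only illustrate how a code block can sit inside a hierarchy of sequences.

```python
class Invoke:
    """Coarse-grained activity: calls an operation and stores its result."""

    def __init__(self, target, operation):
        self.target = target
        self.operation = operation

    def run(self, variables):
        variables[self.target] = self.operation(variables)

class Snippet:
    """Activity that wraps an arbitrary code block (a plain function here)."""

    def __init__(self, code):
        self.code = code

    def run(self, variables):
        self.code(variables)

class Sequence:
    """Runs child activities in order; sequences can be nested."""

    def __init__(self, *activities):
        self.activities = activities

    def run(self, variables):
        for activity in self.activities:
            activity.run(variables)
        return variables

def prepare_query(variables):
    # Fine-grained logic that is awkward to express as graph nodes.
    ids = ", ".join(str(i) for i in sorted(variables["customer_ids"]))
    variables["query"] = f"SELECT * FROM orders WHERE customer_id IN ({ids})"

process = Sequence(
    Invoke("customer_ids", lambda v: {3, 1, 2}),  # stand-in for a service call
    Snippet(prepare_query),                       # embedded code block
)
```

The overall process stays visible as a sequence of activities, while the query preparation is kept in ordinary code.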
C. Model-Driven Development of Integration Flows
In the context of complex integration flows, we observe a trend towards applying
model-driven development techniques from the area of software technology.
First steps towards model-driven development of integration flows were made by Simitsis
et al. in the sense of separating ETL flow modeling into conceptual [VSS02a], logical
[VSS02b] and physical [TVS07] models as well as using transformation rules [Sim05] to
describe the transition from one model to another.
The Orchid project [DHW+08] enables the generation of ETL jobs from declarative
schema mapping specifications. A so-called Operator Hub Model (OHM) is used in order