
2 Preliminaries and Existing Techniques

2.1.4 Executing Integration Flows

When deploying an integration flow, we transform the logical flow into an executable plan. To this end, we distinguish two major plan representations. First, there are interpreted plans, where we use an object graph of operators and interpret this object graph during execution of a plan instance. Second, there are compiled plans, where code templates are used to generate and compile physical executable plans. As a first side effect from the modeling perspective, (1) directed graphs are typically interpreted, while (2) hierarchies of sequences, source code, and fixed flows are commonly executed as compiled plans.
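The two plan representations can be illustrated with a minimal sketch (all names are hypothetical, not taken from a concrete engine): an interpreted plan walks an operator object graph at runtime, while a compiled plan instantiates a code template once and then executes the generated function directly.

```python
class Op:
    """Operator node of an interpreted plan (object graph)."""
    def __init__(self, fn, child=None):
        self.fn, self.child = fn, child

    def execute(self, data):
        # Interpret the object graph recursively at runtime.
        if self.child is not None:
            data = self.child.execute(data)
        return self.fn(data)

# Interpreted plan: a Selection on top of a Projection (object graph).
plan = Op(lambda rows: [r for r in rows if r[0] > 1],
          Op(lambda rows: [(r["id"],) for r in rows]))

# Compiled plan: the same logic generated from a code template and
# compiled once; plan instances then call the compiled function directly.
TEMPLATE = """
def compiled_plan(rows):
    out = []
    for r in rows:              # projection and selection fused by the template
        t = (r["id"],)
        if t[0] > 1:
            out.append(t)
    return out
"""
ns = {}
exec(compile(TEMPLATE, "<plan>", "exec"), ns)

rows = [{"id": 1}, {"id": 2}, {"id": 3}]
assert plan.execute(rows) == ns["compiled_plan"](rows) == [(2,), (3,)]
```

Note how the compiled variant avoids per-operator indirection at the price of a one-time code-generation step, which mirrors the trade-off described above.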

Moreover, as a second side effect, flows with data-flow modeling semantics are typically also executed with data-flow semantics; the same is true for control-flow semantics. Hence, we use the flow semantics as our major classification criterion of execution approaches. With regard to data granularity as the second classification criterion of plan execution, we traditionally distinguish between two fundamental execution models:

• Iterator Model: The Volcano iterator model [Gra90, Gra94] is the typical execution model of traditional DBMS (row stores). Each operator implements an interface with the operations open(), next(), and close(). The operators of a plan call their predecessors, i.e., the top operator determines the execution of the whole plan (pull principle). In addition, each operator can be executed by an individual thread (and thus adheres to the pipes-and-filters execution model), where each operator exhibits a so-called iterator state (tuple buffer) [Gra90]. The advantages of this model are extensibility with additional operators as well as the exploitation of vertical parallelism (pipeline parallelism or data parallelism) and horizontal parallelism (parallel pipelines). The disadvantages are the high communication overhead between operators and the predominant applicability for row-based (tuple-based) execution.

• Materialized Intermediates: The concept of materialized intermediates is the typical execution model of column stores [KM05, MBK00]. Operators of a plan are executed in sequence (one operator at a time), where the result of one operator is completely materialized (as a variable) and then used as input of the next operator (push principle). This reduces the overhead of operator communication and is particularly advantageous for column stores, where operators work on (compressed) columns in the form of contiguous memory (arrays). This concept offers additional optimization opportunities such as vectorized execution within a single operator, or the recycling of intermediate results [IKNG09] across multiple plans.
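The contrast between the two execution models can be sketched as follows for a simple Scan-Selection plan; the operator and function names are illustrative, with only the open()/next()/close() interface taken from the Volcano model described above.

```python
# Iterator model (pull): each operator implements open()/next()/close()
# and pulls tuples from its predecessor one at a time.
class Scan:
    def __init__(self, rows):
        self.rows = rows

    def open(self):
        self.i = 0

    def next(self):  # returns one tuple, or None when exhausted
        if self.i < len(self.rows):
            self.i += 1
            return self.rows[self.i - 1]
        return None

    def close(self):
        pass

class Select:
    def __init__(self, pred, child):
        self.pred, self.child = pred, child

    def open(self):
        self.child.open()

    def next(self):  # pull from the child until a tuple qualifies
        t = self.child.next()
        while t is not None and not self.pred(t):
            t = self.child.next()
        return t

    def close(self):
        self.child.close()

def run_pull(top):
    # The top operator determines the execution of the whole plan.
    top.open()
    out, t = [], top.next()
    while t is not None:
        out.append(t)
        t = top.next()
    top.close()
    return out

# Materialized intermediates (push): operators run one at a time and each
# fully materializes its result as the input of the next operator.
def run_materialized(rows):
    scanned = list(rows)                           # intermediate of Scan
    selected = [t for t in scanned if t % 2 == 0]  # intermediate of Select
    return selected

rows = [1, 2, 3, 4]
assert run_pull(Select(lambda t: t % 2 == 0, Scan(rows))) == [2, 4]
assert run_materialized(rows) == [2, 4]
```

The pull variant interleaves the operators tuple by tuple, while the push variant processes each operator over its whole (materialized) input, which is what enables the vectorized, array-at-a-time execution mentioned above.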

Integration flows are typically executed as independent plan instances. Here, we distinguish between data-driven integration flows, where incoming data conceptually initiates a new plan instance, and scheduled integration flows, where such an instance is initiated by a time-based scheduler. If strong consistency is required, data-driven integration flows are executed synchronously, which means that the client systems are blocked during execution. In contrast, if only weak (eventual) consistency is required, data-driven integration flows can also be executed asynchronously using inbound queues. Note that scheduled integration flows are per se asynchronous and thus ensure only weak consistency. We use this integration-flow-specific characteristic of independent instances to refine the classification criterion of data granularity. Therefore, we introduce the notions of instance-local (data/messages of one flow instance) and instance-global (data/messages of multiple flow instances) data granularity.
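The two consistency modes for data-driven flows can be sketched as follows (a hypothetical example; `run_plan_instance` is a stand-in for executing one independent plan instance): synchronous execution blocks the client until the instance finishes, while asynchronous execution only enqueues the message into an inbound queue that a worker drains later.

```python
import queue
import threading

def run_plan_instance(message):
    """Stand-in for executing one independent plan instance."""
    return message.upper()

# Synchronous (strong consistency): the client is blocked until the
# plan instance has finished.
def send_sync(message):
    return run_plan_instance(message)

# Asynchronous (weak/eventual consistency): the client only enqueues the
# message into an inbound queue and returns immediately.
inbound = queue.Queue()
results = []

def worker():
    while True:
        msg = inbound.get()
        if msg is None:          # shutdown signal
            break
        results.append(run_plan_instance(msg))
        inbound.task_done()

t = threading.Thread(target=worker)
t.start()

assert send_sync("m1") == "M1"   # client waits for the result
inbound.put("m2")                # client returns immediately
inbound.put(None)                # stop the worker
t.join()
assert results == ["M2"]         # processed eventually, after the fact
```

A scheduled flow would differ only in the producer side: instead of a client enqueuing messages, a timer initiates new plan instances, which is why such flows are asynchronous per se.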
