Cost-Based Optimization of Integration Flows - Datenbanken ...
4.2 Plan Vectorization
the control-flow semantics of an integration flow as well as the Copy operator for data flow splits. The semantics of these operators are described in Table 4.1.
To realize the pipes-and-filters execution model, operators are executed in so-called execution buckets, each of which runs in its own thread. Figure 4.4 illustrates the conceptual model of such an execution bucket.
[Figure 4.4: Conceptual Model of Execution Buckets; panels: (a) Generic Operator, (b) Unary Operator, (c) Binary Operator]
In general, Figure 4.4(a) shows the generic model of an execution bucket, which contains a set of input message queues, a set of output message queues, a single so-called input synchronization queue, a single output synchronization queue, and an operator. The bucket is a thread running an endless loop: in each iteration, it dequeues from all input message queues, dequeues from the input synchronization queue (if present), executes the operator, and enqueues the results into all output message queues. With regard to the defined flow meta model, however, unary (see Figure 4.4(b)) and binary (see Figure 4.4(c)) operators, with one or two inputs, respectively, and a single output, are the most common. Note that only the unary operators Xor and And require input synchronization queues, whereas any operator can have an output synchronization queue, depending on its position within the plan and the need for serialization. Apart from the synchronization queues, similar operator models are commonly used in data stream management systems [GAW+08] and related system categories [CEB+09, CWGN11].
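The bucket loop described above can be illustrated with a minimal Python sketch. It is not the thesis implementation and rests on our own assumptions: each bucket is a `threading.Thread` communicating through `queue.Queue` instances, the class name `ExecutionBucket` and the `STOP` end-of-stream sentinel are hypothetical (the text describes an endless loop without any shutdown protocol), and a synchronization token is modeled simply as the value `True` placed in a synchronization queue.

```python
import queue
import threading
from typing import Callable, List, Optional

# Hypothetical end-of-stream marker for this sketch only: the bucket loop in
# the text is endless, so we add a way to shut the pipeline down cleanly.
STOP = object()


class ExecutionBucket(threading.Thread):
    """Hypothetical execution bucket: one operator running in its own thread."""

    def __init__(
        self,
        operator: Callable[..., object],
        inputs: List[queue.Queue],
        outputs: List[queue.Queue],
        sync_in: Optional[queue.Queue] = None,
        sync_out: Optional[queue.Queue] = None,
    ) -> None:
        super().__init__(daemon=True)
        self.operator = operator
        self.inputs = inputs
        self.outputs = outputs
        self.sync_in = sync_in
        self.sync_out = sync_out

    def run(self) -> None:
        while True:
            # Dequeue one message from every input message queue (blocking).
            msgs = [q.get() for q in self.inputs]
            if any(m is STOP for m in msgs):
                # Propagate the end-of-stream marker downstream and exit.
                for q in self.outputs:
                    q.put(STOP)
                return
            # Wait for a token on the input synchronization queue, if any.
            if self.sync_in is not None:
                self.sync_in.get()
            # Execute the operator on the dequeued input messages.
            result = self.operator(*msgs)
            # Enqueue the result into all output message queues.
            for q in self.outputs:
                q.put(result)
            # Emit a token on the output synchronization queue, if any.
            if self.sync_out is not None:
                self.sync_out.put(True)


# A tiny plan: a binary bucket (cf. Figure 4.4(c)) feeding a unary bucket
# (cf. Figure 4.4(b)), with one queue per data dependency between operators.
q_in1, q_in2, q_mid, q_out = (queue.Queue() for _ in range(4))
concat_bucket = ExecutionBucket(lambda a, b: a + b, [q_in1, q_in2], [q_mid])
upper_bucket = ExecutionBucket(str.upper, [q_mid], [q_out])
concat_bucket.start()
upper_bucket.start()

for a, b in [("ab", "cd"), ("ef", "gh")]:
    q_in1.put(a)
    q_in2.put(b)
q_in1.put(STOP)
q_in2.put(STOP)

results = []
while (msg := q_out.get()) is not STOP:
    results.append(msg)
print(results)  # ['ABCD', 'EFGH']
```

Because every bucket runs in its own thread and each queue is FIFO, the two messages stream through both buckets in pipeline fashion and keep their order. The synchronization queues stay unused in this small plan; passing `sync_in` would make a bucket block until an upstream token arrives, which is how serialization could be enforced under these assumptions.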
4.2.2 Rewriting Algorithm<br />
In this subsection, we first describe the core rewriting algorithm and then specify the control-flow-specific rewriting techniques, which preserve the external behavior. Both aspects are required to enable plan vectorization as an optimization technique.
Core Algorithm<br />
The basic rewriting algorithm described in the following applies to all operator types of our integration flow meta model.
In detail, Algorithm 4.1 consists of two parts. In the first part, the dependency analysis determines all data dependencies between operators. For each data dependency between two operators (i.e., the output message of operator o_i is the input message of operator o_j), we create a queue instance. Internally, our optimizer reuses the existing dependency graph described in Subsection 3.2.1. In the second part, we