Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows
execution, where each incoming message initiates a new plan instance. All operators of
one instance are executed before the next instance is started.
[Figure 4.1: Example Instance-Based Execution of Plan P2. Panel (a), Example Plan P2: Receive (o1) [service: s5, out: msg1]; Assign (o2) [in: msg1, out: msg2]; Invoke (o3) [service: s4, in: msg2, out: msg3]; Join (o4) [in: msg1, msg3, out: msg4]; Assign (o5) [in: msg4, out: msg5]; Invoke (o6) [service: s3, in: msg5]. Panel (b), Instance-Based Plan Execution of P2: plan instances pid=1, pid=2, and pid=3 are fed from the message queue and plotted over time t.]
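The instance-based execution model of plan P2 can be illustrated with a minimal Python sketch. The operator bodies (upper-casing, appending "!", joining with "|") and the dictionary message format are hypothetical stand-ins chosen only to make the data flow visible; only the operator names and their inputs and outputs come from Figure 4.1.

```python
from collections import deque

# Hypothetical operator bodies; only the in/out wiring follows plan P2.
def assign_o2(msg1):
    return {"payload": msg1["payload"].upper()}

def invoke_o3(msg2):
    # Stand-in for the call to external service s4.
    return {"payload": msg2["payload"] + "!"}

def join_o4(msg1, msg3):
    return {"payload": (msg1["payload"], msg3["payload"])}

def assign_o5(msg4):
    return {"payload": "|".join(msg4["payload"])}

def run_instance(msg1, sink):
    """Execute all operators of one plan instance, one after another."""
    msg2 = assign_o2(msg1)
    msg3 = invoke_o3(msg2)
    msg4 = join_o4(msg1, msg3)
    msg5 = assign_o5(msg4)
    sink.append(msg5["payload"])  # Invoke (o6): stand-in for service s3

def run_instance_based(incoming):
    """Each incoming message starts a new instance; instances never overlap."""
    message_queue = deque(incoming)
    sink = []
    while message_queue:
        msg1 = message_queue.popleft()  # Receive (o1)
        run_instance(msg1, sink)        # finishes before the next instance starts
    return sink

result = run_instance_based([{"payload": "a"}, {"payload": "b"}])
```

In this sketch a single thread of control processes the instances strictly one after another, which is the serialization that the figure depicts.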
In contrast, Figure 4.2 shows the fully vectorized plan, where each operator is executed<br />
within an execution bucket. Note that we also emphasized the changed operator parameters.<br />
[Figure 4.2: Example Fully Vectorized Execution of Plan P′2. The vectorized plan P′ is fed from the message queue and consists of the operators Copy (oc) [in: msg1, out: msg1]; Assign (o2) [in: msg1, out: msg2]; Invoke (o3) [service: s4, in: msg2, out: msg3]; Join (o4) [in: msg1, msg3, out: msg4]; Assign (o5) [in: msg4, out: msg5]; Invoke (o6) [service: s3, in: msg5]. Legend: inter-bucket message queue; execution bucket bi (thread).]
We can leverage pipeline parallelism (within a single pipeline) and parallel pipelines. In
this model, each edge of a data flow graph includes a message queue for inter-operator
communication. Dashed arrows represent dequeue (read) operations, while normal arrows
represent enqueue (write) operations. Additional operators (e.g., the Copy operator for
data flow splits) are required, while the Receive operator is not needed anymore.
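The execution-bucket model described above can be sketched in Python with one thread per operator and one FIFO queue per data flow edge. This is only an illustration under stated assumptions, not the thesis implementation: the `bucket` helper, the `STOP` sentinel, the queue names, and the operator bodies are all hypothetical; only the operator wiring (Copy, Assign o2, Invoke o3, Join o4, Assign o5, Invoke o6) follows Figure 4.2.

```python
import queue
import threading

STOP = object()  # hypothetical end-of-stream marker, not part of the source model

# Hypothetical operator bodies; only the in/out wiring follows plan P'2.
def assign_o2(msg1):
    return {"payload": msg1["payload"].upper()}

def invoke_o3(msg2):
    return {"payload": msg2["payload"] + "!"}  # stand-in for service s4

def join_o4(msg1, msg3):
    return {"payload": (msg1["payload"], msg3["payload"])}

def assign_o5(msg4):
    return {"payload": "|".join(msg4["payload"])}

def bucket(fn, inputs, outputs):
    """Run one operator in its own thread (an execution bucket)."""
    def loop():
        while True:
            msgs = [q.get() for q in inputs]      # dequeue (read)
            if any(m is STOP for m in msgs):
                for q in outputs:
                    q.put(STOP)                   # propagate shutdown downstream
                return
            result = fn(*msgs)
            for q in outputs:
                q.put(result)                     # enqueue (write)
    thread = threading.Thread(target=loop)
    thread.start()
    return thread

def run_vectorized(incoming):
    # One queue per edge of the data flow graph.
    q_in, q_c2, q_c4, q_23, q_34, q_45, q_56 = (queue.Queue() for _ in range(7))
    sink = []
    threads = [
        bucket(lambda m: m, [q_in], [q_c2, q_c4]),      # Copy (oc): data flow split
        bucket(assign_o2, [q_c2], [q_23]),              # Assign (o2)
        bucket(invoke_o3, [q_23], [q_34]),              # Invoke (o3)
        bucket(join_o4, [q_c4, q_34], [q_45]),          # Join (o4)
        bucket(assign_o5, [q_45], [q_56]),              # Assign (o5)
        bucket(lambda m: sink.append(m["payload"]), [q_56], []),  # Invoke (o6)
    ]
    for msg in incoming:   # no Receive operator: messages enter via the queue
        q_in.put(msg)
    q_in.put(STOP)
    for thread in threads:
        thread.join()
    return sink

result = run_vectorized([{"payload": p} for p in "abc"])
```

Because every queue is FIFO and each operator runs in exactly one thread, messages leave the last bucket in arrival order in this sketch, while different messages can be in flight in different buckets at the same time.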
Major challenges have to be tackled when transforming P into P′ in order to preserve
the control-flow semantics and prevent the external behavior from being changed. Based
on the mentioned requirement of ensuring semantic correctness in the form of serialized
external behavior, we now formally define the plan vectorization problem. Figure 4.3(a)
illustrates the temporal aspects of the example instance-based plan (assuming a sequence
of operators). In this case, different instances of this plan are serialized in incoming order.
Such an instance-based plan is the input of our vectorization problem. In contrast to this,
Figure 4.3(b) shows the temporal aspects of a vectorized plan for the best case. Here,
only the external behavior (according to the start time t0 and the end time t1 of plan
and operator instances) must be serialized. Such a vectorized plan is the output of the
vectorization problem. The plan vectorization problem is then defined as follows.