Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
(parallelizing sequences and iterations) because otherwise we might miss optimization opportunities.
Furthermore, this technique should also be applied after WC4 (merging parallel
flows) because otherwise we might spend unnecessary optimization effort.
Rewriting Sequences to Parallel Flows
The technique WC2: Rewriting Sequences to Parallel Flows is used to optimize sequences
of operators. Such sequences can be found as implicit children of each Plan, Switch path,
Fork subflow, and Iteration. The cost of a sequence of operators $o$ is given by the sum
of their execution times, $W(o) = \sum_{i=1}^{m} W(o_i)$.
The core concept is to rewrite a sequence of operators to parallel subflows of a Fork
operator by analyzing the dependencies between the individual operators. Recall that the
execution time of the Fork operator is determined by the subflow with the highest cost. To
decide whether such a rewriting is beneficial, we take into account the number of logical
processors (hardware threads) $k$ as well as the CPU utilization of the involved operators.
The CPU utilization of an operator with regard to a single logical processor is
computed by $(W(o_i) - \mathrm{wait}(o_i)) / W(o_i)$, where the waiting time can be monitored (e.g.,
waiting time for external systems); otherwise, we assume $\mathrm{wait}(o_i) = 0.05 \cdot W(o_i)$ as a
heuristic. Rewriting a sequence to $|r|$ parallel subflows is advantageous if
$$
\max_{i=1}^{|r|} \left( \sum_{j=1}^{m_i} \hat{W}(o_{i,j}) + i \cdot W(\text{Start Thread}) \right) < \sum_{i=1}^{m} \hat{W}(o_i)
\quad \text{with} \quad
\hat{W}(o_{i,j}) = \frac{|r|}{\min(|r|, k)} \cdot \left( W(o_{i,j}) - \mathrm{wait}(o_{i,j}) \right) + \mathrm{wait}(o_{i,j}),
\tag{3.17}
$$
which means that the estimated most time-consuming parallel subflow must have lower
costs than the original sequence of operators. Here, the costs of operators within parallel
subflows are estimated by the waiting time plus the execution time, which depends on the
number of logical processors $k$ and the number of parallel subflows $|r|$. Intuitively, this
represents the increased execution time if parallel subflows share hardware resources. This
is a worst-case consideration because an exact model would additionally require the temporal
overlap of waiting times and execution times.
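The condition can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: the function names, the tuple representation of an operator as (W, wait), and the parameter `w_start_thread` are hypothetical and not part of the original text. It also assumes that the cost of the sequence on the right-hand side is the sum of the operators' execution times, and it applies the 5% heuristic whenever no waiting time has been monitored.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: an operator is (W, wait), where W is the
# execution time and wait is the monitored waiting time (None if unknown).
Operator = Tuple[float, Optional[float]]


def effective_wait(w: float, wait: Optional[float]) -> float:
    """Return the monitored waiting time, or the heuristic 0.05 * W."""
    return 0.05 * w if wait is None else wait


def estimated_cost(w: float, wait: Optional[float], r: int, k: int) -> float:
    """Estimated cost of one operator inside one of r parallel subflows
    that share k logical processors."""
    wt = effective_wait(w, wait)
    return r / min(r, k) * (w - wt) + wt


def fork_cost(subflows: List[List[Operator]], k: int,
              w_start_thread: float) -> float:
    """Left-hand side of the condition: cost of the most expensive subflow,
    where the i-th subflow (1-based) pays i thread start-ups."""
    r = len(subflows)
    return max(
        sum(estimated_cost(w, wait, r, k) for w, wait in ops)
        + i * w_start_thread
        for i, ops in enumerate(subflows, start=1)
    )


def sequence_cost(subflows: List[List[Operator]]) -> float:
    """Right-hand side (assumption): sum of all operator execution times."""
    return sum(w for ops in subflows for w, _ in ops)


def rewriting_is_advantageous(subflows: List[List[Operator]], k: int,
                              w_start_thread: float) -> bool:
    """True if the Fork is estimated to be cheaper than the sequence."""
    return fork_cost(subflows, k, w_start_thread) < sequence_cost(subflows)
```

For two single-operator subflows with (W, wait) = (10, 0.5) and (8, 0.4) and a thread start-up cost of 0.1, the sketch estimates the Fork at 10.1 for k = 2, which is below the sequence cost of 18. For k = 1, both subflows compete for one logical processor, the estimate rises to 19.6, and the rewriting is rejected.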
Rewriting sequences to parallel flows is realized with the following algorithm. First, we
split the given sequence into disjoint subsequences according to Rule 3 of Definition 3.1
(preserve the temporal order of writing interactions to the same external system). Second, for
each of these subsequences, we create a new Fork operator and partition the individual
operators: we iterate over the operators and determine whether they depend on other
operators of the same subsequence. If so, we add the operator to the existing subflow
that contains the operator it depends on; otherwise, we create a new subflow
and add the operator as a child. If an operator depends on multiple operators from
different subflows, this operator splits the subsequence into two subsequences. Third,
if a Fork operator contains only one subflow, we rewrite it back to a simple sequence.
Fourth, and finally, we check the optimality condition for each Fork operator. The time
complexity of this algorithm is $O(m^2)$ due to the dependency checking for each operator.
In the following, we use an example to illustrate this rewriting concept in more detail.
Example 3.10 (Rewriting Sequences to Parallel Flows). Recall our example plan $P_8$ that
is essentially a sequence of operators and assume the monitored execution times shown in