Cost-Based Optimization of Integration Flows - Datenbanken ...

3 Fundamentals of Optimizing Integration Flows

(parallelizing sequences and iterations) because otherwise we might miss optimization opportunities. Furthermore, this technique should also be applied after WC4 (merging parallel flows) because otherwise we might perform unnecessary optimization efforts.

Rewriting Sequences to Parallel Flows

The technique WC2: Rewriting Sequences to Parallel Flows is used to optimize sequences of operators. Such sequences can be found as implicit children of each Plan, Switch path, Fork subflow, and Iteration. The cost of a sequence of operators $o$ is given by the sum of their execution times, $W(o) = \sum_{i=1}^{m} W(o_i)$.

The core concept is to rewrite a sequence of operators to parallel subflows of a Fork operator by analyzing the dependencies between the individual operators. Recall that the execution time of the Fork operator is determined by the subflow with the highest cost. To analyze the optimality of such a rewriting, we take into account the number of logical processors (hardware threads) $k$ as well as the CPU utilization of the involved operators. The CPU utilization of an operator with regard to a single logical processor is computed by $(W(o_i) - \text{wait}(o_i)) / W(o_i)$, where the waiting time can be monitored (e.g., waiting time for external systems); otherwise, we assume $\text{wait}(o_i) = W(o_i) \cdot 0.05$ as a heuristic. Such a rewriting of a sequence to $|r|$ parallel subflows is advantageous if

$$
\max_{i=1}^{|r|} \left( \sum_{j=1}^{m_i} \hat{W}(o_{i,j}) + i \cdot W(\text{Start Thread}) \right) < \sum_{i=1}^{m} \hat{W}(o_i)
\quad \text{with} \quad
\hat{W}(o_{i,j}) = \frac{|r|}{\min(|r|, k)} \cdot \left( W(o_{i,j}) - \text{wait}(o_{i,j}) \right) + \text{wait}(o_{i,j}),
\tag{3.17}
$$

which means that the estimated most time-consuming parallel subflow must have lower costs than the plain sequence of operators. Here, the cost of an operator within a parallel subflow is estimated as its waiting time plus its execution time, where the latter depends on the number of logical processors $k$ and the number of parallel subflows $|r|$. Intuitively, this represents the increased execution time if parallel subflows share hardware resources. This is a worst-case consideration because an exact model would also require the temporal overlap of waiting times and execution times.
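The condition in Equation (3.17) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from the source: the names `Op`, `estimated_cost`, and `fork_is_advantageous` are hypothetical, and the cost of the plain sequence is taken as the sum of the monitored execution times, because the stretch factor $|r| / \min(|r|, k)$ equals 1 for a single flow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical operator record (not from the source)."""
    w: float                      # monitored execution time W(o)
    wait: Optional[float] = None  # monitored waiting time, if available

    def waiting(self) -> float:
        # Heuristic from the text: assume wait = 0.05 * W if nothing was monitored.
        return self.wait if self.wait is not None else 0.05 * self.w


def estimated_cost(op: Op, r: int, k: int) -> float:
    """Estimated cost of an operator inside one of r parallel subflows.

    The CPU-bound share (W - wait) is stretched by r / min(r, k); the
    waiting time is added unchanged.
    """
    wait = op.waiting()
    return r / min(r, k) * (op.w - wait) + wait


def fork_is_advantageous(subflows: List[List[Op]], k: int,
                         w_start_thread: float) -> bool:
    """Check condition (3.17) for a candidate partitioning into subflows."""
    r = len(subflows)
    # Left-hand side: most expensive subflow, where subflow i also pays
    # i thread-start costs (i is 1-based as in the formula).
    parallel = max(
        sum(estimated_cost(op, r, k) for op in flow) + i * w_start_thread
        for i, flow in enumerate(subflows, start=1)
    )
    # Right-hand side: plain sequence; with a single flow the stretch
    # factor is 1, so the estimate equals the monitored time W.
    sequential = sum(op.w for flow in subflows for op in flow)
    return parallel < sequential
```

For two CPU-bound operators of equal cost, the check succeeds with two logical processors and fails with one, which matches the intuition that parallel subflows sharing a single hardware thread gain nothing and still pay the thread-start overhead.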

Rewriting sequences to parallel flows is realized with the following algorithm. First, we split the given sequence into disjoint subsequences according to Rule 3 of Definition 3.1 (preserve the temporal order of writing interactions to the same external system). Second, for each of these subsequences, we create a new Fork operator and partition the individual operators: we iterate over the operators and determine whether they depend on other operators of the same subsequence. If so, we add the operator to the existing subflow that contains the operator referred to by the dependency; otherwise, we create a new subflow and add the operator as a child. If an operator depends on multiple operators from different subflows, this operator splits the subsequence into two subsequences. Third, if a Fork operator contains only one subflow, we rewrite it back to a simple sequence. Fourth, and finally, we check the optimality condition for each Fork operator. The time complexity of this algorithm is $O(m^2)$ due to the dependency checking for each operator.
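The partitioning step (the second and third steps above) might look as follows in Python. This is only a sketch under assumptions: operators are represented by plain strings, `depends(a, b)` is a hypothetical predicate that is true if operator `a` depends on operator `b`, and an operator that depends on several subflows is assumed to start the next subsequence. The split according to Rule 3 and the final optimality check are left out.

```python
from typing import Callable, List, Sequence

Subflows = List[List[str]]


def partition(ops: Sequence[str],
              depends: Callable[[str, str], bool]) -> List[Subflows]:
    """Partition one subsequence into groups of parallel subflows.

    Returns the groups in execution order. A group with several subflows
    is a Fork candidate; a group with a single subflow stays a sequence.
    """
    groups: List[Subflows] = []
    current: Subflows = []
    for op in ops:
        # Indices of the subflows containing an operator that `op` depends on.
        # Comparing against all earlier operators yields the O(m^2) bound.
        hits = [i for i, flow in enumerate(current)
                if any(depends(op, other) for other in flow)]
        if len(hits) > 1:
            # Dependencies on several subflows: close the current group
            # and let `op` start the next subsequence (assumption).
            groups.append(current)
            current = [[op]]
        elif len(hits) == 1:
            # Dependency on exactly one subflow: append to that subflow.
            current[hits[0]].append(op)
        else:
            # Independent of everything seen so far: open a new subflow.
            current.append([op])
    if current:
        groups.append(current)
    return groups
```

With operators `a` to `e`, where `c` depends on `a`, `d` depends on `b`, and `e` depends on both `c` and `d`, the sketch yields one Fork candidate with the subflows `[a, c]` and `[b, d]`, followed by a plain sequence that contains only `e`.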

In the following, we use an example to illustrate this rewriting concept in more detail.<br />

Example 3.10 (Rewriting Sequences to Parallel Flows). Recall our example plan $P_8$ that is essentially a sequence of operators and assume the monitored execution times shown in
