Cost-Based Optimization of Integration Flows - Datenbanken ...
4.3 Cost-Based Vectorization
Figure 4.9: Example Cost-Based Plan Vectorization
(a) Vectorized Plan $P'_2$: six execution buckets (1 to 6) holding the operators Copy (oc) [in: msg1, out: msg1], Assign (o2) [in: msg1, out: msg2], Invoke (o3) [service: s4, in: msg2, out: msg3], Join (o4) [in: msg1, msg3, out: msg4], Assign (o5) [in: msg4, out: msg5], and Invoke (o6) [service: s3, in: msg5].
(b) Cost-Based Vectorized Plan $P''_2$: the same operators in four execution buckets (1 to 4).
$(t_1(b_i, o_i) \le t_0(p''_{i+1}, b_i))$ must hold. We define that $(l_{b_i} \ge 1) \wedge (l_{b_i} \le m)$ and $\sum_{i=1}^{|b|} l_{b_i} = m$, and that each operator $o_i$ is assigned to exactly one bucket $b_i$.
An instance-based plan $P$ is a special case of the cost-based vectorized plan $P''$, with
$k = 1$ execution buckets. Similarly, the fully vectorized plan $P'$ is also a special case of the
cost-based vectorized plan $P''$, with $k = m$ execution buckets, where $m$ denotes the number
of operators. Figure 4.10 illustrates the resulting spectrum of cost-based vectorization.
Figure 4.10: Spectrum of Cost-Based Vectorization (axis over the number of execution buckets $k = 1, 2, 3, \dots, m-2, m-1, m$, with the labels: instance-based plan ($k = 1$), cost-based vectorized plan, vectorized plan ($k = m$))
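The structural constraints on execution buckets stated above can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function names `is_valid_bucket_assignment` and `classify_plan` are hypothetical, operators are represented by plain strings, and the timing constraint between subsequent plan instances is not modeled. The example grouping is invented for illustration and is not claimed to match Figure 4.9(b).

```python
from typing import Sequence


def is_valid_bucket_assignment(
    buckets: Sequence[Sequence[str]], operators: Sequence[str]
) -> bool:
    """Check the structural constraints on execution buckets (hypothetical helper).

    Constraints taken from the text:
      * 1 <= l_b <= m for every bucket b (no empty buckets),
      * the bucket sizes sum to m,
      * every operator is assigned to exactly one bucket.
    """
    m = len(operators)
    sizes = [len(bucket) for bucket in buckets]
    if any(size < 1 or size > m for size in sizes):
        return False
    if sum(sizes) != m:
        return False
    assigned = [op for bucket in buckets for op in bucket]
    # Equal sorted lists means: no duplicates and no missing operators.
    return sorted(assigned) == sorted(operators)


def classify_plan(buckets: Sequence[Sequence[str]], operators: Sequence[str]) -> str:
    """Place a valid assignment on the spectrum k = 1 ... m."""
    k, m = len(buckets), len(operators)
    if k == 1:
        return "instance-based"  # for m == 1 both extremes coincide
    if k == m:
        return "vectorized"
    return "cost-based vectorized"


if __name__ == "__main__":
    ops = ["oc", "o2", "o3", "o4", "o5", "o6"]
    grouping = [["oc"], ["o2", "o3"], ["o4", "o5"], ["o6"]]  # illustrative only
    print(is_valid_bucket_assignment(grouping, ops), classify_plan(grouping, ops))
```

With $m = 6$ operators, a single bucket containing all operators falls at the instance-based end of the spectrum, six singleton buckets at the fully vectorized end, and any grouping in between is a cost-based vectorized plan in the narrower sense.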
At this point, we need to define the optimization objective $\phi$. In general, arbitrary
objectives could be used. However, the goal of the vectorization optimization technique
is to improve message throughput. Thus, the optimization objective is to reach the
highest degree of pipeline parallelism with a minimal number of threads.
The core idea of this objective is illustrated in Figure 4.11. In the case of an instance-based
plan, all operators, except for parallel subflows, are included in the critical path, which is
shown as gray-shaded operators. In contrast, in the case of a vectorized plan that includes
a sequence of operators $o$ with data dependencies between these operators, the execution
time mainly depends on the most time-consuming operator $o_{max}$ with $W(o_{max}) =
\max_{i=1}^{m} W(o_i)$. The reason is that, under full system utilization, the queues in front of
this most time-consuming operator reach their maximum constraints; thus, the costs
of a vectorized plan are computed with $W(P') = (n + m - 1) \cdot W(o_{max})$. This execution
characteristic, that the work cycle of a pipeline depends on its most time-consuming subtask,
is known from other research areas (e.g., databases and operating systems) as the
convoy effect [Ros10, BGMP79]. As a result of this effect, the work cycle of the vectorized
plan is given by $W(o_{max})$, which is the time period between the start of two subsequent plan
instances, $W(o_{max}) = t_0(p_{i+1}) - t_0(p_i)$. For example, operator $o_3$ dominates the work cycle