
Cost-Based Optimization of Integration Flows - Datenbanken ...


4.3 Cost-Based Vectorization

Figure 4.9: Example Cost-Based Plan Vectorization ((a) Vectorized Plan $P'_2$ with six execution buckets; (b) Cost-Based Vectorized Plan $P''_2$ with four execution buckets; operators Assign ($o_2$), Invoke ($o_3$, service s4), Join ($o_4$), Assign ($o_5$), Invoke ($o_6$, service s3), and Copy ($o_c$))

$(t_1(b_i, o_i) \le t_0(p''_{i+1}, b_i))$ must hold. We define that $(l_{b_i} \ge 1) \wedge (l_{b_i} \le m)$ and $\sum_{i=1}^{|b|} l_{b_i} = m$, and that each operator $o_i$ is assigned to exactly one bucket $b_i$.
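The structural part of these constraints can be checked mechanically. The following Python sketch is illustrative only: it represents operators $o_1, \dots, o_m$ as the integers 1..m and a plan as a list of buckets (a hypothetical encoding, not taken from the text), and it does not check the timing condition on $t_1$ and $t_0$, which would require runtime timestamps.

```python
from collections import Counter
from typing import List


def is_valid_assignment(m: int, buckets: List[List[int]]) -> bool:
    # Hypothetical encoding: operators o_1..o_m are the integers 1..m, each
    # inner list is one execution bucket b_i, and its length is l_{b_i}.
    sizes = [len(b) for b in buckets]
    # (l_{b_i} >= 1) and (l_{b_i} <= m) for every bucket
    if any(size < 1 or size > m for size in sizes):
        return False
    # the bucket sizes l_{b_i} must sum to m
    if sum(sizes) != m:
        return False
    # every operator o_i is assigned to exactly one bucket
    counts = Counter(op for b in buckets for op in b)
    return sorted(counts) == list(range(1, m + 1)) and all(
        c == 1 for c in counts.values()
    )


if __name__ == "__main__":
    print(is_valid_assignment(6, [[1, 2], [3], [4, 5, 6]]))  # True
    print(is_valid_assignment(3, [[1, 2], [2, 3]]))  # False: o_2 is assigned twice
```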

An instance-based plan $P$ is a special case of the cost-based vectorized plan $P''$ with a single execution bucket ($k = 1$). Similarly, the fully vectorized plan $P'$ is a special case of the cost-based vectorized plan $P''$ with $k = m$ execution buckets, where $m$ denotes the number of operators. Figure 4.10 illustrates the resulting spectrum of cost-based vectorization.

Figure 4.10: Spectrum of Cost-Based Vectorization (instance-based plan with $k = 1$; cost-based vectorized plans with $k = 2, \dots, m-1$; vectorized plan with $k = m$)
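To get a feel for the size of this spectrum, the sketch below enumerates candidate plans for a given $k$. It assumes, which the text does not state, that buckets are contiguous runs of a linear operator sequence; under that assumption there are $\binom{m-1}{k-1}$ ways to form $k$ buckets, exactly one each for $k = 1$ and $k = m$. The function names are hypothetical.

```python
from itertools import combinations
from math import comb
from typing import Iterator, List


def contiguous_plans(m: int, k: int) -> Iterator[List[List[int]]]:
    # Assumption: buckets are contiguous runs of operators 1..m.
    # Choosing k - 1 cut positions between neighbouring operators
    # splits the sequence into k non-empty buckets.
    for cuts in combinations(range(1, m), k - 1):
        bounds = (0,) + cuts + (m,)
        yield [list(range(lo + 1, hi + 1)) for lo, hi in zip(bounds, bounds[1:])]


def plan_count(m: int, k: int) -> int:
    # Number of such splits: C(m - 1, k - 1).
    return comb(m - 1, k - 1)


if __name__ == "__main__":
    m = 6
    for k in range(1, m + 1):
        print(k, plan_count(m, k))
```

For $m = 6$ the counts for $k = 1, \dots, 6$ are 1, 5, 10, 10, 5, 1, i.e., $2^{m-1} = 32$ candidate plans in total, so the space between the two extremes grows exponentially with $m$.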

At this point, we need to define the optimization objective $\phi$; in general, arbitrary objectives could be used. However, the goal of the vectorization optimization technique is to improve message throughput. Thus, the optimization objective is to reach the highest degree of pipeline parallelism with a minimal number of threads.
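A small Python sketch can make this objective concrete. It anticipates the cost model of the next paragraph and generalizes it to buckets under an assumption made here, not in the text: if a bucket's cost $W(b_j)$ is the sum of its operators' costs and all $k$ buckets advance at the pace of the slowest one, then processing $n$ messages costs roughly $(n + k - 1) \cdot \max_j W(b_j)$. This reduces to $n \cdot \sum_i W(o_i)$ for $k = 1$ and to $(n + m - 1) \cdot W(o_{\max})$ for $k = m$. The greedy merge in `greedy_buckets` is a hypothetical heuristic for "maximal pipeline parallelism with a minimal number of threads", not necessarily the author's algorithm, and the operator costs are made up.

```python
from typing import List


def bucket_costs(costs: List[float], buckets: List[List[int]]) -> List[float]:
    # Assumption: a bucket's cost W(b_j) is the sum of the costs W(o_i)
    # of the operators it executes (operators are 0-based indices here).
    return [sum(costs[i] for i in b) for b in buckets]


def pipeline_cost(costs: List[float], buckets: List[List[int]], n: int) -> float:
    # Convoy effect: all k buckets advance at the pace of the slowest one,
    # so n messages take about (n + k - 1) * max_j W(b_j).
    # k = 1 yields n * sum(W(o_i)); k = m yields (n + m - 1) * W(o_max).
    k = len(buckets)
    return (n + k - 1) * max(bucket_costs(costs, buckets))


def greedy_buckets(costs: List[float]) -> List[List[int]]:
    # Hypothetical heuristic: merge consecutive operators into one bucket as
    # long as its cost stays within W(o_max), the work cycle of the fully
    # vectorized plan. For a fixed operator order this yields the fewest
    # contiguous buckets (threads) that keep that work cycle.
    cap = max(costs)
    buckets: List[List[int]] = [[]]
    load = 0.0
    for i, w in enumerate(costs):
        if buckets[-1] and load + w > cap:
            buckets.append([])  # start a new bucket, i.e., a new thread
            load = 0.0
        buckets[-1].append(i)
        load += w
    return buckets


if __name__ == "__main__":
    costs = [1, 2, 5, 2, 2, 1]  # hypothetical W(o_1), ..., W(o_6)
    n = 100  # hypothetical number of messages
    m = len(costs)
    print(pipeline_cost(costs, [list(range(m))], n))  # instance-based, k = 1
    print(pipeline_cost(costs, [[i] for i in range(m)], n))  # fully vectorized, k = m
    plan = greedy_buckets(costs)
    print(plan, pipeline_cost(costs, plan, n))
```

With these made-up costs, the sketch prints 1300 for $k = 1$, 525 for $k = m = 6$, and `[[0, 1], [2], [3, 4, 5]] 510` for the greedy plan: three threads reach the same work cycle $W(o_{\max}) = 5$ as six, and the shorter pipeline even lowers the estimated total cost slightly.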

The core idea of this objective is illustrated in Figure 4.11. In the case of an instance-based plan, all operators except those of parallel subflows are included in the critical path, shown as gray-shaded operators. In contrast, in the case of a vectorized plan that includes a sequence of operators $o$ with data dependencies between them, the execution time mainly depends on the most time-consuming operator $o_{\max}$ with $W(o_{\max}) = \max_{i=1}^{m} W(o_i)$. The reason is that the queues in front of this most time-consuming operator reach their maximum size constraints (in case of full system utilization), and thus, the costs of a vectorized plan are computed as $W(P') = (n + m - 1) \cdot W(o_{\max})$. This execution characteristic, that the work cycle of a pipeline depends on its most time-consuming subtask, is known from other research areas (e.g., databases and operating systems) as the convoy effect [Ros10, BGMP79]. As a result of this effect, the work cycle of the vectorized plan is given by $W(o_{\max})$, i.e., by the time period between the start of two subsequent plan instances, $W(o_{\max}) = t_0(p_{i+1}) - t_0(p_i)$. For example, operator $o_3$ dominates the work cycle
