25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4 Vectorizing Integration Flows

[Figure 4.11: Work Cycle Domination by Operator o_3. Panels: (a) Instance-Based Plan P; (b) Fully Vectorized Plan P′; (c) Cost-Based Vectorized Plan P′′. Each panel shows the operators o_1 to o_4 of the plan instances p_1 to p_3 over time t, with start and end times t_0(p_i) and t_1(p_i).]

of plan P′ in Figure 4.11(b). In conclusion, we leverage the waiting time during work cycles of the data flow graph and merge operators into execution buckets if applicable. Formally, this optimization objective is defined as follows:

$$\phi = \min_{k=1}^{m} \; k \;\Big|\; \forall i \in [1,k] : \left( \sum_{j=1}^{l_{b_i}} W(o_j) \right) \le W(o_{\max}) \qquad (4.7)$$

The goal is to find the minimal number of execution buckets k under the restriction that the execution time of each bucket b_i (the sum of the execution times of the l_{b_i} operators in this bucket) does not exceed the execution time of the most time-consuming operator o_max. As a result, we achieve the highest degree of parallelism with a minimal number of threads. Further advantages of this concept are reduced latency for single messages and robustness in the case of many plan operators but limited thread resources. The special case of the P-CPV with optimization objective φ, where all operators are independent (no data dependencies), is reducible to the NP-hard offline bin packing problem [Joh74].
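To make the bin packing view concrete, the following minimal sketch packs independent operators into buckets with the standard first-fit decreasing heuristic, using W(o_max) as the bucket capacity. This is an illustration of the special case only, not the algorithm proposed in this work: the function names, the dictionary representation of operator costs, and the cost values are assumptions, and the heuristic may return more buckets than the true minimum k.

```python
from typing import Dict, List


def first_fit_decreasing(costs: Dict[str, float]) -> List[List[str]]:
    """Assign independent operators to buckets so that no bucket's total
    execution time exceeds W(o_max). Heuristic: not guaranteed to be minimal."""
    if not costs:
        return []
    capacity = max(costs.values())  # W(o_max), the most expensive operator
    buckets: List[List[str]] = []   # operator names per bucket
    loads: List[float] = []         # summed execution time per bucket
    # Visit operators from most to least expensive.
    for op in sorted(costs, key=costs.get, reverse=True):
        for i, load in enumerate(loads):
            if load + costs[op] <= capacity:
                # First existing bucket with enough room takes the operator.
                buckets[i].append(op)
                loads[i] += costs[op]
                break
        else:
            # No existing bucket fits: open a new one.
            buckets.append([op])
            loads.append(costs[op])
    return buckets


def satisfies_constraint(buckets: List[List[str]], costs: Dict[str, float]) -> bool:
    """Check the restriction of Equation 4.7 for every bucket."""
    capacity = max(costs.values())
    return all(sum(costs[op] for op in bucket) <= capacity for bucket in buckets)


# Hypothetical execution times; o3 dominates the work cycle.
costs = {"o1": 1.0, "o2": 2.0, "o3": 5.0, "o4": 2.0}
buckets = first_fit_decreasing(costs)
print(buckets)  # [['o3'], ['o2', 'o4', 'o1']]
```

With these assumed costs, two buckets (and hence two threads) suffice instead of the four threads of a full vectorization. If all operators had the same execution time, every operator would end up in its own bucket, since any pair would exceed W(o_max).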

Typically, the optimization objective φ allows us to find a scheme that exploits the highest pipeline parallelism but requires fewer threads than full vectorization. However, in special cases, such as (1) where all operators exhibit almost the same execution time or (2) where a plan contains too many operators, the problem of a large number of required threads still exists. In order to overcome this general problem, we extend the P-CPV by a parameter to allow for higher robustness. In detail, this extended optimization problem is defined as follows:

Definition 4.3 (Constrained P-CPV). With regard to the P-CPV, find the minimal number k of buckets and an assignment of operators o_j with j ∈ [1, m] to those execution buckets

