Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows
[Figure 4.11: Work Cycle Domination by Operator o_3. Panels: (a) Instance-Based Plan P; (b) Fully Vectorized Plan P′; (c) Cost-Based Vectorized Plan P′′. Each panel shows the operators o_1 to o_4 of the plan instances p_1 to p_3 over time t, with the instance start and end times t_0(p_i) and t_1(p_i).]
of plan P′ in Figure 4.11(b). In conclusion, we leverage the waiting time during work
cycles of the data flow graph and merge operators into execution buckets if applicable.
Formally, this optimization objective is defined as follows:
\[
\phi = \min_{k=1}^{m} l_k \;\Big|\; \forall i \in [1, k] : \left( \sum_{j=1}^{l_{b_i}} W(o_j) \right) \le W(o_{\max}) \tag{4.7}
\]
The goal is to find the minimal number of execution buckets k under the restriction that the
execution time of each bucket b_i (sum of execution times of the l_{b_i} operators of this bucket)
does not exceed the execution time of the most time-consuming operator. As a result, we
achieve the highest degree of parallelism with a minimal number of threads. Further
advantages of this concept are reduced latency time for single messages and robustness
in the case of many plan operators but limited thread resources. The special case of
the P-CPV with optimization objective φ, where all operators are independent (no data
dependencies), is reducible to the NP-hard offline bin packing problem [Joh74].
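The bucket constraint can be illustrated with a minimal sketch. It is not the algorithm of this thesis: it assumes that the execution times W(o_j) are known as a plain list, and the function names `merge_sequence` and `pack_independent` are hypothetical. The first function assumes a purely sequential plan in which only adjacent operators may share a bucket; the second covers the special case of independent operators with the well-known first-fit decreasing heuristic for bin packing, which does not guarantee the minimal number of buckets. Operators are identified by 0-based list indices.

```python
from typing import List, Sequence


def merge_sequence(costs: Sequence[float]) -> List[List[int]]:
    """Greedily merge adjacent operators of a sequential plan into buckets.

    costs[j] is the assumed execution time W(o_j). A bucket is closed as
    soon as adding the next operator would exceed W(o_max).
    """
    if not costs:
        return []
    capacity = max(costs)  # W(o_max): cost of the dominating operator
    buckets: List[List[int]] = [[]]
    load = 0.0
    for j, w in enumerate(costs):
        if buckets[-1] and load + w > capacity:
            buckets.append([])  # open a new bucket (one more thread)
            load = 0.0
        buckets[-1].append(j)
        load += w
    return buckets


def pack_independent(costs: Sequence[float]) -> List[List[int]]:
    """First-fit decreasing heuristic for operators without dependencies."""
    if not costs:
        return []
    capacity = max(costs)  # W(o_max) serves as the bin capacity
    buckets: List[List[int]] = []
    loads: List[float] = []
    # Consider the most expensive operators first.
    for j in sorted(range(len(costs)), key=lambda idx: -costs[idx]):
        for i, load in enumerate(loads):
            if load + costs[j] <= capacity:
                buckets[i].append(j)  # fits into an existing bucket
                loads[i] += costs[j]
                break
        else:
            buckets.append([j])  # no bucket has room: open a new one
            loads.append(costs[j])
    return buckets


if __name__ == "__main__":
    # Hypothetical costs in which the third operator dominates.
    costs = [1, 2, 6, 3]
    print(merge_sequence(costs))    # [[0, 1], [2], [3]]
    print(pack_independent(costs))  # [[2], [3, 1, 0]]
```

With these hypothetical costs, the sequential variant needs three buckets instead of the four threads of a full vectorization, and no bucket exceeds the execution time of the dominating operator.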
Typically, the optimization objective φ allows us to find a scheme that exploits the highest
pipeline parallelism but requires fewer threads than the full vectorization. However, in
special cases such as (1) where all operators exhibit almost the same execution time or
(2) where a plan contains too many operators, the problem of a large number of required
threads still exists. In order to overcome this general problem, we extend the P-CPV by a
parameter to allow for higher robustness. In detail, this extended optimization problem
is defined as follows:
Definition 4.3 (Constrained P-CPV). With regard to the P-CPV, find the minimal number
of k buckets and an assignment of operators o_j with j ∈ [1, m] to those execution buckets