Cost-Based Optimization of Integration Flows - Datenbanken ...
4.2 Plan Vectorization
Figure 4.3: Temporal Aspects of Instance-Based and Vectorized Plans. (a) Instance-Based Execution of Plan $P$: instances $p_1$ and $p_2$, each executing operators $o_1$ to $o_6$, run one after another along the time axis $t$, with $t_1(p_1) \le t_0(p_2)$. (b) Fully Vectorized Execution of Plan $P'$: $p_2$ starts at $t_0(p_2)$ before $p_1$ ends at $t_1(p_1)$, which is marked as the possible improvement due to vectorization.
Definition 4.1 (Plan Vectorization Problem (P-PV)). Let $P$ denote a plan, and let $p_i \in \{p_1, p_2, \ldots, p_n\}$ denote the plan instances with $P \Rightarrow p_i$. Further, let the plan $P$ contain a sequence of atomic or complex operators $o_i \in \{o_1, o_2, \ldots, o_m\}$. For serialization purposes, the plan instances are executed in sequence, where the end time $t_1$ of a plan instance is not later than the start time $t_0$ of the subsequent plan instance, i.e., $t_1(p_i) \le t_0(p_{i+1})$. Then, the P-PV describes the search for the vectorized plan $P'$ (with data flow semantics) that exhibits the highest degree of parallelism for the plan instances $p'_i$ such that the constraint conditions

$$\big(t_1(p'_i, o_i) \le t_0(p'_i, o_{i+1})\big) \wedge \big(t_1(p'_i, o_i) \le t_0(p'_{i+1}, o_i)\big)$$

hold and semantic correctness (see Definition 3.1) is ensured.
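To make the two constraints of Definition 4.1 concrete, the following Python sketch computes the earliest schedule that satisfies them for given operator costs and then checks a schedule against both inequalities. It is an illustration under simplifying assumptions, not the rewriting algorithm of this chapter: all instances are assumed to be available at time 0, each operator is assumed to have a fixed cost, and the helper names `pipelined_schedule` and `satisfies_pv_constraints` are hypothetical. Because Definition 4.1 uses the index $i$ for both instances and operators, the code uses `j` for the instance and `i` for the operator.

```python
from typing import List, Tuple

Schedule = List[List[float]]  # rows: plan instances p'_j, columns: operators o_i


def pipelined_schedule(costs: List[float], n: int) -> Tuple[Schedule, Schedule]:
    """Earliest start/end times t0(p'_j, o_i) and t1(p'_j, o_i) for n instances.

    Assumption (illustrative only): all n instances are available at time 0
    and operator o_i always costs costs[i].
    """
    m = len(costs)
    start = [[0.0] * m for _ in range(n)]
    end = [[0.0] * m for _ in range(n)]
    for j in range(n):
        for i in range(m):
            # first constraint: p'_j must have left o_{i-1}
            after_prev_operator = end[j][i - 1] if i > 0 else 0.0
            # second constraint: p'_{j-1} must have left o_i
            after_prev_instance = end[j - 1][i] if j > 0 else 0.0
            start[j][i] = max(after_prev_operator, after_prev_instance)
            end[j][i] = start[j][i] + costs[i]
    return start, end


def satisfies_pv_constraints(start: Schedule, end: Schedule) -> bool:
    """Check both inequalities of Definition 4.1 for every instance and operator."""
    n, m = len(start), len(start[0])
    for j in range(n):
        for i in range(m):
            if i + 1 < m and end[j][i] > start[j][i + 1]:
                return False  # o_{i+1} started on p'_j before o_i finished it
            if j + 1 < n and end[j][i] > start[j + 1][i]:
                return False  # p'_{j+1} entered o_i before p'_j left it
    return True


if __name__ == "__main__":
    # Setting of Figure 4.3: two instances, six operators, unit costs.
    start, end = pipelined_schedule([1.0] * 6, n=2)
    print(end[-1][-1], satisfies_pv_constraints(start, end))
```

For the setting of Figure 4.3 this prints `7.0 True`, compared with $2 \cdot 6 = 12$ time units when the two instances run strictly one after another.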
The same rules for ensuring semantic correctness that Chapter 3 uses for inter-operator parallelism also apply to vectorized plans. For example, writing interactions must be synchronized. However, we assume that plan instances are independent, which holds for typical data-propagating integration flows. This means that we synchronize, for example, a reading interaction with a subsequent writing interaction of plan instance $p_1$, but we allow the reading interaction of $p_2$ to execute in parallel with the writing interaction of $p_1$. Nevertheless, monotonic reads and writes are ensured. We will revisit this issue of intra-instance synchronization when discussing the rewriting algorithm.
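The synchronization rule above can be illustrated with a small thread-and-queue pipeline. The following Python sketch assumes, purely for illustration, a vectorized plan with only two operators (a reading and a writing interaction), each executed by its own thread and connected by a bounded FIFO queue; the function `run_read_write_pipeline` is hypothetical and not the execution engine described in this thesis. Within one instance the read always precedes the write, and both operators process the instances in the same order, while the read of $p_2$ may overlap with the write of $p_1$.

```python
import queue
import threading
from typing import List, Tuple

Event = Tuple[str, int]  # ("read" or "write", instance number)


def run_read_write_pipeline(n_instances: int) -> List[Event]:
    """Run n plan instances through a hypothetical two-operator vectorized plan."""
    log: List[Event] = []
    log_lock = threading.Lock()
    # Bounded FIFO queue between the two operators; None marks end of stream.
    channel = queue.Queue(maxsize=1)

    def record(event: str, instance: int) -> None:
        with log_lock:
            log.append((event, instance))

    def read_operator() -> None:
        for i in range(1, n_instances + 1):
            record("read", i)  # reading interaction of p_i
            channel.put(i)     # hand p_i over; may run while p_{i-1} is written
        channel.put(None)

    def write_operator() -> None:
        while True:
            i = channel.get()
            if i is None:
                break
            record("write", i)  # writing interaction of p_i, strictly after its read

    threads = [threading.Thread(target=read_operator),
               threading.Thread(target=write_operator)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


if __name__ == "__main__":
    print(run_read_write_pipeline(3))
```

The exact interleaving of reads and writes in the printed log varies between runs; only the per-instance order and the instance order per operator are guaranteed by the FIFO queue.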
Based on the P-PV, we now present the static cost analysis of the best case (full pipelines), where cost denotes the total execution time. Let $P$ include an operator sequence $o$ with constant operator costs $W(o_i) = 1$. Then the costs of $n$ plan instances are

$$
\begin{aligned}
W(P) &= n \cdot m && \text{(instance-based)}\\
W(P') &= n + m - 1 && \text{(fully vectorized)}\\
\Delta\big(W(P) - W(P')\big) &= (n - 1) \cdot (m - 1),
\end{aligned}
\tag{4.1}
$$
where $m$ denotes the number of operators. This is an idealized model used only for illustration purposes. In practice, the improvement depends on the most time-consuming operator $o_{\max}$ with $W(o_{\max}) = \max_{i=1}^{m} W(o_i)$ of a vectorized plan $P'$, because the work cycle of the whole data flow graph depends on this operator: the queues (with maximum constraints) in front of it fill up. We will revisit this effect when discussing the cost-based vectorization in Section 4.3. The costs are then specified by:
$$
\begin{aligned}
W(P) &= n \cdot \sum_{i=1}^{m} W(o_i) && \text{(instance-based)}\\
W(P') &= (n + m - 1) \cdot W(o_{\max}). && \text{(fully vectorized)}
\end{aligned}
\tag{4.2}
$$
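Equations 4.1 and 4.2 can be checked numerically with a few lines of Python. The sketch below evaluates both cost models and, for comparison, simulates the earliest pipelined schedule under the additional assumption of unbounded FIFO queues between operators and all instances being available at time 0; all function names are hypothetical. Under that assumption the simulated makespan is $\sum_{i=1}^{m} W(o_i) + (n-1) \cdot W(o_{\max})$, which equals Equation 4.2 for uniform operator costs and stays below it otherwise, so in this idealized setting Equation 4.2 does not underestimate the pipeline's execution time.

```python
from typing import List


def cost_instance_based(costs: List[float], n: int) -> float:
    """W(P) = n * sum_i W(o_i): instances run strictly one after another."""
    return n * sum(costs)


def cost_fully_vectorized(costs: List[float], n: int) -> float:
    """W(P') = (n + m - 1) * W(o_max), Equation 4.2."""
    return (n + len(costs) - 1) * max(costs)


def simulated_pipeline_makespan(costs: List[float], n: int) -> float:
    """Earliest completion time of n instances flowing through o_1..o_m.

    Assumes unbounded FIFO queues between operators (not part of the
    cost model above) and that all instances are available at time 0.
    """
    busy_until = [0.0] * len(costs)  # when o_i finished the previous instance
    for _ in range(n):
        t = 0.0  # time at which the current instance left the previous operator
        for i, c in enumerate(costs):
            t = max(t, busy_until[i]) + c
            busy_until[i] = t
    return busy_until[-1]


if __name__ == "__main__":
    # Unit costs as in Equation 4.1: n = 2 instances, m = 6 operators.
    unit = [1.0] * 6
    print(cost_instance_based(unit, 2), cost_fully_vectorized(unit, 2),
          simulated_pipeline_makespan(unit, 2))
    # Non-uniform costs: o_max with W(o_max) = 3 dominates the work cycle.
    mixed = [1.0, 3.0, 2.0]
    print(cost_instance_based(mixed, 3), cost_fully_vectorized(mixed, 3),
          simulated_pipeline_makespan(mixed, 3))
```

The first line prints `12.0 7.0 7.0`, matching $\Delta = (2-1) \cdot (6-1) = 5$ from Equation 4.1; the second prints `18.0 15.0 12.0`, where the gap between 15.0 and 12.0 stems from the faster operators, which Equation 4.2 charges at $W(o_{\max})$.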