Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows
Figure 4.8: Speedup Test with Varying Degree of Parallelism. (a) P with m = 100; (b) P with m = 200.
of the P-PV and present our cost-based vectorization approach [BHP+09a, BHP+11] that
overcomes the two drawbacks of vectorization. The instance-based plan and the fully
vectorized plan are then specific cases of this more general solution. This cost-based
vectorization directly relies on the foundations of our general cost-based optimization
framework in the form of a cost-based, control-flow-oriented optimization technique.
4.3.1 Problem Generalization<br />
The core idea of this problem generalization is to rewrite an instance-based plan to a cost-based
vectorized plan with a minimal number of execution buckets, where each bucket
can contain multiple operators. All operators of a single execution bucket are executed
instance-based, while the set of execution buckets uses the pipes-and-filters execution model.
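
The following is a minimal sketch of this hybrid execution model, not the implementation described in this work. It assumes one thread per execution bucket, unbounded FIFO queues between neighboring buckets, and operators modeled as plain Python callables; the names `run_bucket`, `run_plan`, and `_STOP` are hypothetical and chosen only for illustration.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel marking the end of the message stream


def run_bucket(operators, inbox, outbox):
    """Run one execution bucket: apply its operators one after another
    (instance-based) to each message, then hand the result downstream."""
    while True:
        msg = inbox.get()
        if msg is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next bucket
            return
        for op in operators:
            msg = op(msg)
        outbox.put(msg)


def run_plan(buckets, messages):
    """Execute a plan given as a list of buckets (each a list of callables),
    using one thread per bucket and a queue between neighboring buckets
    (assumed pipes-and-filters setup)."""
    queues = [queue.Queue() for _ in range(len(buckets) + 1)]
    threads = [
        threading.Thread(target=run_bucket, args=(ops, queues[i], queues[i + 1]))
        for i, ops in enumerate(buckets)
    ]
    for t in threads:
        t.start()
    for m in messages:
        queues[0].put(m)
    queues[0].put(_STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is _STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Two buckets: the first holds one operator, the second holds two.
example_buckets = [[lambda x: x + 1], [lambda x: x * 2, lambda x: x - 3]]
example_results = run_plan(example_buckets, [1, 2, 3])
```

Because each bucket is served by a single thread and the queues are FIFO, messages leave the pipeline in the order in which they entered it in this sketch.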
Example 4.5 (Cost-Based Vectorization). Recall the vectorized plan $P'_2$ (shown in Figure
4.9(a)) from Example 4.1. If we now use cost-based vectorization, we search for the
cost-optimal plan $P''_2$ with $k$ execution buckets. Figure 4.9(b) illustrates an example of
such a cost-based vectorized plan. There, $k = 4$ execution buckets are used, where buckets
2 and 4 each contain two operators. The individual operators of each bucket are executed with
the instance-based execution model. As a result, we require only four instead of six threads
for this plan.
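
The bucket layout of this example can be sketched as follows. This is an illustrative sketch under stated assumptions: the operator names `o1` to `o6` are placeholders, buckets are assumed to be contiguous runs of the operator sequence, and the thread count is assumed to equal the number of buckets. The helper `make_buckets` is hypothetical.

```python
def make_buckets(operators, sizes):
    """Split an operator sequence into contiguous buckets of the given sizes
    (contiguity is an assumption of this sketch)."""
    if any(s < 1 for s in sizes) or sum(sizes) != len(operators):
        raise ValueError("sizes must be positive and cover all operators")
    buckets, start = [], 0
    for s in sizes:
        buckets.append(operators[start:start + s])
        start += s
    return buckets


operators = ["o1", "o2", "o3", "o4", "o5", "o6"]  # placeholder operator names

# Fully vectorized plan: one operator per bucket, hence six buckets.
fully_vectorized = make_buckets(operators, [1] * len(operators))

# Cost-based plan of the example: k = 4, buckets 2 and 4 hold two operators.
cost_based = make_buckets(operators, [1, 2, 1, 2])

# Assumed here: one thread per bucket.
threads_vectorized = len(fully_vectorized)
threads_cost_based = len(cost_based)
```

Which bucket sizes are actually cost-optimal depends on the optimization objective; the sizes used here merely mirror the example.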
The input (instance-based plan) and the output (vectorized plan) of the P-PV are
extreme cases of this generalization. In order to compute the cost-optimal vectorized
plan, we generalize the P-PV to the Cost-Based P-PV:
Definition 4.2 (Cost-Based Plan Vectorization Problem (P-CPV)). Let $P$ denote a plan,
and let $p_i \in \{p_1, p_2, \ldots, p_n\}$ denote the implied plan instances with $P \Rightarrow p_i$. Further, let
each plan $P$ comprise a sequence of atomic and complex operators $o_i \in \{o_1, o_2, \ldots, o_m\}$.
For serialization purposes, the plan instances are executed in sequence with $t_1(p_i) \le t_0(p_{i+1})$.
The P-CPV describes the search for the derived cost-optimal plan $P''$ according
to the optimization objective $\phi$ with $k \in [1, m]$ execution buckets $b_i \in \{b_1, b_2, \ldots, b_k\}$,
where each bucket contains $l$ operators $o_i \in \{o_1, o_2, \ldots, o_l\}$. Here, the constraint conditions
$(t_1(p''_i, b_i) \le t_0(p''_i, b_{i+1})) \land (t_1(p''_i, b_i) \le t_0(p''_{i+1}, b_i))$ and $(t_1(b_i, o_i) \le t_0(b_i, o_{i+1})) \land$