25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


4 Vectorizing Integration Flows

Figure 4.8: Speedup Test with Varying Degree of Parallelism. Panels: (a) P with m = 100, (b) P with m = 200.

of the P-PV and present our cost-based vectorization approach [BHP+09a, BHP+11] that overcomes the two drawbacks of vectorization. The instance-based plan and the fully vectorized plan are then specific cases of this more general solution. This cost-based vectorization directly relies on the foundations of our general cost-based optimization framework in the form of a cost-based, control-flow-oriented optimization technique.

4.3.1 Problem Generalization

The core idea of this problem generalization is to rewrite an instance-based plan to a cost-based vectorized plan with a minimal number of execution buckets, where each bucket can contain multiple operators. All operators of a single execution bucket are executed instance-based, while the set of execution buckets uses the pipes-and-filters execution model.
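The two execution levels described above can be illustrated with a minimal Python sketch. It is not the implementation from this work: the function names `run_bucket` and `run_plan`, the sentinel-based shutdown, and the three arithmetic operators are all hypothetical and chosen only for illustration. Each bucket runs in its own thread and applies its operators one after another to every message (instance-based inside the bucket), while queues between the buckets act as the pipes.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def run_bucket(operators, inbox, outbox):
    """One execution bucket: a single thread that applies all of its
    operators in sequence to each message (instance-based inside the bucket)."""
    while True:
        msg = inbox.get()
        if msg is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next bucket
            return
        for op in operators:
            msg = op(msg)
        outbox.put(msg)


def run_plan(buckets, messages):
    """Connect the buckets with queues (pipes and filters) and push all
    messages through. Returns the results and the number of threads used."""
    queues = [queue.Queue() for _ in range(len(buckets) + 1)]
    threads = [
        threading.Thread(target=run_bucket, args=(ops, queues[i], queues[i + 1]))
        for i, ops in enumerate(buckets)
    ]
    for t in threads:
        t.start()
    for m in messages:
        queues[0].put(m)
    queues[0].put(SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results, len(threads)


# Hypothetical operators standing in for the operators of a plan.
ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

instance_based = [ops]                  # one bucket holding all operators
fully_vectorized = [[o] for o in ops]   # one bucket per operator
mixed = [ops[:1], ops[1:]]              # two buckets, the second with two operators
```

All three bucketings compute the same results for the same input; they differ only in how many threads and queues are involved, which is the degree of freedom the cost-based approach optimizes over.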

Example 4.5 (Cost-Based Vectorization). Recall the vectorized plan $P_2'$ (shown in Figure 4.9(a)) from Example 4.1. If we now use cost-based vectorization, we search for the cost-optimal plan $P_2''$ with $k$ execution buckets. Figure 4.9(b) illustrates an example of such a cost-based vectorized plan. There, $k = 4$ execution buckets are used, and buckets 2 and 4 each include two operators. The individual operators of each bucket are executed with the instance-based execution model. As a result, we require only four instead of six threads for this plan.
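To make the example concrete, the following Python sketch represents a bucketing as a list of operator lists and enumerates candidate bucketings for six operators. Several points are assumptions and are not taken from the text: the operator names `o1` to `o6` are placeholders, the assignment of operators to buckets is inferred only from the stated bucket sizes, and buckets are assumed to be contiguous runs of the operator sequence. Under that last assumption, a sequence of m operators has 2^(m-1) possible bucketings.

```python
from itertools import combinations


def bucketings(operators):
    """Yield every split of the operator sequence into contiguous,
    non-empty buckets (assumption: buckets preserve operator order)."""
    m = len(operators)
    for k in range(1, m + 1):
        # choose k - 1 cut positions among the m - 1 gaps between operators
        for cuts in combinations(range(1, m), k - 1):
            bounds = (0, *cuts, m)
            yield [operators[a:b] for a, b in zip(bounds, bounds[1:])]


# Placeholder names for the six operators of the example plan.
operators = ["o1", "o2", "o3", "o4", "o5", "o6"]

# Bucket sizes 1, 2, 1, 2 as described for Figure 4.9(b); the exact
# operator-to-bucket assignment is assumed.
example = [["o1"], ["o2", "o3"], ["o4"], ["o5", "o6"]]

candidates = list(bucketings(operators))
threads_needed = len(example)  # one thread per execution bucket
```

Here `candidates` contains 32 bucketings, the first being the single-bucket (instance-based) plan and the last the one-operator-per-bucket (fully vectorized) plan. The bucketing `example` is one of the candidates and needs four threads rather than six.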

The input (instance-based plan) and the output (vectorized plan) of the P-PV are extreme cases of this generalization. In order to compute the cost-optimal vectorized plan, we generalize the P-PV to the Cost-Based P-PV:

Definition 4.2 (Cost-Based Plan Vectorization Problem (P-CPV)). Let $P$ denote a plan, and let $p_i \in \{p_1, p_2, \ldots, p_n\}$ denote the implied plan instances with $P \Rightarrow p_i$. Further, let each plan $P$ comprise a sequence of atomic and complex operators $o_i \in \{o_1, o_2, \ldots, o_m\}$. For serialization purposes, the plan instances are executed in sequence with $t_1(p_i) \le t_0(p_{i+1})$. The P-CPV describes the search for the derived cost-optimal plan $P''$ according to the optimization objective $\phi$ with $k \in [1, m]$ execution buckets $b_i \in \{b_1, b_2, \ldots, b_k\}$, where each bucket contains $l$ operators $o_i \in \{o_1, o_2, \ldots, o_l\}$. Here, the constraint conditions $(t_1(p''_i, b_i) \le t_0(p''_i, b_{i+1})) \wedge (t_1(p''_i, b_i) \le t_0(p''_{i+1}, b_i))$ and $(t_1(b_i, o_i) \le t_0(b_i, o_{i+1})) \wedge$
