
Cost-Based Optimization of Integration Flows - Datenbanken ...


4 Vectorizing Integration Flows

We ran our experiments on the same platform as described in Section 3.6. Further, we executed all experiments on synthetically generated XML data (using our DIPBench toolsuite [BHLW08c]), because the data distribution of real data sets has only a minor influence on the benefit of vectorization, which is a control-flow-oriented optimization technique. However, several other aspects do influence vectorization. In general, we used five scale factors for all three execution approaches: the data size d of input messages, the number of operators m, the time interval t between two arriving messages, the number of plan instances n, and the maximum number of messages allowed in a queue q.
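To make the five scale factors concrete, they can be captured in a small configuration object. The following Python sketch is only illustrative: the class name `ScaleFactors`, the helper `arrival_times`, the time unit of t (milliseconds), and the assumption of a constant inter-arrival time are hypothetical choices, not part of DIPBench.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScaleFactors:
    """Hypothetical container for the five scale factors d, m, t, n, q."""
    d: int    # data size of input messages (d = 1 corresponds to 100 kB messages)
    m: int    # number of operators per plan
    t: float  # time interval between two arriving messages (assumed unit: ms)
    n: int    # number of plan instances, i.e., number of input messages
    q: int    # maximum number of messages allowed in the inbound queue

    def __post_init__(self) -> None:
        # Reject configurations that cannot describe a workload.
        if min(self.d, self.m, self.n, self.q) < 1 or self.t < 0:
            raise ValueError("d, m, n, q must be >= 1 and t must be >= 0")


def arrival_times(sf: ScaleFactors) -> List[float]:
    """Arrival timestamps of the n messages, assuming a constant interval t."""
    return [i * sf.t for i in range(sf.n)]
```

For example, `arrival_times(ScaleFactors(d=1, m=5, t=10.0, n=4, q=100))` yields four arrivals spaced 10 ms apart; the values of m, t, and q here are placeholders.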

End-to-End Comparison and Scalability

Similar to the general comparison experiment of optimized and unoptimized plan execution, whose results are shown in Figure 3.22, we first evaluated the impact of vectorization and cost-based vectorization compared to unoptimized execution for our example use case plans. In detail, we executed 20,000 plan instances for each asynchronous, data-driven example plan (P1, P2, P5, and P7) and for each execution model. We fixed the cardinality of input data sets to d = 1 (100 kB messages) and used the same workload configuration (without workload changes and without correlations) as in the mentioned experiment of Chapter 3. Note that standard cost-based plan rewriting is orthogonal to vectorization: vectorization achieves improvements in addition to those of rewriting, except for the effects of rewriting patterns to parallel flows. To focus on vectorization, we disabled all other optimization techniques. Furthermore, we fixed an optimization interval of ∆t = 5 min, a sliding window size of ∆w = 5 min, and EMA (exponential moving average) as the workload aggregation method. To summarize, we consistently observe significant reductions of the total execution time (see Figure 4.19(a)) of 71% (P1), 72% (P2), 69% (P5), and 55% (P7). In contrast to Chapter 3, we measured the scenario elapsed time (the latency of the whole message sequence) because, for vectorized execution, the execution times of single plan instances cannot be aggregated due to overlapping message execution (pipeline semantics).
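The workload aggregation over a sliding window of size ∆w can be illustrated with an exponential moving average. This is a minimal sketch, not the thesis's implementation: it assumes that cost observations arrive in time order as (timestamp, value) pairs, that the window is the half-open interval (now - ∆w, now], and that a hypothetical smoothing factor `alpha` (default 0.5) weights newer observations.

```python
from collections import deque
from itertools import islice
from typing import Deque, Optional, Tuple


class EmaWindowAggregator:
    """Sketch: EMA over the observations inside a sliding time window.

    delta_w is the window length (e.g., 300 s for 5 min); alpha is an
    assumed smoothing factor, not a value taken from the experiment.
    """

    def __init__(self, delta_w: float, alpha: float = 0.5) -> None:
        if delta_w <= 0 or not 0.0 < alpha <= 1.0:
            raise ValueError("require delta_w > 0 and 0 < alpha <= 1")
        self.delta_w = delta_w
        self.alpha = alpha
        self._obs: Deque[Tuple[float, float]] = deque()  # (timestamp, value), oldest first

    def add(self, timestamp: float, value: float) -> None:
        # Observations are assumed to arrive in timestamp order.
        self._obs.append((timestamp, value))

    def estimate(self, now: float) -> Optional[float]:
        # Evict observations that fell out of the window (now - delta_w, now].
        while self._obs and self._obs[0][0] <= now - self.delta_w:
            self._obs.popleft()
        if not self._obs:
            return None
        # Seed with the oldest in-window value, then fold in newer ones.
        ema = self._obs[0][1]
        for _, value in islice(self._obs, 1, None):
            ema = self.alpha * value + (1.0 - self.alpha) * ema
        return ema
```

With ∆w = 300 s and observations 10, 20, 30 at t = 0, 100, 200 s, the estimate at 250 s is 22.5; at 350 s the first observation has left the window and the estimate becomes 25.0. An optimizer would read such an estimate once per optimization interval ∆t.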

Figure 4.19: Use Case Comparison of Vectorization ((a) Scenario Elapsed Time; (b) First Optimization Time)
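Why Figure 4.19(a) reports the scenario elapsed time rather than summed instance times can be seen in a toy model. The sketch below assumes that each of the m operators costs the same time c, that instance-at-a-time execution processes messages strictly one after another, and that vectorized execution runs each operator as an in-order pipeline stage with unbounded inter-stage queues; thread and queue overheads are ignored, so it does not reproduce the measured numbers. The function names are hypothetical.

```python
from typing import List, Tuple


def sequential_elapsed(n: int, m: int, c: float, t: float) -> float:
    """Instance-at-a-time: message i runs all m operators before message i+1 starts."""
    finish = 0.0
    for i in range(n):
        start = max(i * t, finish)  # wait for the arrival and for the previous instance
        finish = start + m * c
    return finish


def pipelined_schedule(n: int, m: int, c: float, t: float) -> Tuple[float, List[float]]:
    """Vectorized: each operator is an in-order pipeline stage.

    Returns the scenario elapsed time and each instance's duration
    (from entering the first stage to leaving the last one).
    """
    stage_free = [0.0] * m  # time at which each stage can accept its next message
    durations: List[float] = []
    for i in range(n):
        ready = i * t  # arrival time of message i
        entered = max(ready, stage_free[0])
        for k in range(m):
            start = max(ready, stage_free[k])
            ready = start + c
            stage_free[k] = ready
        durations.append(ready - entered)
    return stage_free[-1], durations


if __name__ == "__main__":
    n, m, c, t = 4, 3, 1.0, 0.0
    seq = sequential_elapsed(n, m, c, t)
    vec, durations = pipelined_schedule(n, m, c, t)
    print(seq, vec, sum(durations))  # 12.0 6.0 12.0
    print(f"reduction {1 - vec / seq:.0%}, speedup {seq / vec:.1f}")  # reduction 50%, speedup 2.0
```

With simultaneous arrivals (t = 0), the pipeline finishes after (n + m - 1)c = 6 time units instead of nmc = 12, yet the overlapping per-instance durations still sum to 12, so summing them would hide the benefit. When t >= mc, messages never overlap and both schedules coincide. In general, a reduction r of the elapsed time corresponds to a speedup S = 1/(1 - r); for the reported reductions of 71% and 55%, this gives about 3.4 and 2.2.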

First, the full vectorization approach leads to a significant reduction of the total elapsed time for executing the sequence of 20,000 plan instances. We achieved a speedup of a factor of three for the plans P1, P2, and P5, and a speedup of a factor of two for plan P7. Furthermore, cost-based vectorization improved on full vectorization by about another 10%. However, there are cases where cost-based vectorization yields only a minor improvement because plans such as P1 are too restrictive with regard to merging execution buckets (e.g., the combination of a Switch operator with specific paths is not
