Cost-Based Optimization of Integration Flows - Datenbanken ...

3.5 Experimental Evaluation

First, in order to investigate the benefits of rewriting patterns to parallel subflows in more detail, we executed a speedup experiment. Figure 3.23 shows the results of this experiment on the rewriting of sequences to parallel flows (WC2). We used a plan that contains a sequence of m = 100 independent Delay operators, and we explicitly varied the number of threads k (concurrent subflows) used for parallelism (Fork operator) in order to evaluate the speedup. Due to the distribution of m operators to k fork lanes, a theoretical speedup of m/⌈m/k⌉ is possible. Then, we varied the waiting time of each single Delay operator from 10 ms to 40 ms in order to simulate different network delays and waiting times for external systems. This experiment was repeated ten times. As a result, an increasing maximum speedup (at an increasing number of threads) was measured with increasing waiting time (note that the fall-offs were caused by the Java garbage collector). This strong dependence of the multi-tasking benefit on the waiting time of the involved operators justifies the decision to use the waiting time as the main cost indicator when rewriting sequences and iterations to parallel flows.
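The theoretical bound m/⌈m/k⌉ above follows from the fact that the longest fork lane executes ⌈m/k⌉ operators. As a minimal illustrative sketch (not part of the original evaluation), this bound can be computed as follows:

```java
// Sketch: theoretical speedup bound for rewriting a sequence of m
// independent operators into k parallel fork lanes (illustrative only).
public class SpeedupBound {

    // With m operators distributed over k lanes, the longest lane executes
    // ceil(m/k) operators, so the speedup is bounded by m / ceil(m/k).
    static double theoreticalSpeedup(int m, int k) {
        int longestLane = (m + k - 1) / k; // integer ceil(m/k)
        return (double) m / longestLane;
    }

    public static void main(String[] args) {
        int m = 100; // sequence length used in the experiment
        for (int k : new int[]{1, 2, 4, 8, 16, 32}) {
            System.out.printf("k=%d -> speedup %.2f%n", k, theoreticalSpeedup(m, k));
        }
        // For k=4:  ceil(100/4)  = 25, speedup 100/25 = 4.0 (linear)
        // For k=8:  ceil(100/8)  = 13, speedup 100/13 ≈ 7.69 (sub-linear)
    }
}
```

Note that the bound is only linear when k divides m evenly; otherwise the longest lane dominates, which explains sub-linear speedups even before runtime effects such as garbage collection.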

Second, we used our running example plans in order to investigate the scalability of optimization benefits with increasing input data size. Based on the experimental results shown in Figure 3.22, we re-used the workload configuration and all plans except plan P2, because no optimization technique could be applied to this plan. In detail, we executed 20,000 plan instances for each running example plan and compared periodical re-optimization with no optimization, varying the input data size d ∈ {1, 2, 3, 4, 5, 6, 7} (in 100 kB messages), which resulted in a total processed input data size of up to 13.35 GB (for d = 7). Note that we varied this input data size only for the initially received message of a plan, while for plans P3, P6, and P8, we changed the cardinality of externally loaded data sets because these plans are initiated time-based. Further, we fixed an optimization interval of ∆t = 5 min, a sliding window size of ∆w = 5 min, and EMA as the workload aggregation method. For all investigated plans, we observe that the relative benefit of optimization increases with increasing data size, as shown in Figure 3.24. Essentially, the same optimization techniques were applied and thus, the optimization time is unaffected by the input data size. The highest benefits were reached by the data-flow-oriented optimization techniques. For example, the unoptimized versions of plans P4, P6, and P7 show a super-linearly increasing execution time with increasing data size, while the optimized versions show an almost linearly increasing execution time. Thus, the optimized versions exhibit a better asymptotic behavior, which is, for example, caused by rewriting nested-loop Joins to combinations of Orderby and merge Join subplans. With this in mind, it is clear that arbitrarily high optimization benefits can be reached with increasing input data size (input cardinalities).
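To illustrate why rewriting a nested-loop Join into an Orderby followed by a merge Join improves asymptotic behavior, the following sketch joins two integer-keyed inputs both ways. This is an illustrative simplification, not the thesis prototype's operator implementation: the nested-loop variant is O(n·m), while sorting both inputs and merging is O(n log n + m log m).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: nested-loop join vs. sort + merge join on integer keys
// (illustrative only; real operators work on message tuples).
public class JoinRewrite {

    // O(n*m): compare every pair of keys.
    static List<int[]> nestedLoopJoin(int[] r, int[] s) {
        List<int[]> out = new ArrayList<>();
        for (int a : r)
            for (int b : s)
                if (a == b) out.add(new int[]{a, b});
        return out;
    }

    // O(n log n + m log m): Orderby both inputs, then a single merge pass.
    static List<int[]> sortMergeJoin(int[] r, int[] s) {
        int[] rs = r.clone(), ss = s.clone();
        Arrays.sort(rs); // Orderby subplan on each input
        Arrays.sort(ss);
        List<int[]> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < rs.length && j < ss.length) {
            if (rs[i] < ss[j]) i++;
            else if (rs[i] > ss[j]) j++;
            else { // equal keys: emit all matching pairs for rs[i]
                int j2 = j;
                while (j2 < ss.length && ss[j2] == rs[i])
                    out.add(new int[]{rs[i], ss[j2++]});
                i++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] r = {3, 1, 7, 5};
        int[] s = {5, 3, 9};
        System.out.println(nestedLoopJoin(r, s).size()); // 2 matching pairs
        System.out.println(sortMergeJoin(r, s).size());  // same result set
    }
}
```

Both variants produce the same join result; only the cost grows differently with input cardinality, which matches the observed shift from super-linear to almost linear execution times.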

Optimization Overhead

In addition to the evaluation of performance benefits and the scalability with increasing input cardinalities and an increasing number of operators, we evaluated the optimization overhead in more detail. To this end, we conducted a series of experiments, where (1) we compared the optimization overhead of our exhaustive optimization approach with that of the heuristic optimization approach, and (2) we analyzed the overhead of statistics monitoring and aggregation in detail.
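The workload aggregation referred to here uses EMA (the exponential moving average fixed in the scalability experiment above). A minimal sketch of this aggregation over monitored statistics follows; the smoothing factor α = 0.5 and the sample values are assumed for illustration, not taken from the thesis:

```java
// Sketch: exponentially weighted moving average (EMA) over monitored
// execution-time statistics (illustrative; alpha is an assumed parameter).
public class EmaAggregation {

    // EMA_t = alpha * x_t + (1 - alpha) * EMA_{t-1}, seeded with x_0.
    static double ema(double[] observations, double alpha) {
        double avg = observations[0];
        for (int t = 1; t < observations.length; t++)
            avg = alpha * observations[t] + (1 - alpha) * avg;
        return avg;
    }

    public static void main(String[] args) {
        double[] execTimesMs = {10, 10, 40, 40}; // simulated operator times
        // Recent observations dominate, so the EMA drifts towards 40 ms.
        System.out.println(ema(execTimesMs, 0.5)); // 32.5
    }
}
```

The appeal of such an aggregate for re-optimization is its O(1) update cost per observation, which keeps the monitoring overhead analyzed here small.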

First, we evaluated the influence of using the exhaustive optimization algorithm A-PMO or the heuristic A-HPMO. We already discussed when it is applicable to use the second heuristic A-CPO and therefore did not include it in the evaluation. Essentially, the ex-

