Cost-Based Optimization of Integration Flows - Datenbanken ...
3.5 Experimental Evaluation
First, in order to investigate the benefits of rewriting patterns to parallel subflows in more detail, we executed a speedup experiment. Figure 3.23 shows the results of this experiment on the rewriting of sequences to parallel flows (WC2). We used a plan that contains a sequence of m = 100 independent Delay operators, and we explicitly varied the number of threads k (concurrent subflows) used for parallelism (Fork operator) in order to evaluate the speedup. Due to the distribution of m operators to k fork lanes, a theoretical speedup of m/⌈m/k⌉ is possible. Then, we varied the waiting time of each single Delay operator from 10 ms to 40 ms in order to simulate different network delays and waiting times for external systems. This experiment was repeated ten times. As a result, an increasing maximum speedup (at an increasing number of threads) was measured with increasing waiting time (note that the fall-offs were caused by the Java garbage collector). This strong dependence of the multi-tasking benefit on the waiting time of the involved operators justifies the decision to use the waiting time as the main cost indicator when rewriting sequences and iterations to parallel flows.
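The theoretical bound m/⌈m/k⌉ follows directly from the longest fork lane having to execute ⌈m/k⌉ of the m operators. A minimal sketch (the function name is hypothetical, not from the thesis) illustrates why the speedup saturates once k approaches m:

```python
import math

def theoretical_speedup(m: int, k: int) -> float:
    """Speedup of distributing m independent operators over k fork
    lanes: the slowest lane runs ceil(m/k) operators sequentially."""
    return m / math.ceil(m / k)

# For the experiment's m = 100 Delay operators:
for k in (1, 2, 4, 10, 50, 100):
    print(f"k={k:3d}  speedup={theoretical_speedup(100, k):.2f}")
```

For k = 4 this yields a speedup of exactly 4.0, while for k = 3 only 100/34 ≈ 2.94 is achievable because one lane must take the extra operators.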
Second, we used our running example plans in order to investigate the scalability of optimization benefits with increasing input data size. Based on the experimental results shown in Figure 3.22, we re-used the workload configuration and all plans except plan P2, because for this plan no optimization technique could be applied. In detail, we executed 20,000 plan instances for each running example plan and compared the periodical re-optimization with no-optimization, varying the input data size d ∈ {1, 2, 3, 4, 5, 6, 7} (in 100 kB messages), which resulted in a total processed input data size of up to 13.35 GB (for d = 7). Note that we varied this input data size only for the initially received message of a plan, while for plans P3, P6, and P8, we changed the cardinality of externally loaded data sets because these plans are initiated in a time-based manner. Further, we fixed an optimization interval of ∆t = 5 min, a sliding window size of ∆w = 5 min, and EMA as the workload aggregation method. For all investigated plans, we observe that the relative benefit of optimization increases with increasing data size, as shown in Figure 3.24. Essentially, the same optimization techniques were applied, and thus the optimization time is unaffected by the used input data size. The highest benefits were reached by the data-flow-oriented optimization techniques. For example, the unoptimized versions of plans P4, P6, and P7 show a super-linearly increasing execution time with increasing data size, while the optimized versions show an almost linearly increasing execution time. Thus, the optimized versions exhibit a better asymptotic behavior, which is, for example, caused by rewriting nested-loop Joins to combinations of Orderby and merge Join subplans. With this in mind, it is clear that arbitrarily high optimization benefits can be reached with increasing input data size (input cardinalities).
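The asymptotic argument behind the nested-loop-to-merge-join rewriting can be made concrete with a simplified cost model (a sketch with hypothetical cost functions, not the thesis' actual cost model): the nested-loop Join is quadratic in the input cardinalities, while the Orderby-plus-merge-Join combination is O(n log n), so the cost ratio between the two grows with the data size d.

```python
import math

def nested_loop_cost(n: int, m: int) -> float:
    """Every outer tuple is compared against every inner tuple."""
    return float(n * m)

def sort_merge_cost(n: int, m: int) -> float:
    """Orderby on both inputs, then a single merge pass."""
    return n * math.log2(n) + m * math.log2(m) + n + m

# The advantage of the optimized plan grows with the cardinality,
# matching the super-linear vs. near-linear curves in Figure 3.24:
for d in (1, 4, 7):
    n = d * 1000  # illustrative cardinality per data-size factor d
    print(f"d={d}  cost ratio={nested_loop_cost(n, n) / sort_merge_cost(n, n):.1f}")
```

This is why the relative benefit keeps increasing: the unoptimized cost grows quadratically while the optimized cost grows only near-linearly, so no fixed bound on the achievable benefit exists.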
Optimization Overhead
In addition to the evaluation of performance benefits and the scalability with increasing input cardinalities and increasing numbers of operators, we evaluated the optimization overhead in more detail. Therefore, we conducted a series of experiments, where (1) we compared the optimization overhead of our exhaustive optimization approach versus the heuristic optimization approach, and (2) we analyzed the overhead of statistics monitoring and aggregation in detail.
First, we evaluated the influence of using the exhaustive optimization algorithm A-PMO or the heuristic A-HPMO. We already discussed when it is applicable to use the second heuristic A-CPO and therefore did not include it in the evaluation. Essentially, the ex-
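Regarding (2), the EMA workload aggregation mentioned above can be sketched as follows (a minimal illustration; the smoothing factor alpha = 0.5 and the function name are assumptions, not values from the thesis). EMA weights recent plan instances more heavily than older ones within the sliding window, which keeps the aggregation overhead at a single multiply-add per monitored statistic:

```python
def ema(samples: list[float], alpha: float = 0.5) -> float:
    """Exponential moving average over monitored execution-time
    statistics; later samples receive exponentially higher weight."""
    agg = samples[0]
    for s in samples[1:]:
        agg = alpha * s + (1 - alpha) * agg
    return agg

# A recent workload shift (40 ms after two 10 ms instances)
# dominates the aggregate more than a plain mean would allow:
print(ema([10.0, 10.0, 40.0]))  # → 25.0, vs. a plain mean of 20.0
```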