25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...



Figure 3.29: Comparison Scenario of Periodical Re-Optimization with Correlation; panels: (a) Selectivity Variations, (b) Execution Time

maintaining conditional probabilities over multiple versions of a plan. Because three correlated operators are involved, incremental optimization required deleting records from this correlation table. As a result, there are also some switches to suboptimal plans (e.g., after workload shift *2, we observe three wrong plan switches). However, over time, the conditional selectivity estimates converge to the real selectivities, which yields a 10% improvement in cumulative execution time. In conclusion, the use of our correlation table ensures robustness in the presence of correlated data or conditional probabilities, while the overhead is negligible (in this comparison scenario, we maintained three entries in this correlation table).
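The behavior described above (conditional selectivities that are maintained incrementally, deleted when a plan change invalidates them, and re-learned over time) can be illustrated with a minimal sketch. The class `CorrelationTable`, its method names, and the keying by (operator, predecessor) pairs are hypothetical and chosen for illustration only; they are not taken from the actual implementation.

```python
from collections import defaultdict


class CorrelationTable:
    """Hypothetical table of conditional selectivities sel(op | predecessor)."""

    def __init__(self):
        # (operator, predecessor) -> [tuples seen, tuples passed]
        self._counts = defaultdict(lambda: [0, 0])

    def observe(self, op, predecessor, n_in, n_out):
        """Add the monitored input/output cardinalities of one plan execution."""
        entry = self._counts[(op, predecessor)]
        entry[0] += n_in
        entry[1] += n_out

    def selectivity(self, op, predecessor, default=1.0):
        """Return the estimate, or `default` if nothing has been observed yet."""
        n_in, n_out = self._counts.get((op, predecessor), (0, 0))
        return n_out / n_in if n_in else default

    def invalidate(self, op):
        """Delete every entry involving `op`, e.g. after operators were reordered."""
        for key in [k for k in self._counts if op in k]:
            del self._counts[key]

    def __len__(self):
        return len(self._counts)
```

In this sketch, the estimate falls back to a default right after `invalidate`, which is one plausible reason for temporarily wrong plan switches; as new executions are observed, the estimate moves back toward the measured selectivity.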

Our exhaustive evaluation has shown that periodical re-optimization achieves significant performance improvements, in the sense of minimizing the average execution time of a plan, while statistics monitoring and periodical re-optimization impose only moderate overhead. Even in cases where no optimization techniques could be applied, no significant performance penalty was measured. Most importantly, the optimized plans scale better than unoptimized plans. Thus, the relative performance improvements typically increase with increasing input data size or an increasing number of operators. Finally, with the right choice of parameters, the self-adjusting cost model in combination with correlation awareness enables a fast but still robust adaptation to changing workload characteristics.
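The trade-off between fast and robust adaptation can be illustrated with a small sketch. It assumes, purely for illustration, that monitored selectivities are aggregated by exponential smoothing with a hypothetical parameter `alpha` and that re-optimization simply orders selective operators by ascending estimated selectivity; neither is claimed to be the cost model or the optimization technique used here.

```python
def smooth(old, observed, alpha):
    """Exponential smoothing: larger alpha adapts faster, smaller alpha damps outliers."""
    return (1 - alpha) * old + alpha * observed


def reoptimize(estimates):
    """Order selective operators by ascending estimated selectivity."""
    return sorted(estimates, key=estimates.get)


def run_periods(observations, initial, alpha):
    """Return the plan chosen at the end of each optimization period."""
    estimates = dict(initial)
    plans = []
    for period in observations:
        # Fold the statistics monitored during this period into the estimates.
        for op, selectivity in period.items():
            estimates[op] = smooth(estimates[op], selectivity, alpha)
        plans.append(reoptimize(estimates))
    return plans


# Workload shift: operator A becomes unselective, operator B becomes selective.
initial = {"A": 0.2, "B": 0.8}
shifted = [{"A": 0.9, "B": 0.1}] * 3

fast = run_periods(shifted, initial, alpha=0.5)
slow = run_periods(shifted, initial, alpha=0.1)
```

With `alpha=0.5`, the plan switches to B before A after the first period; with `alpha=0.1`, the original order is still in place after three periods. A small value thus resists short-lived fluctuations but delays the reaction to a genuine workload shift.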

3.6 Summary and Discussion

To summarize, in this chapter, we introduced the cost-based optimization of imperative integration flows to overcome the major problem of inefficiently performing integration flows in the presence of changing workload characteristics. The incremental maintenance of execution statistics addresses missing knowledge about data properties when integrating heterogeneous and highly distributed systems and applications. In the area of integration flows, cost-based re-optimization has been considered for the first time. Our solution comprises the dependency analysis and details on the monitoring of workload and execution statistics as well as the definition of the double-metric cost model. Based on these foundations, we discussed the NP-hard Periodic Plan Optimization Problem (P-PPO), including
