25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

2 Preliminaries and Existing Techniques

plan. Further, progressive optimization [HNM+07, MRS+04, EKMR06] uses checkpoint operators to determine whether the validity ranges of subplans are violated and can thus avoid unnecessary re-optimization. The major drawback is that validity ranges are defined as black boxes for subplans. Hence, directed re-optimization is impossible, and it cannot be guaranteed that the current plan is actually suboptimal. In addition, there might be trashing of intermediate results, and the use of materialization points might be too coarse-grained. The latter problems were addressed by corrective query processing [IHW04] (reactive, intra-operator), which uses data partitioning across plans. Here, new plans are used only for new data, and stitch-up phases combine the results of different plans. This is disadvantageous if there are large intermediate results in combination with large operator states because these results must be post-processed and then merged with a union.
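The checkpoint idea can be illustrated with a minimal sketch. It assumes that a validity range is a plain cardinality interval and that re-optimization is a callback; the names `ValidityRange`, `checkpoint`, `execute`, and `reoptimize` are hypothetical and not taken from the cited systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidityRange:
    """Assumed form of a validity range: a cardinality interval for one subplan."""
    low: int
    high: int

    def contains(self, cardinality: int) -> bool:
        return self.low <= cardinality <= self.high


def checkpoint(rows, validity):
    """Materialize the subplan output and test its cardinality against the range."""
    materialized = list(rows)
    return materialized, validity.contains(len(materialized))


def execute(subplan_rows, validity, current_plan, reoptimize):
    """Continue with the current plan unless the checkpoint reports a violation.

    `reoptimize` is a hypothetical callback that receives the observed
    cardinality and returns a new remaining plan.
    """
    materialized, valid = checkpoint(subplan_rows, validity)
    if valid:
        return "kept", current_plan(materialized)
    # The checkpoint only reports that the range was violated; it cannot say
    # which estimate inside the subplan was wrong (the black-box drawback).
    new_plan = reoptimize(len(materialized))
    return "reoptimized", new_plan(materialized)
```

In this sketch the checkpoint doubles as a materialization point, so re-optimization can only happen at subplan boundaries, which corresponds to the coarse granularity criticized above.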

In contrast to these reactive re-optimization approaches, proactive, inter-operator re-optimization in Rio [BBD05a, BBD05b] computes bounding boxes around all used estimates to express their uncertainty before optimization. The bounding boxes are then used to create robust or switchable plans. During execution, a switch operator can choose between three different remaining plans (low, estimate, high) based on a random sample of its input. However, those bounding boxes are used as black boxes with regard to single estimates. Again, this makes directed re-optimization impossible, and the suboptimality of the current plan cannot be guaranteed.
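A switch operator of this kind might look roughly as follows. The sketch assumes that the uncertain estimate is a selectivity and that the operator picks the plan whose bounding-box point lies closest to the sampled value; this nearest-point rule, the names `BoundingBox`, `sample_selectivity`, and `switch`, and the plan labels are illustrative assumptions, not Rio's actual decision procedure.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Uncertainty interval around a single estimate (here: a selectivity)."""
    low: float
    estimate: float
    high: float


def sample_selectivity(rows, predicate, sample_size, rng):
    """Estimate the predicate selectivity from a random sample of the input."""
    sample = rng.sample(rows, min(sample_size, len(rows)))
    if not sample:
        return 0.0
    return sum(1 for row in sample if predicate(row)) / len(sample)


def switch(box, observed, plans):
    """Pick one of three remaining plans, keyed 'low', 'estimate', 'high'.

    Assumption: the plan whose bounding-box point is nearest to the
    observed value is chosen.
    """
    points = {"low": box.low, "estimate": box.estimate, "high": box.high}
    nearest = min(points, key=lambda name: abs(points[name] - observed))
    return nearest, plans[nearest]


if __name__ == "__main__":
    rng = random.Random(42)
    rows = list(range(1000))
    observed = sample_selectivity(rows, lambda x: x % 4 == 0, 50, rng)
    plans = {"low": "plan_low", "estimate": "plan_est", "high": "plan_high"}
    print(switch(BoundingBox(0.05, 0.2, 0.8), observed, plans))
```

Because the box is treated as a single unit, the switch can only select among precomputed plans; it does not reveal which estimate should drive a targeted re-optimization.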

To summarize, all plan-based adaptation approaches rely on the assumption of long-running queries, where mid-query re-optimization can significantly improve the query execution time. Optimization is triggered synchronously at materialization points or asynchronously in the case of intra-operator re-optimization.

2.2.3 Continuous-Query-Based Adaptation

In contrast to plan-based adaptation in DBMS, the adaptation of continuous queries in DSMS is typically realized with another approach: The optimizer specifies which statistics to gather and requests them from the monitoring component, and re-optimization is triggered periodically or whenever significant changes have occurred [BB05]. CQ-specific aspects are the extensive profiling of stream characteristics [BMM+04] and the state migration (e.g., tuples in hash tables) during re-optimization [ZRH04] in order to prevent missing tuples or duplicates and to ensure the tuple order. Examples of this optimization model are CAPE [ZRH04, RDS+04, LZJ+05], NiagaraCQ [CDTW00], StreaMon [BW04], and PIPES [CKSV08, KS09, KS04]. The drawbacks are the high overhead of statistics monitoring and the already mentioned problem of when to trigger re-optimization.
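The two triggering conditions, periodic and on significant change, can be sketched as a small monitor. The relative-change threshold, the tick-based clock, and the class name `ReoptimizationTrigger` are assumptions made for illustration; the cited systems define their own statistics and change criteria.

```python
class ReoptimizationTrigger:
    """Decides when monitored statistics justify re-optimizing a continuous query."""

    def __init__(self, period, threshold):
        self.period = period        # ticks after which re-optimization is forced
        self.threshold = threshold  # relative change regarded as significant
        self.baseline = None        # statistics seen at the last re-optimization
        self.last_tick = 0

    def _changed(self, name, value):
        old = self.baseline.get(name)
        if old is None:
            return True             # a newly monitored statistic counts as a change
        if old == 0:
            return value != 0
        return abs(value - old) / abs(old) > self.threshold

    def should_reoptimize(self, tick, stats):
        """Return the trigger reason, or None if the current plan is kept."""
        if self.baseline is None:
            reason = "initial"
        elif tick - self.last_tick >= self.period:
            reason = "periodic"
        elif any(self._changed(name, value) for name, value in stats.items()):
            reason = "significant-change"
        else:
            return None
        # Remember the statistics the new plan was optimized for.
        self.baseline = dict(stats)
        self.last_tick = tick
        return reason
```

The sketch also exposes the trade-off named above: a small threshold or short period increases monitoring and optimization overhead, while large values delay adaptation.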

In order to tackle the problem of state migration and to allow for fine-grained adaptation as well as load balancing, tuple routing strategies can also be applied. Routing-based adaptation does not use any optimizer but combines optimization, execution, and statistics gathering. The most prominent example of such a system is Eddies [AH00, MSHR02]. An eddy operator is used to route single tuples along different operators rather than using predefined plans. Due to the dynamic evaluation of applicable operators as well as the decisions on routing paths by routing policies [AH00, TD03, BBDW05], there can be significant overhead compared to plan-based adaptation [Des04]. This problem was mitigated by the self-tuning query mesh [NRB09, NWRB09], which uses a concept drift approach to route groups of tuples instead of single tuples.
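Per-tuple routing can be illustrated with a simplified eddy over selection operators. The greedy policy used here (send each tuple to the pending operator with the lowest smoothed pass rate) is a stand-in chosen for brevity and is not the lottery-based policy of the original Eddies work; the class name `Eddy` and its fields are hypothetical.

```python
class Eddy:
    """Routes each tuple through selection operators in a per-tuple order."""

    def __init__(self, operators):
        # operators: dict mapping an operator name to a boolean predicate
        self.operators = operators
        self.seen = {name: 0 for name in operators}
        self.passed = {name: 0 for name in operators}
        self.evaluations = 0  # total predicate evaluations, for comparison

    def _pass_rate(self, name):
        # Laplace-smoothed pass rate, so unseen operators start at 0.5.
        return (self.passed[name] + 1) / (self.seen[name] + 2)

    def route(self, tup):
        """Return True if the tuple satisfies all operators."""
        pending = set(self.operators)
        while pending:
            # Routing decision per tuple: most selective pending operator first.
            name = min(sorted(pending), key=self._pass_rate)
            self.seen[name] += 1
            self.evaluations += 1
            if not self.operators[name](tup):
                return False  # tuple dropped, remaining operators are skipped
            self.passed[name] += 1
            pending.remove(name)
        return True

    def run(self, tuples):
        return [tup for tup in tuples if self.route(tup)]
```

Statistics gathering, optimization, and execution all happen inside `route`, which is where the per-tuple overhead comes from; routing groups of tuples, as in the query mesh, would amortize the `min` decision over many tuples.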

Both approaches, continuous-query-based adaptation and tuple routing strategies, rely on the assumption that continuous queries process infinite tuple streams. Specific charac-
