25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

3 Fundamentals of Optimizing Integration Flows

within the critical path. When reordering selective operators (e.g., joins), this can have a tremendous impact on subsequent operators (that are included in the critical path). For example, we might exclude a join operator from the optimization because it is not within the critical path, which might lead to a suboptimal join order.

In conclusion, there are three cases where this critical-path approach is advantageous. First, it can be used if the optimization interval is very short and thus optimization time is more important than in other cases. Second, it is beneficial if the parallel subflows are fully independent of the rest of the plan because then we do not miss any global optimization potential. Third, its application is promising if there is a significant difference in the costs of the individual subflows; otherwise, the critical path might change.
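The third case can be illustrated with a minimal sketch. It assumes that each parallel subflow has a single estimated cost and treats the critical path as stable only if the most expensive subflow exceeds the runner-up by a margin. The function names, the cost dictionary, and the `min_ratio` margin are hypothetical and are not taken from the text.

```python
def critical_subflow(costs):
    """Return the name of the most expensive parallel subflow."""
    return max(costs, key=costs.get)


def critical_path_is_stable(costs, min_ratio=1.5):
    """Check whether the most expensive subflow clearly dominates the runner-up.

    min_ratio is a hypothetical safety margin, not a value from the text.
    """
    if len(costs) < 2:
        return True  # a single subflow is trivially the critical path
    top, second = sorted(costs.values(), reverse=True)[:2]
    return top >= min_ratio * second


# Illustrative cost estimates (made-up numbers).
clear_gap = {"subflow_a": 100.0, "subflow_b": 20.0, "subflow_c": 35.0}
narrow_gap = {"subflow_a": 100.0, "subflow_b": 90.0}
```

With `clear_gap`, `subflow_a` is the critical path and the check passes. With `narrow_gap`, a small change in the statistics could make `subflow_b` the critical path, so restricting optimization to the critical path is less attractive.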

Heuristic Join Enumeration

Since the complexity of the A-PMO is dominated by the complexity of join enumeration, we typically use our tailor-made heuristic optimization algorithm if the number of join operators of a plan exceeds a certain number. In contrast, for the second problem with high complexity (merging parallel flows), we apply a heuristic by default because the plan cost influence of join enumeration is much higher than that of the number of parallel flows. Thus, the A-HPMO is essentially equivalent to the A-PMO except that we do not apply the full join enumeration but the heuristic described here.

Similar concepts are also used in DBMS, where existing approaches typically fall back to some kind of greedy heuristic or randomized algorithm if a certain optimization time is exceeded [Neu09]. In contrast, we use the number of joins as an indicator of when to use the heuristic, similar to selected DBMS such as Postgres. We do so because otherwise (i.e., with a time-based fallback), intermediate results of join enumeration cannot be exploited when using the DPSize (bottom-up dynamic programming) join enumeration algorithm, and thus the elapsed optimization time would be wasted if we had to fall back to the heuristic.
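The up-front decision by join count can be sketched as follows. This is only an illustration under simplifying assumptions: the exhaustive branch enumerates left-deep orders by brute force as a stand-in for DPSize, the greedy branch is a generic minimum-cost heuristic rather than the tailor-made heuristic of this work, and `JOIN_THRESHOLD`, the cost function, and all names are hypothetical.

```python
from itertools import permutations

JOIN_THRESHOLD = 4  # hypothetical cutoff, not a value from the text


def chain_cost(order, card, sel):
    """Sum of intermediate result sizes for a left-deep join order."""
    size = card[order[0]]
    joined = [order[0]]
    total = 0.0
    for rel in order[1:]:
        factor = 1.0
        for prev in joined:
            # A missing predicate means a cross product (selectivity 1.0).
            factor *= sel.get(frozenset((prev, rel)), 1.0)
        size = size * card[rel] * factor
        total += size
        joined.append(rel)
    return total


def exhaustive_order(card, sel):
    """Brute-force stand-in for full join enumeration."""
    return min(permutations(sorted(card)),
               key=lambda o: chain_cost(o, card, sel))


def greedy_order(card, sel):
    """Generic greedy heuristic: always append the cheapest next relation."""
    remaining = set(card)
    order = [min(sorted(remaining), key=lambda r: card[r])]
    remaining.remove(order[0])
    while remaining:
        nxt = min(sorted(remaining),
                  key=lambda r: chain_cost(order + [r], card, sel))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


def enumerate_joins(card, sel, threshold=JOIN_THRESHOLD):
    """Pick the strategy before any enumeration work is done."""
    num_joins = len(card) - 1
    if num_joins > threshold:
        return "heuristic", greedy_order(card, sel)
    return "exhaustive", exhaustive_order(card, sel)


# Illustrative cardinalities and selectivities (made-up numbers).
card = {"A": 100, "B": 10, "C": 1000}
sel = {frozenset(("A", "B")): 0.1, frozenset(("B", "C")): 0.01}
mode, order = enumerate_joins(card, sel)
mode_h, order_h = enumerate_joins(card, sel, threshold=1)
```

Because the strategy is chosen from the join count alone, no partially filled dynamic programming table is ever discarded, which is the point of the argument above.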

Before discussing the join enumeration heuristic, we need to define the join enumeration restrictions that must be taken into account when reordering joins in order to preserve semantic correctness. Most importantly, Rule 2 from Definition 3.1 applies. Thus, if there is a dependency between an interaction-oriented operator and another operator, their temporal order must be equivalent in P and P′. In addition, the input data of interaction-oriented operators (data that is sent to external systems) must also be equivalent in P and P′ (preventing the external behavior from being changed). This influences the applicable join reorderings. We use an example to illustrate the consequences of these join enumeration restrictions.
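As a rough sketch, the two restrictions can be expressed as a validity check on a candidate reordering. The sketch assumes a strongly simplified model in which a plan is a linear sequence of operator names and the input of an interaction-oriented operator is determined by the set of operators that precede it. The function name, the operator names, and this linear model are hypothetical and do not reflect the plan representation used in this work.

```python
def is_valid_reordering(original, reordered, interaction_ops, dependencies):
    """Check a candidate reordering against both restrictions.

    original, reordered: lists of operator names (assumed linear plans).
    interaction_ops: operators that send data to external systems.
    dependencies: pairs (a, b) whose relative order must be preserved.
    """
    pos_p = {op: i for i, op in enumerate(original)}
    pos_q = {op: i for i, op in enumerate(reordered)}

    # Restriction 1: dependent operators keep their temporal order.
    for a, b in dependencies:
        if (pos_p[a] < pos_p[b]) != (pos_q[a] < pos_q[b]):
            return False

    # Restriction 2: each interaction-oriented operator sees the same input,
    # approximated here by the same set of preceding operators.
    for op in interaction_ops:
        if set(original[:pos_p[op]]) != set(reordered[:pos_q[op]]):
            return False
    return True


# Hypothetical chain of joins with an Invoke in the middle.
plan = ["j1", "j2", "j3", "invoke", "j4"]
interaction = {"invoke"}
deps = {("j3", "invoke")}

ok = is_valid_reordering(plan, ["j2", "j1", "j3", "invoke", "j4"],
                         interaction, deps)
extra_input = is_valid_reordering(plan, ["j4", "j1", "j2", "j3", "invoke"],
                                  interaction, deps)
order_violation = is_valid_reordering(plan, ["j1", "j2", "invoke", "j3", "j4"],
                                      interaction, deps)
```

Swapping `j1` and `j2` is accepted because the Invoke still receives the result of the same three joins. Moving `j4` in front of the Invoke is rejected because the data sent to the external system would change.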

Example 3.6 (Join Enumeration Restrictions). Recall the example plan P7 and a slightly different plan P7′ (where we changed the initial order of the Invoke operator o19) that are illustrated in Figure 3.9. While for plan P7 (Figure 3.9(a)) a full reordering is applicable (in case of a clique query type, where all data sets are directly connected), for P7′ (Figure 3.9(b)), we are not allowed to reorder all Join operators of this plan. The plan P7′ is an example where we might change the external behavior if we consider full join enumeration. The reason is that the external system s6 requires the result of operators o14, o15, and o16. During full join reordering, we might use operator o17 earlier in this chain of Join operators and thus would be unable to produce the required result in case of selective joins (or we would need at least a combination of Selection and Projection operators in order to hide additional data). In conclusion, only partial join reordering
