Cost-Based Optimization of Integration Flows - Datenbanken ...
3 Fundamentals of Optimizing Integration Flows
within the critical path. When reordering selective operators (e.g., joins), this can have a
tremendous impact on subsequent operators (that are included in the critical path). For
example, we might exclude a join operator from the optimization because it is not within
the critical path, which might lead to a suboptimal join order.
In conclusion, there are three cases where this critical-path approach is advantageous.
First, it can be used if the optimization interval is very short and optimization
time is therefore more important than in other cases. Second, it is beneficial if the parallel subflows
are fully independent of the rest of the plan because then we do not miss any global
optimization potential. Third, its application is promising if there is a significant difference
in the costs of the individual subflows; otherwise, the critical path might change.
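The third case can be made concrete with a small sketch. It assumes, purely for illustration, that each parallel subflow has a single estimated cost and that the critical path is the most expensive subflow. The names `critical_subflow` and `critical_path_is_stable` as well as the factor `min_ratio` are hypothetical and are not part of the approach described above.

```python
from typing import Dict


def critical_subflow(costs: Dict[str, float]) -> str:
    """Return the name of the most expensive parallel subflow."""
    if not costs:
        raise ValueError("at least one subflow is required")
    return max(costs, key=lambda name: costs[name])


def critical_path_is_stable(costs: Dict[str, float], min_ratio: float = 2.0) -> bool:
    """Check whether the most expensive subflow clearly dominates the others.

    min_ratio is an assumed tuning knob: the critical subflow must cost at
    least min_ratio times as much as the second most expensive subflow.
    """
    ranked = sorted(costs.values(), reverse=True)
    if len(ranked) < 2:
        return True  # a single subflow is trivially the critical path
    return ranked[0] >= min_ratio * ranked[1]


# Illustrative cost estimates (made-up numbers).
subflow_costs = {"subflow_a": 120.0, "subflow_b": 15.0, "subflow_c": 9.0}
print(critical_subflow(subflow_costs))
print(critical_path_is_stable(subflow_costs))
```

If the check fails, the costs of the subflows are close to each other, and optimizing only the current critical path risks shifting the critical path to another subflow.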
Heuristic Join Enumeration
Since the complexity of the A-PMO is dominated by the complexity of join enumeration,
we typically use our tailor-made heuristic optimization algorithm if the number of join
operators of a plan exceeds a certain number. In contrast, for the second problem with
high complexity (merging parallel flows), we apply a heuristic by default because the
influence of join enumeration on plan costs is much higher than that of the number of parallel flows. Thus,
the A-HPMO is essentially equivalent to the A-PMO except that we do not apply the full
join enumeration but the heuristic described here.
Similar concepts are also used in DBMS, where existing approaches typically fall back
to some kind of greedy heuristic or randomized algorithm if a certain optimization time
is exceeded [Neu09]. In contrast, we use the number of joins (similar to selected DBMS
such as Postgres) as an indicator of when to use the heuristic. Otherwise, intermediate
results of join enumeration could not be exploited when using the DPSize (bottom-up dynamic
programming) join enumeration algorithm, and the elapsed optimization time
would be wasted if we had to fall back to the heuristic.
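The up-front decision described above can be sketched as follows. This is a minimal illustration, assuming a plan is given as a flat list of operator type names. The threshold value, the function names, and the strategy labels are hypothetical, since the text only speaks of "a certain number" of join operators.

```python
from typing import List

# Hypothetical cut-off; the text does not state the actual value.
JOIN_THRESHOLD = 8


def count_joins(operators: List[str]) -> int:
    """Count the Join operators in a flat list of operator type names."""
    return sum(1 for op in operators if op == "Join")


def choose_enumeration(operators: List[str], threshold: int = JOIN_THRESHOLD) -> str:
    """Pick the enumeration strategy before any enumeration work starts."""
    if count_joins(operators) > threshold:
        return "heuristic"  # too many joins for full enumeration
    return "full"           # e.g., DPSize over all join orders


small_plan = ["Receive", "Join", "Join", "Invoke"]
large_plan = ["Receive"] + ["Join"] * 10 + ["Invoke"]
print(choose_enumeration(small_plan))
print(choose_enumeration(large_plan))
```

Because the choice depends only on the number of joins, it is made before enumeration begins, so no partially built dynamic programming table has to be discarded.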
Before discussing the join enumeration heuristic, we need to define the join enumeration
restrictions that must be taken into account when reordering joins in order to preserve
semantic correctness. Most importantly, Rule 2 from Definition 3.1 applies. Thus, if
there is a dependency between an interaction-oriented operator and another operator, their
temporal order must be equivalent in P and P′. In addition, the input
data of interaction-oriented operators (data that is sent to external systems) must also be
equivalent in P and P′ (preventing the external behavior from being changed). This
influences which join reorderings are applicable. We use an example to illustrate the consequences
of these join enumeration restrictions.
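The first restriction (preserved temporal order) can be sketched as a simple check. As a simplifying assumption, a plan is modeled here as a linear sequence of operator names, and each dependency is a pair of operators of which at least one is interaction-oriented. The function name and the operator names are hypothetical, and the sketch does not cover the second restriction on the input data of interaction-oriented operators.

```python
from typing import Iterable, List, Tuple


def preserves_temporal_order(
    plan: List[str],
    reordered: List[str],
    dependencies: Iterable[Tuple[str, str]],
) -> bool:
    """Return True if every dependent operator pair keeps its relative order."""
    pos = {op: i for i, op in enumerate(plan)}
    pos_new = {op: i for i, op in enumerate(reordered)}
    return all(
        (pos[a] < pos[b]) == (pos_new[a] < pos_new[b])
        for a, b in dependencies
    )


# Illustrative plan P and two candidate plans P' (made-up operator names).
plan = ["join_1", "invoke_x", "join_2"]
deps = [("join_1", "invoke_x")]
print(preserves_temporal_order(plan, ["join_1", "join_2", "invoke_x"], deps))
print(preserves_temporal_order(plan, ["invoke_x", "join_1", "join_2"], deps))
```

A candidate plan that fails this check would have to be excluded from the search space during join enumeration.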
Example 3.6 (Join Enumeration Restrictions). Recall the example plan P7 and a slightly
different plan P7′ (where we changed the initial order of the Invoke operator o19) that
are illustrated in Figure 3.9. While a full reordering is applicable for plan P7 (Figure 3.9(a))
(in case of a clique query type, where all data sets are directly connected), for P7′
(Figure 3.9(b)) we are not allowed to reorder all Join operators of this plan. The plan
P7′ is an example where we might change the external behavior if we consider full join
enumeration. The reason is that the external system s6 requires the result of operators
o14, o15, and o16. During full join reordering, we might use operator o17 earlier in this
chain of Join operators and thus would be unable to produce the required result in
case of selective joins (or we would need at least a combination of Selection and Projection
operators in order to hide additional data). In conclusion, only partial join reordering