25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

4.5 Periodical Re-Optimization

changed the current plan, there is a need for dynamic rewriting of vectorized plans due to the required state migration of loaded message queues and intra-operator states. Both challenges are addressed in the following.

Evaluating Re-Optimization Potential

If the current logical plan has been changed during optimization, we need to evaluate the benefit of rewriting the physical plan at runtime. This is required because a vectorized plan carries state, namely all messages that are currently within the standing process (in operators and loaded queues), and thus we cannot simply generate a new physical plan. Hence, there is a trade-off between the overhead of exchanging the plan and the benefit yielded by the newly computed best plan. In principle, the same holds for the general optimization framework. However, in contrast to the efficient inter-instance plan change, this trade-off is much more important when rewriting vectorized plans because of the need for state migration or flushing of pipelines.

The intuition behind our evaluation approach is to compare the costs of flushing the pipelines of the current plan with the estimated benefit we gain by using $P''_{\mathrm{new}}$ instead of $P''_{\mathrm{cur}}$ for the next period $\Delta t$. We restrict the cost comparison to $\Delta t$ because, at the next evaluation timestamp, we might revert the plan change due to changed execution statistics; hence, we cannot estimate the benefit for a period longer than $\Delta t$. Although we might miss optimization opportunities in case of a constant workload, we use this approach to ensure robustness of optimizer choices in terms of plan stability.

In detail, the costs of flushing the pipelines are affected by the number of messages in the queues $q_i$ and the execution time of the most time-consuming operator. We determine the queue cardinalities and compute the costs by

\[
W_{\mathrm{flush}}(P''_{\mathrm{cur}}) = W(b_x) \cdot \sum_{i=1}^{x} |q_i| + \sum_{i=x}^{|b|} W(b_i)
\quad \text{with} \quad
W(b_x) = \max_{j=1}^{k} \sum_{l=1}^{l_{b_j}} W(o_l). \tag{4.20}
\]

These costs are given by the number of messages in front of the most time-consuming execution bucket multiplied by the costs of this bucket, $W(b_x)$, plus the costs for the remaining buckets after $b_x$. Those are the approximated costs for flushing the whole pipeline. Note that incremental rewriting (merging and splitting) is possible as well.
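The flush-cost estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `flush_cost` and the list-based representation of queues and buckets are hypothetical, the bucket costs $W(b_i)$ are assumed to be precomputed sums of operator costs, and the second sum follows the index range $i = x, \ldots, |b|$ of Equation 4.20.

```python
from typing import Sequence


def flush_cost(queue_sizes: Sequence[int], bucket_costs: Sequence[float]) -> float:
    """Approximate W_flush for one pipeline of execution buckets.

    queue_sizes[i]  -- |q_i|, messages waiting in the queue in front of bucket i
    bucket_costs[i] -- W(b_i), summed operator costs of bucket i (assumed given)
    Both sequences are hypothetical inputs, ordered from first to last bucket.
    """
    if len(queue_sizes) != len(bucket_costs) or not bucket_costs:
        raise ValueError("need one queue size per bucket and at least one bucket")

    # Position x of the most time-consuming bucket b_x (first one on ties).
    x = max(range(len(bucket_costs)), key=bucket_costs.__getitem__)

    # All messages queued up to and including q_x must pass the bottleneck b_x.
    queued_before_bottleneck = sum(queue_sizes[: x + 1])

    # Costs of the buckets from b_x to the end of the pipeline (i = x .. |b|).
    remaining_bucket_costs = sum(bucket_costs[x:])

    return bucket_costs[x] * queued_before_bottleneck + remaining_bucket_costs
```

For instance, with hypothetical queue sizes `[3, 2, 5]` and bucket costs `[2.0, 4.0, 1.0]`, the second bucket is the bottleneck, five messages are queued in front of it, and the estimate evaluates to `4.0 * 5 + (4.0 + 1.0)`.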

For computing the benefit of dynamic rewriting, we use the message rate $R$ to compute the number of processed messages by $n = R \cdot \Delta t$. Then, the benefit of exchanging plans is given by

\[
W_{\mathrm{change}}(P'') = \left(n + |b|_{P''_{\mathrm{new}}} - 1\right) \cdot W_{P''_{\mathrm{new}}}(b_{x_1}) - \left(n + |b|_{P''_{\mathrm{cur}}} - 1\right) \cdot W_{P''_{\mathrm{cur}}}(b_{x_2}), \tag{4.21}
\]

where $W_{\mathrm{change}} < 0$ holds by definition because the optimizer will only return a new plan $P_{\mathrm{new}}$ if $W(P_{\mathrm{new}}) < W(P_{\mathrm{cur}})$. Finally, we would change plans if

\[
W_{\mathrm{flush}} + W_{\mathrm{change}} \le 0. \tag{4.22}
\]
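The benefit computation and the final decision rule can be sketched as follows. This is a minimal sketch, not the optimizer's actual implementation: the function names and parameters are hypothetical, and $b_{x_1}$ and $b_{x_2}$ are read here as the most time-consuming buckets of the new and the current plan, which is an assumption.

```python
def change_benefit(
    rate: float,
    delta_t: float,
    n_buckets_new: int,
    w_max_new: float,
    n_buckets_cur: int,
    w_max_cur: float,
) -> float:
    """W_change as in Equation 4.21; negative values mean the new plan is cheaper.

    rate, delta_t  -- message rate R and evaluation period (n = R * delta_t)
    n_buckets_*    -- |b| of the new / current vectorized plan
    w_max_*        -- cost of the most time-consuming bucket of each plan
                      (assumed interpretation of b_x1 and b_x2)
    """
    n = rate * delta_t  # messages expected during the next period
    cost_new = (n + n_buckets_new - 1) * w_max_new
    cost_cur = (n + n_buckets_cur - 1) * w_max_cur
    return cost_new - cost_cur


def should_change_plan(w_flush: float, w_change: float) -> bool:
    """Decision rule of Equation 4.22: rewrite only if the flush overhead is amortized."""
    return w_flush + w_change <= 0


# Hypothetical numbers, chosen only for illustration.
w_change = change_benefit(rate=2.0, delta_t=10.0,
                          n_buckets_new=3, w_max_new=3.0,
                          n_buckets_cur=4, w_max_cur=4.0)
decision = should_change_plan(w_flush=25.0, w_change=w_change)
```

With these hypothetical inputs, $n = 20$, the new plan costs $22 \cdot 3 = 66$ and the current plan $23 \cdot 4 = 92$, so the benefit of $-26$ outweighs a flush cost of $25$ and the plan would be exchanged. A flush cost of $30$ under the same inputs would leave the current plan in place.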

We illustrate this evaluation approach with the following example.<br />

Example 4.12 (Evaluating Rewriting Benefit). Assume the current plan shown in Figure 4.18(a). The figure also shows the statistics that were present when creating this plan ($t_1$) as well as the current state ($t_2$) in the form of the numbers of messages in queues. During the period $\Delta t = 10\,\mathrm{s}$, average execution times changed ($W(o_2) = 4$ and $W(o_5) = 4$).
