25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

4 Vectorizing Integration Flows

[Figure 4.18: Example Periodical Re-Optimization. (a) Deployed Plan P_cur with operators o1 to o5; at t1: W(o1)=3, W(o2)=2, W(o3)=3, W(o4)=5, W(o5)=3; at t2 (with the annotations 1, 2, 3, 1): W(o1)=3, W(o2)=4, W(o3)=3, W(o4)=5, W(o5)=4. (b) New Plan P_new with operators o1 to o5: W(o1)=3, W(o2)=4, W(o3)=3, W(o4)=5, W(o5)=4.]

Hence, we created the new plan shown in Figure 4.18(b). First, we determine the costs of flushing the current pipeline, assuming the new statistics:

$$W_{flush}(P''_{cur}) = 3 \cdot W(b_2) + 5\,\text{ms} + 4\,\text{ms} = 30\,\text{ms}.$$

Then, we compute the benefit of changing the plan, where a negative value denotes a cost reduction:

$$W_{change} = (n + 5 - 1) \cdot 5\,\text{ms} - (n + 4 - 1) \cdot (4\,\text{ms} + 3\,\text{ms}) = -2\,\text{ms} \cdot n - 1\,\text{ms}.$$

Subsequently, we use the monitored message rate $R = 10.7$ msg/s and the optimization period $\Delta t = 10$ s to estimate the number of messages processed during the next period as $n = 10.7\,\text{msg/s} \cdot 10\,\text{s} = 107$. Assuming full system utilization, we compare the costs with the benefit as follows:

$$W_{flush} + W_{change} = 30\,\text{ms} + (-2\,\text{ms} \cdot 107 - 1\,\text{ms}) = -185\,\text{ms} \leq 0.$$

Finally, we decide to exchange plans because, in the next period $\Delta t$, we expect an improvement of 185 ms, including the overhead for rewriting.
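The arithmetic of this decision can be replayed with a minimal Python sketch. It only reproduces the numbers of the example; the function names `pipeline_cost_ms` and `evaluate_exchange` are hypothetical, and two readings are assumptions rather than statements from the text: that bucket b2 consists of o2 and o3 (consistent with the 30 ms flush cost), and that the two terms of the change formula are the costs of the new plan and the current plan, respectively.

```python
def pipeline_cost_ms(n: int, num_buckets: int, max_bucket_ms: int) -> int:
    """Cost term used in the example: (n + m - 1) * max bucket cost."""
    return (n + num_buckets - 1) * max_bucket_ms


def evaluate_exchange(rate_msg_per_s: float, period_s: float) -> dict:
    # Estimated number of messages in the next optimization period.
    n = round(rate_msg_per_s * period_s)

    # Flush cost, term by term as in the text: 3 * W(b2) + 5 ms + 4 ms.
    # Assumption: W(b2) = W(o2) + W(o3) = 4 ms + 3 ms under the new statistics.
    w_b2 = 4 + 3
    w_flush = 3 * w_b2 + 5 + 4

    # Change term as in the text; reading it as "new plan minus current plan"
    # (5 buckets at 5 ms versus 4 buckets at 7 ms) is an interpretation.
    w_change = pipeline_cost_ms(n, 5, 5) - pipeline_cost_ms(n, 4, w_b2)

    total = w_flush + w_change
    return {
        "n": n,
        "w_flush": w_flush,
        "w_change": w_change,
        "total": total,
        "exchange": total <= 0,  # exchange plans if flushing pays off
    }


result = evaluate_exchange(10.7, 10)
print(result)
```

With the values of the example, the sketch yields n = 107, a flush cost of 30 ms, a change of -215 ms, and a total of -185 ms, so the plans are exchanged.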

If the evaluation of the rewriting benefit results in the decision to exchange plans, we need to dynamically rewrite the existing plan during runtime. In the following, we explain this step in more detail.

Dynamic Plan Rewriting<br />

The major problem when rewriting a vectorized plan during runtime is posed by loaded queues. One approach would be explicit state migration and state re-computation [ZRH04]. However, re-computation might be impossible for integration flows due to interactions with external systems that have to be executed exactly once. Therefore, plan rewriting is realized by stopping execution buckets and flushing intermediate queues.

For example, in order to merge two execution buckets $b_i$ and $b_{i+1}$ with a queue $q_{i+1}$ in between, we need to stop the execution bucket $b_i$ while bucket $b_{i+1}$ is still working. Over time, we flush $q_{i+1}$ and wait until it contains zero messages. We then merge the execution buckets into $b_i$, which contains an instance-based subplan with all operators of the merged subplans, and simply remove $q_{i+1}$. This concept can be used for both bucket merging and bucket splitting, and we never lose a message during dynamic plan rewriting.
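The merge procedure can be illustrated with a small, single-threaded Python simulation. This is a sketch under simplifying assumptions, not the thesis implementation: real execution buckets run as concurrent threads, whereas here the draining of the intermediate queue is an explicit loop. The names `Bucket` and `merge_buckets`, and the string-tagging operators, are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable

Operator = Callable[[str], str]


@dataclass
class Bucket:
    """Execution bucket holding an instance-based subplan (list of operators)."""
    operators: list[Operator]
    running: bool = True

    def process(self, msg: str) -> str:
        for op in self.operators:
            msg = op(msg)
        return msg


def merge_buckets(b_i: Bucket, b_next: Bucket,
                  q_between: deque, q_out: deque) -> Bucket:
    # 1. Stop the upstream bucket so q_between receives no new messages.
    b_i.running = False
    # 2. The downstream bucket keeps working until q_between is empty.
    while q_between:
        q_out.append(b_next.process(q_between.popleft()))
    # 3. Merge: b_i now holds all operators of both subplans.
    #    q_between is empty and can simply be dropped by the caller.
    b_i.operators = b_i.operators + b_next.operators
    b_next.running = False
    b_i.running = True
    return b_i


def demo() -> list[str]:
    q_in = deque(["m3", "m4"])            # not yet processed by b_i
    q_between = deque(["m1|a", "m2|a"])   # already processed by b_i
    q_out: deque = deque()
    b_i = Bucket([lambda m: m + "|a"])
    b_next = Bucket([lambda m: m + "|b"])
    merged = merge_buckets(b_i, b_next, q_between, q_out)
    while q_in:                           # merged bucket resumes work
        q_out.append(merged.process(q_in.popleft()))
    return list(q_out)


print(demo())
```

In this simulation every message passes each operator exactly once and the message order is preserved, which mirrors the requirement that no message is lost during rewriting.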

Putting it all together, we introduced the general concept of vectorization as a control-flow-oriented optimization technique that aims to improve the message throughput. Furthermore, we generalized this concept to cost-based plan vectorization and explained how to take multiple deployed plans into account as well. Finally, we described how this technique is embedded into our general cost-based optimization framework. Although, this
