Cost-Based Optimization of Integration Flows - Datenbanken ...
4 Vectorizing Integration Flows
[Figure 4.18: Example Periodical Re-Optimization.
(a) Deployed Plan P_cur with operators o1 to o5; at time t1: W(o1)=3, W(o2)=2, W(o3)=3, W(o4)=5, W(o5)=3; at time t2: W(o1)=3, W(o2)=4, W(o3)=3, W(o4)=5, W(o5)=4.
(b) New Plan P_new with operators o1 to o5: W(o1)=3, W(o2)=4, W(o3)=3, W(o4)=5, W(o5)=4.]
Hence, we created the new plan shown in Figure 4.18(b). First, we determine the costs
for flushing the current pipeline, assuming the new statistics, with
$$W_{\text{flush}}(P''_{\text{cur}}) = 3 \cdot W(b_2) + 5\,\text{ms} + 4\,\text{ms} = 30\,\text{ms}.$$
Then, we compute the benefit of changing the plan by
$$W_{\text{change}} = (n + 5 - 1) \cdot 5\,\text{ms} - (n + 4 - 1) \cdot (4\,\text{ms} + 3\,\text{ms}) = -2\,\text{ms} \cdot n - 1\,\text{ms}.$$
Subsequently, we use the monitored message rate R = 10.7 msg/s and the optimization period
∆t = 10 s to estimate the number of messages processed during the next period,
n = 10.7 msg/s · 10 s = 107, and compare the costs with the benefit, assuming full system
utilization, as follows:
$$W_{\text{flush}} + W_{\text{change}} = 30\,\text{ms} + (-2\,\text{ms} \cdot 107 - 1\,\text{ms}) = -185\,\text{ms} \leq 0.$$
Finally, we decide to exchange plans because, in the next period ∆t, we will achieve an improvement
of 185 ms, including the overhead for rewriting.
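The arithmetic of this decision can be retraced with a short Python sketch. It is only an illustration of the example above: the function names are hypothetical, W(b_2) = 7 ms is the value implied by the stated flush cost of 30 ms, and reading each term of W_change as (n + m − 1) times the cost of the most expensive bucket of a plan with m buckets is an assumption about how the formula is built.

```python
def flush_cost_ms(queued_msgs, bucket_cost_ms, downstream_costs_ms):
    """Flush cost: queued messages times the bucket cost, plus downstream operator costs."""
    return queued_msgs * bucket_cost_ms + sum(downstream_costs_ms)


def pipeline_cost_ms(n, num_buckets, bottleneck_ms):
    """Assumed cost model: (n + m - 1) times the cost of the most expensive bucket."""
    return (n + num_buckets - 1) * bottleneck_ms


def change_cost_ms(n):
    """W_change from the example: cost of the new plan minus cost of the current plan."""
    new_plan = pipeline_cost_ms(n, 5, 5)      # (n + 5 - 1) * 5 ms
    cur_plan = pipeline_cost_ms(n, 4, 4 + 3)  # (n + 4 - 1) * (4 ms + 3 ms)
    return new_plan - cur_plan


rate_msg_per_s = 10.7   # monitored message rate R
period_s = 10           # optimization period
n = round(rate_msg_per_s * period_s)  # estimated messages in the next period

w_flush = flush_cost_ms(3, 7, [5, 4])  # 3 * W(b_2) + 5 ms + 4 ms, with W(b_2) = 7 ms
w_change = change_cost_ms(n)
exchange_plans = (w_flush + w_change) <= 0

print(n, w_flush, w_change, w_flush + w_change, exchange_plans)
```

Because W_change decreases by 2 ms per message while the flush cost is fixed, the exchange pays off in this example as soon as n is large enough to cover the 30 ms flush.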
If the evaluation of the rewriting benefit results in the decision to exchange plans, we
need to dynamically rewrite the existing plan during runtime. In the following, we explain
this step in more detail.
Dynamic Plan Rewriting<br />
The major problem when rewriting a vectorized plan during runtime is posed by loaded
queues. One approach would be explicit state migration and state re-computation [ZRH04].
However, re-computation might be impossible for integration flows due to interactions with
external systems that have to be executed exactly once. Therefore, plan rewriting is realized
by stopping execution buckets and flushing intermediate queues.
For example, in order to merge two execution buckets $b_i$ and $b_{i+1}$ with a queue $q_{i+1}$ in
between, we need to stop the execution bucket $b_i$, while bucket $b_{i+1}$ is still working. Over
time, $q_{i+1}$ is flushed, and we wait until it contains zero messages. We then merge the execution
buckets into $b_i$, which contains an instance-based subplan with all operators of the merged
subplans, and simply remove $q_{i+1}$. This concept can be used for both bucket merging and
splitting, and we never lose a message during dynamic plan rewriting.
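The merge steps described above can be sketched in Python. This is a deliberately simplified, single-threaded illustration, not the actual implementation: the `Bucket` class, `merge_buckets`, and the `op` helper are hypothetical names, and a real execution bucket would run in its own thread rather than being driven by a loop.

```python
from collections import deque


class Bucket:
    """Hypothetical execution bucket: an ordered list of operators (callables)."""

    def __init__(self, name, operators):
        self.name = name
        self.operators = list(operators)
        self.running = True

    def process(self, msg):
        # Apply every operator of this bucket's subplan in order.
        for operator in self.operators:
            msg = operator(msg)
        return msg


def merge_buckets(b_i, b_next, queue_between, output):
    """Merge b_next into b_i after the queue between them has been flushed."""
    b_i.running = False                 # 1. stop the upstream bucket
    while queue_between:                # 2. downstream bucket keeps draining the queue
        output.append(b_next.process(queue_between.popleft()))
    # 3. queue is empty: the merged bucket holds the operators of both subplans
    b_i.operators = b_i.operators + b_next.operators
    b_i.running = True
    return b_i


def op(name):
    """Toy operator that records its name on the message."""
    return lambda msg: msg + [name]


b1 = Bucket("b1", [op("o2")])
b2 = Bucket("b2", [op("o3")])
q2 = deque([["m1", "o2"], ["m2", "o2"]])  # messages already processed by b1
out = []

merged = merge_buckets(b1, b2, q2, out)
out.append(merged.process(["m3"]))        # a new message runs through the merged bucket
print(out)
```

In this sketch, every message passes each operator exactly once, whether it was already queued before the merge or arrives afterwards, which mirrors the requirement that no message is lost or re-executed during rewriting.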
Putting it all together, we introduced the general concept of vectorization as a control-flow-oriented
optimization technique that aims to improve the message throughput. Furthermore,
we generalized this concept to cost-based plan vectorization and explained
how to take multiple deployed plans into account as well. Finally, we described how this
technique is embedded into our general cost-based optimization framework. Although, this