Cost-Based Optimization of Integration Flows - Datenbanken ...
5 Multi-Flow Optimization
the waiting time at the lower border of the defined T_L function interval (line 7). Finally,
we compute the total latency time and the resulting number of messages per batch k′, and
check the validity condition in order to react to overload situations (lines 8-11). Due to
the linear classification of operator costs and the analytical determination of Δtw, this
algorithm exhibits a linear complexity of O(m) in the number of operators m.
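
The linear-time argument can be illustrated with a minimal sketch. It is not the algorithm referenced above, whose listing is not part of this page. It assumes that each operator has a linear cost in the batch size, that the number of messages per batch is k′ = R · Δtw for a message rate R, and that we look for the special case Δtw = W(P′, k′). The names LinearCost, special_case_waiting_time, and rate are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class LinearCost:
    """Hypothetical linear cost of one operator for a batch of k messages."""
    fixed: float        # seconds per batch, independent of k
    per_message: float  # additional seconds per message in the batch


def special_case_waiting_time(
    operators: Sequence[LinearCost], rate: float
) -> Optional[Tuple[float, float]]:
    """Return (dtw, k) with dtw = W(k) and k = rate * dtw, or None on overload.

    Assumption: plan cost is the sum of the operator costs, W(k) = a + b * k.
    """
    # Single pass over the m operators, hence O(m).
    a = sum(op.fixed for op in operators)
    b = sum(op.per_message for op in operators)

    # Solving dtw = a + b * rate * dtw gives dtw = a / (1 - rate * b).
    # For rate * b >= 1 no finite solution exists: messages arrive faster
    # than the plan can process them (overload under this model).
    if rate * b >= 1.0:
        return None
    dtw = a / (1.0 - rate * b)
    return dtw, rate * dtw


# Example call with made-up coefficients (seconds) and 100 messages per second.
result = special_case_waiting_time(
    [LinearCost(0.02, 0.001), LinearCost(0.03, 0.002)], rate=100.0
)
```

Because the per-operator coefficients only need to be summed once, the waiting time follows from a closed-form expression rather than from a search over candidate values.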

Despite the described simplification of the waiting time computation, where we compute
the special case Δtw = W(P′, k′), it would not be sufficient to simply collect messages for
the execution time of the current message partition. The main reason is horizontal
(value-based) partitioning, which leads to internal out-of-order execution. For
high message rates in combination with many distinct partitions, the average message
latency as well as the effective system output rate would degrade due to message synchronization
at the outbound side. By computing the waiting time or the maximal partition size,
respectively, we flush the oldest messages out of the system such that better average message
latency times and output rates are achieved, and we also reduce the effort for outbound
message synchronization. Finally, we can also compute the global minimum for the general
case of arbitrary cost models.
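
To make the flushing behavior concrete, the following sketch keeps one partition per distinct value of a partitioning attribute and always hands out the partition whose first message arrived earliest. It is a simplified stand-in and not the partition tree of this work. The class PartitionBuffer and its methods are hypothetical names, and the timer that would trigger a flush every Δtw is left out.

```python
from collections import OrderedDict
from typing import Any, Hashable, List, Optional, Tuple


class PartitionBuffer:
    """Hypothetical buffer for horizontally (value-based) partitioned messages."""

    def __init__(self) -> None:
        # Maps a partitioning-attribute value to its collected messages.
        # Insertion order equals the arrival order of each partition's
        # first message, so the first entry is always the oldest partition.
        self._partitions: "OrderedDict[Hashable, List[Any]]" = OrderedDict()

    def enqueue(self, key: Hashable, message: Any) -> None:
        """Add a message to the partition of its attribute value."""
        self._partitions.setdefault(key, []).append(message)

    def flush_oldest(self) -> Optional[Tuple[Hashable, List[Any]]]:
        """Remove and return the oldest partition, or None if empty."""
        if not self._partitions:
            return None
        return self._partitions.popitem(last=False)


buffer = PartitionBuffer()
buffer.enqueue("customer-A", "m1")
buffer.enqueue("customer-B", "m2")
buffer.enqueue("customer-A", "m3")  # joins the older partition, overtaking m2
oldest = buffer.flush_oldest()
```

The third message illustrates the internal out-of-order execution mentioned above: m3 leaves the buffer together with m1 and therefore before m2, which is why the outbound side has to synchronize messages.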

In this subsection, we have described how to compute the optimal waiting time with
regard to minimizing the total latency time under a maximum latency constraint. This
maximizes the message throughput of a single deployed plan. The approach is applicable
to multiple deployed plans as well. In that case, however, changing the optimization
objective to minimizing the total execution time under the maximum latency
constraint can lead to an even higher throughput because this minimizes potentially overlapping
execution. The computation is realized similarly to the case
of a single deployed plan, with the difference that we always try to determine the upper
border (lc) of the defined T_L function interval rather than the lower border. Finally, the
waiting time computation is a central aspect of the multi-flow optimization technique and
of its integration into our general cost-based optimization framework.
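
The difference between the two objectives can be sketched under a deliberately simplified model. It is an assumption and not the T_L definition of this work: plan cost is linear, W(k′) = a + b · k′, the batch size is k′ = R · Δtw, and the total latency is approximated as Δtw + W(k′). The function names and the parameters a, b, rate, and objective are hypothetical.

```python
from typing import Optional, Tuple


def waiting_time_interval(
    a: float, b: float, rate: float, lc: float
) -> Optional[Tuple[float, float]]:
    """Return (lower, upper) border of valid waiting times, or None.

    lower: smallest dtw satisfying the validity condition W(k) <= dtw.
    upper: largest dtw whose approximated latency dtw + W(k) is <= lc.
    """
    if rate * b >= 1.0:
        return None  # overload: validity condition cannot be met
    lower = a / (1.0 - rate * b)
    upper = (lc - a) / (1.0 + rate * b)
    if lower > upper:
        return None  # latency constraint lc cannot be met
    return lower, upper


def choose_waiting_time(
    a: float, b: float, rate: float, lc: float, objective: str = "latency"
) -> Optional[float]:
    """Pick the lower border for minimal latency, the upper border otherwise."""
    interval = waiting_time_interval(a, b, rate, lc)
    if interval is None:
        return None
    lower, upper = interval
    return lower if objective == "latency" else upper


# Made-up values: a = 50 ms, b = 3 ms per message, 100 msg/s, lc = 1 s.
single_plan = choose_waiting_time(0.05, 0.003, 100.0, 1.0, "latency")
multi_plan = choose_waiting_time(0.05, 0.003, 100.0, 1.0, "execution_time")
```

Under this model, the upper border yields larger batches and thus fewer plan executions for the same number of messages, which is the intuition behind preferring it when several deployed plans compete for resources.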

5.3.4 Optimization Algorithm

Putting it all together, we now describe how the multi-flow optimization technique is
integrated into our general cost-based optimization framework. This includes changes to
the deployment process as well as to the cost-based re-optimization.

The deployment process is modified such that it additionally includes the automatic
derivation of partitioning attributes, as described in Subsection 5.2. Apart from
that, most aspects of the multi-flow optimization technique are integrated into the feedback
loop of our cost-based optimization framework. The major issues are the derivation
of partitioning schemes, the rewriting of plans, and the computation of the optimal waiting
time. First, according to the monitored selectivities of the partitioning attributes that have
been derived during the initial deployment, we derive the optimal partitioning scheme in
case of multiple partitioning attributes. Second, if there are at least two attributes and
if we have found a new partitioning scheme during re-optimization, we rewrite the plan
according to this partitioning scheme in order to enable the execution of operations on
horizontally partitioned message batches. For a single partitioning attribute, rewriting is
not required because all operators can work transparently on a single partition level, as
described in Subsection 5.2.3. This rewriting also includes the requirement of dynamic
state migration, in the sense of transforming an existing partition tree that indexes collected
messages from one partitioning scheme into another scheme. In order to ensure the