25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


5 Multi-Flow Optimization

the waiting time at the lower border of the defined T_L function interval (line 7). Finally, we compute the total latency time and the resulting number of messages per batch k′, and we check the validity condition in order to react to overload situations (lines 8-11). Due to the linear classification of operator costs and the analytical determination of ∆tw, this algorithm exhibits a linear complexity of O(m) in the number of operators m.
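To illustrate why such a computation is linear in the number of operators, the following minimal sketch works under assumptions that are not taken from the text: each operator has a hypothetical cost of the form fixed + per_message · k, messages arrive at a constant rate, a batch collected for ∆tw contains k′ = rate · ∆tw messages, and the latency of the oldest message is estimated as ∆tw + W(P′, k′). Under these assumptions, the special case ∆tw = W(P′, k′) has the closed form ∆tw = Σ fixed / (1 − rate · Σ per_message). The names `OperatorCost` and `waiting_time` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatorCost:
    """Hypothetical linear cost of one operator: fixed + per_message * k."""
    fixed: float        # seconds per batch, independent of the batch size
    per_message: float  # additional seconds per message in the batch


def waiting_time(ops: Sequence[OperatorCost], rate: float,
                 max_latency: float) -> Optional[float]:
    """Return a waiting time dt with dt == W(k'), or None if no valid dt exists.

    Assumptions (not from the source): W(k) = sum(fixed) + k * sum(per_message),
    k' = rate * dt, and the oldest message has latency dt + W(k').
    """
    # A single pass over the m operators yields the aggregated linear model.
    fixed = sum(op.fixed for op in ops)
    per_message = sum(op.per_message for op in ops)

    # Share of each second that is spent on per-message work at this rate.
    load = rate * per_message
    if load >= 1.0:
        return None  # overload: execution can never catch up with arrivals

    # Fixed point of dt = fixed + per_message * rate * dt.
    dt = fixed / (1.0 - load)

    # Assumed validity check against the maximum latency constraint.
    total_latency = dt + (fixed + per_message * rate * dt)
    if total_latency > max_latency:
        return None
    return dt
```

With two operators of cost (0.02, 0.001) and (0.03, 0.004) and a rate of 100 messages per second, the aggregated model is W(k) = 0.05 + 0.005 · k, the load is 0.5, and the sketch yields a waiting time of roughly 0.1 seconds. A rate of 250 messages per second pushes the load above 1, and the function reports the overload by returning `None`.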

Despite the described simplification of the waiting time computation, where we compute the special case ∆tw = W(P′, k′), it would not be sufficient to simply collect messages for the execution time of the current message partition. Most importantly, horizontal (value-based) partitioning leads to internal out-of-order execution. For high message rates in combination with many distinct partitions, the average message latency as well as the effective system output rate would degrade due to message synchronization at the outbound side. By computing the waiting time or the maximal partition size, respectively, we flush the oldest messages out of the system, such that better average message latency times and output rates are achieved, and we also reduce the effort for outbound message synchronization. Finally, we can also compute the global minimum for the general case of arbitrary cost models.
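The effect of value-based partitioning on message order can be made concrete with a small sketch. It is a hypothetical illustration, not the data structure used in this work: the class `PartitionedQueue` and its methods are assumed names, and messages are simply grouped by one partitioning value. Dequeuing always takes the partition that contains the oldest waiting message, which corresponds to the idea of flushing the oldest messages first.

```python
from collections import OrderedDict
from typing import Any, Hashable, List, Tuple


class PartitionedQueue:
    """Hypothetical queue that groups messages by a partitioning value."""

    def __init__(self) -> None:
        # Keys keep the order in which partitions were opened, so the first
        # entry always belongs to the partition with the oldest message.
        self._partitions: "OrderedDict[Hashable, List[Tuple[int, Any]]]" = OrderedDict()
        self._seq = 0  # global arrival sequence number

    def enqueue(self, key: Hashable, payload: Any) -> None:
        """Append a message to the partition of its key, tagged with arrival order."""
        self._partitions.setdefault(key, []).append((self._seq, payload))
        self._seq += 1

    def dequeue_oldest_partition(self) -> Tuple[Hashable, List[Tuple[int, Any]]]:
        """Remove and return the partition that contains the oldest message."""
        return self._partitions.popitem(last=False)


queue = PartitionedQueue()
for key, payload in [("a", "m0"), ("b", "m1"), ("a", "m2"), ("c", "m3")]:
    queue.enqueue(key, payload)

first = queue.dequeue_oldest_partition()   # partition "a": arrivals 0 and 2
second = queue.dequeue_oldest_partition()  # partition "b": arrival 1
```

In this example, message 2 leaves the queue before message 1 because both messages with key "a" are executed as one batch. The outbound side therefore has to re-sequence results if the arrival order must be preserved, and the more partitions are waiting, the longer an old message such as arrival 3 can be delayed.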

In this subsection, we have described how to compute the optimal waiting time with regard to minimizing the total latency time under a maximum latency constraint. This maximizes the message throughput of a single deployed plan. The approach is applicable to multiple deployed plans as well. However, changing the optimization objective to minimizing the total execution time under the maximum latency constraint can lead to an even higher throughput because this minimizes potentially overlapping execution. The computation is realized similarly to the case of a single deployed plan, except that we always try to determine the upper border (lc) of the defined T_L function interval rather than the lower border. Finally, the waiting time computation is a central aspect of the multi-flow optimization technique and of its integration into our general cost-based optimization framework.

5.3.4 Optimization Algorithm

Putting it all together, we now describe how the multi-flow optimization technique is integrated into our general cost-based optimization framework. This includes changes to the deployment process as well as to the cost-based re-optimization.

The deployment process is modified such that it now additionally includes the automatic derivation of partitioning attributes, as described in Subsection 5.2. Apart from that, most aspects of the multi-flow optimization technique are integrated into the feedback loop of our cost-based optimization framework. The major issues are the derivation of partitioning schemes, the rewriting of plans, and the computation of the optimal waiting time. First, according to the monitored selectivities of the partitioning attributes that have been derived during the initial deployment, we derive the optimal partitioning scheme in case of multiple partitioning attributes. Second, if there are at least two attributes and if we have found a new partitioning scheme during re-optimization, we rewrite the plan according to this partitioning scheme in order to enable the execution of operations on horizontally partitioned message batches. For a single partitioning attribute, rewriting is not required because all operators can work transparently on a single partition level, as described in Subsection 5.2.3. This rewriting also requires dynamic state migration, in the sense of transforming an existing partition tree that indexes collected messages from one partitioning scheme into another scheme. In order to ensure the
