Cost-Based Optimization of Integration Flows - Datenbanken ...
5 Multi-Flow Optimization<br />
mention its complexity. Similar to the plan vectorization algorithm (A-PV, Subsection 4.2.2),<br />
the A-MPR exhibits a cubic worst-case time complexity of O(m^3) in the<br />
number of operators m, which stems from the dependency checking already analyzed there.<br />
Note that the additional inner loop over subsequent operators does not change this asymptotic<br />
behavior because each operator is assigned only once to an inserted Iteration operator.<br />
The split-and-merge approach realizes the transparent plan rewriting and thus enables<br />
the execution of message partitions even in the case of multiple partitioning attributes.<br />
The rewritten plan mainly depends on the cost-based derived partitioning scheme, which<br />
neglects any additional costs of the PSplit and PMerge operators. The reason for this<br />
optimization objective is that the ordering of partitioning attributes has a higher influence<br />
on the overall performance (partitioned queue maintenance and the benefit of partitioning)<br />
than the additional operators have, because PSplit and PMerge are low-cost operators that<br />
scale linearly with the number of messages due to the efficient hash partition<br />
tree data structure. In addition to the throughput improvement achieved by executing<br />
operations on partitions of messages, the inserted Iteration operators offer further optimization<br />
potential. In detail, the technique WC3: Rewriting Iterations to Parallel Flows,<br />
described in Subsection 3.4.1, can be applied after the A-MPR in order to additionally<br />
achieve a higher degree of parallelism that further increases the throughput.<br />
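The partitioning behavior described above can be illustrated with a small sketch. The following is a minimal, hypothetical rendering of a hash partition tree: messages are routed to leaf queues level by level according to an ordered partitioning scheme (PSplit), and all partitions are concatenated back into a single, timestamp-ordered sequence (PMerge). All class and method names are illustrative assumptions, not the thesis implementation; the point is that both operators touch each message only a constant number of times, i.e. they scale linearly with the number of messages.

```python
from collections import deque


class HashPartitionTree:
    """Sketch of a hierarchical (hash) partition tree message queue.

    Inner nodes are hash maps keyed by a partitioning attribute value;
    leaves are FIFO queues of messages (illustrative structure).
    """

    def __init__(self, scheme):
        # Ordered partitioning scheme, e.g. ["type", "region"].
        self.scheme = scheme
        self.root = {}

    def psplit(self, msg):
        """PSplit: route one message to its leaf partition.

        One hash lookup per scheme level, i.e. O(1) per message
        for a fixed partitioning scheme.
        """
        node = self.root
        for attr in self.scheme[:-1]:
            node = node.setdefault(msg[attr], {})
        node.setdefault(msg[self.scheme[-1]], deque()).append(msg)

    def partitions(self, node=None):
        """Yield all leaf queues (the message partitions)."""
        node = self.root if node is None else node
        for child in node.values():
            if isinstance(child, deque):
                yield child
            else:
                yield from self.partitions(child)

    def pmerge(self):
        """PMerge: concatenate all partitions into one sequence,
        here restoring arrival order by the incoming timestamp t."""
        out = [m for part in self.partitions() for m in part]
        return sorted(out, key=lambda m: m["t"])


tree = HashPartitionTree(["type", "region"])
tree.psplit({"t": 1, "type": "order", "region": "EU"})
tree.psplit({"t": 2, "type": "order", "region": "US"})
tree.psplit({"t": 3, "type": "invoice", "region": "EU"})
print(len(list(tree.partitions())))          # number of leaf partitions
print([m["t"] for m in tree.pmerge()])       # timestamps in arrival order
```

Under this sketch, operators between PSplit and PMerge would iterate over the leaf queues and process each partition independently, which is what makes the subsequent rewriting to parallel flows (WC3) applicable.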
To summarize, we discussed the necessary preconditions for enabling horizontal<br />
message queue partitioning and the execution of operations on these message partitions. In<br />
detail, we introduced the (hash) partition tree as a message queue data structure that allows<br />
the hierarchical partitioning of messages according to certain partitioning attributes.<br />
Furthermore, we introduced basic algorithms (1) for deriving candidate partitioning attributes<br />
from a plan, (2) for deriving the optimal partitioning scheme of attributes, and (3)<br />
for rewriting the plan according to this scheme. Only minor changes of operators and the<br />
execution environment are necessary, while all other aspects are issues of logical optimization<br />
and therefore fit seamlessly into our cost-based optimization framework. Multi-flow<br />
optimization now reduces to the challenge of computing the optimal waiting time.<br />
5.3 Periodical Re-Optimization<br />
The cost-based decision of the multi-flow optimization technique is to compute the optimal<br />
waiting time ∆tw in order to adjust the trade-off between message throughput and the latency<br />
of single messages according to the current workload characteristics. In this section,<br />
we define the formal optimization objective, explain the extended cost model and cost<br />
estimation, discuss the waiting time computation, and finally show how to integrate<br />
this optimization technique into our cost-based optimization framework.<br />
5.3.1 Formal Problem Definition<br />
As described in Section 2.3.1, we assume a message sequence M = {m_1, m_2, . . . , m_n} of<br />
incoming messages, where each message m_i is modeled as a (t_i, d_i, a_i)-tuple: t_i ∈ Z+<br />
denotes the incoming timestamp of the message, d_i denotes a semi-structured tree of<br />
name-value data elements, and a_i denotes a list of additional atomic name-value attributes.<br />
Each message m_i is processed by an instance p_i of a plan P, and t_out(m_i) ∈ Z+ denotes the<br />
timestamp when the message has been successfully executed. Here, the latency of a single<br />
message T_L(m_i) is given by T_L(m_i) = t_out(m_i) − t_i.<br />
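The message model and latency definition above can be made concrete with a short sketch. The timestamps and payloads below are invented for illustration; only the (t_i, d_i, a_i) structure and the formula T_L(m_i) = t_out(m_i) − t_i come from the definition.

```python
# Each message m_i is a (t_i, d_i, a_i)-tuple: incoming timestamp,
# semi-structured payload tree, and atomic name-value attributes.
messages = [
    (10, {"order": {"id": 1}}, {"type": "order"}),    # t_1 = 10
    (12, {"order": {"id": 2}}, {"type": "order"}),    # t_2 = 12
]

def latency(t_in, t_out):
    """T_L(m_i) = t_out(m_i) - t_i."""
    return t_out - t_in

# Assumed completion timestamps t_out(m_i), purely for illustration:
t_outs = [15, 20]

latencies = [latency(m[0], t) for m, t in zip(messages, t_outs)]
print(latencies)  # -> [5, 8]
```

Note that a longer waiting time ∆tw before executing a batch increases t_out(m_i) for early messages in the batch and hence their latency T_L, which is exactly the trade-off against throughput that the waiting time computation must balance.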