Cost-Based Optimization of Integration Flows - Datenbanken ...

5 Multi-Flow Optimization

The incoming messages m_i are partitioned according to the partitioning attribute customer name, which was extracted with ba = m_i/Customer/Cname at the inbound side. A plan instance of the rewritten plan P_2' reads the first partition from the queue and executes the single operators for this partition. Due to the equal values of the partitioning attribute, we do not need to rewrite the query to the external system s_4: every batch contains exactly one distinct attribute value according to ba. In total, we achieve performance benefits for the Assign as well as the Invoke operators. Thus, the throughput is further improved because such a batch no longer contains messages with distinct partitioning-attribute values. Note that the incoming order of messages was changed (arrows in Figure 5.5), and the messages therefore need to be serialized at the outbound side.
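To make the partitioning step concrete, the following is a minimal sketch of a horizontally partitioned message queue. The message structure, the `extract_ba` helper, and the `PartitionedQueue` class are hypothetical simplifications; they only illustrate that messages are grouped by the extracted partitioning attribute and that a plan instance reads one whole partition (batch) at a time, which reorders messages relative to their arrival.

```python
from collections import OrderedDict

def extract_ba(msg):
    # Hypothetical extraction of the partitioning attribute,
    # corresponding to ba = m_i/Customer/Cname at the inbound side.
    return msg["Customer"]["Cname"]

class PartitionedQueue:
    """Message queue partitioned horizontally by a partitioning attribute."""

    def __init__(self):
        # Maps each distinct ba value to its partition (list of messages);
        # insertion order of partitions is preserved.
        self.partitions = OrderedDict()

    def enqueue(self, msg):
        self.partitions.setdefault(extract_ba(msg), []).append(msg)

    def dequeue_partition(self):
        # A plan instance reads one whole partition (batch) at a time;
        # each batch contains exactly one distinct ba value.
        if not self.partitions:
            return None, []
        ba, batch = self.partitions.popitem(last=False)
        return ba, batch

q = PartitionedQueue()
for name, order in [("Smith", 1), ("Jones", 2), ("Smith", 3)]:
    q.enqueue({"Customer": {"Cname": name}, "OrderId": order})

ba, batch = q.dequeue_partition()
# The first batch groups both "Smith" messages (orders 1 and 3),
# so the arrival order is changed, as indicated in Figure 5.5.
```

Because message 3 is pulled ahead of message 2, the outbound side must serialize the results back, as noted above.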

It is important to note that besides external queries (Invoke operator), local operators (e.g., Assign and Switch) can also directly benefit from horizontal partitioning. Partitioning attributes are derived from the plan (e.g., query predicates or switch expressions). This benefit is caused by executing operations on partitions instead of on individual messages. A similar underlying concept is also used for pre-aggregation [IHW04] or early group-by [CS94]. In addition, all operators that work on partitions of equal messages (e.g., loaded once from an external system) also need to be executed only once.
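The once-per-partition benefit can be sketched as follows. The `invoke_external` function is a hypothetical stand-in for an external query keyed by the partitioning attribute (as issued to a system such as s_4); since all messages in a batch share the same attribute value, the query is executed once per batch instead of once per message.

```python
def invoke_external(cname):
    """Hypothetical external query keyed by customer name."""
    invoke_external.calls += 1
    return {"customer": cname, "credit_ok": True}
invoke_external.calls = 0

def process_batch(ba_value, batch):
    # All messages in the batch share the same partitioning-attribute
    # value, so one external invocation serves the whole batch.
    result = invoke_external(ba_value)
    return [dict(msg, lookup=result) for msg in batch]

batch = [{"Customer": {"Cname": "Smith"}, "OrderId": i} for i in (1, 3)]
out = process_batch("Smith", batch)
# One external call was made for two messages.
```

The same reasoning applies to local operators: an Assign or Switch whose expression depends only on the partitioning attribute can be evaluated once and its result reused for every message in the partition.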

Our cost-based optimizer realizes the optimization objective of throughput maximization by monitoring several statistics of the stream of incoming messages and by periodic re-optimization, where the optimal waiting time Δtw is computed. In case of low message rates (no full utilization of the integration platform), the waiting time is decreased in order to ensure low latency of single messages. As the message rate increases, the waiting time is increased accordingly to increase the message throughput by processing more messages per batch, while preserving maximum latency constraints for single messages.
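The qualitative behaviour of this adaptation can be sketched as follows. This is only an illustration, not the actual cost-based computation of the optimal Δtw (which the thesis derives in Section 5.3); the multiplicative `step`, the rate parameters, and the simple threshold test are all assumptions made for the sketch.

```python
def adapt_waiting_time(dt_w, message_rate, capacity_rate,
                       latency_max, step=0.1):
    """Simplified sketch of adapting the batch waiting time dt_w.

    Under high load, dt_w grows so that larger batches raise
    throughput; under low load, dt_w shrinks so that single
    messages see low latency. The maximum latency constraint
    is never exceeded.
    """
    if message_rate > capacity_rate:      # platform fully utilized
        dt_w *= (1 + step)                # larger batches -> throughput
    else:                                 # low message rate
        dt_w *= (1 - step)                # smaller batches -> low latency
    return min(dt_w, latency_max)         # preserve latency constraint

dt = 1.0
for _ in range(5):
    dt = adapt_waiting_time(dt, message_rate=200,
                            capacity_rate=100, latency_max=1.3)
# Under sustained high load, dt grows until it is capped by the
# latency constraint.
```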

MQO (Multi-Query Optimization) and OOP (Out-of-Order Processing) [LTS+08] have already been investigated for other system types. In contrast to existing work, we present the novel MFO approach that is tailor-made for integration flows and that maximizes the throughput by employing horizontal message queue partitioning and computing the optimal waiting time. MFO is also related to caching and the recycling of intermediate results [IKNG09]. While caching might lead to the use of outdated data, the partitioned execution might cause reading more recent data of different objects. However, we cannot ensure strong consistency with asynchronous integration flows (decoupled from clients via message queues) anyway. Furthermore, with regard to eventual consistency [Vog08], we guarantee (1) monotonic writes, (2) monotonic reads with regard to individual data objects^12, (3) read-your-writes/session consistency, (4) semantic correctness as defined in Definition 3.1, (5) that the temporal gap of up-to-dateness is at most equal to a given latency constraint, and (6) that no outdated data is read. In contrast, caching can guarantee neither read-your-writes/session consistency nor that no outdated data is read. In conclusion, caching is advantageous if the data of external sources is static and the amount of data is rather small, while MFO is beneficial if the data of external sources changes dynamically.

Finally, the major research challenges of MFO via horizontal partitioning are (1) to enable plan execution of horizontally partitioned message batches and (2) to compute the optimal waiting time Δtw during periodic re-optimization. We address these two challenges in Section 5.2 and Section 5.3, respectively.

^12 Both caching and MFO cannot ensure monotonic reads over multiple data objects due to different read times of certain data objects.
