25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


2 Preliminaries and Existing Techniques

A. Data Transfer Optimization

Essentially, the problem of expensive access to external systems (by actively invoking the existing outbound adapters) is tackled with data transfer optimization. In this context, we distinguish between (1) streaming transferred data (where the amount of exchanged data remains almost unchanged) and (2) reducing the amount of transferred data.

Streaming Transferred Data. The standard invocation model of external systems or Web services is the transfer of materialized messages, i.e., the complete message is serialized (converted to a byte stream) or, more generally, marshalled and transferred to the external system, where the message is unmarshalled and processed. In contrast, the streaming invocation of Web services for each tuple achieves higher throughput by leveraging pipeline parallelism, but it also causes overhead, incurred by parsing SOAP/XML headers and by network latencies. Thus, there is a trade-off between the pipeline parallelism achieved by streaming and this per-message overhead. Several approaches exploit this trade-off by passing batches of tuples (chunks of messages) as individual messages to the external system. Srivastava et al. introduced adaptive data chunking [SMWM06], where the chunk size is determined, based on the measured response time $c_i(k_i)$ of a batch $i$ with $k_i$ tuples, by minimizing the response time per tuple $c_i(k_i)/k_i$. Gounaris et al. refined this concept by using an online extremum-control approach [GYSD08b, GYSD08a] adapted from control theory. Both approaches use a combination of control-flow semantics with iterator, instance-local execution. In contrast, Preißler et al. introduced the concept of stream-based Web services [PVHL09a, PVHL09b] over multiple process instances, which is thus classified as control-flow semantics with iterator, instance-global execution. Here, splitting rules are defined, and message buckets are determined from the message content according to those splitting rules. These buckets are then streamed to and from the external Web service without the overhead (network latency) of passing individual messages. Stream-based Web services have the advantage that both the data transfer (similar to the previously mentioned approaches, but with less latency) and the processing within the Web service (which can work statefully on the defined subtrees of XML messages) are optimized. In addition, streaming subtrees of messages is also advantageous for local processing steps [PHL10]. Although some of these approaches are first steps towards adaptive behavior of integration flows, they neglect local processing costs, since they either optimize only the calls to external systems or apply only static (rule-based) optimizations.
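The chunk-size selection behind adaptive data chunking can be illustrated with a small search that probes chunk sizes and keeps the one with the lowest response time per tuple, i.e., the smallest $c_i(k_i)/k_i$. This is a minimal sketch under stated assumptions: the cost model `simulated_cost` and the function `adapt_chunk_size` are hypothetical, the measurements are noise-free, and the per-tuple response time is assumed to be convex in the chunk size. It is neither the procedure of [SMWM06] nor the extremum control of [GYSD08b, GYSD08a], both of which adapt online to measured (noisy) response times; here a simple doubling phase followed by an integer ternary search is swapped in.

```python
from typing import Callable


def simulated_cost(k: int) -> float:
    """Hypothetical response time (ms) of one external call carrying k tuples:
    fixed per-call latency + linear per-tuple work + a quadratic term standing
    in for buffering/congestion effects of very large chunks (assumption)."""
    return 100.0 + 0.5 * k + 0.01 * k * k


def adapt_chunk_size(measure: Callable[[int], float], k_max: int = 100_000) -> int:
    """Return the chunk size k in [1, k_max] that minimizes measure(k) / k,
    assuming the per-tuple response time is convex in k and measure() is noise-free."""

    def per_tuple(k: int) -> float:
        return measure(k) / k

    # Phase 1: double k while the per-tuple response time keeps falling.
    k = 1
    best = per_tuple(k)
    while 2 * k <= k_max:
        candidate = per_tuple(2 * k)
        if candidate >= best:
            break
        k, best = 2 * k, candidate

    # Phase 2: the minimum now lies in [k // 2, min(2k, k_max)];
    # narrow that bracket with an integer ternary search.
    lo, hi = max(1, k // 2), min(2 * k, k_max)
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third
        if per_tuple(m1) < per_tuple(m2):
            hi = m2
        else:
            lo = m1
    return min(range(lo, hi + 1), key=per_tuple)


if __name__ == "__main__":
    print(adapt_chunk_size(simulated_cost))  # 100
```

With this hypothetical model the per-tuple time is $100/k + 0.5 + 0.01k$, whose minimum lies at $k = \sqrt{100/0.01} = 100$, which is what the search returns. Small chunks pay the per-call latency too often, while very large chunks are penalized by the quadratic term; this mirrors the trade-off between per-message overhead and pipeline parallelism described above.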

Reducing Transferred Data. The aforementioned approaches use stream-based data transfer between the integration platform and the external systems, which does not affect the amount of exchanged data (payload). In contrast, other approaches reduce the amount of transferred data. First, BPEL-DT and BPEL/SQL achieve this by using references to data sets instead of physically exchanging data. BPEL-DT [HRP+07, Hab09] reduces the transferred data by passing references to a data layer and by using ETL tools where possible. This reduction is achieved by using proprietary exchange formats instead of XML. Furthermore, Vrhovnik et al. introduced the rule-based optimization of BPEL/SQL processes [VSES08, VSS+07], where several rewrite rules were defined in order to condense sequences of SQL statements and to push down certain operations to the DBMS. In particular, the tuple-to-set rewrite rules reduce the amount of transferred data.
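The effect of a tuple-to-set rewrite can be illustrated with a flow that copies qualifying tuples from one table to another. This is an illustrative sketch, not one of the rewrite rules of [VSES08, VSS+07]: the tables `orders` and `big_orders`, the predicate `amount > 100`, and the helper functions are hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for the DBMS. The tuple-wise variant ships every tuple to the flow and writes qualifying tuples back one statement at a time, whereas the set-oriented variant pushes a single `INSERT ... SELECT` down to the DBMS.

```python
import sqlite3

# Hypothetical schema and data, chosen only for illustration.
SCHEMA = """
CREATE TABLE orders(id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
CREATE TABLE big_orders(id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
"""
ORDERS = [(1, "a", 50.0), (2, "b", 150.0), (3, "c", 300.0)]


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    return conn


def tuple_wise(conn: sqlite3.Connection) -> int:
    """Before the rewrite: fetch every tuple into the flow, filter it there, and
    issue one INSERT per qualifying tuple. Returns the number of tuples that
    crossed the flow/DBMS boundary (fetched plus written back)."""
    moved = 0
    for row in conn.execute("SELECT id, customer, amount FROM orders").fetchall():
        moved += 1  # tuple shipped from the DBMS to the flow
        if row[2] > 100:
            conn.execute("INSERT INTO big_orders VALUES (?, ?, ?)", row)
            moved += 1  # tuple shipped back to the DBMS
    return moved


def set_oriented(conn: sqlite3.Connection) -> int:
    """After the rewrite: a single set-oriented statement is pushed down to the
    DBMS, so no tuples are shipped to the flow (the count is 0 by construction)."""
    conn.execute(
        "INSERT INTO big_orders "
        "SELECT id, customer, amount FROM orders WHERE amount > 100"
    )
    return 0


if __name__ == "__main__":
    a, b = make_db(), make_db()
    print(tuple_wise(a), set_oriented(b))  # 5 0
```

Both variants leave the same two tuples in `big_orders`, but in this sketch the tuple-wise variant moves five tuples across the flow/DBMS boundary (three fetched, two written back), while the set-oriented variant moves none and issues one statement instead of three.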

Second, Subramanian and Sindre introduced the rule-based optimization of ActiveXML workflows [SS09a, SS09b]. An ActiveXML document is an XML document that contains several Web service calls in order to load dynamic external content. In this approach,
