Cost-Based Optimization of Integration Flows - Datenbanken ...
2 Preliminaries and Existing Techniques<br />
A. Data Transfer Optimization<br />
Essentially, the problem of expensive access to external systems (by actively invoking the<br />
existing outbound adapters) is tackled with data transfer optimization. In this context,<br />
we distinguish between (1) streaming transferred data (where the amount of exchanged<br />
data is almost unchanged) and (2) reducing the amount of transferred data.<br />
Streaming Transferred Data. The standard invocation model of external systems or Web<br />
services is the transfer of materialized messages, i.e., the complete message is serialized<br />
(converted to a byte stream) or, more generally, marshalled and transferred to the external<br />
system, where the message is unmarshalled and processed. In contrast, the streaming<br />
invocation of Web services for each tuple achieves higher throughput by leveraging pipeline<br />
parallelism, but it also causes overhead, incurred by parsing SOAP/XML headers and<br />
network latencies. Thus, there is a trade-off between the pipeline parallelism achieved by<br />
streaming and this overhead. There are approaches that exploit this trade-off<br />
by passing batches of tuples (chunks of messages) in the form of individual messages to<br />
the external system. Srivastava et al. introduced adaptive data chunking [SMWM06],<br />
where the chunk size k was determined, based on the measured response time $c_i(k_i)$ of a batch,<br />
by minimizing the response time per tuple $c_i(k_i)/k$. This concept was refined by<br />
Gounaris et al. to the use of an online extremum-control approach [GYSD08b, GYSD08a]<br />
that was adapted from control theory. Both approaches use a combination of control-flow<br />
semantics with iterator, instance-local execution. In contrast, Preißler et al. introduced<br />
the concept of stream-based Web services [PVHL09a, PVHL09b] over multiple process<br />
instances, which is thus classified as control-flow semantics with iterator, instance-global<br />
execution. Here, splitting rules are defined and message buckets are determined by the<br />
message content according to those splitting rules. These buckets are then streamed to<br />
and from the external Web service without the overhead (network latency) of passing<br />
individual messages. The concept of stream-based Web services has the advantage that<br />
both the data transfer (similar to the previously mentioned approaches, but with less<br />
latency) and the processing within the Web service (that is able to statefully work on<br />
the defined subtrees of XML messages) are optimized. In addition, streaming subtrees of<br />
messages is also advantageous for local processing steps [PHL10]. Although some of these<br />
approaches are first steps towards the adaptive behavior of integration flows, they neglect<br />
local processing costs in the sense of optimizing only calls to external systems or applying<br />
static (rule-based) optimizations only.<br />
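To make the chunking trade-off described above concrete, the following is a minimal sketch, not the algorithm of [SMWM06] or of the extremum-control approach. It assumes a hypothetical response-time model with a fixed per-message overhead, a linear per-tuple cost, and a quadratic penalty for large batches, and it simply picks the candidate chunk size with the lowest response time per tuple. The function names, the cost parameters, and the candidate list are illustrative assumptions.

```python
def simulated_response_time(k, overhead=40.0, per_tuple=1.0, congestion=0.1):
    """Hypothetical response time of one batch of k tuples.

    Assumed model (not from the cited work): fixed per-message overhead,
    linear per-tuple cost, and a quadratic penalty for large batches.
    """
    return overhead + per_tuple * k + congestion * k * k


def choose_chunk_size(measure, candidates):
    """Return the candidate k that minimizes the per-tuple time measure(k) / k."""
    return min(candidates, key=lambda k: measure(k) / k)


# Candidate chunk sizes to probe; in a real system, measure(k) would be
# an observed response time rather than a simulated one.
candidates = [1, 5, 10, 20, 40, 80]
best_k = choose_chunk_size(simulated_response_time, candidates)
per_tuple_costs = {k: simulated_response_time(k) / k for k in candidates}
```

Under this assumed model, very small chunks are dominated by the per-message overhead and very large chunks by the batch penalty, so the per-tuple cost is lowest at an intermediate chunk size. An adaptive scheme would repeat such a selection as the measured response times change.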
Reducing Transferred Data. The aforementioned approaches use a stream-based data<br />
transfer between the integration platform and the external systems. This does not affect<br />
the amount of exchanged data (payload). In contrast, there are also approaches that can<br />
reduce the amount of transferred data. First, BPEL-DT and BPEL/SQL achieve this by<br />
using references to data sets instead of physically exchanging data. BPEL-DT [HRP+07,<br />
Hab09] reduces the transferred data by passing references to a data layer and using ETL<br />
tools when possible. This reduction is achieved by using proprietary exchange formats<br />
instead of XML. Furthermore, Vrhovnik et al. introduced the rule-based optimization<br />
of BPEL/SQL processes [VSES08, VSS+07], where several rewrite rules were defined in<br />
order to condense sequences of SQL statements and to push down certain operations to the<br />
DBMS. In particular, the tuple-to-set rewrite rules reduce the amount of transferred data.<br />
Second, Subramanian and Sindre introduced the rule-based optimization of ActiveXML<br />
workflows [SS09a, SS09b]. An ActiveXML document is an XML document that contains<br />
several Web service calls in order to load dynamic external content. In this approach,<br />