25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3 Fundamentals of Optimizing Integration Flows

size of messages as a good indicator for the cardinalities. Based on these monitored atomic statistics, more complex derived statistics can be computed, such as the relative frequencies of alternative paths $P(path_i)$ or operator selectivities $sel(o_i) = |ds_{out1}(o_i)| / |ds_{in1}(o_i)|$. In order to allow cost estimation for integration flows with control-flow semantics, we define a double-metric cost model in the following.

Definition 3.2 (Double-Metric Cost Estimation). The costs of a plan $P$ are defined as an aggregate of operator costs, where these operator costs are defined by two metrics:

1. Abstract costs $C(o_i)$ are defined for all data-flow- and interaction-oriented operators in the form of their time complexity using cardinalities as the metric.

2. Execution times $W(o_i)$ are then used as the second metric in order to weight the abstract costs. For control-flow-oriented operators, we only use execution times.

Both types of input statistics (cardinalities and execution times) are used in the form of aggregates over the monitored atomic statistics of executed plan instances.
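As an illustration of how the derived statistics mentioned above could be obtained from monitored atomic statistics, the following minimal sketch computes an operator selectivity and the relative frequencies of alternative paths. The function names `selectivity` and `path_frequencies`, the input representation (plain integers and a list of taken path indexes), and the handling of an empty input are assumptions made for this example; they are not part of the described system.

```python
from collections import Counter


def selectivity(in_cardinality: int, out_cardinality: int) -> float:
    """sel(o_i) = |ds_out1(o_i)| / |ds_in1(o_i)| for one operator.

    Assumption of this sketch: an empty input yields a selectivity of 0.0.
    """
    if in_cardinality == 0:
        return 0.0
    return out_cardinality / in_cardinality


def path_frequencies(taken_paths):
    """Relative frequency P(path_i) of each path over monitored executions.

    taken_paths is a hypothetical log holding the index of the path that
    was taken in each executed plan instance.
    """
    counts = Counter(taken_paths)
    total = sum(counts.values())
    return {path: n / total for path, n in counts.items()}


# Example: 8 monitored plan instances and one operator observation.
freqs = path_frequencies([1, 1, 2, 1, 3, 2, 1, 1])
sel = selectivity(in_cardinality=200, out_cardinality=50)
print(freqs, sel)
```

Both functions work on already aggregated counts, which mirrors the statement that the input statistics are used as aggregates over monitored plan instances.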

If multiple integration flows are deployed, it can be necessary to normalize the monitored execution statistics (in particular, execution times) when aggregating them. We presented detailed cost normalization algorithms in [BHLW09c]; we omit them here in order to focus on the core cost model.

Based on Definition 3.2, Tables 3.1-3.3 show the double-metric costs of all operators of our flow meta model (see Subsection 2.3.1). In the following, we substantiate the different cost formulas according to the two steps of our cost estimation approach.

1: Abstract Costs: In a first step, we consider the abstract costs $C(P)$, which are based on the cardinality metric. The interaction-oriented operators comprise Receive, Reply, and Invoke. The costs of the Receive operator are determined as $|ds_{out}|$, i.e., by the cardinality of the received data set $ds_{out}$; they comprise costs for transport, protocol handling, format conversion, and decompression. The costs of the related Reply operator are computed analogously as $|ds_{in}|$. Finally, the costs of the Invoke operator are computed as $|ds_{in}| + |ds_{out}|$ because it actively sends and receives data. The data set $ds_{out}$ might be NULL (e.g., in case of pure write interactions); in that case, we assume a cardinality of 0. Control-flow-oriented operators have no abstract costs. An exception is the Switch operator because it evaluates expressions over input data sets. This requires costs of $|ds_{in}|$ for a single path expression and, due to the if-elseif-else semantics (path expressions are evaluated in order), total costs of $\sum_{i=1}^{n} \left( P(path_i) \cdot i \cdot |ds_{in}| \right)$ for all $n$ paths, where $P(path_i)$ denotes the switch path probabilities (relative frequencies). For computing the abstract costs of the data-flow-oriented operators, we partly adapted a cost model from a commercial RDBMS [Mak07] to the specific characteristics of integration flows. With regard to physical operator alternatives, we excluded hash-based algorithms because they

Table 3.1: Double-Metric Costs of Interaction-Oriented Operators

Operator Name    Abstract Costs $C(o_i)$       Execution Time $W(o_i)$
Invoke           $|ds_{in}| + |ds_{out}|$      $W(o_i)$
Receive          $|ds_{out}|$                  $W(o_i)$
Reply            $|ds_{in}|$                   $W(o_i)$
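The abstract cost formulas of Table 3.1 and of the Switch operator can be sketched as follows. This is only an illustrative reading of the formulas in the text: the helper names (`card`, `cost_receive`, `cost_reply`, `cost_invoke`, `cost_switch`) and the representation of data sets as Python lists, with `None` standing in for a NULL data set, are assumptions of this example.

```python
def card(ds):
    """Cardinality |ds|; a NULL data set (None here) counts as 0."""
    return 0 if ds is None else len(ds)


def cost_receive(ds_out):
    """Receive: |ds_out|."""
    return card(ds_out)


def cost_reply(ds_in):
    """Reply: |ds_in|."""
    return card(ds_in)


def cost_invoke(ds_in, ds_out):
    """Invoke: |ds_in| + |ds_out| (sends and receives data)."""
    return card(ds_in) + card(ds_out)


def cost_switch(path_probs, ds_in):
    """Switch: sum over i = 1..n of P(path_i) * i * |ds_in|.

    path_probs[i - 1] holds P(path_i); paths are numbered from 1 in the
    order in which their expressions are evaluated.
    """
    n_in = card(ds_in)
    return sum(p * i * n_in for i, p in enumerate(path_probs, start=1))


# Pure write interaction: ds_out is NULL, so only the 3 sent rows count.
write_cost = cost_invoke([1, 2, 3], None)

# Three switch paths over an input data set of 100 rows.
switch_cost = cost_switch([0.5, 0.25, 0.25], list(range(100)))
print(write_cost, switch_cost)
```

In the Switch example, later paths contribute more per unit of probability because reaching path $i$ implies evaluating $i$ expressions over $ds_{in}$, so placing frequently taken paths first lowers the total.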
