25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


3.3 Periodical Re-Optimization

Table 3.4: Example Execution Statistics

PID      NID  OType    Start  End   ...  W [ms]
1 (p_1)  1    Receive   4.1    5.7  ...  1.6
1 (p_1)  2    Invoke    5.9    7.7  ...  1.8
1 (p_1)  -    Plan      4.0    8.0  ...  4.0
2 (p_2)  1    Receive  19.1   20.3  ...  1.2
2 (p_2)  2    Invoke   20.4   21.7  ...  1.3
2 (p_2)  -    Plan     19.0   22.0  ...  3.0
3 (p_3)  1    Receive  24.2   26.1  ...  1.9
3 (p_3)  2    Invoke   26.1   27.9  ...  1.8
3 (p_3)  -    Plan     24.0   28.0  ...  4.0

Basically, the optimization algorithm is triggered with period $\Delta t = 16$ ms. At those timestamps $T_k$, only statistics of plan instances $p_i \in [T_k - \Delta w, T_k]$ are used for cost estimation. Hence, at $T_1 = 13$, only statistics of $p_1$ are included, while at $T_2 = 29$ ($T_1 + \Delta t$), statistics of $p_2$ and $p_3$ are used. As a result, we are able to periodically estimate the costs of a plan with the aim of optimizing this plan according to the current workload characteristics.
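The window selection described above can be sketched in a few lines of Python. This is only an illustration over the plan-level rows of Table 3.4, not the thesis implementation: the names `PlanStat`, `window_stats` and `avg_execution_time` are hypothetical, the rule that a plan instance must start and end inside the window is an assumption, and the window size $\Delta w = 12$ ms is an assumed value chosen so that the selection matches the example (the page itself does not state $\Delta w$).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStat:
    """Plan-level execution statistics of one plan instance (times in ms)."""
    pid: int
    start: float
    end: float

    @property
    def w(self) -> float:
        # Execution time W = End - Start, as in the last column of Table 3.4.
        return self.end - self.start


# Plan-level rows of Table 3.4.
PLAN_STATS = [
    PlanStat(1, 4.0, 8.0),
    PlanStat(2, 19.0, 22.0),
    PlanStat(3, 24.0, 28.0),
]


def window_stats(stats, t_k, delta_w):
    """Select instances that ran entirely within [t_k - delta_w, t_k].

    Assumption: an instance counts only if both its start and its end
    fall into the window.
    """
    lower = t_k - delta_w
    return [s for s in stats if lower <= s.start and s.end <= t_k]


def avg_execution_time(stats):
    """Average plan execution time over the selected instances, or None."""
    if not stats:
        return None
    return sum(s.w for s in stats) / len(stats)


DELTA_T = 16.0  # optimization period from the text
DELTA_W = 12.0  # ASSUMED window size, not given on this page
T1 = 13.0
T2 = T1 + DELTA_T

for t_k in (T1, T2):
    selected = window_stats(PLAN_STATS, t_k, DELTA_W)
    print(t_k, [s.pid for s in selected], avg_execution_time(selected))
```

Under these assumptions, the selection at $T_1$ contains only $p_1$ and the selection at $T_2$ contains $p_2$ and $p_3$, in line with the example.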

In the following, we discuss the complexity of this approach. Essentially, the periodic plan optimization problem, which includes the creation of the optimal plan, is NP-hard. This claim is justified by the known complexity of two subproblems (concrete optimization techniques). First, merging parallel flows (see the Fork operator) into a minimal number of parallel flows, subject to a maximum constraint on the total costs of each such flow, corresponds to the NP-hard bin packing problem. Second, the subproblem of join enumeration is, in general, also NP-hard, but this requires a more detailed argumentation:

• A plan is a hierarchy of sequences (Definition 2.1) with control-flow semantics (that subsume the data-flow semantics). Hence, all types of join queries are possible.

• Join enumeration is NP-hard in general (if all types of join queries are supported) [Neu09]. For a comprehensive analysis of known results, see [Moe09].

• If a cost model has the ASI (adjacent sequence interchange) property, join enumeration can be computed in polynomial time. There, ranks are assigned to relations and the sequence of ordered ranks is optimal [IK84, KBZ86, CM95].

• Our cost model does not exhibit the ASI property (see Subsection 3.2.2).
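To make the first subproblem more concrete, the following Python sketch treats the merging of parallel flows as bin packing: each flow is an item whose size is its cost, and each merged flow is a bin whose capacity is the maximum cost constraint. It uses the standard first-fit decreasing heuristic, which is a generic approximation and not the technique used in this work; the function name `merge_flows_ffd` and the integer costs are hypothetical.

```python
def merge_flows_ffd(flow_costs, max_cost):
    """Merge parallel flows with the first-fit decreasing heuristic.

    flow_costs: costs of the individual parallel flows (hypothetical values).
    max_cost:   maximum total cost allowed for one merged flow.
    Returns a list of merged flows, each given as a list of original costs.
    The result is a heuristic solution and not necessarily minimal.
    """
    merged = []
    # Place expensive flows first; this tends to leave fewer unusable gaps.
    for cost in sorted(flow_costs, reverse=True):
        if cost > max_cost:
            raise ValueError(f"flow cost {cost} exceeds the constraint {max_cost}")
        for group in merged:
            # First fit: use the first merged flow that still has room.
            if sum(group) + cost <= max_cost:
                group.append(cost)
                break
        else:
            # No existing merged flow can take this one; open a new flow.
            merged.append([cost])
    return merged


if __name__ == "__main__":
    print(merge_flows_ffd([4, 3, 3, 2, 2, 1], max_cost=5))
```

For the six hypothetical flows in the usage line, the heuristic produces three merged flows. Since the total cost is 15 and the constraint is 5, three is also the minimum here, but in general the heuristic may need more flows than an optimal solution.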

As a result, the periodic plan optimization problem belongs to the complexity class of NP-hard problems. The analogous problem of cost-based query optimization in DBMS is also NP-hard. However, in [SMWM06], it was shown that the optimal Web service query plan can be computed in $O(n^5)$, where $n$ is the number of Web services. The difference is caused by their assumption of negligible local processing costs (including joins and other data-flow-oriented operators), so that no join enumeration was used. In contrast, our optimization objective is to minimize the average total execution costs including local processing steps. In order to ensure efficient periodical re-optimization, we will introduce tailor-made search space reduction heuristics in Subsection 3.3.2.
