25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

4 Vectorizing Integration Flows

[Figure 4.7: Rewriting Invoke Operators. (a) Dependency Graph $DG(P_8)$ of Subplan $P_8$; (b) Vectorized Subplan $P'_8$]

4.2.3 Cost Analysis

We already discussed the cost analysis of sequences of operators, where each operator $o_i$ has a single data dependency on the previous operator $o_{i-1}$. In addition, we now investigate the costs with regard to the specific rewriting results. For this purpose, we reuse the idealized model in which each operator exhibits constant costs of $W(o_i) = 1$. As in the case of operator sequences, this can be extended to arbitrary operator costs.
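To make the idealized model concrete for a plain operator sequence, the following Python sketch compares both execution modes with $W(o_i) = 1$. It assumes that a vectorized sequence of $m$ operators behaves like a classic pipeline, so $n$ messages need $n + m - 1$ time units instead of $n \cdot m$ for instance-based execution. The function names are hypothetical, and the discrete-time simulation is an illustration rather than the thesis's own cost model implementation.

```python
def instance_based_steps(n: int, m: int) -> int:
    """Idealized instance-based cost: each of the n messages traverses all m
    unit-cost operators before the next message is started."""
    return n * m


def pipelined_steps(n: int, m: int) -> int:
    """Simulate a fully vectorized sequence of m unit-cost operators.

    Every operator acts as a pipeline stage handling one message per time
    unit; a message enters stage k only after stage k has finished the
    previous message and stage k-1 has handed this message over.
    """
    finish = [0] * m  # finish[k]: time at which stage k finished its latest message
    for _ in range(n):
        ready = 0  # assumption: all n messages are queued at time 0
        for k in range(m):
            finish[k] = max(ready, finish[k]) + 1  # W(o_k) = 1
            ready = finish[k]
    return finish[-1] if m > 0 else 0


if __name__ == "__main__":
    print(instance_based_steps(1000, 5))  # 5000
    print(pipelined_steps(1000, 5))       # 1004 = n + m - 1
```

Each message occupies a stage for exactly one time unit, so after the pipeline has filled ($m$ units for the first message) one message completes per unit, which yields $n + m - 1$.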

Parallel data flow branches. In the case of different data flow branches, messages are processed by $|r|$ concurrent pipelines within one vectorized plan, where a single pipeline $r_i$ contains $|r_i|$ operators. Examples of this type of branching are simply overlapping data dependencies, as well as the Switch and the Fork operator. For multiple branches with $|r| \geq 2$, the idealized costs for processing $n$ messages are

$$
\begin{aligned}
W(P) &= \begin{cases}
n \cdot m & \text{instance-based (unoptimized)}\\
n \cdot \max_{i=1}^{|r|} (|r_i|) & \text{instance-based (optimized, if applicable)}
\end{cases}\\
W(P') &= n + \max_{i=1}^{|r|} (|r_i|) - 1 \qquad \text{fully vectorized,}
\end{aligned}
\tag{4.6}
$$

where $\max_{i=1}^{|r|} (|r_i|)$ denotes the length of the longest branch. The benefit compared to inter-operator (horizontal) parallelism depends on the optimization techniques that could be applied to the instance-based representation, because not all operators without data dependencies can be parallelized (e.g., a sequence of writing interactions). Furthermore, in the case of $|r| = 1$ and thus $|r_1| = m$, the general cost analysis still holds. The improvement is caused by the higher degree of parallelism. However, the presence of parallel data flow branches may also cause overhead for vectorized plans with regard to splitting and merging those branches. An example of splitting branches is the Copy operator, which is used for multiple dependencies on one message. Examples of merging branches are the And operator for synchronizing external writes as well as the Xor operator for synchronizing the Switch operator.
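Equation 4.6 and the remark on splitting and merging overhead can be checked with a small discrete-time simulation. This Python sketch is a hypothetical illustration, not the thesis's implementation: it assumes unit-cost operators, takes $m$ as the sum of the branch lengths (so Copy and And are not counted in the closed form), and, when `split_merge=True`, models Copy and And as one additional unit-cost stage each. The names `idealized_costs`, `run_stages`, and `simulate_vectorized` are invented for this example.

```python
from __future__ import annotations

from typing import Sequence


def idealized_costs(n: int, branch_lengths: Sequence[int]) -> dict[str, int]:
    """Closed-form costs of Equation 4.6 for n messages.

    Assumption: m is the sum of the branch lengths, i.e. splitting and
    merging operators are not counted.
    """
    if not branch_lengths:
        raise ValueError("at least one branch is required")
    m = sum(branch_lengths)
    longest = max(branch_lengths)  # max_{i=1}^{|r|} |r_i|
    return {
        "instance_unoptimized": n * m,
        "instance_optimized": n * longest,
        "fully_vectorized": n + longest - 1,
    }


def run_stages(ready: int, stages: list[int]) -> int:
    """Pass one message through a chain of unit-cost stages (W = 1).

    stages[k] is the time stage k finished its previous message and is
    updated in place; returns the time the message leaves the chain.
    """
    for k in range(len(stages)):
        stages[k] = max(ready, stages[k]) + 1
        ready = stages[k]
    return ready


def simulate_vectorized(n: int, branch_lengths: Sequence[int],
                        split_merge: bool = False) -> int:
    """Discrete-time simulation of |r| concurrent branch pipelines.

    If split_merge is True, a hypothetical unit-cost Copy stage precedes the
    branches and a unit-cost And stage waits until every branch has
    delivered a message before emitting it.
    """
    if not branch_lengths:
        raise ValueError("at least one branch is required")
    copy_stage = [0] if split_merge else []
    and_stage = [0] if split_merge else []
    branches = [[0] * k for k in branch_lengths]
    done = 0
    for _ in range(n):
        split = run_stages(0, copy_stage)  # all messages are queued at t = 0
        arrivals = [run_stages(split, stages) for stages in branches]
        done = run_stages(max(arrivals), and_stage)  # merge: wait for all branches
    return done


if __name__ == "__main__":
    print(idealized_costs(100, [3, 2, 4]))
    # {'instance_unoptimized': 900, 'instance_optimized': 400, 'fully_vectorized': 103}
    print(simulate_vectorized(100, [3, 2, 4]))                    # 103
    print(simulate_vectorized(100, [3, 2, 4], split_merge=True))  # 105
```

In this simplified model, the vectorized cost is dominated by the longest branch, and the assumed Copy and And stages only lengthen the critical path by a constant of two time units; other overheads of splitting and merging are not modeled. For a single branch ($|r| = 1$), both instance-based variants coincide at $n \cdot m$ and the vectorized cost reduces to $n + m - 1$.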

Rolled-out Iteration. Further, the rewriting of Iteration operators with foreach semantics needs some consideration. Here, we split messages according to the foreach condition and execute the iteration body as an inner pipeline without cyclic dependencies. Finally, the processed sub-messages are merged using the Setoperation operator (UNION
