25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...

Cost-Based Optimization of Integration Flows - Datenbanken ...

Cost-Based Optimization of Integration Flows - Datenbanken ...

SHOW MORE
SHOW LESS
  • No tags were found...

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

4 Vectorizing <strong>Integration</strong> <strong>Flows</strong><br />

Algorithm 4.1 Plan Vectorization (A-PV)<br />

Require: operator sequence o<br />

1: B ← ∅, D ← ∅, Q ← ∅<br />

2: for i ← 1 to |o| do // for each operator o i<br />

3: // Part 1: Dependency Analysis<br />

4: for j ← i to |o| do // for each following operator o j<br />

5:<br />

δ<br />

if ∃o j → oi then<br />

6: Q ← Q ∪ q with q ← createQueue()<br />

7: D ← D ∪ d〈o j , q, o i 〉 with d〈o j , q, o i 〉 ← createDependency()<br />

8: // Part 2: Graph Creation<br />

9: if o i ∈ Switch, Iteration, Validate, Signal, Savepoint, Invoke then<br />

10: // see rules on rewriting context-specific operators<br />

11: else<br />

12: b(o i ) ← createBucket(o i )<br />

13: for k ← 1 to |D| do // for each dependency d<br />

14: d〈o y , q, o x 〉 ← d k<br />

15: if o i ≡ o x then<br />

16: b(o i ).addOutputQueue(q)<br />

17: else if o i ≡ o y then<br />

18: b(o i ).addInputQueue(q)<br />

19: if b(o i ).countOutputQueues() ≥ 2 then<br />

20: createCopyOperatorAfter(b(o i ), B, D, Q)<br />

21: else if b(o i ).countOutputQueues() = 0 then<br />

22: createLogicalQueue(b(o i−1 ), b(o i ), B, D, Q)<br />

23: B ← B ∪ b(o i )<br />

24: return B<br />

create the graph <strong>of</strong> execution buckets. This part contains the following three steps. First,<br />

we create an execution bucket for each operator. Second, we connect each operator with<br />

the referenced input queues. Clearly, each queue is referenced by exactly one operator,<br />

but each operator can reference multiple queues. Third, we connect each operator with<br />

the referenced output queues. If one operator must be connected to n output queues with<br />

n ≥ 2 (its results are used by multiple following operators), we insert a Copy operator<br />

after this operator in order to send the message to all dependency sources. Within the<br />

createCopyOperatorAfter function the new operator, execution bucket and queue are<br />

created as well as the dependencies and queue references are changed accordingly. Furthermore,<br />

if an operator is connected to zero input queues, we insert an artificial dummy queue<br />

(where empty message objects are passed) between this operator and its former temporal<br />

predecessor. <strong>Based</strong> on this basic rewriting concept, additional rewriting rules for contextspecific<br />

operators (e.g., Switch, Iteration) and for serialization and recoverability are<br />

required (line 10). We use an example to illustrate the A-PV in more detail.<br />

Example 4.2 (Automatic Plan Vectorization). Recall our example plan P 2 as well as the<br />

vectorized plan P 2 ′ <strong>of</strong> Example 4.1. Figure 4.5(a) and 4.5(b) illustrate the core aspects<br />

when rewriting P 2 to P 2 ′ . First, we determine all direct data dependencies δ and add those<br />

to the set <strong>of</strong> dependencies D. If an operator is referenced by multiple dependencies (e.g.,<br />

operator o 1 in this example), we need to insert a Copy operator just after it. Furthermore,<br />

94

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!