25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


5.2 Horizontal Queue Partitioning

denote the average selectivity of a single partitioning attribute. Thus, $1/sel(ba_i)$ represents the number of distinct values of this attribute, which is equivalent to the worst-case number of partitions at index level $i$. These partitions are unsorted with respect to the partitioning attribute. Thus, due to the resulting linear comparison of partitions at each index level, the enqueue operation exhibits a worst-case time complexity of $O(\sum_{i=1}^{h} 1/sel(ba_i))$. In contrast, the dequeue operation exhibits a constant worst-case time complexity of $O(1)$ because it simply removes the first top-level partition. In conclusion, there is only moderate overhead for maintaining partition trees if the selectivity is not too low.
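To make this cost asymmetry concrete, the following is a minimal Python sketch of a single index level of a partition tree. It is an illustration under stated assumptions, not the author's implementation: the class name `PartitionList`, the `comparisons` counter, and the string messages are hypothetical. Enqueue scans the unsorted partitions linearly, while dequeue removes the first partition in constant time.

```python
from collections import deque


class PartitionList:
    """One index level of a partition tree: an unsorted list of partitions.

    Hypothetical sketch; names are assumptions, not taken from the source.
    """

    def __init__(self):
        # Each entry is (partitioning attribute value, list of messages).
        self.partitions = deque()
        # Counts partition comparisons to make the linear scan visible.
        self.comparisons = 0

    def enqueue(self, value, message):
        # Linear scan: in the worst case every existing partition is compared,
        # i.e. 1/sel(ba_i) comparisons at this level.
        for part_value, messages in self.partitions:
            self.comparisons += 1
            if part_value == value:
                messages.append(message)
                return
        # No partition for this value yet: append a new one at the end.
        self.partitions.append((value, [message]))

    def dequeue(self):
        # Constant time: simply remove the first (oldest) partition.
        return self.partitions.popleft()
```

Enqueuing messages for the customers CustA, CustC, CustA, and CustB in this order creates three partitions, and a subsequent `dequeue()` returns the CustA partition with both of its messages.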

However, in order to ensure robustness with regard to arbitrary selectivities, we extend the basic structure of a partition tree to the hash partition tree. Figure 5.7 illustrates an example of its extended queue data structure. Essentially, a hash partition tree is a partition tree with a hash table as a secondary index structure over the partitioning attribute values (primarily applicable for attribute types value and range).

[Figure: hash table h(ba1) with buckets 0 to 3 over the partitioning attribute ba1 (Customer), referencing the partition list ordered from last to first: partition b5 ["CustB"], partition b2 ["CustC"], partition b1 ["CustA"], with tc(b5) > tc(b2) > tc(b1)]

Figure 5.7: Example Hash Partition Tree

This hash table is used to probe (get) whether a value of a partitioning attribute already exists. If so, we insert the message into the corresponding partition; otherwise, we create a new partition, append it to the end of the list, and put a reference to this partition into the hash table as well. Accordingly, the dequeue operation still gets the first partition from the list but additionally removes the pointer to this partition from the hash table. As a result, we reduce the complexity of both operations, enqueue and dequeue, to a constant time complexity of $O(1)$ for the case where no serialized external behavior (SEB) is required. Despite this probing capability, for SEB we additionally need to determine the number of messages that have been outrun if a related partition already exists. As a result, for the case where SEB is required, the worst-case complexity is still $O(\sum_{i=1}^{h} 1/sel(ba_i))$, but we benefit if the partition does not exist already.

The requirement of serialized external behavior (SEB) implies the synchronization of messages at the outbound side. Therefore, we extended the message structure by a counter $c$ with $c \in \mathbb{Z}^+$ to $m_i = (t_i, c_i, d_i, a_i)$. If a message $m_i$ outruns another message $m_j$ during the enqueue operation, its counter $c_i$ is incremented by one. The serialization at the outbound side is then realized by comparing source message IDs, similar to the serialization concept of Chapter 4, and for each reordered message, the counter is decremented by one. Thus, at the outbound side, we are not allowed to send message $m_i$ until $c_i = 0$.
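The counter mechanism can be illustrated with a simplified Python model, given here as a sketch under several assumptions rather than as the author's algorithm. The `Message` fields, the functions `enqueue` and `serialize`, the use of increasing integer `msg_id` values as source message IDs, and the restriction to a single index level where all messages are enqueued before any are dequeued are all hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int       # assumed source message ID, increasing in arrival order
    key: str          # value of the partitioning attribute
    counter: int = 0  # c_i: number of messages this message has outrun


def enqueue(partitions, msg):
    """Insert msg into its partition and count the messages it outruns."""
    for pos, (key, messages) in enumerate(partitions):
        if key == msg.key:
            # msg jumps ahead of every message in the later partitions;
            # the counter grows by one per outrun message.
            for _, later in partitions[pos + 1:]:
                msg.counter += len(later)
            messages.append(msg)
            return
    partitions.append((msg.key, [msg]))


def serialize(dequeued):
    """Outbound side: hold each message until its counter is zero."""
    sent, held = [], []
    for msg in dequeued:
        # An arriving message with a smaller ID was outrun by the held ones.
        for h in held:
            if msg.msg_id < h.msg_id:
                h.counter -= 1
        if msg.counter > 0:
            held.append(msg)
        else:
            sent.append(msg)
        # Release held messages whose counter dropped to zero, in ID order.
        sent.extend(sorted((h for h in held if h.counter == 0),
                           key=lambda h: h.msg_id))
        held = [h for h in held if h.counter > 0]
    return sent
```

In this model, messages arriving as CustA, CustC, CustB, CustA, CustC are dequeued partition by partition, so the fourth and fifth messages leave the queue early with nonzero counters. `serialize` holds them back until the messages they outran have passed, which restores the original arrival order at the outbound side.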

This counter-based serialization concept (see footnote 13) is required in addition to the concept of AND and XOR serialization operators (introduced in Chapter 4) in order to allow out-of-order execution rather than just parallel execution of concurrent operator pipelines. Despite this serialization concept, we cannot execute message partitions in parallel because this would

13 Counter-based serialization also works for CN:CM multiplicities between input and output messages, where locally created messages get the counter of the input messages. For example, N:C multiplicities arise if there are writing interactions in paths of a Switch operator. This is addressed by periodically flushing the outbound queues, where all counters of messages that exceed lc are set to zero.

