Cost-Based Optimization of Integration Flows - Datenbanken ...
5.2 Horizontal Queue Partitioning
denote the average selectivity of a single partitioning attribute. Thus, 1/sel(ba_i) represents the number of distinct values of this attribute, which is equivalent to the worst-case number of partitions at index level i. These partitions are unsorted according to the partitioning attribute. Thus, due to the resulting linear comparison of partitions at each index level, the enqueue operation exhibits a worst-case time complexity of O(∑_{i=1}^{h} 1/sel(ba_i)). In contrast, the dequeue operation exhibits a constant worst-case time complexity of O(1) because it simply removes the first top-level partition. In conclusion, there is only moderate overhead for maintaining partition trees if the selectivity is not too low.
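To make the complexity argument concrete, the following minimal sketch (our own illustration, not the thesis implementation) models a single index level of a partition tree; the names and message representation are hypothetical. It shows the linear partition comparison on enqueue and the O(1) removal of the first top-level partition on dequeue:

```python
from collections import deque

class PartitionTree:
    """Single index level of a partition tree: an ordered list of
    partitions, one per distinct value of the partitioning attribute."""

    def __init__(self):
        self.partitions = deque()  # entries: [attribute_value, messages]

    def enqueue(self, msg, ba_value):
        # Partitions are unsorted w.r.t. the attribute, so we compare
        # linearly: worst case 1/sel(ba_i) partitions at this level.
        for entry in self.partitions:
            if entry[0] == ba_value:
                entry[1].append(msg)
                return
        # No partition for this value yet: create one, append at the end.
        self.partitions.append([ba_value, deque([msg])])

    def dequeue(self):
        # O(1): simply remove the first top-level partition.
        return self.partitions.popleft()
```

With few distinct attribute values (high selectivity) the linear scan stays short, which matches the observation that the maintenance overhead is moderate unless the selectivity is very low.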
However, in order to ensure robustness with regard to arbitrary selectivities, we extend the basic structure of a partition tree to the hash partition tree. Figure 5.7 illustrates an example of its extended queue data structure. Essentially, a hash partition tree is a partition tree with a hash table as a secondary index structure over the partitioning attribute values (primarily applicable for the attribute types value and range).
[Figure 5.7: Example Hash Partition Tree — a hash table h(ba1) over the partitioning attribute ba1 (Customer) references the partitions b5 ["CustB"], b2 ["CustC"], and b1 ["CustA"] in the list, ordered by creation time tc(b5) > tc(b2) > tc(b1).]
This hash table is used to probe (get) whether a value of a partitioning attribute already exists. If so, we insert the message into the corresponding partition; otherwise, we create a new partition, append it at the end of the list, and put a reference to this partition into the hash table as well. Accordingly, the dequeue operation still gets the first partition from the list but additionally removes the pointer to this partition from the hash table. As a result, we reduced the complexity of both operations, enqueue and dequeue, to constant time complexity of O(1) for the case where no serialized external behavior (SEB) is required. Despite this probe possibility, for SEB, we additionally need to determine the number of messages that have been outrun if a related partition already exists. As a result, for the case where SEB is required, the worst-case complexity is still O(∑_{i=1}^{h} 1/sel(ba_i)), but we benefit if the partition does not yet exist.
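The probe-based enqueue can be sketched as follows (again our own illustration with hypothetical names; a Python dict, which preserves insertion order, stands in for both the partition list and its secondary hash index):

```python
from collections import deque

class HashPartitionTree:
    """Partition tree with a hash table as secondary index over the
    partitioning attribute values (single level, illustrative only)."""

    def __init__(self):
        # A dict preserves insertion order, so it models the partition
        # list (oldest partition first) and the hash index at once.
        self.index = {}  # attribute value -> deque of messages

    def enqueue(self, msg, ba_value):
        # Probe (get) whether the attribute value already exists: O(1).
        part = self.index.get(ba_value)
        if part is None:
            part = deque()
            self.index[ba_value] = part  # new partition at end of list
        part.append(msg)

    def dequeue(self):
        # Get the first partition and remove its hash-table pointer: O(1).
        value = next(iter(self.index))
        return value, self.index.pop(value)
```

Without SEB, both operations touch only one hash-table entry, which is the constant-time behavior described above; the SEB case would additionally have to count outrun messages across the existing partitions.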
The requirement of serialized external behavior (SEB) implies the synchronization of messages at the outbound side. Therefore, we extended the message structure by a counter c with c ∈ Z⁺ to m_i = (t_i, c_i, d_i, a_i). If a message m_i outruns another message m_j during the enqueue operation, its counter c_i is incremented by one. The serialization at the outbound side is then realized by comparing source message IDs, similar to the serialization concept of Chapter 4, and for each reordered message, the counter is decremented by one. Thus, at the outbound side, we are not allowed to send message m_i until c_i = 0. This counter-based serialization concept¹³ is required in addition to the concept of AND and XOR serialization operators (introduced in Chapter 4) in order to allow out-of-order execution rather than just parallel execution of concurrent operator pipelines. Despite this serialization concept, we cannot execute message partitions in parallel because this would
¹³Counter-based serialization also works for CN:CM multiplicities between input and output messages, where locally created messages get the counter of the input messages. For example, N:C multiplicities arise if there are writing interactions in paths of a Switch operator. This is addressed by periodically flushing the outbound queues, where all counters of messages that exceed lc are set to zero.
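The counter-based serialization described above can be sketched as a simple outbound gate (our own illustration; the message IDs and the `arrive`/`reordered` interface are hypothetical, not part of the thesis):

```python
class OutboundGate:
    """Holds back a message m_i whose counter c_i > 0, i.e. that outran
    c_i other messages during enqueue; m_i is sent once c_i == 0."""

    def __init__(self):
        self.held = {}  # message ID t_i -> remaining counter c_i
        self.sent = []  # messages released at the outbound side

    def arrive(self, t_i, c_i):
        # c_i was incremented once per message m_j that m_i outran.
        if c_i == 0:
            self.sent.append(t_i)  # outran nobody: send immediately
        else:
            self.held[t_i] = c_i   # hold until the counter drains

    def reordered(self, t_i):
        # One outrun message has been processed: decrement c_i
        # and send m_i as soon as its counter reaches zero.
        self.held[t_i] -= 1
        if self.held[t_i] == 0:
            del self.held[t_i]
            self.sent.append(t_i)
```

The gate only delays messages that actually overtook others, so in-order messages pass through without synchronization cost.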