Cost-Based Optimization of Integration Flows - Datenbanken ...
5.3 Periodical Re-Optimization
benefit from a partitioning attribute (e.g., for writing interactions of an Invoke operator). However, in some cases (e.g., operations on externally loaded data), all operators can benefit from partitioning as well, because they are inherently executed only once and are just expanded to the batch size if required (e.g., by the first binary operator that receives a message partition as one of its inputs). For example, since we load data once for a whole partition of messages, we can also execute any subsequent transformation of this loaded data only once.
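The "load once, transform once, expand later" pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's implementation: the function `execute_partitioned`, the `customer` partitioning attribute, and the simulated loader `load_customer` are all hypothetical. Messages are grouped by the value of the partitioning attribute, the external load and the subsequent transformation run once per partition, and the shared result is only then expanded to one entry per message.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]


def execute_partitioned(
    messages: List[Message],
    key: str,
    load: Callable[[Any], Any],
    transform: Callable[[Any], Any],
) -> List[Tuple[Any, Any]]:
    """Hypothetical sketch: run load + transform once per message partition."""
    # Horizontal partitioning: group messages by the partitioning attribute value.
    partitions: Dict[Any, List[Message]] = defaultdict(list)
    for msg in messages:
        partitions[msg[key]].append(msg)

    results: List[Tuple[Any, Any]] = []
    for value, batch in partitions.items():
        data = load(value)        # external load: once per partition, not per message
        shaped = transform(data)  # subsequent transformation: also only once
        # Expand the shared result to the batch size (one entry per message).
        results.extend((msg["id"], shaped) for msg in batch)
    return results


# Usage (hypothetical data): three messages, two distinct attribute values.
loaded_keys: List[Any] = []


def load_customer(customer: str) -> Dict[str, Any]:
    loaded_keys.append(customer)  # record each simulated external call
    return {"customer": customer, "rating": len(customer)}


msgs = [
    {"id": 1, "customer": "ACME"},
    {"id": 2, "customer": "Initech"},
    {"id": 3, "customer": "ACME"},
]
out = execute_partitioned(msgs, "customer", load_customer, lambda d: d["rating"] * 10)
print(loaded_keys)  # ['ACME', 'Initech']
print(out)          # [(1, 40), (3, 40), (2, 70)]
```

Three messages trigger only two loads and two transformations; the per-message work is deferred to the final expansion step.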
For operators that do not benefit from partitioning, the abstract costs are computed by $C(o'_i, k') = C(o_i) \cdot k'$, and the execution time can be computed either by $W(o'_i, k') = W(o_i) \cdot k'$ or by $W(o'_i, k') = W(o_i) \cdot C(o'_i, k') / C(o_i)$. Finally, for $k' = 1$, we obtain the instance-based costs with $C(o'_i, k') = C(o_i)$ and $W(o'_i, k') = W(o_i)$. Thus, instance-based execution is a special case of the execution of horizontally partitioned message batches. As a result, in theory, partitioning cannot degrade the performance of any operator, because the cost per message, $C(o'_i, k')/k'$, never exceeds the instance-based cost $C(o_i)$.
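A minimal Python sketch of this per-operator model is shown below. The `Operator` record, the function names, and the numeric costs are hypothetical. Giving operators that benefit from partitioning a cost independent of $k'$ is a simplifying assumption taken from the treatment of $o_1$ to $o_3$ in Example 5.7, not a general rule; all other operators use $C(o'_i, k') = C(o_i) \cdot k'$, and the execution time is derived through the cost ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator record for the extended cost model."""
    name: str
    cost: float     # instance-based abstract cost C(o_i); assumed > 0
    time: float     # monitored average execution time W(o_i), in seconds
    benefits: bool  # True if the operator benefits from partitioning


def partitioned_cost(op: Operator, k: int) -> float:
    """Abstract cost C(o'_i, k') for a partition of k' messages."""
    if k < 1:
        raise ValueError("a message partition contains at least one message")
    if op.benefits:
        # Simplifying assumption (as for o_1..o_3 in Example 5.7): cost independent of k'.
        return op.cost
    return op.cost * k  # C(o'_i, k') = C(o_i) * k'


def partitioned_time(op: Operator, k: int) -> float:
    """Execution time W(o'_i, k') = W(o_i) * C(o'_i, k') / C(o_i)."""
    return op.time * partitioned_cost(op, k) / op.cost


# Hypothetical operators with made-up costs and times.
translate = Operator("translate", cost=2.0, time=0.5, benefits=False)
load = Operator("load", cost=4.0, time=0.25, benefits=True)

for k in (1, 4):
    print(k, partitioned_time(translate, k), partitioned_time(load, k))
# 1 0.5 0.25
# 4 2.0 0.25
```

For $k' = 1$, both functions return the instance-based values, which mirrors the special case stated above.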
In addition to the operators mentioned above, further operators might also benefit from partitioning, for example, the Join, Selection, and Groupby operators. However, because complete messages (with tree-structured data) are partitioned, partitioning applies to these operators only in the specific case where a message contains a single tuple (to which the value of the partitioning attribute refers). Hence, we do not consider these operators, because the possible benefit is very limited. Nevertheless, these operators could be included with benefit if streaming of message parts (e.g., one part per tuple) [PVHL09a, PVHL09b] is applied, because we could then execute Join, Selection, and Groupby operators efficiently on whole batches of these parts. We use our example plan $P_2$ to illustrate the overall cost estimation in detail.
Example 5.7 (Extended Cost Estimation). Recall the rewritten plan $P'_2$ (Figure 5.5) and assume $k'$ messages per message partition. Using the extended cost model, we can estimate the execution time $W(P'_2, k')$. The monitored average execution times $W(o_i)$ are shown in the table in Figure 5.11. We compute $W(P'_2, k')$ as follows:
$$
\begin{aligned}
W(P'_2, k') &= \sum_{i=1}^{m} W(o'_i, k') = W(o_1) + W(o_2) + W(o_3) + W(o_4) \cdot k' + W(o_5) \cdot k' + W(o_6) \cdot k' \\
&= W(o_1) + W(o_2) + W(o_3) + \bigl(W(o_4) + W(o_5) + W(o_6)\bigr) \cdot k'
\end{aligned}
$$
Operators $o_1$, $o_2$, and $o_3$ benefit from partitioning; hence, we assign them costs that are independent of $k'$, while the costs of operators $o_4$, $o_5$, and $o_6$ increase linearly with $k'$. Using this cost function of $P'_2$, we can estimate the execution time for an arbitrary number
| Operator $o_i$ | Execution Time $W(o_i)$ |
|---|---|
| $o_1$ | 0.01 s |
| $o_2$ | 0.015 s |
| $o_3$ | 0.3 s |
| $o_4$ | 0.055 s |
| $o_5$ | 0.02 s |
| $o_6$ | 0.13 s |
| $P'_2$ (total) | 0.53 s |

Figure 5.11: Relative Execution Time $W(P'_2, k')/k'$
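To make the cost function of Example 5.7 concrete, the sketch below plugs in the monitored times from Figure 5.11, which yields $W(P'_2, k') = 0.325\,\text{s} + 0.205\,\text{s} \cdot k'$, and evaluates the relative execution time $W(P'_2, k')/k'$ for a few partition sizes. The variable and function names are illustrative only, and the chosen values of $k'$ are arbitrary.

```python
# Monitored average execution times W(o_i) in seconds, from Figure 5.11.
W = {"o1": 0.01, "o2": 0.015, "o3": 0.3, "o4": 0.055, "o5": 0.02, "o6": 0.13}
# Operators that benefit from partitioning (cost independent of k').
BENEFITING = {"o1", "o2", "o3"}


def plan_time(k: int) -> float:
    """W(P'_2, k') = W(o1) + W(o2) + W(o3) + (W(o4) + W(o5) + W(o6)) * k'."""
    fixed = sum(t for op, t in W.items() if op in BENEFITING)
    linear = sum(t for op, t in W.items() if op not in BENEFITING)
    return fixed + linear * k


def relative_time(k: int) -> float:
    """Relative (per-message) execution time W(P'_2, k') / k'."""
    return plan_time(k) / k


for k in (1, 2, 5, 10, 100):
    print(f"k'={k:>3}  W={plan_time(k):.4f} s  W/k'={relative_time(k):.4f} s")
```

For $k' = 1$, the sketch reproduces the instance-based total of 0.53 s from the table. As $k'$ grows, the fixed 0.325 s of $o_1$ to $o_3$ is amortized over more messages, so the per-message time falls toward $W(o_4) + W(o_5) + W(o_6) = 0.205$ s.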