Cost-Based Optimization of Integration Flows - Datenbanken ...


3 Fundamentals of Optimizing Integration Flows

for this example plan, the re-optimization time is dominated by the physical plan compilation and the waiting time for the next possible exchange of plans. Physical plans are generated regardless of whether a new plan was found, because some optimization techniques (such as switch path re-ordering) reorder operators directly and therefore always signal that recompilation is required. Clearly, for periodical re-optimization, we could use a larger ∆t and thus reduce the cumulative total optimization time. However, in that case, we would use suboptimal plans for a longer time and hence miss more optimization opportunities.

Furthermore, Figures 3.20(d) and 3.20(f) show the execution time using periodical re-optimization compared to the non-optimized execution. The different execution times are caused by the changing workload characteristics, that is, by different input cardinalities and different selectivities of the operators. When using periodical re-optimization, the frequently recurring small peaks are caused by the numerous asynchronous re-optimization steps. Further, a major characteristic of periodical re-optimization is that after a workload shift, there is an adaptation delay until periodical re-optimization is triggered. During this time, the execution time of the current plan is much longer than the execution time of the optimal plan. In order to reduce these adaptation delays, a small ∆t is required. However, this would significantly increase the total re-optimization time. To summarize, periodical re-optimization yields significant execution time reductions over time because it adapts to changing workload characteristics.
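The trade-off described above (a small ∆t shortens the adaptation delay but raises the total re-optimization time) can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the model used in the evaluation: the function name `periodic_reopt_cost`, the single workload shift, and all parameter values are hypothetical and chosen purely for illustration.

```python
import math


def periodic_reopt_cost(dt, horizon=3600.0, shift_at=1000.0, rate=10.0,
                        t_adapted=0.02, t_stale=0.05, t_reopt=0.5):
    """Toy model of periodical re-optimization (all defaults are hypothetical).

    dt        -- optimization period in seconds
    horizon   -- length of the scenario in seconds
    shift_at  -- time of a single workload shift in seconds
    rate      -- executed plan instances per second
    t_adapted -- execution time per instance with an adapted plan
    t_stale   -- execution time per instance with the outdated plan
    t_reopt   -- cost of one re-optimization step
    Returns (cumulative execution time, cumulative optimization time).
    """
    # Re-optimization is triggered at every multiple of dt within the horizon.
    n_reopt = math.floor(horizon / dt)
    # The first trigger at or after the shift ends the adaptation delay.
    next_tick = math.ceil(shift_at / dt) * dt
    delay = min(next_tick, horizon) - shift_at
    exec_time = rate * (t_adapted * (horizon - delay) + t_stale * delay)
    opt_time = n_reopt * t_reopt
    return exec_time, opt_time


if __name__ == "__main__":
    for dt in (1.0, 15.0, 60.0, 480.0, 3600.0):
        exec_time, opt_time = periodic_reopt_cost(dt)
        print(f"dt={dt:7.1f} s  execution={exec_time:8.1f} s  "
              f"optimization={opt_time:8.1f} s")
```

In this simplified model, the cumulative optimization time shrinks as ∆t grows, while the cumulative execution time grows with the adaptation delay, so neither extreme of ∆t is attractive.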
Figure 3.21: Influence of Optimization Interval ∆t. Panels: (a) Cumulative Execution Time, (b) Cumulative Opt. Time, (c) Scenario Elapsed Time

Second, we used the introduced comparison scenario to investigate the influence of the parameter ∆t in more detail. We re-executed this scenario with different optimization periods ∆t ∈ {1 s, 2 s, 3.75 s, 7.5 s, 15 s, 30 s, 60 s, 120 s, 240 s, 480 s, 960 s, 1800 s, 3600 s, 7200 s}. Figure 3.21(a) illustrates the resulting cumulative execution time for optimized execution compared to the unoptimized execution. We observe that the longer the optimization period, the higher the cumulative execution time, because we miss optimization opportunities after a workload shift due to deferred adaptation. However, it is important to note that if the optimization period is too small, the execution time gets worse again. The reason is that immediate asynchronous re-optimization yields only small benefits while the optimization costs increase, as shown in Figure 3.21(b).

So far, we have used only the cumulative execution time (the sum of plan execution times) as an indicator. We now also discuss the elapsed time (the time required for executing the sequence of plan instances, which includes the workflow engine overhead and the time spent exchanging plans). Figure 3.21(c) shows that for small optimization intervals ∆t, where we often exchange plans, the elapsed time increases faster than the cumulative execution time. The reason is asynchronous optimization but synchronous exchange of plans, where execution is blocked. As a result, cumulative execution time and elapsed time might have different optimal ∆t configurations. Thus, this parameter offers a way to fine-tune the optimizer.

3.5 Experimental Evaluation

Figure 3.22: Use Case Comparison of Periodical Re-Optimization. Panels: (a) Cumulative Execution Time, (b) Cumulative Optimization Time

Third, with regard to workability, we observed fairly similar results for our other example plans and statistic variations. Here, we once again compared periodical re-optimization with no optimization. In detail, we executed 20,000 plan instances for each example plan (P1, P2, P3, P4, P5, P6, P7, P8) and for each execution model. We fixed the cardinality of input data sets to d = 1 (100 kB messages) and used a well-balanced workload configuration (without correlations and without workload changes). Furthermore, we fixed an optimization interval of ∆t = 5 min, a sliding window size of ∆w = 5 min, and EMA as the workload aggregation method. To summarize, we consistently observe execution time reductions (see Figure 3.22(a)). In the following, we describe in detail how these benefits were achieved:

• P1: This plan was affected by three different optimization techniques. First, the technique WD1 reordered the two paths of Switch operator o2. Furthermore, the operator sequence (o7, o8, o9) was rewritten to parallel subflows (o7) and (o8, o9). Finally, the technique WC1 rescheduled the start of both subflows in order to start the most time-consuming subflow (o8, o9) first.
• P2: No optimization technique affected this plan.
• P3: Similar to plan P1, the techniques WC2 and WC1 were applied to the operator sequence (o2, o3) in order to rewrite this sequence to parallel subflows and to reschedule the start of these subflows.
In addition, the technique WD6 was applied in order to push down the invariant group-by, which exchanged the temporal order and data dependencies of operators o4 and o5.
• P4: For this rather complex plan, only the optimization technique WD9 was applied. In detail, the Join operator o9 was rewritten from a nested loop join to a subplan of two (concurrent) Orderby operators and one merge join.
• P5: Similar to our first end-to-end comparison scenario, the technique WD4 was applied to plan P5 with the aim of reordering the sequence of Selection operators (o2, o3, o4).
• P6: This plan was affected by a set of techniques. First, the initially given Fork operator was rescheduled by WC1. Furthermore, the techniques WD11 and WD8 rewrote the two subsequent Setoperation (UNION DISTINCT) operators to a sub-
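The rewrite described for P4 (technique WD9) replaces a nested loop join with two Orderby operators followed by a merge join. The following sketch illustrates why both forms produce the same result; it is not the thesis implementation. The function names, the tuple-based rows, and the sample data are hypothetical, and the two sorts run sequentially here although the rewritten subplan may execute them concurrently.

```python
def nested_loop_join(left, right, key):
    """Compare every pair of rows: |left| * |right| key comparisons."""
    return [(l, r) for l in left for r in right if key(l) == key(r)]


def sort_merge_join(left, right, key):
    """Sort both inputs on the join key, then merge them in one pass."""
    # The two sorts are independent of each other, which is what allows
    # the two Orderby operators of the rewritten subplan to run in parallel.
    ls = sorted(left, key=key)
    rs = sorted(right, key=key)
    out = []
    i = j = 0
    while i < len(ls) and j < len(rs):
        kl, kr = key(ls[i]), key(rs[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of rows sharing this key on both sides.
            i_end = i
            while i_end < len(ls) and key(ls[i_end]) == kl:
                i_end += 1
            j_end = j
            while j_end < len(rs) and key(rs[j_end]) == kl:
                j_end += 1
            # Emit all combinations of the two runs.
            for l in ls[i:i_end]:
                for r in rs[j:j_end]:
                    out.append((l, r))
            i, j = i_end, j_end
    return out


# Hypothetical sample data: rows are (join key, payload) tuples.
orders = [(2, "b"), (1, "a"), (3, "d"), (2, "c")]
customers = [(3, "y"), (4, "w"), (2, "x"), (3, "z")]


def first_field(row):
    return row[0]


nl_result = nested_loop_join(orders, customers, first_field)
sm_result = sort_merge_join(orders, customers, first_field)
```

Both functions return the same set of matching pairs, only in a possibly different order. Whether the merge-based subplan is actually cheaper depends on the input cardinalities, which is why such a rewrite is a cost-based decision rather than an unconditional rule.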