25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


2 Preliminaries and Existing Techniques

2.4.2 Vertical Integration: Real-Time ETL

In contrast to horizontal integration, the use case of vertical integration addresses the
consolidation of data from the operational source systems into dispositive and strategic
systems. In this context, data-centric ETL (Extraction, Transformation, Loading) flows are
typically used. However, there is a trend towards operational BI, where changes in the
source systems are propagated directly to the data warehouse infrastructure in order to
keep analytical query results highly up to date [DCSW09, O’C08, WK10]. This is typically
realized with (1) near real-time ETL flows, where data is loaded periodically but with
high frequency, or with (2) real-time ETL flows, where data is loaded based on business
transactions. As a result of both strategies, many instances of integration flows with
rather small amounts of data are executed over time. Although this is similar to
horizontal integration, we use selected ETL integration flows in order to demonstrate
their specific characteristics as well.

Example 2.8 (Real-Time Standard Orders Loading). If a new order of standard products
(not user-defined) is created using the ERP system, the data is directly propagated to
the data warehouse infrastructure. For this purpose, the integration flow (plan P5) shown
in Figure 2.11(a) is used. A plan instance is asynchronously initiated by receiving a data
set from the ERP system, and it executes three different Selection operators (according
to different attributes) in order to filter orders that are maintained within dedicated
data marts. Subsequently, a Switch operator routes the incoming tuples, using content-based
predicates, to specific schema mapping Translation operators (specific to the referenced
material). Finally, the result is loaded into the data warehouse s6.
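The operator sequence of Example 2.8 can be illustrated with a minimal Python sketch. It is not the plan P5 from Figure 2.11(a): the attribute names (`region`, `channel`, `priority`, `material`), the predicates, and the target schema are hypothetical, and the sketch assumes that the Selection operators drop the orders that are handled by dedicated data marts.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]
Mapping = Callable[[Row], Row]


def selection(rows: List[Row], predicate: Predicate) -> List[Row]:
    """Keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def switch(rows: List[Row], routes: List[Tuple[Predicate, Mapping]]) -> List[Row]:
    """Apply to each row the mapping of the first route whose predicate matches.

    Rows that match no route are dropped (a simplification of this sketch).
    """
    out: List[Row] = []
    for row in rows:
        for predicate, mapping in routes:
            if predicate(row):
                out.append(mapping(row))
                break
    return out


def run_plan(rows: List[Row], warehouse: List[Row]) -> None:
    """Selection x3 -> Switch with Translation paths -> load into the warehouse."""
    # Three selections on different (hypothetical) attributes.
    rows = selection(rows, lambda r: r["region"] != "APAC")
    rows = selection(rows, lambda r: r["channel"] != "partner")
    rows = selection(rows, lambda r: r["priority"] != "bulk")
    # Content-based routing to material-specific schema mappings (hypothetical).
    routes: List[Tuple[Predicate, Mapping]] = [
        (lambda r: r["material"] == "steel",
         lambda r: {"order_id": r["id"], "mat_code": "ST"}),
        (lambda r: r["material"] == "wood",
         lambda r: {"order_id": r["id"], "mat_code": "WD"}),
    ]
    warehouse.extend(switch(rows, routes))
```

Each call to `run_plan` corresponds to one plan instance triggered by an incoming data set; the `warehouse` list merely stands in for the load into the data warehouse.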

Example 2.9 (Near-Real-Time Customer Loading). Non-disjoint (overlapping) customer
master data from the eCommerce Web shop, the CRM system, and the ERP system is loaded
into the data warehouse in a near real-time fashion. Plan instances of the integration
flow (plan P6), illustrated in Figure 2.11(b), are periodically initiated and executed.
Essentially, this plan creates three parallel subflows, where the customer master data is
loaded from the ERP system s3, from the CRM system s4, and from the eCommerce Web shop s5.
After the subflows have been temporally joined, two subsequent Setoperation operators
(type UNION DISTINCT) are executed in order to eliminate duplicates. Finally, the
resulting customer data is loaded into the data warehouse s6.
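As a rough illustration of Example 2.9, the following Python sketch runs three extraction subflows in parallel, waits for all of them, and then applies two UNION DISTINCT steps. The `(customer_id, name)` schema and the extractor callables are assumptions made for this sketch, and threads are only one possible way to realize the parallel subflows.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Customer = Tuple[int, str]  # (customer_id, name), a hypothetical schema


def union_distinct(left: List[Customer], right: List[Customer]) -> List[Customer]:
    """Concatenate both inputs and drop duplicate tuples, keeping first occurrences."""
    seen = set()
    result: List[Customer] = []
    for row in left + right:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


def run_plan(extractors: List[Callable[[], List[Customer]]]) -> List[Customer]:
    """Run exactly three extractors (ERP, CRM, Web shop) in parallel and merge them."""
    with ThreadPoolExecutor(max_workers=len(extractors)) as pool:
        futures = [pool.submit(extract) for extract in extractors]
        # Collecting all results synchronizes (joins) the parallel subflows.
        erp, crm, shop = [future.result() for future in futures]
    # Two consecutive set operations, as in the described plan.
    return union_distinct(union_distinct(erp, crm), shop)
```

Because duplicates are detected on whole tuples, two records for the same customer with differing attribute values would both survive in this sketch.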

Example 2.10 (Real-Time Customized Orders Loading). Incoming orders of customized
products that are registered within the central ERP system are also directly propagated
to the data warehouse architecture using the integration flow (plan P7) that is shown in
Figure 2.11(c). Initiated by those incoming order messages, four parallel subflows are
executed, where we load the related supplier data from the SCM system s1, product
information from the Material system s2, customer data from the CRM system s4, and
transaction information from the eCommerce Web shop s5. Subsequently, a left-deep join
tree, consisting of four Join operators (chain query type), merges the received order
message with the loaded data. Finally, the result is sent to the data warehouse s6.
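The left-deep join tree of Example 2.10 can be sketched in Python as a fold over the loaded inputs. The source does not give the join predicates or the join order, so the column names and the chain used in the usage note (supplier, product, order, transaction, customer) are hypothetical, and the simple hash join is only one possible join implementation.

```python
from functools import reduce
from typing import Dict, List, Tuple

Row = Dict[str, object]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Equi-join two row lists on a shared column, hashing the right input."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Merge every left row with each matching right row.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def left_deep_join(base: List[Row], steps: List[Tuple[List[Row], str]]) -> List[Row]:
    """Evaluate ((((base JOIN r1) JOIN r2) JOIN r3) JOIN r4) from left to right."""
    return reduce(lambda acc, step: hash_join(acc, step[0], step[1]), steps, base)
```

A usage example with invented data: starting from `order = [{"order_id": 7, "product_id": "p1"}]`, the call `left_deep_join(order, [(product, "product_id"), (supplier, "supplier_id"), (transaction, "order_id"), (customer, "customer_id")])` always keeps the intermediate result as the left input, which is what makes the tree left-deep.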

Example 2.11 (DSS Data Provision). The consolidated data of the data warehouse is
partially provided for strategic planning within the DSS. In order to synchronize the
data warehouse with the DSS, the integration flow (plan P8) shown in Figure 2.11(d) is
used. Essentially, instances of this plan are initiated periodically. First, a procedure
is called by the Invoke operator o2 that executes several data cleaning and aggregation
operations on
