Implementation

the intermediary data store, namely an incremental table and two staging tables. These three tables are populated according to dataflow steps 1-4 discussed above. Data from the first staging table is populated into the second staging table by a stored procedure. The first two tables, i.e. the incremental table and the first staging table, are always truncated before a new set of data is loaded. The data in the last table is never truncated, so that historical data is maintained.
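The truncate-and-load cycle described above can be sketched as follows. This is a minimal illustration using an in-memory SQLite database; the table and column names are hypothetical, not taken from the actual system, and the real implementation uses a stored procedure rather than application code.

```python
import sqlite3

# Illustrative schema: incremental table, first staging table (both
# truncated every run), and second staging table (accumulates history).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE incremental (id INTEGER, val TEXT)")
cur.execute("CREATE TABLE staging1 (id INTEGER, val TEXT)")
cur.execute("CREATE TABLE staging2 (id INTEGER, val TEXT)")

def load_cycle(rows):
    # The incremental and first staging tables are truncated before
    # each new set of data is loaded...
    cur.execute("DELETE FROM incremental")
    cur.execute("DELETE FROM staging1")
    cur.executemany("INSERT INTO incremental VALUES (?, ?)", rows)
    cur.execute("INSERT INTO staging1 SELECT * FROM incremental")
    # ...while the second staging table is never truncated, so it
    # retains the historical data from every previous run.
    cur.execute("INSERT INTO staging2 SELECT * FROM staging1")
    conn.commit()

load_cycle([(1, "a")])
load_cycle([(2, "b")])
print(cur.execute("SELECT COUNT(*) FROM staging1").fetchone()[0])  # 1: latest batch only
print(cur.execute("SELECT COUNT(*) FROM staging2").fetchone()[0])  # 2: full history
```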

3.1.4 NON-PRIMARY KEY TABLES

GENERAL DATAFLOW

The flow of data from the intermediary data store to the final warehouse for tables without any primary key columns is similar to that for primary key tables; only the logic for maintaining data integrity changes. It is assumed that in the source table an identical record can appear more than once and that all such occurrences are valid records. To differentiate between the records, an indicator key is generated while reading the data from the journal into the data store tables. An indicator key is generated by concatenating the data in all the columns of a record and applying an algorithm to it to obtain a 32-digit alphanumeric value. The indicator key changes when the data in even a single column changes, and the record is therefore treated as a new record.
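A minimal sketch of indicator-key generation is shown below. The text does not name the algorithm; MD5 is assumed here only because its hexadecimal digest is a 32-character alphanumeric value, which matches the description. The column delimiter and record layout are likewise illustrative assumptions.

```python
import hashlib

def indicator_key(record):
    """Concatenate all column values of a record and hash them.

    MD5 is an assumption: the source only says 'an algorithm' that
    yields a 32-digit alphanumeric value, which an MD5 hex digest
    happens to match. The '|' delimiter is also illustrative.
    """
    payload = "|".join(str(v) for v in record).encode("utf-8")
    return hashlib.md5(payload).hexdigest()

k1 = indicator_key(("1001", "Alice", "2013-07-15"))
k2 = indicator_key(("1001", "Alice", "2013-07-16"))  # one column changed
print(len(k1))    # 32
print(k1 == k2)   # False: a single-column change produces a new key
```

Because the key is derived from every column, two rows with identical data share a key, while a change in any one column yields an entirely new key, which is what lets the load distinguish new records from re-read ones.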

Note: This concept of indicator keys is also applied to primary-key tables, where it has no functional significance and is present only to keep the tables in the final data warehouse generic.

Loading Strategy

1. While reading from the journal, a network failure may occur and the restart token may be reset to a point from which the journal has already been read. This results in duplicate records in the intermediate data store. The duplication is handled in the incremental mappings on the basis of a token value and the action status of the entry; the combination of the two is unique for an entry.

2. Records are sorted on the basis of token and action status.

3. Duplicates of the above combination are then filtered out, and only those entries that were recorded first in the journal are passed to the first of the staging tables.

4. From the first staging table, only those records that were processed a day before are taken, and duplicate records are again filtered out as explained above.

5. In the second table, records with the same indicator key may already be present.
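Steps 2 and 3 above can be sketched as a sort followed by a first-occurrence filter. This is a hypothetical illustration: the field names and the use of a journal position to decide which entry was recorded first are assumptions, since the actual incremental mappings are implemented in the ETL tool rather than in code like this.

```python
def dedupe(entries):
    """Keep only the first-recorded journal entry for each
    (token, action_status) combination.

    entries: tuples of (token, action_status, journal_position, payload);
    journal_position is an assumed stand-in for 'recorded first'.
    """
    # Step 2: sort on token and action status (position breaks ties).
    entries = sorted(entries, key=lambda e: (e[0], e[1], e[2]))
    seen, result = set(), []
    for token, status, pos, payload in entries:
        # Step 3: the (token, status) combination is unique per entry,
        # so any repeat is a replayed read and is filtered out.
        if (token, status) not in seen:
            seen.add((token, status))
            result.append((token, status, payload))
    return result

rows = [
    (10, "I", 2, "replayed"),   # re-read after a restart-token reset
    (10, "I", 1, "original"),
    (11, "U", 3, "update"),
]
print(dedupe(rows))  # [(10, 'I', 'original'), (11, 'U', 'update')]
```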

BI Reporting Tool Implementation, Arpan Ganguly
