Implementation
the intermediary data store, namely an incremental table and two staging tables. These<br />
three tables are populated according to dataflow steps 1-4 discussed above. Data from the first<br />
staging table is loaded into the second staging table using a stored procedure.<br />
Data in the first two tables, i.e. the incremental table and the first staging table, is always<br />
truncated before a new set of data is loaded. The data in the last table is not truncated,<br />
so that historical data is maintained.<br />
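The truncate-and-load behaviour of the three tables can be sketched as follows. This is only an illustration under stated assumptions: the table names `incremental`, `staging_1` and `staging_2`, the two-column layout, and the use of SQLite are all hypothetical, and a plain `INSERT ... SELECT` stands in for the stored procedure, whose actual logic is not given here.

```python
import sqlite3

def load_batch(conn, rows):
    """Load one batch: refresh the two transient tables, append to the history table."""
    cur = conn.cursor()
    # The incremental table and first staging table are emptied before every load.
    # SQLite has no TRUNCATE statement, so DELETE is used instead.
    cur.execute("DELETE FROM incremental")
    cur.execute("DELETE FROM staging_1")
    cur.executemany("INSERT INTO incremental (token, payload) VALUES (?, ?)", rows)
    cur.execute("INSERT INTO staging_1 SELECT token, payload FROM incremental")
    # Stand-in for the stored procedure: append to the second staging table,
    # which is never truncated and therefore accumulates history.
    cur.execute("INSERT INTO staging_2 SELECT token, payload FROM staging_1")
    conn.commit()

def count(conn, table):
    """Return the number of rows in one of the three demo tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
for name in ("incremental", "staging_1", "staging_2"):
    conn.execute(f"CREATE TABLE {name} (token INTEGER, payload TEXT)")

load_batch(conn, [(1, "a"), (2, "b")])   # first run
load_batch(conn, [(3, "c")])             # second run
```

After the second run the two transient tables hold only the latest batch, while `staging_2` holds the rows of both runs.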
3.1.4 NON-PRIMARY KEY TABLES<br />
GENERAL DATAFLOW<br />
The flow of data from the intermediary data store to the final warehouse for tables<br />
without any primary key columns is similar to that of primary key tables; only the logic that<br />
maintains data integrity changes. It is assumed that in the source table a similar record can<br />
appear more than once and that all occurrences are valid records. In order to differentiate between the records,<br />
an indicator key is generated while reading the data from the journal into the data store tables.<br />
An indicator key is generated by concatenating the data in all the columns of a record and<br />
applying an algorithm to it to obtain a 32-character alphanumeric value. An indicator key changes<br />
when the data in even a single column changes, so the changed row is treated as a new record.<br />
Note: The indicator key is also generated for primary-key tables. For those tables it has no<br />
significance and is present only to keep the tables in the final data warehouse generic.<br />
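A minimal sketch of how an indicator key of this kind could be computed. The report does not name the algorithm; MD5 is assumed here only because its hexadecimal digest happens to be 32 alphanumeric characters. The function name `indicator_key`, the `|` separator and the sample rows are likewise hypothetical.

```python
import hashlib

def indicator_key(record, sep="|"):
    """Return a 32-character hex key derived from every column value of a record."""
    # None is rendered as an empty string. The separator keeps column boundaries
    # visible, so ("ab", "c") and ("a", "bc") do not produce the same input string.
    joined = sep.join("" if value is None else str(value) for value in record)
    # Assumption: MD5 stands in for the unnamed algorithm of the report.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()

row_a = ("ACC-1", "2014-01-05", 250.00)
row_b = ("ACC-1", "2014-01-05", 250.01)   # differs from row_a in one column only

key_a = indicator_key(row_a)
key_b = indicator_key(row_b)
```

Two rows with identical column values always yield the same key, while a change in any one column yields a different key, which is the property the text relies on.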
Loading Strategy<br />
1. While reading from the journal, a network failure may occur and the restart<br />
token may be reset to a point from which the journal has already been read. This results in duplicate<br />
records in the intermediate data store. The duplication is handled in the incremental<br />
mappings using the token value and the action status of each entry; the combination of<br />
the two is unique for an entry.<br />
2. Records are sorted on the basis of token and action status.<br />
3. Duplicates of this combination are then filtered out, and only the entries that were<br />
recorded first in the journal are passed to the first of the staging tables.<br />
4. From the first staging table, only the records that were processed the day before are<br />
taken, and duplicate records are again filtered as explained above.<br />
5. In the second table, records are already present with similar indicator key.<br />
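The sort-and-filter logic of steps 2 and 3 can be sketched as below. This is an illustration, not the actual incremental mapping: the `JournalEntry` type, its `seq` field (the order in which an entry was recorded), and the action status codes are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class JournalEntry:
    seq: int       # hypothetical: order in which the entry was recorded
    token: int     # token value of the journal entry
    action: str    # hypothetical action status codes, e.g. "I", "U", "D"
    payload: str

def drop_replayed_entries(entries):
    """Keep, for each (token, action) pair, the entry that was recorded first."""
    # Step 2: sort on token and action status; seq breaks ties so that the
    # earliest recorded entry of each pair comes first.
    ordered = sorted(entries, key=lambda e: (e.token, e.action, e.seq))
    kept, seen = [], set()
    # Step 3: pass through only the first entry seen for each combination.
    for entry in ordered:
        key = (entry.token, entry.action)
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept

entries = [
    JournalEntry(1, 100, "I", "row v1"),
    JournalEntry(2, 101, "U", "row v2"),
    JournalEntry(3, 100, "I", "row v1"),   # re-read after the restart token was reset
    JournalEntry(4, 101, "U", "row v2"),   # re-read after the restart token was reset
]
unique = drop_replayed_entries(entries)
```

Here the two entries re-read after the simulated restart are dropped, and only the entries first recorded for each token and action status remain.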
__________________________________________________________________________________<br />
BI Reporting Tool Implementation Arpan Ganguly