10.11.2012 Views

Expert Cube Development with Microsoft SQL Server 2008




Chapter 1

A snapshot fact table records the state of something at different points in time. If we record in a fact table the total sales for each product every month, we are not recording an event but a specific situation. Snapshots can also be useful when we want to measure something not directly related to any other fact. If we want to rank our customers based on sales or payments, for example, we may want to store snapshots of this data in order to analyze how these rankings change over time in response to marketing campaigns.

Using a snapshot table containing aggregated data instead of a transaction table can drastically reduce the number of rows in our fact table, which in turn leads to smaller cubes, faster cube processing and faster querying. The price we pay for this is the loss of any information that can only be stored at the transaction level and cannot be aggregated up into the snapshot, such as the transaction number data we encountered when discussing degenerate dimensions. Whether this is an acceptable price to pay is a question only the end users can answer.
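To make the trade-off concrete, here is a minimal sketch of how a monthly snapshot might be derived from a transaction table. It uses Python's built-in `sqlite3` module purely for illustration rather than SQL Server, and the table and column names (`FactSalesTransaction`, `FactSalesSnapshot`, `TransactionNumber` and so on) are hypothetical, not taken from any real schema.

```python
import sqlite3


def build_monthly_snapshot(conn):
    """Aggregate transaction-level sales into one row per product per month."""
    conn.execute("DROP TABLE IF EXISTS FactSalesSnapshot")
    conn.execute(
        """
        CREATE TABLE FactSalesSnapshot AS
        SELECT substr(SaleDate, 1, 7) AS SnapshotMonth,  -- 'YYYY-MM'
               ProductKey,
               SUM(Amount) AS TotalSales
        FROM FactSalesTransaction
        GROUP BY substr(SaleDate, 1, 7), ProductKey
        """
    )
    return conn.execute(
        "SELECT SnapshotMonth, ProductKey, TotalSales "
        "FROM FactSalesSnapshot ORDER BY SnapshotMonth, ProductKey"
    ).fetchall()


# Hypothetical transaction-level fact table with a degenerate dimension
# (TransactionNumber) that cannot survive the aggregation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactSalesTransaction ("
    "TransactionNumber TEXT, SaleDate TEXT, ProductKey INTEGER, Amount REAL)"
)
conn.executemany(
    "INSERT INTO FactSalesTransaction VALUES (?, ?, ?, ?)",
    [
        ("T001", "2008-01-05", 1, 100.0),
        ("T002", "2008-01-17", 1, 50.0),
        ("T003", "2008-01-20", 2, 30.0),
        ("T004", "2008-02-02", 1, 70.0),
    ],
)

snapshot_rows = build_monthly_snapshot(conn)
snapshot_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(FactSalesSnapshot)")
]
```

In this sketch four transaction rows collapse into three snapshot rows, and the snapshot no longer has a `TransactionNumber` column: the row count shrinks, but the transaction-level detail is gone.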

Updating fact and dimension tables

In an ideal world, data that is stored in the data warehouse would never change. Some books suggest that we should only support insert operations in a data warehouse, not updates: data comes from the OLTP, is cleaned and is then stored in the data warehouse until the end of time, and should never change because it represents the situation at the time of insertion.

Nevertheless, the real world is somewhat different to the ideal one. While some updates are handled by the slowly changing dimension techniques already discussed, there are other kinds of updates needed in the life of a data warehouse. In our experience, these other types of update in the data warehouse are needed fairly regularly and are of two main kinds:

• Structural updates: when the data warehouse is up and running, we will need to perform updates to add information like new measures or new dimension attributes. This is normal in the lifecycle of a BI solution.

• Data updates: we need to update data that has already been loaded into the data warehouse, because it is wrong. We need to delete the old data and enter the new data, as the old data will inevitably lead to confusion. There are many reasons why bad data comes to the data warehouse; the sad reality is that bad data happens and we need to manage it gracefully.
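The two kinds of update can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module instead of SQL Server, the `FactSales` table and its columns are hypothetical, and reloading one day at a time is just one possible granularity for a correction. The structural update adds a new measure column; the data update deletes the bad rows and inserts the corrected ones inside a single transaction, so the fact table is never left half-corrected.

```python
import sqlite3


def add_measure_column(conn):
    """Structural update: add a new measure to an existing fact table."""
    conn.execute("ALTER TABLE FactSales ADD COLUMN DiscountAmount REAL DEFAULT 0")


def reload_day(conn, date_key, corrected_rows):
    """Data update: replace all rows for one day with corrected data."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement raises an exception.
    with conn:
        conn.execute("DELETE FROM FactSales WHERE DateKey = ?", (date_key,))
        conn.executemany(
            "INSERT INTO FactSales (DateKey, ProductKey, Amount) VALUES (?, ?, ?)",
            corrected_rows,
        )


# Hypothetical fact table in which one row for 2008-01-05 was loaded wrongly.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE FactSales (DateKey INTEGER, ProductKey INTEGER, Amount REAL)"
)
conn.executemany(
    "INSERT INTO FactSales VALUES (?, ?, ?)",
    [(20080105, 1, 999.0), (20080105, 2, 30.0), (20080106, 1, 70.0)],
)
conn.commit()

add_measure_column(conn)
reload_day(conn, 20080105, [(20080105, 1, 100.0), (20080105, 2, 30.0)])

fact_rows = conn.execute(
    "SELECT DateKey, ProductKey, Amount, DiscountAmount "
    "FROM FactSales ORDER BY DateKey, ProductKey"
).fetchall()
```

Rows for other days are untouched by the reload, and existing rows pick up the default value for the new measure.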
