10.11.2012 Views

Expert Cube Development with Microsoft SQL Server 2008


This is the rule for relational tables. However, you also need to remember that the equivalent measure data type in Analysis Services must be large enough to hold the largest aggregated value of a given measure, not just the largest value present in a single fact table row.
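To make the point concrete, here is a minimal Python sketch that checks whether a 32-bit signed integer can hold a single row value and the aggregated total. The row amounts and the helper name `fits_in_int32` are made up for illustration; they do not come from any real fact table.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1  # upper bound of a 32-bit signed integer


def fits_in_int32(value: int) -> bool:
    """Return True if value is representable as a 32-bit signed integer."""
    return INT32_MIN <= value <= INT32_MAX


# Hypothetical fact rows: each individual amount fits in 32 bits.
row_amounts = [1_500_000_000, 900_000_000, 400_000_000]

largest_row = max(row_amounts)
total = sum(row_amounts)

print("largest row fits:", fits_in_int32(largest_row))
print("aggregated total fits:", fits_in_int32(total))
```

In this sketch every row passes the check but the total does not, which is exactly the case where the measure needs a wider data type than the relational column.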


Chapter 1

Always remember that there are situations in which the rules must be overridden. If we have a fact table containing 20 billion rows, each composed of 20 bytes and a column that references a date, then it might be better to use a SMALLINT column for the date, if we find a suitable representation that holds all necessary values. We will save 2 bytes for each row, and that means a 10% reduction in the size of the whole table.
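One possible representation, shown here only as a sketch, stores the date as a number of days relative to a fixed reference date, which fits in a 16-bit signed integer for roughly 89 years on either side. The reference date `BASE_DATE` and the function names are assumptions chosen for illustration; the text does not prescribe a specific encoding. The last lines redo the size arithmetic from the paragraph above.

```python
from datetime import date, timedelta

SMALLINT_MIN, SMALLINT_MAX = -32768, 32767  # range of a 16-bit signed integer
BASE_DATE = date(2000, 1, 1)                # assumed reference date (hypothetical)


def date_to_smallint(d: date) -> int:
    """Encode a date as days since BASE_DATE, failing if it does not fit."""
    key = (d - BASE_DATE).days
    if not SMALLINT_MIN <= key <= SMALLINT_MAX:
        raise ValueError(f"{d} cannot be represented as a SMALLINT offset")
    return key


def smallint_to_date(key: int) -> date:
    """Decode a day offset back into a date."""
    return BASE_DATE + timedelta(days=key)


# Size arithmetic from the text: 20 billion rows, 20 bytes each, 2 bytes saved.
rows = 20_000_000_000
bytes_per_row = 20
saved_per_row = 2

saved_bytes = rows * saved_per_row            # total bytes saved
saving_ratio = saved_per_row / bytes_per_row  # fraction of the table size

print(date_to_smallint(date(2012, 11, 10)))
print(f"saved: {saved_bytes / 10**9:.0f} GB ({saving_ratio:.0%} of the table)")
```

The range check matters: the encoding is only "suitable" in the sense of the text if every date the fact table will ever reference falls inside the SMALLINT range.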

SQL queries generated during cube processing

When Analysis Services needs to process a cube or a dimension, it sends queries to the relational database in order to retrieve the information it needs. Not all the queries are simple SELECTs; there are many situations in which Analysis Services generates complex queries. Although we do not have enough space to cover every scenario, we're going to provide some examples relating to SQL Server, and we advise readers to have a look at the SQL queries generated for their own cubes to check whether they can be optimized in some way.

Dimension processing<br />

During dimension processing Analysis Services sends several queries, one for each attribute of the dimension, in the form of SELECT DISTINCT ColName, where ColName is the name of the column holding the attribute.
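The pattern described above can be sketched in Python, using the standard-library sqlite3 module as a stand-in for SQL Server. The table `DimDate`, its columns, and the sample rows are hypothetical, and the statements Analysis Services really generates are more verbose than the simplified form shown here; the sketch only illustrates the idea of one SELECT DISTINCT per attribute column.

```python
import sqlite3

# In-memory database standing in for the relational source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DimDate (DateKey INTEGER, MonthName TEXT, CalendarYear INTEGER)"
)
conn.executemany(
    "INSERT INTO DimDate VALUES (?, ?, ?)",
    [
        (20120101, "January", 2012),
        (20120102, "January", 2012),
        (20120201, "February", 2012),
        (20130101, "January", 2013),
    ],
)

# One query per attribute, each returning that attribute's distinct values.
attribute_columns = ["DateKey", "MonthName", "CalendarYear"]
distinct_members = {}
for col in attribute_columns:
    query = f"SELECT DISTINCT {col} FROM DimDate ORDER BY {col}"
    distinct_members[col] = [row[0] for row in conn.execute(query)]

for col, members in distinct_members.items():
    print(col, members)
```

Note that the same table is scanned once per attribute, which is why the caching behavior of the relational engine matters during dimension processing.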

Many of these queries are run in parallel (exactly which ones can be run in parallel depends on the attribute relationships defined on the Analysis Services dimension), so SQL Server can take advantage of its cache system: it performs only one physical read of the table, and all successive scans are performed from memory. Nevertheless, keep in mind that the task of detecting the DISTINCT values of the attributes is done by SQL Server, not by Analysis Services.

We also need to be aware that if our dimensions are built from complex views, they might confuse the SQL Server engine and lead to poor SQL query performance. If, for example, we add a very complex WHERE condition to our view, then the condition will be evaluated more than once, because each attribute query runs against the view separately. We have personally seen a situation where the processing of a simple time dimension with only a few hundred rows, which had a very complex WHERE condition, took tens of minutes to complete.
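The repeated evaluation can be observed in a small experiment. The sketch below again uses sqlite3 as a stand-in for SQL Server, so it says nothing about how the SQL Server optimizer treats a particular view; it only shows that a filter inside a view is re-run for every query sent against that view. The view `TimeDimension`, the table `TimeSource`, and the function `complex_condition` are all hypothetical.

```python
import sqlite3

calls = {"n": 0}  # counts how often the view's filter is evaluated


def complex_condition(day_number):
    """Stand-in for an expensive WHERE condition; counts its invocations."""
    calls["n"] += 1
    return 1  # keep every row; only the call count matters here


conn = sqlite3.connect(":memory:")
conn.create_function("complex_condition", 1, complex_condition)
conn.execute(
    "CREATE TABLE TimeSource (DayNumber INTEGER, MonthNumber INTEGER, YearNumber INTEGER)"
)
rows = [(d, d // 31 + 1, 2012) for d in range(1, 366)]
conn.executemany("INSERT INTO TimeSource VALUES (?, ?, ?)", rows)

# The dimension is built from a view that hides the filter.
conn.execute(
    "CREATE VIEW TimeDimension AS "
    "SELECT DayNumber, MonthNumber, YearNumber FROM TimeSource "
    "WHERE complex_condition(DayNumber) = 1"
)

# One SELECT DISTINCT per attribute, as during dimension processing.
for col in ("DayNumber", "MonthNumber", "YearNumber"):
    conn.execute(f"SELECT DISTINCT {col} FROM TimeDimension").fetchall()

print("rows in source:", len(rows))
print("filter evaluations:", calls["n"])
```

With three attributes the filter runs three times over the source rows, so the cost of a slow condition grows with the number of attributes in the dimension, not only with the number of rows.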
