17.06.2013 Views

Beginning Microsoft SQL Server 2008 ... - S3 Tech Training


Chapter 8: Being Normal: Normalization and Other Basic Design Issues

So, to reach the third normal form, we just need to drop the TotalPrice column and compute it when needed.
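
As a minimal sketch of "compute it when needed", the query below derives TotalPrice at read time instead of storing it. The table name dbo.OrderDetails and every column other than Qty, UnitPrice, and TotalPrice are assumptions made for illustration only.

```sql
-- Hypothetical table: the name dbo.OrderDetails and the OrderID, LineItem,
-- and PartNo columns are assumptions, not taken from the text.
CREATE TABLE dbo.OrderDetails
(
    OrderID    int          NOT NULL,
    LineItem   int          NOT NULL,
    PartNo     varchar(10)  NOT NULL,
    Qty        int          NOT NULL,
    UnitPrice  money        NOT NULL,
    CONSTRAINT PK_OrderDetails PRIMARY KEY (OrderID, LineItem)
);

-- TotalPrice is not stored anywhere; it is derived each time the query runs.
SELECT OrderID,
       LineItem,
       Qty,
       UnitPrice,
       Qty * UnitPrice AS TotalPrice
FROM dbo.OrderDetails
WHERE Qty * UnitPrice > 100;
```

Because the value is always derived from Qty and UnitPrice, it cannot drift out of sync with them, which is the point of removing the stored column.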

Other Normal Forms


Derived data is one of the places that you’ll see me “de-normalize” data most often. Why? Speed! A query that reads WHERE TotalPrice > $100 runs faster than one that reads WHERE Qty * UnitPrice > $100, particularly if we are able to index our computed TotalPrice.

On the other side of this, however, I do sometimes take more of a hybrid approach by utilizing a computed column and letting SQL Server derive the value from the other two columns for us (you may recall us using this idea for our PreviousSalary example in the Employees table of the Accounting database in Chapter 5). If this is a very important column from a performance perspective (you’re running lots of queries that filter based on the values in this column), then you may want to add an index to your new computed column. The significance of this is that the index “materializes” the computed data. What does that mean? Well, it means that SQL Server doesn’t have to calculate the computed column on the fly; instead, it calculates it once when the row is stored in the index and, thereafter, uses the precalculated value. It can be very fast indeed, and we’ll examine it further in Chapter 9. That said, there is a trade-off (if there weren’t, everyone would do it this way all the time, right?): space. You’re storing data that doesn’t need to be stored, and if you do that to every possible piece of derived data, then it can really add up. More space means more data to read, and that can mean things actually get slower. The point here is to weigh your options and make a balanced choice.
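
The hybrid approach described above might look something like the following sketch. The table name, index name, and key columns are hypothetical; only Qty, UnitPrice, and TotalPrice come from the text. GO is the batch separator used by client tools such as Management Studio and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical table: dbo.OrderDetailsIndexed and its key columns are assumptions.
-- TotalPrice is a computed column, so SQL Server derives it from Qty and UnitPrice.
CREATE TABLE dbo.OrderDetailsIndexed
(
    OrderID    int    NOT NULL,
    LineItem   int    NOT NULL,
    Qty        int    NOT NULL,
    UnitPrice  money  NOT NULL,
    TotalPrice AS (Qty * UnitPrice),
    CONSTRAINT PK_OrderDetailsIndexed PRIMARY KEY (OrderID, LineItem)
);
GO

-- Indexing the computed column stores ("materializes") its values in the index.
-- This works only because the expression is deterministic and precise.
CREATE NONCLUSTERED INDEX IX_OrderDetailsIndexed_TotalPrice
    ON dbo.OrderDetailsIndexed (TotalPrice);
GO

-- A filter on TotalPrice can now be satisfied from the precalculated index values.
SELECT OrderID, LineItem, TotalPrice
FROM dbo.OrderDetailsIndexed
WHERE TotalPrice > 100;
```

The cost is the one named above: the index takes extra space and must be maintained on every insert and on every update to Qty or UnitPrice.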

There are a few other forms out there that are considered, at least by academics, to be part of the normalization model. These include:

❑ Boyce-Codd (considered really just to be a variation on third normal form): This one tries to address situations where you have multiple overlapping candidate keys. This can only happen if:

a. All the candidate keys are composite keys (that is, it takes more than one column to make up the key).

b. There is more than one candidate key.

c. The candidate keys each have at least one column that is in common with another candidate key.

This is typically a situation where any number of solutions works, and it is almost never given much thought outside the academic community.

❑ Fourth Normal Form: This one tries to deal with issues surrounding multi-valued dependence. This is the situation where, for an individual row, no column depends on a column other than the primary key, and every column depends on the whole primary key (meeting third normal form). However, there can be rather odd situations where one column in the primary key depends separately on other columns in the primary key. These are rare, and don’t usually cause any real problem. As such, they are largely ignored in the database world, and we will not address them here.
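
For readers who want a concrete picture of the three Boyce-Codd conditions listed above, here is a small sketch. It is not from the book: the tables, columns, and business rules are all assumptions chosen to produce two overlapping composite candidate keys.

```sql
-- Hypothetical example; every name and rule here is an assumption.
-- Assumed business rules:
--   1. Each teacher teaches exactly one subject.
--   2. A student has exactly one teacher per subject.
-- Candidate keys: (Student, Subject) and (Student, Teacher).
-- Both are composite, there are two of them, and they share the Student column.
CREATE TABLE dbo.StudentSubjectTeacher
(
    Student  varchar(30) NOT NULL,
    Subject  varchar(30) NOT NULL,
    Teacher  varchar(30) NOT NULL,
    CONSTRAINT PK_StudentSubjectTeacher PRIMARY KEY (Student, Subject),
    CONSTRAINT AK_StudentSubjectTeacher UNIQUE (Student, Teacher)
);
-- The problem: Subject depends on Teacher alone, and Teacher is not a key,
-- so nothing stops two rows from pairing one teacher with different subjects.

-- One possible decomposition that removes that dependency:
CREATE TABLE dbo.TeacherSubject
(
    Teacher  varchar(30) NOT NULL PRIMARY KEY,
    Subject  varchar(30) NOT NULL
);

CREATE TABLE dbo.StudentTeacher
(
    Student  varchar(30) NOT NULL,
    Teacher  varchar(30) NOT NULL
        REFERENCES dbo.TeacherSubject (Teacher),
    CONSTRAINT PK_StudentTeacher PRIMARY KEY (Student, Teacher)
);
```

The decomposed design records each teacher's subject exactly once, but it no longer enforces rule 2 through a key, which is one reason more than one solution can be considered acceptable here.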
