17.06.2013 Views

Beginning Microsoft SQL Server 2008 ... - S3 Tech Training


Chapter 9: SQL Server Storage and Index Structures

The Pros

Clustered indexes are best for queries when the column(s) in question will frequently be the subject of a ranged query. This kind of query is typified by use of the BETWEEN operator or the < and > comparison operators. Queries that use a GROUP BY and make use of the MAX, MIN, and COUNT aggregates are also great examples of queries that use ranges and love clustered indexes. Clustering works well here because the search can go straight to a particular point in the physical data, keep reading until it gets to the end of the range, and then stop. It is extremely efficient.
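
To make the "seek, read, stop" idea concrete, here is a minimal sketch in Python. It is only a conceptual model, not how SQL Server actually stores or searches index pages: a sorted list stands in for rows physically ordered by the cluster key, and the function name `range_scan` and the sample keys are made up for illustration.

```python
from bisect import bisect_left, bisect_right

def range_scan(sorted_keys, low, high):
    """Return the keys in [low, high] from a list already sorted by key."""
    start = bisect_left(sorted_keys, low)    # seek to the first key >= low
    stop = bisect_right(sorted_keys, high)   # position just past the last key <= high
    return sorted_keys[start:stop]           # read one contiguous run, then stop

# Hypothetical cluster-key values, already in physical (sorted) order.
keys = [3, 8, 15, 16, 23, 42, 57, 91]
print(range_scan(keys, 10, 50))  # the rough equivalent of BETWEEN 10 AND 50
```

The point of the sketch is that nothing outside the requested range is touched once the starting position is found; that only works because the data is kept in key order.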

Clustered indexes can also be excellent when you want your data sorted (using ORDER BY) based on the cluster key.

The Cons

There are two situations in which you don’t want to create that clustered index. The first is fairly obvious: when there’s a better place to use it. I know I’m sounding repetitive here, but don’t use a clustered index on a column just because it seems like the thing to do (primary keys are the common culprit here). Be sure first that you don’t have another column that it’s better suited to.

Perhaps the much bigger no-no for clustered indexes, however, is when you are going to be doing a lot of inserts in a non-sequential order. Remember that concept of page splits? Well, here’s where it can come back and haunt you big time.

Imagine this scenario: You are creating an accounting system. You would like to make use of the concept of a transaction number for your primary key in your transaction files, but you would also like those transaction numbers to be somewhat indicative of what kind of transaction it is (it really helps troubleshooting for your accountants). So you come up with something of a scheme: you’ll place a prefix on all the transactions indicating what sub-system they come out of. They will look something like this:

ARXXXXXX  Accounts Receivable Transactions
GLXXXXXX  General Ledger Transactions
APXXXXXX  Accounts Payable Transactions

where XXXXXX will be a sequential numeric value.
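
As a quick illustration of the numbering scheme, the sketch below builds a few IDs and compares the order they arrive in with the order a clustered index would keep them in. The helper `make_txn_id`, the zero padding, and the round-robin arrival pattern are assumptions made for the example; the scheme above only says that the prefix is followed by a six-digit sequential value.

```python
def make_txn_id(prefix, seq):
    """Build an ID such as 'AR000025': a two-letter prefix plus a zero-padded 6-digit counter."""
    return f"{prefix}{seq:06d}"

# Hypothetical arrival order: the three sub-systems post transactions in turn.
arrivals = [make_txn_id(p, n) for n in range(1, 3) for p in ("AR", "GL", "AP")]

print(arrivals)          # the order in which the rows are inserted
print(sorted(arrivals))  # the order the cluster key keeps them in: AP, then AR, then GL
```

Notice that the two orders differ: each counter is sequential on its own, but the combined key is not.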

This seems like a great idea, so you implement it, leaving the default of the clustered index going on the primary key.

At first glance, everything about this setup looks fine. You’re going to have unique values, and the accountants will love the fact that they can infer where something came from based on the transaction number. The clustered index seems to make sense since they will often be querying for ranges of transaction IDs.

Ah, if only it were that simple. Think about your inserts for a bit. With a clustered index, we originally had a nice mechanism to avoid much of the overhead of page splits. When a new record was inserted that was to go after the last record in the table, then even if there was a page split, only that record would go to the new page; SQL Server wouldn’t try to move around any of the old data. Now we’ve messed things up though.
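
The difference can be seen in a toy simulation. This is a deliberately simplified model and not SQL Server's actual storage engine: pages hold only four rows (`PAGE_CAPACITY` is a made-up value), a mid-table split is assumed to move the upper half of the full page, and the workload generator `txn_ids` is hypothetical. The only behavior taken from the text is that an insert after the last record starts a new page without moving old data.

```python
from bisect import bisect_left, insort

PAGE_CAPACITY = 4  # rows per page; tiny hypothetical value to keep the demo small

def load(keys, capacity=PAGE_CAPACITY):
    """Insert unique keys in arrival order; return (pages, rows_moved_by_splits)."""
    pages = [[]]   # each page is a sorted list of keys; pages are kept in key order
    moved = 0
    for key in keys:
        # Pick the first page whose last key is >= key, falling back to the last page.
        last_keys = [p[-1] for p in pages if p]
        idx = min(bisect_left(last_keys, key), len(pages) - 1)
        page = pages[idx]
        if len(page) < capacity:
            insort(page, key)                 # room on the page: no split needed
        elif idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])               # end of table: only the new row goes to the new page
        else:
            half = len(page) // 2
            new_page = page[half:]            # mid-table split: the upper half has to move
            del page[half:]
            moved += len(new_page)
            pages.insert(idx + 1, new_page)
            insort(page if key < new_page[0] else new_page, key)
    return pages, moved

def txn_ids(prefixes, per_prefix):
    """Hypothetical workload: sub-systems take turns, each with its own counter."""
    return [f"{p}{n:06d}" for n in range(1, per_prefix + 1) for p in prefixes]

sequential = [f"GL{n:06d}" for n in range(1, 31)]   # every insert lands at the end
mixed = txn_ids(("AR", "GL", "AP"), 10)             # AP and AR inserts land mid-table

_, moved_seq = load(sequential)
mixed_pages, moved_mixed = load(mixed)
print("rows moved, sequential keys:", moved_seq)
print("rows moved, prefixed keys:  ", moved_mixed)
```

With purely sequential keys the model never relocates a row, while the prefixed keys force existing rows to be shuffled onto new pages. The exact counts depend entirely on the assumed page size and arrival pattern, so treat them as illustrative only.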

New records inserted from the General Ledger will wind up going on the end of the file just fine (GL is last alphabetically, and the numbers will be sequential). The AR and AP transactions have a major problem though: they are going to be doing non-sequential inserts. When AP000025 gets inserted and there
