29.01.2014 Views

Tutorial slides


DB Performance: Linear Clustering

Indexed clustering is an old idea: in the 1980s, one company had demographic data on 80 million families, gathered from warranty cards

They crafted mailing lists for the sales promotions of client companies

Typically, they concentrated on local areas (store localities) so these companies could announce targeted sales: sports, toys, etc.

To speed searches, they kept families in order by zip code; they also had other data, so they could restrict by income class, hobbies, etc.

The result was something like Q3B: a 50-mile-radius circle in a state results in a small union of ranges on zip code

The Q3B range union covered a fraction of 0.06 of all possible values in the clustering column KSEQ in BENCH

Clustered ranges are all the more important with modern disks
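
The range-union idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the company's system or the actual Q3B query: the 1,000-row key list and the specific ranges are made up, and the helper names (`merge_ranges`, `scan_ranges`, `selected_fraction`) are hypothetical. Because the rows are kept sorted by the clustering key, each range in the union maps to one contiguous slice, found with two binary searches.

```python
from bisect import bisect_left, bisect_right


def merge_ranges(ranges):
    """Collapse overlapping or adjacent inclusive integer ranges into a minimal union."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def scan_ranges(sorted_keys, ranges):
    """Return one (start, stop) slice of sorted_keys per merged range."""
    slices = []
    for lo, hi in merge_ranges(ranges):
        start = bisect_left(sorted_keys, lo)   # first row with key >= lo
        stop = bisect_right(sorted_keys, hi)   # one past the last row with key <= hi
        slices.append((start, stop))
    return slices


def selected_fraction(sorted_keys, ranges):
    """Fraction of all rows that fall inside the union of ranges."""
    hits = sum(stop - start for start, stop in scan_ranges(sorted_keys, ranges))
    return hits / len(sorted_keys)


# Hypothetical table: 1,000 rows stored in clustering-key order (keys 1..1000).
keys = list(range(1, 1001))

# Hypothetical "circle on the map": a few key ranges, two of them overlapping.
zip_ranges = [(401, 415), (410, 420), (441, 460), (481, 500)]

merged = merge_ranges(zip_ranges)
slices = scan_ranges(keys, zip_ranges)
fraction = selected_fraction(keys, zip_ranges)
```

With these made-up numbers the four ranges collapse to three contiguous slices covering 60 of the 1,000 rows, so only those three runs of the clustered data need to be read.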
