Assumptions

This is NOT going to be a ‘Basic’ Presentation

We will be reviewing and discussing fairly advanced areas of Optimizer P&T; some of this you may have seen in the past, but a little review never hurt

• You’ve worked with optimizer P&T

• You’re running ASE 11.9.2 or above

• You understand the basics of optimization

• You’ve used Traceons 302/310 and Optdiag

• You’ve used the various update statistics syntax available in

ASE 11.9.2 and above

• You really want to know about tuning the statistics

There are Two Kinds of Optimizer Statistics

• Table/Index level - describes a table and its index(es)

• Page/row counts, cluster ratios, deleted and forwarded rows

• Some are updated dynamically as DML occurs

• Page/row counts, deleted rows, forwarded rows, cluster ratios

• Stored in systabstats

• Column level - describes the data to the optimizer

• Histogram (distribution), density values, default selectivity

values

• Static, need to be updated or written directly

• Stored in sysstatistics

• This presentation deals with the column level statistics

Some Quick Definitions

Range cell density: 0.0037264745412389

Total density: 0.3208892191740000

Range selectivity: default used (0.33)

In between selectivity: default used (0.25)

Histogram for column: "A"

Column datatype: integer

Requested step count: 20

Actual step count: 10

Step Weight Value

1 0.00000000

Statistics On Inner Columns

of Composite Indexes

• Think of a composite index as a 3D object, columns with

statistics are transparent, those without statistics are

opaque

• Columns with statistics give the optimizer a clearer picture

of an index – sometimes good, sometimes not

• This is a fairly common practice

• update index statistics is most commonly used to do this

Statistics On Inner Columns

of Composite Indexes cont.

Index on columns E and B – No statistics on column B

select * from TW4

where E = "yes" and b >= 959789065 and id >= 600000 and

F > "May 14, 2002" and A_A = 959000000

Beginning selection of qualifying indexes for table 'TW4',

varno = 0, objectid 464004684.

The table (Allpages) has 1000000 rows, 24098 pages,

Estimated selectivity for E,

selectivity = 0.527436, upper limit = 0.527436.

No statistics available for B,

using the default range selectivity to estimate selectivity.

Estimated selectivity for B,

selectivity = 0.330000.

Statistics On Inner Columns

of Composite Indexes cont.

The best qualifying index is 'E_B' (indid 7)

costing 49264 pages, with an estimate of 191

rows to be returned per scan of the table

FINAL PLAN (total cost = 481960):

varno=0 (TW4) indexid=0 ()

path=0xfbccc120 pathtype=sclause

method=NESTED ITERATION

Table: TW4 scan count 1, logical reads:(regular=24098

apf=0 total=24098)

apf IOs used=0

Statistics On Inner Columns

of Composite Indexes cont.

Statistics are now on column B

Estimated selectivity for E,

selectivity = 0.527436, upper limit = 0.527436.

Estimated selectivity for B,

selectivity = 0.022199, upper limit = 0.074835.

The best qualifying index is 'E_B' (indid 7)

costing 3317 pages, with an estimate of 13 rows to

be returned per scan of the table

FINAL PLAN (total cost = 55108):

varno=0 (TW4) indexid=7 (E_B)

path=0xfbd1da08 pathtype=sclause

method=NESTED ITERATION

Table: TW4 scan count 1, logical
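The two traces above differ only in whether column B has statistics, yet the final plan switches from a table scan to the E_B index. A minimal Python sketch of that decision follows. It assumes, purely for illustration, that the choice reduces to comparing the index cost in pages against the number of pages in the table; the function name is hypothetical and the real optimizer weighs more than this single comparison. The page counts are the ones reported in the 302 output.

```python
# Simplified sketch: choose between an index access and a table scan.
# Assumption: a plain page-count comparison stands in for the optimizer's
# real costing, which takes more factors into account than this.

def cheaper_access(table_pages: int, index_name: str, index_cost_pages: int) -> str:
    """Return the index name if it costs fewer pages than scanning the table."""
    if index_cost_pages < table_pages:
        return index_name
    return "table scan"

TW4_PAGES = 24098  # from the trace: "1000000 rows, 24098 pages"

# No statistics on B: default selectivity 0.33, E_B costed at 49264 pages
without_stats = cheaper_access(TW4_PAGES, "E_B", 49264)

# Statistics on B: selectivity 0.022199, E_B costed at 3317 pages
with_stats = cheaper_access(TW4_PAGES, "E_B", 3317)

print("without statistics on B:", without_stats)
print("with statistics on B:   ", with_stats)
```

Under this simplification the outcome agrees with the final plans shown: indexid=0 (table scan) without statistics on B, and indexid=7 (E_B) with them.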

Statistics On Non-Indexed Columns and Joins

Can’t help with index selection but can affect join ordering

• Columns with statistics give the optimizer a clearer picture of the

column – no hard coded assumptions have to be used

• When costing joins of non-indexed columns, having statistics may

result in better plans than using the default values

• Without statistics there will be no Total density or histogram that the

optimizer can use to cost the column in the join

• Yes, in some circumstances histograms can be used in costing joins – if one table has a SARG on its join column, that SARG can be inherited by the join column of the other table, and the histogram on that column is then used to filter the join

• If there is no SARG on the join column of either table, the Total density value (with stats) or the default value (w/o stats) will be used

Statistics On Non-Indexed Columns

and Joins cont.

“Inherited” SARG example

select ....from TW1, TW4

where TW1.A = TW4.A and TW1.A = 10

Selecting best index for the JOIN CLAUSE:

TW4.A = TW1.A

TW4.A = 10

Estimated selectivity for a,

selectivity = 0.003726, upper limit = 0.049683.

Histogram values used

select ....from TW1, TW4

where TW1.A = TW4.A and TW1.B = 10

Selecting best index for the JOIN CLAUSE:

TW4.A = TW1.A

Estimated selectivity for a,

selectivity = 0.320889.

Total density value used

Statistics On Non-Indexed Columns

and Joins - Example

select * from TW1,TW2

where TW1.A=TW2.A and TW1.A =805975090

A simple join with a SARG on the join column of one table

Table TW2 column A has no statistics, TW1 column A does

Selecting best index for the JOIN CLAUSE: (for TW2.A)

TW2.A = TW1.A

TW2.A = 805975090 (inherited from SARG on TW1)

But, can’t help…no stats

Estimated selectivity for A,

selectivity = 0.100000.

The best qualifying access is a table scan,

costing 13384 pages, with an estimate of 50000

rows to be returned per scan of the table,

using no data prefetch (size 2K I/O),

in data cache 'default data cache' (cacheid 0)

with MRU replacement

Join selectivity is 0.100000.

Inherited SARG from other table doesn’t help in this case

Statistics On Non-Indexed Columns

and Joins – Example cont.

Without statistics on TW2.A the plan includes a reformat

with TW1 as the outer table

FINAL PLAN (total cost = 2855774):

varno=0 (TW1) indexid=2 (A_E_F)

path=0xfbd46800 pathtype=sclause

method=NESTED ITERATION

varno=1 (TW2) indexid=0 ()

path=0xfbd0bb10 pathtype=join

method=REFORMATTING

• Not the best plan – but the optimizer had little to go on

Statistics On Non-Indexed Columns

and Joins – Example cont.

• Table TW2 column A now has statistics

• The inherited SARG on TW1.A can now be used to help

filter the join on TW2.A

Selecting best index for the JOIN CLAUSE:

TW2.A = TW1.A

TW2.A = 805975090

Estimated selectivity for A,

selectivity = 0.001447, upper limit = 0.052948.

The best qualifying access is a table scan,

costing 13384 pages, with an estimate of 724 rows to be

returned per scan of the table, using no data prefetch

(size 2K I/O), in data cache 'default data cache' (cacheid

0) with MRU replacement

Join selectivity is 0.001447.
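The row estimates in the two TW2 traces are consistent with multiplying the table's row count by the estimated selectivity. The sketch below reproduces both figures in Python. The row count of TW2 is not stated in the trace; 500,000 is inferred from the "50000 rows" estimate at selectivity 0.100000 and should be read as an assumption, as should the half-up rounding and the function name.

```python
from decimal import Decimal, ROUND_HALF_UP

def estimated_rows(table_rows: int, selectivity: str) -> int:
    """Rows expected to qualify: table rows x selectivity, rounded half up.

    The selectivity is passed as a string so Decimal keeps the exact
    digits printed in the trace output.
    """
    rows = Decimal(table_rows) * Decimal(selectivity)
    return int(rows.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

TW2_ROWS = 500_000  # assumption: inferred from the trace, not stated in it

no_stats = estimated_rows(TW2_ROWS, "0.100000")    # default selectivity
with_stats = estimated_rows(TW2_ROWS, "0.001447")  # histogram-based selectivity

print("no statistics on TW2.A:", no_stats)
print("statistics on TW2.A:   ", with_stats)
```

The page cost of the table scan is the same in both traces (13384 pages); only the row estimate changes, and that is what feeds the join costing.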

Statistics On Non-Indexed Columns

and Joins – Example cont.

• With statistics on TW2.A reformatting is not used and the

join order has changed

FINAL PLAN (total cost = 1252148):

varno=1 (TW2) indexid=0 ()

path=0xfbd0b800 pathtype=sclause

method=NESTED ITERATION

varno=0 (TW1) indexid=2 (A_E_F)

path=0xfbd46800 pathtype=sclause

method=NESTED ITERATION

The Effects of Changing the

Number of Steps (Cells)

• The number of cells (steps) affects SARG costing – as the number

of steps changes, costing does too

• Cell weights and range cell density are used in costing SARGs

• Cell weight is used as the column’s ‘upper limit’; Range cell density is used

as ‘selectivity’ for Equi-SARGs – as seen in 302 output

• The result of interpolation is used as the column ‘selectivity’ for Range SARGs

• Increasing the number of steps narrows the average cell width, thus the

weight of Range cells decreases

• Can also result in more Frequency count cells and thus change the

Range cell density value

• More cells means more granular cells

The Effects of Changing the Number of Steps

(Cells) cont.

Average cell width = # of rows/(# of requested steps –1)

• Table has 1 million rows

• Requested 20 steps: 1,000,000/19 = 52,632 rows per cell

• Requested 200 steps: 1,000,000/199 = 5,025 rows per cell

• What does this mean?

• As you increase the number of steps (cells) they

become narrower – representing fewer values

• We’ll see that this has an effect on how the optimizer

estimates the cost of a SARG
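The average cell width formula on this slide is easy to check with a few lines of Python. The sketch below applies it to the 1 million row table for 20 and 200 requested steps. The function names are illustrative only, and the "nominal weight" helper is an added convenience that expresses the same width as a fraction of the table.

```python
def average_cell_width(table_rows: int, requested_steps: int) -> float:
    """Average cell width = # of rows / (# of requested steps - 1)."""
    if requested_steps < 2:
        raise ValueError("at least 2 steps are needed to form one cell")
    return table_rows / (requested_steps - 1)

def nominal_cell_weight(requested_steps: int) -> float:
    """The same width as a fraction of the table: 1 / (steps - 1)."""
    if requested_steps < 2:
        raise ValueError("at least 2 steps are needed to form one cell")
    return 1 / (requested_steps - 1)

ROWS = 1_000_000

for steps in (20, 200):
    width = average_cell_width(ROWS, steps)
    weight = nominal_cell_weight(steps)
    print(f"{steps:>3} steps: {width:,.0f} rows per cell, nominal weight {weight:.6f}")
```

Going from 20 to 200 requested steps shrinks the average cell by roughly a factor of ten, which is why the weight of a Range cell drops as steps are added.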

The Effects of Changing the

Number of Steps (Cells) cont.

Changing the number of steps – effects on Equi-SARGs

select A from TW2 where B = 842000000

With 20 cells (steps) in the histogram

Range cell density: 0.0012829768785739

9 0.05263200

• Range cell density decreased because Frequency

count cells appeared in the histogram

The Effects of Changing the

Number of Steps (Cells) cont.

With 200 cells (steps) in the histogram

Range cell density: 0.0002303825911991

77 0.00507200

The Effects of Changing the

Number of Steps (Cells) cont.

Changing the number of steps – effects on Range SARGs -

select * from TW2 where B between

825570000 and 830000000

With 20 cells (steps) in the histogram

Range cell density: 0.0012829768785739

9 0.05263200

The Effects of Changing the

Number of Steps (Cells) cont.

select * from TW2 where B between

825570000 and 830000000

With 200 cells (steps) in the histogram

Range cell density: 0.0002303825911991

67 0.00505200

Adding Boundary Values To The Histogram

• Changing the boundary values can keep SARG values

within the histogram

• Avoids ‘out of bounds’ costing

• Out of bounds costing usually happens on an atomic column

whose histogram is out of date in relation to the SARG value(s)

• Optimizer has only two choices for selectivity – 1 or 0

depending on the SARG operator and which end of the

histogram the SARG value falls outside of

Adding Boundary Values To The Histogram cont.

Histogram for column: “F"

Column datatype: datetimn

Requested step count: 20

Actual step count: 20

Step Weight Value

1 0.28396901 < "May 1 2002 12:00:00:000AM"

2 0.04839900 = "May 1 2002 12:00:00:000AM"

20 0.00432500

Adding Boundary Values To The Histogram cont.

Out of bounds costing that uses a 0.00 selectivity

select count(*) from TW1 where F = "April 30, 2002"

Adding Boundary Values To The Histogram cont.

Out of bounds costing that uses a 1.00 selectivity

select count(*) from TW1 where F >= "Apr 30 2002"

> “Apr 30 2002”

“May 16 2002”

Estimated selectivity for F,

selectivity = 1.000000.

Lower bound search value 'Apr 30 2002 12:00:00:000AM' is less

than the smallest value in sysstatistics for this column.

Estimating selectivity of index 'ind_F', indid 6

scan selectivity 1.000000, filter selectivity 1.000000

Search argument selectivity is 1.000000.
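The two out of bounds cases above (selectivity 0.00 for an equality SARG, 1.00 for a >= SARG, both below the lowest histogram value) can be captured in a short Python sketch. The treatment of the remaining operators and of values above the histogram is an assumption made by symmetry, not something the trace output shows. The function name and the upper histogram boundary are hypothetical, since the slide does not show the value of the last step.

```python
from datetime import date
from typing import Optional

RANGE_OPS = ("=", "<", "<=", ">", ">=")

def out_of_bounds_selectivity(op: str, value: date,
                              hist_min: date, hist_max: date) -> Optional[float]:
    """Return 0.0 or 1.0 when the SARG value lies outside the histogram.

    Returns None when the value is inside the histogram, where normal
    histogram costing would apply instead.
    """
    if op not in RANGE_OPS:
        raise ValueError(f"unsupported operator: {op}")
    if value < hist_min:
        # Every value the histogram knows about is above the SARG value.
        return 1.0 if op in (">", ">=") else 0.0
    if value > hist_max:
        # Every value the histogram knows about is below the SARG value.
        return 1.0 if op in ("<", "<=") else 0.0
    return None

HIST_MIN = date(2002, 5, 1)   # lowest boundary in the histogram for column F
HIST_MAX = date(2002, 5, 15)  # hypothetical: the last step value is not shown

print(out_of_bounds_selectivity("=", date(2002, 4, 30), HIST_MIN, HIST_MAX))
print(out_of_bounds_selectivity(">=", date(2002, 4, 30), HIST_MIN, HIST_MAX))
```

A selectivity of 1.0 is the expensive case, because the optimizer then expects every row to qualify.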

Adding Boundary Values To The Histogram cont.

What to do if out of bounds costing is a problem

• Not always a problem, particularly when a selectivity of

0.000000 is used

• There are two ways to deal with it

• Add a dummy row to the table with a column value that

allows the SARG value(s) to fall within the histogram – not

always allowed

• If you do add a dummy row keep in mind that it will affect

the histograms of other columns; be careful with the values

you use

• Write a new histogram boundary using optdiag. Edit the file

and read it back in. This won’t directly affect the data, but it

will extend the histogram to include the SARG value(s)

Removing Statistics Can Affect Query Plans

Sometimes no statistics are better than having them

This will usually be an issue when very dense columns

are involved

Histogram for column: "E"

Step Weight Value

1 0.00000000 < "no"

2 0.47256401 = "no"

3 0.00000000 < "yes"

4 0.52743602 = "yes"

This can also show up when you have ‘spikes’

(Frequency count cells) in the distribution

Removing Statistics Can

Affect Query Plans cont.

select count(*) from TW4

where E = "yes" and C = 825765940

The table…has 1000000 rows, 24098 pages,

Estimated selectivity for E,

selectivity = 0.527436, upper limit = 0.527436.

Estimating selectivity of index 'E_AA_B', indid 6

scan selectivity 0.52743602, filter selectivity 0.527436

527436 rows, 174107 pages

The best qualifying index is 'E_AA_B' (indid 6)

costing 174107 pages, with an estimate of 526 rows

FROM TABLE

TW4

Nested iteration.

Table Scan.

Removing Statistics Can

Affect Query Plans cont.

delete statistics TW4(E)

Estimated selectivity for E,

selectivity = 0.100000.

Estimating selectivity of index 'E_AA_B', indid 6

scan selectivity 0.100000, filter selectivity 0.100000

100000 rows, 20584 pages

The best qualifying index is 'E_AA_B' (indid 6)

costing 20584 pages, with an estimate of 92 rows

FROM TABLE

TW4

Nested iteration.

Index : E_AA_B

Forward scan.

Positioning by key.
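The row counts in the two traces for column E follow directly from the scan selectivity: the weight of the "yes" Frequency count cell when statistics exist, and the default of 0.10 once they are deleted. The Python sketch below shows the lookup. The function name is hypothetical, and returning 0.0 for a value that has no Frequency count cell is a simplification made for this example only.

```python
from typing import Dict, Optional

DEFAULT_EQUALITY_SELECTIVITY = 0.10  # used in the trace after delete statistics

def equality_selectivity(value: str,
                         frequency_cells: Optional[Dict[str, float]]) -> float:
    """Selectivity of 'column = value' from Frequency count cell weights.

    frequency_cells maps a column value to its cell weight; None means
    the column has no statistics, so the default is used.
    """
    if frequency_cells is None:
        return DEFAULT_EQUALITY_SELECTIVITY
    # Simplification: a value with no cell is treated as matching no rows.
    return frequency_cells.get(value, 0.0)

TW4_ROWS = 1_000_000
E_HISTOGRAM = {"no": 0.47256401, "yes": 0.52743602}  # weights from the slide

rows_with_stats = round(TW4_ROWS * equality_selectivity("yes", E_HISTOGRAM))
rows_without_stats = round(TW4_ROWS * equality_selectivity("yes", None))

print("rows scanned with statistics on E:   ", rows_with_stats)
print("rows scanned without statistics on E:", rows_without_stats)
```

The accurate histogram makes the E_AA_B index look expensive (527436 rows), while the less accurate default makes it look cheap enough to use (100000 rows). That is the sense in which removing statistics changed the plan.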

Maintaining Tuned Statistics

• Any statistical value you write to sysstatistics either via

optdiag or sp_modifystats will be overwritten by update

statistics

• Keep optdiag input files for reuse

• If needed get an optdiag output file, edit it and read it in

• Keep scripts that run sp_modifystats

• Rewrite tuned statistics after running update statistics that

affects the column with the modified statistics

Sampling For Update Statistics

New feature in 12.5.0.3

• Can dramatically speed up the running of update statistics

• Reads rows from random pages to build column level

statistics (histogram)

• The percentage of pages to sample can be specified

update statistics table(col) with sampling=10 percent

• Also applies to update index statistics and

update all statistics

• Unofficial tests show that a sampling rate of 10% on a 1

million row numeric column reduces the time for update

statistics to run from 9 minutes to 30 seconds

Sampling For Update Statistics cont.

• Density values not updated by sampling

• Sampled statistics will vary from those obtained by a ‘full

scan’

• More variations will appear as the sampling rate

decreases

• Test queries against sampled statistics. In most cases

you won’t see any major changes

• Values may become ‘out of bounds’, which will affect the

optimizer – likely to have the greatest effect on atomic

columns

• The Sybase Customer newsgroups

• http://support.sybase.com/newsgroups

• The Sybase list server

• SYBASE-L@LISTSERV.UCSB.EDU

• The external Sybase FAQ

• http://www.isug.com/Sybase_FAQ/

• Join the ISUG, ISUG Technical Journal, feature requests

• http://www.isug.com

• The latest Performance and Tuning Guide

• Don’t be put off by the ASE 12.0 in the title, it covers the

11.9.2 features/functionality too

• http://sybooks.sybase.com/onlinebooks/group-as/asg1200e

• Any “What’s New” docs for a new ASE release

• Tech Docs at Sybase Support

• http://techinfo.sybase.com/css/techinfo.nsf/Home

• http://www.sybase.com/support/techdocs/migration

Sybase Developer Network (SDN)

and up-to-date technical information:

• White papers and documentation

• Collaboration with other developers and Sybase engineers

• Code samples and beta programs

• Technical recordings

• Free software

• Join today: www.sybase.com/developer or visit SDN at

TechWave’s Technology Boardwalk
