
Assumptions

This is NOT going to be a ‘Basic’ presentation

We will be reviewing and discussing fairly advanced areas of optimizer P&T. Some of this you may have seen in the past, but a little review never hurts.

• You’ve worked with optimizer P&T

• You’re running ASE 11.9.2 or above

• You understand the basics of optimization

• You’ve used traceons 302/310 and optdiag

• You’ve used the various update statistics syntax available in ASE 11.9.2 and above

• You really want to know about tuning the statistics

There are Two Kinds of Optimizer Statistics

• Table/index level - describes a table and its index(es)

• Page/row counts, cluster ratios, deleted and forwarded rows

• Some are updated dynamically as DML occurs: page/row counts, deleted rows, forwarded rows, cluster ratios

• Stored in systabstats

• Column level - describes the data to the optimizer

• Histogram (distribution), density values, default selectivity values

• Static, need to be updated or written directly

• Stored in sysstatistics

• This presentation deals with the column level statistics

Some Quick Definitions

Range cell density: 0.0037264745412389

Total density: 0.3208892191740000

Range selectivity: default used (0.33)

In between selectivity: default used (0.25)

Histogram for column: "A"

Column datatype: integer

Requested step count: 20

Actual step count: 10

Step Weight Value

1 0.00000000

Statistics On Inner Columns of Composite Indexes

• Think of a composite index as a 3D object: columns with statistics are transparent, those without statistics are opaque

• Columns with statistics give the optimizer a clearer picture of an index - sometimes good, sometimes not

• This is a fairly common practice

• Does add maintenance

• update index statistics is most commonly used to do this

Statistics On Inner Columns of Composite Indexes cont.

Index on columns E and B - no statistics on column B

select * from TW4
where E = "yes" and b >= 959789065 and id >= 600000 and
F > "May 14, 2002" and A_A = 959000000

Beginning selection of qualifying indexes for table 'TW4',
varno = 0, objectid 464004684.
The table (Allpages) has 1000000 rows, 24098 pages,
Estimated selectivity for E,
selectivity = 0.527436, upper limit = 0.527436.
No statistics available for B,
using the default range selectivity to estimate selectivity.
Estimated selectivity for B,
selectivity = 0.330000.

Statistics On Inner Columns of Composite Indexes cont.

The best qualifying index is 'E_B' (indid 7)
costing 49264 pages, with an estimate of 191
rows to be returned per scan of the table

FINAL PLAN (total cost = 481960):
varno=0 (TW4) indexid=0 ()
path=0xfbccc120 pathtype=sclause
method=NESTED ITERATION

Table: TW4 scan count 1, logical reads:(regular=24098 apf=0 total=24098)
physical reads: (regular=16468 apf=0 total=16468),
apf IOs used=0

Statistics On Inner Columns of Composite Indexes cont.

Statistics are now on column B

Estimated selectivity for E,
selectivity = 0.527436, upper limit = 0.527436.
Estimated selectivity for B,
selectivity = 0.022199, upper limit = 0.074835.
The best qualifying index is 'E_B' (indid 7)
costing 3317 pages, with an estimate of 13 rows to
be returned per scan of the table

FINAL PLAN (total cost = 55108):
varno=0 (TW4) indexid=7 (E_B)
path=0xfbd1da08 pathtype=sclause
method=NESTED ITERATION

Table: TW4 scan count 1, logical reads:(regular=4070 apf=0 total=4070),
physical reads: (regular=820 apf=0 total=820),
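
To see how much the statistics on column B changed the outcome, the reported figures from the two runs can be compared with simple arithmetic. The sketch below only copies numbers from the traces above; the dictionary keys and the `reduction_factor` helper are illustrative names, not part of any Sybase tool.

```python
# Figures copied from the FINAL PLAN and statistics io output of the two runs.
without_stats = {"total_cost": 481960, "logical_reads": 24098, "physical_reads": 16468}
with_stats = {"total_cost": 55108, "logical_reads": 4070, "physical_reads": 820}


def reduction_factor(before, after):
    """Return how many times smaller `after` is than `before`."""
    return before / after


# One factor per metric: without statistics on B divided by with statistics on B.
factors = {name: reduction_factor(without_stats[name], with_stats[name])
           for name in without_stats}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x lower with statistics on B")
```

The physical reads show the largest change, which is consistent with the plan moving from a table scan (indexid=0) to the E_B index (indexid=7).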

Statistics On Non-Indexed Columns and Joins

Can’t help with index selection but can affect join ordering

• Columns with statistics give the optimizer a clearer picture of the column - no hard-coded assumptions have to be used

• When costing joins of non-indexed columns, having statistics may result in better plans than using the default values

• Without statistics there will be no Total density or histogram that the optimizer can use to cost the column in the join

• Yes, in some circumstances histograms can be used in costing joins: if there is a SARG on the joining column and that column is also in the join table, then the SARG from the joining table can be used to filter the join table

• If there is no SARG on the join column or on the joining column, the Total density value (with stats) or the default value (w/o stats) will be used

Statistics On Non-Indexed Columns and Joins cont.

“Inherited” SARG example

select ....from TW1, TW4
where TW1.A = TW4.A and TW1.A = 10

Selecting best index for the JOIN CLAUSE:
TW4.A = TW1.A
TW4.A = 10
Estimated selectivity for a,
selectivity = 0.003726, upper limit = 0.049683.

Histogram values used

select ....from TW1, TW4
where TW1.A = TW4.A and TW1.B = 10

Selecting best index for the JOIN CLAUSE:
TW4.A = TW1.A
Estimated selectivity for a,
selectivity = 0.320889.

Total density value used
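
The rule behind these two traces can be written down as a small decision function. This is only a sketch of the behaviour described on these slides, not the optimizer's actual code: the function name and parameters are hypothetical, and the 0.10 default is the value that 302 output reports elsewhere in this deck for a column with no statistics.

```python
# Selectivity reported in 302 output for a column that has no statistics.
DEFAULT_SELECTIVITY = 0.10


def join_column_selectivity(has_stats, has_inherited_sarg,
                            histogram_selectivity=None, total_density=None):
    """Return (selectivity, source) for a join column, following the slide's rule."""
    if not has_stats:
        # No histogram and no Total density: the hard-coded default is all there is.
        return DEFAULT_SELECTIVITY, "default"
    if has_inherited_sarg:
        # A SARG on the other table's join column can be applied to this column,
        # so the histogram can be used.
        return histogram_selectivity, "histogram"
    # Statistics exist but nothing filters the column: use Total density.
    return total_density, "total density"


# TW4.A values taken from the two traces above.
with_sarg = join_column_selectivity(True, True, 0.003726, 0.320889)
without_sarg = join_column_selectivity(True, False, 0.003726, 0.320889)
no_stats = join_column_selectivity(False, True)

print(with_sarg)     # first query: TW1.A = 10 is inherited by TW4.A
print(without_sarg)  # second query: the SARG is on TW1.B, so nothing is inherited
print(no_stats)
```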

Statistics On Non-Indexed Columns and Joins - Example

select * from TW1,TW2
where TW1.A=TW2.A and TW1.A =805975090

A simple join with a SARG on the join column of one table

Table TW2 column A has no statistics, TW1 column A does

Selecting best index for the JOIN CLAUSE: (for TW2.A)
TW2.A = TW1.A
TW2.A = 805975090

Inherited from SARG on TW1. But, can’t help…no stats

Estimated selectivity for A,
selectivity = 0.100000.
The best qualifying access is a table scan,
costing 13384 pages, with an estimate of 50000
rows to be returned per scan of the table,
using no data prefetch (size 2K I/O),
in data cache 'default data cache' (cacheid 0)
with MRU replacement
Join selectivity is 0.100000.

Inherited SARG from other table doesn’t help in this case

Statistics On Non-Indexed Columns and Joins - Example cont.

Without statistics on TW2.A the plan includes a reformat with TW1 as the outer table

FINAL PLAN (total cost = 2855774):
varno=0 (TW1) indexid=2 (A_E_F)
path=0xfbd46800 pathtype=sclause
method=NESTED ITERATION
varno=1 (TW2) indexid=0 ()
path=0xfbd0bb10 pathtype=join
method=REFORMATTING

• Not the best plan - but the optimizer had little to go on

Statistics On Non-Indexed Columns and Joins - Example cont.

• Table TW2 column A now has statistics

• The inherited SARG on TW1.A can now be used to help filter the join on TW2.A

Selecting best index for the JOIN CLAUSE:
TW2.A = TW1.A
TW2.A = 805975090
Estimated selectivity for A,
selectivity = 0.001447, upper limit = 0.052948.
The best qualifying access is a table scan,
costing 13384 pages, with an estimate of 724 rows to be
returned per scan of the table, using no data prefetch
(size 2K I/O), in data cache 'default data cache' (cacheid
0) with MRU replacement
Join selectivity is 0.001447.

Statistics On Non-Indexed Columns and Joins - Example cont.

• With statistics on TW2.A reformatting is not used and the join order has changed

FINAL PLAN (total cost = 1252148):
varno=1 (TW2) indexid=0 ()
path=0xfbd0b800 pathtype=sclause
method=NESTED ITERATION
varno=0 (TW1) indexid=2 (A_E_F)
path=0xfbd46800 pathtype=sclause
method=NESTED ITERATION

The Effects of Changing the Number of Steps (Cells)

• The number of cells (steps) affects SARG costing - as the number of steps changes, costing does too

• Cell weights and range cell density are used in costing SARGs

• Cell weight is used as the column’s ‘upper limit’; Range cell density is used as ‘selectivity’ for Equi-SARGs, as seen in 302 output

• The result of interpolation is used as the column ‘selectivity’ for Range SARGs

• Increasing the number of steps narrows the average cell width, thus the weight of Range cells decreases

• Can also result in more Frequency count cells and thus change the Range cell density value

• More cells means more granular cells

The Effects of Changing the Number of Steps (Cells) cont.

Average cell width = # of rows/(# of requested steps - 1)

• Table has 1 million rows

• Requested 20 steps: 1,000,000/19 = 52,632 rows per cell

• Requested 200 steps: 1,000,000/199 = 5,025 rows per cell

• What does this mean?

• As you increase the number of steps (cells) they become narrower, representing fewer values

• We’ll see that this has an effect on how the optimizer estimates the cost of a SARG
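
The cell width formula is easy to check for other step counts. A minimal sketch follows; the function name is hypothetical, and the cell weight line assumes rows are spread evenly across cells, which real histograms only approximate.

```python
def average_cell_width(row_count, requested_steps):
    """Average rows per cell = # of rows / (# of requested steps - 1)."""
    if requested_steps < 2:
        raise ValueError("need at least 2 requested steps")
    return row_count / (requested_steps - 1)


ROWS = 1_000_000  # table size used on the slide

for steps in (20, 200):
    width = average_cell_width(ROWS, steps)
    # Under an even spread, each cell's weight is its share of the table's rows.
    weight = width / ROWS
    print(f"{steps} steps: about {round(width):,} rows per cell, "
          f"average cell weight about {weight:.6f}")
```

Going from 20 to 200 requested steps shrinks the average cell by roughly a factor of ten, which is why the 'upper limit' the optimizer reports for a SARG drops as steps are added.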

The Effects of Changing the Number of Steps (Cells) cont.

Changing the number of steps - effects on Equi-SARGs

select A from TW2 where B = 842000000

With 20 cells (steps) in the histogram

Range cell density: 0.0012829768785739

9 0.05263200

• Range cell density decreased because Frequency count cells appeared in the histogram

The Effects of Changing the Number of Steps (Cells) cont.

With 200 cells (steps) in the histogram

Range cell density: 0.0002303825911991

77 0.00507200

The Effects of Changing the Number of Steps (Cells) cont.

Changing the number of steps - effects on Range SARGs

select * from TW2 where B between
825570000 and 830000000

With 20 cells (steps) in the histogram

Range cell density: 0.0012829768785739

9 0.05263200

The Effects of Changing the Number of Steps (Cells) cont.

select * from TW2 where B between
825570000 and 830000000

With 200 cells (steps) in the histogram

Range cell density: 0.0002303825911991

67 0.00505200

Adding Boundary Values To The Histogram

• Changing the boundary values can keep SARG values within the histogram

• Avoids ‘out of bounds’ costing

• Out of bounds costing usually happens on an atomic column whose histogram is out of date in relation to the SARG value(s)

• Optimizer has only two choices for selectivity, 1 or 0, depending on the SARG operator and which end of the histogram the SARG value falls outside of
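
The 1-or-0 choice can be sketched as a small function. Only two cases are confirmed by the traces in this deck (an equality below the lowest histogram value costs 0, and a `>=` below the lowest value costs 1); the remaining branches are the symmetric cases and are an assumption. The function name is hypothetical, and the upper boundary `HIST_MAX` is an assumed value for illustration.

```python
from datetime import date

HIST_MIN = date(2002, 5, 1)   # lowest boundary value in the example histogram
HIST_MAX = date(2002, 5, 16)  # assumed highest boundary value, for illustration


def out_of_bounds_selectivity(op, value, hist_min, hist_max):
    """Return 0.0 or 1.0 if `value` is outside the histogram, else None."""
    below = value < hist_min
    above = value > hist_max
    if not (below or above):
        return None  # inside the histogram: normal costing applies
    if op == "=":
        return 0.0  # no histogram cell can contain the value
    if op in (">", ">="):
        # Everything in the histogram is above a value below its low end.
        return 1.0 if below else 0.0
    if op in ("<", "<="):
        # Everything in the histogram is below a value above its high end.
        return 0.0 if below else 1.0
    raise ValueError(f"unsupported operator: {op}")


print(out_of_bounds_selectivity("=", date(2002, 4, 30), HIST_MIN, HIST_MAX))
print(out_of_bounds_selectivity(">=", date(2002, 4, 30), HIST_MIN, HIST_MAX))
```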

Adding Boundary Values To The Histogram cont.

Histogram for column: "F"
Column datatype: datetimn
Requested step count: 20
Actual step count: 20

Step  Weight      Value
1     0.28396901  <  "May 1 2002 12:00:00:000AM"
2     0.04839900  =  "May 1 2002 12:00:00:000AM"
20    0.00432500

Adding Boundary Values To The Histogram cont.

Out of bounds costing that uses a 0.00 selectivity

select count(*) from TW1 where F = "April 30, 2002"

Adding Boundary Values To The Histogram cont.

Out of bounds costing that uses a 1.00 selectivity

select count(*) from TW1 where F >= "Apr 30 2002"

> “Apr 30 2002”

“May 16 2002”

Estimated selectivity for F,
selectivity = 1.000000.
Lower bound search value 'Apr 30 2002 12:00:00:000AM' is less
than the smallest value in sysstatistics for this column.
Estimating selectivity of index 'ind_F', indid 6
scan selectivity 1.000000, filter selectivity 1.000000
Search argument selectivity is 1.000000.

Adding Boundary Values To The Histogram cont.

What to do if out of bounds costing is a problem

• Not always a problem, particularly when a selectivity of 0.000000 is used

• There are two ways to deal with it

• Add a dummy row to the table with a column value that allows the SARG value(s) to fall within the histogram - not always allowed

• If you do add a dummy row, keep in mind that it will affect the histograms of other columns; be careful with the values you use

• Write a new histogram boundary using optdiag. Edit the file and read it back in. This won’t directly affect the data, but it will extend the histogram to include the SARG value(s)

Removing Statistics Can Affect Query Plans

Sometimes no statistics are better than having them

This will usually be an issue when very dense columns are involved

Histogram for column: "E"

Step  Weight      Value
1     0.00000000  <  "no"
2     0.47256401  =  "no"
3     0.00000000  <  "yes"
4     0.52743602  =  "yes"

This can also show up when you have ‘spikes’ (Frequency count cells) in the distribution

Removing Statistics Can Affect Query Plans cont.

select count(*) from TW4
where E = "yes" and C = 825765940

The table…has 1000000 rows, 24098 pages,
Estimated selectivity for E,
selectivity = 0.527436, upper limit = 0.527436.
Estimating selectivity of index 'E_AA_B', indid 6
scan selectivity 0.52743602, filter selectivity 0.527436
527436 rows, 174107 pages
The best qualifying index is 'E_AA_B' (indid 6)
costing 174107 pages, with an estimate of 526 rows

FROM TABLE
TW4
Nested iteration.
Table Scan.

Removing Statistics Can Affect Query Plans cont.

delete statistics TW4(E)

Estimated selectivity for E,
selectivity = 0.100000.
Estimating selectivity of index 'E_AA_B', indid 6
scan selectivity 0.100000, filter selectivity 0.100000
100000 rows, 20584 pages
The best qualifying index is 'E_AA_B' (indid 6)
costing 20584 pages, with an estimate of 92 rows

FROM TABLE
TW4
Nested iteration.
Index : E_AA_B
Forward scan.
Positioning by key.
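
The row counts in the two traces follow directly from multiplying the selectivity by the table's row count. A minimal sketch of that arithmetic, using only figures from the traces above (the function and variable names are illustrative):

```python
TABLE_ROWS = 1_000_000  # TW4 row count reported in the trace


def estimated_rows(selectivity, table_rows=TABLE_ROWS):
    """Rows the optimizer expects a SARG with this selectivity to qualify."""
    return round(selectivity * table_rows)


# With statistics: the Frequency count cell weight for E = "yes" is used.
rows_with_stats = estimated_rows(0.527436)
# After delete statistics: the 0.10 default selectivity is used instead.
rows_without_stats = estimated_rows(0.10)

# Index page costs copied from the traces for the same index, E_AA_B.
pages_with_stats, pages_without_stats = 174107, 20584

print(rows_with_stats, rows_without_stats)
print(f"index cost ratio: {pages_with_stats / pages_without_stats:.1f}x")
```

With the accurate but very high selectivity the index looks more expensive than the 24098-page table scan; with the default 0.10 it looks cheaper, so the plan switches to the index.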

Maintaining Tuned Statistics

Tuned statistics will add to your maintenance

• Any statistical value you write to sysstatistics, either via optdiag or sp_modifystats, will be overwritten by update statistics

• Keep optdiag input files for reuse

• If needed, get an optdiag output file, edit it and read it in

• Keep scripts that run sp_modifystats

• Rewrite tuned statistics after running update statistics that affects the column with the modified statistics

Sampling For Update Statistics

New feature in 12.5.0.3

• Can dramatically speed up the running of update statistics

• Reads rows from random pages to build column level statistics (histogram)

• The percentage of pages to sample can be specified

update statistics table(col) with sampling=10 percent

• Also applies to update index statistics and update all statistics

• Unofficial tests show that a sampling rate of 10% on a 1 million row numeric column reduces the time for update statistics to run from 9 minutes to 30 seconds

Sampling For Update Statistics cont.

• Density values are not updated by sampling

• Sampled statistics will vary from those obtained by a ‘full scan’

• More variations will appear as the sampling rate decreases

• Test queries against sampled statistics. In most cases you won’t see any major changes

• Values may become ‘out of bounds’; this will affect the optimizer and is likely to have the greatest effect on atomic columns

Where To Get More Information

• The Sybase Customer newsgroups

• http://support.sybase.com/newsgroups

• The Sybase list server

• SYBASE-L@LISTSERV.UCSB.EDU

• The external Sybase FAQ

• http://www.isug.com/Sybase_FAQ/

• Join the ISUG, ISUG Technical Journal, feature requests

• http://www.isug.com

Where To Get More Information cont.

• The latest Performance and Tuning Guide

• Don’t be put off by the ASE 12.0 in the title, it covers the 11.9.2 features/functionality too

• http://sybooks.sybase.com/onlinebooks/group-as/asg1200e

• Any “What’s New” docs for a new ASE release

• Tech Docs at Sybase Support

• http://techinfo.sybase.com/css/techinfo.nsf/Home

• Upgrade/Migration help page

• http://www.sybase.com/support/techdocs/migration

Sybase Developer Network (SDN)

Additional Resources for Developers/DBAs

• Single point of access to developer software, services, and up-to-date technical information:

• White papers and documentation

• Collaboration with other developers and Sybase engineers

• Code samples and beta programs

• Technical recordings

• Free software

• Join today: www.sybase.com/developer or visit SDN at TechWave’s Technology Boardwalk