
This Presentation Will Include:

• An overview of Query Processing and the Optimizer’s place in it

• Evolution of the Optimizer over the last three versions

• Description of the Optimizer Statistics

• How, where and why to add statistics

• Some Optimizer-related Myths and Legends

The Optimizer’s Place In Query Processing - cont.

The optimizer is one phase of Query Processing

• Parser - checks the SQL for validity

• Normalizer - changes names to object ids, rearranges ANDs and ORs, creates the query tree

• Pre-processing - resolves views and merges them with the tree, transforms subqueries

• Optimizer - estimates the cheapest way to access the data using cost-based estimates, creates the query plan

• Execution engine - executes the steps in the query plan, returns the result set

ASE 11.9.2 - The First Major Optimizer Changes

ASE 11.9.2 introduced major changes

First phase of the optimizer redesign

• Major changes to the statistics - statistics system tables added

• New update statistics and create index syntax developed to allow placement of statistics on columns

• New statistics used in costing - the cluster ratios

• New costing methods and enhancement of existing costing

• Tool added to allow reading, writing and simulating of the statistics - optdiag utility

ASE 12.0 - More New Features and Functionality

ASE 12.0 - Enhancements and Expanded Functionality

Many involved both optimization and query processing

• Sort Merge Joins

• Join Transitive Closure

• Predicate Factoring and Transformation

• Like optimization enhancements

• 50 Table Limit

• Abstract Query Plans

ASE 12.5

No fundamental optimizer changes, some QP features

• Handling statistics for columns greater than 255 bytes

• Union in views

• Changes to optdiag to handle large page sizes

What Are The Statistics

They describe the table, its indexes and the data in the columns to the optimizer

Used by the optimizer to estimate the most efficient way to access the data required by a query - selectivity

• The statistics are all the optimizer knows about your dataset

• Without statistics the optimizer can only make guesses about how selective a column or index will be - it will use default values for selectivity

Some Terms and Definitions

• Histogram - representation of the distribution of values in the column

• Steps (cells) - sampled values from the column used to build the histogram

• Density values - measure the average number of duplicate values in the column, used to cost joins and SARGs

• Histogram weights - the percentage of the column represented by a histogram cell, makes each cell very accurate

• Costing - the optimizer’s process of estimating the cost of an access method

• Frequency count cell (histogram) - represents one highly duplicated value, very accurate measure

Review Of Changes To The Statistics in ASE 11.9.2 & Above

There are two types of statistics -

Table/Index level statistics describe the table and its indexes to the optimizer. They are now centralized in systabstats

• Some are maintained dynamically by ASE

• Page/Row counts, deleted/forwarded rows, cluster ratios

• Should not be written directly - will be quickly overwritten

Column level (Distribution) statistics belong to a column, not an index. They describe the data in the column to the optimizer. They are stored in sysstatistics.

• Are static, need to be updated or written

• Can be written directly

Additional & Modified Column Statistics

Additional Statistics

• Statistics on columns other than the leading column of an index - minor index columns and non-indexed columns

Modifying statistics -

• Any statistical value you write directly

• Usually done via optdiag or sp_modifystats

• Only column level (sysstatistics) values should be written

Why Would You Want To Add or Modify Statistics

• Make composite indexes more selective - adding statistics to inner index columns - Highly recommended

• Put statistics on non-indexed columns - good for join costing, no statistics on a column means assumptions are made

• Change the number of steps (cells) in a column’s histogram

• More granular histograms - Highly recommended

• Change the default selectivity values - writing statistics

• Change the density values - writing statistics

• Add statistics to a column’s histogram - writing statistics

Update Statistics

The statistics are obtained by reading the data in a column.

Update statistics gathers and writes the statistics

• Statistics based on the data and the state of the table/index

update statistics table_A [index_1]

Writes statistics for the leading column of index(es) only - same as in past versions of ASE

update statistics table_A (column_name)

• This will create or update statistics on the specified column

Update Statistics cont.

Use update statistics to build/update column level statistics

update index statistics table_name [index_name]

• This will build/update column statistics on all columns of all indexes in the table, or on the specified index.

• Highly recommended

update all statistics table_name

• This will create or update column level statistics on all columns of the table. It will also run update partition statistics.

• WARNING - This can take a VERY long time to run

• It is rarely necessary to run update all statistics

A Word About Maintaining Statistics

The more columns with statistics the more maintenance - usually a good trade off

Maintenance considerations include -

• The time to run update statistics on each column

• Editing and/or reading in an optdiag file

• Increased use of tempdb for updating column statistics

• A worktable will be used to sort all inner index columns and non-indexed columns

• Proc cache usage for sorts - dependent on size of datatype

Adding Statistics

Statistics can be added to any column

Gives the optimizer more information about a composite index (more selective).

On a non-indexed column - more accurate join costing of non-indexed columns

• Not absolutely necessary, but highly recommended

• update statistics table_name (col_name)

• update index statistics table_name [ind_name]

• Does add maintenance

• Test it before implementing

Adding Statistics To Composite Index Columns - e.g.:

Statistics on the leading column of the index only - col_A

SARGs on col_A, col_B, col_C

• No statistics available for col_B or col_C

col_B uses the default in-between selectivity of 0.25

col_C uses selectivity of 0.10 for equi-SARG with no statistics

Estimating selectivity of index 't1_i1', indid 2

scan selectivity 0.900251, filter selectivity 0.056268

5627 rows, 6480 pages

Search argument selectivity is 0.05627

Adding Statistics To Composite Index Columns - e.g.:

Statistics on all three columns of the index

Estimating selectivity of index 't1_i1', indid 2

scan selectivity 0.900283, filter selectivity 0.000283

28 rows, 882 pages

Search argument selectivity is 0.000028.

Table: t1 scan count 1, logical reads: (regular=885 apf=0 total=885), physical reads: (regular=0 apf=0 total=0)
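
The row counts in the two traces follow from the filter selectivity. As a rough sketch, assuming the table holds about 100,000 rows (an assumption; the slides do not state the table size), the estimate is simply the row count multiplied by the selectivity:

```python
# Sketch: estimated qualifying rows = table rows * filter selectivity.
# TABLE_ROWS is an assumption; the trace output does not state the table size.
TABLE_ROWS = 100_000


def estimated_rows(filter_selectivity: float, table_rows: int = TABLE_ROWS) -> int:
    """Return the estimated number of qualifying rows, rounded to the nearest row."""
    return round(table_rows * filter_selectivity)


# Filter selectivities taken from the two traceon 302 samples above.
leading_column_only = estimated_rows(0.056268)
all_three_columns = estimated_rows(0.000283)

print(leading_column_only, all_three_columns)  # 5627 28
```

Under that assumption the numbers line up with the 5627 and 28 rows reported in the traces, which is why the extra column statistics cut the estimated page count so sharply.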

Adding Statistics To Non-Indexed Columns

Useful in costing joins on non-indexed columns

Without statistics on the column there is no Total density value to use in costing joins

• Assumptions made about how many rows will qualify

• Not usually accurate - based on the join operator

Estimated selectivity for col_A,

selectivity = 0.100000.

Statistics on the column allow the total density value to be used to estimate the number of qualifying rows.

Estimated selectivity for col_A,

selectivity = 0.000025, upper limit = 0.081425.
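
To see what the two selectivity values mean for join costing, here is a small sketch. The inner table size of 100,000 rows is hypothetical and chosen only for illustration; the two selectivities are the ones shown in the trace samples above.

```python
# Sketch: rows expected to qualify per join probe = inner table rows * selectivity.
# INNER_TABLE_ROWS is a hypothetical figure chosen for illustration.
INNER_TABLE_ROWS = 100_000

DEFAULT_SELECTIVITY = 0.100000   # from the trace without statistics on col_A
DENSITY_SELECTIVITY = 0.000025   # from the trace with statistics on col_A


def rows_per_probe(selectivity: float, inner_rows: int = INNER_TABLE_ROWS) -> float:
    """Estimated inner-table rows matching one outer row."""
    return inner_rows * selectivity


without_stats = rows_per_probe(DEFAULT_SELECTIVITY)
with_stats = rows_per_probe(DENSITY_SELECTIVITY)

print(f"without statistics: {without_stats:.1f} rows per probe")
print(f"with statistics:    {with_stats:.1f} rows per probe")
```

With these assumed sizes the default guess is several thousand times larger than the density-based estimate, which can easily change the join order the optimizer picks.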

Changing Requested Step Count

The number of histogram steps has an effect on the optimizer

By default new column statistics are built using 20 steps (cells)

If statistics exist the existing step count will be reused unless you specify a different count

• Increasing the step count may result in more Frequency count cells - they represent only one value, very accurate

• Will affect optimization of SARGs because cell granularity is increased - cells represent fewer rows each - lower weights
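
The effect of the step count on cell weights can be sketched as follows. This assumes roughly equal-height range cells and a hypothetical 100,000-row table; the histogram that ASE actually builds is more involved, so treat the figures as orders of magnitude only.

```python
# Sketch: how the requested step count changes the weight of each histogram cell.
# Assumes equal-height range cells and a hypothetical 100,000-row table; the real
# histogram builder in ASE is more involved than this.
TABLE_ROWS = 100_000


def cell_weight(steps: int) -> float:
    """Approximate fraction of the column represented by one range cell."""
    return 1.0 / steps


def rows_per_cell(steps: int, table_rows: int = TABLE_ROWS) -> float:
    """Approximate number of rows represented by one range cell."""
    return table_rows * cell_weight(steps)


for steps in (20, 200):
    print(steps, cell_weight(steps), rows_per_cell(steps))
```

Going from 20 to 200 steps shrinks each cell from about 5% of the column to about 0.5%, so a SARG that falls inside one cell is costed against far fewer rows.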

Changing Requested Step Count - How Many Steps to Request

The number of steps to request will depend on your data and your queries

From the default of 20 try 200 and run tests - use trial and error to determine the best number of steps to use

• Use traceon 302 to monitor changes to estimated selectivity values as you change the number of steps

There’s no rule of thumb for how many steps to use

Changing Requested Step Count - How To Do It

Extensions to create index and update statistics

create index I1 on T1 (colA) with statistics using X values

update statistics T1 using X values

X values = requested steps, seen in optdiag

• You may not need a lot of cells

• create index with 0 values will create index, but will not write the statistics

Writing The Statistics Directly

Use an optdiag input file to write the statistics directly

Always get an optdiag output file before writing or changing the statistics - as insurance

• -o output_file_name, -i input_file_name

• Save a clean copy of the output file

• Useful in general if you want to go back to a previous set of statistics

• Rename and edit output file for changes to the statistics

Maintaining Directly Written Statistics

Use optdiag input files to maintain statistics that have been written directly

All changes to the column level statistics will be overwritten by update statistics

Traceon 302 output will display a message when edited statistics are used in costing

Statistics for this column have been edited.

If you change non-persistent values you will need to set them back after updating statistics

Changing Statistics - Highly Duplicated Values and The Statistics

A few values occupy many rows while many values occupy a few rows - spikes in the distribution

Highly duplicated values will have an effect on the Total Density value and thus on the costing of joins.

• Possible that the number of rows qualifying for a join from an inner table will be estimated too high

• Weighted averaging used to obtain the Total density

• Highly duplicated values have a disproportional effect on the Total density value

• Try the arithmetic average -

Number of distinct values / number of rows
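
The disproportional effect of a spike can be sketched with a toy data set. The formula below (sum of squared value fractions) is an assumed reading of "weighted averaging", not something the slides spell out, and the data is hypothetical.

```python
# Sketch: a weighted "total density" as the sum of squared value fractions.
# This formula is an assumption used to illustrate weighted averaging; the
# data set below is hypothetical.
from collections import Counter


def weighted_density(values: list) -> float:
    """Sum over distinct values of (rows with that value / total rows) squared."""
    total = len(values)
    return sum((count / total) ** 2 for count in Counter(values).values())


# 1,000 rows: one value fills 500 rows, the other 500 values appear once each.
spiky = ["hot"] * 500 + [f"v{i}" for i in range(500)]
without_spike = [f"v{i}" for i in range(500)]

spiky_density = weighted_density(spiky)
flat_density = weighted_density(without_spike)

# Estimated matching rows per join probe = density * rows in the table.
print(spiky_density * len(spiky))
print(flat_density * len(without_spike))
```

In this toy case the single hot value pushes the density to about 0.25, so every join probe is costed at roughly 250 rows even though most values match only one row.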

Example of a Highly Duplicated Value in the Histogram

Example of a highly duplicated value in the histogram -

Range cell density: 0.0000502421670203

Total density: 0.2697381850000000

Step Weight Value

1 0.00000000

Changing Statistics - Highly Duplicated Values and The Statistics cont.

sp_modifystats - New system stored procedure

Makes modifications to the density values (best version in 11.9.2.4, 12.0.0.4, 12.5)

• Can specify a value, factor it by 10 or match the Total and Range cell density (not a good idea)

• Documented in 12.5 docs

• Caution - remember that the Total Density will be used for all joins on the column. If it is set very low you may want to modify Total density using optdiag

Changing Statistics - SARG Values That Are Out Of Range

If a value is greater than the largest or less than the smallest histogram boundary value

Traceon 302 sample -

Estimated selectivity for colA,

selectivity = 0.000000, upper limit = 0.000000.

Lower bound search value 10000 is greater than the largest value in sysstatistics for this column.

• Special costing is done

• Not always the most accurate value:

• Selectivity of 0.00 or 1.00 depending on the value and the operator
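
The special costing can be pictured with a small sketch. The mapping from operator to selectivity below is an assumption based on the "0.00 or 1.00" bullet and the trace sample, and the histogram boundaries are hypothetical; it is not the optimizer's actual code.

```python
# Sketch: selectivity for a SARG whose value falls outside the histogram.
# The operator-to-selectivity mapping is an assumption that follows the
# "0.00 or 1.00 depending on the value and the operator" bullet above.
from typing import Optional


def out_of_range_selectivity(op: str, value: float, low: float, high: float) -> Optional[float]:
    """Return 0.0 or 1.0 for an out-of-range value, or None if it is in range."""
    if low <= value <= high:
        return None  # in range: the histogram cells would be used instead
    if op == "=":
        return 0.0  # no value covered by the histogram can equal the search value
    above = value > high
    if op in (">", ">="):
        return 0.0 if above else 1.0
    if op in ("<", "<="):
        return 1.0 if above else 0.0
    raise ValueError(f"unsupported operator: {op}")


# Histogram boundaries of 1 and 9000 are hypothetical; 10000 is the search
# value from the trace sample above.
print(out_of_range_selectivity(">=", 10000, 1, 9000))  # 0.0
print(out_of_range_selectivity("<", 10000, 1, 9000))   # 1.0
```

If rows with values above the last boundary were added after the last update statistics run, the 0.0 estimate is wrong, which is the situation this slide describes.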

Changing Statistics - SARG Values That Are Out Of Range

Two ways to affect the histogram for this -

• Add a dummy row to the data

• Not always practical or allowed - but is persistent

• Add a dummy boundary value to the histogram via optdiag

• Easier to do in some cases, but not persistent

• It’s July 8 and update statistics hasn’t been run - step 20 is last

18 0.05301946

Writing the Statistics Or Running Update Statistics

Two ways to use optdiag instead of update statistics

The Dump and Load Method - no editing of statistics required

• Dump dataset and load somewhere else

• Run update stats on the loaded dataset

• Get an optdiag output file for those tables you want new stats on

• Load the optdiag file into the original dataset

Will take more time than running update statistics on the original dataset - but no interference with users

Writing the Statistics Or Running Update Statistics cont.

The Optdiag Method - requires editing of optdiag output files

• Get an optdiag output file of the tables you want to update statistics on (may also do this for individual columns)

• Edit the files to reflect changes in your dataset - you’ll need to understand what changes have occurred

• Read the file in via optdiag

Very fast, take care to ensure that statistics are correct

Test it before implementing it

Some Optimizer Myths: “The 20% Rule”

“If 20% or more of the table (rows) will be returned the only choice the optimizer has is to table scan”

Grew out of a generality - made its way into many publications

• Was a way of addressing ‘pessimistic’ non-clustered index costing - 1 I/O for every row in the leaf level

• A clustered index access and covered index access disprove it

• Data Row Cluster Ratio now used to measure clustering of index rows in relation to data pages

“Update Statistics Will Give You Good Performance”

Update statistics only guarantees that statistics are up to date at the end of the run

Statistics are the optimizer’s view of your dataset - the view may not always be pretty

• Simply because statistics are up to date doesn’t mean an index or join order will be used

• The distribution of values could change a great deal between runs

• Using an old statistics set is fine if it results in an efficient plan

“Delete Statistics from Time to Time”

We started this one by accident

Resulted from a ‘print bug’ that was fixed early on

• DO NOT DELETE STATISTICS

• Will lose ‘requested step count’ - have to use default of 20 steps

• May have an adverse effect on costing range and equi-SARGs

• May not result in Frequency count cells appearing

• The only time to delete statistics is when you know you don’t need them or when removing them helps

“There’s A Traceflag To Set Optimizer Behavior Back To An Earlier Version”

There is NO traceflag that will do this (there never has been)

This seems to come up with every new version

• There are traces for some individual behavior/functionality

• Usually introduced to ‘fix’ a bug or provide optional or backwards compatible functionality

• Sometimes they have a long ‘life’ in ASE, sometimes not

• Be careful of becoming dependent on them

• Don’t use them unless they’ve been fully explained to you

Conclusion

ASE 11.9.2 and above allows you to add or write statistics

• Adding and writing statistics is not absolutely necessary - but they’re powerful P&T tools

• Adding column level statistics is highly recommended in most cases

• Writing statistics directly is recommended only when necessary

Always test before implementing

Where To Get More Information

• The Sybase Customer newsgroups: http://support.sybase.com/newsgroups

• The Sybase list server: SYBASE-L@LISTSERV.UCSB.EDU

• The external Sybase FAQ: http://www.isug.com/Sybase_FAQ/

• Join the ISUG, ISUG Technical Journal, feature requests: http://www.isug.com