
This Presentation Will Include:

• An overview of Query Processing and the Optimizer's place in it

• Evolution of the Optimizer over the last three versions

• Description of the Optimizer Statistics

• How, where and why to add statistics

• Some Optimizer-related Myths and Legends


The Optimizer’s Place In Query Processing - cont.

The optimizer is one phase of Query Processing:

• Parser - checks the validity of the SQL

• Normalizer - changes names to object IDs, rearranges ANDs and ORs, creates the query tree

• Pre-processing - resolves and merges views with the query tree, transforms subqueries

• Optimizer - estimates the cheapest way to access the data and creates the query plan; cost-based estimates

• Execution engine - executes the steps in the query plan, returns the result set


ASE 11.9.2 - The First Major Optimizer Changes

ASE 11.9.2 introduced major changes - the first phase of the optimizer redesign

• Major changes to the statistics - statistics system tables added

• New update statistics and create index syntax developed to allow placement of statistics on columns

• New statistics used in costing - the cluster ratios

• New costing methods and enhancement of existing costing

• Tool added to allow reading, writing and simulating of the statistics - the optdiag utility


ASE 12.0 - More New Features and Functionality

ASE 12.0 - Enhancements and Expanded Functionality

Many involved both optimization and query processing:

• Sort Merge Joins

• Join Transitive Closure

• Predicate Factoring and Transformation

• Like optimization enhancements

• 50 Table Limit

• Abstract Query Plans


ASE 12.5

No fundamental optimizer changes, some QP features:

• Handling statistics for columns greater than 255 bytes

• Union in views

• Changes to optdiag to handle large page sizes


What Are The Statistics

They describe the table, its indexes and the data in the columns to the optimizer.

Used by the optimizer to estimate the most efficient way to access the data required by a query - selectivity.

• The statistics are all the optimizer knows about your dataset

• Without statistics the optimizer can only make guesses about how selective a column or index will be - it will use default values for selectivity


Some Terms and Definitions

• Histogram - a representation of the distribution of values in the column

• Steps (cells) - sampled values from the column used to build the histogram

• Density values - measure the average number of duplicate values in the column; used to cost joins and SARGs

• Histogram weights - the percentage of the column represented by a histogram cell; makes each cell very accurate

• Costing - the optimizer's process of estimating the cost of an access method

• Frequency count cell - a histogram cell that represents one highly duplicated value; a very accurate measure


Review Of Changes To The Statistics in ASE 11.9.2 & Above

There are two types of statistics -

Table/index level statistics describe the table and its indexes to the optimizer. They are now centralized in systabstats.

• Some are maintained dynamically by ASE

• Page/row counts, deleted/forwarded rows, cluster ratios

• Should not be written directly - they will quickly be overwritten

Column level (distribution) statistics belong to a column, not an index. They describe the data in the column to the optimizer. They are stored in sysstatistics.

• Are static; need to be updated or written

• Can be written directly
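
For example, a quick way to look at each kind (a minimal sketch - the table name t1 is hypothetical):

select * from systabstats where id = object_id('t1')    -- table/index level

select * from sysstatistics where id = object_id('t1')  -- column level (distribution)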


Additional & Modified Column Statistics

Additional statistics -

• Statistics on columns other than the leading column of an index - minor index columns and non-indexed columns

Modifying statistics -

• Any statistical value you write directly

• Usually done via optdiag or sp_modifystats

• Only column level (sysstatistics) values should be written


Why Would You Want To Add or Modify Statistics

• Make composite indexes more selective - adding statistics to inner index columns - highly recommended

• Put statistics on non-indexed columns - good for join costing; no statistics on a column means assumptions are made

• Change the number of steps (cells) in a column's histogram - more granular histograms - highly recommended

• Change the default selectivity values - writing statistics

• Change the density values - writing statistics

• Add statistics to a column's histogram - writing statistics


Update Statistics

The statistics are obtained by reading the data in a column.

Update statistics gathers and writes the statistics.

• Statistics are based on the data and the state of the table/index

update statistics table_A [index_1]

• Writes statistics for the leading column of the index(es) only - same as in past versions of ASE

update statistics table_A (column_name)

• This will create or update statistics on the specified column
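
A minimal example of both forms (the table, index and column names are hypothetical):

update statistics orders                  -- leading column of every index

update statistics orders order_date_idx   -- leading column of one index

update statistics orders (status)         -- one column, indexed or not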


Update Statistics cont.

Use update statistics to build/update column level statistics.

update index statistics table_name [index_name]

• This will build/update column statistics on all columns of all indexes in the table, or on the specified index

• Highly recommended

update all statistics table_name

• This will create or update column level statistics on all columns of the table; it will also run update partition statistics

• WARNING - this can take a VERY long time to run

• It is rarely necessary to run update all statistics
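
For example (hypothetical names again):

update index statistics orders                  -- all columns of all indexes

update index statistics orders order_cust_idx   -- all columns of one index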


A Word About Maintaining Statistics

The more columns with statistics, the more maintenance - usually a good trade-off.

Maintenance considerations include -

• The time to run update statistics on each column

• Editing and/or reading in an optdiag file

• Increased use of tempdb for updating column statistics

• A worktable will be used to sort all inner index columns and non-indexed columns

• Proc cache usage for sorts - dependent on the size of the datatype


Adding Statistics

Statistics can be added to any column.

Gives the optimizer more information about a composite index (more selective).

On a non-indexed column - more accurate join costing of non-indexed columns.

• Not absolutely necessary, but highly recommended

• update statistics table_name (col_name)

• update index statistics table_name [ind_name]

• Does add maintenance

• Test it before implementing


Adding Statistics To Composite Index Columns - e.g.:

Statistics on the leading column of the index only (col_A), with SARGs on col_A, col_B and col_C:

• No statistics available for col_B or col_C

• col_B uses the default 'in-between' selectivity of 0.25

• col_C uses the default equi-SARG selectivity of 0.10 (no statistics)

Estimating selectivity of index 't1_i1', indid 2
scan selectivity 0.900251, filter selectivity 0.056268
5627 rows, 6480 pages
Search argument selectivity is 0.05627.


Adding Statistics To Composite Index Columns - e.g. cont.:

Statistics on all three columns of the index:

Estimating selectivity of index 't1_i1', indid 2
scan selectivity 0.900283, filter selectivity 0.000283
28 rows, 882 pages
Search argument selectivity is 0.000028.

Table: t1 scan count 1, logical reads: (regular=885 apf=0 total=885), physical reads: (regular=0 apf=0 total=0)


Adding Statistics To Non-Indexed Columns

Useful in costing joins on non-indexed columns.

Without statistics on the column there is no Total density value to use in costing joins.

• Assumptions are made about how many rows will qualify

• Not usually accurate - based only on the join operator

Estimated selectivity for col_A,
selectivity = 0.100000.

Statistics on the column allow the Total density value to be used to estimate the number of qualifying rows.

Estimated selectivity for col_A,
selectivity = 0.000025, upper limit = 0.081425.
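
A worked comparison using those two selectivities (the 1,000,000-row table size is hypothetical):

1,000,000 rows x 0.100000 (default)       = 100,000 estimated qualifying rows
1,000,000 rows x 0.000025 (Total density) = 25 estimated qualifying rows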


Changing Requested Step Count

The number of histogram steps has an effect on the optimizer.

By default, new column statistics are built using 20 steps (cells).

If statistics exist, the existing step count will be reused unless you specify a different count.

• Increasing the step count may result in more Frequency count cells - they represent only one value, very accurate

• Will affect optimization of SARGs because cell granularity is increased - cells represent fewer rows each, so lower weights


Changing Requested Step Count - How Many Steps to Request

The number of steps to request will depend on your data and your queries.

From the default of 20, try 200 and run tests - use trial and error to determine the best number of steps to use.

• Use traceon 302 to monitor changes to estimated selectivity values as you change the number of steps

There's no rule of thumb for how many steps to use.
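
A minimal sketch of turning the costing trace on for a session (the query is hypothetical):

dbcc traceon(3604)   -- route trace output to the client session
dbcc traceon(302)    -- show SARG/index costing, including selectivity estimates

select * from t1 where col_A = 100

dbcc traceoff(302)
dbcc traceoff(3604)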


Changing Requested Step Count - How To Do It

Extensions to create index and update statistics:

create index i1 on t1 (colA) with statistics using X values

update statistics t1 using X values

• X = the requested step count, seen in optdiag

• You may not need a lot of cells

• create index with statistics using 0 values will create the index, but will not write the statistics
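
For example (hypothetical names; the result shows up in optdiag output as 'Requested step count'):

update statistics t1 (col_A) using 200 values

create index t1_i1 on t1 (col_A, col_B) with statistics using 200 values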


Writing The Statistics Directly

Use an optdiag input file to write the statistics directly.

Always get an optdiag output file before writing or changing the statistics - as insurance.

• -o output_file_name, -i input_file_name

• Save a clean copy of the output file

• Useful in general if you want to go back to a previous set of statistics

• Rename and edit the output file for changes to the statistics
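
A command-line sketch (server, login, database and table names are hypothetical; check the flags against your optdiag version):

optdiag statistics mydb..t1 -Usa -Ppassword -SMYSERVER -o t1_stats.out

optdiag statistics mydb..t1 -Usa -Ppassword -SMYSERVER -i t1_stats.in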


Maintaining Directly Written Statistics

Use optdiag input files to maintain statistics that have been written directly.

All changes to the column level statistics will be overwritten by update statistics.

Traceon 302 output will display a message when edited statistics are used in costing:

Statistics for this column have been edited.

If you change non-persistent values you will need to set them back after updating statistics.


Changing Statistics - Highly Duplicated Values and The Statistics

A few values occupy many rows while many values occupy a few rows - spikes in the distribution.

Highly duplicated values will have an effect on the Total density value and thus on the costing of joins.

• Possible that the number of rows qualifying for a join from an inner table will be estimated too high

• Weighted averaging is used to obtain the Total density

• Highly duplicated values have a disproportionate effect on the Total density value

• Try the arithmetic average instead - 1 / (number of distinct values in the column)


Example of a Highly Duplicated Value in the Histogram

Range cell density: 0.0000502421670203
Total density:      0.2697381850000000

Step  Weight      Value

1     0.00000000


Changing Statistics - Highly Duplicated Values and The Statistics cont.

sp_modifystats - new system stored procedure

Makes modifications to the density values (best version in 11.9.2.4, 12.0.0.4, 12.5).

• Can specify a value, factor it by 10, or match the Total density to the Range cell density (not a good idea)

• Documented in the 12.5 docs

• Caution - remember that the Total density will be used for all joins on the column; if it is set very low you may want to modify Total density back using optdiag
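
A hedged sketch of the 12.5-era syntax (hypothetical database/table/column names - verify against the sp_modifystats docs for your version):

exec sp_modifystats "mydb..t1", "col_A", MODIFY_DENSITY, total, absolute, "0.000050"

exec sp_modifystats "mydb..t1", "col_A", REMOVE_SKEW_FROM_DENSITY

The first form writes an absolute Total density value; the second matches the Total density to the Range cell density - the option the slide above cautions about.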


Changing Statistics - SARG Values That Are Out Of Range

This applies when a search value is greater than the largest, or less than the smallest, histogram boundary value.

Traceon 302 sample -

Estimated selectivity for colA,
selectivity = 0.000000, upper limit = 0.000000.
Lower bound search value 10000 is greater than the largest value
in sysstatistics for this column.

• Special costing is done

• Not always the most accurate value: a selectivity of 0.00 or 1.00 is used, depending on the value and the operator


Changing Statistics - SARG Values That Are Out Of Range cont.

Two ways to affect the histogram for this -

• Add a dummy row to the data

• Not always practical or allowed - but it is persistent

• Add a dummy boundary value to the histogram via optdiag

• Easier to do in some cases, but not persistent

Example: it's July 8 and update statistics hasn't been run, so step 20 is the last step:

18 0.05301946
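
As an illustration only (the dates, weights and exact optdiag file layout here are assumptions - compare against a real optdiag output from your version), the edited histogram tail might gain a zero-weight boundary step covering the dates seen since the last update statistics run:

Step  Weight        Value

  20  0.05301946    <= "Jun 30 2002 12:00AM"
  21  0.00000000    <= "Jul 31 2002 12:00AM"

The added step extends the upper boundary, so a SARG on July 8 is costed against the histogram instead of the out-of-range special values.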


Writing the Statistics Or Running Update Statistics

Two ways to use optdiag instead of update statistics.

The dump and load method - no editing of statistics required:

• Dump the dataset and load it somewhere else

• Run update statistics on the loaded dataset

• Get an optdiag output file for the tables you want new statistics on

• Load the optdiag file into the original dataset

This will take more time than running update statistics on the original dataset - but there is no interference with users.
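
A rough sketch of the sequence (all names hypothetical; note that the optdiag output file records the source server/database names, so it may need a light edit before being loaded into the original):

dump database mydb to '/dumps/mydb.dmp'

load database mydb_copy from '/dumps/mydb.dmp'
online database mydb_copy

update index statistics t1   -- run on the copy, away from users

optdiag statistics mydb_copy..t1 -Usa -Ppassword -SCOPYSERVER -o t1_stats.out
optdiag statistics mydb..t1 -Usa -Ppassword -SPRODSERVER -i t1_stats.out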


Writing the Statistics Or Running Update Statistics cont.

The optdiag method - requires editing of optdiag output files:

• Get an optdiag output file of the tables you want to update statistics on (may also be done for individual columns)

• Edit the files to reflect changes in your dataset - you'll need to understand what changes have occurred

• Read the file in via optdiag

Very fast; take care to ensure that the statistics are correct.

Test it before implementing it.


Some Optimizer Myths: "The 20% Rule"

"If 20% or more of the table (rows) will be returned, the only choice the optimizer has is to table scan."

Grew out of a generality - and made its way into many publications.

• Was a way of addressing 'pessimistic' non-clustered index costing - 1 I/O for every row in the leaf level

• Clustered index access and covered index access disprove it

• The Data Row Cluster Ratio is now used to measure the clustering of index rows in relation to data pages


“Update Statistics Will Give You Good Performance”

Update statistics only guarantees that the statistics are up to date at the end of the run.

Statistics are the optimizer's view of your dataset - the view may not always be pretty.

• Simply because statistics are up to date doesn't mean an index or join order will be used

• The distribution of values could change a great deal between runs

• Using an old statistics set is fine if it results in an efficient plan


“Delete Statistics from Time to Time”

We started this one by accident - it resulted from a 'print bug' that was fixed early on.

• DO NOT DELETE STATISTICS

• You will lose the 'requested step count' - and have to use the default of 20 steps

• May have an adverse effect on costing range and equi-SARGs

• Frequency count cells may no longer appear

• The only time to delete statistics is when you know you don't need them or when removing them helps


"There's A Traceflag To Set Optimizer Behavior Back To An Earlier Version"

There is NO traceflag that will do this (there never has been).

This seems to come up with every new version.

• There are trace flags for some individual behavior/functionality

• Usually introduced to 'fix' a bug or to provide optional or backwards-compatible functionality

• Sometimes they have a long 'life' in ASE, sometimes not

• Be careful of becoming dependent on them

• Don't use them unless they've been fully explained to you


Conclusion

ASE 11.9.2 and above allow you to add or write statistics.

• Adding and writing statistics is not absolutely necessary - but they're powerful P&T tools

• Adding column level statistics is highly recommended in most cases

• Writing statistics directly is recommended only when necessary

Always test before implementing.


Where To Get More Information

• The Sybase customer newsgroups: http://support.sybase.com/newsgroups

• The Sybase list server: SYBASE-L@LISTSERV.UCSB.EDU

• The external Sybase FAQ: http://www.isug.com/Sybase_FAQ/

• Join ISUG - the ISUG Technical Journal, feature requests: http://www.isug.com
