Automatic Performance Diagnosis and Tuning in Oracle

Karl Dias, Mark Ramacher, Uri Shaft, Venkateshwaran Venkataramani, Graham Wood
Oracle Corporation
500 Oracle Parkway
Redwood Shores, CA 94065, USA
{kdias, mramache, ushaft, veeve, gwood}@oracle.com

Abstract

Performance tuning in modern database systems requires a lot of expertise, is very time consuming and is often misdirected. Tuning attempts often lack a methodology that has a holistic view of the database. The absence of historical diagnostic information to investigate performance issues at first occurrence exacerbates the whole tuning process, often requiring that problems be reproduced before they can be correctly diagnosed.

In this paper we describe how Oracle overcomes these challenges and provides a way to perform automatic performance diagnosis and tuning. We define a new measure called ‘Database Time’ that provides a common currency to gauge the performance impact of any resource or activity in the database. We explain how the Automatic Database Diagnostic Monitor (ADDM) automatically diagnoses the bottlenecks affecting the total database throughput and provides actionable recommendations to alleviate them. We also describe the types of performance measurements that are required to perform an ADDM analysis. Finally we show how ADDM plays a central role within Oracle 10g’s manageability framework to self-manage a database and provide a comprehensive tuning solution.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.

Proceedings of the 2005 CIDR Conference.

1. Introduction

With the advent of web business, system performance is more important to businesses than ever before. A poorly performing web site is just as bad as one that is actually down, because users get frustrated and are unlikely to return. Databases are an integral part of these high profile systems. With the decreasing costs of computer hardware, businesses are building more and bigger databases, and the database administrators (DBAs) are being asked to take on more and larger databases.

Database systems typically have a plethora of measurements and statistics available about their operation. Often many individual components maintain separate sets of measurements and it can be hard to get an overall view of what is happening in the database. The many sets of measurements can lead to data overload for the DBA who is tasked with analysing a problem.

Identification of the root cause of a performance problem is not easy [WE02, CH00, HE97, BR94]. It is not uncommon for DBAs to spend large amounts of time and resources fixing performance symptoms, only to find that this had marginal effect on system performance. Often a symptom is treated mainly because the DBA has seen that symptom before and knows how to treat it. Lack of a holistic view of the database leads to incorrect diagnosis and misdirected tuning efforts, resulting in over-configured systems that increase the total cost of ownership [HU02, CH00, GA96].

Even when ‘correct’ analysis is performed it is often found that the available data stops short of what is required to fully diagnose the root cause of the problem, and in order to complete the diagnosis the problem must be reproduced. Reproducing problems is sometimes non-trivial and can range from simply asking a user to perform an operation again, to requiring the building of a copy of the database, which may take weeks to set up. Many times performance issues go unresolved because it is not regarded as cost effective to spend the time and resources to reproduce the problem.
Often, the issue is dropped and the DBA hopes that the problem will not reoccur. In most cases, the mechanisms provided to capture the detailed information required to fully analyse a problem are not designed to be non-intrusive and have a significant overhead. Sometimes tuning experts or ‘gurus’ are called in to try and resolve a problem. Specialist skills and expertise are required because there are many areas where value judgments and rules-of-thumb are involved.

One of the things that makes performance tuning difficult is that the performance of different components of the database is often measured using unrelated metrics like buffer hit-ratio, transactions per second and average I/O latency. Such unrelated metrics make it infeasible to weigh the performance impact of one component of the database against the impact of other components. Tuning actions that are solely based on such metrics have a tendency either to make the DBAs believe that the database needs excessive hardware or to make just one component of the database perform better with negligible effect on the total database throughput.

Keeping these limitations in mind, we built the Automatic Database Diagnostic Monitor (ADDM) in Oracle 10g. ADDM automates the entire process of diagnosing performance issues and suggests relevant tuning recommendations with the primary objective of maximizing the total database throughput.

Modern database systems have complicated interactions between their sub-components and have the ability to work with a variety of applications. This results in a very large list of potential performance issues that such automatic diagnosis solutions could identify. Also, as new database technologies and applications are invented and older ones become obsolete, it is essential that automatic diagnostic and tuning solutions can easily be adapted to accommodate such changes.

The objectives of an automatic performance diagnosis and tuning solution can be summarized as follows:

• Should possess a holistic view of the database and understand the interactions between various database components.
• Should be capable of distinguishing symptoms from the actual root-cause of performance bottlenecks.
• Should provide mechanisms to diagnose performance issues on their first occurrence.
• Should easily keep up with changing technologies.

The remainder of the paper is organized as follows: We discuss related work in both academic research and commercially available databases in Section 2.
We define a new measure, Database Time, and describe in detail our methodology to solve this problem in Section 3. We discuss the various measurements that were required to implement our solution in Section 4. We extend our definition of Database Time to other models in Section 5 and explain how ADDM plays a central role in Oracle 10g’s self-managing framework in Section 6. We present the experiments we used to validate our implementation and their results in Section 7. Finally, we summarize our conclusions in Section 8.

2. Related Work

The COMFORT project used an online feedback control loop to solve individual tuning issues such as load control for locking and dynamic data placement, among others [WE02, WE94]. Automated tuning systems have been proposed [HE97] that employ a feedback control mechanism layered on top of a target system. A different school of thought suggests that current database systems have too many features, unpredictable performance and a very low “gain/pain ratio”, and that RISC-type simplification of database functionality is crucial to achieve automatic tuning [WE02, CH00].

Oracle’s previous solution for performance tuning was Statspack [ORP8, ORP9, ORSP]. Statspack is a tool that takes snapshots of database performance statistics and generates reports across a pair of snapshots. Though Statspack reports reduced the time it took for DBAs to diagnose performance issues [ORSP2], it still left the interpretation of the report and the actual root-cause identification to the DBAs. Oracle 9i has other self-optimizing features like dynamic runtime query optimization [ORQ9] and self-configuring features for space and memory management [DA02, ORM9, EL03]. Refer to Section 6 for a description of various Oracle 10g manageability features like “SQL Tuning Advisor” and “Segment Advisor”. IBM DB2 version 8.1 has features like “Health Center” for database monitoring and “Configuration Assistant” to assist DBAs in database configuration [IBP8, IBG8].
Similarly, SQL Server 2000 provides performance tools like “Index Tuning Wizard” and “SQL Query Analyzer” to help the DBA achieve individual performance goals and has monitoring tools like “System Monitor objects” [CH97, MSPD].

Commercially available databases have features to manage some of their sub-components automatically, mechanisms to actively monitor database performance, and tools that make it easier for the DBA to achieve some specific performance goals. However, none of them (until Oracle 10g) has a performance diagnosis and tuning solution that automatically identifies the root-causes affecting total database throughput.

3. Automatic Performance Diagnosis

The performance of various components of the database is measured using different metrics. For example, the efficiency of the data-block buffer cache is expressed as a percentage in buffer hit-ratio; the I/O subsystem is measured using average read and write latencies; the database throughput is measured using transactions-per-second. However, using these metrics, finding the performance impact of a particular component on the total database throughput is extremely hard, if not infeasible. For example, determining by how much transactions-per-second would decrease if the average I/O latency became twice as long is not trivial.

[Figure 1: Database time from a single user’s perspective. Along a wall-clock time axis, from the instant the user sends a request to the instant the user gets a response, the figure separates the time spent in the wide area network, in the application server (middle tier), in the local area network, and the database time spent processing the user request in the Oracle database.]

[Figure 2: Database time defined as the sum of the time spent in processing all user requests. Along a wall-clock time axis, the figure shows the requests of User 1, User 2, ..., User n (session connect, execute SQL, fetch results, session disconnect) and marks the database time spent in processing user requests.]

One of the first objectives of this project was to define a common currency that can be used to measure the performance impact of any component within the database. It then becomes possible to compare the impact of any two database components. For example, one would then be able to compare the performance impact due to an under-sized data-block buffer cache with the performance impact due to a badly written SQL statement. For this purpose, we define a measure called ‘Database Time’ that serves as a common currency.

3.1 Database Time

Database time is defined as the sum of the time spent inside the database processing user requests. Figure 1 illustrates a simple scenario of a single user submitting a request. The user’s response time is the time interval between the instant the request is sent and the instant the response is received.
The database time involved in that user request is only the portion of that user’s response time that is spent inside the database.

Figure 2 illustrates database time as it is summed over multiple users, each performing a series of operations that result in a series of requests to the database. It can be seen from Figure 2 that database time is directly proportional to the number and duration of user requests and can be higher or lower than the corresponding wall-clock time.

Using database time as a measure, one should be able to gauge the performance impact of any entity of the database. For example, the performance impact of an under-sized buffer cache would be measured as the total database time spent in performing additional I/O requests that could have been avoided if the buffer cache were larger.
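The definition above can be illustrated with a minimal sketch. It assumes each user request is recorded as an interval of time spent inside the database; the `Request` type and the function names are hypothetical and are not part of Oracle. The example shows a workload of concurrent users whose database time exceeds the wall-clock time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """Hypothetical record of one user request as seen by the database."""
    user: str
    start: float  # seconds: instant the database starts processing the request
    end: float    # seconds: instant the database finishes processing it


def database_time(requests):
    """Sum of the time spent inside the database over all user requests."""
    return sum(r.end - r.start for r in requests)


def wall_clock_time(requests):
    """Elapsed time between the first request start and the last request end."""
    return max(r.end for r in requests) - min(r.start for r in requests)


# Three users with overlapping requests of 4 seconds each.
requests = [
    Request("user1", 0.0, 4.0),
    Request("user2", 1.0, 5.0),
    Request("user3", 2.0, 6.0),
]

db_time = database_time(requests)    # 4 + 4 + 4 seconds
elapsed = wall_clock_time(requests)  # 6 seconds of wall-clock time
print(f"database time = {db_time}s, wall-clock time = {elapsed}s")
```

With a single user who is mostly idle between requests, the same functions give a database time lower than the wall-clock time, which matches the observation that database time can be higher or lower than the corresponding wall-clock time.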

[Figure 4: Active Session History Samples. Along a wall-clock time axis, the figure shows the activity of User 1, User 2, ..., User n (session connect, execute SQL, fetch results, session disconnect) with sampling points Sample 1 to Sample 5. Each line of the table below is a single row in the Active Session History.]

Sample Number   User Number   Operation   Details
1               1             Connect     Using CPU
2               1             Execute     Using CPU
2               2             Execute     Waiting for read I/O of block 5 of Emp table
3               n             Execute     Waiting for lock on block 5 of Emp table
4               2             Fetch       Using CPU
4               n             Execute     Waiting for CPU

[...] negligible in some cases, but could be prohibitive in others.

In Sections 4.2 to 4.5 we describe the different types of data we collect for ADDM analysis. In all cases we describe both the purpose of the data and how it conforms to the minimal intrusion principle.

4.2 Database Time Measurements

The first priority in an ADDM analysis is to establish the main components that consume significant database time. The database server must be instrumented to measure the amount of database time spent in the specific operations ADDM is interested in. This measurement is a cumulative non-decreasing function of time. ADDM analysis is always performed over a specific time interval called the “analysis period” (e.g., yesterday between 10am and 11am). ADDM finds how much time was spent on a specific operation by subtracting the measurement for the beginning of the analysis period from the measurement for the end of the analysis period.

We maintain measurements at various granularities so an ADDM analysis can go from the symptoms (e.g., commit operations consumed significant database time) to the root cause (e.g., write I/O to one of the log files was very slow, possibly a hardware issue). We need to maintain measurements at the coarse level (time spent in all commit operations).
We also need the finer granularity (time spent in write I/O for each log file).

Direct measurements can only be done on database operations that usually take significant time to finish. If we try to measure the timing of numerous short operations we are likely to violate the minimal intrusion principle. The decision about which operations should be measured should be based on the cost of measurement (i.e., how much time it takes to start and end a timer) and the expected length and quantity of such operations. For example, measuring the total time spent in I/O operations is reasonable because I/O operations take many orders of magnitude more time than starting and ending a timer. On the other hand, measuring the time we spend in the critical section of a commonly used semaphore would be prohibitively expensive. Our solution to capture short duration operations is to use sampling. We use both frequency based sampling, where we sample one operation out of every N occurrences, and time based sampling, which is discussed in detail in the next section.

4.3 Active Session History

ADDM often needs a detailed description of the workload to be able to find root-causes of problems and give effective recommendations. It is not possible to collect a complete system trace of operations since the amount of data is very large and collecting it would significantly affect database performance. Our solution is to provide a time based sample of all activity in the system. We call this sampled data the “Active Session History” (ASH).

ASH is a collection of samples taken at regular time intervals. Each sample contains information about what the database server is doing on behalf of each connected user (a.k.a. “session”) at the time of sampling. We only collect data for sessions that are actively using the database during the sample time. If a session is waiting for the next user command, it does not appear in ASH for the specific sample time.

Figure 4 illustrates a workload and the table in that figure shows the resulting ASH samples. The samples in ASH describe the database operation performed by each session in great detail. For example, when a session is waiting for I/O we can tell which database object is read and what portion of it is read. This amount of detail enables ADDM to drill down to very specific root-causes.

The time interval we use for ASH sampling should be at least two orders of magnitude shorter than the time interval used as an analysis period. This ensures that we have at least hundreds of samples of specific session states to analyze. If a specific operation consumes significant database time (say more than a few percent of database time) during the analysis period, there is a high probability that this operation will appear in a significant number of samples in ASH. This enables ADDM to diagnose such operations even if we do not measure them directly. The ASH sampling frequency should be chosen with care. If sampling is too frequent, it might violate the minimal intrusion principle.
If sampling is infrequent, ADDM may not be able to find the root-cause of a problem or an effective recommendation, because there may not be enough samples of the specific operation.

ADDM uses ASH for finding root-causes and effective recommendations. For example, suppose we find from exact database time measurements that the chief source of contention in the system is on locks that deal with space allocation to tables. We can use ASH to answer finer granularity questions like “which are the top tables that experienced space allocation contention and what are the types of SQL statements that caused such contention?” We might be able to determine that a specific table suffers from contention due to INSERT statements and recommend a partitioning scheme to alleviate the problem. While time measurements might be able to lead us to the specific type of locks that cause a problem, or sometimes to the specific table, only ASH allows us to perform a cross analysis with the types of SQL statements that contended for the locks. Without ASH, ADDM could not recommend the specific solution: partition a specific table.

4.4 System Configuration

We also collect data that is not numeric in nature or that can decrease with time. This data is usually about database settings. We need to maintain a full log of changes for this kind of data. Fortunately, database settings do not change very often, so the cost of maintaining a full log of changes is low. This type of data can be crucial to giving recommendations for fixing specific problems. Examples of such data are:

• Size of memory components (like the buffer cache).
• Number of CPUs used by the system.
• Special query optimizer settings.

The exact nature of the data depends on the implementation of the database server and is outside the scope of this paper.

4.5 Simulations

Sometimes, estimating the impact of a specific area of the database requires a simulation of various possible alternatives.
For example, we might have significant impact of read I/O operations because the buffer cache is under-sized. The database time measurement is the time we spent on read I/O operations. However, to find that the buffer cache is the root-cause we must determine that we spent time reading data blocks that were in the buffer cache at some point in time and were removed to make room for other data blocks. In other words, we need to determine how many read I/O operations could have been saved given an infinite buffer cache. Our solution is to simulate how much database time would be saved when the buffer cache is increased to various sizes and recommend an optimal setting given the resources.

5. Database Time in Other Models

The definition of database time in Section 3.1 relies on a simple computation model. The model is characterized by the following limitations:

• Single Machine. This means that we do not have a distributed system. Instead, we use a single host machine for the database server.
• No Background Activity. This means that all database work is done while the user is waiting for a response.
• No Parallel Computation. Every user connection corresponds to a single thread of computation in the database server. This means that work done on behalf of a user call cannot use more than one CPU at the same time, wait for more than one event at the same time, or wait for an event and use a CPU at the same time.

Most database servers support computation models that use distributed systems, parallel queries and background activity. Therefore, we needed to establish ADDM’s goals when each of these assumptions is violated.
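As an aside on the buffer cache simulation described in Section 4.5, the idea can be sketched with a toy model. The sketch assumes an LRU replacement policy, a recorded trace of block accesses and a fixed average read I/O cost; all of these, and every name in the code, are illustrative assumptions and do not describe how the Oracle server performs its simulation.

```python
from collections import OrderedDict


def count_read_ios(block_accesses, cache_size):
    """Replay a trace of block ids through an LRU cache; return the miss count.

    Each miss stands for one read I/O that a cache of this size cannot avoid.
    """
    cache = OrderedDict()
    misses = 0
    for block in block_accesses:
        if block in cache:
            cache.move_to_end(block)       # mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used block
    return misses


def estimated_savings(block_accesses, current_size, candidate_sizes, avg_read_ms):
    """Database time (in ms) saved by each candidate size versus the current size."""
    baseline = count_read_ios(block_accesses, current_size)
    return {
        size: (baseline - count_read_ios(block_accesses, size)) * avg_read_ms
        for size in candidate_sizes
    }


# Hypothetical trace of block ids and an assumed 10 ms average read I/O.
trace = [1, 2, 3, 1, 2, 3, 4, 1, 2, 3]
savings = estimated_savings(trace, current_size=2,
                            candidate_sizes=[2, 3, 4], avg_read_ms=10)
print(savings)
```

Expressing the result in time rather than in hit ratio keeps it comparable with other findings, in line with the use of database time as a common currency.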

5.1 Distributed Systems

The simplest distributed system has identical components. This means that all the machines that are part of the system are identical machines, running the same software, have the same network connections, and use the same types of I/O devices. In this simple case, all we need to do is sum the database time observed on all machines to get the total database time for the distributed system. Every subdivision of database time (e.g., the time spent waiting for I/O) is summed across all machines as well.

It is a lot more complicated when there are differences between machines. In that case we need to find a common currency between machines and not just between different areas in the same server. For example, we cannot consider one second of CPU usage on one machine equivalent to one second of CPU usage on another machine if the two CPUs have different computation speeds. We still sum all the database times from the different machines because this number still corresponds to the total amount of time spent by users waiting for a response from the database server. We just need to consider that moving a thread of computation from one machine to another may result in changes to the total database time. For example, moving a CPU heavy computation from a slow CPU to a fast CPU is going to reduce database time if the two machines are not CPU bound both before and after the move.

5.2 Background Activity

Background activity is an important component in an Oracle server. For example, data blocks do not need to be written to disk even after a transaction commits. It is enough that the appropriate log records are written to disk. Therefore, most write I/O to tables and indexes is done as a background activity; no user waits for these writes to complete.

We regard background activity as a set of threads of computation. The time spent in these threads is not counted in database time since no user waits for these threads. However, statistics about resource usage by the background must be maintained.
When ADDM detects that a resource is used too much, a possible recommendation is to reduce the background activity that uses the resource. For example, if the I/O sub-system is used so heavily that I/O response time is both very slow and responsible for most of the database time, we might consider reducing the frequency of checkpoints.

5.3 Parallel Computation

The first problem we have with parallel computation is how to measure database time. Parallel computation implies that a single user connection is responsible for multiple threads of computation, which may use multiple CPUs and wait for multiple events or resources. Since database time is a measurement of throughput, we must consider all time spent by all the threads of computation as part of database time. The only exception is when one thread is waiting for another thread that belongs to the same user connection. In that case we consider the wait time as idle and we do not add it to database time. The reason for this exception is that we try to gauge what the database time would be if the computation were serialized while maintaining the same execution plan. In that case, all CPU usage and waits for external resources are still parts of the computation. However, internal waits between threads disappear from the computation.

Parallel computations are used when we need to reduce the response time of a request. As is always the case with parallel algorithms, throughput is sacrificed to get better response time. In many cases, the response time of a specific user request is more important than the total system throughput. ADDM advises the database administrator how much extra database time is spent on parallel computations compared to serial computation. In this case we must consult the optimizer to find the best serial execution plan and compare it to the parallel execution plan that was actually used.
This is a simulation since we do not execute the serial plan and do not know the exact cost in terms of database time.

5.4 Summary

We have seen how distributed systems, background activity and parallel queries can complicate both the notion of database time and the goal of an ADDM analysis. These are not the only complications we encountered and solved when implementing ADDM in Oracle 10g. However, a complete list is outside the scope of this paper. The general principle remains the same: find a common currency to measure database throughput, find ways to increase that throughput and improve the performance of the database as users experience it.

6. ADDM’s Role in Self-Managing Databases

ADDM is a central part of the manageability framework for self-managing a database that was developed in Oracle 10g (see [ORM10]). This framework enables a comprehensive tuning solution by providing the necessary components. ADDM is a key component that serves as the central intelligence for various tuning activities in the system, like SQL Tuning [ORQ10] or Space Fragmentation Analysis [ORS10], since it takes a holistic view of the database system and identifies the top issues affecting overall throughput. ADDM itself relies on the framework to a large extent for ready access to the performance measurements it needs.

The manageability solution in Oracle 10g is centered around the three phases of the self-managing loop: Observe, Diagnose, and Resolve. Each component provided by the framework plays a key role in one or more of these phases. These phases refer to a particular activity (e.g., a SQL tuning cycle), and there could be many such activities occurring concurrently, each in a different phase. Figure 5 and the subsequent sections illustrate the relationship between the main components and how they interact with ADDM.

[Figure 5: ADDM and the Self-Managing Database framework. Labels in the figure: In-memory perf data, Auto Snapshot Collection, AWR, Automatic Performance Diagnosis (ADDM), Oracle 10g Advisors, Recommendations, Apply to system.]

6.1 Observe Phase

This phase is automatic and continuous in Oracle 10g and provides the data needed for analysis and feedback as a result of the actions taken as part of the analysis. To enable accurate system performance monitoring and tuning it is imperative that the system under consideration exposes relevant measurements and statistics. The manageability framework allows for instrumentation of the code to obtain precise timing information, and provides a lightweight comprehensive data collection mechanism to store these measurements for further online or offline analysis.

The main component of this phase is the “Automatic Workload Repository” (AWR). It is a persistent store of performance data for Oracle 10g. The database captures system and performance data from in-memory views every hour and stores it in AWR. Each collection is referred to as a snapshot. Any pair of snapshots determines an analysis period that can be used for an ADDM analysis. The AWR is self-managing; it accepts policies for data retention and proactively purges data should it encounter space pressure.

The snapshot mechanism along with the comprehensive data it collects solves the “first occurrence analysis” requirement. ADDM runs automatically each time a snapshot is taken, the period of analysis being defined by the two most recent consecutive snapshots.

6.2 Diagnose Phase

The activities in this phase refer to the analysis of various parts of the database system using the data in AWR or in the in-memory views.
Oracle 10g introduces a set of advisors, ADDM being one of them, each analyzing and optimizing the performance of its respective sub-component. Of particular note is the fact that all the advisors provide an impact in terms of Database Time for the problems they diagnose. This allows for easy comparison across advisors’ results.

ADDM consults these advisors during its analysis depending on the amount of time and resources needed by these advisors. If an advisor could potentially take a substantial amount of time for its analysis, ADDM generates a recommendation to invoke that specific advisor instead. The user can then schedule this task when appropriate.

The set of advisors that ADDM can invoke includes:

• SQL Tuning Advisor: tunes SQL statements by improving the execution plan, by performing cost based access path analysis and SQL structure analysis.
• Segment Advisor: analyses space wastage by objects due to internal and external fragmentation.
• Memory Advisors: continuously monitor the database instance and auto-tune the memory utilization between the various memory pools.

6.3 Resolve Phase

The various advisors, after having performed their analysis, provide as output a set of recommendations that can be implemented or applied to the database. Each recommendation is accompanied by a benefit, in database time, that the workload would experience should the recommendation be applied. The recommendations may be automatically applied by the database (e.g., the memory resizing by the memory advisors) or they may be initiated manually. This is the Resolve phase.

Applying recommendations to the system closes an iteration of that particular tuning loop. The influence of the recommendations on the workload will then be observed in future performance measurements. Further tuning loops may be initiated until the desired level of performance is attained.

7. Experiments and Results

It is not an easy task to test and quantify the results of ADDM.
The value of ADDM is in helping DBAs manage the performance of their databases. To check whether ADDM meets this goal, we would need to survey many customers that have already adopted Oracle 10g in a production or test environment. It is too early after the release of Oracle 10g to perform such a survey. However, we provide three examples of real ADDM usage in Sections 7.1 to 7.3.

We used various scenarios to gauge the effectiveness of ADDM.

Blind tests were performed in the initial testing phase; performance tuning experts were asked to analyse and make recommendations on running systems. In a majority of cases ADDM produced comparable results and in some cases ADDM’s recommendations produced greater benefit than the experts’ recommendations.

As part of the testing performed internally by the Oracle Server Technologies group we have a number of tests in which we introduced known ‘common faults’ and application inefficiencies. When ADDM analysis was performed on the data captured during these tests ADDM correctly identified the fault in all cases.

High load stress testing is an integral part of the testing of the database product at Oracle and the effect of running all of the Oracle 10g manageability features was measured. With both typical customer workloads and highly tuned industry standard benchmarks the reduction in throughput from enabling all of the manageability features was approximately 3%. AWR and ADDM were two of the features enabled. ADDM runs performed after each AWR snapshot were available in a timely manner, typically under ten seconds.

Although Oracle 10g was declared production in January 2004, a number of internal production systems were running the software for several months before this date. ADDM has identified and correctly diagnosed a number of performance issues in this time on these systems.

7.1 Experience of Qualcomm

The Qualcomm Centauri Application was upgraded from Oracle version 8.1.7.4 to 10g RAC. While testing the upgrade they found significant performance problems and testers reported poor response times. The DBA looked at the ADDM recommendations, which highlighted a SQL statement (an update) that was causing over 90% of the system load. They then ran SQL Tuning Advisor on the statement and it recommended the creation of an index. It was later found that the recommended index should have been in place.
The index was missing because a patch to the application was applied to the production system but not to the upgraded test system. Identifying the index made problem diagnosis easy.

7.2 Experience in Oracle’s Bug Database

The Oracle Bug database is used daily by many thousands of users and was one of the first production systems to move to Oracle 10g. The system runs on an 8-CPU PA-RISC HP machine. After upgrading to Oracle 10g users experienced poor performance. ADDM reported that users were spending a large proportion of their time waiting for free buffer waits and recommended examining the I/O subsystem for write performance (this type of problem happens when the buffer cache is filled with dirty buffers and faster I/O should solve the problem). When the System Administrator looked at the I/O subsystem he found that asynchronous I/O was incorrectly configured, causing asynchronous I/O requests to fail and then be performed synchronously, leading to extremely slow I/O times.

7.3 Experience of Oracle Applications QA Testing

An Applications Development DBA reported that a user said that the system was slow. Unfortunately there was no timescale or details given. Investigation was made harder by the fact that the users of the system and the DBAs investigating the slowdown were on opposite sides of the world, 12 time zones apart.

Looking at ADDM reports there were a couple of one hour periods in which the time spent in the database was significantly higher. Both of these ADDM reports showed that most of the time was spent in parsing and that the parsing was caused by the application generating large numbers of SQL statements containing literal string values. (This problem is Oracle’s equivalent of using many similar SQL statements instead of a stored procedure. The cost of such configurations is time spent in parsing, optimizing and compiling the SQL statements; in Oracle’s terminology it is called a “hard parse”.)
The recommendation from ADDM was to modify the application to use bind variables rather than literals, or to change a database configuration parameter to automatically convert the literals into binds. When application development was approached about the literal usage it was discovered that this QA system was running a 'known bad' internal build of the application, and upgrading to the correct build removed the issue.

8. Conclusion

In this paper we described how Oracle 10g offers a comprehensive tuning solution by providing automatic performance diagnosis and tuning via ADDM. This addresses the ever-increasing performance tuning challenges currently faced by database administrators. We defined a new measure called Database Time that allows for comparison of the performance impact of various database components with each other. We also described what types of performance measurements are needed for accurate performance diagnosis, as well as how we obtain them in a manner that has marginal impact on the system. This solution also obviates the need to reproduce a problem in order to diagnose it.

ADDM incorporates a holistic view of the system, and by using database time in conjunction with the two-dimensional DBTime-graph it is able to quickly isolate the root causes of performance bottlenecks affecting the throughput of the system. ADDM provides specific actionable recommendations with an estimate of the benefit for alleviating performance problems.

The results of running ADDM on a variety of workloads and production systems within Oracle demonstrate the benefits and practicality of our approach for throughput tuning.

Acknowledgements

The authors would like to thank all the members of the Server Manageability team at Oracle for their valuable feedback in all stages of this project. We would like to thank Connie Green for providing helpful comments on an earlier version of this paper.

References

[BE03] D. G. Benoit: Automatic Diagnosis of Performance Problems in Database Management Systems. PhD Thesis, Queen’s University, Canada, 2003.

[BR94] K. P. Brown, M. Mehta, M. J. Carey, M. Livny: Towards Automated Performance Tuning for Complex Workloads. 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994.

[CH97] S. Chaudhuri, V. Narasayya: An Efficient, Cost-driven Index Tuning Wizard for Microsoft SQL Server. 23rd International Conference on Very Large Data Bases, Athens, Greece, 1997.

[CH00] S. Chaudhuri, G. Weikum: Rethinking Database System Architecture: Towards a Self-tuning RISC-style Database System. 26th International Conference on Very Large Data Bases, Cairo, Egypt, 2000.

[DA02] B. Dageville, M. Zait: SQL Memory Management in Oracle9i. 28th International Conference on Very Large Data Bases, Hong Kong, China, 2002.

[EL03] S. Elnaffar, W. Powley, D. Benoit, P. Martin: Today’s DBMSs: How Autonomic Are They? Proceedings of the First IEEE International Autonomic Systems Workshop, Prague, DEXA 2003.

[GA96] Gartner Group: Total Cost of Ownership: The Impact of System Management Tools, 1996.

[HE97] J. L. Hellerstein: Automated Tuning Systems: Beyond Decision Support. In Proceedings of the Computer Measurement Group, 1997.

[HU02] Hurwitz Group: Achieving Faster Time-to-Benefit and Reduced TCO with Oracle Certified Configurations, March 2002.

[IBG8] IBM Corporation: DB2 Universal Database Version 8 Guide to GUI Tools for Administration and Development. IBM Corporation, 2003.

[MSPD] Microsoft Corporation: RDBMS Performance Tuning Guide for Data Warehousing. Chapter 20, SQL Server 2000 Resource Kit.

[ORM9] Oracle Corporation: Oracle 9i Database Manageability. Oracle White Paper, http://www.oracle.com/technology/products/manageability/database/pdf/Oracle9iManageabilityBWP.pdf

[ORM10] Oracle Corporation: Oracle Database 10g: The Self-Managing Database. Oracle White Paper, http://www.oracle.com/technology/products/manageability/database/pdf/twp03/TWP_manage_self_managing_database.pdf

[ORP8] Oracle Corporation: Oracle 8i Designing and Tuning for Performance Release 2. Oracle 8i Documentation, http://tahiti.oracle.com

[ORP9] Oracle Corporation: Oracle 9i Database Performance Tuning Guide and Reference. Oracle 9i Documentation, http://tahiti.oracle.com

[ORQ9] Oracle Corporation: Query Optimization in Oracle 9i. Oracle White Paper, http://www.oracle.com/technology/products/bi/pdf/o9i_optimization_twp.pdf

[ORQ10] Oracle Corporation: The Self-Managing Database: Guided Application and SQL Tuning. Oracle White Paper, http://www.oracle.com/technology/products/manageability/database/pdf/twp03/TWP_manage_automatic_SQL_tuning.pdf

[ORS10] Oracle Corporation: The Self-Managing Database: Proactive Space and Schema Object Management. Oracle Presentation, http://www.oracle.com/technology/products/manageability/database/pdf/ow03p/40170p.pdf

[ORSP] Oracle Corporation: Diagnosing Performance Bottlenecks using Statspack and The Oracle Performance Method. Oracle White Paper, http://www.oracle.com/technology/deploy/performance/pdf/statspack_opm4.pdf

[ORSP2] Oracle Corporation: Diagnosing Performance Using Statspack. Oracle White Paper, http://www.oracle.com/technology/deploy/performance/pdf/statspack.pdf

[WE94] G. Weikum, C. Hasse, A. Mönkeberg, P. Zabback: The COMFORT Automatic Tuning Project. Information Systems, Vol. 19, No. 5, pp. 381-432, 1994.

[WE02] G. Weikum, A. Mönkeberg, C. Hasse, P. Zabback: Self-tuning Database Technology and Information Services: From Wishful Thinking to Viable Engineering. 28th International Conference on Very Large Data Bases, Hong Kong, China, 2002.

[IBP8] IBM Corporation: DB2 Universal Database Version 8 Administration Guide: Performance. IBM Corporation, 2003.
