Performance Analysis of Idle Programs - Researcher - IBM

Performance Analysis of Idle Programs

Erik Altman, Matthew Arnold, Stephen Fink, Nick Mitchell
IBM T.J. Watson Research Center
{ealtman,marnold,sjfink,nickm}@us.ibm.com

Abstract

This paper presents an approach for performance analysis of modern enterprise-class server applications. In our experience, performance bottlenecks in these applications differ qualitatively from bottlenecks in smaller, stand-alone systems. Small applications and benchmarks often suffer from CPU-intensive hot spots. In contrast, enterprise-class multi-tier applications often suffer from problems that manifest not as hot spots, but as idle time indicating a lack of forward motion. Many factors can contribute to undesirable idle time, including locking problems, excessive system-level activities like garbage collection, various resource constraints, and problems driving load.

We present the design and methodology for WAIT, a tool to diagnose the root cause of idle time in server applications. Given lightweight samples of Java activity on a single tier, the tool can often pinpoint the primary bottleneck on a multi-tier system. The methodology centers on an informative abstraction of the states of idleness observed in a running program. This abstraction allows the tool to distinguish, for example, between hold-ups on a database machine, insufficient load, lock contention in application code, and a conventional bottleneck due to a hot method. To compute the abstraction, we present a simple expert system based on an extensible set of declarative rules.

WAIT can be deployed on the fly, without modifying or even restarting the application.
Many groups in IBM have applied the tool to diagnose performance problems in commercial systems, and we present a number of examples as case studies.

Categories and Subject Descriptors: Software [Software Engineering]: Metrics, Performance Measures

General Terms: Performance

Keywords: bottlenecks, performance analysis, multi-tier server applications, idle time

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
OOPSLA/SPLASH'10, October 17–21, 2010, Reno/Tahoe, Nevada, USA.
Copyright © 2010 ACM 978-1-4503-0203-6/10/10...$10.00

1. Introduction

Shun idleness. It is a rust that attaches itself to the most brilliant metals. (Voltaire)

This paper addresses the challenges of performance analysis for modern enterprise-class server applications. These systems run across multiple physical tiers, and their software comprises many components from different vendors and middleware stacks. Many of these applications support a high degree of concurrency, serving thousands or even millions of concurrent user requests. They support rich and frequent interactions with other systems, with no intervening human think time. Many server applications manipulate large data sets, requiring substantial network and disk infrastructure to support bandwidth requirements.

With these requirements and complexities, such applications face untold difficulties when attempting to scale for heavy production loads.
In our experience with dozens of industrial applications, every individual deployment introduces a unique set of challenges, due to issues specific to a particular configuration. Any change to key configuration parameters, such as machine topology, application parameters, code versions, and load characteristics, can cause severe performance problems due to unanticipated interactions.

Part of the challenge arises from the sheer diversity of potential pitfalls. Even a single process can suffer from any number of bottlenecks, including concurrency issues from thread locking behavior, excessive garbage collection load due to temporary object churn [9, 21, 24], and saturating the machine's memory bandwidth [24]. Any of these problems may appear as a serialization bottleneck, in that the application fails to use multiple threads effectively; however, one must drill down further to find the root cause. Other problems can arise from limited capacity of physical resources, including disk I/O and network links. A load balancer may not effectively distribute load to application clones. When performance testing, testers often encounter problems generating load effectively. In such cases, the primary bottleneck may be processor or memory saturation on a remote node, outside the system under test.

Furthermore, many profiling and performance understanding tools are inappropriate for commercial server environments. Many tools [2, 3, 5, 6, 11, 13, 14, 16, 18, 19] rely on restarting or instrumenting an application, which is often forbidden in commercial deployment environments. Similarly, many organizations will not deploy any unapproved monitoring agents, nor tolerate any significant perturbation of the running system.
In practice, diagnosing performance problems under such constraints resembles detective work, where the analyst must piece together clues from incomplete information.

Addressing performance analysis under these constraints, we present the design and methodology of a tool called WAIT. WAIT's methodology centers on Idle Time Analysis: rather than analyze what an application is doing, the analysis focuses on explaining idle time. Specifically, the tool tries to determine the root cause that leads to under-utilized processors. Additionally, the tool must present information in a way that is easily consumable, and operate under restrictions typical of production deployment scenarios.

The main contributions of this paper include:

• a hierarchical abstraction of execution state: We present a novel abstraction of concrete execution states, which provides a hierarchical


Figure 3. From information that includes monitor states and stack sample context inferences, WAIT infers a Wait State for every sample.

level details. In this way, the Wait States serve the same purpose as the conventional run states, but provide a richer semantics that helps identify bottlenecks. Figure 3 gives example inferences of Wait States. For the stack in Figure 4, the analysis would infer the Wait State "Network", as the stack matches the pattern shown in the middle example of Figure 3.

For each stack frame, we compute an abstraction called a Category that represents the code or activity being performed by that method invocation. For example, a method invocation could indicate application-level activity such as Database Query, Client Communication, or JDBC Overhead. We further label each stack sample with a Primary Category which best characterizes the activity being performed by an entire stack at the sampled moment in time.

The abstract model of activity forms a hierarchy: a Wait State gives a coarse but meaningful abstraction of a class of behaviors. For more information, one can "drill down" through the model to see finer distinctions based on Primary Categories, stacks of Categories, and finally concrete stack samples without abstraction. This hierarchy provides a natural model for an intuitive user interface, which provides a high-level overview and the ability to drill down through layers of abstraction to pinpoint relevant details.

Tool. We have implemented a tool based on the above abstraction, which is deployed as a service within IBM. The tool computes the abstraction described above based on a simple set of rules, defined declaratively by an expert based on knowledge of common methods in standard library and middleware stacks.
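The hierarchy described above might be modeled, in miniature, as follows. This is our own flattened simplification for illustration, not WAIT's actual code; the real Wait State tree of Figure 5(a) is richer, and the state and field names here are assumptions:

```java
import java.util.List;

// Minimal sketch of the hierarchical abstraction: each sampled thread
// maps to a Wait State plus a stack of Categories, one per frame, with
// a primary Category characterizing the whole stack. Names are ours.
public class AbstractState {
    // A flattened selection of Wait States from Figure 5(a).
    enum WaitState { RUNNABLE, WAITING, NATIVE_UNKNOWN,
                     AWAITING_DATA, CONTENTION, SLEEPING }

    final WaitState waitState;       // coarse progress status
    final List<String> categories;   // one label per frame, e.g. "JDBC Overhead"
    final String primaryCategory;    // best single characterization of the stack

    AbstractState(WaitState w, List<String> cats, String primary) {
        this.waitState = w;
        this.categories = cats;
        this.primaryCategory = primary;
    }

    public static void main(String[] args) {
        // A stack like Figure 4's might abstract to something like:
        AbstractState s = new AbstractState(
            WaitState.AWAITING_DATA,
            List.of("Network", "Database Query", "JDBC", "Servlet"),
            "JDBC");
        System.out.println(s.waitState + " / " + s.primaryCategory);
    }
}
```

Drilling down then means moving from `waitState`, to `primaryCategory`, to the full `categories` stack, and finally to the concrete sample.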
We present some statistics and case studies that indicate the methodology is practical, and successfully identifies diverse sources of idle time.

3. Hub Sampling

WAIT relies on samples of processor activity and of the state of threads in a JVM. The system typically takes samples from the hub process (e.g., application server) of a multi-tier application, but can also collect data from any standard Java environment. Despite collecting no data from the other tiers, information from a hub process frequently illuminates multi-tier bottlenecks.

We next describe how WAIT collects information from a Java hub, and discuss challenges due to limitations in available data.

3.1 Sampling Mechanisms

We have found a low barrier to entry to be a first-order requirement for tools in this space. Many, if not most, potential users will reject any changes to deployment scripts, root permissions, kernel changes, specific software versions, or specialized monitoring agents. Instead, WAIT collects samples of processor utilization, process utilization, and snapshots of Java activity using built-in mechanisms that are available on nearly every deployed Java system.
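As a concrete illustration, the per-platform built-in commands (vmstat, ps, typeperf, tasklist, kill -3) might be assembled along these lines. The exact flags below are our illustrative choices, not WAIT's actual invocations:

```java
import java.util.List;

// Illustrative mapping from (platform, granularity) to built-in
// sampling commands. Flags are our assumptions; WAIT's actual
// invocations may differ.
public class SamplingCommands {
    static List<String> utilizationCommand(boolean windows, boolean perProcess, long pid) {
        if (windows) {
            return perProcess
                ? List.of("tasklist", "/FI", "PID eq " + pid)
                : List.of("typeperf", "\\Processor(_Total)\\% Processor Time", "-sc", "1");
        }
        return perProcess
            ? List.of("ps", "-p", Long.toString(pid), "-o", "pcpu=")
            : List.of("vmstat", "1", "2");
    }

    // A javacore snapshot: SIGQUIT (signal 3) does not kill the JVM.
    static List<String> javaStateCommand(long pid) {
        return List.of("kill", "-3", Long.toString(pid));
    }

    public static void main(String[] args) {
        System.out.println(utilizationCommand(false, true, 1234));
        System.out.println(javaStateCommand(1234));
    }
}
```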
Table 1 summarizes the mechanisms by which the system collects data.

Figure 4. Part of a 105-deep call stack, from the DayTrader benchmark (a configuration of which is now part of the DaCapo Suite). Representative frames, leaf first:

java/net/SocketInputStream.socketRead0
java/net/SocketInputStream.read line 140
com/ibm/db2/jcc/t4/Reply.fill line 195
...
com/ibm/db2/jcc/am/PreparedStatement.executeQuery line 625
com/ibm/ws/rsadapter/jdbc/WSJdbcPreparedStatement.executeQuery line 707
org/apache/openjpa/lib/jdbc/DelegatingPreparedStatement.executeQuery line 264
...
org/apache/geronimo/samples/daytrader/ejb3/TradeSLSBBean.getQuote line 400
org/apache/geronimo/samples/daytrader/TradeAction.getQuote line 376
com/ibm/_jsp/_displayQuote._jspService line 92
...
org/apache/geronimo/samples/daytrader/web/TradeAppServlet.doGet line 78
javax/servlet/http/HttpServlet.service line 711


data                 | UNIX    | Windows
machine utilization  | vmstat  | typeperf
process utilization  | ps      | tasklist
Java state           | kill -3 | sendsignal

Table 1. The built-in mechanisms we use to sample the Java hub. Note that kill -3 does not terminate the signaled process; 3 is the numeric code for SIGQUIT.

sampling interval (seconds) | throughput slowdown
1                           | 59%
5                           | 24%
10                          | 13%
20                          | 8%
30                          | 2%
1000                        | unmeasurable

Table 2. Throughput perturbation versus sampling interval, from a document processing system running IBM's 1.5 JVM.

The system can also produce meaningful results with partial data. In practice, data sometimes arrives corrupted or prematurely terminated, due to a myriad of problems. For example, the target machine may run out of disk space while writing out data, the target JVM may have bugs in its data collection, or there may be simple user errors. If any of the sources of data described are incomplete, the system will produce the best possible analysis based on the data available.

Processor Utilization. Most operating systems support non-intrusive processor utilization sampling. We attempt to collect time series of processor utilization at three levels of granularity: (1) for the whole machine, (2) for the process being monitored, and (3) for the individual threads within the process. For example, on UNIX platforms we use vmstat and ps to collect this data.

Java Thread Activity. To monitor the state of Java threads, we rely on the support built into most commercial JVMs to dump "javacore" files. Our implementation currently supports (parses) the javacore format produced by IBM JVMs, and contains preliminary support for the HotSpot JVM. This data, originally intended to help diagnose process failures and deadlock, can be sampled by issuing a signal to a running JVM process.¹
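In simplified form, extracting thread samples from such a dump can be sketched as below. The line format here is a hypothetical simplification for illustration; real javacore files have a much richer grammar, and all names in this sketch are our own:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified parser for thread entries in a javacore-like
// dump: extract each thread's name, reported run state, and stack frames.
public class JavacoreSketch {
    public record ThreadSample(String name, String runState, List<String> frames) {}

    // Assumed line shape: "ThreadName" state:Blocked, then "  at pkg/Class.method"
    private static final Pattern HEADER =
        Pattern.compile("^\"(?<name>[^\"]+)\"\\s+state:(?<state>\\w+)");

    public static List<ThreadSample> parse(String dump) {
        List<ThreadSample> out = new ArrayList<>();
        ThreadSample cur = null;
        for (String line : dump.split("\n")) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                cur = new ThreadSample(m.group("name"), m.group("state"), new ArrayList<>());
                out.add(cur);
            } else if (cur != null && line.startsWith("  at ")) {
                cur.frames().add(line.substring(5).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String dump = "\"WebContainer : 0\" state:Blocked\n"
                    + "  at java/net/SocketInputStream.socketRead0\n"
                    + "  at com/ibm/db2/jcc/t4/Reply.fill\n";
        ThreadSample t = parse(dump).get(0);
        System.out.println(t.name() + " " + t.runState() + " " + t.frames().size());
    }
}
```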
Upon receiving this signal, the JVM stops running threads, and then writes out the information specified in Figure 2. The JVM forces threads to quiesce using the same "safepoint" mechanism that is used by other standard JVM mechanisms, such as the garbage collector.

IBM JVMs can produce javacore samples with fairly low perturbation. For a large application with several hundred threads with deep call stacks, writing out a javacore file can pause the application for several hundred milliseconds. As long as samples occur infrequently, writing javacores has a small effect on throughput, but an unavoidable hit on latency for requests in flight at the moment of sample. Table 2 shows that samples taken once every 30 seconds result in a 2% slowdown. For server applications, where operations repeat indefinitely, perturbation can be dialed down as low as needed, at least concerning throughput.

When the hub of a multi-tier application spans multiple processes, possibly running on multiple machines, we currently choose one at random. In the future, we will tackle multi-process environments more generally, including cloud environments. However, we note that in many enterprise workloads, the bottlenecks facing all JVMs are similar, and thus our approach provides substantial benefit even now.

¹ There are some nominal differences between this format and that provided by a Sun JVM. Also, Sun provides a finer granularity for data acquisition, via the jstack, jstat, and related family of commands.

3.2 Difficulties with Thread States

The run states provided by the JVM and operating system are often inconsistent or imprecise, due to several complications. The first problem is that many JVM implementations quiesce threads at safepoints before dumping the javacore. Threads that are already quiesced (e.g., waiting to acquire a monitor) will be reported correctly as having a conventional run state of Blocked.
However, any thread that was Runnable before triggering the dump will be reported to have a false run state of CondWait, since the thread was stopped by the JVM before writing the javacore file.

The boundary between the JVM and the operating system introduces further difficulties with thread run states. The JVM and OS each track the run state of a thread. The JVM may think a thread is Blocked, while the OS reports the same thread Runnable, in the midst of executing a spinlock. Spinning is sometimes a detail outside the JVM's jurisdiction, implemented in a native library called by the JVM. Similarly, the JVM may report a thread in a CondWait state, even though the thread is executing system code such as copying data out of network buffers or traversing directory entries in the filesystem implementation.

Even if conventional run states were perfectly accurate, they often help little in diagnosing the nature of a bottleneck. Consider the conventional CondWait run state. One such thread may be waiting at a join point, in a fork-join style of parallelism. Another thread, with the same CondWait run state, may be waiting for data from a remote source, such as a database. A third such thread may be a worker thread, idle only for want of work.

For these reasons, we instead compute on a richer thread state abstraction that distinguishes between these different types of states.

4. A Hierarchical Abstraction of Execution State

WAIT's analysis maps concrete program execution states into an abstract model, designed to illuminate root causes of idle time. In this section, we describe the concepts in the abstraction.
Section 5 later describes how the analyzer computes the abstraction, and how details of the abstraction hierarchy arise from a declarative specification.

The analysis maps each sampled thread into an abstract state, which consists of a pair of elements called the Wait State and a stack of Categories. A Wait State encapsulates the status of a thread regarding its potential to make forward progress, while a Category represents the code or activity being performed by a particular method invocation.

We next describe the hierarchical abstraction in more detail.

4.1 The Wait State Abstraction

The Wait State abstraction groups thread samples, assigning each sample a label representing common cases of forward progress (or the lack thereof). Figure 5(a) shows the hierarchy of Wait States, which cover all possible concrete thread states. The analysis maps each concrete thread sample into exactly one node in the tree.

The hierarchy has proven to be stable over time; we have not needed to add additional states beyond those shown in the hierarchy of Figure 5(a), even as usage of the WAIT tool has expanded to include diverse workloads.

At the coarsest level, the Wait State of a sampled thread indicates whether that thread is currently held up or making forward progress: Java threads may be either Waiting or Runnable. A third


Figure 5. (a) The Wait State Tree; (b) The Category Tree. Every concrete state, a sampled thread, maps to a pair of abstract states: a Wait State and a stack of Categories (one per frame). Every frame in a call stack has a default name based on the invoked package; e.g., com.MyBank.login() would be named MyBank Code. [Tree diagrams not reproduced.]

possibility covers a thread executing native (non-Java) code that the tool fails to characterize, in which case the thread is assigned Wait State Native Unknown.

For Java threads, the analysis partitions Waiting and Runnable into finer abstractions, which convey more information regarding sources of idle time. For example, a Waiting thread might be waiting for data from some source (Awaiting Data), blocked on lock contention (Contention), or have put itself to sleep (Sleeping). As shown in the figure, finer distinctions are also possible. Consider a Sleeping thread: this could be part of a polling loop that contains a call to Thread.sleep (Poll); it could be the join in a fork-join style of parallelism (Join); or it could be that the thread is a worker in a pool of threads, and is waiting for new work to arrive in the work queue (Awaiting Notification).

We claim that in many cases, distinctions in Wait States give a good first approximation of common sources of idle time in server applications. Furthermore, we claim that differences in Wait States indicate fundamentally different types of problems that lead to idle time. A server application suffering from low throughput due to insufficient load would have many threads in the Awaiting Notification state. The solution to this problem might, for example, be to tune the load balancer. A system that improperly uses Thread.sleep suffers from a problem of a completely different nature.
Similarly, having a preponderance of threads waiting for data from a database has radically different implications on the system health than many threads, also idle, suffering from lock contention.

Thus, the Wait State gives a high-level description of the root cause of idle time in an application. The second part of the abstraction, the Category stack, gives a finer abstraction for pinpointing root causes.

Figure 6. The call stack is mapped to a stack of Categories, from which a primary Category (Database Communication) is chosen.

4.2 The Category Abstraction

The Category abstraction assigns each stack frame a label representing the code or activity being performed by that method invocation. Category names provide a convenient abstraction that summarizes nested method invocations that implement larger units of functionality [10].

Note that since each stack frame maps to a Category, each stack will contain representatives from several Categories. To understand the behavior of many stacks at a glance, it is useful to assign each stack a primary Category, which represents the Category which provides the best high-level characterization of the activity of the entire stack. For example, in Figure 4, the JDBC Category would be chosen as the primary Category, based on priority logic that determines that the JDBC Category label conveys more information


com/mysql/jdbc/ConnectionImpl.commit ⇒ Database Commit
.../XATransactionWrapper.rollback ⇒ Database Rollback
.../SQLServerStatement.doExecuteStatement ⇒ Database Query
.../WSJdbcStatement.executeBatch ⇒ Database Batch
.../OracleResultSetImpl.next ⇒ Database Cursor

Figure 9. Rules that define aspects of the Database Category.

The clusters table in Figure 8 reveals that cluster c1 occurred 3592 times across all application samples. Viewing table blocked in Figure 8 indicates that in the first and last application samples, cluster c1 was waiting to enter the critical section guarded by monitor m2. Viewing the owned by table in turn reveals that m2 was owned by cluster c2. In other words, thread stack c1 was blocked on a monitor held by thread stack c2.

5.2 Category Analysis

WAIT relies on a simple pattern-matching system, driven by a set of rules characterizing well-known methods and packages, to determine the Category label for each stack frame. The system relies on a simple declarative specification of textual patterns that define Categories.

The declarative rules that define the Category analysis define two models. The first model is a Category Tree, such as the one shown in Figure 5(b). A Category Tree provides the namespace, inheritance structure, and prioritization of the Category abstractions that are available as method labels. The second model is a set of rules. Each rule maps a regular expression over method names to a node in the Category Tree. For example, Figure 9 shows rules that, in part, define Database activity. The rules distinguish between five aspects of database activity: queries, batch queries, commits, rollbacks, and iteration over result sets.
This example illustrates how it is easy to define a Category Tree that is more precise than the one shown in Figure 5(b).

Given these rules, the Category analysis is simple and straightforward. The analysis engine iterates over every frame of every call stack cluster, looking for the highest-priority rule that matches each frame. As described in Section 4, every method has an implicit Category: its package, which is assigned to the Category's Code Nickname. Thus, if no Category rule applies to a frame, then we form an ad hoc Category for that frame: a method P1/P2/P3/.../Class.Method receives the Code Nickname P2 Code.

5.3 Wait State Analysis

In addition to inferring Categories, WAIT infers a Wait State as illustrated in Figure 5(a). Our analysis to infer Wait States combines three sources of information: processor utilization, the concrete data model previously presented in Figure 8, and rules based on method names. The rules over method names, in some cases, require the inspection of multiple frames in a stack. This differs from the Category analysis, where each frame's Category is independent of other frames.

The main challenge in using method names to infer a Wait State concerns handling imperfect knowledge of an application's state. As discussed in Section 3.2, the true Wait State of a sampled thread is, in many cases, not knowable. To fill this knowledge gap, we use expert knowledge about the meaning of activities, based on method names. Fortunately, many aspects of Wait States depend on the meaning of native methods, and the use of native methods does not vary greatly from application to application.
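The Category machinery of Section 5.2 above — pattern rules (here, those of Figure 9) plus the package-based Code Nickname fallback — can be sketched as follows. The matching engine is our simplification; WAIT prioritizes rules via the Category Tree rather than list order:

```java
import java.util.List;
import java.util.regex.Pattern;

// Sketch of Category analysis: rules map regular expressions over
// method names to Category nodes (patterns follow Figure 9); unmatched
// frames fall back to the "P2 Code" nickname. Engine code is ours.
public class CategoryAnalysis {
    record Rule(Pattern pattern, String category) {}

    static final List<Rule> RULES = List.of(
        new Rule(Pattern.compile(".*ConnectionImpl\\.commit"), "Database Commit"),
        new Rule(Pattern.compile(".*XATransactionWrapper\\.rollback"), "Database Rollback"),
        new Rule(Pattern.compile(".*Statement\\.doExecuteStatement"), "Database Query"),
        new Rule(Pattern.compile(".*Statement\\.executeBatch"), "Database Batch"),
        new Rule(Pattern.compile(".*ResultSetImpl\\.next"), "Database Cursor"));

    // First matching rule wins in this simplification.
    static String categorize(String frame) {
        for (Rule r : RULES)
            if (r.pattern().matcher(frame).matches()) return r.category();
        return nickname(frame);  // ad hoc fallback Category
    }

    // P1/P2/.../Class.Method receives the Code Nickname "P2 Code".
    static String nickname(String frame) {
        String[] parts = frame.split("/");
        return (parts.length >= 2 ? parts[1] : parts[0]) + " Code";
    }

    public static void main(String[] args) {
        System.out.println(categorize("com/mysql/jdbc/ConnectionImpl.commit"));
        System.out.println(categorize("com/MyBank/login"));  // no rule: nickname
    }
}
```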
The design of Java seems actively to discourage the use of native methods. Nevertheless, our conclusions are always subject to the inevitable holes in the rule set.

The algorithm proceeds as a sieve, looking for the Wait State that can be inferred with the most certainty. The algorithm uses data from the concrete data model presented in Figure 8, as well as a set of rules over method names, specified declaratively with patterns (analogous to the Category Analysis).

The Wait State of a given call stack cluster c at sample index i is the first match found when traversing the following conditions, in order:

1. Deadlock: if this stack cluster participates in a cycle of lock contention in the monitor graph; i.e., there is a cycle in the Blocked and Owned By relations.

2. Lock Contention: if the stack cluster, at the moment in time of sample i, has an entry in the Blocked relation.

3. Awaiting Notification: if the stack cluster, at the moment in time of sample i, has an entry in the Waiting relation.

4. Spinlock: if the Wait State rule set defines a method in c that, with high certainty, implies the use of spinlocking. Many methods in the java.util.concurrent library fall in this high-certainty category.

5. Awaiting Data from Disk, Network: if the rule set matches c as a use of a filesystem or a network interface. Most such rules need only inspect the leaf invocation of the stack; e.g., a socketRead native invocation is a very good indication that this stack sample cluster is awaiting data from the network.² In some cases, requests for data are dispatched to a "stub/tie" method, as is common in LDAP or ORB implementations.

6. Executing Java Code: if the method invoked by the top of the stack is not a native method, then c is assumed to be Runnable, executing Java code.
Executing Native Code: if the method invoked by the top of the stack is a native method, and the rule set asserts that this native method is truly running, then we infer that the native method is Runnable. We treat native and Java invocations asymmetrically to increase robustness. A Java method, unless it participates in the monitor graph, is almost certain to be Runnable. The same cannot be said of native methods. In our experience, native methods more often than not serve the role of fetching data rather than executing code. Therefore, we require native methods to be whitelisted in order to be considered Runnable.

8. JVM Services: if c has no call stack, it is assumed to be executing native services. Compilation and garbage collection threads, spawned by the JVM itself, fall into this category. Even though these samples have no call stacks, and unreliable thread states, they participate in the monitor graph. Thus, unless they are observed to be in a Contention or Awaiting Notification state, we assume they are Runnable, executing JVM Services.

9. Poll, IOWait, Join Point: if there exists a rule that describes the native method at the top of the stack as one of these variants of Sleeping.

10. NativeUnknown: any call stack cluster with a native method at the top of the stack and not otherwise classified is placed into the NativeUnknown Wait State. This classification is in contrast to call stack clusters with Java leaf invocations, which are assumed to be Runnable. For robustness, the algorithm requires call stacks with native leaf invocations to be specified by rules to be in some particular Wait State. This allows tool users to quickly spot deficiencies in the rule set. In practice, we

2 This is likely, but not an absolute certainty.
The JVM and operating system spend some time copying network buffers, etc., as part of fetching remote data. We are currently working to incorporate native stacks into the rule set, which will allow for a more refined Wait State inference.
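The sieve above amounts to a chain of guarded returns. The following is a hypothetical, boiled-down sketch (the class, method, and flag-based interface are ours, not WAIT's): each of the ten conditions is assumed to have been pre-evaluated against the monitor graph and the rule set, and passed in as a flag.

```java
// Hypothetical sketch of the ten-step Wait State sieve. The real analysis
// consults the concrete data model and the declarative rules; here each
// condition arrives pre-evaluated, and the first match wins.
class WaitStateSieve {
    static String classify(boolean inDeadlockCycle,   // cycle in Blocked/Owned By
                           boolean blocked,           // entry in Blocked relation
                           boolean waiting,           // entry in Waiting relation
                           boolean spinRuleMatch,     // spinlock rule fired
                           boolean ioRuleMatch,       // disk/network rule fired
                           boolean hasStack,          // cluster has a call stack
                           boolean leafIsNative,      // top frame is native
                           boolean nativeWhitelisted, // native leaf known Runnable
                           boolean sleepRuleMatch) {  // poll/iowait/join rule fired
        if (inDeadlockCycle)               return "Deadlock";
        if (blocked)                       return "Lock Contention";
        if (waiting)                       return "Awaiting Notification";
        if (spinRuleMatch)                 return "Spinlock";
        if (ioRuleMatch)                   return "Awaiting Data";
        if (hasStack && !leafIsNative)     return "Executing Java Code";
        if (hasStack && nativeWhitelisted) return "Executing Native Code";
        if (!hasStack)                     return "JVM Services";
        if (sleepRuleMatch)                return "Sleeping"; // Poll, IOWait, Join Point
        return "NativeUnknown";            // native leaf, no rule matched
    }
}
```

Note how the ordering encodes the asymmetry described in step 7: a Java leaf falls through to Runnable by default, whereas a native leaf must be whitelisted or matched by a sleep rule, else it lands in NativeUnknown.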

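The deadlock test in step 1 is a cycle search over the monitor graph. A minimal sketch, under our own assumptions about the representation (thread-to-monitor edges from the Blocked relation, monitor-to-thread edges from Owned By):

```java
import java.util.*;

// Hypothetical monitor-graph cycle check for the Deadlock condition:
// a thread points at the monitor it is blocked on, and a monitor points
// at the thread that owns it; a cycle through the start node is deadlock.
class MonitorGraph {
    private final Map<String, List<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** True if following Blocked/Owned By edges from start returns to start. */
    boolean inCycle(String start) {
        return dfs(start, start, new HashSet<>());
    }

    private boolean dfs(String node, String target, Set<String> seen) {
        for (String next : edges.getOrDefault(node, List.of())) {
            if (next.equals(target)) return true;              // closed the cycle
            if (seen.add(next) && dfs(next, target, seen)) return true;
        }
        return false;
    }
}
```

For example, thread t1 blocked on monitor m1, owned by t2, which is blocked on m2, owned by t1, forms the classic two-party deadlock that this check detects.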

java/net/SocketInputStream.socketRead0 ⇒ Network
com/ibm/as400/NativeMethods.socketRead ⇒ Network
org/apache/.../OSNetworkSystem.write ⇒ Network
sun/nio/ch/SocketChannelImpl.write0 ⇒ Network
java/lang/Object.wait ⇒ %Condvar
java/util/concurrent/locks/ReentrantLock ⇒ %Condvar

(a) Declarative Rules Based on One Leaf Invocation

%Condvar & ObjectInputStream.readObject ⇒ Network

(b) A Declarative Rule Based on Several Frames in a Stack

Figure 10. Some example Wait State rules. (a) Simple rules specify a Wait State (e.g., Network) or an auxiliary tag (e.g., %Condvar) based on a single method frame. (b) Complex rules can use conjunctions of antecedents to match stacks which satisfy multiple conditions at different layers in the stack.

Wait State Rule                    # Rules
Waiting on Condition Variable        26
Native Runnable                      22
Awaiting Data from Disk, Network     16
Spinlock                             12

Table 3. The 76 Wait State rules, grouped according to Wait State. Only 5 of these rules use semantics from outside the Standard Library and Apache Commons.

have found that our handling of native methods is robust enough that this state fires rarely.

Rules for Wait States The syntax for declaring Wait State rules is more general than that for Category rules, which depend on exactly one method name.
In particular, rules can specify antecedents which depend on a conjunction of frame patterns appearing together in a single stack, as illustrated in Figure 10.

For convenience, the rules engine allows a declarative specification of tags: auxiliary labels which can be attached to a stack to encode information consumed by other rules which produce Wait States.

Figure 10(b) shows an example that matches against two frames, and relies on an auxiliary tag (%Condvar) which labels various manifestations of waiting on a condition variable.

5.4 Rule Coverage

A key hypothesis in any rule-based system is that a stable, and hopefully small, set of rules can achieve good coverage on a range of diverse inputs. This section evaluates this hypothesis for the WAIT rule-based analysis.

WAIT has been live within IBM for 10 months, and over that time has received over 1,200 submissions, 830 of which were submitted by 151 unique users who were not involved in any way with WAIT development. The submissions came from multiple divisions within IBM and cover a wide range of Java applications, including document processing systems, e-commerce systems, business analytics, software configuration and management, mashups, object caches, and collaboration services. To handle these reports we have encoded a total of 76 Wait State rules.

This set of rules has proven to be extremely stable. When our tool sees code from a new framework, these rules must change only to the extent that the framework interacts with the world outside of Java in new ways. In practice, we rarely see new flavors of crossings between Java application code and the native environment.
In the case of a new cryptographic library, or a new in-Java caching framework, none of these Wait State rules need change.

(a) Rules, by Category
Category               # Rules
Database                  72
Administrative            59
Client Communication      41
Disk, Network I/O         46
Waiting for Work          30
Marshalling               30
JEE                       22
Classloader               13
Logging                   12
LDAP                       6

(b) Database, by provider
Category    # Rules
DB2            18
MySQL          14
Oracle         12
Apache          8
SqlServer       6

Table 4. It typically takes only a small number of rules to cover the common Categories.

                 # Reports   # Thread Stacks   Category Rule   Package Nickname
WAIT team             378        605,460            87.7%            12.2%
External users        830      1,391,033            77.3%            22.7%
Total                1208      1,996,493            80.4%            19.5%

Table 5. Rule coverage statistics on WAIT reports submitted. Data shows what percent of thread stacks are labeled explicitly by the Category rules, vs. falling back to a nickname based on the Java package.

For the Category analysis we have found that only a small number of rules are necessary to capture a wide range of Categories. Table 4a characterizes most of the 387 Category rules that we have currently defined. For example, our current rule set covers five common JDBC libraries, including IBM DB2 and Microsoft SqlServer, with only 72 rules. The number of rules specific to a particular JDBC implementation lies on the order of 10–20, as shown in Table 4b. We have also found the rules to be stable across versions of any one implementation. For example, the same set of rules covers all known versions and platforms of the DB2 JDBC driver that we have tested.
This testing includes three versions of the code and four platforms.

It is difficult to prove that the existing rules are "sufficient" other than to observe the eventual success of the tool within IBM. One concrete metric to assess rule coverage is to observe how often a thread stack does not match any Category, and thus defaults to a nickname based on the Java package. Table 5 presents this metric for the WAIT submissions we have received so far. The 1208 submissions contain 1,996,493 thread samples; 80.5% of these stacks are labeled by our Category analysis. The remaining 19.5% fall back to being named by package. Ignoring the 378 submissions by WAIT team members, the coverage drops only slightly, to 77.3% of stacks labeled by rules and 22.7% by package name.

6. The WAIT Tool

Based on the abstractions and analyses presented throughout the paper, we have implemented WAIT as a software-as-a-service deployed in IBM. In this section, we describe the overall design of the tool, present the user interface, and discuss some implementation choices.
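The overall coverage figure follows from the per-group numbers in Table 5; a small arithmetic sketch (class and method names are ours) reproduces it, and shows that the 80.4% in the table and the 80.5% in the text are the same quantity under different rounding:

```java
// Consistency check for Table 5: weight each group's Category-rule
// coverage rate by its thread-stack count, then divide by total stacks.
class CoverageTotals {
    static double totalRuleCoverage() {
        double teamStacks = 605_460,   teamRate = 0.877;  // WAIT team row
        double extStacks  = 1_391_033, extRate  = 0.773;  // External users row
        double labeled = teamStacks * teamRate + extStacks * extRate;
        return labeled / (teamStacks + extStacks);        // about 0.8045
    }
}
```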


6.1 WAIT Tool Architecture

We designed WAIT with a primary requirement of a low barrier to entry: the tool must be simple and easy to use. Any complex install procedure or steep learning curve will inhibit adoption, particularly when targeting large commercial systems deployments.

To meet this goal, we implemented WAIT as a service. Using WAIT involves three steps:

1. Collect one or more javacores. This can be done manually, or by using a data collection script we provide that collects machine utilization and process utilization, as discussed in Section 3.1.

2. Upload the collected data to the WAIT server through a web interface.

3. View the report in a browser.

A service architecture offers the following advantages:

• Zero-install. The user can use WAIT without having to install any software.

• Easy to collaborate. A WAIT report can be shared and discussed by forwarding a single URL.

• Incrementally refined knowledge base. By having access to the data submitted to the WAIT service, the WAIT team can monitor the reports being generated and continually improve the knowledge base when existing rules prove insufficient. This incremental refinement approach has proved particularly beneficial during the early stages of development and allowed us to release the tool far earlier than would have been possible with a standalone tool.

• Cross-report analysis. Having access to a large number of reports allows looking for trends that may not stand out clearly in a single report.

The main disadvantage of a service-based tool is that it requires a network connection to the server, which is sometimes not available when diagnosing a problem at a customer site. There may also be privacy concerns with uploading the data to a central server, although these concerns are mitigated by hosting the server behind a corporate firewall.
One IBM organization has deployed a clone of the WAIT service on their own server, to satisfy stricter privacy requirements.

6.2 User Interface

Figure 11 shows a screenshot of a WAIT report being viewed in Mozilla Firefox. The report is intended to be scanned from top to bottom, as this order aligns with the logical progression of questions an expert would likely ask when diagnosing a performance problem.

Activity Summary The top portion of the report presents a high-level view of the application's behavior. The pie charts on the left present data averaged over the whole collection period, while timelines on the right show how the behavior changed over time. The top row shows the machine utilization during the collection period, breaking down the activity into four possible categories: Your Application (the Java program being monitored), Garbage Collection, Other Processes, and Idle. This overview appears first in the WAIT report because it represents the first property one usually checks when debugging a performance problem. In this particular report, the CPU utilization drops to zero roughly 1/3 of the way through the collection period, a common occurrence when problems arise in a multi-tier application.

The second and third rows report the Wait State (described in Section 4.1) of all threads found running in the JVM. The second row shows threads that are Runnable, while the third row shows threads that are Waiting. Each bar in the timeline represents the data from one javacore. This example shows as many as 65 Runnable threads for the first 8 javacores taken, at which point all runnable activity ceased, and the number of Waiting threads shot up to 140, all in Wait State Delayed by Remote Request.3
Skimming the top portion of the WAIT report enables a user to quickly see that CPU idle time follows from threads backing up in a remote request to another tier.

Category Viewer The lower left-hand pane of the WAIT report shows a breakdown of the most active Categories (technically, primary Categories from Section 4.2) executing in the application. Clicking on a pie slice or bar in the charts above causes the Category pane to drill down, showing the Category breakdown for the Wait State that was clicked.

This report shows that all but one of the threads in Wait State Delayed by Remote Request were executing Category Getting Data from Database. This indicates that the source of this problem stems from the database becoming slow or unresponsive. WAIT does not currently support further detailed diagnosis of problems from the database tier itself; WAIT's utility stems from the ease with which the user can narrow down the problem to the database, without having even looked at logs from the database machine.

Stack Viewer Glancing at the commonly occurring Wait States and Category activity often suffices to rapidly identify bottlenecks; however, WAIT provides one additional level of drilldown. Selecting a bar in the report opens a stack viewer pane to display all call stacks that match the selected Wait State and Category. Stacks are sorted by most common occurrence to help identify the most important bottlenecks. Having full stack samples available has proven valuable not only for understanding performance problems, but for fixing them.
The stacks allow mapping back to source code with full context and the exact lines of code where the backups are occurring. Passing this information on to the application developers is often sufficient for them to identify a fix.

The presence of thread stacks makes WAIT useful not only for analyzing waiting threads, but also for identifying program hot spots. Clicking on the Runnable Threads pie slice causes the Stack Viewer to display the most commonly occurring running threads. Browsing the top candidates often produces surprising results, such as seeing "logging activity" or "date formatter" appear near the top, suggesting wasted cycles and easy opportunities for streamlining the code.

Discussion The WAIT user interface takes an intentionally minimalistic approach, striving to present a small amount of semantically rich data to users rather than overloading them with mountains of raw data. Although the input to WAIT is significantly lighter weight and more sparse than that of many other tools, particularly those based on tracing, we have found that WAIT is often more effective for quick analysis of performance problems.

The pairing of WAIT's analyses together with drilldown to full stack traces has proven to be a powerful combination. The WAIT abstractions guide the user's focus in the right direction, and present a set of concrete thread stacks that can be used to confirm the hypotheses. If the WAIT analysis is incorrect, or is simply not trusted by a skeptical user, viewing the stacks quickly confirms or denies their suspicion.

3 This label corresponds to the Awaiting Data node in Figure 5(a). The UI does not always present the same text as used to define the underlying model, but the mapping is straightforward. For the remainder of this paper,
For the remainder <strong>of</strong> this paper,when discussing the UI, we simply refer to text labels as presented in theUI.


Figure 11. Example WAIT Report


Application   Samples   Threads Sampled   Zipped Input   Clustered Output
A1                  1         228              156kB            50kB
A2                  1         206              447kB            40kB
A3                  6          90              112kB            22kB
A4                 10         356              449kB            15kB
A5                499      27,638               27MB            56kB
A6                946     127,536              120MB           499kB

Table 6. Size of the data model communicated from the server to the client's browser. This data is a representative sample from actual IBM customer data that was submitted, by support personnel, to the WAIT service.

Analysis time (milliseconds)
      Firefox 3.6   Safari 4.05
A1        140            23
A2         90            16
A3         53            13
A4         68            28
A5        560           204
A6        720           110

Table 7. Execution time for the inference engine, as Javascript on a 2.4GHz Intel Core2 Duo with 4GB of memory, on MacOS 10.6.2. The applications, A1–A6, are the same as those in Table 6.

6.3 Implementation

WAIT is coded in a combination of Java and Javascript. The ETL step of parsing the raw data and producing the data model (Section 5.1) runs in Java and executes on the server once, when a report is created. The remaining analyses (Wait State analysis and Category analysis) run in Javascript and execute in the browser each time a report is loaded. This somewhat controversial design allows users to modify the rules or select alternate rules configurations without a round trip to the server. This design also allows WAIT reports, once generated, to be viewed in headless mode without a server present; the browser can load the report off a local disk and maintain full functionality of the report.

This design presents two performance constraints on the WAIT analysis: (1) the data models must be sufficiently small to avoid excessive network transfer delays, and (2) the analysis written in Javascript must be fast enough to avoid intrusive pauses, and moreover avoid Javascript timeouts. For example, Firefox 2.5 warns the
For example, Firefox 2.5 warns theuser that something might be wrong with the page after roughlyfive seconds <strong>of</strong> Javascript execution.Table 6 reports data model sizes for six representative <strong>IBM</strong> customerapplications, labeled A1–A6. The clustering and optimizationsteps described in Section 5.1 produce a significant reductionin space for the data model. For larger inputs, the clustered datamodel is multiple orders <strong>of</strong> magnitude smaller than the zipped rawdata input files. The largest submission received to date contained946 samples (Javacores), with a total <strong>of</strong> 127,536 stack samples. Thedata model for this report was still a manageable 499kB. More typically,we have found that the clustered output consumes fewer than50kB.The performance <strong>of</strong> the WAIT analyses is reasonable, eventhough executed within the browser. Table 7 shows the analysisruntime for the applications from Table 6. The table shows that,even with a large number <strong>of</strong> application samples (e.g. the 946 <strong>of</strong>application A6), the analysis completes in less than one second. Asdiscussed later in Section 7, most typical uses <strong>of</strong> the tool have inputswith a smaller number <strong>of</strong> samples, more similar to applicationsA1 through A4. For these typical cases, the analysis usually completesin less than 100ms. In general, analysis takes less time thangeneral widget-related rendering work.7. Case StudiesWe now show six additional case studies to demonstrate how WAITilluminates various performance problems. All these case studiesrepresent real problems discovered when analyzing <strong>IBM</strong> systems.To conserve space, we show only the relevant portions <strong>of</strong> the UIfor each case study.7.1 Lock Contention and DeadlockFigure 12 depicts a WAIT report on a 48-core system. 
The Waiting Threads timeline shows a sudden and sustained surge of Blocked on Monitor; i.e., threads seeking a lock and not receiving it, and thus being blocked from making progress. Looking at the Category breakdown suggests that the lock contention comes from miscellaneous APACHE OPENJPA code. The thread stacks from this category identify the location of the lock contention as line 364 of getMetaDataLocking().

Sometimes locking issues go beyond contention to the point of deadlock or livelock. When WAIT detects a cycle in the monitor graph, the threads' Wait State is Blocked on Deadlock. This situation appears in Figure 13. Looking at the thread stacks at the lower right of the report suggests that the threads are waiting for a lock in the logging infrastructure method SystemOutStream.processEvent(), line 277. Armed with this information, a programmer could look at the code and try to determine the reason for the deadlock.

7.2 Not Enough Load

The WAIT report in Figure 14 shows that the machine was 67% idle, and the timeline confirms that utilization was consistently low during the entire monitoring period. The Waiting Threads pie chart indicates that threads spend most of their time Delayed by Remote Request. Digging more deeply, the Category view indicates that the remote request on which the threads are delayed is client communication using an HTTP protocol. In this case, the performance problem is not with the server machine; rather, the amount of data being received from the client machine is not sufficient to keep the server busy. Indeed, it is possible that there is no problem in this situation, other than that the server may be over-provisioned for this workload.
WAIT cannot directly determine whether the client is generating a small amount of data or the network between the client and server is under-provisioned to handle the request traffic. To answer this question, a network utility such as netstat could be employed, or the CPU utilization of the client machines could be investigated.

7.3 Memory Leak

WAIT can also detect memory leaks. As shown in Figure 15, a memory leak can be detected by looking at just the first timeline, which shows garbage collection activity over time. In this example, the non-GC work initially dominates, but over time the garbage collection activity increases until it eventually consumes most of the non-idle CPU cycles. The large increase in garbage collection as time passes is strong evidence that the heap is inadequate for the amount of live memory; either the heap size is not appropriate for the workload, or the application has a memory leak.
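This reading of the timeline can be expressed as a simple heuristic. The sketch below is our illustration (the thresholds are arbitrary), not logic WAIT itself applies:

```java
// Flag a leak suspicion when the GC share of CPU grows markedly over the
// collection period: compare the average GC fraction over the later half
// of the samples against the earlier half.
class GcTrend {
    static boolean leakSuspected(double[] gcFraction) {
        int mid = gcFraction.length / 2;
        double early = 0, late = 0;
        for (int i = 0; i < mid; i++)                  early += gcFraction[i];
        for (int i = mid; i < gcFraction.length; i++)  late  += gcFraction[i];
        early /= mid;
        late  /= gcFraction.length - mid;
        // Thresholds chosen arbitrarily for illustration: GC share more
        // than doubled, and now consumes over 30% of the CPU.
        return late > 2 * early && late > 0.3;
    }
}
```

A rising sequence such as 5%, 10%, 40%, 60% trips the check; a flat 10% throughout does not.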


Figure 12. WAIT Report: Lock Contention

Figure 13. WAIT Report: Deadlock

7.4 Database Bottleneck

Figure 16 presents an example of a database bottleneck. Unlike Figure 11, where the database became completely unresponsive, in this case the database is simply struggling to keep up with the application server's requests. Over time, the server's utilization varies between approximately 10% and 85%, and these dips in utilization correlate roughly with the spikes in the number of threads in Waiting state Delayed by Remote Request and Category Getting Data from Database, thus pointing to the likely source of the delay.

Clicking on the orange bar for Getting Data from Database reveals the thread stacks that a developer can analyze to determine the key parts of the application delayed by the database, and try to reduce the load generated against the database. Alternatively, the database could be optimized or deployed on faster hardware.

7.5 Disk I/O Affecting Latency

Figure 17 shows a WAIT report where filesystem activity is limiting performance. The top two pie charts and timelines show that there is enough Java code activity to keep the CPUs well utilized. However, the Waiting activity shows a significant number of threads in Wait State Delayed by Disk I/O and Category Filesystem Metadata Operations, suggesting room for improvement with faster disks, or by restructuring the code to perform fewer disk operations.

Reducing these delays would clearly improve latency, since each transaction would spend less time waiting on disk I/O, but such improvement would have other benefits as well. Even though the four CPUs on this machine are currently well utilized, this application will likely scale poorly on larger machines. As the number of processors increases, the frequent disk access delays will eventually become a scaling bottleneck. WAIT can help identify these scaling limiters early in the development process.

8.
Related Work

Most conventional gprof-style performance analysis tools (e.g., [19, 20]) provide a profile which indicates in which methods a program spends time. Often, these tools provide a hierarchical tree view, in which an analyst can recursively drill down to find time spent in subroutines. These tools also often provide a summary view which indicates the sum totals of execution time over a run. Since performance characteristics vary over time, a summary view is often less helpful than snapshot views of profile data over smaller windows. Many tools (e.g., [12, 15, 20]) provide snapshot or window views.

A few previous works have applied abstraction to profile information, in order to infer higher-level notions of bottlenecks. Hollingsworth [7] identified classes of performance bottlenecks in parallel programs, and introduced a methodology to dynamically insert instrumentation to test for bottlenecks. Srinivas and Srinivasan [10] present a tool design that allows the user to define "Components", abstract names for code from particular packages or archives. This paper presents a new methodology to present layers of abstraction when monitoring performance, suitable for production enterprise deployment scenarios.

Figure 14. WAIT Report: Not Enough Load

Figure 15. WAIT Report: Memory Leak

A few previous works have been developed to mine log information in large systems [22, 23]. However, unlike the approach proposed here, both require access to source code that is often unavailable in enterprise environments. In addition, [22] employs machine learning to determine important characteristics to report, as opposed to the expert rule structure employed here; [23] uses analysis of source code control paths to determine precise causes of error conditions.

There are heavier-weight alternatives to sampling that we chose to avoid. Performance tools exist that collect detailed information about method invocations [2, 3, 5, 6, 11, 13, 19], or that stitch together an end-to-end trace of call flows across a multi-tier system [1, 4, 14, 16, 18].

We also note that performance experts have been using Java core dumps for many years, but generally in an ad hoc way, using their eyes and brains to summarize the data instead of having a tool to do so. Though other tools like Thread Analyzer [17] take Java cores as input, the tools of which we are aware are interactive, and require an expert user for effective results. As a result, existing Java core tools are generally not suitable for use by less expert users for quick triage.

Others have attempted to use method names to help diagnose software problems [8]. However, unlike our approach, which employs method names to diagnose functional issues in the system, [8] uses method name analysis to promote adherence to naming conventions and readable, maintainable software.

9.
Conclusion

We have presented the design and methodology behind WAIT, a tool designed to pinpoint performance bottlenecks in server applications based on idle time analysis. WAIT has been used widely throughout IBM, in both development and production deployments. Through these deployments, we have learned how to exploit ubiquitous sample-based monitoring to infer rich semantic information about server application performance. Although much of the literature explores intrusive monitoring based on instrumentation and monitoring agents, we have shown that effective performance analysis is possible with a lower barrier to entry.


Figure 16. WAIT Report: Database Bottleneck

References

[1] M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen. Performance debugging for distributed systems of black boxes. In Symposium on Operating System Principles. ACM, 2003.

[2] W. P. Alexander, R. F. Berry, F. E. Levine, and R. J. Urquhart. A unifying approach to performance analysis in the Java environment. IBM Systems Journal, 39(1), 2000.

[3] G. Ammons, J.-D. Choi, M. Gupta, and N. Swamy. Finding and removing performance bottlenecks in large systems. In The European Conference on Object-Oriented Programming. Springer, 2004.

[4] B. Darmawan, R. Gummadavelli, S. Kuppusamy, C. Tan, D. Rintoul, H. Anglin, H. Chuan, A. Subhedar, A. Firtiyan, P. Nambiar, P. Lall, R. Dhall, and R. Pires. IBM Tivoli Composite Application Manager Family: Installation, Configuration, and Basic Usage. http://www.redbooks.ibm.com/abstracts/sg247151.html?Open.

[5] W. De Pauw, E. Jensen, N. Mitchell, G. Sevitsky, J. Vlissides, and J. Yang. Visualizing the execution of Java programs. In Software Visualization, State-of-the-art Survey, volume 2269 of Lecture Notes in Computer Science. Springer-Verlag, 2002.

[6] R. J. Hall. Cpprofj: Aspect-capable call path profiling of multithreaded Java applications. In Automated Software Engineering, pages 107–116. IEEE Computer Society Press, 2002.

[7] J. K. Hollingsworth. Finding Bottlenecks in Large-scale Parallel Programs. PhD thesis, University of Wisconsin, Aug. 1994.

[8] E. W. Host and B. M. Ostvold. Debugging method names. In The European Conference on Object-Oriented Programming. Springer, 2009.

[9] N. Mitchell, G. Sevitsky, and H. Srinivasan. Modeling runtime behavior in framework-based applications.
In The European Conference on Object-Oriented Programming, 2006.

[10] K. Srinivas and H. Srinivasan. Summarizing application performance from a components perspective. Foundations of Software Engineering, 30(5):136–145, 2005.

[11] Borland Software Corporation. OptimizeIt Enterprise Suite. http://www.borland.com/us/products/optimizeit, 2005.

[12] Compuware. Compuware Vantage Analyzer. http://www.compuware.com/solutions/e2e brochures factsheets.asp.

[13] Eclipse. Eclipse Test & Performance Tools Platform Project. http://www.eclipse.org/tptp.

[14] HP. HP Diagnostics for J2EE.

[15] IBM. Compuware Vantage Analyzer. http://alphaworks.ibm.com/tech/dcva4j/download.

[16] IBM. IBM OMEGAMON XE for WebSphere. http://www-01.ibm.com/software/tivoli/products/omegamon-xe-was.

[17] IBM. Thread and Monitor Dump Analyzer for Java. http://www.alphaworks.ibm.com/tech/jca.

[18] IBM. Tivoli Monitoring for Transaction Performance. http://www-01.ibm.com/software/tivoli/products/monitor-transaction.

[19] Sun Microsystems. HPROF JVM profiler. http://java.sun.com/developer/technicalArticles/Programming/HPROF.html, 2005.

[20] Yourkit LLC. Yourkit profiler. http://www.yourkit.com.

[21] G. Xu, M. Arnold, N. Mitchell, A. Rountev, and G. Sevitsky. Go with the flow: profiling copies to find runtime bloat. In PLDI '09: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 419–430, New York, NY, USA, 2009. ACM.


Figure 17. WAIT Report: Good Throughput but Filesystem Bottleneck

[22] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan. Detecting large-scale system problems by mining console logs. In Symposium on Operating System Principles. ACM, 2009.

[23] D. Yuan, H. Mai, W. Xiong, L. Tan, Y. Zhou, and S. Pasupathy. SherLog: Error diagnosis by connecting clues from run-time logs. In Architectural Support for Programming Languages and Operating Systems, Mar. 2010.

[24] Y. Zhao, J. Shi, K. Zheng, H. Wang, H. Lin, and L. Shao. Allocation wall: a limiting factor of Java applications on emerging multi-core platforms. In OOPSLA '09: Proceedings of the 24th ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 361–376, New York, NY, USA, 2009. ACM.
