SYSTEM ARCHITECTURE

Understanding Performance Benchmarks

Benchmarks provide objective information that can be used to compare computer platforms, components, operating systems, and specific system configurations. This article discusses characteristics of credible benchmarks, guidelines for evaluating benchmark results, and some of the main benchmarks used at <strong>Dell</strong> for assessing the performance of server, workstation, and client systems.

BY SHARON HANSON, DIEGO ESTEVES, AND CLINT ESPINOZA

At their best, performance benchmarks provide impartial information that can be used to evaluate and compare the performance of computer systems. <strong>Dell</strong> and the computer industry promote objective and credible benchmarking in various ways, including participation in standards bodies such as the Standard Performance Evaluation Corporation (SPEC), Business Applications Performance Corporation (BAPCo), Transaction Processing Performance Council (TPC), and Storage Performance Council. When properly run and documented, the benchmarks produced by these and other groups help provide objective information that can be used to compare computer platforms, components, operating systems, and specific system configurations.

<strong>Dell</strong> is committed to furthering industry practices that yield objective industry-standard benchmark results. Organizations can use these benchmarks to evaluate and compare <strong>Dell</strong> systems to competitors' systems.
<strong>Dell</strong> also uses the benchmarks when developing new products and assessing new technologies.

The <strong>Dell</strong> benchmark philosophy is based on three tenets:

• Benchmark in a way that closely resembles how organizations use applications on <strong>Dell</strong> systems
• Ensure that anyone can reproduce results with a system shipped directly from <strong>Dell</strong>, using publicly available drivers
• Promote benchmark and run-rule changes that reflect this approach to benchmarking

This article discusses characteristics of credible benchmarks and presents high-level guidelines for evaluating benchmark results. It concludes with a list of the key benchmarks used at <strong>Dell</strong> to evaluate server, workstation, and client system performance.

Characteristics of credible performance benchmarks

A computer performance benchmark is a standard by which a computer system can be measured and judged. Many of the well-known benchmarks are developed and regulated by standards organizations such as SPEC and BAPCo. Just as common are unregulated benchmarks that measure system performance when running specific applications such as Adobe® Photoshop®, Microsoft® Exchange, Parametric® Pro/E®, or Id Software® Quake III® software. These benchmarks can help administrators evaluate system performance on a single, critical application such as Pro/E or Microsoft Exchange. Such benchmarks can be run—and their results reported—with varying degrees of flexibility.

88 POWER SOLUTIONS June 2004
In contrast, regulated benchmarks tend to have well-defined and documented methodologies, and their results are documented and reproducible. A good example is the SPEC® CPU2000 benchmark, which is produced by SPEC, a nonprofit corporation. According to SPEC, the organization's mission is to establish, maintain, and endorse a standardized set of relevant benchmarks. SPEC develops suites of benchmarks and also reviews and publishes submitted results from member organizations and other benchmark licensees.1 The SPEC organization has industry-wide representation, and its benchmark suites are well accepted and credible.

The SPEC CPU2000 benchmark provides performance measurements that can be used to compare compute-intensive workloads (both integer and floating point) on different computer systems. These compute-intensive benchmarks measure the performance of a system's processor, memory architecture, and compiler. CPU2000 consists of a set of objective tests that must be compiled and run according to SPEC run rules. SPEC provides the benchmarks as source code so they can be compiled to run on a variety of platforms, including industry-standard Intel® architecture–based systems and SPARC® processor–based Sun systems. In addition, SPEC provides guidelines for legitimately optimizing the performance of tested systems on the benchmark. These guidelines are designed to ensure that the hardware and software configurations of tested systems are suitable to run real-world applications. The organization also requires a full disclosure report, which provides benchmark results and configuration details sufficient to independently reproduce the results. SPEC encourages submission of reports for publication on the SPEC Web site (http://www.spec.org). These reports undergo a peer-review process before publication.
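The SPEC CPU suites report each test as a ratio of a fixed reference time to the measured run time and aggregate the ratios with a geometric mean. A minimal sketch of that scoring scheme follows; the reference and measured times are illustrative placeholders, not published SPEC data.

```python
from math import prod


def spec_style_score(reference_times, measured_times):
    """Aggregate per-test ratios (reference / measured) with a geometric
    mean, the scheme the SPEC CPU suites use for their overall scores.
    All times are in seconds; higher scores indicate faster systems."""
    ratios = [ref / meas for ref, meas in zip(reference_times, measured_times)]
    return prod(ratios) ** (1.0 / len(ratios))


# Hypothetical times for a three-test suite (not real SPEC results)
reference = [1400.0, 1800.0, 1100.0]
measured = [700.0, 600.0, 550.0]
score = spec_style_score(reference, measured)
```

The geometric mean is used rather than the arithmetic mean so that no single test dominates the composite score and so that relative comparisons between systems do not depend on which system is chosen as the baseline.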
Because of these rigorous requirements, CPU2000 benchmark results that are published on the SPEC Web site are widely used to compare the CPU, memory, and compiler performance of client and server systems.

BAPCo, TPC, and the Storage Performance Council are also nonprofit corporations that provide industry-standard benchmarks widely used to compare the performance of client, server, and storage systems. TPC was founded to define transaction processing and database benchmarks. The BAPCo charter is to develop and distribute a set of objective performance benchmarks based on popular computer applications and industry-standard operating systems. The goal of the Storage Performance Council is to define, promote, and enforce vendor-neutral benchmarks that characterize the performance of storage subsystems.

Guidelines for evaluating benchmark results

When using benchmark results to evaluate and compare systems, administrators should understand the benchmark, be aware of system optimizations, and ensure that comparisons are made between comparable systems, as follows.

Understand the benchmark

It is essential to understand which aspects of system performance a benchmark is testing as well as what the system's workload will be. Those who are evaluating benchmarks should consider whether the benchmark workload is reasonably representative of the real-world applications that will be run on the system. For instance, if a client system will be used to run mainstream business productivity applications, the BAPCo SYSmark® or Ziff Davis® Business Winstone® benchmarks are good candidates.2 On the other hand, if the test subject is a workstation system that will be used primarily to run Pro/E, the Pro/E application benchmark is suitable.
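One practical way to check representativeness is simply to time the operations an organization actually performs at realistic input sizes. The sketch below illustrates the idea; the workload function and input sizes are placeholders standing in for a real operation (such as an application filter applied to images of different sizes), not any standard benchmark.

```python
import time


def time_workload(operation, inputs, repeats=3):
    """Time a candidate operation at several input sizes, keeping the best
    of a few repeats to reduce timer noise. Returns {label: seconds}."""
    results = {}
    for label, data in inputs.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            operation(data)
            best = min(best, time.perf_counter() - start)
        results[label] = best
    return results


# Placeholder workload: summing squares stands in for a real operation.
sizes = {"small": range(10_000), "large": range(1_000_000)}
timings = time_workload(lambda data: sum(x * x for x in data), sizes)
```

Comparing such per-operation timings against a benchmark's published subtests can reveal whether the benchmark stresses the same operations, and at similar scales, as the intended workload.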
If possible, those who are evaluating benchmarks should focus on regulated benchmarks from standards bodies such as SPEC and BAPCo or on benchmarks that are standard industry applications.

Application benchmarks can be run with a variety of inputs, each of which attempts to represent a different usage scenario. For example, Adobe Photoshop performance varies greatly depending on the size of the image and the operations performed on it. Moreover, some Photoshop operations may be better suited or optimized for a particular system architecture. Even within a particular operation (such as the Gaussian Blur filter), the end user may be able to modify how the filter is applied. Different code algorithms may be used, resulting in significantly different performance results. These variables make it relatively easy to create a suite of Photoshop benchmark operations that greatly favors a particular system architecture. For this reason, <strong>Dell</strong> recommends that organizations look beyond summary benchmark results to help ensure that the operations performed are representative of their specific usage models.

Be aware of system optimizations

Some optimization of the tested system is expected and allowed on all benchmarks. SPEC outlines broad optimization guidelines in its run rules for each benchmark. The expectation of these guidelines

1 For more information about SPEC, visit http://www.spec.org.
2 For more information about the BAPCo SYSmark benchmark, visit www.bapco.com; for more information about the Ziff Davis Winstone benchmark, visit http://www.veritest.com/benchmarks/bwinstone/default.asp.