15.01.2013 Views

U. Glaeser

U. Glaeser

U. Glaeser

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

is generally the most important measure of performance. Execution time is the product of the number<br />

of instructions, cycles per instruction (CPI), and the clock period. Throughput of an application is a<br />

more important metric, especially in server systems. In servers that serve the banking industry, airline<br />

industry, or other similar businesses, what is important is the number of transactions that could be<br />

completed in unit time. Such servers, typically called transaction processing systems use transactions per<br />

minute (tpm) as a performance metric. Millions of instructions per second (MIPS) and million of floating<br />

point operations per second (MFLOPS) were very popular measures of performance in the past. Both<br />

of these are very simple and straightforward to understand and hence have been used often, however,<br />

they do not contain all three components of program execution time and hence are incomplete measures<br />

of performance. Several low level metrics are of interest to microprocessor designers, in order to help<br />

them identify performance bottlenecks and tune their designs. Cache hit ratios, branch misprediction<br />

ratios, number of off-chip memory accesses, etc. are examples of such measures.<br />

Another major problem is the issue of reporting performance with a single number. A single number<br />

is easy to understand and easy to be used by the trade press. Use of several benchmarks also makes it<br />

necessary to find some kind of a mean. Arithmetic mean, geometric mean, and harmonic mean are three<br />

ways of finding the central tendency of a group of numbers; however, it should be noted that each of<br />

these means should be used in appropriate conditions depending on the nature of the numbers which<br />

need to be averaged. Simple arithmetic mean can be used to find average execution time from a set of<br />

execution times. Geometric mean can be used to find the central tendency of metrics that are in the form<br />

of ratios (e.g., speedup) and harmonic mean can be used to find the central tendency of measures that<br />

are in the form of a rate (e.g., throughput). Cragon [2] and Smith [3] discuss the use of the appropriate<br />

mean for a given set of data. Cragon [2] and Patterson and Hennessy [4] illustrate several mistakes one<br />

could possibly make while finding a single performance number.<br />

The rest of this chapter section is organized as follows. “Performance Measurement” describes performance<br />

measurement techniques including hardware on-chip performance monitoring counters on microprocessors.<br />

“Performance Modeling” describes simulation and analytical modeling of microprocessors and<br />

computer systems. “Workloads and Benchmarks” presents several state-of-the-art benchmark suites for a<br />

variety of workloads. Due to space limitations, we describe some typical examples of tools and techniques<br />

and provide the reader with pointers for more information.<br />

Performance Measurement<br />

Performance measurement is used for understanding systems that are already built or prototyped.<br />

Performance measurement can serve two major purposes: tune this system or systems to be built and<br />

tune the application if source code and algorithms can still be changed. Essentially, the process involves<br />

(i) understanding the bottlenecks in the system that has been built, (ii) understanding the applications<br />

that are running on the system and the match between the features of the system and the characteristics<br />

of the workload, and (iii) innovating design features that will exploit the workload features. Performance<br />

measurement can be done via the following means:<br />

• Microprocessor on-chip performance monitoring counters<br />

• Off-chip hardware monitoring<br />

• Software monitoring<br />

• Microcoded instrumentation<br />

On-Chip Performance Monitoring Counters<br />

All state-of-the-art high performance microprocessors including Intel’s Pentium III and Pentium IV, IBM’s<br />

POWER 3 and POWER 4 processors, AMD’s Athlon, Compaq’s Alpha, and Sun’s UltraSPARC processors<br />

incorporate on-chip performance monitoring counters, which can be used to understand performance of<br />

these microprocessors, while they run complex, real-world workloads. This ability has overcome a<br />

serious limitation of simulators, that they often could not execute complex workloads. Now, complex<br />

run-time systems involving multiple software applications can be evaluated and monitored very closely.<br />

© 2002 by CRC Press LLC

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!