Copyright by William Lloyd Bircher 2010 - The Laboratory for ...

More documents

Recommendations

Info

increased while performance is decreased. OS scheduling characteristics exacerbate this problem. Unless the user makes use of explicit process affinity or an affinity library, some operating systems will attempt to balance the workloads across all cores. This causes a process to spend less contiguous time on a particular core. At each migration from one core to another there is a lag from when the core goes active to when the active core has its frequency increased. The aggressiveness of the p-state setting amplifies the performance loss/power increase due to this phenomenon. Fortunately, recent operating systems such as Microsoft Windows Vista provide means for system designers and end users to adjust the settings to match their workloads/hardware [Vi07]. 6.2 Reactive Power Management In this section, results are presented to show the effect that dynamic adaptations ultimately have on performance and power consumption. All results are obtained from a real system, instrumented for power measurement. The two major areas presented are probe sensitivity (indirect) and operating system effects (direct). First, the probe sensitivity of SPEC CPU2006 is considered. Table 6.1 shows performance loss due to the use of p-states. In this experiment the minimum p-state is set below the recommended performance breakover point for probe response. This emphasizes the inherent sensitivity workloads have to probe response. Operating system frequency scheduling is biased towards performance by fixing active cores at the maximum frequency and idle cores at the minimum frequency. These results suggest that floating point workloads tend to be most sensitive to probe latency. However, in the case 110
of SPEC CPU2000 workloads, almost no performance loss is shown. The reason, as shown in section 4.3.1, is that smaller working set size reduces memory traffic and, therefore, the dependence on probe latency. For these workloads only swim, equake, and eon showed a measureable performance loss. Table 6.1 Performance Loss Due to Low Idle Core Frequency – SPEC CPU 2006 SPEC CPU 2006 – INT perlbench -0.8% sjeng 0.0% bzip2 -1.0% libquantum -7.0% gcc -3.6% h264ref -0.8% mcf -1.8% omnetpp -3.7% gobmk -0.3% astar -0.5% hmmer -0.2% SPEC CPU 2006 – FP bwaves -5.6% soplex -6.7% games -0.6% povray -0.5% milc -7.9% calculix -0.6% zeusmp -2.1% GemsFDTD -5.9% gromacs -0.3% tonto -0.6% cactusADM -2.6% lbm -5.6% leslie3D -6.0% wrf -3.2% namd -0.1% sphinx3 -5.6% dealII -1.3% Next, it is shown that by slightly increasing the minimum p-state frequency it is possible to recover almost the entire performance loss. Figure 6.2 shows an experiment using a synthetic kernel with high probe sensitivity with locally and remotely allocated memory. The remote case simply shows that the performance penalty of accessing remote memory can obfuscate the performance impact of minimum p-state frequency. The indirect performance effect can be seen clearly by noting that performance increases rapidly as the idle core frequency is increased from 800 MHz to approximately 1.1 GHz. This is a 111
Page 1 and 2:
Copyright by William Lloyd Bircher
Page 3 and 4:
Predictive Power Management for Mul
Page 5 and 6:
Acknowledgements I would like to th
Page 7 and 8:
Predictive Power Management for Mul
Page 9 and 10:
Table of Contents Chapter 1 Introdu
Page 11 and 12:
5.3.6 Memory ......................
Page 13 and 14:
List of Tables Table 1.1 Windows Vi
Page 15 and 16:
List of Figures Figure 1.1 CPU Core
Page 17 and 18:
Chapter 1 Introduction Computing sy
Page 19 and 20:
increases the overhead of adaptatio
Page 21 and 22:
suboptimal from a power and perform
Page 23 and 24:
Active Core Activity Idle except fo
Page 25 and 26:
4. Design a predictive power manage
Page 27 and 28:
1.7 Organization This dissertation
Page 29 and 30:
Chapter 2 Methodology The developme
Page 31 and 32:
2.1.2 Subsystem-Level Power in a Se
Page 33 and 34:
The main components are subsystem p
Page 35 and 36:
Table 2.4 Laptop System Description
Page 37 and 38:
2.3 Performance Counter Sampling To
Page 39 and 40:
instruction streams that exercise a
Page 41 and 42:
the power trace to the PMC trace co
Page 43 and 44:
The first bar in Figure 3.1 “Fetc
Page 45 and 46:
Table 3.4 Instruction Linear Regres
Page 47 and 48:
long time to complete, more aggress
Page 49 and 50:
power adaptations as c-states. Thes
Page 51 and 52:
Core Power (Watts) 60 50 40 30 20 1
Page 53 and 54:
The difference between C0-Idle and
Page 55 and 56:
case error is 3.3%. Alternatively s
Page 57 and 58:
3. Based on basic domain knowledge,
Page 59 and 60:
3.6 Summary This section describes
Page 61 and 62:
Power (Watts) 30 25 20 15 10 5 0 Co
Page 63 and 64:
workloads become more memory-bound,
Page 65 and 66:
At the other extreme, the productiv
Page 67 and 68:
processor. In both platforms the to
Page 69 and 70:
Table 4.1 Subsystem Power Standard
Page 71 and 72:
the case of I/O, the observed workl
Page 73 and 74:
average distribution. The apparent
Page 75 and 76: Table 4.4 Workload Phase Classifica
Page 77 and 78: Frequency 1 10 100 1000 PhaseLength
Page 79 and 80: power management, it is shown that
Page 81 and 82: power measurement hardware for mult
Page 83 and 84: memory. Since the number of main me
Page 85 and 86: TLB Misses - Loads/stores that miss
Page 87 and 88: The form of the subsystem power mod
Page 89 and 90: 5.2.2 Memory This section considers
Page 91 and 92: prefetch traffic does increase afte
Page 93 and 94: average access time to the distant
Page 95 and 96: Watts Figure 5.6 Disk Power Model (
Page 97 and 98: exhibits little variation in power
Page 99 and 100: The memory model averaged about 9%
Page 101 and 102: efficiency rather than performance.
Page 103 and 104: through the application GUI. The nu
Page 105 and 106: 1 data cache access rate dominates
Page 107 and 108: periods of disconnect, cache snoop
Page 109 and 110: 5.3.4 CPU To test the extensibility
Page 111 and 112: Despite this, high accuracy of less
Page 113 and 114: Light activity yields higher precha
Page 115 and 116: 5.3.8 Chipset The Chipset power mod
Page 117 and 118: applied to the GPU core logic, larg
Page 119 and 120: Chapter 6 Performance Effects of Dy
Page 121 and 122: can be considered as predictors whi
Page 123 and 124: improvement for low idle core frequ
Page 125: 6.1.5 Direct Performance Effects Si
Page 129 and 130: adjusting a hysteresis timer. The t
Page 131 and 132: increase/decrease time. Since the i
Page 133 and 134: In order to reduce p-state performa
Page 135 and 136: performance loss and power consumpt
Page 137 and 138: eactive scheme used in Windows Vist
Page 139 and 140: Watts Watts Watts Watts Watts 150 1
Page 141 and 142: 7.2 Commercial DVFS Algorithm Exist
Page 143 and 144: of idle-active transitions in the c
Page 145 and 146: workload/operating systems adds and
Page 147 and 148: instructions. In APCI[Ac07] termino
Page 149 and 150: confidence level will drop below a
Page 151 and 152: First, prediction accuracy is consi
Page 153 and 154: coverage of 43% and accuracy over 9
Page 155 and 156: which corresponds to the DVFS sched
Page 157 and 158: Table 7.6: SYSmark 2007 Power and P
Page 159 and 160: In this case the predictor achieves
Page 161 and 162: 2007 power consumption contains man
Page 163 and 164: Chapter 8 Related Research This sec
Page 165 and 166: 8.2 System-Level Power Characteriza
Page 167 and 168: prediction scheme in this dissertat
Page 169 and 170: Chapter 9 Conclusions and Future Wo
Page 171 and 172: the duration of power and performan
Page 173 and 174: execution, portions of pipelines or
Page 175 and 176: [BiJo06-1] W. L. Bircher and L. Joh
Page 177 and 178:
the 2005 ACM SIGMETRICS Internation
Page 179 and 180:
[HaKe07] H. Hanson, S.W. Keckler, K
Page 181 and 182:
[JoMa01] R. Joseph and M. Martonosi
Page 183 and 184:
[LiBr05] Y. Li, D. Brooks, Z. Hu, a
Page 185 and 186:
[Os06] Open Source Development Lab,
Page 187 and 188:
[WaCh08] X. Wang and M. Chen. Clust
Page 189 and 190:
[PiSh01] P. Pillai and K. G. Shin.
show all

Copyright by William Lloyd Bircher 2010 - The Laboratory for ...

Create successful ePaper yourself

Delete template?

Save as template?