Copyright by William Lloyd Bircher 2010 - The Laboratory for ...
of Quad-Core AMD processors, this is the dominant effect. When an active core performs a cache probe of an idle core, the probe latency is higher than when probing an active core. The resulting performance loss can be significant for memory-bound (cache-probe-intensive) workloads. Direct performance effects are due to the current operating frequency of an active core. This effect tends to be smaller than the indirect effect, since operating systems are reasonably effective at matching the current operating frequency to performance demand. These effects are illustrated in Figure 6.1.
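The split between direct and indirect loss can be made concrete with a toy model. This is not the dissertation's model: it assumes execution time is a compute term that scales as 1/frequency of the active core plus a cache-probe term, T = C/f + P·t_probe, where t_probe depends only on whether the probed cores are idle. Every constant (`F_MAX_GHZ`, `PROBE_NS_ACTIVE`, `PROBE_NS_IDLE`, the workload sizes) and every name in the sketch is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical constants, chosen only to illustrate the argument (not measured).
F_MAX_GHZ = 2.5          # maximum (and baseline) core frequency
PROBE_NS_ACTIVE = 50.0   # latency to probe a core running at full frequency
PROBE_NS_IDLE = 100.0    # latency to probe an idle core at minimum frequency


@dataclass
class Workload:
    name: str
    compute_cycles: float  # work on the active core; time scales as 1/frequency
    probes: float          # cache probes sent to other cores


def exec_time_ns(w, f_active_ghz, probed_cores_idle):
    """Toy execution time: frequency-scaled compute time plus probe time."""
    probe_ns = PROBE_NS_IDLE if probed_cores_idle else PROBE_NS_ACTIVE
    return w.compute_cycles / f_active_ghz + w.probes * probe_ns


def relative_performance(w, f_active_ghz, probed_cores_idle):
    """Performance relative to no power management (F_MAX, probed cores active)."""
    baseline = exec_time_ns(w, F_MAX_GHZ, probed_cores_idle=False)
    return baseline / exec_time_ns(w, f_active_ghz, probed_cores_idle)


compute_bound = Workload("compute-bound", compute_cycles=1e9, probes=0)
memory_bound = Workload("memory-bound", compute_cycles=1e8, probes=1e7)

for w in (compute_bound, memory_bound):
    # Indirect loss only: active core at F_MAX, but the probed cores are idle.
    indirect = relative_performance(w, F_MAX_GHZ, probed_cores_idle=True)
    # Direct loss only: active core at half frequency, probed cores active.
    direct = relative_performance(w, F_MAX_GHZ / 2, probed_cores_idle=False)
    print(f"{w.name:14s} indirect-only {indirect:.2f}  direct-only {direct:.2f}")
```

With these made-up parameters, the compute-bound workload loses nothing when only the probed cores are slowed but suffers in proportion to the frequency cut when only the active core is slowed; the memory-bound workload shows the opposite pattern. This mirrors the qualitative argument in the text, not any measured result.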
Two workloads at opposite extremes are presented: the compute-bound crafty and the memory-bound equake. For each workload, two cases are presented: fixed and normal scheduling. Fixed scheduling isolates indirect performance loss by eliminating the effect of OS frequency scheduling and thread migration. This is accomplished by forcing the software thread to a particular core for the duration of the experiment. In this case, the thread always runs at the maximum frequency, and the idle cores always run at the minimum frequency. As a result, crafty achieves 100 percent of the performance of a processor that does not use dynamic power management. In contrast, the memory-bound equake shows significant performance loss due to the reduced performance of the idle cores it must probe.
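Fixed scheduling depends on binding the benchmark thread to one core. The text does not state which operating system or mechanism was used; the sketch below assumes Linux and Python's `os.sched_getaffinity`/`os.sched_setaffinity`, and the helper names `pin_current_process` and `run_pinned` are hypothetical. It covers only the pinning step: holding the active core at maximum frequency and the idle cores at minimum frequency would need separate frequency control, which is not shown.

```python
import os


def pin_current_process(cpu):
    """Restrict the caller to one logical CPU and return its previous mask.

    Linux-only: os.sched_getaffinity/os.sched_setaffinity are unavailable on
    Windows and macOS. pid 0 means "the caller"; on Linux the kernel applies
    the mask to the calling thread, and threads created later inherit it.
    """
    previous = os.sched_getaffinity(0)
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {cpu})
    return previous


def run_pinned(cpu, workload, *args):
    """Run workload(*args) pinned to `cpu`, then restore the original mask."""
    previous = pin_current_process(cpu)
    try:
        return workload(*args)
    finally:
        # Restore the original mask even if the workload raises.
        os.sched_setaffinity(0, previous)


if __name__ == "__main__":
    # Use the lowest-numbered CPU this process is allowed to run on.
    cpu = min(os.sched_getaffinity(0))
    total = run_pinned(cpu, sum, range(1_000_000))
    print(f"ran on CPU {cpu}; result = {total}")
```

Choosing the CPU from the current affinity mask, rather than hard-coding CPU 0, keeps the sketch working inside containers or cgroups that restrict the usable cores; restoring the mask in `finally` lets later code migrate freely again.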
Direct performance loss is shown in the dark solid and light solid lines, which use OS scheduling of frequency and threads. Because direct performance losses are caused by a suboptimal operating frequency in the active cores, the compute-bound crafty shows a significant performance loss. The memory-bound equake actually shows a performance