White Paper: Wireless Intel SpeedStep® Power Manager

Figure 5. Dhrystone/sec vs. Core/PX/SDCLK
(Bar chart; higher is better. Chart annotations: "Processing Bound", "Frequency scaling despite configured core speed". Data labels:)

Core (MHz)        208     208     208     208     520     520     520     520
PX (MHz)          104     208     104     208     208     104     208     104
SDCLK (MHz)        52      52     104     208      52      52     208     104
Dhrystones/sec 269778  269778  269778  269778  674442  674443  674443  674444

Figure 6. MB/sec vs. Core/PX/SDCLK
(Bar chart. Data labels:)

Core (MHz)        208     520     208     520     520     208     208     520
PX (MHz)          104     104     104     104     208     208     208     208
SDCLK (MHz)        52      52     104     104      52      52     104     104
MB/sec           68.9    69.1    79.9    80.6    80.9    81.4   116.1   126.3

5.0 Workload Characterization for Intel DFM and DVM
Most software applications/workloads can be generalized into four categories:
■ CPU (compute) bound applications
■ Memory bound applications
■ I/O bound applications
■ CPU and Memory bound applications

5.1 CPU Bound Applications
An application is generally considered CPU-bound when most of its execution time is spent on computation, using the data and instructions loaded in the D-cache and I-cache. Scheduling instructions to suit the underlying processor architecture (reducing stalls) can potentially increase the performance of CPU-bound applications. These applications tend to keep the CPU busy all of the time, so the processor's idle time is negligible. An increase in processor frequency (and in turn voltage) helps to increase the performance of these applications. As an example, Dhrystone is a purely CPU-bound workload and, as shown in Figure 5, its performance is a linear function of CPU (core) frequency. In order to meet performance requirements, it is essential to have these applications run at the maximum possible frequency.
This information is very critical for a performance-optimizing policy manager.

5.2 Memory Bound Applications
Some applications that work on large data blocks (greater than the cache size) usually have to access data outside of the caches, and become bound by the memory or the system bus speed. These applications, such as a memory copy, move large blocks of data and tend to generate significant memory traffic, with most CPU cycles lost waiting for data. In such cases, performance does not improve (see Figure 6) even if the core's speed is increased, since the performance is a function of the memory speed. This information is very critical for a performance-optimizing policy manager.
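The contrast between Figures 5 and 6 can be made concrete with a short calculation. The sketch below is illustrative only: the data values are the labels printed in the two figures, the pairing of each value with its Core/PX/SDCLK setting is read from the flattened figure text and should be treated as an assumption, and the function name `scaling_efficiency` and the 0.5 threshold in `classify` are hypothetical choices, not part of the Power Manager software.

```python
# Data labels from Figures 5 and 6, keyed by (core MHz, PX MHz, SDCLK MHz).
# The column-to-setting pairing is an assumption read from the figure text.
DHRYSTONES_PER_SEC = {
    (208, 104, 52): 269778, (520, 104, 52): 674443,
    (208, 208, 52): 269778, (520, 208, 52): 674442,
    (208, 104, 104): 269778, (520, 104, 104): 674444,
    (208, 208, 208): 269778, (520, 208, 208): 674443,
}
MEMCOPY_MB_PER_SEC = {
    (208, 104, 52): 68.9, (520, 104, 52): 69.1,
    (208, 104, 104): 79.9, (520, 104, 104): 80.6,
    (208, 208, 52): 81.4, (520, 208, 52): 80.9,
    (208, 208, 104): 116.1, (520, 208, 104): 126.3,
}


def scaling_efficiency(results, bus, low=208, high=520):
    """Fraction of the core-clock increase that appears as speedup.

    1.0 means performance tracks core frequency linearly (CPU bound);
    values near 0.0 mean the core clock barely matters (memory bound).
    """
    px, sdclk = bus
    perf_gain = results[(high, px, sdclk)] / results[(low, px, sdclk)] - 1.0
    freq_gain = high / low - 1.0
    return perf_gain / freq_gain


def classify(results, bus, threshold=0.5):
    """Hypothetical rule: the threshold is an illustrative assumption."""
    if scaling_efficiency(results, bus) >= threshold:
        return "cpu-bound"
    return "memory-bound"


if __name__ == "__main__":
    for bus in [(104, 52), (104, 104), (208, 52)]:
        print(bus,
              round(scaling_efficiency(DHRYSTONES_PER_SEC, bus), 3),
              round(scaling_efficiency(MEMCOPY_MB_PER_SEC, bus), 3))
```

Under these assumptions, Dhrystone recovers essentially all of the 208 to 520 core-clock increase at every bus setting, while the memory copy recovers only a small fraction of it, which is the distinction a policy manager would need before choosing a core frequency.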
Figure 7. WMV FPS vs. Core/PX/SDCLK
(Bar chart; higher is better. Annotations across the bars, left to right: MEM Bound, CPU Bound, MEM Bound, CPU Bound, MEM Bound. Callout: "Knowledge relevant to Power Management from system or application perspective". Data labels:)

Core (MHz)     208    208    520    208    520    520
PX (MHz)       104    104    104    208    104    208
SDCLK (MHz)     52    104     52    104    104    104
WMV FPS        8.7   10.3   14.1   15.1   18.1   21.5

5.3 I/O Bound Applications
Applications that are waiting on some I/O (peripheral) device for data are considered I/O bound. An example would be an Ethernet driver waiting for data from the network.

5.4 CPU and Memory Bound Applications
Many applications have performance demands that vary over time. At a given instant they could be either CPU (compute) bound or memory bound. Multimedia applications that undertake a large amount of computation as well as work on large data blocks fall in this category. The characteristics of these multimedia applications show that performance is bounded by both memory and CPU speed. For these types of workloads, accurate prediction or estimation of the characteristics yields a better power policy. For example, a video player is a CPU and memory bound type of application. Its performance is plotted as a function of the core and memory frequency in Figure 7.

6.0 Idle Profiler
The Idle Profiler provides CPU usage and operating system idle information to the Power Manager software. Figure 8 shows the operating system's idle thread providing input to the Idle Profiler. Since the idle thread is executed when the OS is not busy (not executing any code), it is one of the preferred choices to provide CPU usage information. However, since the idle thread only executes when there are no tasks ready to run, the information is only provided when the CPU is used less than 100 percent of the time.
In cases where CPU usage is less than 100 percent of the time, but still very high, the ISR can be used to provide CPU usage information to the Idle Profiler.

7.0 Performance Profiler
System workload does not remain static over time, so dynamic workload characterization is essential to optimizing system performance at minimum power dissipation. The Performance Profiler monitors system information and maintains a system state. At any given time, the Policy Manager can direct the Performance Profiler to return the current system state. Since the Performance Profiler is event driven, it can also automatically alert the Policy Manager when the system state changes.

In order to achieve dynamic scaling for power and performance based on dynamic characterization of CPU bound, memory bound, or CPU and memory bound workloads, the Performance Profiler monitors the Intel PXA27x processor's Performance Monitoring Unit (PMU), as shown in Figure 9.

Figure 8. Idle Profiler, Power Manager Software
(Block diagram. Labeled blocks: OS Idle Thread, ISR, CPU Utilization Logic, Prediction Logic, State Determination, PM Interface, Power Manager, Idle Profiler.)
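The three Idle Profiler blocks named in Figure 8 (CPU utilization logic, prediction logic, state determination) can be illustrated with a minimal sketch. The white paper does not describe how these blocks are implemented, so everything here is an assumption made for illustration: the class name `IdleProfilerSketch`, the fixed sampling window, the use of an exponentially weighted moving average as the prediction step, and the state thresholds.

```python
from dataclasses import dataclass


@dataclass
class IdleProfilerSketch:
    """Hypothetical idle profiler: utilization -> prediction -> state."""

    window_ms: float = 100.0     # assumed sampling window length
    alpha: float = 0.5           # assumed smoothing weight for the prediction
    high_threshold: float = 0.8  # assumed boundary for the "high" state
    low_threshold: float = 0.3   # assumed boundary for the "low" state
    predicted: float = 0.0       # predicted utilization for the next window

    def utilization(self, idle_ms: float) -> float:
        """CPU utilization logic: busy fraction of one sampling window.

        idle_ms is the time the OS idle thread ran during the window,
        clamped to the window length.
        """
        idle_ms = min(max(idle_ms, 0.0), self.window_ms)
        return 1.0 - idle_ms / self.window_ms

    def update(self, idle_ms: float) -> str:
        """Prediction logic: blend the new sample into a moving average."""
        sample = self.utilization(idle_ms)
        self.predicted = self.alpha * sample + (1.0 - self.alpha) * self.predicted
        return self.state()

    def state(self) -> str:
        """State determination: map the prediction to a coarse load state."""
        if self.predicted >= self.high_threshold:
            return "high"
        if self.predicted <= self.low_threshold:
            return "low"
        return "medium"


if __name__ == "__main__":
    profiler = IdleProfilerSketch()
    for idle in (90.0, 40.0, 5.0, 0.0, 0.0):
        print(idle, profiler.update(idle), round(profiler.predicted, 3))
```

In this sketch the smoothing step keeps a single busy window from immediately triggering a state change; a real policy would tune the window, the weight, and the thresholds to the platform, and would report the resulting state through the PM Interface.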