Lecture Notes in Computer Science 4917
Integrated CPU Cache Power Management in Multiple Clock Domain Processors
energy-delay improvements. However, for embedded systems, a two-domain processor
is a more appropriate design choice than a processor with a larger
number of domains because of its simplicity. Figure 6 shows that increasing the number
of domains had little impact, positive or negative, on the difference in energy-delay
product between our policy and the independent policy. This indicates that the core-L2
cache interaction is the most critical in terms of its effect on energy and delay, which
yielded higher savings in the two-domain case. We can conclude that a small number
of domains is the most appropriate for embedded processors, not only from a design
perspective but also for improving energy-delay.
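The energy-delay product used in this comparison is energy multiplied by execution time, so the improvement of one policy over another can be expressed as a fractional reduction. The following is a minimal sketch of that arithmetic; the function names and the sample numbers are illustrative assumptions, not measurements from this paper.

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product (EDP): energy consumed times execution time."""
    return energy_joules * delay_seconds


def edp_improvement(baseline_edp: float, policy_edp: float) -> float:
    """Fractional EDP reduction relative to a baseline (positive is better)."""
    if baseline_edp <= 0:
        raise ValueError("baseline EDP must be positive")
    return (baseline_edp - policy_edp) / baseline_edp


# Hypothetical numbers for illustration only (not results from the paper).
baseline = energy_delay_product(energy_joules=2.0, delay_seconds=1.0)
policy = energy_delay_product(energy_joules=1.5, delay_seconds=1.1)
improvement = edp_improvement(baseline, policy)
print(f"EDP improvement: {improvement:.1%}")
```

Because the metric is a product, a policy that saves energy but lengthens execution time improves the energy-delay product only if the energy saving outweighs the slowdown.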
6 Related Work<br />
MCD design has the advantages of alleviating some clock synchronization bottlenecks
and reducing the power consumed by the global clock network. Semeraro et al. explored
the benefit of voltage scaling in MCD versus globally synchronous designs [3]. They
find a potential 20% average improvement in the energy-delay product. Similarly, Iyer
et al. analyzed the power and performance benefit of MCD with DVS [4]. They find
that DVS provides up to 20% power savings over an MCD core with a single voltage.
In industrial semiconductor manufacturing, National Semiconductor, in collaboration
with ARM, developed the PowerWise technology, which uses Adaptive Voltage Scaling
and threshold scaling to automatically control the voltage of multiple domains on
chip [1]. The PowerWise technology can support up to 4 voltage domains [12]. Their
current technology also provides a power management interface for dual-core processors.
Another technique, by Magklis et al., is a profile-based approach that identifies program
regions that justify reconfiguration [5]. This approach involves the extra overhead of
profiling and analysis phases for each application. Zhu et al. presented architectural
optimizations for improving power and reducing complexity [9]. However, these policies
do not take into account the cascading effect of changing one domain's voltage on the
other domains.
Rusu et al. proposed a DVS policy that controls each domain's frequency using a machine
learning approach [13][14]. They characterize applications using performance
counter values such as cycles per instruction and the number of L2 accesses per instruction.
In a training phase, the policy searches for the best frequency for each application
phase. At runtime, based on the values of the monitored performance counters, the
policy sets the frequency of all domains according to the offline analysis. The paper
shows an improvement in energy-delay product close to that of a near-optimal scheme. However,
the technique requires an extra offline training step to characterize applications and
find the best frequencies for each domain.
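The two-step structure described above (offline training, then a runtime decision driven by performance counters) can be sketched as a table lookup. This is only an illustration under stated assumptions: the learning method of [13][14] is not detailed here, so a nearest-neighbor match is swapped in as a stand-in, and the table contents, domain names, frequencies, and counter scaling are all hypothetical.

```python
from math import hypot

# Hypothetical output of an offline training phase: each entry maps a phase
# signature (CPI, L2 accesses per instruction) to the frequency in MHz chosen
# for each domain. All values are made up for illustration.
TRAINED_TABLE = [
    ((0.8, 0.001), {"core": 1000, "l2": 500}),
    ((1.5, 0.010), {"core": 800, "l2": 800}),
    ((3.0, 0.050), {"core": 500, "l2": 1000}),
]

# Assumed scaling so that both counters contribute on a comparable range.
L2_SCALE = 100.0


def select_frequencies(cpi, l2_per_instr, table=TRAINED_TABLE):
    """Return the per-domain frequencies of the closest trained signature."""
    def distance(entry):
        (trained_cpi, trained_l2), _ = entry
        return hypot(cpi - trained_cpi, L2_SCALE * (l2_per_instr - trained_l2))

    _, frequencies = min(table, key=distance)
    return dict(frequencies)  # copy so callers cannot mutate the table


# Runtime use: counters sampled during the last interval pick the next setting.
next_setting = select_frequencies(cpi=2.8, l2_per_instr=0.045)
```

The sketch makes the cost noted in the text visible: the table must be built offline before the runtime policy can make any decision.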
Wu et al. present a formal solution by modeling each domain as a queuing system [6].
However, they study each domain in isolation, and incorporating domain interactions
increases the complexity of the queuing model. Varying the DVS power management
interval is another way to save energy. In [15], Wu et al. adaptively vary the control
interval to react to workload changes in each domain. However, they do not
take into account the effect that a voltage change in one domain induces on the other
domains.