
Lecture Notes in Computer Science 4917


Integrated CPU Cache Power Management in Multiple Clock Domain Processors

energy-delay improvements. However, for embedded systems, a two-domain processor is a more appropriate design choice than a processor with a larger number of domains because of its simplicity. Figure 6 shows that increasing the number of domains had little impact, positive or negative, on the difference in energy-delay product between our policy and the independent policy. This indicates that the core-L2 cache interaction is the most critical in terms of its effect on energy and delay, which yielded higher savings in the two-domain case. We conclude that a small number of domains is the most appropriate for embedded processors, not only from a design perspective but also for improving energy-delay.
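The comparison above is stated in terms of the energy-delay product (EDP), the product of the energy consumed and the execution time, where lower is better. As a minimal sketch of how two policies could be compared on that metric, the following Python uses hypothetical function names and made-up numbers that do not come from the paper:

```python
def energy_delay_product(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay product: energy multiplied by execution time (lower is better)."""
    return energy_joules * delay_seconds


def edp_improvement(baseline_edp: float, policy_edp: float) -> float:
    """Fractional EDP reduction of a policy relative to a baseline."""
    return (baseline_edp - policy_edp) / baseline_edp


# Hypothetical measurements, for illustration only (not results from the paper).
independent = energy_delay_product(energy_joules=2.0, delay_seconds=1.0)
integrated = energy_delay_product(energy_joules=1.5, delay_seconds=1.1)

# A policy may run slightly slower and still win on EDP if it saves enough energy.
improvement = edp_improvement(independent, integrated)
print(f"EDP improvement over the independent policy: {improvement:.1%}")
```

In this illustration the second policy is 10% slower but uses 25% less energy, so its EDP is still lower than the baseline's.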

6 Related Work

MCD design has the advantages of alleviating some clock synchronization bottlenecks and reducing the power consumed by the global clock network. Semeraro et al. explored the benefit of voltage scaling in MCD versus globally synchronous designs [3]. They found a potential 20% average improvement in the energy-delay product. Similarly, Iyer et al. analyzed the power and performance benefits of MCD with DVS [4]. They found that DVS provides up to 20% power savings over an MCD core with a single voltage.

In industrial semiconductor manufacturing, National Semiconductor, in collaboration with ARM, developed the PowerWise technology, which uses Adaptive Voltage Scaling and threshold scaling to automatically control the voltage of multiple domains on chip [1]. The PowerWise technology can support up to 4 voltage domains [12]. Their current technology also provides a power management interface for dual-core processors.

Another technique, by Magklis et al., is a profile-based approach that identifies program regions that justify reconfiguration [5]. This approach involves the extra overhead of profiling and analysis phases for each application. Zhu et al. presented architectural optimizations for improving power and reducing complexity [9]. However, these policies do not take into account the cascading effect of changing one domain's voltage on the other domains.

Rusu et al. proposed a DVS policy that controls each domain’s frequency using a machine learning approach [13][14]. They characterize applications using performance counter values such as cycles per instruction and the number of L2 accesses per instruction. In a training phase, the policy searches for the best frequency for each application phase. At runtime, based on the values of the monitored performance counters, the policy sets the frequency for all domains according to the offline analysis. The paper shows an improvement in energy-delay product close to that of a near-optimal scheme. However, the technique requires an extra offline training step to find the best frequencies for each domain and to characterize the application.
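The two-step structure described above, an offline search per application phase followed by a runtime lookup keyed on performance counters, can be sketched roughly as follows. This is not Rusu et al.'s implementation: the binning scheme, the two-domain `(core, L2)` setting, the function names, and the `toy_edp` cost model are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Phase = Tuple[float, float]      # (cycles per instruction, L2 accesses per instruction)
Signature = Tuple[int, int]      # quantized phase, used as the lookup key
Setting = Tuple[float, float]    # hypothetical (core GHz, L2 GHz) pair


def signature(cpi: float, l2_per_instr: float) -> Signature:
    """Quantize raw counter values into a coarse phase signature (assumed bin widths)."""
    return (min(int(cpi / 0.5), 7), min(int(l2_per_instr / 0.01), 7))


def train(phases: Iterable[Phase], candidates: List[Setting],
          measure_edp: Callable[[Phase, Setting], float]) -> Dict[Signature, Setting]:
    """Offline step: for each training phase, keep the candidate with the lowest EDP."""
    table: Dict[Signature, Setting] = {}
    for phase in phases:
        table[signature(*phase)] = min(candidates, key=lambda s: measure_edp(phase, s))
    return table


def select(table: Dict[Signature, Setting], cpi: float, l2_per_instr: float,
           default: Setting) -> Setting:
    """Runtime step: look up the setting for the observed counters, else fall back."""
    return table.get(signature(cpi, l2_per_instr), default)


def toy_edp(phase: Phase, setting: Setting) -> float:
    """Made-up cost model standing in for real measurements (illustration only)."""
    cpi, l2 = phase
    core, cache = setting
    energy = core ** 2 + cache ** 2
    delay = cpi / core + 100 * l2 / cache
    return energy * delay


candidates = [(1.0, 1.0), (1.0, 0.5), (0.5, 1.0)]
table = train([(0.8, 0.001), (2.4, 0.055)], candidates, toy_edp)

compute_bound = select(table, cpi=0.9, l2_per_instr=0.002, default=(1.0, 1.0))
unseen = select(table, cpi=5.0, l2_per_instr=0.2, default=(1.0, 1.0))
```

The sketch also makes the stated drawback visible: `train` must run before deployment, and any phase whose signature was never seen during training falls back to a default setting.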

Wu et al. present a formal solution by modeling each domain as a queuing system [6]. However, they study each domain in isolation, and incorporating domain interactions increases the complexity of the queuing model. Varying the DVS power management interval is another way to save energy. In [15], Wu et al. adaptively vary the control interval to react to changes in workload in each domain. They do not take into account the effect induced by a voltage change in one domain on the other domains.
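To make the idea of a per-domain, queue-driven controller concrete, here is a minimal sketch of a proportional update that raises a domain's frequency when its input queue is fuller than a target and lowers it otherwise. It is not the formal model of [6]; the control law, the gain, and all names and numbers are assumptions chosen for illustration.

```python
def next_frequency(freq_ghz: float, queue_occupancy: float, target_occupancy: float,
                   gain: float, f_min: float, f_max: float) -> float:
    """One control step for a single domain, using only that domain's own queue."""
    error = queue_occupancy - target_occupancy   # positive when the queue is backing up
    proposed = freq_ghz + gain * error           # proportional correction (assumed law)
    return max(f_min, min(f_max, proposed))      # clamp to the supported frequency range


# Hypothetical values: a backed-up queue speeds the domain up,
# an empty queue slows it down until the lower bound is reached.
sped_up = next_frequency(1.0, queue_occupancy=12, target_occupancy=8,
                         gain=0.05, f_min=0.8, f_max=2.0)
slowed_down = next_frequency(1.0, queue_occupancy=0, target_occupancy=8,
                             gain=0.05, f_min=0.8, f_max=2.0)
```

Because each call sees only one domain's queue, a frequency change in one domain is not propagated to the controllers of the other domains, which is the kind of isolation the discussion above points out.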
