21.01.2013 Views

Lecture Notes in Computer Science 4917


Turbo-ROB: A Low Cost Checkpoint/Restore Accelerator 269

[Figure 6: bar chart. Y-axis: performance deterioration, 0% to 35%. X-axis: benchmarks 164.gzip through 301.apsi, plus AVG. Series: ROB, TROB_32, TROB_64, TROB_128.]

Fig. 6. Per-benchmark and average performance deterioration relative to PERF with ROB-only recovery and TROB-only recovery as a function of the number of TROB entries for a 128-entry window processor

[Figure 7: bar chart. Y-axis: performance deterioration, 0% to 70%. X-axis: benchmarks 164.gzip through 301.apsi, plus AVG. Series: ROB, TROB_32, TROB_64, TROB_128, TROB_256.]

Fig. 7. Per-benchmark and average performance deterioration relative to PERF with ROB-only recovery and TROB-only recovery as a function of the number of TROB entries for a 256-entry window processor
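The sTROB recovery procedure and the resetting-counter confidence estimator discussed in the text after these figures can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the names `ResettingCounterTable`, `plan_recovery`, and `invalidate_on_full` are hypothetical, instructions are identified by increasing sequence numbers, the table is indexed with low-order PC bits, and a branch counts as high confidence only when its counter is saturated. The text specifies none of these details beyond the table size (1K entries) and counter width (4 bits).

```python
from bisect import bisect_left


class ResettingCounterTable:
    """Table of resetting confidence counters (1K entries, 4 bits by default)."""

    def __init__(self, entries=1024, bits=4, threshold=None):
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.max_count = (1 << bits) - 1
        # Assumption: high confidence only once the counter has saturated.
        self.threshold = self.max_count if threshold is None else threshold
        self.mask = entries - 1
        self.counters = [0] * entries

    def _index(self, pc):
        # Assumption: index with low-order PC bits, skipping 2 alignment bits.
        return (pc >> 2) & self.mask

    def is_low_confidence(self, pc):
        return self.counters[self._index(pc)] < self.threshold

    def update(self, pc, prediction_correct):
        i = self._index(pc)
        if prediction_correct:
            # Saturating increment on a correct prediction.
            self.counters[i] = min(self.counters[i] + 1, self.max_count)
        else:
            # A misprediction resets the counter to zero.
            self.counters[i] = 0


def plan_recovery(branch_seq, repair_points):
    """Return the recovery steps for a mispredicted branch.

    repair_points is the ordered (ascending) list of sequence numbers of
    the valid repair points, as would be maintained at decode.
    """
    i = bisect_left(repair_points, branch_seq)
    if i < len(repair_points) and repair_points[i] == branch_seq:
        # The branch itself has a repair point: sTROB-only recovery.
        return [("sTROB", branch_seq)]
    if i < len(repair_points):
        # Recover first at the nearest subsequent repair point via the
        # sTROB, then walk the ROB back to the branch (possibly empty walk).
        return [("sTROB", repair_points[i]), ("ROB", branch_seq)]
    # No subsequent valid repair point: fall back to ROB-only recovery.
    return [("ROB", branch_seq)]


def invalidate_on_full(repair_points):
    """Model a full sTROB: the current repair point and all preceding
    ones become unusable, so the list of valid repair points empties."""
    repair_points.clear()
```

For example, with valid repair points at sequence numbers 10 and 20, a mispredicted branch at 12 would recover at repair point 20 via the sTROB and then let the ROB undo the instructions between 20 and 12. A branch at 25 has no subsequent repair point and falls back to the ROB alone.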

branch with a repair point proceeds via the TROB only; otherwise, it is necessary to use both the ROB and the sTROB. In the latter case, recovery first proceeds via the sTROB at the nearest subsequent repair point. Locating this repair point, if it exists, takes a single cycle, provided that we keep a small ordered list of all repair points at decode. Recovery then completes via the ROB for the remaining instructions, if any. Whenever the TROB is full, decode is not stalled, but the current repair point, if any, is marked as invalid. This repair point and any preceding ones can no longer be used for recovery. Figure 8 shows the per-benchmark and average performance deterioration for sTROB recovery as a function of the number of sTROB entries. We use a 1K-entry table of 4-bit resetting counters to identify low-confidence branches [8]. Very similar results were obtained with the zero-cost anyweak estimator [11]. Average performance improves even with a 32-entry sTROB. These results demonstrate that the TROB can be used as
