01.05.2015 Views

Building Blocks for 64-bit AMD Opteron Clusters - Linux Clusters ...



AMD Opteron Processor Core Facts

[Block diagram: AMD Opteron processor core. 16 instruction bytes fetched per cycle. Labeled blocks: L2 Cache; L1 Instruction Cache (64KB); L1 Data Cache (64KB); Fetch; Branch Prediction; Scan/Align; Fastpath; Microcode Engine; µOPs; Instruction Control Unit (72 entries); Int Decode & Rename; FP Decode & Rename; three Res blocks with AGU and ALU units, plus MULT; 36-entry FP scheduler with FADD, FMUL, FMISC; 44-entry Load/Store Queue; System Request Queue; Crossbar; Memory Controller; HyperTransport(TM); Bus Unit.]

9-way Out-Of-Order execution

• 12-stage Int, 17-stage fast-path pipelines
• Enhanced TLB structures w/flush filter
• Opts for off-loading writes, probes, memory
• 16 SSE & SSE2 128-bit xmm registers
• 8 legacy x87 80-bit registers

FPU Throughput

• 36-entry FPU instruction scheduler
• 64-bit/80-bit FP realized throughput, (1 Mul + 1 Add)/cycle: 1.9 FLOPs/cycle
• 32-bit FP realized throughput, (2 Mul + 2 Add)/cycle: 3.4+ FLOPs/cycle
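The realized figures above can be compared with the per-cycle issue limits given in parentheses, and scaled by a clock rate to estimate sustained GFLOPS per core. The following is a minimal sketch under stated assumptions: the FLOPs/cycle values come from the slide, while the 2.0 GHz clock and the function names are hypothetical and used only for illustration.

```python
# Realized FLOPs/cycle as quoted on the slide ("3.4+" is treated as a lower bound).
REALIZED_FLOPS_PER_CYCLE = {"64/80-bit": 1.9, "32-bit": 3.4}

# Issue limit per cycle, taken from the slide's "(Mul + Add)/cycle" notes.
PEAK_FLOPS_PER_CYCLE = {"64/80-bit": 1 + 1, "32-bit": 2 + 2}


def efficiency(precision):
    """Fraction of the per-cycle issue limit that is actually realized."""
    return REALIZED_FLOPS_PER_CYCLE[precision] / PEAK_FLOPS_PER_CYCLE[precision]


def sustained_gflops(precision, clock_ghz):
    """Realized FLOPs/cycle multiplied by the clock rate in GHz."""
    return REALIZED_FLOPS_PER_CYCLE[precision] * clock_ghz


if __name__ == "__main__":
    clock_ghz = 2.0  # hypothetical clock rate, not stated on the slide
    for precision in REALIZED_FLOPS_PER_CYCLE:
        print(
            f"{precision}: {efficiency(precision):.0%} of issue limit, "
            f"{sustained_gflops(precision, clock_ghz):.1f} GFLOPS at {clock_ghz} GHz"
        )
```

Under these assumptions, double precision reaches about 95% of its 2 FLOPs/cycle limit and single precision at least 85% of its 4 FLOPs/cycle limit.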

