Building Blocks for 64-bit AMD Opteron Clusters - Linux Clusters ...
Building Blocks for 64-bit AMD Opteron Clusters - Linux Clusters ...
Building Blocks for 64-bit AMD Opteron Clusters - Linux Clusters ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>AMD</strong> <strong>Opteron</strong> Processor Core Facts<br />
16 instruction bytes fetched per cycle<br />
L2<br />
Cache<br />
L1<br />
Instruction<br />
Cache<br />
<strong>64</strong>KB<br />
Fetch<br />
Fastpath<br />
Scan/Align<br />
Branch<br />
Prediction<br />
Microcode Engine<br />
System<br />
Request<br />
Queue<br />
Bus Unit<br />
L1<br />
Data<br />
Cache<br />
<strong>64</strong>KB<br />
µOPs<br />
Instruction Control Unit (72 entries)<br />
Int Decode & Rename FP Decode & Rename<br />
Crossbar<br />
Res Res Res<br />
36-entry FP scheduler<br />
Memory<br />
Controller<br />
HyperTransport TM<br />
44-entry<br />
Load/Store<br />
Queue<br />
AGU<br />
ALU<br />
MULT<br />
AGU AGU FADD FMUL FMISC<br />
ALU ALU<br />
9-way Out-Of-Order execution<br />
• 12-stage Int, 17-stage fast-path pipelines<br />
• Enhanced TLB structures w/flush filter<br />
• Opts <strong>for</strong> off-loading writes, probes, memory<br />
• 16 SSE & SSE2 128-<strong>bit</strong> xmm registers<br />
• 8 legacy x87 80-<strong>bit</strong> registers<br />
FPU Throughput<br />
• 36 entry FPU instruction scheduler<br />
• <strong>64</strong>-<strong>bit</strong>/80-<strong>bit</strong> FP Realized thru-put (1<br />
Mul + 1 Add)/cycle: 1.9 FLOPs/cycle<br />
• 32-<strong>bit</strong> FP Realized thru-put (2 Mul +<br />
2 Add)/cycle: 3.4+ FLOPs/cycle<br />
55