3D Accelerators For Visualization - Prace Training Portal

training.prace.ri.eu


A Balanced Platform Approach to Heterogeneous Computing

James Hrica, Product Manager Stream Computing Hardware

December 16, 2009


AMD Balanced Platform Advantage

Delivers optimal performance for a wide range of platform configurations

Multi-core CPU is excellent for running some algorithms
• Ideal for serial, task-parallel and irregular data-parallel workloads
• Manages platform I/O
[Diagram label: Serial/Task-Parallel Workloads]

GPU is ideal for data parallel algorithms
• Image processing, e.g. CT reconstruction
• Dense matrix algebra
• Monte Carlo methods
• etc.
[Diagram labels: Graphics Workloads; Other Highly Parallel Workloads]


ATI Stream Technology is…

Heterogeneous: Developers leverage AMD GPUs and CPUs for optimal application performance and user experience

High performance: Massively parallel, programmable GPU architecture delivers unprecedented performance and power efficiency

Industry Standards: OpenCL and DirectCompute 11 enable cross-platform development

Gaming

Sciences Government

Digital Content Creation

Engineering

Productivity


AMD’s HPC Product Portfolio

Energy efficient CPU and discrete GPU processors focused on addressing the most demanding HPC workloads

Multi-core x86 Processors
• Outstanding Performance
• Superior Scalability
• Enhanced Power Efficiency

Professional Graphics: 3D Accelerators For Visualization
• See More and Do More with Your Data

ATI Stream Computing
• GPU Optimized For Computation
• Massive Data-parallel Processing
• High Performance Per Watt




A Look At The Current Top 5 Of Top500 (Nov 2009)

Top 5 Supercomputer Sites – Nov 2009

Rank  Vendor  Rmax (Tflops)  Computer    Processors
1     Cray    1759           Jaguar      AMD Opteron™
2     IBM     1042           Roadrunner  IBM PowerXCell and AMD Opteron™
3     Cray    832            Kraken      AMD Opteron™
4     IBM     826            JUGENE      IBM PowerPC 450
5     NUDT    563            Tianhe-1    Intel Xeon and ATI RV770 GPUs

The results above are the five highest Rmax results published on www.top500.org as of November 16, 2009. For the latest results, visit http://www.top500.org/lists/2009/11.


The AMD Opteron 6100 Processor

• Target: Enterprise Class 2-way and 4-way Servers
– Twelve-core and eight-core, 12 MB L3 cache
– CoolCore Technology, Enhanced AMD PowerNow! Technology, Enhanced C1 state, AMD CoolSpeed Technology, APML
– Quad-channel LV and U/R DDR3, ECC, on-line spare
– Up to 3 DIMMs/channel, 12 per CPU

• Single Series for performance DP and MP platforms
– 2P economics for 4P servers
– Compelling price/performance for volume market

• G34 Socket Infrastructure
– Performance-optimized power/thermals
– Quad 16-bit HT3 links, up to 6.4 GT/s per link
– AMD SR56x0 chipset with AMD-Vi and PCIe Gen2

[Diagram: 4P Socket G34 Server Platform with 12/8-core processor support. Labels: SR56x0 (AMD-Vi), SP5100 South Bridge, non-coherent HT3, x4 A-Link, coherent HT3]
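The memory and link figures on this slide can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are made up for this example, and it assumes a 16-bit HyperTransport link carries 2 bytes per transfer in each direction.

```python
def dimms_per_cpu(channels: int, dimms_per_channel: int) -> int:
    """Maximum DIMM count for one socket."""
    return channels * dimms_per_channel


def ht_link_gb_per_s(width_bits: int, gigatransfers_per_s: float) -> float:
    """Raw one-direction bandwidth of a link: bytes per transfer times transfer rate."""
    return width_bits / 8 * gigatransfers_per_s


# Quad-channel memory with up to 3 DIMMs per channel, as listed on the slide.
print(dimms_per_cpu(4, 3))

# One 16-bit HT3 link at 6.4 GT/s, raw GB/s per direction.
print(ht_link_gb_per_s(16, 6.4))
```

The first result matches the slide's "12 per CPU"; the second is a raw per-direction figure and ignores protocol overhead.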


Enhanced Integrated Memory Architecture
Benefits HPC, Virtualization and Database

[Chart, two panels: Memory Bandwidth (GB/s in STREAM benchmark), axis 0 to 100. Panel captions: "Greater peak performance…" and "…and consistency across power bands"]

Systems compared:
• 4/8 Socket: 4P AMD Opteron 61XX, 8P AMD Opteron 84XX, 4P AMD Opteron 84XX, 4P Intel Xeon 74XX
• 2 Socket: 2P AMD Opteron 61XX, 2P Intel Xeon 55XX, 2P AMD Opteron 24XX

1) Based on measurements in AMD Performance Labs as of October 6, 2009. Please see backup slides for configuration information.


AMD 2010-2011 Sweet Spot Server Strategy

Market segments:
• 4P/8P Platforms: ~5% of Market*
• 2P Platforms: ~75% of Market*
• 1P Platforms: ~20% of Market*

Performance-per-watt and Expandability: AMD Opteron 6000 Series Platform (“Maranello”)
• 2/4 socket; 4 memory channels
• Highly scalable without compromising value
• 2010: “Magny-Cours”, 8 and 12 cores
• 2011: “Interlagos”, 12 and 16 cores (Bulldozer core)

Highly Energy Efficient and Cost Optimized: AMD Opteron 4000 Series Platform (“San Marino” and “Adelaide”)
• 1/2 socket; 2 memory channels
• New levels of value and power efficiency
• 2010: “Lisbon”, 4 and 6 cores
• 2011: “Valencia”, 6 and 8 cores (Bulldozer core)

Platform Consistency and Commonality

*AMD internal estimates of total server market as of Q309


Designed for Scalability and Performance

• “Bulldozer” module: Two cores in a single unit that enables two simultaneous threads, the building blocks of a “Bulldozer” die
• Parallel Threads: The ability to execute two threads on two discrete, unshared cores without compromising or creating bottlenecks
• Flex FP: A flexible floating point unit that can be dedicated OR shared between the two cores per cycle
• Dedicated Scheduler: Independent integer schedulers and an FP scheduler improve scalability by efficient execution


Unprecedented Server CPU Performance Gains

2010 is projected to be the beginning of unprecedented leaps in server performance-per-watt for AMD

[Chart: processors shown, in order: AMD Opteron 244, AMD Opteron 250, AMD Opteron x75 (dual-core), AMD Opteron 285 (dual-core), AMD Opteron x356, AMD Opteron x384, AMD Opteron 2435, “Magny-Cours”*, “Interlagos”*]

* “Magny-Cours” and “Interlagos” data is based on AMD projections




NUDT’s Tianhe-1 (5th in TOP500 list)

• 1.206 PFLOPS peak, 563.1 TFLOPS LINPACK
• 6,144 x86 CPUs, 5,120 ATI RV770 GPUs
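The two performance figures above imply a LINPACK efficiency (measured Rmax divided by theoretical peak). A minimal sketch of that calculation follows; the function name is hypothetical, and the inputs are the slide's own numbers.

```python
def linpack_efficiency(rmax_tflops: float, rpeak_tflops: float) -> float:
    """Fraction of theoretical peak achieved on the LINPACK benchmark."""
    return rmax_tflops / rpeak_tflops


# Tianhe-1: 563.1 TFLOPS LINPACK against 1.206 PFLOPS (1206 TFLOPS) peak.
efficiency = linpack_efficiency(563.1, 1206.0)
print(f"LINPACK efficiency: {efficiency:.1%}")
```

Under these inputs the system sustains a little under half of its peak rate on LINPACK.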


GPGPU Parallel Processing Power and Programmability

[Chart: peak GigaFLOPS* (axis 0 to 3000) by GPU generation; time axis ticks Sep-05, Mar-06, Oct-06, Apr-07, Nov-07, Jun-08, Dec-08]

GPU generations shown:
• R520: ATI Radeon X1800; ATI FireGL V7200, V7300, V7350
• R580(+): ATI Radeon X19xx; ATI FireStream
• R600: ATI Radeon HD 2900; ATI FireGL V7600, V8600, V8650
• RV670: ATI Radeon HD 3800; ATI FireGL V7700; AMD FireStream 9170
• RV770: ATI Radeon HD 4800; ATI FirePro V8700; AMD FireStream 9250, 9270
• Cypress: ATI Radeon HD 5870

Milestone annotations: GPGPU via CTM; Unified Shaders; Double-precision floating point; Stream SDK (CAL+IL/Brook+); 2.5x ALU increase; OpenCL 1.0; DirectX 11; 2.25x Perf.

* Peak single-precision performance. For RV670, RV770 & Cypress divide by 5 for peak double-precision performance


GPU Computational Efficiency Progression

[Chart: GFLOPS/W and GFLOPS/mm² across GPU generations. Data labels as extracted, in order: 1.07, 0.42; 2.01, 1.06; 2.21, 0.92; 4.50, 2.24; 7.50, 14.47 (GFLOPS/W); 4.56, 7.90 (GFLOPS/mm²)]


ATI Radeon HD 5870

The World’s Most Powerful and Advanced GPU

• Accelerating PCs with nearly 3 teraFLOPS of compute power
• Ultimate immersion with DirectX® 11 and ATI Eyefinity Technology

Compute Power          2.72 TFLOPS
Core Clock Speed       850 MHz
Processing Elements    1600
Frame Buffer           1GB, 4.8Gbps
Max/Idle Board Power   188W/27W
Max Res.               3x 2560x1600

Shipping Now!
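The headline compute figure can be reproduced from the other rows of the specification. The sketch below assumes each processing element retires one multiply-add per clock, counted as 2 floating-point operations; that counting convention is an assumption of this example, not something the slide states. The divide-by-5 double-precision rule is taken from the footnote on the deck's GPGPU chart.

```python
def peak_gflops(processing_elements: int, clock_mhz: float,
                flops_per_cycle: int = 2) -> float:
    """Peak single-precision GFLOPS, assuming one multiply-add (2 flops) per element per clock."""
    return processing_elements * clock_mhz * flops_per_cycle / 1000.0


sp_gflops = peak_gflops(1600, 850)      # 1600 processing elements at 850 MHz
dp_gflops = sp_gflops / 5               # deck footnote: divide by 5 for double precision
gflops_per_watt = sp_gflops / 188       # 188 W maximum board power

print(f"single precision: {sp_gflops:.0f} GFLOPS")
print(f"double precision: {dp_gflops:.0f} GFLOPS")
print(f"efficiency:       {gflops_per_watt:.2f} GFLOPS/W")
```

With these inputs the single-precision result agrees with the 2.72 TFLOPS quoted above, and the double-precision result agrees with the 544 GFLOPS quoted for "Cypress" elsewhere in the deck.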


ATI 5800 Series “Cypress” GPU Architecture

2.72 Teraflops Single Precision
544 Gigaflops Double Precision

• Compute Features:
– DirectCompute 11 and OpenCL 1.0
– IEEE754-2008 Compliance Enhancements
– 32-bit Atomic Operations
– 32KB Local Data Shares
– 64KB Global Data Share
– Global synchronization
– Append/consume buffers

Consumer Version Shipping Now!


SIMD Engine

Each SIMD:
– Includes 16 VLIW Thread Processing Units, each with 5 scalar stream processing units + 32KB Local Data Share
– Has its own control logic and runs from a shared set of threads
– Has dedicated fetch unit w/ 8KB L1 cache
– Communicates with other SIMD cores via 64KB global data share
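The per-SIMD figures above can be combined with the 1600 processing elements quoted for the Radeon HD 5870 to estimate how many SIMD engines the chip contains. This is a back-of-the-envelope sketch: it assumes all 1600 elements are organised exactly as this slide describes, and the variable names are illustrative.

```python
THREAD_PROCESSORS_PER_SIMD = 16     # VLIW thread processing units per SIMD
STREAM_UNITS_PER_THREAD_PROC = 5    # scalar stream processing units per VLIW unit
LDS_KB_PER_SIMD = 32                # Local Data Share per SIMD
TOTAL_PROCESSING_ELEMENTS = 1600    # from the Radeon HD 5870 specification slide

units_per_simd = THREAD_PROCESSORS_PER_SIMD * STREAM_UNITS_PER_THREAD_PROC
simd_count = TOTAL_PROCESSING_ELEMENTS // units_per_simd
total_lds_kb = simd_count * LDS_KB_PER_SIMD

print(f"stream processing units per SIMD: {units_per_simd}")
print(f"implied SIMD engines:             {simd_count}")
print(f"aggregate Local Data Share:       {total_lds_kb} KB")
```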



Thread Processors

• Co-issue MUL & dependent ADD in a single clock
• Sum of Absolute Differences (SAD)
– 12x speed-up with native instruction
– Used for video encoding, computer vision
– Exposed via OpenCL extension
• Bit-level ops (DirectX 11)
– Bit count, insert, extract, etc.
• Fused Multiply-Add
• IEEE754-2008 FP compliance
– All rounding modes
– FMA
– Denorms
– Flags
• Full hardware barrier support

• Each Thread Processor includes:
– 4 Stream Cores + 1 Special Function Stream Core
– Branch Unit
– General Purpose Registers
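For readers unfamiliar with the Sum of Absolute Differences operation named on this slide, here is a plain reference version together with the kind of block-matching search that video encoders build on it. It is a pure-Python sketch for illustration only: it does not use the native GPU instruction or the OpenCL extension, and the function names and sample data are hypothetical.

```python
from typing import Sequence


def sad(block_a: Sequence[int], block_b: Sequence[int]) -> int:
    """Sum of absolute differences between two equally sized pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of pixels")
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def best_match(reference: Sequence[int],
               candidates: Sequence[Sequence[int]]) -> int:
    """Index of the candidate block with the lowest SAD against the reference."""
    return min(range(len(candidates)), key=lambda i: sad(reference, candidates[i]))


reference = [10, 20, 30, 40]
candidates = [
    [0, 0, 0, 0],        # far from the reference
    [12, 18, 33, 40],    # close
    [10, 20, 30, 41],    # closest
]
print([sad(reference, c) for c in candidates])
print(best_match(reference, candidates))
```

A motion-estimation search evaluates this cost for many candidate blocks per frame, which is why a native instruction for it pays off.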


Cross-Platform Programming with Standard API

OpenCL: Open and Custom Tools

[Diagram, top to bottom]
• Tools layer: High Level Language Compilers; High Level Tools; Application Specific Libraries
• Industry Standard Interface: OpenCL
• Targets: AMD GPUs; AMD CPUs; Other CPUs; Other GPUs; Accelerators

• Cross-platform development
• Interoperability with OpenGL
• CPU/GPU backends enable balanced platform approach


Hybrid Parallel Gas Dynamics in OpenCL™

LANL, IBM, AMD, NVIDIA Booths at SC09

Common Source!!

• SC09 demos on:
– x86 CPU: Opteron
– x86 CPU: Xeon
– GPU: NVIDIA
– GPU: AMD
– Power6
– PowerXCell


ATI Stream SDK v2.0 Beta:
OpenCL For Multicore x86 CPUs and ATI GPUs

The Power of Fusion: Developers leverage heterogeneous architecture to deliver superior user experience

• First complete OpenCL development platform
• Certified OpenCL 1.0 compliant by the Khronos Group
• Write code that can scale well on multi-core CPUs and GPUs
• AMD delivers on the promise of OpenCL, with both high-performance CPU and GPU technologies
• Available for download now as part of ATI Stream SDK beta program – includes documentation, samples, and developer support

Beta Program: http://developer.amd.com/streambeta

[Diagram labels: Software Applications; Serial and Task Parallel Workloads; Graphics Workloads; Data Parallel Workloads]

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.




OpenCL View

[Diagram labels: Constants; Workgroups; Image cache; Global memory; Local memory; L2 cache]


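OpenCL™ Memory Space on AMD GPU

OpenCL term → AMD GPU hardware
• Private Memory → Registers/LDS
• Work Item → Thread Processor Unit
• Compute Unit → SIMD
• Local Memory → Local Data Share
• Global / Constant Memory Data Cache → Board Mem/Constant Cache
• Global Memory (Compute Device Memory) → Board Memory

[Diagram: a Compute Device contains Compute Units 1 to N. Each Compute Unit holds Work Items 1 to M, each with its own Private Memory, plus one Local Memory per Compute Unit. The Compute Units share the Global / Constant Memory Data Cache, which sits in front of Global Memory in Compute Device Memory.]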


OpenCL Backend for HMPP (Scheduled to be released end of 2009 by CAPS)

• A compiler integrating OpenCL stream generator
o Build portable CPU and GPU hardware specific computations
• C & Fortran programming directives
o High level programming interface for scientific applications
• Runtime library
o Ease application deployment on multi-GPU systems

[Diagram: C/Fortran HMPP annotated source code → HMPP Compiler (HMPP Preprocessor, HMPP OpenCL Generator). CPU path: CPU Source → C/Fortran Std. Compiler → HMPP Runtime → CPU. GPU path: OpenCL Source → AMD OpenCL Compiler → AMD OpenCL Driver → ATI Stream Hardware.]

www.caps-entreprise.com


The Future…



A New Era of Processor Performance

[Three charts, each with a "we are here" marker on its curve]

Single-Core Era
• Axes: Single-thread Performance vs. Time
• Constrained by: Power, Complexity

Multi-Core Era
• Axes: Throughput Performance vs. Time (# of processors)
• Constrained by: Power, Parallel SW availability, Scalability

Heterogeneous Systems Era
• Axes: Targeted Application Performance vs. Time (Data-parallel exploitation)
• Enabled by: Abundant data parallelism, Power efficient GPUs
• Constrained by: Programming models


A New Era of Processor Performance

[Diagram: Programmability (CPU, Microprocessor Advancement) plotted against Throughput Performance (GPU, GPU Advancement)]
• CPU labels: Single-Core Era; Multi-Core Era; Heterogeneous Systems Era
• GPU labels: Graphics driver-based programs; OpenCL/DX driver-based programs; System-level programmable
• Regions: Homogeneous Computing; Heterogeneous Computing


AMD Fusion APUs Fill the Need

x86 CPU owns the Software World
• Windows, MacOS and Linux franchises
• Thousands of apps
• Established programming and memory model
• Mature tool chain
• Extensive backward compatibility for applications and OSs
• High barrier to entry

GPU Optimized for Modern Workloads
• Enormous parallel computing capacity
• Outstanding performance-per-watt-per-dollar
• Very efficient hardware threading
• SIMD architecture well matched to modern workloads: video, audio, graphics


Heterogeneous Computing:
Next-Generation Software Ecosystem

[Software stack, top to bottom]
• End-user Applications
• High Level Frameworks
• Middleware/Libraries: Video, Imaging, Math/Sciences, Physics
• Tools: HLL compilers, Debuggers, Profilers
• OpenCL & Direct Compute
• Hardware & Drivers: AMD Fusion, Discrete CPUs/GPUs

[Callouts]
• Advanced Optimizations & Load Balancing: Load balance across CPUs and GPUs; leverage AMD Fusion performance advantages
• Increase ease of application development
• Drive new features into industry standards


Today

758 million transistors @45nm
• Multi-tasking
• Most compute tasks

2.15 billion transistors @40nm
• 3D OS
• Multi-panel HD gaming
• Full HD video and audio


Now the AMD Fusion Era of Computing Begins

• ~1 billion transistors @32nm in one design
• APU: Fusion of CPU & GPU compute power within one processor
• Significantly enhances power efficiency
• High-bandwidth I/O


Summary

• Heterogeneous computing has arrived with OpenCL
• Heterogeneous building blocks exist with significant increases on the horizon
• The path to cost-effective heterogeneous computing has begun and AMD is helping to lead the charge
• The future is fusion!


Thank You!



Footnotes

Two-Socket SPECint®_rate2006

158 using 2 x Six-Core AMD Opteron processors (“Istanbul”) Model 2419 EE in

Supermicro A+ Server 1021M-UR+B server, 32GB (8x4GB DDR2-800) memory, 250GB

SATA disk drive, Red Hat Enterprise Linux ® Server 5.2 64-bit

119 using 2 x Quad-Core AMD Opteron processors (“Shanghai”) Model 2377 EE in

Supermicro A+ Server 1021M-UR+B server, 32GB (8x4GB DDR2-800) memory, 300GB

SATA disk drive, SuSE Linux ® Enterprise Server 10 SP1 64-bit

http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090406-06935.html



HT Assist

• HT Assist helps reduce memory latency and increase overall system performance in multi-socket systems
• HT Assist improves HyperTransport technology link efficiency and increases performance by:
– Reducing probe traffic
– Resolving probes more quickly
• Probe “broadcasting” can be eliminated in 8 of 11 typical CPU-to-CPU transactions
• HT Assist can provide a significant benefit in 4-way and greater systems
– 4-way STREAM memory bandwidth performance improves by ~60% (41.5GB/s with HT Assist vs. 25.5GB/s without HT Assist)
• HT Assist reserves a 1MB portion of each CPU’s L3 cache to act as a directory. This directory tracks where that CPU’s cache lines are used elsewhere in the system
– For 2-way systems this reduction in L3 cache size may eliminate any HT Assist benefit, as probe traffic is already significantly less than in 4-way systems
• Each CPU is considered the “host” of the cache information contained in its L3 directory
– For many CPU-to-CPU transactions the host CPU knows exactly which CPU to probe for the information it needs, eliminating the need to “broadcast”
• Reducing broadcasting considerably cuts down the amount of system probe traffic and helps resolve probes more quickly, resulting in reduced memory latency and improved system performance
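The "~60%" bandwidth improvement quoted above can be checked against the two STREAM figures on the slide. The helper below is a hypothetical name used only for this worked example.

```python
def relative_gain(with_feature: float, without_feature: float) -> float:
    """Fractional improvement of one measurement over a baseline."""
    return with_feature / without_feature - 1.0


# 4-way STREAM bandwidth in GB/s, with and without HT Assist.
gain = relative_gain(41.5, 25.5)
print(f"HT Assist bandwidth gain: {gain:.1%}")
```

The computed gain is slightly above the rounded figure on the slide, so "~60%" is a conservative summary of the two measurements.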



Two-Socket SPECint®_rate2006

205 using 2 x Six-Core AMD Opteron processors (“Istanbul”) Model 2435 in Supermicro A+ Server 1021M-UR+B

server, 32GB (8x4GB DDR2-800) memory, 250GB SATA disk drive, SuSE Linux ® Enterprise Server 10 SP2 64-bit

136 using 2 x Quad-Core AMD Opteron processors (“Shanghai”) Model 2384 in Supermicro A+ Server 1021M-

UR+B server, 32GB (8x4GB DDR2-800) memory, 250GB SATA disk drive, SuSE Linux ® Enterprise Server 10 SP2

64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081024-05683.html

217 using 2 x Intel Xeon processors (“Gainestown”) Model E5540 in Supermicro SuperServer 6026T-NTR+ server,

24GB (12x2GB DDR3-1066) memory, 150GB SATA disk drive, SuSE Linux ® Enterprise Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2009q1/cpu2006-20090316-06749.html

131 using 2 x Quad-Core AMD Opteron processors (“Shanghai”) Model 2382 in Dell PowerEdge R805 server, 32GB

(8x4GB DDR2-800) memory, 73GB SAS disk drive, SuSE Linux® Enterprise Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081110-05955.html

51.5 using 2 x Dual-Core AMD Opteron processors Model 2218 in Fujitsu Siemens Computers PRIMERGY RX330

S1 server, 32GB (8x4GB DDR2-667) memory, 36GB SAS disk drive, SuSE Linux® Enterprise Server 10 64-bit

http://www.spec.org/cpu2006/results/res2007q2/cpu2006-20070427-00947.html

139 using 2 x Quad-Core Intel Xeon processors (“Harpertown”) Model E5450 in Dell PowerEdge 2950 III server,

16GB (4x4GB DDR2-667 FB-DIMM) memory, 73GB SAS disk drive, SuSE Linux ® Enterprise Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081027-05790.html
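As a rough way to read the two AMD results at the top of this footnote, the sketch below compares the six-core Model 2435 score with the quad-core Model 2384 score and divides each by its core count. SPECint_rate is a throughput metric, so the per-core division is only a crude illustration; the dictionary layout and names are assumptions of this example.

```python
# (SPECint_rate2006 score, total cores in the two-socket system), from the footnote above.
results = {
    "Opteron 2435 (Istanbul)": (205, 2 * 6),
    "Opteron 2384 (Shanghai)": (136, 2 * 4),
}

istanbul_score, istanbul_cores = results["Opteron 2435 (Istanbul)"]
shanghai_score, shanghai_cores = results["Opteron 2384 (Shanghai)"]

speedup = istanbul_score / shanghai_score
core_ratio = istanbul_cores / shanghai_cores

print(f"throughput ratio: {speedup:.2f}x with {core_ratio:.1f}x the cores")
for name, (score, cores) in results.items():
    print(f"{name}: {score / cores:.2f} per core")
```

On these two results, throughput grows almost exactly in proportion to core count.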



Two-Socket SPECfp®_rate2006

143 using 2 x Six-Core AMD Opteron processors (“Istanbul”) Model 2435 in Supermicro A+ Server 1021M-UR+B

server, 32GB (8x4GB DDR2-800) memory, 250GB SATA disk drive, SuSE Linux ® Enterprise Server 10 SP2 64-bit

118 using 2 x Quad-Core AMD Opteron processors (“Shanghai”) Model 2384 in Supermicro A+ Server 1021M-

UR+B server, 32GB (8x4GB DDR2-800) memory, 250GB SATA disk drive, SuSE Linux ® Enterprise Server 10 SP2

64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081024-05684.html

115 using 2 x Quad-Core AMD Opteron processors (“Shanghai”) Model 2382 in Dell PowerEdge R805 server, 32GB

(8x4GB DDR2-800) memory, 73GB SAS disk drive, SuSE Linux® Enterprise Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081110-05954.html

48.5 using 2 x Dual-Core AMD Opteron processors Model 2218 in Fujitsu Siemens Computers PRIMERGY RX330

S1 server, 32GB (8x4GB DDR2-667) memory, 36GB SAS disk drive, SuSE Linux® Enterprise Server 10 64-bit

http://www.spec.org/cpu2006/results/res2007q2/cpu2006-20070427-00945.html

169 using 2 x Intel Xeon processors (“Gainestown”) Model E5540 in Supermicro SuperServer 6026T-NTR+ server,

24GB (12x2GB DDR3-1066) memory, 150GB SATA disk drive, SuSE Linux ® Enterprise Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2009q1/cpu2006-20090316-06747.html

78.4 using 2 x Quad-Core Intel Xeon processors (“Harpertown”) Model E5450 in Dell PowerEdge 2950 III server,

16GB (4x4GB DDR2-667 FB-DIMM) memory, 73GB SAS disk drive, SuSE Linux ® Enterprise Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081027-05795.html



Two-Socket SPECjbb®2005

462189 score using 2 x Six-Core AMD Opteron processors (“Istanbul”) Model 2435 in Supermicro A+

Server 1021M-UR+B server, 32GB (8x4GB DDR2-800) memory, 300GB SATA disk drive, Microsoft ®

Windows Server ® 2003 Enterprise x64 Edition SP2

352700 score using 2 x Quad-Core AMD Opteron processors (“Shanghai”) Model 2384 in Supermicro A+

Server 1021M-UR+B server, 32GB (8x4GB DDR2-800) memory, 300GB SATA disk drive, Microsoft ®

Windows Server ® 2003 Enterprise x64 Edition SP2

http://www.spec.org/osg/jbb2005/results/res2008q4/jbb2005-20081024-00551.html

468132 score using 2 x Intel Xeon processors (“Gainestown”) Model E5540 in Supermicro X8DTN+

motherboard, 12GB (6x2GB DDR3-1066) memory, 150GB SATA disk drive, Microsoft ® Windows Server ®

2008 R2 Enterprise x64 Edition SP1

362239 using 2 x Quad-Core AMD Opteron processors (“Shanghai”) Model 2382 in Supermicro A+

Server 1021M-UR+B server, 32GB (8x4GB DDR2-800) memory, 300GB SATA disk drive, Microsoft ®

Windows Server ® 2008 Enterprise x64 Edition

66420 using 2 x Dual-Core AMD Opteron processors Model 2218 in Tyan Thunder K9HM (S3992) server,

8GB (16x512MB DDR2-667) memory, 120GB IDE disk drive, Microsoft® Windows Server® 2003

Enterprise Edition SP1 64-bit

http://www.spec.org/osg/jbb2005/results/res2006q3/jbb2005-20060718-00154.html

310028 using 2 x Quad-Core Intel Xeon processors (“Harpertown”) Model E5450 in IBM BladeCenter

HS21XM server, 16GB (8x2GB DDR2-667) memory, 73GB SAS disk drive, Microsoft ® Windows Server ®

2003 R2 Enterprise x64 Edition SP1

http://www.spec.org/osg/jbb2005/results/res2008q2/jbb2005-20080527-00494.html



Two-Socket STREAM

21GB/s using 2 x Six-Core AMD Opteron processors (“Istanbul”) Model 2435 in Supermicro H8DMU+

motherboard, 16GB (8x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit

21GB/s using 2 x Quad-Core AMD Opteron processors (“Shanghai”) Model 8384 in Supermicro H8DMU+

motherboard, 16GB (8x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit

Two-Socket Server Idle Power – Slide 11

83.21W using 2 x Dual-Core AMD Opteron processors Model 2218 in ZT Systems 1224Ri Datacenter

Server, 16GB (4x4GB DDR2-800) memory, 80GB SSD SATA disk drive, Sparkle Power International LTD

SPI4001UG power supply, Microsoft® Windows Server® 2008 SP1 64-bit

92.84W using 2 x Quad-Core AMD Opteron (“Shanghai”) processors Model 2218 in ZT Systems 1224Ri

Datacenter Server, 16GB (4x4GB DDR2-800) memory, 80GB SSD SATA disk drive, Sparkle Power

International LTD SPI4001UG power supply, Microsoft® Windows Server® 2008 SP1 64-bit

91.32W using 2 x Six-Core AMD Opteron (“Istanbul”) processors Model 2218 in ZT Systems 1224Ri

Datacenter Server, 16GB (4x4GB DDR2-800) memory, 80GB SSD SATA disk drive, Sparkle Power

International LTD SPI4001UG power supply, Microsoft® Windows Server® 2008 SP1 64-bit



Four-Socket and Eight-Socket SPECint®_rate2006

402 using 4 x Six-Core AMD Opteron processors (“Istanbul”) Model 8435 in Tyan Thunder n4250QE

server (S4985-SI), 64GB (16x4GB DDR2-800) memory, 250GB SATA disk drive, Red Hat Enterprise

Linux ® Server release 5.3 64-bit

http://www.spec.org/cpu2006/results/res2009q2/cpu2006-20090511-07355.html

249 using 4 x Quad-Core AMD Opteron processors (“Shanghai”) Model 8384 in HP ProLiant DL585 G5

server, 64GB (16x4GB DDR2-800) memory, 146GB SAS disk drive, Red Hat Enterprise Linux ® Server

release 5.2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081027-05740.html

386 using 8 x Quad-Core AMD Opteron processors (“Shanghai”) Model 8384 in Sun Fire X4600 M2

server, 128GB (64x2GB DDR2-667) memory, 2x 72GB SAS disk drive, OpenSolaris 2008.05

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081208-06223.html

294 using 4 x Hex-Core Intel Xeon processors (“Dunnington”) Model X7460 in IBM System x3850 M2

server, 64GB (16x4GB DDR2-667 FB-DIMM) memory, 73GB SAS disk drive, SuSE Linux ® Enterprise

Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20080915-05319.html

253 using 4 x Hex-Core Intel Xeon processors (“Dunnington”) Model E7450 in Intel Server System

S7000FC4UR motherboard, 32GB (16x2GB DDR2-667 FB-DIMM) memory, 73GB SAS disk drive, SuSE

Linux ® Enterprise Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20080915-05347.html



Four-Socket SPECjbb®2005

866261 using 4 x Six-Core AMD Opteron processors (“Istanbul”) Model 8435 in Tyan Transport TX46

server, 64GB (16x4GB DDR2-800) memory, 250GB SATA disk drive, Microsoft ® Windows Server ® 2008

Enterprise x64 Edition

721843 using 4 x Quad-Core AMD Opteron processors (“Shanghai”) Model 8384 in IBM Bladecenter

LS42 server, 64GB (16x4GB DDR2-800) memory, 36GB SAS disk drive, Microsoft ® Windows Server ® 2008

Enterprise x64 Edition

http://www.spec.org/osg/jbb2005/results/res2008q4/jbb2005-20081112-00559.html

633897 using 4 x Hex-Core Intel Xeon processors (“Dunnington”) Model X7460 in Fujitsu PRIMERGY

RX600 S4 server, 64GB (16x4GB DDR2-667 FB-DIMM) memory, 36GB SAS disk drive, Microsoft ®

Windows Server ® 2003 Enterprise x64 Edition

http://www.spec.org/osg/jbb2005/results/res2009q1/jbb2005-20090305-00664.html



Four-Socket SPECfp®_rate2006

276 using 4 x Six-Core AMD Opteron processors (“Istanbul”) Model 8435 in Tyan Transport TX46 server,

64GB (16x4GB DDR2-800) memory, 250GB SATA disk drive, Red Hat Enterprise Linux ® Server release 5.2

64-bit

212 using 4 x Quad-Core AMD Opteron processors (“Shanghai”) Model 8384 in Dell PowerEdge M905

server, 64GB (16x4GB DDR2-800) memory, 36GB SAS + 73GB SAS disk drives, SuSE Linux ® Enterprise

Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20081110-05969.html

156 using 4 x Hex-Core Intel Xeon processors (“Dunnington”) Model X7460 in HP ProLiant DL580 G5

server, 64GB (16x4GB DDR2-667 FB-DIMM) memory, 146GB SAS disk drive, SuSE Linux ® Enterprise

Server 10 SP1 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20080911-05253.html

139 using 4 x Hex-Core Intel Xeon processors (“Dunnington”) Model E7450 in Fujitsu Siemens Computers

PRIMERGY RX600 S4 server, 64GB (16x4GB DDR2-667 FB-DIMM) memory, 36GB SAS disk drive, SuSE

Linux ® Enterprise Server 10 SP2 64-bit

http://www.spec.org/cpu2006/results/res2008q4/cpu2006-20080922-05389.html



Four-Socket STREAM

42GB/s using 4 x Six-Core AMD Opteron processors (“Istanbul”) Model 8435 in Tyan Thunder n4250QE

(S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit

(with HT Assist enabled)

25.5GB/s using 4 x Six-Core AMD Opteron processors (“Istanbul”) Model 8435 in Tyan Thunder

n4250QE (S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux® Enterprise Server

10 SP1 64-bit (with HT Assist disabled)

24GB/s using 4 x Quad-Core AMD Opteron processors (“Shanghai”) Model 8384 in Tyan Thunder

n4250QE (S4985-E) motherboard, 32GB (16x2GB DDR2-800) memory, SuSE Linux® Enterprise Server

10 SP1 64-bit

9GB/s using 4 x Hex-Core Intel Xeon processors (“Dunnington”) Model X7460 in Supermicro X7QC3+

motherboard, 32GB (16x2GB DDR2-667 FB-DIMM) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit

9GB/s using 4 x Hex-Core Intel Xeon processors (“Dunnington”) Model E7450 in Supermicro X7QC3+

motherboard, 32GB (16x2GB DDR2-667 FB-DIMM) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit



HPC Interconnect performance testing

Configurations used to collect performance data:

• ausqc[01-08]: 2 x Quad-Core AMD Opteron processors Model

2384 in Supermicro H8DMU+ motherboard, 16GB (8x2GB

DDR2-800 ), 150GB IDE disk drive, IB DDR [card,switch], SuSE

Linux® Enterprise Server 10 SP1 64-bit

• fiorano[01-08]: 2 x Quad-Core AMD Opteron processors Model

2387 in Toonie 2 , 16GB (8x2GB DDR2-800 ), 150GB IDE disk

drive, IB QDR [card,switch], SuSE Linux® Enterprise Server 10

SP2 64-bit

• Tooniehpc[1,2]: 2 x Quad-Core AMD Opteron processors Model

2387 in Toonie 2 , 16GB (8x2GB DDR2-800 ), 150GB IDE disk

drive, IB [D,QDR] back to back [no IB switch], SuSE Linux®

Enterprise Server 10 SP2 64-bit

• IB [D,Q]DR cards and switches provided by Mellanox.



STREAM (Memory Bandwidth)

2 x Six-Core AMD Opteron processors (“Istanbul”) Model 2435 in Supermicro A+ Server 1021M-UR+B

server, 16GB (8x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP2 64-bit

4 x Six-Core AMD Opteron processors (“Istanbul”) Model 8435 in Tyan Transport TX46 server, 32GB

(16x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP2 64-bit

8 x Six-Core AMD Opteron processors (“Istanbul”) Model 8435, AMD 5690 Chipset reference design

platform, 64GB (32x2GB DDR2-800) memory, SuSE Linux® Enterprise Server 10 SP2 64-bit

2 x AMD Opteron processors (“Magny-Cours ”) Model 61xx, AMD 5690 Chipset reference design

platform, 32GB memory, SuSE Linux® Enterprise Server 10 SP2 64-bit.

4 x AMD Opteron processors (“Magny-Cours”) Model 61xx, AMD 5690 Chipset reference design platform,

64GB memory, SuSE Linux® Enterprise Server 10 SP2 64-bit.

2 x Intel Xeon processors (“Gainestown”) Model X5570 in Supermicro SuperServer 6026T-NTR+ server,

24GB (6x4GB DDR3-1333) memory, SuSE Linux® Enterprise Server 10 SP2 64-bit

2 x Intel Xeon processors (“Gainestown”) Model E5540 in Supermicro SuperServer 6026T-NTR+ server,

24GB (6x4GB DDR3-1066) memory, SuSE Linux® Enterprise Server 10 SP2 64-bit

2 x Intel Xeon processors (“Gainestown”) Model L5520 in Supermicro SuperServer 6026T-NTR+ server,

24GB (6x4GB DDR3-1066) memory, SuSE Linux® Enterprise Server 10 SP2 64-bit

4x Intel Xeon processors (“Dunnington”) Model X7460 in Supermicro X7QC3+ motherboard, 32GB

(16x2GB FBDIMM) memory, SuSE Linux® Enterprise Server 10 SP1 64-bit



Disclaimer & Attribution

DISCLAIMER

The information presented in this document is for informational purposes only and may contain technical

inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons,

including but not limited to product and roadmap changes, component and motherboard version changes,

new model and/or product releases, product differences between differing manufacturers, software changes,

BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or

revise this information. However, AMD reserves the right to revise this information and to make changes

from time to time to the content hereof without obligation of AMD to notify any person of such revisions or

changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND

ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN

THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY

PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT,

SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED

HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2009 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI logo, AMD

CoolCore, AMD PowerNow!, AMD Opteron, AMD Virtualization, AMD-V, Dual Dynamic Power Management,

Catalyst, FireGL, FirePro, FireStream, Radeon, and combinations thereof are trademarks of Advanced Micro

Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Technology consortium.

Microsoft, Windows, and DirectX are registered trademarks of Microsoft Corporation in the United States

and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their

respective owners.

OpenCL is a trademark of Apple Inc. used under License to the Khronos Group Inc.
