CFD - GPU Technology Conference


Stan Posey
NVIDIA, Santa Clara, CA, USA; sposey@nvidia.com


Agenda: GPU Acceleration for Applied CFD

Overview of GPU Progress for CFD
GPU Acceleration of ANSYS Fluent
GPU Acceleration of OpenFOAM


GPU Progress Summary for GPU-Parallel CFD

GPU progress in CFD research continues to expand
Growth from particle-based CFD and high-order methods
Explicit schemes have generally advanced further than implicit schemes

Strong GPU investments by commercial CFD vendors (ISVs)
Breakthroughs in GPU-parallel linear solvers and preconditioners
GPUs provide 2nd-level parallelism, preserving the costly MPI investment
ISV focus on hybrid parallel CFD that utilizes all CPU cores + GPUs

GPU progress for end-user developed CFD with OpenACC
Most benefit to aerospace companies with legacy Fortran codes

GPUs are behind fast growth in particle-based commercial CFD
New ISV developments in lattice Boltzmann (LBM) and SPH


CFD Software Character and GPU Suitability

Discretization types: structured-grid FV, unstructured FV, unstructured FE

Explicit schemes (usually compressible):
Numerical operations on an I,J,K stencil, no "solver"
Typically flat profiles: GPU strategy of directives (OpenACC)

Implicit schemes (usually incompressible):
Sparse-matrix linear algebra with iterative solvers
Hot spot is ~50% of runtime in a small % of the lines of code: GPU strategy of CUDA and libraries
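As a minimal illustration of the directives strategy for explicit stencil codes (my sketch, not taken from any specific CFD code; the array names, sizes, and coefficient are assumptions), a simple I,J,K stencil update in C can be offloaded by annotating its loop nest with OpenACC:

```c
/* Minimal OpenACC sketch of an explicit 7-point stencil update (illustrative only).
   nx, ny, nz and the arrays are assumed allocated by the caller. */
#define IDX(i, j, k) ((i) + nx * ((j) + ny * (k)))

void stencil_update(int nx, int ny, int nz,
                    const double *restrict u, double *restrict unew, double c)
{
    #pragma acc parallel loop collapse(3) copyin(u[0:nx*ny*nz]) copy(unew[0:nx*ny*nz])
    for (int k = 1; k < nz - 1; ++k)
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i)
                unew[IDX(i, j, k)] = u[IDX(i, j, k)]
                    + c * (u[IDX(i+1, j, k)] + u[IDX(i-1, j, k)]
                         + u[IDX(i, j+1, k)] + u[IDX(i, j-1, k)]
                         + u[IDX(i, j, k+1)] + u[IDX(i, j, k-1)]
                         - 6.0 * u[IDX(i, j, k)]);
}
```

Because such codes have flat profiles, the same pragma treatment is applied loop nest by loop nest rather than rewriting one dominant kernel.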


CFD Speedups for GPU Relative to 8-Core CPU

Explicit (usually compressible): ~10x for structured-grid FV, ~5x for unstructured FV/FE
Example codes: Turbostream, SJTU RANS, Veloxi, SD++ (Stanford, Jameson), FEFLO (Lohner)

Structured-grid explicit codes are generally the best GPU fit.


Turbostream: CFD for Turbomachinery

Source: http://www.turbostream-cfd.com/

Sample Turbostream GPU simulations: a typical routine simulation and a large-scale simulation, ~19x speedup.


Commercial Aircraft Wing Design on GPUs

COMAC (Commercial Aircraft Corporation of China) and SJTU

GPU application: SJTU-developed explicit CFD RANS code for aerodynamic evaluation of wing shapes (figures: COMAC wing candidate; ONERA M6 wing CFD simulation)

GPU benefit:
Use of Tesla C2070: 37x vs. a single core of an Intel Core i7 CPU
Faster simulations allow more wing design candidates vs. costly wind-tunnel tests
Expanding to multi-GPU and full-aircraft simulations


CFD Speedups for GPU Relative to 8-Core CPU

Explicit (usually compressible): ~15x for structured-grid FV, ~5x for unstructured FV/FE
Example codes: Turbostream, Veloxi, SJTU RANS, SD++ (Stanford, Jameson), FEFLO (Lohner)

Implicit (usually incompressible): ~2x
Commercial CFD is mostly unstructured implicit: ANSYS Fluent, Culises for OpenFOAM, SpeedIT for OpenFOAM, CFD-ACE+, FIRE, Moldflow, AcuSolve, Moldex3D


NVIDIA Strategy for GPU-Accelerated CFD

Strategic alliances
Business and technical alliances with key ISVs (ANSYS, CD-adapco, etc.)
Invest in long-term technical collaboration for ANSYS Fluent acceleration
Develop key technical collaborations with the CFD research community:
TiTech (Aoki), Stanford (Jameson), Oxford (Giles), Wyoming (Mavriplis), others

Software development
NVIDIA linear solver toolkit with emphasis on AMG for industry CFD
Invest in relevant high-order methods (DGM, flux reconstruction, etc.)

Applications support
Direct developer support for a range of ISV and customer requests
Implicit schemes: integration support of libraries and the solver toolkit
Explicit schemes: stencil libraries, OpenACC support for Fortran


Primary Commercial CAE and GPU Progress

ISVs and primary applications (green color in the original slide indicates CUDA-ready during 2013):

ANSYS: ANSYS Mechanical; ANSYS Fluent; ANSYS HFSS
DS SIMULIA: Abaqus/Standard; Abaqus/Explicit; Abaqus/CFD
MSC Software: MSC Nastran; Marc; Adams
Altair: RADIOSS; AcuSolve
CD-adapco: STAR-CD; STAR-CCM+
Autodesk: AS Mechanical; Moldflow; AS CFD
ESI Group: PAM-CRASH implicit; CFD-ACE+
Siemens: NX Nastran
LSTC: LS-DYNA; LS-DYNA CFD
Mentor: FloEFD; FloTherm
Metacomp: CFD++


Additional Commercial GPU Developments

ISV / domain / location / primary applications:

FluiDyna: CFD, Germany; Culises for OpenFOAM; LBultra
Vratis: CFD, Poland; Speed-IT for OpenFOAM; ARAEL
Prometech: CFD, Japan; Particleworks
Turbostream: CFD, England, UK; Turbostream
IMPETUS: explicit FEA, Sweden; AFEA
AVL: CFD, Austria; FIRE
CoreTech: CFD (molding), Taiwan; Moldex3D
Intes: implicit FEA, Germany; PERMAS
Next Limit: CFD, Spain; XFlow
CPFD: CFD, USA; BARRACUDA
Flow Science: CFD, USA; FLOW-3D
SCSK: implicit FEA, Japan; ADVENTURECluster
CDH: implicit FEA, Germany; AMLS; FastFRS
FunctionBay: MB dynamics, S. Korea; RecurDyn
Cradle Software: CFD, Japan; SC/Tetra; scSTREAM


Status Summary of ISVs and GPU Acceleration

Every primary ISV has products available on GPUs or an ongoing evaluation

The 4 largest ISVs (ANSYS, SIMULIA, MSC Software, Altair) all have products based on GPUs, some at 3rd generation

4 of the top 5 ISV applications are available on GPUs today:
ANSYS Fluent, ANSYS Mechanical, Abaqus/Standard, MSC Nastran, . . . LS-DYNA implicit only

Several new ISVs were founded with GPUs as a primary competitive strategy:
Prometech, FluiDyna, Vratis, IMPETUS, Turbostream

Open-source CFD OpenFOAM is available on GPUs today with many options:
Commercial options: FluiDyna, Vratis; open-source options: Cufflink, Symscape ofgpu, RAS, etc.


Basics of GPU Computing for ISV Software

ISV software use of GPU acceleration is user-transparent:
Jobs launch and complete without additional user steps
The user informs the ISV application (GUI or command) that a GPU exists

Schematic of an x86 CPU with an attached GPU accelerator: the CPU (with its DDR memory, I/O hub, and cache) connects to the GPU (with its GDDR memory) over PCI-Express. The CPU begins and ends the job; the GPU manages the heavy computations:

1. ISV job is launched on the CPU
2. Solver operations are sent to the GPU
3. The GPU sends results back to the CPU
4. The ISV job completes on the CPU

(A minimal CUDA sketch of this offload pattern follows below.)
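The four-step pattern can be sketched in a few lines of CUDA host code. This is an illustration only, not code from any ISV product; the kernel and array names are hypothetical stand-ins for the real solver work.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical "solver operation": y = y + a*x, standing in for the real work.
__global__ void solver_kernel(const double* x, double* y, double a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] += a * x[i];
}

void run_job(std::vector<double>& x, std::vector<double>& y, double a) {
    int n = static_cast<int>(x.size());
    double *d_x, *d_y;

    // Step 1: the job (input, setup, I/O) starts on the CPU.
    cudaMalloc(&d_x, n * sizeof(double));
    cudaMalloc(&d_y, n * sizeof(double));

    // Step 2: solver data and work are sent to the GPU over PCI-Express.
    cudaMemcpy(d_x, x.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), n * sizeof(double), cudaMemcpyHostToDevice);
    solver_kernel<<<(n + 255) / 256, 256>>>(d_x, d_y, a, n);

    // Step 3: the GPU sends results back to the CPU.
    cudaMemcpy(y.data(), d_y, n * sizeof(double), cudaMemcpyDeviceToHost);

    // Step 4: the job completes on the CPU (output, post-processing).
    cudaFree(d_x);
    cudaFree(d_y);
}
```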


Commercial CFD Focus on Sparse Solvers

CFD application software flow:
CPU: read input, matrix set-up
GPU: implicit sparse-matrix operations, 40%-65% of profile time but a small % of the lines of code
CPU: global solution, write output

GPU implementation options: hand-written CUDA, GPU libraries (e.g. CUBLAS), OpenACC directives
(Investigating OpenACC for moving more tasks onto the GPU)
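To make the "hand-CUDA" option concrete, below is a minimal CUDA sketch of the kind of sparse-matrix kernel (here a CSR matrix-vector product) that dominates an implicit solver's hot spot. It is illustrative only, under the assumption of CSR storage, and is not code from any commercial solver.

```cuda
// Minimal CSR sparse matrix-vector product y = A*x, one thread per row.
// Illustrative sketch; production solvers use far more tuned kernels or libraries.
__global__ void spmv_csr(int n_rows,
                         const int* row_ptr, const int* col_idx,
                         const double* vals, const double* x, double* y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sum = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
            sum += vals[j] * x[col_idx[j]];
        y[row] = sum;
    }
}

// Launch example, one thread per matrix row:
// spmv_csr<<<(n_rows + 255) / 256, 256>>>(n_rows, row_ptr, col_idx, vals, x, y);
```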


NVIDIA Offers an Accelerated Solver Toolkit

A toolkit of linear solvers, preconditioners, and related kernels for large sparse systems Ax = b.

Available schemes include:
AMG, a multi-level scheme popular with several commercial CFD codes
Jacobi, BiCGStab, FGMRES, MC-DILU, and others

Use of the NVIDIA linear solver toolkit for industry-ready CFD:
ANSYS 14.5 collaboration introduced their AMG-GPU solver in Nov 2012
FluiDyna collaboration on the Culises 2.0 AMG solver library for OpenFOAM
Other ISV and customer CFD software undergoing evaluation . . .
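For a flavor of the simplest scheme in such a toolkit, here is a hedged sketch of one Jacobi sweep for Ax = b on the GPU, assuming CSR storage with the diagonal extracted into a separate array. It only illustrates the idea; the actual toolkit implementations (AMG, BiCGStab, FGMRES, MC-DILU) are far more involved.

```cuda
// One Jacobi sweep: x_new[i] = (b[i] - sum_{j != i} A[i][j] * x_old[j]) / A[i][i]
// CSR matrix with the diagonal stored separately in diag[]; illustrative only.
__global__ void jacobi_sweep(int n_rows,
                             const int* row_ptr, const int* col_idx,
                             const double* vals, const double* diag,
                             const double* b, const double* x_old, double* x_new) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n_rows) {
        double sigma = 0.0;
        for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j) {
            int col = col_idx[j];
            if (col != row) sigma += vals[j] * x_old[col];
        }
        x_new[row] = (b[row] - sigma) / diag[row];
    }
}
```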


GPU Developments for Aircraft CFD

(Green color in the original slide indicates GPU-ready during 2013)

External aero (developer, location, software):
NASA, USA: OVERFLOW
NASA, USA: FUN3D
AFRL, USA: AVUS
ONERA, France: elsA
Stanford (Jameson), USA: SD++
JAXA, Japan: UPACS
ANSYS, USA: ANSYS Fluent 15.0
CD-adapco, USA/UK: STAR-CCM+
Metacomp, USA: CFD++

Internal flows:
ANSYS, USA: ANSYS Fluent 15.0
FluiDyna, Germany: Culises for OpenFOAM 2.2.0
Vratis, Poland: Speed-IT for OpenFOAM 2.2.0
CD-adapco, USA/UK: STAR-CCM+


GPU Developments for Turbine Engine CFD

(Green color in the original slide indicates CUDA-ready during 2013)

Turbomachinery (developer, location, software):
Turbostream, England, UK: Turbostream 3.0
Oxford / Rolls-Royce, England, UK: OP2 / Hydra
ANSYS, USA: ANSYS CFD 15.0 (Fluent + CFX)

Combustor:
ANSYS, USA: ANSYS Fluent 15.0
FluiDyna, Germany: Culises for OpenFOAM 2.2.0
Vratis, Poland: Speed-IT for OpenFOAM 2.2.0
Cascade Technologies, USA: CHARLES
Convergent Science, USA: Converge CFD
Sandia NL / Oak Ridge NL, USA: S3D

Nozzle / noise:
Naval Research Lab, USA: JENRE
Aviadvigatel OJSC, Russia: GHOST CFD


GPU Status of Select Automotive CAE Software

Application (ISV software): GPU status

CSM: durability (stress) and fatigue (MSC Nastran): available today
Road handling and VPG (Adams, for MBD): evaluation
Powertrain stress analysis (Abaqus/Standard): available today
Body NVH (MSC Nastran): available today
Crashworthiness and safety (LS-DYNA): implicit only, beta
CFD: aerodynamics / thermal underhood (ANSYS Fluent): available today, beta
IC engine combustion (STAR-CCM+): evaluation
Aerodynamics / HVAC (OpenFOAM): available today
Plastic mold injection (Moldflow): available today


GPU Progress Summary for GPU-Parallel CFD

GPU progress in CFD research continues to expand
Growth from particle-based CFD and high-order methods
Explicit schemes have generally advanced further than implicit schemes

Strong GPU investments by commercial CFD vendors (ISVs)
Breakthroughs in GPU-parallel linear solvers and preconditioners
GPUs provide 2nd-level parallelism, preserving the costly MPI investment
ISV focus on hybrid parallel CFD that utilizes all CPU cores + GPUs

GPU progress for end-user developed CFD with OpenACC
Most benefit to aerospace companies with legacy Fortran codes

GPUs are behind fast growth in particle-based commercial CFD
New ISV developments in lattice Boltzmann (LBM) and SPH


Particle-Based Commercial CFD Software Growing

ISV software, application, method, GPU status:

PowerFLOW: aerodynamics, LBM, evaluation
LBultra: aerodynamics, LBM, available (v2.0)
XFlow: aerodynamics, LBM, evaluation
Project Falcon: aerodynamics, LBM, evaluation
Particleworks: multiphase / free surface, MPS (~SPH), available (v3.5)
BARRACUDA: multiphase / free surface, MP-PIC, in development
EDEM: discrete phase, DEM, in development
ANSYS Fluent DDPM: multiphase / free surface, DEM, in development
STAR-CCM+: multiphase / free surface, DEM, evaluation
AFEA: high impact, SPH, available (v2.0)
ESI: high impact, SPH/ALE, in development
LSTC: high impact, SPH/ALE, evaluation
Altair: high impact, SPH/ALE, evaluation


TiTech Aoki Lab LBM Solution of External Flows

"A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on the Lattice Boltzmann Method", Prof. Dr. Takayuki Aoki
http://registration.gputechconf.com/quicklink/8Is4ClC
www.sim.gsic.titech.ac.jp

Aoki CFD solver using the lattice Boltzmann method (LBM) with large-eddy simulation (LES)


FluiDyna Lattice Boltzmann Solver LBultra

http://www.fluidyna.com/content/lbultra
www.fluidyna.de

FluiDyna is a 2006 spin-off from TU Munich
CFD solver using the lattice Boltzmann method (LBM)
Demonstrated 25x speedup on a single GPU
Multi-GPU ready
Contact FluiDyna for license details


Prometech and Particleworks for Particle CFD

http://www.prometech.co.jp

MPS-based method developed at the University of Tokyo (Prof. Koshizuka)
Particleworks 3.0: GPU vs. 4-core i7
Example: oil flow in an HB gearbox
Courtesy of Prometech Software and the Particleworks CFD software


Agenda: GPU Acceleration for Applied CFD

Overview of GPU Progress for CFD
GPU Acceleration of ANSYS Fluent
GPU Acceleration of OpenFOAM


ANSYS and NVIDIA Technical Collaboration

Release 13.0 (Dec 2010):
ANSYS Mechanical: SMP, single GPU, sparse and PCG/JCG solvers
ANSYS EM: ANSYS Nexxim

Release 14.0 (Dec 2011):
ANSYS Mechanical: + Distributed ANSYS; + multi-node support
ANSYS Fluent: radiation heat transfer (beta)
ANSYS EM: ANSYS Nexxim

Release 14.5 (Nov 2012):
ANSYS Mechanical: + multi-GPU support; + hybrid PCG; + Kepler GPU support
ANSYS Fluent: + radiation HT; + GPU AMG solver (beta), single GPU
ANSYS EM: ANSYS Nexxim

Release 15.0 (Q4 2013):
ANSYS Mechanical: + CUDA 5 Kepler tuning
ANSYS Fluent: + multi-GPU AMG solver; + CUDA 5 Kepler tuning
ANSYS EM: ANSYS Nexxim; ANSYS HFSS (transient)


ANSYS Fluent 14.5 and Radiation HT on GPU

VIEWFAC utility:
Runs on CPUs, GPUs, or both; ~2x speedup
Radiation HT applications: underhood cooling, cabin comfort HVAC, furnace simulations

RAY TRACING utility:
Uses the OptiX library from NVIDIA, with up to ~15x speedup (runs on GPU only)
Applications: solar loads on buildings, combustor in a turbine, electronics passive cooling


ANSYS Fluent Use of the NVIDIA Solver Toolkit

ANSYS Fluent 15.0 will offer a GPU-based AMG solver (Nov/Dec 2013)
Developed with support for MPI across multiple nodes and multiple GPUs
Solver collaboration on the pressure-based coupled Navier-Stokes solver, others to follow
Early results published at Parallel CFD 2013, 20-24 May, Changsha, CN:
"GPU-Accelerated Algebraic Multigrid for Applied CFD"


ANSYS Fluent CPU Profile for Coupled Solver

Within each non-linear iteration:
Assemble the linear system of equations: ~35% of runtime
Solve the linear system of equations Ax = b: ~65% of runtime (accelerate this first)
Check convergence: if not converged, repeat; if converged, stop

(A schematic sketch of this loop follows below.)
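A minimal sketch of this outer loop, with only the dominant linear solve offloaded to the GPU; every name below is a hypothetical placeholder for illustration, not a Fluent internal.

```cpp
#include <vector>

// Hypothetical sketch of the coupled-solver outer loop. Assembly stays on the
// CPU (~35% of runtime); the linear solve (~65%) is the first piece offloaded.
struct LinearSystem { std::vector<double> A, x, b; };

void assemble_linear_system(LinearSystem& s) { /* CPU: build A and b */ }
void solve_on_gpu(LinearSystem& s)           { /* GPU: Ax = b via AMG  */ }
double residual_norm(const LinearSystem& s)  { return 0.0; /* placeholder */ }

void coupled_iterations(LinearSystem& s, double tol, int max_iters) {
    for (int iter = 0; iter < max_iters; ++iter) {
        assemble_linear_system(s);      // ~35% of runtime, on the CPU
        solve_on_gpu(s);                // ~65% of runtime, accelerate this first
        if (residual_norm(s) < tol)     // converged?
            break;                      // yes: stop
    }
}
```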


ANSYS Fluent 14.5 GPU Solver Convergence

nvAMG preview of ANSYS Fluent convergence behavior (chart: error residuals vs. iteration number for continuity and X/Y/Z momentum, NVAMG curves vs. FLUENT curves).

Numerical results, Mar 2012: the test for convergence at each iteration matches the precise Fluent behavior.

Model FL5S1: incompressible flow in a bend, 32K hex cells, coupled solver.


ANSYS Fluent 14.5 GPU Acceleration

Preview of ANSYS Fluent 14.5 performance, by ANSYS, Aug 2012 (AMG solver time in seconds, lower is better).

Helix model: 1.2M tet cells, unsteady, laminar, coupled PBNS, double precision; AMG F-cycle on CPU, AMG V-cycle on GPU. All times are solver time only.

2 x Xeon X5650, only 1 core used: 2832 s; with Tesla C2075: 517 s (5.5x)
2 x Xeon X5650, all 12 cores used: 933 s; with Tesla C2075: 517 s (1.8x)


ANSYS Fluent with GPU-Based AMG Solver

ANSYS Fluent 14.5 performance, results by NVIDIA, Nov 2012 (AMG solver time per iteration in seconds, lower is better).

Airfoil and aircraft models with hexahedral cells; Tesla K20X vs. 2 x Core i7-3930K with only 6 cores used:
Airfoil (784K hex cells): 2.4x
Aircraft (1,798K hex cells): 2.4x

Solver settings:
CPU Fluent solver: F-cycle, agg8, DILU, 0 pre-sweeps, 3 post-sweeps
GPU nvAMG solver: V-cycle, agg8, MC-DILU, 0 pre-sweeps, 3 post-sweeps
Times are for the solver only.


GPUs and Distributed Cluster Computing

Partitioning on the CPU: the geometry is decomposed and the partitions (1, 2, 3, 4) are placed on independent cluster nodes (N1-N4) for CPU distributed parallel processing.
The nodes run distributed parallel using MPI and combine into the global solution.


GPUs and Distributed Cluster Computing

Execution on CPU + GPU: the same decomposition is used, with each node (N1-N4) attached to a GPU (G1-G4).
GPUs run shared-memory parallel (OpenMP) underneath the distributed parallel (MPI) layer, and results combine into the global solution.
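A minimal sketch (my illustration, not any ISV's code) of how such a hybrid setup typically binds one GPU to each MPI rank on a node before the domain-decomposed solve begins:

```cpp
#include <mpi.h>
#include <cuda_runtime.h>

// Minimal MPI + CUDA setup: each rank selects a GPU on its node, then the
// usual domain-decomposed solve proceeds with MPI between partitions.
// Illustrative only; real CFD codes also handle node-local rank mapping,
// halo exchange, and the solver itself.
int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int num_devices = 0;
    cudaGetDeviceCount(&num_devices);

    // Simple round-robin binding of ranks to the GPUs visible on the node.
    if (num_devices > 0)
        cudaSetDevice(rank % num_devices);

    // ... per-partition assembly and GPU-accelerated solve would go here,
    //     with MPI exchanging partition-boundary (halo) data each iteration.

    MPI_Finalize();
    return 0;
}
```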


ANSYS Fluent for a 3.6M Cell Aerodynamic Case

Multi-GPU acceleration of 16-core ANSYS Fluent 15.0 (preview), external aero: 2.9x solver speedup with Xeon E5-2667 + 4 x Tesla K20X GPUs.

CPU configuration: 16-core server node (2 x 8 cores).
CPU + GPU configuration: the same 16-core node plus 4 GPUs (G1-G4).


ANSYS Fluent for a 14M Cell Aerodynamic Case

ANSYS Fluent 15.0 preview performance, results by NVIDIA, Jun 2013 (AMG solver time per iteration in seconds, lower is better; Intel Xeon E5-2667, 2.90 GHz, with and without Tesla K20X GPUs).

Truck body model: 14M mixed cells, DES turbulence, coupled PBNS, single precision. Times are for one iteration, solver only. AMG F-cycle on CPU; preconditioned FGMRES with AMG on GPU.

1 node, 2 CPUs (12 cores total): 69 s CPU-only
2 nodes, 4 CPUs (24 cores total), 8 GPUs (4 per node): 41 s CPU-only vs. 12 s with GPUs (3.5x)
4 nodes, 8 CPUs (48 cores total), 16 GPUs (4 per node): 28 s CPU-only vs. 9 s with GPUs (3.3x)


Agenda: GPU Acceleration for Applied CFD

Overview of GPU Progress for CFD
GPU Acceleration of ANSYS Fluent
GPU Acceleration of OpenFOAM


2013: Further Expansion of the OpenFOAM Community

ESI acquired OpenCFD from SGI in Sep 2012
IDAJ acquired a majority stake in ICON in May 2013

This year there are 3 (up from 2) global OpenFOAM user events:

APR 24-26, Frankfurt, DE: ESI OpenFOAM Users Conference (first ever)
http://www.esi-group.com/corporate/events/2013/OpenFOAM2013
Concentration on OpenFOAM from OpenCFD

JUN 11-14, Jeju, KR: 8th International OpenFOAM Workshop (first in Asia)
http://www.openfoamworkshop2013.org/
Concentration on OpenFOAM-extend and Wikki

OCT 24-25, Hamburg, DE: 7th Open Source CFD International Conference (ICON)
http://www.opensourcecfd.com/conference2013/
Concentration on both OpenFOAM and OpenFOAM-extend


NVIDIA Market Strategy for OpenFOAM

Provide technical support for commercial GPU solver developments:
FluiDyna Culises AMG solver library using the NVIDIA toolkit
Vratis Speed-IT library, development of a CUSP-based AMG

Alliances (but no development) with key OpenFOAM organizations:
ESI and OpenCFD Foundation (H. Weller, M. Salari)
Wikki and the OpenFOAM-extend community (H. Jasak)
IDAJ in Japan and ICON in the UK, supporting both OF and OF-extend

Conduct performance studies and customer benchmark evaluations
Collaborations: developers, customers, OEMs (Dell, SGI, HP, etc.)


Culises: CFD Solver Library for OpenFOAM

Culises easy-to-use AMG-PCG solver (www.fluidyna.de):

1. Download and license from http://www.FluiDyna.de
2. Automatic installation with a FluiDyna-provided script
3. Activate Culises and GPUs with 2 edits to the config file (one config file for CPU-only, one for CPU+GPU)

FluiDyna: TU Munich spin-off from 2006
Culises provides a linear solver library
Culises requires only two edits to the OpenFOAM control file
Multi-GPU ready
Contact FluiDyna for license details

(A hedged sketch of such a config-file edit follows below.)
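For orientation only, here is what a pressure-solver entry in OpenFOAM's system/fvSolution dictionary normally looks like (abbreviated; typical GAMG agglomeration controls omitted), and how switching it to a library-provided GPU solver conceptually works. The Culises-specific solver keyword below is a hypothetical placeholder; the actual entry names come from FluiDyna's documentation.

```
// system/fvSolution, standard CPU-only entry (abbreviated)
solvers
{
    p
    {
        solver      GAMG;          // OpenFOAM's multigrid solver on the CPU
        smoother    GaussSeidel;
        tolerance   1e-06;
        relTol      0.01;
    }
}

// Conceptual CPU+GPU variant: the solver keyword changes to the entry the
// Culises library registers (the name below is a hypothetical placeholder).
solvers
{
    p
    {
        solver      amgPcgGPU;     // hypothetical Culises-provided solver name
        tolerance   1e-06;
        relTol      0.01;
    }
}
```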


Culises Coupling to OpenFOAM

The Culises coupling is user-transparent (www.fluidyna.de).


OpenFOAM Speedups Based on CFD Application

GPU speedups for different industry cases, across a range of model sizes and solver schemes (Krylov, AMG-PCG, etc.); the chart compares job speedup, solver speedup, and OpenFOAM CPU-only efficiency (www.fluidyna.de):

Automotive: 1.6x
Multiphase: 1.9x
Thermal: 3.0x
Pharma CFD: 2.2x
Process CFD: 4.7x


FluiDyna Culises: CFD Solver for OpenFOAM

"Culises: A Library for Accelerated CFD on Hybrid GPU-CPU Systems", Dr. Bjoern Landmann, FluiDyna
developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0293-GTC2012-Culises-Hybrid-GPU.pdf
www.fluidyna.de

DrivAer: joint car body shape by BMW and Audi
http://www.aer.mw.tum.de/en/research-groups/automotive/drivaer

36M cells (mixed type), GAMG on CPU, AMGPCG on GPU; solver speedup of 7x for 2 CPUs + 4 GPUs.

Mesh size and configuration, solver speedup, job speedup:
9M cells, 2 CPUs + 1 GPU: 2.5x solver, 1.36x job
18M cells, 2 CPUs + 2 GPUs: 4.2x solver, 1.52x job
36M cells, 2 CPUs + 4 GPUs: 6.9x solver, 1.67x job


Conclusions for Applied CFD on GPUs

GPUs provide significant speedups for solver-intensive jobs:
Improved product quality through higher-fidelity modeling
Shorter product engineering cycles through faster simulation turnaround

Simulations recently considered impractical are now possible:
Unsteady RANS and large-eddy simulation (LES) become practical in cost and time
Effective parameter optimization from a large increase in the number of jobs


Stan Posey<br />

NVIDIA, Santa Clara, CA, USA; sposey@nvidia.com
