CFD - GPU Technology Conference
Stan Posey<br />
NVIDIA, Santa Clara, CA, USA; sposey@nvidia.com
Agenda: <strong>GPU</strong> Acceleration for Applied <strong>CFD</strong><br />
Overview of <strong>GPU</strong> Progress for <strong>CFD</strong><br />
<strong>GPU</strong> Acceleration of ANSYS Fluent<br />
<strong>GPU</strong> Acceleration of OpenFOAM<br />
2
<strong>GPU</strong> Progress Summary for <strong>GPU</strong>-Parallel <strong>CFD</strong><br />
<strong>GPU</strong> progress in <strong>CFD</strong> research continues to expand<br />
Growth from particle-based <strong>CFD</strong> and high-order methods<br />
Explicit schemes have generally seen more progress than implicit<br />
Strong <strong>GPU</strong> investments by commercial <strong>CFD</strong> vendors (ISVs)<br />
Breakthroughs in <strong>GPU</strong>-parallel linear solvers and preconditioners<br />
<strong>GPU</strong>s provide 2nd-level parallelism that preserves the costly MPI investment<br />
ISV focus on hybrid parallel <strong>CFD</strong> that utilizes all CPU cores + <strong>GPU</strong><br />
<strong>GPU</strong> progress for end-user-developed <strong>CFD</strong> with OpenACC<br />
Most benefit to aerospace companies with legacy Fortran<br />
<strong>GPU</strong>s behind fast growth in particle-based commercial <strong>CFD</strong><br />
New ISV developments in lattice Boltzmann (LBM) and SPH<br />
3
<strong>CFD</strong> Software Character and <strong>GPU</strong> Suitability<br />
Explicit schemes (usually compressible), structured grid FV:<br />
Numerical operations on an I,J,K stencil, no “solver”<br />
[Typically flat profiles: <strong>GPU</strong> strategy of directives (OpenACC)]<br />
Implicit schemes (usually incompressible), unstructured FV and FE:<br />
Sparse matrix linear algebra – iterative solvers<br />
[Hot spot ~50% of runtime, small % of LoC: <strong>GPU</strong> strategy of CUDA and libraries]<br />
4
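The explicit, structured-grid character above can be sketched in a few lines (an illustrative Python sketch, not slide content; the grid size, coefficient `c`, and function name are invented for the example). Every cell applies the same I,J,K neighbor update with no linear solve, which is why such loops map well onto GPU threads or OpenACC directives.

```python
# Illustrative sketch: one explicit update sweep over a structured I,J,K grid.
# Each cell reads only its six neighbors, so all cells can update in parallel.

def stencil_sweep(u, c=0.1):
    """One Jacobi-style diffusion update on an NI x NJ x NK structured grid."""
    ni, nj, nk = len(u), len(u[0]), len(u[0][0])
    new = [[[u[i][j][k] for k in range(nk)] for j in range(nj)] for i in range(ni)]
    for i in range(1, ni - 1):
        for j in range(1, nj - 1):
            for k in range(1, nk - 1):
                lap = (u[i-1][j][k] + u[i+1][j][k] +
                       u[i][j-1][k] + u[i][j+1][k] +
                       u[i][j][k-1] + u[i][j][k+1] - 6.0 * u[i][j][k])
                new[i][j][k] = u[i][j][k] + c * lap
    return new

# A 4x4x4 grid with a hot spot in the middle diffuses outward:
grid = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
grid[1][1][1] = 1.0
grid = stencil_sweep(grid)
```

The flat profile the slide mentions follows from this shape: the whole time is spent in one uniform triple loop, with no single small "hot spot" routine to extract.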
<strong>CFD</strong> Speedups for <strong>GPU</strong> Relative to 8-Core CPU<br />
Explicit (usually compressible), structured grid FV: ~10x<br />
Turbostream; SJTU RANS<br />
Explicit, unstructured FV and FE: ~5x<br />
Veloxi; SD++ (Stanford, Jameson); FEFLO (Lohner)<br />
Structured grid explicit is generally the best <strong>GPU</strong> fit<br />
5
Turbostream: <strong>CFD</strong> for Turbomachinery<br />
Source:<br />
http://www.turbostream-cfd.com/<br />
Sample Turbostream <strong>GPU</strong> Simulations<br />
Typical Routine Simulation<br />
Large-scale Simulation<br />
~19x Speedup<br />
6
Commercial Aircraft Wing Design on <strong>GPU</strong>s<br />
COMAC and SJTU<br />
Commercial Aircraft Corporation of China<br />
<strong>GPU</strong> Application<br />
SJTU-developed explicit <strong>CFD</strong> RANS for aerodynamic evaluation of wing shapes<br />
<strong>GPU</strong> Benefit<br />
Use of Tesla C2070: 37x vs. a single core of an Intel Core i7 CPU<br />
Faster simulations for more wing design candidates vs. costly wind tunnel tests<br />
Expanding to multi-<strong>GPU</strong> and full aircraft<br />
[Images: COMAC wing candidate; ONERA M6 wing <strong>CFD</strong> simulation]<br />
7
<strong>CFD</strong> Speedups for <strong>GPU</strong> Relative to 8-Core CPU<br />
Explicit (usually compressible), structured grid FV: ~15x<br />
Turbostream; SJTU RANS<br />
Explicit, unstructured FV and FE: ~5x<br />
Veloxi; SD++ (Stanford, Jameson); FEFLO (Lohner)<br />
Implicit (usually incompressible), unstructured FV and FE: ~2x<br />
Unstructured FV: ANSYS Fluent; Culises for OpenFOAM; SpeedIT for OpenFOAM; <strong>CFD</strong>-ACE+; FIRE<br />
Unstructured FE: Moldflow; AcuSolve; Moldex3D<br />
Commercial <strong>CFD</strong> is mostly unstructured implicit<br />
8
NVIDIA Strategy for <strong>GPU</strong>-Accelerated <strong>CFD</strong><br />
Strategic Alliances<br />
Business and technical alliances with key ISVs (ANSYS, CD-adapco, etc.)<br />
Invest in long-term technical collaboration for ANSYS Fluent acceleration<br />
Develop key technical collaborations with <strong>CFD</strong> research community:<br />
TiTech—Aoki, Stanford—Jameson, Oxford—Giles, Wyoming—Mavriplis, others<br />
Software Development<br />
NVIDIA linear solver toolkit with emphasis on AMG for industry <strong>CFD</strong><br />
Invest in relevant high-order methods (DGM, flux reconstruction, etc.)<br />
Applications Support<br />
Direct developer support for range of ISV and customer requests<br />
Implicit Schemes: Integration support of libraries and solver toolkit<br />
Explicit Schemes: Stencil libraries, OpenACC support for Fortran<br />
9
Primary Commercial CAE and <strong>GPU</strong> Progress<br />
ISV and Primary Applications (green color indicates CUDA-ready during 2013)<br />
ANSYS: ANSYS Mechanical; ANSYS Fluent; ANSYS HFSS<br />
DS SIMULIA: Abaqus/Standard; Abaqus/Explicit; Abaqus/<strong>CFD</strong><br />
MSC Software: MSC Nastran; Marc; Adams<br />
Altair: RADIOSS; AcuSolve<br />
CD-adapco: STAR-CD; STAR-CCM+<br />
Autodesk: AS Mechanical; Moldflow; AS <strong>CFD</strong><br />
ESI Group: PAM-CRASH imp; <strong>CFD</strong>-ACE+<br />
Siemens: NX Nastran<br />
LSTC: LS-DYNA; LS-DYNA <strong>CFD</strong><br />
Mentor: FloEFD; FloTherm<br />
Metacomp: <strong>CFD</strong>++<br />
10
Additional Commercial <strong>GPU</strong> Developments<br />
ISV Domain Location Primary Applications<br />
FluiDyna <strong>CFD</strong> Germany Culises for OpenFOAM; LBultra<br />
Vratis <strong>CFD</strong> Poland Speed-IT for OpenFOAM; ARAEL<br />
Prometech <strong>CFD</strong> Japan Particleworks<br />
Turbostream <strong>CFD</strong> England, UK Turbostream<br />
IMPETUS Explicit FEA Sweden AFEA<br />
AVL <strong>CFD</strong> Austria FIRE<br />
CoreTech <strong>CFD</strong> (molding) Taiwan Moldex3D<br />
Intes Implicit FEA Germany PERMAS<br />
Next Limit <strong>CFD</strong> Spain XFlow<br />
CPFD <strong>CFD</strong> USA BARRACUDA<br />
Flow Science <strong>CFD</strong> USA FLOW-3D<br />
SCSK Implicit FEA Japan ADVENTURECluster<br />
CDH Implicit FEA Germany AMLS; FastFRS<br />
FunctionBay MB Dynamics S. Korea RecurDyn<br />
Cradle Software <strong>CFD</strong> Japan SC/Tetra; scSTREAM<br />
11
Status Summary of ISVs and <strong>GPU</strong> Acceleration<br />
Every primary ISV has products available on <strong>GPU</strong>s or ongoing evaluation<br />
The 4 largest ISVs all have products based on <strong>GPU</strong>s, some at 3rd generation<br />
ANSYS SIMULIA MSC Software Altair<br />
4 of the top 5 ISV applications are available on <strong>GPU</strong>s today<br />
ANSYS Fluent, ANSYS Mechanical, Abaqus/Standard, MSC Nastran, . . . LS-DYNA implicit only<br />
Several new ISVs were founded with <strong>GPU</strong>s as a primary competitive strategy<br />
Prometech, FluiDyna, Vratis, IMPETUS, Turbostream<br />
Open source <strong>CFD</strong> OpenFOAM available on <strong>GPU</strong>s today with many options<br />
Commercial options: FluiDyna, Vratis; Open source options: Cufflink, Symscape ofgpu, RAS, etc.<br />
12
Basics of <strong>GPU</strong> Computing for ISV Software<br />
ISV software use of <strong>GPU</strong> acceleration is user-transparent<br />
Jobs launch and complete without additional user steps<br />
User informs ISV application (GUI, command) that a <strong>GPU</strong> exists<br />
Schematic of a CPU with an attached <strong>GPU</strong> accelerator<br />
CPU begins/ends the job; <strong>GPU</strong> manages the heavy computations<br />
[Diagram: x86 CPU with cache and DDR memory, connected via PCI-Express to a <strong>GPU</strong> with GDDR memory; numbered arrows 1 - 4 mark the steps below]<br />
1. ISV job launched on CPU<br />
2. Solver operations sent to <strong>GPU</strong><br />
3. <strong>GPU</strong> sends results back to CPU<br />
4. ISV job completes on CPU<br />
13
Commercial <strong>CFD</strong> Focus on Sparse Solvers<br />
<strong>CFD</strong> application software flow:<br />
CPU: read input, matrix set-up<br />
<strong>GPU</strong>: implicit sparse matrix operations, 40% - 65% of profile time, small % of LoC<br />
(Hand-CUDA parallel; <strong>GPU</strong> libraries, CUBLAS; OpenACC directives)<br />
CPU: global solution, write output<br />
(Investigating OpenACC for more tasks on <strong>GPU</strong>)<br />
14
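The sparse matrix operations described above come down to a small kernel: a sparse matrix-vector product. Below is an illustrative Python sketch in CSR (compressed sparse row) storage, not vendor code; the matrix and names are invented for the example. It shows why the slide's point holds: a few lines of code account for most of the runtime, so offloading just this kernel to the GPU captures the hot spot.

```python
# Sketch of the hot-spot operation behind implicit CFD solvers:
# y = A @ x for a sparse matrix A stored in CSR form.

def csr_spmv(row_ptr, col_idx, vals, x):
    """Multiply a CSR sparse matrix by a dense vector."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        # Only the stored nonzeros of row r are visited.
        for nz in range(row_ptr[r], row_ptr[r + 1]):
            acc += vals[nz] * x[col_idx[nz]]
        y.append(acc)
    return y

# 3x3 tridiagonal test matrix [[2,-1,0],[-1,2,-1],[0,-1,2]]:
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
vals    = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
y = csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0])
```

On a GPU, each row's dot product becomes an independent thread's work, which is the form CUDA libraries such as CUBLAS/cuSPARSE-style kernels exploit.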
NVIDIA Offers an Accelerated Solver Toolkit<br />
Toolkit of linear solvers, preconditioners, and other components for large sparse Ax=b<br />
Available schemes include:<br />
AMG – a multi-level scheme popular with several commercial <strong>CFD</strong> codes<br />
Jacobi, BiCGStab, FGMRES, MC-DILU, and others<br />
Use of NVIDIA linear solver toolkit for industry-ready <strong>CFD</strong>:<br />
ANSYS 14.5 collaboration introduced their AMG-<strong>GPU</strong> solver in Nov 2012<br />
FluiDyna collaboration on Culises 2.0 AMG solver library for OpenFOAM<br />
Other ISVs and customer <strong>CFD</strong> software undergoing evaluation . . .<br />
15
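The AMG scheme named above builds its grid hierarchy algebraically, but the multilevel idea is easiest to see geometrically. Below is a hedged sketch (not the NVIDIA toolkit; problem, function names, and parameters are invented) of a two-grid V-cycle for the 1D Poisson problem -u'' = f: smooth on the fine grid, restrict the residual, solve a coarse correction, prolong it back, and smooth again.

```python
import math

def jacobi(u, f, h, sweeps, w=0.8):
    """Weighted-Jacobi smoothing for -u'' = f with u[0] = u[-1] = 0."""
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = (1 - w) * u[i] + w * 0.5 * (u[i-1] + u[i+1] + h * h * f[i])
        u = new
    return u

def residual(u, f, h):
    """r = f - A u for the standard 3-point Laplacian."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2 * u[i] - u[i-1] - u[i+1]) / (h * h)
    return r

def two_grid_vcycle(u, f, h):
    u = jacobi(u, f, h, sweeps=3)                        # pre-smooth
    r = residual(u, f, h)
    rc = [r[2 * i] for i in range((len(u) + 1) // 2)]    # restrict by injection
    ec = jacobi([0.0] * len(rc), rc, 2 * h, sweeps=50)   # near-exact coarse solve
    for i in range(1, len(u) - 1):                       # prolong and correct
        u[i] += ec[i // 2] if i % 2 == 0 else 0.5 * (ec[i // 2] + ec[i // 2 + 1])
    return jacobi(u, f, h, sweeps=3)                     # post-smooth

# Solve -u'' = pi^2 sin(pi x) on [0,1], exact solution sin(pi x):
n, h = 9, 1.0 / 8.0
f = [math.pi ** 2 * math.sin(math.pi * i * h) for i in range(n)]
u = [0.0] * n
for _ in range(8):
    u = two_grid_vcycle(u, f, h)
```

Smoothers like Jacobi kill high-frequency error cheaply; the coarse grid handles the smooth error that Jacobi stalls on. AMG applies the same cycle, but constructs the coarse levels from the matrix entries rather than from a mesh.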
<strong>GPU</strong> Developments for Aircraft <strong>CFD</strong><br />
External Aero<br />
Developer Location Software<br />
(Green color indicates <strong>GPU</strong>-ready during 2013)<br />
NASA USA OVERFLOW<br />
NASA USA FUN3D<br />
AFRL USA AVUS<br />
ONERA France elsA<br />
Stanford/Jameson USA SD++<br />
JAXA Japan UPACS<br />
ANSYS USA ANSYS Fluent 15.0<br />
CD-adapco USA/UK STAR-CCM+<br />
Metacomp USA <strong>CFD</strong>++<br />
Internal Flows<br />
ANSYS USA ANSYS Fluent 15.0<br />
FluiDyna Germany Culises for OpenFOAM 2.2.0<br />
Vratis Poland Speed-IT for OpenFOAM 2.2.0<br />
CD-adapco USA/UK STAR-CCM+<br />
16
<strong>GPU</strong> Developments for Turbine Engine <strong>CFD</strong><br />
Turbomachinery<br />
Developer Location Software<br />
(Green color indicates CUDA-ready during 2013)<br />
Turbostream England, UK Turbostream 3.0<br />
Oxford / Rolls Royce England, UK OP2 / Hydra<br />
ANSYS USA ANSYS <strong>CFD</strong> 15.0 (Fluent + CFX)<br />
Combustor<br />
ANSYS USA ANSYS Fluent 15.0<br />
FluiDyna Germany Culises for OpenFOAM 2.2.0<br />
Vratis Poland Speed-IT for OpenFOAM 2.2.0<br />
Cascade Technologies USA CHARLES<br />
Convergent Science USA Converge <strong>CFD</strong><br />
Sandia NL / Oak Ridge NL USA S3D<br />
Nozzle / Noise<br />
Naval Research Lab USA JENRE<br />
Aviadvigatel OJSC Russia GHOST <strong>CFD</strong><br />
17
<strong>GPU</strong> Status of Select Automotive CAE Software<br />
Automotive CAE Application Select CAE Software <strong>GPU</strong> Status<br />
CSM: Durability (Stress) and Fatigue MSC Nastran Available Today<br />
Road Handling and VPG Adams (for MBD) Evaluation<br />
Powertrain Stress Analysis Abaqus/Standard Available Today<br />
Body NVH MSC Nastran Available Today<br />
Crashworthiness and Safety LS-DYNA Implicit only, beta<br />
<strong>CFD</strong>: Aerodynamics / Thermal UH ANSYS Fluent Available Today, beta<br />
IC Engine Combustion STAR-CCM+ Evaluation<br />
Aerodynamics / HVAC OpenFOAM Available Today<br />
Plastic Mold Injection Moldflow Available Today<br />
18
<strong>GPU</strong> Progress Summary for <strong>GPU</strong>-Parallel <strong>CFD</strong><br />
<strong>GPU</strong> progress in <strong>CFD</strong> research continues to expand<br />
Growth from particle-based <strong>CFD</strong> and high-order methods<br />
Explicit schemes have generally seen more progress than implicit<br />
Strong <strong>GPU</strong> investments by commercial <strong>CFD</strong> vendors (ISVs)<br />
Breakthroughs in <strong>GPU</strong>-parallel linear solvers and preconditioners<br />
<strong>GPU</strong>s provide 2nd-level parallelism that preserves the costly MPI investment<br />
ISV focus on hybrid parallel <strong>CFD</strong> that utilizes all CPU cores + <strong>GPU</strong><br />
<strong>GPU</strong> progress for end-user-developed <strong>CFD</strong> with OpenACC<br />
Most benefit to aerospace companies with legacy Fortran<br />
<strong>GPU</strong>s behind fast growth in particle-based commercial <strong>CFD</strong><br />
New ISV developments in lattice Boltzmann (LBM) and SPH<br />
19
Particle-Based Commercial <strong>CFD</strong> Software Growing<br />
ISV Software Application Method <strong>GPU</strong> Status<br />
PowerFLOW Aerodynamics LBM Evaluation<br />
LBultra Aerodynamics LBM Available v2.0<br />
XFlow Aerodynamics LBM Evaluation<br />
Project Falcon Aerodynamics LBM Evaluation<br />
Particleworks Multiphase/FS MPS (~SPH) Available v3.5<br />
BARRACUDA Multiphase/FS MP-PIC In development<br />
EDEM Discrete phase DEM In development<br />
ANSYS Fluent–DDPM Multiphase/FS DEM In development<br />
STAR-CCM+ Multiphase/FS DEM Evaluation<br />
AFEA High impact SPH Available v2.0<br />
ESI High impact SPH, ALE In development<br />
LSTC High impact SPH, ALE Evaluation<br />
Altair High impact SPH, ALE Evaluation<br />
20
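The SPH entries in the table above all rest on one core operation: estimating field values by summing over neighbor particles with a smoothing kernel. Below is an illustrative Python sketch (not from any listed product; the 1D setup, particle spacing, and names are invented) of summation density with the standard cubic-spline kernel. Because each particle's sum is independent, this is the kind of loop that maps directly to GPU threads.

```python
# Sketch of the SPH building block: rho_i = sum_j m_j W(|x_i - x_j|, h).

def w_cubic(r, h):
    """1D cubic-spline SPH kernel (normalization 2/(3h)); support radius 2h."""
    q = r / h
    s = 2.0 / (3.0 * h)
    if q < 1.0:
        return s * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q < 2.0:
        return s * 0.25 * (2.0 - q) ** 3
    return 0.0

def sph_density(xs, mass, h):
    """Summation density at every particle (each sum is independent work)."""
    return [sum(mass * w_cubic(abs(xi - xj), h) for xj in xs) for xi in xs]

dx = 0.1
xs = [i * dx for i in range(21)]            # uniform row of particles
rho = sph_density(xs, mass=1.0 * dx, h=dx)  # mass chosen so rho = 1 in interior
```

Interior particles recover the reference density exactly on this uniform spacing, while particles near the ends show the usual boundary deficiency, a well-known SPH artifact.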
TiTech Aoki Lab LBM Solution of External Flows<br />
A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows<br />
Based on Lattice Boltzmann Method, Prof. Dr. Takayuki Aoki<br />
http://registration.gputechconf.com/quicklink/8Is4ClC<br />
www.sim.gsic.titech.ac.jp<br />
Aoki <strong>CFD</strong> solver using Lattice<br />
Boltzmann method (LBM) with<br />
Large Eddy Simulation (LES)<br />
21
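To make the LBM approach named above concrete, here is a minimal sketch of one lattice Boltzmann update (illustrative only, not the Aoki lab code): a 1D three-velocity (D1Q3) diffusion model with BGK collision on a periodic lattice. The weights and relaxation time are assumptions chosen for the example; production codes use 3D lattices, but the collide-then-stream structure is the same, and its locality is exactly what makes LBM so GPU-friendly.

```python
# Minimal D1Q3 lattice Boltzmann step: collide toward equilibrium, then stream.

W = [4.0 / 6.0, 1.0 / 6.0, 1.0 / 6.0]   # weights for velocities 0, +1, -1
TAU = 1.0                                # BGK relaxation time (illustrative)

def lbm_step(f):
    """One collide-and-stream update; f[k][x] are the distribution functions."""
    n = len(f[0])
    rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
    # Collide: relax each distribution toward its equilibrium W[k] * rho.
    post = [[f[k][x] + (W[k] * rho[x] - f[k][x]) / TAU for x in range(n)]
            for k in range(3)]
    # Stream: velocity 0 stays, +1 moves right, -1 moves left (periodic).
    return [post[0],
            [post[1][(x - 1) % n] for x in range(n)],
            [post[2][(x + 1) % n] for x in range(n)]]

# A unit mass pulse at the lattice center spreads diffusively:
n = 16
f = [[W[k] * (1.0 if x == n // 2 else 0.0) for x in range(n)] for k in range(3)]
for _ in range(10):
    f = lbm_step(f)
density = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
```

Collision is purely local per site and streaming is a fixed-pattern copy, so there is no global solve at all, the same property that lets codes like this scale to many GPUs.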
FluiDyna Lattice Boltzmann Solver LBultra<br />
http://www.fluidyna.com/content/lbultra<br />
www.fluidyna.de<br />
Spin-Off in 2006<br />
from TU Munich<br />
<strong>CFD</strong> solver using<br />
Lattice Boltzmann<br />
method (LBM)<br />
Demonstrated 25x speedup on a single <strong>GPU</strong><br />
Multi-<strong>GPU</strong> ready<br />
Contact FluiDyna<br />
for license details<br />
22
Prometech and Particleworks for Particle <strong>CFD</strong><br />
http://www.prometech.co.jp<br />
Oil Flow in<br />
HB Gearbox<br />
MPS-based method developed at the<br />
University of Tokyo [Prof. Koshizuka]<br />
Particleworks 3.0: <strong>GPU</strong> vs. 4-core Core i7<br />
Courtesy of Prometech Software and Particleworks <strong>CFD</strong> Software<br />
23
Agenda: <strong>GPU</strong> Acceleration for Applied <strong>CFD</strong><br />
Overview of <strong>GPU</strong> Progress for <strong>CFD</strong><br />
<strong>GPU</strong> Acceleration of ANSYS Fluent<br />
<strong>GPU</strong> Acceleration of OpenFOAM<br />
24
ANSYS and NVIDIA Technical Collaboration<br />
Release 13.0 (Dec 2010): ANSYS Mechanical: SMP, single <strong>GPU</strong>, sparse and PCG/JCG solvers; ANSYS EM: ANSYS Nexxim<br />
Release 14.0 (Dec 2011): ANSYS Mechanical: + Distributed ANSYS, + multi-node support; ANSYS Fluent: radiation heat transfer (beta); ANSYS EM: ANSYS Nexxim<br />
Release 14.5 (Nov 2012): ANSYS Mechanical: + multi-<strong>GPU</strong> support, + hybrid PCG, + Kepler <strong>GPU</strong> support; ANSYS Fluent: + radiation HT, + <strong>GPU</strong> AMG solver (beta), single <strong>GPU</strong>; ANSYS EM: ANSYS Nexxim<br />
Release 15.0 (Q4-2013): ANSYS Mechanical: + CUDA 5 Kepler tuning; ANSYS Fluent: + multi-<strong>GPU</strong> AMG solver, + CUDA 5 Kepler tuning; ANSYS EM: ANSYS Nexxim, ANSYS HFSS (Transient)<br />
25
ANSYS Fluent 14.5 and Radiation HT on <strong>GPU</strong><br />
VIEWFAC utility: use on CPUs, <strong>GPU</strong>s, or both, with ~2x speedup<br />
RAY TRACING utility: uses the OptiX library from NVIDIA with up to ~15x speedup (use on <strong>GPU</strong> only)<br />
Radiation HT applications:<br />
- Underhood cooling<br />
- Cabin comfort HVAC<br />
- Furnace simulations<br />
- Solar loads on buildings<br />
- Combustor in turbine<br />
- Electronics passive cooling<br />
26
ANSYS Fluent Use of NVIDIA Solver Toolkit<br />
ANSYS Fluent 15.0 will offer a <strong>GPU</strong>-based AMG solver (Nov/Dec 2013)<br />
Developed with support for MPI across multiple nodes and multiple <strong>GPU</strong>s<br />
Solver collaboration on pressure-based coupled Navier-Stokes, others to follow<br />
Early results published at Parallel <strong>CFD</strong> 2013, 20-24 May, Changsha, CN<br />
<strong>GPU</strong>-Accelerated Algebraic Multigrid for Applied <strong>CFD</strong><br />
27
ANSYS Fluent CPU Profile for Coupled Solver<br />
Non-linear iterations:<br />
Assemble linear system of equations: ~35% of runtime<br />
Solve linear system of equations Ax = b: ~65% of runtime (accelerate this first)<br />
Converged? If no, iterate again; if yes, stop<br />
28
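The profile above invites a quick Amdahl's-law estimate (a sketch; the ~35%/~65% split is taken from the slide, the function name is invented). Accelerating only the solver portion bounds the whole-job speedup, which is why the linear solve is accelerated first and why moving more tasks to the GPU matters later.

```python
# Amdahl's law applied to the coupled-solver profile: only the solver
# fraction of the runtime is accelerated, the assembly phase is not.

def overall_speedup(solver_fraction, solver_speedup):
    """Whole-job speedup when only the solver portion is accelerated."""
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / solver_speedup)

speedup_5x = overall_speedup(0.65, 5.0)    # solver runs 5x faster on the GPU
ceiling = overall_speedup(0.65, 1e12)      # limit of an infinitely fast solver
```

With the solver at 65% of runtime, a 5x solver gives only about 2.1x on the job, and even an infinitely fast solver caps out near 1/0.35, roughly 2.9x. That gap between solver speedup and job speedup also shows up in the OpenFOAM results later in the deck.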
ANSYS Fluent 14.5 <strong>GPU</strong> Solver Convergence<br />
nvAMG Preview of ANSYS Fluent Convergence Behavior<br />
[Plot: error residuals from 1.0E+00 down to 1.0E-08 vs. iteration number (1 to 141) for continuity and X/Y/Z-momentum equations, NVAMG vs. FLUENT curves]<br />
Numerical results, Mar 2012: test for convergence at each iteration matches precise Fluent behavior<br />
Model FL5S1: incompressible, flow in a bend, 32K hex cells, coupled solver<br />
29
ANSYS Fluent 14.5 <strong>GPU</strong> Acceleration<br />
Preview of ANSYS Fluent 14.5 Performance – by ANSYS, Aug 2012<br />
[Chart: AMG solver time in seconds, lower is better; Helix model; dual-socket CPU vs. dual-socket CPU + Tesla C2075]<br />
2 x Xeon X5650, only 1 core used: 2832 sec<br />
2 x Xeon X5650, all 12 cores used: 933 sec<br />
CPU + Tesla C2075: 517 sec (5.5x vs. 1 core, 1.8x vs. 12 cores)<br />
Helix geometry: 1.2M tet cells, unsteady, laminar, coupled PBNS, DP<br />
AMG F-cycle on CPU; AMG V-cycle on <strong>GPU</strong><br />
NOTE: all jobs solver time only<br />
30
ANSYS Fluent with <strong>GPU</strong>-Based AMG Solver<br />
ANSYS Fluent 14.5 Performance – Results by NVIDIA, Nov 2012<br />
[Chart: AMG solver time per iteration in seconds, lower is better; airfoil and aircraft models with hexahedral cells; Tesla K20X vs. 2 x Core-i7 3930K, only 6 cores used]<br />
Airfoil (hex 784K): 2.4x; Aircraft (hex 1798K): 2.4x<br />
Solver settings:<br />
CPU Fluent solver: F-cycle, agg8, DILU, 0pre, 3post<br />
<strong>GPU</strong> nvAMG solver: V-cycle, agg8, MC-DILU, 0pre, 3post<br />
NOTE: times for solver only<br />
31
<strong>GPU</strong>s and Distributed Cluster Computing<br />
Geometry decomposed on the CPU into partitions 1 - 4; partitions put on independent cluster nodes N1 - N4; CPU distributed parallel processing<br />
Nodes run distributed parallel using MPI and combine for the global solution<br />
32
<strong>GPU</strong>s and Distributed Cluster Computing<br />
Geometry decomposed on the CPU into partitions; partitions put on independent cluster nodes N1 - N4; CPU distributed parallel processing<br />
Execution on CPU + <strong>GPU</strong>: each node N1 - N4 drives a <strong>GPU</strong> G1 - G4<br />
<strong>GPU</strong>s run shared-memory parallel using OpenMP under the distributed parallel level; nodes combine for the global solution<br />
33
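The decomposition pictured above can be sketched in miniature (an illustrative Python sketch, not vendor code; the 1D cell row, partition counts, and function names are invented). Each partition owns a contiguous block of cells and keeps halo copies of its neighbors' boundary cells, the values that MPI would exchange between nodes every step.

```python
# Toy domain decomposition: split cells across "nodes", then exchange halos.

def partition(cells, nparts):
    """Split a cell list into nparts contiguous, near-equal partitions."""
    size, rem = divmod(len(cells), nparts)
    parts, start = [], 0
    for p in range(nparts):
        stop = start + size + (1 if p < rem else 0)
        parts.append(cells[start:stop])
        start = stop
    return parts

def exchange_halos(parts):
    """Each partition receives its neighbors' edge cells (like MPI sendrecv)."""
    halos = []
    for p in range(len(parts)):
        left = parts[p - 1][-1] if p > 0 else None
        right = parts[p + 1][0] if p < len(parts) - 1 else None
        halos.append((left, right))
    return halos

parts = partition(list(range(10)), 4)   # 10 cells over 4 nodes
halos = exchange_halos(parts)
```

In the slide's hybrid scheme this MPI level stays untouched, and each partition's interior work is then offloaded to that node's GPU, which is how the existing MPI investment is preserved.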
ANSYS Fluent for 3.6M Cell Aerodynamic Case<br />
Multi-<strong>GPU</strong> acceleration of 16-core ANSYS Fluent 15.0 (preview), external aero<br />
2.9x solver speedup: Xeon E5-2667 + 4 x Tesla K20X <strong>GPU</strong>s<br />
CPU configuration: 16-core server node (2 x 8 cores)<br />
CPU + <strong>GPU</strong> configuration: the same node plus 4 <strong>GPU</strong>s (G1 - G4)<br />
34
ANSYS Fluent for 14M Cell Aerodynamic Case<br />
ANSYS Fluent 15.0 Preview Performance – Results by NVIDIA, Jun 2013<br />
[Chart: AMG solver time per iteration in seconds, lower is better; truck body model; Intel Xeon E5-2667, 2.90GHz vs. the same CPUs + Tesla K20X]<br />
1 x node, 2 CPUs (12 cores total): 69 sec<br />
2 x nodes, 4 CPUs (24 cores total), 8 <strong>GPU</strong>s (4 per node): 41 sec CPU-only, 12 sec with <strong>GPU</strong>s (3.5x)<br />
4 x nodes, 8 CPUs (48 cores total), 16 <strong>GPU</strong>s (4 per node): 28 sec CPU-only, 9 sec with <strong>GPU</strong>s (3.3x)<br />
Truck body model: 14M mixed cells, DES turbulence, coupled PBNS, SP<br />
Times for 1 iteration; AMG F-cycle on CPU; <strong>GPU</strong>: preconditioned FGMRES with AMG<br />
NOTE: all jobs solver time only<br />
35
Agenda: <strong>GPU</strong> Acceleration for Applied <strong>CFD</strong><br />
Overview of <strong>GPU</strong> Progress for <strong>CFD</strong><br />
<strong>GPU</strong> Acceleration of ANSYS Fluent<br />
<strong>GPU</strong> Acceleration of OpenFOAM<br />
36
2013: Further Expansion of OF Community<br />
ESI acquired Open<strong>CFD</strong> from SGI during Sep 2012<br />
IDAJ acquired a majority stake in ICON during May 2013<br />
This year there are 3 (up from 2) global OpenFOAM user events:<br />
APR 24 – 26, Frankfurt, DE: ESI OpenFOAM Users <strong>Conference</strong> (first ever)<br />
http://www.esi-group.com/corporate/events/2013/OpenFOAM2013<br />
Concentration on OpenFOAM from Open<strong>CFD</strong><br />
JUN 11 – 14, Jeju, KR: 8th International OpenFOAM Workshop (first in Asia)<br />
http://www.openfoamworkshop2013.org/<br />
Concentration on OpenFOAM-extend and Wikki<br />
OCT 24 – 25, Hamburg, DE: 7th Open Source <strong>CFD</strong> International <strong>Conference</strong> (ICON)<br />
http://www.opensourcecfd.com/conference2013/<br />
Concentration on both OpenFOAM and OpenFOAM-extend<br />
37
NVIDIA Market Strategy for OpenFOAM<br />
Provide technical support for commercial <strong>GPU</strong> solver developments<br />
FluiDyna Culises AMG solver library using NVIDIA toolkit<br />
Vratis Speed-IT library, development of CUSP-based AMG<br />
Alliances (but no development) with key OpenFOAM organizations<br />
ESI and Open<strong>CFD</strong> Foundation (H. Weller, M. Salari)<br />
Wikki and OpenFOAM-extend community (H. Jasak)<br />
IDAJ in Japan and ICON in the UK – support of both OF and OF-ext<br />
Conduct performance studies and customer benchmark evaluations<br />
Collaborations: developers, customers, OEMs (Dell, SGI, HP, etc.)<br />
38
Culises: <strong>CFD</strong> Solver Library for OpenFOAM<br />
Culises easy-to-use AMG-PCG solver:<br />
#1. Download and license from http://www.FluiDyna.de<br />
#2. Automatic installation with FluiDyna-provided script<br />
#3. Activate Culises and <strong>GPU</strong>s with 2 edits to the config-file<br />
[Screenshots: config-file CPU-only; config-file CPU+<strong>GPU</strong>]<br />
FluiDyna: TU Munich spin-off from 2006<br />
Culises provides a linear solver library<br />
Culises requires only two edits to the control file of OpenFOAM<br />
Multi-<strong>GPU</strong> ready<br />
Contact FluiDyna for license details<br />
www.fluidyna.de<br />
39
Culises Coupling to OpenFOAM<br />
Culises coupling is user-transparent<br />
www.fluidyna.de<br />
40
OpenFOAM Speedups Based on <strong>CFD</strong> Application<br />
<strong>GPU</strong> speedups for different industry cases:<br />
Range of model sizes and different solver schemes (Krylov, AMG-PCG, etc.)<br />
Automotive: 1.6x; Multiphase: 1.9x; Thermal: 3.0x; Pharma <strong>CFD</strong>: 2.2x; Process <strong>CFD</strong>: 4.7x<br />
[Chart compares job speedup, solver speedup, and OpenFOAM CPU-only efficiency per case]<br />
www.fluidyna.de<br />
41
FluiDyna Culises: <strong>CFD</strong> Solver for OpenFOAM<br />
Culises: A Library for Accelerated <strong>CFD</strong> on Hybrid <strong>GPU</strong>-CPU Systems<br />
Dr. Bjoern Landmann, FluiDyna<br />
developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0293-GTC2012-Culises-Hybrid-<strong>GPU</strong>.pdf<br />
DrivAer: joint car body shape by BMW and Audi<br />
http://www.aer.mw.tum.de/en/research-groups/automotive/drivaer<br />
Mesh size (all on 2 CPUs): 9M / 18M / 36M cells<br />
Added <strong>GPU</strong>s: +1 <strong>GPU</strong> / +2 <strong>GPU</strong>s / +4 <strong>GPU</strong>s<br />
Solver speedup: 2.5x / 4.2x / 6.9x<br />
Job speedup: 1.36x / 1.52x / 1.67x<br />
Solver speedup of 7x for 2 CPU + 4 <strong>GPU</strong>: 36M cells (mixed type); GAMG on CPU; AMGPCG on <strong>GPU</strong><br />
www.fluidyna.de<br />
42
Conclusions For Applied <strong>CFD</strong> on <strong>GPU</strong>s<br />
<strong>GPU</strong>s provide significant speedups for solver-intensive jobs<br />
Improved product quality with higher fidelity modeling<br />
Shorten product engineering cycles with faster simulation turnaround<br />
Simulations recently considered impractical now possible<br />
Unsteady RANS, Large Eddy Simulation (LES) practical in cost and time<br />
Effective parameter optimization from large increase in number of jobs<br />
43
Stan Posey<br />
NVIDIA, Santa Clara, CA, USA; sposey@nvidia.com