GPU Acceleration in LAMMPS

Minimizing the Code Burden

Disadvantages
• For some simulations, the upper bound for performance improvements due to porting additional routines and removing data transfer is as high as 50% (for up to 1 million particles on a single GPU with current hardware).
• Increasing the number of MPI processes can impact interprocess communication performance.
• Multiple MPI processes sharing a GPU currently results in the execution of more kernels, each with a smaller amount of work.
• Parallelization of the CPU routines with OpenMP would likely be a significant effort for efficient simulations in LAMMPS.

The USER-CUDA package is now available in LAMMPS as an alternative library for acceleration. In some cases, the entire simulation can be run on the GPU.
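The point about multiple MPI processes sharing a GPU can be illustrated with simple arithmetic. The sketch below is not LAMMPS code. The helper name `kernel_workload`, the even split of particles across processes, and the one-kernel-per-process-per-step default are all assumptions made for illustration.

```python
def kernel_workload(n_particles, procs_per_gpu, kernels_per_step=1):
    """Return (kernel launches per step on one GPU, particles per launch).

    Assumes particles are divided evenly among the MPI processes that share
    the GPU and that each process launches its own kernels. Both are
    simplifications, not a description of the actual implementation.
    """
    if procs_per_gpu < 1:
        raise ValueError("procs_per_gpu must be at least 1")
    launches = procs_per_gpu * kernels_per_step
    particles_per_launch = n_particles / procs_per_gpu
    return launches, particles_per_launch


if __name__ == "__main__":
    # 1 million particles on a single GPU, the upper size mentioned above.
    for procs in (1, 2, 4, 8):
        launches, per_launch = kernel_workload(1_000_000, procs)
        print(f"{procs} process(es) per GPU: {launches} launch(es), "
              f"{per_launch:.0f} particles each")
```

Under these assumptions the total work is unchanged, but it arrives as more, smaller launches, which is the trade-off the bullet describes.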
“Data transfer is the bottleneck when using GPU acceleration”
• If neighbor list builds are performed on the GPU, data transfer can be a small fraction of the total simulation time.
• For this reason, accelerated neighbor list builds can be important despite the relatively poor performance on the GPU.
• For the rhodopsin benchmark, data transfer is less than 6% of the GPU calculation times.
• This will be a smaller percentage of the total simulation time, because the calculation of bonded forces, time integration with a thermostat/barostat, SHAKE, and the Poisson solve must be calculated somewhere and we need to do MPI comm.
• In some cases, can benefit from overlapping data transfer with computation.

[Figure: percentage breakdown (0% to 100%) into Force Interpolation, Short Range Force, Data Pack, Charge Assignment, Neighbor Build, and Data Transfer, plotted for 1, 2, 4, 8, 16, and 32 GPUs/CPU Cores]
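The reasoning in the bullets above can be worked through numerically. The sketch below assumes a simple serial timing model in which transfer time is counted in addition to the GPU calculation time and all other work (bonded forces, time integration, SHAKE, the Poisson solve, MPI communication) adds to the total. The function names and the example timings are hypothetical; only the 6% figure comes from the text.

```python
def transfer_share_of_total(transfer_over_gpu, gpu_time, other_time):
    """Fraction of total time spent on host-device data transfer.

    transfer_over_gpu: transfer time as a fraction of GPU calculation time
    gpu_time: GPU calculation time (any unit)
    other_time: time for all remaining work (same unit)
    Assumes the three parts run one after another with no overlap.
    """
    transfer_time = transfer_over_gpu * gpu_time
    return transfer_time / (gpu_time + transfer_time + other_time)


def step_time(transfer_time, compute_time, overlapped):
    """Idealized time for one transfer plus one computation.

    With perfect overlap the shorter operation is hidden behind the longer
    one; without overlap the two times add.
    """
    if overlapped:
        return max(transfer_time, compute_time)
    return transfer_time + compute_time


if __name__ == "__main__":
    # Hypothetical split: other work takes as long as the GPU calculation.
    share = transfer_share_of_total(0.06, gpu_time=1.0, other_time=1.0)
    print(f"transfer share of total time: {share:.1%}")
    print("serial:", step_time(0.06, 1.0, overlapped=False))
    print("overlapped:", step_time(0.06, 1.0, overlapped=True))
```

In this model the transfer share of the total can only shrink as more non-GPU work is added, and ideal overlap removes the transfer cost whenever the computation takes longer than the transfer.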
