
CUDA Fluid Simulation in NVIDIA PhysX

Mark Harris

NVIDIA Developer Technology

NVIDIA PhysX

• Game physics API and engine

• Uses GPU and CPU

• Rigid Bodies

• Destruction, cluttered environments, ragdolls, vehicles

• Deformables

• Clothing, meaty chunks, organic creatures

• Fluids (as well as simple particle systems)

• Fluid-emitting weapons, debris effects

• Focus of this talk

Fluid feature requirements

• Simulate a fluid that…

• Doesn’t pass through rigid objects

• Conserves volume

• so puddles and splashes form when objects are hit

• Can move and be moved by rigid bodies

• Can push objects, and objects can float in bodies of water

• Can flow anywhere in a large environment

• Not contained to a small box

• Also: multiple independent fluids per scene

• Efficient parallel multi-fluid simulation

NVIDIA PhysX Fluid Demo

• Available for download on the web:

http://www.nvidia.com/content/graphicsplus/us/download.asp

Particle-Based Fluids

• Particle systems are simple and fast

• Without particle-particle interactions

• Can be used for spray, splashing, leaves, debris, sparks, etc.

• With particle-particle interactions

• Can be used for volumetric fluid simulation

Simple Particle Systems

• Particles store mass m_i, position x_i, velocity v_i, age, etc.

• Integrate:

d/dt x_i = v_i

d/dt v_i = f_i / m_i

• Generated by emitters, deleted when age > lifetime
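The update rule above can be sketched as a simple Euler step on a list of particles. This is an illustrative CPU sketch, not PhysX code; the particle record layout and the `lifetime` parameter are assumptions for the example.

```python
# Illustrative sketch of a simple (non-interacting) particle system update.
# Each particle stores position, velocity, and age; with f_i = m_i * g the
# acceleration is just g, so mass cancels for a gravity-only force.

def step_particles(particles, dt, gravity=(0.0, -9.8, 0.0), lifetime=2.0):
    """Integrate v then x (symplectic Euler); drop particles older than lifetime."""
    alive = []
    for p in particles:
        gx, gy, gz = gravity
        p["vel"] = (p["vel"][0] + gx * dt,
                    p["vel"][1] + gy * dt,
                    p["vel"][2] + gz * dt)
        p["pos"] = tuple(x + v * dt for x, v in zip(p["pos"], p["vel"]))
        p["age"] += dt
        if p["age"] <= lifetime:   # deleted when age > lifetime
            alive.append(p)
    return alive
```

An emitter would simply append fresh particles to the list each frame.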

Particle-Particle Interaction

• Fluid simulation with particles requires inter-particle forces

• O(n²) potential computations for n particles!

• Reduce to linear complexity O(n) by defining an interaction cutoff distance h

Spatial Hashing

• Fill particles into a grid with spacing h

• Only search for potential neighbors in adjacent cells

• Map cells [i,j,k] into a 1D array via a hash function h(i,j,k)

• [Teschner03]
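A minimal sketch of such a hash function, using the three large primes given in [Teschner03] (the `cell_of` helper and table size are assumptions for the example):

```python
# Spatial hash in the style of [Teschner03]: map cell (i, j, k) to a 1D
# table index by XORing the coordinates times large primes, mod table size.

P1, P2, P3 = 73856093, 19349663, 83492791  # primes from [Teschner03]

def cell_of(pos, h):
    """Grid cell containing position `pos`, for cell spacing h."""
    return tuple(int(c // h) for c in pos)

def hash_cell(cell, table_size):
    i, j, k = cell
    # Python's % is non-negative for a positive modulus, so negative
    # cell coordinates hash safely without extra handling.
    return ((i * P1) ^ (j * P2) ^ (k * P3)) % table_size
```

Neighbor search then hashes the 27 adjacent cells and scans only the particles stored in those buckets.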

Navier-Stokes Equations

ρ(∂v/∂t + v·∇v) = ρg − ∇p + μ∇²v

• v·∇v: inertia (velocity advects velocity)

• ρg: gravity

• −∇p: pressure gradient (fluid moves from high pressure to low)

• μ∇²v: viscosity

∇·v = 0

• Conservation of mass: guaranteed by particles

Equation of Motion

ρ(∂v/∂t + v·∇v) = ρg − ∇p + μ∇²v

• Because particles follow the fluid, we have:

Dv/Dt = ∂v/∂t + v·∇v

• so the equation becomes ρ Dv/Dt = ρg − ∇p + μ∇²v = f

• Thus the acceleration of particle i is just a_i = dv_i/dt = f_i / ρ_i

• where f_i is the body force evaluated at x_i

Solving the Navier-Stokes Equations

• How do we formulate fluids on particles?

• We need to evaluate continuous fields (e.g. v(x))

• Only have v_i sampled on particles

• Basic idea:

• Particles induce smooth local fields

• Global field is the sum of the local fields

• Smoothed Particle Hydrodynamics

SPH

• Invented for the simulation of stars [Monaghan 92]

• Our implementation is based on [Müller 03]

• Use a scalar kernel function W(r), placed on each particle: W_i(x) = W(x − x_i), with r = |x − x_i|

• Normalized: ∫ W_i(x) dx = 1

• Example [Müller 03]:

W(r, h) = 315 / (64π h⁹) · (h² − r²)³,  for 0 ≤ r ≤ h (0 otherwise)
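As a sanity check, this kernel can be evaluated and its normalization verified numerically. A minimal sketch (illustrative only; the quadrature scheme is an assumption of the example):

```python
import math

def w_poly6(r, h):
    """Kernel from [Müller 03]: W(r,h) = 315/(64π h⁹)(h² − r²)³ for 0 ≤ r ≤ h."""
    if r < 0.0 or r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h**9) * (h * h - r * r) ** 3

def integral_over_r3(h, n=2000):
    """Check ∫ W dV = 4π ∫₀ʰ W(r) r² dr by the midpoint rule; should be ≈ 1."""
    dr = h / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        total += w_poly6(r, h) * 4.0 * math.pi * r * r * dr
    return total
```

The normalization is what makes the density sums on the next slides conserve total mass.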

Density Computation

• Global density field:

ρ(x) = Σ_j m_j W(x − x_j)

• Density of each particle: ρ_i = ρ(x_i)

• Mass conservation is guaranteed if W is normalized:

∫ ρ(x) dx = Σ_j m_j ∫ W(x − x_j) dx = Σ_j m_j
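The density sum can be sketched directly; a brute-force O(n²) version for clarity (the real implementation iterates only over grid neighbors, and this is not PhysX code):

```python
import math

def w_poly6(r, h):
    # Kernel from [Müller 03]; zero outside the support radius h.
    if r < 0.0 or r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h**9) * (h * h - r * r) ** 3

def densities(positions, masses, h):
    """rho_i = sum_j m_j * W(|x_i - x_j|, h), brute force over all pairs."""
    out = []
    for xi in positions:
        rho = 0.0
        for xj, mj in zip(positions, masses):
            r = math.dist(xi, xj)
            rho += mj * w_poly6(r, h)   # includes the self-term j = i
        out.append(rho)
    return out
```

Note the self-contribution (j = i) is included, so an isolated particle still has density m·W(0).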

Pressure

• The pressure term yields:

f_i^pressure = −∇p(x_i) = −Σ_j m_j (p_j / ρ_j) ∇W(x_i − x_j)

• Symmetrize (SPH problem: action ≠ reaction):

f_i^pressure = −Σ_j m_j ((p_i + p_j) / (2ρ_j)) ∇W(x_i − x_j),  where p_i = k(ρ_i − ρ_0)

• k is the gas constant (stiffness), ρ_0 is the desired rest density

• Other state laws are possible [Becker 07]

Remaining forces

• External force, e.g. gravity:

f_i^external = ρ_i g

• Viscosity (symmetrized):

f_i^viscosity = μ Σ_j m_j ((v_j − v_i) / ρ_j) ∇²W(x_i − x_j)
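The symmetrized pressure and viscosity sums can be sketched together. This uses the "spiky" gradient and viscosity Laplacian kernels from [Müller 03]; it is an illustrative O(n²) CPU sketch, not the PhysX implementation:

```python
import math

def grad_w_spiky(dx, h):
    """Gradient of the spiky kernel [Müller 03]: dW/dr = -45/(π h⁶)(h-r)², along dx/r."""
    r = math.sqrt(sum(c * c for c in dx))
    if r <= 1e-12 or r > h:
        return (0.0, 0.0, 0.0)
    coeff = -45.0 / (math.pi * h**6) * (h - r) ** 2 / r
    return tuple(coeff * c for c in dx)

def lap_w_visc(r, h):
    """Laplacian of the viscosity kernel [Müller 03]: 45/(π h⁶)(h - r)."""
    return 45.0 / (math.pi * h**6) * (h - r) if 0.0 <= r <= h else 0.0

def forces(pos, vel, mass, rho, h, k, rho0, mu):
    """Symmetrized SPH pressure + viscosity force on each particle."""
    n = len(pos)
    p = [k * (rho[i] - rho0) for i in range(n)]        # p_i = k(rho_i - rho_0)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = tuple(a - b for a, b in zip(pos[i], pos[j]))
            r = math.sqrt(sum(c * c for c in dx))
            if r > h:
                continue
            gw = grad_w_spiky(dx, h)
            cp = -mass[j] * (p[i] + p[j]) / (2.0 * rho[j])   # pressure
            cv = mu * mass[j] * lap_w_visc(r, h) / rho[j]    # viscosity
            for a in range(3):
                f[i][a] += cp * gw[a] + cv * (vel[j][a] - vel[i][a])
    return f
```

Because the pressure term is symmetrized, the pair forces come out equal and opposite, which is exactly the action = reaction property the slide calls out.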

SPH algorithm

• Build a spatial data structure on the particles

• To enable fast neighbor finding

• For each particle

• find neighbors and compute density

• For each particle

• find neighbors, compute force and update velocity

• For each particle

• find neighboring triangles, check collisions, update position

Parallel SPH algorithm

• Build a spatial data structure on the particles (in parallel)

• To enable fast neighbor finding

• For all particles in parallel

• find neighbors and compute density

• For all particles in parallel

• find neighbors, compute force and update velocity

• For all particles in parallel

• find neighboring triangles, check collisions, update position

GPU Algorithm Goals

• For best GPU performance, we need

• Thousands of threads to cover memory latency

• Minimal communication/synchronization between threads

• A similar amount of work per thread

• High arithmetic intensity

SPH Algorithm properties

• Explicit integration, accept some compressibility

• Each particle is independent, so it can be given its own thread

• No synchronization or read/write communication between threads

• Particles have many neighbors

• High arithmetic intensity

• Variable number of neighbors

• Threads take different amounts of time

• However, there is coherence in the workload

• Pair computation is quite complex

• Register-intensive

Spatial Data Structures

• Must enable sparse fluids in large environments

• Must quickly find all neighbors of each particle

• Within a given radius

• Options

• KD-tree

• Octree

• Hashed uniform grid

Hashed Uniform Grid

• Abstract, infinite 3D grid mapped to a concrete, finite grid

• Concrete grid = 64×64×64 uniform grid

• Abstract (x,y,z) maps to concrete (x%64, y%64, z%64)

• Neighbor finding is fast (assuming no collisions)

• Abstract neighbor cell (x+dx, y+dy, z+dz) maps to concrete ((x+dx)%64, (y+dy)%64, (z+dz)%64)

• Like hashing, but with an intuitive hash function

• Can support multiple fluids by storing a fluid ID in the hash
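The abstract-to-concrete mapping is just a per-axis modulo; a minimal sketch (the row-major flattening order below is an assumption of the example, not stated on the slide):

```python
GRID = 64  # concrete grid is 64 x 64 x 64

def concrete_cell(ax, ay, az):
    """Map an abstract (unbounded, possibly negative) cell onto the concrete grid.
    Python's % is already non-negative for a positive modulus; in C you would
    need an extra adjustment for negative coordinates."""
    return (ax % GRID, ay % GRID, az % GRID)

def concrete_index(cell):
    """Flatten a concrete (x, y, z) cell into a 1D index in [0, 64³)."""
    x, y, z = cell
    return (z * GRID + y) * GRID + x
```

Distant abstract cells can alias to the same concrete cell; the SPH cutoff distance h makes those false "neighbors" harmless, as a later slide notes.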

Building the Hashed Grid

• For all particles in parallel

• Compute hash key (cell index) from position

• Sort particle (key, particleID) pairs by hash key

• Use a fast GPU radix sort

• Use the sorted order to re-order the particle position and velocity arrays

• Sorted arrays ensure coherent memory access

• Store the starting (sorted) particle index for each grid cell

• Some waste for large, sparse grids
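The build steps above can be sketched serially; on the GPU each step runs as a parallel kernel and the sort is a radix sort. The cell-key layout here is an assumption for illustration:

```python
def build_grid(positions, h, grid=64):
    """CPU sketch of the sort-based grid build (not PhysX code)."""
    def key_of(p):
        cx, cy, cz = (int(c // h) % grid for c in p)
        return (cz * grid + cy) * grid + cx

    # 1. Compute (key, particleID) pairs.
    pairs = [(key_of(p), i) for i, p in enumerate(positions)]
    # 2. Sort by hash key (a radix sort on the GPU).
    pairs.sort(key=lambda kv: kv[0])
    # 3. Re-order positions for coherent access, and record where each
    #    cell's run of particles starts in the sorted array.
    sorted_pos = [positions[i] for _, i in pairs]
    cell_start = {}
    for idx, (key, _) in enumerate(pairs):
        cell_start.setdefault(key, idx)
    return pairs, sorted_pos, cell_start
```

A cell's particles are then the contiguous run beginning at `cell_start[key]`, which is what makes the per-cell lookups coherent.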

Hashed Grid: Array of Lists

Grid Cell:  0        1             2   3         4    5   …
Particles:  p23 p3   p17 p97 p0    –   p133 p2   p9   p42
Count:      2        3             0   2         1    1
Offset:     0        2             5   5         7    8

Storage: p23 p3 p17 p97 p0 p133 p2 p9 p42

Radix Sort

• Fast CUDA radix sort for variable-length keys

• Advantage: many fluids can be sorted at once

• Put the fluid ID in the MSBs, the cell index in the LSBs

• With a 64³ grid, 4 fluids = 20-bit key, 64 fluids = 24-bit key

• “Designing Efficient Sorting Algorithms for Manycore GPUs”. N. Satish, M. Harris, and M. Garland. To appear in Proceedings of IEEE International Parallel and Distributed Processing Symposium 2009.
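The key packing works out as follows: a 64³ grid needs 18 cell-index bits, and the fluid ID occupies however many bits sit above that. A minimal sketch:

```python
CELL_BITS = 18  # a 64 x 64 x 64 grid has 2^18 cells

def key_bits(num_fluids):
    """Total key width: 18 cell-index bits plus enough bits for the fluid ID.
    4 fluids -> 20-bit keys, 64 fluids -> 24-bit keys, matching the slide."""
    return CELL_BITS + max(1, (num_fluids - 1).bit_length())

def pack_key(fluid_id, cell_index):
    """Fluid ID in the MSBs, cell index in the LSBs: one radix sort on these
    keys groups each fluid together, with its particles in cell order."""
    return (fluid_id << CELL_BITS) | cell_index
```

Sorting short keys is cheaper, which is why a variable-key-length radix sort pays off here.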

Mapping SPH to hardware

• Each particle is assigned a thread

• The thread queries the grid to find its neighbors

• Due to the virtual grid, some “neighbors” may be far away

• SPH smoothing ensures they don’t have an effect

• Can early-exit the density/pressure calculation if the distance is too large

• Hasn’t been a performance issue yet for PhysX

Computing Density and Force

• For each particle position x_i, use the grid to iterate over neighbor particles j and compute the density:

ρ(x_i) = Σ_j m_j W(x_i − x_j)

• For each particle position x_i, use the grid to iterate over neighbor particles j and compute the force:

f_i^pressure = −Σ_j m_j ((p_i + p_j) / (2ρ_j)) ∇W(x_i − x_j),  where p_i = k(ρ_i − ρ_0)

• Other forces:

• viscosity is computed along with pressure, gravity is trivial

Collision

• Dynamic shape collisions

• Test against dynamic boxes, capsules and convexes

• Compute dynamic collision restitution

• Static shape collisions

• Test against static boxes, capsules, and convexes

• Compute static collision restitution

• Triangle mesh (e.g. terrain) collisions

• Find collisions of particles with static meshes

• Compute static collision restitution

Parallelizing Collision Detection

• Host broad phase generates collision work units:

• Particle “packets” + potentially colliding objects (PCOs)

• One CUDA thread block per work unit, one thread per particle

• Test (continuous) collision with each PCO

• Uniform per-thread work

• Convex shapes are slightly trickier due to the variable number of planes

• But still uniform work across threads in a work unit

Fluid Rendering

Screen Space Mesh Rendering

• Similar to [Müller 2007], without explicit meshing

• Setup:

• Render the scene to a color texture

• Render particles to a depth texture as sphere point sprites

• Render particle thickness to a thickness texture with additive blending

• Smooth the depth texture to avoid a “particle appearance”

• Compute a surface normal texture

Final rendering

• Use the color, depth and thickness buffers:

C_out = a(1 − F(n·v)) + b F(n·v) + k_s (n·h)^s

• Terms: Fresnel refraction, Fresnel reflection, Phong highlight

• with

a = lerp(C_fluid, S(x + n_x, y + n_y), e^(−T(x, y)))

• S = scene color texture; T(x, y) = thickness texture; n = normal texture computed from the smoothed depth

Smoothing the depth buffer

• Initially tried Gaussian and bilateral Gaussian filters

• Gaussian kernels produce leaking

• Bilateral filters are non-separable (expensive)

• The released PhysX fluid demo uses a Gaussian blur

• Fluid looks slightly bumpy, like “tapioca”

• Look at the problem a different way:

• Need to smooth out sudden changes in curvature

• Seems natural, since this is what surface tension does

Screen-space Curvature Flow

• Curvature flow [Malladi and Sethian 1995]

• Evolve the surface along its normal direction with speed determined by the mean surface curvature

• Screen-space curvature flow moves the surface only in Z

• Hence screen-space, since it modifies depth only

• Works well in practice

• “Screen Space Fluid Rendering with Curvature Flow”. Wladimir van der Laan, Simon Green, and Miguel Sainz. To appear in Proceedings of I3D 2009.
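The idea can be illustrated on a 1D depth profile: each sample moves in z at a speed given by the local curvature. This is only a 1D toy (the real technique runs on the 2D depth buffer with mean curvature, in a shader); the step size and iteration count are assumptions:

```python
def curvature_flow_1d(z, dt=0.1, iterations=60):
    """1D toy version of curvature flow on a depth profile:
    dz/dt = kappa, with kappa = z'' / (1 + z'^2)^(3/2)."""
    z = list(z)
    for _ in range(iterations):
        nz = z[:]
        for i in range(1, len(z) - 1):
            zx = 0.5 * (z[i + 1] - z[i - 1])          # first derivative
            zxx = z[i + 1] - 2.0 * z[i] + z[i - 1]    # second derivative
            kappa = zxx / (1.0 + zx * zx) ** 1.5
            nz[i] = z[i] + dt * kappa
        z = nz
    return z
```

Unlike a Gaussian blur, the motion is driven by curvature, so flat regions are left alone and only the bumpy “particle appearance” is evolved away.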

Fluid Rendering

• Iterative screen-space curvature flow smoothing

(Figures: curvature at iteration 0, 40, and 60; Gaussian-based smoothing vs. curvature flow smoothing, with and without Perlin noise)

Acknowledgements

• PhysX Fluids project contributors (direct and indirect)

• NVIDIA PhysX team

• Simon Schirm: fluid software architect

• NVIDIA Developer Technology team

• NVIDIA CUDA Software team

• NVIDIA Research

• Richard Tonge, Simon Green, Miguel Sainz, Matthias Müller-Fischer

• slides and content

Questions?

mharris@nvidia.com

References

• [Becker 07] M. Becker and M. Teschner. “Weakly compressible SPH for free surface flows”. SCA 07

• [Monaghan 92] J. J. Monaghan. “Smoothed Particle Hydrodynamics”. Annual Review of Astronomy and Astrophysics, 30:543-574, 1992

• [Müller 03] Müller, M., et al. “Particle-based Fluid Simulation for Interactive Applications”. SCA 03

• [Müller 07] Müller, M., Schirm, S., and Duthaler, S. “Screen-space Meshes”. SCA 07

• [Satish 09] Satish, N., Harris, M., and Garland, M. “Designing Efficient Sorting Algorithms for Manycore GPUs”. Proceedings of IEEE International Parallel and Distributed Processing Symposium (IPDPS) 2009 (to appear).

• [Teschner03] M. Teschner et al. “Optimized Spatial Hashing for Collision Detection of Deformable Objects”. VMV 03

• [van der Laan 09] van der Laan, W., Green, S., and Sainz, M. “Screen Space Fluid Rendering with Curvature Flow”. I3D 2009 (to appear).