Multi-Threaded Fluid Simulation for Games
By Hugh Smith and Jeff Freeman
Along with mentioning a strange rash, a sure way to kill
a conversation is to bring up the topic of Computational
Fluid Dynamics (CFD). But for those who wish to create
fantastic-looking clouds, explosions, smoke, and other
game effects, nothing could be more exciting! CFD is the
section of fluid mechanics that finds numerical solutions to
the equations describing the behavior of fluids, both liquid
and gaseous, and their interaction with various forces.
Until recent hardware advances, even approximating a
solution was too computationally intense for real-time
applications. However, the last year or two has shown
a small but increasing amount of research and articles,
as well as the release of some commercial products that
deal with simulating fluids, not only in real time but in a
video game environment. With the Kaboom project, we’ve
created a modular, threaded CFD simulation that can be
dropped into an existing project, hooked up to a rendering
engine, and used to generate realistic, real-time fluid
dynamics simulations, suitable for smoke or water.
We observed that many games are not reaching their
full potential because they’re unable to use all of the
available CPU resources. Frequently, this is because they
are not properly threaded for multi-core architectures.
Rather than run the simulation on the GPU, we decided
to use those extra cycles to produce a basic, real-time
fluid simulation. By performing the simulation on the CPU
as opposed to the GPU, the simulation has fast and easy
access to all game resources (such as mesh information)
and the game has access to the simulation results. This also
leaves the GPU free to handle the graphics.
As we continued our research we found that although
interest in fluid simulations was high, little concrete
material was available, especially in the way of code and
examples. We were unable to find examples of 3D or
multi-threaded solvers. Most of the articles described ways
to solve the equations but did not use a multi-threaded
approach. Despite this, there seemed to be a lot of interest
in the topic, so we decided that another goal of the project
would be to make the resulting code modular and thereby
fairly simple for someone to integrate into their own
project. Developers can also use and expand upon the set
of threaded, modular routines that we created for 3D CFD.
MULTI-THREADED FLUID SIMULATION FOR GAMES
The code is based on an article and sample code written
by Mick West that was recently posted on Gamasutra. 1,2
Since one of our goals was to produce a modular, reusable
code base, the first thing we did was to convert the
existing sample to C++ and separate the simulation from
the rest of the application, specifically from the rendering.
This separation allows the simulation code to be easily
added to an existing code base and rendering engine.
Next, we extended the code into three dimensions,
which turned out to be a fairly straightforward exercise
because none of the algorithms changed appreciably with
the addition of another dimension. For example, the 2D
diffusion step is solved by applying the diffusion operation
to move values from a cell to its neighbors (Figures 1
and 2). This approach is extended in the 3D case so that
instead of inspecting four neighboring cells we look at six
neighbors (Figures 1 and 3). Other cases used the same
principle cpe as well.
Figure 2. A code sample that shows 2D diffusion.
Figure 3. A code sample that shows 3D diffusion.
Figure 1. Transitioning code from 2D to 3D.
The following two code samples (Figures 2 and 3)
show the code that performs the diffusion operation
for non-border case cells in both the 2D and 3D cases.
The border cells are those cells at the very edge of the
grid that are missing one or more neighbors. These cells
are handled separately as special cases. The transition
to 3D is accomplished by adding in consideration for the
z-axis. Apart from that, the algorithm remains essentially
After transitioning the simulation into 3D, we began
investigating how to break the work up for parallel
processing. As a base for our multi-core work we used
the Intel® Threading Building Blocks (TBB) package. We
found TBB to be very competitive in performance with
other threading APIs. Additionally, it has a lot of built-in
functionality and structures, such as a thread pool, task
manager, and multi-threaded memory management.
The idea behind threading the simulation is data
decomposition: take the grid and break it into smaller
portions. These smaller portions are submitted as tasks
to the task queue and processed by the worker threads,
independently of each other. The grid is broken up evenly
until a small-enough granularity is reached, at which point
further reductions become ineffective, or worse, inefficient
due to thread management overhead. Figure 4 shows an
example of how a 2D grid is broken into four tasks.
West, Mick. “Practical Fluid Dynamics: Part 1” Gamasutra, 26 June 2008. http://www.gamasutra.com/view/feature/1549/practical_fluid_dynamics_part_1.php
West, Mick. “Practical Fluid Dynamics: Part 2” Gamasutra, 23 July 2008. http://www.gamasutra.com/view/feature/1615/practical_fluid_dynamics__part_ii.php
MULTI-THREADED FLUID SIMULATION FOR GAMES
The main computational load of the simulation occurs
when each step loops over the grid. These are all nested
loops, so a straightforward way to break them up is by
using the “parallel for” construction of TBB. In the example
case of density diffusion, the code went from the serial
version shown in Figure 3 to the parallel version shown in
Figures 5 and 6.
Figure 5. Using Intel® Threading Building Blocks to call the
Figure 4. Breaking up a 2D grid into tasks and an individual cell
adjacent to other tasks
For example, in the simulation there is a diffusion step
followed by a force update step and later, an advection
step. At each step the grid is broken into pieces, and the
tasks write their data to a shared output grid, which
becomes the input into the next step. In most cases, this
approach works well. However, a problem can arise if the
calculation uses its own results in the current step. That
is, the output of cell 1 is used to calculate cell 2. Looking
at Figure 4, it is obvious that certain cells, like the shaded
cell, will be processed in a task separate from the task
in which their immediate neighbors are processed. Since
each task uses the same input, the neighboring cells’ data
will be different in the multi-threaded version compared
to the serial version. Whether this is an issue depends on
which portion of the simulation we are in. Inaccuracies in
the results in the diffusion step can occur, but they are not
noticeable and don’t introduce destabilizing effects. In most
cases the inaccuracies are smoothed out by subsequent
iterations over the data and have no lasting impact on the
simulation. This kind of behavior is acceptable because the
primary concern is visual appeal, not numerical accuracy.
However, in other cases inaccuracies can accumulate,
introducing a destabilizing effect on the simulation, which
can quickly destroy any semblance of normal behavior. An
example of this occurs while processing the edge cases in
diffusion—where diffusion is calculated along the edges
of the grid or cube. When run in parallel, the density does
not diffuse out into the rest of the volume and instead
accumulates at the edges, forming a solid border. For
this situation and similar ones, we simply run that
Figure 5 shows how TBB is used to call the diffusion
algorithm. The arguments uBegin and uEnd are the
start and end values of the total range to process, and
uGrainSize is how small to break the range into. For
example, if uBegin is 0 and uEnd is 99, the total range is
100 units. If uGrainSize is 10, 10 tasks will be submitted,
each of which will process a range of 10.
Figure 6 shows the actual algorithm called by the TBB
task manager. The range to process is passed in, and the
input and output grids are variables common to each job.
As a result of using multi-threading, the performance in
frames per second was improved by 3.76× when going
from one thread to eight threads on an Intel® Core i7
Figure 6. A code sample that shows 3D, multi-threaded diffusion.
MULTI-THREADED FLUID SIMULATION FOR GAMES
Although the goal of the project was to take unused
CPU cycles and use them to compute a CFD simulation,
the results won’t have much impact without a nice, visual
representation. There are many methods of rendering
volumetric data, but we like the results we got using
the ray casting method detailed in Real-Time Volume
Graphics. 3 Using this method, the volumetric data is copied
into a 3D texture, which is used in a pixel shader on the
GPU. The simulation volume is rendered as a cube at the
position of the volume in the 3D scene. For each visible
pixel, a ray is cast from the camera to this pixel on the
cube. The corresponding point in the 3D texture is found
and sampled and the ray marches through the texture at
a fixed step size, accumulating the density data as it goes.
With Kaboom, we created a modular fluid simulation
that adds to the existing knowledge base. It operates in
3D, is independent of the rendering portion, and is multithreaded
to take advantage of multi-core processors. Of
course, there is still plenty of future work that someone
can do with this code sample. On the simulation side, a
new or modified algorithm could be used to remove the
serial requirements. Also, we found that a section of the
code that does bilinear interpolation calculations takes up
a significant amount of processing time, which could be
reduced by storing the results in a lookup table. Another
interesting avenue to explore is the introduction of
arbitrary boundaries. Since the simulation is running on
the CPU it has access to and can react with the geometry
data of the scene. The rendering could be enhanced with
additional effects, such as shadowing or dynamic lighting.
About the Author
Quentin Froemke is a software engineer with Intel’s
Visual Computing Software Division where he enables the
game industry to more effectively utilize multi-core and
other Intel technologies.
For more information about Intel® Threading Building
Blocks, visit http://www.threadingbuildingblocks.org
Engel, Klaus, ed. Real-Time Volume Graphics, Wellesley, Massachusetts: A. K. Peters, Ltd., 2006.
Sign up today for Intel® Visual Adrenaline magazine:
Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality,
or compatibility of third-party vendors and their devices.
All products, dates, and plans are based on current expectations and subject to change without notice.
Intel, Intel logo, Intel Core, Pentium, VTune, and Xeon are trademarks or registered trademarks of Intel
Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Copyright 2009. Intel Corporation. All rights reserved. 0309/CS/RHM/PDF