Multi-Threaded Fluid Simulation

Multi-Threaded Fluid Simulation

Multi-Threaded Fluid Simulation for Games

By Hugh Smith and Jeff Freeman

Along with mentioning a strange rash, a sure way to kill

a conversation is to bring up the topic of Computational

Fluid Dynamics (CFD). But for those who wish to create

fantastic-looking clouds, explosions, smoke, and other

game effects, nothing could be more exciting! CFD is the

section of fluid mechanics that finds numerical solutions to

the equations describing the behavior of fluids, both liquid

and gaseous, and their interaction with various forces.

Until recent hardware advances, even approximating a

solution was too computationally intense for real-time

applications. However, the last year or two has shown

a small but increasing amount of research and articles,

as well as the release of some commercial products that

deal with simulating fluids, not only in real time but in a

video game environment. With the Kaboom project, we’ve

created a modular, threaded CFD simulation that can be

dropped into an existing project, hooked up to a rendering

engine, and used to generate realistic, real-time fluid

dynamics simulations, suitable for smoke or water.

We observed that many games are not reaching their

full potential because they’re unable to use all of the

available CPU resources. Frequently, this is because they

are not properly threaded for multi-core architectures.

Rather than run the simulation on the GPU, we decided

to use those extra cycles to produce a basic, real-time

fluid simulation. By performing the simulation on the CPU

as opposed to the GPU, the simulation has fast and easy

access to all game resources (such as mesh information)

and the game has access to the simulation results. This also

leaves the GPU free to handle the graphics.

As we continued our research we found that although

interest in fluid simulations was high, little concrete

material was available, especially in the way of code and

examples. We were unable to find examples of 3D or

multi-threaded solvers. Most of the articles described ways

to solve the equations but did not use a multi-threaded

approach. Despite this, there seemed to be a lot of interest

in the topic, so we decided that another goal of the project

would be to make the resulting code modular and thereby

fairly simple for someone to integrate into their own

project. Developers can also use and expand upon the set

of threaded, modular routines that we created for 3D CFD.




The code is based on an article and sample code written

by Mick West that was recently posted on Gamasutra. 1,2

Since one of our goals was to produce a modular, reusable

code base, the first thing we did was to convert the

existing sample to C++ and separate the simulation from

the rest of the application, specifically from the rendering.

This separation allows the simulation code to be easily

added to an existing code base and rendering engine.

Next, we extended the code into three dimensions,

which turned out to be a fairly straightforward exercise

because none of the algorithms changed appreciably with

the addition of another dimension. For example, the 2D

diffusion step is solved by applying the diffusion operation

to move values from a cell to its neighbors (Figures 1

and 2). This approach is extended in the 3D case so that

instead of inspecting four neighboring cells we look at six

neighbors (Figures 1 and 3). Other cases used the same

principle cpe as well.

Figure 2. A code sample that shows 2D diffusion.

Figure 3. A code sample that shows 3D diffusion.

Figure 1. Transitioning code from 2D to 3D.

The following two code samples (Figures 2 and 3)

show the code that performs the diffusion operation

for non-border case cells in both the 2D and 3D cases.

The border cells are those cells at the very edge of the

grid that are missing one or more neighbors. These cells

are handled separately as special cases. The transition

to 3D is accomplished by adding in consideration for the

z-axis. Apart from that, the algorithm remains essentially

the same.

After transitioning the simulation into 3D, we began

investigating how to break the work up for parallel

processing. As a base for our multi-core work we used

the Intel® Threading Building Blocks (TBB) package. We

found TBB to be very competitive in performance with

other threading APIs. Additionally, it has a lot of built-in

functionality and structures, such as a thread pool, task

manager, and multi-threaded memory management.

The idea behind threading the simulation is data

decomposition: take the grid and break it into smaller

portions. These smaller portions are submitted as tasks

to the task queue and processed by the worker threads,

independently of each other. The grid is broken up evenly

until a small-enough granularity is reached, at which point

further reductions become ineffective, or worse, inefficient

due to thread management overhead. Figure 4 shows an

example of how a 2D grid is broken into four tasks.


West, Mick. “Practical Fluid Dynamics: Part 1” Gamasutra, 26 June 2008.


West, Mick. “Practical Fluid Dynamics: Part 2” Gamasutra, 23 July 2008.



The main computational load of the simulation occurs

when each step loops over the grid. These are all nested

loops, so a straightforward way to break them up is by

using the “parallel for” construction of TBB. In the example

case of density diffusion, the code went from the serial

version shown in Figure 3 to the parallel version shown in

Figures 5 and 6.

Figure 5. Using Intel® Threading Building Blocks to call the

diffusion algorithm.

Figure 4. Breaking up a 2D grid into tasks and an individual cell

adjacent to other tasks

For example, in the simulation there is a diffusion step

followed by a force update step and later, an advection

step. At each step the grid is broken into pieces, and the

tasks write their data to a shared output grid, which

becomes the input into the next step. In most cases, this

approach works well. However, a problem can arise if the

calculation uses its own results in the current step. That

is, the output of cell 1 is used to calculate cell 2. Looking

at Figure 4, it is obvious that certain cells, like the shaded

cell, will be processed in a task separate from the task

in which their immediate neighbors are processed. Since

each task uses the same input, the neighboring cells’ data

will be different in the multi-threaded version compared

to the serial version. Whether this is an issue depends on

which portion of the simulation we are in. Inaccuracies in

the results in the diffusion step can occur, but they are not

noticeable and don’t introduce destabilizing effects. In most

cases the inaccuracies are smoothed out by subsequent

iterations over the data and have no lasting impact on the

simulation. This kind of behavior is acceptable because the

primary concern is visual appeal, not numerical accuracy.

However, in other cases inaccuracies can accumulate,

introducing a destabilizing effect on the simulation, which

can quickly destroy any semblance of normal behavior. An

example of this occurs while processing the edge cases in

diffusion—where diffusion is calculated along the edges

of the grid or cube. When run in parallel, the density does

not diffuse out into the rest of the volume and instead

accumulates at the edges, forming a solid border. For

this situation and similar ones, we simply run that

portion serially.

Figure 5 shows how TBB is used to call the diffusion

algorithm. The arguments uBegin and uEnd are the

start and end values of the total range to process, and

uGrainSize is how small to break the range into. For

example, if uBegin is 0 and uEnd is 99, the total range is

100 units. If uGrainSize is 10, 10 tasks will be submitted,

each of which will process a range of 10.

Figure 6 shows the actual algorithm called by the TBB

task manager. The range to process is passed in, and the

input and output grids are variables common to each job.

As a result of using multi-threading, the performance in

frames per second was improved by 3.76× when going

from one thread to eight threads on an Intel® Core i7


Figure 6. A code sample that shows 3D, multi-threaded diffusion.




Although the goal of the project was to take unused

CPU cycles and use them to compute a CFD simulation,

the results won’t have much impact without a nice, visual

representation. There are many methods of rendering

volumetric data, but we like the results we got using

the ray casting method detailed in Real-Time Volume

Graphics. 3 Using this method, the volumetric data is copied

into a 3D texture, which is used in a pixel shader on the

GPU. The simulation volume is rendered as a cube at the

position of the volume in the 3D scene. For each visible

pixel, a ray is cast from the camera to this pixel on the

cube. The corresponding point in the 3D texture is found

and sampled and the ray marches through the texture at

a fixed step size, accumulating the density data as it goes.


With Kaboom, we created a modular fluid simulation

that adds to the existing knowledge base. It operates in

3D, is independent of the rendering portion, and is multithreaded

to take advantage of multi-core processors. Of

course, there is still plenty of future work that someone

can do with this code sample. On the simulation side, a

new or modified algorithm could be used to remove the

serial requirements. Also, we found that a section of the

code that does bilinear interpolation calculations takes up

a significant amount of processing time, which could be

reduced by storing the results in a lookup table. Another

interesting avenue to explore is the introduction of

arbitrary boundaries. Since the simulation is running on

the CPU it has access to and can react with the geometry

data of the scene. The rendering could be enhanced with

additional effects, such as shadowing or dynamic lighting.

About the Author

Quentin Froemke is a software engineer with Intel’s

Visual Computing Software Division where he enables the

game industry to more effectively utilize multi-core and

other Intel technologies.


For more information about Intel® Threading Building

Blocks, visit


Engel, Klaus, ed. Real-Time Volume Graphics, Wellesley, Massachusetts: A. K. Peters, Ltd., 2006.

Sign up today for Intel® Visual Adrenaline magazine: >>

Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality,

or compatibility of third-party vendors and their devices.

All products, dates, and plans are based on current expectations and subject to change without notice.

Intel, Intel logo, Intel Core, Pentium, VTune, and Xeon are trademarks or registered trademarks of Intel

Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Copyright 2009. Intel Corporation. All rights reserved. 0309/CS/RHM/PDF



More magazines by this user
Similar magazines