Multi-Threaded Fluid Simulation

Multi-Threaded Fluid Simulation for Games 

By Hugh Smith and Jeff Freeman 

Along with mentioning a strange rash, a sure way to kill 

a conversation is to bring up the topic of Computational 

Fluid Dynamics (CFD). But for those who wish to create 

fantastic-looking clouds, explosions, smoke, and other 

game effects, nothing could be more exciting! CFD is the 

section of fluid mechanics that finds numerical solutions to 

the equations describing the behavior of fluids, both liquid 

and gaseous, and their interaction with various forces. 

Until recent hardware advances, even approximating a 

solution was too computationally intense for real-time 

applications. However, the last year or two has shown 

a small but increasing amount of research and articles, 

as well as the release of some commercial products that 

deal with simulating fluids, not only in real time but in a 

video game environment. With the Kaboom project, we’ve 

created a modular, threaded CFD simulation that can be 

dropped into an existing project, hooked up to a rendering 

engine, and used to generate realistic, real-time fluid 

dynamics simulations, suitable for smoke or water. 

We observed that many games are not reaching their 

full potential because they’re unable to use all of the 

available CPU resources. Frequently, this is because they 

are not properly threaded for multi-core architectures. 

Rather than run the simulation on the GPU, we decided 

to use those extra cycles to produce a basic, real-time 

fluid simulation. By performing the simulation on the CPU 

as opposed to the GPU, the simulation has fast and easy 

access to all game resources (such as mesh information) 

and the game has access to the simulation results. This also 

leaves the GPU free to handle the graphics. 

As we continued our research we found that although 

interest in fluid simulations was high, little concrete 

material was available, especially in the way of code and 

examples. We were unable to find examples of 3D or 

multi-threaded solvers. Most of the articles described ways 

to solve the equations but did not use a multi-threaded 

approach. Despite this, there seemed to be a lot of interest 

in the topic, so we decided that another goal of the project 

would be to make the resulting code modular and thereby 

fairly simple for someone to integrate into their own 

project. Developers can also use and expand upon the set 

of threaded, modular routines that we created for 3D CFD. 

1

MULTI-THREADED FLUID SIMULATION FOR GAMES 

Simulation 

The code is based on an article and sample code written 

by Mick West that was recently posted on Gamasutra. 1,2 

Since one of our goals was to produce a modular, reusable 

code base, the first thing we did was to convert the 

existing sample to C++ and separate the simulation from 

the rest of the application, specifically from the rendering. 

This separation allows the simulation code to be easily 

added to an existing code base and rendering engine. 

Next, we extended the code into three dimensions, 

which turned out to be a fairly straightforward exercise 

because none of the algorithms changed appreciably with 

the addition of another dimension. For example, the 2D 

diffusion step is solved by applying the diffusion operation 

to move values from a cell to its neighbors (Figures 1 

and 2). This approach is extended in the 3D case so that 

instead of inspecting four neighboring cells we look at six 

neighbors (Figures 1 and 3). Other cases used the same 

principle cpe as well. 

Figure 2. A code sample that shows 2D diffusion. 

Figure 3. A code sample that shows 3D diffusion. 

Figure 1. Transitioning code from 2D to 3D. 

The following two code samples (Figures 2 and 3) 

show the code that performs the diffusion operation 

for non-border case cells in both the 2D and 3D cases. 

The border cells are those cells at the very edge of the 

grid that are missing one or more neighbors. These cells 

are handled separately as special cases. The transition 

to 3D is accomplished by adding in consideration for the 

z-axis. Apart from that, the algorithm remains essentially 

the same. 

After transitioning the simulation into 3D, we began 

investigating how to break the work up for parallel 

processing. As a base for our multi-core work we used 

the Intel® Threading Building Blocks (TBB) package. We 

found TBB to be very competitive in performance with 

other threading APIs. Additionally, it has a lot of built-in 

functionality and structures, such as a thread pool, task 

manager, and multi-threaded memory management. 

The idea behind threading the simulation is data 

decomposition: take the grid and break it into smaller 

portions. These smaller portions are submitted as tasks 

to the task queue and processed by the worker threads, 

independently of each other. The grid is broken up evenly 

until a small-enough granularity is reached, at which point 

further reductions become ineffective, or worse, inefficient 

due to thread management overhead. Figure 4 shows an 

example of how a 2D grid is broken into four tasks. 

1 

West, Mick. “Practical Fluid Dynamics: Part 1” Gamasutra, 26 June 2008. http://www.gamasutra.com/view/feature/1549/practical_fluid_dynamics_part_1.php 

2 

West, Mick. “Practical Fluid Dynamics: Part 2” Gamasutra, 23 July 2008. http://www.gamasutra.com/view/feature/1615/practical_fluid_dynamics__part_ii.php 

2


The main computational load of the simulation occurs 

when each step loops over the grid. These are all nested 

loops, so a straightforward way to break them up is by 

using the “parallel for” construction of TBB. In the example 

case of density diffusion, the code went from the serial 

version shown in Figure 3 to the parallel version shown in 

Figures 5 and 6. 

Figure 5. Using Intel® Threading Building Blocks to call the 

diffusion algorithm. 

Figure 4. Breaking up a 2D grid into tasks and an individual cell 

adjacent to other tasks 

For example, in the simulation there is a diffusion step 

followed by a force update step and later, an advection 

step. At each step the grid is broken into pieces, and the 

tasks write their data to a shared output grid, which 

becomes the input into the next step. In most cases, this 

approach works well. However, a problem can arise if the 

calculation uses its own results in the current step. That 

is, the output of cell 1 is used to calculate cell 2. Looking 

at Figure 4, it is obvious that certain cells, like the shaded 

cell, will be processed in a task separate from the task 

in which their immediate neighbors are processed. Since 

each task uses the same input, the neighboring cells’ data 

will be different in the multi-threaded version compared 

to the serial version. Whether this is an issue depends on 

which portion of the simulation we are in. Inaccuracies in 

the results in the diffusion step can occur, but they are not 

noticeable and don’t introduce destabilizing effects. In most 

cases the inaccuracies are smoothed out by subsequent 

iterations over the data and have no lasting impact on the 

simulation. This kind of behavior is acceptable because the 

primary concern is visual appeal, not numerical accuracy. 

However, in other cases inaccuracies can accumulate, 

introducing a destabilizing effect on the simulation, which 

can quickly destroy any semblance of normal behavior. An 

example of this occurs while processing the edge cases in 

diffusion—where diffusion is calculated along the edges 

of the grid or cube. When run in parallel, the density does 

not diffuse out into the rest of the volume and instead 

accumulates at the edges, forming a solid border. For 

this situation and similar ones, we simply run that 

portion serially. 

Figure 5 shows how TBB is used to call the diffusion 

algorithm. The arguments uBegin and uEnd are the 

start and end values of the total range to process, and 

uGrainSize is how small to break the range into. For 

example, if uBegin is 0 and uEnd is 99, the total range is 

100 units. If uGrainSize is 10, 10 tasks will be submitted, 

each of which will process a range of 10. 

Figure 6 shows the actual algorithm called by the TBB 

task manager. The range to process is passed in, and the 

input and output grids are variables common to each job. 

As a result of using multi-threading, the performance in 

frames per second was improved by 3.76× when going 

from one thread to eight threads on an Intel® Core i7 

processor. 

Figure 6. A code sample that shows 3D, multi-threaded diffusion. 

3


Rendering 

Although the goal of the project was to take unused 

CPU cycles and use them to compute a CFD simulation, 

the results won’t have much impact without a nice, visual 

representation. There are many methods of rendering 

volumetric data, but we like the results we got using 

the ray casting method detailed in Real-Time Volume 

Graphics. 3 Using this method, the volumetric data is copied 

into a 3D texture, which is used in a pixel shader on the 

GPU. The simulation volume is rendered as a cube at the 

position of the volume in the 3D scene. For each visible 

pixel, a ray is cast from the camera to this pixel on the 

cube. The corresponding point in the 3D texture is found 

and sampled and the ray marches through the texture at 

a fixed step size, accumulating the density data as it goes. 

Conclusion 

With Kaboom, we created a modular fluid simulation 

that adds to the existing knowledge base. It operates in 

3D, is independent of the rendering portion, and is multithreaded 

to take advantage of multi-core processors. Of 

course, there is still plenty of future work that someone 

can do with this code sample. On the simulation side, a 

new or modified algorithm could be used to remove the 

serial requirements. Also, we found that a section of the 

code that does bilinear interpolation calculations takes up 

a significant amount of processing time, which could be 

reduced by storing the results in a lookup table. Another 

interesting avenue to explore is the introduction of 

arbitrary boundaries. Since the simulation is running on 

the CPU it has access to and can react with the geometry 

data of the scene. The rendering could be enhanced with 

additional effects, such as shadowing or dynamic lighting. 

About the Author 

Quentin Froemke is a software engineer with Intel’s 

Visual Computing Software Division where he enables the 

game industry to more effectively utilize multi-core and 

other Intel technologies. 

Resources 

For more information about Intel® Threading Building 

Blocks, visit http://www.threadingbuildingblocks.org 

3 

Engel, Klaus, ed. Real-Time Volume Graphics, Wellesley, Massachusetts: A. K. Peters, Ltd., 2006. 

Sign up today for Intel® Visual Adrenaline magazine: 

www.intelsoftwaregraphics.com >> 

Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality, 

or compatibility of third-party vendors and their devices. 

All products, dates, and plans are based on current expectations and subject to change without notice. 

Intel, Intel logo, Intel Core, Pentium, VTune, and Xeon are trademarks or registered trademarks of Intel 

Corporation or its subsidiaries in the United States and other countries. 

*Other names and brands may be claimed as the property of others. 

Copyright 2009. Intel Corporation. All rights reserved. 0309/CS/RHM/PDF 

© 

4

Multi-Threaded Fluid Simulation

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?