
Multi-Threaded Fluid Simulation for Games

By Hugh Smith and Jeff Freeman

Along with mentioning a strange rash, a sure way to kill a conversation is to bring up the topic of Computational Fluid Dynamics (CFD). But for those who wish to create fantastic-looking clouds, explosions, smoke, and other game effects, nothing could be more exciting! CFD is the branch of fluid mechanics that finds numerical solutions to the equations describing the behavior of fluids, both liquid and gaseous, and their interaction with various forces. Until recent hardware advances, even approximating a solution was too computationally intense for real-time applications. However, the last year or two has seen a small but increasing amount of research and articles, as well as the release of some commercial products that deal with simulating fluids, not only in real time but in a video game environment. With the Kaboom project, we've created a modular, threaded CFD simulation that can be dropped into an existing project, hooked up to a rendering engine, and used to generate realistic, real-time fluid dynamics simulations, suitable for smoke or water.

We observed that many games are not reaching their full potential because they're unable to use all of the available CPU resources. Frequently, this is because they are not properly threaded for multi-core architectures. Rather than run the simulation on the GPU, we decided to use those extra cycles to produce a basic, real-time fluid simulation. By performing the simulation on the CPU as opposed to the GPU, the simulation has fast and easy access to all game resources (such as mesh information) and the game has access to the simulation results. This also leaves the GPU free to handle the graphics.

As we continued our research we found that although interest in fluid simulations was high, little concrete material was available, especially in the way of code and examples. We were unable to find examples of 3D or multi-threaded solvers. Most of the articles described ways to solve the equations but did not use a multi-threaded approach. Despite this, there seemed to be a lot of interest in the topic, so we decided that another goal of the project would be to make the resulting code modular and thereby fairly simple for someone to integrate into their own project. Developers can also use and expand upon the set of threaded, modular routines that we created for 3D CFD.


Simulation

The code is based on an article and sample code written by Mick West that was recently posted on Gamasutra.[1,2] Since one of our goals was to produce a modular, reusable code base, the first thing we did was to convert the existing sample to C++ and separate the simulation from the rest of the application, specifically from the rendering. This separation allows the simulation code to be easily added to an existing code base and rendering engine.
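As a rough illustration of that separation, the simulation can be wrapped in a class that exposes only an update call and read access to its density grid, so a rendering engine can consume the results without the simulation knowing anything about graphics. The sketch below uses hypothetical names (FluidSim, Step, GetDensityGrid) rather than the project's actual interface.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical interface sketch (names are illustrative, not the article's):
// the simulation owns its grids and exposes only an update call plus read
// access to the results.
class FluidSim
{
public:
    FluidSim(std::size_t width, std::size_t height, std::size_t depth)
        : m_width(width), m_height(height), m_depth(depth),
          m_density(width * height * depth, 0.0f) {}

    // Advance the simulation by one time step.
    void Step(float /*dt*/)
    {
        // Diffusion, force-update, and advection passes would run here.
    }

    // Inject density from the game (e.g., a smoke or explosion source).
    void AddDensity(std::size_t x, std::size_t y, std::size_t z, float amount)
    {
        m_density[(z * m_height + y) * m_width + x] += amount;
    }

    // Read-only access for the renderer, which copies it into a 3D texture.
    const std::vector<float>& GetDensityGrid() const { return m_density; }

private:
    std::size_t m_width, m_height, m_depth;
    std::vector<float> m_density;
};
```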

Next, we extended the code into three dimensions, which turned out to be a fairly straightforward exercise because none of the algorithms changed appreciably with the addition of another dimension. For example, the 2D diffusion step is solved by applying the diffusion operation to move values from a cell to its neighbors (Figures 1 and 2). This approach is extended in the 3D case so that instead of inspecting four neighboring cells we look at six neighbors (Figures 1 and 3). Other cases follow the same principle as well.

Figure 1. Transitioning code from 2D to 3D.

Figure 2. A code sample that shows 2D diffusion.

Figure 3. A code sample that shows 3D diffusion.

The two code samples (Figures 2 and 3) show the code that performs the diffusion operation for non-border cells in both the 2D and 3D cases. The border cells are those cells at the very edge of the grid that are missing one or more neighbors. These cells are handled separately as special cases. The transition to 3D is accomplished by adding in consideration for the z-axis. Apart from that, the algorithm remains essentially the same.
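Figures 2 and 3 are not reproduced here; the sketch below illustrates the kind of interior-cell diffusion loop they describe, in its 2D (four-neighbor) and 3D (six-neighbor) forms. The function names and the explicit update form are assumptions, not the article's exact code.

```cpp
// Illustrative sketch of the diffusion step for interior (non-border) cells.
// 'a' is the diffusion rate for this time step; border cells are handled
// separately as special cases.

// 2D: each cell exchanges density with its four neighbors.
void Diffuse2D(const float* in, float* out, int w, int h, float a)
{
    for (int y = 1; y < h - 1; ++y)
    {
        for (int x = 1; x < w - 1; ++x)
        {
            const int i = y * w + x;
            const float neighbors = in[i - 1] + in[i + 1] + in[i - w] + in[i + w];
            out[i] = in[i] + a * (neighbors - 4.0f * in[i]);
        }
    }
}

// 3D: the same operation, extended to six neighbors by adding the z-axis.
void Diffuse3D(const float* in, float* out, int w, int h, int d, float a)
{
    const int slice = w * h;
    for (int z = 1; z < d - 1; ++z)
    {
        for (int y = 1; y < h - 1; ++y)
        {
            for (int x = 1; x < w - 1; ++x)
            {
                const int i = (z * h + y) * w + x;
                const float neighbors = in[i - 1] + in[i + 1]
                                      + in[i - w] + in[i + w]
                                      + in[i - slice] + in[i + slice];
                out[i] = in[i] + a * (neighbors - 6.0f * in[i]);
            }
        }
    }
}
```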

After transitioning the simulation into 3D, we began investigating how to break the work up for parallel processing. As a base for our multi-core work we used the Intel® Threading Building Blocks (TBB) package. We found TBB to be very competitive in performance with other threading APIs. Additionally, it has a lot of built-in functionality and structures, such as a thread pool, task manager, and multi-threaded memory management.

The idea behind threading the simulation is data decomposition: take the grid and break it into smaller portions. These smaller portions are submitted as tasks to the task queue and processed by the worker threads, independently of each other. The grid is broken up evenly until a small-enough granularity is reached, at which point further reductions become ineffective, or worse, inefficient due to thread management overhead. Figure 4 shows an example of how a 2D grid is broken into four tasks.

1. West, Mick. "Practical Fluid Dynamics: Part 1." Gamasutra, 26 June 2008. http://www.gamasutra.com/view/feature/1549/practical_fluid_dynamics_part_1.php

2. West, Mick. "Practical Fluid Dynamics: Part 2." Gamasutra, 23 July 2008. http://www.gamasutra.com/view/feature/1615/practical_fluid_dynamics__part_ii.php


The main computational load of the simulation occurs when each step loops over the grid. These are all nested loops, so a straightforward way to break them up is by using the "parallel for" construction of TBB. In the example case of density diffusion, the code went from the serial version shown in Figure 3 to the parallel version shown in Figures 5 and 6.

Figure 4. Breaking up a 2D grid into tasks, with an individual cell adjacent to other tasks.

Figure 5. Using Intel® Threading Building Blocks to call the diffusion algorithm.

For example, in the simulation there is a diffusion step followed by a force update step and later, an advection step. At each step the grid is broken into pieces, and the tasks write their data to a shared output grid, which becomes the input into the next step. In most cases, this approach works well. However, a problem can arise if the calculation uses its own results in the current step; that is, the output of cell 1 is used to calculate cell 2. Looking at Figure 4, it is obvious that certain cells, like the shaded cell, will be processed in a task separate from the task in which their immediate neighbors are processed. Since each task uses the same input, the neighboring cells' data will be different in the multi-threaded version compared to the serial version.

Whether this is an issue depends on which portion of the simulation we are in. Inaccuracies in the results of the diffusion step can occur, but they are not noticeable and don't introduce destabilizing effects. In most cases the inaccuracies are smoothed out by subsequent iterations over the data and have no lasting impact on the simulation. This kind of behavior is acceptable because the primary concern is visual appeal, not numerical accuracy. However, in other cases inaccuracies can accumulate, introducing a destabilizing effect on the simulation, which can quickly destroy any semblance of normal behavior. An example of this occurs while processing the edge cases in diffusion, where diffusion is calculated along the edges of the grid or cube. When run in parallel, the density does not diffuse out into the rest of the volume and instead accumulates at the edges, forming a solid border. For this situation and similar ones, we simply run that portion serially.
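As a minimal sketch of this pattern (with assumed names, not the project's actual code), each pass can read one buffer, write the other, finish the border cells serially, and then swap buffers so the output feeds the next pass.

```cpp
#include <utility>
#include <vector>

// Minimal sketch (assumed names) of the read-from/write-to pattern: two
// buffers are kept, each pass reads 'current' and writes 'next', border
// cells are run serially after the parallel interior work, and the buffers
// are swapped so the output becomes the next pass's input.
struct Grids
{
    std::vector<float> current;  // read by the current pass
    std::vector<float> next;     // written by the current pass
};

template <typename InteriorPass, typename BorderPass>
void RunPass(Grids& g, InteriorPass interior, BorderPass borders)
{
    interior(g.current, g.next);   // interior cells, split into parallel tasks
    borders(g.current, g.next);    // edge cells, run serially for stability
    std::swap(g.current, g.next);  // results feed the next pass
}
```

Running the border cells serially trades a small amount of parallelism for the stable behavior described above.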

Figure 5 shows how TBB is used to call the diffusion algorithm. The arguments uBegin and uEnd are the start and end values of the total range to process, and uGrainSize is how small to break the range into. For example, if uBegin is 0 and uEnd is 99, the total range is 100 units. If uGrainSize is 10, 10 tasks will be submitted, each of which will process a range of 10.
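The code in Figures 5 and 6 is not reproduced here; the sketch below shows the same general pattern using current TBB interfaces and assumed names. The tbb::blocked_range carries the begin, end, and grain-size values, and the functor's operator(), which holds the shared input and output grids, is what the task scheduler calls with each sub-range.

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/partitioner.h>

// Sketch of a 3D diffusion pass parallelized over z-slices (assumed names,
// not the article's Figure 5/6 code).
class DiffusionBody3D
{
public:
    DiffusionBody3D(const float* in, float* out, int w, int h, float a)
        : m_in(in), m_out(out), m_w(w), m_h(h), m_a(a) {}

    // Invoked by the TBB scheduler with the sub-range of z-slices this task owns.
    void operator()(const tbb::blocked_range<int>& r) const
    {
        const int slice = m_w * m_h;
        for (int z = r.begin(); z != r.end(); ++z)
        {
            for (int y = 1; y < m_h - 1; ++y)
            {
                for (int x = 1; x < m_w - 1; ++x)
                {
                    const int i = (z * m_h + y) * m_w + x;
                    const float neighbors = m_in[i - 1] + m_in[i + 1]
                                          + m_in[i - m_w] + m_in[i + m_w]
                                          + m_in[i - slice] + m_in[i + slice];
                    m_out[i] = m_in[i] + m_a * (neighbors - 6.0f * m_in[i]);
                }
            }
        }
    }

private:
    const float* m_in;
    float*       m_out;
    int          m_w, m_h;
    float        m_a;
};

// Split the interior z-slices [1, depth - 1) into tasks of at most
// 'grainSize' slices each; simple_partitioner keeps splitting down to the
// grain size, mirroring the uBegin/uEnd/uGrainSize behavior described above.
void DiffuseParallel(const float* in, float* out,
                     int w, int h, int d, float a, int grainSize)
{
    tbb::parallel_for(tbb::blocked_range<int>(1, d - 1, grainSize),
                      DiffusionBody3D(in, out, w, h, a),
                      tbb::simple_partitioner());
}
```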

Figure 6 shows the actual algorithm called by the TBB task manager. The range to process is passed in, and the input and output grids are variables common to each job. As a result of using multi-threading, the performance in frames per second was improved by 3.76× when going from one thread to eight threads on an Intel® Core i7 processor.

Figure 6. A code sample that shows 3D, multi-threaded diffusion.


Rendering

Although the goal of the project was to take unused CPU cycles and use them to compute a CFD simulation, the results won't have much impact without a nice visual representation. There are many methods of rendering volumetric data, but we like the results we got using the ray casting method detailed in Real-Time Volume Graphics.[3] Using this method, the volumetric data is copied into a 3D texture, which is used in a pixel shader on the GPU. The simulation volume is rendered as a cube at the position of the volume in the 3D scene. For each visible pixel, a ray is cast from the camera to this pixel on the cube. The corresponding point in the 3D texture is found and sampled, and the ray marches through the texture at a fixed step size, accumulating the density data as it goes.
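The shader itself is not reproduced here; the following CPU-side sketch (with assumed names and a simple additive accumulation) illustrates the fixed-step march described above: starting where the view ray enters the volume, the density is sampled at regular intervals until the ray exits or the result saturates.

```cpp
#include <algorithm>
#include <vector>

struct Vec3 { float x, y, z; };

// Nearest-neighbor sample of a w*h*d density volume at normalized [0,1] coordinates.
static float SampleDensity(const std::vector<float>& vol,
                           int w, int h, int d, Vec3 p)
{
    const int x = std::clamp(static_cast<int>(p.x * w), 0, w - 1);
    const int y = std::clamp(static_cast<int>(p.y * h), 0, h - 1);
    const int z = std::clamp(static_cast<int>(p.z * d), 0, d - 1);
    return vol[(static_cast<std::size_t>(z) * h + y) * w + x];
}

// March from 'entry' along the normalized direction 'dir' through the unit
// cube in fixed steps, accumulating density as the ray goes.
float MarchRay(const std::vector<float>& vol, int w, int h, int d,
               Vec3 entry, Vec3 dir, float stepSize)
{
    float accumulated = 0.0f;
    Vec3 p = entry;
    while (p.x >= 0.0f && p.x <= 1.0f &&
           p.y >= 0.0f && p.y <= 1.0f &&
           p.z >= 0.0f && p.z <= 1.0f &&
           accumulated < 1.0f)
    {
        accumulated += SampleDensity(vol, w, h, d, p) * stepSize;
        p.x += dir.x * stepSize;
        p.y += dir.y * stepSize;
        p.z += dir.z * stepSize;
    }
    return std::min(accumulated, 1.0f);
}
```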

Conclusion

With Kaboom, we created a modular fluid simulation that adds to the existing knowledge base. It operates in 3D, is independent of the rendering portion, and is multi-threaded to take advantage of multi-core processors. Of course, there is still plenty of future work that someone can do with this code sample. On the simulation side, a new or modified algorithm could be used to remove the serial requirements. Also, we found that a section of the code that does bilinear interpolation calculations takes up a significant amount of processing time, which could be reduced by storing the results in a lookup table. Another interesting avenue to explore is the introduction of arbitrary boundaries. Since the simulation is running on the CPU, it has access to and can react to the geometry data of the scene. The rendering could also be enhanced with additional effects, such as shadowing or dynamic lighting.

About the Author<br />

Quentin Froemke is a software engineer with Intel's Visual Computing Software Division, where he enables the game industry to more effectively utilize multi-core and other Intel technologies.

Resources<br />

For more information about Intel® Threading Building Blocks, visit http://www.threadingbuildingblocks.org

3. Engel, Klaus, ed. Real-Time Volume Graphics. Wellesley, Massachusetts: A. K. Peters, Ltd., 2006.

