20.03.2015 Views

Recitation 5: Rendering Fractals


CS179: GPU Programming

Recitation 5: Rendering Fractals


Rendering Fractals

● Volume data vs. texture memory
● Creating and using CUDA arrays
● Using PBOs for screen output
● Quaternion Julia Sets
● Rendering volume data


Volume Data

● Stored in global memory
● Can be accessed only as linear memory
● No texturing pipeline features available
  – But it is the only form of globally writable data
● Allocate arbitrary linear memory using cudaMalloc


Texture Memory

● CUDA arrays are allocated in “texture memory”
  – Use cudaMalloc3DArray for correct pitch
● Then declare the texture on the device:
  – texture<DataType, Dim, ReadMode> tex;
  – ReadMode = cudaReadModeElementType or cudaReadModeNormalizedFloat
● Access using tex3D:
  – tex3D(tex, s, t, p);


Textures

● Can set properties of an existing tex object:
  – tex.normalized = true;
  – tex.filterMode = cudaFilterModeLinear;
  – tex.addressMode[i] = cudaAddressModeClamp;
  – Basically, the same settings as OpenGL
● Then, bind tex to the array allocated by cudaMalloc3DArray:
  – cudaBindTextureToArray
● No need to bind/unbind for each use
● Usually have at least 8 texture lines available
  – Probably won't need more than one anyway


Using PBOs

● How to actually render using CUDA?
  – PBO: pixel buffer object
● A PBO handles pixels like VBOs handle vertices
● OpenGL allocates it as a region of global memory
  – So, it can be mapped via cudaGLMapBufferObject
  – written to by CUDA
  – bound using glBindBufferARB
      ● GL_PIXEL_UNPACK_BUFFER_ARB
  – then drawn to screen with glDrawPixels


Lab 5

● Rendering quaternion Julia sets
● Not as complicated as it sounds:
  – Calculate volume fractal using equation
  – Copy over to texture memory
  – Volume render
  – Only recalculate when necessary
● But first: what is a quaternion Julia set?


Fractals

● Self-similar, recursive sets
● Became popular in the mid-to-late 1900s with the evolution of graphics
  – Difficult before graphics due to infinite detail
  – Graphics made visualizing them possible
  – Mandelbrot used fractals to try to estimate the length of coastlines


The Mandelbrot Set

$z_{n+1} = z_n^2 + c$


The Mandelbrot Set

● Defined by an iterative complex equation:
  – $z_{n+1} = z_n^2 + c$
● $c$ is a pixel coordinate on the complex plane
  – x-axis = real axis, y-axis = imaginary axis
● $z_0 = 0$
● Three possible results depending on $c$:
  – Converges to a fixed point or cycle (black interior)
  – Stays in a bounded orbit without converging (boundary)
  – Escapes to infinity


The Mandelbrot Set

● Typically computed by iterating $z$ and checking whether it escapes some magnitude (~2)
● Can color based on the rate of escape
● Typically 20-50 iterations are enough to tell the behavior of $z$
● Rendering the set used to take seconds on a CPU
● See the SDK for a real-time program
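
The escape test above can be sketched in a few lines. This is a minimal illustration in plain C++ (which also compiles as CUDA host code), not the SDK's implementation; the function name `mandelbrot_escape_iter` and the convention of returning `max_iter` for points that never escape are assumptions made for this example.

```cpp
#include <cassert>
#include <complex>

// Returns the iteration at which |z| first exceeds 2, or max_iter if the
// orbit stayed bounded for the whole budget (treated here as "in the set").
int mandelbrot_escape_iter(std::complex<float> c, int max_iter) {
    assert(max_iter > 0);
    std::complex<float> z(0.0f, 0.0f);      // z_0 = 0
    for (int n = 0; n < max_iter; ++n) {
        // std::norm is the squared magnitude; compare with 2^2 to skip a sqrt.
        if (std::norm(z) > 4.0f) return n;
        z = z * z + c;                      // z_{n+1} = z_n^2 + c
    }
    return max_iter;
}
```

The returned iteration count is the "rate of escape" that a renderer could map to a color; on the GPU, each pixel would run this loop in its own thread.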


Julia Set

● Each point $c$ of the Mandelbrot set has a corresponding Julia set:
  – Iterate $z_{n+1} = z_n^2 + c$ with that $c$ held fixed, but now $z_0$ is the pixel coordinate


Julia Sets

● Calculated in the same way as Mandelbrot sets
● We don't really have a practical application for these
  – But they look really pretty!
  – And they're parallelizable, so we'll work with them


Quaternion Julia Sets

● We could do 2D Julia sets
  – But 4D ones are more exciting!
● The iterative process is the same, except now we use quaternions


Quaternions

● Four-dimensional extension of the complex numbers, with three imaginary units $i$, $j$, $k$:
  $i^2 = j^2 = k^2 = ijk = -1$
  $ij = k, \quad ji = -k$
  $jk = i, \quad kj = -i$
  $ki = j, \quad ik = -j$
● Very applicable in CG for 3D rotations, visualizations, etc.
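
The multiplication rules above expand into the following product. This is a sketch in plain C++, not the lab's provided `mul_quat`/`sqr_quat`; the `Quat` struct and its layout (`w` as the real part, `x`, `y`, `z` multiplying $i$, $j$, $k$) are assumptions for illustration, and the lab's `float4` layout may differ.

```cpp
// Hypothetical layout: w is the real part; x, y, z multiply i, j, k.
struct Quat { float w, x, y, z; };

// Hamilton product a*b, expanded from i^2 = j^2 = k^2 = ijk = -1.
// Note that the product is not commutative: a*b != b*a in general.
Quat quat_mul(Quat a, Quat b) {
    return {
        a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
        a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
        a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
        a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w
    };
}

// Squaring is cheaper than a general product: the cross terms cancel.
Quat quat_sqr(Quat q) {
    return {
        q.w * q.w - q.x * q.x - q.y * q.y - q.z * q.z,
        2.0f * q.w * q.x,
        2.0f * q.w * q.y,
        2.0f * q.w * q.z
    };
}
```

A dedicated square is worth having because the Julia iteration squares a quaternion on every step.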


Quaternion Julia Sets

● So, we can create a 4D set, but how do we render in 3D?
  – Projection!


Projection

● We can take 2D slices of a 3D object
  – Think MRI scan
● Same idea: we take 3D volume slices of a 4D object
  – Imagine a 3D object that morphs over time
  – The object at one instant of time is our 3D slice
● These 3D slices are what we render
● So, we have three parameters now: $z_0$, $c$, and the slicing plane


Quaternions in Lab 5

● Quaternion multiplication provided:
  – mul_quat, sqr_quat
● pos_to_quat
  – Given a plane as a parameter, converts a 3D position to a quaternion
● We'll store quaternions as float4's
● cutil_math.h provides vector math (dot, cross, etc.) and operator definitions (float4 * float, etc.)


Rendering

● Transform each point in the volume texture to a quaternion
● Iterate the Julia fractal equation
● Store whether the point is in the set or how fast it escapes
● Then, use normal volume rendering techniques
  – Raytracing (remember Lab 2?)
● Raytracing might not work perfectly
  – Julia sets have infinite detail, so some parts are infinitely thin


Julia Distance Function

● There is a distance estimator function for Julia sets
● It gives a lower bound on the distance to the set from any point in space
● Iterate a second equation simultaneously with the Julia set equation:
  – $z'_{n+1} = 2 z_n z'_n$
● Then, the distance is estimated by:
  $d(z) = \frac{|z_n|}{2\,|z'_n|} \log |z_n|$


Julia Distance Function

● We can actually just render this function!
● The distance function is smooth, so we can render its isosurface
● More iterations improve the estimate


Julia Distance Function

● How to use it:
  – Iterate $z'_{n+1} = 2 z_n z'_n$ and $z_{n+1} = z_n^2 + c$ with the provided $c$ and $z_0$
  – Stop iterating once $z_n$ escapes ($|z_n|^2 >$ ~20) or the maximum number of iterations is reached
  – Return the distance estimate $d(z) = \frac{|z_n|}{2\,|z'_n|} \log |z_n|$
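
The recipe above might look roughly like this in plain C++. It is a sketch, not the lab's `JuliaDist`: the `Quat` struct and helper names are hypothetical, the starting value $z'_0 = 1$ is an assumption (the slides do not state it), and returning 0 for points that never escape is a choice made here to mark them as inside the set.

```cpp
#include <cassert>
#include <cmath>

struct Quat { float w, x, y, z; };   // hypothetical layout, w = real part

static Quat mul(Quat a, Quat b) {    // Hamilton product
    return { a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z,
             a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y,
             a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x,
             a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w };
}
static Quat add(Quat a, Quat b) { return { a.w+b.w, a.x+b.x, a.y+b.y, a.z+b.z }; }
static float norm2(Quat q) { return q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z; }

// Estimated distance from z0 to the Julia set of z^2 + c.
float julia_dist(Quat z0, Quat c, int max_iter) {
    assert(max_iter > 0);
    const float escape2 = 20.0f;              // |z|^2 threshold (~20)
    Quat z = z0;
    Quat dz = { 1.0f, 0.0f, 0.0f, 0.0f };     // assumption: z'_0 = 1
    bool escaped = false;
    for (int n = 0; n < max_iter; ++n) {
        // Update z' first: it needs z_n, not z_{n+1}.
        Quat t = mul(z, dz);
        dz = { 2.0f*t.w, 2.0f*t.x, 2.0f*t.y, 2.0f*t.z };
        z = add(mul(z, z), c);
        if (norm2(z) > escape2) { escaped = true; break; }
    }
    if (!escaped) return 0.0f;                // never escaped: treat as inside
    float r  = std::sqrt(norm2(z));
    float dr = std::fmax(std::sqrt(norm2(dz)), 1e-6f);  // guard against z' = 0
    return r * std::log(r) / (2.0f * dr);     // d = |z| log|z| / (2 |z'|)
}
```

As a sanity check, with $c = 0$ the set is the unit sphere, and for $z_0 = 2$ the estimate works out to $\ln 2 \approx 0.69$, which is indeed below the true distance of 1.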


Better Rendering

● Fill in volume data with the value of the distance function at each point
● Copy to volume texture
● Step along the projected ray and render when you hit the isosurface (value < epsilon)
● This is pretty fast when parallelized
  – like Lab 2


Best Rendering

● But we can speed this up!
● We have a distance estimator
● If we estimate we are 0.5 units away, there is no need to step the ray by 0.001
● Step by a * d(z)
  – a is just some constant; 0.1-0.5 works well
● Will this cause thread divergence?
  – Not really: threads that finish early will just wait
  – Note: these distances are in 4D
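
The adaptive step can be sketched as follows. To keep the example self-contained, the distance function here is a stand-in (distance to a unit sphere) rather than the Julia estimator, and `Vec3`, `sphere_dist`, and `march` are hypothetical names, not part of the lab code.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Stand-in distance function for illustration only: distance to a unit
// sphere at the origin. In the lab this role is played by the Julia estimator.
float sphere_dist(Vec3 p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Marches from `origin` along the unit vector `dir`, stepping by a * d(p).
// Returns the ray parameter t at the hit, or -1 if nothing is hit.
float march(Vec3 origin, Vec3 dir, float a, float epsilon,
            float t_max, int max_steps) {
    assert(a > 0.0f && a <= 1.0f);
    float t = 0.0f;
    for (int i = 0; i < max_steps && t < t_max; ++i) {
        Vec3 p = { origin.x + t * dir.x,
                   origin.y + t * dir.y,
                   origin.z + t * dir.z };
        float d = sphere_dist(p);
        if (d < epsilon) return t;   // close enough: treat as the isosurface
        t += a * d;                  // far away: take a proportionally larger step
    }
    return -1.0f;
}
```

Because the estimate is a lower bound on the distance, a step of a * d with a ≤ 1 should not jump over the surface; smaller values of a trade speed for safety when the estimate is rough.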


Drawing Isosurface

● Stop stepping along the ray when we hit the surface, and render something
● In order to do lighting, we'll want the normal
● For a smooth scalar field, the normal is the gradient of the field at that point
● Compute the gradient from the volume texture?
  – No, this will be blocky
  – Instead, compute the gradient via more juliaDist calls


Computing Normals

● Theoretically, the gradient approach should work
  – But in practice, it doesn't work too well
● Can also arbitrarily choose axes for the gradient computation, then calculate the tangent and binormal, then normal = tangent × binormal
  – Still not too great, but better
  – This is optional; you can just do the gradient, which is much easier
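
The simpler gradient option can be sketched with central differences. As in any such sketch, the scalar field here is a stand-in (a unit sphere) so the example runs on its own; `Vec3`, `field`, `estimate_normal`, and the offset `h` are hypothetical names and parameters, and in the lab the field evaluations would be the extra juliaDist calls.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Stand-in scalar field for illustration: signed distance to a unit sphere.
float field(Vec3 p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Central-difference gradient of the field at p, normalized to unit length.
// Costs six field evaluations per normal.
Vec3 estimate_normal(Vec3 p, float h) {
    assert(h > 0.0f);
    Vec3 g = {
        field({p.x + h, p.y, p.z}) - field({p.x - h, p.y, p.z}),
        field({p.x, p.y + h, p.z}) - field({p.x, p.y - h, p.z}),
        field({p.x, p.y, p.z + h}) - field({p.x, p.y, p.z - h})
    };
    float len = std::sqrt(g.x * g.x + g.y * g.y + g.z * g.z);
    if (len == 0.0f) return {0.0f, 0.0f, 0.0f};   // flat spot: no usable normal
    return { g.x / len, g.y / len, g.z / len };
}
```

The choice of `h` matters: too large smooths away detail, too small amplifies noise in the estimate, which may be part of why the slides report mixed results in practice.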


Lab 5

● What you need to do:
● On the host:
  – Execute kernels
  – Copy global memory to texture memory
      ● Look up necessary functions in CUDA manuals
  – Set symbols in graphics memory
● On the device:
  – Julia distance estimator function
  – Fractal computation kernel
  – Volume rendering kernel
● Let the TODOs guide you, as usual


Important Note on Registers

● Volume rendering calls JuliaDist, intersectBox, computes normals, etc.
● It is easy to run out of registers
● So be careful: move computations whose intermediate values you don't need later into functions (this helps the compiler)
● Might also want to use fewer than 512 threads per block


Lab 5

● What's given to you:
  – Volume render ray is set up
  – Steps along the ray at a constant interval and accumulates from the 3D texture
● Change this to use a dynamic interval and stop when we hit the isosurface


Lab 5

● Organize your volume cube with threads running in the lowest dimension and a 2D grid for the other 2 dimensions to make indexing easier
● See the globally defined dim3s
● The space extends ±2.0 in each direction (for converting indices to positions)
● 1 thread per element is probably fastest, but feel free to experiment with loops
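
Converting a voxel index to a position in the ±2.0 cube might look like this. It is a sketch under one assumption the slides do not state: samples are taken at voxel centers, so the outermost samples sit half a voxel inside the boundary. The function name `index_to_coord` is hypothetical.

```cpp
#include <cassert>

// Maps voxel index i in [0, n) to a coordinate in (-extent, +extent),
// sampling at voxel centers (an assumption for this sketch).
float index_to_coord(int i, int n, float extent) {
    assert(n > 0 && i >= 0 && i < n);
    return ((i + 0.5f) / n * 2.0f - 1.0f) * extent;
}
```

With `n = 4` and `extent = 2.0f`, the four samples land at -1.5, -0.5, 0.5, and 1.5. In a kernel, the same function would be applied once per axis to the thread's x, y, and z indices.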


Memory Coalescing

● If you compute the index within the block as:
  – i = x + width*y + width*height*z
  – then write to output[i]
  – threads run along x, therefore coalesced
● Non-coalesced case: x swapped with one of the other dimensions
● Test both coalesced and non-coalesced speeds, and write the results into the README
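
The two index layouts can be compared directly. This sketch is plain C++ with hypothetical function names; `x` stands for the coordinate that consecutive threads vary over, and the strided variant swaps `x` with `z`.

```cpp
#include <cassert>

// x varies fastest: neighboring threads (x, x+1) touch neighboring
// addresses, so a warp's writes can be coalesced.
int index_coalesced(int x, int y, int z, int width, int height) {
    assert(width > 0 && height > 0);
    return x + width * y + width * height * z;
}

// x swapped with z: neighboring threads are height * depth elements
// apart, so each write lands in a different memory segment.
int index_strided(int x, int y, int z, int height, int depth) {
    assert(height > 0 && depth > 0);
    return z + depth * y + depth * height * x;
}
```

For an 8×8×8 volume, stepping `x` by one moves the coalesced index by 1 element and the strided index by 64 elements, which is the difference the timing comparison in the README should expose.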


Cool Stuff

● Color it however you want
  – You should compute normals
  – Color it as a function of something:
      ● normals
      ● position in space
      ● etc.
● Could experiment with different functions, like $z^3 + c$ (mention it in the README)


Cool Stuff

● Extra credit:
  – Raytrace for shadows?
  – Adaptive detailing: render using a lower epsilon if we're closer to the camera


Final Notes

● We could just raytrace the entire set
  – also pretty fast
● This way teaches us a little about memory
● Raytracing is also fairly simple
  – Just call JuliaDist in the volume rendering function instead of sampling the texture
  – But sampling textures is still faster
  – We get more threads this way ($O(n^3)$ instead of $O(n^2)$)
