
OPENGL BLUEPRINT RENDERING

April 4-7, 2016 | Silicon Valley

Christoph Kubisch, 4/7/2016


MOTIVATION

Blueprints / drawings in CAD/graph viewer applications

Documents can contain many LINES and LINE_STRIPS

Various line styles can be used (world-space widths, stippling, joints, caps...)

Potential CPU bottlenecks

‣ Generating geometry for complex styles

‣ Collecting and rendering geometry

Model courtesy of PTC


MOTIVATION

Not targeting full vector graphics

NV_path_rendering covers high-fidelity vector graphics rendering

Per-pixel quadratic Bézier evaluation

Stencil & Cover pass to allow sophisticated blending

Focus of this talk is rendering lines defined by traditional vertices

Rendering data from OpenGL buffer objects

Single-pass, which means it is not safe for blending (the geometry self-overlaps)


DEMO: BASIC DEMONSTRATION


LINE RASTERIZATION

Representation

Standard: skewed rectangle, pixel snapped lines

Multisampling: aligned rectangle, smooth lines

Both suffer from visible gaps and overlaps with increasing line width


LINE RASTERIZATION

Stippling

Stippling only in screen space

Patterns must be expressible with 16 bits

LINES restart the pattern every segment

LINE_STRIPS have continuous distance
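
The 16-bit pattern rule above can be illustrated with a minimal C sketch. It mirrors the classic glLineStipple rule, where bit (counter / factor) mod 16 of the pattern decides whether a fragment is drawn. The function name `stipple_bit` and its parameters are hypothetical and are not part of the library presented in this talk.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if the fragment at position `counter` along the line is drawn.
   Bit (counter / factor) mod 16 of the pattern is tested, least significant
   bit first, as in classic OpenGL line stippling. */
static int stipple_bit(uint16_t pattern, int factor, unsigned counter)
{
    assert(factor >= 1);
    unsigned bit = (counter / (unsigned)factor) % 16u;
    return (int)((pattern >> bit) & 1u);
}
```

For LINES the caller would reset `counter` to 0 at the start of every segment, while for LINE_STRIPS it keeps counting across segments, which is the difference the slide describes.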


SHADER-DRIVEN LINES

TECH

Create TRIANGLES/QUADS for line segments

Project extruded vertices to keep line width consistent

Clip and color in fragment shader based on UV coordinates and line distance

[Figure labels: Appearance on screen; Geometry in world coordinates; Shapes via fragment shader discard]


SHADER-DRIVEN LINES

TECH

Create TRIANGLES for line segments, project extrusion to world/screen, discard fragments

FLEXIBILITY

Arbitrary stippling patterns and line widths

Joint- and cap-styles

Different distance metrics

New coloring/animation possibilities via shaders

Thin center line as effect


SHADER-DRIVEN LINES

CAVEATS

Cannot be as fast as basic line rasterization

Not all data is local at rendering time (line strip distances need extra calculation)

Geometry still self-overlaps


SHADER-DRIVEN LINES

Sample implementation/library

C interface library to render different line primitives (LINES, LINE_STRIPS, ARCS), provided as a flexible framework rather than a black box

Two different render modes: render as extruded triangles, or as one-pixel-wide lines

Uses NVIDIA and ARB OpenGL extensions if available


SHADER-DRIVEN LINES

Sample implementation/library

Global style and stipple definitions

Stipple from arbitrary bit-pattern, or float values

[Figure: Style-Definitions (Style 0, Style 1, ...) and Stipple-Patterns (Pattern texture A, Pattern texture B, ...)]

    typedef struct NVLStyleInfo_s {
        NVLSpaceType    projectionSpace;
        NVLJoinType     join;
        NVLCapsType     capsBegin;
        NVLCapsType     capsEnd;
        float           thickness;
        NVLStippleID    stipplePattern;
        float           stippleLength;
        float           stippleOffsetBegin;
        float           stippleOffsetEnd;
        NVLAnchorType   stippleAnchor;
        NVLboolean      stippleClamp;
    } NVLStyleInfo;

    typedef enum NVLSpaceType_e {
        NVL_SPACE_SCREEN,
        NVL_SPACE_SCREENDIST3D,
        NVL_SPACE_CUSTOM,
        NVL_SPACE_CUSTOMDIST3D,
        NVL_NUM_SPACES,
    } NVLSpaceType;

    typedef enum NVLAnchorType_e {
        NVL_ANCHOR_BEGIN,
        NVL_ANCHOR_END,
        NVL_ANCHOR_BOTH,
        NVL_NUM_ANCHORS,
    } NVLAnchorType;

    typedef enum NVLCapsType_e {
        NVL_CAPS_NONE,
        NVL_CAPS_ROUND,
        NVL_CAPS_BOX,
        NVL_NUM_CAPS,
    } NVLCapsType;

    typedef enum NVLJoinType_e {
        NVL_JOIN_NONE,
        NVL_JOIN_ROUND,
        NVL_JOIN_MITER,
        NVL_NUM_JOINS,
    } NVLJoinType;


SHADER-DRIVEN LINES

Sample implementation/library

Uses GPU-friendly collection mechanism: record many primitives, then render

Optionally render sub-sections

Raw Primitives pass vertex data directly

Geometry Primitives reference existing Vertex Buffers

Collections have usage-style flags:

‣ filled new per-frame

‣ recorded once, re-used many frames

[Figure: Geometry/Raw Recording. Geometry Primitives: VBO reference. Raw Primitives: vertex values. Also recorded: matrix, color, style reference]


SHADER-DRIVEN LINES

Quad extrusion

Faster geometry creation by just using the Vertex-Shader, avoiding the extra Geometry-Shader stage

Render GL_QUADS (4 vertices each segment)

Use gl_VertexID to fetch line points from the VertexBuffer: texelFetch(...gl_VertexID/4 + 0 or 1)

Use it for the offsets as well

Using custom vertex-fetch is generally not recommended, but useful for special situations

[Figure: VS instead of VS + GS; quad corners labeled gl_VertexID % 4 + 0 and gl_VertexID % 4 + 1]
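
The index arithmetic behind the gl_VertexID fetch can be sketched on the CPU in C. This is only an illustration under assumptions: the struct `QuadCorner`, the function `corner_from_vertex_id`, and the corner ordering (start/-, end/-, end/+, start/+) are hypothetical, and the point index follows the slide's `gl_VertexID/4 + 0 or 1` form for a line strip.

```c
#include <assert.h>

typedef struct {
    int   segment; /* line segment this quad vertex belongs to */
    int   point;   /* line point to fetch: segment + 0 (start) or + 1 (end) */
    float side;    /* -1 or +1: direction of the extrusion offset */
} QuadCorner;

/* Decodes a quad vertex index into segment, endpoint and extrusion side.
   Assumed corner order around the quad: start/-, end/-, end/+, start/+. */
static QuadCorner corner_from_vertex_id(int vertexID)
{
    assert(vertexID >= 0);
    QuadCorner c;
    int corner = vertexID % 4;
    c.segment = vertexID / 4;
    c.point   = c.segment + ((corner == 1 || corner == 2) ? 1 : 0);
    c.side    = (corner < 2) ? -1.0f : 1.0f;
    return c;
}
```

In a vertex shader the same arithmetic would pick the texel to fetch and the sign of the offset, so no per-corner vertex attributes need to be stored.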


SHADER-DRIVEN LINES

Minimize Overdraw

No naive rectangles; adjacency in LINE_STRIP is used to tighten the geometry

Reduces overdraw and minimizes potential artifacts resulting from it


SHADER-DRIVEN LINES

Depth clamping

Joints and caps exceed original line definition

Can cause depth-buffer artifacts

Prevent depth over-shooting by passing closest depth to fragment shader and clamp there

Can use ARB_conservative_depth or just min/max to keep hardware z-cull active

    #extension GL_ARB_conservative_depth : require
    layout (depth_greater) out float gl_FragDepth;
    flat in float closestPointDepth;
    ...
    gl_FragDepth = max(gl_FragCoord.z, closestPointDepth);


DISTANCE COMPUTATION

LINE_STRIPS need dedicated calculation phase

Read vertices and calculate distances along the strip

Strip Length: 4

VertexBuffer: V0, V1, V2, V3

DistanceBuffer: 0, [0,1], [0,1]+[1,2], [0,1]+[1,2]+[2,3]

Sections drawn independently

Fetch vertices & distances: section V0-V1 with D0, D1; section V1-V2 with D1, D2; section V2-V3 with D2, D3

Distances are fetched at render-time
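
The accumulation into the distance buffer can be sketched as a sequential prefix sum in C. This is a minimal CPU illustration, not the talk's GPU implementation; `accumulate_distances` is a hypothetical name, and the segment lengths are assumed to have been computed from the vertex positions beforehand.

```c
#include <assert.h>
#include <stddef.h>

/* Fills dist[i] with the distance from the strip start to vertex i.
   seg_len[i] is the length of segment [i, i+1], so a strip with
   num_vertices vertices provides num_vertices - 1 segment lengths. */
static void accumulate_distances(const float *seg_len, size_t num_vertices,
                                 float *dist)
{
    assert(dist != NULL && num_vertices >= 1);
    dist[0] = 0.0f;
    for (size_t i = 1; i < num_vertices; ++i)
        dist[i] = dist[i - 1] + seg_len[i - 1];
}
```

Each section can then be drawn independently, because both of its end distances are already stored.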


DISTANCE COMPUTATION

Shader Tips

One LINE_STRIP per thread can lead to under-utilization and non-ideal memory access due to divergence

SIMT hardware processes threads together in lock-step with a common instruction pointer (inactive threads are masked out)

NVIDIA: 1 warp = 32 threads

Distance accumulation loop over the VertexBuffer, one strip per thread (entries are vertex indices, - marks an idle thread):

    Thread:          0     1     2     3
    Strip Length:    3     2     4     8
    Iteration 1:     0     3     5     9
    Iteration 2:     1     4     6    10
    Iteration 3:     2     -     7    11
    Iteration 4:     -     -     8    12
    Iteration 5:     -     -     -    13
    Iteration 6:     -     -     -    14
    Iteration 7:     -     -     -    15
    Iteration 8:     -     -     -    16
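
The cost of divergence in the example above can be estimated with a small C sketch. It assumes a simplified model in which every thread stays occupied until the longest strip in its group is finished; the function name `lockstep_utilization` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of thread-iterations that do useful work when each thread loops
   over its own strip and all threads wait for the longest strip. */
static float lockstep_utilization(const int *strip_len, size_t num_threads)
{
    assert(strip_len != NULL && num_threads > 0);
    long total = 0;
    int longest = 0;
    for (size_t i = 0; i < num_threads; ++i) {
        assert(strip_len[i] >= 0);
        total += strip_len[i];
        if (strip_len[i] > longest)
            longest = strip_len[i];
    }
    if (longest == 0)
        return 1.0f; /* no work at all, so nothing is wasted */
    return (float)total / ((float)longest * (float)num_threads);
}
```

For the strip lengths 3, 2, 4, 8 this gives 17 useful slots out of 32, roughly 53%.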


DISTANCE COMPUTATION

Shader Tips

Compute one LINE_STRIP at a time across the warp, which gives nice memory fetches

NV_shader_thread_shuffle to access neighbors and do prefix-sum calculation

    vec3 posA = getPosition ( gl_ThreadInWarpNV + … );
    vec3 posB = shuffleUpNV (posA, 1, gl_WarpSizeNV);
    // ... handle first thread point differently
    float dist = distance(posA, posB);

Short strips may still under-utilize the warp, but take only one iteration

Distance accumulation loop over the VertexBuffer, one strip across the warp (entries are vertex indices, - marks an idle thread). Each thread accesses its neighbor point via shuffleUpNV and computes the distance:

    Thread:          0      1      2      3
    Strip Length:    9      9      9      9
    Iteration 1:     0      1      2      3
      distances:   [0,0]  [0,1]  [1,2]  [2,3]
      ... Prefix-sum over distances ...
    Iteration 2:     4      5      6      7
    Iteration 3:     8      -      -      -
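
The warp-wide prefix sum can be emulated on the CPU to show the data flow. The C sketch below is an assumption-laden illustration, not the talk's shader: `shuffle_up` imitates a shuffle-up in which lanes without a source keep their own value, and `warp_inclusive_scan` is a standard Hillis-Steele scan built on top of it. Both names are hypothetical.

```c
#include <assert.h>

#define WARP_SIZE 32

/* Emulated shuffle-up: lane i reads the value of lane i - delta.
   Lanes without a source lane keep their own value (assumed behavior). */
static void shuffle_up(const float *in, float *out, int delta)
{
    assert(delta > 0 && delta < WARP_SIZE);
    for (int lane = 0; lane < WARP_SIZE; ++lane)
        out[lane] = (lane >= delta) ? in[lane - delta] : in[lane];
}

/* In-place inclusive prefix sum across one emulated warp (Hillis-Steele):
   log2(WARP_SIZE) rounds, each adding the value from `delta` lanes below. */
static void warp_inclusive_scan(float *values)
{
    float shifted[WARP_SIZE];
    for (int delta = 1; delta < WARP_SIZE; delta *= 2) {
        shuffle_up(values, shifted, delta);
        for (int lane = 0; lane < WARP_SIZE; ++lane)
            if (lane >= delta)
                values[lane] += shifted[lane];
    }
}
```

If each lane starts with the length of the segment ending at its vertex (0 for the first vertex), the scan leaves the accumulated strip distance in every lane after five rounds.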


DISTANCE COMPUTATION

Batching & Latency hiding

Memory-intensive operations prefer many threads to hide the latency of fetches

Would not "compute" distance for a single strip, but need many strips to work on

Use one warp per strip if total amount of threads is low

[Figure: timeline of Warp 0 to Warp 3, each going through fetch, wait for memory, compute; hardware switches activity between entire warps, which yields the effective utilization]


DISTANCE COMPUTATION

Batching & Latency hiding

Launch overhead of compute dispatch is not negligible for < 10 000 threads

Use glEnable(GL_RASTERIZER_DISCARD); and Vertex-Shader to do compute work

No shared memory, but warp data sharing as seen before (ARB_shader_ballot or NV_shader_thread_shuffle)

    // ... "Compute" alternative for few threads
    if (numThreads < FEW_THREADS) {
        glUseProgram( vs );
        glEnable ( GL_RASTERIZER_DISCARD );
        glDrawArrays( GL_POINTS, 0, numThreads );
        glDisable ( GL_RASTERIZER_DISCARD );
    }
    else {
        glUseProgram( cs );
        numGroups = (numThreads + GroupSize - 1) / GroupSize;
        glUniform1i ( 0, numThreads );
        glDispatchCompute ( numGroups, 1, 1 );
    }

    // ... Shader
    #if USE_COMPUTE
    layout (local_size_x=GROUP_SIZE) in;
    layout (location=0) uniform int numThreads;
    int threadID = int( gl_GlobalInvocationID.x );
    #else
    int threadID = int( gl_VertexID );
    #endif
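
The group-count arithmetic in the dispatch path is a rounded-up division, sketched here in C. The helper names are hypothetical, and the value chosen for `FEW_THREADS` is an assumption taken from the 10 000 thread rule of thumb above, not a constant defined in the talk.

```c
#include <assert.h>

#define FEW_THREADS 10000 /* assumed threshold, from the rule of thumb above */

/* Number of work groups needed to cover numThreads, rounded up. */
static int num_groups(int numThreads, int groupSize)
{
    assert(numThreads >= 0 && groupSize > 0);
    return (numThreads + groupSize - 1) / groupSize;
}

/* Returns 1 when the vertex-shader path should replace the compute dispatch. */
static int use_vertex_shader_path(int numThreads)
{
    return numThreads < FEW_THREADS;
}
```

Because the count is rounded up, the last group may contain invocations beyond `numThreads`; passing `numThreads` as a uniform lets the compute shader skip those.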


SMOOTH TRANSITIONS

Anti-aliasing edges within shader

Fragment shader effects cause outlines of visible shapes to be within geometry

No geometric edges, so no MSAA benefit

MSAA will not add quality "within triangle"

Need to compute coverage accurately (sample-shading) or approximate

Use of gl_SampleID (e.g. with interpolateAtSample) automatically makes the shader run per-sample; "discard" will affect the coverage mask properly

Cheaper: GL_SAMPLE_ALPHA_TO_COVERAGE or clear bits in gl_SampleMask

    in float stippleCoord;
    ...
    float sc = interpolateAtSample (stippleCoord, gl_SampleID);
    float stippleResult = computeStippling( sc );
    if (stippleResult < 0) discard;


SMOOTH TRANSITIONS

Using Pixel Derivatives

Simple trick to get smooth transitions, also works well on surface contour lines

Use a signed distance field instead of a step function

Find if sample is close to transition (zero crossing) via fwidth

Compute smooth weight if required

    float weight = signal < 0 ? -1 : 1;
    float zone = fwidth ( signal ) * 0.5;
    if (abs (signal) < zone){
        weight = signal / zone;
    }

[Figure: step function (0 to 1) compared with a signed signal (-1 to 1); fwidth(signal) defines the smoothing zone around zero, and the signal is used within the smoothing zone]
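
The weight computation above can be tried on the CPU with a small C sketch. Since `fwidth` only exists in fragment shaders, its value is passed in as a parameter here; `smooth_weight` follows the shader snippet, while `coverage_from_weight` is an added assumption showing one way to map the weight to a [0, 1] coverage value.

```c
#include <assert.h>

/* Mirrors the shader snippet: -1 or +1 outside the smoothing zone,
   a linear ramp through zero inside it. fwidth_signal stands in for
   the GLSL fwidth(signal) value. */
static float smooth_weight(float signal, float fwidth_signal)
{
    assert(fwidth_signal >= 0.0f);
    float weight = signal < 0.0f ? -1.0f : 1.0f;
    float zone = fwidth_signal * 0.5f;
    float magnitude = signal < 0.0f ? -signal : signal;
    if (magnitude < zone)
        weight = signal / zone;
    return weight;
}

/* Assumed remapping of the weight from [-1, 1] to a [0, 1] coverage. */
static float coverage_from_weight(float weight)
{
    return weight * 0.5f + 0.5f;
}
```

With a derivative of 1.0 per pixel, a signal of 0.25 lies inside the zone of 0.5 and yields a weight of 0.5 instead of a hard 1.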


RECORDING RAW DATA

Using persistent mapped buffers

When primitives & vertices are not re-used, but regenerated by the CPU, we want a fast way to get them to the GPU

Use ARB_buffer_storage/OpenGL 4.4 to have buffers in CPU memory for fast copying

Need fences to avoid overwriting data still used by the GPU; 3 frames are typically enough to avoid synchronization

CPU memory access "okayish" if data is only read rarely (once for stipple-compute, once for render)

[Figure: timeline of CPU filling and GPU rendering. Sets of buffers (A, B, C) are cycled each frame; the GPU signals a frame fence after rendering from a set, and the CPU waits on that fence before filling the set again]
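
The cycling of buffer sets can be sketched without any OpenGL calls by tracking which frame last used each set. This C example is a simplified model under assumptions: the `BufferRing` type and both functions are hypothetical, and frame counters stand in for real sync objects. In an actual renderer the "must wait" case would correspond to waiting on a fence created with glFenceSync, for example via glClientWaitSync.

```c
#include <assert.h>

#define NUM_SETS 3 /* three sets, as suggested above */

typedef struct {
    unsigned long long fence_frame[NUM_SETS]; /* frame that last used each set, 0 = never */
    unsigned long long current_frame;
} BufferRing;

/* Picks the buffer set for the next frame. *must_wait is set to 1 if the GPU
   has not yet finished the frame that last used this set. */
static int ring_begin_frame(BufferRing *ring,
                            unsigned long long gpu_completed_frame,
                            int *must_wait)
{
    assert(ring != 0 && must_wait != 0);
    ring->current_frame++;
    int set = (int)(ring->current_frame % NUM_SETS);
    *must_wait = ring->fence_frame[set] > gpu_completed_frame;
    return set;
}

/* Records that the current frame used the set (where a fence would be placed). */
static void ring_end_frame(BufferRing *ring, int set)
{
    assert(ring != 0 && set >= 0 && set < NUM_SETS);
    ring->fence_frame[set] = ring->current_frame;
}
```

As long as the GPU is less than three frames behind, `must_wait` stays 0 and the CPU never stalls.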


RENDERING ARCS

Not trivial to compute distance along an arbitrary projected arc/circle

Approximate circle as line strip

Allocate maximum subdivision

Compute adaptively based on screen-space size (or frustum cull)

Rendering only needs to fetch distance values, can still compute position on the fly
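
One possible way to pick the subdivision adaptively is sketched below in C. This is an assumption, not the method used in the talk: the segment count is doubled until the estimated gap between chord and arc, approximated as radius * step^2 / 8, drops below a pixel tolerance, and it is capped at the allocated maximum. The function name and all parameters are hypothetical.

```c
#include <assert.h>

/* Returns a segment count for an arc of angle_rad radians whose projected
   radius is radius_px pixels. The chord-to-arc gap is approximated as
   radius * step^2 / 8, which overestimates the exact gap and is therefore
   conservative. The result never exceeds max_segments. */
static int arc_segment_count(float radius_px, float angle_rad,
                             float tolerance_px, int max_segments)
{
    assert(radius_px >= 0.0f && angle_rad > 0.0f);
    assert(tolerance_px > 0.0f && max_segments >= 1);
    int n = 1;
    while (n < max_segments) {
        float step = angle_rad / (float)n;
        float error = radius_px * step * step / 8.0f;
        if (error <= tolerance_px)
            break;
        n *= 2;
    }
    return n < max_segments ? n : max_segments;
}
```

Small or distant arcs then use few segments, while the distance buffer can still be allocated once for the maximum subdivision.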


OUTLOOK & CONCLUSION

Preserving all primitive order is not optimal for performance; ideally the application can operate in layers

Code your own special primitives for annotations (arrows...)

Use of shaders can increase visual quality beyond "fancy surface shading"

Do not need actual geometry for everything (distance fields are great)

GPU programmable enough to move more effects from CPU to GPU



THANK YOU

JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join

ckubisch@nvidia.com @pixeljetstream
