Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
April 4-7, 2016 | Silicon Valley<br />
<strong>OPENGL</strong> <strong>BLUEPRINT</strong> <strong>RENDERING</strong><br />
Christoph Kubisch, 4/7/2016
MOTIVATION<br />
Blueprints / drawings in CAD/graph viewer<br />
applications<br />
Documents can contain many LINES and LINE_STRIPS<br />
Various line styles can be used (world-space widths,<br />
stippling, joints, caps...)<br />
Potential CPU bottlenecks<br />
‣ Generating geometry for complex styles<br />
‣ Collecting and rendering geometry<br />
Model courtesy of PTC<br />
2
MOTIVATION<br />
Not targeting full vector graphics<br />
NV_path_rendering covers high fidelity vector<br />
graphics rendering<br />
Per-pixel quadratic Bézier evaluation<br />
Stencil & Cover pass to allow sophisticated<br />
blending<br />
Focus of this talk is rendering lines defined by<br />
traditional vertices<br />
Rendering data from OpenGL buffer objects<br />
Single-pass, but does mean not safe for blending<br />
(does self-overlap)<br />
3
DEMO: BASIC DEMONSTRATION<br />
4
LINE RASTERIZATION<br />
Representation<br />
Standard:<br />
skewed rectangle<br />
pixel snapped lines<br />
Multisampling:<br />
aligned rectangle<br />
smooth lines<br />
Both suffer from visible gaps and<br />
overlaps on increasing line width<br />
5
LINE RASTERIZATION<br />
Stippling<br />
Stippling only in screenspace<br />
Patterns must be expressable with<br />
16 bits<br />
LINES re-start pattern every<br />
segment<br />
LINE_STRIPS have continous<br />
distance<br />
6
SHADER-DRIVEN LINES<br />
TECH<br />
Appearance<br />
on screen<br />
Create TRIANGLES/QUADS<br />
for line segments<br />
Project extruded vertices<br />
to keep line width<br />
consistent<br />
Clip and color in fragment<br />
shader based on UV<br />
coordinates and line<br />
distance<br />
Geometry in world<br />
coordinates<br />
Shapes via fragment<br />
shader discard<br />
7
SHADER-DRIVEN LINES<br />
TECH<br />
FLEXIBILITY<br />
Create TRIANGLES for line<br />
segments, project<br />
extrusion to<br />
world/screen, discard<br />
fragments<br />
Arbitrary stippling<br />
patterns and line widths<br />
Joint- and cap-styles<br />
Different distance metrics<br />
New coloring/animation<br />
possibilities via shaders<br />
Thin center line as effect<br />
8
SHADER-DRIVEN LINES<br />
TECH<br />
FLEXIBILITY<br />
CAVEATS<br />
Create TRIANGLES for line<br />
segments, project<br />
extrusion to<br />
world/screen, discard<br />
fragments<br />
Arbitrary stippling<br />
patterns and line widths<br />
Joint- and cap-styles<br />
Different distance metrics<br />
New coloring/animation<br />
possibilities via shaders<br />
Cannot be as fast as basic<br />
line rasterization<br />
Not all data local at<br />
rendering time (line strip<br />
distances need extra<br />
calculation)<br />
Geometry still selfoverlaps<br />
9
SHADER-DRIVEN LINES<br />
Sample implementation/library<br />
C interface library to render different<br />
line primitives (LINES, LINE_STRIPS,<br />
ARCS) provided as flexible framework<br />
rather than black-box<br />
Two different render-modes:<br />
render as extruded triangles,<br />
or one pixel wide lines<br />
Uses NVIDIA and ARB OpenGL extensions<br />
if available<br />
10
SHADER-DRIVEN LINES<br />
Sample implementation/library<br />
Global style and stipple definitions<br />
Stipple from arbitrary bit-pattern, or<br />
float values<br />
Style 0<br />
Style 1<br />
...<br />
Style-<br />
Definitions<br />
Stipple-<br />
Patterns<br />
Pattern texture A<br />
Pattern texture B<br />
...<br />
typedef struct NVLStyleInfo_s {<br />
NVLSpaceType<br />
projectionSpace;<br />
NVLJoinType<br />
join;<br />
NVLCapsType<br />
capsBegin;<br />
NVLCapsType<br />
capsEnd;<br />
float<br />
thickness;<br />
NVLStippleID<br />
stipplePattern;<br />
float<br />
stippleLength;<br />
float<br />
stippleOffsetBegin;<br />
float<br />
stippleOffsetEnd;<br />
NVLAnchorType<br />
stippleAnchor;<br />
NVLboolean<br />
stippleClamp;<br />
} NVLStyleInfo;<br />
typedef enum NVLSpaceType_e {<br />
NVL_SPACE_SCREEN,<br />
NVL_SPACE_SCREENDIST3D,<br />
NVL_SPACE_CUSTOM,<br />
NVL_SPACE_CUSTOMDIST3D,<br />
NVL_NUM_SPACES,<br />
}NVLSpaceType;<br />
typedef enum NVLAnchorType_e {<br />
NVL_ANCHOR_BEGIN,<br />
NVL_ANCHOR_END,<br />
NVL_ANCHOR_BOTH,<br />
NVL_NUM_ANCHORS,<br />
}NVLAnchorType;<br />
typedef enum NVLCapsType_e {<br />
NVL_CAPS_NONE,<br />
NVL_CAPS_ROUND,<br />
NVL_CAPS_BOX,<br />
NVL_NUM_CAPS,<br />
}NVLCapsType;<br />
typedef enum NVLJoinType_e {<br />
NVL_JOIN_NONE,<br />
NVL_JOIN_ROUND,<br />
NVL_JOIN_MITER,<br />
NVL_NUM_JOINS,<br />
}NVLJoinType;<br />
11
SHADER-DRIVEN LINES<br />
Sample implementation/library<br />
Uses GPU friendly collection mechanism:<br />
Record many primitives then render<br />
Optionally render sub-sections<br />
Raw Primitives pass vertex data directly<br />
Geometry Primitives reference existing<br />
Vertex Buffers<br />
Collections have usage-style flags:<br />
Geometry/Raw Recording<br />
Geometry Primitives VBO reference<br />
Raw Primitives Vertex values<br />
Matrix Color<br />
Style reference<br />
‣ filled new per-frame<br />
‣ recorded once, re-used many frames<br />
12
SHADER-DRIVEN LINES<br />
Quad extrusion<br />
Faster geometry creation by just using Vertex-<br />
Shader, avoiding extra Geometry-Shader stage<br />
VS<br />
GS<br />
Render GL_QUADS (4 vertices each segment)<br />
Use gl_VertexID to fetch line points<br />
VertexBuffer<br />
texelFetch(...gl_VertexID/4 + 0 or 1)<br />
Use it for the offsets as well<br />
Using custom vertex-fetch generally not<br />
recommended, but useful for special<br />
situations<br />
gl_VertexID % 4 + 0 gl_VertexID % 4 + 1<br />
13
SHADER-DRIVEN LINES<br />
Minimize Overdraw<br />
No naive rectangles but adjacency in<br />
LINE_STRIP is used to tighten the geometry<br />
Reduces overdraw and minimizes potential<br />
artifacts resulting from that<br />
14
SHADER-DRIVEN LINES<br />
Depth clamping<br />
Joints and caps exceed original line<br />
definition<br />
Can cause depth-buffer artifacts<br />
Prevent depth over-shooting by passing<br />
closest depth to fragment shader and<br />
clamp there<br />
Can use ARB_conservative_depth or just<br />
min/max to keep hardware z-cull active<br />
#extension GL_ARB_conservative_depth : require<br />
layout (depth_greater) out float gl_FragDepth;<br />
in flat float closestPointDepth;<br />
...<br />
gl_FragDepth = max(gl_FragCoord.z,<br />
closestPointDepth);<br />
15
DISTANCE COMPUTATION<br />
LINE_STRIPS need dedicated calculation phase<br />
V 0 V 1<br />
V 2<br />
Read vertices and calculate distances along the strip<br />
VertexBuffer V<br />
V 3<br />
4<br />
Strip Length 0 1<br />
2 3<br />
Sections drawn indepedently<br />
Fetch vertices & distances<br />
D 0 D 1<br />
D 1 D 2<br />
DistanceBuffer D 0 [0,1] [0,1]+[1,2] [0,1]+[1,2]+[2,3]<br />
Distances are fetched at render-time<br />
D 2 D 3<br />
16
DISTANCE COMPUTATION<br />
Shader Tips<br />
One LINE_STRIP per thread can lead to<br />
under utilization and non ideal memory<br />
access due to divergence<br />
Strip Length<br />
Thread: 0 ... 3<br />
3 2 4 8<br />
SIMT hardware processes threads together<br />
in lock-step, common instruction pointer<br />
(masks out inactive threads).<br />
NVIDIA: 1 warp = 32 threads<br />
VertexBuffer<br />
Distance<br />
Accumulation<br />
Loop<br />
0<br />
1<br />
2<br />
-<br />
-<br />
-<br />
3<br />
4<br />
-<br />
-<br />
-<br />
-<br />
5<br />
6<br />
7<br />
8<br />
-<br />
-<br />
9<br />
10<br />
11<br />
12<br />
13<br />
14<br />
-<br />
-<br />
-<br />
15<br />
-<br />
-<br />
-<br />
16<br />
17
DISTANCE COMPUTATION<br />
Shader Tips<br />
Compute one LINE_STRIP at a time across<br />
warp, gives nice memory fetch<br />
NV_shader_thread_shuffle to access<br />
neighbors and do prefix-sum calculation<br />
vec3 posA = getPosition ( gl_ThreadInWarpNV + …)<br />
vec3 posB = shuffleUpNV (posA, 1, gl_WarpSizeNV);<br />
... Handle first thread point differently<br />
float dist = distance(posA, posB);<br />
Short strips may still under-utilize warp,<br />
but are taking only one iteration<br />
Strip Length<br />
VertexBuffer<br />
Distance<br />
Accumulation<br />
Loop<br />
Access neighbor<br />
point via<br />
shuffleUpNV and<br />
compute distance<br />
Thread: 0 ... 3<br />
9 9 9 9<br />
0 1 2 3<br />
[0,0] [0,1] [1,2] [2,3]<br />
... Prefix-sum over distances ...<br />
4 5 6 7<br />
8 - - -<br />
18
DISTANCE COMPUTATION<br />
Batching & Latency hiding<br />
Memory intensive operations prefer many<br />
threads to hide latency of fetch<br />
Would not „compute“ distance for a single<br />
strip, but need many strips to work on<br />
Use one warp per strip if total amount of<br />
threads is low<br />
Warp 0 Warp 1 Warp 2 Warp 3<br />
Fetch<br />
Wait<br />
For<br />
Memory<br />
Compute<br />
Effective<br />
Utilization<br />
Hardware switches activity<br />
between entire warps<br />
19
DISTANCE COMPUTATION<br />
Batching & Latency hiding<br />
Launch overhead of compute dispatch not<br />
negligable for < 10 000 threads<br />
Use glEnable(GL_RASTERIZER_DISCARD); and<br />
Vertex-Shader to do compute work<br />
No shared memory but warp data sharing as<br />
seen before (ARB_shader_ballot or<br />
NV_shader_thread_shuffle)<br />
... “Compute” alternative for few threads<br />
if (numThreads < FEW_THREADS){<br />
glUseProgram( vs );<br />
glEnable ( GL_RASTERIZER_DISCARD );<br />
glDrawArrays( GL_POINTS, 0, numThreads );<br />
glDisable ( GL_RASTERIZER_DISCARD );<br />
}<br />
else {<br />
glUseProgram( cs );<br />
numGroups = (numThreads+GroupSize-1)/GroupSize;<br />
glUniformi1 (0, numThreads);<br />
glDispatchCompute ( numGroups, 1, 1 );<br />
}<br />
... Shader<br />
#if USE_COMPUTE<br />
layout (local_size_x=GROUP_SIZE) in;<br />
layout (location=0) uniform int numThreads;<br />
int threadID = int( gl_GlobalInvocationID.x );<br />
#else<br />
int threadID = int( gl_VertexID );<br />
#endif<br />
20
SMOOTH TRANSITIONS<br />
Anti-aliasing edges within shader<br />
Fragment shader effects cause outlines of visible<br />
shapes to be within geometry<br />
No geometric edges No MSAA benefit<br />
MSAA will not add quality „within triangle“<br />
Need to compute coverage accurately (sampleshading)<br />
or approximate<br />
Use of gl_SampleID (e.g. with<br />
interpolateAtSample) automatically makes<br />
shader run per-sample, „discard“ will affect<br />
coverage mask properly<br />
Cheaper: GL_SAMPLE_ALPHA_TO_COVERAGE or<br />
clear bits in gl_SampleMask<br />
in float stippleCoord;<br />
...<br />
sc = interpolateAtSample (stippleCoord, gl_SampleID);<br />
stippleResult = computeStippling( sc );<br />
if (stippleResult < 0) discard;<br />
21
SMOOTH TRANSITIONS<br />
Using Pixel Derivatives<br />
1<br />
1<br />
Simple trick to get smooth transitions, also works<br />
well on surface contour lines<br />
0<br />
0<br />
Use a signed distance field, instead of step<br />
function<br />
Find if sample is close to transition (zero<br />
crossing) via fwidth<br />
0<br />
1<br />
fwidth<br />
( signal )<br />
smoothing<br />
zone<br />
around zero<br />
Compute smooth weight if required<br />
float weight = signal < 0 ? -1 : 1;<br />
float zone = fwidth ( signal ) * 0.5;<br />
if (abs (signal) < zone){<br />
weight = signal / zone;<br />
}<br />
-1<br />
signal<br />
within<br />
smoothing<br />
zone<br />
22
RECORDING RAW DATA<br />
Using persistent mapped buffers<br />
When primitives & vertices are not re-used, but<br />
regenerated by CPU, we want a fast way to get<br />
them to GPU<br />
CPU filling<br />
Buffers A<br />
Timeline<br />
GPU rendering<br />
Use ARB_buffer_storage/OpenGL 4.3 to have<br />
buffers in CPU memory for fast copying<br />
Need fences to avoid overwriting data still used<br />
by GPU, 3 frames typically enough to avoid<br />
synchronization<br />
Cycle sets<br />
of buffers<br />
each<br />
frame<br />
Buffers B<br />
Buffers C<br />
Wait Fence<br />
Buffers A<br />
Buffers A<br />
Signal Frame Fence<br />
Buffers B<br />
Buffers C<br />
CPU memory access „okayish“ if data only read<br />
rarely (once for stipple-compute, once for<br />
render)<br />
Buffers B<br />
Buffers A<br />
Buffers B<br />
23
<strong>RENDERING</strong> ARCS<br />
Not trivial to compute distance along an arbitrary<br />
projected arc/circle<br />
Approximate circle as line strip<br />
Allocate maximum subdivision<br />
Compute adaptively based on screen-space size<br />
(or frustum cull)<br />
Rendering only needs to fetch distance values,<br />
can still compute position on the fly<br />
24
OUTLOOK & CONCLUSION<br />
Preserving all primitive order not optimal for<br />
performance, ideally application can operate in layers.<br />
Code your own special primitives for annotations<br />
(arrows...)<br />
Use of shaders can increase visual quality beyond<br />
„fancy surface shading“<br />
Do not need actual geometry for everything (distance<br />
fields are great)<br />
GPU programmable enough to move more effects from<br />
CPU to GPU<br />
25
April 4-7, 2016 | Silicon Valley<br />
THANK YOU<br />
JOIN THE NVIDIA DEVELOPER PROGRAM AT developer.nvidia.com/join<br />
ckubisch@nvidia.com @pixeljetstream