
ATI Stream Computing OpenCL Programming Guide - CiteSeerX



ATI STREAM COMPUTING

Figures

1.1 Generalized GPU Compute Device Structure
1.2 Simplified Block Diagram of the GPU Compute Device
1.3 ATI Stream Software Ecosystem
1.4 Simplified Mapping of OpenCL onto ATI Stream Computing
1.5 Work-Item Grouping Into Work-Groups and Wavefronts
1.6 CAL Functionality
1.7 Interrelationship of Memory Domains
1.8 Dataflow between Host and GPU
1.9 Simplified Execution Of Work-Items On A Single Stream Core
1.10 Stream Core Stall Due to Data Dependency
1.11 OpenCL Programming Model
2.1 OpenCL Compiler Toolchain
2.2 Runtime Processing Structure
4.1 Memory System
4.2 FastPath (blue) vs CompletePath (red) Using float1
4.3 Transformation to Staggered Offsets
4.4 Two Kernels: One Using float4 (blue), the Other float1 (red)
4.5 Effect of Varying Degrees of Coalescing - Coal (blue), NoCoal (red), Split (green)
4.6 Unaligned Access Using float1
4.7 Unmodified Loop
4.8 Kernel Unrolled 4X
4.9 Unrolled Loop with Stores Clustered
4.10 Unrolled Kernel Using float4 for Vectorization
4.11 One Example of a Tiled Layout Format
C.1 Pixel Shader Matrix Transpose
C.2 Compute Kernel Matrix Transpose
C.3 LDS Matrix Transpose

Copyright © 2010 Advanced Micro Devices, Inc. All rights reserved.
