11.07.2015 Views

A Compiler for Parallel Execution of Numerical Python Programs on ...

2.2 AMD CAL Programming model

Various APIs and programming languages are available to program the RV770 GPU for GPGPU purposes. The AMD Compute Abstraction Layer (CAL) Application Programming Interface [8] provides the ability to program the GPU using either the R700 family ISA [10] or the AMD Intermediate Language (IL) [9]. OpenCL support and DirectX-11 compute shader 4.1 support are also expected in the future for the RV770. To program the GPU using the CAL API, the programmer explicitly manages the GPU. For best efficiency, the programmer must explicitly manage the GPU's on-board memory and must explicitly transfer data between the system RAM and the GPU on-board RAM. While the hardware does provide some ability to directly access the system RAM without explicit transfers to the GPU on-board RAM, the ability is very limited. Accessing the system memory directly over the PCI-e bus is also highly inefficient due to the high latency and relatively low bandwidth.

This document discusses the AMD IL programming model since it corresponds closely to the hardware. AMD IL is a RISC-like program representation derived from the Shader Model 4.0 assembler. The AMD CAL API includes a JIT compiler to compile and run AMD IL programs and also provides routines for managing the GPU memory, initialization and shutdown of the GPU and querying the GPU for available resources.
The CAL runtime and the CAL compiler are distributed as part of the AMD graphics driver.

2.2.1 Memory management

In the CAL API, the memory on the GPU is allocated as two-dimensional resources. To allocate a resource, the programmer specifies the height, width, format, memory type and location of the resource. The width and height are both restricted to 8192 elements. The format specifies the element type of the resource and can be any of the numeric datatypes up to 128-bit width. For example, float, float2, float4, int, int2, int4, double and double2 are all valid format specifiers. The location specifies whether the resource is located in the GPU memory or in a driver-allocated portion of system RAM.

For GPU resources, another specifier is the memory type. AMD GPUs are capable of storing resources in two different physical arrangements. The default arrangement is a tiled arrangement where a block of 16×4 bytes is stored contiguously. The other option is to store resources in linear memory that corresponds to row major order, similar to 2-dimensional arrays in C. The two memory arrangements are illustrated in Figure 2.2.

2.2.2 Context management

A GPU can support one or more execution contexts.
All execution on a GPU is done through a context. Within a context, resources can be mapped to one or more predefined names. A resource is mapped to a name that specifies how the resource can be used within
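
Returning to the resource parameters and the two memory arrangements described in Section 2.2.1, the following Python sketch may help make them concrete. It is not the CAL API: `ResourceSpec`, `linear_index` and `tiled_index` are hypothetical names introduced only for illustration. The 8192-element limit and the format names come from the text; the tile shape is an assumption, reading "16×4 bytes" as 16 bytes wide by 4 rows, which gives 4×4 elements for a 4-byte format such as float. The sketch also assumes the resource width is a multiple of the tile width.

```python
from dataclasses import dataclass

MAX_DIM = 8192  # per-dimension limit stated in Section 2.2.1

# Bytes per element for the formats named in the text (all <= 128 bits).
FORMAT_BYTES = {
    "float": 4, "float2": 8, "float4": 16,
    "int": 4, "int2": 8, "int4": 16,
    "double": 8, "double2": 16,
}


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical description of a 2D resource (not a CAL type)."""
    width: int
    height: int
    fmt: str
    tiled: bool = True    # tiled is the default arrangement per the text
    on_gpu: bool = True   # False: driver-allocated portion of system RAM

    def __post_init__(self):
        if not (1 <= self.width <= MAX_DIM and 1 <= self.height <= MAX_DIM):
            raise ValueError("width and height must be in 1..8192")
        if self.fmt not in FORMAT_BYTES:
            raise ValueError(f"unknown format: {self.fmt}")

    @property
    def size_bytes(self):
        return self.width * self.height * FORMAT_BYTES[self.fmt]


def linear_index(x, y, width):
    """Row-major element index, as for a 2D array in C."""
    return y * width + x


def tiled_index(x, y, width, tile_w=4, tile_h=4):
    """Element index when each tile_w x tile_h block is stored contiguously.

    The tile shape is an assumption for illustration; width is assumed
    to be a multiple of tile_w.
    """
    tiles_per_row = width // tile_w
    tile_id = (y // tile_h) * tiles_per_row + (x // tile_w)
    offset_in_tile = (y % tile_h) * tile_w + (x % tile_w)
    return tile_id * (tile_w * tile_h) + offset_in_tile
```

Under these assumptions, vertically adjacent elements inside one tile are `tile_w` positions apart in the tiled layout, whereas in the linear layout they are a full row (`width` positions) apart. This is one way to see why the two arrangements can behave differently for 2D access patterns.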
