A Compiler for Parallel Execution of Numerical Python Programs on ...

2.2 AMD CAL Programming model

Various APIs and programming languages are available to program the RV770 GPU for GPGPU purposes. The AMD Compute Abstraction Layer (CAL) Application Programming Interface [8] provides the ability to program the GPU using either the R700 family ISA [10] or the AMD Intermediate Language (IL) [9]. OpenCL support and DirectX-11 compute shader 4.1 support are also expected in the future for the RV770.

To program the GPU using the CAL API, the programmer explicitly manages the GPU. For best efficiency, the programmer must explicitly manage the GPU’s on-board memory and must explicitly transfer data between the system RAM and the GPU on-board RAM. While the hardware does provide some ability to directly access the system RAM without explicit transfers to the GPU on-board RAM, this ability is very limited. Accessing the system memory directly over the PCI-e bus is also highly inefficient due to the high latency and relatively low bandwidth.

This document discusses the AMD IL programming model since it corresponds closely to the hardware. AMD IL is a RISC-like program representation derived from the Shader Model 4.0 assembler. The AMD CAL API includes a JIT compiler to compile and run AMD IL programs and also provides routines for managing the GPU memory, initializing and shutting down the GPU, and querying the GPU for available resources. The CAL runtime and the CAL compiler are distributed as part of the AMD graphics driver.

2.2.1 Memory management

In the CAL API, the memory on the GPU is allocated as two-dimensional resources. To allocate a resource, the programmer specifies the height, width, format, memory type and location of the resource. The width and height are each restricted to at most 8192 elements. The format specifies the element type of the resource and can be any of the numeric datatypes up to 128 bits wide. For example, float, float2, float4, int, int2, int4, double and double2 are all valid format specifiers. The location specifies whether the resource is located in the GPU memory or in a driver-allocated portion of system RAM.

For GPU resources, another specifier is the memory type. AMD GPUs are capable of storing resources in two different physical arrangements. The default arrangement is a tiled arrangement where a block of 16×4 bytes is stored contiguously. The other option is to store resources in linear memory that corresponds to row-major order, similar to 2-dimensional arrays in C. The two memory arrangements are illustrated in Figure 2.2.

2.2.2 Context management

A GPU can support one or more execution contexts. All execution on a GPU is done through a context. Within a context, resources can be mapped to one or more predefined names. A resource is mapped to a name that specifies how the resource can be used within
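
The resource parameters and the two memory arrangements described in Section 2.2.1 can be illustrated with a small Python model. This is only a sketch under stated assumptions and does not use the CAL API: the names Resource2D, linear_offset and tiled_offset are hypothetical, and the tiled layout assumes that a tile spans 16 bytes horizontally and 4 rows vertically and that tiles themselves follow one another in row-major order. The actual hardware tiling may differ in its details.

```python
from dataclasses import dataclass

MAX_DIM = 8192  # per-dimension limit stated in the text

# Bytes per element for the format names listed in the text.
FORMAT_BYTES = {
    "float": 4, "float2": 8, "float4": 16,
    "int": 4, "int2": 8, "int4": 16,
    "double": 8, "double2": 16,
}


@dataclass(frozen=True)
class Resource2D:
    """Hypothetical descriptor holding the parameters named in the text."""
    width: int
    height: int
    fmt: str
    location: str = "gpu"    # "gpu" or "system" (driver-allocated system RAM)
    mem_type: str = "tiled"  # "tiled" (the default) or "linear"

    def __post_init__(self):
        if not (1 <= self.width <= MAX_DIM and 1 <= self.height <= MAX_DIM):
            raise ValueError("width and height must each be in 1..8192")
        if self.fmt not in FORMAT_BYTES:
            raise ValueError(f"unknown format: {self.fmt}")
        if self.location not in ("gpu", "system"):
            raise ValueError(f"unknown location: {self.location}")
        if self.mem_type not in ("tiled", "linear"):
            raise ValueError(f"unknown memory type: {self.mem_type}")


def linear_offset(res, x, y):
    """Byte offset of element (x, y) in row-major order, as in a C 2-D array."""
    return (y * res.width + x) * FORMAT_BYTES[res.fmt]


def tiled_offset(res, x, y, tile_w_bytes=16, tile_h=4):
    """Byte offset of element (x, y) under the ASSUMED tiling described above.

    Each tile covers tile_w_bytes bytes of tile_h consecutive rows and is
    stored contiguously; tiles are laid out in row-major order.
    """
    elem = FORMAT_BYTES[res.fmt]
    row_bytes = res.width * elem
    tiles_per_row = -(-row_bytes // tile_w_bytes)  # ceiling division
    byte_x = x * elem
    tile_x, in_x = divmod(byte_x, tile_w_bytes)
    tile_y, in_y = divmod(y, tile_h)
    tile_index = tile_y * tiles_per_row + tile_x
    return tile_index * tile_w_bytes * tile_h + in_y * tile_w_bytes + in_x


if __name__ == "__main__":
    res = Resource2D(width=8, height=8, fmt="float")
    print(linear_offset(res, 3, 2), tiled_offset(res, 3, 2))
```

In this model, elements that are vertical neighbours within a tile end up 16 bytes apart rather than a full row apart, which is the kind of locality a tiled arrangement is meant to provide for two-dimensional access patterns.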
