Vectorizing the forward mode of ADOL-C on a GPU ... - Autodiff.org

K. Kulshreshtha, A. Koniaeva | 4 / 13 | Vectorizing ADOL-C using CUDA | Euro AD 10.06.2013

GPU Computing: NVIDIA's CUDA architecture

• The host issues a succession of kernel invocations to the device. Each kernel is executed as a batch of threads organized as a grid of thread blocks.
• Several kernels may be started in parallel by distributing them on the grid.
• E.g. the NVIDIA Quadro 4000 with CUDA runtime version 4.2 has:
  Maximum number of threads per block: 1024
  Maximum sizes of each dimension of a block: 1024 x 1024 x 64
  Maximum sizes of each dimension of a grid: 65535 x 65535 x 65535

[Figure: the host launches Kernel 1 on Grid 1 of the device, which consists of Block(0, 0) to Block(2, 1), and Kernel 2 on Grid 2. Block (1, 1) is shown expanded into Thread(0, 0) to Thread(4, 2).]
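The limits listed on this slide constrain which launch configurations are valid. The following is a minimal host-side C++ sketch, not code from the talk, that checks a grid and block shape against those figures. The type `Dim3` and the functions `within`, `launch_config_ok`, and `global_index` are hypothetical names introduced here for illustration. `Dim3` only mimics CUDA's `dim3`, and `global_index` mirrors the usual kernel expression `blockIdx.x * blockDim.x + threadIdx.x`.

```cpp
#include <cassert>
#include <cstdint>

// Three-component extent; a hypothetical stand-in for CUDA's dim3.
struct Dim3 {
    std::uint64_t x, y, z;
};

// Limits quoted on the slide (Quadro 4000, CUDA runtime 4.2).
constexpr std::uint64_t kMaxThreadsPerBlock = 1024;
constexpr Dim3 kMaxBlockDim = {1024, 1024, 64};
constexpr Dim3 kMaxGridDim = {65535, 65535, 65535};

// True if every component of d lies in [1, limit].
inline bool within(const Dim3& d, const Dim3& limit) {
    return d.x >= 1 && d.y >= 1 && d.z >= 1 &&
           d.x <= limit.x && d.y <= limit.y && d.z <= limit.z;
}

// Checks a (grid, block) pair against the limits above. The per-dimension
// block maxima and the total thread count are separate conditions.
inline bool launch_config_ok(const Dim3& grid, const Dim3& block) {
    return within(grid, kMaxGridDim) && within(block, kMaxBlockDim) &&
           block.x * block.y * block.z <= kMaxThreadsPerBlock;
}

// Global coordinate of a thread along one axis: the host-side analogue of
// blockIdx.x * blockDim.x + threadIdx.x inside a kernel.
inline std::uint64_t global_index(std::uint64_t block_idx,
                                  std::uint64_t block_dim,
                                  std::uint64_t thread_idx) {
    assert(thread_idx < block_dim);
    return block_idx * block_dim + thread_idx;
}
```

With these definitions, the layout drawn in the figure (a 3 x 2 grid of 5 x 3 blocks) passes `launch_config_ok`. A block of 1024 x 1024 x 64 threads does not: each dimension is within its own maximum, but the product exceeds 1024 threads per block.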
GPU Computing: NVIDIA's CUDA architecture (5 / 13)

• Memory access is bidirectional.
• Data can be gathered from various memory locations to each core.
• Each core may scatter computed data across various memory locations.
• Some GPUs provide on-device DRAM as a buffer between system memory and execution cores.

[Figure: two panels, Gather and Scatter. Each shows control units with their ALUs and caches above a DRAM holding d0 to d7.]
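The gather and scatter access patterns named on this slide can be sketched in plain host-side C++ as index-driven reads and writes. This is an illustrative sketch, not code from the talk: the function names `gather` and `scatter` and the index vector `idx` are assumptions made here. On a GPU, each loop iteration would typically be carried out by its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Gather: output slot i reads from an arbitrary source location idx[i].
inline std::vector<double> gather(const std::vector<double>& src,
                                  const std::vector<std::size_t>& idx) {
    std::vector<double> out(idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i) {
        assert(idx[i] < src.size());
        out[i] = src[idx[i]];
    }
    return out;
}

// Scatter: input element i is written to an arbitrary destination idx[i].
// Assumes idx holds no duplicates; with one thread per element, duplicate
// destinations would race on a GPU.
inline void scatter(const std::vector<double>& src,
                    const std::vector<std::size_t>& idx,
                    std::vector<double>& dst) {
    assert(src.size() == idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i) {
        assert(idx[i] < dst.size());
        dst[idx[i]] = src[i];
    }
}
```

For an eight-element memory like the d0 to d7 of the figure, gathering with indices {6, 1, 3} collects d6, d1, and d3 into consecutive slots, and scattering with the same indices writes them back to positions 6, 1, and 3.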