Parallel Computing On Qualcomm Platforms Using OpenCL - Uplinq
<strong>OpenCL</strong> Memory Model<br />
• Memory Types<br />
– Private: on-chip, per-work-item temporary storage (register file)<br />
– Local: on-chip, read-write, shared by the work-items of a<br />
work-group running on one compute unit<br />
– Constant: on-chip memory, read-only during kernel execution,<br />
shared within a compute unit<br />
– Global: system memory, shared between all<br />
compute units, can be on host or device<br />
• Memory model<br />
– <strong>OpenCL</strong> provides a barrier mechanism for<br />
synchronization within a work-group<br />
o Memory written by one work-item is not guaranteed to be visible<br />
to other work-items unless explicitly synchronized<br />
– Multiple distinct address spaces<br />
– Address spaces can be collapsed depending on the<br />
device’s memory subsystem<br />
• Diagram shows typical memory layout for a<br />
GPU architecture<br />
[Figure: a Compute Device containing Compute Units 1 to N. Each compute unit holds ALUs #1 to #M, each with its own Private Memory, plus one Local Memory shared by the unit. A Global / Constant Memory Data Cache sits between the compute units and Global Memory, which resides in Compute Device Memory.]<br />
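The memory types and the barrier rule above can be illustrated with a small CPU-side emulation. This is only a sketch in Python, not OpenCL code and not taken from the slides: threads stand in for work-items, a per-group list stands in for local memory, and `threading.Barrier` stands in for the OpenCL C built-in `barrier(CLK_LOCAL_MEM_FENCE)`. The names `work_item`, `run_kernel`, `WORK_GROUP_SIZE` and `NUM_GROUPS` are hypothetical and chosen for illustration. In a real kernel the corresponding address space qualifiers would be `__private`, `__local` and `__global`.

```python
import threading

WORK_GROUP_SIZE = 4  # work-items per work-group (hypothetical, power of two)
NUM_GROUPS = 2       # work-groups, each imagined as running on one compute unit


def work_item(group_id, local_id, global_in, global_out, local_mem, barrier):
    # "Private" memory: plain local variables, visible to this work-item only.
    global_id = group_id * WORK_GROUP_SIZE + local_id

    # Stage one element from "global" memory into the group's "local" memory.
    local_mem[local_id] = global_in[global_id]

    # Without this barrier, another work-item's slot may not be written yet.
    barrier.wait()

    # Tree reduction in local memory. Every work-item reaches every barrier,
    # even when it has no addition to perform in that step.
    stride = WORK_GROUP_SIZE // 2
    while stride > 0:
        if local_id < stride:
            local_mem[local_id] += local_mem[local_id + stride]
        barrier.wait()
        stride //= 2

    # One work-item per group writes the group's sum back to "global" memory.
    if local_id == 0:
        global_out[group_id] = local_mem[0]


def run_kernel(global_in):
    """Emulate launching NUM_GROUPS work-groups of WORK_GROUP_SIZE work-items."""
    global_out = [0] * NUM_GROUPS
    threads = []
    for group_id in range(NUM_GROUPS):
        local_mem = [0] * WORK_GROUP_SIZE             # one buffer per work-group
        barrier = threading.Barrier(WORK_GROUP_SIZE)  # syncs one group only
        for local_id in range(WORK_GROUP_SIZE):
            t = threading.Thread(
                target=work_item,
                args=(group_id, local_id, global_in, global_out,
                      local_mem, barrier),
            )
            threads.append(t)
            t.start()
    for t in threads:
        t.join()
    return global_out


if __name__ == "__main__":
    print(run_kernel([1, 2, 3, 4, 5, 6, 7, 8]))  # [10, 26]
```

Each work-group gets its own local buffer and its own barrier, which mirrors the point that local memory and barrier synchronization are scoped to a single work-group. Results are shared across groups only through global memory.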
<strong>Qualcomm</strong> Proprietary