ATI Stream Computing OpenCL Programming Guide - CiteSeerX


ATI STREAM COMPUTING

[Figure 1.9: Simplified Execution Of Work-Items On A Single Stream Core. Timeline of work-items T0 through T3 over cycles 0 to 80. Legend: executing, ready (not executing), stalled.]

At runtime, work-item T0 executes until cycle 20; at this time, a stall occurs due to a memory fetch request. The scheduler then begins execution of the next work-item, T1. Work-item T1 executes until it stalls or completes. New work-items execute, and the process continues until the available number of active work-items is reached. The scheduler then returns to the first work-item, T0.

If the data work-item T0 is waiting for has returned from memory, T0 continues execution. In the example in Figure 1.9, the data is ready, so T0 continues. Since there were enough work-items and processing element operations to cover the long memory latencies, the stream core does not idle. This method of memory latency hiding helps the GPU compute device achieve maximum performance.

If none of T0 – T3 are runnable, the stream core waits (stalls) until one of T0 – T3 is ready to execute. In the example shown in Figure 1.10, T0 is the first to continue execution.

1.6 GPU Compute Device Scheduling
Copyright © 2010 Advanced Micro Devices, Inc. All rights reserved.
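The latency-hiding behavior described above can be illustrated with a small simulation. This is only a rough sketch under stated assumptions, not a model of the actual hardware scheduler: the function name `simulate`, its parameters, the fixed 60-cycle memory latency, and the strict round-robin order are all hypothetical choices made for illustration. Only the 20-cycle execution burst is taken from the text.

```python
def simulate(num_items, burst, latency, bursts_per_item):
    """Toy model of one stream core; returns (total_cycles, busy_cycles).

    Assumptions (not from the guide): every work-item alternates a compute
    burst of `burst` cycles with a memory fetch that takes `latency` cycles,
    and the scheduler picks the next runnable work-item in round-robin order.
    """
    ready_at = [0] * num_items                 # cycle when each item's data is back
    remaining = [bursts_per_item] * num_items  # compute bursts still to run
    now = 0       # current cycle
    busy = 0      # cycles spent executing (not stalled)
    current = 0   # where the round-robin search starts

    while any(remaining):
        # Look for a runnable work-item, starting from `current`.
        order = [(current + k) % num_items for k in range(num_items)]
        runnable = [i for i in order if remaining[i] and ready_at[i] <= now]
        if not runnable:
            # No work-item is ready: the core stalls until the earliest fetch returns.
            now = min(ready_at[i] for i in range(num_items) if remaining[i])
            continue
        i = runnable[0]
        now += burst                  # execute until the next memory request
        busy += burst
        remaining[i] -= 1
        ready_at[i] = now + latency   # the work-item stalls on its fetch
        current = (i + 1) % num_items
    return now, busy
```

With these assumed numbers, `simulate(4, 20, 60, 3)` returns `(240, 240)`: four work-items are enough to cover the 60-cycle latency, so the core never idles. With a single work-item, `simulate(1, 20, 60, 3)` returns `(180, 60)`, meaning the core is busy for only a third of the time because nothing else can run while T0 waits on memory.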