01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...


244 F. Xudong et al.

5 Related Work

With the widespread adoption of GPUs in scientific computing, much recent research has focused on optimizing GPU applications written in general-purpose programming languages such as Brook+ and CUDA.

Ryoo et al. proposed optimization principles for efficient mapping of computation to graphics hardware [11]. Their main concern was using massive multithreading to exploit the abundant stream core resources and hide memory fetch latency.

Jang et al. presented an optimization methodology that uses architectural information to accelerate programs on AMD GPUs [12]. They explored optimizations by defining optimization spaces, and their work revealed many details of AMD GPUs. Wang et al. presented a single-precision floating-point GPU implementation of Mgrid using CUDA [13]. They exploited data locality in the 3-level memory hierarchy and tuned thread granularity, thus reducing the pressure on off-chip memory bandwidth. Because of architectural differences, their optimization strategies cannot be directly applied using Brook+ on AMD GPUs.
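To make the notion of thread granularity concrete, the following is a minimal sketch in plain Python, not the implementation of Wang et al. or of this paper. It emulates "threads" that each compute several consecutive outputs of a 1D 3-point averaging stencil and counts how many input values each thread reads. The function names `stencil_thread` and `run`, the 1D stencil, and the load counter are all assumptions made for illustration.

```python
def stencil_thread(src, start, granularity):
    """Emulate one thread computing `granularity` consecutive outputs.

    Returns (outputs, loads): the 3-point averages for interior indices
    start .. start + granularity - 1, and the number of reads from `src`.
    """
    # Load the first two values of the sliding window once.
    left, mid = src[start - 1], src[start]
    loads = 2
    outputs = []
    for i in range(start, start + granularity):
        right = src[i + 1]            # exactly one new load per output
        loads += 1
        outputs.append((left + mid + right) / 3.0)
        left, mid = mid, right        # reuse the two values already loaded
    return outputs, loads


def run(src, granularity):
    """Cover all interior points with threads of the given granularity."""
    n_interior = len(src) - 2
    assert n_interior % granularity == 0, "sketch assumes an even split"
    result, total_loads = [], 0
    for start in range(1, n_interior + 1, granularity):
        out, loads = stencil_thread(src, start, granularity)
        result.extend(out)
        total_loads += loads
    return result, total_loads


if __name__ == "__main__":
    data = [float(i * i % 7) for i in range(18)]   # 16 interior points
    for g in (1, 4, 16):
        _, loads = run(data, g)
        print(f"granularity {g:2d}: {loads} loads for 16 outputs")
```

In this toy model, a thread that produces g outputs reads g + 2 inputs, so coarser threads read fewer values per output. On a real GPU the trade-off is that coarser threads also mean fewer threads available to hide memory latency, which is why granularity has to be tuned rather than simply maximized.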

In our work, we chose Mgrid since its stencil computations provide rich opportunities for exploiting on-chip parallelism and hiding memory access latencies. Li et al. proposed a compiler framework for automatic tiling of iterative stencil loops to improve cache reuse [14]. Krishnamoorthy et al. developed an approach for automatic parallelization of stencil codes, explicitly addressing the problem of load-balanced tile execution that arises from loop skewing in the time dimension [4]. These works focused on improving cache locality on the CPU.
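As a rough illustration of what tiling a stencil loop means, the sketch below applies plain spatial tiling (blocking) to a single 5-point Jacobi sweep on a 2D grid. This is deliberately simpler than the time-skewed tiling of iterative stencil loops studied in [14] and [4], and it is not taken from either work. The function names, the 5-point stencil, and the default tile size are assumptions chosen for the example. Being pure Python, it shows only the iteration order, not an actual cache benefit.

```python
def jacobi_sweep(src):
    """Reference: one untiled 5-point Jacobi sweep over the interior."""
    n, m = len(src), len(src[0])
    dst = [row[:] for row in src]          # boundary values are copied over
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                + src[i][j - 1] + src[i][j + 1])
    return dst


def jacobi_sweep_tiled(src, tile=4):
    """Same sweep, visiting the interior in tile x tile blocks."""
    n, m = len(src), len(src[0])
    dst = [row[:] for row in src]
    for ii in range(1, n - 1, tile):       # row origin of the current tile
        for jj in range(1, m - 1, tile):   # column origin of the current tile
            # Clamp the tile so partial tiles at the edges stay in bounds.
            for i in range(ii, min(ii + tile, n - 1)):
                for j in range(jj, min(jj + tile, m - 1)):
                    dst[i][j] = 0.25 * (src[i - 1][j] + src[i + 1][j]
                                        + src[i][j - 1] + src[i][j + 1])
    return dst


if __name__ == "__main__":
    grid = [[float((3 * i + 5 * j) % 11) for j in range(9)] for i in range(11)]
    print(jacobi_sweep_tiled(grid) == jacobi_sweep(grid))
```

Because a Jacobi sweep reads only from `src` and writes only to `dst`, the tiles are independent and can be visited in any order without changing the result. Tiling across several time steps removes that independence, which is where loop skewing and the resulting load-balancing problem come in.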

6 Conclusions and Future Work

In this paper, we implemented and optimized a real benchmark application, Mgrid, on the AMD Radeon HD4870 GPU and achieved very good experimental results. Although our implementation and optimizations are based on Mgrid, the optimization strategies can be used to improve any stencil computation. In the future, we would like to determine thread granularity automatically to simplify application optimization on the GPU, and we will also consider exploring GPU application performance at the intermediate language (IL) level. To fully exploit the heterogeneous CPU and GPU platform, we will try to automatically distribute tasks between the two computing resources.

References

1. AMD.: ATI stream computing user guide v1.4beta (2009),
http://developer.amd.com/gpu_assets/Stream_Computing_User_Guide.pdf

2. NVIDIA.: Compute unified device architecture programming guide v2.1beta (2009),
http://developer.download.nvidia.com/compute/cuda/1_0/NVIDIA_CUDA_Programming_Guide_1.0.pdf
