01.12.2012 Views

Architecture of Computing Systems (Lecture Notes in Computer ...


Optimizing Stencil Application on Multi-thread GPU Architecture

3. Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D.B., Hwu, W.-m.W.: Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp. 73–82. ACM, New York (2008)

4. Krishnamoorthy, S., Baskaran, M., Bondhugula, U., Ramanujam, J., Rountev, A., Sadayappan, P.: Effective automatic parallelization of stencil computations. SIGPLAN Not. 42(6), 235–244 (2007)

5. Fan, Z., Qiu, F., Kaufman, A., Yoakum-Stover, S.: GPU cluster for high performance computing. In: SC 2004: Proceedings of the 2004 ACM/IEEE conference on Supercomputing, Washington, DC, USA, p. 47. IEEE Computer Society Press, Los Alamitos (2004)

6. Buck, I.: Brook specification v0.2 (2003), http://hci.stanford.edu/cstr/reports/2003-04.pdf

7. Ryoo, S., Rodrigues, C.I., Stone, S.S., Stratton, J.A., Ueng, S.-Z., Baghsorkhi, S.S., Hwu, W.-m.W.: Program optimization carving for GPU computing. J. Parallel Distrib. Comput. 68(10), 1389–1401 (2008)

8. Mohan, T., de Supinski, B.R., McKee, S.A., Mueller, F., Yoo, A., Schulz, M.: Identifying and exploiting spatial regularity in data memory references. In: SC 2003: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, Washington, DC, USA, p. 49. IEEE Computer Society, Los Alamitos (2003)

9. Harris, M.J., Baxter, W.V., Scheuermann, T., Lastra, A.: Simulation of cloud dynamics on graphics hardware. In: HWWS 2003: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, Aire-la-Ville, Switzerland, pp. 92–101. Eurographics Association (2003)

10. Allen, J.R., Kennedy, K., Porterfield, C., Warren, J.: Conversion of control dependence to data dependence. In: POPL 1983: Proceedings of the 10th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, pp. 177–189. ACM, New York (1983)

11. Ryoo, S., Rodrigues, C.I., Baghsorkhi, S.S., Stone, S.S., Kirk, D.B., Hwu, W.-m.W.: Optimization principles and application performance evaluation of a multithreaded GPU using CUDA. In: PPoPP 2008: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, pp. 73–82. ACM, New York (2008)

12. Jang, B., Do, S., Pien, H., Kaeli, D.: Architecture-aware optimization targeting multithreaded stream computing. In: GPGPU-2: Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, pp. 62–70. ACM, New York (2009)

13. Wang, G., Yang, X.J., Zhang, Y., Tang, T., Fang, X.D.: Program optimization of stencil based application on the GPU-accelerated system. In: Intl. Symposium on Parallel and Distributed Processing and Applications, pp. 219–225 (2009)

14. Li, Z., Song, Y.: Automatic tiling of iterative stencil loops. ACM Trans. Program. Lang. Syst. 26(6), 975–1028 (2004)
