01.02.2013 Views

RECENT DEVELOPMENT IN COMPUTATIONAL SCIENCE


ISCS 2011 Selected Papers Vol.2 Time Benchmarks for the OpenMP and GPU

[Figure 4 panels. Horizontal axis: matrix size (256, 512, 1024, 2048, 4096, 5120). Vertical axis: GFLOPS; panels (a) and (b) also carry a second axis, RATIO.
(a) real number (M2): DGEMM (8 Threads), DGEMM (Tesla C1060), SGEMM (Tesla C1060), Ratio (Single/Double).
(b) complex number (M2): ZGEMM (8 Threads), ZGEMM (Tesla C1060), CGEMM (Tesla C1060), Ratio (Single/Double).
(c) real number (M3): Tesla C1060, Tesla M2050 with cuda3.1, Tesla M2050 with cuda3.2.
(d) complex number (M3): Tesla C1060, Tesla M2050 with cuda3.1, Tesla M2050 with cuda3.2.]

Figure 4: Effective FLOPS values of the MM for various matrix sizes (256 ∼ 5120) obtained with the SGEMM, CGEMM, DGEMM and ZGEMM routines on Tesla GPUs (Tesla C1060 and Tesla M2050). The upper two panels (a)(b) were obtained on machine M2 and the lower two panels (c)(d) on machine M3 (see Table 1). The memory transfer times are included in the estimation.
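The caption states that the effective FLOPS values include the memory transfer times. A minimal sketch of such an estimate follows. It assumes the common nominal flop counts of 2n^3 for a real n x n matrix multiplication and 8n^3 for a complex one; the paper's own counting convention is not shown on this page, and the function names and timings below are hypothetical.

```python
def gemm_flop_count(n, complex_valued=False):
    """Nominal flop count for multiplying two n x n matrices.

    Assumed convention (not taken from the paper): 2*n**3 for real
    data (SGEMM, DGEMM) and 8*n**3 for complex data (CGEMM, ZGEMM).
    """
    flops_per_multiply_add = 8 if complex_valued else 2
    return flops_per_multiply_add * n ** 3


def effective_gflops(n, t_htod, t_kernel, t_dtoh, complex_valued=False):
    """Effective GFLOPS with host-device transfer times included.

    t_htod, t_kernel and t_dtoh are wall-clock times in seconds for the
    host-to-device copy, the GEMM call, and the device-to-host copy.
    """
    total_time = t_htod + t_kernel + t_dtoh
    if total_time <= 0:
        raise ValueError("total time must be positive")
    return gemm_flop_count(n, complex_valued) / total_time / 1e9


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    print(effective_gflops(4096, t_htod=0.05, t_kernel=0.40, t_dtoh=0.03))
```

Dividing by the kernel time alone would give the kernel-only rate, which is never lower than the effective value computed here because the transfer times only add to the denominator.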

[Figure 5 data. Vertical axis: Memory Transfer Rate [%] (0 to 100). Legend: MM, DtoH, HtoD.]

Matrix size    MM       DtoH     HtoD
128            37.5%    14.6%    47.8%
256            51.8%    11.3%    36.8%
1024           56.9%    20.1%    22.9%
4096           79.2%    8.6%     12.2%

Figure 5: Data transfer ratio in DGEMM calculation for various matrix sizes in M3.
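The breakdown in Figure 5 can be read as each phase's share of the total elapsed time. A minimal sketch of that calculation follows, assuming the three phases (HtoD copy, MM, DtoH copy) are timed separately; the function name and the timings are hypothetical and are not the paper's measurements.

```python
def time_shares(t_htod, t_mm, t_dtoh):
    """Return each phase's percentage of the total elapsed time.

    t_htod: host-to-device copy time in seconds
    t_mm:   matrix multiplication (GEMM) time in seconds
    t_dtoh: device-to-host copy time in seconds
    """
    total = t_htod + t_mm + t_dtoh
    if total <= 0:
        raise ValueError("total time must be positive")
    return {
        "MM": 100.0 * t_mm / total,
        "DtoH": 100.0 * t_dtoh / total,
        "HtoD": 100.0 * t_htod / total,
    }


if __name__ == "__main__":
    # Hypothetical timings, for illustration only.
    for name, share in time_shares(t_htod=0.02, t_mm=0.10, t_dtoh=0.01).items():
        print(f"{name}: {share:.1f}%")
```

The transferred data volume grows as n^2 while the arithmetic grows as n^3, so the MM share is expected to rise with the matrix size.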
