RECENT DEVELOPMENT IN COMPUTATIONAL SCIENCE
ISCS 2011 Selected Papers Vol.2 Time Benchmarks for the OpenMP and GPU
[Figure 4, panels (a)-(d): GFLOPS plotted against matrix size; panels (a) and (b) also show the single/double ratio on a second axis.]
(a) real number (M2): DGEMM (8 threads), DGEMM (Tesla C1060), SGEMM (Tesla C1060), Ratio (Single/Double)
(b) complex number (M2): ZGEMM (8 threads), ZGEMM (Tesla C1060), CGEMM (Tesla C1060), Ratio (Single/Double)
(c) real number (M3): Tesla C1060, Tesla M2050 with cuda3.1, Tesla M2050 with cuda3.2
(d) complex number (M3): Tesla C1060, Tesla M2050 with cuda3.1, Tesla M2050 with cuda3.2
Figure 4: Effective FLOPS values of matrix multiplication (MM) for various matrix sizes (256 to 5120), obtained with the SGEMM, CGEMM, DGEMM and ZGEMM routines on Tesla GPUs (Tesla C1060 and Tesla M2050). The upper two panels (a)(b) were obtained on machine M2 and the lower two panels (c)(d) on machine M3 (see Table 1). The memory transfer times are included in the estimate.
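The note that transfer times are included can be made concrete with a small calculation. The following is a minimal sketch, not the authors' benchmark code. It assumes the conventional operation counts for an n x n matrix product (2n^3 for real, 8n^3 for complex data), which this page does not state explicitly. The function names and the timings are hypothetical.

```python
def gemm_flop_count(n, complex_valued=False):
    """Conventional operation count for multiplying two n x n matrices.

    Assumption: 2 flops per real multiply-add, 8 per complex multiply-add.
    """
    flops_per_multiply_add = 8 if complex_valued else 2
    return flops_per_multiply_add * n ** 3


def effective_gflops(n, t_htod, t_kernel, t_dtoh, complex_valued=False):
    """Effective GFLOPS with host<->device transfer times (seconds) included."""
    total_time = t_htod + t_kernel + t_dtoh
    return gemm_flop_count(n, complex_valued) / total_time / 1e9


# Hypothetical timings in seconds, chosen only to illustrate the arithmetic.
print(effective_gflops(1024, t_htod=0.004, t_kernel=0.020, t_dtoh=0.002))
```

Because the transfer times sit in the denominator, the effective value is always lower than the kernel-only rate, and the gap is largest when the transfers take a large share of the total time.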
[Figure 5: stacked bar chart, vertical axis "Memory Transfer Rate [%]", one bar per matrix size.]
Matrix size   MM      DtoH    HtoD
128           37.5%   14.6%   47.8%
256           51.8%   11.3%   36.8%
1024          56.9%   20.1%   22.9%
4096          79.2%   8.6%    12.2%
Figure 5: Data transfer ratio in DGEMM calculation for various matrix sizes in M3.
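A breakdown of this kind can be computed from three measured phase times. The sketch below is only an illustration under stated assumptions: the function name `time_shares` is hypothetical, the timings are invented, and it assumes the three phases (host-to-device copy, the multiplication itself, device-to-host copy) run one after another without overlap.

```python
def time_shares(t_htod, t_mm, t_dtoh):
    """Return the percentage of total wall time spent in each phase."""
    total = t_htod + t_mm + t_dtoh
    if total <= 0:
        raise ValueError("total time must be positive")
    return {
        "HtoD": 100.0 * t_htod / total,
        "MM": 100.0 * t_mm / total,
        "DtoH": 100.0 * t_dtoh / total,
    }


# Hypothetical timings in milliseconds, not measurements from the paper.
shares = time_shares(t_htod=3.0, t_mm=15.0, t_dtoh=2.0)
for phase, percent in shares.items():
    print(f"{phase}: {percent:.1f}%")
```

Since the data moved grows as n^2 while the arithmetic grows as n^3, the MM share in such a breakdown is expected to rise with matrix size.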