A Compiler for Parallel Execution of Numerical Python Programs on ...
Table 7.3: Execution time for matrix multiplication benchmark for 64-bit floating point (seconds)

Problem Size                  1024     2048     3072     4096
OpenMP 4 threads    min       8.55    109.5   274.85  1406.82
                    max       8.68    110.4    275.8   1409.5
                    mean      8.59    109.8    275.1   1408.0
ATLAS BLAS          min      0.244     1.34     4.38    11.62
                    max      0.247     1.36     4.47    11.70
                    mean     0.245     1.35     4.41    11.64
GPU Total (Opt)     min      0.147    0.679     3.71     5.34
                    max       0.16    0.695     3.82     5.53
                    mean      0.15    0.685     3.78     5.39
GPU Only (Opt)      min      0.078    0.513     2.34     3.84
                    max       0.09    0.543     2.44     4.06
                    mean     0.081    0.523     2.38     3.91
GPU Total (No opt)  min      0.227     1.42     6.21    11.26
                    max       0.24     1.53     6.52    11.73
                    mean     0.232     1.47     6.33     11.4
GPU Only (No opt)   min      0.164    1.249     4.87    10.13
                    max      0.177     1.33    5.182    10.66
                    mean     0.168     1.27     4.96    10.31

Table 7.4: Speedups for matrix multiplication using GPU for 64-bit floating point over ATLAS

Problem Size    Speedup (No Opt)    Speedup (Opt)
1024                        1.07            1.659
2048                        0.95             1.97
3072                         0.7             1.18
4096                        1.03             2.17
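
The relationship between the two tables can be sketched in a few lines of Python. This is a minimal illustration, not the author's script: it assumes that each speedup in Table 7.4 is the ATLAS BLAS time divided by the corresponding "GPU Total" time, and that the minimum timings from Table 7.3 are the ones used. The excerpt does not state either assumption, and the names `atlas_min`, `gpu_total_opt_min`, `gpu_total_noopt_min`, and `speedups` are hypothetical.

```python
# Minimum execution times in seconds, copied from Table 7.3.
problem_sizes = [1024, 2048, 3072, 4096]
atlas_min = [0.244, 1.34, 4.38, 11.62]
gpu_total_opt_min = [0.147, 0.679, 3.71, 5.34]
gpu_total_noopt_min = [0.227, 1.42, 6.21, 11.26]


def speedups(baseline, candidate):
    """Return baseline / candidate for each problem size.

    Assumed definition of speedup: values above 1 mean the candidate
    (here, the GPU version) is faster than the baseline (ATLAS).
    """
    return [b / c for b, c in zip(baseline, candidate)]


speedup_opt = speedups(atlas_min, gpu_total_opt_min)
speedup_noopt = speedups(atlas_min, gpu_total_noopt_min)

# Print one row per problem size, mirroring the layout of Table 7.4.
for n, s_no, s_opt in zip(problem_sizes, speedup_noopt, speedup_opt):
    print(f"{n:>5}  no-opt: {s_no:.3f}  opt: {s_opt:.3f}")
```

Under these assumptions the computed ratios agree with the entries of Table 7.4 to within roughly 0.01, which suggests the table was derived from the minimum total times, including the data transfer counted in "GPU Total".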