RECENT DEVELOPMENT IN COMPUTATIONAL SCIENCE

Recommendations

Info

ISCS 2011 Selected Papers Vol.2 Time Benchmarks for the OpenMP and GPU Elapsed time (sec) 5 4 3 2 1 FFTW CUFFT(DOUBLE) CUFFT(S<strong>IN</strong>GLE) 0.02 0.01 0 0 50 100 0 0 100 200 300 400 Transform Size (N^3) Figure 8: The computational times with respect to the various sizes for the CUFFT routine of 3D-FFT (GPU) in M2. Elapsed time (sec) 70 60 50 40 30 20 10 MPI + GPU MPI(BLAS+FFTW) MPI(CUBLAS+CUFFT) MPI(CUBLAS+FFTW) 0 0 1 2 3 4 5 6 Num.of Processors Figure 10: Conputational times (single CPMD iteration in second) with respect to the number of MPI threads for MPI+GPU (Tesla C1060) version. The single MPI thread can access to a single GPU in M2. Speed Up Factor 16 14 12 10 8 6 4 2 6 4 2 MPI + OpenMP MPI 0 0 2 4 6 0 0 2 4 6 8 10 12 14 16 Num.of Processors Figure 9: Parallelization efficiency for the MPI and Hybrid versions for BaTiO3 system. Elapsed time (sec) 35 30 25 20 15 10 5 0 6.30 5.09 20.0 33.9 2.35 16.6 2.08 21.0 Other FFT Matrix 8.70 2.02 4.45 2.23 FFTW+BLAS CUFFT+CUBLAS FFTW+CUBLAS Figure 11: Conputational times (single CPMD iteration in second) of MM, FFT, and others for BaTiO3 by MPI+ GPU (Tesla C1060) version (3MPI+3GPU). GPU calculation for the MM (CUBLAS) is improved considerably, whereas the FFT by CUFFT is more time-consuming than the FFTW, as expected from the result in Fig. 8. 4.3 MPI+OpenMP+GPU In the experience which is encountered in the previous sections, the FFT by GPU cannot be carried out so high performance. Taking into account this, the MPI+OpenMP+GPU (hybrid+GPU) version is organized with the MPI (k point parallelization), the MM by GPU, the FFT by OpenMP (6 threads in OpenMP), and other minor parallelization. This is tested in the magnetic system (Fe/MgO/Fe) with M3 (4 threads in the MPI and 2 GPUs). The result with the CUDA Toolkit 23
ISCS 2011 Selected Papers Vol.2 J. Gotou, S. Haraguchi, M. Tsujikawa, T. Oda Elapsed time (sec) 25 20 15 10 5 0 4.98 4.44 13.9 23.3 MPI 4.87 1.41 2.75 9.03 Hybrid 5.87 4.09 1.79 Other FFT Matrix 12.08 4.71 1.63 1.93 7.97 MPI+GPU Hybrid+GPU Figure 12: Computation times (single CPMD iteration in second) for the slab of Fe(5MLs)/MgO(6MLs)/Fe(5MLs) in M3. 3.1 is represented in Fig. 12. In the MPI+GPU calculation, the FFTW is used in the 3D-FFT for each k point sampling. The hybrid+GPU calculation marks the highest score among the versions investigated. By introducing OpenMP and GPU, the time ratio of MM has become to 24% from 70% in the single processing calculation and from 60% in the calculation. If preparing one GPU per MPI thread, instead of one GPU per two MPI threads, the MM can be accelarated. Introducing the MPI, OpenMP, and GPU to the CPMD code, the computation other than the MM and FFT becomes to the major part of time-consuming. It should be improved in future. 5 Conclusions We have investigated computational performance of the CPMD code for the OpenMP parallelization and the GPU architecture in addition to the MPI parallelization. Our test has succeeded to reduce the computing time considerably, compared with the MPI version for the magnetic slab system which is an accesible model for typical magnetic tunnel junction. Based on such development of the code which accords with the new architectures, we may approach electronic structures and molecular dynamics for larger realistic systems with the high performance computation. Acknowledgment One of authors (T.O.) would like to thank the Japan Society for the Promotion of Science (JSPS) for financial supports (Grant 22104012, 22360014, and 22340106). One of authors (M.T.) acknowledges the JSPS Research Fellowships (Grant No.20-6647) for Young Scientists. References [1] D. Vanderbilt (1990). Soft self-consistent pseudopotentials in a generalized eigenvalue formalism, Phys. Rev. B, 41, 7892-7895. 24
Page 1 and 2: RECENT DEVELOPMENT IN COMPUTATIONAL
Page 3 and 4: Group Photo February 17, 2011 �
Page 8 and 9: Compression Stress Effect on Disloc
Page 10 and 11: ISCS 2011 Selected Papers Vol.2 Com
Page 12 and 13: ISCS 2011 Selected Papers Vol.2 Com
Page 14: ISCS 2011 Selected Papers Vol.2 Com
Page 17 and 18: ISCS 2011 Selected Papers Vol.2 S.
Page 25 and 26: ISCS 2011 Selected Papers Vol.2 J.
Page 27 and 28: ISCS 2011 Selected Papers Vol.2 J.
Page 29: ISCS 2011 Selected Papers Vol.2 J.
Page 34 and 35: Simulation Of Fluid-Solid Interacti
Page 36 and 37: ISCS 2011 Selected Papers Vol.2 Sim
Page 44 and 45: High-Pressure Crystal Structure Pre
Page 46 and 47: ISCS 2011 Selected Papers Vol.2 Cry
Page 52: ISCS 2011 Selected Papers Vol.2 Cry
Page 55 and 56: ISCS 2011 Selected Papers Vol.2 M.
Page 67 and 68: ISCS 2011 Selected Papers Vol.2 Mic
Page 77 and 78: ISCS 2011 Selected Papers Vol.2 Gia
Page 79 and 80: ISCS 2011 Selected Papers Vol.2 Gia
Page 81 and 82:
ISCS 2011 Selected Papers Vol.2 Gia
Page 83 and 84:
ISCS 2011 Selected Papers Vol.2 Gia
Page 86 and 87:
International Symposium on Computat
Page 88 and 89:
February 16 (Wednesday): 2 nd day M
Page 90 and 91:
List of Participants Akhmaloka Rect
Page 92:
Gia Septiana Wulandari Triati Dewi
show all

RECENT DEVELOPMENT IN COMPUTATIONAL SCIENCE

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?