
Machine   # Processors   Domains   Grids per domain   # e- per proc.   Wall Clock (sec)
Kraken    1024           128       8x512x512          262144           31441
Kraken    4096           512       4x1024x1024        524288           67115

PI. Axelrod (Univ. of Arizona, Physics). Research Allocation for the Large Synoptic Survey Telescope Data Challenge 3b. Continuing through 08/11. Darren Adams (NCSA), Raghu Reddy (PSC), and Chris Jordan (TACC) are providing ASTA support for this PI. There is no update this quarter for this project.

PI. Radko (Naval Postgraduate School, Physical Oceanography). Numerical Modeling of Double-diffusive Convection. Continuing through 09/11. ASTA staff for this project is Gabriele Jost.

The goal of the AUS effort is to increase the performance and scalability of Prof. Radko's 2D and 3D double-diffusive convection (DDC) implementations, TILT2D and TILT3D. The codes are currently parallelized with OpenMP, which limits their scalability on SMP clusters to a single node (e.g., 16 threads on Ranger).

The ASTA staff and the PI's team are pursuing three possibilities:

1. Develop an MPI implementation of TILT3D.

2. Investigate the use of GPGPUs (e.g., Longhorn at TACC) for TILT2D.

3. Explore the use of very large-scale SMP nodes (e.g., SGI Altix UV).

During the first quarter the team focused on the development of an MPI-based implementation of TILT3D. The following work items were performed.

Item 1: Planning and discussing a strategy with Prof. Radko
=============================================================

The application consists of two main programs: one generates the initial input data for the numerical experiments, and the other performs the DDC calculations. It was decided to use a 1D domain decomposition of the 3D arrays along the third dimension.
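As an illustration of this decomposition (a minimal sketch, not taken from TILT3D; the grid size and variable names are invented, and nz is assumed to be divisible by the number of MPI processes), each rank can compute its slab of z-planes as follows:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Illustrative global grid size; assumes nz % nprocs == 0. */
    const int nx = 64, ny = 64, nz = 64;

    int nz_loc = nz / nprocs;      /* number of z-planes owned by this rank */
    int z0     = rank * nz_loc;    /* first global z-plane of this rank     */

    printf("rank %d owns z-planes [%d, %d) of a %dx%dx%d grid\n",
           rank, z0, z0 + nz_loc, nx, ny, nz);

    MPI_Finalize();
    return 0;
}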

The input-generation code currently writes a single binary file with the input data. For the distributed version the team decided to keep this approach: each MPI process will read its own portion of the input data from the binary file.

The DDC code uses FFTW to perform 3D real-to-complex FFTs. In order to perform the FFT on distributed data, the structure of the code will be changed to the following sequence (a sketch is given after the list):

1. perform a 2D real-to-complex FFT (FFTW) in the xy-plane

2. use MPI_Alltoall to redistribute the data so that the z dimension is local to each process

3. perform a 1D complex-to-complex FFT in the z dimension

4. use MPI_Alltoall to restore the original distribution over z
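The following minimal C sketch illustrates this four-step structure for the forward transform. It is not the project's code: the grid size, array layout (z slowest), variable names, and the assumption that both nz and the number of (ky,kx) columns divide evenly among the MPI processes are all illustrative. Step 4 is the inverse of step 2 and is only indicated in a comment.

/* Sketch of a distributed forward 3D R2C FFT following the four steps above.
 * Assumptions (illustrative only):
 *   - the real array is nz x ny x nx with z slowest, decomposed in z-slabs;
 *   - nz and ny*(nx/2+1) are both divisible by the number of MPI processes.
 * Compile with e.g.:  mpicc fft_sketch.c -lfftw3 -lm
 */
#include <mpi.h>
#include <fftw3.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int nx = 64, ny = 64, nz = 64;   /* illustrative global grid       */
    const int nxh      = nx / 2 + 1;       /* r2c output length in x         */
    const int nz_loc   = nz / nprocs;      /* local z-planes (slab)          */
    const int ncol     = ny * nxh;         /* (ky,kx) columns per xy-plane   */
    const int ncol_loc = ncol / nprocs;    /* columns owned after the swap   */

    double       *u     = fftw_alloc_real((size_t)nz_loc * ny * nx);
    fftw_complex *spec  = fftw_alloc_complex((size_t)nz_loc * ncol);
    fftw_complex *sbuf  = fftw_alloc_complex((size_t)nz_loc * ncol);
    fftw_complex *zdata = fftw_alloc_complex((size_t)ncol_loc * nz);

    for (size_t i = 0; i < (size_t)nz_loc * ny * nx; i++)
        u[i] = (double)rand() / RAND_MAX;   /* dummy input data */

    /* Step 1: 2D R2C FFT in the xy-plane, one transform per local z-plane. */
    int n2d[2] = { ny, nx };
    fftw_plan p2d = fftw_plan_many_dft_r2c(2, n2d, nz_loc,
                                           u,    NULL, 1, ny * nx,
                                           spec, NULL, 1, ncol,
                                           FFTW_ESTIMATE);
    fftw_execute(p2d);

    /* Step 2: pack, then MPI_Alltoall so each rank holds all z for its columns. */
    for (int j = 0; j < nprocs; j++)
        for (int z = 0; z < nz_loc; z++)
            memcpy(&sbuf[((size_t)j * nz_loc + z) * ncol_loc],
                   &spec[(size_t)z * ncol + (size_t)j * ncol_loc],
                   ncol_loc * sizeof(fftw_complex));

    fftw_complex *rbuf = spec;              /* reuse spec as receive buffer */
    MPI_Alltoall(sbuf, 2 * nz_loc * ncol_loc, MPI_DOUBLE,
                 rbuf, 2 * nz_loc * ncol_loc, MPI_DOUBLE, MPI_COMM_WORLD);

    /* Unpack so that each of the ncol_loc local columns is contiguous in z. */
    for (int p = 0; p < nprocs; p++)
        for (int zp = 0; zp < nz_loc; zp++)
            for (int c = 0; c < ncol_loc; c++)
                memcpy(&zdata[(size_t)c * nz + p * nz_loc + zp],
                       &rbuf[((size_t)p * nz_loc + zp) * ncol_loc + c],
                       sizeof(fftw_complex));

    /* Step 3: 1D C2C FFT along z for every column owned by this rank. */
    int n1d[1] = { nz };
    fftw_plan p1d = fftw_plan_many_dft(1, n1d, ncol_loc,
                                       zdata, NULL, 1, nz,
                                       zdata, NULL, 1, nz,
                                       FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p1d);

    /* Step 4 (not shown): the inverse of step 2 -- pack zdata, MPI_Alltoall,
     * and unpack to recover the original z-slab distribution. */

    fftw_destroy_plan(p2d); fftw_destroy_plan(p1d);
    fftw_free(u); fftw_free(spec); fftw_free(sbuf); fftw_free(zdata);
    MPI_Finalize();
    return 0;
}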

Item 2: Start implementation
============================

The input generation code was changed from a sequential code into an MPI code. It now uses direct access to the binary file, so that multiple MPI processes can read from different locations in the file (a sketch of this pattern follows).
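A minimal sketch of this direct-access read pattern, written here with MPI-IO; the file name, grid sizes, double-precision storage, and z-slab layout are assumptions rather than details of the actual input files:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Illustrative grid; the file is assumed to hold nz planes of ny*nx
       doubles, z slowest, and nz % nprocs == 0. */
    const int nx = 64, ny = 64, nz = 64;
    const int nz_loc = nz / nprocs;
    const MPI_Offset plane_bytes = (MPI_Offset)nx * ny * sizeof(double);

    double *slab = malloc((size_t)nz_loc * nx * ny * sizeof(double));

    /* Every rank opens the same file ("input.bin" is a placeholder name)
       and reads only its own z-slab, starting at its byte offset. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "input.bin", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_Offset offset = (MPI_Offset)rank * nz_loc * plane_bytes;
    MPI_File_read_at(fh, offset, slab, nz_loc * nx * ny, MPI_DOUBLE,
                     MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(slab);
    MPI_Finalize();
    return 0;
}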

The code was tested running on one MPI process.

Several test cases were implemented for performing the 3D FFT in two steps: a 2D real-to-complex (R2C) transform in the xy-plane, followed by a 1D complex-to-complex (C2C) transform in the z dimension (see the sketch below).
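One possible shape for such a test case (a single-process sketch with invented sizes, not the project's test code): compute the 3D R2C transform directly with FFTW, recompute it as per-plane 2D R2C transforms followed by 1D C2C transforms along z, and compare the two results.

#include <fftw3.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int nz = 8, ny = 8, nx = 8;   /* small illustrative grid, z slowest */
    const int nxh = nx / 2 + 1;
    const size_t nreal = (size_t)nz * ny * nx;
    const size_t ncplx = (size_t)nz * ny * nxh;

    double       *u   = fftw_alloc_real(nreal);
    fftw_complex *ref = fftw_alloc_complex(ncplx);   /* one-shot 3D result */
    fftw_complex *two = fftw_alloc_complex(ncplx);   /* two-step result    */

    for (size_t i = 0; i < nreal; i++)
        u[i] = (double)rand() / RAND_MAX;

    /* Reference: direct 3D R2C transform. */
    fftw_plan p3d = fftw_plan_dft_r2c_3d(nz, ny, nx, u, ref, FFTW_ESTIMATE);
    fftw_execute(p3d);

    /* Step A: 2D R2C transform of every xy-plane. */
    int n2d[2] = { ny, nx };
    fftw_plan p2d = fftw_plan_many_dft_r2c(2, n2d, nz,
                                           u,   NULL, 1, ny * nx,
                                           two, NULL, 1, ny * nxh,
                                           FFTW_ESTIMATE);
    fftw_execute(p2d);

    /* Step B: 1D C2C transform along z for every (ky,kx) column;
       in this z-slowest layout the columns are strided by ny*nxh. */
    int n1d[1] = { nz };
    fftw_plan p1d = fftw_plan_many_dft(1, n1d, ny * nxh,
                                       two, NULL, ny * nxh, 1,
                                       two, NULL, ny * nxh, 1,
                                       FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(p1d);

    /* Compare the two results. */
    double maxerr = 0.0;
    for (size_t i = 0; i < ncplx; i++) {
        double dr = ref[i][0] - two[i][0], di = ref[i][1] - two[i][1];
        double e  = sqrt(dr * dr + di * di);
        if (e > maxerr) maxerr = e;
    }
    printf("max |3D FFT - two-step FFT| = %e\n", maxerr);

    fftw_destroy_plan(p3d); fftw_destroy_plan(p2d); fftw_destroy_plan(p1d);
    fftw_free(u); fftw_free(ref); fftw_free(two);
    return 0;
}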
