
2010 Best Practices Competition IT & Informatics HPC

IT Informatics - Cambridge Healthtech Institute


Figure 2. Saguaro2 supercomputer

In addition to the Saguaro2 cluster system, TGen also has a large-memory Symmetric Multi-Processor (SMP) system available. This system is an SGI Altix 4700 consisting of 48 Intel IA-64 cores and 576 GB of globally shared memory. The SGI system is well suited to memory-intensive problems and to algorithms that are not easily parallelized. Because of the architecture of both its processors and its I/O backplanes, it can run several concurrent memory-intensive jobs without incurring a performance penalty. This system was funded via NIH Grant S10 RR023390-01.

Updated NextGen sequencing workflow:

Learning from experience and systematically identifying the resource requirements at each stage of NextGen data analysis and transfer, TGen developed and installed a significantly improved NextGen sequencing data processing pipeline (Figures 3 & 4). The updated pipeline uses several customized scripts, tailored to the software implementation underlying the various data analysis tools, which improve the effectiveness of using HPC for analyses. By identifying the critical files at each stage, redundancy of storage has been minimized, and policies have been established to delete intermediate files automatically after a fixed time. Several compute systems have been dedicated to local data processing, such as annotation and parsing. Involving PIs in the infrastructure design process and educating their research staff has helped significantly in creating a team of proficient and more mindful users of the data processing pipeline.
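The policy of deleting intermediate files automatically after a fixed time could be sketched roughly as follows. This is only an illustration, not TGen's actual script: the file suffixes, the 30-day retention period, and the function names `find_expired` and `purge` are all assumptions, since the text names neither the file types nor the time limit.

```python
import time
from pathlib import Path

# Hypothetical suffixes for intermediate files; the source does not list them.
INTERMEDIATE_SUFFIXES = {".tmp", ".sam", ".pileup"}
# Hypothetical retention period; the source only says "a fixed time".
RETENTION_DAYS = 30


def find_expired(root, suffixes=INTERMEDIATE_SUFFIXES,
                 retention_days=RETENTION_DAYS, now=None):
    """Return intermediate files under root last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400  # retention period in seconds
    expired = []
    for path in Path(root).rglob("*"):
        if (path.is_file()
                and path.suffix in suffixes
                and path.stat().st_mtime < cutoff):
            expired.append(path)
    return sorted(expired)


def purge(root, dry_run=True, **kwargs):
    """Delete expired intermediate files; with dry_run=True only report them."""
    expired = find_expired(root, **kwargs)
    if not dry_run:
        for path in expired:
            path.unlink()
    return expired
```

Calling `purge("/path/to/run", dry_run=True)` would list the candidates without touching them, which is a sensible default before such a job is scheduled to run unattended. Files whose suffix is not in the set, such as final results, are never considered.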
