2010 Best Practices Competition IT & Informatics HPC
IT Informatics - Cambridge Healthtech Institute
Figure 2 Saguaro2 supercomputer
In addition to the Saguaro2 cluster system, TGen also has a large-memory Symmetric Multi-Processor
(SMP) system available. This system is an SGI Altix 4700 consisting of 48 Intel IA-64 cores and 576 GB
of globally shared memory. The SGI system is well suited to memory-intensive problems and to
algorithms that are not easily parallelized. Because of the architecture of both its processors and its
I/O backplanes, the system can run several concurrent memory-intensive jobs without incurring a
performance penalty. This system was funded via NIH Grant S10 RR023390-01.
Updated NextGen sequencing workflow:
Learning from the experience and systematically identifying the resource requirements at each stage
of NextGen data analysis and transfer, TGen developed and installed a significantly improved
NextGen sequencing data processing pipeline (Figures 3 & 4). The updated pipeline uses several
customized scripts, each tailored to the software implementation underlying a particular data
analysis tool, which make the use of HPC for analyses more effective. By identifying the critical
files at each stage, TGen has minimized redundant storage and established policies to delete
intermediate files automatically after a fixed time. Several compute systems have been dedicated
to local data processing, such as annotation and parsing. Involving PIs in the infrastructure design
process and educating their research staff has helped significantly in creating a team of proficient
and more mindful users of the data processing pipeline.
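
As an illustration only, a policy of deleting intermediate files after a fixed time could be sketched along the following lines. This is not TGen's actual script: the function names, the file patterns (such as `*.tmp.sam`), and the 30-day retention period used in the usage note are all hypothetical, chosen to show the general idea of an age-based cleanup with a dry-run safeguard.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def find_expired(root, patterns, max_age_days, now=None):
    """Return files under root that match any glob pattern and whose
    last-modified time is more than max_age_days in the past."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    expired = set()
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            if path.is_file() and path.stat().st_mtime < cutoff:
                expired.add(path)
    return sorted(expired)


def purge(root, patterns, max_age_days, dry_run=True, now=None):
    """Delete expired intermediate files and return the list of paths.

    With dry_run=True (the default) nothing is deleted; the function only
    reports what would be removed, so the policy can be reviewed first.
    """
    targets = find_expired(root, patterns, max_age_days, now)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

A call such as `purge("/scratch/run42", ["*.tmp.sam"], 30)` (a hypothetical path and pattern) would list the intermediate files older than 30 days without deleting them; passing `dry_run=False` performs the deletion. Matching on explicit patterns, rather than deleting everything old, is one way to keep the critical files identified at each stage out of reach of the cleanup.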