02.10.2015 Views

2010 Best Practices Competition IT & Informatics HPC

IT Informatics - Cambridge Healthtech Institute


location and retention of key data. Specifically, IT took the following steps to improve the data management process and accelerate the scientific workflow:

• Dedicated NFS storage for raw reads, attached to the backup tape library
• Dedicated NFS storage for all results, attached to the backup tape library
• Automated backup process for “key” files
• User education on how to mount/unmount the storage space
• Configured the Aspera server to read directly from designated NFS mount points, eliminating unnecessary data moves
• Weekly cron jobs for monitoring storage resource capacity and informing users about it
• Automated monitoring of user jobs running on the HPC cluster
• Established a SharePoint-based web portal to share NextGen project-related information
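
The weekly capacity check mentioned in the list could look something like the following. This is only a minimal sketch, not TGen's actual script: the function name `capacity_report`, the 85% warning threshold, and the mount point passed to it are all assumptions made for illustration. A cron entry would run it weekly and mail the output to users.

```python
import shutil


def capacity_report(mount_points, warn_fraction=0.85, usage_fn=shutil.disk_usage):
    """Return one report line per mount point.

    warn_fraction (hypothetical default of 85%) marks a file system as
    nearly full. usage_fn is injectable so the logic can be tested
    without touching a real file system.
    """
    lines = []
    for path in mount_points:
        usage = usage_fn(path)  # named tuple with total, used, free (bytes)
        fraction = usage.used / usage.total if usage.total else 0.0
        status = "WARNING" if fraction >= warn_fraction else "ok"
        lines.append(
            f"{path}: {fraction:.0%} used, "
            f"{usage.free / 1e12:.2f} TB free [{status}]"
        )
    return lines


if __name__ == "__main__":
    # "/" stands in for the real NFS mount points, which the source does not name.
    for line in capacity_report(["/"]):
        print(line)
```

Keeping the report as plain text lines makes it easy to pipe into mail from cron without any extra formatting step.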

These changes had to be synchronized and communicated across multiple scientific divisions as well as within the IT department. The end result was a more streamlined scientific workflow, an improved data management environment and reduced impact on the storage, backup and network infrastructure.

Lesson: be flexible with regard to data management procedures and the supporting infrastructure. Rapidly advancing technologies such as NextGen sequencing can render your current methods obsolete, and you must be willing to make dramatic changes in response to the needs of the scientific community and the demands of the technology.

Benchmarking:

Aligning billions of reads to reference genomes is computationally expensive, so an effort was initiated to benchmark sequence alignment tools. TGen’s IT team was actively involved in this process, providing several performance measurement and tuning tools and creating automated scripts to collect data on the computing resource utilization of six popular sequence alignment programs. IT used performance measurement tools for cluster computing environments to benchmark the speed, CPU utilization and input-output bandwidth each program needs. This information is now being used to select the best tool for various projects and to plan the resource requirements for future NextGen sequencing projects. Lesson: time spent benchmarking can provide significant benefit by reducing the cost and effort associated with the “trial & error” approach to selecting and using complex technology such as sequence alignment tools.
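
As an illustration of the kind of data such collection scripts gather, the sketch below times one command and records its CPU time and block I/O counts. It is an assumption-laden example rather than the scripts TGen used: the source does not name the measurement tools or the six aligners, the function name `measure` is hypothetical, and the `resource` module it relies on is available only on Unix-like systems.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd to completion and return wall time, CPU time and block I/O.

    Figures come from getrusage(RUSAGE_CHILDREN), so they cover child
    processes that this process has waited for.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    # User plus system CPU seconds consumed by the child.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {
        "returncode": completed.returncode,
        "wall_s": wall,
        "cpu_s": cpu,
        # Can exceed 1.0 for multi-threaded programs.
        "cpu_utilization": cpu / wall if wall > 0 else 0.0,
        "block_reads": after.ru_inblock - before.ru_inblock,
        "block_writes": after.ru_oublock - before.ru_oublock,
    }


if __name__ == "__main__":
    # Placeholder workload; a real run would pass an aligner command line.
    print(measure([sys.executable, "-c", "sum(range(10**6))"]))
```

Running the same wrapper over each candidate program with identical input gives comparable rows for speed, CPU utilization and I/O, which is the comparison the paragraph above describes.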

[Results]

Key Technologies & Supporting Methodologies<br />

The TGen High Performance Bio-Computing Center (HPBC) manages a diverse collection of HPC systems, storage and networking resources, including two large supercomputers. The first supercomputer, Saguaro2, is a Dell Linux cluster consisting of ~4000 Intel x86-64 processor cores with 2 GB RAM per core. It has a shared parallel 250 TB Lustre file system that allows massive amounts of concurrent input/output operations spread across many compute nodes. The system is very effective at running thousands of concurrent discrete processing jobs or very large parallel processing workloads. This large HPC cluster is installed on the Arizona State University campus in the Fulton High Performance Computing Initiative (HPCI) center and was funded via NIH grant S10 RR25056-01.
