2010 Best Practices Competition IT & Informatics HPC
IT Informatics - Cambridge Healthtech Institute
IT Informatics - Cambridge Healthtech Institute
- No tags were found...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
Published Resources for the Life Sciences<br />
250 First Avenue, Suite 300, Needham, MA 02494 | phone: 781‐972‐5400 | fax: 781‐972‐5425<br />
transferred in parallel for faster import, and well summary results are imported separately<br />
from cell-level measurements so users can review well-level results more quickly. A webbased<br />
administration tool controls the number of threaded processes and other data import<br />
settings. Experimental annotation, data mining and visualization are supported by the<br />
dedicated Data Explorer client application. Data-intensive operations, including data<br />
extraction and updates, QC and data analysis are implemented on the servers and the<br />
database to reduce the amount of data transferred from server to client. The Data Explorer<br />
also allows users to view images for selected wells or as a ‘poster’ of images for an entire<br />
plate. Images can also be viewed in third-party applications such as TIBCO Spotfire using<br />
a web page (Fig. 1). In either case, the image conversion server retrieves images from the<br />
appropriate platform repository and converts them from proprietary formats to standard<br />
TIFF or JPEG formats as needed.<br />
<strong>IT</strong> Tools & Techniques<br />
The large volumes of data generated by HCS require particular attention to image and data<br />
storage and management.<br />
Storage: HCS system provides scalable and extensible storage that is well suited for<br />
managing large numbers of images. The distributed nature of the system means that input<br />
and output bandwidth grow in parallel with capacity, avoiding a potential bottleneck.<br />
Images are stored at or near the site where they were acquired (and where they are likely<br />
to be analyzed or viewed) to reduce network latency issues. This approach reduced<br />
storage costs while increasing the bandwidth for image transfer.<br />
After extensive product evaluation, we decided on Isilon Systems clustered networkattached<br />
storage appliances. We deployed these as a file service, exposing several<br />
Windows networking file shares to the HCS readers, as well as to researcher workstations.<br />
Key Differentiators influencing our decision for Isilon NAS cluster were: True unified<br />
name space, robust data protection algorithms, straightforward scalability using building<br />
block nodes, ease of administration – FreeBSD CLI and lower-cost SATA disks.<br />
Data Management<br />
The large number of data records generated by HCS also presents an informatics<br />
challenge. We store HCS results in Oracle relational databases, as do other HCS users 10 .<br />
These databases can become very large, primarily because of cell level data. We observed<br />
that as the size of our databases grew, performance deteriorated. To address this, we used<br />
Oracle’s database partitioning capabilities. We focused our efforts on the two largest<br />
tables in the database, which both contain cell-level data. Our partitioning scheme