02.10.2015 Views

2010 Best Practices Competition IT & Informatics HPC

IT Informatics - Cambridge Healthtech Institute

IT Informatics - Cambridge Healthtech Institute

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

Published Resources for the Life Sciences<br />

250 First Avenue, Suite 300, Needham, MA 02494 | phone: 781‐972‐5400 | fax: 781‐972‐5425<br />

transferred in parallel for faster import, and well summary results are imported separately<br />

from cell-level measurements so users can review well-level results more quickly. A webbased<br />

administration tool controls the number of threaded processes and other data import<br />

settings. Experimental annotation, data mining and visualization are supported by the<br />

dedicated Data Explorer client application. Data-intensive operations, including data<br />

extraction and updates, QC and data analysis are implemented on the servers and the<br />

database to reduce the amount of data transferred from server to client. The Data Explorer<br />

also allows users to view images for selected wells or as a ‘poster’ of images for an entire<br />

plate. Images can also be viewed in third-party applications such as TIBCO Spotfire using<br />

a web page (Fig. 1). In either case, the image conversion server retrieves images from the<br />

appropriate platform repository and converts them from proprietary formats to standard<br />

TIFF or JPEG formats as needed.<br />

<strong>IT</strong> Tools & Techniques<br />

The large volumes of data generated by HCS require particular attention to image and data<br />

storage and management.<br />

Storage: HCS system provides scalable and extensible storage that is well suited for<br />

managing large numbers of images. The distributed nature of the system means that input<br />

and output bandwidth grow in parallel with capacity, avoiding a potential bottleneck.<br />

Images are stored at or near the site where they were acquired (and where they are likely<br />

to be analyzed or viewed) to reduce network latency issues. This approach reduced<br />

storage costs while increasing the bandwidth for image transfer.<br />

After extensive product evaluation, we decided on Isilon Systems clustered networkattached<br />

storage appliances. We deployed these as a file service, exposing several<br />

Windows networking file shares to the HCS readers, as well as to researcher workstations.<br />

Key Differentiators influencing our decision for Isilon NAS cluster were: True unified<br />

name space, robust data protection algorithms, straightforward scalability using building<br />

block nodes, ease of administration – FreeBSD CLI and lower-cost SATA disks.<br />

Data Management<br />

The large number of data records generated by HCS also presents an informatics<br />

challenge. We store HCS results in Oracle relational databases, as do other HCS users 10 .<br />

These databases can become very large, primarily because of cell level data. We observed<br />

that as the size of our databases grew, performance deteriorated. To address this, we used<br />

Oracle’s database partitioning capabilities. We focused our efforts on the two largest<br />

tables in the database, which both contain cell-level data. Our partitioning scheme

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!