21.09.2015 Views

SIENA European Roadmap on Grid and Cloud Standards for e-Science and Beyond


Biology on the Cloud

The Cloud provides a wide range of infrastructure and software services that can be used by the Biology user community. Indeed, experienced technical computing users are already finding ways to use these services to augment their existing computing resources. The greater promise of the cloud is that it can make technical computing pervasive, opening up the field to new researchers who have not been traditional HPC users. These researchers will be able to co-opt sophisticated cloud services provided by both academic and commercial providers to aid them in their research. In this paper I will showcase two Biology Cloud use cases that offer a number of advantages to users.

IaaS: Web-services Mirrors

The Ensembl project provides a variety of web services that allow researchers to visualise and data-mine genomic data (www.ensembl.org). Ensembl has a world-wide audience and is accessed 24 hours a day. Historically, the web service was hosted in a single UK datacentre. Whilst this provided fast access to users in the UK and Europe, users in Asia and the Americas found that access to the web services was slow, due to the large latencies involved in serving requests across the globe. Single-site hosting also made the website vulnerable to datacentre and network outages.
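
To give a feel for why distance alone makes requests slow, the sketch below estimates a lower bound on round-trip time between two cities. It is a rough illustration rather than a measurement: the city coordinates, the Earth radius and the signal speed in optical fibre (taken here as about two thirds of the speed of light in vacuum) are assumptions that do not come from the text above, and real network paths are longer than the great-circle route.

```python
import math

# Assumed constants (not from the original text).
EARTH_RADIUS_KM = 6371.0          # mean Earth radius
FIBRE_SPEED_KM_PER_S = 200_000.0  # roughly two thirds of the speed of light in vacuum


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fibre for a given one-way distance."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


# Approximate city coordinates, chosen only for illustration.
LONDON = (51.5, -0.1)
SINGAPORE = (1.35, 103.8)

distance = great_circle_km(*LONDON, *SINGAPORE)
rtt = min_round_trip_ms(distance)
print(f"distance ~{distance:.0f} km, round trip at least {rtt:.0f} ms")
```

A page that needs several sequential round trips multiplies this floor accordingly, which is why tuning servers in a single datacentre cannot remove the delay for distant users.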

The global, distributed nature of commercial Cloud IaaS offerings makes them a useful building block for providing world-wide availability and reach. Ensembl has used public IaaS providers to build mirrors of its web services in the United States of America and Asia. Not only has this massively increased the performance of the website for non-European users, but it also provides continued availability of service when the UK datacentre is offline.
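
The two benefits of mirroring, lower latency and continued service during an outage, can be sketched as a simple selection rule: use the reachable mirror with the lowest measured round-trip time. The text does not say how users are actually directed to a mirror, so this is only an assumed policy; the mirror names and latency figures are hypothetical.

```python
def choose_mirror(latencies_ms):
    """Return the reachable mirror with the lowest latency.

    latencies_ms maps a mirror name to a measured round-trip time in
    milliseconds, or None when the mirror did not respond.
    """
    reachable = {name: ms for name, ms in latencies_ms.items() if ms is not None}
    if not reachable:
        raise RuntimeError("no mirror is reachable")
    return min(reachable, key=reachable.get)


# Hypothetical measurements from a client in Asia.
measured = {"uk": 250.0, "us": 180.0, "asia": 35.0}
print(choose_mirror(measured))

# The same client when the nearest mirror is offline: fall back to the next best.
measured["asia"] = None
print(choose_mirror(measured))
```

The same rule covers a UK outage: a European client whose "uk" entry is None is simply served by whichever remaining mirror responds fastest.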

Cloud hosting provides several advantages over hosting in a traditional co-location facility. Installing real hardware in a remote co-location facility requires time-consuming and costly logistics. Hardware has to be shipped to the facility and cleared through customs, and staff need to be present on site to oversee hardware installation and initial provisioning. In contrast, provisioning virtual hardware in a remote cloud IaaS facility can be done from any location with internet access, whilst the “on-demand” facilities allow machines to be provisioned within a matter of minutes.

CloudScape III - Taking European Cloud Infrastructure Forward


SaaS: Providing Informatics services for Next-Generation Sequencing (NGS)

SaaS provides new opportunities for organisations to provide IT services to researchers. IT service provision for next-generation sequencing machines is a huge challenge. A single sequencing instrument can produce approximately a terabyte of raw data per day, and a large sequencing study may end up with a total dataset of many hundreds of terabytes. Dealing with this data is a challenge for organisations of all sizes, whether they are a small lab with a single machine or a large sequencing centre with many tens of machines.
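
A little arithmetic shows the scale of the problem. The sketch below takes the figure of roughly one terabyte per instrument per day from the text and asks how long that data would take to move over a network link. The link speed of 1 Gbit/s and the count of 30 machines are hypothetical values chosen for illustration, and the calculation assumes the link runs at its full rate with no protocol overhead, which real transfers will not achieve.

```python
TB = 10**12  # bytes in a terabyte (decimal definition)


def transfer_hours(data_bytes, link_bits_per_s):
    """Hours needed to move data over a link at full, sustained line rate."""
    return data_bytes * 8 / link_bits_per_s / 3600


def daily_output_tb(machines, tb_per_machine_per_day=1.0):
    """Raw data produced per day, using about 1 TB per instrument per day."""
    return machines * tb_per_machine_per_day


ONE_GBIT = 10**9  # hypothetical link speed in bits per second

# A small lab with a single instrument.
small_lab = transfer_hours(daily_output_tb(1) * TB, ONE_GBIT)
print(f"1 machine: {small_lab:.1f} hours to move one day of data")

# A sequencing centre with a hypothetical 30 instruments.
centre = transfer_hours(daily_output_tb(30) * TB, ONE_GBIT)
print(f"30 machines: {centre:.1f} hours to move one day of data")
```

Under these assumptions a single instrument's daily output occupies the link for a couple of hours, while a centre with tens of machines generates data faster than such a link could carry it away in a day.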
