SIENA European Roadmap on Grid and Cloud Standards for e-Science and Beyond
Biology on the Cloud
The Cloud provides a wide range of infrastructure and software services that can be used
by the Biology user community. Indeed, experienced technical computing users are already
finding ways in which to use these services to augment their existing computing resources.
The greater promise of the cloud is that it can make technical computing pervasive,
opening up the field to new researchers who have not been traditional HPC users. These
researchers will be able to co-opt sophisticated cloud services provided by both academia
and commercial providers to aid them in their research. In this paper I will showcase two
Biology Cloud use cases which offer a number of advantages to users.
IaaS: Web-services Mirrors
The Ensembl project provides a variety of web services which allow researchers to visualise
and data-mine genomic data (www.ensembl.org). Ensembl has a world-wide audience and is
accessed 24 hours a day. Historically, the web service was hosted in a single UK datacentre.
Whilst this provided fast access to users in the UK and Europe, users in Asia and the
Americas found that access to the web services was slow, due to the large latencies involved
in serving requests across the globe. Single-site hosting also made the website vulnerable
to datacentre and network outages.
The global, distributed nature of commercial Cloud IaaS makes it a useful building block
for providing world-wide availability and reach. Ensembl has used public IaaS providers to
build mirrors of its web services in the United States of America and Asia. Not only has this
massively increased the performance of the website for non-European users, but it also
provides continued availability of service when the UK datacentre is offline.
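The two benefits described above (lower latency for distant users and continued service when one site is offline) can be illustrated with a small sketch. It is not a description of how Ensembl actually routes its users, which the text does not state; the region names, the latency figures and the `pick_mirror` helper are all hypothetical.

```python
from typing import Mapping, Optional


def pick_mirror(latencies_ms: Mapping[str, Optional[float]]) -> str:
    """Return the name of the reachable mirror with the lowest latency.

    A latency of None marks a mirror that did not respond (for example,
    because its datacentre is offline).
    """
    # Keep only the mirrors that answered the latency probe.
    reachable = {name: ms for name, ms in latencies_ms.items() if ms is not None}
    if not reachable:
        raise RuntimeError("no mirror is reachable")
    # Choose the mirror whose measured latency is smallest.
    return min(reachable, key=reachable.get)


# Hypothetical measurements (milliseconds) for a user in Asia.
measured = {"uk": 280.0, "us": 190.0, "asia": 35.0}
print(pick_mirror(measured))  # prints "asia"

# If the nearest mirror is offline, the next-best one is chosen.
measured["asia"] = None
print(pick_mirror(measured))  # prints "us"
```

In practice this kind of selection is usually delegated to DNS-based or provider-level traffic routing rather than done in application code; the sketch only shows the decision being made.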
Cloud hosting provides several advantages over hosting in a traditional co-location facility.
Installing real hardware in a remote co-location facility requires time-consuming and costly
logistics. Hardware has to be shipped to the facility and cleared through customs, and
staff need to be present on site to oversee hardware installation and initial provisioning.
In contrast, provisioning virtual hardware in a remote cloud IaaS facility can be done from
any location with internet access, whilst the “on-demand” facilities allow machines to be
provisioned within a matter of minutes.
CloudScape III - Taking European Cloud Infrastructure Forward
SaaS: Providing Informatics services for Next-Generation Sequencing (NGS)
SaaS provides new opportunities for organisations to provide IT services to researchers.
IT service provision for next-generation sequencing machines is a huge challenge. A single
sequencing instrument can produce approximately a terabyte of raw data per day and a
large sequencing study may end up with a total dataset of many hundreds of terabytes.
Dealing with this data is a challenge for organisations of all sizes, whether they are a small
lab with a single machine, or a large sequencing centre with many tens of machines.
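To make the scale concrete, the following sketch turns the figure quoted above (roughly one terabyte of raw data per instrument per day) into a back-of-envelope estimate. The instrument counts and run lengths are hypothetical examples chosen for illustration, and the `estimate_raw_data_tb` helper is not taken from any real tool.

```python
def estimate_raw_data_tb(
    instruments: int,
    days: int,
    tb_per_instrument_day: float = 1.0,  # approximate figure quoted in the text
) -> float:
    """Estimate the raw data volume in terabytes for a sequencing run."""
    return instruments * days * tb_per_instrument_day


# Hypothetical small lab: one instrument running for 30 days.
print(estimate_raw_data_tb(instruments=1, days=30))   # 30.0

# Hypothetical sequencing centre: 30 instruments running for 30 days.
print(estimate_raw_data_tb(instruments=30, days=30))  # 900.0
```

Even the single-instrument case yields tens of terabytes in a month, which is why storage and analysis are a concern for small labs as well as large centres.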