Annual Scientific Report 2015
Systems and Networking<br />
The Systems Infrastructure Team manages EMBL-EBI’s compute servers,<br />
storage, virtualisation, data centres and networking, including managing the<br />
campus Internet connection. The team works closely with all project groups,<br />
maintaining and planning their specific infrastructures, and plays a key role in<br />
managing the technical frameworks supported by the UK Government’s Large<br />
Facilities Capital Fund.<br />
Major achievements<br />
In 2015 our team maintained EMBL-EBI’s growing<br />
compute infrastructure, which is now based on<br />
approximately 30 000 Central Processing Unit (CPU)<br />
cores. As our user base is very broad, with wide-ranging<br />
technical requirements, we organised two training days<br />
for approximately 70 members of staff to get up to speed<br />
with EMBL-EBI clusters. To support internal users<br />
working in a distributed computing environment, we<br />
built a new, 36-node Apache Hadoop cluster.<br />
We maintained EMBL-EBI’s secure, high-performance<br />
GridFTP data-transfer protocol in 2015, and as part<br />
of this work we completely rebuilt and automated<br />
the installation process. This makes it much simpler<br />
for users to share large datasets on our networks. In<br />
addition, we installed MongoDB machines to provide an<br />
alternative database platform for our internal users.<br />
Security is paramount to the integrity of the controlled-access<br />
European Genome-phenome Archive (EGA),<br />
which EMBL-EBI jointly develops with the CRG in<br />
Barcelona. Our team re-implemented the vault firewall<br />
for the EGA in 2015, using a software approach. This<br />
improves the performance of the system substantially.<br />
Our team put considerable effort into maximising<br />
flexibility and optimising compute performance in 2015.<br />
We rebuilt our compute clusters’ General Parallel File<br />
System completely, improving high-speed file access<br />
for multiple applications executing on different nodes<br />
within the clusters at the same time. We also continued<br />
to develop Elastic clusters, and plan to complete this<br />
work in 2016.<br />
Given the scale of the storage challenge for EMBL-EBI,<br />
our team’s focus is always on identifying and<br />
implementing the right technologies to maximise<br />
efficiency. For example, Docker containers create a<br />
wrapper for an application, with all of its dependencies,<br />
which provides a more standardised and software-development-friendly<br />
unit. Importantly, they also<br />
guarantee that the application always runs the same,<br />
regardless of its environment. In 2015 we enabled<br />
internal users to run these containers in our clusters,<br />
which resulted in improved efficiency and more<br />
consistent performance.<br />
Detail from the Genome Campus data centre, with Dawn Johnson, data<br />
centre engineer.<br />
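As a rough illustration of the container approach described above, the sketch below builds a `docker run` command line for a cluster job. It is not taken from the team's setup: the image name `example/aligner:1.0`, the paths, and the helper `container_command` are hypothetical. The point it shows is that the job is defined by a fixed image tag plus its arguments, so the same command can be issued on any node.

```python
import shlex


def container_command(image, app_args, data_dir, mount_point="/data"):
    """Build a `docker run` invocation for a containerised application.

    All names passed in are illustrative assumptions, not EMBL-EBI settings.
    """
    return [
        "docker", "run",
        "--rm",                                 # remove the container when the job exits
        "-v", f"{data_dir}:{mount_point}:ro",   # read-only bind mount of the input data
        image,                                  # pinned image tag: same software on every node
        *app_args,                              # arguments passed through to the application
    ]


# Hypothetical job: the image and paths below are placeholders.
cmd = container_command(
    "example/aligner:1.0",
    ["--input", "/data/reads.fq"],
    "/scratch/job42",
)
print(shlex.join(cmd))
```

Returning the command as a list of arguments, rather than a single string, means it can be handed to a job scheduler or to `subprocess.run` without shell quoting issues.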
Our team is responsible for maintaining the raw data<br />
storage capacity of the institute, which exceeded 60<br />
petabytes at the end of 2015. A substantial portion<br />
of this represents raw sequence archives. Although<br />
2015 EMBL-EBI Annual Scientific Report