
Annual Scientific Report 2015


Systems and Networking

The Systems Infrastructure Team manages EMBL-EBI’s compute servers,
storage, virtualisation, data centres and networking, including managing the
campus Internet connection. The team works closely with all project groups,
maintaining and planning their specific infrastructures, and plays a key role in
managing the technical frameworks supported by the UK Government’s Large
Facilities Capital Fund.

Major achievements

In 2015 our team maintained EMBL-EBI’s growing compute infrastructure,
which is now based on approximately 30 000 Central Processing Unit (CPU)
cores. As our user base is very broad, with wide-ranging technical
requirements, we organised two training days for approximately 70 members
of staff to get up to speed with EMBL-EBI clusters. To support internal users
working in a distributed computing environment, we built a new, 36-node
Apache Hadoop cluster.

We maintained EMBL-EBI’s secure, high-performance GridFTP data-transfer
protocol in 2015, and as part of this work we completely rebuilt and automated
the installation process. This makes it much simpler for users to share large
datasets on our networks. In addition, we installed MongoDB machines to
provide an alternative database platform for our internal users.

Security is paramount to the integrity of the controlled-access
European Genome-phenome Archive (EGA), which EMBL-EBI jointly develops
with the CRG in Barcelona. Our team re-implemented the vault firewall
for the EGA in 2015, using a software approach. This improves the
performance of the system substantially.

Our team put considerable effort into maximising flexibility and optimising
compute performance in 2015. We completely rebuilt our compute clusters’
General Parallel File System, improving high-speed file access for multiple
applications executing at the same time on different nodes within the
clusters. We also continued to develop Elastic clusters, and plan to complete
this work in 2016.

Given the scale of the storage challenge for EMBL-EBI, our team’s focus is
always on identifying and implementing the right technologies to maximise
efficiency. For example, Docker containers create a wrapper for an
application, with all of its dependencies, which provides a more standardised
and software-development-friendly unit. Importantly, they also guarantee that
the application always runs the same, regardless of its environment. In 2015
we enabled internal users to run these containers in our clusters, which
resulted in improved efficiency and more consistent performance.

Detail from the Genome Campus data centre, with Dawn Johnson, data
centre engineer.

Our team is responsible for maintaining the raw data storage capacity of the
institute, which exceeded 60 petabytes at the end of 2015. A substantial
portion of this represents raw sequence archives. Although
