National Energy Research Scientific Computing Center
28 NERSC ANNUAL REPORT 2015<br />
[Figure: The new centralized data collection infrastructure at NERSC. Diagram labels: data sources (rsyslog, collectd, LDMS, Procmon, power and environmental data direct from sensors or a collector, and text files followed by Logstash) feed a per-system forwarding Logstash and a RabbitMQ cluster via text, JSON, and AMQP inputs; multiple Logstash instances load event and time series data into the Elastic cluster (ELK stack), which is explored through the Kibana graphical interface; closed shards are archived and can be loaded later by other Elastic cluster instances; Redis databases hold Lustre, power, and cooling data for immediate access (limited to under 5 minutes); syslog data is archived to disk and collectd and other time series data to HDF5; other outputs such as MySQL or MongoDB can be added, and data can be shipped to other centers.]<br />
entire facility. This process involves collecting information on hosts or sensors across the center.<br />
This data is then passed into the collection environment via RabbitMQ, an open-source message<br />
broker, which forwards the data to the Elasticsearch framework for storage. RabbitMQ is<br />
capable of passing data to many targets, and we leveraged this feature to send the same data to Redis<br />
(an open-source, in-memory data structure store) for real-time access to some data, and to Freeboard<br />
(an open-source dashboard) to display real-time data without storing it. We anticipate using<br />
this method to send data to multiple targets to address future collection issues as they arise.<br />
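The one-to-many delivery described above can be illustrated with a minimal in-memory sketch. This is not NERSC's configuration and does not use the RabbitMQ client API. The `FanoutExchange` class, the three handler functions, and the sample host and metric names are all hypothetical stand-ins for the roles played by RabbitMQ, Elasticsearch, Redis, and Freeboard.

```python
class FanoutExchange:
    """In-memory stand-in for a broker exchange that copies each message to every bound target."""

    def __init__(self):
        self._targets = []

    def bind(self, handler):
        # Register one more consumer; existing consumers are unaffected.
        self._targets.append(handler)

    def publish(self, message):
        for handler in self._targets:
            handler(dict(message))  # each target receives its own copy


# Hypothetical targets mirroring the roles described in the text.
archive = []          # keeps every message (role played by Elasticsearch)
latest = {}           # keeps only the newest value per host and metric (role played by Redis)
dashboard_lines = []  # shows the current reading, keeps no history (role played by Freeboard)


def store_in_archive(msg):
    archive.append(msg)


def cache_latest(msg):
    latest[(msg["host"], msg["metric"])] = msg["value"]


def show_on_dashboard(msg):
    dashboard_lines[:] = [f'{msg["host"]} {msg["metric"]}={msg["value"]}']


exchange = FanoutExchange()
for target in (store_in_archive, cache_latest, show_on_dashboard):
    exchange.bind(target)

# The producer publishes once; all three targets see the same reading.
exchange.publish({"host": "node01", "metric": "temp_c", "value": 41.5})
exchange.publish({"host": "node01", "metric": "temp_c", "value": 42.0})
```

In this sketch, adding a new target means one more `bind` call, and the producer side does not change. That is the property the text relies on when it anticipates sending data to additional targets later.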
All of the acquired sensor data is stored in a commonly accessible format, so few changes need to be<br />
made when collecting or analyzing new data. The data is centrally accessible to most staff<br />
without transferring it from one storage method to another or re-collecting it in a<br />
different configuration. This new method is already opening up data analysis opportunities<br />
that were not previously possible.<br />
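As a rough illustration of why a common format keeps changes small, the sketch below wraps readings from two different kinds of sensor in the same flat JSON shape. The report does not specify the schema, so the field names, the `make_record` helper, and the sample hosts and values are assumptions made for this example only.

```python
import json
from datetime import datetime, timezone


def make_record(host, metric, value, unit, timestamp=None):
    """Wrap one reading in a flat, self-describing JSON document (hypothetical schema)."""
    ts = timestamp or datetime.now(timezone.utc)
    return json.dumps(
        {
            "@timestamp": ts.isoformat(),
            "host": host,
            "metric": metric,
            "value": value,
            "unit": unit,
        },
        sort_keys=True,
    )


# Two different kinds of sensor share the same shape; only the field values differ.
when = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
power = make_record("pdu-07", "power", 3.2, "kW", when)
temp = make_record("crac-02", "inlet_temp", 18.5, "C", when)
```

Because both records carry the same keys, a consumer written for one kind of reading can parse the other without a new collector or a new storage layout.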
Compact Cori: An Educational Tool for HPC<br />
In 2015, NERSC developed an HPC educational tool designed to inspire students of all ages,<br />
including middle and high school students, and to demonstrate the value of HPC to the general public.<br />
A team of NERSC staff and student interns (two undergraduates and one high school student)<br />
created “Compact Cori,” a micro-HPC system based on Intel NUC (Next Unit of Computing)<br />
hardware and a modern software stack based on Python, mpi4py, REST APIs, WebGL, Meteor,<br />
and other modern web technologies.<br />
The hardware design was inspired by the “Tiny Titan” micro-HPC developed at Oak Ridge National<br />
Laboratory. The hardware and software stack were chosen to be accessible and to represent modern<br />
programming techniques and paradigms with richer, more accessible educational value than<br />
traditional HPC languages like C and FORTRAN. The software stack allows students to explore<br />
several novel design points for HPC.