Lustre 1.6 Operations Manual


21.6.1 Network Tunables

Because these systems can have a large number of clients and servers, tuning the various request pools is important. We are making changes to the ptllnd module.

| Parameter | Description |
|---|---|
| max_nodes | max_nodes is the maximum number of queue pairs and, therefore, the maximum number of peers with which the LND instance can communicate. Set max_nodes to a value higher than the product of the total number of nodes and the maximum number of processes per node: max_nodes > (total number of nodes) × max_procs_per_node. Setting max_nodes lower than this causes Lustre to throw an error; setting it higher than necessary consumes excess memory. |
| max_procs_per_node | max_procs_per_node is the maximum number of cores (CPUs) on a single Catamount node. Portals must know this value to properly clean up various queues, because LNET is not notified directly when a Catamount process aborts. The first information LNET receives is when a new Catamount process with the same Cray portals NID starts and sends a connection request. If the number of processes with that Cray portals NID exceeds max_procs_per_node, LNET removes the oldest one to make space for the new one. |
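The max_procs_per_node eviction rule can be modelled as a bounded queue per NID. The Python sketch below is a simplified illustration under stated assumptions, not ptllnd source: the `PeerTable` class, its `connect` method, and the integer process IDs are hypothetical, and Cray portals NIDs are represented as plain strings.

```python
from __future__ import annotations

from collections import defaultdict, deque


class PeerTable:
    """Hypothetical model of per-NID process tracking (not ptllnd source)."""

    def __init__(self, max_procs_per_node: int) -> None:
        self.max_procs_per_node = max_procs_per_node
        # One queue per Cray portals NID; the oldest process is on the left.
        self.procs: dict[str, deque[int]] = defaultdict(deque)

    def connect(self, nid: str, pid: int) -> int | None:
        """Record a connection request and return the evicted PID, if any."""
        queue = self.procs[nid]
        queue.append(pid)
        if len(queue) > self.max_procs_per_node:
            # LNET was never told the old process aborted, so the oldest
            # entry for this NID is removed to make space for the new one.
            return queue.popleft()
        return None


if __name__ == "__main__":
    table = PeerTable(max_procs_per_node=2)  # hypothetical 2-core node
    for pid in (100, 101, 102):
        print(pid, "->", table.connect("nid-a", pid))
```

A deque fits this rule because eviction always happens at the oldest end, which is a constant-time operation.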

These two tunables combine to set the size of the ptllnd request buffer pool. The buffer pool must never drop an incoming message, so proper sizing is very important.
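The max_nodes sizing rule is simple arithmetic, so it can be checked before the module is loaded. This sketch assumes a hypothetical cluster of 1,000 Catamount nodes with 2 cores each, and the helper names are hypothetical. The last helper formats the values as a standard modprobe `options` line for the ptllnd module; which configuration file that line belongs in on a given installation is an assumption to verify locally.

```python
def min_max_nodes(total_nodes: int, max_procs_per_node: int) -> int:
    """Smallest value satisfying max_nodes > total_nodes * max_procs_per_node."""
    return total_nodes * max_procs_per_node + 1


def check_max_nodes(max_nodes: int, total_nodes: int,
                    max_procs_per_node: int) -> str:
    """Classify a proposed max_nodes value against the sizing rule."""
    if max_nodes < min_max_nodes(total_nodes, max_procs_per_node):
        return "too low"  # below the rule: Lustre throws an error
    return "ok"  # values far above the minimum only consume extra memory


def ptllnd_options_line(max_nodes: int, max_procs_per_node: int) -> str:
    """Format the two tunables as a modprobe-style 'options' line (assumed)."""
    return (f"options ptllnd max_nodes={max_nodes} "
            f"max_procs_per_node={max_procs_per_node}")


if __name__ == "__main__":
    nodes, procs = 1000, 2  # hypothetical cluster size and cores per node
    value = min_max_nodes(nodes, procs)
    print(value)
    print(check_max_nodes(2000, nodes, procs))
    print(ptllnd_options_line(value, procs))
```

Picking the minimum rather than a large round number follows the trade-off described above: too low is an error, while too high only wastes memory.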

| Parameter | Description |
|---|---|
| Ntx | Ntx helps to size the transmit (tx) descriptor pool. A tx descriptor is used for each send and each passive RDMA. The maximum number of concurrent sends equals the Credits value. Passive RDMA is a response to a PUT or GET of a payload that is too big to fit in a small message buffer. For servers, this only happens on large RPCs (for instance, where a long file name is included), so the MDS could be under pressure in a large cluster. For routers, this is bounded by the number of servers. If the tx pool is exhausted, a console error message appears. |
| Credits | The Credits value determines how many sends can be in flight at once on ptllnd. Optimally, there are 8 requests in flight per server. The default value is 128, which should be adequate for most applications. |
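The interaction between Ntx and Credits can be illustrated with a small accounting model. This is a hedged Python sketch, not ptllnd code: `TxModel` and `servers_at_full_depth` are hypothetical, each send is assumed to hold one credit and one tx descriptor, and each passive RDMA is assumed to hold only a tx descriptor, following the descriptions above. The last helper reads the two Credits figures together: if the default 128 credits were spread at the optimal 8 in-flight requests per server, about 16 servers could be kept at that depth.

```python
from __future__ import annotations


class TxModel:
    """Hypothetical accounting model of ptllnd tx descriptors and credits."""

    def __init__(self, ntx: int, credits: int = 128) -> None:
        self.free_tx = ntx           # descriptors for sends and passive RDMA
        self.free_credits = credits  # sends allowed in flight at once
        self.errors: list[str] = []  # stands in for console error messages

    def _take_tx(self) -> bool:
        if self.free_tx == 0:
            # An exhausted tx pool produces a console error message.
            self.errors.append("tx descriptor pool exhausted")
            return False
        self.free_tx -= 1
        return True

    def start_send(self) -> bool:
        """Assumed: a send needs one credit and one tx descriptor."""
        if self.free_credits == 0:
            return False  # wait for an in-flight send to complete
        if not self._take_tx():
            return False
        self.free_credits -= 1
        return True

    def finish_send(self) -> None:
        self.free_credits += 1
        self.free_tx += 1

    def start_passive_rdma(self) -> bool:
        """Assumed: a passive RDMA (large PUT/GET reply) needs only a tx descriptor."""
        return self._take_tx()

    def finish_passive_rdma(self) -> None:
        self.free_tx += 1


def servers_at_full_depth(credits: int, per_server: int = 8) -> int:
    """Hypothetical helper: servers that can each have per_server sends in flight."""
    return credits // per_server
```

In this model, passive RDMAs compete with sends for descriptors but not for credits, which is why Ntx may need to be larger than Credits on a busy MDS.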

Chapter 21 Lustre Tuning 21-11
