T-BLADE 2 TECHNICAL GUIDEBOOK - T-Platforms
T-Blade 2 Technical GuideBook
www.t-platforms.com

Disclaimer:
This document is provided for informational purposes only and may contain inaccuracies. For up-to-date information, please contact your T-Platforms representative.

T-Blade 2 Technical Guide Book, rev. A0
Table of contents

1. General overview
2. Compute Module
2.1. Compute module board
2.1.1. Board geometry and dimensions
2.1.2. Compute node featureset
2.1.3. Compute node special features
2.2. Compute module power and connectivity features
2.3. Compute module heat sink
3. InfiniBand Switch Modules
4. Management and Switch Module (MSM)
4.1. Management processor block
4.2. Gigabit Ethernet switches block
4.3. Special networks block (FPGA)
4.4. Global clock distribution block
4.5. Management and Switch Module Connectors
5. Backplane
6. Blade chassis
6.2. Power subsystem
6.3. Cooling subsystem
7. Other features
7.1. Power-on procedure
7.2. Emergency Shutdown procedure
8. Cluster Management and Monitoring
9. Basic infrastructure requirements
9.1. Electricity
9.2. Cooling
9.3. Cabinet infrastructure
9.4. Floors and layout
10. Operating and file systems compatibility
11. System specification
T-Blade 2 Enclosure
T-Blade 2 Compute Node
T-Blade 2 external ports and networks
Appendix 1. Compute Module diagram
Appendix 2. Intra-chassis QDR IB links diagram
Appendix 3. Management and Switch Module diagram
1. General overview

Welcome to T-Blade 2, the leading compute density solution from T-Platforms, engineered specifically for the world’s largest x86 HPC installations.

Introduced in 2009, T-Blade 2 creates an unmatched solution for customers with the most demanding HPC needs, and perfectly complements the other elements of T-Platforms’ HPC portfolio, including the Cell BE-based PowerXCell 8i 1RU compute nodes and the 5RU 10-node T-Blade 1 system.

The driving force behind the T-Blade 2 design is the use of industry-standard compute, interconnect, and management architectures packaged in an HPC-optimized manner to deliver just the right mix of performance, density, redundancy, and manageability.
T-Blade 2: designed for density, performance, and reliability:

■ 7U-high chassis for mounting into a standard 19” rack
■ 16 hot-pluggable compute modules with 64 Intel Xeon 56xx series CPUs; each compute module contains 2 dual-processor compute nodes on a single board
■ 2 integrated 36-port QDR InfiniBand switches
■ Integrated management module with GbE switches and special HPC features
■ Air cooling using hot-swappable high performance fans
■ 11kW, N+1 redundant power supplies
Although T-Blade 2 features standard technologies, T-Platforms has focused its engineering and design expertise on overcoming the scalability bottlenecks that so often plague commodity clusters built with off-the-shelf components. These bottlenecks are largely caused by poor collective communications efficiency, which needs special support at the hardware level to maintain excellent application performance in high node-count systems. T-Blade 2 features dedicated global barrier and global interrupt networks, ensuring effective application scalability even on systems containing many thousands of nodes.

The global barrier network supports fast synchronization between the tasks of a large-scale application, while the global interrupt network significantly reduces the negative impact of OS jitter by synchronizing process scheduling over the entire system. As a result, processors communicate much more efficiently with each other, enabling high scalability even for the most demanding parallel applications.
The two networks are managed by an FPGA chip integrated into a dedicated management module (see Appendix 3) that monitors all subsystems and components, enabling remote system management, emergency shutdowns, etc.

To ensure smooth and fast data transfer, and to reduce network congestion in high node-count systems, T-Blade 2 incorporates extra external InfiniBand ports, providing an impressive 1.6 Tb/s of aggregate bandwidth.
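The 1.6 Tb/s figure can be reproduced from the external port counts given in Section 3, assuming the nominal 40 Gb/s signaling rate of a QDR 4x link (the equal-rate assumption and the rounding are mine; the usable data rate is lower due to 8b/10b encoding):

```python
# Aggregate external InfiniBand bandwidth of one T-Blade 2 chassis.
# Assumes the nominal QDR 4x signaling rate of 40 Gb/s per port
# (4 lanes x 10 Gb/s), before 8b/10b encoding overhead.
SWITCHES = 2          # integrated InfiniBand switch modules per chassis
EXTERNAL_PORTS = 20   # rear-panel QSFP ports per switch module
QDR_4X_GBPS = 40      # nominal signaling rate per 4x port

total_gbps = SWITCHES * EXTERNAL_PORTS * QDR_4X_GBPS
print(total_gbps / 1000, "Tb/s")   # -> 1.6 Tb/s
```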
T-Platforms has engineered the T-Blade 2 platform for reliability as well as performance. The system features neither hard disks nor cables inside the enclosure, dramatically decreasing outages caused by mechanical failure within a node. Reliability is further enhanced within each enclosure by hot-swappable, N+1 redundant power supplies and cooling fans.

Petascale computing demands deep changes in system software to efficiently support application scalability. T-Blade 2 is optionally supplied with a comprehensive ‘all-in-one’ system software stack that includes an optimized Linux OS core and system libraries, as well as all necessary management and monitoring software components. This integrated software stack provides an out-of-the-box experience, reducing system installation time and administration costs.

The T-Blade Linux core provides specialized support for the global barrier and global interrupt networks, an optimization that greatly speeds up inter-node communications over more traditional designs. And our new system management software, part of Clustrx OS, ensures seamless scalability up to 12,000 compute nodes with near-real-time monitoring capability. T-Blade 2 also features aggressive power saving technology and support for topology-driven resource allocation, which improves memory usage and accelerates real application performance.

Best of all, the T-Blade 2 HPC platform is field-proven: it is the primary building block of the 420 TFlops Lomonosov cluster deployed at Moscow State University. This system was ranked number 12 on the November 2009 TOP500 list of the world’s fastest computers and is considered to be the largest supercomputer in Eastern Europe.
2. Compute Module

A T-Blade 2 compute module consists of a single printed circuit board (PCB) with two separate dual-processor compute nodes, mounted on a specially-engineered heat sink that provides the necessary cooling for the entire module. Each compute module board comes with memory modules preinstalled (Figure 1) and is shipped in 16-board bulk packages.
2.1. Compute module board

Figure 1. Compute module board (labels: guides, system board tray, backplane connector, DDR3 memory modules, Intel Xeon 5500 Series CPUs, QDR IB controllers, heatsink, DC-DC converters).

2.1.1. Board geometry and dimensions

The geometry of a compute module PCB is shown in Figure 1. This airflow-optimized board is attached for reliability both to the heat sink and to a board tray using mounting holes. Each board attaches to the system backplane using a power and signal connector mounted on the PCB, and is supported by mechanical guides located on each tray.
2.1.2. Compute node featureset

Each of the two compute nodes on a PCB features:

■ Two Intel Westmere (56xx series) CPUs, each with a maximum TDP of 95W and clock speeds up to 2.93GHz
■ 12GB of memory per compute node in a 3-channel design that uses unbuffered ECC DDR3-1333 RAM
■ 6GB or 12GB of memory per CPU socket
■ 4 memory modules (Figure 2) in a proprietary mezzanine card design with 27 memory chips (9 chips in each of the 3 channels)
■ The Intel Tylersburg 24D + ICH10 chipset, with QPI links providing 6.4 GT/s (3.2GHz) between CPUs on a board
■ A single-port Mellanox ConnectX QDR InfiniBand chip connected to the Tylersburg chipset via a PCIe 2.0 x8 link
■ A single-port GbE controller connected to the ICH10
■ A USB controller for miniSD or microSD flash, and a connector for the flash card
■ A baseboard management controller (BMC) and a separate single-port GbE controller for the BMC connection (refer to the section “Compute node management and monitoring” for a discussion of the BMC and the management and monitoring feature set)
■ One serial port, connected to the BMC
■ Five GPIO signals from the ICH10 chip going through the backplane to the FPGA chip on the blade management module for HPC-specific purposes (described in the section “Compute node special features”)
■ A centralized storage design with no separate SAS/SATA storage controller; the links from the integrated SATA controller are not routed, and the corresponding BIOS portion is disabled
■ Two microbuttons for POWER and RESET, placed on the rear edge of the compute module

Figure 2. DDR3 Memory Module

See the Compute node diagram in Appendix 1.
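The 27-chip count on each memory module is consistent with standard ECC DIMM organization, where every channel carries 8 data chips plus 1 ECC chip. A quick sanity check (the x8 DRAM device width and 72-bit channel are my assumptions; the guide only states the chip counts):

```python
# Chip count per proprietary memory module, assuming a standard
# 72-bit ECC channel organization (64 data + 8 ECC bits) with x8 DRAMs.
CHANNELS = 3
DATA_BITS, ECC_BITS, CHIP_WIDTH = 64, 8, 8   # per channel

chips_per_channel = (DATA_BITS + ECC_BITS) // CHIP_WIDTH   # 9 chips
total_chips = CHANNELS * chips_per_channel                 # 27 chips
print(chips_per_channel, total_chips)   # -> 9 27
```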
2.1.3. Compute node special features

■ It is possible to use either an internal or an external clock for the system. Clock selection is implemented using a special IDT5T9GL02 (or similar) chip, and the clock select pin on that chip goes to a DIP switch placed on the rear bottom part of the PCB to provide manual switching of the clocking mode. There is also an LED placed on the rear edge of each compute module to indicate whether the system is using the internal or external clock.
■ Two interrupt pins on the ICH10 chip are routed through the backplane to the FPGA chip on the management module; the FPGA chip is discussed in greater detail in a separate document.
■ 5 GPIO pins on the ICH10 chip on each compute node are routed through the backplane to the FPGA chip on the management module.
Compute node management and monitoring is based on an industry-standard baseboard management controller, or BMC. The BMC is accessed through the management module using a dedicated GbE interface to eliminate the possibility of interference with application performance. The BMC provides the following hardware monitoring features:

■ Temperature of each CPU core
■ Memory temperature – one temperature sensor per memory block dedicated to each CPU
■ Two additional temperature sensors on the CPU voltage regulator modules
■ The voltage at each CPU core
■ 3.3V, 5V, 12V and 5Vsb voltages
■ Memory and PCI error conditions; for memory errors the exact memory channel is reported to aid in error resolution
■ InfiniBand and GbE counters (optional)

The following management capabilities are supported:

■ Remote power on/power off/reset
■ Remote KVM-over-IP and full SOL console access starting from the boot process, including the POST procedure and BIOS setup
■ Compute node BIOS update and NVRAM save/restore without involving any OS-dependent utilities
■ Boot select priority
■ Remote BMC firmware upgrade

All BMC features can be accessed remotely via Telnet and SSH, using key authentication for added security.
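In practice these management capabilities are usually scripted rather than driven interactively. The guide documents only Telnet/SSH access, so the sketch below is a hypothetical example assuming the BMC also speaks standard IPMI-over-LAN; the hostname, user name, and the choice of `ipmitool` are illustrative assumptions, not part of the product documentation:

```python
# Sketch: scripted remote power control for a compute node BMC,
# assuming standard IPMI-over-LAN support (hypothetical -- the guide
# only documents Telnet/SSH access). Host and user are placeholders.
import subprocess

def ipmi_cmd(bmc_host: str, *action: str, user: str = "admin") -> list[str]:
    """Build an ipmitool command line for one BMC."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, *action]

# Power-cycle node 3 and list its sensors (commands built, not executed):
cycle   = ipmi_cmd("blade03-bmc.example", "power", "cycle")
sensors = ipmi_cmd("blade03-bmc.example", "sdr", "list")
# subprocess.run(cycle, check=True)   # uncomment on a live system
print(cycle)
```

On a real installation the same loop would iterate over all 32 node BMCs behind the chassis management network switch.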
2.2. Compute module power and connectivity features

All power and connectivity for a compute module is provided via the backplane power and signal connectors. The power connector provides -48V DC input with a minimum of 18A current for every -48V channel.

The following signals are routed to the backplane via the signal connector:

■ 2 QDR InfiniBand 4x links (1 per compute node)
■ 4 GbE links (1 GbE + 1 BMC GbE connection per compute node)
■ 2 external clock links (1 per compute node); it is possible to use just one external clock link by using a clock extender on the compute module
■ 4 remote interrupt links (2 per compute node)
■ 10 global barrier network links (5 GPIO links per compute node)
2.3. Compute module heat sink

The compute board is attached to a tray and a specially-designed heat sink that covers the entire board. The heat sink also serves as a rail that helps the compute module boards slide reliably into the compute module bays of the enclosure. The geometry of the heat sink is shown in Figure 3.

The heat sink base plate has variable thickness according to the component height on the compute module board. A rubber material with low thermal resistance provides good thermal contact between the components and the heat sink.

The heat sink design and choice of material are based on extensive thermal analysis performed by T-Services to achieve an optimal compute node design. This engineering resulted in a final design that is not only more efficient but also lighter than standard heat sinks, reducing the overall T-Blade 2 system weight to below 160kg.

There is also a retention mechanism mounted on the rear edge of the heat sink for plugging/unplugging the compute module (Figure 3).

Figure 3. Compute module heat sink (labels: heatsink, backplane connector, PCB, retention mechanism).
Figure 4. Air current velocity and distribution between the board and heat sink for the 60CFM fan design variant.
Figure 5. Temperature distribution at the base of the aluminum-based heat sink design variant.
3. InfiniBand Switch Modules

There are two non-redundant integrated InfiniBand switch modules provided with the T-Blade 2 system, each located at the back of the blade chassis. Each InfiniBand switch module is based on Mellanox’s InfiniScale IV QDR InfiniBand switch silicon. A total of 36 QDR InfiniBand 4x ports is provided per switch module: 16 QDR ports are routed through the backplane to the compute modules, and 20 QDR ports go to the rear panel to provide additional flexibility when connecting centralized storage or heterogeneous nodes (see Appendix 2). Each blade module port is connected sequentially, with ports 1-16 connected to one switch and ports 17-36 to the other. The intra-chassis switch topology is two independent switches, while the inter-chassis topology can be any option supported by the InfiniBand subnet manager.
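The per-module port split above can be captured in a small helper; note that the exact numbering convention (ports 1-16 as the backplane links) is an illustrative assumption, since the guide does not specify which port indices face the backplane:

```python
# Classify the 36 ports of one InfiniScale IV switch module as described
# in Section 3: 16 backplane links to compute nodes, 20 external QSFP ports.
# The numbering convention (1-16 = backplane) is an assumption for illustration.
def port_role(port: int) -> str:
    if not 1 <= port <= 36:
        raise ValueError(f"InfiniScale IV has 36 ports, got {port}")
    return "backplane" if port <= 16 else "external"

roles = [port_role(p) for p in range(1, 37)]
print(roles.count("backplane"), roles.count("external"))   # -> 16 20
```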
Figure 6. QDR InfiniBand Modules

■ The external panel ports are implemented using QSFP connectors.
■ Remote control and monitoring of the InfiniBand switch module is implemented via an I²C link going through the backplane to the management module.
■ The InfiniBand switch module is powered via the backplane power connector using -48V DC input.

The rear panel of the IB Switch module features:

■ 20 QSFP connectors for InfiniBand ports
■ “Link initialized” and “Link active” LEDs for each port
■ Two mounting handles (retention mechanism, Figure 6)
■ Ventilation holes
4. Management and Switch Module (MSM)

The MSM provides the following functions:

■ Blade and chassis management and monitoring
■ Gigabit Ethernet switching
■ Support for the global barrier and global interrupt networks
■ Global clock distribution

The MSM is a 1U-high device located at the rear side of the blade enclosure. It consists of four functional blocks:

■ Management processor block (4.1)
■ Gigabit Ethernet switches block (4.2)
■ Special networks block (4.3)
■ Global clock distribution block (4.4)

For large-scale installations, the management module is connected to a specialized external global clock distribution switch to reduce OS jitter, improving application performance and predictability. The management module continuously collects software and hardware events, and consolidates them for reporting to the cluster management node and a ruggedized Black Box appliance. Combined with Clustrx management services, this two-stage process gives near-real-time monitoring capability of 12,000 events per second using a single 2-way management node (see Section 8 for more information).
4.1. Management processor block

The management processor block features the following components:

■ Low-voltage Intel Yonah-class CPU
■ Intel 3100 MICH
■ Single DDR2 memory slot populated with a 2GB ECC module
■ Dual-port GbE controller on a PCI-E bus (Intel 82576)
■ BMC (AST2050)
■ SIO chip
■ 40GB SSD drive

The following interfaces are provided:

■ 1 GbE interface from the Intel 82576 to the auxiliary network switch
■ 1 GbE interface from the Intel 82576 to the management network switch
■ 1 10/100 Ethernet interface from the BMC, sharing one of the RJ45 connectors on the rear panel. This makes the BMC and the management module CPU available remotely in case of issues with the GbE switches
■ 1 PCI-E interface to the special networks block (FPGA)
■ 1 video interface from the BMC to the D-SUB connector on the rear panel
■ 2 USB interfaces to the connectors on the rear panel
■ 1 USB interface to the LCD module (via the backplane connector)
■ 1 RS-232 interface to the RJ45 connector on the rear panel
■ 1 RS-232 interface to the GbE switches management processor (Marvell 88F5181)
■ An I²C bus interface going to the backplane (for fans, PSUs, and InfiniBand switch management). This I²C bus is shared by both the BMC and the management module CPU, and enables the BMC to power individual PSUs up and down

See Appendix 3 for more information.

Figure 7. Management and Switch Module
4.2. Gigabit Ethernet switches block

The Gigabit Ethernet switches block consists of two separate switches:

■ Auxiliary network switch, used for generic node access (SSH, job management, etc.)
■ Management network switch, used for access to the BMCs

Each of the switches contains two Marvell DX270 chips stacked together. Both switches are managed via a Marvell 88F5181 processor running Linux and custom firmware.

The auxiliary network switch has the following connections:

■ 32 GbE links going through the backplane to the compute nodes
■ 1 GbE link going to the network controller of the management CPU (Intel 82576)
■ 1 GbE link going to the Marvell 88F5181
■ 2 10G uplinks going to the XFP connectors on the rear panel of the management module

The management network switch has the following connections:

■ 32 GbE links routed through the backplane to the BMCs of the compute nodes
■ 1 GbE link routed to the network controller of the management CPU (Intel 82576)
■ 1 GbE link routed to the Marvell 88F5181
■ 2 GbE uplinks routed to the RJ45 connectors on the rear panel of the management module

One of the RJ45 connectors on the rear panel can also be used to provide remote access to the BMC of the management module. The Marvell 88F5181 processor provides two RS-232 interfaces to the CPU of the management module.

See Appendix 3 for more information.
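The 32-link count on each switch follows from the chassis layout: 16 compute modules with 2 nodes each, every node contributing one GbE link (auxiliary switch) and one BMC GbE link (management switch). A quick check of the totals listed above:

```python
# Sanity check of the per-switch link counts in Section 4.2.
COMPUTE_MODULES = 16
NODES_PER_MODULE = 2

node_links = COMPUTE_MODULES * NODES_PER_MODULE  # backplane GbE links
# + 1 link to the management CPU NIC, + 1 to the Marvell 88F5181,
# + 2 uplinks to the rear panel
total_links = node_links + 1 + 1 + 2
print(node_links, total_links)   # -> 32 36
```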
4.3. Special networks block (FPGA)

The special networks block is used to implement the global barrier and global interrupt networks. It is based on a Xilinx XC5VLX50T FPGA (XC5VLX50T-3FFG665C), and also contains an FPGA boot flash device.

The following interfaces are provided:

■ A PCI-E link from the FPGA to the Intel 3100 MICH chip
■ 32x5 (160 total) single-wire links from the FPGA to the GPIO pins of the ICH10 chips on the compute nodes
■ 2 single-wire links from the FPGA to the IRQ pins of the ICH10 chips on the compute nodes (split into 32x2 links on the backplane)
■ 5 differential links from the FPGA to the RJ45 connectors on the rear panel for the global barrier network
■ 2 differential links from the FPGA to the RJ45 connectors on the rear panel for the global interrupt network
■ 4 differential links from MGTs on the FPGA to the RJ45 connector on the rear panel (reserved for future use)

See Appendix 3 for more information.

4.4. Global clock distribution block

Global clock distribution is used to adjust the frequency of the CPUs and memory on all compute nodes. The clock may be provided by a clock source on the management module, or by an external clock signal. For better signal quality, differential signaling should be used for the external clock; the differential clock input is converted to a single-wire signal on the management module. The choice between the external and the embedded clock source is made by the clock selector chip, controlled using the switch on the rear panel of the management module. The resulting clock signal goes to the backplane and is then split to provide the clock source for all compute nodes.

See Appendix 3 for more information.
4.5. Management and Switch Module Connectors

The following connectors are present on the management module:

■ A backplane signal connector that provides the following signals:
 ■ 32 GbE links from the compute nodes to the auxiliary network switch
 ■ 32 GbE links from the BMCs of the compute nodes to the management network switch
 ■ An I²C link to the Intel 3100 MICH chip and the management node BMC
 ■ 32x5 GPIO links from the FPGA
 ■ Interrupt links from the FPGA
 ■ A clock signal to the compute nodes
 ■ A USB link to the LCD module
■ A backplane power connector that provides the main -48V power input and a 5Vsb power input to the BMC
■ Rear panel connectors:
 ■ 2 XFP 10GbE connectors for auxiliary network switch uplinks (ETH1 & ETH2)
 ■ 2 RJ45 connectors for management network switch uplinks (ETH3 & ETH4); one port is GbE, the other is 100Mb and shared among several devices
 ■ 1 RJ45 100Mb connector for global barrier network links 1-4 (FPGA block) (SPN1)
 ■ 1 RJ45 100Mb connector for global barrier network link 5, global interrupt network links 1 and 2, and an external clock input (FPGA block) (SPN2)
 ■ 1 RJ45 100Mb connector for 4 differential signals, reserved for future use (FPGA block) (RIO)
 ■ 1 D-SUB connector for video output (VGA)
 ■ 2 USB connectors (USB)
 ■ 1 RJ45 connector for the RS-232 port (serial console) (SERIAL)

Figure 8. Rear panel connectors of the management and switch module (ETH1, ETH2, ETH3, ETH4, VGA, RIO, SPN2, SPN1, SERIAL, USB).
5. Backplane

The backplane provides power to the compute modules, the InfiniBand switch modules, and the management and switch module (MSM), and provides connectivity between these system components. It also provides hot-plug functionality for the compute modules. The backplane delivers up to 13kW of electrical power along with the standard system signaling, including the high-frequency InfiniBand QDR links (32 links at 40Gb/s each).

The backplane has very few active components; those that are included are responsible only for the global clock distribution logic. It is a low-profile board, optimized for efficient airflow, and based on a 24-layer PCB design. The following connectors are on the rear side of the backplane:

■ 16 combined power/signal connectors to the compute nodes
■ 2 combined power/signal connectors to the InfiniBand switch modules
■ 2 combined power/signal connectors to the MSM

The front side of the backplane is used for power input. The following components are placed on the front side of the backplane:

■ I²C connectors to the PSUs
■ I²C connectors to the fan modules
■ A USB connector to the LCD panel

The overall backplane geometry is shown in Figure 9.

Figure 9. System backplane, rear view (management module interface, 2 InfiniBand switch interfaces, blade module interfaces).
6. Blade chassis

The T-Blade 2 chassis accommodates the following field-replaceable unit (FRU) components:

■ 16 hot-plug compute modules
■ 2 InfiniBand switch modules
■ 1 management and switch module
■ 10+2 hot-swappable 80mm fan modules
■ 6 hot-swappable power supplies

There is also a button-controlled, 4-line LCD panel connected through the backplane to the USB port on the blade management module. There is no main power switch on the blade enclosure front panel; nodes are powered up remotely, or individual blades are powered on/off/reset using their microbuttons.

The dimensions of the chassis:

■ Height: 310mm (7U)
■ Width (not including the rails and mounting brackets): 430mm, fits a standard 19” rack
■ Depth: 860mm

The chassis can be mounted into a standard 19” rack using fixed (non-sliding) rails, and there are mounting brackets on the front side of the chassis that can be screwed to the vertical rails in the rack. T-Platforms recommends that customers install an empty chassis into the rack cabinet and then populate it with the modules. A fully populated T-Blade 2 chassis has an approximate weight of 153kg, and caution is advised when handling it.

Figure 10. Front section view (10 HS fans, LCD panel, 16 twin blade bays).
Figure 11. Rear section view (2 HS fans, 6 power supplies, AC power, IB QDR switches, management module).
Figure 12. Extraction of Compute Module.
Figure 13. Extraction of IB Switch.
6.2. Power subsystem<br />
The T-Blade 2 power subsystem consists of six power supplies. They are connected through six PSU interface boards to the T-shaped<br />
Power Distribution board, which connects the hot-pluggable power supplies (and the system fans) to the system backplane. The power supplies and<br />
fans are the only off-the-shelf components in a T-Blade 2 system; for more information on the T-Blade 2 system assembly, please refer to the<br />
document “T-Blade 2 system layout and components.”<br />
The 2725W rectifier made by Lineage Power (model CP2725AC54Z) is a compact 1RU design (Figure 14). As the integral power distribution<br />
component of the T-Blade 2, it features front-to-back airflow, an RS485 interface, redundant I²C links, PFC, and both under/over-voltage and<br />
thermal protection.<br />
For more information about the PSU, please follow this link:<br />
http://www.lineagepower.com/BinaryGet.aspx?ID=f75abb6f-abcf-4835-b811-f3b6dabd98b5<br />
6.3. Cooling subsystem<br />
T-Blade 2 uses front-to-back airflow cooling, zoned into upper and lower airflow sections within the blade enclosure. Extensive engineering and<br />
design work was invested in the cooling design of T-Blade 2, with both thermal simulations and physical tests conducted to ensure sufficient<br />
system cooling at constant peak loads.<br />
Compute module cooling is provided by ten hot-swappable 80mm<br />
axial fans (Figure 15). The fans have connectors for DC power, speed<br />
monitoring, and PWM-based speed control. Each fan is placed in a<br />
separate hot-swappable module, and there are louvers in the chassis<br />
to prevent backward airflow in the case of a fan failure or removal. Fan<br />
modules come equipped with a small board that holds the fan controller.<br />
This controller is connected through the backplane, via the I²C interface, to<br />
the blade management module.<br />
There are two additional 80mm fans below the LCD panel to provide<br />
cooling for the InfiniBand switch modules and the blade management<br />
module. Each of the two fan clusters is N+1 redundant, and uses identical<br />
high performance 14,000 RPM fans. To increase their operational lifetime<br />
they are housed at the front of the enclosure which lets them operate in a<br />
close-to-ambient air temperature range.<br />
Figure 14. System Power supplies<br />
Figure 15. System fans<br />
7. Other features<br />
7.1. Power-on procedure<br />
After connecting the blade system to the power source, the power supplies enter standby mode, providing only 5Vsb power to the blade<br />
management module. The power supplies are then switched on using a remote connection to the BMC, or by pressing the power button. At<br />
this point the Management and Switch module, the InfiniBand switch modules, and the BMCs on the compute modules begin their initialization<br />
routines. After initialization is complete, the compute modules are turned on using a remote connection to their BMCs.<br />
7.2. Emergency Shutdown procedure<br />
The T-Blade 2 system has Emergency Shutdown functionality implemented in the Management and Switch module (MSM). Depending upon the firmware<br />
version, either an individual blade module or the entire enclosure can automatically be powered down in case of a critical management event.<br />
Emergency shutdown helps to avoid physical damage to system silicon.<br />
As a reminder, each blade can also be manually switched on, off, and reset using the two microbuttons located on the rear edge of the blade module PCB.<br />
8. Cluster Management and Monitoring<br />
The advanced monitoring and management capabilities of the T-Blade 2 are implemented at a cluster-wide<br />
level. The optional Clustrx OS TP Edition and its Clustrx Watch monitoring suite are tightly integrated<br />
with the T-Blade system and its management module, along with standalone command line utilities for<br />
advanced scripting. Clustrx is a petascale-ready high performance computing OS developed by T-Massive<br />
Computing, part of the T-<strong>Platforms</strong> group. The Clustrx OS management subsystem requires a console system<br />
and a single management node, and provides the following functionality:<br />
■ Automated cluster deployment, using one console with single installation DVD media<br />
■ CLI- and GUI-based access<br />
■ Optimized such that cluster installation and tuning require only run-of-the-mill system administration skills<br />
■ The installation GUI reduces the time of basic installation and tuning of the cluster, from installation to Linpack test readiness, down to 2 hours (for systems with a standard topology, not including storage system installation time)<br />
■ The monitoring system tracks both the compute and infrastructure cluster subsystems (except for Ethernet and InfiniBand switches at the time of writing)<br />
■ The monitoring system can be configured to deliver notifications to system administrators<br />
■ The management subsystem reacts automatically with precise remedies to correct abnormal operation, including automated equipment power-down<br />
■ Monitoring latency is minimized to ensure on-time delivery of corrective measures, with up to 150 events per second gathered from individual compute nodes (300-500 events per second targeted in the near future)<br />
■ When combined with Clustrx CNL (Compute Node Linux), the Clustrx monitoring and management suite is an exaFLOPS-ready operating environment covering L3 clusters with up to 12,000 compute nodes (support for L4 clusters of up to 210,000 compute nodes is planned)<br />
■ Supports a variety of compute node operating systems; currently validated with Clustrx OS (future support for RHEL, SUSE, Windows, and others is planned)<br />
■ A variety of parallel file systems are supported, including Lustre and Panasas<br />
It is possible to use other popular management packages with T-Blade 2 systems, as they are IPMI-compliant, and device connectors<br />
can be developed upon request.<br />
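Because the system is IPMI-compliant, third-party monitoring packages can poll it with standard tooling. As a hedged sketch, the snippet below parses pipe-separated `name | reading | status` lines in the layout that `ipmitool sdr` typically emits; the sensor names and readings are invented examples, not measurements from a T-Blade 2:<br />

```python
# Sketch: parsing IPMI sensor-repository output for a monitoring hook.
# SAMPLE_SDR mimics typical `ipmitool sdr` formatting; values are invented.

SAMPLE_SDR = """\
Fan1 RPM        | 14000 RPM     | ok
Inlet Temp      | 24 degrees C  | ok
PSU1 Status     | 0x01          | ok
CPU0 Temp       | 71 degrees C  | cr
"""

def parse_sdr(text):
    """Return a list of (sensor, reading, status) tuples."""
    rows = []
    for line in text.strip().splitlines():
        name, reading, status = (field.strip() for field in line.split("|"))
        rows.append((name, reading, status))
    return rows

def critical_sensors(rows):
    """Sensors whose status is not 'ok' - candidates for an admin notification."""
    return [name for name, _, status in rows if status != "ok"]

rows = parse_sdr(SAMPLE_SDR)
print(critical_sensors(rows))  # ['CPU0 Temp']
```

A real connector would feed such rows into the site monitoring package instead of printing them.<br />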
9. Basic infrastructure requirements<br />
The T-Blade 2 system requires professional installation to ensure proper system operation and uptime. Installation is typically supplied as a part<br />
of a complete High Performance Computer system deployed by T-<strong>Platforms</strong> installation specialists. The design of the compute room and support<br />
infrastructure is an integral part of each T-Blade 2-based HPC system, and it is common for T-<strong>Platforms</strong> to work closely with customers on the<br />
design and deployment of the entire HPC system installation in a turn-key fashion.<br />
The following subsections cover the basic prerequisites for T-Blade 2-based HPC cluster deployment.<br />
9.1. Electricity<br />
■ 380-415VAC, 5-wire, 3-phase electrical distribution system<br />
■ Each T-Blade 2 system must be supplied with at least 11kW of uninterruptible power<br />
■ Each T-Blade 2 system must be fed by a dedicated power switch or circuit breaker rated at 32A<br />
9.2. Cooling<br />
■ Each T-Blade 2 enclosure requires 600 cubic feet per minute (CFM) of front-to-back airflow (1,019 cubic meters per hour)<br />
■ The site cooling system must be able to operate at an ambient temperature of up to 55°C, mixing in a sufficient amount of cold air to support the nominal operational mode of the conditioning systems<br />
■ Required cooling performance of 11kW for each T-Blade 2 system<br />
■ The inflow air temperature should be in the range of 10°C to 30°C; an energy-efficient design rule is to keep inflow air at not less than 20°C, with optimal air temperature at 25°C<br />
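The two airflow figures above can be cross-checked with a unit conversion (1 cubic foot = 0.3048³ m³). This short sketch is an illustrative sanity check, not part of the original specification:<br />

```python
# Cross-check: 600 CFM of front-to-back airflow expressed in cubic meters per hour.
CFM_TO_M3H = 0.3048**3 * 60  # 1 cubic foot = 0.3048^3 m^3; 60 minutes per hour

airflow_cfm = 600
airflow_m3h = airflow_cfm * CFM_TO_M3H
print(round(airflow_m3h))  # ~1019, matching the 1,019 m^3/h quoted above
```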
9.3. Cabinet infrastructure<br />
■ EIA 310-D (or later) compliant cabinet<br />
■ Cabinet rack depth of at least 900mm; newer cabinet designs with 1000mm rack depth are recommended<br />
■ Depending upon the interconnect cabling, the total enclosure weight with attached cabling infrastructure can vary from 130kg to 200kg<br />
9.4. Floors and layout<br />
■ Antistatic raised floors rated for the fully loaded rack cabinet weight are recommended<br />
■ Recommended front aisle width is not less than 1.0 meter<br />
■ Recommended back aisle width is not less than 0.9 meter<br />
■ Final aisle width will vary with individual cooling design requirements<br />
10. Operating and file systems compatibility<br />
As a standards-based HPC system, T-Blade 2 generally supports many Linux distributions, including RHEL and SUSE. It is also possible to use<br />
Clustrx Watch with the major Linux distributions to enable fine-grained monitoring capabilities. For customers interested in a Windows-based<br />
installation, Windows HPC Server 2008 does run on a T-Blade 2 system.<br />
Note: The Clustrx T-<strong>Platforms</strong> Edition OS is the only distribution that currently supports the global barrier and interrupt network<br />
functionality of the T-Blade 2 platform.<br />
Currently supported parallel file systems include, but are not limited to, Lustre and Panasas.<br />
11. System specification<br />
T-Blade 2 Enclosure<br />
Form factor ■ 16 hot-plug computing modules (32 dual-processor compute nodes) in a 7U enclosure<br />
■ 2 modules with 36-port QDR InfiniBand switches<br />
■ Dedicated management module<br />
Peak performance per enclosure 4.5 TFlops<br />
Density 384 six-core processors (2,304 cores) per standard 19” 42U rack<br />
Peak performance per rack 27 TFlops<br />
RAM ■ Up to 384GB (1.5GB per core)<br />
■ Up to 768GB (3GB per core)<br />
Power consumption per enclosure (max. configuration) 11kW<br />
Performance/power consumption ratio 0.4 GFLOPS/W<br />
Cooling design 12 hot-swap redundant cooling fans in front of the chassis<br />
Operating temperature 10-30°C<br />
Dimensions (HxWxD) 310×430×860mm<br />
System weight (fully configured) 152.6kg<br />
T-Blade 2 Compute Node<br />
Processor capacity/type 2 six-core Intel Xeon E5600 processors, up to 2.93GHz<br />
Chipset Intel 5520+ICH10<br />
RAM up to 24GB of DDR3-1333/1066/800<br />
Internal Storage Integrated MicroSD slot<br />
Expansion slots none<br />
Ethernet interface 1 GbE port<br />
Interconnect support integrated QDR InfiniBand<br />
Network interface capacity 40Gb/s<br />
LED indicators power, system ID<br />
Management integrated service processor with KVM-over-IP support<br />
Dimensions (HxWxD) 26×225×612mm<br />
T-Blade 2 external ports and networks<br />
System network QDR InfiniBand 40Gb/s<br />
40 external ports per enclosure<br />
Management (auxiliary) network 10G Ethernet, 2 external ports per enclosure<br />
Service network One GbE external port and one 100Mb external port per enclosure<br />
Global barrier network 1 uplink port to support large system topologies<br />
Global interrupt network 1 uplink port to support large system topologies<br />
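The headline numbers in the table are mutually consistent, which can be verified with a little arithmetic (peak FLOPS = sockets × cores × 4 double-precision FLOPs per cycle × clock). The figure of 6 enclosures per 42U rack is an assumption derived from 384 processors per rack at 64 sockets per enclosure, not a value stated in the table:<br />

```python
# Sanity check of the enclosure-level specification numbers.
sockets_per_enclosure = 32 * 2      # 32 dual-processor compute nodes
cores_per_socket = 6                # six-core Xeon 5600-series parts
flops_per_cycle = 4                 # double-precision FLOPs/cycle (SSE)
clock_ghz = 2.93

peak_tflops = (sockets_per_enclosure * cores_per_socket
               * flops_per_cycle * clock_ghz) / 1000
print(round(peak_tflops, 1))        # 4.5 TFlops, as listed

power_kw = 11
print(round(peak_tflops * 1000 / (power_kw * 1000), 2))  # ~0.41 GFLOPS/W

enclosures_per_rack = 6             # assumption: six 7U enclosures in a 42U rack
print(sockets_per_enclosure * enclosures_per_rack)       # 384 processors per rack
```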
Appendix 1. Compute Module diagram<br />
Block diagram summary: each compute module carries two dual-socket Nehalem-EP nodes. In each node the two CPUs are linked by a 6.4GT/s QPI bus, and each CPU drives three unbuffered DDR3 channels (CHA/CHB/CHC). A Tylersburg 24D IOH per node connects over QPI (6.4GT/s) to the CPUs and over PCIe to a Mellanox ConnectX QDR InfiniBand adapter and an Intel 82576 Ethernet controller; an ICH10 southbridge hangs off the ESI link, with an AST2050 BMC on USB and a MicroSD slot. The InfiniBand SerDes signals are routed to the high-speed connector to the middle plane.<br />
Appendix 2. Intra-chassis QDR IB links diagram<br />
Diagram summary: compute modules 1-16 connect through backplane signal connectors to the two QDR IB switches (Mellanox InfiniScale IV), with 16 QDR IB 4x links running to each switch. Each switch board also carries a -48V power connector, an I²C link, and 20 QDR IB 4x links routed to 20 QSFP connectors.<br />
Appendix 3. Management and Switch Module diagram<br />
Diagram summary: the board combines a socket-479 Yonah CPU (400MHz FSB, IMVP-6 VRM) with an Intel 3100 chipset (DDR2-400 memory, USB 2.0, SATA, PCI, LPC and I²C, PCIe x4 links), a Marvell 88F5181 management CPU, an Intel 82576 Ethernet controller, and Marvell 88E6161, 88E1116, 88E1112, and 88E2011 Ethernet devices serving the RJ45 ports and two XFP cages. Four DX270 devices (U15-U18) expose HGX 10G ports and x8/x16/x24 SerDes port groups toward the backplane. An AST2050 BMC with DME1737 and W83793G hardware-monitoring chips, a firmware hub (FWH), an LX50T FPGA on PCI-E, two serial ports (one routed to an RJ45, one to the 88F5181), and a D-SUB connector complete the board.<br />
About T-<strong>Platforms</strong><br />
Established in 2002, T-<strong>Platforms</strong> provides comprehensive HPC systems, software and services<br />
with customer installations consistently included on the TOP500 worldwide list of most powerful<br />
supercomputers. Lomonosov, a T-<strong>Platforms</strong> system installed at Moscow State University, has<br />
been widely recognized as the #1 ranked supercomputer in Eastern Europe and the #12 ranked<br />
supercomputer worldwide.<br />
T-<strong>Platforms</strong> is a one-stop source for companies looking for the competitive advantage of HPC<br />
technology, but lacking the resources necessary to fully adopt and embrace a supercomputing<br />
environment. The portfolio of solutions offered by T-<strong>Platforms</strong> starts with early stage analysis and<br />
documentation of user requirements, and progresses all the way to turnkey supercomputer center<br />
design. The company’s highly scalable T-Blade family of HPC systems utilizes Clustrx, a robust<br />
operating system built specifically for HPC that ensures next-generation scalability and fidelity to<br />
support the path from petascale to exascale.<br />
T-<strong>Platforms</strong> also delivers a unique added value with its ability to provide end-to-end modeling,<br />
simulation and analysis services, and deep technical talent with particular expertise in areas such as<br />
CFD, structural analysis, and other extreme computational disciplines, a level of support not available<br />
from most HPC platform suppliers.<br />
T-<strong>Platforms</strong> is part of T-<strong>Platforms</strong> Group which consists of T-<strong>Platforms</strong>, T-Services, T-Massive<br />
Computing, and T-Design, with locations in Hannover, Moscow, Kiev and Taipei.<br />
For more information, please visit www.t-platforms.com.<br />
The T-Blade 1 computational row at the 60TFlops MSU Chebyshev installation<br />
T-<strong>Platforms</strong><br />
Moscow, Russia<br />
Leninsky Prospect 113/1 Suite E-520<br />
Tel.: +7 (495) 956 54 90<br />
Fax: +7 (495) 956 54 15<br />
info@t-platforms.com<br />
http://www.t-platforms.com<br />
© T-<strong>Platforms</strong> 2010<br />
T-<strong>Platforms</strong> GmbH<br />
Woehlerstrasse 42, D-30163,<br />
Hannover, Germany<br />
Tel.: +49 (511) 203 885 40<br />
Fax: +49 (511) 203 885 41<br />
T-<strong>Platforms</strong>, T-<strong>Platforms</strong> logo, T-Blade, Clustrx TP edition are trademarks or registered trademarks of T-<strong>Platforms</strong>, JSC. Other brand names and<br />
trademarks are property of their respective owners.<br />
This document is for informational purposes only. T-<strong>Platforms</strong> reserves the right to make changes without further notice to any products herein. The<br />
content is provided “as is” and without express or implied warranties of any kind.