
T-Blade 2 Technical Guide Book

www.t-platforms.com


Disclaimer:

This document is provided for informational purposes only and may contain inaccuracies. For up-to-date information please contact your T-Platforms representative.

T-Blade Technical Guide Book, rev. A0.


Table of contents

1. General overview
2. Compute Module
2.1. Compute module board
2.1.1. Board geometry and dimensions
2.1.2. Compute node featureset
2.1.3. Compute node special features
2.2. Compute module power and connectivity features
2.3. Compute module heat sink
3. InfiniBand Switch Modules
4. Management and Switch Module (MSM)
4.1. Management processor block
4.2. Gigabit Ethernet switches block
4.3. Special networks block (FPGA)
4.4. Global clock distribution block
4.5. Management and Switch Module Connectors
5. Backplane
6. Blade chassis
6.2. Power subsystem
6.3. Cooling subsystem
7. Other features
7.1. Power-on procedure
7.2. Emergency Shutdown procedure
8. Cluster Management and Monitoring
9. Basic infrastructure requirements
9.1. Electricity
9.2. Cooling
9.3. Cabinet infrastructure
9.4. Floors and layout
10. Operating and file systems compatibility
11. System specification
T-Blade 2 Enclosure
T-Blade 2 Compute Node
T-Blade 2 external ports and networks
Appendix 1. Compute Module diagram
Appendix 2. Intra-chassis QDR IB links diagram
Appendix 3. Management and Switch Module diagram



1. General overview

Welcome to T-Blade 2, the leading compute density solution from T-Platforms, engineered specifically for the world's largest x86 HPC installations.

Introduced in 2009, T-Blade 2 creates an unmatched solution for customers with the most demanding HPC needs, and perfectly complements the other elements of T-Platforms' HPC portfolio, including the Cell BE-based PowerXCell 8i 1RU compute nodes and the 5RU 10-node T-Blade 1 system.

The driving force behind the T-Blade 2 design is the use of industry-standard compute, interconnect, and management architectures packaged in an HPC-optimized manner to deliver just the right mix of performance, density, redundancy, and manageability.

T-Blade 2: designed for density, performance, and reliability:

■ 7U-high chassis for mounting into a standard 19" rack
■ 16 hot-pluggable compute modules with 64 Intel Xeon 56xx series CPUs; each compute module contains 2 dual-processor compute nodes on a single board
■ 2 integrated 36-port QDR InfiniBand switches
■ Integrated management module with GbE switches and special HPC features
■ Air cooling using hot-swappable high-performance fans
■ 11kW, N+1 redundant power supplies

Although T-Blade 2 features standard technologies, T-Platforms has focused its engineering and design expertise on overcoming the scalability bottlenecks that so often plague commodity clusters built with off-the-shelf components. These bottlenecks are largely caused by poor collective communications efficiency, which requires special support at the hardware level to maintain excellent application performance on high node-count systems. T-Blade 2 features dedicated global barrier and global interrupt networks, ensuring effective application scalability even on systems containing many thousands of nodes.

The global barrier network supports fast synchronization between the tasks of a large-scale application, while the global interrupt network significantly reduces the negative impact of OS jitter by synchronizing process scheduling over the entire system. As a result, processors communicate much more efficiently with each other, enabling high scalability even for the most demanding parallel applications.

The two networks are managed by an FPGA chip integrated into a dedicated management module (see Appendix 3) that monitors all subsystems and components, enabling remote system management, emergency shutdowns, etc.

To ensure smooth and fast data transfer, and to reduce network congestion in high node-count systems, T-Blade 2 incorporates extra external InfiniBand ports, providing an impressive 1.6Tb/s of bandwidth.

T-Platforms has engineered the T-Blade 2 platform for reliability as well as performance. The system features neither hard disks nor cables inside the enclosure, dramatically decreasing outages caused by mechanical failure within a node. Reliability is further enhanced within each enclosure with hot-swappable, N+1 redundant power supplies and cooling fans.

Petascale computing demands deep changes in system software to efficiently support application scalability. T-Blade 2 is optionally supplied with a comprehensive 'all-in-one' system software stack that includes an optimized Linux OS core and system libraries, as well as all necessary management and monitoring software components. This integrated software stack provides an out-of-the-box experience, reducing system installation time and administration costs.

The T-Blade Linux core provides specialized support for the global barrier and global interrupt networks, an optimization that greatly speeds up inter-node communications over more traditional designs. And our new system management software, part of Clustrx OS, ensures seamless scalability up to 12,000 compute nodes with near-real-time monitoring capability. T-Blade 2 also features aggressive power-saving technology and support for topology-driven resource allocation, which improves memory usage and accelerates real application performance.

Best of all, the T-Blade 2 HPC platform is field-proven: it is the primary building block of the 420TF Lomonosov cluster deployed at Moscow State University. This system was ranked number 12 on the November 2009 TOP500 list of the world's fastest computers and is considered to be the largest supercomputer in Eastern Europe.



2. Compute Module

A T-Blade 2 compute module consists of a single printed circuit board (PCB) with two separate dual-processor compute nodes, mounted on a specially engineered heat sink that provides the necessary cooling for the entire module. Each compute module board comes with memory modules preinstalled (Figure 1), in 16-board bulk packages.

2.1. Compute module board

Figure 1. Compute module board on its system board tray, showing the guides, backplane connector, DDR3 memory modules, Intel Xeon 5500 Series CPUs, QDR IB controllers, heat sink, and DC-DC converters.

2.1.1. Board geometry and dimensions

The geometry of a compute module PCB is shown in Figure 1. This airflow-optimized board is attached for reliability both to the heat sink and to a board tray using mounting holes. Each board attaches to the system backplane using a power and signal connector mounted on the PCB, and is supported by mechanical guides located on each tray.



2.1.2. Compute node featureset

Each of the two compute nodes on a PCB features:

■ Two Intel Westmere (56xx series) CPUs, each with a maximum TDP of 95W and clock speeds up to 2.93GHz
■ 12GB memory per compute node in a 3-channel design that uses unbuffered ECC DDR3-1333 RAM
■ 6GB or 12GB of memory per CPU socket
■ 4 memory modules (Figure 2) in a proprietary mezzanine card design with 27 memory chips (9 chips in each of the 3 channels)
■ The Intel Tylersburg 24D + ICH10 chipset, with QPI links providing 6.4 GT/s (3.2GHz) between CPUs on a board
■ A single-port Mellanox ConnectX QDR InfiniBand chip connected to the Tylersburg chipset via a PCIe 2.0 x8 link
■ A single-port GbE controller connected to the ICH10
■ A USB controller for miniSD or microSD flash, and a connector for the flash card
■ Baseboard management controller (BMC) and a separate single-port GbE controller for the BMC connection (refer to the section "Compute node special features" for a discussion of the BMC and the management and monitoring feature set)
■ One serial port, connected to the BMC
■ Five GPIO signals from the ICH10 chip going through the backplane to the FPGA chip on the blade management module for HPC-specific purposes (described in the section "Compute node special features")
■ A centralized storage design with no separate SAS/SATA storage controller; the links from the integrated SATA controller are not routed, and the corresponding BIOS portion is disabled
■ Two microbuttons for POWER and RESET, placed on the rear edge of the compute module

Figure 2. DDR3 Memory Module

See the Compute Module diagram in Appendix 1.


2.1.3. Compute node special features

■ It is possible to use either an internal or an external clock for the system. Clock selection is implemented using a special IDT5T9GL02 model (or similar) chip; the clock select pin on that chip goes to a DIP switch placed on the rear bottom part of the PCB to provide manual switching of the clocking mode. There is also an LED placed on the rear edge of each compute module to indicate whether the system is using the internal or external clock.
■ Two interrupt pins on the ICH10 chip are routed through the backplane to the FPGA chip on the management module; the FPGA chip is discussed in greater detail in a separate document.
■ 5 GPIO pins on the ICH10 chip on each compute node are routed through the backplane to the FPGA chip on the management module.

Compute node management and monitoring is based on an industry-standard baseboard management controller, or BMC. The BMC is accessed through the management module using a dedicated GbE interface to eliminate the possibility of interference with application performance. The BMC provides the following hardware monitoring features:

■ Temperature of each CPU core
■ Memory temperature: one temperature sensor per memory block dedicated to each CPU
■ Two additional temperature sensors on the CPU voltage regulator modules
■ The voltage at each CPU core
■ 3.3V, 5V, 12V and 5Vsb voltages
■ Memory and PCI error conditions; for memory errors the exact memory channel is reported to aid in error resolution
■ InfiniBand and GbE counters (optional)

The following management capabilities are supported:

■ Remote power on/power off/reset
■ Remote KVM-over-IP and full SOL console access starting from the boot process, including the POST procedure and BIOS setup
■ Compute node BIOS update and NVRAM save/restore without involving any OS-dependent utilities
■ Boot select priority
■ Remote BMC firmware upgrade

All BMC features can be accessed remotely via Telnet and SSH using key authentication for added security.
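Because the node BMCs follow industry conventions (Section 8 notes that T-Blade 2 systems are IPMI-compliant), routine monitoring can also be scripted from the management network. The sketch below is only an illustration of that approach, assuming the widely available ipmitool utility and placeholder hostnames and credentials; it is not part of T-Platforms' tooling.

```python
"""Minimal sketch: poll a compute-node BMC over the management GbE network.

Assumptions (not taken from this guide): the BMC answers standard
IPMI-over-LAN requests, `ipmitool` is installed on the admin host, and the
hostname/credentials below are placeholders.
"""
import subprocess

BMC_HOST = "node01-bmc.example"   # hypothetical BMC address on the management network
BMC_USER = "admin"                # placeholder credentials
BMC_PASS = "secret"


def ipmi(*args: str) -> str:
    """Run one ipmitool command against the node BMC and return its output."""
    cmd = ["ipmitool", "-I", "lanplus", "-H", BMC_HOST,
           "-U", BMC_USER, "-P", BMC_PASS, *args]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Sensors of the kind listed above: CPU/memory/VRM temperatures and the 3.3V/5V/12V/5Vsb rails.
    print(ipmi("sensor", "list"))
    # Remote power status of an individual compute node.
    print(ipmi("chassis", "power", "status"))
```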

2.2. Compute module power and connectivity features

All power and connectivity for a compute module is provided via backplane power and signal connectors. The power connector provides -48V DC input with a minimum of 18A current for every -48V channel.

The following signals are routed to the backplane via the signal connector:

■ 2 QDR InfiniBand 4x links (1 per compute node)
■ 4 GbE links (1 GbE + 1 BMC GbE connection per compute node)
■ 2 external clock links (1 per compute node); it is possible to use just one external clock link by using a clock extender on the compute module
■ 4 remote interrupt links (2 per compute node)
■ 10 global barrier network links (5 GPIO links per compute node)



2.3. Compute module heat sink

The compute board is attached to a tray and a specially designed heat sink that covers the entire board. The heat sink also serves as a rail for the compute module boards, helping them slide reliably into the compute module bays of the enclosure. The geometry of the heat sink is shown in Figure 3.

The heat sink base plate has variable thickness according to the component height on the compute module board. A rubber material with low thermal resistance provides good thermal contact between the components and the heat sink.

The heat sink design and choice of material are based on extensive thermal analysis performed by T-Services to achieve an optimal compute node design. This engineering resulted in a final design that is not only more efficient but also lighter than standard heat sinks, reducing the overall T-Blade 2 system weight to below 160kg.

There is also a retention mechanism mounted on the rear edge of the heat sink for plugging/unplugging the compute module (Figure 3).

Figure 3. Compute module heat sink, showing the heat sink, backplane connector, PCB, and retention mechanism.

Figure 4. Air current velocity and distribution between the board and heat sink for the 60CFM fan design variant.

Figure 5. Temperature distribution at the base of the aluminum-based heat sink design variant.


3. InfiniBand Switch Modules

There are two non-redundant integrated InfiniBand switch modules provided with the T-Blade 2 system, each located at the back of the blade chassis. Each InfiniBand switch module is based on Mellanox's InfiniScale IV QDR InfiniBand switch silicon. A total of 36 QDR InfiniBand 4x ports is provided per module: 16 QDR ports are routed through the backplane to the compute modules, and 20 QDR ports go to the rear panel to provide additional flexibility when connecting centralized storage or heterogeneous nodes (see Appendix 2). Each blade module port is connected sequentially, with ports 1-16 connected to one switch and ports 17-36 to the other. The intra-chassis switch topology is two independent switches, while the inter-chassis topology can be any option supported by the InfiniBand subnet manager.
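For reference, the 1.6Tb/s external InfiniBand bandwidth quoted in the general overview follows directly from these port counts; a quick check:

```python
# External InfiniBand bandwidth per enclosure:
# 2 switch modules x 20 external QDR 4x ports x 40 Gb/s signalling per port.
switch_modules = 2
external_ports_per_module = 20
qdr_4x_gbps = 40

total_gbps = switch_modules * external_ports_per_module * qdr_4x_gbps
print(f"{total_gbps} Gb/s = {total_gbps / 1000:.1f} Tb/s")  # 1600 Gb/s = 1.6 Tb/s
```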

Figure 6. QDR InfiniBand Modules

■ The external panel ports are implemented using QSFP connectors.
■ Remote control and monitoring of the InfiniBand switch module is implemented via an I²C link going through the backplane to the management module.
■ The InfiniBand switch module is powered via the backplane power connector using -48V DC input.

The rear panel of the IB Switch module features:

■ 20 QSFP connectors for InfiniBand ports
■ "Link initialized" and "Link active" LEDs for each port
■ Two mounting handles (retention mechanism, Figure 1)
■ Ventilation holes



4. Management and Switch Module (MSM)

The MSM provides the following functions:

■ Blade and chassis management and monitoring
■ Gigabit Ethernet switching
■ Support for the global barrier and global interrupt networks
■ Global clock distribution

The MSM is a 1U-high device located at the rear side of the blade enclosure. It consists of four functional blocks:

■ Management processor block (4.1)
■ Gigabit Ethernet switches block (4.2)
■ Special networks block (4.3)
■ Global clock distribution block (4.4)

For large-scale installations, the management module is connected to a specialized external global clock distribution switch to reduce OS jitter, improving application performance and predictability. The management module continuously collects software and hardware events, and consolidates them for reporting to the cluster management node and a ruggedized Black Box appliance. Combined with Clustrx management services, this two-stage process gives near real-time monitoring capability of 12,000 events per second using a single 2-way management node (see Section 8 for more information).

4.1. Management processor block

The management processor block features the following components:

■ Low-voltage Intel Yonah-class CPU
■ Intel 3100 MICH
■ Single DDR2 memory slot populated with a 2GB ECC module
■ Dual-port GbE controller on a PCI-E bus (Intel 82576)
■ BMC (AST2050)
■ SIO chip
■ 40GB SSD drive

The following interfaces are provided:

■ 1 GbE interface from the Intel 82576 to the auxiliary network switch
■ 1 GbE interface from the Intel 82576 to the management network switch
■ 1 10/100 Ethernet interface from the BMC sharing one of the RJ45 connectors on the rear panel. This makes the BMC and management module CPU available remotely in case of issues with the GbE switches
■ 1 PCI-E interface to the special networks block (FPGA)
■ 1 video interface from the BMC to the D-SUB connector on the rear panel
■ 2 USB interfaces to the connectors on the rear panel
■ 1 USB interface to the LCD module (via the backplane connector)
■ 1 RS-232 interface to the RJ45 connector on the rear panel
■ 1 RS-232 interface to the GbE switches management processor (Marvell 88F5181)
■ I²C bus interface going to the backplane (for fans, PSUs, and InfiniBand switch management). This I²C bus is shared by both the BMC and the management module CPU, and enables the BMC to power individual PSUs up and down

See Appendix 3 for more information.


Figure 7. Management and Switch Module


4.2. Gigabit Ethernet switches block

The Gigabit Ethernet switches block consists of two separate switches:

■ Auxiliary network switch, used for generic node access (SSH, job management, etc.)
■ Management network switch, used for access to the BMCs

Each of the switches contains two Marvell DX270 chips stacked together. Both switches are managed via a Marvell 88F5181 processor running Linux and custom firmware.

The auxiliary network switch has the following connections:

■ 32 GbE links going through the backplane to the compute nodes
■ 1 GbE link going to the network controller of the management CPU (Intel 82576)
■ 1 GbE link going to the Marvell 88F5181
■ 2 10G uplinks going to the XFP connectors on the rear panel of the management module

The management network switch has the following connections:

■ 32 GbE links routed through the backplane to the BMCs of the compute nodes
■ 1 GbE link routed to the network controller of the management CPU (Intel 82576)
■ 1 GbE link routed to the Marvell 88F5181
■ 2 GbE uplinks routed to the RJ45 connectors on the rear panel of the management module

One of the RJ45 connectors on the rear panel can also be used to provide remote access to the BMC of the management module. The Marvell 88F5181 processor provides two RS-232 interfaces to the CPU of the management module.

See Appendix 3 for more information.

4.3. Special networks block (FPGA)

The special networks block is used to implement the global barrier and global interrupt networks. It is based on a Xilinx XC5VLX50T FPGA (XC5VLX50T-3FFG665C), and also contains an FPGA boot flash device.

The following interfaces are provided:

■ PCI-E link from the FPGA to the Intel 3100 MICH chip
■ 32x5 (160 total) single-wire links from the FPGA to the GPIO pins of the ICH10 chips on the compute nodes
■ 2 single-wire links from the FPGA to the IRQ pins of the ICH10 chips on the compute nodes (split into 32x2 links on the backplane)
■ 5 differential links from the FPGA to the RJ45 connectors on the rear panel for the global barrier network
■ 2 differential links from the FPGA to the RJ45 connectors on the rear panel for the global interrupt network
■ 4 differential links from MGTs on the FPGA to the RJ45 connector on the rear panel (reserved for future use)

See Appendix 3 for more information.

4.4. Global clock distribution block

Global clock distribution is used to adjust the frequency of CPUs and memory on all compute nodes. Clock distribution may be provided via a clock source on the management module, or by an external clock signal. For better signal quality, differential signaling should be used for the external clock; the differential clock input is converted to a single-wire signal on the management module. The choice of an external or embedded clock source is made by the clock selector chip, controlled using the switch on the rear panel of the management module. The resulting clock signal goes to the backplane and is then split to provide the clock source for all compute nodes.

See Appendix 3 for more information.



4.5. Management and Switch Module Connectors

The following connectors are present on the management module:

■ The backplane signal connector, which provides the following signals:
  ■ 32 GbE links from the compute nodes to the auxiliary network switch
  ■ 32 GbE links from the BMCs of the compute nodes to the management network switch
  ■ I²C link to the Intel 3100 MICH chip and the management module BMC
  ■ 32x5 GPIO links from the FPGA
  ■ Interrupt links from the FPGA
  ■ Clock signal to the compute nodes
  ■ USB link to the LCD module
■ The backplane power connector, which provides the main -48V power input and 5Vsb power input to the BMC
■ Rear panel connectors:
  ■ 2 XFP 10GbE connectors for auxiliary network switch uplinks (ETH1 & ETH2)
  ■ 2 RJ45 connectors for management network switch uplinks (ETH3 & ETH4); one port is GbE, the other is 100Mb and shared among a few devices
  ■ 1 RJ45 100Mb connector for global barrier network links 1-4 (FPGA block) (SPN1)
  ■ 1 RJ45 100Mb connector for global barrier network link 5, global interrupt network links 1 and 2, and an external clock input (FPGA block) (SPN2)
  ■ 1 RJ45 100Mb connector for 4 differential signals, reserved for future use (FPGA block) (RIO)
  ■ 1 D-SUB connector for video output (VGA)
  ■ 2 USB connectors (USB)
  ■ 1 RJ45 connector for the RS-232 port (serial console) (Serial)

Figure 8. Rear panel connectors of the management and switch module (ETH1, ETH2, ETH3, ETH4, VGA, RIO, SPN2, SPN1, SERIAL, USB).


5. Backplane

The backplane is used to provide power to the compute modules, the InfiniBand switch modules, and the management and switch module (MSM), and to provide connectivity between these system components. It also provides hot-plug functionality for the compute modules. The backplane delivers electrical power rated at up to 13kW together with standard system signaling, including high-frequency InfiniBand QDR links (32 links at 40 Gigatransfers/s).

The backplane has very few active components; those that are included are responsible only for the global clock distribution logic. It is a low-profile board, optimized for efficient airflow, and based on a 24-layer PCB design. The following connectors are on the rear side of the backplane:

■ 16 combined power/signal connectors to the compute nodes
■ 2 combined power/signal connectors to the InfiniBand switch modules
■ 2 combined power/signal connectors to the MSM

The front side of the backplane is used for power input. The following components are placed on the front side of the backplane:

■ I²C connectors to the PSUs
■ I²C connectors to the fan modules
■ USB connector to the LCD panel

The overall backplane geometry is shown in Figure 9.

Figure 9. System backplane, rear view (management module interface, 2 InfiniBand switch interfaces, blade module interfaces).


6. Blade chassis

The T-Blade 2 chassis accommodates the following field-replaceable unit (FRU) components:

■ 16 hot-plug compute modules
■ 2 InfiniBand switch modules
■ 1 management and switch module
■ 10+2 hot-swappable 80mm fan modules
■ 6 hot-swappable power supplies

There is also a button-controlled, 4-line LCD panel connected through the backplane to the USB port on the blade management module. There is no main power switch on the blade enclosure front panel; nodes are powered up remotely or by using microbuttons to power on/off/reset individual blades.

The dimensions of the chassis:

■ Height: 310mm (7U)
■ Width (not including the rails and mounting brackets): 430mm; fits a standard 19" rack
■ Depth: 860mm

The chassis can be mounted into a standard 19" rack using fixed (non-sliding) rails, and there are mounting brackets on the front side of the chassis that can be screwed to the vertical rails in the rack. T-Platforms recommends that customers install an empty chassis into the rack cabinet, and then populate it with the modules. A fully populated T-Blade 2 chassis weighs approximately 153kg, and caution is advised when handling it.

Figure 10. Front section view. Figure 11. Rear section view. (Callouts: 10 hot-swap fans, LCD panel, 2 hot-swap fans, 6 power supplies, AC power, 16 twin blade bays, IB QDR switches, management module.)

Figure 12. Extraction of Compute Module.

Figure 13. Extraction of IB Switch.


6.2. Power subsystem

The T-Blade 2 power subsystem is comprised of six power supplies. They are connected through six PSU interface boards to the T-shaped power distribution board, which connects the hot-pluggable power supplies (and system fans) to the system backplane. The power supplies and fans are the only off-the-shelf components in a T-Blade 2 system; for more information on the T-Blade 2 system assembly, please refer to the document "T-Blade 2 system layout and components."

The 2725W rectifier made by Lineage Power (model CP2725AC54Z) is a compact 1RU design (Figure 14). As the integral power distribution element of T-Blade 2, it features front-to-back airflow, an RS485 interface, redundant I²C links, PFC, and both under/over-voltage and thermal protection.

For more information about the PSU, please follow this link:
http://www.lineagepower.com/BinaryGet.aspx?ID=f75abb6f-abcf-4835-b811-f3b6dabd98b5

6.3. Cooling subsystem

T-Blade 2 uses front-to-back airflow cooling, zoned into upper and lower airflow sections within the blade enclosure. Extensive engineering and design work was invested in the cooling design of T-Blade 2, with both thermal simulations and physical tests conducted to ensure sufficient system cooling at constant peak loads.

Compute module cooling is provided via ten hot-swappable 80mm axial fans (Figure 15). The fans have connectors for DC power, speed monitoring, and PWM-based speed control. Each fan is placed into a separate hot-swappable module, and there are louvers in the chassis to prevent backward airflow in the case of a fan failure or removal. Fan modules come equipped with a small board that holds the fan controller. This controller is connected through the backplane via the I²C interface to the blade management module.

There are two additional 80mm fans below the LCD panel to provide cooling for the InfiniBand switch modules and the blade management module. Each of the two fan clusters is N+1 redundant, and uses identical high-performance 14,000 RPM fans. To increase their operational lifetime they are housed at the front of the enclosure, which lets them operate in a close-to-ambient air temperature range.

Figure 14. System power supplies.

Figure 15. System fans.


7. Other features

7.1. Power-on procedure

After connecting the blade system to the power source, the power supplies enter standby mode, providing only 5Vsb power to the blade management module. The power supplies are then switched on using a remote connection to the BMC, or by pressing the power button. At this point the Management and Switch Module, the InfiniBand switch modules, and the BMCs on the compute modules begin their initialization routines. After initialization is complete, the compute modules are turned on using a remote connection to their BMCs.
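The same sequence can be driven remotely. The following is a minimal sketch only, assuming the BMCs accept standard IPMI chassis-power commands via ipmitool; the addresses, credentials, and settle delay are placeholders rather than values from this guide.

```python
"""Sketch of the power-on order described above, driven over IPMI."""
import subprocess
import time

MSM_BMC = "msm-bmc.example"                                      # management module BMC (placeholder)
NODE_BMCS = [f"node{i:02d}-bmc.example" for i in range(1, 33)]   # 32 compute nodes (placeholder)


def power_on(bmc_host: str) -> None:
    """Issue a standard IPMI chassis power-on command to one BMC."""
    subprocess.run(["ipmitool", "-I", "lanplus", "-H", bmc_host,
                    "-U", "admin", "-P", "secret", "chassis", "power", "on"],
                   check=True)


# 1. Bring the power supplies out of standby through the management module's BMC.
power_on(MSM_BMC)
# 2. Allow the MSM, the InfiniBand switch modules, and the node BMCs to initialize.
time.sleep(120)
# 3. Power on the individual compute nodes through their BMCs.
for bmc in NODE_BMCS:
    power_on(bmc)
```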

7.2. Emergency Shutdown procedure

The T-Blade 2 system has emergency shutdown functionality implemented in the management module (MSM). Depending upon the firmware version, either an individual blade module or the entire enclosure can automatically be powered down in case of a critical management event. Emergency shutdown helps to avoid physical damage to system silicon.

As a reminder, each blade can also be manually switched on, off, and reset using the two microbuttons located on the rear edge of the blade module PCB.

8. Cluster Management and Monitoring

The advanced monitoring and management capabilities of the T-Blade 2 are implemented at a cluster-wide level. The optional Clustrx OS TP Edition and its Clustrx Watch monitoring suite are tightly integrated with the T-Blade system and its management module, along with standalone command-line utilities for advanced scripting. Clustrx is a Petascale-ready high performance computing OS, developed by T-Massive Computing, part of the T-Platforms group. The Clustrx OS management subsystem requires a console system and a single management node, and provides the following functionality:

■ Automated cluster deployment, using one console with a single installation DVD media
■ CLI- and GUI-based access
■ Optimized such that cluster installation and tuning require only run-of-the-mill system administration skills
■ The installation GUI reduces the time of basic installation and tuning of the cluster, from installation to Linpack test readiness, down to 2 hours (for systems with a standard topology, not including storage system installation time)
■ The monitoring system tracks both the compute and infrastructure cluster subsystems (except for Ethernet and InfiniBand switches at the time of writing)
■ The monitoring system can be configured to deliver notifications to system administrators
■ The management subsystem reacts automatically with precise remedies to correct abnormal operation, including automated equipment power-down
■ Monitoring latency is minimized to ensure on-time delivery of corrective measures, with up to 150 events per second gathered from individual compute nodes (300-500 events per second targeted in the near future)
■ When combined with Clustrx CNL (Compute Node Linux), the Clustrx monitoring and management suite is an exaFLOPS-ready operating environment covering L3 clusters with up to 12,000 compute nodes (support for L4 clusters of up to 210,000 compute nodes is planned)
■ Supports a variety of compute node operating systems; currently validated with Clustrx OS (future support for RHEL, SUSE, Windows, and others is planned)
■ A variety of parallel file systems are supported, including Lustre and Panasas

It is also possible to use other popular management packages with T-Blade 2 systems, as they are IPMI-compliant and device connectors can be developed upon request.



9. Basic infrastructure requirements

The T-Blade 2 system requires professional installation to ensure proper system operation and uptime. Installation is typically supplied as part of a complete High Performance Computing system deployed by T-Platforms installation specialists. The design of the compute room and support infrastructure is an integral part of each T-Blade 2-based HPC system, and it is common for T-Platforms to work closely with customers on the design and deployment of the entire HPC system installation in a turn-key fashion.

The following subsections cover the basic prerequisites for T-Blade 2-based HPC cluster deployment.

9.1. Electricity

■ 380-415VAC, 5-wire, 3-phase electrical distribution system
■ Each T-Blade 2 system must be supplied with at least 11kW of uninterruptible power
■ Each T-Blade 2 system must be fed by a dedicated power switch or circuit breaker rated at 32A
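As a rough sizing check (assuming 400V line-to-line and a 0.95 power factor, neither of which is specified in this guide), the per-phase current for one fully loaded enclosure sits well below the 32A breaker rating:

```python
import math

power_w = 11_000        # maximum enclosure consumption quoted above
v_line_to_line = 400.0  # assumed nominal voltage within the 380-415VAC range
power_factor = 0.95     # assumed; the PSUs include PFC (see Section 6.2)

amps_per_phase = power_w / (math.sqrt(3) * v_line_to_line * power_factor)
print(f"{amps_per_phase:.1f} A per phase")  # ~16.7 A, well under a 32 A breaker
```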

9.2. Cooling

■ Each T-Blade 2 enclosure requires 600 cubic feet per minute (CFM) of front-to-back airflow (1,019 cubic meters per hour)
■ The site cooling system must be able to operate at an ambient temperature of up to 55°C, mixing in a sufficient amount of cold air to support the nominal operational mode of the conditioning systems
■ Required cooling performance is 11kW for each T-Blade 2 system
■ The inflow air temperature should be in the range of 10°C to 30°C; an energy-efficient design rule is to keep inflow air at not less than 20°C, with optimal air temperature at 25°C
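The two airflow figures in the first item above are the same requirement expressed in different units; a quick conversion check:

```python
# 1 cubic foot per minute = 0.0283168 m^3/min
cfm = 600
m3_per_hour = cfm * 0.0283168 * 60
print(f"{m3_per_hour:.0f} m^3/h")  # ~1019 m^3/h, matching the figure quoted above
```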

9.3. Cabinet infrastructure

■ EIA 310-D (or later) compliant cabinet
■ Cabinet rack depth of at least 900mm; newer cabinet designs with 1000mm rack depth are recommended
■ Depending upon the interconnect cabling, the total enclosure weight with attached cabling infrastructure can vary from 130kg to 200kg

9.4. Floors and layout

■ Antistatic raised floors rated for the fully loaded rack cabinet weight are recommended
■ Recommended front aisle width is not less than 1.0 meter
■ Recommended back aisle width is not less than 0.9 meter
■ Final aisle width will vary with individual cooling design requirements

10. Operating and file systems compatibility

As a standards-based HPC system, T-Blade 2 generally supports many Linux distributions, including RHEL and SUSE. It is also possible to use Clustrx Watch with the major Linux distributions to enable fine-grained monitoring capabilities. For customers interested in a Windows-based installation, Windows HPC Server 2008 does run on the T-Blade 2 system.

Note: The Clustrx T-Platforms Edition OS is the only distribution that currently supports the global barrier and interrupt network functionality of the T-Blade 2 platform.

Currently supported parallel file systems include, but are not limited to, Lustre and Panasas.



11. System specification

T-Blade 2 Enclosure

Form factor:
■ 16 hot-plug computing modules (32 dual-processor compute nodes) in a 7U enclosure
■ 2 modules with 36-port QDR InfiniBand switches
■ Dedicated management module
Peak performance per enclosure: 4.5 TFlops
Density: 384 six-core processors (2,304 cores) per standard 19" 42U rack

Peak performance per rack: 27 TFlops

RAM:
■ Up to 384GB (1.5GB per core)
■ Up to 768GB (3GB per core)
Power consumption per enclosure (max. configuration): 11kW
Performance/power consumption ratio: 0.4 GFLOPS/W
Cooling design: 12 hot-swap redundant cooling fans in front of the chassis
Operating temperature: 10-30°C
Dimensions (HxWxD): 310 x 430 x 860 mm
System weight (fully configured): 152.6 kg

T-Blade 2 Compute Node

Processor capacity/type: 2 six-core Intel Xeon 5600 series processors, up to 2.93GHz
Chipset: Intel 5520 + ICH10
RAM: up to 24GB of DDR3-1333/1066/800
Internal storage: integrated MicroSD slot
Expansion slots: none
Ethernet interface: 1 GbE port
Interconnect support: integrated QDR InfiniBand
Network interface capacity: 40Gb/s
LEDs: power, system ID
Management: integrated service processor with KVM-over-IP support
Dimensions (HxWxD): 26 x 225 x 612 mm

T-Blade 2 external ports and networks

System network: QDR InfiniBand 40Gb/s, 40 external ports per enclosure
Management (auxiliary) network: 10G Ethernet, 2 external ports per enclosure
Service network: one GbE external port and one 100Mb external port per enclosure
Global barrier network: 1 uplink port to support large system topologies
Global interrupt network: 1 uplink port to support large system topologies
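The headline enclosure and rack figures above can be reproduced from the per-node numbers. A small check, assuming 4 double-precision FLOPs per cycle per core (the usual figure for the SSE units in these Xeon 56xx CPUs, not stated in this guide) and six 7U enclosures per 42U rack:

```python
cpus_per_enclosure = 64
cores_per_cpu = 6
clock_ghz = 2.93
flops_per_cycle = 4          # assumed DP FLOPs/cycle/core for Xeon 56xx (SSE)
enclosures_per_rack = 6      # 42U rack / 7U enclosure

peak_gflops = cpus_per_enclosure * cores_per_cpu * clock_ghz * flops_per_cycle
print(f"{peak_gflops / 1000:.1f} TFLOPS per enclosure")                   # ~4.5 TFLOPS
print(f"{enclosures_per_rack * peak_gflops / 1000:.0f} TFLOPS per rack")  # ~27 TFLOPS
print(f"{peak_gflops / 11_000:.2f} GFLOPS/W at 11 kW per enclosure")      # ~0.41 GFLOPS/W
```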



Appendix 1. Compute Module diagram

The diagram shows the two compute nodes of a module: four Nehalem-EP CPUs (CPU0-CPU3) in QPI-linked pairs at 6.4GT/s, each CPU with three channels (CHA/CHB/CHC) of unbuffered DDR3; per node, a Tylersburg 24D chipset with a PCIe-attached Mellanox ConnectX QDR InfiniBand chip and an ESI link to the ICH10; per node, an Intel 82576 GbE controller, a MicroSD device on USB, and an AST2050 BMC; InfiniBand SerDes links run to the high-speed connector to the middle plane.


Appendix 2. Intra-chassis QDR IB links diagram

The diagram shows the 16 compute modules connected through the backplane signal connectors to the two QDR IB switches (Mellanox InfiniScale IV): 16 QDR IB 4x links from the compute modules to each switch, an I²C link and a -48V power connector to each switch module, and 20 QDR IB 4x links from each switch to its 20 QSFP connectors.


Appendix 3. Management and Switch Module diagram

The diagram shows the management processor block (Socket 479 Yonah-class CPU on a 400MHz FSB, Intel 3100 MICH with DDR2-400 memory, SATA, SIO DME1737, W83793G hardware monitor, firmware hub, and AST2050 BMC), the Intel 82576 dual-port GbE controller, the Marvell 88F5181 switch-management CPU, the four stacked Marvell DX270 switch chips (U15-U18) with their backplane SerDes ports, XFP 10G uplinks, and RJ45 uplinks via 88E1116, 88E1112, 88E2011 and 88E6161 PHYs, and the LX50T FPGA attached over PCI-E with its links to the rear-panel RJ45, D-SUB and serial connectors.




About T-Platforms

Established in 2002, T-Platforms provides comprehensive HPC systems, software and services, with customer installations consistently included on the TOP500 worldwide list of the most powerful supercomputers. Lomonosov, a T-Platforms system installed at Moscow State University, has been widely recognized as the #1 ranked supercomputer in Eastern Europe and the #12 ranked supercomputer worldwide.

T-Platforms is a one-stop source for companies looking for the competitive advantage of HPC technology but lacking the resources necessary to fully adopt and embrace a supercomputing environment. The portfolio of solutions offered by T-Platforms starts with early-stage analysis and documentation of user requirements, and progresses all the way to turnkey supercomputer center design. The company's highly scalable T-Blade family of HPC systems utilizes Clustrx, a robust operating system built specifically for HPC that ensures next-generation scalability and fidelity to support the path from petascale to exascale.

T-Platforms also delivers unique added value with its ability to provide end-to-end modeling, simulation and analysis services, and deep technical talent with particular expertise in areas such as CFD, structural analysis, and other extreme computational disciplines, a level of support not available from most HPC platform suppliers.

T-Platforms is part of the T-Platforms Group, which consists of T-Platforms, T-Services, T-Massive Computing, and T-Design, with locations in Hannover, Moscow, Kiev and Taipei.

For more information, please visit www.t-platforms.com.

The T-Blade 1 computational row at the 60TFlops MSU Chebyshev installation

T-Platforms
Moscow, Russia
Leninsky Prospect 113/1, Suite E-520
Tel.: +7 (495) 956 54 90
Fax: +7 (495) 956 54 15
info@t-platforms.com
http://www.t-platforms.com

T-Platforms GmbH
Woehlerstrasse 42, D-30163
Hannover, Germany
Tel.: +49 (511) 203 885 40
Fax: +49 (511) 203 885 41

© T-Platforms 2010

T-Platforms, the T-Platforms logo, T-Blade, and Clustrx TP Edition are trademarks or registered trademarks of T-Platforms, JSC. Other brand names and trademarks are property of their respective owners.

This document is for informational purposes only. T-Platforms reserves the right to make changes without further notice to any products herein. The content provided is as is and without express or implied warranties of any kind.
