Download PDF - IBM Redbooks


Chapter 4. Continuous availability and manageability

This chapter provides information about IBM reliability, availability, and serviceability (RAS) design and features. This set of technologies, implemented on IBM Power Systems servers, can improve your architecture’s total cost of ownership (TCO) by reducing unplanned downtime.

The elements of RAS can be described as follows:

► Reliability: Indicates how infrequently a defect or fault in a server manifests itself.

► Availability: Indicates how infrequently the functionality of a system or application is impacted by a fault or defect.

► Serviceability: Indicates how well faults and their effects are communicated to users and services and how efficiently and nondisruptively the faults are repaired.

Each successive generation of IBM servers is designed to be more reliable than the previous server family. POWER7 processor-based servers have new features to support new levels of virtualization, help ease administrative burden, and increase system utilization.

Reliability starts with components, devices, and subsystems designed to be fault-tolerant. POWER7 uses lower voltage technology, improving reliability with stacked latches to reduce soft error susceptibility. During design and development, subsystems go through rigorous verification and integration testing. During manufacturing, systems go through a thorough testing process to help ensure high product quality levels.

The processor and memory subsystem contain a number of features designed to avoid or correct environmentally induced, single-bit, intermittent failures and to handle solid faults in components. These features include selective redundancy to tolerate certain faults without requiring an outage or parts replacement.

© Copyright IBM Corp. 2011. All rights reserved.
