11.01.2013 Views

IBM AIX Continuous Availability Features - IBM Redbooks


IBM has worked to enhance the first failure data capture (FFDC) features so that, in most cases, a failure in AIX does not lead AIX support to ask the customer to recreate the problem, a request also known as Second Failure Data Capture (SFDC). In AIX, this service functionality focuses on gathering sufficient information at the time of a failure to allow complete diagnosis without requiring failure reproduction. For example, Lightweight Memory Trace (LMT) support, introduced with AIX V5.3 ML3, represents a significant advance in AIX first failure data capture capabilities and gives service personnel a powerful tool for diagnosing problems.

The Run-Time Error Checking (RTEC) facility gives service personnel a method to manipulate debug capabilities that are already built into product binaries. RTEC provides powerful first failure data capture and second failure data capture (SFDC) error detection features. The SFDC service functionality focuses on tools that enhance serviceability data gathering after an initial failure. The basic RTEC framework was introduced in AIX V5.3 TL3 and has been extended with additional features in subsequent AIX releases.

1.8 IBM AIX continuous availability strategies

Market requirements for continuous availability stem from typical customer pain points, including:

► Too many scheduled outages<br />

► Service depends on problem recreation and intrusive problem determination<br />

► System unavailability disrupts customer business<br />

► Need for reliable protection of customer data<br />

IBM has made AIX robust with respect to continuous availability characteristics, and this robustness makes IBM UNIX servers the best in the market. IBM's AIX continuous availability strategy has the following characteristics:

► Reduce the frequency and severity of AIX system outages, planned and unplanned

► Improve serviceability by enhancing AIX failure data capture tools

► Provide enhancements to debug and problem analysis tools

► Ensure that all necessary information involving unplanned outages is provided, to correct the problem with minimal customer effort

► Use of mainframe hardware features for operating system continuous availability brought to System p hardware

► Provide key error detection capabilities through hardware-assist

► Exploit other System p hardware aspects to continue transition to “stay-up” designs

► Use of “stay-up” designs for continuous availability

► Maintain operating system availability in the face of errors while minimizing application impacts

► Use of sophisticated and granular operating system error detection and recovery capabilities

► Maintain a strong tie between serviceability and availability

► Provide problem diagnosis from data captured at first failure without the need for further disruption

► Provide service aids that are non-disruptive to the customer environment
