IBM AIX Continuous Availability Features - IBM Redbooks
IBM has worked to enhance the first failure data capture (FFDC) features so that, in most cases, a failure in AIX does not result in a recreate request from AIX support to the customer in order to solve the problem. Such a request is also known as Second Failure Data Capture (SFDC). In AIX, this service functionality focuses on gathering enough information at the time of a failure to allow complete diagnosis without requiring the failure to be reproduced. For example, Lightweight Memory Trace (LMT) support, introduced with AIX V5.3 ML3, represents a significant advance in AIX first failure data capture capabilities and gives service personnel a powerful and valuable tool for diagnosing problems.
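The following minimal Python sketch makes the FFDC idea concrete. It shows the general technique of an always-on, fixed-size in-memory trace. When a failure occurs, the recent history is snapshotted, so the data needed for diagnosis already exists and no recreate is needed. The names `FlightRecorder`, `TraceEntry`, and `handle_failure` are hypothetical. The sketch is not the LMT implementation, interface, or trace format.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceEntry:
    seq: int    # global sequence number of the event
    event: str  # short event description


class FlightRecorder:
    """Always-on, fixed-size in-memory trace (hypothetical, not the LMT format)."""

    def __init__(self, slots: int = 8) -> None:
        # A deque with maxlen discards its oldest entry once full, so memory
        # use stays bounded however long the system runs.
        self._buf: deque[TraceEntry] = deque(maxlen=slots)
        self._seq = 0

    def record(self, event: str) -> None:
        """Append one event; cheap enough to leave enabled all the time."""
        self._buf.append(TraceEntry(self._seq, event))
        self._seq += 1

    def capture(self) -> list[TraceEntry]:
        """Snapshot the retained history, oldest first, at the moment of failure."""
        return list(self._buf)


def handle_failure(recorder: FlightRecorder, error: Exception) -> dict:
    """Bundle the error with the trace captured at first failure (hypothetical)."""
    return {"error": repr(error), "trace": recorder.capture()}


if __name__ == "__main__":
    rec = FlightRecorder(slots=4)
    for i in range(10):
        rec.record(f"io request {i}")
    report = handle_failure(rec, RuntimeError("adapter timeout"))
    print(report["error"])
    for entry in report["trace"]:
        print(entry.seq, entry.event)  # only the four most recent events remain
```

Because the buffer size is fixed, the cost of leaving tracing enabled is bounded. The trade-off is that only the most recent events survive, and those are usually what a first-failure diagnosis needs.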
The Run-Time Error Checking (RTEC) facility gives service personnel a way to manipulate debug capabilities that are already built into product binaries. RTEC provides powerful first failure data capture and second failure data capture (SFDC) error detection features. This SFDC service functionality focuses on tools that enhance serviceability data gathering after an initial failure. The basic RTEC framework was introduced in AIX V5.3 TL3 and has been extended with additional features in subsequent AIX releases.
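The following minimal Python sketch illustrates the general idea behind RTEC as described above: checks ship in the code but are gated by a level that can be raised at run time. At the minimal level, a cheap check detects that a reference count went negative. After the level is raised, extra bookkeeping identifies the offending caller when the failure recurs. The names `CheckLevel`, `RefCounted`, and `run_workload`, and the level values, are hypothetical. They do not represent the RTEC interface or its actual check levels.

```python
from __future__ import annotations

from enum import IntEnum


class CheckLevel(IntEnum):
    """Hypothetical check levels; each higher level enables costlier checks."""
    OFF = 0      # all checks disabled
    MINIMAL = 1  # cheap sanity checks, suitable for always-on production use
    DETAIL = 2   # extra bookkeeping to pinpoint the culprit


class RefCounted:
    """Object whose debug checks ship in the code but are gated at run time."""

    def __init__(self, level: CheckLevel = CheckLevel.MINIMAL) -> None:
        self.level = level            # may be raised later without a rebuild
        self.refs = 0
        self.holders: list[str] = []  # maintained only at DETAIL level
        self.errors: list[str] = []

    def hold(self, who: str) -> None:
        self.refs += 1
        if self.level >= CheckLevel.DETAIL:
            self.holders.append(who)

    def release(self, who: str) -> None:
        if self.level >= CheckLevel.DETAIL:
            # Costlier check: verify that the caller actually holds a reference.
            if who in self.holders:
                self.holders.remove(who)
            else:
                self.errors.append(f"{who} released without holding")
        self.refs -= 1
        # Cheap check: detects the symptom, but not who caused it.
        if self.level >= CheckLevel.MINIMAL and self.refs < 0:
            self.errors.append("reference count went negative")


def run_workload(obj: RefCounted) -> None:
    """Hypothetical workload containing an unbalanced release."""
    obj.hold("driver_a")
    obj.release("driver_a")
    obj.release("driver_b")  # bug: release without a matching hold


if __name__ == "__main__":
    first = RefCounted(CheckLevel.MINIMAL)
    run_workload(first)
    print(first.errors)   # first failure: symptom detected, culprit unknown

    second = RefCounted()
    second.level = CheckLevel.DETAIL  # raised at run time, no rebuild
    run_workload(second)
    print(second.errors)  # second failure: offending caller identified
```

Gating the costly bookkeeping behind a level keeps the production path cheap while the capability stays in the shipped code. This matches the SFDC goal of gathering more data after an initial failure without installing a special debug build.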
1.8 IBM AIX continuous availability strategies
The market has many requirements for continuous availability that address typical customer pain points, including:
► Too many scheduled outages<br />
► Service depends on problem recreation and intrusive problem determination<br />
► System unavailability disrupts customer business<br />
► Need for reliable protection of customer data<br />
IBM has made AIX robust in its continuous availability characteristics, and IBM regards this robustness as making IBM UNIX servers the best in the market. IBM's AIX continuous availability strategy has the following characteristics:
► Reduce the frequency and severity of AIX system outages, both planned and unplanned
► Improve serviceability by enhancing AIX failure data capture tools
► Provide enhancements to debug and problem analysis tools
► Ensure that all information needed to correct the problem after an unplanned outage is provided, with minimal customer effort
► Bring mainframe hardware features for operating system continuous availability to System p hardware
  – Provide key error detection capabilities through hardware assist
  – Exploit other System p hardware aspects to continue the transition to “stay-up” designs
► Use “stay-up” designs for continuous availability
  – Maintain operating system availability in the face of errors while minimizing application impact
  – Use sophisticated and granular operating system error detection and recovery capabilities
► Maintain a strong tie between serviceability and availability
  – Provide problem diagnosis from data captured at first failure without the need for further disruption
  – Provide service aids that are non-disruptive to the customer environment