IBM AIX Continuous Availability Features - IBM Redbooks
Note: The default kernel recovery framework setting is disabled. This means that an affirmative action must be taken through SMIT or the raso command to enable recovery. When recovery is not enabled, the behavior is the same as on AIX 5.3.
2.1.13 Automated system hang recovery
Automatic system hang recovery, with its error detection and fix capabilities, is a key feature of AIX automated system management. It can detect the condition in which high-priority processes are monopolizing system resources and preventing normal execution. AIX offers system administrators a variety of customizable solutions to remedy a system hang condition.
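The text above does not spell out how the hang condition is detected. The Python code below is a purely hypothetical sketch of the kind of check it describes. It flags a possible hang when no process at or below a chosen priority has received CPU time within a timeout, which is what you would expect to see if more favored processes were monopolizing the CPU. The `ProcSample` class, the `looks_hung` function, and the `prio_threshold` and `timeout` parameters are illustrative assumptions, not AIX interfaces. The one AIX-specific convention the sketch follows is that a numerically smaller priority value is more favored. The remedy step (the "customizable solutions") is left out.

```python
from dataclasses import dataclass


@dataclass
class ProcSample:
    """Hypothetical snapshot of one process (not an AIX data structure)."""
    pid: int
    priority: int    # AIX convention: a smaller value is a more favored priority
    last_ran: float  # time, in seconds, when the process last received CPU time


def looks_hung(samples, now, prio_threshold, timeout):
    """Return True if no process with priority value >= prio_threshold
    (ordinary or less favored work) has run in the last `timeout` seconds,
    suggesting that more favored processes are monopolizing the CPU."""
    normal_work = [s for s in samples if s.priority >= prio_threshold]
    if not normal_work:
        return False  # nothing that could be starved, so no evidence of a hang
    most_recent = max(s.last_ran for s in normal_work)
    return now - most_recent > timeout


# Usage: process 1 is favored (priority 40); processes 2 and 3 are ordinary work.
samples = [ProcSample(1, 40, 195.0), ProcSample(2, 60, 10.0), ProcSample(3, 70, 5.0)]
print(looks_hung(samples, now=200.0, prio_threshold=60, timeout=120))  # True
```

The sketch deliberately returns False when no low-priority work exists. An idle system with only favored processes offers no evidence of starvation.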
2.1.14 Recovery framework
Beginning with AIX V6.1, the kernel can recover from errors in selected routines, thus avoiding an unplanned system outage. The kernel recovery framework improves system availability by allowing continued system operation after some unexpected kernel errors.
Kernel recovery
Kernel recovery in AIX V6.1 is disabled by default for two reasons: the set of errors that can be recovered is limited in AIX V6.1, and kernel recovery, when enabled, requires an extra 4 KB page of memory per thread. To enable, disable, or show the kernel recovery state, use the SMIT path Problem Determination → Kernel Recovery, or use the smitty krecovery command.
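The memory cost of enabling kernel recovery grows linearly with the number of threads, at one extra 4 KB page each. The Python code below is a small worked calculation of that overhead. The thread count of 5,000 is purely illustrative. The function names are hypothetical helpers, not part of any AIX tool.

```python
PAGE_SIZE_BYTES = 4 * 1024  # one extra 4 KB page per thread when recovery is enabled


def recovery_overhead_bytes(thread_count: int) -> int:
    """Extra memory consumed by kernel recovery for a given number of threads."""
    if thread_count < 0:
        raise ValueError("thread_count must be non-negative")
    return thread_count * PAGE_SIZE_BYTES


def bytes_to_mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# Illustrative thread count only; substitute the count observed on your system.
overhead = recovery_overhead_bytes(5000)
print(f"{overhead} bytes = {bytes_to_mib(overhead):.1f} MiB")  # 20480000 bytes = 19.5 MiB
```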
You can show the current and next boot states, and you can enable or disable the kernel recovery framework for the next boot. For the change to become fully active, you must run the /usr/sbin/bosboot command after changing the kernel recovery state and then reboot the operating system.
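The relationship between the current state, the next boot state, bosboot, and the reboot can be summarized as a small state model. The Python code below is a hypothetical sketch of that procedure, not an interface to SMIT or to the raso, smitty, or bosboot commands. The `KernelRecoveryState` class and its method names are invented for illustration. For simplicity, the model treats a change as inactive until both bosboot and a reboot have happened.

```python
from dataclasses import dataclass


@dataclass
class KernelRecoveryState:
    """Hypothetical model of the kernel recovery settings (not an AIX API)."""
    current: bool = False            # state of the running kernel; disabled by default in AIX V6.1
    next_boot: bool = False          # state that will apply at the next boot
    boot_image_updated: bool = True  # whether bosboot has run since next_boot last changed

    def set_next_boot(self, enabled: bool) -> None:
        # Enabling or disabling (for example through SMIT) only affects the next boot.
        if enabled != self.next_boot:
            self.next_boot = enabled
            self.boot_image_updated = False

    def run_bosboot(self) -> None:
        # Stands in for running /usr/sbin/bosboot after the change.
        self.boot_image_updated = True

    def reboot(self) -> None:
        # In this simplified model, the new setting takes effect only if bosboot ran first.
        if self.boot_image_updated:
            self.current = self.next_boot


state = KernelRecoveryState()
state.set_next_boot(True)  # the running kernel is unaffected
state.run_bosboot()
state.reboot()
print(state.current)  # True
```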
During a kernel recovery action, the system might pause for a short time, generally less than two seconds. The following actions occur immediately after a kernel recovery action:
1. The system console displays a message stating that a kernel error recovery action has occurred.
2. AIX adds an entry to the error log.
3. AIX might generate a live dump.
4. You can send the error log data and live dump data to IBM for service (similar to sending data from a full system termination).
Note: Some functions might be lost after a kernel recovery, but the operating system remains in a stable state. If necessary, shut down and restart your system to restore the lost functions.
2.2 System reliability
Over the years, the AIX operating system has included many reliability features inspired by IBM technology, and it now includes even more groundbreaking technologies that add to AIX reliability. A few of the new features are kernel support for POWER6 storage keys, Concurrent AIX Update, dynamic tracing, and enhanced software first failure data capture.
Chapter 2. AIX continuous availability features