IBM AIX Continuous Availability Features - IBM Redbooks

Note: The kernel recovery framework is disabled by default, so recovery must be explicitly enabled through SMIT or the raso command. When recovery is not enabled, the behavior is the same as on AIX 5.3.

2.1.13 Automated system hang recovery

Automatic system hang recovery, with its error detection and fix capabilities, is a key feature of AIX automated system management. It can detect the condition in which high-priority processes monopolize system resources and prevent normal execution. AIX offers system administrators a variety of customizable solutions to remedy the system hang condition.
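The text does not describe how the hang condition is detected. The following Python sketch assumes one plausible approach: record when lower-priority work last received CPU time, flag a hang if that gap exceeds a timeout, and then call an administrator-chosen remedy. `HangDetector`, `timeout_s`, and `remedy` are hypothetical names used only for illustration; they are not AIX interfaces or tunables.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HangDetector:
    """Hypothetical hang detector; not an AIX interface."""

    timeout_s: float  # how long lower-priority work may go without CPU time
    last_low_priority_run: float = field(default_factory=time.monotonic)

    def record_low_priority_run(self, now: float) -> None:
        """Note that a lower-priority process was just scheduled."""
        self.last_low_priority_run = now

    def is_hung(self, now: float) -> bool:
        """Assume a hang when lower-priority work has been starved past the timeout."""
        return now - self.last_low_priority_run > self.timeout_s

    def check(self, now: float, remedy: Callable[[], None]) -> bool:
        """Run the administrator-chosen remedy if a hang is detected."""
        if self.is_hung(now):
            remedy()
            return True
        return False


if __name__ == "__main__":
    detector = HangDetector(timeout_s=120.0, last_low_priority_run=0.0)
    detector.record_low_priority_run(now=30.0)
    print(detector.check(now=100.0, remedy=lambda: print("hang remedy")))  # False
    print(detector.check(now=200.0, remedy=lambda: print("hang remedy")))  # "hang remedy", then True
```

Passing explicit timestamps keeps the sketch deterministic; the pluggable `remedy` callback stands in for the "variety of customizable solutions" the text mentions.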

2.1.14 Recovery framework

Beginning with AIX V6.1, the kernel can recover from errors in selected routines, thus avoiding an unplanned system outage. The kernel recovery framework improves system availability by allowing continued system operation after some unexpected kernel errors.

Kernel recovery

Kernel recovery in AIX V6.1 is disabled by default, because the set of errors that can be recovered is limited in AIX V6.1 and because kernel recovery, when enabled, requires an extra 4 KB page of memory per thread. To enable, disable, or show the kernel recovery state, use the SMIT path Problem Determination → Kernel Recovery, or use the smitty krecovery command.
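The memory cost grows linearly with the number of threads. As a worked example, assuming "4 K" means 4,096 bytes and taking a hypothetical system with 10,000 threads, enabling kernel recovery would need 10,000 × 4,096 = 40,960,000 bytes, or about 39 MiB. The helper below is an illustrative calculation only; the function names are hypothetical.

```python
PAGE_SIZE_BYTES = 4 * 1024  # assumed size of the extra "4 K" page per thread


def recovery_overhead_bytes(thread_count: int) -> int:
    """Extra memory needed with kernel recovery enabled: one page per thread."""
    if thread_count < 0:
        raise ValueError("thread_count must be non-negative")
    return thread_count * PAGE_SIZE_BYTES


def recovery_overhead_mib(thread_count: int) -> float:
    """Same overhead expressed in MiB (1 MiB = 1,048,576 bytes)."""
    return recovery_overhead_bytes(thread_count) / (1024 * 1024)


if __name__ == "__main__":
    print(recovery_overhead_bytes(10_000))  # 40960000
    print(recovery_overhead_mib(10_000))    # 39.0625
```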

You can show the current and next boot states, and you can enable or disable the kernel recovery framework for the next boot. For the change to become fully active, you must run the /usr/sbin/bosboot command after changing the kernel recovery state and then reboot the operating system.
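The distinction between the current state and the next boot state can be modeled as in the following Python sketch. It is hypothetical and is not an AIX API. The source says only that the change is fully active after running bosboot and then rebooting; the sketch additionally assumes, as a conservative simplification the source does not state, that a reboot without a preceding bosboot leaves the current state unchanged.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KernelRecoveryState:
    """Hypothetical model of the documented enable/disable procedure."""

    current: bool = False            # recovery is disabled by default
    next_boot: bool = False
    boot_image_rebuilt: bool = True  # False while a change still awaits bosboot

    def set_next_boot(self, enabled: bool) -> None:
        """Like changing the state in SMIT: only the next boot state changes."""
        self.next_boot = enabled
        self.boot_image_rebuilt = False

    def run_bosboot(self) -> None:
        """Stands in for running /usr/sbin/bosboot after the change."""
        self.boot_image_rebuilt = True

    def reboot(self) -> None:
        """Assumption: the new state takes effect only if bosboot ran first."""
        if self.boot_image_rebuilt:
            self.current = self.next_boot

    def show(self) -> Tuple[bool, bool]:
        """Return (current state, next boot state)."""
        return self.current, self.next_boot


if __name__ == "__main__":
    state = KernelRecoveryState()
    state.set_next_boot(True)
    print(state.show())  # (False, True)
    state.run_bosboot()
    state.reboot()
    print(state.show())  # (True, True)
```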

During a kernel recovery action, the system might pause for a short time, generally less than two seconds. The following actions occur immediately after a kernel recovery action:

1. The system console displays a message saying that a kernel error recovery action has occurred.
2. AIX adds an entry to the error log.
3. AIX might generate a live dump.
4. You can send the error log data and live dump data to IBM for service (similar to sending data from a full system termination).
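The recovery flow described above can be sketched in Python as an analogy, not kernel code. `RecoveryFramework`, `SystemTermination`, and the routine names are hypothetical. An error raised in a routine registered as recoverable, while recovery is enabled, produces the documented console message, error log entry, and optional live dump; any other error escalates to a simulated system termination. The short pause and the manual step of sending data to IBM are not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Set


class SystemTermination(Exception):
    """Stands in for a full system crash when an error cannot be recovered."""


@dataclass
class RecoveryFramework:
    """Hypothetical analogy of the kernel recovery framework."""

    enabled: bool = False                               # disabled by default
    recoverable: Set[str] = field(default_factory=set)  # the "selected routines"
    console: List[str] = field(default_factory=list)
    error_log: List[str] = field(default_factory=list)
    live_dumps: List[str] = field(default_factory=list)

    def call(self, name: str, routine: Callable[[], Any],
             live_dump: bool = True) -> Optional[Any]:
        """Run a routine, recovering from its errors only if it is selected."""
        try:
            return routine()
        except Exception as exc:
            if not (self.enabled and name in self.recoverable):
                raise SystemTermination(f"unrecoverable error in {name}") from exc
            # 1. Console message announcing the recovery action.
            self.console.append(f"kernel error recovery action occurred in {name}")
            # 2. Error log entry describing the failure.
            self.error_log.append(f"{name}: {exc!r}")
            # 3. Optional live dump (the text says AIX "may" generate one).
            if live_dump:
                self.live_dumps.append(f"live dump for {name}")
            return None  # execution continues, possibly with some function lost


if __name__ == "__main__":
    def failing() -> None:
        raise RuntimeError("simulated kernel error")

    fw = RecoveryFramework(enabled=True, recoverable={"selected_routine"})
    fw.call("selected_routine", failing)
    print(fw.console)
    try:
        fw.call("other_routine", failing)
    except SystemTermination as err:
        print(err)
```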

Note: Some functions might be lost after a kernel recovery, but the operating system remains in a stable state. If necessary, shut down and restart your system to restore the lost functions.

2.2 System reliability

Over the years, the AIX operating system has included many reliability features inspired by IBM technology, and it now includes even more groundbreaking technologies that add to AIX reliability. New features include kernel support for POWER6 storage keys, Concurrent AIX Update, dynamic tracing, and enhanced software first failure data capture, to mention just a few.

Chapter 2. AIX continuous availability features 19