11.01.2013

IBM AIX Continuous Availability Features - IBM Redbooks


For all operating system or application errors, recovery must be attempted. When an error occurs, it is not valid to simply give up and terminate processing. Instead, the operating system or application must at least try to keep the component affected by the error up and running. If that is not possible, the operating system or application should make every effort to capture the error data and automate system restart as quickly as possible.

The amount of effort put into the recovery should, of course, be proportional to the impact of a failure and the reasonableness of “trying again”. If actual recovery is not feasible, then the impact of the error should be reduced to the minimum appropriate level.

Today, many customers require that recovery processing be subject to a time limit and have concluded that rapid termination with quick restart or takeover by another application or system is preferable to delayed success. However, takeover strategies rely on redundancy that becomes increasingly expensive as systems get larger, and in most cases the main reason for quick termination is to begin a lengthy takeover process as soon as possible. Thus, the focus is now shifting back towards core reliability, and that means quality and recovery features.

1.6.2 Availability

Today’s systems have hot plug capabilities for many subcomponents, from processors to input/output cards to memory. Also, clustering techniques, reconfigurable input/output data paths, mirrored disks, and hot swappable hardware should help to achieve a significant level of system availability.

From a software perspective, availability is the capability of a program to perform its function whenever it is needed. Availability is a basic customer requirement. Customers require a stable degree of certainty, and also require that schedules and user needs are met.

Availability gauges the percentage of time a system or program can be used productively by the customer. Availability is determined by the number of interruptions and the duration of the interruptions, and depends on characteristics and capabilities that include:

► The ability to change program or operating system parameters without rebuilding the kernel and restarting the system
► The ability to configure new devices without restarting the system
► The ability to install new software or update existing software without restarting the system
► The ability to monitor system resources and programs and clean up or recover resources when failures occur
► The ability to maintain data integrity in spite of errors

The AIX operating system includes many availability characteristics and capabilities from which your overall environment will benefit.

1.6.3 Serviceability

Focus on serviceability is shifting from providing customer support remotely through conventional methods, such as phone and e-mail, to automated system problem reporting and correction, without user (or system administrator) intervention.

Hot swapping capabilities of some hardware components enhance the serviceability aspect. A service processor with advanced diagnostic and administrative tools further enhances system serviceability. A System p server's service processor can call home with a service report, providing detailed information for IBM service to act upon. This automation not only
