11.01.2013 Views

IBM AIX Continuous Availability Features - IBM Redbooks


2.2.3 Paging space verification
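As a rough illustration of the mechanism this section describes (a checksum saved at page-out in an array associated with the paging device, then recomputed and compared at page-in), the following Python sketch models a single paging space. All class and method names are hypothetical. CRC-32 (`zlib.crc32`) is an arbitrary stand-in, because the text does not say which checksum AIX uses. The two exception classes only model the "log an error and halt" and "send an exception to the application" outcomes. The `verify` flag loosely mirrors the per-paging-space enable/disable switch and is not the mkps/chps interface.

```python
import zlib
from typing import Dict, List, Optional


class PagingChecksumError(Exception):
    """Data read back from paging space does not match what was written."""


class SystemHalt(PagingChecksumError):
    """Models 'log an error and halt' (corruption in system memory)."""


class UserMemoryException(PagingChecksumError):
    """Models the exception sent to the application (user memory)."""


class PagingSpace:
    """Toy model of one paging space with checksum verification.

    Hypothetical names; CRC-32 is an arbitrary stand-in checksum.
    """

    def __init__(self, nslots: int, verify: bool = True) -> None:
        self.verify = verify  # per-paging-space on/off switch (illustrative)
        self.disk: Dict[int, bytes] = {}  # simulated paging-space slots
        # One checksum per slot; stands in for the pinned array.
        self.checksums: List[Optional[int]] = [None] * nslots
        self.error_log: List[str] = []

    def page_out(self, slot: int, data: bytes) -> None:
        """Write a page out, remembering its checksum if verification is on."""
        self.checksums[slot] = zlib.crc32(data) if self.verify else None
        self.disk[slot] = data

    def page_in(self, slot: int, system_memory: bool) -> bytes:
        """Read a page back and verify it against the saved checksum."""
        data = self.disk[slot]
        expected = self.checksums[slot]
        if expected is not None and zlib.crc32(data) != expected:
            self.error_log.append(f"paging space checksum mismatch, slot {slot}")
            if system_memory:
                raise SystemHalt(slot)
            raise UserMemoryException(slot)
        return data
```

Overwriting an entry in `disk` between `page_out` and `page_in` simulates corruption on the paging device. In this model the mismatch surfaces on the first read back rather than far downstream, which is the First Failure Data Capture benefit the design aims for.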

Finding the root cause of system crashes, hangs, or other symptoms can be difficult when that root cause is data corruption, because the symptoms can appear far downstream from where the corruption occurred. The paging space verification design is intended to improve First Failure Data Capture (FFDC) of problems caused by paging space data corruption, by checking that the data read in from paging space matches the data that was written out.

When a page is paged out, a checksum is computed on the data in the page and saved in a pinned array associated with the paging device. If and when the page is paged back in, a new checksum is computed on the data read in from paging space and compared to the value in the array. If the values do not match, the kernel logs an error and halts (if the error occurred in system memory), or sends an exception to the application (if it occurred in user memory).

Paging space verification can be enabled or disabled on a per-paging-space basis by using the mkps and chps commands. The details of these commands can be found in their corresponding AIX man pages.

2.2.4 Storage keys

Most application programmers have experienced the inadvertent memory overlay problem, where a piece of code accidentally writes to a memory location that is not part of the component’s memory domain. A new hardware feature, called storage protection keys (referred to as storage keys in this paper), assists application programmers in locating these inadvertent memory overlays.

Memory overlays and addressing errors are among the most difficult problems to diagnose and service. The problem is compounded by growing software size and increased complexity. Under AIX, a large global address space is shared among a variety of software components. This creates a serviceability issue for both applications and the AIX kernel.

The AIX 64-bit kernel makes extensive use of a large flat address space by design. This is important in order to avoid costly MMU operations on POWER processors. Although this design produces a significant performance advantage, it also adds reliability, availability, and serviceability (RAS) costs. Large 64-bit applications, such as DB2®, use a global address space for similar reasons and also face issues with memory overlays.

Storage keys were introduced in the PowerPC® architecture to provide memory isolation while still permitting software to maintain a flat address space. The concept was adopted from the System z and IBM 390 systems. Storage keys allow an address space to be assigned context-specific protection. Access to the memory regions can be limited to prevent, and catch, illegal storage references.

A new CPU facility, the Authority Mask Register (AMR), has been added to define the set of keys the CPU has access to. The AMR is implemented as a vector of bit pairs indexed by key number, with distinct bits to control read and write access for each key. Key protection is in addition to the existing page protection bits. For any load or store operation, the CPU retrieves the memory key assigned to the targeted page during address translation. The key number is used to select the bit pair in the AMR that defines whether the access is permitted. A data storage interrupt occurs when this check fails. The AMR is a per-context register that can be updated efficiently. Because the TLB/ERAT holds the storage key value for each virtual page, AMR updates are efficient: they do not require TLB/ERAT invalidation.
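The access check can be modeled in a few lines of Python. This sketch assumes a 64-bit AMR holding 32 two-bit entries with key 0 in the most significant pair, where the higher bit of each pair denies stores and the lower bit denies loads (1 means denied). These layout details are assumptions for illustration rather than something stated in the text, and all names are hypothetical. The `Page` record stands in for a translation entry that carries the page's storage key alongside its ordinary protection bit.

```python
from dataclasses import dataclass

# Assumed layout (illustrative): 64-bit AMR, one 2-bit entry per key,
# key 0 in the most significant pair. Within a pair, a set bit denies access.
NKEYS = 32
WRITE_BIT = 0b10  # assumed: higher bit of the pair disables stores
READ_BIT = 0b01   # assumed: lower bit of the pair disables loads
AMR_DENY_ALL = (1 << (2 * NKEYS)) - 1


class DataStorageInterrupt(Exception):
    """Models the interrupt raised when an access check fails."""


@dataclass(frozen=True)
class Page:
    key: int        # storage key, carried in the TLB/ERAT entry for the page
    writable: bool  # the existing page-protection bit


def amr_shift(key: int) -> int:
    """Bit offset of the key's pair (key 0 -> 62, key 31 -> 0)."""
    if not 0 <= key < NKEYS:
        raise ValueError(f"key out of range: {key}")
    return 2 * (NKEYS - 1 - key)


def set_key_access(amr: int, key: int, read: bool, write: bool) -> int:
    """Return a copy of `amr` with the pair for `key` set to allow or deny."""
    pair = (0 if read else READ_BIT) | (0 if write else WRITE_BIT)
    shift = amr_shift(key)
    return (amr & ~(0b11 << shift)) | (pair << shift)


def check_access(amr: int, page: Page, is_store: bool) -> None:
    """Raise DataStorageInterrupt unless both protection checks pass."""
    # The key check is in addition to, not instead of, page protection.
    if is_store and not page.writable:
        raise DataStorageInterrupt("page protection violation")
    # The page's key selects the AMR bit pair that governs this access.
    pair = (amr >> amr_shift(page.key)) & 0b11
    if pair & (WRITE_BIT if is_store else READ_BIT):
        kind = "store" if is_store else "load"
        raise DataStorageInterrupt(f"{kind} denied by storage key {page.key}")
```

Because the key lives in the translation entry and the permissions live in the per-context AMR value, switching contexts in this model only means passing a different integer to `check_access`; the `Page` objects are never modified. That separation mirrors why an AMR update needs no TLB/ERAT invalidation.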

Chapter 2. AIX continuous availability features 21
