
Creating a Disk Heartbeat device in HACMP v5.x

by Shawn Bodily, send feedback to sbodily@us.ibm.com

Introduction

This document is intended to supplement existing documentation on how to configure, test, and monitor a disk heartbeat device and network in HACMP/ES V5.x. This feature is new in V5.1, and it provides another alternative for non-IP based heartbeats. The intent of this document is to provide step-by-step directions, as they are currently sketchy in the HACMP v5.1 pubs. This will hopefully clarify several misconceptions that have been brought to my attention.

This example consists of a two-node cluster (nodes GT40 & SL55) with shared ESS vpath devices. If more than two nodes exist in your cluster, you will need N non-IP heartbeat networks, where N is the number of nodes in the cluster (i.e. a three-node cluster requires 3 non-IP heartbeat networks). This creates a heartbeat ring, as illustrated below.
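For illustration, a three-node cluster with hypothetical nodes node1, node2, and node3 would close the ring with three point-to-point diskhb networks (the network names here are placeholders):

net_diskhb_01: node1 <-> node2
net_diskhb_02: node2 <-> node3
net_diskhb_03: node3 <-> node1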

It’s worth noting that one should not confuse concurrent volume groups with concurrent resource groups. And note, there is a difference between concurrent volume groups and enhanced concurrent volume groups. A concurrent resource group is one which may be active on more than one node at a time. A concurrent volume group also shares the characteristic that it may be active on more than one node at a time. This is also true for an enhanced concurrent VG; however, in a non-concurrent resource group, the enhanced concurrent VG may be active without a SCSI reserve residing on the disk, but its data is normally accessed by only one system at a time.

Pre-Reqs

In this document, it is assumed that the shared storage devices are already made available and configured to AIX, and that the proper levels of RSCT and HACMP are already installed. Since disk heartbeat utilizes enhanced-concurrent volume groups, it is also necessary to make sure that bos.clvm.enh is installed. This fileset is not normally installed as part of an HACMP installation via the installp command.
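As a quick sketch, you can confirm that the fileset is present with lslpp and, if it is missing, install it from the AIX installation media (assuming the media is in /dev/cd0):

# lslpp -l bos.clvm.enh
# installp -agXd /dev/cd0 bos.clvm.enh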

Disk Heartbeat Details

Disk heartbeat provides the ability to use existing shared disks, regardless of disk type, to provide a serial-network-like heartbeat path. A benefit of this is that one need not dedicate the integrated serial ports for HACMP heartbeats (if supported on the subject systems) or purchase an 8-port asynchronous adapter.

This feature utilizes a special area on the disk previously reserved for “Concurrent Capable” volume groups (traditionally only for SSA disks). Since AIX 5.2 dropped support for SSA concurrent volume groups, this area is now available for this use. This also means that the disk chosen for heartbeat can be part of a data volume group. (Note Performance Concerns below.)

The disk heart beating code went into the 2.2.1.30 version of RSCT. Some recommended APARs bring that to 2.2.1.31. If you've got that level installed, and HACMP 5.1, you can use disk heart beating. The relevant file to look for is /usr/sbin/rsct/bin/hats_diskhb_nim. Though it is supported mainly through RSCT, we recommend AIX 5.2 when utilizing disk heartbeat.
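A minimal way to check the installed RSCT level and confirm that the disk heartbeat NIM binary is present (output will vary by system) is:

# lslpp -l rsct.basic.rte
# ls -l /usr/sbin/rsct/bin/hats_diskhb_nim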

To use disk heartbeats, no node can issue a SCSI reserve for the disk. This is because both nodes using it for heart beating must be able to read and write to that disk. It is sufficient that the disk be in an enhanced concurrent volume group to meet this requirement. (It should also be possible to use a disk that is in no volume group for disk heart beating. RSCT certainly won't care, but the HACMP SMIT panels may not be particularly helpful in setting this up.)


Now, in HACMP 5.1 with AIX 5.1, enhanced concurrent mode volume groups can be used only in concurrent (or "online on all available nodes") resource groups. This means that disk heart beating is useful only to people running concurrent configurations, or who can allocate such a volume group/disk (which is certainly possible, though perhaps an expensive approach). In other words, at HACMP 5.1 and AIX 5.1, typical HACMP clusters (with a server and idle standby) will require an additional concurrent resource group with a disk in an enhanced concurrent VG dedicated for heartbeat use.

At AIX 5.2, disk heartbeats can exist on an enhanced concurrent VG that resides in a non-concurrent resource group. At AIX 5.2, one may also use the fast disk takeover feature in non-concurrent resource groups with enhanced concurrent volume groups. With HACMP 5.1 and AIX 5.2, enhanced concurrent mode volume groups can be used in serial access configurations for fast disk takeover, along with disk heart beating. (AIX 5.2 requires RSCT 2.3.1.0 or later.) That is, the facility becomes usable to the average customer, without commitment of additional resources, since disk heart beating can occur on a volume group used for ordinary filesystem and logical volume activity.

Performance Concerns with Disk Heart Beating

Most modern disks take somewhere around 15 milliseconds to service an I/O request, which means that they can't do much more than 60 seeks per second. The sectors used for disk heart beating are part of the VGDA, which is at the outer edge of the disk, and may not be near the application data. This means that every time a disk heart beat is done, a seek will have to be done. Disk heart beating will typically (with the default parameters) require four (4) seeks per second. That is, each of the two nodes will write to the disk and read from the disk once per second, for a total of 4 IOPS. So, if possible, a disk should be selected as a heart beat path that does not normally do more than about 50 seeks per second. The filemon tool can be used to monitor the seek activity on a disk, as sketched below.
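As a rough sketch, filemon can be run for a short interval to capture per-physical-volume activity; the interval and output file below are arbitrary choices, and the seek counts appear in the Detailed Physical Volume Stats section of the report:

# filemon -o /tmp/filemon.out -O pv ; sleep 60 ; trcstop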

In cases where a disk must be used for heart beating that already has a high seek rate, it may be necessary to change the heart beat timing parameters to prevent long write delays from being seen as a failure.
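One way to do this (treat the exact SMIT wording as an approximation, since it varies by HACMP level) is to slow the failure detection rate of the diskhb network module:

smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Network Modules → Change a Network Module using Predefined Values → diskhb → set Failure Detection Rate to Slow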

The above cautions as stated apply to JBOD configurations, and should be modified based on the technology of the disk subsystem:

• If the disk used for heart beating is in a controller that provides large amounts of cache - such as the ESS - the number of seeks per second can be much larger.

• If the disk used for heart beating is part of a RAID set without a caching front-end controller, the disk may be able to support fewer seeks, due to the extra activity required by RAID operations.

Pros & Cons of using Disk Heart Beating

Pros:

1. No additional hardware needed.
2. Easier to span greater distances.
3. No loss in usable storage space, and can use existing data volume groups.
4. Uses enhanced concurrent VGs, which also allows for fast disk takeover.

Cons:

1. Must be aware of the devices diskhb uses and administer devices properly.*
2. Lose the forced down option of stopping cluster services because of enhanced concurrent VG usage.

*I have had a customer delete all their disk definitions and run cfgmgr again to clean up numbering holes in their device definition list. When they did, the device names obviously did not come back in the same order as they were before. So the diskhb device assigned to HACMP was no longer valid, as a different device was configured using the old device name and it was not part of an enhanced concurrent VG. Hence diskhb no longer worked, and since the customer did not monitor their cluster either, they were unaware that the diskhb no longer worked.

Configuring Disk Heartbeat

As mentioned previously, disk heartbeat utilizes enhanced-concurrent volume groups. If starting with a new configuration of disks, you will want to create enhanced-concurrent volume groups, either manually or by utilizing C-SPOC. My example uses C-SPOC, which is the best practice here.

If you plan to use an existing volume group for disk heartbeats that is not enhanced concurrent, then you will have to convert it to such using the chvg command. We recommend that the VG be active on only one node, and that the application not be running, when making this change. Run chvg -C vgname to change the VG to enhanced concurrent mode. Vary it off, then run importvg -L vgname devicename on the other node to make it aware that the VG is now enhanced concurrent capable, as sketched below. If using this method, you can skip to the “Creating Disk Heartbeat Devices and Network” section of this document.
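A minimal sketch of that conversion, assuming a volume group named datavg that corresponds to vpath1 on the second node (both names are placeholders):

GT40# chvg -C datavg              # convert to enhanced concurrent capable
GT40# varyoffvg datavg
SL55# importvg -L datavg vpath1   # refresh the other node's definition of the VG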

Disk and VG Preparation

To be able to use C-SPOC successfully, it is required that some basic IP-based topology already exists, and that the storage devices have their PVIDs in both systems’ ODMs. This can be verified by running lspv on each system. If a PVID does not exist on each system, it is necessary to run chdev -l devicename -a pv=yes on each system. This will allow C-SPOC to match up the device(s) as known shared storage devices.

In this example, vpath0 on GT40 is the same virtual disk as vpath3 on SL55.
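If either node were missing the PVID, a sketch of assigning and then checking it with the device names from this example would be:

GT40# chdev -l vpath0 -a pv=yes
SL55# chdev -l vpath3 -a pv=yes
GT40# lspv | grep vpath0
SL55# lspv | grep vpath3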

Use C-SPOC to create an Enhanced Concurrent volume group. In the following example, since vpath devices are being used, the following SMIT screen paths were used:

smitty cl_admin → HACMP Concurrent Logical Volume Management → Concurrent Volume Groups → Create a Concurrent Volume Group with Data Path Devices and press Enter

Choose the appropriate nodes, and then choose the appropriate shared storage devices based on PVIDs (vpath0 and vpath3 in this example). Choose a name for the VG (enhconcvg in this example), the desired PP size, make sure that Enhanced Concurrent Mode is set to true, and press Enter. This will create the shared enhanced-concurrent VG needed for our disk heartbeat.

It’s a good idea to verify via lspv once this has completed to make sure the device and VG are shown appropriately, as follows:

GT40# lspv
vpath0          000a7f5af78e0cf4    enhconcvg

SL55# lspv
vpath3          000a7f5af78e0cf4    enhconcvg
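You can also confirm that the volume group is enhanced concurrent capable with lsvg, run from a node where the VG is varied on; the Concurrent field should read something like Enhanced-Capable (exact wording may vary by AIX level):

GT40# lsvg enhconcvg | grep -i concurrent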


Creating Disk Heartbeat Devices and Network

There are two different ways to do this. Since we have already created the enhanced concurrent VG, we can use the discovery method (1) and let HA find it for us, or we can do this manually via the Pre-Defined Devices method (2). Following is an example of each.

1) Creating via Discovery Method (See Note)

Enter smitty hacmp → Extended Configuration → Discover HACMP-related Information from Configured Nodes → press Enter

This will run automatically and create a clip_config file that contains the information it has discovered. Once completed, go back to the Extended Configuration menu and choose:

Extended Topology Configuration → Configure HACMP Communication Interfaces/Devices → Add Communication Interfaces/Devices → Add Discovered Communication Interface and Devices → Communication Devices → choose the appropriate devices (ex. vpath0 and vpath3)

Select Point-to-Point Pair of Discovered Communication Devices to Add

Move cursor to desired item and press F7. Use arrow keys to scroll.
ONE OR MORE items can be selected.
Press Enter AFTER making all selections.

  # Node       Device   Device Path   Pvid
  > nodeGT40   vpath0   /dev/vpath0   000a7f5af78
  > nodeSL55   vpath3   /dev/vpath3   000a7f5af78

Note:

Base HA 5.1 appears to have a problem when using the Discovered Devices method. If you get this error: "ERROR: Invalid node name 000a7f5af78e0cf4", then you will need APAR IY51594. Otherwise you will have to create the devices via the Pre-Defined Devices method. Once corrected, this section will be completed.

2) Creating via Pre-Defined Devices Method

When using this method, it is necessary to create a diskhb network first, then assign the disk-node pair devices to the network. Create the diskhb network as follows:

smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Networks → Add a Network to the HACMP Cluster → choose diskhb → enter the desired network name (ex. disknet1) and press Enter

smitty hacmp → Extended Configuration → Extended Topology Configuration → Configure HACMP Communication Interfaces/Devices → Add Communication Interfaces/Devices → Add Pre-Defined Communication Interfaces and Devices → Communication Devices → choose your diskhb Network Name


Add a Communication Device

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                [Entry Fields]
* Device Name                   [GT40_hboverdisk]
* Network Type                  diskhb
* Network Name                  disknet1
* Device Path                   [/dev/vpath0]
* Node Name                     [GT40]

For Device Name, enter a unique name of your choosing. It will show up in your topology under this name, much like serial heartbeat and ttys have in the past. For the Device Path, you want to put in /dev/devicename (ex. /dev/vpath0). Then choose the corresponding node for this device and device name (ex. GT40), and press Enter.

You will repeat this process for the other node (ex. SL55) and the other device (vpath3). This will complete both devices for the diskhb network.
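As an optional check (the utility path is standard for HACMP/ES, though the output format differs slightly between levels), you can list the defined diskhb devices for the network:

# /usr/es/sbin/cluster/utilities/cllsif | grep disknet1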

Testing Disk Heartbeat Connectivity

Once the device and network definitions have been created, it is a good idea to test them and make sure communication is working properly. If the volume group is varied on in normal mode on one of the nodes, the test will probably not work.

/usr/sbin/rsct/bin/dhb_read is used to test the validity of a diskhb connection. The usage of dhb_read is as follows:

dhb_read -p devicename       //dump diskhb sector contents
dhb_read -p devicename -r    //receive data over diskhb network
dhb_read -p devicename -t    //transmit data over diskhb network

To test that disknet1, in the example configuration, can communicate from nodeB (ex. SL55) to nodeA (ex. GT40), you would run the following commands:

On nodeA, enter:
dhb_read -p rvpath0 -r

On nodeB, enter:
dhb_read -p rvpath3 -t

Note that the device name is the raw device, as designated by the “r” preceding the device name.


If the link from nodeB to nodeA is operational, both nodes will display:

Link operating normally.

You can run this again and swap which node transmits and which one receives. To make the network active, it is necessary to synchronize the cluster. Since the volume group has not yet been added to the resource group, we will sync up once instead of twice.

Add Shared Disk as a Shared Resource

In most cases you would have your diskhb device on a shared data VG. It is necessary to add that VG into your resource group and synchronize the cluster.

smitty hacmp → Extended Configuration → Extended Resource Configuration → Extended Resource Group Configuration → Change/Show Resources and Attributes for a Resource Group → press Enter

Choose the appropriate resource group, enter the new VG (enhconcvg) into the volume group list, and press Enter.

Return to the top of the Extended Configuration menu and synchronize the cluster.
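A typical path for the synchronization (wording approximate) is:

smitty hacmp → Extended Configuration → Extended Verification and Synchronization → press Enter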

Monitor Disk Heartbeat

Once the cluster is up and running, you can monitor the activity of the disk (actually all) heartbeats via lssrc -ls topsvcs. An example of the output follows:

Subsystem         Group            PID     Status
 topsvcs          topsvcs          32108   active

Network Name   Indx Defd Mbrs St  Adapter ID      Group ID
disknet1       [ 3]    2    2  S  255.255.10.0    255.255.10.1
disknet1       [ 3] rvpath3         0x86cd1b02      0x86cd1b4f
HB Interval = 2 secs. Sensitivity = 4 missed beats
Missed HBs: Total: 0 Current group: 0
Packets sent    : 229 ICMP 0 Errors: 0 No mbuf: 0
Packets received: 217 ICMP 0 Dropped: 0
NIM's PID: 28724

Be aware that there is a grace period for heartbeats to start processing. This is normally around 60 seconds. So if you run this command quickly after starting the cluster, you may not see anything at all until heartbeat processing has started after the grace period has elapsed.
