Redpaper - IBM Redbooks
High Availability z/OS Solutions<br />
for WebSphere Business<br />
Integration Message Broker V5<br />
Develop a highly available WebSphere<br />
Business Integration Message Broker<br />
solution on z/OS<br />
Configure WebSphere MQ QSGs to<br />
support Message Broker in a<br />
Sysplex<br />
Example Message Broker<br />
high availability<br />
implementations<br />
Front cover<br />
Saida Davies<br />
Dean Barker<br />
Steve Kiernan<br />
Jon Mc Namara<br />
<strong>Redpaper</strong><br />
ibm.com/redbooks
International Technical Support Organization<br />
High Availability z/OS Solutions for WebSphere<br />
Business Integration Message Broker V5<br />
October 2004
Note: Before using this information and the product it supports, read the information in<br />
“Notices” on page v.<br />
First Edition (October 2004)<br />
This edition applies to Version 5, Release 01 of <strong>IBM</strong> WebSphere Business Integration Message<br />
Broker for z/OS (product number 5655-K60).<br />
© Copyright International Business Machines Corporation 2004. All rights reserved.<br />
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP<br />
Schedule Contract with <strong>IBM</strong> Corp.
Contents<br />
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v<br />
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi<br />
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii<br />
The team that wrote this <strong>Redpaper</strong> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii<br />
Become a published author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x<br />
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x<br />
Chapter 1. Introduction and technical overview. . . . . . . . . . . . . . . . . . . . . . 1<br />
1.1 Project overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2<br />
1.1.1 Availability levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2<br />
1.2 Testing methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5<br />
1.2.1 HTTP Listener . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7<br />
1.3 Environment overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7<br />
Chapter 2. Design decisions that affect high availability . . . . . . . . . . . . . . 9<br />
2.1 Considerations when designing for high availability . . . . . . . . . . . . . . . . . 10<br />
2.2 High Availability with Message Broker . . . . . . . . . . . . . . . . . . . . . . . . . . . 11<br />
2.3 WebSphere MQ options in supporting HA with Message Broker . . . . . . . 12<br />
2.3.1 WebSphere MQ clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12<br />
2.3.2 WebSphere MQ shared queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14<br />
2.4 Message Broker flow design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17<br />
2.4.1 Affinities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17<br />
2.4.2 Error processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18<br />
2.5 Message Broker networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19<br />
2.5.1 Further considerations with Message Broker networks . . . . . . . . . . 19<br />
Chapter 3. Topology and system setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . 21<br />
3.1 High Availability configurations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22<br />
3.1.1 Active-active . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22<br />
3.1.2 Active-passive . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22<br />
3.2 Test environment topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23<br />
3.3 The z/OS LPARs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24<br />
3.4 The DB2 data sharing group configuration . . . . . . . . . . . . . . . . . . . . . . . . 24<br />
3.5 The WebSphere MQ queue sharing group configuration . . . . . . . . . . . . . 25<br />
3.5.1 Queue sharing group configuration considerations. . . . . . . . . . . . . . 25<br />
3.6 The WebSphere Business Integration Message Broker configuration . . . 26<br />
3.6.1 Message Broker configuration considerations . . . . . . . . . . . . . . . . . 27<br />
3.6.2 Additional Message Broker configuration hints . . . . . . . . . . . . . . . . . 29<br />
3.7 Automatic Restart Management configuration . . . . . . . . . . . . . . . . . . . . . 30<br />
3.8 The configuration manager platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30<br />
3.9 An overview of WebSphere Business Integration Message Broker<br />
SupportPac IP13 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31<br />
Chapter 4. Failover scenarios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33<br />
4.1 Test environment setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34<br />
4.1.1 SupportPac IP13 setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34<br />
4.1.2 Message Broker configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36<br />
4.2 Scenario 1 - Initial state with all components active . . . . . . . . . . . . . . . . . 37<br />
4.3 Scenario 2 - Execution group failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38<br />
4.4 Scenario 3 - Message Broker failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40<br />
4.5 Scenario 4 - Queue manager failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42<br />
4.6 Scenario 5 - DB2 failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44<br />
4.7 Scenario 6 - z/OS system failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45<br />
4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47<br />
Appendix A. Sample code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49<br />
ARM policy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50<br />
Broker customization input file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51<br />
Appendix B. Additional material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53<br />
Locating the Web material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53<br />
Using the Web material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54<br />
How to use the Web material . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54<br />
Abbreviations and acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55<br />
Related publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57<br />
<strong>IBM</strong> <strong>Redbooks</strong> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57<br />
Other publications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57<br />
Online resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57<br />
How to get <strong>IBM</strong> <strong>Redbooks</strong> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58<br />
Help from <strong>IBM</strong> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58<br />
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59<br />
iv High Availability z/OS Solutions for WebSphere Business Integration Message Broker V5
Notices<br />
This information was developed for products and services offered in the U.S.A.<br />
<strong>IBM</strong> may not offer the products, services, or features discussed in this document in other countries. Consult<br />
your local <strong>IBM</strong> representative for information on the products and services currently available in your area.<br />
Any reference to an <strong>IBM</strong> product, program, or service is not intended to state or imply that only that <strong>IBM</strong><br />
product, program, or service may be used. Any functionally equivalent product, program, or service that<br />
does not infringe any <strong>IBM</strong> intellectual property right may be used instead. However, it is the user's<br />
responsibility to evaluate and verify the operation of any non-<strong>IBM</strong> product, program, or service.<br />
<strong>IBM</strong> may have patents or pending patent applications covering subject matter described in this document.<br />
The furnishing of this document does not give you any license to these patents. You can send license<br />
inquiries, in writing, to:<br />
<strong>IBM</strong> Director of Licensing, <strong>IBM</strong> Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.<br />
The following paragraph does not apply to the United Kingdom or any other country where such provisions<br />
are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES<br />
THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,<br />
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,<br />
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer<br />
of express or implied warranties in certain transactions, therefore, this statement may not apply to you.<br />
This information could include technical inaccuracies or typographical errors. Changes are periodically made<br />
to the information herein; these changes will be incorporated in new editions of the publication. <strong>IBM</strong> may<br />
make improvements and/or changes in the product(s) and/or the program(s) described in this publication at<br />
any time without notice.<br />
Any references in this information to non-<strong>IBM</strong> Web sites are provided for convenience only and do not in any<br />
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the<br />
materials for this <strong>IBM</strong> product and use of those Web sites is at your own risk.<br />
<strong>IBM</strong> may use or distribute any of the information you supply in any way it believes appropriate without<br />
incurring any obligation to you.<br />
Information concerning non-<strong>IBM</strong> products was obtained from the suppliers of those products, their published<br />
announcements or other publicly available sources. <strong>IBM</strong> has not tested those products and cannot confirm<br />
the accuracy of performance, compatibility or any other claims related to non-<strong>IBM</strong> products. Questions on<br />
the capabilities of non-<strong>IBM</strong> products should be addressed to the suppliers of those products.<br />
This information contains examples of data and reports used in daily business operations. To illustrate them<br />
as completely as possible, the examples include the names of individuals, companies, brands, and products.<br />
All of these names are fictitious and any similarity to the names and addresses used by an actual business<br />
enterprise is entirely coincidental.<br />
COPYRIGHT LICENSE:<br />
This information contains sample application programs in source language, which illustrates programming<br />
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in<br />
any form without payment to <strong>IBM</strong>, for the purposes of developing, using, marketing or distributing application<br />
programs conforming to the application programming interface for the operating platform for which the<br />
sample programs are written. These examples have not been thoroughly tested under all conditions. <strong>IBM</strong>,<br />
therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy,<br />
modify, and distribute these sample programs in any form without payment to <strong>IBM</strong> for the purposes of<br />
developing, using, marketing, or distributing application programs conforming to <strong>IBM</strong>'s application<br />
programming interfaces.<br />
Trademarks<br />
The following terms are trademarks of the International Business Machines Corporation in the United States,<br />
other countries, or both:<br />
Eserver®<br />
<strong>Redbooks</strong> (logo)<br />
ibm.com®<br />
z/OS®<br />
zSeries®<br />
DB2®<br />
<strong>IBM</strong>®<br />
Lotus®<br />
MQSeries®<br />
MVS<br />
NetView®<br />
OS/390®<br />
Parallel Sysplex®<br />
<strong>Redbooks</strong><br />
The following terms are trademarks of other companies:<br />
RACF®<br />
S/390®<br />
SupportPac<br />
ThinkPad®<br />
Tivoli®<br />
WebSphere®<br />
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the<br />
United States, other countries, or both.<br />
UNIX is a registered trademark of The Open Group in the United States and other countries.<br />
Other company, product, and service names may be trademarks or service marks of others.<br />
Preface<br />
When designing and implementing a production grade Message Broker solution<br />
on z/OS®, one of the most important factors to consider is high availability. This<br />
<strong>IBM</strong>® <strong>Redpaper</strong> examines the design considerations inherent in configuring a<br />
highly available Message Broker environment. Also demonstrated is the use of<br />
the coupling facility for WebSphere MQ queue sharing groups (QSG) and<br />
Automatic Restart Management (ARM) in order to support WebSphere Business<br />
Integration Message Broker HA in a sysplex environment. Finally, examples of<br />
the behavior of Message Broker during failover are provided, including<br />
transaction rate measurements and throughput statistics.<br />
The team that wrote this <strong>Redpaper</strong><br />
This paper was produced by a team of specialists from around the world, working<br />
with the International Technical Support Organization (ITSO), Raleigh, NC.<br />
Saida Davies is a Project Leader with the<br />
ITSO. She is a certified senior IT specialist and<br />
has 15 years of experience in IT. Saida has<br />
published several <strong>Redbooks</strong> on various<br />
business integration scenarios. She has<br />
experience in the architecture and design of<br />
WebSphere® MQ solutions, has extensive<br />
knowledge of <strong>IBM</strong>’s z/OS operating system,<br />
and a detailed working knowledge of both <strong>IBM</strong><br />
and Independent Software Vendors’ operating<br />
system software. In a customer facing role<br />
with <strong>IBM</strong> Global Services, her role included the<br />
development of services for WebSphere MQ within the z/OS and Windows®<br />
platform. This covered the architecture, scope, design, project management, and<br />
implementation of the software on stand-alone systems or on systems in a<br />
Parallel Sysplex® environment. She has a degree in Computer Science, and her<br />
background includes z/OS systems programming.<br />
From left: Dean, Steve, and Jon<br />
Dean Barker is an IT Specialist at the <strong>IBM</strong> Hursley Laboratories in the UK. He<br />
has over ten years’ experience as an MVS Systems Programmer. He holds a<br />
degree in Chemical Engineering from the University of Manchester Institute of<br />
Science and Technology. He has an excellent knowledge of the z/OS operating<br />
system and sysplex environment. His other areas of expertise include UNIX<br />
System Services and WebSphere Application Server for z/OS. Dean assisted<br />
with the z/OS system setup prerequisites for this project, which enabled the<br />
team to accomplish the full scope of this <strong>Redpaper</strong>.<br />
Steve Kiernan is a Consulting IT Specialist on the New England and Upstate<br />
New York WebSphere technical team. Before joining <strong>IBM</strong> eight years ago, he<br />
spent fifteen years in the banking industry as a mainframe systems programmer.<br />
Since Steve joined <strong>IBM</strong>, he has worked with the entire WebSphere Business<br />
Integration platform and primarily with WebSphere MQ and WebSphere<br />
Business Integration MB on z/OS.<br />
Jon Mc Namara is an IT Specialist in the Hursley WebSphere Business<br />
Integration Services Team. He provides WebSphere Business Integration<br />
customers with a range of expert technical services. Jon’s areas of expertise<br />
include z/OS, WebSphere MQ, WebSphere MQ Integrator, WebSphere Business<br />
Integrator FN, WebSphere Business Integration Message Broker, and<br />
WebSphere Business Integration Event Broker. He is also a recognized expert in<br />
multicast technology.<br />
The <strong>Redpaper</strong> team would like to thank the following people, located in <strong>IBM</strong><br />
Hursley, UK, for their guidance, assistance, and contributions to this edition:<br />
► Gary Willoughby, Manager, WebSphere Business Integration, EMEA<br />
WebSphere Lab Services<br />
► Ralph Bateman, WebSphere MQ Brokers z/OS Change Team specialist<br />
► Karen Burgess, WebSphere MQ z/OS FV & Test<br />
► William Chorlton, IT Specialist, zSeries® DB/DC Support<br />
► Rob Convery, WebSphere MQ System Test<br />
► Peter Edwards, MQSeries® Test<br />
► Emir Garza, Consulting IT Specialist, Software Group/Application Integration<br />
Middleware<br />
► Luisa Lopez de Silanes Ruiz, Consulting IT Specialist, Software<br />
Group/Application Integration Middleware<br />
► Colin Paice, WebSphere MQ Scenarios - z/OS performance specialist, <strong>IBM</strong><br />
Hursley, UK<br />
► Alasdair Paton, WebSphere MQ Brokers - System Test team lead<br />
► Pete Siddall, Software Engineer<br />
► Vicente Suarez, IT Specialist, Software Group/Application Integration<br />
Middleware<br />
Become a published author<br />
Join us for a two- to six-week residency program! Help write an <strong>IBM</strong> redbook<br />
dealing with specific products or solutions, while getting hands-on experience<br />
with leading-edge technologies. You will team with <strong>IBM</strong> technical professionals,<br />
Business Partners, and customers.<br />
Your efforts will help increase product acceptance and customer satisfaction. As<br />
a bonus, you will develop a network of contacts in <strong>IBM</strong> development labs and<br />
increase your productivity and marketability.<br />
Find out more about the residency program, browse the residency index, and<br />
apply online at:<br />
ibm.com/redbooks/residencies.html<br />
Comments welcome<br />
Your comments are important to us!<br />
We want our papers to be as helpful as possible. Send us your comments about<br />
this <strong>Redpaper</strong> or other <strong>Redbooks</strong> in one of the following ways:<br />
► Use the online Contact us review redbook form found at:<br />
ibm.com/redbooks<br />
► Send your comments in an e-mail to:<br />
redbook@us.ibm.com<br />
► Mail your comments to:<br />
<strong>IBM</strong> Corporation, International Technical Support Organization<br />
Dept. HZ8 Building 662<br />
P.O. Box 12195<br />
Research Triangle Park, NC 27709-2195<br />
Chapter 1. Introduction and technical<br />
overview<br />
This paper is divided into four chapters:<br />
► This chapter, Chapter 1, “Introduction and technical overview” on page 1,<br />
describes the scope of this book and provides an overview of a high<br />
availability environment for WebSphere Business Integration Message Broker<br />
(Message Broker) on z/OS.<br />
► Chapter 2, “Design decisions that affect high availability” on page 9, provides<br />
considerations, hints, and tips for running Message Broker in a high availability<br />
environment and for developing message flows and publish and subscribe<br />
applications in such an environment.<br />
► Chapter 3, “Topology and system setup” on page 21, describes how to<br />
achieve a high availability environment with Message Broker and contains<br />
specific configuration information for the software components used during<br />
failover testing.<br />
► Chapter 4, “Failover scenarios” on page 33, describes the procedures to<br />
create a testing environment and details the results of the failover testing.<br />
1.1 Project overview<br />
This paper explores the architecture and explains the tasks necessary to build a<br />
high availability environment for Message Broker on z/OS. Remember that the<br />
Message Broker high availability environment exists for the benefit of the business<br />
applications that it serves, not for itself. Today’s business-critical applications<br />
often demand a high degree of availability, and the z/OS system administrator<br />
must exploit Parallel Sysplex technology to provide continuous service levels<br />
during system outages, planned or otherwise.<br />
1.1.1 Availability levels<br />
There are several ways to describe the degree of availability that a system<br />
requires. The availability of a system can be expressed as a number of nines or<br />
by using descriptive terms.<br />
The number of nines represents the percentage of time for which the system is<br />
available. Three nines means that it is available 99.9% of the time; five nines<br />
means that it is available 99.999% of the time. These numbers become much<br />
more significant when you look at them in terms of downtime over a fixed period.<br />
For example, over a period of one year, a system with 99.9% availability would<br />
have approximately 8.76 hours of downtime, while a system with 99.999%<br />
availability would be down for just 315 seconds.<br />
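The downtime figures above follow from simple arithmetic; the short Python sketch below (our own illustration, not part of any product described in this paper) reproduces them:

```python
def downtime_seconds(availability_percent: float, period_days: int = 365) -> float:
    """Return the expected downtime, in seconds, over the given period."""
    period_seconds = period_days * 24 * 60 * 60   # one 365-day year = 31,536,000 s
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return period_seconds * unavailable_fraction

# Three nines: roughly 8.76 hours of downtime per year
print(f"99.9%   -> {downtime_seconds(99.9) / 3600:.2f} hours")
# Five nines: roughly 315 seconds of downtime per year
print(f"99.999% -> {downtime_seconds(99.999):.0f} seconds")
```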
The following terms are also used to describe availability:<br />
► Continuous availability: This term describes a system that experiences no<br />
discernible downtime, where neither scheduled nor unscheduled outages<br />
occur. A continuously available system detects an error and immediately<br />
switches to an alternative component that is ready to take over. Such a system<br />
should also support the scheduling of planned maintenance by allowing<br />
workload to be transparently transferred away from the components or<br />
subsystems that are the subject of the maintenance activity. Although<br />
continuous availability seems difficult to achieve, it is possible to obtain it by<br />
combining hardware, software, and operational procedures that mask<br />
outages from the user so that the user does not perceive that a system<br />
outage has occurred.<br />
► Continuous operation: This term describes a system that experiences no<br />
discernible downtime due to scheduled outages. However, this system's<br />
availability will not be as high as it would be with a continuously available<br />
system because it may suffer unplanned outages.<br />
► High availability (HA): This term describes a system that can detect a single<br />
failure and react to it automatically within a matter of a few minutes at most.<br />
Such systems operate with some amount of planned and<br />
unplanned outage time. There are two significant aspects to this definition:<br />
– The system should survive a single failure but a second failure may result<br />
in a loss of service.<br />
– The detection of a fault and the triggering of an action to recover from it<br />
should be automatic, that is, require no manual intervention.<br />
Figure 1-1 illustrates the relationship between the components of systems<br />
availability.<br />
Figure 1-1 High availability + Continuous Operation = Continuous Availability<br />
(The figure shows continuous availability as high availability plus continuous<br />
operation, built on concurrency, redundancy, systems management, and reliable,<br />
robust, and resilient technologies.)<br />
The term fault tolerance is sometimes used mistakenly when high availability or<br />
continuous availability is meant. Fault tolerance describes systems that, in the<br />
event of a failure, can substitute a replacement component for the failed<br />
component in a matter of a few milliseconds. This kind of behavior is supported<br />
by components that have redundant sub-components, error checking and<br />
correction for data, retry capabilities for basic operations, alternate paths for I/O<br />
requests, and so forth. However, there may still be a single point of failure which,<br />
despite the fault tolerance, can cause a component to fail. Similarly, if one<br />
important component in a system is not fault-tolerant, then the system is not<br />
fault-tolerant even though all other components are.<br />
Specifically, HA refers to a level of service that provides availability in the<br />
event of a single, non-catastrophic component failure. Transaction capacity may<br />
be compromised while the failing component recovers, but recovery should be<br />
automatic.<br />
This paper describes a resilient HA Message Broker architecture based on an<br />
Active-Active Message Broker configuration using a WebSphere MQ queue<br />
sharing group (QSG) and a DB2 data sharing group (DSG) across two Logical<br />
Partitions (LPAR). Testing revealed that this configuration provides for sustained<br />
Message Broker services through a variety of failover scenarios.<br />
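As a hedged illustration of the queue sharing group concept (the actual definitions used in this project appear in Chapter 3), a shared queue might be defined with MQSC commands along these lines; the queue name and CF structure name here are invented for the example:

```
* Hypothetical example: define a shared local queue in the QSG.
* QSGDISP(SHARED) makes the queue available to every queue manager
* in the group; CFSTRUCT names the coupling facility application
* structure that holds its messages.
DEFINE QLOCAL('BROKER.REQUEST.QUEUE') +
       QSGDISP(SHARED) +
       CFSTRUCT(APPL1) +
       DEFPSIST(YES)
```

Because the queue lives in a CF structure rather than on a single queue manager's page sets, any queue manager in the QSG can serve it, which is what allows the Active-Active configuration to survive a queue manager failure.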
For a complete overview of HA concepts, refer to the <strong>IBM</strong> Redbook Highly<br />
Available WebSphere Business Integration Solutions, SG24-6328.<br />
The individual component failures that we tested in the research for this paper<br />
include:<br />
► A Message Broker execution group failure<br />
► A Message Broker failure<br />
► A queue manager failure<br />
► A DB2 sub-system failure<br />
In our testing, we restarted each of these components in place on the LPAR on<br />
which they failed using Automatic Restart Management (ARM). We also tested a<br />
complete LPAR failure in which we restarted the queue manager and Message<br />
Broker from one LPAR on the active LPAR.<br />
There are countless other individual component failures that we could not test<br />
due to time constraints. We used ARM to restart all failing components.<br />
Appendix A, “Sample code” on page 49 provides the ARM policy that was in<br />
effect at the time of the failure. Some organizations may prefer NetView®<br />
automated recovery to ARM. You can achieve similar restart functionality using<br />
NetView.<br />
A coupling facility (CF) failure is categorized as a catastrophic failure. To achieve<br />
continuous availability on z/OS, a CF failover has to be accounted for, specifically,<br />
by duplexing the WebSphere MQ admin and list structures. The only single point<br />
of failure in our tested configuration was the CF. CF failover testing is outside the<br />
scope of this paper. For further information, refer to the following resources:<br />
► MVS Setting up a Sysplex:<br />
http://publibz.boulder.ibm.com/cgi-bin/bookmgr_OS390/BOOKS/IEA2F132/<br />
CCONTENTS?SHELF=IEA2BK34&DN=SA22-7625-06&DT=20030423145429<br />
► The WebSphere MQ System Administration Guide - Part 5, “Recovery and<br />
Restart” at:<br />
http://publibfp.boulder.ibm.com/epubs/html/csqsaw02/csqsaw02tfrm.htm<br />
1.2 Testing methodology<br />
Because the goal of this paper is to evaluate Message Broker service availability,<br />
we obtained measurements from both the Message Broker and the application<br />
that was using the Message Broker. The WebSphere Business Integration<br />
Broker - Sniff test and performance on z/OS (<strong>IBM</strong> Category 2 SupportPac IP13)<br />
provided an elegant solution. For details on downloading the SupportPac, see:<br />
http://www-306.ibm.com/software/integration/support/supportpacs/<br />
Note: We recommend that you install the most current version of the<br />
SupportPac before using the techniques described in this paper.<br />
SupportPac IP13 consists of documentation, programs, and message flows<br />
designed to help you measure application throughput on Message Broker on<br />
z/OS. SupportPac IP13 provides statistics at the end of the job, including the<br />
total number of round trip messages that the SupportPac IP13 application<br />
processed. You can find the specific SupportPac IP13 parameters that we used<br />
for testing in Chapter 4, “Failover scenarios” on page 33. We ran the application<br />
on both LPARs during several of the tests to simulate a realistic production<br />
environment.<br />
Figure 1-2 on page 6 illustrates our testing environment topology.<br />
Figure 1-2 Test environment topology<br />
In our testing, we also obtained statistics from the Message Broker while archive<br />
accounting data collection was active. Again, you can find the specific procedures<br />
for this in Chapter 4, “Failover scenarios” on page 33. By using SupportPac IP13 to<br />
generate a workload for the Message Broker to consume and by comparing<br />
those results with the statistics produced by the Message Broker, we were able<br />
to evaluate continuous throughput during component failover.<br />
The SupportPac IP13 batch application drove messages to a shared<br />
message queue. We configured a SupportPac IP13 message flow to consume<br />
those messages from the shared queue and deployed the .bar file to both<br />
brokers. Reply messages were sent to a second shared queue and, in turn, were<br />
then consumed by the SupportPac IP13 batch application program in a<br />
request-reply model.<br />
This testing methodology was sufficient to evaluate component failover.<br />
However, the statistics supplied in Chapter 4, “Failover scenarios” on page 33 are<br />
insufficient to evaluate Message Broker performance or capacity for the following<br />
reasons:<br />
► The z/OS environment was not tuned. The main purpose of SupportPac IP13<br />
is to provide a way to begin to tune the Message Broker environment, but we<br />
did not use it for this purpose in this exercise.<br />
► The LPARs did not have identical hardware resources. For instance, LPAR0<br />
was configured with two logical processors, while LPAR2 was configured with<br />
one logical processor.<br />
► The loads that we placed on the configuration were designed to provide a<br />
continuous stream of messages for the duration of the individual tests, but the<br />
steady-state load did not stress available resources. A single instance of the<br />
SupportPac IP13 message flow was used for each broker.<br />
1.2.1 HTTP Listener<br />
Although HTTP Listener support in Message Broker is expected to makes its<br />
debut on z/OS shortly (FixPack 4), the current version does not support it.<br />
Therefore, configuration and testing using HTTP Listener was beyond the scope<br />
of this paper.<br />
It is essential for the system administrator to consider dynamic VIPAs for a high<br />
availability environment. WebSphere MQ and WebSphere Business Integration<br />
Message Broker rely on the TCP/IP stack to communicate with other systems.<br />
For further information on this subject, see the <strong>IBM</strong> white paper entitled<br />
Leveraging z/OS TCP/IP Dynamic VIPAs and Sysplex Distributor for higher<br />
availability, GM13-0026, at:<br />
http://www-1.ibm.com/servers/eserver/zseries/library/techpapers/<br />
gm130165.html<br />
1.3 Environment overview<br />
For our testing environment, we kept the software configuration as simple as<br />
possible while still achieving an HA environment. The goal was to build a resilient<br />
Message Broker server capable of providing a very high degree of availability. Because<br />
Message Broker is a WebSphere MQ and DB2® application, these components<br />
must also be configured for high availability, specifically:<br />
► The two z/OS queue managers participated in a QSG to allow for shared<br />
application queues.<br />
► The queue managers were not clustered.<br />
► A pair of WebSphere MQ channels was used between each z/OS queue<br />
manager and the Windows Configuration Manager queue manager.<br />
► A DB2 DSG was created to support the QSG.<br />
The operating environments used for this paper include:<br />
► The Message Broker installed on two z/OS 1.4 LPARs in a sysplex running on<br />
a 9672 model XZ7. The LPARs have the following software levels:<br />
– z/OS 1.4 with service up to RSU0406<br />
– DB2 V7R1M0 with service up to RSU0312<br />
– WebSphere MQ V5R3M1 at FP5<br />
– WebSphere Business Integration Message Broker V5R0M1 at Fix Pack 3<br />
► The Message Broker Configuration Manager installed on an <strong>IBM</strong> T40<br />
ThinkPad® with:<br />
– Windows XP SP1<br />
– WebSphere MQ V5.3 Fix Pack 5<br />
– WebSphere Business Integration Message Broker V5 Fix Pack 3<br />
– DB2 V8.1 FixPak 2<br />
A thorough description of the software configuration and setup that we used in<br />
our testing environment can be found in Chapter 3, “Topology and system setup”<br />
on page 21.<br />
Note: We recommend that you install the most current release of all Fix Packs<br />
before using the techniques described in this paper.<br />
Chapter 2. Design decisions that affect<br />
high availability<br />
This chapter examines design decisions that you need to address when<br />
configuring an HA system. It begins by taking a high-level look at what HA entails<br />
and continues by examining the details of designing an HA system on z/OS using<br />
WebSphere MQ and Message Broker.<br />
Remember that when you design a Message Broker configuration on z/OS that<br />
must satisfy stringent HA requirements, there are a number of factors you should<br />
consider. The main question is how to ensure that the availability targets for a<br />
Message Broker implementation are met.<br />
The most common implementation of Message Broker is as a central hub<br />
through which all messages in a messaging architecture are processed. Thus,<br />
the potential exists for Message Broker to become a bottleneck and single point<br />
of failure for the whole message processing network. Because of the business<br />
impact of the messaging hub failing, you should use a thorough approach when<br />
designing the queue manager and an HA implementation of Message Broker.<br />
© Copyright <strong>IBM</strong> Corp. 2004. All rights reserved. 9
2.1 Considerations when designing for high availability<br />
Many factors contribute to the HA of a Message Broker environment, including<br />
platform and operating system agnostic factors, such as the following:<br />
► Documented procedures<br />
► Practicing procedures<br />
► Reliable hardware<br />
► Failover<br />
► Dual networks<br />
Other platform and operating system specific factors can also affect the HA of a<br />
Message Broker environment, including:<br />
► Shared queues<br />
► Reliable operating system<br />
► WebSphere MQ clustering<br />
All of these factors are important when considering availability. Many of them are<br />
platform agnostic and equally relevant across the range of supported platforms,<br />
including z/OS. They include contingency against failures of disks, processors and<br />
memory, power supplies, the network, and so forth.<br />
Because the entire range of components that would make up a highly available<br />
system is extremely varied and covers a number of separate disciplines, we<br />
concentrated our efforts on a distinct subset in our testing environment. This<br />
subset included methods by which the queue manager and Message Broker<br />
were configured to support a highly available system. This section also outlines<br />
some of the issues to take into account when using functionality such as shared<br />
queues, clustering, and cloned brokers.<br />
One approach to mitigating failures is to install multiple instances of the<br />
components that might fail. z/OS offers an architecture which lends itself well to<br />
providing a highly available system: the well-proven design of separate LPARs<br />
connected via the coupling facility, collectively referred to as a Parallel Sysplex<br />
environment. A Parallel Sysplex environment comprises highly available disks<br />
and power supplies and the ability to communicate across LPARs by using the<br />
coupling facility. With this architecture, you can implement a highly available<br />
queue manager and Message Broker system by using, for example, a number of<br />
brokers operating on separate LPARs, all of which are supported by shared<br />
WebSphere MQ queues via the coupling facility or by WebSphere MQ<br />
clustering.<br />
2.2 High Availability with Message Broker<br />
When formulating a design for a highly available Message Broker system, it is<br />
worth considering the benefits of having a network topology consisting of a hub<br />
that contains a number of queue managers which serve brokers. This type of<br />
configuration allows the workload to be spread evenly across the brokers, while<br />
also providing redundancy in the event that a Message Broker or a queue<br />
manager fails.<br />
In addition, running brokers across multiple LPARs allows work to continue in<br />
case of an LPAR failure.<br />
At a lower level, there is a choice between deploying the same configuration to<br />
each broker in the hub or assigning a different configuration to each broker. If a<br />
particular message flow runs only on a single broker in the hub, that flow does<br />
not have a high availability solution. If the same message flow is deployed to all<br />
brokers, any broker can process any message; however, the resource allocation<br />
cannot be tailored as finely.<br />
With a separate configuration for each broker, you can control the number of<br />
copies of a flow run on each broker in the hub. You can also tailor the priorities so<br />
that LPARs with more capacity can handle more of the work. However, this<br />
environment can introduce affinities to particular LPARs and compromise HA<br />
effectiveness. Whichever way you configure brokers, you must create appropriate<br />
.bar files and execution groups for each broker in the hub. Table 2-1 on page 12<br />
shows the advantages and disadvantages to each configuration.<br />
Table 2-1 Advantages and disadvantages of cloned and tailored execution groups<br />
Cloned execution groups:<br />
► Advantage: Provides the highest availability level because all brokers can<br />
process any message.<br />
► Advantage: Distributes work evenly across the system, reducing the<br />
potential for bottlenecks.<br />
► Disadvantage: Does not provide granular control over resource allocation.<br />
► Disadvantage: Does not provide support for flows which may be required to<br />
turn around messages more quickly than others.<br />
Tailored execution groups:<br />
► Advantage: Allows for the best use of the resources available.<br />
► Advantage: Allows more instances of flows which are used more heavily or<br />
have a more demanding message processing need.<br />
► Disadvantage: May introduce affinities to particular brokers and LPARs.<br />
► Disadvantage: Provides lower availability because fewer brokers can<br />
process all messages.<br />
2.3 WebSphere MQ options in supporting HA with<br />
Message Broker<br />
There are advantages and disadvantages you should consider before embarking<br />
on a particular method of providing HA with Message Broker. However, once you<br />
have determined the best method for your configuration, there are a number of<br />
ways you can use WebSphere MQ to support your configuration, using either<br />
WebSphere MQ clustering or WebSphere MQ shared queues.<br />
This section discusses the WebSphere MQ options available in supporting a<br />
highly available Message Broker configuration.<br />
2.3.1 WebSphere MQ clustering<br />
To understand how clustering works, consider this example. A sending<br />
application uses its queue manager to send a message to a receiving<br />
application. If the queue manager of the receiving application is offline, the<br />
clustering functionality automatically reroutes the message to another queue<br />
manager that hosts a clone of the receiving application. This function is totally<br />
transparent to the applications. Additional information about clustering can be<br />
found in WebSphere MQ Queue Manager Clusters, SC34-6061.<br />
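As a rough illustration of the routing behavior described above, the following Python sketch models a cluster that skips an offline queue manager when choosing a destination. The class and method names here are hypothetical, invented for this sketch; they are not WebSphere MQ APIs.

```python
# Toy sketch of cluster workload routing (illustrative only; the
# Cluster/QueueManager classes are hypothetical, not WebSphere MQ objects).
class QueueManager:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.queue = []

class Cluster:
    def __init__(self, members):
        self.members = members
        self.next = 0

    def route(self, message):
        """Round-robin a message to the next available queue manager,
        temporarily excluding any that are offline."""
        candidates = [m for m in self.members if m.online]
        if not candidates:
            raise RuntimeError("no queue manager available")
        target = candidates[self.next % len(candidates)]
        self.next += 1
        target.queue.append(message)
        return target.name

qm1, qm2 = QueueManager("QM1"), QueueManager("QM2")
cluster = Cluster([qm1, qm2])
cluster.route("msg1")          # first message goes to QM1
qm2.online = False             # simulate a failed queue manager
dest = cluster.route("msg2")   # rerouted: only QM1 is eligible now
```

Note that, as the text goes on to explain, messages already delivered to a failed queue manager are not rerouted by this mechanism; only subsequent messages avoid it.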
The ability to define these clustered queues on several of the queue managers in<br />
the cluster leads to increased system availability. Each queue manager runs<br />
equivalent instances of the applications that process the messages. If one of the<br />
queue managers fails, or the communication to it is suspended, that queue<br />
manager is temporarily excluded from the choice of destinations for the<br />
messages. This functionality provides a number of benefits, the main one being<br />
HA.<br />
There are a few points to consider when approaching WebSphere MQ clustering<br />
as the preferred choice for implementing HA. For example, if a queue manager<br />
fails, then although any subsequent messages sent to the cluster are not routed<br />
to the failed queue manager, any persistent messages already sent to the queue<br />
manager, but not yet processed by the broker, are marooned on this failed queue<br />
manager. Also, for WebSphere MQ clustering to effectively balance loads<br />
between queues on different queue managers, messages sent from outside the<br />
cluster need to be sent to a gateway queue manager. A gateway queue manager<br />
creates a single point of failure for the whole logical hub. Therefore, the gateway<br />
queue manager would need to have very high availability.<br />
Also, remember that in the event that the Message Broker falls over but the<br />
clustered queue manager is still operational, messages under a cluster queue<br />
environment are still sent to the queue manager. These messages are not<br />
processed until the Message Broker is once again functional. Unless there is a<br />
monitoring tool which registers that the Message Broker is inoperative,<br />
messages can build up unnoticed.<br />
A way of safeguarding against this situation is to employ a monitoring tool, which<br />
registers both the status of the Message Broker and the subsequent build up of<br />
messages on the broker input queue. If, for example, the normal buildup on the<br />
broker input queue is 50 messages, it might be wise to set an alert on the queue.<br />
If messages build to a number greater than 50, then an alert is sent to the<br />
monitoring tool, and an operator can determine if there is a problem.<br />
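The threshold check described above can be sketched in a few lines of Python. This is a hypothetical monitoring hook, not part of any product; the threshold of 50 comes from the example in the text.

```python
# Hypothetical queue-depth alert check; a real monitoring tool would sample
# the broker input queue depth through its own interfaces.
DEPTH_THRESHOLD = 50  # normal buildup on the broker input queue (from the text)

def check_queue_depth(depth, threshold=DEPTH_THRESHOLD):
    """Return an alert message when the backlog exceeds the normal buildup,
    so an operator can check whether the broker has stopped consuming."""
    if depth > threshold:
        return f"ALERT: broker input queue depth {depth} exceeds {threshold}"
    return None
```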
Table 2-2 shows the advantages and disadvantages of using WebSphere MQ<br />
clustering.<br />
Table 2-2 Advantages and disadvantages of using WebSphere MQ clustering<br />
Advantages:<br />
► Provides resilient failover because WebSphere MQ can redirect messages<br />
away from a failed queue manager to be picked up by a functional one.<br />
► Uses the available resources because WebSphere MQ clustering can direct<br />
messages across all available LPARs.<br />
► Takes advantage of cloned applications and identically configured brokers by<br />
workload balancing, thus reducing the risk of bottlenecks.<br />
Disadvantages:<br />
► Persistent messages awaiting processing on the queue of a failed queue<br />
manager remain there until the queue manager is operational again.<br />
► Gateway queue managers can act as a single point of failure for the entire<br />
WebSphere MQ cluster.<br />
► Multi-part messages and transactions can cause affinities to particular queue<br />
managers, thus reducing the availability of the system.<br />
► Should Message Broker fail while the queue manager is still functional, the<br />
WebSphere MQ cluster continues to send messages to that queue manager<br />
even though they are not processed.<br />
2.3.2 WebSphere MQ shared queues<br />
Now that we have examined the issues in using WebSphere MQ clustering to<br />
support HA, this section discusses WebSphere MQ shared queues. Shared<br />
queues rely heavily on the coupling facility and a shared DB2 database to allow<br />
queue managers to form a queue sharing group (QSG). Once the queue<br />
managers have formed the QSG, they can create queues that are available<br />
to all the queue managers in the group.<br />
This functionality sounds very similar to WebSphere MQ clustering and, in many<br />
respects, there is a certain amount of overlap in the service shared queues<br />
provide. However, a shared queue is one instance of a queue which is “shared”<br />
among a number of queue managers. This method differs from WebSphere MQ<br />
clustering in that the clustered queues, despite having the same designation, are<br />
actually separate entities. In practice, once a shared queue has been configured<br />
allowing a number of queue managers to access it, all of the brokers associated<br />
with those queue managers (or any application) could have access to that queue,<br />
despite existing on separate LPARs.<br />
A simple scenario which uses shared queues might resemble the following. A<br />
sending application puts messages to a shared queue. These messages are<br />
then available to all queue managers in the QSG across all relevant LPARs.<br />
Associated with these queue managers are brokers, all of which are retrieving<br />
messages from the same shared queue. Should an LPAR, queue manager, or a<br />
broker fall over, the other brokers would continue retrieving and processing<br />
messages from the shared queue.<br />
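The scenario above can be sketched as a toy simulation: two brokers pull from one shared queue, and when one fails, the survivor drains the remaining messages. The classes are hypothetical stand-ins, not WebSphere MQ objects.

```python
# Toy simulation of brokers pulling from a shared queue; the names and
# classes are illustrative assumptions, not product APIs.
from collections import deque

shared_queue = deque(f"msg{i}" for i in range(6))

class Broker:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.processed = []

    def get_one(self, queue):
        # A live broker retrieves the next message from the shared queue.
        if self.alive and queue:
            self.processed.append(queue.popleft())

brokers = [Broker("BRK1"), Broker("BRK2")]
brokers[1].alive = False            # simulate an LPAR/broker failure
while shared_queue:                 # the surviving broker keeps consuming
    for b in brokers:
        b.get_one(shared_queue)
```

Because consumption is pull-based, the failure of one broker leaves the queue fully drained by the other, which is the availability property the text describes.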
Because cloned applications and the brokers which serve them are able to<br />
connect to queue managers in a QSG and because all queue managers in a<br />
QSG can access shared queues, applications do not need to rely on the<br />
availability of any one queue manager. Should an LPAR, queue manager, or a<br />
Message Broker fall over, shared queues on the functioning system can continue<br />
to service cloned applications.<br />
There are distinct advantages for using shared queues as opposed to using<br />
WebSphere MQ clustering. For example, in the event that the Message Broker<br />
falls over but the queue manager is operational, messages under a cluster queue<br />
environment are still sent to the queue manager. These messages then wait on<br />
the queue until the Message Broker restarts. This situation can cause problems,<br />
especially if the business requires a quick turnaround response to the message.<br />
This situation does not occur when using shared queues, because the<br />
Message Broker retrieves the message from the shared queue directly,<br />
rather than the message being passed to a queue manager which supports<br />
only one Message Broker.<br />
Another advantage of using shared queues is that both applications and the<br />
broker can take advantage of serialization during shared queue peer recovery.<br />
As previously mentioned, if a queue manager should fail, then all the queue<br />
managers in the QSG continue to process messages on the shared queue.<br />
However, they also finish the shared queue work for the incomplete units of work<br />
which were running on the failed queue manager. One potential issue here is that<br />
during the process of rolling back uncommitted messages it is possible that<br />
another queue manager may attempt to process one of the messages in the unit<br />
of work still on the queue. In this case, the messages in the unit of work would<br />
then be out of sequence. By using the serialization mechanism, no other queue<br />
manager in the QSG can access any of the messages in the unit of work until a<br />
full roll back has been completed. This method ensures that the messages are<br />
processed in the correct order.<br />
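A toy model of this behavior follows, using hypothetical names (the real mechanism is WebSphere MQ serialization, not this API): messages locked under a failed queue manager's token are skipped by peers until the rollback completes, preserving message order.

```python
# Illustrative sketch of serialized access during peer recovery; the
# SharedQueue class and its methods are invented for this example.
class SharedQueue:
    def __init__(self, messages):
        self.messages = list(messages)
        self.locked_by = {}          # message -> serialization token

    def lock_unit_of_work(self, token, msgs):
        # Mark messages as belonging to an in-flight unit of work.
        for m in msgs:
            self.locked_by[m] = token

    def unlock(self, token):
        # Rollback complete: release all messages held under this token.
        self.locked_by = {m: t for m, t in self.locked_by.items() if t != token}

    def get(self, token):
        """Return the first message not locked under a different token."""
        for m in self.messages:
            holder = self.locked_by.get(m)
            if holder is None or holder == token:
                self.messages.remove(m)
                return m
        return None

q = SharedQueue(["uow-1", "uow-2", "other"])
q.lock_unit_of_work("QM1-token", ["uow-1", "uow-2"])  # QM1's in-flight unit of work
peer_got = q.get("QM2-token")      # a peer skips the locked messages
q.unlock("QM1-token")              # rollback done, messages released
next_got = q.get("QM2-token")      # now the peer can process them in order
```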
There are some restrictions to using shared queues on z/OS when it comes to HA.<br />
Possibly the most important restriction which should be taken into account is the<br />
maximum message size. For the current version, WebSphere MQ 5.3, the<br />
largest message which can be placed upon a shared queue is 63 KB. Thus, the<br />
largest message that can be transported via the highly available shared queue<br />
system is also 63 KB. In addition, there is currently a restriction of eight million<br />
messages which can be stored on a queue.<br />
Also, for WebSphere MQ 5.2, the shared queues do not support persistent<br />
messaging. However, the non-persistent messages on a shared queue do<br />
survive a queue manager restart. Thus, this method does have a form of<br />
resiliency (although technically, this is not persistence). With WebSphere MQ<br />
5.3, persistent messages are supported with shared queues. Remember that<br />
non-persistent messages do survive the queue manager restart. This situation<br />
can be advantageous unless an application specifically depends upon the<br />
non-persistent messages being deleted on the event of a queue manager restart.<br />
Chapter 2. Design decisions that affect high availability 15
In this instance, you should consider incorporating a policy for “cleaning up”<br />
these messages.<br />
One aspect of the availability of messages on shared queues which you should<br />
consider is the effect of using a two-phase commit. For example, consider a unit<br />
of work which retrieves a message from a shared queue, updates a DB2 table<br />
based on the contents of the message, and returns a response to a shared<br />
queue. A two-phase commit (2PC) protocol is used to ensure that either all or none of the processing<br />
happens, typically coordinated by Resource Recovery Services (RRS). If the<br />
queue manager where this unit of work is running were to fail during the<br />
two-phase commit, it is possible that the unit of work would be left indoubt in<br />
WebSphere MQ.<br />
In this case, the correct resolution of the unit of work cannot be determined until<br />
the queue manager is restarted and can reconnect with RRS. It is therefore not<br />
possible for the other queue managers to perform peer recovery for this unit of<br />
work (that is, the input message cannot be rolled back by a peer for processing<br />
via a different queue manager, which has an impact on the availability of the<br />
messages consumed and produced by that indoubt unit of work). The<br />
availability of other messages on the shared queue is not impacted unless<br />
serialization tokens are being used to ensure an ordering of processing<br />
messages on this queue. This is further explained in WebSphere MQ for z/OS<br />
System Administration Guide V5.3.1, SC34-6053-01.<br />
Note: When messages are put to a shared queue, the data is logged on a<br />
particular queue manager, but that process does not cause any kind of<br />
message affinity to a queue manager. The affinity is between a unit of work<br />
and a queue manager.<br />
Table 2-3 outlines the issues you should consider when looking at shared queues<br />
to support a highly available environment.<br />
Table 2-3 Advantages and disadvantages of using WebSphere MQ shared queues<br />
Advantages:<br />
► Resilient support of HA over multiple LPARs. Should a queue manager or a<br />
Message Broker fail, other brokers continue to retrieve messages.<br />
► Messages are pulled from the shared queue rather than pushed onto another<br />
queue manager’s clustered queue. Thus, if a Message Broker becomes<br />
inoperative, messages do not build up on the queue manager serving the<br />
broker. Instead, the messages remain on the shared queue to be picked up<br />
by a functional broker.<br />
► The ability to fully utilize the available resources, because shared queues<br />
allow messages to be processed across all available LPARs.<br />
► The ability to take advantage of cloned applications and identically configured<br />
brokers by allowing less busy applications and brokers to retrieve messages<br />
from the shared queue at their own pace, thus reducing the risk of<br />
bottlenecks.<br />
► The ability to take advantage of the serialization mechanism to ensure that<br />
backed-out units of work are completed in the correct order.<br />
Disadvantages:<br />
► Maximum message size of 63 KB. Even if a current system does not have<br />
messages of that size or larger, this places a restriction on the natural growth<br />
of the system. However, this restriction is increased to 100 MB in the next<br />
release of WebSphere MQ by further use of DB2.<br />
► A limitation of the coupling facility storage means that no more than eight<br />
million messages can be stored on a queue.<br />
► Non-persistent messages which survive queue manager restarts can be an<br />
issue for the application. If an application depends on non-persistent<br />
messages not surviving a queue manager restart, it is possible you will need<br />
to add functionality to deal with this issue.<br />
► Depending on how the coupling facility is being used, the storage taken up by<br />
shared queues could well be relatively expensive.<br />
2.4 Message Broker flow design<br />
There are a number of points to keep in mind with HA in the design of message<br />
flows.<br />
2.4.1 Affinities<br />
Message affinities can arise when multiple messages are required to make up a<br />
single business transaction or when messages must be processed in a specific<br />
order. This situation is also true when a Message Broker is forced to keep state,<br />
as is the case when using the aggregation node. This type of situation can often<br />
mean that a particular queue manager or Message Broker is required to process<br />
these messages. In terms of HA, this affinity can lead to vulnerabilities. If a<br />
particular queue manager or Message Broker is required to process a long string<br />
of messages making up a transaction, then that queue manager or Message<br />
Broker can become a single point of failure for that transaction. This situation<br />
creates a potentially vulnerable point in the system, because the queue manager<br />
or Message Broker requires messages to be processed through the same<br />
thread, broker, and so on.<br />
However, your business needs may make it necessary to introduce affinities into<br />
the Message Broker infrastructure. In such a situation, you should be aware of<br />
the issues that introducing affinities may cause. One way to alleviate some of<br />
these issues is to run transactions under transaction coordination, so that in the<br />
event that the queue manager or the Message Broker falls over, the transaction<br />
is rolled back. Doing so means that the transaction can then be re-processed by<br />
another instance of the Message Broker.<br />
Publish and Subscribe functionality is an example of where a user application<br />
can develop an affinity to a specific broker. Unless cloning is used, once a<br />
subscription has been registered at a specific broker, only that broker can deliver<br />
publications to the subscriber. This situation applies specifically to RealTime,<br />
Multicast, and Telemetry transports because these applications establish a direct<br />
TCP/IP connection to the client, which is broken if the server goes down.<br />
2.4.2 Error processing<br />
To ensure that the system is as highly available as possible, it is worthwhile to<br />
understand how Message Broker handles error processing. Message Broker has<br />
well-defined procedures for dealing with invalid messages. Not all of them,<br />
however, are conducive to HA.<br />
While processing messages, the normal procedure for the broker when<br />
encountering an invalid message is to place the message back on the input<br />
queue. The broker then retries the message until the retry count is reached. At<br />
this point, the broker simply places the message back on the queue and ceases<br />
processing it. When it comes to supporting HA, this behavior causes a problem.<br />
Once an invalid message is placed back on the input queue, other potentially<br />
valid messages build up on the queue behind it. This buildup of messages<br />
continues until the invalid message is removed. This situation is not ideal in an<br />
HA environment. To work around this type of situation, you will need to<br />
incorporate some tailored error processing. The error processing can be as<br />
simple or as complex as you prefer, from simply putting the invalid message onto<br />
an error queue, to creating an error processing subflow supported with Try Catch<br />
functionality. The important factor is to remove the invalid message from the input<br />
queue, thereby preventing valid messages from building up behind the invalid<br />
message.<br />
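A minimal sketch of such tailored error processing follows, assuming a simple retry counter and an error queue (hypothetical names, not broker API calls): after the retry limit is reached, the invalid message is moved aside so that valid messages behind it can flow.

```python
# Illustrative retry-then-error-queue handling; RETRY_LIMIT and the helper
# functions are assumptions made for this sketch, not broker behavior APIs.
from collections import deque

RETRY_LIMIT = 3

def process(msg):
    # Stand-in for message flow processing; "bad" messages always fail.
    if msg.startswith("bad"):
        raise ValueError("invalid message")
    return f"handled {msg}"

def drain(input_queue, error_queue):
    handled = []
    retries = {}
    while input_queue:
        msg = input_queue[0]
        try:
            handled.append(process(msg))
            input_queue.popleft()
        except ValueError:
            retries[msg] = retries.get(msg, 0) + 1
            if retries[msg] >= RETRY_LIMIT:
                # Move the poison message aside to unblock the queue.
                error_queue.append(input_queue.popleft())
    return handled

inq = deque(["m1", "bad-m2", "m3"])
errq = deque()
results = drain(inq, errq)
```

The key point mirrors the text: once the invalid message lands on the error queue, the valid message behind it is processed instead of waiting indefinitely.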
2.5 Message Broker networks<br />
A popular feature of Message Broker is its support for Publish and<br />
Subscribe functionality. A subscriber can connect to a broker in the Publish and<br />
Subscribe topology and receive publications made on that broker or others in the<br />
network. In the default case, the subscriber has an affinity to the broker with<br />
which it has registered. There are various ways to reduce this affinity. For<br />
WebSphere MQ subscribers, a solution is to use the Cloned Broker feature,<br />
which allows a subscriber to receive its publications directly from several different<br />
brokers, thus reducing the affinity.<br />
Note: Cloned brokers cannot be used with other broker topologies, such as<br />
hierarchies and collectives. For subscribers using the RealTime transport, this<br />
option is not available because a connection is established directly with a<br />
particular broker and the subscription is removed if the connection is broken.<br />
This paper does not discuss Publish and Subscribe applications in detail, but<br />
there is more information about HA for Publish and Subscribe applications in the<br />
Redbook WebSphere Business Integration Pub/Sub Solutions, SG24-6088.<br />
2.5.1 Further considerations with Message Broker networks<br />
When working with Message Broker networks, you should also consider the User<br />
Name Server’s function. This function provides a level of security for the Publish<br />
and Subscribe function on Message Broker. Overall, the User Name Server<br />
provides an excellent level of service across the WebSphere MQ supported<br />
platforms. However, an examination of the User Name Server with regard to HA<br />
has not been outlined in this document because the User Name Server is not<br />
particularly well suited to the large scale, high volume type of application typically<br />
found on z/OS. On z/OS, the User Name Server periodically accesses RACF® to<br />
match access control privileges against the user IDs which require access to<br />
topics, resulting in a rebuild of the cache.<br />
While this is not a significant overhead when dealing with moderate numbers of<br />
user IDs, it can become more significant once the number of user IDs begins to<br />
grow. Given the nature and scale of applications which are based on z/OS and<br />
the subsequent number of user IDs and RACF definitions, the User Name<br />
Server may not scale particularly well in this environment. This situation may<br />
potentially cause performance problems and be expensive in the amount of<br />
resources required to frequently check RACF definitions against a large number<br />
of user IDs.<br />
Chapter 3. Topology and system setup<br />
This chapter describes the topology and system setup of HA Message Broker<br />
environments on z/OS, and more specifically, the environment we used to create<br />
the failover scenarios tested in Chapter 4, “Failover scenarios” on page 33.<br />
In this chapter, you can find information about:<br />
► High Availability configurations.<br />
► Test environment topology.<br />
► The z/OS LPARs.<br />
► The DB2 data sharing group configuration.<br />
► The WebSphere MQ queue sharing group configuration.<br />
► The WebSphere Business Integration Message Broker configuration.<br />
► Automatic Restart Management configuration.<br />
► The configuration manager platform.<br />
► An overview of WebSphere Business Integration Message Broker<br />
SupportPac IP13.<br />
3.1 High Availability configurations<br />
3.1.1 Active-active<br />
3.1.2 Active-passive<br />
A high availability environment with WebSphere Business Integration Message<br />
Broker on z/OS requires the use of WebSphere MQ queue sharing groups as<br />
described in Chapter 2, “Design decisions that affect high availability” on page 9.<br />
That setup in turn implies the need for a coupling facility and at least two<br />
z/OS systems in a sysplex that host a DB2 data sharing group (DSG).<br />
These components, employing two z/OS images for simplicity, form the basis of<br />
the following HA configuration descriptions and, in the next section, the test<br />
environment that we used for this paper. The selection of either one of the<br />
configurations described below depends on your business need and the<br />
available capacity of your environment.<br />
An active-active setup describes a business environment where both z/OS<br />
images run WebSphere Business Integration Message Broker to process<br />
messages from the shared queue. Both systems process the business<br />
application load. In the event of a failure on one image, the other is still available<br />
to carry on processing alone for the duration of the recovery. While this setup<br />
provides HA and fully utilizes the two machines in daily operation, it may cause<br />
performance degradation during recovery.<br />
An active-passive setup describes a business environment where only one of<br />
the z/OS images runs the daily broker business. In the event of a failure on this<br />
image, the second z/OS image is available on hot standby to pick up the load<br />
from the shared queue. While this provides for HA and also maintains throughput<br />
during a failure, in normal operation only one machine is being fully utilized.<br />
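The trade-off described above is ultimately a capacity-planning question: in an active-active pair, a surviving image must absorb both workloads for the duration of the recovery. The following back-of-envelope sketch illustrates the point; the utilization figures are illustrative only, not measurements from this environment.

```python
# Back-of-envelope check for active-active capacity planning: when one image
# fails, the survivor must absorb the whole workload during recovery.
def failover_headroom(util_per_image_pct):
    """Return the survivor's utilization after failover, and whether it fits."""
    survivor_util = 2 * util_per_image_pct
    return survivor_util, survivor_util <= 100

print(failover_headroom(40))  # (80, True): room to carry both loads
print(failover_headroom(60))  # (120, False): expect degradation during recovery
```

If each image normally runs above 50% utilization, some performance degradation during recovery is unavoidable; the active-passive setup avoids this at the cost of an idle standby.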
3.2 Test environment topology<br />
The HA environment that we used for this paper employed two z/OS images to<br />
process work in an “active-active” configuration. Figure 3-1 illustrates the<br />
topology of the system components listed in Chapter 4, “Failover scenarios” on<br />
page 33.<br />
Figure 3-1 Queue sharing group topology<br />
Two z/OS systems in a sysplex host a DB2 data sharing group and a WebSphere<br />
MQ queue sharing group. A WebSphere Business Integration Message Broker<br />
runs on each system, connected to the respective queue manager. The brokers<br />
are administered by the configuration manager installed on a ThinkPad running<br />
Windows. The application workload to test the configuration is supplied by<br />
SupportPac IP13.<br />
3.3 The z/OS LPARs<br />
There are three LPARs in the sysplex running on a 9672 model XZ7 with an<br />
internal coupling facility. For simplicity, only two of the LPARs are used for the<br />
active-active configuration, as depicted in Figure 3-1 on page 23. One of the<br />
LPARs, MVSM0, is connected to two logical processors while the second,<br />
MVSM2, to just one. MVSM0 has 1536 MB of real storage configured, while<br />
MVSM2 has 1024 MB.<br />
Otherwise, the MVS images have similar configurations with RRS, DB2,<br />
WebSphere MQ, WebSphere Business Integration Message Broker, and TCP/IP<br />
all active. UNIX® System Services is running with shared Hierarchical File<br />
System (HFS). Both images are running z/OS 1.4.<br />
3.4 The DB2 data sharing group configuration<br />
The DB2 subsystems are data sharing and accessible as an Open Database<br />
Connectivity (ODBC) data source via a Distributed Data Facility (DDF) that is<br />
using TCP/IP. The DB2 subsystems in data sharing group DSN710PM are DFM0<br />
and DFM2, running on MVSM0 and MVSM2 respectively. Figure 3-2 displays the<br />
data sharing group.<br />
Figure 3-2 The DB2 data sharing group<br />
3.5 The WebSphere MQ queue sharing group<br />
configuration<br />
A queue manager is set up on each system and defined to a QSG. To create the<br />
QSG, we defined some new structures to the coupling facility. The procedure for<br />
this is described in the WebSphere MQ z/OS System Setup Guide, which can be<br />
found at:<br />
http://www-306.ibm.com/software/integration/mqfamily/library/manualsa/manuals/platspecific.html#zos<br />
The queue managers in queue sharing group MB01 are WMQ0 and WMQ2,<br />
running on MVSM0 and MVSM2 respectively. Figure 3-3 shows the QSG.<br />
Figure 3-3 The queue sharing group<br />
3.5.1 Queue sharing group configuration considerations<br />
Some things you should consider when configuring the QSG are:<br />
► If you are using the z/OS Automatic Restart Management (ARM) to restart<br />
queue managers on different z/OS images, then:<br />
– Define every queue manager with a sysplex-wide, unique four character<br />
subsystem name that uses a command prefix string (CPF) scope of S.<br />
– Configure each queue manager with a different channel listener port. In<br />
the event of a system failure, this configuration is important if there is a<br />
requirement to restart the failing queue manager on another system in the<br />
sysplex that is already running a queue manager.<br />
► Use the INITSIZE value of 10 MB as provided in the sample job CSQ4CFRM<br />
when you define the admin structure. Specify a larger amount than this for the<br />
SIZE value so that it can expand. You should also consider adding the<br />
ALLOWAUTOALT(YES) parameter to allow system-initiated alters<br />
(automatic-alter) for this structure.<br />
► Review actual structure sizes regularly and make sure the coupling facility<br />
resource management (CFRM) policy is updated to reflect actual usage.<br />
Application structure allocations grow with use.<br />
► Note that the admin structure is a single point of failure. If a second<br />
coupling facility is available, duplexing of the structure should be<br />
considered.<br />
► Define the queue manager logs with SHAREOPTIONS(2 3) as in the Job<br />
Control Language (JCL) of sample job CSQ4BSDS, because shared queue<br />
recovery requires that a queue manager can access the logs of peers within<br />
the QSG.<br />
► Define the application CFSTRUCT with CFLEVEL(3) and RECOVER(YES) when using<br />
persistent messages (the defaults are CFLEVEL(2) and RECOVER(NO)).<br />
► If using persistent messages, review log sizes prior to migration to shared<br />
queues. Data logged is slightly larger, and BACKUP CFSTRUCT copies<br />
shared queue messages to the log. Non-persistent messages on a private<br />
queue may be logged under some circumstances (for instance, if the<br />
message stays on the queue for an extended length of time), but this situation<br />
does not occur for non-persistent messages that reside on a shared queue.<br />
► Prevent Hierarchical Storage Manager (HSM) from migrating the queue<br />
manager DB2 tables. If the tables are migrated, you may experience startup<br />
problems.<br />
► Do not attempt to manually change QSG DB2 tables unless directed by <strong>IBM</strong><br />
support. Even apparently innocuous changes may leave DB2 table<br />
information out of sync with the coupling facility or with other tables.<br />
► Allow a QSG for a group listener (using VIPA) and shared channels, which<br />
may be useful in providing HA for WebSphere MQ applications.<br />
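Several of the points above come together in the definition of a recoverable application structure. As a hedged sketch only, an MQSC definition might look like the following; the structure name APPL1 is illustrative, not one used in this environment:

```mqsc
* Illustrative only: application structure able to hold persistent
* messages, so that BACKUP/RECOVER CFSTRUCT can be used.
DEFINE CFSTRUCT(APPL1) +
       CFLEVEL(3) +
       RECOVER(YES)
```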
3.6 The WebSphere Business Integration Message<br />
Broker configuration<br />
A message broker is created for each of the queue managers described<br />
previously. The message broker names are WMQ0BRK and WMQ2BRK, running<br />
on MVSM0 and MVSM2 respectively.<br />
You can find instructions for creating the broker at:<br />
http://publib.boulder.ibm.com/infocenter/wbihelp/index.jsp<br />
3.6.1 Message Broker configuration considerations<br />
Some things to consider when configuring the Message Broker are:<br />
► Each Message broker connected to the QSG should run under a different<br />
user ID. One of the reasons for this configuration is that the broker tables<br />
must be created with unique names in the DB2 data sharing group. The user<br />
ID, which is used for variable DB2_TABLE_OWNER in the mqsicompcif file, is<br />
prepended to the broker table names to form the fully qualified unique table<br />
names.<br />
► If the desired response to a system failure is to use ARM to restart a broker,<br />
with its queue manager on another z/OS system in the sysplex, then consider<br />
the following:<br />
– The UNIX System Services (USS) environment across the sysplex should<br />
be configured to use shared HFS.<br />
– The broker root directories must not be created under the /var directory.<br />
This is because the /var directory resolves to &SYSNAME/var and thus is<br />
system specific. System specific HFSs are unmounted when the owning<br />
system goes down and are not available to the rest of the sysplex while<br />
that system remains down.<br />
Instead, create a new directory under the sysplex root and create the<br />
broker root directories under this same directory.<br />
For example:<br />
mkdir /wbimb<br />
mkdir /wbimb/WMQ0BRK<br />
The root directory for any given broker is now visible to each z/OS system<br />
in the sysplex and is not unmounted if its host system goes down.<br />
Depending on the number of brokers created, you may also want to create<br />
additional HFSs to be mounted at each broker root directory, or one larger<br />
HFS at the higher directory, in order to prevent filling up the sysplex root<br />
HFS mounted at '/'.<br />
– When editing the mqsicompcif file, the DB2 group attach name (in this<br />
case DFPM) should be used for the DB2_SUBSYSTEM variable rather<br />
than specifying a specific DB2 subsystem. This configuration means that<br />
the broker can use another DB2 subsystem in the data sharing group to<br />
access its tables in the event of failure.<br />
– Use of BP0 as the DB2 buffer pool chosen for the broker database is not<br />
recommended. Furthermore, to enable the broker to restart on another<br />
system in the sysplex the buffer pool selected needs to be active on that<br />
system. Use the alter bufferpool command to activate it. If the buffer pool is<br />
not available, errors occur on the system to which the broker is moved.<br />
Figure 3-4 lists those errors.<br />
Figure 3-4 Buffer Pool 2 is not available to DB2 subsystem DFM0<br />
Figure 3-4 Buffer Pool 2 is not available to DB2 subsystem DFM0<br />
– Having defined the local broker DB2 buffer pools on the relevant systems,<br />
you must then allocate a global buffer pool to allow data to be shared<br />
between the DB2 subsystems. This configuration requires defining a new<br />
structure to the coupling facility, for example called:<br />
DSN710PM_GBP2<br />
If the global buffer pool is not available, an error is written to the system<br />
log of the system to which the broker has been moved. Figure 3-5 illustrates<br />
this error.<br />
Figure 3-5 Global Buffer Pool 2 not defined<br />
3.6.2 Additional Message Broker configuration hints<br />
The following are additional Message Broker configuration hints:<br />
► Before running the mqsicreatebroker command make sure you have<br />
completed the actions in the section Setting up your OMVS user ID with<br />
instructions on setting the USS environment variables PATH and NLSPATH.<br />
Otherwise, the command and the corresponding messages are not found.<br />
You can find Setting up your OMVS user ID in the Message Broker section of<br />
the <strong>IBM</strong> WebSphere Business Integration Information Center z/OS section,<br />
under the chapter Configuring the Broker Domain.<br />
The Information Center is online at:<br />
http://publib.boulder.ibm.com/infocenter/wbihelp/index.jsp<br />
► The SYSTEM.BROKER.* queues must be private queues defined to the<br />
broker’s queue manager. They cannot be shared.<br />
► To enable the broker to use ARM to restart it, customize the ARM section of<br />
the mqsicompcif file and run the mqsicustomize program. The following are<br />
example lines from mqsicompcif:<br />
USE_ARM=’YES’<br />
ARM_ELEMENTNAME=’WMQ0BRK’<br />
ARM_ELEMENTTYPE=’SYSWMQI’<br />
You can find a downloadable sample ZIP file in Appendix B, “Additional<br />
material” on page 53.<br />
3.7 Automatic Restart Management configuration<br />
On z/OS in a sysplex environment, a program can enhance its recovery potential<br />
by registering as an element of ARM. ARM reduces the impact of an unexpected<br />
failure because MVS can restart the element automatically without operator<br />
intervention. Program recovery via ARM is enabled by activating an ARM policy<br />
using the SETXCF START command.<br />
In this environment, the active ARM policy is set to restart the following:<br />
► DB2<br />
► WebSphere MQ<br />
► WebSphere Business Integration Message Broker<br />
If any of these elements fails, MVS restarts it in place. However, in the event<br />
of a system failure, these elements are restarted on another system in the<br />
sysplex. In this case, DB2 is only restarted in 'light' mode on another system.<br />
Restart light<br />
enables DB2 to restart with minimal storage footprint to quickly release retained<br />
locks and then terminate normally. Additionally, to enable Internal Resource Lock<br />
Manager (IRLM) to obtain the full benefits of a restart light, the ARM policy for the<br />
IRLM element should specify PC=YES.<br />
The JCL used to create the ARM policy for our testing environment is illustrated<br />
in Appendix A, “Sample code” on page 49. A ZIP file containing this JCL is<br />
available for download in Appendix B, “Additional material” on page 53.<br />
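As a hedged sketch only, an ARM policy definition job has the following general shape; the policy, group, and element names here are illustrative, and the actual JCL used in this environment is the one in Appendix A, “Sample code” on page 49:

```jcl
//ARMPOL   JOB (ACCT),'DEFINE ARM POLICY'
//DEFINE   EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(ARM)
  DEFINE POLICY NAME(WBIMBPOL) REPLACE(YES)
    RESTART_GROUP(WBIMBGRP)
      ELEMENT(WMQ0BRK)
        RESTART_ATTEMPTS(3)
        TERMTYPE(ALLTERM)
/*
```

The policy is then activated with a command of the form SETXCF START,POLICY,TYPE=ARM,POLNAME=WBIMBPOL.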
3.8 The configuration manager platform<br />
All components of WebSphere Business Integration Message Broker V5 Fix<br />
Pack 3 were installed on a T40 ThinkPad along with WebSphere MQ V5.3 FP5<br />
and DB2 V8.1 FP2. A set of WebSphere MQ channels was built between the<br />
mobile computer queue manager and each z/OS queue manager.<br />
The SupportPac IP13 message flows were imported into the workspace as per<br />
installation instructions.<br />
A .bar file was built for deployment of the DB2U message flow. The .bar file was<br />
deployed to both brokers.<br />
3.9 An overview of WebSphere Business Integration<br />
Message Broker SupportPac IP13<br />
<strong>IBM</strong> provides SupportPac IP13 that can be used to check the setup of a z/OS<br />
system and its WebSphere MQ and WebSphere Business Integration Message<br />
Broker configuration. SupportPac IP13 includes example flows and programs<br />
with documentation that facilitates the capability to do quick performance and<br />
health checks on your z/OS system.<br />
The SupportPac IP13 is available online at the following Web address:<br />
http://www-1.ibm.com/support/docview.wss?rs=203&uid=swg24006892&loc=en_US&cs=utf-8&lang=en<br />
SupportPac IP13 is used in the testing environment for this paper to provide work<br />
for the brokers and to measure transaction rates. Broker statistics are used in<br />
conjunction with SupportPac IP13 transaction rate data to illustrate the effects of<br />
the various scenarios tested in Chapter 4, “Failover scenarios” on page 33.<br />
Chapter 4. Failover scenarios<br />
This chapter describes the high availability failover scenarios that we tested. The<br />
resultant application and broker statistics are shown along with our conclusions<br />
and explanations.<br />
This chapter describes the following failover scenarios:<br />
► Scenario 1 - Initial state with all components active.<br />
► Scenario 2 - Execution group failover.<br />
► Scenario 3 - Message Broker failover.<br />
► Scenario 4 - Queue manager failover.<br />
► Scenario 5 - DB2 failover.<br />
► Scenario 6 - z/OS system failover.<br />
4.1 Test environment setup<br />
As previously mentioned, the SupportPac IP13 batch job (OEMPUTX) was used<br />
to drive a message load to a shared queue.<br />
Note: For further information about all components of the <strong>IBM</strong> Category 2<br />
SupportPac IP13 refer to the documentation available from:<br />
http://www-306.ibm.com/software/integration/support/supportpacs/<br />
The SupportPac IP13 message flow (DB2U) running on both brokers consumed<br />
these messages, then placed reply messages on a second shared queue. The<br />
reply messages were subsequently picked up by OEMPUTX, completing the<br />
request-reply loop. Statistics generated by OEMPUTX were compared to<br />
statistics generated by the Message Broker and the results evaluated. The DB2U<br />
message flow also updates a DB2 database called SHAREPRICES.<br />
We executed all test scenarios at least three times to provide reasonable<br />
accuracy in reporting statistics.<br />
Transaction rates for the applications (running on the MVSM0 and MVSM2<br />
LPARs) and the brokers (WMQ0BRK, WMQ2BRK) can be compared only within<br />
a given test scenario. Comparison of these rates between test scenarios is<br />
meaningless.<br />
4.1.1 SupportPac IP13 setup<br />
This section provides details on the SupportPac IP13 setup.<br />
DB2 configuration<br />
JCL is supplied with SupportPac IP13 to both set up the application environment<br />
and run the tests. Job JDB2DEFS creates a DB2 SHAREPRICES table and<br />
inserts some initial data.<br />
Since this table is unique, the JDB2DEFS job should be run only once. However,<br />
the JCL in this job would normally create the table using the user ID of one<br />
broker as the schema name. When the other broker later tries to access the<br />
table, it refers to it under a different schema: the user ID under which it<br />
runs. This is because each broker's dsnaoini file sets the CURRENTSQLID to the<br />
user ID of that broker. As a result, the second broker cannot access the table.<br />
To avoid this problem in the environment for this paper, broker 2 was given<br />
authority to set its CURRENTSQLID to the user ID of broker 0. However, a better<br />
solution for setting up SupportPac IP13 to run in a sysplex is to follow these<br />
steps:<br />
1. Create a new RACF group that will be the schema name for the unique<br />
SupportPac IP13 DB2 table.<br />
2. Connect the user IDs under which the broker started tasks run to the<br />
previously created RACF group.<br />
3. Create the SupportPac IP13 SHAREPRICES table using job JDB2DEFS,<br />
inserting the new RACF group name as the schema.<br />
4. SET CURRENTSQLID in the ESQL of the message flow. See DB2U message<br />
flow configuration below for details.<br />
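Steps 1 and 2 above might look like the following RACF commands; the group name IP13GRP and the broker started-task user IDs USRBRK0 and USRBRK2 are illustrative, not names from this environment:

```text
ADDGROUP IP13GRP
CONNECT  (USRBRK0 USRBRK2) GROUP(IP13GRP)
```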
OEMPUTX batch job configuration<br />
There are several parameters that can be passed to OEMPUTX in order to<br />
cause different behavior of the application. The parameters we chose for our<br />
testing included:<br />
-n25000 Total number of messages to put, in this case 25000.<br />
-m4 Causes the program to run for four minutes. In the test<br />
environment, no single broker was capable of processing 25000<br />
messages within four minutes, so these first two parameters<br />
provided a steady-state message flow rate.<br />
-gm Causes OEMPUTX to use the same MQMD.MsgId for all MQPUTs in<br />
the loop, and to MQGET replies by this MsgId. With this option,<br />
there is no Message Broker affinity.<br />
-w45 Sets the MQGET MQGMO.WaitInterval, in this case to 45 seconds.<br />
-c Commit every msgs_in_loop messages. msgs_in_loop was not<br />
set, and the default is one message per batch.<br />
We also used non-persistent messages.<br />
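Combined, the parameter string passed to OEMPUTX for these runs would therefore resemble the following; the exact way the string is supplied depends on the JCL shipped with the SupportPac:

```text
-n25000 -m4 -gm -w45 -c
```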
DB2U message flow configuration<br />
Recall that in each broker’s dsnaoini file, the CURRENTSQLID is set to the user<br />
ID of that broker. This configuration would normally prevent brokers from<br />
accessing DB2 tables outside of their schema. To circumvent this behavior:<br />
► Add the following line of ESQL to the DB2U message flow to set the<br />
CURRENTSQLID to the RACF group name created for the SupportPac IP13<br />
SHAREPRICES table (see Chapter 3, “Topology and system setup” on<br />
page 21). This value has to have single quotes surrounding it for z/OS DB2 to<br />
process it correctly, as shown in the example below:<br />
PASSTHRU('SET CURRENT SQLID= ''SYSDSP''');<br />
► In a production environment, you may prefer to pass the CURRENTSQLID as<br />
part of the input message and perform the set via a variable. We did not test<br />
the following ESQL, but we have provided it as a sample:<br />
DECLARE ID CHARACTER;<br />
SET ID = InputBody.My.CurrentSQLID;<br />
PASSTHRU('{SET SQLID = (?)}', ID);<br />
4.1.2 Message Broker configuration<br />
In our testing, we used WebSphere Business Integration Message Broker<br />
without any tuning. We built a single execution group for each broker, and the<br />
same .bar file was deployed to the execution group. The final task necessary to<br />
run the tests was to turn message flow accounting (archive data) on. Statistical<br />
data is written to a message queue based on the collection interval defined for<br />
the broker. The first step is to build a subscription to the Message Broker to<br />
publish statistics. You can build the subscription as follows:<br />
► The subscription topic is: $SYS/Broker/+/StatisticsAccounting/#<br />
► This subscription must be put to the SYSTEM.BROKER.CONTROL.QUEUE<br />
► Specify a private queue to publish the statistics to, for example:<br />
STATS.IN.WMQ0BRK<br />
The IH03 SupportPac (RFHUTIL) was used to put the subscription and retrieve<br />
the XML statistics messages as they were produced. Once a statistics message<br />
is successfully read into RFHUTIL, select the Data tab. Then, select the XML<br />
radio button under Data Format. The XML tags of the statistics message are self<br />
explanatory.<br />
For easier viewing, you can import this file into Excel using the following<br />
procedure:<br />
1. From the Data tab panel in RFHUTIL, press Ctrl+A to select the XML<br />
message, then Ctrl+C to copy it.<br />
2. Paste the data into WordPad (not NotePad). Save the data to a file.<br />
3. This file can then be imported into Excel. In Excel, select File → Open, then<br />
navigate to the directory you just saved the WordPad file in. At the bottom of<br />
the navigation screen, select All File Types and select the file.<br />
4. The Text Import Wizard opens. Click Next.<br />
5. On the second panel of the Text Import Wizard, in the Delimiter section,<br />
select Other, and in the box to the right of Other type a double quote (").<br />
Then select Finish. With a little re-sizing, you can easily read the<br />
statistics in the spreadsheet.<br />
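As an alternative to the spreadsheet route, the XML statistics message can be parsed programmatically. The sketch below is illustrative only: the element and attribute names do not reproduce the exact accounting message schema, which, as noted above, is self-explanatory in the real messages.

```python
import xml.etree.ElementTree as ET

# Toy statistics record; names are illustrative, not the real broker schema.
SAMPLE = (
    '<WMQIStatisticsAccounting RecordType="Archive">'
    '<MessageFlow MessageFlowName="DB2U" TotalInputMessages="9234" '
    'TotalElapsedTime="96.401" TotalCPUTime="31.590"/>'
    '</WMQIStatisticsAccounting>'
)

def summarize(xml_text):
    """Pull the headline figures out of one statistics record."""
    flow = ET.fromstring(xml_text).find("MessageFlow")
    return (
        flow.get("MessageFlowName"),
        int(flow.get("TotalInputMessages")),
        float(flow.get("TotalElapsedTime")),
    )

name, msgs, elapsed = summarize(SAMPLE)
print(name, msgs, elapsed)  # DB2U 9234 96.401
```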
In order for the Message Broker to publish statistics, the following command was<br />
issued at the MVS console:<br />
f WMQ0BRK,cs a=yes,g=yes,j=yes,n=basic,t=basic,o=xml,c=active<br />
4.2 Scenario 1 - Initial state with all components active<br />
The first scenario that was measured involved all components of the environment<br />
active in their normal functioning state. The SupportPac IP13 batch jobs<br />
submitted to both systems allowed statistics to be gathered for this control<br />
situation.<br />
Figure 4-1 displays the configuration.<br />
Figure 4-1 All components active<br />
The batch jobs provided work for the brokers for two minutes. The results from<br />
the SupportPac IP13 measurements and the Message Broker statistics recorded<br />
are displayed in Table 4-1.<br />
Table 4-1 SupportPac IP13 and Message Broker statistics<br />
IP13 Statistics MVSM0 MVSM2 Total<br />
Total Transactions 9884 9568 19450<br />
Elapsed Time (seconds) 119.942 119.615<br />
Application CPU Time (seconds) 9.419 11.479<br />
Transaction Rate (trans/sec) 82.406 79.973<br />
Round trip per msg (ms) 12.134 12.504<br />
Average App CPU per msg (ms) 0.952 1.199<br />
Broker Statistics WMQ0BRK WMQ2BRK<br />
Total Number Input Messages 9234 10216 19450<br />
Total Elapsed Time (seconds) 96.401 98.762<br />
Total CPU Time (seconds) 31.590 33.167<br />
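The derived columns in Table 4-1 follow directly from the raw counts (our reading of how the SupportPac reports them): the transaction rate is total transactions divided by elapsed time, and the average application CPU per message is CPU time divided by transactions. A quick check against the MVSM0 column:

```python
# MVSM0 figures from Table 4-1.
total_transactions = 9884
elapsed_seconds = 119.942
app_cpu_seconds = 9.419

rate = total_transactions / elapsed_seconds  # table reports 82.406 trans/sec
cpu_per_msg_ms = app_cpu_seconds / total_transactions * 1000  # table: 0.952 ms

print(round(rate, 2), round(cpu_per_msg_ms, 2))
```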
Conclusions<br />
From the results, we concluded the following:<br />
► The total number of transactions processed by the SupportPac IP13<br />
applications on both systems equals the total number of messages processed<br />
by each of the two brokers.<br />
► A greater number of SupportPac IP13 application transactions are processed<br />
by the MVSM0 LPAR, while a greater number of messages are processed by<br />
the broker on MVSM2. This slight imbalance is due to the different resources<br />
available to each system.<br />
4.3 Scenario 2 - Execution group failover<br />
In the second scenario, the execution group for WMQ2BRK was made to fail by<br />
issuing a cancel command (C SAMPLE2) at the MVS console, as illustrated in<br />
Figure 4-2 on page 39. Message Broker execution groups are automatically<br />
recovered by the Message Broker, so no action is required by ARM. For this and<br />
all subsequent tests, Coordinated Transaction was selected in the deployment<br />
descriptor of the .bar file.<br />
38 High Availability z/OS Solutions for WebSphere Business Integration Message Broker V5
Figure 4-2 Execution Group failover<br />
The batch jobs provided work for the brokers for four minutes. Table 4-2 records<br />
the results from the SupportPac IP13 measurements and the Message Broker<br />
statistics.<br />
Table 4-2 SupportPac IP13 and Message Broker statistics<br />
IP13 Statistics MVSM0 MVSM2 Total<br />
Total Transactions 15212 14584 29796<br />
Elapsed Time (seconds) 239.294 239.258<br />
Application CPU Time (seconds) 17.172 18.454<br />
Transaction Rate (trans/sec) 63.570 60.955<br />
Round trip per msg (ms) 15.730 16.405<br />
Average App CPU per msg (ms) 1.128 1.265<br />
Broker Statistics WMQ0BRK WMQ2BRK<br />
Total Number Input Messages 17051 12010 29061<br />
Total Elapsed Time (seconds) 181.542 124.319<br />
Total CPU Time (seconds) 61.153 42.033<br />
Conclusions<br />
From the results, we concluded the following:<br />
► At first glance, it would seem as though messages were lost under this test,<br />
but this is not the case for this or any other test scenario. When the execution<br />
group is cancelled, the statistics message is dumped by the broker. Upon<br />
successful execution group startup, a new statistics message is created, the<br />
interval time is reset, and accounting starts fresh. Since the execution group<br />
was cancelled very close to the beginning of the test and it recovered quickly,<br />
there are 735 messages that WMQ2BRK processed that are not accounted<br />
for in the above table.<br />
► The high number of messages consumed by WMQ2BRK underscores the<br />
speed of the execution group recovery.<br />
► The higher number of messages consumed by WMQ0BRK compared to<br />
transactions completed on MVSM0 illustrates how broker 0 processes<br />
messages from the SupportPac IP13 batch jobs on both systems, taking the<br />
extra load while the execution group is down on MVSM2.<br />
4.4 Scenario 3 - Message Broker failover<br />
The third scenario tests the failure of the WMQ2BRK Message Broker. The failure<br />
was simulated by issuing an MVS cancel command (C WMQ2BRK,ARMRESTART), as<br />
illustrated in Figure 4-3 on page 41. The ARM policy in effect ensures the broker<br />
restarts immediately. You can find the policy details in Appendix A, “Sample<br />
code” on page 49.<br />
40 High Availability z/OS Solutions for WebSphere Business Integration Message Broker V5
Figure 4-3 Message Broker failover<br />
The batch jobs provided work for the brokers for four minutes. Table 4-3 records<br />
the results from the SupportPac IP13 measurements and the Message Broker<br />
statistics.<br />
Table 4-3 SupportPac IP13 and Message Broker statistics<br />
IP13 Statistics MVSM0 MVSM2 Total<br />
Total Transactions 20564 9608 30172<br />
Elapsed Time (seconds) 239.539 239.721<br />
Application CPU Time (seconds) 21.363 9.934<br />
Transaction Rate (trans/sec) 85.847 41.912<br />
Round trip per msg (ms) 46.572 88.645<br />
Average App CPU per msg (ms) 1.045 1.034<br />
Broker Statistics WMQ0BRK WMQ2BRK<br />
Total Number Input Messages 23098 unknown 23098<br />
Total Elapsed Time (seconds) 224.622 unknown<br />
Total CPU Time (seconds) 76.918 unknown<br />
Conclusions<br />
From the results, we concluded:<br />
► Again, the statistics message is dumped by the broker so no statistics are<br />
available for WMQ2BRK. Cancelling the broker caused multiple SVC dumps<br />
to be taken and CPU usage was at 100% on MVSM2 for some time. This<br />
explains why the transaction rate for MVSM2 is less than half of MVSM0.<br />
► The higher number of messages consumed by WMQ0BRK compared to<br />
transactions completed on MVSM0 illustrates how broker 0 processes<br />
messages from the SupportPac IP13 batch jobs on both systems, taking the<br />
extra load while WMQ2BRK is down.<br />
4.5 Scenario 4 - Queue manager failover<br />
Scenario 4 tests the failure of the WMQ2 queue manager. The failure was simulated<br />
by issuing an MVS stop command (WMQ2 STOP QMGR MODE(RESTART)), as illustrated<br />
in Figure 4-4 on page 43. The ARM policy in effect ensures the queue manager<br />
restarts immediately. Upon successful queue manager startup, the Message<br />
Broker dynamically reconnects to the queue manager and issues the following<br />
message to the MVS log:<br />
+BIP2091I WMQ2BRK 0 The broker has reconnected to WebSphere Business<br />
Integration successfully. : ImbAdminAgent(1095)<br />
The OEMPUTX batch job does not reconnect to the queue manager after a<br />
failure, so for this test the batch job was submitted only on MVSM0.<br />
42 High Availability z/OS Solutions for WebSphere Business Integration Message Broker V5
Figure 4-4 Queue Manager failover (diagram: z/OS images MVSM0 and MVSM2, each<br />
running SupportPac IP13 batch jobs, a queue manager in QSG MB01, a Message<br />
Broker with its execution groups and message flows, and a DB2 member of data<br />
sharing group DSN710PM, connected through the coupling facility; the<br />
Configuration Manager is also shown)<br />
The batch jobs provided work for the brokers for four minutes. Table 4-4 records<br />
the results from the SupportPac IP13 measurements and the Message Broker<br />
statistics.<br />
Table 4-4 SupportPac IP13 and Message Broker statistics<br />
IP13 Statistics                     MVSM0      MVSM2      Total<br />
Total Transactions                  37204      n.a.       37204<br />
Elapsed Time (seconds)              239.919    n.a.<br />
Application CPU Time (seconds)      35.269     n.a.<br />
Transaction Rate (trans/sec)        155.059    n.a.<br />
Round trip per msg (ms)             25774      n.a.<br />
Average App CPU per msg (ms)        947        n.a.<br />
Broker Statistics                   WMQ0BRK    WMQ2BRK<br />
Total Number Input Messages         25460      11019      36479<br />
Total Elapsed Time (seconds)        224.761    119.003<br />
Total CPU Time (seconds)            84.247     36.560<br />
Conclusions<br />
From the results, we concluded that the statistics message is lost before the<br />
queue manager restarts, because statistics gathering is a WebSphere MQ<br />
publish/subscribe function. The statistics shown for WMQ2BRK were gathered<br />
after the WMQ2 queue manager restarted. WMQ2BRK processed only about 30% of<br />
the total messages, because of the time it takes the queue manager to restart<br />
and the broker to reconnect.<br />
4.6 Scenario 5 - DB2 failover<br />
There is a known issue with the Message Broker: if DB2 fails while the broker<br />
is active, the broker does not dynamically reconnect to DB2 as it does after a<br />
queue manager failure, as illustrated in Figure 4-5 on page 45. So, although<br />
ARM restarts DB2, the Message Broker requires a manual restart to resume work.<br />
The broker does not actively manage its DB2 connection and, in fact, would not<br />
detect that DB2 had failed until it needed to access a database. This issue is<br />
being addressed in APAR PQ92596. Consequently, no statistics were produced<br />
for this test.<br />
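Because the brokers run as started tasks in our configuration (see the ARM<br />
policy in Appendix A, “Sample code” on page 49), the manual restart can be<br />
performed with an operator start command such as:<br />
S WMQ2BRK<br />
The started task name WMQ2BRK matches the restart method defined for the<br />
broker element in the ARM policy.<br />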
Figure 4-5 DB2 failover (diagram: SupportPac IP13 jobs, queue managers WMQ0<br />
and WMQ2 in QSG MB01, brokers WMQ0BRK and WMQ2BRK with their execution groups<br />
and message flows, and DB2 members DFM0 and DFM2 of data sharing group<br />
DSN710PM on z/OS images MVSM0 and MVSM2, joined through the coupling facility)<br />
4.7 Scenario 6 - z/OS system failover<br />
The final scenario tests the failure of MVSM2. The failure was simulated by<br />
removing MVSM2 from the sysplex with a system reset from the hardware<br />
console, as illustrated in Figure 4-6 on page 46. ARM restarts the DFM2 DB2<br />
subsystem in restart-light mode on MVSM0 to release any retained locks. It<br />
then restarts the WMQ2 queue manager and the WMQ2BRK broker on the surviving<br />
system, MVSM0.<br />
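The cross-system restart is driven by the RESTART_METHOD entries in the ARM<br />
policy in Appendix A, “Sample code” on page 49; for DB2, the restart-light<br />
start takes the form:<br />
#DFM2 STA DB2,LIGHT(YES)<br />
where #DFM2 is the DB2 command prefix used in our configuration.<br />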
Figure 4-6 LPAR failover (diagram: after the failure of MVSM2, the DB2<br />
subsystem in the DSG, the queue manager in the QSG, and the Message Broker<br />
with its execution groups and message flows all run on the surviving z/OS<br />
image, together with the IP13 jobs, the coupling facility, and the<br />
Configuration Manager)<br />
Because the OEMPUTX batch job running on MVSM2 ended when the system<br />
went down, there are no statistics for this test. However, to test the operation of<br />
the moved broker, the job was resubmitted on MVSM0, and it was observed that<br />
both brokers functioned normally on the one surviving z/OS image.<br />
Once MVSM2 was brought back into the sysplex with its DB2 subsystem active,<br />
the MVSM2 queue manager and broker were shut down on MVSM0 and<br />
restarted on MVSM2. Once again, the OEMPUTX batch job was submitted and<br />
normal operation was verified.<br />
Conclusions<br />
Though collection of statistics was not appropriate for this scenario, it<br />
demonstrated that:<br />
► With the ARM policy provided in Appendix A, “Sample code” on page 49,<br />
when the host z/OS system failed, the Message Broker and its prerequisite<br />
subsystems were automatically restarted on the chosen surviving z/OS<br />
system in the sysplex.<br />
► Successful operation of the moved Message Broker was verified.<br />
► On moving the Message Broker back to its original z/OS system, its<br />
successful operation was again verified.<br />
4.8 Summary<br />
We performed various failover tests to demonstrate high availability solutions<br />
for WebSphere Business Integration Message Broker on z/OS. Our goal was to<br />
verify that no messages were lost, despite the loss of statistics in some of<br />
the failover scenarios.<br />
Appendix A. Sample code<br />
This appendix provides sample code that was used for this paper.<br />
© Copyright <strong>IBM</strong> Corp. 2004. All rights reserved. 49
ARM policy<br />
Example A-1 displays the ARM policy we created and activated for the sysplex<br />
on which the failover scenarios (in Chapter 4, “Failover scenarios” on page 33)<br />
were performed. The policy name is POLICY1. The elements to be restarted are<br />
in GROUP1. These elements consist of DB2, IRLM, WebSphere MQ, and<br />
WebSphere Business Integration Message Broker.<br />
Example: A-1 ARM Policy<br />
DATA TYPE(ARM)<br />
REPORT(YES)<br />
DEFINE POLICY NAME(POLICY1) REPLACE(YES)<br />
RESTART_ORDER<br />
LEVEL(1)<br />
ELEMENT_NAME(DSN710PMDFM0,DSN710PMDFM2,<br />
DFPMIRLMIFM0001,DFPMIRLMIFM2003)<br />
LEVEL(2)<br />
ELEMENT_NAME(SYSMQMGRWMQ0,SYSMQMGRWMQ2)<br />
LEVEL(3)<br />
ELEMENT_NAME(SYSWMQI_WMQ0BRK,SYSWMQI_WMQ2BRK)<br />
RESTART_GROUP(DEFAULT)<br />
ELEMENT(*)<br />
RESTART_ATTEMPTS(0) /* JOBS NOT TO BE RESTARTED BY ARM */<br />
RESTART_GROUP(GROUP1)<br />
TARGET_SYSTEM(MVSM0,MVSM2) /* Z/OS SYSTEM NAME(S) */<br />
RESTART_PACING(20)<br />
ELEMENT(DSN710PMDFM0)<br />
RESTART_METHOD(SYSTERM,STC,'#DFM0 STA DB2,LIGHT(YES)')<br />
ELEMENT(DSN710PMDFM2)<br />
RESTART_METHOD(SYSTERM,STC,'#DFM2 STA DB2,LIGHT(YES)')<br />
ELEMENT(DFPMIRLMIFM0001)<br />
RESTART_METHOD(SYSTERM,STC,'#DFM0 S DFM0IRLM,PC=YES')<br />
ELEMENT(DFPMIRLMIFM2003)<br />
RESTART_METHOD(SYSTERM,STC,'#DFM2 S DFM2IRLM,PC=YES')<br />
ELEMENT(SYSMQMGRWMQ0)<br />
RESTART_ATTEMPTS(3,300)<br />
RESTART_TIMEOUT(120)<br />
TERMTYPE(ALLTERM)<br />
RESTART_METHOD(BOTH,STC,'WMQ0 START QMGR')<br />
ELEMENT(SYSMQMGRWMQ2)<br />
RESTART_ATTEMPTS(3,300)<br />
RESTART_TIMEOUT(120)<br />
TERMTYPE(ALLTERM)<br />
RESTART_METHOD(BOTH,STC,'WMQ2 START QMGR')<br />
ELEMENT(SYSWMQI_WMQ0BRK)<br />
RESTART_ATTEMPTS(3,300)<br />
RESTART_TIMEOUT(120)<br />
TERMTYPE(ALLTERM)<br />
RESTART_METHOD(BOTH,STC,'S WMQ0BRK')<br />
ELEMENT(SYSWMQI_WMQ2BRK)<br />
RESTART_ATTEMPTS(3,300)<br />
RESTART_TIMEOUT(120)<br />
TERMTYPE(ALLTERM)<br />
RESTART_METHOD(BOTH,STC,'S WMQ2BRK')<br />
This policy is also available as a downloadable ZIP file in Appendix B, “Additional<br />
material” on page 53.<br />
Once downloaded and unzipped, the file should be uploaded to the z/OS system<br />
using binary FTP transfer. The resulting data set is in TSO TRANSMIT format. To<br />
extract the ARM policy JCL, issue a TSO command similar to the following:<br />
RECEIVE INDSNAME(ARMPOL.XMIT)<br />
Once created, the policy can be activated with the following system command:<br />
SETXCF START,POLICY,TYPE=ARM,POLNAME=POLICY1<br />
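After activation, you can confirm that the policy is in effect and that the<br />
elements have registered with ARM by using a standard z/OS display command,<br />
for example:<br />
D XCF,ARMSTATUS,DETAIL<br />
This lists each registered element, its restart group, and its current status.<br />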
Broker customization input file<br />
An example of the broker customization input file, mqsicompcif, used for<br />
broker WMQ2BRK can be downloaded from Appendix B, “Additional material” on<br />
page 53. It is too long to display in this appendix.<br />
Once downloaded and unzipped, the file should be uploaded to the z/OS system<br />
using binary FTP transfer. The resulting data set is in TSO TRANSMIT format. To<br />
extract the mqsicompcif file, issue a TSO command similar to the following:<br />
RECEIVE INDSNAME(MQSI.COMPCIF.XMIT)<br />
Appendix B. Additional material<br />
This paper refers to additional material that can be downloaded from the Internet<br />
as described below.<br />
Locating the Web material<br />
The Web material associated with this paper is available in softcopy on the<br />
Internet from the IBM Redbooks Web server. Point your Web browser to:<br />
ftp://www.redbooks.ibm.com/redbooks/REDP3894<br />
Alternatively, you can go to the IBM Redbooks Web site at:<br />
ibm.com/redbooks<br />
Select the Additional materials and open the directory that corresponds with<br />
the redbook form number, REDP3894.<br />
© Copyright <strong>IBM</strong> Corp. 2004. All rights reserved. 53
Using the Web material<br />
The additional Web material that accompanies this Redpaper includes the<br />
following files:<br />
File name             Description<br />
armpol.xmit.zip       ARM policy sample in TSO TRANSMIT format (zipped)<br />
mqsicompcif.xmit.zip  Broker customization input file sample in TSO TRANSMIT format (zipped)<br />
How to use the Web material<br />
Create a subdirectory (folder) on your workstation, and unzip the contents of the<br />
Web material ZIP file into this folder.<br />
Abbreviations and acronyms<br />
ARM    Automatic Restart Management<br />
CA     continuous availability<br />
CF     coupling facility<br />
CFRM   Coupling Facility Resource Manager<br />
CPF    command prefix string<br />
DDF    Distributed Data Facility<br />
DSG    DB2 data sharing group<br />
HA     high availability<br />
HFS    Hierarchical File System<br />
HSM    Hierarchical Storage Manager<br />
IBM    International Business Machines Corporation<br />
IRLM   Internal Resource Lock Manager<br />
ITSO   International Technical Support Organization<br />
JCL    Job Control Language<br />
LPAR   Logical Partition<br />
ODBC   Open Database Connectivity<br />
QSG    queue sharing group<br />
RRS    Resource Recovery Services<br />
USS    UNIX System Services<br />
VIPA   Virtual IP Address<br />
WMQ    WebSphere MQ<br />
Related publications<br />
IBM Redbooks<br />
The publications listed in this section are considered particularly suitable<br />
for a more detailed discussion of the topics covered in this Redpaper.<br />
For information about ordering these publications, see “How to get IBM<br />
Redbooks” on page 58. Note that some of the documents referenced here may<br />
be available in softcopy only.<br />
► WebSphere Business Integration Pub/Sub Solutions, SG24-6088<br />
► Highly Available WebSphere Business Integration Solutions, SG24-7006<br />
Other publications<br />
These publications are also relevant as further information sources:<br />
► Leveraging z/OS TCP/IP Dynamic VIPAs and Sysplex Distributor for higher<br />
availability, GM13-0026<br />
Online resources<br />
These Web sites and URLs are also relevant as further information sources:<br />
► Leveraging z/OS TCP/IP Dynamic VIPAs and Sysplex Distributor for higher<br />
availability, GM13-0026<br />
http://www-1.ibm.com/servers/eserver/zseries/pso/whitepaper.html<br />
http://www-1.ibm.com/servers/eserver/zseries/library/techpapers/pdf/<br />
gm130026.pdf<br />
► OS/390® and z/OS TCP/IP in the Parallel Sysplex Environment - Blurring the<br />
Boundaries<br />
http://www-1.ibm.com/servers/eserver/zseries/pso/whitepaper.html<br />
http://www-1.ibm.com/servers/eserver/zseries/library/techpapers/<br />
gm130165.html<br />
How to get IBM Redbooks<br />
You can search for, view, or download Redbooks, Redpapers, Hints and Tips,<br />
draft publications and Additional materials, as well as order hardcopy<br />
Redbooks or CD-ROMs, at this Web site:<br />
ibm.com/redbooks<br />
Help from IBM<br />
IBM Support and downloads<br />
ibm.com/support<br />
IBM Global Services<br />
ibm.com/services<br />
Index<br />
A<br />
active-active 22<br />
active-passive 22<br />
Automatic Restart Management (ARM) 4, 25, 30<br />
C<br />
cloned applications, advantages 14<br />
cloned brokers 10<br />
cloned execution groups, advantages 12<br />
cloned execution groups, disadvantages 12<br />
clustering 10<br />
command prefix string (CPF) 25<br />
Continuous availability 4<br />
coupling facility (CF)<br />
failover 4<br />
failure 4<br />
Resource Manager (CFRM) 4, 26<br />
CURRENTSQLID 34–35<br />
D<br />
DB2 data sharing group (DSG) 4<br />
DB2 database called SHAREPRICES 34<br />
DB2 SHAREPRICES table 34<br />
design of message flows 17<br />
Distributed Data Facility (DDF) 24<br />
F<br />
factors contribute to HA of a Message Broker environment<br />
10<br />
H<br />
Hierarchical File System (HFS) 24<br />
Hierarchical Storage Manager (HSM) 26<br />
high availability (HA) 1, 21–22<br />
environment for WebSphere Business Integration<br />
Message Broker 2<br />
failover scenarios 33<br />
I<br />
IBM Category 2 SupportPac IP13 5, 34<br />
IH03 SupportPac (RFHUTIL) 36<br />
important factors when considering availability 10<br />
Internal Resource Lock Manager (IRLM) 30<br />
J<br />
JBDB2DEFS 35<br />
JDB2DEFS 34<br />
Job Control Language (JCL) 26<br />
Job JDB2DEFS 34<br />
L<br />
Logical Partition (LPAR) 4, 24<br />
M<br />
Message Broker, error processing 18<br />
N<br />
network topology, benefits 11<br />
O<br />
OEMPUTX, parameters that can be passed 35<br />
Open Database Connectivity (ODBC) 24<br />
P<br />
Parallel Sysplex 2<br />
Publish and Subscribe functionality 18–19<br />
Q<br />
queue sharing group (QSG) 4, 14<br />
R<br />
Redbooks Web site 58<br />
S<br />
SupportPac IP13<br />
batch job (OEMPUTX) 34<br />
batch jobs 37<br />
message flow (DB2U) 34<br />
SHAREPRICES table 35<br />
U<br />
UNIX System Services (USS) environment 27<br />
W<br />
WebSphere<br />
Business Integration Broker on z/OS 5<br />
Business Integration Message Broker 1, 9<br />
WebSphere MQ<br />
clustering 10, 12–15<br />
clustering, advantages 13<br />
clustering, disadvantages 13<br />
shared queues 12, 14<br />
shared queues, advantages 16<br />
shared queues, disadvantages 16<br />
Back cover<br />
When designing and implementing a production grade<br />
Message Broker solution on z/OS, one of the most important<br />
factors to consider is high availability.<br />
This IBM Redpaper examines the design considerations<br />
inherent in configuring a highly available Message Broker<br />
environment.<br />
Also demonstrated is the use of the coupling facility for<br />
WebSphere MQ queue sharing groups (QSG) and Automatic<br />
Restart Management (ARM) in order to support WebSphere<br />
Business Integration Message Broker HA in a sysplex<br />
environment.<br />
Finally, examples of the behavior of Message Broker during<br />
failover are provided, including transaction rate<br />
measurements and throughput statistics.<br />
BUILDING TECHNICAL<br />
INFORMATION BASED ON<br />
PRACTICAL EXPERIENCE<br />
IBM Redbooks are developed by<br />
the IBM International Technical<br />
Support Organization. Experts<br />
from IBM, Customers and<br />
Partners from around the world<br />
create timely technical<br />
information based on realistic<br />
scenarios. Specific<br />
recommendations are provided<br />
to help you implement IT<br />
solutions more effectively in<br />
your environment.<br />
For more information:<br />
ibm.com/redbooks