13.07.2015 Views

ExpressCluster X 3.1 for Linux Getting Started Guide - Nec

ExpressCluster X 3.1 for Linux Getting Started Guide - Nec

ExpressCluster X 3.1 for Linux Getting Started Guide - Nec

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>ExpressCluster</strong> ® X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong><strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>01/10/20137th Edition


Revision HistoryEdition Revised Date DescriptionFirst 10/11/2011 New manual2nd 10/11/2011 Error correction3rd 11/30/2011 Corresponds to the internal version <strong>3.1</strong>.1-1.4th 03/31/2012 Corresponds to the internal version <strong>3.1</strong>.3-1.5th 06/30/2012 Corresponds to the internal version <strong>3.1</strong>.4-1.6th 09/30/2012 Corresponds to the internal version <strong>3.1</strong>.5-1.7th 01/10/2013 Corresponds to the internal version <strong>3.1</strong>.7-1.


© Copyright NEC Corporation 2011. All rights reserved.DisclaimerIn<strong>for</strong>mation in this document is subject to change without notice. No part of this document may be reproduced ortransmitted in any <strong>for</strong>m by any means, electronic or mechanical, <strong>for</strong> any purpose, without the express writtenpermission of NEC Corporation.Trademark In<strong>for</strong>mation<strong>ExpressCluster</strong> ® X is a registered trademark of NEC Corporation.FastSync is a trademark of NEC Corporation.<strong>Linux</strong> is a registered trademark or trademark of Linus Torvalds in the United States and other countries.RPM is a trademark of Red Hat, Inc.Intel, Pentium and Xeon are registered trademarks or trademarks of Intel Corporation.Microsoft and Windows are registered trademarks of Microsoft Corporation in the United States and othercountries.Turbolinux is a registered trademark of Turbolinux. Inc.VERITAS, VERITAS Logo and all other VERITAS product names and slogans are trademarks or registeredtrademarks of VERITAS Software Corporation.Oracle, Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or itsaffiliates.VMware is a registered trademark or trademark of VMware, Inc. in the United States and other countries.Novell is a registered trademark of Novell, Inc. in the United States and Japan.SUSE is a registered trademark of SUSE LINUX AG, a group company of U.S. Novell.Citrix, Citrix XenServer, and Citrix Essentials are registered trademarks or trademarks of Citrix Systems, Inc. inthe United State and other countries.WebOTX is a registered trademark of NEC Corporation.JBoss is a registered trademark of Red Hat, Inc. in the United States and its subsidiaries.Apache Tomcat, Tomcat, and Apache are registered trademarks or trademarks of Apache Software Foundation.Android is a trademark or registered trademark of Google, Inc.SVF is a registered trademark of WingArc Technologies, Inc.F5, F5 Networks, BIG-IP, and iControl are trademarks or registered trademarks of F5 Networks, Inc. in theUnited States and other countries.Equalizer SVF is a registered trademark of Coyote Point Systems, Inc.Other product names and slogans written in this manual are trademarks or registered trademarks of theirrespective companies.


Table of ContentsPreface.................................................................................................................................................... ixWho Should Use This <strong>Guide</strong>............................................................................................................................................... ixHow This <strong>Guide</strong> is Organized............................................................................................................................................. ix<strong>ExpressCluster</strong> X Documentation Set ................................................................................................................................. xConventions ........................................................................................................................................................................ xiContacting NEC.................................................................................................................................................................. xiiSection I Introducing <strong>ExpressCluster</strong>........................................................................... 13Chapter 1 What is a cluster system? ............................................................................... 15Overview of the cluster system............................................................................................................................ 16High Availability (HA) cluster ............................................................................................................................ 16Shared disk type.................................................................................................................................................................. 17Data mirror type.................................................................................................................................................................. 19Error detection mechanism .................................................................................................................................. 20Problems with shared disk type........................................................................................................................................... 20Network partition (split-brain-syndrome) ........................................................................................................................... 21Taking over cluster resources .............................................................................................................................. 22Taking over the data............................................................................................................................................................ 22Taking over the applications ............................................................................................................................................... 23Summary of failover ........................................................................................................................................................... 23Eliminating single point of failure ....................................................................................................................... 24Shared disk.......................................................................................................................................................................... 24Access path to the shared disk............................................................................................................................................. 25LAN .................................................................................................................................................................................... 26Operation <strong>for</strong> availability..................................................................................................................................... 27Failure monitoring............................................................................................................................................................... 27Chapter 2 Using <strong>ExpressCluster</strong> ..................................................................................... 29What is <strong>ExpressCluster</strong>? ...................................................................................................................................... 30<strong>ExpressCluster</strong> modules....................................................................................................................................... 30Software configuration of <strong>ExpressCluster</strong> ........................................................................................................... 31How an error is detected in <strong>ExpressCluster</strong> ........................................................................................................................ 31What is server monitoring? ................................................................................................................................................. 32What is application monitoring? ......................................................................................................................................... 33What is internal monitoring?............................................................................................................................................... 33Monitorable and non-monitorable errors............................................................................................................................. 33Detectable and non-detectable errors by server monitoring ................................................................................................ 33Detectable and non-detectable errors by application monitoring ........................................................................................ 34Network partition resolution................................................................................................................................ 35Failover mechanism............................................................................................................................................. 36Failover resources ............................................................................................................................................................... 37System configuration of the failover type cluster................................................................................................................ 37Hardware configuration of the shared disk type cluster ...................................................................................................... 40Hardware configuration of the mirror disk type cluster ...................................................................................................... 41Hardware configuration of the hybrid disk type cluster ...................................................................................................... 42What is cluster object? ........................................................................................................................................................ 43What is a resource?.............................................................................................................................................. 44Heartbeat resources............................................................................................................................................................. 44Network partition resolution resources ............................................................................................................................... 44Group resources .................................................................................................................................................................. 44Monitor resources ............................................................................................................................................................... 45VM monitor resource (vmw) .............................................................................................................................................. 46<strong>Getting</strong> started with <strong>ExpressCluster</strong>..................................................................................................................... 48Latest in<strong>for</strong>mation............................................................................................................................................................... 48Designing a cluster system.................................................................................................................................................. 48Configuring a cluster system............................................................................................................................................... 48Troubleshooting the problem .............................................................................................................................................. 48v


Section II Installing <strong>ExpressCluster</strong> ...............................................................................49Chapter 3Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>..............................................51Hardware ..............................................................................................................................................................52General server requirements................................................................................................................................................52Supported disk interfaces.....................................................................................................................................................52Supported network interfaces ..............................................................................................................................................53Servers supporting BMC-related functions .........................................................................................................................53Servers supporting Express5800/A1080a and Express5800/A1080aA1040a series linkage ...............................................53Software ...............................................................................................................................................................54System requirements <strong>for</strong> <strong>ExpressCluster</strong> Server..................................................................................................................54Supported distributions and kernel versions ........................................................................................................................54Applications supported by monitoring options....................................................................................................................63Operation Environment of VM resources............................................................................................................................68Operation environment <strong>for</strong> SNMP linkage functions ..........................................................................................................69Operation environment <strong>for</strong> JVM monitor............................................................................................................................70Required memory and disk size...........................................................................................................................................71System requirements <strong>for</strong> the Builder ....................................................................................................................72Supported operating systems and browsers .........................................................................................................................72Java runtime environment ...................................................................................................................................................73Required memory and disk size...........................................................................................................................................73Supported <strong>ExpressCluster</strong> versions .....................................................................................................................................73System requirements <strong>for</strong> the WebManager ..........................................................................................................74Supported operating systems and browsers .........................................................................................................................74Java runtime environment ...................................................................................................................................................74Required memory and disk size...........................................................................................................................................75System requirements <strong>for</strong> the Integrated WebManager .........................................................................................76Supported operating systems and browsers .........................................................................................................................76Java runtime environment ...................................................................................................................................................77Required memory size and disk size....................................................................................................................................77System requirements <strong>for</strong> WebManager Mobile....................................................................................................77Supported operating systems and browsers .........................................................................................................................77Chapter 4Latest version in<strong>for</strong>mation.............................................................................79Correspondence list of <strong>ExpressCluster</strong> and a manual..........................................................................................................80Enhanced functions .............................................................................................................................................................80Corrected in<strong>for</strong>mation .........................................................................................................................................................86Chapter 5 Notes and Restrictions ....................................................................................107Designing a system configuration ........................................................................................................................108Function list and necessary license......................................................................................................................................108Supported operating systems <strong>for</strong> the Builder and WebManager..........................................................................................108Hardware requirements <strong>for</strong> mirror disks..............................................................................................................................109Hardware requirements <strong>for</strong> shared disks..............................................................................................................................110Hardware requirements <strong>for</strong> hybrid disks..............................................................................................................................111IPv6 environment ................................................................................................................................................................112Network configuration.........................................................................................................................................................113Execute Script be<strong>for</strong>e Final Action setting <strong>for</strong> monitor resource recovery action ...............................................................113NIC link up/down monitor resource ....................................................................................................................................114Write function of the mirror disk resource and hybrid disk resource...................................................................................115Not outputting syslog to the mirror disk resource or the hybrid disk resource ....................................................................115Notes when terminating the mirror disk resource or the hybrid disk resource.....................................................................115Data consistency among multiple asynchronous mirror disks .............................................................................................116O_DIRECT <strong>for</strong> mirror or hybrid disk resources..................................................................................................................116Initial mirror construction time <strong>for</strong> mirror or hybrid disk resources....................................................................................116Mirror or hybrid disk connect..............................................................................................................................................116JVM monitor resources .......................................................................................................................................................116Mail reporting......................................................................................................................................................................117Requirements <strong>for</strong> network warning light .............................................................................................................................117Installing operating system...................................................................................................................................118/opt/nec/clusterpro file system.............................................................................................................................................118Mirror disks .........................................................................................................................................................................118vi


Hybrid disks........................................................................................................................................................................ 120Dependent library................................................................................................................................................................ 120Dependent driver................................................................................................................................................................. 120The major number of Mirror driver..................................................................................................................................... 121The major number of Kernel mode LAN heartbeat and keepalive drivers.......................................................................... 121Partition <strong>for</strong> RAW monitoring of disk monitor resources ................................................................................................... 121SE<strong>Linux</strong> settings ................................................................................................................................................................. 121NetworkManager settings ................................................................................................................................................... 121Be<strong>for</strong>e installing <strong>ExpressCluster</strong> ......................................................................................................................... 122Communication port number .............................................................................................................................................. 122Management LAN of server BMC...................................................................................................................................... 123Management LAN of server BMC...................................................................................................................................... 123Changing the range of automatic allocation <strong>for</strong> the communication port numbers ............................................................. 125Clock synchronization......................................................................................................................................................... 125NIC device name................................................................................................................................................................. 125Shared disk.......................................................................................................................................................................... 125Mirror disk .......................................................................................................................................................................... 126Hybrid disk ......................................................................................................................................................................... 126Adjusting OS startup time................................................................................................................................................... 126Verifying the network settings ............................................................................................................................................ 126ipmiutil and OpenIPMI ....................................................................................................................................................... 126User mode monitor resource (monitoring method: softdog) ............................................................................................... 127Log collection ..................................................................................................................................................................... 127nsupdate and nslookup ........................................................................................................................................................ 127FTP monitor resources ........................................................................................................................................................ 128Notes when creating <strong>ExpressCluster</strong> configuration data ..................................................................................... 129Environment variable.......................................................................................................................................................... 129Force stop function, chassis identify lamp linkage.............................................................................................................. 129Server reset, server panic and power off ............................................................................................................................. 129Final action <strong>for</strong> group resource deactivation error .............................................................................................................. 130Verifying raw device <strong>for</strong> VxVM......................................................................................................................................... 131Selecting mirror disk file system......................................................................................................................................... 131Selecting hybrid disk file system ........................................................................................................................................ 131Setting of mirror or hybrid disk resource action.................................................................................................................. 132Time to start a single serve when many mirror disks are defined. ...................................................................................... 132RAW monitoring of disk monitor resources ....................................................................................................................... 132Delay warning rate.............................................................................................................................................................. 132Disk monitor resource (monitoring method TUR) .............................................................................................................. 132WebManager reload interval............................................................................................................................................... 133LAN heartbeat settings........................................................................................................................................................ 133Kernel mode LAN heartbeat resource settings.................................................................................................................... 133COM heartbeat resource settings ........................................................................................................................................ 133BMC heartbeat settings....................................................................................................................................................... 133IP address <strong>for</strong> Integrated WebManager settings.................................................................................................................. 133Double-byte character set that can be used in script comments .......................................................................................... 133Failover exclusive attribute of virtual machine group......................................................................................................... 133System monitor resource settings........................................................................................................................................ 134Message receive monitor resource settings ......................................................................................................................... 134JVM monitor resource settings ........................................................................................................................................... 135After start operating <strong>ExpressCluster</strong>.................................................................................................................... 136Error message in the load of the mirror driver in an environment such as udev ................................................................. 136File operating utility on X-Window .................................................................................................................................... 136Messages displayed when loading a driver ......................................................................................................................... 137Messages displayed <strong>for</strong> the first I/O to mirror disk resources or hybrid disk resources ...................................................... 137Messages written to syslog when multiple mirror disk resources or hybrid disk resources are used .................................. 138IPMI message...................................................................................................................................................................... 138Limitations during the recovery operation .......................................................................................................................... 139Executable <strong>for</strong>mat file and script file not described in manuals.......................................................................................... 139Message of kernel page allocation error.............................................................................................................................. 139Cache swell by a massive I/O ............................................................................................................................................. 140When multiple mounts are specified <strong>for</strong> a resource like a mirror disk resource.................................................................. 141Executing fsck..................................................................................................................................................................... 142Messages when collecting logs ........................................................................................................................................... 144Failover and activation during mirror recovery................................................................................................................... 144Cluster shutdown and reboot (mirror disk resource and hybrid disk resource) ................................................................... 144vii


Shutdown and reboot of individual server (mirror disk resource and hybrid disk resource) ...............................................145Scripts <strong>for</strong> starting/stopping <strong>ExpressCluster</strong> services..........................................................................................................145Service startup time .............................................................................................................................................................146Scripts in EXEC resources ..................................................................................................................................................146Monitor resources that monitoring timing is “Active” ........................................................................................................146Notes on the WebManager ..................................................................................................................................................147Notes on the Builder (Config mode of Cluster Manager)....................................................................................................147Changing the partition size of mirror disks and hybrid disk resources ................................................................................148Changing kernel dump settings ...........................................................................................................................................148Notes on floating IP and virtual IP resources ......................................................................................................................148Notes on system monitor resources .....................................................................................................................................149Notes on JVM monitor resources ........................................................................................................................................149Notes on final action (group stop) at detection of a monitor resource error ........................................................................149Notes when changing the <strong>ExpressCluster</strong> configuration ......................................................................................150Failover exclusive attribute of group properties ..................................................................................................................150Dependency between resource properties............................................................................................................................150Updating <strong>ExpressCluster</strong> ......................................................................................................................................151If the alert destination setting is changed.............................................................................................................................151Chapter 6Upgrading <strong>ExpressCluster</strong>.............................................................................153How to update from <strong>ExpressCluster</strong> X 2.0 or 2.1.................................................................................................154How to upgrade from X2.0 or X2.1 to X3.0 or X<strong>3.1</strong>...........................................................................................................154Appendix A. Glossary ...........................................................................................................159Appendix B.Index.................................................................................................................161viii


PrefaceWho Should Use This <strong>Guide</strong><strong>ExpressCluster</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong> is intended <strong>for</strong> first-time users of the <strong>ExpressCluster</strong>. The guide coverstopics such as product overview of the <strong>ExpressCluster</strong>, how the cluster system is installed, and the summary ofother available guides. In addition, latest system requirements and restrictions are described.How This <strong>Guide</strong> is OrganizedSection IChapter 1Chapter 2Introducing <strong>ExpressCluster</strong>What is a cluster system?Helps you to understand the overview of the cluster system and <strong>ExpressCluster</strong>.Using <strong>ExpressCluster</strong>Provides instructions on how to use a cluster system and other related-in<strong>for</strong>mation.Section IIChapter 3Chapter 4Chapter 5Chapter 6AppendixAppendix AAppendix BInstalling <strong>ExpressCluster</strong>Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>Provides the latest in<strong>for</strong>mation that needs to be verified be<strong>for</strong>e starting to use <strong>ExpressCluster</strong>.Latest version in<strong>for</strong>mationProvides in<strong>for</strong>mation on latest version of the <strong>ExpressCluster</strong>.Notes and RestrictionsProvides in<strong>for</strong>mation on known problems and restrictions.Upgrading <strong>ExpressCluster</strong>Provides instructions on how to update the <strong>ExpressCluster</strong>.GlossaryIndexix


<strong>ExpressCluster</strong> X Documentation SetThe <strong>ExpressCluster</strong> X manuals consist of the following five guides. The title and purpose of each guide isdescribed below:<strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>This guide is intended <strong>for</strong> all users. The guide covers topics such as product overview, system requirements, andknown problems.Installation and Configuration <strong>Guide</strong>This guide is intended <strong>for</strong> system engineers and administrators who want to build, operate, and maintain a clustersystem. Instructions <strong>for</strong> designing, installing, and configuring a cluster system with <strong>ExpressCluster</strong> are coveredin this guide.Reference <strong>Guide</strong>This guide is intended <strong>for</strong> system administrators. The guide covers topics such as how to operate <strong>ExpressCluster</strong>,function of each module, maintenance-related in<strong>for</strong>mation, and troubleshooting. The guide is supplement to theInstallation and Configuration <strong>Guide</strong>.<strong>ExpressCluster</strong> X Integrated WebManager Administrator’s <strong>Guide</strong>This guide is intended <strong>for</strong> system administrators who manage cluster systems using <strong>ExpressCluster</strong> withIntegrated WebManager, and also intended <strong>for</strong> system engineers who introduce Integrated WebManager. Thisguide describes detailed issues necessary <strong>for</strong> introducing Integrated WebManager in the actual procedures.<strong>ExpressCluster</strong> X WebManager Mobile Administrator’s <strong>Guide</strong>This guide is intended <strong>for</strong> system administrators who manage cluster systems using <strong>ExpressCluster</strong> with<strong>ExpressCluster</strong> WebManager Mobile and <strong>for</strong> system engineers who are installing the WebManager Mobile. Inthis guide, details on those items required <strong>for</strong> installing the cluster system using the WebManager Mobile areexplained in accordance with the actual procedures.x


ConventionsIn this guide, Note, Important, Related In<strong>for</strong>mation are used as follows:Note:Used when the in<strong>for</strong>mation given is important, but not related to the data loss and damage to the system andmachine.Important:Used when the in<strong>for</strong>mation given is necessary to avoid the data loss and damage to the system and machine.Related In<strong>for</strong>mation:Used to describe the location of the in<strong>for</strong>mation given at the reference destination.The following conventions are used in this guide.Convention Usage ExampleIndicates graphical objects, such asIn User Name, type your name.Bold fields, list boxes, menu selections,On the File menu, click Open Database.buttons, labels, icons, etc.Angledbracketwithin thecommandline#Monospace(courier)Monospacebold(courier)Monospaceitalic(courier)Indicates that the value specifiedinside of the angled bracket can beomitted.Prompt to indicate that a <strong>Linux</strong> userhas logged in as root user.Indicates path names, commands,system output (message, prompt,etc), directory, file names, functionsand parameters.Indicates the value that a useractually enters from a commandline.Indicates that users should replaceitalicized part with values that theyare actually working with.clpstat –s[-h host_name]# clpcl -s -a/<strong>Linux</strong>/3.0/en/server/Enter the following:# clpcl -s -arpm –iexpressclsbuilder--.i686.rpmxi


Contacting NECFor the latest product in<strong>for</strong>mation, visit our website below:http://www.nec.com/global/prod/expresscluster/xii


Section IIntroducing<strong>ExpressCluster</strong>This section helps you to understand the overview of <strong>ExpressCluster</strong> and its system requirements.This section covers:• Chapter 1 What is a cluster system?• Chapter 2 Using <strong>ExpressCluster</strong>13


Chapter 1What is a cluster system?This chapter describes overview of the cluster system.This chapter covers:• Overview of the cluster system···············································································································16• High Availability (HA) cluster ···············································································································16• Error detection mechanism·····················································································································20• Taking over cluster resources ·················································································································22• Eliminating single point of failure ··········································································································24• Operation <strong>for</strong> availability ·······················································································································2715


Chapter 1 What is a cluster system?Overview of the cluster systemA key to success in today’s computerized world is to provide services without them stopping. Asingle machine down due to a failure or overload can stop entire services you provide withcustomers. This will not only result in enormous damage but also in loss of credibility you onceenjoyed.A cluster system is a solution to tackle such a disaster. Introducing a cluster system allows you tominimize the period during which operation of your system stops (down time) or to avoidsystem-down by load distribution.As the word “cluster” represents, a cluster system is a system aiming to increase reliability andper<strong>for</strong>mance by clustering a group (or groups) of multiple computers. There are various types ofcluster systems, which can be classified into the following three listed below. <strong>ExpressCluster</strong> iscategorized as a high availability cluster.High Availability (HA) ClusterIn this cluster configuration, one server operates as an active server. When the active serverfails, a stand-by server takes over the operation. This cluster configuration aims <strong>for</strong>high-availability and allows data to be inherited as well. The high availability cluster isavailable in the shared disk type, data mirror type or remote cluster type.Load Distribution ClusterThis is a cluster configuration where requests from clients are allocated to load-distributionhosts according to appropriate load distribution rules. This cluster configuration aims <strong>for</strong> highscalability. Generally, data cannot be taken over. The load distribution cluster is available in aload balance type or parallel database type.High Per<strong>for</strong>mance Computing (HPC) ClusterThis is a cluster configuration where CPUs of all nodes are used to per<strong>for</strong>m a single operation.This cluster configuration aims <strong>for</strong> high per<strong>for</strong>mance but does not provide general versatility.Grid computing, which is one of the types of high per<strong>for</strong>mance computing that clusters awider range of nodes and computing clusters, is a hot topic these days.High Availability (HA) clusterTo enhance the availability of a system, it is generally considered that having redundancy <strong>for</strong>components of the system and eliminating a single point of failure is important. “Single point offailure” is a weakness of having a single computer component (hardware component) in the system.If the component fails, it will cause interruption of services. The high availability (HA) cluster is acluster system that minimizes the time during which the system is stopped and increasesoperational availability by establishing redundancy with multiple servers.The HA cluster is called <strong>for</strong> in mission-critical systems where downtime is fatal. The HA clustercan be divided into two types: shared disk type and data mirror type. The explanation <strong>for</strong> each typeis provided below.16<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


High Availability (HA) clusterShared disk typeData must be inherited from one server to another in cluster systems. A cluster topology where datais stored in a shared disk with two or more servers using the data is called shared disk type.Shared Disk TypeData Mirror TypeMirroring- Expensive since a shared disk isnecessary.- Ideal <strong>for</strong> the system that handleslarge data- Cheap since a shared disk isunnecessary.- Ideal <strong>for</strong> the system with less datavolume because of mirroring.Figure 1-1: HA cluster configurationIf a failure occurs on a server where applications are running (active server), the cluster systemdetects the failure and applications are automatically started in a stand-by server to take overoperations. This mechanism is called failover. Operations to be inherited in the cluster systemconsist of resources including disk, IP address and application.In a non-clustered system, a client needs to access a different IP address if an application isrestarted on a server other than the server where the application was originally running. In contrast,many cluster systems allocate a virtual IP address on an operational basis. A server where theoperation is running, be it an active or a stand-by server, remains transparent to a client. Theoperation is continued as if it has been running on the same server.File system consistency must be checked to inherit data. A check command (<strong>for</strong> example, fsck orchkdsk in <strong>Linux</strong>) is generally run to check file system consistency. However, the larger the filesystem is, the more time spent <strong>for</strong> checking. While checking is in process, operations are stopped.For this problem, journaling file system is introduced to reduce the time required <strong>for</strong> failover.Logic of the data to be inherited must be checked <strong>for</strong> applications. For example, roll-back orroll-<strong>for</strong>ward is necessary <strong>for</strong> databases. With these actions, a client can continue operation only byre-executing the SQL statement that has not been committed yet.A server with the failure can return to the cluster system as a stand-by server if it is physicallyseparated from the system, fixed, and then succeeds to connect the system. Such returning isacceptable in production environments where continuity of operations is important.Section I Introducing <strong>ExpressCluster</strong>17


Chapter 1 What is a cluster system?Normal OperationOccurrence of FailureFailoverOperationOperationServer FailureOperation TransferFailbackRecovering ServerOperationOperationFigure 1-2: From occurrence of a failure to recoveryWhen the specification of the failover destination server does not meet the system requirements oroverload occurs due to multi-directional stand-by, operations on the original server are preferred.In such a case, a failback takes place to resume operations on the original server.A stand-by mode where there is one operation and no operation is active on the stand-by server, asshown in Figure 1-3, is referred to as uni-directional stand-by. A stand-by mode where there aretwo or more operations with each server of the cluster serving as both active and stand-by servers isreferred to as multi-directional stand-by.Normal OperationNormal OperationActiveServerStand-byServerActive Server <strong>for</strong>Operation AStand-by Server <strong>for</strong>Operation BActive Server <strong>for</strong>Operation BStand-by Server <strong>for</strong>Operation AOperationOperation AOperation BFigure 1-3: HA cluster topology18<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


High Availability (HA) clusterData mirror typeThe shared disk type cluster system is good <strong>for</strong> large-scale systems. However, creating a systemwith this type can be costly because shared disks are generally expensive. The data mirror typecluster system provides the same functions as the shared disk type with smaller cost throughmirroring of server disks.The data mirror type is not recommended <strong>for</strong> large-scale systems that handle a large volume of datasince data needs to be mirrored between servers.When a write request is made by an application, the data mirror engine not only writes data in thelocal disk but sends the write request to the stand-by server via the interconnect. Interconnect is anetwork connecting servers. It is used to monitor whether or not the server is activated in thecluster system. In addition to this purpose, interconnect is sometimes used to transfer data in thedata mirror type cluster system. The data mirror engine on the stand-by server achieves datasynchronization between stand-by and active servers by writing the data into the local disk of thestand-by server.For read requests from an application, data is simply read from the disk on the active server.Active ServerNormal OperationStand-by ServerApplicationFile SystemNICLAN (Interconnect)NICData MirrorEngineClusterData MirrorEngineDiskDiskWriteReadFigure 1-4: Data mirror mechanismSnapshot backup is applied usage of data mirroring. Because the data mirror type cluster systemhas shared data in two locations, you can keep the disk of the stand-by server as snapshot backupwithout spending time <strong>for</strong> backup by simply separating the server from the cluster.Failover mechanism and its problemsThere are various cluster systems such as failover clusters, load distribution clusters, and highper<strong>for</strong>mance computing (HPC) clusters. The failover cluster is one of the high availability (HA)cluster systems that aim to increase operational availability through establishing server redundancyand passing operations being executed to another server when a failure occurs.Section I Introducing <strong>ExpressCluster</strong>19


Error detection mechanismNetwork partition (split-brain-syndrome)When all interconnects between servers are disconnected, failover takes place because the serversassume other server(s) are down. To monitor whether the server is activated, a heartbeatcommunication is used. As a result, multiple servers mount a file system simultaneously causingdata corruption. This explains the importance of appropriate failover behavior in a cluster system atthe time of failure occurrence.A failureoccurs on theother server.A failureoccurs onthe otherservermountmountDataCorruptionFigure 1-6: Network partition problemThe problem explained in the section above is referred to as “network partition” or “split-brainsyndrome.” The failover cluster system is equipped with various mechanisms to ensure shared disklock at the time when all interconnects are disconnected.Section I Introducing <strong>ExpressCluster</strong>21


Chapter 1 What is a cluster system?Taking over cluster resourcesAs mentioned earlier, resources to be managed by a cluster include disks, IP addresses, andapplications. The functions used in the failover cluster system to inherit these resources aredescribed below.Taking over the dataData to be passed from a server to another in a cluster system is stored in a partition on the shareddisk. This means data is re-mounting the file system of files that the application uses on a healthyserver. What the cluster software should do is simply mount the file system because the shared diskis physically connected to a server that inherits data.Detects a FailuremountmountmountFigure 1-7: Taking over dataThe figure 1-7 may look simple, but consider the following issues in designing and creating acluster system.One issue to consider is recovery time <strong>for</strong> a file system. A file system to be inherited may havebeen used by another server or being updated just be<strong>for</strong>e the failure occurred and requires a filesystem consistency check. When the file system is large, the time spent <strong>for</strong> checking consistencywill be enormous. It may take a few hours to complete the check and the time is wholly added tothe time <strong>for</strong> failover (time to take over operation), and this will reduce system availability.Another issue you should consider is writing assurance. When an application writes important datainto a file, it tries to ensure the data to be written into a disk by using a function such assynchronized writing. The data that the application assumes to have been written is expected to beinherited after failover. For example, a mail server reports the completion of mail receiving to othermail servers or clients after it has securely written mails it received in a spool. This will allow thespooled mail to be distributed again after the server is restarted. Likewise, a cluster system shouldensure mails written into spool by a server to become readable by another server.22<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Taking over cluster resourcesTaking over the applicationsThe last to come in inheritance of operation by cluster software is inheritance of applications.Unlike fault tolerant computers (FTC), no process status such as contents of memory is inherited intypical failover cluster systems. The applications running on a failed server are inherited byrerunning them on a healthy server.For example, when instances of a database management system (DBMS) are inherited, thedatabase is automatically recovered (roll-<strong>for</strong>ward/roll-back) by startup of the instances. The timeneeded <strong>for</strong> this database recovery is typically a few minutes though it can be controlled byconfiguring the interval of DBMS checkpoint to a certain extent.Many applications can restart operations by re-execution. Some applications, however, requiregoing through procedures <strong>for</strong> recovery if a failure occurs. For these applications, cluster softwareallows to start up scripts instead of applications so that recovery process can be written. In a script,the recovery process, including cleanup of files half updated, is written as necessary according tofactors <strong>for</strong> executing the script and in<strong>for</strong>mation on the execution server.Summary of failoverTo summarize the behavior of cluster software: Detects a failure (heartbeat/resource monitoring)Resolves a network partition (NP resolution)Switches cluster resources• Pass data• Pass IP address• Application Taking overA Failure OccursDetects a FailureCompletesTaking overTimeApplicationTaking overIP InheritingData Taking overNP ResolutionDetects a Failure(HeartbeatResourceMonitoring)Down TimeSystem Operating TimeFigure 1-8: Failover time chartCluster software is required to complete each task quickly and reliably (see Figure 1-8.) Clustersoftware achieves high availability with due consideration on what has been described so far.Section I Introducing <strong>ExpressCluster</strong>23


Chapter 1 What is a cluster system?Eliminating single point of failureHaving a clear picture of the availability level required or aimed is important in building a highavailability system. This means when you design a system, you need to study cost effectiveness ofcountermeasures, such as establishing a redundant configuration to continue operations andrecovering operations within a short period of time, against various failures that can disturb systemoperations.Single point of failure (SPOF), as described previously, is a component where failure can lead tostop of the system. In a cluster system, you can eliminate the system’s SPOF by establishing serverredundancy. However, components shared among servers, such as shared disk may become aSPOF. The key in designing a high availability system is to duplicate or eliminate this sharedcomponent.A cluster system can improve availability but failover will take a few minutes <strong>for</strong> switchingsystems. That means time <strong>for</strong> failover is a factor that reduces availability. Solutions <strong>for</strong> thefollowing three, which are likely to become SPOF, will be discussed hereafter although technicalissues that improve availability of a single server such as ECC memory and redundant powersupply are important. Shared diskShared diskAccess path to the shared diskLANTypically a shared disk uses a disk array <strong>for</strong> RAID. Because of this, the bare drive of the disk doesnot become SPOF. The problem is the RAID controller is incorporated. Shared disks commonlyused in many cluster systems allow controller redundancy.In general, access paths to the shared disk must be duplicated to benefit from redundant RAIDcontroller. There are still things to be done to use redundant access paths in <strong>Linux</strong> (described laterin this chapter). If the shared disk has configuration to access the same logical disk unit (LUN)from duplicated multiple controllers simultaneously, and each controller is connected to one server,you can achieve high availability by failover between nodes when an error occurs in one of thecontrollers.FailoverSPOFRAID5RAID5HBA(SCSI Card, FC NIC)Access PathRAID ControllerArray Disk24Figure 1-9: Example of the shared disk RAID controller and access paths being SPOF (left)and an access path connected to a RAID controllerWith a failover cluster system of data mirror type, where no shared disk is used, you can create anideal system having no SPOF because all data is mirrored to the disk in the other server. Howeveryou should consider the following issues:<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Eliminating single point of failureDisk I/O per<strong>for</strong>mance in mirroring data over the network (especially writing per<strong>for</strong>mance)System per<strong>for</strong>mance during mirror resynchronization in recovery from server failure (mirrorcopy is done in the background)Time <strong>for</strong> mirror resynchronization (clustering cannot be done until mirror resynchronizationis completed)In a system with frequent data viewing and a relatively small volume of data, choosing the datamirror type <strong>for</strong> clustering is a key to increase availability.Access path to the shared diskIn a typical configuration of the shared disk type cluster system, the access path to the shared diskis shared among servers in the cluster. To take SCSI as an example, two servers and a shared diskare connected to a single SCSI bus. A failure in the access path to the shared disk can stop the entiresystem.What you can do <strong>for</strong> this is to have a redundant configuration by providing multiple access paths tothe shared disk and make them look as one path <strong>for</strong> applications. The device driver allowing such iscalled a path failover driver. Path failover drivers are often developed and released by shared diskvendors. Path failover drivers in <strong>Linux</strong> are still under development. For the time being, asdiscussed earlier, offering access paths to the shared disk by connecting a server on an arraycontroller on the shared disk basis is the way to ensure availability in <strong>Linux</strong> cluster systems.ApplicationPath FailoverDriverApplicationPath FailoverDriverFigure 1-10: Path failover driverSection I Introducing <strong>ExpressCluster</strong>25


Chapter 1 What is a cluster system?LANIn any systems that run services on a network, a LAN failure is a major factor that disturbsoperations of the system. If appropriate settings are made, availability of cluster system can beincreased through failover between nodes at NIC failures. However, a failure in a network devicethat resides outside the cluster system disturbs operation of the system.NICNICSPOFFailoverFigure 1-11: Example of router becoming SPOFLAN redundancy is a solution to tackle device failure outside the cluster system and to improveavailability. You can apply ways used <strong>for</strong> a single server to increase LAN availability. For example,choose a primitive way to have a spare network device with its power off, and manually replace afailed device with this spare device. Choose to have a multiplex network path through a redundantconfiguration of high-per<strong>for</strong>mance network devices, and switch paths automatically. Anotheroption is to use a driver that supports NIC redundant configuration such as Intel’s ANS driver.Load balancing appliances and firewall appliances are also network devices that are likely tobecome SPOF. Typically they allow failover configurations through standard or optional software.Having redundant configuration <strong>for</strong> these devices should be regarded as requisite since they playimportant roles in the entire system.26<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Operation <strong>for</strong> availabilityOperation <strong>for</strong> availabilityEvaluation be<strong>for</strong>e staring operationGiven many of factors causing system troubles are said to be the product of incorrect settings orpoor maintenance, evaluation be<strong>for</strong>e actual operation is important to realize a high availabilitysystem and its stabilized operation. Exercising the following <strong>for</strong> actual operation of the system is akey in improving availability: Clarify and list failures, study actions to be taken against them, and verify effectiveness ofthe actions by creating dummy failures.Conduct an evaluation according to the cluster life cycle and verify per<strong>for</strong>mance (such as atdegenerated mode)Arrange a guide <strong>for</strong> system operation and troubleshooting based on the evaluation mentionedabove.Having a simple design <strong>for</strong> a cluster system contributes to simplifying verification andimprovement of system availability.Failure monitoringDespite the above ef<strong>for</strong>ts, failures still occur. If you use the system <strong>for</strong> long time, you cannotescape from failures: hardware suffers from aging deterioration and software produces failures anderrors through memory leaks or operation beyond the originally intended capacity. Improvingavailability of hardware and software is important yet monitoring <strong>for</strong> failure and troubleshootingproblems is more important. For example, in a cluster system, you can continue running the systemby spending a few minutes <strong>for</strong> switching even if a server fails. However, if you leave the failedserver as it is, the system no longer has redundancy and the cluster system becomes meaninglessshould the next failure occur.If a failure occurs, the system administrator must immediately take actions such as removing anewly emerged SPOF to prevent another failure. Functions <strong>for</strong> remote maintenance and reportingfailures are very important in supporting services <strong>for</strong> system administration. <strong>Linux</strong> is known <strong>for</strong>providing good remote maintenance functions. Mechanism <strong>for</strong> reporting failures are coming inplace. To achieve high availability with a cluster system, you should: Remove or have complete control on single point of failure.Have a simple design that has tolerance and resistance <strong>for</strong> failures, and be equipped with aguide <strong>for</strong> operation and troubleshooting.Detect a failure quickly and take appropriate action against it.Section I Introducing <strong>ExpressCluster</strong>27


Chapter 2Using <strong>ExpressCluster</strong>This chapter explains the components of <strong>ExpressCluster</strong>, how to design a cluster system, and how to use<strong>ExpressCluster</strong>.This chapter covers:• What is <strong>ExpressCluster</strong>?·························································································································30• <strong>ExpressCluster</strong> modules ·························································································································30• Software configuration of <strong>ExpressCluster</strong> ······························································································31• Network partition resolution···················································································································35• Failover mechanism ·······························································································································36• What is a resource? ································································································································44• <strong>Getting</strong> started with <strong>ExpressCluster</strong> ··································································································4829


Chapter 2 Using <strong>ExpressCluster</strong>What is <strong>ExpressCluster</strong>?<strong>ExpressCluster</strong> is software that enhances availability and expandability of systems by a redundant(clustered) system configuration. The application services running on the active server areautomatically inherited to a standby server when an error occurs in the active server.<strong>ExpressCluster</strong> modules<strong>ExpressCluster</strong> consists of following three modules:<strong>ExpressCluster</strong> ServerA core component of <strong>ExpressCluster</strong>. Includes all high availability function of the server. Theserver function of the WebManager is also included.<strong>ExpressCluster</strong> X WebManager (WebManager)A tool to manage <strong>ExpressCluster</strong> operations. Uses a Web browser as a user interface. TheWebManager is installed in <strong>ExpressCluster</strong> Server, but it is distinguished from the<strong>ExpressCluster</strong> Server because the WebManager is operated from the Web browser on themanagement PC.<strong>ExpressCluster</strong> X Builder (Builder)A tool <strong>for</strong> editing the cluster configuration data. The Builder also uses Web browser as a userinterface. The following two versions of Builder are provided: the offline version, which isinstalled on your terminal as software independent of <strong>ExpressCluster</strong> Server, and the onlineversion, which is opened by clicking the setup mode icon on the WebManager screen toolbaror Setup Mode on the View menu. The Builder needs to be installed separately from the<strong>ExpressCluster</strong> Server on the machine where you use the Builder.30<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Software configuration of <strong>ExpressCluster</strong>Software configuration of <strong>ExpressCluster</strong>The software configuration of <strong>ExpressCluster</strong> should look similar to the figure below. Install the<strong>ExpressCluster</strong> Server (software) on a <strong>Linux</strong> server, and the Builder on a management PC or aserver. Because the main functions of WebManager and Builder are included in <strong>ExpressCluster</strong>Server, it is not necessary to separately install them. However, to use the Builder in an environmentwhere <strong>ExpressCluster</strong> Server is not accessible, the offline version of Builder must be installed onthe PC. The WebManager or Builder can be used through the Web browser on the management PCor on each server in the cluster.<strong>Linux</strong><strong>ExpressCluster</strong>XWebManager(Server)JREBuilderWebManager(Browser)Server 1<strong>Linux</strong><strong>ExpressCluster</strong> XWebManager(Server)JREBuilderWebManager(Browser)Server 2Windowsor<strong>Linux</strong>JREBuilderWebManager(Browser)Management PCFigure 2-1 Software configuration of <strong>ExpressCluster</strong>How an error is detected in <strong>ExpressCluster</strong>There are three kinds of monitoring in <strong>ExpressCluster</strong>: (1) server monitoring, (2) applicationmonitoring, and (3) internal monitoring. These monitoring functions let you detect an error quicklyand reliably. The details of the monitoring functions are described below.Section I Introducing <strong>ExpressCluster</strong>31


Chapter 2 Using <strong>ExpressCluster</strong>What is server monitoring?Server monitoring is the most basic function of the failover-type cluster system. It monitors if aserver that constitutes a cluster is properly working.<strong>ExpressCluster</strong> regularly checks whether other servers are properly working in the cluster system.This way of verification is called “heartbeat communication.” The heartbeat communication usesthe following communication paths:Primary InterconnectUses an Ethernet NIC in communication pathdedicated to the failover-type cluster system.This is used to exchange in<strong>for</strong>mation betweenthe servers as well as to per<strong>for</strong>m heartbeatcommunication.Secondary InterconnectUses a communication path used <strong>for</strong>communication with client machine as analternative interconnect. Any Ethernet NICcan be used as long as TCP/IP can be used.This is also used to exchange in<strong>for</strong>mationbetween the servers and to per<strong>for</strong>m heartbeatcommunication.Figure 2-2 Server monitoringShared diskCreates an <strong>ExpressCluster</strong>-dedicated partition (<strong>ExpressCluster</strong> partition) on the disk that isconnected to all servers that constitute the failover-type cluster system, and per<strong>for</strong>ms heartbeatcommunication on the <strong>ExpressCluster</strong> partition.COM portPer<strong>for</strong>ms heartbeat communication between the servers that constitute the failover-type clustersystem through a COM port, and checks whether other servers are working properly.BMCPer<strong>for</strong>ms heartbeat communication between the servers that constitute the failover-type clustersystem through the BMC, and checks whether other servers are working properly.521431. Primary Interconnect2 Secondary Interconnect3 Shared disk4 COM port5 BMCHaving these communication paths dramatically improves the reliability of the communicationbetween the servers, and prevents the occurrence of network partition.Note:Network partition (also known as “split-brain syndrome”) refers to a condition when anetwork gets split by having a problem in all communication paths of the servers in a cluster.In a cluster system that is not capable of handling a network partition, a problem occurred in acommunication path and a server cannot be distinguished. As a result, multiple servers mayaccess the same resource and cause the data in a cluster system to be corrupted.32<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Software configuration of <strong>ExpressCluster</strong>What is application monitoring?Application monitoring is a function that monitors applications and factors that cause a situationwhere an application cannot run.Activation status of application monitoringAn error can be detected by starting up an application from an exec resource in <strong>ExpressCluster</strong> andregularly checking whether a process is active or not by using the pid monitor resource. It iseffective when the factor <strong>for</strong> application to stop is due to error termination of an application.Note:An error in resident process cannot be detected in an application started up by<strong>ExpressCluster</strong>. When the monitoring target application starts and stops a residentprocess, an internal application error (such as application stalling, result error)cannot be detected.Resource monitoringAn error can be detected by monitoring the cluster resources (such as disk partition and IP address)and public LAN using the monitor resources of the <strong>ExpressCluster</strong>. It is effective when the factor<strong>for</strong> application to stop is due to an error of a resource which is necessary <strong>for</strong> an application tooperate.What is internal monitoring?Internal monitoring refers to an inter-monitoring of modules within <strong>ExpressCluster</strong>. It monitorswhether each monitoring function of <strong>ExpressCluster</strong> is properly working. Activation status of<strong>ExpressCluster</strong> process monitoring is per<strong>for</strong>med within <strong>ExpressCluster</strong>. Critical monitoring of <strong>ExpressCluster</strong> processMonitorable and non-monitorable errorsThere are monitorable and non-monitorable errors in <strong>ExpressCluster</strong>. It is important to know whatcan or cannot be monitored when building and operating a cluster system.Detectable and non-detectable errors by server monitoringMonitoring condition: A heartbeat from a server with an error is stoppedExample of errors that can be monitored: Hardware failure (of which OS cannot continue operating) System panicExample of error that cannot be monitored: Partial failure on OS (<strong>for</strong> example, only a mouse or keyboard does not function)Section I Introducing <strong>ExpressCluster</strong>33


Chapter 2 Using <strong>ExpressCluster</strong>Detectable and non-detectable errors by application monitoringMonitoring conditions: Termination of applications with errors, continuous resource errors, anddisconnection of a path to the network devices.Example of errors that can be monitored: Abnormal termination of an application Failure to access the shared disk (such as HBA 1 failure) Public LAN NIC problemExample of errors that cannot be monitored: Application stalling and resulting in error. <strong>ExpressCluster</strong> cannot monitor applicationstalling and error results. However, it is possible to per<strong>for</strong>m failover by creating a programthat monitors applications and terminates itself when an error is detected, starting theprogram using the exec resource, and monitoring application using the PID monitor resource.1 HBA is an abbreviation <strong>for</strong> host bus adapter. This adapter is not <strong>for</strong> the shared disk, but <strong>for</strong> the server.<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>34


Network partition resolutionNetwork partition resolutionWhen the stop of a heartbeat is detected from a server, <strong>ExpressCluster</strong> determines whether it is anerror in a server or a network partition. If it is judged as a server failure, failover (activate resourcesand start applications on a healthy server) is per<strong>for</strong>med. If it is judged as network partition,protecting data is given priority over Taking over operations, so processing such as emergencyshutdown is per<strong>for</strong>med.The following is the network partition resolution method: ping methodRelated In<strong>for</strong>mation:For the details on the network partition resolution method, see Chapter 7, “Details onnetwork partition resolution resources” of the Reference <strong>Guide</strong>.Section I Introducing <strong>ExpressCluster</strong>35


Chapter 2 Using <strong>ExpressCluster</strong>Failover mechanismWhen an error is detected, <strong>ExpressCluster</strong> determines whether an error detected be<strong>for</strong>e failing overis an error in a server or a network partition. Then a failover is per<strong>for</strong>med by activating variousresources and starting up applications on a properly working server.The group of resources which fail over at the same time is called a “failover group.” From a user’spoint of view, a failover group appears as a virtual computer.Note:In a cluster system, a failover is per<strong>for</strong>med by restarting the application from aproperly working node. There<strong>for</strong>e, what is saved in an application memory cannot befailed over.From occurrence of error to completion of failover takes a few minutes. See the figure 2-3 below:Error occurredError detectedFailover startedFailover completedHeartbeat timeoutFile systemrecoveredActivating resources(Including disks and IP addresses)Recovering or restartingapplicationFigure 2-3 Failover time chartHeartbeat timeout The time <strong>for</strong> a standby server to detect an error after that error occurred on the active server.The setting values of the cluster properties should be adjusted depending on the applicationload. (The default value is 90 seconds.)Activating various resources The time to activate the resources necessary <strong>for</strong> operating an application.The resources can be activated in a few seconds in ordinary settings, but the required timechanges depending on the type and the number of resources registered to the failover group.For more in<strong>for</strong>mation, refer to the Installation and Configuration <strong>Guide</strong>.Start script execution time The data recovery time <strong>for</strong> a roll-back or roll-<strong>for</strong>ward of the database and the startup time ofthe application to be used in operation.The time <strong>for</strong> roll-back or roll-<strong>for</strong>ward can be predicted by adjusting the check point interval.For more in<strong>for</strong>mation, refer to the document that comes with each software product.36<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Failover mechanismFailover resources<strong>ExpressCluster</strong> can fail over the following resources:Switchable partitionResources such as disk resource, mirror disk resource and hybrid disk resource.A disk partition to store the data that the application takes over.Floating IP AddressBy connecting an application using the floating IP address, a client does not have to beconscious about switching the servers due to failover processing.It is achieved by dynamic IP address allocation to the public LAN adapter and sending ARPpacket. Connection by floating IP address is possible from most of the network devices.Script (exec resource) In <strong>ExpressCluster</strong>, applications are started up from the scripts.The file failed over on the shared disk may not be complete as data even if it is properlyworking as a file system. Write the recovery processing specific to an application at the timeof failover in addition to the startup of an application in the scripts.Note:In a cluster system, failover is per<strong>for</strong>med by restarting the application from aproperly working node. There<strong>for</strong>e, what is saved in an application memory cannot befailed over.System configuration of the failover type clusterIn a failover-type cluster, a disk array device is shared between the servers in a cluster. When anerror occurs on a server, the standby server takes over the applications using the data on the shareddisk.Public LANInterconnectdedicatedLAN<strong>ExpressCluster</strong>OSOSDataSharedDiskFigure 2-4 System configurationSection I Introducing <strong>ExpressCluster</strong>37


Chapter 2 Using <strong>ExpressCluster</strong>A failover-type cluster can be divided into the following categories depending on the clustertopologies:Uni-Directional Standby Cluster SystemIn the uni-directional standby cluster system, the active server runs applications while the otherserver, the standby server, does not. This is the simplest cluster topology and you can build ahigh-availability system without per<strong>for</strong>mance degradation after failing over.ApplicationApplicationFailoverFigure 2-5 Uni-directional standby cluster systemSame Application Multi Directional Standby Cluster SystemIn the same application multi-directional standby cluster system, the same applications areactivated on multiple servers. These servers also operate as standby servers. The applications mustsupport multi-directional standby operation. When the application data can be split into multipledata, depending on the data to be accessed, you can build a load distribution system per datapartitioning basis by changing the client’s connecting server.ApplicationApplicationFailoverApplicationApplication- The applications in the diagram are the same application.- Multiple application instances are run on a single server after failover.Figure 2-6 Same application multi directional standby cluster system38<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Failover mechanismDifferent Application – Multi Directional Standby Cluster SystemIn the different application multi-directional standby cluster system, different applications areactivated on multiple servers and these servers also operate as standby servers. The applications donot have to support multi-directional standby operation. A load distribution system can be built perapplication unit basis.ApplicationApplicationApplicationFailoverApplication- Operation 1 and operation 2 use different applications.Figure 2-7 Different application multi directional standby cluster systemNode to Node ConfigurationThe configuration can be expanded with more nodes by applying the configurations introducedthus far. In a node to node configuration described below, three different applications are run onthree servers and one standby server takes over the application if any problem occurs. In auni-directional standby cluster system, one of the two servers functions as a standby server.However, in a node to node configuration, only one of the four server functions as a standby serverand per<strong>for</strong>mance deterioration is not anticipated if an error occurs only on one server.Active Active Active StandbyAP A AP B AP CError!Active Active Active StandbyAP AAP CAP BFigure 2-8 Node to Node configurationSection I Introducing <strong>ExpressCluster</strong>39


Chapter 2 Using <strong>ExpressCluster</strong>Hardware configuration of the shared disk type clusterThe hardware configuration of the shared disk in <strong>ExpressCluster</strong> is described below. In general, thefollowing is used <strong>for</strong> communication between the servers in a cluster system: Two NIC cards (one <strong>for</strong> external communication, one <strong>for</strong> <strong>ExpressCluster</strong>)COM port connected by RS232C cross cableSpecific space of a shared diskSCSI or FibreChannel can be used <strong>for</strong> communication interface to a shared disk; however, recentlyFibreChannel is more commonly used.Access by this address fromthe WebManager clientIP address10.0.0.1Active server (server1)Virtual IP10.0.0.11Access by this address fromthe operation clientVirtual IP10.0.0.12/dev/ttyS0Shared diskIP address192.168.0.1IP address192.168.0.2Interconnect LANRS-232CDisk heartbeat device/dev/sdb1IP address10.0.0.2Standby server (server2)/dev/ttyS0Shared disk deviceMount pointFile system/dev/sdb2/mnt/sdb2ext3Public-LANTo a client PCFigure 2-9 Sample of cluster environment when a shared disk is used40<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Failover mechanismHardware configuration of the mirror disk type clusterThe hardware configuration of the mirror disk in <strong>ExpressCluster</strong> is described below.Unlike the shared disk type, a network to copy the mirror disk data is necessary. In general, anetwork is used with NIC <strong>for</strong> internal communication in <strong>ExpressCluster</strong>.Mirror disks need to be separated from the operating system; however, they do not depend on aconnection interface (IDE or SCSI.)Figure 2-10 Sample of cluster environment when mirror disks are used (when allocatingcluster partition and data partition to the disk where OS is installed):Access by this address fromthe WebManager client.IP address10.0.0.1Active server (server 1)Virtual IP10.0.0.11Access by this addressfrom the operation client.Virtual IP10.0.0.12/dev/ttyS0IP address192.168.0.1IP address192.168.0.2Interconnet LAN(Mirror connect)OS /boot device /dev/sda1OS swap device /dev/sda2OS /(root) device /dev/sda3Device <strong>for</strong> cluster partition /dev/sda5Device <strong>for</strong> data partition /dev/sda6Mount point /mnt/sda6File system ext3RS-232CIP address10.0.0.2Public-LANStandby server (server 2)To operation clientDisk(Free partition of the diskwhere OS is installed)/dev/ttyS0Figure 2-11 Sample of cluster environment when mirror disks are used (when disks <strong>for</strong>cluster partition and data partition are prepared):Access by this address fromthe WebManager clientIP address10.0.0.1Active server (server1)Virtual IP10.0.0.11Access by this addressfrom the operation clientVirtual IP10.0.0.12/dev/ttyS0Mirror diskIP address192.168.0.1IP address192.168.0.2Interconnect LAN(Mirror connect)RS-232CCluster partition device /dev/sdb1Mirror resource disk device /dev/sdb2Mount point/mnt/sdb2File systemext3IP address10.0.0.2Public-LANStandby server (server2)/dev/ttyS0Mirror diskTo operation clientSection I Introducing <strong>ExpressCluster</strong>41


Chapter 2 Using <strong>ExpressCluster</strong>Hardware configuration of the hybrid disk type clusterThe hardware configuration of the hybrid disk in <strong>ExpressCluster</strong> is described below.Unlike the shared disk type, a network to copy the data is necessary. In general, NIC <strong>for</strong> internalcommunication in <strong>ExpressCluster</strong> is used to meet this purpose.Disks do not depend on a connection interface (IDE or SCSI).Access by this address fromthe WebManager clientIP address 10.0.0.1Active server (server1)Floating IP10.0.0.11Access by this addressfrom the operation clientFloating IP10.0.0.12Shared disk <strong>for</strong> hybridresourceIP address 192.168.0.1IP address 192.168.0.2HUBHybrid device/dev/NMP1Mount point/mnt/hd1File systemext3Device <strong>for</strong> cluster partition /dev/sdb1Hybrid resource disk device /dev/sdb2DISK heartbeat device name /dev/sdb3Raw device name /dev/raw/raw1IP address 10.0.0.2Standby server (server2)Server group 1IP address 192.168.0.3IP address 10.0.0.3Server group 2Disk <strong>for</strong> hybridresourceHybrid device /dev/NMP1Mount point /mnt/hd1File system ext3Device <strong>for</strong> cluster partition /dev/sdb1Hybrid resource disk device /dev/sdb2public-LANTo operation clientStandby server server3Figure 2-12: Sample of cluster environment where hybrid disks are used (two servers use ashared disk and the third server’s general disk are used <strong>for</strong> mirroring)42<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Failover mechanismWhat is cluster object?In <strong>ExpressCluster</strong>, the various resources are managed as the following groups:Cluster objectConfiguration unit of a cluster.Server objectIndicates the physical server and belongs to the cluster object.Server group objectGroups the servers and belongs to the cluster object.Heartbeat resource objectIndicates the network part of the physical server and belongs to the server object.Network partition resolution resource objectIndicates the network partition resolution mechanism and belongs to the server object.Group objectIndicates a virtual server and belongs to the cluster object.Group resource objectIndicates resources (network, disk) of the virtual server and belongs to the group object.Monitor resource objectIndicates monitoring mechanism and belongs to the cluster object.Section I Introducing <strong>ExpressCluster</strong>43


Chapter 2 Using <strong>ExpressCluster</strong>What is a resource?In <strong>ExpressCluster</strong>, a group used <strong>for</strong> monitoring the target is called “resources.” There are fourtypes of resources and are managed separately. Having resources allows distinguishing what ismonitoring and what is being monitored more clearly. It also makes building a cluster and handlingan error easy. The resources can be divided into heartbeat resources, network partition resolutionresources, group resources, and monitor resources.Heartbeat resourcesHeartbeat resources are used <strong>for</strong> verifying whether the other server is working properly betweenservers. The following heartbeat resources are currently supported:LAN heartbeat resourceUses Ethernet <strong>for</strong> communication.Kernel mode LAN heartbeat resourceUses Ethernet <strong>for</strong> communication.COM heartbeat resourceUses RS232C (COM) <strong>for</strong> communication.Disk heartbeat resourceUses a specific partition (cluster partition <strong>for</strong> disk heartbeat) on the shared disk <strong>for</strong> communication.It can be used only on a shared disk configuration.BMC heartbeat resourceUses Ethernet <strong>for</strong> communication via the BMC. This resource can be used only when the BMChardware and firmware support the communication.Network partition resolution resourcesThe resource used <strong>for</strong> solving the network partition is shown below:PING network partition resolution resourceThis is a network partition resolution resource by the PING method.Group resources44A group resource constitutes a unit when a failover occurs. The following group resources arecurrently supported:Floating IP resource (fip)Provides a virtual IP address. A client can access virtual IP address the same way as the regular IPaddress.EXEC resource (exec)Provides a mechanism <strong>for</strong> starting and stopping the applications such as DB and httpd.Disk resource (disk)Provides a specified partition on the shared disk. It can be used only on a shared disk configuration.Mirror disk resource (md)Provides a specified partition on the mirror disk. It can be used only on a mirror disk configuration.Hybrid disk resource (hd)Provides a specified partition on a shared disk or a disk. It can be used only <strong>for</strong> hybridconfiguration.Volume manager resource (volmgr)Handles multiple storage devices and disks as a single logical disk.<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


What is a resource?NAS resource (nas)Connect to the shared resources on NAS server. Note that it is not a resource that the cluster serverbehaves as NAS server.Virtual IP resource (vip)Provides a virtual IP address. This can be accessed from a client in the same way as a general IPaddress. This can be used in the remote cluster configuration among different network addresses.VM resource (vm)Starts, stops, or migrates the virtual machine.Dynamic DNS resource (ddns)Registers the virtual host name and the IP address of the active server to the dynamic DNS server.Monitor resourcesA monitor resource monitors a cluster system. The following monitor resources are currentlysupported:Floating IP monitor resource (fipw)Provides a monitoring mechanism of an IP address started up by a floating IP resource.IP monitor resource (ipw)Provides a monitoring mechanism of an external IP address.Disk monitor resource (diskw)Provides a monitoring mechanism of the disk. It also monitors the shared disk.Mirror disk monitor resource (mdw)Provides a monitoring mechanism of the mirroring disks.Mirror disk connect monitor resource (mdnw)Provides a monitoring mechanism of the mirror disk connect.Hybrid disk monitor resource (hdw)Provides a monitoring mechanism of the hybrid disk.Hybrid disk connect monitor resource (hdnw)Provides a monitoring mechanism of the hybrid disk connect.PID monitor resource (pidw)Provides a monitoring mechanism to check whether a process started up by exec resource is activeor not.User mode monitor resource (userw)Provides a monitoring mechanism <strong>for</strong> a stalling problem in the user space.NIC Link Up/Down monitor resource (miiw)Provides a monitoring mechanism <strong>for</strong> link status of LAN cable.Volume manager monitor resource (volmgrw)Provides a monitoring mechanism <strong>for</strong> multiple storage devices and disks.Multi target monitor resource (mtw)Provides a status with multiple monitor resources.Virtual IP monitor resource (vipw)Provides a mechanism <strong>for</strong> sending RIP packets of a virtual IP resource.ARP monitor resource (arpw)Provides a mechanism <strong>for</strong> sending ARP packets of a floating IP resource or a virtual IP resource.Custom monitor resource (genw)Section I Introducing <strong>ExpressCluster</strong>45


Chapter 2 Using <strong>ExpressCluster</strong>46Provides a monitoring mechanism to monitor the system by the operation result of commands orscripts which per<strong>for</strong>m monitoring, if any.VM monitor resource (vmw)Checks whether the virtual machine is alive.Message receive monitor resource (mrw)Specifies the action to take when an error message is received and how the message is displayed onthe WebManager.Dynamic DNS monitor resource (ddnsw)Periodically registers the virtual host name and the IP address of the active server to the dynamicDNS server.Process name monitor resource (psw)Provides a monitoring mechanism <strong>for</strong> checking whether a process specified by a process name isactive.DB2 monitor resource (db2w)Provides a monitoring mechanism <strong>for</strong> IBM DB2 database.ftp monitor resource (ftpw)Provides a monitoring mechanism <strong>for</strong> FTP server.http monitor resource (httpw)Provides a monitoring mechanism <strong>for</strong> HTTP server.imap4 monitor resource (imap4w)Provides a monitoring mechanism <strong>for</strong> IMAP4 server.MySQL monitor resource (mysqlw)Provides a monitoring mechanism <strong>for</strong> MySQL database.nfs monitor resource (nfsw)Provides a monitoring mechanism <strong>for</strong> nfs file server.Oracle monitor resource (oraclew)Provides a monitoring mechanism <strong>for</strong> Oracle database.OracleAS monitor resource (oracleasw)Provides a monitoring mechanism <strong>for</strong> Oracle application.pop3 monitor resource (pop3w)Provides a monitoring mechanism <strong>for</strong> POP3 server.PostgreSQL monitor resource (psqlw)Provides a monitoring mechanism <strong>for</strong> PostgreSQL database.samba monitor resource (sambaw)Provides a monitoring mechanism <strong>for</strong> samba file server.smtp monitor resource (smtpw)Provides a monitoring mechanism <strong>for</strong> SMTP server.Sybase monitor resource (sybasew)Provides a monitoring mechanism <strong>for</strong> Sybase database.Tuxedo monitor resource (tuxw)Provides a monitoring mechanism <strong>for</strong> Tuxedo application server.<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


What is a resource?Websphere monitor resource (wasw)Provides a monitoring mechanism <strong>for</strong> Websphere application server.Weblogic monitor resource (wlsw)Provides a monitoring mechanism <strong>for</strong> Weblogic application server.WebOTX monitor resource (otxsw)Provides a monitoring mechanism <strong>for</strong> WebOTX application server.JVM monitor resource (jraw)Provides a monitoring mechanism <strong>for</strong> Java VM.System monitor resource (sraw)Provides a monitoring mechanism <strong>for</strong> the resources specific to individual processes or those of thewhole system.Section I Introducing <strong>ExpressCluster</strong>47


Chapter 2 Using <strong>ExpressCluster</strong><strong>Getting</strong> started with <strong>ExpressCluster</strong>Refer to the following guides when building a cluster system with <strong>ExpressCluster</strong>:Latest in<strong>for</strong>mationRefer to Section II, “Installing <strong>ExpressCluster</strong>” in this guide.Designing a cluster systemRefer to Section I, “Configuring a cluster system” in the Installation and Configuration <strong>Guide</strong> andSection II, “Resource details” in the Reference <strong>Guide</strong>.Configuring a cluster systemRefer to the Installation and Configuration <strong>Guide</strong>.Troubleshooting the problemRefer to Section III, “Maintenance in<strong>for</strong>mation” in the Reference <strong>Guide</strong>.48<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Section IIInstalling<strong>ExpressCluster</strong>This section provides the latest in<strong>for</strong>mation on the <strong>ExpressCluster</strong>. The latest in<strong>for</strong>mation on the supportedhardware and software is described in detail. Topics such as restrictions, known problems, and how totroubleshoot the problem are covered.• Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>• Chapter 4 Latest version in<strong>for</strong>mation• Chapter 5 Notes and Restrictions• Chapter 6 Upgrading <strong>ExpressCluster</strong>49


Chapter 3Installation requirements <strong>for</strong><strong>ExpressCluster</strong>This chapter provides in<strong>for</strong>mation on system requirements <strong>for</strong> <strong>ExpressCluster</strong>.This chapter covers:• Hardware················································································································································52• Software·················································································································································54• System requirements <strong>for</strong> the Builder ······································································································72• System requirements <strong>for</strong> the WebManager·····························································································74• System requirements <strong>for</strong> the Integrated WebManager ············································································76• System requirements <strong>for</strong> WebManager···································································································7751


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>Hardware<strong>ExpressCluster</strong> operates on the following server architectures:IA-32x86_64IBM POWER (Replicator, Replicator DR, Agents except Database Agent are not supported)General server requirementsRequired specifications <strong>for</strong> <strong>ExpressCluster</strong> Server are the following:RS-232C port 1 port (not necessary when configuring a cluster with 3 or more nodes)Ethernet port 2 or more portsShared diskMirror disk or empty partition <strong>for</strong> mirrorCD-ROM driveWhen using the off-line Builder upon constructing and changing the existing configuration, one ofthe following is required <strong>for</strong> communication between the off-line Builder and servers:Removable media (<strong>for</strong> example, floppy disk drive or USB flash drive)A machine to operate the off-line Builder and a way to share filesSupported disk interfacesDisk types that are supported as mirror disks or hybrid disk (non-shared disk) of Replicator DR areas follows:Disk type Host side driver RemarksIDE ide Supported up to 120GBSCSISCSISCSISCSISCSIRAIDaic7xxxaic79xxsym53c8xxmptbase,mptscsihmptsasMegaraid (SCSI type)RAID megaraid (IDE type) Supported up to 275GBS-ATA sata-nv Supported up to 80GBS-ATA ata-piix Supported up to 120GB52<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


HardwareSupported network interfacesThe following are the network boards that are supported as a mirror disk connect <strong>for</strong> the mirror diskand hybrid disk of the Replicator and the Replicator DR:ChipIntel 82540EMIntel 82544EIIntel 82546EBIntel 82546GBIntel 82573LIntel 80003ES2LANIntel 631xESB/632xESBBroadcom BCM5701Broadcom BCM5703Broadcom BCM5721Broadcom BCM5721Drivere1000bcm5700tg3Only typical examples are listed above and other products can also be used.Servers supporting BMC-related functionsThe table below lists the supported servers that can use the function to <strong>for</strong>cibly stop a physicalmachine and the chassis identify function. These are typical examples, and also some other serverscan use these functions.ServerRemarksExpress5800/120Rg-1Express5800/120Rf-1Express5800/120Rg-2Servers supporting Express5800/A1080a andExpress5800/A1080aA1040a series linkageThe table below lists the supported servers that can use the Express5800/A1080a andExpress5800/A1040a series linkage function of the BMC heartbeat resources and message receivemonitor resources. This function cannot be used by servers other than the following.ServeRemarksExpress5800/A1080a-E Update to the latest firmware.Express5800/A1080a-DExpress5800/A1080a-SExpress5800/A1040aUpdate to the latest firmware.Update to the latest firmware.Update to the latest firmware.Section II Installing <strong>ExpressCluster</strong>53


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>SoftwareSystem requirements <strong>for</strong> <strong>ExpressCluster</strong> ServerSupported distributions and kernel versionsIA-32DistributionTurbolinux 11 Server(SP1)Turbolinux ApplianceServer 3.0(SP1)Red HatEnterprise <strong>Linux</strong> 5(update4)Red HatEnterprise <strong>Linux</strong> 5(update5)The environment where <strong>ExpressCluster</strong> Server can operate depends on kernel module versionsbecause there are kernel modules unique to <strong>ExpressCluster</strong>. Kernel versions which has beenverified are listed below.Kernel version2.6.23-102.6.23-10smp64G2.6.23-122.6.23-12smp64G2.6.23-102.6.23-10smp64G2.6.23-122.6.23-12smp64G2.6.18-164.el52.6.18-164.el5PAE2.6.18-164.el5xen2.6.18-164.6.1.el52.6.18-164.6.1.el5PAE2.6.18-164.6.1.el5xen2.6.18-164.9.1.el52.6.18-164.9.1.el5PAE2.6.18-164.9.1.el5xen2.6.18-164.11.1.el52.6.18-164.11.1.el5PAE2.6.18-164.11.1.el5xen2.6.18-164.15.1.el52.6.18-164.15.1.el5PAE2.6.18-164.15.1.el5xen2.6.18-164.38.1.el52.6.18-164.38.1.el5PAE2.6.18-164.38.1.el5xen2.6.18-194.el52.6.18-194.el5PAE2.6.18-194.el5xen2.6.18-194.8.1.el52.6.18-194.8.1.el5PAE2.6.18-194. 8.1.el5xen2.6.18-194.11.4.el52.6.18-194.11.4.el5PAE2.6.18-194.11.4.el5xen2.6.18-194.17.1.el52.6.18-194.17.1.el5PAE2.6.18-194.17.1.el5xenReplicatorReplicator DRsupportRun clpkaandclpkhbsupport<strong>ExpressCluster</strong>VersionYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.1-1 or laterYes Yes 3.0.1-1 or laterRemarks54<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareDistributionRed HatEnterprise <strong>Linux</strong> 5(update6)Red HatEnterprise <strong>Linux</strong> 5(update7)Red HatEnterprise <strong>Linux</strong> 5(update8)Red HatEnterprise <strong>Linux</strong> 6Kernel version2.6.18-194.32.1.el52.6.18-194.32.1.el5PAE2.6.18-194.32.1.el5xen2.6.18-238.el52.6.18-238.el5PAE2.6.18-238.el5xen2.6.18-238.1.1.el52.6.18-238.1.1.el5PAE2.6.18-238.1.1.el5xen2.6.18-238.9.1.el52.6.18-238.9.1.el5PAE2.6.18-238.9.1.el5xen2.6.18-238.37.1.el52.6.18-238.37.1.el5PAE2.6.18-238.37.1.el5xen2.6.18-274.el52.6.18-274.el5PAE2.6.18-274.el5xen2.6.18-274.18.1.el52.6.18-274.18.1.el5PAE2.6.18-274.18.1.el5xen2.6.18-308.el52.6.18-308.el5PAE2.6.18-308.el5xen2.6.18-308.4.1.el52.6.18-308.4.1.el5PAE2.6.18-308.4.1.el5xen2.6.18-308.11.1.el52.6.18-308.11.1.el5PAE2.6.18-308.11.1.el5xenReplicatorReplicator DRsupportRun clpkaandclpkhbsupport<strong>ExpressCluster</strong>VersionYes Yes 3.0.3-1 or laterYes Yes 3.0.3-1 or laterYes Yes 3.0.3-1 or laterYes Yes <strong>3.1</strong>.0-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.0-1 or laterYes Yes <strong>3.1</strong>.3-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.5-1 or later2.6.32-71.el6.i686 Yes Yes 3.0.2-1 or later2.6.32-71.7.1.el6.i686 Yes Yes 3.0.3-1 or later2.6.32-71.14.1.el6.i686 Yes Yes 3.0.3-1 or later2.6.32-71.18.1.el6.i686 Yes Yes 3.0.3-1 or laterRed Hat2.6.32-131.0.15.el6.i686 Yes Yes 3.0.4-1 or laterEnterprise <strong>Linux</strong> 6(update1) 2.6.32-131.21.1.el6.i686 Yes Yes <strong>3.1</strong>.3-1 or laterRed HatEnterprise <strong>Linux</strong> 6(update2)2.6.32-220.el6.i686 Yes Yes <strong>3.1</strong>.3-1 or later2.6.32-220.4.2.el6.i686 Yes Yes <strong>3.1</strong>.3-1 or laterRemarks2.6.32-220.17.1.el6.i686 Yes Yes <strong>3.1</strong>.4-1 or later2.6.32-220.2<strong>3.1</strong>.el6.i686 Yes Yes <strong>3.1</strong>.5-1 or laterSection II Installing <strong>ExpressCluster</strong>55


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>DistributionKernel versionReplicatorReplicator DRsupportRun clpkaandclpkhbsupport<strong>ExpressCluster</strong>VersionRed Hat2.6.32-279.el6.i686 Yes Yes <strong>3.1</strong>.4-1 or laterEnterprise <strong>Linux</strong> 6(update3) 2.6.32-279.2.1.el6.i686 Yes Yes <strong>3.1</strong>.5-1 or laterAsianux Server 3(SP2)Asianux Server 3(SP3)Asianux Server 3(SP4)2.6.18-128.7AXS32.6.18-128.7AXS3PAE2.6.18-128.7AXS3xen2.6.18-194.1.AXS32.6.18-194.1.AXS3PAE2.6.18-194.1.AXS3xen2.6.18-194.2.AXS32.6.18-194.2.AXS3PAE2.6.18-194.2.AXS3xen2.6.18-194.6.AXS32.6.18-194.6.AXS3PAE2.6.18-194.6.AXS3xen2.6.18-194.9.AXS32.6.18-194.9.AXS3PAE2.6.18-194.9.AXS3xen2.6.18-238.2.AXS32.6.18-238.2.AXS3PAE2.6.18-238.2.AXS3xen2.6.18-308.1.AXS32.6.18-308.1.AXS3PAE2.6.18-308.1.AXS3xenYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.1-1 or laterYes Yes 3.0.3-1 or laterYes Yes <strong>3.1</strong>.0-1 or laterYes Yes <strong>3.1</strong>.5-1 or laterAsianux Server 4 2.6.32-71.7.1.el6.i686 Yes Yes 3.0.4-1 or laterAsianux Server 4(SP1)Asianux Server 4(SP2)Novell SUSE LINUXEnterprise Server 10(SP2)Novell SUSE LINUXEnterprise Server 10(SP3)2.6.32-131.12.1.el6.i686 Yes Yes <strong>3.1</strong>.3-1 or later2.6.32-220.1<strong>3.1</strong>.el6.i686 Yes Yes <strong>3.1</strong>.4-1 or later2.6.32-279.2.1.el6.i686 Yes Yes <strong>3.1</strong>.7-1 or later2.6.16.60-0.21-default2.6.16.60-0.21-smp2.6.16.60-0.21-bigsmp2.6.16.60-0.21-xen2.6.16.60-0.54.5-default2.6.16.60-0.54.5-smp2.6.16.60-0.54.5-bigsmp2.6.16.60-0.54.5-xen2.6.16.60-0.69.1-default2.6.16.60-0.69.1-smp2.6.16.60-0.69.1-bigsmp2.6.16.60-0.69.1-xen2.6.16.60-0.83.2-default2.6.16.60-0.83.2-smp2.6.16.60-0.83.2-bigsmp2.6.16.60-0.83.2-xenYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYesYes3.0.1-1 or later3.0.3-1,<strong>3.1</strong>.0-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterRemarks56<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareDistributionNovell SUSE LINUXEnterprise Server 10(SP4)Novell SUSE LINUXEnterprise Server 11Novell SUSE LINUXEnterprise Server 11(SP1)Novell SUSE LINUXEnterprise Server 11(SP2)XenServer 5.5(update2)XenServer 5.6XenServer 5.6(SP2)XenServer 6.0Kernel version2.6.16.60-0.85.1-default2.6.16.60-0.85.1-smp2.6.16.60-0.85.1-bigsmp2.6.16.60-0.85.1-xen2.6.16.60-0.91.1-default2.6.16.60-0.91.1-smp2.6.16.60-0.91.1-bigsmp2.6.16.60-0.91.1-xen2.6.16.60-0.9<strong>3.1</strong>-default2.6.16.60-0.9<strong>3.1</strong>-smp2.6.16.60-0.9<strong>3.1</strong>-bigsmp2.6.16.60-0.9<strong>3.1</strong>-xen2.6.16.60-0.97.1-default2.6.16.60-0.97.1-smp2.6.16.60-0.97.1-bigsmp2.6.16.60-0.97.1-xen2.6.27.19-5-default2.6.27.19-5-pae2.6.27.19-5-xenReplicatorReplicator DRsupportRun clpkaandclpkhbsupport<strong>ExpressCluster</strong>VersionYes Yes 3.0.4-1 or laterYes Yes <strong>3.1</strong>.3-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.5-1 or laterYes Yes 3.0.0-1 or later2.6.27.48-0.12-default2.6.27.48-0.12-paeYes Yes 3.0.1-1 or later2.6.27.48-0.12-xen2.6.32.12-0.7-default2.6.32.12-0.7-pae2.6.32.12-0.7-xenNoYesYesYes3.0.0-1 or later3.0.2-1 or later2.6.32.19-0.<strong>3.1</strong>-defaultNo Yes 3.0.1-1 or later2.6.32.19-0.<strong>3.1</strong>-pae2.6.32.19-0.<strong>3.1</strong>-xen Yes Yes 3.0.2-1 or later2.6.32.23-0.<strong>3.1</strong>-defaultNo Yes 3.0.1-1 or later2.6.32.23-0.<strong>3.1</strong>-pae2.6.32.23-0.<strong>3.1</strong>-xen Yes Yes 3.0.2-1 or later2.6.32.49-0.3-default2.6.32.49-0.3-pae2.6.32.49-0.3-xen3.0.13-0.27-default3.0.13-0.27-pae3.0.34-0.7-default3.0.34-0.7-pae2.6.18-128.1.6.el5.xs5.5.0.505.1024xen2.6.27.42-0.1.1.xs5.6.0.44.111158xen2.6.32.12-0.7.1.xs5.6.100.32<strong>3.1</strong>70596xen2.6.32.12-0.7.1.xs6.0.0.529.170661xenYes Yes <strong>3.1</strong>.3-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.5-1 or laterNo Yes 3.0.0-1 or laterNo Yes <strong>3.1</strong>.0-1 or laterNo Yes <strong>3.1</strong>.0-1 or laterNo Yes <strong>3.1</strong>.1-1 or laterRemarksSection II Installing <strong>ExpressCluster</strong>57


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>x86_64DistributionTurbolinux 11 Server(SP1)Kernel versionReplicator Run clpkaReplicator DR and clpkhbsupport support<strong>ExpressCluster</strong>Version2.6.23-10 Yes Yes 3.0.0-1 or later2.6.23-12 Yes Yes 3.0.0-1 or laterTurbolinux Appliance 2.6.23-10 Yes Yes 3.0.0-1 or laterServer 3.0(SP1) 2.6.23-12 Yes Yes 3.0.0-1 or laterRed HatEnterprise <strong>Linux</strong> 5(update4)Red HatEnterprise <strong>Linux</strong> 5(update5)Red HatEnterprise <strong>Linux</strong> 5(update6)Red HatEnterprise <strong>Linux</strong> 5(update7)2.6.18-164.el52.6.18-164.el5xen2.6.18-164.6.1.el52.6.18-164.6.1.el5xen2.6.18-164.9.1.el52.6.18-164.9.1.el5xen2.6.18-164.11.1.el52.6.18-164.11.1.el5xen2.6.18-164.15.1.el52.6.18-164.15.1.el5xen2.6.18-164.38.1.el52.6.18-164.38.1.el5xen2.6.18-194.el52.6.18-194.el5xen-2.6.18-194.8.1.el52.6.18-194.8.1.el5xen2.6.18-194.11.4.el52.6.18-194.11.4.el5xen2.6.18-194.17.1.el52.6.18-194.17.1.el5xen2.6.18-194.32.1.el52.6.18-194.32.1.el5xen2.6.18-238.el52.6.18-238.el5xen2.6.18-238.1.1.el52.6.18-238.1.1.el5xen2.6.18-238.9.1.el52.6.18-238.9.1.el5xen2.6.18-238.37.1.el52.6.18-238.37.1.el5xen2.6.18-274.el52.6.18-274.el5xen2.6.18-274.18.1.el52.6.18-274.18.1.el5xenYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.1-1 or laterYes Yes 3.0.1-1 or laterYes Yes 3.0.3-1 or laterYes Yes 3.0.3-1 or laterYes Yes 3.0.3-1 or laterYes Yes <strong>3.1</strong>.0-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.0-1 or laterYes Yes <strong>3.1</strong>.3-1 or laterRemarks58<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareDistributionRed HatEnterprise <strong>Linux</strong> 5(update8)Red HatEnterprise <strong>Linux</strong> 6Kernel version2.6.18-308.el52.6.18-308.el5xen2.6.18-308.4.1el52.6.18-308.4.1el5xen2.6.18-308.11.1el52.6.18-308.11.1el5xenReplicator Run clpkaReplicator DR and clpkhbsupport support<strong>ExpressCluster</strong>VersionYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.5-1 or later2.6.32-71.el6.x86_64 Yes Yes 3.0.2-1 or later2.6.32-71.7.1.el6.x86_64 Yes Yes 3.0.3-1 or later2.6.32-71.14.1.el6.x86_64 Yes Yes 3.0.3-1 or later2.6.32-71.18.1.el6.x86_64 Yes Yes 3.0.3-1 or laterRed Hat2.6.32-131.0.15.el6.x86_64 Yes Yes 3.0.4-1 or laterEnterprise <strong>Linux</strong> 6(update1) 2.6.32-131.21.1.el6.x86_64 Yes Yes <strong>3.1</strong>.3-1 or laterRed HatEnterprise <strong>Linux</strong> 6(update2)2.6.32-220.el6.x86_64 Yes Yes <strong>3.1</strong>.3-1 or later2.6.32-220.4.2.el6.x86_64 Yes Yes <strong>3.1</strong>.3-1 or laterRemarks2.6.32-220.17.1.el6.x86_64 Yes Yes <strong>3.1</strong>.4-1 or later2.6.32-220.2<strong>3.1</strong>.el6.i686 Yes Yes <strong>3.1</strong>.5-1 or laterRed HatEnterprise <strong>Linux</strong> 6(update3)2.6.32-279.el6.x86_64 Yes Yes <strong>3.1</strong>.4-1 or later2.6.32-279.2.1.el6.i686 Yes Yes <strong>3.1</strong>.5-1 or laterAsianux Server 3(SP2)Asianux Server 3(SP3)Asianux Server 3(SP4)2.6.32-279.11.1.el6.x86_64 Yes Yes <strong>3.1</strong>.7-1 or later2.6.32-279.14.1.el6.x86_64 Yes Yes <strong>3.1</strong>.7-1 or later2.6.18-128.7AXS32.6.18-128.7AXS3xen2.6.18-194.1.AXS32.6.18-194.1.AXS3xen2.6.18-194.2.AXS32.6.18-194.2.AXS3xen2.6.18-194.6.AXS32.6.18-194.6.AXS3xen2.6.18-194.9.AXS32.6.18-194.9.AXS3xen2.6.18-238.2.AXS32.6.18-238.2.AXS3xen2.6.18-308.1.AXS32.6.18-308.1.AXS3xenYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.1-1 or laterYes Yes 3.0.3-1 or laterYes Yes <strong>3.1</strong>.0-1 or laterYes Yes <strong>3.1</strong>.5-1 or laterAsianux Server 4 2.6.32-71.7.1.el6.x86_64 Yes Yes 3.0.4-1 or laterAsianux Server 4(SP1)Asianux Server 4(SP2)Novell SUSE LINUXEnterprise Server 10(SP2)2.6.32-131.12.1.el6.x86_64 Yes Yes <strong>3.1</strong>.3-1 or later2.6.32-220.1<strong>3.1</strong>.el6.x86_64 Yes Yes <strong>3.1</strong>.4-1 or later2.6.32-279.2.1.el6.x86_64 Yes Yes <strong>3.1</strong>.7-1 or later2.6.16.60-0.21-default2.6.16.60-0.21-smp2.6.16.60-0.21-xenYes Yes 3.0.0-1 or laterSection II Installing <strong>ExpressCluster</strong>59


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>DistributionNovell SUSE LINUXEnterprise Server 10(SP3)Novell SUSE LINUXEnterprise Server 10(SP4)Novell SUSE LINUXEnterprise Server 11Novell SUSE LINUXEnterprise Server 11(SP1)Kernel version2.6.16.60-0.54.5-default2.6.16.60-0.54.5-smp2.6.16.60-0.54.5-xen2.6.16.60-0.69.1-default2.6.16.60-0.69.1-smp2.6.16.60-0.69.1-xen2.6.16.60-0.83.2-default2.6.16.60-0.83.2-smp2.6.16.60-0.83.2-xen2.6.16.60-0.85.1-default2.6.16.60-0.85.1-smp2.6.16.60-0.85.1-xen2.6.16.60-0.91.1-default2.6.16.60-0.91.1-smp2.6.16.60-0.91.1-xen2.6.16.60-0.9<strong>3.1</strong>-default2.6.16.60-0.9<strong>3.1</strong>-smp2.6.16.60-0.9<strong>3.1</strong>-xen2.6.16.60-0.97.1-default2.6.16.60-0.97.1-smp2.6.16.60-0.97.1-xen2.6.27.19-5-default2.6.27.19-5-xen2.6.27.48-0.12-default2.6.27.48-0.12-xenReplicator Run clpkaReplicator DR and clpkhbsupport support<strong>ExpressCluster</strong>VersionYes Yes 3.0.0-1 or laterYesYes3.0.1-1 or later3.0.3-1,<strong>3.1</strong>.0-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes 3.0.4-1 or laterYes Yes <strong>3.1</strong>.3-1 or laterYes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.5-1 or laterYes Yes 3.0.0-1 or laterYes Yes 3.0.1-1 or later2.6.32.12-0.7-defaultNo Yes 3.0.0-1 or later2.6.32.12-0.7-xen Yes Yes 3.0.2-1 or later2.6.32.19-0.<strong>3.1</strong>-defaultNo Yes 3.0.1-1 or later2.6.32.19-0.<strong>3.1</strong>-xen Yes Yes 3.0.2-1 or later2.6.32.23-0.<strong>3.1</strong>-defaultNo Yes 3.0.1-1 or later2.6.32.23-0.<strong>3.1</strong>-xen Yes Yes 3.0.2-1 or laterRemarks2.6.32.49-0.3-default2.6.32.49-0.3-xenYes Yes <strong>3.1</strong>.3-1 or laterNovell SUSE LINUXEnterprise Server 11(SP2)OracleEnterprise <strong>Linux</strong> 5(5.5)Oracle <strong>Linux</strong> 6.2VMware ESX 4.0VMware ESX 4.0(update1)VMware ESX 4.0(update2)603.0.13-0.27-default3.0.13-0.27-xen3.0.34-0.7-default3.0.34-0.7-xen2.6.18-194.el52.6.18-194.el5xen2.6.39-200.29.1.el6uek.x86_64Yes Yes <strong>3.1</strong>.4-1 or laterYes Yes <strong>3.1</strong>.5-1 or laterYes Yes 3.0.0-1 or laterYes Yes <strong>3.1</strong>.5-1 or later2.6.18-128.ESX No Yes 3.0.0-1 or later<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareDistributionKernel versionReplicator Run clpkaReplicator DR and clpkhbsupport support<strong>ExpressCluster</strong>VersionVMware ESX 4.1 2.6.18-164.ESX No Yes 3.0.0-1 or laterVMware ESX 4.1(update1) 2.6.18-194.ESX No Yes 3.0.3-1 or laterRemarksVMware ESX 4.1(update2) 2.6.18-194.ESX No Yes <strong>3.1</strong>.0-1 or laterIBM POWERDistributionRed HatEnterprise <strong>Linux</strong> 5(update4)Red HatEnterprise <strong>Linux</strong> 5(update5)Red HatEnterprise <strong>Linux</strong> 5(update6)Kernel versionReplicatorReplicator DRsupportRun clpkaand clpkhbsupport<strong>ExpressCluster</strong>Version2.6.18-164.el5 No Yes 3.0.0-1 or later2.6.18-164.6.1.el5 No Yes 3.0.0-1 or later2.6.18-164.9.1.el5 No Yes 3.0.0-1 or later2.6.18-164.11.1.el5 No Yes 3.0.0-1 or later2.6.18-164.15.1.el5 No Yes 3.0.0-1 or later2.6.18-164.38.1.el5 No Yes <strong>3.1</strong>.4-1 or later2.6.18-194.el5 No Yes 3.0.0-1 or later2.6.18-194.8.1.el5 No Yes 3.0.0-1 or later2.6.18-194.11.4.el5 No Yes 3.0.1-1 or later2.6.18-194.17.1.el5 No Yes 3.0.1-1 or later2.6.18-194.32.1.el5 No Yes 3.0.3-1 or later2.6.18-238.el5 No Yes 3.0.3-1 or later2.6.18-238.1.1.el5 No Yes 3.0.3-1 or later2.6.18-238.9.1.el5 No Yes <strong>3.1</strong>.0-1 or later2.6.18-238.37.1.el5 No Yes <strong>3.1</strong>.4-1 or laterRed Hat2.6.18-274.el5 No Yes <strong>3.1</strong>.0-1 or laterEnterprise <strong>Linux</strong> 5(update7) 2.6.18-274.18.1.el5 No Yes <strong>3.1</strong>.3-1 or laterRed HatEnterprise <strong>Linux</strong> 5(update8)Red HatEnterprise <strong>Linux</strong> 62.6.18-308.el5 No Yes <strong>3.1</strong>.4-1 or later2.6.18-308.4.1.el5 No Yes <strong>3.1</strong>.4-1 or later2.6.18-308.11.1.el5 No Yes <strong>3.1</strong>.5-1 or later2.6.32-71.el6.ppc64 No Yes 3.0.2-1 or later2.6.32-71.7.1.el6.ppc64 No Yes 3.0.3-1 or later2.6.32-71.14.1.el6.ppc64 No Yes 3.0.3-1 or later2.6.32-71.18.1.el6.ppc64 No Yes 3.0.3-1 or laterRed Hat2.6.32-131.0.15.el6.ppc64 No Yes 3.0.4-1 or laterEnterprise <strong>Linux</strong> 6(update1) 2.6.32-131.21.1.el6.ppc64 No Yes <strong>3.1</strong>.3-1 or laterRed Hat2.6.32-220.el6.ppc64 No Yes <strong>3.1</strong>.3-1 or laterEnterprise <strong>Linux</strong> 6(update2) 2.6.32-220.4.2.el6.ppc64 No Yes <strong>3.1</strong>.3-1 or laterRemarksSection II Installing <strong>ExpressCluster</strong>61


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>DistributionRed HatEnterprise <strong>Linux</strong> 6(update3)Novell SUSE LINUXEnterprise Server 10(SP2)Kernel versionReplicatorReplicator DRsupportRun clpkaand clpkhbsupport<strong>ExpressCluster</strong>Version2.6.32-220.17.1.el6.ppc64 No Yes <strong>3.1</strong>.4-1 or later2.6.32-220.2<strong>3.1</strong>.el6.ppc64 No Yes <strong>3.1</strong>.5-1 or later2.6.32-279.el6.ppc64 No Yes <strong>3.1</strong>.5-1 or later2.6.32-279.2.1.el6.ppc64 No Yes <strong>3.1</strong>.5-1 or later2.6.32-279.11.1.el6.ppc64 No Yes <strong>3.1</strong>.7-1 or later2.6.32-279.14.1.el6.ppc64 No Yes <strong>3.1</strong>.7-1 or later2.6.16.60-0.21-ppc64 No Yes 3.0.0-1 or laterRemarksNovell SUSE LINUX 2.6.16.60-0.54.5-ppc64 No Yes 3.0.0-1 or laterEnterprise Server 10(SP3) 2.6.16.60-0.69.1-ppc64 No Yes <strong>3.1</strong>.0-1 or laterNovell SUSE LINUXEnterprise Server 10(SP4)Novell SUSE LINUXEnterprise Server 11Novell SUSE LINUXEnterprise Server 11(SP1)Novell SUSE LINUXEnterprise Server 11(SP2)2.6.16.60-0.85.1-ppc64 No Yes 3.0.4-1 or later2.6.27.19-5-ppc64 No Yes 3.0.0-1 or later2.6.32.12-0.7-ppc64 No Yes 3.0.2-1 or later3.0.13-0.27-ppc64 No Yes <strong>3.1</strong>.5-1 or later62<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareApplications supported by monitoring optionsVersion in<strong>for</strong>mation of the applications to be monitored by monitor resources is described below.IA32Monitor resourceOracle monitorDB2 monitorPostgreSQL monitorMySQL monitorSybase monitorMonitored applicationOracle Database 10g Release 2(10.2)Oracle Database 11g Release 1(11.1)Oracle Database 11g Release 2(11.2)DB2 V9.5DB2 V9.7DB2 V10.1PostgreSQL 8.1PostgreSQL 8.2PostgreSQL 8.3PostgreSQL 8.4PostgreSQL 9.0PostgreSQL 9.1PostgreSQL 9.2PowerGres on <strong>Linux</strong> 6.0PowerGres on <strong>Linux</strong> 7.0PowerGres on <strong>Linux</strong> 7.1PowerGres on <strong>Linux</strong> 9.0MySQL 5.0MySQL 5.1MySQL 5.5Sybase ASE 15.0Sybase ASE 15.5<strong>ExpressCluster</strong>version3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later<strong>3.1</strong>.3-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.3-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.7-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.3-1 or later3.0.0-1 or later3.0.0-1 or later3.0.3-1 or later3.0.0-1 or later<strong>3.1</strong>.0-1 or laterRemarksSamba monitornfs monitorSybase ASE 15.7Samba 3.0Samba 3.2Samba 3.3Samba 3.4Samba 3.5nfsd 2 (udp)nfsd 3 (udp)nfsd 4 (tcp)mountd 1 (tcp)mountd 2 (tcp)<strong>3.1</strong>.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later<strong>3.1</strong>.5-1 or later3.0.0-1 or later<strong>3.1</strong>.5-1 or later<strong>3.1</strong>.5-1 or later3.0.0-1 or later<strong>3.1</strong>.5-1 or laterSection II Installing <strong>ExpressCluster</strong>63


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>Monitor resourceMonitored application<strong>ExpressCluster</strong>versionmountd 3 (tcp) <strong>3.1</strong>.5-1 or laterRemarksHTTP monitor No specified version 3.0.0-1 or laterSMTP monitor No specified version 3.0.0-1 or laterPop3 monitor No specified version 3.0.0-1 or laterimap4 monitor No specified version 3.0.0-1 or laterftp monitor No specified version 3.0.0-1 or laterTuxedo monitorOracleAS monitorWeblogic monitorWebsphere monitorWebOTX monitorJVM monitorTuxedo 10g R3 3.0.0-1 or laterTuxedo 11g R1 3.0.0-1 or laterOracle Application Server 10gRelease 3 (10.1.3.4)3.0.0-1 or laterWebLogic Server 10g R3 3.0.0-1 or laterWebLogic Server 11g R1 3.0.0-1 or laterWebLogic Server 12c <strong>3.1</strong>.3-1 or laterWebSphere 6.1 3.0.0-1 or laterWebSphere 7.0 3.0.0-1 or laterWebSphere 8.0 <strong>3.1</strong>.5-1 or laterWebOTX V7.1 3.0.0-1 or laterWebOTX V8.0 3.0.0-1 or laterWebOTX V8.1 3.0.0-1 or laterWebOTX V8.2 3.0.0-1 or laterWebOTX V8.3 <strong>3.1</strong>.0-1 or laterWebOTX V8.4 <strong>3.1</strong>.0-1 or laterWebLogic Server 11g R1 <strong>3.1</strong>.0-1 or laterWebLogic Server 12c <strong>3.1</strong>.3-1 or laterWebOTX V8.2 <strong>3.1</strong>.0-1 or laterWebOTX V8.3 <strong>3.1</strong>.0-1 or laterWebOTX V8.4 <strong>3.1</strong>.0-1 or laterWebOTX Enterprise ServiceBus V8.4WebOTX Enterprise ServiceBus V8.5JBoss Application Server4.2.3.GA/5.1.0.GAJBoss Enterprise ApplicationPlat<strong>for</strong>m 4.3.0.GA_CP06<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.5-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or laterApache Tomcat 6.0 <strong>3.1</strong>.0-1 or laterApache Tomcat 7.0 <strong>3.1</strong>.3-1 or laterWebSAM SVF <strong>for</strong> PDF 9.0 <strong>3.1</strong>.3-1 or laterWebSAM SVF <strong>for</strong> PDF 9.1 <strong>3.1</strong>.4-1 or later64<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareMonitor resourceMonitored applicationWebSAM Report DirectorEnterprise 9.0WebSAM Report DirectorEnterprise 9.1WebSAM Universal Connect/X9.0WebSAM Universal Connect/X9.1Oracle iPlanet Web Server 7.0<strong>ExpressCluster</strong>version<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.5-1 or later<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.5-1 or later<strong>3.1</strong>.3-1 or laterSystem monitor No specified version <strong>3.1</strong>.0-1 or laterRemarksx86_64Monitor resourceMonitored applicationOracle Database 10gRelease 2 (10.2)<strong>ExpressCluster</strong>version3.0.0-1 or laterRemarksOracle monitorOracle Database 11gRelease 1 (11.1)3.0.0-1 or laterOracle Database 11gRelease 2 (11.2)3.0.0-1 or laterDB2 V9.53.0.0-1 or laterDB2 monitorDB2 V9.73.0.0-1 or laterDB2 V10.1<strong>3.1</strong>.3-1 or laterPostgreSQL 8.13.0.0-1 or laterPostgreSQL 8.23.0.0-1 or laterPostgreSQL 8.33.0.0-1 or laterPostgreSQL 8.43.0.0-1 or laterPostgreSQL 9.03.0.3-1 or laterPostgreSQL monitorPostgreSQL 9.1<strong>3.1</strong>.0-1 or laterPostgreSQL 9.2<strong>3.1</strong>.7-1 or laterPowerGres on <strong>Linux</strong> 6.0 3.0.0-1 or laterPowerGres on <strong>Linux</strong> 7.0 3.0.0-1 or laterPowerGres on <strong>Linux</strong> 7.1 3.0.0-1 or laterPowerGres on <strong>Linux</strong> 9.0 3.0.3-1 or laterPowerGres Plus V5.0 3.0.0-1 or laterMySQL 5.03.0.0-1 or laterMySQL monitor MySQL 5.13.0.0-1 or laterMySQL 5.53.0.3-1 or laterSybase ASE 15.03.0.0-1 or laterSybase monitor Sybase ASE 15.5<strong>3.1</strong>.0-1 or laterSybase ASE 15.7<strong>3.1</strong>.0-1 or laterSamba monitor Samba 3.0 3.0.0-1 or laterSection II Installing <strong>ExpressCluster</strong>65


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>Monitor resourceMonitored applicationSamba 3.2<strong>ExpressCluster</strong>version3.0.0-1 or laterRemarksSamba 3.33.0.0-1 or laterSamba 3.43.0.0-1 or laterSamba 3.5<strong>3.1</strong>.5-1 or laternfsd 2 (udp)3.0.0-1 or laternfsd 3 (udp)<strong>3.1</strong>.5-1 or laterNFS monitornfsd 4 (tcp)mountd 1 (tcp)<strong>3.1</strong>.5-1 or later3.0.0-1 or latermountd 2 (tcp)<strong>3.1</strong>.5-1 or latermountd 3 (tcp)<strong>3.1</strong>.5-1 or laterHTTP monitor No specified version 3.0.0-1 or laterSMTP monitor No specified version 3.0.0-1 or laterpop3 monitor No specified version 3.0.0-1 or laterimap4 monitor No specified version 3.0.0-1 or laterftp monitor No specified version 3.0.0-1 or laterTuxedo monitorOracleAS monitorWeblogic monitorWebsphere monitorWebOTX monitorJVM monitorTuxedo 10g R3Tuxedo 11g R1Oracle Application Server10g Release 3 (10.1.3.4)WebLogic Server 10g R3WebLogic Server 11g R1WebLogic Server 12cWebSphere 6.1WebSphere 7.0WebSphere 8.0WebOTX V7.1WebOTX V8.0WebOTX V8.1WebOTX V8.2WebOTX V8.3WebOTX V8.4WebOTX V8.5WebLogic Server 11g R1WebLogic Server 12cWebOTX V8.2WebOTX V8.3WebOTX V8.4WebOTX V8.53.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later<strong>3.1</strong>.3-1 or later3.0.0-1 or later3.0.0-1 or later<strong>3.1</strong>.5-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later3.0.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.5-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.5-1 or later66<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareMonitor resourceMonitored applicationWebOTX EnterpriseService Bus V8.4WebOTX EnterpriseService Bus V8.5JBoss Application Server4.2.3.GA/5.1.0.GAJBoss EnterpriseApplication Plat<strong>for</strong>m4.3.0.GA_CP06Apache Tomcat 6.0Apache Tomcat 7.0WebSAM SVF <strong>for</strong> PDF 9.0WebSAM SVF <strong>for</strong> PDF 9.1WebSAM Report DirectorEnterprise 9.0WebSAM Report DirectorEnterprise 9.1WebSAM UniversalConnect/X 9.0WebSAM UniversalConnect/X 9.1Oracle iPlanet Web Server7.0<strong>ExpressCluster</strong>version<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.5-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.4-1 or later<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.5-1 or later<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.5-1 or later<strong>3.1</strong>.3-1 or laterSystem monitor No specified version <strong>3.1</strong>.0-1 or laterRemarksNote: To use monitoring options in x86_64 environments, applications to bemonitored must be x86_64 version.IBM POWERMonitor resourceMonitored application<strong>ExpressCluster</strong>versionOracle monitor Oracle Database 10g Release 2 (10.2) 3.0.0-1 or laterDB2 monitorPostgreSQL monitorDB2 V9.5DB2 V9.7DB2 V10.1PostgreSQL 8.1PostgreSQL 8.2PostgreSQL 8.3PostgreSQL 8.4PostgreSQL 9.0PostgreSQL 9.1PostgreSQL 9.2<strong>3.1</strong>.0-1 or later3.0.0-1 or later<strong>3.1</strong>.3-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later3.0.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.7-1 or laterRemarksSection II Installing <strong>ExpressCluster</strong>67


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>Note: To use monitoring options in IBM POWER environments, applications to bemonitored must be IBM POWER version.Operation Environment of VM resourcesThe followings are the version in<strong>for</strong>mation of the virtual machines on which VM resourcesoperation are verified.vSphereVirtual MachineXenServerKVMVersion<strong>ExpressCluster</strong>versionRemarks4.0 update1 3.0.0-1 or later x86_644.0 update2 3.0.0-1 or later x86_644.1 3.0.0-1 or later x86_645 <strong>3.1</strong>.0-1 or later VM5.5 3.0.0-1 or later IA325.6 3.0.0-1 or later IA32Red Hat Enterprise <strong>Linux</strong> 5.5 3.0.0-1 or later x86_64Red Hat Enterprise <strong>Linux</strong> 5.6 3.0.0-1 or later x86_64Red Hat Enterprise <strong>Linux</strong> 6.0 <strong>3.1</strong>.0-1 or later x86_64Red Hat Enterprise <strong>Linux</strong> 6.1 <strong>3.1</strong>.0-1 or later x86_6468<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareOperation environment <strong>for</strong> SNMP linkage functionsThe tables below list the SNMP agents on which the operation of the SNMP linkage functions wasverified.IA32DistributionSNMP agent<strong>ExpressCluster</strong>versionRemarksRed HatEnterprise <strong>Linux</strong> 5.4Net-SNMP 5.3.2.2<strong>3.1</strong>.0-1 or laterRed HatEnterprise <strong>Linux</strong> 5.6Red HatEnterprise <strong>Linux</strong> 6.1Novell SUSE LINUXEnterprise Server 11(SP1)Net-SNMP 5.3.2.2Net-SNMP 5.5Net-SNMP 5.4.2.1<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or later<strong>3.1</strong>.0-1 or laterx86_64DistributionRed HatEnterprise <strong>Linux</strong> 5.4SNMP agentNet-SNMP 5.3.2.2<strong>ExpressCluster</strong>version<strong>3.1</strong>.0-1 or laterRemarksRed HatEnterprise <strong>Linux</strong> 5.6Net-SNMP 5.3.2.2<strong>3.1</strong>.0-1 or laterRed HatEnterprise <strong>Linux</strong> 6.1Net-SNMP 5.5<strong>3.1</strong>.0-1 or laterNovell SUSE LINUXEnterprise Server 11(SP1)Net-SNMP 5.4.2.1<strong>3.1</strong>.0-1 or laterOracle Enterprise <strong>Linux</strong>5 (5.5)Net-SNMP 5.3.2.2<strong>3.1</strong>.0-1 or laterIBM POWERDistributionRed HatEnterprise <strong>Linux</strong> 6.1Net-SNMP 5.5SNMP agent<strong>ExpressCluster</strong>version<strong>3.1</strong>.0-1 or laterRemarksNovell SUSE LINUXEnterprise Server 11(SP1)Net-SNMP 5.4.2.1<strong>3.1</strong>.0-1 or laterNote: Use Novell SUSE LINUX Enterprise Server 11 (SP1) or later to obtain SNMPin<strong>for</strong>mation on a Novell SUSE LINUX Enterprise Server.Section II Installing <strong>ExpressCluster</strong>69


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>Operation environment <strong>for</strong> JVM monitorThe use of the JVM monitor requires a Java runtime environment.Java Runtime EnvironmentVersion6.0 Update 21 (1.6.0_21) or laterJava SE Development KitVersion 6.0 Update 21(1.6.0_21) or laterJava Runtime EnvironmentVersion7.0Java SE Development KitVersion 7.0Open JDKThe tables below list the load balancers that were verified <strong>for</strong> the linkage with the JVM monitor.IA32Load balancer<strong>ExpressCluster</strong>versionRemarksExpress5800/LB400h <strong>3.1</strong>.0-1 or laterBIG-IP v11 <strong>3.1</strong>.3-1 or laterMIRACLE LoadBalancer <strong>3.1</strong>.3-1 or laterCoyotePoint Equalizer <strong>3.1</strong>.3-1 or laterx86_64Load balancer<strong>ExpressCluster</strong>versionExpress5800/LB400h <strong>3.1</strong>.0-1 or laterBIG-IP v11 <strong>3.1</strong>.3-1 or laterMIRACLE LoadBalancer <strong>3.1</strong>.3-1 or laterCoyotePoint Equalizer <strong>3.1</strong>.3-1 or laterRemarks70<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


SoftwareRequired memory and disk sizeRequired memory sizeUser modeKernel modeRequired disk sizeRight afterinstallationMax. duringoperationWhen the synchronization mode isIA-32 96 MB(*1) used:140 MB 2.0 GBx86_6496 MB(*1)(number of request queues x I/Osize) +(2MB x number of mirror diskresources and hybrid diskresources)When the asynchronous mode isused:(number of request queues x I/Osize) +((2MB + (number ofasynchronous queues)) xnumber of mirror disk resourcesand hybrid disk resources)140 MB 2.0 GBIBM64 MB(*1) - 24 MB 1.1 GBPOWER(*1) excepting <strong>for</strong> optional products.Note:The I/O size is 128 KB <strong>for</strong> the vxfs file system and 4 KB <strong>for</strong> file systems other than it.For the setting value of the number of request queues and asynchronization queues,see “Understanding mirror disk resources” in the Reference <strong>Guide</strong>.Section II Installing <strong>ExpressCluster</strong>71


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>System requirements <strong>for</strong> the BuilderSupported operating systems and browsersRefer to the website, http://www.nec.com/global/prod/expresscluster/, <strong>for</strong> the latest in<strong>for</strong>mation.Currently supported operating systems and browsers are the following:Operating system Browser LanguageMicrosoft Windows ® XP SP3 or later (IA32)IE7English/Japanese/ChineseMicrosoft Windows Vista ® (IA32)IE8IE7IE8English/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseMicrosoft Windows ® 7 (IA32) IE8 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) IE8 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) IE9 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) Firefox 10 English/Japanese/ChineseMicrosoft Windows ® 8 (IA32, x86_64) IE10 English/Japanese/ChineseMicrosoft Windows ® 8 (IA32, x86_64) Firefox 15 English/Japanese/ChineseMicrosoft Windows Server 2008 (IA32) IE7 English/Japanese/ChineseMicrosoft Windows Server 2008 R2 IE 9 English/Japanese/ChineseMicrosoft Windows Server 2012 IE 10 English/Japanese/ChineseMicrosoft Windows Server 2012 Firefox 15 English/Japanese/ChineseNovell SUSE LINUXEnterprise Server 10 (IA32)Novell SUSE LINUXEnterprise Server 11 (x86_64)Red Hat Enterprise <strong>Linux</strong> 5update5 (IA32)Red Hat Enterprise <strong>Linux</strong> 6update3 (IA32)Asianux Server 3 (IA32)Firefox 2.0.0.2Firefox 10Firefox 3.0.18Firefox 10Firefox 1.5.0.12Konqueror3.5.5English/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseTurbolinux 11 Server (IA32) Firefox 2.0.0.8 English/Japanese/ChineseNote:The Builder does not run on a browser of a x86_64 or IBM POWER machine. Use abrowser supporting IA32 to run the Builder.72<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> the BuilderNote:When using Internet Explorer 9, to connect to WebManager by using http://:2900, the IP address must be registered to Site of Local Intranet inadvance.Java runtime environmentRequired:Java Runtime Environment, Version 6.0 Update 21 (1.6.0_21) or laterJava Runtime Environment, Version 7.0 Update 2 (1.7.0_2) or laterNote:The 32-bit Java Runtime is necessary to run the Builder on x86_64 machines.Required memory and disk sizeRequired memory size: 32 MB or moreRequired disk size: 5 MB (excluding the size required <strong>for</strong> Java runtime environment)Supported <strong>ExpressCluster</strong> versionsOffline Builder version <strong>ExpressCluster</strong> X rpm version3.0.0-1 3.0.0-13.0.1-1 3.0.1-13.0.2-1 3.0.2-13.0.3-2 3.0.3-13.0.4-1 3.0.4-1<strong>3.1</strong>.0-1 <strong>3.1</strong>.0-1<strong>3.1</strong>.1-1 <strong>3.1</strong>.1-1<strong>3.1</strong>.3-1 <strong>3.1</strong>.3-1<strong>3.1</strong>.4-1 <strong>3.1</strong>.4-1<strong>3.1</strong>.5-1<strong>3.1</strong>.5-1<strong>3.1</strong>.6-1<strong>3.1</strong>.7-1 <strong>3.1</strong>.7-1Note:When you use the Offline Builder and the <strong>ExpressCluster</strong> rpm, a combination of theirversions should be the one shown above. The Builder may not operate properly if theyare used in a different combination.Section II Installing <strong>ExpressCluster</strong>73


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>System requirements <strong>for</strong> the WebManagerSupported operating systems and browsersRefer to the website, http://www.nec.com/global/prod/expresscluster/, <strong>for</strong> the latest in<strong>for</strong>mation.Currently the following operating systems and browsers are supported:Operating system Browser LanguageMicrosoft Windows ® SP3(IA32)IE7English/Japanese/ChineseMicrosoft Windows Vista ® (IA32)IE8IE7IE8English/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseMicrosoft Windows ® 7 (IA32) IE8 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) IE8 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) IE9 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) Firefox 10 English/Japanese/ChineseMicrosoft Windows ® 8 (IA32, x86_64) IE10 English/Japanese/ChineseMicrosoft Windows ® 8 (IA32, x86_64) Firefox 15 English/Japanese/ChineseMicrosoft Windows Server 2008 (IA32) IE7 English/Japanese/ChineseMicrosoft Windows Server 2008 R2 IE 9 English/Japanese/ChineseMicrosoft Windows Server 2012 IE 10 English/Japanese/ChineseMicrosoft Windows Server 2012 Firefox 15 English/Japanese/ChineseNovell SUSE LINUXEnterprise Server 10 (IA32)Novell SUSE LINUXEnterprise Server 11 (x86_64)Red Hat Enterprise <strong>Linux</strong> 5update5 (IA32)Red Hat Enterprise <strong>Linux</strong> 6update3 (IA32)Asianux Server 3 (IA32)Firefox 2.0.0.2Firefox 10Firefox 3.0.18Firefox 10Firefox 1.5.0.12Konqueror3.5.5English/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseTurbolinux 11 Server (IA32) Firefox 2.0.0.8 English/Japanese/ChineseNote:The <strong>ExpressCluster</strong> X WebManager does not run on a browser of a x86_64 or IBMPOWER machine. Use a browser supporting IA32 to run the <strong>ExpressCluster</strong> XWebManager .Note:When using Internet Explorer 9, to connect to WebManager by using http://:2900, the IP address must be registered to Site of Local Intranet inadvance.Java runtime environment74<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> the WebManagerRequired:Java Runtime Environment, Version 6.0 Update 21 (1.6.0_21) or laterJava Runtime Environment, Version 7.0 Update 2 (1.7.0_2) or laterNote:The 32-bit Java Runtime is necessary to run the Builder on x86_64 machines.Required memory and disk sizeRequired memory size: 40 MB or moreRequired disk size: 600 KB (excluding the size required <strong>for</strong> Java runtime environment)Section II Installing <strong>ExpressCluster</strong>75


Chapter 3 Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>System requirements <strong>for</strong> the IntegratedWebManagerThis section explains system requirements to operate the Integrated WebManager. Refer to theIntegrated WebManager Administrator’s <strong>Guide</strong> <strong>for</strong> the Java application version IntegratedWebManager.Supported operating systems and browsersCurrently the following operating systems and browsers are supported:Operating system Browser LanguageIE7English/Japanese/ChineseWindows ® XP SP3IE8English/Japanese/ChineseMicrosoft Windows Vista (IA32)IE7IE8English/Japanese/ChineseEnglish/Japanese/ChineseMicrosoft Windows ® 7 (IA32) IE8 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) IE8 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) IE9 English/Japanese/ChineseMicrosoft Windows ® 7 SP1 (IA32) Firefox 10 English/Japanese/ChineseMicrosoft Windows ® 8 (IA32, x86_64) IE10 English/Japanese/ChineseMicrosoft Windows ® 8 (IA32, x86_64) Firefox 15 English/Japanese/ChineseMicrosoft Windows Server 2008 (IA32) IE7 English/Japanese/ChineseMicrosoft Windows Server 2008 R2 IE 9 English/Japanese/ChineseMicrosoft Windows Server 2012 IE 10 English/Japanese/ChineseMicrosoft Windows Server 2012 Firefox 15 English/Japanese/ChineseNovell SUSE LINUXEnterprise Server 10 (IA32)Novell SUSE LINUXEnterprise Server 11 (x86_64)Red Hat Enterprise <strong>Linux</strong> 5update5 (IA32)Red Hat Enterprise <strong>Linux</strong> 6update3 (IA32)Asianux Server 3 (IA32)Firefox 2.0.0.2Firefox 10Firefox 3.0.18Firefox 10Firefox 1.5.0.12Konqueror3.5.5English/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseEnglish/Japanese/ChineseTurbolinux 11 Server (IA32) Firefox 2.0.0.8 English/Japanese/ChineseNote:The <strong>ExpressCluster</strong> X WebManager does not run on a browser of a x86_64 or IBMPOWER machine. Use a browser supporting IA32 to run the <strong>ExpressCluster</strong> XWebManager .76<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileJava runtime environmentRequired:Java Runtime Environment, Version 6.0 Update 21 (1.6.0_21) or later.Java Runtime Environment, Version 7.0 Update 2 (1.7.0_2) or laterNote:The 32-bit Java Runtime is necessary to run the Builder on x86_64 machines.Required memory size and disk sizeRequired memory size: 40 MB or moreRequired disk size: 300 KB or more (excluding the size required <strong>for</strong> Java runtime environment)System requirements <strong>for</strong> WebManager MobileThis section explains the system requirements to run the WebManager Mobile.Supported operating systems and browsersCurrently the following operating systems and browsers are supported:Operating system Browser LanguageAndroid 2.2 Standard browser English/Japanese/ChineseAndroid 2.3 Standard browser English/Japanese/ChineseAndroid 3.0 Standard browser English/Japanese/ChineseiOS 5 Safari (standard) English/Japanese/ChineseSection II Installing <strong>ExpressCluster</strong>77


Chapter 4Latest version in<strong>for</strong>mationThis chapter provides the latest in<strong>for</strong>mation on <strong>ExpressCluster</strong>.This chapter covers:• Correspondence list of <strong>ExpressCluster</strong> and a manual··············································································80• Enhanced functions ································································································································80• Corrected in<strong>for</strong>mation ····························································································································8679


Chapter 4 Latest version in<strong>for</strong>mationCorrespondence list of <strong>ExpressCluster</strong> and a manualThis book has explained on the assumption that <strong>ExpressCluster</strong> of the following version. Becareful of the number of versions of the version of <strong>ExpressCluster</strong>, and a manual.<strong>ExpressCluster</strong>Version<strong>3.1</strong>.7-1Manual Manual Version RemarksInstallation and Configuration<strong>Guide</strong><strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>Reference <strong>Guide</strong>Integrated WebManagerAdministrator’s <strong>Guide</strong>WebManager MobileAdministrator’s <strong>Guide</strong>Sixth EditionSeventh EditionSixth EditionSeventh EditionSecond EditionEnhanced functionsUpgrade has been per<strong>for</strong>med on the following minor versions.NumberVersion (indetail)Upgraded section1 3.0.0-1 The WebManager and Builder can now be used from the same browserwindow.2 3.0.0-1 The cluster generation wizard has been upgraded.3 3.0.0-1 Some settings can now be automatically acquired in the cluster generationwizard.4 3.0.0-1 The Integrated WebManager can now be used from a browser.5 3.0.0-1 A function has been implemented to check settings when uploadingconfiguration data.6 3.0.0-1 <strong>ExpressCluster</strong> can now automatically select the failover destination whenan error occurs.7 3.0.0-1 A function has been implemented to control failovers across server groups.8 3.0.0-1 All Groups can now be selected as the failover target when an error isdetected.9 3.0.0-1 The start wait time can now be skipped.10 3.0.0-1 <strong>ExpressCluster</strong> can now manage external errors.11 3.0.0-1 Dump in<strong>for</strong>mation can now be acquired when the target monitoringapplication times out.12 3.0.0-1 Detailed in<strong>for</strong>mation about an Oracle database can now be acquired if anerror is detected while monitoring it.80<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumberVersion (indetail)Upgraded section13 3.0.0-1 Mirror data can now be compressed <strong>for</strong> transfer during asynchronousmirroring.14 3.0.0-1 Whole mirror synchronization has been accelerated.15 3.0.0-1 A function has been implemented to register a virtual host name to thedynamic DNS server.16 3.0.0-1 A guest OS can now be handled as a resource when the host OS ofvSphere, XenServer, or kvm is clustered.17 3.0.0-1 A function has been implemented to automatically follow a guest OS in thevirtualization infrastructure if it is moved by software other than<strong>ExpressCluster</strong>.18 3.0.0-1 vMotion can now be executed at error detection or during operation if thevSphere host OS is clustered.19 3.0.0-1 The Logical Volume Manager (LVM) can now be controlled.20 3.0.0-1 Disk settings have been consolidated.21 3.0.0-1 Additional OSs are now supported.22 3.0.0-1 Additional applications are now supported.23 3.0.0-1 Additional network warning lights are now supported.24 3.0.2-1 The newly released kernel is now supported.25 3.0.2-1 An improvement has been made to the WebManager display that indicatesthe specification of all groups as recovery targets of the monitor resource.26 3.0.3-1 The newly released kernel is now supported.27 3.0.3-1 Coordination with the migration function of XenServer has been enabled.28 3.0.4-1 The newly released kernel is now supported.29 <strong>3.1</strong>.0-1 The number of group and resource has been doubled.30 <strong>3.1</strong>.0-1 Options have been added <strong>for</strong> dynamic failover.31 <strong>3.1</strong>.0-1 Waiting <strong>for</strong> startup or stopping a faiover group has been enabled.32 <strong>3.1</strong>.0-1 Failover to a resource outside the server group has been added as arecovery action <strong>for</strong> an message receive monitor resource (mrw).33 <strong>3.1</strong>.0-1A function whereby the WebManager and the clpmonctrl command canbe used to trigger a Dummy Failure <strong>for</strong> a monitor resource has beenimplemented.34 <strong>3.1</strong>.0-1 WebManager that can be connected from an Android terminal has beenimplemented.35 <strong>3.1</strong>.0-1 The MIB of <strong>ExpressCluster</strong> has been defined.Section II Installing <strong>ExpressCluster</strong>81


Chapter 4 Latest version in<strong>for</strong>mationNumberVersion (indetail)Upgraded section36 <strong>3.1</strong>.0-1 An SNMP trap transmission function has been added.37 <strong>3.1</strong>.0-1 In<strong>for</strong>mation acquisition requests on SNMP are now supported.38 <strong>3.1</strong>.0-1 A function has been implemented to execute a specified script to recover amonitor resource. In addition, script execution has been enabled prior toreactivation or failover.39 <strong>3.1</strong>.0-1 A function has been implemented to disable recovery action caused bymonitor resource error.40 <strong>3.1</strong>.0-1 Parallel processing now occurs when all groups failover due to a monitoringerror.41 <strong>3.1</strong>.0-1 Database monitoring functions have been enhanced.42 <strong>3.1</strong>.0-1 Some environment variables have been added <strong>for</strong> use in scripts.43 <strong>3.1</strong>.0-1 Script setting has been simplified by the use of script templates.44 <strong>3.1</strong>.0-1 The display of the configuration mode screen has been corrected <strong>for</strong> the800*600 screen size.45 <strong>3.1</strong>.0-1 Logs can be downloaded even if the browser is set to block popups.46 <strong>3.1</strong>.0-1 Functions <strong>for</strong> which licenses have not been installed are no longerdisplayed during setup.47 <strong>3.1</strong>.0-1 The number of monitor resources that are automatically registered hasbeen increased.48 <strong>3.1</strong>.0-1The default command timeout value <strong>for</strong> the clprexec command has beenchanged from 30 seconds to 180 seconds.49 <strong>3.1</strong>.0-1 Process name monitor resource (psw) has been added.50 <strong>3.1</strong>.0-1 JVM monitor resource (jraw) has been added.51 <strong>3.1</strong>.0-1 System monitor resource (sraw) has been added.52 <strong>3.1</strong>.0-1 A function has been added to save the mirror disk per<strong>for</strong>mance data as alog.53 <strong>3.1</strong>.0-1 Short options are available <strong>for</strong> mirror disk-related commands.54 <strong>3.1</strong>.0-1 Configuration screen <strong>for</strong> mirror disk connect is now the same be<strong>for</strong>e andafter running Cluster Generation Wizard.55 <strong>3.1</strong>.0-1 A function has been added to prevent the startup of the <strong>ExpressCluster</strong>services when the operating system has been shut down abnormally.56 <strong>3.1</strong>.0-1 Conditions <strong>for</strong> triggering the function that stalls shutdown can now bespecified.57 <strong>3.1</strong>.0-1 Rotating log (internal log) can now be selected as the script execution log<strong>for</strong> EXEC resources and custom monitor resources (genw).58 <strong>3.1</strong>.0-1A list of registered licenses can now be displayed by using the clplcnscommand.82<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumberVersion (indetail)59 <strong>3.1</strong>.0-1Upgraded sectionA function <strong>for</strong> using the clplcns command to delete only the trial licensehas been added.60 <strong>3.1</strong>.0-1 The newly released kernel is now supported.(RHEL5.7, AXS3SP4)61 <strong>3.1</strong>.0-1 In linkage with vSphere5, the cluster on the guest operating system hasbeen enabled to control startup and stopping of another guest operatingsystem.62 <strong>3.1</strong>.0-1 Coordination with the migration function of kvm has been enabled.63 <strong>3.1</strong>.0-1 Timeout decision processing has been improved when an invalid OS time isreturned from while running <strong>for</strong> 447 or 497 continuous days.64 <strong>3.1</strong>.0-1 LVM and VxVM in<strong>for</strong>mation has been added to the function that collectsserver in<strong>for</strong>mation in configuration mode.65 <strong>3.1</strong>.0-1 A function has been added to limit the bandwidth <strong>for</strong> communication withthe mirror disks (in asynchronous mode).66 <strong>3.1</strong>.0-1 A command function has been added to display mirror disk per<strong>for</strong>mancedata.67 <strong>3.1</strong>.0-1 If BMC detects some serious error on Express5800/A1080, BMC can wait<strong>for</strong> its recovery action until recovery action of message receive monitor(mrw) is completed.68 <strong>3.1</strong>.0-1 Heartbeat path via networks among BMC interfaces is newly added <strong>for</strong>exclusive use with Express5800/A1080.69 <strong>3.1</strong>.1-1 The newly released kernel is now supported. (XenServer6)70 <strong>3.1</strong>.1-171 <strong>3.1</strong>.1-172 <strong>3.1</strong>.1-1Section II Installing <strong>ExpressCluster</strong>The conditions to wait <strong>for</strong> the group stop can now be specified. (Clusterstop, server stop )The view of the recovery action control function popup window that isdisplayed at the end of the Cluster Generation Wizard is improved.The number of disks of which size is to be monitored by System ResourceAgent has been changed from 10 to 64.73 <strong>3.1</strong>.3-1 The newly released kernel is now supported.74 <strong>3.1</strong>.3-175 <strong>3.1</strong>.3-176 <strong>3.1</strong>.3-177 <strong>3.1</strong>.3-178 <strong>3.1</strong>.3-179 <strong>3.1</strong>.3-180 <strong>3.1</strong>.3-181 <strong>3.1</strong>.3-1A function <strong>for</strong> displaying time in<strong>for</strong>mation has been added to WebManager.A function <strong>for</strong> <strong>for</strong>cibly stopping a virtual machine has been added.A function <strong>for</strong> automatically starting or resuming the cluster after reflectingthe configuration data has been added.A function has been added to prevent a Web browser from beingterminated or reloaded when the configuration data is edited inWebManager Config Mode.WebManager can now set and display physical machines and virtualmachines separately.The setting that assumes that a diskfull detection is not an error has beenadded to the disk monitor resource.A function <strong>for</strong> monitoring the number of processes has been added to theprocess name monitor resource.The Oracle monitor resource has been improved so that a specific error83


Chapter 4 Latest version in<strong>for</strong>mationNumber84Version (indetail)82 <strong>3.1</strong>.3-183 <strong>3.1</strong>.3-184 <strong>3.1</strong>.3-185 <strong>3.1</strong>.3-186 <strong>3.1</strong>.3-187 <strong>3.1</strong>.3-188 <strong>3.1</strong>.3-189 <strong>3.1</strong>.3-190 <strong>3.1</strong>.3-191 <strong>3.1</strong>.3-192 <strong>3.1</strong>.3-193 <strong>3.1</strong>.4-194 <strong>3.1</strong>.4-195 <strong>3.1</strong>.4-196 <strong>3.1</strong>.4-197 <strong>3.1</strong>.4-198 <strong>3.1</strong>.5-199 <strong>3.1</strong>.5-1100 <strong>3.1</strong>.5-1101 <strong>3.1</strong>.5-1102 <strong>3.1</strong>.5-1103 <strong>3.1</strong>.5-1Upgraded section(ORA-1033) which occurs when Oracle is being started is regarded asbeing the normal state.The disk resource monitoring function of the system monitor resource cannow monitor disks and mirror disks that are mounted after the systemstarted.The floating IP monitor resource has been added.The process to deactivate a resource has been improved so that theprocess can be executed as far as possible in case of emergencyshutdown.It is now possible to specify whether to enable or disable the deactivitycheck of the floating IP address resource.The conditions to determine whether a timeout occurs in Database Agent,Java Resource Agent, System Resource Agent, virtual IP monitor resource,and DDNS monitor resource has been enhanced.A message queue has been added as an internal log communicationmethod.The mirror disk resource can now be used in LVM environments.A mirror synchronization packet is now not sent when the latest data issaved in both mirror disks.An improvement has been made in the per<strong>for</strong>mance when small amount ofdata is written to a mirror data partition with O_SYNC specified.An improvement has been made in the per<strong>for</strong>mance of the initial mirrorconstruction and full mirror recovery when the mirror data partition <strong>for</strong>mat isext4.The JVM monitor resource now supports OpenJDK.The newly released kernel is now supported.An attempt to reopen the COM device is now made if an HW error occursduring RS232C communication.WebManager now supports Java SE Runtime Environment 7.ext4 can now be selected as the file system <strong>for</strong> a disk resource.The load imposed by the WebLogic monitoring processing by the WebLogicmonitor resource has been reduced.The newly released kernel is now supported.The simplified version of the cluster configuration wizard, which facilitatesconfiguration of a shared disk type cluster, is now supported.You can now select the servers that continue to work even if it is detectedthat both systems are activated.A warning message is now output if in<strong>for</strong>mation becomes inconsistentbetween servers because both systems are activated or <strong>for</strong> some otherreason.The monitor resource exclusion list used <strong>for</strong> determining dynamic failovercan now be edited.Shutdown operations, including OS shutdown, can be now avoided if noother servers continue to work.<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumberVersion (indetail)104 <strong>3.1</strong>.5-1105 <strong>3.1</strong>.5-1106 <strong>3.1</strong>.5-1107 <strong>3.1</strong>.5-1108 <strong>3.1</strong>.5-1109 <strong>3.1</strong>.5-1110 <strong>3.1</strong>.5-1111 <strong>3.1</strong>.5-1112 <strong>3.1</strong>.5-1113 <strong>3.1</strong>.5-1114 <strong>3.1</strong>.5-1115 <strong>3.1</strong>.5-1116 <strong>3.1</strong>.5-1117 <strong>3.1</strong>.5-1118 <strong>3.1</strong>.5-1119 <strong>3.1</strong>.7-1120 <strong>3.1</strong>.7-1121 <strong>3.1</strong>.7-1122 <strong>3.1</strong>.7-1123 <strong>3.1</strong>.7-1124 <strong>3.1</strong>.7-1125 <strong>3.1</strong>.7-1Upgraded sectionThe license in<strong>for</strong>mation list can be now viewed from WebManager.The DN-1500GL series from ISA is now supported as the warning light <strong>for</strong>the <strong>ExpressCluster</strong> X AlertService option.When linked with DN-1500GL, the <strong>ExpressCluster</strong> X AlertService optioncan link with the voice play function in DN-1500GL.The start/stop linkage processing between the monitor resources and thegroup resources when the monitoring time is “Active” has been reviewedand accelerated.If an error occurs in a monitor resource registered in the exclusion list, thatresource can be now restarted on the same server.The NFS monitor resource now supports NFS v3 and v4.The samba monitor resource now supports samba 3.5.The Websphere monitor resource now supports WebSphere 8.0.The load balancer link function <strong>for</strong> the JVM monitor resource now supportsBIG-IP LTM.The JVM monitor resource now supports WebOTX 8.5 (x86_64 only),WebOTX ESB 8.5, MasterScope/NEC Storage SVF <strong>for</strong> PDF 9.1,MasterScope/NEC Storage Report Director Enterprise 9.1, andMasterScope/NEC Storage Universal Connect/X 9.1.The WebOTX monitor resource now supports WebOTX 8.5 (x86_64 only).MDC heartbeat-related parameters have been added as adjustmentparameters <strong>for</strong> the mirror and hybrid disk resources.A command that can be used <strong>for</strong> capacity planning (clpprer) has beenadded. This command can estimate future values based on time-seriesdata indicating system resource usage.A function to collect system resource in<strong>for</strong>mation that can be used to easilydetermine the cause of a failure resulting from a shortage of systemresources has been added.The stack size of applications started from the EXEC resources nowmatches the OS setting value.The newly released kernel is now supported.PostgreSQL monitor now supports PostgreSQL9.2.It can now be judged as abnormal if an NIC Link down occurs at activationof floating IP resource.It can now be judged as abnormal if an NIC Link down occurs at activationof virtual IP resource.The NH-SPL from PATLITE is now supported as the warning light <strong>for</strong> the<strong>ExpressCluster</strong> X AlertService option.The DN-1300GL series from ISA is now supported as the warning light <strong>for</strong>the <strong>ExpressCluster</strong> X AlertService option.Actions configurable <strong>for</strong> Action at NP Occurrence have been expanded.Section II Installing <strong>ExpressCluster</strong>85


Chapter 4 Latest version in<strong>for</strong>mationCorrected in<strong>for</strong>mationModification has been per<strong>for</strong>med on the following minor versions.NumberVersion (indetail) Upgraded section Cause1 3.0.1-1/3.0.0-1A problem that a cluster cannot start Error in the license managementup with VM license has been fixed. tableThe final action upon group resource2 3.0.2-1or monitor resource failure was/3.0.0-1 to 3.0.1-1displayed as a final action upon The terms have not been unifiedcluster service failure <strong>for</strong> Builder, and among the functions.a final action upon cluster daemonfailure <strong>for</strong> WebManager.3.0.2-1Specifying exclusive attributes from3In Builder, an exclusive attribute/3.0.0-1 to 3.0.1-1 the properties was not prohibited.could be specified from the virtual(In the case of the wizard, this hasmachine properties.been prohibited.)3.0.2-1 In an environment where XenServer4/3.0.0-1 to 3.0.1-1 could not be used, the VM monitorA NULL pointer was issued in theabnormally terminated (core dump)VM monitor initialization.when the XenServer VM monitor wasset up.3.0.2-1 When connecting to WebManager by5WebManager was not taken into/3.0.0-1 to 3.0.1-1 using FIP, and adding the settings,consideration <strong>for</strong> the process tothe notes on connecting by usingevaluate connection using FIP.FIPs were not displayed.3.0.2-1Script execution and group failover6/3.0.0-1 to 3.0.1-1 When using the clprexec were not taken into consideration <strong>for</strong>command, Unknown request was the process to create a characteroutput to syslog and the alert log. string output to syslog and the alertlog.3.0.2-1Since the NP status was not7891011In WebManager, the pingnp status of/3.0.0-1 to 3.0.1-1 initialized, it is assumed as anthe stopped server was displayed asnormal.undefined value if no in<strong>for</strong>mationwas obtained.3.0.2-1 When changing the settings on theThere was no consideration <strong>for</strong> the/3.0.0-1 to 3.0.1-1 monitor resource properties dialogdecision process.box, Apply could not be clicked.3.0.2-1 On the Builder Interconnect Setting/3.0.0-1 to 3.0.1-1 window, when attempting to deleteThere was no consideration <strong>for</strong>inter connect settings by selecting allselecting multiple interconnects.settings, only some settings weredeleted.3.0.2-1The timing to release the MutexThe system abnormally terminated/3.0.0-1 to 3.0.1-1 resource, which was used by thewhen the WebManager service wasrealtime update thread, was notstopped.correct.3.0.2-1 The alert synchronization service/3.0.0-1 to 3.0.1-1 abnormally terminated when There was an error in the process torestarting the server after changing its obtain the server list.name.86<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumber12131415161718192021222324Version (indetail) Upgraded section Cause3.0.2-1 When mdw was timed out or <strong>for</strong>cibly There was no timing to release the/3.0.0-1 to 3.0.1-1 killed, the OS resources were leaked. obtained semaphore.3.0.2-1 When it was specified that the initial When a user did not execute the/3.0.0-1 to 3.0.1-1 mirror construction was not executed, initial mirror constructionresynchronization did not become intentionally, the flag to guaranteeenabled until full synchronization was the consistency of the disk contentsper<strong>for</strong>med.was not established.3.0.2-1This occurred when returning to the/3.0.0-1 to 3.0.1-1 When a cluster name was changed in cluster name change screen afterthe Cluster Generation Wizard, the the cluster name was changed inname was reset to a default name. the Cluster Generation Wizard andthe next step was selected.3.0.2-1 the recovery action was not executed The process to decide whether to/3.0.0-1 to 3.0.1-1 even if a volmgrw monitor error was execute the recovery action was notdetected.correct.3.0.2-1 The timeout <strong>for</strong> the volmgr resource The <strong>for</strong>mula to calculate the time out/3.0.0-1 to 3.0.1-1 could not be correctly specified. was not correct.3.0.2-1 When a keyword over 256 characters/3.0.0-1 to 3.0.1-1 was specified, linkage with external The size of the buffer to save themonitoring was not started even if the keyword was insufficient.mnw monitor was set.3.0.2-1The check process of shutdownWhen disabling shutdown monitoring,/3.0.0-1 to 3.0.1-1 monitoring was executed by theuser space monitoring could not beinitialization process of user spacestarted.monitoring.3.0.2-1 The timeout <strong>for</strong> shutdown monitoring The heartbeat timeout was specified/3.0.0-1 to 3.0.1-1 could not be changed.to use at any time.3.0.3-1 In config mode, non-numeric data There was an error in the design of/3.0.0-1 to 3.0.2-1 (alphabetic characters and symbols) the Builder input control.could be incorrectly entered <strong>for</strong> WaitTime When External MigrationOccurs <strong>for</strong> VM monitor resources.3.0.3-1 The method designed <strong>for</strong> applying a Because the server ID of the server/3.0.0-1 to 3.0.2-1 change in the server priority involved used to activate the groupsuspending and resuming the cluster resources was stored in the sharedand then restarting WebManager. memory, the in<strong>for</strong>mation on thatHowever, it actually requires the server became inconsistent whenstopping and starting of the cluster the server ID was changed.and then restarting WebManager.3.0.3-1 When "0" was specified as the There was an error in the design of/3.0.0-1 to 3.0.2-1 timeout period <strong>for</strong> EXEC resources, the Builder input control.the activation of EXEC resourcesfailed and emergency shutdown wasper<strong>for</strong>med.3.0.3-1 In a specific environment, pressing The error was caused by a problem/3.0.0-1 to 3.0.2-1 the Add Server button in the Cluster with JRE.Generation Wizard of the Buildercaused an application error.3.0.3-1 When a hybrid configuration was There was a problem in the logic/3.0.0-1 to 3.0.2-1 used, the mirror agent sometimes used <strong>for</strong> searching server groups.failed to start.Section II Installing <strong>ExpressCluster</strong>87


Chapter 4 Latest version in<strong>for</strong>mationNumber2526272829Version (indetail) Upgraded section Cause3.0.3-1 When "0" was specified <strong>for</strong> the/3.0.0-1 to 3.0.2-1 Server Sync Wait Time, the mainprocess of the cluster sometimesfailed to start.3.0.3-1 When an attempt was made to/3.0.0-1 to 3.0.2-1 per<strong>for</strong>m failover between absoluteexclusion groups upon the group status.occurrence of multiple and concurrentmonitor errors, both systems weresometimes activated.When "0" (minutes) was specified<strong>for</strong> the Server Sync Wait Time, thevalue of the startup wait timeoutbecame identical to that of the HBtransmission start timeout.There<strong>for</strong>e, the startup waitprocessing was not per<strong>for</strong>medappropriately because of the timingof the processing.There was an error in the design ofthe value that was returned as the3.0.3-1 The setting <strong>for</strong> <strong>for</strong>ced FIP activation The setting was overwritten with/3.0.0-1 to 3.0.2-1 was ignored.another setting in theimplementation.3.0.3-1 The units of the time values to be The output conversion method was/3.0.0-1 to 3.0.2-1 displayed in the alert (syslog) <strong>for</strong> incorrect.delay warning in the user spacemonitor resources were incorrect,and the values to be displayed inunits of tick count were displayed inunits of seconds.3.0.3-1 When the size of an alert message The size of the alert message buffer/3.0.0-1 to 3.0.2-1 exceeded 512 bytes, the alert was insufficient.daemon terminated abnormally.8830 3.0.3-1/ 3.0.2-131323334<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1WebManager could not be normally When terminating WebManager,terminated by selecting Exit from the the termination process of ConfigFile menu.Mode (Builder) is inadequate.Uploading is unavailable if the cluster When checking processes whilehas been suspended and a uploading, a cluster configurationtemporary suspension of a monitored file only judged whether the statusresource modified a necessary of monitor resources wasconfigurationsuspended.Sometimes unnecessary logs were After the failover destination serveroutput when an attempt was made to was decided in the first monitor errorper<strong>for</strong>m failover between absolute processing, in the next monitor errorexclusion groups upon the processing, a different server wasoccurrence of multiple and concurrent decided <strong>for</strong> the failover destinationmonitor errors.server because it was determinedthat absolute exclusion groups wererunning on that server.When waiting to start monitoring a The processing <strong>for</strong> the monitor startresident monitor resource, the wait time was invalid.timeout time rather than the start waittime is referenced.An error occurred when collecting There were faults in the processinglogs, delivering settings in<strong>for</strong>mation, that determines whether the actionor during other activities, but the was successful or nor.process appears to have terminatednormally.<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumber3536373839404142434445Version (indetail) Upgraded section Cause<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1When adding a server to a mirror When confirming settings whileenvironment, suspending the cluster adding a server, it was not checkedand mirror agent is required to apply whether mirror disk connect hasthe new settings, but been set.suspend/resume is displayed.When a standby mirror agent is A buffer is released if receiving datasuspended during mirror recovery, failed, but a NULL pointer wasretrieving the recovery data may fail, issued when trying to receive thegenerating OOPS.next data.OOPS occurred sometimes in the Mirror driver functions whichdriver termination processing when process the completion notificationthe standby server mirror agent was <strong>for</strong> processing writing to datastopped during mirror recovery. partitions do not exist in the memorywhen rmmod mirror drivers are nolonger present after a request towrite mirror recovery data to a datapartition.The OS sometimes freezes when the Interruptible sleep occurred whileactive server mirror agent is stopped waiting <strong>for</strong> the recovery data read toduring mirror recoveryfinish, that sleep was interruptedduring shutdown, and the CPUusage <strong>for</strong> the thread waiting <strong>for</strong> readto finish increased.Multiple syslog messages may mix The functions <strong>for</strong> mirror driverand be output from the mirror driver syslog output uses the same bufferso that the same message is output without exclusion.twice at times such as servershutdown when mirror driver syslogoutput frequency is high.When the IP address <strong>for</strong> integrated The names of previous setting itemsWebManager was not specified, error were not updated.messages output due to failures toconnect to clusters were invalid.Sometimes core dump occurred while Illegal memory access occurredstopping a cluster service when max when a log was output after logreboot count limitation was set. library termination processingduring termination processing.When a destination server was down No failover policy check processwhile moving a group, sometimes the existed in the recovery processinggroup was moved to a server which associated with the server beingwas not included in the failover policy. down at the destination.Server down notification settings Cluster configuration in<strong>for</strong>mationwere changed and uploaded, but the was not reloaded upon termination.changes were not applied.A minor memory leak occurred when Thread in<strong>for</strong>mation was notper<strong>for</strong>ming a group or resource discarded after the threadoperation.terminated.When a script execution process Sometimes waitpid() wastimes out be<strong>for</strong>e the final operation executed be<strong>for</strong>e a process wasruns and is <strong>for</strong>ce killed, sometimes a terminated by SIGKILL.zombie process was generated.Section II Installing <strong>ExpressCluster</strong>89


Chapter 4 Latest version in<strong>for</strong>mationNumber4647484950515253Version (indetail) Upgraded section Cause<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1Suspending a user space monitor When WebManager was connectedsometimes failed if the monitoring to a server other than the mastermethod <strong>for</strong> user space monitoring server, the monitor status of thewas modified and uploaded when other servers was not checked.WebManager was connected to aserver other than the master server.The value of the EXEC resource An internal flag of “Prioritize failoverenvironmental variable CLP_EVENT, policy in the server group” wasis set to START instead of FAILOVER processed incorrectly.when the failover occurs <strong>for</strong> groups towhich “Prioritize the failover policywithin the server group” is set.The monitor resource startup status A internal flag of “Prioritize failoverwas not restored when failover was policy in the server group” wasper<strong>for</strong>med <strong>for</strong> groups to which processed incorrectly.“Prioritize failover policy in the servergroup” is set.Recovery action counts were not Shared memory values were reset,reset by the clpmonctrl command but values saved in the memory ofwhen the monitor error recovery monitor resource processes are notaction was fully executed.reset.When a server is added while BMC or In<strong>for</strong>mation to be added anda warning light is set, the BMC and associated with the server additionwarning light are not set to the new was insufficient.server in<strong>for</strong>mation.Cluster Generation Wizard was In the processing of cancellingstarted and two servers were added Cluster Generation Wizard, not allto a cluster. After that, the wizard was data was discarded.cancelled (but the settings weresaved). Then, the add a server wizardwas started but a server was notdisplayed on the interconnectconfiguration screen.If a license error occurs while A thread which was processing NPprocessing NP resolution when resolution was cancelled when astopping the cluster due to a license cluster was stopped by a licenseerror, sometimes the cluster does not error, but the thread was deadnormally stop.locked because of cancelling thelocked thread.The name of a server cannot be fully A horizontal scroll bar was notdisplayed in the list of available displayed.servers in the server tab of groupproperties.54<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1Mirror commands (such as When an internal command wasexecutions by using sudo commands executed, there was a location thatand script executions by using did not reference an absolute path.crond) from the root user did not runproperly when a path to an OSstandard command (such as/usr/sbin/) was not taken..90<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumber555657585960616263646566Version (indetail) Upgraded section Cause<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1If an I/O error occurred on a server, The remaining disk error flag wasand a disk error flag remains on the not properly dealt with.cluster partition, the server isrestarted repeatedly if the CPU isrestarted without replacing the disk.In an x86_64 environment in which an A stack was destroyed whileI/O error occurred on the mirror disk, per<strong>for</strong>ming reset processing.panic is per<strong>for</strong>med rather than reset.When an asynchronous mode mirror The data transmission processingon a VMware guest OS is used and delay was not properly dealt with.ACK timeout is set to less than 30seconds, 100% of CPU usage is theVMware task, and the OS stalls.When an asynchronous mode mirror Cases exist in which processingon a VMware guest OS is used, the which should not be in reverse ofserver caused PANIC while writing in the original processing order is inan environment running multiple reverse order due to the order of theguest OSes, each of which is VMware CPU switch.assigned only one CPU.If, in the first cluster configuration, the Return value determination of theenvironment consisted of 3 or more license collection processing wasnodes and the CPU licenses were incorrect.only registered to a single server,sometimes license authenticationfailed and the CPU could not start.If there was a group whose failover Startup processing on anotherattribute set to “Dynamic failover”, all server was executed at the samegroups take longer to startup. time.When stopping a monitor resource There was a flaw in the processingwhich is being continuously that terminates the child processesmonitored, Application Server Agent of the Application Server Agent.sometimes terminated otherprocesses when it was stopped.Monitor status changed to a status Sometimes the status wasother than “suspend” after overwritten after it was set tosuspending the monitor resource. suspend.When a monitor resource is Depending on timing, waitpid()suspended, it sometimes remained was not executed when childas a zombie process.processes were terminated.When using diverse resources or The initialized area that managesmonitors, if the number of the types is only <strong>for</strong> use with 128<strong>ExpressCluster</strong> module types that types.output logs exceeds 128, sometimesinternal logs are not output.MA memory leak occurred when Internal in<strong>for</strong>mation was notsuspending a cluster failed because a discarded at the time of the failure togroup was moving.suspend the cluster.A memory leak occurred when Internal in<strong>for</strong>mation was notstopping a cluster failed because a discarded at the time of the failure togroup was moving.stop the cluster.Section II Installing <strong>ExpressCluster</strong>91


Chapter 4 Latest version in<strong>for</strong>mationNumber676869707172Version (indetail) Upgraded section Cause<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-173 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-174 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-175 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-176 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-1Child processes remained when a The custom monitor terminatedgenw monitor timeout occurred while be<strong>for</strong>e the child process.the enw settings were set tosynchronous and the dump collectionfunction was enabled..If there was a group whose failover Space was not freed server groupattribute was set to “Prioritize failover management.policy within the server group”,memory leak occurred at failover.When group resources were Resource stop processing (otheractivated on a server that was not than that of a server to whichconnected by WebManager, stopping WebManager is connected) wasthe group resources failed when incomplete.uploading cluster configurationin<strong>for</strong>mation associated with stoppinggroup resources.Sometimes the mirror agent does not There was a problem with the logicstart in a hybrid configuration when used to search <strong>for</strong> the server group..the server group that is used is agroup in which no hybrid diskresource exists, and only one servergroup was specified. .When the host name was FQDN, If the host name which was obtainedrequests from the clprexec from the OS was FQDN, the servercommand failed.cannot find items from a clusterconfiguration file.If there are many objects will be There was a problem in the sourcedisplayed on WebManager, code to allocate memory to displayWebManager server process may be objects.terminated abnormally.If initializing XenServer virtual Environments where XenServermachine resources failed in an could not be used were notenvironment where XenServer could considered.not be used, WebManager serverprocess might be terminatedabnormally.If initializing XenServer virtualmachine monitor resources failed inan environment where XenServercould not be used, WebManagerserver process might be terminatedabnormally.Environments where XenServercould not be used were notconsidered.After collecting logs, some OS After initializing threads is complete,resources of the log collection if the process to wait <strong>for</strong> thefunction might remain.initialization completion is executedon the parent thread, the commandwaits <strong>for</strong> the initialization completionendlessly.The cluster configuration data could It was not considered somebe uploaded even if there is a server changes require to stop resourceson which the <strong>ExpressCluster</strong> service to be reflected to the configuration.did not start in the cluster.92<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumberVersion (indetail) Upgraded section Cause77 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-178 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-179 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-180 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-181 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-182 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-183 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-184 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-185 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-186 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-187 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-1After collecting logs, the files that For SuSE <strong>Linux</strong>, the tar commandmust be deleted might remain. options were not considered.If the VM license is used, an The message that was notunnecessary alert might be output necessary to output when using thewhen starting the cluster.VM license has been output.If the configuration data is uploaded The file that defines the method towith clearing the default resource reflect changes is inadequate.dependency and without specifyingany dependencies, only the clustersuspend is requested even if it isnecessary to stop a group.If the smart failover is set and When allocating the resource datamemory is insufficient when starting storage area failed, an illegalthe cluster, the clprc process might memory access occurs due to theabnormally terminate and the server API allocation request.might be shut down.WebManager might abnormally Since the size of the temporaryterminate if there is a lot of buffer is fixed to 4096 bytes, anin<strong>for</strong>mation to be displayed <strong>for</strong> illegal memory access occurs ifWebManager because there are a lot there is in<strong>for</strong>mation exceeding 4096of servers.bytes.When the setting to start the The monitor processes ofWebManager service and WebAlert WebManager service and WebAlertservice is disabled, the alert service did not consider the settingindicating that starting the to disable the WebManager serviceWebManager service and WebAlert and WebAlert service to start.service failed at OS startup might beoutput.The description (in English) of rc In the current description, “hasmessage ID=26 was not correct. started” was used, but, “has beencompleted” is correct.The correct method to reflect the The file that defines the method toadded group resource is reflect changes was inadequate.“stopping/suspending the group”, but“stopping the cluster” was per<strong>for</strong>med.The warning dialog box might be The process to check thedisplayed when uploading configuration data ID did notconfiguration data in an environment consider old configuration data.of which <strong>ExpressCluster</strong> X wasupgraded from 2.x to3.x.The file descriptor used by the clprc The process to close the fileprocess might leak if WebManager descriptor might not be per<strong>for</strong>med.was frequently updated or clpstatwas frequently executed.When a valid COM heartbeat device The status setting process atname was specified in the blank field, resuming the cluster was notthe system prompted to suspend and correct.resume the cluster. In this case, if thecluster was not stopped, the COMheartbeat did not operate properly.Section II Installing <strong>ExpressCluster</strong>93


Chapter 4 Latest version in<strong>for</strong>mationNumberVersion (indetail) Upgraded section Cause88 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-189 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-190 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-191 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-192 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-193 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-194 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-195 <strong>3.1</strong>.1-1/ 3.0.0-1 to<strong>3.1</strong>.0-196 <strong>3.1</strong>.1-1/ 3.0.3-1 to<strong>3.1</strong>.0-197 <strong>3.1</strong>.1-1/ 3.0.3-1 to<strong>3.1</strong>.0-198 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-1A monitor resource name was not The process to output ID=170 andcorrectly output in the alert of rm 171 was not correct.ID=170 or 171.If an error was detected when Only the final action <strong>for</strong> themultiple resources were being abnormal resource that was foundactivated, the final action was at first was per<strong>for</strong>med.per<strong>for</strong>med <strong>for</strong> the abnormal resourcethat was found at first in thealphabetical order. There<strong>for</strong>e, if theresource to which No Operation wasset, the operation such as shutdownwas not per<strong>for</strong>med.An unnecessary rc alert might be Even if the failover count is 0, thelogged <strong>for</strong> a deactivation error if the process to find a failover target wasfailover count was set to 0 when a per<strong>for</strong>med when a deactivation errordeactivation error occurred in a was occurred.resource in a group to which dynamicfailover was set.Multiple confirmation dialog boxes An exclusive processing when pressingmight be displayed when continuously the operation button was inadequate.pressing the operation button inWebManager Mobile.The default script was not correct inThe ulimit setting of the default script config mode.was deleted in WebManager configmode.Configuration data of which mirror The process to check server additionconfiguration consisted 3 or more nodes was in adequate.could be created.When deleting a virtual machine The delete condition decision processresource, a related virtual machine of the automatic monitor resourceresource could not be deleted delete process was inadequate.automatically.When linkage with a server management The process to update the messageinfrastructure was available, the status receive monitor status was inadequate.might remain OFFLINE if monitoring themessage receive monitor was startedbe<strong>for</strong>e starting the infrastructure module.The NFS monitor resource could not It was determined normal that thedetect that only nfsd was unmount process was normallydisappeared.per<strong>for</strong>med.If multiple targets were registered to a Java API was not thread-safe.JVM monitor resource, monitoringmight fail and a warning might beissued when starting to monitor theJVM monitor resource.If a process of which name length An environment that included awas 1024 bytes or more existed, a process of which name length wasprocess name monitor resource 1024 bytes or more was notmight abnormally terminate. considered.94<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumberVersion (indetail) Upgraded section Cause99 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-1100 <strong>3.1</strong>.1-1/ <strong>3.1</strong>.0-1101102103104105<strong>3.1</strong>.0-1/ 3.0.0-1 to3.0.4-1If the monitoring level is level 2 and The action to be taken when there isno records were created at creation of no record when reading data from aa table <strong>for</strong> monitoring, the database by select during level 2PostgreSQL monitor resource might monitoring was not defined.abnormally terminate.When Database Agent detected a The retry process after a monitoringtimeout, monitoring was immediately timeout was not considered.retried without waiting <strong>for</strong> the monitoringinterval.It might fail to start a specific monitor There was a variable that had notresource <strong>for</strong> the first time, causing a been initialized.monitor error.In a specific machine environment,this might occur <strong>for</strong> an ARP monitorresource, DDNS monitor resource,mirror disk monitor resource, mirrordisk connect monitor resource, hybriddisk monitor resource, hybrid diskconnect monitor resource, messagereceive monitor resource, and virtualIP monitor resource.When the cluster was resumed from The message text was not correct.WebManager, Failed to resume wasmistakenly displayed instead of The<strong>3.1</strong>.3-1/ request to resume the cluster<strong>3.1</strong>.0-1 to <strong>3.1</strong>.1-1 failed on some servers.This occurred when the cluster was<strong>for</strong>cibly suspended and then resumedwith some servers stopped.After the <strong>ExpressCluster</strong> Web Alert The buffer area used to readservice might abnormally terminate, /proc/pid/cmdline wasthis service might start.insufficient, or strerr(), whichWhen the <strong>ExpressCluster</strong> Web Alert was not thread-safe, was<strong>3.1</strong>.3-1/service was killed <strong>for</strong> some reason, sometimes used by multiple3.0.0-1 to <strong>3.1</strong>.1-1this infrequently occurred at the next threads.service startup. Also, this mightinfrequently occur in normaloperation.Mirror recovery might fail immediately Even if data was sent through an oldafter the processing started, and then connection be<strong>for</strong>e disconnectionauto mirror recovery might start. occurred, a send error might not<strong>3.1</strong>.3-1/ This might occur when the mirror disk occur.3.0.0-1 to <strong>3.1</strong>.1-1 connect was recovered from Also, there might be a gap in timingdisconnection.of communication recoverydetection between an active serverand a stand-by server.In SuSE11 environments, an internal The method to create a socket waslog was not output when UDP was set inadequate.<strong>3.1</strong>.3-1/ to a log communication method.3.0.0-1 to <strong>3.1</strong>.1-1 This occurred when UDP was set to alog communication method inSuSE11 environments.Section II Installing <strong>ExpressCluster</strong>95


Chapter 4 Latest version in<strong>for</strong>mationNumber106107108109110Version (indetail) Upgraded section CauseWhen a volume manager resource A default value of an automaticallywas added, a default value of an generated volume manager monitorautomatically added volume manager resource had not been defined.monitor resource was invalid.<strong>3.1</strong>.3-1/This occurred <strong>for</strong> a volume manager<strong>3.1</strong>.0-1 to <strong>3.1</strong>.1-1monitor resource that wasautomatically generated when avolume manager resource wasadded.Multiple volume manager monitor For VxVM, only one volumeresources might be registered. manager monitor resource had to<strong>3.1</strong>.3-1/This occurred when multiple VxVM be automatically registered, but a3.0.0-1 to <strong>3.1</strong>.1-1volume manager resources were monitor resource was registered <strong>for</strong>added.each volume manager resource.When stopping a group by using the The message text was not correct.clpgrp command failed, an errormessage indicating that starting thegroup failed might be displayed.<strong>3.1</strong>.3-1/<strong>3.1</strong>.1-1 This occurred when a group, whichhad been started on another server,was stopped by the clpgrp -tcommand without the -h and -foptions specified.In an environment where a specific The monitor resource name savemonitor resource existed, suspending area was not initialized.and resuming the monitor resourcemight fail.This might occur in an environmentthat included any of the followingmonitor resources:- ARP monitor resource- DDNS monitor resource<strong>3.1</strong>.3-1/<strong>3.1</strong>.1-1 - Mirror disk monitor resource- Mirror disk connect monitorresource- Hybrid disk monitor resource- Hybrid disk connect monitorresource- User mode monitor resource- Message receive monitor resource- Virtual IP monitor resource- Virtual machine monitor resourceExecuting a script by the clprexeccommand might fail.<strong>3.1</strong>.3-1/ This occurred when a script to be3.0.0-1 to <strong>3.1</strong>.1-1 executed by the clprexeccommand was stored in the pathdescribed in the guide.The path to store a clptrnreqcommand script was used.96<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumber111112113114115116117Version (indetail) Upgraded section CauseIt took extra five seconds to per<strong>for</strong>m a A final retry entered in unnecessaryfinal retry upon a group resource sleep state (five seconds) when anactivation/deactivation error. activation or deactivation retry was<strong>3.1</strong>.3-1/This occurred upon a group resource per<strong>for</strong>med upon a group resource3.0.0-1 to <strong>3.1</strong>.1-1activation/deactivation error when it activation/deactivation error.was set to retry activate or deactivatethe group resource.When a network used by mirror disk When an attempt was made to bindconnect was brought down by a a socket in a sending process bycommand such as ifdown, mirror ICMP, the socket was connectedrecovery might be executed without checking the return value.<strong>3.1</strong>.3-1/ repeatedly.3.0.0-1 to <strong>3.1</strong>.1-1 This occurred when there was a pathto a remote server other than mirrorconnect and a network used by mirrordisk connect was brought down byifdown.A message receive monitor error A monitor process was notmight be detected when the cluster generated when the cluster waswas being stopped.being stopped, and the stopping<strong>3.1</strong>.3-1/This might occur when a message process was always assumed to be3.0.0-1 to <strong>3.1</strong>.1-1receive monitor was monitored when completed successfully. In such athe cluster was being stopped. case, it was checked whether amonitor process existed or not.Changes were supposed to be When the Forced Stop andreflected to the <strong>for</strong>ced stop function Chassis Identify are disabledsettings by uploading, but could not (OFF), the setting in<strong>for</strong>mation wasbe reflected.not acquired in the process to<strong>3.1</strong>.3-1/This occurred when Forced Stop acquire.3.0.0-1 to <strong>3.1</strong>.1-1was changed to enable (ON) after thecluster was started with Forced Stopand Chassis Identify disabled(OFF).When collecting logs, a rotated syslog The name of the rotated messagesfile might not be collected.file did not comply with the changed<strong>3.1</strong>.3-1/This occurred when logs were naming rule.3.0.2-1 to <strong>3.1</strong>.1-1collected in RHEL6 or laterenvironments.Stopping the cluster might not be A termination process was missingcompleted.in the process to check a thread<strong>3.1</strong>.3-1/This might infrequently occur when a termination request.3.0.0-1 to <strong>3.1</strong>.1-1message receive monitor resourcewas set.33 or more destinations to which a The control processing of the AddSNMP trap was sent could be set. button on the SNMP trap sending<strong>3.1</strong>.3-1/ This occurred when the SNMP trap destination settings screen was<strong>3.1</strong>.0-1 to <strong>3.1</strong>.1-1 sending destination settings screen inadequate.was started again after 32destinations had been set.Section II Installing <strong>ExpressCluster</strong>97


Chapter 4 Latest version in<strong>for</strong>mationNumber118119120121122123Version (indetail) Upgraded section CauseWhen a mirror disk resource or hybrid When a <strong>for</strong>cibly activated serverdisk resource was <strong>for</strong>cibly activated was rebooted, the internal state ofby a command or other operations, other servers were not updated andauto mirror recovery might be entered the state disabling mirrorrepeated after a server including synchronization in the same way as<strong>3.1</strong>.3-1/<strong>for</strong>cibly activated resources was <strong>for</strong>ced activation. There<strong>for</strong>e,3.0.0-1 to <strong>3.1</strong>.1-1restarted.synchronization was canceled afterThis occurred when a server was auto mirror recovery and then autorestarted with a mirror disk resource mirror recovery was repeated.or hybrid disk resource <strong>for</strong>ciblyactivated.When READ was selected as Method A process to store the valueto monitor a disk resource, I/O Size specified <strong>for</strong> I/O Size was missing<strong>3.1</strong>.3-1/ might return to the default value. when changing Method.3.0.0-1 to <strong>3.1</strong>.1-1 This occurred when Method waschanged from READ to TUR, andthen returned to READ.A monitor resource might mistakenly Reading the configuration data filedetect an error when uploading the failed if a monitor resource tried to<strong>3.1</strong>.3-1/ configuration data.refer to the configuration data when3.0.0-1 to <strong>3.1</strong>.1-1 This occurred on very rare occasions the file was being replaced.when uploading the configurationdata.An FTP monitor resource might The process to be per<strong>for</strong>med whenmistakenly detect a monitoring an intermediate response and final<strong>3.1</strong>.3-1/ timeout.response were returned together3.0.0-1 to <strong>3.1</strong>.1-1 This occurred when an FTP server was not correct.returned an intermediate responseand final response together.When a monitoring target of the JVM Even after the monitoring target wasmonitor resource was terminated by a terminated, the load statusfailover, the load status of Java VM to in<strong>for</strong>mation collected be<strong>for</strong>e thebe monitored that was collected failover was still maintained.immediately be<strong>for</strong>e the failover wascontinuously reported to the<strong>3.1</strong>.3-1/ distributed node module from a<strong>3.1</strong>.0-1 to <strong>3.1</strong>.1-1 source server of failover even afterthe failover.When using the load calculationfunction of Java VM to be monitoredby the load balancer linkage, thisoccurred when the monitoring targetwas terminated by a failover.When a monitoring timeout occurred The process to cancel monitoring ain a PostgreSQL monitor resource, PostgreSQL monitor resource whenthe next monitoring might fail a timeout occurred was inadequate.<strong>3.1</strong>.3-1/ because a PostgreSQL session3.0.0-1 to <strong>3.1</strong>.1-1 remained.This occurred when a monitoringtimeout occurred and the specifiedtimeout interval was short.98<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumberVersion (indetail) Upgraded section CauseIt might take 10 seconds or more to Waiting <strong>for</strong> the completion of the124display the execution results of the thread initialization might fail<strong>3.1</strong>.3-1/ clpstat command.depending on the timing, and the3.0.0-1 to <strong>3.1</strong>.1-1 This occurred on very rare occasions clpstat command might wait <strong>for</strong> awhen executing the clpstat timeout.command.Even if the necessary license had Replicator DR Upgrade was not125been registered, hybrid disk included in the in<strong>for</strong>mation that<strong>3.1</strong>.3-1/ resources might not be displayed in associated group resources and<strong>3.1</strong>.0-1 to <strong>3.1</strong>.1-1 the resource list.licenses.This occurred when the registeredlicense is Replicator DR Upgrade.When changing the WebManager The Mirror Disk Helper screen was126mode from Operation Mode to not initialized when theReference Mode, the Execute WebManager mode was changed tobutton on the Mirror Disk Helper Reference Mode.<strong>3.1</strong>.3-1/screen was initially enabled.3.0.0-1 to <strong>3.1</strong>.1-1This occurred when the WebManagermode was changed to ReferenceMode with the Mirror Disk Helperscreen open.When starting a cluster from The process <strong>for</strong> handling an error when127 <strong>3.1</strong>.4-1/WebManager/WebManager Mobile, an a server that cannot start exists was not<strong>3.1</strong>.3-1error message may not be displayed correct.correctly.This occurred when a server that couldnot start existed.A process to release memory was found128 <strong>3.1</strong>.4-1/ A memory leak occurred in the resourceto be missing when looping over3.0.0-1 to <strong>3.1</strong>.3-1 management process when obtainingmultiple IP address groups.in<strong>for</strong>mation <strong>for</strong> pingnp or whenexecuting the clpstat command.This occurred when the PingNP resourcewas configured and multiple IP addressgroups were set.The time in<strong>for</strong>mation icon might not It was determined that the server had129 <strong>3.1</strong>.4-1/blink on WebManager even when the not been connected be<strong>for</strong>e when it was<strong>3.1</strong>.3-1time in<strong>for</strong>mation was updated. started.This occurred when a server was stoppedand started after WebManagerconnection.Restart of the alert synchronization A system call, which was not130 <strong>3.1</strong>.4-1/service might occur.thread-safe, was sometimes used by3.0.0-1 to <strong>3.1</strong>.3-1This occurred on very rare occasions multiple threads.during normal operation.A memory leak might occur in the alert A memory release process was missing131 <strong>3.1</strong>.4-1/synchronization service.from the processing to be per<strong>for</strong>med3.0.0-1 to <strong>3.1</strong>.3-1This occurred when WebManager could after a communication failure.not communicate with a server <strong>for</strong> whichtwo or more interconnects wereestablished.Section II Installing <strong>ExpressCluster</strong>99


Chapter 4 Latest version in<strong>for</strong>mationNumberVersion (indetail) Upgraded section CauseIn<strong>for</strong>mation acquisition requests132 <strong>3.1</strong>.4-1/ In<strong>for</strong>mation on WebManager,between servers were made in an3.0.0-1 to <strong>3.1</strong>.3-1 clpstat command, SNMP managerirregular order.linkage, etc., could fail to display.This occurred when internalcommunication timed out <strong>for</strong> somereason such as interconnectdisconnection or overload, after whichcontrol returned to the state existingbe<strong>for</strong>e interconnect switchover.The JVM monitor resource might The number of occurrences of Full GC133 <strong>3.1</strong>.4-1/mistakenly detect the number of that was retained by the JVM monitor<strong>3.1</strong>.0-1 to <strong>3.1</strong>.3-1occurrences of Full GC either be<strong>for</strong>e or resource was not cleared when theafter restart of the monitored Java VM, monitored Java VM restarted.causing an error.This occurred when the monitored JavaVM restarted in an application in whichFull GC occurred frequently.When confirming the status while134 <strong>3.1</strong>.4-1/ When using the clpcfctrlapplying settings, a check of whether3.0.0-1 to <strong>3.1</strong>.3-1 command to upload configurationthe mirror agent was suspended was notin<strong>for</strong>mation, a message indicatingmade.that the upload was successful mightbe displayed even though the uploadhas failed.This occurred when a configurationin<strong>for</strong>mation file from which a mirror diskresource was deleted was uploaded whilethe mirror agent was running.When the event service updated the file135 <strong>3.1</strong>.4-1/ The clplogcf command executionof display in<strong>for</strong>mation, it emptied the3.0.0-1 to <strong>3.1</strong>.3-1 results may not be displayed.file once <strong>for</strong> writing.This occurred when the event serviceupdated a temporary file <strong>for</strong> storingdisplay in<strong>for</strong>mation upon execution ofthe clplogcf command.In WebManager config mode, an There was an error when checking136 <strong>3.1</strong>.4-1/exception in Java might occur when whether the checkbox <strong>for</strong> using server3.0.0-1 to <strong>3.1</strong>.3-1using the group addition wizard to add a group was selected.disk resource.This occurred when the use of servergroup was repeatedly selected anddeselected <strong>for</strong> the startup servers in thegroup settings.There was an error in the parameter137 <strong>3.1</strong>.4-1/ When the clpstat command wasdisplay settings.<strong>3.1</strong>.0-1 to <strong>3.1</strong>.3-1 used to display property in<strong>for</strong>mation<strong>for</strong> a disk monitor resource, “Disk fullAction” was not displayed.This occurred when the followingcommand was executed:clpstat --mondisk_monitor_name --detail100<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileVersion (inNumberdetail) Upgraded section CauseStopping a monitor resource sometimes Whether the process was alive, and the138 <strong>3.1</strong>.4-1/caused other processes to be killed. process name, were not checked be<strong>for</strong>e3.0.0-1 to <strong>3.1</strong>.3-1SIGKILL was issued.This might occur when stopping amonitor resource if the pid of amonitor resource managed by<strong>ExpressCluster</strong> is used <strong>for</strong> otherprocesses.There might be a delay in starting a A function <strong>for</strong> IP address acquisition139 <strong>3.1</strong>.4-1/kernel mode LAN heartbeat resource. was called even when no IP address was3.0.0-1 to <strong>3.1</strong>.3-1This occurred in some servers when there specified.was a kernel mode LAN heartbeatresource <strong>for</strong> which no IP address hadbeen specified (which was not used <strong>for</strong>the server).A system stall might occur when the A function <strong>for</strong> switching threads was140 <strong>3.1</strong>.4-1/system is highly loaded.3.0.0-1 to <strong>3.1</strong>.3-1This occurred in circumstances in whichkernel mode LAN heartbeat resourceswere used and the available amount ofsystem memory ran low.used with a spinlock acquired.The configuration in<strong>for</strong>mation may not The setting of environment variable141 <strong>3.1</strong>.4-1/be reflected.LANG was missing when obtaining3.0.0-1 to <strong>3.1</strong>.3-1This might occur when the OS language system in<strong>for</strong>mation.setting was other than Japanese, English,or Chinese.The user mode monitor resource might Difference calculation was per<strong>for</strong>med142 <strong>3.1</strong>.4-1/mistakenly detect a delay warning. on the number of clock ticks using a3.0.0-1 to <strong>3.1</strong>.3-1In a 32-bit OS environment, this might sign.occur when the OS was running <strong>for</strong> 198or more consecutive days with the usermode monitor resources set up.Some monitor resources might Difference calculation was per<strong>for</strong>med143 <strong>3.1</strong>.4-1/mistakenly detect monitor errors. on the number of clock ticks using a<strong>3.1</strong>.3-1In a 32-bit OS environment, this might sign.occur when the OS was running <strong>for</strong> 198or more consecutive days with any of thefollowing monitor resources set up.[Relevant monitor resources]- db2w- ddnsw- genw- jraw- mysqlw- oraclew- psqlw- psw- sraw- sybasew- vipwSection II Installing <strong>ExpressCluster</strong>101


Chapter 4 Latest version in<strong>for</strong>mationNumberVersion (indetail) Upgraded section CauseThe default fsck execution timing value The default value in the internal settings144 <strong>3.1</strong>.4-1/<strong>for</strong> disk resources was changed from contained an error.<strong>3.1</strong>.3-1“Execute Every 10 Times” to “NotExecute.”This occurred when a new disk resourcewas created or when the default valuewas used <strong>for</strong> an existing disk resource.The disconnection of a mirror disk In Ping communication processing <strong>for</strong>145 <strong>3.1</strong>.4-1/connect might be mistakenly detected <strong>for</strong> networking monitoring, there was an3.0.0-1 to <strong>3.1</strong>.3-1a mirror in a remote configuration. error in retrying reception upon theThis might occur in an environment with receipt of ICMP ECHO REQUESTa large communication delay. from the other server.The process name monitor resource The reception of a suspension/stop146 <strong>3.1</strong>.4-1/might end abnormally.request was not properly handled in the<strong>3.1</strong>.0-1 to <strong>3.1</strong>.3-1This might occur when the cluster was internal operation.suspended/stopped in an environment inwhich the process name monitor resourcewas set up.Monitoring by the HTTP monitor The HTTP monitor resource did not147 <strong>3.1</strong>.4-1/resource might fail.properly deal with a renegotiate request.3.0.0-1 to <strong>3.1</strong>.3-1This occurred in an environment in whichrenegotiate was requested upon receptionvia SSL due to monitoring on https.Some core files might not be collected When compressing log files, the first148 <strong>3.1</strong>.4-1/during log collection.core file was compressed, but the<strong>3.1</strong>.1-1 to <strong>3.1</strong>.3-1This might occur when multiple core files subsequent core files were deleted.existed during log collection.The clpmonctrl command displays the The restart count and the failover count149 <strong>3.1</strong>.5-1/recovery action execution count in an are displayed in the reverse order.3.0.0-1 to <strong>3.1</strong>.4-1invalid order.This problem always occurs when youexecute clpmonctrl -v.The comment field will be blank when The entry field was not properly150 <strong>3.1</strong>.5-1/you click Get License Info on the initialized after Get License Info was<strong>3.1</strong>.0-1 to <strong>3.1</strong>.4-1resource addition wizard in Config Mode executed.in WebManager.This problem always occurs when youclick the Get License Info button.When you click the Get License Info The entry field was not properly151 <strong>3.1</strong>.5-1/button in the monitor addition wizard in initialized after Get License Info was<strong>3.1</strong>.0-1 to <strong>3.1</strong>.4-1Config Mode in WebManager, the initial executed.value is not set to the Name field.This problem always occurs when youclick the Get License Info button.Monitor resources may be created more The automatic monitor resource152 <strong>3.1</strong>.5-1/than the upper limit in the setup mode in addition process did not include an3.0.0-1 to <strong>3.1</strong>.4-1WebManager.upper limit check.This problem occurs if you add resourcesthat trigger the automatic addition ofmonitor resources when the upper limitof monitor resources has been reached.102<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumberVersion (indetail) Upgraded section CauseAn application error may occur causing When the maximum number of file153 <strong>3.1</strong>.5-1/an emergency shutdown in the group descriptors that can be used in the OS is3.0.0-1 to <strong>3.1</strong>.4-1resource management process. exceeded, the currently used socket isThis problem occurs if internal improperly operated.communication is established when themaximum number of file descriptors thatcan be used in the OS is exceeded.When the virtual machine monitor The stopped virtual machine is154 <strong>3.1</strong>.5-1/resources detect that the virtual machine migrated because the migration request3.0.0-1 to <strong>3.1</strong>.4-1is down, that virtual machine may not to vCenter is successfully executedstart at the failover destination. even if the virtual machine stops.This problem occurs if it is detected that avirtual machine is down when you usethe virtual machine monitor resourcesettings to attempt migration be<strong>for</strong>efailover.Migration may not be per<strong>for</strong>med if the Only virtual machine groups can be155 <strong>3.1</strong>.5-1/monitor resource with the setting to migrated by internal processing.3.0.0-1 to <strong>3.1</strong>.4-1attempt migration be<strong>for</strong>e failover detectsan error.This problem occurs if an error isdetected by a monitor resource whoserecovery target is not groups butresources.exec resources may fail to activate. If directory creation processes <strong>for</strong>156 <strong>3.1</strong>.5-1/This problem may occur if you temporary files are executed at the same<strong>3.1</strong>.0-1 to <strong>3.1</strong>.4-1simultaneously execute multiple exec time, the directory creation processresources <strong>for</strong> which the setting to rotate started later fails.logs has been specified and it is the firststartup <strong>for</strong> the server.For some monitor resources, an abnormal A setting to log an alart every time an157 <strong>3.1</strong>.5-1/alert may be continuously logged at each initialization error occurs had been3.0.0-1 to <strong>3.1</strong>.4-1interval.specified.This problem occurs when aninitialization error (such as an invalidlibrary path) occurs in the followingmonitor resources.Relevant monitor resources- db2w- ddnsw- genw- jraw- mysqlw- oraclew- psqlw- psw- sraw- sybasew- vipwSection II Installing <strong>ExpressCluster</strong>103


Chapter 4 Latest version in<strong>for</strong>mationNumberVersion (indetail) Upgraded section CauseThe FTP monitor resource may The FTP monitor executes the next158 <strong>3.1</strong>.5-1/mistakenly detect a monitor error. command be<strong>for</strong>e receiving all3.0.0-1 to <strong>3.1</strong>.4-1This problem occurs if the banner responses from the FTP server.message registered in the FTP server orthe message at the time of connection is along character string or spans multiplelines.An unnecessary message about a System A case in which resources are released159 <strong>3.1</strong>.5-1/Resource Agent background process may in a multi-thread process was not taken<strong>3.1</strong>.0-1 to <strong>3.1</strong>.4-1be output in when a cluster stops. into consideration.This problem occurs if the cluster stops inan environment in which SystemResource Agent is used.If all groups are subject to recovery by a Whether to fail over all the groups was160 <strong>3.1</strong>.5-1/message receive monitor resource, not properly determined.<strong>3.1</strong>.0-1 to <strong>3.1</strong>.4-1failover may not be per<strong>for</strong>med when anerror is detected.This problem occurs if you use messagereceive monitor resources <strong>for</strong> linkagewith the server managementinfrastructure or A1080a/A1040a, andthere is a group that did not activate in alocal server.If the process to unmount the mirror disk OS mountpoint in<strong>for</strong>mation was161 <strong>3.1</strong>.3-1/resource or hybrid disk resource times already removed even though the3.0.0-1 to <strong>3.1</strong>.2-1out, the system may misidentify that the unmount process is still being executed.unmount process is successfullycompleted.This problem occurs if the mirror diskresource or hybrid disk resource unmountprocess takes longer than the unmounttimeout time.Activating/deactivating a volume The vgchange command might fail162 <strong>3.1</strong>.5-1/manager resource may fail.when the vgs command is executed3.0.0-1 to <strong>3.1</strong>.4-1This problem may occur when LVM is while the vgchange command is beingselected as the volume manager type and executed.the vgs command is executed whileactivating/deactivating a volumemanager resource.The mail report function might fail to The domain name of the greeing163 <strong>3.1</strong>.5-1/send a mail.message retured by the SMTP server is3.0.0-1 to <strong>3.1</strong>.4-1This problem occurs when a domain used as the domain of the HELO orname is not included in the greeting EHLO command.message of the destination SMTP server.When the configuration in<strong>for</strong>mation is This occurred when the configuration164 <strong>3.1</strong>.7-1/uploaded using the clpcfctrl command, in<strong>for</strong>mation including interconnect<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1an OS memory shortage error might settings having IPs not existing on theoccur although this is not the case. sever is uploaded.The result of virtual machine resource This occurred while virtual machine165 <strong>3.1</strong>.7-1/activation processing was reflected on the resources were used.<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1environment variable CLP_DISK to beused <strong>for</strong> EXEC resources.104<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


System requirements <strong>for</strong> WebManager MobileNumberVersion (indetail) Upgraded section CauseSometimes recovery action upon monitor This occurred when recovery action by166 <strong>3.1</strong>.7-1/error could not be executed.another monitor resource on the same<strong>3.1</strong>.5-1 to <strong>3.1</strong>.6-1server was tried <strong>for</strong> the group andresource <strong>for</strong> which group stop had beenexecuted as the final action uponmonitor error.In executing the clplogcc command, a log This occurred when a directory in a file167 <strong>3.1</strong>.7-1/file might not be saved in a directory system other than the <strong>ExpressCluster</strong><strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1specified by the -o option.installation path was specified by the -ooption in executing clplogcc -l.The following alert might be output to the This occurred when a server was168 <strong>3.1</strong>.7-1/WebManager.stopped and there was no failover<strong>3.1</strong>.5-1 to <strong>3.1</strong>.6-1destination <strong>for</strong> the failover group thatTYPE:rc, ID:503was running on that server and whenA mismatch in the group failover-md manual failover was configured.status occurs between the servers.The following alert might be output to the This occurred when there were169 <strong>3.1</strong>.7-1/WebManager.<strong>3.1</strong>.5-1TYPE:rc, ID:503A mismatch in the group failover-mdstatus occurs between the servers.differences in the startup times betweenthe servers when the cluster started.In Config Mode of the WebManager, the This occurred when the recovery target170 <strong>3.1</strong>.7-1/final action setting might be changed at was changed on the recovery action tab<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1an unintended timing.of monitor resource properties.The same monitor resource might be This occurred on very rare occasions171 <strong>3.1</strong>.7-1/started redundantly, resulting in when a monitor resource set to Always<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1unnecessary recovery action being monitors and a monitor resource set toexecuted.Monitors while activated were startedsimultaneously when the cluster started.The following alerts might be output to This occurred when it took time to stop172 <strong>3.1</strong>.7-1/the WebManager.a failover group in stopping the cluster.<strong>3.1</strong>.5-1TYPE:rm, ID:9Detected an error in monitoring.( :TYPE:rm, ID:25Recovery will not be executed since therecoverytarget is not active.On the title line displayed when This occurred when clpmdstat --perf173 <strong>3.1</strong>.7-1/clpmdstat --perf was executed, "Cur", was executed.<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1which indicated the latest value, wasdisplayed in place of "Avg" in theaverage column.Section II Installing <strong>ExpressCluster</strong>105


Chapter 4 Latest version in<strong>for</strong>mationNumberVersion (indetail) Upgraded section CauseThe following alert might not be output This occurred when the monitor174 <strong>3.1</strong>.7-1/to the WebManager.resource returned to normal once after<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1the alert was output, and upon detectingTYPE:rm, ID:100an error again within 24 hours, and theRestart count exceeded the maximum of final action was ignored.. Final action of monitoring will not beexecuted.When a live migration of a virtual This occurred when the virtual machine175 <strong>3.1</strong>.7-1/machine resource is executed, the virtual type was KVM.<strong>3.1</strong>.5-1 to <strong>3.1</strong>.6-1machine resource might fail to beactivated at the migration destination.In Config Mode of the WebManager, This occurred when a new JVM176 <strong>3.1</strong>.7-1/Nursery Space and Old Space might not monitor resource was created by<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1become monitor targets when a JVM selecting Oracle JRockit <strong>for</strong> JVM Typemonitor resource is created.and not opening Tuning Properties.In Config Mode of the WebManager, This occurred when Apply the177 <strong>3.1</strong>.7-1/executing Apply the Configuration File Configuration File was executed.<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1caused a memory leak to occur in theWebManager server process. A leak of80 + 256 * number of monitor types inuse + 256 * number of monitor resourcesoccurred per execution.When an IP address or the like is changed This occurred when the Chassis178 <strong>3.1</strong>.7-1/in Server Properties - BMC Tab in Identify function was used.<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1Config Mode of the WebManager, theexecution of Suspend and Resume mightnot apply the change.The icon of the virtual machine resource This occurred when a virtual machine179 <strong>3.1</strong>.7-1/to be displayed on the WebManager was resource was used.<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1wrong.Sometimes disk resource activation This occurred when "lvm" or "vxvm"180 <strong>3.1</strong>.7-1/might fail.was specified <strong>for</strong> Disk Type of a disk<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1resource.There might be a delay in executing This occurred when the time181 <strong>3.1</strong>.7-1/recovery action upon detection of a in<strong>for</strong>mation display function was<strong>3.1</strong>.3-1 to <strong>3.1</strong>.6-1monitoring error.enabled and a monitor resourcedetected an error at disconnection of theprimary interconnect.A VMW monitor resource might make This occurred when the monitor182 <strong>3.1</strong>.7-1/erroneous detection of an error. interval of the VMW monitor resource<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1was set to 15 seconds or more.Migration, move, or failover (other than This might occur when it took time to183 <strong>3.1</strong>.7-1/server down) of a virtual machine might complete migration.<strong>3.1</strong>.0-1 to <strong>3.1</strong>.6-1fail.This might occur when it took time tostop the virtual machine.A virtual machine will restart after it is The virtual machine is restarted by184 <strong>3.1</strong>.7-1/migrated by other than EXPRESSCLUSTER even though the<strong>3.1</strong>.5-1 to <strong>3.1</strong>.6-1EXPRESSCLUSTER.restart is unnecessary when it ismigrated by other thanEXPRESSCLUSTER.106<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Chapter 5Notes and RestrictionsThis chapter provides in<strong>for</strong>mation on known problems and how to troubleshoot the problems.This chapter covers:• Designing a system configuration········································································································· 108• Installing operating system··················································································································· 118• Be<strong>for</strong>e installing <strong>ExpressCluster</strong> ·········································································································· 122• Notes when creating <strong>ExpressCluster</strong> configuration data ······································································· 129• After start operating <strong>ExpressCluster</strong>····································································································· 136107


Chapter 5 Notes and RestrictionsDesigning a system configurationHardware selection, option products license arrangement, system configuration, and shared diskconfiguration are introduced in this section.Function list and necessary licenseThe following option products are necessary as many as the number of servers.Those resources and monitor resources <strong>for</strong> which the necessary licenses are not registered are not onthe resource list of the Builder (online version).<strong>Nec</strong>essary function<strong>Nec</strong>essary licenseMirror disk resource <strong>ExpressCluster</strong> X Replicator <strong>3.1</strong>*1Hybrid disk resource <strong>ExpressCluster</strong> X Replicator DR <strong>3.1</strong>*2Oracle monitor resource <strong>ExpressCluster</strong> X Database Agent <strong>3.1</strong>DB2 monitor resource <strong>ExpressCluster</strong> X Database Agent <strong>3.1</strong>PostgreSQL monitor resource <strong>ExpressCluster</strong> X Database Agent <strong>3.1</strong>MySQL monitor resource <strong>ExpressCluster</strong> X Database Agent <strong>3.1</strong>Sybase monitor resource <strong>ExpressCluster</strong> X Database Agent <strong>3.1</strong>Samba monitor resource <strong>ExpressCluster</strong> X File Server Agent <strong>3.1</strong>nfs monitor resource <strong>ExpressCluster</strong> X File Server Agent <strong>3.1</strong>http monitor resource <strong>ExpressCluster</strong> X Internet Server Agent <strong>3.1</strong>smtp monitor resource <strong>ExpressCluster</strong> X Internet Server Agent <strong>3.1</strong>pop3 monitor resource <strong>ExpressCluster</strong> X Internet Server Agent <strong>3.1</strong>imap4 monitor resource <strong>ExpressCluster</strong> X Internet Server Agent <strong>3.1</strong>ftp monitor resource <strong>ExpressCluster</strong> X Internet Server Agent <strong>3.1</strong>Tuxedo monitor resource <strong>ExpressCluster</strong> X Application Server Agent <strong>3.1</strong>OracleAS monitor resource <strong>ExpressCluster</strong> X Application Server Agent <strong>3.1</strong>Weblogic monitor resource <strong>ExpressCluster</strong> X Application Server Agent <strong>3.1</strong>Websphere monitor resource <strong>ExpressCluster</strong> X Application Server Agent <strong>3.1</strong>WebOTX monitor resource <strong>ExpressCluster</strong> X Application Server Agent <strong>3.1</strong>JVM monitor resource <strong>ExpressCluster</strong> X Java Resource Agent <strong>3.1</strong>System monitor resource <strong>ExpressCluster</strong> X System Resource Agent <strong>3.1</strong>Mail report actions <strong>ExpressCluster</strong> X Alert Service <strong>3.1</strong>Network Warning Light status <strong>ExpressCluster</strong> X Alert Service <strong>3.1</strong>*1 When configuring data mirror <strong>for</strong>m, product Replicator must be purchased.*2 When configuring mirror between shared disk, product Replicator DR must be purchased.Supported operating systems <strong>for</strong> the Builder and WebManagerUse a Web browser and Java Runtime supporting 32-bit machine to run the Builder andWebManager on an x86_64 machine.108<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Designing a system configurationHardware requirements <strong>for</strong> mirror disksDisks to be used as a mirror disk resource do not support a <strong>Linux</strong> md stripe set, volume set,mirroring, and stripe set with parity.Mirror disk resource cannot be made as a target of a <strong>Linux</strong> md stripe set, volume set,mirroring, and stripe set with parity.Mirror partitions (data partition and cluster partition) to use a mirror disk resource.There are two ways to allocate mirror partitions:• Allocate a mirror partition (data partition and cluster partition) on the disk where theoperating system (such as root partition and swap partition) resides.• Reserve (or add) a disk (or LUN) not used by the operating system and allocate a mirrorpartition on the disk.Consider the following when allocating mirror partitions:• When maintainability and per<strong>for</strong>mance are important:- It is recommended to have a mirror disk that is not used by the OS.• When LUN cannot be added due to hardware RAID specification or when changingLUN configuration is difficult in hardware RAID pre-install model:- Allocate a mirror partition on the same disk where the operating system resides.When multiple mirror disk resources are used, it is recommended to prepare (adding) a diskper mirror disk resource. Allocating multiple mirror disk resources on the same disk mayresult in degraded per<strong>for</strong>mance and it may take a while to complete mirror recovery due todisk access per<strong>for</strong>mance on <strong>Linux</strong> operating system.Disks used <strong>for</strong> mirroring must be the same in all servers.• Disk typeMirror disks on both servers and disks where mirror partition is allocated should be ofthe same disk typeFor supported disk types, see “Supported disk interfaces” on page 52.ExampleCombination server1 server2OK SCSI SCSIOK IDE IDENG IDE SCSINotes when the geometries of the disks used as mirror disks differ between the servers.The partition size allocated by the fdisk command is aligned by the number of blocks(units) per cylinder. Allocate a data partition considering the relationship between datapartition size and direction <strong>for</strong> initial mirror configuration to be as indicated below:Source server ≤ Destination server“Source server” refers to the server where the failover group that a mirror disk resourcebelongs has a higher priority in failover policy. “Destination server” refers to the serverwhere the failover group that a mirror disk resource belongs has a lower priority infailover policy.Section II Installing <strong>ExpressCluster</strong>109


Chapter 5 Notes and RestrictionsHardware requirements <strong>for</strong> shared disksA shared disk does not support a <strong>Linux</strong> md stripe set, volume set, mirroring, and stripe setwith parity.When a <strong>Linux</strong> LVM stripe set, volume set, mirroring, or stripe set with parity is used:• <strong>ExpressCluster</strong> cannot control ReadOnly/ReadWrite of the partition configured <strong>for</strong> thedisk resource.When you use VxVM or LVM, a LUN that is not controlled by VxVM or LVM is requiredon a shared disk <strong>for</strong> the disk heartbeat of <strong>ExpressCluster</strong>. You should bear this in your mindwhen configuring LUN on the shared disk.Disk heartbeat-dedicated LUNActual diskdg1dg2Disk group(Virtual disks)VxVM disk groupresource in<strong>ExpressCluster</strong>vxvol1vxvol2vxvol3vxvol4Volume(Partitions allocatedfrom disk group)VxVM disk volumeresource in<strong>ExpressCluster</strong>110<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Designing a system configurationHardware requirements <strong>for</strong> hybrid disksDisks to be used as a hybrid disk resource do not support a <strong>Linux</strong> md stripe set, volume set,mirroring, and stripe set with parity.Hybrid disk resource cannot be made as a target of a <strong>Linux</strong> md stripe set, volume set,mirroring, and stripe set with parity.Hybrid partitions (data partition and cluster partition) are required to use a hybrid diskresource.When a disk <strong>for</strong> hybrid disk is allocated in the shared disk, a partition <strong>for</strong> disk heartbeatresource between servers sharing the shared disk device is required.The following are the two ways to allocate partitions when a disk <strong>for</strong> hybrid disk is allocatedfrom a disk which is not a shared disk:• Allocate hybrid partitions (data partition and cluster partition) on the disk where theoperating system (such as root partition and swap partition) resides.• Reserve (or add) a disk (or LUN) not used by the operating system and allocate a hybridpartition on the disk.Consider the following when allocating hybrid partitions:• When maintainability and per<strong>for</strong>mance are important:- It is recommended to have a hybrid disk that is not used by the OS.• When LUN cannot be added due to hardware RAID specification or when changingLUN configuration is difficult in hardware RAID pre-install model:- Allocate a hybrid partition on the same disk where the operating system resides.When multiple hybrid disk resources are used, it is recommended to prepare (add) a LUN perhybrid disk resource. Allocating multiple hybrid disk resources on the same disk may resultin degraded in per<strong>for</strong>mance and it may take a while to complete mirror recovery due to diskaccess per<strong>for</strong>mance on <strong>Linux</strong> operating system.Device <strong>for</strong> which hybrid disk resource is allocatedType of required partition Shared disk device Non-shared disk deviceData partition Required RequiredCluster partition Required RequiredPartition <strong>for</strong> disk heart beat Required Not RequiredAllocation on the same disk(LUN) as where the OS is- PossibleNotes when the geometries of the disks used as hybrid disks differ between the servers.Allocate a data partition considering the relationship between data partition size anddirection <strong>for</strong> initial mirror configuration to be as indicated below:Source server ≤ Destination server“Source server” refers to the server with a higher priority in failover policy in the failovergroup where the hybrid disk resource belongs. “Destination server” refers to the serverwith a lower priority in failover policy in the failover group where the hybrid diskresource belongs has.Section II Installing <strong>ExpressCluster</strong>111


Chapter 5 Notes and RestrictionsIPv6 environmentThe following function cannot be used in an IPv6 environment:BMC heartbeat resourceThe following functions cannot use link-local addresses:LAN heartbeat resourceKernel mode LAN heartbeat resourceMirror disk connectPING network partition resolution resourceFIP resourceVIP resource112<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Designing a system configurationNetwork configurationThe cluster configuration cannot be configured or operated in an environment, such as NAT, wherean IP address of a local server is different from that of a remote server.Example of network configurationExternal networkNAT deviceInternal network10.0.0.2192.168.0.210.0.0.1NAT device settingsA packet to 10.0.0.2 fromexternal is <strong>for</strong>warded192.168.0.1Server1 cluster settingsLocal server :10.0.0.1Remote server:10.0.0.2Server2 cluster settingsLocal server :192.168.0.1Remote server:10.0.0.1The configuration that IP addresses oflocal and remote servers are different isnot allowed.Execute Script be<strong>for</strong>e Final Action setting <strong>for</strong> monitor resourcerecovery action<strong>ExpressCluster</strong> version <strong>3.1</strong>.0-1 and later supports the execution of a script be<strong>for</strong>e reactivation andbe<strong>for</strong>e failover.The same script is executed in either case. There<strong>for</strong>e, if Execute Script be<strong>for</strong>e Final Action is setwith a version earlier than <strong>3.1</strong>.0-1, editing of the script file may be required.For the additional script configuration needed to execute the script be<strong>for</strong>e reactivation and be<strong>for</strong>efailover, the script file must be edited to assign processing to each recovery action.For the assignment of processing <strong>for</strong> a recovery action, see "Recovery/pre-recovery action script"in Chapter 5, "Monitor resource details" in the Reference <strong>Guide</strong>.Section II Installing <strong>ExpressCluster</strong>113


Chapter 5 Notes and RestrictionsNIC link up/down monitor resourceSome NIC boards and drivers do not support required ioctl( ).The propriety of a NIC Link Up/Down monitor resource of operation can be checked by theethtool command which each distributor offers.ethtool eth0Settings <strong>for</strong> eth0:Supported ports: [ TP ]Supported link modes: 10baseT/Half 10baseT/Full100baseT/Half 100baseT/Full1000baseT/FullSupports auto-negotiation: YesAdvertised link modes: 10baseT/Half 10baseT/Full100baseT/Half 100baseT/Full1000baseT/FullAdvertised auto-negotiation: YesSpeed: 1000Mb/sDuplex: FullPort: Twisted PairPHYAD: 0Transceiver: internalAuto-negotiation: onSupports Wake-on: umbgWake-on: gCurrent message level: 0x00000007 (7)Link detected: yesWhen the LAN cable link status ("Link detected: yes") is not displayed as the result of theethtool command:- It is highly likely that NIC Link Up/Down monitor resource ofEXPRESSCLUSTER is not operable. Use IP monitor resource instead.When the LAN cable link status ("Link detected: yes") is displayed as the result of theethtool command:- In most cases NIC Link Up/Down monitor resource of <strong>ExpressCluster</strong> can beoperated, but sometimes it cannot be operated.- Particularly in the following hardware, NIC Link Up/Down monitor resource of<strong>ExpressCluster</strong> may not be operated. Use IP monitor resource instead.- When hardware is installed between the actual LAN connector and NIC chip suchas a blade serverTo check if NIC Link Up/Down monitor resource can be used by using <strong>ExpressCluster</strong> on an actualmachine, follow the steps below to check the operation.1. Register NIC Link Up/Down monitor resource with the configuration in<strong>for</strong>mation.Select No Operation <strong>for</strong> the configuration of recovery operation of NIC Link Up/Downmonitor resource upon failure detection.2. Start the cluster.3. Check the status of NIC Link Up/Down monitor resource.If the status of NIC Link Up/Down monitor resource is abnormal while LAN cable link statusis normal, NIC Link Up/Down monitor resource cannot be operated.114<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Designing a system configuration4. If NIC Link Up/Down monitor resource status becomes abnormal when LAN cable link statusis made abnormal status (link down status), NIC Link Up/Down monitor resource cannot beoperated.If the status remains to be normal, NIC Link Up/Down monitor resource cannot be operated.Write function of the mirror disk resource and hybrid disk resourceA mirror disk and a hybrid disk resource write data in the disk of its own server and the diskof the remote server via network. Reading of data is done only from the disk on own server.Writing functions shows poor per<strong>for</strong>mance in mirroring when compared to writing to asingle server because of the reason provided above. For a system that requires through-put ashigh as single server, use a shared disk.Not outputting syslog to the mirror disk resource or the hybrid diskresourceDo not set directories or subdirectories which mounted the mirror disk resource or the hybrid diskresource as syslog output destination directories.When the mirror disk connection is disconnected, the I/O to the mirror partition may stop until thedisconnection is detected. The system may become abnormal because of the syslog outputstoppage at this time.When outputting syslog to the mirror disk resource or the hybrid disk resource is necessary,consider the followings. Use bonding as a way of path redundancy of the mirror disk connection. Adjust the user space monitoring timeout value or the mirror related timeout values.Notes when terminating the mirror disk resource or the hybrid diskresourceIn case that processes which access to the directories, subdirectories and files which mountedthe mirror disk resource or the hybrid disk resource exist, terminate the accesses to each diskresource by using ending script or other methods at deactivation of each disk resource likewhen shutdown or failover.Depending on the settings of each disk resource, action at abnormity detection whenunmounting (<strong>for</strong>cibly terminate processes while each disk resource is being accessed) mayoccur, or recovery action at deactivation failure caused by unmount failure (OS shutdown orother actions) may be executed.In case that a massive amount of accesses to directories, subdirectories or files whichmounted the mirror disk resource or hybrid disk resource are executed, it may take much timebe<strong>for</strong>e the cache of the file systems is written out to the disks when unmounting at diskresource deactivation.At times like this, set the timeout interval of unmount longer enough so that the writing to thedisks will successfully complete.For the details of this setting,see Chapter 4, "Group resource details" in Reference <strong>Guide</strong>,Settings Tab or Mirror Disk Resource Tuning Properties or Unmount Tab in DetailsTab in "Understanding mirror disk resources" or "Understanding mirror disk resources".Section II Installing <strong>ExpressCluster</strong>115


Chapter 5 Notes and RestrictionsData consistency among multiple asynchronous mirror disksIn mirror disk or hybrid disk with asynchronous mode, writing data to the data partition of theactive server is per<strong>for</strong>med in the same order as the data partition of the standby server.This writing order is guaranteed except during the initial mirror disk configuration or recovery(copy) period after suspending mirroring the disks. The data consistency among the files on thestandby data partition is guaranteed.However, the writing order is not guaranteed among multiple mirror disk resources and hybrid diskresources. For example, if a file gets older than the other and files that cannot maintain the dataconsistency are distributed to multiple asynchronous mirror disks, an application may not runproperly when it fails over due to server failure.For this reason, be sure to place these files on the same asynchronous mirror disk or hybrid disk.O_DIRECT <strong>for</strong> mirror or hybrid disk resourcesDo not use the O_DIRECT flag of the open() system call <strong>for</strong> mirror or hybrid disk resources.Examples include the Oracle parameter filesystemio_options = setall.Do not specify the disk monitor O_DIRECT mode <strong>for</strong> mirror or hybrid disk resources.Initial mirror construction time <strong>for</strong> mirror or hybrid disk resourcesThe time that takes to construct the initial mirror is different between ext2/ext3/ext4 and other filesystems.Mirror or hybrid disk connectWhen using redundant mirror or hybrid disk connect, both version of IP address are needed tobe the same.All the IP addresses used by mirror disk connect must be set to IPv4 or IPv6.JVM monitor resourcesUp to 25 Java VMs can be monitored concurrently. The Java VMs that can be monitoredconcurrently are those which are uniquely identified by the Builder (with Identifier in theMonitor (special) tab).Connections between Java VMs and Java Resource Agent do not support SSL.If, during the monitoring of Java VM, there is another process with the same name as themonitoring target, C heap monitoring may be per<strong>for</strong>med <strong>for</strong> a different monitoring target.It may not be possible to detect thread deadlocks. This is a known problem in Java VM. Fordetails, refer to "Bug ID: 6380127" in the Oracle Bug Database.Monitoring of the WebOTX process group is disabled when the process multiplicity is two ormore. WebOTX V8.4 and later can be monitored.The Java Resource Agent can monitor only the Java VMs on the server on which the JVMmonitor resources are running.The Java Resource Agent can monitor only one JBoss server instance per server.116<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Designing a system configurationThe Java installation path setting made by the Builder (with Java Installation Path in theJVM monitor tab in Cluster Property) is shared by the servers in the cluster. The versionand update of Java VM used <strong>for</strong> JVM monitoring must be the same on every server in thecluster.The management port number setting made by the Builder (with Management Port in theConnection Setting dialog box opened from the JVM monitor tab in Cluster Property) isshared by all the servers in the cluster.Application monitoring is disabled when an application to be monitored on the IA32 versionis running on an x86_64 version OS or when an application to be monitored on an x86_64version is running on an IA32 version OS.If a large value such as 3,000 or more is specified as the maximum Java heap size by theBuilder (by using Maximum Java Heap Size on the JVM monitor tab in Cluster Property),The Java Resource Agent will fail to start up. The maximum heap size differs depending onthe environment, so be sure to specify a value based on the capacity of the mounted systemmemory.Using SingleServerSafe is recommended if you want to use the target Java VM loadcalculation function of the coordination load balancer.Mail reportingThe mail reporting function is not supported by STARTTLS and SSL.Requirements <strong>for</strong> network warning lightWhen using “DN-1000S” or “DN-1500GL,” do not set your password <strong>for</strong> the warning light.To play an audio file as a warning, you must register the audio file <strong>for</strong> “DN-1500GL”be<strong>for</strong>ehand. For details on how to register an audio file, see the “DN-1500GL” operationmanual.Section II Installing <strong>ExpressCluster</strong>117


Chapter 5 Notes and RestrictionsInstalling operating systemNotes on parameters to be determined when installing an operating system, allocating resources,and naming rules are described in this section./opt/nec/clusterpro file systemIt is recommended to use a file system that has journaling functions to improve tolerance <strong>for</strong>system failure.Mirror disksDisk partitionExample: When adding one SCSI disk to each of both servers and making a pair of mirroreddisks:Server1Cluster partition/dev/sdb1Data partition/dev/sdb2Mirror partition deviceA failover unit ofmirror disk resourceServer2/dev/sdb1/dev/sdb2/dev/sdb/dev/sdbExample: When using free space of IDE disks of both servers, where the OS is stored, andmaking a pair of mirrored disks:OS root partition/dev/hda1Server1/dev/hdaOS swap partition/dev/hda2Cluster partition/dev/hda3Data partition/dev/hda4Mirror partition deviceA failover unit ofmirror disk resourceServer2/dev/hda1/dev/hda2/dev/hda3/dev/hda4/dev/hda• Mirror partition device refers to cluster partition and data partition.• Allocate cluster partition and data partition on each server as a pair.• It is possible to allocate a mirror partition (cluster partition and data partition) on the diskwhere the operating system resides (such as root partition and swap partition.).- When maintainability and per<strong>for</strong>mance are important:It is recommended to have a mirror disk that is not used by the operating system(such as root partition and swap partition.)- When LUN cannot be added due to hardware RAID specification: orWhen changing LUN configuration is difficult in hardware RAID pre-install model:118<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Installing operating systemIt is possible to allocate a mirror partition (cluster partition and data partition) on thedisk where the operating system resides (such as root partition and swap partition.)Disk configurationsMultiple disks can be used as mirror disks on a single server. Or, you can allocate multiplemirror partitions on a single disk.Example: When adding two SCSI disks to each of both servers and making two pairs ofmirrored disks:Server 1Server 2/dev/sdb/dev/sdb/dev/sdc/dev/sdc• Allocate two partitions, cluster partition and data partition, as a pair on each disk.• Use of the data partition as the first disk and the cluster partition as the second disk is notpermitted.Example: When adding one SCSI disk to each of both servers and making two mirrorpartitions:Server 1Server 2/dev/sdb/dev/sdbA disk does not support a <strong>Linux</strong> md stripe set, volume set, mirroring, and stripe set withparity.Section II Installing <strong>ExpressCluster</strong>119


Chapter 5 Notes and RestrictionsHybrid disksDisk partitionDisks that are shared or not shared (server with built-in disk, external disk chassis not sharedby servers etc.) can be used.Example) When two servers use a shared disk and the third server uses a built-in disk in theserver:Server 1 Server 2Shared diskCluster partition/dev/sdb1Data partition/dev/sdb2Mirror partition deviceUnit of failover <strong>for</strong> hybrid diskresourceServer 3/dev/sdb1/dev/sdb2/dev/sdbDisk HB partition/dev/sdb3/dev/sdbDependent library• Mirror partition device is a device <strong>ExpressCluster</strong> mirroring driver provides in the upper.• Allocate cluster partition and data partition on each server as a pair.• When a disk that is not shared (e.g. server with a built-in disk, external disk chassis that isnot shared among servers) is used, it is possible to allocate mirror partitions (clusterpartition and data partition) on the disk where the operating system resides (such as rootpartition and swap partition.).- When maintainability and per<strong>for</strong>mance are important:It is recommended to have a mirror disk that is not used by the operating system(such as root partition and swap partition.)- When LUN cannot be added due to hardware RAID specification: orWhen changing LUN configuration is difficult in hardware RAID pre-install model:It is possible to allocate mirror partitions (cluster partition and data partition) on thedisk where the operating system resides (such as root partition and swap partition.)• When a hybrid disk is allocated in a shared disk device, allocate a partition <strong>for</strong> the diskheart beat resource between servers sharing the shared disk device.• A disk does not support a <strong>Linux</strong> md stripe set, volume set, mirroring, and stripe set withparity.libxml2Install libxml2 when installing the operating system.Dependent driver120<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Installing operating systemsoftdogThis driver is necessary when softdog is used to monitor user mode monitor resource.Configure a loadable module. Static driver cannot be used.The major number of Mirror driverUse mirror driver’s major number 218. Do not use major number 218 <strong>for</strong> other device drivers.The major number of Kernel mode LAN heartbeat and keepalivedriversUse major number 10, minor number 240 <strong>for</strong> kernel mode LAN heartbeat driver.Use major number 10, minor number 241 <strong>for</strong> keepalive driver.Make sure to check that other drivers are not using major and minor numbers described above.Partition <strong>for</strong> RAW monitoring of disk monitor resourcesAllocate a partition <strong>for</strong> monitoring when setting up RAW monitoring of disk monitor resources.The partition size should be 10 MB.SE<strong>Linux</strong> settingsConfigure permissive or disabled <strong>for</strong> the SE<strong>Linux</strong> settings.If you set en<strong>for</strong>cing, communication required in <strong>ExpressCluster</strong> may not be achieved.NetworkManager settingsIf the NetworkManager service is running in a Red Hat Enterprise <strong>Linux</strong> 6 environment, anunintended behavior (such as detouring the communication path, or disappearance of the networkinterface) may occur upon disconnection of the network. It is recommended to setNetworkManager to stop the service.Section II Installing <strong>ExpressCluster</strong>121


Chapter 5 Notes and RestrictionsBe<strong>for</strong>e installing <strong>ExpressCluster</strong>Notes after installing an operating system, when configuring OS and disks are described in thissection.Communication port numberIn <strong>ExpressCluster</strong>, the following port numbers are used. You can change the port number by usingthe Builder except “keepalive between mirror drivers.".Make sure not to access the following port numbers from a program other than <strong>ExpressCluster</strong>.Configure to be able to access the port number below when setting a firewall on a server.Server to ServerLoopback in serversFrom To Used <strong>for</strong>Server Automaticallocation 1 → Server 29001/TCP Internal communicationServerServerServerServerServerServerServerServerAutomaticallocationAutomaticallocationAutomaticallocationAutomaticallocationAutomaticallocationAutomaticallocationAutomaticallocationAutomaticallocationServer icmp → Server icmpServerAutomaticallocation→→→→→→→→→Server 29002/TCP Data transferServer 29002/UDP HeartbeatServer 29003/UDP Alert synchronizationServer 29004/TCP Communicationbetween mirror agentsServer 29006/UDP Heartbeat (kernelmode)Server XXXX 2 /TCP Mirror disk resourcedata synchronizationServerXXXX 3 Communication/TCPbetween mirror driversServer XXXX 4 /TCP Communicationbetween mirror driverskeepalive betweenmirror drivers,duplication check <strong>for</strong>FIP/VIP resource andmirror agentServer XXXX 5 /UDP Internal logcommunicationWebManager to ServerFrom To Used <strong>for</strong>WebManagerAutomaticallocation→Server 29003/TCP http communication122<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Be<strong>for</strong>e installing <strong>ExpressCluster</strong>Server connected to the Integrated WebManager to Target serverFrom To Used <strong>for</strong>Server connectedto the IntegratedWebManagerServer to bemanaged by theIntegratedWebManagerAutomaticallocation→Server 29003/TCP http communication29003 → Client 29010/UDP UDP communicationOthersFrom To Used <strong>for</strong>ServerServerManagement LAN ofserver BMCManagement LAN ofserver BMCAutomaticallocationAutomaticallocationAutomaticallocationAutomaticallocation→→Networkwarning lightManagementLAN ofserver BMCSee the manual<strong>for</strong> each product.623/UDP→ Server 162/UDP→ManagementLAN ofserver BMCServer icmp → Monitoringtarget5570/UDPicmpServer icmp → NFS server icmpServer icmp → MonitoringtargetServerServerServerServerAutomaticallocationAutomaticallocationAutomaticallocationAutomaticallocation→→→→ServerMonitoringtargetMonitoringtargetBIG-IP LTMicmpManagementport number setby the BuilderConnection portnumber set bythe BuilderLoad balancerlinkagemanagementport number setby the BuilderCommunicationport number setby the BuilderNetwork warninglight controlBMC control (Forcedstop / Chassis lampassociation)Monitoring target ofthe external linkagemonitor configured<strong>for</strong> BMC linkageBMC HBcommunicationIP monitorChecking if NFSserver is active byNAS resourceMonitoring target ofPing methodnetwork partitionresolution resourceJVM monitorJVM monitorJVM monitorJVM monitor1. In automatic allocation, a port number not being used at a given time is allocated.2. This is a port number used per mirror disk resource or hybrid disk resource and is set whencreating mirror disk resource or hybrid disk resource. A port number 29051 is set by default.When you add a mirror disk resource or hybrid disk resource, this value is automaticallySection II Installing <strong>ExpressCluster</strong>123


Chapter 5 Notes and Restrictionsincremented by 1. To change the value, click Details tab in the [md] Resource Properties orthe [hd] Resource Properties dialog box of the Builder. For more in<strong>for</strong>mation, refer toChapter 4, “Group resource details” in the Reference <strong>Guide</strong>.3. This is a port number used per mirror disk resource or hybrid disk resource and is set whencreating mirror disk resource or hybrid disk resource. A port number 29031 is set by default.When you add a mirror disk resource or a hybrid disk resource, this value is automaticallyincremented by 1. To change the value, click Details tab in the [md] Resource Properties orthe [hd] Resource Properties dialog box of the Builder. For more in<strong>for</strong>mation, refer toChapter 4, “Group resource details” in the Reference <strong>Guide</strong>.4. This is a port number used per mirror disk resource or hybrid disk resource and is set whencreating mirror disk resource or hybrid disk resource. A port number 29071 is set by default.When you add a mirror disk resource or hybrid disk resource this value is automaticallyincremented by 1. To change the value, click Details tab in the [md] Resource Properties orthe [hd] Resource Properties dialog box of the Builder. For more in<strong>for</strong>mation, refer toChapter 4, “Group resource details” in the Reference <strong>Guide</strong>.5. Select UDP <strong>for</strong> the Communication Method <strong>for</strong> Internal Logs in the Port No. (Log) tab inCluster Properties. Use the port number configured in Port No. Communication port is notused <strong>for</strong> the default log communication method UNIX Domain.124<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Be<strong>for</strong>e installing <strong>ExpressCluster</strong>Changing the range of automatic allocation <strong>for</strong> the communicationport numbersThe range of automatic allocation <strong>for</strong> the communication port numbers managed by the OSmight overlap the communication port numbers used by <strong>ExpressCluster</strong>.Change the OS settings to avoid duplication when the range of automatic allocation <strong>for</strong> thecommunication numbers managed by OS and the communication numbers used by<strong>ExpressCluster</strong> are duplicated.Examples of checking and displaying OS setting conditions.The range of automatic allocation <strong>for</strong> the communication port numbers depends on thedistribution.# cat /proc/sys/net/ipv4/ip_local_port_range1024 65000This is the condition to be assigned <strong>for</strong> the range from 1024 to 65000 when the applicationrequests automatic allocation <strong>for</strong> the communication port numbers to the OS.# cat /proc/sys/net/ipv4/ip_local_port_range32768 61000This is the condition to be assigned <strong>for</strong> the range from 32768 to 61000 when the applicationrequests automatic allocation <strong>for</strong> the communication port numbers to the OS.Examples of OS settings changeAdd the line below to /etc/sysctl.conf. (When changing to the range from 30000 to65000)net.ipv4.ip_local_port_range = 30000 65000Clock synchronizationIn a cluster system, it is recommended to synchronize multiple server clocks regularly.Synchronize server clocks by using ntp.NIC device nameBecause of the ifconfig command specification, when the NIC device name is shortened, thelength of the NIC device name which <strong>ExpressCluster</strong> can handle depends on it.Shared diskWhen you continue using the data on the shared disk at times such as server reinstallation, donot allocate a partition or create a file system.The data on the shared disk gets deleted if you allocate a partition or create a file system.<strong>ExpressCluster</strong> controls the file systems on the shared disk. Do not include the file systemson the shared disk to /etc/fstab in operating system.See the Installation and Configuration <strong>Guide</strong> <strong>for</strong> steps <strong>for</strong> shared disk configuration.Section II Installing <strong>ExpressCluster</strong>125


Chapter 5 Notes and RestrictionsMirror diskSet a management partition <strong>for</strong> mirror disk resource (cluster partition) and a partition <strong>for</strong>mirror disk resource (data partition).<strong>ExpressCluster</strong> controls the file systems on mirror disks. Do not set the file systems on themirror disks to /etc/fstab in operating system.See the Installation and Configuration <strong>Guide</strong> <strong>for</strong> steps <strong>for</strong> mirror disk configuration.Hybrid diskConfigure the management partition (cluster partition) <strong>for</strong> hybrid disk resource and thepartition used <strong>for</strong> hybrid disk resource (data partition).When a hybrid disk is allocated in the shared disk device, allocate the partition <strong>for</strong> the diskheart beat resource between servers sharing the shared disk device.<strong>ExpressCluster</strong> controls the file systems on the hybrid disk. Do not include the file systemson the hybrid disk to /etc/fstab in operating system.See the Installation and Configuration <strong>Guide</strong> <strong>for</strong> steps <strong>for</strong> hybrid disk configuration.When using this <strong>ExpressCluster</strong> version, a file system must be manually created in a datapartition used by a hybrid disk resource. For details about what to do when a file system is notcreated in advance, see “Settings after configuring hardware” in Chapter 1 “ Determining asystem configuration” of the Installation and Configuration <strong>Guide</strong>.Adjusting OS startup timeIt is necessary to configure the time from power-on of each node in the cluster to the serveroperating system startup to be longer than the following: The time from power-on of the shared disks to the point they become available.Heartbeat timeout timeSee the Installation and Configuration <strong>Guide</strong> <strong>for</strong> configuration steps.Verifying the network settingsThe network used by Interconnect or Mirror disk connect is checked. It checks by all theservers in a cluster.See the Installation and Configuration <strong>Guide</strong> <strong>for</strong> configuration steps.ipmiutil and OpenIPMI126The following functions use ipmiutil or OpenIPMI.- Final Action at Activation Failure / Deactivation Failure- Monitor resource action upon failure- User space monitor- Shutdown monitor- Forcibly stopping a physical machine- Chassis Identifyipmiutil and OpenIPMI do not come with <strong>ExpressCluster</strong>. You need to download andinstall the rpm packages <strong>for</strong> ipmiutil and OpenIPMI.<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Be<strong>for</strong>e installing <strong>ExpressCluster</strong>Users are responsible <strong>for</strong> making decisions and assuming responsibilities. NEC does notsupport or assume any responsibilities <strong>for</strong>:- Inquires about ipmiutil and OpenIPMI themselves.- Tested operation of ipmiutil and OpenIPMI- Malfunction of ipmiutil and OpenIPMI or error caused by such malfunction.- Inquiries about whether or not ipmiutil and OpenIPMI are supported by servers.Check whether or not your server (hardware) supports ipmiutil and OpenIPMI inadvance.Note that even if the machine complies with ipmi standard as hardware, ipmiutil andOpenIPMI may not run if you actually try to run them.If you are using a software program <strong>for</strong> server monitoring provided by a server vendor, donot choose ipmi as a monitoring method <strong>for</strong> user space monitor resource and shutdown stallmonitor. Because these software programs <strong>for</strong> server monitoring and ipmiutil both useBMC (Baseboard Management Controller) on the server, a conflict occurs preventingsuccessful monitoring.User mode monitor resource (monitoring method: softdog)When softdog is selected as a monitoring method, make sure to set heartbeat that comeswith OS not to start.When it sets softdog in a monitor method in SUSE LINUX 10/11, it is impossible to usewith an i8xx_tco driver. When an i8xx_tco driver is unnecessary, make it the setting thati8xx_tco is not loaded.For Red Hat Enterprise <strong>Linux</strong> 6, when softdog is selected as a monitoring method,softdog cannot be used together with the iTCO_WDT dirver. If the iTCO_WDT driver isnot used, specify not to load iTCO_WDT.Log collectionThe designated function of the generation of the syslog does not work by a log collectionfunction in SUSE LINUX 10/11. The reason is because the suffixes of the syslog aredifferent.Please change setting of rotate of the syslog as follows to use the appointment of thegeneration of the syslog of the log collection function.Please comment out “compress” and “date ext” of the /etc/logrotate.d/syslogfile.nsupdate and nslookupThe following functions use nsupdate and nslookup.- Dynamic DNS resource of group resource (ddns)- Dynamic DNS monitor resource of monitor resource (ddnsw)<strong>ExpressCluster</strong> does not include nsupdate and nslookup. There<strong>for</strong>e, install the rmp filesof nsupdate and nslookup, in addition to the <strong>ExpressCluster</strong> installation.NEC does not support the items below regarding nsupdate and nslookup. Usensupdate and nslookup at your own risk.- Inquiries about nsupdate and nslookup- Guaranteed operations of nsupdate and nslookupSection II Installing <strong>ExpressCluster</strong>127


Chapter 5 Notes and Restrictions- Malfunction of nsupdate or nslookup or failure caused by such a malfunction- Inquiries about support of nsupdate and nslookup on each serverFTP monitor resourcesIf a banner message to be registered to the FTP server or a message to be displayed atconnection is long or consists of multiple lines, a monitor error may occur. When monitoringby the FTP monitor resource, do not register a banner message or connection message.128<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Notes when creating <strong>ExpressCluster</strong> configuration dataNotes when creating <strong>ExpressCluster</strong> configurationdataNotes when creating a cluster configuration data and be<strong>for</strong>e configuring a cluster system isdescribed in this section.Environment variableThe following processes cannot be executed in an environment in which more than 255environment variables are set. When using the following function of resource, set the number ofenvironmental variables less than 256. Group start/stop processNote:Start/Stop script executed by EXEC resource when activating/deactivatingScript executed by Custom monitor Resource when monitoringScript be<strong>for</strong>e final action after the group resource or the monitor resource error is detected.The total number of environment variables set in the system and <strong>ExpressCluster</strong>must be less than 256. About 30 environment variables are set in <strong>ExpressCluster</strong>.Force stop function, chassis identify lamp linkageWhen using <strong>for</strong>ced stop function or chassis identify lamp linkage, settings of BMC IP address, username and password of each server are necessary. Use definitely the user name to which thepassword is set.Server reset, server panic and power offWhen <strong>ExpressCluster</strong> per<strong>for</strong>ms “Server Reset”, “Server Panic,” or “Server power off”, servers arenot shut down normally. There<strong>for</strong>e, the following may occur. Damage to a mounted file systemLost of unsaved dataSuspension of OS dump collection“Server reset” or “Server panic” occurs in the following settings: Action at an error occurred when activating/inactivating group resources-Sysrq Panic-Keepalive Reset-Keepalive Panic-BMC Reset-BMC Power Off-BMC Power Cycle-BMC NMISection II Installing <strong>ExpressCluster</strong>Final action at detection of an error in monitor resource-Sysrq Panic-Keepalive Reset-Keepalive Panic-BMC Reset129


Chapter 5 Notes and Restrictions-BMC Power Off-BMC Power Cycle-BMC NMIAction at detection of user space monitor timeout- Monitoring method softdog- Monitoring method ipmi- Monitoring method keepaliveNote: “Server panic” can be set only when the monitoring method is “keepalive.”Shutdown stall mentoring- Monitoring method softdog- Monitoring method ipmi- Monitoring method keepaliveNote: “Server panic” can be set only when the monitoring method is “keepalive.” Operation of Forced Stop-BMC reset-BMC power off-BMC cycle-BMC NMI-VMware vSphere power offFinal action <strong>for</strong> group resource deactivation errorIf you select No Operation as the final action when a deactivation error is detected, the group doesnot stop but remains in the deactivation error status. Make sure not to set No Operation in theproduction environment.130<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Notes when creating <strong>ExpressCluster</strong> configuration dataVerifying raw device <strong>for</strong> VxVMCheck the raw device of the volume raw device in advance:1. Import all disk groups which can be activated on one server and activate all volumes be<strong>for</strong>einstalling <strong>ExpressCluster</strong>.2. Run the command below:# raw -qa/dev/raw/raw2: bound to major 199, minor 2/dev/raw/raw3: bound to major 199, minor 3(A)(B)Example: Assuming the disk group name and volume name are:• Disk group name: dg1• Volume name under dg1: vol1, vol23. Run the command below:# ls -l /dev/vx/dsk/dg1/brw------- 1 root root 199, 2 May 15 22:13 vol1brw------- 1 root root 199, 3 May 15 22:13 vol2(C)4. Confirm that major and minor numbers are identical between (B) and (C).Never use these raw devices (A) as an <strong>ExpressCluster</strong> disk heartbeat resource, raw resource, rawmonitor resource, disk resource <strong>for</strong> which the disk type is not VxVM, or disk monitor resource <strong>for</strong>which the monitor method is not READ(VxVM).Selecting mirror disk file systemFollowing is the currently supported file systems:ext3ext4xfsreiserfsjfsvxfsext4 operations are not proved <strong>for</strong> operating systems other than Red Hat Enterprise <strong>Linux</strong> 6.Selecting hybrid disk file systemThe following are the currently supported file systems:ext3ext4reiserfsSection II Installing <strong>ExpressCluster</strong>131


Chapter 5 Notes and RestrictionsSetting of mirror or hybrid disk resource actionIn a system that uses mirror or hybrid disks, do not set the monitoring resources final action to Stopthe cluster service.If only the cluster service is stopped while the mirror agent is active, hybrid disk control orcollecting mirror disk status may fail.Time to start a single serve when many mirror disks are defined.If many mirror disk resources are defined and a short time is set to Time to wait <strong>for</strong> the otherservers to start up, it may take time to start a mirror agent and mirror disk resources and monitorresources related to mirror disks may not start properly when a single server is started.If such an event occurs when starting a single server, change the value set to the time to wait <strong>for</strong>synchronization to a large value (by selecting Cluster Properties - Timeout tab - Server SyncWait Time).RAW monitoring of disk monitor resourcesWhen raw monitoring of disk monitor resources is set up, partitions cannot be monitored ifthey have been or will possibly be mounted. These partitions cannot be monitored even if youset device name to “whole device” (device indicating the entire disks).Allocate a partition dedicated to monitoring and set up the partition to use the raw monitoringof disk monitor resources.Delay warning rateIf the delay warning rate is set to 0 or 100, the following can be achieved: When 0 is set to the delay monitoring rateAn alert <strong>for</strong> the delay warning is issued at every monitoring.By using this feature, you can calculate the polling time <strong>for</strong> the monitor resource at the timethe server is heavily loaded, which will allow you to determine the time <strong>for</strong> monitoringtime-out of a monitor resource.When 100 is set to the delay monitoring rateThe delay warning will not be issued.Be sure not to set a low value, such as 0%, except <strong>for</strong> a test operation.Disk monitor resource (monitoring method TUR)You cannot use the TUR methods on a disk or disk interface (HBA) that does not support theTest Unit Ready (TUR) and SG_IO commands of SCSI. Even if your hardware supportsthese commands, consult the driver specifications because the driver may not support them.S-ATA disk interface may be recognized as IDE disk interface (hd) or SCSI disk interface(sd) by OS depending on disk controller type and distribution. When it is recognized as IDEinterface, all TUR methods cannot be used. If it is recognized as SCSI disk interface, TUR(legacy) can be used. Note that TUR (generic) cannot be used.TUR methods burdens OS and disk load less compared to Read methods.In some cases, TUR methods may not be able to detect errors in I/O to the actual media.132<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Notes when creating <strong>ExpressCluster</strong> configuration dataWebManager reload intervalDo not set the “Reload Interval” in the WebManager tab <strong>for</strong> less than 30 seconds.LAN heartbeat settingsAs a minimum, you need to set either the LAN heartbeat resource or kernel mode LANheartbeat resource.You need to set at least one LAN heartbeat resource. It is recommended to set two or moreLAN heartbeat resources.It is recommended to set both LAN heartbeat resource and kernel mode LAN heartbeatresource together.Kernel mode LAN heartbeat resource settingsAs a minimum, you need to set either the LAN heartbeat resource or kernel mode LANheartbeat resource.It is recommended to use kernel mode LAN heartbeat resource <strong>for</strong> distribution kernel ofwhich kernel mode LAN heartbeat can be used.COM heartbeat resource settingsIt is recommended to use a COM heartbeat resource if your environments allows. This isbecause using COM heartbeat resource prevents activating both systems when the network isdisconnected.BMC heartbeat settingsThe hardware and firmware of the BMC must support BMC heartbeat. For available BMCs,see Chapter 3, "Servers supporting BMC-related functions" in the <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>.IP address <strong>for</strong> Integrated WebManager settingsPublic LAN IP address setting, <strong>ExpressCluster</strong> X2.1 or be<strong>for</strong>e, is available in the Builder atIP address <strong>for</strong> Integrated WebManger which is on the WebManager tab of ClusterProperties.Double-byte character set that can be used in script commentsScripts edited in <strong>Linux</strong> environment are dealt as EUC code, and scripts edited in Windowsenvironment are dealt as Shift-JIS code. In case that other character codes are used, charactercorruption may occur depending on environment.Failover exclusive attribute of virtual machine groupWhen setting virtual machine group, do not set Normal or Absolute to Failover exclusiveattribute.Section II Installing <strong>ExpressCluster</strong>133


Chapter 5 Notes and RestrictionsSystem monitor resource settingsPattern of detection by resource monitoringThe System Resource Agent detects by using thresholds and monitoring duration time asparameters.The System Resource Agent collects the data (number of opened files, number of userprocesses, number of threads, used size of memory, CPU usage rate, and used size of virtualmemory) on individual system resources continuously, and detects errors when data keepsexceeding a threshold <strong>for</strong> a certain time (specified as the duration time).Message receive monitor resource settingsError notification to message receive monitor resources can be done in any of three ways:using the clprexec command, BMC linkage, or linkage with the server managementinfrastructure.To use the clprexec command, use the relevant file stored on the <strong>ExpressCluster</strong> CD. Usethis method according to the OS and architecture of the notification-source server. Thenotification-source server must be able to communicate with the notification-destinationserver.To use BMC linkage, the BMC hardware and firmware must support the linkage function.For available BMCs, see "Hardware Servers supporting BMC-related functions" on page 53in Chapter 3, "Installation requirements <strong>for</strong> <strong>ExpressCluster</strong>" in this guide. This methodrequires communication between the IP address <strong>for</strong> management of the BMC and the IPaddress of the OS.For the linkage with the server management infrastructure, see Chapter 9, "Linkage withServer Management Infrastructure" in the Reference <strong>Guide</strong>.134<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Notes when creating <strong>ExpressCluster</strong> configuration dataJVM monitor resource settingsWhen the monitoring target is the WebLogic Server, the maximum values of the followingJVM monitor resource settings may be limited due to the system environment (including theamount of installed memory):• The number under Monitor the requests in Work Manager• Average under Monitor the requests in Work Manager• The number of Waiting Requests under Monitor the requests in Thread Pool• Average of Waiting Requests under Monitor the requests in Thread Pool• The number of Executing Requests under Monitor the requests in Thread Pool• Average of Executing Requests under Monitor the requests in Thread PoolWhen the monitoring-target is a 64-bit JRockit JVM, the following parameters cannot bemonitored because the maximum amount of memory acquired from the JRockit JVM is anegative value that disables the calculation of the memory usage rate:• Total Usage under Monitor Heap Memory Rate• Nursery Space under Monitor Heap Memory Rate• Old Space under Monitor Heap Memory Rate• Total Usage under Monitor Non-Heap Memory Rate• ClassMemory under Monitor Non-Heap Memory RateTo use the Java Resource Agent, install the Java runtime environment (JRE) described in"Operation environment <strong>for</strong> JVM monitor" in Chapter 3, "Installation requirements <strong>for</strong><strong>ExpressCluster</strong>" You can use either the same JRE as that used by the monitoring target(WebLogic Server or WebOTX) or a different JRE.The monitor resource name must not include a blank.Section II Installing <strong>ExpressCluster</strong>135


Chapter 5 Notes and RestrictionsAfter start operating <strong>ExpressCluster</strong>Notes on situations you may encounter after start operating <strong>ExpressCluster</strong> are described in thissection.Error message in the load of the mirror driver in an environmentsuch as udevIn the load of the mirror driver in an environment such as udev, logs like the following may berecorded into the message file:kernel: [I] NMP1 device does not exist.(liscal_make_request)kernel: [I] - This message can berecorded on udev environment when liscal is initializing NMPx.kernel: [I] - Ignore this and followingmessages 'Buffer I/O error on device NMPx' on udev environment.kernel: Buffer I/O error on device NMP1, logical block 0kernel: NMP1 device does not exist.kernel: Buffer I/O error on device NMP1, logical block 112This phenomenon is not abnormal.When you want to prevent the output of the error message in the udev environment, add thefollowing file in /etc/udev/rules.d.filename: 50-liscal-udev.rulesACTION=="add", DEVPATH=="/block/NMP*", OPTIONS+="ignore_device"ACTION=="add", DEVPATH=="/devices/virtual/block/NMP*", OPTIONS+="ignore_device"File operating utility on X-WindowSome of the file operating utilities (coping and moving files and directories via GUI) onX-Window per<strong>for</strong>m the following: Checks if the block device is usable.Mounts the file system if there is any that can be mounted.Make sure not to use file operating utility that per<strong>for</strong>m above operations. They may cause problemto the operation of <strong>ExpressCluster</strong>.136<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


After start operating <strong>ExpressCluster</strong>Messages displayed when loading a driverWhen loading a mirror driver, messages like the following may be displayed at the console and/orsyslog. However, this is not an error.kernel: liscal: no version <strong>for</strong> "xxxxx" found: kernel tainted.kernel: liscal: module license 'unspecified' taints kernel.(Any character strings are set to xxxxx.)And also, when loading the clpka or clpkhb driver, messages like the following may bedisplayed on the console and/or syslog. However, this is not an error.kernel: clpkhb: no version <strong>for</strong> "xxxxx" found: kernel tainted.kernel: clpkhb: module license 'unspecified' taints kernel.kernel: clpka: no version <strong>for</strong> "xxxxx" found: kernel tainted.kernel: clpka: module license 'unspecified' taints kernel.(Any character strings are input into xxxxx.)Messages displayed <strong>for</strong> the first I/O to mirror disk resources orhybrid disk resourcesWhen reading/writing data from/to a mirror disk resource or hybrid disk resource <strong>for</strong> the first timeafter the resource was mounted, a message like the following may be displayed at the consoleand/or syslog. However, this is not an error.kernel: JBD: barrier-based sync failed on NMPx – disabling barriers(Any character strings are set to x.)Section II Installing <strong>ExpressCluster</strong>137


Chapter 5 Notes and RestrictionsMessages written to syslog when multiple mirror disk resources orhybrid disk resources are usedWhen more than two mirror disk resources or hybrid disk resources are configured on a cluster, thefollowing messages may be written to the OS message files when the resources are activated.This occurs by a fsck command function (function to access a device block which is not a targetof fsck) on some distributions.kernel: [I] NMPx I/O port has beenclosed, mount(0), io(0).kernel: [I] - This message can berecorded by fsck command when NMPx becomes active.kernel: [I] - This message can berecorded on hotplug service starting when NMPx is not active.kernel: [I] - Ignore this and followingmessages 'Buffer I/O error on device NMPx' on such environment.kernel: Buffer I/O error on device /dev/NMPx, logical block xxxxkernel: NMPx I/O port is close,mount(0), io(0).kernel: Buffer I/O error on device /dev/NMPx, logical block xxxxIPMI messageThis is not a problem <strong>for</strong> <strong>ExpressCluster</strong>. If this causes any problem such as heavy use of messagefiles, change the following settings of mirror disk resources or hybrid disk resources.- Select “Not Execute” on “fsck action be<strong>for</strong>e mount”- Select “Execute” on “fsck Action When Mount Failed”When you are using ipmi <strong>for</strong> user mode monitor resources, the following kernel module warninglog is recorded many times in the syslog.modprobe: modprobe: Can`t locate module char-major-10-173When you want to prevent this log from being recorded, rename /dev/ipmikcs.138<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


After start operating <strong>ExpressCluster</strong>Limitations during the recovery operationDo not control the following commands, clusters and groups by the WebManager while recoveryprocessing is changing (reactivation → failover → last operation), if a group resource is specifiedas a recovery target and when a monitor resource detects an error. Stop and suspend of a clusterStart, stop, moving of a groupIf these operations are controlled at the transition to recovering due to an error detected by amonitor resource, the other group resources in the group may not be stopped.Even if a monitor resource detects an error, it is possible to control the operations above after thelast operation is per<strong>for</strong>med.Executable <strong>for</strong>mat file and script file not described in manualsExecutable <strong>for</strong>mat files and script files which are not described in Chapter 4, ”<strong>ExpressCluster</strong>command reference” in the Reference <strong>Guide</strong> exist under the installation directory. Do not run thesefiles on any system other than <strong>ExpressCluster</strong>. The consequences of running these files will not besupported.Message of kernel page allocation errorWhen using the Replicator on the Turbo<strong>Linux</strong> 10 Server, the following message may be recordedin syslog. However, it may not be recorded depending on the physical memory size and I/O load.kernel: [kernel Module Name]: page allocation failure. order:X,mode:0xXXWhen this message is recorded, you need to change the kernel parameter described below. Byusing the sysctl command, make the settings to change the parameter when starting OS./proc/sys/vm/min_free_kbytesThe maximum value that can be set to min_free_kbytes is different depending on thephysical memory size installed on the server. Make the settings by referring to the table below:Physical memory size(Mbyte)Maximum value(Mbyte)1024 10242048 14484096 20488192 289616384 4096Section II Installing <strong>ExpressCluster</strong>139


Chapter 5 Notes and RestrictionsCache swell by a massive I/OIn case that a massive amount of write over the disk capability to the mirror disk resource orthe hybrid disk resource are executed, even though the mirror connection is alive, the controlfrom write may not return or memory allocation failure may occur.In case that a massive amount of I/O requests over transaction per<strong>for</strong>mance exist, and then thefile system ensure a massive amount of cache and the cache or the memory <strong>for</strong> the user space(HIGHMEM zone) are insufficient, the memory <strong>for</strong> the kernel space (NORMAL zone) maybe used.Change the settings so that the parameter will be changed at OS startup by using sysctl orother commands./proc/sys/vm/lomem_reserve_ratioIn case that a massive amount of accesses to the mirror disk resource or the hybrid diskresource are executed, it may take much time be<strong>for</strong>e the cache of the file systems is writtenout to the disks when unmounting at disk resource deactivation.If, at this moment, the unmounting times out be<strong>for</strong>e the writing from the file system to thedisks is completed, I/O error messages or unmount failure messages like those shown belowmay be recorded.In this case, change the unmount timeout length <strong>for</strong> the disk resource in question to anadequate value such that the writing to the disk will be normally completed.Example 1:clusterpro: [I] Stopping mdx resource hasstarted.kernel: [I] NMPx close I/O port OK.kernel: [I] NMPx close mount port OK.kernel: [I] NMPx I/O port has been closed,mount(0), io(0).kernel: [I] - This message can be recordedon hotplug service starting when NMPx is not active.kernel: [I] - This message can be recordedby fsck command when NMPx becomes active.kernel: [I] - Ignore this and followingmessages 'Buffer I/O error on device NMPx' on such environment.kernel: Buffer I/O error on device NMPx, logical block xxxxkernel: [I] NMPx I/O port has been closed,mount(0), io(0).kernel: Buffer I/O error on device NMPx, logical block xxxx:140<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


After start operating <strong>ExpressCluster</strong>Example 2:clusterpro: [I] Stopping mdx resource hasstarted.kernel: [I] NMPx holder 1. (be<strong>for</strong>e umount)clusterpro: [E] umount timeout. Make sure thatthe length of Unmount Timeout is appropriate. (Device:mdx):clusterpro: [E] Failed to deactivate mirrordisk. Umount operation failed.(Device:mdx)kernel: [I] NMPx holder 1. (after umount)clusterpro: [E] Stopping mdx resource hasfailed.(83 : System command timeout (umount, timeout=xxx)):When multiple mounts are specified <strong>for</strong> a resource like a mirror diskresource If, after activation of a mirror disk resource or hybrid disk resource, you have created anadditional mount point in a different location by using the mount command <strong>for</strong> the mirrorpartition device (/dev/NMPx) or the mount point (or a part of the file hierarchy <strong>for</strong> the mountpoint), you must unmount that additional mount point be<strong>for</strong>e the disk resource is deactivated.If the deactivation is per<strong>for</strong>med without the additional mount point being unmounted, the filesystem data remaining in memory may not be completely written out to the disks. As a result,the I/O to the disks is closed and the deactivation is completed although the data on the disksare incomplete.Because the file system will still try to continue writing to the disks even after the deactivationis completed, I/O error messages like those shown below may be recorded.After this, an attempt to stop the mirror agent, such as when stopping the server, will fail, sincethe mirror driver cannot be terminated. This may cause the server to restart.Example:clusterpro: [I] Stopping mdx resource hasstarted.kernel: [I] NMP1 holder 1. (be<strong>for</strong>e umount)kernel: [I] NMP1 holder 1. (after umount)kernel: [I] NMPx close I/O port OK.kernel: [I] NMPx close mount port OK.clusterpro: [I] Stopping mdx resource hascompleted.kernel: [I] NMPx I/O port has been closed,mount(0), io(0).kernel: [I] - This message can be recordedon hotplug service starting when NMPx is not active.kernel: [I] - This message can be recordedby fsck command when NMPx becomes active.Section II Installing <strong>ExpressCluster</strong>141


Chapter 5 Notes and Restrictionskernel: [I] - Ignore this and followingmessages 'Buffer I/O error on device NMPx' on such environment.kernel: Buffer I/O error on device NMPx, logical block xxxxxkernel: lost page write due to I/O error on NMPxkernel: [I] NMPx I/O port has been closed,mount(0), io(0).kernel: Buffer I/O error on device NMPx, logical block xxxxxkernel: lost page write due to I/O error on NMPx:Executing fsckWhen fsck is specified to execute at activation of disk resources, mirror disk resources, orhybrid disk resources, fsck is executed when an ext2/ext3/ext4 file system is mounted.Executing fsck may take times depending on the size, usage or status of the file system,resulting that an fsck timeout occurs and mounting the file system fails.This is because fsck is executed in either of the following ways.(a) Only per<strong>for</strong>ming simplified journal check.Executing fsck does not take times.(b) Checking consistency of the entire file system.When the data saved by OS has not been checked <strong>for</strong> 180 days or more or the data willbe checked after it is mounted around 30 times.In this case, executing fsck takes times depending the size or usage of the file system.Specify a time in safe <strong>for</strong> the fsck timeout of disk resources so that no timeout occurs.When fsck is specified not to execute at activation of disk resources, mirror disk resources,or hybrid disk resources, the warning described below may be displayed on the consoleand/or syslog when an ext2/ext3/ext4 file system is mounted more than the mount executioncount set to OS that it is recommended to execute fsck.EXT2-fs warning: xxxxx, running e2fsck is recommended.Note: There are multiple patterns displayed in xxxxx.It is recommended to execute fsck when this waning is displayed.Follow the steps below to manually execute fsck.Be sure to execute the following steps on the server where the disk resource in question hasbeen activated.(1) Deactivate a group to which the disk resource in question belongs by using a commandsuch as clpgrp.(2) Confirm that no disks have been mounted by using a command such as mount and df.(3) Change the state of the disk from Read Only to Read Write by executing one of thefollowing commands depending on the disk resource type.Example <strong>for</strong> disk resources: A device name is /dev/sbd5# clproset –w –d /dev/sbd5/dev/sbd5 : success142<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


After start operating <strong>ExpressCluster</strong>Example <strong>for</strong> mirror disk resources: A resource name is md1.# clpmdctrl –-active –nomount md1: active successfullyExample <strong>for</strong> hybrid disk resources: A resource name is hd1.# clphdctrl –-active –nomount hd1: active successfully(4) Execute fsck.(If you specify the device name <strong>for</strong> fsck execution in the case of a mirror disk resource orhybrid disk resource, specify the mirror partition device name (/dev/NMPx)corresponding to the resource.)(5) Change the state of the disk from Read Write to Read Only by executing one of thefollowing commands depending on the disk resource type.Example <strong>for</strong> disk resources: A device name is /dev/sbd5.# clproset –o –d /dev/sbd5/dev/sbd5 : successExample <strong>for</strong> mirror disk resources: A resource name is md1.# clpmdctrl –-deactive md1: active successfullyExample <strong>for</strong> hybrid disk resources: A resource name is hd1.# clphdctrl –-deactive –nomount hd1: active successfully(6) Activate a group to which the disk resource in question belongs by using a command suchas clpgrp.If you need to specify that the warning message is not output without executing fsck, <strong>for</strong>ext2/ext3/ext4, change the maximum mount count by using tune2fs. Be sure to execute thiscommand on the server where the disk resource in question has been activated.(1) Execute one of the following commands..Example <strong>for</strong> disk resources: A device name is /dev/sbd5.# tune2fs –c -l /dev/sbd5tune2fs 1.27 (8-Mar-2002)Setting maximal mount count to -1Example <strong>for</strong> mirror disk resources: A resource name is /dev/NMP1.# tune2fs –c -l /dev/NMP1tune2fs 1.27 (8-Mar-2002)Setting maximal mount count to -1Section II Installing <strong>ExpressCluster</strong>143


Chapter 5 Notes and RestrictionsExample <strong>for</strong> hybrid disk resources: A resource name is /dev/NMP1.# tune2fs –c -l /dev/NMP1tune2fs 1.27 (8-Mar-2002)Setting maximal mount count to -1(2) Confirm that the maximum mount count has been changed.Example: A device name is /dev/sbd5.# tune2fs -l /dev/sbd5tune2fs 1.27 (8-Mar-2002)Filesystem volume name: :Maximum mount count: -1:Messages when collecting logsWhen collecting logs, the message described below is displayed at the console, but this is not anerror. Logs are collected successfully.hd#: bad special flag: 0x03ip_tables: (C) 2000-2002 Netfilter core team(“hd#” is replaced with the device name of IDE.)kernel: Warning: /proc/ide/hd?/settings interface is obsolete, andwill be removed soon!Failover and activation during mirror recoveryWhen mirror recovery is in progress <strong>for</strong> a mirror disk resource or hybrid disk resource, a mirrordisk resource or hybrid disk resource placed in the deactivated state cannot be activated.During mirror recovery, a failover group including the disk resource in question cannot be moved.If a failover occurs during mirror recovery, it will fail.During mirror recovery, the copy destination server does not have the latest status, so a failover tothe copy destination server or copy destination server group will fail.Even if an attempt to fail over a hybrid disk resource to the same server group is made by actions<strong>for</strong> when a monitor resource detects an error, it will fail, too, since the current server is not changed.Note that, depending on the timing when mirror recovery is completed during a failover, move, oractivation, the operation may be successful.Cluster shutdown and reboot (mirror disk resource and hybrid diskresource)144When using a mirror disk resource or a hybrid disk resource, do not execute cluster shutdown orcluster shutdown reboot from the clpstdn command or the WebManager while a group is beingactivated.<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


After start operating <strong>ExpressCluster</strong>A group cannot be deactivated while a group is being activated. There<strong>for</strong>e, OS may be shut downin the state that mirror disk resource or hybrid disk resources is not deactivated successfully and amirror break may occur.Shutdown and reboot of individual server (mirror disk resource andhybrid disk resource)When using a mirror disk and a hybrid disk resource, do not shut down the server or run theshutdown reboot command from the clpdown command or the WebManager while activating thegroup.A group cannot be deactivated while a group is being activated. There<strong>for</strong>e, OS may be shut downand a mirror break may occur in the state that mirror disk resources and hybrid disk resources arenot deactivated successfully.Scripts <strong>for</strong> starting/stopping <strong>ExpressCluster</strong> servicesErrors occur in starting/stopping scripts as follows: After installing <strong>ExpressCluster</strong> (For SUSE <strong>Linux</strong>)When a server shutdown, the error occurs in the following stopping scripts. There is noproblem <strong>for</strong> the error because services have not started.• clusterpro_alertsync• clusterpro_webmgr• clusterpro• clusterpro_md• clusterpro_trn• clusterpro_evtBe<strong>for</strong>e start operating <strong>ExpressCluster</strong>When a server start up, the error occurs in the following starting scripts. There is no problem<strong>for</strong> the error because cluster configuration data has not uploaded.• clusterpro_mdAfter start operating <strong>ExpressCluster</strong> (For SUSE <strong>Linux</strong>)When mirror disk resources and hybrid disk resources are not used, the error occurs instopping scripts at OS shutdown. There is no problem <strong>for</strong> the error because mirror agent hasnot started.• clusterpro_mdOS shutdown after stopping services manually (Fro SUSE <strong>Linux</strong>)After stopping services manually, the error occurs in the following stopping scripts at OSshutdown. There is no problem <strong>for</strong> the error because services have already stopped.• clusterpro• clusterpro_mdAt following case, the script to terminate <strong>ExpressCluster</strong> services may be executed in the wrongorder.<strong>ExpressCluster</strong> services may be terminated in the wrong order at OS shutdown if all of<strong>ExpressCluster</strong> services are disabled. This problem is caused by failure in terminationprocess <strong>for</strong> the service has been already disabled.Section II Installing <strong>ExpressCluster</strong>145


Chapter 5 Notes and RestrictionsService startup timeAs long as the system shutdown is executed by WebManger or clpstdn command, there isno problem even if the services is terminated in the wrong order. But, any other problem maynot be happened by wrong order termination.<strong>ExpressCluster</strong> services might take a while to start up, depending on the wait processing at startup. clusterpro_evtServers other than the master server wait up to two minutes <strong>for</strong> configuration data to bedownloaded from the master server. Downloading usually finishes within several seconds ifthe master server is already operating. The master server does not have this wait process. clusterpro_trnThere is no wait process. This process usually finishes within several seconds. clusterpro_mdThis service starts up only when the mirror or hybrid disk resources exist. The system waitsup to one minute <strong>for</strong> the mirror agent to normally start up. This process usually finisheswithin several seconds. clusterproAlthough there is no wait process, <strong>ExpressCluster</strong> might take several tens of seconds to startup. This process usually finishes within several seconds. clusterpro_webmgrThere is no wait process. This process usually finishes within several seconds. clusterpro_alertsyncThere is no wait process. This process usually finishes within several seconds.In addition, the system waits <strong>for</strong> cluster activation synchronization after the <strong>ExpressCluster</strong>daemon is started. By default, this wait time is five minutes.For details, see Chapter 10, “The system maintenance in<strong>for</strong>mation” in the Reference <strong>Guide</strong>.Scripts in EXEC resourcesEXEC resource scripts of group resources stored in the following location./opt/nec/clusterpro/scripts/group-name/resource-name/The following cases, old EXEC resource scripts are not deleted automatically.• When the EXEC resource is deleted or renamed• When a group that belongs to the EXEC resource is deleted or renamedOld EXEC resource scripts can be deleted when unnecessary.Monitor resources that monitoring timing is “Active”When monitor resources that monitoring timing is “Active” have suspended and resumed, thefollowing restriction apply: In case stopping target resource after suspending monitor resource, monitor resourcebecomes suspended. As a result, monitoring restart cannot be executed.In case stopping or starting target resource after suspending monitor resource, monitoring bymonitor resource starts when target resource starts.146<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


After start operating <strong>ExpressCluster</strong>Notes on the WebManagerThe in<strong>for</strong>mation displayed on the WebManager does not necessarily show the latest status. Ifyou want to get the latest in<strong>for</strong>mation, click the Reload button.If the problems such as server shutdown occur while the WebManager is getting thein<strong>for</strong>mation, acquiring in<strong>for</strong>mation may fail and a part of object may not be displayedcorrectly. Wait <strong>for</strong> the next automatic update or click the Reload button to reacquire thelatest in<strong>for</strong>mation.When using a browser on <strong>Linux</strong>, a dialog box may be displayed behind the window managersdepending on the combination of the managers. Change the window by pressing the ALT +TAB keys.Collecting logs of <strong>ExpressCluster</strong> cannot be executed from two or more WebManagersimultaneously.If the WebManager is operated in the state that it cannot communicate with the connectiondestination, it may take a while until the control returns.If you move the cursor out of the browser in the state that the mouse pointer is displayed as awristwatch or hourglass, the cursor may be back to an arrow.When going through the proxy server, make the settings <strong>for</strong> the proxy server be able to relaythe port number of the WebManager.When going through the reverse proxy server, the WebManager will not operate properly.When updating <strong>ExpressCluster</strong>, close all running browsers. Clear the Java cache (notbrowser cache) and open browsers.When updating Java, close all running browsers. Clear the Java cache (not browser cache)and open browsersNotes on the Builder (Config mode of Cluster Manager)<strong>ExpressCluster</strong> does not have the compatibility of the cluster configuration data with thefollowing products.• Builder <strong>for</strong> <strong>Linux</strong> other than <strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong>Cluster configuration data created using a later version of this product cannot be used withthis product.Cluster configuration data of <strong>ExpressCluster</strong> X1.0/2.0/2.1/3.0/<strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> can be used withthis product.You can use such data by clicking Import from the File menu in the Builder.Closing the Web browser (by clicking Exit from the menu), the dialog box to confirm to saveis displayed.When you continue to edit, click the Cancel button.Section II Installing <strong>ExpressCluster</strong>147


Chapter 5 Notes and RestrictionsNote:This dialog box is not displayed if JavaScript is disabled.Reloading the Web browser (by selecting Refresh button from the menu or tool bar), thedialog box to confirm to save is displayed.When you continue to edit, click the Cancel button.Note:This dialog box is not displayed if JavaScript is disabled.When creating the cluster configuration data using the Builder, do not enter the value startingwith 0 on the text box. For example, if you want to set 10 seconds <strong>for</strong> a timeout value, enter“10” but not “010.”When going through the reverse proxy server, the Builder will not operate properly.Changing the partition size of mirror disks and hybrid diskresourcesWhen changing the size of mirror partitions after the operation is started, see “Changing offset orsize of a partition on mirror disk resource” in Chapter 10 “The system maintenance in<strong>for</strong>mation” inthe Reference <strong>Guide</strong>.Changing kernel dump settingsIf you are changing the kdump settings and "applying" them through "kernel dump configuration"(system-config-kdump) while the cluster is running on Red Hat Enterprise <strong>Linux</strong> 6 or the like, youmay see the following error message output.In this case, stop the cluster once (stop the mirror agent as well as the cluster when using a mirrordisk resource or hybrid disk resource), and then retry the kernel dump configuration.* The following {driver_name} indicates clpka, clpkhb, or liscal.No module {driver_name} found <strong>for</strong> kernel {kernel_version}, abortingNotes on floating IP and virtual IP resourcesDo not execute a network restart on a server on which floating IP resources are active. If thenetwork is restarted, any IP addresses that have been added as floating IP resources aredeleted.148<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


After start operating <strong>ExpressCluster</strong>Notes on system monitor resourcesTo change a setting, the cluster must be suspended.System monitor resources do not support a delay warning <strong>for</strong> monitor resources.If the date or time setting on the OS is changed by the data(1) command or anothermethod while a system monitor resource is operating, that system monitor resource may failto operate normally.If you have changed the date or time setting on the OS, suspend and then resume the cluster.Set SE<strong>Linux</strong> to either the permissive or disabled state.If SE<strong>Linux</strong> is set to the en<strong>for</strong>cing state, the communication required <strong>for</strong> <strong>ExpressCluster</strong> maybe disabled.If the “system monitor” is not displayed in the Type field of the monitor resource definitiondialog box, update the server in<strong>for</strong>mation by selecting Update Server Data from the Filemenu in the Builder.Up to 64 disks that can be monitored by the disk resource monitor function at the same time.Notes on JVM monitor resourcesWhen restarting the monitoring-target Java VM, suspend or shut down the cluster be<strong>for</strong>erestarting the Java VM.To change a setting, the cluster must be suspended.JVM monitor resources do not support a delay warning <strong>for</strong> monitor resources.Notes on final action (group stop) at detection of a monitorresource errorWhen the final action (group stop) has been executed, suspend and resume the cluster, orrestart the cluster <strong>for</strong> the server.If the group is started on the server on which the final action (group stop) has been executed,the recovery action <strong>for</strong> the group from the monitor resource will not be executed.Section II Installing <strong>ExpressCluster</strong>149


Chapter 5 Notes and RestrictionsNotes when changing the <strong>ExpressCluster</strong>configurationThe section describes what happens when the configuration is changed after starting to use<strong>ExpressCluster</strong> in the cluster configuration.Failover exclusive attribute of group propertiesWhen the failover exclusive attribute is changed, the change is reflected by suspending andresuming the cluster.If the failover exclusive attribute is changed from No exclusion or Normal to Absolute, multiplegroups of Absolute may be started on the same server depending on the group startup status be<strong>for</strong>esuspending the cluster.Exclusive control will be per<strong>for</strong>med at the next group startup.Dependency between resource propertiesWhen the dependency between resources has been changed, the change is applied by suspendingand resuming the cluster.If a change in the dependency between resources that requires the resources to be stopped duringapplication is made, the startup status of the resources after the resume may not reflect the changeddependency.Dependency control will be per<strong>for</strong>med at the next group startup.150<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Updating <strong>ExpressCluster</strong>Updating <strong>ExpressCluster</strong>This section describes notes on updating <strong>ExpressCluster</strong> after starting cluster operation.If the alert destination setting is changedIf the alert destination setting is changed in the previous version, per<strong>for</strong>m the following procedureafter updating <strong>ExpressCluster</strong>. This procedure applies to an update from X2.0.0-1-X3.0.0-1 toX<strong>3.1</strong>.0-1-X<strong>3.1</strong>.5-1.1. Connect the WebManager to one server constituting the cluster.2. Start the online version Builder from the connected WebManager. If this is the first time tostart the online version Builder, you need to configure the Java user policy file. For details,refer to the Installation and Configuration <strong>Guide</strong>.3. Open the Alert Service tab of Cluster Properties, and then click the Edit button <strong>for</strong> EnableAlert Setting to open the Change Alert Destination dialog box.4. Click the OK button to close the Change Alert Destination dialog box.5. Click the OK button to close Cluster Properties.6. Make sure that the server in the cluster is running, and then upload the configurationin<strong>for</strong>mation from the online version Builder. For details on how to operate the online versionBuilder, refer to the Reference <strong>Guide</strong>.Section II Installing <strong>ExpressCluster</strong>151


Chapter 6Upgrading <strong>ExpressCluster</strong>This chapter provides in<strong>for</strong>mation on how to upgrade <strong>ExpressCluster</strong>.This chapter covers:• How to update from <strong>ExpressCluster</strong> X 2.0 or 2.1·················································································· 154• How to upgrade from X2.0 or X2.1 to X3.0 or X<strong>3.1</strong>············································································· 154Linkage In<strong>for</strong>mation:For the update from X3.0 to X<strong>3.1</strong>, see ”Update <strong>Guide</strong>”.153


Chapter 6 Upgrading <strong>ExpressCluster</strong>How to update from <strong>ExpressCluster</strong> X 2.0 or 2.1How to upgrade from X2.0 or X2.1 to X3.0 or X<strong>3.1</strong>Install the <strong>ExpressCluster</strong> Server RPM as root user.1. Disable the services by running the chkconfig --del name in the following order on allthe servers. Specify one of the following services in name.clusterpro_alertsyncclusterpro_webmgrclusterproclusterpro_mdclusterpro_trnclusterpro_evt2. Shut down and reboot the cluster by using WebManager or the clpstdn command.3. Mount the installation CD-ROM media.4. Confirm that <strong>ExpressCluster</strong> services are not running, and then install the package file byexecuting the rpm command. The RPM <strong>for</strong> installation is different depending on architecture.In the CD-ROM, move to /<strong>Linux</strong>/<strong>3.1</strong>/en/server and run the following:rpm –Uvh expresscls-..rpmFor architecture, there are i686, x86_64 and IBM POWER. Select architecture according tothe system requirements of the machine where <strong>ExpressCluster</strong> is installed. Architecture canbe verified by the arch command.<strong>ExpressCluster</strong> is installed in the following directory. Note that if you change this directoryyou cannot uninstall <strong>ExpressCluster</strong>.Installation directory: /opt/nec/clusterpro5. After completing installation, unmount the installation CD-ROM media, and remove it.6. Enable the services by running the chkconfig --add name in the following order.Specify one of the following services in name. For SUSE <strong>Linux</strong>, run the command with the–<strong>for</strong>ce option.clusterpro_evtclusterpro_trnclusterpro_webmgrclusterpro_alertsync7. Repeat the steps 3 to 6 on all the servers.8. Reboot all the servers that constitute the cluster.9. Register the license. For details on registering license, see “Chapter 4 Registering the license”in the Installation and Configuration <strong>Guide</strong>.10. Connect the WebManager to one of the servers of the cluster.11. Start the Builder from the connected WebManager. For details on how to start the onlineBuilder, see the Installation and Configuration <strong>Guide</strong>.12. Confirm that all servers of the cluster are started, and then upload the configuration data fromthe online Builder. For details on how to operate the online Builder, see the Reference <strong>Guide</strong>.154<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


How to update from <strong>ExpressCluster</strong> X 2.0 or 2.113. Enable the services in the following order by running the chkconfig --add namecommand. Specify the following services on name.clusterpro_mdclusterpro14. Per<strong>for</strong>m step 14 on all the servers.15. Run Restart Manager on the WebManager.16. Run Start Mirror Agent on the WebManager.17. Restart the browser connecting the WebManager.18. Run Start Cluster on the WebManager.Section II Installing <strong>ExpressCluster</strong>155


Appendix• Appendix A Glossary• Appendix B Index157


Appendix A. GlossaryCluster partitionInterconnectVirtual IP addressManagement clientStartup attributeShared diskShared disk type clusterSwitchable partitionCluster systemCluster shutdownActive serverSecondary serverStandby serverDisk heartbeat partitionData partitionNetwork partitionA partition on a mirror disk. Used <strong>for</strong> managing mirrordisks.(Related term: Disk heartbeat partition)A dedicated communication path <strong>for</strong> server-to-servercommunication in a cluster.(Related terms: Private LAN, Public LAN)IP address used to configure a remote cluster.Any machine that uses the WebManager to access andmanage a cluster system.A failover group attribute that determines whether afailover group should be started up automatically ormanually when a cluster is started.A disk that multiple servers can access.A cluster system that uses one or more shared disks.A disk partition connected to multiple computers and isswitchable among computers.(Related terms: Disk heartbeat partition)Multiple computers are connected via a LAN (or othernetwork) and behave as if it were a single system.To shut down an entire cluster system (all servers thatconfigure a cluster system).A server that is running <strong>for</strong> an application set.(Related term: Standby server)A destination server where a failover group fails over toduring normal operations.(Related term: Primary server)A server that is not an active server.(Related term: Active server)A partition used <strong>for</strong> heartbeat communication in a shareddisk type cluster.A local disk that can be used as a shared disk <strong>for</strong> switchablepartition. Data partition <strong>for</strong> mirror disks and hybrid disks.(Related term: Cluster partition)All heartbeat is lost and the network between servers ispartitioned.(Related terms: Interconnect, Heartbeat)159


Appendix A GlossaryNodeHeartbeatPublic LANFailoverFailbackFailover groupMoving failover groupFailover policyPrivate LANPrimary (server)Floating IP addressMaster serverMirror disk connectMirror disk type clusterA server that is part of a cluster in a cluster system. Innetworking terminology, it refers to devices, includingcomputers and routers, that can transmit, receive, or processsignals.Signals that servers in a cluster send to each other to detect afailure in a cluster.(Related terms: Interconnect, Network partition)A communication channel between clients and servers.(Related terms: Interconnect, Private LAN)The process of a standby server taking over the group ofresources that the active server previously was handling dueto error detection.A process of returning an application back to an activeserver after an application fails over to another server.A group of cluster resources and attributes required toexecute an application.Moving an application from an active server to a standbyserver by a user.A priority list of servers that a group can fail over to.LAN in which only servers configured in a clustered systemare connected.(Related terms: Interconnect, Public LAN)A server that is the main server <strong>for</strong> a failover group.(Related term: Secondary server)Clients can transparently switch one server from anotherwhen a failover occurs.Any unassigned IP address that has the same networkaddress that a cluster server belongs to can be used as afloating address.The server displayed at the top of Master Server in ServerCommon Properties of the BuilderLAN used <strong>for</strong> data mirroring in mirror disks and hybriddisks. Mirror connect can be used with primaryinterconnect.A cluster system that does not use a shared disk. Local disksof the servers are mirrored.160<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>


Appendix B. Indexalert destination setting, 151application monitoring, 33Applications supported, 63BMC heartbeat, 133browsers, 72, 74, 76, 77Builder, 72, 108, 147ABCclock synchronization, 125cluster object, 43Cluster shutdown and reboot, 144cluster system, 16COM heartbeat resource, 133communication port number, 122Config mode of Cluster Manager, 147Corrected in<strong>for</strong>mation, 86Ddata consistency, 116delay warning rate, 132dependency, 150dependent driver, 120dependent library, 120detectable and non-detectable errors, 33, 34disk interfaces, 52disk size, 77distribution, 54Enhanced functions, 80Environment variable, 129error detection, 15, 20executable <strong>for</strong>mat file, 139Execute Script be<strong>for</strong>e Final Action setting <strong>for</strong> monitorresource recovery action, 113<strong>ExpressCluster</strong>, 29, 30Efailover, 23, 29, 35, 36, 144failover exclusive attribute, 150failover resources, 37failure monitoring, 27File operating utility, 136file system, 118, 131final action, 130final action (group stop), 149Force stop function, chassis identify lamp linkage, 129Fgroup resource, 130group resources, 44GHhardware, 52hardware configuration, 40, 41, 42hardware requirements <strong>for</strong> hybrid disk, 111hardware requirements <strong>for</strong> mirror disk, 109hardware requirements <strong>for</strong> shared disk, 110heartbeat resources, 44High Availability (HA) cluster, 16How an error is detected, 31hybrid disk, 120, 126, 148Initial mirror construction time, 116integrated WebManager, 76internal monitoring, 33IP address <strong>for</strong> Integrated WebManager, 133IPMI message, 138IPv6 environment, 112IJJava runtime environment, 73, 74, 77JVM monitor resources, 116, 135, 149Kkernel, 54kernel dump, 148Kernel mode LAN heartbeat and keepalive drivers, 121kernel mode LAN heartbeat resource, 133LAN heartbeat, 133log collection, 127LMMail reporting, 117memory and disk size, 71, 73, 75memory size, 77message of kernel page allocation error, 139Message receive monitor resource, 134messages displayed when loading a driver, 137messages when collecting logs, 144mirror disk, 118, 126mirror driver, 121Mirror or hybrid disk connect, 116mirror recovery, 144modules, 30monitor resources, 45, 149monitor resources that monitoring timing is, 146161


Appendix B Indexmonitorable and non-monitorable errors, 33multiple mounts, 141Nnetwork configuration, 113network interfaces, 53Network partition, 21Network partition resolution resources, 44network settings, 126NetworkManager, 121netwowk warning light, 117NIC device name, 125NIC link up/down monitor resource, 114Notes on system monitor resources, 148, 149OO_DIRECT, 116operating systems, 72, 74, 76, 77OS startup time, 126Rraw device, 131raw monitor resources, 132RAW monitor resources, 121reload interval, 133resource, 29, 44resource activation, 144script file, 139Sscripts <strong>for</strong> starting/stopping <strong>ExpressCluster</strong> services, 145scripts in EXEC resources, 146server monitoring, 32server requirements, 52Server reset, server panic and power off, 129Servers supporting BMC-related functions, 53Servers supporting Express5800/A1080a andExpress5800/A1080aA1040a series linkage, 53Setting of monitor or hybrid disk resource action, 132shared disk, 125shutdown and reboot of individual server, 145single point of failure, 24software, 54software configuration, 29, 31supported operating systems, 108system configuration, 37TTaking over cluster resources, 22Taking over the applications, 23Taking over the data, 22TUR, 132Uuser mode monitor resource, 127WWebManager, 74, 76, 108, 147WebManager Mobile, 77write function, 115162<strong>ExpressCluster</strong> X <strong>3.1</strong> <strong>for</strong> <strong>Linux</strong> <strong>Getting</strong> <strong>Started</strong> <strong>Guide</strong>

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!