Topics: PowerHA / HACMP

PowerHA / HACMP support matrix

Support matrix / life cycle for IBM PowerHA (each release typically has a 3-year life cycle):

Version        AIX 5.1  AIX 5.2  AIX 5.3  AIX 6.1   AIX 7.1   Release Date   End Of Support
HACMP 5.1      YES      YES      YES      NO        NO        July 11, 2003  Sep 1, 2006
HACMP 5.2      YES      YES      YES      NO        NO        July 16, 2004  Sep 30, 2007
HACMP 5.3      NO       ML4+     ML2+     YES       NO        Aug 12, 2005   Sep 30, 2009
HACMP 5.4.0    NO       TL8+     TL4+     NO        NO        July 28, 2006  Sep 30, 2011
HACMP 5.4.1    NO       TL8+     TL4+     YES       YES       Sep 11, 2007   Sep 30, 2011
PowerHA 5.5    NO       NO       TL7+     TL2 SP1+  YES       Nov 14, 2008   Apr 30, 2012
PowerHA 6.1    NO       NO       TL9+     TL2 SP1+  YES       Oct 20, 2009   Apr 30, 2015
PowerHA 7.1    NO       NO       NO       TL6+      YES       Sep 10, 2010   N/A
PowerHA 7.1.1  NO       NO       NO       TL7 SP2+  TL1 SP2+  Sep 10, 2010   N/A
PowerHA 7.1.2  NO       NO       NO       TL8 SP1+  TL2 SP1+  Oct 3, 2012    N/A
PowerHA 7.1.3  NO       NO       NO       TL9 SP1+  TL3 SP1+  Oct 7, 2013    N/A

Note: None of these versions is supported on AIX 4.3.3.
Source: HACMP Version Compatibility Matrix

Topics: AIX, PowerHA / HACMP, System Administration

clstat: Failed retrieving cluster information.

If clstat is not working, you may see the following error when running it:

# clstat
Failed retrieving cluster information.

There are a number of possible causes:
clinfoES or snmpd subsystems are not active.
snmp is unresponsive.
snmp is not configured correctly.
Cluster services are not active on any nodes.

Refer to the HACMP Administration Guide for more information.
Additional information for verifying the SNMP configuration on AIX 6
can be found in /usr/es/sbin/cluster/README5.5.0.UPDATE
To resolve this, first read the README that is referred to. You'll find that you have to enable an entry in /etc/snmpdv3.conf:
The clstat and cldump commands will not start if the internet MIB tree is not enabled in the snmpdv3.conf file. This behavior is usually seen on AIX 6.1 onwards, where the internet MIB entry was intentionally disabled as a security measure. This internet MIB entry is required to resolve the risc6000clsmuxpd (1.3.6.1.4.1.2.3.1.2.1.5) MIB sub tree, which is used by the clstat and cldump functionality.

There are two ways to enable this MIB sub tree (risc6000clsmuxpd):

1) Enable the main internet MIB entry by adding this line to the /etc/snmpdv3.conf file:

VACM_VIEW defaultView internet - included -

However, doing so is not advisable, as it unlocks the entire MIB tree.

2) Enable only the MIB sub tree for risc6000clsmuxpd, without enabling the main MIB tree, by adding this line to the /etc/snmpdv3.conf file:

VACM_VIEW defaultView 1.3.6.1.4.1.2.3.1.2.1.5 - included -

Note: After enabling the MIB entry above, the snmp daemon must be restarted with the following commands:

# stopsrc -s snmpd
# startsrc -s snmpd

After snmp is restarted, leave the daemon running for about two minutes before attempting to start clstat or cldump.
Sometimes, even after doing this, clstat or cldump still don't work. The next step may sound silly, but edit the /etc/snmpdv3.conf file and take out the comments. Change this:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password  # gated
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password # HACMP/ES for AIX ...
To:
smux 1.3.6.1.4.1.2.3.1.2.1.2 gated_password
smux 1.3.6.1.4.1.2.3.1.2.1.5 clsmuxpd_password
Then, recycle the daemons on all cluster nodes. This can be done while the cluster is up and running:
# stopsrc -s hostmibd
# stopsrc -s snmpmibd
# stopsrc -s aixmibd
# stopsrc -s snmpd
# sleep 4
# chssys -s hostmibd -a "-c public"
# chssys -s aixmibd  -a "-c public"
# chssys -s snmpmibd  -a "-c public"
# sleep 4
# startsrc -s snmpd
# startsrc -s aixmibd
# startsrc -s snmpmibd
# startsrc -s hostmibd
# sleep 120
# stopsrc -s clinfoES
# startsrc -s clinfoES
# sleep 120
Now, to verify that it works, run either clstat or cldump, or the following command:
# snmpinfo -m dump -v -o /usr/es/sbin/cluster/hacmp.defs cluster
Still not working at this point? Then run an Extended Verification and Synchronization:
# smitty cm_ver_and_sync.select
After that, clstat, cldump and snmpinfo should work.
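Throughout this process, it also helps to confirm that the subsystems named in the original error message are actually active on all cluster nodes:

# lssrc -s snmpd
# lssrc -s clinfoES

Both should report a status of "active"; if one of them doesn't, start it with startsrc before digging any deeper.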

Topics: AIX, PowerHA / HACMP, System Administration

Error in HACMP in LVM

If you run into the following error:

cl_mklv: Operation is not allowed because vg is a RAID concurrent volume group.
This may be caused by the volume group being varied on on the other node. If it should not be varied on there, run this on that node:
# varyoffvg vg
And then retry the LVM command. If it continues to be a problem, stop HACMP on both nodes, export and re-import the volume group on both nodes, and then restart the cluster.
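A minimal sketch of that export/re-import sequence, assuming a volume group named vg on disk hdiskpower1 with major number 100 (check the actual values on your system with "ls -l /dev/vg" and "lspv"):

# exportvg vg
# importvg -V 100 -y vg hdiskpower1
# chvg -an vg
# varyoffvg vg

Run this on both nodes while cluster services are stopped; chvg -an ensures the volume group is not automatically varied on at boot, which is required for HACMP-managed volume groups.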

Topics: AIX, PowerHA / HACMP, System Administration

NTP slewing in clusters

In order to keep the system time synchronized with other nodes in an HACMP cluster or across the enterprise, Network Time Protocol (NTP) should be implemented. In its default configuration, NTP will periodically update the system time to match a reference clock by resetting the system time on the node. If the time on the reference clock is behind the time of the system clock, the system clock will be set backwards causing the same time period to be passed twice. This can cause internal timers in HACMP and Oracle databases to wait longer periods of time under some circumstances. When these circumstances arise, HACMP may stop the node or the Oracle instance may shut itself down.

Oracle will log an ORA-29740 error when it shuts down the instance due to inconsistent timers. The hatsd daemon utilized by HACMP will log a TS_THREAD_STUCK_ER error in the system error log just before HACMP stops a node due to an expired timer.

To avoid this issue, system managers should configure the NTP daemon to increment time on the node slower until the system clock and the reference clock are in sync (this is called "slewing" the clock) instead of resetting the time in one large increment. The behavior is configured with the -x flag for the xntpd daemon.

To check the current running configuration of xntpd for the -x flag:

# ps -aef | grep xntpd | grep -v grep
    root  409632  188534   0 11:46:45      -  0:00 /usr/sbin/xntpd
To update the current running configuration of xntpd to include the -x flag:
# chssys -s xntpd -a "-x"
0513-077 Subsystem has been changed.
# stopsrc -s xntpd
0513-044 The /usr/sbin/xntpd Subsystem was requested to stop.
# startsrc -s xntpd
0513-059 The xntpd Subsystem has been started. Subsystem PID is 40932.
# ps -aef | grep xntpd | grep -v grep
    root   40932  188534   0 11:52:08      -  0:00 /usr/sbin/xntpd -x
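Alternatively, on systems where xntpd is started from /etc/rc.tcpip at boot time, the flag can be added to that line instead (shown here as it would look once enabled; the exact line may differ per AIX level):

# grep xntpd /etc/rc.tcpip
start /usr/sbin/xntpd "$src_running" "-x"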

Topics: AIX, PowerHA / HACMP

AIX 5.3 end-of-service

The EOM (end of marketing) date has been announced for AIX 5.3: 04/11, meaning that AIX 5.3 will no longer be marketed by IBM from April 2011, and that it is now time for customers to start thinking about upgrading to AIX 6.1. The EOS (end of service) date for AIX 5.3 is 04/12, meaning AIX 5.3 will be serviced by IBM until April 2012; after that, IBM will only service AIX 5.3 for an additional fee. The EOL (end of life) date is 04/16, April 2016. The final technology level for AIX 5.3 is technology level 12, although some service packs for TL12 will still be released.
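To check whether a system is already at that final technology level, use oslevel (sample output; the service pack and build suffix will differ per system):

# oslevel -s
5300-12-04-1119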

IBM has also announced EOM and EOS dates for HACMP 5.4 and PowerHA 5.5, so if you're using any of these versions, you also need to upgrade to PowerHA 6.1:

  • Sep 30, 2010: EOM HACMP 5.4, PowerHA 5.5
  • Sep 30, 2011: EOS HACMP 5.4
  • Sep 30, 2012: EOS HACMP 5.5

Topics: AIX, EMC, Installation, PowerHA / HACMP, Storage Area Network, System Administration

Quick setup guide for HACMP

Use this procedure to quickly configure an HACMP cluster, consisting of 2 nodes and disk heartbeating.

Prerequisites:

Make sure you have the following in place:

  • Have the IP addresses and host names of both nodes, plus one for a service IP label, and add these to the /etc/hosts file on both nodes of the new HACMP cluster (see the example entries after this list).
  • Make sure you have the HACMP software installed on both nodes. Just install all the filesets of the HACMP CD-ROM, and you should be good.
  • Make sure you have this entry in /etc/inittab (as one of the last entries):
    clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit
  • In case you're using EMC SAN storage, make sure you configure your disks correctly as hdiskpower devices. Or, if you're using a mksysb image, you may want to follow this procedure: EMC ODM cleanup.
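For example, the /etc/hosts entries on both nodes could look like this (the host names and addresses are just examples):

10.251.14.50   node01
10.251.14.51   node02
10.251.20.10   serviceip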
Steps:
  • Create the cluster and its nodes:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure an HACMP Cluster and Nodes
    
    Enter a cluster name and select the nodes you're going to use. It is vital here to have the host names and IP addresses correctly entered in the /etc/hosts file of both nodes.
  • Create an IP service label:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure Resources to Make Highly Available
    Configure Service IP Labels/Addresses
    Add a Service IP Label/Address
    
    Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one).
  • Set up a resource group:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure HACMP Resource Groups
    Add a Resource Group
    
    Enter the name of the resource group. It's a good habit to make sure that a resource group name ends with "rg", so you can recognize it as a resource group. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to change it to "Never Fallback". This way, when the primary node in the cluster comes up, and the resource group is up-and-running on the secondary node, you won't see a failover occur from the secondary to the primary node.

    Note: The order of the nodes is determined by the order you select the nodes here. If you put in "node01 node02" here, then "node01" is the primary node. If you want to have this any other way, now is a good time to correctly enter the order of node priority.
  • Add the Service IP Label/Address to the resource group:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure HACMP Resource Groups
    Change/Show Resources for a Resource Group (standard)
    
    Select the resource group you've created earlier, and add the Service IP/Label.
  • Run a verification/synchronization:
    # smitty hacmp
    Extended Configuration
    Extended Verification and Synchronization
    
    Just hit [ENTER] here. Resolve any issues that may come up from this synchronization attempt, and repeat the process until the verification/synchronization returns "Ok". It's a good idea here to select "Automatically correct errors".
  • Start the HACMP cluster:
    # smitty hacmp
    System Management (C-SPOC)
    Manage HACMP Services
    Start Cluster Services
    
    Select both nodes to start. Make sure to also start the Cluster Information Daemon.
  • Check the status of the cluster:
    # clstat -o
    # cldump
    
    Wait until the cluster is stable and both nodes are up (see also the clRGinfo command after this list).
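In addition to clstat and cldump, the clRGinfo utility gives a quick per-node view of resource group state. Its output looks something like this (the group and node names are examples):

# /usr/es/sbin/cluster/utilities/clRGinfo
-----------------------------------------------------------------------------
Group Name     State                        Node
-----------------------------------------------------------------------------
rg             ONLINE                       node01
               OFFLINE                      node02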
Basically, the cluster is now up-and-running. However, during the Verification & Synchronization step, it will complain about not having a non-IP network. The next part describes how to set up a disk heartbeat network, which allows the nodes of the HACMP cluster to exchange heartbeat packets over a SAN disk. We're assuming here that you're using EMC storage. The process on other types of SAN storage is more or less similar, apart from some naming differences, e.g. SAN disks are called "hdiskpower" devices on EMC storage and "vpath" devices on IBM SAN storage.

First, look at the available SAN disk devices on your nodes, and select a small disk that won't be used to store any data, but only for the purpose of disk heartbeating. It is a good habit to request that your SAN storage admin zone a small LUN to both nodes of the HACMP cluster as a disk heartbeating device. Make a note of the PVID of this disk device. For example, if you choose to use device hdiskpower4:
# lspv | grep hdiskpower4
hdiskpower4   000a807f6b9cc8e5    None
So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5:
  • Create a concurrent volume group:
    # smitty hacmp
    System Management (C-SPOC)
    HACMP Concurrent Logical Volume Management
    Concurrent Volume Groups
    Create a Concurrent Volume Group
    
    Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg".
  • Set up the disk heartbeat network:
    # smitty hacmp
    Extended Configuration
    Extended Topology Configuration
    Configure HACMP Networks
    Add a Network to the HACMP Cluster
    
    Select "diskhb" and accept the default Network Name.
  • Run a discovery:
    # smitty hacmp
    Extended Configuration
    Discover HACMP-related Information from Configured Nodes
    
  • Add the disk device:
    # smitty hacmp
    Extended Configuration
    Extended Topology Configuration
    Configure HACMP Communication Interfaces/Devices
    Add Communication Interfaces/Devices
    Add Discovered Communication Interface and Devices
    Communication Devices
    
    Select the same disk device on each node by pressing F7.
  • Run a Verification & Synchronization again, as described earlier. Then check with clstat and/or cldump whether the disk heartbeat network comes online.
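You can also ask topology services directly whether the disk heartbeat network is up, by listing the status of the topsvcs subsystem on one of the nodes:

# lssrc -ls topsvcs

In the output, look for the diskhb network; once heartbeats are flowing, it should show both adapters as joined members in a stable state.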

Topics: AIX, PowerHA / HACMP, System Administration

NFS mounts on HACMP failing

When you want to mount an NFS file system on a node of an HACMP cluster, there are a couple of items you need to check before it will work:

  • Make sure the hostname and IP address of the HACMP node are resolvable and provide the correct output, by running:
    # nslookup [hostname]
    # nslookup [ip-address]
    
  • The next thing you will want to check, on the NFS server, is whether the node names of your HACMP cluster nodes are correctly added to the /etc/exports file. If they are, run:
    # exportfs -va
  • The last, and trickiest, item to check is whether a service IP label is defined as an IP alias on the same adapter as your node's hostname, e.g.:
    # netstat -nr
    Routing tables
    Destination   Gateway       Flags  Refs  Use    If  Exp  Groups
    
    Route Tree for Protocol Family 2 (Internet):
    default       10.251.14.1   UG      4    180100 en1  -     -
    10.251.14.0   10.251.14.50  UHSb    0         0 en1  -     -
    10.251.14.50  127.0.0.1     UGHS    3    791253 lo0  -     -
    
    The example above shows you that the default gateway is defined on the en1 interface. The next command shows you where your Service IP label lives:
    # netstat -i
    Name  Mtu   Network   Address         Ipkts   Ierrs Opkts
    en1   1500  link#2    0.2.55.d3.75.77 2587851 0      940024
    en1   1500  10.251.14 node01          2587851 0      940024
    en1   1500  10.251.20 serviceip       2587851 0      940024
    lo0   16896 link#1                    1912870 0     1914185
    lo0   16896 127       loopback        1912870 0     1914185
    lo0   16896 ::1                       1912870 0     1914185
    
    As you can see, the Service IP label (called "serviceip" in the example above) is defined on en1. In that case, for NFS to work, you also need to add "serviceip" to the /etc/exports file on the NFS server and re-run "exportfs -va". You should also make sure that hostname "serviceip" resolves correctly to an IP address (and, of course, that the IP address resolves back to the correct hostname) on both the NFS server and the client.
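For example, the relevant line in /etc/exports on the NFS server might then look like this (the exported directory is just an example):

/export/data -root=node01:node02:serviceip,access=node01:node02:serviceip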

Topics: Monitoring, PowerHA / HACMP

Cluster status webpage

How do you monitor multiple HACMP clusters? You're probably familiar with the clstat or xclstat commands. These are nice, but not sufficient when you have more than 8 HACMP clusters to monitor, as clstat can't be configured to monitor more than 8 clusters. It's also difficult to get an overview of ALL clusters in a SINGLE look with clstat. IBM included a clstat.cgi in HACMP 5 to show the cluster status on a webpage. This still doesn't provide an overview in a single look, as clstat.cgi shows a long listing of all clusters, and, just like clstat, it is limited to monitoring 8 clusters.

HACMP cluster status can be retrieved via SNMP (this is actually what clstat does too). Using the IP addresses of a cluster and the snmpinfo command, you can remotely retrieve cluster status information, and use that information to build a webpage. By using colors for the status of the clusters and the nodes (green = ok, yellow = something is happening, red = error), you can get a quick overview of the status of all the HACMP clusters.
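For example, to query the cluster MIB of a remote cluster node (assuming the default "public" SNMP community; node01 is an example host name):

# snmpinfo -m dump -v -c public -h node01 -o /usr/es/sbin/cluster/hacmp.defs cluster

The hacmp.defs objects file ships with HACMP; copy it to the monitoring host if HACMP is not installed there.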


Per cluster you can see: the cluster name, the cluster ID, HACMP version and the status of the cluster and all its nodes. It will also show you where any resource groups are active.

You can download the script here. Untar the file. There is a readme in the package that will tell you how to configure the script. This script has been tested with HACMP versions 4 and 5, up to version 5.5.0.5.

Topics: AIX, EMC, PowerHA / HACMP, Storage, Storage Area Network, System Administration

Missing disk method in HACMP configuration

An issue when trying to bring up a resource group: the hacmp.out log file contains, for example, the following:

cl_disk_available[187] cl_fscsilunreset fscsi0 hdiskpower1 false
cl_fscsilunreset[124]: openx(/dev/hdiskpower1, O_RDWR, 0, SC_NO_RESERVE): Device busy
cl_fscsilunreset[400]: ioctl SCIOLSTART id=0X11000 lun=0X1000000000000 : Invalid argument
To resolve this, you will have to make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage:

Make sure the emcpowerreset utility is present at /usr/lpp/EMC/Symmetrix/bin/emcpowerreset.

Then add a new custom disk method:
  • Enter into the SMIT fastpath for HACMP "smitty hacmp".
  • Select Extended Configuration.
  • Select Extended Resource Configuration.
  • Select HACMP Extended Resources Configuration.
  • Select Configure Custom Disk Methods.
  • Select Add Custom Disk Methods.
      Change/Show Custom Disk Methods

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                 [Entry Fields]
* Disk Type (PdDvLn field from CuDv)             disk/pseudo/power
* New Disk Type                                  [disk/pseudo/power]
* Method to identify ghost disks                 [SCSI3]
* Method to determine if a reserve is held       [SCSI_TUR]
* Method to break reserve [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                     true
* Method to make the disk available              [MKDEV]
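To find the correct "Disk Type" value for your devices, you can query the ODM for one of the disks (hdiskpower1 being an example device name):

# odmget -q name=hdiskpower1 CuDv | grep PdDvLn
        PdDvLn = "disk/pseudo/power"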

Topics: PowerHA / HACMP, System Administration

Synchronizing 2 HACMP nodes

In order to keep users, all their related settings, and crontab files synchronized between the nodes, here's a script that you can use to do this for you:

sync.ksh
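If you prefer to roll your own, below is a minimal sketch of the same idea (this is not the actual sync.ksh; the node name and file list are assumptions, and it requires working remote shell access between the nodes):

#!/bin/ksh
# Sketch: push user and group definitions and root's crontab
# from this node to the other cluster node.
NODE=node02

for FILE in /etc/passwd /etc/group /etc/security/passwd \
            /etc/security/user /etc/security/limits \
            /var/spool/cron/crontabs/root
do
    # scp -p preserves modification times and permission modes
    scp -p $FILE ${NODE}:${FILE} || echo "Failed to copy $FILE to $NODE"
done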
