Topics: AIX, Storage Area Network, System Administration

AIX fibre channel error - FCS_ERR6

This error can occur if the fibre channel adapter is extremely busy. The AIX FC adapter driver tries to map an I/O buffer for DMA access, so that the FC adapter can read from or write to the buffer. The DMA mapping is done by making a request to the PCI bus device driver.

The PCI bus device driver is reporting that it can't satisfy the request right now: there was simply too much I/O at that moment for the adapter to handle. When the FC adapter is configured, the driver tells the PCI bus driver how much resource to set aside for it, and that limit may have been exceeded. It is therefore recommended to increase the max_xfer_size attribute on the fibre channel adapters.

The possible sizes depend on the type of fibre channel adapter, but usually they are:

0x100000, 0x200000, 0x400000, 0x800000, 0x1000000

To view the current setting, run the following command:

# lsattr -El fcsX -a max_xfer_size
Replace the X with the fibre channel adapter number (for example, fcs0).

You should get output similar to the following:
max_xfer_size 0x100000 Maximum Transfer Size True
The value can be changed as follows. The -P flag only updates the device configuration database, so the new value takes effect after the server is rebooted:
# chdev -l fcsX -a max_xfer_size=0x1000000 -P
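Before changing the attribute, a small check can catch a typo in the hex value. This is a minimal POSIX shell sketch: the helper names is_valid_xfer_size and xfer_size_mb are hypothetical, and the list of sizes is copied from above rather than read from the adapter, so confirm the values your adapter accepts with "lsattr -R -l fcsX -a max_xfer_size".

```shell
# Sketch only: validate a proposed max_xfer_size before calling chdev.
# VALID_XFER_SIZES is copied from this article (assumption); check what your
# adapter really accepts with: lsattr -R -l fcsX -a max_xfer_size
VALID_XFER_SIZES="0x100000 0x200000 0x400000 0x800000 0x1000000"

# is_valid_xfer_size (hypothetical helper): exit 0 if $1 is in the list.
is_valid_xfer_size() {
    for size in $VALID_XFER_SIZES; do
        if [ "$1" = "$size" ]; then
            return 0
        fi
    done
    return 1
}

# xfer_size_mb (hypothetical helper): print a hex size as whole megabytes,
# e.g. 0x1000000 -> 16. Shell arithmetic accepts the 0x prefix.
xfer_size_mb() {
    echo $(( $1 / 1048576 ))
}
```

On the host, the check could gate the change, for example:
# is_valid_xfer_size 0x200000 && chdev -l fcs0 -a max_xfer_size=0x200000 -P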

Topics: AIX, SDD, Storage Area Network, System Administration

Method error when running cfgmgr

If you see the following error when running cfgmgr:

Method error (/usr/lib/methods/fcmap >> /var/adm/essmap.out):
        0514-023 The specified device does not exist in the
                 customized device configuration database.
This happens when ESS driver filesets are installed, but no ESS (type 2105) disks are in use on the system. Check the type of disks by running:
# lsdev -Cc disk | grep 2105
If no type 2105 disks are found, you can uninstall the ESS driver filesets:
# installp -u ibm2105.rte ibmpfe.essutil.fibre.data ibmpfe.essutil.rte
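The check and the uninstall can be combined, so the filesets are only removed when no 2105 disks exist. This is a minimal POSIX shell sketch; the helper name ess_filesets_removable is hypothetical, and it uses the same plain "grep 2105" test as the command above, so any line mentioning 2105 counts as an ESS disk.

```shell
# Sketch only: succeed (exit 0) when the "lsdev -Cc disk" output on stdin
# contains no type 2105 (ESS) disks, i.e. the ESS filesets are unused.
# ess_filesets_removable is a hypothetical helper name.
ess_filesets_removable() {
    # Same test as "grep 2105" above, inverted: no match means removable.
    if grep -q 2105; then
        return 1
    fi
    return 0
}
```

On the host:
# lsdev -Cc disk | ess_filesets_removable && installp -u ibm2105.rte ibmpfe.essutil.fibre.data ibmpfe.essutil.rte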

Topics: AIX, EMC, Installation, PowerHA / HACMP, Storage Area Network, System Administration

Quick setup guide for HACMP

Use this procedure to quickly configure an HACMP cluster consisting of two nodes with disk heartbeating.

Prerequisites:

Make sure you have the following in place:

  • Have the IP addresses and host names of both nodes, and of a service IP label. Add these to the /etc/hosts file on both nodes of the new HACMP cluster.
  • Make sure you have the HACMP software installed on both nodes. Installing all the filesets from the HACMP CD-ROM is usually sufficient.
  • Make sure you have this entry in /etc/inittab (as one of the last entries):
    clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit
  • If you're using EMC SAN storage, make sure your disks are configured correctly as hdiskpower devices. If the nodes were installed from a mksysb image, you may want to follow the EMC ODM cleanup procedure.
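The /etc/hosts and /etc/inittab prerequisites can be checked on each node with a short script before starting the configuration. This is a POSIX shell sketch; the function names are hypothetical, the names in the usage example are placeholders, and the script does not check that the clinit entry is near the end of /etc/inittab.

```shell
# Sketch only: check two HACMP prerequisites on one node (run on both).
# File paths are parameters so the checks can also be tried on copies.
CLINIT_ENTRY='clinit:a:wait:/bin/touch /usr/es/sbin/cluster/.telinit'

# hosts_have_names (hypothetical): succeed if every NAME appears as an
# exact host name field in HOSTS_FILE. Usage: hosts_have_names FILE NAME...
hosts_have_names() {
    hosts_file=$1
    shift
    for name in "$@"; do
        if ! awk -v n="$name" '
            /^[ \t]*#/ { next }                 # skip comment lines
            {
                for (i = 2; i <= NF; i++) {     # field 1 is the IP address
                    if ($i ~ /^#/) break        # rest of line is a comment
                    if ($i == n) found = 1
                }
            }
            END { exit !found }' "$hosts_file"; then
            echo "missing from $hosts_file: $name" >&2
            return 1
        fi
    done
    return 0
}

# inittab_has_clinit (hypothetical): succeed if the clinit line is present
# exactly as written above (position in the file is not checked).
inittab_has_clinit() {
    grep -qxF -- "$CLINIT_ENTRY" "$1"
}
```

For example, on each node (node01, node02 and clustersvc stand for your own names):
# hosts_have_names /etc/hosts node01 node02 clustersvc && inittab_has_clinit /etc/inittab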
Steps:
  • Create the cluster and its nodes:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure an HACMP Cluster and Nodes
    
    Enter a cluster name and select the nodes you're going to use. It is vital to have the host names and IP addresses correctly entered in the /etc/hosts file of both nodes.
  • Create an IP service label:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure Resources to Make Highly Available
    Configure Service IP Labels/Addresses
    Add a Service IP Label/Address
    
    Enter an IP Label/Address (press F4 to select one), and enter a Network name (again, press F4 to select one).
  • Set up a resource group:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure HACMP Resource Groups
    Add a Resource Group
    
    Enter the name of the resource group. It's a good habit to end resource group names with "rg", so you can recognize them as resource groups. Also, select the participating nodes. For the "Fallback Policy", it is a good idea to select "Never Fallback". This way, when the primary node comes back up while the resource group is running on the secondary node, the resource group does not automatically fall back from the secondary to the primary node.

    Note: The order of the nodes is determined by the order in which you select them here. If you enter "node01 node02", then "node01" is the primary node. If you want a different node priority, now is a good time to enter the nodes in the correct order.
  • Add the Service IP Label/Address to the resource group:
    # smitty hacmp
    Initialization and Standard Configuration
    Configure HACMP Resource Groups
    Change/Show Resources for a Resource Group (standard)
    
    Select the resource group you created earlier, and add the Service IP Label/Address.
  • Run a verification/synchronization:
    # smitty hacmp
    Extended Configuration
    Extended Verification and Synchronization
    
    Just press [ENTER] here. Resolve any issues reported by the verification/synchronization, and repeat the process until it returns "OK". It's a good idea to enable the "Automatically correct errors" option here.
  • Start the HACMP cluster:
    # smitty hacmp
    System Management (C-SPOC)
    Manage HACMP Services
    Start Cluster Services
    
    Select both nodes to start. Make sure to also start the Cluster Information Daemon.
  • Check the status of the cluster:
    # clstat -o
    # cldump
    
    Wait until the cluster is stable and both nodes are up.
Basically, the cluster is now up and running. However, the Verification & Synchronization step will complain about the absence of a non-IP network. The next part sets up a disk heartbeat network, which allows the nodes of the HACMP cluster to exchange heartbeat packets over a SAN disk. We assume here that you're using EMC storage. The process on other types of SAN storage is similar, apart from some differences; e.g. SAN disks on EMC storage are called "hdiskpower" devices, while on IBM SAN storage they are called "vpath" devices.

First, look at the available SAN disk devices on your nodes, and select a small disk that won't be used to store any data, but only for the disk heartbeat. It is a good habit to ask your SAN storage admin to zone a small LUN to both nodes of the HACMP cluster as a disk heartbeat device. Make a note of the PVID of this disk device. For example, if you choose to use device hdiskpower4:
# lspv | grep hdiskpower4
hdiskpower4   000a807f6b9cc8e5    None
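When scripting this step, matching the disk name exactly is safer than grep, because "grep hdiskpower4" also matches hdiskpower40 and higher. A minimal sketch, assuming the lspv column layout shown above (disk name, PVID, volume group); the helper pvid_of is hypothetical.

```shell
# Sketch only: print the PVID of the disk named in $1, reading "lspv" output
# on stdin. Assumes column 1 is the disk name and column 2 the PVID.
# pvid_of is a hypothetical helper name. Exits 1 if the disk is not listed.
pvid_of() {
    awk -v d="$1" '
        $1 == d { print $2; found = 1; exit }   # exact name match only
        END     { exit !found }'
}
```

With the output above, the following prints 000a807f6b9cc8e5:
# lspv | pvid_of hdiskpower4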
So, we're going to set up the disk heartbeat network on device hdiskpower4, with PVID 000a807f6b9cc8e5:
  • Create a concurrent volume group:
    # smitty hacmp
    System Management (C-SPOC)
    HACMP Concurrent Logical Volume Management
    Concurrent Volume Groups
    Create a Concurrent Volume Group
    
    Select both nodes to create the concurrent volume group on by pressing F7 for each node. Then select the correct PVID. Give the new volume group a name, for example "hbvg".
  • Set up the disk heartbeat network:
    # smitty hacmp
    Extended Configuration
    Extended Topology Configuration
    Configure HACMP Networks
    Add a Network to the HACMP Cluster
    
    Select "diskhb" and accept the default Network Name.
  • Run a discovery:
    # smitty hacmp
    Extended Configuration
    Discover HACMP-related Information from Configured Nodes
    
  • Add the disk device:
    # smitty hacmp
    Extended Configuration
    Extended Topology Configuration
    Configure HACMP Communication Interfaces/Devices
    Add Communication Interfaces/Devices
    Add Discovered Communication Interface and Devices
    Communication Devices
    
    Select the same disk on each node by pressing F7.
  • Run a Verification & Synchronization again, as described above. Then use clstat and/or cldump to check whether the disk heartbeat network comes online.

Topics: AIX, EMC, Storage, Storage Area Network, System Administration

Unable to remove hdiskpower devices due to a method error

If you get a method error when running rmdev -dl on your hdiskpower devices, follow this procedure.

The symptom is that rmdev cannot remove the hdiskpower devices and reports the error "method error (/etc/methods/ucfgpowerdisk):".
The fix is to uninstall and reinstall PowerPath, but you won't be able to do that until you have removed the hdiskpower devices with this procedure:
  1. # odmdelete -q name=hdiskpowerX -o CuDv
    (for every hdiskpower device)
  2. # odmdelete -q name=hdiskpowerX -o CuAt
    (for every hdiskpower device)
  3. # odmdelete -q name=powerpath0 -o CuDv
  4. # odmdelete -q name=powerpath0 -o CuAt
  5. # rm /dev/powerpath0
  6. You must remove the files installed by PowerPath and then reboot the server. After the reboot, you will be able to uninstall PowerPath with the "installp -u EMCpower" command. The file paths below are relative to the root directory, so change to / first. The files to be removed are as follows:

    (Do not be concerned if some of the removals fail, as PowerPath may not be fully configured.)
    # cd /
    # rm ./etc/PowerPathExtensions
    # rm ./etc/emcp_registration
    # rm ./usr/lib/boot/protoext/disk.proto.ext.scsi.pseudo.power
    # rm ./usr/lib/drivers/pnext
    # rm ./usr/lib/drivers/powerdd
    # rm ./usr/lib/drivers/powerdiskdd
    # rm ./usr/lib/libpn.a
    # rm ./usr/lib/methods/cfgpower
    # rm ./usr/lib/methods/cfgpowerdisk
    # rm ./usr/lib/methods/chgpowerdisk
    # rm ./usr/lib/methods/power.cat
    # rm ./usr/lib/methods/ucfgpower
    # rm ./usr/lib/methods/ucfgpowerdisk
    # rm ./usr/lib/nls/msg/en_US/power.cat
    # rm ./usr/sbin/powercf
    # rm ./usr/sbin/powerprotect
    # rm ./usr/sbin/pprootdev
    # rm ./usr/lib/drivers/cgext
    # rm ./usr/lib/drivers/mpcext
    # rm ./usr/lib/libcg.so
    # rm ./usr/lib/libcong.so
    # rm ./usr/lib/libemcp_mp_rtl.so
    # rm ./usr/lib/drivers/mpext
    # rm ./usr/lib/libmp.a
    # rm ./usr/sbin/emcpreg
    # rm ./usr/sbin/powermt
    # rm ./usr/share/man/man1/emcpreg.1
    # rm ./usr/share/man/man1/powermt.1
    # rm ./usr/share/man/man1/powerprotect.1
    
  7. Reinstall PowerPath.
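Steps 1 to 5 repeat the same commands for every hdiskpower device, so they lend themselves to a loop. The sketch below only prints the commands, so they can be reviewed before anything is deleted from the ODM; the function name is hypothetical, and you supply the list of devices yourself.

```shell
# Sketch only: print (do not run) the ODM cleanup commands from steps 1-5
# for the hdiskpower devices given as arguments.
# powerpath_odm_cleanup_cmds is a hypothetical helper name.
powerpath_odm_cleanup_cmds() {
    for dev in "$@"; do
        # Steps 1 and 2: device definition and attributes of each disk.
        echo "odmdelete -q name=$dev -o CuDv"
        echo "odmdelete -q name=$dev -o CuAt"
    done
    # Steps 3 to 5: the powerpath0 pseudo device itself.
    echo "odmdelete -q name=powerpath0 -o CuDv"
    echo "odmdelete -q name=powerpath0 -o CuAt"
    echo "rm /dev/powerpath0"
}
```

For example, write the commands to a file, review it, and only then run it:
# powerpath_odm_cleanup_cmds hdiskpower0 hdiskpower1 > /tmp/odmcleanup.sh
# sh /tmp/odmcleanup.sh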

Topics: AIX, EMC, PowerHA / HACMP, Storage, Storage Area Network, System Administration

Missing disk method in HACMP configuration

A resource group fails to come up, and the hacmp.out log file contains, for example, the following:

cl_disk_available[187] cl_fscsilunreset fscsi0 hdiskpower1 false
cl_fscsilunreset[124]: openx(/dev/hdiskpower1, O_RDWR, 0, SC_NO_RESERVE): Device busy
cl_fscsilunreset[400]: ioctl SCIOLSTART id=0X11000 lun=0X1000000000000 : Invalid argument
To resolve this, make sure that the SCSI reset disk method is configured in HACMP. For example, when using EMC storage:

Make sure emcpowerreset is present as /usr/lpp/EMC/Symmetrix/bin/emcpowerreset.

Then add a new custom disk method:
  • Enter the SMIT fastpath for HACMP: "smitty hacmp".
  • Select Extended Configuration.
  • Select Extended Resource Configuration.
  • Select HACMP Extended Resources Configuration.
  • Select Configure Custom Disk Methods.
  • Select Add Custom Disk Methods.
      Change/Show Custom Disk Methods

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                 [Entry Fields]
* Disk Type (PdDvLn field from CuDv)             disk/pseudo/power
* New Disk Type                                  [disk/pseudo/power]
* Method to identify ghost disks                 [SCSI3]
* Method to determine if a reserve is held       [SCSI_TUR]
* Method to break reserve                        [/usr/lpp/EMC/Symmetrix/bin/emcpowerreset]
  Break reserves in parallel                     true
* Method to make the disk available              [MKDEV]
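Before entering the method in SMIT, it is worth confirming on every node that the reset script exists and is executable, since otherwise the method cannot run when HACMP needs to break a reserve. A minimal POSIX shell sketch; the helper name is hypothetical.

```shell
# Sketch only: succeed if the reserve-breaking method in $1 is a regular,
# executable file. check_reset_method is a hypothetical helper name.
check_reset_method() {
    if [ -f "$1" ] && [ -x "$1" ]; then
        return 0
    fi
    echo "reset method missing or not executable: $1" >&2
    return 1
}
```

On each node:
# check_reset_method /usr/lpp/EMC/Symmetrix/bin/emcpowerreset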

Topics: AIX, SDD, Storage, Storage Area Network

PVID trouble

To add a PVID to a disk, enter:

# chdev -l vpathxx -a pv=yes
To clear all reservations from a previously used SAN disk:
# chpv -C vpathxx
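To find which vpath devices still lack a PVID, the lspv output can be filtered and turned into chdev commands for review. A sketch, assuming lspv shows "none" in the PVID column for a disk without a PVID; the helper name is hypothetical.

```shell
# Sketch only: read "lspv" output on stdin and print a chdev command for
# every vpath device whose PVID column (column 2) reads "none".
# missing_pvid_cmds is a hypothetical helper name; it does not run chdev.
missing_pvid_cmds() {
    awk '$1 ~ /^vpath/ && $2 == "none" { print "chdev -l " $1 " -a pv=yes" }'
}
```

For example:
# lspv | missing_pvid_cmds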

Topics: EMC, Storage, Storage Area Network

EMC PowerPath key installation

This describes how to configure the EMC PowerPath registration keys.

First, check the current configuration of PowerPath:

# powermt config
Warning: all licenses for storage systems support are missing or expired.
Then install the keys:
# emcpreg -install

=========== EMC PowerPath Registration ===========
Do you have a new registration key or keys to enter?[n] y
Enter the registration keys(s) for your product(s),
one per line, pressing Enter after each key.
After typing all keys, press Enter again.

Key (Enter if done): P6BV-4KDB-QET6-RF9A-QV9D-MN3V
1 key(s) successfully added.
Key successfully installed.

Key (Enter if done):
1 key(s) successfully registered.
(Note: the license key used in this example is not valid).
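The check and the key installation can be tied together, so emcpreg is only started when powermt reports the license warning shown above. A POSIX shell sketch; the helper name is hypothetical, and the pattern assumes the warning text matches the example output.

```shell
# Sketch only: succeed if the "powermt config" output on stdin contains the
# missing/expired license warning shown above (pattern is an assumption).
# needs_powerpath_key is a hypothetical helper name.
needs_powerpath_key() {
    grep -q 'licenses .* missing or expired'
}
```

For example:
# powermt config 2>&1 | needs_powerpath_key && emcpreg -install
Both output streams are piped, because it is not stated here whether powermt writes the warning to standard output or standard error.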

Topics: Installation, SDD, Storage Area Network

SDD upgrade from 1.6.X to 1.7.X

Whenever you need to upgrade SDD (and it is wise to keep it up to date), check the SDD documentation before doing so. Here are the quick steps for performing the upgrade.

  • Check for any entries in the errorlog that could interfere with the upgrades:
    # errpt -a | more
  • Check if previously installed packages are OK:
    # lppchk -v
  • Commit any previously installed packages:
    # installp -c all
  • Make sure you have a recent mksysb image of the server, and do an incremental TSM backup before starting the updates to rootvg. It is also a good idea to prepare an alt_disk_install on the second boot disk.
  • For HACMP nodes: check the cluster status and log files to make sure the cluster is stable and ready for the upgrades.
  • Update fileset devices.fcp.disk.ibm to the latest level using smitty update_all.
  • For ESS environments: Update host attachment script ibm2105 and ibmpfe.essutil to the latest available levels using smitty update_all.
  • Enter the lspv command to find out all the SDD volume groups.
  • Enter the lsvgfs command for each SDD volume group to find out which file systems are mounted, e.g.:
    # lsvgfs vg_name
  • Enter the umount command to unmount all file systems belonging to the SDD volume groups.
  • Enter the varyoffvg command to vary off the volume groups.
  • If you are upgrading to an SDD version earlier than 1.6.0.0, or if you are upgrading to SDD 1.6.0.0 or later and your host is in an HACMP environment with nonconcurrent volume groups that are varied on on another host (that is, reserved by another host), run the vp2hd volume_group_name script to convert the volume group from SDD vpath devices to supported storage hdisk devices. Otherwise, skip this step.
  • Stop the SDD server:
    # stopsrc -s sddsrv
  • Remove all the SDD vpath devices:
    # rmdev -dl dpo -R
  • Use smitty to uninstall SDD: enter smitty deinstall, press Enter, and complete the uninstallation process.
  • If you need to upgrade the AIX operating system, you could perform the upgrade now. If required, reboot the system after the operating system upgrade.
  • Use smitty to install the newer version of SDD. Note: it is also possible to simply update the SDD fileset with smitty update_all without uninstalling it first, but IBM recommends uninstalling first, then patching the OS, and then installing the SDD fileset.
  • Use the smitty device command to configure all the SDD vpath devices to the Available state.
  • Enter the lsvpcfg command to verify the SDD configuration.
  • If you are upgrading to an SDD version earlier than 1.6.0.0, run the hd2vp volume_group_name script for each SDD volume group to convert the physical volumes from supported storage hdisk devices back to the SDD vpath devices.
  • Enter the varyonvg command for each volume group that was previously varied offline.
  • Enter the lspv command to verify that all physical volumes of the SDD volume groups are SDD vpath devices.
  • Check for any errors:
    # errpt | more
    # lppchk -v
    # errclear 0
  • Enter the mount command to mount all file systems that were unmounted.
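The steps that find the SDD volume groups can be scripted from the lspv output. A minimal sketch, assuming the usual lspv column layout (disk name, PVID, volume group, state); the helper name is hypothetical.

```shell
# Sketch only: read "lspv" output on stdin and print, once each, every
# volume group that contains at least one vpath device.
# sdd_volume_groups is a hypothetical helper name.
sdd_volume_groups() {
    awk '$1 ~ /^vpath/ && NF >= 3 && $3 != "None" { print $3 }' | sort -u
}
```

For example, to list the file systems of each SDD volume group:
# for vg in $(lspv | sdd_volume_groups); do echo "$vg:"; lsvgfs "$vg"; done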
Attention: If the physical volumes of an SDD volume group are a mix of hdisk devices and SDD vpath devices, you must run the dpovgfix utility to fix this problem. Otherwise, SDD will not function properly:
# dpovgfix vg_name
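Whether a volume group is in this mixed state can also be read from lspv. A sketch, assuming the same lspv column layout; the helper name is hypothetical, and hdiskpower devices are deliberately not counted as hdisks.

```shell
# Sketch only: succeed (exit 0) if the volume group named in $1 has both
# hdisk and vpath physical volumes in the "lspv" output read from stdin.
# vg_is_mixed is a hypothetical helper name.
vg_is_mixed() {
    awk -v vg="$1" '
        $3 == vg && $1 ~ /^hdisk[0-9]/ { h = 1 }   # plain hdisk, not hdiskpower
        $3 == vg && $1 ~ /^vpath/      { v = 1 }
        END { exit !(h && v) }'
}
```

For example (datavg is a placeholder):
# lspv | vg_is_mixed datavg && dpovgfix datavg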

Topics: EMC, Storage, Storage Area Network

EMC Grab

EMC Grab is a utility that is run locally on each host and gathers storage-specific information (driver versions, storage technical details, etc.). The EMC Grab report is packaged as a zip file, which can be sent to EMC support.

You can download the "Grab Utility" from the following locations:

When you've downloaded EMC Grab and stored it in a temporary location on the server, such as /tmp/emc, untar it:
# cd /tmp/emc
# tar -xvf *.tar
Then run:
# /tmp/emc/emcgrab/emcgrab.sh
The script is interactive and finishes after a couple of minutes.

Topics: EMC, Storage, Storage Area Network

Reset reservation bit

If you can't access an hdiskpowerX disk, you may need to reset the reservation bit on it:

# /usr/lpp/EMC/Symmetrix/bin/emcpowerreset fscsiX hdiskpowerX
