Topics: AIX, Monitoring, System Administration

Boxes and lines in NMON

With the default NMON settings, when you use PuTTY on a Windows system, you may notice that the boxes and lines in NMON are not displayed correctly. It may look something like this:



An easy fix for this issue is to change the character set translation within PuTTY. Click the icon in the upper left corner of your PuTTY window and select "Change Settings". Then navigate to Window -> Translation. In the "Remote character set" field, change "UTF-8" to "ISO-8859-1".



Once changed, restart PuTTY and it should look something like this:



Another option is to stop using boxes and lines altogether. You can do this by starting nmon with the -B option:

# nmon -B
Or you can set the NMON environment variable to the same:
# export NMON=B
# nmon
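If you always want nmon without boxes and lines, the environment variable can be set at login. A minimal sketch, assuming your login shell reads ~/.profile (as ksh does on AIX); the function name enable_nmon_ascii is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: add "export NMON=B" to a profile file exactly once,
# so every new login session starts nmon without boxes and lines.
enable_nmon_ascii() {
    local profile=${1:-$HOME/.profile}
    # Only append the line if it is not already present
    grep -q '^export NMON=B$' "$profile" 2>/dev/null || echo 'export NMON=B' >> "$profile"
}
```

Run enable_nmon_ascii once, then log in again (or source ~/.profile) before starting nmon.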

Topics: AIX, Linux, Monitoring, Security, System Administration

Sudosh

Sudosh is designed to be used either in conjunction with sudo or by itself as a login shell. It runs a root or user shell with logging: every command the user types within that shell is logged, as well as its output.

This is different from "sudo -s" or "sudo /bin/sh". When you use one of these instead of sudosh to start a new shell, the commands typed in that new shell are not logged to syslog; only the fact that a new shell was started is logged.

If the newly started shell supports command-line history, you can still find the commands that were run in a file such as .sh_history. But if you use a shell such as csh that does not support command-line logging, you are out of luck.

Sudosh fills this gap. No matter which shell you use, all command lines are logged to syslog (including vi keystrokes). In fact, sudosh uses the script command to log all keystrokes and output.

Setting up sudosh is fairly easy. For a Linux system, first download the RPM of sudosh, for example from rpm.pbone.net. Then install it on your Linux server:

# rpm -ihv sudosh-1.8.2-1.2.el4.rf.i386.rpm
Preparing...  ########################################### [100%]
   1:sudosh   ########################################### [100%]
Then open /etc/sudosh.conf. Here you can adjust the default shell that is started, and the location of the log files. By default, the log directory is /var/log/sudosh. Make sure this directory exists on your server, or change it to another existing directory in the sudosh.conf file. The following command sets the correct permissions on the log directory:
# sudosh -i
[info]: chmod 0733 directory /var/log/sudosh
Then, to give a user sudosh access, run visudo to edit the /etc/sudoers file, and add the following line:
username ALL=PASSWD:/usr/bin/sudosh
Now the user can log in and run the following command to gain root access:
$ sudo sudosh
Password:
# whoami
root
As a sysadmin, you can now view the log files created in /var/log/sudosh, but it is much cooler to use the sudosh-replay command to replay the actual session (like a VCR), as run by the user with sudosh access.

First, run sudosh-replay without any parameters to get a list of sessions that took place using sudosh:
# sudosh-replay
Date       Duration From To   ID
====       ======== ==== ==   ==
09/16/2010 6s       root root root-root-1284653707-GCw26NSq

Usage: sudosh-replay ID [MULTIPLIER] [MAXWAIT]
See 'sudosh-replay -h' for more help.
Example: sudosh-replay root-root-1284653707-GCw26NSq 1 2
Now, you can actually replay the session, by (for example) running:
# sudosh-replay root-root-1284653707-GCw26NSq 1 5
The first parameter is the session ID, and the second is the multiplier: "1" replays at the actual speed, and a higher value speeds up the replay. The third parameter is the maximum wait: wherever there were pauses in the actual session, the replay waits at most this many seconds (5 seconds in the example above).
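To replay the most recent session without copying its ID by hand, the listing can be parsed. This is a sketch that assumes the listing looks like the output above (date in the first column, session ID in the fifth) and that the last row is the most recent session; the function names are hypothetical:

```shell
#!/bin/bash
# Read a sudosh-replay listing on stdin and print the ID from the last data row.
# Data rows are recognized by a MM/DD/YYYY date in the first column (assumption).
latest_session_id() {
    awk '$1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9]$/ && NF >= 5 { id = $5 }
         END { if (id != "") print id }'
}

# Replay the latest session; optional MULTIPLIER and MAXWAIT default to 1 and 5.
replay_latest() {
    local id
    id=$(sudosh-replay 2>&1 | latest_session_id)
    [ -n "$id" ] || { echo "no sudosh sessions found" >&2; return 1; }
    sudosh-replay "$id" "${1:-1}" "${2:-5}"
}
```

For example, replay_latest 2 3 would replay the last session at twice the speed, with pauses capped at 3 seconds.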

For AIX, you can find the necessary RPM here. It is slightly different, because it installs in /opt/freeware/bin, and sudosh.conf is also located in that directory. On both Linux and AIX, sudo must of course be installed before you can install and use sudosh.

Topics: Monitoring, PowerHA / HACMP

Cluster status webpage

How do you monitor multiple HACMP clusters? You're probably familiar with the clstat or xclstat commands. These are nice, but not sufficient when you have more than 8 HACMP clusters to monitor, as clstat can't be configured to monitor more than 8 clusters. It's also difficult to get an overview of ALL clusters in a SINGLE look with clstat. IBM included a clstat.cgi in HACMP 5 to show the cluster status on a webpage. This still doesn't provide an overview in a single look, as clstat.cgi shows a long listing of all clusters, and, just like clstat, it is limited to monitoring 8 clusters.

HACMP cluster status can be retrieved via SNMP (this is actually what clstat does too). Using the IP addresses of a cluster and the snmpinfo command, you can remotely retrieve cluster status information, and use that information to build a webpage. By using colors for the status of the clusters and the nodes (green = ok, yellow = something is happening, red = error), you can get a quick overview of the status of all the HACMP clusters.


Per cluster you can see: the cluster name, the cluster ID, HACMP version and the status of the cluster and all its nodes. It will also show you where any resource groups are active.

You can download the script here. Untar the file; the package contains a readme that explains how to configure the script. This script has been tested with HACMP versions 4 and 5, up to version 5.5.0.5.
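The color-coding idea can be illustrated in a few lines. This is not the downloadable script; it is a sketch that assumes you have already reduced each cluster's SNMP data to a "name state" line, where the state words "up" and "down" are placeholders of this example (how you derive them from snmpinfo output is left out):

```shell
#!/bin/bash
# Map a status word to a color: up = green, down = red, anything else = yellow.
status_color() {
    case "$1" in
        up)   echo green ;;
        down) echo red ;;
        *)    echo yellow ;;
    esac
}

# One HTML table row with a colored status cell.
html_row() {
    printf '<tr><td>%s</td><td style="background-color:%s">%s</td></tr>\n' \
        "$1" "$(status_color "$2")" "$2"
}

# Read "name state" lines on stdin and write a complete HTML page.
html_page() {
    local name state
    echo "<html><body><table>"
    while read -r name state; do
        [ -n "$name" ] && html_row "$name" "$state"
    done
    echo "</table></body></html>"
}
```

For example, printf 'clusterA up\nclusterB down\n' | html_page > clusters.html produces one green and one red row (cluster names are hypothetical).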

Topics: AIX, Monitoring, System Administration

Cec Monitor

To monitor all LPARs within one frame, use:

# topas -C

Topics: Monitoring, PowerHA / HACMP

HACMP auto-verification

HACMP automatically runs a verification every night, usually around midnight. With a very simple command you can check the status of this verification run:

# tail -10 /var/hacmp/log/clutils.log 2>/dev/null|grep detected|tail -1
If this shows a return code of 0, the cluster verification ran without any errors. Anything else, you'll have to investigate. You can run this command on all your HACMP clusters to verify your HACMP cluster status every day.
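To run the same check on many clusters from one place, the one-liner can be wrapped in an ssh loop. A sketch, assuming passwordless (key-based) ssh to each node; the function names and node names are hypothetical, and interpreting the printed line is still up to you:

```shell
#!/bin/bash
# Log file checked by the one-liner above
LOG=/var/hacmp/log/clutils.log

# Print "<node>: <last 'detected' line>" for one node, or a warning if none found.
check_node() {
    local node=$1 line
    line=$(ssh -o BatchMode=yes "$node" \
        "tail -10 $LOG 2>/dev/null | grep detected | tail -1")
    if [ -n "$line" ]; then
        echo "${node}: ${line}"
    else
        echo "${node}: no verification result found"
    fi
}

# Check every node given on the command line.
check_all() {
    local node
    for node in "$@"; do
        check_node "$node"
    done
}
```

For example, check_all nodeA nodeC nodeE prints one line per node, which could be mailed daily from cron.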

With the following smitty menu you can change the time at which the auto-verification runs, and whether it should produce debug output:
# smitty clautover.dialog
You can check with:
# odmget HACMPcluster
# odmget HACMPtimersvc
Be aware that if you change the time at which the auto-verification runs, you have to synchronize the cluster afterwards to update the other nodes in the cluster.

Topics: Monitoring, PowerHA / HACMP

HACMP Event generation

HACMP provides events, which can be used to monitor the cluster status most accurately, for example via the Tivoli Enterprise Console. Each change in the cluster status is the result of an HACMP event, and each HACMP event has an accompanying notify method that can be used to handle the kind of notification you want.

Interesting Cluster Events to monitor are:

  • node_up
  • node_down
  • network_up
  • network_down
  • join_standby
  • fail_standby
  • swap_adapter
  • config_too_long
  • event_error
You can set the notify method via:
# smitty hacmp
and then navigate to Cluster Configuration -> Cluster Resources -> Cluster Events -> Change/Show Cluster Events.
You can also query the ODM:
# odmget HACMPevent
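A notify method can be a simple script that logs to syslog and sends mail. This is a minimal sketch, assuming HACMP calls it with the event name as the first argument followed by the event's arguments (check the HACMP documentation for your version); the script path you configure, the MAILTO recipient and the function name are hypothetical:

```shell
#!/bin/bash
MAILTO="root"   # hypothetical recipient

# Log and mail one HACMP event; events from the list above that indicate
# a problem get a "CRITICAL:" prefix.
notify_event() {
    local event=$1
    shift
    local msg="HACMP event ${event} on $(uname -n): $*"
    case "$event" in
        node_down|network_down|fail_standby|config_too_long|event_error)
            msg="CRITICAL: ${msg}" ;;
    esac
    logger -t hacmp_notify "$msg"
    echo "$msg" | mail -s "HACMP ${event}" "$MAILTO"
    echo "$msg"
}

# Only act when called with an event name
if [ $# -gt 0 ]; then
    notify_event "$@"
fi
```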

Topics: AIX, Monitoring, System Administration

"Bootpd: Received short packet" messages on console

If you're receiving messages like these on your console:

Mar 9 11:47:29 daemon:notice bootpd[192990]: received short packet
Mar 9 11:47:31 daemon:notice bootpd[192990]: received short packet
Mar 9 11:47:38 daemon:notice bootpd[192990]: hardware address not found: E41F132E3D6C
then bootpd is enabled on your server. There's nothing wrong with that; a NIM server, for example, requires it. However, these messages on the console can be annoying. Some systems on your network are sending bootp requests (broadcasts). Your system listens to these requests and tries to answer them. It looks in the bootptab configuration file (/etc/bootptab) to see if their MAC addresses are defined. When they aren't, you get these messages.

To solve this, either disable the bootpd daemon or change the syslog configuration. If you don't need the bootpd daemon, edit the /etc/inetd.conf file and comment out the entry for bootps. Then run:
# refresh -s inetd
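Commenting out the bootps entry can also be scripted. A sketch (the function name is hypothetical) that keeps a backup copy and uses plain sed with a redirect, since AIX sed has no in-place option; run refresh -s inetd afterwards as shown above:

```shell
#!/bin/bash
# Comment out an active bootps line in an inetd.conf-style file.
# The original is kept as <file>.bak.
disable_bootps() {
    local conf=${1:-/etc/inetd.conf}
    cp -p "$conf" "${conf}.bak" || return 1
    # Prefix lines starting with "bootps" followed by whitespace with "#"
    sed 's/^bootps[[:space:]]/#&/' "${conf}.bak" > "$conf"
}
```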
If you do need bootpd, update the /etc/syslog.conf file and look for the entry that starts with daemon.notice:
#daemon.notice /dev/console
daemon.notice /nsr/logs/messages
By commenting out the daemon.notice entry for /dev/console and adding an entry that logs to a file instead, you can keep these messages off the console. Now all you have to do is refresh the syslogd daemon:
# refresh -s syslogd
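To our understanding, syslogd on AIX does not create a missing destination file, so the new log file should exist before the refresh. A sketch (hypothetical function name) that touches every file destination named in a syslog.conf-style file, skipping comments and /dev entries:

```shell
#!/bin/bash
# Create any missing file destinations listed in a syslog.conf-style file.
# Lines look like: <selector> <destination> [rotate options...]
ensure_syslog_files() {
    local conf=${1:-/etc/syslog.conf} selector dest rest
    while read -r selector dest rest; do
        case "$selector" in ''|\#*) continue ;; esac   # blank or comment line
        case "$dest" in
            /dev/*) ;;                                 # devices such as /dev/console
            /*) [ -e "$dest" ] || touch "$dest" ;;     # regular log files
        esac
    done < "$conf"
}
```

Run ensure_syslog_files, then refresh -s syslogd.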

Topics: AIX, Backup & restore, Linux, Monitoring, TSM

Report the end result of a TSM backup

A very easy way of getting a report from a backup is to use the POSTSchedulecmd entry in the dsm.sys file. Add the following entry to your dsm.sys file (usually located in /usr/tivoli/tsm/client/ba/bin or /opt/tivoli/tsm/client/ba/bin):

POSTSchedulecmd "/usr/local/bin/RunTsmReport"
This entry tells the TSM client to run the script /usr/local/bin/RunTsmReport as soon as it has completed its scheduled command. Now all you need is a script that creates a report from the dsmsched.log file, the file that the TSM scheduler writes to. Set TSMLOG in the script to the location of your schedule log:
#!/bin/bash
# Mail the summary of the last scheduled TSM backup.
# TSMLOG must point to the schedule log (the SCHEDLOGNAME setting in dsm.sys).
TSMLOG=/tmp/dsmsched.log
MAILTO=email@address.com
HOST=$(hostname)
WRKDIR=/tmp
TAILFILE=${WRKDIR}/tsma.$$
REPORT=${WRKDIR}/tsmc.$$
trap 'rm -f "${TAILFILE}" "${REPORT}"' EXIT

tail -100 "${TSMLOG}" > "${TAILFILE}"
# Line number of the LAST "Elapsed processing time:" entry in those 100 lines
CT2=$(grep -n "Elapsed processing time:" "${TAILFILE}" | tail -1 | cut -d: -f1)
{
   echo "TSM Report from ${HOST}"
   if [ -n "${CT2}" ] ; then
      # Print from 14 lines before that entry up to the line after it
      START=$(( CT2 > 14 ? CT2 - 14 : 1 ))
      sed -n "${START},$(( CT2 + 1 ))p" "${TAILFILE}"
   else
      echo "No backup summary found in the last 100 lines of ${TSMLOG}"
   fi
} > "${REPORT}"
mail -s "${HOST} Backup" "${MAILTO}" < "${REPORT}"

Topics: Monitoring, PowerHA / HACMP, Security

HACMP 5.4: How to change SNMP community name from default "public" and keep clstat working

HACMP 5.4 supports changing the SNMP community name from the default "public" to something else. SNMP is used for clstatES communications, and using the "public" community name can be a security vulnerability, so changing it is advisable.

First, find out what version of SNMP you are using:

# ls -l /usr/sbin/snmpd
lrwxrwxrwx 1 root system 9 Sep 08 2008 /usr/sbin/snmpd -> snmpdv3ne
(In this case, it is using version 3).

Make a copy of your configuration file, which is located in /etc:
/etc/snmpd.conf <- Version 1
/etc/snmpdv3.conf <- Version 3
Edit the file and replace every occurrence of public with your new community name. Make sure the new community name is no longer than 8 characters.
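The backup and the replacement can be combined in one step. A sketch with a hypothetical function name; restricting the name to letters and digits is an extra safety choice of this example (it also keeps the sed expression safe), not an HACMP rule:

```shell
#!/bin/bash
# Back up an snmpd config file to <file>.orig and replace every "public"
# with a new community name of at most 8 alphanumeric characters.
change_community() {
    local conf=$1 name=$2
    case "$name" in
        ''|*[!A-Za-z0-9]*) echo "use letters and digits only" >&2; return 1 ;;
    esac
    [ ${#name} -le 8 ] || { echo "name longer than 8 characters" >&2; return 1; }
    # Never overwrite an earlier backup of the original file
    [ -e "${conf}.orig" ] && { echo "${conf}.orig already exists" >&2; return 1; }
    cp -p "$conf" "${conf}.orig" || return 1
    sed "s/public/${name}/g" "${conf}.orig" > "$conf"
}
```

For example, change_community /etc/snmpdv3.conf new for SNMP version 3. To back out, copy the .orig file back.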

Change the subsystems and restart them (in this example, the new community name is "new"):
# chssys -s snmpmibd -a "-c new"
# chssys -s hostmibd -a "-c new"
# chssys -s aixmibd -a "-c new"
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
Test using your localhost:
# snmpinfo -m dump -v -h localhost -c new -o /usr/es/sbin/cluster/hacmp.defs nodeTable
If the command hangs, something is wrong. Check the changes you made.

If everything works fine, make the same change on the other node and test again. Then test from one server to the other using the snmpinfo command above.

If you need to back out, restore the original configuration file and restart the subsystems. Note that in this case the argument is a pair of double quotes with no space between them:
# chssys -s snmpmibd -a ""
# chssys -s hostmibd -a ""
# chssys -s aixmibd -a ""
# stopsrc -s snmpd
# stopsrc -s aixmibd
# stopsrc -s snmpmibd
# stopsrc -s hostmibd
# startsrc -s snmpd
# startsrc -s hostmibd
# startsrc -s snmpmibd
# startsrc -s aixmibd
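The change and the backout differ only in the argument given to chssys, so both sequences can be wrapped in one function. A sketch with a hypothetical name, using the same subsystem order as the commands above; the pause between stopping and starting is an arbitrary choice:

```shell
#!/bin/bash
# Set the community argument of the SNMP sub-agents and restart them.
# Pass a name (e.g. "new") to change it, or "" to back out.
restart_snmp_agents() {
    local community=$1 args="" s
    [ -n "$community" ] && args="-c ${community}"
    for s in snmpmibd hostmibd aixmibd; do
        chssys -s "$s" -a "$args" || return 1
    done
    for s in snmpd aixmibd snmpmibd hostmibd; do
        stopsrc -s "$s"
    done
    sleep "${SNMP_RESTART_DELAY:-2}"   # give the subsystems time to stop
    for s in snmpd hostmibd snmpmibd aixmibd; do
        startsrc -s "$s" || return 1
    done
}
```

Usage: restart_snmp_agents new to apply the change, or restart_snmp_agents "" to back out.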
Now make the change to clinfoES and restart it on both nodes:
# chssys -s clinfoES -a "-c new"
# stopsrc -s clinfoES
# startsrc -s clinfoES
Wait a few minutes and you should be able to use clstat again with the new community name.

Disclaimer: If any application other than clinfoES uses snmpd with the default community name, you should change it as well. Check with your application team or software vendor.

Topics: AIX, Monitoring

Removing error report entries forever

There's a way to permanently prevent certain entries from appearing in the error report. You can use this, for example, for tape cleaning messages.

The following command shows you the entries that are written to the error log but not reported:

# errpt -t -F Report=0
Let's say you don't want any reports on errors with ID D1A1AE6F:
# errupdate [Enter]
=D1A1AE6F: [Enter]
Report=False [Enter]
[Ctrl-D]
[Ctrl-D]
With "Report=False", errors are still logged in your log file (usually /var/adm/ras/errlog). If you don't want them to be logged to the error log at all, for example when you have an errnotify method (which still starts an action, also for error IDs with "Report=False"), you can change "Report=False" to "Log=False".
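The interactive errupdate session can also be scripted by generating the stanza text. A sketch with a hypothetical function name; it assumes errupdate accepts the same "=ID:" plus keyword lines from a pipe as from the keyboard, which is worth verifying on a test system first:

```shell
#!/bin/bash
# Print one "=ID:" stanza per error ID with the given action
# (e.g. Report=False or Log=False). IDs must be 8 hex digits.
errupdate_stanza() {
    local action=$1 id
    shift
    for id in "$@"; do
        case "$id" in
            [0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]) ;;
            *) echo "skipping invalid error ID: $id" >&2; continue ;;
        esac
        printf '=%s:\n%s\n' "$id" "$action"
    done
}
```

For example, errupdate_stanza Report=False D1A1AE6F | errupdate.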

More info on this subject can be found here.
