How to detect hardware problems in Solaris 10

Solaris 10 introduced many great new features, including SMF (the Service Management Facility), which I blogged about earlier as a way to detect problems. Another wonderful tool is Solaris Fault Management, part of the Self-Healing technologies in Solaris 10. It monitors your system, and when it detects errors it flags and logs them, and you can act on them however you like.

Fortunately, Sun has made it really easy to use as a monitoring system. The following simple command is all you really need to know (you must run it as root):


# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------

It lists all the devices and subsystems that are currently marked as faulty. As you can see, my home server has no problems at the moment, and I'm not keen on causing one just for an example :-). Fortunately, the output has looked like this for a couple of releases, so it is very easy to script and run from cron. A simple script could look like this (you can also download the script here):

#!/bin/ksh

# Public domain. Use as you wish.

EMAIL=nickus@xxx.yyy
TMPFILE=/tmp/fmadm.output.$$

export PATH=/usr/bin:/usr/sbin

#
# run fmadm and cut away the first two lines
#
fmadm faulty | sed 1,2d >$TMPFILE

#
# check if the file size is greater than zero
# which means we got some output from fmadm and
# therefore some hardware may be bad
#
if [ -s "$TMPFILE" ]; then
        mailx -s "Hardware failed on `hostname`" "$EMAIL" < "$TMPFILE"
fi

rm -f $TMPFILE

As you can see, I simply strip the first two lines of the output, and if anything is left I email it. Very simple but very effective. Save the script, add a cron entry that runs it once an hour (or once a day), and you will have another great tool for monitoring your machine.
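The heart of the script is the question "is anything left after the two header lines?". A minimal sketch of that check as two small functions reading standard input, so the logic can be tried against the sample output above without breaking any hardware and without a temporary file. The names `fm_report` and `has_faults` are hypothetical, and the sketch assumes the header is always exactly two lines, as in the output shown earlier.

```shell
# Hypothetical helper: drop the two header lines of `fmadm faulty`
# output read from stdin and print whatever remains (the faults).
fm_report() {
        sed 1,2d
}

# Hypothetical helper: succeed only if fm_report leaves any output.
has_faults() {
        [ -n "$(fm_report)" ]
}

# In a real cron job this would be:  fmadm faulty | has_faults
# Here we feed it the empty report shown above instead.
sample='STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------'

if printf '%s\n' "$sample" | has_faults; then
        echo "faults reported"
else
        echo "no faults"
fi
```

To run the full script hourly, a root crontab entry such as `0 * * * * /usr/local/bin/fmcheck.ksh` would do; that path is only an example.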

Once you have repaired the hardware, you need to tell fmd (the fault manager daemon) that it has been fixed; the fmadm(1M) man page explains how.
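As a rough sketch of that last step, assuming the `fmadm repair` subcommand on your Solaris 10 release accepts the UUID that `fmadm faulty` prints (check fmadm(1M) on your system first), a hypothetical helper `mark_repaired` could first show the detailed fault record with `fmdump -v -u` and then mark the fault as repaired. Run it as root.

```shell
# Hypothetical helper: show the fault record for a UUID, then tell
# fmd that the faulty component has been repaired.
mark_repaired() {
        uuid=${1:-}
        if [ -z "$uuid" ]; then
                echo "usage: mark_repaired uuid" >&2
                return 1
        fi
        # Verbose fault log entry for this diagnosis
        fmdump -v -u "$uuid" || return 1
        # Notify the fault manager that the fault is fixed
        fmadm repair "$uuid"
}
```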

