7 Basic Solaris Troubleshooting Tips
When a user reports a problem with a server that you are responsible for, there are a few checks you should always do. On a Solaris 10 system, zones complicate this: the zone you are working in may not be the cause of the problem, but you can still feel the effects of a problem in another zone on the same machine.
1. First, check whether the process is alive. Do you expect the httpd process to be running? ps(1) will give you the answer.
$ ps -ef | grep httpd
If you are working in the global zone, it may be wise to add the -Z flag to ps so that you also get the name of the zone each process runs in.
$ ps -efZ | grep httpd
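If you want to repeat this check from a script, pgrep(1) is a handy alternative to piping ps into grep, because ps -ef | grep httpd also lists the grep command itself. The function below is a minimal sketch; the name check_proc is hypothetical. On Solaris 10, pgrep also accepts -z zonename to limit the match to one zone.

```shell
# check_proc NAME (hypothetical helper): report whether a process whose
# name is exactly NAME is running. pgrep exits 0 when at least one
# process matches and non-zero otherwise.
check_proc() {
    if pgrep -x "$1" > /dev/null 2>&1; then
        echo "$1 is running"
    else
        echo "$1 is NOT running"
        return 1
    fi
}
```

Running check_proc httpd prints either "httpd is running" or "httpd is NOT running"; in the second case it also returns a non-zero exit status, so the result can be tested in a larger script.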
2. Check the log files. On Solaris, the standard logs are /var/log/syslog and /var/adm/messages. If you use Blastwave packages, you can find their log files in /opt/csw/var/log.
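To pull likely problems out of a large log quickly, you can filter its most recent lines for the usual keywords. This is only a sketch: the helper name recent_errors, the default of 200 lines and the keyword list are assumptions you should adjust to your own logs.

```shell
# On Solaris 10, put the POSIX versions of tail and grep first in PATH
# (the directory simply does not exist on other systems, which is harmless).
PATH=/usr/xpg4/bin:$PATH

# recent_errors FILE [LINES] (hypothetical helper): show those of the last
# LINES lines (default 200) of FILE that mention error, warn, fail or
# panic, ignoring case.
recent_errors() {
    tail -n "${2:-200}" "$1" | grep -i -E 'error|warn|fail|panic'
}
```

For example, recent_errors /var/adm/messages 500 scans the last 500 lines of the system log.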
3. Are you out of disk space? df -h gives you easily readable output. Or are you out of inodes? This can happen quite easily on multi-terabyte UFS file systems, and df -o i will answer that question.
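Both df reports can be filtered the same way, because on each data line the second-to-last column is a percentage (capacity for df -k, %iused for df -o i) and the last column is the mount point. The sketch below, with the hypothetical name full_filesystems and a threshold passed as an argument, prints the file systems at or above that threshold. It should also cope with df printing a long device name on a line of its own.

```shell
# On Solaris 10, /usr/bin/awk is the old awk; put the POSIX awk first.
PATH=/usr/xpg4/bin:$PATH

# full_filesystems PCT (hypothetical helper): read "df -k" or "df -o i"
# output on stdin and print the mount point and percentage of every file
# system at or above PCT percent.
full_filesystems() {
    awk -v limit="$1" '
        # Data lines end in "<number>% <mount point>". This skips the header
        # and any line that holds only a long device name.
        NF >= 2 && $(NF - 1) ~ /%$/ {
            pct = $(NF - 1)
            sub(/%$/, "", pct)
            if (pct + 0 >= limit + 0) print $NF, $(NF - 1)
        }'
}
```

For example, pipe either df -k or df -o i into full_filesystems 90 to list everything that is at least 90% full.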
4. Do you have the expected network connectivity? Use ping(1M) to see whether the machine answers. Is the daemon in this example (httpd) listening on the correct port? From your own machine you can try to connect to that port on the server with telnet(1). Below is an example of what it looks like when the daemon answers.
$ telnet 192.168.127.137 80
Trying 192.168.127.137...
Connected to xyz.
Escape character is '^]'.
If nothing is listening on that port, you will get a connection refused message like this:
$ telnet 192.168.127.137 80
Trying 192.168.127.137...
telnet: connect to address 192.168.127.137: Connection refused
telnet: Unable to connect to remote host
In case of network problems, it is always worth checking that the correct netmask is set on every interface. If you have come this far and everything looks OK, it is time to look at the machine as a whole.
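On Solaris, ifconfig -a prints each netmask in hexadecimal (for example ffffff00), which is easy to misread. The sketch below, a hypothetical helper called netmasks, reads that output and prints every IPv4 address with its netmask in dotted-decimal form. It assumes the usual Solaris layout, where an interface line such as "e1000g0: flags=..." starts in the first column and is followed by indented inet lines.

```shell
# On Solaris 10, /usr/bin/awk is the old awk; put the POSIX awk first.
PATH=/usr/xpg4/bin:$PATH

# netmasks (hypothetical helper): read "ifconfig -a" output on stdin and
# print "interface address netmask" with the netmask in dotted-decimal form.
netmasks() {
    awk '
        # Convert a two-digit hex string such as "ff" to a number (255).
        function hexbyte(h,    digits) {
            digits = "0123456789abcdef"
            return (index(digits, substr(h, 1, 1)) - 1) * 16 + index(digits, substr(h, 2, 1)) - 1
        }
        # Interface lines start in column 1, e.g. "e1000g0: flags=...".
        /^[a-zA-Z]/ { iface = $1; sub(/:$/, "", iface) }
        # Address lines look like "inet 192.168.127.137 netmask ffffff00 ...".
        $1 == "inet" {
            for (i = 2; i < NF; i++) {
                if ($i == "netmask") {
                    m = tolower($(i + 1))
                    printf "%s %s %d.%d.%d.%d\n", iface, $2,
                        hexbyte(substr(m, 1, 2)), hexbyte(substr(m, 3, 2)),
                        hexbyte(substr(m, 5, 2)), hexbyte(substr(m, 7, 2))
                }
            }
        }'
}
```

For example, ifconfig -a | netmasks prints one line per IPv4 address, which makes a wrong netmask stand out.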
5. If the network “feels slow”, double-check the DNS entries and the DNS settings. The interesting files on a Solaris system are /etc/hosts, /etc/inet/ipnodes, /etc/resolv.conf and /etc/nsswitch.conf. Use nslookup(1) to make sure that your hostname resolves to the correct IP address and that the IP address resolves back to your hostname.
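nslookup queries the DNS servers directly, so it does not show the effect of /etc/hosts or /etc/nsswitch.conf on a lookup. getent(1M) goes through the name services configured in /etc/nsswitch.conf, the same path applications use, so it is a useful second opinion. The helper below is a hypothetical sketch that resolves a name, resolves the resulting address back, and reports whether the two agree; it accepts a fully qualified answer such as web01.example.com for the short name web01. It uses $(...), so run it from ksh or bash rather than the old Solaris /bin/sh.

```shell
# check_dns NAME (hypothetical helper): resolve NAME through nsswitch.conf,
# resolve the address back, and report whether the reverse answer matches.
check_dns() {
    name=$1
    # "getent hosts" prints: address canonical-name [aliases...]
    addr=$(getent hosts "$name" | awk 'NR == 1 { print $1 }')
    if [ -z "$addr" ]; then
        echo "$name: no address found"
        return 1
    fi
    back=$(getent hosts "$addr" | awk 'NR == 1 { print $2 }')
    case $back in
        "$name" | "$name".*)
            echo "$name -> $addr -> $back: OK" ;;
        *)
            echo "$name -> $addr -> ${back:-nothing}: MISMATCH"
            return 1 ;;
    esac
}
```

For example, check_dns $(uname -n) checks the machine's own name in both directions.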
6. Are there other runaway processes that may be causing a problem? prstat(1M) is useful for spotting them. If possible, run it from the global zone so that you see the processes in every zone.
$ prstat -Z
This command shows the busiest processes together with a summary of CPU and memory usage per zone.
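When several zones compete for CPU, it can help to capture a single non-interactive sample and sort the per-zone summary. The sketch assumes that, as in Solaris 10 prstat -Z output, the zone section starts with a header line beginning with ZONEID and that its last two columns are CPU and ZONE; zone_cpu is a hypothetical name.

```shell
# zone_cpu (hypothetical helper): read "prstat -Z" output on stdin and print
# "CPU% zone" for each zone, busiest first.
zone_cpu() {
    awk '
        $1 == "ZONEID" { inzones = 1; next }   # start of the per-zone summary
        $1 == "Total:" { inzones = 0 }         # end of one sample
        inzones && NF >= 2 && $(NF - 1) ~ /%$/ { print $(NF - 1), $NF }
    ' | sort -rn
}
```

For example, prstat -c -Z 5 1 | zone_cpu takes one five-second sample; -c makes prstat print its report as plain lines instead of redrawing the screen.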
7. Use iostat(1M) to see whether there are any disk-related issues. Maybe an Oracle process is hogging all the bandwidth on one channel.
$ iostat -xnM 1
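Here -x gives extended statistics, -n shows descriptive device names, -M reports throughput in megabytes per second, and the trailing 1 repeats the report every second. The values are the same whether you run the command from the global zone or a local zone.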
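With many disks, the interesting lines are easy to miss. The hypothetical helper below reads iostat -xn or iostat -xnM output and prints every device whose %b (percentage of time the device was busy) is at or above a threshold, together with asvc_t, the average service time in milliseconds. It assumes the standard 11-column layout ending in the device name. Keep in mind that the first report iostat prints covers the time since boot, so judge the later samples.

```shell
# On Solaris 10, /usr/bin/awk is the old awk; put the POSIX awk first.
PATH=/usr/xpg4/bin:$PATH

# busy_disks PCT (hypothetical helper): read "iostat -xn" output on stdin
# and print each device that is busy at least PCT percent of the time.
busy_disks() {
    awk -v limit="$1" '
        # Data lines have 11 fields ending in the device name; this skips the
        # "extended device statistics" banner and the column header (r/s ...).
        NF == 11 && $1 != "r/s" && $10 + 0 >= limit + 0 {
            print $11, "busy=" $10 "%", "asvc_t=" $8 "ms"
        }'
}
```

For example, iostat -xnM 5 3 | busy_disks 80 takes three five-second samples and lists the devices that are at least 80% busy.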
If you have any additional tests that you think belong in a basic troubleshooting guide, please leave a comment. I will update this post and turn it into a permanent page on the blog at a later stage.


If you are working with Solaris 10 or later systems, you should always run svcs -x to see which services are not running. It even points you to log locations, so you can check whether the event that caused the problem left any information about the failure. For example:
svc:/application/print/server:default (LP print server)
State: disabled since Wed Jul 11 08:11:44 2007
Reason: Disabled by an administrator.
See: http://sun.com/msg/SMF-8000-05
See: lpsched(1M)
Impact: 1 dependent service is not running. (Use -v for list.)
Indeed, I shouldn’t have forgotten that. I even wrote a blog post about it some weeks ago: How to use SMF to quickly detect problems
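If you check many machines, a one-line summary per problem service can be easier to scan than the full svcs -x report. This sketch, with the hypothetical name smf_problems, assumes the layout shown in the comment above: each entry starts with the service FMRI and contains a State: line.

```shell
# smf_problems (hypothetical helper): read "svcs -x" output on stdin and
# print one line per reported service: its FMRI followed by its state.
smf_problems() {
    awk '
        /^svc:\// { fmri = $1 }             # e.g. svc:/application/print/server:default
        $1 == "State:" { print fmri, $2 }   # e.g. State: disabled since ...
    '
}
```

For the example above, svcs -x | smf_problems prints "svc:/application/print/server:default disabled".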