Reduce Downtime with rsync

rsync is one of those tools that you keep finding new uses for. At heart it is just a fancy file mover, but it has an impressive feature set: among other things, you can limit bandwidth usage, resume interrupted transfers, copy files over ssh, and include or exclude files as you like.

The Practice of System and Network Administration devotes a whole chapter to planning downtime (chapter 12 in the first edition). As a sysadmin it is your job to minimize downtime as much as possible without sacrificing safety. In some cases rsync can be the perfect tool for this.

One scenario where I come back to rsync over and over again is moving an application to a new machine, e.g. a mail server. In a previous life I wanted to move a Cyrus IMAP server to a new hardware platform. At that time we had a mail spool of almost 30 GB, and moving all the files to the new hardware would take an estimated 10 hours. The problem with Cyrus (and one of its strengths) is that it saves every email as a separate file, which makes for a very large number of small files to copy. Ten hours was not really acceptable as a downtime window, so an alternative solution was needed. In the end I used rsync and got by with a total of 1 hour of downtime. Let me explain how.

I first shared the current mail spool with the new server using NFS. In this example the old mail spool is mounted under /mnt and the new mail spool will be located under /cyrus. Late one evening I started my first rsync command.

# cd /cyrus
# rsync -arv /mnt .

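This rsync copies all files from /mnt to /cyrus. The -a (archive) flag preserves permissions, owner/group settings, and timestamps, and already implies -r; -v lists each file as it is transferred. The run took about 10 hours, and I did it while the mail server was active. So in the morning I had a copy of my mail spool, but the copy was of course not consistent, since mail had kept arriving and being deleted while the copy ran.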

The next evening we did a trial run of the final migration. I ran this command and timed it carefully.

# cd /cyrus
# rsync -arv --delete /mnt .

Notice the --delete flag. It deletes every file in the destination that no longer exists in the source. This makes sure that all emails the users deleted on the old server during the previous day are deleted on the new server as well. Since I timed this command, I got a good approximation of how long the operation would take: around 1 hour. It was then time to announce the downtime window to the users. Fortunately I had very flexible users back then, and one day's warning for night work was OK.

The next evening, about 2 hours before the downtime window, I ran the previous command again, which synced the mail spool over to the new server once more. When the downtime window started, I stopped the mail server software on the old server (so I could get a final consistent copy) and then ran the same rsync command a third time. Since only about an hour had passed since the previous run finished, it took just 15 minutes. I then had to make some changes to the internal DNS servers and bring up the Cyrus software on the new server, and everything was ready.

So with a little bit of planning and careful use of rsync I could reduce my downtime window from 10 hours to 1 hour.
