
5 advanced rsync tips for Linux sysadmins

Use rsync compression and checksums to better manage file synchronization.

In a previous article, Sysadmin tools: Using rsync to manage backup, restore, and file synchronization, I discussed cp and sftp and looked at the basics of rsync for moving files around. There are also a couple of other great articles here on Enable Sysadmin on tar and SSH that you should take a look at. Copying files to and from remote systems, and having an easy way to back up something you're working on (or, for that matter, critical company data), are basic, useful tools in the sysadmin toolbox that I use again and again.

Sometimes, however, you may want to do something a little more sophisticated, like move data across a less trusted or slower link. Run over SSH, rsync can encrypt data in transit, and its own options add compression to make the data flow better and checksums to ensure you get what you were expecting.


Maintaining a website

I first started using rsync to synchronize a local version of a website I administered back in the dark ages when CI/CD was just a twinkle in her father's eye. I could keep a local copy to work on and also have a backup of the latest version of the site. I'll use that scenario as my example. You can use rsync to synchronize any remote filesystem for backups or as a quick way to create a test-to-production pipeline. I've also used it to sync a directory and then used tar to create local backups:

skipworthy:~/enable/websync$ ls -al
total 8
drwxrwxr-x 2 skipworthy skipworthy 4096 Dec 16 13:57 .
drwxrwxr-x 5 skipworthy skipworthy 4096 Dec 16 14:01 ..
skipworthy:~/enable/websync$ rsync -aruv 192.168.11.111:/usr/share/httpd/enable ./
receiving incremental file list
enable/
enable/bar
enable/foo
enable/index

sent 85 bytes  received 229 bytes  209.33 bytes/sec
total size is 0  speedup is 0.00
skipworthy:~/enable/websync$ ls -l
total 4
dr-xr-xr-x 2 skipworthy skipworthy 4096 Dec 16 13:49 enable

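As before with the local rsync, we're running in archive mode (-a), which recurses into subdirectories and preserves mtime and file attributes, so the extra -r is redundant but harmless. The -u option skips any file that is newer on the receiving side, so only data that is new or changed gets updated.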

Note: -v almost always means verbose, sending output to the console.

So let’s say I wanted to add a page to the site and upload it:

skipworthy:~/enable/websync$ rsync -aruv ./* 192.168.11.111:/usr/share/httpd/enable
sending incremental file list
pagetwo
rsync: recv_generator: mkdir "/usr/share/httpd/enable/enable" failed: Permission denied (13)

This is a good time to note that there are some things you need to think about when using rsync to push files. The account rsync runs as on the remote side needs write permission throughout the destination directory tree, not just on the top-level destination directory. There are several ways you can accomplish this. For one, if you are running an rsync daemon, you can specify the uid and gid it uses in /etc/rsyncd.conf. Another way is to run rsync as a user with the required permissions. Both of these can be problematic in a secure environment, so tread carefully here.

Also, note that how you name the remote side determines how rsync connects. The host:/path form used above starts rsync on the remote host through a remote shell, which is SSH by default in current rsync releases. The host::module and rsync://host/module forms instead connect directly to an rsync daemon, which listens on TCP port 873 by default. You need to think about that when setting up firewall rules and permissions. In addition, there are SELinux permissions, which are a whole other discussion and not within the scope of this article. One solution to the directory permissions problem, and to security concerns in general, is to use SSH (this should sound familiar by now). SSH sets up an encrypted tunnel and can be set to listen on any port, and you can specify an SSH key to further secure the connection and make remote connections a bit more automation-friendly.

Advanced features

For this next example, I'm going to push these changes as root. Please don't be like me:

rsync -aruv -e ssh  ./* root@192.168.11.111:/usr/share/httpd/enable
root@192.168.11.111's password:
sending incremental file list
pagetwo

sent 246 bytes  received 36 bytes  43.38 bytes/sec
total size is 31  speedup is 0.11

Note again that the usual SSH options are available for these connections, including the port and key location. The -e option selects the remote shell command that rsync runs, so you can pass SSH options on the command line there or set them per host in ~/.ssh/config. Whichever remote shell you choose has to be installed and reachable on both ends.
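
As a rough sketch of passing SSH options through -e, the snippet below wraps the push in a shell function. The port (2222), key path, and the non-root deploy user are all assumptions made up for this example; only the host address and destination path come from the session above.

```shell
# Hypothetical connection details; substitute your own.
ssh_port=2222
ssh_key="$HOME/.ssh/id_ed25519"
remote="deploy@192.168.11.111:/usr/share/httpd/enable/"

# The remote shell command that rsync will run, with SSH options included.
remote_shell="ssh -p $ssh_port -i $ssh_key"

# Wrapping the transfer in a function means nothing runs until it is called.
push_site() {
    # "$@" is the list of local files or directories to send.
    rsync -aruv -e "$remote_shell" "$@" "$remote"
}

# Inspect the remote shell string before a real run.
echo "$remote_shell"

# push_site ./pagetwo    # uncomment to push to a real host
```

Putting the same port and key in a Host entry in ~/.ssh/config would let you drop the options from the command line entirely.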

Checksums

I mentioned checksums earlier, and there are two potentially useful things here. First, rsync always verifies each transferred file against a whole-file checksum once the file has been rebuilt on the receiving side, which will warn you on the off chance of data getting lost or damaged in flight.

Second, you can also use checksums to determine which files to transfer (that is, which files are actually different between source and destination). By default, rsync makes that decision by comparing file size and modification time, so --checksum is useful if you aren't sure the mtime represents the actual version of the file you want. I've had filesystems that I was trying to sync that were getting touched by another app, so the times were incorrect.

skipworthy:~/enable/websync$ rsync -aruv -e ssh --checksum 192.168.11.111:/usr/share/httpd/enable ./
receiving incremental file list

sent 26 bytes  received 364 bytes  780.00 bytes/sec

Note: This incurs additional overhead, mostly disk reads and CPU time, since rsync reads every file on each side to generate a checksum and then compares them.

Compression

One more useful trick is compression. The -z option compresses the data stream. In rsync 3.2.0 and later, --zc (short for --compress-choice) sets the compression algorithm and --zl (short for --compress-level) sets the level:

skipworthy:~/enable/websync$ rsync -aruv -e ssh --zc=zlib --zl=6 192.168.11.111:/usr/share/httpd/enable ./
receiving incremental file list

sent 26 bytes  received 248 bytes  548.00 bytes/sec
total size is 31  speedup is 0.11

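If you don't specify the algorithm, rsync negotiates one that both ends support from its built-in list of choices; you can override that list with the RSYNC_COMPRESS_LIST environment variable. If you don't specify the level, rsync uses the default level for the chosen algorithm.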


Wrap up

So there you have it: rsync is yet another one of those tools that is still useful and relevant to Linux systems administration—a tool I have been glad to have many times in my career. There are many, many more things you can do with it that we did not have room to explore here—as always, check the man pages!


Glen Newell

Glen Newell has been solving problems with technology for 20 years. As a Systems Engineer and administrator, he’s built and managed servers for Web Services, Healthcare, Finance, Education, and a wide variety of enterprise applications.
