In a previous article entitled Sysadmin tools: Using rsync to manage backup, restore, and file synchronization, I discussed sftp and looked at the basics of rsync for moving files around. There are also a couple of other great articles here on Enable Sysadmin on tar and SSH that you should take a look at. Copying files to and from remote systems and having an easy way to run a backup of something you're working on (or, for that matter, critical company data) are basic, useful tools in the sysadmin toolbox that I use again and again. Sometimes, however, you may want to do something a little more sophisticated, like move data across a less trusted or slower link. Rsync can provide encryption (by running over SSH) to protect data in transit, compression to make it flow better, and checksums to ensure you get what you were expecting.
Maintaining a website
I first started using rsync to synchronize a local version of a website I administered back in the dark ages when CI/CD was just a twinkle in her father's eye. I could keep a local copy to work on and also have a backup of the latest version of the site. I'll use that scenario as my example. You can use rsync to synchronize any remote filesystem for backups or as a quick way to create a test-to-production pipeline. I've also used it to sync a directory and then used tar to create local backups:
skipworthy ~ enable websync ls -al
total 8
drwxrwxr-x 2 skipworthy skipworthy 4096 Dec 16 13:57 .
drwxrwxr-x 5 skipworthy skipworthy 4096 Dec 16 14:01 ..
skipworthy ~ enable websync rsync -aruv 192.168.11.111:/usr/share/httpd/enable ./
receiving incremental file list
enable/
enable/bar
enable/foo
enable/index
sent 85 bytes received 229 bytes 209.33 bytes/sec
total size is 0 speedup is 0.00
skipworthy ~ enable websync ls -l
total 4
dr-xr-xr-x 2 skipworthy skipworthy 4096 Dec 16 13:49 enable
As before with the local rsync, -a runs in archive mode, preserving mtime and file attributes and recursing into subdirectories (so the extra -r is redundant but harmless), and -u skips any file that is newer on the receiving side, so only data that is new or changed gets updated. As it almost always does, -v means verbose, sending output to the console.
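The sync-then-archive routine mentioned above can be sketched as a small shell function. This is only a minimal sketch: the function name sync_and_archive, the MIRROR and BACKUPS directories, and the archive naming scheme are assumptions for illustration, and the remote path is simply the one from the example.

```shell
#!/bin/sh
# Hypothetical defaults; override them in the environment to match your setup.
REMOTE="${REMOTE:-192.168.11.111:/usr/share/httpd/enable}"
MIRROR="${MIRROR:-./websync}"
BACKUPS="${BACKUPS:-./backups}"

# Pull the remote tree into a local mirror, then write a dated tarball of it.
sync_and_archive() {
    mkdir -p "$MIRROR" "$BACKUPS" || return 1
    # -a archive mode (implies recursion), -u skip files newer locally, -v verbose.
    # rsync's report goes to stderr so that stdout carries only the archive path.
    rsync -auv "$REMOTE" "$MIRROR/" >&2 || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    # -C changes into the mirror first so the archive holds relative paths.
    tar -czf "$BACKUPS/site-$stamp.tar.gz" -C "$MIRROR" . || return 1
    echo "$BACKUPS/site-$stamp.tar.gz"
}
```

Calling sync_and_archive prints the path of the new archive, so it can be captured with something like latest=$(sync_and_archive). Keeping the backups outside the mirror directory avoids archiving old archives.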
So let’s say I wanted to add a page to the site and upload it:
skipworthy ~ enable websync rsync -aruv ./* 192.168.11.111:/usr/share/httpd/enable
sending incremental file list
pagetwo
rsync: recv_generator: mkdir "/usr/share/httpd/enable/enable" failed: Permission denied (13)
This is a good time to note that there are some things you need to think about when using rsync to push files. Rsync needs permission to the whole directory tree, not just the destination directory. There are several ways you can accomplish this. For one, it's possible to specify the uid and gid of the rsync daemon in /etc/rsyncd.conf. Another way is to run rsync as a user with the required permissions. Both of these can be problematic in a secure environment, so tread carefully here.
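One way to tread carefully when pushing is to preview the transfer first. The sketch below uses rsync's --dry-run (-n) and --itemize-changes (-i) options to list what would be sent, and only pushes after confirmation. The function name push_with_preview and the prompt are assumptions for illustration, not part of rsync.

```shell
#!/bin/sh
# Show what a push would change, then run it for real only after a "y".
push_with_preview() {
    src=$1
    dest=$2
    # Preview only: -n (--dry-run) writes nothing, -i (--itemize-changes)
    # shows why each entry would be transferred.
    rsync -auvni "$src" "$dest" || return 1
    printf 'Apply these changes? [y/N] ' >&2
    read -r answer || return 1
    case $answer in
        y|Y) rsync -auv "$src" "$dest" ;;
        *)   echo "Nothing transferred." >&2 ;;
    esac
}
```

A call such as push_with_preview ./pagetwo 192.168.11.111:/usr/share/httpd/enable would list pagetwo in the preview. A dry run helps spot surprises, such as an unintended nested directory in the file list, but it does not guarantee that the real run will have the permissions it needs.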
Also, note that the way you address the remote side determines how rsync connects. With a single colon (host:/path), as in the examples here, rsync runs over a remote shell, which is SSH by default in current versions. With a double colon (host::module) or an rsync:// URL, you are connecting directly to the rsync service on port 873. You need to think about that when setting up firewall rules and permissions. In addition, there are SELinux permissions, which are a whole other discussion and not within the scope of this article. One solution to the directory permissions problem and security concerns, in general, is to use SSH explicitly (this should sound familiar by now). SSH sets up an encrypted tunnel and can be set to listen on any port, not to mention that you can specify an SSH key to further secure the connection and make remote connections a bit more automation-friendly.
For this next example, I'm going to push these changes as root. Please don't be like me:
rsync -aruv -e ssh ./* email@example.com:/usr/share/httpd/enable
firstname.lastname@example.org's password:
sending incremental file list
pagetwo
sent 246 bytes received 36 bytes 43.38 bytes/sec
total size is 31 speedup is 0.11
Note again that the usual SSH options are available for connections, including specifying the port and key location.
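As a sketch of passing those SSH options through, the -e option accepts a full ssh command line, so the port (-p) and key file (-i) can go there. The port number, key path, and the function name push_over_ssh below are hypothetical placeholders, not values from this setup.

```shell
#!/bin/sh
# Hypothetical values: adjust the port and key path for your environment.
SSH_PORT="${SSH_PORT:-2222}"
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_ed25519}"

# $1 = local source, $2 = destination in user@host:/path form
push_over_ssh() {
    # rsync splits the -e string into words itself, so a key path
    # containing spaces would need extra quoting inside the string.
    rsync -auv -e "ssh -p $SSH_PORT -i $SSH_KEY" "$1" "$2"
}
```

With a key loaded in an agent or stored without a passphrase, a call like this runs without a password prompt, which is what makes it friendlier to cron jobs and other automation.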
Rsync also allows you to select the remote shell option, so long as it's installed on both ends and configured in
I mentioned checksums earlier, and there are two potentially useful things here. First, rsync runs a checksum by default and then verifies it on the target, which will warn you on the off chance of data getting lost or damaged in flight.
Second, you can also use checksums to determine which files to transfer (i.e., which files are actually different between source and destination), which is useful if you aren't sure if the mtime/atime represents the actual version of the file you want. I've had filesystems that I was trying to sync that were getting touched by another app, so the times were incorrect.
skipworthy ~ enable websync rsync -aruv -e ssh --checksum 192.168.11.111:/usr/share/httpd/enable ./
receiving incremental file list
sent 26 bytes received 364 bytes 780.00 bytes/sec
Note: This incurs some additional overhead in processing and in the transfer since a checksum is generated for each file on each side and then compared.
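Because of that overhead, it can help to check first which files a checksum comparison would actually pick up. The sketch below combines -c (--checksum) with -n (--dry-run) and --out-format='%n' so that rsync only prints the names it would update and copies nothing. The function name list_content_diffs is an assumption for illustration.

```shell
#!/bin/sh
# Print the entries rsync would update when comparing by content, not by time.
# $1 = source (local or host:/path), $2 = destination
list_content_diffs() {
    # -r recurse, -c compare files by checksum, -n dry run (nothing is copied),
    # --out-format='%n' prints just the name of each entry that would be updated.
    rsync -rcn --out-format='%n' "$1" "$2"
}
```

For example, list_content_diffs 192.168.11.111:/usr/share/httpd/enable/ ./enable/ would print one name per line, and no output would suggest the two trees already match by content.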
One more useful trick is compression. The -z option will compress the stream, --zc sets the compression type, and --zl sets the level:
skipworthy ~ enable websync rsync -aruv -e ssh --zc=zlib --zl=6 192.168.11.111:/usr/share/httpd/enable ./
receiving incremental file list
sent 26 bytes received 248 bytes 548.00 bytes/sec
total size is 31 speedup is 0.11
If you don't specify the type or level, rsync negotiates a common compression type with the other end, working from its built-in preference list or from the list in the RSYNC_COMPRESS_LIST environment variable if you have set one, and uses that type's default level.
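A minimal sketch that wraps both cases: compression is always on, and the type and level are passed only when you set them, so rsync otherwise falls back to its own negotiation. The function name pull_compressed and the COMPRESS_CHOICE and COMPRESS_LEVEL variables are assumptions for illustration. Note that --zc and --zl need a reasonably recent rsync on both ends.

```shell
#!/bin/sh
# $1 = remote source (host:/path), $2 = local destination
pull_compressed() {
    src=$1
    dest=$2
    # Start from the options used throughout this article, plus -z.
    set -- -auv -z -e ssh
    # Hypothetical knobs: pin the type and level only when explicitly requested.
    if [ -n "${COMPRESS_CHOICE:-}" ]; then
        set -- "$@" "--zc=$COMPRESS_CHOICE"
    fi
    if [ -n "${COMPRESS_LEVEL:-}" ]; then
        set -- "$@" "--zl=$COMPRESS_LEVEL"
    fi
    rsync "$@" "$src" "$dest"
}
```

Setting COMPRESS_CHOICE=zlib and COMPRESS_LEVEL=6 before the call reproduces the command shown above. Leaving both unset runs a plain -z transfer.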
So there you have it: rsync is yet another one of those tools that is still useful and relevant to Linux systems administration, a tool I have been glad to have many times in my career. There are many, many more things you can do with it that we did not have room to explore here. As always, check the man pages!