It was late summer in 2000 when things went terribly awry in my new job at EDS. The backups that we needed had failed. I traced the failure's root cause to numbers 4 and 7 in my article, 10 things I wish I'd known before becoming a Linux sysadmin. I discovered that we hadn't had a good backup on the systems in question in at least three years. I discussed this failure with the backup and restore (BUR) team lead, and he and my manager were of the same opinion: it was my fault that the backups were bad. Here's the interesting part of the story: I'd been at this job for less than four months.

Other people in the group had varying levels of technical expertise, but one person was praised as a "guru" and, much to my chagrin, no one called her out for not checking the failed backups. My manager actually told me that I "should have been checking those backups," and that it was my responsibility to do so. It had been my errant assumption that the BUR team would verify the backups.

And, yes, I did bring up the fact that the backups hadn't worked for three years and that three years was as far back as the backups went. So, basically, it's likely that there had never been good backups of those systems.

I took responsibility, albeit under protest, and then also took on the action item of getting backups working on the twenty or more systems that monitored our infrastructure. It took me a couple of weeks to get it all going, to test, and to verify that the backups were working. And although I considered this task to be significant, I never heard a "good job" or "thank you" for my work. I assume my lack of accolades for a successful backup implementation was because it had been deemed my fault that the backups had never worked. 

My best advice to all system administrators is to verify backups for every system you touch or might have adjacent responsibility for, because eventually someone will most likely need to point a finger, and it could be at you.

Here's how I verify backups to ensure that they're working on my systems:

  • On each system, create a restore_test.txt file buried deep in the filesystem.
  • Create a script to scrape the backup logs for your restore_test.txt file.
  • Select a random system once per week and restore the restore_test.txt file.
  • Create a backup_restore_log.txt file and log your weekly progress.
  • Prepare to share the backup_restore_log.txt file with your manager in case of a failure, disaster, accident, or neglect.
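
As a rough illustration, the log-scraping, random-selection, and record-keeping steps above could be sketched in Python as follows. It assumes a plain-text backup log that mentions each backed-up path on its own line; the function names and the log entry format are hypothetical, and the restore itself is left to whatever backup tool you use.

```python
import random
from datetime import date

MARKER = "restore_test.txt"            # marker file name from the steps above
AUDIT_LOG = "backup_restore_log.txt"   # weekly record from the steps above


def marker_in_backup_log(log_path, marker=MARKER):
    """Return True if any line of the backup log mentions the marker file.

    Assumption: the backup tool writes a plain-text log that names
    each file it backed up.
    """
    with open(log_path, errors="replace") as log:
        return any(marker in line for line in log)


def pick_system(systems, rng=random):
    """Choose one system at random for this week's restore test."""
    return rng.choice(sorted(systems))


def record_result(audit_path, system, in_log, restored, when=None):
    """Append one line per weekly check to the audit log and return it.

    `restored` is whatever you observed after restoring the marker file
    by hand or with your backup tool; this sketch does not run a restore.
    """
    when = when or date.today()
    entry = (
        f"{when.isoformat()} system={system} "
        f"marker_in_log={in_log} restore_ok={restored}"
    )
    with open(audit_path, "a") as audit:
        audit.write(entry + "\n")
    return entry
```

A weekly run might call pick_system() on your list of hostnames, check that system's backup log with marker_in_backup_log(), restore the marker file, and then call record_result() with AUDIT_LOG so the outcome is on file.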

Hopefully, your work environment isn't as dysfunctional as mine was. But, just in case issues arise, be proactive in checking backups and verifying that you can restore a file from them. It's too important a task to leave to chance. Whether you have official responsibility or not, make it your job to verify that backups are being done and that they're working as expected.

About the author

Ken has used Red Hat Linux since 1996 and has written ebooks, whitepapers, actual books, thousands of exam review questions, and hundreds of articles on open source and other topics. Ken also has 20+ years of experience as an enterprise sysadmin with Unix, Linux, Windows, and Virtualization.

Follow him on Twitter: @kenhess for a continuous feed of Sysadmin topics, film, and random rants.

In the evening after Ken replaces his red hat with his foil hat, he writes and makes films with varying degrees of success and acceptance. He is an award-winning filmmaker who constantly tries to convince everyone of his Renaissance Man status, also with varying degrees of success and acceptance.
