It was late summer in 2000 when things went terribly awry in my new job at EDS. The backups that we needed had failed. I traced the failure's root cause to numbers 4 and 7 in my article, 10 things I wish I'd known before becoming a Linux sysadmin. I discovered that we hadn't had a good backup of the systems in question in at least three years. I discussed this failure with the backup and restore (BUR) team lead, and his and my manager's opinions were the same: it was my fault that the backups were bad. Here's the interesting part of the story: I'd been at this job for less than four months.
Other people in the group had varying levels of technical expertise, but one person was praised as a "guru" and, much to my chagrin, no one called her out for not checking the failed backups. My manager actually told me that I "should have been checking those backups," and that it was my responsibility to do so. I had wrongly assumed that the BUR team would verify the backups.
And, yes, I did bring up the fact that the backups hadn't worked for three years and that three years was as far back as the backups went. So, basically, it's likely that there had never been good backups of those systems.
I took responsibility, albeit under protest, and then also took on the action item of getting backups working on the twenty or more systems that monitored our infrastructure. It took me a couple of weeks to get it all going, to test, and to verify that the backups were working. And although I considered this task to be significant, I never heard a "good job" or "thank you" for my work. I assume my lack of accolades for a successful backup implementation was because it had been deemed my fault that the backups had never worked.
My best advice to all system administrators is to verify backups for every system you touch or might have adjacent responsibility for, because sooner or later someone will most likely need to point a finger, and it could be at you.
Here's how I verify backups to ensure that they're working on my systems:
- On each system, create a restore_test.txt file buried deep in the filesystem.
- Create a script to scrape the backup logs for your restore_test.txt file.
- Select a random system once per week and restore the restore_test.txt file.
- Create a backup_restore_log.txt file and log your weekly progress.
- Prepare to share the backup_restore_log.txt file with your manager in case of a failure, disaster, accident, or neglect.
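The log-scraping, random-selection, and logging steps above can be sketched in Python. This is only a minimal illustration, not the scripts I actually used: it assumes your backup tool writes a plain-text log that lists each backed-up path, and the function names, host names, and log line layout are all hypothetical. The restore itself depends entirely on your backup product, so it is left out and represented only by a `restored` flag that you set after attempting it.

```python
import random
from datetime import date
from pathlib import Path
from typing import Optional, Sequence

# File name taken from the checklist above.
CANARY_NAME = "restore_test.txt"


def canary_in_log(log_text: str, canary: str = CANARY_NAME) -> bool:
    """Return True if any line of the backup log mentions the canary file.

    Assumes a plain-text log that lists backed-up paths, one per line.
    """
    return any(canary in line for line in log_text.splitlines())


def pick_system(systems: Sequence[str],
                rng: Optional[random.Random] = None) -> str:
    """Choose one system at random for this week's restore test."""
    chooser = rng if rng is not None else random
    return chooser.choice(list(systems))


def format_entry(day: date, system: str, in_log: bool, restored: bool) -> str:
    """Build one line for backup_restore_log.txt (layout is hypothetical)."""
    status = "OK" if in_log and restored else "FAIL"
    return (f"{day.isoformat()} {system} "
            f"in_log={in_log} restored={restored} {status}")


def append_entry(log_path: Path, entry: str) -> None:
    """Append one entry to the running backup_restore_log.txt file."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(entry + "\n")


if __name__ == "__main__":
    # Hypothetical host names and log text, for illustration only.
    systems = ["mon01", "mon02", "mon03"]
    sample_log = "backed up /var/opt/a/b/c/restore_test.txt\n"

    target = pick_system(systems)
    found = canary_in_log(sample_log)
    # Set restored=True only after you have actually restored the file
    # with your backup tool and confirmed its contents.
    entry = format_entry(date.today(), target, found, restored=False)
    append_entry(Path("backup_restore_log.txt"), entry)
    print(entry)
```

Run once a week from cron or by hand, something like this leaves a dated trail that shows which system was tested and whether the restore succeeded.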
Hopefully, your work environment isn't as dysfunctional as mine was. But in case any issues arise, be proactive in checking backups and verifying that you can restore a file from them. It's too important a task to leave to chance. Whether you have official responsibility or not, make it your job to verify that backups are being done and that they're working as expected.
About the author
Ken has used Red Hat Linux since 1996 and has written ebooks, whitepapers, actual books, thousands of exam review questions, and hundreds of articles on open source and other topics. Ken also has 20+ years of experience as an enterprise sysadmin with Unix, Linux, Windows, and Virtualization.
Follow him on Twitter: @kenhess for a continuous feed of Sysadmin topics, film, and random rants.
In the evening after Ken replaces his red hat with his foil hat, he writes and makes films with varying degrees of success and acceptance. He is an award-winning filmmaker who constantly tries to convince everyone of his Renaissance Man status, also with varying degrees of success and acceptance.