It was late summer in 2000 when things went terribly awry in my new job at EDS. The backups that we needed had failed. I traced the failure's root cause to numbers 4 and 7 in my article, "10 things I wish I'd known before becoming a Linux sysadmin." I discovered that we hadn't had a good backup on the systems in question in at least three years. I discussed this failure with the backup and restore (BUR) team lead, and his and my manager's opinions were the same: it was my fault that the backups were bad. Here's the interesting part of the story: I'd been at this job for less than four months.

Other people in the group had varying levels of technical expertise, but one person was praised as a "guru" and, much to my chagrin, no one called her out for not checking the failed backups. My manager actually told me that I "should have been checking those backups" and that it was my responsibility to do so. I had wrongly assumed that the BUR team would verify the backups.

And, yes, I did bring up the fact that the backups hadn't worked for three years and that three years was as far back as the backups went. So, basically, it's likely that there had never been good backups of those systems.

I took responsibility, albeit under protest, and then also took on the action item of getting backups working on the twenty or more systems that monitored our infrastructure. It took me a couple of weeks to get it all going, to test, and to verify that the backups were working. And although I considered this task to be significant, I never heard a "good job" or "thank you" for my work. I assume my lack of accolades for a successful backup implementation was because it had been deemed my fault that the backups had never worked. 

My best advice to all system administrators is to verify backups for every system you touch or might have adjacent responsibility for, because someone will most likely eventually need to point a finger, and it could be at you. 

Here's how I verify backups to ensure that they're working on my systems:

  • On each system, create a restore_test.txt file buried deep in the filesystem.
  • Create a script to scrape the backup logs for your restore_test.txt file.
  • Select a random system once per week and restore the restore_test.txt file.
  • Create a backup_restore_log.txt file and log your weekly progress.
  • Prepare to share the backup_restore_log.txt file with your manager in case of a failure, disaster, accident, or neglect.
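
The steps above could be scripted along these lines. This is a minimal sketch, not the tooling I used at the time: it assumes your backup software writes a plain-text log that names each file it backed up, and the function names and the log line format are made up for illustration. The restore itself is left to whatever backup tool you run.

```python
import datetime
import hashlib
import random
from pathlib import Path

MARKER_NAME = "restore_test.txt"             # marker file from the steps above
RESULT_LOG = Path("backup_restore_log.txt")  # weekly log from the steps above


def marker_in_backup_log(log_path, marker=MARKER_NAME):
    """Return True if any line of the backup log mentions the marker file.

    Assumes a plain-text log; adjust for your backup tool's real format.
    """
    with open(log_path, errors="replace") as handle:
        return any(marker in line for line in handle)


def pick_system(systems, rng=random):
    """Choose one system at random for this week's restore test."""
    return rng.choice(sorted(systems))


def same_content(original, restored):
    """Compare SHA-256 digests of the original and the restored marker file."""
    def digest(path):
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest(original) == digest(restored)


def record_result(system, in_log, restored_ok, log_path=RESULT_LOG, today=None):
    """Append one dated line per test to the restore log and return that line."""
    today = today or datetime.date.today()
    line = (f"{today.isoformat()} {system} "
            f"in_backup_log={in_log} restore_ok={restored_ok}")
    with open(log_path, "a") as handle:
        handle.write(line + "\n")
    return line
```

A weekly run might call pick_system() on your host list, check that host's backup log with marker_in_backup_log(), restore the marker file with your backup tool, compare it with same_content(), and then call record_result(). The append-only log is deliberate: it gives you a dated history to hand to your manager.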

Hopefully, your work environment isn't as dysfunctional as mine was. But, just in case of any issues that might arise, be proactive in checking backups and verifying that you can restore a file from your backups. It's too important a task to leave to chance. Whether you have official responsibility or not, make it your job to verify that backups are being done and that they're working as expected.


About the author

Ken has used Red Hat Linux since 1996 and has written ebooks, whitepapers, actual books, thousands of exam review questions, and hundreds of articles on open source and other topics. Ken also has 20+ years of experience as an enterprise sysadmin with Unix, Linux, Windows, and virtualization.

Follow him on Twitter: @kenhess for a continuous feed of Sysadmin topics, film, and random rants.

In the evening after Ken replaces his red hat with his foil hat, he writes and makes films with varying degrees of success and acceptance. He is an award-winning filmmaker who constantly tries to convince everyone of his Renaissance Man status, also with varying degrees of success and acceptance.
