It was late summer in 2000 when things went terribly awry at my new job at EDS. The backups that we needed had failed. I traced the failure's root cause to numbers 4 and 7 in my article, 10 things I wish I'd known before becoming a Linux sysadmin. I discovered that we hadn't had a good backup of the systems in question in at least three years. I discussed this failure with the backup and restore (BUR) team lead, and he and my manager were of the same opinion: it was my fault that the backups were bad. Here's the interesting part of the story: I'd been at this job for less than four months.

The group included people with varying levels of technical expertise, but one person was praised as a "guru" and, much to my chagrin, no one called her out for not checking the failed backups. My manager actually told me that I "should have been checking those backups" and that it was my responsibility to do so. I had wrongly assumed that the BUR team would verify the backups.

And, yes, I did bring up the fact that the backups hadn't worked for three years and that three years was as far back as the backups went. So, basically, it's likely that there had never been good backups of those systems.

I took responsibility, albeit under protest, and then also took on the action item of getting backups working on the twenty or more systems that monitored our infrastructure. It took me a couple of weeks to get it all going, to test, and to verify that the backups were working. And although I considered this task to be significant, I never heard a "good job" or "thank you" for my work. I assume my lack of accolades for a successful backup implementation was because it had been deemed my fault that the backups had never worked. 

My best advice to all system administrators is to verify backups for every system you touch or might have adjacent responsibility for, because sooner or later someone will need to point a finger, and it could be at you.

Here's how I verify backups to ensure that they're working on my systems:

  • On each system, create a restore_test.txt file buried deep in the filesystem.
  • Create a script to scrape the backup logs for your restore_test.txt file.
  • Select a random system once per week and restore the restore_test.txt file.
  • Create a backup_restore_log.txt file and log your weekly progress.
  • Prepare to share the backup_restore_log.txt file with your manager in case of a failure, disaster, accident, or neglect.
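
The steps above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the tooling I used at the time: the function names, the log-line format, and the idea that your backup tool writes a plain-text log listing each file path are all hypothetical. The restore itself depends on your backup product, so it is left as a manual step whose result you record.

```python
import random
from datetime import date
from pathlib import Path
from typing import List, Optional

SENTINEL = "restore_test.txt"  # test file name from the list above


def sentinel_in_log(log_text: str, sentinel: str = SENTINEL) -> bool:
    """Return True if any line of the backup log mentions the test file.

    Assumes (hypothetically) that the backup tool logs each file path as text.
    """
    return any(sentinel in line for line in log_text.splitlines())


def pick_system(systems: List[str]) -> str:
    """Choose one system at random for this week's restore test."""
    return random.choice(systems)


def record_result(log_path: Path, system: str, restored: bool,
                  when: Optional[date] = None) -> str:
    """Append one dated line to backup_restore_log.txt and return it.

    The line format is an assumption chosen for illustration.
    """
    when = when or date.today()
    status = "OK" if restored else "FAILED"
    entry = f"{when.isoformat()} {system} restore of {SENTINEL}: {status}"
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(entry + "\n")
    return entry
```

A weekly run might call pick_system() on your host list, check that host's most recent backup log with sentinel_in_log(), restore the file with whatever your backup product provides, and then call record_result() with the outcome. Appending rather than overwriting keeps the full history available if you ever need to show it to your manager.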

Hopefully, your work environment isn't as dysfunctional as mine was. But in case issues do arise, be proactive in checking backups and verifying that you can restore a file from them. It's too important a task to leave to chance. Whether you have official responsibility or not, make it your job to verify that backups are being done and that they're working as expected.


About the author

Ken has used Red Hat Linux since 1996 and has written ebooks, whitepapers, actual books, thousands of exam review questions, and hundreds of articles on open source and other topics. Ken also has 20+ years of experience as an enterprise sysadmin with Unix, Linux, Windows, and Virtualization.

Follow him on Twitter: @kenhess for a continuous feed of Sysadmin topics, film, and random rants.

In the evening after Ken replaces his red hat with his foil hat, he writes and makes films with varying degrees of success and acceptance. He is an award-winning filmmaker who constantly tries to convince everyone of his Renaissance Man status, also with varying degrees of success and acceptance.
