Maybe this is too obvious for others out there, but a book I would recommend for sysadmins is Site Reliability Engineering (SRE), edited by Betsy Beyer, Chris Jones, et al. It’s not an obscure choice by any means; it may be one of the best-known titles among sysadmins everywhere. I recommend it because it’s easy to dismiss but, I think, game-changing in its own right.
For years, I’d ignored this SRE book on the basis that anything Google-scale could not possibly apply to what I did day to day. I reasoned that the masses of online discussion could be chalked up to fanboys and fangirls. Certainly, after a decade as a sysadmin, I would find nothing truly new in what was essentially a sysadmin handbook.
I was wrong. When I did finally pick up a copy and start to read it, my mind was changed within a few chapters. No, there’s no magical recipe for perfect system administration. Yes, it describes a job that focuses heavily on programming rather than "traditional" system administration. No, it is not a manual about how to be a system administrator.
Site Reliability Engineering describes exactly the challenges facing my team. We’re handling more servers per sysadmin than ever before: a ratio of hundreds to one, where ten years ago it was dozens to one. Even with better automation tools and more scripting, handling that scale is challenging, and a new workflow has to be developed to deal with the load.
SREs are arguably not sysadmins as we know the term, but they are the next generation of operations staff. This book discusses well-thought-out steps to transition a team from traditional sysadmins to a team of SREs, including the skills needed, practices to put into place as a team, and policies from leadership that support and enhance these changes. It is well worth the read, even as an individual contributor.
About the author
Chris Collins is an SRE at Red Hat and a Community Moderator for Opensource.com. He is a container and container orchestration, DevOps, and automation evangelist, and will talk with anyone interested in those topics for far too long and with much enthusiasm.