A few weeks ago, I talked with the venerable Ken Hess on the "Red Hat Enterprise Linux Presents …" live stream. The topic of discussion was general systems administration practices, and it became clear that Ken and I have very different opinions of what that is.
Both Ken and I worked in what I can only describe as the Gilded Age of systems administration. In this time, administrators would lovingly handcraft the systems that they administered. There was literally a guild, which still exists today: The System Administrators Guild. Compute hardware in the Unix space was also extremely expensive during this period, and as a result, administrators would often manage only 5-20 servers. My Silicon Graphics Indy workstation, pictured below, was about $26,000 when new.
In this era, we needed different skills to be effective administrators, and we spent a lot of our time doing tasks like:
Storage planning: Large disk drives were 1GB, and filesystems didn't support features like resizing. When you installed a system and set up its storage configuration, the sizes you chose for the different filesystems and their placement on the disk were important for ensuring the machine's longevity. If you chose poorly, you'd find yourself months later redoing the whole thing and restoring content from a backup.
Software management: Packaging of software was almost non-existent. Generally, you were downloading a source archive, compiling it, and then installing it on the machine. However, because that software wasn't packaged, you, as the administrator, then got to maintain it as well. This consisted of monitoring the project you had downloaded (like Apache) for updates to be issued. Once they were, you were then able to download the updated version, compile again, and install it. What fun, right?
Recompiling the kernel: If you were lucky, when you needed an extra device on your machine, like a tape library, scanner, or optical storage, the system's kernel would have the drivers for it. Often, however, it didn't. That meant that you got to recompile your kernel to add the driver or, again, if you were lucky, build a driver module for the kernel. What a great way to spend a day at work!
Managing individual processes: These systems were often not single-use. Often you had a system functioning as a webserver for some information, but also running data analytics jobs or rendering, running as the mail server, providing DNS services for the organization, and acting as a file server. Because the system did so many things, a runaway BIND process or Apache daemon could drastically impact your organization. This meant that you were checking on the systems and looking at their processes pretty frequently to catch these problems early or writing your own scripts to run in cron jobs to notify you of potential issues. Unlike most things we rely on today, we didn't have comprehensive monitoring applications. We had to write those ourselves.
Managing users: Because systems were multi-function, they were also used by multiple people who did different things. That meant that different people needed access to various systems. Therefore, you also managed individual accounts across your fleet of systems. There were some central user services, like NIS, but at this time, there wasn't great control over which systems a particular user was allowed to access. That meant that users could access any system in the organization if they were in the central user service. If you worked somewhere less open than that, you got to spend your time using userdel to maintain who had access to what systems.
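To give a flavor of the homegrown monitoring scripts mentioned above, here is a minimal sketch of a runaway-process check. It is written in Python purely for illustration, and the function names, the 80% CPU limit, and the sample data are all assumptions rather than anything we actually ran. The parser expects the three-column output of `ps -eo pid,pcpu,comm`.

```python
import subprocess

# Made-up snapshot in the same shape as `ps -eo pid,pcpu,comm` output.
SAMPLE = """\
  PID %CPU COMMAND
    1  0.0 init
  412 97.5 named
  530 12.0 httpd
"""


def read_ps_snapshot():
    """Capture a live process listing (requires a `ps` that accepts -eo)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_runaways(ps_output, cpu_limit=80.0):
    """Return (pid, cpu, command) for each process at or above cpu_limit."""
    runaways = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        parts = line.split(None, 2)           # pid, %cpu, command
        if len(parts) != 3:
            continue
        try:
            pid, cpu = int(parts[0]), float(parts[1])
        except ValueError:
            continue                          # ignore rows we cannot parse
        if cpu >= cpu_limit:
            runaways.append((pid, cpu, parts[2].strip()))
    return runaways


if __name__ == "__main__":
    # The demo uses SAMPLE; swap in read_ps_snapshot() for a live check.
    for pid, cpu, command in find_runaways(SAMPLE):
        print(f"possible runaway: {command} (pid {pid}) at {cpu:.1f}% CPU")
```

Run from cron, a script like this needs no notification logic of its own: cron normally mails whatever a job prints to the job's owner, so staying silent when nothing is wrong keeps the inbox quiet.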
Clearly, today, we have many technologies that have made these tasks obsolete, from central user management and monitoring services to packaging formats and better software and hardware ecosystems. That also means that we spend our time at work doing different tasks. Having all these improvements to our technology resources over the years means that we now administer much larger populations of systems. If before was the Gilded Age, now is the Industrialized Age of system administration. Larger populations of systems and deployment models like cloud mean that we are operating at a speed and efficiency that would have been impossible in the days of yore.
Today, I'd suggest that skills that allow administrators to perform tasks more efficiently or at a larger scale are more critical to have. Skills such as:
Standardization: Earlier, I talked about systems that had multiple purposes. However, having systems dedicated to a specific purpose means that you can administer all of them together. If one needs an update, they probably all need that update. If one is getting a new configuration setting, they probably all need that configuration setting.
Automation: Automation is also critical to standardization. If you discover that you need to apply an nginx update to all your web servers, you need a method to actually do that. Whether that means rolling your own tooling and scripts or using a framework like Ansible, you need to have an efficient, repeatable way to accomplish tasks across your systems.
Monitoring: With larger populations of systems to manage, you probably can't check them all. Using a monitoring method allows you to identify problems earlier. Monitoring, when combined with standardization and automation, allows what could have been a cascading failure to be caught early and resolved. For example, if one of your web servers has low disk space on one of its file systems, many of your systems of that type are probably in a similar state (though maybe not yet across the monitoring alert threshold). You could use your automation utilities to address the filesystem issue and apply that to the population to prevent near-future problems from occurring.
Reporting: As you get those larger populations, you can't look at them all individually. You need to collect data from them about their configuration, installed packages, and other features. Again, when combined with automation and standardization, this is a powerful tool as you can do things like apply updates across vast swaths of your population that need them. Equally important is knowing what is in that population. Recently, I was asked if we used a specific vendor's software in our environment. Because I regularly collect data about what is deployed, I was able to report, with confidence, that we did not. Further, I provided some additional data on things like when systems had their last maintenance and details of what that maintenance was. If needed, I can supply a history of actions performed on systems from the population level down to individual boxes.
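As a rough illustration of how monitoring and reporting feed each other, the sketch below works over a tiny, made-up inventory. The `Host` record, the host names, and the 90% alert threshold with a 15-point margin are all hypothetical; in practice this data would come from whatever monitoring and reporting tools you already run. One function finds hosts of a given role that are past the alert threshold or closing in on it, and the other answers the "do we run this software anywhere?" question.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """One record in a hypothetical fleet inventory."""
    name: str
    role: str
    disk_used_pct: float                      # usage of the watched filesystem
    packages: set = field(default_factory=set)


def hosts_near_threshold(fleet, role, alert_pct=90.0, margin_pct=15.0):
    """Split hosts of one role into (alerting, approaching) name lists."""
    alerting, approaching = [], []
    for host in fleet:
        if host.role != role:
            continue
        if host.disk_used_pct >= alert_pct:
            alerting.append(host.name)
        elif host.disk_used_pct >= alert_pct - margin_pct:
            approaching.append(host.name)     # not alerting yet, but close
    return alerting, approaching


def hosts_with_package(fleet, package):
    """Return the sorted names of hosts that have `package` installed."""
    return sorted(h.name for h in fleet if package in h.packages)


# Made-up sample data for the demonstration.
FLEET = [
    Host("web01", "web", 93.0, {"nginx", "openssl"}),
    Host("web02", "web", 81.0, {"nginx", "openssl"}),
    Host("web03", "web", 40.0, {"nginx"}),
    Host("db01", "db", 95.0, {"postgresql"}),
]

if __name__ == "__main__":
    alerting, approaching = hosts_near_threshold(FLEET, "web")
    print("alerting:", alerting)
    print("approaching:", approaching)
    print("hosts with vendorapp:", hosts_with_package(FLEET, "vendorapp"))
```

The "approaching" list is the useful part: it names the standardized systems that have not alerted yet but likely share the same problem, which makes them natural targets for the same automated fix.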
Just as countless industries have moved from individual, small-scale practitioners to larger industrialized processes, so must system administrators adapt. As the article's title suggests, system administration of the Gilded Age is dead; long live system administration in the Industrialized Age!