Red Hat on Red Hat: How collaboration can transform configuration management in IT

September 9, 2020 | Drew McMillen, Vikas Kumar

As in most IT organizations, configuration management at Red Hat can be complex, and it crosses both infrastructure and software development disciplines. Red Hat IT worked to remove much of that complexity and drive consistency by directly involving infrastructure, software, and information security engineers, as well as enterprise architects, from across the organization to create a set of clearly defined standards and best practices.

But before we get into our successes with this collaboration, it’s important to begin with a clear description of the business problems we faced and the specific challenges we needed to address. For starters, our then-current configuration management solution, Puppet, was neither what most of our team wanted to use nor what we were recommending to customers.

We made a point of adopting a true Red Hat solution for configuration management and focused our attention on Ansible as a replacement. Implementing Ansible would also let us address another concern: applying enterprise security standards specific to configuration management.

At the time, our environment was inconsistent and, frankly, spotty at best. In addition, Puppet code was often forked, so there were several branches of the master Puppet code, and some people were making changes in their branches without bringing those changes back to the master copy. We recognized that the Puppet code had not been maintained over time.

Related to these inconsistencies, our development teams were using Ansible “as needed,” with no standards or best practices to guide its use. Lastly, Puppet and Ansible were both used to provide some non-virtual machine configuration management, creating more inefficiency and confusion. To make matters worse, our development teams weren't communicating effectively with each other about how they were using Puppet and Ansible simultaneously, again because there were no standards, governance, or central guidelines to follow.

After assessing each of these business challenges, we set to work on establishing key objectives for our success as we transitioned to Ansible, based on the things we thought our best-in-class Configuration Management should be and do:

Drive consistency, reusability, and a predictable system state through loose coupling and standard patterns.
Be modern, supportable, usable, and focused on security and projected future needs (e.g. using OpenStack in our new datacenter), including modern application workloads (IaaS, PaaS, and eventually serverless) and on-demand environments.
Align with open hybrid cloud program requirements, including scalability and resiliency requirements.
Be environment and instance agnostic; configuration management should not enforce application environment strategies.
Support shared data and secrets between deployment models (IaaS, PaaS, etc.).
Facilitate offline application and CM development.
Make use of existing sources of truth rather than hard-coding or duplicating (e.g. IdM, AWS tags).
Encourage the creation of testable code.
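
To make the objectives about predictable system state and existing sources of truth more concrete, here is a minimal Python sketch. It is not Red Hat IT's actual tooling. It assumes a hypothetical in-memory inventory (`SOURCE_OF_TRUTH`) standing in for a system such as IdM or AWS tags, a hypothetical table of role defaults (`ROLE_DEFAULTS`), and two illustrative functions, `desired_state` and `converge`. The host name, setting names, and values are all invented for illustration.

```python
from typing import Dict, List

# Hypothetical source of truth (a stand-in for IdM or AWS tags).
SOURCE_OF_TRUTH: Dict[str, Dict[str, str]] = {
    "web01": {"role": "web", "env": "prod"},
}

# Hypothetical standard pattern: desired settings keyed by role, reused across hosts.
ROLE_DEFAULTS: Dict[str, Dict[str, str]] = {
    "web": {"ntp_server": "ntp.example.com", "selinux": "enforcing"},
}


def desired_state(host: str) -> Dict[str, str]:
    """Derive desired settings from the source of truth instead of hard-coding them."""
    role = SOURCE_OF_TRUTH[host]["role"]
    return dict(ROLE_DEFAULTS[role])


def converge(current: Dict[str, str], desired: Dict[str, str]) -> List[str]:
    """Apply only the settings that differ and return the keys that changed."""
    changed = []
    for key, value in sorted(desired.items()):
        if current.get(key) != value:
            current[key] = value
            changed.append(key)
    return changed


if __name__ == "__main__":
    host_settings = {"selinux": "permissive"}  # simulated current state of web01
    target = desired_state("web01")
    print(converge(host_settings, target))  # first run reports what changed
    print(converge(host_settings, target))  # second run changes nothing
```

Because `converge` only touches settings that differ from the desired state, a second run against the same host makes no changes, which is one way to read "predictable system state" in practice.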

With our leading indicators clearly defined, we drafted the following goals, in order of importance:

Enable engineers to easily automate the configuration of systems and applications as needed, avoiding unnecessary complexity.
Improve the process and consistency of implementing automation for infrastructure provisioning, orchestration and monitoring.
Use Red Hat’s Open Decision Framework where possible or practical.
Implement Red Hat products wherever it makes sense.
Reorient IT to a focus on cloud native architectures.
Define a process for making decisions about configuration management.
Contribute final artifacts to CM service owners for inclusion in service roadmaps.
Establish guidelines for building a decentralized community of CM contributors.

With so many goals to meet, we knew it was going to take a proverbial village to tackle our list holistically. Given the number of moving parts involved, it was clear that collaboration between our teams and their leaders, as well as executives, would be at the core of our project’s success.

That said, we took a uniquely Red Hat approach by baking collaboration into our recent configuration management revamp efforts. We created a “core working group,” representing a cross-section of infrastructure and development subject matter experts across multiple teams within IT. It was kept intentionally small to help ensure efficient and successful decision making and dialogue. The goal of the core working group was to create a unified, easy-to-use reference architecture with flexible components, driven by our enterprise standards to circumvent chaos and provide improved consistency.

We then created our second working group, called the “extended working group,” representing a wider cross-section of infrastructure and development subject matter experts in our decision making processes. Their collaboration was essential to informed decision making and organizational buy-in.

We clearly laid out the necessary roles, responsibilities, and expectations for members of the core working group and extended working group, along with a detailed RACI model and guiding principles (focused on the problems we were trying to solve). We articulated a clear definition of our current and desired future state for configuration management, and the expected outcomes from the working groups themselves.

In the end, we were able to open direct lines of communication between these groups to create a clear, implementation-agnostic document for base configurations and define a process for making decisions about configuration management (an achievement that is especially important in a decentralized organization like Red Hat).

If we had failed at this critical step, we risked not getting buy-in, because people in decentralized organizations tend to opt to make their own choices in the end. We also contributed final artifacts to CM service owners (ALM / Ansible and Provisioning Automation team) for inclusion in service roadmaps and successfully drafted and implemented guidelines for building a decentralized community of CM contributors.

Lessons learned along the way

Of course, these outcomes didn’t come easily, and once the transition was complete, we had an opportunity to reflect on the lessons we learned along the way. For one thing, we did not get the official buy-in we needed from the participants’ managers on the use of configuration management working group (CMWG) members’ time. That’s not to say they were intentionally deprioritizing the CMWG effort, but we should have been more intentional about ensuring the engineers’ time would be set aside to work on this initiative. More to the point, we had some of our most talented and in-demand engineers in this group, which made getting their time committed to our goals even more difficult.

We also spent too much time driving toward unanimous agreement on decisions instead of moving forward with the best idea. This was likely the result of having only two managers, already spread very thin with other business priorities, attempting to do all of the necessary project management and facilitation themselves for the duration of the project. It wasn’t that we had the wrong engineers participating, or that they didn’t have the right skill set, or that they weren’t motivated.

Frankly, it’s just a fact that you cannot make meaningful decisions on solutions at this scale in a single one-hour meeting once per week. Eventually, we had an honest conversation with the core working group, at which point we decided to convert our hour-long sync sessions into true working sessions. We also asked the managers of those associates for additional time. Unfortunately, we realized this late in the game; we should have changed the dynamic of those sessions much earlier, perhaps even at the beginning of the project. It also cannot be overstated that this project was especially difficult simply by virtue of its novelty: this was a brand new process that we had never attempted before, which never makes for easy, swift work, let alone success.

Despite these reflections, we were still able to achieve the original goals we set out to accomplish as a team. We have no doubt that these lessons learned will be considered in our future cross-functional endeavors, especially as we set our sights on the new datacenter we are now building. In future posts, we’ll be sure to address the ways in which effective collaboration played a role in this effort as well, as it continues to be a hallmark of our unique Red Hat culture, especially within our IT organization.

About the authors

Drew McMillen

Senior Manager, IT Compute Services, Red Hat

Drew McMillen is Senior Manager, IT Compute Services at Red Hat.


Vikas Kumar

Senior Manager, North America Infrastructure Support and Operations, Red Hat

Vikas Kumar has been with Red Hat since 2006. He works closely with Red Hat product implementations in Red Hat IT, and looks for opportunities to provide feedback, and to collaborate on new features and customer engagement with Red Hat's Product and Technology organization.
