Architecting Containers Part 4: Workload Characteristics and Candidates for Containerization

April 21, 2016 | Scott McCarty (fatherlinux) | 6-minute read

Many development and operations teams are looking for guidelines to help them determine which applications can be containerized and how difficult it may be. In Architecting Containers Part 3: How the User Space Affects Your Applications, we took an in-depth look at how the user space affects applications for both developers and operations. In this article, we are going to take a look at workload characteristics and the level of effort required to containerize different types of applications.

The goal of this article is to provide guidance based on current capabilities and best practices within the Docker, Kubernetes and OpenShift context. Much of this guidance is subject to change because of how rapidly container technology is moving, but this article will continue to be updated over time.

Guidelines

For about a year, we have been telling customers that if an application has good separation of code, configuration and data - then the application is a candidate for containerization. As time has gone on, though, we have added more and more variables to the conversation.

The following chart describes some general characteristics of typical workloads seen in the data center. While this isn’t everything that one needs to worry about, it is a good starting point.

	Easy	Moderate	Difficult
Code	Completely Isolated (single process)	Somewhat Isolated (multiple processes)	Self Modifying (e.g. Actor Model)
Configuration	One Configuration File	Several Configuration Files	Configuration Anywhere in Filesystem
Data	Data Saved in Single Place	Data Saved in Several Places	Data Anywhere in Filesystem
Secrets	Static Files	Network	Dynamic Generation of Certificates
Network	HTTP, HTTPS	TCP, UDP	IPSEC, Highly Isolated
Installation	Packages, Source	Installers (install.sh) and Understood Configuration	Installers (install.sh)
Licensing	Open Source	Proprietary	Restrictive & Proprietary
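One way to put the chart to work is to encode it as data and look up each characteristic of a candidate application. The following is only a rough sketch in Python: the names CHART, LEVELS and overall_difficulty are made up for illustration, and the rule that the hardest single characteristic sets the overall level is an assumption, not something the chart itself prescribes.

```python
# Difficulty levels in increasing order, taken from the chart's column headers.
LEVELS = ["Easy", "Moderate", "Difficult"]

# The chart, encoded as category -> traits ordered Easy, Moderate, Difficult.
CHART = {
    "Code": ["Completely Isolated (single process)",
             "Somewhat Isolated (multiple processes)",
             "Self Modifying (e.g. Actor Model)"],
    "Configuration": ["One Configuration File",
                      "Several Configuration Files",
                      "Configuration Anywhere in Filesystem"],
    "Data": ["Data Saved in Single Place",
             "Data Saved in Several Places",
             "Data Anywhere in Filesystem"],
    "Secrets": ["Static Files", "Network",
                "Dynamic Generation of Certificates"],
    "Network": ["HTTP, HTTPS", "TCP, UDP", "IPSEC, Highly Isolated"],
    "Installation": ["Packages, Source",
                     "Installers (install.sh) and Understood Configuration",
                     "Installers (install.sh)"],
    "Licensing": ["Open Source", "Proprietary", "Restrictive & Proprietary"],
}


def overall_difficulty(traits):
    """Return the hardest level found among category -> trait answers.

    Assumption (ours, not the chart's): the single hardest
    characteristic determines the overall level.
    """
    worst = 0
    for category, trait in traits.items():
        # The position of the trait in its row is its difficulty index.
        worst = max(worst, CHART[category].index(trait))
    return LEVELS[worst]


if __name__ == "__main__":
    app = {
        "Code": "Completely Isolated (single process)",
        "Configuration": "One Configuration File",
        "Network": "TCP, UDP",
    }
    print(overall_difficulty(app))
```

Categories left out of the answers are simply ignored, so a partial assessment still yields a result. A weighted score would be another reasonable design, but the chart gives no weights to base one on.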

Code, Configuration, and Data (...and Secrets)

As mentioned above, if an application has good separation of code, configuration, and data, it is much easier to containerize; classic examples include Apache and MySQL.

On a Red Hat Enterprise Linux system, the binaries for both Apache and MySQL are provided via RPMs (Server Packages and Red Hat Software Collections). Configuration is cleanly located in /etc/my.cnf or /etc/httpd/conf/httpd.conf. Data is also conveniently located in /var/lib/mysql or /var/www/html.

Since there is clean separation of code, configuration, and data, it is quite easy to bind mount a data volume. It is also easy to inject configuration and use secrets files to provide certificates, etc. If a workload matches these patterns, it should be a lot easier to containerize.
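To make the bind mount idea concrete, here is a small Python sketch that assembles a docker run command line from a list of configuration files and data directories. It only builds and prints the command; nothing is executed. The helper name docker_run_args and the image name example/mysql are placeholders invented for this illustration, and mounting each path at the same location inside the container is an assumption made for simplicity.

```python
def docker_run_args(image, config_files, data_dirs):
    """Build a `docker run` argument list with one bind mount per path.

    Configuration is mounted read-only (:ro), data read-write.
    Each host path is mounted at the same path inside the container,
    which is a simplifying assumption for this sketch.
    """
    args = ["docker", "run", "-d"]
    for path in config_files:
        args += ["-v", f"{path}:{path}:ro"]
    for path in data_dirs:
        args += ["-v", f"{path}:{path}"]
    args.append(image)
    return args


if __name__ == "__main__":
    # "example/mysql" is a hypothetical image name.
    cmd = docker_run_args("example/mysql", ["/etc/my.cnf"], ["/var/lib/mysql"])
    print(" ".join(cmd))
```

The cleaner the separation, the shorter these two lists are. An application with configuration scattered across the filesystem would need a mount (or a rebuild of the image) for every one of those locations.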

Network

Docker and Kubernetes were designed to support standard web services. Advanced networking gets a bit trickier. As you try to serve traffic over raw UDP or TCP sockets, or even try to connect pods together with dynamically generated SSL certificates, you will find that it gets more difficult. That’s not to say that it’s impossible... but you might have to “do” more engineering, as there are fewer features in the platform to support these kinds of workloads.

Another interesting use case is network scanners. Unlike the workloads above, these tools (and others like them) are quite easy to containerize and can be run as Super Privileged Containers.

Installation

The installation process is often overlooked, but moving to containers implies that you will be reinstalling every time you build a new version of the container image. This means that having a clean and well understood installation process will greatly simplify your Dockerfiles. This can be the difference between getting things done in five minutes... or spending weeks trying to reverse engineer it.

Many assume that software just comes as a tarball, RPM, or DEB package, but this is not true of a lot of enterprise applications. When software is delivered as binaries in an open package format or even as source code, it’s quite easy to containerize the application. More often, enterprise applications are delivered with an install.sh file, which is built by ISVs or projects in order to simplify installation across different platforms.

Sometimes these installation scripts can be reverse engineered and broken out into environment variables, but sometimes they are quite complex and make many configuration changes all over the filesystem. Some teams have even used the installer scripts in the Dockerfile to install a fresh copy of the software each time the container is built - others have used the installation script as the Entrypoint so that a fresh copy of the software is installed every time the container is launched.
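The two approaches mentioned above (running the installer at build time versus at launch time) differ by a single Dockerfile instruction. The Python sketch below renders both variants as text so they can be compared side by side. The function name dockerfile, the base image example/base and the /opt install location are all placeholders chosen for illustration, not taken from any particular product.

```python
def dockerfile(base_image, installer="install.sh", at_build_time=True):
    """Render a minimal Dockerfile that runs a vendor installer script.

    at_build_time=True  -> RUN: the software is installed once per image build.
    at_build_time=False -> ENTRYPOINT: the installer runs at every container start.
    """
    lines = [
        f"FROM {base_image}",
        f"COPY {installer} /opt/{installer}",
    ]
    if at_build_time:
        lines.append(f"RUN sh /opt/{installer}")
    else:
        lines.append(f'ENTRYPOINT ["sh", "/opt/{installer}"]')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "example/base" is a hypothetical base image name.
    print(dockerfile("example/base"))
    print(dockerfile("example/base", at_build_time=False))
```

In the launch-time variant, the installer itself becomes the container's main process, so it would also have to start the application (or hand off to it) for the container to keep running. That is a design consequence worth weighing before choosing this route.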

Some of these solutions increase the amount of work necessary to containerize an application, but none of them are show stoppers.

Licensing

As with installation, many in the container world just assume that software is open source and freely usable in a container. In a corporate environment this is simply not the case. Many applications that run on Red Hat Enterprise Linux are proprietary and they are going to be here for a long, long (long) time.

Some of these applications can still be containerized depending on how restrictive their licenses are. Read your End User License Agreement (EULA) and contracts carefully. Site licenses are common with some kinds of software (monitoring agents, logging agents, scanning tools, etc.), which makes such software quite easy to containerize. Other licenses are more restrictive. In these scenarios it’s best to check with your software vendor’s account team.

Workloads

A simpler way to characterize applications is by workload. In this section, we will use the above guidelines to take a look at how difficult it is to containerize different types of workloads. It is assumed that you are also adopting a container platform such as OpenShift.

First, we will take a look at easier applications, and then work our way towards the more difficult workloads.

Easy

The following list of applications very much fall in line with the targeted use cases for Docker, Kubernetes, and OpenShift. In fact, these applications are in the proverbial sweet spot for container orchestration as they will gain the biggest benefits from being containerized and orchestrated with OpenShift.

Web Based
Web Scale
Stateless
Stateful (with support for sharing state, e.g. JBoss EAP, Tomcat)
Require Persistent Storage (with good separation of data)
Applications using supported technologies from Red Hat (e.g. PHP, Python, Ruby, Perl, Node.js, Java - EAP and Tomcat, MySQL, PostgreSQL, MongoDB) and XPaaS technologies

Moderate

It is possible to containerize and run these applications on OpenShift, but they may not be low hanging fruit. Each of these workloads should be considered on a case-by-case basis and may need refactoring, engineering work, and complex implementations.

Applications which have compliance requirements (e.g. PCI compliance poses some network isolation challenges that the current per-project network isolation cannot solve).
Big data workloads. There are still some challenges with how data on a per pod basis is handled. It is possible to work around this, but currently Kubernetes does not support a simple object that does this easily with external volumes.
Apps that have custom or advanced routing needs. If the workload requires something other than what is provided today by the default routing mechanism in OpenShift, it will be more difficult to run in a containerized environment. There are quite a few enhancements on the roadmap, but until those are implemented, you’ll have to deal with these requirements on a case-by-case basis.
Applications which need to expose protocols other than HTTP(S)/WebSocket. These can be handled with NodePorts, but will need some work.
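As an illustration of the NodePort approach from the last item, the Python sketch below builds a Kubernetes Service manifest of type NodePort for a UDP workload and prints it as JSON, a format the Kubernetes API accepts alongside YAML. The helper name nodeport_service, the service name syslog, and the app label used as the pod selector are assumptions made up for this example.

```python
import json


def nodeport_service(name, port, protocol="TCP", node_port=None):
    """Build a v1 Service manifest of type NodePort as a plain dict.

    Assumption: the target pods carry the label app=<name>.
    If node_port is omitted, the cluster picks one from its NodePort
    range (30000-32767 by default).
    """
    port_spec = {"port": port, "targetPort": port, "protocol": protocol}
    if node_port is not None:
        port_spec["nodePort"] = node_port
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "NodePort",
            "selector": {"app": name},
            "ports": [port_spec],
        },
    }


if __name__ == "__main__":
    # Hypothetical example: a syslog receiver listening on UDP port 514.
    print(json.dumps(nodeport_service("syslog", 514, protocol="UDP"), indent=2))
```

The "some work" mentioned above typically sits outside the manifest itself: clients have to reach a node address on the allocated port, so firewalls, load balancers and DNS may all need attention.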

Difficult

These are applications that require major rearchitecture, refactoring or rewriting to be able to containerize them.

Highly proprietary software can be a challenge. Restrictive licenses may prevent moving and sharing the containerized application with a registry server. One workaround may be to refactor the workload to use non-proprietary software. Note that some proprietary vendors are providing docker images and some do have site license agreements for their software.
ERPs, CRMs, and legacy workloads such as COBOL applications running on mainframes and microcomputers. These applications were typically designed to have configuration settings and data all over the filesystem. Worse, the application may change its settings from within a web interface that was bolted on later.
.NET apps currently fall under this category, but this may change in the future once .NET becomes better supported on Linux.

Conclusion

Containers are really just “fancy processes” that run in their own pristine user space, so with enough time, money, and willpower almost any workload can be containerized - but, to no one’s surprise, some applications are more difficult to containerize than others. A recommended strategy is to start with the easier applications, gain some success and confidence, then move on to more difficult applications.

There is a tipping point, when several applications have been moved into containers and the value of containerization becomes clear to development and operations teams. They then will look to containerize everything that they can (...we’ve been through this phase). By this point, you will have a lot more experience and it will become more and more clear as to what workloads can be containerized.

Next time somebody asks you the question, “Can I containerize application XYZ?” - tell them, “...well it depends...” and, as a starting point, send them a link to this post!