Sean Cohen and Mauricio "Maltron" Leal
"Software is eating the world," and this is particularly true if you have to keep your solution running non-stop, even as changes are made to your production environment, without disrupting your users. Traditionally, we relied on APM vendors (APM stands for Application Performance Monitoring or Application Performance Management) to provide the tools needed to manage failures and pinpoint problems when they occur.
Although those vendors offered a key solution the market needed, over time customers started to realize they were locked into their vendor's solution. If for some reason we needed to change vendors, this would translate into a costly migration which could take years and end in yet another lock-in situation.
Observability solutions, roughly speaking, can be described in 4 simple steps:
- Creating Telemetry Signals (Instrumentation): This is the step where you decide where in your workload to focus your attention. The traditional way of doing this is to insert agents in your environment in order to collect telemetry signals. Another way is to add specific code to your solution in order to generate the information you need to track, usually done by Developers or SREs.
- Collecting, Aggregating or Exporting Telemetry Signals (Data Collection): The next step is about collecting the telemetry signals generated in the Instrumentation step and deciding what to do with them. Depending on the telemetry signal, we may want to hold it for enhancement (like adding labels), batching, resource detection or sampling. At the end of this step, we send the signals on for further processing and analysis.
- Processing Telemetry Signals (Data Processing): Once we have all the telemetry signals available, we can analyze them and correlate them with historical data to detect trends. Some vendors also employ AIOps to predict problems.
- Visualization: Finally, we're ready to show users how their solution is actually behaving and whether there are areas that need attention.
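To make the four steps more concrete, here is a minimal sketch in plain Python. It is only a toy model under stated assumptions: the `Signal` class, the four functions, the `checkout.duration_ms` signal name and the `service`/`env` labels are all hypothetical names made up for illustration, and none of this is OpenTelemetry's actual API.

```python
import time
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Signal:
    """One telemetry signal: a named measurement plus descriptive labels."""
    name: str
    value: float
    labels: dict = field(default_factory=dict)


def instrument(operation, name, emitted):
    """Step 1 (Instrumentation): time a call and record a signal for it."""
    start = time.perf_counter()
    result = operation()
    elapsed_ms = (time.perf_counter() - start) * 1000
    emitted.append(Signal(name, elapsed_ms))
    return result


def collect(signals, extra_labels, batch_size):
    """Step 2 (Data Collection): enrich signals with labels, then batch them."""
    enriched = [
        Signal(s.name, s.value, {**s.labels, **extra_labels}) for s in signals
    ]
    return [
        enriched[i:i + batch_size] for i in range(0, len(enriched), batch_size)
    ]


def process(batches):
    """Step 3 (Data Processing): aggregate the values per signal name."""
    by_name = defaultdict(list)
    for batch in batches:
        for s in batch:
            by_name[s.name].append(s.value)
    return {
        name: {"count": len(values), "mean": mean(values), "max": max(values)}
        for name, values in by_name.items()
    }


def visualize(summary):
    """Step 4 (Visualization): render the summary as plain-text lines."""
    return [
        f"{name}: count={s['count']} mean={s['mean']:.2f}ms max={s['max']:.2f}ms"
        for name, s in sorted(summary.items())
    ]


# Run the four steps end to end on a hypothetical workload.
emitted = []
for _ in range(5):
    instrument(lambda: sum(range(10_000)), "checkout.duration_ms", emitted)

batches = collect(emitted, {"service": "shop", "env": "demo"}, batch_size=2)
summary = process(batches)
report = visualize(summary)
print("\n".join(report))
```

In a real deployment each step would typically live in a different place (your application, a collection service, a vendor backend, a dashboard), which is exactly why the hand-off between steps matters.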
As the market evolved, many realized that step #1 (Instrumentation) and step #2 (Data Collection) could be open sourced, giving anyone a vendor-agnostic path to Observability. In those two steps, two projects stood out: OpenTracing and OpenCensus.
It's not uncommon in the Open Source world to have a few projects doing the same thing or taking different approaches to the same problem. Both communities understood they would benefit from joining forces and working together for the common good, and the two projects merged to form OpenTelemetry. As New Jersey Senator Cory Booker once said: "If you want to go fast, go alone; if you want to go far, go together."
The Dawn of OpenTelemetry
Since its inception at the Cloud Native Computing Foundation (CNCF), OpenTelemetry has quickly become one of the most popular projects, indicating clear market interest in Observability for Cloud Native workloads. Big players in the space such as Splunk, Red Hat, Dynatrace, Microsoft, Lightstep, Datadog and Amazon (just to name a few) support OpenTelemetry in different shapes and forms. On their respective websites they educate users about OpenTelemetry or promote their own approach to it, giving the market the freedom to choose how to bring Observability into an environment.
OpenTelemetry is an open-source framework for observability telemetry data collection, offering vendor-agnostic APIs for cloud native environments.
OpenTelemetry has many components, but we are going to mention just three of them:
- Specification: It describes cross-language requirements and expectations for all implementations. This includes a common API that all vendors agreed on, an SDK specific to each programming language, and a wire protocol called the OpenTelemetry Protocol (OTLP for short) for cross-communication.
- Instrumentation: It offers libraries for many programming languages that make your code observable out of the box.
- Collector: This is a service that collects all the data generated by your instrumentation and lets you decide what to do with it. You can run it locally next to your workload (acting as an agent) or gather signals into a single cluster before exporting them to a specific vendor. The Collector offers a unique way to leverage the power of OpenTelemetry without too many changes to your existing solution.
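The Collector's role can be pictured with a small plain-Python sketch. This is a toy model, not the Collector's real implementation or configuration format: `ToyCollector`, `add_environment`, the `env` label and the two "vendor" exporters are hypothetical names invented for illustration. What it tries to show is that the instrumented code keeps emitting the same signals while only the exporter changes.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


class ToyCollector:
    """Toy stand-in for a collector: receive -> process -> export in batches."""

    def __init__(
        self,
        processors: List[Callable[[Record], Record]],
        exporter: Callable[[List[Record]], None],
        batch_size: int = 3,
    ) -> None:
        self.processors = processors
        self.exporter = exporter
        self.batch_size = batch_size
        self._buffer: List[Record] = []

    def receive(self, record: Record) -> None:
        """Run each processor over the record, buffer it, flush when full."""
        for processor in self.processors:
            record = processor(record)
        self._buffer.append(record)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Hand any buffered records to the configured exporter."""
        if self._buffer:
            self.exporter(list(self._buffer))
            self._buffer.clear()


def add_environment(record: Record) -> Record:
    """Processor: enrich a record with a (hypothetical) environment label."""
    return {**record, "env": "staging"}


# Two pretend backends. Vendor A stores whole batches, vendor B stores
# individual records; the emitting code does not need to know which is used.
vendor_a_inbox: List[List[Record]] = []
vendor_b_inbox: List[Record] = []


def export_to_vendor_a(batch: List[Record]) -> None:
    vendor_a_inbox.append(batch)


def export_to_vendor_b(batch: List[Record]) -> None:
    vendor_b_inbox.extend(batch)


def emit_signals(collector: ToyCollector) -> None:
    """Stands in for instrumented application code; never changes."""
    for i in range(4):
        collector.receive({"name": "http.request", "duration_ms": 10 * (i + 1)})
    collector.flush()


collector = ToyCollector([add_environment], export_to_vendor_a)
emit_signals(collector)

# Switching vendors touches only the exporter, not the instrumentation.
collector.exporter = export_to_vendor_b
emit_signals(collector)

print(len(vendor_a_inbox), "batches sent to vendor A")
print(len(vendor_b_inbox), "records sent to vendor B")
```

The design choice worth noticing is that `emit_signals` is called unchanged for both vendors; all vendor-specific knowledge sits behind the exporter function.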
By harvesting the knowledge and experience of everyone involved in the Observability space, OpenTelemetry is helping us move toward a vendor-agnostic way of collecting telemetry signals from our environment, whoever in the organization does the collecting: Application Developers, SREs, System Administrators, Security Administrators and so on.
OpenTelemetry is slowly becoming the industry standard for telemetry data collection by giving you freedom in HOW you instrument and HOW you collect the data, before you decide who is going to give you the answers about the failures and performance of your solution.
If you are in an organization which needs to decide what telemetry signals are important to the business, then OpenTelemetry is definitely something you need to look at very closely. Also make sure your developers, SREs and systems administrators get a chance to evaluate it.
At the end of the day, users are the big winners.