Overview
Jaeger is open source software for tracing transactions between distributed services. It’s used for monitoring and troubleshooting complex microservices environments.
What is distributed tracing?
Distributed tracing is a way to see and understand the whole chain of events in a complex interaction between microservices.
Modern, cloud-native software development relies on microservices: independent services that each provide a different core function. When a user makes a request in an app, many individual services respond to produce a result.
A single call in an app can invoke dozens of different services that interact with each other. How can developers and engineers isolate a problem when something goes wrong or a request is running slow? We need a way to keep track of all the connections.
That’s where distributed tracing comes in. It’s often run as part of a service mesh, which is a way to manage and observe microservices.
Jaeger uses distributed tracing to follow the path of a request through different microservices. Rather than guessing, we can see a visual representation of the call flows.
Organized information about transactions is useful for debugging and optimization. Jaeger includes tools to monitor distributed transactions, optimize performance and latency, and perform root cause analysis (RCA), a method of problem solving.
Jaeger’s open source community
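As an open source project, Jaeger benefits from a community of hundreds of contributors. Jaeger was originally built on the vendor-neutral OpenTracing APIs and instrumentation. OpenTracing has since been archived, and the Jaeger project now recommends OpenTelemetry for instrumentation.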
Ridesharing company Uber developed Jaeger as an open source project in 2015. It was accepted as a Cloud Native Computing Foundation (CNCF) Incubation project in 2017 and promoted to graduated status in 2019.
How does Jaeger work?
Jaeger collects, stores, and visualizes “traces” from distributed systems, providing insights into how requests flow through a system, where time is spent, and where errors occur.
Jaeger presents execution requests as traces. A trace shows the data/execution path through a system.
A trace is made up of one or more spans. A span is a logical unit of work in Jaeger, such as a database query or an HTTP request. Each span includes the operation name, start time, and duration. Spans may be nested and ordered.
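As a rough illustration of the trace and span relationship described above, the following minimal Python sketch models a span as a small data structure and a trace as a tree of spans. The `Span` class, the `render` helper, and the operation names are hypothetical and exist only for this example; they are not Jaeger's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One logical unit of work, e.g. a database query or HTTP request."""
    operation_name: str
    start_time_ms: int   # when the work started, in milliseconds
    duration_ms: int     # how long the work took, in milliseconds
    children: List["Span"] = field(default_factory=list)  # nested spans

    def add_child(self, child: "Span") -> "Span":
        self.children.append(child)
        return child


def render(span: Span, depth: int = 0) -> List[str]:
    """Return an indented outline of a span tree, ordered by start time."""
    lines = [f"{'  ' * depth}{span.operation_name} ({span.duration_ms} ms)"]
    for child in sorted(span.children, key=lambda s: s.start_time_ms):
        lines.extend(render(child, depth + 1))
    return lines


# A trace is the tree of spans that belong to one root request.
# All names and timings below are made up for illustration.
root = Span("GET /checkout", start_time_ms=0, duration_ms=120)
root.add_child(Span("SELECT cart", start_time_ms=45, duration_ms=30))
root.add_child(Span("auth.verify", start_time_ms=5, duration_ms=20))

outline = render(root)
print("\n".join(outline))
```

The outline lists the root span first and then its nested spans in start-time order, which is roughly the hierarchy a trace view presents.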
Jaeger’s process
Jaeger works by tracing the execution of an operation across a distributed system, with each Jaeger component handling a specific part of the pipeline.
The OpenTelemetry SDK is available for many programming languages. The trace data it produces can be exported in a format that Jaeger or another tracing platform understands.
Once in operation, Jaeger follows this process:
- Jaeger tracing begins with instrumenting an application. Instrumenting modifies an application's code to generate traces. Instrumentation can be manual, using Jaeger client libraries available for various programming languages, or automatic, using middleware and frameworks that support OpenTracing or OpenTelemetry APIs.
- When the application runs, the traces provide a detailed execution path of an operation across microservices. Each trace consists of multiple spans, where each span contains information such as the operation name, start and end time, and key-value pair tags that provide additional context (e.g., HTTP status codes, error messages).
- To link spans together into a single trace, Jaeger passes identifiers and other trace context between services as part of requests and responses. This is known as context propagation. Each span and trace has a unique ID which allows individual components of a request's journey to be pieced together.
- Spans are collected using Jaeger client libraries and sent to the Jaeger Agent, which is usually deployed alongside the application or as a DaemonSet in environments like Kubernetes.
- The Jaeger Collector receives the spans from the Jaeger Agent and stores them in a backend database. Jaeger supports several storage options, such as Elasticsearch, Cassandra, or Google Cloud Bigtable, allowing for scalability and flexibility in how trace data is managed.
- The Jaeger Query service provides a UI for users to search and visualize traces. The Jaeger UI allows developers and operators to explore the details of individual traces, visualize the span hierarchy and timings, and analyze system behavior and performance.
- Jaeger Console is a user interface that lets you visualize your distributed tracing data to investigate latency issues and to support error analysis, dependency analysis, and performance optimization.
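The context propagation step in the process above can be sketched in a few lines of Python. This example assumes the W3C Trace Context `traceparent` header format (`version-traceid-spanid-flags`), which OpenTelemetry uses by default; the `inject` and `extract` helpers are hypothetical names for this illustration, not functions from a Jaeger or OpenTelemetry library, and a plain dictionary stands in for HTTP headers.

```python
import secrets
from typing import Dict, Optional, Tuple


def new_trace_id() -> str:
    return secrets.token_hex(16)  # 16 random bytes -> 32 hex characters


def new_span_id() -> str:
    return secrets.token_hex(8)   # 8 random bytes -> 16 hex characters


def inject(headers: Dict[str, str], trace_id: str, span_id: str) -> None:
    """Caller side: write trace context into outgoing request headers."""
    headers["traceparent"] = f"00-{trace_id}-{span_id}-01"  # 01 = sampled


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Callee side: read (trace_id, parent_span_id), or None if absent/invalid."""
    value = headers.get("traceparent")
    if value is None:
        return None
    parts = value.split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None
    return parts[1], parts[2]


# Service A starts a trace and calls service B.
trace_id, span_a = new_trace_id(), new_span_id()
outgoing: Dict[str, str] = {}
inject(outgoing, trace_id, span_a)

# Service B continues the same trace: same trace ID, new span ID,
# and span A recorded as the parent.
ctx = extract(outgoing)
assert ctx is not None
incoming_trace_id, parent_span_id = ctx
span_b = new_span_id()
print(incoming_trace_id == trace_id, parent_span_id == span_a)
```

Because both services report spans carrying the same trace ID, and service B's span names span A as its parent, a backend can piece the two spans together into one trace. In practice an instrumentation library performs these steps automatically.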
Use cases for Jaeger
Jaeger provides insights into the behavior of microservices and requests through a distributed system. As such, it can provide the following:
- Performance optimization: Jaeger can pinpoint where delays occur within a series of microservices and visualize how services interact and depend on each other, which helps optimize resource allocation.
- Root cause analysis: Jaeger traces track a service failure or unexpected result back to its origin to assist in quick resolution. Additionally, Jaeger can be integrated with monitoring systems to alert teams when unusual patterns emerge, such as spikes in latency or error rates.
- Security and compliance: Because they show how data flows through a system, traces can serve as a form of audit trail, which is crucial for compliance with regulatory requirements concerning data handling and processing.
- Development and testing: Developers can run Jaeger tracing in local environments, allowing them to detect errors, latency, and dependency issues before deploying an application.
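To make the performance optimization and root cause analysis use cases above more concrete, the following Python sketch computes each span's "self time" (its duration minus the time spent in its direct children) and lists the services that reported errors. The `SpanRecord` type, the `self_times` function, and the sample data are hypothetical; this is a simplified illustration that assumes child spans run sequentially, not how Jaeger itself analyzes traces.

```python
from typing import Dict, List, NamedTuple, Optional


class SpanRecord(NamedTuple):
    span_id: str
    parent_id: Optional[str]  # None for the root span
    service: str
    duration_ms: int
    error: bool


def self_times(spans: List[SpanRecord]) -> Dict[str, int]:
    """Duration of each span minus the total duration of its direct children.

    Assumes children run one after another inside the parent; if children
    overlap, the result is floored at zero rather than going negative.
    """
    child_total: Dict[str, int] = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0) + s.duration_ms
    return {
        s.span_id: max(0, s.duration_ms - child_total.get(s.span_id, 0))
        for s in spans
    }


# Made-up trace: frontend -> cart -> database, plus frontend -> auth.
spans = [
    SpanRecord("a", None, "frontend", 200, False),
    SpanRecord("b", "a", "cart", 150, False),
    SpanRecord("c", "b", "database", 120, True),
    SpanRecord("d", "a", "auth", 20, False),
]

times = self_times(spans)
slowest = max(spans, key=lambda s: times[s.span_id])
failing = [s.service for s in spans if s.error]
print(slowest.service, times[slowest.span_id], failing)
```

In this sample data, the frontend span is the longest overall, but most of its time is spent waiting on downstream calls; the self-time calculation points to the database span as both the largest contributor to latency and the source of the error.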
Distributed tracing and Red Hat
Red Hat® OpenShift® Observability is a comprehensive set of observability capabilities that provides deep insights into the performance and health of OpenShift-based applications and infrastructure. One feature of Red Hat’s observability stack is distributed tracing, which until 2024 included Jaeger. In early 2024, Red Hat deprecated Jaeger and Elasticsearch in favor of the Tempo Operator and the Red Hat build of OpenTelemetry.
The Red Hat build of OpenTelemetry can collect traces in many formats: not only those coming from Jaeger clients, but also Zipkin and OpenTelemetry Protocol (OTLP). That’s only the beginning, as this collector can be used to collect all your observability signals. Red Hat’s latest releases of distributed tracing encompass numerous enhancements. In addition to generating metrics automatically from spans, they enable the creation of alerts based on these metrics. To help with Prometheus stack integration, we have added the Target Allocator component to our build, which lets customers scrape Prometheus endpoints easily and manage and scale that scraping efficiently.
Tempo serves as a drop-in replacement for the distributed tracing storage and visualization capabilities provided by the Jaeger product. It supports large deployments as well as simple local deployments, which are useful for experimenting with distributed tracing or for quickly troubleshooting deployments that do not require extensive tracing storage. Tempo still incorporates the Jaeger user interface, so traces can be visualized in the same way as before.