In the previous article in the Path to Distributed Tracing series, we discussed how leveraging some key capabilities brought by the service mesh can be a great first step toward modern observability. We also explained why it’s important to take into account context propagation, and how it can help us connect the dots when dealing with complex distributed systems. Now that you understand the theory, it's time for us to show you a real, step-by-step example.
We'll start by deploying a couple of simple client-server applications into a Red Hat OpenShift cluster. We'll show you what the service mesh can do for you with the help of Kiali. We'll demonstrate why context needs to be propagated, and then explore two complementary alternatives (manual header propagation and automatic instrumentation) to achieve it.
Propagating the context
Before installing a demo application in OpenShift, we deploy the requisite components by following the official instructions:
- Red Hat build of OpenTelemetry
- Distributed tracing platform (Tempo)
- Kiali Operator provided by Red Hat
- Red Hat OpenShift Service Mesh
Deploying example applications on OpenShift
First, we deploy the following applications, written in Python:
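from time import sleep

import flask
import requests

app = flask.Flask(__name__)

@app.route('/no-header-forwarding')
def no_forwarding():
    # Simulate some work
    sleep(0.1)
    # Call App2 without forwarding any of the incoming headers
    response = requests.get('http://app2:9080')
    # Simulate some work
    sleep(0.1)
    return flask.Response(response.text)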
This application is just a small server that listens on the /no-header-forwarding endpoint. When it receives a request, it sends a nested request to app2 and, after receiving an answer, responds to the original request.
The second application, called App2, is even simpler. It listens on / and responds with the string bye after 0.15 seconds.
@app.route('/')
def bye():
    # Simulate some work
    sleep(0.15)
    return flask.Response("bye")
To follow along and try this yourself, download the manifests for App1 and App2.
Add these apps to the mesh by adding the following annotation to each deployment, under spec.template.metadata.annotations:
sidecar.istio.io/inject: "true"
Once deployed, we can move to the fun part. Let’s observe!
First observations
Once our applications are part of OpenShift Service Mesh, Kiali provides great observability information out of the box. Let's send some simple curl requests to our application, and then open the Kiali console to explore the different dashboards:
curl $(oc -n istio-system get route istio-ingressgateway -o jsonpath='{.spec.host}')/no-header-forwarding
First, click on Graph in the left column to see the dependency graph.
You can observe who’s talking to whom, and the green lines are telling you that the communication is successful! You also see traffic metrics with useful insights, including response codes.
Now navigate to Applications and click on the Traces tab. One by one, information about the time spent in all your requests is shown:
You can observe a scatter plot showing the time consumed by every request. This information is also aggregated and presented in a heatmap, which allows you to understand percentiles and compare them. You can see that the spans fall into two "lines": one at around 350 ms and another at around 150 ms. We'll analyze this later.
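These two durations are consistent with the sleep calls in the two applications. As a back-of-the-envelope check, here is a small sketch that ignores network and sidecar overhead (the variable names are ours, chosen for illustration):

```python
# Simulated work, in seconds, taken from the sleep() calls in the two apps.
APP1_WORK_BEFORE_CALL = 0.1
APP1_WORK_AFTER_CALL = 0.1
APP2_WORK = 0.15

# A request handled by App2 only pays for App2's own work.
app2_duration = APP2_WORK

# A request handled by App1 waits for App2 in between its two sleeps.
app1_duration = APP1_WORK_BEFORE_CALL + app2_duration + APP1_WORK_AFTER_CALL

print(f"App2 request: ~{app2_duration * 1000:.0f} ms")
print(f"App1 request: ~{app1_duration * 1000:.0f} ms")
```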
Inbound and Outbound metrics are also provided. This helps you have a better idea about how your application is performing, including data about request throughput, duration, sizes, and more. You get all this and you haven’t touched the application yet!
As we explained in our previous article, even though OpenShift Service Mesh and Envoy automate much of the process, a small change is still needed in the application code to observe a full transaction in a distributed system. You must at least propagate the tracing headers to other services.
The Traces view in Kiali shown above reveals that the traces are divided into two lines. Let’s try to find out why.
To do that, open the user interface (UI) for Jaeger by clicking on the View in Tracing link in the top right corner of the Traces view in Kiali. This user interface is made available by the Tempo operator, and constitutes the tracing UI in OpenShift.
By exploring traces in the Jaeger UI, you can see that our distributed trace only contains two spans. The first one is injected by the mesh, and the second one comes from the application. The context is not further propagated between app1 and app2.
Now it’s clear that the service mesh can't do this part by itself. It doesn't understand the service's logic and can't link incoming and outgoing requests magically. This change, while minor in this simple demo app, must be manually added to every part of the code, including when it propagates the request to subsequent services. Without this change, your traces look like what's been shown above, with important information (specifically about App2, in this example) missing.
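To see why the trace breaks, it helps to remember that a tracing backend assembles a trace from the spans that share a trace ID. The following toy model uses plain Python with made-up span records and IDs (it is not the Tempo or Jaeger data model) to show how App2's span lands in a separate trace when App1 does not forward the headers it received:

```python
from collections import defaultdict


def group_by_trace(spans):
    """Group span names by trace ID, the way a tracing backend assembles traces."""
    traces = defaultdict(list)
    for span in spans:
        traces[span["trace_id"]].append(span["name"])
    return dict(traces)


# Hypothetical spans. Without forwarding, the proxy in front of App2 sees
# no trace headers on the request and starts a brand-new trace.
without_forwarding = [
    {"trace_id": "aaa", "name": "ingress"},
    {"trace_id": "aaa", "name": "app1"},
    {"trace_id": "bbb", "name": "app2"},
]

# With forwarding, App2's span reuses the trace ID that App1 received.
with_forwarding = [
    {"trace_id": "aaa", "name": "ingress"},
    {"trace_id": "aaa", "name": "app1"},
    {"trace_id": "aaa", "name": "app2"},
]

print(group_by_trace(without_forwarding))
print(group_by_trace(with_forwarding))
```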
Let’s fix this! In App1, we just need to extract some of the headers from the incoming request and inject them into the request sent to App2. Specifically, the headers required for context propagation here are the Envoy-generated request ID and the B3 trace headers.
def context_get_headers(headers):
    # Names are title-cased so they match the keys produced by
    # dict(flask.request.headers); the lookup below is case-sensitive.
    header_names = [
        'X-B3-Traceid',
        'X-B3-Spanid',
        'X-B3-Parentspanid',
        'X-B3-Sampled',
        'X-B3-Flags',
        'X-Request-Id',
    ]
    trace_headers = {}
    for name in header_names:
        if name in headers:
            trace_headers[name] = headers[name]
    return trace_headers

@app.route('/header-forwarding')
def forwarding():
    # Capture headers from the incoming request
    headers = context_get_headers(dict(flask.request.headers))
    # Simulate some work
    sleep(0.1)
    # Forward only the tracing headers to App2
    response = requests.get('http://app2:9080', headers=headers)
    # Simulate some work
    sleep(0.1)
    return flask.Response(response.text)
Pro tip: Forward only the required headers. Propagating everything received can lead to unexpected behavior. For example, if you forward the Host header, OpenShift Service Mesh will route the request again, causing an infinite loop.
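To illustrate the tip, here is a standalone sketch of the same allowlist idea applied to a plain dictionary. The sample header values are made up, and the helper name `pick_trace_headers` is ours. Unlike the function above, it compares names case-insensitively, so it does not depend on how the web framework capitalizes them.

```python
TRACE_HEADER_NAMES = {
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
}


def pick_trace_headers(headers):
    """Return only the allowlisted tracing headers, ignoring name case."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in TRACE_HEADER_NAMES
    }


# Made-up incoming headers, for illustration only.
incoming = {
    "Host": "app1.example.com",
    "User-Agent": "curl/8.0",
    "X-Request-Id": "3f2a9c1e-0000-4000-8000-000000000000",
    "X-B3-Traceid": "463ac35c9f6413ad48485a3953bb6124",
    "X-B3-Spanid": "a2fb4a1d1a96d312",
    "X-B3-Sampled": "1",
}

outgoing = pick_trace_headers(incoming)
print(sorted(outgoing))
```

Host and User-Agent never reach the outgoing request, because anything not explicitly listed is dropped.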
Now that we've instrumented our application to propagate context, we can see all the players involved in a nice distributed trace with all the context ready to be debugged:
Back in the Kiali console, you can see that once we've instrumented the code and repeated the experiment, the two sets of traces have converged into one. Everything is now tied together.
Auto-instrumenting
So far, we've done this the "hard way" by manually instrumenting the code to propagate context. As you can tell, it's not actually that scary. Truth be told, though, there's an even easier way. You can do all of this without touching a line of your application's code! Using the OpenTelemetry Operator, you can instrument code automatically, and that instrumentation forwards the headers needed for context propagation. To show you how, we need to create an Instrumentation object:
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: python
  namespace: demo
spec:
  exporter:
    # Python autoinstrumentation uses http/proto by default
    # so data must be sent to 4318 instead of 4317.
    endpoint: http://jaeger-collector.istio-system.svc.cluster.local:4318
  propagators:
    - tracecontext
    - baggage
    - b3multi
  sampler:
    type: parentbased_traceidratio
    argument: "1"
  resource:
    addK8sUIDAttributes: true
Most significantly, keep these settings in mind:
- The endpoint, which must point to the HTTP receiver (port 4318)
- The propagators, which determine the HTTP headers used to propagate context
- The sampler
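To make the propagators more concrete, the sketch below builds the headers that the tracecontext and b3multi formats carry for the same span context. It is written by hand for illustration and does not use the OpenTelemetry SDK. The function names and sample IDs are our own, and the baggage propagator is left out.

```python
def tracecontext_headers(trace_id, span_id, sampled):
    """W3C Trace Context: a single traceparent header (version 00)."""
    flags = 1 if sampled else 0
    return {"traceparent": f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"}


def b3multi_headers(trace_id, span_id, sampled):
    """B3 multi-header format: one header per field."""
    return {
        "X-B3-TraceId": f"{trace_id:032x}",
        "X-B3-SpanId": f"{span_id:016x}",
        "X-B3-Sampled": "1" if sampled else "0",
    }


# Sample IDs, chosen arbitrarily for the example.
trace_id = 0x463AC35C9F6413AD48485A3953BB6124
span_id = 0xA2FB4A1D1A96D312

print(tracecontext_headers(trace_id, span_id, sampled=True))
print(b3multi_headers(trace_id, span_id, sampled=True))
```

Both formats carry the same trace ID, span ID, and sampling decision; only the header layout differs.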
For more information, read the OpenTelemetry official documentation.
Finally, we add the annotation to the app1 deployment:
instrumentation.opentelemetry.io/inject-python: "true"
This annotation injects the instrumentation into your deployment. This means that an init container is executed while your pods are being created. It copies the autoinstrumentation libraries to a volume accessible by your application, and injects some environment variables to ensure those libraries are loaded before the ones in your application.
Now our trace has more spans than before. The libraries injected into our application don't just forward the necessary headers for context propagation; they also add extra spans for the instrumented operations. For example, for the requests library, the get method is instrumented and one new span is created. The same happens for the Flask methods.
What next?
If this is so great, why didn't we start by just automagically instrumenting the code from the beginning? It’s because the ultimate goal of observability is understanding our own software to better react to failures that will always happen. Instrumenting the code is crucial, but be mindful of the challenges it entails.
Producing too much data can impact the storage capacity and computing resources of your hardware, so there’s always a balance. As you've already noticed during this simple example, adding instrumentation automatically renders a lot of information. You need to decide whether you want to produce it and store it in all circumstances, or whether it’s better to fine tune its generation, sampling, storage, and retention.
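As one example of such tuning, the parentbased_traceidratio sampler in the Instrumentation object above keeps every trace when its argument is "1", and a lower ratio reduces the volume. The sketch below is a simplified model of that decision, written by hand rather than taken from the OpenTelemetry SDK. The function name and the exact comparison are assumptions made for illustration.

```python
import random

TRACE_ID_RANGE = 1 << 64  # this model decides on the low 64 bits of the trace ID


def should_sample(trace_id, ratio, parent_sampled=None):
    """Simplified parent-based, trace-ID-ratio sampling decision.

    If the caller already made a decision (parent_sampled is True or False),
    follow it so that a trace is never half-recorded. Otherwise, keep the
    trace when the low 64 bits of its ID fall below ratio * 2**64.
    """
    if parent_sampled is not None:
        return parent_sampled
    return (trace_id % TRACE_ID_RANGE) < round(ratio * TRACE_ID_RANGE)


# Estimate how many root traces survive at a 10% ratio.
rng = random.Random(42)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(t, ratio=0.1) for t in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID and the parent's choice, every service that sees the same trace reaches the same verdict.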
Our recommendation is to start small, understand the added value, and then grow sustainably from there. We think you'll end up instrumenting a lot, and using these great tools, but we also believe in the great value of understanding what's happening.
Conclusion
In this article, we've transformed theory into action, and demonstrated how to put together a service mesh with Kiali and distributed tracing in OpenShift, using OpenTelemetry auto instrumentation. We explained how it can help your teams to focus on implementing and maintaining code while leveraging the OpenShift platform components.
Stay tuned! In the next article, we'll discuss the pros and cons of each method, and the situations in which each of them shines.
About the authors
Jose is a Senior Product Manager at Red Hat OpenShift, with a focus on Observability and Sustainability. His work centers on managing the OpenTelemetry, distributed tracing, and power monitoring products in Red Hat OpenShift.
His expertise has been built from previous gigs as a Software Architect, Tech Lead, and Product Owner in the telecommunications industry, all the way from the software programming trenches, where agile ways of working, a sound CI platform, and best testing practices with observability at the center have presented themselves as the main principles that drive every modern successful project.
With a strong scientific background in physics and a PhD in Computational Materials Engineering, he brings curiosity, openness, and a pragmatic view to his work. Beyond the boardroom, he is a C++ enthusiast and a creative force, contributing symphonic and electronic touches as a keyboardist in metal bands, when he is not playing video games or lowering lap times in his sim racing cockpit.
Israel is a Software Engineer at Red Hat OpenShift, with a focus on Observability, mainly on OpenTelemetry.