The Five Pillars of Red Hat OpenShift Observability

We are pleased to announce additional Observability features coming as part of the OpenShift Monitoring 4.14, Logging 5.8, and Distributed Tracing 2.9 releases. Red Hat OpenShift Observability's plan continues to move forward as our teams tackle key data collection, storage, delivery, visualization, and analytics features, with the goal of turning your data into answers.

What are the problems you can now solve with Red Hat OpenShift Observability?

Data Collection

The Distributed Tracing platform 2.9 takes the OpenTelemetry collector operator Technology Preview to a whole new level. For the first time, it enables the collection of OpenTelemetry Protocol (OTLP) metrics. This also helps users collect traces and metrics from remote clusters using OTLP/HTTP(s). Additionally, the operator has been promoted to capability level 4 (Deep Insights): its abilities have been expanded to support upgrades as well as monitoring and alerting of the OpenTelemetry collector instances themselves. Managed and unmanaged states are now supported too.

Customizable Alerting Rules for Admins

Soon, admins will be able to create new alerting rules based on any platform-related metrics exposed in the default namespace and in namespaces prefixed with openshift- and kube-. This significant enhancement allows new alerting rules to target metrics across any namespace. Additionally, admins can now clone existing rules, which simplifies rule creation, and modify any existing alert rules. All of this has been developed to address a clear need: enabling administrators to enrich the OpenShift Container Platform with rules tailored to their unique environments.

Advanced Customizations for Node-Exporter Collectors (Phase 2)

Our work on Node Exporter customizations is taking the next step forward. Users will get on/off switches for several collectors, including Systemd, Hwmon, Mountstats, and ksmd. Alongside these options, general settings for node-exporter will be introduced, one of which is maxprocs.

Design Scrape Profiles in CMO (Technology Preview)

In an effort to offer greater flexibility and optimization, we're unveiling optional scrape profiles for service monitors in the Cluster Monitoring Operator (CMO). They will let admins influence the volume of metrics collected by the in-cluster stack, and the improved scaling behavior of CMO will be noticeable in both small and large environments. The central idea is the ability to discard non-essential metrics, giving admins finer control over the gathered data.
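
To make the idea concrete, here is a minimal Python sketch of what a scrape profile conceptually does: it keeps the metrics a profile lists and discards the rest. The profile names, the metric lists, and the apply_profile helper are assumptions made for illustration; this is not the CMO implementation or its configuration format.

```python
# Conceptual sketch only: profile names and metric lists are assumptions
# made for illustration, not the actual CMO scrape profiles.
PROFILES = {
    "full": None,  # None means "keep every metric"
    "minimal": {"up", "node_cpu_seconds_total", "node_memory_MemAvailable_bytes"},
}


def apply_profile(samples, profile):
    """Return only the samples whose metric name the profile keeps."""
    allowed = PROFILES[profile]
    if allowed is None:
        return list(samples)
    return [s for s in samples if s["name"] in allowed]


samples = [
    {"name": "up", "value": 1},
    {"name": "node_cpu_seconds_total", "value": 1234.5},
    {"name": "node_hwmon_temp_celsius", "value": 41.0},
]

kept = apply_profile(samples, "minimal")
print([s["name"] for s in kept])
```

With the hypothetical "minimal" profile, the hwmon temperature sample is dropped while the other two are kept, which is the kind of volume reduction the feature is aiming for.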

Specifying Resource Limits for All Components

In our next release, users will have expanded capabilities to specify resource requirements. While they can currently set limits for components like Prometheus, Alertmanager, Thanos Querier, and Thanos Ruler, we're extending this capability to other vital components such as Node exporter, Kube state metrics, OpenShift state metrics, prometheus-adapter, prometheus-operator, admission webhook, and telemeter-client.

Extend user customizable TopologySpreadConstraints to all relevant pods

A significant update to our platform allows users to configure TopologySpreadConstraints for all pods deployed by CMO. This includes a comprehensive list such as Prometheus-adapter, Openshift-state-metrics, Telemeter-client, Thanos-querier, UWM alertmanager, UWM prometheus, UWM thanos ruler, Prometheus-operator, Kube-state-metrics, and Config reloader.
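
As a rough illustration of what a topology spread constraint enforces, the Python sketch below checks whether placing one more pod in a zone would keep the skew (the pod count in that zone minus the lowest pod count across zones) within maxSkew. The zone names and the can_place helper are hypothetical; the check follows the documented Kubernetes skew rule for hard constraints and is not the scheduler's actual code.

```python
def can_place(pods_per_zone, zone, max_skew):
    """Check whether one more pod in `zone` keeps the skew within max_skew.

    Skew is the pod count in the candidate zone (after placement) minus the
    current lowest pod count across all zones.
    """
    lowest = min(pods_per_zone.values())
    return (pods_per_zone[zone] + 1) - lowest <= max_skew


# Hypothetical cluster with three zones and an uneven spread of replicas.
pods_per_zone = {"zone-a": 2, "zone-b": 1, "zone-c": 1}

for zone in pods_per_zone:
    print(zone, can_place(pods_per_zone, zone, max_skew=1))
```

In this example a new replica may land in zone-b or zone-c but not in zone-a, so the pods stay evenly spread across zones.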

Data Storage 

Distributed Tracing 2.9 comes with several enhancements for Tempo, our new distributed tracing storage that will soon reach General Availability (GA) status. Tempo is a scalable, distributed tracing storage solution that can be used to store and query traces from large-scale microservices architectures. For now, in Technology Preview, Tempo has been expanded to ingest and store distributed traces sent over the following protocols when using the Distributor service: Jaeger Thrift binary, Jaeger Thrift compact, Jaeger gRPC, and Zipkin. As with the OpenTelemetry operator, we have worked to bring the Tempo operator to level 4 (Deep Insights).

As with the OpenTelemetry collector, the TempoStack custom resource now supports both managed and unmanaged states. We have also been working on the Tempo Gateway, which supports OTLP gRPC as the Query Frontend service and provides authentication and authorization capabilities. The Tempo Gateway is a separate component, deployed through the operator, that can be used for querying Tempo traces and for data ingestion. We have also expanded the multitenancy experience so that it can be used without the Gateway.

Logging 5.8 brings quite a few improvements to Loki log storage. Customers run the OpenShift Logging stack on clusters that span multiple availability zones. Since our new stack based on Loki has built-in support for zone-aware data replication, we are making this available in Logging 5.8. With this new feature, data ingestion will span all tenants in all availability zones, and if an availability zone fails, some query capabilities remain available.

Also as of Logging 5.8, we are introducing Cluster Restart hardening and Reliability hardening for Loki. These features increase the availability and reliability of Loki: when clusters restart, Loki will keep operating and will then recover without manual intervention. Loki will also be more aware of node placement so that no critical components share the same node, and customers will be able to tune their Affinity/Anti-Affinity rulesets for Loki.

Data Delivery

In case you were wondering where all those collected OTLP metrics are delivered, our users can now, in Technology Preview, choose to forward them via OTLP/HTTP(s) or OTLP/gRPC, or just store them in user-workload-monitoring via the Prometheus exporter. The new version of the OpenShift distribution of the OpenTelemetry Collector included in Distributed Tracing 2.9 also includes the resourcedetection and k8sattributes processors, which can detect resource information from the host and either append it to telemetry data or override the existing values with it. This lets users enrich data on demand with a small configuration change. The processors query the OpenShift and Kubernetes APIs to retrieve the following resource attributes and add them to your OpenTelemetry signals: cloud.provider, cloud.platform, cloud.region, and k8s.cluster.name.
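
The difference between appending detected attributes and overriding existing ones can be sketched in a few lines of Python. This is a conceptual illustration only: the enrich helper, its override flag, and all attribute values are hypothetical and do not reflect the Collector's code or configuration syntax. Only the four attribute keys come from the list above.

```python
def enrich(attributes, detected, override=False):
    """Merge detected resource attributes into a signal's attributes.

    With override=False, detected values only fill in missing keys;
    with override=True, detected values replace existing ones.
    """
    merged = dict(attributes)
    for key, value in detected.items():
        if override or key not in merged:
            merged[key] = value
    return merged


# Hypothetical values; the keys are the resource attributes named above.
detected = {
    "cloud.provider": "aws",
    "cloud.platform": "aws_openshift",
    "cloud.region": "us-east-1",
    "k8s.cluster.name": "prod-cluster",
}
span_attributes = {"service.name": "checkout", "cloud.region": "unknown"}

print(enrich(span_attributes, detected)["cloud.region"])
print(enrich(span_attributes, detected, override=True)["cloud.region"])
```

In append mode the span keeps its original cloud.region value, while in override mode the detected region replaces it. In both cases the missing attributes are added.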

In Logging 5.8, one of our most exciting new features is the ability to create multiple log forwarders.  This feature allows a ClusterLogForwarder to be created in any namespace, and also allows multiple, isolated ClusterLogForwarder instances so that independent groups can forward their choice of logs to their choice of destinations. With this feature, users can control their own log forwarding configurations separately, without interfering with other users’ log forwarding configurations.

Data Visualization

As highlighted in our previous release blogs, we continue to enhance the navigation and overall functionality of the OpenShift Web Console, our central Observability visualization tool. Our goal is not only to simplify navigation, but also to help you spend less time troubleshooting individual clusters and navigate the growing volume of data and signals. To refine the monitoring console experience, the console team has moved the Monitoring features into an optional plugin for the console. These resources will be deployed through CMO, so the monitoring console pages are visible whenever CMO is present. Also from a Monitoring perspective, OpenShift 4.14 brings a brand new Silences tab in the Developer perspective of the OpenShift Web Console. With this new feature, developers can directly manage alert silences and also expire them in bulk, which reduces silence noise. Both functionalities are introduced in the video below:

 

Logging 5.8 comes with a series of great features. First, you will be able to use log-based alerts in the Developer perspective of the OpenShift Web Console; in Logging 5.7, these were made available in the Admin perspective. In addition, developers can search for patterns across multiple namespaces. This new functionality will help users spend less time troubleshooting and track down problems that span different services. Take a look at the feature described in the video below:

 

In Logging 5.8, we are also introducing Loki dashboards so that users have visual insight into the performance and health of their log storage. Finally, from the Developer perspective of the Web Console, users will be able to search logs, and therefore patterns, across all namespaces, making it easier to debug applications.

Data Analytics

With Logging 5.8, we are glad to share that you will be able to try a first correlation experience directly in the OpenShift Web Console, as a Dev Preview feature. As first introduced at KubeCon Europe 2023, the Red Hat Observability team has been working on korrel8r, an open source project that aims to make correlation across observability signals accessible to everyone. How can correlation benefit you? By quickly jumping from one observability signal to another, you will be able to reduce the time spent troubleshooting individual clusters, and therefore the time needed to identify issues. The good news is that we have integrated korrel8r into our OpenShift Observability experience, meaning you will be able to quickly switch from an alert to its equivalent log and/or from a log to its equivalent metric. From now on, identifying problems will only be a few clicks away!
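
To give a feel for what jumping from an alert to its logs involves, here is a toy Python sketch that turns an alert's labels into a log stream selector by mapping shared label names. It is not korrel8r's API or rule format: the alert_to_log_selector helper, the label mapping, and the sample alert are all assumptions made for illustration.

```python
# Assumed mapping from alert label names to log label names (illustrative).
ALERT_TO_LOG_LABELS = {
    "namespace": "kubernetes_namespace_name",
    "pod": "kubernetes_pod_name",
}


def alert_to_log_selector(alert_labels):
    """Build a LogQL-style stream selector from the labels an alert carries."""
    matchers = [
        f'{log_label}="{alert_labels[alert_label]}"'
        for alert_label, log_label in ALERT_TO_LOG_LABELS.items()
        if alert_label in alert_labels
    ]
    return "{" + ",".join(matchers) + "}"


# Hypothetical firing alert.
alert = {"alertname": "HighErrorRate", "namespace": "shop", "pod": "checkout-7d9f"}
print(alert_to_log_selector(alert))
```

The alert name itself has no log counterpart in this mapping, so only the namespace and pod labels end up in the selector.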

What are we planning next for Observability?

Our Observability stack is expanding. We are aware of the importance of shedding light on a variety of different metrics, including sustainability. That is why we are glad to announce that Power Monitoring for Red Hat OpenShift (based on Kepler) will soon be available as a Dev Preview, and we can't wait to hear your feedback.

We are also working tirelessly to enhance our OpenTelemetry support for different use cases. The OpenTelemetry Operator will become GA very soon, helping our users avoid vendor lock-in and renew their observability stack while making the right choices.

Distributed tracing will not only provide a GA version of Tempo very soon but also improve the core functionality of traces with RED metrics, auto-instrumentation capabilities, ARM support, and much more.


About the authors

Roger Florén, a dynamic and forward-thinking leader, currently serves as the Principal Product Manager at Red Hat, specializing in Observability. His journey in the tech industry is marked by high performance and ambition, transitioning from a senior developer role to a principal product manager. With a strong foundation in technical skills, Roger is constantly driven by curiosity and innovation. At Red Hat, Roger leads the Observability platform team, working closely with in-cluster monitoring teams and contributing to the development of products like Prometheus, AlertManager, Thanos and Observatorium. His expertise extends to coaching, product strategy, interpersonal skills, technical design, IT strategy and agile project management.

Vanessa is a Senior Product Manager in the Observability group at Red Hat, focusing on both OpenShift Analytics and Observability UI. She is particularly interested in turning observability signals into answers. She loves to combine her passions: data and languages.

Jose is a Senior Product Manager at Red Hat OpenShift, with a focus on Observability and Sustainability. His work centers on managing the OpenTelemetry, distributed tracing, and power monitoring products in Red Hat OpenShift.

His expertise has been built from previous roles as a Software Architect, Tech Lead, and Product Owner in the telecommunications industry, all the way from the software programming trenches, where agile ways of working, a sound CI platform, and best testing practices with observability at the center have proven to be the main principles that drive every successful modern project.

With a strong scientific background in physics and a PhD in Computational Materials Engineering, curiosity, openness, and a pragmatic view are always expected. Beyond the boardroom, he is a C++ enthusiast and a creative force, contributing symphonic and electronic touches as a keyboardist in metal bands, when he is not playing video games or lowering lap times in his sim racing cockpit.

Jamie Parker is a Product Manager at Red Hat who specializes in Observability, particularly in the Logging and OpenStack areas. At Red Hat, Jamie works with organizations and customers to learn about their needs within the ever-changing Observability landscape and, based on their feedback, helps to guide upcoming products within the Red Hat Observability Platform. Jamie enjoys sharing lessons learned with the community by frequently speaking at meetups and conferences, and by blogging.
