
OpenShift Container Platform (OCP) is the leading Kubernetes environment for managing container-based applications. However, this is just the core platform. If you go to OperatorHub on the OpenShift web console (UI), you will see hundreds of optional operators, which are analogous to extensions for your browser. Buried in this operator gold mine is one called Network Observability.

Network Observability operator

Network Observability 1.4, as the release number suggests, is not new. The team has put out four feature releases since its first general availability back in January 2023. It has grown significantly since I wrote a blog about Network Observability 1.0. This release supports OCP 4.11 and above.

The focus of this blog is the new features in 1.4, but a quick word about Network Observability. It is a tool that collects traffic flows using an eBPF agent and then enriches and stores them as logs and metrics to provide valuable insight and visualization into your network. In case you've been living under a rock (I mean, a gold crystal) in the Linux world, eBPF is the technology that allows you to extend the kernel capabilities without writing a messy kernel module. It uses various probes to get some really cool statistics that would otherwise be difficult to get, which we will dive into later in this blog.

Features

All of the 1.4 features and enhancements can be put into four categories. They are:

1. Hardware

  • Support for SR-IOV interfaces
  • Support for IBM Z architecture

2. Traffic

  • Packet drops
  • DNS tracking information
  • Export flows and dashboards without Loki
  • Enhancements to Network Observability dashboards
  • Round Trip Time (RTT) (Developer Preview)

3. UI/UX

  • Comparison operator field in filtering
  • "Swap" source and dest
  • "Back and forth" to include return traffic
  • Vertical slider for changing scopes in Topology

4. Performance and scalability

Some features require additional configuration of the Network Observability eBPF agent. You do this when you create a FlowCollector instance. After installing the Network Observability operator, click the Flow Collector link as shown below.

Network Observability - Installed Operator

This link brings up the Create FlowCollector panel. For each feature below, follow the listed steps to enable it.
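
If you prefer the YAML view over the form view, the same settings live under spec.agent.ebpf in the FlowCollector resource. Here is a minimal sketch for orientation; the apiVersion, instance name, and defaults shown are taken from the documentation at the time of writing, so check them against your operator version:

    apiVersion: flows.netobserv.io/v1beta1
    kind: FlowCollector
    metadata:
      name: cluster              # the FlowCollector instance is typically named "cluster"
    spec:
      agent:
        type: EBPF               # use the eBPF agent
        ebpf:
          sampling: 50           # flow sampling ratio (1 = capture every packet)
          privileged: false      # some features below require this to be true
          features: []           # e.g. PacketDrop, DNSTracking, FlowRTT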

Hardware

SR-IOV Interfaces

SR-IOV (Single Root I/O Virtualization) is a hardware standard for virtualizing a NIC. In netdevice mode, the eBPF agent can now provide traffic flows that go through these interfaces. To enable this feature, turn on privileged mode when creating the FlowCollector instance. This setting is in the Create FlowCollector form view under agent > ebpf > privileged.

eBPF Agent - Privileged
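
In the YAML view, this corresponds to a single field (a sketch of just the relevant lines, not the full resource):

    spec:
      agent:
        ebpf:
          privileged: true       # required for the agent to read flows on SR-IOV (netdevice) interfaces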

IBM Z

In the last release, we added support for IBM Power and ARM. We now officially support IBM Z as well. Long live the mainframes!

IBM Z

Traffic

On the traffic side, Network Observability now provides additional information directly relevant to troubleshooting packet drops, DNS, and latency issues. We plan to publish more details about these features and how to use them in future blogs.

Packet drops

The eBPF agent can report real-time packet drops per flow for TCP, UDP, SCTP, and ICMPv4/v6 (such as ping). To enable this feature, turn on privileged mode and add the PacketDrop feature when creating the FlowCollector instance. Both settings are in the Create FlowCollector form view, under agent > ebpf > privileged and agent > ebpf > features.

eBPF Agent - Privileged / Features
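
In the YAML view, the two settings look roughly like this (a sketch of only the relevant fields):

    spec:
      agent:
        ebpf:
          privileged: true       # PacketDrop requires privileged mode
          features:
            - PacketDrop         # report per-flow dropped bytes and packets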

Now, decide how you want to filter packet drops. In Observe > Network Traffic under Query options, select whether to show flows that have all packets dropped, at least one packet dropped, no packets dropped, or no filter. Be careful if you choose no packets dropped, as that means you won't see flows with packet drops. There are also new filters in the filter field for the TCP state and the drop cause. The red highlighted areas in the screenshot below show where these options appear in the web console. Note that this feature also requires OCP 4.13 or higher.

query option and filter

The Overview tab has several new packet drop graphs, two of which are shown below.

Overview - Packet drops

Click Show advanced options (which then becomes Hide advanced options) to reveal the Manage panels link. Click this link to choose which graphs to display.

Manage panels

The Traffic flows tab shows the dropped bytes and packet counts in red. The Topology tab displays a red link between vertices where packet drops have occurred.

Traffic flows - Packet drops

Traffic flows - Topology drops

DNS tracking information

DNS is one networking area that is a frequent source of problems. This feature provides information on DNS ID, latency, and response code, along with the ability to filter on these fields. To enable it, turn on privileged mode and add the DNSTracking feature when creating the FlowCollector instance. Both settings are in the Create FlowCollector form view, under agent > ebpf > privileged and agent > ebpf > features. See the screenshot in the Packet drops section above for where to configure this.
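
In the YAML view, the feature is simply added to the same list (again, a sketch of only the relevant fields):

    spec:
      agent:
        ebpf:
          privileged: true
          features:
            - DNSTracking        # adds DNS ID, latency, and response code to flows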

As with the Packet drops feature, new DNS graphs appear in the Overview tab. See above for how to display them. There are also new DNS columns in the Traffic flows table.

Overview - DNS

Export flows and dashboards without Loki

Installing Loki is no longer necessary if you only want to export flows to a Kafka consumer or an IPFIX collector. Without Loki and internal flow storage, the netobserv console plugin is not installed, so you don't get the Observe > Network Traffic panel, and hence no Overview graphs, Traffic flows table, or Topology. You still get flow metrics in Observe > Dashboards.
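
A sketch of such a Loki-less setup, with flows exported to Kafka: the broker address and topic below are placeholders, and the exporter fields follow the FlowCollector API as documented for this release, so verify them against your version:

    spec:
      loki:
        enable: false                                    # skip Loki and internal flow storage
      exporters:
        - type: KAFKA
          kafka:
            address: kafka-bootstrap.example.svc:9092    # placeholder Kafka bootstrap address
            topic: network-flows                         # placeholder topic name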

Enhancements to Network Observability dashboards

Speaking of dashboards, in Observe > Dashboards, under the NetObserv / Health selection, there is a new Flows Overhead graph showing the percentage of flows generated by Network Observability itself.

Dashboards - Netobserv / Health

The dashboard under NetObserv was also changed to separate applications and infrastructure.

Dashboards - Netobserv

Round Trip Time (RTT)

Round Trip Time (RTT) is a Developer Preview feature that shows the latency of the TCP handshake on a per-flow basis. To use it, enable the FlowRTT feature when creating the FlowCollector instance. This is in the Create FlowCollector form view under agent > ebpf > features; see the screenshot in the Packet drops section above for where to configure this. Note that eBPF privileged mode is not required. Setting sampling to 1 (or a low value) is also recommended to avoid missing the TCP handshake packets (SYN and ACK).
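
A sketch of the relevant FlowCollector fields, with sampling lowered so the handshake packets are captured:

    spec:
      agent:
        ebpf:
          sampling: 1            # sample every packet so SYN/ACK are not missed
          features:
            - FlowRTT            # Developer Preview: per-flow TCP handshake RTT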

The Overview tab has two new RTT graphs shown below.

Overview - RTT

The Traffic flows tab adds the Flow RTT column. In the table below, the filter displays all flows with an RTT greater than one millisecond (the value is entered in nanoseconds).

Traffic flows - RTT

UI/UX

Comparison operator field in filtering

To the left of the filter field (see the figure above on Query options) is a new field for the comparison operator. Previously, the only option was an implied "equals" comparison. Now, you can also filter on "not equals", or use "more than" for numeric values such as DNS latency or Flow RTT.

"Swap" source and dest

When you click Swap in the filter section, it changes all the source filter values to destination and vice versa, which makes reversing a query convenient.

Swap and Back and Forth

"Back and forth" to include return traffic

Before this change, Query options had a Reporter section where you chose Source, Destination, or Both. For external traffic, you either got half the traffic or all the traffic duplicated, depending on your selection. That setting has been replaced with two options. Choose One way (the default) to get traffic in exactly one direction. Choose Back and forth to include the return traffic and let Network Observability figure out the right thing for you. You don't have to think about ingress, egress... I digress! The Direction column in the flows table shows ingress, egress, or the new inner value, which indicates traffic where the source and destination are on the same node.

Vertical slider for changing scopes in Topology

The scope in Topology determines what the vertices in the graph represent. The highest-level view shows Kubernetes nodes as vertices by selecting Node. The next level is Namespace, followed by Owner (typically a Deployment, StatefulSet, or DaemonSet), and finally Resource for specific pod-to-pod or pod-to-service communication. This selection used to be under Query options. Now, it is a vertical slider on the topology itself, as shown below.

scope slider

Performance and scalability

We are constantly working to improve the performance and scalability of the operator while reducing its resource footprint, without compromising the visibility that matters. We have published guidelines on performance and resource tuning and intend to evolve them over time.

Wrap up

I hope you enjoy the new features. This post is a high-level overview of this release, and going forward, we plan to publish other blogs to describe the features in more detail. In the meantime, continue mining for that data nugget!

Special thanks to Julien Pinsonneau, Mohamed Mahmoud, Joel Takvorian, Dave Gordon, Sara Thomas, and Deepthi Dharwar for providing feedback, advice, and accuracy in this article.

