Not all traffic has the same priority, and when there is contention for bandwidth, there should be a mechanism for network appliances outside the OpenShift Container Platform (OCP) cluster to prioritize the traffic. To enable this, we will use Quality of Service (QoS) Differentiated Services Code Point (DSCP), which allows us to classify packets by setting a 6-bit field in the IP header, effectively marking the priority of a given packet relative to other packets as "Critical," "High Priority," "Best Effort," and so on.

Marking packets with DSCP as they head out allows a router to distinguish between them and determine, for example, which require higher bandwidth or higher priority and handle their requirements properly.

Starting with OCP 4.11, the OVN-Kubernetes Container Network Interface (CNI) introduces a new Developer Preview feature, enabled by default for all customers: EgressQoS, which lets a cluster administrator mark pods' egress traffic with a valid QoS DSCP value. Network appliances outside the OCP cluster consume and act on these markings to optimize traffic flow throughout their networks.

Configuring the router to handle DSCP markings is outside the scope of this post. Instead, we'll focus on how we can apply different markings to traffic coming from pods heading to an external destination using EgressQoS.

A simple user story example: As a cluster administrator, I have pre-configured my router to handle the different DSCP values of incoming traffic (colors are used here for demonstration; in reality the values are decimals from 0 to 63), giving “green” traffic full priority, “yellow” traffic low priority, and “red” traffic best effort. I want egress traffic coming from different applications (pods) in a given namespace (namespace1) to be marked with different DSCP “colors” so my router can handle them properly and fulfill their requirements.

In this post, we'll explore how to apply such a configuration in OCP clusters that use the OVN-Kubernetes CNI as their network provider.

What is EgressQoS?

Starting with OCP 4.11, EgressQoS (Developer Preview) is a namespaced Custom Resource Definition (CRD) that enables marking pods' egress traffic with a valid QoS DSCP value. Each namespace supports only one EgressQoS resource, which must be named default (any other EgressQoS resources in the namespace are ignored).

An EgressQoS resource allows specifying a list of QoS rules, each consisting of three fields:

  • dscp: DSCP value to apply to matching egress traffic

  • dstCIDR (optional): Apply DSCP to traffic heading to this CIDR

  • podSelector (optional): Apply DSCP to traffic from pods whose labels match this selector

kind: EgressQoS
apiVersion: k8s.ovn.org/v1
metadata:
  name: default
  namespace: default
spec:
  egress:
  - dscp: 30
    dstCIDR: 1.2.3.0/24
  - dscp: 42
    podSelector:
      matchLabels:
        app: example
  - dscp: 28

This example marks the packets originating from pods in the default namespace in the following way:

  • All traffic heading to an address that belongs to 1.2.3.0/24 is marked with DSCP 30.

  • Egress traffic from pods labeled app: example heading to an address outside 1.2.3.0/24 is marked with DSCP 42.

  • All other egress traffic is marked with DSCP 28.

IMPORTANT: The priority of a rule is determined by its position in the egress array. An earlier rule is processed before a later rule, and the first rule that matches a packet determines its marking. In this example, if the rules were reversed, all traffic originating from pods in the default namespace would be marked with DSCP 28, regardless of its destination or the pods' labels. For that reason, specific rules should always come before general ones in the array.
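The ordering behavior described above can be illustrated with a small Python sketch. It is not OVN-Kubernetes code: it assumes first-match semantics as described in this post, models podSelector as a plain matchLabels dictionary, and the function names (pick_dscp, dst_matches, pod_matches) are hypothetical.

```python
import ipaddress


def dst_matches(dst_cidr, dst_ip):
    """A rule without dstCIDR matches every destination."""
    if dst_cidr is None:
        return True
    return ipaddress.ip_address(dst_ip) in ipaddress.ip_network(dst_cidr)


def pod_matches(match_labels, pod_labels):
    """A rule without podSelector matches every pod in the namespace."""
    if match_labels is None:
        return True
    return all(pod_labels.get(k) == v for k, v in match_labels.items())


def pick_dscp(rules, pod_labels, dst_ip):
    """Return the dscp of the first rule that matches, or None if no rule matches."""
    for rule in rules:
        if dst_matches(rule.get("dstCIDR"), dst_ip) and pod_matches(
            rule.get("podSelector"), pod_labels
        ):
            return rule["dscp"]
    return None


# The three rules from the example resource, in the same order.
rules = [
    {"dscp": 30, "dstCIDR": "1.2.3.0/24"},
    {"dscp": 42, "podSelector": {"app": "example"}},
    {"dscp": 28},
]

print(pick_dscp(rules, {"app": "example"}, "1.2.3.4"))  # specific CIDR rule wins
print(pick_dscp(rules, {"app": "example"}, "8.8.8.8"))  # label rule applies
print(pick_dscp(rules, {}, "8.8.8.8"))                  # catch-all rule applies

# With the order reversed, the catch-all rule shadows the other two.
print(pick_dscp(list(reversed(rules)), {"app": "example"}, "1.2.3.4"))
```

With the original order the calls yield 30, 42, and 28; with the reversed list the catch-all rule matches first, so the last call yields 28.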

Usage Example

Following an example similar to the user story above, we would like packets coming from the default namespace to be marked as follows:

  • All packets heading to 172.18.0.6/32 marked with DSCP 40.

  • All packets heading to 172.18.0.7/32 from pods labeled app: demo marked with DSCP 50.

To achieve that, we create the following EgressQoS resource in our OCP cluster:

apiVersion: k8s.ovn.org/v1
kind: EgressQoS
metadata:
  name: default
  namespace: default
spec:
  egress:
  - dscp: 40
    dstCIDR: 172.18.0.6/32
  - dscp: 50
    dstCIDR: 172.18.0.7/32
    podSelector:
      matchLabels:
        app: demo
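Before applying a resource like this, the rule list can be sanity-checked offline. The following Python sketch is only an illustration under stated assumptions: it checks that each dscp lies in the 0-63 range mentioned earlier and that each dstCIDR parses as a network. It is not the validation OVN-Kubernetes itself performs, and check_rules is a hypothetical helper name.

```python
import ipaddress


def check_rules(rules):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []
    for index, rule in enumerate(rules):
        dscp = rule.get("dscp")
        # DSCP is a 6-bit field, so valid values are the integers 0 through 63.
        if not isinstance(dscp, int) or isinstance(dscp, bool) or not 0 <= dscp <= 63:
            problems.append(f"rule {index}: dscp {dscp!r} is not an integer in 0-63")
        cidr = rule.get("dstCIDR")
        if cidr is not None:
            try:
                # strict=False: host bits are tolerated here; this sketch only
                # checks that the string parses as an IP network at all.
                ipaddress.ip_network(cidr, strict=False)
            except ValueError:
                problems.append(f"rule {index}: dstCIDR {cidr!r} is not a valid CIDR")
    return problems


# The egress rules from the resource above, written as Python dictionaries.
usage_rules = [
    {"dscp": 40, "dstCIDR": "172.18.0.6/32"},
    {"dscp": 50, "dstCIDR": "172.18.0.7/32", "podSelector": {"app": "demo"}},
]

print(check_rules(usage_rules))
print(check_rules([{"dscp": 64, "dstCIDR": "not-a-cidr"}]))
```

The first call reports no problems for the rules used in this post; the second reports one problem for the out-of-range DSCP and one for the unparseable CIDR.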

Assuming these are the pods in the default namespace:
[Image: pods in the default namespace]

We can expect the traffic to be marked as follows:
[Image: expected DSCP marking of traffic from each pod]

And, indeed, running tcpdump on each of the destinations and pinging them from the pods results in:

tcpdump on 172.18.0.6 host:

bash-5.0# tcpdump -i eth0 -v icmp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:40:06.238100 IP (tos 0xa0, ttl 62, id 23892, offset 0, flags [DF], proto ICMP (1), length 84)
    ovn-worker > a7acb5556708: ICMP echo request, id 7424, seq 0, length 64
10:40:08.280624 IP (tos 0xa0, ttl 62, id 42569, offset 0, flags [DF], proto ICMP (1), length 84)
    ovn-worker2 > a7acb5556708: ICMP echo request, id 6656, seq 0, length 64

tcpdump on 172.18.0.7 host:

bash-5.0# tcpdump -i eth0 -v icmp
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
10:44:33.847400 IP (tos 0xc8, ttl 62, id 58984, offset 0, flags [DF], proto ICMP (1), length 84)
    ovn-worker > 90d8708e53a8: ICMP echo request, id 7680, seq 0, length 64
10:44:37.536332 IP (tos 0x0, ttl 62, id 33532, offset 0, flags [DF], proto ICMP (1), length 84)
    ovn-worker2 > 90d8708e53a8: ICMP echo request, id 6912, seq 0, length 64

DSCP is derived from the tos field, where it occupies the upper six bits. To get the DSCP decimal value, convert the hexadecimal tos value to decimal and shift it 2 bits to the right (e.g., 0xc8 = 200, and shifting 200 right by 2 bits gives 50; likewise, 0xa0 = 160 gives 40).
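The same conversion can be done in a few lines of Python. This is a minimal sketch for checking tcpdump output by hand; the helper names tos_to_dscp and dscp_to_tos are hypothetical, and the low two bits of the tos byte (used for ECN) are simply discarded or left at zero.

```python
def tos_to_dscp(tos: int) -> int:
    """Extract the 6-bit DSCP value from an 8-bit tos byte by dropping the low 2 bits."""
    if not 0 <= tos <= 0xFF:
        raise ValueError(f"tos must fit in one byte, got {tos!r}")
    return tos >> 2


def dscp_to_tos(dscp: int) -> int:
    """Build the tos byte that carries the given DSCP value, with the low 2 bits set to zero."""
    if not 0 <= dscp <= 63:
        raise ValueError(f"dscp must be in 0-63, got {dscp!r}")
    return dscp << 2


# The three tos values that appear in the tcpdump output above.
for tos in (0xA0, 0xC8, 0x0):
    print(hex(tos), "->", tos_to_dscp(tos))
```

For the captured values, 0xa0 maps to DSCP 40, 0xc8 to DSCP 50, and 0x0 to DSCP 0 (unmarked), which matches the two rules in the EgressQoS resource.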

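When a packet from a pod exits a node, its source address is changed to the node’s IP, which is why the packets here appear to come from our nodes.

Overall, the tcpdump outputs show that we have reached the desired state: packets heading to 172.18.0.6 arrive with DSCP 40 (tos 0xa0) from both nodes, while of the packets heading to 172.18.0.7, one arrives with DSCP 50 (tos 0xc8) and the other arrives unmarked (tos 0x0).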

Summary

In this post we saw how an OCP cluster running OVN-Kubernetes CNI can use QoS DSCP to mark selected pods’ egress traffic with a simple CRD. This allows routers and other network appliances that are connected to the cluster to prioritize packets from pods the same way they do for virtual machines (VMs) and bare-metal servers.

