The Red Hat OpenShift Container Platform 3.7 release introduced a new feature for egress traffic management. Egress traffic is traffic that leaves the OpenShift cluster for an external server; an example would be an application running in a container invoking an external web service.

The new egress traffic management was developed by Dan Winship, a principal software engineer at Red Hat specializing in OpenShift networking. This new feature is different from the existing egress router. The legacy egress router acts as a bridge between the application pod and the external system, and it requires two network interfaces: the first interface (eth0) connects to the internal cluster network, and the second interface (macvlan0) provides an IP address on the external physical network. Cluster administrators assign OpenShift pods that require external access to the egress router service, and the outgoing traffic from those pods then uses the MAC address of the macvlan interface instead of the node's MAC address.

One particular drawback of the legacy egress router is that it creates a new virtual network device with its own MAC address. This is a problem for cloud-based deployments, where macvlan traffic may not be supported. In addition, it can only route to a limited number of external servers, which makes it difficult to use when pods need to contact many external servers while presenting the same recognized source IP.

By contrast, the new egress IP feature simply adds a second IP address to the node's primary network interface, which is much more compatible with the various cloud providers. With a static egress IP address per project, administrators can rely on their existing organizational firewall policies instead of defining firewall rules separately for the application containers deployed in OpenShift.

After the egress IP feature is configured, all outgoing traffic from the pods within a project shares the fixed IP address. This namespace-wide egress IP functionality is available for both the ovs-multitenant and ovs-networkpolicy plugins. It is currently a Technology Preview feature in Red Hat OpenShift Container Platform 3.7.

We will go over the steps involved in assigning an egress IP using an example OpenShift project. After allocating the egress IP, we will validate the source IP address of the outgoing traffic from our application by running a simple curl command against an external Apache web server.
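If you want to follow along, the external web server can be any host reachable from the cluster nodes. As a rough sketch, a minimal Apache setup on a separate RHEL host (assumed here to sit at 192.168.1.5, outside the cluster) might look like this:

# yum install -y httpd
# echo "egress test page" > /var/www/html/index.html
# systemctl enable --now httpd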

First, let’s create a sample application.

# oc new-project egress-test
# oc new-app https://github.com/OpenShiftDemos/os-sample-python.git
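The source-to-image build can take a few minutes. Assuming the build configuration keeps the default name generated by oc new-app, its progress can be followed with:

# oc logs -f bc/os-sample-python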

Let’s check the status of our deployment.

# oc get pods -o wide
NAME                       READY     STATUS    RESTARTS   AGE       IP            NODE
os-sample-python-2-jxpcf   1/1       Running   0          2h        10.130.0.67   node1.ocp.io

In this particular example, the application is deployed on “node1.ocp.io”, as we can determine from the output above. Let’s find out the IP address of “node1.ocp.io”.

# dig node1.ocp.io +short
      >>> 192.168.1.17

Our application container is named “os-sample-python-2-jxpcf”. Next, we are going to execute a remote curl command against an external Apache web server hosted at http://192.168.1.5 using the oc exec command.

# oc exec os-sample-python-2-jxpcf -- curl http://192.168.1.5/index.html

The Apache web server maintains an access_log file where all incoming requests are logged. Let’s check the access log entry produced by the curl command we just ran from inside our application container in OpenShift.
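On the Apache host, the entry can be watched as it arrives (the path assumes the default RHEL httpd layout):

# tail -f /var/log/httpd/access_log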

192.168.1.17 - - [09/Feb/2018:01:41:31 -0600] "GET /index.html HTTP/1.1" 200 46 "-" "curl/7.29.0"

As we can see, Apache logged the IP address of the node where our Python application container is deployed. This is expected, since we have not yet assigned an egress IP to our project.

Assigning a namespace-wide egress IP is a two-step process. The first step is to select a node to host the egress IPs. The second step is to assign an egress IP to the project.

First, we select a node to host the egress IPs. In the example below, we selected “node2.ocp.io” as the host. The egress IP must be on the local subnet of the primary interface of “node2.ocp.io”. The “egressIPs” field is an array, so multiple IPs can be allocated for multiple projects if required (a hypothetical example follows the command below). We will run the following oc patch command to select “node2.ocp.io” as the egress host node:

# oc patch hostsubnet node2.ocp.io -p '{"egressIPs": ["192.168.1.102"]}'
>>> hostsubnet "node2.ocp.io" patched
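Since “egressIPs” is an array, additional addresses can be reserved on the same node for other projects if needed. A hypothetical example adding a second address (we will stick with the single IP for the rest of this walkthrough):

# oc patch hostsubnet node2.ocp.io -p '{"egressIPs": ["192.168.1.102", "192.168.1.103"]}'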

We can validate this egress IP assignment by running “oc get hostsubnet” on our master node. Please note the “EGRESS IPS” field below for “node2.ocp.io”:

# oc get hostsubnet 
NAME            HOST            HOST IP        SUBNET          EGRESS IPS
master.ocp.io   master.ocp.io   192.168.1.16   10.129.0.0/23   []
node1.ocp.io    node1.ocp.io    192.168.1.17   10.130.0.0/23   []
node2.ocp.io    node2.ocp.io    192.168.1.18   10.128.0.0/23   [192.168.1.102]

Let’s log in to the egress host node (“node2.ocp.io”) and see how the egress IP was assigned to the node’s primary network interface.

# ip a
enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
   link/ether 08:00:27:2b:64:e1 brd ff:ff:ff:ff:ff:ff
   inet 192.168.1.18/24 brd 192.168.1.255 scope global enp0s3
      valid_lft forever preferred_lft forever
   inet 192.168.1.102/24 brd 192.168.1.255 scope global secondary enp0s3
      valid_lft forever preferred_lft forever

We can see that the egress IP has been added as a secondary IP address on the node’s primary network interface.

Next, we will assign this egress IP to our project, “egress-test”. Currently only one egress IP address per project is allowed. We will run the following command from our master node as a cluster administrator:

# oc patch netnamespace egress-test -p '{"egressIPs": ["192.168.1.102"]}'
>>> netnamespace "egress-test" patched

These two patch commands are all that is required to assign an egress IP to our “egress-test” project.
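If you later need to undo the assignment, the same patch commands can be issued with an empty list (shown here only as a sketch):

# oc patch netnamespace egress-test -p '{"egressIPs": []}'
# oc patch hostsubnet node2.ocp.io -p '{"egressIPs": []}'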

You might wonder how the traffic flow works, since the application container is deployed on a different node (“node1.ocp.io”) than our egress host node (“node2.ocp.io”). If the application container is not running on the node hosting the egress IPs, its traffic first routes over VXLAN to the node hosting the egress IPs (“node2.ocp.io” in our example) and from there routes out to the external server.
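This hop can be observed directly: while the curl test below runs, a packet capture on the egress host node should show encapsulated traffic arriving from “node1.ocp.io”, assuming VXLAN is on its default UDP port 4789 and enp0s3 is the node’s primary interface, as in our environment:

# tcpdump -i enp0s3 -nn udp port 4789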

Now let’s run our curl test again to the external apache web server.

# oc exec os-sample-python-2-jxpcf -- curl http://192.168.1.5/index.html

We will check the access_log file on our apache web server for the IP address:

192.168.1.102 - - [09/Feb/2018:01:55:31 -0600] "GET /index.html HTTP/1.1" 200 46 "-" "curl/7.29.0"

As we can see, this time Apache shows our project’s assigned egress IP, “192.168.1.102”, as the requester IP address instead of the node IP.
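This is what makes the feature easy to combine with existing firewall policies: the external server, or an intermediate firewall, only needs to trust the single project egress IP. A hypothetical firewalld rule on the Apache host, as a sketch:

# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.1.102" port port="80" protocol="tcp" accept'
# firewall-cmd --reload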

Behind the scenes, the egress IP feature uses a combination of Open vSwitch (OVS) flows and iptables rules to match the egress packets. OVS flows are used first to get the traffic to the correct node and then to output the traffic with a “mark” set indicating the egress IP to use. An iptables rule then matches that mark and directs the traffic accordingly. We will walk through this communication flow below using our “egress-test” project as a reference.

Let’s run the following ovs-ofctl command to find out the ovs flow details on “node1.ocp.io” where our application container is deployed:

[root@node1 ~]# ovs-ofctl -O OpenFlow13 dump-flows br0 table=100
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=0x0, duration=8457.072s, table=100, n_packets=6, n_bytes=479, priority=100,ip,reg0=0x1094c2 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:192.168.1.18->tun_dst,output:1
cookie=0x0, duration=9381.898s, table=100, n_packets=0, n_bytes=0, priority=0 actions=goto_table:101

In the output above, the “reg0” value of “0x1094c2” (1086658 in decimal) is our project’s virtual network ID (VNID). With the ovs-multitenant SDN plug-in, each project has its own virtual network ID. We can run the following command on the master node to find the VNID of the “egress-test” project:

# oc get netnamespace egress-test 
NAME          NETID     EGRESS IPS
egress-test   1086658   [192.168.1.102]
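As a quick sanity check, the hex value from the OVS flow can be converted to decimal directly in the shell:

# printf '%d\n' 0x1094c2
>>> 1086658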

The other important part of the flow is the “set_field” action, which sets the VXLAN tunnel destination (“tun_dst”) to the IP address of node2.ocp.io (“192.168.1.18”). This directs the traffic from “node1.ocp.io” to “node2.ocp.io” over VXLAN.

Let’s run the ovs-ofctl command on “node2.ocp.io” which is the egress host node.

[root@node2 ~]# ovs-ofctl -O OpenFlow13 dump-flows br0 table=100
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=0x0, duration=13978.993s, table=100, n_packets=6, n_bytes=479, priority=100,ip,reg0=0x1094c2 actions=set_field:1e:22:c7:d3:08:d9->eth_dst,set_field:0xc0a80166->pkt_mark,goto_table:101
cookie=0x0, duration=14905.971s, table=100, n_packets=0, n_bytes=0, priority=0 actions=goto_table:101

By inspecting the “reg0” attribute, we see that there is a flow matching our “egress-test” project’s virtual network ID. The “set_field” action here writes the egress IP into “pkt_mark”: the hex value “0xc0a80166” (0xc0.0xa8.0x01.0x66) decodes to “192.168.1.102”, which matches the egress IP assigned to the project. The traffic is then directed to tun0, which is an OVS internal port.
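The same byte-by-byte conversion can be reproduced in the shell:

# printf '%d.%d.%d.%d\n' 0xc0 0xa8 0x01 0x66
>>> 192.168.1.102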

The traffic from tun0 is then picked up by the iptables “OPENSHIFT-MASQUERADE” rule, which matches the egress IP mark that the OVS flow has set. Let’s run the following iptables command on our egress host node, “node2.ocp.io”:

# iptables -t nat -S POSTROUTING

-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
-A POSTROUTING -m comment --comment "rules for masquerading OpenShift traffic"      -j OPENSHIFT-MASQUERADE
-A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING

The “OPENSHIFT-MASQUERADE” chain contains a specific rule that matches the egress traffic for external routing. In the rule below, the “mark match” value is the same mark that the OVS flow set for our egress IP.

# iptables -t nat --list
Chain OPENSHIFT-MASQUERADE (1 references)
target      prot opt source          destination
SNAT        all  --  10.128.0.0/14   anywhere      mark match 0xc0a80166 to:192.168.1.102
MASQUERADE  all  --  10.128.0.0/14   anywhere      /* masquerade pod-to-service and pod-to-external traffic */
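To confirm that the SNAT rule is actually matching our egress traffic, the chain’s packet counters can be checked before and after another curl test (verbose listing, output omitted here):

# iptables -t nat -L OPENSHIFT-MASQUERADE -n -v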

The new namespace-wide egress IP feature is a great enhancement for external traffic management in OpenShift. Being able to assign a fixed egress IP per project and then control that traffic with existing firewall processes makes egress traffic management much more efficient.
 

Taneem Ibrahim is a principal Technical Account Manager in the North America Central region with more than 15 years of experience in application development, platform automation, and management. Taneem is currently helping developers and administrators with Red Hat JBoss Middleware and Red Hat OpenShift Container Platform.

Find more posts by Taneem at https://www.redhat.com/en/blog/authors/taneem-ibrahim .
