A Kubernetes DaemonSet ensures that an instance of a specific pod is running on all (or a selection of) nodes in a cluster. It creates pods on each node, and garbage collects pods when nodes are removed from the cluster.
The simplest use case is deploying a daemon on every node. However, you might want to split that up into multiple DaemonSets. For example, if your cluster has nodes with varying hardware, you might need different memory and/or CPU requests for the daemon on each hardware type.
As our approach fit with this use case, we decided to create a DaemonSet that would deploy pods running netperf’s netserver server-side binary in the background. We thought this might be useful for analyzing networking performance within the OpenShift Container Platform (OCP) cluster.
This post shows how we constructed a netperf DaemonSet from scratch.
Dockerfile
First of all, we need to create a custom docker image that will run the netserver binary.
FROM fedora:27
MAINTAINER josgonza@redhat.com

RUN \
  dnf clean all && \
  dnf install http://people.redhat.com/mcroce/packages/netperf-2.7.1-3.x86_64.rpm -y

USER 1001

ENTRYPOINT ["/usr/bin/netserver", "-D"]
EXPOSE 12865
NOTE: this container doesn’t need privileged rights, so you won’t have to follow the steps in Enable Container Images that Require Root.
This Dockerfile is just for testing purposes and to keep this example as simple as possible, but we strongly recommend following best practices when you create your containers:
- Container Image Guidelines
- 10 things to avoid in docker containers
To avoid the complexity of generating the binaries from scratch, we used the RPM netperf-2.7.1-3.x86_64.rpm, courtesy of Matteo Croce (former rpm from Fedora COPR repository teknoraver/netperf).
Once you have the Dockerfile, you only need to build the image, e.g. docker build -t netperf-fedora . (note the trailing dot for the build context). For testing purposes, you can run it and connect to the container:
docker run -d --name netperf netperf-fedora
docker exec -ti netperf /bin/bash
Finally, tag and push the image to your image registry.
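For example, a minimal sketch using the same <-your_registry-> placeholder as the DaemonSet manifest below:

docker tag netperf-fedora <-your_registry->/netperf-fedora:latest
docker push <-your_registry->/netperf-fedora:latest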
DaemonSet Manifest
Create a DaemonSet manifest with the following contents:
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: netperf
  namespace: <-your_project->
spec:
  selector:
    matchLabels:
      name: netperf
  template:
    metadata:
      labels:
        name: netperf
        app-name: netperf
    spec:
      nodeSelector:
        type: NODE
        stage: NON_PRODUCTION
      containers:
      - image: <-your_registry->/netperf-fedora:latest
        imagePullPolicy: Always
        name: netperf
        ports:
        - containerPort: 12865
          protocol: TCP
        resources:
          limits:
            memory: 256Mi
          requests:
            memory: 256Mi
        terminationMessagePath: /dev/termination-log
      terminationGracePeriodSeconds: 10
Note the .spec.template.spec.nodeSelector labels. We decided to target non-production compute nodes (not masters or infra nodes) to avoid any impact on production workloads, while still being deployed inside the OCP cluster. Check the DaemonSet docs for details about DaemonSet manifests.
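The nodeSelector only matches nodes that actually carry those labels. If your nodes are not labeled yet, you can add the labels yourself (a sketch, assuming you have rights to label nodes; substitute your own node name):

oc label node <node_name> type=NODE stage=NON_PRODUCTION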
Deploy the DaemonSet in OCP
Once you have created the YAML for the DaemonSet manifest, log in with permissions to modify the selected project (.metadata.namespace in the manifest). Then you can:
- Create/deploy the DaemonSet:
  oc create -f netperf-daemonset.yml
- Monitor it (see also the placement check after this list):
  oc get daemonset
  oc get event --sort-by='.lastTimestamp'
- Delete/undeploy it:
  oc delete daemonset netperf --cascade
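To verify that a pod landed on every selected node, listing the pods together with their node placement can help (this relies on the name=netperf label set in the manifest):

oc get pods -l name=netperf -o wide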
Automation of the netperf tests
Now that you’ve deployed a netperf DaemonSet and its pods are running the netserver daemon, you can execute your netperf client tests from any point of the infrastructure within your OCP cluster.
This bash snippet loops through a list of nodes to collect statistics from the netserver daemon pod on each of them:
TSEC=30
ITERATIONS=5

for HOST in $(oc get nodes -o jsonpath='{range .items[?(@.metadata.labels.stage=="NON_PRODUCTION")]}{.metadata.name}{"\n"}{end}')
do
  ...
  for iteration in $(seq ${ITERATIONS})
  do
    yes | ssh $HOST "./netperf -t TCP_STREAM -cC -l ${TSEC} -H ${POD_IP}" | tee -a logs/${iteration}_TCP_STREAM.log
    yes | ssh $HOST "./netperf -t TCP_MAERTS -cC -l ${TSEC} -H ${POD_IP}" | tee -a logs/${iteration}_TCP_MAERTS.log
    yes | ssh $HOST "./netperf -t TCP_RR -cC -l ${TSEC} -H ${POD_IP}" | tee -a logs/${iteration}_TCP_RR.log
    yes | ssh $HOST "./netperf -t TCP_CRR -cC -l ${TSEC} -H ${POD_IP}" | tee -a logs/${iteration}_TCP_CRR.log
  done
  ...
done
...
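The elided steps above are where POD_IP gets set for each node; one possible sketch, reusing the random-pod selection shown later in this post (and assuming the pod IP is in column 6 of oc get po -o wide, as in oc v3.6):

# Hypothetical helper: pick the IP of one netperf pod at random.
POD_IP=$(oc get po -o wide | grep netperf | awk '{print $6}' | shuf -n1)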
NOTE: for the outer loop, it’s recommended to filter the OCP nodes (at least to discard the nodes where the DaemonSet has not been deployed). As the jsonpath option has limited filtering functionality, you can use awk instead if you want a subset of nodes.
Variables
- TSEC (30): This option controls the length of any one iteration of the requested test.
- ITERATIONS (5): Number of iterations.
- HOST: IP/FQDN of the host from which you want to execute the tests (the netperf binary must exist there, or the script has to copy it first with a previous scp command, for example).
- POD_IP: destination IP of the pod running the netserver binary listening for client requests.
See the Netperf documentation for more netperf options and features.
Here’s a quick way to parse the results files:
for i in $(ls -d *_TCP_MAERTS.log);do echo $i;awk '/Throughput/,/^[0-9]/{print $5}' $i | egrep -v "[a-zA-Z]"|sed '/^$/d';done
for i in $(ls -d *_TCP_STREAM.log);do echo $i;awk '/Throughput/,/^[0-9]/{print $5}' $i | egrep -v "[a-zA-Z]"|sed '/^$/d';done
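If you also want a single average figure per results file, a small extension of the same pipeline could compute it (a sketch; it assumes the extraction above leaves one numeric throughput value per line):

# Average the extracted throughput values of each TCP_STREAM results file.
for i in *_TCP_STREAM.log; do
  echo "$i"
  awk '/Throughput/,/^[0-9]/{print $5}' "$i" | egrep -v "[a-zA-Z]" | sed '/^$/d' \
    | awk '{sum += $1; n++} END {if (n) print "avg:", sum / n}'
done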
Recommended Usage
I recommend having a bastion host with access to the entire OCP infrastructure, and using Ansible to automate the tests.
If you want a random selection of pods for each test rather than a static list, I suggest one of two approaches:
- Using OpenShift’s oc command-line client and some classic UNIX CLI filter programs:
  POD=$(oc get po -o wide | grep netperf | awk '{print $6}' | shuf -n1)
- Using endpoints, which requires creating the netperf service:

apiVersion: v1
kind: Service
metadata:
  labels:
    app-name: netperf
  name: netperf
  namespace: <-your_project->
spec:
  ports:
  - port: 12865
    protocol: TCP
    targetPort: 12865
  selector:
    app-name: netperf
  sessionAffinity: ClientIP
  type: ClusterIP
And then:

POD=$(oc export -n <-your_project-> ep/netperf | grep ip | awk '{print $3}' | shuf -n1)
NOTE: tested with oc v3.6.0
I recommend the second approach (creating the SVC) because:
- You can use the endpoints to choose OCP nodes. This is quite helpful when you want to test from the same node where the netperf POD IP is deployed:
  NODE=$(oc export -n <-your_project-> ep/netperf | grep -A1 "${POD_IP}" | grep 'nodeName:' | awk '{print $2}')
- I tried to launch the test through the SVC but could not make it work (probably because of TCP headers, NAT, or a combination of both):
  ./netperf -t TCP_STREAM -cC -l 30 -H ${ClusterIP} # Failed with timeout

Any thoughts on how to fix this would be much appreciated.
Other interesting tests would be:
- From one master (in a multi-master environment) to a netperf POD IP (see the sketch after this list):
  MASTER=$(oc get nodes -l type=MASTER --no-headers | awk '$2 == "Ready,SchedulingDisabled" {print $1}' | shuf -n1)
- From one node in the OCP cluster to a netperf POD IP.
- From any other host deployed in the OCP cluster (like bastion hosts, monitoring hosts, etc.).
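Putting the pieces together for the first case (a sketch; it assumes passwordless SSH to the master and a netperf binary in the remote user’s home directory, as in the loop above):

# Run one TCP_STREAM test from the randomly selected master to the pod.
ssh ${MASTER} "./netperf -t TCP_STREAM -cC -l ${TSEC} -H ${POD_IP}"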
Conclusion
With Kubernetes at its core, OpenShift is a powerful platform that makes it easy to deploy systems and applications that would otherwise be complex or tedious to set up.
You can use DaemonSets to create shared storage, to run a logging pod on every node in a cluster, or to deploy a monitoring agent on every node, such as Dynatrace.
DaemonSets on OpenShift are also great because they provide useful abstractions for:
- Monitoring and managing logs for daemons in the same way as applications.
- Configuring daemons with the same formats and tools as applications, e.g., Pod templates.
- Running daemons in containers with resource limits to increase isolation between daemons and app containers.