Benchmarking OpenShift Network Performance Part 1: Basics

May 6, 2020 | Courtney Pacheco, Mohit Sheth | 7-minute read

BACKGROUND

RIPSAW

Ripsaw is a benchmark operator for OpenShift and Kubernetes that establishes a performance baseline for your cluster by deploying common workloads such as uperf, iperf3, fio, sysbench, YCSB, pgbench, smallfile, fs-drift, and hammerdb. You can also supply your own workload if the built-in ones do not meet your needs. Because this article focuses on uperf, it does not cover custom workloads or the other built-in workloads.

UPERF

Uperf is a network performance tool that uses a high-level language called a “profile” to model real-world applications. Users create profiles to generate generic workloads for assessing network statistics, including but not limited to: bandwidth or latency with different network protocols (for example, TCP and UDP), TCP congestion control algorithms, and connection setup/teardown scalability for different network protocols.

SETTING UP THE RIPSAW OPERATOR AND RUNNING UPERF

PREREQUISITES

This section assumes you already have an OpenShift/Kubernetes cluster running. If you do not have one running, please create one before continuing to the next step. Also, make sure that you have set the KUBECONFIG environment variable to point to your kubeconfig file.
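A quick preflight check can catch a missing or mistyped KUBECONFIG before a later step fails. The following is a minimal sketch, not part of Ripsaw: the function name check_kubeconfig is hypothetical, and it assumes KUBECONFIG names a single file rather than a colon-separated list.

```shell
# Hypothetical helper (not part of Ripsaw): verify that KUBECONFIG is set
# and points to a readable file before running any kubectl commands.
# Assumption: KUBECONFIG holds a single path, not a colon-separated list.
check_kubeconfig() {
    if [ -z "${KUBECONFIG:-}" ]; then
        echo "KUBECONFIG is not set" >&2
        return 1
    fi
    if [ ! -r "${KUBECONFIG}" ]; then
        echo "KUBECONFIG file is not readable: ${KUBECONFIG}" >&2
        return 1
    fi
    echo "Using kubeconfig: ${KUBECONFIG}"
}

# Usage:
#   check_kubeconfig && kubectl cluster-info
```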

GETTING THE OPERATOR SOURCES

To set up the operator, clone the Ripsaw git repo:

$ export RIPSAW=/tmp/ripsaw
$ git clone https://github.com/cloud-bulldozer/ripsaw.git ${RIPSAW}

CREATE THE RIPSAW NAMESPACE

Create the namespace like so:

$ kubectl apply -f ${RIPSAW}/resources/namespace.yaml

This command will generate a namespace called “my-ripsaw,” and you should see the following output in your terminal: namespace/my-ripsaw created

PREPARE TO DEPLOY THE OPERATOR

It’s almost time to deploy the operator. Before you do, you must modify the custom resource (CR) for uperf. For simplicity’s sake, you will keep most of the defaults in the uperf CR file. To run uperf across two nodes, first choose the two nodes you want to test. To list your available nodes (nodes are cluster-scoped, so no namespace flag is needed):

$ kubectl get nodes

Once you’ve done so, modify the uperf CR file, which can be found here:

https://github.com/cloud-bulldozer/ripsaw/blob/master/resources/crds/ripsaw_v1alpha1_uperf_cr.yaml

Replace “node-0” with the name of the first node you want to use. Then replace “node-1” with the name of the other node. Finally, set “cleanup” to “false” so that the benchmark pod isn’t deleted after completion.
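The edit described above can also be scripted. The sketch below is an illustration under stated assumptions: it assumes the placeholders appear literally as node-0 and node-1 in your local copy of the CR file and that the file contains a "cleanup:" line. The helper name pin_uperf_nodes is hypothetical and not part of Ripsaw, so check the result by hand before applying it.

```shell
# Hypothetical helper (not part of Ripsaw): rewrite a local copy of the uperf CR.
# Assumptions: the placeholders appear literally as "node-0" and "node-1",
# and the file has a "cleanup:" line. At most one substitution is made per line.
pin_uperf_nodes() {
    cr_file=$1
    awk -v first="$2" -v second="$3" '
        {
            if ($0 ~ /node-0/)        sub(/node-0/, first)
            else if ($0 ~ /node-1/)   sub(/node-1/, second)
            else if ($0 ~ /cleanup:/) sub(/cleanup:.*/, "cleanup: false")
            print
        }
    ' "${cr_file}" > "${cr_file}.tmp" && mv "${cr_file}.tmp" "${cr_file}"
}

# Usage (worker-a and worker-b stand in for names from your own node list):
#   pin_uperf_nodes "${RIPSAW}/resources/crds/ripsaw_v1alpha1_uperf_cr.yaml" worker-a worker-b
```

Writing to a temporary file and then renaming it avoids relying on in-place editing flags, which differ between sed implementations.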

DEPLOYING THE OPERATOR

Now that you’ve modified the uperf CR yaml file, it’s time to deploy the operator. To deploy:

$ kubectl apply -f ${RIPSAW}/deploy
$ kubectl apply -f ${RIPSAW}/resources/crds/ripsaw_v1alpha1_ripsaw_crd.yaml
$ kubectl apply -f ${RIPSAW}/resources/crds/ripsaw_v1alpha1_uperf_cr.yaml
$ kubectl apply -f ${RIPSAW}/resources/operator.yaml

You should see the following outputs:

serviceaccount/benchmark-operator created
role.rbac.authorization.k8s.io/benchmark-operator created
rolebinding.rbac.authorization.k8s.io/benchmark-operator created
clusterrole.rbac.authorization.k8s.io/benchmark-operator-kube created
customresourcedefinition.apiextensions.k8s.io/benchmarks.ripsaw.cloudbulldozer.io created
deployment.apps/benchmark-operator created

ANALYZING UPERF RESULTS

FIND UPERF CLIENT POD

The uperf results can be found in a “uperf-client-<...>” pod. To find such pods:

$ kubectl get pods -n my-ripsaw | grep uperf-client

To view your results, run the following command (replacing <...> with the hash of your pod):

$ kubectl logs pod/uperf-client-<...> -n my-ripsaw

The output from the above command will show you the “profile.” A description of what “transactions” and “flowops” are will be provided in the following subsections.
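The lookup and log steps above can be combined so that the pod hash does not have to be copied by hand. This is a minimal sketch: first_uperf_client is a hypothetical helper name, and it simply takes the first matching pod, which assumes only one uperf client pod is of interest.

```shell
# Hypothetical helper (not part of Ripsaw): pick the first uperf client pod
# from a list of pod names, one per line, as printed by
# `kubectl get pods -o name`.
first_uperf_client() {
    grep 'uperf-client' | head -n 1
}

# Usage against a live cluster:
#   pod=$(kubectl get pods -n my-ripsaw -o name | first_uperf_client)
#   kubectl logs "${pod}" -n my-ripsaw > uperf-client.log
```

Saving the log to a file (here uperf-client.log, an arbitrary name) makes it easier to run the analysis steps in the following subsections more than once.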

HANDSHAKE PHASES

The first part of the log is debug output from the two handshake phases (1 and 2), which test the connection. For example:

Allocating shared memory of size 156624 bytes
Completed handshake phase 1
Starting handshake phase 2
Handshake phase 2 with <server-pod-ip>
  Done preprocessing accepts
  Sent handshake header
  Sending workorder
    Sent workorder
    Sent transaction
    Sent flowop
    Sent transaction
    Sent flowop
    Sent transaction
    Sent flowop
TX worklist success  Sent workorder
Handshake phase 2 with <server-pod-ip> done
Completed handshake phase 2

In essence, the handshake process is broken down into two “phases.” During phase 1, the client attempts to connect to the remote host, and version numbers, supported protocols, and similar details are verified. During phase 2, strands are created in preparation for the network testing. A “strand” is a lightweight thread that “either [processes] independent packets (belonging to different connections) or [performs] bookkeeping operations.” Phase 2 also includes sending (“transferring”) a workorder from the first node to the second node, where the workorder contains one or more flowops. Flowops are basic operations such as “connect,” “disconnect,” “send,” “receive,” and “sendfile.” See reference 5 for a complete list of supported ops, including ones you can write yourself.
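To confirm how many transactions and flowops a workorder carried, you can count the corresponding lines in the handshake output. The sketch below assumes each transaction and flowop produces one "Sent transaction" or "Sent flowop" line, as in the sample above. The helper name count_workorder_items is hypothetical.

```shell
# Hypothetical helper: count the transactions and flowops announced during
# handshake phase 2. Reads uperf client log text on standard input.
count_workorder_items() {
    awk '
        /Sent transaction/ { txns++ }
        /Sent flowop/      { flowops++ }
        END { printf "transactions=%d flowops=%d\n", txns, flowops }
    '
}

# Demo on lines copied from the sample handshake output above.
count_workorder_items <<'EOF'
  Sending workorder
    Sent workorder
    Sent transaction
    Sent flowop
    Sent transaction
    Sent flowop
    Sent transaction
    Sent flowop
EOF
# prints: transactions=3 flowops=3
```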

EXECUTION OF THE FLOWOPS

Following the handshake phases’ debug output, you should see debug output similar to the following, which shows which transaction from the workorder is being executed:

Starting 1 threads running profile:stream-tcp-16384-1 ...   0.00 seconds
TX command [UPERF_CMD_NEXT_TXN, 0] to <server-pod-ip>
timestamp_ms:1579010499379.3477 name:Txn1 nr_bytes:0 nr_ops:0
timestamp_ms:1579010500380.2354 name:Txn1 nr_bytes:0 nr_ops:1
TX command [UPERF_CMD_NEXT_TXN, 1] to <server-pod-ip>
timestamp_ms:1579010500380.3958 name:Txn2 nr_bytes:0 nr_ops:0
timestamp_ms:1579010501382.2000 name:Txn2 nr_bytes:219152384 nr_ops:13376
timestamp_ms:1579010502385.6763 name:Txn2 nr_bytes:727973888 nr_ops:44432
timestamp_ms:1579010503387.1470 name:Txn2 nr_bytes:1226309632 nr_ops:74848
timestamp_ms:1579010504387.8291 name:Txn2 nr_bytes:1772355584 nr_ops:108176
timestamp_ms:1579010505400.3938 name:Txn2 nr_bytes:2225078272 nr_ops:135808

                              </snip>

timestamp_ms:1579045625165.4785 name:Txn2 nr_bytes:12914786304 nr_ops:788256
timestamp_ms:1579045626166.5405 name:Txn2 nr_bytes:13346537472 nr_ops:814608
Sending signal SIGUSR2 to 140690430813952
called out
timestamp_ms:1579010530670.2065 name:Txn2 nr_bytes:13825212416 nr_ops:843824

TX command [UPERF_CMD_NEXT_TXN, 2] to <server-pod-ip>
timestamp_ms:1579010530670.3181 name:Txn3 nr_bytes:0 nr_ops:0
timestamp_ms:1579010530670.3245 name:Txn3 nr_bytes:0 nr_ops:0

The above output has been truncated (hence the “</snip>”), but this output format is what you should expect to see after successfully running uperf with Ripsaw.

Remember that workorder transactions are linked to specific flowops. In the previous subsection, three separate flowops were sent (as transactions!) in the workorder. In the logs above, their transactions are conveniently named “Txn1,” “Txn2,” and “Txn3,” and the flowops are the values defined in the “profile” output of the logs.
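Because nr_bytes is a running total, the throughput of each reporting interval can be derived from the difference between consecutive lines of the same transaction. The following is a rough sketch rather than a feature of uperf or Ripsaw. The helper name interval_gbps is hypothetical, and it skips any line whose timestamp does not increase, as can happen in a truncated log.

```shell
# Hypothetical helper: print per-interval throughput in Gb/s for one
# transaction, given uperf client log lines of the form
#   timestamp_ms:<ms> name:<txn> nr_bytes:<bytes> nr_ops:<ops>
# on standard input. $1 is the transaction name, for example Txn2.
interval_gbps() {
    awk -v txn="name:$1" '
        $2 == txn {
            split($1, t, ":"); split($3, b, ":")
            ms = t[2] + 0; bytes = b[2] + 0
            if (seen && ms > prev_ms) {
                # bytes -> bits, milliseconds -> seconds, bits/s -> Gb/s
                printf "%.2f\n", (bytes - prev_bytes) * 8 / ((ms - prev_ms) / 1000) / 1e9
            }
            prev_ms = ms; prev_bytes = bytes; seen = 1
        }
    '
}

# Usage on a saved client log (the file name is an example):
#   interval_gbps Txn2 < uperf-client.log
```

For the second and third Txn2 lines in the sample above, the byte counter grows by 508,821,504 bytes in roughly one second, which works out to about 4 Gb/s for that interval.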

FLOWOP AND TRANSACTION (“TXN”) DEFINITIONS AND STATISTICS

Once the flowops have successfully been executed, the “UPERF_CMD_SEND_STATS” command collects and transfers all the statistics, as shown in the sample output below:

---------------------------------------------------------------------------
TX command [UPERF_CMD_SEND_STATS, 0] to <server-pod-ip>
timestamp_ms:1579010530772.1462 name:Total nr_bytes:13825212416 nr_ops:843825
Group Details
---------------------------------------------------------------------------
timestamp_ms:1579010530671.5735 name:Group0 nr_bytes:0 nr_ops:1
Strand Details
---------------------------------------------------------------------------
timestamp_ms:1579010530671.5737 name:Thr0 nr_bytes:13825212416 nr_ops:843827
Txn                Count         avg         cpu         max         min 
---------------------------------------------------------------------------
Txn0                   1    853.41us      0.00ns    853.41us    853.41us 
Txn1               52740    570.21us      0.00ns    148.09ms     30.09us 
Txn2                   1     13.15us      0.00ns     13.15us     13.15us 
Flowop             Count         avg         cpu         max         min 
---------------------------------------------------------------------------
connect                1    852.71us      0.00ns    852.71us    852.71us 
write             843824     35.61us      0.00ns    148.09ms     29.99us 
disconnect             1     12.66us      0.00ns     12.66us     12.66us

You can see that there are three transactions and three flowops. The debug output shows that the client (the “uperf-client” pod) connects to the server (the “uperf-server” pod), sends data (“write”), and then disconnects. In other words, the first transaction connects each process from the client to the server, the second transaction sends data one message at a time (16,384 bytes per write in this run, which is the total nr_bytes divided by the write count), and the third transaction disconnects each process from the server (also known as cleanup). Note that this table numbers the transactions from 0 (Txn0 through Txn2), whereas the earlier log lines name them Txn1 through Txn3.

Under the “Txn” (transaction) table, the “Count” column gives the number of times each transaction was called: transaction 0 was called 1 time, transaction 1 was called 52,740 times, and transaction 2 was called 1 time. Similarly, in the “Flowop” table, the count is the number of times each flowop ran: one connect, 843,824 writes of 16,384 bytes each, and one disconnect.

Finally, you can see that the “avg,” “max,” and “min” columns represent the average, maximum, and minimum time it took to execute a transaction or a flowop. The “cpu” column represents the CPU time divided by the count.
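A quick cross-check ties the tables together: dividing the total byte count by the number of write flowops gives the payload carried by each write. The values below are copied from the sample output above, and the variable names are illustrative.

```shell
# Values copied from the sample statistics output above.
total_bytes=13825212416   # nr_bytes on the "Total" line
write_count=843824        # Count column of the "write" flowop

# Integer division: average payload carried by each write flowop.
bytes_per_write=$(( total_bytes / write_count ))
echo "bytes per write: ${bytes_per_write}"
# prints: bytes per write: 16384
```

The result matches the 16384 in the profile name stream-tcp-16384-1 shown in the execution log.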

NETSTAT RESULTS

The next part of the output reports the netstat statistics for the run. For example:

Netstat statistics for this run
-------------------------------------------------------------------------------
Nic       opkts/s     ipkts/s      obits/s      ibits/s
eth0         9503       11004     3.42Gb/s     5.81Mb/s

Nic = Network Interface Card

opkts = packets sent (“output”)

ipkts = packets received (“input”)

obits = bits sent (“output”)

ibits = bits received (“input”)

RUN STATISTICS

The final section describes the statistics for the run. For example, you may see an output like:

Run Statistics
Hostname            Time       Data   Throughput   Operations      Errors
-------------------------------------------------------------------------------
[<server-pod-ip>] 32.39s    12.87GB     3.41Gb/s       843443        0.00
master            32.39s    12.88GB     3.41Gb/s       843827        0.00
-------------------------------------------------------------------------------
Difference(%)     -0.00%      0.05%        0.05%        0.05%       0.00%

In the CR used here, the test type is set to “stream,” so the above output reflects the results of the “stream” test type with a message size of 16,384 bytes.
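The Throughput and Difference(%) columns can be reproduced from the other numbers in the log. The sketch below is an approximation under stated assumptions: it takes the byte count from the “Total” statistics line, treats Gb/s as 10^9 bits per second, and assumes the difference is computed relative to the local (“master”) row. The function names are hypothetical, and uperf’s exact formula may differ.

```shell
# Hypothetical helpers for sanity-checking the Run Statistics table.

# $1 = bytes transferred, $2 = elapsed time in seconds; prints Gb/s.
run_throughput_gbps() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", bytes * 8 / secs / 1e9 }'
}

# $1 = local ("master") value, $2 = remote value; prints the gap in percent.
# Assumption: the difference is taken relative to the local value.
percent_difference() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a - b) / a * 100 }'
}

run_throughput_gbps 13825212416 32.39   # prints: 3.41
percent_difference 843827 843443        # prints: 0.05
```

Both results agree with the sample table: 3.41Gb/s of throughput and a 0.05% gap between the operation counts reported by the two sides.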

NETWORK TEST TYPE

The network test type determines the path that packets take to reach their destination. There are three basic ways a client and server can communicate.

1. Using HostNetwork

When the hostnetwork argument in the CR is set to true, the uperf pods can directly access the network interface of the node on which they are running. Letting a pod access, or be reachable on, the node’s interface is a privileged operation on OpenShift. Of the three types, this one yields the highest throughput between client and server. To use this type, adjust the CR as follows:

<snip>  
  workload:
    cleanup: false
    name: uperf
    args:
      serviceip: false
      hostnetwork: true
</snip>

2. Using SDN

This is the default type, and it uses the overlay network, which can be OpenShiftSDN or OVNKubernetes. Traffic flows through whichever overlay network the user chooses. At the time of writing, OpenShiftSDN is the default SDN used by OpenShift; it configures an overlay network using Open vSwitch (OVS). To use this type, set serviceip: false and hostnetwork: false.

3. Using Service

A Service is an abstract way to expose an application running on a set of pods as a network service. Pods are expendable resources that are deleted and restarted time and again. A restarted pod can be assigned to any node in the cluster and receives a new pod IP address, which is a problem for an application that reaches the pod through its IP address. A Service helps here because it is not deleted along with the pods; in this case, the client and server are connected through a Service. To use this type, set serviceip: true and hostnetwork: false.
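The three combinations of serviceip and hostnetwork described above can be summarized in a small lookup. This is only a sketch: the function name uperf_network_args and the short type names (hostnetwork, sdn, service) are hypothetical labels chosen for this example. The two printed lines are the values to place under args in the uperf CR.

```shell
# Hypothetical helper: print the two CR args for a given network test type.
# The type names are labels used in this example only.
uperf_network_args() {
    case "$1" in
        hostnetwork) printf 'serviceip: false\nhostnetwork: true\n' ;;
        sdn)         printf 'serviceip: false\nhostnetwork: false\n' ;;
        service)     printf 'serviceip: true\nhostnetwork: false\n' ;;
        *)           echo "unknown network type: $1" >&2; return 1 ;;
    esac
}

# Usage:
#   uperf_network_args service
```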

For further fine-tuning of the Ripsaw CR and network performance results, please check out Benchmarking OpenShift Network Performance Part 2: Deep Dive.