Iniciar sesión / Registrar Cuenta

People working in the telco space with DPDK accelerated Open vSwitch have probably seen questions about packet drops a number of times and with multiple customers. Is it OVS-DPDK, or is it the VNF? Why is it happening, and where? What happened, and more importantly, what can we do about it? In this post we'll look at troubleshooting and answering some of these questions.

About three years ago, at OVScon 2017, Ericsson came up with a proposal made of two parts:

  • Exposing more advanced metrics from the PMD Threads

  • Improve how drop packets are classified in Open vSwitch

Since then (and many OVS-DPDK escalations after) the upstream community (including a number of Red Hatters) has worked really hard. So it was clear, we had to improve the usability and the observability features of OVS-DPDK.

We're blind when it comes to troubleshooting, and this has to change.

Let's start with the basics, what does the modern OVS-DPDK architecture look like? (Keep in mind, this won't take into consideration HWOffload and AFXDP scenarios) 

Figure 1 - OVS-DPDK architecture

Figure 1 - OVS-DPDK architecture

Figure 2 - OVS-DPDK trouble spots

Figure 2 - OVS-DPDK trouble spots

This post will break down the new features added to OVS, when those features are useful for troubleshooting and  how to consume them.

1. OVS-DPDK PMD Performance Metrics

Okay, so this is a big one and probably the most complex to understand of the series. Ericsson pitched it highly as "The main tool for troubleshooting OVS-DPDK data plane issues."

So what does it do?

  • Provides detailed statistics of what is happening in every iteration of a PMD thread, recording:

    • Cycles (aka the CPU Frequency also exposed as such)

    • Packets

    • RX Batches

    • packets/batch

    • max vhost user qlen (queue length)

    • upcalls and the cycles spent in upcalls

  • Those statistics are also exposed as histograms in three different forms, which can be extremely useful, especially to spot recurring patterns over time: 

    • Histograms since the statistic reset (or Open vSwitch start).

    • Histograms for the last second in slices of ms.

    • Histograms for the last 999 iterations in slices per iteration.

  • As if above wouldn't be enough, the icing on the cake is the detection and logging of suspicious PMD iterations, or in other words, instead of scanning hours of histograms, OVS can log suspicious iterations given some patterns:

    • PMD Iteration lasts longer than US_THR microseconds (default 250)

    • Max vhost qlen exceeds a threshold Q_THR (default 128)

These Open vSwitch improvements come from experience in the field where real-life work helped highlight aspects to keep track of through the DPDK. And that's the most important feature for me. Being able to see and correlate events over time (e.g. upcalls every 1ms and dropped packets) is so powerful that it will dramatically shorten the RCA. These advanced PMD performance metrics are disabled unless the operator turns them on. There is a 1%-ish performance impact. To enable it.

ovs-vsctl set Open_vSwitch . other_config:pmd-perf-metrics=true

To see the statistics together with the histogram:

ovs-appctl dpif-netdev/pmd-perf-show

The command is quite useful since the stats are available also per PMD (specifying the common -pmd [core] argument) and the cleanup is consistent with the current CLI, using:

ovs-appctl dpif-netdev/pmd-stats-clear

Let's break it down a bit.

1.1 Global PMD Stats

The top part reports global PMD statistics, similar to pmd-stats-show but way easier to understand. In my case, a single PMD thread is displayed.

ovs-appctl dpif-netdev/pmd-perf-show -nh -pmd 2

Time: 14:02:49.792
Measurement duration: 74.074 s

pmd thread numa_id 0 core_id 2:

  Iterations:           253645450  (0.29 us/it)
  - Used TSC cycles: 182424040882  ( 98.5 % of total cycles)
  - idle iterations:    246279232  ( 42.4 % of used cycles)
  - busy iterations:      7366218  ( 57.6 % of used cycles)
  Rx packets:           148150420  (2000 Kpps, 709 cycles/pkt)
  Datapath passes:      148150420  (1.00 passes/pkt)
  - EMC hits:            46902786  ( 31.7 %)
  - SMC hits:           101243794  ( 68.3 %)
  - Megaflow hits:           3832  (  0.0 %, 1.00 subtbl lookups/hit)
  - Upcalls:                    8  (  0.0 %, 27.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:           148146584  (2000 Kpps)
  Tx batches:            11667397  (12.70 pkts/batch)

There is a clear report of:

  • How long a PMD iteration takes on average (0.29 us).

  • From the total number of iterations (253645450), how many are busy (7366218) and idle (246279232) ones, in other words, does the PMD have any spare capacity?

  • How many are the used TSC cycles from a given CPU, in other words, the number of the CPU clock cycles.

  • How many receiving packets per second (2000 Kpps) and the number of cycles spent per packet.

  • How many passes per packet (also known as recirculation — none in this case with a value of 1.00 passes/pkt) happen in the datapath and statistics about the caching utilization (EMC, SMC, Megaflow, Upcall and any lost ones).

  • How many transmitting packets per second (2000 Kpps), the number of batches and of course, per batch how many packets are sent.

1.2 Performance Histogram

The performance histogram is not as intuitive at first but it includes a great deal of detail about performance over time.

ovs-appctl dpif-netdev/pmd-perf-show -pmd 2

Time: 14:02:49.796
Measurement duration: 74.078 s

pmd thread numa_id 0 core_id 2:

  Iterations:           253661585  (0.29 us/it)
  - Used TSC cycles: 182436009747  ( 98.5 % of total cycles)
  - idle iterations:    246294875  ( 42.4 % of used cycles)
  - busy iterations:      7366710  ( 57.6 % of used cycles)
  Rx packets:           148160167  (2000 Kpps, 709 cycles/pkt)
  Datapath passes:      148160167  (1.00 passes/pkt)
  - EMC hits:            46905591  ( 31.7 %)
  - SMC hits:           101250736  ( 68.3 %)
  - Megaflow hits:           3832  (  0.0 %, 1.00 subtbl lookups/hit)
  - Upcalls:                    8  (  0.0 %, 27.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:           148156331  (2000 Kpps)
  Tx batches:            11668188  (12.70 pkts/batch)

Histograms
   cycles/it              packets/it             cycles/pkt             pkts/batch             max vhost qlen         upcalls/it             cycles/upcall
   499       244862637    0         246295034    100       0            0         0            0         253661747    0         253661739    999       0
   716       634491       1         1371421      1096      4883015      1         3269163      1         0            1         8            1258      0
   1025      725571       2         376039       2093      1115087      2         1308309      2         0            2         0            1584      0
   1469      70524        3         276151       3090      1023449      3         914742       3         0            3         0            1995      0
   2104      76371        4         249456       4086      331190       4         848946       4         0            4         0            2511      0
   3014      955012       5         215626       5083      10891        5         703371       5         0            5         0            3162      0
   4317      865926       6         178136       6080      2417         6         632981       6         0            6         0            3981      0
   6183      602917       7         147110       7076      532          7         598099       7         0            7         0            5011      0
   8857      619503       8         133248       8073      116          8         563776       8         0            8         0            6309      0
   12686     881332       9         126671       9070      7            9         544639       9         0            9         0            7943      0
   18171     948439       10        115439       10066     0            10        541891       10        0            10        0            9999      0
   26027     1019270      11        120137       11063     0            11        596121       11        0            11        0            12589     0
   37279     1210263      13        267800       12060     0            12        644960       12        0            12        0            15848     0
   53395     189486       17        574838       13056     1            13        640917       13        0            13        0            19952     0
   76480     1            22        490689       14053     0            14        604726       16        0            14        0            25118     0
   109544    1            28        546363       15050     1            16        1114554      20        0            15        0            31622     2
   156903    1            35        498462       16046     2            17        398686       25        0            16        0            39810     2
   224736    4            45        636232       17043     0            18        354995       31        0            17        0            50118     2
   321895    0            57        542871       18040     0            19        300376       38        0            18        0            63095     2
   461058    0            72        500027       19036     0            20        285025       48        0            19        0            79432     2
   660385    0            92        0            20033     0            21        256811       59        0            20        0            99999     4
   945886    0            117       0            21030     0            22        263800       73        0            21        0            125892    2
   1354815   0            148       0            22026     0            23        254168       91        0            22        0            158489    0
   1940535   0            188       0            23023     0            24        265690       113       0            23        0            199526    0
   2779476   0            239       0            24020     0            25        226204       140       0            24        0            251188    0
   3981111   0            303       0            25016     0            26        220191       174       0            25        0            316227    0
   5702243   0            385       0            26013     0            27        208123       216       0            26        0            398107    0
   8167461   0            489       0            27010     0            28        228785       268       0            27        0            501187    0
   11698454  0            621       0            28006     0            29        209380       332       0            28        0            630957    0
   16755981  0            788       0            29003     0            30        222237       412       0            29        0            794328    0
   24000000  0            1000      0            30000     0            32        1815253      512       0            30        0            1000000   0
   >         0            >         0            >         5            >         0            >         0            >         0            >         0
----------------------------------------------------------------------------------------------------------------------------------------------------------
   cycles/it              packets/it             cycles/pkt             pkts/batch             vhost qlen             upcalls/it             cycles/upcall
   719                    0.58409                708                    12.69560               0.00000                0.00000                67425

So what's the meaning of all of this?

  • Cycles per iteration (cycles/it): this histogram tells us how the CPU and memory subsystem performs, how the NICs (different vendor, VHU etc) performs and how the features enabled in Open vSwitch are affecting the number of cycles required to complete an iteration.

    • In above example, most of the time 499 cycles is what is required to complete most of the iterations (499 cycles per 244862637) but together with the global PMD stats, it's clear those are mostly the idle iterations (idle iterations: 246294875). Then we can see that actually, most busy iterations require between 26027 and 37279 cycles as those have respectively 1019270 and 1210263 iterations

  • Packets per iteration (packets/it): this histogram tells us how OVS enabled features and the traffic profile can cause packet processing to take more time. 

    • In the above example, clearly the lower the number of packets (1 packet per 1371421 iterations) the more inefficient this processing was and a possible cause could be an Upcall but let's wait for the next section for that. 

  • Cycles per packet (cycles/pkt): this histogram tells us how CPU and OVS enabled features are affecting the packet processing.

    • In above example, clearly most packet, 4883015, have spent only 1096 cycles each

  • Packets per batch (pkts/batch): this histogram tells us how many packets can be pushed per batches

    • In above examples, there is no definitive answer, but clearly 3269163 times only a single packet has been pushed which sounds quite bad compared with the opposite side of the histogram where 1815253 times 32 packets have been sent together. To explain this column further, for example, for every PMD iteration OVS counts the number of packets processed and increases the ‘batch’ (+1)  for that packet count. 

  • Upcalls per iteration (upcalls/it): this histogram tells us how many iterations have made an Upcall

    • In above examples, it's flat and only 8 iterations have made an Upcall

  • Cycles per Upcall (cycles/upcall): this histogram tells us similarly to the other one, how my cycles are spent for Upcall processing

    • Being the above example flat also the cycles have little value here

  • At the bottom per histogram, we also have the average values 

The max vhost qlen requires a different example and is one of if not the most important histogram in this series

  • Depth of vHost queue (max vhost qlen): this histogram tells us an extremely important value about how OVS performs. The queue length represents the buffer of virtual interfaces being used to push packets from the VM to OVS. In the event the buffer is full,  when, for example, the DPDK application can’t keep up, there is a packet drop and OVS, on the host, is completely unaware of this drop (which is only truly visible in the VM's counter, which could be a dpdk app or the kernel) 

    • In above original examples, all is zero because that PMD only handles physical interfaces, see the below example which is for a PMD dedicated to a single VHU. We can clearly see how most of the time the buffer is empty (which is very good) and the peaks are mostly between 1 and 12

ovs-appctl dpif-netdev/pmd-perf-show -pmd 8

Time: 10:33:23.320
Measurement duration: 124.554 s

pmd thread numa_id 0 core_id 8:

  Iterations:           278641911  (0.44 us/it)
  - Used TSC cycles: 304655522174  ( 97.8 % of total cycles)
  - idle iterations:    232566988  ( 23.5 % of used cycles)
  - busy iterations:     46074923  ( 76.5 % of used cycles)
  Rx packets:           305809148  (2455 Kpps, 762 cycles/pkt)
  Datapath passes:      305809148  (1.00 passes/pkt)
  - EMC hits:           305809148  (100.0 %)
  - SMC hits:                   0  (  0.0 %)
  - Megaflow hits:              0  (  0.0 %, 0.00 subtbl lookups/hit)
  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:           305809148  (2455 Kpps)
  Tx batches:            46074923  (6.64 pkts/batch)

Histograms
   cycles/it              packets/it             cycles/pkt             pkts/batch             max vhost qlen         upcalls/it             cycles/upcall
   499       230082962    0         232567031    100       0            0         0            0         232567031    0         278641961    999       0
   716       2348794      1         4429713      1096      37488609     1         4429713      1         4429713      1         0            1258      0
   1025      126360       2         4357395      2093      7723788      2         4357395      2         4357395      2         0            1584      0
   1469      440700       3         4716953      3090      857961       3         4716953      3         4716953      3         0            1995      0
   2104      4178725      4         7991733      4086      4527         4         7991733      4         7991733      4         0            2511      0
   3014      7490999      5         2139353      5083      13           5         2139353      5         2139353      5         0            3162      0
   4317      10602715     6         2041110      6080      11           6         2041110      6         2041110      6         0            3981      0
   6183      12747816     7         1821996      7076      4            7         1821996      7         1821996      7         0            5011      0
   8857      7443233      8         9572653      8073      5            8         9572653      8         9572653      8         0            6309      0
   12686     1538068      9         1506048      9070      2            9         1506048      9         1506048      9         0            7943      0
   18171     560169       10        1028548      10066     0            10        1028548      10        1028548      10        0            9999      0
   26027     1081352      11        630210       11063     0            11        630210       11        630210       11        0            12589     0
   37279     56           13        2996375      12060     4            12        2636106      12        2636106      12        0            15848     0
   53395     16           17        999776       13056     1            13        360269       13        360269       13        0            19952     0
   76480     0            22        437824       14053     0            14        203801       16        908323       14        0            25118     0
   109544    0            28        290066       15050     1            16        704523       20        467773       15        0            31622     0
   156903    0            35        1115181      16046     0            17        91453        25        238188       16        0            39810     0
   224736    0            45        0            17043     0            18        60406        31        168761       17        0            50118     0
   321895    0            57        0            18040     0            19        45100        38        39675        18        0            63095     0
   461058    0            72        0            19036     0            20        270814       48        229133       19        0            79432     0
   660385    0            92        0            20033     1            21        34224        59        234547       20        0            99999     0
   945886    0            117       0            21030     0            22        27280        73        306394       21        0            125892    0
   1354815   0            148       0            22026     0            23        23247        91        220944       22        0            158489    0
   1940535   0            188       0            23023     2            24        132301       113       28756        23        0            199526    0
   2779476   0            239       0            24020     1            25        21136        140       347          24        0            251188    0
   3981111   0            303       0            25016     0            26        18417        174       4            25        0            316227    0
   5702243   0            385       0            26013     0            27        17121        216       2            26        0            398107    0
   8167461   0            489       0            27010     0            28        77844        268       0            27        0            501187    0
   11698454  0            621       0            28006     0            29        17250        332       0            28        0            630957    0
   16755981  0            788       0            29003     0            30        17516        412       0            29        0            794328    0
   24000000  0            1000      0            30000     0            32        1080415      512       0            30        0            1000000   0
   >         0            >         0            >         0            >         0            >         0            >         0            >         0
----------------------------------------------------------------------------------------------------------------------------------------------------------
   cycles/it              packets/it             cycles/pkt             pkts/batch             vhost qlen             upcalls/it             cycles/upcall
   1093                   1.09750                761                    6.63722                1.20933                0.00000                0

Here is the history of the past 10 milliseconds:

ovs-appctl dpif-netdev/pmd-perf-show -nh -ms 10 -pmd 8

Time: 10:59:10.376
Measurement duration: 125.730 s

pmd thread numa_id 0 core_id 8:

  Iterations:           106715062  (1.16 us/it)
  - Used TSC cycles: 309408267307  ( 98.4 % of total cycles)
  - idle iterations:     88882527  (  8.7 % of used cycles)
  - busy iterations:     17832535  ( 91.3 % of used cycles)
  Rx packets:           308696611  (2455 Kpps, 915 cycles/pkt)
  Datapath passes:      308696611  (1.00 passes/pkt)
  - EMC hits:             4013658  (  1.3 %)
  - SMC hits:           290122445  ( 94.0 %)
  - Megaflow hits:       14560489  (  4.7 %, 1.00 subtbl lookups/hit)
  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:           308696592  (2455 Kpps)
  Tx batches:            17832535  (17.31 pkts/batch)

   ms             iterations   cycles/it    Kpps         cycles/pkt   pkts/batch   vhost qlen   upcalls      cycles/upcall
   8582460        713          3431         2496         910          17           4            0            0
   8582459        1009         2447         2416         913          17           3            0            0
   8582458        711          3464         2500         916          17           4            0            0
   8582457        979          2517         2428         910          18           2            0            0
   8582456        859          2869         2468         911          16           3            0            0
   8582455        934          2631         2442         907          17           2            0            0
   8582454        817          3005         2453         919          14           3            0            0
   8582453        1022         2404         2441         897          17           3            0            0
   8582452        961          2556         2460         899          15           3            0            0
   8582451        890          2751         2472         899          16           3            0            0

And also the past 10 iterations:

ovs-appctl dpif-netdev/pmd-perf-show -nh -it 10 -pmd 8

Time: 10:59:12.890
Measurement duration: 128.244 s

pmd thread numa_id 0 core_id 8:

  Iterations:           108763877  (1.16 us/it)
  - Used TSC cycles: 315598998711  ( 98.4 % of total cycles)
  - idle iterations:     90590023  (  8.7 % of used cycles)
  - busy iterations:     18173854  ( 91.3 % of used cycles)
  Rx packets:           314870545  (2455 Kpps, 915 cycles/pkt)
  Datapath passes:      314870545  (1.00 passes/pkt)
  - EMC hits:             4093873  (  1.3 %)
  - SMC hits:           295925740  ( 94.0 %)
  - Megaflow hits:       14850900  (  4.7 %, 1.00 subtbl lookups/hit)
  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:           314870513  (2455 Kpps)
  Tx batches:            18173854  (17.33 pkts/batch)

   iter                cycles       packets      cycles/pkt   pkts/batch   vhost qlen   upcalls      cycles/upcall
   41217710251         28588        32           893          32           60           0            0
   41217710250         27551        32           860          32           65           0            0
   41217710249         26861        32           839          32           67           0            0
   41217710249         26861        32           839          32           67           0            0
   41217710248         28118        32           878          32           76           0            0
   41217710247         28515        32           891          32           78           0            0
   41217710246         28267        32           883          32           80           0            0
   41217710245         27079        32           846          32           87           0            0
   41217710245         27079        32           846          32           87           0            0
   41217710244         30394        32           949          32           80           0            0

Now let's do some analysis of the following traffic profile:

  • 8Mpps (the following PMD is only for a single DPDK Physical port and a single VHU, and there are two PMD not reported here, so the total N/S is actually 8Mpps).

  • 64B Packet size.

  • 51K flows (as couples of different source and destination IP addresses).

  • UDP packets to simulate GTP (Ethernet, VLAN, IPv4, UDP, random payload).

The idea of this test is to generate some realistic traffic, overloading Open vSwitch a bit beyond its capacity at zero loss  in order to generate some loss  roughly 20 packets dropped per million or 0.00002% drop rate but not a completely unreasonable situation. This is actually a very common situation in the industry. Like a specific VNF was tested with a given traffic profile but then down the road, the packet rate has increased beyond what originally expected (especially in times where people are more and more dependent on the Internet).

The traffic has been generated using OPNFV NFVBench 4.2.2 and routed back by FD.IO VPP 20.05.

ovs-appctl dpif-netdev/pmd-perf-show -nh -pmd 2

Time: 11:25:13.273
Measurement duration: 547.862 s

pmd thread numa_id 0 core_id 2:

  Iterations:            49240680  (11.03 us/it)
  - Used TSC cycles: 1357982526312  ( 99.1 % of total cycles)
  - idle iterations:      4595286  (  0.1 % of used cycles)
  - busy iterations:     44645394  ( 99.9 % of used cycles)
  Rx packets:          2191447001  (4000 Kpps, 619 cycles/pkt)
  Datapath passes:     2191447280  (1.00 passes/pkt)
  - EMC hits:           693804346  ( 31.7 %)
  - SMC hits:          1497614519  ( 68.3 %)
  - Megaflow hits:          28345  (  0.0 %, 1.06 subtbl lookups/hit)
  - Upcalls:                   38  (  0.0 %, 19.6 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:          2191421051  (4000 Kpps)
  Tx batches:            87439714  (25.06 pkts/batch)

ovs-appctl dpif-netdev/pmd-perf-show -ms 2 -it 2 -pmd 8

Time: 11:25:13.278
Measurement duration: 547.867 s

pmd thread numa_id 0 core_id 8:

  Iterations:           294461622  (1.83 us/it)
  - Used TSC cycles: 1348161965122  ( 98.4 % of total cycles)
  - idle iterations:    225396737  (  5.0 % of used cycles)
  - busy iterations:     69064885  ( 95.0 % of used cycles)
  Rx packets:          2191437619  (4000 Kpps, 584 cycles/pkt)
  Datapath passes:     2191437619  (1.00 passes/pkt)
  - EMC hits:           687977303  ( 31.4 %)
  - SMC hits:          1503460316  ( 68.6 %)
  - Megaflow hits:              0  (  0.0 %, 0.00 subtbl lookups/hit)
  - Upcalls:                    0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:               0  (  0.0 %)
  Tx packets:          2191437619  (4000 Kpps)
  Tx batches:            69064885  (31.73 pkts/batch)

Histograms
   cycles/it              packets/it             cycles/pkt             pkts/batch             max vhost qlen         upcalls/it             cycles/upcall
   499       224972808    0         225396737    100       0            0         0            0         225396737    0         294461624    999       0
   716       191768       1         26927        1096      68968043     1         26927        1         26927        1         0            1258      0
   1025      222053       2         32381        2093      70884        2         32381        2         32381        2         0            1584      0
   1469      8787         3         38275        3090      25489        3         38275        3         38275        3         0            1995      0
   2104      1875         4         13463        4086      471          4         13463        4         13463        4         0            2511      0
   3014      39868        5         11784        5083      1            5         11784        5         11784        5         0            3162      0
   4317      68769        6         11977        6080      0            6         11977        6         11977        6         0            3981      0
   6183      44705        7         12200        7076      0            7         12200        7         12200        7         0            5011      0
   8857      100338       8         18046        8073      0            8         18046        8         18046        8         0            6309      0
   12686     268388       9         16991        9070      0            9         16991        9         16991        9         0            7943      0
   18171     7600634      10        17344        10066     0            10        17344        10        17344        10        0            9999      0
   26027     60941048     11        18199        11063     0            11        18199        11        18199        11        0            12589     0
   37279     88           13        53839        12060     0            12        28223        12        28223        12        0            15848     0
   53395     498          17        138721       13056     0            13        25616        13        25616        13        0            19952     0
   76480     0            22        274966       14053     0            14        26656        16        98674        14        0            25118     0
   109544    0            28        650540       15050     0            16        72018        20        193576       15        0            31622     0
   156903    0            35        67729238     16046     0            17        40047        25        380445       16        0            39810     0
   224736    0            45        0            17043     0            18        41946        31        1571912      17        0            50118     0
   321895    0            57        0            18040     0            19        43953        38        539715       18        0            63095     0
   461058    0            72        0            19036     0            20        67630        48        613924       19        0            79432     0
   660385    0            92        0            20033     0            21        59682        59        1388530      20        0            99999     0
   945886    0            117       0            21030     0            22        61755        73        7550979      21        0            125892    0
   1354815   0            148       0            22026     0            23        64090        91        13169439     22        0            158489    0
   1940535   0            188       0            23023     0            24        109018       113       19170248     23        0            199526    0
   2779476   0            239       0            24020     0            25        85900        140       17599579     24        0            251188    0
   3981111   0            303       0            25016     0            26        90086        174       6168106      25        0            316227    0
   5702243   0            385       0            26013     0            27        94130        216       344121       26        0            398107    0
   8167461   0            489       0            27010     0            28        207316       268       4188         27        0            501187    0
   11698454  0            621       0            28006     0            29        401431       332       30           28        0            630957    0
   16755981  0            788       0            29003     0            30        393401       412       0            29        0            794328    0
   24000000  0            1000      0            30000     0            32        66934409     512       0            30        0            1000000   0
   >         0            >         0            >         0            >         0            >         0            >         0            >         0
----------------------------------------------------------------------------------------------------------------------------------------------------------
   cycles/it              packets/it             cycles/pkt             pkts/batch             vhost qlen             upcalls/it             cycles/upcall
   4578                   7.44218                584                    31.73013               9.03423                0.00000                0

   iter                cycles       packets      cycles/pkt   pkts/batch   vhost qlen   upcalls      cycles/upcall
   43061652739         18391        32           574          32           88           0            0
   43061652739         18391        32           574          32           88           0            0

   ms             iterations   cycles/it    Kpps         cycles/pkt   pkts/batch   vhost qlen   upcalls      cycles/upcall
   10145363       432          5691         3976         595          31           27           0            0
   10145362       246          9979         4024         601          31           57           0            0

Let's do some analysis:

  • The first PMD thread (id 2) for the physical interface is utilized at 99.9% as "busy iterations" reports, same story for the physical core with a TSC cycle of 99.1% as reported by "Used TSC cycles."

  • The second PMD thread (id 8) for the VHU interface is utilized at 95% as "busy iterations" reports, while the TSC cycle reports 98.4% utilization. For this report both the last two milliseconds (-ms 2) and the last two iterations (-it 2) will also be displayed.  

  • EMC hit is below 32% while SMC is giving a tangible boost being above 68% hit rate. The MegaFlow is effectively not utilized (which in turns means good for the performance).

  • Because the cache is so efficient, no upcall is happening.

  • Both PMD Threads handle 4Mpps each, the first one for the physical interface can push (TX towards the physical fabric) batches of 25 packets while the second one for the virtual interface can push (TX towards the VM) batches of 32 packets (easier or faster in virtual than in physical but also DPDK driver and NIC family can affect this).

  • On the histogram, the queue depth is averaging at around 9.03, many are actually staying there as the queue depth is between 91 and 140 for a considerably high number of packets 

    • The histogram of the past 2 iteration reports higher average queue depth. 

    • While the on of the past 2 milliseconds is much lower in comparison.

    • When dealing with a bursty type of traffic (for example 3G where traditionally the Node B is sent irregularly even every a tenth of second), precision per millisecond and even per iteration can tell a very different story than the math mean.

Clearly the first PMD thread has no spare resources here. This together with the drop stats (down below) can give a clear idea about the current status in no time

1.3 What's an Upcall?

Well, this is something quite important in OVS as one of the main causes of drops and (lack of) scalability. OVS has a main caching layer called EMC where the action for packets a cached (the flows). If a flow for a specific packet is missing, the packet is sent to the ovs-vswitchd daemon in the slow path (figure 1) and then typically the datapath is populated with the related flow. Now the action of sending a packet is called upcall. Why is it important? Because the packet processing time increases dramatically. Here is where SMC comes into play, caching up to 1M flows (vs. 8k of EMC).

1.4 What’s an Iteration?

To better understand what an iteration is in OVS-DPDK (and more in general, in any DPDK Application) see the official DPDK documentation.

1.5 Suspicious Iteration

New with this release is the ability of the userspace datapath to supervise and detect suspicious iterations in PMD performance. This is the icing on the cake. First off, to enable it, run this ovs-appctl command:

ovs-appctl dpif-netdev/pmd-perf-log-set on

In this case the performance impact is more significant and in the patch was embedded a protection mechanism to disable the feature in the event of too many consumed resources.

Then analyze the ovs-vswitchd logs file (at /var/log/openvswitch/ovs-vswitchd.log) and look for "Neighborhood of suspicious iteration" we will be able to see something like this that was generated pushing 10Mpps to my VM.

WARN|Neighborhood of suspicious iteration:
   iter                cycles       packets      cycles/pkt   pkts/batch   vhost qlen   upcalls      cycles/upcall
   22834435908         14554        32           454          32           995          0            0
   22834435907         14639        32           457          32           1024         0            0
   22834435906         14079        32           439          32           1024         0            0
   22834435905         14552        32           454          32           1021         0            0
   22834435904         14515        32           453          32           1024         0            0
   22834435903         15115        32           472          32           1024         0            0
   22834435902         14697        32           459          32           1020         0            0
   22834435901         14197        32           443          32           1018         0            0
   22834435900         14612        32           456          32           1020         0            0
   22834435899         14506        32           453          32           1018         0            0
   22834435898         14978        32           468          32           1019         0            0

Being able to log weird iterations is nothing without being able to define what suspicions mean. Those are the supported options:

Argument

Default Value

Description

-b

5

The number of iterations before the suspicious iteration to be logged

-a

5

The number of iterations after the suspicious iteration to be logged

-e

false

Extend logging interval if another suspicious iteration is detected before logging occurs

-ne

true

Do not extend logging interval if another suspicious iteration is detected before logging occurs

-q

128

Suspicious vhost queue fills level threshold. Increase this to 512 if the Qemu supports 1024 virtio queue length

-us

250

Change the duration threshold for a suspicious iteration

Effectively what this tool does, is to log for you any suspicious activity without you staying and analyzing the real-time statistics. Also given the available time-stamp, correlating events (cron job ran, user activity, significant clock variation, packet burst, etc.) can be extremely useful.

1.6 Real-world

At OVScon 2017 Ericsson shared three real-world examples resolved with PMD performance metrics and supervision;  the last one is quite common. See slides 14 and 15 here.

  • 20-30% packet drop in conjunction with frequent upcalls due to the i40e driver wasting time checking the like state

  • Periodic packet drops in VMs due to VirtIO TX Queue Overrun result of Linux CFS balancing threads over the NUMA nodes 

  • Periodic bursts of packets dropped in VM due to VirtIO TX Queue Overrun result of QEMU Emulator Threads preempting the vCPUs where DPDK in the VM runs

For more information, see the Ericsson deck shared at OVScon 2017 available here, up to slide 15 as the rest is another feature further down.

2. OVS-DPDK Counters

While the previous patch focuses on exposing detailed statistics, the following patches expose counters (anybody familiar with network devices knows how fundamentals rates and counters are).

2.1 Current Status

The dump-ports command has been available for some time:

ovs-ofctl dump-ports br-int
OFPST_PORT reply (xid=0x2): 6 ports
  port "patch-tun": rx pkts=67, bytes=39152, drop=?, errs=?, frame=?, over=?, crc=?
           tx pkts=3216, bytes=213001, drop=?, errs=?, coll=?
  port  "vhu143d1144-93": rx pkts=83, bytes=9669, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=67, bytes=39152, drop=0, errs=?, coll=?
  port LOCAL: rx pkts=0, bytes=0, drop=0, errs=0, frame=0, over=0, crc=0
           tx pkts=0, bytes=0, drop=3144, errs=0, coll=0
  port  "vhufafd17b2-ed": rx pkts=4598724880, bytes=257533869688, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=4648833689, bytes=260339038168, drop=0, errs=?, coll=?
  port  "int-br-link0": rx pkts=9296687968, bytes=557811267416, drop=?, errs=?, frame=?, over=?, crc=?
           tx pkts=9126896579, bytes=511115758012, drop=?, errs=?, coll=?
  port  "vhu85ef82f8-1e": rx pkts=4529490571, bytes=253655745272, drop=0, errs=0, frame=?, over=?, crc=?
           tx pkts=4649180271, bytes=260359422240, drop=0, errs=?, coll=?

All clear, right?

There is a simple shell script to better expose those values. It yields better-organized data, not more.

take_net_stats_dpdk.sh --duration 60

########## Open vSwitch DPDK STATS over 60 seconds ##########
                Port             TX             RX       TX Drops       RX Drops      TX Errors      RX Errors         TX PPS         RX PPS    TX & RX PPS   TX Drops PPS   RX Drops PPS
              br-int              0              0            120              0              0              0              0              0              0           2.00              0
            br-link0             62              2              0              0              0              0           1.03           0.03           1.06              0              0
              br-tun              0              0              0              0              0              0              0              0              0              0              0
               dpdk0      240016980      240020001              0           5226              0              0     4000283.00     4000333.35     8000616.35              0          87.10
               dpdk1      240015499      240020134              0           5126              0              0     4000258.31     4000335.56     8000593.88              0          85.43
          ovs-netdev              0              0              0              0              0              0              0              0              0              0              0
      vhu1c8db502-c9              0              0              0              0              0              0              0              0              0              0              0
      vhu49d8cd2f-24      240017011      240015522              0              0              0              0     4000283.51     4000258.70     8000542.21              0              0
      vhub80d9af0-79      240016911      240016896              0              0              0              0     4000281.85     4000281.60     8000563.45              0              0

2.2 vHost TX Retries

The problem statement: when sending a batch of packets to a vhost-user or vhost-user-client interface, it may happen that some but not all of the packets in the batch are able to be sent to the guest. This is often because there aren't enough free descriptors in the virtqueue for all the packets in the batch to be sent. 

In this case, there will be a retry, with a default maximum of 8 occurring. If at any time no packets can be sent, it may mean the guest is not accepting packets, so there are no (more) retries. In turn, this means that the buffers in the guest are full or the packet processing reached 100% and no more packets can be received. Looking at the worst scenario, the guest may be wrongly configured hence not able to accept more packets.

To see the TX Retries counter:

ovs-vsctl get Interface vhu85ef82f8-1e statistics:ovs_tx_retries
0

No matter how hard I push the system, I couldn't generate any retries (in my scenario the guest is a DPDK application, on a kernel one the result might be different).

2.3 vHost TX Contention

The problem statement: Multiple PMD threads may need to write to the same vHost queue. Writing on a VHU queue in an atomic action and only one PMD Thread can write at the same time, effectively all the others waste CPU cycles waiting for it. This is defined as a spinlock. When it happens, packets may be dropped.

To see the TX Contention (This system was pushed well beyond the limit)

ovs-appctl coverage/show | grep -E "Event|vhost_tx_contention"
Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=a213c5ea:
vhost_tx_contention        0.0/sec     0.000/sec    15538.8247/sec   total: 55939769

With OVS 2.13, there is an ad-hoc command to interrogate a specific counter.

ovs-appctl coverage/read-counter vhost_tx_contention

2.4 vHost TX IRQs

The problem statement: When OVS-DPDK writes to a vHost queue that is handled in by the guest's kernel space, it needs to send an IRQ notifying the guest that some packets are available. The DPDK library responsible for this action is eventfd_write().

VNF handles the traffic on DPDK accelerated interfaces while the management traffic is most of the time in kernel space. From a NEPs perspective, leveraging the kernel TCP/IP stack in the VM is actually a big plus, and also, let's be clear, managing SSH and other remote configuration protocol access does not require the effort of writing an entire TCP/IP stack using DPDK. 

The downside is that when a VNF vendor uses this management interface for many more activities (remote access, remote configuration, monitoring, config synchronization, time sync, clustering, remote health check, data replication, etc.) then OVS-DPDK (but actually any DPDK Accelerated vSwitch/vRouter) will spend a considerable amount of time sending those IRQ notifications which in turns slows down the PMD Thread(s) affecting other interfaces which can result in packet drops. This counter has this exact aim at tracking down this issue.

To see the TX IRQs, in this case just an initial SSH config, NTP running in background and that's about it.

ovs-appctl coverage/show | grep -E "Event|vhost_notification"
Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=a213c5ea:
vhost_notification         0.0/sec     0.067/sec        0.0225/sec   total: 81

But what happens when running a heavy application over the kernel?

ovs-appctl coverage/show | grep -E "Event|vhost_notification" Event coverage, avg rate over last: 5 seconds, last minute, last hour, hash=9a784f6c: vhost_notification 8734.2/sec 6511.350/sec 479.0725/sec total: 1746806

2.5 vHost Detailed Packet Drop Statistics

This patch adds four new counters for the vHost interfaces:

Category

Counter

Description

Transmit

ovs_tx_failure_drops

Packet drop due to unable to transmit (to VHU)

e.g. the guest VHU TX queue is full, the guest packet processing is over max capacity, etc

MTU

ovs_tx_mtu_exceeded_drops

Packet drop due to packet length bigger than the guest configured MTU size for a given interface

QoS Policy

ovs_tx_qos_drops

Packet drop due to QoS Egress policy for a given interface

e.g. transmitting (to VHU) higher rate than configured traffic shaping one (sending 100Mbps while shipping at 80Mbps)

QoS Policy

ovs_rx_qos_drops

Packet drop due to QoS Ingress policy for a given interface

e.g. receiving (to OVS) higher rate than configured traffic shaping one (sending 50Mbps while shipping at 40Mbps)

All the counters are available underneath the VHU (again, not quite simple to drop packets here)

ovs-vsctl get Interface vhufafd17b2-ed statistics:ovs_tx_failure_drops
0
ovs-vsctl get Interface vhufafd17b2-ed statistics:ovs_tx_mtu_exceeded_drops
0
ovs-vsctl get Interface vhufafd17b2-ed statistics:ovs_tx_qos_drops
0
ovs-vsctl get Interface vhufafd17b2-ed statistics:ovs_rx_qos_drops
0

2.6 Userspace Improved Packet Drop Statistics

Now we arrive at the big patch, this one adds 24 new counters.

Here is an approximate description:

Category

Counter

Description

Upcall drops

datapath_drop_upcall_error

Drop packet due to error in the Upcall process

Upcall drops

datapath_drop_lock_error

Drop packet due to Upcall lock contention

RX drops

datapath_drop_rx_invalid_packet

Drop invalid packet having size lower than what wrote in the Ethernet header

Datapath drops

datapath_drop_meter

Drop packet in the OpenFlow (1.3+) Meter Table

Datapath drops

datapath_drop_userspace_action_error

Drop packet due to generic error executing the action

Datapath drops

datapath_drop_tunnel_push_error

Drop packet due to error executing the tunnel push (aka encapsulation) action

Datapath drops

datapath_drop_tunnel_pop_error

Drop packet due to error executing the tunnel pop (aka decapsulation) action

Datapath drops

datapath_drop_recirc_error

Drop packet due to error in the recirculation (this can also happen in the tunnel pop action) 

Datapath drops

datapath_drop_invalid_port

Drop packet due to invalid port

Datapath drops

datapath_drop_invalid_tnl_port

Drop packet due to invalid tunnel port executing the pop action

Datapath drops

datapath_drop_sample_error

Drop packet due to sampling error

Datapath drops

datapath_drop_nsh_decap_error

Drop packet due to invalid NSH pop (aka decapsulation)

OpenFlow (action=drop)

drop_action_of_pipeline

Drop packet due to pipeline errors

e.g. error parsing datapath actions

OpenFlow (action=drop)

drop_action_bridge_not_found

Drop packet due to bridge not found but, at time of translation, existing

OpenFlow (action=drop)

drop_action_recursion_too_deep

Drop packet due to too many translations, this is also a system limit to protect translation from running too long or occupy too much space

OpenFlow (action=drop)

drop_action_too_many_resubmit

Drop packet due to too many resubmitted, this is also a system limit to protect translation from running too long or occupy too much space

OpenFlow (action=drop)

drop_action_stack_too_deep

Drop packet due to the stack consuming more than 64 kB, this is also a system limit to protect translation from running too long or occupy too much space

OpenFlow (action=drop)

drop_action_no_recirculation_context

Drop packet due to missing recirculation context (also look in the log where the context ID will be reported)

OpenFlow (action=drop)

drop_action_recirculation_conflict

Drop packet due to conflict in the recirculation

OpenFlow (action=drop)

drop_action_too_many_mpls_labels

Drop packet due to MPLS pop action can't be performed as it has more labels than the supported (in OVS 2.13 up to 3).

OpenFlow (action=drop)

drop_action_invalid_tunnel_metadata

Drop packet due to invalid GENEVE tunnel metadata

OpenFlow (action=drop)

drop_action_unsupported_packet_type

Drop packet due to packet not supported by the specific rule

  • Ethernet VLAN encapsulation is not supported for current ethernet type

  • Ethernet VLAN decapsulation cannot be done when VLAN is already present (like double encap without QinQ)

  • NSH encapsulation is not supported for the current packet type

  • Current packet type does not support decapsulation

OpenFlow (action=drop)

drop_action_congestion

Drop packet due to congestion ECN (Explicit Congestion Notification) mismatch

OpenFlow (action=drop)

drop_action_forwarding_disabled

Drop packet when the forward for a port is disabled (like when it's admin down) 

The counters are displayed through the usual coverage utility. Unless triggered, the counters are hidden. Also the only way to reset these counters is to restart ovs.

Here is an example when pushing 10Mpps with 1M flows:

ovs-appctl coverage/show|grep -E "Event|drop_action|datapath"
Event coverage, avg rate over last: 5 seconds, last minute, last hour,  hash=55d53f83:
datapath_drop_upcall_error   0.0/sec     0.000/sec        0.0108/sec   total: 39
datapath_drop_lock_error   0.0/sec     0.000/sec        0.0003/sec   total: 1
drop_action_of_pipeline  100.6/sec   100.967/sec       95.1669/sec   total: 342601

Still in this example, at the bottom on coverage/show, there is a hint saying most of the counters are still at zero with "72 events never hit."

3. Features Availability

Since the Fast DataPath delivery model is an incremental one, all the features mentioned above are available starting from the FDP 20.B or in other words using OVS 2.11.0-48 or above. While upstream everything is available with the 2.13 release. Breaking down the availability of feature by feature:

Feature

Upstream Git Commit

Upstream OVS

FDP Version

Downstream OVS

OVS PMD Performance Metrics

First and Second patch

2.10

FDP 18.09

2.10.0-0

vHost TX Retries

Many patches but the one exposing it is Here

2.12

FDP 20.A

2.11.0-23

vHost TX Contention

Here

2.13

FDP 20.A

2.11.0-28

vHost TX IRQs

Here

2.13

FDP 20.B

2.11.0-37

vHost Detailed Packet Drop Statistics

Here

2.13

FDP 20.B

2.11.0-41

Userspace Improved Packet Drop Statistics

Here

2.13

FDP 20.B

2.11.0-42

See here for the complete Fast DataPath matrix.

4. Wrap up

Telco customers on the TLS release, OSP13, are enjoying all of this with the z-stream 12 shipped in June together with FDP 20.D while for the ones on OSP16.0 (and upcoming 16.1), the FDP 20.B is already a thing since a few months and they just got the 20.D upgrade as well.

Before troubleshooting OVS-DPDK was either guesswork or plenty of Perf and GDB (raising massively the complexity and also the time required for the RCA), now we can safely say it's much easier with more hints and tools where the problem is.

I hope some improvements will be made in the way how counters are displayed and especially new features will not get merged without their respective counters. But for the time being, we should be happy with what we've got here.

5. References


About the author

Federico Iezzi is an open-source evangelist who has witnessed the Telco NFV transformation. Over his career, Iezzi achieved a number of international firsts in the public cloud space and also has about a decade of experience with OpenStack. He has been following the Telco NFV transformation since 2014. At Red Hat, Federico is member of the EMEA Telco practice as a Principle Architect.

Of interest

Noticias destacadas de su interés