How to configure MPLS with tc in the Linux kernel

The tc command is useful for configuring Multiprotocol Label Switching to test Linux kernel features.

Multiprotocol Label Switching (MPLS) is a telecom routing technique that uses labels to direct data between nodes. It is supported by the Linux networking stack, and many articles and tutorials have been written about how to configure it with ip route. However, you can also handle MPLS at a lower level with tc.

Real deployments typically use control plane software to configure MPLS dynamically. However, it's useful to be able to execute tc commands manually for learning, experimenting, and testing Linux kernel features.

This article explains how to match different fields in MPLS headers with tc-flower, covers the different MPLS actions that tc supports for adding, modifying, or removing MPLS headers, and finally shows how to encapsulate MPLS into the User Datagram Protocol (UDP).

All commands are based on Linux v5.14 and iproute2 v5.10.0 (you can use an iproute2 version older than the kernel because v5.10.0 implements all the required Netlink features). Also, the upcoming Red Hat Enterprise Linux (RHEL) 8.5 will have these features in tech preview; to use them, sysadmins will have to install the kernel-modules-extra package.

What is MPLS?

Typically, an MPLS header comes right after an Ethernet header. It is composed of one or more Label Stack Entries (LSE). An LSE is encoded as follows (diagram courtesy of RFC 5462):

  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Label
 |                Label                  | TC  |S|       TTL     | Stack
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Entry

                     Label:  Label Value, 20 bits
                     TC:     Traffic Class field, 3 bits
                     S:      Bottom of Stack, 1 bit
                     TTL:    Time to Live, 8 bits

The last LSE has the S bit set. The header type that follows the last LSE is inferred from its Label (this is typically IP or Ethernet). Labels have local significance only.
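
To make the bit layout above concrete, here is a minimal sketch that packs and unpacks one LSE using shell arithmetic. The function names lse_encode and lse_decode are made up for this illustration; they are not part of tc or iproute2.

```shell
# Pack the four LSE fields into one 32-bit word, following the RFC 5462 layout:
# Label (20 bits) | TC (3 bits) | S (1 bit) | TTL (8 bits)
lse_encode() {
    local label=$1 tc=$2 bos=$3 ttl=$4
    printf '0x%08x\n' $(( (label << 12) | (tc << 9) | (bos << 8) | ttl ))
}

# Split a 32-bit LSE word back into its four fields.
lse_decode() {
    local lse=$(( $1 ))
    printf 'label=%d tc=%d bos=%d ttl=%d\n' \
        $(( lse >> 12 )) $(( (lse >> 9) & 0x7 )) \
        $(( (lse >> 8) & 0x1 )) $(( lse & 0xff ))
}

lse_encode 100 2 1 64      # prints 0x00064540
lse_decode 0x00064540      # prints label=100 tc=2 bos=1 ttl=64
```

The values used here (Label 100, Traffic Class 2, S bit set, TTL 64) are the same ones matched by the first filter in the next section.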

Match MPLS packets with tc-flower

A qdisc is a scheduler and "the major building block on which all of Linux traffic control is built." Most examples in this article are applied on the ingress qdisc of eth0. Create this ingress qdisc:

    tc qdisc add dev eth0 ingress

This command shows how to match the different fields of the first LSE (the one at the top of the stack):

    tc filter add dev eth0 ingress protocol mpls_uc          \
      flower mpls_label 100 mpls_tc 2 mpls_bos 1 mpls_ttl 64 \
      action drop

In the above:

  • dev eth0: The networking device this rule applies to
  • ingress: Apply this rule on the ingress qdisc (incoming packets)
  • protocol mpls_uc: Match MPLS packets
  • flower: Name of the classifier used for matching (the mpls_* parameters following are specific to this classifier)
  • mpls_label 100: Match only if the first LSE has Label 100
  • mpls_tc 2: Match only if the first LSE has Traffic Class 2
  • mpls_bos 1: Match only if the first LSE has the S bit set (that is, the MPLS header contains exactly one LSE)
  • mpls_ttl 64: Match only if the first LSE has TTL 64
  • action drop: Drop packets that match this filter
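
When experimenting, it helps to check what is installed on the ingress qdisc and to start over from a clean state. The sketch below wraps two standard tc invocations in small helper functions. The function names and the TC variable are conventions invented for this example; setting TC=echo prints the commands instead of running them, which is handy because the real commands need root.

```shell
# Set TC=echo to print the commands instead of running them (they need root).
TC=${TC:-tc}

# Show the filters attached to the ingress qdisc, with statistics (-s).
show_ingress_filters() {
    $TC -s filter show dev "$1" ingress
}

# Delete the filters on the ingress qdisc, leaving the qdisc itself in place.
flush_ingress_filters() {
    $TC filter del dev "$1" ingress
}
```

For example, show_ingress_filters eth0 lists the drop filter created above, and the statistics should tell you whether packets are hitting it.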

These mpls_* parameters can match only the first LSE. For matching beyond the first LSE, you can use flower's more expressive syntax with the lse keyword:

    tc filter add dev eth0 ingress protocol mpls_uc \
      flower mpls lse depth 1 label 100 tc 2 ttl 64 \
                  lse depth 2 label 101 bos 1       \
      action drop

In the above:

  • mpls: Introduces a list of LSEs to match
  • lse: Introduces an LSE to match
  • depth 1: Focuses on the first LSE
  • label 100: Match only if the current LSE has Label 100
  • tc 2: Match only if the current LSE has Traffic Class 2
  • ttl 64: Match only if the current LSE has TTL 64
  • bos 1: Match only if the current LSE has the S bit set

This filter drops MPLS packets where the first LSE has Label 100, Traffic Class 2, and TTL 64 and is followed by exactly one LSE with Label 101 (whatever the value of its TC and TTL fields).

The bos 1 option makes the filter match only packets that have exactly two LSEs. You don't need to add a bos 0 option for the first LSE: Having another LSE with depth 2 means that it only considers MPLS packets that have at least two LSEs.

The depth option is mandatory for every lse. Otherwise, the filter wouldn't know which LSE to work on. The label, tc, bos, and ttl parameters are optional. Omitting them means any value will match.

The following filter drops all MPLS packets that have a stack of at least three LSEs:

    tc filter add dev eth0 ingress protocol mpls_uc \
      flower mpls lse depth 3                       \
      action drop

Another way to view it is that the filter matches (and drops) every MPLS packet where it finds an LSE at depth 3.

To match packets with exactly three LSEs, add the bos 1 option:

    tc filter add dev eth0 ingress protocol mpls_uc \
      flower mpls lse depth 3 bos 1                 \
      action drop

Conversely, you could use bos 0 to match packets with more than three LSEs. More technically, the filter would match MPLS packets where an LSE exists at depth 3, and that LSE doesn't have the S bit set.
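
The interaction between depth and bos can be summarized with a toy model. The function below is only an illustration of the rules described above, written for this article; it is not how the kernel implements the match, and its name is hypothetical.

```shell
# Toy model of "lse depth D [bos B]" matching (illustration only).
#   $1: number of LSEs in the packet's label stack
#   $2: depth given in the filter
#   $3: optional bos value given in the filter (0 or 1)
lse_depth_matches() {
    local stack_size=$1 depth=$2 want_bos=${3:-}
    # No LSE exists at that depth: the filter cannot match.
    [ "$depth" -le "$stack_size" ] || return 1
    # Without a bos option, the existence of the LSE is enough.
    [ -n "$want_bos" ] || return 0
    # Only the last LSE of the stack has the S bit set.
    local s_bit=0
    if [ "$depth" -eq "$stack_size" ]; then
        s_bit=1
    fi
    [ "$want_bos" -eq "$s_bit" ]
}

for size in 2 3 4; do
    if lse_depth_matches "$size" 3 1; then
        echo "stack of $size LSEs: 'lse depth 3 bos 1' matches"
    else
        echo "stack of $size LSEs: 'lse depth 3 bos 1' does not match"
    fi
done
```

In this loop, only the stack of three LSEs matches. Replacing the last argument with 0 makes only the stack of four match, and omitting it matches both three and four.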

The maximum depth the flower classifier can handle is 7.

Do simple MPLS actions

So far, I've only used one of the simplest possible actions: dropping packets. Now I'll introduce some MPLS-specific actions.

For example, the following filter matches all IPv6 packets whose destination address belongs to 2001:db8::/32 and encapsulates the IP packet into MPLS with Label 100, Traffic Class 2, and TTL 64:

    tc filter add dev eth0 ingress protocol ipv6 \
      flower dst_ip 2001:db8::/32                \
      action mpls push label 100 tc 2 ttl 64

There's normally no need to specify the bos option for the new LSE, as it should be correctly inferred from the original packet type.

Adding a new LSE on top of other ones is also supported:

    tc filter add dev eth0 ingress protocol mpls_uc \
      flower mpls_label 100                         \
      action mpls push label 101 tc 3 ttl 64

This command pushes a new LSE (Label 101, Traffic Class 3, TTL 64) on top of MPLS packets whose first LSE has Label 100.

Another very common action is to remove (pop) the MPLS header:

    tc filter add dev eth0 ingress protocol mpls_uc \
      flower mpls_label 100 mpls_bos 1              \
      action mpls pop protocol ipv6

Since MPLS labels have only local significance, tc and the networking stack have no way to figure out which type of header follows the MPLS header. That's why pop requires the protocol option to tell the kernel how to handle the resulting packet. The mpls_bos 1 option ensures the LSE you're removing is the only one in the stack.

If there are more LSEs after the one to remove, use:

    tc filter add dev eth0 ingress protocol mpls_uc flower \
      mpls_label 100 mpls_bos 0                            \
      action mpls pop protocol mpls_uc

You can also remove multiple LSEs:

    tc filter add dev eth0 ingress protocol mpls_uc \
      flower                                        \
        mpls lse depth 1 label 101                  \
             lse depth 2 label 102                  \
             lse depth 3 label 103 bos 1            \
      action mpls pop protocol mpls_uc              \
      action mpls pop protocol mpls_uc              \
      action mpls pop protocol ipv6
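
The pattern is regular: every pop except the last one leaves an MPLS packet, and the last one exposes the payload. As a sketch, a small helper (hypothetical, not part of tc) can generate the chain of actions for a stack of any size:

```shell
# Print the chain of "mpls pop" actions needed to remove a whole label stack.
#   $1: number of LSEs to pop
#   $2: protocol of the payload that follows the last LSE (ipv6, ip, teb, ...)
mpls_pop_actions() {
    local count=$1 payload=$2 i=1
    while [ "$i" -le "$count" ]; do
        if [ "$i" -lt "$count" ]; then
            # More LSEs remain after this pop: the packet is still MPLS.
            printf 'action mpls pop protocol mpls_uc '
        else
            # Last LSE: what remains is the payload.
            printf 'action mpls pop protocol %s\n' "$payload"
        fi
        i=$(( i + 1 ))
    done
}

mpls_pop_actions 3 ipv6
```

With these arguments, the helper prints the same three actions as the filter above, on a single line. You could append its output to a tc filter add command through an unquoted command substitution, so that the shell splits it back into separate words.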

Apart from adding or removing LSEs, you can also modify the topmost LSE with the modify action:

    tc filter add dev eth0 ingress protocol mpls_uc flower \
      mpls_label 100 mpls_tc 2 mpls_ttl 64                 \
      action mpls modify label 101 tc 3

This command modifies the first LSE's label and traffic class but leaves the TTL as is.

Do advanced MPLS actions

The push action inserts an LSE right after the Ethernet header. But sometimes MPLS is used to encapsulate packets together with their Ethernet header. The mac_push action was developed for this use case. It behaves like push but adds the LSE before the MAC header (usually Ethernet) instead of after it.

Keep in mind that after the mac_push action is applied, the packet no longer has a MAC header, since it now starts with an MPLS header (the original MAC header is just payload now). Therefore, you also need to push a new Ethernet header. That's what the push_eth action of tc-vlan does: It takes the source and destination MAC addresses as parameters, but not the Ethertype, as the kernel sets it automatically. It looks like this:

    tc filter add dev eth0 ingress                   \
      matchall                                       \
      action mpls mac_push label 200 ttl 64          \
      action vlan push_eth dst_mac 00:00:5e:00:53:01 \
                           src_mac 00:00:5e:00:53:00 \
      action mirred egress redirect dev eth1

This command encapsulates all packets arriving on eth0, including their Ethernet header, into MPLS (with Label 200 and TTL 64), then adds an external Ethernet header and finally forwards the resulting packet over eth1.
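
Because the original frame is kept whole, the resulting frame is larger than the one that arrived. The following back-of-the-envelope sketch computes the new size, assuming an untagged 14-byte outer Ethernet header and 4 bytes per LSE (the function name is made up for this example):

```shell
# Size of the frame sent out after "mpls mac_push" followed by "vlan push_eth".
#   $1: size of the original frame, Ethernet header included
#   $2: number of LSEs pushed (default: 1)
mac_push_frame_size() {
    local inner=$1 lses=${2:-1}
    local eth_hdr=14 lse_size=4   # untagged Ethernet header; one LSE is 32 bits
    echo $(( eth_hdr + lses * lse_size + inner ))
}

# A 1500-byte IP packet plus its 14-byte Ethernet header:
mac_push_frame_size 1514   # prints 1532
```

In this example, the outer Ethernet header is followed by 1518 bytes, which is more than the usual 1500-byte MTU. This is worth keeping in mind when sizing the MTU of the devices along the path.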

The opposite operation requires dropping the outer Ethernet header and the MPLS one. The pop_eth action takes care of the Ethernet part. For MPLS, you can use the normal mpls pop action, with protocol teb (Transparent Ethernet Bridging) to tell the kernel that what follows is an Ethernet header:

    tc filter add dev eth0 ingress protocol mpls_uc \
      flower mpls_label 200 mpls_bos 1              \
      action vlan pop_eth                           \
      action mpls pop protocol teb                  \
      action mirred egress redirect dev eth1

This filter matches incoming MPLS packets on eth0 with Label 200, then drops the outer Ethernet header, drops the MPLS header (you know there's only one LSE to pop as you've specified mpls_bos 1), and finally forwards the resulting packet over eth1.

Note that the pop_eth action requires that the packet have no VLAN header. If the outer Ethernet header has some VLANs, you'd have to drop them first with some vlan pop actions before the vlan pop_eth one.
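
As an untested sketch of that case, assume the outer Ethernet header carries exactly one 802.1Q tag. The filter then has to match on the VLAN protocol, look for MPLS behind the tag with vlan_ethtype, and pop the tag before popping the Ethernet header. The helper name and the TC variable are inventions of this example; setting TC=echo prints the command instead of running it.

```shell
# Set TC=echo to print the command instead of running it (it needs root).
TC=${TC:-tc}

# Decapsulate Ethernet-over-MPLS frames whose outer header has one 802.1Q tag.
#   $1: ingress device, $2: MPLS label to match, $3: egress device
add_vlan_decap_filter() {
    $TC filter add dev "$1" ingress protocol 802.1Q \
        flower vlan_ethtype mpls_uc mpls_label "$2" mpls_bos 1 \
        action vlan pop \
        action vlan pop_eth \
        action mpls pop protocol teb \
        action mirred egress redirect dev "$3"
}
```

Calling add_vlan_decap_filter eth0 200 eth1 would install the VLAN-aware counterpart of the previous filter. Frames with two stacked tags would need a second vlan pop action and a matching change to the flower keys.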

Encapsulate MPLS into UDP

To encapsulate MPLS into UDP, first create a bareudp device:

    ip link add name bareudp0 up type bareudp dstport 6635 ethertype mpls_uc

In the above:

  • name bareudp0: Name of the new device
  • up: Make the new device usable immediately
  • type bareudp: The new device's type (the arguments that follow are specific to this device type)
  • dstport 6635: The UDP port used (6635 is the IANA assigned port for MPLS-in-UDP)
  • ethertype mpls_uc: MPLS isn't the only protocol supported by bareudp, therefore you must explicitly tell which protocol is handled by this device
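
To confirm that the device was created with the expected settings, or to remove it after an experiment, you can use the usual ip link subcommands. The sketch below wraps them in two helper functions whose names, like the IP variable, are made up for this example; setting IP=echo prints the commands instead of running them.

```shell
# Set IP=echo to print the commands instead of running them.
IP=${IP:-ip}

# Show the device with its type-specific details (-d).
show_bareudp() {
    $IP -d link show dev "$1"
}

# Remove the device once you are done experimenting.
del_bareudp() {
    $IP link del dev "$1"
}
```

For example, show_bareudp bareudp0 should display the device created above along with its bareudp parameters.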

Currently, bareudp works only in external mode (although the "external" keyword isn't explicitly passed as a parameter). An external virtual device doesn't have all the information necessary to build the outer headers; those need to be attached to each packet as metadata. That's what tc's tunnel_key action is for:

    tc filter add dev eth0 ingress protocol mpls_uc matchall  \
      action tunnel_key set src_ip 192.0.2.1 dst_ip 192.0.2.2 \
      action mirred egress redirect dev bareudp0

Here, the tunnel_key set action attaches outer source and destination IP addresses as metadata to each packet matching the filter (that is, all MPLS packets received on eth0). The second action just redirects the packets to the virtual bareudp0 device, which builds the outer headers based on its own configuration (for UDP ports) and the packets' metadata (for IP addresses), and finally routes the resulting packets to their destination.

It's also possible to apply filters on the bareudp device. The following command drops the MPLS header if the label is 200 and forwards the packet over eth1, assuming the payload is an Ethernet packet. This is the same filter as in the previous section but adapted for working on a bareudp device instead of an Ethernet one:

    tc qdisc add dev bareudp0 ingress
    tc filter add dev bareudp0 ingress protocol mpls_uc \
      flower mpls_label 200 mpls_bos 1                  \
      action mpls pop protocol teb                      \
      action mirred egress redirect dev eth1

Here, there's no need to drop any MAC header since the MPLS header directly follows UDP and is the first header seen by the bareudp device.

If the protocol following the MPLS header is IP, you have to push an Ethernet header before forwarding the packet:

    tc filter add dev bareudp0 ingress protocol mpls_uc \
      flower mpls_label 200 mpls_bos 1                  \
      action mpls pop protocol ipv6                     \
      action vlan push_eth dst_mac 00:00:5e:00:53:01    \
                           src_mac 00:00:5e:00:53:00    \
      action mirred egress redirect dev eth1

Learn more

Although tc might be intimidating at first, it has a rich set of features that can be combined to create a powerful MPLS data path.

You can find more information in the following docs: ip-link(8), tc(8), tc-flower(8), tc-matchall(8), tc-mirred(8), tc-mpls(8), tc-tunnel_key(8), and tc-vlan(8).


Guillaume Nault

Guillaume Nault is a C programmer with 10 years of experience in the networking field (user space and kernel space). He joined Red Hat in January 2019 and works on the Linux kernel networking stack.