A sysadmin's guide to troubleshooting VLANs
All but the smallest networks are typically split into Virtual Local Area Networks (VLANs, for short), and understanding how to properly troubleshoot them can save you hours of back-and-forth with your network team. In the previous articles, I covered VLAN basics and configuration for Red Hat Enterprise Linux (RHEL) systems. Here, I’ll cover basic troubleshooting steps that you can use to rule out host problems before moving on to the network. By the end of this article, you’ll be able to confidently rule out problems with your servers.
Now that you know how to configure VLANs, I'll spend some time discussing how to troubleshoot them. If you’re working in a small shop where you handle both the servers and network devices, then this troubleshooting information can help to validate your server config before moving on to network devices. For larger organizations with a dedicated networking team, this process can help to prove that the server configuration is correct.
VLAN trouble can be difficult to isolate, since you may not have visibility into the upstream network devices that your servers plug into. However, some basic troubleshooting techniques can help to identify a possible VLAN issue so you can provide your networking team with as much information as possible for proper troubleshooting.
VLAN issues show up as local area network problems. An obvious symptom is your local server being unable to reach other hosts on the same local network, such as your default gateway. Take a look at an example:
# ip route sh
default via 192.168.1.254 dev eth0
# ping 192.168.1.254
PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
From 192.168.1.25 icmp_seq=1 Destination Host Unreachable
From 192.168.1.25 icmp_seq=2 Destination Host Unreachable
From 192.168.1.25 icmp_seq=3 Destination Host Unreachable
From 192.168.1.25 icmp_seq=4 Destination Host Unreachable
^C
--- 192.168.1.254 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3000ms
pipe 4
# ip neighbor show
192.168.1.254 dev eth0 INCOMPLETE
In the example above, you clearly can’t ping the default gateway. The neighbor table also shows the gateway’s entry as INCOMPLETE, which means that ARP requests for that address are going unanswered. There’s likely some kind of Layer 2 issue, such as a VLAN configuration problem on the switch port (this possibility assumes that you’ve finished troubleshooting other aspects of the network, as discussed in our beginner’s article on network troubleshooting).
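If you run this check often, the gateway lookup and the neighbor-state check can be scripted. The following is a minimal sketch, not a definitive tool: the function names gateway_from_routes and neigh_state are my own, and the parsing assumes the iproute2 output format shown in the example above.

```shell
#!/usr/bin/env bash
# Sketch: find the default gateway and report its neighbor (ARP) state.
# gateway_from_routes and neigh_state are hypothetical helper names.

# Print the default gateway address from `ip route show` output on stdin.
gateway_from_routes() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Print the neighbor state for address $1 from `ip neighbor show` output
# on stdin. The state is the last field of the matching entry.
# Prints MISSING when there is no entry for the address at all.
neigh_state() {
    awk -v gw="$1" '
        $1 == gw { state = $NF }
        END { print (state != "" ? state : "MISSING") }'
}

# Live usage (requires iproute2 and a configured interface):
#   gw=$(ip route show | gateway_from_routes)
#   ping -c 2 -W 1 "$gw" >/dev/null 2>&1    # trigger ARP resolution
#   ip neighbor show | neigh_state "$gw"
```

A state of INCOMPLETE or FAILED after the ping points to the same unanswered-ARP condition as the example, while REACHABLE or STALE suggests that Layer 2 connectivity to the gateway exists.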
Troubleshooting this sort of issue with a packet capture can be instructive. Dig into the failure scenario above with tcpdump:
# ip --br addr show eth0
eth0 UP 192.168.1.25/24 fe80::5054:ff:fe82:d66e/64
# tcpdump -i eth0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
17:36:23.491195 ARP, Request who-has 192.168.2.254 tell 192.168.2.50, length 28
17:36:24.493148 ARP, Request who-has 192.168.2.254 tell 192.168.2.50, length 28
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
The above capture shows ARP traffic for a completely different subnet (192.168.2.X), even though eth0 is configured with 192.168.1.25/24. This result is a dead giveaway that you have a VLAN misconfiguration on your hands. If the upstream switch port is on the wrong VLAN, you could be seeing traffic for an entirely different local area network.
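Spotting a foreign subnet by eye works for a short capture, but on a busy segment it can help to filter for it. Here is a rough sketch that reads tcpdump's ARP lines and prints any sender address that falls outside the interface's own subnet. The function name foreign_arp_senders is hypothetical, and the parsing assumes the numeric ARP output format shown above (run tcpdump with -n so addresses aren't resolved to names).

```shell
#!/usr/bin/env bash
# Sketch: list ARP senders that are outside the local subnet.
# foreign_arp_senders is a hypothetical helper name.
# Usage: foreign_arp_senders ADDRESS/PREFIXLEN < tcpdump-output

foreign_arp_senders() {
    awk -v cidr="$1" '
    # Convert a dotted-quad IPv4 address to an integer.
    function ip2int(ip,   o) {
        split(ip, o, ".")
        return ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
    }
    BEGIN {
        split(cidr, c, "/")
        block = 2 ^ (32 - c[2])               # addresses per subnet
        localnet = int(ip2int(c[1]) / block)  # index of the local subnet
    }
    / ARP, Request who-has / {
        for (i = 1; i < NF; i++) {
            if ($i != "tell") continue
            sender = $(i + 1)
            sub(/,$/, "", sender)             # strip trailing comma
            if (sender == "0.0.0.0") continue # ARP probes carry no sender IP
            if (int(ip2int(sender) / block) != localnet && !(sender in seen)) {
                seen[sender] = 1
                print sender
            }
        }
    }'
}

# Live usage (interface name and address are taken from the example above):
#   tcpdump -l -n -i eth0 -c 20 arp 2>/dev/null | foreign_arp_senders 192.168.1.25/24
```

Any address this prints is a host that believes it shares a broadcast domain with your server but sits on a different subnet, which is worth including in the report to your networking team.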
If your interface configuration files look OK and you’ve performed the basic troubleshooting above, then it’s time to bring the issue to your network team. If the issue is with an access port and a single VLAN, then the upstream switch port may simply be on the wrong VLAN. If the issue is with a more complex trunk configuration, then several things could be wrong: the upstream switch port may not be configured with the correct VLANs, or the VLANs may not exist on the switch at all. Either way, these basic troubleshooting tools can help you to rule out your server and start looking at the network.
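For trunk setups, it helps to hand your network team the exact VLAN IDs that the server is tagging, as the kernel sees them rather than as the configuration files describe them. The sketch below pulls interface names and VLAN IDs out of `ip -d link show` output. The function name vlan_ids is my own, and the interface name eth0.10 and VLAN ID 10 in the usage note are illustrative assumptions, not values from this article's examples.

```shell
#!/usr/bin/env bash
# Sketch: print "interface vlan-id" pairs from `ip -d link show` output.
# vlan_ids is a hypothetical helper name.

vlan_ids() {
    awk '
    # Interface header line, for example "3: eth0.10@eth0: <...>".
    /^[0-9]+: / {
        name = $2
        sub(/:$/, "", name)      # drop the trailing colon
        sub(/@.*/, "", name)     # drop the "@parent" suffix
    }
    # Detail line, for example "vlan protocol 802.1Q id 10 <REORDER_HDR>".
    $1 == "vlan" {
        for (i = 2; i < NF; i++)
            if ($i == "id") print name, $(i + 1)
    }'
}

# Live usage (requires iproute2):
#   ip -d link show | vlan_ids
```

On a host with a tagged sub-interface named eth0.10 for VLAN 10, this would print `eth0.10 10`. If the IDs listed here don't match what the switch port is configured to carry, you've found a likely culprit.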
VLANs might seem complex to the network-uninitiated, but they’re just a simple way of dividing switches into multiple broadcast domains (and typically, IP subnets) for increased efficiency. Small networks may never move beyond a few VLANs, while complex data centers may have hundreds. In both cases, having a basic understanding of how to configure and troubleshoot VLANs can help you to more quickly isolate problems in your network and work toward the right resolution.
Anthony Critelli
Anthony Critelli is a Linux systems engineer with interests in automation, containerization, tracing, and performance. He started his professional career as a network engineer and eventually made the switch to the Linux systems side of IT. He holds a B.S. and an M.S.