If you have ever worked in the networking space, or in support for that matter, you probably have some good stories about troubleshooting a lost connection. Inevitably, a new system gets installed and configured, but traffic won't pass to or from it.
One such story I can recall goes something like this.
The use case
I was troubleshooting a back-end storage array that had just been installed by a large, well-known American financial corporation. The customer was trying to configure data replication from his production (PROD) site in Houston to a disaster recovery (DR) site in San Antonio. The disaster recovery system had long been implemented and had a reputation for being a known good config.
The production site was brand new, and of course, the cause of all of our problems. The real issue was that we were unable to get traffic moving between the two replication interfaces that we had configured. We were able to reach outside of the gateway on the DR site, but we were unable to reach outside of the PROD site. Traffic was hitting the gateway device and being dropped.
Within a half-hour of troubleshooting, I asked the client if they had a firewall at the PROD site that could be blocking traffic on the needed ports. "Of course not, there is no firewall between these sites." This response was shocking considering that this was one of the largest financial institutions in the entire country. But, as all customer-facing positions go, you have to be respectful and polite.
So, I ran through every check I could think of from my side. Internal storage firewall disabled: check. Ports open from DR to PROD: check. Ports open from PROD to DR? Nope.
As it turns out, after four hours of troubleshooting and reconfiguring interfaces, the customer said, "Let me get our firewall guy on the call."
Your what guy?
That is a weird position to have hired for, considering that you don't have a firewall between these sites. But it wasn't weird, because of course, they had a firewall. Issue solved. With the nightmare over, the tools I used to figure out where the issue occurred were good old-fashioned Telnet (which we can cover in a later article) and, of course, traceroute.
Now that you can see a clear use case for traceroute, let's talk about the command itself and what information you can get from it. The purpose of this article, after all, is for you to come away with a little more knowledge about the traceroute utility.
The syntax is rather simple. The command traceroute <x> (x here being an IP address or hostname) is the most basic version, and it begins sending packets toward the designated target. The result lets you trace the path that packets take from your machine through each of the systems between you and your desired destination.
For example, if I wanted to trace the path from my computer to google.com, I would enter something like this:
[root@rhel8dev ~]# traceroute www.google.com
traceroute to www.google.com (22.214.171.124), 30 hops max, 60 byte packets
 1  _gateway (192.168.2.1)  2.396 ms  2.726 ms  3.057 ms
 2  145.sub-66-174-43.myvzw.com (126.96.36.199)  119.355 ms  119.315 ms  119.508 ms
 3  * * *
 4  10.209.189.140 (10.209.189.140)  120.321 ms  119.836 ms  120.009 ms
 5  66.sub-69-83-106.myvzw.com (188.8.131.52)  119.042 ms  119.489 ms  119.156 ms
 6  2.sub-69-83-107.myvzw.com (184.108.40.206)  120.039 ms  125.954 ms  101.450 ms
 7  112.sub-69-83-96.myvzw.com (220.127.116.11)  110.757 ms  108.485 ms  122.108 ms
 8  112.sub-69-83-96.myvzw.com (18.104.22.168)  115.028 ms  121.073 ms  125.537 ms
 9  116.sub-69-83-96.myvzw.com (22.214.171.124)  121.793 ms  124.769 ms  124.434 ms
10  Bundle-Ether10.GW6.DFW13.ALTER.NET (126.96.36.199)  128.082 ms  128.400 ms  126.509 ms
11  google-gw.customer.alter.net (188.8.131.52)  106.276 ms  107.885 ms  105.718 ms
12  184.108.40.206 (220.127.116.11)  99.725 ms  101.797 ms 18.104.22.168 (22.214.171.124)  101.671 ms
13  126.96.36.199 (188.8.131.52)  101.207 ms  100.515 ms  99.730 ms
14  dfw06s48-in-f100.1e100.net (184.108.40.206)  99.059 ms  94.502 ms  94.015 ms
[root@rhel8dev ~]#
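If you want to work with output like this in a script, a small parser goes a long way. The following is a minimal Python sketch, assuming the Linux traceroute output format shown here: a hop number, then a "host (ip)" pair followed by timings in ms, or a * for each probe that got no reply. The names parse_hop, parse_traceroute, and average_rtt are hypothetical helpers for illustration, not part of any traceroute tool.

```python
import re
from statistics import mean

# A hop line starts with the hop number, e.g. " 1  _gateway (192.168.2.1) ..."
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# One token per match: a star, a "host (ip)" pair, or an "N ms" timing.
TOKEN_RE = re.compile(r"\*|(\S+) \(([^)]+)\)|([\d.]+) ms")


def parse_hop(line):
    """Return (hop_number, probes) or None if the line is not a hop line.

    Each probe is (host, ip, rtt_ms), or None for a '*' (no reply)."""
    match = HOP_RE.match(line)
    if match is None:
        return None  # header or shell prompt, not a hop
    probes = []
    host = ip = None
    for tok in TOKEN_RE.finditer(match.group(2)):
        if tok.group(0) == "*":
            probes.append(None)  # this probe got no reply
        elif tok.group(1) is not None:
            # A responder name; it applies to the timings that follow it.
            host, ip = tok.group(1), tok.group(2)
        else:
            probes.append((host, ip, float(tok.group(3))))
    return int(match.group(1)), probes


def parse_traceroute(text):
    """Parse full traceroute output into a list of hops."""
    return [hop for hop in map(parse_hop, text.splitlines()) if hop is not None]


def average_rtt(probes):
    """Mean RTT in ms over the probes that replied, or None if none did."""
    times = [probe[2] for probe in probes if probe is not None]
    return mean(times) if times else None
```

Fed the output above, this gives one entry per hop: hop 3 comes back as [None, None, None], and hop 12 keeps track of the two different devices that answered its probes.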
Let's break down these results into smaller bites. This command can produce a lot of information, and as the saying goes, "The best way to eat an elephant is one bite at a time":
[root@rhel8dev ~]# traceroute www.google.com
traceroute to www.google.com (220.127.116.11), 30 hops max, 60 byte packets
 1  _gateway (192.168.2.1)  2.396 ms  2.726 ms  3.057 ms
We are only looking at the first hop here, but that is enough to dissect the information on display. First up, we see what is actually being sent, and where:
traceroute to www.google.com (IP), 30 hops max, 60 byte packets
From this output, we gather that we are sending traffic to the desired target (www.google.com). By default, traceroute probes up to a maximum of 30 hops and sends 60-byte packets.
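If you want to pull these values out of the header line in a script, a regular expression is enough. This is a sketch that assumes the header is worded exactly like the one above; parse_header is a hypothetical name, and other traceroute implementations may format their header differently.

```python
import re

# Matches e.g. "traceroute to www.google.com (1.2.3.4), 30 hops max, 60 byte packets"
HEADER_RE = re.compile(
    r"traceroute to (?P<target>\S+) \((?P<ip>[^)]+)\), "
    r"(?P<max_hops>\d+) hops max, (?P<packet_bytes>\d+) byte packets"
)


def parse_header(line):
    """Extract target, resolved IP, hop limit, and packet size."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError(f"not a traceroute header: {line!r}")
    return {
        "target": match["target"],
        "ip": match["ip"],
        "max_hops": int(match["max_hops"]),
        "packet_bytes": int(match["packet_bytes"]),
    }
```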
Next comes the first hop. Here we are hitting our network's default gateway:
1 _gateway (192.168.2.1) 2.396 ms 2.726 ms 3.057 ms
You can tell where hop one actually landed, followed by three numerical values. These are Round-Trip Times (RTTs): the time it takes a probe packet to reach that hop plus the time for the hop's reply to make it back to the source. By default, traceroute sends three probe packets to each hop, which is why each line shows three values. You can find more information on this process online, but the short of it is this: each probe is sent with a limited Time-to-Live (TTL), and the device where that TTL runs out discards the probe and sends an ICMP "time exceeded" error message back to the source. Timing that reply is what allows traceroute to determine the RTT for the hop, so the error message is expected and does not necessarily indicate a problem.
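Behind those ICMP replies is the Time-to-Live (TTL) field carried by every IP packet. To make the mechanism a little more concrete, here is a toy Python simulation. It is a simplified model, not how traceroute is actually implemented: it assumes classic UDP probes, a fixed path, and devices that always reply, and the device names are made up. Each probe's TTL is one higher than the last; every router decrements it, the router where it reaches zero answers with "time exceeded," and once a probe survives all the way to the destination, the destination answers with "port unreachable," which tells traceroute to stop.

```python
def send_probe(path, ttl):
    """Simulate one probe along `path`, a list of device names whose last
    entry is the destination. Returns (responder, reply_type)."""
    for position, device in enumerate(path, start=1):
        if position == len(path):
            # The probe survived to the destination host; assuming UDP probes
            # to an unused port, the host replies "port unreachable".
            return device, "port unreachable"
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            # TTL expired here: this router drops the probe and replies.
            return device, "time exceeded"
    raise ValueError("path must contain at least the destination")


def trace(path, max_hops=30):
    """Send probes with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, reply = send_probe(path, ttl)
        hops.append((ttl, responder, reply))
        if reply == "port unreachable":
            break  # destination reached; stop probing
    return hops
```

Calling trace(["_gateway", "isp-router", "www.google.com"]) returns one entry per TTL: the first two devices answer with "time exceeded," and the third answers with "port unreachable," which ends the trace.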
Now, let's look at hops 2 to 4:
 2  145.sub-66-174-43.myvzw.com (18.104.22.168)  109.206 ms  109.400 ms  109.423 ms
 3  * * *
 4  10.209.189.140 (10.209.189.140)  124.793 ms  123.585 ms  124.585 ms
We can see something new here. Hop 2 looks normal: a device answers, with RTTs of roughly 109 milliseconds. Then, at hop 3, it gets interesting. We see only stars (*).
What do these stars (asterisks) mean? Were the packets dropped? Are they timed out?
Let me explain. There are two possibilities when it comes to these stars. First, the device at that hop may simply not be configured to reply to traceroute's ICMP/UDP probes. If the traceroute command completes successfully and you still see these stars, this is most likely the case, and it does not mean that the traffic wasn't passed. The second possibility is that the packets were dropped because of an issue on the network: usually the packets timed out, or the traffic was blocked by a firewall.
As you can see in the above example, even though hop 3 shows only stars, replies resume at hop 4 and the trace continues. This behavior leads to a successful traceroute: the full output shows that Google was reached at hop 14.
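The rule of thumb above can be written down as a small function. This is only a sketch: interpret_stars is a hypothetical helper, each hop is assumed to be given as a hop number plus a list of RTTs in ms (None for a star), and whether the trace completed is passed in by the caller. It encodes the same reasoning as this article, not a definitive diagnosis.

```python
def interpret_stars(hops, completed):
    """hops: list of (hop_number, [rtt_ms or None, ...]); None means '*'.
    completed: True if the trace reached the destination."""
    silent = [n for n, probes in hops if all(p is None for p in probes)]
    if not silent:
        return "every hop replied"
    if completed:
        # The destination answered, so traffic passed through the silent
        # hops; they most likely just don't reply to traceroute's probes.
        return f"silent but forwarding: {silent}"
    answered = [n for n, probes in hops if any(p is not None for p in probes)]
    last_answered = answered[-1] if answered else 0
    # Nothing past the last answering hop replied: packets may be timing out
    # or being blocked (for example, by a firewall) just beyond it.
    return f"replies stop after hop {last_answered}"
```

With hops 1 to 4 from the example above and completed=True, the function reports hop 3 as silent but forwarding.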
Traceroute can be an invaluable tool for troubleshooting network issues, because it helps you visualize where the issue is actually occurring. Of course, there is more going on behind the scenes of traceroute than was covered here.
If you want an even more in-depth look at this tool, I highly suggest doing some research online. There is a lot of information on Time-to-Live (TTL) and RTT that, in the interest of time, was not included in this article. My goal is that you now have a better understanding of when and why you should use the traceroute tool, and how to interpret the data that it offers. For more information on network troubleshooting and concepts, check out our related articles.