Voice over IP (VoIP) is a technology that carries phone calls over ordinary IP-based networks, such as the internet. You might associate phone systems with arcane, difficult-to-use technology, and for many years, that reputation was deserved. However, building a modern VoIP system doesn’t have to be difficult. Telephony is a complex discipline, but building a simple phone network that places inbound and outbound calls is within reach of anyone who takes the time to understand VoIP technology. With the foundation from this article, you can approach your next voice project with confidence in your knowledge of the basic protocols that make VoIP networks run.
At their core, VoIP networks are built on two types of protocols: signaling and media. If you’re unfamiliar with the idea of a network protocol, think of it like a common language that network devices use to talk with each other. You’ve interacted with several protocols just to read this article: IP to route your packets, DNS to resolve hostnames into numeric addresses, and HTTP to obtain the contents of this page.
In the context of a VoIP network, a signaling protocol is a language that handles call setup, teardown, and control. If phone A calls phone B, the two sides need a common language to exchange several pieces of information, either directly with each other or through their respective office phone systems:
- Who is the source and destination of the call?
- What codecs (ways of encoding voice and video data) does each side support?
- Has the other side hung up and completed the call?
Signaling protocols handle all of these questions. The signaling protocol you will most likely deal with is the Session Initiation Protocol (SIP), although you can also run into others, such as H.323 and Cisco’s Skinny Client Control Protocol (SCCP). Signaling might run over the Transmission Control Protocol (TCP) or over the User Datagram Protocol (UDP), which does not guarantee delivery. SIP, for example, commonly uses UDP or TCP on port 5060, or TLS on port 5061 for encrypted signaling.
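To make the first two questions concrete, here is a minimal Python sketch that assembles a SIP INVITE with a Session Description Protocol (SDP) body. The user names, addresses, tag, branch, Call-ID, and port are made-up example values, and a real phone sends more headers; the sketch only shows where the caller and callee (the From and To headers) and the codec offer (the `m=` and `a=rtpmap` lines) live in the message.

```python
# Minimal sketch of a SIP INVITE carrying an SDP offer (SIP: RFC 3261, SDP: RFC 4566).
# Every address, user name, tag, branch, and Call-ID here is a hypothetical example value.

def build_invite(caller: str, callee: str, local_ip: str, rtp_port: int) -> str:
    """Return a bare-bones INVITE whose SDP body offers G.711 u-law and A-law audio."""
    sdp = "\r\n".join([
        "v=0",
        f"o=- 1000 1000 IN IP4 {local_ip}",
        "s=VoIP call",
        f"c=IN IP4 {local_ip}",               # address where the caller wants to receive audio
        "t=0 0",
        f"m=audio {rtp_port} RTP/AVP 0 8",    # audio over RTP, payload types 0 and 8
        "a=rtpmap:0 PCMU/8000",               # codec option 1: G.711 u-law
        "a=rtpmap:8 PCMA/8000",               # codec option 2: G.711 A-law
    ]) + "\r\n"
    user = caller.split("@")[0]
    headers = "\r\n".join([
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:5060;branch=z9hG4bK776asdhds",  # signaling transport: UDP
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1928301774",   # who is calling
        f"To: <sip:{callee}>",                    # who is being called
        "Call-ID: a84b4c76e66710@example.test",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{user}@{local_ip}:5060>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode())}",   # body length in bytes
    ])
    return headers + "\r\n\r\n" + sdp

invite = build_invite("alice@example.test", "bob@example.test", "192.0.2.10", 40000)
print(invite)
```

The callee answers with its own SDP listing the codecs it accepts. When the call ends, either side sends a separate BYE request, which answers the third question in the list.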
Media protocols handle the transportation of the actual encoded audio or video that you want to send between endpoints. As each media packet reaches the other side of the conversation, it can be decoded and played back through your speaker (or screen, if you’re having a video call). In modern networks, you will typically see the Real-time Transport Protocol (RTP) used as the media protocol. Its sister protocol, the Real-time Transport Control Protocol (RTCP), is often grouped with the media protocols even though it carries no audio or video itself; instead, it reports statistics about call quality, such as packet loss and jitter.
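Each RTP packet starts with a small header that lets the receiver put the audio stream back together. The following Python sketch packs and parses the 12-byte fixed header defined in RFC 3550. It assumes no padding and no header extension, and the SSRC (stream identifier) and payload bytes are arbitrary example values.

```python
import struct

# Fixed 12-byte RTP header from RFC 3550, in network byte order:
# byte 0: version(2 bits) | padding(1) | extension(1) | CSRC count(4)
# byte 1: marker(1) | payload type(7); then sequence(16), timestamp(32), SSRC(32)
RTP_HEADER = struct.Struct("!BBHII")

def pack_rtp(payload_type: int, seq: int, timestamp: int, ssrc: int,
             payload: bytes, marker: bool = False) -> bytes:
    """Build an RTP packet with no padding, no header extension, and no CSRCs."""
    first = 2 << 6                                   # version 2, all other bits zero
    second = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(first, second, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

def unpack_rtp(packet: bytes) -> dict:
    """Parse the fixed header; assumes no padding or header extension is present."""
    first, second, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    csrc_count = first & 0x0F
    header_len = RTP_HEADER.size + 4 * csrc_count    # skip any CSRC identifiers
    return {
        "version": first >> 6,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }

# 20 ms of G.711 u-law (payload type 0) at 8,000 samples/s = 160 one-byte samples.
packet = pack_rtp(payload_type=0, seq=1, timestamp=160, ssrc=0x1234ABCD,
                  payload=b"\xff" * 160)
fields = unpack_rtp(packet)
print(len(packet), fields["payload_type"], fields["sequence"], fields["timestamp"])  # 172 0 1 160
```

The sequence number lets the receiver detect lost and reordered packets, and the timestamp (which advances by 160 per 20 ms packet in this example) tells it when each chunk of audio should be played.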
RTP typically runs on top of UDP using high-numbered ports. UDP is connectionless and does not retransmit lost packets. That might sound strange at first: Wouldn’t you want to resend any missed voice data so that you have a coherent conversation? Look more closely, though, and UDP makes sense. Voice is only useful if it arrives in time to be played, and a retransmitted packet usually shows up after the listener has already moved past that moment in the conversation. Would you rather hear a brief gap in the audio, or have the conversation stall while late words arrive out of sequence? You would almost certainly prefer the brief gap. A connection-oriented protocol such as TCP also delivers data strictly in order, so a single lost packet blocks everything behind it until the retransmission arrives. That delay would be terrible for call quality, so the less reliable but more timely UDP is preferred.
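The receiver-side logic behind this trade-off can be sketched as a simple fixed-delay playout buffer. In this hypothetical Python example, each packet carries 20 ms of audio and must arrive within a fixed delay of when it was sent; anything later, including a packet that would have been retransmitted, is replaced with silence instead of holding up the rest of the call. Real phones use adaptive jitter buffers and packet-loss concealment, so treat this as an illustration rather than an implementation.

```python
from dataclasses import dataclass

FRAME_MS = 20            # assumed packetization interval: each packet carries 20 ms of audio
SILENCE = "(silence)"    # placeholder; real phones use packet-loss concealment or comfort noise

@dataclass
class Packet:
    seq: int             # RTP-style sequence number
    arrival_ms: int      # arrival time, measured from when packet 0 was sent
    audio: str           # stand-in for the decoded audio

def playout(packets: list[Packet], count: int, delay_ms: int) -> list[str]:
    """Fixed-delay playout: frame i is due at i * FRAME_MS + delay_ms.

    Packets may arrive in any order; anything that misses its deadline is
    treated as lost, and the buffer never waits for a late or resent packet.
    """
    by_seq = {p.seq: p for p in packets}
    frames = []
    for i in range(count):
        deadline = i * FRAME_MS + delay_ms
        pkt = by_seq.get(i)
        if pkt is not None and pkt.arrival_ms <= deadline:
            frames.append(pkt.audio)
        else:
            frames.append(SILENCE)   # a 20 ms gap instead of a stalled conversation
    return frames

# Packet 2 overtakes packet 1, and packet 1 arrives too late (as a retransmission would).
arrivals = [Packet(0, 30, "A"), Packet(2, 75, "C"), Packet(1, 130, "B"), Packet(3, 95, "D")]
print(playout(arrivals, count=4, delay_ms=60))  # ['A', '(silence)', 'C', 'D']
```

Packet 2 arrived before packet 1 yet still plays in the right slot, because the sequence number, not the arrival order, determines its position. Raising `delay_ms` to 200 would let packet 1 play as well, at the cost of adding 200 ms of latency to every word; real jitter buffers try to balance those two costs.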
Understanding the protocols that traverse your network is important for avoiding common protocol-related pitfalls. If you have to perform packet captures, an understanding of the underlying protocols will help guide your packet collection and analysis. With this basic understanding, you can now take a look at two common issues experienced in simple VoIP networks.
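VoIP traffic struggles with network address translation (NAT) and overly aggressive firewalls. Phones behind a NAT gateway often have trouble establishing two-way audio. Part of the problem is that a phone’s signaling messages can advertise its private IP address and port as the place to send audio, and that address is unreachable from the internet. Even when the remote side sends audio to the gateway’s public address instead, UDP is connectionless, so the gateway typically forwards an inbound packet only if it already has a mapping (a "session") for that flow. The outbound audio stream might work, but the inbound audio is dropped at the NAT gateway, leaving you with one-way audio.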
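To see why the inbound leg fails, consider a toy model of how many NAT gateways track UDP flows. This Python sketch is a simplification under stated assumptions: the gateway creates a mapping only when a private host sends a packet out, and it forwards inbound packets only if they come from an address that host has already contacted. Real NAT behavior varies by vendor, and mappings also expire after an idle timeout, which this model ignores. All addresses are documentation examples.

```python
from __future__ import annotations

# Toy model of a NAT gateway's UDP mapping table. This is a simplification:
# real gateways differ in how they filter inbound packets, and mappings expire
# after an idle timeout, which this model ignores. All addresses are examples.
Addr = tuple[str, int]

class NatGateway:
    def __init__(self, public_ip: str, first_port: int = 50000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.by_private: dict[Addr, int] = {}                    # private addr -> public port
        self.by_public: dict[int, tuple[Addr, set[Addr]]] = {}   # public port -> (private addr, peers)

    def outbound(self, src: Addr, dst: Addr) -> Addr:
        """Translate an outgoing packet; create a mapping on first use and record the peer."""
        if src not in self.by_private:
            self.by_private[src] = self.next_port
            self.by_public[self.next_port] = (src, set())
            self.next_port += 1
        port = self.by_private[src]
        self.by_public[port][1].add(dst)
        return (self.public_ip, port)

    def inbound(self, src: Addr, dst: Addr) -> Addr | None:
        """Return the private destination for an incoming packet, or None if it is dropped."""
        if dst[0] != self.public_ip or dst[1] not in self.by_public:
            return None                              # no mapping exists: nowhere to forward it
        private, peers = self.by_public[dst[1]]
        return private if src in peers else None     # only peers the phone already contacted

nat = NatGateway("203.0.113.5")
phone = ("10.0.0.20", 40000)            # phone's private RTP address
carrier = ("198.51.100.7", 30000)       # carrier's media address

# Before the phone sends anything, inbound audio has no mapping and is dropped.
print(nat.inbound(carrier, ("203.0.113.5", 40000)))   # None
# The phone's outbound audio creates a mapping on the gateway...
public = nat.outbound(phone, carrier)
# ...so packets from that exact peer now reach the phone,
print(nat.inbound(carrier, public))                   # ('10.0.0.20', 40000)
# but media from any other address or port is still dropped.
print(nat.inbound(("198.51.100.7", 30002), public))   # None
```

In this model, inbound audio gets through only after the phone has sent a packet to that exact address and port, which is why one-way audio is such a common symptom behind NAT.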
Technologies such as Session Traversal Utilities for NAT (STUN), which lets a phone discover the public address and port that its NAT gateway has assigned to it, exist to help work around these problems. However, it’s often best to avoid the problems altogether: Organizations frequently deploy their phone system with one interface on the public internet (or behind a 1:1 NAT) and then heavily firewall that public IP address so that the system can only talk to an upstream IP telephony provider. Carefully designed firewall rules are critical for protecting your phone system, especially if you need to expose part of it to the public internet. Ensure that your rules permit access only from the appropriate IP addresses, such as those of your external phone carrier.
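The decision that such a rule set makes can be modeled in a few lines of Python. This is not firewall configuration; it is a sketch of a default-deny inbound policy, and the carrier networks, SIP port, and RTP port range are all assumed example values that your carrier and phone system would define in practice.

```python
import ipaddress

# Hypothetical carrier networks; use the addresses your provider actually publishes.
CARRIER_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/28"),
    ipaddress.ip_network("203.0.113.64/27"),
]
SIP_PORTS = {5060}                # assumed: SIP over UDP/TCP on the standard port
RTP_PORTS = range(10000, 20001)   # assumed RTP range; must match your phone system's settings

def allow_inbound(src_ip: str, dst_port: int) -> bool:
    """Default-deny decision: accept only SIP or RTP traffic that comes from the carrier."""
    addr = ipaddress.ip_address(src_ip)
    from_carrier = any(addr in net for net in CARRIER_NETWORKS)
    voip_port = dst_port in SIP_PORTS or dst_port in RTP_PORTS
    return from_carrier and voip_port

print(allow_inbound("198.51.100.5", 5060))   # True: carrier signaling
print(allow_inbound("192.0.2.99", 5060))     # False: unknown source probing SIP
print(allow_inbound("203.0.113.70", 15000))  # True: carrier media in the RTP range
print(allow_inbound("198.51.100.5", 22))     # False: carrier address, but not a VoIP port
```

Both conditions must hold: traffic from the carrier on an unrelated port is refused, and so is SIP traffic from anyone else. You would express the same policy in your firewall’s own rule syntax.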
Overzealous firewalls are another common source of VoIP woes. Sometimes they drop traffic for reasons similar to NAT: Outbound traffic is permitted, but no corresponding rule or connection state exists to let the inbound traffic back in. Next-generation firewalls often apply opaque rules for packet inspection and intelligent traffic decisions, and those rules might need special configuration to handle VoIP traffic correctly. If you suspect that your firewall is causing a VoIP issue, see whether you can temporarily bypass its advanced traffic-processing rules to test your theory.
In this article, we looked at the basic protocols that underpin a Voice over IP system. Familiarity with these protocols sets you up for success in understanding how an IP telephony system works, and that knowledge is valuable for designing, implementing, and especially troubleshooting one. In the following articles, we will implement a VoIP system using the open source Asterisk software on Linux.