There are two supported protocols in Red Hat Enterprise Linux for synchronization of computer clocks over a network. The older and more well-known protocol is the Network Time Protocol (NTP). In its fourth version, NTP is defined by IETF in RFC 5905. The newer protocol is the Precision Time Protocol (PTP), which is defined in the IEEE 1588-2008 standard.
The reference implementation of NTP is provided in the ntp package. Starting with Red Hat Enterprise Linux 7.0 (and now in Red Hat Enterprise Linux 6.8) a more versatile NTP implementation is also provided via the chrony package, which can usually synchronize the clock with better accuracy and has other advantages over the reference implementation. PTP is implemented in the linuxptp package.
With two different protocols designed for synchronization of clocks, there is an obvious question as to which one is
better. PTP was designed for local networks with broadcast/multicast transmission and, in ideal conditions, the system clock can be synchronized with sub-microsecond accuracy to the reference time. NTP was primarily designed for synchronization over the Internet using unicast, where it can usually achieve accuracy in the single-digit millisecond range. As it's currently implemented in chrony and ntp, in local networks (again, in ideal conditions) the accuracy can get within tens of microseconds. However, accuracy in ideal conditions isn't the only thing that matters. There are usually other criteria that need to be considered - criteria like resiliency, security, and cost. As will be explained later, each protocol has some advantages over the other and the best choice actually may be to use them both (at the same time) in order to combine their respective advantages.
The basic principles of the two protocols are the same. Computers or other devices that have a clock are connected in a network and form a hierarchy of time sources in which time is distributed from top to bottom. The devices on top are normally synchronized to a reference time source (e.g. a timing signal from a GPS receiver). Devices "below" periodically exchange timestamps with their time sources in order to measure the offset of their clocks. The clocks are continuously adjusted to correct for random variations in their rate (due to effects like thermal changes) and to minimize the observed offset.
In NTP, one level of the hierarchy is called stratum. The devices on top are stratum 1 servers, below them are stratum 2 clients, which are servers to stratum 3 clients, and so on. In PTP there are slaves, which are synchronized to their masters. Each communication path has one master and its slaves can be masters on other communication paths. The master on top is called grandmaster (GM). A device that has ports in two or more communication paths (i.e. it can be a slave and also master of other slaves at the same time) is a boundary clock (BC). Clocks with one port are ordinary clocks (OC). The group of all clocks that are directly or indirectly synchronized to each other using the protocol is called a PTP domain.
Read more about optimizing performance for the open-hybrid enterprise.
The first important difference between NTP and PTP is in how time sources are selected when multiple paths to the top of the hierarchy are available. In NTP, the selection happens on the client side. An NTP client is receiving timestamps from all of its possible sources and it's up to the client to decide which sources are acceptable for synchronization. The servers just tell the client what time they think it is and its maximum expected error. The client checks if the measurements overlap and a majority of the sources agree on what time it is. Sources that are off are rejected as serving false time (falsetickers). From the remaining sources are selected sources with best statistics and shortest paths to the reference time sources on stratum 1 servers. The measurements are combined and the result is used for the adjustment of the local clock. This source selection process / procedure makes the protocol very resilient against failures. The client will stay synchronized for as long as it has enough good sources to outvote falsetickers. With three sources, it can detect one falseticker, with five sources it can detect two falsetickers, and so on.
In PTP, each slave has only one master and there is only one grandmaster in a PTP domain. Masters on the communication paths are selected by a Best Master Clock (BMC) algorithm, which runs independently on each clock in the domain. There is no negotiation between the clocks. The current master announces attributes of its grandmaster (class, accuracy, priority, etc.) to other clocks on the path and if there is a clock with a better grandmaster, or is closer to the same grandmaster in the network topology, it will switch to the master state and the previous master will stop when it sees it's no longer the best master on the communication path. The selection may take several iterations before it stabilizes, but ultimately there will be just one grandmaster and all slaves will be synchronized with it through masters on the shortest paths.
When a network link or master fails, another clock on a path to the grandmaster can take over. When the grandmaster fails, another clock with a reference time source can be the grandmaster. There are optional mechanisms in PTP that allow fast reselection. But the assumption here is that it fails completely, or at least is able to detect its failure and stop working. There is no resiliency against other failures. When the synchronization of a (grand)master fails or degrades for some reason (e.g. its clock becomes unstable or a network link becomes congested or asymmetric), the master will still be the clock with the best attributes on the communication path and all clocks synchronized to it will fail with it. A single failure can disrupt synchronization in whole PTP domain, even if there are redundant network paths and multiple clocks with a reference time source.
In order to have resiliency against any kind of failure it's necessary to run multiple PTP domains on separate networks, where clocks on bottom of the hierarchy are connected to all of them and implement some logic for their selection. Ideally, PTP devices in different domains should be from different vendors to avoid simultaneous failures (e.g. due to bugs in handling of rare events like GPS week rollover, leap seconds, etc.).
Synchronization in NTP
NTP has three synchronization modes, namely: a client/server mode, a symmetric mode between two peers, and a broadcast mode. There are differences in how they measure the offset and network delay.
The most commonly used mode is the client/server mode. In this mode, the client periodically sends a request to the server using a client mode packet and the server responds immediately with a server mode packet. From each exchange the client gets four timestamps: time when the request was sent (t1), time when it was received by the server (t2), time when the response (from the server) was sent (t3) and time when it was received back at the client (t4). Note that t1 and t4 are measured by the client's clock and t2 and t3 are measured by the server's clock. The offset of the client's clock is the difference between the midpoints of intervals [t1, t4] and [t2, t3]. The delay is the round-trip time not including the server's processing time (i.e. the length of the local interval [t1, t4] without remote interval [t2, t3]).
The assumption here is that the delays were identical in both directions. If not, the measured offset will have an error equal to the half of the difference in the delays. As the client has both offset and delay for each measurement, it can throw away measurements that have unusually large delay, assuming the extra delay was asymmetric (which it usually is) and the measured offset has a large error.
The symmetric mode is similar to the client/server mode, except it allows synchronization in both directions. It's typically used between NTP servers operating at the same stratum as a backup in case one of them loses its upstream synchronization. A symmetric mode packet is basically a client request and server response at the same time. Normally, peers don't respond immediately, but they both send packets in their own interval. Similarly to the client/server mode, after each received packet each peer has four timestamps from which it can calculate the offset and delay. The only difference is that the [t2, t3] interval may be significantly longer.
The broadcast mode is very different from the client/server and symmetric modes. It is fully supported in ntp, but chrony supports it only as a broadcast server, not as a client. The main purpose of the broadcast mode is to simplify configuration of clients in very large networks. Instead of configuring each client with a list of NTP servers (or distributing the list via DHCP for instance), all clients can use the same configuration file, which just enables reception of broadcast or multicast NTP packets.
A broadcast server periodically sends a broadcast mode packet to an IP broadcast or multicast address. Its clients normally don't send any requests. After each received packet they have only two timestamps, the remote time when the packet was sent by the server (t3) and the local time when the packet was received (t4). That's not enough to calculate both offset and delay. The delay is measured independently using client mode packets first and the offset can be then calculated by subtracting half of the delay from the difference between t4 and t3.
The assumption here is that the delay doesn't change over time. If it does change, and it almost always does at least a bit due to variations in delays in network switches and routers, the error in the measured offset is equal to the change in the delay. That's twice as much as the error in the client/server mode due to asymmetry in the delay. As the client doesn't know the delay for each measurement, it can't easily discard those that were significantly delayed and have a large error in the offset. This means the broadcast mode is less accurate (and less secure even if authentication is enabled) than the other modes and should generally be avoided.
Synchronization in PTP
The synchronization of clocks in PTP is similar to the NTP broadcast mode. Typically, all clocks on the communication path send messages to the same IP or layer 2 (e.g. Ethernet) multicast address. PTP also supports unicast messaging, but it doesn't add any new message types or change anything in how the offset and delay are measured. It just changes the addressing of messages, which can be useful in larger networks to reduce the amount of PTP traffic. Note that linuxptp currently does not support unicast transmissions.
A PTP master periodically sends sync messages, which are received by its slaves. This gives them two timestamps, remote time when the message was sent and local time when the message was received. As in the NTP broadcast mode, that's not enough to calculate the offset of the clock. The network delay between the master and slave has to be measured first.
There are two mechanisms how it can be measured: end-to-end (E2E) using delay request/response and peer-to-peer (P2P) using peer delay request/response. With the E2E mechanism the slave sends a delay request and the master immediately responds with the timestamp when it received the request. This gives the slave the two missing timestamps, which allow it to calculate the delay exactly as in the NTP client/server mode. With the P2P mechanism, both request and response are timestamped, but the slave generally doesn't know exactly the remote timestamps, only their difference. This is sufficient to calculate the delay directly without using the timestamps from the sync message. When the slave knows the delay, it can calculate the offset with each sync message.
Unlike in the NTP broadcast mode, where clients normally measure the delay only once on start, PTP slaves measure the delay periodically. The rate is controlled by the master and it's normally a fraction of the rate of sync messages (i.e. slaves usually have more timestamps from the sync messages than delay response messages). The standard PTP approach is to calculate the offset and delay independently. Alternatively, with the E2E delay mechanism it's possible to use the four most recent timestamps (two for sync message and two for delay response) and calculate the offset and delay at the same time as in the NTP client/server mode. This allows more effective filtering, but reduces the number of samples. Which of the two works better depends on the stability of the clock and the amount of jitter in the measurements.
Accuracy of Synchronization
The accuracy of the NTP and PTP synchronization ultimately depends on the accuracy of the measured offset of the clock against the source. This error accumulates in the synchronization hierarchy. The clients and slaves don't know how accurate their clocks really are, they just try to minimize the observed offset by adjusting the rate of their clocks. The error has a variable component, which can be reduced with multiple measurements by filtering and averaging, and a constant component, which generally can't be detected. If the error is stable and the clock is also stable, the offset can be reduced to very small values, but that doesn't necessarily mean the clock is also accurate. This is very important when looking at the offsets reported in the
ntpd logs, or values printed by the
ntpq programs. Small offsets generally indicate low network jitter and a stable clock, but it doesn't say much about the accuracy as there still may be a large constant error.
Asymmetry in Network Delay
One source of the error is asymmetry in the network delay. The calculation of the offset assumes the delay in the two directions is exactly the same, but that's rarely the case. For instance, if packets sent from A to B take 200 microseconds and packets sent from B to A only 100 microseconds, the measured offset will have an error of 50 microseconds. If A can keep its offset close to zero, it will actually be running 50 microseconds ahead of B.
The asymmetry has multiple sources. It may be in the physical or link layer, the packets may go over different network paths (e.g. due to asymmetric routing), and there may be different processing and queueing delays in switches and routers. There is nothing the clients/slaves can do to measure this error. It can be corrected only if it's measured externally by other means (e.g. with a reference time source connected directly to the client/slave). In PTP there is an option called
delayAsymmetry intended for this correction.
Fortunately, this error has an upper bound. The clients/slaves don't know the asymmetry between the delays, but they do know the round-trip delay. If the packet was received instantly after it was sent in one direction and the other direction took the whole delay, the error in the offset would be equal to the half of the round-trip delay. This is the maximum error due to asymmetry in the network delay between two devices. In order to determine the maximum error relative to the reference time source, it's necessary to know the accumulated round-trip delay over all levels of the hierarchy to the reference time source.
In NTP this value is called root delay. It's exchanged in NTP packets together with root dispersion, which estimates the maximum error of the synchronization itself, taking into account stability of the clocks on the path to the reference time source. Both values can be monitored using the
ntpq programs. In PTP this information is not exchanged, which means only slaves of the grandmaster can estimate their maximum error and it's also less reliable due to the limitations of the broadcast synchronization as the delay is normally calculated independently from the offset.
The asymmetry in the network delay due to switches and routers can be corrected if they support a special correction field in PTP messages. A PTP-aware switch or router that supports this correction is called a transparent clock (TC). When it's forwarding a PTP message which includes this field, the difference between the time of reception and time of transmission is added to the value in the field. The slave includes these corrections in the calculation of the delay and offset, which improves the accuracy significantly. In an ideal case, the error drops to zero as if the slave and master were connected directly. NTP doesn't have anything like that. In order to avoid this error all NTP devices would have to be connected directly.
Another source of the error in the offset is inaccuracy of the timestamping itself (i.e. the transmit or receive timestamp doesn't correspond exactly to the time when the packet was actually sent or received by the NIC).
On a Linux machine, there are basically three different places where the timestamps can be generated:
- In user space (i.e. the NTP/PTP daemon), typically before making a send()
system call and after a select() or poll() system call.
- In the kernel, before the packet is copied to the NIC ring buffer and when
the NIC issues an interrupt after receiving a packet. This is called software
- In the NIC itself, when the packet enters and leaves the link or physical
layer. This is called hardware timestamping.
Software timestamping is more accurate than user-space timestamping, because it doesn't include the context switch, processing of the packet in the kernel and waiting in the network stack. Hardware timestamping is more accurate than software timestamping, because it doesn't include waiting in the NIC. However, there are several issues with HW and SW timestamping that make them more difficult to use than user-space timestamping.
The first issue is that not every NIC and network driver supports HW timestamping. In the past only few selected NICs had support for HW timestamping, but it's more common with modern hardware. Also, some NICs don't allow for the timestamping of arbitrary packets and support is limited to PTP packets. SW timestamping depends entirely on the driver. Supported timestamping features can be verified with the
ethtool -T command.
The second issue is that with HW timestamping the NIC has its own clock, which is independent from the system clock, and there has to be some form of synchronization between these two clocks. HW timestamping can be so accurate that the weakest link of the synchronization may actually be between the NIC and the system clock (!). Measuring the offset between the two clocks involves sending messages over PCIe bus and the round-trip delay is typically a few microseconds. As the asymmetry in the delay on the bus and the time the NIC needs to respond are unknown, the error in the offset may actually be close to the half of the round-trip delay. There is no easy way to measure this error. Even if the NIC clock is accurate to few nanoseconds, the system clock may still be off by a microsecond.
The third issue is with servers/masters sending packets which are supposed to include the transmit timestamp of the packet itself. With SW timestamping the kernel would have to know where to put the timestamp in the packet. With HW timestamping it would have to be done by the NIC. The Linux kernel supports some NICs that can do this with PTP packets, which are called one-step clocks. Some NICs have a special "launch time" feature that would allow sending packets with an accurate transmit timestamp by pre-programming the time of the transmission instead of making modifications in the packet, but the kernel doesn't support that yet.
If the NIC can't modify the packet, the protocol itself has to provide some mechanism to send the transmit timestamp to the client/slave later. PTP has follow-up messages and the devices that use them are called two-step clocks. The NTP specification doesn't have anything like that (yet). The reference implementation supports special interleaved variants of the symmetric and broadcast modes, which allow the peer/server to send the transmit timestamp of the previous packet, but it doesn't support HW timestamping on Linux, so there is currently no practical use for it. The client/server mode could have an interleaved variant too if the server was allowed to keep some state for each client.
Both NTP implementations currently use SW timestamping for reception and user-space timestamping for transmission; linuxptp supports SW and HW timestamping for both reception and transmission. With HW timestamping the synchronization of the two clocks is separate. The NIC clock is synchronized by
ptp4l and the system clock is synchronized to the NIC clock by
Similarly to the network delay, the error in the measured offset doesn't depend on the absolute error in the timestamping, but asymmetry in the errors between the server/master and client/slave. If the sum of the error in the transmit timestamp and receive timestamp for packets sent from A to B is similar to the sum of errors in timestamps for packets sent from B to A, the errors will cancel out. There may be a large asymmetry between the errors in transmit and receive timestamps on one side, but that's not a problem if the other side has a similar asymmetry. For this reason it's recommended to use the same combination of timestamping on both sides and ideally also the same model of NIC. Mixing different combinations of timestamping or different NICs may increase the error in the measured offset.
One source of the error in user-space and SW timestamping is interrupt coalescing. When the NIC receives a packet, it may wait for more packets before interrupting the kernel in order to reduce the number of interrupts, but this means the user-space or SW timestamp is made later and has a larger error. On some NICs interrupts coalescing can be configured with the
ethtool -C command. Different NICs and drivers have different configurations. Adjusting the values for a shorter delay may reduce the error in the receive timestamp, but without a reference time source it's difficult to tell how the asymmetry between the server/master and client/slave has changed and whether the accuracy has actually improved.
These graphs show differences between user-space/SW and HW timestamps that were collected over several hours in one-second interval with a gigabit Ethernet NIC. Some patterns can be seen in the plot of error vs packets, which correspond to an increased network and CPU load. The distribution shows that the user-space transmit timestamps are most likely to be around 30 microseconds early. The SW receive timestamps are most of the time only about 6 microseconds late, which indicates the interrupt coalescing on this NIC is adaptive. If this was a client/slave using user-space/SW timestamps for synchronization and errors in the timestamping on the server/master were perfectly symmetric, the client/slave would be running about 12 microsecond ahead of the server/master.
Combining PTP with NTP
In order to get both accuracy and resiliency at the same time, it would be useful if PTP and NTP could be combined. PTP would be the primary source for synchronization of the clock when everything is working as expected. NTP would keep the PTP sources in check and allow for fallback between different PTP sources, or to NTP servers when all PTP sources fail.
In Red Hat Enterprise Linux, this is possible. Programs from the linuxptp package can be used in a combination with an NTP daemon. A PTP clock on a NIC is synchronized by
ptp4l and is used as a reference clock by
ntpd for synchronization of the system clock. The
phc2sys program has an option to work as a shared memory (SHM) reference clock, which is supported by both NTP daemons. With multiple NICs supporting HW timestamping, for each PTP clock there is one
ptp4l instance and one
phc2sys instance. To make the configuration easy, linuxptp includes also a
timemaster program, which from a single configuration file can create configuration files for all other programs and start them as needed. It supports both
chronyd is preferred as it can synchronize the clock with better accuracy.
timemaster configuration file is in /etc/timemaster.conf. It specifies NTP servers, network interfaces in PTP domains, and also options for
phc2sys. This is a minimal example of the configuration file using a single PTP source and an NTP source:
In this configuration
timemaster configures and starts
chronyd with one NTP server and one SHM reference clock. The NTP server is polled every 16 seconds. The PTP clock on the eth0 interface is synchronized by
ptp4l in the PTP domain number 0 and
phc2sys provides the PTP clock as a SHM reference clock to
delay option sets the maximum expected error of the PTP clock due to asymmetry in network delays and timestamping to ±5 microseconds. This value can't be provided by PTP, so it needs to be specified in the configuration file. The default value is 100 microseconds (maximum error of ±50 microseconds), which should cover errors in SW timestamping, but in most cases with HW timestamping it would be unnecessarily large. Setting the delay to a smaller value will prevent
chronyd from combining the PTP source with close NTP servers in local network, which are expected to have a much larger error than the PTP source, and it will also make the detection of falsetickers more sensitive.
In normal conditions the system clock is synchronized to the PTP clock. If
ptp4l switches to an unsynchronized state (e.g. after a complete failure of a PTP master),
phc2sys will stop updating the SHM refclock and
chronyd will switch to the NTP source when the estimated error of the system clock becomes comparable to the estimated error of the NTP source. A short loss of the PTP source doesn't cause an immediate switch to a significantly worse NTP source. If synchronization of one source fails in such a way that it gives false time, there will be a problem. The two sources won't agree with each other and
chronyd will have to stop the synchronization as it doesn't know which one is correct. A warning message will be written to the system log. If the configuration file specified a second NTP server, the falseticker could be identified and the synchronization would continue uninterrupted.
The next example shows a more resilient configuration using two different PTP domains and three NTP servers.
timemaster will now start
chronyd with two
ptp4l instances and two
phc2sys instances. The following diagram illustrates how this all works together.
This configuration is resilient against up to four sources failing completely and up to two sources giving false time. In normal conditions all five sources are expected to give true time, but only the two PTP sources are used for synchronization of the system clock. Combining the PTP sources may improve its accuracy as their average is likely to be closer to the true time. If one PTP source fails, the accuracy won't degrade significantly. If both PTP sources fail completely or they start giving false time, the synchronization will fall back to a combination of the NTP sources.
If the PTP grandmasters can also serve time over NTP and are specified as the two of the three NTP sources, this configuration will still be resilient against any failure of a grandmaster, even though it's used as two separate sources. The grandmaster clocks should ideally be from different vendors in order to avoid four sources failing at the same time in the same way due to a firmware or software bug, which could outvote the third NTP source. If the two PTP domains are in separate networks, this configuration will be resilient also against network failures.
The following examples show what the
chronyc sources command prints in this configuration when different failures of PTP sources are simulated. In the first example everything is working as expected and all sources agree with each other. Only the two PTP sources are used for synchronization (indicated with the
MS Name/IP address Stratum Poll Reach LastRx Last sample
#* PTP0 0 2 377 4 +23ns[ +5ns] +/- 5052ns
#+ PTP1 0 2 377 4 -116ns[ -116ns] +/- 5083ns
^- ntp1.example.com 1 4 377 6 -9983ns[ -10us] +/- 122us
^- ntp2.example.com 1 4 377 5 +11us[ +11us] +/- 85us
^- ntp3.example.com 1 4 377 5 -12us[ -12us] +/- 155us
In the next example synchronization in one PTP domain failed. The source is off by about 200 microseconds and it's drifting away. It doesn't agree with other sources, so it's rejected as a falseticker (indicated with the
x symbol). The other PTP source still works as expected and is used for synchronization.
MS Name/IP address Stratum Poll Reach LastRx Last sample
#* PTP0 0 2 377 3 +59ns[ +73ns] +/- 5057ns
#x PTP1 0 2 377 2 -239us[ -239us] +/- 5079ns
^- ntp1.example.com 1 4 377 11 -9996ns[ -10us] +/- 122us
^- ntp2.example.com 1 4 377 10 +8005ns[+8000ns] +/- 82us
^- ntp3.example.com 1 4 377 9 -13us[ -13us] +/- 163us
In the last example the other PTP source failed completely (
? symbol) and NTP sources are used for synchronization. The accuracy of the clock is significantly worse than before.
MS Name/IP address Stratum Poll Reach LastRx Last sample
#? PTP0 0 2 0 17m +25us[ -22ns] +/- 5075ns
#x PTP1 0 2 377 3 -382us[ -382us] +/- 5088ns
^* ntp1.example.com 1 4 377 11 -3078ns[-3000ns] +/- 121us
^+ ntp2.example.com 1 4 377 9 +6000ns[+6000ns] +/- 81us
^+ ntp3.example.com 1 4 377 7 -23us[ -23us] +/- 180us
When the PTP sources are fixed, they will be used for synchronization and the accuracy will improve again. Failure of any NTP source would be handled in the same way. The synchronization will work correctly for as long as the number of remaining good sources is larger than the number of falsetickers.
Here is an overview of main features that are currently specified in the protocols and that have an effect on accuracy, resiliency, or security:
|Transmit timestamp correction
|Client-side source selection
|Estimation of maximum error
Both NTP and PTP have some strong advantages over the other. PTP in ideal conditions with HW timestamping and transparent clocks can effectively eliminate the effect of the network on the measurements and synchronize the system clock with sub-microsecond accuracy. NTP is highly resilient. It works with multiple sources, estimates their errors, and selects only good sources for synchronization.
NTP supports authentication with symmetric keys in order to allow clients to verify the authenticity and integrity of received packets and prevent attackers from synchronizing them to a false time. The PTP specification includes an experimental security extension, which is not supported in linuxptp. For NTP there is also Autokey (RFC 5906), which is based on public-key cryptography, but it's no longer considered secure and should be avoided. It's supposed to be replaced by new Network Time Security (NTS) protocol, which will probably be specified for both NTP and PTP.
We can expect that both protocols will be improved over time. It will be interesting to see which one will be the first to allow both highly resilient and highly accurate synchronization. NTP will need to specify a new extension field for delay corrections, which will have to be supported in networking devices. For timestamp corrections, the specification could just include the ntpd's interleaved modes, possibly extended for the client/server mode, so servers don't have to be configured to accept symmetric associations. Alternatively, a new extension field could be introduced to request follow-up messages. PTP may need more substantial changes. It may need to allow multiple independent masters on one communication path, provide slaves with more information, and allow them to select between masters.
Irrespective of future improvements, as mentioned above, it is possible to get both accuracy and resiliency at the same time by combining PTP and NTP through the use of timemaster.
Interested in learning more? Questions on PTP, NTP, or timemaster? Please don't hesitate to reach out.