
3 ways to monitor time on OpenShift nodes

Learn how to monitor OpenShift nodes for NTP inaccuracies, corrections, or time drift occurrences.


Today, most organizations rely on vast computer systems, and maintaining accurate time across those systems has always been a challenge for administrators. That task became easier with the introduction of the Network Time Protocol (NTP) in the early 1980s. However, NTP also introduced the need to monitor your systems to ensure the time is correct and that corrections and time drift stay within acceptable limits.

While monitoring and maintaining the time on OpenShift nodes, OpenShift administrators might receive the dreaded NodeClockNotSynchronising alert from Prometheus. If you navigate to the Observe/Monitor tab in the OpenShift Web Console, you see numerous dashboards that display useful information. The Node Exporter/USE Method/Node dashboard is the best dashboard for monitoring your nodes. It provides CPU, memory, network utilization, and other helpful metrics. It does not, however, include any metrics for time.

So the question is: How do you monitor time on an OpenShift node? There are several options, including:

  • Accessing the node manually and running commands
  • Running queries in Prometheus to view metrics collected by the timex collector of the node_exporter running on each node
  • Setting up a custom Grafana instance and importing the Node Exporter Full dashboard to review the same timex metrics

I'll cover each option in more detail below.

Verify time manually on OpenShift nodes

Typically when there is a time issue, it affects the entire cluster because all nodes synchronize with the same time source. However, there are instances where only one or a few nodes in the cluster have time issues. When this situation occurs, it's easier to access the node directly and diagnose the problem.

To complete this task, use the oc debug node command to launch a debug pod on the affected node and then check the status of the time-keeping daemon to see whether it is running. If the service is running, the next step is to check the sources and tracking information:

$ oc debug node/openshift-sxqnd-master2
Starting pod/openshift-sxqnd-master2-debug ...
To use host binaries, run `chroot /host`

Pod IP: 192.168.50.62
If you don't see a command prompt, try pressing enter.
sh# chroot /host /bin/bash
[root@openshift-sxqnd..]#  
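Before looking at sources, you can confirm the time-keeping daemon is running. The commands below assume chronyd, the default time-keeping daemon on Red Hat Enterprise Linux CoreOS:

[root@openshift-sxq...]# systemctl is-active chronyd
active
[root@openshift-sxq...]# systemctl status chronyd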

Once in the chroot environment, check the status of the time-keeping daemon to determine if it is connecting to the upstream NTP servers and if the time-keeping daemon is in sync with those servers. To complete these tasks, run the following commands:

  • chronyc sources: Lists all sources the chronyd service utilizes based on the configuration of the chrony.conf file
  • chronyc sourcestats: Displays information about the drift rate and offset estimation for each source currently used by chronyd
  • chronyc tracking: Provides detailed information about the system's clock performance

Refer to the chronyc documentation for more details about the information displayed from each command.

[root@openshift-sxq...]# chronyc sources
210 Number of sources = 1
MS Name/IP    Stratum  Poll  Reach  LastRx  Last sample
===========================================================
^*192.168.50.13     4     7    377     85   +642ns[ +525...
[root@openshift-sxq...]# chronyc sourcestats
Name/IP       NP NR Span Freq  Freq Skew  Offset  Std Dev
=========================================================
192.168.50.13  6  4  324  +0.478  4.675   +1399ns   131us
[root@openshift-sxq...]# chronyc tracking
Reference ID  : C0A8320D (192.168.50.13)
Stratum       : 5
Ref time (UTC): Thu Apr 28 21:12:01 2022
[...]

Use OpenShift's metrics to monitor time drift on each OpenShift node

The Prometheus node_exporter (also known as the Node Exporter) runs on each OpenShift node as part of a DaemonSet in the openshift-monitoring namespace. One of the collectors node_exporter enables is the timex collector, which exports statistics from the kernel's clock-adjustment interface, including the synchronization status flag maintained by the time-keeping daemon. OpenShift provides the ability to query Prometheus from the web console, so you can display these metrics to monitor time drift on each OpenShift node.
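If you want to confirm the exporter is running on every node before you query anything, you can check the DaemonSet and its pods from the command line. The commands below assume the default Cluster Monitoring Operator deployment, where the DaemonSet is named node-exporter:

$ oc -n openshift-monitoring get daemonset node-exporter
$ oc -n openshift-monitoring get pods -o wide | grep node-exporter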

 [ For more on Grafana, Prometheus, and other monitoring tools, read 3 types of monitoring and some open source tools to get started. ]

For time synchronization drift and corrections, query the following metrics:

  • node_timex_estimated_error_seconds: Estimated error in seconds
  • node_timex_offset_seconds: Time offset between the local system clock and reference clock
  • node_timex_maxerror_seconds: The maximum error in seconds
  • node_timex_sync_status: Determines if the clock is synced to a reliable NTP server
  • node_timex_frequency_adjustment_ratio: Frequency of adjustment of the local clock

In the following examples, focus on a specific node by adding {instance="<openshift_node_name>"} after the metric you want to query.
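For example, to check a single node's offset, maximum error, and synchronization status, you could paste queries like these into the query box (substitute your own node name for the placeholder):

node_timex_offset_seconds{instance="<openshift_node_name>"}
node_timex_maxerror_seconds{instance="<openshift_node_name>"}
node_timex_sync_status{instance="<openshift_node_name>"}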

(Morgan Peterman, CC BY-SA 4.0)

Use custom Grafana dashboards to monitor time drift on each OpenShift node

If you would like a display with prebuilt graphs to monitor time drift and corrections for your entire cluster, you can use a custom Grafana dashboard. If you do not already have the Grafana Community Operator installed and configured, follow the steps in Custom Grafana dashboards for Red Hat OpenShift Container Platform 4.

[ Ready to get started with OpenShift? Get O'Reilly's OpenShift for Developers eBook. ]

Once you have a custom Grafana instance configured, import the Node Exporter Full dashboard to view graphs for system time synchronization.
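If your workstation has internet access, one way to obtain the dashboard JSON for Grafana's import dialog is to download it from grafana.com with curl. The dashboard ID (1860) and URL pattern below are based on the dashboard's grafana.com listing and may change over time:

$ curl -sL https://grafana.com/api/dashboards/1860/revisions/latest/download -o node-exporter-full.json

You can then upload node-exporter-full.json (or paste its contents) in the import dialog, or enter the dashboard ID directly if your Grafana instance can reach grafana.com.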

(Morgan Peterman, CC BY-SA 4.0)

Now that you have imported the Node Exporter Full dashboard, you can monitor the Time Synchronized Drift and Time Synchronized Status statistics. Time Synchronized Drift helps you quickly determine the difference in time between the node and the reference NTP server, and it also provides the maximum error in seconds, which is useful for seeing how far off the node has been. Time Synchronized Status shows whether the node is synchronized to a reliable server.

(Morgan Peterman, CC BY-SA 4.0)

Wrap up

These are three different ways an OpenShift administrator can check time synchronization on an OpenShift node. The first two options, manually verifying the time on the node and querying the built-in metrics from the node_exporter timex collector, are quick and easy when you need to troubleshoot a single node or a few nodes with a time synchronization issue. The last option, setting up a custom Grafana dashboard, provides prebuilt graphs that are readily available for checking whether a time synchronization issue occurs.

Morgan Peterman

Morgan Peterman is a Senior Partner Technical Account Manager for Red Hat OpenShift. He is a Red Hat Certified Engineer (RHCE) and a Red Hat Certified Specialist in OpenShift.
