A brief history of mktime()

October 20, 2022 | DJ Delorie | 5-minute read

With apologies to Stephen Hawking :-)

In the beginning, there was… well, we don't know, because we weren't there to tweet about it. Without the internet, it was difficult to arrange things like hunting parties or afternoon tea. Most people woke up with the sun and slept when it was dark. Timekeeping was so sloppy that eventually winter stopped happening in winter, and something had to be done about it. In 1582 Pope Gregory XIII introduced our modern calendar, and over the following centuries most countries adopted it. I'm not kidding about "centuries" either; Saudi Arabia only adopted it in 2016, and Britain waited until 1752, as you can see from the weird calendar for that year, which skips September 3 through 13:

$ cal 9 1752
   September 1752   
Su Mo Tu We Th Fr Sa
       1  2 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30

Despite this common calendar, we still didn't have the internet, and with the introduction of fast widespread travel, agreeing on the day wasn't enough. We had to agree on the time of day too. Since the time of day is different depending on where you are on the planet, time zones were created, and the world finally had a way to coordinate afternoon tea. Plus or minus leap seconds, but that's a different story.

And then there was Unix time

Computers like working with numbers (it's pretty much all they can do), but it's far easier to work with one large number than many small ones. So, instead of representing time as "year, month, day, hour, minute, second, timezone", early programmers chose to represent time as "seconds since Jan 1, 1970, GMT," and thus "Unix Time" (time_t) was born. Computers worldwide could easily coordinate their afternoon… er, tea? Well, they could timestamp messages, sort records by age, etc. Calculating elapsed time was as simple as subtracting two numbers.
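This is a minimal sketch of those two properties on a POSIX system, where time_t counts seconds. It checks that time_t 0 breaks down to midnight, January 1, 1970 UTC. It also computes an elapsed time with difftime(), the standard C function for subtracting two time_t values. The helper names epoch_is_1970() and elapsed_seconds() are hypothetical, made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical helper: on POSIX systems a time_t counts seconds since
   the epoch, 1970-01-01 00:00:00 UTC, so time_t 0 should break down
   (with gmtime_r) to exactly that moment.  */
int epoch_is_1970 (void)
{
  time_t zero = 0;
  struct tm tm;

  if (gmtime_r (&zero, &tm) == NULL)
    return 0;
  /* tm_year counts from 1900 and tm_mon from 0 (January).  */
  return tm.tm_year == 70 && tm.tm_mon == 0 && tm.tm_mday == 1
         && tm.tm_hour == 0 && tm.tm_min == 0 && tm.tm_sec == 0;
}

/* Hypothetical helper: elapsed time is just a subtraction; difftime()
   performs it portably and returns the result in seconds.  */
double elapsed_seconds (time_t start, time_t end)
{
  return difftime (end, start);
}
```

For instance, elapsed_seconds (1000, 1090) returns 90.0.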

Computers still had to interact with humans, though. Back in the dark ages of computing (i.e., pre-Internet, somewhere around 1987) mktime() and its inverse, localtime(), were written[1]. The mktime() function converts local calendar time to Unix time. It takes a year, month, day, hour, minute and second… and a tm_isdst field? Doesn't the computer know when daylight saving time (DST) happens? Usually it does (and most of the time you can set tm_isdst to -1 and let it decide), with one exception. Consider Sunday, November 6, 2022 in my time zone (America/New_York). At 2:00 in the morning we "fall back" (set the clock back) by an hour to switch from summer time (daylight saving time) to winter time (standard time), and the time from 1:00am to 2:00am is repeated. I repeat: the clock goes from 1:00 am to 2:00 am twice.

So if I ask mktime() to convert "Nov 6, 2022, 01:30:00" to a time_t, it needs to know *which* 01:30 I'm talking about. The tm_isdst field chooses one of those two wall clock times to convert to Unix time.
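This small sketch shows that ambiguity. It assumes the POSIX TZ string "EST5EDT,M3.2.0,M11.1.0", which spells out the 2022 US Eastern rules so the example doesn't depend on installed zoneinfo files. With tzdata installed, "America/New_York" should behave the same for 2022. The helper names are hypothetical. The two calls differ only in tm_isdst, yet the resulting time_t values should be 3600 seconds (one hour) apart.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Assumption: a POSIX TZ string encoding the 2022 US Eastern rules
   (EDT from the 2nd Sunday of March to the 1st Sunday of November,
   switching at 02:00), so no zoneinfo files are needed.  */
static void use_us_eastern (void)
{
  setenv ("TZ", "EST5EDT,M3.2.0,M11.1.0", 1);
  tzset ();
}

/* Hypothetical helper: convert Nov 6, 2022, 01:30:00 local time to a
   time_t, using ISDST to say which of the two 01:30s we mean.  */
time_t nov6_0130 (int isdst)
{
  struct tm tm = { 0 };

  use_us_eastern ();
  tm.tm_year = 2022 - 1900;  /* years count from 1900 */
  tm.tm_mon = 11 - 1;        /* months count from 0 */
  tm.tm_mday = 6;
  tm.tm_hour = 1;
  tm.tm_min = 30;
  tm.tm_isdst = isdst;       /* 1: first 01:30 (EDT), 0: second (EST) */
  return mktime (&tm);
}
```

On glibc we'd expect the following two results:

- nov6_0130 (1) returns 1667712600, which is 05:30 UTC, the EDT reading.
- nov6_0130 (0) returns 1667716200, which is 06:30 UTC, the EST reading.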

So most of the time, you can pass -1 and let the computer figure out whether DST is in effect, and if needed, set it to 0 or 1. But how do you know if it's needed? It turns out mktime() can tell you that. Let's look at the man page:

The mktime() function modifies the fields of the tm structure as
follows: ... tm_isdst is set (regardless of its initial value) to a
positive value or to 0, respectively, to indicate whether DST is or
is not in effect at the specified time.
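This sketch shows that rewriting in action. It passes tm_isdst = -1 and reports the value mktime() stores back. It assumes the POSIX TZ string "EST5EDT,M3.2.0,M11.1.0" for US Eastern time, and resolved_isdst() is a hypothetical helper. For noon on July 1, 2022 we'd expect a positive value (EDT). For noon on January 15, 2022 we'd expect zero (EST).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: let mktime() decide whether DST applies to
   YEAR-MON-MDAY HOUR:00 local time and return the tm_isdst it stores
   back (or -1 on error).  Assumes a POSIX TZ string with the 2022 US
   Eastern rules.  */
int resolved_isdst (int year, int mon, int mday, int hour)
{
  struct tm tm = { 0 };

  setenv ("TZ", "EST5EDT,M3.2.0,M11.1.0", 1);
  tzset ();
  tm.tm_year = year - 1900;
  tm.tm_mon = mon - 1;
  tm.tm_mday = mday;
  tm.tm_hour = hour;
  tm.tm_isdst = -1;          /* "you figure it out" */
  if (mktime (&tm) == (time_t) -1)
    return -1;
  return tm.tm_isdst;        /* rewritten by mktime(): positive or 0 */
}
```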

Note that I'm referring to RHEL 8's man page; other man pages may differ, and so may the ISO C standard. A lot of the documentation is a bit vague, so let's try some calls and see what actually happens. Consider November 6, 2022 in America/New_York, which happens to be when the transition from DST to standard time occurs: the clocks go from 1:59 AM EDT back to 1:00 AM EST. Here are some sample calls to mktime(). Each shows the hour, minute and DST values passed in (tm_hour, tm_min, and tm_isdst) and the "fixed" values in the structure after mktime() returns:

[Table: sample mktime() calls on November 6, 2022, with the tm_hour, tm_min, and tm_isdst values passed in and the values in the structure after mktime() returns]
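You can run experiments like these yourself with a small program along these lines. It's a sketch that assumes a POSIX TZ string with the 2022 US Eastern rules instead of the America/New_York zoneinfo file. The names probe() and print_probe_table() are hypothetical. The program tries 00:30 through 03:30 on November 6, 2022 with tm_isdst set to -1, 0, and 1. For each call it prints the hour, minute, and tm_isdst that mktime() leaves in the structure, plus the resulting time_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: run mktime() on Nov 6, 2022, HOUR:MIN local time
   with the given tm_isdst, store the adjusted structure in *OUT and
   return the time_t.  Assumes a POSIX TZ string with the 2022 US
   Eastern rules.  */
time_t probe (int hour, int min, int isdst, struct tm *out)
{
  struct tm tm = { 0 };
  time_t t;

  setenv ("TZ", "EST5EDT,M3.2.0,M11.1.0", 1);
  tzset ();
  tm.tm_year = 2022 - 1900;
  tm.tm_mon = 10;            /* November; months count from 0 */
  tm.tm_mday = 6;
  tm.tm_hour = hour;
  tm.tm_min = min;
  tm.tm_isdst = isdst;
  t = mktime (&tm);
  *out = tm;
  return t;
}

/* Print what mktime() does with 00:30 through 03:30 for each tm_isdst.  */
void print_probe_table (void)
{
  static const int isdst_in[] = { -1, 0, 1 };

  printf ("passed          returned\n");
  printf ("hh:mm isdst     hh:mm isdst  time_t\n");
  for (int hour = 0; hour <= 3; hour++)
    for (int i = 0; i < 3; i++)
      {
        struct tm out;
        time_t t = probe (hour, 30, isdst_in[i], &out);

        printf ("%02d:30 %5d     %02d:%02d %5d  %lld\n", hour, isdst_in[i],
                out.tm_hour, out.tm_min, out.tm_isdst, (long long) t);
      }
}
```

Call print_probe_table() from main(). The 01:30 rows are the interesting ones. With tm_isdst set to 0 or 1, mktime() should leave it unchanged, selecting the EST or EDT reading respectively.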

Ok, so that handles the normal cases, but what if we want more control over the transition period?

Ah! So, if you ask for a specific one of the overlap times, you can get it. What happens if you try this outside of a transition period?
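One way to try this outside a transition period is a sketch like the following. It again assumes a POSIX TZ string with the 2022 US Eastern rules, and july1_noon() is a hypothetical helper. Noon on July 1 exists only as daylight time. On glibc, passing tm_isdst = 1 should leave the structure at 12:00 with tm_isdst set to 1. Passing 0 asks for "noon standard time", which glibc normalizes to 13:00 with tm_isdst set to 1. Other C libraries may handle the mismatch differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: run mktime() on July 1, 2022, 12:00 local time
   (well outside any transition) with the given tm_isdst and return the
   structure as mktime() left it.  Assumes a POSIX TZ string with the
   2022 US Eastern rules.  */
struct tm july1_noon (int isdst)
{
  struct tm tm = { 0 };

  setenv ("TZ", "EST5EDT,M3.2.0,M11.1.0", 1);
  tzset ();
  tm.tm_year = 2022 - 1900;
  tm.tm_mon = 6;             /* July; months count from 0 */
  tm.tm_mday = 1;
  tm.tm_hour = 12;
  tm.tm_isdst = isdst;       /* 0 claims standard time, which July doesn't use */
  mktime (&tm);
  return tm;
}
```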

With a bit of experimentation, we've found enough consistency to choose between two ways of getting the results we want:

If it doesn't matter which of the two overlap times you get, set tm_isdst to -1. After the call, tm_isdst will have a suitable value.
If it does matter, call mktime() twice: once with tm_isdst set to 0 and once with it set to 1. If both calls leave tm_isdst with the same value, use the time_t from the call that was passed that value to begin with. In other words, the "right" call is the one whose tm_isdst came back unchanged. If the two calls leave different tm_isdst values, you have all the information you can get, so choose one.

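Here's that logic in C. The tm_isdst you pass is treated only as a hint, and it is used only during overlap times:

#include <time.h>

/* Like mktime(), but tm_isdst is only a hint, used for overlap times.  */
time_t
mktime_hint (struct tm *t_in)
{
  struct tm t0 = *t_in, t1 = *t_in;
  time_t ret0, ret1;
  int use_t0;

  if (t_in->tm_isdst == -1)
    return mktime (t_in);
  t0.tm_isdst = 0; ret0 = mktime (&t0);
  t1.tm_isdst = 1; ret1 = mktime (&t1);
  if (t0.tm_isdst == t1.tm_isdst)
    /* Not an overlap: use the call whose tm_isdst came back unchanged.  */
    use_t0 = (t0.tm_isdst == 0);
  else
    /* It's an overlapped time: both are valid, so follow the hint.  */
    use_t0 = (t_in->tm_isdst == 0);
  *t_in = use_t0 ? t0 : t1;
  return use_t0 ? ret0 : ret1;
}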

Caveats

Above, I quoted RHEL 8's man page for mktime(). If you refer to the relevant standards, you'll find that the behavior of mktime() is not clearly defined. Normally you would rely on the standards to define the behavior your code depends on, but when that behavior is unspecified, you have to research and experiment with the implementation itself. The results above come from RHEL's C library, which is based on glibc. glibc has a policy of "backward compatibility forever"[2], so we can consider our experimental results reliable.

However, even RHEL and Fedora differ in what the tm_isdst field means. Deciding what counts as "daylight saving time" is up to each government, which makes it highly political and therefore inconsistent. It turns out that the only valid non-negative values for tm_isdst are whatever values localtime() returns. In Dublin's official time zone data, for example, tm_isdst is 0 in summer and 1 in winter, the opposite of the usual convention. That isn't the case on RHEL, which "normalizes" the data. If you lived in Dublin, used Fedora, and made an invalid assumption about tm_isdst values, you would not be on time for tea.
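One defensive pattern that follows from this is to never hard-code what tm_isdst means. When you need to turn a wall-clock time back into a time_t, reuse the tm_isdst that localtime_r() (the reentrant POSIX form of localtime()) produced for it. The sketch below uses a hypothetical roundtrip_in_zone() helper to do that. With an assumed POSIX TZ string for 2022 US Eastern time, both 01:30 readings on November 6, 2022 should round-trip to their original time_t values. The code never interprets the tm_isdst value, so it doesn't depend on which convention the zone data uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: break T down in time zone TZ with localtime_r(),
   then convert it back with mktime().  The tm_isdst value is passed
   through untouched, so the code never has to know what 0 or 1 means
   in that zone's data.  */
time_t roundtrip_in_zone (const char *tz, time_t t)
{
  struct tm tm;

  setenv ("TZ", tz, 1);
  tzset ();                  /* localtime_r() need not call tzset() itself */
  if (localtime_r (&t, &tm) == NULL)
    return (time_t) -1;
  return mktime (&tm);       /* tm.tm_isdst is exactly what localtime_r() set */
}
```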

Conclusion

When standards cover the functions you're using, follow them. Here we've needed to experiment to find out what RHEL does when the standards are vague. If you need to support other systems with other C libraries, you'll need to test those to see how they work. Go ahead, you've got time.

[1] The oldest reference I can find was a contribution to BSD by David Olson, so, thanks Dave!

[2] How the GNU C Library handles backward compatibility