Random numbers are important in computing. TCP/IP sequence numbers, TLS nonces, ASLR offsets, password salts, and DNS source port numbers all rely on random numbers. In cryptography randomness is found everywhere, from the generation of keys to encryption systems, even the way in which cryptosystems are attacked. Without randomness, all crypto operations would be predictable and hence insecure.

When computer algorithms are fed the same input they always give the same output; they are predictable and therefore, on their own, not a good source of random numbers. A good random number generator consists of two parts: a source of entropy and a cryptographic algorithm.

A source of entropy (RNG)

Entropy is a measure of uncertainty or disorder in a system. Good entropy comes from the surrounding environment, which is unpredictable and chaotic. You can think of entropy as the amount of surprise found in the result of a randomized process: the higher the entropy, the less certain the result. Random number generators, or RNGs, are hardware devices or software programs which take non-deterministic inputs, such as physical measurements of temperature, phase noise, or clock signals, and generate unpredictable numbers as their output.
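The idea of "surprise" can be made precise with Shannon entropy. The formula below is the standard definition, not something specific to this post, and the two coin examples are illustrative assumptions: a fair coin carries one full bit of entropy per toss, while a heavily biased coin carries much less.

```latex
% Shannon entropy of a source X whose outcomes occur with probabilities p_i
H(X) = -\sum_{i} p_i \log_2 p_i

% Fair coin: p = (1/2, 1/2)
H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1 \text{ bit}

% Biased coin: p = (0.9, 0.1)
H = -\left(0.9\log_2 0.9 + 0.1\log_2 0.1\right) \approx 0.47 \text{ bits}
```

This is why raw measurements from an entropy source are usually worth fewer random bits than their size in memory suggests.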

A hardware RNG could use hard-to-predict values such as wind speed or atmospheric pressure, or exploit intrinsically random (quantum) processes such as photon transmission/reflection through a semi-transparent mirror. In computers we can use the attached hardware to harvest entropy, for example movement of the mouse pointer, keys typed on the keyboard, and disk and/or network I/O. Such systems are a good source of entropy; however, they are slow to yield data (for example the CPU jitter generator). They also depend on external triggers in order to generate random numbers and are often not reliable when large amounts of random numbers are required.

There are algorithms to produce pseudo-random values from within an ideal, deterministic computing environment. However, there is no algorithm to produce unpredictable random numbers without some sort of additional non-deterministic input.

A cryptographic algorithm (PRNG)

Pseudo random number generators, or PRNGs, are systems that efficiently and reliably produce lots of artificial random bits from a few true random bits. For example, an RNG which relies on mouse movements or keyboard key presses would stop working once the user stops interacting with the mouse or the keyboard. A PRNG, however, would use these random bits of initial entropy and continue producing random numbers.

PRNGs maintain a large memory buffer called the entropy pool. The bytes received from the entropy sources (RNG) are stored there. Often the PRNG mixes the entropy pool bytes in order to remove statistical biases in the entropy data. Random bits are generated by running a deterministic random bit generator (DRBG) on the entropy pool data bits. This algorithm is deterministic (it always produces the same output given the same input). The trick is to ensure that the DRBG is never fed the same input value twice!
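To make the "deterministic, so never reuse the input" point concrete, here is a minimal toy sketch. It is not the kernel's algorithm: the construction (SHA-256 over a seed plus a block counter), the function name drbg_block and the placeholder seed string are all assumptions made for illustration, and it assumes the coreutils sha256sum tool is installed.

```shell
# Hypothetical toy DRBG: output block i is SHA-256(seed ":" i).
# Illustration only; do not use this for real key material.
drbg_block() {
    # $1 = seed (stand-in for the mixed entropy pool), $2 = block counter
    printf '%s:%s' "$1" "$2" | sha256sum | cut -d' ' -f1
}

seed="example-pool-contents"   # assumed placeholder, not real entropy

drbg_block "$seed" 0   # first output block
drbg_block "$seed" 1   # the counter changed, so the block is different
drbg_block "$seed" 0   # same input as the first call, so the same output
```

The first and third lines printed are identical, which is exactly the situation a real PRNG has to avoid by advancing its internal state after every request.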

Real working PRNGs

Most operating systems have built-in crypto PRNGs. Most of them are software based, but some can be pure hardware as well. In Linux, the device files /dev/random and /dev/urandom are the userland interfaces to the crypto PRNG which can reliably generate random bits.

The kernel maintains an entropy pool which is used to store random data generated from events like inter-keypress timings, inter-interrupt timings, etc. Randomness from these sources is mixed into the entropy pool using a sort of cyclic redundancy check-like function. This mixing is not cryptographically strong, but it is adequate as long as the input is not chosen maliciously, and it is fast enough to run on every event. The kernel also keeps an estimate of how many bits of randomness have been stored in the random number generator’s internal state, and exposes it via the /proc/sys/kernel/random/entropy_avail file.
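The estimate mentioned above can be read like any other file. The sketch below wraps that in a small helper; the function name entropy_estimate and its optional path argument are hypothetical conveniences, and the value printed depends on the kernel version and the current state of the machine.

```shell
# Print the kernel's entropy estimate in bits, or "unknown" if unavailable.
entropy_estimate() {
    # $1 = optional path override; defaults to the file named in the text
    f=${1:-/proc/sys/kernel/random/entropy_avail}
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unknown"
    fi
}

echo "Estimated entropy (bits): $(entropy_estimate)"
```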

When random numbers are requested, they are obtained by taking the SHA-1 hash of the contents of the entropy pool. The SHA hash is chosen because it is cryptographically strong: it does not expose the contents of the entropy pool, and it is computationally infeasible to reverse the SHA output to obtain its input. Thus, the confidentiality of the entropy pool is preserved. Each time random numbers are generated, the kernel decreases its estimate of the true randomness contained in the entropy pool.

The kernel provides two character devices, /dev/random and /dev/urandom. The /dev/random device is suitable for use when very high-quality randomness is desired (for example, for key generation or one-time pads), as it will return at most as many bits of randomness as the random number generator estimates are contained in the entropy pool.

The /dev/urandom device does not have this limit and will return as many bytes as are requested. If more and more random bytes are requested without giving the entropy pool time to recharge, the resulting random numbers are “merely” cryptographically strong. For many applications, however, this is acceptable.

The biggest problem with /dev/random is that it is blocking. Once the kernel's entropy pool is exhausted, reads from /dev/random will pause until sufficient entropy is replenished. Such pauses are typically unacceptable and can constitute a denial-of-service attack against the application or even the system as a whole.
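For applications that cannot tolerate such pauses, reading from /dev/urandom is straightforward. The helper below is a sketch: random_hex is a hypothetical name, and it assumes the usual dd, od and tr utilities are available.

```shell
# Read N bytes from the non-blocking device and print them as hex.
random_hex() {
    # $1 = number of random bytes to read
    dd if=/dev/urandom bs="$1" count=1 2>/dev/null | od -An -tx1 | tr -d ' \t\n'
    echo
}

random_hex 16   # prints 32 hex digits, different on every run
```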

Boot time randomness

In 2012 security researchers scanned the internet and harvested public keys from TLS certificates and SSH hosts. They found a few systems had identical public keys and in some cases very similar RSA keys with shared prime factors. It was found that many of these systems generated their keys very early after boot. At this point very little entropy is collected in the entropy pool. Therefore despite having a good PRNG, because the entropy pool is almost identical, the random numbers generated are similar on different systems. In Linux you can carry the information in the entropy pool across shutdowns and start-ups.

To do this, run a script like the following during the boot sequence:

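echo "Initializing random number generator..."
# Assumes this path survives reboots; if /var/run is a tmpfs on your
# system, use a persistent location instead
random_seed=/var/run/random-seed
# Carry a random seed from start-up to start-up
# Load and then save the whole entropy pool
if [ -f "$random_seed" ]; then
    cat "$random_seed" >/dev/urandom
else
    touch "$random_seed"
fi
chmod 600 "$random_seed"
# Overwrite the seed right away so it is never reused on the next boot
dd if=/dev/urandom of="$random_seed" count=1 bs=512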

Then run a script like this as the system shuts down:

# Carry a random seed from shut-down to start-up
# Save the whole entropy pool
echo "Saving random seed..."
random_seed=/var/run/random-seed
touch "$random_seed"
chmod 600 "$random_seed"
dd if=/dev/urandom of="$random_seed" count=1 bs=512

For example, on older systems which use System V init scripts, such code fragments would be found in /etc/rc.d/init.d/random. Together, the scripts save the contents of the entropy pool at shutdown and reload them into the entropy pool at start-up. (The dd in the boot-up script makes sure that the seed file is different for every start-up, even if the system crashes before the shutdown script runs.)

Newer systems (for example Red Hat Enterprise Linux 7) which use systemd already have the systemd-random-seed.service installed by default. This service restores the random seed of the system at early boot and saves it at shutdown, which has the same effect as the scripts listed above.

Hardware-based PRNG

The Intel Digital Random Number Generator is a hardware random number generator which was introduced in Intel CPUs in 2012 as part of the Ivy Bridge microarchitecture and is based on NIST’s SP 800-90 guidelines. Intel provides the RDRAND assembly instruction to access this PRNG, which is much faster than any software PRNG.

RDRAND has a single entropy source and provides a stream of entropy data as zeros and ones. It is essentially a hardware circuit which jumps between 0 and 1 based on thermal noise fluctuations within the CPU. Though Intel’s PRNG is only partially documented, it has been audited by a company called Cryptography Research. There are, however, some concerns about the security of this type of random number generator, mainly because PRNGs are a very good target for cryptographic backdoors. These issues can normally be avoided by mixing the output from RDRAND with other sources of entropy in the entropy pool (unless of course the CPU itself is malicious). This should counteract any possible bias, if it exists.
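To see whether a given Linux machine advertises this instruction, you can look for the rdrand flag in /proc/cpuinfo. The helper name has_rdrand and its optional file argument below are hypothetical; the sketch only reports what the kernel lists and says nothing about how the instruction's output is actually used.

```shell
# Succeeds if the CPU flags list contains "rdrand".
has_rdrand() {
    # $1 = optional cpuinfo-style file; defaults to /proc/cpuinfo
    grep -qw rdrand "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_rdrand; then
    echo "CPU advertises RDRAND"
else
    echo "RDRAND not advertised (or /proc/cpuinfo is unavailable)"
fi
```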

Providing random numbers on virtual machines

Generating a good amount of entropy can be a problem for virtual machines because by default there are no attached hardware devices which can seed the entropy pool. Red Hat Enterprise Linux 7 includes virtio-rng, a virtual hardware random number generator device that can provide the guest with fresh entropy on request.

On the host physical machine, the hardware RNG interface creates a chardev at /dev/hwrng, which can be opened and then read to fetch entropy from the host physical machine. In co-operation with the rngd daemon, the entropy from the host physical machine can be routed to the guest virtual machine's /dev/random, which is the primary source of randomness. The virtual random number generator device allows the host physical machine to pass through entropy to guest virtual machine operating systems.
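Inside a guest (or on the host), one way to check which hardware RNG the kernel has picked up is to read the hw_random entries in sysfs. This is a sketch under assumptions: the helper name hwrng_info and its optional directory argument are hypothetical, and the sysfs location shown is the one commonly used by the kernel's hardware RNG framework, which may not be present on every system.

```shell
# Show the hardware RNG currently selected by the kernel, if any.
hwrng_info() {
    # $1 = optional directory override (assumed default shown below)
    dir=${1:-/sys/class/misc/hw_random}
    if [ -r "$dir/rng_current" ]; then
        echo "current: $(cat "$dir/rng_current")"
        echo "available: $(cat "$dir/rng_available" 2>/dev/null)"
    else
        echo "no hardware RNG framework found"
    fi
}

hwrng_info
```

On a guest with a virtio-rng device attached, the current entry would be expected to name the virtio driver.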


Non-cryptographic random number generators

Finally, let us look at a good non-cryptographic random number generator on Linux, namely glibc’s random() function. Glibc provides a simple linear congruential generator (LCG), defined by the following equation:

val = ((state * 1103515245) + 12345) & 0x7fffffff

This generator is referred to as TYPE_0 in the glibc source. (LCG random generators have the useful property that they are very fast and have a very small amount of state, the same size as the random value that is returned. As a consequence, once a particular (31-bit) value is produced, it will not be seen again until the function has been called enough times to produce every other value in its range.)
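The recurrence above is simple enough to step through by hand. The sketch below implements it with shell arithmetic; lcg_next is a hypothetical helper name, the starting state of 1 is an arbitrary choice, and the code reproduces only the equation from the text, not glibc's seeding or its other generator types.

```shell
# One step of the TYPE_0 recurrence:
#   val = ((state * 1103515245) + 12345) & 0x7fffffff
lcg_next() {
    # $1 = current 31-bit state; prints the next value
    echo $(( ($1 * 1103515245 + 12345) & 0x7fffffff ))
}

state=1   # arbitrary starting state chosen for illustration
for i in 1 2 3; do
    state=$(lcg_next "$state")
    echo "$state"
done
```

Because the whole state is the last value returned, anyone who sees one output can compute every later one, which is why this generator must never be used for cryptographic purposes.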

Glibc also provides a slightly more advanced additive feedback generator. Unlike the LCG described above, this generator keeps several words of state, so the same number can appear twice (or more) within a single period. Depending on the state size, it is called TYPE_1, TYPE_2, TYPE_3 or TYPE_4 in the glibc source.

Which generator is used depends on the size of the state set with the initstate() function. The first (LCG) generator is used only for the smallest state size, 8 bytes. When the state is bigger, the second generator is used. When you set your seed using srand(), the size of the state is 128 bytes by default, so the second generator is used. While not cryptographically strong, these generators are useful for Monte Carlo methods and testing, where it may be desirable to repeat exactly the same pseudo-random stream on a subsequent run. As long as srand() or initstate() is called with the same value each time your program starts, it will obtain the same random numbers.
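The repeatability described above is easy to demonstrate. The sketch below uses the simple LCG recurrence as a stand-in for whichever generator glibc selects; lcg_stream is a hypothetical helper, and the seeds 42 and 43 are arbitrary.

```shell
# Print $2 successive values of the LCG recurrence, starting from seed $1.
# Intended to be called via $(...), so its variables stay in a subshell.
lcg_stream() {
    s=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        s=$(( (s * 1103515245 + 12345) & 0x7fffffff ))
        echo "$s"
        n=$((n - 1))
    done
}

run1=$(lcg_stream 42 5)
run2=$(lcg_stream 42 5)
run3=$(lcg_stream 43 5)

[ "$run1" = "$run2" ] && echo "same seed: identical streams"
[ "$run1" != "$run3" ] && echo "different seed: different streams"
```

This is the property that makes a failing test or a simulation run reproducible: record the seed, and the whole stream can be replayed.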

Conclusion

Random numbers are the lifeline of any cryptographic operation in modern computing. It is important for developers to understand which interface to use, and how to handle random numbers correctly in their code. It is also important for users to understand the limitations of such code. This post provides basic insight into how random number generators actually work in Linux and what their limitations are.


About the author

Huzaifa Sidhpurwala is a Principal Product Security Engineer with Red Hat and part of a number of upstream security groups such as Mozilla, LibreOffice, Python, PHP and others. He speaks about security issues at open source conferences, and has been a Fedora contributor for more than 10 years.
