Looking forward to Linux network configuration in the initial ramdisk (initrd)
An initrd (initial ramdisk) is a small filesystem loaded during the boot process on a Linux system. One of the tasks that the initrd might be responsible for is network configuration.
This article explains the cases in which network configuration early in the boot process is necessary, how it's implemented, and the improvements that Red Hat Enterprise Linux 8.3 brings.
The need for an initrd
When you press a machine's power button, the boot process starts with a hardware-dependent mechanism that loads a bootloader. The bootloader software finds the kernel on the disk and boots it. Next, the kernel mounts the root filesystem and executes an init process.
This process sounds simple, and it might be what actually happens on some Linux systems. However, modern Linux distributions have to support a vast set of use cases for which this procedure is not adequate.
First, the root filesystem could be on a device that requires a specific driver. Before trying to mount the filesystem, the right kernel module must be inserted into the running kernel. In some cases, the root filesystem is on an encrypted partition and therefore needs a userspace helper that asks the user for the passphrase and feeds it to the kernel. Or, the root filesystem could be shared over the network via NFS or iSCSI, and mounting it may first require configured IP addresses and routes on a network interface.
To overcome these issues, the bootloader can pass to the kernel a small filesystem image (the initrd) that contains scripts and tools to find and mount the real root filesystem. Once this is done, the initrd switches to the real root, and the boot continues as usual.
The dracut infrastructure
On Fedora and RHEL, the initrd is built through dracut. From its home page, dracut is "an event-driven initramfs infrastructure. dracut (the tool) is used to create an initramfs image by copying tools and files from an installed system and combining it with the dracut framework, usually found in /usr/lib/dracut/modules.d."
A note on terminology: Sometimes, the names initrd and initramfs are used interchangeably. They actually refer to different ways of building the image. An initrd is an image containing a real filesystem (for example, ext2) that gets mounted by the kernel. An initramfs is a cpio archive containing a directory tree that gets unpacked as a tmpfs. Nowadays, the initrd images are deprecated in favor of the initramfs scheme. However, the initrd name is still used to indicate the boot process involving a temporary filesystem.
Kernel command-line
Let's revisit the NFS-root scenario that was mentioned before. One possible way to boot via NFS is to use a kernel command-line containing the root=dhcp argument.
The kernel command-line is a list of options passed to the kernel from the bootloader, accessible to the kernel and applications. If you use GRUB, it can be changed by pressing the e key on a boot entry and editing the line starting with linux.
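To make the idea of a list of options concrete, here is a minimal Python sketch that splits a command-line string into option names and values. It is not how dracut parses the line (dracut uses shell helpers); the function name parse_cmdline and the sample string are illustrative assumptions, and shlex is only an approximation of the kernel's quoting rules.

```python
import shlex


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: [values, ...]}.

    Options such as ip= may appear several times, so every name maps to a
    list. Flags without '=' (for example 'quiet') get an empty string.
    """
    options = {}
    for token in shlex.split(cmdline):
        name, _, value = token.partition("=")
        options.setdefault(name, []).append(value)
    return options


# Hypothetical command line, used only for illustration.
sample = "BOOT_IMAGE=/vmlinuz root=dhcp ip=enp1s0:dhcp rd.neednet=1 quiet"
options = parse_cmdline(sample)
print(options["root"])
print(options["ip"])
```

On a running Linux system, the same function can be fed the contents of /proc/cmdline, which is where applications read the command line that the bootloader passed to the kernel.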
The dracut code inside the initramfs parses the kernel command-line and starts DHCP on all interfaces if the command-line contains root=dhcp. After obtaining a DHCP lease, dracut configures the interface with the parameters received (IP address and routes); it also extracts the value of the root-path DHCP option from the lease. The option carries an NFS server's address and path (which could be, for example, 192.168.50.1:/nfs/client). Dracut then mounts the NFS share at this location and proceeds with the boot.
If there is no DHCP server providing the address and the NFS root path, the values can be configured explicitly in the command line:
root=nfs:192.168.50.1:/nfs/client ip=192.168.50.101:::24::ens2:none
Here, the first argument specifies the NFS server's address and the exported path, and the second configures the ens2 interface with a static IP address.
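The two ways of locating the NFS root described above (a root-path value from DHCP, or an explicit root=nfs: argument) carry the same server:path pair. The following Python sketch shows how such a value could be split. The function names are hypothetical, and the sketch assumes an IPv4 address or hostname and no extra mount options; the full syntax is described in the dracut.cmdline man page.

```python
def parse_nfs_location(value):
    """Split '<server>:<path>' into a (server, path) tuple."""
    server, sep, path = value.partition(":")
    if not sep or not server or not path.startswith("/"):
        raise ValueError(f"expected <server>:<path>, got {value!r}")
    return server, path


def parse_root_arg(root):
    """Interpret the two root= forms used in this article.

    Returns None for root=dhcp (the location then comes from the DHCP
    root-path option) or a (server, path) tuple for root=nfs:<server>:<path>.
    """
    if root == "dhcp":
        return None
    prefix = "nfs:"
    if root.startswith(prefix):
        return parse_nfs_location(root[len(prefix):])
    raise ValueError(f"not a network root handled by this sketch: {root!r}")


print(parse_root_arg("nfs:192.168.50.1:/nfs/client"))
print(parse_nfs_location("192.168.50.1:/nfs/client"))
```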
There are two syntaxes to specify network configuration for an interface:
ip=<interface>:{dhcp|on|any|dhcp6|auto6}[:[<mtu>][:<macaddr>]]
ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:{none|off|dhcp|on|any|dhcp6|auto6|ibft}[:[<mtu>][:<macaddr>]]
The first can be used for automatic configuration (DHCP or IPv6 SLAAC), and the second for static configuration or a combination of automatic and static. Here are some examples:
ip=enp1s0:dhcp
ip=192.168.10.30::192.168.10.1:24::enp1s0:none
ip=[2001:0db8::02]::[2001:0db8::01]:64::enp1s0:none
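As a rough illustration of how the colon-separated fields in these examples line up with the two syntaxes, here is a Python sketch of a parser. It is not dracut's or NetworkManager's implementation: the function names, the returned dictionary keys, and the rule used to tell the two forms apart (the second field being an automatic method) are assumptions made for this example.

```python
import ipaddress

# Automatic methods accepted by the short form, as listed in the syntax above.
AUTO_METHODS = {"dhcp", "on", "any", "dhcp6", "auto6"}


def split_fields(value):
    """Split on ':' but keep bracketed IPv6 addresses in one piece."""
    fields, current, depth = [], [], 0
    for ch in value:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == ":" and depth == 0:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


def parse_ip_arg(value):
    """Parse the value of one ip= option into a dict (illustrative only)."""
    fields = split_fields(value)
    if len(fields) >= 2 and fields[1] in AUTO_METHODS:
        # Short form: <interface>:<method>[:[<mtu>][:<macaddr>]]
        result = {"interface": fields[0], "method": fields[1]}
        rest = fields[2:]
    elif len(fields) >= 7:
        # Long form: client, peer, gateway, netmask, hostname, interface, method
        client, peer, gateway, netmask, hostname, interface, method = (
            field.strip("[]") for field in fields[:7]
        )
        address = None
        if client:
            # ip_interface() accepts a prefix length, or a dotted netmask for IPv4.
            spec = f"{client}/{netmask}" if netmask else client
            address = str(ipaddress.ip_interface(spec))
        result = {
            "interface": interface, "method": method, "address": address,
            "peer": peer, "gateway": gateway, "hostname": hostname,
        }
        rest = fields[7:]
    else:
        raise ValueError(f"unrecognized ip= value: {value!r}")
    # Optional trailing fields; the MAC address contains colons itself.
    result["mtu"] = rest[0] if rest else ""
    result["mac"] = ":".join(rest[1:])
    return result


for example in ("enp1s0:dhcp",
                "192.168.10.30::192.168.10.1:24::enp1s0:none",
                "[2001:0db8::02]::[2001:0db8::01]:64::enp1s0:none"):
    print(parse_ip_arg(example))
```

Because the MTU and MAC address fields are positional, leaving out the empty MTU field shifts everything by one: in this sketch, enp1s0:dhcp:00:99:88:77:66:55 yields an MTU of 00 and a truncated MAC address.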
Note that if you pass an ip= option, but dracut doesn't need networking to mount the root filesystem, the option is ignored. To force network configuration without a network root, add rd.neednet=1 to the command line.
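The rule above can be summarized as a small decision function. This Python sketch is a deliberately simplified model, not dracut's actual logic: it assumes that only the root=dhcp and root=nfs:... forms shown in this article count as network roots, and the function names are hypothetical.

```python
# Assumption: only the network root forms used in this article are recognized.
NETWORK_ROOT_TYPES = ("dhcp", "nfs")


def network_required(options):
    """Return True if the (simplified) boot needs the network.

    'options' maps option names to a single value, e.g. {"root": "dhcp"}.
    """
    if options.get("rd.neednet") == "1":
        return True
    root_type = options.get("root", "").split(":", 1)[0]
    return root_type in NETWORK_ROOT_TYPES


def ip_option_applied(options):
    """An ip= option only has an effect when the network is required."""
    return "ip" in options and network_required(options)


local_root = {"root": "/dev/sda1", "ip": "enp1s0:dhcp"}
print(ip_option_applied(local_root))                         # ip= is ignored
print(ip_option_applied({**local_root, "rd.neednet": "1"}))  # ip= is applied
```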
You probably noticed that among automatic configuration methods, there is also ibft. iBFT stands for iSCSI Boot Firmware Table and is a mechanism to pass parameters about iSCSI devices from the firmware to the operating system. iSCSI (Internet Small Computer Systems Interface) is a protocol to access network storage devices. Describing iBFT and iSCSI is outside the scope of this article. What is important is that by passing ip=ibft to the kernel, the network configuration is retrieved from the firmware.
Dracut also supports adding custom routes, specifying the machine name and DNS servers, creating bonds, bridges, VLANs, and much more. See the dracut.cmdline man page for more details.
Network modules
The dracut framework included in the initramfs has a modular architecture. It comprises a series of modules, each containing scripts and binaries to provide specific functionality. You can see which modules are available to be included in the initramfs with the command dracut --list-modules.
At the moment, there are two modules to configure the network: network-legacy and network-manager. You might wonder why different modules provide the same functionality.
The network-legacy module is older and uses shell scripts calling utilities like iproute2, dhclient, and arping to configure interfaces. After the switch to the real root, a different network configuration service runs. This service is not aware of what the network-legacy module intended to do or of the current state of each interface. This can lead to problems maintaining the state across the root switch boundary.
A prominent example of a state to be kept is the DHCP lease. If an interface's address changed during the boot, the connection to an NFS share would break, causing a boot failure.
To ensure a seamless transition, there is a need for a mechanism to pass the state between the two environments. However, passing the state between services having different configuration models can be a problem.
The network-manager dracut module was created to improve this situation. The module runs NetworkManager in the initrd to configure connection profiles generated from the kernel command-line. Once done, NetworkManager serializes its state, which is later read by the NetworkManager instance in the real root.
Fedora 31 was the first distribution to switch to network-manager in initrd by default. On RHEL 8.2, network-legacy is still the default, but network-manager is available. On RHEL 8.3, dracut will use network-manager by default.
Enabling a different network module
While the two modules should be largely compatible, there are some differences in behavior. Some of those are documented in the nm-initrd-generator man page. In general, it is suggested to use the network-manager module when NetworkManager is enabled.
To rebuild the initrd using a specific network module, use one of the following commands:
# dracut --add network-legacy --force --verbose
# dracut --add network-manager --force --verbose
Since this change will be reverted the next time the initrd is rebuilt, you may want to make the change permanent in the following way:
# echo 'add_dracutmodules+=" network-manager "' > /etc/dracut.conf.d/network-module.conf
# dracut --regenerate-all --force --verbose
The --regenerate-all option rebuilds the initramfs images for all the kernel versions found on the system.
The network-manager dracut module
As with all dracut modules, the network-manager module is split into stages that are called at different times during the boot (see the dracut.modules man page for more details).
The first stage parses the kernel command-line by calling /usr/libexec/nm-initrd-generator to produce a list of connection profiles in /run/NetworkManager/system-connections. The second part of the module runs after udev has settled, i.e., after userspace has finished handling the kernel events for devices (including network interfaces) found in the system.
When NetworkManager (NM) is started in the real root environment, it registers on D-Bus, configures the network, and remains active to react to events or D-Bus requests. In the initrd, NetworkManager is run in the configure-and-quit=initrd mode, which doesn't register on D-Bus (since it's not available in the initrd, at least for now) and exits after reaching the startup-complete event.
The startup-complete event is triggered after all devices with a matching connection profile have tried to activate, successfully or not. Once all interfaces are configured, NM exits and calls dracut hooks to notify other modules that the network is available.
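The startup-complete condition described above can be modeled in a few lines. This Python sketch is only a toy model of the rule stated in this article, not NetworkManager code; the State enum and the startup_complete function are hypothetical names.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"      # activation has not finished yet
    ACTIVATED = "activated"  # activation succeeded
    FAILED = "failed"        # activation was tried and failed


def startup_complete(devices):
    """Return True once every device with a matching profile has tried to activate.

    'devices' maps an interface name to its State, or to None when no
    connection profile matches the device (such devices are not waited for).
    """
    return all(
        state is not State.PENDING
        for state in devices.values()
        if state is not None
    )


print(startup_complete({"enp1s0": State.PENDING, "enp2s0": None}))
print(startup_complete({"enp1s0": State.FAILED, "enp2s0": None}))
```

Note that a failed activation still counts as a completed attempt, which is why the second call reports completion.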
Note that the /run/NetworkManager directory containing generated connection profiles and other runtime state is copied over to the real root so that the new NetworkManager process running there knows exactly what to do.
Troubleshooting
If you have network issues in dracut, this section contains some suggestions for investigating the problem.
The first thing to do is add rd.debug to the kernel command-line, enabling debug logging in dracut. Logs are saved to /run/initramfs/rdsosreport.txt and are also available in the journal.
If the system doesn't boot, it is useful to get a shell inside the initrd environment to manually check why things aren't working. For this, there is an rd.break command-line argument. Note that the argument spawns a shell when the initrd has finished its job and is about to give control to the init process in the real root filesystem. To stop at a different stage of dracut (for example, after command-line parsing), use the following argument:
rd.break={cmdline|pre-udev|pre-trigger|initqueue|pre-mount|mount|pre-pivot|cleanup}
The initrd image contains a minimal set of binaries; if you need a specific tool at the dracut shell, you can rebuild the image, adding what is missing. For example, to add the ping and tcpdump binaries (including all their dependent libraries), run:
# dracut -f --install "ping tcpdump"
and then optionally verify that they were included successfully:
# lsinitrd | grep "ping\|tcpdump"
Arguments: -f --install 'ping tcpdump'
-rwxr-xr-x 1 root root 82960 May 18 10:26 usr/bin/ping
lrwxrwxrwx 1 root root 11 May 29 20:35 usr/sbin/ping -> ../bin/ping
-rwxr-xr-x 1 root root 1065224 May 29 20:35 usr/sbin/tcpdump
The generator
If you are familiar with NetworkManager configuration, you might want to know how a given kernel command-line is translated into NetworkManager connection profiles. This can be useful to better understand the configuration mechanism and find syntax errors in the command-line without having to boot the machine.
The generator is installed in /usr/libexec/nm-initrd-generator and must be called with the list of kernel arguments after a double dash. The --stdout option prints the generated connections on standard output. Let's try to call the generator with a sample command line:
$ /usr/libexec/nm-initrd-generator --stdout -- \
ip=enp1s0:dhcp:00:99:88:77:66:55 rd.peerdns=0
802-3-ethernet.cloned-mac-address: '99:88:77:66:55' is not a valid MAC
address
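In this example, the generator reports an error because the (possibly empty) MTU field that must come before the MAC address is missing: 00 is read as the MTU, and the remaining 99:88:77:66:55 is not a valid MAC address. Once the error is corrected, the parsing succeeds and the tool prints out the connection profile generated: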
$ /usr/libexec/nm-initrd-generator --stdout -- \
ip=enp1s0:dhcp::00:99:88:77:66:55 rd.peerdns=0
*** Connection 'enp1s0' ***
[connection]
id=enp1s0
uuid=e1fac965-4319-4354-8ed2-39f7f6931966
type=ethernet
interface-name=enp1s0
multi-connect=1
permissions=
[ethernet]
cloned-mac-address=00:99:88:77:66:55
mac-address-blacklist=
[ipv4]
dns-search=
ignore-auto-dns=true
may-fail=false
method=auto
[ipv6]
addr-gen-mode=eui64
dns-search=
ignore-auto-dns=true
method=auto
[proxy]
Note how the rd.peerdns=0 argument translates into the ignore-auto-dns=true property, which makes NetworkManager ignore DNS servers received via DHCP. An explanation of NetworkManager properties can be found on the nm-settings man page.
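The generated profiles use an INI-like layout, so the output can also be inspected programmatically. The following Python sketch parses text in the shape shown above with the standard configparser module and checks the ignore-auto-dns property. The parse_profiles function is a hypothetical helper, the sample is an abridged copy of the output above, and the sketch assumes that each profile is introduced by a "*** Connection '<name>' ***" line as in this example.

```python
import configparser
import re

# Header line that precedes each profile in the --stdout output shown above.
HEADER = re.compile(r"^\*\*\* Connection '(.+)' \*\*\*$")

# Abridged copy of the generator output from this article.
SAMPLE = """\
*** Connection 'enp1s0' ***
[connection]
id=enp1s0
type=ethernet
interface-name=enp1s0
permissions=
[ethernet]
cloned-mac-address=00:99:88:77:66:55
[ipv4]
ignore-auto-dns=true
may-fail=false
method=auto
[ipv6]
addr-gen-mode=eui64
ignore-auto-dns=true
method=auto
[proxy]
"""


def parse_profiles(text):
    """Return {connection name: ConfigParser} for generator-style output."""
    chunks = {}
    current = None
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            current = chunks.setdefault(match.group(1), [])
        elif current is not None:
            current.append(line)
    profiles = {}
    for name, lines in chunks.items():
        # Only '=' separates keys from values; MAC addresses contain ':'.
        parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
        parser.read_string("\n".join(lines))
        profiles[name] = parser
    return profiles


profiles = parse_profiles(SAMPLE)
for name, profile in profiles.items():
    print(name, "ignores DHCP-provided DNS:",
          profile.getboolean("ipv4", "ignore-auto-dns"))
```

On a system where the generator is installed, the same function could be applied to the text captured from /usr/libexec/nm-initrd-generator --stdout -- followed by the kernel arguments to check.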
Conclusion
The NetworkManager dracut module is enabled by default in Fedora and will also soon be enabled on RHEL. It brings better integration between networking in the initrd and NetworkManager running in the real root filesystem.
While the current implementation is working well, there are some ideas for possible improvements. One is to abandon the configure-and-quit=initrd mode and run NetworkManager as a daemon started by a systemd service. In this way, NetworkManager will be run in the same way as when it's run in the real root, reducing the code to be maintained and tested.
To completely drop the configure-and-quit=initrd mode, NetworkManager should also be able to register on D-Bus in the initrd. Currently, dracut doesn't have any module providing a D-Bus daemon because the image should be minimal. However, there are already proposals to include it as it is needed to implement some new features.
With D-Bus running in the initrd, NetworkManager's powerful API will be available to other tools to query and change the network state, unlocking a wide range of applications. One of those is to run nm-cloud-setup in the initrd. The service, shipped in the NetworkManager-cloud-setup Fedora package, fetches metadata from cloud providers' infrastructure (EC2, Azure, GCP) to automatically configure the network.
Beniamino Galvani
Beniamino is a software engineer working in the networking services team at Red Hat.