I have an interest in application deployment systems, having created three different application bunding system (1,2, 3). These were all experiments in the area of desktop applications, but recently there has been some interesting motions in related areas, namely Docker.
Docker is a server application/container deployment system, which nicely sidesteps a lot of the complexity with desktop apps (not having to integrate deeply with the desktop) which makes it a lot easier to deploy. Additionally, docker is more than a deployment system, it also has some interesting ideas about how to create and distribute applications.
Every docker container is a copy-on-write clone of a specific parent Image, which means instantiation of docker containers is very fast and cheap. It also gives some very interesting properties because you can track the changes in a container (compared to t its parent image) and “commit” this to a new image. This creates a git-like hierarchy of images where every commit is a filesystem layer that applies to a previous layer, up to some base image. The git-like workflow is really nice to work with when creating images, and the final result is very easy to share and deploy, and at the same time automatically shares as much as possible with common base images.
Unfortunately Docker relies on AUFS, a union filesystem that is not in the upstream kernel, nor is it likely to ever be there. Also, while AUFS is in the current Ubuntu kernel it is deprecated there and will eventually be removed. This means Docker doesn’t run on Fedora which has a primarily-upstream approach to packaging.
So, the last month or so I’ve been working on making Docker work in Fedora (and thus eventually in RHEL, which is the nr 1 requested Docker feature). Of course, this work will benefit other distributions that don’t have AUFS too.
I started looking at possible replacements for the copy-on-write support, and there are a few possibilities availible:
-
overlayfs
-
btrfs
-
lvm snapshots
-
lvm thin provisioning
Overlayfs is a different union filesystem implementation than AUFS, and the one that seems most likely to land upstream. But that is happening slowly, if at all. Long-term I think this is the best option, but right now it is out of the question.
Btrfs has copy-on-write both using filesystem snapshots and one a file basis using reflink. However, btrfs is not currently used much in production as its not considered stable enough. It would also be a very heavy dependency for Docker, as may users would have to reformat their disks to use it.
Lvm snapshots are useful for doing e.g. backup of a snapshot, but regress badly in performance when you start having many snapshots of the same device.
This leaves us with only lvm thin provisioning. This is a fairly recent, but relatively stable technology that allows you to create copy-on-write block devices that are “thinly” provisioned, meaning they don’t use real space until the device is in use. This is not ideal for Docker as it really wants copy-on-write at the file level, but with some work it is possible to work around this.
Rather than interacting with lvm which is a very generic volume manager I chose to use the lower level device-mapper kernel APIs directly (via libdevmapper). This allows us greater ease of access to the devices programmatically, as well as avoiding confusion with possible system use of LVM. Also it avoids some LVM performance issues with very many devices.
So, we set up a single large block device on which we create a device-mapper “thinp” pool. On this we then creates a single “base” block device formated with ext4. Every image and container are then created as snapshot (in multiple steps) from this base device. So, say you’re starting a container based on an image “apache” which itself is based on a “fedora” image, we would:
-
Create a snapshot of the base device.
-
Mount it and apply the changes in the fedora image.
-
Create a snapshot based on the fedora device.
-
Mount it and apply the changes in the apache image.
-
Create a snapshot based on the apache device.
-
Mount it and use as the root in the new container.
And of course, these devices will be reused (with corresponding steps skipped) as needed by other images/containers.
The devicemapper pool need to be set up on a large block device that fits all the images and containers that you will be used which would be painful for most people. Docker handles this by automatically creating the a large sparse file, using it as a loopback device for the devicemapper work. Additionally we ensure that DISCARD support is enabled in the filesystem so that any files removed in the conttainer filters down to the loopback file making it sparse again.
This means that there is no need for setup, and space for images and containers will only be used as needed. Of course, there are still issues, like the max size of the loopback mount (100G by default, but this should be easy to grow) and the max size of the base extt4 image (10G by default, resizing is harder after initial construction, but should be possible).
We’re currently in the process of landing this in Docker, and hope to have a 0.7 release out based on my device-mapper work pretty soon. Then I will continue working on making docker a first-class citizen on Fedora.
Sull'autore
Alexander has worked at Red Hat since 2001, doing development work on desktop and containers, including the creation of flatpak and lots foundational work on Gnome.
Ricerca per canale
Automazione
Novità sull'automazione IT di tecnologie, team e ambienti
Intelligenza artificiale
Aggiornamenti sulle piattaforme che consentono alle aziende di eseguire carichi di lavoro IA ovunque
Hybrid cloud open source
Scopri come affrontare il futuro in modo più agile grazie al cloud ibrido
Sicurezza
Le ultime novità sulle nostre soluzioni per ridurre i rischi nelle tecnologie e negli ambienti
Edge computing
Aggiornamenti sulle piattaforme che semplificano l'operatività edge
Infrastruttura
Le ultime novità sulla piattaforma Linux aziendale leader a livello mondiale
Applicazioni
Approfondimenti sulle nostre soluzioni alle sfide applicative più difficili
Serie originali
Raccontiamo le interessanti storie di leader e creatori di tecnologie pensate per le aziende
Prodotti
- Red Hat Enterprise Linux
- Red Hat OpenShift
- Red Hat Ansible Automation Platform
- Servizi cloud
- Scopri tutti i prodotti
Strumenti
- Formazione e certificazioni
- Il mio account
- Supporto clienti
- Risorse per sviluppatori
- Trova un partner
- Red Hat Ecosystem Catalog
- Calcola il valore delle soluzioni Red Hat
- Documentazione
Prova, acquista, vendi
Comunica
- Contatta l'ufficio vendite
- Contatta l'assistenza clienti
- Contatta un esperto della formazione
- Social media
Informazioni su Red Hat
Red Hat è leader mondiale nella fornitura di soluzioni open source per le aziende, tra cui Linux, Kubernetes, container e soluzioni cloud. Le nostre soluzioni open source, rese sicure per un uso aziendale, consentono di operare su più piattaforme e ambienti, dal datacenter centrale all'edge della rete.
Seleziona la tua lingua
Red Hat legal and privacy links
- Informazioni su Red Hat
- Opportunità di lavoro
- Eventi
- Sedi
- Contattaci
- Blog di Red Hat
- Diversità, equità e inclusione
- Cool Stuff Store
- Red Hat Summit