[libvirt] [FW: An introduction to libvirt's LXC (LinuX Container) support]

Daniel P. Berrange berrange at redhat.com
Wed Sep 17 15:35:09 UTC 2008


FYI, this is a mail I just sent to containers at lists.linux-foundation.org,
where all the kernel container developers hang out.

Daniel

----- Forwarded message from "Daniel P. Berrange" <berrange at redhat.com> -----

> Date: Wed, 17 Sep 2008 16:06:35 +0100
> From: "Daniel P. Berrange" <berrange at redhat.com>
> To: containers at lists.linux-foundation.org
> Subject: An introduction to libvirt's LXC (LinuX Container) support
> 
> This is a short^H^H^H^H^H long mail to introduce / walk-through some
> recent developments in libvirt to support native Linux hosted
> container virtualization using the kernel capabilities the people
> on this list have been adding in recent releases. We've been working
> on this for a few months now, but haven't really publicised it until
> now, and I figure the people working on container virt extensions
> for Linux might be interested in how it is being used.
> 
> For those who aren't familiar with libvirt, it provides a stable API
> for managing virtualization hosts and their guests. It started with
> a Xen driver, and over time has evolved to add support for QEMU, KVM,
> OpenVZ and, most recently of all, a driver we're calling "LXC", short
> for "LinuX Containers". The key point is that no matter what hypervisor
> you are using, there is a consistent set of APIs and a standardized
> configuration format for userspace management applications on the
> host (plus secure remote RPC access to the host).
> 
> The LXC driver is the result of a combined effort from a number of
> people in the libvirt community. Most notably, Dave Leskovec contributed
> the original code, Dan Smith now leads development, and I have made my
> own contributions to its architecture to better integrate it with libvirt.
> 
> We have a couple of goals in this work. Overall, libvirt wants to be
> the de facto standard open source management API for all virtualization
> platforms, and native Linux virtualization capabilities are a strong
> focus. The LXC driver is attempting to provide a general purpose
> management solution for two container virt use cases:
> 
>  - Application workload isolation
>  - Virtual private servers
> 
> In the first use case we want to provide the ability to run an
> application in the primary host OS with partial restrictions on its
> resource / service access. It will still run with the same root
> directory as the host OS, but its filesystem namespace may have
> some additional private mount points present. It may have a
> private network namespace to restrict its connectivity, and it
> will ultimately have restrictions on its resource usage (e.g.
> memory, CPU time, CPU affinity, I/O bandwidth).
> 
> In the second use case, we want to provide a completely virtualized
> operating system in the container (running the host kernel, of
> course), akin to the capabilities of OpenVZ / Linux-VServer. The
> container will have a totally private root filesystem, a private
> network namespace, whatever other namespace isolation the
> kernel provides, and again resource restrictions. Some people
> like to think of this as 'a better chroot than chroot'.
> 
> In terms of technical implementation, at its core is direct usage
> of the new clone() flags. By default all containers get created
> with CLONE_NEWPID, CLONE_NEWNS, CLONE_NEWUTS, CLONE_NEWUSER, and
> CLONE_NEWIPC. If a private network configuration was requested, they
> also get CLONE_NEWNET.
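A minimal C sketch of the flag selection just described. The helper lxc_clone_flags() and its boolean parameter are hypothetical illustrations, not libvirt's actual code; the CLONE_* constants come from the kernel's <linux/sched.h> UAPI header.

```c
#include <assert.h>
#include <linux/sched.h> /* CLONE_NEW* constants from the kernel UAPI */
#include <stdbool.h>

/* Namespaces every container gets by default, per the description above. */
#define LXC_DEFAULT_NS_FLAGS \
    (CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWUSER | CLONE_NEWIPC)

/* A private network namespace must stay opt-in. */
static_assert((LXC_DEFAULT_NS_FLAGS & CLONE_NEWNET) == 0,
              "CLONE_NEWNET must not be a default flag");

/* Hypothetical helper: namespace flags for one container. */
unsigned long lxc_clone_flags(bool private_network)
{
    unsigned long flags = LXC_DEFAULT_NS_FLAGS;

    if (private_network)
        flags |= CLONE_NEWNET; /* only when a private network is configured */
    return flags;
}
```

In use, the returned value would be OR'd with SIGCHLD and passed to clone() along with a stack for the child; creating these namespaces requires CAP_SYS_ADMIN.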
> 
> For the workload isolation case, after creating the container we
> just add a number of filesystem mounts in the container's private
> FS namespace. In the VPS case, we do a pivot_root() onto the
> new root directory, and then add any extra filesystem mounts the
> container config requested.
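The VPS root switch could look roughly like the following, assuming the caller already runs inside its own CLONE_NEWNS namespace with CAP_SYS_ADMIN. This follows the usual pivot_root(2) recipe rather than libvirt's own lxc_container.c; the function names and the temporary .oldroot directory are hypothetical choices.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Build "<newroot>/.oldroot", where the host's root is parked briefly.
 * The ".oldroot" name is a hypothetical choice. Returns -1 if too long. */
int lxc_oldroot_path(const char *newroot, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s/.oldroot", newroot);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Make newroot the root of the calling process's mount namespace.
 * Assumes we are already inside a CLONE_NEWNS namespace with
 * CAP_SYS_ADMIN. Returns -1 with errno set on failure. */
int lxc_pivot_root(const char *newroot)
{
    char oldroot[4096];

    assert(newroot != NULL);
    if (strcmp(newroot, "/") == 0) { /* pivot_root(2) rejects "/" */
        errno = EINVAL;
        return -1;
    }
    if (lxc_oldroot_path(newroot, oldroot, sizeof(oldroot)) < 0) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* Keep our mount changes from propagating back to the host;
     * pivot_root(2) also fails when the mounts involved are shared. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return -1;
    /* newroot must be a mount point, so bind-mount it onto itself. */
    if (mount(newroot, newroot, NULL, MS_BIND | MS_REC, NULL) < 0)
        return -1;
    if (mkdir(oldroot, 0700) < 0 && errno != EEXIST)
        return -1;
    /* glibc provides no pivot_root() wrapper; use the raw syscall. */
    if (syscall(SYS_pivot_root, newroot, oldroot) < 0)
        return -1;
    if (chdir("/") < 0)
        return -1;
    /* Lazily detach the old host root so it is no longer reachable. */
    if (umount2("/.oldroot", MNT_DETACH) < 0)
        return -1;
    return rmdir("/.oldroot");
}
```

Any extra mounts the config requests would be added after this returns. Because the function really does swap the root filesystem, only the path helper is safe to exercise outside a container.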
> 
> The stdin/out/err of the process leader in the container is bound
> to the slave end of a pseudo-TTY, with libvirt owning the master end
> so it can provide a virtual text console into the guest container.
> Once the basic container setup is complete, libvirt execs the
> so-called 'init' process. Things are set up such that when the
> 'init' process exits, the container is terminated / cleaned up.
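A sketch of the console wiring on both sides of the clone, assuming Linux devpts and that the slave path is reachable from inside the container (currently true, since the host's /dev/pts is bind mounted in). Opening /dev/ptmx and using the TIOCSPTLCK/TIOCGPTN ioctls is the Linux-level equivalent of posix_openpt(), unlockpt() and ptsname(); the function names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Host side: allocate a PTY pair. The caller keeps the master fd;
 * the slave path is handed to the container. Returns 0 or -1. */
int lxc_open_console_pty(int *master, char *slave, size_t len)
{
    int unlock = 0, fd, n;
    unsigned int ptyno;

    assert(master != NULL && slave != NULL);
    fd = open("/dev/ptmx", O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;
    /* Unlock the slave, then ask which /dev/pts/N it is. */
    if (ioctl(fd, TIOCSPTLCK, &unlock) < 0 || ioctl(fd, TIOCGPTN, &ptyno) < 0)
        goto error;
    n = snprintf(slave, len, "/dev/pts/%u", ptyno);
    if (n < 0 || (size_t)n >= len)
        goto error;
    *master = fd;
    return 0;
error:
    close(fd);
    return -1;
}

/* Container side, run in the cloned child: become a session leader, make
 * the slave the controlling terminal and stdin/stdout/stderr, then exec
 * init. Only returns (with -1) on failure. */
int lxc_exec_init(const char *slave, const char *init)
{
    int fd;

    if (setsid() < 0)
        return -1;
    if ((fd = open(slave, O_RDWR)) < 0)
        return -1;
    if (ioctl(fd, TIOCSCTTY, 0) < 0 ||
        dup2(fd, STDIN_FILENO) < 0 ||
        dup2(fd, STDOUT_FILENO) < 0 ||
        dup2(fd, STDERR_FILENO) < 0) {
        close(fd);
        return -1;
    }
    if (fd > STDERR_FILENO)
        close(fd);
    execl(init, init, (char *)NULL);
    return -1; /* execl() only returns on error */
}
```

The host side would call lxc_open_console_pty() before clone(), and the child would call lxc_exec_init(slave, "/sbin/init") once its mounts are in place, so that init's exit ends the container.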
> 
> On the host side, the libvirt LXC driver creates what we call a
> 'controller' process for each container. This is done with a small
> binary, /usr/libexec/libvirt_lxc. This is the process which owns the
> master end of the pseudo-TTY, along with a second pseudo-TTY pair.
> When the host admin wants to interact with the container, they use
> the command 'virsh console CONTAINER-NAME'. The LXC controller
> process takes care of forwarding I/O between the two PTY pairs:
> one slave is opened by virsh console, the other is the container's
> stdin/out/err. If you kill the controller, then the container
> also dies. Basically you can think of the libvirt_lxc controller
> as serving the equivalent purpose to the 'qemu' command for full
> machine virtualization - it provides the interface between host
> and guest, in this case just the container setup and access to the
> text console - perhaps more in the future.
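The forwarding done by the controller can be sketched as a poll() loop that copies bytes in both directions until one side goes away. This is an illustrative assumption about how such a loop might be written, not the code in libvirt_lxc; lxc_relay() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Write all of buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Copy bytes in both directions between two fds until either side
 * reaches EOF or hangs up (returns 0) or an error occurs (returns -1). */
int lxc_relay(int fd_a, int fd_b)
{
    struct pollfd fds[2] = {
        { .fd = fd_a, .events = POLLIN },
        { .fd = fd_b, .events = POLLIN },
    };
    char buf[4096];

    assert(fd_a >= 0 && fd_b >= 0);
    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (int i = 0; i < 2; i++) {
            if (!(fds[i].revents & (POLLIN | POLLHUP | POLLERR)))
                continue;
            ssize_t n = read(fds[i].fd, buf, sizeof(buf));
            if (n < 0 && errno == EINTR)
                continue;
            /* A PTY master reports EIO once its slave is closed. */
            if (n == 0 || (n < 0 && errno == EIO))
                return 0;
            if (n < 0)
                return -1;
            if (write_all(fds[1 - i].fd, buf, (size_t)n) < 0)
                return -1;
        }
    }
}
```

In the controller both descriptors would be PTY master ends; any pair of pipes or sockets works as well, which makes the loop easy to try out.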
> 
> For networking, libvirt provides two core concepts:
> 
>  - Shared physical device. A bridge containing one of your
>    physical network interfaces on the host, along with one or
>    more of the guest vnet interfaces. So the container appears
>    as if it's directly on the LAN.
> 
>  - Virtual network. A bridge containing only guest vnet
>    interfaces, and NO physical device from the host. iptables
>    and forwarding provide routed (+ optionally NATed)
>    connectivity to the LAN for guests.
> 
> The latter use case is particularly useful for machines without
> a permanent wired ethernet connection, e.g. laptops using WiFi, as it
> lets guests talk to each other even when there's no active host network.
> Both of these network setups are fully supported in the LXC driver
> in the presence of a suitably new host kernel.
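To make the two setups concrete, this hypothetical helper formats the libvirt <interface> element for each: a shared physical device uses type='bridge' with <source bridge='...'/> naming an existing host bridge, while a virtual network uses type='network' with <source network='...'/> naming a libvirt virtual network (libvirt ships one called 'default').

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum lxc_net_mode {
    LXC_NET_BRIDGE,  /* shared physical device via an existing host bridge */
    LXC_NET_VIRTUAL, /* libvirt virtual network (routed/NATed) */
};

/* Format an <interface> element; the <source> attribute name matches the
 * interface type ("bridge" or "network"). Returns 0, or -1 if buf is too small. */
int lxc_format_interface(enum lxc_net_mode mode, const char *source,
                         const char *mac, char *buf, size_t len)
{
    const char *kind = (mode == LXC_NET_BRIDGE) ? "bridge" : "network";
    int n;

    assert(source != NULL && mac != NULL && buf != NULL);
    n = snprintf(buf, len,
                 "<interface type='%s'>\n"
                 "  <source %s='%s'/>\n"
                 "  <mac address='%s'/>\n"
                 "</interface>\n",
                 kind, kind, source, mac);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```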
> 
> That's the 100ft overview. The current functionality is working
> quite well from an architectural/technical point of view, but there
> is plenty more work we still need to do to provide a system which
> is mature enough for real world production deployment.
> 
>  - Integration with cgroups. Although I talked about resource
>    restrictions, we haven't implemented any of this yet. In the
>    most immediate timeframe we want to use cgroups' device
>    ACL support to prevent the container from accessing any
>    device nodes other than the usual suspects of
>    /dev/{null,full,zero,console}, and possibly /dev/urandom.
>    The other important one is to provide a memory cap across
>    the entire container. CPU based resource control is lower
>    priority at the moment.
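As one possible shape for the device ACL, assuming the cgroup v1 devices controller is mounted and a per-container cgroup directory has already been created (its location is hypothetical), the container could be locked down by writing 'a' to devices.deny and then whitelisting each device as 'TYPE MAJOR:MINOR ACCESS'. The device numbers are the standard Linux ones (null 1:3, zero 1:5, full 1:7, urandom 1:9, console 5:1); the helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

struct lxc_dev_rule { char type; int major, minor; };

/* The "usual suspects": /dev/null, /dev/zero, /dev/full, /dev/urandom
 * (the "possibly" entry) and /dev/console. */
static const struct lxc_dev_rule lxc_allowed_devs[] = {
    { 'c', 1, 3 }, { 'c', 1, 5 }, { 'c', 1, 7 }, { 'c', 1, 9 }, { 'c', 5, 1 },
};

/* Open <dir>/<file> with the same flags as a shell '>' redirection. */
static int open_in(const char *dir, const char *file)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, file);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
}

/* Deny all devices, then whitelist the entries above, in a hypothetical
 * per-container cgroup v1 "devices" directory. Returns 0 or -1. */
int lxc_cgroup_restrict_devices(const char *cgroup_dir)
{
    char rule[32];
    size_t i;
    int fd, n;

    assert(cgroup_dir != NULL);
    if ((fd = open_in(cgroup_dir, "devices.deny")) < 0)
        return -1;
    n = (int)write(fd, "a\n", 2); /* "a" = all devices */
    close(fd);
    if (n != 2)
        return -1;

    if ((fd = open_in(cgroup_dir, "devices.allow")) < 0)
        return -1;
    for (i = 0; i < sizeof(lxc_allowed_devs) / sizeof(lxc_allowed_devs[0]); i++) {
        const struct lxc_dev_rule *r = &lxc_allowed_devs[i];
        n = snprintf(rule, sizeof(rule), "%c %d:%d rwm\n", r->type, r->major, r->minor);
        /* One rule per write(): the kernel parses each write separately. */
        if (write(fd, rule, (size_t)n) != n) {
            close(fd);
            return -1;
        }
    }
    close(fd);
    return 0;
}
```

The container's init PID would then be added to the cgroup's tasks file so that it and its children inherit the rules. Using shell-redirection flags also lets the function run against an ordinary directory.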
> 
>  - Efficient query of resource utilization. We need to be able
>    to get the cumulative CPU time of all the processes inside
>    the container, without having to iterate over every PID's
>    /proc/$PID/stat file. I'm not sure how we'll do this yet.
>    We want this data both totalled across all CPUs and per-CPU.
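For reference, the per-process approach the text wants to avoid looks roughly like this: read utime and stime (fields 14 and 15 of /proc/PID/stat, in clock ticks) for every PID and add them up. Both helpers are hypothetical; how the container's PID list is obtained is left open, and dividing by sysconf(_SC_CLK_TCK) converts ticks to seconds.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Parse utime + stime (fields 14 and 15, in clock ticks) from one line of
 * /proc/<pid>/stat. comm (field 2) may contain spaces or ')', so parsing
 * starts after the last ')'. Returns 0 or -1. */
int lxc_parse_stat_ticks(const char *line, unsigned long long *ticks)
{
    unsigned long long utime, stime;
    const char *p;

    assert(line != NULL && ticks != NULL);
    if ((p = strrchr(line, ')')) == NULL)
        return -1;
    /* Skip state, ppid, pgrp, session, tty_nr, tpgid (fields 3-8) and
     * flags, minflt, cminflt, majflt, cmajflt (fields 9-13). */
    if (sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*u %*u %*u %*u %llu %llu",
               &utime, &stime) != 2)
        return -1;
    *ticks = utime + stime;
    return 0;
}

/* Sum CPU ticks over a list of PIDs, one /proc read per process: the
 * per-PID iteration the text hopes to avoid. Vanished PIDs are skipped. */
unsigned long long lxc_sum_cpu_ticks(const pid_t *pids, size_t n)
{
    unsigned long long total = 0, t;
    char path[64], line[1024];
    size_t i;

    for (i = 0; i < n; i++) {
        FILE *f;
        int ok;

        snprintf(path, sizeof(path), "/proc/%ld/stat", (long)pids[i]);
        if ((f = fopen(path, "r")) == NULL)
            continue; /* process already exited */
        ok = fgets(line, sizeof(line), f) != NULL &&
             lxc_parse_stat_ticks(line, &t) == 0;
        fclose(f);
        if (ok)
            total += t;
    }
    return total;
}
```

Each call costs one open/read/parse per process, which is exactly the scaling problem noted above, and it yields only a total rather than per-CPU figures.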
> 
>  - devpts virtualization. libvirt currently just bind mounts the
>    host's /dev/pts into the container. Clearly this isn't a
>    serious implementation. We've been monitoring the devpts namespace
>    patches, and these look like they will provide the capabilities
>    we need for the full virtual private server use case.
> 
>  - network sysfs virtualization. libvirt can't currently use the
>    CLONE_NEWNET flag in most Linux distros, since in the current
>    released kernel this option conflicts with SYSFS in Kconfig.
>    Again, we're looking forward to seeing this addressed in the
>    next kernel.
> 
>  - UID/GID virtualization. While we spawn all containers as root,
>    applications inside the container may switch to unprivileged
>    UIDs. We don't (necessarily) want users in the host with
>    equivalent UIDs to be able to kill processes inside the
>    container. It would also be desirable to allow unprivileged
>    users to create containers without needing root on the host,
>    while allowing them to be root & any other user inside their
>    container. I'm not aware of anyone working on this kind of
>    thing yet - are there any such efforts?
> 
> There are probably more things Dan Smith is thinking of, but that
> list is a good starting point.
> 
> Finally, a 30 second overview of actually using LXC with
> libvirt to create a simple VPS using busybox in its root fs...
> 
>  - Create a simple chroot environment using busybox
> 
>     mkdir -p /root/mycontainer/bin /root/mycontainer/sbin
>     # busybox must be statically linked: the rootfs has no shared libraries
>     cp /sbin/busybox /root/mycontainer/sbin
>     for cmd in sh ls chmod rm cat vi
>     do
>       ln -s ../sbin/busybox /root/mycontainer/bin/$cmd
>     done
>     cat > /root/mycontainer/sbin/init <<EOF
>     #!/sbin/busybox sh
>     sh
>     EOF
>     chmod 755 /root/mycontainer/sbin/init
> 
> 
>  - Create a simple libvirt configuration file for the
>    container, defining the root filesystem, the network
>    connection (bridged to br0 in this case), and the
>    path to the 'init' binary (defaults to /sbin/init if
>    omitted)
> 
>     # cat > mycontainer.xml <<EOF
>     <domain type='lxc'>
>       <name>mycontainer</name>
>       <memory>500000</memory>
>       <os>
>         <type>exe</type>
>         <init>/sbin/init</init>
>       </os>
>       <devices>
>         <filesystem type='mount'>
>           <source dir='/root/mycontainer'/>
>           <target dir='/'/>
>         </filesystem>
>         <interface type='bridge'>
>           <source bridge='br0'/>
>           <mac address='00:11:22:34:34:34'/>
>         </interface>
>         <console type='pty' />
>       </devices>
>     </domain>
>     EOF
> 
>  - Load the configuration into libvirt
> 
>     # virsh --connect lxc:/// define mycontainer.xml
>     # virsh --connect lxc:/// list --inactive
>      Id Name                 State
>     ----------------------------------
>      -  mycontainer          shutdown
> 
> 
> 
>  - Start the container and query some information about it
> 
>     # virsh --connect lxc:/// start mycontainer
>     # virsh --connect lxc:/// list
>      Id   Name                 State
>     ----------------------------------
>     28407 mycontainer          running
> 
>     # virsh --connect lxc:/// dominfo mycontainer
>     Id:             28407
>     Name:           mycontainer
>     UUID:           8369f1ac-7e46-e869-4ca5-759d51478066
>     OS Type:        exe
>     State:          running
>     CPU(s):         1
>     Max memory:     500000 kB
>     Used memory:    500000 kB
> 
> 
>    NB. the CPU/memory limits shown here are not enforced yet.
> 
>  - Interact with the container
> 
>     # virsh --connect lxc:/// console mycontainer
> 
>    NB. press Ctrl+] to exit when done
> 
>  - Query the live config, e.g. to discover which PTY its
>    console is connected to
> 
> 
>     # virsh --connect lxc:/// dumpxml mycontainer
>     <domain type='lxc' id='28407'>
>       <name>mycontainer</name>
>       <uuid>8369f1ac-7e46-e869-4ca5-759d51478066</uuid>
>       <memory>500000</memory>
>       <currentMemory>500000</currentMemory>
>       <vcpu>1</vcpu>
>       <os>
>         <type arch='i686'>exe</type>
>         <init>/sbin/init</init>
>       </os>
>       <clock offset='utc'/>
>       <on_poweroff>destroy</on_poweroff>
>       <on_reboot>restart</on_reboot>
>       <on_crash>destroy</on_crash>
>       <devices>
>         <filesystem type='mount'>
>           <source dir='/root/mycontainer'/>
>           <target dir='/'/>
>         </filesystem>
>         <console type='pty' tty='/dev/pts/22'>
>           <source path='/dev/pts/22'/>
>           <target port='0'/>
>         </console>
>       </devices>
>     </domain>
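A management script could pull the console device out of this output. The hypothetical helper below does a naive substring scan for the tty attribute of the first <console> element; it is a sketch only, and real tools should use a proper XML parser such as libxml2, which libvirt itself uses.

```c
#include <assert.h>
#include <string.h>

/* Copy the tty attribute of the first <console> element in dumpxml
 * output (e.g. "/dev/pts/22") into out. A naive substring scan, not an
 * XML parser. Returns 0, or -1 if missing or longer than len - 1. */
int lxc_console_tty(const char *xml, char *out, size_t len)
{
    const char *el, *tag_end, *val, *quote;
    size_t n;

    assert(xml != NULL && out != NULL);
    if ((el = strstr(xml, "<console")) == NULL ||
        (tag_end = strchr(el, '>')) == NULL ||
        (val = strstr(el, " tty='")) == NULL || val > tag_end)
        return -1;
    val += strlen(" tty='");
    /* The attribute value runs up to the next single quote. */
    if ((quote = strchr(val, '\'')) == NULL || quote > tag_end)
        return -1;
    n = (size_t)(quote - val);
    if (n >= len)
        return -1;
    memcpy(out, val, n);
    out[n] = '\0';
    return 0;
}
```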
> 
>  - Shutdown the container
> 
>     # virsh --connect lxc:/// destroy mycontainer
> 
> There is lots more I could say, but hopefully this serves as
> a useful introduction to the LXC work in libvirt and how it
> is making use of the kernel's container based virtualization
> support. For those interested in finding out more, all the
> source is in the libvirt CVS repo, the relevant files being
> src/lxc_conf.c, src/lxc_container.c, src/lxc_controller.c
> and src/lxc_driver.c. It is available from
> 
>    http://libvirt.org/downloads.html
> 
> or via the git mirror of our CVS repo
> 
>    git clone git://git.et.redhat.com/libvirt.git
> 
> Regards,
> Daniel
> -- 
> |: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
> |: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|
> 
----- End forwarded message -----





More information about the libvir-list mailing list