[libvirt] [RFC] [PATCH v2 1/6] add configure option --with-fuse for libvirt

Glauber Costa glommer at parallels.com
Thu Sep 6 07:38:20 UTC 2012


On 09/06/2012 05:53 AM, Gao feng wrote:
> On 2012-09-05 20:42, Daniel P. Berrange wrote:
>> On Wed, Sep 05, 2012 at 05:41:40PM +0800, Gao feng wrote:
>>> Hi Daniel & Glauber
>>>
>>> On 2012-07-31 17:27, Daniel P. Berrange wrote:
>>>> Hi Gao,
>>>>
>>>> I'm wondering if you are planning to attend the Linux Plumbers Conference
>>>> in San Diego at the end of August? Glauber is going to be giving a talk
>>>> on precisely the subject of virtualizing /proc in containers, which is
>>>> exactly what your patch is looking at.
>>>>
>>>>   https://blueprints.launchpad.net/lpc/+spec/lpc2012-cont-proc
>>>>
>>>> I'll review your patches now, but I think I'd like to wait to hear what
>>>> Glauber talks about at LPC before we try to merge this support in libvirt,
>>>> so we have a broadly agreed long-term strategy for /proc between all the
>>>> interested userspace & kernel guys.
>>>
>>> I did not attend LPC, so can you tell me what the situation is with
>>> /proc virtualization?
>>>
>>> I think maybe we should just apply this patchset first, and wait for
>>> somebody to send patches implementing /proc virtualization.
>>
>> So there were three main approaches discussed:
>>
>>  1. FUSE based /proc + a real hidden /.proc. The FUSE /proc provides custom
>>     handling of various files like meminfo, otherwise forwards I/O requests
>>     through to the hidden /.proc files. This was the original proof of
>>     concept.
>>
>>  2. One FUSE filesystem for all containers + a real /proc. Bind mount files
>>     from the FUSE filesystem into the container's /proc. This is what Glauber
>>     has done.
>>
>>  3. One FUSE filesystem per container + a real /proc. Bind mount files from
>>     the FUSE filesystem into the container's /proc. This is what your patch
>>     is doing.
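
To make options 2 & 3 concrete: the common building block is a bind mount of
a single FUSE-served file over the corresponding kernel one. Roughly like the
following (the paths and the function name are made up for illustration, not
what libvirt actually uses):

/* hypothetical sketch of the bind-mount step shared by options 2 & 3 */
#include <stdio.h>
#include <sys/mount.h>

int virtualize_meminfo(const char *fuse_root)
{
    char src[256];

    /* e.g. fuse_root = "/run/libvirt/lxc/guest-a" (illustrative path) */
    snprintf(src, sizeof(src), "%s/meminfo", fuse_root);

    /* MS_BIND overlays the FUSE file on top of the kernel one;
     * the rest of /proc is left untouched */
    if (mount(src, "/proc/meminfo", NULL, MS_BIND, NULL) < 0) {
        perror("mount");
        return -1;
    }
    return 0;
}
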
>>
>> Options 2 & 3 have a clear win over option 1 in efficiency terms, since
>> they avoid doubling the I/O required for the majority of files.
>>
>> Glauber thinks it is preferable to have a single FUSE filesystem that
>> has one sub-directory for each container, and then bind mount the
>> appropriate sub-directory into each container.
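
In other words, the two layouts would look something like this (all paths
hypothetical):

    option 2 (one shared FUSE mount, subdir per container):
        /run/libvirt/lxc-fuse/guest-a/meminfo
        /run/libvirt/lxc-fuse/guest-b/meminfo

    option 3 (one private FUSE mount per container):
        /run/libvirt/lxc/guest-a/meminfo   (served by guest-a's libvirt_lxc)
        /run/libvirt/lxc/guest-b/meminfo   (served by guest-b's libvirt_lxc)

Either way, the relevant file is then bind-mounted over the guest's
/proc/meminfo as sketched above.
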
>>
>> I kinda like the way you have done things, having a private FUSE filesystem
>> per container, for security reasons. By having the FUSE backend be part of
>> the libvirt_lxc process, we have strictly isolated each container's environment.
>>
>> If we wanted a single shared FUSE for all containers, we'd need to have some
>> single shared daemon to maintain it. This could not be libvirtd itself, since
>> we need the containers & their filesystems to continue to work when libvirtd
>> itself is not running. We could introduce a separate libvirt_fused which
>> provided a shared filesystem, but this still has the downside that any
>> flaw in its implementation could provide a way for one container to attack
>> another container.
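
For what it's worth, here is roughly what the per-container, in-process
variant boils down to. This is a stripped-down illustrative sketch against
the FUSE 2.x high-level API, not the actual libvirt_lxc code; the vproc_*
names are mine, and a real version would compute the meminfo contents from
the container's cgroup limits instead of using canned data:

#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>

/* canned contents; a real implementation would build this from the
 * container's cgroup memory limits */
static const char *meminfo_data =
    "MemTotal:     1048576 kB\n"
    "MemFree:       524288 kB\n";

static int vproc_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode = S_IFDIR | 0555;
        st->st_nlink = 2;
    } else if (strcmp(path, "/meminfo") == 0) {
        st->st_mode = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size = strlen(meminfo_data);
    } else {
        return -ENOENT;
    }
    return 0;
}

static int vproc_read(const char *path, char *buf, size_t size,
                      off_t off, struct fuse_file_info *fi)
{
    size_t len = strlen(meminfo_data);

    if (strcmp(path, "/meminfo") != 0)
        return -ENOENT;
    if ((size_t)off >= len)
        return 0;
    if (off + size > len)
        size = len - off;
    memcpy(buf, meminfo_data + off, size);
    return size;
}

static struct fuse_operations vproc_ops = {
    .getattr = vproc_getattr,
    .read    = vproc_read,
};

int main(int argc, char *argv[])
{
    /* one such mount per container, owned by its libvirt_lxc process */
    return fuse_main(argc, argv, &vproc_ops, NULL);
}
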
> 
> Agreed. If we choose option 2, we have to organize a sub-directory for each
> container in FUSE, which will make the FUSE filesystem more complicated.
> 

So, according to Daniel Lezcano, who tried it once, FUSE is very
fork-intensive, and having one mount per container would lead to bad
performance. But I have to admit I have never measured it myself. I would be
curious to see numbers from a large deployment, to see whether that
complication is worth the gain.
