[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd

Corey Bryant coreyb at linux.vnet.ibm.com
Thu Jul 5 14:22:41 UTC 2012



On 07/04/2012 04:00 AM, Kevin Wolf wrote:
> Am 03.07.2012 19:03, schrieb Eric Blake:
>>>>> 2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the
>>>>> set that has access flags matching the qemu_open action flags.
>>>>> qemu_open increments refcount for this fd.
>>>>> 3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named
>>>>> "/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5).  libvirt
>>>>> in-use flag turned on for fd.
>>>>> 3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first
>>>>> fd from the set that has access flags matching the qemu_open action
>>>>> flags.  qemu_open increments refcount for this fd.
>>>>> 4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the
>>>>> set.  turns libvirt in-use flag off marking the fd ready to be closed
>>>>> when qemu is done with it.
>>>>
>>>> If we decided to not return the individual fd numbers to libvirt, file
>>>> descriptors would be uniquely identified by an fdset/flags pair here.
>>>>
>>>
>>> Are you saying we'd pass the fdset name and flags parameters on
>>> remove-fd to somehow identify the fds to remove?
>>
>> Passing the flag parameters is not trivial, as that would mean the QMP
>> code would have to define constants mapping to all of the O_* flags that
>> qemu_open supports.  It's easier to support closing by fd number.
>
> Good point. So pass-fd or whatever it will be called needs to returnn
> both fdset number and the individual fd number.
>
>>>>> 5. qemu_close decrements refcount for fd, and closes fd when refcount is
>>>>> zero and libvirt in use flag is off.
>>>>
>>>> The monitor could just hold another reference, then we save the
>>>> additional flag. But that's a qemu implementation detail.
>>>>
>>>
>>> I'm not sure I understand what you mean.
>>
>> pass-fd (or add-fd, whatever name we give it) adds an fd to an fdset,
>> with initial use count of 1 (the use is the monitor).  qemu_open()
>> increments the use count.  A new qemu_close() wrapper would decrement
>> the use count.  And both calling 'remove-fd', or closing the QMP monitor
>> of an fd that has not yet been passed through 'remove-fd', serves as a
>> way to decrement the use count.  You'd still have to track whether the
>> monitor is using an fd (to avoid over-decrementing on QMP monitor
>> close), but by having the monitor's use also tracked under the refcount,
>> then refcount reaching 0 is sufficient to auto-close an fd.
>
>> I think
>> that also means that re-establishing the client QMP connection would
>> increment
>
> This is an interesting idea. I've never thought about what to do with a
> reconnect. It felt a bit odd at first, but now your proposal totally
> makes sense to me.
>
>
>> For some examples:
>>
>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>> use by monitor, as member of fdset1
>> 2. client crashes, so all tracked fds are visited; fd=4 had not yet been
>> passed to 'remove-fd', so qemu decrements refcount; refcount of fd=4 is
>> now 0 so qemu closes it
>
> Doesn't the refcount belong to an fdset rather than a single fd? As long
> as the fdset has still a reference, it's possible to reach the other fds
> in the set through a reopen. For example:
>
> 1. client calls 'add-fd', qemu is now tracking fd=4 (O_RDWR) as a member
> of fdset1, in use by monitor, refcount 1
> 2. client calls 'add-fd', qemu is now tracing fd=5 (O_RDONLY) as a
> member of fdset, in use by monitor, refcount is still 1
> 3. client calls 'device-add' with /dev/fdset/1 as the backing filename
> and r/w, so qemu_open() dup()s fd 4 and increments the refcount to 2
> 4. client crashes, so all tracked fdsets are visited; fdset1 gets its
> refcount decreased to 1, and both member fds 4 and 5 stay open.
>
> If you had refcounted a single fd, fd 5 would be closed now and qemu
> couldn't switch to O_RDONLY any more.
>

If the O_RDONLY is for a future command, it would make more sense to add 
that fd before that future command.  Or are you passing it at step 2 
because it is needed for the probe re-open issue?

>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>> use by monitor, as member of fdset1
>> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
>> so qemu_open() increments the refcount to 2
>> 3. client crashes, so all tracked fds are visited; fd=4 had not yet been
>> passed to 'remove-fd', so qemu decrements refcount to 1, but leaves fd=4
>> open because it is still in use by the block device
>> 4. client re-establishes QMP connection, and 'query-fds' lets client
>> learn about fd=4 still being open as part of fdset1, but also informs
>> client that fd is not in use by the monitor
>>
>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>> use by monitor, as member of fdset1
>> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
>> so qemu_open() increments the refcount to 2
>> 3. client calls 'remove-fd fdset=1 fd=4', so qemu marks fd=4 as no
>> longer in use by the monitor, refcount decremented to 1 but still left
>> open because it is in use by the block device
>> 4. client crashes, so all tracked fds are visited; but fd=4 is already
>> marked as not in use by the monitor, so its refcount is unchanged
>>
>> 1. client calls 'add-fd', qemu is now tracking fd=4 with refcount 1, in
>> use by monitor, as member of fdset1
>> 2. client calls 'device-add' with /dev/fdset/1 as the backing filename,
>> but the command fails for some other reason, so the refcount is still 1
>> at the end of the command (although it may have been temporarily
>> incremented then decremented during the command)
>> 3. client calls 'remove-fd fdset=1 fd=4' to deal with the failure (or
>> QMP connection is closed), so qemu marks fd=4 as no longer in use by the
>> monitor, refcount is now decremented to 0 and fd=4 is closed
>
> Yup, apart from the comment above the examples look good to me.
>
>> I think that covers the idea; you need a bool in_use for tracking
>> monitor state (the monitor is in use until either a remove-fd or a
>> monitor connection closes), as well as a ref-count.
>
> Yes, makes sense to me.
>
>>> We also need a query-fdsets command that lists all fdsets that exist. If
>>>> we add information about single fds to the return value of it, we
>>>> probably don't need a separate query-fd that operates on a single fdset.
>>>>
>>>
>>> Yes, good point.  And maybe we don't need 2 commands.  query-fdsets
>>> could return all the sets and all the fds that are in those sets.
>>
>> Yes, I think a single query command is good enough here, something like:
>>
>> { "execute":"query-fdsets" } =>
>> { "return" : { "sets": [
>>     { "name": "fdset1",
>>       "fds": [ { "fd": 4, "monitor": true, "refcount": 1 } ] },
>>     { "name": "fdset2",
>>       "fds": [ { "fd": 5, "monitor": false, "refcount": 1 },
>>                { "fd": 6, "monitor": true, "refcount": 2 } ] } ] } }
>
> Looks good, this is exactly what I was thinking of.
>
>>>> In use by whom? If it's still in use in qemu (as in "in-use flag would
>>>> be set") and we have a refcount of zero, then that's a bug.
>>>>
>>>
>>> In use by qemu.  I don't think it's a bug.  I think there are situations
>>> where refcount gets to zero but qemu is still using the fd.
>>
>> I think the refcount being non-zero _is_ what defines an fd as being in
>> use by qemu (including use by the monitor).  Any place you have to close
>> an fd before reopening it is dangerous; the safe way is always to open
>> with the new permissions before closing the old permissions.
>
> There is one case I'm aware of where we need to be careful: Before
> opening an image, qemu may probe the format. In this case, the image
> gets opened twice, and the first close comes before the second open. I'm
> not entirely sure how hard it would be to get rid of that behaviour.
>
> If we can't get rid of it, we have a small window that the refcount
> doesn't really cover, and if we weren't careful it would become racy.
> This is why I mentioned earlier that maybe we need to defer the refcount
> decrease a bit. However, I can't see how the in-use flag would make a
> difference there. If the refcount can't cover it, the in-use flag can't
> either.

Yeah this is a problem.  Could we introduce another flag to cover this?

-- 
Regards,
Corey





More information about the libvir-list mailing list