[libvirt] [Qemu-devel] [PATCH v4 0/7] file descriptor passing using pass-fd

Tue Jul 3 15:40:21 UTC 2012

On 07/02/2012 06:02 PM, Corey Bryant wrote:
>
>
> On 06/26/2012 06:54 PM, Eric Blake wrote:
>> On 06/26/2012 04:28 PM, Corey Bryant wrote:
>>
>>>>>> With this proposed series, we have usage akin to:
>>>>>>
>>>>>>     1. pass_fd FDSET={M} -> returns a string "/dev/fd/N" showing
>>>>>> QEMU's
>>>>>>        view of the FD
>>>>>>     2. drive_add file=/dev/fd/N
>>>>>>     3. if failure:
>>>>>>          close_fd "/dev/fd/N"
>>>>>
>>>>> In fact, there are more steps:
>>>>>
>>>>> 4. use it successfully
>>>>> 5. close_fd "/dev/fd/N"
>>>>>
>>>>> I think it would well be possible that qemu just closes the fd when
>>>>> it's
>>>>> not used internally any more.
>>>>
>>>> pass-fd could have a flag indicating qemu to do that.
>>>>
>>>
>>> It seems like libvirt would be in a better position to understand when a
>>> file is no longer in use, and then it can call close_fd.  No?  Of course
>>> the the only fd that needs to be closed is the originally passed fd. The
>>> dup'd fd's are closed by QEMU.
>>
>> The more we discuss this, the more I'm convinced that commands like
>> 'block-commit' that will reopen a file to change from O_RDONLY to O_RDWR
>> will need to have an optional argument that says the name of the file to
>> reopen.  That is, I've seen three proposals:
>>
>> Proposal One: enhance every command to take an fd:name protocol.
>> PRO: Successful use of a name removes the fd from the 'getfd' list.
>> CON: Lots of commands to change.
>> CON: The command sequence is not atomic.
>> CON: Block layer does not have a file name tied to the fd, so the reopen
>> operation also needs to learn the fd:name protocol, but since we're
>> already touching the command it's not much more work.
>> USAGE:
>> 1. getfd FDSET={M}, ties new fd to "name"
>> 2. drive_add fd=name - looks up the fd by name from the list
>> 3a. on success, fd is no longer in the list, drive_add consumed it
>> 3b. on failure, libvirt must call 'closefd name' to avoid a leak
>> 4. getfd FDSET={M2}, ties new fd to "newname"
>> 5. block-commit fd=newname - looks up fd by name from the list
>> 6a. on success, fd is no longer in the list, block-commit consumed it
>> 6b. on failure, libvirt must call 'closefd newname' to avoid a leak
>>
>> Proposal Two: Create a permanent name via 'getfd' or 'pass-fd', then
>> update that name to the new fd in advance of any operation that needs to
>> do a reopen.
>> PRO: All other commands work without impact by using qemu_open(), by
>> getting a clone of the permanent name.
>> CON: The permanent name must remain open as long as qemu might need to
>> reopen it, and reopening needs the pass-fd force option.
>> CON: The command sequence is not atomic.
>> USAGE:
>> 1. pass_fd FDSET={M} -> returns an integer N (or a string "/dev/fd/N")
>> showing QEMU's permanent name of the FD
>> 2. drive_add file=<permanent name> means that qemu_open() will clone the
>> fd, and drive_add is now using yet another FD while N remains open;
>> meanwhile, the block layer knows that the drive is tied to the permanent
>> name
>> 3. pass-fd force FDSET={M2} -> replaces fd N with the passed M2, and
>> still returns /dev/fd/N
>> 4. block-commit - looks up the block-device name (/dev/fd/N), which maps
>> back to the permanent fd, and gets a copy of the newly passed M2
>> 5. on completion (success or failure), libvirt must call 'closefd name'
>> to avoid a leak
>>
>> Proposal Three: Use thread-local fds passed alongside any command,
>> rather than passing via a separate command
>> PRO: All commands can now atomically handle fd passing
>> PRO: Commands that only need a one-shot open can now use fd
>> CON: Commands that need to reopen still need modification to allow a
>> name override during the reopen
>> 1. drive_add nfds=1 file="/dev/fdlist/1" FDSET={M} -> on success, the fd
>> is used as the block file, on failure it is atomically closed, so there
>> is no leak either way. However, the block file has no permanent name.
>> 2. block-commit nfds=1 file-override="/dev/fdlist/1" FDSET={M2} -> file
>> override must be passed again (since we have no guarantee that the
>> /dev/fdlist/1 of _this_ command matches the command-local naming used in
>> the previous command), but the fd is again used atomically
>>
>> Under proposal 3, there is no need to dup fd's, merely only to check
>> that qemu_open("/dev/fdlist/n", flags) has compatible flags with the fd
>> received via FDSET.  And the fact that things are made atomic means that
>> libvirt no longer has to worry about calling closefd, nor does it have
>> to worry about being interrupted between two monitor commands and not
>> knowing what state qemu is in.  But it does mean teaching every possible
>> command that wants to do a reopen to learn how to use fd overrides
>> rather than to reuse the permanent name that was originally in place on
>> the first open.
>>
>
> Here's another option that Kevin and I discussed today on IRC.  I've
> modified a few minor details since the discussion. And Kevin please
> correct me if anything is wrong.
>
> Proposal Four: Pass a set of fds via 'pass-fds'.  The group of fds
> should all refer to the same file, but may have different access flags
> (ie. O_RDWR, O_RDONLY).  qemu_open can then dup the fd that has the
> matching access mode flags.
> pass-fds:
>      { 'command': 'pass-fds',
>        'data': {'fdsetname': 'str', '*close': 'bool'},
>        'returns': 'string' }
> close-fds:
>      { 'command': 'close-fds',
>        'data': {'fdsetname': 'str' }
> where:
>       @fdsetname - the file descriptor set name
>       @close - *if true QEMU closes the monitor fds when they expire.
>                if false, fds remain on the monitor until close-fds
>                command.
> PRO: Supports reopen
> PRO: All other commands work without impact by using qemu_open()
> PRO: No fd leakage if close==true specified
> CON: Determining when to close fds when close==true may be tricky
> USAGE:
> 1. pass-fd FDSET={M} -> returns a string "/dev/fdset/1") that refers to
> the passed set of fds.
> 2. drive_add file=/dev/fdset/1 -- qemu_open() uses the first fd from the
> list that has access flags matching the qemu_open() action flags.
> 3. block-commit -- qemu_open() reopens "/dev/fdset/1" by using the first
> fd from the list that has access flags matching the qemu_open() action
> flags.
> 4. The monitor fds are closed:
>     A) *If @close == true, qemu closes the set of fds when the timer
>        expires
>     B) If @close == false, libvirt must issue close-fds command to close
>        the set of fds
>
> *How to solve this has yet to be determined.  Kevin mentioned
> potentially using a bottom-half or a timer in qemu_close().
>

Thanks again for taking time to discuss this at today's QEMU community call.

Here's the proposal we discussed at the call.  Please let me know if I 
missed anything or if there are any issues with this design.

Proposal Five:  New monitor commands enable adding/removing an fd 
to/from a set.  The set of fds should all refer to the same file, but 
may have different access flags (ie. O_RDWR, O_RDONLY).  qemu_open can 
then dup the fd that has the matching access mode flags.
PRO: Supports reopen
PRO: All other commands work without impact by using qemu_open()
PRO: No fd leakage (fds are associated with monitor connection and, if 
not in use, closed when monitor disconnects)
PRO: Security-wise this is ok since libvirt can manage the set of fd's 
(ie. it can add/remove an O_RDWR fd to/from the set as needed).
CON: Not atomic (e.g. doesn't add an fd with single drive_add command).
USAGE:
1. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named 
"/dev/fdset/1" - command returns qemu fd (e.g fd=4) to caller.  libvirt 
in-use flag turned on for fd.
2. drive_add file=/dev/fdset/1 -> qemu_open uses the first fd from the
set that has access flags matching the qemu_open action flags. 
qemu_open increments refcount for this fd.
3. add-fd /dev/fdset/1 FDSET={M} -> qemu adds fd to set named 
"/dev/fdset/1" - command returns qemu fd to caller (e.g fd=5).  libvirt 
in-use flag turned on for fd.
3. block-commit -> qemu_open reopens "/dev/fdset/1" by using the first
fd from the set that has access flags matching the qemu_open action 
flags.  qemu_open increments refcount for this fd.
4. remove-fd /dev/fdset/1 5 -> caller requests fd==5 be removed from the 
set.  turns libvirt in-use flag off marking the fd ready to be closed 
when qemu is done with it.
5. qemu_close decrements refcount for fd, and closes fd when refcount is 
zero and libvirt in use flag is off.

More functional details:
-If libvirt crashes it can call "query-fd /dev/fdset/1" to determine 
which fds are open in the set.
-If monitor connection closes, qemu will close fds that have a refcount 
of zero.  Do we also need a qemu in-use flag in case refcount is zero 
and fd is still in use?
-This support requires introduction of qemu_close function that will be 
called everywhere in block layer that close is currently called.

Notes:
-Patch series 1 will include support for all of the above.  This will be 
my initial focus.
-Patch series 2 will include command line support that enables 
association of command line fd with a monitor set.  This will be my 
secondary focus, most likely after patch series 1 is applied.

-- 
Regards,
Corey