[libvirt] [PATCH v5 6/6] block: Enable qemu_open/close to work with fd sets
Corey Bryant
coreyb at linux.vnet.ibm.com
Thu Jul 26 03:57:37 UTC 2012
On 07/25/2012 03:43 PM, Eric Blake wrote:
> On 07/23/2012 07:08 AM, Corey Bryant wrote:
>> When qemu_open is passed a filename of the "/dev/fdset/nnn"
>> format (where nnn is the fdset ID), an fd with matching access
>> mode flags will be searched for within the specified monitor
>> fd set. If the fd is found, a dup of the fd will be returned
>> from qemu_open.
>>
>> Each fd set has a reference count. The purpose of the reference
>> count is to determine if an fd set contains file descriptors that
>> have open dup() references that have not yet been closed. It is
>> incremented on qemu_open and decremented on qemu_close. It is
>> not until the refcount is zero that file desriptors in an fd set
>> can be closed. If an fd set has dup() references open, then we
>> must keep the other fds in the fd set open in case a reopen
>> of the file occurs that requires an fd with a different access
>> mode.
>>
>
>> +++ b/monitor.c
>> @@ -2551,6 +2551,91 @@ static void monitor_fdsets_set_in_use(Monitor *mon, bool in_use)
>> }
>> }
>>
>> +void monitor_fdset_increment_refcount(Monitor *mon, int64_t fdset_id)
>> +{
>> + mon_fdset_t *mon_fdset;
>> +
>> + if (!mon) {
>> + return;
>> + }
>
> Am I reading this code right by stating that 'if there is no monitor, we
> don't increment the refcount'? How does a monitor reattach affect
> things? Or am I missing something fundamental about the cases when
> 'mon==NULL' will exist?
>
Yes you're reading this correctly.
I'm pretty sure that mon will only be NULL if QEMU is started without a
monitor.
If QEMU has a monitor, and libvirt disconnects it's connection to the
qemu monitor, then I believe mon will remain non-NULL.
I'll plan on testing this out to verify though. (I'm out most of this
week and will be back full time starting next Tues.)
>> +int monitor_fdset_get_fd(Monitor *mon, int64_t fdset_id, int flags)
>> +{
>> + mon_fdset_t *mon_fdset;
>> + mon_fdset_fd_t *mon_fdset_fd;
>> + int mon_fd_flags;
>> +
>> + if (!mon) {
>> + errno = ENOENT;
>> + return -1;
>> + }
>> +
>> + QLIST_FOREACH(mon_fdset, &mon->fdsets, next) {
>> + if (mon_fdset->id != fdset_id) {
>> + continue;
>> + }
>> + QLIST_FOREACH(mon_fdset_fd, &mon_fdset->fds, next) {
>> + if (mon_fdset_fd->removed) {
>> + continue;
>> + }
>> +
>> + mon_fd_flags = fcntl(mon_fdset_fd->fd, F_GETFL);
>> + if (mon_fd_flags == -1) {
>> + return -1;
>
> This says we fail on the first fcntl() failure, instead of trying other
> fds in the set. Granted, an fcntl() failure is probably the sign of a
> bigger bug (such as closing an fd at the wrong point in time), so I
> guess trying to go on doesn't make much sense once we already know we
> are hosed.
>
I think I'll stick with it the way it is. If fcntl() fails we might
have a tainted fd set so I think we should fail.
>> + }
>> +
>> + switch (flags & O_ACCMODE) {
>> + case O_RDWR:
>> + if ((mon_fd_flags & O_ACCMODE) == O_RDWR) {
>> + return mon_fdset_fd->fd;
>> + }
>> + break;
>> + case O_RDONLY:
>> + if ((mon_fd_flags & O_ACCMODE) == O_RDONLY) {
>> + return mon_fdset_fd->fd;
>> + }
>> + break;
>
> Do we want to allow the case where the caller asked for O_RDONLY, but
> the set only has O_RDWR? After all, the caller is getting a compatible
> subset of what the set offers.
>
I don't see a problem with it.
>> + case O_WRONLY:
>> + if ((mon_fd_flags & O_ACCMODE) == O_WRONLY) {
>> + return mon_fdset_fd->fd;
>> + }
>> + break;
>
> Likewise, should we allow a caller asking for O_WRONLY when the set
> provides only O_RDWR?
>
I don't see a problem with it.
>>
>> +/*
>> + * Dups an fd and sets the flags
>> + */
>> +static int qemu_dup(int fd, int flags)
>> +{
>> + int i;
>> + int ret;
>> + int serrno;
>> + int dup_flags;
>> + int setfl_flags[] = { O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME,
>> + O_NONBLOCK, 0 };
>> +
>> + if (flags & O_CLOEXEC) {
>> + ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
>
> F_DUPFD_CLOEXEC is required by POSIX but not implemented on all modern
> OS yet; you probably need some #ifdef and/or configure guards.
>
Ok
>> + if (ret == -1 && errno == EINVAL) {
>> + ret = dup(fd);
>> + if (ret != -1 && fcntl_setfl(ret, O_CLOEXEC) == -1) {
>
> You _can't_ call F_SETFL with O_CLOEXEC. O_CLOEXEC only causes open()
> to set FD_CLOEXEC; thereafter, including in the case of this dup, what
> you want to do is instead set FD_CLOEXEC via F_SETFD (aka call
> qemu_set_cloexec, not fcntl_setfl).
>
I know, this is a mistake. I'm planning to replace fcntl_setfl() here
with qemu_set_cloexec().
>> + goto fail;
>> + }
>> + }
>> + } else {
>> + ret = dup(fd);
>> + }
>> +
>> + if (ret == -1) {
>> + goto fail;
>> + }
>> +
>> + dup_flags = fcntl(ret, F_GETFL);
>> + if (dup_flags == -1) {
>> + goto fail;
>> + }
>> +
>> + if ((flags & O_SYNC) != (dup_flags & O_SYNC)) {
>> + errno = EINVAL;
>> + goto fail;
>> + }
>> +
>> + /* Set/unset flags that we can with fcntl */
>> + i = 0;
>> + while (setfl_flags[i] != 0) {
>> + if (flags & setfl_flags[i]) {
>> + dup_flags = (dup_flags | setfl_flags[i]);
>> + } else {
>> + dup_flags = (dup_flags & ~setfl_flags[i]);
>> + }
>> + i++;
>> + }
>
> Rather than looping one bit at a time, you should do this as a mask
> operation.
>
I agree.
>> +
>> + if (fcntl(ret, F_SETFL, dup_flags) == -1) {
>> + goto fail;
>> + }
>> +
>> + /* Truncate the file in the cases that open() would truncate it */
>> + if (flags & O_TRUNC ||
>> + ((flags & (O_CREAT | O_EXCL)) == (O_CREAT | O_EXCL))) {
>> + if (ftruncate(ret, 0) == -1) {
>> + goto fail;
>> + }
>> + }
>> +
>> + qemu_set_cloexec(ret);
>
> If we're going to blindly set FD_CLOEXEC at the end of the day, rather
> than try to honor O_CLOEXEC, then why not simplify the beginning of this
> function:
This call to qemu_set_cloexec() was a mistake. I'm planning on removing it.
>
> ret = fcntl(fd, F_DUPFD_CLOEXEC, 0);
> if (ret == -1 && errno == EINVAL) {
> ret = dup(fd);
> if (ret != -1) {
> qemu_set_cloexec(ret);
> }
> }
> if (ret == -1) {
> goto fail;
> }
>
I'll plan on sticking with the existing code in the beginning of this
function with the modifications mentioned above.
--
Regards,
Corey
More information about the libvir-list
mailing list