[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [libvirt] [Qemu-devel] [PATCH v3] Add support for fd: protocol



Am 26.07.2011 18:57, schrieb Corey Bryant:
> 
> Kevin, thanks for the input.
> 
> On 07/26/2011 11:18 AM, Kevin Wolf wrote:
>> Am 26.07.2011 14:51, schrieb Corey Bryant:
>>>>  sVirt provides SELinux MAC isolation for Qemu guest processes and their
>>>>  corresponding resources (image files). sVirt provides this support
>>>>  by labeling guests and resources with security labels that are stored
>>>>  in file system extended attributes. Some file systems, such as NFS, do
>>>>  not support the extended attribute security namespace, which is needed
>>>>  for image file isolation when using the sVirt SELinux security driver
>>>>  in libvirt.
>>>>
>>>>  The proposed solution entails a combination of Qemu, libvirt, and
>>>>  SELinux patches that work together to isolate multiple guests' images
>>>>  when they're stored in the same NFS mount. This results in an
>>>>  environment where sVirt isolation and NFS image file isolation can both
>>>>  be provided.
>>>>
>>>>  This patch contains the Qemu code to support this solution. I would
>>>>  like to solicit input from the libvirt community prior to starting
>>>>  the libvirt patch.
>>>>
>>>>  Currently, Qemu opens an image file in addition to performing the
>>>>  necessary read and write operations. The proposed solution will move
>>>>  the open out of Qemu and into libvirt. Once libvirt opens an image
>>>>  file for the guest, it will pass the file descriptor to Qemu via a
>>>>  new fd: protocol.
>>>>
>>>>  If the image file resides in an NFS mount, the following SELinux policy
>>>>  changes will provide image isolation:
>>>>
>>>>     - A new SELinux boolean is created (e.g. virt_read_write_nfs) to
>>>>       allow Qemu (svirt_t) to only have SELinux read and write
>>>>       permissions on nfs_t files
>>>>
>>>>     - Qemu (svirt_t) also gets SELinux use permissions on libvirt
>>>>       (virtd_t) file descriptors
>>>>
>>>>  Following is a sample invocation of Qemu using the fd: protocol on
>>>>  the command line:
>>>>
>>>>       qemu -drive file=fd:4,format=qcow2
>>>>
>>>>  The fd: protocol is also supported by the drive_add monitor command.
>>>>  This requires that the specified file descriptor is passed to the
>>>>  monitor alongside a prior getfd monitor command.
>>>>
>>>>  There are some additional features provided by certain image types
>>>>  where Qemu reopens the image file. All of these scenarios will be
>>>>  unsupported for the fd: protocol, at least for this patch:
>>>>
>>>>     - The -snapshot command line option
>>>>     - The savevm monitor command
>>>>     - The snapshot_blkdev monitor command
>>>>     - Use of copy-on-write image files
>>>>     - The -cdrom command line option
>>>>     - The -drive command line option with media=cdrom
>>>>     - The change monitor command
>>>>
>>>>  The thought is that this support can be added in the future, but is
>>>>  not required for the initial fd: support.
>>>>
>>>>  This patch was tested with the following formats: raw, cow, qcow,
>>>>  qcow2, qed, and vmdk, using the fd: protocol from the command line
>>>>  and the monitor. Tests were also run to verify existing file name
>>>>  support and qemu-img were not regressed. Non-valid file descriptors,
>>>>  fd: without format, snapshot and backing files, and cdrom were also
>>>>  tested.
>>>>
>>>>  v2:
>>>>     - Add drive_add monitor command support
>>>>     - Fence off unsupported features that re-open image file
>>>>
>>>>  v3:
>>>>     - Fence off cdrom and change monitor command support
>>>>
>>>>  Signed-off-by: Corey Bryant<coreyb linux vnet ibm com>
>>>>  ---
>>>>    block.c           |   16 ++++++++++
>>>>    block.h           |    1 +
>>>>    block/cow.c       |    5 +++
>>>>    block/qcow.c      |    5 +++
>>>>    block/qcow2.c     |    5 +++
>>>>    block/qed.c       |    4 ++
>>>>    block/raw-posix.c |   81 +++++++++++++++++++++++++++++++++++++++++++++++------
>>>>    block/vmdk.c      |    5 +++
>>>>    block_int.h       |    1 +
>>>>    blockdev.c        |   19 ++++++++++++
>>>>    monitor.c         |    5 +++
>>>>    monitor.h         |    1 +
>>>>    qemu-options.hx   |    8 +++--
>>>>    qemu-tool.c       |    5 +++
>>>>    14 files changed, 149 insertions(+), 12 deletions(-)
>>>>
>>>>  diff --git a/block.c b/block.c
>>>>  index 24a25d5..500db84 100644
>>>>  --- a/block.c
>>>>  +++ b/block.c
>>>>  @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>>>            char tmp_filename[PATH_MAX];
>>>>            char backing_filename[PATH_MAX];
>>>>
>>>>  +        if (bdrv_is_fd_protocol(bs)) {
>>>>  +            return -ENOTSUP;
>>>>  +        }
>> Hm, shouldn't that just work even with fd?
>>
> 
> My thought was that the proposed SELinux changes would prevent the open 
> of the temporary backing file if the file corresponding to fd resides on 
> NFS.  But perhaps the backing file is not opened on NFS?

Depends on how broken your SELinux restrictions are. ;-)

I would argue that allowing qemu to create temporary files is a
reasonable thing to do. Maybe libvirt comes to a different conclusion,
but that doesn't mean that every other management tool comes to the same.

Also, as Alex already said, you shouldn't think of your use case as the
only valid use case. What a fd driver implementation is about is just
working with an fd for images. If libvirts puts more restrictions on it,
that's fine, but don't force other users to follow your model.

>>>>  +
>>>>            /* if snapshot, we create a temporary backing file and open it
>>>>               instead of opening 'filename' directly */
>>>>
>>>>  @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags,
>>>>
>>>>        /* Find the right image format driver */
>>>>        if (!drv) {
>>>>  +        /* format must be specified for fd: protocol */
>>>>  +        if (bdrv_is_fd_protocol(bs)) {
>>>>  +            return -ENOTSUP;
>>>>  +        }
>> This isn't really required to make fd work.
>>
> 
> If format is not specified, the file needs to be opened and probed to 
> determine the format.  The proposed SELinux changes should prevent the 
> open if it resides on NFS.

SELinux shouldn't see an open() if you bdrv_open() an image with the fd
protocol. I think the only thing to which you need to pay attention is
that you can open it multiple times, i.e. don't close the fd on
bdrv_close (or maybe just dup() it in bdrv_open?)

>>>>            ret = find_image_format(filename,&drv);
>>>>        }
>>>>
>>>>  @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState *bs)
>>>>        return bs->enable_write_cache;
>>>>    }
>>>>
>>>>  +int bdrv_is_fd_protocol(BlockDriverState *bs)
>>>>  +{
>>>>  +    return bs->fd_protocol;
>>>>  +}
>> The generic block layer shouldn't have any special cases based on the
>> format driver. If you need to do something different for fd, think about
>> what property of fd it is that requires the special case (like can't
>> reopen).
>>
> 
> I'm not sure I completely understand what you're saying here.  I 
> understand the desire to not have special cases based on the fd 
> protocol.  I have a number special case checks that prevent re-opens, 
> which presumably would not be allowed by the proposed SELinux changes. 
> Should there be no special cases at all (e.g. should I provide solutions 
> to all the re-opens now) or would you rather special cases be provided 
> at a different layer?

bdrv_is_fd_protocol() names a specific driver instead of the important
property of that driver. Something like bdrv_can_reopen() would be
generic and could be used by other drivers having the same property.

Something like this should be a flag of the BlockDriver and not of the
BlockDriverState.

>>>>  +
>>>>    /* XXX: no longer used */
>>>>    void bdrv_set_change_cb(BlockDriverState *bs,
>>>>                            void (*change_cb)(void *opaque, int reason),
>>>>  @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs,
>>>>        BlockDriver *drv = bs->drv;
>>>>        if (!drv)
>>>>            return -ENOMEDIUM;
>>>>  +    if (bdrv_is_fd_protocol(bs)) {
>>>>  +        return -ENOTSUP;
>>>>  +    }
>> What's the problem with internal snapshots in images accesses over an fd?
>>
> 
> I was thinking the proposed SELinux changes would prevent the open of 
> the snapshot file if the file corresponding to fd resides on NFS.  But 
> perhaps it is not opened on NFS?

This is about internal snapshots, it doesn't open anything.

>>>>        if (drv->bdrv_snapshot_create)
>>>>            return drv->bdrv_snapshot_create(bs, sn_info);
>>>>        if (bs->file)
>>>>  diff --git a/block.h b/block.h
>>>>  index 859d1d9..0417b69 100644
>>>>  --- a/block.h
>>>>  +++ b/block.h
>>>>  @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs);
>>>>    int bdrv_is_read_only(BlockDriverState *bs);
>>>>    int bdrv_is_sg(BlockDriverState *bs);
>>>>    int bdrv_enable_write_cache(BlockDriverState *bs);
>>>>  +int bdrv_is_fd_protocol(BlockDriverState *bs);
>>>>    int bdrv_is_inserted(BlockDriverState *bs);
>>>>    int bdrv_media_changed(BlockDriverState *bs);
>>>>    int bdrv_is_locked(BlockDriverState *bs);
>>>>  diff --git a/block/cow.c b/block/cow.c
>>>>  index 4cf543c..e17f8e7 100644
>>>>  --- a/block/cow.c
>>>>  +++ b/block/cow.c
>>>>  @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int flags)
>>>>        pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>>>>                cow_header.backing_file);
>>>>
>>>>  +    if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +        /* backing file currently not supported by fd: protocol */
>>>>  +        goto fail;
>>>>  +    }
>> I don't think these checks are strictly needed. Without the check you
>> can open the image itself using an fd, and the backing file using good
>> old raw-posix. We shouldn't decide for our users that this isn't useful.
>>
>> In any case, it would have to move into block.c, you can't modify
>> independent drivers for this.
>>
> 
> I understand the point on not modifying independent drivers.
> 
> But if the backing file resides on NFS, wouldn't the the proposed 
> SELinux changes prevent the open?

Probably. But what about cases where the backing file is local? Or even
a non-libvirt, non-SELinux use case?

> Or are you talking about support where libvirt opens the backing file 
> and passes the fd to Qemu?  If so there was some discussion about future 
> support for this here: 
> http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg01496.html

Not really, but implementing this will be a bit easier if you don't
forbid using backing files with fd.


>>>>  +
>>>>        bitmap_size = ((bs->total_sectors + 7)>>  3) + sizeof(cow_header);
>>>>        s->cow_sectors_offset = (bitmap_size + 511)&  ~511;
>>>>        return 0;
>>>>  diff --git a/block/qcow.c b/block/qcow.c
>>>>  index 227b104..964d411 100644
>>>>  --- a/block/qcow.c
>>>>  +++ b/block/qcow.c
>>>>  @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int flags)
>>>>            if (bdrv_pread(bs->file, header.backing_file_offset, bs->backing_file, len) != len)
>>>>                goto fail;
>>>>            bs->backing_file[len] = '\0';
>>>>  +
>>>>  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +            /* backing file currently not supported by fd: protocol */
>>>>  +            goto fail;
>>>>  +        }
>>>>        }
>>>>        return 0;
>>>>
>>>>  diff --git a/block/qcow2.c b/block/qcow2.c
>>>>  index 48e1b95..7f6a4fa 100644
>>>>  --- a/block/qcow2.c
>>>>  +++ b/block/qcow2.c
>>>>  @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>>>>                goto fail;
>>>>            }
>>>>            bs->backing_file[len] = '\0';
>>>>  +
>>>>  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +            ret = -ENOTSUP;
>>>>  +            goto fail;
>>>>  +        }
>>>>        }
>>>>        if (qcow2_read_snapshots(bs)<  0) {
>>>>            ret = -EINVAL;
>>>>  diff --git a/block/qed.c b/block/qed.c
>>>>  index 3970379..5028897 100644
>>>>  --- a/block/qed.c
>>>>  +++ b/block/qed.c
>>>>  @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>>>>                return ret;
>>>>            }
>>>>
>>>>  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +            return -ENOTSUP;
>>>>  +        }
>>>>  +
>>>>            if (s->header.features&  QED_F_BACKING_FORMAT_NO_PROBE) {
>>>>                pstrcpy(bs->backing_format, sizeof(bs->backing_format), "raw");
>>>>            }
>>>>  diff --git a/block/raw-posix.c b/block/raw-posix.c
>>>>  index 34b64aa..cec4d36 100644
>>>>  --- a/block/raw-posix.c
>>>>  +++ b/block/raw-posix.c
>>>>  @@ -28,6 +28,7 @@
>>>>    #include "block_int.h"
>>>>    #include "module.h"
>>>>    #include "block/raw-posix-aio.h"
>>>>  +#include "monitor.h"
>>>>
>>>>    #ifdef CONFIG_COCOA
>>>>    #include<paths.h>
>>>>  @@ -185,7 +186,8 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>>>>                               int bdrv_flags, int open_flags)
>>>>    {
>>>>        BDRVRawState *s = bs->opaque;
>>>>  -    int fd, ret;
>>>>  +    int fd = -1;
>>>>  +    int ret;
>>>>
>>>>        ret = raw_normalize_devicepath(&filename);
>>>>        if (ret != 0) {
>>>>  @@ -207,15 +209,17 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
>>>>        if (!(bdrv_flags&  BDRV_O_CACHE_WB))
>>>>            s->open_flags |= O_DSYNC;
>>>>
>>>>  -    s->fd = -1;
>>>>  -    fd = qemu_open(filename, s->open_flags, 0644);
>>>>  -    if (fd<  0) {
>>>>  -        ret = -errno;
>>>>  -        if (ret == -EROFS)
>>>>  -            ret = -EACCES;
>>>>  -        return ret;
>>>>  +    if (s->fd == -1) {
>>>>  +        fd = qemu_open(filename, s->open_flags, 0644);
>>>>  +        if (fd<  0) {
>>>>  +            ret = -errno;
>>>>  +            if (ret == -EROFS) {
>>>>  +                ret = -EACCES;
>>>>  +            }
>>>>  +            return ret;
>>>>  +        }
>>>>  +        s->fd = fd;
>>>>        }
>>>>  -    s->fd = fd;
>>>>        s->aligned_buf = NULL;
>>>>
>>>>        if ((bdrv_flags&  BDRV_O_NOCACHE)) {
>>>>  @@ -272,6 +276,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
>>>>    {
>>>>        BDRVRawState *s = bs->opaque;
>>>>
>>>>  +    s->fd = -1;
>>>>        s->type = FTYPE_FILE;
>>>>        return raw_open_common(bs, filename, flags, 0);
>>>>    }
>>>>  @@ -892,6 +897,60 @@ static BlockDriver bdrv_file = {
>>>>        .create_options = raw_create_options,
>>>>    };
>>>>
>>>>  +static int raw_open_fd(BlockDriverState *bs, const char *filename, int flags)
>>>>  +{
>>>>  +    BDRVRawState *s = bs->opaque;
>>>>  +    const char *fd_str;
>>>>  +    int fd;
>>>>  +
>>>>  +    /* extract the file descriptor - fail if it's not fd: */
>>>>  +    if (!strstart(filename, "fd:",&fd_str)) {
>>>>  +        return -EINVAL;
>>>>  +    }
>>>>  +
>>>>  +    if (!qemu_isdigit(fd_str[0])) {
>>>>  +        /* get fd from monitor */
>>>>  +        fd = qemu_get_fd(fd_str);
>>>>  +        if (fd == -1) {
>>>>  +            return -EBADF;
>>>>  +        }
>>>>  +    } else {
>>>>  +        char *endptr = NULL;
>>>>  +
>>>>  +        fd = strtol(fd_str,&endptr, 10);
>>>>  +        if (*endptr || (fd == 0&&  fd_str == endptr)) {
>>>>  +            return -EBADF;
>>>>  +        }
>>>>  +    }
>>>>  +
>>>>  +    s->fd = fd;
>>>>  +    s->type = FTYPE_FILE;
>>>>  +
>>>>  +    return raw_open_common(bs, filename, flags, 0);
>>>>  +}
>>>>  +
>>>>  +static BlockDriver bdrv_file_fd = {
>>>>  +    .format_name = "file",
>>>>  +    .protocol_name = "fd",
>>>>  +    .instance_size = sizeof(BDRVRawState),
>>>>  +    .bdrv_probe = NULL, /* no probe for protocols */
>>>>  +    .bdrv_file_open = raw_open_fd,
>>>>  +    .bdrv_read = raw_read,
>>>>  +    .bdrv_write = raw_write,
>>>>  +    .bdrv_close = raw_close,
>>>>  +    .bdrv_flush = raw_flush,
>>>>  +    .bdrv_discard = raw_discard,
>>>>  +
>>>>  +    .bdrv_aio_readv = raw_aio_readv,
>>>>  +    .bdrv_aio_writev = raw_aio_writev,
>>>>  +    .bdrv_aio_flush = raw_aio_flush,
>>>>  +
>>>>  +    .bdrv_truncate = raw_truncate,
>>>>  +    .bdrv_getlength = raw_getlength,
>>>>  +
>>>>  +    .create_options = raw_create_options,
>>>>  +};
>>>>  +
>>>>    /***********************************************/
>>>>    /* host device */
>>>>
>>>>  @@ -1000,6 +1059,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
>>>>        }
>>>>    #endif
>>>>
>>>>  +    s->fd = -1;
>>>>        s->type = FTYPE_FILE;
>>>>    #if defined(__linux__)
>>>>        {
>>>>  @@ -1170,6 +1230,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags)
>>>>        BDRVRawState *s = bs->opaque;
>>>>        int ret;
>>>>
>>>>  +    s->fd = -1;
>>>>        s->type = FTYPE_FD;
>>>>
>>>>        /* open will not fail even if no floppy is inserted, so add O_NONBLOCK */
>>>>  @@ -1288,6 +1349,7 @@ static int cdrom_open(BlockDriverState *bs, const char *filename, int flags)
>>>>    {
>>>>        BDRVRawState *s = bs->opaque;
>>>>
>>>>  +    s->fd = -1;
>>>>        s->type = FTYPE_CD;
>>>>
>>>>        /* open will not fail even if no CD is inserted, so add O_NONBLOCK */
>>>>  @@ -1517,6 +1579,7 @@ static void bdrv_file_init(void)
>>>>         * Register all the drivers.  Note that order is important, the driver
>>>>         * registered last will get probed first.
>>>>         */
>>>>  +    bdrv_register(&bdrv_file_fd);
>>>>        bdrv_register(&bdrv_file);
>>>>        bdrv_register(&bdrv_host_device);
>> The part modifying raw looks good at the first sight.
>>
> 
> Ok. This leads me to think that a block/fd.c file isn't necessary then.

Yes, having it in raw-posix.c is fine.

>>>>    #ifdef __linux__
>>>>  diff --git a/block/vmdk.c b/block/vmdk.c
>>>>  index 922b23d..2ea808e 100644
>>>>  --- a/block/vmdk.c
>>>>  +++ b/block/vmdk.c
>>>>  @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs)
>>>>                return -1;
>>>>
>>>>            pstrcpy(bs->backing_file, end_name - p_name + 1, p_name);
>>>>  +
>>>>  +        if (bs->backing_file[0] != '\0'&&  bdrv_is_fd_protocol(bs)) {
>>>>  +            /* backing file currently not supported by fd: protocol */
>>>>  +            return -1;
>>>>  +        }
>>>>        }
>>>>
>>>>        return 0;
>>>>  diff --git a/block_int.h b/block_int.h
>>>>  index 1e265d2..441049c 100644
>>>>  --- a/block_int.h
>>>>  +++ b/block_int.h
>>>>  @@ -152,6 +152,7 @@ struct BlockDriverState {
>>>>        int encrypted; /* if true, the media is encrypted */
>>>>        int valid_key; /* if true, a valid encryption key has been set */
>>>>        int sg;        /* if true, the device is a /dev/sg* */
>>>>  +    int fd_protocol; /* if true, the fd: protocol was specified */
>> You don't need this any more when you remove bdrv_is_fd_protocol()
>>
>>>>        /* event callback when inserting/removing */
>>>>        void (*change_cb)(void *opaque, int reason);
>>>>        void *change_opaque;
>>>>  diff --git a/blockdev.c b/blockdev.c
>>>>  index c263663..5cb7b56 100644
>>>>  --- a/blockdev.c
>>>>  +++ b/blockdev.c
>>>>  @@ -542,6 +542,14 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>>>>
>>>>        bdrv_flags |= ro ? 0 : BDRV_O_RDWR;
>>>>
>>>>  +    if (strncmp(file, "fd:", 3) == 0) {
>>>>  +        if (media == MEDIA_CDROM) {
>>>>  +            error_report("CD-ROM not supported by fd: protocol");
>>>>  +            goto err;
>>>>  +        }
>>>>  +        dinfo->bdrv->fd_protocol = 1;
>>>>  +    }
>> Why?
>>
> 
> The thought was that CD-ROMs can be re-opened but perhaps that is only 
> applicable to native CD-ROM, which may rule out an iso stored on NFS. 
> Is that what you're thinking?

Actually, using host devices over the fd protocol is an interesting
point. I think in this implementation fd implicitly means file.

I don't think that with CD-ROM passthrough a reopen happens in normal
use cases. It does happen with the 'change' monitor command, but
changing to a different image (maybe another fd) shouldn't be a problem.

Kevin


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]