Re: [libvirt] [Qemu-devel] QMP: RFC: I/O error info & query-stop-reason

Hi there,

There are people who want to use QMP for thin provisioning. That is, the VM is
started with a small amount of storage, and when a no-space error is triggered,
more space is allocated and the VM is resumed.

QMP has two limitations that prevent people from doing this today:

1. The BLOCK_IO_ERROR event doesn't contain error information

2. Considering we solve item 1, we still have to provide a way for clients
       to query why a VM stopped. This is needed because clients may miss the
       BLOCK_IO_ERROR event or may connect to the VM while it's already stopped

A proposal to solve both problems follows.

A. BLOCK_IO_ERROR information

We have already discussed this a lot, but didn't reach a consensus. My solution
is quite simple: add a stringified errno name to the BLOCK_IO_ERROR event,
for example (see the "reason" key):

{ "event": "BLOCK_IO_ERROR",
       "data": { "device": "ide0-hd1",
                 "operation": "write",
                 "action": "stop",
                 "reason": "enospc" } }

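As a rough illustration of how a thin-provisioning client might consume such an event, here is a minimal Python sketch. It assumes the "reason" key exactly as proposed in this message (it is not an existing QMP field), and the function name should_grow_storage is made up for the example.

```python
import json


def should_grow_storage(raw_event: str) -> bool:
    """Return True when a QMP message looks like a stop caused by lack of space."""
    msg = json.loads(raw_event)
    if msg.get("event") != "BLOCK_IO_ERROR":
        return False
    data = msg.get("data", {})
    # "reason" is the key proposed in this thread, not an existing QMP field.
    return data.get("action") == "stop" and data.get("reason") == "enospc"


example = '''{ "event": "BLOCK_IO_ERROR",
  "data": { "device": "ide0-hd1", "operation": "write",
            "action": "stop", "reason": "enospc" } }'''
```

A client would allocate more space and resume the VM only when this check returns True. It still cannot recover if the event was missed, which is what item 2 is about.
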
You can call the reason whatever you want, but don't call it a stringified
errno name :-)

In fact, just make reason "no space".

You mean, we should do:

     "reason": "no space"

Or that we should make it a boolean, like:

    "no space": true

Do we need reason in BLOCK_IO_ERROR if query-block returns this information?

True, no.

I'm OK either way. But in case you meant the second one, I guess
we should make "reason" a dictionary so that we can group related
information when we extend the field, for example:

    "reason": { "no space": false, "no permission": true }

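To compare the shapes floated so far from a client's point of view, here is a small Python sketch that accepts either the string form or the dictionary form of "reason". None of these shapes is settled, and the helper name is_no_space is hypothetical.

```python
def is_no_space(reason) -> bool:
    """Return True if a "reason" value, in any shape discussed here, means no space."""
    if isinstance(reason, dict):
        # Dictionary form: {"no space": true, "no permission": false}
        return reason.get("no space") is True
    # String form: either the errno-style name or the plain wording.
    return reason in ("enospc", "no space")
```

With the dictionary form the client tests one boolean per condition, so new keys can be added later without changing how existing ones are read.
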
Why would we ever have "no permission"?

Why did it happen?  It's not clear to me when read/write would return
EPERM.  open() should fail.  In fact, EPERM is not mentioned in man 2 read.

Actually, the error was an EACCES, which might sound more bizarre :)

What happened was that the device file in question had its permissions
changed during VM execution due to a bug somewhere else. I'm not sure whether
the error was returned by read() or write() (Kevin might have more details).

Strange, EACCES should only happen on open(). Is it possible that somehow a reopen was happening?

This is a bit extreme, and I'd agree it's arguable whether or not we should
report EACCES, but I had this in mind and ended up mentioning it...

If we can't explain why an error would occur, we shouldn't make it part of the protocol :-)

Maybe libvirt guys could provide more input wrt the error reason usage.
If we don't have valid use cases for other errors, then I'll agree that
providing only "no space" is enough.

Definitely!  Adding libvirt to the CC to help encourage their input.


Anthony Liguori

