[libvirt RFCv11 00/33] multifd save restore prototype

Wed Oct 11 14:56:12 UTC 2023

Hi Daniel,

thanks for your answer,

On 10/11/23 16:05, Daniel P. Berrangé wrote:
> On Wed, Oct 11, 2023 at 03:46:59PM +0200, Claudio Fontana wrote:
>> In terms of our use case, we would need to trigger these migrations from virsh save, restore, managedsave / start.
>>
>> 1) Can you confirm this is still a good target?
> 
> IIRC the 'dump' command also has a codepath that can exercise
> the migrate-to-file logic too.
> 
>> It would seem right from my perspective to hook up save/restore first, and then reuse the same mechanism for managedsave / start.
> 
> All of save, restore, managedsave, start, dump end up calling
> into the same internal helper methods. So once you update these
> helpers, you essentially get all the commands converted in one
> go.

ok

> 
>> 2) Do we expect to pass filename or file descriptor from libvirt into QEMU?
>>
>>
>> As is, libvirt today generally passes an already opened file descriptor to QEMU for migrations, roughly:
>>
>> {"execute": "getfd", "arguments": {"fdname":"migrate"}} (passing the already open fd from libvirt f.e. 10)
>> {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"fd:migrate"}}
>>
>> Do we want to change libvirt to migrate to a file: URI? Does this have consequences for "labeling" / security sandboxing?
>>
>> Or would it be better to continue opening the fd in libvirt, writing the libvirt header, and then passing the existing open fd to QEMU, using QMP command "getfd",
>> followed by "migrate"? In this second case we would need to inform QEMU of the offset into the already open fd.
> 
> How about both :-)
> 
> The current migration 'fd' protocol technically can cope with
> any type of FD being passed on. QEMU doesn't try to interpret
> the FD type right to any significant degree.
> 
> The 'file' protocol is explicitly providing a migration transport
> supporting random access I/O to storage. As such we can specify
> the offset too.
> 
> Now the neat trick is that 'file' protocol impl uses
> qio_channel_file and this in turn uses qemu_open,
> which supports FD passing.

Interesting!

> 
> Instead of using 'getfd' though we have to use 'add-fd'.
> 
> Anyway, this lets us do FD passing as normal, while also
> letting us specify the offset.
> 
>  {"execute": "add-fd", "arguments": {"fdset-id":"migrate"}}
>  {"execute": "migrate", "arguments": {"detach":true,"blk":false,"inc":false,"uri":"file:/dev/fdset/migrate,offset=124456"}}
> 
>> Internally, the QEMU multifd code just reads and writes using pread, pwrite, so there is in any case just one fd to worry about,
>> but who should own it, libvirt or QEMU?
> 
> How about both :-)

I need to familiarize myself with this; there are pieces I am missing. Can you correct me here?

OPTION 1)

libvirt opens the file and has the FD, writes the header, marks the offset,
then we dup the FD in libvirt for the benefit of QEMU, optionally set the flags of the dup to "O_DIRECT" (the usual case) depending on --bypass-cache,
pass the duped FD to QEMU,
QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the file:// URI optional offset parameter),
then libvirt closes the duped fd,
libvirt rewrites the header using the original fd (needed to update the metadata),
libvirt closes the original fd

OPTION 2)

libvirt opens the file and has the FD, writes the header, marks the offset,
then we pass the FD to QEMU,
QEMU dups the FD and sets it as "O_DIRECT" depending on a passed parameter,
QEMU does all the pread/pwrite on it with the correct offset (since it knows it from the file:// URI optional offset parameter),
QEMU closes the duped FD,
libvirt rewrites the header using the original fd (needed to update the metadata),
libvirt closes the original fd

I don't remember whether the QEMU changes for the file offsets optimization are already "block friendly", i.e. whether they operate correctly regardless of whether O_DIRECT is set,
but I think so. They were designed with O_DIRECT in mind.
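
To make the "block friendly" point concrete: O_DIRECT I/O generally needs file offsets (and lengths, and buffers) aligned to the block size, so the offset at which the QEMU data starts would have to be rounded up past the libvirt header. A minimal sketch, assuming a 4096-byte block size; `align_up` is a hypothetical helper for illustration, not an existing libvirt or QEMU function:

```python
def align_up(offset, block_size=4096):
    """Round offset up to the next multiple of block_size.

    block_size is an assumption here; the real requirement depends on the
    filesystem and the underlying device. It must be a power of two.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a positive power of two")
    return (offset + block_size - 1) & ~(block_size - 1)


# Offset taken from the example URI quoted above.
header_end = 124456
data_offset = align_up(header_end)
```

Under these assumptions the example offset 124456 is not block aligned and would become 126976; an offset that is already aligned is returned unchanged.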

So I would tend to see OPTION 1) as more attractive, since QEMU does not need to care about another parameter: whatever the user has chosen in terms of bypass cache is handled entirely in libvirt.
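
Roughly, the sequence of OPTION 1) could be sketched as follows. This is only an illustration under stated assumptions: `HEADER_SIZE` and `save_with_duped_fd` are hypothetical names, the write on the duped fd stands in for what QEMU would do, and O_DIRECT is left out so the sketch runs on any filesystem. One thing to keep in mind is that a dup()ed descriptor shares its file status flags with the original, so setting O_DIRECT on the dup with fcntl would also apply to the original fd.

```python
import os

HEADER_SIZE = 4096  # hypothetical fixed header size, also used as the data offset


def save_with_duped_fd(path, header, payload, final_header):
    """Sketch of OPTION 1; the QEMU side is simulated by a plain pwrite."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # libvirt writes a provisional header and marks the offset.
        os.pwrite(fd, header.ljust(HEADER_SIZE, b"\0"), 0)
        offset = HEADER_SIZE

        # libvirt dups the fd for the benefit of QEMU. The dup shares file
        # status flags with fd, so O_DIRECT set on it would affect both.
        qemu_fd = os.dup(fd)
        try:
            # Stand-in for QEMU: positional write at the given offset.
            os.pwrite(qemu_fd, payload, offset)
        finally:
            # libvirt closes the duped fd.
            os.close(qemu_fd)

        # libvirt rewrites the header using the original fd.
        os.pwrite(fd, final_header.ljust(HEADER_SIZE, b"\0"), 0)
    finally:
        # libvirt closes the original fd.
        os.close(fd)
    return offset
```

Since all the data I/O is positional (pread/pwrite), the shared file position between the two descriptors does not matter here.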

Please correct my understanding where needed, thanks!

Claudio

> 
> Libvirt will open the file, in order to write its header.
> Then libvirt passes the open FD to QEMU, specifying the
> offset, and QEMU does its thing with vmstate, etc and
> closes the FD when it's done. libvirt's copy of the FD
> is still open, and libvirt can finalize its header and
> close the FD.
> 
>> 3) How do we deal with O_DIRECT? In the prototype we were setting the O_DIRECT on the fd from libvirt in response to the user request for --bypass-cache,
>> which is needed 99% of the time with large VMs. I think I remember that we plan to write from libvirt normally (without O_DIRECT) and then set the flag later,
>> but should libvirt or QEMU set the O_DIRECT flag? This likely depends on who owns the fd?
> 
> For O_DIRECT, the 'file' protocol should gain a new parameter
> 'bypass_cache: bool'. If this is set to 'true' then QEMU can
> set O_DIRECT on the FD it opens or receives from libvirt.
> 
> Libvirt probably just has to be careful to unset O_DIRECT
> at the end before it finalizes the header.
> 
> With regards,
> Daniel