[libvirt] Re: kernel summit topic - 'containers end-game'

Serge E. Hallyn serue at us.ibm.com
Mon Jul 6 14:51:37 UTC 2009


Quoting Daniel Lezcano (dlezcano at fr.ibm.com):
> Serge E. Hallyn wrote:
...
> Checkpoint:
> 	- The initiator of the checkpoint initializes the barrier and sends a
> SIGCKPT signal to all the checkpointable tasks, which jump to the
> handler and block on the barrier.
>
> 	- When all these tasks reach this barrier, the initiator of the
> checkpoint dumps the system wide resources (memory, sysv ipc, struct  
> files, etc ...).
>
> 	- When this is done, the tasks are released and they store their  
> process wide resources (semundo, file descriptor, etc ...) to a  
> current->ckpt_restart buffer and then set the status of the operation  
> and block on the barrier.
>
> 	- The initiator of the checkpoint then collects all this information
> and dumps it.

Do you envision all of the dumping being done in kernel or by userspace?

...

> 	- Finally, the initiator of the checkpoint releases the tasks.
>
>
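To check my reading of the checkpoint sequence quoted above, here is a
rough userspace simulation. It is only a sketch under stated
assumptions, not code from any patchset: threads stand in for tasks, a
reusable threading.Barrier stands in for the kernel barrier, a plain
dict stands in for each task's current->ckpt_restart buffer, and the
saved contents are made up.

```python
import threading


def run_checkpoint(num_tasks):
    """Simulate the quoted checkpoint phases; returns (event log, image)."""
    # One extra party for the initiator. The barrier resets after each
    # phase, so the same object is reused for all four rendezvous points.
    barrier = threading.Barrier(num_tasks + 1)
    lock = threading.Lock()
    log = []
    buffers = {}  # hypothetical stand-in for current->ckpt_restart

    def record(event, tid=None):
        with lock:
            log.append((event, tid))

    def task(tid):
        barrier.wait()  # "SIGCKPT handler": block until every task arrived
        barrier.wait()  # wait for the initiator's system-wide dump
        with lock:
            # Made-up process-wide state (fds, status of the operation).
            buffers[tid] = {"fds": [0, 1, 2], "status": "ok"}
        record("task_saved", tid)
        barrier.wait()  # per-process state stored; block again
        barrier.wait()  # final release by the initiator
        record("task_resumed", tid)

    threads = [threading.Thread(target=task, args=(i,))
               for i in range(num_tasks)]
    for t in threads:
        t.start()

    barrier.wait()                 # all tasks are now stopped
    record("system_dumped")        # memory, sysv ipc, struct files, ...
    barrier.wait()                 # let the tasks store their own state
    barrier.wait()                 # every task has filled its buffer
    with lock:
        image = dict(buffers)      # collect and "dump" per-process state
    record("process_dumped")
    barrier.wait()                 # release the tasks
    for t in threads:
        t.join()
    return log, image
```

Calling run_checkpoint(3) returns a log in which the system-wide dump
always precedes the per-task saves, and the per-process dump always
precedes any task resuming. The order of tasks within a phase is not
fixed.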
> Restart:
> 	- The user executes the statefile, which spawns the process tree, and
> all the processes block on the barrier.
>
> 	- The initiator of the restart restores the system wide resources
> and fills the restarted processes' current->ckpt_restart buffer.

Same question about restore...

> 	- The initiator sends a SIGRESTART to all the tasks and unblocks them
>
> 	- all the tasks restore their process wide resources from the
> current->ckpt_restart buffer.
>
> 	- all the tasks write their status and block on the barrier
>
> 	- the initiator of the restart releases the tasks, which return to
> the execution context they had when they were checkpointed.
>
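The restart side, as I read it, runs the same handshake with the data
flowing the other way. Again this is only a userspace sketch under
assumptions: threads stand in for the spawned tasks, releasing the
barrier stands in for SIGRESTART, and the image layout, the buffers
dict (standing in for current->ckpt_restart), and the "ok" status
value are all hypothetical.

```python
import threading


def run_restart(image):
    """Simulate the quoted restart phases.

    image maps a task id to the per-process state saved at checkpoint.
    Returns (all_ok, restored), where restored maps task id to its fds.
    """
    barrier = threading.Barrier(len(image) + 1)  # tasks plus initiator
    lock = threading.Lock()
    buffers = {}   # hypothetical stand-in for current->ckpt_restart
    restored = {}
    status = {}

    def task(tid):
        barrier.wait()            # spawned task blocks until "SIGRESTART"
        state = buffers[tid]      # restore process-wide resources
        with lock:
            restored[tid] = list(state["fds"])
            status[tid] = "ok"    # write the status of the operation
        barrier.wait()            # block again until the initiator checks
        barrier.wait()            # released: resume the saved context

    threads = [threading.Thread(target=task, args=(tid,)) for tid in image]
    for t in threads:
        t.start()

    # Initiator: system-wide restore would happen here, then fill buffers.
    for tid, state in image.items():
        buffers[tid] = state
    barrier.wait()                # "SIGRESTART": unblock the tasks
    barrier.wait()                # every task has written its status
    with lock:
        all_ok = all(status.get(tid) == "ok" for tid in image)
    barrier.wait()                # release the tasks
    for t in threads:
        t.join()
    return all_ok, restored
```

The buffers are filled before the first barrier is released, so no task
can read its buffer before the initiator has written it.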
> This approach is different from what you are doing, but I am pretty sure
> most of the code is reusable. I see several advantages to this approach:
>
>  - because the process resources are checkpointed / restarted from  
> current, it would be easy to reuse some syscalls code (from the kernel  
> POV) and that would reduce the code duplication and maintenance overhead.
>
>  - the approach is more fine-grained, as we can implement the
> checkpoint / restart piece by piece.
>
>  - as the statefile is in the ELF format, gdb could be used to debug a
> statefile as a core file

Note btw that Dave has found that a checkpoint is faster than a core-dump
at the moment :)  That's not entirely an aside - I need to reread your
email a few times and really process your suggestion, but given that some
users want to dump hundreds of gigabytes of memory, not slowing down the
checkpoint is a big consideration.

>  - as each process checkpoints / restarts itself, most of the
> execution context is stored in the stack, which is CR with the memory, so
> when returning from the signal handler, the process returns to the right
> context. That is less complicated and more generic than externally
> checkpointing the execution context of a frozen task, which could
> potentially be different at restart.
>
>
> I hope, Serge, you can present this approach as an alternative to the
> current patchset __if__ this one is not acceptable.

I'll try to understand it better than I do right now - I don't think
it's for discussing at ksummit, but definitely at a mini-summit if we
have one, or during the next round of discussions (during or immediately
after the ckpt-v17 publish).

thanks,
-serge
