The concept of saving (i.e. checkpointing / dumping) the state of a process at a certain point in time, so that it can later be used to restore / restart the process in exactly the same state, has existed for many years. One of the most prominent motivations for developing and supporting checkpoint/restore functionality was improved fault tolerance. For example, checkpoint/restore allows processes that were aborted, for one reason or another, to be restored from previously created checkpoints.
Over the years there have been several different implementations of checkpoint/restore for Linux. Existing implementations differ in terms of the “level” (of the operating system) at which they operate; the lowest level approaches implement checkpoint/restore directly in the kernel, while other “higher level” approaches implement it completely in user-space. While it would be difficult to unearth each and every approach / implementation, it is likely fair to assume that the various permutations of checkpoint/restore have covered nearly all possible “levels” (i.e. from “completely in kernel” to “completely in user-space”).
Checkpoint/Restore in Linux
The closer to the kernel checkpoint/restore is implemented, the more transparent it can be. What does “more transparent” mean? It means that it places fewer requirements on the processes being checkpointed. Such requirements can include pre-loading special libraries (which are then used to intercept system calls) or re-compiling an application so that it can be checkpointed. These kinds of prerequisites make a checkpoint/restore implementation more difficult to use, as it must be known before starting a process whether that process will ever be checkpointed (...at some arbitrary point in “the future”).
Full transparency would likely be difficult to implement as it could require massive changes in the Linux kernel and such changes would have to be accepted by the Linux kernel community.
So... after many approaches to implement checkpoint/restore... a new approach started in 2011. This approach was named Checkpoint/Restore in Userspace (CRIU) and, although it is named “in Userspace”, it actually involves both user space and kernel space. CRIU uses existing Linux kernel interfaces as far as possible, only extending existing interfaces / introducing new interfaces as needed. These required changes were (thankfully) all accepted by the Linux kernel community, as most could also be used in other cases where more detailed information about a running process is useful.
CRIU
When CRIU is used to checkpoint a process it uses the existing and extended interfaces to the Linux kernel to collect as much information about the process as possible. Using the PTRACE interface CRIU takes control of the process (CRIU actually always operates on a process tree: one process and all its child processes) and pauses it. In the next step code is injected via PTRACE into the paused process. CRIU calls this code parasite-code. The parasite-code then runs from within the process's address space and can access and dump/save/checkpoint the memory content of the process. Once all memory pages and all additional information have been collected (and possibly written to a directory) the process can either continue running or be aborted. Letting the process continue to run is what to expect in a fault tolerance scenario; to migrate a process to another system, the process would be aborted on the source.
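As a rough illustration of the two outcomes described above, the sketch below prints the `criu dump` command line for each case. By default `criu dump` ends the process tree once the images are written, while the `--leave-running` option lets it continue. The helper name `dump_cmd` and the mode names `snapshot` and `migrate` are made up for this example and are not part of CRIU.

```shell
# Hypothetical helper: print the criu dump command for a given intent.
#   snapshot -> keep the process running after the checkpoint (fault tolerance)
#   migrate  -> let criu end the process once the images are written
dump_cmd() {
    mode=$1
    pid=$2
    dir=$3
    case "$mode" in
        snapshot) echo "criu dump -t $pid -D $dir --leave-running" ;;
        migrate)  echo "criu dump -t $pid -D $dir" ;;
        *)        echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}
```

For example, `dump_cmd snapshot 1234 /tmp/checkpoint` prints the command that would checkpoint PID 1234 and leave it running.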
To restore a process CRIU transforms the CRIU process doing the restore into the process to be restored. This is one of the places where CRIU uses a newly introduced Linux kernel interface. With CRIU a process can only be restored with the same process identifier (PID) it had during checkpoint. To influence which PID the restored process gets, CRIU writes the PID of the process to be restored, minus one, to /proc/sys/kernel/ns_last_pid. CRIU then verifies that the newly created process actually has the desired PID. If not, the restore is aborted. Another restore step concerns file descriptors: they are re-opened with the same identifier as in the original process and then repositioned to the same offset. All dumped memory pages are loaded from the checkpoint directory into the process being restored and mapped to the same location as in the original process.
During the last step to restore the process CRIU jumps into the restored process at the same location it was during checkpoint and from that point on the restored process continues to run (without ever “knowing” that it was migrated or restored).
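The PID selection step of the restore can be imitated from a shell. The following is a minimal sketch, not CRIU's actual code: it writes the wanted PID minus one to /proc/sys/kernel/ns_last_pid, starts a command, and checks which PID it received. The helper names `ns_last_pid_value` and `spawn_with_pid` are hypothetical. The sketch assumes sufficient privileges to write to that file (typically root) and it is racy, since any other process created between the write and the start of the command takes the wanted PID.

```shell
# Value to write to ns_last_pid so that the next new process gets PID $1.
ns_last_pid_value() {
    echo $(( $1 - 1 ))
}

# Hypothetical helper: run a command with a requested PID, or give up.
# Usage: spawn_with_pid <wanted-pid> <command> [args...]
spawn_with_pid() {
    want=$1
    shift
    # Ask the kernel to hand out <want> to the next process it creates.
    ns_last_pid_value "$want" > /proc/sys/kernel/ns_last_pid || return 1
    "$@" &
    got=$!
    # Verify the result, as described above: abort on a mismatch.
    if [ "$got" -ne "$want" ]; then
        kill "$got" 2>/dev/null
        echo "wanted PID $want but got $got" >&2
        return 1
    fi
    echo "$got"
}
```

Something like `spawn_with_pid 4242 sleep 60` would then either print 4242 or fail if that PID could not be obtained.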
CRIU Limitations
One of the most obvious and strictest requirements for using CRIU is that the libraries used by the checkpointed and restored process must be exactly the same on the source and destination system. The libraries are not newly loaded on the destination system during restore. The restored binary expects all library-provided functions to be at the (exact) same memory address as before. If a library function is at a different memory location, the restored process will crash. Although this sounds like a severe restriction... it is not as fatal as it sounds. For example, when using CRIU to migrate a container (i.e. a “fancy process”) the container will often include not only the actual application to be migrated but also the required libraries.
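One way to check this requirement before a migration is to compare checksums of the libraries the process has mapped. The sketch below is only a pre-flight check under stated assumptions: it reads library paths from /proc/<pid>/maps, does not handle paths containing spaces, and relies on `sha256sum` and an `xargs` that supports `-r` being available. The helper names `mapped_libs` and `lib_fingerprint` are hypothetical.

```shell
# Print the unique shared-object paths listed in a /proc/<pid>/maps style file.
mapped_libs() {
    awk '$6 ~ /\.so$/ || $6 ~ /\.so\./ { print $6 }' "$1" | sort -u
}

# Print "checksum  path" for every library mapped by the given PID.
lib_fingerprint() {
    mapped_libs "/proc/$1/maps" | xargs -r sha256sum
}
```

On the source system `lib_fingerprint <pid> > libs.sha256` records the checksums; copying that file to the destination system and running `sha256sum -c libs.sha256` there reports any library that is missing or differs.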
Another limitation of note: CRIU cannot (currently) be used to migrate applications which are directly accessing hardware through ioctl(). If such an application needs to be checkpointed and restarted or migrated, CRIU provides an interface to create plugins which can be used to extract the state of the hardware on the source system and then put the hardware back into the same state during restore.
It is also important to remember that it is not possible to checkpoint processes which are already being ptrace’d (e.g. by gdb or strace).
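Whether a process is currently being traced can be read from the TracerPid field in /proc/<pid>/status, where 0 means no tracer is attached. The following is a minimal sketch of such a check; the helper names `tracer_pid_from` and `not_traced` are made up for illustration.

```shell
# Print the TracerPid value from a /proc/<pid>/status style file
# (0 means no tracer is attached).
tracer_pid_from() {
    awk '$1 == "TracerPid:" { print $2 }' "$1"
}

# Succeed only if no process is currently tracing the given PID.
not_traced() {
    [ "$(tracer_pid_from "/proc/$1/status")" = "0" ]
}
```

A checkpoint script could then guard the dump with something like `not_traced "$pid" || echo "PID $pid is being traced" >&2`.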
Process Migration
Process migration in its simplest form is nothing more than checkpointing a process, transferring the checkpoint image from the source to the destination system, and restoring the process (once it has been transferred). Red Hat Enterprise Linux 7.2 provides CRIU as a Technology Preview. To migrate a process the following steps could be performed:
On the source system -
mkdir /tmp/checkpoint
criu dump -D /tmp/checkpoint -t `pidof <process>`
rsync -a /tmp/checkpoint <destination system>:/tmp/
On the destination system -
criu restore -D /tmp/checkpoint
With these commands a process can be migrated from one system to another. Note that the aforementioned limitations (still) need to be taken into account. Examples of processes which can be successfully migrated with the above listed steps include:
- A webserver streaming a video
- A PostgreSQL database
- A Java application server communicating with the PostgreSQL database
Many other applications have been successfully migrated and an important feature is that any established TCP network connections are (also) migrated with CRIU.
Using a file system which is shared between the source and destination host (NFS for example) removes the necessity to also transfer the required files (the streamed video, the database files, the Java application).
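The migration steps can be combined into one small script. The sketch below is an illustration under a few assumptions, not a finished tool: it expects passwordless ssh and rsync access to the destination host, root privileges on both systems, and it passes criu's `--tcp-established` option on both dump and restore so that established TCP connections are included. The helper names `run` and `migrate` and the `DRY_RUN` variable are hypothetical.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# migrate <pid> <destination host> [checkpoint dir]
# Checkpoint on this host, copy the images, restore on the destination.
migrate() {
    pid=$1
    dest=$2
    dir=${3:-/tmp/checkpoint}
    run mkdir -p "$dir" &&
    run criu dump -t "$pid" -D "$dir" --tcp-established &&
    run rsync -a "$dir" "$dest:$(dirname "$dir")/" &&
    run ssh "$dest" criu restore -D "$dir" --tcp-established
}
```

Setting `DRY_RUN=1` before calling `migrate 1234 host2` only prints the four commands, which is a convenient way to review them before running anything against a real process.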
Container Migration
Migrating a container is not that different from migrating a process from the standpoint of CRIU. It is also important to know that CRIU was developed with the goal of migrating containers. As mentioned earlier, CRIU always operates on a process and all its child processes. In most (if not all) cases a container is also such a process tree. In fact, migrating a container might be even easier (than migrating a “vanilla” process) as interaction between the container’s processes and the “outside world” is limited (e.g. with the help of namespaces).
In addition to transferring the images of all checkpointed processes the container file system also needs to be transferred if the hosts running the containers are not using a shared file system.
Outlook
Using CRIU it is possible to migrate running processes, process trees, and containers. Migration in its simplest form (checkpoint, transfer, restore) can, depending on the size of processes, require a downtime which might be longer than desired. Virtual machine migration has incorporated a number of optimizations to reduce the downtime during migration; these same techniques already exist for process migration based on CRIU. Interested in learning more about CRIU? Check out this white paper / kbase in the Red Hat Customer Portal. Alternatively, do feel free to reach out using the comments section (below).