[Linux-cluster] KVM Live migration when node's FS is read-only

Digimer lists at alteeve.ca
Tue Apr 15 21:03:55 UTC 2014


Hi all,

   So I hit a weird issue last week... (EL6 + cman + rgmanager + drbd)

   For reasons unknown, a client thought they could start yanking and 
replacing hard drives on a running node. Obviously, that did not end 
well. The VMs that had been running on the node continued to operate 
fine; they simply started using the peer's storage.

   The problem came when I tried to live-migrate the VMs over to the 
still-good node. The old host could no longer write to its logs, so the 
live migration failed, and once it failed, rgmanager stopped working as 
well. In the end, I had to fence the node manually (corosync never 
failed, so the node was not fenced automatically).

   This obviously caused the VMs running on the node to reboot, causing 
a ~40 second outage. It strikes me that the system *should* have been 
able to migrate, had it not tried to write to the logs.
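   One mitigation I've been wondering about (an assumption on my part, 
not something I've tested in this failure mode): pointing libvirtd's own 
logging at a tmpfs-backed path, so a dead local disk can't block it. In 
/etc/libvirt/libvirtd.conf, something like:

```
# /etc/libvirt/libvirtd.conf -- sketch, untested on a failed node
# Log warnings and errors only, to a tmpfs-backed file that survives
# a read-only root filesystem.
log_level = 3
log_outputs = "3:file:/dev/shm/libvirtd.log"
```

Of course that only covers libvirtd itself, not rgmanager's writes.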

   Is there a way, or can there be made a way, to migrate VMs off of a 
node whose underlying FS is read-only/corrupt/destroyed, so long as the 
programs in memory are still working?
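   For what it's worth, detecting the read-only state early might let a 
resource agent skip local writes before attempting the migration. A 
minimal sketch (the `is_ro` helper name is mine, not part of rgmanager; 
it just parses a mounts table, defaulting to /proc/mounts):

```shell
#!/bin/sh
# Sketch: return 0 if the given mount point is mounted read-only.
# Reads a table in /proc/mounts format (device mountpoint fstype
# options dump pass); the table path can be overridden for testing.
is_ro() {
    mnt=$1
    table=${2:-/proc/mounts}
    awk -v m="$mnt" '
        $2 == m {
            n = split($4, o, ",")
            for (i = 1; i <= n; i++) if (o[i] == "ro") found = 1
        }
        END { exit !found }
    ' "$table"
}
```

If `is_ro /` says the root FS has gone read-only, a recovery path could 
fall back to driving the migration directly, e.g. `virsh migrate --live 
<vm> qemu+ssh://<peer>/system`, rather than letting rgmanager trip over 
its own logging (again, untested in this exact failure mode).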

   I am sure this is part rgmanager question, part KVM/qemu question.

Thanks for any feedback!

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
