[linux-lvm] clvmd leaving kernel dlm uncontrolled lockspace

Andreas Pflug pgadmin at pse-consulting.de
Wed Jun 5 13:23:32 UTC 2013


Hi David,

I'm having quite some trouble with clvmd on corosync 2.3.0/dlm; apparently
a nonfunctional clvmd in the cluster can block all the others (kern.log
reports clvmd stuck for >120s in some dlm call). I tried to clean things up
with kill -9 on clvmd, but the process remains in state D or Z.
Unfortunately, it seems that those zombies still keep some dlm resources
locked. When I restart corosync on a node and then run dlm_controld -D on
it, I see "found uncontrolled lockspace, tell corosync to remove nodeid
from cluster".
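
For anyone trying to reproduce this, here is a minimal sketch of how the
stuck clvmd processes could be spotted by reading /proc/<pid>/stat. It is
an illustration only, not part of clvmd or dlm: the helper names
parse_stat and stuck_processes are made up for this example, and it
assumes a Linux /proc layout.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat record into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so use the first '(' and the last ')' as delimiters.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def stuck_processes(name="clvmd", proc_root="/proc"):
    """Return (pid, state) for processes called `name` in state D or Z.

    Hypothetical helper: D is uninterruptible sleep, Z is zombie.
    """
    found = []
    if not os.path.isdir(proc_root):
        return found  # not a Linux system, nothing to scan
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process exited while we were scanning
        if comm == name and state in ("D", "Z"):
            found.append((pid, state))
    return found


if __name__ == "__main__":
    for pid, state in stuck_processes():
        print(pid, state)
```

A process reported in state D is blocked inside the kernel and does not
act on SIGKILL until the blocking call returns, which would be consistent
with kill -9 having no visible effect here.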

Well, that's fine as a first step, but how about cleaning up the dlm
lockspace? dlm_tool leave <lockspace> hangs as well (sometimes it just
fails with error 49). The comment in dlm_controld/action.c isn't very
satisfactory: it says a reboot is needed, which is no fun if a whole
cluster is affected. I'd really appreciate a way to clean up old
lockspaces manually. I'd presume that an uncontrolled lockspace on an
isolated node should be easy to remove...
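
To see which lockspaces the kernel still holds on a node, a small sketch
along these lines may help. It assumes the kernel dlm exposes one sysfs
directory per lockspace under /sys/kernel/dlm; the function name
list_lockspaces is hypothetical, and the script only lists lockspaces, it
does not remove anything.

```python
import os


def list_lockspaces(sysfs_root="/sys/kernel/dlm"):
    """Return the sorted names of lockspace directories under sysfs_root.

    Assumption: each kernel lockspace appears as a subdirectory here.
    Returns an empty list if the directory is missing (dlm not loaded).
    """
    if not os.path.isdir(sysfs_root):
        return []
    return sorted(
        entry
        for entry in os.listdir(sysfs_root)
        if os.path.isdir(os.path.join(sysfs_root, entry))
    )


if __name__ == "__main__":
    for name in list_lockspaces():
        print(name)
```

A lockspace that still shows up here after clvmd has been killed and
dlm_controld restarted would be a candidate for the "uncontrolled
lockspace" that dlm_controld complains about.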


Regards
Andreas



