[Linux-cluster] CLVM/GFS2 distributed locking

Tue Jan 3 09:55:28 UTC 2012

Hi,

On Fri, 2011-12-30 at 21:37 +0100, Stevo Slavić wrote:
> When I pull the cables between the shared storage and foo01, foo01 gets
> fenced. Here is some info from foo02 about the shared storage and the
> DLM debug output (the lock file seems to remain locked):
> 
> root@foo02:-//data/activemq_data# ls -li
> total 276
>  66467 -rw-r--r-- 1 root root 33030144 Dec 30 16:32 db-1.log
>  66468 -rw-r--r-- 1 root root    73728 Dec 30 16:24 db.data
>  66470 -rw-r--r-- 1 root root    53344 Dec 30 16:24 db.redo
> 128014 -rw-r--r-- 1 root root        0 Dec 30 19:49 dummy
>  66466 -rw-r--r-- 1 root root        0 Dec 30 16:23 lock
> root@foo02:-//data/activemq_data# grep -A 7 -i 103a2 /debug/dlm/activemq
> Resource ffff81090faf96c0 Name (len=24) "       2           103a2"  
> Master Copy
> Granted Queue
> 03d10002 PR Remote:   1 00c80001
> 00e00001 PR
> Conversion Queue
> Waiting Queue
> --
> Resource ffff81090faf97c0 Name (len=24) "       5           103a2"  
> Master Copy
> Granted Queue
> 03c30003 PR Remote:   1 039a0001
> 03550001 PR
> Conversion Queue
> Waiting Queue
> 
> 
> Are there some docs for interpreting this dlm debug output?
> 
> 
Not as such, I think. It sounds like the issue is related to recovery.
Are there any log messages that indicate what might be going on? Once
the failed node has been fenced, recovery should proceed fairly soon
afterwards.
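The resource names in the dump above can still be decoded by hand. This is a sketch, assuming the usual GFS2 lock-name layout (glock type as an 8-character hex field, then the lock number as a 16-character hex field, hence `len=24`) and the `Remote:`/`Master:` annotations the DLM debugfs output adds to lock entries. The helper names are hypothetical, not part of any cluster tool. Read that way, the `lock` file's inode 66466 is 0x103a2, type 2 is its inode glock and type 5 its iopen glock, and each resource has a PR lock granted to node ID 1 next to a local PR lock. As far as I know these glocks are GFS2's own cache-coherency locks and an application's POSIX lock on the file is tracked separately, so these entries say that node 1 still holds GFS2 locks on the inode, not which process holds the application's lock.

```shell
#!/bin/sh
# Hypothetical helpers for reading /debug/dlm/<lockspace> output of a GFS2
# filesystem. Assumes lock names are "%8x%16x": glock type, then lock number.

# Print an inode number in hex, the form used inside the lock names.
# On a live system: grep -A 7 -i "$(inode_to_lockname_hex 66466)" /debug/dlm/activemq
inode_to_lockname_hex() {
    printf '%x\n' "$1"
}

# Decode a resource name such as "       2           103a2".
decode_glock_name() {
    set -- $1    # unquoted on purpose: word splitting drops the padding
    case "$1" in
        1) glock_kind=nondisk ;;
        2) glock_kind=inode ;;
        3) glock_kind=rgrp ;;
        4) glock_kind=meta ;;
        5) glock_kind=iopen ;;
        6) glock_kind=flock ;;
        7) glock_kind=plock ;;
        8) glock_kind=quota ;;
        9) glock_kind=journal ;;
        *) glock_kind="type-$1" ;;
    esac
    printf '%s 0x%s\n' "$glock_kind" "$2"
}

# Summarise one "Granted Queue" line: lock mode and which node holds it.
# "Remote: N" = lock of node N on this (master) copy;
# "Master: N" = local lock whose master copy lives on node N.
describe_granted_lock() {
    set -- $1
    case "$3" in
        Remote:) printf '%s held by node %s\n' "$2" "$4" ;;
        Master:) printf '%s held locally, mastered on node %s\n' "$2" "$4" ;;
        *)       printf '%s held locally\n' "$2" ;;
    esac
}

inode_to_lockname_hex 66466                               # prints 103a2
decode_glock_name '       2           103a2'              # prints inode 0x103a2
describe_granted_lock '03d10002 PR Remote:   1 00c80001'  # prints PR held by node 1
```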

Steve.

> Regards,
> Stevo.
> 
> On Fri, Dec 30, 2011 at 9:23 PM, Digimer <linux at alteeve.com> wrote:
>         On 12/30/2011 03:08 PM, Stevo Slavić wrote:
>         > Hi Digimer and Yvette,
>         >
>         > Thanks for the tips! I don't doubt the reliability of the
>         > technology; I just want to make sure it is configured well.
>         >
>         > After fencing a node that held a lock on a file on shared
>         > storage, the lock remains, and the non-fenced node cannot take
>         > over the lock on that file. I'm wondering how one can check
>         > which process (and from which node, if possible) is holding a
>         > lock on a file on shared storage. DLM should have taken care of
>         > releasing the lock once the node got fenced, right?
>         >
>         > Regards,
>         > Stevo.
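On the question of which process holds the lock: on each node, /proc/locks lists the locks known to that node's kernel, with the owning PID and the file given as a MAJOR:MINOR:INODE triple, so the inode number from `ls -i` is enough to find them. The sketch below assumes the usual /proc/locks column layout and that a granted POSIX lock on GFS2 is also recorded in the holding node's local lock table. It only sees the node it runs on, so it has to be run on each node in turn. `find_lock_holders` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: list local processes holding (or waiting for) a lock
# on a given inode, by scanning a /proc/locks-style file. Expected layout:
#   1: POSIX  ADVISORY  WRITE 2345 08:02:66466 0 EOF
#   1: -> POSIX  ADVISORY  WRITE 2346 08:02:66466 0 EOF   (blocked waiter)
# Only locks known to the kernel on THIS node are visible.
find_lock_holders() {
    inode=$1
    locks_file=${2:-/proc/locks}
    awk -v ino="$inode" '
    {
        for (i = 1; i <= NF; i++) {
            # The MAJOR:MINOR:INODE field; the PID is the field before it,
            # the mode two before it and the lock class four before it.
            if ($i ~ /^[0-9a-f]+:[0-9a-f]+:[0-9]+$/) {
                split($i, part, ":")
                if (part[3] == ino) {
                    state = ($2 == "->") ? "waiting" : "holding"
                    printf "pid %s %s %s %s lock\n", $(i - 1), state, $(i - 2), $(i - 4)
                }
                break
            }
        }
    }' "$locks_file"
}

# Example, for the lock file from the thread (inode 66466):
#   find_lock_holders 66466
```

A PID it reports can then be looked up with `ps -p <pid> -o pid,comm,args`.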
>         
>         
>         After a successful fence call, DLM will clean up any locks held
>         by the lost node. That's why it's so critical that the fence
>         action succeeds (i.e., test, test, test). If a node doesn't
>         actually die in a fence, but the cluster thinks it did, and
>         somehow the lost node returns, the lost node will think its
>         locks are still valid and modify shared storage, leading to
>         near-certain data corruption.
>         
>         It's all perfectly safe, provided you've tested your fencing
>         properly. :)
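Whether the fence call really succeeded, and whether DLM and GFS2 recovery followed it, is usually visible in the surviving node's syslog. A minimal sketch for pulling out the relevant lines in order, with line numbers: the default path /var/log/messages and the keyword list are assumptions, and message wording differs between cluster versions, so the filter is deliberately loose. `scan_cluster_log` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print syslog lines that mention fencing or DLM/GFS2
# recovery, keeping line numbers so the order of events is visible.
# Assumptions: RHEL-style /var/log/messages by default, and that the
# relevant messages contain one of the keywords below.
scan_cluster_log() {
    log_file=${1:-/var/log/messages}
    # -E extended regex, -i case-insensitive, -n prefix each match with its line number
    grep -Ein 'fence|dlm|gfs2|recover' "$log_file"
}

# Example (on the surviving node, right after the fence):
#   scan_cluster_log /var/log/messages | tail -n 50
```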
>         
>         Yvette,
>         
>         You might be right about 'noatime' implying 'nodiratime'... I
>         add both out of habit.
>         
>         --
>         Digimer
>         E-Mail:              digimer at alteeve.com
>         Freenode handle:     digimer
>         Papers and Projects: http://alteeve.com
>         Node Assassin:       http://nodeassassin.org
>         "omg my singularity battery is dead again.
>         stupid hawking radiation." - epitron
>         
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster