
Re: [Linux-cluster] CLVM/GFS2 distributed locking



After pulling the cables between the shared storage and foo01, foo01 gets fenced. Here is some info from foo02 about the shared storage and the dlm debug output (the lock file seems to remain locked):
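For reference, the dlm dump below was read from the debugfs tree; a minimal sequence, assuming debugfs is mounted at /debug and the lockspace is named activemq:

mount -t debugfs none /debug                # only if not already mounted
ls /debug/dlm/                              # one entry per lockspace
cat /debug/dlm/activemq                     # full resource/lock dump
grep -A 7 -i 103a2 /debug/dlm/activemq      # filter to a single resource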

root foo02:-//data/activemq_data#ls -li
total 276
 66467 -rw-r--r-- 1 root root 33030144 Dec 30 16:32 db-1.log
 66468 -rw-r--r-- 1 root root    73728 Dec 30 16:24 db.data
 66470 -rw-r--r-- 1 root root    53344 Dec 30 16:24 db.redo
128014 -rw-r--r-- 1 root root        0 Dec 30 19:49 dummy
 66466 -rw-r--r-- 1 root root        0 Dec 30 16:23 lock
root foo02:-//data/activemq_data#grep -A 7 -i 103a2 /debug/dlm/activemq
Resource ffff81090faf96c0 Name (len=24) "       2           103a2" 
Master Copy
Granted Queue
03d10002 PR Remote:   1 00c80001
00e00001 PR
Conversion Queue
Waiting Queue
--
Resource ffff81090faf97c0 Name (len=24) "       5           103a2" 
Master Copy
Granted Queue
03c30003 PR Remote:   1 039a0001
03550001 PR
Conversion Queue
Waiting Queue


Are there any docs for interpreting this dlm debug output?


Regards,
Stevo.

On Fri, Dec 30, 2011 at 9:23 PM, Digimer <linux alteeve com> wrote:
On 12/30/2011 03:08 PM, Stevo Slavić wrote:
> Hi Digimer and Yvette,
>
> Thanks for the tips! I don't doubt the reliability of the technology; I
> just want to make sure it is configured well.
>
> After fencing a node that held a lock on a file on shared storage, the
> lock remains, and the non-fenced node cannot take over the lock on that
> file. I'm wondering how one can check which process (and from which
> node, if possible) is holding a lock on a file on shared storage.
> dlm should have taken care of releasing the lock once the node got
> fenced, right?
>
> Regards,
> Stevo.

After a successful fence call, DLM will clean up any locks held by the
lost node. That's why it's so critical that the fence action actually
succeeded (i.e.: test, test, test). If a node doesn't actually die in a
fence, but the cluster thinks it did, and the lost node somehow returns,
it will think its locks are still valid and modify shared storage,
leading to near-certain data corruption.

It's all perfectly safe, provided you've tested your fencing properly. :)
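The simplest way to prove that is a manual fence test. A rough sketch, assuming the cman tools and the node names from this thread:

fence_node foo01                            # invoke the configured fence agent;
                                            # watch foo01 actually power off/reboot
grep -A 7 -i 103a2 /debug/dlm/activemq      # after DLM recovery, foo01's locks
                                            # should be gone from the Granted Queue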

Yvette,

 You might be right about 'noatime' implying 'nodiratime'... I add
both out of habit.
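
For what it's worth, a typical GFS2 fstab line with both options spelled
out (the device and mount point here are hypothetical):

/dev/myvg/gfs2lv  /data  gfs2  defaults,noatime,nodiratime  0 0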

--
Digimer
E-Mail:              digimer alteeve com
Freenode handle:     digimer
Papers and Projects: http://alteeve.com
Node Assassin:       http://nodeassassin.org
"omg my singularity battery is dead again.
stupid hawking radiation." - epitron

