[Linux-cluster] Lock Resources

Ja S jas199931 at yahoo.com
Wed May 7 23:41:36 UTC 2008


--- Ja S <jas199931 at yahoo.com> wrote:

> > > 
> > > A couple of further questions about the master copy of
> > > lock resources.
> > > 
> > > The first one:
> > > =============
> > > 
> > > Again, assume:
> > > 1) Node A is extremely busy and handles all requests
> > > 2) the other nodes are idle and have never handled any
> > > requests
> > > 
> > > According to the documents, Node A will initially hold
> > > all the master copies. What I am not clear about is
> > > whether the lock manager will evenly distribute the
> > > master copies on Node A to the other nodes when it thinks
> > > Node A holds too many of them?
> > 
> > Locks are only remastered when a node leaves the cluster.
> > In that case all of the resources it mastered will be
> > moved to another node. We do not do dynamic remastering -
> > a resource that is mastered on one node will stay mastered
> > on that node, regardless of traffic or load, until all
> > users of the resource have been freed.
> 
> 
> Thank you very much.
> 
> 
> > 
> > > The second one:
> > > ==============
> > > 
> > > Assume the master copy of a lock resource is on Node A,
> > > and Node B holds a local copy of that lock resource. When
> > > the lock queues change on the local copy on Node B, will
> > > the master copy on Node A be updated simultaneously? If
> > > so, when more than one node has a local copy of the same
> > > lock resource, how does the lock manager handle updates
> > > to the master copy? Does it use another locking mechanism
> > > to prevent corruption of the master copy?
> > > 
> > 
> > All locking happens on the master node. The local copy is
> > just that, a copy. It is updated when the master confirms
> > what has happened. The local copy is there mainly for
> > rebuilding the resource table when a master leaves the
> > cluster, and to keep track of the locks that exist on the
> > local node. The local copy is NOT complete: it only
> > contains the local users of a resource.
> > 
> 
> Thanks again for the kind and detailed explanation. 
> 
> 
> I am sorry to bother you again, as I have more questions. I
> analysed /proc/cluster/dlm_dir and dlm_locks and found some
> strange things. Please see below:
> 
> 
> From /proc/cluster/dlm_dir:
> 
> In lock space [ABC]:
> This node (node 2) has 445 lock resources in total, where
> --328   are master lock resources
> --117   are local copies of lock resources mastered on other nodes.
>
> ===============================
> ===============================
> 
> 
> From /proc/cluster/dlm_locks:
> 
> In lock space [ABC]:
> There are 1678 lock resources in use, where
> --1674  lock resources are mastered by this node (node 2)
> --4     lock resources are mastered by other nodes, of which:
> ----1 lock resource mastered on node 1
> ----1 lock resource mastered on node 3
> ----1 lock resource mastered on node 4
> ----1 lock resource mastered on node 5
> 
> A typical master lock resource in
> /proc/cluster/dlm_locks is:
> Resource 000001000de4fd88 (parent 0000000000000000).
> Name (len=24) "       3         5fafc85"
> Master Copy
> LVB: 01 16 19 70 00 00 ff f8 00 00 00 00 00 00 00 00
>      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Granted Queue
> 1ff5036d NL Remote:   4 000603e8
> 80d2013f NL Remote:   5 00040214
> 00240209 NL Remote:   3 0001031d
> 00080095 NL Remote:   1 00040197
> 00010304 NL
> Conversion Queue
> Waiting Queue
> 
> 
> After searching for local copies in /proc/cluster/dlm_locks,
> I got:
> Resource 000001002a273618 (parent 0000000000000000).
> Name (len=16) "withdraw 3......"
> Local Copy, Master is node 3
> Granted Queue
> 0004008d PR Master:     0001008c
> Conversion Queue
> Waiting Queue
> 
> --
> Resource 000001003fe69b68 (parent 0000000000000000).
> Name (len=16) "withdraw 5......"
> Local Copy, Master is node 5
> Granted Queue
> 819402ef PR Master:     00010317
> Conversion Queue
> Waiting Queue
> 
> --
> Resource 000001002a2732e8 (parent 0000000000000000).
> Name (len=16) "withdraw 1......"
> Local Copy, Master is node 1
> Granted Queue
> 000401e9 PR Master:     00010074
> Conversion Queue
> Waiting Queue
> 
> --
> Resource 000001004a32e598 (parent 0000000000000000).
> Name (len=16) "withdraw 4......"
> Local Copy, Master is node 4
> Granted Queue
> 1f5b0317 PR Master:     00010203
> Conversion Queue
> Waiting Queue
> 
> These four local copies of lock resources have been sitting
> in /proc/cluster/dlm_locks for several days.
> 
> Now my questions:
> 1. In my case, for the same lock space, the number of master
> lock resources reported by dlm_dir is much SMALLER than the
> number reported in dlm_locks. My understanding is that the
> number of master lock resources listed in dlm_dir should be
> larger than, or at least equal to, the number reported in
> dlm_locks. The situation I found on this node does not make
> any sense to me. Am I missing anything? Can you help me
> clarify this?

I have found the answer. Yes, I did miss something: I need to
sum, over all cluster members, the lock resources that dlm_dir
reports as mastered by this node. In this case, the total
number of lock resources mastered by the node comes to 1674,
which matches the number reported by dlm_locks. Sorry for
asking the question without thinking it through carefully.
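
For what it is worth, counts like these can be roughly
cross-checked from the shell. This is only a sketch and covers
the dlm_locks side only (the dlm_dir format is not shown
above); it assumes the "Master Copy" / "Local Copy" markers
start a line as in the excerpts, that dlm_locks already shows
the lock space of interest, and that the host names below are
placeholders for the real cluster members:

    # On one node: how many resources this node masters vs.
    # how many are only local copies mastered elsewhere.
    grep -c '^Master Copy' /proc/cluster/dlm_locks
    grep -c '^Local Copy'  /proc/cluster/dlm_locks

    # Cluster-wide: sum the per-node master counts to get the
    # total number of mastered resources in the lock space.
    for n in node1 node2 node3 node4 node5; do
        ssh "$n" "grep -c '^Master Copy' /proc/cluster/dlm_locks"
    done | awk '{ sum += $1 } END { print sum }'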


> 2. What can cause "withdraw ...." to be the lock
> resource name? 

After reading the gfs source code, it seems this is caused by
issuing a command like "gfs_tool withdraw <mountpoint>".
However, I checked the command histories on all nodes in the
cluster and did not find any such command. This question and
the next one remain open. Please help.
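
In the meantime, a rough way to keep an eye on those four
"withdraw" resources is to dump their queue sections from each
node and compare them over time (a sketch only, with the same
assumptions about dlm_locks and the placeholder host names as
above):

    # Print each "withdraw" resource name plus the following
    # lines, which contain the copy type and the Granted /
    # Conversion / Waiting queues.
    for n in node1 node2 node3 node4 node5; do
        echo "== $n =="
        ssh "$n" "grep -A 6 'withdraw' /proc/cluster/dlm_locks"
    done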
 
> 3. As far as I know, these four local copies of lock
> resources have not been released for at least several days.
> How can I find out whether they are stuck in some dead
> situation or are still waiting for the lock manager to
> release them? And how do I change the timeout?
> 

Thank you very much in advance for your further help.

Jas


