[Linux-cluster] Some GDLM questions

Jeff jeff at intersystems.com
Sat Jul 3 14:33:56 UTC 2004


These questions are from reviewing http://people.redhat.com/~teigland/sca.pdf
and the CVS copy of cluster/dlm/doc/libdlm.txt.
------------------------------------------------------------------

If a program requests a lock from within an AST routine, can it wait for
that request to complete without returning from the original AST
routine?  Would it use the poll/select mechanism to do this?
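
For reference, this is the poll-based dispatch loop I have in mind (a
minimal sketch, assuming dlm_get_fd() and dlm_dispatch() behave as
libdlm.txt describes; error handling omitted):

   #include <poll.h>
   #include <libdlm.h>

   /* Main-line AST delivery: wait on the libdlm fd and let
    * dlm_dispatch() run any pending completion/blocking ASTs.
    * The question is whether an AST routine may itself run a loop
    * like this (or block some other way) while waiting for a
    * second lock request to complete. */
   static void ast_loop(void)
   {
       struct pollfd pfd;

       pfd.fd = dlm_get_fd();
       pfd.events = POLLIN;

       for (;;) {
           if (poll(&pfd, 1, -1) < 0)
               break;
           if (pfd.revents & POLLIN)
               dlm_dispatch(pfd.fd);
       }
   }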

What's the best way to implement a blocking lock request in
an application where some requests are synchronous and some
are asynchronous? Call semop() to wait after issuing the lock request
and to signal from the lock completion routine? Is semop() safe to
call from a thread on Linux? Would pthread_cond_wait()/pthread_cond_signal()
be better?
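
As a concrete example of the pthread_cond variant, here is a rough
sketch (the dlm_lock() argument list is taken from libdlm.txt as I read
it and may not match the CVS tree exactly; it also assumes ASTs are
delivered by a separate thread, e.g. via dlm_pthread_init(), or by a
dispatch loop elsewhere in the program):

   #include <stdint.h>
   #include <string.h>
   #include <pthread.h>
   #include <libdlm.h>

   struct sync_req {
       pthread_mutex_t mutex;
       pthread_cond_t  cond;
       int             done;
       struct dlm_lksb lksb;
   };

   /* Completion AST: signal the thread blocked in lock_sync(). */
   static void sync_ast(void *arg)
   {
       struct sync_req *req = arg;

       pthread_mutex_lock(&req->mutex);
       req->done = 1;
       pthread_cond_signal(&req->cond);
       pthread_mutex_unlock(&req->mutex);
   }

   /* Issue an asynchronous dlm_lock() and wait for its completion AST,
    * turning the request into a blocking one for this caller only. */
   static int lock_sync(uint32_t mode, const char *name)
   {
       struct sync_req req;
       int rv;

       memset(&req, 0, sizeof(req));
       pthread_mutex_init(&req.mutex, NULL);
       pthread_cond_init(&req.cond, NULL);

       rv = dlm_lock(mode, &req.lksb, 0, name, strlen(name), 0,
                     sync_ast, &req, NULL, NULL);
       if (rv)
           return rv;

       pthread_mutex_lock(&req.mutex);
       while (!req.done)
           pthread_cond_wait(&req.cond, &req.mutex);
       pthread_mutex_unlock(&req.mutex);

       return req.lksb.sb_status;
   }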

Does conversion deadlock occur only when a conversion is
about to be queued and its granted/requested modes are
incompatible with another lock already on the conversion queue?
(e.g. a PR->EX conversion is queued and another PR->EX
conversion is about to be queued)
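
To make sure I have the scenario right, here is the case I mean,
sketched with the dlm_lock() interface from libdlm.txt (lksb setup and
AST routines omitted; for a conversion I assume the existing lock id in
the lksb identifies the lock rather than the resource name):

   /* Both holders were granted PR on "resource1" earlier.  Each now
    * requests conversion to EX.  Neither EX can be granted while the
    * other still holds PR, so both conversions sit on the conversion
    * queue waiting for each other -- the classic conversion deadlock. */

   /* on node/process A, lksb_a.sb_lkid identifies its PR lock */
   dlm_lock(LKM_EXMODE, &lksb_a, LKF_CONVERT, "resource1", 9, 0,
            ast_a, &lksb_a, NULL, NULL);

   /* on node/process B, lksb_b.sb_lkid identifies its PR lock */
   dlm_lock(LKM_EXMODE, &lksb_b, LKF_CONVERT, "resource1", 9, 0,
            ast_b, &lksb_b, NULL, NULL);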

Other DLMs do not deliver a blocking AST to a lock that is not
on the granted queue. This means that a lock which is queued for
conversion will not get a blocking AST if it interferes with
another lock being added to the conversion queue. Does GDLM behave
this way as well, or are blocking ASTs delivered to all locks
regardless of their state?

GDLM is not listed as a client of FENCE. This seems to imply
that a GDLM application has to interact directly with FENCE to
deal with the unknown-state problem in a 2-node cluster where each
member has 1 vote and expected votes is 1 (section 3.2.6.2, page 28);
otherwise the same lockspace could end up active on multiple
machines in a single cluster. How would an application interact
with FENCE to prevent this, or does it have to be handled by
configuring the cluster to reboot in this case?

libdlm.txt has a vague comment which reads:
   One further point about lockspace operations is that there is no locking
   on the creating/destruction of lockspaces in the library so it is up to
   the application to only call dlm_*_lockspace when it is sure that
   no other locking operations are likely to be happening.
Does this mean no other locking operations by the process that is
creating the lockspace? No other requests to create a lockspace on
that cluster member? Or none anywhere in the cluster as a whole?
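
In other words, would something like the following be sufficient, where
the exclusion is purely process-local? (dlm_create_lockspace() is
assumed from libdlm.txt; its exact prototype may differ in CVS.)

   #include <pthread.h>
   #include <libdlm.h>

   /* Serialize lockspace creation against this process's own locking
    * activity with a process-local mutex.  Whether this is enough, or
    * whether the exclusion must cover the whole node or the whole
    * cluster, is the question above. */
   static pthread_mutex_t ls_mutex = PTHREAD_MUTEX_INITIALIZER;

   static dlm_lshandle_t create_ls(const char *name)
   {
       dlm_lshandle_t ls;

       pthread_mutex_lock(&ls_mutex);
       ls = dlm_create_lockspace(name, 0600);
       pthread_mutex_unlock(&ls_mutex);
       return ls;
   }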


Possible Enhancements:
----------------------
The following two items are areas where GDLM appears to differ from
the DLMs from HP and IBM (e.g. those for VMS, Tru64, and AIX, and
OpenDLM for Linux, which is derived from IBM's DLM for AIX). These
differences aren't incompatible with GFS's requirements and could be
implemented as optional behaviors. I'd be happy to work on patches for
these if they would be welcome.

GDLM is described as granting a new lock request as long as it is
compatible with the currently granted mode, regardless of whether
anything is waiting on the conversion queue. The other DLMs mentioned
above always queue new lock requests if there are any locks on the
conversion queue. Certain mechanisms can't be implemented without this
kind of ordering. Would it be possible to make the alternate behavior a
property of the lockspace, or a property of an individual lock request,
so it can be used where necessary?
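
A concrete case of the ordering difference (resource names and AST
routines here are just illustrative):

   /* node A holds PR on "res" and has queued a PR->EX conversion.
    * node B now issues a brand-new PR request: */
   dlm_lock(LKM_PRMODE, &lksb_b, 0, "res", 3, 0, ast_b, &lksb_b,
            NULL, NULL);

   /* As described in the SCA paper, GDLM grants B's PR immediately
    * because PR is compatible with the currently granted mode, even
    * though an EX conversion is waiting.  The other DLMs queue B's
    * request behind the conversion, so the waiting EX eventually
    * drains the readers instead of being starved by a steady stream
    * of new PR locks. */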

Certain tasks are simplified if the return status of a lock request
indicates whether the lock was granted immediately or ended up on the
waiting queue. Other DLMs which have both synchronous and asynchronous
completion mechanisms implement this via a flag that requests
synchronous completion if the lock is available; otherwise the request
is queued and the asynchronous mechanism is used. This is particularly
useful for deadman locks that control recovery, to distinguish between
the first instance of a service starting and a recovery condition.
There are other (more complex) techniques to implement this, but even
though GDLM is purely asynchronous, the completion status could still
indicate (if requested) whether the lock was granted immediately or not.
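
For illustration only -- the flag name and return convention below are
hypothetical, modeled on the synchronous-completion flags in the DLMs
mentioned above, and do not exist in GDLM today:

   rv = dlm_lock(LKM_EXMODE, &lksb, LKF_SYNCSTS /* hypothetical flag */,
                 "service-deadman", 15, 0, recovery_ast, &lksb,
                 NULL, NULL);
   if (rv == GRANTED_IMMEDIATELY) {   /* hypothetical status value */
           /* EX granted at once: we are the first instance of the
            * service to start, so do normal startup. */
   } else {
           /* The request was queued: another instance holds the lock.
            * When our completion AST eventually fires, the holder has
            * died and we are in a recovery situation. */
   }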




