[Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL)
David Teigland
teigland at redhat.com
Tue Jan 8 22:56:09 UTC 2008
On Fri, Jan 04, 2008 at 04:18:45PM -0500, Charlie Brady wrote:
> We've reduced the application code to a simple test case. The following
> code, run on each node, soon blocks and doesn't receive signals until
> the peer node is shut down:
>
> ...
> fl.l_whence=SEEK_SET;
> fl.l_start=0;
> fl.l_len=1;
>
> while (1)
> {
>     fl.l_type=F_WRLCK;
>     retval=fcntl(filedes,F_SETLKW,&fl);
>     if (retval==-1)
>     {
>         perror("lock");
>         exit(1);
>     }
>     // attempt to unlock the index file
>     fl.l_type=F_UNLCK;
>     retval=fcntl(filedes,F_SETLKW,&fl);
>     if (retval==-1)
>     {
>         perror("unlock");
>         exit(1);
>     }
> }
Yes, this stresses a problematic design limitation in the RHEL4 dlm: the
dlm master for the lock keeps ping-ponging between nodes and becomes so
unstable that everything comes to a halt. One possible work-around is to
modify the program to hold a lock on filedes at all times, which keeps the
master stable, e.g. hold a zero length lock at some unused offset like
0xFFFFFF.
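
A minimal sketch of that work-around might look like the following. The
helper names hold_anchor_lock() and lock_unlock_once() are hypothetical and
not part of the original test case. It also assumes a shared (read) lock is
acceptable for the placeholder, so that every node can hold it at the same
time; note that in fcntl terms l_len = 0 means "from l_start to the end of
the file", so the placeholder covers everything from 0xFFFFFF upward and the
application must not lock in that region for real work.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Offset taken from the example in the message; any region the
 * application never locks for real work should do (assumption). */
#define ANCHOR_OFFSET 0xFFFFFF

/* Take a placeholder lock meant to be held for the life of the process.
 * Assumption: a read lock is used so all nodes can hold it concurrently.
 * l_len = 0 means "from l_start to end of file" for fcntl locks. */
static int hold_anchor_lock(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_RDLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = ANCHOR_OFFSET;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);   /* 0 on success, -1 on error */
}

/* One iteration of the original test loop: write-lock byte 0, unlock it.
 * The placeholder lock above covers a different range and stays held. */
static int lock_unlock_once(int fd)
{
    struct flock fl = {0};

    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;

    fl.l_type = F_WRLCK;
    if (fcntl(fd, F_SETLKW, &fl) == -1) {
        perror("lock");
        return -1;
    }

    fl.l_type = F_UNLCK;
    if (fcntl(fd, F_SETLKW, &fl) == -1) {
        perror("unlock");
        return -1;
    }
    return 0;
}
```

In the test program this would mean calling hold_anchor_lock(filedes) once
after the file is opened (for reading and writing) and before entering the
while loop, then leaving the loop body as it is. Since fcntl locks belong to
the process, closing any descriptor for the same file drops the placeholder
lock as well, so the file should stay open for as long as the lock is needed.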
Dave