[Linux-cluster] fcntl locking lockup (dlm 1.07, GFS 6.1.5, kernel 2.6.9-67.EL)

David Teigland teigland at redhat.com
Tue Jan 8 22:56:09 UTC 2008


On Fri, Jan 04, 2008 at 04:18:45PM -0500, Charlie Brady wrote:
> We've reduced the application code to a simple test case. The following 
> code run on each node will soon block, and doesn't receive signals until 
> the peer node is shut down:
> 
> ...
>     fl.l_whence=SEEK_SET;
>     fl.l_start=0;
>     fl.l_len=1;
> 
>     while (1)
>     {
>       fl.l_type=F_WRLCK;
>       retval=fcntl(filedes,F_SETLKW,&fl);
>       if (retval==-1)
>       {
>         perror("lock");
>         exit(1);
>       }
>       // attempt to unlock the index file
>       fl.l_type=F_UNLCK;
>       retval=fcntl(filedes,F_SETLKW,&fl);
>       if (retval==-1)
>       {
>         perror("unlock");
>         exit(1);
>       }
>     }

Yes, this stresses a problematic design limitation in the RHEL4 dlm where
the dlm master node is ping-ponging all over the place and becomes so
unstable that everything comes to a halt.  One possible work-around is to
modify the program to hold a lock on filedes to keep the master stable,
e.g.  hold a zero length lock at some unused offset like 0xFFFFFF.
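
As a rough sketch of that work-around (one reading of the suggestion, not
tested on RHEL4 GFS), the program could take one extra lock right after
opening filedes and never release it. The helper name
hold_placeholder_lock and the PLACEHOLDER_OFFSET macro are made up for
illustration. Using a shared (F_RDLCK) lock is also an assumption, chosen so
that every node can hold the placeholder at once without blocking its peers;
it requires the descriptor to be open for reading.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical offset for the placeholder lock: any byte range that no
 * node ever locks for real work should do. */
#define PLACEHOLDER_OFFSET 0xFFFFFF

/* Take a shared lock at the placeholder offset. The caller is expected to
 * leave it in place until the process exits or closes the file.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
static int hold_placeholder_lock(int fd)
{
    struct flock fl = {0};

    fl.l_type = F_RDLCK;             /* shared, so peers can hold it too */
    fl.l_whence = SEEK_SET;
    fl.l_start = PLACEHOLDER_OFFSET;
    fl.l_len = 0;                    /* for fcntl, 0 means "to end of file" */

    /* F_SETLK rather than F_SETLKW: fail fast instead of blocking. */
    return fcntl(fd, F_SETLK, &fl);
}
```

In the test program this would be called once on filedes before entering
the while (1) loop. The range starts well past byte 0, so it does not
overlap the one-byte lock that the loop takes and releases.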

Dave



