
Re: [Linux-cluster] GFS2 - F_SETLK fails with "ENOSYS" after umount + mount



Hi,

On Wed, 2013-01-30 at 12:31 +0100, Kristian Grønfeldt Sørensen wrote:
> Hi,
> 
> I'm setting up a two-node cluster sharing a single GFS2 filesystem
> backed by a dual-primary DRBD-device (DRBD on top of LVM, so no CLVM
> involved).
> 
> I am experiencing more or less the same as the OP in this thread:
> http://www.redhat.com/archives/linux-cluster/2010-July/msg00136.html
> 

Well I'm not so sure about that. We never found out what the issue was
in that case, but in your case it seems that you are doing something
which should work. Also, in the msg00136 case it seems that the lock
request didn't work at all, whereas in your case it appears that it does
work until a umount/mount of one node - at least if I've understood it
correctly.

Which kernel and userspace are you using?

It would be a good plan to report this as a bug (or via support if you
are a supported customer and are using RHEL) as it should work
correctly.

Steve.


> I have an activemq-5.6.0 instance on each server that tries to lock a
> file on the GFS2-filesystem (using ).  
> 
> When I start the cluster, everything works as expected. The first
> activemq instance that starts up acquires the lock, the lock is released
> when the activemq exits, and the second instance takes the lock. 
> 
> The problem shows when I unmount and subsequently mount the GFS2
> filesystem  again on one of the nodes, or reboot one of the nodes (after
> having started at least one activemq instance.) 
> Then I start seeing statements like this in the activemq log files:
> 
> Database /srv/activemq/queue#3a#2f#2fstat.#3e/lock is locked... waiting 10 seconds for the database to be unlocked. Reason: java.io.IOException: Function not implemented | org.apache.activemq.store.kahadb.MessageDatabase
> 
> strace -f while that message is logged gives the following:
> 
> [pid  3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0
> [pid  3549] stat("/srv/activemq/queue#3a#2f#2fstat.#3e", {st_mode=S_IFDIR|0755, st_size=3864, ...}) = 0
> [pid  3549] open("/srv/activemq/queue#3a#2f#2fstat.#3e/lock", O_RDWR|O_CREAT, 0666) = 133
> [pid  3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid  3549] fcntl(133, F_GETFD)         = 0
> [pid  3549] fcntl(133, F_SETFD, FD_CLOEXEC) = 0
> [pid  3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid  3549] fstat(133, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
> [pid  3549] fcntl(133, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = -1 ENOSYS (Function not implemented)
> [pid  3549] dup2(138, 133)              = 133
> [pid  3549] close(133)
> 
> As you can see, the "Function not implemented" originates from the
> F_SETLK fcntl that the JVM does. 
> The only way to recover from this state seems to be by unmounting the
> GFS2-filesystem on both nodes, then mounting it again on both
> nodes. 
> 
> I've tried to isolate this by using a simpler testcase than starting two
> activemq instances. I ended up using the java sample from
> http://www.javabeat.net/2007/10/locking-files-using-java/ . 
> 
> I haven't managed to get the system into a state where F_SETLK returns
> "Function not implemented" by only using the above FileLockTest class (I
> need activemq in order to trigger the situation), but when the system is
> in that state, I can run FileLockTest, and it will print out the
> following stack trace:
> 
> Exception in thread "main" java.io.IOException: Function not implemented
>         at sun.nio.ch.FileChannelImpl.lock0(Native Method)
>         at sun.nio.ch.FileChannelImpl.tryLock(FileChannelImpl.java:871)
>         at java.nio.channels.FileChannel.tryLock(FileChannel.java:962)
>         at FileLockTest.main(FileLockTest.java:15)
> 
> 
> If I run this on the other server (where the GFS2 fs was not unmounted
> and mounted again), it works correctly. 
> 
> Any ideas as to what happens, and why?
> 
> BR
> Kristian Sørensen
> 
> -- 
> Linux-cluster mailing list
> Linux-cluster redhat com
> https://www.redhat.com/mailman/listinfo/linux-cluster
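
For anyone wanting to check a mount without starting activemq, here is a
minimal sketch of a lock probe. It is not the FileLockTest class from the
javabeat page; the class name LockProbe and the method probe are made up
for illustration. It asks for an exclusive lock on byte 0 only, which is
the same range shown in the strace above (start=0, len=1), and on Linux
that is expected to reach the kernel as an F_SETLK request. The path is
taken from the command line, so nothing about the actual filesystem
layout is assumed.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of activemq or of the javabeat sample.
public class LockProbe {

    // Try to take an exclusive lock on byte 0 of lockFile and report what happened.
    static String probe(Path lockFile) {
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Non-blocking, exclusive (shared = false), position 0, size 1.
            FileLock lock = ch.tryLock(0L, 1L, false);
            if (lock == null) {
                // Another process holds an overlapping lock.
                return "held-elsewhere";
            }
            lock.release();
            return "acquired";
        } catch (OverlappingFileLockException e) {
            // This JVM already holds an overlapping lock on the file.
            return "held-in-this-jvm";
        } catch (IOException e) {
            // In the failing state described above, the message would be
            // expected to read "Function not implemented".
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Path p = Paths.get(args.length > 0 ? args[0] : "lock");
        System.out.println(p + ": " + probe(p));
    }
}
```

Running it on each node against a file on the GFS2 mount (for example
"java LockProbe /path/on/gfs2/testlock") and comparing the output should
show whether only the remounted node reports the error.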


