[Linux-cluster] gfs1 and 2.6.20

Asbjørn Sannes ace at sannes.org
Thu Feb 22 14:33:22 UTC 2007


Robert Peterson wrote:
> Asbjørn Sannes wrote:
>> Asbjørn Sannes wrote:
>>  
>>> I have been trying to use the STABLE branch of the cluster suite with
>>> a vanilla 2.6.20 kernel, and at first everything seemed to work. My
>>> problem can be reproduced like this:
>>>
>>> Mount a gfs filesystem anywhere, then run sync; the sync will just
>>> hang there.
>>>
>>> If I unmount the filesystem from another terminal, the sync command
>>> finishes.
>>>
>>> Dumping the kernel stack of sync shows that it is in __sync_inodes,
>>> in __down_read; looking at the code, it seems to be waiting on the
>>> s_umount semaphore (in the superblock).
>>>
>>> Just tell me if you need any more information, or if this is not the
>>> correct place for this.
>>>       
>> Here is the trace for sync (while it hangs):
>>
>> sync          D ffffffff8062eb80     0 17843  15013 (NOTLB)
>> ffff810071689e98 0000000000000082 ffff810071689eb8 ffffffff8024d210
>> 0000000071689e18 0000000000000000 0000000100000000 ffff81007b670fe0
>> ffff81007b6711b8 00000000000004c8 ffff810037c84770 0000000000000001
>> Call Trace:
>> [<ffffffff8024d210>] wait_on_page_writeback_range+0xed/0x140
>> [<ffffffff8046046c>] __down_read+0x90/0xaa
>> [<ffffffff802407d6>] down_read+0x16/0x1a
>> [<ffffffff8028df35>] __sync_inodes+0x5f/0xbb
>> [<ffffffff8028dfa7>] sync_inodes+0x16/0x2f
>> [<ffffffff80290293>] do_sync+0x17/0x60
>> [<ffffffff802902ea>] sys_sync+0xe/0x12
>> [<ffffffff802098be>] system_call+0x7e/0x83
>>
>> Greetings,
>> Asbjørn Sannes
>>
> Hi Asbjørn,
>
> I'll look into this as soon as I can find the time...
>
Great! I tried to figure out why the s_umount semaphore was not upped by
comparing with other filesystems, but the functions seem almost
identical... so I cheated and looked at what had changed lately (from
your patch):

diff -w -u -p -p -u -r1.1.2.1.4.1.2.1 diaper.c
--- gfs-kernel/src/gfs/diaper.c	26 Jun 2006 21:53:51 -0000	1.1.2.1.4.1.2.1
+++ gfs-kernel/src/gfs/diaper.c	2 Feb 2007 22:28:41 -0000
@@ -50,7 +50,7 @@ static int diaper_major = 0;
 static LIST_HEAD(diaper_list);
 static spinlock_t diaper_lock;
 static DEFINE_IDR(diaper_idr);
-kmem_cache_t *diaper_slab;
+struct kmem_cache *diaper_slab;
 
 /**
  * diaper_open -
@@ -232,9 +232,9 @@ get_dummy_sb(struct diaper_holder *dh)
 	struct inode *inode;
 	int error;
 
-	mutex_lock(&real->bd_mount_mutex);
+	down(&real->bd_mount_sem);
 	sb = sget(&gfs_fs_type, gfs_test_bdev_super, gfs_set_bdev_super, real);
-	mutex_unlock(&real->bd_mount_mutex);
+	up(&real->bd_mount_sem);
 	if (IS_ERR(sb))
 		return PTR_ERR(sb);
 
@@ -252,7 +252,6 @@ get_dummy_sb(struct diaper_holder *dh)
 	sb->s_op = &gfs_dummy_sops;
 	sb->s_fs_info = dh;
 
-	up_write(&sb->s_umount);
 	module_put(gfs_fs_type.owner);
 
 	dh->dh_dummy_sb = sb;
@@ -263,7 +262,6 @@ get_dummy_sb(struct diaper_holder *dh)
 	iput(inode);
 
  fail:
-	up_write(&sb->s_umount);
 	deactivate_super(sb);
 	return error;
 }
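
To see why those removed up_write calls matter: in 2.6.20, sync(2) ends
up in __sync_inodes(), which walks every superblock and takes s_umount
for reading before syncing it. Roughly like this (simplified from 2.6.20
fs/fs-writeback.c as I read it, so take the details as my understanding,
not gospel):

static void __sync_inodes(int wait)
{
	struct super_block *sb;

	spin_lock(&sb_lock);
restart:
	list_for_each_entry(sb, &super_blocks, s_list) {
		if (sb->s_syncing)
			continue;
		sb->s_syncing = 1;
		sb->s_count++;
		spin_unlock(&sb_lock);
		/* blocks forever if a superblock was left with
		 * s_umount held for writing */
		down_read(&sb->s_umount);
		if (sb->s_root) {
			sync_inodes_sb(sb, wait);
			sync_blockdev(sb->s_bdev);
		}
		up_read(&sb->s_umount);
		spin_lock(&sb_lock);
		if (__put_super_and_need_restart(sb))
			goto restart;
	}
	spin_unlock(&sb_lock);
}

The diaper's dummy superblock sits on that same super_blocks list, so if
its s_umount is never upped after sget(), sync blocks right there. That
matches the __down_read in my trace above, and would also explain why
unmounting (which tears the dummy superblock down) lets sync finish.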



So I undid the removal of those up_write calls (added them back in),
which fixed the hang. I don't know whether that is safe, though; maybe
you could shed some light on why they were removed? (I didn't find any
later change that does the up_write on s_umount instead.)
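
Here is what I think is going on: sget() returns the superblock with
s_umount already held for writing. For a normal mount, the VFS itself
drops it in vfs_kern_mount(); roughly like this (again from my reading
of the 2.6.20 tree, so treat the exact shape as an assumption):

struct vfsmount *
vfs_kern_mount(struct file_system_type *type, int flags,
	       const char *name, void *data)
{
	struct vfsmount *mnt;
	int error;
	...
	error = type->get_sb(type, flags, name, data, mnt);
	...
	mnt->mnt_mountpoint = mnt->mnt_root;
	mnt->mnt_parent = mnt;
	/* the VFS drops s_umount for every normally mounted superblock */
	up_write(&mnt->mnt_sb->s_umount);
	return mnt;
	...
}

But the diaper's dummy superblock is created with a bare sget() and
never goes through vfs_kern_mount(), so nothing but get_dummy_sb()
itself can do that up_write. That would explain why the filesystems I
compared with look almost identical and still don't hang, and why
adding the two calls back in helps.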
> Regards,
>
> Bob Peterson
> Red Hat Cluster Suite
>
> -- 
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
Best regards,
Asbjørn Sannes



