[Linux-cluster] DLM messages

Bas van der Vlies basv at sara.nl
Mon Mar 27 11:24:05 UTC 2006


Bas van der Vlies wrote:
> David Teigland wrote:
>> On Mon, Mar 27, 2006 at 08:23:40AM +0200, Bas van der Vlies wrote:
>>> GFS: cvs STABLE
>>>
>>> I just noticed that when we get these dlm messages the load rises
>>> to about 60 and stays there. What do these messages mean?
>>>
>>> process_lockqueue_reply id 237030a state 0
>>
>> These are common and haven't been associated with any problems
>> before.  We should probably remove the printk.  It's caused
>> by a grant message arriving before the reply to the lock request.
>>
> Thanks, I also found that answer in the archive.
> 
>>> Mar 26 15:35:25 ifs2 kernel: dlm: lisa_vg3_lv1: cancel reply ret 0
>>> Mar 26 15:35:25 ifs2 kernel: lock_dlm: unlock sb_status 0 2,15a940b8  
>>> flags 0
>>
>> I've never seen these before.  They're related to cancels which
>> GFS only does during recovery.  What applications are using GFS?
>>
> Our setup is a 4-node GFS server that acts as the home NFS server for our
> cluster. This cluster is used by different universities and research
> institutes. All applications use NFS and therefore also GFS.
> 

The load keeps climbing and the node no longer responds to NFS
requests. When we try to fence the node by disabling its heartbeat-network
interface (it was the master), we get all kinds of cman/dlm kernel
crashes when the node wants to rejoin.

Here are some kern.log excerpts:

==== FS4 ====
Mar 27 12:29:26 ifs4 kernel: ------------[ cut here ]------------
Mar 27 12:29:26 ifs4 kernel: kernel BUG at 
/usr/src/gfs/stable_1.0.2/stable/cluster/gfs-kernel/src/dlm/lock.c:357!
Mar 27 12:29:26 ifs4 kernel: invalid opcode: 0000 [#1]
Mar 27 12:29:26 ifs4 kernel: SMP
Mar 27 12:29:26 ifs4 kernel: Modules linked in: lock_dlm dlm cman 
dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage 
piix e1000 gfs lock_harness dm_mod
Mar 27 12:29:26 ifs4 kernel: CPU:    0
Mar 27 12:29:26 ifs4 kernel: EIP:    0060:[<f8aa5586>]    Tainted: GF 
   VLI
Mar 27 12:29:26 ifs4 kernel: EFLAGS: 00010246   (2.6.16-rc5-sara3 #1)
Mar 27 12:29:26 ifs4 kernel: EIP is at do_dlm_unlock+0x91/0xaa [lock_dlm]
Mar 27 12:29:26 ifs4 kernel: eax: 00000004   ebx: f08b35c0   ecx: 
0000b8f7   edx: 00000246
Mar 27 12:29:26 ifs4 kernel: esi: ffffffea   edi: f8c29000   ebp: 
eff65ee8   esp: eff65edc
Mar 27 12:29:26 ifs4 kernel: ds: 007b   es: 007b   ss: 0068
Mar 27 12:29:26 ifs4 kernel: Process gfs_glockd (pid: 6605, 
threadinfo=eff64000 task=eff46030)
Mar 27 12:29:26 ifs4 kernel: Stack: <0>f8aa9d89 f8c29000 f167e3c0 
eff65ef4 f8aa5824 f08b35c0 eff65f08 f899a7bc
Mar 27 12:29:26 ifs4 kernel: f08b35c0 00000003 f167e3e4 eff65f2c 
f8990ca4 f8c29000 f08b35c0 00000003
Mar 27 12:29:26 ifs4 kernel: f89c4ec0 f167e3c0 00000001 f167e3c0 
eff65f40 f8993680 f167e3c0 f167e3c0
Mar 27 12:29:26 ifs4 kernel: Call Trace:
Mar 27 12:29:26 ifs4 kernel: [<c0103599>] show_stack_log_lvl+0xad/0xb5
Mar 27 12:29:26 ifs4 kernel: [<c01036db>] show_registers+0x10d/0x176
Mar 27 12:29:26 ifs4 kernel: [<c01038ad>] die+0xf2/0x16d
Mar 27 12:29:26 ifs4 kernel: [<c0103996>] do_trap+0x6e/0x8a
Mar 27 12:29:26 ifs4 kernel: [<c0103bed>] do_invalid_op+0x90/0x97
Mar 27 12:29:26 ifs4 kernel: [<c010322f>] error_code+0x4f/0x54
Mar 27 12:29:26 ifs4 kernel: [<f8aa5824>] lm_dlm_unlock+0x1d/0x24 [lock_dlm]
Mar 27 12:29:26 ifs4 kernel: [<f899a7bc>] gfs_lm_unlock+0x2c/0x46 [gfs]
Mar 27 12:29:26 ifs4 kernel: [<f8990ca4>] gfs_glock_drop_th+0xf0/0x12d [gfs]
Mar 27 12:29:26 ifs4 kernel: [<f8993680>] inode_go_drop_th+0x13/0x18 [gfs]
Mar 27 12:29:26 ifs4 kernel: [<f89901f9>] rq_demote+0x79/0x95 [gfs]
Mar 27 12:29:26 ifs4 kernel: [<f89902b4>] run_queue+0x56/0xbb [gfs]
Mar 27 12:29:26 ifs4 kernel: [<f89903d6>] unlock_on_glock+0x1f/0x29 [gfs]
Mar 27 12:29:26 ifs4 kernel: [<f899232a>] gfs_reclaim_glock+0xbf/0x138 [gfs]
Mar 27 12:29:26 ifs4 kernel: [<f8986682>] gfs_glockd+0x3b/0xe3 [gfs]
Mar 27 12:29:26 ifs4 kernel: [<c0100ed9>] kernel_thread_helper+0x5/0xb
Mar 27 12:29:26 ifs4 kernel: Code: 73 34 ff 73 2c ff 73 08 ff 73 04 ff 
73 0c 56 8b 03 ff 70 18 68 a0 a6 aa f8 e8 80 19 67 c7 83 c4 34 68 89 9d 
aa f8 e8 73 19 67 c7 <0f> 0b 65 01 c0 a4 aa f8 68 a0 a5 aa f8 e8 27 12 
67 c7 8d 65 f8
Mar 27 12:29:36 ifs4 kernel: ------------[ cut here ]------------
Mar 27 12:29:36 ifs4 kernel: kernel BUG at 
/usr/src/gfs/stable_1.0.2/stable/cluster/gfs-kernel/src/dlm/lock.c:357!
Mar 27 12:29:36 ifs4 kernel: invalid opcode: 0000 [#2]
Mar 27 12:29:36 ifs4 kernel: SMP
Mar 27 12:29:36 ifs4 kernel: Modules linked in: lock_dlm dlm cman 
dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage 
piix e1000 gfs lock_harness dm_mod
Mar 27 12:29:36 ifs4 kernel: CPU:    1
Mar 27 12:29:36 ifs4 kernel: EIP:    0060:[<f8aa5586>]    Tainted: GF 
   VLI
Mar 27 12:29:36 ifs4 kernel: EFLAGS: 00010206   (2.6.16-rc5-sara3 #1)
Mar 27 12:29:36 ifs4 kernel: EIP is at do_dlm_unlock+0x91/0xaa [lock_dlm]
Mar 27 12:29:36 ifs4 kernel: eax: 00000004   ebx: e948f6c0   ecx: 
c034ab40   edx: 00000206
Mar 27 12:29:36 ifs4 kernel: esi: ffffffea   edi: f8b57000   ebp: 
f4fafee8   esp: f4fafedc
Mar 27 12:29:36 ifs4 kernel: ds: 007b   es: 007b   ss: 0068
Mar 27 12:29:36 ifs4 kernel: Process gfs_glockd (pid: 6472, 
threadinfo=f4fae000 task=f5d44550)
Mar 27 12:29:36 ifs4 kernel: Stack: <0>f8aa9d89 f8b57000 f33799a8 
f4fafef4 f8aa5824 e948f6c0 f4faff08 f899a7bc
Mar 27 12:29:36 ifs4 kernel: e948f6c0 00000003 f33799cc f4faff2c 
f8990ca4 f8b57000 e948f6c0 00000003
Mar 27 12:29:36 ifs4 kernel: f89c4ec0 f33799a8 00000001 f33799a8 
f4faff40 f8993680 f33799a8 f33799a8
Mar 27 12:29:36 ifs4 kernel: Call Trace:
Mar 27 12:29:36 ifs4 kernel: [<c0103599>] show_stack_log_lvl+0xad/0xb5
Mar 27 12:29:36 ifs4 kernel: [<c01036db>] show_registers+0x10d/0x176
Mar 27 12:29:36 ifs4 kernel: [<c01038ad>] die+0xf2/0x16d
Mar 27 12:29:36 ifs4 kernel: [<c0103996>] do_trap+0x6e/0x8a
Mar 27 12:29:36 ifs4 kernel: [<c0103bed>] do_invalid_op+0x90/0x97
Mar 27 12:29:36 ifs4 kernel: [<c010322f>] error_code+0x4f/0x54
Mar 27 12:29:36 ifs4 kernel: [<f8aa5824>] lm_dlm_unlock+0x1d/0x24 [lock_dlm]
Mar 27 12:29:36 ifs4 kernel: [<f899a7bc>] gfs_lm_unlock+0x2c/0x46 [gfs]
Mar 27 12:29:36 ifs4 kernel: [<f8990ca4>] gfs_glock_drop_th+0xf0/0x12d [gfs]
Mar 27 12:29:36 ifs4 kernel: [<f8993680>] inode_go_drop_th+0x13/0x18 [gfs]
Mar 27 12:29:36 ifs4 kernel: [<f89901f9>] rq_demote+0x79/0x95 [gfs]
Mar 27 12:29:36 ifs4 kernel: [<f89902b4>] run_queue+0x56/0xbb [gfs]
Mar 27 12:29:36 ifs4 kernel: [<f89903d6>] unlock_on_glock+0x1f/0x29 [gfs]
Mar 27 12:29:36 ifs4 kernel: [<f899232a>] gfs_reclaim_glock+0xbf/0x138 [gfs]
Mar 27 12:29:36 ifs4 kernel: [<f8986682>] gfs_glockd+0x3b/0xe3 [gfs]
Mar 27 12:29:36 ifs4 kernel: [<c0100ed9>] kernel_thread_helper+0x5/0xb
Mar 27 12:29:36 ifs4 kernel: Code: 73 34 ff 73 2c ff 73 08 ff 73 04 ff 
73 0c 56 8b 03 ff 70 18 68 a0 a6 aa f8 e8 80 19 67 c7 83 c4 34 68 89 9d 
aa f8 e8 73 19 67 c7 <0f> 0b 65 01 c0 a4 aa f8 68 a0 a5 aa f8 e8 27 12 
67 c7 8d 65 f8
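
A side note on the register dumps above: in both FS4 traces esi holds
ffffffea. Read as a signed 32-bit value that is -22, which is -EINVAL on
Linux. That this register carries the return value of the failed dlm
unlock call is an assumption; it has not been checked against lock.c:357.
A minimal Python sketch of the conversion (the helper names as_signed32
and errno_name are made up for this illustration):

```python
import errno


def as_signed32(hex_value):
    """Interpret a 32-bit register value (hex string) as a signed integer."""
    value = int(hex_value, 16)
    # Values with the top bit set are negative in two's complement.
    return value - (1 << 32) if value & 0x80000000 else value


def errno_name(hex_value):
    """Return the errno name if the value looks like a negated errno, else None."""
    signed = as_signed32(hex_value)
    if signed >= 0:
        return None
    return errno.errorcode.get(-signed)


if __name__ == "__main__":
    # esi from the FS4 oops, eax from the same dump for comparison
    for reg in ("ffffffea", "00000004"):
        print(reg, as_signed32(reg), errno_name(reg))
```

The same ffffffea value shows up in ebx in the FS3 trace further down.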


==== FS2 ====
Mar 27 12:28:25 ifs2 kernel: ------------[ cut here ]------------
Mar 27 12:28:25 ifs2 kernel: kernel BUG at 
/usr/src/gfs/stable_1.0.2/stable/cluster/cman-kernel/src/membership.c:3151!
Mar 27 12:28:25 ifs2 kernel: invalid opcode: 0000 [#1]
Mar 27 12:28:25 ifs2 kernel: SMP
Mar 27 12:28:25 ifs2 kernel: Modules linked in: lock_dlm dlm cman 
dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage 
piix e1000 gfs lock_harness dm_mod
Mar 27 12:28:25 ifs2 kernel: CPU:    0
Mar 27 12:28:25 ifs2 kernel: EIP:    0060:[<f8ac7825>]    Tainted: GF 
   VLI
Mar 27 12:28:25 ifs2 kernel: EFLAGS: 00010246   (2.6.16-rc5-sara3 #1)
Mar 27 12:28:25 ifs2 kernel: EIP is at elect_master+0x34/0x41 [cman]
Mar 27 12:28:25 ifs2 kernel: eax: f8a3e000   ebx: 00000080   ecx: 
00000080   edx: 00000000
Mar 27 12:28:25 ifs2 kernel: esi: f8adb584   edi: f677bfcc   ebp: 
f677bf7c   esp: f677bf78
Mar 27 12:28:25 ifs2 kernel: ds: 007b   es: 007b   ss: 0068
Mar 27 12:28:25 ifs2 kernel: Process cman_memb (pid: 5892, 
threadinfo=f677a000 task=f6984030)
Mar 27 12:28:25 ifs2 kernel: Stack: <0>f63374a0 f677bf90 f8ac5430 
f677bf98 00000000 f63374a0 f677bfa4 f8ac3acc
Mar 27 12:28:25 ifs2 kernel: f68dba40 00000000 f6984030 f677bfe4 
f8ac3cb3 0000001f 00000000 c0102612
Mar 27 12:28:25 ifs2 kernel: 00000000 f6984030 c01124e1 00100100 
00200200 00000000 00000000 00000000
Mar 27 12:28:25 ifs2 kernel: Call Trace:
Mar 27 12:28:25 ifs2 kernel: [<c0103599>] show_stack_log_lvl+0xad/0xb5
Mar 27 12:28:25 ifs2 kernel: [<c01036db>] show_registers+0x10d/0x176
Mar 27 12:28:25 ifs2 kernel: [<c01038ad>] die+0xf2/0x16d
Mar 27 12:28:25 ifs2 kernel: [<c0103996>] do_trap+0x6e/0x8a
Mar 27 12:28:25 ifs2 kernel: [<c0103bed>] do_invalid_op+0x90/0x97
Mar 27 12:28:25 ifs2 kernel: [<c010322f>] error_code+0x4f/0x54
Mar 27 12:28:25 ifs2 kernel: [<f8ac5430>] a_node_just_died+0x118/0x178 
[cman]
Mar 27 12:28:25 ifs2 kernel: [<f8ac3acc>] process_dead_nodes+0x4e/0x7a 
[cman]
Mar 27 12:28:25 ifs2 kernel: [<f8ac3cb3>] membership_kthread+0x1bb/0x38d 
[cman]
Mar 27 12:28:25 ifs2 kernel: [<c0100ed9>] kernel_thread_helper+0x5/0xb
Mar 27 12:28:25 ifs2 kernel: Code: 8b 1d 44 c3 ad f8 39 d9 7d 21 a1 48 
c3 ad f8 8b 14 88 85 d2 74 10 83 7a 1c 02 75 0a 8b 45 08 89 10 8b 42 14 
eb 0f 41 39 d9 7c df <0f> 0b 4f 0c 60 f2 ac f8 31 c0 5b 5d c3 55 89 e5 
ff 35 48 c3 ad

==== FS3 ====
Mar 27 12:22:30 ifs3 kernel: kernel BUG at 
/usr/src/gfs/stable_1.0.2/stable/cluster/gfs-kernel/src/dlm/lock.c:428!
Mar 27 12:22:30 ifs3 kernel: invalid opcode: 0000 [#1]
Mar 27 12:22:30 ifs3 kernel: SMP
Mar 27 12:22:30 ifs3 kernel: Modules linked in: lock_dlm dlm cman 
dm_round_robin dm_multipath sg ide_floppy ide_cd cdrom qla2xxx siimage 
piix e1000 gfs lock_harness dm_mod
Mar 27 12:22:30 ifs3 kernel: CPU:    0
Mar 27 12:22:30 ifs3 kernel: EIP:    0060:[<f8aa5714>]    Tainted: GF 
   VLI
Mar 27 12:22:30 ifs3 kernel: EFLAGS: 00010246   (2.6.16-rc5-sara3 #1)
Mar 27 12:22:30 ifs3 kernel: EIP is at do_dlm_lock+0x138/0x152 [lock_dlm]
Mar 27 12:22:30 ifs3 kernel: eax: 00000004   ebx: ffffffea   ecx: 
0000b152   edx: 00000246
Mar 27 12:22:30 ifs3 kernel: esi: eb12b4c0   edi: f5f8f180   ebp: 
f62edb00   esp: f62edacc
Mar 27 12:22:30 ifs3 kernel: ds: 007b   es: 007b   ss: 0068
Mar 27 12:22:30 ifs3 kernel: Process nfsd (pid: 6278, 
threadinfo=f62ec000 task=f62c1030)
Mar 27 12:22:30 ifs3 kernel: Stack: <0>f8aa9d89 00000001 20202020 
32202020 20202020 20202020 32363820 33646565
Mar 27 12:22:30 ifs3 kernel: f62e0018 70baf030 eb12b4c0 00000008 
f8b57000 f62edb34 f8aa57b9 eb12b4c0
Mar 27 12:22:30 ifs3 kernel: 00000000 eb12b4c0 00000008 ffffffff 
00000003 00000003 eb12b4c0 00000000
Mar 27 12:22:30 ifs3 kernel: Call Trace:
Mar 27 12:22:30 ifs3 kernel: [<c0103599>] show_stack_log_lvl+0xad/0xb5
Mar 27 12:22:30 ifs3 kernel: [<c01036db>] show_registers+0x10d/0x176
Mar 27 12:22:30 ifs3 kernel: [<c01038ad>] die+0xf2/0x16d
Mar 27 12:22:30 ifs3 kernel: [<c0103996>] do_trap+0x6e/0x8a
Mar 27 12:22:30 ifs3 kernel: [<c0103bed>] do_invalid_op+0x90/0x97
Mar 27 12:22:30 ifs3 kernel: [<c010322f>] error_code+0x4f/0x54
Mar 27 12:22:30 ifs3 kernel: [<f8aa57b9>] lm_dlm_lock+0x4f/0x5b [lock_dlm]
Mar 27 12:22:30 ifs3 kernel: [<f899a776>] gfs_lm_lock+0x32/0x4c [gfs]
Mar 27 12:22:30 ifs3 kernel: [<f89909f8>] gfs_glock_xmote_th+0x125/0x161 
[gfs]
Mar 27 12:22:30 ifs3 kernel: [<f8993625>] inode_go_xmote_th+0x20/0x25 [gfs]
Mar 27 12:22:30 ifs3 kernel: [<f89900fd>] rq_promote+0xb3/0x136 [gfs]
Mar 27 12:22:30 ifs3 kernel: [<f89902e3>] run_queue+0x85/0xbb [gfs]
Mar 27 12:22:30 ifs3 kernel: [<f8991281>] gfs_glock_nq+0xce/0x119 [gfs]
Mar 27 12:22:30 ifs3 kernel: [<f89917f0>] gfs_glock_nq_init+0x1d/0x36 [gfs]
Mar 27 12:22:30 ifs3 kernel: [<f8991858>] gfs_glock_nq_num+0x37/0x7f [gfs]
Mar 27 12:22:30 ifs3 kernel: [<f89a17d1>] gfs_get_dentry+0xa9/0x29c [gfs]
Mar 27 12:22:30 ifs3 kernel: [<c01915bc>] find_exported_dentry+0x2f/0x4e1
Mar 27 12:22:30 ifs3 kernel: [<f89a1461>] gfs_decode_fh+0xc1/0xc9 [gfs]
Mar 27 12:22:30 ifs3 kernel: [<c0193c37>] fh_verify+0x35f/0x4db
Mar 27 12:22:30 ifs3 kernel: [<c0194d5c>] nfsd_access+0x27/0xe5
Mar 27 12:22:30 ifs3 kernel: [<c019ab4f>] nfsd3_proc_access+0x95/0xa2
Mar 27 12:22:30 ifs3 kernel: [<c019229f>] nfsd_dispatch+0xbe/0x17f
Mar 27 12:22:30 ifs3 kernel: [<c02e0a52>] svc_process+0x381/0x5c7
Mar 27 12:22:30 ifs3 kernel: [<c019208c>] nfsd+0x18d/0x2e2
Mar 27 12:22:30 ifs3 kernel: [<c0100ed9>] kernel_thread_helper+0x5/0xb
Mar 27 12:22:30 ifs3 kernel: Code: 26 50 0f bf 46 24 50 53 ff 76 08 ff 
76 04 ff 76 0c ff 77 18 68 e0 a6 aa f8 e8 f2 17 67 c7 83 c4 38 68 89 9d 
aa f8 e8 e5 17 67 c7 <0f> 0b ac 01 c0 a4 aa f8 68 a0 a5 aa f8 e8 99 10 
67 c7 8d 65 f4
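
To get a quick overview of which node hit which BUG, the excerpts above
can be reduced to (host, file:line) pairs. The following is only a
sketch: it assumes syslog-style lines as pasted here, and it also accepts
the source path on the following line because the mail client wrapped it.
The names bug_sites and SAMPLE are made up for this illustration.

```python
import re
from collections import Counter

# Host is the token right before "kernel:"; the path may be wrapped
# onto the next line, in which case the second group is empty.
BUG_RE = re.compile(r"(\S+) kernel: kernel BUG at\s*(\S*)")


def bug_sites(lines):
    """Count 'kernel BUG at <file>:<line>!' occurrences per (host, site)."""
    lines = [line.rstrip("\n") for line in lines]
    counts = Counter()
    for i, line in enumerate(lines):
        match = BUG_RE.search(line)
        if not match:
            continue
        host, site = match.group(1), match.group(2)
        if not site and i + 1 < len(lines):
            # Path was wrapped onto the following line.
            site = lines[i + 1].strip()
        counts[(host, site.rstrip("!"))] += 1
    return counts


# Lines copied from the excerpts in this mail.
SAMPLE = [
    "Mar 27 12:29:26 ifs4 kernel: kernel BUG at ",
    "/usr/src/gfs/stable_1.0.2/stable/cluster/gfs-kernel/src/dlm/lock.c:357!",
    "Mar 27 12:29:36 ifs4 kernel: kernel BUG at ",
    "/usr/src/gfs/stable_1.0.2/stable/cluster/gfs-kernel/src/dlm/lock.c:357!",
    "Mar 27 12:28:25 ifs2 kernel: kernel BUG at ",
    "/usr/src/gfs/stable_1.0.2/stable/cluster/cman-kernel/src/membership.c:3151!",
]

if __name__ == "__main__":
    for (host, site), n in sorted(bug_sites(SAMPLE).items()):
        print(host, n, site)
```

On a real, unwrapped kern.log the path sits on the same line as
"kernel BUG at", which the regular expression handles as well.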


-- 
********************************************************************
*                                                                  *
*  Bas van der Vlies                     e-mail: basv at sara.nl      *
*  SARA - Academic Computing Services    phone:  +31 20 592 8012   *
*  Kruislaan 415                         fax:    +31 20 6683167    *
*  1098 SJ Amsterdam                                               *
*                                                                  *
********************************************************************



