[Linux-cluster] Occasional kernel panics

Farid Bavandpouri fbavandpouri at amcc.com
Tue Oct 25 16:41:27 UTC 2005


Please unsubscribe mmontaseri at amcc.com.

He no longer works at AMCC.

-----Original Message-----
From: linux-cluster-bounces at redhat.com
[mailto:linux-cluster-bounces at redhat.com] On Behalf Of Ethan Sommer
Sent: Monday, October 24, 2005 5:48 PM
To: linux-cluster at redhat.com
Subject: [Linux-cluster] Occasional kernel panics

Every few days or so our cluster machines seem to have kernel panics
complaining about GFS locking (although it's pretty irregular; we went
for a few weeks without an outage).

When we were serving afp off the cluster, we noticed that this happened
a LOT, and it was reproducible when certain users accessed files. We
have changed things since then so that afp is run on a separate server
which nfs mounts the cluster.

We are running FC4 with the gfs modules from yum.


Here are our most recent kernel panics, followed by one from when we had
afp running on the cluster. (There may be relevant info above the
cut-here line, in case it is helpful.)



Oct 19 14:44:41 meow kernel: ------------[ cut here ]------------
Oct 19 14:44:41 meow kernel: kernel BUG at /usr/src/build/607755-i686/BUILD/smp/src/lockqueue.c:1144!
Oct 19 14:44:41 meow kernel: invalid operand: 0000 [#1]
Oct 19 14:44:41 meow kernel: SMP
Oct 19 14:44:41 meow kernel: Modules linked in: nfsd exportfs lockd autofs4 lock_dlm(U) gfs(U) lock_harness(U) rfcomm l2cap bluetooth dlm(U) cman(U) md5 ipv6 sunrpc ipt_LOG ipt_limit ipt_state ip_conntrack iptable_filter ip_tables video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core shpchp e1000 floppy ext3 jbd raid1 dm_mod qla2200 qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod
Oct 19 14:44:41 meow kernel: CPU:    1
Oct 19 14:44:41 meow kernel: EIP:    0060:[<f8af9dcf>]    Not tainted VLI
Oct 19 14:44:41 meow kernel: EFLAGS: 00010292   (2.6.12-1.1447_FC4smp)
Oct 19 14:44:41 meow kernel: EIP is at process_cluster_request+0xddb/0xdef [dlm]
Oct 19 14:44:41 meow kernel: eax: 00000004   ebx: 00000000   ecx: c035fa4c   edx: 00000286
Oct 19 14:44:41 meow kernel: esi: f7fb8400   edi: 00000000   ebp: d2988000   esp: f7eefe24
Oct 19 14:44:41 meow kernel: ds: 007b   es: 007b   ss: 0068
Oct 19 14:44:41 meow kernel: Process dlm_recvd (pid: 2402, threadinfo=f7eef000 task=f7851020)
Oct 19 14:44:41 meow kernel: Stack: f8b0621b 00000001 f8b071e0 f8b06217 2583f987 00000001 00000040 00004000
Oct 19 14:44:41 meow kernel:        f7eefe48 00000000 c038e1a0 00000a58 f0167b00 c02a26c1 00000a58 00004040
Oct 19 14:44:41 meow kernel:        00000072 f7eefed4 00000000 00000001 00000246 00000000 edd6eeb8 00000000
Oct 19 14:44:41 meow kernel: Call Trace:
Oct 19 14:44:41 meow kernel:  [<c02a26c1>] sock_recvmsg+0x103/0x11e
Oct 19 14:44:41 meow kernel:  [<f8afd46b>] midcomms_process_incoming_buffer+0x13b/0x25f [dlm]
Oct 19 14:44:41 meow kernel:  [<c011ce54>] load_balance_newidle+0x23/0x82
Oct 19 14:44:41 meow kernel:  [<f8afb3d3>] receive_from_sock+0x196/0x2c9 [dlm]
Oct 19 14:44:41 meow kernel:  [<c0307705>] schedule+0x405/0xc5e
Oct 19 14:44:41 meow kernel:  [<c0307731>] schedule+0x431/0xc5e
Oct 19 14:44:41 meow kernel:  [<f8afc457>] dlm_recvd+0x0/0x9c [dlm]
Oct 19 14:44:41 meow kernel:  [<f8afc2d3>] process_sockets+0x75/0xb7 [dlm]
Oct 19 14:44:41 meow kernel:  [<f8afc4c7>] dlm_recvd+0x70/0x9c [dlm]
Oct 19 14:44:41 meow kernel:  [<c0134c09>] kthread+0x93/0x97
Oct 19 14:44:41 meow kernel:  [<c0134b76>] kthread+0x0/0x97
Oct 19 14:44:41 meow kernel:  [<c01023d1>] kernel_thread_helper+0x5/0xb
Oct 19 14:44:41 meow kernel: Code: 4f 82 62 c7 89 e8 e8 b1 b4 00 00 8b 4c 24 14 89 4c 24 04 c7 04 24 6d 63 b0 f8 e8 34 82 62 c7 c7 04 24 1b 62 b0 f8 e8 28 82 62 c7 <0f> 0b 78 04 e0 71 b0 f8 c7 04 24 70 72 b0 f8 e8 40 78 62 c7 57
Oct 19 14:44:41 meow kernel:  <0>Fatal exception: panic in 5 seconds


Panic 2:

Oct 10 09:58:39 woof kernel: ------------[ cut here ]------------
Oct 10 09:58:39 woof kernel: kernel BUG at /usr/src/build/607778-i686/BUILD/smp/src/dlm/lock.c:411!
Oct 10 09:58:39 woof kernel: invalid operand: 0000 [#1]
Oct 10 09:58:39 woof kernel: SMP
Oct 10 09:58:39 woof kernel: Modules linked in: nfsd exportfs lockd autofs4 lock_dlm(U) gfs(U) lock_harness(U) rfcomm l2cap bluetooth dlm(U) cman(U) md5 ipv6 sunrpc ipt_LOG ipt_limit ipt_state ip_conntrack iptable_filter ip_tables video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core shpchp e1000 dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod qla2200 qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod
Oct 10 09:58:39 woof kernel: CPU:    1
Oct 10 09:58:39 woof kernel: EIP:    0060:[<f8b98bf5>]    Not tainted VLI
Oct 10 09:58:39 woof kernel: EFLAGS: 00010292   (2.6.12-1.1447_FC4smp)
Oct 10 09:58:39 woof kernel: EIP is at do_dlm_lock+0x1b7/0x21d [lock_dlm]
Oct 10 09:58:39 woof kernel: eax: 00000004   ebx: 00000000   ecx: c035fa4c   edx: 00000292
Oct 10 09:58:39 woof kernel: esi: f7848140   edi: ffffffea   ebp: 00000003   esp: c74b3cfc
Oct 10 09:58:39 woof kernel: ds: 007b   es: 007b   ss: 0068
Oct 10 09:58:39 woof kernel: Process imapd (pid: 24278, threadinfo=c74b3000 task=f4721a80)
Oct 10 09:58:39 woof kernel: Stack: f8b9de75 f7848140 00000003 1bbe0000 00000000 ffffffea 00000003 00000005
Oct 10 09:58:39 woof kernel:        0000000d 00000005 00000000 f58c0a00 00000001 0000000d 20200000 20202020
Oct 10 09:58:39 woof kernel:        20203320 20202020 62312020 30306562 00183030 c8fb2f00 00000001 00000001
Oct 10 09:58:39 woof kernel: Call Trace:
Oct 10 09:58:39 woof kernel:  [<f8b98cff>] lm_dlm_lock+0x52/0x5e [lock_dlm]
Oct 10 09:58:39 woof kernel:  [<f8b98cad>] lm_dlm_lock+0x0/0x5e [lock_dlm]
Oct 10 09:58:39 woof kernel:  [<f8bd000c>] gfs_lm_lock+0x3d/0x5c [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc5039>] gfs_glock_xmote_th+0xae/0x1d3 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc463c>] rq_promote+0x126/0x150 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc4840>] run_queue+0xee/0x113 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc5af1>] gfs_glock_nq+0x93/0x144 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc619d>] gfs_glock_nq_init+0x18/0x2d [gfs]
Oct 10 09:58:39 woof kernel:  [<f8be3926>] get_local_rgrp+0xca/0x1b0 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8be3a9c>] gfs_inplace_reserve_i+0x90/0xd0 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8be046b>] gfs_quota_lock_m+0xbf/0x117 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8a2b>] do_do_write_buf+0x3a1/0x485 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bc56a1>] glock_wait_internal+0x16b/0x26a [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8c91>] do_write_buf+0x182/0x1b6 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd7be5>] walk_vm+0xb3/0x111 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8d65>] gfs_write+0xa0/0xc2 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8b0f>] do_write_buf+0x0/0x1b6 [gfs]
Oct 10 09:58:39 woof kernel:  [<f8bd8cc5>] gfs_write+0x0/0xc2 [gfs]
Oct 10 09:58:39 woof kernel:  [<c0162987>] vfs_write+0x9e/0x110
Oct 10 09:58:39 woof kernel:  [<c0162aa4>] sys_write+0x41/0x6a
Oct 10 09:58:39 woof kernel:  [<c0104035>] syscall_call+0x7/0xb
Oct 10 09:58:39 woof kernel: Code: 7c 24 14 89 4c 24 0c 89 5c 24 10 89 6c 24 08 89 74 24 04 c7 04 24 28 e6 b9 f8 e8 0e 94 58 c7 c7 04 24 75 de b9 f8 e8 02 94 58 c7 <0f> 0b 9b 01 a0 e4 b9 f8 c7 04 24 3c e5 b9 f8 e8 1a 8a 58 c7 66
Oct 10 09:58:39 woof kernel:  <0>Fatal exception: panic in 5 seconds




Sep  7 15:37:44 meow kernel: ------------[ cut here ]------------
Sep  7 15:37:44 meow kernel: kernel BUG at /usr/src/build/588748-i686/BUILD/smp/src/dlm/plock.c:500!
Sep  7 15:37:44 meow kernel: invalid operand: 0000 [#1]
Sep  7 15:37:44 meow kernel: SMP
Sep  7 15:37:44 meow kernel: Modules linked in: appletalk nfsd exportfs lockd autofs4 lock_dlm(U) gfs(U) lock_harness(U) rfcomm l2cap bluetooth dlm(U) cman(U) sunrpc md5 ipv6 ipt_LOG ipt_limit ipt_state ip_conntrack iptable_filter ip_tables video button battery ac uhci_hcd ehci_hcd hw_random i2c_i801 i2c_core shpchp e1000 floppy ext3 jbd raid1 dm_mod qla2200 qla2xxx scsi_transport_fc ata_piix libata sd_mod scsi_mod
Sep  7 15:37:44 meow kernel: CPU:    3
Sep  7 15:37:44 meow kernel: EIP:    0060:[<f8b9a3f7>]    Tainted: GF     VLI
Sep  7 15:37:44 meow kernel: EFLAGS: 00010292   (2.6.12-1.1398_FC4smp)
Sep  7 15:37:44 meow kernel: EIP is at update_lock+0x87/0x9b [lock_dlm]
Sep  7 15:37:44 meow kernel: eax: 00000004   ebx: fffffff5   ecx: c035ca4c   edx: 00000282
Sep  7 15:37:44 meow kernel: esi: 00000000   edi: e99c2c00   ebp: 00000000   esp: d05dedb4
Sep  7 15:37:44 meow kernel: ds: 007b   es: 007b   ss: 0068
Sep  7 15:37:44 meow kernel: Process afpd (pid: 3872, threadinfo=d05de000 task=d6447550)
Sep  7 15:37:44 meow kernel: Stack: badc0ded f8b9d0d6 fffffff5 f8b9da70 f8b9d101 06609291 f7943000 00000000
Sep  7 15:37:44 meow kernel:        f8b9a499 7ffffff8 00000000 7ffffff8 00000000 d05dede8 d7636700 7ffffff8
Sep  7 15:37:44 meow kernel:        00000000 d05deea8 d05dee28 f8b9a987 00000001 7ffffff8 00000000 7ffffff8
Sep  7 15:37:44 meow kernel: Call Trace:
Sep  7 15:37:44 meow kernel:  [<f8b9a499>] add_lock+0x8e/0xed [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9a987>] fill_gaps+0x87/0x10e [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9aa51>] lock_case3+0x43/0xac [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9aeac>] plock_internal+0x1aa/0x370 [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9b614>] lm_dlm_plock+0x25b/0x2dc [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8b9b3b9>] lm_dlm_plock+0x0/0x2dc [lock_dlm]
Sep  7 15:37:44 meow kernel:  [<f8bdc1c3>] gfs_lm_plock+0x45/0x57 [gfs]
Sep  7 15:37:44 meow kernel:  [<f8be5731>] gfs_lock+0xcd/0x11c [gfs]
Sep  7 15:37:44 meow kernel:  [<f8be5664>] gfs_lock+0x0/0x11c [gfs]
Sep  7 15:37:44 meow kernel:  [<c0176c4f>] fcntl_setlk64+0x16c/0x26a
Sep  7 15:37:44 meow kernel:  [<c0162e93>] fget+0x3b/0x42
Sep  7 15:37:44 meow kernel:  [<c0172bfd>] sys_fcntl64+0x55/0x97
Sep  7 15:37:44 meow kernel:  [<c0104025>] syscall_call+0x7/0xb
Sep  7 15:37:44 meow kernel: Code: 01 00 00 c7 04 24 a8 da b9 f8 e8 7c 77 58 c7 89 5c 24 04 c7 04 24 08 d1 b9 f8 e8 6c 77 58 c7 c7 04 24 d6 d0 b9 f8 e8 60 77 58 c7 <0f> 0b f4 01 70 da b9 f8 c7 04 24 10 db b9 f8 e8 78 6d 58 c7 55
Sep  7 15:37:44 meow kernel:  <0>Fatal exception: panic in 5 seconds


Thanks for any help,
  Ethan





--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster