
Re: [Linux-cluster] dlm_recvd + bnx2 oops

Kovacs, Corey J. wrote:
> Morning all. We've been experiencing regular cluster crashes on RHEL4u4.
> This system has 5 nodes, plus a few dozen clients mounting shares via NFS.
> Periodically, nodes will panic, get fenced, and all continues on. This system
> does have some of the HP Product Support Pack installed (but not the HP bnx2
> driver). Below is the section from the logs. It is hand typed, but I am fairly
> sure it's accurate.
> The machines are HP DL360 G5s. The NICs are Broadcom NetXtreme II 5708s.
> Anyone else seeing this?
> Corey
> ===========================================================
> Unable to handle kernel NULL pointer dereference at virtual address 000000ac
> printing eip:
> f8f339ae
> *pde = 37038001
> Oops: 0000 [#1]
> Modules linked in: ipt_multiport iptable_nat ip_conntrack ip_tables
> ip_vs_rr ip_vs cpqci(U) ipmi_devintf ipmi_si ipmi_msghandler xp(U)
> mptctl mptbase sg autofs4 i2c_dev i2c_core lock_dlm(U) gfs(U)
> lock_harness(U) dlm(U) cman(U) md5 ipv6 nfsd exportfs lockd nfs_acl
> sunrpc joydev dm_mirror button battery ac ehci_hcd uhci_hcd bnx2 ext3
> jbd dm_mod qla6312(U) qla2400(U) qla2300(U) qla2xxx_conf(U) qla2xxx(U)
> cciss sd_mod scsi_mod
> CPU:    0
> EIP:    0060:[<f8f339ae>]     Tainted: P    VLI
> EFLAGS: 00010202    (2.6.9-42.0.2.ELsmp)
> EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]
> eax: f70620dc   ebx:  00000ad7   ecx:  00000002   edx:  00000037
> esi: 00000a37   edi:  00000000   ebp:  f6a0b200   esp:  c03cefa0
> ds:  007b    es: 007b   ss: 0068
> Process dlm_recvd (pid: 3973, threadinfo=c03ce000 task=f71652f0)
> Stack: f70620dc 00000037 f5c19000 00000000 f6a0b200 f6a0afc0 c03cefd4 f8f3431d
>        00000000 f6a0afc0 c201fd80 15a3182b c0280e24 000493dc 00000001 c0392c18
>        0000000a 00000000 c01269b8 f59d4dc4 00000046 c038b900 f59d4000 c010819f
> Call trace:
>  [<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
>  [<c0280e24>] net_rx_action+0xae/0x160
>  [<c01269b8>] __do_softirq+0x4c/0xb1
>  [<c010819f>] do_softirq+0x4f/0x56

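That looks like a driver crash to me. The faulting instruction is in bnx2_tx_int, reached
from the NIC's poll routine in softirq context (net_rx_action -> bnx2_poll). The "Process
dlm_recvd" line only names the task that happened to be running when the softirq fired,
and that is probably just because dlm_recvd is a busy process doing lots of network I/O.
There's no DLM code in the stack trace at all.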
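For anyone wanting to repeat the check mechanically, here is a minimal Python sketch, assuming the hand-typed trace above is accurate. It pulls the "function+offset/size [module]" entries out of the oops and lists which modules they belong to. Labelling frames without a module suffix as "kernel" is this sketch's own convention, and the regex only covers the format shown in this report, not every oops layout.

```python
import re

# Matches symbolized addresses as printed in this oops, e.g.
#   "bnx2_tx_int+0x48/0x1d1 [bnx2]"   (function+offset/size [module])
#   "net_rx_action+0xae/0x160"        (no module suffix: treated as core kernel)
SYMBOL_RE = re.compile(
    r"(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def symbols(text):
    """Return (function, offset, size, module) tuples found in oops text."""
    return [
        (m["func"], int(m["off"], 16), int(m["size"], 16), m["mod"] or "kernel")
        for m in SYMBOL_RE.finditer(text)
    ]


# Faulting location and call trace, copied from the report above.
OOPS = """
EIP is at bnx2_tx_int+0x48/0x1d1 [bnx2]
Call trace:
 [<f8f3431d>] bnx2_poll+0x4f/0x142 [bnx2]
 [<c0280e24>] net_rx_action+0xae/0x160
 [<c01269b8>] __do_softirq+0x4c/0xb1
 [<c010819f>] do_softirq+0x4f/0x56
"""

frames = symbols(OOPS)
modules = {mod for _, _, _, mod in frames}

for func, off, size, mod in frames:
    print(f"{mod:8} {func}+{off:#x}/{size:#x}")

# Only the bnx2 driver and the core softirq/network-receive path appear;
# none of the cluster modules (dlm, lock_dlm, cman) are on the stack.
print("modules on stack:", sorted(modules))
print("any DLM module:", bool(modules & {"dlm", "lock_dlm", "cman"}))
```

Run against this report, it should list two bnx2 frames followed by three core-kernel frames, and report no DLM module on the stack.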


