[rhelv6-list] RHEL6 kernel 2.6.32-71.14.1.el6.x86_64 panic...

Masopust, Christian christian.masopust at siemens.com
Tue Feb 15 08:51:29 UTC 2011


Short update on my kernel panic:

It turns out that this panic is caused by a bug in the RHEL6 kernel (all 2.6.32-71.*). It happens
if somebody telnets to nlockmgr and simply presses <return> several times.
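
To see which TCP port nlockmgr is registered on for a given host (and therefore which port such a telnet session would reach), the output of `rpcinfo -p` can be parsed. The following is only a minimal sketch: the function name and the sample output are hypothetical, and real port numbers differ per host.

```python
def nlockmgr_tcp_ports(rpcinfo_output):
    """Return the sorted TCP ports registered for nlockmgr in `rpcinfo -p` output."""
    ports = set()
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows have five columns: program, vers, proto, port, service.
        # The header row is skipped because its third column is "proto".
        if len(fields) == 5 and fields[2] == "tcp" and fields[4] == "nlockmgr":
            ports.add(int(fields[3]))
    return sorted(ports)


# Hypothetical sample output, not taken from the affected machines.
SAMPLE = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100021    1   udp  45678  nlockmgr
    100021    1   tcp  34567  nlockmgr
    100021    3   tcp  34567  nlockmgr
    100021    4   tcp  34567  nlockmgr
"""

if __name__ == "__main__":
    print(nlockmgr_tcp_ports(SAMPLE))
```

On a live system the text could instead come from running `rpcinfo -p` (for example via `subprocess.run`), and the resulting port is a candidate for firewalling until a fixed kernel is available.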

Newer kernels (in Fedora 14) no longer have this bug. Does anybody happen to know a
"bug number", or when this bug was fixed in the kernel?

Thanks a lot,
Christian

P.S.: I'm already in contact with Red Hat support, but don't have a fix so far.
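
For checking a fleet of machines against the kernel range named above (2.6.32-71.*), a small helper along these lines might be useful. It is a sketch that only encodes the range stated in this post; the function name is made up, and the Fedora release string in the demo is a hypothetical example.

```python
import platform
import re

# Release strings the post names as affected: 2.6.32-71 and 2.6.32-71.*
# The (\.|$) guard keeps e.g. 2.6.32-710 from matching.
AFFECTED = re.compile(r"^2\.6\.32-71(\.|$)")


def is_affected(release):
    """Return True if a kernel release string falls in the 2.6.32-71.* range."""
    return AFFECTED.match(release) is not None


if __name__ == "__main__":
    # Release from the panic report, then a hypothetical Fedora 14 release.
    for release in ("2.6.32-71.14.1.el6.x86_64", "2.6.35.6-45.fc14.x86_64"):
        print(release, is_affected(release))
    # The release of the machine this script runs on.
    print(platform.release(), is_affected(platform.release()))
```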


________________________________
From: rhelv6-list-bounces at redhat.com [mailto:rhelv6-list-bounces at redhat.com] On Behalf Of Masopust, Christian
Sent: Wednesday, 09 February 2011 12:01
To: 'Red Hat Enterprise Linux 6 (Santiago) discussion mailing-list'
Subject: [rhelv6-list] RHEL6 kernel 2.6.32-71.14.1.el6.x86_64 panic...

Hi all,

Some of my RHEL6 systems hit a kernel panic from time to time. Two of them are large HP machines (DL585 G7)
with 48 cores and 128 GB, and one is an older Primergy RX300S2 (4 cores, 8 GB). Some other systems
(also Primergys) run fine all the time.

Some other facts:
- all filesystems are ext4
- NFSv4 is enabled
- 3 bonding devices, each with 2 physical devices
- 2 of the bonding devices are configured for jumbo frames (MTU=9000)


Here's the console-log from one of the HP's:

------------[ cut here ]------------
kernel BUG at fs/inode.c:1333!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu47/cache/index2/shared_cpu_map
CPU 4
Modules linked in: iptable_filter ip_tables nfs fscache fuse nfsd nfs_acl auth_rpcgss exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler
lockd sunrpc bonding ipv6 dm_mirror dm_region_hash dm_log uinput power_meter hwmon bnx2 amd64_edac_mod edac_core edac_mce_amd i2c_piix4 sg h
pilo nx_nic(U) ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic pata_atiixp ahci hpsa(U) radeon ttm drm_kms_helper drm
 i2c_algo_bit i2c_core dm_mod [last unloaded: freq_table]

Modules linked in: iptable_filter ip_tables nfs fscache fuse nfsd nfs_acl auth_rpcgss exportfs autofs4 ipmi_devintf ipmi_si ipmi_msghandler
lockd sunrpc bonding ipv6 dm_mirror dm_region_hash dm_log uinput power_meter hwmon bnx2 amd64_edac_mod edac_core edac_mce_amd i2c_piix4 sg h
pilo nx_nic(U) ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic pata_atiixp ahci hpsa(U) radeon ttm drm_kms_helper drm
 i2c_algo_bit i2c_core dm_mod [last unloaded: freq_table]
Pid: 3393, comm: lockd Tainted: G        W  ----------------  2.6.32-71.14.1.el6.x86_64 #1 ProLiant DL585 G7
RIP: 0010:[<ffffffff81186bf9>]  [<ffffffff81186bf9>] iput+0x69/0x70
RSP: 0018:ffff88082b86fce0  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8802fc8616c8 RCX: 000000000000c60e
RDX: ffff88202e13a901 RSI: ffffffffa0341de0 RDI: ffff8802fc8616c8
RBP: ffff88082b86fcf0 R08: 000000000002ac45 R09: 0000000000000000
R10: 000000000000000f R11: 0000000000000000 R12: ffff880227b49c00
R13: ffffffffa034e060 R14: ffff88202e13a940 R15: 00000000fffffff5
FS:  00007fac6a0247c0(0000) GS:ffff88002c240000(0000) knlGS:00000000f77916c0
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00007fac6a048000 CR3: 0000000c2da36000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
Process lockd (pid: 3393, threadinfo ffff88082b86e000, task ffff88082d9ab4e0)
Stack:
 ffff88082b86fd40 ffff8802fc861680 ffff88082b86fd10 ffffffff813fdbf1
<0> ffff880227b49c00 ffff880227b49c00 ffff88082b86fd30 ffffffffa03351f8
<0> ffff88082b86fd30 ffff880227b49c10 ffff88082b86fd60 ffffffffa0341e2c
Call Trace:
 [<ffffffff813fdbf1>] sock_release+0x71/0x90
 [<ffffffffa03351f8>] svc_sock_free+0x48/0x70 [sunrpc]
 [<ffffffffa0341e2c>] svc_xprt_free+0x4c/0x70 [sunrpc]
 [<ffffffffa0341de0>] ? svc_xprt_free+0x0/0x70 [sunrpc]
 [<ffffffff8125cb97>] kref_put+0x37/0x70
 [<ffffffffa0340f29>] svc_xprt_put+0x19/0x20 [sunrpc]
 [<ffffffffa0341191>] svc_xprt_release+0xc1/0xe0 [sunrpc]
 [<ffffffffa03415bd>] svc_recv+0x2ed/0x830 [sunrpc]
 [<ffffffff8105c530>] ? default_wake_function+0x0/0x20
 [<ffffffffa02f6291>] lockd+0xc1/0x230 [lockd]
 [<ffffffffa02f61d0>] ? lockd+0x0/0x230 [lockd]
 [<ffffffff81091a76>] kthread+0x96/0xa0
 [<ffffffff810141ca>] child_rip+0xa/0x20
 [<ffffffff810919e0>] ? kthread+0x0/0xa0
 [<ffffffff810141c0>] ? child_rip+0x0/0x20
Code: 38 48 c7 c0 f0 7c 18 81 48 85 d2 74 12 48 8b 42 20 48 c7 c2 f0 7c 18 81 48 85 c0 48 0f 44 c2 48 89 df ff d0 48 83 c4 08 5b c9 c3 <0f>
0b eb fe 0f 1f 00 55 48 89 e5 41 55 41 54 53 48 83 ec 08 0f
RIP  [<ffffffff81186bf9>] iput+0x69/0x70
 RSP <ffff88082b86fce0>
Mounting proc filesystem
Mounting sysfs filesystem
Creating /dev
Creating initial device nodes
Free memory/Total memory (free %): 456164 / 495584 ( 92.0457 )
Loading jbd2.ko module
Loading mbcache.ko module
Loading ext4.ko module
Loading crc-t10dif.ko module
Loading sd_mod.ko module
Loading ata_generic.ko module
Loading exportfs.ko module
Loading autofs4.ko module
Loading ipmi_msghandler.ko module
Loading sunrpc.ko module
Loading ipv6.ko module
Loading uinput.ko module
Loading hwmon.ko module
Loading bnx2.ko module
Loading edac_core.ko module
Loading edac_mce_amd.ko module
Loading sg.ko module
Loading hpilo.ko module
Loading nx_nic.ko module
Loading cdrom.ko module
Loading pata_acpi.ko module
Loading pata_atiixp.ko module
Loading ahci.ko module
Loading hpsa.ko module
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:03:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
hpsa 0000:44:00.0: controller message 03:00 timed out
Loading i2c-core.ko module
Loading dm-mod.ko module
Loading nfs_acl.ko module
Loading auth_rpcgss.ko module
Loading ipmi_devintf.ko module
Loading ipmi_si.ko module
Loading lockd.ko module
power_meter ACPI000D:00: Ignoring unsafe software power cap!
Loading bonding.ko module
Loading dm-log.ko module
Loading power_meter.ko module
Loading amd64_edac_mod.ko module
Loading i2c-piix4.ko module
Loading sr_mod.ko module
Loading drm.ko module
Loading i2c-algo-bit.ko module
Loading nfsd.ko module
Loading dm-region-hash.ko module
Loading ttm.ko module
Loading drm_kms_helper.ko module
Loading dm-mirror.ko module
Loading radeon.ko module
Waiting for required block device discovery
Waiting for 8 sdd-like device(s)...Found
Creating Block Devices
Creating block device loop0
Creating block device loop1
Creating block device loop2
Creating block device loop3
Creating block device loop4
Creating block device loop5
Creating block device loop6
Creating block device loop7
Creating block device ram0
Creating block device ram1
Creating block device ram10
Creating block device ram11
Creating block device ram12
Creating block device ram13
Creating block device ram14
Creating block device ram15
Creating block device ram2
Creating block device ram3
Creating block device ram4
Creating block device ram5
Creating block device ram6
Creating block device ram7
Creating block device ram8
Creating block device ram9
Creating block device sda
Creating block device sdb
Creating block device sdc
Creating block device sdd
Creating block device sr0
mdadm: No arrays found in config file or automatically
Free memory/Total memory (free %): 432796 / 495584 ( 87.3305 )
Saving to the local filesystem /dev/sdd1
e2fsck 1.41.12 (17-May-2010)
Homes: recovering journal
Homes: clean, 9073003/164782080 files, 387571383/659105347 blocks
Free memory/Total memory (free %): 427248 / 495584 ( 86.211 )
Copying data                       : [  2 %]
Copying data                       : [100 %]
Saving core complete
Restarting system.

The backtrace from the crash utility shows:

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-71.14.1.el6.x86_64/vmlinux
    DUMPFILE: ./vmcore  [PARTIAL DUMP]
        CPUS: 48
        DATE: Wed Feb  9 09:30:52 2011
      UPTIME: 14 days, 13:57:19
LOAD AVERAGE: 3.65, 3.39, 3.25
       TASKS: 1663
    NODENAME: hydra.sie.siemens.at
     RELEASE: 2.6.32-71.14.1.el6.x86_64
     VERSION: #1 SMP Wed Jan 5 17:01:01 EST 2011
     MACHINE: x86_64  (2095 Mhz)
      MEMORY: 128 GB
       PANIC: "kernel BUG at fs/inode.c:1333!"
         PID: 3393
     COMMAND: "lockd"
        TASK: ffff88082d9ab4e0  [THREAD_INFO: ffff88082b86e000]
         CPU: 4
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 3393   TASK: ffff88082d9ab4e0  CPU: 4   COMMAND: "lockd"
 #0 [ffff88082b86f9a0] machine_kexec at ffffffff8103695b
 #1 [ffff88082b86fa00] crash_kexec at ffffffff810b9068
 #2 [ffff88082b86fad0] oops_end at ffffffff814cc6e0
 #3 [ffff88082b86fb00] die at ffffffff8101733b
 #4 [ffff88082b86fb30] do_trap at ffffffff814cbfb4
 #5 [ffff88082b86fb90] do_invalid_op at ffffffff81014ee5
 #6 [ffff88082b86fc30] invalid_op at ffffffff81013f5b
    [exception RIP: iput+105]
    RIP: ffffffff81186bf9  RSP: ffff88082b86fce0  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff8802fc8616c8  RCX: 000000000000c60e
    RDX: ffff88202e13a901  RSI: ffffffffa0341de0  RDI: ffff8802fc8616c8
    RBP: ffff88082b86fcf0   R8: 000000000002ac45   R9: 0000000000000000
    R10: 000000000000000f  R11: 0000000000000000  R12: ffff880227b49c00
    R13: ffffffffa034e060  R14: ffff88202e13a940  R15: 00000000fffffff5
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88082b86fcf8] sock_release at ffffffff813fdbf1
 #8 [ffff88082b86fd18] svc_sock_free at ffffffffa03351f8
 #9 [ffff88082b86fd38] svc_xprt_free at ffffffffa0341e2c
#10 [ffff88082b86fd68] kref_put at ffffffff8125cb97
#11 [ffff88082b86fd88] svc_xprt_put at ffffffffa0340f29
#12 [ffff88082b86fd98] svc_xprt_release at ffffffffa0341191
#13 [ffff88082b86fdc8] svc_recv at ffffffffa03415bd
#14 [ffff88082b86fe58] lockd at ffffffffa02f6291
#15 [ffff88082b86fee8] kthread at ffffffff81091a76
#16 [ffff88082b86ff48] kernel_thread at ffffffff810141ca
crash>

Any idea? Any hint? What else can I do to find the reason for these panics, and how can I solve it?
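
One way to compare panics across machines is to reduce each console log to its chain of reliable call-trace frames. The sketch below assumes the oops format shown in the console log above; the function name and regular expression are illustrative, not part of any kernel tooling. Frames marked with "?" are skipped, since those are addresses found on the stack that the unwinder could not confirm as part of the call chain.

```python
import re

# Matches lines such as
#   " [<ffffffff813fdbf1>] sock_release+0x71/0x90"
#   " [<ffffffffa0341de0>] ? svc_xprt_free+0x0/0x70 [sunrpc]"
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def reliable_frames(oops_text):
    """Return (function, module) pairs from the Call Trace, skipping '?' frames."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if line.startswith("Call Trace:"):
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME.search(line)
        if match is None:
            break  # the first non-frame line (e.g. "Code: ...") ends the trace
        if match.group(1):
            continue  # unverified frame, marked with "?"
        # Frames without a [module] suffix belong to the core kernel image.
        frames.append((match.group(2), match.group(3) or "kernel"))
    return frames


# Excerpt of the call trace from the console log in this post.
EXCERPT = """\
Call Trace:
 [<ffffffff813fdbf1>] sock_release+0x71/0x90
 [<ffffffffa03351f8>] svc_sock_free+0x48/0x70 [sunrpc]
 [<ffffffffa0341e2c>] svc_xprt_free+0x4c/0x70 [sunrpc]
 [<ffffffffa0341de0>] ? svc_xprt_free+0x0/0x70 [sunrpc]
 [<ffffffff8125cb97>] kref_put+0x37/0x70
Code: 38 48 c7 c0
"""

if __name__ == "__main__":
    for function, module in reliable_frames(EXCERPT):
        print(f"{function} [{module}]")
```

If the same chain (sock_release called from svc_sock_free in sunrpc, on the lockd thread) shows up on every affected host, that supports a single common cause rather than hardware-specific problems.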

Thanks a lot,
Christian
