[Linux-cluster] :BUG: soft lockup - CPU#0 stuck for 67s! [vm.sh:29764]

Hammad Siddiqi hsiddiqi at gmail.com
Mon Jul 15 08:54:40 UTC 2013


Geniuses,

I have a Red Hat cluster set up for VMs running on KVM. During live
migration I came across a kernel bug related to a soft lockup of CPU#0.
Please see the backtrace from the abrt tool below. The host specs are:

Supermicro Server with AMD Opteron processor (48 cores)
RAM ECC 512 GB
6.4 x86_64
Disk images stored on Netapp volumes shared via NFS on 10Gbps network


The issue may not be related to the Cluster Suite (it looks kernel
related), but any help pointing me in the right direction would be highly
appreciated. Please let me know if you require additional
information, logs, or output.

Thank you
Hammad Siddiqi



abrt_version:   2.0.8
cmdline:        ro root=/dev/mapper/VolGroup-lv_root rd_NO_LUKS
LANG=en_US.UTF-8 rd_NO_MD rd_LVM_LV=VolGroup/lv_swap
SYSFONT=latarcyrheb-sun16 crashkernel=161M@0M rd_LVM_LV=VolGroup/lv_root
 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM rhgb quiet
comment:        During live migration of KVM VMs (13 VMs at a time)
kernel:         2.6.32-358.6.2.el6.x86_64
logfile:
time:           Mon 15 Jul 2013 12:55:20 AM PDT

sosreport.tar.xz: Binary file, 3153956 bytes

backtrace:
:BUG: soft lockup - CPU#0 stuck for 67s! [vm.sh:29764]
:Modules linked in: act_police cls_u32 sch_ingress cls_fw sch_htb
ip6table_filter ip6_tables ebtable_nat ebtables bridge nfs lockd fscache
auth_rpcgss nfs_acl dlm configfs sunrpc iptable_filter ip_tables
openvswitch xsvhba(U) scsi_transport_fc scsi_tgt xve(U) xsvnic(U) bonding
ipv6 8021q garp stp llc xscore(U) ib_cm mlx4_ib ib_sa ib_mad ib_core
vhost_net macvtap macvlan tun kvm_amd kvm igb dca ptp pps_core mlx4_core sg
serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core
shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mpt2sas
scsi_transport_sas raid_class ata_generic pata_acpi pata_atiixp ahci
usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
scsi_wait_scan]
:CPU 0
:Modules linked in: act_police cls_u32 sch_ingress cls_fw sch_htb
ip6table_filter ip6_tables ebtable_nat ebtables bridge nfs lockd fscache
auth_rpcgss nfs_acl dlm configfs sunrpc iptable_filter ip_tables
openvswitch xsvhba(U) scsi_transport_fc scsi_tgt xve(U) xsvnic(U) bonding
ipv6 8021q garp stp llc xscore(U) ib_cm mlx4_ib ib_sa ib_mad ib_core
vhost_net macvtap macvlan tun kvm_amd kvm igb dca ptp pps_core mlx4_core sg
serio_raw k10temp amd64_edac_mod edac_core edac_mce_amd i2c_piix4 i2c_core
shpchp ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom mpt2sas
scsi_transport_sas raid_class ata_generic pata_acpi pata_atiixp ahci
usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded:
scsi_wait_scan]
:Pid: 29764, comm: vm.sh Not tainted 2.6.32-358.6.2.el6.x86_64 #1
Supermicro H8QG6/H8QG6
:RIP: 0010:[<ffffffff8105007c>]  [<ffffffff8105007c>]
wait_for_rqlock+0x2c/0x40
:RSP: 0018:ffff887a9febbeb8  EFLAGS: 00000202
:RAX: 0000000003d503b2 RBX: ffff887a9febbeb8 RCX: ffff880028216700
:RDX: 00000000000003d5 RSI: 0000000000000056 RDI: 0000000000000000
:RBP: ffffffff8100bb8e R08: ffff887bd174b500 R09: 0000000000000000
:R10: 0000000000000001 R11: 00000000000004fd R12: ffffffff00000000
:R13: 0000000000007444 R14: ffff887b00040001 R15: 0000000000000011
:FS:  00007f3bf82ec700(0000) GS:ffff880028200000(0000)
knlGS:0000000000000000
:CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
:CR2: 00007f3bf79250a0 CR3: 0000000001a85000 CR4: 00000000000007f0
:DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
:DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
:Process vm.sh (pid: 29764, threadinfo ffff887a9feba000, task
ffff887bd174b500)
:Stack:
:ffff887a9febbf38 ffffffff8107382b ffff888007203668 ffff887a9febbef8
: 00007fff8bf63cdc ffff887bd174b9c8 ffff887bd174b9c8 0000000000000000
: ffff887a9febbef8 ffff887a9febbef8 0000000001395020 0000000000000000
:Call Trace:
:[<ffffffff8107382b>] ? do_exit+0x5ab/0x870
:[<ffffffff81073b48>] ? do_group_exit+0x58/0xd0
:[<ffffffff81073bd7>] ? sys_exit_group+0x17/0x20
:[<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
:Code: 48 89 e5 0f 1f 44 00 00 48 c7 c0 00 67 01 00 65 48 8b 0c 25 b0 e0 00
00 0f ae f0 48 01 c1 eb 09 0f 1f 80 00 00 00 00 f3 90 8b 01 <89> c2 c1 fa
10 66 39 c2 75 f2 c9 c3 0f 1f 84 00 00 00 00 00 55
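
A note on reading the registers above: the Code bytes around the faulting
instruction (mov (%rcx),%eax; mov %eax,%edx; sar $0x10,%edx; cmp %ax,%dx;
jne) look like a spin loop comparing the two 16-bit halves of a lock word.
The sketch below is only an illustration under assumptions: it assumes RAX
holds the lock word that was just loaded, and that the low half is the
ticket being served and the high half is the next ticket to hand out. The
function name is hypothetical.

```python
def decode_ticket_lock(word: int) -> dict:
    """Split a 32-bit lock word into two 16-bit ticket counters.

    Assumption (not confirmed by the report): low 16 bits are the ticket
    currently being served, high 16 bits are the next ticket to hand out.
    """
    head = word & 0xFFFF
    tail = (word >> 16) & 0xFFFF
    # Tickets taken but not yet released (holder plus waiters), modulo 2**16.
    outstanding = (tail - head) & 0xFFFF
    return {"head": head, "tail": tail, "outstanding": outstanding}


if __name__ == "__main__":
    # RAX value copied from the register dump above.
    print(decode_ticket_lock(0x03D503B2))
```

If those assumptions hold, a nonzero "outstanding" count would mean the
lock was held or queued on at the moment of the dump, which would fit a
CPU spinning in wait_for_rqlock.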

END:
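
For anyone comparing several reports like this one, a minimal sketch of
pulling the key fields out of the backtrace text may help. It assumes the
abrt layout shown above, where every kernel log line starts with ':'. The
function name and regular expressions are hypothetical, not part of abrt.

```python
import re

# Matches e.g. "BUG: soft lockup - CPU#0 stuck for 67s! [vm.sh:29764]"
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)
# Matches e.g. "[<ffffffff8107382b>] ? do_exit+0x5ab/0x870"
FRAME_RE = re.compile(
    r"\[<[0-9a-f]{16}>\] (?:\? )?(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize_soft_lockup(report: str) -> dict:
    """Return CPU, duration, process and call-trace symbols from a report."""
    header = LOCKUP_RE.search(report)
    if header is None:
        raise ValueError("no soft lockup header found")

    frames = []
    in_trace = False
    for raw in report.splitlines():
        line = raw.lstrip(":").strip()  # drop the abrt ':' prefix
        if line.startswith("Call Trace"):
            in_trace = True
            continue
        if in_trace:
            frame = FRAME_RE.match(line)
            if frame is None:
                break  # first non-frame line ends the call trace
            frames.append(frame.group("sym"))

    return {
        "cpu": int(header.group("cpu")),
        "seconds": int(header.group("secs")),
        "comm": header.group("comm"),
        "pid": int(header.group("pid")),
        "frames": frames,
    }
```

Fed the backtrace above, this should report CPU 0, 67 seconds, process
vm.sh (pid 29764), and the exit path from do_exit down to
system_call_fastpath.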