[Linux-cluster] Corosync soft lockup

Christian Grassi christiangrassi at gmail.com
Sat Jan 12 00:24:20 UTC 2013


Hi all,
I have a three-node cluster which runs KVM guests as services. The system
ran fine for some months, but then it suddenly started to have soft lockups,
as you can see below, and the nodes get fenced.
The guests use clvm with raw LVs as back end, and the config files are on
shared gfs2 file systems. Any idea what could be the cause?
I have also attached my cluster.conf.

Any ideas are welcome.

Regards
Chris

Pid: 136556, comm: corosync Not tainted 2.6.32-279.el6.x86_64 #1 HP ProLiant DL980 G7
RIP: 0010:[<ffffffff8104d08e>]  [<ffffffff8104d08e>] wait_for_rqlock+0x2e/0x40
RSP: 0018:ffff881c12231ee8  EFLAGS: 00000206
RAX: 00000000e52ae4c7 RBX: ffff881c12231ee8 RCX: ffff882070e16680
RDX: 00000000e52ae4c7 RSI: ffff882070e11960 RDI: 0000000000000000
RBP: ffffffff8100bc0e R08: 0000000000000000 R09: dead000000200200
R10: ffff881c125830c0 R11: 00000000000000d2 R12: 0000000000000282
R13: ffffffff81aa5700 R14: ffff882070e11960 R15: ffff881c12583438
FS:  0000000000000000(0000) GS:ffff882070e00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000035a489a490 CR3: 0000000001a85000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process corosync (pid: 136556, threadinfo ffff881c12230000, task ffff881c12582aa0)
Stack:
ffff881c12231f68 ffffffff8107091b ffff881c12231f78 ffff881c12231f28
<d> ffff881faf1d5660 ffff881c12582f68 ffff881c12582f68 0000000000000000
<d> ffff881c12231f28 ffff881c12231f28 ffff881c12231f78 00007f9ce339d440
Call Trace:
[<ffffffff8107091b>] ? do_exit+0x5ab/0x870
[<ffffffff81070ce7>] ? sys_exit+0x17/0x20
[<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Code: e5 0f 1f 44 00 00 48 c7 c0 80 66 01 00 65 48 8b 0c 25 b0 e0 00 00 0f ae f0 48 01 c1 eb 09 0f 1f 80 00 00 00 00 f3 90 8b 01 89 c2 <c1> fa 10 66 39 c2 75 f2 c9 c3 0f 1f 84 00 00 00 00 00 55 48 89
Call Trace:
[<ffffffff8107091b>] ? do_exit+0x5ab/0x870
[<ffffffff81070ce7>] ? sys_exit+0x17/0x20
[<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
BUG: soft lockup - CPU#90 stuck for 67s! [multipathd:141345]
Modules linked in: iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables gfs2 dlm configfs autofs4 sunrpc bridge bonding 8021q garp stp llc ipv6 ext2 vhost_net macvtap macvlan tun kvm_intel kvm microcode serio_raw power_meter be2net bnx2 netxen_nic iTCO_wdt iTCO_vendor_support hpilo hpwdt sg i7core_edac edac_core shpchp ext4 mbcache jbd2 dm_round_robin sr_mod cdrom sd_mod crc_t10dif lpfc scsi_transport_fc scsi_tgt pata_acpi ata_generic ata_piix hpsa radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core dm_multipath dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
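
To see which CPUs and processes the soft lockup reports name across a longer
log, something like the following Python sketch might help. It only assumes
the "BUG: soft lockup" message format shown in the trace above; the function
name find_soft_lockups is hypothetical and the sample line is copied from
this log.

```python
import re

# Matches lines such as:
#   BUG: soft lockup - CPU#90 stuck for 67s! [multipathd:141345]
# Assumption: the message format is the one seen in the trace above.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return one dict per soft lockup report found in the given log lines."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP.search(line)
        if match:
            events.append({
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


if __name__ == "__main__":
    sample = ["BUG: soft lockup - CPU#90 stuck for 67s! [multipathd:141345]"]
    for event in find_soft_lockups(sample):
        print(event)
```

Feeding it the lines of a saved console or messages log gives a list that can
be grouped by CPU or by process name to check whether the lockups cluster on
particular cores or tasks.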
-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster.conf
Type: application/octet-stream
Size: 7783 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20130112/290e8324/attachment.obj>

