[Linux-cluster] soft lockups on 2.6.27/2.03.09 with gfs1

Frederik Schüler fs at debian.org
Mon Nov 10 09:27:41 UTC 2008


Hello,

I am experiencing a serious problem with deadlocks on a 9-node GFS1 
cluster running vanilla 2.6.27, with both rhcs 2.03.09 and the latest git 
stable2. Fencing is done via the fabric; the node keeps throwing these 
errors after it has been fenced.

This is a rather busy web server cluster, usually with dozens to hundreds 
of Apache processes running concurrently, and 4 GFS1 shares, with lots of 
small writes to the "template cache" volume from all 9 nodes.

Lockups look like this:

[44955.425003] BUG: soft lockup - CPU#2 stuck for 61s! [apache:12639]
[44955.425007] Modules linked in: gfs ac battery ipv6 iptable_filter xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables lock_dlm gfs2 dlm configfs snd_pcm snd_timer snd soundcore snd_page_alloc rtc_cmos rtc_core i2c_nforce2 k8temp shpchp rtc_lib pcspkr pci_hotplug i2c_core button evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom amd74xx sd_mod ide_pci_generic ide_core floppy qla2xxx scsi_transport_fc 3w_9xxx e1000e scsi_tgt ata_generic sata_nv forcedeth libata ehci_hcd scsi_mod dock ohci_hcd thermal processor fan thermal_sys
[44955.425007] irq event stamp: 0
[44955.425007] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[44955.425007] hardirqs last disabled at (0): [<ffffffff8023d7df>] copy_process+0x543/0x12b4
[44955.425007] softirqs last  enabled at (0): [<ffffffff8023d7df>] copy_process+0x543/0x12b4
[44955.425007] softirqs last disabled at (0): [<0000000000000000>] 0x0
[44955.425007] CPU 2:
[44955.425007] Modules linked in: gfs ac battery ipv6 iptable_filter xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack ip_tables x_tables lock_dlm gfs2 dlm configfs snd_pcm snd_timer snd soundcore snd_page_alloc rtc_cmos rtc_core i2c_nforce2 k8temp shpchp rtc_lib pcspkr pci_hotplug i2c_core button evdev ext3 jbd mbcache dm_mirror dm_log dm_snapshot dm_mod ide_cd_mod cdrom amd74xx sd_mod ide_pci_generic ide_core floppy qla2xxx scsi_transport_fc 3w_9xxx e1000e scsi_tgt ata_generic sata_nv forcedeth libata ehci_hcd scsi_mod dock ohci_hcd thermal processor fan thermal_sys
[44955.425007] Pid: 12639, comm: apache Not tainted 2.6.27-2-amd64 #1
[44955.425007] RIP: 0010:[<ffffffff8021759b>]  [<ffffffff8021759b>] native_read_tsc+0x6/0x18
[44955.425007] RSP: 0018:ffff880214af9d80  EFLAGS: 00000202
[44955.425007] RAX: 0000000000000000 RBX: 00000000498fb129 RCX: ffffffff8085d300
[44955.425007] RDX: 000062bb00000000 RSI: 0000000001062560 RDI: 0000000000000001
[44955.425007] RBP: 0000000000000002 R08: 0000000000000002 R09: 0000000000000000
[44955.425007] R10: 0000000000000000 R11: ffffffff8033dd3e R12: ffff88041f0b0000
[44955.425007] R13: ffff8802abb76000 R14: ffff880214af8000 R15: ffffffff8085a890
[44955.425007] FS:  00007f3e8ea7d6d0(0000) GS:ffff88041f0c9940(0000) knlGS:0000000000000000
[44955.425007] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44955.425007] CR2: 00007f3e8e9fc000 CR3: 0000000214adf000 CR4: 00000000000006e0
[44955.425007] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[44955.425007] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[44955.425007] 
[44955.425007] Call Trace:
[44955.425007]  [<ffffffff8033dd53>] ? delay_tsc+0x15/0x45
[44955.425007]  [<ffffffff80341333>] ? _raw_spin_lock+0x98/0x100
[44955.425007]  [<ffffffff8045b3ce>] ? _spin_lock+0x4e/0x5a
[44955.425007]  [<ffffffff802c47dd>] ? igrab+0x10/0x36
[44955.425007]  [<ffffffff802c47dd>] ? igrab+0x10/0x36
[44955.425007]  [<ffffffffa0394971>] ? gfs_getattr+0x83/0xb7 [gfs]
[44955.425007]  [<ffffffff802b5846>] ? vfs_getattr+0x1a/0x5e
[44955.425007]  [<ffffffff802b59f6>] ? vfs_stat_fd+0x2f/0x43
[44955.425007]  [<ffffffff802b5a66>] ? sys_newstat+0x19/0x31
[44955.425007]  [<ffffffff8020ff7a>] ? system_call_fastpath+0x16/0x1b
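
For triage across several nodes, reports like the one above could be 
summarized from a saved kernel log. The following is only a minimal sketch: 
the function names and the regular expression are illustrative assumptions 
modelled on the "BUG: soft lockup" header line shown above, not part of any 
cluster tooling.

```python
import re
from collections import Counter

# Modelled on header lines such as:
# [44955.425003] BUG: soft lockup - CPU#2 stuck for 61s! [apache:12639]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+BUG: soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log_text):
    """Return one dict per soft lockup header line found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = LOCKUP_RE.search(line)
        if match:
            events.append({
                "timestamp": float(match.group("ts")),
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "comm": match.group("comm"),
                "pid": int(match.group("pid")),
            })
    return events


def lockups_per_command(events):
    """Count soft lockup reports per process name (comm)."""
    return Counter(event["comm"] for event in events)


if __name__ == "__main__":
    sample = (
        "[44955.425003] BUG: soft lockup - CPU#2 stuck for 61s! "
        "[apache:12639]"
    )
    for event in parse_soft_lockups(sample):
        print(event)
```

Feeding it the saved output of dmesg from each node would show whether the 
lockups always hit the same process name and CPU; it does not inspect the 
call trace itself.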


Best regards
Frederik Schüler

-- 
ENOSIG