[Linux-cluster] GFS Crash

Chmouel Boudjnah cboudjnah at squiz.net
Mon Feb 20 08:13:36 UTC 2006


Hi,

Using GFS on RHEL4 (kernel 2.6.9-22.0.2.ELsmp), I got a GFS crash.

Just before the crash, dmesg shows some OOM-killer activity against
httpd. It seems to have something to do with HIGHMEM and the 4GB of
RAM. It looks like the OOM killer keeps pages dirty until physical
RAM is full (but I could well be wrong about that; I don't really
know how the OOM killer works these days).

So I am not sure what is triggering the crash: is it the OOM killer
or GFS?

Here is the GFS crash:

,----
| Feb 20 02:29:38 www01 kernel: 6810 en punlock 7,4034f6
| Feb 20 02:29:38 www01 kernel: 6810 ex punlock -2
| Feb 20 02:29:38 www01 kernel:
| Feb 20 02:29:38 www01 kernel: lock_dlm:  Assertion failed on line 357 of file /usr/src/build/678343-i686/BUILD/gfs-kernel-2.6.9-45/smp/src/dlm/lock.c
| Feb 20 02:29:38 www01 kernel: lock_dlm:  assertion:  "!error"
| Feb 20 02:29:38 www01 kernel: lock_dlm:  time = 2296932048
| Feb 20 02:29:38 www01 kernel: matrix: error=-22 num=2,6b66f lkf=10000 flags=84
| Feb 20 02:29:38 www01 kernel:
| Feb 20 02:29:38 www01 kernel: ------------[ cut here ]------------
| Feb 20 02:29:38 www01 kernel: kernel BUG at /usr/src/build/678343-i686/BUILD/gfs-kernel-2.6.9-45/smp/src/dlm/lock.c:357!
| Feb 20 02:29:38 www01 kernel: invalid operand: 0000 [#3]
| Feb 20 02:29:38 www01 kernel: SMP
| Feb 20 02:29:38 www01 kernel: Modules linked in: iptable_nat ip_conntrack iptable_filter ip_tables lock_dlm(U) dlm(U) gfs(U) lock_harness(U) cman(U) nls_utf8 loop autofs4 md5 ipv6 microcode dm_mirror dm_mod button battery ac uhci_hcd ehci_hcd hw_random e100 mii tg3 floppy ext3 jbd cciss sd_mod scsi_mod
| Feb 20 02:29:38 www01 kernel: CPU:    2
| Feb 20 02:29:38 www01 kernel: EIP:    0060:[<f89e65f3>]    Not tainted VLI
| Feb 20 02:29:38 www01 kernel: EFLAGS: 00010246   (2.6.9-22.0.2.ELsmp)
| Feb 20 02:29:38 www01 kernel: EIP is at do_dlm_unlock+0x8b/0xa0 [lock_dlm]
| Feb 20 02:29:38 www01 kernel: eax: 00000001   ebx: f2afea80   ecx: f3c8bf2c   edx: f89eb175
| Feb 20 02:29:38 www01 kernel: esi: ffffffea   edi: f2afea80   ebp: f89bf000   esp: f3c8bf28
| Feb 20 02:29:38 www01 kernel: ds: 007b   es: 007b   ss: 0068
| Feb 20 02:29:38 www01 kernel: Process gfs_glockd (pid: 5950, threadinfo=f3c8b000 task=f65a2eb0)
| Feb 20 02:29:38 www01 kernel: Stack: f89eb175 f89bf000 00000003 f89e6893 f8abc5ee 00000001 d9d85314 d9d852f8
| Feb 20 02:29:38 www01 kernel:        f8ab2892 f8ae7580 cae61980 d9d852f8 f8ae7580 d9d852f8 f8ab1d8b d9d852f8
| Feb 20 02:29:38 www01 kernel:        00000001 d9d8538c f8ab1e42 d9d852f8 d9d85314 f8ab1f65 00000001 d9d85314
| Feb 20 02:29:38 www01 kernel: Call Trace:
| Feb 20 02:29:38 www01 kernel:  [<f89e6893>] lm_dlm_unlock+0x14/0x1c [lock_dlm]
| Feb 20 02:29:38 www01 kernel:  [<f8abc5ee>] gfs_lm_unlock+0x2c/0x42 [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab2892>] gfs_glock_drop_th+0xf3/0x12d [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab1d8b>] rq_demote+0x7f/0x98 [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab1e42>] run_queue+0x5a/0xc1 [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab1f65>] unlock_on_glock+0x1f/0x28 [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8ab3ec9>] gfs_reclaim_glock+0xc3/0x13c [gfs]
| Feb 20 02:29:38 www01 kernel:  [<f8aa6e05>] gfs_glockd+0x39/0xde [gfs]
| Feb 20 02:29:38 www01 kernel:  [<c011e45f>] default_wake_function+0x0/0xc
| Feb 20 02:29:38 www01 kernel:  [<c02d129e>] ret_from_fork+0x6/0x14
| Feb 20 02:29:38 www01 kernel:  [<c011e45f>] default_wake_function+0x0/0xc
| Feb 20 02:29:38 www01 kernel:  [<f8aa6dcc>] gfs_glockd+0x0/0xde [gfs]
| Feb 20 02:29:38 www01 kernel:  [<c01041f1>] kernel_thread_helper+0x5/0xb
| Feb 20 02:29:38 www01 kernel: Code: 73 34 8b 03 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 ff 70 18 68 6d b2 9e f8 e8 5c bc 73 c7 83 c4 34 68 75 b1 9e f8 e8 4f bc 73 c7 <0f> 0b 65 01 b2 b0 9e f8 68 77 b1 9e f8 e8 0a b4 73 c7 5b 5e c3
`----
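For anyone reading the oops: the assertion prints "error=-22", and the esi register holds ffffffea, which, read as a signed 32-bit value, is also -22. That they match is a likely correlation, not something the trace states outright. A minimal sketch that decodes both values is below; the helper names (`as_signed32`, `describe_kernel_error`) are hypothetical, and the symbolic name comes from Python's `errno` table on the machine running the script, which on Linux maps 22 to EINVAL.

```python
import errno
import os


def as_signed32(value: int) -> int:
    """Interpret an unsigned 32-bit register value as a two's-complement signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def describe_kernel_error(code: int) -> str:
    """Map a negative kernel return code (-errno) to its symbolic name and message.

    Assumes the host's errno numbering matches the kernel's (true on Linux).
    """
    num = -code
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{code} = -{name} ({os.strerror(num)})"


esi = as_signed32(0xFFFFFFEA)  # esi value from the oops above
print(esi)  # same value as "error=-22" in the lock_dlm assertion
print(describe_kernel_error(esi))  # on Linux: -22 = -EINVAL (Invalid argument)
```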

And here is one of the OOM-killer reports:

,----
| Feb 20 02:29:36 www01 kernel: cpu 0 hot: low 32, high 96, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 0 cold: low 0, high 32, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 1 hot: low 32, high 96, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 1 cold: low 0, high 32, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 2 hot: low 32, high 96, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 2 cold: low 0, high 32, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 3 hot: low 32, high 96, batch 16
| Feb 20 02:29:36 www01 kernel: cpu 3 cold: low 0, high 32, batch 16
| Feb 20 02:29:36 www01 kernel:
| Feb 20 02:29:36 www01 kernel: Free pages:       75308kB (62400kB HighMem)
| Feb 20 02:29:36 www01 kernel: Active:622325 inactive:354228 dirty:0 writeback:0 unstable:0 free:18827 slab:19992 mapped:976121 pagetables:2922
| Feb 20 02:29:36 www01 kernel: DMA free:12588kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB pages_scanned:15087 all_unreclaimable? yes
| Feb 20 02:29:36 www01 kernel: protections[]: 0 0 0
| Feb 20 02:29:36 www01 kernel: Normal free:320kB min:928kB low:1856kB high:2784kB active:508288kB inactive:265708kB present:901120kB pages_scanned:2805165 all_unreclaimable? yes
| Feb 20 02:29:36 www01 kernel: protections[]: 0 0 0
| Feb 20 02:29:36 www01 kernel: HighMem free:62400kB min:512kB low:1024kB high:1536kB active:1980884kB inactive:1151332kB present:3735548kB pages_scanned:0 all_unreclaimable? no
| Feb 20 02:29:36 www01 kernel: protections[]: 0 0 0
| Feb 20 02:29:36 www01 kernel: DMA: 3*4kB 4*8kB 4*16kB 2*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12588kB
| Feb 20 02:29:36 www01 kernel: Normal: 0*4kB 4*8kB 0*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 320kB
| Feb 20 02:29:36 www01 kernel: HighMem: 1152*4kB 1118*8kB 1179*16kB 903*32kB 7*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 62400kB
| Feb 20 02:29:36 www01 kernel: Swap cache: add 528754, delete 527948, find 173/299, race 0+0
| Feb 20 02:29:36 www01 kernel: 0 bounce buffer pages
| Feb 20 02:29:36 www01 kernel: Free swap:            0kB
| Feb 20 02:29:36 www01 kernel: 1163263 pages of RAM
| Feb 20 02:29:36 www01 kernel: 802802 pages of HIGHMEM
| Feb 20 02:29:36 www01 kernel: 141639 reserved pages
| Feb 20 02:29:36 www01 kernel: 22868 pages shared
| Feb 20 02:29:36 www01 kernel: 806 pages swap cached
| Feb 20 02:29:36 www01 kernel: Out of Memory: Killed process 6122 (httpd).
`----
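To make the HIGHMEM suspicion a bit more concrete, the zone lines in that report can be checked mechanically. On a 32-bit (i686) kernel, most kernel-side allocations must come from lowmem (DMA and Normal), so the Normal zone sitting below its min watermark while HighMem still has free pages is worth noting. A rough sketch follows, assuming the log lines have been trimmed of their syslog prefix and reduced to the fields used here; the helpers `zone_status` and `buddy_summary` are hypothetical.

```python
import re

# Lines copied from the OOM report above, trimmed to the fields parsed here.
ZONE_LINES = [
    "DMA free:12588kB min:16kB low:32kB high:48kB all_unreclaimable? yes",
    "Normal free:320kB min:928kB low:1856kB high:2784kB all_unreclaimable? yes",
    "HighMem free:62400kB min:512kB low:1024kB high:1536kB all_unreclaimable? no",
]
BUDDY_LINES = [
    "DMA: 3*4kB 4*8kB 4*16kB 2*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12588kB",
    "Normal: 0*4kB 4*8kB 0*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 320kB",
    "HighMem: 1152*4kB 1118*8kB 1179*16kB 903*32kB 7*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 62400kB",
]

ZONE_RE = re.compile(r"(\w+) free:(\d+)kB min:(\d+)kB .*all_unreclaimable\? (yes|no)")
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")  # "count*sizekB" entries of a buddy free list


def zone_status(line):
    """Return (zone, free_kB, min_kB, all_unreclaimable) from a zone summary line."""
    m = ZONE_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognised zone line: {line!r}")
    name, free, minimum, unreclaimable = m.groups()
    return name, int(free), int(minimum), unreclaimable == "yes"


def buddy_summary(line):
    """Return (zone, computed_kB, reported_kB, largest_free_block_kB) from a buddy line."""
    name, rest = line.split(":", 1)
    blocks, total = rest.rsplit("=", 1)
    counts = [(int(n), int(size)) for n, size in BLOCK_RE.findall(blocks)]
    computed = sum(n * size for n, size in counts)
    largest = max((size for n, size in counts if n > 0), default=0)
    reported = int(re.search(r"(\d+)kB", total).group(1))
    return name, computed, reported, largest


for line in ZONE_LINES:
    name, free, minimum, unreclaimable = zone_status(line)
    flag = "BELOW min watermark" if free < minimum else "ok"
    print(f"{name:8} free={free}kB min={minimum}kB unreclaimable={unreclaimable} -> {flag}")

for line in BUDDY_LINES:
    name, computed, reported, largest = buddy_summary(line)
    print(f"{name:8} sum={computed}kB (reported {reported}kB), largest free block={largest}kB")
```

With these lines, only Normal is flagged (320kB free against a 928kB minimum), and its largest free block is 256kB, while HighMem still reports 62400kB free.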

Does anyone have an idea about this?

Cheers, Chmouel.

-- 
Chmouel Boudjnah - Squiz.net - http://www.squiz.net/