
[Linux-cluster] bug in kernel or module cman?



Hello, Linux-cluster.


My GFS mount points in the cluster hang periodically (roughly once every two weeks), and I see this in my logs:



Jun 30 23:16:26 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 100339712 for RLIMIT_STACK against limit 4194304 for /[cman_tool:13085] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0

Jun 30 23:16:26 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 100339712 for RLIMIT_STACK against limit 4194304 for /[cman_tool:13085] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0

Jun 30 23:16:26 cluster kernel: CMAN: Waiting to join or form a Linux-cluster

Jun 30 23:16:30 cluster kernel: CMAN: sending membership request

Jun 30 23:16:31 cluster kernel: CMAN: got node node0

Jun 30 23:16:31 cluster kernel: CMAN: got node node1

Jun 30 23:17:01 cluster kernel: CMAN: Master died after JOINCONF, we must leave the cluster

Jun 30 23:17:01 cluster kernel: CMAN: we are leaving the cluster.

Jun 30 23:18:04 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 111812608 for RLIMIT_STACK against limit 8388608 for /[cman_tool:16413] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0

Jun 30 23:18:04 cluster kernel: grsec: From 87.245.147.2: denied resource overstep by requesting 111812608 for RLIMIT_STACK against limit 8388608 for /[cman_tool:16413] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:11531] uid/euid:0/0 gid/egid:0/0

Jun 30 23:18:04 cluster kernel: CMAN: Waiting to join or form a Linux-cluster

Jun 30 23:18:05 cluster kernel: CMAN: sending membership request

Jun 30 23:18:06 cluster kernel: CMAN: got node node0

Jun 30 23:18:06 cluster kernel: CMAN: got node node1

Jun 30 23:18:36 cluster kernel: CMAN: Master died after JOINCONF, we must leave the cluster

Jun 30 23:18:36 cluster kernel: CMAN: we are leaving the cluster.

Jun 30 23:19:05 cluster kernel: CMAN: Waiting to join or form a Linux-cluster

Jun 30 23:19:05 cluster kernel: CMAN: sending membership request

Jun 30 23:19:06 cluster kernel: CMAN: got node node1

Jun 30 23:19:06 cluster kernel: CMAN: got node node0

Jun 30 23:19:27 cluster kernel: CMAN: node node0 has been removed from the cluster : Inconsistent cluster view

Jun 30 23:22:39 cluster kernel: CMAN: removing node node1 from the cluster : No response to messages

Jun 30 23:22:39 cluster kernel: ------------[ cut here ]------------

Jun 30 23:22:39 cluster kernel: kernel BUG at /home/Compile/GFS/cluster-1.02.00/cman-kernel/src/membership.c:3151!

Jun 30 23:22:39 cluster kernel: invalid opcode: 0000 [#1]

Jun 30 23:22:39 cluster kernel: Modules linked in: nfs lock_dlm dlm cman lock_harness nfsd exportfs lockd nfs_acl sunrpc ipt_REJECT ipt_multiport iptable_nat ip_nat ip_conntrack iptable_filter lm75 microcode dm_mod button battery ac uhci_hcd ehci_hcd i2c_i801 e1000 ext3 jbd 3w_xxxx

Jun 30 23:22:39 cluster kernel: CPU:    0

Jun 30 23:22:39 cluster kernel: EIP:    0060:[<f8aa95e6>]    Tainted: GF     VLI

Jun 30 23:22:39 cluster kernel: EFLAGS: 00010246   (2.6.16.20-grsec #8)

Jun 30 23:22:39 cluster kernel: eax: 00000000   ebx: 00000080   ecx: f8ab9000   edx: 00000080

Jun 30 23:22:39 cluster kernel: esi: d3352f64   edi: d3352fa0   ebp: 00000000   esp: d3352f58

Jun 30 23:22:39 cluster kernel: ds: 007b   es: 007b   ss: 0068

Jun 30 23:22:39 cluster kernel: Process cman_memb (pid: 7952, threadinfo=d3352000 task=c530e2b0)

Jun 30 23:22:39 cluster kernel: Stack: <0>f2b45920 f8aa12bc f8aaa9f9 f5458dc0 f8aa0712 00000001 f2b45920 f8aaaa9d

Jun 30 23:22:39 cluster kernel:        c530e2b0 f8aa12e5 f8aad021 00000000 00000000 00000000 c530e2b0 c01473ba

Jun 30 23:22:39 cluster kernel:        00100100 00200200 0100001e 00000001 c01473ba 00100100 00200200 00000001

Jun 30 23:22:39 cluster kernel: Call Trace:

Jun 30 23:22:39 cluster kernel:  [<f8aaa9f9>]

Jun 30 23:22:39 cluster kernel:  [<f8aaaa9d>]

Jun 30 23:22:39 cluster kernel:  [<f8aad021>]

Jun 30 23:22:39 cluster kernel:  [<c01473ba>]

Jun 30 23:22:39 cluster kernel:  [<c01473ba>]

Jun 30 23:22:39 cluster kernel:  [<f8aac631>]

Jun 30 23:22:39 cluster kernel:  [<c0131005>]

Jun 30 23:22:39 cluster kernel: Code: 1d f8 15 aa f8 8b 0d f4 15 aa f8 ba 01 00 00 00 eb 15 8b 04 91 85 c0 74 0d 83 78 1c 02 75 07 89 06 8b 40 14 eb 0f 42 39 da 7c e7 <0f> 0b 4f 0c 93 38 ab f8 31 c0 5b 5e c3 a3 3c 22 aa f8 b8 cc 15


------------------ 

And another one:


Jul 10 12:48:46 cluster kernel: grsec: From 83.166.231.248: denied resource overstep by requesting 57942016 for RLIMIT_STACK against limit 4194304 for /[cman_tool:11938] uid/guid:0/0 gid/egid:0/0, parent /bin/bash[bash:4524] uid/euid:0/0 gid/egid:0/0

Jul 10 12:48:46 cluster kernel: grsec: From 83.166.231.248: denied resource overstep by requesting 57942016 for RLIMIT_STACK against limit 4194304 for /[cman_tool:11938] uid/euid:0/0 gid/egid:0/0, parent /bin/bash[bash:4524] uid/euid:0/0 gid/egid:0/0

Jul 10 12:48:46 cluster kernel: CMAN: Waiting to join or form a Linux-cluster

Jul 10 12:48:48 cluster kernel: CMAN: sending membership request

Jul 10 12:48:48 cluster kernel: CMAN: sending membership request

Jul 10 12:48:48 cluster kernel: CMAN: got node node1

Jul 10 12:53:42 cluster kernel: CMAN: removing node node1 from the cluster : No response to messages

Jul 10 12:53:42 cluster kernel: ------------[ cut here ]------------

Jul 10 12:53:42 cluster kernel: kernel BUG at /home/Compile/GFS/cluster-1.02.00/cman-kernel/src/membership.c:3151!

Jul 10 12:53:42 cluster kernel: invalid opcode: 0000 [#1]

Jul 10 12:53:42 cluster kernel: Modules linked in: nfs gnbd lock_dlm dlm cman lock_harness nfsd exportfs lockd nfs_acl sunrpc ipt_REJECT ipt_multiport iptable_nat ip_nat ip_conntrack iptable_filter lm75 microcode dm_mod button battery ac uhci_hcd ehci_hcd i2c_i801 e1000 ext3 jbd 3w_xxxx

Jul 10 12:53:42 cluster kernel: CPU:    0

Jul 10 12:53:42 cluster kernel: EIP:    0060:[<f8aa95e6>]    Tainted: GF     VLI

Jul 10 12:53:42 cluster kernel: EFLAGS: 00010246   (2.6.16.20-grsec #8)

Jul 10 12:53:42 cluster kernel: eax: 00000000   ebx: 00000080   ecx: f8ab9000   edx: 00000080

Jul 10 12:53:42 cluster kernel: esi: c0722f64   edi: c0722fa0   ebp: 00000000   esp: c0722f58

Jul 10 12:53:42 cluster kernel: ds: 007b   es: 007b   ss: 0068

Jul 10 12:53:42 cluster kernel: Process cman_memb (pid: 31173, threadinfo=c0722000 task=d41e8910)

Jul 10 12:53:42 cluster kernel: Stack: <0>f6e45bc0 f8aa12bc f8aaa9f9 f6e630c0 f8aa0712 00000003 f6e45bc0 f8aaaa9d

Jul 10 12:53:42 cluster kernel:        d41e8910 f8aa12e5 f8aad021 00000000 00000000 00000000 d41e8910 c01473ba

Jul 10 12:53:42 cluster kernel:        00100100 00200200 0100001e 00000003 c01473ba 00100100 00200200 00000001

Jul 10 12:53:42 cluster kernel: Call Trace:

Jul 10 12:53:42 cluster kernel:  [<f8aaa9f9>]

Jul 10 12:53:42 cluster kernel:  [<f8aaaa9d>]

Jul 10 12:53:42 cluster kernel:  [<f8aad021>]

Jul 10 12:53:42 cluster kernel:  [<c01473ba>]

Jul 10 12:53:42 cluster kernel:  [<c01473ba>]

Jul 10 12:53:42 cluster kernel:  [<f8aac631>]

Jul 10 12:53:42 cluster kernel:  [<c0131005>]

Jul 10 12:53:42 cluster kernel: Code: 1d f8 15 aa f8 8b 0d f4 15 aa f8 ba 01 00 00 00 eb 15 8b 04 91 85 c0 74 0d 83 78 1c 02 75 07 89 06 8b 40 14 eb 0f 42 39 da 7c e7 <0f> 0b 4f 0c 93 38 ab f8 31 c0 5b 5e c3 a3 3c 22 aa f8 b8 cc 15

Jul 10 13:03:02 cluster kernel:  releasing gnbd class

Jul 10 13:03:02 cluster kernel: releasing gnbd class

Jul 10 13:03:05 cluster last message repeated 126 times


In effect, every request to a GFS mount point hangs forever, waiting for something, and 100% of CPU time is spent in the wait state.

When this happens, the servers that import the GNBD devices cannot be soft-rebooted or shut down at all. Only a hard reset or power-off helps.

The dump I provide is from the main cluster node, which hosts the hard disks with the partition that I share over GNBD with GFS.


BTW, my kernel is patched with the grsecurity patch (as you can see at the top of the provided logs).


What is the solution? Why does cman_tool require a stack size of over 50 MB, and in some cases over 100 MB?
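For reference, the byte counts in the grsec lines can be converted to MiB to see how far each request oversteps the limit. The following is only a rough sketch: the sample strings are excerpts of the grsec messages from the logs above, and the names `PATTERN`, `parse_overstep` and `to_mib` are hypothetical helpers made up for this illustration, not part of cman or grsecurity.

```python
import re

# Excerpts of the grsec messages from the logs above (timestamps and
# uid/gid fields trimmed; one sample per distinct request).
SAMPLES = [
    "denied resource overstep by requesting 100339712 for RLIMIT_STACK "
    "against limit 4194304 for /[cman_tool:13085]",
    "denied resource overstep by requesting 111812608 for RLIMIT_STACK "
    "against limit 8388608 for /[cman_tool:16413]",
    "denied resource overstep by requesting 57942016 for RLIMIT_STACK "
    "against limit 4194304 for /[cman_tool:11938]",
]

# Hypothetical pattern: assumes the "requesting N for RLIMIT_X against
# limit M" wording seen in the logs above.
PATTERN = re.compile(r"requesting (\d+) for (RLIMIT_\w+) against limit (\d+)")

MIB = 1024 * 1024


def parse_overstep(line):
    """Return (resource, requested_bytes, limit_bytes), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group(2), int(match.group(1)), int(match.group(3))


def to_mib(num_bytes):
    """Convert a byte count to MiB, rounded to one decimal place."""
    return round(num_bytes / MIB, 1)


if __name__ == "__main__":
    for sample in SAMPLES:
        parsed = parse_overstep(sample)
        if parsed is None:
            continue
        resource, requested, limit = parsed
        print(f"{resource}: requested {to_mib(requested)} MiB, "
              f"limit {to_mib(limit)} MiB")
```

Under these assumptions, the three requests work out to roughly 95.7, 106.6 and 55.3 MiB, against limits of 4 MiB and 8 MiB.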


-- 

Best regards,

 Flagman                          mailto:Flagman incomtel ru

