
[Linux-cluster] kernel bug at fs/dlm/lowcomms.c:647!



Hi all,


I'm running some tests on OCFS2 with a 2.6.32-100 kernel (Oracle) or with RHEL6/Fedora kernels, and the node crashes with a kernel BUG in fs/dlm/lowcomms.c, as you can see below. I have a crash dump if you need more information. I'm lost and would appreciate help figuring out where to look to debug this problem.

Thanks

Regards,

Benoit



Kernel 2.6.32-100.0.19.el5 on an x86_64
chili0 login:
------------[ cut here ]------------
kernel BUG at fs/dlm/lowcomms.c:647!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control
CPU 34
Modules linked in: ocfs2(U) ocfs2_nodemanager(U) nfsd(U) exportfs(U) sctp(U) libcrc32c(U) ocfs2_stack_user(U) ocfs2_stackglue(U) dlm(U) configfs(U) acpi_cpufreq(U) freq_table(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) ipv6(U) scsi_dh_emc(U) dm_round_robin(U) dm_multipath(U) iTCO_wdt(U) iTCO_vendor_support(U) mlx4_core(U) i2c_i801(U) igb(U) pcspkr(U) i2c_core(U) ioatdma(U) dca(U) ahci(U) uhci_hcd(U) ehci_hcd(U) lpfc(U) scsi_transport_fc(U) scsi_tgt(U) [last unloaded: ocfs2_nodemanager]
Pid: 27062, comm: dlm_recv/34 Not tainted 2.6.32-100.0.19.el5 #1 bullx super-node
RIP: 0010:[<ffffffffa02406c3>] [<ffffffffa02406c3>] receive_from_sock+0x554/0x6ed [dlm]
RSP: 0018:ffff880c77c6bc60  EFLAGS: 00010246
RAX: 0000000000000030 RBX: ffff8810774b8d30 RCX: ffff88087c4548f8
RDX: 0000000000000030 RSI: ffff880876dce000 RDI: ffffffff81398045
RBP: ffff880c77c6be50 R08: ffff000000000000 R09: ffff880c77c6b900
R10: ffff880c77c6b8f0 R11: 0000000000000030 R12: 0000000000000030
R13: ffff8810774b8d20 R14: ffff880c7caa00c0 R15: ffffffffa023ecca
FS: 0000000000000000(0000) GS:ffff88048e600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000fcb078 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dlm_recv/34 (pid: 27062, threadinfo ffff880c77c6a000, task ffff880c7caa00c0)
Stack:
 ffff880c77c6bc70 ffffffff8122fa24 ffff880c77c6bc90 ffffffff8122faca
<0> ffff88048e414ec0 0000100000000002 0000000000000000 ffffffff00000000
<0> 0000000000000000 0000000000000000 ffffffffa024bb20 0000000000000030
Call Trace:
 [<ffffffff8122fa24>] ? cpumask_next+0x19/0x1b
 [<ffffffff8122faca>] ? cpumask_next_and+0x20/0x32
 [<ffffffffa023ecca>] ? process_recv_sockets+0x0/0x28 [dlm]
 [<ffffffffa023ecea>] process_recv_sockets+0x20/0x28 [dlm]
 [<ffffffff81071802>] worker_thread+0x14d/0x1ed
 [<ffffffff81075a7c>] ? autoremove_wake_function+0x0/0x3d
 [<ffffffff810716b5>] ? worker_thread+0x0/0x1ed
 [<ffffffff810756d3>] kthread+0x6e/0x76
 [<ffffffff81012dea>] child_rip+0xa/0x20
 [<ffffffff81075665>] ? kthread+0x0/0x76
 [<ffffffff81012de0>] ? child_rip+0x0/0x20
Code: 29 e7 ff ff e9 2d 01 00 00 41 8b 74 24 10 0f b7 d0 48 c7 c7 d1 8c 24 a0 31 c0 e8 ab 71 e1 e0 e9 12 01 00 00 41 83 7d 08 00 75 04 <0f> 0b eb fe 4d 8d 7d 68 49 be 00 00 00 00 00 16 00 00 41 8b 55
RIP  [<ffffffffa02406c3>] receive_from_sock+0x554/0x6ed [dlm]
 RSP <ffff880c77c6bc60>
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-100.0.19.el5 (mockbuild ca-build9 us oracle com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Sep 17 17:51:41 EDT 2010
Command line: ro root=/dev/mapper/vg_chili0-lv_root rd_LVM_LV=vg_chili0/lv_root rd_LVM_LV=vg_chili0/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=fr-pc cgroup_disable=memory selinux=0 pcie_aspm=off nmi_watchdog=0 console=ttyS1,115200 maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=195948K@33408K elfcorehdr=229356K memmap=308K#1993940K memmap=16K#2077704K memmap=4K#2077748K memmap=4K#2077764K memmap=44K#2077768K memmap=72K#2077812K memmap=4K#2077884K memmap=4K#2077888K memmap=4K#2077892K memmap=4K#2078024K memmap=2716K#2078052K memmap=1024K#69204860K memmap=128K#69205884K
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
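For anyone reading the trace: the `symbol+offset/length` annotation and the `Code:` line can be checked mechanically. Below is a small Python sketch (the helper names `parse_symbol` and `faulting_bytes` are my own, not part of any kernel tool) that recovers the start address of receive_from_sock from RIP and confirms that the bytes at the `<..>` marker are `0f 0b`, the x86 `ud2` instruction that BUG() emits to raise the "invalid opcode" trap.

```python
# Decode the "symbol+offset/length" annotation and the "Code:" bytes from the
# oops above. All values are copied verbatim from the trace.
import re

RIP = 0xFFFFFFFFA02406C3
SYMBOL = "receive_from_sock+0x554/0x6ed"
# "Code:" bytes; the byte wrapped in <...> is the one RIP points at.
CODE = ("29 e7 ff ff e9 2d 01 00 00 41 8b 74 24 10 0f b7 d0 48 c7 c7 d1 8c 24 a0 "
        "31 c0 e8 ab 71 e1 e0 e9 12 01 00 00 41 83 7d 08 00 75 04 <0f> 0b eb fe "
        "4d 8d 7d 68 49 be 00 00 00 00 00 16 00 00 41 8b 55")


def parse_symbol(sym: str) -> tuple[str, int, int]:
    """Split 'name+0xOFF/0xLEN' into (name, offset, length)."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)", sym)
    if m is None:
        raise ValueError(f"unrecognised symbol annotation: {sym!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def faulting_bytes(code: str, n: int = 2) -> list[int]:
    """Return the n bytes starting at the <..>-marked byte."""
    tokens = code.split()
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[idx:idx + n]]


name, offset, length = parse_symbol(SYMBOL)
func_start = RIP - offset  # RIP = function start + offset
print(f"{name} starts at {func_start:#x}, is {length} bytes long")
print(f"faulting offset {offset} of {length}")
op = faulting_bytes(CODE)
# 0f 0b is UD2; BUG() on x86 emits it to trigger the "invalid opcode" trap.
print("ud2" if op == [0x0F, 0x0B] else f"unexpected opcode {op}")
```

Run as-is, it reports receive_from_sock starting at 0xffffffffa024016f and the trap at byte 1364 of 1773. The bytes just before the marker, `41 83 7d 08 00 75 04`, decode to `cmpl $0x0,0x8(%r13)` followed by a `jne` over the trap, so the BUG fires when the 32-bit field at offset 8 of the structure pointed to by R13 is zero; mapping that offset to a field name needs the struct layout of this exact dlm build (for example from the crash dump).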

Here is the configuration:

[root@chili1 ~]# crm configure show
node chili0
node chili1
primitive IPaddr-dhcp ocf:Bull:IPaddr \
        params ip="11.1.0.20" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-dns ocf:Bull:IPaddr \
        params ip="11.1.0.21" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-monitoring-master ocf:Bull:IPaddr \
        params ip="11.1.0.22" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-mysql ocf:Bull:IPaddr \
        params ip="11.1.0.23" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-nfs ocf:Bull:IPaddr \
        params ip="11.1.0.24" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-postgresql ocf:Bull:IPaddr \
        params ip="11.1.0.25" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-tftp ocf:Bull:IPaddr \
        params ip="11.1.0.26" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive dhcp-dhcp-server lsb:dhcpd \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive dns-dns-server lsb:named \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive fs-BCM-MCO ocf:Bull:Filesystem \
        params device="-L HA_MNGT:MCO" directory="/BCM/MCO" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-conf ocf:Bull:Filesystem \
        params device="-L HA_MNGT:CONF" directory="/BCM/conf" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-console ocf:Bull:Filesystem \
        params device="-L HA_MNGT:CONSOLE" directory="/BCM/console" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-data ocf:Bull:Filesystem \
        params device="-L HA_MNGT:RRDDBs" directory="/BCM/data" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-log ocf:Bull:Filesystem \
        params device="-L HA_MNGT:LOGs" directory="/BCM/log" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-storage ocf:Bull:Filesystem \
        params device="-L HA_MNGT:STORAGE" directory="/BCM/storage" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive monitoring-master-errorManager lsb:errorManager \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive monitoring-master-eventManager lsb:eventManager \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive monitoring-master-nagios lsb:nagios \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive monitoring-master-powerManager lsb:powerManager \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive monitoring-master-syslog-ng lsb:syslog-ng-monitoring \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive mysql-fs-DBs ocf:Bull:Filesystem \
        params device="-L HA_MNGT:MYSQLDBs" directory="/var/lib/mysql" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive mysql-mysqld ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" pid="/var/run/mysqld/mysqld.pid" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive nfs-nfs-server ocf:heartbeat:nfsserver \
        params nfs_init_script="/etc/init.d/nfs" nfs_notify_cmd="/usr/sbin/sm-notify" nfs_shared_infodir="/BCM/log/nfs-server-logs" nfs_ip="11.1.0.24" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60"
primitive o2cb ocf:ocfs2:o2cb \
        op monitor interval="120s"
primitive postgresql-clusterdb ocf:heartbeat:pgsql \
        params pgdata="/var/lib/pgsql/data" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive postgresql-fs-DBs ocf:Bull:Filesystem \
        params device="-L HA_MNGT:PGSQLDBs" directory="/var/lib/pgsql/data" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive restofencechili0 stonith:fence_ipmilan \
        params ipaddr="11.1.0.10" login="super" passwd="pass" pcmk_host_check="none" action="diag" \
        meta target-role="Stopped"
primitive restofencechili1 stonith:fence_ipmilan \
        params ipaddr="11.1.0.11" login="super" passwd="pass" pcmk_host_check="none" action="diag" \
        meta target-role="Stopped"
primitive syslog-ng-syslog-ng lsb:hasyslog-ng \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40" on-fail="restart" \
        meta migration-threshold="3"
primitive tftp-tftp-server lsb:xinetd \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
group dhcp IPaddr-dhcp dhcp-dhcp-server \
        meta target-role="Started" migration-threshold="1"
group dns IPaddr-dns dns-dns-server \
        meta target-role="Started" migration-threshold="1"
group monitoring-master IPaddr-monitoring-master monitoring-master-syslog-ng monitoring-master-nagios monitoring-master-errorManager monitoring-master-eventManager monitoring-master-powerManager \
        meta target-role="Started" migration-threshold="1"
group mysql IPaddr-mysql mysql-mysqld \
        meta target-role="Started" migration-threshold="1"
group nfs IPaddr-nfs nfs-nfs-server \
        meta target-role="Started" migration-threshold="1"
group postgresql IPaddr-postgresql postgresql-clusterdb \
        meta target-role="Started" migration-threshold="1"
group tftp IPaddr-tftp tftp-tftp-server \
        meta target-role="Started" migration-threshold="1"
clone clone-dlm dlm \
        meta target-role="Started" globally-unique="false" interleave="true"
clone clone-fs-BCM-MCO fs-BCM-MCO \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-conf fs-BCM-conf \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-console fs-BCM-console \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-data fs-BCM-data \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-log fs-BCM-log \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-storage fs-BCM-storage \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-mysql-fs-DBs mysql-fs-DBs \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-o2cb o2cb \
        meta target-role="Started" globally-unique="false" interleave="true"
clone clone-postgresql-fs-DBs postgresql-fs-DBs \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-syslog-ng syslog-ng-syslog-ng \
        meta interleave="true" ordered="false" target-role="Stopped" \
        meta target-role="Stopped"
location forbiddenloc-restofencechili0 restofencechili0 -inf: chili0
location forbiddenloc-restofencechili1 restofencechili1 -inf: chili1
location loc1-group-dhcp dhcp +100: chili0
location loc1-group-dns dns +100: chili1
location loc1-group-monitoring-master monitoring-master +100: chili0
location loc1-group-mysql mysql +100: chili1
location loc1-group-nfs nfs +100: chili1
location loc1-group-postgresql postgresql +100: chili1
location loc1-group-tftp tftp +100: chili0
location loc1-restofencechili0 restofencechili0 +inf: chili1
location loc1-restofencechili1 restofencechili1 +inf: chili0
colocation coloc-clone-fs-BCM-MCO-o2cb inf: clone-fs-BCM-MCO clone-o2cb
colocation coloc-clone-fs-BCM-conf-o2cb inf: clone-fs-BCM-conf clone-o2cb
colocation coloc-clone-fs-BCM-console-o2cb inf: clone-fs-BCM-console clone-o2cb
colocation coloc-clone-fs-BCM-data-o2cb inf: clone-fs-BCM-data clone-o2cb
colocation coloc-clone-fs-BCM-log-o2cb inf: clone-fs-BCM-log clone-o2cb
colocation coloc-clone-fs-BCM-storage-o2cb inf: clone-fs-BCM-storage clone-o2cb
colocation coloc-clone-mysql-fs-DBs-o2cb inf: clone-mysql-fs-DBs clone-o2cb
colocation coloc-clone-postgresql-fs-DBs-o2cb inf: clone-postgresql-fs-DBs clone-o2cb
colocation coloc-fs-BCM-MCO-monitoring-master +inf: monitoring-master clone-fs-BCM-MCO
colocation coloc-fs-BCM-MCO-nfs +inf: nfs clone-fs-BCM-MCO
colocation coloc-fs-BCM-conf-monitoring-master +inf: monitoring-master clone-fs-BCM-conf
colocation coloc-fs-BCM-conf-nfs +inf: nfs clone-fs-BCM-conf
colocation coloc-fs-BCM-console-nfs +inf: nfs clone-fs-BCM-console
colocation coloc-fs-BCM-data-monitoring-master +inf: monitoring-master clone-fs-BCM-data
colocation coloc-fs-BCM-data-nfs +inf: nfs clone-fs-BCM-data
colocation coloc-fs-BCM-log-monitoring-master +inf: monitoring-master clone-fs-BCM-log
colocation coloc-fs-BCM-log-nfs +inf: nfs clone-fs-BCM-log
colocation coloc-mysql-fs-DBs-mysql +inf: mysql clone-mysql-fs-DBs
colocation coloc-postgresql-fs-DBs-postgresql +inf: postgresql clone-postgresql-fs-DBs
colocation o2cb-with-dlm inf: clone-o2cb clone-dlm
order order-clone-fs-BCM-MCO-o2cb inf: clone-o2cb clone-fs-BCM-MCO
order order-clone-fs-BCM-conf-o2cb inf: clone-o2cb clone-fs-BCM-conf
order order-clone-fs-BCM-console-o2cb inf: clone-o2cb clone-fs-BCM-console
order order-clone-fs-BCM-data-o2cb inf: clone-o2cb clone-fs-BCM-data
order order-clone-fs-BCM-log-o2cb inf: clone-o2cb clone-fs-BCM-log
order order-clone-fs-BCM-storage-o2cb inf: clone-o2cb clone-fs-BCM-storage
order order-clone-mysql-fs-DBs-o2cb inf: clone-o2cb clone-mysql-fs-DBs
order order-clone-postgresql-fs-DBs-o2cb inf: clone-o2cb clone-postgresql-fs-DBs
order order-monitoring-master inf: clone-fs-BCM-MCO clone-fs-BCM-log clone-fs-BCM-data clone-fs-BCM-conf monitoring-master
order order-mysql inf: clone-mysql-fs-DBs mysql
order order-nfs inf: clone-fs-BCM-console clone-fs-BCM-MCO clone-fs-BCM-log clone-fs-BCM-data clone-fs-BCM-conf nfs
order order-postgresql inf: clone-postgresql-fs-DBs postgresql
order start-o2cb-after-dlm inf: clone-dlm clone-o2cb
property $id="cib-bootstrap-options" \
        dc-version="1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore" \
        default-resource-stickiness="5000" \
        last-lrm-refresh="1286452453"
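The order constraints above form a dependency chain: clone-dlm, then clone-o2cb, then the OCFS2 filesystem clones, then the service groups that use them. One way to double-check that chain (and to catch an accidental cycle) is to feed it to a topological sort. The sketch below is hypothetical: `start_sequence` is my own helper built on Python's standard `graphlib`, the resource lists are retyped from the `order` lines, and it assumes that an `order` line naming more than two resources means each one starts before the next. It models only ordering, not colocation, and Pacemaker may start independent resources in parallel, so the result is one valid sequence rather than the actual one.

```python
# Hypothetical check of the "order ... inf:" constraints from the crm
# configuration above. Needs Python 3.9+ for graphlib.
from graphlib import TopologicalSorter

# Every OCFS2 filesystem clone that is ordered after clone-o2cb.
FS_CLONES = ["clone-fs-BCM-MCO", "clone-fs-BCM-conf", "clone-fs-BCM-console",
             "clone-fs-BCM-data", "clone-fs-BCM-log", "clone-fs-BCM-storage",
             "clone-mysql-fs-DBs", "clone-postgresql-fs-DBs"]

# Each tuple is the resource list of one order line; assumption: a list of
# N resources means resource i starts before resource i+1.
ORDERS = [("clone-dlm", "clone-o2cb")]
ORDERS += [("clone-o2cb", fs) for fs in FS_CLONES]
ORDERS += [
    ("clone-fs-BCM-MCO", "clone-fs-BCM-log", "clone-fs-BCM-data",
     "clone-fs-BCM-conf", "monitoring-master"),
    ("clone-mysql-fs-DBs", "mysql"),
    ("clone-fs-BCM-console", "clone-fs-BCM-MCO", "clone-fs-BCM-log",
     "clone-fs-BCM-data", "clone-fs-BCM-conf", "nfs"),
    ("clone-postgresql-fs-DBs", "postgresql"),
]


def start_sequence(orders):
    """Return one start order satisfying every chain; raises CycleError on a loop."""
    graph = {}  # node -> set of resources that must start before it
    for chain in orders:
        for before, after in zip(chain, chain[1:]):
            graph.setdefault(before, set())
            graph.setdefault(after, set()).add(before)
    return list(TopologicalSorter(graph).static_order())


print(" -> ".join(start_sequence(ORDERS)))
```

Because clone-dlm is the only resource with no predecessor, it always comes first, and anything that needs an OCFS2 mount lands after clone-o2cb.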

