
Re: [Linux-cluster] kernel bug at fs/dlm/lowcomms.c:647!



More information about the problem from the crash dump, in case it helps:

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-100.0.19.el5/vmlinux
    DUMPFILE: /var/var/crash/127.0.0.1-2010-10-18-16:42:07/vmcore  [PARTIAL DUMP]
        CPUS: 64
        DATE: Mon Oct 18 16:41:48 2010
      UPTIME: 00:15:00
LOAD AVERAGE: 1.06, 1.22, 1.65
       TASKS: 1594
    NODENAME: chili0
     RELEASE: 2.6.32-100.0.19.el5
     VERSION: #1 SMP Fri Sep 17 17:51:41 EDT 2010
     MACHINE: x86_64  (1999 Mhz)
      MEMORY: 64 GB
       PANIC: "kernel BUG at fs/dlm/lowcomms.c:647!"
         PID: 27062
     COMMAND: "dlm_recv/34"
        TASK: ffff880c7caa00c0  [THREAD_INFO: ffff880c77c6a000]
         CPU: 34
       STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 27062  TASK: ffff880c7caa00c0  CPU: 34  COMMAND: "dlm_recv/34"
 #0 [ffff880c77c6b910] machine_kexec at ffffffff8102cc9b
 #1 [ffff880c77c6b990] crash_kexec at ffffffff810964d4
 #2 [ffff880c77c6ba60] oops_end at ffffffff81439bd9
 #3 [ffff880c77c6ba90] die at ffffffff81015639
 #4 [ffff880c77c6bac0] do_trap at ffffffff8143952c
 #5 [ffff880c77c6bb10] do_invalid_op at ffffffff81013902
 #6 [ffff880c77c6bbb0] invalid_op at ffffffff81012b7b
    [exception RIP: receive_from_sock+1364]
    RIP: ffffffffa02406c3  RSP: ffff880c77c6bc60  RFLAGS: 00010246
    RAX: 0000000000000030  RBX: ffff8810774b8d30  RCX: ffff88087c4548f8
    RDX: 0000000000000030  RSI: ffff880876dce000  RDI: ffffffff81398045
    RBP: ffff880c77c6be50   R8: ffff000000000000   R9: ffff880c77c6b900
    R10: ffff880c77c6b8f0  R11: 0000000000000030  R12: 0000000000000030
    R13: ffff8810774b8d20  R14: ffff880c7caa00c0  R15: ffffffffa023ecca
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff880c77c6be58] process_recv_sockets at ffffffffa023ecea
 #8 [ffff880c77c6be78] worker_thread at ffffffff81071802
 #9 [ffff880c77c6bee8] kthread at ffffffff810756d3
#10 [ffff880c77c6bf48] kernel_thread at ffffffff81012dea
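Both reports point at the same instruction. crash prints the offset in decimal (receive_from_sock+1364), while the console oops prints it in hex together with the function size (receive_from_sock+0x554/0x6ed). Here is a small Python sketch that checks this and derives the start address of receive_from_sock. parse_oops_symbol is a hypothetical helper, not part of crash or gdb:

```python
# Hypothetical helper: reconcile the crash "bt" frame (decimal offset) with
# the console oops symbol (hex offset/size) for the same faulting RIP.
def parse_oops_symbol(sym: str) -> tuple[str, int, int]:
    """Split 'receive_from_sock+0x554/0x6ed' into (name, offset, size)."""
    name, rest = sym.split("+", 1)
    offset_hex, size_hex = rest.split("/", 1)
    return name, int(offset_hex, 16), int(size_hex, 16)


rip = 0xFFFFFFFFA02406C3  # exception RIP reported by both crash and the oops
name, offset, size = parse_oops_symbol("receive_from_sock+0x554/0x6ed")

func_start = rip - offset  # address where receive_from_sock begins
print(f"{name}: start={func_start:#x} offset={offset} ({offset:#x}) size={size:#x}")
print("offset lies inside the function:", offset < size)
```

With a debuginfo dlm.ko matching 2.6.32-100.0.19.el5 loaded in gdb (an assumption about what is available on this system), `list *(receive_from_sock+0x554)` should show the source around lowcomms.c:647.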

On 18/10/2010 17:33, Welterlen Benoit wrote:
Hi all,


I'm running some tests on OCFS2 with a 2.6.32-100 kernel (Oracle) and with RHEL6/Fedora kernels, and I hit a crash (kernel BUG) in fs/dlm/lowcomms.c, as you can see below. I have a crash dump if you need more information. I'm lost and would appreciate some pointers on where to look to debug this problem.

Thanks

Regards,

Benoit



Kernel 2.6.32-100.0.19.el5 on an x86_64
chili0 login:
------------[ cut here ]------------
kernel BUG at fs/dlm/lowcomms.c:647!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/dlm/14E8093BB71D447EBEE691622CF86B9C/control
CPU 34
Modules linked in: ocfs2(U) ocfs2_nodemanager(U) nfsd(U) exportfs(U) sctp(U) libcrc32c(U) ocfs2_stack_user(U) ocfs2_stackglue(U) dlm(U) configfs(U) acpi_cpufreq(U) freq_table(U) ipmi_devintf(U) ipmi_si(U) ipmi_msghandler(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) auth_rpcgss(U) sunrpc(U) ipv6(U) scsi_dh_emc(U) dm_round_robin(U) dm_multipath(U) iTCO_wdt(U) iTCO_vendor_support(U) mlx4_core(U) i2c_i801(U) igb(U) pcspkr(U) i2c_core(U) ioatdma(U) dca(U) ahci(U) uhci_hcd(U) ehci_hcd(U) lpfc(U) scsi_transport_fc(U) scsi_tgt(U) [last unloaded: ocfs2_nodemanager]
Pid: 27062, comm: dlm_recv/34 Not tainted 2.6.32-100.0.19.el5 #1 bullx super-node
RIP: 0010:[<ffffffffa02406c3>]  [<ffffffffa02406c3>] receive_from_sock+0x554/0x6ed [dlm]
RSP: 0018:ffff880c77c6bc60  EFLAGS: 00010246
RAX: 0000000000000030 RBX: ffff8810774b8d30 RCX: ffff88087c4548f8
RDX: 0000000000000030 RSI: ffff880876dce000 RDI: ffffffff81398045
RBP: ffff880c77c6be50 R08: ffff000000000000 R09: ffff880c77c6b900
R10: ffff880c77c6b8f0 R11: 0000000000000030 R12: 0000000000000030
R13: ffff8810774b8d20 R14: ffff880c7caa00c0 R15: ffffffffa023ecca
FS: 0000000000000000(0000) GS:ffff88048e600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000fcb078 CR3: 0000000001001000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process dlm_recv/34 (pid: 27062, threadinfo ffff880c77c6a000, task ffff880c7caa00c0)
Stack:
 ffff880c77c6bc70 ffffffff8122fa24 ffff880c77c6bc90 ffffffff8122faca
<0> ffff88048e414ec0 0000100000000002 0000000000000000 ffffffff00000000
<0> 0000000000000000 0000000000000000 ffffffffa024bb20 0000000000000030
Call Trace:
 [<ffffffff8122fa24>] ? cpumask_next+0x19/0x1b
 [<ffffffff8122faca>] ? cpumask_next_and+0x20/0x32
 [<ffffffffa023ecca>] ? process_recv_sockets+0x0/0x28 [dlm]
 [<ffffffffa023ecea>] process_recv_sockets+0x20/0x28 [dlm]
 [<ffffffff81071802>] worker_thread+0x14d/0x1ed
 [<ffffffff81075a7c>] ? autoremove_wake_function+0x0/0x3d
 [<ffffffff810716b5>] ? worker_thread+0x0/0x1ed
 [<ffffffff810756d3>] kthread+0x6e/0x76
 [<ffffffff81012dea>] child_rip+0xa/0x20
 [<ffffffff81075665>] ? kthread+0x0/0x76
 [<ffffffff81012de0>] ? child_rip+0x0/0x20
Code: 29 e7 ff ff e9 2d 01 00 00 41 8b 74 24 10 0f b7 d0 48 c7 c7 d1 8c 24 a0 31 c0 e8 ab 71 e1 e0 e9 12 01 00 00 41 83 7d 08 00 75 04 <0f> 0b eb fe 4d 8d 7d 68 49 be 00 00 00 00 00 16 00 00 41 8b 55
RIP  [<ffffffffa02406c3>] receive_from_sock+0x554/0x6ed [dlm]
 RSP <ffff880c77c6bc60>
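The Code: line marks the byte at RIP with <..>. On x86, BUG() and BUG_ON() compile to the UD2 instruction (0f 0b), which is why the trap is reported as "invalid opcode". The sketch below extracts the trapping bytes and the bytes just before them. split_code_line is a hypothetical helper:

```python
# Sketch: locate the trapping instruction in an x86 oops "Code:" line.
# The kernel brackets the byte at RIP with <..>; 0f 0b is the x86 UD2
# opcode that BUG()/BUG_ON() emit.
UD2 = bytes([0x0F, 0x0B])


def split_code_line(line: str) -> tuple[bytes, bytes]:
    """Return (bytes before RIP, bytes from RIP onward)."""
    tokens = line.split(":", 1)[1].split()
    rip_index = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    raw = bytes(int(t.strip("<>"), 16) for t in tokens)
    return raw[:rip_index], raw[rip_index:]


code = ("Code: 29 e7 ff ff e9 2d 01 00 00 41 8b 74 24 10 0f b7 d0 48 c7 c7 d1 8c "
        "24 a0 31 c0 e8 ab 71 e1 e0 e9 12 01 00 00 41 83 7d 08 00 75 04 <0f> 0b "
        "eb fe 4d 8d 7d 68 49 be 00 00 00 00 00 16 00 00 41 8b 55")

before, at_rip = split_code_line(code)
print("trapping opcode is UD2 (BUG):", at_rip.startswith(UD2))
print("preceding bytes:", before[-7:].hex(" "))
```

If I read the x86-64 encoding correctly, the preceding bytes 41 83 7d 08 00 75 04 are `cmpl $0x0,0x8(%r13)` followed by a `jne` that skips the 4-byte `ud2; jmp .` sequence. That would make the failed check a BUG_ON() on a zero 32-bit field at offset 8 of the structure R13 points to (ffff8810774b8d20). Matching this to the actual source of receive_from_sock in this build still needs to be verified.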
Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Linux version 2.6.32-100.0.19.el5 (mockbuild ca-build9 us oracle com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Fri Sep 17 17:51:41 EDT 2010
Command line: ro root=/dev/mapper/vg_chili0-lv_root rd_LVM_LV=vg_chili0/lv_root rd_LVM_LV=vg_chili0/lv_swap rd_NO_LUKS rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=fr-pc cgroup_disable=memory selinux=0 pcie_aspm=off nmi_watchdog=0 console=ttyS1,115200 maxcpus=1 reset_devices memmap=exactmap memmap=640K@0K memmap=195948K@33408K elfcorehdr=229356K memmap=308K#1993940K memmap=16K#2077704K memmap=4K#2077748K memmap=4K#2077764K memmap=44K#2077768K memmap=72K#2077812K memmap=4K#2077884K memmap=4K#2077888K memmap=4K#2077892K memmap=4K#2078024K memmap=2716K#2078052K memmap=1024K#69204860K memmap=128K#69205884K
KERNEL supported cpus:
  Intel GenuineIntel
  AMD AuthenticAMD
  Centaur CentaurHauls
BIOS-provided physical RAM map:
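The panic triggered kdump, and the capture kernel's command line describes its memory with memmap=exactmap followed by explicit regions. According to the kernel-parameters documentation, memmap=nn[KMG]@ss[KMG] marks usable RAM, # marks ACPI data and $ marks reserved memory. Below is a minimal decoder that assumes only those three forms. parse_memmap is a hypothetical helper:

```python
import re

# Sketch: decode memmap= region entries (kernel-parameters syntax):
#   memmap=nn[KMG]@ss[KMG]  usable RAM
#   memmap=nn[KMG]#ss[KMG]  ACPI data
#   memmap=nn[KMG]$ss[KMG]  reserved
KINDS = {"@": "usable", "#": "ACPI data", "$": "reserved"}
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
ENTRY = re.compile(r"memmap=(\d+)([KMG])([@#$])(\d+)([KMG])")


def parse_memmap(cmdline: str) -> list[tuple[str, int, int]]:
    """Return (kind, start_bytes, size_bytes) for each memmap region."""
    regions = []
    for size, size_unit, sep, start, start_unit in ENTRY.findall(cmdline):
        regions.append((KINDS[sep], int(start) * UNITS[start_unit],
                        int(size) * UNITS[size_unit]))
    return regions


cmdline = ("memmap=exactmap memmap=640K@0K memmap=195948K@33408K "
           "elfcorehdr=229356K memmap=308K#1993940K")
for kind, start, size in parse_memmap(cmdline):
    print(f"{kind:10} start={start:#x} size={size // 1024}K")
```

On this command line, the 195948K usable region starting at 33408K ends exactly at 229356K, which is the address elfcorehdr= points to.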

Here is the configuration:

[root@chili1 ~]# crm configure show
node chili0
node chili1
primitive IPaddr-dhcp ocf:Bull:IPaddr \
        params ip="11.1.0.20" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-dns ocf:Bull:IPaddr \
        params ip="11.1.0.21" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-monitoring-master ocf:Bull:IPaddr \
        params ip="11.1.0.22" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-mysql ocf:Bull:IPaddr \
        params ip="11.1.0.23" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-nfs ocf:Bull:IPaddr \
        params ip="11.1.0.24" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-postgresql ocf:Bull:IPaddr \
        params ip="11.1.0.25" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive IPaddr-tftp ocf:Bull:IPaddr \
        params ip="11.1.0.26" \
        op monitor on-fail="restart" interval="30" \
        meta migration-threshold="1"
primitive dhcp-dhcp-server lsb:dhcpd \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive dlm ocf:pacemaker:controld \
        op monitor interval="120s"
primitive dns-dns-server lsb:named \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive fs-BCM-MCO ocf:Bull:Filesystem \
        params device="-L HA_MNGT:MCO" directory="/BCM/MCO" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-conf ocf:Bull:Filesystem \
        params device="-L HA_MNGT:CONF" directory="/BCM/conf" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-console ocf:Bull:Filesystem \
        params device="-L HA_MNGT:CONSOLE" directory="/BCM/console" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-data ocf:Bull:Filesystem \
        params device="-L HA_MNGT:RRDDBs" directory="/BCM/data" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-log ocf:Bull:Filesystem \
        params device="-L HA_MNGT:LOGs" directory="/BCM/log" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive fs-BCM-storage ocf:Bull:Filesystem \
        params device="-L HA_MNGT:STORAGE" directory="/BCM/storage" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive monitoring-master-errorManager lsb:errorManager \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive monitoring-master-eventManager lsb:eventManager \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive monitoring-master-nagios lsb:nagios \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive monitoring-master-powerManager lsb:powerManager \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive monitoring-master-syslog-ng lsb:syslog-ng-monitoring \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive mysql-fs-DBs ocf:Bull:Filesystem \
        params device="-L HA_MNGT:MYSQLDBs" directory="/var/lib/mysql" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive mysql-mysqld ocf:heartbeat:mysql \
        params binary="/usr/bin/mysqld_safe" pid="/var/run/mysqld/mysqld.pid" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive nfs-nfs-server ocf:heartbeat:nfsserver \
        params nfs_init_script="/etc/init.d/nfs" nfs_notify_cmd="/usr/sbin/sm-notify" nfs_shared_infodir="/BCM/log/nfs-server-logs" nfs_ip="11.1.0.24" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60"
primitive o2cb ocf:ocfs2:o2cb \
        op monitor interval="120s"
primitive postgresql-clusterdb ocf:heartbeat:pgsql \
        params pgdata="/var/lib/pgsql/data" \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
primitive postgresql-fs-DBs ocf:Bull:Filesystem \
        params device="-L HA_MNGT:PGSQLDBs" directory="/var/lib/pgsql/data" fstype="ocfs2" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40"
primitive restofencechili0 stonith:fence_ipmilan \
        params ipaddr="11.1.0.10" login="super" passwd="pass" pcmk_host_check="none" action="diag" \
        meta target-role="Stopped"
primitive restofencechili1 stonith:fence_ipmilan \
        params ipaddr="11.1.0.11" login="super" passwd="pass" pcmk_host_check="none" action="diag" \
        meta target-role="Stopped"
primitive syslog-ng-syslog-ng lsb:hasyslog-ng \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="20" timeout="40" on-fail="restart" \
        meta migration-threshold="3"
primitive tftp-tftp-server lsb:xinetd \
        op start interval="0" timeout="120" \
        op stop interval="0" timeout="120" \
        op monitor interval="20" timeout="60" on-fail="restart" start-delay="60" \
        meta migration-threshold="1"
group dhcp IPaddr-dhcp dhcp-dhcp-server \
        meta target-role="Started" migration-threshold="1"
group dns IPaddr-dns dns-dns-server \
        meta target-role="Started" migration-threshold="1"
group monitoring-master IPaddr-monitoring-master monitoring-master-syslog-ng monitoring-master-nagios monitoring-master-errorManager monitoring-master-eventManager monitoring-master-powerManager \
        meta target-role="Started" migration-threshold="1"
group mysql IPaddr-mysql mysql-mysqld \
        meta target-role="Started" migration-threshold="1"
group nfs IPaddr-nfs nfs-nfs-server \
        meta target-role="Started" migration-threshold="1"
group postgresql IPaddr-postgresql postgresql-clusterdb \
        meta target-role="Started" migration-threshold="1"
group tftp IPaddr-tftp tftp-tftp-server \
        meta target-role="Started" migration-threshold="1"
clone clone-dlm dlm \
        meta target-role="Started" globally-unique="false" interleave="true"
clone clone-fs-BCM-MCO fs-BCM-MCO \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-conf fs-BCM-conf \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-console fs-BCM-console \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-data fs-BCM-data \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-log fs-BCM-log \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-fs-BCM-storage fs-BCM-storage \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-mysql-fs-DBs mysql-fs-DBs \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-o2cb o2cb \
        meta target-role="Started" globally-unique="false" interleave="true"
clone clone-postgresql-fs-DBs postgresql-fs-DBs \
        meta interleave="true" ordered="false" true target-role="Started" \
        meta target-role="Started"
clone clone-syslog-ng syslog-ng-syslog-ng \
        meta interleave="true" ordered="false" target-role="Stopped" \
        meta target-role="Stopped"
location forbiddenloc-restofencechili0 restofencechili0 -inf: chili0
location forbiddenloc-restofencechili1 restofencechili1 -inf: chili1
location loc1-group-dhcp dhcp +100: chili0
location loc1-group-dns dns +100: chili1
location loc1-group-monitoring-master monitoring-master +100: chili0
location loc1-group-mysql mysql +100: chili1
location loc1-group-nfs nfs +100: chili1
location loc1-group-postgresql postgresql +100: chili1
location loc1-group-tftp tftp +100: chili0
location loc1-restofencechili0 restofencechili0 +inf: chili1
location loc1-restofencechili1 restofencechili1 +inf: chili0
colocation coloc-clone-fs-BCM-MCO-o2cb inf: clone-fs-BCM-MCO clone-o2cb
colocation coloc-clone-fs-BCM-conf-o2cb inf: clone-fs-BCM-conf clone-o2cb
colocation coloc-clone-fs-BCM-console-o2cb inf: clone-fs-BCM-console clone-o2cb
colocation coloc-clone-fs-BCM-data-o2cb inf: clone-fs-BCM-data clone-o2cb
colocation coloc-clone-fs-BCM-log-o2cb inf: clone-fs-BCM-log clone-o2cb
colocation coloc-clone-fs-BCM-storage-o2cb inf: clone-fs-BCM-storage clone-o2cb
colocation coloc-clone-mysql-fs-DBs-o2cb inf: clone-mysql-fs-DBs clone-o2cb
colocation coloc-clone-postgresql-fs-DBs-o2cb inf: clone-postgresql-fs-DBs clone-o2cb
colocation coloc-fs-BCM-MCO-monitoring-master +inf: monitoring-master clone-fs-BCM-MCO
colocation coloc-fs-BCM-MCO-nfs +inf: nfs clone-fs-BCM-MCO
colocation coloc-fs-BCM-conf-monitoring-master +inf: monitoring-master clone-fs-BCM-conf
colocation coloc-fs-BCM-conf-nfs +inf: nfs clone-fs-BCM-conf
colocation coloc-fs-BCM-console-nfs +inf: nfs clone-fs-BCM-console
colocation coloc-fs-BCM-data-monitoring-master +inf: monitoring-master clone-fs-BCM-data
colocation coloc-fs-BCM-data-nfs +inf: nfs clone-fs-BCM-data
colocation coloc-fs-BCM-log-monitoring-master +inf: monitoring-master clone-fs-BCM-log
colocation coloc-fs-BCM-log-nfs +inf: nfs clone-fs-BCM-log
colocation coloc-mysql-fs-DBs-mysql +inf: mysql clone-mysql-fs-DBs
colocation coloc-postgresql-fs-DBs-postgresql +inf: postgresql clone-postgresql-fs-DBs
colocation o2cb-with-dlm inf: clone-o2cb clone-dlm
order order-clone-fs-BCM-MCO-o2cb inf: clone-o2cb clone-fs-BCM-MCO
order order-clone-fs-BCM-conf-o2cb inf: clone-o2cb clone-fs-BCM-conf
order order-clone-fs-BCM-console-o2cb inf: clone-o2cb clone-fs-BCM-console
order order-clone-fs-BCM-data-o2cb inf: clone-o2cb clone-fs-BCM-data
order order-clone-fs-BCM-log-o2cb inf: clone-o2cb clone-fs-BCM-log
order order-clone-fs-BCM-storage-o2cb inf: clone-o2cb clone-fs-BCM-storage
order order-clone-mysql-fs-DBs-o2cb inf: clone-o2cb clone-mysql-fs-DBs
order order-clone-postgresql-fs-DBs-o2cb inf: clone-o2cb clone-postgresql-fs-DBs
order order-monitoring-master inf: clone-fs-BCM-MCO clone-fs-BCM-log clone-fs-BCM-data clone-fs-BCM-conf monitoring-master
order order-mysql inf: clone-mysql-fs-DBs mysql
order order-nfs inf: clone-fs-BCM-console clone-fs-BCM-MCO clone-fs-BCM-log clone-fs-BCM-data clone-fs-BCM-conf nfs
order order-postgresql inf: clone-postgresql-fs-DBs postgresql
order start-o2cb-after-dlm inf: clone-dlm clone-o2cb
property $id="cib-bootstrap-options" \
        dc-version="1.1.2-c6b59218ee949eebff30e837ff6f3824ed0ab86b" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        stonith-enabled="true" \
        no-quorum-policy="ignore" \
        default-resource-stickiness="5000" \
        last-lrm-refresh="1286452453"
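Each OCFS2 filesystem clone in this configuration relies on a matching colocation and order constraint with clone-o2cb. With this many near-identical constraints, a small script can confirm that none is missing. Below is a sketch that assumes one constraint per line in `crm configure show` output. missing_o2cb_deps is a hypothetical helper, and the sample deliberately leaves out the BCM-log order line to demonstrate a gap:

```python
# Sketch: report which OCFS2 filesystem clones lack a colocation with, or an
# order after, clone-o2cb. Assumes one constraint per line.
def missing_o2cb_deps(config: str, fs_clones: list[str]) -> dict[str, list[str]]:
    """Map each clone to the o2cb constraint kinds it lacks."""
    colocated, ordered = set(), set()
    for line in config.splitlines():
        words = line.split()
        if len(words) < 5:
            continue
        kind, resources = words[0], words[3:]
        # colocation <id> <score>: <rsc> <with-rsc>
        if kind == "colocation" and resources[-1] == "clone-o2cb":
            colocated.add(resources[0])
        # order <id> <score>: <first> <then> [...]
        elif kind == "order" and resources[0] == "clone-o2cb":
            ordered.update(resources[1:])
    result = {}
    for clone in fs_clones:
        gaps = [name for name, seen in (("colocation", colocated), ("order", ordered))
                if clone not in seen]
        if gaps:
            result[clone] = gaps
    return result


config = """\
colocation coloc-clone-fs-BCM-MCO-o2cb inf: clone-fs-BCM-MCO clone-o2cb
colocation coloc-clone-fs-BCM-log-o2cb inf: clone-fs-BCM-log clone-o2cb
order order-clone-fs-BCM-MCO-o2cb inf: clone-o2cb clone-fs-BCM-MCO
colocation o2cb-with-dlm inf: clone-o2cb clone-dlm
"""
print(missing_o2cb_deps(config, ["clone-fs-BCM-MCO", "clone-fs-BCM-log"]))
```

The script only understands the simple `colocation <id> <score>: <rsc> <with-rsc>` and `order <id> <score>: <first> <then>` forms used above. It does not handle resource sets or other crm syntax.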

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster


