
[Linux-cluster] Cluster fails after fencing by DRAC



Hi all!

I have a problem with a 3-node cluster. When I run "fence_node node1", node1 is rebooted by DRAC successfully. But when node1 comes back up, it freezes:

------------------
starting clvmd: dlm: got connection from 32
dlm: connecting to 33
dlm: got connection from 33
[frozen]

* cman_tool services shows:
type             level name       id       state
fence            0     default    0001001f none
[31 32 33]
dlm              1     clvmd      00010020 none
[31 32 33]
dlm              1     rgmanager  00020020 none
[32 33]

It seems node 31 is missing from the rgmanager group (?)

* clustat shows:
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 xenr3u1.domain.com                  31 Online
 xenr3u2.domain.com                 32 Online, Local
 xenr3u3.domain.com                 33 Online

-------------------
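
The observation above (node 31 absent from the rgmanager group) can be checked mechanically. Below is a minimal Python sketch, assuming the `cman_tool services` output has the layout shown in this post: one line per group with five columns, followed by a bracketed member list. The function names `parse_services` and `missing_members` are hypothetical helpers, not part of any cluster tool, and the fence domain is used as the reference membership only as an assumption.

```python
import re

# Sample text copied from the output in this post.
SAMPLE = """\
type             level name       id       state
fence            0     default    0001001f none
[31 32 33]
dlm              1     clvmd      00010020 none
[31 32 33]
dlm              1     rgmanager  00020020 none
[32 33]
"""

def parse_services(text):
    """Map (type, name) -> (state, set of member node IDs)."""
    groups = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("type"):
            continue  # skip blank lines and the column header
        m = re.fullmatch(r"\[([\d\s]*)\]", line)
        if m and current is not None:
            # Member list belonging to the most recent group line.
            state = groups[current][0]
            groups[current] = (state, {int(n) for n in m.group(1).split()})
            continue
        fields = line.split()
        if len(fields) >= 5:
            current = (fields[0], fields[2])
            groups[current] = (fields[4], set())
    return groups

def missing_members(groups, reference=("fence", "default")):
    """Report nodes present in the reference group but absent elsewhere."""
    expected = groups[reference][1]
    return {
        key: sorted(expected - members)
        for key, (_state, members) in groups.items()
        if expected - members
    }

print(missing_members(parse_services(SAMPLE)))
```

On the sample above this reports node 31 as missing from the `rgmanager` group only, which matches what was read off by eye.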

Then I rebooted node1 again:
Starting cluster
   Loading modules DLM .......
done
starting ccsd
starting cman
starting daemons
starting fencing
[frozen again]

After a long time, "starting fencing" reports [done], but cman_tool services shows failed states.

* cman_tool services shows:
type             level name       id       state
fence            0     default    0001001f FAIL_ALL_STOPPED
[31 32 33]
dlm              1     clvmd      00010020 FAIL_STOP_WAIT
[31 32 33]
dlm              1     rgmanager  00020020 FAIL_STOP_WAIT

* clustat shows:
Member Status: Quorate

 Member Name                        ID   Status
 ------ ----                        ---- ------
 xenr3u1.domain.com                  31 Online
 xenr3u2.domain.com                 32 Online, Local
 xenr3u3.domain.com                 33 Online

/etc/init.d/rgmanager restart
Shutting down Cluster Service Manager...
Waiting for services to stop:
[hangs for a very long time]
----------------------------------
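
For watching this condition from a script, here is a small Python sketch that lists every group whose state column is not "none". It assumes the five-column layout shown above and treats any state other than "none" as worth reporting. That rule is an assumption made for illustration, not a documented list of states, and `stuck_groups` is a hypothetical helper name.

```python
# Sample text copied from the second cman_tool services output in this post.
SAMPLE = """\
type             level name       id       state
fence            0     default    0001001f FAIL_ALL_STOPPED
[31 32 33]
dlm              1     clvmd      00010020 FAIL_STOP_WAIT
[31 32 33]
dlm              1     rgmanager  00020020 FAIL_STOP_WAIT
"""

def stuck_groups(text):
    """Return (type, name, state) for each group whose state is not 'none'."""
    stuck = []
    for line in text.splitlines():
        fields = line.split()
        # Group lines have five columns: type, level, name, id, state.
        # The header also has five words, but its second word is not a number.
        if len(fields) == 5 and fields[1].isdigit():
            if fields[4] != "none":
                stuck.append((fields[0], fields[2], fields[4]))
    return stuck

for group_type, name, state in stuck_groups(SAMPLE):
    print(group_type, name, state)
```

Run against the first output in this post, where every group is in state "none", the function returns an empty list.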

I found this page, translated to English (http://translate.google.com/translate?u=http%3A%2F%2Fken-etsu-tech.blogspot.com%2F2007%2F11%2Fred-hat-cluster-kernel-xen.html&langpair=ja%7Cen&hl=es&ie=UTF-8).
It describes exactly the same problem. Is it a kernel bug? A clvmd bug?

Linux xenr3u2 2.6.18-8.1.15.el5xen #1 SMP Mon Oct 22 09:01:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
lvm2-cluster-2.02.16-3.el5



Sometimes the node starts OK and the cman_tool output is also OK.

* /etc/lvm.conf:

devices {
   dir = "/dev"
   scan = [ "/dev" ]
   filter = [ "a/.*/" ]
   cache = "/etc/lvm/.cache"
   write_cache_state = 1
   sysfs_scan = 1
   md_component_detection = 1
}
log {
   verbose = 0
   syslog = 1
   overwrite = 0
   level = 0
   indent = 1
   command_names = 0
   prefix = "  "
}
backup {
   backup = 1
   backup_dir = "/etc/lvm/backup"
   archive = 1
   archive_dir = "/etc/lvm/archive"
   retain_min = 10
   retain_days = 30
}
shell {
   history_size = 100
}
global {
   library_dir = "/usr/lib64"
   umask = 077
   test = 0
   activation = 1
   proc = "/proc"
   locking_type = 3
   fallback_to_clustered_locking = 1
   fallback_to_local_locking = 1
   locking_dir = "/var/lock/lvm"
}
activation {
   missing_stripe_filler = "/dev/ioerror"
   reserved_stack = 256
   reserved_memory = 8192
   process_priority = -18
   mirror_region_size = 512
   mirror_log_fault_policy = "allocate"
   mirror_device_fault_policy = "remove"
}
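
In the config above, `locking_type = 3` selects LVM's built-in clustered locking, which is the mode clvmd relies on, so it is worth confirming that every node carries the same value. Below is a minimal Python sketch for reading one setting out of an lvm.conf-style text. It assumes one setting per line and `section {` / `}` on their own lines, and it strips `#` comments naively, so a `#` inside a quoted value would confuse it. `lvm_setting` is a hypothetical helper, not an LVM API.

```python
import re

# Excerpt of the global section from the config in this post.
SAMPLE = """\
global {
   library_dir = "/usr/lib64"
   locking_type = 3
   fallback_to_clustered_locking = 1
   fallback_to_local_locking = 1
}
"""

def lvm_setting(conf_text, section, key):
    """Return the raw value of `key` inside `section { ... }`, or None."""
    in_section = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # naive comment removal
        if re.fullmatch(rf"{re.escape(section)}\s*\{{", line):
            in_section = True
        elif line == "}":
            in_section = False
        elif in_section:
            m = re.fullmatch(rf"{re.escape(key)}\s*=\s*(.+)", line)
            if m:
                return m.group(1).strip()
    return None

value = lvm_setting(SAMPLE, "global", "locking_type")
print("locking_type =", value)
if value != "3":
    print("warning: this node is not using built-in clustered locking")
```

Running the same check against the config file on each of the three nodes would show quickly whether one of them differs.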



That's all ;-)
Thanks in advance








begin:vcard
fn:Jorge Gonzalez y Hurtado de Mendoza
n:Gonzalez y Hurtado de Mendoza;Jorge
org:DEGESYS
adr:Edif 3 Plt 3ª;;Av de la Vega 15;Alcobendas;Madrid;28100;Spain
email;internet:jorge gonzalez degesys com
title:Tecnico de Sistemas
tel;work:+34911517194
tel;fax:+34911517199
url:http://www.degesys.com
version:2.1
end:vcard

