[Linux-cluster] Cluster fails after fencing by DRAC
Jorge Gonzalez
jorge.gonzalez at degesys.com
Thu Jan 10 16:18:21 UTC 2008
Hi all!
I have a problem with a 3-node cluster. When I run "fence_node node1",
node1 is rebooted by the DRAC successfully. When node1 comes back up, it freezes:
------------------
starting clvmd: dlm: got connection from 32
dlm: connecting to 33
dlm: got connection from 33
[frozen]
* cman_tool services shows:
type   level  name       id        state
fence  0      default    0001001f  none
[31 32 33]
dlm    1      clvmd      00010020  none
[31 32 33]
dlm    1      rgmanager  00020020  none
[32 33]
It seems the rgmanager group does not include node 31 (?).
* clustat shows:
Member Status: Quorate
Member Name            ID   Status
------ ----            ---- ------
xenr3u1.domain.com     31   Online
xenr3u2.domain.com     32   Online, Local
xenr3u3.domain.com     33   Online
-------------------
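To make the comparison between the "cman_tool services" and clustat outputs easier to repeat after each reboot, here is a small Python sketch. The helper names parse_services and check_groups are hypothetical, and the parser assumes exactly the column layout pasted in this mail (group line, then an optional "[ids]" member line), which may differ in other cman versions. It flags any group whose state is not "none" or whose member list lacks a node that clustat reports as online.

```python
# Sample taken from the first "cman_tool services" output in this mail.
SAMPLE_OK = """\
type level name id state
fence 0 default 0001001f none
[31 32 33]
dlm 1 clvmd 00010020 none
[31 32 33]
dlm 1 rgmanager 00020020 none
[32 33]
"""


def parse_services(text):
    """Parse pasted output into dicts with type, name, state, members.

    Assumption: each group line has five fields (type level name id state)
    and may be followed by a "[id id ...]" line listing its members.
    """
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("type "):
            continue  # skip blank lines and the header row
        if line.startswith("["):
            # A member list belongs to the most recent group line.
            if groups:
                groups[-1]["members"] = {int(n) for n in line.strip("[]").split()}
            continue
        fields = line.split()
        if len(fields) >= 5:
            groups.append({"type": fields[0], "name": fields[2],
                           "state": fields[4], "members": None})
    return groups


def check_groups(text, cluster_nodes):
    """List groups not in state 'none' or missing an online node."""
    problems = []
    for g in parse_services(text):
        label = f"{g['type']}:{g['name']}"
        if g["state"] != "none":
            problems.append(f"{label} state={g['state']}")
        if g["members"] is None:
            problems.append(f"{label} no member list")
        else:
            missing = sorted(set(cluster_nodes) - g["members"])
            if missing:
                problems.append(f"{label} missing nodes {missing}")
    return problems


if __name__ == "__main__":
    # Node IDs 31, 32 and 33 are the ones clustat reports as Online.
    for problem in check_groups(SAMPLE_OK, [31, 32, 33]):
        print(problem)  # prints "dlm:rgmanager missing nodes [31]"
```

Fed the second output (after the second reboot), the same check reports the FAIL_ALL_STOPPED and FAIL_STOP_WAIT states and the missing rgmanager member list.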
Then I rebooted node1 again:
Starting cluster
Loading modules DLM .......
done
starting ccsd
starting cman
starting daemons
starting fencing
[frozen again]
After a long time, "starting fencing" finished [done], but the groups in cman_tool services are now in failure states:
* cman_tool services shows:
type   level  name       id        state
fence  0      default    0001001f  FAIL_ALL_STOPPED
[31 32 33]
dlm    1      clvmd      00010020  FAIL_STOP_WAIT
[31 32 33]
dlm    1      rgmanager  00020020  FAIL_STOP_WAIT
* clustat shows:
Member Status: Quorate
Member Name            ID   Status
------ ----            ---- ------
xenr3u1.domain.com     31   Online
xenr3u2.domain.com     32   Online, Local
xenr3u3.domain.com     33   Online
/etc/init.d/rgmanager restart
Shutting down Cluster Service Manager...
Waiting for services to stop:
[waits a very long time]
----------------------------------
I found this page (a Japanese blog post, translated to English)
(http://translate.google.com/translate?u=http%3A%2F%2Fken-etsu-tech.blogspot.com%2F2007%2F11%2Fred-hat-cluster-kernel-xen.html&langpair=ja%7Cen&hl=es&ie=UTF-8).
It describes exactly the same problem. Is it a kernel bug? A clvmd bug?
Linux xenr3u2 2.6.18-8.1.15.el5xen #1 SMP Mon Oct 22 09:01:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
cman-2.0.64-1.0.1.el5
rgmanager-2.0.24-1.el5.centos
lvm2-cluster-2.02.16-3.el5
Sometimes the node starts correctly and cman_tool services looks fine too, so the problem is intermittent.
* /etc/lvm/lvm.conf:
devices {
dir = "/dev"
scan = [ "/dev" ]
filter = [ "a/.*/" ]
cache = "/etc/lvm/.cache"
write_cache_state = 1
sysfs_scan = 1
md_component_detection = 1
}
log {
verbose = 0
syslog = 1
overwrite = 0
level = 0
indent = 1
command_names = 0
prefix = " "
}
backup {
backup = 1
backup_dir = "/etc/lvm/backup"
archive = 1
archive_dir = "/etc/lvm/archive"
retain_min = 10
retain_days = 30
}
shell {
history_size = 100
}
global {
library_dir = "/usr/lib64"
umask = 077
test = 0
activation = 1
proc = "/proc"
locking_type = 3
fallback_to_clustered_locking = 1
fallback_to_local_locking = 1
locking_dir = "/var/lock/lvm"
}
activation {
missing_stripe_filler = "/dev/ioerror"
reserved_stack = 256
reserved_memory = 8192
process_priority = -18
mirror_region_size = 512
mirror_log_fault_policy = "allocate"
mirror_device_fault_policy = "remove"
}
That's all ;-)
Thanks in advance