
Re: [Linux-cluster] can't communicate with fenced -1



Hi,
Another problem: the clurgmgrd process won't die:

[root yoda2 ~]# /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:   

but nothing happens, so I had to kill it by hand:

[root yoda2 ~]# ps -ef | grep clurgmgrd
root      6620     1 55 Jun03 ?        12-02:06:46 clurgmgrd
[root yoda2 ~]# kill -9 6620   
[root yoda2 ~]# ps -ef | grep clurgmgrd

And the clvmd process:

[root yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2

Please help me... I don't want to reboot yoda2.

bye


On Wed, Jun 25, 2008 at 10:55 AM, Gian Paolo Buono <gpbuono gmail com> wrote:
Hi,
If I try to restart cman on yoda2:
[root yoda2 ~]# /etc/init.d/cman restart
Stopping cluster:
   Stopping fencing... done
   Stopping cman... done
   Stopping ccsd... done
   Unmounting configfs... done
                                                           [  OK  ]
Starting cluster:
   Enabling workaround for Xend bridged networking... done

   Loading modules... done
   Mounting configfs... done
   Starting ccsd... done
   Starting cman... done
   Starting daemons... done
   Starting fencing... failed

                                                           [FAILED]
[root yoda2 ~]# tail -f /var/log/messages
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0) ip(172.20.0.174)
Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the primary component and will provide service.
Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message 172.20.0.174
Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message 172.20.0.175
Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message from node 2
Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because we were killed by cman_tool or other application
Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster infrastructure after 30 seconds.
Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32]:55090
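
The relevant lines in this log are the cman kill, the cman_init error and the ccsd timeout. A minimal Python sketch for pulling those out of a longer /var/log/messages follows; the pattern names and the scan() helper are hypothetical, and the patterns are taken only from the messages shown above. On Linux, errno 111 is ECONNREFUSED, so the "cman_init error 111" from gfs_controld most likely means the daemon could not connect to cman.

```python
import re

# Patterns copied from the log excerpt above; the labels are made up
# for this sketch and are not names used by cman or openais.
PATTERNS = {
    "cman_killed": re.compile(r"cman killed by node (\d+)"),
    "cman_init_error": re.compile(r"cman_init error (\d+)"),
    "ccsd_timeout": re.compile(r"Unable to connect to cluster infrastructure"),
}


def scan(lines):
    """Return (label, captured value or None) for every matching log line."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                value = match.group(1) if match.groups() else None
                hits.append((label, value))
    return hits


SAMPLE = [
    "Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.",
    "Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 "
    "because we were killed by cman_tool or other application",
    "Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111",
    "Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster "
    "infrastructure after 30 seconds.",
]

if __name__ == "__main__":
    for label, value in scan(SAMPLE):
        print(label, value)
```

To run it against the real log, pass an open file instead of SAMPLE, for example scan(open("/var/log/messages")).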


There are 3 Xen domUs running on this server, so I can't reboot yoda2 :(

Best regards, and sorry for my English :)

2008/6/25 GS R <gsrlinux gmail com>:

On 6/24/08, Gian Paolo Buono <gpbuono gmail com> wrote:
Hi,

We have two RHEL5.1 boxes installed sharing a
single iSCSI EMC2 SAN, without fence devices. The system is configured
as a high-availability system of Xen guests.

One of our most frequent problems is fence_tool related.

# service cman start
Starting cluster:
Loading modules... done
Mounting configfs... done
Starting ccsd... done
Starting cman... done
Starting daemons... done
Starting fencing... fence_tool: can't communicate with fenced -1

# fenced -D
1204556546 cman_init error 0 111

# clustat
CMAN is not running.

# cman_tool join

# clustat
msg_open: Connection refused
Member Status: Quorate
Member Name ID Status
------ ---- ---- ------
yoda1 1 Online, Local
yoda2 2 Offline

Sometimes this problem gets solved if the two machines are rebooted at
the same time. But in the current HA configuration, I cannot guarantee
two systems will be rebooted at the same time for every problem we
face. This is my config file:

###################################cluster.conf####################################

<?xml version="1.0"?>
<cluster alias="yoda-cl" config_version="2" name="yoda-cl">
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="yoda2" nodeid="1" votes="1">
<fence/>
</clusternode>
<clusternode name="yoda1" nodeid="2" votes="1">
<fence/>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<rm>
<failoverdomains/>
<resources/>
</rm>
<fencedevices/>
</cluster>
###################################cluster.conf####################################
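
A minimal Python sketch for summarizing the configuration above: it lists the node names with their nodeid values, the nodes whose <fence/> element is empty, and the two_node / expected_votes settings. The summarize() helper and the CLUSTER_CONF constant are hypothetical names; the XML is the file as posted. It can be useful to compare the name-to-nodeid mapping it prints with the IDs shown by clustat.

```python
import xml.etree.ElementTree as ET

# The cluster.conf exactly as posted above.
CLUSTER_CONF = """<?xml version="1.0"?>
<cluster alias="yoda-cl" config_version="2" name="yoda-cl">
<fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="yoda2" nodeid="1" votes="1">
<fence/>
</clusternode>
<clusternode name="yoda1" nodeid="2" votes="1">
<fence/>
</clusternode>
</clusternodes>
<cman expected_votes="1" two_node="1"/>
<rm>
<failoverdomains/>
<resources/>
</rm>
<fencedevices/>
</cluster>
"""


def summarize(conf_text):
    """Collect node IDs, unfenced nodes and the cman settings."""
    root = ET.fromstring(conf_text)
    nodes = {}
    unfenced = []
    for node in root.findall("./clusternodes/clusternode"):
        name = node.get("name")
        nodes[name] = int(node.get("nodeid"))
        fence = node.find("fence")
        # An absent or childless <fence/> means no fence method is defined.
        if fence is None or len(fence) == 0:
            unfenced.append(name)
    cman = root.find("cman")
    return {
        "nodes": nodes,
        "unfenced": unfenced,
        "fence_devices": len(root.findall("./fencedevices/fencedevice")),
        "two_node": cman is not None and cman.get("two_node") == "1",
        "expected_votes": int(cman.get("expected_votes")) if cman is not None else None,
    }


if __name__ == "__main__":
    for key, value in summarize(CLUSTER_CONF).items():
        print(key, value)
```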
Regards.
Hi
 
I configured a two node cluster with no fence device on RHEL5.1.
The cluster started and stopped with no issues. The only difference that I see is that I have used FQDN in my cluster.conf
 
i.e., <clusternode name="yoda2.gsr.com" nodeid="1" votes="1">
 
Check whether your /etc/hosts has the FQDN of each node in it.
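
A minimal Python sketch of that check, assuming the node names from cluster.conf should each appear as a name or alias in /etc/hosts. The helper names, the example.com domain and the 192.0.2.x addresses are placeholders for illustration, not values from this cluster.

```python
def hosts_entries(hosts_text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    table = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            table.setdefault(name, ip)  # first entry wins, like the resolver
    return table


def missing_names(hosts_text, node_names):
    """Return the node names that have no entry in the hosts text."""
    table = hosts_entries(hosts_text)
    return [name for name in node_names if name not in table]


# Placeholder hosts file: yoda2 is listed by short name only.
SAMPLE_HOSTS = """
127.0.0.1   localhost localhost.localdomain
192.0.2.11  yoda1.example.com yoda1
192.0.2.12  yoda2   # short name only
"""

if __name__ == "__main__":
    wanted = ["yoda1.example.com", "yoda2.example.com"]
    print(missing_names(SAMPLE_HOSTS, wanted))
```

On a real node, read the file with open("/etc/hosts").read() and pass the clusternode names from cluster.conf.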
 
Thanks
Gowrishankar Rajaiyan

 


On 6/25/08, Gian Paolo Buono <gpbuono gmail com> wrote:
Hi,

The problem with my cluster is that it starts up fine, but after about two days the problem I described appears, and it only goes away if both machines are rebooted at the same time.

Thanks
Gian Paolo
 
 
Hi Gian
 
Could you please attach the logs?
 
Thanks
Gowrishankar Rajaiyan



