[Linux-cluster] can't communicate with fenced -1

Gian Paolo Buono gpbuono at gmail.com
Wed Jun 25 09:24:04 UTC 2008


Hi,
another problem: the clurgmgrd process won't die:

[root at yoda2 ~]# /etc/init.d/rgmanager stop
Shutting down Cluster Service Manager...
Waiting for services to stop:

but nothing happens...

[root at yoda2 ~]# ps -ef | grep clurgmgrd
root      6620     1 55 Jun03 ?        12-02:06:46 clurgmgrd
[root at yoda2 ~]# kill -9 6620
[root at yoda2 ~]# ps -ef | grep clurgmgrd

and the clvmd process is in a similar state:

[root at yoda2 ~]# /etc/init.d/clvmd status
clvmd dead but subsys locked
active volumes: LV06 LV_nex2
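
The "dead but subsys locked" status usually means the init script's lock file
was left behind when the daemon died. Once no clvmd process is left running, a
minimal cleanup along these lines (assuming the standard RHEL lock path
/var/lock/subsys/clvmd) should let the init script start it again, although
clvmd will only come up cleanly once cman itself is healthy:

  # ps -ef | grep '[c]lvmd'           (confirm nothing is left running)
  # rm -f /var/lock/subsys/clvmd      (clear the stale lock file)
  # service clvmd start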

Please help ... I don't want to reboot yoda2 ...

bye


On Wed, Jun 25, 2008 at 10:55 AM, Gian Paolo Buono <gpbuono at gmail.com>
wrote:

> Hi,
> if I try to restart cman on yoda2:
> [root at yoda2 ~]# /etc/init.d/cman restart
> Stopping cluster:
>    Stopping fencing... done
>    Stopping cman... done
>    Stopping ccsd... done
>    Unmounting configfs... done
>                                                            [  OK  ]
> Starting cluster:
>    Enabling workaround for Xend bridged networking... done
>    Loading modules... done
>    Mounting configfs... done
>    Starting ccsd... done
>    Starting cman... done
>    Starting daemons... done
>    Starting fencing... failed
>
>                                                            [FAILED]
> [root at yoda2 ~]# tail -f /var/log/messages
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] Members Joined:
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ]   r(0) ip(172.20.0.174)
> Jun 25 10:50:42 yoda2 openais[18429]: [SYNC ] This node is within the
> primary component and will provide service.
> Jun 25 10:50:42 yoda2 openais[18429]: [TOTEM] entering OPERATIONAL state.
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
> 172.20.0.174
> Jun 25 10:50:42 yoda2 openais[18429]: [CLM  ] got nodejoin message
> 172.20.0.175
> Jun 25 10:50:42 yoda2 openais[18429]: [CPG  ] got joinlist message from
> node 2
> Jun 25 10:50:42 yoda2 openais[18429]: [CMAN ] cman killed by node 1 because
> we were killed by cman_tool or other application
> Jun 25 10:50:42 yoda2 ccsd[18421]: Initial status:: Quorate
> Jun 25 10:50:43 yoda2 gfs_controld[18455]: cman_init error 111
> Jun 25 10:51:10 yoda2 ccsd[18421]: Unable to connect to cluster
> infrastructure after 30 seconds.
> Jun 25 10:51:37 yoda2 snmpd[4764]: Connection from UDP: [172.20.0.32
> ]:55090
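>
> To see where things are stuck, a few diagnostics could be run on both nodes
> (assuming the stock RHEL5 cluster tools are installed):
>
>   # cman_tool status        (quorum and membership as cman sees them)
>   # cman_tool nodes         (per-node join/leave state)
>   # group_tool ls           (did fenced ever join the fence domain?)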
>
>
> this server is running 3 Xen domUs and I can't reboot yoda2 :( ..
>
> best regards... and sorry for my English :)
>
> 2008/6/25 GS R <gsrlinux at gmail.com>:
>
>>
>>>
>>>
>>> 2008/6/25 GS R <gsrlinux at gmail.com>:
>>>
>>>>
>>>>
>>>> On 6/24/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We have two RHEL5.1 boxes installed sharing a single iSCSI EMC2 SAN,
>>>>> without fence devices. The system is configured as a high-availability
>>>>> setup for Xen guests.
>>>>>
>>>>> One of the most frequently recurring problems is fence_tool related.
>>>>>
>>>>>   # service cman start
>>>>>   Starting cluster:
>>>>>      Loading modules... done
>>>>>      Mounting configfs... done
>>>>>      Starting ccsd... done
>>>>>      Starting cman... done
>>>>>      Starting daemons... done
>>>>>  Starting fencing... fence_tool: can't communicate with fenced -1
>>>>>
>>>>>  # fenced -D
>>>>>   1204556546 cman_init error 0 111
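>>>>>
>>>>> (Error 111 is most likely errno 111, ECONNREFUSED: fenced cannot reach
>>>>> the cman socket. A quick check, assuming the stock RHEL5 daemons, is
>>>>> whether the cluster infrastructure is actually up before retrying:)
>>>>>
>>>>>   # ps -ef | egrep '[a]isexec|[c]csd'
>>>>>   # cman_tool status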
>>>>>
>>>>>   # clustat
>>>>>   CMAN is not running.
>>>>>
>>>>>   # cman_tool join
>>>>>
>>>>>   # clustat
>>>>>   msg_open: Connection refused
>>>>>
>>>>>   Member Status: Quorate
>>>>>     Member Name                        ID   Status
>>>>>     ------ ----                        ---- ------
>>>>>     yoda1                             1 Online, Local
>>>>>     yoda2                             2 Offline
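>>>>>
>>>>> (If cman comes up but fenced does not, one possible next step, assuming
>>>>> the stock RHEL5 tools, is to join the fence domain by hand and then
>>>>> recheck the member status:)
>>>>>
>>>>>   # fence_tool join
>>>>>   # clustat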
>>>>>
>>>>> Sometimes this problem gets solved if the two machines are rebooted at
>>>>> the same time. But in the current HA configuration, I cannot guarantee
>>>>> that the two systems will be rebooted at the same time for every problem
>>>>> we face. This is my config file:
>>>>>
>>>>> ###################################cluster.conf####################################
>>>>> <?xml version="1.0"?>
>>>>> <cluster alias="yoda-cl" config_version="2" name="yoda-cl">
>>>>>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="3"/>
>>>>>         <clusternodes>
>>>>>                 <clusternode name="yoda2" nodeid="1" votes="1">
>>>>>                         <fence/>
>>>>>                 </clusternode>
>>>>>                 <clusternode name="yoda1" nodeid="2" votes="1">
>>>>>                         <fence/>
>>>>>                 </clusternode>
>>>>>         </clusternodes>
>>>>>         <cman expected_votes="1" two_node="1"/>
>>>>>         <rm>
>>>>>                 <failoverdomains/>
>>>>>                 <resources/>
>>>>>         </rm>
>>>>>         <fencedevices/>
>>>>> </cluster>
>>>>> ###################################cluster.conf####################################
>>>>> Regards.
>>>>>
>>>> Hi
>>>>
>>>> I configured a two node cluster with no fence device on RHEL5.1.
>>>> The cluster started and stopped with no issues. The only difference that
>>>> I see is that I have used FQDN in my cluster.conf
>>>>
>>>> i.e., <clusternode name="yoda2.gsr.com" nodeid="1" votes="1">
>>>>
>>>> Check whether your /etc/hosts has the FQDN in it.
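>>>>
>>>> For example, entries along these lines on both nodes (the addresses are
>>>> taken from the log earlier in the thread, the domain is just a
>>>> placeholder, and the address-to-host mapping is only a guess):
>>>>
>>>>   172.20.0.174   yoda2.example.com   yoda2
>>>>   172.20.0.175   yoda1.example.com   yoda1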
>>>>
>>>> Thanks
>>>> Gowrishankar Rajaiyan
>>>>
>>>
>>
>> On 6/25/08, Gian Paolo Buono <gpbuono at gmail.com> wrote:
>>
>>> Hi,
>>> the problem with my cluster is that it starts up well, but after two days
>>> the problem I have described appears, and it only gets solved if the two
>>> machines are rebooted at the same time.
>>>
>>> Thanks
>>> Gian Paolo
>>>
>>
>>
>> Hi Gian
>>
>> Could you please attach the logs.
>>
>> Thanks
>> Gowrishankar Rajaiyan
>>
>
>

