[Linux-cluster] IP Relocate Error / IP Restart error

Tue Jul 10 14:50:44 UTC 2007

Lon Hohberger wrote:
> On Mon, Jul 09, 2007 at 04:06:40PM +0200, dan.deshayes at algitech.com wrote:
>   
>> Hi,
>> thanks for the reply, but I'm not sure that's my problem.
>> I couldn't find the syntax for disabling exclusivity (I'm not using the GUI),
>> but as far as I've understood it's disabled by default. I tried
>> exclusive="0" (not sure if it's the right syntax, though) but that didn't solve
>> my problem.
>> But if the cluster were running in exclusive mode, relocation
>> shouldn't work either, right?
>> As stated earlier, the service restarts fine as long as the node already
>> has an external IP.
>> Does anyone have other ideas? Maybe it's related to the "IP monitor failing
>> periodically"? But I don't have any problems running the cluster as long as
>> the bond0 interface goes down, so maybe not.
>>     
>
> I haven't figured out the cause here, but disabling the 'ping' test
> seems to fix it.
>
> (edit ip.sh and change the 'ping' command to /bin/true or whatever)
>
>   
I'm afraid it didn't help much.
I changed the pingcmd in the function ping_check to /bin/true and restarted
the rgmanagers, but it still didn't work.

Here is my full configuration: http://nangilima.se/cluster.conf

I can get the full cluster running without problems when first starting it,
but when I then try to restart the service with 'clusvcadm -R' it says:
Jul 10 16:22:20 asl012 clurgmgrd[412]: <notice> Stopping service service:www-project1
Jul 10 16:22:31 asl012 clurgmgrd[412]: <notice> Service service:www-project1 is stopped
Jul 10 16:22:31 asl012 clurgmgrd[412]: <notice> Starting stopped service service:www-project1
Jul 10 16:22:32 asl012 clurgmgrd[412]: <notice> start on ip "<external ip 1>" returned 1 (generic error)
Jul 10 16:22:32 asl012 clurgmgrd[412]: <warning> #68: Failed to start service:www-project1; return value: 1
Jul 10 16:22:32 asl012 clurgmgrd[412]: <notice> Stopping service service:www-project1
Jul 10 16:22:32 asl012 clurgmgrd: [412]: <err> script:psql-db: stop of /etc/init.d/postgresql failed (returned 1)
Jul 10 16:22:32 asl012 clurgmgrd[412]: <notice> stop on script "psql-db" returned 1 (generic error)
Jul 10 16:22:32 asl012 clurgmgrd[412]: <crit> #12: RG service:www-project1 failed to stop; intervention required
Jul 10 16:22:32 asl012 clurgmgrd[412]: <notice> Service service:www-project1 is failed
Jul 10 16:22:32 asl012 clurgmgrd[412]: <crit> #13: Service service:www-project1 failed to stop cleanly

Then I disable the service and enable it on node usl001-mgmnt, which
works fine (since that node has network access through its own IP and route):
Jul 10 16:25:18 usl001 clurgmgrd[30130]: <notice> Starting disabled service service:www-project1
Jul 10 16:25:18 usl001 avahi-daemon[3533]: Registering new address record for <external ip 1> on bond0.
Jul 10 16:25:22 usl001 clurgmgrd[30130]: <notice> Service service:www-project1 started
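
For reference, the restart, disable, and enable steps described so far, plus the relocate attempts between the three nodes, can be written out as a small dry-run script. This is only a sketch: the helper name svc is hypothetical, it merely prints each command unless DRY_RUN=0 is set, and it assumes the usual clusvcadm options (-R restart, -d disable, -e enable, -r relocate, -m target member).

```shell
#!/bin/sh
# Sketch only: prints the clusvcadm commands by default (DRY_RUN=1).
# Service and node names are the ones used in this message.
SERVICE="www-project1"
DRY_RUN="${DRY_RUN:-1}"

# svc is a hypothetical helper: echo the command, run it only if DRY_RUN=0.
svc() {
    echo "clusvcadm $*"
    if [ "$DRY_RUN" = "0" ]; then
        clusvcadm "$@"
    fi
}

svc -R "$SERVICE"                   # restart in place (the step that fails on asl012)
svc -d "$SERVICE"                   # disable the failed service
svc -e "$SERVICE" -m usl001-mgmnt   # enable it on usl001
svc -r "$SERVICE" -m usl002-mgmnt   # relocate to usl002
svc -r "$SERVICE" -m usl001-mgmnt   # relocate back to usl001
svc -r "$SERVICE" -m asl012-mgmnt   # relocate back to asl012
```

Running it with DRY_RUN=0 on a cluster member would execute the commands for real, so it is safer to read the printed sequence first.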

Relocating it to node usl002-mgmnt and then back to usl001-mgmnt also
works. But it never goes back to asl012-mgmnt unless I manually put the IP
and route back.
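
Since the difference between the working and failing cases seems to be whether the node already has its IP and route, a quick check like the following may help confirm the state before a restart or relocate. It is a minimal sketch: the function names are hypothetical, and they only inspect the text printed by the standard `ip` commands, which is fed to them on stdin.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that read `ip` output on stdin.

# has_addr ADDR: succeed if ADDR shows up as an IPv4 address in the output
# of `ip -o -4 addr show dev bond0`.
has_addr() {
    grep -qF "inet $1/"
}

# has_default_route: succeed if the output of `ip route show` contains a
# default route.
has_default_route() {
    grep -q '^default '
}

# Usage on the node (substitute the real address for the placeholder):
#   ip -o -4 addr show dev bond0 | has_addr "<external ip 1>" || echo "address missing"
#   ip route show | has_default_route || echo "default route missing"
```

The fixed-string match (grep -F) keeps the dots in the address from being treated as regex wildcards.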

I'm using a bond0 interface configured as follows:
DEVICE=bond0
USERCTL=no
ONBOOT=yes
BROADCAST=<broadcast>
NETWORK=<network>.32
NETMASK=255.255.255.224
IPADDR=<external ip 1>
GATEWAY=<gw ip>

The slave interfaces eth0 and eth3 are configured like this (with DEVICE set
to eth0 or eth3 respectively):
DEVICE=eth0
USERCTL=no
ONBOOT=yes
MASTER=bond0
SLAVE=yes
BOOTPROTO=none
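
To rule out the bond itself, the link state of each slave can be read from the bonding driver's status file. The sketch below assumes the usual layout of /proc/net/bonding/bond0 ("Slave Interface:" lines, each followed by an "MII Status:" line); the function name slave_status is hypothetical.

```shell
#!/bin/sh
# Sketch only: slave_status is a hypothetical helper.
# It reads /proc/net/bonding/bond0-style text on stdin and prints
# "<slave> <mii status>" for each slave interface. The bond-level
# "MII Status" line that appears before the first slave is skipped.
slave_status() {
    awk '
        /^Slave Interface:/            { slave = $3 }
        /^MII Status:/ && slave != ""  { print slave, $3; slave = "" }
    '
}

# Usage on the node:
#   slave_status < /proc/net/bonding/bond0
```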

I can supply more info if anyone wants to give it a shot.
Sorry for repeating my question, but I'm up against a deadline and walking
blind ;)

Regards, Dan