[rdo-list] Overcloud pacemaker services restart behavior causes downtime

Thu Aug 4 15:37:38 UTC 2016

I'll test.

For the rabbitmq issue you need this patch:
https://github.com/ClusterLabs/resource-agents/commit/1d15f54923a77969fb035313eb979ee6efb3470c

Had the same problem too :)

On Thu, Aug 4, 2016 at 4:32 PM, Ilja Maslov <imaslov at dispersivegroup.com>
wrote:

> Not on this fresh install, but a few weeks back I saw that when the
> controller nodes restarted, new services were registered under FQDN names
> and came up, and I was able to safely clean up the original services
> registered under short hostnames.  But I haven’t re-tested controller
> restarts afterwards.
>
>
>
> With my fresh install, rabbitmq is not coming up upon reboot (‘unknown
> error’ (1)), so I need to fix this first before I’m able to proceed with
> testing.   I’ll let you know how it goes.
>
>
>
> Ilja
>
>
>
> *From:* Pedro Sousa [mailto:pgsousa at gmail.com]
> *Sent:* Thursday, August 04, 2016 11:23 AM
> *To:* Ilja Maslov <imaslov at dispersivegroup.com>
> *Cc:* Raoul Scarazzini <rasca at redhat.com>; rdo-list <rdo-list at redhat.com>
>
> *Subject:* Re: [rdo-list] Overcloud pacemaker services restart behavior
> causes downtime
>
>
>
> Hi Ilja,
>
>
>
> I noticed that too. Did you try to delete the services that are marked
> down and retest?
>
>
>
> Thanks
>
>
>
> On Thu, Aug 4, 2016 at 4:12 PM, Ilja Maslov <imaslov at dispersivegroup.com>
> wrote:
>
> Hi,
>
> I've noticed similar behavior on Mitaka installed from
> trunk/mitaka/passed-ci.  I'd appreciate it if you could put me in CC.
>
> An additional detail: during initial deployment, nova services, neutron
> agents and heat engines are registered with the short hostnames, and upon
> controller node restart these all show state=down.  This is probably
> because the hosts files are rewritten, with the FQDN as the first entry,
> after the services have already been started.  I do not know to what extent
> pacemaker resources are monitored, but it could be related to the problem
> you are reporting.
>
> Cheers,
> Ilja
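
To make the short-hostname versus FQDN observation above easier to check, here is a minimal Python sketch. It assumes the service records have already been parsed into dicts; the keys `binary`, `host` and `state`, the helper names, and the sample data are all hypothetical and are not taken from any OpenStack client. It only flags a "down" record when the same service is "up" under an FQDN with the same short name, so nothing else is treated as safe to clean up.

```python
from collections import defaultdict


def short_name(host):
    """Return the part of a hostname before the first dot."""
    return host.split(".", 1)[0]


def stale_short_name_services(services):
    """Find 'down' short-hostname records that have an 'up' FQDN twin.

    `services` is a list of dicts with the hypothetical keys
    'binary', 'host' and 'state'.
    """
    # Map each service binary to the short names that are up under an FQDN.
    up_fqdn = defaultdict(set)
    for svc in services:
        if svc["state"] == "up" and "." in svc["host"]:
            up_fqdn[svc["binary"]].add(short_name(svc["host"]))

    # A record is stale if it is down, uses a short hostname, and the same
    # binary is up under an FQDN that shortens to that hostname.
    return [
        svc for svc in services
        if svc["state"] == "down"
        and "." not in svc["host"]
        and svc["host"] in up_fqdn[svc["binary"]]
    ]


# Hypothetical sample data, for illustration only.
services = [
    {"binary": "nova-scheduler", "host": "overcloud-controller-2", "state": "down"},
    {"binary": "nova-scheduler", "host": "overcloud-controller-2.localdomain", "state": "up"},
    {"binary": "nova-scheduler", "host": "overcloud-controller-0", "state": "up"},
    {"binary": "nova-conductor", "host": "overcloud-controller-1", "state": "down"},
]

for svc in stale_short_name_services(services):
    print(svc["binary"], svc["host"])
```

In this sample only the nova-scheduler record on overcloud-controller-2 is reported. The nova-conductor record is down but has no FQDN twin, so it is left alone as a possibly real failure.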
>
>
>
> -----Original Message-----
> From: rdo-list-bounces at redhat.com [mailto:rdo-list-bounces at redhat.com] On
> Behalf Of Raoul Scarazzini
> Sent: Thursday, August 04, 2016 9:31 AM
> To: Pedro Sousa <pgsousa at gmail.com>
> Cc: rdo-list <rdo-list at redhat.com>
> Subject: Re: [rdo-list] Overcloud pacemaker services restart behavior
> causes downtime
>
> That will be great, thank you, put me in CC so I can follow this.
>
> Thanks,
>
> --
> Raoul Scarazzini
> rasca at redhat.com
>
> On 04/08/2016 15:29, Pedro Sousa wrote:
> > Hi Raoul,
> >
> > this only happens when the node comes back online after booting. When I
> > stop the node with "pcs cluster stop", everything works fine, even if
> > VIP is active on that node.
> >
> > Anyway I will file a bugzilla.
> >
> > Thanks
> >
> >
> >
> >
> > On Thu, Aug 4, 2016 at 1:51 PM, Raoul Scarazzini <rasca at redhat.com> wrote:
> >
> >     Ok, so we are on mitaka. Here we have VIPs that are an (Optional)
> >     dependency for haproxy, which is a (Mandatory) dependency for
> >     openstack-core, on which all the others (nova, neutron, cinder and
> >     so on) depend.
> >     This means that if you are rebooting a controller on which a VIP is
> >     active you will NOT have a restart of openstack-core, since haproxy
> >     will not be restarted, because of the OPTIONAL constraint.
> >     So the behavior you're describing is quite strange.
> >     Maybe other components are in the game here. Can you open a bugzilla
> >     with the exact steps you're using to reproduce the problem and share
> >     the sosreports of your systems?
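
The constraint reasoning above can be sketched in a few lines of Python. This is only a toy model, not how pacemaker itself computes transitions: it assumes a restart propagates along Mandatory ordering constraints and never along Optional ones. The resource names are simplified, and the Mandatory kind on the openstack-core to nova/neutron/cinder edges is an assumption, since the message does not state it.

```python
from collections import deque

# (first, then, kind) ordering constraints as described in the message above.
# The kind of the openstack-core -> nova/neutron/cinder constraints is not
# stated in the thread; "Mandatory" is an assumption made for illustration.
CONSTRAINTS = [
    ("vip", "haproxy", "Optional"),
    ("haproxy", "openstack-core", "Mandatory"),
    ("openstack-core", "nova", "Mandatory"),
    ("openstack-core", "neutron", "Mandatory"),
    ("openstack-core", "cinder", "Mandatory"),
]


def restart_set(resource, constraints=CONSTRAINTS):
    """Return the resources expected to restart when `resource` restarts.

    Only Mandatory edges propagate a restart; Optional edges are skipped.
    """
    affected = {resource}
    queue = deque([resource])
    while queue:
        current = queue.popleft()
        for first, then, kind in constraints:
            if first == current and kind == "Mandatory" and then not in affected:
                affected.add(then)
                queue.append(then)
    return affected


print(sorted(restart_set("vip")))
print(sorted(restart_set("haproxy")))
```

Under these assumptions a VIP restart affects only the VIP, while a haproxy restart would cascade to openstack-core and everything ordered after it. That matches the expectation stated above that moving a VIP alone should not restart the other services.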
> >
> >     Thanks,
> >
> >     --
> >     Raoul Scarazzini
> >     rasca at redhat.com
> >
> >     On 04/08/2016 12:34, Pedro Sousa wrote:
> >     > Hi,
> >     >
> >     > I use mitaka from centos sig repos:
> >     >
> >     > Centos 7.2
> >     > centos-release-openstack-mitaka-1-3.el7.noarch
> >     > pacemaker-cli-1.1.13-10.el7_2.2.x86_64
> >     > pacemaker-1.1.13-10.el7_2.2.x86_64
> >     > pacemaker-remote-1.1.13-10.el7_2.2.x86_64
> >     > pacemaker-cluster-libs-1.1.13-10.el7_2.2.x86_64
> >     > pacemaker-libs-1.1.13-10.el7_2.2.x86_64
> >     > corosynclib-2.3.4-7.el7_2.3.x86_64
> >     > corosync-2.3.4-7.el7_2.3.x86_64
> >     > resource-agents-3.9.5-54.el7_2.10.x86_64
> >     >
> >     > Let me know if you need more info.
> >     >
> >     > Thanks
> >     >
> >     >
> >     >
> >     > On Thu, Aug 4, 2016 at 11:21 AM, Raoul Scarazzini <rasca at redhat.com> wrote:
> >     >
> >     >     Hi,
> >     >     can you please give us more information about the environment
> >     >     you are using? Release, package versions and so on.
> >     >
> >     >     --
> >     >     Raoul Scarazzini
> >     >     rasca at redhat.com
> >     >
> >     >     On 04/08/2016 11:34, Pedro Sousa wrote:
> >     >     > Hi all,
> >     >     >
> >     >     > I have an overcloud with 3 controller nodes, everything is
> >     >     > working fine, the problem is when I reboot one of the
> >     >     > controllers. When the node comes online, all the services
> >     >     > (nova-api, neutron-server) on the other nodes are also
> >     >     > restarted, causing a couple of minutes of downtime until
> >     >     > everything is recovered.
> >     >     >
> >     >     > In the example below I restarted controller2 and I see these
> >     >     > messages on controller0. My question is if this is the expected
> >     >     > behavior, because in my opinion it shouldn't happen.
> >     >     >
> >     >     > Authorization Failed: Service Unavailable (HTTP 503)
> >     >     > == Glance images ==
> >     >     > Service Unavailable (HTTP 503)
> >     >     > == Nova managed services ==
> >     >     > No handlers could be found for logger "keystoneauth.identity.generic.base"
> >     >     > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> >     >     > == Nova networks ==
> >     >     > No handlers could be found for logger "keystoneauth.identity.generic.base"
> >     >     > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> >     >     > == Nova instance flavors ==
> >     >     > No handlers could be found for logger "keystoneauth.identity.generic.base"
> >     >     > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> >     >     > == Nova instances ==
> >     >     > No handlers could be found for logger "keystoneauth.identity.generic.base"
> >     >     > ERROR (ServiceUnavailable): Service Unavailable (HTTP 503)
> >     >     > [root@overcloud-controller-0 ~]# openstack-status
> >     >     > Broadcast message from systemd-journald@overcloud-controller-0.localdomain (Thu 2016-08-04 09:22:31 UTC):
> >     >     >
> >     >     > haproxy[2816]: proxy neutron has no server available!
> >     >     >
> >     >     > Thanks,
> >     >     > Pedro Sousa
> >     >     >
> >     >     >
> >     >     >
> >     >     >
> >     >     > _______________________________________________
> >     >     > rdo-list mailing list
> >     >     > rdo-list at redhat.com
> >     >     > https://www.redhat.com/mailman/listinfo/rdo-list
> >     >     >
> >     >     > To unsubscribe: rdo-list-unsubscribe at redhat.com
> >     >     >
> >     >
> >     >
> >
> >
>