[Spacewalk-list] osa-dispatcher Error Caught

Matt Moldvan matt at moldvan.com
Thu Mar 30 20:24:28 UTC 2017


Yeah, unfortunately the osa-dispatcher code is unaware of the need for
multiple dispatchers, which makes a multi-master setup for
redundancy/failover difficult.  Also, when you change the OSAD back end it
will recreate the password, and dirty hacks are necessary.  Maybe this has
been fixed in 2.6; however, in my last attempt I had a very tough time
upgrading from 2.5 and had to revert, and my priorities have shifted since
then.

Just this past week I ran into an issue where removing some packages
(VMware Tools) also removed OpenJDK on our Spacewalk master, and I had to
restore the master from backup.  After that, getting all my clients back to
an online state with OSAD was a struggle... the dispatcher would crash at
random times with little output, due to what I can only guess is one
client having an issue:

2017/03/30 14:56:57 -05:00 9475 0.0.0.0: osad/jabber_lib.main('ERROR',
(<type 'exceptions.NameError'>, NameError("global name 'jabber_id' is not
defined",), <traceback object at 0x1b930e0>))

And when it crashed, it would take all my clients down and they would
re-register, then crash again, until it finally stabilized.  This wouldn't
be a big deal for a small environment, but at 6,000-plus systems, the
traffic and load generated by OSAD clients doing this over and over again
until the OSA dispatcher worked is embarrassing.  I'm surprised it hasn't
caused an outage yet...
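A note on the error above: in Python 2, a `NameError` reading "global name 'jabber_id' is not defined" means the function referenced `jabber_id` without ever binding it locally, and no module-level name of that spelling existed either. That points to a bug in whichever dispatcher code path ran, which would be consistent with one client's data triggering the crash. The snippet below is a hypothetical reproduction of this class of bug, not the actual osad source; `handle_presence` and the `KeyError` stand-in are invented for illustration.

```python
# Hypothetical reproduction of the class of bug behind
# "global name 'jabber_id' is not defined"; this is NOT the real osad code.


def handle_presence(jid):
    """Process one client's presence; all names here are illustrative."""
    try:
        raise KeyError(jid)  # stand-in for a lookup failing for one bad client
    except KeyError:
        # Bug: the error path refers to `jabber_id`, which is never assigned
        # in this function nor defined at module level, so Python raises
        # NameError here instead of reporting the original problem.
        print("could not handle %s" % jabber_id)


try:
    handle_presence("osad-example@client.example.com/osad")
except NameError as exc:
    # Python 2 words this "global name 'jabber_id' is not defined";
    # Python 3 says "name 'jabber_id' is not defined".
    print(type(exc).__name__, exc)
```

Because the faulty line only runs on the error path, a dispatcher with this kind of bug would look healthy until a single misbehaving client sent it down that path.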

On Thu, Mar 30, 2017 at 4:02 PM Camp, Neil (NIH/NCI) [C] <neil.camp at nih.gov>
wrote:

> In the end I did move to pgsql, and it has really stabilized jabberd. I
> had to clear out the rhnpushdispatcher table a couple of times. I also got
> an invalid password error after moving to pgsql. I found a post at
> https://www.redhat.com/archives/spacewalk-list/2016-August/msg00091.html
> that helped. It suggested
>
> delete from authreg where username = 'rhn-dispatcher-sat';
>
> delete from "roster-items" where "collection-owner" =
> 'rhn-dispatcher-sat@ourmaster1.fqdn';
>
> delete from status where "collection-owner" =
> 'rhn-dispatcher-sat@ourmaster1.fqdn';
>
> delete from active where "collection-owner" =
> 'rhn-dispatcher-sat@ourmaster1.fqdn';
>
> From the jabberd database.
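The same four deletes can be generated for a specific master, which avoids hand-editing the owner JID in each statement. This is a minimal sketch, assuming a DB-API driver that uses `%s` placeholders (such as psycopg2); the helper `jabberd_cleanup_statements` and the example FQDN are hypothetical, while the table and column names are the ones from the post above.

```python
# Sketch: build the jabberd cleanup statements from the thread for a given
# Spacewalk master FQDN. The helper name is hypothetical.

DISPATCHER_USER = "rhn-dispatcher-sat"


def jabberd_cleanup_statements(master_fqdn):
    """Return (sql, params) pairs that remove the dispatcher's jabberd state.

    Placeholders are psycopg2-style %s; values are bound by the driver,
    never formatted into the SQL string here.
    """
    owner = "%s@%s" % (DISPATCHER_USER, master_fqdn)  # full dispatcher JID
    return [
        ("DELETE FROM authreg WHERE username = %s", (DISPATCHER_USER,)),
        ('DELETE FROM "roster-items" WHERE "collection-owner" = %s', (owner,)),
        ('DELETE FROM status WHERE "collection-owner" = %s', (owner,)),
        ('DELETE FROM active WHERE "collection-owner" = %s', (owner,)),
    ]


if __name__ == "__main__":
    for sql, params in jabberd_cleanup_statements("ourmaster1.example.com"):
        print(sql, params)
```

Each `(sql, params)` pair would then be passed to `cursor.execute(sql, params)` on a connection to the jabberd database, followed by a commit.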
>
>
>
> *From: *<spacewalk-list-bounces at redhat.com> on behalf of Konstantin
> Raskoshnyi <konrasko at gmail.com>
> *Reply-To: *"spacewalk-list at redhat.com" <spacewalk-list at redhat.com>
> *Date: *Thursday, March 30, 2017 at 2:18 PM
> *To: *"spacewalk-list at redhat.com" <spacewalk-list at redhat.com>
> *Subject: *Re: [Spacewalk-list] osa-dispatcher Error Caught
>
> I guess the main problem was caused by machines with login:
> osad-85cdcd1a3e <osad-85cdcd1a3e at ncias-p1466-v.nci.nih.gov>
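That diagnosis can be read off the debug log in the original message: the last client JID the dispatcher logged before `Error caught:` was osad-85cdcd1a3e. Below is a rough sketch that automates this reading, assuming one log entry per line and client logins of the form `osad-<hex>@host` as seen in this thread; `suspects_before_errors` is a hypothetical name.

```python
import re

# Matches osad client JIDs such as "osad-85cdcd1a3e@client.example.com".
# Assumption: client logins follow the "osad-<hex>" pattern seen in the thread.
OSAD_JID = re.compile(r"osad-[0-9a-f]+@[A-Za-z0-9.-]+")


def suspects_before_errors(lines):
    """Return the last osad JID logged before each 'Error caught' line."""
    suspects = []
    last_jid = None
    for line in lines:
        matches = OSAD_JID.findall(line)
        if matches:
            last_jid = matches[-1]  # remember the most recent client seen
        if "Error caught" in line and last_jid is not None:
            suspects.append(last_jid)
    return suspects
```

On a typical install the dispatcher log is assumed to be /var/log/rhn/osa-dispatcher.log (check your configuration); passing the open file to `suspects_before_errors` would list one suspect client per crash.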
>
> I recommend moving from Berkeley DB to pgsql.
>
> On Thu, Mar 30, 2017 at 9:49 AM, Camp, Neil (NIH/NCI) [C] <
> neil.camp at nih.gov> wrote:
>
> I was able to connect to the database and select one row. I deleted the
> row in rhnpushdispatcher and stopped jabberd. I removed
> /var/lib/jabberd/db/* and started jabberd. I waited for jabberd to start up
> and then started osa-dispatcher. It is staying up and I checked a host and
> it is showing as online for OSA status. Thank you for your help!
>
>
>
> *From: *<spacewalk-list-bounces at redhat.com> on behalf of Matt Moldvan <
> matt at moldvan.com>
> *Reply-To: *"spacewalk-list at redhat.com" <spacewalk-list at redhat.com>
> *Date: *Thursday, March 30, 2017 at 11:37 AM
> *To: *"spacewalk-list at redhat.com" <spacewalk-list at redhat.com>
> *Subject: *Re: [Spacewalk-list] osa-dispatcher Error Caught
>
> Looks like osa-dispatcher is having trouble connecting to your database...
> have you tried running "spacewalk-sql -i" from your master (or the same
> system you're seeing that error from) to get an idea of the connectivity
> from that system to your database?
>
> Once you have that tested, take a look at the rhnpushdispatcher table in
> the database. You can remove any entry there; osa-dispatcher will recreate
> it when you restart it...
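Matt's suggestion can be scripted against any DB-API 2.0 connection. This is a minimal sketch. `clear_push_dispatcher` is a hypothetical helper. The in-memory SQLite table in the demo is only a stand-in with an assumed `jabber_id` column, because the real rhnpushdispatcher table lives in the Spacewalk PostgreSQL database (reachable with a driver such as psycopg2).

```python
import sqlite3  # only for the self-contained demo below


def clear_push_dispatcher(conn):
    """Delete every rhnpushdispatcher row and return how many were removed.

    Works with any DB-API 2.0 connection. Per the thread, osa-dispatcher
    recreates its row on the next start.
    """
    cur = conn.cursor()
    cur.execute("DELETE FROM rhnpushdispatcher")
    removed = cur.rowcount  # rows affected by the DELETE
    conn.commit()
    return removed


if __name__ == "__main__":
    # Stand-in table for the demo; the real table has more columns.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE rhnpushdispatcher (jabber_id TEXT)")
    demo.execute(
        "INSERT INTO rhnpushdispatcher VALUES ('rhn-dispatcher-sat@master.example.com')"
    )
    print(clear_push_dispatcher(demo))  # 1
```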
>
>
>
> On Thu, Mar 30, 2017 at 10:47 AM Camp, Neil (NIH/NCI) [C] <
> neil.camp at nih.gov> wrote:
>
> Hello,
>
>
>
> osa-dispatcher starts, but dies after a few seconds. I have been digging
> through the archives and searching, but have not found a solution.
>
>
>
> 2017/03/30 10:35:25 -04:00 10622 0.0.0.0:
> osad/jabber_lib.subscribe_to_presence('Subscribed from', {})
>
> 2017/03/30 10:35:25 -04:00 10622 0.0.0.0:
> osad/jabber_lib.subscribe_to_presence('
> osad-85cdcd1a3e at ncias-p1466-v.nci.nih.gov',)
>
> 2017/03/30 10:35:25 -04:00 10622 0.0.0.0:
> rhnSQL/driver_postgresql._execute_wrapper('Executing SQL: "select * from
> rhnPushClient where jabber_id = %(p1)s" with bind params: {p1:
> osad-85cdcd1a3e at hostname/osad}',)
>
> 2017/03/30 10:35:26 -04:00 10622 0.0.0.0: osad/jabber_lib.main('ERROR',
> 'Error caught:')
>
>
>
> I have turned debugging up to 5 for osa-dispatcher and posted the last 4
> lines above. Jabberd appears to be running correctly; I see connections
> coming in from clients. It does log one error (SASL callback for
> non-existing host: spacewalk.fqdn). Does anyone have a suggestion as to
> what could be causing the error for osa-dispatcher?
>
> _______________________________________________
> Spacewalk-list mailing list
> Spacewalk-list at redhat.com
> https://www.redhat.com/mailman/listinfo/spacewalk-list
>