[Spacewalk-list] Osad problems

Sorensen, Paul - (p) pauls at telenav.com
Fri Sep 2 18:34:57 UTC 2016


Just putting in my two cents – I’ve just tried out the solution mentioned here:

https://bugzilla.redhat.com/show_bug.cgi?id=662593#c3

It suggests commenting out all mentions of ‘roster’ in ‘/etc/jabberd/sm.xml’ and doubling the value of <max_fds> in c2s.xml, router.xml, and s2s.xml.

Detailed patch here:   https://bugzilla.redhat.com/attachment.cgi?id=478869&action=diff
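
For anyone who would rather script the second half of that change than edit three files by hand, here is a minimal sketch of the doubling step. It is not taken from the Bugzilla patch: the function name is made up for illustration, and the default element name max_fds is an assumption about stock jabberd2 configs, so pass whatever name your own c2s.xml, router.xml, and s2s.xml actually use. It works on the file text with a regular expression so that comments and layout are left alone.

```python
import re


def double_max_fds(xml_text, tag="max_fds"):
    """Return xml_text with every integer <tag>N</tag> value doubled.

    `tag` is an assumption; check the element name in your own files.
    Note: a match inside an XML comment would be doubled as well.
    """
    pattern = re.compile(
        r"(<{0}>\s*)(\d+)(\s*</{0}>)".format(re.escape(tag))
    )

    def _double(match):
        # Keep the surrounding tags and whitespace, replace only the number.
        doubled = int(match.group(2)) * 2
        return match.group(1) + str(doubled) + match.group(3)

    return pattern.sub(_double, xml_text)
```

One would read each file, pass its contents through the function, write the result back after keeping a backup, and then restart jabberd. Commenting out the ‘roster’ entries in sm.xml is deliberately left as a manual step here.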

And now all my clients are coming back as ‘Online since …’ – so this appears to have fixed my osad issues, which I’ve been trying to fix for months.

I hope this helps!


From: spacewalk-list-bounces at redhat.com [mailto:spacewalk-list-bounces at redhat.com] On Behalf Of Ree, Jan-Albert van
Sent: Friday, September 02, 2016 10:28 AM
To: spacewalk-list at redhat.com
Subject: Re: [Spacewalk-list] Osad problems


Sounds like you still have certificate issues.

Are you sure all clients are using the correct new cert? If unsure, you might want to try manually specifying the correct new cert in the /etc/sysconfig/rhn/osad.conf file to see if that helps.

Also, how did you replace the certs? Did you install a newer version of the certificates RPM?



The database is only updated if the osa-dispatcher is running properly.

The following post might be of some use too; it helped me a lot recently when debugging OSA-related issues: https://www.redhat.com/archives/spacewalk-list/2014-May/msg00124.html



Regards

Jan-Albert



Jan-Albert van Ree | Linux System Administrator | MARIN Support Group
MARIN | T +31 317 49 35 48 | J.A.v.Ree at marin.nl | www.marin.nl

________________________________
From: spacewalk-list-bounces at redhat.com on behalf of Konstantin Raskoshnyi <konrasko at gmail.com>
Sent: Friday, September 02, 2016 00:59
To: spacewalk-list at redhat.com
Subject: Re: [Spacewalk-list] Osad problems

Yes, such a pain, but here is something interesting: I have two servers.
The first was set up by a previous employee and was renamed manually, so I had to replace the cert on all clients and rename the server through the sp utility.
The second server was installed from scratch and has never been renamed, and it works smoothly without any problems with osa-dispatcher.

Anyway, thanks man

On Thu, Sep 1, 2016 at 12:57 PM, Matt Moldvan <matt at moldvan.com> wrote:
I have the same issues with 2.5 and latest OSAD packages... the connection still looks like it's established at the client side, but for some reason it has stopped trying to send data.  The master no longer sees the connection as open and therefore cannot send anything to it.

The only resolution I've found is to restart the client(s), but for so many systems this caused the dispatchers to become unresponsive during our maintenance windows.  Essentially, Puppet would run, restart OSAD, and it would consume all the HTTP connections and make the GUI unresponsive.  Update and reboot actions were picked up outside of the scheduled maintenance, and it was all around chaos.

So at this point I'm stuck babysitting OSAD status of systems because there is nothing easily found in /var/log/osad that indicates an issue, even though the client still has 5222 open to the dispatcher and the osad service is running.  In the Spacewalk database, the system is marked down... I ran an strace on the OSAD process on the client for about 30 minutes, and didn't see any attempts to do anything.

[me at osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
osad    21996 root    3u  IPv4 7392569      0t0      TCP osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
[me at osad-client1 ~]$ service osad status
osad (pid  21996) is running...
[me at osad-client1 ~]$ sudo lsof -Pp 21996 | grep TCP
osad    21996 root    3u  IPv4 7392569      0t0      TCP osad-client1:56939->spacewalk-master:5222 (ESTABLISHED)
[me at osad-client1 ~]$ sudo strace -fp 21996
Process 21996 attached
select(4, [3], [], [], NULL

---
rhnschema=# select s.name,pc.state_id from rhnpushclient pc, rhnserver s where s.name='osad-client1' and pc.server_id=s.id;
         name          | state_id
-----------------------+----------
 osad-client1          |        2
(1 row)

Even though osad-client1 thought it was still connected, the master didn't have a corresponding connection on 5222:
[me at spacewalk-master ~]$ netstat -a | grep osad-client1
[me at spacewalk-master ~]$

For me, changing the values in /etc/jabberd/*.xml as recommended in https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD wasn't going to work... I tried that and all systems would be disconnected, then would reconnect, causing some (perhaps insignificant) load on the database as well as unnecessary network traffic and client processing.  I could see the number of systems marked as "online" in the database flapping wildly between 1,000 and 5,000 over time.

One thing I did notice on the systems that were marked offline... a netstat showed two connections, one in CLOSE_WAIT status and another in ESTABLISHED.  On restart of OSAD, only one was there, in ESTABLISHED state and the system was marked online again.

I'm thinking that the OSAD Python code isn't closing the sockets properly when an error is encountered, and leaves the client thinking it's still connected, while the master doesn't have a corresponding connection to send data to.

Basically, as a workaround, I think I'm going to have systems restart OSAD if they see connections open on 5222 in CLOSE_WAIT status... until something better comes along and the client code is fixed up.  Unfortunately the workaround isn't even a full one... not every system had multiple connections, but it's a step toward more systems staying usable than before.
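
A rough sketch of that workaround, under a few assumptions: the client has net-tools installed so that "netstat -tn" prints the usual six columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State), osad is managed with "service osad restart" as elsewhere in this thread, and the function names are invented here for illustration. It only catches the CLOSE_WAIT case described above, not clients that show a single ESTABLISHED connection and are still marked offline.

```python
import subprocess


def stale_osad_connections(netstat_output, port=5222):
    """Return the netstat lines for TCP connections to `port` in CLOSE_WAIT."""
    stale = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp/tcp6 row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        remote, state = fields[4], fields[5]
        if remote.endswith(":%d" % port) and state == "CLOSE_WAIT":
            stale.append(line)
    return stale


def restart_osad_if_stale(port=5222):
    """Restart osad if a CLOSE_WAIT connection to the dispatcher port exists.

    Needs root for the restart. Returns True when a restart was issued.
    """
    output = subprocess.check_output(["netstat", "-tn"]).decode()
    if stale_osad_connections(output, port):
        subprocess.check_call(["service", "osad", "restart"])
        return True
    return False
```

Calling restart_osad_if_stale() from cron with some random delay per host would probably be gentler on the dispatcher than restarting every client at once, given the load problem mentioned earlier.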

On Thu, Sep 1, 2016 at 1:26 PM Konstantin Raskoshnyi <konrasko at gmail.com> wrote:
2.4. I tried that; actually, after I ran spacewalk-service restart it helped for one day.

Now it's the same again, but with no errors on either side.

On Wed, Aug 31, 2016 at 9:06 AM, Matthew Madey <mattmadey at gmail.com> wrote:

What version of Spacewalk are you running? You likely need to reset the osad credentials on the clients. This typically only occurs when the jabber database has been corrupted.

On the clients, run the below commands:



rm -f /etc/sysconfig/rhn/osad-auth.conf ; service osad restart

You may find the below links helpful

https://fedorahosted.org/spacewalk/wiki/OsadHowTo

https://fedorahosted.org/spacewalk/wiki/JabberAndOSAD





On Aug 30, 2016 4:43 PM, "Konstantin Raskoshnyi" <konrasko at gmail.com> wrote:
Something strange is happening with some of my osad clients (~1/3).

They don't pick up any jobs from osa-dispatcher, and there are no errors when starting the service.

Also, if I restart osad on sp, I see these logs:

Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] [::ffff:172.90.7.220, port=43046] disconnect jid=osad-e43e3265db at spacewalk15.ooma.internal/osad, packets: 29, bytes: 3738
Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session ended: jid=osad-e43e3265db at spacewalk15.ooma.internal/osad
Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: user unloaded jid=osad-e43e3265db at spacewalk15.ooma.internal
Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] traditional.digest authentication succeeded: osad-e43e3265db@/osad ::ffff:172.90.7.220:43454 TLS
Aug 30 14:32:20 spacewalk15 jabberd/c2s[51907]: [142] requesting session: jid=osad-e43e3265db at spacewalk15.ooma.internal/osad
Aug 30 14:32:20 spacewalk15 jabberd/sm[51904]: session started: jid=osad-e43e3265db at spacewalk15.ooma.internal/osad

So looks like everything should be fine
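
With many clients it gets hard to eyeball whether every "session ended" is followed by a "session started". The sketch below tallies the two jabberd/sm messages per JID; it is only an illustration, the function name is made up, and it assumes the raw syslog form of the lines, where the JID contains "@" with no spaces (the list archive renders "@" as " at ").

```python
import re
from collections import Counter

# Matches the jabberd session manager lines shown in the log excerpt.
SESSION_RE = re.compile(
    r"jabberd/sm\[\d+\]: session (started|ended): jid=(\S+)"
)


def unbalanced_sessions(log_lines):
    """Return {jid: started_minus_ended} for JIDs where the count is non-zero."""
    balance = Counter()
    for line in log_lines:
        match = SESSION_RE.search(line)
        if match:
            event, jid = match.groups()
            balance[jid] += 1 if event == "started" else -1
    # A JID that ended and started the same number of times is dropped.
    return {jid: count for jid, count in balance.items() if count != 0}
```

Over a log window that begins with a restart, a negative value would point at a client whose session ended and never came back. A zero balance, as in the excerpt above, only says the session was re-established; it does not prove the client is picking up jobs.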

_______________________________________________
Spacewalk-list mailing list
Spacewalk-list at redhat.com<mailto:Spacewalk-list at redhat.com>
https://www.redhat.com/mailman/listinfo/spacewalk-list

