[Spacewalk-list] Spacewalk - PostgreSQL High I/O

Matt Moldvan matt at moldvan.com
Wed Oct 12 02:04:52 UTC 2016


Yeah, it was quite a trial to get it to scale to that degree, so I tried a
lot of different things.  I increased the max_fds in the jabber configs and
the ulimits for the jabber user, and it's pretty stable now, though even the
back-end Python code wasn't written with multiple OSA dispatchers in mind
(one overwrites the other's database password in the rhnpushdispatcher
table in the main database).
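For reference, the changes amounted to something like the following (element and file locations are for jabberd2 as shipped with Spacewalk; the values are illustrative, not recommendations):

```shell
# /etc/jabberd/c2s.xml (and s2s.xml): raise the descriptor cap in the
# <io> section; the stock default is far too low for thousands of
# persistent OSAD connections, e.g.:
#   <max_fds>32768</max_fds>

# /etc/security/limits.conf: let the jabber user actually open that
# many descriptors (soft and hard nofile limits).
cat >> /etc/security/limits.conf <<'EOF'
jabber soft nofile 32768
jabber hard nofile 32768
EOF
```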

As for proxies, we have 4 in each datacenter, but even that was bringing
down the database, due to what I later realized was OSAD restarting on all
systems when Puppet ran in the morning.  That, coupled with the database
locks being taken on the snapshot tables (because Spacewalk for some reason
thought the base channels were updated daily), made the UI completely
unresponsive and unusable for our operations folks trying to run patching
for hundreds of systems at a time.
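One way to take the edge off that kind of thundering herd is to splay the restarts, roughly like this (a sketch assuming the restart is driven from cron or a Puppet exec; the 10-minute bound is arbitrary):

```shell
#!/bin/bash
# Stagger osad restarts across the fleet so thousands of clients don't
# all reconnect to jabberd at the same moment.
sleep $(( RANDOM % 600 ))   # random 0-10 minute delay per host
service osad restart
```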

As for the proxies, my main struggle was putting them behind F5 local and
global load balancers.  I signed the SSL certs with the name of the global
wide IP (WIP), and unfortunately, because of the way the jabber S2S
component works, it was trying to use that same name for all the proxies.
I realized later I was trying to fit a square peg into a round hole, so I
fixed up the Puppet module, disabled snapshots, disabled Jabber on the
proxies, and pointed the clients at the masters instead.  That was after
months of frustration and a growing chorus of complaints about the UI
issues on the masters.
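Concretely, the working end state was something like this (the hostname is a placeholder; treat the osad.conf key as an assumption to verify against your client version):

```shell
# On the Spacewalk masters, in /etc/rhn/rhn.conf: stop snapshotting on
# every action, which is what held the long-running database locks.
#   enable_snapshots = 0

# On each client, point OSAD's jabber connection directly at a master
# instead of the proxy, in /etc/sysconfig/rhn/osad.conf:
#   [osad]
#   jabber_server = spacewalk-master.example.com
```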

Anyway, having the proxies helped reduce the load significantly on the
masters, as they still function with the other components like Apache httpd
and Squid for caching the RPMs.

On Tue, Oct 11, 2016 at 5:02 PM Paul Robert Marino <prmarino1 at gmail.com>
wrote:

> Matt,
> you have a lot of clients, so those numbers start to make sense to
> increase the database connections.
> Also, OSAD isn't meant to scale that high; in fact, you should run out of
> file handles for it before it even gets to that many clients.
> Furthermore, I hope you are using Spacewalk proxies; if not, you may
> find they help you a lot.
>
> As for the original poster: look at this page,
> https://www.postgresql.org/docs/9.2/static/runtime-config-resource.html
> specifically work_mem.  I would be willing to bet that is your issue,
> because it isn't enough for Spacewalk by default.
> The easiest way to confirm it is to look in the PostgreSQL base data
> directory (/var/lib/pgsql/data by default) for a directory called
> pgsql_tmp; if there are any files in that directory, then you know it
> is spilling sorts out to disk.
> Furthermore, you can look for any queries that have been running for a
> long time and run EXPLAIN ANALYZE on them.
> I normally do not suggest this site or ones like it, because they often
> give very wrong answers, but in this case it's not a bad answer:
>
> http://dba.stackexchange.com/questions/112079/slow-query-performance-due-to-temporary-file
> I just don't suggest going straight to 64MB.
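The checks Paul describes can be scripted roughly as follows (assuming the default PGDATA of /var/lib/pgsql/data and PostgreSQL 9.2's pg_stat_activity columns; the 8MB value is only a starting point, per the advice above not to jump straight to 64MB):

```shell
# 1. Look for temp-file spill: any files under pgsql_tmp mean a query
#    exceeded work_mem and PostgreSQL sorted/hashed on disk instead.
find /var/lib/pgsql/data -path '*pgsql_tmp*' -type f -ls

# 2. Find long-running queries worth an EXPLAIN ANALYZE (column names
#    are for PostgreSQL 9.2; older releases use procpid/current_query).
psql -U postgres -c "SELECT pid, now() - query_start AS runtime, query
                     FROM pg_stat_activity
                     WHERE state <> 'idle'
                     ORDER BY runtime DESC;"

# 3. If queries are spilling, raise work_mem modestly in postgresql.conf
#    and reload, e.g.:
#      work_mem = 8MB
#    service postgresql reload
```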
>
>
>
> On Tue, Oct 11, 2016 at 8:37 AM, Allan Moraes <allan at allanmoraes.com.br>
> wrote:
> > Thank you for the tips.
> >
> > In this case there is 6GB of memory available, and the high I/O occurs
> > on the Postgres disk.  On the other disks, I/O is normal.  The system
> > is not using swap, and there is 3GB of swap available.
> >
> > I will separate the Postgres instance and apply your tips.
> >
> > 2016-10-10 21:58 GMT-03:00 Matt Moldvan <matt at moldvan.com>:
> >>
> >> We have about 6,000 systems to manage, and it was unusable otherwise... I
> >> had way too much trouble trying to get OSAD to work through proxies and
> >> F5 load balancers, so I ended up pointing them all to two masters that
> >> are still using the same Postgres database VM.  I was also toying with
> >> having the database be the back end for OSAD, so with that in mind the
> >> number of concurrent clients would often reach higher-than-usual
> >> numbers...  I tried a lot of different things to get Spacewalk stable,
> >> usable, and with proper failover, so I don't know that any of my
> >> recommendations or environment-specific settings will be a silver
> >> bullet for anyone else, but it can't hurt to try, and learn in the
> >> process.
> >>
> >> On Mon, Oct 10, 2016 at 6:23 PM Paul Robert Marino <prmarino1 at gmail.com>
> >> wrote:
> >>>
> >>> Tuning for 5,000 clients is nuts; that would hurt your performance.
> >>> Try running pgtune for about 50 to maybe 500 clients max, but I would
> >>> try the lower setting first.
> >>> Now let's talk about the high IO.  That usually happens when you don't
> >>> have enough working memory in PostgreSQL's configuration.  When that
> >>> happens, PostgreSQL creates temp files, which are slow and do a lot of
> >>> write IO during read operations, because it has to swap the data out
> >>> to the temp files.  Note that setting the number of connections too
> >>> high would exacerbate that issue if it's the root cause.
> >>> By the way, I managed up to 400 with Spacewalk and never had to
> >>> disable the snapshots.
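The lower-count run suggested here is just the pgtune invocation quoted further down with a smaller -c (flags per that original command; treat the paths as illustrative):

```shell
# Regenerate postgresql.conf for a realistic concurrent-connection
# count instead of 5,000; start low and raise only if clients max out.
pgtune -i /var/lib/pgsql/data/postgresql.conf \
       -o /var/lib/pgsql/data/postgresql.conf.new -c 50
```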
> >>>
> >>>
> >>> On Mon, Oct 10, 2016 at 4:48 PM, Matt Moldvan <matt at moldvan.com>
> wrote:
> >>> > I had similar issues and ended up first breaking out the database to
> >>> > it's
> >>> > own VM, then increasing the Postgres debug logs.  I saw that there
> were
> >>> > a
> >>> > large number of operations running against the snapshot tables, with
> >>> > locks
> >>> > and so on being set for a long period of time.  In /etc/rhn/rhn.conf,
> >>> > try
> >>> > disabling snapshots with:
> >>> >
> >>> > enable_snapshots = 0
> >>> >
> >>> > I also did quite a bit of Postgres tuning using pgtune, for 5,000
> >>> > clients or
> >>> > so:
> >>> > pgtune -i data/postgresql.conf  -o ./data/postgresql.conf.new -c 5000
> >>> >
> >>> > Another thing that may help is installing pgbadger to analyze your
> >>> > Postgres
> >>> > logs... it has some nice visualizations of the types of queries and
> >>> > tables
> >>> > involved, which may point you in the right direction if snapshots
> >>> > aren't the
> >>> > reason for the high utilization.
> >>> > https://github.com/dalibo/pgbadger
> >>> >
> >>> > Hope that helps.
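A minimal pgbadger run along those lines looks something like this (the log path assumes the default pg_log directory under PGDATA; query logging must be enabled in postgresql.conf for the report to have much in it):

```shell
# Generate an HTML report of query activity from the PostgreSQL logs,
# then open it in a browser for the per-query and per-table breakdowns.
pgbadger -o /tmp/pgbadger-report.html /var/lib/pgsql/data/pg_log/*.log
```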
> >>> >
> >>> > On Mon, Oct 10, 2016 at 4:06 PM Konstantin Raskoshnyi
> >>> > <konrasko at gmail.com>
> >>> > wrote:
> >>> >>
> >>> >> Because all your systems request information from Spacewalk, and
> >>> >> the default installation doesn't make sense if you have more than
> >>> >> 50 machines, you need to tune Postgres, Tomcat, and Linux itself.
> >>> >>
> >>> >> On Mon, Oct 10, 2016 at 12:34 PM, Allan Moraes
> >>> >> <allan at allanmoraes.com.br>
> >>> >> wrote:
> >>> >>>
> >>> >>> Hi
> >>> >>> On my CentOS 7 server, Spacewalk 2.4 is installed with PostgreSQL
> >>> >>> from the default installation.  Via iotop, I can see PostgreSQL
> >>> >>> writing a lot of data all day long.  Why does this occur?
> >>> >>>
> >>> >>> _______________________________________________
> >>> >>> Spacewalk-list mailing list
> >>> >>> Spacewalk-list at redhat.com
> >>> >>> https://www.redhat.com/mailman/listinfo/spacewalk-list
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>>
> >>
> >>
> >
> >
> >
> >
> > --
> >
> > Regards,
> >
> > Allan Moraes
> > - Linux Consulting at Venda e Cia
> > - Founder and Editor at MySQL Box
> > - Linux System Administrator and MySQL DBA at Umbler
> >
> > Cel: (51) 81885480
> > E-mail: allan at mysqlbox.com.br
> > Skype: allan at allanmoraes.com.br
> >
>
>

