[Spacewalk-list] Spacewalk - PostgreSQL High I/O

Matt Moldvan matt at moldvan.com
Wed Oct 12 14:16:03 UTC 2016


Nice, I did intend to set up some kind of replication to the other
datacenter for a hot/cold standby (or even active/active if it was fast
enough), but Bucardo had a pretty steep learning curve, and I'm somewhat
bound by ITIL to create change requests and so on, along with other
priorities.  Anyway, at some point I'll get pgpool working and figure
something out, maybe a master/master setup between DCs so the WIP for the
main UI works no matter which server it hits.

It's running pretty smoothly now.  I've yet to go so far as to try
re-enabling snapshots, but those would be useful at some point as well.

On Wed, Oct 12, 2016 at 12:45 AM Paul Robert Marino <prmarino1 at gmail.com>
wrote:

> Matt,
> You may want to consider using multiple PostgreSQL replica servers in hot
> standby mode with pgpool in front of them; it may help, but only a
> little.  Because of the way Spacewalk does its queries, I doubt it will
> help much, as the safeties in pgpool will assume just about every query
> may write and will send it to the master anyway.
> A better but more maintenance-intensive option you may want to consider
> is table partitioning to alleviate the locks:
> https://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
> https://wiki.postgresql.org/wiki/Table_partitioning
> This is a stripped-down version of what the commercial data warehouse
> versions of PostgreSQL do.  It's a little bit of a pain, but it works
> well for alleviating locking issues.  I've used it with multi-TB
> databases in the past with great success, but it takes a lot of planning
> and testing to get it right.
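>
> To give a rough idea of the shape of it (the events table here is just a
> made-up example, not an actual Spacewalk table), the inheritance-based
> approach those docs describe looks something like this on 9.1:
>
>     -- Parent table; the child tables hold the actual rows.
>     CREATE TABLE events (
>         id         bigint NOT NULL,
>         created_at timestamptz NOT NULL,
>         payload    text
>     );
>
>     -- One child per month; the CHECK constraint lets the planner skip
>     -- partitions when constraint_exclusion is enabled.
>     CREATE TABLE events_2016_10 (
>         CHECK (created_at >= '2016-10-01' AND created_at < '2016-11-01')
>     ) INHERITS (events);
>
>     -- Route inserts on the parent to the right child; returning NULL
>     -- keeps the row out of the parent table itself.
>     CREATE OR REPLACE FUNCTION events_insert() RETURNS trigger AS $$
>     BEGIN
>         IF NEW.created_at >= '2016-10-01'
>            AND NEW.created_at < '2016-11-01' THEN
>             INSERT INTO events_2016_10 VALUES (NEW.*);
>         ELSE
>             RAISE EXCEPTION 'no partition for %', NEW.created_at;
>         END IF;
>         RETURN NULL;
>     END;
>     $$ LANGUAGE plpgsql;
>
>     CREATE TRIGGER events_insert_trg
>         BEFORE INSERT ON events
>         FOR EACH ROW EXECUTE PROCEDURE events_insert();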
>
> On Tue, Oct 11, 2016 at 10:04 PM, Matt Moldvan <matt at moldvan.com> wrote:
> > Yeah, it was quite a trial to get it to scale to that degree, so I tried
> > a lot of different things.  I increased the max_fds in the jabber
> > configs and the ulimits for the jabber user, and it's pretty stable now,
> > though even the back-end Python code wasn't written for having multiple
> > osa dispatchers (one overwrites the other with respect to the database
> > password in the rhnpushdispatcher table in the main database).
> >
> > As for proxies, we have 4 in each datacenter, but even that was bringing
> > down the database due to what I realized later was OSAD restarting on
> > all systems when Puppet would run in the morning.  That, coupled with
> > the database locks that were being taken on the snapshot tables (because
> > Spacewalk for some reason thought the base channels were updated daily),
> > made the UI completely unresponsive and unusable by our operations folks
> > trying to run the patching for hundreds of systems at a time.
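> >
> > If you want to watch for that kind of lock contention yourself, a
> > generic query along these lines (nothing Spacewalk-specific about it)
> > shows which sessions are stuck waiting and on which tables; on 9.1 and
> > older the columns are procpid and current_query instead of pid and
> > query:
> >
> >     SELECT w.pid                 AS waiting_pid,
> >            w.relation::regclass  AS locked_table,
> >            a.query               AS waiting_query
> >     FROM pg_locks w
> >     JOIN pg_stat_activity a ON a.pid = w.pid
> >     WHERE NOT w.granted;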
> >
> > As for the proxies, my main struggle was putting them behind F5 local
> > and global load balancers.  I signed the SSL certs with the name of the
> > global WIP, and unfortunately, the way the jabber S2S component works,
> > it was trying to use that same name for all the proxies.  I realized
> > later I was trying to fit a square peg into a round hole, so I fixed up
> > the Puppet module, disabled snapshots, disabled Jabber on the proxies,
> > and pointed the clients at the masters instead.  That was after months
> > of frustration and a growing chorus of complaints about the issues with
> > the UI on the masters.
> >
> > Anyway, having the proxies helped reduce the load on the masters
> > significantly, as they still function with the other components like
> > Apache httpd and Squid for caching the RPMs.
> >
> > On Tue, Oct 11, 2016 at 5:02 PM Paul Robert Marino <prmarino1 at gmail.com>
> > wrote:
> >>
> >> Matt,
> >> You have a lot of clients, so those numbers for database connections
> >> start to make sense.
> >> Also, OSAD isn't meant to scale that high; in fact, you should run out
> >> of file handles for it before it even gets to that many clients.
> >> Furthermore, I hope you are using Spacewalk proxies; if not, you may
> >> find they help you a lot.
> >>
> >> As to the original poster: look at this page,
> >> https://www.postgresql.org/docs/9.2/static/runtime-config-resource.html
> >> specifically work_mem.  I would be willing to bet that is your issue,
> >> because the default isn't enough for Spacewalk.
> >> The easiest way to confirm it is to look in the PostgreSQL base data
> >> directory (/var/lib/pgsql/data by default) for a directory called
> >> pgsql_tmp; if there are any files in that directory, then you know it's
> >> swapping out to temp files.
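> >>
> >> Something along these lines will do the check (the path assumes the
> >> default data directory):
> >>
> >>     find /var/lib/pgsql/data -type f -path '*pgsql_tmp*' -ls
> >>
> >> If that lists anything while queries are running, work_mem is too small
> >> for those queries.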
> >> Furthermore, you can look for any queries that have been running for a
> >> long time and run EXPLAIN ANALYZE on them.
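> >> On 9.2 or later, something like this will surface them (on 9.1 and
> >> older the columns are procpid and current_query, and there is no state
> >> column):
> >>
> >>     SELECT pid, now() - query_start AS runtime, query
> >>     FROM pg_stat_activity
> >>     WHERE state <> 'idle'
> >>     ORDER BY runtime DESC;
> >>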
> >> I normally do not suggest this site, or any like it, because they often
> >> give very wrong answers, but in this case it's not a bad answer:
> >>
> >> http://dba.stackexchange.com/questions/112079/slow-query-performance-due-to-temporary-file
> >> I just don't suggest going straight to 64MB.
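> >> Bump it gradually instead and watch whether the temp files go away; for
> >> example, in postgresql.conf (the value is just an illustrative starting
> >> point, not a recommendation for every setup):
> >>
> >>     work_mem = 8MB    # per sort/hash node, per connection -- be careful
> >>
> >> You can also test it for a single session first with
> >> SET work_mem = '8MB'; before touching the config.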
> >>
> >>
> >>
> >> On Tue, Oct 11, 2016 at 8:37 AM, Allan Moraes <allan at allanmoraes.com.br>
> >> wrote:
> >> > Thank you for the tips.
> >> >
> >> > In this case, there is 6GB of memory available, and the high I/O
> >> > occurs on the Postgres disk; on the other disks, I/O is normal.  The
> >> > system is not using swap, and there is 3GB of swap available.
> >> >
> >> > I will separate out the Postgres server and apply your tips.
> >> >
> >> > 2016-10-10 21:58 GMT-03:00 Matt Moldvan <matt at moldvan.com>:
> >> >>
> >> >> We have about 6,000 systems to manage, and it was unusable
> >> >> otherwise...  I had way too much trouble trying to get OSAD to work
> >> >> through proxies and F5 load balancers, so I ended up pointing them
> >> >> all to two masters that are still using the same Postgres database
> >> >> VM.  I was also toying with having the database be the back end for
> >> >> OSAD, so with that in mind the number of concurrent clients would
> >> >> often reach higher than usual numbers...  I tried a lot of different
> >> >> things to get Spacewalk stable, usable, and failing over properly, so
> >> >> I don't know that any of my recommendations or environment-specific
> >> >> settings will be a silver bullet for anyone else, but it can't hurt
> >> >> to try, and to learn in the process.
> >> >>
> >> >> On Mon, Oct 10, 2016 at 6:23 PM Paul Robert Marino
> >> >> <prmarino1 at gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> Tuning for 5,000 clients is nuts; that would hurt your performance.
> >> >>> Try running pgtune for about 50 to maybe 500 clients max, but I'd
> >> >>> try the lower setting first.
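> >> >>> For example, with the same flags Matt used below, just a lower -c:
> >> >>>
> >> >>>     pgtune -i data/postgresql.conf -o data/postgresql.conf.new -c 50
> >> >>>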
> >> >>> Now let's talk about the high IO.  That usually happens when you
> >> >>> don't have enough working memory in PostgreSQL's configuration: when
> >> >>> that happens, PostgreSQL creates temp files, which are slow and do a
> >> >>> lot of write IO during read operations because it has to swap the
> >> >>> data out to the temp files.  Note that setting the number of
> >> >>> connections too high would exacerbate that issue if it's the root
> >> >>> cause.
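> >> >>> (On 9.2 or later you can also see how much each database has been
> >> >>> spilling, cumulatively, with:
> >> >>>
> >> >>>     SELECT datname, temp_files, temp_bytes FROM pg_stat_database;
> >> >>>
> >> >>> those counters don't exist on older releases.)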
> >> >>> By the way, I managed up to 400 with Spacewalk and never had to
> >> >>> disable the snapshots.
> >> >>>
> >> >>>
> >> >>> On Mon, Oct 10, 2016 at 4:48 PM, Matt Moldvan <matt at moldvan.com>
> >> >>> wrote:
> >> >>> > I had similar issues and ended up first breaking out the database
> >> >>> > to its own VM, then increasing the Postgres debug logs.  I saw
> >> >>> > that there were a large number of operations running against the
> >> >>> > snapshot tables, with locks and so on being held for a long period
> >> >>> > of time.  In /etc/rhn/rhn.conf, try disabling snapshots with:
> >> >>> >
> >> >>> > enable_snapshots = 0
> >> >>> >
> >> >>> > I also did quite a bit of Postgres tuning using pgtune, for 5,000
> >> >>> > clients or so:
> >> >>> >
> >> >>> > pgtune -i data/postgresql.conf -o ./data/postgresql.conf.new -c 5000
> >> >>> >
> >> >>> > Another thing that may help is installing pgbadger to analyze your
> >> >>> > Postgres logs... it has some nice visualizations of the types of
> >> >>> > queries and tables involved, which may point you in the right
> >> >>> > direction if snapshots aren't the reason for the high utilization.
> >> >>> > https://github.com/dalibo/pgbadger
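> >> >>> >
> >> >>> > A typical run looks something like this (the log path assumes the
> >> >>> > default pg_log location, and the report is only as good as what
> >> >>> > your logging settings capture, e.g. log_min_duration_statement):
> >> >>> >
> >> >>> >     pgbadger /var/lib/pgsql/data/pg_log/postgresql-*.log -o report.html
> >> >>> >
> >> >>> > Then open report.html in a browser.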
> >> >>> >
> >> >>> > Hope that helps.
> >> >>> >
> >> >>> > On Mon, Oct 10, 2016 at 4:06 PM Konstantin Raskoshnyi
> >> >>> > <konrasko at gmail.com>
> >> >>> > wrote:
> >> >>> >>
> >> >>> >> Because all your systems request information from Spacewalk, the
> >> >>> >> default installation doesn't make any sense if you have more than
> >> >>> >> 50 machines, so you need to tune Postgres, Tomcat, and Linux
> >> >>> >> itself.
> >> >>> >>
> >> >>> >> On Mon, Oct 10, 2016 at 12:34 PM, Allan Moraes
> >> >>> >> <allan at allanmoraes.com.br>
> >> >>> >> wrote:
> >> >>> >>>
> >> >>> >>> Hi,
> >> >>> >>> On my CentOS 7 server, Spacewalk 2.4 is installed with
> >> >>> >>> PostgreSQL from the default installation.  Via iotop, I can see
> >> >>> >>> postgresql writing a lot of data all day long.  Why does this
> >> >>> >>> occur?
> >> >>> >>>
> >> > --
> >> >
> >> > Best regards...
> >> >
> >> > Allan Moraes
> >> > - Linux Consultant at Venda e Cia
> >> > - Founder and Editor at MySQL Box
> >> > - Linux System Administrator and MySQL DBA at Umbler
> >> >
> >> > Cel: (51) 81885480
> >> > E-mail: allan at mysqlbox.com.br
> >> > Skype: allan at allanmoraes.com.br


More information about the Spacewalk-list mailing list