[Spacewalk-list] cleanup-data-bunch takes forever

Paul Robert Marino prmarino1 at gmail.com
Mon Jun 8 18:19:45 UTC 2015


Well I'm running PostgreSQL 9.x on all of my database servers so my
plan looks way different than your for instance my servers don't need
to do all of those sorts, so its not a viable comparison. that said
the numbers are nothing like yours my servers chomp through that query
reasonably quickly.
Also Ive seen PostgreSQL 8.4 handle more complex operations on 1TB
tables much more efficiently than how your box is handling things.

I suspect your issue is the setting for work_mem are too low to handle
the dataset without resorting to using a temporary directory to swap
the data in and out of memory. Additionally in PostgreSQL 9.x
effective cache auto tunes to 1/3 of your ram which is fine for most
users, but in 8.x you still need to statically set it in the config.
Also load can be deceptive on modern systems I rarely use it as a
benchmark any more. the intresting thing to know is what does sar say
about your system during those queries.further more which scheduler
are you using for the disks if you are using CFQ which is the default
on Red Hat your IO performance can be somewhat against PostgreSQL in
favor of other applications because PostgreSQL eats so much IO I
always advise the deadline shceduler for physical servers and noop for
virtual ones.

the finally question is by any chance can it have been a really long
time since that query was executed successfully, in which case its
probably got a lot of cleanup to take care of.


On Mon, Jun 8, 2015 at 1:33 PM, Gerald Vogt <vogt at spamcop.net> wrote:
> vacuum analyze is the first thing I do when I see problems on a
> postgresql database...
>
> The database is running on the spacewalk server but it doesn't show too
> much load at any time nor is it slow on the disks.
>
> The estimate of explain verbose don't look too promising to me.
>
> What do you get for
>
> explain verbose DELETE FROM rhnPackageChangeLogData WHERE id NOT IN (
> SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec );
>
> ??
>
> Thanks,
>
> Gerald
>
> On 08.06.15 19:05, Paul Robert Marino wrote:
>> looks like your database isn't tuned correctly or you desperately need
>> to run a vacuum analyze on it.
>> check your disks too.
>> This is most likely a problem with your database servers,
>> configuration, maintenance, or hardware.
>>
>>
>> On Mon, Jun 8, 2015 at 11:07 AM, Gerald Vogt <vogt at spamcop.net> wrote:
>>> Does anyone have an idea what I could do about this?
>>>
>>> Does anyone see something similar?
>>>
>>> Thanks,
>>>
>>> Gerald
>>> On 28.05.15 06:57, Gerald Vogt wrote:
>>>> Eventually cleanup-data-bunch finished after 8:40 h. Definitively not
>>>> suitable to run daily.
>>>>
>>>> I don't quite understand why the query runs with two sequential scans
>>>> and does not use the indexes.
>>>>
>>>> Does anyone have an idea how to optimize that? Is that always taking so
>>>> long? I am using CentOS 6.6, Spacewalk 2.3, PostgreSQL 8.4.20.
>>>>
>>>> For now, I'll schedule that task monthly...
>>>>
>>>> Thanks,
>>>>
>>>> Gerald
>>>>
>>>> On 27.05.15 13:39, Gerald Vogt wrote:
>>>>> Hi!
>>>>>
>>>>> It seems cleanup-data-bunch takes a very long time.
>>>>>
>>>>> I have noticed that it was in state "INTERRUPTED" and did not run for a
>>>>> couple of months. I have applied the changes as suggested in
>>>>>
>>>>> https://www.redhat.com/archives/spacewalk-list/2015-May/msg00091.html
>>>>>
>>>>> and now cleanup-data-bunch does run again. It's however very slow.
>>>>> Currently, it is already running for 2:15 h and still goes on.
>>>>>
>>>>> I can see with pg_stat_activity that is running this query:
>>>>>
>>>>> DELETE FROM rhnPackageChangeLogData
>>>>>          WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM
>>>>> rhnPackageChangeLogRec )
>>>>>
>>>>> which seems to keep the database busy for a while:
>>>>>
>>>>>> spacewalk=# explain verbose DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec );
>>>>>>                                                 QUERY PLAN
>>>>>> -----------------------------------------------------------------------------------------------------------
>>>>>>  Seq Scan on rhnpackagechangelogdata  (cost=803563.75..215621093.82 rows=201534 width=6)
>>>>>>    Output:
>>>>>>    Filter: (NOT (SubPlan 1))
>>>>>>    SubPlan 1
>>>>>>      ->  Materialize  (cost=803563.75..804467.22 rows=64947 width=8)
>>>>>>            Output: rhnpackagechangelogrec.changelog_data_id
>>>>>>            ->  Unique  (cost=781080.12..803244.80 rows=64947 width=8)
>>>>>>                  Output: rhnpackagechangelogrec.changelog_data_id
>>>>>>                  ->  Sort  (cost=781080.12..792162.46 rows=4432935 width=8)
>>>>>>                        Output: rhnpackagechangelogrec.changelog_data_id
>>>>>>                        Sort Key: rhnpackagechangelogrec.changelog_data_id
>>>>>>                        ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..109859.35 rows=4432935 width=8)
>>>>>>                              Output: rhnpackagechangelogrec.changelog_data_id
>>>>>> (13 rows)
>>>>>>
>>>>>> spacewalk=# select count(*) from rhnpackagechangelogrec;
>>>>>>   count
>>>>>> ---------
>>>>>>  5112795
>>>>>> (1 row)
>>>>>>
>>>>>> spacewalk=# select count(*) from rhnpackagechangelogdata;
>>>>>>  count
>>>>>> --------
>>>>>>  401000
>>>>>> (1 row)
>>>>>>
>>>>>
>>>>> If it is taking this long I guess there is no point scheduling it daily.
>>>>>
>>>>> Any thoughts?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Gerald
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Spacewalk-list mailing list
>>>>> Spacewalk-list at redhat.com
>>>>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>>>>>
>>>>
>>>> _______________________________________________
>>>> Spacewalk-list mailing list
>>>> Spacewalk-list at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>>>>
>>>
>>> _______________________________________________
>>> Spacewalk-list mailing list
>>> Spacewalk-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>>
>> _______________________________________________
>> Spacewalk-list mailing list
>> Spacewalk-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>>
>
> _______________________________________________
> Spacewalk-list mailing list
> Spacewalk-list at redhat.com
> https://www.redhat.com/mailman/listinfo/spacewalk-list




More information about the Spacewalk-list mailing list