[Spacewalk-list] cleanup-data-bunch takes forever

Gerald Vogt vogt at spamcop.net
Mon Jun 8 15:07:45 UTC 2015


Does anyone have an idea what I could do about this?

Does anyone see something similar?

Thanks,

Gerald
On 28.05.15 06:57, Gerald Vogt wrote:
> Eventually cleanup-data-bunch finished after 8:40 h. Definitively not
> suitable to run daily.
> 
> I don't quite understand why the query runs with two sequential scans
> and does not use the indexes.
> 
> Does anyone have an idea how to optimize that? Is that always taking so
> long? I am using CentOS 6.6, Spacewalk 2.3, PostgreSQL 8.4.20.
> 
> For now, I'll schedule that task monthly...
> 
> Thanks,
> 
> Gerald
> 
> On 27.05.15 13:39, Gerald Vogt wrote:
>> Hi!
>>
>> It seems cleanup-data-bunch takes a very long time.
>>
>> I have noticed that it was in state "INTERRUPTED" and did not run for a
>> couple of months. I have applied the changes as suggested in
>>
>> https://www.redhat.com/archives/spacewalk-list/2015-May/msg00091.html
>>
>> and now cleanup-data-bunch does run again. It's however very slow.
>> Currently, it is already running for 2:15 h and still goes on.
>>
>> I can see with pg_stat_activity that is running this query:
>>
>> DELETE FROM rhnPackageChangeLogData
>>          WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM
>> rhnPackageChangeLogRec )
>>
>> which seems to keep the database busy for a while:
>>
>>> spacewalk=# explain verbose DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec );
>>>                                                 QUERY PLAN                                                 
>>> -----------------------------------------------------------------------------------------------------------
>>>  Seq Scan on rhnpackagechangelogdata  (cost=803563.75..215621093.82 rows=201534 width=6)
>>>    Output: 
>>>    Filter: (NOT (SubPlan 1))
>>>    SubPlan 1
>>>      ->  Materialize  (cost=803563.75..804467.22 rows=64947 width=8)
>>>            Output: rhnpackagechangelogrec.changelog_data_id
>>>            ->  Unique  (cost=781080.12..803244.80 rows=64947 width=8)
>>>                  Output: rhnpackagechangelogrec.changelog_data_id
>>>                  ->  Sort  (cost=781080.12..792162.46 rows=4432935 width=8)
>>>                        Output: rhnpackagechangelogrec.changelog_data_id
>>>                        Sort Key: rhnpackagechangelogrec.changelog_data_id
>>>                        ->  Seq Scan on rhnpackagechangelogrec  (cost=0.00..109859.35 rows=4432935 width=8)
>>>                              Output: rhnpackagechangelogrec.changelog_data_id
>>> (13 rows)
>>>
>>> spacewalk=# select count(*) from rhnpackagechangelogrec;
>>>   count  
>>> ---------
>>>  5112795
>>> (1 row)
>>>
>>> spacewalk=# select count(*) from rhnpackagechangelogdata;
>>>  count  
>>> --------
>>>  401000
>>> (1 row)
>>>
>>
>> If it is taking this long I guess there is no point scheduling it daily.
>>
>> Any thoughts?
>>
>> Thanks,
>>
>> Gerald
>>
>>
>> _______________________________________________
>> Spacewalk-list mailing list
>> Spacewalk-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>>
> 
> _______________________________________________
> Spacewalk-list mailing list
> Spacewalk-list at redhat.com
> https://www.redhat.com/mailman/listinfo/spacewalk-list
> 




More information about the Spacewalk-list mailing list