[Spacewalk-list] cleanup-data-bunch takes forever
Gerald Vogt
vogt at spamcop.net
Mon Jun 8 15:07:45 UTC 2015
Does anyone have an idea what I could do about this?
Does anyone see something similar?
Thanks,
Gerald
On 28.05.15 06:57, Gerald Vogt wrote:
> Eventually cleanup-data-bunch finished after 8:40 h. Definitively not
> suitable to run daily.
>
> I don't quite understand why the query runs with two sequential scans
> and does not use the indexes.
>
> Does anyone have an idea how to optimize that? Is that always taking so
> long? I am using CentOS 6.6, Spacewalk 2.3, PostgreSQL 8.4.20.
>
> For now, I'll schedule that task monthly...
>
> Thanks,
>
> Gerald
>
> On 27.05.15 13:39, Gerald Vogt wrote:
>> Hi!
>>
>> It seems cleanup-data-bunch takes a very long time.
>>
>> I have noticed that it was in state "INTERRUPTED" and did not run for a
>> couple of months. I have applied the changes as suggested in
>>
>> https://www.redhat.com/archives/spacewalk-list/2015-May/msg00091.html
>>
>> and now cleanup-data-bunch does run again. It's however very slow.
>> Currently, it is already running for 2:15 h and still goes on.
>>
>> I can see with pg_stat_activity that is running this query:
>>
>> DELETE FROM rhnPackageChangeLogData
>> WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM
>> rhnPackageChangeLogRec )
>>
>> which seems to keep the database busy for a while:
>>
>>> spacewalk=# explain verbose DELETE FROM rhnPackageChangeLogData WHERE id NOT IN ( SELECT DISTINCT changelog_data_id FROM rhnPackageChangeLogRec );
>>> QUERY PLAN
>>> -----------------------------------------------------------------------------------------------------------
>>> Seq Scan on rhnpackagechangelogdata (cost=803563.75..215621093.82 rows=201534 width=6)
>>> Output:
>>> Filter: (NOT (SubPlan 1))
>>> SubPlan 1
>>> -> Materialize (cost=803563.75..804467.22 rows=64947 width=8)
>>> Output: rhnpackagechangelogrec.changelog_data_id
>>> -> Unique (cost=781080.12..803244.80 rows=64947 width=8)
>>> Output: rhnpackagechangelogrec.changelog_data_id
>>> -> Sort (cost=781080.12..792162.46 rows=4432935 width=8)
>>> Output: rhnpackagechangelogrec.changelog_data_id
>>> Sort Key: rhnpackagechangelogrec.changelog_data_id
>>> -> Seq Scan on rhnpackagechangelogrec (cost=0.00..109859.35 rows=4432935 width=8)
>>> Output: rhnpackagechangelogrec.changelog_data_id
>>> (13 rows)
>>>
>>> spacewalk=# select count(*) from rhnpackagechangelogrec;
>>> count
>>> ---------
>>> 5112795
>>> (1 row)
>>>
>>> spacewalk=# select count(*) from rhnpackagechangelogdata;
>>> count
>>> --------
>>> 401000
>>> (1 row)
>>>
>>
>> If it is taking this long I guess there is no point scheduling it daily.
>>
>> Any thoughts?
>>
>> Thanks,
>>
>> Gerald
>>
>>
>> _______________________________________________
>> Spacewalk-list mailing list
>> Spacewalk-list at redhat.com
>> https://www.redhat.com/mailman/listinfo/spacewalk-list
>>
>
> _______________________________________________
> Spacewalk-list mailing list
> Spacewalk-list at redhat.com
> https://www.redhat.com/mailman/listinfo/spacewalk-list
>
More information about the Spacewalk-list
mailing list