
Re: [Pulp-list] what removes my sync schedules? *workaround found*

On 26.02.2013 16:28, Jason Connor wrote:
> Unfortunately I've never seen the pickle.loads function hang. I've seen it explode, but not hang. I took a look at the serialized Task snippet in the email, but don't see anything that stands out to me. 

I see it all the time and can reproduce it at will. Here's my test case:

- delete all task snapshots
- restart httpd
- tasks are loaded, everything is fine
- start a big sync job; it has to run long enough to leave time for the next step
- httpd restart/reload/graceful
- startup hangs at pickle.load() while trying to deserialize the big sync job task

Unfortunately the daily run of logrotate reloads httpd and PULP stops syncing.

I'm by no means competent enough to debug the problem, so here's my workaround:

First, don't reload httpd; use copytruncate or rotatelogs instead:
# cat /etc/logrotate.d/httpd
/var/log/httpd/*log {
# copytruncate because of the PULP problem with httpd restart
#    postrotate
#        /sbin/service httpd reload > /dev/null 2>/dev/null || true
#    endscript

Second, I patched PULP to delete all task snapshots at startup:

# diff -Purp server/async.py.org server/async.py
--- server/async.py.org        2012-12-05 20:42:57.000000000 +0100
+++ server/async.py        2013-02-26 13:58:15.204527321 +0100
@@ -161,6 +161,13 @@ def _load_persisted_tasks():
     for task in tasks:

+def _delete_persisted_tasks():
+    collection = TaskSnapshot.get_collection()
+    for snapshot in collection.find():
+        log.info(_('Deleted Task from database: %s') % snapshot['id'])
+        last_error = collection.remove({'_id': snapshot['_id']}, safe=True)
+        if not last_error.get('ok', False):
+            raise Exception(repr(last_error))

 def initialize():
@@ -180,7 +187,8 @@ def initialize():
-    _load_persisted_tasks()
+#    _load_persisted_tasks()
+    _delete_persisted_tasks()

It's brutal, but I don't want PULP to stop syncing because httpd was reloaded or restarted. An
aborted job is the lesser issue.

This is the task snapshot that causes pickle.load() to hang:

2013-02-26 13:50:54,396 21588:140483422095328: pulp.server.async:INFO: async:168 Deleted Task from
database: SON([(u'synchronizer_class', u'cpulp.server.api.
synchronizers\nYumSynchronizer\np0\n.'), (u'weight', 2), (u'class_name', None), (u'hooks',
(u'result', u'N.'), (u'timeout_delta', u'N.'), (u'id', u'182b53a6-8013-11e2-974c-001676c998d5'),
(u'task_class', u'cpulp.server.api.repo_sync_task\\nRepoSyncTask\\np0\\n.'), (u'repo_id',
u'centos5-i386-os'), (u'job_id', None), (u'_ns', u'task_snapshots'), (u'method_name', u'_sync'),
(u'state', u'waiting'), (u'_progress_callback',
u'cpulp.server.api.synchronizers\\nyum_rhn_progress_callback\\np0\\n.'), (u'failure_threshold',
None), (u'progress', None), (u'consecutive_failures', 0), (u'start_time', u'N.'), (u'args',
u"(lp0\\nS'centos5-i386-os'\\np1\\na."), (u'callable',
u'cpulp.server.api.repo_sync\\n_sync\\np0\\n.'), (u'cancel_attempts', 0), (u'exception', u'N.'),
(u'finish_time', u'N.'), (u'schedule_threshold',
u'cdatetime\\ntimedelta\\np0\\n(I0\\nI300\\nI0\\ntp1\\nRp2\\n.'), (u'traceback', u'N.'), (u'kwargs',
u"(dp0\\nS'skip'\\np1\\n(dp2\\nsS'max_speed'\\np3\\nNsS'threads'\\np4\\nNs."), (u'_id',

I have no idea what causes the hang; this is beyond my knowledge.
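
For anyone who wants to dig further, here is a minimal sketch of how the pickled fields in that snapshot could be inspected one by one. It is not PULP code and does not explain the hang. It assumes that the escaped "\\n" sequences in the log stand for real newlines, and it is written for Python 3, so the field values are given as bytes. The names SNAPSHOT_FIELDS, referenced_globals and needs_pulp are made up for this example. pickletools.genops only parses the opcodes, so it can list the modules a pickle would import without actually importing anything from pulp.server:

```python
import pickle
import pickletools
from datetime import timedelta

# Field values copied from the logged snapshot. Assumption: the escaped
# "\\n" sequences in the log represent real newline characters.
SNAPSHOT_FIELDS = {
    "args": b"(lp0\nS'centos5-i386-os'\np1\na.",
    "schedule_threshold": b"cdatetime\ntimedelta\np0\n(I0\nI300\nI0\ntp1\nRp2\n.",
    "kwargs": b"(dp0\nS'skip'\np1\n(dp2\nsS'max_speed'\np3\nNsS'threads'\np4\nNs.",
    "callable": b"cpulp.server.api.repo_sync\n_sync\np0\n.",
    "task_class": b"cpulp.server.api.repo_sync_task\nRepoSyncTask\np0\n.",
}


def referenced_globals(data):
    """List the 'module name' pairs a pickle refers to, without importing them."""
    return [arg for opcode, arg, _pos in pickletools.genops(data)
            if opcode.name == "GLOBAL"]


def needs_pulp(data):
    """True if unpickling this value would import something from the pulp package."""
    return any(ref.startswith("pulp.") for ref in referenced_globals(data))


if __name__ == "__main__":
    for name, data in SNAPSHOT_FIELDS.items():
        if needs_pulp(data):
            # Only report the import; loading it here would need a PULP install.
            print(name, "-> would import", referenced_globals(data))
        else:
            print(name, "->", repr(pickle.loads(data)))
```

The fields that only use standard-library types load fine on their own, so if the hang is reproducible it is presumably tied to one of the fields that pulls in a pulp.server module, but that is a guess and would have to be checked on a real server.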

