[Pulp-list] what removes my sync schedules? *workaround found*

Andreas Piesk a.piesk at gmx.net
Tue Feb 26 18:33:19 UTC 2013


On 26.02.2013 16:28, Jason Connor wrote:
> 
> Unfortunately I've never seen the pickle.loads function hang. I've seen it explode, but not hang. I took a look at the serialized Task snippet in the email, but don't see anything that stands out to me. 
> 

I see it all the time and can reproduce it at will. Here's my test case:

- delete all task snapshots
- restart httpd
- tasks are loaded, everything is fine
- start a big sync job (big enough to leave time for the next step)
- httpd restart/reload/graceful
- startup hangs at pickle.load() while trying to deserialize the big sync job task

Unfortunately the daily run of logrotate reloads httpd and PULP stops syncing.

I'm by no means competent enough to debug the problem, so here's my workaround:

First, don't reload httpd; use copytruncate or rotatelogs instead:
# cat /etc/logrotate.d/httpd
/var/log/httpd/*log {
    missingok
    notifempty
    sharedscripts
    delaycompress
# copytruncate because of the PULP problem with httpd restart
    copytruncate
#    postrotate
#        /sbin/service httpd reload > /dev/null 2>/dev/null || true
#    endscript
}

Second, I patched PULP to delete all task snapshots at startup:

# diff -Purp server/async.py.org server/async.py
--- server/async.py.org        2012-12-05 20:42:57.000000000 +0100
+++ server/async.py        2013-02-26 13:58:15.204527321 +0100
@@ -161,6 +161,13 @@ def _load_persisted_tasks():
     for task in tasks:
         enqueue(task)

+def _delete_persisted_tasks():
+    collection = TaskSnapshot.get_collection()
+    for snapshot in collection.find():
+        log.info(_('Deleted Task from database: %s') % snapshot['id'])
+        last_error = collection.remove({'_id': snapshot['_id']}, safe=True)
+        if not last_error.get('ok', False):
+            raise Exception(repr(last_error))

 def initialize():
     """
@@ -180,7 +187,8 @@ def initialize():
                        schedule_threshold=schedule_threshold,
                        storage=SnapshotStorage(),
                        dispatch_interval=5)
-    _load_persisted_tasks()
+#    _load_persisted_tasks()
+    _delete_persisted_tasks()

It's brutal, but I don't want PULP to stop syncing just because httpd was reloaded or restarted.
An aborted job is the lesser issue.


This is the task snapshot (as logged by the patch above) that causes pickle.load() to hang:

2013-02-26 13:50:54,396 21588:140483422095328: pulp.server.async:INFO: async:168 Deleted Task from
database: SON([(u'synchronizer_class',
u'cpulp.server.api.synchronizers\nYumSynchronizer\np0\n.'), (u'weight', 2), (u'class_name', None), (u'hooks',
u"(dp0\\nS'dequeue'\\np1\\n(lp2\\n(ipulp.server.event.handler.task\\nTaskDequeued\\np3\\n(dp4\\nbacpulp.server.api.repo_sync\\npost_sync\\np5\\naccopy_reg\\n_reconstructor\\np6\\n(cpulp.server.auth.authorization\\nRevokePermissionsForTask\\np7\\nc__builtin__\\nobject\\np8\\nNtp9\\nRp10\\n(dp11\\nS'user_name'\\np12\\nVpharaoh\\np13\\nsbasS'enqueue'\\np14\\n(lp15\\ng6\\n(cpulp.server.auth.authorization\\nGrantPermissionsForTask\\np16\\ng8\\nNtp17\\nRp18\\n(dp19\\ng12\\ng13\\nsbas."),
(u'result', u'N.'), (u'timeout_delta', u'N.'), (u'id', u'182b53a6-8013-11e2-974c-001676c998d5'),
(u'task_class', u'cpulp.server.api.repo_sync_task\\nRepoSyncTask\\np0\\n.'), (u'repo_id',
u'centos5-i386-os'), (u'job_id', None), (u'_ns', u'task_snapshots'), (u'method_name', u'_sync'),
(u'state', u'waiting'), (u'_progress_callback',
u'cpulp.server.api.synchronizers\\nyum_rhn_progress_callback\\np0\\n.'), (u'failure_threshold',
None), (u'progress', None), (u'consecutive_failures', 0), (u'start_time', u'N.'), (u'args',
u"(lp0\\nS'centos5-i386-os'\\np1\\na."), (u'callable',
u'cpulp.server.api.repo_sync\\n_sync\\np0\\n.'), (u'cancel_attempts', 0), (u'exception', u'N.'),
(u'finish_time', u'N.'), (u'schedule_threshold',
u'cdatetime\\ntimedelta\\np0\\n(I0\\nI300\\nI0\\ntp1\\nRp2\\n.'), (u'traceback', u'N.'), (u'kwargs',
u"(dp0\\nS'skip'\\np1\\n(dp2\\nsS'max_speed'\\np3\\nNsS'threads'\\np4\\nNs."), (u'_id',
u'c7124a71-0307-4434-9bf3-d423c74949ff')])
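
The string fields in the snapshot above look like protocol-0 pickles. As a rough
sketch (not something from the Pulp code base), the standard pickletools module
can list which modules and names such a pickle would import, without unpickling
anything. The helper name referenced_globals is made up for this example, and it
assumes the logged \\n sequences have been turned back into real newlines:

```python
import pickletools


def referenced_globals(data):
    """Return the 'module.name' targets a pickle refers to, without loading it."""
    names = []
    for opcode, arg, _pos in pickletools.genops(data):
        # GLOBAL ('c') and INST ('i') carry "module name" as one space-joined string.
        if opcode.name in ("GLOBAL", "INST"):
            module, name = arg.split(" ", 1)
            names.append(module + "." + name)
    return names


# Two field values copied from the log entry, with \\n replaced by real newlines.
schedule_threshold = b"cdatetime\ntimedelta\np0\n(I0\nI300\nI0\ntp1\nRp2\n."
task_class = b"cpulp.server.api.repo_sync_task\nRepoSyncTask\np0\n."

print(referenced_globals(schedule_threshold))
print(referenced_globals(task_class))
```

Since genops only parses the opcode stream, nothing is imported or executed, so
it cannot hang the way pickle.load() does. The same call could be pointed at the
larger hooks field to see every class it pulls in.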

I have no idea what causes the hang; this is beyond my knowledge.
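
One way to narrow it down might be to load each stored field in a throwaway
child process with a timeout, so a hang shows up as "timeout" instead of
blocking the caller. This is only a sketch under assumptions: probe_pickle and
its parameters are invented names, the child needs the Pulp modules importable
for Pulp classes to load at all, and a hang that only occurs inside httpd would
not necessarily reproduce in a plain child process.

```python
import os
import subprocess
import sys
import time

# Child program: read one pickle from stdin and try to load it.
CHILD = (
    "import pickle, sys\n"
    "stream = getattr(sys.stdin, 'buffer', sys.stdin)\n"
    "pickle.loads(stream.read())\n"
)


def probe_pickle(data, timeout=10.0, child_code=CHILD):
    """Return 'ok', 'error' or 'timeout' for loading `data` in a child process."""
    with open(os.devnull, "w") as devnull:
        proc = subprocess.Popen(
            [sys.executable, "-c", child_code],
            stdin=subprocess.PIPE, stdout=devnull, stderr=devnull,
        )
        proc.stdin.write(data)
        proc.stdin.close()
        deadline = time.time() + timeout
        # Poll until the child exits or the deadline passes.
        while proc.poll() is None:
            if time.time() > deadline:
                proc.kill()
                proc.wait()
                return "timeout"
            time.sleep(0.05)
    # A non-zero exit code means the child raised while unpickling.
    return "ok" if proc.returncode == 0 else "error"


# The 'args' field from the log entry, with \\n replaced by real newlines.
print(probe_pickle(b"(lp0\nS'centos5-i386-os'\np1\na."))
```

Running this over every string field of a snapshot would show which one, if
any, fails to load on its own. A field that reports "error" may simply refer to
a module the child cannot import.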

regards,
-ap



