[Pulp-list] 2.8 Tasks Stuck at Waiting

Brian Bouterse bbouters at redhat.com
Tue Apr 19 11:44:02 UTC 2016


All users reporting issues with stuck tasks have been resolved. There is
no evidence of a Pulp defect that this time. Given that, I am closing
the two bugs used to track the "stuck tasks" issues[0][1].

If you experience stuck tasks please do e-mail the list or open a new
bug, but be sure that you're running the latest python-kombu rpm as
discussed here[2].

[0]: https://pulp.plan.io/issues/1363
[1]: https://pulp.plan.io/issues/1838
[2]: https://www.redhat.com/archives/pulp-list/2016-April/msg00020.html

-Brian

On 04/15/2016 03:26 PM, Brian Bouterse wrote:
> I'm very interested in determining if there is a legitimate issue with
> the Pulp tasking system or not. I need more information from the users
> observing these issues.
> 
> If you observe a task that is stuck at Waiting please ping 'bmbouter' in
> #pulp and I will syncrhonously ask you for the info needed to get to the
> bottom of this.
> 
> FWIW, we are tracking this issue here[0], but the current reports look
> more like environmental issues and not legitimate Pulp bugs.
> 
> [0]: https://pulp.plan.io/issues/1838
> 
> -Brian
> 
> On 04/12/2016 04:14 PM, Paul Jochum wrote:
>> Slight correction:  (in the middle of my email):
>>
>> I did grep /var/log/httpd/ssl_access_log *for my ta**sk**-id
>> (**f3a4a828-1a98-4d30-989f-a8f62acc29f9)* in /var/log/* and
>> /var/log/*/*, but it only found it in /var/log/httpd/ssl_access_log and
>> /var/log/httpd/ssl_request_log (lines like:
>> /var/log/httpd/ssl_access_log:135.1.215.52 - - [12/Apr/2016:15:04:17
>> -0500] "GET /pulp/api/v2/tasks/f3a4a828-1a98-4d30-989f-a8f62acc29f9/
>> HTTP/1.1" 200 579
>>
>> regards,
>> Paul
>> On 04/12/2016 03:09 PM, EXT Paul Jochum wrote:
>>> Hi Brian:
>>>
>>>     I am running into the same problem of "waiting", so here are my
>>> outputs (let me know if you need anything else):
>>>
>>> -sh-4.1# pulp-admin -vv tasks details --task-id
>>> f3a4a828-1a98-4d30-989f-a8f62acc29f9
>>> +----------------------------------------------------------------------+
>>>                               Task Details
>>> +----------------------------------------------------------------------+
>>>
>>> 2016-04-12 14:59:46,539 - DEBUG - sending GET request to
>>> /pulp/api/v2/tasks/f3a4a828-1a98-4d30-989f-a8f62acc29f9/
>>> 2016-04-12 14:59:46,602 - INFO - GET request to
>>> /pulp/api/v2/tasks/f3a4a828-1a98-4d30-989f-a8f62acc29f9/ with
>>> parameters None
>>> 2016-04-12 14:59:46,602 - INFO - Response status : 200
>>>
>>> 2016-04-12 14:59:46,602 - INFO - Response body :
>>>  {
>>>   "exception": null,
>>>   "task_type": "pulp.server.managers.repo.sync.sync",
>>>   "_href": "/pulp/api/v2/tasks/f3a4a828-1a98-4d30-989f-a8f62acc29f9/",
>>>   "task_id": "f3a4a828-1a98-4d30-989f-a8f62acc29f9",
>>>   "tags": [
>>>     "pulp:repository:rhel-i386-server-optional-6.5.z-debuginfo",
>>>     "pulp:action:sync"
>>>   ],
>>>   "finish_time": null,
>>>   "_ns": "task_status",
>>>   "start_time": null,
>>>   "traceback": null,
>>>   "spawned_tasks": [],
>>>   "progress_report": {},
>>>   "queue": "None.dq",
>>>   "state": "waiting",
>>>   "worker_name": null,
>>>   "result": null,
>>>   "error": null,
>>>   "_id": {
>>>     "$oid": "570d1727e75b4adec6bd0d58"
>>>   },
>>>   "id": "570d1727e75b4adec6bd0d58"
>>> }
>>>
>>> Operations:       sync
>>> Resources:        rhel-i386-server-optional-6.5.z-debuginfo (repository)
>>> State:            Waiting
>>> Start Time:       Unstarted
>>> Finish Time:      Incomplete
>>> Result:           Incomplete
>>> Task Id:          f3a4a828-1a98-4d30-989f-a8f62acc29f9
>>> Progress Report:
>>>
>>>
>>> -sh-4.1# ps -awfux | grep celery
>>> Warning: bad syntax, perhaps a bogus '-'? See
>>> /usr/share/doc/procps-3.2.8/FAQ
>>> root     16147  0.0  0.0   6444   704 pts/2    S+   15:01 0:00 \_ grep
>>> celery
>>> apache     977  0.1  0.1 561608 25908 ?        Sl   Apr09   6:15
>>> /usr/bin/python /usr/bin/celery beat
>>> --app=pulp.server.async.celery_instance.celery
>>> --scheduler=pulp.server.async.scheduler.Scheduler
>>> --workdir=/var/run/pulp/ -f /var/log/pulp/celerybeat.log -l INFO
>>> --detach --pidfile=/var/run/pulp/celerybeat.pid
>>> apache    9558  0.0  0.4 553760 58972 ?        Sl   10:40   0:05
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n resource_manager at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO -Q resource_manager
>>> --logfile=/var/log/pulp/resource_manager.log
>>> --pidfile=/var/run/pulp/resource_manager.pid
>>> apache    9602  0.4  0.3 472928 50512 ?        Sl   10:40   1:16  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n resource_manager at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO -Q resource_manager
>>> --logfile=/var/log/pulp/resource_manager.log
>>> --pidfile=/var/run/pulp/resource_manager.pid
>>> apache    9792  0.0  0.4 553976 58924 ?        Sl   10:41   0:06
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-0 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-0.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-0.pid
>>> apache    9865  0.0  0.3 472916 49692 ?        S    10:41   0:00  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-0 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-0.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-0.pid
>>> apache    9807  0.0  0.4 553976 58920 ?        Sl   10:41   0:06
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-1 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-1.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-1.pid
>>> apache    9887  0.0  0.3 472916 49688 ?        S    10:41   0:00  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-1 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-1.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-1.pid
>>> apache    9826  0.0  0.4 553564 58896 ?        Sl   10:41   0:06
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-2 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-2.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-2.pid
>>> apache    9893  0.0  0.3 472924 49684 ?        S    10:41   0:00  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-2 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-2.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-2.pid
>>> apache    9847  0.0  0.4 553976 58912 ?        Sl   10:41   0:06
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-3 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-3.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-3.pid
>>> apache    9917  0.0  0.3 472916 49692 ?        S    10:41   0:00  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-3 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-3.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-3.pid
>>> apache    9871  0.0  0.4 553752 58896 ?        Sl   10:41   0:06
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-4 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-4.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-4.pid
>>> apache    9945  0.0  0.3 472920 49684 ?        S    10:41   0:00  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-4 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-4.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-4.pid
>>> apache    9896  0.0  0.4 553988 58932 ?        Sl   10:41   0:06
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-5 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-5.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-5.pid
>>> apache    9982  0.0  0.3 472924 49700 ?        S    10:41   0:00  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-5 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-5.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-5.pid
>>> apache    9918  0.0  0.4 553972 58916 ?        Sl   10:41   0:06
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-6 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-6.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-6.pid
>>> apache    9990  0.0  0.3 472912 49680 ?        S    10:41   0:00  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-6 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-6.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-6.pid
>>> apache    9940  0.0  0.4 553756 58892 ?        Sl   10:41   0:06
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-7 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-7.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-7.pid
>>> apache   10013  0.0  0.3 472920 49684 ?        S    10:41   0:00  \_
>>> /usr/bin/python -m celery.__main__ worker --heartbeat-interval=30 -c 1
>>> -n reserved_resource_worker-7 at lss-pulp01.ih.lucent.com --events
>>> --app=pulp.server.async.app --loglevel=INFO
>>> --logfile=/var/log/pulp/reserved_resource_worker-7.log
>>> --pidfile=/var/run/pulp/reserved_resource_worker-7.pid
>>> -sh-4.1#
>>>
>>>
>>> I did grep /var/log/httpd/ssl_access_log in /var/log/* and
>>> /var/log/*/*, but it only found it in /var/log/httpd/ssl_access_log
>>> and /var/log/httpd/ssl_request_log (lines like:
>>> /var/log/httpd/ssl_access_log:135.1.215.52 - - [12/Apr/2016:15:04:17
>>> -0500] "GET /pulp/api/v2/tasks/f3a4a828-1a98-4d30-989f-a8f62acc29f9/
>>> HTTP/1.1" 200 579
>>>
>>>
>>> -sh-4.1# rpm -q -a | grep -i pulp | sort
>>> mod_wsgi-3.4-2.pulp.el6.x86_64
>>> pulp-admin-client-2.8.1-1.el6.noarch
>>> pulp-puppet-admin-extensions-2.8.1-1.el6.noarch
>>> pulp-puppet-plugins-2.8.1-1.el6.noarch
>>> pulp-rpm-admin-extensions-2.8.1-1.el6.noarch
>>> pulp-rpm-plugins-2.8.1-1.el6.noarch
>>> pulp-selinux-2.8.1-1.el6.noarch
>>> pulp-server-2.8.1-1.el6.noarch
>>> python-isodate-0.5.0-4.pulp.el6.noarch
>>> python-kombu-3.0.33-4.pulp.el6.noarch
>>> python-pulp-bindings-2.8.1-1.el6.noarch
>>> python-pulp-client-lib-2.8.1-1.el6.noarch
>>> python-pulp-common-2.8.1-1.el6.noarch
>>> python-pulp-oid_validation-2.8.1-1.el6.noarch
>>> python-pulp-puppet-common-2.8.1-1.el6.noarch
>>> python-pulp-repoauth-2.8.1-1.el6.noarch
>>> python-pulp-rpm-common-2.8.1-1.el6.noarch
>>>
>>>
>>> -sh-4.1# uname -a
>>> Linux lss-pulp01.ih.lucent.com 2.6.32-042stab113.21 #1 SMP Wed Mar 23
>>> 11:05:25 MSK 2016 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>>
>>> -sh-4.1# more /etc/*release*
>>> ::::::::::::::
>>> /etc/redhat-release
>>> ::::::::::::::
>>> Scientific Linux release 6.7 (Carbon)
>>> ::::::::::::::
>>> /etc/system-release
>>> ::::::::::::::
>>> Scientific Linux release 6.7 (Carbon)
>>> ::::::::::::::
>>> /etc/system-release-cpe
>>> ::::::::::::::
>>> cpe:/o:redhat:enterprise_linux:6.7:ga
>>> -sh-4.1#
>>>
>>> On 04/11/2016 03:32 PM, Brian Bouterse wrote:
>>>> To anyone who has a stuck task running Pulp 2.6 or newer, can you open a
>>>> bug or e-mail the list with the following. The full output from:
>>>>
>>>> pulp-admin -vv tasks details --task-id <the_stuck_task_UUID>
>>>>
>>>> Post the full output, but observe the value of "worker_name" similar to
>>>> what was suggested here[0].
>>>>
>>>> Can you also include the `sudo ps -awfux | grep celery` output to show
>>>> your running process?
>>>>
>>>> Also can you grep the logs for that UUID and include any lines that you
>>>> see as output too.
>>>>
>>>> [0]: https://www.redhat.com/archives/pulp-list/2016-April/msg00026.html
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>>
>>>> On 04/11/2016 01:32 PM, Mihai Ibanescu wrote:
>>>>> I have a similar problem on 2.7.1 with rabbitmq.
>>>>>
>>>>> Occasionally a task will get "lost":
>>>>>
>>>>> pulp-admin tasks details --task-id 622041ac-e9e4-4a15-bd7c-7c98a17782e0
>>>>> +----------------------------------------------------------------------+
>>>>>
>>>>>                                Task Details
>>>>> +----------------------------------------------------------------------+
>>>>>
>>>>>
>>>>> Operations:       publish
>>>>> Resources:        thirdparty-snapshot-rpm-latest (repository)
>>>>> State:            Waiting
>>>>> Start Time:       Unstarted
>>>>> Finish Time:      Incomplete
>>>>> Result:           Incomplete
>>>>> Task Id:          622041ac-e9e4-4a15-bd7c-7c98a17782e0
>>>>> Progress Report:
>>>>>
>>>>>
>>>>> I have seen more than one getting in this state.
>>>>>
>>>>> I can cancel the task, but I'd like to find out if I can tickle
>>>>> something to re-process the queue. Other tasks do get correctly
>>>>> processed.
>>>>>
>>>>> If there's something you would like me to do in order to debug this
>>>>> further, please let me know.
>>>>>
>>>>> Thank you!
>>>>> Mihai
>>>>>
>>>>> On Fri, Apr 8, 2016 at 9:27 AM, Brian Bouterse <bbouters at redhat.com
>>>>> <mailto:bbouters at redhat.com>> wrote:
>>>>>
>>>>>      There is a known issue [0] introduced in 2.8.0 which will cause
>>>>> your
>>>>>      tasks to become stuck at waiting. This occurs after Pulp runs
>>>>> normally
>>>>>      for hours or days.
>>>>>
>>>>>      A fix is completed but won't be released until Pulp 2.8.3 is
>>>>> released.
>>>>>      The good news is, you can resolve that issue now by upgrading
>>>>>      python-kombu on your existing 2.8.0, 2.8.1, or 2.8.2 system.
>>>>> Manually
>>>>>      install python-kombu-3.0.33-5.pulp for your distro on your system,
>>>>>      restart Pulp services and you'll be all set. All of the links
>>>>> below come
>>>>>      out of the Pulp build system[1]
>>>>>
>>>>>      EL6: http://tinyurl.com/j5sy5zh
>>>>>      EL7: http://tinyurl.com/jppwcdv
>>>>>      F22: http://tinyurl.com/hp7z4sq
>>>>>      F23: http://tinyurl.com/zplzua3
>>>>>      F24: update is in -testing repo [2]
>>>>>      rawhide: already available in stable
>>>>>
>>>>>      These are the same versions of python-kombu that you'll receive
>>>>> when
>>>>>      2.8.3 comes out so upgrading this package will keep your
>>>>> packages on the
>>>>>      normal upgrade path. These newer python-kombu versions are
>>>>> compatible
>>>>>      with 2.8.0, 2.8.1, and 2.8.2 versions of Pulp.
>>>>>
>>>>>      [0]: https://pulp.plan.io/issues/1801
>>>>>      [1]: http://koji.katello.org/koji/packageinfo?packageID=665
>>>>>      [2]:
>>>>> https://bodhi.fedoraproject.org/updates/FEDORA-2016-ec038bbf19
>>>>>
>>>>>      -Brian
>>>>>
>>>>>      _______________________________________________
>>>>>      Pulp-list mailing list
>>>>>      Pulp-list at redhat.com <mailto:Pulp-list at redhat.com>
>>>>>      https://www.redhat.com/mailman/listinfo/pulp-list
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Pulp-list mailing list
>>>> Pulp-list at redhat.com
>>>> https://www.redhat.com/mailman/listinfo/pulp-list
>>>
>>> _______________________________________________
>>> Pulp-list mailing list
>>> Pulp-list at redhat.com
>>> https://www.redhat.com/mailman/listinfo/pulp-list
>>
> 
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list
> 




More information about the Pulp-list mailing list