[Pulp-list] Tasks stuck in waiting state

Brian Bouterse bbouters at redhat.com
Thu Sep 22 20:28:55 UTC 2016


When using Pulp with Qpid (the default broker), there is a hard-to-reproduce
deadlocking bug [0]. The bug is in Qpid, not Pulp, but we are very
interested in seeing it resolved.

Your stuck task operations will clear out naturally if all the Pulp
workers are killed and restarted. If it's really bad, you could run
`sudo pkill -9 -f celery`, which kills all Pulp workers.
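Killing "all pulp workers" by name means matching every process whose full command line contains `celery`, which is what `pkill -9 -f celery` does. Note that this also catches `celery beat`. If you want to see exactly what would be hit before pulling the trigger, here is a minimal Python sketch of the same idea: it scans /proc for matching command lines and sends SIGKILL. It is an illustration, not a Pulp tool; `find_pids` and `kill_matching` are hypothetical helpers, matching is a plain substring (pkill uses a regex), and it defaults to a dry run.

```python
import os
import signal


def find_pids(pattern, proc_root="/proc"):
    """Return sorted PIDs whose full command line contains `pattern`.

    Hypothetical helper approximating `pgrep -f`, but with substring
    matching instead of a regular expression.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # process exited meanwhile, or cmdline is unreadable
        # Arguments in cmdline are NUL-separated; join them with spaces.
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        if pattern in cmdline:
            pids.append(int(entry))
    return sorted(pids)


def kill_matching(pattern, sig=signal.SIGKILL, proc_root="/proc", dry_run=True):
    """Send `sig` to every matching process except ourselves.

    With dry_run=True (the default) nothing is signalled; the list of
    PIDs that would have been signalled is returned either way.
    """
    own_pid = os.getpid()
    targets = [pid for pid in find_pids(pattern, proc_root) if pid != own_pid]
    if not dry_run:
        for pid in targets:
            try:
                os.kill(pid, sig)
            except ProcessLookupError:
                pass  # already gone
    return targets
```

Usage: `kill_matching("celery")` lists the candidates; `kill_matching("celery", dry_run=False)` (run as root) actually sends SIGKILL. Afterwards the Pulp services still have to be started again.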

You could also cancel all outstanding tasks with pulp-admin and then
kill and restart the workers, at which point your system will be empty
once the processes finish starting. Note that deadlocked workers usually
need to be killed with SIGKILL before being restarted.
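With 50+ waiting tasks, cancelling them one at a time is tedious. Below is a minimal Python sketch that does it through Pulp 2's REST API instead of pulp-admin. It assumes that `GET /pulp/api/v2/tasks/` returns a list of task reports with `task_id` and `state` fields, that `DELETE /pulp/api/v2/tasks/<task_id>/` cancels a task, that HTTP Basic auth works for your admin user, and that "waiting", "accepted" and "running" are the states worth cancelling. The helper names are hypothetical; check the endpoints against the REST API docs for your Pulp version before relying on it.

```python
import base64
import json
import urllib.request

# Assumed set of not-yet-finished task states to cancel.
OUTSTANDING_STATES = {"waiting", "accepted", "running"}


def outstanding_task_ids(task_reports):
    """Pick the IDs of tasks that have not finished yet."""
    return [t["task_id"] for t in task_reports if t.get("state") in OUTSTANDING_STATES]


def _request(method, url, user, password, context=None):
    """Issue an authenticated request and decode a JSON body, if any."""
    req = urllib.request.Request(url, method=method)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, context=context) as resp:
        body = resp.read()
    return json.loads(body) if body else None


def cancel_outstanding(base_url, user, password, context=None):
    """List all tasks, cancel the outstanding ones, return their IDs."""
    tasks = _request("GET", base_url + "/pulp/api/v2/tasks/", user, password, context)
    ids = outstanding_task_ids(tasks)
    for task_id in ids:
        _request("DELETE", f"{base_url}/pulp/api/v2/tasks/{task_id}/",
                 user, password, context)
    return ids
```

Usage: `cancel_outstanding("https://localhost", "admin", "secret", context=ssl.create_default_context(cafile="/path/to/ca.crt"))`, with your own credentials and CA file. Then kill and restart the workers as described above, since a deadlocked worker will not process the cancellations by itself.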

Many users never experience this problem. The few who do usually
experience it again. Several devs have tried to reproduce it, but we
have not been able to.

The Qpid project is aware of the issue and is investigating. I believe
they have some RPMs that provide a new version of python-qpid
specifically patched for this issue. I'm waiting for them to produce
RPMs for the different distros so that affected users can evaluate
whether it resolves their issue.

One other option to be aware of: Pulp also supports RabbitMQ, and this
deadlocking issue has not been seen with it. See the docs and server.conf
for more info. FYI, Pulp releases are currently tested only against Qpid.
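To check which broker an install is actually configured for, here is a small Python sketch that reads the `broker_url` setting from the `[tasks]` section of server.conf (normally /etc/pulp/server.conf). Assumptions, not taken from this thread: that `[tasks]`/`broker_url` is the relevant setting, that a missing value means the default `qpid://localhost/`, and that an `amqp://` URL means RabbitMQ. `broker_kind` is a hypothetical helper.

```python
import configparser
from urllib.parse import urlparse

# Assumed default when broker_url is not set explicitly.
ASSUMED_DEFAULT_BROKER = "qpid://localhost/"


def broker_kind(conf_text):
    """Classify the configured Celery broker as 'qpid', 'rabbitmq' or 'unknown'."""
    # interpolation=None so '%' characters in URLs are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # fallback covers both a missing [tasks] section and a missing option.
    url = parser.get("tasks", "broker_url", fallback=ASSUMED_DEFAULT_BROKER)
    scheme = urlparse(url).scheme
    if scheme.startswith("qpid"):
        return "qpid"
    if scheme in ("amqp", "amqps", "pyamqp"):
        return "rabbitmq"  # assumption: amqp:// points at RabbitMQ
    return "unknown"
```

Usage: `broker_kind(open("/etc/pulp/server.conf").read())`. Switching brokers involves more than this one setting, so follow the docs for the actual change.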

[0]: https://issues.apache.org/jira/browse/QPID-7317

-Brian


On 09/21/2016 09:00 PM, Erinn Looney-Triggs wrote:
> I have 52 tasks that are stuck in a waiting state with nothing in a
> running state. I don't know much about pulp at this point, I am just
> fighting my way through satellite in an attempt to make it stable, but
> this looks a bit odd to me:
>
> pulp-admin -u admin -p  tasks list | grep -i waiting | wc -l
> 52
>
> pulp-admin -u admin -p tasks list --state running
> +----------------------------------------------------------------------+
>                                  Tasks
> +----------------------------------------------------------------------+
>
> No tasks found
>
> The tasks, with the exception of one are all unit_update operations, the
> remaining one is a sync operation.
>
> I have done many restarts of the pulp processes with no luck in clearing
> these out, I can kill them off of course, but I would prefer to know
> what is going on here. Also chances are very good this will happen again.
>
> Thanks,
> -Erinn
>
> The technical details:
> RHEL 7.2
>
> rpm -qa | grep pulp
> pulp-katello-1.0.1-1.el7sat.noarch
> rubygem-smart_proxy_pulp-1.2.2-1.el7sat.noarch
> python-pulp-repoauth-2.8.3.4-1.el7sat.noarch
> python-pulp-client-lib-2.8.3.4-1.el7sat.noarch
> pulp-docker-plugins-2.0.1.1-1.el7sat.noarch
> pulp-selinux-2.8.3.4-1.el7sat.noarch
> pulp-server-2.8.3.4-1.el7sat.noarch
> pulp-client-1.0-1.noarch
> python-pulp-common-2.8.3.4-1.el7sat.noarch
> pulp-rpm-admin-extensions-2.8.3.5-1.el7sat.noarch
> python-pulp-docker-common-2.0.1.1-1.el7sat.noarch
> pulp-ostree-plugins-1.1.1-2.el7sat.noarch
> pulp-puppet-plugins-2.8.3.3-1.el7sat.noarch
> python-pulp-bindings-2.8.3.4-1.el7sat.noarch
> python-isodate-0.5.0-4.pulp.el7sat.noarch
> python-pulp-streamer-2.8.3.4-1.el7sat.noarch
> python-pulp-oid_validation-2.8.3.4-1.el7sat.noarch
> python-pulp-agent-lib-2.8.3.4-1.el7sat.noarch
> python-pulp-ostree-common-1.1.1-2.el7sat.noarch
> pulp-rpm-handlers-2.8.3.5-1.el7sat.noarch
> pulp-admin-client-2.8.3.4-1.el7sat.noarch
> pulp-rpm-plugins-2.8.3.5-1.el7sat.noarch
> python-pulp-rpm-common-2.8.3.5-1.el7sat.noarch
> python-pulp-puppet-common-2.8.3.3-1.el7sat.noarch
> pulp-puppet-tools-2.8.3.3-1.el7sat.noarch
>
> ps -awfux | grep celery
> root      65959  0.0  0.0 112648   972 pts/0    S+   18:57   0:00  |
>               \_ grep --color=auto celery
> apache    52282  0.1  0.0 685240 63396 ?        Ssl  18:46   0:00
> /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n
> resource_manager@%h -Q resource_manager -c 1 --events --umask 18
> --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
> apache    52406  0.0  0.0 595524 53092 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -A pulp.server.async.app -n
> resource_manager@%h -Q resource_manager -c 1 --events --umask 18
> --pidfile=/var/run/pulp/resource_manager.pid --heartbeat-interval=30
> apache    52424  0.1  0.0 685240 63272 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-0.pid
> --heartbeat-interval=30
> apache    52692  0.0  0.0 610828 56536 ?        Sl   18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-0@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-0.pid
> --heartbeat-interval=30
> apache    52426  0.1  0.0 684664 63404 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-1.pid
> --heartbeat-interval=30
> apache    52714  0.0  0.0 595524 53052 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-1@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-1.pid
> --heartbeat-interval=30
> apache    52428  0.1  0.0 684668 63428 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-2.pid
> --heartbeat-interval=30
> apache    52715  0.0  0.0 595528 53056 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-2@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-2.pid
> --heartbeat-interval=30
> apache    52430  0.1  0.0 684660 63236 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-3.pid
> --heartbeat-interval=30
> apache    52745  0.0  0.0 595520 53072 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-3@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-3.pid
> --heartbeat-interval=30
> apache    52432  0.1  0.0 684664 63224 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-4.pid
> --heartbeat-interval=30
> apache    52749  0.0  0.0 595520 53096 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-4@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-4.pid
> --heartbeat-interval=30
> apache    52434  0.1  0.0 684668 63388 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-5.pid
> --heartbeat-interval=30
> apache    52750  0.0  0.0 595528 53056 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-5@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-5.pid
> --heartbeat-interval=30
> apache    52436  0.1  0.0 684660 65480 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-6.pid
> --heartbeat-interval=30
> apache    52724  0.0  0.0 595524 55092 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-6@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-6.pid
> --heartbeat-interval=30
> apache    52440  0.1  0.0 684664 63364 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-7.pid
> --heartbeat-interval=30
> apache    52720  0.0  0.0 595524 53088 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-7@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-7.pid
> --heartbeat-interval=30
> apache    52444  0.1  0.0 684664 63248 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-8@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-8.pid
> --heartbeat-interval=30
> apache    52747  0.0  0.0 595524 53080 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-8@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-8.pid
> --heartbeat-interval=30
> apache    52453  0.1  0.0 684684 65432 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-9@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-9.pid
> --heartbeat-interval=30
> apache    52752  0.0  0.0 595516 53060 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-9@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-9.pid
> --heartbeat-interval=30
> apache    52459  0.1  0.0 684696 63416 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-10@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-10.pid
> --heartbeat-interval=30
> apache    52725  0.0  0.0 595524 53056 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-10@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-10.pid
> --heartbeat-interval=30
> apache    52468  0.1  0.0 684696 63400 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-11@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-11.pid
> --heartbeat-interval=30
> apache    52716  0.0  0.0 595524 53044 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-11@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-11.pid
> --heartbeat-interval=30
> apache    52472  0.1  0.0 684688 63416 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-12@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-12.pid
> --heartbeat-interval=30
> apache    52729  0.0  0.0 595516 53044 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-12@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-12.pid
> --heartbeat-interval=30
> apache    52479  0.1  0.0 684692 63424 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-13@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-13.pid
> --heartbeat-interval=30
> apache    52722  0.0  0.0 595520 53084 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-13@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-13.pid
> --heartbeat-interval=30
> apache    52486  0.1  0.0 685272 63432 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-14@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-14.pid
> --heartbeat-interval=30
> apache    52708  0.0  0.0 669764 54612 ?        Sl   18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-14@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-14.pid
> --heartbeat-interval=30
> apache    52491  0.1  0.0 684652 63360 ?        Ssl  18:46   0:01
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-15@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-15.pid
> --heartbeat-interval=30
> apache    52731  0.0  0.0 595516 53040 ?        S    18:46   0:00  \_
> /usr/bin/python /usr/bin/celery worker -n reserved_resource_worker-15@%h
> -A pulp.server.async.app -c 1 --events --umask 18
> --pidfile=/var/run/pulp/reserved_resource_worker-15.pid
> --heartbeat-interval=30
> apache    52570  0.7  0.0 690292 44016 ?        Ssl  18:46   0:05
> /usr/bin/python /usr/bin/celery beat
> --app=pulp.server.async.celery_instance.celery
> --scheduler=pulp.server.async.scheduler.Scheduler
>