[Pulp-list] REST API performance-related question(s)

Deej Howard Deej.Howard at neulion.com
Mon Dec 18 22:44:16 UTC 2017


                Hoping to follow up on my own questions, I tried the cProfile
functionality[3] against a new run of my cleanup script (profiling was
enabled within the Apache/Pulp container to get data on the server side).
This run had an even more isolated run-time data set, with only a single
invocation each of the “unassociate”, “orphans”, and “publish” operations
(taking 15.22, 10.28, and 60.59 seconds, respectively), and I did end up
with cProfile data for each of these 3 tasks.  This is a very nice feature,
and I’ll bet that data would be really useful to someone much more familiar
with Pulp than I am, but I haven’t yet managed to make much use of it or to
understand how it is affected by my specific repository configuration.
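For anyone unsure what to do with such profile files: Python's standard `pstats` module can load a cProfile dump and list the functions with the most cumulative time, which is usually the first thing worth looking at. A minimal sketch, assuming the per-task files Pulp writes are ordinary cProfile dumps (the directory and file naming Pulp uses are not confirmed here, and `summarize_profile` is a hypothetical helper name):

```python
import io
import pstats


def summarize_profile(path, limit=20, sort_key="cumulative"):
    """Return the top `limit` rows of a cProfile dump as text.

    Assumption: `path` points at one of the per-task profile files
    produced by Pulp's profiling option; its location is not verified.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(path, stream=buffer)  # write report to buffer
    stats.strip_dirs()          # shorten long site-packages paths
    stats.sort_stats(sort_key)  # "cumulative" surfaces slow call trees
    stats.print_stats(limit)    # keep only the top `limit` rows
    return buffer.getvalue()
```

Calling `print(summarize_profile("<profile file>"))` for each of the three task profiles gives a side-by-side view; sorting by `"tottime"` instead shows time spent inside each function itself rather than in its callees.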

                Still looking forward to some insights from the experts.



[3] https://docs.pulpproject.org/dev-guide/debugging.html



*From:* Deej Howard [mailto:Deej.Howard at neulion.com]
*Sent:* Friday, December 15, 2017 10:44 AM
*To:* pulp-list <pulp-list at redhat.com>
*Cc:* deej.howard at neulion.com
*Subject:* REST API performance-related question(s)



                Hi, I’m using the 2.14.3 release in a Docker-based
configuration (details below), and I’ve noticed some performance-related
issues in a script-based artifact cleanup job that is run on a daily
basis.  The artifacts in question are of our own construction, incorporated
via the Pulp plugin mechanism, and all residing in a single repository
(there are around 22K artifacts in that one repo at this point).  The
Python script makes various Pulp REST API calls, and I’ve put in some extra
code to report how long each call takes.  The “query” calls perform
acceptably (typically under a second), but others are much slower: calls to
“unassociate” and “orphans” take around 10 seconds, and calls to “publish”
take around 45 seconds.
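What these timings measure depends on what the script waits for. As far as I understand the Pulp 2 REST API, "unassociate", orphan removal, and "publish" are asynchronous: the HTTP call returns a call report listing spawned tasks, and the actual work happens in a worker. A hedged sketch of timing a call through to task completion; `wait_for_tasks` and `fetch_task` are hypothetical names (the caller supplies `fetch_task`, which would GET /pulp/api/v2/tasks/<task_id>/), and the set of final task states is an assumption to check against the server's documentation:

```python
import time

# Assumed terminal task states for Pulp 2; verify for your version.
FINAL_STATES = {"finished", "error", "canceled", "skipped"}


def wait_for_tasks(call_report, fetch_task, poll_interval=0.5, timeout=600.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll every task spawned by an asynchronous call until it ends.

    `call_report` is the decoded JSON of the 202 response, assumed to hold
    a "spawned_tasks" list of {"task_id": ...} entries. `fetch_task` is a
    hypothetical caller-supplied function returning a task's decoded JSON.
    Returns (elapsed_seconds, {task_id: final_state}).
    """
    start = clock()
    pending = [t["task_id"] for t in call_report.get("spawned_tasks", [])]
    results = {}
    while pending:
        for task_id in list(pending):  # copy: we remove while iterating
            state = fetch_task(task_id).get("state")
            if state in FINAL_STATES:
                results[task_id] = state
                pending.remove(task_id)
        if pending:
            if clock() - start > timeout:
                raise TimeoutError(f"tasks still pending: {pending}")
            sleep(poll_interval)
    return clock() - start, results
```

Injecting `clock` and `sleep` keeps the helper testable without a live server; the task's own report may also carry start and finish timestamps, which would separate queueing time from run time.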

                I’m looking for some guidance on how I can improve this
performance.  I’m not the original author of this code, but I was lucky(?)
enough to inherit it.  The core algorithm runs some queries to get the
essential “keys” for the artifacts in question, calls “unassociate” with the
relevant JSON payload for those artifacts, then calls “orphans” to do the
actual clean-up, and finally calls “publish” once that completes.  This cycle
may run multiple times within the cleanup script, once per “grouped
artifact” batch.
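The cycle described above can be written down as a plan of REST requests. This is a sketch, not the inherited script: `cleanup_plan`, the `"$or"` unit filter, and the per-type orphan endpoint are assumptions based on the Pulp 2 API as I recall it, and the function only builds requests without sending them. The hypothetical `batch=True` mode illustrates one alternative relevant to the first question below: unassociate every group first, then purge orphans and publish once.

```python
# Assumed Pulp 2 REST prefix; check paths against the 2.14 API docs.
API = "/pulp/api/v2"


def cleanup_plan(repo_id, distributor_id, type_id, groups, batch=False):
    """Return (method, path, body) tuples for one cleanup run.

    `groups` is a list of lists of unit filters (dicts matched against
    unit metadata), one inner list per "grouped artifact" batch.
    batch=False mirrors the per-group cycle; batch=True is a hypothetical
    variant that defers orphan removal and publishing to the end.
    """
    unassociate = f"{API}/repositories/{repo_id}/actions/unassociate/"
    orphans = f"{API}/content/orphans/{type_id}/"
    publish = f"{API}/repositories/{repo_id}/actions/publish/"
    plan = []
    for group in groups:
        criteria = {"type_ids": [type_id], "filters": {"unit": {"$or": group}}}
        plan.append(("POST", unassociate, {"criteria": criteria}))
        if not batch:
            # Per-group cycle: clean up and republish after every group.
            plan.append(("DELETE", orphans, None))
            plan.append(("POST", publish, {"id": distributor_id}))
    if batch and groups:
        # Hypothetical variant: one orphan purge and one publish per run.
        plan.append(("DELETE", orphans, None))
        plan.append(("POST", publish, {"id": distributor_id}))
    return plan
```

With N groups the per-group cycle issues N orphan purges and N publishes, while the batched variant issues one of each. Whether that is acceptable depends on whether each group's removal has to be published immediately.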

                Some specific questions I have:

   - Is the methodology outlined above appropriate for removing artifacts
   from a repository, or would some other mechanism be better/more efficient?
   - In the documentation for implementing support for new types[1], there
   is mention of a type definition JSON file that belongs in
   /usr/lib/pulp/plugins/types[2].  Unfortunately, it’s not clear which of
   the Pulp components (Qpid?  MongoDB?  Resource manager?  Workers?) use
   that information, and our installation has no files at all in that
   directory.  We have other repo types installed (puppet, python), so I
   would have expected at least one such file, especially since
   puppet_module is the example given in the documentation.  This sounds
   like it could improve performance by adding search indexes or similar
   shortcuts.  Where can I find more details about this and/or more
   extensive examples?
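For reference, the shape of such a type definition file, modeled on the puppet_module example in [2] as I recall it. Everything specific here (the `company_zip` id, the unit key fields, the `search_indexes` entries, the keys the sketch treats as required, and the helper name `render_type_def`) is a placeholder assumption, and this does not settle whether Pulp 2.14 still reads files from /usr/lib/pulp/plugins/types:

```python
import json

# Hypothetical type definition for the company ZIP artifacts. Field names
# follow the documented puppet_module example as recalled; all values are
# placeholders, not taken from a real plugin.
TYPE_DEF = {
    "types": [
        {
            "id": "company_zip",
            "display_name": "Company ZIP Artifact",
            "description": "ZIP file with tracking metadata",
            "unit_key": ["name", "version"],
            "search_indexes": ["name", "build_date"],
        }
    ]
}

# Keys this sketch insists on; an assumption, not Pulp's own validation.
REQUIRED_KEYS = {"id", "display_name", "description", "unit_key"}


def render_type_def(type_def=TYPE_DEF):
    """Check each entry for the assumed required keys and return JSON text."""
    for entry in type_def["types"]:
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"{entry.get('id')}: missing {sorted(missing)}")
    return json.dumps(type_def, indent=2, sort_keys=True)
```

The `search_indexes` list is where the documentation's indexing idea would live; whether those indexes are actually created in MongoDB for a plugin-defined type is the open question above.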



[1] https://docs.pulpproject.org/dev-guide/newtypesupport/plugin/example.html

[2] https://docs.pulpproject.org/dev-guide/newtypesupport/plugin/type_defs.html

Environment Details

   - Pulp 2.14.3 using Docker containers based on CentOS 7: one Apache/Pulp
   API container, one Qpid message broker container, one MongoDB container,
   one Celery worker management container, one resource manager/task
   assignment container, and two Pulp worker containers.  All containers are
   running within a single Docker host, dedicated to only Pulp-related
   operations.  The diagram at
   http://docs.pulpproject.org/en/2.14/user-guide/scaling.html was used as
   a guide for this setup.
   - Artifacts are company-proprietary (configured as a Pulp plugin), but
   essentially are a single ZIP file with attached metadata for tracking and
   management purposes.