[Pulp-list] REST API performance-related question(s)

Brian Bouterse bbouters at redhat.com
Mon Dec 18 22:58:45 UTC 2017


Is there some way you can post the cProfile data? If you don't have a place
to post it, one option is to attach it to an issue filed at pulp.plan.io.

Consider using one of these tools [4] to look for the line of Pulp code
that has the longest cumulative time. The tools support sorting, so that
should be easy. Then, if you look at that part of the Pulp code, you can get
an idea of how your task is spending its time. What I usually find is that
Pulp is waiting on either the disk or the database, and the cProfile report
can show you that. For example, if you're waiting on MongoDB, you'll see a lot
of time attributed to lines in the Mongo client code. That means Pulp calls a
PyMongo method like read() and then just waits for several seconds. That
would be a "waiting on the DB" issue, and you can observe it with
cProfile. Once you know where the issue is, we can talk about ways to
improve it. Even in cases of DB wait, there may be a way to restructure
the code to, for example, read less from the database, so there are still
things that can be done.
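The sorting step can be tried with nothing but the standard library. The sketch below is an illustration, not something Pulp ships: it loads one cProfile dump with `pstats`, lists the entries with the largest cumulative time, and sums self time (tottime) per package so that time spent inside PyMongo or mongoengine stands out. It reads the `Stats.stats` dictionary, an undocumented but long-standing attribute of `pstats.Stats`, and the package names are assumptions to adjust. It avoids f-strings so it also runs under the Python 2.7 that Pulp 2 uses; that matters because the `pstats` documentation does not guarantee that a dump can be read by a different Python version than the one that wrote it.

```python
import pstats
from collections import defaultdict


def _load(profile_path):
    # Stats.stats maps (filename, lineno, funcname) ->
    # (primitive calls, total calls, tottime, cumtime, callers).
    # Undocumented, but it has this shape in both Python 2.7 and 3.x.
    return pstats.Stats(profile_path).stats


def top_cumulative(profile_path, limit=20):
    """Return up to `limit` (cumtime, "file:line(function)") pairs, largest first."""
    rows = []
    for (filename, lineno, funcname), (_cc, _nc, _tt, ct, _callers) in _load(profile_path).items():
        rows.append((ct, "%s:%d(%s)" % (filename, lineno, funcname)))
    rows.sort(reverse=True)
    return rows[:limit]


def self_time_by_package(profile_path, packages=("pymongo", "mongoengine", "pulp")):
    """Sum self time (tottime) per package, matched as a substring of the file path.

    The first matching package wins, so list more specific names first;
    functions matching none of them are counted under "other".
    """
    totals = defaultdict(float)
    for (filename, _lineno, _funcname), (_cc, _nc, tt, _ct, _callers) in _load(profile_path).items():
        bucket = next((p for p in packages if p in filename), "other")
        totals[bucket] += tt
    return dict(totals)
```

If `self_time_by_package("/path/to/task_profile")` (a placeholder path) attributes most of the time to "pymongo", that points at the "waiting on the DB" case described above; a large "other" bucket with built-in I/O calls near the top of `top_cumulative` suggests disk waits instead.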

If you post the data, maybe someone can help root cause the performance
issue.

[4] https://docs.pulpproject.org/dev-guide/debugging.html#analyzing-profiles

-Brian


On Mon, Dec 18, 2017 at 5:44 PM, Deej Howard <Deej.Howard at neulion.com>
wrote:

>                 Hoping to follow up on my own questions, I attempted to
> take advantage of the cProfile functionality [3] against a new run of my
> cleanup script (profiling was enabled within the Apache/Pulp container to
> get data on the server side). This run had an even more isolated run-time
> data set, with only a single invocation of each of the “unassociate”,
> “orphans”, and “publish” operations (clocking in at 15.22, 10.28, and 60.59
> seconds, respectively), and I did in fact end up with cProfile data
> for each of these 3 tasks. This is a very nice feature, and I’ll bet that
> data would be really useful to someone much more familiar with Pulp
> than I am, but I haven’t yet managed to make much use of it or to
> understand how it relates to my specific repository configuration.
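With one profile dump per task, a quick first step is to compare the dumps against each other before digging into any one of them. The hypothetical helper below (not part of Pulp) totals self time and call counts for every file in a directory and ranks them. Point it at wherever your profiling configuration writes its files; the optional `pattern` argument exists because the file naming is an assumption here.

```python
import glob
import os
import pstats


def summarize_profiles(directory, pattern="*"):
    """Return (file name, total self seconds, total calls) for each profile dump,
    ordered by total self time, largest first."""
    summary = []
    for path in glob.glob(os.path.join(directory, pattern)):
        if not os.path.isfile(path):
            continue
        stats = pstats.Stats(path)
        # Each value is (primitive calls, total calls, tottime, cumtime, callers).
        total_tt = sum(v[2] for v in stats.stats.values())
        total_calls = sum(v[1] for v in stats.stats.values())
        summary.append((os.path.basename(path), total_tt, total_calls))
    summary.sort(key=lambda row: row[1], reverse=True)
    return summary
```

A task whose call count is far out of proportion to the number of artifacts involved (for example, hundreds of thousands of calls to unassociate one group) is usually a better lead than raw seconds alone.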
>
>                 Still looking forward to some insights from the experts.
>
>
>
> [3] https://docs.pulpproject.org/dev-guide/debugging.html
>
>
>
> *From:* Deej Howard [mailto:Deej.Howard at neulion.com]
> *Sent:* Friday, December 15, 2017 10:44 AM
> *To:* pulp-list <pulp-list at redhat.com>
> *Cc:* deej.howard at neulion.com
> *Subject:* REST API performance-related question(s)
>
>
>
>                 Hi, I’m using the 2.14.3 release in a Docker-based
> configuration (details below), and I’ve noticed some performance-related
> issues in a script-based artifact cleanup job that runs daily.
> The artifacts in question are of our own construction, incorporated
> via the Pulp plugin mechanism, and all reside in a single repository
> (there are around 22K artifacts in that one repo at this point). The
> Python script makes various Pulp REST API calls, and I’ve added some extra
> code to report how much time each call takes. The “query”
> calls have acceptable performance (typically less than a second), but
> others are much slower: calls to “unassociate” and
> “orphans” take around 10s, and calls to “publish” take around 45s.
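One detail worth checking in the timing code: as I understand the Pulp 2 REST API, operations such as unassociate, orphan removal and publish are asynchronous. The request returns a call report listing `spawned_tasks`, and the real work happens afterwards in a worker, so timing only the HTTP request can under-count. A minimal sketch, assuming the script does not already wait, that times a call and polls until every spawned task reaches a final state; `get_task_state` is a hypothetical caller-supplied function (for example, a GET on `/pulp/api/v2/tasks/<task_id>/` returning the `state` field), and the set of final states is an assumption to confirm against real task reports.

```python
import time

# Task states treated as final. This set is an assumption based on the Pulp 2
# task documentation; confirm it against your server's task reports.
FINAL_STATES = frozenset(["finished", "error", "canceled", "skipped"])


def time_call(func, *args, **kwargs):
    """Run func(*args, **kwargs); return (result, elapsed wall-clock seconds)."""
    start = time.time()  # time.perf_counter() is preferable if only Python 3 matters
    result = func(*args, **kwargs)
    return result, time.time() - start


def wait_for_tasks(call_report, get_task_state, poll_interval=0.5, timeout=600.0):
    """Poll each task listed in call_report["spawned_tasks"] until it is final.

    get_task_state(task_id) is a hypothetical caller-supplied function.
    Returns a dict mapping task_id to its final state; raises RuntimeError
    if any task is still not final after `timeout` seconds.
    """
    deadline = time.time() + timeout
    pending = [task["task_id"] for task in call_report.get("spawned_tasks", [])]
    final = {}
    while pending:
        for task_id in list(pending):
            state = get_task_state(task_id)
            if state in FINAL_STATES:
                final[task_id] = state
                pending.remove(task_id)
        if pending:
            if time.time() > deadline:
                raise RuntimeError("tasks still running: %s" % ", ".join(pending))
            time.sleep(poll_interval)
    return final
```

Wrapping `wait_for_tasks` in `time_call` gives a figure that includes the time each task spent queued and running in a worker, which is the number that matters when comparing unassociate, orphans and publish.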
>
>                 I’m looking for some guidance on how I can improve this
> performance. I’m not the original author of this code, but I was lucky(?)
> enough to inherit it. The core algorithm runs some queries to
> get the essential “keys” for the artifacts in question, then calls
> “unassociate” with the relevant JSON payload for those artifacts, followed
> by “orphans” to do the actual clean-up, then “publish” after that
> completes. This cycle potentially runs multiple times
> within the cleanup script (once per group of artifacts).
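The cycle described above can be sketched as follows. Everything here is hypothetical: `client` stands for whatever wraps the REST calls (each method is assumed to block until its Pulp task finishes, for example by polling the task status), and `type_id`/`key_field` are placeholders for the proprietary unit type. The unassociate body follows the Pulp 2 `criteria` shape with a MongoDB-style `$in` filter on the unit's key field. The optional `batch_orphans_and_publish` flag shows one variant worth measuring: orphan removal cleans up orphans across the whole server and publish rebuilds the whole repository, so running them once after all groups should reach the same end state with fewer slow calls, provided nothing depends on the intermediate publishes.

```python
def build_unassociate_body(type_id, key_field, keys):
    """Build an unassociate request body in the Pulp 2 "criteria" shape.

    Selects units of type_id whose key_field is one of keys (MongoDB "$in").
    type_id and key_field are hypothetical placeholders.
    """
    return {
        "criteria": {
            "type_ids": [type_id],
            "filters": {"unit": {key_field: {"$in": sorted(keys)}}},
        }
    }


def cleanup(client, groups, type_id, key_field, batch_orphans_and_publish=False):
    """Run unassociate -> orphans -> publish for each group of artifact keys.

    client is a hypothetical wrapper with unassociate(body), remove_orphans()
    and publish() methods, each assumed to block until its Pulp task finishes.
    With batch_orphans_and_publish=True, orphan removal and publish run once,
    after every group has been unassociated.
    """
    groups = [list(keys) for keys in groups]  # allow generators; check emptiness later
    for keys in groups:
        client.unassociate(build_unassociate_body(type_id, key_field, keys))
        if not batch_orphans_and_publish:
            client.remove_orphans()
            client.publish()
    if batch_orphans_and_publish and groups:
        client.remove_orphans()
        client.publish()
```

With N groups, the per-group form makes N orphan and N publish calls; the batched form makes one of each, which is where most of the ~10s and ~45s per cycle would be saved if the timings hold.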
>
>                 Some specific questions I have:
>
>    - Is the methodology outlined above appropriate for removing artifacts
>    from a repository, or would some other mechanism be better/more efficient?
>    - In the documentation for implementing support for new types [1],
>    there is mention of a type definition JSON file that belongs in
>    /usr/lib/pulp/plugins/types [2]. Unfortunately, it’s not clear which of
>    the Pulp components (Qpid? MongoDB? Resource manager? Workers?) uses that
>    information, and our installation has no files at all in that
>    directory. We have other repo types installed (puppet, python),
>    so I would have expected at least one such file, especially given that
>    puppet_module is the example used in the documentation. This
>    sounds like it could improve performance by adding
>    search indexes or other such shortcuts. Where can I find more details
>    about this and/or more extensive examples?
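For reference, a type definition file in the format described on the type_defs page [2] is a JSON document with a top-level `types` list, where `search_indexes` names additional unit fields to index in the database. The sketch below builds such a document; the field names are as I recall them from that page and should be checked there, and `zip_artifact`, `build_date` and the other values are hypothetical. Whether Pulp 2.14 still reads these files for a given plugin is a separate question that this does not answer.

```python
import json


def make_type_definition(type_id, display_name, unit_key, search_indexes, description=None):
    """Return a dict shaped like a Pulp 2 type definition file.

    Field names follow the type_defs documentation as recalled; verify them.
    All values are caller-supplied placeholders.
    """
    if not unit_key:
        raise ValueError("unit_key must name at least one field")
    return {
        "types": [
            {
                "id": type_id,
                "display_name": display_name,
                "description": description or display_name,
                "unit_key": list(unit_key),
                "search_indexes": list(search_indexes),
                "referenced_types": [],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical type for a ZIP artifact with tracking metadata.
    definition = make_type_definition(
        "zip_artifact", "ZIP Artifact", ["name", "version"], ["build_date"])
    print(json.dumps(definition, indent=4, sort_keys=True))
```

The fields worth listing in `search_indexes` would be the ones the cleanup script's queries filter on, since those are the lookups an index could speed up.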
>
>
>
> [1] https://docs.pulpproject.org/dev-guide/newtypesupport/plugin/example.html
>
> [2] https://docs.pulpproject.org/dev-guide/newtypesupport/plugin/type_defs.html
>
> Environment Details
>
>    - Pulp 2.14.3 using Docker containers based on CentOS 7: one
>    Apache/Pulp API container, one Qpid message broker container, one MongoDB
>    container, one Celery worker management container, one resource
>    manager/task assignment container, and two Pulp worker containers. All
>    containers run on a single Docker host dedicated to
>    Pulp-related operations. The diagram at
>    http://docs.pulpproject.org/en/2.14/user-guide/scaling.html was used
>    as a guide for this setup.
>    - Artifacts are company-proprietary (configured as a Pulp plugin), but
>    each is essentially a single ZIP file with attached metadata for tracking and
>    management purposes.
>
>
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list
>

