[katello-devel] Content views operations optimization
Justin Sherrill
jsherril at redhat.com
Fri Jun 14 13:08:26 UTC 2013
On 06/14/2013 05:56 AM, Ivan Necas wrote:
> Hi,
>
> I've started using content views quite heavily recently, and waiting for
> various operations to finish made me wonder whether the waiting is really
> necessary for most of the operations.
>
> I might be wrong, but it seems to me that the only expensive operation
> should be publishing a new version of a content view: at that point
> new repositories are created with content that did not exist anywhere before.
>
> Other operations with content views don't modify any content:
>
> Content view promotion
> ----------------------
> We basically just copy existing repos without changing their content,
> therefore:
>
> * computing metadata is useless, as we have the very same metadata
> already in the original repositories of the content view version
I can't really speak to this, as it is done in pulp, but the only
situation where you could simply 'copy' the metadata would be if the
destination repo were empty and we copied everything with a single
call. There is no linkage between two repos in pulp except during a
copy operation, so pulp wouldn't necessarily know that two repos are
exactly the same unless the above occurred or it checked against all
repos. For performance reasons we have to copy units individually
(rpm, errata, etc...) and for rpms specify distinct fields for the copy
operation.
> * indexing the repositories is useless as it should be just the same as the
> index for the original repositories of the content view version
It's not the same :) We use the repoids field on packages & errata to
make searching useful. Without it we won't know which packages are in
which repos. We could potentially investigate fetching the data from
Elastic Search, modifying the repoid list ourselves and updating it
within Elastic Search. I'm not sure if that would be faster or not. My
guess is that it might be faster (simply because fetching data via ES is
faster) but we would have to test it to see.
Another option might be to use the update feature of ES
(http://www.elasticsearch.org/guide/reference/api/update/). My guess is
that would be much, much faster.
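
A minimal sketch of what such a partial update might look like, assuming the
indexed package documents carry a list field named repoids (as described
above) and that the server accepts a script with parameters in the style of
the update documentation linked above. The package document, the repo ids and
the helper names are hypothetical, and nothing here talks to a real server; it
only builds the request body and simulates the intended effect locally.

```python
import json


def build_repoid_update(new_repo_id):
    """Build a request body that appends one repo id to a document.

    Assumption: the document has a list field "repoids" and the update
    endpoint takes a script plus params, as in the linked update docs.
    """
    return {
        "script": "ctx._source.repoids += repoid",
        "params": {"repoid": new_repo_id},
    }


def apply_locally(doc, new_repo_id):
    """Simulate the effect of the update on an in-memory copy of the document."""
    updated = dict(doc)
    updated["repoids"] = list(doc.get("repoids", [])) + [new_repo_id]
    return updated


# Hypothetical package document and repo ids, for illustration only.
package = {"name": "bash", "repoids": ["view1-library"]}
body = build_repoid_update("view1-dev")

print(json.dumps(body, sort_keys=True))
print(apply_locally(package, "view1-dev")["repoids"])
```

The point of the sketch is that a promotion would only send the new repo id per
package or erratum instead of fetching and re-indexing the whole document.
Whether that is actually faster would still have to be measured.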
>
> Composite content view publishing
> ---------------------------------
>
> I wonder what operations really need to be performed here? It seems to me
> that it just references the sub-content-views, not bearing any info about the
> content itself (guessing from the fact that promotion of a composite means promoting
> the sub-content-views). Still, it takes 10 minutes to publish it with real-world repos
> (RHEL, EPEL, katello)
Due to the way subscription-manager & candlepin work, we are unable to
point a system to repos in two different candlepin environments. A
system can only know a) Organization, b) Candlepin Environment, c)
Content path, and it assembles the url from those:
http://HOSTNAME/pulp/repos/ORG/CP_Environment/Content_Path
The Candlepin Environment in this case is the Katello Environment &
Content View combination. So a system cannot point to
/ACME/Dev/View1/ContentA and /ACME/Dev/View2/ContentB at the same time.
If it could, we could probably get away without content views. So we
compromised and did composite views. One option that we could
investigate would be to reuse a pulp repo within pulp for the composite
and its component, and just publish via 2 yum distributors to two
different paths. That may complicate the code greatly, but could be
worth investigating. It would still require a publish to occur twice
though, so you are really only saving the repo creation and unit copy
aspects.
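
To illustrate the url scheme described above, here is a small sketch under
stated assumptions: the function names and the hostname are made up, and the
Candlepin Environment is modeled simply as "Katello Environment/Content View",
matching the /ACME/Dev/View1/ContentA example.

```python
def candlepin_environment(katello_env, content_view):
    """Combine Katello environment and content view (hypothetical helper)."""
    return "{0}/{1}".format(katello_env, content_view)


def content_url(hostname, org, cp_environment, content_path):
    """Assemble http://HOSTNAME/pulp/repos/ORG/CP_Environment/Content_Path."""
    parts = [org, cp_environment, content_path]
    path = "/".join(p.strip("/") for p in parts)
    return "http://{0}/pulp/repos/{1}".format(hostname, path)


# A system is bound to exactly one Candlepin environment, so every url
# it assembles shares the same ORG/CP_Environment prefix.
cp_env = candlepin_environment("Dev", "View1")
urls = [
    content_url("katello.example.com", "ACME", cp_env, content)
    for content in ["ContentA", "ContentB"]
]
for url in urls:
    print(url)
```

Because the environment part is fixed per system, there is no way in this
scheme to produce a View1 url and a View2 url for the same system, which is the
gap composite views fill.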
I do agree that it takes far too long to publish/promote a content view
with a very large repo, and we & the pulp team should try to make it
faster :)
-Justin
>
>
> Am I missing something here, or are we really able to reduce the metadata calculation and
> indexing to the content view publish phase, so that the rest is really just about copying
> symlinks (which could also be optimized heavily by teaching Pulp how to create a repository
> simply by symlinking another one)?
>
> -- Ivan
>
> _______________________________________________
> katello-devel mailing list
> katello-devel at redhat.com
> https://www.redhat.com/mailman/listinfo/katello-devel