[katello-devel] Content views operations optimization

Justin Sherrill jsherril at redhat.com
Fri Jun 14 13:08:26 UTC 2013


On 06/14/2013 05:56 AM, Ivan Necas wrote:
> Hi,
>
> I've started using content views quite heavily recently, and waiting for
> various operations to finish made me wonder whether the waiting is really
> necessary for most of the operations.
>
> I might be wrong, but it seems to me, that the only expensive operation
> should be publishing of a new version of a content view: this time
> new repositories are created with content that was not anywhere before.
>
> Other operations with content views don't modify any content:
>
> Content view promotion
> ----------------------
> We basically just copy existing repos without changing their content,
> therefore:
>
> * computing metadata is useless, as we have the very same metadata
>    already in the original repositories of the content view version
I can't really speak to this, as it is done in pulp, but the only 
situation where you could simply 'copy' the metadata would be if the 
destination repo was empty and we were copying everything with a single 
call.  There is no linkage between two repos in pulp except during a 
copy operation, so pulp wouldn't necessarily know that two repos are 
exactly the same unless the above occurred or it checked against all 
repos.  For performance reasons we have to copy units individually 
(rpm, errata, etc.) and, for rpms, specify distinct fields for the copy 
operation.

> * indexing the repositories is useless as it should be just the same as the
>    index for the original repositories of the content view version
It's not the same :)  We use the repoids field on packages & errata to 
make searching useful.  Without it we wouldn't know which packages are in 
which repos.  We could potentially investigate fetching the data from 
Elasticsearch, modifying the repoid list ourselves and updating it 
within Elasticsearch.  I'm not sure if that would be faster or not.  My 
guess is that it might be faster (simply because fetching data via ES is 
faster) but we would have to test it to see.

Another option might be to use the update feature of ES:
http://www.elasticsearch.org/guide/reference/api/update/
My guess is that would be much, much faster.
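
As a rough sketch only (not tested against Katello), this is roughly what 
such an update request could look like.  The script/params shape follows 
the example in the update docs linked above; later ES versions use a 
different script syntax.  The index name, type name and ids here are 
made up for illustration, and nothing is actually sent to a server.

```python
import json


def build_repoid_update(index, doc_type, doc_id, repo_id):
    """Build (path, body) for an ES partial-update request.

    The script appends repo_id to the document's repoids field, so the
    package or erratum does not have to be fetched and re-indexed whole.
    """
    path = "/%s/%s/%s/_update" % (index, doc_type, doc_id)
    body = {
        # Script form taken from the update API docs of that era.
        "script": "ctx._source.repoids += repoid",
        "params": {"repoid": repo_id},
    }
    return path, json.dumps(body, sort_keys=True)


# Hypothetical index, type, document id and repo id.
path, body = build_repoid_update(
    "katello_package", "package", "abc123", "ACME-Dev-View1-ContentA")
print("POST " + path)
print(body)
```

One request like this per package or erratum would replace the 
fetch/modify/re-index round trip, though we would still have to measure 
whether it is actually faster.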

>
> Composite content view publishing
> ---------------------------------
>
> I wonder what operations really need to be performed here? It seems to me,
> that it just references the sub-content-views, not bearing any info about the
> content itself (guessing from the fact that promotion of a composite means promoting
> the sub-content-views). Still, it takes 10 minutes to publish it with real-world repos
> (RHEL, EPEL, katello)

Due to the way subscription-manager & candlepin work, we are unable to 
point a system to repos in two different candlepin environments.  A 
system can only know a) Organization, b) Candlepin Environment, c) 
Content path and it assembles the url from that:

   http://HOSTNAME/pulp/repos/ORG/CP_Environment/Content_Path

The Candlepin Environment in this case is the Katello Environment & 
Content View combination.  So a system cannot point to 
/ACME/Dev/View1/ContentA and /ACME/Dev/View2/ContentB at the same time.  
If it could, we could probably get away without content views.  So we 
compromised and did composite views.  One option that we could 
investigate would be to reuse a single pulp repo for the composite 
and its component, and just publish via 2 yum distributors to two 
different paths.  That may complicate the code greatly, but could be 
worth investigating.  It would still require a publish to occur twice 
though, so you would really only be saving the repo creation and unit 
copy steps.
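
To illustrate the URL constraint described above, here is a small sketch.  
The function names are made up for this example and are not 
subscription-manager or candlepin code; the only thing taken from the 
real setup is the URL layout shown earlier.

```python
def cp_environment(katello_env, content_view):
    """Candlepin Environment = Katello Environment + Content View."""
    return "%s/%s" % (katello_env, content_view)


def content_url(hostname, org, cp_env, content_path):
    """Assemble a repo URL from the three things a system knows."""
    parts = [org, cp_env, content_path]
    path = "/".join(p.strip("/") for p in parts)
    return "http://%s/pulp/repos/%s" % (hostname, path)


# A system is registered to exactly one Candlepin Environment...
env = cp_environment("Dev", "View1")

# ...so every content path resolves under that one prefix.
print(content_url("HOSTNAME", "ACME", env, "ContentA"))
print(content_url("HOSTNAME", "ACME", env, "ContentB"))
```

Both URLs end up under /ACME/Dev/View1/, which is why ContentB has to be 
present in (or composed into) View1 rather than served from View2.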


I do agree that it takes far too long to publish/promote a content view 
with a very large repo, and we & the pulp team should try to make it 
faster :)

-Justin


>
>
> Am I missing something here, or are we really able to reduce the metadata calculation and
> indexing to the content view publish phase, so that the rest is really just about copying
> symlinks (which could also be optimized heavily by teaching Pulp how to create a repository
> simply by symlinking another one)?
>
> -- Ivan
>



