[katello-devel] Content views operations optimization

Fri Jun 14 13:11:47 UTC 2013

On 06/14/2013 09:09 AM, Jay Dobies wrote:
>>> Content view promotion
>>> ----------------------
>>> We basically just copy existing repos without changing their content,
>>> therefore:
>>>
>>> * computing metadata is useless, as we have the very same metadata
>>>    already in the original repositories of the content view version
>>
>> I can't really speak to this as this is done in pulp, but the only
>> situation where you could simply 'copy' the metadata would be if the
>> destination repo was empty and we were copying everything with a single
>> call.  There is no linkage between two repos in pulp except during a
>> copy operation, so pulp wouldn't necessarily know that two repos are
>> exactly the same unless the above occurred or it checked it against all
>> repos.  For performance reasons we have to copy units individually
>> (rpm, errata, etc...) and for rpms specify distinct fields for the copy
>> operation.
>
> A massive +1 to this. In blindly copying the metadata, we'd be ignoring
> the server inventory entirely and potentially publishing content that
> doesn't actually exist in the repository. The inventory model is a very
> core concept in Pulp and bypassing that feels wrong on a number of levels.
>
> As for copying units individually, we have a solution for it; we just
> haven't had time to implement it.
>
>
>>> * indexing the repositories is useless as it should be just the same
>>>    as the index for the original repositories of the content view
>>>    version
>> It's not the same :)  We use the repoids field on packages & errata to
>> make searching useful.  Without it we won't know what packages are in
>> what repos.  We could potentially investigate fetching the data from
>> Elasticsearch, modifying the repoid list ourselves and updating it
>> within Elasticsearch.  I'm not sure if that would be faster or not.  My
>> guess is that it might be faster (simply because fetching data via ES is
>> faster) but we would have to test it to see.
>>
>> Another option might be to use the update feature of ES:
>> http://www.elasticsearch.org/guide/reference/api/update/
>> My guess is that would be much, much faster.
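
To make the "modify the repoid list ourselves" idea concrete, here is a
minimal sketch, not Katello code. It assumes each indexed package is
available as a dict with an "id" and a "repoids" list (only the repoids
field name comes from this thread; the ids and repo names are made up),
and it only builds the newline-delimited body that the ES bulk API
expects for partial-document updates. Sending it, and choosing the index
and type in the request URL, is left out.

```python
import json


def build_repoid_updates(docs, repo_id):
    """Build (header, body) pairs that add repo_id to each doc's repoids.

    docs is a list of dicts with hypothetical keys "id" and "repoids".
    Documents that already list repo_id are skipped. The input dicts are
    not modified.
    """
    actions = []
    for doc in docs:
        repoids = list(doc.get("repoids", []))
        if repo_id in repoids:
            continue  # already indexed for this repo, nothing to update
        repoids.append(repo_id)
        header = {"update": {"_id": doc["id"]}}
        body = {"doc": {"repoids": repoids}}  # partial document update
        actions.append((header, body))
    return actions


def to_bulk_body(actions):
    """Serialize the actions as newline-delimited JSON for a bulk request."""
    lines = []
    for header, body in actions:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n" if lines else ""
```

Whether this beats reindexing would still have to be measured; the
sketch only shows that the promoted repo's id can be appended without
touching the rest of each document.
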
>>
>>>
>>> Composite content view publishing
>>> ---------------------------------
>>>
>>> I wonder what operations really need to be performed here? It seems to
>>> me that it just references the sub-content-views, not bearing any info
>>> about the content itself (guessing from the fact that promotion of a
>>> composite means promoting the sub-content-views). Still, it takes 10
>>> minutes to publish it with real-world repos (RHEL, EPEL, katello).
>>
>> Due to the way subscription-manager & candlepin work, we are unable to
>> point a system to repos in two different candlepin environments.  A
>> system can only know a) Organization, b) Candlepin Environment, c)
>> Content path and it assembles the url from that:
>>
>>    http://HOSTNAME/pulp/repos/ORG/CP_Environment/Content_Path
>>
>> The Candlepin Environment in this case is the Katello Environment &
>> Content View combination.  So a system cannot point to
>> /ACME/Dev/View1/ContentA and /ACME/Dev/View2/ContentB at the same time.
>> If it could, we could probably get away without content views.  So we
>> compromised and did composite views.  One option that we could
>> investigate would be to reuse a pulp repo within pulp for the composite
>> and its component, and just publish via 2 yum distributors to two
>> different paths.  That may complicate the code greatly, but could be
>> worth investigating.  It would still require a publish to occur twice
>> though, so you are really only saving the repo creation and unit copy
>> aspects.
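
The URL constraint described above can be sketched as follows. This is
an illustration only: the function names are hypothetical, and the one
thing taken from the thread is the layout
http://HOSTNAME/pulp/repos/ORG/CP_Environment/Content_Path, with the
Candlepin environment standing for the Katello environment plus content
view (e.g. "Dev/View1").

```python
def repo_url(hostname, org, cp_environment, content_path):
    """Assemble a repo URL from the three values a system knows."""
    parts = [org, cp_environment, content_path]
    path = "/".join(part.strip("/") for part in parts)
    return "http://%s/pulp/repos/%s" % (hostname, path)


def system_repo_urls(hostname, org, cp_environment, content_paths):
    """All URLs for one system.

    The system has a single Candlepin environment, so every URL shares
    the same ORG/CP_Environment prefix; only the content path varies.
    """
    return [repo_url(hostname, org, cp_environment, path)
            for path in content_paths]
```

With cp_environment fixed to "Dev/View1", there is no input that yields
a URL under /ACME/Dev/View2/, which is the gap composite views fill.
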
>>
>>
>> I do agree that it takes far too long to publish/promote a content view
>> with a very large repo, and we and the pulp team should try to make it
>> faster :)

So do we. The goal is going to be to rewrite the distributor like we 
rewrote the importer for 2.2. We make some symlinks and concatenate some 
XML; there's no reason it should take as long as it does today.
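
For a sense of how little work the symlink half is, here is a rough
sketch. It is not the Pulp distributor and assumes nothing about its
internals: the function name, the flat publish directory, and the idea
of passing in a list of unit file paths are all made up for
illustration, and the metadata (XML) half is not shown.

```python
import os


def publish_by_symlink(unit_paths, publish_dir):
    """Expose already-stored unit files under publish_dir via symlinks.

    unit_paths are files that already exist in some content store; no
    file content is copied. Re-running replaces existing links, so the
    call is safe to repeat.
    """
    os.makedirs(publish_dir, exist_ok=True)
    links = []
    for unit_path in unit_paths:
        link = os.path.join(publish_dir, os.path.basename(unit_path))
        if os.path.islink(link):
            os.remove(link)  # replace a stale link from an earlier publish
        os.symlink(os.path.abspath(unit_path), link)
        links.append(link)
    return links
```

The cost here is one filesystem call per unit, which is why the
publish time should be dominated by something other than the linking.
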

>> -Justin
>>
>>
>>>
>>>
>>> Am I missing something here, or are we really able to reduce the
>>> metadata calculation and indexing to the content view publish phase,
>>> so that the rest is really just about copying symlinks (which could
>>> also be optimized heavily by teaching Pulp how to create a repository
>>> simply by symlinking another one)?
>>>
>>> -- Ivan
>>>
>>> _______________________________________________
>>> katello-devel mailing list
>>> katello-devel at redhat.com
>>> https://www.redhat.com/mailman/listinfo/katello-devel
>
>

-- 
Jay Dobies
Freenode: jdob @ #pulp
http://pulpproject.org