[Pulp-list] Importer Sync APIs

Jay Dobies jason.dobies at redhat.com
Tue Nov 22 15:27:40 UTC 2011


On 11/22/2011 10:08 AM, Jason L Connor wrote:
> Hey Jay,
>
> Nice write-up. I'm not the best person to speak to the necessary steps
> to synchronize a repository, but I'll add what comments I can.
>
> It's probably important to note that the steps, 1-6, are actually calls
> by pulp into the plugin via a pre-defined api that the plugin will have
> to implement. It'd be interesting to show what those calls look like.

Actually, that's not the case at all. That entire write-up was only
about the importer's sync_repo method. In steps 1-6, the things labeled
"Conduit Calls" are calls from the plugin back into Pulp, and the whole
process takes place inside the importer's sync_repo implementation.
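
To make the direction of the calls concrete, here is a rough sketch of that flow, not the actual Pulp code. The conduit method names get_unit_keys_for_repo, add_or_update_content_unit and associate_content_unit come from the write-up; FakeConduit, SketchImporter, unassociate_content_unit, the string unit keys, the "id-" scheme and every signature are assumptions made up for illustration.

```python
class FakeConduit:
    """Hypothetical in-memory stand-in for the conduit Pulp passes to the importer."""

    def __init__(self, existing_keys):
        self.units = {}          # unit_id -> (unit_key, metadata)
        self.associated = set()  # unit_ids currently associated with the repo
        for key in existing_keys:
            self.associate_content_unit(self.add_or_update_content_unit(key, {}))

    def get_unit_keys_for_repo(self):
        return [self.units[uid][0] for uid in sorted(self.associated)]

    def add_or_update_content_unit(self, unit_key, metadata):
        unit_id = "id-" + unit_key  # assumed id scheme, for the sketch only
        self.units[unit_id] = (unit_key, metadata)
        return unit_id

    def associate_content_unit(self, unit_id):
        self.associated.add(unit_id)

    def unassociate_content_unit(self, unit_key):  # hypothetical name and argument
        self.associated.discard("id-" + unit_key)


class SketchImporter:
    def sync_repo(self, feed_units, conduit):
        # feed_units (dict of unit_key -> metadata) stands in for the external feed query.
        existing = set(conduit.get_unit_keys_for_repo())
        # Add or update every unit in the feed, then associate it with the repo.
        for key, metadata in feed_units.items():
            unit_id = conduit.add_or_update_content_unit(key, metadata)
            conduit.associate_content_unit(unit_id)
        # Unassociate units that are in the repo but no longer in the feed.
        for key in existing - set(feed_units):
            conduit.unassociate_content_unit(key)
```

Pulp calls sync_repo once; everything on the conduit is the importer calling back into Pulp from inside that one method.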

> I'm not sure this comment matters, but I was thinking that steps 1 and 2
> (query external feed and current state of repository) are ordered
> arbitrarily. (OCD made me point this out, don't think it's useful)
>
> In step 2, current state of the repository, the get_unit_keys_for_repo
> could be simply get_units_for_repo, and include more information. Along
> with the "unit key" we could return the unit_id (solving the unassociate
> problem), the stored metadata, and the file path. All this information
> may be useful to the plugin writer for determining whether or not a
> content unit needs to be updated, unassociated, or ignored.

See my first reply to Nick; I'm with you guys on changing this.
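
As a rough illustration of why the richer return value helps, here is a sketch of how a plugin might use it. Only the get_units_for_repo idea and the four pieces of information (unit key, unit_id, stored metadata, file path) come from the proposal; RepoUnit, plan_sync, the field names and the string unit keys are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RepoUnit:
    """Hypothetical record shape for one entry returned by get_units_for_repo."""
    unit_key: str
    unit_id: str
    metadata: dict = field(default_factory=dict)
    path: str = ""


def plan_sync(feed, repo_units):
    """Decide what to do with each unit.

    feed is a dict of unit_key -> metadata from the external source;
    repo_units is the list a get_units_for_repo style call might return.
    """
    current = {unit.unit_key: unit for unit in repo_units}
    plan = {"add": [], "update": [], "ignore": [], "unassociate": []}
    for key in sorted(feed):
        if key not in current:
            plan["add"].append(key)
        elif current[key].metadata != feed[key]:
            plan["update"].append(key)   # stored metadata differs from the feed
        else:
            plan["ignore"].append(key)   # nothing changed
    # With unit_id in hand, removed units can be unassociated directly.
    plan["unassociate"] = sorted(
        unit.unit_id for unit in repo_units if unit.unit_key not in feed
    )
    return plan
```

With only unit keys available, the update/ignore decision and the unit_id needed for unassociation would each take a further lookup.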

> In step 4, add or update units, I fully agree that add or update should
> be combined with associate in the conduit call. It seems unnecessary to
> me that they are separate. Since add/update is one idempotent call, I
> don't think that batching this operation always buys you anything. Mongo
> allows multi updates, but not multi adds. Though a batch operation would
> be pretty cool from an "ease of use" perspective.
>
> You should probably note that unit_id is the return of
> add_or_update_content_unit under its description instead of under
> associate_content_unit's description.
>
> In step 5, unassociate removed units, I think pulp should provide some
> standard metadata on repositories, perhaps a "_pulp_uploaded" field that
> lists the unit_ids that were manually uploaded into the repository by
> pulp. The plugin can then make use or not make use of this information
> as they see fit. This could also be a flag on the content unit metadata
> returned by my proposed get_units_for_repo call.
>
> One thing that seems to be missing is a sync_progress conduit call. We
> make heavy use of progress information today and the plugins will need
> an api that will allow them to intermittently pass pulp progress
> information.

I meant to mention that too. Ugh. The conduit has a set_progress call
that's gonna be peppered all over the importer's sync_repo method to
update Pulp on the status as it progresses. It isn't implemented yet
since I'm trying to hold off on using the existing tasking stuff in
favor of the new coordinator hotness coming soon.
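
For a feel of what "peppered all over" could look like, here is a minimal sketch. Only the name set_progress comes from the conduit; its argument (a plain dict here), the RecordingConduit stand-in, sync_with_progress and the reporting interval are all assumptions, since the real call isn't settled.

```python
class RecordingConduit:
    """Hypothetical stand-in that just remembers each progress report."""

    def __init__(self):
        self.reports = []

    def set_progress(self, status):
        self.reports.append(dict(status))


def sync_with_progress(unit_keys, conduit, every=2):
    total = len(unit_keys)
    conduit.set_progress({"state": "running", "done": 0, "total": total})
    for done, key in enumerate(unit_keys, start=1):
        # The real per-unit work (download, add/update, associate) would go here.
        # Report every few units, and always on the last one.
        if done % every == 0 or done == total:
            conduit.set_progress({"state": "running", "done": done, "total": total})
    conduit.set_progress({"state": "finished", "done": total, "total": total})
```

Reporting every few units rather than on every unit keeps the chatter down on large repositories; the right interval is a judgment call for the plugin writer.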



