Re: [Pulp-list] Importer Sync APIs

On 11/22/2011 10:08 AM, Jason L Connor wrote:
Hey Jay,

Nice write-up. I'm not the best person to speak to the necessary steps
to synchronize a repository, but I'll add what comments I can.

It's probably important to note that the steps, 1-6, are actually calls
by pulp into the plugin via a pre-defined api that the plugin will have
to implement. It'd be interesting to show what those calls look like.

Actually, that's not the case at all. That entire write-up was only about the importer's sync_repo method. In steps 1-6, the things labeled "Conduit Calls" are the calls back into Pulp, but the whole process takes place inside the importer's sync_repo implementation.
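To make the call direction concrete, here is a rough sketch: Pulp invokes only sync_repo, and everything else is the importer calling back through the conduit. The conduit method names come from the write-up; the class names, the argument lists, and the feed_keys parameter are assumptions made up for this illustration, not the real signatures.

```python
class RecordingConduit:
    """Hypothetical stand-in for the conduit Pulp hands to the importer."""

    def __init__(self):
        self.calls = []      # names of conduit calls, in the order they happen
        self._next_id = 0

    def get_unit_keys_for_repo(self):
        self.calls.append("get_unit_keys_for_repo")
        return []            # pretend the repository is currently empty

    def add_or_update_content_unit(self, unit_key):
        self.calls.append("add_or_update_content_unit")
        self._next_id += 1
        return "unit-%d" % self._next_id

    def associate_content_unit(self, unit_id):
        self.calls.append("associate_content_unit")


class ExampleImporter:
    """Plugin side. The only entry point Pulp calls here is sync_repo."""

    def sync_repo(self, conduit, feed_keys):
        # feed_keys stands in for whatever the external feed query returned.
        existing = set(conduit.get_unit_keys_for_repo())
        for key in feed_keys:
            if key not in existing:
                unit_id = conduit.add_or_update_content_unit(key)
                conduit.associate_content_unit(unit_id)


conduit = RecordingConduit()
ExampleImporter().sync_repo(conduit, ["pkg-a"])
print(conduit.calls)
```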

I'm not sure this comment matters, but I was thinking that steps 1 and 2
(query external feed and current state of repository) are ordered
arbitrarily. (OCD made me point this out, don't think it's useful)

In step 2, current state of the repository, the get_unit_keys_for_repo
could be simply get_units_for_repo, and include more information. Along
with the "unit key" we could return the unit_id (solving the unassociate
problem), the stored metadata, and the file path. All this information
may be useful to the plugin writer for determining whether or not a
content unit needs to be updated, unassociated, or ignored.

See my first reply to Nick; I'm with you guys on changing this.
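A minimal sketch of how a plugin might use the richer records, assuming get_units_for_repo returns something shaped like the RepoUnit tuple below. The tuple, the plan_sync helper, and the use of plain strings as unit keys are all hypothetical and chosen only to keep the example short.

```python
from collections import namedtuple

# Hypothetical record shape for the proposed get_units_for_repo call.
RepoUnit = namedtuple("RepoUnit", ["unit_id", "unit_key", "metadata", "path"])


def plan_sync(repo_units, feed):
    """Compare repository contents with the external feed.

    feed maps unit_key -> metadata as reported by the external source.
    Returns (to_add, to_update, to_unassociate, unchanged). to_add holds
    unit keys; the other three hold unit_ids.
    """
    to_add, to_update, to_unassociate, unchanged = [], [], [], []
    by_key = {unit.unit_key: unit for unit in repo_units}

    for key, metadata in feed.items():
        unit = by_key.get(key)
        if unit is None:
            to_add.append(key)               # in the feed, not in the repo
        elif unit.metadata != metadata:
            to_update.append(unit.unit_id)   # stored metadata is stale
        else:
            unchanged.append(unit.unit_id)   # nothing to do

    for unit in repo_units:
        if unit.unit_key not in feed:
            to_unassociate.append(unit.unit_id)  # gone from the feed

    return to_add, to_update, to_unassociate, unchanged


repo_units = [
    RepoUnit("1", "a", {"v": 1}, "/a"),
    RepoUnit("2", "b", {"v": 1}, "/b"),
    RepoUnit("3", "c", {"v": 1}, "/c"),
]
feed = {"a": {"v": 1}, "b": {"v": 2}, "d": {"v": 1}}
plan = plan_sync(repo_units, feed)
print(plan)
```

Because each record already carries its unit_id, the unassociate list can be built without a second lookup.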

In step 4, add or update units, I fully agree that add or update should
be combined with associate in the conduit call. It seems unnecessary to
me that they are separate. Since add/update is one idempotent call, I
don't think that batching this operation always buys you anything. Mongo
allows multi updates, but not multi adds. Though a batch operation would
be pretty cool from an "ease of use" perspective.
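As a sketch of what the combined call could look like, the snippet below folds add/update and associate into one method and layers a convenience batch on top. It is backed by plain dicts rather than Mongo; save_unit, save_units, and the (unit_key, metadata) argument list are invented for illustration and are not part of the conduit as written up.

```python
class InMemoryConduit:
    """Hypothetical conduit backed by dicts instead of a database."""

    def __init__(self):
        self.units = {}          # unit_key -> (unit_id, metadata)
        self.associated = set()  # unit_ids linked to the repository

    def add_or_update_content_unit(self, unit_key, metadata):
        # Idempotent: reuse the existing unit_id if the key is already known.
        default = ("unit-%d" % (len(self.units) + 1), None)
        unit_id, _ = self.units.get(unit_key, default)
        self.units[unit_key] = (unit_id, metadata)
        return unit_id

    def associate_content_unit(self, unit_id):
        self.associated.add(unit_id)

    def save_unit(self, unit_key, metadata):
        """Combined call: add or update the unit, then associate it."""
        unit_id = self.add_or_update_content_unit(unit_key, metadata)
        self.associate_content_unit(unit_id)
        return unit_id

    def save_units(self, units):
        """Ease-of-use batch: just a loop over save_unit, no bulk write."""
        return [self.save_unit(key, metadata) for key, metadata in units]


conduit = InMemoryConduit()
ids = conduit.save_units([("a", {"v": 1}), ("b", {"v": 1})])
again = conduit.save_unit("a", {"v": 2})  # same key, so same unit_id
print(ids, again)
```

The batch here buys convenience only; it still performs one add/update per unit underneath.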

You should probably note that unit_id is the return value of
add_or_update_content_unit under its description instead of under
associate_content_unit's description.

In step 5, unassociate removed units, I think pulp should provide some
standard metadata on repositories, perhaps a "_pulp_uploaded" field that
lists the unit_ids that were manually uploaded into the repository by
pulp. The plugin can then use this information or not, as it sees
fit. This could also be a flag on the content unit metadata
returned by my proposed get_units_for_repo call.
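One way a plugin could honor such a field, sketched under the assumption that repository metadata arrives as a dict with "_pulp_uploaded" holding a list of unit_ids. The helper name and its arguments are hypothetical.

```python
def units_to_unassociate(repo_unit_ids, feed_unit_ids, repo_metadata):
    """Return unit_ids that left the feed, skipping manually uploaded ones.

    repo_metadata is assumed to be a dict that may contain a
    "_pulp_uploaded" list of unit_ids; a missing field means no uploads.
    """
    uploaded = set(repo_metadata.get("_pulp_uploaded", []))
    stale = set(repo_unit_ids) - set(feed_unit_ids)
    return sorted(stale - uploaded)


result = units_to_unassociate(
    ["u1", "u2", "u3"],          # currently in the repository
    ["u1"],                      # still present in the external feed
    {"_pulp_uploaded": ["u3"]},  # u3 was uploaded by hand, so keep it
)
print(result)
```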

One thing that seems to be missing is a sync_progress conduit call. We
make heavy use of progress information today and the plugins will need
an api that will allow them to intermittently pass pulp progress

I meant to mention that too. Ugh. The conduit has a call set_progress that's gonna be peppered all over the importer's sync_repo method to update Pulp on the status as it progresses. That's not implemented now since I'm trying to hold off on using the existing tasking stuff in favor of the new coordinator hotness coming soon.
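Roughly, the "peppering" might look like the sketch below. Only the name set_progress comes from the conduit; the dict payload, its keys, and the ProgressConduit stand-in are assumptions, since the real call isn't implemented yet.

```python
class ProgressConduit:
    """Hypothetical stand-in that just records each progress report."""

    def __init__(self):
        self.reports = []

    def set_progress(self, status):
        # Copy the dict so later changes by the caller don't alter history.
        self.reports.append(dict(status))


def sync_repo(conduit, unit_keys):
    """Report progress before, during, and after processing the units."""
    total = len(unit_keys)
    conduit.set_progress({"state": "running", "done": 0, "total": total})
    for done, key in enumerate(unit_keys, start=1):
        # ... download and store the unit identified by key here ...
        conduit.set_progress({"state": "running", "done": done, "total": total})
    conduit.set_progress({"state": "finished", "done": total, "total": total})


conduit = ProgressConduit()
sync_repo(conduit, ["a", "b"])
print(conduit.reports[-1])
```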

