
Re: [Pulp-list] Importer Sync APIs

Hey Jay,

Nice write-up. I'm not the best person to speak to the necessary steps
to synchronize a repository, but I'll add what comments I can.

It's probably important to note that the steps, 1-6, are actually calls
by pulp into the plugin via a pre-defined api that the plugin will have
to implement. It'd be interesting to show what those calls look like.

I'm not sure this comment matters, but I was thinking that steps 1 and 2
(query external feed and current state of repository) are ordered
arbitrarily. (OCD made me point this out, don't think it's useful)

In step 2, current state of the repository, the get_unit_keys_for_repo
could be simply get_units_for_repo, and include more information. Along
with the "unit key" we could return the unit_id (solving the unassociate
problem), the stored metadata, and the file path. All this information
may be useful to the plugin writer for determining whether or not a
content unit needs to be updated, unassociated, or ignored.
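
To make the get_units_for_repo idea a bit more concrete, here is a rough
sketch of how a plugin might use what it returns. Everything in it is
hypothetical: the UnitRecord fields, the plan_sync helper, and the shape
of the feed data are assumptions for illustration, not an existing pulp
api.

```python
from collections import namedtuple

# Hypothetical record returned by the proposed get_units_for_repo call.
UnitRecord = namedtuple("UnitRecord", ["unit_id", "unit_key", "metadata", "path"])


def plan_sync(feed_units, repo_units):
    """Classify units by comparing the external feed with the repo state.

    feed_units: dict mapping unit_key -> metadata dict from the external feed
    repo_units: list of UnitRecord, as get_units_for_repo might return them
    """
    existing = {u.unit_key: u for u in repo_units}

    # In the feed but not in the repo: needs to be added.
    to_add = [key for key in feed_units if key not in existing]
    # In both, but the stored metadata differs: needs to be updated.
    to_update = [key for key, meta in feed_units.items()
                 if key in existing and existing[key].metadata != meta]
    # In the repo but gone from the feed. The unit_id is already at hand,
    # so no extra lookup is needed to unassociate.
    to_unassociate = [u.unit_id for u in repo_units
                      if u.unit_key not in feed_units]
    return to_add, to_update, to_unassociate
```

Units that appear in both places with identical metadata fall into none
of the three lists, which is the "ignored" case.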

In step 4, add or update units, I fully agree that add or update should
be combined with associate in the conduit call. It seems unnecessary to
me that they are separate. Since add/update is one idempotent call, I
don't think that batching this operation always buys you anything. Mongo
allows multi updates, but not multi adds. Though a batch operation would
be pretty cool from an "ease of use" perspective.
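
As a sketch of what I mean by combining the calls, here is an in-memory
stand-in for the conduit. The FakeConduit class, its storage, and the
argument list are made up for illustration; only the
add_or_update_content_unit name comes from the write-up.

```python
import itertools


class FakeConduit:
    """In-memory stand-in for the conduit; not real pulp code."""

    def __init__(self):
        self.units = {}          # unit_key -> (unit_id, metadata)
        self.associated = set()  # unit_ids associated with the repo
        self._ids = itertools.count(1)

    def add_or_update_content_unit(self, unit_key, metadata):
        """Add or update the unit, associate it with the repo, return unit_id."""
        if unit_key in self.units:
            # Known unit: keep its id and overwrite the stored metadata.
            unit_id = self.units[unit_key][0]
        else:
            # New unit: hand out a fresh id.
            unit_id = "unit-%d" % next(self._ids)
        self.units[unit_key] = (unit_id, dict(metadata))
        # Association happens in the same call; adding to a set twice is a no-op.
        self.associated.add(unit_id)
        return unit_id
```

Calling it twice with the same unit key returns the same unit_id and
leaves a single association, so the plugin never has to ask whether the
unit already exists.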

You should probably note that unit_id is the return of
add_or_update_content_unit under its description instead of under
associate_content_unit's description.

In step 5, unassociate removed units, I think pulp should provide some
standard metadata on repositories, perhaps a "_pulp_uploaded" field that
lists the unit_ids that were manually uploaded into the repository by
pulp. The plugin can then make use of this information, or not, as it
sees fit. This could also be a flag on the content unit metadata
returned by my proposed get_units_for_repo call.
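
Here is a small sketch of how a plugin could honor that field when
deciding what to unassociate. The units_to_unassociate helper and the
shape of the repository metadata dict are assumptions for illustration;
"_pulp_uploaded" is just the field name I'm floating above.

```python
def units_to_unassociate(repo_metadata, repo_unit_ids, feed_unit_ids):
    """Return the unit_ids that are gone from the feed and were not uploaded.

    repo_metadata: dict of repository metadata; may carry "_pulp_uploaded"
    repo_unit_ids: unit_ids currently associated with the repository
    feed_unit_ids: unit_ids still present in the external feed
    """
    # A missing field is treated as "nothing was manually uploaded".
    uploaded = set(repo_metadata.get("_pulp_uploaded", []))
    feed = set(feed_unit_ids)
    return sorted(uid for uid in repo_unit_ids
                  if uid not in feed and uid not in uploaded)
```

A plugin that wants uploaded units to follow the feed like everything
else can simply skip the "_pulp_uploaded" check.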

One thing that seems to be missing is a sync_progress conduit call. We
make heavy use of progress information today and the plugins will need
an api that will allow them to intermittently pass pulp progress
information.
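
Something along these lines is what I have in mind. This is only a
sketch: the ProgressConduit class, the sync_progress arguments, and the
report_every knob are all invented here, not part of the proposed
conduit.

```python
class ProgressConduit:
    """Stand-in conduit that just records the progress reports it is given."""

    def __init__(self):
        self.reports = []

    def sync_progress(self, step, items_done, items_total):
        """Record one progress report from the plugin."""
        self.reports.append({
            "step": step,
            "items_done": items_done,
            "items_total": items_total,
        })


def sync_units(conduit, unit_keys, report_every=2):
    """Walk the units and report progress every few items and at the end."""
    total = len(unit_keys)
    for done, key in enumerate(unit_keys, start=1):
        # ... the plugin would download and add the unit for `key` here ...
        if done % report_every == 0 or done == total:
            conduit.sync_progress("downloading", done, total)
```

The plugin decides how often to report, and pulp decides what to do with
each report.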

That's all for now... (aren't you glad you asked?)

On Mon, 2011-11-21 at 16:43 -0500, Jay Dobies wrote:
> http://blog.pulpproject.org/2011/11/21/importer-sync-apis/
> I know the week of Thanksgiving isn't the best time to ask for deep 
> thought, but I'm asking anyway.
> The above blog entry talks about an importer's sync_repo call. I talk 
> about the expectations I am making about the order and types of steps 
> the importer will want to take when synchronizing a repo. I also mention 
> what the conduit calls the plugin will use to feed information back into 
> Pulp.
> This is hugely important. If we're too limiting, we're going to prevent 
> some plugins from being able to do what they want. If the conduit calls 
> aren't structured right, the plugin is gonna have to thrash the database 
> to do what it needs to do.
> I'm asking for anyone who can spare a few minutes (it's lengthy, go 
> figure) to take a read and let me know what you think.
> Specifically, I'd like the guys who are very familiar with grinder and 
> the current RPM sync process to make sure we're still going to be able 
> to sync RPMs in the new model (I imagine I'd get yelled at if I 
> prevented that).
> I also know there are at least two other teams interested in writing 
> plugins that I'd like to give some feedback on how this will meet their 
> needs.
> Then there are the people who are just curious. I want your input too. 
> This is too big and ambitious for me to get right on my own.
> Thanks  :D

Jason L Connor
linear on freenode #pulp
RHCE: 805010912355231
GPG Fingerprint: 2048R/CC4ED7C1

