[Pulp-list] [devel] on_demand sync use-case gap

Mon Mar 14 04:04:26 UTC 2016

Regarding the support for on-demand content fetching that will be released
very soon in 2.8.0, there is a use case that needs improvement. If you'd
like to provide feedback on proposed solutions, read on.

Problem: A user has a repo with a download policy of "on_demand". They
update the policy to be "immediate", and then do a sync. The user expects
all files to then be downloaded, but they will not be.

Pulp 2.8.0 will download the missing files only on the next sync for which
the remote metadata has also changed. If the remote repo hasn't changed,
the sync optimization skips most of the sync, and file existence is never
checked.

Katello is giving its users the ability to change the policy, and of
course direct users of Pulp have that ability, so this is a use case we
need to improve. Two reasonable options have been identified:

Option 1: When pulp_rpm is deciding if it should skip sync steps, it could
also query the database for any units in the repo where the "downloaded"
boolean is false. If any are found, do a full sync.

Pros:
- It's a very simple addition that mostly just calls a controller function
to determine if there are any undownloaded files.
- The change is isolated to one place in the code.
- This approach would measure exactly the condition that should trigger a
full sync.

Cons:
- Doing the query for non-downloaded units does have a cost. The amount of
time is likely to be on the order of a few seconds for a repo of 10000
units, for example.
- There are other reasons pulp may want to do a full sync even if the
upstream repo hasn't changed. A more general way to mark a repo as
requiring a full sync the next time around would cover those cases too.
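
As a rough illustration of option 1, the skip decision could look
something like the sketch below. This is not pulp_rpm code: the Unit
class and both function names are made up for the example, and the list
scan stands in for the database query. Limiting the check to the
"immediate" policy is also an assumption of this sketch, made so that
"on_demand" repos keep the optimization.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Unit:
    """Hypothetical stand-in for a content unit in a repo."""
    name: str
    downloaded: bool


def has_undownloaded_units(units: Iterable[Unit]) -> bool:
    # Stand-in for the database query; a real query would stop at the
    # first match instead of loading every unit.
    return any(not unit.downloaded for unit in units)


def should_skip_sync(remote_metadata_changed: bool,
                     download_policy: str,
                     units: Iterable[Unit]) -> bool:
    """Return True if the optimized (mostly skipped) sync is safe."""
    if remote_metadata_changed:
        return False
    # Assumption of this sketch: only force a full sync when the policy
    # says the files should already be present locally.
    if download_policy == "immediate" and has_undownloaded_units(units):
        return False
    return True
```

With an unchanged remote repo, a policy of "immediate", and at least one
unit whose "downloaded" flag is false, should_skip_sync() returns False
and the full sync runs.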

Option 2: Add a way to identify that an importer's config has changed since
the last successful sync, and always do a full sync in that case.

Pros:
- This is more generally useful. Rather than tracking multiple conditions
individually (Did the skip list change? Are there undownloaded units? Did
the feed change? Did auth credentials change?), any change automatically
forces a full sync.

Cons:
- The platform would need to start tracking when the importer config was
updated, and when the last successful sync happened.
- Deciding whether a sync was successful is difficult to do in a standard
way.
- As such, this option is less simple and would require changes in both
platform and the plugin.
- This measures a less specific condition, and thus would induce a full
sync in cases where an inconsequential config change was made.
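
For comparison, the core check in option 2 might be sketched as below,
assuming the platform recorded the two timestamps mentioned above. The
function and parameter names are hypothetical, and how a sync gets
recorded as "successful" is left open, as it is in the cons.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_full_sync(config_last_updated: datetime,
                    last_successful_sync: Optional[datetime]) -> bool:
    """Return True if the importer config changed after the last
    successful sync, or if no successful sync has been recorded."""
    if last_successful_sync is None:
        # Never synced successfully, so nothing can be skipped.
        return True
    return config_last_updated > last_successful_sync


# Example: the policy was changed one day after the last good sync.
last_sync = datetime(2016, 3, 10, tzinfo=timezone.utc)
config_change = datetime(2016, 3, 11, tzinfo=timezone.utc)
force_full = needs_full_sync(config_change, last_sync)
```

Note that this check fires for any config change, including an
inconsequential one, which is the last con listed above.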

Thoughts? Other ideas? I'm leaning toward option 1 as a bug fix for 2.8.x,
and some variation of option 2 as a more robust solution in the future.

Michael