[Pulp-dev] versioned repositories

Dennis Kliban dkliban at redhat.com
Wed May 24 15:26:46 UTC 2017


I noticed that the REST API examples don't mention anything about deleting
a particular version of a repository. This is a use case that we need to
support.

-Dennis

On Wed, May 17, 2017 at 10:03 PM, Michael Hrivnak <mhrivnak at redhat.com>
wrote:

> We've discussed versioned repositories and their merits in the past, but
> I'd like to propose a specific direction, and inclusion in 3.0. As a recap
> of goals, versions can help us answer two important questions about the
> history of a repository:
>
> 1) What set of content is in a specific version of a repository?
> 2) What changed between two arbitrary versions of a repository?
>
> I am proposing a model where Pulp creates a new version of a repository
> for every operation that changes that repo's content. For example, a sync
> task would create a single new version.
>
> Basic Example
> -----------
>
> - You create repository "foo".
> - You sync repository "foo", which produces version 1 of that repo.
> - You sync once per day for some period of time, automatically creating a
> new version each time.
> - You publish repo "foo", which defaults to publishing the most recent
> version.
> - You don't like something that's new in the repo, so you roll back by
> publishing a previous version.
>
> Data Model Basics
> -----------
>
> In the past we've stored the relationship between a content unit and a
> repo as a standard many-to-many through table. There's a reference to a
> unit, and a reference to a repo.
>
> The version scheme I'm pitching adds two new fields to that through table:
>
> vadded - a foreign key to the repo version in which this content unit was
> added
> vremoved - a foreign key to the repo version in which this content unit
> was removed. This can be null.
>
> Multiple entries can exist for the same content unit and repo, so long as
> a new one is not added until the previous one's "vremoved" field is set.
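A minimal in-memory sketch of the through table described above. It assumes integer version numbers (with 0 as the empty repo) in place of real foreign keys. The `RepoContent` and `Repository` names and the `new_version()` helper are hypothetical and are not taken from the PoC branch. Each content-changing operation becomes one new version, and a unit only gets a new row after its previous row has `vremoved` set.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RepoContent:
    """One row of the through table (hypothetical in-memory stand-in)."""
    repo: str
    unit: str
    vadded: int                     # version in which the unit was added
    vremoved: Optional[int] = None  # version in which it was removed; None = still present


class Repository:
    def __init__(self, name: str) -> None:
        self.name = name
        self.latest = 0  # assumption: version 0 is the empty repo before any change
        self.rows: List[RepoContent] = []

    def _open_row(self, unit: str) -> Optional[RepoContent]:
        # The row whose vremoved is still unset, if the unit is currently present.
        for row in self.rows:
            if row.unit == unit and row.vremoved is None:
                return row
        return None

    def new_version(self, add: Iterable[str] = (), remove: Iterable[str] = ()) -> int:
        """Apply one operation (e.g. a sync) as a single new version."""
        version = self.latest + 1
        changed = False
        for unit in remove:
            row = self._open_row(unit)
            if row is not None:
                row.vremoved = version
                changed = True
        for unit in add:
            # Only add a row if no open row exists for this unit.
            if self._open_row(unit) is None:
                self.rows.append(RepoContent(self.name, unit, version))
                changed = True
        if changed:  # operations that change nothing do not create a version
            self.latest = version
        return self.latest
```

For example, adding "a" and "b", then removing "a", then adding it back leaves two rows for "a": (vadded=1, vremoved=2) and (vadded=3, vremoved=None).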
>
> With this structure, it is easy to query the database to answer both
> questions we started with.
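A sketch of both queries, again assuming integer version numbers. A unit is in version v when vadded <= v and vremoved is either unset or greater than v; the diff between two versions is then plain set arithmetic. The function names are hypothetical. In the Django ORM the same condition could plausibly be written with `__lte`, `__gt` and `__isnull` lookups, assuming versions can be compared by number.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class RepoContent:
    unit: str
    vadded: int
    vremoved: Optional[int] = None


def content(rows: Sequence[RepoContent], version: int) -> Set[str]:
    """Question 1: which units are in the repo at `version`?"""
    return {
        r.unit
        for r in rows
        # added at or before `version`, and not removed at or before it
        if r.vadded <= version and (r.vremoved is None or r.vremoved > version)
    }


def diff(rows: Sequence[RepoContent], old: int, new: int) -> Tuple[Set[str], Set[str]]:
    """Question 2: which units were added and removed between `old` and `new`?"""
    before = content(rows, old)
    after = content(rows, new)
    return after - before, before - after
```

With rows a(1, 2), b(1), c(2, 3) and a(3), version 2 contains {"b", "c"}, and going from version 1 to 2 adds "c" and removes "a".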
>
> REST API
> ----------
>
> We will add an endpoint that gives access to the versions of a specific
> repository. Ideally we would have a nested endpoint like this:
>
> /api/v3/repositories/foo/versions/
>
> But nested views have been a problem for us with DRF (Django REST
> Framework). If we can't make that happen, this form already works in my
> PoC branch:
>
> /api/v3/repositoryversions/?repository=foo
>
> It's not yet clear how best to represent content through the REST API. A
> nested endpoint within the repo version object would be ideal.
>
> /api/v3/repositories/foo/versions/4/content/
>
> Operations on a repo where a version could be chosen, such as a publish,
> should default to the latest version. It's an open question how best to
> represent that, and perhaps it takes the form of two endpoints:
>
> default to latest: POST /api/v3/repositories/foo/distributors/bar/publish
>
> specify a version: POST /api/v3/repositories/foo/versions/4/publish
>
> But that's just one idea. Much about our REST API layout has yet to be
> written in stone, and we have flexibility.
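Whichever endpoint layout wins, both forms could share one helper that picks the version, so "default to latest" and "pin a version" (as in the roll-back step of the basic example) are handled in one place. A sketch; `resolve_version` is hypothetical, and `LookupError` stands in for whatever 404 response the API would actually return.

```python
from typing import Optional, Sequence


def resolve_version(existing: Sequence[int], requested: Optional[int] = None) -> int:
    """Choose which version of a repo an operation such as publish should use.

    `existing` holds the repo's version numbers. No `requested` version
    (the .../distributors/bar/publish form) means "latest"; an explicit
    one (the .../versions/4/publish form) pins that version, e.g. to roll back.
    """
    if not existing:
        raise LookupError("repository has no versions yet")
    if requested is None:
        return max(existing)
    if requested not in existing:
        raise LookupError(f"version {requested} does not exist")
    return requested
```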
>
> Orphans
> ---------
>
> Notice that this changes the orphan workflow. Removing a content unit from
> a repo doesn't make it an orphan. This helps reduce the need to run an
> orphan cleanup task, which in turn helps avoid the inherent race condition
> that task can introduce.
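Under this model a unit is an orphan only when no through-table row references it at all. A row with `vremoved` set still ties the unit to an older version of some repo, which is why removing a unit from a repo does not orphan it. A sketch with hypothetical names:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class RepoContent:
    repo: str
    unit: str
    vadded: int
    vremoved: Optional[int] = None


def orphans(all_units: Iterable[str], rows: Iterable[RepoContent]) -> Set[str]:
    """Units with no through-table row in any repo or version.

    Rows with vremoved set still count: the unit is part of an older
    version, so it is not an orphan.
    """
    referenced = {r.unit for r in rows}
    return set(all_units) - referenced
```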
>
> Trim History
> ---------
>
> But you may not want to keep history forever, so a valuable feature will
> be the ability to trim history. I think this would just be an operation
> that squashes a bunch of versions together, and it could optionally take
> that opportunity to immediately delete a content unit that becomes an
> orphan.
>
> To illustrate the workflow: if you wanted to squash history prior to
> version 10, the task would:
>
> - delete all of a repo's relationships in the through table where vremoved
> is a version <= 10
> - optionally check if each content unit is now an orphan and remove if so
> - update all remaining entries where vadded < 10 by setting vadded to 10
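The three steps above can be sketched over an in-memory list of rows, assuming integer version numbers. `squash_history` and its parameters are hypothetical; the orphan check looks at rows from every repo, since a unit dropped from one repo's history may still belong to another.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class RepoContent:
    repo: str
    unit: str
    vadded: int
    vremoved: Optional[int] = None


def squash_history(
    rows: List[RepoContent], repo: str, upto: int, remove_orphans: bool = False
) -> Tuple[List[RepoContent], Set[str]]:
    """Squash `repo`'s history before version `upto`.

    Returns the surviving rows and, if requested, the units that became
    orphans. Surviving row objects are updated in place.
    """

    def gone_by_upto(r: RepoContent) -> bool:
        return r.repo == repo and r.vremoved is not None and r.vremoved <= upto

    # Step 1: delete this repo's rows whose vremoved is <= upto.
    dropped_units = {r.unit for r in rows if gone_by_upto(r)}
    kept = [r for r in rows if not gone_by_upto(r)]

    # Step 2 (optional): a dropped unit with no remaining row in any repo is an orphan.
    orphans: Set[str] = set()
    if remove_orphans:
        orphans = dropped_units - {r.unit for r in kept}

    # Step 3: remaining rows of this repo added before `upto` now count as added in it.
    for r in kept:
        if r.repo == repo and r.vadded < upto:
            r.vadded = upto
    return kept, orphans
```

The content of version `upto` and every later version is unchanged by the squash; only the versions before it can no longer be reconstructed.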
>
> PoC
> --------
>
> I have a branch with proof-of-concept code here:
>
> https://github.com/pulp/pulp/compare/3.0-dev...mhrivnak:versioned-repos?expand=1
>
> The models are the most interesting place to look. In particular, I'm very
> pleased with how simple the "content()" method is, which returns a QuerySet
> matching all the content in a given version.
>
> The rest is mostly REST ;) API code, which isn't all that interesting
> except to demonstrate how the data could be exposed. You can run the
> included tests, found in the root of the git repo, which load some data
> into the database. (I made them just for dev purposes; I'm not sure they
> deserve a long-term home.) Then you can hit this endpoint as an example:
>
> http://yourhost:8000/api/v3/repositoryversions/?repository=r1
>
> Obviously this code is rough, so please consider it for directional and
> conceptual purposes only. Assume major additions and improvements if we
> follow through on this concept.
>
> Value
> -------
>
> Tracking history in this way opens up great possibilities. Some examples:
>
> Promotion could become a matter of having two publishers on a repo with
> different settings, one for "testing" and one for "production", and just
> publishing whichever version you like with each. Multiple repos and copy
> operations would no longer be needed for promotion. Austin suggested that the
> ability to tag versions with arbitrary key:value pairs could enhance this
> use case.
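A sketch of promotion with two publishers on one repo, plus the suggested key:value tags used to pick which version production gets. `Publisher`, its `settings`, the `path` values and `tagged()` are all hypothetical; a real publish would write out content rather than just record a version number.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Publisher:
    """Hypothetical publisher attached to a single repository."""
    name: str
    settings: Dict[str, str] = field(default_factory=dict)
    published_version: Optional[int] = None

    def publish(self, version: int) -> None:
        # A real publish would write out this version's content using
        # self.settings; here we only record which version is live.
        self.published_version = version


def tagged(tags: Dict[int, Dict[str, str]], key: str, value: str) -> List[int]:
    """Versions whose key:value tags include key=value, oldest first."""
    return sorted(v for v, t in tags.items() if t.get(key) == value)


# Usage: testing follows the newest version; production gets the newest
# version tagged as having passed QA.
testing = Publisher("testing", {"path": "foo/testing"})
production = Publisher("production", {"path": "foo/production"})
tags = {3: {"qa": "passed"}, 4: {"qa": "failed"}, 5: {}}
testing.publish(5)
production.publish(tagged(tags, "qa", "passed")[-1])
```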
>
> An added concept, which could come post-3.0, is tracking publications more
> explicitly and associating each with a version, though I could see a case
> for laying this groundwork now, before the API is locked down. Promotion
> could become more about making a publication available in a different
> location, rather than re-creating it. We'd also know which content is part
> of a publication, and could guarantee that content doesn't get removed
> before the publication does. Pulp 2 lacks that guarantee.
>
> Pulp-to-Pulp sync could become very efficient, since the downstream Pulp
> could easily replicate only the changes made since its last sync.
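A sketch of how a downstream Pulp could compute only the changes since the version it last synced, assuming integer version numbers. Only rows whose vadded or vremoved falls in the window (old, new] matter, so the downstream side needs just those rows. `changes_since` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple


@dataclass(frozen=True)
class RepoContent:
    unit: str
    vadded: int
    vremoved: Optional[int] = None


def changes_since(rows: Iterable[RepoContent], old: int, new: int) -> Tuple[Set[str], Set[str]]:
    """Units to add and remove to bring a copy of version `old` up to `new`."""
    added: Set[str] = set()
    removed: Set[str] = set()
    for r in rows:
        if old < r.vadded <= new and (r.vremoved is None or r.vremoved > new):
            added.add(r.unit)  # appeared after `old` and still present at `new`
        elif r.vadded <= old and r.vremoved is not None and old < r.vremoved <= new:
            removed.add(r.unit)  # present at `old`, gone by `new`
    # A unit removed and later re-added inside the window did not change overall.
    both = added & removed
    return added - both, removed - both
```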
>
> Incremental exports become more concrete. Rather than depending on a
> timestamp, you can know with certainty which version you have in the remote
> location, and thus which newer versions need to be exported.
>
> We could add a "finalized" boolean or similar to a version, and use that
> to know if it was successfully completed. If not, for example if a sync
> task stopped abruptly, the incomplete version could easily be recognized
> and removed.
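A sketch of recognizing and removing an unfinished version. The `finalized` field and `discard_incomplete()` are hypothetical, and the sketch assumes only the newest version of a repo can be unfinalized (one content-changing task per repo at a time). Undoing a version means dropping the rows it created and reopening the rows it closed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RepoVersion:
    number: int
    finalized: bool = False  # hypothetical flag, set when the task completes


@dataclass
class RepoContent:
    unit: str
    vadded: int
    vremoved: Optional[int] = None


def discard_incomplete(
    versions: List[RepoVersion], rows: List[RepoContent]
) -> Tuple[List[RepoVersion], List[RepoContent]]:
    """Remove unfinalized versions and undo their effect on the through table."""
    incomplete = {v.number for v in versions if not v.finalized}
    # Rows created by an incomplete version are dropped entirely.
    kept_rows = [r for r in rows if r.vadded not in incomplete]
    # Rows closed by an incomplete version become open again.
    for r in kept_rows:
        if r.vremoved in incomplete:
            r.vremoved = None
    return [v for v in versions if v.finalized], kept_rows
```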
>
> Feedback Please
> ----------
>
> Please ask questions, provide feedback, add ideas, suggest alternatives,
> etc. I'm perfectly happy even throwing this PoC away if we come up with
> something better.
>
> Thanks!
>
> --
>
> Michael Hrivnak
>
> Principal Software Engineer, RHCE
>
> Red Hat
>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-dev
>
>

