[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: [Pulp-dev] 3.0 Validation update

tldr; REST calls to Update or Delete Importers, Publishers, and Repositories will return *asynchronously*.

We met last week to discuss the question. The case of a Delete called during a sync demonstrates why this is necessary. If Deletes were synchronous, each plugin would have to handle the possibility that their models have been removed.

By deciding that Updates and Deletes are asynchronous, we are able to make a general promise to the plugin writers: The repository and its associated importer and publishers will not be changed by platform Pulp except during reserved calls. This means that when plugin code is running it has a sort of write lock on the relevant data.

On Wed, Apr 26, 2017 at 9:55 AM, Jeff Ortel <jortel redhat com> wrote:

On 04/25/2017 02:47 PM, Brian Bouterse wrote:
> There are a few use cases and behaviors that I think only an asynchronous update/delete for importers,
> publishers, and repositories will satisfy. Thanks to @asmacdo for covering some of these also.
> * As a plugin writer, I am in control when my code is running. The data layer is not expected to change other
> than what my code is doing or what I've asked platform to do for me. This same reasoning applies to unit
> associations too (as a similar example of this principle). With unit associations for example, I don't think
> we want new units showing up as part of a repo halfway through the publish operation.
> * As a plugin writer, I can modify and save the importer without being concerned that I am overwriting data
> the user just updated. This is a write-write race concern. We could say in documentation "don't do that", but
> they may have good reasons to and I don't see a good reason why they can't call save() during a sync() for
> example. Consider that they are defining fields on these subclassed importers and they want to write data into
> them at sync-time.

This applies to previous use case as well.

Even with update/delete being asynchronous, I don't think there is any guarantee as to what the data will be
when the task actually runs in a multi-user application.  There is a race condition on the task queue.  Other
users' updates and syncs can be queue in any order around what this user is trying to do.  If this is a
requirement, we may need to include a snapshot of the data in the task.

> * A user should be able to submit a [sync], [modify the importer], [sync] and only have the update apply to
> the second sync. Synchronous updates make it difficult for a user to determine the impact an update has when
> applied while sync code is running for example.
> * If users want the ability to apply an importer update apply immediately then can cancel any running syncs
> and their update will apply immediately then, so we're not limiting them in any way.
> * Deleting data while code is expected to use that data is running will likely create negative user
> experiences. This is a rephrasing of the "don't change the data later while plugin code is running" point from
> earlier but with a twist. If the publisher is deleted while the plugin writer's publish() is running, if
> plugin code tries to read or save, the task will fail at some random point with a traceback, which is not a
> good experience. Contrast that against what you would get with an asynchronous design which would allow a
> clean PulpException to be emitted by the publish() task just before calling the publish() code. Having the
> PulpException which checks in exactly one place will be a better user experience with a nice message instead
> of a deep traceback and easier to maintain (in one place).

Seems this can be handled cleanly/gracefully for synchronous delete as well.  The plugin should protect
updates with try/catch and raise a PulpException when the DB exception is raised.  The sync would terminate
immediately and the task would report that the importer had been deleted which seems reasonable.

> Thank you to everyone who is participating in this convo. We don't have agreement, but we do have a good
> dialogue going! What are you thoughts about ^?
> -Brian
> On Fri, Apr 21, 2017 at 2:16 PM, Austin Macdonald <austin redhat com <mailto:austin redhat com>> wrote:
>     I think we have arrived at a consensus around these points:
>     * Django's ModelForm validations that are used during `full_clean` is an anti-pattern in
>     DjangoRestFramework and should not be used
>     * Validation, by our definition, will be done by the serializers in the API layer
>     * The Task layer should run business logic checks at the time of use
>     We have not arrived at a consensus on whether Updates/Deletes for Importers/Publishers will be synchronous
>     or asynchronous. I've tried to think through each way, and what it would need to work.
>     *Synchronous Update and Delete:
>     *
>     _Validation Heuristic_
>     The serializers validate in the API layer only, and it is desirable to have that validation occur at the
>     time of the write.
>     *
>     *
>     _Concurrent Writes:_
>     Let's assume synchronous Updates and Deletes. There is a possibility of concurrent writes because
>     sync/publish tasks write to the importer/publisher tables respectively and these tables can also be
>     written by the Update API at any time. If there is a collision the last write wins.
>     Workaround:
>     Concurrent writes may not be a big deal because they should be not affect each other. In platform, sync
>     and publish only write to the last_sync/last_publish fields, which should never be written by the update
>     API. If the writes are restricted to only the affected fields, then there would be no clobbering. As long
>     as plugin writers' implementations of sync and publish do not write to the same fields allowed in an API
>     update, this concurrency is safe. We would obviously need to be clearly documented for plugin writers.
>     _Dealing With Broken Configurations_
>     For example, while a sync task is WAITING, the user updates an importer in a way that breaks sync. The
>     importer configuration should be read only once at the start of a sync. If this configuration has been
>     deleted or is not syncable, business logic checks catch this and fail immediately. Even if the
>     configuration is changed during the sync task, the sync task will continue with the original
>     configuration. The sync task will update last_sync, but will not write to other fields. I don't see a
>     problem here, as long as this is documented for the plugin writer.
>     _Summary_:
>     I don't see any concurrency problems here. The downside of this design is that sync and publish,
>     (implemented by plugin writers), should not write to the same fields as API updates.
>     *
>     *
>     *Asynchronous Update and Delete*
>     _Validation Heuristic_
>     Assuming async updates, the data comes into the API layer where it is validated by the serializers. The
>     update reserves a task (reservation is based on repository) and waits to be performed. It is usually
>     undesirable to separate validation from use. There could be concern that data validated in the API layer
>     could become unacceptable by the time the Task layer actually performs the write.
>     Practically, this separation may not be an issue. Valid data becoming invalid involves validations that
>     are dependent on state. We can split these validations into two categories, uniqueness validations and
>     multi-field validations. Fortunately, the Django ORM will enforce uniqueness validation, so the database
>     is protected, but we might have to handle write failures. Multi-field validations are probably not
>     validations by our definition, but are business logic, which could be checked in the Task layer.
>     _
>     _
>     _Concurrent write prevention_
>     Leveraging the reserved resource system, concurrent writes could not be a problem because writes could
>     never be concurrent at all. Syncs, Updates, Deletes, Publishes all make reservations based on the
>     repository (this is an assumption based on what we do in Pulp 2), so each would run in the order that they
>     were called.
>     _User Controlled Order_
>     A user can issue a [sync][update][sync] and be confident that the first sync will use the original
>     importer configuration, and the second sync will use the updated configuration.
>     _Uncertain State_
>     The configuration of an importer is unpredictable if there is a waiting update task from another user.
>     _______________________________________________
>     Pulp-dev mailing list
>     Pulp-dev redhat com <mailto:Pulp-dev redhat com>
>     https://www.redhat.com/mailman/listinfo/pulp-dev <https://www.redhat.com/mailman/listinfo/pulp-dev>
> _______________________________________________
> Pulp-dev mailing list
> Pulp-dev redhat com
> https://www.redhat.com/mailman/listinfo/pulp-dev

Pulp-dev mailing list
Pulp-dev redhat com

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]