On 04/25/2017 02:47 PM, Brian Bouterse wrote:
> There are a few use cases and behaviors that I think only an asynchronous update/delete for importers,
> publishers, and repositories will satisfy. Thanks to @asmacdo for covering some of these also.
> * As a plugin writer, I am in control when my code is running. The data layer is not expected to change other
> than what my code is doing or what I've asked platform to do for me. This same reasoning applies to unit
> associations too (as a similar example of this principle). With unit associations for example, I don't think
> we want new units showing up as part of a repo halfway through the publish operation.
> * As a plugin writer, I can modify and save the importer without being concerned that I am overwriting data
> the user just updated. This is a write-write race concern. We could say in documentation "don't do that", but
> they may have good reasons to and I don't see a good reason why they can't call save() during a sync() for
> example. Consider that they are defining fields on these subclassed importers and they want to write data into
> them at sync-time.
This applies to previous use case as well.
Even with update/delete being asynchronous, I don't think there is any guarantee as to what the data will be
when the task actually runs in a multi-user application. There is a race condition on the task queue. Other
users' updates and syncs can be queue in any order around what this user is trying to do. If this is a
requirement, we may need to include a snapshot of the data in the task.
> * A user should be able to submit a [sync], [modify the importer], [sync] and only have the update apply to
> the second sync. Synchronous updates make it difficult for a user to determine the impact an update has when
> applied while sync code is running for example.
> * If users want the ability to apply an importer update apply immediately then can cancel any running syncs
> and their update will apply immediately then, so we're not limiting them in any way.
> * Deleting data while code is expected to use that data is running will likely create negative user
> experiences. This is a rephrasing of the "don't change the data later while plugin code is running" point from
> earlier but with a twist. If the publisher is deleted while the plugin writer's publish() is running, if
> plugin code tries to read or save, the task will fail at some random point with a traceback, which is not a
> good experience. Contrast that against what you would get with an asynchronous design which would allow a
> clean PulpException to be emitted by the publish() task just before calling the publish() code. Having the
> PulpException which checks in exactly one place will be a better user experience with a nice message instead
> of a deep traceback and easier to maintain (in one place).
Seems this can be handled cleanly/gracefully for synchronous delete as well. The plugin should protect
updates with try/catch and raise a PulpException when the DB exception is raised. The sync would terminate
immediately and the task would report that the importer had been deleted which seems reasonable.
> Thank you to everyone who is participating in this convo. We don't have agreement, but we do have a good
> dialogue going! What are you thoughts about ^?
> On Fri, Apr 21, 2017 at 2:16 PM, Austin Macdonald <austin redhat com <mailto:austin redhat com>> wrote:
> I think we have arrived at a consensus around these points:
> * Django's ModelForm validations that are used during `full_clean` is an anti-pattern in
> DjangoRestFramework and should not be used
> * Validation, by our definition, will be done by the serializers in the API layer
> * The Task layer should run business logic checks at the time of use
> We have not arrived at a consensus on whether Updates/Deletes for Importers/Publishers will be synchronous
> or asynchronous. I've tried to think through each way, and what it would need to work.
> *Synchronous Update and Delete:
> _Validation Heuristic_
> The serializers validate in the API layer only, and it is desirable to have that validation occur at the
> time of the write.
> _Concurrent Writes:_
> Let's assume synchronous Updates and Deletes. There is a possibility of concurrent writes because
> sync/publish tasks write to the importer/publisher tables respectively and these tables can also be
> written by the Update API at any time. If there is a collision the last write wins.
> Concurrent writes may not be a big deal because they should be not affect each other. In platform, sync
> and publish only write to the last_sync/last_publish fields, which should never be written by the update
> API. If the writes are restricted to only the affected fields, then there would be no clobbering. As long
> as plugin writers' implementations of sync and publish do not write to the same fields allowed in an API
> update, this concurrency is safe. We would obviously need to be clearly documented for plugin writers.
> _Dealing With Broken Configurations_
> For example, while a sync task is WAITING, the user updates an importer in a way that breaks sync. The
> importer configuration should be read only once at the start of a sync. If this configuration has been
> deleted or is not syncable, business logic checks catch this and fail immediately. Even if the
> configuration is changed during the sync task, the sync task will continue with the original
> configuration. The sync task will update last_sync, but will not write to other fields. I don't see a
> problem here, as long as this is documented for the plugin writer.
> I don't see any concurrency problems here. The downside of this design is that sync and publish,
> (implemented by plugin writers), should not write to the same fields as API updates.
> *Asynchronous Update and Delete*
> _Validation Heuristic_
> Assuming async updates, the data comes into the API layer where it is validated by the serializers. The
> update reserves a task (reservation is based on repository) and waits to be performed. It is usually
> undesirable to separate validation from use. There could be concern that data validated in the API layer
> could become unacceptable by the time the Task layer actually performs the write.
> Practically, this separation may not be an issue. Valid data becoming invalid involves validations that
> are dependent on state. We can split these validations into two categories, uniqueness validations and
> multi-field validations. Fortunately, the Django ORM will enforce uniqueness validation, so the database
> is protected, but we might have to handle write failures. Multi-field validations are probably not
> validations by our definition, but are business logic, which could be checked in the Task layer.
> _Concurrent write prevention_
> Leveraging the reserved resource system, concurrent writes could not be a problem because writes could
> never be concurrent at all. Syncs, Updates, Deletes, Publishes all make reservations based on the
> repository (this is an assumption based on what we do in Pulp 2), so each would run in the order that they
> were called.
> _User Controlled Order_
> A user can issue a [sync][update][sync] and be confident that the first sync will use the original
> importer configuration, and the second sync will use the updated configuration.
> _Uncertain State_
> The configuration of an importer is unpredictable if there is a waiting update task from another user.
_________________> Pulp-dev redhat com <mailto:Pulp-dev redhat com>
> Pulp-dev mailing list
Pulp-dev mailing list
Pulp-dev redhat com