
[Pulp-list] Coordinator Usage for Repositories

I see the coordinator in place for repo delete, sync, and publish. It still needs to be added in other places, so I figured I'd start a discussion of where and how.

= Repo Create =
I actually don't think we need it here. I think the create is atomic enough that the race condition of multiple creates for the same ID is fine.

The reason I'm against using it is because it's going to royally mess up my create workflow in the RPM extension. That create is actually going to be three operations: create repo, add importer, add distributor. If that create doesn't immediately tell me success or fail, then I can't go on to the other steps. That means the user will have to wait for the create to complete, delete the repo, and try again.

= Add or Remove Importer/Distributor =
As much as I want to say this falls under the same rationale as create repo, it doesn't. It's actually closer to updating a repo than to creating one, so I think it needs to block on everything repo update blocks on.

= Repo Update =
Technically speaking, the actual data in a repo is so benign that changing it won't have any repercussions on a running operation. Still, we should treat it as any other update.

= Update Importer/Distributor Config =
Again, this is just like updating a repo. We can't do this while a sync is in progress.
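
To make the four cases above concrete, here is a rough sketch of the rule they add up to. Everything in it is hypothetical: the operation names, the `must_wait` function, and the assumption that updates conflict with both sync and publish on the same repo (the text above only calls out sync explicitly). It is not how the coordinator is actually implemented.

```python
# Hypothetical operation names for illustration; not Pulp's real identifiers.
LONG_RUNNING = {"sync", "publish"}

# Operations proposed above to go through the coordinator.
# "create" is deliberately left out, per the Repo Create section.
COORDINATED = {
    "update",
    "add_importer", "remove_importer",
    "add_distributor", "remove_distributor",
    "update_importer_config", "update_distributor_config",
}


def must_wait(operation, repo_id, in_progress):
    """Return True if the operation should be held back.

    in_progress is an iterable of (operation, repo_id) pairs describing
    what is currently running.
    """
    if operation not in COORDINATED:
        return False
    # Assumption: only long-running work on the *same* repo causes a conflict.
    return any(op in LONG_RUNNING and rid == repo_id
               for op, rid in in_progress)


running = [("sync", "repo-a")]
blocked_same_repo = must_wait("update", "repo-a", running)
blocked_other_repo = must_wait("update", "repo-b", running)
blocked_create = must_wait("create", "repo-a", running)
```

With a sync running on repo-a, an update to repo-a waits, while an update to repo-b and any create go straight through.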

= Concerns? =
(this is mostly me thinking out loud)

I know I've said it before, but do we need to entertain multiple queues? If we have 4 repos synchronizing, you're locked out of any repo manipulation operations until one of those syncs finishes. By manipulation operations I mean I can't create or update a repo, or add importers/distributors to it, while 4 totally separate repos are syncing.

That has the potential to be really annoying if you have a lot of scheduled syncs. You could be in the middle of some admin operations when one or more scheduled syncs kick in and basically take over all of Pulp's processing capabilities. That may be alleviated by suggesting that syncs be scheduled for off-hours.

And that's just within repo operations. To be delayed from creating a new repo or updating one because I've triggered a handful of consumer operations is also a rough user experience (I say delayed meaning the coordinator isn't the one blocking it, just the sheer lack of open threads in the task pool).

That said, I haven't fully thought through what a multiple queue setup would look like. It probably gets really tricky very fast. I just want to make sure we understand how the user experience is going to change now that many more operations, which previously weren't, are reliant on an open thread in the task pool.

I wonder if it makes sense to use creative math for the task weighting concept to ensure there will be some open space for non-sync/publish tasks. For instance, say syncs weigh 3 and the total allocated weight points in the task queue is 8. That means we could never have sync operations block the entire task queue; they just don't fit (two syncs take 6 points and a third would need 9). That'd always leave 2 spots open for the smaller operations like create/delete/update (assume for this example they each weigh 1). The coordinator would prevent smaller operations on the repos being synced from taking place, but it would let them slide through for unrelated repos.
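
The arithmetic in that example can be checked with a toy model. This is only a sketch under the numbers given above (capacity 8, syncs weigh 3, small operations weigh 1); the `WeightedTaskPool` class and its method names are made up for illustration and don't correspond to the real task queue code.

```python
class WeightedTaskPool(object):
    """Toy admission check: a task starts only if its weight still fits."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.running = []  # list of (name, weight) pairs

    def used(self):
        return sum(weight for _, weight in self.running)

    def try_start(self, name, weight):
        # Refuse the task if it would push the pool past its total weight.
        if self.used() + weight > self.capacity:
            return False
        self.running.append((name, weight))
        return True


pool = WeightedTaskPool(capacity=8)

# Four scheduled syncs arrive at once; only two fit (3 + 3 = 6 <= 8).
started_syncs = [pool.try_start("sync-%d" % i, 3) for i in range(4)]

# The remaining 2 points can never be taken by a sync (it needs 3).
free_after_syncs = pool.capacity - pool.used()

# Small operations weighing 1 each fill the leftover space.
small_ops = [pool.try_start(op, 1) for op in ("create", "update", "delete")]
```

Here two of the four syncs start, 2 points stay free, and two of the three small operations get in. The weights only work this way because 8 is not a multiple of 3; with a capacity of 9, three syncs would fill the pool completely.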

Jay Dobies
Freenode: jdob @ #pulp
http://pulpproject.org | http://blog.pulpproject.org
