
[Pulp-list] RFC: Should the migration system apply all migrations to new systems?

I have an RFC for you around this question:

Should our migration system be altered to apply historical migrations to new installations?

I'll give you some background about how the migration system currently works, and why. Then I'll discuss why I think we might want to change the behavior. I'd like each of you to carefully consider this change, to try to see if you can think of any problems that it might lead to.


As currently implemented, the migration system skips all existing migrations for a specific migration package (e.g., pulp-rpm) if it detects that pulp-manage-db has never been run while that package was installed.

It works this way because we wanted to skip applying migrations on new systems, since new systems shouldn't need them.
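
To make the current behavior concrete, here is a minimal sketch of it. The tracker dict, the (version, callable) list, and the function name are all hypothetical stand-ins for illustration, not Pulp's actual migration code.

```python
# Hypothetical sketch of the current behavior; none of these names are
# Pulp's real API.

def run_migrations_current(tracker, package, migrations):
    """Apply pending migrations for one package.

    tracker:    dict mapping package name -> last applied version
    migrations: list of (version, callable) pairs, sorted by version
    Returns the list of versions that were actually run.
    """
    latest = migrations[-1][0] if migrations else 0
    if package not in tracker:
        # First time this package is seen: assume a fresh install and
        # jump straight to the latest version without running anything.
        tracker[package] = latest
        return []

    applied = []
    for version, migrate in migrations:
        if version > tracker[package]:
            migrate()
            tracker[package] = version
            applied.append(version)
    return applied
```

With an empty tracker, the first call runs nothing and simply records the latest version; only migrations added after that point are ever run.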

Why We Might Want to Change

If a plugin writer forgets to configure his or her plugin to advertise itself to the migration system, this behavior can lead to problems. In fact, we are in just such a state right now with the pulp-rpm-plugins package[0]. In that bug report, it is noted that we had forgotten to include the egg-info in our RPM for the pulp-rpm-plugins package, which means that the package's migrations (and ISO plugins) are not advertised to Pulp. Because of this, the migration system does not record that any migrations have been applied, or even that the package was ever installed. This means that once we correct the issue and users upgrade from 2.0.z to X.Y.Z, the migration system will treat the package as newly installed and skip any migrations that we wrote between 2.0.z and X.Y.Z.

A Proposed Change

I propose that we alter the migration system so that it no longer skips migrations, and instead always starts at migration version 0 and applies every migration up to the latest available version. This will allow us to resolve #909366[0], and I believe it will be safe as long as migration writers are careful to detect whether or not their migration should be applied.
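
A rough sketch of what the proposed behavior could look like follows. As before, the tracker dict, the (version, callable) list, and the function name are assumptions made up for this example, not the real implementation.

```python
# Hypothetical sketch of the proposed behavior; none of these names are
# Pulp's real API.

def run_migrations_proposed(tracker, package, migrations):
    """Apply pending migrations, treating unseen packages as version 0.

    tracker:    dict mapping package name -> last applied version
    migrations: list of (version, callable) pairs, sorted by version
    Returns the list of versions that were actually run.
    """
    current = tracker.get(package, 0)  # never-seen packages start at 0
    applied = []
    for version, migrate in migrations:
        if version > current:
            migrate()
            current = version
            tracker[package] = version
            applied.append(version)
    return applied
```

Under this scheme, a package that was installed but never recorded by the migration system still gets every migration once it starts advertising itself, which is the case described in the bug report.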

Potential Problems

Most migrations are probably along the lines of looping over database objects and renaming a field to another field, or computing a new field based on some kind of state. These sorts of migrations should be safe to apply to new installations because the loop will execute 0 times as there are no objects in the DB yet.
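
As an illustration of that kind of migration, here is a small sketch that renames a field on every object in a collection. A plain list of dicts stands in for the database, and the field names in the usage note are invented for the example.

```python
# Hypothetical field-rename migration; a list of dicts stands in for a
# database collection.

def rename_field(units, old, new):
    """Rename key `old` to `new` on every unit that still has `old`.

    Returns the number of units that were changed.
    """
    changed = 0
    for unit in units:
        # Only touch units that have not been migrated yet, so running
        # the migration a second time changes nothing.
        if old in unit and new not in unit:
            unit[new] = unit.pop(old)
            changed += 1
    return changed
```

Calling rename_field([], "old_name", "new_name") on a new installation changes nothing because the loop body never runs, and calling it again on already-migrated data is also harmless because of the guard.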

However, migrations don't have to loop over database objects. In fact, they aren't constrained in any way. Each one is just a Python method that can do anything, and Pulp only tracks whether it has been called or not.

If there were a migration whose effects made it tricky to detect whether it had already been applied, that would be problematic for this approach. I cannot think of such a use case myself, which is why I am writing this RFC. Here is an unrealistic example that illustrates a case that might be tough to detect. I realize that this specific case is not something we would ever do, so consider it purely for the purpose of illustration. Suppose that I wrote a migration that inserts an RPM into the DB named example-<todays_date>.rpm. Obviously, this is silly, but you can see that it would be tough to detect whether or not this migration had run before in order to avoid running it again. Again, that is not even close to a real world example, but I cannot think of a real world one myself.
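
That contrived migration could be sketched like this. The list of dicts standing in for the DB, the function name, and the optional `today` parameter (added only so the example can be exercised with fixed dates) are all assumptions for illustration.

```python
import datetime

# Hypothetical sketch of the contrived date-stamped migration; a list of
# dicts stands in for the DB.

def insert_dated_rpm(units, today=None):
    """Insert a unit named example-<todays_date>.rpm and return its name."""
    if today is None:
        today = datetime.date.today()
    name = "example-%s.rpm" % today.isoformat()
    # The name depends on the day the migration runs, so a later run
    # cannot look for one fixed name to find out whether an earlier run
    # already inserted a unit.
    units.append({"name": name})
    return name
```

Running it on two different days leaves two differently named units behind, so a guard that checks for the exact name the migration is about to insert would not prevent the duplicate.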

Can any of you see a problem with this plan? Are there any examples you can think of that are real world use cases for the migration system that this would be a problem for?

Thanks for reading, and for your consideration!

[0] https://bugzilla.redhat.com/show_bug.cgi?id=909366

Randy Barlow
