[Pulp-list] RFC: Should the migration system apply all migrations to new systems?
Randy Barlow
rbarlow at redhat.com
Tue Feb 19 16:33:38 UTC 2013
I have an RFC for you on the following question:
Should our migration system be altered to apply historical migrations to
new installations?
I'll give you some background about how the migration system currently
works, and why. Then I'll discuss why I think we might want to change the
behavior. I'd like each of you to consider this change carefully and try
to think of any problems it might lead to.
Background
As it is currently implemented, the migration system will skip all
existing migrations for a given migration package (e.g., pulp-rpm) if it
detects that pulp-manage-db has never been run while that package was
installed.
It works this way because we wanted to be able to skip applying
migrations to new systems, since new systems shouldn't need migrations.
Why We Might Want to Change
If a plugin writer forgets to configure his or her plugin to advertise
itself to the migration system, this behavior can cause problems. In
fact, the pulp-rpm-plugins package is in exactly that state right now[0].
As noted in that bug report, we forgot to include the egg-info in our RPM
for the pulp-rpm-plugins package, so the package's migrations (and ISO
plugins) are not advertised to Pulp. Because of this, the migration system
will not record that any of the package's migrations have been applied, or
even that the package was ever installed. Once we fix the packaging and
users upgrade from 2.0.z to X.Y.Z, the migration system will see the
package for the first time, treat it as a new installation, and skip every
migration we wrote between 2.0.z and X.Y.Z.
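A minimal sketch of the current skip-on-first-sight rule, to make the failure mode concrete. This is a hypothetical model, not Pulp's actual code: `plan_current`, the `tracked` dict, the `"pulp_rpm"` key, and the migration numbers 1-3 are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def plan_current(recorded: Optional[int], available: List[int]) -> List[int]:
    """Hypothetical model of today's behavior, not Pulp's actual code.

    recorded:  last migration version stored for the package, or None if
               pulp-manage-db has never seen the package installed.
    available: migration version numbers shipped with the package.
    Returns the versions that would be applied.
    """
    if recorded is None:
        # First sighting: assume a brand new system and skip every migration.
        return []
    # Known package: apply only what is newer than the recorded version.
    return [v for v in sorted(available) if v > recorded]


# Hypothetical tracking store. On 2.0.z the egg-info was missing, so
# pulp-rpm was never advertised and nothing was ever recorded for it.
tracked: Dict[str, int] = {}

# After upgrading to X.Y.Z (egg-info fixed), suppose migrations 1-3 were
# written between 2.0.z and X.Y.Z.
print(plan_current(tracked.get("pulp_rpm"), [1, 2, 3]))  # []

# Had the package been advertised and recorded at version 0, all three would run.
print(plan_current(0, [1, 2, 3]))  # [1, 2, 3]
```

The upgraded system is indistinguishable from a fresh install, which is exactly why the intervening migrations are lost.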
A Proposed Change
I propose that we change the migration system so that it always starts at
migration version 0 and applies every migration up to the latest available
version. This would allow us to resolve #909366[0], and I believe it will
be safe as long as migration writers are careful to have each migration
detect whether or not it needs to be applied.
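Under the proposal, the runner no longer special-cases an unknown package. A minimal sketch, assuming (hypothetically) that migrations can be modeled as zero-argument callables keyed by version number; `run_proposed` is an illustrative name, not Pulp's API.

```python
from typing import Callable, Dict, List, Optional

Migration = Callable[[], None]


def run_proposed(recorded: Optional[int], migrations: Dict[int, Migration]) -> int:
    """Hypothetical model of the proposal, not Pulp's actual code.

    A package with no recorded version starts at version 0 instead of being
    skipped, so every migration runs. Returns the version to record afterward.
    """
    current = 0 if recorded is None else recorded
    for version in sorted(migrations):
        if version > current:
            # Each migration must check for itself whether there is any work to do.
            migrations[version]()
            current = version
    return current


calls: List[int] = []
migrations = {v: (lambda v=v: calls.append(v)) for v in (1, 2, 3)}
print(run_proposed(None, migrations), calls)  # 3 [1, 2, 3]
```

The burden of safety moves from the runner into each migration, which is the trade-off this RFC is asking about.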
Potential Problems
Most migrations probably loop over database objects, either renaming one
field to another or computing a new field from some existing state. These
sorts of migrations should be safe to apply to new installations: there
are no objects in the DB yet, so the loop body executes zero times.
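A sketch of such a field-renaming migration, with a list of dicts standing in for a database collection (an assumption; the field names `old_field` and `new_field` are hypothetical). It does nothing on an empty collection and is also safe to re-run, because it only touches documents that still carry the old field.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]


def migrate(collection: List[Document]) -> int:
    """Hypothetical migration: rename 'old_field' to 'new_field' on every document.

    The list of dicts stands in for a database collection. Returns the number
    of documents changed.
    """
    changed = 0
    for doc in collection:
        # Only touch documents that still use the old name, so re-running is harmless.
        if "old_field" in doc:
            doc["new_field"] = doc.pop("old_field")
            changed += 1
    return changed


print(migrate([]))  # 0: a new installation has no objects, so the loop body never runs

existing = [{"old_field": 1}, {"new_field": 2}]
print(migrate(existing), migrate(existing))  # 1 0
print(existing)  # [{'new_field': 1}, {'new_field': 2}]
```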
However, migrations don't have to loop over database objects. In fact,
they aren't constrained in any way. Each one is just a Python method that
can do anything, and Pulp only tracks whether it has been called.
If a migration did something for which it was hard to detect whether it
had already been applied, that would be a problem for this approach. I
cannot think of such a use case myself, which is why I am writing this
RFC. Here's an unrealistic example that illustrates a case that might be
hard to detect. I realize that we would never do this specific thing, so
consider it purely for the purpose of illustration. Suppose that I wrote a
migration that inserted an RPM into the DB named example-<todays_date>.rpm.
Obviously this is silly, but you can see that it would be hard for the
migration to detect whether it had run before, and so avoid running
again. Again, that is not even close to a real-world example, but I
cannot think of a real-world one myself.
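To make the difficulty concrete, here is a sketch of that silly migration, with a Python list standing in for the RPMs in the DB and an assumed ISO date format for `<todays_date>`. The obvious guard (skip if the exact name already exists) only catches a re-run on the same day.

```python
import datetime
from typing import List, Optional


def migrate(rpm_names: List[str], today: Optional[datetime.date] = None) -> None:
    """The deliberately silly hypothetical migration from the example.

    rpm_names stands in for the RPMs in the DB; the ISO date format is an assumption.
    """
    today = today or datetime.date.today()
    name = f"example-{today.isoformat()}.rpm"
    # The obvious guard only detects a re-run on the same day.
    if name not in rpm_names:
        rpm_names.append(name)


rpms: List[str] = []
migrate(rpms, datetime.date(2013, 2, 19))  # first run, e.g. on a new install
migrate(rpms, datetime.date(2013, 2, 19))  # same-day re-run: guard works
migrate(rpms, datetime.date(2013, 2, 20))  # re-run next day: guard misses it
print(rpms)  # ['example-2013-02-19.rpm', 'example-2013-02-20.rpm']
```

The migration's output depends on when it runs rather than on the state of the DB, so no simple check on the data can tell whether it has already been applied.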
Can any of you see a problem with this plan? Can you think of any
real-world use cases for the migration system for which this would be a
problem?
Thanks for reading, and for your consideration!
[0] https://bugzilla.redhat.com/show_bug.cgi?id=909366
--
Randy Barlow