[Linux-cluster] centos5 to RHEL6 migration

Mon Jan 9 09:36:05 UTC 2012

On 1/9/2012 9:52 AM, Alan Brown wrote:
> On 09/01/12 02:38, Digimer wrote:
> 
>>   Technically yes, practically no. Or rather, not without a lot of
>> testing first.
> 
> This is "rather a shame"
> 
> I have a similar requirement (EL5 -> EL6 with GFS)
> 

Well the cluster stack itself (openais/cman/gfs/rgmanager ->
corosync/cman/gfs2/rgmanager) is capable of handling the upgrade in a
compatible mode.

*BUT* (yes there are tons of those)

Over time, while testing different upgrade scenarios, we came to the
conclusion that it is a lot more complicated for any user (even
expert/advanced ones) to perform a safe upgrade than to rebuild the
cluster from scratch (*), given that the setup, config, etc. are already
known from the old cluster.

>>   There may be some other things you need to do as well. Please be sure
>> to do proper testing and, if you have the budget, hire Red Hat to advise
>> on this process. Also, please report back your results. It would help me
>> help others in the same boat later. :)
> 
> RH's advice to us is to "Big Bang" it.

It's not really advice, as RH does not officially support this
upgrade method.

> 
> The last such transition (EL4 to EL5) was an unmitigated disaster even
> with RH onsite to make the change, so we're _very_ wary this time around.
> 

The changes in the cluster software between EL5 and EL6 are a lot less
intrusive at the system level. I can't really say for sure for the
entire OS, since the upgrade doesn't involve only RHCS.

Fabio

(*) The major issues when upgrading from 5 to 6 are:
- GFS1 is not supported in EL6. Volumes need to be migrated to GFS2
(there are several ways to do it, but it still needs to be done offline).
- cluster.conf cannot be updated automatically during an upgrade or
while nodes are running in mixed mode (some nodes at 5 and others at 6).
- some config options, while backward compatibility should be retained,
need to be changed in a very specific sequence, making it really hard to
perform an easy upgrade.
- but the biggest blocker of all is the resources themselves (whether
driven by rgmanager or not).

For example, an apache2 config from EL5 cannot be used out-of-the-box on
EL6. So assuming rgmanager is driving apache2, you would need to set up
two separate apache2 configs, test them individually, perform migration
checks between EL5 and 6... etc. This kind of testing is more
time-consuming and complex than redoing the cluster from scratch, and it
outweighs anything you could possibly gain from upgrading in place.

There are also other resources that are simply unable to deal with this
kind of upgrade.

Take the example of a DB stored on a gfs2 filesystem. The DB was created
in format version 1; after the migration to EL6, the DB format is
upgraded to internal version 2, which is incompatible with version 1.

If the service ever needs to fail over back to a node running EL5, the
DB will be unable to start, effectively defeating the purpose of HA.

Note that the service compatibility level has nothing to do with the
cluster itself.
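
To make the DB example concrete, here is a rough sketch in Python. It is
only a toy model: the Node class, the max_db_format field and the format
numbers 1 and 2 are made up for illustration and do not correspond to
any real database or to anything in rgmanager.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    release: str        # "EL5" or "EL6"
    max_db_format: int  # newest on-disk format this node's DB binary reads (hypothetical)


def can_start_db(node: Node, on_disk_format: int) -> bool:
    """A node can start the DB only if it understands the on-disk format."""
    return on_disk_format <= node.max_db_format


def start_db(node: Node, on_disk_format: int) -> int:
    """Start the DB on a node and return the on-disk format afterwards.

    Models a one-way upgrade: a newer binary rewrites the data in its own
    format, and nothing ever downgrades it.
    """
    if not can_start_db(node, on_disk_format):
        raise RuntimeError(
            f"{node.name} ({node.release}) cannot read format {on_disk_format}"
        )
    return max(on_disk_format, node.max_db_format)


el5 = Node("node1", "EL5", max_db_format=1)
el6 = Node("node2", "EL6", max_db_format=2)

fmt = start_db(el5, 1)                # DB created on EL5, format 1
fmt = start_db(el6, fmt)              # service relocates to EL6, format becomes 2
failback_ok = can_start_db(el5, fmt)  # the EL5 node can no longer start it
print(f"on-disk format: {fmt}, failback to EL5 possible: {failback_ok}")
```

The cluster stack does its job in both directions here; it is the
resource that makes the failback impossible.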

Now, when you multiply the number of possible services, failover
scenarios, config changes, etc., you will easily come to the conclusion
that an upgrade of this scale is a path to insanity for the
administrator.
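
As a back-of-the-envelope illustration of how quickly this grows, the
sketch below counts only the service relocations that cross the EL5/EL6
boundary in a mixed-mode cluster. The service names, node names and
cluster size are invented for the example; a real cluster would add
config changes and ordering constraints on top of this.

```python
from itertools import permutations, product

# Hypothetical mixed-mode cluster: two nodes still on EL5, two already on EL6.
services = ["apache", "db", "nfs"]
nodes = {"node1": "EL5", "node2": "EL5", "node3": "EL6", "node4": "EL6"}


def cross_release_failovers(services, nodes):
    """List every (service, source, target) relocation that crosses releases."""
    return [
        (svc, src, dst)
        for svc, (src, dst) in product(services, permutations(nodes, 2))
        if nodes[src] != nodes[dst]
    ]


cases = cross_release_failovers(services, nodes)
print(f"{len(cases)} cross-release failover cases to test")
```

Every one of those cases would need its own test before you could trust
a failover in mixed mode, and that is before looking at any config
change.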