Ping... Any additional comments or suggestions for this patch?
Bumping in case it got lost in the backlog. :)
On Fri, 2014-04-11 at 17:01 +0000, Stewart, Sean wrote:
> On Fri, 2014-04-11 at 17:03 +0100, Bryn M. Reeves wrote:
> > On Fri, Mar 28, 2014 at 09:01:14PM +0000, Stewart, Sean wrote:
> > > When a system is booted from the SAN, a condition can occur where one
> > > user-friendly name is given to a disk during boot, but multipathd tries
> > > to allocate a different one after boot. If the second alias is already
> > > used by another device, multipathd can't rename it. Multipathd then has
> > > incorrect information about the alias/wwid relationships, which can
> > > result in paths being added to the wrong map.
> > This should only happen if the initramfs and root file system have
> > inconsistent multipath configurations (either multipath.conf or bindings
> > / wwids file mismatched). That's not really a valid configuration for
> > the system to be in and leads to the type of problems you describe.
> It is true that this only happens if they are out of sync. We tried
> remaking the initramfs to fix the problem, but it didn't help.
> > > This patch works around this problem by first trying to use the alias
> > > already bound to a device during boot. If the bindings file has that
> > > alias bound to a different device, it will auto-generate a new alias to
> > > rename it to.
> > To be honest I'd prefer to see this cause an error. These types of
> > configurations currently run the risk of silent data corruption - I'd
> > much rather deal with a system that refuses to boot due to an out of
> > date initramfs image than one that quietly remaps paths in unexpected
> > ways.
> The issue, though, is that the system does not refuse to boot. In the
> case we saw, it booted anyway, our QA engineer ran a test, and it ended
> in data corruption. A user could perform a fresh installation, add
> new LUNs, and reboot, and without any way of realizing it end up with a
> ticking time bomb on their hands, ready to go off as soon as there's a
> blip in the SAN.
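For reference, the alias-selection rule described in the patch summary above (keep the alias the device received at boot unless the bindings file already assigns that alias to a different WWID, otherwise generate a fresh one) can be sketched roughly as follows. This is a simplified illustration, not the patch itself: the in-memory `struct bindings` table, the helper names `bound_wwid`, `add_binding` and `choose_alias`, and the `mpathN` naming scheme are all hypothetical and do not reflect multipath-tools' actual code or bindings file format.

```c
#include <stdio.h>
#include <string.h>

#define MAX_BINDINGS 64
#define ALIAS_LEN 32
#define WWID_LEN 64

/* Hypothetical in-memory copy of the bindings file: alias -> wwid. */
struct binding {
	char alias[ALIAS_LEN];
	char wwid[WWID_LEN];
};

struct bindings {
	struct binding entries[MAX_BINDINGS];
	int count;
};

/* Return the wwid bound to alias, or NULL if the alias is unused. */
static const char *bound_wwid(const struct bindings *b, const char *alias)
{
	for (int i = 0; i < b->count; i++)
		if (strcmp(b->entries[i].alias, alias) == 0)
			return b->entries[i].wwid;
	return NULL;
}

/* Append an alias/wwid pair. Returns 0 on success, -1 if the table is full. */
static int add_binding(struct bindings *b, const char *alias, const char *wwid)
{
	if (b->count >= MAX_BINDINGS)
		return -1;
	snprintf(b->entries[b->count].alias, ALIAS_LEN, "%s", alias);
	snprintf(b->entries[b->count].wwid, WWID_LEN, "%s", wwid);
	b->count++;
	return 0;
}

/*
 * Pick the alias for a map that came up during boot as boot_alias.
 * Keep boot_alias if it is unbound (recording the new binding) or already
 * bound to this wwid. If it is bound to a different wwid, generate the
 * first unused hypothetical "mpathN" name instead and bind that.
 * Writes the chosen alias into out. Returns 0 on success, -1 on failure.
 */
static int choose_alias(struct bindings *b, const char *boot_alias,
			const char *wwid, char *out, size_t outlen)
{
	const char *owner = bound_wwid(b, boot_alias);

	if (owner == NULL || strcmp(owner, wwid) == 0) {
		if (owner == NULL && add_binding(b, boot_alias, wwid) < 0)
			return -1;
		snprintf(out, outlen, "%s", boot_alias);
		return 0;
	}

	/* Conflict: boot_alias already belongs to another device. */
	for (int n = 0; n < MAX_BINDINGS * 2; n++) {
		char candidate[ALIAS_LEN];

		snprintf(candidate, sizeof(candidate), "mpath%d", n);
		if (bound_wwid(b, candidate) == NULL) {
			if (add_binding(b, candidate, wwid) < 0)
				return -1;
			snprintf(out, outlen, "%s", candidate);
			return 0;
		}
	}
	return -1;
}
```

For example, with mpath0 bound to wwid-A and mpath1 bound to wwid-B, a device wwid-C that came up as mpath1 during boot would be given mpath2 instead of silently taking over mpath1's identity. The sketch does not check whether the WWID already owns a different alias in the bindings file, which a real implementation would also need to handle.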