[dm-devel] fibre channel/multipath questions
Rik Bobbaers
Rik.Bobbaers at cc.kuleuven.be
Fri Aug 26 16:12:22 UTC 2005
Hey guys,
Let me introduce myself.
I'm Rik Bobbaers, working at the KULeuven (a university in Belgium).
I'm a Linux sysadmin and have followed a Linux kernel internals course and a
device drivers course.
I have some machines that I want to connect to a SAN, which also has an
IBM/ESS (Shark) connection. The SAN is split in 2 (for failover). Each server
has 2 QLogic Fibre Channel cards, each connected to a different SAN switch.
I want failover/load balancing over those (du'uh ;))
hardware: Dell PE 1750, dual Xeon CPU, 1 GB RAM
distro: Debian/unstable (normally we run stable)
kernel version: vanilla 2.6.13-rc6
bootloader: lilo (maybe a switch to grub is necessary?)
udev version: 0.067-1
multipath version: 0.4.2.4-2
fibre channel cards:
0000:01:04.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
0000:03:06.0 Fibre Channel: QLogic Corp. QLA2312 Fibre Channel Adapter (rev 02)
Now the questions:
1. If I unplug one cable (simulating a SAN switch breakdown/upgrade/...), a
timer starts and expires after fc_dev_loss_tmo seconds. By default this is 35
seconds (the qlogic driver adds 5 seconds to the normal 30). The timer can be
set to a maximum of SCSI_DEVICE_BLOCK_MAX_TIMEOUT, which seems to be 15000
seconds, i.e. 4 hours 10 minutes.
If you plug the cable back in before this timer reaches 0, the device is
reactivated, which makes reintegration into the multipath possible (the disks
keep the same major:minor numbers). If the timer runs out, the devices are
removed permanently. When you then plug the cable back in, the newly found
disks get other major:minor numbers and device names (same major, but
well... ;)), which makes reintegration into the multipath impossible unless
you unmount everything etc... This is not the behaviour you want on a SAN in
a failover/load-balancing environment imho. Is there a way to have the disks
recognised as "the same disks as before"? So that the devices can be removed
by the driver, but re-created when the SAN connection returns after an
undetermined amount of time?
If not, is it dangerous to set SCSI_DEVICE_BLOCK_MAX_TIMEOUT to ... let's
say the largest 32-bit unsigned value (2^32 - 1)? Or to redefine it in
drivers/scsi/scsi_transport_fc.c? What would be the consequences? I can
imagine that this could cause memory problems, workqueue problems or so. I
didn't find anything about this in the lkml archives.
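As a sanity check on the numbers in question 1, here is a minimal Python sketch. It converts the timeouts mentioned above into readable durations and lists the current per-port timeout from sysfs. The sysfs location (/sys/class/fc_remote_ports/rport-*/dev_loss_tmo) is an assumption about how the FC transport class exposes the value and may differ on other kernels; the function names are made up for illustration.

```python
import glob
import os


def format_seconds(seconds):
    """Render a duration in seconds as hours, minutes and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}m {secs}s"


def seconds_to_years(seconds):
    """Approximate a duration in 365-day years."""
    return seconds / (365 * 24 * 3600)


def read_dev_loss_tmo(sysfs_root="/sys/class/fc_remote_ports"):
    """Return {rport name: timeout in seconds} for every remote port found.

    ASSUMPTION: the FC transport class exposes one 'dev_loss_tmo' file per
    remote port below sysfs_root. An empty dict means nothing was found.
    """
    timeouts = {}
    pattern = os.path.join(sysfs_root, "rport-*", "dev_loss_tmo")
    for path in glob.glob(pattern):
        rport = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            timeouts[rport] = int(handle.read().strip())
    return timeouts


if __name__ == "__main__":
    print(format_seconds(30 + 5))   # default incl. the 5 s added by the driver
    print(format_seconds(15000))    # 4h 10m 0s
    print(f"{seconds_to_years(2**32 - 1):.1f} years for a 32-bit maximum")
    for rport, tmo in sorted(read_dev_loss_tmo().items()):
        print(rport, format_seconds(tmo))
```

So a 32-bit maximum would effectively mean "never give up" (well over a century), which is a different policy from a long but finite timeout.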
2. Is it possible to have the Fibre Channel driver loaded AFTER the others?
(Our kernels have no module support, so I built everything into the kernel,
for security reasons etc...) The reason is very simple. If I have 1 disk on
the ESS, it now becomes /dev/sda on scsi0 and /dev/sdb on scsi1, and the local
disks are /dev/sdc and /dev/sdd. If I add another ESS disk, it will be
/dev/sde and /dev/sdf. After a reboot, they will be /dev/sdc and /dev/sdd, and
the local disks will become /dev/sde and /dev/sdf. At first I thought of using
e2label, but that's only for ext2/3 filesystems and our systems run on
reiserfs. Are there any solutions for this in a monolithic kernel, or is the
only way to fix this to compile the drivers as modules and load them at boot
time?
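The renaming described in question 2 can be reproduced with a small Python model. This is only a sketch under stated assumptions: sdX names are handed out strictly in discovery order, the two Fibre Channel hosts are probed before the local controller, and each host reports its LUNs in order. The host label "local" and the disk labels are hypothetical, and the exact letter each ESS path ends up with depends on the real scan order.

```python
from string import ascii_lowercase


def probe(hosts):
    """Flatten {host: [disks]} into discovery order, host by host."""
    return [(host, disk) for host, disks in hosts.items() for disk in disks]


def assign_names(discovery_order):
    """Map sdX names to disks purely by the order in which they are found."""
    return {
        "sd" + ascii_lowercase[index]: entry
        for index, entry in enumerate(discovery_order)
    }


def names_of(names, disk):
    """Return every sdX name that currently points at the given disk label."""
    return sorted(name for name, (_, label) in names.items() if label == disk)


# Boot with one ESS disk, visible once through each FC host.
booted = probe({"scsi0": ["ess-1"], "scsi1": ["ess-1"],
                "local": ["local-1", "local-2"]})

# A second ESS disk added at runtime is appended after everything else.
running = assign_names(booted + [("scsi0", "ess-2"), ("scsi1", "ess-2")])

# After a reboot both ESS disks are found before the local ones.
rebooted = assign_names(probe({"scsi0": ["ess-1", "ess-2"],
                               "scsi1": ["ess-1", "ess-2"],
                               "local": ["local-1", "local-2"]}))

for label in ("local-1", "local-2", "ess-2"):
    print(label, names_of(running, label), "->", names_of(rebooted, label))
```

In this model the local disks move from sdc/sdd to sde/sdf across the reboot, which is exactly why anything that refers to them by sdX name breaks.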
I tried patching the current code a little bit, which was quite hard (since
I've never done this before). I already learned a lot, but I would like to
learn some more so that I could eventually even try to help out on these
things.
I hope this is enough information; if not, please ask!
thanks a million,
--
harry
aka Rik Bobbaers
K.U.Leuven - LUDIT -=- Tel: +32 485 52 71 50
Rik.Bobbaers at cc.kuleuven.be -=- http://harry.ulyssis.org