[dm-devel] Problems with multipathd
gistolero at gmx.de
Mon Sep 12 15:52:57 UTC 2005
>>===> I found some settings in /sys/module/qla2xxx/parameters/...,
>>but most of them are read-only values. I have changed ql2xretrycount
>>and ql2xsuspendcount but without success. Any suggestions for
>>this driver?
>>
>
> Here are the interesting ones, I guess.
>
> [root at s64p17bibro ~]# find /sys/class/ -name "*tmo*"
> /sys/class/fc_remote_ports/rport-1:0-3/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-2/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-1/dev_loss_tmo
> /sys/class/fc_remote_ports/rport-1:0-0/dev_loss_tmo
> /sys/class/scsi_host/host1/lpfc_nodev_tmo
OK, I have a 6-second timeout now :-)
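For reference, here is a minimal sketch of how the lowering can be done (it assumes the fc_remote_ports sysfs layout from the `find` output above; the function name and the optional root argument are mine, the latter only so the loop can be dry-run against a copy of the tree):

```shell
# set_fc_dev_loss_tmo SECONDS [SYSROOT]
# Write SECONDS into dev_loss_tmo for every FC remote port.
# SYSROOT defaults to the real /sys.
set_fc_dev_loss_tmo() {
    secs="$1"
    root="${2:-/sys}"
    for tmo in "$root"/class/fc_remote_ports/rport-*/dev_loss_tmo; do
        # skip unmatched globs and read-only attributes
        [ -w "$tmo" ] && echo "$secs" > "$tmo"
    done
}
```

For example, `set_fc_dev_loss_tmo 6` would give the 6-second timeout used in the tests below.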
>>I have commented out this line, but udev still has difficulties creating
>>these links. Therefore I have changed /etc/dev.d/block/multipath.dev (the
>>script is attached at the end of this post) and added debug messages. The
>>most important modification is that kpartx uses the block device files in
>>/dev/mapper/... instead of /dev/...
>>===> Why isn't that the default? Are there any disadvantages?
>>
>
> Not really. All distributors seem to have their own ideas about naming
> policies. You should ask about, and follow, the Gentoo philosophy, I
> guess.
I'm sure I'm not the only one who has problems with missing /dev/... links.
It's possible for multipath to install a device-mapper table without errors
while kpartx fails because udev doesn't create the links in /dev/... So I
think multipath.dev should execute kpartx with /dev/mapper/... instead of
/dev/... by default.
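The proposed change to multipath.dev boils down to something like this sketch (the wrapper function name is mine; `kpartx -a` creates the partition mappings for a map):

```shell
# Run kpartx against the stable /dev/mapper node for a given map name,
# rather than against a udev-created symlink in /dev that may not
# exist yet.
kpartx_on_mapper() {
    node="/dev/mapper/$1"
    if [ ! -b "$node" ]; then
        echo "kpartx_on_mapper: $node is not a block device" >&2
        return 1
    fi
    kpartx -a "$node"
}
```

With the map from the examples below, multipath.dev would call `kpartx_on_mapper 150gb` instead of pointing kpartx at a /dev/150gb symlink.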
>>===> Without "udevstart" udev doesn't create the /dev/150gb*
>>links! Is this a udev bug?
>>
> You can still identify the udev problems by keeping the node creation
> in /dev/. Maybe all path setup is done in the initrd/initramfs without
> multipath being able to react.
multipath is able to react. I don't understand why I have to execute udevstart.
>>===> First multipathd says "8:0: tur checker reports
>>path is down" and multipath prints sda "failed" (ok).
>>After a few seconds sda is "ready" and multipathd says
>>"8:0: tur checker reports path is up"?! I have changed
>>nothing during this time.
>>
>
> Maybe the checker is confused by the long timeouts.
> Worth another try after the lowering.
After lowering the timeouts to 6 seconds, multipathd shows the same behavior.
>>===> Multipathing seems to work without multipathd, but not with it.
>>It's very slow, but Christophe Varoqui wrote that I have to lower
>>the HBA timeouts (unfortunately, I don't know how to do this;
>>see above). Do I really need multipathd? I suppose so :-)
>>
>
> multipathd is needed to reinstate paths.
> In your case the rport disappears and reappears, so the mechanism is all
> hotplug-driven and thus may work without the daemon ... if memory
> resources permit hotplug and multipath(8) execution, that is.
What do you mean by "In your case..."? Because 2.6 and udev are
multipath-tools dependencies, all systems running multipath have the same
environment: they all use kernel 2.6 and udev, which is hotplug-driven. The
kernel starts this hotplug process and udev executes multipath. Sorry, but I
have to ask again: do we really need multipathd?
After lowering the dev_loss_tmo timeouts and stopping multipathd I have a
working multipath environment :-))) I tested this with a little Perl script
and a MySQL database. My traffic-generator host executed this script 27 times
in parallel:
...
for (my $count = 1; $count <= 1000000; $count++)
{
    ...
    my $sql    = "INSERT INTO $table VALUES($id,\"$value\")";
    my $return = $dbh->do($sql);
    ...
}
...
{
    my $sql    = "SELECT COUNT(*) FROM $table WHERE id=$id";
    my $sth    = $dbh->prepare($sql);
    my $return = $sth->execute();
    ...
    $selectCount = $sth->fetchrow_array();
    ...
}
The database host had to insert these 30-byte strings, and I started some
copy jobs (cp -a /usr/* /partition_mounted_with_multipath/, etc.) to increase
the I/O load. During this test I disabled and enabled the different
HBA switch ports, with the following result: it took 6 to 15 seconds before
"multipath -l" showed that a path was down (15 seconds because the host had a
CPU load of 30.0 and responded very slowly), but no INSERT was lost :-)))
But sometimes multipath seems to be a bit confused...
1.) one path disabled
In the majority of cases multipath prints...
testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ #:#:#:# 8:0 [active]
\_ 1:0:0:1 sdb 8:16 [active]
But sometimes I get...
testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 4:0:0:1 sdb 8:16 [active]
2.) all paths enabled (default)
In the majority of cases multipath prints...
testhalde2 sbin # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [enabled]
\_ 1:0:0:1 sdb 8:16 [active]
\_ 0:0:0:1 sdc 8:32 [active]
But sometimes I get...
testhalde2 usr # multipath -l
150gb ()
[size=150 GB][features="0"][hwhandler="0"]
\_ round-robin 0 [active]
\_ 0:0:0:1 sdb 8:16 [active]
\_ round-robin 0 [enabled]
\_ 4:0:0:1 sdc 8:32 [active]
Regards
Simon