[dm-devel] RE: dm-devel Digest, Vol 12, Issue 5
shahid shaikh
shahid.shaikh at patni.com
Thu Feb 10 03:57:06 UTC 2005
Hi,
I am Shahid. I am very new to device driver writing; I have been working in
this field for the last 3 months.
I want to test multipathing.
I have AS4 (2.6.9) and SLES9 (2.6.5).
Could anybody tell me what configuration, tools and software I need to
test the multipathing feature of Device Mapper?
It would be very helpful if somebody could also give me the steps to follow
to test multipathing.
Thanks and Regards,
Shahid Shaikh.
Software Engineer.
Patni Computer Systems Ltd.
-----Original Message-----
From: dm-devel-bounces at redhat.com [mailto:dm-devel-bounces at redhat.com]On
Behalf Of dm-devel-request at redhat.com
Sent: Wednesday, February 09, 2005 10:30 PM
To: dm-devel at redhat.com
Subject: dm-devel Digest, Vol 12, Issue 5
Today's Topics:
1. Re: Some issues noticed in my testing. (Christophe Varoqui)
2. Experiences with multipath-tools and EMC CLARiiON (Tore Anderson)
3. 2.6.11-rc3-udm2 (Alasdair G Kergon)
----------------------------------------------------------------------
Message: 1
Date: Tue, 08 Feb 2005 22:55:45 +0100
From: Christophe Varoqui <christophe.varoqui at free.fr>
Subject: [dm-devel] Re: Some issues noticed in my testing.
To: "Caushik, Ramesh" <ramesh.caushik at intel.com>
Cc: device-mapper development <dm-devel at redhat.com>
Message-ID: <42093561.9070008 at free.fr>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Caushik, Ramesh wrote:
>Hi Christophe,
>
>
Hello,
>I noticed a failure in multipath-tools-0.4.2 when executing the
>following scenario.
>
>i) Run multipathd & multipath with 1 of 2 paths in the system available.
>ii)activate the other path.
>iii) run multipath again.
>
>Expect the newly inserted path to be incorporated into the original path
>groups. Does not happen. Traced it to a problem in the pgcmp2 function.
>Patch below should fix it.
>
I can't see this patch attached.
Can you resend it? (In diff -u format if possible.)
> Also in step iii) above running multipath
>with a parameter (conf->dev) will not include the additional path into
>the particular mp because the select_alias function is not called in the
>coalesce_paths function so the strncmp(mpp->alias,conf->dev,....) in
>main will always fail. Patch below should fix it.
>
>
>
Mr Goggin, EMC, already spotted this and a patch is present in my tree.
Thanks.
>Also while testing with a modified scsi_debug driver (to expose multiple
>paths to the same device) noticed that sysfs vendor name for the
>scsi_debug device is "Linux " which fails to match with a "Linux"
>vendor name in the conf name and so device specific settings are not
>picked up. Think it is a good idea to chomp off trailing whitespace from
>vendor and product names. See patches below. Regards,
>
>
>
Can you tell me if the following fits your needs here ?
--- multipath-tools-0.4.2/libmultipath/hwtable.c 2005-01-23 14:48:05.830203168 -0800
+++ multipath-tools-0.4.3-pre1/libmultipath/hwtable.c 2005-02-08 13:42:57.203660896 -0800
@@ -12,14 +12,18 @@
 extern struct hwentry *
 find_hw (vector hwtable, char * vendor, char * product)
 {
-    int i;
+    int i, vendor_len, product_len;
     struct hwentry * hwe;
 
+    vendor_len = strlen(vendor);
+    product_len = strlen(product);
+
     vector_foreach_slot (hwtable, hwe, i)
-        if (hwe->vendor && hwe->product &&
-            strcmp(hwe->vendor, vendor) == 0 &&
+        if (hwe->vendor && vendor_len == strlen(hwe->vendor) &&
+            strncmp(hwe->vendor, vendor, vendor_len) == 0 &&
+            hwe->product && product_len == strlen(hwe->product) &&
             (hwe->product[0] == '*' ||
-            strcmp(hwe->product, product) == 0))
+            strncmp(hwe->product, product, product_len) == 0))
             return hwe;
     return NULL;
 }
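For reference, here is a minimal sketch of the trailing-whitespace trimming Ramesh suggests, written as a standalone comparison helper. It is not part of multipath-tools. The names trimmed_len and id_match are made up for illustration, and the sketch assumes that only trailing padding (as in the space-padded vendor and product strings read from sysfs) should be ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of s once trailing whitespace is ignored (hypothetical helper). */
size_t trimmed_len(const char *s)
{
    size_t n;

    assert(s != NULL);
    n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        n--;
    return n;
}

/* Return 1 if a and b are equal apart from trailing whitespace, else 0. */
int id_match(const char *a, const char *b)
{
    size_t la = trimmed_len(a);
    size_t lb = trimmed_len(b);

    return la == lb && strncmp(a, b, la) == 0;
}
```

With a helper like this, a padded sysfs vendor string and an unpadded "Linux" entry in the configuration file would compare equal, while strings that differ before the padding would still be rejected.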
Regards,
cvaroqui
------------------------------
Message: 2
Date: Wed, 9 Feb 2005 13:14:56 +0100
From: Tore Anderson <tore at linpro.no>
Subject: [dm-devel] Experiences with multipath-tools and EMC CLARiiON
To: dm-devel at redhat.com
Message-ID: <200502091314.56085.tore at linpro.no>
Content-Type: text/plain; charset="utf-8"
[Apologies if this is off-topic for this list, I couldn't find a
dm-user counterpart for it.]
I've been toying around with multipath-tools and an EMC CX200. It hasn't
been working all that well, but at least the development is on the right
track. I thought I'd share my experiences here, as I would've loved to
find this email in the archives myself a few weeks back. :-)
The hardware used has been a QLogic QLA2340 single-port HBA (the chip
identifies itself as 2312, though) running on a pretty standard IA-32
machine. One switch, a McData Sphereon 4500, and of course the
storage, a Dell|EMC² CLARiiON CX200. As there's only a single fabric
I've only been able to test basic failover between the two controllers.
Load balancing isn't possible unless you've got dual independent
fabrics, because only one controller will accept I/O to a LU at any
given time. Software used: 2.6.10-udm1, multipath-tools 0.4.2,
Debian Sarge. Also, a bonnie++ and a tiobench run constantly
against the LU whenever it's mounted.
The host is configured thusly in the CX200's administrative interface:
* Initiator type: CLARiiON Open
* Failover mode: 1
* Array CommPath: Enabled
* Unit Serial Number: Array
The configuration file I've ended up using has the following device
section:
device {
        # Data General Corporation, and for some reason five spaces.
        vendor "DGC     "
        # Shows up as "RAID 1", "RAID 5", etc.
        product "*"
        # This is probably only correct for single fabrics.
        path_grouping_policy failover
        # No idea what the "0" means here..
        path_selector "round-robin 0"
        # Nor any of the digits here.
        hardware_handler "3 emc 0 0"
        # I wonder what "1" signifies..
        features "1 queue_if_no_path"
        getuid_callout "/sbin/scsi_id -g -s /block/%n"
        path_checker emc_clariion
}
If anyone could shed some light on what all those digits are, that'd
be nice. Also, if there are any obvious errors please let me know. :-)
Anyway, I end up with the following DM table:
navlelo: 0 20971520 multipath 1 queue_if_no_path 1 emc 2 1 \
round-robin 0 1 1 8:0 1000 round-robin 0 1 1 8:16 1000
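As far as the multipath target's table format goes, most of the digits in a line like the one above appear to be word counts: each list (features, hardware handler, path selector arguments, per-path arguments) is preceded by the number of words that follow it. Read that way, "1 queue_if_no_path" is one feature word, "1 emc" is a one-word handler list, "2 1" means two priority groups with group 1 tried first, and each "round-robin 0 1 1 8:x 1000" is a selector with zero arguments, one path, and one per-path argument (1000). The 20971520 is the size in 512-byte sectors, which works out to 10 GiB. The sketch below walks a line under that assumed layout. All names in it (mp_summary, parse_multipath_params, MAX_TOK, MAX_PG) are hypothetical, and the layout should be checked against the dm-mpath source for the kernel actually in use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TOK 64   /* illustrative limits, not kernel constants */
#define MAX_PG  8

struct mp_summary {
    int nr_features;          /* words in the features list */
    int nr_hw_args;           /* words in the hardware handler list */
    int nr_pgs;               /* number of priority groups */
    int next_pg;              /* group to try first (1-based) */
    int paths_in_pg[MAX_PG];  /* path count per priority group */
};

/* Read tok[*i] as a small non-negative count and advance *i. */
static int take_count(char **tok, int n, int *i, int *val)
{
    char *end;
    long v;

    if (*i >= n)
        return -1;
    v = strtol(tok[*i], &end, 10);
    if (end == tok[*i] || *end != '\0' || v < 0 || v > MAX_TOK)
        return -1;
    *val = (int)v;
    (*i)++;
    return 0;
}

/* Advance *i past `count` words, failing if that runs off the end. */
static int skip_words(int n, int *i, int count)
{
    if (count > n - *i)
        return -1;
    *i += count;
    return 0;
}

/*
 * Parse the words that follow "multipath" in a table line.
 * Returns 0 on success, -1 if the line does not fit the assumed layout.
 */
int parse_multipath_params(const char *params, struct mp_summary *out)
{
    char buf[512];
    char *tok[MAX_TOK];
    int n = 0, i = 0, g, sel_args, path_args;

    assert(params != NULL && out != NULL);
    if (strlen(params) >= sizeof(buf))
        return -1;
    strcpy(buf, params);
    for (char *t = strtok(buf, " \t"); t != NULL; t = strtok(NULL, " \t")) {
        if (n == MAX_TOK)
            return -1;
        tok[n++] = t;
    }

    /* <nr_features> <feature words...> */
    if (take_count(tok, n, &i, &out->nr_features) ||
        skip_words(n, &i, out->nr_features))
        return -1;
    /* <nr_hw_args> <handler name and its arguments...> */
    if (take_count(tok, n, &i, &out->nr_hw_args) ||
        skip_words(n, &i, out->nr_hw_args))
        return -1;
    /* <nr_priority_groups> <next_group> */
    if (take_count(tok, n, &i, &out->nr_pgs) ||
        take_count(tok, n, &i, &out->next_pg) ||
        out->nr_pgs > MAX_PG)
        return -1;

    for (g = 0; g < out->nr_pgs; g++) {
        /* <selector> <nr_selector_args> <args...> <nr_paths> <nr_path_args> */
        if (skip_words(n, &i, 1) ||
            take_count(tok, n, &i, &sel_args) ||
            skip_words(n, &i, sel_args) ||
            take_count(tok, n, &i, &out->paths_in_pg[g]) ||
            take_count(tok, n, &i, &path_args))
            return -1;
        /* each path is a device word plus nr_path_args argument words */
        if (skip_words(n, &i, out->paths_in_pg[g] * (1 + path_args)))
            return -1;
    }
    return i == n ? 0 : -1;
}
```

Feeding it the parameters from the table above (everything after the word "multipath") should report one feature word, a one-word handler list, and two priority groups with one path each. By the same reading, the "3 emc 0 0" in the configuration file would be a three-word handler list: the handler name plus two arguments.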
So, what works and what does not? If I ask the CX200 to move the LU
from the active controller to the non-active one, paths change within
seconds, and I/O is moved to the other sd block device. The system
log is flooded with these lines in pairs:
Device sda not ready.
end_request: I/O error, dev sda, sector 7304072
The sector number obviously changes from entry to entry. Soon the
paths are changed. Given the errors, I guess the CX200 doesn't ask the
host to change paths gracefully or anything. The multipath
utility tells me this:
navlelo (36006017cd00e0000343a961a3c5ed811)
[size=10 GB][features="1 queue_if_no_path"][hwhandler="1 emc"]
\_ round-robin 0 [enabled]
\_ 0:0:0:0 sda 8:0 [ready ][failed]
\_ round-robin 0 [active][first]
\_ 0:0:1:0 sdb 8:16 [ready ][active]
Now, marking the sda path as failed is obviously wrong. It's there, but
passive (just as sdb was before I migrated the LU to another
controller). Even though multipathd logs that it is checking paths,
the state of the sda path doesn't change. So if I try to move the LU
back to the first controller, it believes both paths have failed and
all I/O stops dead.
I can however restart multipathd (while still having the LU mounted
with heavy I/O to it), and the paths are rediscovered. For some
reason, this makes dm-emc fail over the LU to the first controller (sda
path), even though the sdb path was already up and running fine. I
don't know why this is, but if dm-emc (or multipathd?) insists on using
the first controller (even though the LU has specified the second one
as the default owner) there will be a load balancing problem, as the first
controller will get all the I/O while the second one will sit there
doing nothing. Is it possible to change this behaviour somehow?
Host-based failover doesn't seem to work, or at least I cannot figure
out how to do it. "sg_start -s /dev/sdb 1" gives me this:
sync_cache: SCSI status: Check Condition
Fixed format, current; Sense key: Not Ready
Additional sense: Logical unit not ready, manual intervention required
I guess the "trespass" command is a proprietary thing and what is
needed is just a userspace utility that is able to send the same thing
down the fibre as the dm-emc kernel module does.
So, the fun part - actual path failures. As there's other live
systems using the SAN I couldn't actually yank fibres, so what I did
is just to rezone the switch so that it loses connectivity to the
controller.
Disabling an active path at first works similarly to when I asked the
CX200 to move the LU to another controller. At first, the kernel log
shows this:
kernel: SCSI error : <0 0 0 0> return code = 0x20000
kernel: end_request: I/O error, dev sda, sector 14201920
kernel: end_request: I/O error, dev sda, sector 14201928
These three lines are repeated quite a lot of times, before this one
appears:
kernel: device-mapper: dm-emc: emc_pg_init: sending switch-over command
Now, silence. No I/O is sent to the sdb path, and the sda one is
obviously dead so nothing there either. However, iostat reveals that
there are still a lot of I/O operations waiting in sda's queue, and the
%util column hovers around 100. After 57 seconds the three lines
above are repeated a number of times, ending with these:
multipathd: 8:0 : emc_clariion_checker: query command indicates error
kernel: SCSI error : <0 0 0 0> return code = 0x10000
last message repeated 3 times
Then a lot of stuff from multipathd:
multipathd: dm-0 blacklisted
[...repeated for all blacklisted devices...]
multipathd: path checker already active : 8:0
multipathd: path checker already active : 8:16
multipathd: start up event loops
multipathd: event checker startup : navlelo
multipathd: waiter->event_nr = 5
kernel: SCSI error : <0 0 0 0> return code = 0x10000
last message repeated 5 times
multipathd: checking paths
kernel: SCSI error : <0 0 0 0> return code = 0x10000
last message repeated 3 times
At this point, I/O has begun flowing through sdb, and all the iostat
columns for sda are down to 0. Took some time, but the system now
seems to work fine.
Also it might be worth noting that multipathd provokes a SCSI error
when it checks the paths, so this appears in the system log every five
seconds:
multipathd: checking paths
kernel: SCSI error : <0 0 0 0> return code = 0x10000
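The "return code" values in these log lines can be taken apart by hand. Going by the kernel's SCSI headers, the 32-bit result word packs four bytes: status in bits 0-7, message in bits 8-15, host in bits 16-23 and driver in bits 24-31. Under that reading 0x20000 has host byte 0x02 (DID_BUS_BUSY) and 0x10000 has host byte 0x01 (DID_NO_CONNECT), which would be consistent with a path that has been zoned away. The sketch below is a standalone illustration. The struct and function names are made up, and the name table covers only the first few host codes as they appear in include/scsi/scsi.h.

```c
#include <assert.h>
#include <stddef.h>

struct scsi_result {
    unsigned status;   /* bits 0-7:   SCSI status byte from the device */
    unsigned msg;      /* bits 8-15:  message byte */
    unsigned host;     /* bits 16-23: host (HBA / low-level driver) byte */
    unsigned driver;   /* bits 24-31: driver byte */
};

/* Split a logged "return code" into its four component bytes. */
struct scsi_result decode_result(unsigned long code)
{
    struct scsi_result r;

    assert(code <= 0xffffffffUL);   /* the result word is 32 bits wide */
    r.status = code & 0xff;
    r.msg    = (code >> 8) & 0xff;
    r.host   = (code >> 16) & 0xff;
    r.driver = (code >> 24) & 0xff;
    return r;
}

/* Symbolic name for a host byte, or "unknown" outside the table. */
const char *host_byte_name(unsigned host)
{
    static const char *const names[] = {
        "DID_OK", "DID_NO_CONNECT", "DID_BUS_BUSY", "DID_TIME_OUT",
        "DID_BAD_TARGET", "DID_ABORT", "DID_PARITY", "DID_ERROR",
        "DID_RESET", "DID_BAD_INTR",
    };

    if (host < sizeof(names) / sizeof(names[0]))
        return names[host];
    return "unknown";
}
```

Calling host_byte_name(decode_result(0x10000).host) would give the name for the code that the path checker keeps provoking on the disconnected sda path.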
Output from the multipath utility is also somewhat interesting now:
0:0:0:0: sg_io failed status 0x0 0x1 0x0 0x0
0:0:0:0: Unable to get INQUIRY vpd 1 page 0x0.
pb getting path info, free
0:0:0:0: sg_io failed status 0x0 0x1 0x0 0x0
0:0:0:0: Unable to get INQUIRY vpd 1 page 0x0.
navlelo (36006017cd00e0000343a961a3c5ed811)
[size=10 GB][features="1 queue_if_no_path"][hwhandler="1 emc"]
\_ round-robin 0 [enabled]
\_ 0:0:0:0 sda 8:0 [faulty][failed]
\_ round-robin 0 [active][first]
\_ 0:0:1:0 sdb 8:16 [ready ][active]
All right. I'll try adding back the path to the first controller..
multipathd: /sbin/multipath -v 0 -S navlelo
multipathd: devmap event on navlelo
kernel: SCSI error : <0 0 1 0> return code = 0x20000
kernel: end_request: I/O error, dev sdb, sector 11534336
multipathd: refresh devmaps list
multipathd: devmap navlelo :
multipathd: \_ 0 20971520 multipath
multipathd: refresh failpaths list
multipathd: dm-0 blacklisted
[..repeated as above..]
multipathd: path checker already active : 8:0
multipathd: path checker already active : 8:16
multipathd: start up event loops
multipathd: event checker startup : navlelo
multipathd: waiter->event_nr = 6
multipathd: checking paths
multipathd: 8:0 : emc_clariion_checker: Path healthy
multipathd: /sbin/multipath -v 0 -S 8:0
multipathd: reconfigure 8:0 multipath
multipathd: /sbin/multipath -v 0 -S navlelo
multipathd: devmap event on navlelo
kernel: device-mapper: dm-emc: long trespass command will be send
kernel: device-mapper: dm-emc: honor reservation bit will not be set (default)
kernel: device-mapper: dm-emc: get_failover_bio: bio_alloc() failed.
kernel: device-mapper: dm-emc: emc_trespass_get: no bio
kernel: device-mapper: dm-emc: emc_pg_init: no rq
kernel: device-mapper: dm-emc: get_failover_bio: bio_alloc() failed.
kernel: device-mapper: dm-emc: emc_trespass_get: no bio
kernel: device-mapper: dm-emc: emc_pg_init: no rq
kernel: Buffer I/O error on device dm-0, logical block 1438113
kernel: lost page write due to I/O error on dm-0
The last two lines are repeated nine times (with different blocks),
then two seconds later:
multipathd: refresh devmaps list
multipathd: devmap navlelo :
multipathd: \_ 0 20971520 multipath
multipathd: refresh failpaths list
Both sda and sdb have only zeroes in iostat, and nothing interesting
is to be seen in the logs. After over a minute I got bored of waiting
and did an "ls /mnt" (where the LU is mounted), and immediately got
this:
kernel: EXT3-fs error (device dm-0): ext3_readdir: directory #2 contains a hole at offset 0
kernel: Aborting journal on device dm-0.
kernel: printk: 3387 messages suppressed.
kernel: Buffer I/O error on device dm-0, logical block 521
kernel: lost page write due to I/O error on dm-0
kernel: Buffer I/O error on device dm-0, logical block 0
kernel: lost page write due to I/O error on dm-0
kernel: ext3_abort called.
kernel: EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
kernel: Remounting filesystem read-only
Ouch. :-( The multipath binary only says "Alarmen gikk" (Norwegian
for "The alarm went off" or something like that) when I try to get the
current configuration.
I guess I can conclude that the EMC CLARiiON support isn't quite ready
for production with multipath-tools yet. Unless of course there's
anyone on the list who has any good suggestions as to how to make it
work better..?
Christophe: Should I make a summary of this email and add it to your
TestedEnvironments page? I'm not sure if I could've made it work by
configuring multipathd in another way (for instance the numbers on the
hardware_handler line I just found in some mailing list archive, I have
no idea what they mean) - so I'm a bit afraid of ending up updating it
with flat-out incorrect information.
I'm happy to do more testing on this system for anyone who wants it.
Regards,
--
Tore Anderson
------------------------------
Message: 3
Date: Wed, 9 Feb 2005 15:03:23 +0000
From: Alasdair G Kergon <agk at redhat.com>
Subject: [dm-devel] 2.6.11-rc3-udm2
To: dm-devel at redhat.com
Message-ID: <20050209150323.GW10195 at agk.surrey.redhat.com>
Content-Type: text/plain; charset=us-ascii
ftp://sources.redhat.com/pub/dm/patches/2.6-unstable/2.6.11-rc3/2.6.11-rc3-udm2.tar.bz2
Same as udm1 with a bit of tidying up.
Simplified the locking for now: it uses the per-device multipath lock
everywhere. We can refine this later to address the performance issues.
Updated version numbers and added a few more comments.
Please re-test and review this patchset - I'd like to submit patches 7-10
to -mm after adding more documentation.
Alasdair
------------------------------