
[dm-devel] device-errors and multipath device access issue



Hello!

We are using multipath on SLES9 and access the devices via /dev/mapper/devicename.

On boot, we get lots of error messages like these:

> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64239
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64240
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64241
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64242
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64243
> Aug 28 10:30:16 rac02 kernel: Device sdf not ready.
> Aug 28 10:30:16 rac02 kernel: end_request: I/O error, dev sdf, sector 513824
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64228
> Aug 28 10:30:16 rac02 kernel: Device sdf not ready.
> Aug 28 10:30:16 rac02 kernel: end_request: I/O error, dev sdf, sector 513824
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64228
> Aug 28 10:30:16 rac02 kernel: Device sdf not ready.
> Aug 28 10:30:16 rac02 kernel: end_request: I/O error, dev sdf, sector 514064
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64258
> Aug 28 10:30:16 rac02 kernel: Device sdf not ready.
> Aug 28 10:30:16 rac02 kernel: end_request: I/O error, dev sdf, sector 513680
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64210
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64211
> Aug 28 10:30:16 rac02 kernel: Buffer I/O error on device sdf, logical block 64212

for each alternative path to a LUN.
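To see which sd devices (one per path) are producing these errors, the kernel lines can be tallied per device name and the names then compared with the path list printed by `multipath -ll`. The following is a minimal, hypothetical Python sketch (the function name is an assumption, and the regex only covers the three message shapes in the excerpt above):

```python
import re
from collections import Counter
from typing import Iterable

# Matches the device name in the three message shapes seen in the excerpt:
# "on device sdX,", "Device sdX not ready" and "dev sdX,".
_DEV_RE = re.compile(r"(?:on device|Device|dev) (sd[a-z]+)(?:,| not ready)")


def errors_per_device(lines: Iterable[str]) -> Counter:
    """Count kernel I/O error lines per SCSI disk name (sdX)."""
    counts: Counter = Counter()
    for line in lines:
        m = _DEV_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts
```

Feeding it the lines of /var/log/messages gives a per-device count; devices with a high count are the paths worth checking first.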

We also get these messages while the system is up and running, because some proprietary monitoring software checks device availability at regular intervals, and there seems to be no way to tell that software to skip certain devices. As a result, /var/log/messages gets spammed with messages like these, and we can no longer see the real errors.
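As a stopgap for reading the logs (it does not stop the kernel from logging), the known noise can be filtered out when viewing /var/log/messages. This is a hypothetical Python sketch: the helper name and the set of "noisy" devices are assumptions, and it drops only the three message shapes from the log excerpt above for the listed devices, so errors on other devices and all other messages still come through.

```python
import re
from typing import Iterable, Iterator

# Message shapes taken from the log excerpt in this mail; other kernels or
# drivers may word their messages differently.
_NOISE_PATTERNS = (
    r"Buffer I/O error on device {dev}, logical block \d+",
    r"Device {dev} not ready\.",
    r"end_request: I/O error, dev {dev}, sector \d+",
)


def drop_path_noise(lines: Iterable[str], noisy_devices: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that are not known path noise for the given devices."""
    alternatives = "|".join(re.escape(d) for d in noisy_devices)
    if not alternatives:
        # Nothing to filter: pass every line through unchanged.
        yield from lines
        return
    dev = f"(?:{alternatives})"
    noise = re.compile("|".join(p.format(dev=dev) for p in _NOISE_PATTERNS))
    for line in lines:
        if not noise.search(line):
            yield line


if __name__ == "__main__":
    log = [
        "Aug 28 10:30:16 rac02 kernel: Device sdf not ready.",
        "Aug 28 10:31:00 rac02 kernel: Buffer I/O error on device sdb, logical block 12",
    ]
    for kept in drop_path_noise(log, {"sdf"}):
        print(kept)  # only the sdb line is printed
```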

Is there a way to hide those "classic" SCSI devices from userspace?
I'm not sure whether "blacklist" in multipath.conf is what I need here, or whether I could safely delete those device nodes. I'm not very deep into multipathing yet.


Furthermore, we have a strange problem with OCFS2 that I tracked down to /etc/init.d/boot.multipath.
Sometimes (e.g. after adding a LUN in the SAN) OCFS2 cannot start anymore, because the device-mapper "links" to the partitions on the multipath device are inaccessible.

For example, we have:

/dev/mapper/000123largenumber456
/dev/mapper/000123largenumber789
/dev/mapper/000123largenumber456p1
/dev/mapper/000123largenumber456p2
/dev/mapper/000123largenumber456p3
/dev/mapper/000123largenumber456p4

The first two devices show a recent timestamp, but the p# devices show an old timestamp and aren't accessible after a reboot.
Rebooting a second or third time sometimes helps; manually restarting multipath also seems to help, and the partitions become accessible again.
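The timestamp check described above can be scripted so it is easy to run right after boot.multipath. The following is a hypothetical Python sketch (the function name and the one-second tolerance are assumptions): it lists every `<name>p<N>` entry in /dev/mapper whose modification time is older than that of its parent map. An old timestamp is only a hint that kpartx did not recreate the node on this boot, not proof that the mapping is broken.

```python
import os
import re
from typing import List


def stale_partition_nodes(mapper_dir: str = "/dev/mapper", tolerance: float = 1.0) -> List[str]:
    """Return partition maps (<name>p<N>) whose mtime is older than their parent map.

    'Older than the parent by more than `tolerance` seconds' is used as a
    rough heuristic for a node left over from a previous boot.
    """
    names = set(os.listdir(mapper_dir))
    stale = []
    for name in sorted(names):
        m = re.fullmatch(r"(.+)p(\d+)", name)
        if m is None or m.group(1) not in names:
            continue  # not a partition of a map present in this directory
        part_mtime = os.stat(os.path.join(mapper_dir, name)).st_mtime
        parent_mtime = os.stat(os.path.join(mapper_dir, m.group(1))).st_mtime
        if parent_mtime - part_mtime > tolerance:
            stale.append(name)
    return stale
```

With the example listing above, entries such as `000123largenumber456p1` would be reported if they kept their old timestamp while `000123largenumber456` was recreated.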

It looks like they are set up with kpartx from within boot.multipath on every reboot, and this goes wrong under certain circumstances because the device is not yet ready when kpartx wants to access it.
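If the cause really is a race between creating the multipath map and running kpartx, a common workaround pattern is to wait, with a timeout, until the map's device node exists before creating the partition mappings. The following minimal Python sketch shows only that wait; the helper name, timeout and poll interval are hypothetical, and it is not how boot.multipath or the multipath-tools fix actually work. Note that the node existing does not guarantee that its paths already answer I/O.

```python
import os
import time


def wait_for_node(path: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Returns True if the node appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Sleep in short steps so the node is noticed soon after it appears.
        time.sleep(min(interval, remaining))
```

For example, a boot script could run `kpartx -a /dev/mapper/<name>` only after `wait_for_node("/dev/mapper/<name>")` returns True, and log an error instead of silently continuing when it returns False.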

I found a possibly related report at
http://bugs.donarmstrong.com/cgi-bin/bugreport.cgi?bug=376161
and it looks similar:

>I think that the problem is that multipath is too slow creating the devices and
>udev is too fast launching kpartx.

In that bug report it is stated that multipath-tools (0.4.7-8) contains a fix, but I don't understand what exactly the problem is and what is being fixed:

>   * since Debian's dmestup doesn't include the "export" patch used by other
>     distros (#434241), work around this by implementing a minimal dmsetup_env
>     that can be used by kpartx.udev (Closes: #376161)


In our case, kpartx doesn't seem to be launched by udev but from within boot.multipath. Still, it looks like a timing issue, because it happens only sometimes.

Any help or input (links, docs?) to enlighten me would be highly appreciated.

Thank you!

roland
sysadmin

PS:
Please forgive me if these questions sound a little dumb and inexperienced. I didn't set up this environment, and it also lacks documentation, but I want to help some other people solve these problems.





