[dm-devel] multipath-tools: race condition between 'multipath -r' and multipathd path checker causing a segfault
Merla, ShivaKrishna
ShivaKrishna.Merla at netapp.com
Thu Mar 20 16:44:06 UTC 2014
Hi All,
We found a strange race between multipathd and "multipath -r" that causes a segfault. Below is the sequence of operations that leads to it.
1. Configure the system with multipath and an iSCSI connection. Enable paths to both controllers (rdac prio values 14 and 9; the path grouping policy is group_by_prio).
2. Fail one of the connections to the multipath device (the physical paths transition running->blocked->transport-offline, and multipathd puts them in faulty/offline state).
3. Run "multipath -r".
4. All faulty paths are removed from the dm device and the maps are reloaded with only the good paths (because the open() call on the faulty devices fails with ENXIO during pathinfo()).
5. In multipathd, update_multipath_strings(), called during check_path(), updates or removes path groups based on the information read from libdevmapper.
6. When the failed connection is restored, the devices are placed in running state again, and the path checker calls enable_group() on a path group that has already been freed, which causes the segfault:
Mar 17 22:36:56 ictm-rediff kernel: multipathd[12405]: segfault at 8 ip 00000000004077f9 sp 00007fb6a94e2cf0 error 4 in multipathd[400000+10000]
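One way to picture step 6 is a checker that keeps a reference to a path group across a map reload. The C sketch below is a simplified, hypothetical model: the struct and function names (struct mpath, refresh_groups, find_group, enable_group_checked) are invented for illustration and are not multipathd's real code. Refreshing the groups frees the old ones, so the checker re-resolves the group by its 1-based index against the current table and skips it if that group no longer exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model of a multipath map (NOT multipathd's structures). */
enum pg_state { PG_ENABLED, PG_DISABLED };

struct pathgroup {
	int npaths;
	enum pg_state state;
};

struct mpath {
	struct pathgroup **pg; /* rebuilt on every table refresh */
	int npg;               /* number of groups currently allocated */
};

/* Rebuild the group list, as a refresh from the kernel table might.
 * Every old group is freed, so any pathgroup pointer cached before this call dangles.
 * Passing new_npg == 0 simply frees everything. */
int refresh_groups(struct mpath *mp, int new_npg)
{
	for (int i = 0; i < mp->npg; i++)
		free(mp->pg[i]);
	free(mp->pg);
	mp->pg = NULL;
	mp->npg = 0;
	if (new_npg <= 0)
		return 0;
	mp->pg = calloc((size_t)new_npg, sizeof(*mp->pg));
	if (!mp->pg)
		return -1;
	for (int i = 0; i < new_npg; i++) {
		mp->pg[i] = calloc(1, sizeof(**mp->pg));
		if (!mp->pg[i])
			return -1; /* npg counts what was allocated, so refresh_groups(mp, 0) cleans up */
		mp->pg[i]->state = PG_DISABLED;
		mp->npg = i + 1;
	}
	return 0;
}

/* Resolve a 1-based group index against the CURRENT table.
 * Returns NULL if the group no longer exists instead of touching freed memory. */
struct pathgroup *find_group(const struct mpath *mp, int pgindex)
{
	assert(mp != NULL);
	if (!mp->pg || pgindex < 1 || pgindex > mp->npg)
		return NULL;
	return mp->pg[pgindex - 1];
}

/* Re-enable a group only if it still exists after the latest refresh. */
int enable_group_checked(struct mpath *mp, int pgindex)
{
	struct pathgroup *pgp = find_group(mp, pgindex);

	if (!pgp)
		return -1; /* stale index: skip and re-read the map instead */
	if (pgp->state == PG_DISABLED)
		pgp->state = PG_ENABLED;
	return 0;
}
```

In this model, with two groups loaded, enable_group_checked(&mp, 2) succeeds; after a refresh that leaves only one group (as after "multipath -r" drops the faulty paths), the same call returns -1 rather than dereferencing a freed group.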
Besides the segfault, this also leads to multipathd switching to a path group that is no longer present in the kernel:
Mar 17 22:36:44 ictm-rediff multipathd: 360080e500034173900008ca45307fd63: switch to path group #2
Mar 17 04:13:41 ictm-rediff kernel: device-mapper: multipath: invalid PG number supplied to switch_pg_num
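The kernel message above means the group number sent to dm-multipath was out of range for the table actually loaded (a PG number of 0, or one larger than the number of groups in that table, is rejected). A minimal guard is sketched below, assuming the current group count is re-read from the kernel right before switching. The helper name build_switch_group_msg is hypothetical; it only refuses to format the "switch_group <n>" device-mapper message when n falls outside 1..nr_groups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a "switch_group <n>" dm-multipath message only when
 * pgnum is a valid 1-based PG number for the table currently loaded in the kernel.
 * nr_groups is assumed to come from a fresh read of the kernel table, not from a
 * cached copy that "multipath -r" may have made stale. */
int build_switch_group_msg(char *buf, size_t len, int pgnum, int nr_groups)
{
	int n;

	assert(buf != NULL);
	if (pgnum < 1 || pgnum > nr_groups)
		return -1; /* would be rejected as an invalid PG number */
	n = snprintf(buf, len, "switch_group %d", pgnum);
	if (n < 0 || (size_t)n >= len)
		return -1; /* buffer too small */
	return 0;
}
```

With a stale view of two groups but only one group left in the kernel, the call fails locally instead of producing the switch_pg_num warning.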
So, is calling "multipath -r" really recommended when faulty paths are present in the system? Is this a known limitation? Can multipathd handle these scenarios?
Thanks
Shiva