[libvirt] [PATCH] storage_backend_rbd: always call rados_conf_read_file when connect a rbd pool

Chen Hanxiao chen_han_xiao at 126.com
Wed Jan 11 02:59:35 UTC 2017



At 2017-01-11 02:23:54, "John Ferlan" <jferlan at redhat.com> wrote:
>
>
>On 12/30/2016 03:39 AM, Chen Hanxiao wrote:
>> From: Chen Hanxiao <chenhanxiao at gmail.com>
>> 
>> This patch fixes a deadlock when trying to read a rbd image.
>> 
>> When trying to connect to a rbd server
>> (ceph-0.94.7-1.el7.centos.x86_64),
>> rbd_list/rbd_open enter a deadlocked state.
>> 
>> Backtrace:
>> Thread 30 (Thread 0x7fdb342d0700 (LWP 12105)):
>> #0  0x00007fdb40b16705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
>> #1  0x00007fdb294273f1 in librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*, int) () from /lib64/librados.so.2
>> #2  0x00007fdb29429fcc in librados::IoCtxImpl::read(object_t const&, ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
>> #3  0x00007fdb293e850c in librados::IoCtx::read(std::string const&, ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
>> #4  0x00007fdb2b9dd15e in librbd::list(librados::IoCtx&, std::vector<std::string, std::allocator<std::string> >&) () from /lib64/librbd.so.1
>> #5  0x00007fdb2b98c089 in rbd_list () from /lib64/librbd.so.1
>> #6  0x00007fdb2e1a8052 in virStorageBackendRBDRefreshPool (conn=<optimized out>, pool=0x7fdafc002d50) at storage/storage_backend_rbd.c:366
>> #7  0x00007fdb2e193833 in storagePoolCreate (obj=0x7fdb1c1fd5a0, flags=<optimized out>) at storage/storage_driver.c:876
>> #8  0x00007fdb43790ea1 in virStoragePoolCreate (pool=pool at entry=0x7fdb1c1fd5a0, flags=0) at libvirt-storage.c:695
>> #9  0x00007fdb443becdf in remoteDispatchStoragePoolCreate (server=0x7fdb45fb2ab0, msg=0x7fdb45fb3db0, args=0x7fdb1c0037d0, rerr=0x7fdb342cfc30, client=<optimized out>) at remote_dispatch.h:14383
>> #10 remoteDispatchStoragePoolCreateHelper (server=0x7fdb45fb2ab0, client=<optimized out>, msg=0x7fdb45fb3db0, rerr=0x7fdb342cfc30, args=0x7fdb1c0037d0, ret=0x7fdb1c1b3260) at remote_dispatch.h:14359
>> #11 0x00007fdb437d9c42 in virNetServerProgramDispatchCall (msg=0x7fdb45fb3db0, client=0x7fdb45fd1a80, server=0x7fdb45fb2ab0, prog=0x7fdb45fcd670) at rpc/virnetserverprogram.c:437
>> #12 virNetServerProgramDispatch (prog=0x7fdb45fcd670, server=server at entry=0x7fdb45fb2ab0, client=0x7fdb45fd1a80, msg=0x7fdb45fb3db0) at rpc/virnetserverprogram.c:307
>> #13 0x00007fdb437d4ebd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fdb45fb2ab0) at rpc/virnetserver.c:135
>> #14 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fdb45fb2ab0) at rpc/virnetserver.c:156
>> #15 0x00007fdb436cfb35 in virThreadPoolWorker (opaque=opaque at entry=0x7fdb45fa7650) at util/virthreadpool.c:145
>> #16 0x00007fdb436cf058 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
>> #17 0x00007fdb40b12df5 in start_thread () from /lib64/libpthread.so.0
>> #18 0x00007fdb408401ad in clone () from /lib64/libc.so.6
>> 
>> 366             len = rbd_list(ptr.ioctx, names, &max_size);
>> (gdb) n
>> [New Thread 0x7fdb20758700 (LWP 22458)]
>> [New Thread 0x7fdb20556700 (LWP 22459)]
>> [Thread 0x7fdb20758700 (LWP 22458) exited]
>> [New Thread 0x7fdb20455700 (LWP 22460)]
>> [Thread 0x7fdb20556700 (LWP 22459) exited]
>> [New Thread 0x7fdb20556700 (LWP 22461)]
>> 
>> infinite loop...
>> 
>> Signed-off-by: Chen Hanxiao <chenhanxiao at gmail.com>
>> ---
>>  src/storage/storage_backend_rbd.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>> 
>
>Could you provide a bit more context...
>
>Why does calling rados_conf_read_file with a NULL resolve the issue?
>
>Is this something "new" or "expected"? And if expected, why are we only
>seeing it now?
>
>What is the other thread that "has" the lock doing?

It seems that the ceph server side does not respond to our request,
so when libvirt calls rbd_open/rbd_list, etc., they never return.

But qemu works fine against the same server,
so I took qemu's code as a reference:
https://github.com/qemu/qemu/blob/master/block/rbd.c#L365

rados_conf_read_file with a NULL path will try to read the ceph
conf file from /etc/ceph and the other default locations.

Although we call rados_conf_set in the following code,
without rados_conf_read_file
ceph-0.94.7-1.el7 does not answer our rbd_open.

Some older and newer ceph servers do not have this issue,
so I think this may be a server-side bug in ceph-0.94.7-1.el7.

Calling rados_conf_read_file(cluster, NULL)
will make our code more robust.
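
For illustration, a minimal standalone sketch of the connection
sequence with the extra default-path read (the mon_host value below
is just a placeholder, not something taken from the patch):

#include <rados/librados.h>
#include <stdio.h>

int main(void)
{
    rados_t cluster;

    /* Create the cluster handle; no configuration is loaded yet. */
    if (rados_create(&cluster, NULL) < 0) {
        fprintf(stderr, "failed to create the RADOS cluster\n");
        return 1;
    }

    /* NULL path: search the default locations, e.g. /etc/ceph/ceph.conf.
     * The return value is deliberately ignored -- a missing config
     * file is not fatal here. */
    rados_conf_read_file(cluster, NULL);

    /* Explicit settings still override whatever the file provided. */
    rados_conf_set(cluster, "auth_supported", "none");
    rados_conf_set(cluster, "mon_host", "203.0.113.1:6789"); /* placeholder */

    if (rados_connect(cluster) < 0) {
        fprintf(stderr, "failed to connect to the RADOS cluster\n");
        rados_shutdown(cluster);
        return 1;
    }

    rados_shutdown(cluster);
    return 0;
}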

Regards,
- Chen

>
>From my cursory/quick read of:
>
>http://docs.ceph.com/docs/master/rados/api/librados/
>
>...
>"Then you configure your rados_t to connect to your cluster, either by
>setting individual values (rados_conf_set()), using a configuration file
>(rados_conf_read_file()), using command line options
>(rados_conf_parse_argv()), or an environment variable
>(rados_conf_parse_env()):"
>
>Since we use rados_conf_set, that would seem to indicate we're OK. It's
>not clear from just what's posted why eventually calling rbd_list is
>causing a hang.
>
>I don't have the cycles or environment to do the research right now and
>it really isn't clear why a read_file would resolve the issue.
>
>John
>> diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
>> index b1c51ab..233737b 100644
>> --- a/src/storage/storage_backend_rbd.c
>> +++ b/src/storage/storage_backend_rbd.c
>> @@ -95,6 +95,9 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
>>              goto cleanup;
>>          }
>>  
>> +        /* try default location, but ignore failure */
>> +        rados_conf_read_file(ptr->cluster, NULL);
>> +
>>          if (!conn) {
>>              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>                             _("'ceph' authentication not supported "
>> @@ -124,6 +127,10 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
>>                             _("failed to create the RADOS cluster"));
>>              goto cleanup;
>>          }
>> +
>> +        /* try default location, but ignore failure */
>> +        rados_conf_read_file(ptr->cluster, NULL);
>> +
>>          if (virStorageBackendRBDRADOSConfSet(ptr->cluster,
>>                                               "auth_supported", "none") < 0)
>>              goto cleanup;
>> 
>



