[libvirt] [PATCH] storage_backend_rbd: always call rados_conf_read_file when connecting to an rbd pool

John Ferlan jferlan at redhat.com
Tue Jan 10 18:23:54 UTC 2017



On 12/30/2016 03:39 AM, Chen Hanxiao wrote:
> From: Chen Hanxiao <chenhanxiao at gmail.com>
> 
> This patch fixes a deadlock when trying to read an RBD image.
> 
> When trying to connect to an RBD server
> (ceph-0.94.7-1.el7.centos.x86_64),
> rbd_list()/rbd_open() enter a deadlocked state.
> 
> Backtrace:
> Thread 30 (Thread 0x7fdb342d0700 (LWP 12105)):
> #0  0x00007fdb40b16705 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x00007fdb294273f1 in librados::IoCtxImpl::operate_read(object_t const&, ObjectOperation*, ceph::buffer::list*, int) () from /lib64/librados.so.2
> #2  0x00007fdb29429fcc in librados::IoCtxImpl::read(object_t const&, ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
> #3  0x00007fdb293e850c in librados::IoCtx::read(std::string const&, ceph::buffer::list&, unsigned long, unsigned long) () from /lib64/librados.so.2
> #4  0x00007fdb2b9dd15e in librbd::list(librados::IoCtx&, std::vector<std::string, std::allocator<std::string> >&) () from /lib64/librbd.so.1
> #5  0x00007fdb2b98c089 in rbd_list () from /lib64/librbd.so.1
> #6  0x00007fdb2e1a8052 in virStorageBackendRBDRefreshPool (conn=<optimized out>, pool=0x7fdafc002d50) at storage/storage_backend_rbd.c:366
> #7  0x00007fdb2e193833 in storagePoolCreate (obj=0x7fdb1c1fd5a0, flags=<optimized out>) at storage/storage_driver.c:876
> #8  0x00007fdb43790ea1 in virStoragePoolCreate (pool=pool at entry=0x7fdb1c1fd5a0, flags=0) at libvirt-storage.c:695
> #9  0x00007fdb443becdf in remoteDispatchStoragePoolCreate (server=0x7fdb45fb2ab0, msg=0x7fdb45fb3db0, args=0x7fdb1c0037d0, rerr=0x7fdb342cfc30, client=<optimized out>) at remote_dispatch.h:14383
> #10 remoteDispatchStoragePoolCreateHelper (server=0x7fdb45fb2ab0, client=<optimized out>, msg=0x7fdb45fb3db0, rerr=0x7fdb342cfc30, args=0x7fdb1c0037d0, ret=0x7fdb1c1b3260) at remote_dispatch.h:14359
> #11 0x00007fdb437d9c42 in virNetServerProgramDispatchCall (msg=0x7fdb45fb3db0, client=0x7fdb45fd1a80, server=0x7fdb45fb2ab0, prog=0x7fdb45fcd670) at rpc/virnetserverprogram.c:437
> #12 virNetServerProgramDispatch (prog=0x7fdb45fcd670, server=server at entry=0x7fdb45fb2ab0, client=0x7fdb45fd1a80, msg=0x7fdb45fb3db0) at rpc/virnetserverprogram.c:307
> #13 0x00007fdb437d4ebd in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x7fdb45fb2ab0) at rpc/virnetserver.c:135
> #14 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x7fdb45fb2ab0) at rpc/virnetserver.c:156
> #15 0x00007fdb436cfb35 in virThreadPoolWorker (opaque=opaque at entry=0x7fdb45fa7650) at util/virthreadpool.c:145
> #16 0x00007fdb436cf058 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
> #17 0x00007fdb40b12df5 in start_thread () from /lib64/libpthread.so.0
> #18 0x00007fdb408401ad in clone () from /lib64/libc.so.6
> 
> 366             len = rbd_list(ptr.ioctx, names, &max_size);
> (gdb) n
> [New Thread 0x7fdb20758700 (LWP 22458)]
> [New Thread 0x7fdb20556700 (LWP 22459)]
> [Thread 0x7fdb20758700 (LWP 22458) exited]
> [New Thread 0x7fdb20455700 (LWP 22460)]
> [Thread 0x7fdb20556700 (LWP 22459) exited]
> [New Thread 0x7fdb20556700 (LWP 22461)]
> 
> infinite loop...
> 
> Signed-off-by: Chen Hanxiao <chenhanxiao at gmail.com>
> ---
>  src/storage/storage_backend_rbd.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 

Could you provide a bit more context...

Why does calling rados_conf_read_file with a NULL path resolve the issue?

Is this something "new" or "expected"? And if expected, why are we only
seeing it now?

What is the other thread that "has" the lock doing?

From my cursory/quick read of:

http://docs.ceph.com/docs/master/rados/api/librados/

...
"Then you configure your rados_t to connect to your cluster, either by
setting individual values (rados_conf_set()), using a configuration file
(rados_conf_read_file()), using command line options
(rados_conf_parse_argv()), or an environment variable
(rados_conf_parse_env()):"

Since we use rados_conf_set, that would seem to indicate we're OK. It's
not clear from just what's posted why eventually calling rbd_list
causes a hang.
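
For reference, here's a rough, untested sketch of the init sequence
those docs describe, with the call the patch adds marked. The mon_host
value is just a placeholder, not something taken from the posted patch:

    #include <rados/librados.h>

    int
    main(void)
    {
        rados_t cluster;

        if (rados_create(&cluster, NULL) < 0)
            return 1;

        /* With a NULL path this searches the default locations
         * ($CEPH_CONF, /etc/ceph/ceph.conf, ~/.ceph/config,
         * ./ceph.conf); this is the call the patch adds, with any
         * failure ignored. */
        rados_conf_read_file(cluster, NULL);

        /* Individual settings, as the driver does today via
         * virStorageBackendRBDRADOSConfSet() */
        rados_conf_set(cluster, "auth_supported", "none");
        rados_conf_set(cluster, "mon_host", "192.168.0.1:6789");

        if (rados_connect(cluster) < 0) {
            rados_shutdown(cluster);
            return 1;
        }

        /* ... create an ioctx, call rbd_list(), etc. ... */

        rados_shutdown(cluster);
        return 0;
    }

Note the ordering: since the patch places the read_file before our
explicit rados_conf_set calls, it would pull in whatever local
ceph.conf exists and then let our settings override individual values.
That still doesn't explain why the settings we pass today would be
insufficient on their own.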

I don't have the cycles or environment to do the research right now,
and it really isn't clear why a read_file would resolve the issue.

John
> diff --git a/src/storage/storage_backend_rbd.c b/src/storage/storage_backend_rbd.c
> index b1c51ab..233737b 100644
> --- a/src/storage/storage_backend_rbd.c
> +++ b/src/storage/storage_backend_rbd.c
> @@ -95,6 +95,9 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
>              goto cleanup;
>          }
>  
> +        /* try default location, but ignore failure */
> +        rados_conf_read_file(ptr->cluster, NULL);
> +
>          if (!conn) {
>              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>                             _("'ceph' authentication not supported "
> @@ -124,6 +127,10 @@ virStorageBackendRBDOpenRADOSConn(virStorageBackendRBDStatePtr ptr,
>                             _("failed to create the RADOS cluster"));
>              goto cleanup;
>          }
> +
> +        /* try default location, but ignore failure */
> +        rados_conf_read_file(ptr->cluster, NULL);
> +
>          if (virStorageBackendRBDRADOSConfSet(ptr->cluster,
>                                               "auth_supported", "none") < 0)
>              goto cleanup;
> 



