[Linux-cachefs] locking/refcount problems in cachefiles.
David Howells
dhowells at redhat.com
Wed Jan 29 12:17:15 UTC 2014
NeilBrown <neilb at suse.de> wrote:
> Analysis of the crash dump suggests that fscache_object_destroy, and thus
> __rb_erase_colour, is being called on an object that has already been
> destroyed and is no longer in the rb tree. The rbtree code gets upset and
> crashes.
Not unreasonably... But which rb_tree? There are two:
(1) struct cachefiles_cache::active_nodes.

    This is governed by struct cachefiles_cache::active_lock.

(2) fscache_object_list.

    This is governed by fscache_object_list_lock.

    Unless you have CONFIG_FSCACHE_OBJECT_LIST=y, this tree isn't present and
    fscache_objlist_remove() does nothing, in which case all that
    fscache_object_destroy() does is release the cookie.
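As a rough illustration of why tree (2) can only be involved on CONFIG_FSCACHE_OBJECT_LIST=y builds, here is a userspace C model. MODEL_OBJECT_LIST, struct model_object and the model_* functions are hypothetical stand-ins, not the real fscache code; only the shape (removal from the list under its own lock, then cookie release) follows the description above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical switch standing in for CONFIG_FSCACHE_OBJECT_LIST;
 * build with -DMODEL_OBJECT_LIST=1 to model the enabled case. */
#ifndef MODEL_OBJECT_LIST
#define MODEL_OBJECT_LIST 0
#endif

struct model_object {
    bool on_objlist;   /* membership of tree (2), fscache_object_list */
    bool has_cookie;   /* whether the object still pins its cookie */
};

#if MODEL_OBJECT_LIST
/* Tree (2) and its lock only exist when the option is enabled. */
static pthread_mutex_t model_objlist_lock = PTHREAD_MUTEX_INITIALIZER;
#endif

/* Models fscache_objlist_remove(): a no-op when the list is compiled out. */
static void model_objlist_remove(struct model_object *obj)
{
#if MODEL_OBJECT_LIST
    pthread_mutex_lock(&model_objlist_lock);
    assert(obj->on_objlist);   /* erasing a node not in the tree is the bug */
    obj->on_objlist = false;   /* the rb_erase() would happen here */
    pthread_mutex_unlock(&model_objlist_lock);
#else
    assert(!obj->on_objlist);  /* nothing was ever inserted */
#endif
}

/* Models fscache_object_destroy(): list removal, then cookie release. */
static void model_object_destroy(struct model_object *obj)
{
    model_objlist_remove(obj);
    obj->has_cookie = false;
}
```

With the switch off, destroy reduces to dropping the cookie, so a crash inside the rbtree code would have to involve tree (1) instead.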
Can you poke around in the registers to see whether any of them point to
tree (2) (which is a global variable)?
> Thus you can get a race
> ...
> cachefiles_mark_object_active increments
> ->usage (to 1) and drops the lock
This is tree (1).
> cachefiles_put_object calls
> fscache_object_destroy which
> unlinks from the rb tree.
And this is tree (2).
> cachefiles_objects live in an rbtree which does not imply a reference to
> the object.
Whilst that is true, they're not allowed to be in the rbtree unless they still
have at least one reference outstanding.
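The rule that an object may only sit in the rbtree while it still has a reference can be written down as assertions. This is a hypothetical userspace sketch, not cachefiles code: usage, in_tree, model_active_lock and the model_* helpers are invented, and a single mutex guards both the count and tree membership for simplicity.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* "usage" mirrors the object refcount; "in_tree" stands in for rb_node
 * membership of cachefiles_cache::active_nodes. */
struct model_object {
    int usage;
    bool in_tree;
};

static pthread_mutex_t model_active_lock = PTHREAD_MUTEX_INITIALIZER;

static struct model_object *model_alloc(void)
{
    struct model_object *obj = calloc(1, sizeof(*obj));
    if (obj)
        obj->usage = 1;            /* caller owns the initial reference */
    return obj;
}

/* Taking an extra ref is only legal if someone already holds one. */
static void model_get(struct model_object *obj)
{
    pthread_mutex_lock(&model_active_lock);
    assert(obj->usage > 0);        /* never resurrect from 0 */
    obj->usage++;
    pthread_mutex_unlock(&model_active_lock);
}

static void model_tree_insert(struct model_object *obj)
{
    pthread_mutex_lock(&model_active_lock);
    assert(obj->usage > 0 && !obj->in_tree);
    obj->in_tree = true;
    pthread_mutex_unlock(&model_active_lock);
}

/* Erasing requires the caller to still hold a reference. */
static void model_tree_erase(struct model_object *obj)
{
    pthread_mutex_lock(&model_active_lock);
    assert(obj->usage > 0 && obj->in_tree);   /* a double erase trips here */
    obj->in_tree = false;
    pthread_mutex_unlock(&model_active_lock);
}

/* Drops a reference; returns true if it was the last one (object freed). */
static bool model_put(struct model_object *obj)
{
    bool last;

    pthread_mutex_lock(&model_active_lock);
    assert(obj->usage > 0);
    last = (--obj->usage == 0);
    assert(!last || !obj->in_tree); /* never free while still in the tree */
    pthread_mutex_unlock(&model_active_lock);
    if (last)
        free(obj);
    return last;
}
```

In this model, dropping the last reference while the object is still in the tree, or erasing it a second time, trips an assert rather than silently corrupting the tree.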
Apart from the part of cachefiles_walk_to_object() labelled "check_error",
objects are only rb_erase()'d in cachefiles_drop_object(). This is called from
the fscache object state machine (fscache_drop_object), which holds a ref on
the cachefiles object until fscache_object_work_func() releases it just prior
to returning.
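The ordering described above (erase while the state machine's ref is held, release that ref last) can be modelled single-threaded as below. It assumes, for illustration only, that the ref is taken when the work item is queued; model_enqueue(), model_drop_object() and model_work_func() are hypothetical and merely echo the kernel function names. Locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct model_object {
    int usage;
    bool in_tree;
};

static void model_put(struct model_object *obj)
{
    assert(obj->usage > 0);
    if (--obj->usage == 0) {
        assert(!obj->in_tree);     /* must already be out of the rbtree */
        free(obj);
    }
}

/* Assumption for this sketch: the ref is taken when the work is queued. */
static void model_enqueue(struct model_object *obj)
{
    assert(obj->usage > 0);
    obj->usage++;
}

/* Stands in for cachefiles_drop_object(): the normal rb_erase() site. */
static void model_drop_object(struct model_object *obj)
{
    assert(obj->usage > 0 && obj->in_tree);
    obj->in_tree = false;
}

/* Stands in for one pass of the state machine's work function. */
static void model_work_func(struct model_object *obj)
{
    model_drop_object(obj);        /* erased while the queued ref is held */
    model_put(obj);                /* released just prior to returning */
}
```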
David