[Cluster-devel] GFS2: Fix race in glock lru glock disposal
Steven Whitehouse
swhiteho at redhat.com
Mon Jun 23 13:42:55 UTC 2014
We must not leave items on the LRU list with GLF_LOCK set, since
they can be removed if the glock is brought back into use, which
may then potentially result in a hang, waiting for GLF_LOCK to
clear.
It doesn't happen very often, since it requires a glock that has
not been used for a long time to be brought back into use at the
same moment that the shrinker is part way through disposing of
glocks.
The fix is to set GLF_LOCK at a later time, when we already know
that the other locks can be obtained. Also, we now only release
the lru_lock in case a resched is needed, rather than on every
iteration.
Signed-off-by: Steven Whitehouse <swhiteho at redhat.com>
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index c355f73..e607b02 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1404,12 +1404,16 @@ __acquires(&lru_lock)
gl = list_entry(list->next, struct gfs2_glock, gl_lru);
list_del_init(&gl->gl_lru);
if (!spin_trylock(&gl->gl_spin)) {
+add_back_to_lru:
list_add(&gl->gl_lru, &lru_list);
atomic_inc(&lru_count);
continue;
}
+ if (test_and_set_bit(GLF_LOCK, &gl->gl_flags)) {
+ spin_unlock(&gl->gl_spin);
+ goto add_back_to_lru;
+ }
clear_bit(GLF_LRU, &gl->gl_flags);
- spin_unlock(&lru_lock);
gl->gl_lockref.count++;
if (demote_ok(gl))
handle_callback(gl, LM_ST_UNLOCKED, 0, false);
@@ -1417,7 +1421,7 @@ __acquires(&lru_lock)
if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
gl->gl_lockref.count--;
spin_unlock(&gl->gl_spin);
- spin_lock(&lru_lock);
+ cond_resched_lock(&lru_lock);
}
}
@@ -1442,7 +1446,7 @@ static long gfs2_scan_glock_lru(int nr)
gl = list_entry(lru_list.next, struct gfs2_glock, gl_lru);
/* Test for being demotable */
- if (!test_and_set_bit(GLF_LOCK, &gl->gl_flags)) {
+ if (!test_bit(GLF_LOCK, &gl->gl_flags)) {
list_move(&gl->gl_lru, &dispose);
atomic_dec(&lru_count);
freed++;
More information about the Cluster-devel
mailing list