
Re: [Cluster-devel] [GFS2 PATCH] GFS2: take a breath in the glock shrinker



Hi,

On Tue, 2013-07-02 at 15:36 -0400, Bob Peterson wrote:
> Hi,
> 
> This patch changes the gfs2 glock shrinker code so that it does
> several things to prevent swamping the CPU:
> 
> First, it now uses spin_trylock on the glock spin locks to prevent
> unnecessary spinning; if a lock is already held, we don't want to
> mess with that glock anyway.
> 
A more important question is why the spinlock would be locked in the
first place. The intent of the lru list is to feed the most idle glocks
to the shrinker, so there should be nothing using those locks if the
process is working correctly. If this is a case of multiple shrinker
threads all working on the same glocks, then I suspect that the
correct solution is just to use the per-cpu LRU lists of Dave Chinner et
al.
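
Just for illustration, this is roughly the shape of that interface as
posted in Dave and Glauber's patch set (a sketch only; the callback
signature and the per-node locking may well change before it is
merged, and the glock disposal details are elided):

#include <linux/list_lru.h>

static struct list_lru gl_lru;

/* Walk callback; the walker holds the (per-node) lru lock, passed
 * in as @lock, for the duration of the call. */
static enum lru_status gl_isolate(struct list_head *item,
				  spinlock_t *lock, void *cb_arg)
{
	struct gfs2_glock *gl = list_entry(item, struct gfs2_glock, gl_lru);

	if (!spin_trylock(&gl->gl_spin))
		return LRU_SKIP;	/* contended; try again next walk */
	/* ... demote_ok()/handle_callback() as before ... */
	list_del_init(item);
	spin_unlock(&gl->gl_spin);
	return LRU_REMOVED;
}

static void gl_scan(unsigned long nr)
{
	/* Per-node lists, so concurrent shrinkers stop serialising
	 * on a single global lru_lock. */
	list_lru_walk(&gl_lru, gl_isolate, NULL, nr);
}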

> Second, it tries to queue the workqueue items for demotable glocks
> ahead of non-demotable ones. It does this by specifying a timeout
> value of zero for demotable glocks, and 1 millisecond for
> non-demotable ones.
> 
Does that really make a difference on its own? In general there will be
no work to do for non-demotable locks as there will only very
occasionally be something that needs processing. So I'm not convinced
yet that this is really the right thing to do.
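
Note, too, that the delay argument to queue_delayed_work() is in
jiffies rather than milliseconds, so the "1 millisecond" above is
really one tick (10ms at HZ=100). If a wall-clock delay is really what
is wanted, it would have to be converted explicitly, e.g. (sketch
only):

	/* A delay of 0 queues immediately, exactly like queue_work();
	 * a non-zero delay arms a timer in jiffies, so convert. */
	if (queue_delayed_work(glock_workqueue, &gl->gl_work,
			       may_demote ? 0 : msecs_to_jiffies(1)) == 0)
		gfs2_glock_put_nolock(gl);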

> Third, it calls msleep for a short time (based on the amount of work
> queued) after the shrinker runs, to give the queued work items a
> chance to run without swamping the CPU.
> 
Was adding a yield() or cond_resched() here not enough?
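
i.e. at the end of gfs2_scan_glock_lru(), something like this
(illustrative only):

	list_splice(&skipped, &lru_list);
	if (!list_empty(&dispose))
		gfs2_dispose_glock_lru(&dispose);
	spin_unlock(&lru_lock);
	/* Resched only if another task actually wants the cpu;
	 * unlike msleep() this never leaves the cpu idle. */
	cond_resched();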

Steve.

> This addresses a problem whereby "du" (or similar operations such as
> file system backups) on a large GFS2 file system causes CPU spikes
> and I/O dead zones in which little work gets done, due to the large
> volume of delayed workqueue items queued by the glock shrinker.
> 
> The problem is best illustrated with some vmstat output. Before the
> patch, once memory was full, the du caused I/O dropouts in which the
> runnable process count ("r") and reads ("bi") both drop. Here are two
> examples of "vmstat 1":
> 
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
> shrink 1
>  2  0      0 461708   6376 13643252    0    0  9896     0 13625 20243  0  2 97  1
>  1  0      0 443468   6376 13655888    0    0 13568     0 14436 14558  0  1 97  2
>  0  1      0 431508   6376 13661628    0    0  5556     0 5900 11233  0  1 97  2
>  2  0      0 411620   6376 13670568    0    0  9004     0 13425 18130  0  2 97  2
>  0  1      0 388928   6376 13680684    0    0 10164     0 16200 20415  0  2 97  1
> shrink 2
>  1  0      0 217232   6376 13677452    0    0  9976     0 15874 20089  0  2 97  1
>  0  1      0 517628   6376 13497928    0    0 12532     0 15559 14542  0  3 95  2
>  0  1      0 508752   6376 13502676    0    0  4772     0 4922 9634  0  1 97  2
>  1  0      0 493984   6376 13508808    0    0  7304     0 10094 14729  0  1 97  2
>  1  1      0 473184   6376 13518484    0    0  9560     0 14666 19196  0  2 97  1
> 
> With the patch, these dropouts become hard to find (mostly gone),
> and when they are seen, they're much less severe. Here are two
> examples with the same commands:
> 
> shrink 1
>  0  1      0 578836   8436 14007576    0    0 13740     0 16176 16990  0  1 97  2
>  1  0      0 561104   8436 14017128    0    0  9148     0 12265 16212  0  1 97  2
>  1  0      0 547588   8436 14023356    0    0  6368     0 7981 13114  0  1 97  2
>  1  0      0 525660   8436 14033772    0    0 10432     0 16281 20929  0  2 97  1
>  1  0      0 503568   8436 14044132    0    0 10548     0 16645 21161  0  2 97  1
> shrink 2
>  2  0      0 380856   8436 13807216    0    0 10180     0 16068 20439  0  2 97  1
>  2  0      0 361308   8436 13820708    0    0 14000     0 14973 15623  0  1 97  2
>  1  0      0 345608   8436 13827764    0    0  7060     0 9216 14188  0  1 97  2
>  1  0      0 329532   8436 13834964    0    0  7260     0 9505 14656  0  1 97  2
>  2  1      0 308868   8436 13844460    0    0  9204     0 14332 18850  0  2 97  2
> 
> Regards,
> 
> Bob Peterson
> Red Hat File Systems
> 
> Signed-off-by: Bob Peterson <rpeterso redhat com> 
> ---
>  fs/gfs2/glock.c | 25 ++++++++++++++++++-------
>  1 file changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 9435384..019bf5f 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -1393,11 +1393,13 @@ static int glock_cmp(void *priv, struct list_head *a, struct list_head *b)
>   * private)
>   */
>  
> -static void gfs2_dispose_glock_lru(struct list_head *list)
> +static int gfs2_dispose_glock_lru(struct list_head *list)
>  __releases(&lru_lock)
>  __acquires(&lru_lock)
>  {
>  	struct gfs2_glock *gl;
> +	int may_demote;
> +	int queued = 0;
>  
>  	list_sort(NULL, list, glock_cmp);
>  
> @@ -1407,16 +1409,22 @@ __acquires(&lru_lock)
>  		clear_bit(GLF_LRU, &gl->gl_flags);
>  		gfs2_glock_hold(gl);
>  		spin_unlock(&lru_lock);
> -		spin_lock(&gl->gl_spin);
> -		if (demote_ok(gl))
> -			handle_callback(gl, LM_ST_UNLOCKED, 0, false);
> +		may_demote = 0;
> +		if (spin_trylock(&gl->gl_spin)) {
> +			may_demote = demote_ok(gl);
> +			if (may_demote)
> +				handle_callback(gl, LM_ST_UNLOCKED, 0, false);
> +			spin_unlock(&gl->gl_spin);
> +		}
>  		WARN_ON(!test_and_clear_bit(GLF_LOCK, &gl->gl_flags));
>  		smp_mb__after_clear_bit();
> -		if (queue_delayed_work(glock_workqueue, &gl->gl_work, 0) == 0)
> +		if (queue_delayed_work(glock_workqueue, &gl->gl_work,
> +				       may_demote ? 0 : 1) == 0)
>  			gfs2_glock_put_nolock(gl);
> -		spin_unlock(&gl->gl_spin);
> +		queued++;
>  		spin_lock(&lru_lock);
>  	}
> +	return queued;
>  }
>  
>  /**
> @@ -1433,6 +1441,7 @@ static void gfs2_scan_glock_lru(int nr)
>  	struct gfs2_glock *gl;
>  	LIST_HEAD(skipped);
>  	LIST_HEAD(dispose);
> +	int queued = 0;
>  
>  	spin_lock(&lru_lock);
>  	while(nr && !list_empty(&lru_list)) {
> @@ -1450,8 +1459,10 @@ static void gfs2_scan_glock_lru(int nr)
>  	}
>  	list_splice(&skipped, &lru_list);
>  	if (!list_empty(&dispose))
> -		gfs2_dispose_glock_lru(&dispose);
> +		queued = gfs2_dispose_glock_lru(&dispose);
>  	spin_unlock(&lru_lock);
> +	if (queued >= 8)
> +		msleep(queued >> 3);
>  }
>  
>  static int gfs2_shrink_glock_memory(struct shrinker *shrink,
> 


