
Re: [Cluster-devel] [Upstream patch] DLM: Convert rsb data from linked list to rb_tree

On Wed, Oct 05, 2011 at 03:25:39PM -0400, Bob Peterson wrote:
> Hi,
> This upstream patch changes the way DLM keeps track of RSBs.
> Before, they were in a linked list off a hash table.  Now,
> they're an rb_tree off the same hash table.  This speeds up
> DLM lookups greatly.
> Today's DLM is faster than older DLMs for many file systems,
> (e.g. in RHEL5) due to the larger hash table size.  However,
> this rb_tree implementation scales much better.  For my
> 1000-directories-with-1000-files test, the patch doesn't
> show much of an improvement.  But when I scale the file system
> to 4000 directories with 4000 files (16 million files), it
> helps greatly. The time to do rm -fR /mnt/gfs2/* drops from
> 42.01 hours to 23.68 hours.

How many hash table buckets were you using in that test?
If it was the default (1024), I'd be interested to know how
16k buckets compare.

> With this patch I believe we could also reduce the size of
> the hash table again or eliminate it completely, but we can
> evaluate and do that later.
> NOTE: Today's upstream DLM code has special code to
> pre-allocate RSB structures for faster lookup.  This patch
> eliminates that step, since it doesn't have a resource name
> at that time for inserting new entries in the rb_tree.

We need to keep that; why do you say there's no resource name?
pre_rsb_struct() and get_rsb_struct() are specially designed to work
as they do because:

> @@ -367,28 +336,16 @@ static int get_rsb_struct(struct dlm_ls *ls, char *name, int len,
>  			  struct dlm_rsb **r_ret)
> +	r = dlm_allocate_rsb(ls);
> +	if (!r)
> +		return -ENOMEM;

That allocation is not allowed here because a spinlock is held:

>  	spin_lock(&ls->ls_rsbtbl[bucket].lock);
>  	error = _search_rsb(ls, name, namelen, bucket, flags, &r);
> @@ -508,10 +492,6 @@ static int find_rsb(struct dlm_ls *ls, char *name, int namelen,
>  		goto out_unlock;
>  	error = get_rsb_struct(ls, name, namelen, &r);
> -	if (error == -EAGAIN) {
> -		spin_unlock(&ls->ls_rsbtbl[bucket].lock);
> -		goto retry;
> -	}
>  	if (error)
>  		goto out_unlock;

If you try to fix the problem above by releasing the spinlock between the
search and the malloc, then you have to repeat the search.  Eliminating
the repeated search is the main reason for pre_rsb/get_rsb.
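
To illustrate the pattern, here is a minimal userspace sketch. It is not
the DLM code: pthread mutexes stand in for the spinlocks, a plain list
stands in for the bucket contents, and every name (struct lockspace,
pre_alloc, take_prealloc, find_or_create) is hypothetical. The idea is
that the allocation happens before the bucket lock is taken, so the
search and the insert can run under one lock hold; -EAGAIN and a retry
only occur in the rare case where the pool was drained in between.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NAME_MAX_LEN 64

struct rsb {
	struct rsb *next;
	int namelen;
	char name[NAME_MAX_LEN];
};

struct lockspace {
	pthread_mutex_t pool_lock;   /* protects pool and pool_count */
	struct rsb *pool;            /* preallocated, unused entries */
	int pool_count;
	pthread_mutex_t bucket_lock; /* stands in for the bucket spinlock */
	struct rsb *bucket;          /* a single bucket, for brevity */
};

void lockspace_init(struct lockspace *ls)
{
	memset(ls, 0, sizeof(*ls));
	pthread_mutex_init(&ls->pool_lock, NULL);
	pthread_mutex_init(&ls->bucket_lock, NULL);
}

/* May block in the allocator: call only with the bucket lock NOT held. */
static int pre_alloc(struct lockspace *ls)
{
	struct rsb *r;
	int have;

	pthread_mutex_lock(&ls->pool_lock);
	have = ls->pool_count;
	pthread_mutex_unlock(&ls->pool_lock);
	if (have > 0)
		return 0;

	r = calloc(1, sizeof(*r));
	if (!r)
		return -ENOMEM;

	pthread_mutex_lock(&ls->pool_lock);
	r->next = ls->pool;
	ls->pool = r;
	ls->pool_count++;
	pthread_mutex_unlock(&ls->pool_lock);
	return 0;
}

/* Never allocates, so it is safe to call with the bucket lock held. */
static int take_prealloc(struct lockspace *ls, const char *name, int len,
			 struct rsb **r_ret)
{
	struct rsb *r;

	assert(len > 0 && len <= NAME_MAX_LEN);

	pthread_mutex_lock(&ls->pool_lock);
	r = ls->pool;
	if (r) {
		ls->pool = r->next;
		ls->pool_count--;
	}
	pthread_mutex_unlock(&ls->pool_lock);
	if (!r)
		return -EAGAIN; /* pool drained by someone else */

	r->next = NULL;
	r->namelen = len;
	memcpy(r->name, name, len);
	*r_ret = r;
	return 0;
}

int find_or_create(struct lockspace *ls, const char *name, int len,
		   struct rsb **r_ret)
{
	struct rsb *r;
	int error;

retry:
	error = pre_alloc(ls); /* allocate before taking the bucket lock */
	if (error)
		return error;

	pthread_mutex_lock(&ls->bucket_lock);
	for (r = ls->bucket; r; r = r->next)
		if (r->namelen == len && !memcmp(r->name, name, len))
			break;
	if (!r) {
		error = take_prealloc(ls, name, len, &r);
		if (error == -EAGAIN) {
			pthread_mutex_unlock(&ls->bucket_lock);
			goto retry;
		}
		r->next = ls->bucket; /* insert under the same lock hold */
		ls->bucket = r;
	}
	pthread_mutex_unlock(&ls->bucket_lock);
	*r_ret = r;
	return 0;
}
```

In the common case the bucket is searched once per call. Looking up the
same name twice returns the same entry, and the second call leaves its
preallocated entry in the pool for the next caller.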
