[Linux-cluster] Readahead Issues using cluster-1.01.00

Wed Nov 2 17:29:35 UTC 2005

On Wed, Nov 02, 2005 at 05:36:18PM +0100, Velu Erwan wrote:
> Velu Erwan wrote:
> 
> >Velu Erwan wrote:
> >
> >1) Why is this volume so big? On my system it reaches ~8192 exabytes!
> >The first time I saw that, I thought it was an error...
> >
> >[root@max4 ~]# cat /proc/partitions  | grep  -e "major" -e "diapered"
> >major minor  #blocks  name
> >252     0 9223372036854775807 diapered_g1v1
> >[root@max4 ~]#
> >
> I don't know whether this is normal, but gd->capacity is set to zero and
> then 1 is subtracted from it.
> Since gd->capacity is an unsigned long, it wraps around to the maximum size.

The size of the diaper device is intentionally set to the max size; all
requests are then just passed through to the real device regardless of how
large the real device is.
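
To make the arithmetic behind that number concrete, here is a small
user-space sketch (not the gfs code itself). It assumes a 64-bit unsigned
capacity counted in 512-byte sectors, and that /proc/partitions prints the
sector count halved, i.e. in 1 KiB blocks. The function names are made up
for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: capacity is a 64-bit unsigned sector count.
 * Subtracting 1 from zero wraps around to the largest value. */
uint64_t wrapped_capacity_sectors(void)
{
    uint64_t capacity = 0;
    capacity -= 1;              /* wraps to UINT64_MAX */
    return capacity;
}

/* Assumption: /proc/partitions shows 1 KiB blocks, i.e. sectors / 2. */
uint64_t sectors_to_kib_blocks(uint64_t sectors)
{
    return sectors >> 1;
}

/* Print the block count and its approximate size in EiB. */
void print_diaper_size(void)
{
    uint64_t blocks = sectors_to_kib_blocks(wrapped_capacity_sectors());
    /* 1 EiB = 2^50 KiB */
    double eib = (double)blocks / (double)(1ULL << 50);

    printf("%llu blocks (~%.0f EiB)\n", (unsigned long long)blocks, eib);
}
```

Calling print_diaper_size() gives 9223372036854775807 blocks, the value
shown in the /proc/partitions output above, which works out to roughly
8192 EiB.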

> >2) Looking at the source code, this diaper volume never sets
> >"gd->queue->backing_dev_info.ra_pages", which is left at zero by
> >gd->queue = blk_alloc_queue(GFP_KERNEL);
> >Is that needed to enforce the cache/lock management, or is it just an
> >oversight? Could this explain why read performance is low even though
> >gfs_read() calls generic_file_read()?
> 
> I've made this patch, which still uses a hardcoded value but sets
> ra_pages on the diapered volume.
> Using 2048 gives excellent results.
> This patch makes the previous one obsolete. Please find it
> attached.
> But I don't know how it affects gfs's cache/lock management:
> having some pages in cache might cause coherency
> problems.
> 
> What do you think about that?

I don't know, but here are a couple things you might look into:

- Did this problem exist a few kernel versions ago?  We should try the
  RHEL4 kernel (or something close to it) and the version of gfs that
  runs on that (RHEL4 cvs branch).  If that version is ok, then there's
  probably a recent kernel change that we've missed that requires us to
  do something new.

- Remove the diaper code and see if that changes things.  Look at these
  patches where I removed the diaper code from gfs2; do the equivalent
  for the version of gfs you're playing with:
  http://sources.redhat.com/ml/cluster-cvs/2005-q3/msg00184.html

I suspect that read-ahead should indeed be happening and that something
has broken it recently.  I think we should first figure out how it worked
in the past.
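
When comparing kernels, one way to check from user space whether a block
device has any read-ahead configured is the BLKRAGET ioctl, which reports
the read-ahead window in 512-byte sectors. The following is only a sketch:
the helper names are invented here, and the 4096-byte page size used in
the example afterwards is an assumption about the machine.

```c
#include <fcntl.h>
#include <linux/fs.h>    /* BLKRAGET */
#include <sys/ioctl.h>
#include <unistd.h>

/* Convert a read-ahead window given in pages to 512-byte sectors. */
long ra_pages_to_sectors(long ra_pages, long page_size)
{
    return ra_pages * (page_size / 512);
}

/* Return the read-ahead of a block device in 512-byte sectors,
 * or -1 if the device cannot be opened or queried. */
long query_readahead_sectors(const char *dev_path)
{
    long sectors = 0;
    int fd = open(dev_path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (ioctl(fd, BLKRAGET, &sectors) < 0)
        sectors = -1;
    close(fd);
    return sectors;
}
```

With 4096-byte pages, ra_pages = 2048 corresponds to
ra_pages_to_sectors(2048, 4096) = 16384 sectors, i.e. 8 MiB. A result of 0
from query_readahead_sectors() on the diaper device node (whatever its
path is on your system) would be consistent with ra_pages never being set.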

Thanks,
Dave