[Linux-cachefs] NFS performance not as expected with FSCache

Ben Yarwood ben.yarwood at juno.co.uk
Mon Oct 31 12:02:47 UTC 2011


We have had a very heavily loaded web server which delivers mp3 files
through apache with the files residing on an NFS file system.  The server
receives around 4-10 requests a second, and the files are each around 600KB
in size.  The server delivers a throughput of between 20Mb/s - 60Mb/s
depending on the time of day.   The NFS file system being used is read only.
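
As a rough cross-check of those figures, the request rate and file size can be multiplied out. This is only a back-of-envelope sketch: it assumes decimal units (1 KB = 1000 bytes) and that "Mb/s" means megabits per second, neither of which was measured on the server.

```python
# Back-of-envelope check: request rate x file size -> bandwidth in Mbit/s.
# Assumes decimal units (1 KB = 1000 bytes, 1 Mbit = 1_000_000 bits).

FILE_SIZE_BYTES = 600 * 1000  # "around 600KB" per file


def mbit_per_second(requests_per_second: float,
                    file_size_bytes: int = FILE_SIZE_BYTES) -> float:
    """Return the bandwidth needed to serve the given request rate."""
    return requests_per_second * file_size_bytes * 8 / 1_000_000


if __name__ == "__main__":
    for rate in (4, 10):
        print(f"{rate} req/s -> {mbit_per_second(rate):.1f} Mbit/s")
```

Under those assumptions 4-10 requests a second works out to roughly 19-48 Mbit/s, which is in the same range as the observed throughput.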

 

The load on this server was increasing to over 100 at busy times during the
day due, I thought, to high wait times from the NFS server.  As the load
was so high, the server was becoming unresponsive, so we have implemented
fscache with local SSD drives on a new server to alleviate this.

 

The fscache implementation has been somewhat successful: load figures on
the new server are now down in the range of 0-30, which is much better, and
looking at throughput on the network and the fscache stats, I can see that
around 75% of requests are now satisfied by fscache.  See the fscache stats
below, which I hope I am interpreting correctly.  Despite this, files which
are definitely cached by fscache can still take over 10 seconds to deliver
to our monitoring server, which is connected via a 100Mbit switch, but
sometimes take less than a second, and I am not sure why.  Our monitoring
server asks for the same file to be delivered to it every minute, hence I
know it's cached by fscache.
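
For scale, the time a file needs on the wire can be estimated from the link speed alone. The sketch below assumes the monitored file is about 600KB (the typical size mentioned earlier, not a measured value), uses decimal units, and ignores TCP and HTTP overhead.

```python
# Estimate the pure transfer time of one file over the monitoring link.
# The 600 KB file size is an assumption; protocol overhead is ignored.


def wire_time_seconds(file_size_bytes: int, link_mbit_per_s: float) -> float:
    """Return seconds needed to push file_size_bytes over the link."""
    return file_size_bytes * 8 / (link_mbit_per_s * 1_000_000)


if __name__ == "__main__":
    t = wire_time_seconds(600 * 1000, 100)
    print(f"wire time: {t:.3f} s")
    print(f"a 10 s delivery is about {10 / t:.0f}x the wire time")
```

With those assumptions the transfer itself needs about 0.05 seconds, so a 10 second delivery is roughly 200 times longer than the link alone would explain.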

 

I have spent a lot of time reading articles on NFS client performance
and have made a number of changes to kernel networking and sunrpc parameters
(see below also) in an attempt to resolve the issue.  These have definitely
helped.

 

Can anyone shed any light on why a file cached by fscache might take so long
to be delivered?  Is it a limitation of NFS perhaps, or a problem with
fscache?  I am not sure where best to start debugging the problem.

 

Regards

Ben

 

FS-Cache statistics

Cookies: idx=6 dat=145221 spc=0

Objects: alc=145226 nal=0 avl=145226 ded=143437

ChkAux : non=0 ok=102608 upd=0 obs=2

Pages  : mrk=31558074 unc=31364899

Acquire: n=145227 nul=0 noc=0 ok=145227 nbf=0 oom=0

Lookups: n=145226 neg=42614 pos=102612 crt=42614 tmo=0

Updates: n=0 nul=0 run=0

Relinqs: n=143439 nul=0 wcr=0 rtr=0

AttrChg: n=0 ok=0 nbf=0 oom=0 run=0

Allocs : n=0 ok=0 wt=0 nbf=0 int=0

Allocs : ops=0 owt=0 abt=0

Retrvls: n=200055 ok=157332 wt=28965 nod=42723 nbf=0 int=0 oom=0

Retrvls: ops=200055 owt=13453 abt=0

Stores : n=7141275 ok=7141275 agn=0 nbf=0 oom=0

Stores : ops=903546 run=8044821 pgs=7141275 rxd=7141275 olm=0

VmScan : nos=31171518 gon=0 bsy=0 can=0

Ops    : pend=13454 run=1103601 enq=39680550 can=0 rej=0

Ops    : dfr=242 rel=1103601 gc=242

CacheOp: alo=0 luo=0 luc=0 gro=0

CacheOp: upo=0 dro=0 pto=0 atc=0 syn=0

CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
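
The hit-rate estimate can be reproduced from these counters. The following is a minimal sketch that parses the "Section: key=value" lines and computes two ratios. It assumes that Retrvls `ok` counts retrievals completed from the cache and `nod` counts retrievals that found no data, and that Lookups `pos`/`neg` are positive and negative cache lookups; the function names are made up for illustration. On a live system the text would normally come from /proc/fs/fscache/stats.

```python
import re
from collections import defaultdict

# Three lines copied from the statistics above.
SAMPLE = """\
Lookups: n=145226 neg=42614 pos=102612 crt=42614 tmo=0
Retrvls: n=200055 ok=157332 wt=28965 nod=42723 nbf=0 int=0 oom=0
Retrvls: ops=200055 owt=13453 abt=0
"""


def parse_fscache_stats(text: str) -> dict:
    """Parse 'Section: key=value ...' lines into {section: {key: int}}.

    Sections that span several lines (e.g. Retrvls) are merged.
    """
    stats = defaultdict(dict)
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip headings such as "FS-Cache statistics"
        for key, value in re.findall(r"(\w+)=(\d+)", rest):
            stats[name.strip()][key] = int(value)
    return dict(stats)


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    stats = parse_fscache_stats(SAMPLE)
    retr, look = stats["Retrvls"], stats["Lookups"]
    print(f"retrievals ok: {ratio(retr['ok'], retr['n']):.1%}")
    print(f"lookups positive: {ratio(look['pos'], look['n']):.1%}")
```

On the figures posted here this gives about 78.6% of retrievals completing successfully and about 70.7% of lookups being positive, which brackets the "around 75%" estimate.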

 

 

Mountstats:

Stats for 10.0.20.192:/live_clips mounted on /mnt/clips:

  NFS mount options:
ro,vers=3,rsize=32768,wsize=32768,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.20.192,mountvers=3,mountport=1234,mountproto=tcp,fsc,local_lock=none

  NFS server capabilities:
caps=0x3fc7,wtmult=512,dtsize=8192,bsize=0,namlen=255

  NFS security flavor: 1  pseudoflavor: 0

 

NFS byte counts:

  applications read 2517191085 bytes via read(2)

  applications wrote 0 bytes via write(2)

  applications read 0 bytes via O_DIRECT read(2)

  applications wrote 0 bytes via O_DIRECT write(2)

  client read 28169649739 bytes via NFS READ

  client wrote 0 bytes via NFS WRITE

 

RPC statistics:

  1795475 RPC requests sent, 1795474 RPC replies received (0 XIDs not found)

  average backlog queue length: 0

 

GETATTR:

        582165 ops (32%)        4 retrans (0%)  0 major timeouts

        avg bytes sent per op: 124      avg bytes received per op: 112

        backlog wait: 0.006519  RTT: 0.351234   total execute time: 0.378106 (milliseconds)

LOOKUP:

        154984 ops (8%)         0 retrans (0%)  0 major timeouts

        avg bytes sent per op: 151      avg bytes received per op: 238

        backlog wait: 0.006775  RTT: 262.135511         total execute time: 262.158558 (milliseconds)

ACCESS:

        176336 ops (9%)         1 retrans (0%)  0 major timeouts

        avg bytes sent per op: 128      avg bytes received per op: 120

        backlog wait: 0.004304  RTT: 1.463734   total execute time: 1.482204 (milliseconds)

READ:

        881948 ops (49%)        0 retrans (0%)  0 major timeouts

        avg bytes sent per op: 136      avg bytes received per op: 32068

        backlog wait: 0.175209  RTT: 40.025127  total execute time: 40.232458 (milliseconds)

FSINFO:

        2 ops (0%)      0 retrans (0%)  0 major timeouts

        avg bytes sent per op: 136      avg bytes received per op: 164

        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)

PATHCONF:

        1 ops (0%)      0 retrans (0%)  0 major timeouts

        avg bytes sent per op: 136      avg bytes received per op: 140

        backlog wait: 0.000000  RTT: 0.000000   total execute time: 0.000000 (milliseconds)
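
To see which operations account for the most cumulative wait, the op counts can be multiplied by the execute times. This sketch assumes the "total execute time" figures are per-operation averages in milliseconds; the numbers are copied from the output above and the function names are only illustrative.

```python
# (ops, average total execute time in ms) copied from the mountstats output.
OPS = {
    "GETATTR": (582165, 0.378106),
    "LOOKUP": (154984, 262.158558),
    "ACCESS": (176336, 1.482204),
    "READ": (881948, 40.232458),
}


def total_seconds(ops: int, avg_ms: float) -> float:
    """Cumulative time spent in an operation, in seconds."""
    return ops * avg_ms / 1000.0


def rank_by_total_time(table: dict) -> list:
    """Return [(name, cumulative_seconds)] sorted from slowest to fastest."""
    totals = [(name, total_seconds(ops, avg)) for name, (ops, avg) in table.items()]
    return sorted(totals, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, seconds in rank_by_total_time(OPS):
        print(f"{name:8s} {seconds:10.0f} s")
```

Under that assumption LOOKUP, at roughly 262 ms per call, adds up to about 40,600 seconds and READ to about 35,500 seconds, so LOOKUP accounts for slightly more cumulative wait than READ even though it makes up only 8% of the calls.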

 

[root@jrclips ~]# yum list cachefilesd

Installed Packages

cachefilesd.x86_64                0.10.1-2.el6

 

[root@jrclips ~]# uname -a

Linux jrclips 2.6.32-131.17.1.el6.x86_64 #1 SMP Wed Oct 5 17:19:54 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux

 

 

Changes to sysctl.conf

# Make sure this is set before any NFS file systems are mounted

sunrpc.tcp_slot_table_entries = 128   

 

# These ensure that TIME_WAIT ports either get reused or closed fast.

net.ipv4.tcp_fin_timeout = 1

net.ipv4.tcp_tw_recycle = 1

  

# TCP memory

net.core.rmem_max = 16777216

net.core.rmem_default = 16777216

net.core.wmem_max = 16777216

net.core.wmem_default = 16777216

net.core.netdev_max_backlog = 262144

net.core.somaxconn = 262144

 

net.ipv4.tcp_syncookies = 1

net.ipv4.tcp_max_orphans = 262144

net.ipv4.tcp_max_syn_backlog = 262144

net.ipv4.tcp_synack_retries = 2

net.ipv4.tcp_syn_retries = 2

net.ipv4.tcp_rmem = 4096 262144 16777216

net.ipv4.tcp_wmem = 4096 262144 16777216
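
Because some of these settings only take effect if applied at the right time, it can be worth confirming that the running values match the file. The sketch below compares settings against /proc/sys, assuming the usual mapping of dots in a key to path separators (keys containing interface names with dots are not handled). The function names and the inline sample are illustrative only.

```python
import os

# A few of the settings listed above, used as sample input.
SAMPLE_CONF = """\
# Make sure this is set before any NFS file systems are mounted
sunrpc.tcp_slot_table_entries = 128
net.ipv4.tcp_rmem = 4096 262144 16777216
net.core.rmem_max = 16777216
"""


def sysctl_path(key: str, root: str = "/proc/sys") -> str:
    """Map a sysctl key such as net.core.rmem_max to its /proc/sys path."""
    return os.path.join(root, *key.split("."))


def parse_sysctl_conf(text: str) -> dict:
    """Return {key: value} with whitespace in values normalised."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = " ".join(value.split())
    return settings


def find_mismatches(settings: dict, root: str = "/proc/sys") -> dict:
    """Return {key: (wanted, actual)} where actual is None if unreadable."""
    mismatches = {}
    for key, wanted in settings.items():
        try:
            with open(sysctl_path(key, root)) as handle:
                actual = " ".join(handle.read().split())
        except OSError:
            actual = None
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches


if __name__ == "__main__":
    for key, (wanted, actual) in find_mismatches(parse_sysctl_conf(SAMPLE_CONF)).items():
        print(f"{key}: wanted {wanted!r}, running {actual!r}")
```

A value reported as None means the entry could not be read; for the sunrpc key that may simply mean the sunrpc module was not loaded when the check ran.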

 

[root@jrclips ~]# free

                      total       used       free     shared    buffers     cached

Mem:       2053992    1989756      64236          0     124620    1677612

-/+ buffers/cache:     187524    1866468

Swap:      4128760          0    4128760

 

 

 



