[Linux-cachefs] cachefiles bug
Romain DEGEZ
romain.degez at smartjog.com
Mon Mar 29 16:33:31 UTC 2010
Dear David,
First of all, thanks for your work.
It looks very promising, as we have been missing such nice functionality in
the kernel for so long!
Our production setup consists of 4 servers with 16 GB of RAM and dual
quad-core Xeon L5410 processors, running a 2.6.33-2-amd64 Debian kernel.
These servers are used to serve files over HTTP (using Apache or lighttpd).
The files are all located on a remote NFS server and cached locally by
fs-cache and cachefilesd on a local 2-disk RAID1 array with a 250 GB ext4
filesystem mounted on /var/cache/fscache.
The NFS filesystem is mounted as follows:
x.x.x.x:/data on /data type nfs (ro,noatime,tcp,soft,fsc,addr=x.x.x.x)
cachefilesd.conf is:
dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
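For readers checking a similar configuration, here is a minimal Python sketch that parses the directives above and verifies the ordering the cachefilesd.conf man page requires, namely 0 <= bstop < bcull < brun < 100 (and the same for the f* limits). The helper names `parse_cachefilesd_conf` and `limits_ok` are hypothetical and not part of cachefilesd.

```python
import re

def parse_cachefilesd_conf(text):
    """Return {directive: value}; percentage values are converted to ints."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition(" ")
        value = value.strip()
        m = re.fullmatch(r"(\d+)%", value)
        conf[key] = int(m.group(1)) if m else value
    return conf

def limits_ok(conf, prefix):
    """Check 0 <= stop < cull < run < 100 for prefix 'b' (blocks) or 'f' (files)."""
    stop, cull, run = (conf[prefix + s] for s in ("stop", "cull", "run"))
    return 0 <= stop < cull < run < 100

CONF = """\
dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
"""

conf = parse_cachefilesd_conf(CONF)
print("block limits ordered:", limits_ok(conf, "b"))
print("file limits ordered:", limits_ok(conf, "f"))
```

The configuration quoted above satisfies both orderings, so the limits themselves are unlikely to be the cause of the errors.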
# cat /proc/fs/fscache/stats
FS-Cache statistics
Cookies: idx=3 dat=2880 spc=0
Objects: alc=2484 nal=0 avl=2484 ded=2462
ChkAux : non=0 ok=2131 upd=0 obs=70
Pages : mrk=15802814 unc=14993041
Acquire: n=2883 nul=0 noc=252 ok=2631 nbf=252 oom=0
Lookups: n=2484 neg=343 pos=2141 crt=0 tmo=343
Updates: n=0 nul=0 run=0
Relinqs: n=1721 nul=0 wcr=0 rtr=20
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=14741 ok=5400 wt=452 nod=693 nbf=8648 int=0 oom=0
Retrvls: ops=6093 owt=112 abt=0
Stores : n=1972991 ok=1972776 agn=0 nbf=215 oom=0
Stores : ops=999 run=1965351 pgs=1964352 rxd=1972776 olm=0
VmScan : nos=14959114 gon=0 bsy=10 can=8424
Ops : pend=112 run=7092 enq=16438335 can=0 rej=0
Ops : dfr=0 rel=7092 gc=0
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
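To make the counters above easier to compare across servers, here is a small Python sketch that parses this output into nested dictionaries and prints retrieval ratios. The function name `parse_fscache_stats` is hypothetical. The reading of `ok` as successful retrieval requests and `nbf` as requests rejected with -ENOBUFS follows the kernel's FS-Cache documentation as I understand it, so treat that as an assumption.

```python
from collections import defaultdict

def parse_fscache_stats(text):
    """Parse /proc/fs/fscache/stats text into {label: {counter: int}}.

    Labels that appear on several lines (e.g. "Retrvls") are merged.
    """
    stats = defaultdict(dict)
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # the title line has no colon
        for field in rest.split():
            key, eq, value = field.partition("=")
            if eq and value.isdigit():
                stats[label.strip()][key] = int(value)
    return stats

# Excerpt of the output quoted above.
SAMPLE = """\
FS-Cache statistics
Retrvls: n=14741 ok=5400 wt=452 nod=693 nbf=8648 int=0 oom=0
Retrvls: ops=6093 owt=112 abt=0
Stores : n=1972991 ok=1972776 agn=0 nbf=215 oom=0
"""

stats = parse_fscache_stats(SAMPLE)
r = stats["Retrvls"]
print(f"retrievals ok:  {r['ok'] / r['n']:.1%}")
print(f"retrievals nbf: {r['nbf'] / r['n']:.1%}")
```

On the figures above, well over half of the retrieval requests fall into the `nbf` bucket, which fits a cache that is frequently unavailable.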
We are seeing a lot of these errors in dmesg on all our servers:
[ 4868.465413] CacheFiles: I/O Error: Unlink failed
[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
[ 4947.320011] CacheFiles: File cache on md3 unregistering
[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
[ 5127.348716] CacheFiles: File cache on md3 registered
[ 7076.871081] CacheFiles: I/O Error: Unlink failed
[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
[ 7116.780891] CacheFiles: File cache on md3 unregistering
[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
[ 7296.813432] CacheFiles: File cache on md3 registered
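To quantify how long the cache is offline in each cycle, here is a rough Python sketch that pairs every "stopped due to I/O error" message with the next "registered" message and reports the gap. The function name `cache_outages` is hypothetical, and the matching relies only on the message wording seen in the log above.

```python
import re

# Matches "[ 4868.465413] message" and captures the timestamp and the message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

def cache_outages(dmesg_text):
    """Return (stopped_at, registered_at) pairs, in seconds since boot."""
    outages, stopped_at = [], None
    for line in dmesg_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "stopped due to I/O error" in msg:
            stopped_at = ts
        elif msg.endswith("registered") and stopped_at is not None:
            outages.append((stopped_at, ts))
            stopped_at = None
    return outages

LOG = """\
[ 4868.465413] CacheFiles: I/O Error: Unlink failed
[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
[ 4947.320011] CacheFiles: File cache on md3 unregistering
[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
[ 5127.348716] CacheFiles: File cache on md3 registered
[ 7076.871081] CacheFiles: I/O Error: Unlink failed
[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
[ 7116.780891] CacheFiles: File cache on md3 unregistering
[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
[ 7296.813432] CacheFiles: File cache on md3 registered
"""

outages = cache_outages(LOG)
for stopped, registered in outages:
    print(f"cache offline for {registered - stopped:.0f} s (stopped at {stopped:.0f} s)")
```

In the excerpt above each outage lasts a few minutes, and the second failure occurs roughly 37 minutes after the first.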
This is very painful, as it renders the cache useless.
Looking at the source code, the cause of the "I/O Error: Unlink failed"
message, which seems to occur somewhere after the "bury_something" function
is called, looks pretty obscure to me.
I don't see why any unlink would fail.
I have been monitoring this list for some time and have tried all the various
patches, without success.
Could you please give me a hand troubleshooting this issue?
Regards,
--
RD