[Linux-cachefs] cachefiles bug

Romain DEGEZ romain.degez at smartjog.com
Mon Mar 29 16:33:31 UTC 2010


Dear David,

First of all, thanks for your work.
It looks very promising, as functionality like this has been missing from the
kernel for so long!

We run a production setup of 4 servers, each with 16 GB of RAM and dual
quad-core Xeon L5410 processors, running a 2.6.33-2-amd64 Debian kernel.

These servers are used to serve files over HTTP (using apache or lighttpd).

These files are all located on a remote NFS server and cached locally by
fs-cache and cachefilesd on a local 2-disk RAID1 array with a 250 GB ext4
filesystem mounted on /var/cache/fscache.

The NFS filesystem is mounted as follows:
x.x.x.x:/data on /data type nfs (ro,noatime,tcp,soft,fsc,addr=x.x.x.x)

cachefilesd.conf is:

dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
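
As a sanity check on the culling limits above, here is a minimal Python sketch
that parses them and verifies the ordering I understand cachefilesd to require
(0 <= stop < cull < run < 100, for both the block and the file limits). The
helper names parse_limits and limits_ok are made up for this illustration and
are not part of cachefilesd.

```python
import re

# The culling limits from the cachefilesd.conf quoted above.
CONF_TEXT = """\
dir /var/cache/fscache
tag mycache
brun 10%
bcull 7%
bstop 3%
frun 10%
fcull 7%
fstop 3%
"""


def parse_limits(text):
    """Return the culling limits as a dict of name -> integer percent."""
    limits = {}
    for line in text.splitlines():
        m = re.match(r"^\s*([bf](?:run|cull|stop))\s+(\d+)%\s*$", line)
        if m:
            limits[m.group(1)] = int(m.group(2))
    return limits


def limits_ok(limits):
    """Check 0 <= stop < cull < run < 100 for blocks (b) and files (f).

    Assumption: this is the ordering rule described in the cachefilesd.conf
    manual page; the daemon itself is not consulted here.
    """
    return all(
        0 <= limits[p + "stop"] < limits[p + "cull"] < limits[p + "run"] < 100
        for p in "bf"
    )


limits = parse_limits(CONF_TEXT)
print(limits, "ordering ok:", limits_ok(limits))
```

If that rule is right, the limits in this configuration are consistent, so the
configuration itself does not look like the cause of the errors below.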

# cat /proc/fs/fscache/stats

FS-Cache statistics
Cookies: idx=3 dat=2880 spc=0
Objects: alc=2484 nal=0 avl=2484 ded=2462
ChkAux : non=0 ok=2131 upd=0 obs=70
Pages  : mrk=15802814 unc=14993041
Acquire: n=2883 nul=0 noc=252 ok=2631 nbf=252 oom=0
Lookups: n=2484 neg=343 pos=2141 crt=0 tmo=343
Updates: n=0 nul=0 run=0
Relinqs: n=1721 nul=0 wcr=0 rtr=20
AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
Allocs : n=0 ok=0 wt=0 nbf=0 int=0
Allocs : ops=0 owt=0 abt=0
Retrvls: n=14741 ok=5400 wt=452 nod=693 nbf=8648 int=0 oom=0
Retrvls: ops=6093 owt=112 abt=0
Stores : n=1972991 ok=1972776 agn=0 nbf=215 oom=0
Stores : ops=999 run=1965351 pgs=1964352 rxd=1972776 olm=0
VmScan : nos=14959114 gon=0 bsy=10 can=8424
Ops    : pend=112 run=7092 enq=16438335 can=0 rej=0
Ops    : dfr=0 rel=7092 gc=0
CacheOp: alo=0 luo=0 luc=0 gro=0
CacheOp: upo=0 dro=0 pto=0 atc=0 syn=0
CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
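
To make the counters above easier to read, here is a small Python sketch that
parses this output and reports how many retrieval requests succeeded. It
assumes, from my reading of the FS-Cache documentation, that "ok" counts
successful retrievals and "nbf" counts requests rejected with ENOBUFS; the
function name parse_stats is made up for this illustration.

```python
from collections import defaultdict

# A few lines copied from the stats output above.
STATS_SAMPLE = """\
FS-Cache statistics
Retrvls: n=14741 ok=5400 wt=452 nod=693 nbf=8648 int=0 oom=0
Retrvls: ops=6093 owt=112 abt=0
Stores : n=1972991 ok=1972776 agn=0 nbf=215 oom=0
"""


def parse_stats(text):
    """Parse 'Label: key=value ...' lines into {label: {key: int}}.

    Lines that repeat a label (e.g. the two 'Retrvls' lines) are merged
    into a single dict for that label.
    """
    stats = defaultdict(dict)
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # title line, no counters
        for field in rest.split():
            key, eq, value = field.partition("=")
            if eq and value.isdigit():
                stats[label.strip()][key] = int(value)
    return stats


stats = parse_stats(STATS_SAMPLE)
retr = stats["Retrvls"]
ok_pct = 100.0 * retr["ok"] / retr["n"]
nbf_pct = 100.0 * retr["nbf"] / retr["n"]
print("retrievals ok: %.1f%%, rejected (nbf): %.1f%%" % (ok_pct, nbf_pct))
```

On a live system the same function can be fed the contents of
/proc/fs/fscache/stats instead of the sample string.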


We are seeing a lot of these errors in dmesg on all our servers:

[ 4868.465413] CacheFiles: I/O Error: Unlink failed
[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
[ 4947.320011] CacheFiles: File cache on md3 unregistering
[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
[ 5127.348716] CacheFiles: File cache on md3 registered
[ 7076.871081] CacheFiles: I/O Error: Unlink failed
[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
[ 7116.780891] CacheFiles: File cache on md3 unregistering
[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
[ 7296.813432] CacheFiles: File cache on md3 registered
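
To quantify how long the cache stays offline in each cycle, here is a minimal
Python sketch that pairs every "stopped due to I/O error" message with the next
"added (type cachefiles)" message and reports the gap. It only relies on the
message texts and timestamps shown above; the function name outages is made up
for this illustration.

```python
import re

# The dmesg excerpt quoted above.
DMESG_SAMPLE = """\
[ 4868.465413] CacheFiles: I/O Error: Unlink failed
[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
[ 4947.320011] CacheFiles: File cache on md3 unregistering
[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
[ 5127.348716] CacheFiles: File cache on md3 registered
[ 7076.871081] CacheFiles: I/O Error: Unlink failed
[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
[ 7116.780891] CacheFiles: File cache on md3 unregistering
[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
[ 7296.813432] CacheFiles: File cache on md3 registered
"""

LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def outages(text):
    """Return a list of (stopped_at, added_at) timestamp pairs in seconds."""
    result = []
    stopped_at = None
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        if "stopped due to I/O error" in msg:
            stopped_at = ts
        elif "added (type cachefiles)" in msg and stopped_at is not None:
            result.append((stopped_at, ts))
            stopped_at = None
    return result


for stopped, added in outages(DMESG_SAMPLE):
    print("cache offline for %.1f s (from %.1f to %.1f)"
          % (added - stopped, stopped, added))
```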

This is very painful, as it renders the cache useless.

Looking at the source code, the cause of the "I/O Error: Unlink failed"
message, which seems to be raised somewhere after the "bury_something"
function is called, is not obvious to me.

I don't see why an unlink would fail.

I have been monitoring this list for some time and have tried all the various
patches, without success.

Could you please give me a hand troubleshooting this issue?

Regards,

-- 
RD



