[Linux-cachefs] Kernel BUG: CacheFiles: Error: Unexpected object collision

Tue May 18 16:45:00 UTC 2010

On Wed, May 12, 2010 at 2:28 PM, Mark Moseley <moseleymark at gmail.com> wrote:
> I've been running cachefilesd 0.10.1 since yesterday on this box and
> got this (attached) BUG traceback. System was unresponsive after that.
> Kernel is 2.6.33.3 with the suite of patches that David H put out the
> other day in the thread "Possible patch for CacheFiles: I/O Error:
> Unlink failed" (I actually applied the broken-out patches repackaged
> by Romain DEGEZ). Without the patches, cachefilesd dies after about 45
> minutes with the "Unlink failed" error. With the patches, it ran from
> yesterday afternoon until a few minutes ago, when it died with this
> error (which I hadn't seen before). The system is a Dell PowerEdge
> 1950, running Debian Lenny 32-bit, with a fairly NFS-intensive
> workload. I don't have the exact disk usage from right before it died,
> but a 'df' about 30 minutes earlier showed a little under 9 GB used in
> the cache (with 58 GB free). I didn't run 'df -i' recently, so I don't
> know how many entries were in there, but the vast majority are HTML and
> image files, probably in the 1-100 KB range, so there are quite a few
> entries.
>
> A few hours ago I happened to look at the /proc stats (but not since,
> so these are from a few hours before the BUG):
>
> # cat /proc/fs/fscache/stats
> FS-Cache statistics
> Cookies: idx=5436 dat=876190 spc=0
> Objects: alc=796439 nal=0 avl=796402 ded=687802
> ChkAux : non=0 ok=457637 upd=0 obs=572
> Pages  : mrk=3678901 unc=3250437
> Acquire: n=881626 nul=0 noc=0 ok=881626 nbf=0 oom=0
> Lookups: n=796793 neg=339151 pos=457288 crt=339151 tmo=354
> Updates: n=0 nul=0 run=0
> Relinqs: n=772550 nul=0 wcr=37 rtr=14494
> AttrChg: n=0 ok=0 nbf=0 oom=0 run=0
> Allocs : n=0 ok=0 wt=0 nbf=0 int=0
> Allocs : ops=0 owt=0 abt=0
> Retrvls: n=1008965 ok=541779 wt=229502 nod=376000 nbf=91186 int=0 oom=0
> Retrvls: ops=917779 owt=232622 abt=0
> Stores : n=1547630 ok=1547630 agn=0 nbf=0 oom=0
> Stores : ops=387729 run=1935352 pgs=1547623 rxd=1547630 olm=0
> VmScan : nos=3218178 gon=0 bsy=4 can=7
> Ops    : pend=232917 run=1305508 enq=4102804 can=0 rej=0
> Ops    : dfr=903 rel=1305508 gc=903
> CacheOp: alo=0 luo=0 luc=0 gro=0
> CacheOp: upo=0 dro=0 pto=0 atc=0 syn=0
> CacheOp: rap=0 ras=0 alp=0 als=0 wrp=0 ucp=0 dsp=0
>
>
> # cat /proc/fs/cachefiles/histogram
> JIFS  SECS  LOOKUPS   MKDIRS    CREATES
> ===== ===== ========= ========= =========
>   0  0.000   1694602      7529    336885
>   1  0.010      4126        50      1758
>   2  0.020       201         4       139
>   3  0.030        46         1        42
>   4  0.040        21         1        26
>   5  0.050        22         0        21
>   6  0.060        13         0        24
>   7  0.070         4         0        15
>   8  0.080        14         0        21
>   9  0.090        11         0        16
>  10  0.100         8         0        14
>  11  0.110         7         0        13
>  12  0.120         6         0        13
>  13  0.130        13         0         8
>  14  0.140        10         1        16
>  15  0.150         6         1         9
>  16  0.160         5         0         7
>  17  0.170         3         2        12
>  18  0.180         3         0         8
>  19  0.190         3         0         5
>  20  0.200         4         0         9
>  21  0.210         5         0         3
>  22  0.220         3         0        11
>  23  0.230         0         0         6
>  24  0.240         1         0         7
>  25  0.250         1         0         9
>  26  0.260         4         0         7
>  27  0.270         1         0         4
>  28  0.280         1         0         6
>  29  0.290         4         0         1
>  30  0.300         1         0         7
>  31  0.310         1         0         5
>  32  0.320         2         0         5
>  33  0.330         2         0         3
>  34  0.340         1         1         2
>  35  0.350         0         0         6
>  36  0.360         0         0         1
>  37  0.370         0         1         4
>  38  0.380         0         0         6
>  39  0.390         0         0         1
>  40  0.400         0         0         5
>  41  0.410         2         0         4
>  42  0.420         0         0         4
>  43  0.430         0         0         5
>  44  0.440         1         0         4
>  45  0.450         1         0         1
>  46  0.460         1         0         1
>  47  0.470         0         0         2
>  48  0.480         0         0         2
>  49  0.490         0         0         2
>  51  0.510         0         0         3
>  52  0.520         1         0         3
>  53  0.530         0         0         3
>  54  0.540         0         0         2
>  55  0.550         0         0         1
>  56  0.560         0         0         2
>  57  0.570         0         0         1
>  58  0.580         0         0         2
>  59  0.590         1         0         2
>  60  0.600         0         0         2
>  61  0.610         0         0         1
>  62  0.620         1         0         1
>  66  0.660         0         0         1
>  69  0.690         0         0         1
>  71  0.710         0         1         0
>  72  0.720         0         0         1
>  73  0.730         0         0         1
>  74  0.740         0         0         1
>  76  0.760         0         0         2
>  78  0.780         0         0         1
>  81  0.810         0         0         2
>  82  0.820         0         0         2
>  83  0.830         0         0         1
>  89  0.890         0         0         1
>  99  0.990         0         0         6
>
> I'll be happy to try anything out, either patch-wise or research-wise. thx
>

Has anybody else seen this error before? The comment in the code says:

    /* an old object from a previous incarnation is hogging the slot - we
     * need to wait for it to be destroyed */
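To make the comment concrete: it describes a wait-for-teardown pattern, where a lookup that finds a stale object still occupying its slot blocks until that object is destroyed, instead of failing. Below is a minimal user-space sketch of that pattern only, assuming POSIX threads. `struct slot`, `slot_acquire()` and the other names are hypothetical, and this is not the CacheFiles code, which uses its own kernel data structures and wait primitives. In this sketch, finding a live (non-dying) owner in the slot is treated as the "unexpected collision" case.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical slot: at most one object may own it at a time. */
struct slot {
    pthread_mutex_t lock;
    pthread_cond_t  freed;    /* signalled when the owner is destroyed */
    int             owner_id; /* 0 means the slot is free */
    bool            owner_dying;
};

void slot_init(struct slot *s)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->freed, NULL);
    s->owner_id = 0;
    s->owner_dying = false;
}

/* Claim the slot for new_id. Returns 0 on success, or -1 if a live
 * (non-dying) owner already holds it, i.e. an unexpected collision. */
int slot_acquire(struct slot *s, int new_id)
{
    pthread_mutex_lock(&s->lock);
    /* An old object is still being torn down: wait for it to go away. */
    while (s->owner_id != 0 && s->owner_dying)
        pthread_cond_wait(&s->freed, &s->lock);
    if (s->owner_id != 0) {
        pthread_mutex_unlock(&s->lock);
        return -1;
    }
    s->owner_id = new_id;
    s->owner_dying = false;
    pthread_mutex_unlock(&s->lock);
    return 0;
}

/* Teardown has started: the owner is now "dying". */
void slot_begin_destroy(struct slot *s)
{
    pthread_mutex_lock(&s->lock);
    s->owner_dying = true;
    pthread_mutex_unlock(&s->lock);
}

/* Teardown is complete: free the slot and wake every waiter. */
void slot_finish_destroy(struct slot *s)
{
    pthread_mutex_lock(&s->lock);
    assert(s->owner_dying); /* must follow slot_begin_destroy() */
    s->owner_id = 0;
    s->owner_dying = false;
    pthread_cond_broadcast(&s->freed);
    pthread_mutex_unlock(&s->lock);
}
```

In use, a second `slot_acquire()` on a slot whose owner is mid-teardown sleeps until another thread calls `slot_finish_destroy()`, while one on a slot with a healthy owner returns -1 immediately. The `while` loop around `pthread_cond_wait()` guards against spurious wakeups.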

If it's an object hanging around since a previous incarnation, does
that mean that it's better to wipe the cache/ directory at each
startup?