[Linux-cachefs] Kernel trace for 2.6.36.2

Mark Moseley moseleymark at gmail.com
Sat Jan 22 00:41:42 UTC 2011


On Tue, Jan 4, 2011 at 10:00 AM, Mark Moseley <moseleymark at gmail.com> wrote:
> Latest testing on a 2.6.36.2 kernel, 64-bit, Debian Lenny. Ran under
> heavy load for a few hours before this:
>
> [38607.552070] IP: [<ffffffff811a4187>]
> __fscache_read_or_alloc_pages+0x157/0x360
> [38607.567128] last sysfs file:
> /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
> [38607.604773] Pid: 30970, comm: httpd Not tainted 2.6.36.2 #1
> 0XDN97/PowerEdge R610
> [38607.612286] RIP: 0010:[<ffffffff811a4187>]  [<ffffffff811a4187>]
> __fscache_read_or_alloc_pages+0x157/0x360
> [38607.734181]  [<ffffffff81248d4b>] __nfs_readpages_from_fscache+0x6b/0x1d0
> [38607.740991]  [<ffffffff8122a674>] nfs_readpages+0xd4/0x200

Just to update: I was testing on 2.6.37 and got the exact same
BUG+traceback. It was on a 30gb partition on an SSD, 2.6.37 64-bit,
Debian Lenny. It ran about 24 hrs (and was up to the cachefilesd.conf
limits after a few hours) before hitting this. It would do the same as
2.6.36.2 and every handful of seconds print out "CacheFiles: Error:
Overlong wait for old active object to go away". I'm no good at
reading assembly but it's around here in fs/fscache/page.c in
__fscache_read_or_alloc_pages:


        /* we wait for the operation to become active, and then process it
         * *here*, in this thread, and not in the thread pool */
        ret = fscache_wait_for_retrieval_activation(
                object, op,
                __fscache_stat(&fscache_n_retrieval_op_waits),
                __fscache_stat(&fscache_n_retrievals_object_dead));
        if (ret < 0)
                goto error;

        /* ask the cache to honour the operation */
        if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) {
                fscache_stat(&fscache_n_cop_allocate_pages);
                ret = object->cache->ops->allocate_pages(
                        op, pages, nr_pages, gfp);
                fscache_stat_d(&fscache_n_cop_allocate_pages);
        } else {
                fscache_stat(&fscache_n_cop_read_or_alloc_pages);
                ret = object->cache->ops->read_or_alloc_pages(
                        op, pages, nr_pages, gfp);
                fscache_stat_d(&fscache_n_cop_read_or_alloc_pages);
        }


Here's the surrounding lines around 0x157 in objdump:


0000000000001015 <__fscache_read_or_alloc_pages+0x145> mov    %r14,%rdi
0000000000001018 <__fscache_read_or_alloc_pages+0x148> callq
0000000000000bb0 <fscache_wait_for_retrieval_activation>
000000000000101d <__fscache_read_or_alloc_pages+0x14d> mov    %eax,%ebx
000000000000101f <__fscache_read_or_alloc_pages+0x14f> test   %eax,%eax
0000000000001021 <__fscache_read_or_alloc_pages+0x151> js
0000000000001073 <__fscache_read_or_alloc_pages+0x1a3>
0000000000001023 <__fscache_read_or_alloc_pages+0x153> mov    0x70(%r14),%rax
0000000000001027 <__fscache_read_or_alloc_pages+0x157> mov    0x40(%rax),%rax
000000000000102b <__fscache_read_or_alloc_pages+0x15b> test   $0x4,%al
000000000000102d <__fscache_read_or_alloc_pages+0x15d> je
0000000000001128 <__fscache_read_or_alloc_pages+0x258>
0000000000001033 <__fscache_read_or_alloc_pages+0x163> lock incl
0x0(%rip)        # 000000000000103a
<__fscache_read_or_alloc_pages+0x16a>
000000000000103a <__fscache_read_or_alloc_pages+0x16a> jno
0000000000001045 <__fscache_read_or_alloc_pages+0x175>
000000000000103c <__fscache_read_or_alloc_pages+0x16c> lock decl
0x0(%rip)        # 0000000000001043
<__fscache_read_or_alloc_pages+0x173>




More information about the Linux-cachefs mailing list