[Linux-cachefs] Kernel trace for 2.6.36.2
Mark Moseley
moseleymark at gmail.com
Sat Jan 22 00:41:42 UTC 2011
On Tue, Jan 4, 2011 at 10:00 AM, Mark Moseley <moseleymark at gmail.com> wrote:
> Latest testing on a 2.6.36.2 kernel, 64-bit, Debian Lenny. Ran under
> heavy load for a few hours before this:
>
> [38607.552070] IP: [<ffffffff811a4187>]
> __fscache_read_or_alloc_pages+0x157/0x360
> [38607.567128] last sysfs file:
> /sys/devices/system/cpu/cpu7/cache/index2/shared_cpu_map
> [38607.604773] Pid: 30970, comm: httpd Not tainted 2.6.36.2 #1
> 0XDN97/PowerEdge R610
> [38607.612286] RIP: 0010:[<ffffffff811a4187>] [<ffffffff811a4187>]
> __fscache_read_or_alloc_pages+0x157/0x360
> [38607.734181] [<ffffffff81248d4b>] __nfs_readpages_from_fscache+0x6b/0x1d0
> [38607.740991] [<ffffffff8122a674>] nfs_readpages+0xd4/0x200
Just to update: I was testing on 2.6.37 and got the exact same
BUG+traceback. It was on a 30gb partition on an SSD, 2.6.37 64-bit,
Debian Lenny. It ran about 24 hrs (and was up to the cachefilesd.conf
limits after a few hours) before hitting this. It would do the same as
2.6.36.2 and every handful of seconds print out "CacheFiles: Error:
Overlong wait for old active object to go away". I'm no good at
reading assembly but it's around here in fs/fscache/page.c in
__fscache_read_or_alloc_pages:
/* we wait for the operation to become active, and then process it
* *here*, in this thread, and not in the thread pool */
ret = fscache_wait_for_retrieval_activation(
object, op,
__fscache_stat(&fscache_n_retrieval_op_waits),
__fscache_stat(&fscache_n_retrievals_object_dead));
if (ret < 0)
goto error;
/* ask the cache to honour the operation */
if (test_bit(FSCACHE_COOKIE_NO_DATA_YET, &object->cookie->flags)) {
fscache_stat(&fscache_n_cop_allocate_pages);
ret = object->cache->ops->allocate_pages(
op, pages, nr_pages, gfp);
fscache_stat_d(&fscache_n_cop_allocate_pages);
} else {
fscache_stat(&fscache_n_cop_read_or_alloc_pages);
ret = object->cache->ops->read_or_alloc_pages(
op, pages, nr_pages, gfp);
fscache_stat_d(&fscache_n_cop_read_or_alloc_pages);
}
Here's the surrounding lines around 0x157 in objdump:
0000000000001015 <__fscache_read_or_alloc_pages+0x145> mov %r14,%rdi
0000000000001018 <__fscache_read_or_alloc_pages+0x148> callq
0000000000000bb0 <fscache_wait_for_retrieval_activation>
000000000000101d <__fscache_read_or_alloc_pages+0x14d> mov %eax,%ebx
000000000000101f <__fscache_read_or_alloc_pages+0x14f> test %eax,%eax
0000000000001021 <__fscache_read_or_alloc_pages+0x151> js
0000000000001073 <__fscache_read_or_alloc_pages+0x1a3>
0000000000001023 <__fscache_read_or_alloc_pages+0x153> mov 0x70(%r14),%rax
0000000000001027 <__fscache_read_or_alloc_pages+0x157> mov 0x40(%rax),%rax
000000000000102b <__fscache_read_or_alloc_pages+0x15b> test $0x4,%al
000000000000102d <__fscache_read_or_alloc_pages+0x15d> je
0000000000001128 <__fscache_read_or_alloc_pages+0x258>
0000000000001033 <__fscache_read_or_alloc_pages+0x163> lock incl
0x0(%rip) # 000000000000103a
<__fscache_read_or_alloc_pages+0x16a>
000000000000103a <__fscache_read_or_alloc_pages+0x16a> jno
0000000000001045 <__fscache_read_or_alloc_pages+0x175>
000000000000103c <__fscache_read_or_alloc_pages+0x16c> lock decl
0x0(%rip) # 0000000000001043
<__fscache_read_or_alloc_pages+0x173>
More information about the Linux-cachefs
mailing list