[dm-devel] Re: Mempool_alloc, bio_alloc_bioset deadlocks

Mon Aug 14 17:29:27 UTC 2006

(cc's added)

On Mon, 14 Aug 2006 19:14:09 +0300
"Pavel Mironchik" <tibor0 at gmail.com> wrote:

> Hi,
> 
> A few days ago a device-mapper raid1 deadlock was discovered.
> Andrew Morton made a patch for that bug in the mm tree:
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc4/2.6.18-rc4-mm1/broken-out/dm-fix-deadlock-under-high-i-o-load-in-raid1-setup.patch
> 
> However, I found that the problem is more serious and depends on mempool.
> I reproduced the very same situation on 2.6.17 with the device-mapper
> linear target.
> Here are my steps:
>  - I used a 2.6.17 kernel for xscale (arm), booted into an initrd image
>    (I assume the system is in the SYSTEM_BOOTING state in the initrd) !!!!
>  - With the help of evms.sf.net I made an md raid1 with a device-mapper
>    volume on top of it.
>  - Created an xfs volume: mkfs.xfs /dev/evms/vol ; mount /dev/evms/vol /mnt
>  - Ran: cat /dev/zero > /mnt/test &
>  - After some time the cat, pdflush and raid1d threads went into a
>    deadlock state, and I got the following (sysrq dump):
> 
> ....
> pdflush       D C02B5C88     0  1523      6          1583  1498 (L-TLB)
> [<c02b5740>] (schedule+0x0/0x620) from [<c02b665c>] (io_schedule+0x34/0x5c)
> [<c02b6628>] (io_schedule+0x0/0x5c) from [<c005bca4>] (mempool_alloc+0xbc/0xd8)
>  r5 = C05EC3A0  r4 = 00011210
> [<c005bbe8>] (mempool_alloc+0x0/0xd8) from [<c007c9fc>]
> (bio_alloc_bioset+0xd4/0x144)
>  r8 = C05EC3E0  r7 = 00000010  r6 = C6D764A0  r5 = 00000000
>  r4 = 0000000C
> [<c007c928>] (bio_alloc_bioset+0x0/0x144) from [<c007ccb4>]
> (bio_clone+0x24/0x48)
>  r8 = C6D76500  r7 = C6D6C120  r6 = C6D6C120  r5 = 00000004
>  r4 = C6D76500
> [<c007cc90>] (bio_clone+0x0/0x48) from [<bf02f72c>]
> (make_request+0x4b4/0x65c [raid1])
>  r4 = 00000004
> [<bf02f278>] (make_request+0x0/0x65c [raid1]) from [<c01a11bc>]
> (generic_make_request+0x1e4/0x204)
> [<c01a0fd8>] (generic_make_request+0x0/0x204) from [<c023c3d4>]
> (__map_bio+0x78/0xb8)
> [<c023c35c>] (__map_bio+0x0/0xb8) from [<c023c664>] (__split_bio+0x1dc/0x544)
>  r6 = CFC27ADC  r5 = CFC27B0C  r4 = CFC27AFC
> [<c023c488>] (__split_bio+0x0/0x544) from [<c023cadc>] (dm_request+0x110/0x120)
> [<c023c9cc>] (dm_request+0x0/0x120) from [<c01a11bc>]
> (generic_make_request+0x1e4/0x204)
>  r6 = C6D76560  r5 = 00000000  r4 = 00000008
> [<c01a0fd8>] (generic_make_request+0x0/0x204) from [<c01a12a8>]
> (submit_bio+0xcc/0xf0)
> [<c01a11dc>] (submit_bio+0x0/0xf0) from [<c0079fb0>] (submit_bh+0x178/0x1a8)
>  r7 = C6D76560  r6 = 00000000  r5 = 0001568A  r4 = 00000000
> [<c0079e38>] (submit_bh+0x0/0x1a8) from [<c01881bc>]
> (xfs_submit_page+0xdc/0x124)
> [<c01880e0>] (xfs_submit_page+0x0/0x124) from [<c0188470>]
> (xfs_convert_page+0x26c/0x28c)
> 
> and this ...
> 
> md255_raid1   D C02B5C88     0  1490      6          1498  1482 (L-TLB)
> [<c02b5740>] (schedule+0x0/0x620) from [<c02b665c>] (io_schedule+0x34/0x5c)
> [<c02b6628>] (io_schedule+0x0/0x5c) from [<c005bca4>] (mempool_alloc+0xbc/0xd8)
>  r5 = C05EC3A0  r4 = 00011210
> [<c005bbe8>] (mempool_alloc+0x0/0xd8) from [<c007c9fc>]
> (bio_alloc_bioset+0xd4/0x144)
>  r8 = C05EC3E0  r7 = 00000010  r6 = C6D76440  r5 = 00000000
>  r4 = 0000000C
> [<c007c928>] (bio_alloc_bioset+0x0/0x144) from [<c007ccb4>]
> (bio_clone+0x24/0x48)
>  r8 = 00000000  r7 = 00000000  r6 = 0009F4D8  r5 = 00000000
>  r4 = CC09F860
> [<c007cc90>] (bio_clone+0x0/0x48) from [<c023c45c>] (clone_bio+0x28/0x54)
>  r4 = 00000001
> [<c023c434>] (clone_bio+0x0/0x54) from [<c023c654>] (__split_bio+0x1cc/0x544)
>  r7 = 00000000  r6 = CECB5DD0  r5 = CECB5E00  r4 = CECB5DF0
> [<c023c488>] (__split_bio+0x0/0x544) from [<c023cadc>] (dm_request+0x110/0x120)
> [<c023c9cc>] (dm_request+0x0/0x120) from [<c01a11bc>]
> (generic_make_request+0x1e4/0x204)
>  r6 = CC09F860  r5 = 00000000  r4 = 00000008
> [<c01a0fd8>] (generic_make_request+0x0/0x204) from [<bf02fff4>]
> (raid1d+0xa4/0xf18 [raid1])
> [<bf02ff50>] (raid1d+0x0/0xf18 [raid1]) from [<c0235a60>]
> (md_thread+0x124/0x140)
> 
> 
> You can see that those threads are stuck inside mempool_alloc.
> I have prepared a patch:
> 
> diff --git a/mm/mempool.c b/mm/mempool.c
> index fe6e052..10a7b1e 100644
> --- a/mm/mempool.c
> +++ b/mm/mempool.c
> @@ -239,7 +239,7 @@ repeat_alloc:
>         prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
>         smp_mb();
>         if (!pool->curr_nr)
> -               io_schedule();
> +               io_schedule_timeout(5*HZ);
>         finish_wait(&pool->wait, &wait);
> 
>         goto repeat_alloc;
> 
> 
> I suppose this could be another solution for the raid1
> deadlock problem described here:
> http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc4/2.6.18-rc4-mm1/broken-out/dm-fix-deadlock-under-high-i-o-load-in-raid1-setup.patch
> Of course, that patch helped me with my device-mapper issues.
> 
> Please don't judge my patch too rigorously; it is a way of avoiding
> the problem, not of solving it.

dm has a number of problems in this area - the developers are aware of them
and are working on getting them fixed, but it's apparently fairly complex.

The above patch is pretty harmless and I'd be OK with putting it into
2.6.18 as a temporary make-dm-mostly-work-for-release thing.

However I'd want to take it out again in 2.6.19-rc1, because the patch will
convert a complete hang into extreme-suckiness-but-it-appears-to-work. 
We'd prefer the hang, so the problem gets noticed and fixed for real.
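
To make the hang-versus-slow-progress point concrete, here is a rough
user-space sketch of the wait loop, not the kernel code. Everything named
toy_* is hypothetical, and so is the assumption that the backing allocator
recovers by itself after a few failed attempts; the trace above does not
prove that this is what happens in the dm case. The order of checks
(backing allocator, then reserve, then sleep and repeat) follows
mempool_alloc().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a mempool; not the kernel's struct. */
struct toy_pool {
    pthread_mutex_t lock;
    pthread_cond_t  wait;      /* plays the role of pool->wait */
    int             curr_nr;   /* reserved elements still available */
    int             fail_left; /* backing allocator fails this many times */
};

/* Assumed behaviour: memory comes back without anyone freeing to the pool. */
static bool toy_backing_alloc(struct toy_pool *p)
{
    if (p->fail_left > 0) {
        p->fail_left--;
        return false;
    }
    return true;
}

/* Returns how many allocation attempts were needed before one succeeded. */
static int toy_pool_alloc(struct toy_pool *p, long timeout_ms)
{
    int attempts = 0;

    pthread_mutex_lock(&p->lock);
    for (;;) {
        struct timespec deadline;

        attempts++;
        if (toy_backing_alloc(p))
            break;                      /* normal allocation worked */
        if (p->curr_nr > 0) {
            p->curr_nr--;               /* take an element from the reserve */
            break;
        }

        /* Reserve is empty: sleep until woken or until the timeout expires.
         * An untimed pthread_cond_wait() here would correspond to
         * io_schedule(): with nobody left to signal, it never returns. */
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        deadline.tv_sec  += timeout_ms / 1000 + deadline.tv_nsec / 1000000000L;
        deadline.tv_nsec %= 1000000000L;
        pthread_cond_timedwait(&p->wait, &p->lock, &deadline);
        /* Woken, timed out, or spurious wakeup: retry from the top. */
    }
    pthread_mutex_unlock(&p->lock);
    return attempts;
}

/* Empty reserve, no thread ever frees, allocator fails three times. */
int toy_demo(void)
{
    struct toy_pool p = { .curr_nr = 0, .fail_left = 3 };
    int attempts;

    pthread_mutex_init(&p.lock, NULL);
    pthread_cond_init(&p.wait, NULL);
    attempts = toy_pool_alloc(&p, 10);  /* 10 ms instead of 5*HZ */
    assert(p.fail_left == 0);
    pthread_cond_destroy(&p.wait);
    pthread_mutex_destroy(&p.lock);
    return attempts;
}
```

Calling toy_demo() from a main() (build with -pthread) returns after the
fourth attempt, having slept through three timeouts. That is the
"appears to work" case: each stalled allocation costs a full timeout, which
at 5*HZ means seconds per bio.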

Alasdair, what do you think?