[linux-lvm] LVM crash maybe due to a drbd issue

Maxence DUNNEWIND maxence at dunnewind.net
Mon Dec 7 16:40:18 UTC 2009


Hi,

I'm using drbd with lv LVM. The layout is : 
drbd -> lv -> vg -> pv . I'm trying to do a vgchange -ay on the underlying lv,
which is hanging and use 50% of a cpu. An echo w > /proc/sysrq-trigger gives me
an interesting thing : 

Dec  7 13:35:26 z2-3 kernel: [8617743.246522] SysRq : Show Blocked State
Dec  7 13:35:26 z2-3 kernel: [8617743.246548]   task                        PC
stack   pid father
Dec  7 13:35:26 z2-3 kernel: [8617743.246561] pdflush       D ffff8800280350c0
0 14401      2
Dec  7 13:35:26 z2-3 kernel: [8617743.246564]  ffff88012f04f8d0 0000000000000046
ffff88000150b808 ffffffff80215d6a
Dec  7 13:35:26 z2-3 kernel: [8617743.246566]  ffff88000150b7d0 00000000000120c0
000000000000e250 ffff88012f0ac770
Dec  7 13:35:26 z2-3 kernel: [8617743.246569]  ffff88012f0aca60 0000000000000003
0000000000000086 ffffffff8023bc87
Dec  7 13:35:26 z2-3 kernel: [8617743.246571] Call Trace:
Dec  7 13:35:26 z2-3 kernel: [8617743.246578]  [<ffffffff80215d6a>] ?
native_sched_clock+0x2e/0x5b
Dec  7 13:35:26 z2-3 kernel: [8617743.246582]  [<ffffffff8023bc87>] ?
try_to_wake_up+0x212/0x224
Dec  7 13:35:26 z2-3 kernel: [8617743.246585]  [<ffffffff8024accf>] ?
lock_timer_base+0x26/0x4c
Dec  7 13:35:26 z2-3 kernel: [8617743.246589]  [<ffffffff804b4500>] ?
schedule+0x9/0x1d
Dec  7 13:35:26 z2-3 kernel: [8617743.246592]  [<ffffffff804b46eb>] ?
schedule_timeout+0x90/0xb6
Dec  7 13:35:26 z2-3 kernel: [8617743.246594]  [<ffffffff8024adc8>] ?
process_timeout+0x0/0x5
Dec  7 13:35:26 z2-3 kernel: [8617743.246596]  [<ffffffff804b46e6>] ?
schedule_timeout+0x8b/0xb6
Dec  7 13:35:26 z2-3 kernel: [8617743.246598]  [<ffffffff804b3ac3>] ?
io_schedule_timeout+0x66/0xae
Dec  7 13:35:26 z2-3 kernel: [8617743.246601]  [<ffffffff802a26f0>] ?
congestion_wait+0x66/0x80
Dec  7 13:35:26 z2-3 kernel: [8617743.246604]  [<ffffffff8025473e>] ?
autoremove_wake_function+0x0/0x2e
Dec  7 13:35:26 z2-3 kernel: [8617743.246607]  [<ffffffff802d9836>] ?
writeback_inodes+0x9a/0xce
Dec  7 13:35:26 z2-3 kernel: [8617743.246610]  [<ffffffff8029888c>] ?
wb_kupdate+0xc8/0x121
Dec  7 13:35:26 z2-3 kernel: [8617743.246613]  [<ffffffff802994d7>] ?
pdflush+0x159/0x23b
Dec  7 13:35:26 z2-3 kernel: [8617743.246615]  [<ffffffff802987c4>] ?
wb_kupdate+0x0/0x121
Dec  7 13:35:26 z2-3 kernel: [8617743.246617]  [<ffffffff8029937e>] ?
pdflush+0x0/0x23b
Dec  7 13:35:26 z2-3 kernel: [8617743.246620]  [<ffffffff80254382>] ?
kthread+0x54/0x80
Dec  7 13:35:26 z2-3 kernel: [8617743.246622]  [<ffffffff80210aca>] ?
child_rip+0xa/0x20
Dec  7 13:35:26 z2-3 kernel: [8617743.246638]  [<ffffffffa02df103>] ?
handle_halt+0x0/0x12 [kvm_intel]
Dec  7 13:35:26 z2-3 kernel: [8617743.246641]  [<ffffffff80256fa5>] ?
hrtimer_cancel+0xc/0x16
Dec  7 13:35:26 z2-3 kernel: [8617743.246643]  [<ffffffff8025432e>] ?
kthread+0x0/0x80
Dec  7 13:35:26 z2-3 kernel: [8617743.246645]  [<ffffffff80210ac0>] ?
child_rip+0x0/0x20
Dec  7 13:35:26 z2-3 kernel: [8617743.246655] lvchange      D ffff88002804f0c0
0  4969      1
Dec  7 13:35:26 z2-3 kernel: [8617743.246657]  ffff88000150b7d0 ffffffff804b4484
0000000000000096 ffff880084255948
Dec  7 13:35:26 z2-3 kernel: [8617743.246659]  0000000000000001 00000000000120c0
000000000000e250 ffff88000145c040
Dec  7 13:35:26 z2-3 kernel: [8617743.246662]  ffff88000145c330 0000000180254747
ffff8800382154d0 ffffffff802348e4
Dec  7 13:35:26 z2-3 kernel: [8617743.246664] Call Trace:
Dec  7 13:35:26 z2-3 kernel: [8617743.246667]  [<ffffffff804b4484>] ?
thread_return+0x3e/0xb1
Dec  7 13:35:26 z2-3 kernel: [8617743.246670]  [<ffffffff802348e4>] ?
__wake_up_common+0x44/0x73
Dec  7 13:35:26 z2-3 kernel: [8617743.246672]  [<ffffffff804b4500>] ?
schedule+0x9/0x1d
Dec  7 13:35:26 z2-3 kernel: [8617743.246685]  [<ffffffffa0275491>] ?
inc_ap_bio+0xde/0x12e [drbd]
Dec  7 13:35:26 z2-3 kernel: [8617743.246687]  [<ffffffff8025473e>] ?
autoremove_wake_function+0x0/0x2e
Dec  7 13:35:26 z2-3 kernel: [8617743.246698]  [<ffffffffa027768a>] ?
drbd_make_request_26+0x36f/0x485 [drbd]
Dec  7 13:35:26 z2-3 kernel: [8617743.246701]  [<ffffffff8033d537>] ?
generic_make_request+0x288/0x2d2
Dec  7 13:35:26 z2-3 kernel: [8617743.246704]  [<ffffffff8035318f>] ?
delay_tsc+0x26/0x57
Dec  7 13:35:26 z2-3 kernel: [8617743.246706]  [<ffffffff8033d647>] ?
submit_bio+0xc6/0xcd
Dec  7 13:35:26 z2-3 kernel: [8617743.246709]  [<ffffffff802dd14b>] ?
submit_bh+0xe3/0x103
Dec  7 13:35:26 z2-3 kernel: [8617743.246711]  [<ffffffff802df619>] ?
__block_write_full_page+0x1d6/0x2ac
Dec  7 13:35:26 z2-3 kernel: [8617743.246713]  [<ffffffff802de3e1>] ?
end_buffer_async_write+0x0/0x116
Dec  7 13:35:26 z2-3 kernel: [8617743.246716]  [<ffffffff802e1624>] ?
blkdev_get_block+0x0/0x57
Dec  7 13:35:26 z2-3 kernel: [8617743.246718]  [<ffffffff80297e82>] ?
__writepage+0xa/0x25
Dec  7 13:35:26 z2-3 kernel: [8617743.246720]  [<ffffffff802985cc>] ?
write_cache_pages+0x206/0x322
Dec  7 13:35:26 z2-3 kernel: [8617743.246722]  [<ffffffff80215db6>] ?
read_tsc+0xa/0x20
Dec  7 13:35:26 z2-3 kernel: [8617743.246725]  [<ffffffff80297e78>] ?
__writepage+0x0/0x25
Dec  7 13:35:26 z2-3 kernel: [8617743.246727]  [<ffffffff802e3e9f>] ?
__blockdev_direct_IO+0x99a/0xa41
Dec  7 13:35:26 z2-3 kernel: [8617743.246729]  [<ffffffff802e3e9f>] ?
__blockdev_direct_IO+0x99a/0xa41
Dec  7 13:35:26 z2-3 kernel: [8617743.246732]  [<ffffffff80298724>] ?
do_writepages+0x20/0x2d
Dec  7 13:35:26 z2-3 kernel: [8617743.246734]  [<ffffffff80292a73>] ?
__filemap_fdatawrite_range+0x4c/0x57
Dec  7 13:35:26 z2-3 kernel: [8617743.246736]  [<ffffffff80292aa4>] ?
filemap_write_and_wait_range+0x26/0x52
Dec  7 13:35:26 z2-3 kernel: [8617743.246738]  [<ffffffff802932ca>] ?
generic_file_aio_read+0xd7/0x54f
Dec  7 13:35:26 z2-3 kernel: [8617743.246749]  [<ffffffffa027b84a>] ?
drbd_open+0x63/0x6d [drbd]
Dec  7 13:35:26 z2-3 kernel: [8617743.246752]  [<ffffffff802c0b1b>] ?
do_sync_read+0xce/0x113
Dec  7 13:35:26 z2-3 kernel: [8617743.246754]  [<ffffffff802bf4c4>] ?
__dentry_open+0x16f/0x260
Dec  7 13:35:26 z2-3 kernel: [8617743.246756]  [<ffffffff802c3e26>] ?
cp_new_stat+0xe9/0xfc
Dec  7 13:35:26 z2-3 kernel: [8617743.246758]  [<ffffffff8025473e>] ?
autoremove_wake_function+0x0/0x2e
Dec  7 13:35:26 z2-3 kernel: [8617743.246761]  [<ffffffff802e1907>] ?
block_ioctl+0x38/0x3c
Dec  7 13:35:26 z2-3 kernel: [8617743.246763]  [<ffffffff802cc016>] ?
vfs_ioctl+0x21/0x6c
Dec  7 13:35:26 z2-3 kernel: [8617743.246765]  [<ffffffff802cc48c>] ?
do_vfs_ioctl+0x42b/0x464
Dec  7 13:35:26 z2-3 kernel: [8617743.246767]  [<ffffffff802c1585>] ?
vfs_read+0xa6/0xff
Dec  7 13:35:26 z2-3 kernel: [8617743.246770]  [<ffffffff802c169a>] ?
sys_read+0x45/0x6e
Dec  7 13:35:26 z2-3 kernel: [8617743.246773]  [<ffffffff8020fa42>] ?
system_call_fastpath+0x16/0x1b
Dec  7 13:35:26 z2-3 kernel: [8617743.246779] Sched Debug Version: v0.09,
2.6.30-1-amd64 #1
Dec  7 13:35:26 z2-3 kernel: [8617743.246780] now at 8617743246.777559 msecs
Dec  7 13:35:26 z2-3 kernel: [8617743.246781]   .jiffies
: 6449328107
Dec  7 13:35:26 z2-3 kernel: [8617743.246783]   .sysctl_sched_latency
: 40.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246784]   .sysctl_sched_min_granularity
: 8.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246786]   .sysctl_sched_wakeup_granularity
: 10.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246787]   .sysctl_sched_child_runs_first
: 0.000001
Dec  7 13:35:26 z2-3 kernel: [8617743.246788]   .sysctl_sched_features
: 113917
Dec  7 13:35:26 z2-3 kernel: [8617743.246790] 
Dec  7 13:35:26 z2-3 kernel: [8617743.246790] cpu#0, 3000.452 MHz
Dec  7 13:35:26 z2-3 kernel: [8617743.246792]   .nr_running                    :
2
Dec  7 13:35:26 z2-3 kernel: [8617743.246793]   .load                          :
4145
Dec  7 13:35:26 z2-3 kernel: [8617743.246794]   .nr_switches                   :
185493369405
Dec  7 13:35:26 z2-3 kernel: [8617743.246795]   .nr_load_updates               :
239239723
Dec  7 13:35:26 z2-3 kernel: [8617743.246796]   .nr_uninterruptible            :
55617
Dec  7 13:35:26 z2-3 kernel: [8617743.246798]   .next_balance                  :
6449.328222
Dec  7 13:35:26 z2-3 kernel: [8617743.246799]   .curr->pid                     :
16194
Dec  7 13:35:26 z2-3 kernel: [8617743.246800]   .clock                         :
8617743246.431307
Dec  7 13:35:26 z2-3 kernel: [8617743.246802]   .cpu_load[0]                   :
3121
Dec  7 13:35:26 z2-3 kernel: [8617743.246803]   .cpu_load[1]                   :
3121
Dec  7 13:35:26 z2-3 kernel: [8617743.246804]   .cpu_load[2]                   :
3121
Dec  7 13:35:26 z2-3 kernel: [8617743.246805]   .cpu_load[3]                   :
3121
Dec  7 13:35:26 z2-3 kernel: [8617743.246806]   .cpu_load[4]                   :
3121
Dec  7 13:35:26 z2-3 kernel: [8617743.246807] 
Dec  7 13:35:26 z2-3 kernel: [8617743.246808] cfs_rq[0]:/
Dec  7 13:35:26 z2-3 kernel: [8617743.246809]   .exec_clock                    :
0.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246810]   .MIN_vruntime                  :
352176404.012861
Dec  7 13:35:26 z2-3 kernel: [8617743.246812]   .min_vruntime                  :
352176404.012861
Dec  7 13:35:26 z2-3 kernel: [8617743.246813]   .max_vruntime                  :
352176404.012861
Dec  7 13:35:26 z2-3 kernel: [8617743.246814]   .spread                        :
0.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246816]   .spread0                       :
0.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246817]   .nr_running                    :
2
Dec  7 13:35:26 z2-3 kernel: [8617743.246818]   .load                          :
4145
Dec  7 13:35:26 z2-3 kernel: [8617743.246819]   .nr_spread_over                :
0
Dec  7 13:35:26 z2-3 kernel: [8617743.246820]   .shares                        :
0
Dec  7 13:35:26 z2-3 kernel: [8617743.246821] 
Dec  7 13:35:26 z2-3 kernel: [8617743.246822] rt_rq[0]:
Dec  7 13:35:26 z2-3 kernel: [8617743.246823]   .rt_nr_running                 :
0
Dec  7 13:35:26 z2-3 kernel: [8617743.246824]   .rt_throttled                  :
0
Dec  7 13:35:26 z2-3 kernel: [8617743.246825]   .rt_time                       :
0.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246826]   .rt_runtime                    :
950.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246827] 
Dec  7 13:35:26 z2-3 kernel: [8617743.246828] runnable tasks:
Dec  7 13:35:26 z2-3 kernel: [8617743.246829]             task   PID
tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
Dec  7 13:35:26 z2-3 kernel: [8617743.246830]
----------------------------------------------------------------------------------------------------------
Dec  7 13:35:26 z2-3 kernel: [8617743.246832]          kswapd0   219
352176404.012861 79827478108   115               0               0
0.000000               0.000000               0.000000 /
Dec  7 13:35:26 z2-3 kernel: [8617743.246843] R           bash 16194
352176364.023477      5733   120               0               0
0.000000               0.000000               0.000000 /
Dec  7 13:35:26 z2-3 kernel: [8617743.246848] 
Dec  7 13:35:26 z2-3 kernel: [8617743.246849] cpu#1, 3000.452 MHz
Dec  7 13:35:26 z2-3 kernel: [8617743.246850]   .nr_running                    :
3
Dec  7 13:35:26 z2-3 kernel: [8617743.246851]   .load                          :
1024
Dec  7 13:35:26 z2-3 kernel: [8617743.246852]   .nr_switches                   :
227249168097
Dec  7 13:35:26 z2-3 kernel: [8617743.246854]   .nr_load_updates               :
236708707
Dec  7 13:35:26 z2-3 kernel: [8617743.246855]   .nr_uninterruptible            :
-55614
Dec  7 13:35:26 z2-3 kernel: [8617743.246856]   .next_balance                  :
6449.328121
Dec  7 13:35:26 z2-3 kernel: [8617743.246857]   .curr->pid                     :
32275
Dec  7 13:35:26 z2-3 kernel: [8617743.246859]   .clock                         :
8617743246.856932
Dec  7 13:35:26 z2-3 kernel: [8617743.246860]   .cpu_load[0]                   :
3072
Dec  7 13:35:26 z2-3 kernel: [8617743.246861]   .cpu_load[1]                   :
3072
Dec  7 13:35:26 z2-3 kernel: [8617743.246862]   .cpu_load[2]                   :
3072
Dec  7 13:35:26 z2-3 kernel: [8617743.246863]   .cpu_load[3]                   :
3072
Dec  7 13:35:26 z2-3 kernel: [8617743.246864]   .cpu_load[4]                   :
3072
Dec  7 13:35:26 z2-3 kernel: [8617743.246865] 
Dec  7 13:35:26 z2-3 kernel: [8617743.246866] cfs_rq[1]:/
Dec  7 13:35:26 z2-3 kernel: [8617743.246867]   .exec_clock                    :
0.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246869]   .MIN_vruntime                  :
367012701.328887
Dec  7 13:35:26 z2-3 kernel: [8617743.246870]   .min_vruntime                  :
367012741.328887
Dec  7 13:35:26 z2-3 kernel: [8617743.246871]   .max_vruntime                  :
367012701.328887
Dec  7 13:35:26 z2-3 kernel: [8617743.246873]   .spread                        :
0.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246874]   .spread0                       :
14836337.316026
Dec  7 13:35:26 z2-3 kernel: [8617743.246875]   .nr_running                    :
1
Dec  7 13:35:26 z2-3 kernel: [8617743.246876]   .load                          :
3072
Dec  7 13:35:26 z2-3 kernel: [8617743.246878]   .nr_spread_over                :
0
Dec  7 13:35:26 z2-3 kernel: [8617743.246879]   .shares                        :
0
Dec  7 13:35:26 z2-3 kernel: [8617743.246880] 
Dec  7 13:35:26 z2-3 kernel: [8617743.246880] rt_rq[1]:
Dec  7 13:35:26 z2-3 kernel: [8617743.246881]   .rt_nr_running                 :
0
Dec  7 13:35:26 z2-3 kernel: [8617743.246882]   .rt_throttled                  :
0
Dec  7 13:35:26 z2-3 kernel: [8617743.246883]   .rt_time                       :
0.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246885]   .rt_runtime                    :
950.000000
Dec  7 13:35:26 z2-3 kernel: [8617743.246886] 
Dec  7 13:35:26 z2-3 kernel: [8617743.246886] runnable tasks:
Dec  7 13:35:26 z2-3 kernel: [8617743.246887]             task   PID
tree-key  switches  prio     exec-runtime         sum-exec        sum-sleep
Dec  7 13:35:26 z2-3 kernel: [8617743.246888]
----------------------------------------------------------------------------------------------------------
Dec  7 13:35:26 z2-3 kernel: [8617743.246891]              kvm 32275
367012741.340210 76292027847   120               0               0
0.000000               0.000000               0.000000 /
Dec  7 13:35:26 z2-3 kernel: [8617743.246896]  drbd16_receiver 30799
367012701.341954 123640574974   120               0               0
0.000000               0.000000               0.000000 /
Dec  7 13:35:26 z2-3 kernel: [8617743.246901] 


In the second stack trace, I can see a call to drbd_make_request_26, then to
inc_ap_bio which seems to be blocked.

Anyway, the drbddevice using this lv is down, so it shouldn't block the lvchange
(I checked with drbdsetup /dev/drbdXX show, only the syncer part is shown).

If someone has any informations about this bug ? Should I report it  (I will
also mail it on lvm mailing list).

Cheers,

Maxence

-- 
Maxence DUNNEWIND
Contact : maxence at dunnewind.net
Site : http://www.dunnewind.net
06 32 39 39 93
GPG : 18AE 61E4 D0B0 1C7C AAC9  E40D 4D39 68DB 0D2E B533
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 197 bytes
Desc: Digital signature
URL: <http://listman.redhat.com/archives/linux-lvm/attachments/20091207/8723fd71/attachment.sig>


More information about the linux-lvm mailing list