[linux-lvm] 'vgchange -a y' took very long when scanning snapshots

Olivier Doucet webmaster at ajeux.com
Fri Aug 10 16:16:48 UTC 2012


When rebooting a storage server, activating volumes took hours.
We were able to narrow the problem to 'vgchange -a y' command.

  LVM version:     2.02.87(2)-RHEL6 (2011-10-12)
  Library version: 1.02.66-RHEL6 (2011-10-12)
  Driver version:  4.22.6
(Centos 6.3 distribution)

Same problem encountered with LVM2 2.02.95
Kernel is 2.6.32-279.2.1.el6.centos.plus.x86_64


Note : this version of Centos is shipped with LVM 2.02.95, but when we
encountered this problem we voluntarily downgraded package, but same
behaviour happened.


Issue described in this email was reproduced 4 (out of 4) times with :
vgchange -dv -a n
vgchange -dddddvvvvv -a y

Stalls happens only on snapshots devices. All other devices are
scanned instantly.


last lines printed before long 'stall' are :
#libdm-deptree.c:1923     Loading backup-backup_filer_20120611 table (253:5)
#libdm-deptree.c:1867         Adding target to (253:5): 0 8589934592
snapshot 253:2 253:4 P 8
#ioctl/libdm-iface.c:1628         dm table   (253:5) OF   [16384]
#ioctl/libdm-iface.c:1628         dm reload   (253:5) NF   [16384]

[ stall ]

#libdm-deptree.c:1963         Table size changed from 0 to 8589934592
for backup-backup_filer_20120611
#libdm-deptree.c:1418     Creating backup-backup_filer_20120703-cow
#ioctl/libdm-iface.c:1628         dm create
backup-backup_filer_20120703-cow
LVM-SBDOCZoeaRkw0uExDBg2sgVQj7ZXqmbxG2NIeLyMtosGb5cVd25QWQACwbbknX5R-cow
NF   [16384]
#libdm-deptree.c:1923     Loading backup-backup_filer_20120703-cow table (253:6)
#libdm-deptree.c:1867         Adding target to (253:6): 0 838860800
linear 8:0 21405632512
#ioctl/libdm-iface.c:1628         dm table   (253:6) OF   [16384]
#ioctl/libdm-iface.c:1628         dm reload   (253:6) NF   [16384]
#libdm-deptree.c:1963         Table size changed from 0 to 838860800
for backup-backup_filer_20120703-cow
#libdm-deptree.c:1076     Resuming backup-backup_filer_20120703-cow (253:6)
#libdm-common.c:1259         Udev cookie 0xd4d44fb (semid 163842)
incremented to 2
#libdm-common.c:1500         Udev cookie 0xd4d44fb (semid 163842)
assigned to RESUME task(5) with flags DISABLE_SUBSYSTEM_RULES
DISABLE_DISK_RULES DISABLE_OTHER_RULES LOW_PRIORITY
DISABLE_LIBRARY_FALLBACK (0x3e)
#ioctl/libdm-iface.c:1628         dm resume   (253:6) NF   [16384]
#libdm-common.c:845         backup-backup_filer_20120703-cow: Stacking
NODE_ADD (253,6) 0:6 0660 [trust_udev]
#libdm-common.c:855         backup-backup_filer_20120703-cow: Stacking
NODE_READ_AHEAD 256 (flags=1)
#libdm-deptree.c:1418     Creating backup-backup_filer_20120703
#ioctl/libdm-iface.c:1628         dm create
backup-backup_filer_20120703
LVM-SBDOCZoeaRkw0uExDBg2sgVQj7ZXqmbxG2NIeLyMtosGb5cVd25QWQACwbbknX5R
NF   [16384]
#libdm-deptree.c:1923     Loading backup-backup_filer_20120703 table (253:7)
#libdm-deptree.c:1867         Adding target to (253:7): 0 8589934592
snapshot 253:2 253:6 P 8
#ioctl/libdm-iface.c:1628         dm table   (253:7) OF   [16384]
#ioctl/libdm-iface.c:1628         dm reload   (253:7) NF   [16384]

[ stall again ]

#libdm-deptree.c:1963         Table size changed from 0 to 8589934592
for backup-backup_filer_20120703
#libdm-deptree.c:1418     Creating backup-backup_filer_20120804-cow
#ioctl/libdm-iface.c:1628         dm create
backup-backup_filer_20120804-cow
LVM-SBDOCZoeaRkw0uExDBg2sgVQj7ZXqmbxw32gXviTtaFEraB8Xp2SqZOtA1Xi6dLW-cow
NF   [16384]
#libdm-deptree.c:1923     Loading backup-backup_filer_20120804-cow table (253:8)
#libdm-deptree.c:1867         Adding target to (253:8): 0 838860800
linear 8:0 24970790912
#ioctl/libdm-iface.c:1628         dm table   (253:8) OF   [16384]
#ioctl/libdm-iface.c:1628         dm reload   (253:8) NF   [16384]
#libdm-deptree.c:1963         Table size changed from 0 to 838860800
for backup-backup_filer_20120804-cow
#libdm-deptree.c:1076     Resuming backup-backup_filer_20120804-cow (253:8)
#libdm-common.c:1259         Udev cookie 0xd4d44fb (semid 163842)
incremented to 2
#libdm-common.c:1500         Udev cookie 0xd4d44fb (semid 163842)
assigned to RESUME task(5) with flags DISABLE_SUBSYSTEM_RULES
DISABLE_DISK_RULES DISABLE_OTHER_RULES LOW_PRIORITY
DISABLE_LIBRARY_FALLBACK (0x3e)
#ioctl/libdm-iface.c:1628         dm resume   (253:8) NF   [16384]
#libdm-common.c:845         backup-backup_filer_20120804-cow: Stacking
NODE_ADD (253,8) 0:6 0660 [trust_udev]
#libdm-common.c:855         backup-backup_filer_20120804-cow: Stacking
NODE_READ_AHEAD 256 (flags=1)
#libdm-deptree.c:1418     Creating backup-backup_filer_20120804
#ioctl/libdm-iface.c:1628         dm create
backup-backup_filer_20120804
LVM-SBDOCZoeaRkw0uExDBg2sgVQj7ZXqmbxw32gXviTtaFEraB8Xp2SqZOtA1Xi6dLW
NF   [16384]
#libdm-deptree.c:1923     Loading backup-backup_filer_20120804 table (253:9)
#libdm-deptree.c:1867         Adding target to (253:9): 0 8589934592
snapshot 253:2 253:8 P 8
#ioctl/libdm-iface.c:1628         dm table   (253:9) OF   [16384]
#ioctl/libdm-iface.c:1628         dm reload   (253:9) NF   [16384]

[ stall again ]

#libdm-deptree.c:1963         Table size changed from 0 to 8589934592
for backup-backup_filer_20120804
#libdm-deptree.c:1076     Resuming backup-backup_filer_20120611 (253:5)

[ then all other devices scanned correctly. ]

 After several scans (lvchange -a n / lvchange -a y) it took several
minutes. On first startup, I had to wait an hour !

lvs shows volumes correctly  :
  LV                    VG     Attr     LSize   Pool Origin
Data%  Move Log Copy%  Convert
  backup_filer          backup owi-i-s-   4.00t
  backup_filer_20120611 backup swi-a- 450.00g backup_filer  63.40
  backup_filer_20120703 backup swi-a- 400.00g backup_filer  31.87
  backup_filer_20120804 backup swi-a- 400.00g backup_filer   0.89

FYI, during scan, lvs reports current scanned volume as :
  LV                    VG     Attr   LSize   Origin       Snap%  Move
Log Copy%  Convert
backup_filer_20120611  backup swi-d- 450.00g backup_filer 100.00



During the whole operation, iostat shows reads on device (no write).
First time, it was ~ 20 op / second, with only 4k reads.
We had faster reads in next tries, but always 4k reads.

blktrace shows this :
  8,0    4      382     2.722717405  2270  P   N [vgchange]
  8,0    4      383     2.722717852  2270  I  RS 11406323392 + 8 [vgchange]
  8,0    4      384     2.722718667  2270  D  RS 11406323392 + 8 [vgchange]
  8,0    4      385     2.722971859     0  C  RS 11406323392 + 8 [0]
  8,0    4      386     2.770606623  2270  A  RS 11406325448 + 8 <-
(253,4) 268349128
  8,0    4      387     2.770607053  2270  Q  RS 11406325448 + 8 [vgchange]
  8,0    4      388     2.770607865  2270  G  RS 11406325448 + 8 [vgchange]
  8,0    4      389     2.770608723  2270  P   N [vgchange]
  8,0    4      390     2.770609099  2270  I  RS 11406325448 + 8 [vgchange]
  8,0    4      391     2.770609916  2270  D  RS 11406325448 + 8 [vgchange]
  8,0    4      392     2.770869631     0  C  RS 11406325448 + 8 [0]
  8,0    4      393     2.817461530  2270  A  RS 11406327504 + 8 <-
(253,4) 268351184
  8,0    4      394     2.817461968  2270  Q  RS 11406327504 + 8 [vgchange]
  8,0    4      395     2.817462775  2270  G  RS 11406327504 + 8 [vgchange]



I'm pretty sure this is not a normal behaviour ... if launching LVM
take so much time, there is something definitely wrong.

I kept complete vgchange output if needed.

Any feedback welcome. This is a production server, but I can test new
softwares on scheduled downtimes (this is a backup server, so no need
for 100% uptime).

Olivier




More information about the linux-lvm mailing list