[vdo-devel] Rocky Linux 8.7 & LVM-VDO stability?
hostalp at post.cz
Tue Dec 6 00:39:00 UTC 2022
Hello,
until recently I was running a Rocky Linux 8.5 VM (on the Proxmox 7
virtualization platform) with the following config:
kernel-4.18.0-348.23.1.el8_5.x86_64
lvm2-2.03.12-11.el8_5.x86_64
vdo-6.2.5.74-14.el8.x86_64
kmod-kvdo-6.2.5.72-81.el8.x86_64
XFS > VDO > LVM > virtual disk (VirtIO SCSI)
The VDO volume was created with the default config; a brief summary:
- logical size 1.2x the physical size (based on our past tests of the stored
data)
- compression & deduplication on
- dense index
- write mode async
It was mounted with the following options: defaults,noatime,logbsize=128k,
with discards performed periodically via fstrim.timer.
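For reference, the mount and discard setup above would look roughly like this
(the device path and mount point are placeholders, not my actual names):

```shell
# /etc/fstab entry for the XFS filesystem on the VDO-backed volume
# (device path and mount point are placeholders):
# /dev/mapper/vg-vdolv  /data  xfs  defaults,noatime,logbsize=128k  0 0

# Discards are done periodically via the systemd timer rather than
# mounting with -o discard:
systemctl enable --now fstrim.timer
```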
This setup was stable for the entire uptime (in fact, ever since the system
was first created).
A few days ago I finally updated it to RL 8.7 and converted the "VDO on
LVM" setup to the new LVM-VDO solution using the lvm_import_vdo script. The
whole process went fine (I had already tested it beforehand) and I ended up
with the system running in the desired config.
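For anyone else attempting the same conversion, the procedure I used was
roughly the following sketch (the LV path is a placeholder; check
lvm_import_vdo(8) for the exact options available in your lvm2 version):

```shell
# Convert an existing "VDO on LVM" volume to LVM-VDO in place.
# The filesystem on top of the VDO volume must not be in use:
umount /data

# /dev/vg/vdo_data_lv is a placeholder for the LV backing the VDO volume:
lvm_import_vdo /dev/vg/vdo_data_lv

# Afterwards the volume shows up as a regular LV with segment type "vdo":
lvs -o name,vg_name,segtype
```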
kernel-4.18.0-425.3.1.el8.x86_64
lvm2-2.03.14-6.el8.x86_64
vdo-6.2.7.17-14.el8.x86_64
kmod-kvdo-6.2.7.17-87.el8.x86_64
The current disk space utilization is around 61% (pretty much the same for
the physical and logical space) and it has never been close to 80%.
However, the converted setup lasted less than a day. During the following
night, all operations on the VDO volume hung (the other, non-VDO volumes
remained usable) and I had to perform a hard restart to get it working
again.
The only errors/complaints I found were blocked-task notifications on the
console and in the /var/log/messages log, with the following detail (only
the 1st occurrence shown):
Dec 4 01:53:01 lts1 kernel: INFO: task xfsaild/dm-4:5148 blocked for more than 120 seconds.
Dec 4 01:53:01 lts1 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8.x86_64 #1
Dec 4 01:53:01 lts1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 4 01:53:01 lts1 kernel: task:xfsaild/dm-4 state:D stack: 0 pid: 5148 ppid: 2 flags:0x80004080
Dec 4 01:53:01 lts1 kernel: Call Trace:
Dec 4 01:53:01 lts1 kernel: __schedule+0x2d1/0x860
Dec 4 01:53:01 lts1 kernel: ? finish_wait+0x80/0x80
Dec 4 01:53:01 lts1 kernel: schedule+0x35/0xa0
Dec 4 01:53:01 lts1 kernel: io_schedule+0x12/0x40
Dec 4 01:53:01 lts1 kernel: limiterWaitForOneFree+0xc0/0xf0 [kvdo]
Dec 4 01:53:01 lts1 kernel: ? finish_wait+0x80/0x80
Dec 4 01:53:01 lts1 kernel: kvdoMapBio+0xcc/0x2a0 [kvdo]
Dec 4 01:53:01 lts1 kernel: __map_bio+0x47/0x1b0 [dm_mod]
Dec 4 01:53:01 lts1 kernel: dm_make_request+0x1a9/0x4d0 [dm_mod]
Dec 4 01:53:01 lts1 kernel: generic_make_request_no_check+0x202/0x330
Dec 4 01:53:01 lts1 kernel: submit_bio+0x3c/0x160
Dec 4 01:53:01 lts1 kernel: ? bio_add_page+0x46/0x60
Dec 4 01:53:01 lts1 kernel: _xfs_buf_ioapply+0x2af/0x430 [xfs]
Dec 4 01:53:01 lts1 kernel: ? xfs_iextents_copy+0xba/0x170 [xfs]
Dec 4 01:53:01 lts1 kernel: ? xfs_buf_delwri_submit_buffers+0x10c/0x2a0 [xfs]
Dec 4 01:53:01 lts1 kernel: __xfs_buf_submit+0x63/0x1d0 [xfs]
Dec 4 01:53:01 lts1 kernel: xfs_buf_delwri_submit_buffers+0x10c/0x2a0 [xfs]
Dec 4 01:53:01 lts1 kernel: ? xfsaild+0x26f/0x8c0 [xfs]
Dec 4 01:53:01 lts1 kernel: xfsaild+0x26f/0x8c0 [xfs]
Dec 4 01:53:01 lts1 kernel: ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
Dec 4 01:53:01 lts1 kernel: kthread+0x10b/0x130
Dec 4 01:53:01 lts1 kernel: ? set_kthread_struct+0x50/0x50
Dec 4 01:53:01 lts1 kernel: ret_from_fork+0x1f/0x40
I'm now awaiting another occurrence of this and wondering where the issue
may be coming from.
Could it be the new LVM-VDO solution, or the kernel itself?
Can you perhaps suggest how to collect more information in such a case, or
provide any other tips?
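In case it helps the discussion, this is what I was planning to capture the
next time the hang occurs (just a sketch; exact command options and lvs field
names may vary by version, see vdostats(8), dmsetup(8) and lvmvdo(7)):

```shell
# Per-volume VDO counters, including bios in progress / queue depths:
vdostats --verbose

# Device-mapper status of all targets (does the vdo target still respond?):
dmsetup status

# LVM-VDO health fields (field names per lvmvdo(7); may vary by version):
lvs -a -o +vdo_operating_mode,vdo_index_state

# Dump the stacks of all blocked (D-state) tasks into the kernel log,
# to see what everything is waiting on (requires sysrq to be enabled):
echo w > /proc/sysrq-trigger
```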
Best regards,
Petr