[vdo-devel] Rocky Linux 8.7 & LVM-VDO stability?
hostalp at post.cz
Tue Dec 6 00:39:00 UTC 2022
Hello,
until recently I was running a Rocky Linux 8.5 VM (on the Proxmox 7
virtualization platform) with the following config:
kernel-4.18.0-348.23.1.el8_5.x86_64
lvm2-2.03.12-11.el8_5.x86_64
vdo-6.2.5.74-14.el8.x86_64
kmod-kvdo-6.2.5.72-81.el8.x86_64
XFS > VDO > LVM > virtual disk (VirtIO SCSI)
The VDO volume was created with the default config; a brief summary:
- logical size 1.2x the physical size (based on our past tests of the stored
data)
- compression & deduplication on
- dense index
- write mode async
It was mounted with the following options: defaults,noatime,logbsize=128k,
with discards performed periodically via fstrim.timer.
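For reference, the mount and discard setup above would look roughly like this
(the device path and mount point are placeholders, not my actual names):

```shell
# /etc/fstab entry for the XFS filesystem on the VDO-backed volume
# (device path and mount point are placeholders):
# /dev/mapper/vg-vdolv  /data  xfs  defaults,noatime,logbsize=128k  0 0

# Discards are done periodically via the systemd timer rather than
# mounting with -o discard:
systemctl enable --now fstrim.timer
```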
This setup was stable for the entire uptime (in fact, ever since the system
was first created).
A few days ago I finally updated it to RL 8.7 and converted the "VDO on
LVM" setup to the new LVM-VDO solution using the lvm_import_vdo script. The
whole process went fine (I had already tested it beforehand) and I ended up
with the system running in the desired config.
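For anyone else attempting the same conversion, the procedure I used was
roughly the following sketch (the LV path is a placeholder; check
lvm_import_vdo(8) for the exact options available in your lvm2 version):

```shell
# Convert an existing "VDO on LVM" volume to LVM-VDO in place.
# The filesystem on top of the VDO volume must not be in use:
umount /data

# /dev/vg/vdo_data_lv is a placeholder for the LV backing the VDO volume:
lvm_import_vdo /dev/vg/vdo_data_lv

# Afterwards the volume shows up as a regular LV with segment type "vdo":
lvs -o name,vg_name,segtype
```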
kernel-4.18.0-425.3.1.el8.x86_64
lvm2-2.03.14-6.el8.x86_64
vdo-6.2.7.17-14.el8.x86_64
kmod-kvdo-6.2.7.17-87.el8.x86_64
The current disk space utilization is around 61% (pretty much the same for
the physical and logical space) and it has never been close to 80%.
However, the converted setup lasted less than a day. During the following
night, all operations on the VDO volume hung (the other, non-VDO volumes
remained usable) and I had to perform a hard restart to get it working
again.
The only errors/complaints I found were blocked-task notifications on the
console and in the /var/log/messages log, with the following detail (only
the 1st occurrence shown):
Dec 4 01:53:01 lts1 kernel: INFO: task xfsaild/dm-4:5148 blocked for more than 120 seconds.
Dec 4 01:53:01 lts1 kernel: Tainted: G OE --------- - - 4.18.0-425.3.1.el8.x86_64 #1
Dec 4 01:53:01 lts1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 4 01:53:01 lts1 kernel: task:xfsaild/dm-4 state:D stack: 0 pid: 5148 ppid: 2 flags:0x80004080
Dec 4 01:53:01 lts1 kernel: Call Trace:
Dec 4 01:53:01 lts1 kernel: __schedule+0x2d1/0x860
Dec 4 01:53:01 lts1 kernel: ? finish_wait+0x80/0x80
Dec 4 01:53:01 lts1 kernel: schedule+0x35/0xa0
Dec 4 01:53:01 lts1 kernel: io_schedule+0x12/0x40
Dec 4 01:53:01 lts1 kernel: limiterWaitForOneFree+0xc0/0xf0 [kvdo]
Dec 4 01:53:01 lts1 kernel: ? finish_wait+0x80/0x80
Dec 4 01:53:01 lts1 kernel: kvdoMapBio+0xcc/0x2a0 [kvdo]
Dec 4 01:53:01 lts1 kernel: __map_bio+0x47/0x1b0 [dm_mod]
Dec 4 01:53:01 lts1 kernel: dm_make_request+0x1a9/0x4d0 [dm_mod]
Dec 4 01:53:01 lts1 kernel: generic_make_request_no_check+0x202/0x330
Dec 4 01:53:01 lts1 kernel: submit_bio+0x3c/0x160
Dec 4 01:53:01 lts1 kernel: ? bio_add_page+0x46/0x60
Dec 4 01:53:01 lts1 kernel: _xfs_buf_ioapply+0x2af/0x430 [xfs]
Dec 4 01:53:01 lts1 kernel: ? xfs_iextents_copy+0xba/0x170 [xfs]
Dec 4 01:53:01 lts1 kernel: ? xfs_buf_delwri_submit_buffers+0x10c/0x2a0 [xfs]
Dec 4 01:53:01 lts1 kernel: __xfs_buf_submit+0x63/0x1d0 [xfs]
Dec 4 01:53:01 lts1 kernel: xfs_buf_delwri_submit_buffers+0x10c/0x2a0 [xfs]
Dec 4 01:53:01 lts1 kernel: ? xfsaild+0x26f/0x8c0 [xfs]
Dec 4 01:53:01 lts1 kernel: xfsaild+0x26f/0x8c0 [xfs]
Dec 4 01:53:01 lts1 kernel: ? xfs_trans_ail_cursor_first+0x80/0x80 [xfs]
Dec 4 01:53:01 lts1 kernel: kthread+0x10b/0x130
Dec 4 01:53:01 lts1 kernel: ? set_kthread_struct+0x50/0x50
Dec 4 01:53:01 lts1 kernel: ret_from_fork+0x1f/0x40
I'm now awaiting another occurrence of this and wondering where the issue
may be coming from.
Could it be the new LVM-VDO solution, or the kernel itself?
Can you perhaps suggest how to collect more information in such a case, or
provide any other tips?
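In case it helps the discussion, this is what I was planning to capture the
next time the hang occurs (just a sketch; exact command options and lvs field
names may vary by version, see vdostats(8), dmsetup(8) and lvmvdo(7)):

```shell
# Per-volume VDO counters, including bios in progress / queue depths:
vdostats --verbose

# Device-mapper status of all targets (does the vdo target still respond?):
dmsetup status

# LVM-VDO health fields (field names per lvmvdo(7); may vary by version):
lvs -a -o +vdo_operating_mode,vdo_index_state

# Dump the stacks of all blocked (D-state) tasks into the kernel log,
# to see what everything is waiting on (requires sysrq to be enabled):
echo w > /proc/sysrq-trigger
```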
Best regards,
Petr