That said, if upgrading QEMU results in losing features, even though
you can recover them through additional steps I would argue that's a
bug in the packaging that should be addressed on the QEMU side.
I had the same thought. However, I'm expecting this will be part of Fedora 33 (not yet out), and the QXL display driver is possibly becoming optional? Within the same release of Fedora, I expect the default module list should be stable, but between releases it might not be? But, what about long-lived major releases like RHEL / CentOS? Or people who "upgrade" from the distro release to the "advanced virtualization" release (RHEL / CentOS 8)? I also would expect functionality which seems pretty default - to stay default, although perhaps it could be a weak package dependency or similar, to permit people to uninstall it?
They also moved qemu-device-usb-smartcard to optional, along with the already mentioned qemu-device-display-qxl and qemu-device-usb-redirect. I believe this has happened in the past with some of the block device drivers as well, but since I might not use them, it might not have affected me. I think everything in the module directory is really in scope.
The Fedora package owner agreed, and will be correcting it so that the default install will include these packages. This addresses the upgrade case, and the surprise of libvirt breaking for users who merely upgrade from Fedora 32 to Fedora 33.
However, it does not address the problem this exposed, which is that I can add or remove individual QEMU module packages at any time, and libvirt will not be aware of the change until some other event occurs, which might never occur in this workflow (even across a reboot!).
So, I think it is important to include the QEMU module directory in the list of timestamps that are checked to determine whether the domcapabilities cache should be invalidated. If a module gets added or removed, the directory timestamp should change.
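
To make the idea concrete, here is a rough sketch in Python (not libvirt's actual code, which is C; the function names are made up for illustration, and the module directory path in the usage note is an assumption about where the distro installs modules). It records the directory's mtime alongside the cache and treats any difference as a reason to re-probe:

```python
import os


def dir_stamp(path):
    """Return the directory's mtime in nanoseconds, or None if it is missing."""
    try:
        return os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return None


def cache_is_stale(module_dir, cached_stamp):
    """Report whether the module directory changed since the cache was written.

    Adding or removing an entry in a directory updates the directory's own
    mtime, so a differing stamp means a module was installed or uninstalled.
    A missing directory (stamp None) is compared like any other value.
    """
    return dir_stamp(module_dir) != cached_stamp
```

A caller would store dir_stamp("/usr/lib64/qemu") (assumed path) when the capabilities are probed, and call cache_is_stale() with that stored value before trusting the cache. One caveat: a module rewritten in place, without a create, rename, or unlink in the directory, would not change the directory mtime, so this only covers the add/remove case.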
I think the idea of libvirt probing the system and guessing when to invalidate the cache based upon only a few select data points, some of which (like the "qemu binary timestamp") have assumptions embedded in them, is going to continue to be a problem. But adding the above check would close off an additional set of scenarios, as covered above.