[vfio-users] AMDGPU rebind kernel bug

Sat Aug 18 01:25:19 UTC 2018

On Sat, 18 Aug 2018 01:48:24 +0100
Gary <gary at mups.co.uk> wrote:

> Hi all,
> 
> I have vfio-pci configured to allow the Linux host to run on the Intel
> iGPU whilst an 8GB Sapphire Nitro+ RX580 is passed through using
> virt-manager to a Windows 10 VM. As long as I eject the GPU in Windows
> before shutting down the VM, everything works (AMD reset bug?).
> 
> I would however like to use the RX580 in the host when the VM is not
> running. In order to do this I removed the vfio-pci ids= option allowing
> the amdgpu module to bind as normal. I also updated my xorg config to:
> 
>   Section "Device"
>       Identifier "Intel Graphics"
>       Driver "intel"
>       Option "DRI" "3"
>   EndSection
> 
>   Section "ServerFlags"
>       Option "AutoAddGPU" "off"
>   EndSection
> 
>   Section "Device"
>       Identifier "AMDGPU"
>       Driver "amdgpu"
>       Option "DRI3" "1"
>       Option "Ignore" "1"
>   EndSection
> 
> This allows me to use the intel graphics or via DRI_PRIME=1 the AMD
> graphics. I can also start the VM and virt-manager will rebind the
> GPU/GPUAudio to vfio-pci and the VM works nicely.
> 
> The problem with this setup comes when I eject the GPU in Windows.
> virt-manager in the host locks up and dmesg shows a kernel BUG message
> (full trace at the end of this email):
> 
> 
>   [  423.535829] ------------[ cut here ]------------
>   [  423.535830] kernel BUG at
>   /build/linux-hvYKKE/linux-4.17.8/drivers/iommu/intel-iommu.c:732!
>   [  423.535835] invalid opcode: 0000 [#1] SMP PTI
>   [  423.535836] Modules linked in: tun fuse ebtable_filter...
> 
> 
> After a power cycle, and thinking this might be to do with the amdgpu
> module rebind, I tried unloading the amdgpu module whilst the VM was
> running and thus the GPU bound to vfio-pci. Ejecting the GPU in Windows
> no longer caused virt-manager to lockup and I could then shut down the
> VM via virt-manager.
> 
> However, this just delays the issue: when an attempt is made to rebind
> the AMDGPU I once more get a lockup, this time with the dmesg error:
> 
>   [  982.416988] BUG: unable to handle kernel paging request at
>   ffffb9ad1281a2b4
>   [  982.416992] PGD 41e921067 P4D 41e921067 PUD 0
>   [  982.416995] Oops: 0002 [#1] SMP PTI
>   [  982.416997] Modules linked in: amdgpu(+) chash gpu_sched...
> 
> Note, the lockup is of the graphics output. I can still SSH into the
> machine, although trying to shut the machine down does not get too far.
> 
> Is this in any way related to the AMD reset bug? If not, any idea if
> there's a fix or workaround or any further information I could provide
> to help troubleshoot this?
> 
> 
> Full trace from dmesg for the two errors follows
> 
> ----------------------- FIRST Error ------------------------------
> [  423.535829] ------------[ cut here ]------------
> [  423.535830] kernel BUG at
> /build/linux-hvYKKE/linux-4.17.8/drivers/iommu/intel-iommu.c:732!
> [  423.535835] invalid opcode: 0000 [#1] SMP PTI
> [  423.535836] Modules linked in: tun fuse ebtable_filter ebtables
> bridge stp llc cpufreq_powersave cpufreq_userspace cpufreq_conservative
> binfmt_misc nls_ascii nls_cp437 vfat fat snd_hda_codec_realtek
> snd_hda_codec_generic amdkfd ip6t_REJECT nf_reject_ipv6 nf_log_ipv6
> xt_hl ip6t_rt amdgpu snd_hda_codec_hdmi iTCO_wdt iTCO_vendor_support
> intel_rapl nf_conntrack_ipv6 nf_defrag_ipv6 x86_pkg_temp_thermal
> intel_powerclamp snd_hda_intel coretemp chash snd_hda_codec gpu_sched
> snd_hda_core kvm_intel i915 kvm ttm snd_hwdep efi_pstore intel_cstate
> snd_pcm intel_uncore intel_rapl_perf ipt_REJECT nf_reject_ipv4 serio_raw
> snd_timer pcspkr efivars drm_kms_helper nf_log_ipv4 sg snd drm joydev
> evdev mei_me lpc_ich i2c_algo_bit soundcore mei shpchp ie31200_edac
> nf_log_common xt_LOG video button xt_limit xt_tcpudp xt_addrtype
> [  423.535866]  nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
> ip6table_filter ip6_tables nf_conntrack_netbios_ns
> nf_conntrack_broadcast nf_nat_ftp nf_nat vfio_pci vfio_virqfd
> vfio_iommu_type1 nf_conntrack_ftp vfio irqbypass nf_conntrack parport_pc
> ppdev lp iptable_filter parport sunrpc efivarfs ip_tables x_tables
> autofs4 ext4 crc16 mbcache jbd2 fscrypto ecb btrfs zstd_decompress
> zstd_compress xxhash algif_skcipher af_alg dm_crypt raid10 raid456
> async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq
> libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod dm_mod
> sd_mod hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel
> ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd
> glue_helper psmouse ahci i2c_i801 libahci xhci_pci ehci_pci libata
> xhci_hcd ehci_hcd
> [  423.535894]  alx scsi_mod mdio thermal usbcore usb_common fan
> [  423.535899] CPU: 2 PID: 3815 Comm: libvirtd Not tainted
> 4.17.0-0.bpo.1-amd64 #1 Debian 4.17.8-1~bpo9+1
> [  423.535900] Hardware name: Gigabyte Technology Co., Ltd. To be filled
> by O.E.M./B75-D3V, BIOS F9 10/23/2013
> [  423.535905] RIP: 0010:domain_get_iommu+0x4e/0x60
> [  423.535906] RSP: 0018:ffffa52d48a4bb48 EFLAGS: 00010202
> [  423.535907] RAX: 0000000000000001 RBX: 0000000080c27000 RCX:
> 0000000000000000
> [  423.535908] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffff8b4a595d4d00
> [  423.535909] RBP: 0000000000000000 R08: 00000000000272d0 R09:
> ffffffff994ef4b7
> [  423.535910] R10: ffffa52d48a4ba60 R11: ffffe0d58fd21f20 R12:
> ffff8b4a5c5fb0a0
> [  423.535911] R13: 000000ffffffffff R14: ffff8b4a595d4d00 R15:
> 0000000000001000
> [  423.535913] FS:  00007f287deb2700(0000) GS:ffff8b4a6e300000(0000)
> knlGS:0000000000000000
> [  423.535914] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  423.535915] CR2: fffff80077770000 CR3: 000000041772c003 CR4:
> 00000000001626e0
> [  423.535916] Call Trace:
> [  423.535920]  __intel_map_single+0x61/0x180

This one happens because the GPU is still bound to a VM IOMMU domain,
probably because the audio function is still bound to the VM and
userspace bindings are done at the group level.  This is a user/libvirt
error: your scenario has allowed libvirt to attempt to rebind the GPU
to a host driver while the audio device in the same IOMMU group is
still bound to vfio-pci and in use by the user.  Had intel-iommu not
hit a BUG_ON, vfio would have done so for the isolation violation.
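
To check the group-level point above before handing the GPU back to a
host driver, something like the following rough sketch could list every
device in the GPU's IOMMU group and report any sibling still bound to
vfio-pci.  It only reads the standard sysfs links (iommu_group/devices
and driver under /sys/bus/pci/devices).  The function names and the
sysfs_root parameter are made up for illustration, and it assumes the
IOMMU is enabled so that the iommu_group link exists.

```python
import os
from pathlib import Path


def group_members(bdf, sysfs_root="/sys"):
    """Return the PCI addresses of all devices in the same IOMMU group as bdf.

    Raises FileNotFoundError if the device has no iommu_group link
    (for example when the IOMMU is disabled).
    """
    group_dir = Path(sysfs_root) / "bus/pci/devices" / bdf / "iommu_group" / "devices"
    return sorted(entry.name for entry in group_dir.iterdir())


def bound_driver(bdf, sysfs_root="/sys"):
    """Return the name of the driver bound to bdf, or None if it is unbound."""
    link = Path(sysfs_root) / "bus/pci/devices" / bdf / "driver"
    if not link.is_symlink():
        return None
    # The driver link points at .../bus/pci/drivers/<name>.
    return os.path.basename(os.readlink(link))


def vfio_holdouts(bdf, sysfs_root="/sys"):
    """Return the other devices in bdf's group that are still bound to vfio-pci."""
    return [
        dev
        for dev in group_members(bdf, sysfs_root)
        if dev != bdf and bound_driver(dev, sysfs_root) == "vfio-pci"
    ]
```

For the GPU at 0000:01:00.0 in the trace, a non-empty result from
vfio_holdouts("0000:01:00.0") would mean another function in the group
(presumably the HDMI audio function, whose address is not shown in the
log) is still held by vfio-pci, so binding a host driver to the GPU at
that point would be the isolation violation described above.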

> [  423.535957]  amdgpu_gart_init+0x5e/0x100 [amdgpu]
> [  423.535983]  gmc_v8_0_sw_init+0x669/0x700 [amdgpu]
> [  423.535997]  ? drm_detect_hdmi_monitor+0x3e/0xe0 [drm]
> [  423.536017]  amdgpu_device_init+0x102a/0x1490 [amdgpu]
> [  423.536019]  ? kmalloc_order+0x14/0x40
> [  423.536039]  amdgpu_driver_load_kms+0x86/0x2c0 [amdgpu]
> [  423.536046]  drm_dev_register+0x132/0x1c0 [drm]
> [  423.536066]  amdgpu_pci_probe+0x1b5/0x280 [amdgpu]
> [  423.536069]  local_pci_probe+0x44/0xa0
> [  423.536072]  ? _cond_resched+0x16/0x40
> [  423.536074]  pci_device_probe+0x102/0x1b0
> [  423.536077]  driver_probe_device+0x2b2/0x490
> [  423.536079]  ? __driver_attach+0xe0/0xe0
> [  423.536080]  bus_for_each_drv+0x64/0xb0
> [  423.536082]  __device_attach+0xd9/0x150
> [  423.536084]  bus_rescan_devices_helper+0x30/0x50
> [  423.536086]  store_drivers_probe+0x2d/0x60
> [  423.536088]  kernfs_fop_write+0x10f/0x190
> [  423.536091]  vfs_write+0xb0/0x190
> [  423.536093]  ksys_write+0x52/0xc0
> [  423.536095]  do_syscall_64+0x55/0x110
> [  423.536097]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  423.536098] RIP: 0033:0x7f28a4c4b1ad
> [  423.536099] RSP: 002b:00007f287deb1930 EFLAGS: 00000293 ORIG_RAX:
> 0000000000000001
> [  423.536101] RAX: ffffffffffffffda RBX: 0000000000000016 RCX:
> 00007f28a4c4b1ad
> [  423.536102] RDX: 000000000000000c RSI: 00007f2858008d24 RDI:
> 0000000000000016
> [  423.536103] RBP: 000000000000000c R08: 00007f28540009e0 R09:
> 0000000000000000
> [  423.536104] R10: 00007f28a84ce903 R11: 0000000000000293 R12:
> 00007f2858008d24
> [  423.536105] R13: 0000000000000000 R14: 0000000000000016 R15:
> 00007f2854000a00
> [  423.536106] Code: 74 0d eb 29 48 83 c7 04 8b 4f fc 85 c9 75 0a 83 c0
> 01 39 d0 75 ee 31 c0 c3 48 98 48 c1 e0 03 48 8b 15 a7 4e 14 01 48 8b 04
> 02 c3 <0f> 0b 31 c0 eb ee 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
> [  423.536126] RIP: domain_get_iommu+0x4e/0x60 RSP: ffffa52d48a4bb48
> [  423.536128] ---[ end trace 68f635a30860d3cb ]---
> 
> 
> 
> 
> ----------------------- SECOND Error ------------------------------
> 
> [  981.069606] [drm] amdgpu kernel modesetting enabled.
> [  981.069826] [drm] initializing kernel modesetting (POLARIS10
> 0x1002:0x67DF 0x1DA2:0xE366 0xE7).
> [  981.069845] [drm] register mmio base: 0xF7D00000
> [  981.069845] [drm] register mmio size: 262144
> [  981.069851] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e
> [  981.069852] [drm] probing mlw for device 8086:151 = 261ad03
> [  981.069853] [drm] add ip block number 0 <vi_common>
> [  981.069854] [drm] add ip block number 1 <gmc_v8_0>
> [  981.069855] [drm] add ip block number 2 <tonga_ih>
> [  981.069855] [drm] add ip block number 3 <powerplay>
> [  981.069856] [drm] add ip block number 4 <dm>
> [  981.069856] [drm] add ip block number 5 <gfx_v8_0>
> [  981.069857] [drm] add ip block number 6 <sdma_v3_0>
> [  981.069857] [drm] add ip block number 7 <uvd_v6_0>
> [  981.069858] [drm] add ip block number 8 <vce_v3_0>
> [  981.069861] kfd kfd: skipped device 1002:67df, PCI rejects atomics
> [  981.069868] [drm] UVD is enabled in VM mode
> [  981.069868] [drm] UVD ENC is enabled in VM mode
> [  981.069869] [drm] VCE enabled in VM mode
> [  982.413309] ATOM BIOS: 113-BE366EU-Z48
> [  982.413358] [drm] vm size is 64 GB, 2 levels, block size is 10-bit,
> fragment size is 9-bit
> [  982.413429] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_mc.bin
> [  982.413437] amdgpu 0000:01:00.0: VRAM: 8192M 0x000000F400000000 -
> 0x000000F5FFFFFFFF (8192M used)
> [  982.413438] amdgpu 0000:01:00.0: GTT: 256M 0x0000000000000000 -
> 0x000000000FFFFFFF
> [  982.413446] [drm] Detected VRAM RAM=8192M, BAR=256M
> [  982.413447] [drm] RAM width 256bits GDDR5
> [  982.413562] [TTM] Zone  kernel: Available graphics memory: 7701472 kiB
> [  982.413563] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
> [  982.413564] [TTM] Initializing pool allocator
> [  982.413568] [TTM] Initializing DMA pool allocator
> [  982.413858] [drm] amdgpu: 8192M of VRAM memory ready
> [  982.413859] [drm] amdgpu: 8192M of GTT memory ready.
> [  982.413876] DMAR: 64bit 0000:01:00.0 uses identity mapping
> [  982.413877] [drm] GART: num cpu pages 65536, num gpu pages 65536
> [  982.413910] [drm] PCIE GART of 256M enabled (table at
> 0x000000F400040000).
> [  982.414019] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_pfp_2.bin
> [  982.414033] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_me_2.bin
> [  982.414046] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_ce_2.bin
> [  982.414046] [drm] Chained IB support enabled!
> [  982.414058] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_rlc.bin
> [  982.414138] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_mec_2.bin
> [  982.414240] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_mec2_2.bin
> [  982.415203] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_sdma.bin
> [  982.415220] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_sdma1.bin
> [  982.415397] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_uvd.bin
> [  982.415400] [drm] Found UVD firmware Version: 1.130 Family ID: 16
> [  982.416620] amdgpu 0000:01:00.0: firmware: direct-loading firmware
> amdgpu/polaris10_vce.bin
> [  982.416624] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
> [  982.416988] BUG: unable to handle kernel paging request at
> ffffb9ad1281a2b4
> [  982.416992] PGD 41e921067 P4D 41e921067 PUD 0
> [  982.416995] Oops: 0002 [#1] SMP PTI
> [  982.416997] Modules linked in: amdgpu(+) chash gpu_sched ttm tun fuse
> ebtable_filter ebtables bridge stp llc cpufreq_powersave
> cpufreq_userspace cpufreq_conservative binfmt_misc intel_rapl
> x86_pkg_temp_thermal intel_powerclamp nls_ascii nls_cp437 vfat fat
> coretemp iTCO_wdt iTCO_vendor_support kvm_intel ip6t_REJECT
> nf_reject_ipv6 snd_hda_codec_realtek nf_log_ipv6 kvm amdkfd intel_cstate
> snd_hda_codec_generic efi_pstore intel_uncore xt_hl intel_rapl_perf
> ip6t_rt i915 efivars serio_raw pcspkr snd_hda_codec_hdmi snd_hda_intel
> snd_hda_codec drm_kms_helper snd_hda_core snd_hwdep snd_pcm drm
> snd_timer joydev mei_me nf_conntrack_ipv6 evdev snd sg lpc_ich soundcore
> mei shpchp i2c_algo_bit ie31200_edac nf_defrag_ipv6 video button
> ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG xt_limit
> xt_tcpudp
> [  982.417033]  xt_addrtype nf_conntrack_ipv4 nf_defrag_ipv4
> xt_conntrack ip6table_filter ip6_tables nf_conntrack_netbios_ns
> nf_conntrack_broadcast nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack
> iptable_filter vfio_pci vfio_virqfd vfio_iommu_type1 vfio irqbypass
> sunrpc parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4
> ext4 crc16 mbcache jbd2 fscrypto ecb btrfs zstd_decompress zstd_compress
> xxhash algif_skcipher af_alg dm_crypt raid10 raid456 async_raid6_recov
> async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c
> crc32c_generic raid1 raid0 multipath linear md_mod dm_mod sd_mod
> hid_generic usbhid hid crct10dif_pclmul crc32_pclmul crc32c_intel
> ghash_clmulni_intel pcbc ahci aesni_intel aes_x86_64 crypto_simd libahci
> cryptd psmouse glue_helper i2c_i801 xhci_pci libata ehci_pci xhci_hcd
> [  982.417071]  ehci_hcd scsi_mod alx mdio usbcore usb_common fan
> thermal [last unloaded: chash]
> [  982.417078] CPU: 2 PID: 3332 Comm: modprobe Not tainted
> 4.17.0-0.bpo.1-amd64 #1 Debian 4.17.8-1~bpo9+1
> [  982.417080] Hardware name: Gigabyte Technology Co., Ltd. To be filled
> by O.E.M./B75-D3V, BIOS F9 10/23/2013
> [  982.417142] RIP:
> 0010:smu7_populate_single_firmware_entry.isra.5+0x89/0xe0 [amdgpu]
> [  982.417143] RSP: 0018:ffffb991420d7950 EFLAGS: 00010246
> [  982.417145] RAX: 000000000000008c RBX: 0000000000000003 RCX:
> 0000000000000000
> [  982.417147] RDX: ffffffffc0f68a64 RSI: 0000000000000004 RDI:
> ffff8cafdb9c4360
> [  982.417148] RBP: ffffb9ad1281a2b4 R08: 0000000000000002 R09:
> ffffb991493be000
> [  982.417149] R10: 00000000802a0001 R11: 0000000000000001 R12:
> ffff8cafd698d040
> [  982.417151] R13: ffff8cafa26fe000 R14: 000000000000047e R15:
> 0000000000000003
> [  982.417154] FS:  00007fb5f5737700(0000) GS:ffff8cafee300000(0000)
> knlGS:0000000000000000
> [  982.417155] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  982.417157] CR2: ffffb9ad1281a2b4 CR3: 00000003ee264003 CR4:
> 00000000001606e0
> [  982.417158] Call Trace:
> [  982.417208]  smu7_request_smu_load_fw+0x97/0x320 [amdgpu]
> [  982.417252]  polaris10_start_smu+0x64/0x4c0 [amdgpu]
> [  982.417293]  ? amdgpu_ucode_init_bo+0xe2/0x270 [amdgpu]
> [  982.417341]  pp_hw_init+0x4c/0xd0 [amdgpu]
> [  982.417378]  amdgpu_device_init+0x13c3/0x1490 [amdgpu]
> [  982.417383]  ? kmalloc_order+0x14/0x40
> [  982.417419]  amdgpu_driver_load_kms+0x86/0x2c0 [amdgpu]
> [  982.417433]  drm_dev_register+0x132/0x1c0 [drm]
> [  982.417469]  amdgpu_pci_probe+0x1b5/0x280 [amdgpu]
> [  982.417474]  local_pci_probe+0x44/0xa0
> [  982.417478]  ? _cond_resched+0x16/0x40
> [  982.417481]  pci_device_probe+0x102/0x1b0

This one looks more like "GPU drivers are not good at hotplug
¯\_(ツ)_/¯"

> [  982.417484]  driver_probe_device+0x2b2/0x490
> [  982.417486]  __driver_attach+0xdd/0xe0
> [  982.417489]  ? driver_probe_device+0x490/0x490
> [  982.417491]  bus_for_each_dev+0x67/0xc0
> [  982.417494]  ? klist_add_tail+0x3b/0x70
> [  982.417496]  bus_add_driver+0x16a/0x260
> [  982.417499]  driver_register+0x57/0xc0
> [  982.417501]  ? 0xffffffffc1199000
> [  982.417503]  do_one_initcall+0x4d/0x1c5
> [  982.417506]  ? _cond_resched+0x16/0x40
> [  982.417509]  ? kmem_cache_alloc_trace+0x15d/0x1c0
> [  982.417512]  ? do_init_module+0x22/0x218
> [  982.417515]  do_init_module+0x5b/0x218
> [  982.417518]  load_module.constprop.55+0x2548/0x2d50
> [  982.417521]  ? vfs_read+0x119/0x130
> [  982.417524]  ? __do_sys_finit_module+0xd2/0x100
> [  982.417526]  __do_sys_finit_module+0xd2/0x100
> [  982.417530]  do_syscall_64+0x55/0x110
> [  982.417532]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  982.417535] RIP: 0033:0x7fb5f52ac229
> [  982.417536] RSP: 002b:00007ffe1335d988 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [  982.417538] RAX: ffffffffffffffda RBX: 00005596844ee4c0 RCX:
> 00007fb5f52ac229
> [  982.417540] RDX: 0000000000000000 RSI: 0000559683708638 RDI:
> 0000000000000006
> [  982.417541] RBP: 0000559683708638 R08: 0000000000000000 R09:
> 0000000000000000
> [  982.417542] R10: 0000000000000006 R11: 0000000000000246 R12:
> 0000000000000000
> [  982.417544] R13: 00005596844ef830 R14: 0000000000040000 R15:
> 0000000000000000
> [  982.417545] Code: c0 83 e3 fb 0f 94 c0 66 89 45 18 31 c0 48 8b 4c 24
> 30 65 48 33 0c 25 28 00 00 00 75 5c 48 83 c4 38 5b 5d 41 5c c3 0f b7 44
> 24 02 <66> 89 5d 00 c7 45 0c 00 00 00 00 c7 45 10 00 00 00 00 66 89 45
> [  982.417614] RIP: smu7_populate_single_firmware_entry.isra.5+0x89/0xe0
> [amdgpu] RSP: ffffb991420d7950
> [  982.417615] CR2: ffffb9ad1281a2b4
> [  982.417617] ---[ end trace 095f6331aad830c9 ]---
> 
> 
> Thanks,
> 
> Gary
> 