[vfio-users] Host crash when assigning 4 NICs to 4 VMs separately

Mon Jun 25 09:56:39 UTC 2018

Hi,

Recently a colleague of mine hit a host kernel crash while trying to assign 4 NICs to 4 VMs, one NIC per VM.
Unfortunately he did not collect the related logs, so for now all we have is the dmesg output from the core dump.

Here is the info:

linux:~ # lspci | grep -i eth
02:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
02:00.1 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
02:00.2 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
02:00.3 Ethernet controller: Broadcom Limited NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
81:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
81:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
82:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
82:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

He used the last four NICs (the Intel 82599ES ports at 81:00.0, 81:00.1, 82:00.0 and 82:00.1).
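
VFIO assigns devices per IOMMU group, so it may help to record which group each of the four ports belongs to. The following is only a minimal sketch. It assumes the standard sysfs layout (/sys/bus/pci/devices/<address>/iommu_group) and that the ports are in PCI domain 0000, which the lspci output above does not show. The function names and the SYSFS_ROOT variable are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Sketch: look up the IOMMU group of PCI devices through sysfs.
# SYSFS_ROOT is a hypothetical override (defaults to /sys) so the lookup
# can be pointed at a copy of the tree instead of the live host.

# iommu_group_of ADDR
# Prints the group number for a full PCI address such as 0000:81:00.0,
# or "none" (and returns 1) if the device has no iommu_group link.
iommu_group_of() {
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/iommu_group"
    if [ -L "$link" ]; then
        # The symlink target ends in the group number,
        # e.g. ../../../kernel/iommu_groups/42
        basename "$(readlink "$link")"
    else
        echo "none"
        return 1
    fi
}

# report_iommu_groups ADDR...
# Prints one "ADDR: group N" line per address given.
report_iommu_groups() {
    for dev in "$@"; do
        echo "$dev: group $(iommu_group_of "$dev")"
    done
}
```

Calling `report_iommu_groups 0000:81:00.0 0000:81:00.1 0000:82:00.0 0000:82:00.1` on the host would print one line per port. Ports that report the same group number cannot be handed to different VMs independently.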

Dmesg:

[ 3449.519354] general protection fault: 0000 [#1] SMP
[ 3449.682056] CPU: 8 PID: 26794 Comm: qemu-kvm Tainted: G           OE  ---- -------   3.10.0-514.44.5.10_44.x86_64 #1
[ 3449.692900] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 3.87 02/02/2018
[ 3449.700115] task: ffff880e63172f10 ti: ffff880de5424000 task.ti: ffff880de5424000
[ 3449.707932] RIP: 0010:[<ffffffff8156a6fc>]  [<ffffffff8156a6fc>] domain_remove_one_dev_info+0x9c/0x250
[ 3449.717586] RSP: 0018:ffff880de5427c88  EFLAGS: 00010093
[ 3449.723064] RAX: 0000000000000246 RBX: dead000000000100 RCX: ffff88203e49c258
[ 3449.730359] RDX: dead000000000100 RSI: 0000000000000001 RDI: ffff88203e46da40
[ 3449.737656] RBP: ffff880de5427cd8 R08: 0000000000000001 R09: 000000018040003c
[ 3449.744954] R10: 000000003e46da01 R11: ffffea0080f91b40 R12: ffff88203e46da40
[ 3449.752250] R13: ffff88203e49c240 R14: ffff88203ebe3098 R15: ffff88017fd17200
[ 3449.759548] FS:  00007fb9304e6c00(0000) GS:ffff88203f280000(0000) knlGS:0000000000000000
[ 3449.767966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3449.773877] CR2: 00000000004bc020 CR3: 0000001de0324000 CR4: 00000000001627e0
[ 3449.781173] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3449.788469] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 3449.795768] Call Trace:
[ 3449.798387]  [<ffffffff8156d6b9>] intel_iommu_attach_device+0x209/0x240
[ 3449.805159]  [<ffffffff8155beb1>] __iommu_attach_device+0x21/0x80
[ 3449.811418]  [<ffffffff8155d174>] __iommu_attach_group+0x54/0x80
[ 3449.817590]  [<ffffffff8155d1cb>] iommu_attach_group+0x2b/0x40
[ 3449.823592]  [<ffffffffa0222268>] vfio_iommu_type1_attach_group+0x1c8/0x652 [vfio_iommu_type1]
[ 3449.832542]  [<ffffffffa01f4aea>] vfio_fops_unl_ioctl+0x1ba/0x300 [vfio]
[ 3449.839398]  [<ffffffff81229cd8>] do_vfs_ioctl+0x2e8/0x4d0
[ 3449.845046]  [<ffffffff81234c07>] ? __fd_install+0x47/0x60
[ 3449.850686]  [<ffffffff81229f61>] SyS_ioctl+0xa1/0xc0
[ 3449.855908]  [<ffffffff816c22ef>] system_call_fastpath+0x1c/0x21
[ 3449.862075] Code: 39 cb 48 8b 13 48 89 df 74 2d 49 89 dc 48 89 d3 4d 39 7c 24 30 75 e8 0f b6 75 ce 41 38 74 24 20 74 5d 48 39 cb 41 b8 01 00 00 00 <48> 8b 13 48 89 df 75 d7 0f 1f 40 00 48 89 c6 48 c7 c7 80 d0 fd
[ 3449.882792] RIP  [<ffffffff8156a6fc>] domain_remove_one_dev_info+0x9c/0x250
[ 3449.889940]  RSP <ffff880de5427c88>
[ 3449.894065] ---[ end trace e389931a63bcab52 ]---
[ 3450.471060] Kernel panic - not syncing: Fatal exception
[ 3451.512156] Shutting down cpus with NMI
[ 3452.068904] die even has been record!

Any suggestions would be appreciated!

Thanks,
Zongyong Wu
