[vfio-users] Lost link when pass through rtl8168 to guest

Wei Xu wexu at redhat.com
Fri Sep 23 06:52:46 UTC 2016


On 2016-09-21 22:50, Alex Williamson wrote:
> On Wed, 21 Sep 2016 14:04:20 +0800
> Wei Xu <wexu at redhat.com> wrote:
>
>> On 2016-09-21 13:41, Wei Xu wrote:
>>   > On 2016-09-21 12:31, Alex Williamson wrote:
>>   >> On Wed, 21 Sep 2016 11:52:31 +0800
>>   >> Wei Xu <wexu at redhat.com> wrote:
>>   >>
>>   >>> On 2016-09-21 02:59, Nick Sarnie wrote:
>>   >>>> Hi Wei,
>>   >>>>
>>   >>>> My system is a desktop, so it must just be a general Gigabyte BIOS bug.
>>   >>>> I submitted a help ticket about this issue and just gave a brief
>>   >>>> explanation and then sent Alex's explanation. Hopefully it will be
>>   >>>> escalated correctly.
>>   >>>
>>   >>> Thanks for your feedback, i'm also using a Gigabyte board, i have
>>   >>> checked out the firmware update history and updated my firmware to the
>>   >>> latest one which was released at March, looks it's a long way to get a
>>   >>> feedback for this issue from them.
>>   >>>
>>   >>> Alex,
>>   >>> It's a hard time for us to do nothing but wait, the reason why i use my
>>   >>> desktop is i got a com console on it, so it's quite convenient to
>>   >>> debugging kernel via kgdb, and i want to keep my realtek nic for ssh
>>   >>> access from my notebook, anyway to workaround it to just bypass the
>>   >>> wireless nic only as a temporary experiment?
>>   >>>
>>   >>> I'm trying VirtIO DMAR patch with vIOMMU in the guest recently, which
>>   >>> need pass through a pcie unit from host, and one more virtio nic for the
>>   >>> guest due to the feedbacks, maybe i can pass through a device in other
>>   >>> groups instead of a nic?
>>   >>
>>   >> Sure, but skylake platforms are notoriously bad for their lack of
>>   >> device isolation, even things like USB controllers and audio devices
>>   >> are now part of multifunction packages that do not expose isolation
>>   >> through ACS.  If you can't resolve the IOMMU grouping otherwise, your
>>   >> choices are as I told Nick in the other thread:
>>   >>
>>   >>    "Your choices are to run an unsupported (and unsupportable)
>>   >>    configuration using the ACS override patch, get your hardware vendor
>>   >>    to fix their platform, or upgrade to better hardware with better
>>   >>    isolation characteristics."
>>   >>
>>   >> It's unfortunate that Intel provides VT-d on consumer platforms without
>>   >> sufficient device isolation to really make it usable, but that's often
>>   >> the state of things.  The workstation and server class platforms,
>>   >> supporting Xeon E5 or High End Desktop Processors provide the necessary
>>   >> isolation.  Thanks,
>>   >
>>   > Yes, fortunately i get it solved finally, i tried adding the 'r8169'
>>   > driver to the kernel group whitelist behind 'pci-stub' and recompile  &
>>   > update the kernel firstly, and the VM boot up successfully, but a map
>>   > page to iova error for realtek nic during DMA crashed the system later,
>>   > looks it was caused by the group dependency, i remembered the vfio doc
>>   > tells the group is the minimum isolation unit.
>
> This approach is just a bad idea.
>
>>   >
>>   > Then i found there are 3 pci bridges on my board, 2 of them are with a
>>   > group, another is a separate group, after plug the iwl wlan nic to this
>>   > one, everything works well.
>>
>> Just noticed a topology change of my system, looks the PCI bridges is
>> different as before after i changed the slot for my wlan nic, i used to
>> think i plugged it to 00:1d.0 but it was connected to Sky Lake PCIe
>> controller, does this mean there are hidden PCI bridges for pci
>> enumeration in the system, is this allowable?
>>
>> Before:
>> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root
>> Port #5 (rev f1) (prog-if 00 [Normal decode])
>> 00:1c.7 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root
>> Port #8 (rev f1) (prog-if 00 [Normal decode]) ------------ wlan nic
>> 00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root
>> Port #9 (rev f1) (prog-if 00 [Normal decode])
>>
>> Now:
>> 00:01.0 PCI bridge: Intel Corporation Sky Lake PCIe Controller (x16)
>> (rev 07) (prog-if 00 [Normal decode]) ------------ wlan nic
>> 00:1c.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root
>> Port #5 (rev f1) (prog-if 00 [Normal decode])
>> 00:1d.0 PCI bridge: Intel Corporation Sunrise Point-H PCI Express Root
>> Port #9 (rev f1) (prog-if 00 [Normal decode])
>
> There are generally two sources of PCIe root ports on Intel systems,
> the processor itself and the PCH (Platform Controller Hub).  Look at a
> block diagram for a modern system and you'll see this.  Typically for a
> client processor (i3/i5/i7) there is no isolation between or
> downstream of the individual processor root ports and isolation between
> the individual PCH root ports is via quirks, because Intel didn't
> include ACS or broke ACS.  You've found these processor root ports.
> Why don't they show up in lspci when nothing is plugged into them?  Why
> should they?  Chances are almost certain that your system does not
> support PCI hotplug, so there's no requirement to expose empty
> bridges.  I'm glad you've found a working setup, desktop class systems
> often have poor isolation characteristics which make device assignment
> difficult.  Thanks,

Thanks for the explanation. I googled a few docs about it, and they were
really helpful.

I still have a few questions.

Q1:
 > there is no isolation between or downstream of the individual
 > processor root ports
Normally the processor root ports offer a higher speed than the PCH root
ports. Does 'no isolation' for the processor root ports mean there is no
ACS on those root ports? AFAIK the physical address has already been
translated to an IOVA before it is programmed into the device, so how can
the root port forward TLPs between devices directly? Through the IOTLB
cache?
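
For context, the practical effect of missing isolation shows up in the
IOMMU grouping: devices that cannot be isolated from each other land in
the same group. A minimal Python sketch for listing the groups, assuming
the standard sysfs layout under /sys/kernel/iommu_groups (the function
name list_iommu_groups is only an illustration, not an existing tool):

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [PCI addresses]} as read from sysfs."""
    groups = {}
    if not os.path.isdir(root):
        # No such directory: IOMMU is disabled or not supported.
        return groups
    for name in os.listdir(root):
        dev_dir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(dev_dir):
            groups[int(name)] = sorted(os.listdir(dev_dir))
    return groups


if __name__ == "__main__":
    for group, devices in sorted(list_iommu_groups().items()):
        print(f"group {group}: {' '.join(devices)}")
```

If two endpoints behind different root ports show up in the same group,
the platform is not exposing isolation between those ports.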

Q2:
 > isolation between the individual PCH root ports is via quirks,
 > because Intel didn't include ACS or broke ACS.
I'm a little confused by 'Intel didn't include ACS'. Does 'Intel' here mean
the processor root ports, the root complex, or the PCH?

In my case, to support isolation between the individual PCH root ports,
two conditions must be satisfied at the same time:

- The BIOS reports the ACS capability for the devices.
- The quirks you listed before are applied to correct the awareness of
   the capability.

Are there any other quirks needed?
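
As a rough way to check the first condition, the sketch below scans
`lspci -vvv` output for the 'Access Control Services' extended capability.
It assumes lspci prints that capability name and is run as root so that
extended capabilities are visible; the helper name devices_with_acs is
hypothetical. It only shows whether ACS is advertised, not whether a
kernel quirk is applied.

```python
import shutil
import subprocess


def devices_with_acs(lspci_text):
    """Return PCI addresses whose capability list mentions ACS."""
    found = []
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            # Unindented line starts a new device, e.g. "00:1c.0 PCI bridge: ..."
            current = line.split()[0]
        elif current and "Access Control Services" in line:
            if current not in found:
                found.append(current)
    return found


if __name__ == "__main__":
    # Only try to run lspci when it is actually installed.
    if shutil.which("lspci"):
        out = subprocess.run(["lspci", "-vvv"],
                             capture_output=True, text=True).stdout
        print("\n".join(devices_with_acs(out)))
```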

Q3: This question is not related to vfio, but it has confused me for a
while. Could you please try to answer it, or point me to a helpful spec?

For the multi-socket case, e.g. a system with 2 CPU sockets, there should
be 2 root complexes. How are the root complexes connected to the system
PCIe bus? Normally there should be only one PCH in the system, right? Then
how is the PCH connected to the root complexes? Does only one of them have
an upstream port, or is it something else?

Suppose device A is connected to a root port of processor 0. If the target
DMA address is in memory attached to processor 1, what does the DMA/TLP
look like in flight?
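
To at least see which socket a device hangs off, Linux exposes a numa_node
attribute per PCI device in sysfs. A small sketch, assuming that attribute
is present (it reads -1 when the platform does not report a node);
pci_numa_node is a name made up for this example, and it says nothing
about how the TLP is routed between sockets:

```python
import os
import sys


def pci_numa_node(bdf, root="/sys/bus/pci/devices"):
    """Return the NUMA node of a PCI device, or None if unknown."""
    path = os.path.join(root, bdf, "numa_node")
    try:
        with open(path) as f:
            node = int(f.read().strip())
    except (OSError, ValueError):
        return None
    # The kernel reports -1 when no node information is available.
    return node if node >= 0 else None


if __name__ == "__main__":
    # Usage: python3 numa_node.py 0000:01:00.0 [more addresses...]
    for bdf in sys.argv[1:]:
        print(bdf, pci_numa_node(bdf))
```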

Much appreciated.

Wei
>
> Alex
>



