[vfio-users] Bus reset trouble with Titan-X

Alex Williamson alex.williamson at redhat.com
Wed Oct 19 15:50:02 UTC 2016


On Wed, 19 Oct 2016 10:00:57 -0500
Kevin Vasko <kvasko at gmail.com> wrote:

> Sure thing. I'm attaching all of the logs I have to let you get a bigger
> picture (and anyone that might run into a similar issue). Hopefully I
> didn't mess anything up.
> 
...

Here's the bit I was curious about:

> #showing parent bridge of a device that has a failed
> #:lspci -vvvs 03:00
> 03:00.0 PCI bridge: PLX Technology, Inc. Device 8796 (rev ab) (prog-if 00 [Normal decode])
...
> LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <4us, L1 <8us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM Disabled; Disabled- CommClk-
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-


The Link Status shows that the link is in Gen1 mode (2.5GT/s) at x0
width, so it failed to return to a working state after the bus reset.
One possible hint: the Slot Status register shows that the Presence
Detect Changed bit got set, while the Presence Detect State bit remains
1, indicating that a card is still present.  However, Presence Detect
Changed Enable is not set in the Slot Control register, so the OS is
never notified of this change.
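To make the bit-level reading above a little more concrete, here is a small Python sketch that decodes raw 16-bit Link Status, Slot Status and Slot Control values into the fields lspci summarizes. The bit positions follow the PCI Express capability register layout; the helper names, the speed table (which assumes the usual 1/2/3 = 2.5/5/8 GT/s encoding) and the raw sample values are hypothetical and illustrative, not taken from the attached logs.

```python
# Hypothetical helpers: decode raw PCIe register words roughly the way
# lspci summarizes them. Bit positions follow the PCIe capability layout.

# Assumed Current Link Speed encoding (common 1/2/3 mapping).
LINK_SPEEDS = {1: "2.5GT/s", 2: "5GT/s", 3: "8GT/s"}


def decode_lnksta(raw: int) -> dict:
    """Decode the Link Status word (offset 0x12 in the express capability)."""
    return {
        "speed": LINK_SPEEDS.get(raw & 0xF, "unknown"),  # bits 3:0
        "width": (raw >> 4) & 0x3F,                       # bits 9:4
        "training": bool(raw & (1 << 11)),                # Link Training
        "dl_active": bool(raw & (1 << 13)),               # DL Link Active
    }


def decode_sltsta(raw: int) -> dict:
    """Decode the Slot Status word (offset 0x1a in the express capability)."""
    return {
        "presdet_changed": bool(raw & (1 << 3)),  # RW1C event latch
        "presdet_state": bool(raw & (1 << 6)),    # 1 = card present
    }


def presdet_notify_enabled(sltctl: int) -> bool:
    """Presence Detect Changed Enable is bit 3 of Slot Control (offset 0x18)."""
    return bool(sltctl & (1 << 3))


# Illustrative raw values resembling the failed port: Gen1 x0, card present,
# change latched, but no notification enabled.
print(decode_lnksta(0x0001))
print(decode_sltsta(0x0048))
print(presdet_notify_enabled(0x0000))
```

With these sample words the decoder reports a 2.5GT/s link of width 0 and a latched presence change on a slot that still reports a card, which is the combination described above.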

I wonder what would happen if we cleared the Presence Detect Changed
bit and tried to retrain the link.  The express capability is at 0x68
and the Slot Status register is at offset 0x1a within it, i.e. 0x82 in
config space.  Bit 3 is the Presence Detect Changed bit and it's RW1C
(readable, write 1 to clear).  Therefore, to clear the bit we could do:

setpci -s 3:00.0 82.w=8:8
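The arithmetic behind that command can be checked with a short Python sketch. It computes the config offset, formats the argument in setpci's `reg.w=value:mask` syntax, and models what the hardware does with the word that gets written. It assumes setpci implements `value:mask` as a read-modify-write. The helper names and the sample register value are hypothetical; the RW1C bit set (bits 0-4 and 8 of Slot Status) follows the PCIe register layout.

```python
# Hypothetical helpers reproducing the arithmetic behind
# "setpci -s 3:00.0 82.w=8:8" (assumes value:mask is a read-modify-write).

PCIE_CAP = 0x68        # express capability offset on this bridge (from lspci)
SLTSTA = 0x1A          # Slot Status offset within the capability
PDC = 1 << 3           # Presence Detect Changed (RW1C)
PDS = 1 << 6           # Presence Detect State (read-only)
SLTSTA_RW1C = 0x011F   # RW1C bits in Slot Status: 0-4 and 8


def setpci_word_arg(offset: int, value: int, mask: int) -> str:
    """Format a 16-bit write in setpci's 'reg.w=value:mask' syntax."""
    return f"{offset:x}.w={value:x}:{mask:x}"


def masked_write(old: int, value: int, mask: int) -> int:
    """Word sent by a value:mask write: masked bits from value, rest from old."""
    return (old & ~mask & 0xFFFF) | (value & mask)


def apply_rw1c(reg: int, written: int) -> int:
    """Model the register: a 1 written to an RW1C bit clears it, RO bits ignore writes."""
    return reg & ~(written & SLTSTA_RW1C) & 0xFFFF


offset = PCIE_CAP + SLTSTA                 # 0x82 in config space
print(setpci_word_arg(offset, PDC, PDC))   # argument for setpci

before = PDS | PDC                         # illustrative: card present, change latched
after = apply_rw1c(before, masked_write(before, PDC, PDC))
print(f"{before:#06x} -> {after:#06x}")    # PDC cleared, PDS untouched
```

The model shows why a plain 1 in bit 3 is enough: the read-only Presence Detect State bit is written back unchanged and ignored, while the latched Presence Detect Changed bit is cleared.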

Then rerun lspci -vvvs 3:00.0 and check whether

SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
	Changed: MRL- PresDet+ LinkState-
                      ^^^^^^^^

still reports + or -, and possibly whether the link has decided to
retrain.  To force a retrain we need to set bit 5 (Retrain Link) in the
Link Control register, offset 0x10 within the capability (0x68 + 0x10 = 0x78):

setpci -s 3:00.0 78.w=20:20

Recheck lspci to see if there's any progress.
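Automating that recheck might look like the following Python sketch. It is hypothetical: it shells out to setpci (root required), assumes the express capability sits at 0x68 as on this PLX bridge, sets Retrain Link, then polls the Link Training bit (Link Status bit 11, config offset 0x7a) until it clears or a timeout expires, and returns the raw speed code and negotiated width. The `run` parameter is injectable so the logic can be exercised without hardware.

```python
import subprocess
import time

# Hypothetical sketch: force a retrain on the bridge, then poll Link Status.
# Assumes the express capability is at 0x68, as lspci showed for this bridge.
PCIE_CAP = 0x68
LNKCTL = PCIE_CAP + 0x10      # 0x78: Link Control
LNKSTA = PCIE_CAP + 0x12      # 0x7a: Link Status
RETRAIN_LINK = 1 << 5         # Link Control bit 5
LINK_TRAINING = 1 << 11       # Link Status bit 11


def run_setpci(dev: str, arg: str) -> str:
    """Invoke setpci for one register access (needs root); return stdout."""
    result = subprocess.run(["setpci", "-s", dev, arg],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


def retrain_and_check(dev="3:00.0", run=run_setpci, timeout=1.0, poll=0.01):
    """Set Retrain Link, then wait for Link Training to clear (or time out).

    Returns (speed_code, width) from Link Status; a width of 0 means the
    link still has not come up.
    """
    run(dev, f"{LNKCTL:x}.w={RETRAIN_LINK:x}:{RETRAIN_LINK:x}")  # 78.w=20:20
    deadline = time.monotonic() + timeout
    while True:
        sta = int(run(dev, f"{LNKSTA:x}.w"), 16)  # read the Link Status word
        if not (sta & LINK_TRAINING) or time.monotonic() >= deadline:
            return sta & 0xF, (sta >> 4) & 0x3F
        time.sleep(poll)
```

Assuming the usual speed encoding, a result of (1, 0) would match the Gen1 x0 state quoted above, while a healthy port at 8GT/s x16 would report (3, 16).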

... 
> #showing parent device that has a NON failed device
> #: lspci -vvvs 03:08
> 03:08.0 PCI bridge: PLX Technology, Inc. Device 8796 (rev ab) (prog-if 00 [Normal decode])
...
> LnkCap: Port #8, Speed 8GT/s, Width x16, ASPM not supported, Exit Latency L0s <4us, L1 <8us
> ClockPM- Surprise- LLActRep- BwNot-
> LnkCtl: ASPM Disabled; Disabled- CommClk-
> ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

In this case the link has retrained to Gen3 x16 and, as expected, the
downstream devices are accessible.  The Presence Detect Changed bit is
clear (-) on this port.  Thanks,

Alex



