[vfio-users] Bus reset trouble with Titan-X

Kevin Vasko kvasko at gmail.com
Wed Oct 19 18:47:11 UTC 2016


Thanks for the information. I didn't even notice the Presence Detect
Changed bit difference (granted, that is mostly due to not knowing what to
look for and being a little over my head at this point).

I wouldn't expect a different card to make a difference, but at this point
I'm out of things to try on my end. As for trying a non-NVIDIA card, we
don't have any available that I'm aware of, so unfortunately I wouldn't be
able to test that.

I'm not very familiar with the PLX technology, and definitely not sure what
the manufacturer might have done with this particular board (e.g. whether
this is a problem with the firmware on the chip, a problem they introduced
with their implementation, or just a bad board). I'm just thinking out
loud.

But no matter, I think I now have enough information to give to the
manufacturer, and they should be able to diagnose the problem from here.

I'll report back with what they suggest for a resolution.

Thanks again for your help, I really appreciate it. I'm not sure if
supporting people on this mailing list is part of your daily job, but if it
would help you out, send me an email directly with your manager's name and I
would be more than happy to send them some feedback.

Thanks again,

-Kevin




On Wed, Oct 19, 2016 at 12:44 PM, Alex Williamson <
alex.williamson at redhat.com> wrote:

> On Wed, 19 Oct 2016 12:16:30 -0500
> Kevin Vasko <kvasko at gmail.com> wrote:
>
> > Ah, ok. My bad.
> >
> >
> > Ran
> >
> > #: setpci -s 3:00.0 82.w=8:8
> >
> > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> >             Changed: MRL- PresDet- LinkState-
> >
> > #: setpci -s 3:00.0 78.w=20:20
> >
> > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> >            Changed: MRL- PresDet- LinkState-
> >
> >
> > When I run lspci -vvs 3:00.0 it is currently in this state
> >
> > SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
> >            Changed: MRL- PresDet- LinkState-
> >
> > I didn't realize that I needed to look at "PresDet", sorry. It does
> > look like it is different than before, so I assume the setpci commands
> > changed it somewhere.
> >
> > The device (GPU) is still in the "(rev ff) (prog-if ff)" state.
>
> Ok, it was a long shot; the Presence Detect Changed bit really should
> not have any effect on re-establishing the link. It was just a notable
> difference between the working and non-working examples.
>
> > Do you think this could be a GPU issue? I have not tried a different
> > GPU in the system. Would it be worthwhile trying an NVidia M4000 to see
> > if I get the same results or do you think there is a problem with the
> > PLX Riser?
>
> I can only speculate here, but I wouldn't expect PCIe link
> characteristics to be significantly different between consumer and
> workstation class cards.  If you have one on hand, it certainly doesn't
> hurt to try though.  Perhaps performing the same test with a non-NVIDIA
> card installed might be more enlightening, preferably a card with
> similar PCIe width and speed, but any sort of data point might be
> useful.
>
> I will note that NVIDIA does make use of PLX PCIe switches on some of
> their devices, both the GRID K1 and Tesla M60 (probably others as well)
> make use of a PLX PEX 8747 switch to pack multiple GPUs onto a single
> card.  So there might be a reasonable expectation of PLX switches
> working with NVIDIA devices.  What sort of tuning or special
> configuration NVIDIA does on those since the switch is onboard the
> card, I have no idea.  Thanks,
>
> Alex
>
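
For reference, the two setpci writes quoted above can be read against the
PCIe register layout. The sketch below is illustrative only. It assumes the
PCI Express capability of the port at 3:00.0 sits at offset 0x68, which is
inferred from the offsets used (0x68 + 0x1a = 0x82 for Slot Status,
0x68 + 0x10 = 0x78 for Link Control) and is not stated anywhere in the
thread. Under that assumption, 82.w=8:8 writes a 1 to the write-1-to-clear
Presence Detect Changed bit, and 78.w=20:20 sets the Retrain Link bit. The
helper names are hypothetical, and the code only models the register bits;
it does not touch hardware.

```python
# Slot Status register bits as defined by the PCIe Base Specification.
# The key names are chosen here for readability (hypothetical labels).
SLTSTA_BITS = {
    "AttnBtn": 0x0001,           # Attention Button Pressed (RW1C)
    "PowerFlt": 0x0002,          # Power Fault Detected (RW1C)
    "MRLChanged": 0x0004,        # MRL Sensor Changed (RW1C)
    "PresDetChanged": 0x0008,    # Presence Detect Changed (RW1C)
    "CmdCplt": 0x0010,           # Command Completed (RW1C)
    "MRL": 0x0020,               # MRL Sensor State (read-only)
    "PresDet": 0x0040,           # Presence Detect State (read-only)
    "Interlock": 0x0080,         # Electromechanical Interlock Status (read-only)
    "LinkStateChanged": 0x0100,  # Data Link Layer State Changed (RW1C)
}

# Bits that are cleared by writing a 1 to them.
RW1C_MASK = 0x011F

# Link Control register, bit 5: Retrain Link (the 0x20 in "78.w=20:20").
LNKCTL_RETRAIN_LINK = 0x0020


def decode_sltsta(value):
    """Return a dict mapping each Slot Status flag name to True/False."""
    return {name: bool(value & mask) for name, mask in SLTSTA_BITS.items()}


def clear_rw1c(status, written):
    """Model a write to Slot Status: each 1 written to an RW1C bit clears it.

    Read-only bits (such as Presence Detect State) are left untouched.
    """
    return status & ~(written & RW1C_MASK)


# Example: card present (PresDet+) with Presence Detect Changed set.
before = 0x0048
after = clear_rw1c(before, 0x0008)  # models "setpci ... 82.w=8:8"
```

With these values, decode_sltsta(before) reports both PresDet and
PresDetChanged as set, while decode_sltsta(after) reports PresDet set and
PresDetChanged clear, which matches the "PresDet+ ... Changed: PresDet-"
state in the lspci output quoted above.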