[vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel

Dan Ziemba zman0900 at gmail.com
Sat Oct 24 02:37:35 UTC 2015


I'm not really sure what the lockup is.  It usually happens while idle
and leaves no trace in the logs.  Sometimes it happens while I'm using
the machine, though: programs suddenly refuse to start, and a few
seconds later all the open ones lock up.  The mouse keeps working, and
sometimes I can even ctrl-alt-backspace to exit X, but it never
restarts.

The reason my linux-vfio-lts package is different from linux-lts is
that it is based on my linux-vfio package, which is just a copy of the
main linux package with patches added.  There are some very odd and
fundamental differences in config between the main linux and linux-lts
packages that don't make any sense to me.  Even when comparing the old
4.1.x versions of linux to current 4.1.x versions of linux-lts, the
differences remain.
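
A minimal sketch of how two kernel config files could be compared option
by option, assuming both are plain-text kernel configs.  The function
names and the sample fragments are made up for illustration; they are
not the actual differences between the two packages.

```python
import re

# "CONFIG_FOO=y" sets an option; "# CONFIG_FOO is not set" disables it.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return {option: value}; options marked 'is not set' map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(a, b):
    """Return {option: (value_in_a, value_in_b)} for options that differ.

    An option absent from one side is reported as None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Made-up fragments standing in for the two real config files.
main_cfg = parse_config("CONFIG_PREEMPT=y\n# CONFIG_DEBUG_INFO is not set\n")
lts_cfg = parse_config("# CONFIG_PREEMPT is not set\nCONFIG_DEBUG_INFO=y\n")

for name, (left, right) in diff_configs(main_cfg, lts_cfg).items():
    print(f"{name}: linux={left} linux-lts={right}")
```

In practice the two strings would be read from the config files shipped
with each package's build files.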

Dan

-----Original Message-----
From: Mark Weiman <mark.weiman at markzz.com>
To: vfio-users at redhat.com
Subject: Re: [vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel
Date: Fri, 23 Oct 2015 22:30:12 -0400

I am currently building 4.1.11-2 for my repository with your suggestion
of using Arch's config files for linux-lts.

I will also post the source package used [1].  This is in case someone
wants to check it or doesn't feel comfortable using my package and
wants to build it himself/herself.

Mark Weiman

[1] http://repo.markzz.com/src/arch/markzz/linux-vfio-lts-4.1.11-2.src.tar.gz

On Sat, 2015-10-24 at 08:28 +0700, Okky Hendriansyah wrote:
> What kind of lockup do you mean? I'm on an ASRock Z87 Extreme6 board
> and I was using your linux-lts-vfio, never had any lockups. I did
> install intel microcode though.
> 
> But then I tried to use the ABS approach with linux-lts and apply the
> patches from your PKGBUILD. Currently I'm using linux-lts with i915
> and ACS patches compiled using ABS and the host system is quite
> stable.
> 
> I noticed there're some diff lines between your linux config and the
> one from linux-lts, have you tried to use the config from official
> linux-lts?
> 
> Best regards,
> Okky Hendriansyah
> 
> > On Oct 24, 2015, at 07:50, Dan Ziemba <zman0900 at gmail.com> wrote:
> > 
> > I just released the 4.1.11 PKGBUILD.  So far so good for me, but it's
> > only been running for a few hours - not really long enough to tell.
> > 
> > I do have ASRock too, but it is on nearly the latest uefi firmware.
> > There is one newer version, but it says the only change is the servers
> > used for online update.
> > 
> > I never got around to setting up the intel microcode updates, so that
> > should probably be my next step.
> > 
> > Dan
> > 
> > -----Original Message-----
> > From: Mark Weiman <mark.weiman at markzz.com>
> > To: vfio-users at redhat.com
> > Subject: Re: [vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel
> > Date: Fri, 23 Oct 2015 18:56:39 -0400
> > 
> > To be honest, ASRock BIOS upgrades are fairly painless because they can
> > be done outside of the operating system, so no need to get an image of
> > FreeDOS ready.  If you do not want to do that, though, I still recommend
> > installing the intel-ucode package if you haven't already.  As of right
> > now, I have no issues running my repository's 4.1.11-1 package.
> > 
> > Mark Weiman
> > 
> > > On Fri, 2015-10-23 at 16:51 -0200, Lucas Kückelhaus wrote:
> > > One thing I noticed is that we all do seem to have ASRock
> > > motherboards, as Mark mentioned. I am hesitant to perform a BIOS
> > > upgrade, however. VT-d is finicky enough as is. I can try 4.1.11
> > > later tonight and see if it helps.
> > > 
> > > Regards,
> > > Lucas Kückelhaus
> > > 
> > > > On 2015-10-23 15:54, Dan Ziemba wrote:
> > > > Well, old systemd and dbus didn't help. System was locked up again
> > > > this morning.  Left the screen on tailing dmesg, but there was
> > > > nothing interesting in the output.  I've got a PKGBUILD for 4.1.11
> > > > coming later today, so maybe that will help.
> > > > 
> > > > Dan
> > > > > On Oct 22, 2015 10:53 PM, "Dan Ziemba" <zman0900 at gmail.com>
> > > > > wrote:
> > > > > 
> > > > > Hey,
> > > > > 
> > > > > I maintain that PKGBUILD. I think I've been having the same
> > > > > problem, but it seems to also happen if I reinstall the older
> > > > > linux-vfio 4.1.6. Here's the latest stack trace I was able to
> > > > > capture: https://i.imgur.com/FZkj4ib.jpg
> > > > > I had to disable the screen timeout so it would stay on all night
> > > > > with dmesg tailing and I found it like this in the morning.
> > > > > Mouse and caps lock still worked, but I couldn't actually do
> > > > > anything and the clock was frozen.
> > > > > 
> > > > > I was also noticing that booting my system was unreliable. If I
> > > > > rebooted several times in a row, once every two to three times
> > > > > it would hang while starting various services and then never
> > > > > start gdm.
> > > > > 
> > > > > Today I tried downgrading systemd and dbus to just before the
> > > > > change that switched to user buses (see here:
> > > > > https://www.archlinux.org/news/d-bus-now-launches-user-buses/).
> > > > > I rebooted a whole bunch of times using 4.1.10 linux-vfio-lts and
> > > > > it seems reliable. I have been using the computer pretty much all
> > > > > day for work and it hasn't had any of the soft lockups yet, but it
> > > > > may be too soon to tell. Most of the time in the past the lockup
> > > > > would happen while idle.
> > > > > 
> > > > > These are the downgrades I made; everything else is up to date as
> > > > > of this morning.
> > > > > 
> > > > > [2015-10-22 12:22] [ALPM] transaction started
> > > > > [2015-10-22 12:22] [ALPM] downgraded libsystemd (227-1 -> 225-1)
> > > > > [2015-10-22 12:22] [ALPM] downgraded libdbus (1.10.0-4 -> 1.10.0-2)
> > > > > [2015-10-22 12:22] [ALPM] downgraded dbus (1.10.0-4 -> 1.10.0-2)
> > > > > [2015-10-22 12:22] [ALPM] downgraded systemd (227-1 -> 225-1)
> > > > > [2015-10-22 12:22] [ALPM] downgraded lib32-systemd (227-1 -> 225-1)
> > > > > [2015-10-22 12:22] [ALPM] downgraded systemd-sysvcompat (227-1 -> 225-1)
> > > > > [2015-10-22 12:22] [ALPM] transaction completed
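
A minimal sketch of pulling the downgrade entries out of pacman log text
like the lines quoted here, e.g. to attach a compact list to a bug
report.  It assumes one log entry per line (not mail-wrapped); the
function name is hypothetical.

```python
import re

# One [ALPM] downgrade entry, for example:
# "[2015-10-22 12:22] [ALPM] downgraded systemd (227-1 -> 225-1)"
_DOWNGRADE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] downgraded "
    r"(?P<pkg>\S+) \((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def downgrades(log_text):
    """Return (package, old_version, new_version) for each downgrade line."""
    found = []
    for line in log_text.splitlines():
        m = _DOWNGRADE.match(line.strip())
        if m:
            found.append((m.group("pkg"), m.group("old"), m.group("new")))
    return found


# A few of the entries from the transaction quoted in this thread.
sample = """\
[2015-10-22 12:22] [ALPM] transaction started
[2015-10-22 12:22] [ALPM] downgraded libsystemd (227-1 -> 225-1)
[2015-10-22 12:22] [ALPM] downgraded dbus (1.10.0-4 -> 1.10.0-2)
[2015-10-22 12:22] [ALPM] transaction completed
"""

for pkg, old, new in downgrades(sample):
    print(f"{pkg}: {old} -> {new}")
```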
> > > > > 
> > > > > I will follow up tomorrow with whether or not it locks up tonight.
> > > > > If we can isolate the problem to systemd or dbus, maybe that's at
> > > > > least good enough for a bug report.
> > > > > 
> > > > > Dan
> > > > > 
> > > > > -----Original Message-----
> > > > > From: Lucas Kückelhaus <lucas at kuckelhaus.com>
> > > > > To: vfio-users at redhat.com
> > > > > Subject: [vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel
> > > > > Date: Thu, 22 Oct 2015 23:00:37 -0200
> > > > > Mailer: Roundcube Webmail/1.0.2
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > I'm trying to run an Archlinux host on kernel 4.1.10-1-vfio-lts
> > > > > (Mark Weiman's custom repo) because I'm unable to boot a
> > > > > GPU-assigned VM on 4.2.3-1-vfio.
> > > > > 
> > > > > The VM boots fine and works for a while, but the computer
> > > > > sporadically crashes with the following:
> > > > > 
> > > > > Oct 22 21:43:37 kvmhost kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]
> > > > > Oct 22 21:43:39 kvmhost kernel: Modules linked in: veth vhost_net vhost macvtap macvlan tun bridge stp llc nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt iTCO_vendor_support nouveau snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp mxm_wmi snd_hda_
> > > > > Oct 22 21:43:39 kvmhost kernel: sch_fq_codel fuse nfsd nfs auth_rpcgss oid_registry nfs_acl lockd grace sunrpc fscache ip_tables x_tables ext4 crc16 mbcache jbd2 dm_mod hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid sd_mod uas usb_storage atkbd libps2 crc32c_intel ah
> > > > > Oct 22 21:43:39 kvmhost kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G L 4.1.10-1-vfio-lts #1
> > > > > Oct 22 21:43:39 kvmhost kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.30 09/21/2012
> > > > > Oct 22 21:43:39 kvmhost kernel: task: ffff88080b119460 ti: ffff88080b124000 task.ti: ffff88080b124000
> > > > > Oct 22 21:43:39 kvmhost kernel: RIP: 0010:[<ffffffff810f6770>] [<ffffffff810f6770>] try_to_del_timer_sync+0x0/0xa0
> > > > > Oct 22 21:43:39 kvmhost kernel: RSP: 0018:ffff88082f303db0 EFLAGS: 00000286
> > > > > Oct 22 21:43:39 kvmhost kernel: RAX: 00000000ffffffff RBX: 0000000000000286 RCX: 0000000000000000
> > > > > Oct 22 21:43:39 kvmhost kernel: RDX: 00000000000000bf RSI: 0000000000000286 RDI: ffff880270fa8428
> > > > > Oct 22 21:43:39 kvmhost kernel: RBP: ffff88082f303dc8 R08: 0000000000002710 R09: ffff88082f30e780
> > > > > Oct 22 21:43:39 kvmhost kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffff88082f303d28
> > > > > Oct 22 21:43:39 kvmhost kernel: R13: ffffffff815f13de R14: ffff88082f303dc8 R15: ffff880270fa8428
> > > > > Oct 22 21:43:39 kvmhost kernel: FS: 0000000000000000(0000) GS:ffff88082f300000(0000) knlGS:0000000000000000
> > > > > Oct 22 21:43:39 kvmhost kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > Oct 22 21:43:39 kvmhost kernel: CR2: 00007fc2d6f6da28 CR3: 000000029c65c000 CR4: 00000000001426e0
> > > > > Oct 22 21:43:39 kvmhost kernel: Stack:
> > > > > Oct 22 21:43:39 kvmhost kernel: ffffffff810f6872 ffff88082f303e38 ffff880270fa8390 ffff88082f303df8
> > > > > Oct 22 21:43:39 kvmhost kernel: ffffffff8152a16f ffff880270fa8390 ffff8805b3bab800 ffff880270d20000
> > > > > Oct 22 21:43:39 kvmhost kernel: 0000000000000001 ffff88082f303e38 ffffffff8152a3e7 ffff88082f3107e0
> > > > > Oct 22 21:43:39 kvmhost kernel: Call Trace:
> > > > > Oct 22 21:43:39 kvmhost kernel: <IRQ>
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff810f6872>] ? del_timer_sync+0x62/0x70
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a16f>] inet_csk_reqsk_queue_drop+0xbf/0x240
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a3e7>] reqsk_timer_handler+0xf7/0x2e0
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a2f0>] ? inet_csk_reqsk_queue_drop+0x240/0x240
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff810f64c8>] call_timer_fn+0x48/0x160
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a2f0>] ? inet_csk_reqsk_queue_drop+0x240/0x240
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff810f6bd4>] run_timer_softirq+0x284/0x330
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81086711>] __do_softirq+0xf1/0x2e0
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81086acd>] irq_exit+0xbd/0xc0
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff815f31d5>] smp_apic_timer_interrupt+0x55/0x70
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff815f13de>] apic_timer_interrupt+0x6e/0x80
> > > > > Oct 22 21:43:39 kvmhost kernel: <EOI>
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81021c1d>] ? native_sched_clock+0x2d/0xa0
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81490c81>] ? cpuidle_enter_state+0xa1/0x250
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81490c53>] ? cpuidle_enter_state+0x73/0x250
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81490e8a>] cpuidle_enter+0x2a/0x30
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff810cb36c>] cpu_startup_entry+0x32c/0x460
> > > > > Oct 22 21:43:39 kvmhost kernel: [<ffffffff81055f7e>] start_secondary+0x19e/0x1e0
> > > > > Oct 22 21:43:39 kvmhost kernel: Code: 4d d8 65 48 33 0c 25 28 00 00 00 44 89 e0 75 0b 48 83 c4 18 5b 41 5c 41 5d 5d c3 e8 1b b8 f8 ff 90 66 2e 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 55 48 89 e5 41 54 53 48 81 ec 30 10 00 00 48 83
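
A minimal sketch of summarizing a report like the one quoted here:
which CPU was stuck, for how long, and which functions appear in the
trace.  It assumes the log text has one entry per line, as the journal
prints it, rather than mail-wrapped; the function name is hypothetical.

```python
import re

# Watchdog header, for example:
# "NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]"
_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+)\]"
)
# Trace frame, for example:
# "[<ffffffff8152a3e7>] reqsk_timer_handler+0xf7/0x2e0"
# A leading "?" marks an address found on the stack that the unwinder
# could not confirm as part of the call chain.
_FRAME = re.compile(r"\[<[0-9a-f]+>\] (?P<guess>\? )?(?P<sym>[\w.]+)\+0x")


def summarize(log_text):
    """Return (lockups, frames) found in the given kernel log text.

    lockups: list of (cpu, seconds, task)
    frames:  list of (symbol, reliable); a "RIP:" line, if present in
             the text, is matched as a frame as well.
    """
    lockups = [
        (int(m.group("cpu")), int(m.group("secs")), m.group("task"))
        for m in _LOCKUP.finditer(log_text)
    ]
    frames = [
        (m.group("sym"), m.group("guess") is None)
        for m in _FRAME.finditer(log_text)
    ]
    return lockups, frames


# A few lines taken from the report in this thread.
sample = """\
Oct 22 21:43:37 kvmhost kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]
Oct 22 21:43:39 kvmhost kernel: [<ffffffff810f6872>] ? del_timer_sync+0x62/0x70
Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a16f>] inet_csk_reqsk_queue_drop+0xbf/0x240
Oct 22 21:43:39 kvmhost kernel: [<ffffffff8152a3e7>] reqsk_timer_handler+0xf7/0x2e0
"""

lockups, frames = summarize(sample)
for cpu, secs, task in lockups:
    print(f"CPU {cpu} stuck for {secs}s in {task}")
for sym, reliable in frames:
    print(("  " if reliable else "? ") + sym)
```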
> > > > > 
> > > > > This happens for all cores and it locks up the entire system. I
> > > > > don't know what to do. On 4.2.3-1-vfio I have no hangups and all
> > > > > my non-vfio VMs work perfectly fine.
> > > > > 
> > > > > Thank you,
> > > > > Lucas Kückelhaus
> > > > > 
> > > > > _______________________________________________
> > > > > vfio-users mailing list
> > > > > vfio-users at redhat.com
> > > > > https://www.redhat.com/mailman/listinfo/vfio-users