[vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel

Dan Ziemba zman0900 at gmail.com
Fri Oct 23 02:53:03 UTC 2015


Hey,

I maintain that PKGBUILD.  I think I've been having the same problem,
but it also seems to happen if I reinstall the older linux-vfio 4.1.6.
Here's the latest stack trace I was able to capture:
https://i.imgur.com/FZkj4ib.jpg
I had to disable the screen timeout so the display would stay on all
night with dmesg tailing, and I found it like this in the morning.
The mouse and caps lock still worked, but I couldn't actually do
anything and the clock was frozen.

I also noticed that booting my system was unreliable.  When I rebooted
several times in a row, about one boot in every two or three would
hang while starting various services and never start gdm.

Today I tried downgrading systemd and dbus to the versions just before
the change that switched to user buses (see
https://www.archlinux.org/news/d-bus-now-launches-user-buses/).
I rebooted a whole bunch of times using 4.1.10 linux-vfio-lts and it
seems reliable.  I have been using the computer pretty much all day for
work and it hasn't had any soft lockups yet, but it may be too soon to
tell.  In the past, the lockups mostly happened while the machine was
idle.

These are the downgrades I made; everything else is up to date as of
this morning.

[2015-10-22 12:22] [ALPM] transaction started
[2015-10-22 12:22] [ALPM] downgraded libsystemd (227-1 -> 225-1)
[2015-10-22 12:22] [ALPM] downgraded libdbus (1.10.0-4 -> 1.10.0-2)
[2015-10-22 12:22] [ALPM] downgraded dbus (1.10.0-4 -> 1.10.0-2)
[2015-10-22 12:22] [ALPM] downgraded systemd (227-1 -> 225-1)
[2015-10-22 12:22] [ALPM] downgraded lib32-systemd (227-1 -> 225-1)
[2015-10-22 12:22] [ALPM] downgraded systemd-sysvcompat (227-1 -> 225-1)
[2015-10-22 12:22] [ALPM] transaction completed
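
For anyone who wants to compare their own package history against the
list above, here is a minimal Python sketch that pulls the downgrade
entries out of pacman log text.  It assumes the log lines look exactly
like the ones quoted above (pacman normally writes them to
/var/log/pacman.log); the function name list_downgrades and the sample
lines are only for illustration.

```python
import re

# Matches lines such as:
# [2015-10-22 12:22] [ALPM] downgraded systemd (227-1 -> 225-1)
DOWNGRADE = re.compile(
    r"^\[(?P<when>[^\]]+)\] \[ALPM\] downgraded "
    r"(?P<pkg>\S+) \((?P<old>\S+) -> (?P<new>\S+)\)$"
)


def list_downgrades(lines):
    """Return (package, old_version, new_version) for each downgrade line."""
    found = []
    for line in lines:
        match = DOWNGRADE.match(line.strip())
        if match:
            found.append((match.group("pkg"), match.group("old"), match.group("new")))
    return found


# Illustrative input copied from the log excerpt in this message.
sample_lines = [
    "[2015-10-22 12:22] [ALPM] transaction started",
    "[2015-10-22 12:22] [ALPM] downgraded libsystemd (227-1 -> 225-1)",
    "[2015-10-22 12:22] [ALPM] downgraded dbus (1.10.0-4 -> 1.10.0-2)",
    "[2015-10-22 12:22] [ALPM] transaction completed",
]

for pkg, old, new in list_downgrades(sample_lines):
    print(f"{pkg}: {old} -> {new}")
```

To run it against a real log, pass an open file object instead of
sample_lines; lines that are not downgrades are skipped.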

I will follow up tomorrow with whether or not it locks up tonight.  If
we can isolate the problem to systemd or dbus, maybe that's at least
good enough for a bug report.

Dan

-----Original Message-----
From: Lucas Kückelhaus <lucas at kuckelhaus.com>
To: vfio-users at redhat.com
Subject: [vfio-users] Soft lockup on archlinux 4.1.10-1-vfio-lts kernel
Date: Thu, 22 Oct 2015 23:00:37 -0200
Mailer: Roundcube Webmail/1.0.2

Hi,

I'm trying to run an Archlinux host on kernel 4.1.10-1-vfio-lts (Mark 
Weiman's custom repo) because I'm unable to boot a GPU-assigned VM on 
4.2.3-1-vfio.

The VM boots fine and works for a while, but the computer sporadically 
crashes with the following:


Oct 22 21:43:37 kvmhost kernel: NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [swapper/4:0]
Oct 22 21:43:39 kvmhost kernel: Modules linked in: veth vhost_net vhost macvtap macvlan tun bridge stp llc nls_iso8859_1 nls_cp437 vfat fat iTCO_wdt iTCO_vendor_support nouveau snd_hda_codec_hdmi intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp mxm_wmi snd_hda_
Oct 22 21:43:39 kvmhost kernel:  sch_fq_codel fuse nfsd nfs auth_rpcgss oid_registry nfs_acl lockd grace sunrpc fscache ip_tables x_tables ext4 crc16 mbcache jbd2 dm_mod hid_logitech_hidpp hid_logitech_dj hid_generic usbhid hid sd_mod uas usb_storage atkbd libps2 crc32c_intel ah
Oct 22 21:43:39 kvmhost kernel: CPU: 4 PID: 0 Comm: swapper/4 Tainted: G              L  4.1.10-1-vfio-lts #1
Oct 22 21:43:39 kvmhost kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Extreme4, BIOS P2.30 09/21/2012
Oct 22 21:43:39 kvmhost kernel: task: ffff88080b119460 ti: ffff88080b124000 task.ti: ffff88080b124000
Oct 22 21:43:39 kvmhost kernel: RIP: 0010:[<ffffffff810f6770>]  [<ffffffff810f6770>] try_to_del_timer_sync+0x0/0xa0
Oct 22 21:43:39 kvmhost kernel: RSP: 0018:ffff88082f303db0  EFLAGS: 00000286
Oct 22 21:43:39 kvmhost kernel: RAX: 00000000ffffffff RBX: 0000000000000286 RCX: 0000000000000000
Oct 22 21:43:39 kvmhost kernel: RDX: 00000000000000bf RSI: 0000000000000286 RDI: ffff880270fa8428
Oct 22 21:43:39 kvmhost kernel: RBP: ffff88082f303dc8 R08: 0000000000002710 R09: ffff88082f30e780
Oct 22 21:43:39 kvmhost kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffff88082f303d28
Oct 22 21:43:39 kvmhost kernel: R13: ffffffff815f13de R14: ffff88082f303dc8 R15: ffff880270fa8428
Oct 22 21:43:39 kvmhost kernel: FS:  0000000000000000(0000) GS:ffff88082f300000(0000) knlGS:0000000000000000
Oct 22 21:43:39 kvmhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 22 21:43:39 kvmhost kernel: CR2: 00007fc2d6f6da28 CR3: 000000029c65c000 CR4: 00000000001426e0
Oct 22 21:43:39 kvmhost kernel: Stack:
Oct 22 21:43:39 kvmhost kernel:  ffffffff810f6872 ffff88082f303e38 ffff880270fa8390 ffff88082f303df8
Oct 22 21:43:39 kvmhost kernel:  ffffffff8152a16f ffff880270fa8390 ffff8805b3bab800 ffff880270d20000
Oct 22 21:43:39 kvmhost kernel:  0000000000000001 ffff88082f303e38 ffffffff8152a3e7 ffff88082f3107e0
Oct 22 21:43:39 kvmhost kernel: Call Trace:
Oct 22 21:43:39 kvmhost kernel:  <IRQ>
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f6872>] ? del_timer_sync+0x62/0x70
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a16f>] inet_csk_reqsk_queue_drop+0xbf/0x240
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a3e7>] reqsk_timer_handler+0xf7/0x2e0
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a2f0>] ? inet_csk_reqsk_queue_drop+0x240/0x240
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f64c8>] call_timer_fn+0x48/0x160
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff8152a2f0>] ? inet_csk_reqsk_queue_drop+0x240/0x240
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810f6bd4>] run_timer_softirq+0x284/0x330
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81086711>] __do_softirq+0xf1/0x2e0
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81086acd>] irq_exit+0xbd/0xc0
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff815f31d5>] smp_apic_timer_interrupt+0x55/0x70
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff815f13de>] apic_timer_interrupt+0x6e/0x80
Oct 22 21:43:39 kvmhost kernel:  <EOI>
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81021c1d>] ? native_sched_clock+0x2d/0xa0
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490c81>] ? cpuidle_enter_state+0xa1/0x250
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490c53>] ? cpuidle_enter_state+0x73/0x250
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81490e8a>] cpuidle_enter+0x2a/0x30
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff810cb36c>] cpu_startup_entry+0x32c/0x460
Oct 22 21:43:39 kvmhost kernel:  [<ffffffff81055f7e>] start_secondary+0x19e/0x1e0
Oct 22 21:43:39 kvmhost kernel: Code: 4d d8 65 48 33 0c 25 28 00 00 00 44 89 e0 75 0b 48 83 c4 18 5b 41 5c 41 5d 5d c3 e8 1b b8 f8 ff 90 66 2e 0f 1f 84 00 00 00 00 00 <0f> 1f 44 00 00 55 48 89 e5 41 54 53 48 81 ec 30 10 00 00 48 83



This happens for all cores and it locks up the entire system. I don't 
know what to do. On 4.2.3-1-vfio I have no hangups and all my non-vfio 
VMs work perfectly fine.
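
Since the lockup is reported on every core, it can help to count the
watchdog reports per CPU across a whole log.  The following is a rough
Python sketch, assuming the "NMI watchdog: BUG: soft lockup" wording
shown in the trace above; the function names are hypothetical, and the
pattern tolerates the line wrapping that mail clients add.

```python
import re
from collections import Counter

# Matches e.g. "soft lockup - CPU#4 stuck for 22s! [swapper/4:0]".
# \s+ between the parts lets the match survive a wrapped line.
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+)\s+stuck for (?P<secs>\d+)s!\s+"
    r"\[(?P<comm>[^\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft lockup report found in the log text."""
    return [
        {
            "cpu": int(m.group("cpu")),
            "seconds": int(m.group("secs")),
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
        }
        for m in SOFT_LOCKUP.finditer(log_text)
    ]


def lockups_per_cpu(log_text):
    """Count soft lockup reports for each CPU number."""
    return Counter(entry["cpu"] for entry in find_soft_lockups(log_text))


# Illustrative input: the first report from the trace, wrapped as mailed.
sample = (
    "Oct 22 21:43:37 kvmhost kernel: NMI watchdog: BUG: soft lockup - CPU#4 \n"
    "stuck for 22s! [swapper/4:0]\n"
)
print(find_soft_lockups(sample))
print(lockups_per_cpu(sample))
```

Feeding it the saved output of the kernel log for a full session
would show whether the reports cluster on one core or are spread
across all of them.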

Thank you,
Lucas Kückelhaus
