[vfio-users] 'dnf update' killed working VM

Laszlo Ersek lersek at redhat.com
Thu Aug 10 10:09:16 UTC 2017


On 08/10/17 05:45, Alex Williamson wrote:
> On Thu, 10 Aug 2017 00:29:36 +0200
> Laszlo Ersek <lersek at redhat.com> wrote:
> 
>> On 08/09/17 23:37, Alex Williamson wrote:
>>> On Wed, 09 Aug 2017 21:55:00 +0100
>>> "Patrick O'Callaghan" <poc at usb.ve> wrote:
>>>  
>>>> On Wed, 2017-08-09 at 13:24 -0500, David wrote:  
>>>>> Anyone else having trouble with a recent version of KVM / QEMU?
>>>>> Also I am still a Linux newbie, how should I troubleshoot this?  
>>>>
>>>> For one thing, you could start by looking in the QEMU log file and/or
>>>> the system journal. You don't give any information about the VM (e.g.
>>>> I assume you're using GPU passthrough but you don't say anything
>>>> about it) so it's going to be hard for anyone else to guess what the
>>>> problem is. Perhaps if you post the XML file it might give someone a
>>>> clue.
>>>>
>>>> Also, note that Fedora 24 was EOL-ed today. You should update your
>>>> system to at least F25 as soon as possible. I'm on F26 and having no
>>>> problems.  
>>>
>>> Yep, really hard to act on the limited information here.  Is it by
>>> chance a GPU assigned VM running OVMF and does that OVMF come from the
>>> kraxel repo rather than the base fedora repo?  Thanks,  
>>
>> Yes, someone who can reproduce the problem -- from the reports, there
>> are several users -- will have to bite the bullet, and bisect OVMF,
>> and/or bisect the host kernel.
> 
> Done.  As with David, I hit the problem that my previously working VM
> just hangs with all the vCPUs pegged.  Replacing Gerd's OVMF build with
> an older one from the virt-preview repo resolves the issue.  Bisecting
> OVMF lands here:
> 
> commit 3b2928b46987693caaaeefbb7b799d1e1de803c0
> Author: Michael Kinney <michael.d.kinney at intel.com>
> Date:   Wed May 17 12:19:16 2017 -0700
> 
>     UefiCpuPkg/MpInitLib: Fix X64 XCODE5/NASM compatibility issues
>     
>     https://bugzilla.tianocore.org/show_bug.cgi?id=565
>     
>     Fix NASM compatibility issues with XCODE5 tool chain.
>     The XCODE5 tool chain for X64 builds using PIE (Position
>     Independent Executable).  For most assembly sources using
>     PIE mode does not cause any issues.
>     
>     However, if assembly code is copied to a different address
>     (such as AP startup code in the MpInitLib), then the
>     X64 assembly source must be implemented to be compatible
>     with PIE mode that uses RIP relative addressing.
>     
>     The specific changes in this patch are:
>     
>     * Use LEA instruction instead of MOV instruction to lookup
>       the addresses of functions.
>     
>     * The assembly function RendezvousFunnelProc() is copied
>       below 1MB so it can be executed as part of the MpInitLib
>       AP startup sequence.  RendezvousFunnelProc() calls the
>       external function InitializeFloatingPointUnits().  The
>       absolute address of InitializeFloatingPointUnits() is
>       added to the MP_CPU_EXCHANGE_INFO structure that is passed
>       to RendezvousFunnelProc().
>     
>     Cc: Andrew Fish <afish at apple.com>
>     Cc: Jeff Fan <jeff.fan at intel.com>
>     Contributed-under: TianoCore Contribution Agreement 1.0
>     Signed-off-by: Michael D Kinney <michael.d.kinney at intel.com>
>     Reviewed-by: Jeff Fan <jeff.fan at intel.com>
>     Reviewed-by: Andrew Fish <afish at apple.com>
> 
> Reverting this patch against current HEAD (7ef0dae092af) also gives me
> a working image.  When it fails, it only gets this far:
> 
> SecCoreStartupWithStack(0xFFFCC000, 0x818000)
> Register PPI Notify: DCD0BE23-9586-40F4-B643-06522CED4EDE
> Install PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
> Install PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
> The 0th FV start address is 0x00000820000, size is 0x000E0000, handle is 0x820000
> Register PPI Notify: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
> Register PPI Notify: EA7CA24B-DED5-4DAD-A389-BF827E8F9B38
> Install PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
> Install PPI: DBE23AA9-A345-4B97-85B6-B226F1617389
> Loading PEIM at 0x0000082B880 EntryPoint=0x0000082E8F9 PcdPeim.efi
> Install PPI: 06E81C58-4AD7-44BC-8390-F10265F72480
> Install PPI: 01F34D25-4DE2-23AD-3FF3-36353FF323F1
> Install PPI: 4D8B155B-C059-4C8F-8926-06FD4331DB8A
> Install PPI: A60C6B59-E459-425D-9C69-0BCC9CB27D81
> Loading PEIM at 0x00000830040 EntryPoint=0x00000831415 ReportStatusCodeRouterPei.efi
> Install PPI: 0065D394-9951-4144-82A3-0AFC8579C251
> Install PPI: 229832D3-7A30-4B36-B827-F40CB7D45436
> Loading PEIM at 0x00000831F40 EntryPoint=0x0000083318A StatusCodeHandlerPei.efi
> Loading PEIM at 0x00000833DC0 EntryPoint=0x00000837D0E PlatformPei.efi
> Select Item: 0x0
> FW CFG Signature: 0x554D4551
> Select Item: 0x1
> FW CFG Revision: 0x3
> QemuFwCfg interface (DMA) is supported.
> Platform PEIM Loaded
> CMOS:
> 00: 17 00 30 00 21 00 04 09 08 17 26 02 10 80 00 00
> 10: 00 00 00 00 06 80 02 FF FF 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 30: FF FF 20 00 00 BF 00 20 30 00 00 00 00 12 00 00
> 40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 50: 00 00 00 00 00 00 00 00 00 00 00 00 40 00 00 05
> 60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> Select Item: 0x19
> Select Item: 0x28
> S3 support was detected on QEMU
> Install PPI: 7408D748-FC8C-4EE6-9288-C4BEC092A410
> Select Item: 0x19
> Select Item: 0x24
> Select Item: 0x19
> Select Item: 0x19
> GetFirstNonAddress: Pci64Base=0x800000000 Pci64Size=0x800000000
> Select Item: 0x5
> MaxCpuCountInitialization: QEMU reports 6 processor(s)
> PublishPeiMemory: mPhysMemAddressWidth=36 PeiMemoryCap=65800 KB
> PeiInstallPeiMemory MemoryBegin 0xBBF0E000, MemoryLength 0x4042000
> QemuInitializeRam called
> Select Item: 0x19
> Select Item: 0x24
> Reserved variable store memory: 0xBFECC000; size: 528kb
> Platform PEI Firmware Volume Initialization
> Install PPI: 49EDB1C1-BF21-4761-BB12-EB0031AABB39
> Notify: PPI Guid: 49EDB1C1-BF21-4761-BB12-EB0031AABB39, Peim notify entry point: 826922
> The 1th FV start address is 0x00000900000, size is 0x00A00000, handle is 0x900000
> Select Item: 0x19
> Select Item: 0x19
> Select Item: 0x19
> Select Item: 0x25
> Register PPI Notify: EE16160A-E8BE-47A6-820A-C6900DB0250A
> Temp Stack : BaseAddress=0x814000 Length=0x4000
> Temp Heap  : BaseAddress=0x810000 Length=0x4000
> Total temporary memory:    32768 bytes.
>   temporary memory stack ever used: 16384 bytes.
>   temporary memory heap used:       8000 bytes.
> Old Stack size 16384, New stack size 131072
> Stack Hob: BaseAddress=0xBBF0E000 Length=0x20000
> Heap Offset = 0xBB71E000 Stack Offset = 0xBB716000
> TemporaryRamMigration(0x810000, 0xBBF2A000, 0x8000)
> Loading PEIM at 0x000BFEBF000 EntryPoint=0x000BFEC7C48 PeiCore.efi
> Reinstall PPI: 8C8CE578-8A3D-4F1C-9935-896185C32DD3
> Reinstall PPI: 5473C07A-3DCB-4DCA-BD6F-1E9689E7349A
> Reinstall PPI: B9E0ABFE-5979-4914-977F-6DEE78C278A6
> Install PPI: F894643D-C449-42D1-8EA8-85BDD8C65BDE
> Loading PEIM at 0x000BFEBB000 EntryPoint=0x000BFEBD941 DxeIpl.efi
> Install PPI: 1A36E4E7-FAB6-476A-8E75-695A0576FDD7
> Install PPI: 0AE8CE5D-E448-4437-A8D7-EBF5F194F731
> Loading PEIM at 0x000BFEB7000 EntryPoint=0x000BFEB9304 S3Resume2Pei.efi
> Install PPI: 6D582DBC-DB85-4514-8FCC-5ADF6227B147
> Loading PEIM at 0x000BFEAF000 EntryPoint=0x000BFEB3189 CpuMpPei.efi
> AP Loop Mode is 1
> WakeupBufferStart = 9F000, WakeupBufferSize = 1000
> <hang>
> 
> Without the above patch, we continue on as:
> 
> APIC MODE is 1
> MpInitLib: Find 6 processors in system.
> Does not find any stored CPU BIST information from PPI!
>   APICID - 0x00000000, BIST - 0x00000000
>   APICID - 0x00000001, BIST - 0x00000000
>   APICID - 0x00000002, BIST - 0x00000000
>   APICID - 0x00000003, BIST - 0x00000000
>   APICID - 0x00000004, BIST - 0x00000000
>   APICID - 0x00000005, BIST - 0x00000000
> Install PPI: 9E9F374B-8F16-4230-9824-5846EE766A97
> Install PPI: EE16160A-E8BE-47A6-820A-C6900DB0250A
> Notify: PPI Guid: EE16160A-E8BE-47A6-820A-C6900DB0250A, Peim notify entry point: 835C29
> DXE IPL Entry
> Loading PEIM at 0x000BFE5B000 EntryPoint=0x000BFE605E2 DxeCore.efi
> Loading DXE CORE at 0x000BFE5B000 EntryPoint=0x000BFE605E2
> Install PPI: 605EA650-C65C-42E1-BA80-91A52AB618C6
> CoreInitializeMemoryServices:
>   BaseAddress - 0xBBF32000 Length - 0x3EC7000 MinimalMemorySizeNeeded - 0x10F4000
> ...
> 
> 
> Given the patch identified by bisect, I'll also note that my build
> environment is a recent F26 system, and I don't see any toolchain
> updates available.
> 
> $ nasm --version
> NASM version 2.13.01 compiled on May 22 2017

You are awesome, Alex!

My nasm (on RHEL-7) is "2.10.07-7.el7". Let me see if using 2.13.01 on
my end as well reproduces the problem.

Interestingly, at the end of March 2017, HPA announced the upcoming
NASM 2.13 release on edk2-devel (and I guess on other project lists as
well). At that time I tested it (2.13rc10, to be precise) and reported
good results (no regressions). Please see this thread:

http://mid.mail-archive.com/6e48e8fe-834c-e660-0fc1-3e38ec1367bb@zytor.com

In mid-April, HPA repeated the announcement (for 2.13rc20):

http://mid.mail-archive.com/4f9988da-67f5-137f-2e31-0fe3e8fa51b1@zytor.com

But I didn't retest, because my previous testing had taken a lot of
time, and I hadn't even got a "thanks" from HPA, and I'd been sulking :)

Nonetheless, the commit you identified is dated May 17, so even if I had
tested 2.13rc20 mid-April, I couldn't have caught this. Of course, by
the time of Mike's commit, I had returned to my normal .el7 NASM
package. And Mike, like all other Windows-based edk2 developers, must
have been at NASM 2.12.x -- the edk2 file
"BaseTools/Conf/tools_def.template" says,

# Other Supported Tools
# =====================
#   NASM -- http://www.nasm.us/
#   - NASM 2.10 or later for use with the GCC toolchain family
#   - NASM 2.12.01 or later for use with all other toolchain families
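
As a rough illustration only (this is not part of edk2), the two
minimums in that comment could be checked against a "nasm --version"
banner along these lines. The family keys "gcc" and "other" and both
function names are made up for this sketch, and the parser assumes the
banner has the "NASM version X.Y.Z ..." shape shown in Alex's output.

```python
import re

# Minimum versions as quoted from BaseTools/Conf/tools_def.template.
# The keys "gcc" and "other" are hypothetical labels for this sketch.
MIN_NASM = {
    "gcc": (2, 10),        # NASM 2.10 or later for the GCC family
    "other": (2, 12, 1),   # NASM 2.12.01 or later for all other families
}


def parse_nasm_version(banner):
    """Return the numeric version in a `nasm --version` banner as a tuple.

    Suffixes such as "rc10" or "-7.el7" are ignored, so a release
    candidate is treated like the release it precedes.
    """
    match = re.search(r"NASM version (\d+(?:\.\d+)*)", banner)
    if match is None:
        raise ValueError("unrecognized NASM banner: %r" % banner)
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version, family):
    """Check a version tuple against the minimum for a toolchain family."""
    required = MIN_NASM[family]
    # Pad with zeros so that (2, 10) compares as (2, 10, 0).
    width = max(len(version), len(required))
    padded_version = version + (0,) * (width - len(version))
    padded_required = required + (0,) * (width - len(required))
    return padded_version >= padded_required


if __name__ == "__main__":
    banner = "NASM version 2.13.01 compiled on May 22 2017"
    version = parse_nasm_version(banner)
    for family in sorted(MIN_NASM):
        print(family, version, meets_minimum(version, family))
```

Note that passing such a check says nothing about regressions: 2.13.01
satisfies both minimums, and it is exactly the version under suspicion
here.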

I'll dig into this ASAP.

Thanks Alex, you rock.
Laszlo



