[Crash-utility] crash no longer works with x86_64 xen-syms / kdump-vmcore (xen 3.1.2-based)

Wed Apr 16 19:36:35 UTC 2008

Hi Oda-san,

We have a problem with the RHEL5.2 xen hypervisor kdump vmcores.

The RHEL5.2 hypervisor sources have been upgraded to xen version 3.1.2,
and the x86_64 hypervisor is now relocatable to a dynamically-determined
physical address at boot-time.  That being the case, the crash utility
cannot possibly work with xen-syms/vmcore pairs without knowing how to
translate hypervisor virtual addresses to their physical counterparts.

The makedumpfile utility would run into the same problem.

I suspect you may have seen this working with upstream xen?

This is how I understand it:

Prior to RHEL5.2, the hypervisor's text and static data region was
direct-mapped.  There was (and still is) a direct-mapped region defined
like so:

   DIRECTMAP_VIRT_START  0xffff830000000000
   DIRECTMAP_VIRT_END    0xffff840000000000

and the hypervisor text and static data was located inside of that
direct-mapped region:

   # nm -Bn xen-syms-2.6.18-53.el5.debug
   ffff830000100000 A _start
   ffff830000100000 A _stext
   ffff830000100000 T start
   ffff830000100048 t bad_cpu
   ffff83000010004f t not_multiboot
   ffff830000100054 t print_err
   ffff830000100075 t __start
   ... [ snip ] ...
   ffff83000020e6e0 b model
   ffff83000020e700 b cpu_msrs
   ffff83000020e900 b saved_lvtpc
   ffff83000020ea00 b reset_value
   ffff83000020ea40 b reset_value
   ffff83000020ea60 b reset_value
   ffff83000020ea80 A _end
   #

Because the hypervisor text/static-data was direct-mapped, the
virtual-to-physical address translation was simple: subtracting the
DIRECTMAP_VIRT_START identifier (0xffff830000000000) from the
virtual address left the physical address.
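
As a rough sketch of that old translation (the helper name here is
made up for illustration, it is not taken from the crash or xen
sources):

```c
#include <stdint.h>

/* Value quoted above from the hypervisor sources. */
#define DIRECTMAP_VIRT_START UINT64_C(0xffff830000000000)

/*
 * Hypothetical helper: the pre-RHEL5.2 translation.  Only valid for
 * virtual addresses that lie inside the direct-mapped region.
 */
static inline uint64_t directmap_virt_to_phys(uint64_t va)
{
    return va - DIRECTMAP_VIRT_START;
}
```

For example, the _start symbol at ffff830000100000 in the listing
above would translate to physical address 0x100000.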

In the upgraded RHEL5.2 hypervisor, the hypervisor text and static
data are dynamically relocated at boot time, to a location that is
based upon the physical memory layout of the host machine.  They
have their own 1GB mapped region, located here:

   XEN_VIRT_START  0xffff828c80000000
   XEN_VIRT_END    0xffff828cc0000000

   # nm -Bn xen-syms-2.6.18-89.el5.debug
   ffff828c80100000 A _start
   ffff828c80100000 A _stext
   ffff828c80100000 T start
   ffff828c80100014 t __high_start
   ffff828c801000b7 t int_msg
   ffff828c801000d7 t hex_msg
   ... [ snip ] ...
   ffff828c8024e4a0 b cpu_msrs
   ffff828c8024e8a0 b saved_lvtpc
   ffff828c8024eaa0 b cpu_type
   ffff828c8024eac0 b reset_value
   ffff828c8024eb00 b reset_value
   ffff828c8024eb20 b reset_value
   ffff828c8024eb40 A _end
   #

So translating hypervisor virtual addresses to physical addresses
can no longer be done by simply subtracting a direct-map
identifier like before.

Note that the hypervisor code's new version of __pa(), for example,
ends up doing this:

     if ( va > DIRECTMAP_VIRT_START )
         return va - DIRECTMAP_VIRT_START;
     return va - XEN_VIRT_START + xen_phys_start;

where the value of "xen_phys_start" is the base physical address
of the relocated hypervisor text and static data.
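
To show what crash would need, here is a minimal sketch of the same
logic as a standalone function.  It assumes "xen_phys_start" has
somehow been recovered and is passed in by the caller; the function
name is hypothetical, and the xen_phys_start value mentioned below
is invented purely for illustration:

```c
#include <stdint.h>

/* Values quoted in this message from the hypervisor sources. */
#define DIRECTMAP_VIRT_START UINT64_C(0xffff830000000000)
#define XEN_VIRT_START       UINT64_C(0xffff828c80000000)

/*
 * Hypothetical helper mirroring the __pa() logic shown above.
 * xen_phys_start is an input here because the dumpfile does not
 * currently supply it; the caller has to get it from somewhere.
 */
static inline uint64_t xen_virt_to_phys(uint64_t va, uint64_t xen_phys_start)
{
    if (va > DIRECTMAP_VIRT_START)
        return va - DIRECTMAP_VIRT_START;           /* direct-mapped */
    return va - XEN_VIRT_START + xen_phys_start;    /* relocated text/data */
}
```

With an invented xen_phys_start of 0x7f000000, the _start symbol at
ffff828c80100000 would translate to 0x7f100000, while a direct-map
address is translated exactly as before, regardless of
xen_phys_start.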

Again, there is the problem -- when crash is looking at a xen-syms
binary and a vmcore, it does not know the value of "xen_phys_start",
so it cannot translate hypervisor virtual addresses, which makes it
completely useless.

It seems to me that the xen kdump mechanism needs to be modified
to store the "xen_phys_start" value in the vmcore someplace?

Do you have any thoughts on how best to address this?

Thanks,
   Dave