[Crash-utility] [PATCH] bug on get_be_long() and improvement of bt

Dave Anderson anderson at redhat.com
Mon Oct 18 13:56:41 UTC 2010


----- "Dave Anderson" <anderson at redhat.com> wrote:

> ----- "Hu Tao" <hutao at cn.fujitsu.com> wrote:
> 
> > Hi Dave,
> > 
> >    There is a bug on get_be_long() that causes high 32 bits truncated.
> >    As a result, we get wrong registers values from dump file. Patch 1
> >    fixes this.
> 
> Good catch!
>  
> >    Once we can get right cpu registers values, it's better to use the
> >    sp/ip for backtracing the active task. This can show a more accurate
> >    backtrace, not including those invalid frames beyond sp. pathes 2 and
> >    3 do this on kvmdump case(virsh dump).
> > 
> >    To verify: run that km_probe.c test module on a x86_64 system, then
> >    `echo q > /proc/sysrq-trigger' to trigger the kprobe which does
> >    looping in post_handler. Then vrish dump then crash. 
> 
> However, I'm wondering whether you actually tested this with a 
> *crashed* system's dumpfile, and not just with a *live* system dump with
> a contrived set of circumstances.  And if so, what differences, if any,
> did you see with the backtraces of the task that either oops'd or called
> panic(), as well as those of the other active tasks that were brought down
> by IP interrupt?
> 
> Anyway, I'll give this a thorough testing with a set of sample
> dumpfiles that I have on hand.

Actually, the patch fails miserably on SMP dumpfiles with segmentation 
during initialization.

And now looking at your patch, I'm wondering whether you even tested this
with an SMP system?

The change to qemu_load() here:

@@ -904,6 +906,9 @@ qemu_load (const struct qemu_device_loader *devices, uint32_t required_features,
                if (feof (fp) || ferror (fp))
                        break;

+               if (STREQ(d->vtbl->name, "cpu"))
+                       result->dx86 = d;
+
                if (sec == QEMU_VM_SECTION_END || sec == QEMU_VM_SECTION_FULL)
                        result->features |= features;
        }

That function cycles through the "cpu" devices for *each* cpu
in the system, so this patch will store the device of the last cpu
device it encounters.  So in an SMP machine, it will store the
device for the highest cpu only, right?

And then there's this change to get_kvmdump_regs():

@@ -310,7 +311,11 @@ kvmdump_memory_dump(FILE *ofp)
 void
 get_kvmdump_regs(struct bt_info *bt, ulong *pc, ulong *sp)
 {
-       machdep->get_stack_frame(bt, pc, sp);
+       if (is_task_active(bt->task)) {
+               *sp = device_list->dx86->regs[R_ESP];
+               *pc = device_list->dx86->eip;
+       } else
+               machdep->get_stack_frame(bt, pc, sp);
 }

is_task_active() returns TRUE for all active tasks in an SMP system,
not just the panic task.  So it would seem you're going to pass
back the same registers for all active tasks?

And what's the point of this change to kernel.c?

diff --git a/kernel.c b/kernel.c
index e399099..2627020 100755
--- a/kernel.c
+++ b/kernel.c
@@ -16,6 +16,7 @@
  */

 #include "defs.h"
+#include "qemu-load.h"
 #include "xen_hyper_defs.h"
 #include <elf.h>

Also, the change to main.c is unnecessary -- there are dozens of malloc'd
memory areas in the program -- so why go to the bother of free'ing
just this one prior to exiting?

Anyway, instead of saving the device list, I suggest you do something
like storing the per-cpu IP/SP values in a separate data structure that
can possibly be used as an alternative source for register values for
"live dumps" -- and possibly for crashing systems if usable starting
hooks cannot be determined in the traditional manner.  I had thought
of doing something like that in the past, but when I looked at the
register values, I must have run into the get_be_long() issue?  

Dave
 










More information about the Crash-utility mailing list