[Crash-utility] handling missing kdump pages in diskdump format

Dave Anderson anderson at redhat.com
Fri Mar 30 21:09:12 UTC 2007


Bob Montgomery wrote:

> On Thu, 2007-03-29 at 08:13 -0500, Dave Anderson wrote:
> > Ken'ichi Ohmichi wrote:
>
> > > I checked whether this change is correct as follows:
> > > (The following patches are attached to this mail)
> > > - makedumpfile-1.1.2 with "point_same_zero_page2.patch" creates a dumpfile.
> > > - crash-4.0-3.21 with "not-access-excluded-page.patch" analyzes the dumpfile.
> > > - The analysis result of the dumpfile is compared with /proc/vmcore's.
> > >
> > > And on i386 linux-2.6.19, I found a difference between the "foreach bt"
> > > output for the dumpfile (excluding free pages) and for /proc/vmcore.
> > > But by using crash-4.0-3.21 without "not-access-excluded-page.patch",
> > > there is no difference. In short, the difference happens because
> > > the excluded pages are treated as inaccessible pages.
>
> Just to clarify for those who probably aren't as confused as I was at
> first:
>
> This isn't a test of the zero page trick, because with the changes to
> makedumpfile, zero pages are no longer actually excluded.  (I read
> "excluding free pages" but immediately thought "excluding zero pages"
> and spent more than a few minutes checking how that could possibly have
> happened.)
>
> So this is apparently a case where a page excluded because it was
> supposedly free is then maybe accessed by the back tracer while it might
> be trying to read kernel text, right?  But kernel text should never look
> free, so I'm still puzzled.  Did makedumpfile mis-identify a real page
> as free, or is crash asking for pages it shouldn't be looking at during
> backtrace?
>

No -- it's kernel text that was marked as __init, so the page containing
it got freed and reallocated as a page that was purposely excluded.
The page originally contained the "start_kernel" __init function, which
only gets executed once by the first swapper thread.

The problem is that crash shouldn't be looking at that text location
when doing a backtrace on PID 0, because it should have stopped the
trace as soon as it saw the "cpu_idle" stack reference. I don't know
why it continues past that point. I tried simulating Ken'ichi's vmcore
by forcibly returning an error if readmem() got a request for the
page originally containing "start_kernel", but the backtrace worked
OK, even though I could see the "start_kernel" reference on
the stack when using "bt -t".
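
The simulation described above can be illustrated with a toy model. This
is only a sketch in Python, not crash's C code: the helper names
(make_faulting_reader, walk_frames), the 4 KB page size, and the
addresses in EXAMPLE_FRAMES are all assumptions made up for the example.
It shows a read function that fails for one chosen page, and a frame
walker that never touches that page as long as it stops at "cpu_idle".

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages, as on i386


class ExcludedPageError(Exception):
    """Raised when a read touches a page missing from the dumpfile."""


def page_base(addr, page_size=PAGE_SIZE):
    """Return the start address of the page containing addr."""
    return addr & ~(page_size - 1)


def make_faulting_reader(read, excluded_addr, page_size=PAGE_SIZE):
    """Wrap read(addr, size) so reads overlapping one page always fail.

    Hypothetical stand-in for forcing an error on a single page.
    """
    bad = page_base(excluded_addr, page_size)

    def faulting_read(addr, size):
        first = page_base(addr, page_size)
        last = page_base(addr + size - 1, page_size)
        if first <= bad <= last:
            raise ExcludedPageError(hex(addr))
        return read(addr, size)

    return faulting_read


def walk_frames(frames, read_text, stop_at="cpu_idle"):
    """Walk (symbol, text_addr) frames, innermost first.

    Reads a few bytes of text at each frame and stops after the frame
    whose symbol equals stop_at. Pass stop_at=None to never stop early.
    """
    trace = []
    for symbol, addr in frames:
        read_text(addr, 4)
        trace.append(symbol)
        if symbol == stop_at:
            break
    return trace


# Made-up addresses; "start_kernel" sits on its own (excluded) page.
EXAMPLE_FRAMES = [
    ("default_idle", 0xC0101230),
    ("cpu_idle", 0xC0101A50),
    ("start_kernel", 0xC03F4600),
]
```

With a reader that faults on the "start_kernel" page, walk_frames
succeeds when it stops at "cpu_idle" and raises ExcludedPageError only
if it is told to keep going, which is the behavior the simulation
was checking for.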

Anyway, that's why I've asked Ken'ichi if he can make his
vmlinux/vmcore pair available for me to debug.

Thanks,
  Dave



>
> Anyway, with my test dump on my x86_64 box, I don't get a case where
> dumpfiles with excluded free pages produce different "foreach bt" output
> than I get from the vmcore file.  I tried -d16 and -d31 options.  I do
> get the expected excluded page message when I x/xg an excluded address,
> just no problems with bt.  So I can't help Dave with an example.
>
> Bob Montgomery
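
For readers unfamiliar with the -d values mentioned above: the
makedumpfile dump level is a bitmask of page types to exclude. The
sketch below decodes a level using the bit assignments as I understand
them from the makedumpfile man page; treat the table as an assumption
and check it against the version in use. The function name is
hypothetical.

```python
# Assumed bit meanings of the makedumpfile dump level (-d).
DUMP_LEVEL_BITS = {
    1: "zero pages",
    2: "cache pages (non-private)",
    4: "cache pages (private)",
    8: "user data pages",
    16: "free pages",
}


def excluded_page_types(level):
    """Return the page types excluded by a given dump level (0..31)."""
    if not 0 <= level <= 31:
        raise ValueError("dump level must be between 0 and 31")
    return [name for bit, name in DUMP_LEVEL_BITS.items() if level & bit]
```

Under that table, -d16 excludes only free pages, and -d31 excludes all
five page types.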



