[Crash-utility] kmem: WARNING: cannot find mem_map page for address
Dave Anderson
anderson at redhat.com
Tue Dec 18 14:52:21 UTC 2012
----- Original Message -----
> Hi Dave,
>
> On 12/17/12 11:23, Dave Anderson wrote:
> >>> Right -- I would never expect error() to be called while inside
> >>> an open_tmpfile() operation. Normally the behind-the-scenes data
> >>> is parsed, and if anything is to be displayed while open_tmpfile()
> >>> is still in play, it would be fprintf()'ed using pc->saved_fp.
> >>
> >> I think the aesthetically pleasing solution is an "i_am_playing_with_tmpfile()"
> >> call that says it isn't closed and crash functions shouldn't be using it.
> >> Plus a parallel "i_am_done_with_tmpfile()" that gets implied by "close_tmpfile()".
> >> I can supply a patch, if you like. Probably with less verbose function names.
> >
> > If pc->tmpfile is non-NULL, then open_tmpfile() is in use. What would be
> > the purpose of the extra functions?
>
> It would be to allow the client code that is processing that temp file to emit
> warning/info messages without disrupting the reading of that file pointer.
> To me, that doesn't seem unreasonable. You run some code that emits output
> to a temp file and you reprocess those data. You surely do not want such
> messages showing up in the file you are re-processing. And you cannot
> call close_tmpfile() because it calls ftruncate().
>
> So, what is your recommendation for how to reprocess diverted output
> wherein you might occasionally want to say something during that
> reprocessing?
>
> Three solutions come to mind:
>
> 1. Juggle file pointers before and after the __error() function call
> (please say, "No.")
No.
> 2. Create my own temporary file and fiddle the global "fp" and "pc" state so it
> gets used while I am gathering data and crash code doesn't know about it later.
> (I insist the answer must be, "No." because there is too much fiddling with
> intricate crash state.)
No.
> 3. These two functions that I am suggesting:
>
> void
> resume_tmpfile(void)
> {
>         int ret ATTRIBUTE_UNUSED;
>
>         if (pc->tmpfile)
>                 error(FATAL, "recursive temporary file usage\n");
>
>         if (!pc->tmp_fp)
>                 error(FATAL, "temporary file not ready\n");
>
>         rewind(pc->tmp_fp);
>         pc->tmpfile = pc->tmp_fp;
>         pc->saved_fp = fp;
>         fp = pc->tmpfile;
> }
>
> void
> sequester_tmpfile(void)
> {
>         int ret ATTRIBUTE_UNUSED;
>
>         if (pc->tmpfile) {
>                 fflush(pc->tmpfile);
>                 rewind(pc->tmpfile);
>                 pc->tmpfile = NULL;
>                 fp = pc->saved_fp;
>         } else
>                 error(FATAL, "trying to sequester an unopened temporary file\n");
> }
And no...
When open_tmpfile() is in play and you want to print something, you can
always use fprintf(pc->saved_fp, ...) as is done everywhere now.
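
For readers unfamiliar with the pattern, here is a minimal stand-alone sketch of it. It does not use crash's real open_tmpfile()/close_tmpfile(); the struct, the global "fp", and the helper names divert_output(), reprocess() and end_diversion() are all hypothetical stand-ins built on the standard C tmpfile() call. The point it illustrates is only that messages written to the saved stream never land in the file being re-read.

```c
#include <stdio.h>

/* Hypothetical stand-in for the two crash fields discussed in this thread. */
struct ctx {
    FILE *tmpfile;   /* non-NULL while output is diverted */
    FILE *saved_fp;  /* where "fp" pointed before the diversion */
};

static struct ctx ctx_storage, *pc = &ctx_storage;
static FILE *fp;     /* current output stream; the caller sets it (e.g. to stdout) */

/* Divert fp to a scratch file, remembering the previous stream. */
static int divert_output(void)
{
    FILE *t = tmpfile();

    if (!t)
        return -1;
    pc->saved_fp = fp;
    pc->tmpfile = t;
    fp = t;
    return 0;
}

/*
 * Re-read the diverted lines. Anything we want to say while doing so
 * goes to pc->saved_fp, so it cannot pollute the file being parsed.
 * Returns the number of lines read back.
 */
static int reprocess(void)
{
    char buf[256];
    int lines = 0;

    fflush(pc->tmpfile);
    rewind(pc->tmpfile);
    while (fgets(buf, sizeof(buf), pc->tmpfile)) {
        lines++;
        fprintf(pc->saved_fp, "note: saw %s", buf);
    }
    return lines;
}

/* Drop the scratch file and restore the original stream. */
static void end_diversion(void)
{
    fclose(pc->tmpfile);
    pc->tmpfile = NULL;
    fp = pc->saved_fp;
}
```

A caller would set fp = stdout, call divert_output(), write the data to be gathered with fprintf(fp, ...), call reprocess(), and finish with end_diversion().
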
That being said, if you truly desire to use error() during an open_tmpfile()
operation, then that anomaly should be handled in the error() function.
So, if error() is called during open_tmpfile(), the message should
be displayed as it is done now, which is to pc->stdpipe (i.e., the current
more/less scroller if it is in effect), or to stdout if not:
        if (pc->stdpipe) {
                fprintf(pc->stdpipe, "%s%s%s %s%s",
                        new_line ? "\n" : "",
                        type == CONT ? spacebuf : pc->curcmd,
                        type == CONT ? " " : ":",
                        type == WARNING ? "WARNING: " :
                        type == NOTE ? "NOTE: " : "",
                        buf);
                fflush(pc->stdpipe);
        } else {
                fprintf(stdout, "%s%s%s %s%s",
                        new_line || end_of_line ? "\n" : "",
                        type == WARNING ? "WARNING" :
                        type == NOTE ? "NOTE" :
                        type == CONT ? spacebuf : pc->curcmd,
                        type == CONT ? " " : ":",
                        buf, end_of_line ? "\n" : "");
                fflush(stdout);
        }
and if the output is currently being redirected to a file or to a pipe,
then it is also issued to those end-points here:
        if ((fp != stdout) && (fp != pc->stdpipe)) {
                fprintf(fp, "%s%s%s %s", new_line ? "\n" : "",
                        type == WARNING ? "WARNING" :
                        type == NOTE ? "NOTE" :
                        type == CONT ? spacebuf : pc->curcmd,
                        type == CONT ? " " : ":",
                        buf);
                fflush(fp);
        }
It's that "duplication" above that you're seeing.
And I am simply suggesting that the if statement above should be:
        if ((fp != stdout) && (fp != pc->stdpipe) && (fp != pc->tmpfile)) {
because you obviously don't want the message intermingled with your open_tmpfile()
output.
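
To make the effect of that extra test concrete, here is a small self-contained sketch of just the decision, pulled out into a predicate. The function name should_echo_to_fp() is hypothetical and is not part of crash; the three parameters stand in for the global "fp", pc->stdpipe and pc->tmpfile.

```c
#include <stdio.h>

/*
 * Hypothetical helper: decide whether an error()/WARNING message should
 * also be copied to the current output stream "fp".
 *
 * It mirrors the suggested condition: copy only when fp is a genuine
 * redirection target, i.e. not stdout, not the scroller pipe, and not
 * the temporary file that open_tmpfile() has put in place.
 */
static int should_echo_to_fp(FILE *fp, FILE *stdpipe, FILE *tmpfile)
{
    return (fp != stdout) && (fp != stdpipe) && (fp != tmpfile);
}
```

With fp pointing at the temporary file the predicate is false, so the message reaches only the scroller or stdout and the temporary file keeps nothing but the diverted output.
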
>
> I sequester the file after doing the data gathering and resume it
> after I am done reprocessing it. It might be worth putting in a little jig
> to ensure that open/close_tmpfile work reasonably, too. (I would guess
> that either would cancel the sequestration.)
>
> >>> I'm not sure, other than it doesn't seem to be able to find ffffea001bb1d1e8
> >>
> >> I was able to figure that out. I also printed out the "kmem -v" table and
> >> sorted the result. The result with "kmem -n"
> >>
> >> [...]
> >> 66 ffff88087fffa420 ffffea0000000000 ffffea0007380000 2162688
> >> 67 ffff88087fffa430 ffffea0000000000 ffffea0007540000 2195456
> >> 132608 ffff88083c9bdb98 ffff88083c9bdd98 ffff8840e49bdd98 4345298944
> >> 132609 ffff88083c9bdba8 ffff88083c9796c0 ffff8840e4b396c0 4345331712
> >> [...]
> >>
> >> viz. it ain't there. Which is quite interesting, because if the lustre
> >> cluster file system structure "cfs_trace_data" actually pointed off into
> >> unmapped memory, it would have fallen over long, long before the point
> >> where it did fall over.
> >
> > I don't see the vmemmap range in the "kmem -v" output. It is mapped
> > kernel memory, but AFAIK it's not kept in the kernel's "vmlist" list.
> > Do you see that range in your "kmem -v" output?
>
> Also no. "kmem -v" and "kmem -n" both show the same memory mappings
> (as best as _my_ memory serves, that is. For certain, neither has a mapping
> for 0xffffea001bb1d1e8.)
>
> > OK so you say you cannot get the mappings for it, but what
> > does "vtop 0xffffea001bb1d1e8" show?
>
> This:
>
> > crash> vtop 0xffffea001bb1d1e8
> > VIRTUAL PHYSICAL
> > ffffea001bb1d1e8 879b1d1e8
> >
> > PML4 DIRECTORY: ffffffff817e7000
> > PAGE DIRECTORY: 87fdf7067
> > PUD: 87fdf7000 => 87fdf6067
> > PMD: 87fdf66e8 => 8000000879a001e3
> > PAGE: 879a00000 (2MB)
> >
> > PTE PHYSICAL FLAGS
> > 8000000879a001e3 879a00000 (PRESENT|RW|ACCESSED|DIRTY|PSE|GLOBAL|NX)
>
> But given:
>
> > Sorry -- that's irrelevant. You want to access the physical
> > memory that the odd vmemmap page address references (not the
> > physical page behind the page structure itself).
>
> Exactly right. I need to be able to see the binary bits for that page so I can
> pull them in and write them back out to a file of just those bits. From there,
> we'll be formatting a text file showing the lustre trace log.
>
> Thank you so much! Regards, Bruce
Right... seems like it should be such a simple thing to do... :-(
I don't understand what's going on, but I'm presuming that even if the
vmemmap-type address doesn't fit into the "advertised" vmemmap range,
the kernel's __page_to_pfn() macro should still work to get the
pfn represented by the page:
  #elif defined(CONFIG_SPARSEMEM)
  /*
   * Note: section's mem_map is encoded to reflect its start_pfn.
   * section[i].section_mem_map == mem_map's address - start_pfn;
   */
  #define __page_to_pfn(pg)                                       \
  ({      const struct page *__pg = (pg);                         \
          int __sec = page_to_section(__pg);                      \
          (unsigned long)(__pg - __section_mem_map_addr(__nr_to_section(__sec))); \
  })
Maybe you could play around with emulating that macro w/crash, and see what
comes up?
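
As a starting point for that experiment, here is a rough stand-alone sketch of the arithmetic the macro performs. Everything in it is an assumption to be checked against the actual dump: the function names are hypothetical, the 4KB page size is assumed for x86_64, and the struct page size of 56 bytes is inferred from the "kmem -n" rows quoted above, assuming their columns are section number, section address, coded mem_map, mem_map and starting pfn (for section 66, ffffea0007380000 - ffffea0000000000 is 2162688 * 56). The real macro also looks up the section from the page's flags; here the section's coded mem_map is simply passed in.

```c
#include <stdint.h>

/* ASSUMPTION: sizeof(struct page) on this kernel, inferred from "kmem -n". */
#define ASSUMED_SIZEOF_STRUCT_PAGE 56ULL

/* ASSUMPTION: 4KB base pages. */
#define ASSUMED_PAGE_SHIFT 12

/*
 * Emulate the pointer subtraction in __page_to_pfn(): the distance from
 * the section's coded mem_map to the page structure, counted in units of
 * struct page, is the pfn.
 */
static uint64_t page_to_pfn_sketch(uint64_t page_addr, uint64_t coded_mem_map)
{
    return (page_addr - coded_mem_map) / ASSUMED_SIZEOF_STRUCT_PAGE;
}

/* Convert a pfn to the physical address of the start of that page. */
static uint64_t pfn_to_phys_sketch(uint64_t pfn)
{
    return pfn << ASSUMED_PAGE_SHIFT;
}
```

Feeding it the mem_map and coded mem_map values from the section 66 and 67 rows should give back the starting pfns shown in those rows, which is a quick sanity check on the assumed struct page size before trying the odd address.
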
Dave