[Crash-utility] determining a "valid" vmcore

Dave Anderson anderson at redhat.com
Thu Feb 7 19:40:57 UTC 2008


Andrew Hecox wrote:
> On Thu, 2008-02-07 at 11:27 -0500, Dave Anderson wrote:
>> Andrew Hecox wrote:
>>> On Thu, 2008-02-07 at 10:32 -0500, Dave Anderson wrote:
>>>> Andrew Hecox wrote:
>>>>> hello,
>>>>>
>>>>> I'm looking at a customer issue where diskdumpmsg is unable to read a
>>>>> vmcore file. It is not clear if this is a problem with the vmcore file or
>>>>> diskdumpmsg. I can load the vmcore with crash and in my naive usage of
>>>>> it, can see no problems. However, I'm new to the tool so that doesn't
>>>>> give me a lot of confidence. 
>>>>>
>>>>> Does anyone have any suggestions on how or if I can use crash to help
>>>>> determine if there's corruption in the vmcore file? Or any other way of
>>>>> approaching the problem? 
>>>>>
>>>>> Thanks much,
>>>>>
>>>>> Andrew
>>>>>
>>>> I'm not sure what you expect the crash utility to do -- if it comes
>>>> up to a prompt with no error or warning messages, it means that the
>>>> ELF header contains what appears to be valid usable information,
>>>> and that the minimum kernel memory contents required to set up the
>>>> crash utility's notion of the running system are all in place.  That's
>>>> not to say that there is no chance that the vmcore contains some
>>>> corruption that was not recognized.
>>>>
>>> Thanks. Any other suggestions on how to determine if a vmcore is "valid"
>>> or is that not even a reasonable question to try and ask? The problem
>>> I'm trying to solve is described better below:
>>>
>>>> With respect to diskdumpmsg, as I understand it, it was fairly recently
>>>> changed from a perl script to a C file so that it could be run
>>>> earlier in time so as to be able to use the swap partition.  Looking
>>>> at main() in the diskdumpmsg.c file (version 1.4.1-2), there are numerous
>>>> error types and associated error messages.  What do you mean when you
>>>> say that "diskdumpmsg is unable to read a vmcore file"?
>>> Specifically: 
>>>
>>>  - user reported a floating point exception from diskdump on startup
>>>  - the result was reproducible locally but only with their vmcore file
>>>  - fpe occurred in get_logbuf:
>>>                 log_end %= log_buf_len;
>>>  - log_buf_len had been set to 0 in read_buffer
>>>           if (!page_is_dumpable(pfn, dump->device)) {
>>>               memset(buf, 0, copy_len);
>>>           } else {
>>>  - I don't know enough to say if the page really wasn't dumpable. 
>>> static inline bool page_is_dumpable(unsigned int nr, DumpDevice *device)
>>> {
>>>   return device->dumpable_bitmap[nr>>3] & (1 << (nr & 7));
>>> }
>>>  - I wrote a patch with one way to avoid the FPE (attached) and sent it
>>> to SEG.
>>>
>>> Now I'm trying to determine if the vmcore file should be readable by
>>> diskdumpmsg. In other words, is this a problem in diskdumpmsg post-crash
>>> or a problem with the vmcore file prior to it getting to diskdumpmsg?
>>> Unfortunately, I don't understand the problem domain very well at all,
>>> hence the probably naive questions :)
>>>
>>> Any suggestions are appreciated.
>>>
>>> -Andrew
>> So it appears that the page containing the log_buf_len symbol is not
>> readable or contained in the dumpfile.  BTW, is this a compressed
>> dumpfile or an ELF formatted dumpfile?  And what "dump_level" did
>> they configure?
>>
> 
> compressed, level is 19.
> 
>> Anyway, back to the log_buf_len symbol read, what happens when you
>> enter the "log" command while in a crash session?  It attempts to
>> read that symbol immediately.
>>
> 
> I get what appears to be a full and valid dump of the kernel message
> buffer. 
> 

The crash utility has the same page_is_dumpable() function, which I presume
looks at precisely the same bitmap data from the dumpfile.  And that
must be working, given that the "log" command works as expected.
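
To make the failure quoted above concrete, here is a minimal standalone
sketch of the bitmap test and of a guarded version of the modulo that
raised the FPE. It is not the diskdumpmsg or crash source: the struct
dump_bitmap type and the names pfn_is_dumpable() and wrap_log_end() are
hypothetical, and the bounds check is an added assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the dump device; only the bitmap matters here. */
struct dump_bitmap {
    const uint8_t *bits;   /* one bit per page frame number */
    size_t nbytes;         /* size of the bitmap in bytes */
};

/*
 * Same bit test as the quoted page_is_dumpable(): byte index pfn >> 3,
 * bit index pfn & 7. The bounds check is an addition for this sketch.
 */
static bool pfn_is_dumpable(const struct dump_bitmap *bm, unsigned int pfn)
{
    size_t byte = pfn >> 3;

    if (byte >= bm->nbytes)
        return false;
    return (bm->bits[byte] & (1u << (pfn & 7))) != 0;
}

/*
 * Wrap log_end into the log buffer. A zero log_buf_len (for example,
 * read back from a page that was zero-filled because it was not
 * dumpable) is reported as failure instead of dividing by zero.
 */
static bool wrap_log_end(unsigned long *log_end, unsigned long log_buf_len)
{
    if (log_buf_len == 0)
        return false;
    *log_end %= log_buf_len;
    return true;
}
```

A caller would treat a false return from wrap_log_end() as "log buffer
length could not be read" and report that, rather than continuing.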

One difference is that diskdumpmsg uses /boot/System.map-<release> for
the symbol values, whereas crash uses the vmlinux file.  It might be
of interest to determine whether the value of "log_buf_len" used by
diskdumpmsg is the same symbol value as used by crash.
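
One way to get the System.map side of that comparison is a small lookup
like the sketch below. It assumes the usual "<hex address> <type> <name>"
line layout of a System.map file; sysmap_lookup() is a hypothetical
helper, not a function from diskdumpmsg.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up a symbol in System.map-style text, one
 * "<hex address> <type> <name>" entry per line.
 * Returns 1 and stores the address on success, 0 if not found.
 */
static int sysmap_lookup(FILE *fp, const char *name, unsigned long long *addr)
{
    char line[512], sym[256], type;
    unsigned long long value;

    while (fgets(line, sizeof(line), fp)) {
        /* Lines that do not match the expected layout are skipped. */
        if (sscanf(line, "%llx %c %255s", &value, &type, sym) == 3 &&
            strcmp(sym, name) == 0) {
            *addr = value;
            return 1;
        }
    }
    return 0;
}
```

The address it returns for "log_buf_len" can then be compared with what
"sym log_buf_len" prints in the crash session.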

Dave




