[Crash-utility] crash seek error, failed to read vmcore file

Thu Apr 22 10:07:11 UTC 2010

On Wed, 2010-04-21 at 09:58 -0400, Dave Anderson wrote:
> ----- "Pavan Naregundi" <pavan at linux.vnet.ibm.com> wrote:
> 
> > On Tue, 2010-04-20 at 09:14 -0400, Dave Anderson wrote:
> > > ----- "Pavan Naregundi" <pavan at linux.vnet.ibm.com> wrote:
> > > 
> > > The cause for seek errors depends upon the type
> > > of dumpfile.
> > > 
> > > You didn't mention which type of dumpfile the vmcore
> > > is, so I'll presume that it's either an ELF-format
> > > kdump or a compressed kdump created by makedumpfile.
> > > 
> > > So presuming that it's a compressed kdump, the seek error 
> > > most likely comes from here in read_diskdump() in diskdump.c:
> > > 
> > >         if ((pfn >= dd->header->max_mapnr) || !page_is_ram(pfn))
> > >                 return SEEK_ERROR;
> > > 
> > > where the pfn of the requested physical address is at or beyond
> > > the max_mapnr value advertised in the header.
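To show the failure mode on its own, here is a minimal C sketch of that bounds check. The struct, the `page_is_ram_stub()` stand-in for `page_is_ram()`, and the `SEEK_ERROR`/`READ_OK` values are hypothetical and are not crash's actual definitions. Only the shape of the test mirrors the diskdump.c line quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; crash's real definitions differ. */
#define SEEK_ERROR (-2)
#define READ_OK    0

struct dump_header_sketch {
    uint32_t max_mapnr;   /* number of pages the dumpfile claims to cover */
};

/* Hypothetical stub: the real page_is_ram() consults the dump's bitmap.
 * Here every pfn is treated as RAM so only the max_mapnr test matters. */
static bool page_is_ram_stub(uint64_t pfn)
{
    (void)pfn;
    return true;
}

/* Mirrors the quoted check: any pfn at or beyond max_mapnr is rejected
 * before a read is attempted, which surfaces as a "seek error". */
static int check_pfn(const struct dump_header_sketch *hdr, uint64_t pfn)
{
    assert(hdr != NULL);
    if ((pfn >= hdr->max_mapnr) || !page_is_ram_stub(pfn))
        return SEEK_ERROR;
    return READ_OK;
}
```

With a header of `{ 32768 }`, pfn 32767 passes, while 32768 and anything above it return `SEEK_ERROR`.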
> > > 
> > > When you do any "crash -d# ...", the dumpfile header will
> > > be dumped first.  What does that show?
> > > 
> > > Dave
> > 
> > 
> > Dave,
> > 
> > Dumpfile is compressed kdump created by makedumpfile.
> > 
> > header shows the following values: 
> > max_mapnr: 32768
> > block_shift: 16
> > 
> > Yes. Adding some debug printf's shows me that the (pfn >=
> > dd->header->max_mapnr) check is what triggers the error.
> > 
> > For example: in the first seek error,
> > crash: seek error: kernel virtual address: c0000000af715480  type: "kmem_cache buffer"
> > 
> > paddr: af715480 => pfn=44913
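The pfn above follows from the header's block_shift. With block_shift 16, each page is 1 << 16 = 64KB, so the pfn is the physical address shifted right by 16: 0xaf715480 >> 16 = 0xaf71 = 44913, which is already past max_mapnr (32768). A small C helper doing the same arithmetic (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a physical address to a page frame number, given the
 * dumpfile header's block_shift (log2 of the page size). */
static uint64_t paddr_to_pfn(uint64_t paddr, unsigned int block_shift)
{
    assert(block_shift < 64);
    return paddr >> block_shift;
}
```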
> > 
> > crash -d8 log: http://pastebin.com/qrCvyPfR
> > 
> > Thanks..Pavan
> 
> OK, so the compressed dumpfile has exactly 32768 pages of physical
> memory, or exactly 2GB.  That being the case, the crash utility
> will fail all readmem attempts above that value, and obviously 
> there is critical data above the artificial 2GB threshold.  
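The 2GB figure is max_mapnr pages times the 64KB page size implied by block_shift 16: 32768 << 16 = 2^31 bytes = 0x80000000. A one-line C sketch of that calculation (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of physical memory covered by a dump whose header advertises
 * max_mapnr pages of (1 << block_shift) bytes each. */
static uint64_t dump_coverage_bytes(uint64_t max_mapnr, unsigned int block_shift)
{
    assert(block_shift < 64);
    return max_mapnr << block_shift;
}
```

Any physical address at or above the returned value (here 0x80000000) falls outside the dump, as 0xaf715480 does.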
> 
> The question at hand is why kdump is creating a truncated dumpfile
> with a max_mapnr of 32768:
> 
> (1) makedumpfile determines the "max_mapnr" value based upon the 
>     highest physical address found in any of the PT_LOAD segments
>     of the /proc/vmcore file on the secondary kernel.
> (2) the /proc/vmcore PT_LOAD segments were pre-calculated during
>     the primary kernel's kdump initialization phase, based upon
>     the values found in the set of "/proc/device-tree/memory@xxx/reg"
>     files existing in the primary kernel, where the "xxx" is the
>     starting physical address of the memory region, and the "reg"
>     file in that directory contains the starting address and size of
>     the memory region.
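Step (1) amounts to taking the highest end address over all PT_LOAD segments and converting it to a page count. Below is a simplified C sketch of that reading. It is not makedumpfile's actual code: the `load_segment` struct is hypothetical, and rounding a partial last page up is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one PT_LOAD program header. */
struct load_segment {
    uint64_t paddr;   /* p_paddr: physical start of the segment */
    uint64_t memsz;   /* p_memsz: size of the segment in bytes */
};

/* Highest physical end address across all segments, converted to a page
 * count (rounded up so a partial last page still counts). */
static uint64_t max_mapnr_from_segments(const struct load_segment *segs,
                                        size_t nsegs, unsigned int page_shift)
{
    uint64_t max_end = 0;

    assert(page_shift < 64);
    for (size_t i = 0; i < nsegs; i++) {
        uint64_t end = segs[i].paddr + segs[i].memsz;
        if (end > max_end)
            max_end = end;
    }
    uint64_t page_size = 1ULL << page_shift;
    return (max_end + page_size - 1) >> page_shift;
}
```

If the only segment is a 2GB region at address 0, this yields 32768 with 64KB pages, matching the header reported in this thread. A second 2GB segment starting at 0x80000000 would double it to 65536.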
> 
> For whatever reason, those files showed a maximum of 2GB of
> physical memory.  (If you skip makedumpfile and instead run
> "readelf -a" on the resultant ELF vmcore file, you will see
> the PT_LOAD segment values.)
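For an ELF-format vmcore, the PT_LOAD values that "readelf -a" prints can also be pulled out programmatically. Here is a minimal C sketch using the `<elf.h>` types from glibc. It assumes a 64-bit ELF whose byte order matches the host. A big-endian ppc64 vmcore examined on a little-endian host would need byte swapping, which this sketch omits. The function name is illustrative.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan an in-memory ELF64 image and return the highest p_paddr + p_memsz
 * over all PT_LOAD program headers, or 0 if the image is not usable.
 * Assumes host byte order; no byte swapping is done. */
static uint64_t elf64_max_load_end(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    uint64_t max_end = 0;

    assert(buf != NULL);
    if (len < sizeof(eh))
        return 0;
    memcpy(&eh, buf, sizeof(eh));
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return 0;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        uint64_t off = eh.e_phoff + (uint64_t)i * eh.e_phentsize;
        Elf64_Phdr ph;

        /* Refuse truncated or malformed program header tables. */
        if (eh.e_phentsize < sizeof(ph) || off > len || len - off < sizeof(ph))
            return 0;
        memcpy(&ph, buf + off, sizeof(ph));
        if (ph.p_type == PT_LOAD && ph.p_paddr + ph.p_memsz > max_end)
            max_end = ph.p_paddr + ph.p_memsz;
    }
    return max_end;
}
```

Non-PT_LOAD headers such as the PT_NOTE segment are skipped, so the result reflects only the memory ranges the dump actually describes.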
> 
> Does the SLES11 vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 kernel
> contain this patch?:
> 
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8be8cf5b47f72096e42bf88cc3afff7a942a346c
> 
> I ask because we also have an outstanding bugzilla that exhibits similar
> behavior, where an abnormally small ppc64 vmcore file gets created
> because there was only a single /proc/device-tree/memory@0 directory
> that showed just a small subset of the total physical memory.
> Typically there are many of those "memory@xxx" directories, but in
> the failing scenario, there was only one /proc/device-tree/memory@0
> directory.
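To check how much memory the device tree advertises, you can sum the sizes in each memory@xxx node's reg property. By devicetree convention, reg holds (address, size) pairs made of big-endian 32-bit cells. This C sketch assumes #address-cells and #size-cells are both 2, which is common on ppc64 but should be verified on the target system. The function names are illustrative. Reading the raw bytes from a file such as /proc/device-tree/memory@0/reg is left out so the sketch stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 64-bit value (two 32-bit cells). */
static uint64_t be64_at(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Sum the region sizes in a raw "reg" property, assuming
 * #address-cells = #size-cells = 2, i.e. 16 bytes per (address, size) pair.
 * A trailing partial entry is ignored. */
static uint64_t reg_total_bytes(const unsigned char *reg, size_t len)
{
    uint64_t total = 0;

    assert(reg != NULL || len == 0);
    for (size_t off = 0; off + 16 <= len; off += 16)
        total += be64_at(reg + off + 8);   /* skip address, add size */
    return total;
}
```

If memory@0 were the only node and its reg described a single 2GB region, the total would be 0x80000000, which would account for the 32768 64KB pages seen in the dump header.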
> 
> Anyway, there's (unproven) speculation that the kernel patch above
> is related to the problem.
> 
> In any case, unfortunately, there's nothing that can be done from the crash
> utility's perspective. 
>   
> Dave

Thank you Dave.

Our SLES11 kernel does not have the patch you mentioned, but at the same
time the system is not AMS-enabled, and CONFIG_CMM is not set in the
config file.

This system also has only the /proc/device-tree/memory@0 directory.

Regards..Pavan