[Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)

Dave Anderson anderson at redhat.com
Wed Oct 26 15:51:09 UTC 2005


Badari Pulavarty wrote:

> On Wed, 2005-10-26 at 10:25 -0400, Dave Anderson wrote:
> > Badari Pulavarty wrote:
> >
> > > Hi,
> > >
> > > I am getting the following failures from "crash" when I tried
> > > running it on 2.6.14-rc5 on an EM64T machine. Is this a known
> > > problem?
> > >
> > > Thanks,
> > > Badari
> > >
> > > [root at localhost crash-4.0-2.8]# crash --readnow
> > >
> > > crash 4.0-2.8
> > > Copyright (C) 2002, 2003, 2004, 2005  Red Hat, Inc.
> > > Copyright (C) 2004, 2005  IBM Corporation
> > > Copyright (C) 1999-2005  Hewlett-Packard Co
> > > Copyright (C) 1999, 2002  Silicon Graphics, Inc.
> > > Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> > > This program is free software, covered by the GNU General Public License,
> > > and you are welcome to change it and/or distribute copies of it under
> > > certain conditions.  Enter "help copying" to see the conditions.
> > > This program has absolutely no warranty.  Enter "help warranty" for details.
> > >
> > > GNU gdb 6.1
> > > Copyright 2004 Free Software Foundation, Inc.
> > > GDB is free software, covered by the GNU General Public License, and you are
> > > welcome to change it and/or distribute copies of it under certain conditions.
> > > Type "show copying" to see the conditions.
> > > There is absolutely no warranty for GDB.  Type "show warranty" for details.
> > > This GDB was configured as "x86_64-unknown-linux-gnu"...
> > >
> > > crash: invalid structure member offset: x8664_pda_level4_pgt
> > >        FILE: x86_64.c  LINE: 332  FUNCTION: x86_64_cpu_pda_init()
> > >
> > > [/usr/bin/crash] error trace: 4456f3 => 4a6c8e => 4a7c0c => 4d0c1e
> > >
> > >   4d0c1e: OFFSET_verify+117
> > >   4a7c0c: x86_64_cpu_pda_init+771
> > >   4a6c8e: x86_64_init+1522
> > >   4456f3: main_loop+50
> > >
> > > --
> > > Crash-utility mailing list
> > > Crash-utility at redhat.com
> > > https://www.redhat.com/mailman/listinfo/crash-utility
> >
> > Between 2.6.10 and 2.6.11, the x8664_pda structure dropped the
> > level4_pgt member, and work needs to be done to get past it.
> >
> > The contents of the x8664_pda.level4_pgt member are only used
> > as a qualifier in two of the crash utility's x86_64.c functions:
> >
> >   x86_64_cpu_pda_init()
> >
> > where its offset is initialized, and then the level4_pgt contents of
> > each potential x8664_pda structure are assigned to "level4_pgt", and
> > that value is passed to VALID_LEVEL4_PGT_ADDR() for
> > validation.
> >
> > Similarly, it's also used in:
> >
> >   x86_64_get_smp_cpus()
> >
> > where, for each potential x8664_pda, its value is assigned to
> > "level4_pgt", and also passed to VALID_LEVEL4_PGT_ADDR()
> > for validation.
> >
> > In both cases, crash is fishing around in the x8664_pda[NR_CPUS]
> > array to try to determine how many of the pda structures are legitimate,
> > and therefore how many cpus are in the system.  But in both functions,
> > it also verifies the x8664_pda structure by requiring that the cpunumber
> > member also makes sense.
> >
> > So I believe the level4_pgt check is unnecessary in this case. If
> > INVALID_MEMBER(x8664_pda_level4_pgt) is true, then the
> > assignments to "level4_pgt" in both functions must not be attempted,
> > and VALID_LEVEL4_PGT_ADDR() should either not be called at all,
> > or should just return TRUE when INVALID_MEMBER(x8664_pda_level4_pgt)
> > is true.
> >
> > I need somebody to build and test this concept on a kernel with
> > this type of x8664_pda structure.
> >
> > Any volunteers?
> >
>
> Okay, I made those changes - ran into next problem :(
> Ideas ?
>
> Thanks,
> Badari
>
> [root at localhost crash-4.0-2.8]# ./crash
>
> crash 4.0-2.8
> Copyright (C) 2002, 2003, 2004, 2005  Red Hat, Inc.
> Copyright (C) 2004, 2005  IBM Corporation
> Copyright (C) 1999-2005  Hewlett-Packard Co
> Copyright (C) 1999, 2002  Silicon Graphics, Inc.
> Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
> This program is free software, covered by the GNU General Public License,
> and you are welcome to change it and/or distribute copies of it under
> certain conditions.  Enter "help copying" to see the conditions.
> This program has absolutely no warranty.  Enter "help warranty" for details.
>
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "x86_64-unknown-linux-gnu"...
>
> crash: read error: kernel virtual address: ffff8100050eb084  type: "tss_struct ist array"

I see that the 2.6.13 kernel defines its init_tss
array like so:

DEFINE_PER_CPU(struct tss_struct, init_tss) ____cacheline_maxaligned_in_smp;

whereas, the earlier 2.6 kernels do it like this:

DECLARE_PER_CPU(struct tss_struct,init_tss);

If this change modifies the way that per-cpu variable addresses
are laid out, then I can't tell you what to do without significant
further investigation. But until proven otherwise, let's presume
that the per-cpu data addresses are calculated the same way.

There are two places where that error message can come from, both
in x86_64_ist_init(). Given that the above per-cpu declarations
are functionally equivalent, the following kernel symbol should
still exist in your vmlinux, verifiable like so:

$ nm -Bn vmlinux | grep per_cpu__init_tss
ffffffff80502100 D per_cpu__init_tss
$

If it's not there, crash is hosed, and significant work will need
to be done to find it.  But if the symbol is still intact in
the 2.6.14 kernel, the failure most likely came from an incorrect
calculation of the init_tss vaddr below:

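static void
x86_64_ist_init(void)
{
        ...

        } else if (symbol_exists("per_cpu__init_tss")) {
                for (c = 0; c < NR_CPUS; c++) {
                        /*
                         * SMP kernel with per-cpu offsets: cpu c's copy
                         * lives at the symbol value plus its offset.
                         */
                        if ((kt->flags & SMP) && (kt->flags & PER_CPU_OFF)) {
                                if (kt->__per_cpu_offset[c] == 0)
                                        break;  /* no offset: stop scanning */
                                vaddr = symbol_value("per_cpu__init_tss") +
                                        kt->__per_cpu_offset[c];
                        } else
                                vaddr = symbol_value("per_cpu__init_tss");

                        /* Point at the ist[] array inside tss_struct. */
                        vaddr += OFFSET(tss_struct_ist);

                        /*
                         * Read this cpu's 7 IST stack base addresses;
                         * FAULT_ON_ERROR makes a failed read fatal.
                         */
                        readmem(vaddr, KVADDR, &ms->stkinfo.ebase[c][0],
                                sizeof(ulong) * 7, "tss_struct ist array",
                                FAULT_ON_ERROR);

                        /* An empty first entry ends the scan. */
                        if (ms->stkinfo.ebase[c][0] == 0)
                                break;
                }
        }
        ...
}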

I'm also presuming your test kernel is SMP, but I'm wondering whether
the SMP and PER_CPU_OFF flags are actually set.

The SMP flag should have been pre-set in kernel_init(), but the
PER_CPU_OFF flag gets set in x86_64_cpu_pda_init(), which you
have modified.

You can display the kt->flags contents with a debug print statement in x86_64_ist_init().
If PER_CPU_OFF is not set, then that's probably the issue here.

Can you show your new versions of x86_64_cpu_pda_init() and
x86_64_get_smp_cpus()?

Dave





