[Crash-utility] [patch] Crash-Utility: ppc64: Fix a crash issue where it fails to read vmcore generated by post-v2.6.34 kernel version.

Dave Anderson anderson at redhat.com
Thu Sep 15 14:39:12 UTC 2011



----- Original Message -----
> With some changes in the upstream ppc64 kernel starting from 2.6.34, the
> crash utility fails to read vmcore generated by post-v2.6.34 kernel versions.
> 
> In v2.6.34 ppc64, the upstream commit 1426d5a3bd07589534286375998c0c8c6fdc5260
> (powerpc: Dynamically allocate pacas) dynamically allocates the paca and
> changes the data type of the 'paca' symbol from array to pointer. With this
> change in place, the crash utility fails to read a vmcore generated by an
> upstream kernel.
> 
> This patch fixes the crash tool to get correct address value for paca
> symbol depending on its data type.
> 
> In v2.6.36 ppc64, the upstream commit fc53b4202e61c7e9008c241933ae282aab8a6082
> overwrites the paca pointer variable to point to the static paca (during a
> crash, just before kexec), which contains a valid data_offset only for the
> crashing cpu. Hence we can no longer rely on the paca symbol to get valid
> per-cpu data_offset values. Instead, this kernel version reintroduces the
> __per_cpu_offset array, which was removed post v2.6.15 ppc64. This patch
> checks for the existence of the '__per_cpu_offset' symbol and calls
> ppc64_paca_init() only if it does not exist.
> 
> This fix is backward compatible and works fine with vmcore generated by older
> kernel (pre-2.6.34).

Hi Mahesh,

The only question I have re: this patch is that it will prevent 
this part at the bottom of ppc64_paca_init() from running on 
the newer kernels:

        ...

	switch (map)
        {
        case POSSIBLE:
                if (cpus > kt->cpus) {
                        i = get_highest_cpu_online() + 1;
                        if (i > kt->cpus)
                                kt->cpus = i;
                }
                break;
        case ONLINE:
        case PRESENT:
                kt->cpus = cpus;
                break;
        }
        if (kt->cpus > 1)
                kt->flags |= SMP;
}

The potential kt->cpus override above was precisely this patch:

diff -r1.42 -r1.43
2514c2514,2527
<       kt->cpus = cpus;
---
>       switch (map)
>       {
>       case POSSIBLE:
>               if (cpus > kt->cpus) {
>                       i = get_highest_cpu_online() + 1;
>                       if (i > kt->cpus)
>                               kt->cpus = i;
>               }
>               break;
>       case ONLINE:
>       case PRESENT:
>               kt->cpus = cpus;
>               break;
>       }

that was introduced in crash-5.0.0:

     - Fix for a 4.0-8.11 regression that introduced a bug in determining 
       the number of cpus in ppc64 kernels when the cpu_possible_[map/mask]
       has more cpus than the cpu_online_[map/mask].  In that case, the 
       kernel contains per-cpu runqueue data and "swapper" tasks for the 
       extra cpus.  Without the patch, on systems with a possible cpu count 
       that is larger than its online cpu count: 
        (1) the "sys" command will reflect the possible cpu count.
        (2) the "ps" command will show the existent-but-unused "swapper" 
            tasks as active on the extra cpus. 
        (3) the "set" command will allow the current context to be set to 
            any of the existent-but-unused "swapper" tasks. 
        (4) the "runq" command will display existent-but-unused runqueue 
            data for the extra cpus. 
        (5) the "bt" command on the existent-but-unused "swapper" tasks will 
            indicate: "bt: cannot determine NT_PRSTATUS ELF note for active 
            task: <task>" on dumpfiles, and "(active)" on live systems.
        (anderson at redhat.com)

So your patch effectively reverts it for the newer kernels.  Since 2.6.36 kernels
have the cpu_possible_mask, I think you should still apply the "case POSSIBLE" 
logic above somewhere, correct?
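
To make the question concrete, here is a minimal standalone sketch of how
the cpu-count adjustment could be factored out of ppc64_paca_init() so that
it still runs when the __per_cpu_offset path is taken.  The function name
adjust_cpu_count() and its parameters are hypothetical and are not part of
crash; "highest_online" stands in for the return value of
get_highest_cpu_online(), and "current_cpus" for the existing kt->cpus.

```c
/* Which cpu map "cpus" was counted from (names mirror the excerpt above). */
enum cpu_map { POSSIBLE, PRESENT, ONLINE };

/*
 * Hypothetical helper, not a crash function: return the value that
 * should be stored in kt->cpus.
 *   map            - the cpu map that "cpus" was counted from
 *   cpus           - number of cpus found in that map
 *   current_cpus   - the value currently held in kt->cpus
 *   highest_online - highest online cpu number (assumed input)
 */
static int
adjust_cpu_count(enum cpu_map map, int cpus, int current_cpus,
		 int highest_online)
{
	int i;

	switch (map)
	{
	case POSSIBLE:
		/* Possible count may include unused cpus; cap at online. */
		if (cpus > current_cpus) {
			i = highest_online + 1;
			if (i > current_cpus)
				current_cpus = i;
		}
		break;
	case ONLINE:
	case PRESENT:
		current_cpus = cpus;
		break;
	}
	return current_cpus;
}
```

With a helper along these lines, ppc64_init() could apply the result to
kt->cpus (and set SMP if it is greater than 1) regardless of whether
ppc64_paca_init() was called.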

Dave

 
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> ---
> ppc64.c | 34 ++++++++++++++++++++++++++++++++--
> 1 file changed, 32 insertions(+), 2 deletions(-)
> 
> Index: crash-5.1.7/ppc64.c
> ===================================================================
> --- crash-5.1.7.orig/ppc64.c
> +++ crash-5.1.7/ppc64.c
> @@ -201,7 +201,22 @@ ppc64_init(int when)
> 
> machdep->section_size_bits = _SECTION_SIZE_BITS;
> machdep->max_physmem_bits = _MAX_PHYSMEM_BITS;
> - ppc64_paca_init();
> + /*
> + * starting from v2.6.36 we can not rely on paca structure
> + * to get per cpu data_offset. The upstream commit
> + * fc53b4202e61c7e9008c241933ae282aab8a6082 overwrites the
> + * paca pointer variable to point to static paca that contains
> + * valid data_offset only for crashing cpu.
> + *
> + * But the kernel v2.6.36 ppc64 introduces __per_cpu_offset
> + * symbol which was removed post v2.6.15 ppc64 and now we
> + * get the per cpu data_offset from __per_cpu_offset symbol
> + * during kernel_init() call. Hence for backward (pre-2.6.36)
> + * compatibility, call ppc64_paca_init() only if symbol
> + * __per_cpu_offset does not exist.
> + */
> + if (!symbol_exists("__per_cpu_offset"))
> + ppc64_paca_init();
> machdep->vmalloc_start = ppc64_vmalloc_start;
> MEMBER_OFFSET_INIT(thread_struct_pg_tables,
> "thread_struct", "pg_tables");
> @@ -2606,10 +2621,25 @@ ppc64_paca_init(void)
> char *cpu_paca_buf;
> ulong data_offset;
> int map;
> + ulong paca;
> 
> if (!symbol_exists("paca"))
> error(FATAL, "PPC64: Could not find 'paca' symbol\n");
> + /*
> + * In v2.6.34 ppc64, the upstream commit
> + * 1426d5a3bd07589534286375998c0c8c6fdc5260 (powerpc: Dynamically
> + * allocate pacas) now dynamically allocates the paca and have
> + * changed data type of 'paca' symbol from array to pointer. With this
> + * change in place crash utility fails to read vmcore generated for
> + * upstream kernel.
> + * Add a check for paca variable data type before accessing.
> + */
> 
> + if (get_symbol_type("paca", NULL, NULL) == TYPE_CODE_PTR)
> + readmem(symbol_value("paca"), KVADDR, &paca, sizeof(ulong),
> + "paca", FAULT_ON_ERROR);
> + else
> + paca = symbol_value("paca");
> if (cpu_map_addr("possible"))
> map = POSSIBLE;
> else if (cpu_map_addr("present"))
> @@ -2648,7 +2678,7 @@ ppc64_paca_init(void)
> if (!in_cpu_map(map, i))
> continue;
> 
> - readmem(symbol_value("paca") + (i * SIZE(ppc64_paca)),
> + readmem(paca + (i * SIZE(ppc64_paca)),
> KVADDR, cpu_paca_buf, SIZE(ppc64_paca),
> "paca entry", FAULT_ON_ERROR);
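
The array-versus-pointer handling in the ppc64_paca_init() hunk above can
be illustrated outside of crash with a small simulation.  This is only a
sketch under stated assumptions: kernel memory is modelled as a byte buffer
indexed by address, and kaddr_t, enum sym_type, paca_base() and
paca_entry_addr() are hypothetical names that stand in for ulong, the
result of get_symbol_type(), and the readmem()/symbol_value() logic.

```c
#include <string.h>

typedef unsigned long kaddr_t;	/* hypothetical address type */

/* Hypothetical stand-in for the type code of the 'paca' symbol. */
enum sym_type { SYM_ARRAY, SYM_PTR };

/*
 * Return the base address of the paca array.
 *   mem      - simulated memory image, indexed directly by address
 *   sym_addr - address of the 'paca' symbol itself
 *   type     - whether the symbol is an array or a pointer
 */
static kaddr_t
paca_base(const unsigned char *mem, kaddr_t sym_addr, enum sym_type type)
{
	kaddr_t base;

	if (type == SYM_PTR) {
		/* Pointer: the symbol holds the address of the array. */
		memcpy(&base, mem + sym_addr, sizeof(base));
		return base;
	}
	/* Array: the symbol address is the array address. */
	return sym_addr;
}

/* Address of the paca entry for a given cpu. */
static kaddr_t
paca_entry_addr(kaddr_t base, int cpu, kaddr_t paca_size)
{
	return base + (kaddr_t)cpu * paca_size;
}
```

In the pointer case one extra read is needed before indexing by cpu, which
is why the patch resolves the base address once and reuses it in the loop.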



