[Crash-utility] crash 4.0-2.8 fails on 2.6.14-rc5 (EM64T)
Badari Pulavarty
pbadari at us.ibm.com
Thu Oct 27 18:27:01 UTC 2005
On Thu, 2005-10-27 at 14:16 -0400, Dave Anderson wrote:
> Badari Pulavarty wrote:
> > On Thu, 2005-10-27 at 13:17 -0400, Dave Anderson wrote:
> > > Badari Pulavarty wrote:
> > > > > That debug output certainly seems to pinpoint the issue at hand,
> > > > > doesn't it? Very interesting...
> > > > >
> > > > > What's strange is that the usage of the cpu_pda[i].data_offset by
> > > > > the per_cpu() macro in "include/asm-x86_64/percpu.h" is unchanged.
> > > > >
> > > > > It's probably something very simple going on here, but I don't
> > > > > have any more ideas at this point.
> > > >
> > > > This is the reply I got from Andi Kleen...
> > > >
> > > > -------- Forwarded Message --------
> > > > From: Andi Kleen <ak at suse.de>
> > > > To: Badari Pulavarty <pbadari at us.ibm.com>
> > > > Subject: Re: cpu_pda->data_offset changed recently ?
> > > > Date: Thu, 27 Oct 2005 16:58:54 +0200
> > > > On Thursday 27 October 2005 16:53, Badari Pulavarty wrote:
> > > > > Hi Andi,
> > > > >
> > > > > I am trying to fix the "crash" utility to make it work on 2.6.14-rc5.
> > > > > (It's running fine on 2.6.10.) It looks like the crash utility reads
> > > > > and uses cpu_pda->data_offset values. It looks like there is a
> > > > > change between 2.6.10 & 2.6.14-rc5 which is causing "data_offset"
> > > > > to take huge values - which is causing "crash" to break.
> > > > >
> > > > > I added a printk() to find out why. As you can see from the
> > > > > following, something changed - is this expected? Please let me know.
> > > >
> > > > bootmem used to allocate from the end of the direct mapping on NUMA
> > > > systems. Now it starts at the beginning, often before the kernel .text.
> > > > This means it is negative. Perfectly legitimate. crash just has to
> > > > handle it.
> > > >
> > > > -Andi
> > > >
> > > > --
> > > >
> > > That's what I thought it looked like, although the x8664_pda.data_offset
> > > field is an "unsigned long". Anyway, if you take any of the per_cpu__xxx
> > > symbols from the 2.6.14 kernel, subtract a cpu data_offset, does it
> > > come up with a legitimate virtual address?
> >
> > Unfortunately, I don't know x86-64 kernel virtual address space
> > well enough to answer your question.
> >
> > My understanding is x86-64 kernel addresses look something like:
> >
> > addr: ffffffff80101000
> >
> > But now (2.6.14-rc5) I see addresses like:
> >
> > pgdat: 0xffff81000000e000
> >
> > which are causing read problems.
> >
> > crash: read error: kernel virtual address: ffff81000000fa90 type: "pglist_data node_next"
> >
> > I am not sure what these addresses are or whether they are valid.
> > Is there a way to verify these addresses, through gdb or /dev/kmem
> > or something like that?
> >
> > Thanks,
> > Badari
> >
> > Here is the bottom line we need to understand in order to fix
> > the problem:
> >
> > 2.6.10:
> > pgdat: 0x1000000e000
> >
> > 2.6.14-rc5:
> > pgdat: 0xffff81000000e000
>
>
> Exactly.
>
> On a 2.6.9 kernel, if you do an nm -Bn on the vmlinux file, you'll first
> see a bunch of "A" type absolute symbols, followed by the text
> symbols, then readonly data, data, and so on. Eventually you'll
> bump into the per-cpu symbols:
>
> $ nm -Bn vmlinux
> 0000000000088861 A __crc_dev_mc_delete
> 000000000014bfd1 A __crc_smp_call_function
> 00000000002de2e0 A __crc___skb_linearize
> 0000000000442f14 A __crc_tty_register_device
> 000000000060e766 A __crc_tty_termios_baud_rate
> 0000000000712c54 A __crc_remove_inode_hash
> 00000000007f8e0b A __crc_xfrm_policy_alloc
> 0000000000801678 A __crc_flush_scheduled_work
> 0000000000a64d75 A __crc_neigh_changeaddr
> ... <snip>
> 00000000ffdf0b3d A __crc_usb_driver_release_interface
> 00000000ffe031fc A __crc_udp_proc_unregister
> 00000000ffead192 A __crc_cdrom_number_of_slots
> 00000000fff9536b A __crc_sock_no_recvmsg
> 00000000fffb8df8 A __crc_device_unregister
> ffffffff80100000 t startup_32
> ffffffff80100000 A _text
> ffffffff80100081 t reach_compatibility_mode
> ffffffff8010008e t second
> ffffffff80100100 t reach_long64
> ffffffff8010013d T initial_code
> ffffffff80100145 T init_rsp
> ffffffff80100150 T no_long_mode
> ffffffff80100f00 T pGDT32
> ffffffff80100f10 t ljumpvector
> ffffffff80100f18 T stext
> ffffffff80100f18 T _stext
> ffffffff80101000 T init_level4_pgt
> ffffffff80102000 T level3_ident_pgt
> ... <snip>
> ffffffff80502100 D per_cpu__init_tss
> ffffffff80502200 d per_cpu__prof_old_multiplier
> ffffffff80502204 d per_cpu__prof_multiplier
> ffffffff80502208 d per_cpu__prof_counter
> ffffffff80502220 D per_cpu__mmu_gathers
> ffffffff80503280 D per_cpu__kstat
> ffffffff80503680 d per_cpu__runqueues
> ffffffff805048e0 d per_cpu__cpu_domains
> ffffffff80504940 d per_cpu__phys_domains
> ffffffff805049a0 d per_cpu__node_domains
> ffffffff805049f8 D per_cpu__process_counts
> ffffffff80504a00 d per_cpu__tasklet_hi_vec
> ffffffff80504a08 d per_cpu__tasklet_vec
> ffffffff80504a10 d per_cpu__ksoftirqd
> ffffffff80504a80 d per_cpu__tvec_bases
> ffffffff80506b00 D per_cpu__rcu_bh_data
> ffffffff80506b60 D per_cpu__rcu_data
> ffffffff80506bc0 d per_cpu__rcu_tasklet
> ...
>
> So for any data that was specifically created per-cpu,
> the symbol above is the starting point, but to get to
> the per-cpu structure, the offset value from
> cpu_pda[cpu].data_offset needs to be applied.
>
> What I don't understand is where the 0xffff810000000000
> addresses come into play. Are you seeing them as actual
> symbols?
>
> Dave
It looks like the 4-level page table change altered the layout. Now,
0xffff810000000000 is a valid address. From Documentation/x86_64/mm.txt:

Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40bits) guard hole
ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of phys. memory
ffffc10000000000 - ffffc1ffffffffff (=40bits) hole
ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space
... unused hole ...
ffffffff80000000 - ffffffff82800000 (=40MB) kernel text mapping, from phys 0
... unused hole ...
ffffffff88000000 - fffffffffff00000 (=1919MB) module mapping space
Thanks,
Badari