[Crash-utility] Crash physical search on live session not recommended :-)

Dave Anderson anderson at redhat.com
Mon Feb 28 21:59:46 UTC 2011



----- Original Message -----
> On Thu, 2011-02-24 at 15:18 +0000, Dave Anderson wrote:
> >
> > ----- Original Message -----
> > > While testing my search patch, I kicked off an unconstrained physical
> > > search on a live session and hung the machine so thoroughly that it
> > > required a visit to the machine room to physically unplug it to get the
> > > remote console back up. Coincidence? Or should physical address search
> > > on a live session be constrained somehow for safety?
> > >
> > > Bob Montgomery
> >
> > Maybe so -- I had no problem with any of the systems I've tested it on.
> >
> > Is it always reproducible on that system?
> I'll let you know when I get a chance to test again. If it fails like
> it did before, it will tie up two of us for the 10-minute walk to the
> machine room where I don't currently have access :-).
> >
> > And does that system use /dev/mem or /dev/crash?
> /dev/mem
> 
> >
> > It would be interesting to know if a particular physical address caused it,
> > or if there are physical pages that are read that are *not* read when an
> > unconstrained kernel virtual search is done?
> 
> The pages should have been copied to the buffer a page at a time, right?
> So the search access pattern within the buffer shouldn't affect how
> physical memory was accessed (I was thinking that string search's byte
> aligned access might have mattered). Could the physical search come
> up with a page in /dev/mem that wouldn't also be accessed in the
> identity-mapped virtual case?

I believe so...

Physical address searches start at the "start_paddr" page in node 0's
entry in the vt->node_table[] array, seen easiest by "help -v":

 crash> help -v
  ... [ cut ] ...
             numnodes: 1
             nr_zones: 4
        nr_free_areas: 11
        node_table[0]: 
                     id: 0
                  pgdat: ffff81000000b000
                   size: 261642
                present: 256704
                mem_map: ffff8100006e6000
            start_paddr: 0
            start_mapnr: 0
      dump_free_pages: dump_free_pages_zones_v2()
      dump_kmem_cache: dump_kmem_cache_percpu_v2()
      ...

In the example above, there's only 1 node, so the physical
search would start at physical page 0 and cover 261642 pages.
It would fail when reaching the page beyond the end of the
node, and would call next_physpage() to get the first page of
the next node, if one exists.  However, it would also fail when
reading any "non-present" pages, and in
each case next_physpage() would just bump the request up to
the next page.  So on the sample system above, there would be
261642-256704 (i.e., 4938) readmem() failures.
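
To make that arithmetic concrete, here is a minimal sketch. The struct node_info type and the non_present_pages() helper are hypothetical names made up for illustration and are not part of crash; only the two numbers come from the "help -v" output above.

```c
#include <stdio.h>

/* Hypothetical container for the two node_table[0] fields used here;
 * this is not the structure that crash itself uses. */
struct node_info {
    unsigned long size;     /* pages spanned by the node ("size") */
    unsigned long present;  /* pages actually present ("present") */
};

/* Pages spanned by the node but not present: each one is a page
 * for which a physical-search read would be expected to fail. */
unsigned long non_present_pages(const struct node_info *n)
{
    return n->size - n->present;
}
```

Called with size 261642 and present 256704, the helper gives the number of expected read failures within the node for the sample system.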

Kernel virtual memory searches start as directed by the
machdep->get_kvaddr_ranges() call, and then each page in the ranges
is translated to its physical memory page by readmem() and read.
Whenever a readmem() fails, next_kpage() is called to get the next
legitimate page to attempt, which does different things depending
upon the type of virtual memory.  For identity-mapped pages,
it uses next_identity_mapping(), which also uses the vt->node_table[]
array, similar to physical address searches. However, the search_virtual()
loop does a kvtop() on each virtual address, and then a phys_to_page() on the
returned physical address, before it attempts a readmem():

                        if (!kvtop(CURRENT_CONTEXT(), pp, &paddr, 0) ||
                            !phys_to_page(paddr, &page)) {
                                if (!next_kpage(pp, &pp))
                                        goto done;
                                continue;
                        }

Whereas the search_physical() loop has no restrictions:

                if (!readmem(ppp, PHYSADDR, pagebuf, PAGESIZE(),
                    "search page", RETURN_ON_ERROR|QUIET)) {
                        if (!next_physpage(ppp, &ppp))
                                break;
                        continue;
                }

I'm thinking that search_physical() should probably do a phys_to_page()
qualifier before attempting each readmem().  I never saw a problem on the several
different architectures that I tested it on, but can you try patching that in
(i.e., putting in a phys_to_page() qualifier) on that particular machine and see
what happens?
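
As a rough illustration of the idea, and not the actual patch, here is a self-contained sketch of a page walk that only "reads" pages passing a qualifier check. Everything in it is a hypothetical stand-in: struct mock_node, mock_phys_to_page(), count_qualified_reads() and the 4096-byte page size are assumptions made for the example, and the real phys_to_page(), readmem() and next_physpage() in crash behave differently in detail.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096ULL  /* assumed page size for this sketch */

/* Hypothetical stand-in for one vt->node_table[] entry. */
struct mock_node {
    unsigned long long start_paddr;  /* first physical address of the node */
    unsigned long long size;         /* number of pages in the node */
};

/* Hypothetical stand-in for phys_to_page(): succeeds only when
 * paddr falls inside one of the known nodes. */
bool mock_phys_to_page(const struct mock_node *nodes, size_t n,
                       unsigned long long paddr)
{
    for (size_t i = 0; i < n; i++) {
        unsigned long long end =
            nodes[i].start_paddr + nodes[i].size * MOCK_PAGE_SIZE;
        if (paddr >= nodes[i].start_paddr && paddr < end)
            return true;
    }
    return false;
}

/* Walk [start, end) a page at a time and count the pages that pass
 * the qualifier.  Pages that fail it are skipped without being read. */
unsigned long count_qualified_reads(const struct mock_node *nodes, size_t n,
                                    unsigned long long start,
                                    unsigned long long end)
{
    unsigned long reads = 0;

    for (unsigned long long ppp = start; ppp < end; ppp += MOCK_PAGE_SIZE) {
        if (!mock_phys_to_page(nodes, n, ppp))
            continue;  /* the real loop would ask next_physpage() here */
        reads++;       /* the real loop would call readmem() here */
    }
    return reads;
}
```

The point of the ordering is that an address with no backing page structure is rejected before any read of /dev/mem is attempted, instead of relying on the read itself to fail.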

And if that fails, and if it's reproducible, I guess you could do a flushed
write of the address of each page to a file before it's accessed so that it
would be written to disk before it's even read.  Then after your 10-minute
stroll for two, and subsequent reboot, perhaps the offensive physical
address could be nailed down?
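
One way such a flushed write might look is sketched below, assuming a POSIX system. The function name log_paddr_before_read() is hypothetical and not something in crash; fprintf(), fflush(), fileno() and fsync() are the standard C and POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Append one physical address (in hex) to the log and force it out
 * to storage before the caller goes on to read that page.
 * Returns 0 on success, -1 on any failure. */
int log_paddr_before_read(FILE *log, unsigned long long paddr)
{
    if (fprintf(log, "%llx\n", paddr) < 0)
        return -1;
    if (fflush(log) != 0)         /* push the stdio buffer to the kernel */
        return -1;
    if (fsync(fileno(log)) != 0)  /* ask the kernel to commit it to the device */
        return -1;
    return 0;
}
```

Calling this once per page just before the read would slow the search considerably, but the last address in the file after the reboot should be the page that was being accessed when the machine hung, or very close to it.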

But doing the phys_to_page() before the read seems reasonable.

Dave
    
  
> 
> Bob M.
> 
> > Dave
> >
> >



