[Crash-utility] Running idle threads show wrong CPU numbers

Michael Holzheu holzheu at linux.vnet.ibm.com
Wed Feb 10 18:45:32 UTC 2010


On Wed, 2010-02-10 at 11:09 -0500, Dave Anderson wrote:
> ----- Forwarded Message -----
> From: "Dave Anderson" <anderson at redhat.com>
> 
> ----- "Michael Holzheu" <holzheu at linux.vnet.ibm.com> wrote:
> 
> > On Wed, 2010-02-10 at 10:08 -0500, Dave Anderson wrote:
> > > ----- "Michael Holzheu" <holzheu at linux.vnet.ibm.com> wrote:
> > > 
> > > > Hi again,
> > > 
> > > > > When I change get_smp_cpus() to return "get_highest_cpu_online() + 1" I
> > > > > see five swapper idle tasks when using "ps". The problem I now have is
> > > > > that I have to provide a backtrace for the offline cpus. But the offline
> > > > > CPUs do not have any stack on s390. Is there a way to tell crash that
> > > > > there is no backtrace available? Probably I overlooked something...
> > > > 
> > > > Ok, I think I got it now. In case of an offline CPU, I will use
> > > > "task_struct_thread_ksp" like I do it for non active tasks.
> > > > 
> > > > When I do that I get for the swapper tasks with the offline CPUs:
> > > > 
> > > > PID: 0      TASK: 18d38340          CPU: 2   COMMAND: "swapper"
> > > >  #0 [18d3feb8] ret_from_fork at 117e12
> > > > 
> > > > PID: 0      TASK: 18d40440          CPU: 3   COMMAND: "swapper"
> > > >  #0 [18d47eb8] ret_from_fork at 117e12
> > > 
> > > I'm not sure why you should do anything.  The cpu is offline and for all
> > > practical purposes it doesn't exist, so why bother?
> > 
> > Because you can do a "bt" on the swapper task with the offline CPU.
> > Then s390x_get_stack_frame() is called where I figure out the stack
> > pointer and instruction address. In that function I check if the task is
> > currently running on a CPU and in that case I get the information from
> > the associated s390 lowcore, where the registers are stored in case of a
> > dump. If the task is not running I get the information from the thread
> > struct.
> > 
> > > The patch I have queued just uses get_highest_cpu_online()+1 and
> > > does nothing else.  But I only tested it on a live system, and
> > > any backtrace attempt on the offlined swapper task just shows
> > > (active).  What happens when you do a "bt -a" with a dumpfile?
> > 
> > It shows all swapper tasks (online and offline), but I get errors for
> > the backtrace for the offline CPUs.
> 
> What kind of errors?

The problem is that for the offline swapper tasks
s390x_get_stack_frame() is called. In that function I check with
s390x_has_cpu() if the task is currently running on a CPU. Because of
the missing CPU online check, s390x_has_cpu() returns TRUE. Therefore I
try to read the CPU registers from the lowcore of that CPU. The lowcore
pointer is zero because the CPU is offline, so the stack pointer
(register 15) read from it is wrong and the backtrace fails.

> > 
> > The attached patch would solve the problem (and eliminate most of the
> > probably redundant s390(x)_has_cpu() function).
> 
> I don't see what's being solved by the patch (not the s390x_get_smp_cpus
> parts) -- does the "old" s390x_has_cpu() fail?

The old s390x_has_cpu() returns TRUE for the offline swapper tasks. And
I think that this is wrong.

The new implementation of s390x_has_cpu() should return TRUE if the task
is running on an online CPU and FALSE otherwise:

+       if (is_task_active(bt->task) && (kt->cpu_flags[cpu] & ONLINE))
+               return TRUE;
+       else
+               return FALSE;


Michael



