[Crash-utility] Running idle threads show wrong CPU numbers

Dave Anderson anderson at redhat.com
Wed Feb 10 16:09:46 UTC 2010


----- Forwarded Message -----
From: "Dave Anderson" <anderson at redhat.com>

----- "Michael Holzheu" <holzheu at linux.vnet.ibm.com> wrote:

> On Wed, 2010-02-10 at 10:08 -0500, Dave Anderson wrote:
> > ----- "Michael Holzheu" <holzheu at linux.vnet.ibm.com> wrote:
> > 
> > > Hi again,
> > 
> > > > When I change get_smp_cpus() to return "get_highest_cpu_online() + 1" I
> > > > see five swapper idle tasks when using "ps". The problem I now have is
> > > > that I have to provide a backtrace for the offline cpus. But the offline
> > > > CPUs do not have any stack on s390. Is there a way to tell crash that
> > > > there is no backtrace available? Perhaps I overlooked something...
> > > 
> > > OK, I think I got it now. For an offline CPU, I will use
> > > "task_struct_thread_ksp" as I do for non-active tasks.
> > > 
> > > When I do that I get for the swapper tasks with the offline CPUs:
> > > 
> > > PID: 0      TASK: 18d38340          CPU: 2   COMMAND: "swapper"
> > >  #0 [18d3feb8] ret_from_fork at 117e12
> > > 
> > > PID: 0      TASK: 18d40440          CPU: 3   COMMAND: "swapper"
> > >  #0 [18d47eb8] ret_from_fork at 117e12
> > 
> > I'm not sure why you should do anything.  The cpu is offline and for all
> > practical purposes it doesn't exist, so why bother?
> 
> Because you can still do a "bt" on the swapper task of the offline CPU.
> That calls s390x_get_stack_frame(), where I determine the stack pointer
> and instruction address. In that function I check whether the task is
> currently running on a CPU; if so, I get the information from the
> associated s390 lowcore, where the registers are stored when a dump is
> taken. If the task is not running, I get the information from the thread
> struct.
> 
> > The patch I have queued just uses get_highest_cpu_online()+1 and
> > does nothing else.  But I only tested it on a live system, and
> > any backtrace attempt on the offlined swapper task just shows
> > (active).  What happens when you do a "bt -a" with a dumpfile?
> 
> It shows all swapper tasks (online and offline), but I get errors for
> the backtrace for the offline CPUs.
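The lowcore-versus-thread-struct choice Michael describes can be modeled in a few lines of C. This is a minimal sketch under stated assumptions, not crash's actual s390x_get_stack_frame().

The types `model_task` and `model_lowcore` and the functions `pick_frame_source()` and `starting_sp()` are hypothetical. The sketch also assumes two things:

- A per-CPU online flag is available.
- The stored r15 in the lowcore is the s390x stack pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model; not crash's real data structures. */
struct model_lowcore {
    uint64_t psw_addr;   /* instruction address from the stored PSW */
    uint64_t r15;        /* stored r15, assumed to be the s390x stack pointer */
};

struct model_task {
    int cpu;             /* CPU the task is associated with */
    bool is_current;     /* runqueue lists it as that CPU's current task */
    uint64_t thread_ksp; /* kernel stack pointer saved in the thread struct */
};

enum frame_source { FRAME_FROM_LOWCORE, FRAME_FROM_THREAD };

/*
 * In this model, lowcore register contents are trusted only for CPUs that
 * were online when the dump was taken. A task that is "current" on an
 * offline CPU (its idle swapper) falls back to the thread struct, just
 * like a task that is not running at all.
 */
static enum frame_source
pick_frame_source(const struct model_task *t, const bool *cpu_online,
                  int nr_cpus)
{
    assert(t != NULL && cpu_online != NULL);
    if (t->is_current && t->cpu >= 0 && t->cpu < nr_cpus &&
        cpu_online[t->cpu])
        return FRAME_FROM_LOWCORE;
    return FRAME_FROM_THREAD;
}

/* Starting stack pointer for the backtrace, taken from the chosen source. */
static uint64_t
starting_sp(const struct model_task *t, const struct model_lowcore *lowcores,
            const bool *cpu_online, int nr_cpus)
{
    if (pick_frame_source(t, cpu_online, nr_cpus) == FRAME_FROM_LOWCORE)
        return lowcores[t->cpu].r15;
    return t->thread_ksp;
}
```

Take the case where CPUs 0, 1 and 4 are online. The swapper on CPU 0 gets its stack pointer from the lowcore. The swappers on CPUs 2 and 3 fall back to the saved thread ksp, which is consistent with the ret_from_fork frames shown for them in the bt output.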

What kind of errors?

> 
> The attached patch would solve the problem (and eliminate most of the
> probably redundant s390(x)_has_cpu() function).

I don't see what's being solved by the patch (other than the s390x_get_smp_cpus
parts) -- does the "old" s390x_has_cpu() fail?

Even though the cpu is offline, the runqueue will still show its percpu
swapper task as the current task on that cpu.
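Dave's point can be illustrated with a small C sketch. If "has a CPU" is decided from each runqueue's current-task pointer, the swapper tasks of offline CPUs still count as running.

This is a hypothetical model, not crash's s390x_has_cpu(). `model_rq` and `model_task_has_cpu()` are invented for illustration. The task addresses used in the example come from the ps output in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU runqueue model; only the current-task pointer matters. */
struct model_rq {
    unsigned long curr;  /* address of the task currently running on this CPU */
};

/*
 * Return the CPU whose runqueue lists the task as current, or -1 if none.
 * Offline CPUs are scanned as well: per the discussion, their runqueues
 * still point at the per-CPU swapper task.
 */
static int
model_task_has_cpu(unsigned long task, const struct model_rq *rqs, int nr_cpus)
{
    assert(rqs != NULL);
    for (int cpu = 0; cpu < nr_cpus; cpu++)
        if (rqs[cpu].curr == task)
            return cpu;
    return -1;
}
```

With `nr_cpus` set to 5 and runqueue 2 pointing at 18d38340, that swapper is reported on CPU 2 even though the CPU is offline. init (18d18040) is not current anywhere and yields -1. With `nr_cpus` set to 3, the swapper on CPU 4 is never found at all.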

Dave

 
> 
> With this patch "ps" shows:
> 
>    PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
> >     0      0   0       800ef0       RU   0.0       0      0  [swapper]
> >     0      0   1      18d30240      RU   0.0       0      0  [swapper]
> >     0      0   2      18d38340      RU   0.0       0      0  [swapper]
> >     0      0   3      18d40440      RU   0.0       0      0  [swapper]
> >     0      0   4      18d48540      RU   0.0       0      0  [swapper]
>       1      0   1      18d18040      IN   0.2    2244   1020  init
> ...
> 
> And "bt -a" shows:
> 
> PID: 0      TASK: 800ef0            CPU: 0   COMMAND: "swapper"
>  LOWCORE INFO:
>   -psw      : 0x0706000180000000 0x0000000000115564
>   -function : vtime_stop_cpu at 115564
>   -prefix   : 0x18d28000
>   -cpu timer: 0x7fff00c1 0x00c584ef
> ...
> 
> PID: 0      TASK: 18d30240          CPU: 1   COMMAND: "swapper"
>  LOWCORE INFO:
>   -psw      : 0x0706000180000000 0x0000000000115564
>   -function : vtime_stop_cpu at 115564
> ...
> 
> PID: 0      TASK: 18d38340          CPU: 2   COMMAND: "swapper"
>  #0 [18d3feb8] ret_from_fork at 117e12
> ...
> 
> PID: 0      TASK: 18d40440          CPU: 3   COMMAND: "swapper"
>  #0 [18d47eb8] ret_from_fork at 117e12
> ...
> 
> PID: 0      TASK: 18d48540          CPU: 4   COMMAND: "swapper"
>  LOWCORE INFO:
>   -psw      : 0x0706000180000000 0x0000000000115564
>   -function : vtime_stop_cpu at 115564
>   -prefix   : 0x1416a000
> 
> Michael
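The five swapper entries in the ps output come from sizing the per-CPU loops with the highest online CPU number plus one rather than with the number of online CPUs. The minimal C sketch below compares the two.

Two assumptions apply, neither stated in the thread:

- Before the change, get_smp_cpus() returned the online count.
- The `online[]` array stands in for the kernel's online mask.

`count_online_cpus()` and `highest_online_plus_one()` are hypothetical helpers, not crash functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Number of CPUs marked online in the (hypothetical) online[] array. */
static int
count_online_cpus(const bool *online, int max_cpus)
{
    assert(online != NULL);
    int n = 0;
    for (int i = 0; i < max_cpus; i++)
        if (online[i])
            n++;
    return n;
}

/* Highest online CPU number plus one, or 0 if no CPU is online. */
static int
highest_online_plus_one(const bool *online, int max_cpus)
{
    assert(online != NULL);
    int highest = -1;
    for (int i = 0; i < max_cpus; i++)
        if (online[i])
            highest = i;
    return highest + 1;
}
```

The lowcore output suggests CPUs 0, 1 and 4 are online. In that case the online count is 3, so a loop over CPUs 0 to 2 never reaches CPU 4. Highest-plus-one is 5, which covers CPUs 0 through 4, including the offline CPUs 2 and 3.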



