[Crash-utility] [PATCH] runq: make tasks in throttled cfs_rqs/rt_rqs displayed

Fri Nov 9 03:37:11 UTC 2012

On 2012-11-08 03:15, Dave Anderson wrote:
> 
> 
> ----- Original Message -----
>>
>> OK. I rewrote the patches, and they test OK on my box.
>>
>> Thanks
>> Zhang
> 
> My tests weren't so successful this time, and I also have some questions
> about the runq -g output.
> 
> I tested your latest patches on a sample set of 70 dumpfiles whose
> kernels all use CFS runqueues.  In 7 of the 70 "runq -g" tests,
> the command caused the crash session to fail like so:
> 

<snip>

> 
> In a quick debugging session of your free_task_group_info_array()
> I printed out the addresses being FREEBUF()'d, and I noted that
> there were numerous instances of the same address being freed twice:
> 
>  static void
>  free_task_group_info_array(void)
>  {
>          int i;
>  
>          for (i = 0; i < tgi_p; i++) {
>                  if (tgi_array[i]->name)
>                          FREEBUF(tgi_array[i]->name);
>                  FREEBUF(tgi_array[i]);
>          }
>          tgi_p = 0;
>          FREEBUF(tgi_array);
>  }
>  
> I put one of the failing vmlinux/vmcore pairs here for you
> to debug:
>   
>   http://people.redhat.com/anderson/zhangyanfei
> 

This is strange. In my test on the vmcore you provided, 'runq -g' ran correctly
the first time and caused the crash session to fail the second time.
From the debug information above and from my own tests, I noticed that it always
failed at the same place: the FREEBUF() of a name. So I checked
get_task_group_name() and changed the way it returns the name buffer. With that
change, the command works on the vmcore.
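
To illustrate the kind of ownership problem that can produce the same address
being freed twice, here is a minimal sketch. It is not the code from the patch:
struct tg_info, dup_name(), tg_info_new() and tg_info_free_all() are
hypothetical names, and plain malloc()/free() stand in for crash's buffer
routines. It assumes the failure came from entries sharing one name buffer, and
shows each entry owning a private copy instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-task-group record. */
struct tg_info {
	char *name;	/* owned by this entry, or NULL */
};

/* Return a private heap copy of src, so no two entries share a buffer. */
static char *dup_name(const char *src)
{
	size_t len;
	char *copy;

	if (src == NULL)
		return NULL;
	len = strlen(src) + 1;
	copy = malloc(len);
	if (copy != NULL)
		memcpy(copy, src, len);
	return copy;
}

/* Allocate one entry; returns NULL if any allocation fails. */
static struct tg_info *tg_info_new(const char *name)
{
	struct tg_info *tgi = calloc(1, sizeof(*tgi));

	if (tgi == NULL)
		return NULL;
	tgi->name = dup_name(name);
	if (name != NULL && tgi->name == NULL) {
		free(tgi);
		return NULL;
	}
	return tgi;
}

/* Free each entry exactly once; returns the number of entries freed. */
static int tg_info_free_all(struct tg_info **array, int *count)
{
	int i, freed = 0;

	assert(array != NULL && count != NULL);
	for (i = 0; i < *count; i++) {
		if (array[i] == NULL)
			continue;
		free(array[i]->name);	/* free(NULL) is a no-op */
		free(array[i]);
		array[i] = NULL;	/* a stale slot cannot be freed again */
		freed++;
	}
	*count = 0;
	return freed;
}
```

Because every entry holds its own copy of the name, the free loop never sees
the same address twice, and a second call on the emptied array frees nothing.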

> 
> Secondly, another question I have is the meaning of the command's output.
> 
> First, consider this "runq" output:
> 
>  crash> runq
>  CPU 0 RUNQUEUE: ffff8800090436c0
>    CURRENT: PID: 588    TASK: ffff88007e4877a0  COMMAND: "udevd"
>    RT PRIO_ARRAY: ffff8800090437c8
>       [no tasks queued]
>    CFS RB_ROOT: ffff880009043740
>       [118] PID: 2110   TASK: ffff88007d470860  COMMAND: "check-cdrom.sh"
>       [118] PID: 2109   TASK: ffff88007f1247a0  COMMAND: "check-cdrom.sh"
>       [118] PID: 2114   TASK: ffff88007f20e080  COMMAND: "udevd"
>  
>  CPU 1 RUNQUEUE: ffff88000905b6c0
>    CURRENT: PID: 2113   TASK: ffff88007e8ac140  COMMAND: "udevd"
>    RT PRIO_ARRAY: ffff88000905b7c8
>       [no tasks queued]
>    CFS RB_ROOT: ffff88000905b740
>       [118] PID: 2092   TASK: ffff88007d7a4760  COMMAND: "MAKEDEV"
>       [118] PID: 1983   TASK: ffff88007e59f140  COMMAND: "udevd"
>       [118] PID: 2064   TASK: ffff88007e40f7a0  COMMAND: "udevd"
>       [115] PID: 2111   TASK: ffff88007e4278a0  COMMAND: "kthreadd"
>  crash>
> 
> In the above case, the per-cpu "rq" structure addresses are shown as:
> 
>  CPU 0 RUNQUEUE: ffff8800090436c0
>  CPU 1 RUNQUEUE: ffff88000905b6c0
> 
> And embedded in each of the rq structures above are these two rb_root
> structures:
> 
>    CFS RB_ROOT: ffff880009043740  (embedded in rq @ffff8800090436c0)
>    CFS RB_ROOT: ffff88000905b740  (embedded in rq @ffff88000905b6c0)
> 
> And starting at those rb_root structures, the tree of tasks is dumped.
> 
> Now, your "runq -g" option doesn't show any "starting point" structure
> address, but rather it just shows "CPU 0" and "CPU 1":
>  
>  crash> runq -g
>  CPU 0
>    CURRENT: PID: 588    TASK: ffff88007e4877a0  COMMAND: "udevd"
>    RT PRIO_ARRAY: ffff8800090437c8
>       [no tasks queued]
>    CFS RB_ROOT: ffff880009093548
>       [118] PID: 2110   TASK: ffff88007d470860  COMMAND: "check-cdrom.sh"
>       [118] PID: 2109   TASK: ffff88007f1247a0  COMMAND: "check-cdrom.sh"
>       [118] PID: 2114   TASK: ffff88007f20e080  COMMAND: "udevd"
>  
>  CPU 1
>    CURRENT: PID: 2113   TASK: ffff88007e8ac140  COMMAND: "udevd"
>    RT PRIO_ARRAY: ffff88000905b7c8
>       [no tasks queued]
>    CFS RB_ROOT: ffff880009093548
>       [118] PID: 2092   TASK: ffff88007d7a4760  COMMAND: "MAKEDEV"
>       [118] PID: 1983   TASK: ffff88007e59f140  COMMAND: "udevd"
>       [118] PID: 2064   TASK: ffff88007e40f7a0  COMMAND: "udevd"
>       [115] PID: 2111   TASK: ffff88007e4278a0  COMMAND: "kthreadd"
>  crash> 
>  
> I would think that there might be a useful address of a per-cpu 
> structure that could be shown there as well?

OK, this is added.

> 
> And secondly, I'm confused as to why the "CFS RB_ROOT" address for
> all cpus is the same address -- for example, above they are both at
> ffff880009093548.  How can the two rb trees have the same rb_root?

My oversight, sorry. Fixed.
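
As a quick sanity check on the plain "runq" output quoted above, the small
sketch below computes how far each CFS RB_ROOT sits from its per-cpu rq. The
helper names embedded_offset() and member_address() are made up for this
example, and the only inputs are the addresses printed in this thread. In this
dump both CPUs give the same offset, 0x80, so an rb_root embedded in a per-cpu
rq has to print a different address for each CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of an embedded member, given container and member addresses. */
static uint64_t embedded_offset(uint64_t container, uint64_t member)
{
	assert(member >= container);
	return member - container;
}

/* Address of the same member inside another instance of the structure. */
static uint64_t member_address(uint64_t container, uint64_t offset)
{
	return container + offset;
}
```

For example, embedded_offset(0xffff8800090436c0, 0xffff880009043740) and
embedded_offset(0xffff88000905b6c0, 0xffff88000905b740) both yield 0x80.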

Thanks
Zhang
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0001-add-g-option-for-runq-v5.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20121109/85dc6ffb/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: 0002-add-help-info-for-runq-g-v2.patch
URL: <http://listman.redhat.com/archives/crash-utility/attachments/20121109/85dc6ffb/attachment-0001.ksh>