[Crash-utility] [PATCH] runq: make tasks in throttled cfs_rqs/rt_rqs displayed

Dave Anderson anderson at redhat.com
Wed Nov 7 19:15:14 UTC 2012



----- Original Message -----
> 
> ok. I rewrite the patch and they are tested ok in my box.
> 
> Thanks
> Zhang

My tests weren't so successful this time, and I also have some questions
about the runq -g output.

I tested your latest patches on a sample set of 70 dumpfiles whose
kernels all use CFS runqueues.  In 7 of the 70 "runq -g" tests,
the command caused the crash session to fail like so:

 crash> runq -g
 CPU 0
   CURRENT: PID: 0      TASK: ffffffff81c0d020  COMMAND: "swapper/0"
   RT PRIO_ARRAY: ffff8802277d4590
      [no tasks queued]
   CFS RB_ROOT: ffff880227018070
      [120] PID: 3      TASK: ffff8802221a2670  COMMAND: "ksoftirqd/0"
 
 CPU 1
   CURRENT: PID: 0      TASK: ffff8802221e2670  COMMAND: "swapper/1"
   RT PRIO_ARRAY: ffff880426fd4590
      [no tasks queued]
   CFS RB_ROOT: ffff880227018070
      [no tasks queued]
 
 CPU 2
   CURRENT: PID: 0      TASK: ffff880222220000  COMMAND: "swapper/2"
   RT PRIO_ARRAY: ffff8802279d4590
      [no tasks queued]
   CFS RB_ROOT: ffff880227018070
      [no tasks queued]
 
 CPU 3
   CURRENT: PID: 11615  TASK: ffff88020c50a670  COMMAND: "runtest.sh"
   RT PRIO_ARRAY: ffff8804271d4590
      [no tasks queued]
   CFS RB_ROOT: ffff880227018070
      GROUP CFS RB_ROOT: ffff88041e0d2790
         [no tasks queued]
 
 CPU 4
   CURRENT: PID: 0      TASK: ffff880222250000  COMMAND: "swapper/4"
   RT PRIO_ARRAY: ffff880227bd4590
      [no tasks queued]
   CFS RB_ROOT: ffff880227018070
      [no tasks queued]
 
 CPU 5
   CURRENT: PID: 0      TASK: ffff880222252670  COMMAND: "swapper/5"
   RT PRIO_ARRAY: ffff8804273d4590
      [no tasks queued]
   CFS RB_ROOT: ffff880227018070
      [no tasks queued]
 
 CPU 6
   CURRENT: PID: 1055   TASK: ffff88020c620000  COMMAND: "rsyslogd"
   RT PRIO_ARRAY: ffff880227dd4590
      [no tasks queued]
   CFS RB_ROOT: ffff880227018070
      GROUP CFS RB_ROOT: ffff88021df3a790 <system>
         GROUP CFS RB_ROOT: ffff88020d556c78 <rsyslog.service>
            [no tasks queued]
 
 CPU 7
   CURRENT: PID: 33     TASK: ffff880421d48000  COMMAND: "migration/7"
   RT PRIO_ARRAY: ffff8804275d4590
      [no tasks queued]
   CFS RB_ROOT: ffff880227018070
      [120] PID: 10653  TASK: ffff88041ead0000  COMMAND: "kworker/7:2"
 *** glibc detected *** /var/CVS/crash-6.1.0/crash: free(): invalid pointer: 0x0000000004e2c210 ***
 ======= Backtrace: =========
 /lib64/libc.so.6[0x33ff87bfee]
 /var/CVS/crash-6.1.0/crash(freebuf+0x29a)[0x46f327]
 /var/CVS/crash-6.1.0/crash[0x4cae70]
 /var/CVS/crash-6.1.0/crash[0x4ce4b2]
 /var/CVS/crash-6.1.0/crash(cmd_runq+0xfb)[0x4c9b5b]
 /var/CVS/crash-6.1.0/crash(exec_command+0x20a)[0x45db7f]
 /var/CVS/crash-6.1.0/crash(main_loop+0x281)[0x45d973]
 /var/CVS/crash-6.1.0/crash[0x5ad159]
 /var/CVS/crash-6.1.0/crash(catch_errors+0x5e)[0x5ac59e]
 /var/CVS/crash-6.1.0/crash[0x5adde6]
 /var/CVS/crash-6.1.0/crash(catch_errors+0x5e)[0x5ac59e]
 /var/CVS/crash-6.1.0/crash(gdb_main+0x24)[0x5ae0b4]
 /var/CVS/crash-6.1.0/crash(gdb_main_entry+0x2e)[0x5ae0ee]
 /var/CVS/crash-6.1.0/crash(gdb_main_loop+0x263)[0x4ed248]
 /var/CVS/crash-6.1.0/crash(main+0x268d)[0x45d6de]
 /lib64/libc.so.6(__libc_start_main+0xf5)[0x33ff821735]
 /var/CVS/crash-6.1.0/crash[0x45af5d]
 ======= Memory map: ========
 00400000-00a62000 r-xp 00000000 fd:01 3021564                            /var/CVS/crash-6.1.0/crash
 00c61000-00c85000 rw-p 00661000 fd:01 3021564                            /var/CVS/crash-6.1.0/crash
 00c85000-00e66000 rw-p 00000000 00:00 0 
 017ae000-05b7b000 rw-p 00000000 00:00 0                                  [heap]
 33ff400000-33ff420000 r-xp 00000000 fd:01 1869596                        /usr/lib64/ld-2.15.so
 33ff61f000-33ff620000 r--p 0001f000 fd:01 1869596                        /usr/lib64/ld-2.15.so
 33ff620000-33ff621000 rw-p 00020000 fd:01 1869596                        /usr/lib64/ld-2.15.so
 33ff621000-33ff622000 rw-p 00000000 00:00 0 
 33ff800000-33ff9ac000 r-xp 00000000 fd:01 1869597                        /usr/lib64/libc-2.15.so
 33ff9ac000-33ffbac000 ---p 001ac000 fd:01 1869597                        /usr/lib64/libc-2.15.so
 33ffbac000-33ffbb0000 r--p 001ac000 fd:01 1869597                        /usr/lib64/libc-2.15.so
 33ffbb0000-33ffbb2000 rw-p 001b0000 fd:01 1869597                        /usr/lib64/libc-2.15.so
 33ffbb2000-33ffbb7000 rw-p 00000000 00:00 0 
 3400000000-3400003000 r-xp 00000000 fd:01 1838776                        /usr/lib64/libdl-2.15.so
 3400003000-3400202000 ---p 00003000 fd:01 1838776                        /usr/lib64/libdl-2.15.so
 3400202000-3400203000 r--p 00002000 fd:01 1838776                        /usr/lib64/libdl-2.15.so
 3400203000-3400204000 rw-p 00003000 fd:01 1838776                        /usr/lib64/libdl-2.15.so
 3400800000-3400817000 r-xp 00000000 fd:01 1869608                        /usr/lib64/libz.so.1.2.5
 3400817000-3400a16000 ---p 00017000 fd:01 1869608                        /usr/lib64/libz.so.1.2.5
 3400a16000-3400a17000 rw-p 00016000 fd:01 1869608                        /usr/lib64/libz.so.1.2.5
 3400c00000-3400cfa000 r-xp 00000000 fd:01 1869598                        /usr/lib64/libm-2.15.so
 3400cfa000-3400ef9000 ---p 000fa000 fd:01 1869598                        /usr/lib64/libm-2.15.so
 3400ef9000-3400efa000 r--p 000f9000 fd:01 1869598                        /usr/lib64/libm-2.15.so
 3400efa000-3400efb000 rw-p 000fa000 fd:01 1869598                        /usr/lib64/libm-2.15.so
 340f800000-340f825000 r-xp 00000000 fd:01 1869641                        /usr/lib64/libtinfo.so.5.9
 340f825000-340fa24000 ---p 00025000 fd:01 1869641                        /usr/lib64/libtinfo.so.5.9
 340fa24000-340fa28000 r--p 00024000 fd:01 1869641                        /usr/lib64/libtinfo.so.5.9
 340fa28000-340fa29000 rw-p 00028000 fd:01 1869641                        /usr/lib64/libtinfo.so.5.9
 3414800000-3414823000 r-xp 00000000 fd:01 1869642                        /usr/lib64/libncurses.so.5.9
 3414823000-3414a22000 ---p 00023000 fd:01 1869642                        /usr/lib64/libncurses.so.5.9
 3414a22000-3414a23000 r--p 00022000 fd:01 1869642                        /usr/lib64/libncurses.so.5.9
 3414a23000-3414a24000 rw-p 00023000 fd:01 1869642                        /usr/lib64/libncurses.so.5.9
 3ef8c00000-3ef8c15000 r-xp 00000000 fd:01 1842976                        /usr/lib64/libgcc_s-4.7.2-20120921.so.1
 3ef8c15000-3ef8e14000 ---p 00015000 fd:01 1842976                        /usr/lib64/libgcc_s-4.7.2-20120921.so.1
 3ef8e14000-3ef8e15000 rw-p 00014000 fd:01 1842976                        /usr/lib64/libgcc_s-4.7.2-20120921.so.1
 7fa098c20000-7fa099ce4000 r--p 0705e000 fd:01 3014793                    /var/tmp/vmlinux.gz_pnPD6v (deleted)
 7fa099ce4000-7fa09a9a5000 rw-p 00000000 00:00 0 
 7fa09b337000-7fa09b50c000 r--p 06e8a000 fd:01 3014793                    /var/tmp/vmlinux.gz_pnPD6v (deleted)
 7fa09b50c000-7fa09b9fe000 r--p 06814000 fd:01 3014793                    /var/tmp/vmlinux.gz_pnPD6v (deleted)
 7fa09b9fe000-7fa09bd46000 r--p 08121000 fd:01 3014793                    /var/tmp/vmlinux.gz_pnPD6v (deleted)
 7fa09bd46000-7fa09bf93000 r--p 065c8000 fd:01 3014793                    /var/tmp/vmlinux.gz_pnPD6v (deleted)
 7fa09bf93000-7fa0a100c000 r--p 01550000 fd:01 3014793                    /var/tmp/vmlinux.gz_pnPD6v (deleted)
 7fa0a100c000-7fa0a1298000 r--p 08468000 fd:01 3014793                    /var/tmp/vmlinux.gz_pnPD6v (deleted)
 7fa0a1298000-7fa0a25d4000 rw-p 00000000 00:00 0 
 7fa0a26f8000-7fa0a29d1000 rw-p 00000000 00:00 0 
 7fa0a2a63000-7fa0a8e90000 r--p 00000000 fd:01 1869461                    /usr/lib/locale/locale-archive
 7fa0a8e90000-7fa0a90af000 rw-p 00000000 00:00 0 
 7fa0a90bd000-7fa0a90be000 rw-p 00000000 00:00 0 
 7fa0a90be000-7fa0a90c5000 r--s 00000000 fd:01 1968224                    /usr/lib64/gconv/gconv-modules.cache
 7fa0a90c5000-7fa0a90cb000 rw-p 00000000 00:00 0 
 7fff15db3000-7fff15dd4000 rw-p 00000000 00:00 0                          [stack]
 7fff15dff000-7fff15e00000 r-xp 00000000 00:00 0                          [vdso]
 ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
 Aborted (core dumped)
 $
 
Here's the gdb trace of the core above:
 
 $ gdb ./crash core.19908 
 GNU gdb (GDB) Fedora (7.4.50.20120120-52.fc17)
 Copyright (C) 2012 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
 and "show warranty" for details.
 This GDB was configured as "x86_64-redhat-linux-gnu".
 For bug reporting instructions, please see:
 <http://www.gnu.org/software/gdb/bugs/>...
 Reading symbols from /var/CVS/crash-6.1.0/crash...done.
 [New LWP 19908]
 Core was generated by `./crash vmcore vmlinux.gz'.
 Program terminated with signal 6, Aborted.
 #0  0x00000033ff835925 in raise () from /lib64/libc.so.6
 Missing separate debuginfos, use: debuginfo-install glibc-2.15-57.fc17.x86_64 libgcc-4.7.2-2.fc17.x86_64 ncurses-libs-5.9-4.20120204.fc17.x86_64 zlib-1.2.5-7.fc17.x86_64
 (gdb) bt
 #0  0x00000033ff835925 in raise () from /lib64/libc.so.6
 #1  0x00000033ff8370d8 in abort () from /lib64/libc.so.6
 #2  0x00000033ff874e6b in __libc_message () from /lib64/libc.so.6
 #3  0x00000033ff87bfee in _int_free () from /lib64/libc.so.6
 #4  0x000000000046f327 in freebuf (addr=0x3f234a0 "systemd-stdout-syslog-bridge.service") at tools.c:4710
 #5  0x00000000004cae70 in free_task_group_info_array () at task.c:7578
 #6  0x00000000004ce4b2 in dump_tasks_by_task_group () at task.c:8462
 #7  0x00000000004c9b5b in cmd_runq () at task.c:7158
 #8  0x000000000045db7f in exec_command () at main.c:764
 #9  0x000000000045d973 in main_loop () at main.c:712
 #10 0x00000000005ad159 in captured_command_loop (data=data at entry=0x0) at ./main.c:228
 #11 0x00000000005ac59e in catch_errors (func=func at entry=0x5ad150 <captured_command_loop>, 
     func_args=func_args at entry=0x0, errstring=errstring at entry=0x8caaeb "", mask=mask at entry=6) at exceptions.c:531
 #12 0x00000000005adde6 in captured_main (data=data at entry=0x7fff3c694000) at ./main.c:958
 #13 0x00000000005ac59e in catch_errors (func=func at entry=0x5ad230 <captured_main>, 
     func_args=func_args at entry=0x7fff3c694000, errstring=errstring at entry=0x8caaeb "", mask=mask at entry=6)
     at exceptions.c:531
 #14 0x00000000005ae0b4 in gdb_main (args=args at entry=0x7fff3c694000) at ./main.c:973
 #15 0x00000000005ae0ee in gdb_main_entry (argc=<optimized out>, argv=<optimized out>) at ./main.c:993
 #16 0x00000000004ed248 in gdb_main_loop (argc=3, argv=0x7fff3c694188) at gdb_interface.c:76
 #17 0x000000000045d6de in main (argc=3, argv=0x7fff3c694188) at main.c:617
 (gdb) q

In a quick debugging session of your free_task_group_info_array()
I printed out the addresses being FREEBUF()'d, and I noted that 
there were numerous instances of the same address being free twice:

 static void
 free_task_group_info_array(void)
 {
         int i;
 
         for (i = 0; i < tgi_p; i++) {
                 if (tgi_array[i]->name)
                         FREEBUF(tgi_array[i]->name);
                 FREEBUF(tgi_array[i]);
         }
         tgi_p = 0;
         FREEBUF(tgi_array);
 }
 
I put one of the failing vmlinux/vmcore pairs here for you
to debug:
  
  http://people.redhat.com/anderson/zhangyanfei


Secondly, another question I have is the meaning of the command's output.

First, consider this "runq" output:

 crash> runq
 CPU 0 RUNQUEUE: ffff8800090436c0
   CURRENT: PID: 588    TASK: ffff88007e4877a0  COMMAND: "udevd"
   RT PRIO_ARRAY: ffff8800090437c8
      [no tasks queued]
   CFS RB_ROOT: ffff880009043740
      [118] PID: 2110   TASK: ffff88007d470860  COMMAND: "check-cdrom.sh"
      [118] PID: 2109   TASK: ffff88007f1247a0  COMMAND: "check-cdrom.sh"
      [118] PID: 2114   TASK: ffff88007f20e080  COMMAND: "udevd"
 
 CPU 1 RUNQUEUE: ffff88000905b6c0
   CURRENT: PID: 2113   TASK: ffff88007e8ac140  COMMAND: "udevd"
   RT PRIO_ARRAY: ffff88000905b7c8
      [no tasks queued]
   CFS RB_ROOT: ffff88000905b740
      [118] PID: 2092   TASK: ffff88007d7a4760  COMMAND: "MAKEDEV"
      [118] PID: 1983   TASK: ffff88007e59f140  COMMAND: "udevd"
      [118] PID: 2064   TASK: ffff88007e40f7a0  COMMAND: "udevd"
      [115] PID: 2111   TASK: ffff88007e4278a0  COMMAND: "kthreadd"
 crash>

In the above case, the per-cpu "rq" structure addresses are shown as:

 CPU 0 RUNQUEUE: ffff8800090436c0
 CPU 1 RUNQUEUE: ffff88000905b6c0

And embedded in each of the rq structures above are these two rb_root
structures:

   CFS RB_ROOT: ffff880009043740  (embedded in rq @ffff8800090436c0)
   CFS RB_ROOT: ffff88000905b740  (embedded in rq @ffff88000905b6c0)

And starting at those rb_root structures, the tree of tasks are dumped.

Now, your "runq -q" option doesn't show any "starting point" structure
address, but rather they just show "CPU 0" and "CPU 1":
 
 crash> runq -g
 CPU 0
   CURRENT: PID: 588    TASK: ffff88007e4877a0  COMMAND: "udevd"
   RT PRIO_ARRAY: ffff8800090437c8
      [no tasks queued]
   CFS RB_ROOT: ffff880009093548
      [118] PID: 2110   TASK: ffff88007d470860  COMMAND: "check-cdrom.sh"
      [118] PID: 2109   TASK: ffff88007f1247a0  COMMAND: "check-cdrom.sh"
      [118] PID: 2114   TASK: ffff88007f20e080  COMMAND: "udevd"
 
 CPU 1
   CURRENT: PID: 2113   TASK: ffff88007e8ac140  COMMAND: "udevd"
   RT PRIO_ARRAY: ffff88000905b7c8
      [no tasks queued]
   CFS RB_ROOT: ffff880009093548
      [118] PID: 2092   TASK: ffff88007d7a4760  COMMAND: "MAKEDEV"
      [118] PID: 1983   TASK: ffff88007e59f140  COMMAND: "udevd"
      [118] PID: 2064   TASK: ffff88007e40f7a0  COMMAND: "udevd"
      [115] PID: 2111   TASK: ffff88007e4278a0  COMMAND: "kthreadd"
 crash> 
 
I would think that there might be a useful address of a per-cpu 
structure that could be shown there as well?

And secondly, I'm confused as to why the "CFS RB_ROOT" address for
all cpus is the same address -- for example, above they are both at
ffff880009093548.  How can the two rb trees have the same rb_root?

Dave




More information about the Crash-utility mailing list