[Crash-utility] crash CPU bound waiting for user response
Dave Anderson
anderson at redhat.com
Thu Jul 5 13:48:34 UTC 2007
D. Hugh Redelmeier wrote:
> | From: Dave Anderson <anderson at redhat.com>
>
> | D. Hugh Redelmeier wrote:
>
> | > ==> Worse: while it is awaiting my RETURN, it is burning 100% of the CPU!
> | >
> | > Here is what "ps laxgwf" says about the crash process and its child.
> | >
> | > F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
> | > 4 0 4426 4406 25 0 416812 332764 - R+ pts/5 80:36
> | > | | \_ crash --readnow
> | > /usr/lib/debug/lib/modules/2.6.21-1.3228.fc7/vmlinux
> | > /var/crash/2007-07-02-13:42/vmcore
> | > 0 0 4989 4426 18 0 73976 740 - S+ pts/5 0:00
> | > | | \_ /usr/bin/less -E -X -Ps -- MORE --
> | > forward\: <SPACE>, <ENTER> or j backward\: b or k quit\: q
> | >
> | > strace of the crash process shows an infinite sequence of:
> | > wait4(4989, 0x7fffcd9cae90, WNOHANG, NULL) = 0
> | > wait4(4989, 0x7fffcd9cae90, WNOHANG, NULL) = 0
> | > wait4(4989, 0x7fffcd9cae90, WNOHANG, NULL) = 0
> | > wait4(4989, 0x7fffcd9cae90, WNOHANG, NULL) = 0
> | >
> | > This is very wasteful.
> | >
> | > There are other ways to get into this state. Other places less is
> | > being used and is waiting. Probably wherever less is used even if it
> | > isn't waiting.
> | >
> | > I just tested: this problem exists when using a normal xterm.
> |
Again, what exactly do you do to reproduce it? I still cannot get crash to
burn 100% of the CPU while waiting on the "less" sub-shell.
> | Yeah, I have seen this on occasions, but I have never been able
> | to reproduce it on demand. There was a patch suggestion a while ago,
> | but I deferred it until I could reliably reproduce it for testing
> | before taking it in.
>
> I've put gdb on the case. The CPU burning that I'm currently experiencing is
> in cmdline.c:restore_sanity. The actual code in question is:
> while (!waitpid(pc->stdpipe_pid, &waitstatus, WNOHANG))
> ;
> That sure looks like a busy-wait.
>
> If you execute this code, you should get a busy-wait too.
>
> If you replaced WNOHANG with 0, I think that the wait would have the
> same result but not be busy. You would then want to loop in the case
> where waitpid returns a -1 with errno == EINTR.
>
> Here's what I'd try (UNTESTED!):
> do ; while (waitpid(pc->stdpipe_pid, &waitstatus, 0) == -1 && errno == EINTR);
>
> All the uses of WNOHANG in that function look suspicious.
I understand. I also remember that the WNOHANG's were originally added
there on purpose because of hangs I was seeing. But that's not to say
it's the best way of doing things.
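For reference, the blocking variant suggested above can be written out as a
standalone helper. This is only a minimal sketch, not code from crash:
wait_for_child is a hypothetical name, and it assumes the caller only needs
to know when the child has exited. It has not been tested against the hangs
that originally motivated WNOHANG.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not part of crash): block until child `pid`
 * terminates, retrying if a signal interrupts the call.
 * Returns pid on success, or -1 with errno set on any other error.
 */
static pid_t wait_for_child(pid_t pid, int *status)
{
    pid_t ret;

    do {
        /* options == 0: sleep in the kernel instead of polling */
        ret = waitpid(pid, status, 0);
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

Unlike the WNOHANG loop, this sleeps until the child changes state, so it
uses no CPU while "less" is waiting for input. The trade-off is that it
blocks indefinitely if the child never exits.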
As I mentioned before, a patch was posted by someone (as I recall, someone
who preferred using gdb and gdb scripts with kdump vmcores), but after going
back a year and a half into the archives, I can't find it.
Anyway, I'm going to have to be able to reproduce it and test any
changes thoroughly before potentially re-introducing the hangs I
used to see.
Thanks,
Dave