[Crash-utility] CFS runqueue list handling improvements

Wed Feb 1 19:39:52 UTC 2012

----- Original Message -----
> I was using a crash dump last week where the CFS runqueue for one CPU
> had a loop in it. It was caused by a red/black tree node that was the
> left child of one node at the same time as being the right child of a
> different node. The doubly-linked node has only one parent, one of the
> nodes that has it as a child here). Call the node with two links to it
> X, the node that is the stored parent of X call Y and the other node
> that links to X call Z. The parent of Y is Z here but X is the right
> node of Z (left of Y). So, you reach Z, do all it's left branch then go
> right to reach X. Go up to Z from the left not Y from the right and do
> all the right of Z. Go up to Y and your last node isn't the left or
> right child so you go do the whole left tree of Y again, etc.
> 
> In that case, the runq command (task.c) in crash never bails out of
> dump_CFS_runqueues() because the for loop that goes from rb_first()
> through all of rb_next() has no means of detecting the problem, although
> you can use the runq command with -d and only get the task list (which
> wasn't corrupted). The runq command alone then becomes very annoying to
> use when diagnosing the actual problem.
> 
> It occurs to me that this could be improved with one or more cases to
> detect as additional exit conditions with warning messages for that
> loop:
> 
> * Number of iterations exceeds cfs_rq_nr_running (perhaps allowing it to
> be exceeded by some small amount to have a chance of seeing the nature
> of a problem with it, perhaps with a runq command line switch to allow
> ignoring any problems)
> 
> * At the node making the cfs_rq_nr_running iteration, note the node address,
> then exit with a warning if that node is ever displayed again,
> thus showing any loop that starts within the first cfs_rq_nr_running nodes.
> 
> * A more complex loop detection that handles cases where
> cfs_rq_nr_running is significantly lower than the number of actual nodes
> in the valid part of the tree.
> 
> * I suppose if you still have the last node processed and you are going
> up then that last node should always be the left or right child of the
> parent you are reaching or the tree is broken.
> 
> I understand that cfs_rq_nr_running could contain the "corruption" so
> assuming it indicates the correct number of times to call rb_next()
> isn't correct either, so my preference is for some form of loop
> detection. Before I work on a patch, are there any opinions on the
> behavior of the runq command in the case of a corrupt CFS runqueue, e.g.
> one that contains a loop?
> 
> --
> David Mair
> SUSE Linux

Given that it's an abnormal condition, it's not worth going overboard
trying to account for something that should never happen.

I would encapsulate the call to dump_tasks_in_cfs_rq() with
hq_open() and hq_close() calls.  And when dump_tasks_in_cfs_rq()
is about to call dump_task_runq_entry(), first call hq_enter() with 
the tc pointer.  If it's been seen before, it will return FALSE -- and
you can print an error message and return back to dump_tasks_in_cfs_rq().

Dave