rpms/kernel/devel linux-2.6-cond-resched-booting-fix.patch, NONE, 1.1 kernel-2.6.spec, 1.2021, 1.2022

fedora-cvs-commits at redhat.com
Mon Mar 6 22:14:39 UTC 2006


Author: davej

Update of /cvs/dist/rpms/kernel/devel
In directory cvs.devel.redhat.com:/tmp/cvs-serv1758

Modified Files:
	kernel-2.6.spec 
Added Files:
	linux-2.6-cond-resched-booting-fix.patch 
Log Message:
Don't do voluntary preempt until after bootup



linux-2.6-cond-resched-booting-fix.patch:
 sched.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletion(-)

--- NEW FILE linux-2.6-cond-resched-booting-fix.patch ---

Date:	Sun, 5 Mar 2006 21:00:17 -0800 (PST)
From:	Linus Torvalds <torvalds at osdl.org>
To:	Andrew Morton <akpm at osdl.org>
Subject: Re: Fw: Re: oops in choose_configuration()
cc:	Greg KH <greg at kroah.com>, Ingo Molnar <mingo at elte.hu>,
	Linux Kernel Mailing List <linux-kernel at vger.kernel.org>,
	Dave Jones <davej at redhat.com>

On Sun, 5 Mar 2006, Andrew Morton wrote:

> For several days I've been getting repeatable oopses in the -mm kernel. 
> They occur once per ~30 boots, during initscripts.

Actually, having thought about this some more, I wonder if the bug isn't a 
hell of a lot simpler than we've given it credit for.

I think you're running with CONFIG_PREEMPT_VOLUNTARY, right?

And looking more closely, that thing is BROKEN. DaveJ - do Fedora kernels 
also enable that thing?

Ingo: as far as I can see, CONFIG_PREEMPT_VOLUNTARY is totally and utterly 
broken during bootup. It does:

	# define might_resched() cond_resched()

and then we have

	# define might_sleep() do { might_resched(); } while (0)

but the fact is, we _know_ that "might_sleep()" is broken during early 
bootup. We know this, because when we have __might_sleep() enabled to 
warn about cases where we must not sleep, we've had those tests disabled 
during early boot for a long time, in order to avoid irritating and nasty 
known "sleeping function called from invalid context" messages:

	...
        if ((in_atomic() || irqs_disabled()) &&
            system_state == SYSTEM_RUNNING && !oops_in_progress) {
                if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
	...

Note in particular the "system_state == SYSTEM_RUNNING". It's there for a 
reason. Namely that we know that we do things that aren't valid during 
early bootup, and that we call functions that might sleep while we have 
interrupts disabled, for example.
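
As a rough illustration of the gate quoted above, here is a user-space
model in C. It is only a sketch: the struct and the would_warn() helper are
hypothetical names made up for this example, the enum is reduced to two
states, and the jiffies-based rate limiting is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced model of the kernel's system_state values (assumption:
 * only the two states relevant to this discussion are modelled). */
enum system_states { SYSTEM_BOOTING, SYSTEM_RUNNING };

/* Hypothetical snapshot of the conditions the debug check looks at. */
struct ctx {
    bool in_atomic;
    bool irqs_disabled;
    bool oops_in_progress;
    enum system_states system_state;
};

/* Hypothetical helper: true when the "sleeping function called from
 * invalid context" warning would fire. Mirrors the quoted condition,
 * minus the rate limiting. */
static bool would_warn(const struct ctx *c)
{
    assert(c != NULL);
    return (c->in_atomic || c->irqs_disabled) &&
           c->system_state == SYSTEM_RUNNING &&
           !c->oops_in_progress;
}
```

With interrupts disabled, would_warn() stays false while system_state is
SYSTEM_BOOTING and only becomes true once it is SYSTEM_RUNNING, which is
the blind spot described here.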

HOWEVER, the "cond_resched()" does not take that into account at all, and 
will happily conditionally reschedule things at early bootup before we 
have set system_state to SYSTEM_RUNNING.
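
The difference can be sketched with a small user-space model in C. This is
an assumption-laden toy, not kernel code: struct sched_model,
model_schedule() and the two cond_resched_*() functions are hypothetical
names, and "scheduling" is reduced to bumping a counter. The guarded
variant adds the same system_state test as the patch at the end of this
message; the order of the checks differs from the patch, which does not
change the outcome in this model.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced model of the kernel's system_state values. */
enum system_states { SYSTEM_BOOTING, SYSTEM_RUNNING };

/* Hypothetical stand-in for the scheduler state that matters here. */
struct sched_model {
    enum system_states system_state;
    bool need_resched;   /* models need_resched() returning true */
    int preempt_count;   /* non-zero means preemption is disabled */
    int schedule_calls;  /* how many times we "rescheduled" */
};

/* Toy schedule(): count the call and clear the resched request. */
static void model_schedule(struct sched_model *m)
{
    m->schedule_calls++;
    m->need_resched = false;
}

/* Behaviour described above: system_state is never consulted. */
static bool cond_resched_unguarded(struct sched_model *m)
{
    assert(m->preempt_count >= 0);
    if (!m->need_resched || m->preempt_count)
        return false;
    model_schedule(m);
    return true;
}

/* Same logic, but refuses to reschedule until boot has finished. */
static bool cond_resched_guarded(struct sched_model *m)
{
    if (m->system_state != SYSTEM_RUNNING)
        return false;
    return cond_resched_unguarded(m);
}
```

In this model, a pending reschedule during SYSTEM_BOOTING is acted on by
the unguarded variant and ignored by the guarded one until system_state
becomes SYSTEM_RUNNING.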

In other words, unless I've totally lost it, I think that 
CONFIG_PREEMPT_VOLUNTARY currently makes us re-schedule at points in the 
early boot that we _know_ are unsafe. We happen to not hit it very often, 
because (a) some of the time it doesn't matter and (b) when it matters, we 
seldom have "need_resched()" returning true, but I would not be at all 
surprised if Andrew's problems are because the scheduler heuristics make 
it happen when it shouldn't.

And the end result? I don't know. But we've traditionally run _all_ of the 
early boot ignoring the "might_sleep()" warnings, up until the point where 
we unlock the kernel lock, long after things like kmem_cache_init().

So I would not be surprised, for example, if we had kmem_cache_init() 
doing bad things because it got interrupts enabled at a point where it 
shouldn't, because it went through the scheduler. 

I dunno. I can't actually see what would corrupt anything, but the point 
is that we definitely do scheduling in places that have gotten absolutely 
_zero_ coverage, because we turned off the checks on purpose during early 
boot because the checks gave false positives.

And CONFIG_PREEMPT_VOLUNTARY turns those false positives into potential 
rescheduling events.

Maybe I'm crazy. But it looks really really broken to me.

Andrew, if I'm right, then this ugly patch should make a difference.

Is there something else I've missed?

			Linus

----
diff --git a/kernel/sched.c b/kernel/sched.c
index 12d291b..3454bb8 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4028,6 +4028,8 @@ static inline void __cond_resched(void)
 	 */
 	if (unlikely(preempt_count()))
 		return;
+	if (unlikely(system_state != SYSTEM_RUNNING))
+		return;
 	do {
 		add_preempt_count(PREEMPT_ACTIVE);
 		schedule();



Index: kernel-2.6.spec
===================================================================
RCS file: /cvs/dist/rpms/kernel/devel/kernel-2.6.spec,v
retrieving revision 1.2021
retrieving revision 1.2022
diff -u -r1.2021 -r1.2022
--- kernel-2.6.spec	6 Mar 2006 21:44:45 -0000	1.2021
+++ kernel-2.6.spec	6 Mar 2006 22:14:36 -0000	1.2022
@@ -373,6 +373,7 @@
 Patch1730: linux-2.6-signal-trampolines-unwind-info.patch
 Patch1740: linux-2.6-softlockup-disable.patch
 Patch1750: linux-2.6-drm-cripple-r300.patch
+Patch1760: linux-2.6-cond-resched-booting-fix.patch
 
 # SELinux/audit patches.
 Patch1800: linux-2.6-selinux-hush.patch
@@ -985,6 +986,8 @@
 %patch1740 -p1
 # Disable R300 and above DRI as it's unstable.
 %patch1750 -p1
+# Don't do voluntary preempt until after bootup
+%patch1760 -p1
 
 # Silence some selinux messages.
 %patch1800 -p1
@@ -1633,6 +1636,7 @@
 %changelog
 * Mon Mar  6 2006 Dave Jones <davej at redhat.com>
 - Disable DRI on Radeon R300 and above, due to instability. (#174646,#182196)
+- Don't do voluntary preempt until after bootup
 
 * Mon Mar  6 2006 Stephen Tweedie <sct at redhat.com>
 - Merge xen rebase with 1.2016 kernel



