problem with smp and isolcpus

Samad, Alex alexander.samad at hp.com
Fri Jul 29 00:43:46 UTC 2005


Hi

I have tried 2 things

1) Stopped the hal daemon from running - but this hasn't helped 8( The
box is still crashing and dumping.

2) I have removed cpu #2 with set cpu_enabled b - this seems to have the
best effect, in that the box is more stable; linux only sees 3 cpus and
not four.

I am guessing that the problem is not with the CPU barfing. Looking at
the code in smp_call_function_on_cpu in smp.c, it looks like it is trying
to talk to the other cpus and one of them is failing and timing out,
which is causing this problem - hence why I can remove cpu2 from within
srm and it is stable.

This brings me to my original question: why isn't isolcpus working when I
boot with isolcpus=2? I thought it isolated cpu 2 from the scheduler and
thus removed any chance of it running any tasks, threads etc. Is this
as good as removing it from within srm, or is there a chance that
interrupts might still run on there?
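
One way to look at the interrupt question is to read /proc/interrupts,
which lists a count per cpu for each interrupt line. The following is
only a rough Python sketch and is not from the thread itself: the name
per_cpu_interrupt_totals and the SAMPLE text are made up for
illustration, and the exact column layout on Alpha may differ from the
sample. A total for cpu 2 that keeps growing between two runs would
suggest interrupts are still being delivered there.

```python
def per_cpu_interrupt_totals(text):
    """Sum interrupt counts per cpu from /proc/interrupts-style text.

    Assumes the first line is a header of cpu names (CPU0 CPU1 ...) and
    each later line is "<label>: <count per cpu> ... <description>".
    """
    lines = text.splitlines()
    if not lines:
        return {}
    cpus = lines[0].split()
    totals = {name: 0 for name in cpus}
    for line in lines[1:]:
        _label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt line
        for name, field in zip(cpus, rest.split()):
            if not field.isdigit():
                break  # reached the description columns
            totals[name] += int(field)
    return totals


# Hypothetical sample, not real output from the machine in this thread.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
  1:        10          0          5          0   XT-PIC  example-a
  2:         7          3          0          1   XT-PIC  example-b
ERR:         0
"""

if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as handle:
            print(per_cpu_interrupt_totals(handle.read()))
    except OSError:
        # Not on Linux, or /proc is unavailable: fall back to the sample.
        print(per_cpu_interrupt_totals(SAMPLE))
```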

I checked this with taskset and ran it on the current pids; all had masks
of f - it seems like the srm environment overrides the isolcpus option.
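
The masks above are hexadecimal bitmaps with one bit per cpu, so they can
be decoded by hand or with a few lines of Python. This is only a sketch
of the arithmetic; the helper names cpus_in_mask and mask_excluding are
made up here. It shows that f covers cpus 0-3, while b (the value given
to set cpu_enabled) leaves out cpu 2. Whether isolcpus is supposed to
change the reported mask on this kernel is not something the sketch
settles.

```python
def cpus_in_mask(hex_mask):
    """Return the cpu numbers whose bits are set in a hex affinity mask."""
    value = int(hex_mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def mask_excluding(ncpus, cpu):
    """Return the hex mask covering ncpus cpus with one cpu cleared."""
    all_cpus = (1 << ncpus) - 1
    return format(all_cpus & ~(1 << cpu), "x")


if __name__ == "__main__":
    print(cpus_in_mask("f"))     # all four cpus
    print(cpus_in_mask("b"))     # cpu 2 missing
    print(mask_excluding(4, 2))  # mask for four cpus without cpu 2
```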

alex


> -----Original Message-----
> From: Estabrook, Jay
> Sent: Thursday, 28 July 2005 12:44 AM
> To: Samad, Alex
> Cc: Linux on Alpha processors; debian-alpha at lists.debian.org
> Subject: Re: problem with smp and isolcpus
> 
> On Wed, Jul 27, 2005 at 12:26:26PM +1000, Samad, Alex wrote:
> >
> > Seem to have a problem with one of my cpus on a ES45, cpu2 seems to be
> > dying, I have had 3 lockups in 2 days
> >
> > Jul 26 12:26:23 keyzervega kernel: smp_call_function_on_cpu: initial timeout -- trying long wait
> > Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck in nifd at fffffc00012c65f0(3) owner hald-addon-stor at fffffc00012c65f0(0) lib/kernel_lock.c:229
> > Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck in automount at fffffc00012c65f0(1) owner hald-addon-stor at fffffc00012c65f0(0) lib/kernel_lock.c:229
> > Jul 26 12:26:53 keyzervega kernel: Kernel bug at arch/alpha/kernel/smp.c:858
> > Jul 26 12:26:53 keyzervega kernel: CPU 0 hald-addon-stor(1801): Kernel Bug 1
> 
> From the above messages, it'd be more likely that CPU #0 was bad,
> because that was where the lock was being held for too long.
> 
> However, what is more likely, is that the HAL daemon has crashed.
> 
> I've seen a number of machine checks due to HAL daemon startup, and
> recommend that it NOT BE STARTED.
> 
> > Is this a known issue? Is there a resolution? If not, where can I log
> > a bug? Where is bug tracking for it?
> 
> The problems with the HAL daemon are known issues on Alpha.
> 
>  --Jay++
> 
> ---------------------------------------------------------------
> Jay A Estabrook                            HPTC - XC I & B
> Hewlett-Packard Company - ZKO1-3/D-B.8     (603) 884-0301
> 110 Spit Brook Road, Nashua NH 03062       Jay.Estabrook at hp.com
> ---------------------------------------------------------------




More information about the axp-list mailing list