Problem with SMP and isolcpus

Samad, Alex alexander.samad at hp.com
Wed Jul 27 02:26:26 UTC 2005


Hi

I seem to have a problem with one of the CPUs on an ES45. CPU 2 seems to
be dying: I have had 3 lockups in 2 days.

Jul 26 12:26:23 keyzervega kernel: smp_call_function_on_cpu: initial timeout -- trying long wait
Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck in nifd at fffffc00012c65f0(3) owner hald-addon-stor at fffffc00012c65f0(0) lib/kernel_lock.c:229
Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck in automount at fffffc00012c65f0(1) owner hald-addon-stor at fffffc00012c65f0(0) lib/kernel_lock.c:229
Jul 26 12:26:53 keyzervega kernel: Kernel bug at arch/alpha/kernel/smp.c:858
Jul 26 12:26:53 keyzervega kernel: CPU 0 hald-addon-stor(1801): Kernel Bug 1
Jul 26 12:26:53 keyzervega kernel: pc = [<fffffc000101c4ac>]  ra = [<fffffc000101c404>]  ps = 0000    Not tainted
Jul 26 12:26:53 keyzervega kernel: pc is at smp_call_function_on_cpu+0x220/0x264,  ra is at smp_call_function_on_cpu+0x178/0x264
Jul 26 12:26:53 keyzervega kernel: v0 = 0000000000000041  t0 = 0000000000000001  t1 = 0000000000000001
Jul 26 12:26:53 keyzervega kernel: t2 = 0000000100728747  t3 = fffffc0008bbd108  t4 = 000000003b5f2d38
Jul 26 12:26:53 keyzervega kernel: t5 = 0000000000000089  t6 = fffffc03fe78d640  t7 = fffffc03f4118000
Jul 26 12:26:53 keyzervega kernel: a0 = 0000000000000000  a1 = 0000000000000000  a2 = 0000000000000001
Jul 26 12:26:53 keyzervega kernel: a3 = 0000000000000000  a4 = fffffc00012c6038  a5 = 0000000000000000
Jul 26 12:26:53 keyzervega kernel: t8 = 0000000000000200  t9 = 0000000000000020  t10= 0000000000000000
Jul 26 12:26:53 keyzervega kernel: t11= 0000000000000001  pv = fffffc000101ca78  at = 0000000000000000
Jul 26 12:26:53 keyzervega kernel: gp = fffffc00018b2d00  sp = fffffc03f411bde8
Jul 26 12:26:53 keyzervega kernel: Trace:
Jul 26 12:26:53 keyzervega kernel: [<fffffc000108ad04>] invalidate_bdev+0x3c/0x84
Jul 26 12:26:53 keyzervega kernel: [<fffffc000108ba9c>] invalidate_bh_lru+0x0/0x74
Jul 26 12:26:53 keyzervega kernel: [<fffffc000108ba9c>] invalidate_bh_lru+0x0/0x74
Jul 26 12:26:53 keyzervega kernel: [<fffffc0001093098>] kill_bdev+0x24/0x58
Jul 26 12:26:53 keyzervega kernel: [<fffffc0001094020>] blkdev_put+0xa8/0x26c
Jul 26 12:26:53 keyzervega kernel: [<fffffc00010898d8>] __fput+0x80/0x1bc
Jul 26 12:26:53 keyzervega kernel: [<fffffc0001087f64>] filp_close+0xb0/0xd4
Jul 26 12:26:53 keyzervega kernel: [<fffffc000108806c>] sys_close+0xe4/0x114
Jul 26 12:26:53 keyzervega kernel: [<fffffc0001010ff4>] entSys+0xa4/0xc0



I have looked through the logs and have not seen any messages from CPU 2,
so I am presuming that it is the CPU that is dying.
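
The check described above (which CPUs show up in the log and which stay silent) could be automated roughly as follows. This is only a sketch: it assumes the number in parentheses after each address in the "spinlock stuck" lines is a CPU id, that the log lines have been unwrapped to one message per line, and that the machine has 4 CPUs. The function names `cpus_seen` and `silent_cpus` are made up for illustration. A silent CPU is at best a hint, not proof of a hardware fault.

```python
import re

# Assumption: "at <hex address>(<n>)" in the spinlock messages carries a CPU id.
CPU_IN_PARENS = re.compile(r"\bat [0-9a-f]+\((\d+)\)")
# The bug report line is prefixed with "CPU <n>".
CPU_PREFIX = re.compile(r"kernel: CPU (\d+) ")

def cpus_seen(log_lines):
    """Collect every CPU id mentioned in the given (unwrapped) log lines."""
    seen = set()
    for line in log_lines:
        seen.update(int(n) for n in CPU_IN_PARENS.findall(line))
        seen.update(int(n) for n in CPU_PREFIX.findall(line))
    return seen

def silent_cpus(log_lines, cpu_count):
    """Return the CPU ids in 0..cpu_count-1 that never appear in the log."""
    return sorted(set(range(cpu_count)) - cpus_seen(log_lines))

# Three messages taken from the log above, joined back onto single lines.
SAMPLE = [
    "Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck "
    "in nifd at fffffc00012c65f0(3) owner hald-addon-stor at "
    "fffffc00012c65f0(0) lib/kernel_lock.c:229",
    "Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck "
    "in automount at fffffc00012c65f0(1) owner hald-addon-stor at "
    "fffffc00012c65f0(0) lib/kernel_lock.c:229",
    "Jul 26 12:26:53 keyzervega kernel: CPU 0 hald-addon-stor(1801): Kernel Bug 1",
]

if __name__ == "__main__":
    print("CPUs seen:", sorted(cpus_seen(SAMPLE)))
    print("Silent CPUs:", silent_cpus(SAMPLE, 4))
```

On these three messages, CPUs 0, 1 and 3 are mentioned and CPU 2 is not, which matches the observation above.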

I thought I would isolate CPU 2 from the scheduler, but when I put
isolcpus=2 on the kernel command line it doesn't seem to make any
difference to the scheduler: the affinity mask for all the processes is
still f, and less /var/log/dmesg still shows that it is using 4 CPUs!
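
To make the mask reading concrete, here is a small sketch that decodes a hex affinity mask into CPU numbers and pulls the isolcpus= value out of a kernel command line string. The helper names `mask_to_cpus` and `isolcpus_from_cmdline` are hypothetical, and the parser assumes isolcpus= holds only plain CPU numbers and ranges. Whether a given kernel actually removes isolated CPUs from the process affinity masks is not something this sketch can tell you.

```python
import re

def mask_to_cpus(mask_hex):
    """Return the CPU indices whose bits are set in a hex affinity mask."""
    # Some tools print long masks in comma-separated groups; drop the commas.
    mask = int(mask_hex.replace(",", ""), 16)
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]

def isolcpus_from_cmdline(cmdline):
    """Return the CPUs listed in isolcpus= (plain numbers and ranges only)."""
    match = re.search(r"(?:^|\s)isolcpus=(\S+)", cmdline)
    if match is None:
        return []
    cpus = set()
    for part in match.group(1).split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

if __name__ == "__main__":
    # Hypothetical command line; on a live system, pass the text of /proc/cmdline.
    print(isolcpus_from_cmdline("ro root=/dev/sda2 isolcpus=2"))
    print(mask_to_cpus("f"))  # all four CPUs
    print(mask_to_cpus("b"))  # the mask that leaves out CPU 2
```

A mask of f has bits 0 to 3 set, so it covers all four CPUs; a mask that left out CPU 2 would read b.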

I would prefer to do it in Linux so that I can test the CPU, rather than
mask it out in SRM, which it looks like I am going to have to do.

Is this a known issue, and is there a fix? If not, where can I log a bug?
Where is the bug tracker for it?

Alex

