
Re: 2.2.16 Alpha: "Fix SMP rescheduling with lock held"



> Greg Lindahl wrote:
> 
> > I have a bunch of DS20Es running 2.2.14-6.0smp (the kernel compiled by
> > RedHat for RH 6.2), and they have several spinlock stuck messages while
> > booting, and more systematically happen when doing certain things (loading a
> > particular module). What's more annoying is that the machines occasionally
> > freeze under heavy load, and when they do, they're periodically printing a
> > spinlock message on their console.
> 
> The "spinlock stuck" message is just a debugging aid AFAIK, albeit a possible
> hint for problems in design.  I routinely have AIM runs where I'll see these
> messages scroll on by the dozens, and performance is still quite good.  If
> you're really concerned, you could also check out the crash tools from Mission
> Critical (haven't tried them, but have used their counterparts on Tru64 and
> they're quite good).  If you're still concerned, maybe we can try to replicate
> what's happening here and I'll take a closer look.

I've recently been having lockups, too, with the 2.2.16 kernel,
accompanied by spinlock stuck messages.  I recompiled with egcs-1.1.2
instead of gcc-2.95.1 and made the changes below to the following
files.  They fix some 64-bit uncleanliness in printk/dprintk format
strings; both compilers warned about every instance.  Now I'm crossing
my fingers.

If anyone wants to send them to Alan Cox, go right ahead.

Brad Lucier

===================================================================
RCS file: RCS/nfs2xdr.c,v
retrieving revision 1.1
diff -c -r1.1 nfs2xdr.c
*** nfs2xdr.c	2000/06/09 21:19:39	1.1
--- nfs2xdr.c	2000/06/09 21:22:00
***************
*** 445,451 ****
  			printk(KERN_WARNING
  				"NFS: server %s, readdir reply truncated\n",
  				clnt->cl_server);
! 			printk(KERN_WARNING "NFS: nr=%d, slots=%d, len=%d\n",
  				nr, (end - p), len);
  			clnt->cl_flags |= NFS_CLNTF_BUFSIZE;
  			break;
--- 445,451 ----
  			printk(KERN_WARNING
  				"NFS: server %s, readdir reply truncated\n",
  				clnt->cl_server);
! 			printk(KERN_WARNING "NFS: nr=%d, slots=%ld, len=%d\n",
  				nr, (end - p), len);
  			clnt->cl_flags |= NFS_CLNTF_BUFSIZE;
  			break;
===================================================================
RCS file: RCS/xdr.c,v
retrieving revision 1.1
diff -c -r1.1 xdr.c
*** xdr.c	2000/06/09 21:23:00	1.1
--- xdr.c	2000/06/09 21:24:06
***************
*** 96,102 ****
  	if ((len = ntohl(*p++)) != sizeof(*f)) {
  		printk(KERN_NOTICE
  			"lockd: bad fhandle size %x (should be %d)\n",
! 			len, sizeof(*f));
  		return NULL;
  	}
  	memcpy(f, p, sizeof(*f));
--- 96,102 ----
  	if ((len = ntohl(*p++)) != sizeof(*f)) {
  		printk(KERN_NOTICE
  			"lockd: bad fhandle size %x (should be %d)\n",
! 			len, (int) sizeof(*f));
  		return NULL;
  	}
  	memcpy(f, p, sizeof(*f));
===================================================================
RCS file: RCS/svcsock.c,v
retrieving revision 1.1
diff -c -r1.1 svcsock.c
*** svcsock.c	2000/06/09 21:24:50	1.1
--- svcsock.c	2000/06/09 21:29:00
***************
*** 266,274 ****
  	set_fs(oldfs);
  #endif
  
! 	dprintk("svc: socket %p sendto([%p %d... ], %d, %d) = %d\n",
  			rqstp->rq_sock,
! 			iov[0].iov_base, iov[0].iov_len, nr,
  			buflen, len);
  
  	return len;
--- 266,274 ----
  	set_fs(oldfs);
  #endif
  
! 	dprintk("svc: socket %p sendto([%p %ld... ], %d, %d) = %d\n",
  			rqstp->rq_sock,
! 			iov[0].iov_base, (long) iov[0].iov_len, nr,
  			buflen, len);
  
  	return len;
***************
*** 326,333 ****
  	set_fs(oldfs);
  #endif
  
! 	dprintk("svc: socket %p recvfrom(%p, %d) = %d\n", rqstp->rq_sock,
! 				iov[0].iov_base, iov[0].iov_len, len);
  
  	return len;
  }
--- 326,333 ----
  	set_fs(oldfs);
  #endif
  
! 	dprintk("svc: socket %p recvfrom(%p, %ld) = %d\n", rqstp->rq_sock,
! 				iov[0].iov_base, (long) iov[0].iov_len, len);
  
  	return len;
  }
===================================================================
RCS file: RCS/aic7xxx.c,v
retrieving revision 1.1
diff -c -r1.1 aic7xxx.c
*** aic7xxx.c	2000/06/08 17:00:45	1.1
--- aic7xxx.c	2000/06/08 17:02:36
***************
*** 5320,5327 ****
          if(!(scb->flags & SCB_ACTIVE) || (scb->cmd == NULL))
          {
            printk(WARN_LEAD "invalid scb during WIDE_RESIDUE flags:0x%x "
!                  "scb->cmd:0x%x\n", p->host_no, CTL_OF_SCB(scb),
!                  scb->flags, (unsigned int)scb->cmd);
            break;
          }
  
--- 5320,5327 ----
          if(!(scb->flags & SCB_ACTIVE) || (scb->cmd == NULL))
          {
            printk(WARN_LEAD "invalid scb during WIDE_RESIDUE flags:0x%x "
!                  "scb->cmd:0x%lx\n", p->host_no, CTL_OF_SCB(scb),
!                  scb->flags, (unsigned long)scb->cmd);
            break;
          }
  


