RHEL4 Sun Java Messaging Server deadlock

Mon Jan 3 18:22:12 UTC 2011

> Date: Mon, 3 Jan 2011 11:00:59 -0500
> From: m.roth at 5-cent.us
> To: "General Red Hat Linux discussion list"<redhat-list at redhat.com>
> Subject: Re: RHEL4 Sun Java Messaging Server deadlock
> Message-ID:
> 	<a6459e522efd2f5e81b91dc1cc609e87.squirrel at host290.hostmonster.com>
>
> John Dalbec wrote:
>> Sun Java(tm) System Messaging Server 6.2-9.20 (built Jul 15 2010)
>> libimta.so 6.2-9.20 (built 01:27:24, Jul 15 2010)
>> Linux myysumail.ysu.edu 2.6.9-89.31.1.ELhugemem #1 SMP Mon Oct 4
>> 22:04:11 EDT 2010 i686 i686 i386 GNU/Linux
>>
>> The IMAPd process appears to get into a deadlock with pdflush and
>> kjournald.  The mailboxes were initially stored on a disk partition but
>> I'm in the process of migrating them to an LVM logical volume.  Both
>> partitions are ext3.  I worry about using ext2 because a fsck takes
>> about 90 minutes.
>>
>> The server has 32GB RAM.  Right now Committed_AS is around 4GB, but the
>> system is lightly loaded.  I'm running a 32-bit kernel because the
>> application vendor doesn't support 64-bit.
>>
>> I can't upgrade to RHEL5 because the application vendor doesn't support
>> it.  When the deadlock happens it affects parts of /proc.  If I run "ps
>> ax" the "ps" process enters an uninterruptible wait and stays there.  I
>> have to power-cycle the system to return to normal operation.
>>
>> What are my options?  I have SELinux enforcing.  Should I disable it?  I
>> have vm.overcommit_memory = 1.  Should I set that to 0 or 2?  Do I have
>> to fall back to ext2?
>
> First question: what do the logs say? Are there complaints (AVCs, for
> example) in /var/log/audit/audit.log? Does /var/log/messages say anything at all?
>
>          mark
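On the overcommit question quoted above: with vm.overcommit_memory = 2 the kernel refuses new allocations once Committed_AS would exceed CommitLimit, which the kernel's overcommit-accounting documentation describes as swap plus vm.overcommit_ratio percent (default 50) of RAM. Below is a minimal Python sketch for checking how much headroom that would leave, assuming /proc/meminfo on this kernel exposes both fields in kB. The function names are hypothetical, it ignores hugepages and any per-version refinements of the formula, and it targets a current Python 3 (the Python bundled with RHEL4 is much older and may need adjustments).

```python
# Hypothetical helper, not part of any RHEL tool: compares Committed_AS with
# CommitLimit from /proc/meminfo. Assumes both fields are present (kB units).
# CommitLimit is only enforced when vm.overcommit_memory = 2.


def parse_meminfo(text):
    """Map each /proc/meminfo field name to its integer value (kB)."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def strict_commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=50):
    """Mode-2 limit: swap + overcommit_ratio% of RAM (hugepages ignored)."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


def commit_headroom_kb(fields):
    """kB left before a mode-2 kernel would start refusing allocations."""
    return fields["CommitLimit"] - fields["Committed_AS"]


def report(path="/proc/meminfo"):
    """Print the current commit charge, the limit, and the headroom."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    print("Committed_AS: %d kB" % info["Committed_AS"])
    print("CommitLimit:  %d kB" % info["CommitLimit"])
    print("Headroom:     %d kB" % commit_headroom_kb(info))
```

Calling report() on the server shows the current margin; a small or negative headroom would suggest that switching to mode 2 with the current ratio is likely to make new allocations fail rather than help.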

No AVCs in /var/log/audit/audit.log.  There are some failure audit records, 
but I think those were mistyped passwords.  The only entries in 
/var/log/messages are iptables logs of dropped DHCP packets.

The last time this happened I wasn't at work, but the time before that I 
did SysRq-T and got (at least) 8 imapd threads in uninterruptible wait. 
Some threads didn't show up; in retrospect I suppose the kernel log 
ring buffer overflowed before klogd could read it all.
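Since the SysRq-T dump can lose lines, a second way to see which threads are stuck in uninterruptible ("D") wait is to read the per-thread stat files under /proc. The following is a minimal Python sketch, assuming a 2.6-style /proc with /proc/<pid>/task/<tid>/stat; the function names are hypothetical. `ps ax` also reads /proc/<pid>/cmdline, which can block on a wedged process; this sketch reads only the stat files, but whether that avoids the hang on this kernel is an untested assumption.

```python
# Hypothetical helper: list threads in uninterruptible sleep ("D" state).
# Assumes per-thread /proc/<pid>/task/<tid>/stat files exist (2.6 kernels).
import os


def parse_stat(line):
    """Return (tid, comm, state) from one /proc/.../stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    tid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return tid, comm, state


def d_state_threads(proc="/proc"):
    """Yield (pid, tid, comm) for every thread whose state is 'D'."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue  # skip entries such as /proc/self or /proc/meminfo
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited (or has no task dir) while scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    line = f.read()
            except OSError:
                continue  # thread exited between listdir() and open()
            if not line:
                continue
            _, comm, state = parse_stat(line)
            if state == "D":
                yield int(pid), int(tid), comm
```

For example, `for pid, tid, comm in d_state_threads(): print(pid, tid, comm)` lists the stuck threads by name, which might then be matched against whatever part of the SysRq-T output did survive.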

Is this bug fixed in RHEL4?
http://kerneltrap.com/mailarchive/linux-ext4/2009/5/13/5695604/thread

The SysRq-T output shows threads creating and deleting inodes.  Does the 
[jbd] tag in those traces mean this is happening on the LVM logical volume?

Thanks,
John