[rhn-users] EXT3-fs error : Filename corruption after filesystem re-mounting on shared storage

Jude T. Cruz jude at csn.com.my
Mon Apr 19 12:27:13 UTC 2004


Folks,
 
 
We are in the midst of setting up a two-node cluster using Red Hat
Cluster Manager. The hardware summary is as follows.
 
Two HP ProLiant DL580 servers are connected through Smart Array 532 SCSI
HBAs to an HP MSA500. The cluster initialization completed without any
errors, and a failover test using Samba worked fine. We then stopped the
cluster config and installed Oracle 9i RDBMS on one node and Oracle 10g
Apps on the other. The first node is called ecos1 and the second node is
ecos2.
 
The kernel version is linux-2.4.90e.38smp 
 
Both servers access different filesystems on the shared storage.
To test the Oracle database, we shut down the database and then the
server (ecos1) itself. We then started the second node from a
powered-down state and mounted the Oracle database filesystems. They
mounted cleanly, but there was a long delay when we tried to su to the
oracle user. Meanwhile, I captured the following messages in
/var/log/messages:
 
Apr 19 16:46:04 ecos2 syslogd 1.4.1: restart.
Apr 19 16:47:37 ecos2 kernel: kjournald starting.  Commit interval 5 seconds
Apr 19 16:47:37 ecos2 kernel: EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss1(105,6), internal journal
Apr 19 16:47:37 ecos2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 19 16:47:53 ecos2 kernel: kjournald starting.  Commit interval 5 seconds
Apr 19 16:47:53 ecos2 kernel: EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss1(105,7), internal journal
Apr 19 16:47:53 ecos2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 19 16:48:02 ecos2 kernel: kjournald starting.  Commit interval 5 seconds
Apr 19 16:48:02 ecos2 kernel: EXT3 FS 2.4-0.9.11, 3 Oct 2001 on cciss1(105,8), internal journal
Apr 19 16:48:02 ecos2 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Apr 19 16:49:15 ecos2 kernel: st: Version 20010812, bufsize 32768, wrt 30720, max init. bufs 4, s/g segs 16
Apr 19 16:49:15 ecos2 kernel: Attached scsi tape st0 at scsi0, channel 0, id 0, lun 0
Apr 19 16:49:15 ecos2 kernel: st0: Block limits 1 - 16777215 bytes.
Apr 19 16:52:51 ecos2 su(pam_unix)[8905]: session opened for user oracle by root(uid=0)
Apr 19 16:53:00 ecos2 kernel: cciss: cmd f6960000 timedout
Apr 19 16:53:13 ecos2 last message repeated 2 times
Apr 19 16:58:14 ecos2 su(pam_unix)[8905]: session closed for user oracle
Apr 19 17:01:59 ecos2 kernel: cciss: cmd f6960000 timedout
Apr 19 17:01:59 ecos2 kernel: EXT3-fs error (device cciss1(105,6)): ext3_readdir: directory #1632391 contains a hole at offset 0
Apr 19 17:06:03 ecos2 PAM-securetty[1203]: Couldn't open /etc/securetty
Apr 19 17:06:05 ecos2 login(pam_unix)[1203]: session opened for user root by LOGIN(uid=0)
Apr 19 17:06:05 ecos2  -- root[1203]: ROOT LOGIN ON tty4
Apr 19 17:12:21 ecos2 su(pam_unix)[10127]: session opened for user oracle by root(uid=0)
Apr 19 17:14:16 ecos2 su(pam_unix)[10127]: session closed for user oracle
 
When we tried to run sqlplus, the executable was not found; the file
had actually been renamed to sqlplusO. Other files also had an O or 0
appended to their names.
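
To see how widespread the renaming is, something like the following
could list the affected entries in one directory. This is only a
minimal sketch: the function name find_suffixed_names is made up for
illustration, it assumes a Python 3 interpreter is available on the
node, and it only reads the directory listing without renaming
anything. An entry is reported when its name ends in O or 0 and the
name without that last character does not exist alongside it.

```python
import os


def find_suffixed_names(directory, suffixes=("O", "0")):
    """Return (current_name, expected_name) pairs for suspicious entries.

    An entry is suspicious when its name ends in one of the suffixes and
    the same name without the final character is absent from the
    directory. This is a heuristic, not proof of corruption.
    """
    names = set(os.listdir(directory))
    suspects = []
    for name in sorted(names):
        if len(name) > 1 and name.endswith(suffixes):
            expected = name[:-1]
            if expected not in names:
                suspects.append((name, expected))
    return suspects
```

Calling find_suffixed_names on the directory that should contain
sqlplus would return pairs such as ("sqlplusO", "sqlplus"), which can
then be compared against a known-good installation before deciding on
any repair.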
 
I suspect it is due to this filesystem error:
Apr 19 17:01:59 ecos2 kernel: EXT3-fs error (device cciss1(105,6)):
ext3_readdir: directory #1632391 contains a hole at offset 0
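
To check whether the cciss timeouts and the EXT3-fs errors show up
together elsewhere in the log, a rough sketch like the one below could
be used. It assumes Python 3 and that each syslog entry sits on a
single line, as it does in /var/log/messages before mail wrapping. The
function name summarize_storage_errors and both patterns are
illustrative and are written only against the two message formats
quoted above. Entries of the form "last message repeated N times" are
not counted.

```python
import re

# Patterns modelled on the two kernel messages quoted in this post.
CCISS_TIMEOUT = re.compile(r"kernel: cciss: cmd \S+ timedout")
EXT3_ERROR = re.compile(r"kernel: EXT3-fs error \(device (\S+)\):")


def summarize_storage_errors(lines):
    """Count cciss command timeouts and collect devices with EXT3-fs errors."""
    timeouts = 0
    devices = []
    for line in lines:
        if CCISS_TIMEOUT.search(line):
            timeouts += 1
        match = EXT3_ERROR.search(line)
        if match:
            devices.append(match.group(1))
    return {"cciss_timeouts": timeouts, "ext3_error_devices": devices}
```

It can be fed an open file object, for example
summarize_storage_errors(open("/var/log/messages")), and the result
shows how many timeouts were logged and which devices reported
EXT3-fs errors.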
 
I would appreciate any advice.
 
regards,
Jude T. Cruz
e-mail  : jude at csn.com.my  
 
 
 

