
Problems with ext3 fs



Hi,

Apologies, this is going to be quite long - I'm going to provide as much
info as possible.

I'm running a system with ext3 fs on software RAID. The RAID set-up is as
shown below:

jlm@nijinsky:~$ cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md0 : active raid1 hdc1[1] hda1[0]
      96256 blocks [2/2] [UU]

md5 : active raid1 hdk1[1] hde1[0]
      976640 blocks [2/2] [UU]

md6 : active raid1 hdk5[1] hde5[0]
      292672 blocks [2/2] [UU]

md7 : active raid1 hdk6[1] hde6[0]
      1952896 blocks [2/2] [UU]

md8 : active raid1 hdk7[1] hde7[0]
      976640 blocks [2/2] [UU]

md9 : active raid1 hdk8[1] hde8[0]
      9765376 blocks [2/2] [UU]

md10 : active raid0 hdk9[1] hde9[0]
      12108800 blocks 4k chunks

md12 : active raid5 hdk3[3] hde3[2] hdc2[1] hda2[0]
      59978304 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

md11 : active raid1 hdk4[1] hde4[0]
      170240 blocks [2/2] [UU]

Now, the filesystems are set-up as shown:

jlm@nijinsky:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md5              939M  238M  653M  27% /
/dev/md0               91M   23M   63M  27% /boot
/dev/md6              277M  8.1M  254M   4% /tmp
/dev/md7              1.8G  1.5G  360M  81% /usr
/dev/md8              939M  398M  541M  43% /var
/dev/md9              9.2G  5.1G  3.6G  59% /home
/dev/md10              11G  1.7G  9.1G  16% /scratch
/dev/md12              56G   49G  7.7G  87% /global

with /etc/fstab as follows:

jlm@nijinsky:~$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# <file system>	<mount point>	<type>	<options>			<dump>	<pass>
/dev/md0	/boot		ext3	defaults,errors=remount-ro	0	1
/dev/md5	/		ext3	defaults,errors=remount-ro	0	1
/dev/md6	/tmp		ext3	defaults,errors=remount-ro	0	1
/dev/md7	/usr		ext3	defaults,errors=remount-ro	0	1
/dev/md8	/var		ext2	defaults,errors=remount-ro	0	1
/dev/md9	/home		ext3	defaults,errors=remount-ro	0	1
/dev/md10	/scratch	ext3	defaults,errors=remount-ro	0	1
/dev/md11	none		swap	sw				0	0
/dev/md12	/global		ext3	defaults,errors=remount-ro	0	1
/dev/sr0	/dvdrom		iso9660	defaults,noauto,ro,user		0	0
proc		/proc		proc	defaults			0	0

Lastly, I'm running a 2.4.17 kernel. The machine itself is a Duron 800
system with a VIA chipset. All the drivers for the hardware are compiled
into the kernel, as is ext3. The drives are connected as follows:

Mainboard Primary:   20GB
Mainboard Secondary: 20GB
Promise card 1:      40GB
Promise card 2:      40GB

Now the problem. Since moving to ext3, the /var fs has kept switching to
read-only mode. I emailed this list some time ago but haven't had a chance
to test fully. As a stopgap I switched /var back to ext2. I've since
switched back to ext3 on /var and had trouble again (I thought that
upgrading the kernel might make a difference, so I'm now on 2.4.17).

Basically, in my logs I keep getting:

Feb 27 08:22:05 nijinsky kernel: attempt to access beyond end of device
Feb 27 08:22:05 nijinsky kernel: 09:07: rw=2, want=251691012, limit=1952896

Feb 27 22:15:46 nijinsky kernel: attempt to access beyond end of device
Feb 27 22:15:46 nijinsky kernel: 09:08: rw=2, want=447774724, limit=976640

Feb 28 07:35:53 nijinsky kernel: attempt to access beyond end of device
Feb 28 07:35:53 nijinsky kernel: 09:07: rw=2, want=251691012, limit=1952896

These were the 'errors' - but the last error seemed to trip the /var fs
into ro mode.
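
As a starting point for reading the numbers in these log lines, here is a
rough Python sketch. It rests on two assumptions that are not stated anywhere
in the logs themselves: that the "09:07" field is the device's major:minor
number printed in hex, and that block major 9 is the md driver. The block
counts are copied from the /proc/mdstat listing earlier in this mail; the
decode() helper and its field names are made up for illustration. Under those
assumptions the 09:07 lines would refer to md7 (mounted on /usr in the df
listing) and the 09:08 line to md8 (/var), and each "limit" value equals the
block count that /proc/mdstat reports for that array.

```python
import re

# Block counts per md array, copied from the /proc/mdstat output in this mail.
MD_BLOCKS = {
    0: 96256, 5: 976640, 6: 292672, 7: 1952896, 8: 976640,
    9: 9765376, 10: 12108800, 11: 170240, 12: 59978304,
}

MD_MAJOR = 9  # assumption: block major 9 is the md (software RAID) driver

# Matches the "MM:mm: rw=R, want=W, limit=L" part of a kernel log line.
LINE = re.compile(
    r"([0-9a-fA-F]{2}):([0-9a-fA-F]{2}): rw=(\d+), want=(\d+), limit=(\d+)"
)


def decode(line):
    """Return a dict describing one log line, or None if it does not match."""
    m = LINE.search(line)
    if m is None:
        return None
    major = int(m.group(1), 16)  # assumption: device numbers are in hex
    minor = int(m.group(2), 16)
    want, limit = int(m.group(4)), int(m.group(5))
    is_md = major == MD_MAJOR
    return {
        "device": f"md{minor}" if is_md else f"{major}:{minor}",
        "want": want,
        "limit": limit,
        # True when the limit equals the array size listed in /proc/mdstat.
        "matches_mdstat": is_md and MD_BLOCKS.get(minor) == limit,
        # How far past the end of the device the request was aimed.
        "overshoot": want - limit,
    }


if __name__ == "__main__":
    samples = [
        "Feb 27 08:22:05 nijinsky kernel: 09:07: rw=2, want=251691012, limit=1952896",
        "Feb 27 22:15:46 nijinsky kernel: 09:08: rw=2, want=447774724, limit=976640",
    ]
    for s in samples:
        print(decode(s))
```

Both "want" values are far larger than any array in the machine, which is
why the requests are rejected; the sketch only reports that and makes no
attempt to say where the bad block numbers came from.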

I unmounted the /var partition and ran fsck. Here was the result:

nijinsky:/home/jlm# fsck /dev/md8
fsck 1.25 (20-Sep-2001)
e2fsck 1.25 (20-Sep-2001)
/dev/md8: recovering journal
/dev/md8 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Inode 3905 has illegal block(s).  Clear<y>? yes

Illegal block #2 (31285248) in inode 3905.  CLEARED.
Illegal block #4 (164077568) in inode 3905.  CLEARED.
Inode 3905, i_size is 2914, should be 24576.  Fix<y>? yes

Inode 3905, i_blocks is 8, should be 24.  Fix<y>? yes

Inode 18145 has illegal block(s).  Clear<y>? yes

Illegal block #2 (134492160) in inode 18145.  CLEARED.
Illegal block #4 (74215424) in inode 18145.  CLEARED.
Inode 18145, i_size is 2290, should be 24576.  Fix<y>? yes

Inode 18145, i_blocks is 8, should be 24.  Fix<y>? yes

Deleted inode 30895 has zero dtime.  Fix<y>? yes

Inodes that were part of a corrupted orphan linked list found.  Fix<y>? yes

Inode 34342 was part of the orphaned inode list.  FIXED.
Inode 34343 was part of the orphaned inode list.  FIXED.
Inode 34541 was part of the orphaned inode list.  FIXED.
Inode 45930 was part of the orphaned inode list.  FIXED.
Inode 76328 was part of the orphaned inode list.  FIXED.
Inode 76679 was part of the orphaned inode list.  FIXED.
Inode 76699 was part of the orphaned inode list.  FIXED.
Inode 78881 was part of the orphaned inode list.  FIXED.
Inode 80865 has illegal block(s).  Clear<y>? yes

Illegal block #2 (111943680) in inode 80865.  CLEARED.
Illegal block #3 (2147487744) in inode 80865.  CLEARED.
Illegal block #4 (194392064) in inode 80865.  CLEARED.
Inode 80865, i_size is 4096, should be 24576.  Fix<y>? yes

Inode 80865, i_blocks is 8, should be 16.  Fix<y>? yes

Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 8: 4096
Duplicate/bad block(s) in inode 3905: 4096 4096
Duplicate/bad block(s) in inode 18145: 4096 4096
Duplicate/bad block(s) in inode 80865: 4096
Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 4 inodes containing duplicate/bad blocks.)

File /spool/squid/07/59 (inode #80865, mod time Thu Jul 26 19:16:49 2001)
  has 1 duplicate block(s), shared with 3 file(s):
        <The journal inode> (inode #8, mod time Fri Feb 15 22:33:11 2002)
        /spool/news/message.id/566/<q1fd8 10136$Ah1 912475 news2-win server ntlworld com> (inode #18145, mod time Thu Feb 21 23:16:07 2002)
        /spool/squid/00/20/000020DE (inode #3905, mod time Sat Feb  2 00:05:26 2002)
Clone duplicate/bad blocks<y>? yes

File /spool/news/message.id/566/<q1fd8 10136$Ah1 912475 news2-win server ntlworld com> (inode #18145, mod time Thu Feb 21 23:16:07 2002)
  has 2 duplicate block(s), shared with 3 file(s):
        <The journal inode> (inode #8, mod time Fri Feb 15 22:33:11 2002)
        /spool/squid/07/59 (inode #80865, mod time Thu Jul 26 19:16:49 2001)
        /spool/squid/00/20/000020DE (inode #3905, mod time Sat Feb  2 00:05:26 2002)
Clone duplicate/bad blocks<y>? yes

File /spool/squid/00/20/000020DE (inode #3905, mod time Sat Feb  2 00:05:26 2002)
  has 2 duplicate block(s), shared with 3 file(s):
        <The journal inode> (inode #8, mod time Fri Feb 15 22:33:11 2002)
        /spool/squid/07/59 (inode #80865, mod time Thu Jul 26 19:16:49 2001)
        /spool/news/message.id/566/<q1fd8 10136$Ah1 912475 news2-win server ntlworld com> (inode #18145, mod time Thu Feb 21 23:16:07 2002)
Clone duplicate/bad blocks<y>? yes

File <The journal inode> (inode #8, mod time Fri Feb 15 22:33:11 2002)
  has 1 duplicate block(s), shared with 3 file(s):
        /spool/squid/07/59 (inode #80865, mod time Thu Jul 26 19:16:49 2001)
        /spool/news/message.id/566/<q1fd8 10136$Ah1 912475 news2-win server ntlworld com> (inode #18145, mod time Thu Feb 21 23:16:07 2002)
        /spool/squid/00/20/000020DE (inode #3905, mod time Sat Feb  2 00:05:26 2002)
Duplicated blocks already reassigned or cloned.
Duplicated blocks already reassigned or cloned.

Pass 2: Checking directory structure
Directory inode 80865 has an unallocated block #2.  Allocate<y>? yes

Directory inode 80865 has an unallocated block #3.  Allocate<y>? yes

Directory inode 80865 has an unallocated block #4.  Allocate<y>? yes

Directory inode 80865, block 5, offset 0: directory corrupted
Salvage<y>? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -14840 -71993 -99016 -165850 -165876 -165877 -165878 -165879
Fix<y>? yes

Free blocks count wrong for group #0 (19705, counted=19698).
Fix<y>? yes

Free blocks count wrong for group #2 (20689, counted=20690).
Fix<y>? yes

Free blocks count wrong for group #3 (22696, counted=22697).
Fix<y>? yes

Free blocks count wrong for group #5 (22117, counted=22122).
Fix<y>? yes

Inode bitmap differences:  -30895 -34342 -34343 -34541 -45930 -76328 -76679 -76699 -78881
Fix<y>? yes

Free inodes count wrong for group #2 (10536, counted=10540).
Fix<y>? yes

Free inodes count wrong for group #3 (10676, counted=10677).
Fix<y>? yes

Free inodes count wrong for group #5 (10519, counted=10523).
Fix<y>? yes

Free inodes count wrong (85120, counted=85129).
Fix<y>? yes


/dev/md8: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md8: 36983/122112 files (0.5% non-contiguous), 105372/244160 blocks
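
To put the "illegal block" numbers from Pass 1 in context, the following
Python sketch compares each one with the filesystem size from the e2fsck
summary line (244160 blocks) and prints it in hex. The describe() helper
and its field names are hypothetical and exist only for this illustration.
The 4096-byte block size is an inference, not something e2fsck printed:
976640 one-kilobyte blocks on md8 divided by 244160 filesystem blocks gives 4.

```python
# Total filesystem blocks, from the summary line "105372/244160 blocks".
TOTAL_BLOCKS = 244160

# Illegal block numbers reported in Pass 1, keyed by inode number.
ILLEGAL = {
    3905: [31285248, 164077568],
    18145: [134492160, 74215424],
    80865: [111943680, 2147487744, 194392064],
}


def describe(block, total=TOTAL_BLOCKS):
    """Summarise one block number reported by e2fsck."""
    return {
        "block": block,
        "hex": f"{block:#010x}",
        # A valid block number must be smaller than the block count.
        "beyond_fs": block >= total,
        # True when the value is an exact multiple of 4096.
        "multiple_of_4096": block % 4096 == 0,
    }


if __name__ == "__main__":
    for inode, blocks in ILLEGAL.items():
        for b in blocks:
            d = describe(b)
            print(f"inode {inode}: {d['block']:>10} {d['hex']} "
                  f"beyond_fs={d['beyond_fs']} "
                  f"multiple_of_4096={d['multiple_of_4096']}")
```

The sketch only checks two simple properties of each value. It does not
try to explain how those values ended up in the inodes.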


Now, when the system was running under ext2, the fs tripped into ro mode
once in three months. Under ext3 it seems to happen daily (the errors in
the syslog are more frequent).

I don't think that it is hardware/disks as the other partitions have all
been OK - and they are spread across the same physical disks. It is only
/var that seems to be affected - and to me it seems like the machine is
under load when it trips.
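
The claim that the other partitions sit on the same physical disks can be
checked mechanically against the /proc/mdstat listing. The sketch below is
a minimal illustration of that, not a general mdstat parser: it only
recognises IDE-style member names such as "hdk7[1]", as they appear in this
listing, and the function names are invented for the example.

```python
import re

# The "active" lines from the /proc/mdstat output earlier in this mail.
MDSTAT = """\
md0 : active raid1 hdc1[1] hda1[0]
md5 : active raid1 hdk1[1] hde1[0]
md6 : active raid1 hdk5[1] hde5[0]
md7 : active raid1 hdk6[1] hde6[0]
md8 : active raid1 hdk7[1] hde7[0]
md9 : active raid1 hdk8[1] hde8[0]
md10 : active raid0 hdk9[1] hde9[0]
md12 : active raid5 hdk3[3] hde3[2] hdc2[1] hda2[0]
md11 : active raid1 hdk4[1] hde4[0]
"""

# A member such as "hdk7[1]": disk name, partition number, slot in the array.
MEMBER = re.compile(r"\b(hd[a-z])(\d+)\[\d+\]")


def disks_by_array(text):
    """Map each md array name to the sorted list of whole disks it uses."""
    result = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(" : active ")
        if not sep:
            continue  # not an array status line
        result[name.strip()] = sorted({m.group(1) for m in MEMBER.finditer(rest)})
    return result


def sharing_with(arrays, target):
    """List the other arrays that use at least one disk that target uses."""
    wanted = set(arrays[target])
    return [n for n, d in arrays.items() if n != target and wanted & set(d)]


if __name__ == "__main__":
    arrays = disks_by_array(MDSTAT)
    print("md8 disks:", arrays["md8"])
    print("arrays sharing a disk with md8:", sharing_with(arrays, "md8"))
```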

I'm looking for suggestions or a diagnosis. I realise that this might not
be an ext3 problem, but it has mostly shown up since switching to ext3
(it happened once under ext2 as well, but ext3 seems to make the system
fail more often). I'm also hoping that somebody can make sense of the
numbers in the errors and the fsck log.

If I can provide any more information let me know. Any comments
appreciated.

Many thanks in advance,

John.




