
Re: More external journal woes.



On Monday December 10, tytso mit edu wrote:
> On Tue, Dec 11, 2001 at 01:44:50PM +1100, Neil Brown wrote:
> > 
> >  - After the e2fsck fails, "tune2fs -l" on the journal device shows
> >    much the same superblock as on the main device.  Normally it
> >    fails to find a superblock on the journal device.
> 
> Use dumpe2fs, not "tune2fs -l".  Tune2fs doesn't know about the
> special superblocks used by the journal device.  The fact that
> "tune2fs -l" works after e2fsck fails is definitely very weird.  If
> you could send us the output of dumpe2fs on the journal device before
> and after the e2fsck failure, that would be very interesting.
> 

Thanks for the pointer to dumpe2fs.
It looks like e2fsck is writing the superblock from the filesys device
onto the journal device!

I have attached dumpe2fs outputs of md1 (the filesys device) and mda4
(the journal device).

 md1.postfsck  is the filesys device after a fsck which worked
 mda4.postfsck is the journal device after that fsck
 md1.postcrashfsck is the filesys device after a crash and then an
		    automatic fsck which failed
 mda4.postcrashfsck is the journal device after the same crash and
                    fsck.
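
A rough sketch of the comparison that shows the problem in these dumps: a
healthy journal device should carry the journal_dev feature, and its
"Filesystem UUID" should equal the "Journal UUID" recorded on the filesys
device.  The function names dump_field and journal_dump_matches are made up
for illustration; they only parse dumpe2fs-style text and are not part of
e2fsprogs.

```c
#include <stddef.h>
#include <string.h>

/* Copy the value of a "Key:   value" line from dumpe2fs-style text into out.
 * Returns 1 on success, 0 if no line starts with "key:". */
int dump_field(const char *text, const char *key, char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *line = text;

    if (outlen == 0)
        return 0;
    while (line && *line) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            const char *v = line + klen + 1;
            size_t n;

            while (*v == ' ' || *v == '\t')   /* skip the column padding */
                v++;
            n = strcspn(v, "\n");             /* value runs to end of line */
            if (n >= outlen)
                n = outlen - 1;
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return 0;
}

/* Returns 1 if the journal-device dump looks like the journal the filesystem
 * expects: it has the journal_dev feature and its own UUID equals the
 * filesystem's "Journal UUID".  Returns 0 otherwise. */
int journal_dump_matches(const char *fs_dump, const char *jdev_dump)
{
    char want[64], have[64], features[256];

    if (!dump_field(fs_dump, "Journal UUID", want, sizeof want) ||
        !dump_field(jdev_dump, "Filesystem UUID", have, sizeof have) ||
        !dump_field(jdev_dump, "Filesystem features", features, sizeof features))
        return 0;
    return strstr(features, "journal_dev") != NULL && strcmp(want, have) == 0;
}
```

Fed the attached text, this check would be expected to pass for the journal
dump that lists journal_dev and fail for the one that carries the filesys
UUID 3eb06026-... and the has_journal feature.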


This is a Debian/potato system, with e2fsprogs 1.25 taken from the
woody release and recompiled to work on potato.
The automatic fsck runs as:
 
  fsck -C -R -A -y

The only lines in fstab which have a non-zero pass number are:

  /dev/mda1       /               ext2    defaults,errors=remount-ro      0       1
  /dev/md1        /export/eno/1   ext3    rw,data=journal,grpid,treequota 0       1



Interestingly, I removed the filesys from fstab, crashed and rebooted,
and then ran fsck by hand, and it worked fine.
I put it back in fstab and crashed the machine, and the automatic fsck
at boot time dies: 
   /dev/md1: recovering journal
  (long pause, lots of disc io)
   External journal has bad superblock

The above-mentioned *.postcrashfsck come from after that error message.

> >  - e2fsck will not progress if the journal device is bad (e.g. when the
> >    super block is wrong as above).  I cannot say 'Ignore the journal
> >    and fsck'.  It just stops.  Even after I turn off has_journal (see
> >    below), it still won't let me fsck because there is a uuid and a
> >    journal device set in the superblock.  I now have a hacked e2fsck
> >    which ignores the journal.
> 
> This was using e2fsprogs 1.25?  I just tried creating a filesystem
> with an external journal device, then used debugfs to zip the
> has_journal flag, and then ran e2fsck.  It asked the question:
> "Superblock doesn't have has_journal flag, but has ext3 journal inode.
> Clear<y>?".  So it works for me....

# debugfs -w /dev/md1
debugfs 1.25 (20-Sep-2001)
debugfs:  feature ^has_journal
Filesystem features: filetype sparse_super
debugfs:
# e2fsck -V
e2fsck 1.25 (20-Sep-2001)
        Using EXT2FS Library version 1.25, 20-Sep-2001
# e2fsck /dev/md1
e2fsck 1.25 (20-Sep-2001)
External journal has bad superblock


> 
> >  - tune2fs doesn't let me turn off has_journal if needs_recovery is
> >    set, and doesn't let me turn off needs_recovery.  Fortunately
> >    debugfs does.  However it doesn't remove the journal uuid, or the
> >    journal device number from the superblock when I do turn off
> >    has_journal.  Nor does there seem to be a debugfs command to allow
> >    this.  Hence the need for the hacked e2fsck.
> 
> The reason behind this is that simply junking the journal is a very
> hazardous operation.  The filesystem is likely to be quite badly
> damaged if you just blithely throw away the journal before it is run.
> Granted, we need some kind of recovery if the journal device is
> completely trashed, but making it trivially easy for the user to shoot
> themself in the foot isn't such a great idea, either?

Certainly it should not be too easy.  I guess having debugfs able to
turn off needs_recovery is enough as long as e2fsck really does ignore the
journal after has_journal is clear.
The current test in e2fsck/journal.c is:

	/* If we don't have any journal features, don't do anything more */
	if (!(sb->s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL) &&
	    !recover && sb->s_journal_inum == 0 && sb->s_journal_dev == 0 &&
	    uuid_is_null(sb->s_journal_uuid))
 		return 0;

If "HAS_JOURNAL" is clear, why do you bother checking the inum and
journal_dev and uuid?
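
To make the question concrete, here is a minimal sketch of that test run
against a mock superblock.  The struct mock_sb, the MOCK_ flag name and the
function journal_check_skipped are stand-ins invented for this example, and
"recover" is simply passed in as a parameter; only the boolean structure is
copied from the snippet above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for EXT3_FEATURE_COMPAT_HAS_JOURNAL. */
#define MOCK_COMPAT_HAS_JOURNAL 0x0004u

/* Hypothetical stand-in holding only the fields the test reads. */
struct mock_sb {
    uint32_t s_feature_compat;
    uint32_t s_journal_inum;
    uint32_t s_journal_dev;
    uint8_t  s_journal_uuid[16];
};

/* Returns 1 if all 16 UUID bytes are zero. */
int mock_uuid_is_null(const uint8_t uuid[16])
{
    static const uint8_t zero[16];
    return memcmp(uuid, zero, sizeof zero) == 0;
}

/* Mirrors the quoted condition: 1 means "return early and leave the
 * journal alone", 0 means the journal code carries on. */
int journal_check_skipped(const struct mock_sb *sb, int recover)
{
    return !(sb->s_feature_compat & MOCK_COMPAT_HAS_JOURNAL) &&
           !recover && sb->s_journal_inum == 0 && sb->s_journal_dev == 0 &&
           mock_uuid_is_null(sb->s_journal_uuid);
}
```

With has_journal cleared but s_journal_dev still 0x3c04 (or the journal uuid
still set), journal_check_skipped returns 0, which would be consistent with
e2fsck still trying to open the external journal after "feature ^has_journal".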

> 
> >  - tune2fs will allow me to set the journal device to a device which
> >    does not have a valid journal.  
> 
> Again, which version of e2fsprogs are you using?  Tune2fs should *NOT*
> be letting you set the journal device to a device which does not have
> a valid journal:
> 
> # tune2fs -J device=/dev/ram /tmp/foo.img
> tune2fs 1.25 (20-Sep-2001)
> tune2fs: Bad magic number in super-block 
> 	while trying to open journal on /dev/ram

Well, it depends on the superblock that is found I guess.
After I forced a fsck on the filesystem, but with the same corrupted
journal as above:

# tune2fs -J device=/dev/mda4 /dev/md1
tune2fs 1.25 (20-Sep-2001)
Creating journal on device /dev/mda4: done
This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
# tune2fs -O ^has_journal /dev/md1
tune2fs 1.25 (20-Sep-2001)
/dev/mda4 is not a journal device.
Journal NOT removed


> 
> E2fsprogs will look up the journal device by UUID.  The long-range
> plan is to only support external journal devices via e2fsck, and not
> via the in-kernel mount scheme, and to not support use of an external
> journal for the root filesystem.  (There are a bunch of reasons why
> that would get horribly complicated, mainly having to do with how you
> recover if the journal device is temporarily off-line, so the plan was
> simply not to support external journals for the root filesystem.)

Sounds reasonable... and as you can specify a journal device to e2fsck
by name, it can presumably update the devno in the superblock to be
found when mounting the filesys.
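
For reference, the devno in question is the "Journal device: 0x3c04" value
in the attached dumps.  A small sketch of how such a value splits into major
and minor numbers, assuming the legacy 16-bit encoding (major in the high
byte, minor in the low byte); the helper names are invented for this
example.

```c
#include <stdint.h>

/* Assumption: 16-bit legacy device number, major in bits 8-15 and
 * minor in bits 0-7.  Larger device numbers are not handled here. */
unsigned legacy_major(uint32_t devno)
{
    return (devno >> 8) & 0xffu;
}

unsigned legacy_minor(uint32_t devno)
{
    return devno & 0xffu;
}

/* Inverse of the two helpers above, under the same assumption. */
uint32_t legacy_mkdev(unsigned major, unsigned minor)
{
    return ((uint32_t)(major & 0xffu) << 8) | (minor & 0xffu);
}
```

Under that assumption 0x3c04 decodes to major 60, minor 4, and an e2fsck
that found the journal by name or UUID would store legacy_mkdev(major, minor)
of the device it actually opened.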

Thanks,

NeilBrown

PS.  I won't be able to do any testing on this for a while as my test
machine has to go into production and my new box doesn't arrive until
the new year.  But I will play some more sometime in January.

> 
> 						- Ted
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype sparse_super
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              15695872
Block count:              31363024
Reserved block count:     1568151
Free blocks:              30869376
Free inodes:              15695806
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Last mount time:          Tue Dec 11 15:36:32 2001
Last write time:          Tue Dec 11 15:46:54 2001
Mount count:              9
Maximum mount count:      22
Last checked:             Tue Dec 11 13:46:01 2001
Check interval:           15552000 (6 months)
Next check after:         Sun Jun  9 12:46:01 2002
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:		  128
Journal UUID:             30d7f13c-c689-4a63-bf89-71aa2008164c
Journal inode:            0
Journal device:	          0x3c04
First orphan inode:       0


Group 0: (Blocks 0 -- 32767)
  Primary Superblock at 0,  Group Descriptors at 1-8
  Block bitmap at 523 (+523), Inode bitmap at 524 (+524)
  Inode table at 11-522 (+11)
  32229 free blocks, 16372 free inodes, 2 directories
  Free blocks: 528-529, 534-756, 764-32767
  Free inodes: 13-16384
Group 1: (Blocks 32768 -- 65535)
  Backup Superblock at 32768,  Group Descriptors at 32769-32776
  Block bitmap at 33307 (+539), Inode bitmap at 33308 (+540)
  Inode table at 32779-33290 (+11)
  32245 free blocks, 16384 free inodes, 0 directories
  Free blocks: 32777-32778, 33291-33306, 33309-65535
  Free inodes: 16385-32768
Group 2: (Blocks 65536 -- 98303)
  Block bitmap at 66091 (+555), Inode bitmap at 66092 (+556)
  Inode table at 65547-66058 (+11)
  31246 free blocks, 16382 free inodes, 1 directories
  Free blocks: 65537-65546, 67068-98303
  Free inodes: 32771-49152
....truncated
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype sparse_super
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              15695872
Block count:              31363024
Reserved block count:     1568151
Free blocks:              30869441
Free inodes:              15695858
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Last mount time:          Tue Dec 11 15:51:02 2001
Last write time:          Tue Dec 11 15:56:18 2001
Mount count:              11
Maximum mount count:      22
Last checked:             Tue Dec 11 13:46:01 2001
Check interval:           15552000 (6 months)
Next check after:         Sun Jun  9 12:46:01 2002
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:		  128
Journal UUID:             30d7f13c-c689-4a63-bf89-71aa2008164c
Journal inode:            0
Journal device:	          0x3c04
First orphan inode:       0


Group 0: (Blocks 0 -- 32767)
  Primary Superblock at 0,  Group Descriptors at 1-8
  Block bitmap at 523 (+523), Inode bitmap at 524 (+524)
  Inode table at 11-522 (+11)
  32217 free blocks, 16372 free inodes, 2 directories
  Free blocks: 528-533, 557-32767
  Free inodes: 12, 14-16384
Group 1: (Blocks 32768 -- 65535)
  Backup Superblock at 32768,  Group Descriptors at 32769-32776
  Block bitmap at 33307 (+539), Inode bitmap at 33308 (+540)
  Inode table at 32779-33290 (+11)
  32245 free blocks, 16384 free inodes, 0 directories
  Free blocks: 32777-32778, 33291-33306, 33309-65535
  Free inodes: 16385-32768
Group 2: (Blocks 65536 -- 98303)
  Block bitmap at 66091 (+555), Inode bitmap at 66092 (+556)
  Inode table at 65547-66058 (+11)
  31246 free blocks, 16382 free inodes, 1 directories
  Free blocks: 65537-65546, 67068-98303
  Free inodes: 32771-49152
...truncated....
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          30d7f13c-c689-4a63-bf89-71aa2008164c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      journal_dev
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              0
Block count:              1465931
Reserved block count:     0
Free blocks:              0
Free inodes:              0
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         0
Inode blocks per group:   0
Last mount time:          Thu Jan  1 10:00:00 1970
Last write time:          Tue Dec 11 13:45:25 2001
Mount count:              0
Maximum mount count:      21
Last checked:             Tue Dec 11 13:42:23 2001
Check interval:           15552000 (6 months)
Next check after:         Sun Jun  9 12:42:23 2002
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:		  128

Journal block size:       4096
Journal length:           1465931
Journal first block:      2
Journal sequence:         0x00014eb8
Journal start:            0
Journal number of users:  1
Journal users:            3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              15695872
Block count:              31363024
Reserved block count:     1568151
Free blocks:              30817862
Free inodes:              15695162
First block:              0
Block size:               4096
Fragment size:            4096
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   512
Last mount time:          Tue Dec 11 15:51:02 2001
Last write time:          Tue Dec 11 15:51:02 2001
Mount count:              11
Maximum mount count:      22
Last checked:             Tue Dec 11 13:46:01 2001
Check interval:           15552000 (6 months)
Next check after:         Sun Jun  9 12:46:01 2002
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:		  128
Journal UUID:             30d7f13c-c689-4a63-bf89-71aa2008164c
Journal inode:            0
Journal device:	          0x3c04
First orphan inode:       2195460
