Re: More external journal woes.
- From: Neil Brown <neilb cse unsw edu au>
- To: Theodore Tso <tytso mit edu>
- Cc: ext3-users redhat com
- Date: Tue, 11 Dec 2001 16:37:16 +1100 (EST)
On Monday December 10, tytso mit edu wrote:
> On Tue, Dec 11, 2001 at 01:44:50PM +1100, Neil Brown wrote:
> >
> > - After the e2fsck fails, "tune2fs -l" on the journal device shows
> > much the same superblock as on the main device. Normally it
> > fails to find a superblock on the journal device.
>
> Use dumpe2fs, not "tune2fs -l". Tune2fs doesn't know about the
> special superblocks used by the journal device. The fact that
> "tune2fs -l" works after e2fsck fails is definitely very weird. If
> you could send us the output of dumpe2fs on the journal device before
> and after the e2fsck failure, that would be very interesting.
>
Thanks for the pointer to dumpe2fs.
It looks like e2fsck is writing the superblock from the filesys device
onto the journal device!
I have attached dumpe2fs outputs of md1 (the filesys device) and mda4
(the journal device).
  md1.postfsck        the filesys device after a fsck which worked
  mda4.postfsck       the journal device after that fsck
  md1.postcrashfsck   the filesys device after a crash and then an
                      automatic fsck which failed
  mda4.postcrashfsck  the journal device after the same crash and fsck
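For anyone who wants to compare dumps like these mechanically, here is a minimal sketch. It assumes the "Key: value" header layout that dumpe2fs prints, and the helper names are made up for illustration. It flags a journal device whose superblock carries the filesystem's own UUID and lacks the journal_dev feature. The third sample is hypothetical, built only from values that appear in this post.

```python
def parse_dumpe2fs_header(text):
    """Collect 'Key: value' lines from dumpe2fs output into a dict.

    The first occurrence of a key wins, so the header's "Free blocks"
    is kept rather than a later per-group line with the same label.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields.setdefault(key.strip(), value.strip())
    return fields


def looks_like_journal_device(fields):
    """True if the superblock advertises the journal_dev feature."""
    return "journal_dev" in fields.get("Filesystem features", "").split()


def carries_filesystem_superblock(fs_fields, journal_fields):
    """True if the journal device shows the filesystem's UUID and no journal_dev."""
    fs_uuid = fs_fields.get("Filesystem UUID")
    return (fs_uuid is not None
            and journal_fields.get("Filesystem UUID") == fs_uuid
            and not looks_like_journal_device(journal_fields))


FS_DUMP = """\
Filesystem UUID:          3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem features:      has_journal filetype sparse_super
Journal UUID:             30d7f13c-c689-4a63-bf89-71aa2008164c
"""

GOOD_JOURNAL_DUMP = """\
Filesystem UUID:          30d7f13c-c689-4a63-bf89-71aa2008164c
Filesystem features:      journal_dev
"""

# Hypothetical: a journal device whose superblock was replaced by the
# filesystem's superblock.
BAD_JOURNAL_DUMP = """\
Filesystem UUID:          3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem features:      has_journal filetype needs_recovery sparse_super
"""

fs = parse_dumpe2fs_header(FS_DUMP)
for name, dump in (("good", GOOD_JOURNAL_DUMP), ("bad", BAD_JOURNAL_DUMP)):
    journal = parse_dumpe2fs_header(dump)
    print(name, carries_filesystem_superblock(fs, journal))
```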
This is a Debian/potato system, with e2fsprogs 1.25 taken from the
woody release and recompiled to work on potato.
The automatic fsck runs as:
fsck -C -R -A -y
The only lines in fstab which have a non-zero pass number are:
/dev/mda1 / ext2 defaults,errors=remount-ro 0 1
/dev/md1 /export/eno/1 ext3 rw,data=journal,grpid,treequota 0 1
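As a small illustration of what "non-zero pass number" means here, the sketch below picks out the fstab entries whose sixth field is non-zero. The function name is hypothetical, and a missing sixth field is treated as 0, as fstab(5) describes.

```python
def nonzero_pass_entries(fstab_text):
    """Return (device, mount_point, pass_no) for entries with a non-zero pass."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        # The sixth field is the fsck pass number; absent means 0.
        pass_no = int(fields[5]) if len(fields) > 5 else 0
        if pass_no != 0:
            entries.append((fields[0], fields[1], pass_no))
    return entries


FSTAB = """\
/dev/mda1 / ext2 defaults,errors=remount-ro 0 1
/dev/md1 /export/eno/1 ext3 rw,data=journal,grpid,treequota 0 1
"""

for dev, mnt, pass_no in nonzero_pass_entries(FSTAB):
    print(dev, mnt, pass_no)
```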
Interestingly, I removed the filesys from fstab, crashed and rebooted,
and then ran fsck by hand, and it worked fine.
I put it back in fstab and crashed the machine, and the automatic fsck
at boot time dies:
/dev/md1: recovering journal
(long pause, lots of disc io)
External journal has bad superblock
The above-mentioned *.postcrashfsck come from after that error message.
> > - e2fsck will not progress if the journal device is bad (e.g. when the
> > super block is wrong as above). I cannot say 'Ignore the journal
> > and fsck'. It just stops. Even after I turn off has_journal (see
> > below), it still won't let me fsck because there is a uuid and a
> > journal device set in the superblock. I now have a hacked e2fsck
> > which ignores the journal.
>
> This was using e2fsprogs 1.25? I just tried creating a filesystem
> with an external journal device, then used debugfs to zip the
> has_journal flag, and then ran e2fsck. It asked the question:
> "Superblock doesn't have has_journal flag, but has ext3 journal inode.
> Clear<y>?". So it works for me....
# debugfs -w /dev/md1
debugfs 1.25 (20-Sep-2001)
debugfs: feature ^has_journal
Filesystem features: filetype sparse_super
debugfs:
# e2fsck -V
e2fsck 1.25 (20-Sep-2001)
Using EXT2FS Library version 1.25, 20-Sep-2001
# e2fsck /dev/md1
e2fsck 1.25 (20-Sep-2001)
External journal has bad superblock
>
> > - tune2fs doesn't let me turn off has_journal if needs_recovery is
> > set, and doesn't let me turn off needs_recovery. Fortunately
> > debugfs does. However it doesn't remove the journal uuid, or the
> > journal device number from the superblock when I do turn off
> > has_journal. Nor does there seem to be a debugfs command to allow
> > this. Hence the need for the hacked e2fsck.
>
> The reason behind this is that simply junking the journal is a very
> hazardous operation. The filesystem is likely to be quite badly
> damaged if you just blithely throw away the journal before it is run.
> Granted, we need some kind of recovery if the journal device is
> completely trashed, but making it trivially easy for the user to shoot
> themself in the foot isn't such a great idea, either?
Certainly it should not be too easy. I guess having debugfs able to
turn off needs_recovery is enough as long as e2fsck really does ignore the
journal after has_journal is clear.
The current test in e2fsck/journal.c is:
/* If we don't have any journal features, don't do anything more */
if (!(sb->s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL) &&
!recover && sb->s_journal_inum == 0 && sb->s_journal_dev == 0 &&
uuid_is_null(sb->s_journal_uuid))
return 0;
If "HAS_JOURNAL" is clear, why do you bother checking the inum and
journal_dev and uuid?
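To make the question concrete, here is a small Python model of that condition next to a flag-only variant. It is only a sketch: the flag-only test is one reading of the question above, not a patch, and the Superblock class is a stand-in that carries just the fields the test looks at. The sample values are the journal device number and journal UUID that dumpe2fs reports for /dev/md1.

```python
import uuid
from dataclasses import dataclass

EXT3_FEATURE_COMPAT_HAS_JOURNAL = 0x0004  # compat feature bit for has_journal
NULL_UUID = bytes(16)


@dataclass
class Superblock:
    """Stand-in for the superblock fields used by the test."""
    s_feature_compat: int = 0
    s_journal_inum: int = 0
    s_journal_dev: int = 0
    s_journal_uuid: bytes = NULL_UUID


def returns_early_current(sb, recover=False):
    """Model of the quoted test: every journal trace must be absent."""
    return (not (sb.s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL)
            and not recover
            and sb.s_journal_inum == 0
            and sb.s_journal_dev == 0
            and sb.s_journal_uuid == NULL_UUID)


def returns_early_flag_only(sb, recover=False):
    """Hypothetical variant: trust the has_journal flag alone."""
    return (not (sb.s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL)
            and not recover)


# has_journal cleared with debugfs, but the journal device number and
# journal UUID are still recorded in the superblock.
sb = Superblock(
    s_feature_compat=0,
    s_journal_dev=0x3C04,
    s_journal_uuid=uuid.UUID("30d7f13c-c689-4a63-bf89-71aa2008164c").bytes,
)

print("current test returns early:  ", returns_early_current(sb))
print("flag-only test returns early:", returns_early_flag_only(sb))
```

With the leftover device number and UUID, the modelled current test does not return early, which matches e2fsck still going on to look at the external journal after has_journal was cleared.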
>
> > - tune2fs will allow me to set the journal device to a device which
> > does not have a valid journal.
>
> Again, which version of e2fsprogs are you using? Tune2fs should *NOT*
> be letting you set the journal device to a device which does not have
> a valid journal:
>
> # tune2fs -J device=/dev/ram /tmp/foo.img
> tune2fs 1.25 (20-Sep-2001)
> tune2fs: Bad magic number in super-block
> while trying to open journal on /dev/ram
Well, it depends on the superblock that is found I guess.
After I forced a fsck on the filesystem, but with the same corrupted
journal as above:
# tune2fs -J device=/dev/mda4 /dev/md1
tune2fs 1.25 (20-Sep-2001)
Creating journal on device /dev/mda4: done
This filesystem will be automatically checked every 22 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
# tune2fs -O ^has_journal /dev/md1
tune2fs 1.25 (20-Sep-2001)
/dev/mda4 is not a journal device.
Journal NOT removed
>
> E2fsprogs will look up the journal device by UUID. The long-range
> plan is to only support external journal devices via e2fsck, and not
> via the in-kernel mount scheme, and to not support use of an external
> journal for the root filesystem. (There are a bunch of reasons why
> that would get horribly complicated, mainly having to do with how you
> recover if the journal device is temporarily off-line, so the plan was
> simply not to support external journals for the root filesystem.)
Sounds reasonable... and as you can specify a journal device to e2fsck
by name, it can presumably update the devno in the superblock to be
found when mounting the filesys.
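For reference, the devno in question is the "Journal device: 0x3c04" value that dumpe2fs reports for /dev/md1. The sketch below splits it into major and minor numbers, assuming the 16-bit layout (8-bit major, 8-bit minor) that kernels of this vintage use. The function name is made up for illustration.

```python
def split_old_devno(devno):
    """Split a 16-bit device number into (major, minor), 8 bits each."""
    if not 0 <= devno <= 0xFFFF:
        raise ValueError("not a 16-bit device number")
    return (devno >> 8) & 0xFF, devno & 0xFF


major, minor = split_old_devno(0x3C04)
print(f"major={major} minor={minor}")  # prints "major=60 minor=4"
```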
Thanks,
NeilBrown
PS. I won't be able to do any testing on this for a while as my test
machine has to go into production and my new box doesn't arrive until
the new year. But I will play some more sometime in January.
>
> - Ted
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype sparse_super
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 15695872
Block count: 31363024
Reserved block count: 1568151
Free blocks: 30869376
Free inodes: 15695806
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Last mount time: Tue Dec 11 15:36:32 2001
Last write time: Tue Dec 11 15:46:54 2001
Mount count: 9
Maximum mount count: 22
Last checked: Tue Dec 11 13:46:01 2001
Check interval: 15552000 (6 months)
Next check after: Sun Jun 9 12:46:01 2002
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal UUID: 30d7f13c-c689-4a63-bf89-71aa2008164c
Journal inode: 0
Journal device: 0x3c04
First orphan inode: 0
Group 0: (Blocks 0 -- 32767)
Primary Superblock at 0, Group Descriptors at 1-8
Block bitmap at 523 (+523), Inode bitmap at 524 (+524)
Inode table at 11-522 (+11)
32229 free blocks, 16372 free inodes, 2 directories
Free blocks: 528-529, 534-756, 764-32767
Free inodes: 13-16384
Group 1: (Blocks 32768 -- 65535)
Backup Superblock at 32768, Group Descriptors at 32769-32776
Block bitmap at 33307 (+539), Inode bitmap at 33308 (+540)
Inode table at 32779-33290 (+11)
32245 free blocks, 16384 free inodes, 0 directories
Free blocks: 32777-32778, 33291-33306, 33309-65535
Free inodes: 16385-32768
Group 2: (Blocks 65536 -- 98303)
Block bitmap at 66091 (+555), Inode bitmap at 66092 (+556)
Inode table at 65547-66058 (+11)
31246 free blocks, 16382 free inodes, 1 directories
Free blocks: 65537-65546, 67068-98303
Free inodes: 32771-49152
....truncated
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype sparse_super
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 15695872
Block count: 31363024
Reserved block count: 1568151
Free blocks: 30869441
Free inodes: 15695858
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Last mount time: Tue Dec 11 15:51:02 2001
Last write time: Tue Dec 11 15:56:18 2001
Mount count: 11
Maximum mount count: 22
Last checked: Tue Dec 11 13:46:01 2001
Check interval: 15552000 (6 months)
Next check after: Sun Jun 9 12:46:01 2002
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal UUID: 30d7f13c-c689-4a63-bf89-71aa2008164c
Journal inode: 0
Journal device: 0x3c04
First orphan inode: 0
Group 0: (Blocks 0 -- 32767)
Primary Superblock at 0, Group Descriptors at 1-8
Block bitmap at 523 (+523), Inode bitmap at 524 (+524)
Inode table at 11-522 (+11)
32217 free blocks, 16372 free inodes, 2 directories
Free blocks: 528-533, 557-32767
Free inodes: 12, 14-16384
Group 1: (Blocks 32768 -- 65535)
Backup Superblock at 32768, Group Descriptors at 32769-32776
Block bitmap at 33307 (+539), Inode bitmap at 33308 (+540)
Inode table at 32779-33290 (+11)
32245 free blocks, 16384 free inodes, 0 directories
Free blocks: 32777-32778, 33291-33306, 33309-65535
Free inodes: 16385-32768
Group 2: (Blocks 65536 -- 98303)
Block bitmap at 66091 (+555), Inode bitmap at 66092 (+556)
Inode table at 65547-66058 (+11)
31246 free blocks, 16382 free inodes, 1 directories
Free blocks: 65537-65546, 67068-98303
Free inodes: 32771-49152
...truncated....
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 30d7f13c-c689-4a63-bf89-71aa2008164c
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: journal_dev
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 0
Block count: 1465931
Reserved block count: 0
Free blocks: 0
Free inodes: 0
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 0
Inode blocks per group: 0
Last mount time: Thu Jan 1 10:00:00 1970
Last write time: Tue Dec 11 13:45:25 2001
Mount count: 0
Maximum mount count: 21
Last checked: Tue Dec 11 13:42:23 2001
Check interval: 15552000 (6 months)
Next check after: Sun Jun 9 12:42:23 2002
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal block size: 4096
Journal length: 1465931
Journal first block: 2
Journal sequence: 0x00014eb8
Journal start: 0
Journal number of users: 1
Journal users: 3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 3eb06026-2b30-4465-bab2-8e0db7858ee9
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal filetype needs_recovery sparse_super
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 15695872
Block count: 31363024
Reserved block count: 1568151
Free blocks: 30817862
Free inodes: 15695162
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Last mount time: Tue Dec 11 15:51:02 2001
Last write time: Tue Dec 11 15:51:02 2001
Mount count: 11
Maximum mount count: 22
Last checked: Tue Dec 11 13:46:01 2001
Check interval: 15552000 (6 months)
Next check after: Sun Jun 9 12:46:01 2002
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal UUID: 30d7f13c-c689-4a63-bf89-71aa2008164c
Journal inode: 0
Journal device: 0x3c04
First orphan inode: 2195460