
Re: Possible features for next major ext3 release [was: Ordered Mode vs Journaled Mode]



Hi,

On Wed, Oct 03, 2001 at 07:42:37PM -0700, Mike Fedyk wrote:

> > What's a "set of transactions"?  
> 
> I'm sure I'm not taking into account all of the possible concurrent
> operations that could happen, but let's take this example:
> 
> truncate $file (keep track which blocks it had before the truncate)
> write 100MB to $file 
> (write still in progress in journaled mode)
> power off

Except that for pretty much any application, the write will actually
be

	do {
		write (fd, buffer, PAGE_SIZE);
		count -= PAGE_SIZE;
	} while (count > 0);

or similar.  Now how does the filesystem know when the 100MB write is
finished?  The filesystem just sees 4k writes.

> Hmm, it looks like what I'm asking for is combining a truncate and write
> transaction into one...

"rename" fits the bill quite well.  Write to a new file, rename it
over the old one.  The rename is guaranteed atomic (including the
deletion of the original inode.)
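
A minimal sketch of that write-then-rename pattern follows.  The helper
name replace_file, the ".tmp" suffix and the 0644 mode are illustrative
assumptions, and the fsync() before the rename is an extra precaution
added here rather than something stated above.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the file at `path` with `len` bytes from
 * `buf`.  The new contents go to a temporary file first; rename() then
 * switches the name over in one step, so a reader sees either the old
 * file or the new one.
 */
int replace_file(const char *path, const char *buf, size_t len)
{
	char tmp[4096];
	int fd;

	/* Assumed naming scheme: same directory, ".tmp" suffix. */
	if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
		return -1;

	fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* write() may be partial, so loop until everything is out. */
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0) {
			close(fd);
			unlink(tmp);
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}

	/* Extra precaution: push the data to disk before renaming. */
	if (fsync(fd) < 0) {
		close(fd);
		unlink(tmp);
		return -1;
	}
	if (close(fd) < 0) {
		unlink(tmp);
		return -1;
	}

	/* The atomic step: the old inode is dropped as part of this. */
	if (rename(tmp, path) < 0) {
		unlink(tmp);
		return -1;
	}
	return 0;
}
```

A caller would use something like replace_file("/etc/app.conf", data,
data_len) and treat a nonzero return as "the old file is still intact".
The temporary file has to live on the same filesystem as the target,
since rename() does not work across filesystems.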

> >Do you really expect a write of a
> > 100MB file to be done atomically by the kernel?!
> 
> It seems possible if that is the only load, or most of it, especially for a
> 400mb journal (max with 4k blocks).

And do you really think that the kernel should wait however many
minutes it takes for the user to finish the write before committing
anything?  Other users on the system might be upset to find their
writes haven't made it to disk in the past 5 minutes because there was
one slow copy in progress at the time!

> All you need to do (for truncate; write) is keep the data blocks that
> were allocated before the truncate reserved until so many new transactions
> have completed on that file...  Of course, they would be unreserved when
> free space runs low.

It _can_ be done, but rename is the standard mechanism, and it's not
worth complicating the whole Unix API and the fs internals for this case.
We give the user syscalls.  Those are the fundamental units of fs
operation.  "Copy a file" is a syscall on Windows, but not on Linux,
and we don't want to go down the route of polluting the kernel with an
enormous database-style transaction engine to support such atomic
copies.

Cheers,
 Stephen




