[linux-lvm] LVM Snapshots & journal recovery

Steve Pratt slpratt at us.ibm.com
Wed May 23 19:10:37 UTC 2001


On Wed, 23 May 2001, Peter Braam wrote:

>On Wed, 23 May 2001, Steven Pratt wrote:
>> Peter, I fail to see how your proposed method of remapping writes to the
>> original volume onto the snapshot volume avoids the synchronous write
>> penalties.

>Hmm, I think you are raising many interesting issues.

>> For example, in your scenario the filesystem journal itself will have to
>> be written to the snapshot volume, so in order to ensure journal
>> consistency you will need to write the remapping metadata synchronously
>> with the remapped journal chunk, or you would not know that the journal
>> had been remapped.  Both writes must be done synchronously or you have
>> possible corruption.
>>

>The journal itself really shouldn't be on a snapshotted device - that
>makes no sense.

Well, that depends on where the filesystem keeps the journal.  If the
journal is stored on the same volume as the filesystem, and you are
snapshotting from a Volume Manager, then you have no choice but to remap
the journal to the snapshot device.  If the journal is stored on another
volume, then I agree.  However, my point still applies to regular data.
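
To make the ordering problem concrete, here is a minimal sketch of the
remap-on-write path as I understand it.  All of the names
(alloc_snapshot_chunk, persist_remap_record, sync_write, and so on) are
invented for illustration; they are not real LVM or EVMS interfaces.

/* Hypothetical remap-on-write path; all names are illustrative. */
typedef unsigned long sector_t;
struct volume;

extern sector_t alloc_snapshot_chunk(struct volume *snap);
extern int persist_remap_record(struct volume *snap, sector_t orig_sector,
                                sector_t snap_sector);
extern int sync_write(struct volume *v, sector_t s, const void *buf,
                      unsigned int len);

int remap_write(struct volume *snap, sector_t sector,
                const void *buf, unsigned int len)
{
        sector_t chunk = alloc_snapshot_chunk(snap);

        /*
         * The remap record must hit disk before the data write is
         * acknowledged: if we crash after writing the chunk but before
         * persisting the mapping, journal replay will read the stale
         * chunk still sitting on the original volume.
         */
        if (persist_remap_record(snap, sector, chunk) != 0)
                return -1;                         /* synchronous write #1 */

        return sync_write(snap, chunk, buf, len);  /* synchronous write #2 */
}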

>The write operations are now idempotent: there is in fact no need
>whatsoever to write anything synchronously, since the journal replay will
>always get it right.

Hmmmm, you seem to be implying that the snapshot metadata is covered by the
journal.  While this may be possible (and desirable) if the snapshotting is
done in the filesystem, I don't see how this is possible when the remapping
happens in the Volume Manager below the filesystem layer.

>For writing I believe what I have proposed is vastly superior:
>no synchronous writes _and_ no copying of blocks.

As stated above, I don't see how you can avoid the synchronous write, but
you do save the copy.
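
For comparison, here is a copy-on-write version of the same write, reusing
the invented declarations from the sketch above: the exception record
still has to be persisted synchronously before the first overwrite of a
chunk, but an extra chunk copy is paid on top of it.

/* Hypothetical copy-on-write path; names are illustrative. */
extern int chunk_already_copied(struct volume *snap, sector_t s);
extern void copy_chunk(struct volume *from, sector_t from_s,
                       struct volume *to, sector_t to_s);
extern int persist_exception(struct volume *snap, sector_t orig_sector,
                             sector_t save_sector);

int cow_write(struct volume *orig, struct volume *snap, sector_t sector,
              const void *buf, unsigned int len)
{
        if (!chunk_already_copied(snap, sector)) {
                sector_t save = alloc_snapshot_chunk(snap);

                copy_chunk(orig, sector, snap, save);   /* the extra copy */
                if (persist_exception(snap, sector, save) != 0)
                        return -1;         /* synchronous metadata write */
        }
        /* The live volume is written in place; reads of it need no lookup. */
        return sync_write(orig, sector, buf, len);
}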

>> In addition to having to do the same number of synchronous writes, you
>> have the additional penalty that you mentioned of the sync time when
>> deleting the snapshot.  This resyncing also comes with a whole new set
>> of consistency issues (a whole new code path).

>It's not an additional penalty at all: this code again is idempotent too.
>If it fails you can run it again - I think it has very simple logic but of
>course I may be overlooking something.  Before you start the deletion
>process the snapshot becomes unusable.  After it has completed the extra
>partition becomes superfluous.

>Also such a process can run in the background and would copy no more
>blocks than doing the writes the other way around.

Agreed, but you just pay the price all at once instead of spreading it out.
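
To show why the copy-back is restartable, here is a sketch of that merge
pass, again with invented names: each iteration rewrites the same
destination with the same data, so rerunning the whole pass after a
failure is harmless.

/* Hypothetical snapshot-merge pass; idempotent, hence restartable. */
struct remap_record { sector_t orig_sector, snap_sector; };

extern struct remap_record *first_remap_record(struct volume *snap);
extern struct remap_record *next_remap_record(struct volume *snap,
                                              struct remap_record *r);
extern void discard_remap_table(struct volume *snap);

void merge_snapshot(struct volume *orig, struct volume *snap)
{
        struct remap_record *r;

        for (r = first_remap_record(snap); r; r = next_remap_record(snap, r))
                copy_chunk(snap, r->snap_sector, orig, r->orig_sector);

        /* Only after every chunk is back in place is the table dropped. */
        discard_remap_table(snap);
}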

>> Also, you have moved the remapping-lookup processing-time penalty for
>> reads from the snapshot to the original.  I think it is much better to
>> optimize access to the original, as opposed to the snapshot, for most
>> cases.

>This is a serious point: the reads need an indirection.

Yes.

>We did benchmarking of this in SnapFS and the redirection blocks are not
>many and are typically cached.  It didn't seem to cost much at all.

I think the same reasoning would apply to the LVM/EVMS method of
snapshotting. We are just starting to do some benchmarking on our EVMS
snapshotting code.  If you have any good benchmark tools or setups, we
would love to run them on our system to see how they compare.
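
The lookup itself has the same shape in both schemes; the difference is
only which device's reads pay for it.  A sketch, with an in-memory table
standing in for whatever the real implementation caches (lookup_remap and
sync_read are invented names): in copy-on-write it is reads of the
snapshot that take this path, while in your method it is reads of the
original.

/* Hypothetical read-side indirection; names are illustrative. */
extern int lookup_remap(struct volume *remapped, sector_t s,
                        sector_t *mapped);
extern int sync_read(struct volume *v, sector_t s, void *buf,
                     unsigned int len);

int indirect_read(struct volume *direct, struct volume *remapped,
                  sector_t sector, void *buf, unsigned int len)
{
        sector_t mapped;

        /* Cheap if the remap table is cached in memory, as you report. */
        if (lookup_remap(remapped, sector, &mapped))
                return sync_read(remapped, mapped, buf, len);

        return sync_read(direct, sector, buf, len);
}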

>What do you think of this?

I am still leaning toward the copy-on-write method, for a number of
reasons.

Paranoia is one of them.  I don't like the idea of having my critical
production volume not fully contained in one place.  With copy-on-write
snapshotting, if the snapshot dies, all you lose is the old static copy
that presumably you were using to back up an active volume.  In your
method, if I lose the snapshot, I lose everything, i.e., no valid current
copy of the data exists.

Disk full is another big issue.  With copy-on-write, again, all that
happens if the snapshot fills up is that the snapshot becomes invalid.  In
your method, what happens?  I think you lose the ability to write new data
to the original volume.  Ouch!  Does this mean that my database, which was
being accessed by hundreds of people, is down?  How do I get it back?
Resync from the snapshot?  But this may take hours, depending on how much
data was snapshotted (enough to fill the snapshot).
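
Spelled out as a sketch (hypothetical names as before), here is how I
understand the difference in failure behavior when the snapshot store
fills up:

/* Hypothetical out-of-space handling for the two schemes. */
extern int snapshot_store_full(struct volume *snap);
extern void invalidate_snapshot(struct volume *snap);

int handle_full_store_write(struct volume *orig, struct volume *snap,
                            int copy_on_write, sector_t sector,
                            const void *buf, unsigned int len)
{
        if (!snapshot_store_full(snap))
                return 0;       /* normal COW/remap path runs instead */

        if (copy_on_write) {
                /* COW: drop the snapshot; the live volume keeps working. */
                invalidate_snapshot(snap);
                return sync_write(orig, sector, buf, len);
        }

        /*
         * Remap-on-write: the current data already lives on the snapshot
         * device and there is nowhere to put new writes, so the live
         * volume stalls until chunks are merged back to the original.
         */
        return -1;
}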

I can see possible advantages to your method if it is done in the
filesystem, as I believe SnapFS does, but from a Volume Manager standpoint
I see too many negatives.

Steve

EVMS Development - http://www.sf.net/projects/evms
Linux Technology Center - IBM Corporation
(512) 838-9763  EMAIL: SLPratt at US.IBM.COM




