[Linux-cluster] GFS2 read only node for backups

Allen Belletti allen at isye.gatech.edu
Fri Oct 2 16:27:17 UTC 2009


Hi All,

So, as I've mentioned here before, I run GFS2 on a two node mail 
cluster, generally with good success.  One issue which I am trying to 
sort out is the backups.  Currently we use rsync each night to create a 
backup, and we're storing 60 days' worth that way, using "--link-dest" 
to avoid creating 60 copies of each identical file.
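The nightly rotation is essentially the following sketch (all paths here are examples, not our real layout, and it assumes GNU date/find):

```shell
#!/bin/sh
# Sketch of the nightly --link-dest rotation described above.
# Paths are hypothetical; assumes GNU date and GNU find.
snapshot() {
    src=$1 dest=$2
    today=$(date +%Y-%m-%d)
    yesterday=$(date -d yesterday +%Y-%m-%d)
    # Files unchanged since yesterday become hard links into yesterday's
    # tree, so 60 snapshots cost roughly one full copy plus 60 nights
    # of deltas.
    rsync -a --delete --link-dest="$dest/$yesterday" "$src/" "$dest/$today/"
    # Expire snapshots older than 60 days.
    find "$dest" -mindepth 1 -maxdepth 1 -type d -mtime +60 -exec rm -rf {} +
}
```

Run from cron as e.g. `snapshot /var/spool/mail /backup/mail`.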

This works well, but slowly (7 hours per night), and the backups have 
quite a lot of performance impact on the production servers.  Further, 
it is my *suspicion* that the very large amount of locking traffic 
contributes to the fairly frequent "file stuck locked" issues which come 
up several times per month, requiring a reboot.

Right now I'm in the process of migrating the cluster nodes to new 
hardware, which means I've got an "extra" node capable of mounting GFS2 
and being experimented with.  Browsing the man pages turned up the 
"spectator" mount option which seemed like exactly what I wanted -- the 
ability to do a read only mount that doesn't interfere with the rest of 
the cluster.  To my surprise, it does indeed mount read-only but it 
still generates a huge amount of locking traffic on the back end 
network.  Although this does keep our "production" nodes from 
accumulating hundreds of thousands of locks, and thus perhaps improves 
their reliability, I was hoping for more.  Btw, "spectator" does not 
work in conjunction with "lockproto=lock_nolock".
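For reference, the spectator mount I tried looks like this (the device 
and mountpoint names are made up; "spectator" is the option from the 
gfs2 man page):

```shell
# Hypothetical device and mountpoint.  A spectator mount is read-only
# and takes no journal, but -- as observed above -- the node still
# participates in DLM locking on the back-end network.
mount -t gfs2 -o spectator /dev/mapper/mailvg-mailfs /mnt/mail-backup
```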

So next, I tried mounting with "ro,lockproto=lock_nolock" thinking that 
it would give me a purely non-interfering mount.  This failed for two 
reasons.  One, these startup messages scared me into thinking that the 
"recovery" process might corrupt the filesystem.  Apparently "ro" 
doesn't quite mean "ro":

> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=: Trying to join cluster 
> "lock_nolock", "mail_cluster:mail_fac"
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> Joined cluster. Now mounting FS...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=0, already locked for use
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=0: Looking at journal...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=0: Acquiring the transaction lock...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> recovery required on read-only filesystem.
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> write access will be enabled during recovery.
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=0: Replaying journal...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=0: Replayed 26 of 27 blocks
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=0: Found 1 revoke tags
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=0: Journal replayed in 1s
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=0: Done
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=1: Trying to acquire journal lock...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=1: Looking at journal...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=1: Acquiring the transaction lock...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> recovery required on read-only filesystem.
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> write access will be enabled during recovery.
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=1: Replaying journal...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=1: Replayed 28 of 34 blocks
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=1: Found 6 revoke tags
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=1: Journal replayed in 1s
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=1: Done
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=2: Trying to acquire journal lock...
> Oct  1 18:34:12 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=2: Looking at journal...
> Oct  1 18:34:13 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=2: Done
> Oct  1 18:34:13 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=3: Trying to acquire journal lock...
> Oct  1 18:34:13 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=3: Looking at journal...
> Oct  1 18:34:13 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> jid=3: Done
The second reason it failed is that after a couple of hours, the mount 
failed as follows:

> Oct  1 19:19:24 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> fatal: invalid metadata block
> Oct  1 19:19:24 post2-new kernel: GFS2: 
> fsid=mail_cluster:mail_fac.0:   bh = 54432241 (magic number)
> Oct  1 19:19:24 post2-new kernel: GFS2: 
> fsid=mail_cluster:mail_fac.0:   function = gfs2_meta_indirect_buffer, 
> file = /builddir/build/BUILD/gfs2-kmod-1.92/_kmod_build_/meta_io.c, 
> line = 334
> Oct  1 19:19:24 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> about to withdraw this file system
> Oct  1 19:19:24 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> telling LM to withdraw
> Oct  1 19:19:24 post2-new kernel: GFS2: fsid=mail_cluster:mail_fac.0: 
> withdrawn
(I have the call trace as well, if anybody's interested.)  Thinking 
about this, it seems clear that the failure occurred because some other 
node changed things while my poor, confused read-only & no locking node 
was reading them.  This makes sense.

So I'm wondering two things:

1.  What does spectator mode do exactly?  Is it just the same as 
specifying "ro" or are there other optimizations?
2.  Would it be possible to have a mount mode that's strictly read-only, 
no locking, and tolerant of errors?  After all, I'm backing up Maildirs 
(a few million individual files) every night.  If I miss a few messages 
one night, it's unlikely to matter.  So if we could return an I/O error 
for a particular file without withdrawing from the cluster, that would 
be wonderful.  Better yet, why not purge the cached data relating to 
that particular file and read it from disk again?  Most likely, that 
would fetch valid data and the file would be accessible again.
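In the meantime, on the userland side rsync already shrugs off per-file 
read errors: it logs them, keeps going, and exits 23 ("partial transfer 
due to error") or 24 ("source files vanished").  A wrapper can treat 
those as success for this kind of backup -- though of course that only 
helps with per-file errors, not a full withdraw, after which all I/O on 
the mount fails:

```shell
# Treat rsync's documented "partial transfer" exit codes as success so
# that a handful of unreadable messages does not fail the whole run.
# 23 = partial transfer due to error, 24 = source files vanished mid-run.
tolerate_rc() {
    case "$1" in
        0|23|24) return 0 ;;   # clean, or acceptable partial transfer
        *)       return 1 ;;   # real failure
    esac
}

# usage (hypothetical paths):
#   rsync -a /var/spool/mail/ /backup/mail/tonight/
#   tolerate_rc $? || { echo "backup failed" >&2; exit 1; }
```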

Thanks in advance for any thoughts that you might have!

Allen

-- 
Allen Belletti
allen at isye.gatech.edu                             404-894-6221 Phone
Industrial and Systems Engineering                404-385-2988 Fax
Georgia Institute of Technology
