[Linux-cluster] Spectator mount option

Tue Jun 24 16:38:41 UTC 2008

On Tue, Jun 24, 2008 at 09:53:56AM +0200, Marc Grimme wrote:
> Hello,
> we are currently testing the spectator mount option for giving nodes read-only
> access to a gfs filesystem.
> 
> One thing we found out is that any node having mounted the filesystem with 
> spectator mount option cannot do recovery when a node in the cluster fails. 
> That means we need at least 2 rw-nodes. It's clear when I keep in mind that 
> the node has no rw-access to the journal and therefore cannot do the journal 
> replay. But it is not mentioned anywhere.
> 
> Could you please explain the ideas behind the spectator mount option and
> any other "unnormal" behaviors that come along with it?
> 
> And are there any advantages to it besides "having no journal"?

It's not mentioned much because it's never crossed the grey line of being
a promoted or "supportable" feature.  Among the reasons are:

- The use case(s) or benefits have never been clearly understood or stated.
  What exactly are the spectator features?  (see below)
  When should you use the spectator features, and why?
  Are the benefits great enough to justify all the work/testing?

- None of the spectator features have been tested.  QE would need to
  develop tests for them, run them, and we'd need to fix the problems
  that fall out.

"Spectator features" refers to more than the spectator mount option in
gfs.  There are three non-standard configuration modes that could be used
together (although they can be used independently, too):

1. The spectator mount option in gfs.  With this option, gfs will never
   write to the fs.  It won't do journal recovery, and won't allow
   remount rw.  The main benefit of this is that the node does not need
   to be fenced if it fails, so the node can mount without joining the
   fence domain.

   You point out some of the thorny problems with this option (along with
   the ro mount option).  What happens when the last rw node fails,
   leaving only spectators who can't recover the journal, and other
   similar scenarios?  gfs_controld has code to handle these cases,
   but it would require serious testing/validation.

2. Quorum votes in cman.  It may make sense in some environments for a node
   to not contribute to quorum at all, so that its presence or absence
   neither helps nor hurts the cluster's ability to be quorate.  In
   cluster.conf:
   <clusternode name="foo" nodeid="1" votes="0"/>

3. Resource mastering in dlm.  Nodes can be configured to never master
   any dlm resources, which means there's less disruption in the dlm when
   they join/leave the lockspace.  See this bug for more details:
   https://bugzilla.redhat.com/show_bug.cgi?id=206336
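
To illustrate item 2, here is a toy sketch of why a node with votes="0"
has no effect on quorum.  It assumes quorum is a simple majority of all
configured votes; the function names and the node names rw1..rw3 are made
up for the example, and this is not cman's actual code.

```python
# Toy model of vote-based quorum. Assumption: quorum is a simple
# majority of all configured votes. This is not cman's implementation.

def quorum_threshold(votes_by_node):
    """Return the number of votes needed for the cluster to be quorate."""
    expected_votes = sum(votes_by_node.values())
    return expected_votes // 2 + 1


def is_quorate(votes_by_node, live_nodes):
    """Return True if the live nodes together hold enough votes."""
    live_votes = sum(votes_by_node[node] for node in live_nodes)
    return live_votes >= quorum_threshold(votes_by_node)


# Three ordinary nodes with one vote each (hypothetical names).
voters = {"rw1": 1, "rw2": 1, "rw3": 1}

# The same cluster plus a zero-vote node like "foo" in the example above.
with_spectator = dict(voters, foo=0)

# Adding the zero-vote node leaves the threshold unchanged.
print(quorum_threshold(voters))
print(quorum_threshold(with_spectator))

# Two voting nodes are enough; one voting node plus "foo" is not.
print(is_quorate(with_spectator, {"rw1", "rw2"}))
print(is_quorate(with_spectator, {"rw1", "foo"}))
```

In this model "foo" can join or leave without changing the threshold, and
it can never be the node that tips the cluster into or out of quorum.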

We'd like to understand specific use cases people have where these things
would provide real advantages.  We need to be able to advise people when,
why and how to use these settings, and we need to be able to test them as
they'd be used.

Thanks,
Dave