
Re: [Cluster-devel] [PATCH 5/5] gfs2: dlm based recovery coordination



Hi,

On Mon, 2012-01-09 at 12:00 -0500, David Teigland wrote:
> On Mon, Jan 09, 2012 at 11:46:26AM -0500, David Teigland wrote:
> > On Mon, Jan 09, 2012 at 04:36:30PM +0000, Steven Whitehouse wrote:
> > > On Thu, 2012-01-05 at 10:46 -0600, David Teigland wrote:
> > > > This new method of managing recovery is an alternative to
> > > > the previous approach of using the userland gfs_controld.
> > > > 
> > > > - use dlm slot numbers to assign journal id's
> > > > - use dlm recovery callbacks to initiate journal recovery
> > > > - use a dlm lock to determine the first node to mount fs
> > > > - use a dlm lock to track journals that need recovery
> > > 
> > > I've just been looking at this again, and a question springs to mind...
> > > how does this deal with nodes which are read-only or spectator mounts?
> > > In the old system we used to propagate that information to gfs_controld
> > > but I've not spotted anything similar in the patch so far, so I'm
> > > wondering whether it needs to know that information or not.
> > 
> > The dlm allocates a "slot" for all lockspace members, so spectator mounts
> > (like readonly mounts) would be given a slot/jid.  In gfs_controld,
> > spectator mounts are not given a jid (that came from the time when
> > adding a journal required extending the device+fs.)  These days, there's
> > probably no meaningful difference between spectator and readonly mounts.
> 
> There's one other part, and that's whether a readonly or spectator node
> should attempt to recover the journal of a failed node.  In cluster3 this
> decision was always a bit mixed up, with some logic in gfs_controld and
> some in gfs2.
> 
> We should make a clear decision now and include it in this patch.
> I think gfs2_recover_func() should return GAVEUP right at the start
> for any of the cases where you don't want it doing recovery.  What
> cases would you prefer?
> 

Yes, if it can't recover, then that's a good idea. We also need to ensure
that we are not trying to recover nodes which don't need recovery though
(see my earlier email).
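
To make the two checks concrete, here is a rough sketch of what an early
exit might look like. Everything in it is hypothetical: the enum, the
struct and the function name are stand-ins rather than code from the
patch, and it assumes that both spectator and read-only mounts give up,
which is only one possible answer to the question above.

```c
#include <stdbool.h>

/* Hypothetical result codes, standing in for whatever the patch reports. */
enum recover_result {
	RECOVER_SUCCESS,
	RECOVER_GAVEUP,
};

/* Hypothetical per-mount state; not the real gfs2 superblock structure. */
struct mount_state {
	bool spectator;
	bool read_only;
};

/*
 * Decide what to do about another node's journal.
 * journal_dirty is assumed to be known already (e.g. from the journal
 * header); how it is obtained is outside this sketch.
 */
static enum recover_result recover_journal(const struct mount_state *m,
					   bool journal_dirty)
{
	/* Assumption: mounts that cannot write never attempt recovery. */
	if (m->spectator || m->read_only)
		return RECOVER_GAVEUP;

	/* A clean journal needs no replay, so report success at once. */
	if (!journal_dirty)
		return RECOVER_SUCCESS;

	/* The actual journal replay would go here. */
	return RECOVER_SUCCESS;
}
```

The ordering is a design choice: checking the mount type first means a
spectator never claims success for a journal it did not examine.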

Steve.


