[Linux-cluster] Add option SO_LINGER to dlm sctp socket when the other endpoint is down.

Wed Nov 20 17:34:43 UTC 2013

On Wed, Nov 20, 2013 at 09:30:51AM +0100, Lars Marowsky-Bree wrote:
> > connection to interfere with a new connection?  (I had this problem some
> > years ago, and added some safeguards to deal with it, but I don't think
> > they are perfect.  There are cases where a very short time separates
> > connections being closed and new connections being created.)
> 
> In the error case, none. That's rather the issue we're trying to avoid:
> old connections that are still around interfere with reconnecting after
> the node has rebooted. SO_LINGER allows us a much faster cleanup.
> 
> (We can't reconnect while the {src ip, port; dst ip, port} is still
> around.)

I'm not sure, but I think I'm worried about a different problem: messages
are sent through the old connection, a node restarts, a new connection is
quickly created, and the *old messages* are received through the new
connection.  The dlm tries to detect and discard stale/old messages, but
if they get through, they can cause problems.  I'd like to know whether
the LINGER change could make this more likely.  If so, then we may want
this change to be a configuration option.
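
For reference, a minimal user-space sketch of what a zero-timeout LINGER
setting behind a switch could look like.  This is not the dlm patch
itself: the helper names set_abortive_close() and get_linger() and the
"enable" flag are hypothetical, and the kernel code would use in-kernel
socket calls rather than setsockopt().  With l_onoff set and l_linger
at 0, close() aborts the connection instead of shutting it down
gracefully, which is what lets the old association go away quickly.

```c
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not taken from the dlm source.
 * enable != 0: turn on SO_LINGER with a zero timeout, so close()
 *              aborts the connection and discards unsent data.
 * enable == 0: turn SO_LINGER off again (default close behaviour).
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 */
int set_abortive_close(int fd, int enable)
{
    struct linger lin;

    memset(&lin, 0, sizeof(lin));
    lin.l_onoff = enable ? 1 : 0;
    lin.l_linger = 0;   /* zero timeout: do not wait on close */

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
}

/* Read the current SO_LINGER setting back, e.g. to verify it. */
int get_linger(int fd, struct linger *out)
{
    socklen_t len = sizeof(*out);

    return getsockopt(fd, SOL_SOCKET, SO_LINGER, out, &len);
}
```

A caller would pass the value of whatever configuration knob is chosen
as "enable", so the default behaviour stays unchanged unless the option
is turned on.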

> It also depends a bit on the semantics of the DLM protocol on which you
> and Dong Mao are better experts than myself. SO_LINGER could only hurt
> us if there could be potential data that we expect to be received by the
> target even after we've closed the socket.

No, we don't care about that.

> That is obviously just a "simulate a node crash" event. That seems
> pretty realistic and, alas, unavoidable to me. You can hit the same by
> powering off the node, too.

Right, I wanted to know it was not *only* the simulation case being
affected.

Dave