[Linux-cluster] Re: If I have 5 GNBD server?

Benjamin Marzinski bmarzins at redhat.com
Fri Sep 2 22:27:58 UTC 2005


On Wed, Aug 31, 2005 at 10:10:37AM +0700, Fajar A. Nugraha wrote:
> Benjamin Marzinski wrote:
> 
> >On Tue, Aug 30, 2005 at 08:41:12AM +0700, Fajar A. Nugraha wrote:
> >
> >>Benjamin Marzinski wrote:
> >>
> >>>If the gnbds are exported uncached (the default), the client will fail back
> >>>IO if it can no longer talk to the server after a specified timeout.
> >>>
> >>What is the default timeout anyway, and how can I set it?
> >>Last time I tested the gnbd_import timeout I was on a development version
> >>(DEVEL.1104982050), and after more than 30 minutes the client was still
> >>trying to reconnect.
> >
> >The default timeout is 1 minute. It is tunable with the -t option (see the
> >gnbd man page). However, you only time out if you export the device in
> >uncached mode.
> >
> I found something interesting:
> gnbd_import man page (no mention of timeout):
>       -t server
>              Fence from Server.
>              Specify a server for the IO fence (only used with the -s option).
> 
> gnbd_export man page :
>       -t [seconds]
>              Timeout.
>              Set the exported GNBD to timeout mode.
>              This option is used with -p.
>              This is the default for uncached GNBDs.
> 
> Isn't the client the one that has to determine whether it's in wait mode
> or timeout mode? How does the parameter from gnbd_export get passed to
> gnbd_import?

No, the server determines it. This information is passed to the client when
it imports the device.
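
For example (the device path, export name, and server name here are just
placeholders), the timeout is set on the exporting side and the client
simply picks it up when it imports:

   # on the server: export the device uncached with a 120 second timeout
   gnbd_export -d /dev/sdb1 -e testgnbd -t 120

   # on the client: import the server's exports; timeout vs. wait mode
   # comes from the server, there is no client-side switch for it
   gnbd_import -i servername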
 
> I tested it today with gnbd 1.00.00, by adding an extra IP address to
> the server -> gnbd_export on the server (IP address 192.168.17.193,
> cluster member, no extra parameter, so it should be exported as an
> uncached gnbd in timeout mode) -> gnbd_import on the client (member of a
> different cluster) -> mount the imported device -> remove the IP address
> 192.168.17.193 from the server -> do df -k on the client, and I got
> these in the client's syslog:

Gnbd won't fail the requests back until it can fence the server.  Since the
server is in another cluster, you cannot fence it. For uncached mode to work,
the gnbd client and server MUST be in the same cluster.
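
As a sanity check (assuming cman, which is what gnbd clusters normally run
on): the usual membership commands should report the same cluster, with
both nodes listed, on both machines:

   # run on both the gnbd server and the gnbd client
   cman_tool status   # same cluster name on both
   cman_tool nodes    # both nodes listed as members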

> Aug 31 09:55:58 node1 gnbd_recvd[9792]: client lost connection with 
> 192.168.17.193 : Interrupted system call
> Aug 31 09:55:58 node1 gnbd_recvd[9792]: reconnecting
> Aug 31 09:55:58 node1 kernel: gnbd (pid 9792: gnbd_recvd) got signal 1
> Aug 31 09:55:58 node1 kernel: gnbd2: Receive control failed (result -4)
> Aug 31 09:55:58 node1 kernel: gnbd2: shutting down socket
> Aug 31 09:55:58 node1 kernel: exitting GNBD_DO_IT ioctl
> Aug 31 09:56:03 node1 gnbd_monitor[9781]: ERROR [gnbd_monitor.c:486] 
> server Dè¯ is not a cluster member, cannot fence.
> Aug 31 09:56:08 node1 gnbd_monitor[9781]: ERROR [gnbd_monitor.c:486] 
> server Dè¯ is not a cluster member, cannot fence.
> Aug 31 09:56:08 node1 gnbd_recvd[9792]: ERROR [gnbd_recvd.c:213] cannot 
> connect to server 192.168.17.193 (-1) : Interrupted system call
> Aug 31 09:56:08 node1 gnbd_recvd[9792]: reconnecting
> Aug 31 09:56:13 node1 gnbd_monitor[9781]: ERROR [gnbd_monitor.c:486] 
> server Dè¯ is not a cluster member, cannot fence.
> Aug 31 09:56:13 node1 gnbd_recvd[9792]: ERROR [gnbd_recvd.c:213] cannot 
> connect to server 192.168.17.193 (-1) : Interrupted system call
> Aug 31 09:56:13 node1 gnbd_recvd[9792]: reconnecting
> 
> And it goes on, and on, and on :) After ten minutes, I added the IP
> address back to the server and this appeared in syslog:
> Aug 31 10:06:13 node1 gnbd_recvd[9792]: reconnecting
> Aug 31 10:06:16 node1 kernel: resending requests
> 
> So it looks like by default gnbd runs in wait mode, and after it
> reconnects the kernel automatically resends the requests without the
> need for dm-multipath.
> 
> Is my setup incorrect, or is this how it's supposed to work?

Unfortunately, your setup allows the possibility of data corruption if you
actually fail over between servers.  Here's why: GNBD must fence the server
before it fails over.  Otherwise you run into the following situation:
You have a gnbd client, and two servers (serverA and serverB).  The client
writes data to a block on serverA, but serverA becomes unresponsive before the
data is written out to disk. The client fails over to serverB and writes out
the data to that block. Later the client writes new data to the same block.
After this, serverA suddenly wakes back up, and completes writing the old data
from the original request to that block.  You have now corrupted your block
device.  I have seen this happen multiple times.

In your setup, since the client and server are in different clusters, gnbd
cannot fence the server.  This keeps the requests from failing out.  If you
switch the IP, gnbd has no way of knowing that this is no longer the same
physical machine.  (This should be fixed: in future releases I will probably
make gnbd verify that it is actually the same machine, not just the same IP.
Otherwise people could do just this sort of thing and accidentally corrupt
their data.  If you switched IP addresses like this with cached devices, the
chance of corrupting your data would become disturbingly high.)  When gnbd
can connect to a server on the same IP, it assumes that the old server came
back before it could be fenced, and resends the requests.
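
(For reference, a cached export is what you get with the -c flag, e.g.

   # cached export: no timeout/fencing protection on the client side
   gnbd_export -d /dev/sdb1 -e testgnbd -c

with the same placeholder names as before.)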

-Ben

> Regards,
> 
> Fajar



