
Re: [Linux-cluster] RE: HA Clustering - Need Help

On 1/25/07, Alan Wood <chekov ucla edu> wrote:
some quick comments on your post from someone who has tried an
active-active cluster on a shared SCSI device.

1.  If you want to have the same block partition mounted on two different
computers at the same time, then you need some cluster file system like
GFS, you can't use ext3.  There are other cluster filesystems out there
(like Lustre) but GFS is the most closely tied to the RH Cluster Suite and
designed for high availability as opposed to parallel computing.
2.  If you are going to run GFS in a production environment the
recommendation is to not use 2-node.  GFS 5 required 3 nodes but GFS 6
offers a 2-node option; however, when using two nodes it is harder to know
which node is "broken" when something goes wrong, so you'll note a lot of
discussion on this list about fencing gone awry and needing some sort of
tiebreaker like a quorum disk.  If you take care in setting it up, a 2-node
cluster will work but you'll want to test it extensively before putting it
into production.
3.  multipathing should work fine and you can build clvm volumes on top of
multipath devices.  Software RAID is different and not really related.

as for recommendations:
1.  don't use SCSI shared storage.  I and others have had reliability
issues with heavy load in these scenarios.
2.  use more than 2 nodes.
3.  go active-passive if possible.  as is often pointed out, the entire
idea of a high availability cluster is that there is enough processing
horsepower to handle the entire remaining load if one node fails.  in a
2-node cluster, then, you'll have to provision each node to be able to run
everything.  it is far easier to set it up so that one node runs
everything and the other node awaits failure than to run active-active.

just my $.02

Thank you very much for your excellent suggestions and tips, but I could not follow some of them since I am bound by the specifications laid down by the development team looking into this project. I have made substantial progress and a large number of issues have been resolved. Since it had to be an Active-Active configuration with both nodes accessing the shared storage at the same time, we have gone for GFS as the file system, using the latest release as suggested by you. The documentation for the current release of RHCS does not talk about any quorum partitions but, as suggested by you, I have left some space partitioned which could be used for the purpose if the need arises. The multipathing is also working fine using the md driver and we have been able to build logical volumes over the multipath devices.

I am now dealing with the issue of configuring the network interfaces. As of now I have configured ethernet bonding on each of the hosts to achieve network interface redundancy as well. However, this leads to a lot of network traffic since the same interfaces are also being used for heartbeat / monitoring. Therefore, I am thinking of using the two ethernet interfaces individually: one interface for monitoring and the other for the LAN through which the clients will access the hosts. They would be connected to separate switches, and the fence devices would also be on the monitoring / control network. So I assume that the arrangement would be something like:

Node A
eth0 -
eth1 -
fence device -

Node B
eth0 -
eth1 -
fence device -

The eth0 interfaces and the fence devices would be connected through a switch, while the other interfaces (eth1) would be on the LAN where clients would be accessing them. In addition there would be two more floating / shared IP addresses, one for the database server and one for the application server, which would be defined in the Resources section of the Cluster Configuration Tool and would not be mentioned in /etc/hosts (I read this somewhere in the documentation).
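
To make the intended layout concrete, here is a rough sketch in Python that writes the plan down and sanity-checks it. All addresses and subnets below are hypothetical placeholders (the real ones are not given above), and the check assumes the fence devices share the eth0 subnet and the floating addresses live on the eth1 (client LAN) subnet.

```python
import ipaddress

# Hypothetical addressing plan; none of these addresses come from the post.
PLAN = {
    "nodeA": {"eth0": "10.0.0.1/24", "fence": "10.0.0.11/24", "eth1": "192.168.1.1/24"},
    "nodeB": {"eth0": "10.0.0.2/24", "fence": "10.0.0.12/24", "eth1": "192.168.1.2/24"},
}
# Hypothetical floating / shared addresses for the two services.
FLOATING = {"database": "192.168.1.50", "application": "192.168.1.51"}


def check_plan(plan, floating):
    """Return a list of problems found in the addressing plan (empty if none)."""
    problems = []
    control_nets = set()
    lan_nets = set()
    for node, ifaces in plan.items():
        eth0 = ipaddress.ip_interface(ifaces["eth0"])
        fence = ipaddress.ip_interface(ifaces["fence"])
        eth1 = ipaddress.ip_interface(ifaces["eth1"])
        # Assumption: the fence device sits on the monitoring / control network.
        if fence.ip not in eth0.network:
            problems.append(f"{node}: fence device is not on the eth0 network")
        # The control network and the client LAN should be distinct subnets.
        if eth0.network.overlaps(eth1.network):
            problems.append(f"{node}: eth0 and eth1 networks overlap")
        control_nets.add(eth0.network)
        lan_nets.add(eth1.network)
    if len(control_nets) != 1:
        problems.append("nodes disagree on the control network")
    if len(lan_nets) != 1:
        problems.append("nodes disagree on the client LAN")
    # Assumption: floating service addresses belong to the client LAN.
    for name, addr in floating.items():
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in lan_nets):
            problems.append(f"{name}: floating address {addr} is not on the client LAN")
    return problems


if __name__ == "__main__":
    for problem in check_plan(PLAN, FLOATING):
        print(problem)
```

This only checks that the plan is self-consistent on paper; it says nothing about how the cluster software itself selects an interface.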

Please let me know if these assumptions are correct. I am also wondering how the cluster manager figures out which interfaces to use for heartbeat and monitoring. I haven't seen any such configuration option in the system-config-cluster program.

The issue that then needs to be resolved is assigning hostname aliases to the shared IP addresses since, as per the developers, the application manager and the database need to use a hostname and not an IP address.
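
One possible way to think about the alias requirement, purely as a sketch: if the names end up in a hosts-format file on the clients (or as equivalent DNS records), each shared address would carry a name the application can use. The hostnames and addresses below are hypothetical, and whether hosts-file entries are acceptable for the shared addresses is still an open question given the documentation note mentioned above.

```python
# Hypothetical hosts-format entries for the two shared addresses.
SAMPLE = """\
# floating service addresses (placeholders, not real values)
192.168.1.50  dbserver.example.com   dbserver
192.168.1.51  appserver.example.com  appserver
"""


def parse_hosts(text):
    """Map every hostname and alias in hosts-format text to its IP address."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry seen for a given name.
            table.setdefault(name.lower(), ip)
    return table


if __name__ == "__main__":
    for name, ip in sorted(parse_hosts(SAMPLE).items()):
        print(f"{name} -> {ip}")
```

With a table like this, the application and database would be configured with "dbserver" and "appserver" rather than the raw addresses.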

Looking forward to your comments,

Thanks a lot.
